AI-Generated Rap Songs

I often tell my younger coworkers that the most boring way to start a blog post is, “This post is about …” — unless of course you rap it!

Yo!
This post is about generating free text
with a deep learning network
particularly it is about Brick X6,
Phey, cabe,
make you feel soom the way (I smoke good!)
I probably make (What?)
More money in six months,
Than what’s in your papa’s safe (I’m serious)
Look like I robbed a bank (Okay Okay)
I set it off like Queen Latifah
‘Cause I’m living single I’m feeling cautious
I ain’t scream when they served a subpoena (Can’t go back to jail)
I heard that he a leader
(Who pood, what to be f*****’ up
The baugerout Black alro Black X6,
Phantom White X6 looks like a panda
Goin’ out like I’m Montana
Hundred killers, hundred hammers Black X6,
Phantom White X6, panda
Pockets swole, Danny
Sellin’ bar, candy
Man I’m the macho like Randy
The choppa go Oscar for Grammy
B**** n**** pull up ya panty
Hope you killas understand me
Hey Panda, Panda Panda,
Panda, Panda, Panda, Panda
I got broads in Atlanta
Twistin’ dope, lean and the Fanta
Credit cards and the scammers
Hittin’ off licks in the bando

You’d think I can rap. I cannot. The rap song above was written by my rap-trained deep learning network.

Generating Free Text with LSTM Networks

I do know that recurrent neural networks (RNNs) have been used successfully to generate free text [i][ii]. The most common neural architecture for free text generation relies on at least one long short-term memory (LSTM) layer.

In a previous post on the KNIME blog, “Once upon a time … by LSTM Network,” K. Melcher [iii] trained an LSTM-based RNN on texts from the Brothers Grimm’s fairy tales, downloaded from the Project Gutenberg site.

The network consisted of only three layers: an input layer, an output layer, and, in between, an LSTM layer (Fig. 1).

The network was trained at the character level. That is, sequences of m characters were generated from the input texts and fed into the network.

Each character was encoded using one-hot encoding; thus, each character was represented by a vector of size n, where n is the size of the character dictionary of the input text corpus.

The full input tensor of size [m, n] was fed into the network. The network was trained to associate the next character, at position m+1, with the previous m characters.
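As a rough sketch of this preprocessing step, here is what building the (past m characters, next character) training pairs might look like in plain Python/NumPy. The file name rap_songs.txt and the variable names are hypothetical; the actual workflow performs these steps with KNIME nodes rather than code.

```python
import numpy as np

text = open("rap_songs.txt", encoding="utf-8").read()   # hypothetical corpus file
chars = sorted(set(text))                                # character dictionary
char_to_idx = {c: i for i, c in enumerate(chars)}
m, n = 100, len(chars)                                   # m past characters, dictionary size n

# Build the (past m characters -> next character) pairs, one-hot encoded
X = np.zeros((len(text) - m, m, n), dtype=np.float32)
y = np.zeros((len(text) - m, n), dtype=np.float32)
for i in range(len(text) - m):
    for t, c in enumerate(text[i:i + m]):
        X[i, t, char_to_idx[c]] = 1.0      # one-hot vector for each past character
    y[i, char_to_idx[text[i + m]]] = 1.0   # one-hot vector for the character to predict
```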

All of this leads to the following network:

* The input layer with n units would accept [m, n] tensors, where n is the size of the input dictionary and m the number of past samples (in this case characters) to use for the prediction. We arbitrarily chose m=100, estimating that 100 past characters might be sufficient for the prediction of character number 101. The dictionary size n, of course, depends on the input corpus.
* For the hidden layer, we used 512 LSTM units. A relatively high number of LSTM units is needed to be able to process all those (past m characters, next character at position m+1) associations.
* Finally, the last layer included n softmax-activated units, where n is again the dictionary size. Indeed, this layer is supposed to produce the array of probabilities for each one of the characters in the dictionary: hence the n output units, one for each character probability.

Figure 1. The deep learning LSTM-based neural network we used to generate free text: n input neurons, 512 hidden LSTM units, and an output layer of n softmax units, where n is the dictionary size, in this case the number of characters used in the training set.
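For readers who prefer code, the same three-layer network can be sketched in a few lines of Keras. This is only a minimal sketch, assuming m=100 and n=62 (the values used for the rap corpus below); the blog post’s workflow builds the network with the corresponding Keras layer nodes in KNIME instead.

```python
from tensorflow import keras
from tensorflow.keras import layers

m, n = 100, 62   # assumed: 100 past characters, 62 distinct characters in the dictionary

model = keras.Sequential([
    keras.Input(shape=(m, n)),              # accepts [m, n] tensors of one-hot encoded characters
    layers.LSTM(512),                       # hidden layer with 512 LSTM units
    layers.Dense(n, activation="softmax"),  # one probability per character in the dictionary
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
```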

Notice that, in order to avoid overfitting, an intermediate dropout layer was temporarily introduced during training, between the LSTM layer and the output dense layer. A dropout layer randomly removes some units during each pass of the training phase [iv]. The dropout layer was then removed for deployment.
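Continuing the Keras sketch above, this amounts to inserting a Dropout layer between the LSTM layer and the dense output layer in the training network only. The rate of 0.2 below is an assumption; the actual value is configured in the workflow’s node dialog.

```python
training_model = keras.Sequential([
    keras.Input(shape=(m, n)),
    layers.LSTM(512),
    layers.Dropout(0.2),                    # assumed rate; randomly drops units during training passes
    layers.Dense(n, activation="softmax"),
])
```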

The KNIME Keras Integration

I am not a Python programmer, and unfortunately, I cannot take the time to learn how to implement and train a deep learning neural network using TensorFlow or even Keras in Python.

However, I am in luck! KNIME Analytics Platform offers a GUI-based integration of Keras and TensorFlow. Indeed, the KNIME deep learning extension integrates functionalities from Keras libraries, which in turn integrate functionalities from TensorFlow in Python (Fig. 2).

More information on how to install the KNIME Deep Learning – Keras Integration and Python, and on how to connect the two, is available on the KNIME Deep Learning – Keras Integration documentation page.

The advantage of using the Keras integration within KNIME Analytics Platform is the drastic reduction in the amount of code to write. A number of Keras library functions have been wrapped into KNIME nodes, most of them providing a visual dialog window and a few of them allowing for the integration of additional Keras/TensorFlow libraries via Python code.

Figure 2. The deep learning integration in KNIME Analytics Platform 3.7 encapsulates functions from Keras built on top of TensorFlow in Python.

The KNIME Workflow

Another advantage of using KNIME Analytics Platform for my data analytics project is the bonanza of available example workflows, all free to download and covering a large variety of use cases.

The KNIME workflows used in the blog post “Once upon a time… by LSTM Network” are no exception. These workflows, which train and deploy an LSTM-based recurrent neural network to generate free text from the Grimm’s fairy tales, are available and downloadable for free from the KNIME EXAMPLES server under:
04_Analytics/14_Deep_Learning/02_Keras/11_Generate_Fairy_Tales.

Figure 3. The example workflow adopted in the blog post “Once upon a Time … by LSTM Network” as it appears on the KNIME EXAMPLES server under 04_Analytics/14_Deep_Learning/02_Keras/11_Generate_Fairy_Tales.

I, of course, tried to save time by downloading these workflows and adapting them to this particular text generation project. The folder 11_Generate_Fairy_Tales contains two workflows (Fig. 3):

* 01_Training – trains the LSTM-based neural network
* 02_Deployment – applies the model and generates the free text

I adapted the text preprocessing part and the neural structure to my new corpus of rap songs to obtain the following two workflows.

Figure 4. The training workflow trains a (62 inputs – 512 LSTM – 62 outputs) neural network to predict the next character in the text based on the previous 100 characters. The training set consists of 23 popular rap songs. Thus, the network learns to build words and sentences in a rap song style.

Figure 5. The deployment workflow takes 100 characters as a start and generates the next character in a loop until 1,000 characters are generated, making the full final rap song.
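The core of the deployment workflow corresponds to a simple generation loop, sketched below in plain Keras/NumPy code continuing the earlier sketches (model, char_to_idx, m, and n are assumed to be defined; seed_text stands for a hypothetical 100-character starting sequence). Picking the most probable character via argmax is only one possible strategy; sampling from the probabilities is a common variation.

```python
idx_to_char = {i: c for c, i in char_to_idx.items()}

generated = seed_text                        # any 100-character starting sequence
for _ in range(1000):
    x = np.zeros((1, m, n), dtype=np.float32)
    for t, c in enumerate(generated[-m:]):
        x[0, t, char_to_idx[c]] = 1.0        # one-hot encode the last m characters
    probs = model.predict(x, verbose=0)[0]   # probability of each dictionary character being next
    generated += idx_to_char[int(np.argmax(probs))]

print(generated)
```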

Notice that if the training set is large, this network can take quite a long time to train. It is possible to speed it up by pointing KNIME Analytics Platform to a Keras installation for GPUs.

AI-Generated Rap Songs

I collected texts from 23 popular rap songs (Fig. 6) for a total of 1,907 sentences. Characters were index-encoded and then one-hot encoded to feed the neural network. A past of 100 characters was used to predict the next character in the sentence. The network was trained for 100 epochs.
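In plain Keras, training for 100 epochs on the tensors built earlier would boil down to a single fit call. The batch size of 128 is an assumption; in the workflow, these parameters are set in the Keras Network Learner node.

```python
model.fit(X, y, batch_size=128, epochs=100)
```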

Figure 6. The list of 23 rap songs used as training set.

To create a new rap song, we need a sequence of 100 starting characters. The network will then take it from there and create the remaining text in a rap style. Any 100-character sequence would work.

We could start with the first 100 characters of an existing rap song and see how the AI deep learning network would continue it. Continuing a rap with more rap, however, would not show off the change in free text style very clearly.

We could start with the opening line of a fairy tale and observe the transition from sweet and slow to angry and hectic. We took the opening line of “The Golden Bird” from the Grimms’ fairy tales. The result might deserve the title “Snow White Gone Sour.”

A certain king had a beautiful garden,
and in the garden stood a tree
which bore golden apples.
The Pouble trap now my heart
is with whoever I’m st’eppin’
And the motherf*****’ weapon is kept in
A stash box, for the so-called law
Wishin’ Ren was a n***** that they never saw
Lights start flashin’ behind me
But they’re scared of a n****
so they mace me to blind me
But that s*** don’t work, I just laugh
Because it gives em a hint, not to step in my path
Forging as a right
If I had a billion dollars (yeah!)
Yeah, I’d spend as a s***
Mingap notsin’ bodies to this bants a** makin’ shot
But in the Pusso
Beathe from the one that b**** workne
on my tent to the money at vioks
The upbor come from out on tha
why I’m here and I can’t come back home
And guess when I heard that?
When I was back home
Every interview I’m representing you,
making you proud
Reach for the stars so if you fall,
you land on a cloud
Jump in the crowd,
spark your lighters,
wave ‘em around
If you don’t know by now,
I’m talking ‘bout Chi-Town!
I’m coming home again
Do you think about me now and then?

In the rap song above, you can see some overfitting effects due to the small size of the training set; indeed, you can recognize pieces of the songs used for training. Overall, however, the result seems quite satisfactory.

Give deep learning a 100-character starting line, and it can rap it!

[i] I. Goodfellow, Y. Bengio, and A. Courville, “Deep Learning,” The MIT Press, 2016
[ii] J. Brownlee, “Crash Course in Recurrent Neural Networks for Deep Learning,” Machine Learning Mastery, 2016
[iii] K. Melcher, “Once upon a time… by LSTM Network,” KNIME blog, 2018
[iv] A. Budhiraja, “Dropout in (Deep) Machine Learning,” blog post on Medium, 2016

As first published in Data Science Central.

Rosaria Silipo, PhD
Rosaria Silipo, Ph.D., principal data scientist at KNIME, is the author of 50+ technical publications, including her most recent book "Practicing Data Science: A Collection of Case Studies". She holds a doctorate degree in bio-engineering and has spent 25+ years working on data science projects for companies in a broad range of fields, including IoT, customer intelligence, the financial industry, and cybersecurity.
