Lofi Music Generator 🎶

Introduction

In this project we will build a model capable of generating notes and chords: a recurrent neural network learns from a dataset of songs we provide and then composes new ones. Before we start, let us recall a few of the basic concepts and terminologies we will be using in this project, and add to that the knowledge required to train our model to be an excellent composer.

Technical Terminologies

  1. Recurrent Neural Networks (RNN): A recurrent neural network is a class of artificial neural networks that make use of sequential information. They are called recurrent because they perform the same function for every single element of a sequence, with the result being dependent on previous computations.

  2. Long Short-Term Memory (LSTM): A type of Recurrent Neural Network that can efficiently learn via gradient descent. Using a gating mechanism, LSTMs are able to recognise and encode long-term patterns. LSTMs are extremely useful to solve problems where the network has to remember information for a long period of time.

  3. Music21: A Python toolkit used for computer-aided musicology. It allows us to teach the fundamentals of music theory, generate music examples and study music.

  4. Keras: A high-level neural networks API that simplifies interactions with TensorFlow (a tiny LSTM sketch follows this list).
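
As a quick illustration of points 2 and 4, here is a minimal, self-contained sketch (not part of the project code) showing that a Keras LSTM layer consumes a batch of sequences and returns one summary vector per sequence:

import tensorflow as tf

# one LSTM layer reading sequences of 100 timesteps with 1 feature each
layer = tf.keras.layers.LSTM(units=64, return_sequences=False)
x = tf.random.normal((8, 100, 1))  # a batch of 8 random sequences
print(layer(x).shape)              # -> (8, 64): one 64-dimensional summary vector per sequence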

Musical Terminologies

  1. Note: A single small unit of sound, similar to a syllable in spoken language.

  2. Chord: Any harmonic set of pitches/frequencies consisting of multiple notes that are heard as if sounding simultaneously.

  3. Pitch: The frequency of the sound, or how high or low it is, represented with the letter names [A, B, C, D, E, F, G], which repeat in every octave.

  4. Octave: Indicates which register a pitch belongs to, i.e. which set of keys you would use on a piano (C4 and C5 are the same letter one octave apart).

  5. Offset: Where the note is located in the piece (see the short music21 sketch after this list).

  6. Lofi Hip Hop: A sub-genre of music that fuses traditional hip-hop and jazz elements to create an atmospheric, soothing, instrumental soundscape. It is characterized by introspective, mellow tunes, deliberately low-fidelity production, and a strong association with Japanese anime visuals.
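
To make these terms concrete, here is a tiny music21 sketch (the note names are arbitrary examples) that creates a note, places it at an offset, and builds a chord:

from music21 import note, chord

n = note.Note('E4')        # a single note: pitch E in octave 4
print(n.pitch, n.octave)   # -> E4 4
n.offset = 0.5             # place the note half a beat into the piece

c = chord.Chord(['C4', 'E4', 'G4'])  # three pitches heard together
print(c.normalOrder)       # -> [0, 4, 7], the chord's pitch classes in normal order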

Now that we have brushed up on the basic terms used in this project, let us start preparing our data.

Preparing Data

Let us start by importing all the libraries that we will be using in our project.

import numpy as np 
import os
import tensorflow as tf
import pickle # serializing and de-serializing a Python object structure
from music21 import converter, instrument, note, chord

music21

  1. music21.converter contains tools for loading music from various file formats, whether from disk, from the web, or from text, into music21.stream.Score objects (or other similar stream objects).

  2. music21.instrument represents instruments through objects that contain general information such as Metadata for instrument names, classifications, transpositions and default MIDI program numbers. It also contains information specific to each instrument or instrument family, such as string pitches, etc.

  3. music21.note contains classes and functions for creating Notes, Rests, and Lyrics.

  4. music21.chord defines the Chord object, a sub-class of General Note as well as other methods, functions, and objects related to chords.

Great! We have imported all the libraries required for our project.

Now, let us define a function to get all the notes and chords from our MIDI files. The function will take the directory where the files are stored as an argument and return all the notes.

def get_notes(dir=None):
    '''
    Get all the notes and chords from the midi files in the directory
    '''
    notes = []

    filepaths = os.listdir(dir)

    for file in filepaths:
        midi = converter.parse(os.path.join(dir, file)) # load each file into a music21 stream object
        parsed_notes = None

        try:    # file has instrument parts
            meta = instrument.partitionByInstrument(midi)
            parsed_notes = meta.parts[0].recurse()
        except: # file has notes in a flat structure
            parsed_notes = midi.flat.notes

        for element in parsed_notes:
            if isinstance(element, note.Note):
                # store the pitch name of every single note, e.g. 'E4'
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):
                # encode every chord as the ids of its notes joined by dots, e.g. '4.7.11';
                # Chord.normalOrder returns the normal order/normal form of the chord as a list of integers.
                # This encoding lets us easily decode the network's output back into the correct notes and chords.
                notes.append('.'.join(str(n) for n in element.normalOrder))

    # serialize the notes once, after parsing every file, so we do not have to re-parse the midi files each run
    with open('data/notes', 'wb') as filepath:
        pickle.dump(notes, filepath)

    return notes
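
Assuming the training MIDI files live in a folder such as midi_songs/ (the folder name is just an example) and that a data/ directory exists for the pickle file, calling the function looks like this:

notes = get_notes('midi_songs')  # hypothetical folder of .mid files
print(len(notes))
print(notes[:5])  # e.g. ['E4', 'G#4', '4.7.11', 'A3', '2.5.9'] -- pitch names and dot-encoded chords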

Now that we have extracted all the notes from our training dataset let us define a function to create our input and output sequences.

def sequence(notes):
    '''
    Create input sequences for the network and their respective outputs.
    The output for each input sequence will be the first note or chord that
    comes after that sequence of notes in our list of notes.
    '''
    sequence_len = 100

    pitch = sorted(set(notes)) # every distinct pitch name / chord encoding
    n_vocab = len(pitch)       # size of the vocabulary

    # create a dictionary to map pitches to integers
    int_note = dict((note_str, number) for number, note_str in enumerate(pitch))

    net_in =  [] # network input
    net_out = [] # network output

    # create input and output sequences with a sliding window of length 100
    for i in range(0, len(notes)-sequence_len):
        seq_in  = notes[i:i+sequence_len]
        seq_out = notes[i+sequence_len]

        net_in.append([int_note[n] for n in seq_in])
        net_out.append(int_note[seq_out])

    n_patterns = len(net_in)

    # reshape the input into a format compatible with LSTM layers: (samples, timesteps, features)
    net_in = np.reshape(net_in, (n_patterns, sequence_len, 1))

    # normalize input to the range [0, 1]
    net_in = net_in / float(n_vocab)

    # convert the class vector (integers) to a binary class matrix (one-hot)
    net_out = tf.keras.utils.to_categorical(net_out, num_classes=n_vocab)

    return (net_in, net_out)

The above function creates sequences of notes and chords of length 100 that are used to train our model. The model learns to predict the note that comes after each sequence; by repeatedly feeding its predictions back in, we can chain sequences together and create an entire song.
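
To see the sliding window in action, here is a toy illustration with a window of length 3 instead of 100:

# toy example: each window of 3 notes is an input, the note right after it is the target
toy_notes = ['C4', 'E4', 'G4', 'A4', 'F4']
sequence_len = 3
for i in range(len(toy_notes) - sequence_len):
    print(toy_notes[i:i + sequence_len], '->', toy_notes[i + sequence_len])
# ['C4', 'E4', 'G4'] -> A4
# ['E4', 'G4', 'A4'] -> F4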

Model

Now that we have created our input and output sequences, let us start building our model.

def create_model(net_in, n_vocab):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(
        units=512, # positive integer, dimensionality of the output space
        input_shape=(net_in.shape[1], net_in.shape[2]),
        recurrent_dropout=0.3, # fraction of the units to drop for the linear transformation of the recurrent state
        return_sequences=True  # return the full sequence so the next LSTM layer also receives a sequence
    ))
    model.add(tf.keras.layers.LSTM(512, return_sequences=True, recurrent_dropout=0.3))
    model.add(tf.keras.layers.LSTM(512))
    model.add(tf.keras.layers.BatchNormalization()) # layer that normalizes its inputs
    model.add(tf.keras.layers.Dropout(0.3)) # applies dropout to the input
    model.add(tf.keras.layers.Dense(254))
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Dense(n_vocab)) # one output unit per distinct note/chord in the vocabulary
    model.add(tf.keras.layers.Activation('softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

    return model

In our model we use four different types of layers (a short usage sketch follows this list):

LSTM layers: A Recurrent Neural Net layer that takes a sequence as an input and can return either sequences (return_sequences=True) or a matrix.

Dropout layers: A regularisation technique that consists of setting a fraction of input units to 0 at each update during the training to prevent overfitting. The fraction is determined by the parameter used with the layer.

Dense layers: A fully connected neural network layer where each input node is connected to each output node.

Activation layer: Determines what activation function our neural network will use to calculate the output of a node.
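
Assuming the net_in array and the notes list from the earlier steps are available, we can build the network and inspect its layer stack:

n_vocab = len(set(notes))              # vocabulary size: number of distinct notes/chords
model = create_model(net_in, n_vocab)
model.summary()                        # prints each layer with its output shape and parameter count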

After creating our model we can start training it on our dataset. We will define a function to train our model.

def train_model(model, net_in, net_out):
    '''
    Training your neural network
    '''
    filepath="weights-{epoch:02d}-{loss:.4f}-bigger.hdf5"
    checkpoint = tf.keras.callbacks.ModelCheckpoint( #Callback to save the Keras model or model weights at some frequency.
        filepath,
        monitor='loss',
        verbose=0,
        save_best_only=True,
        mode='min'
    )
    callbacks_list = [checkpoint]

    model.fit(net_in, net_out, epochs=100, batch_size=128, callbacks=callbacks_list) #Train the model

    return model

While training, we use the ModelCheckpoint callback provided by Keras to store the weights after each improvement, so we can later pick the most optimal weights before generating predictions from our input sequences.
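
Once training is done, we can rebuild the model and load the checkpoint with the lowest loss before generating. The filename below is only a placeholder for whichever file the callback actually saved:

model = create_model(net_in, n_vocab)
model.load_weights('weights-95-0.0123-bigger.hdf5')  # placeholder name; use your best checkpoint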

Generate Song

Now that our model is trained we can start generating our songs. Let us define a function that will generate songs for us using our trained model.

def generate_notes(model, network_input, notes, n_vocab):
    """ 
    Generate notes from the neural network based on a sequence of notes 
    """
    # pick a random sequence from the input as a starting point for the prediction
    start = np.random.randint(0, len(network_input)-1)
    pitch = sorted(set(notes))

    # map integers back to pitches / chord encodings
    int_to_note = dict((number, note) for number, note in enumerate(pitch))

    pattern = list(network_input[start].flatten())
    prediction_output = []

    # generate 500 notes, feeding each prediction back into the input sequence
    for note_index in range(500):
        prediction_input = np.reshape(pattern, (1, len(pattern), 1))

        prediction = model.predict(prediction_input, verbose=0)

        index = np.argmax(prediction)   # the most likely next note/chord
        result = int_to_note[index]
        prediction_output.append(result)

        # slide the window forward: drop the oldest entry and append the new prediction,
        # scaled by n_vocab because the training input was normalized the same way
        pattern.append(index / float(n_vocab))
        pattern = pattern[1:]

    return prediction_output

The function takes our trained model, the network input, the array of notes we extracted, and the vocabulary size; it generates a sequence of notes with the trained model and returns the output.

Stream Music

Now that our model has done its job, let us write its output in MIDI format to a file so we can start listening to it.

from music21 import stream

def create_midi(prediction_output):
    """ 
    convert the output from the prediction to notes and create a midi file
    from the notes 
    """
    offset = 0
    output_notes = []

    # create note and chord objects based on the values generated by the model
    for pattern in prediction_output:
        # pattern is a chord
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        # pattern is a note
        else:
            new_note = note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)

        # increase offset each iteration so that notes do not stack
        offset += 0.5

    midi_stream = stream.Stream(output_notes)

    midi_stream.write('midi', fp='test_output3.mid')
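
For reference, here is a minimal driver that ties all the pieces together. The MIDI folder name is an assumption, and in practice you would reload a saved checkpoint rather than retrain on every run:

if __name__ == '__main__':
    notes = get_notes('midi_songs')              # 1. extract notes/chords from the MIDI files
    n_vocab = len(set(notes))                    # 2. size of the note/chord vocabulary
    net_in, net_out = sequence(notes)            # 3. build the training sequences
    model = create_model(net_in, n_vocab)        # 4. build the LSTM network
    model = train_model(model, net_in, net_out)  # 5. train (checkpoints are saved to disk)
    prediction_output = generate_notes(model, net_in, notes, n_vocab)  # 6. generate new notes
    create_midi(prediction_output)               # 7. write test_output3.mid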

Congratulations!!!

Our model has successfully created a song.
