Making Music With Computers

Table of Contents

  1. Introduction
    1. Project Breakdown
  2. Pure Data
    1. Basics
    2. Resources
  3. DSP
    1. Sound
    2. Harmonics and Music Theory
    3. MIDI
    4. Polyphony
  4. Synthesis Algorithms
    1. ADSR Envelope
    2. Samplers
    3. Karplus-Strong String Synthesis
  5. Making the Glove
    1. Sewing
    2. Circuitry
    3. Cables
  6. Making the Synthesizer
    1. Sampler
    2. Karplus-Strong
    3. Oscillator
    4. UI and Cleanup
    5. Interfacing with Arduino
    6. Playing Music
  7. Conclusion


Introduction

What is sound? What is music? What makes music sound good? How can we create sound? What makes a violin sound like a violin, and a piano like a piano? These questions inspired my interest in this project. I wanted to understand and appreciate how sound and music work so that I could make music of my own, not through engineering and music theory alone, but through technology. Through this project, I hope to help bridge the gap between music and technology.

This documentation answers many of these questions and documents my project. The project consists of four parts.

The goal of this documentation is to:

Project Breakdown

I built a glove interface and a synthesizer.


Then, I composed a piece of music with my friend (Sam Markowitz) and played it exclusively using instruments generated through the synthesizer. You can find it as an MP3 download here.

Below is a video showing the various components of the project.

Pure Data


Basics

Pure Data is a visual programming language. In Pure Data, or pd, the fundamental objects of calculation are not variables and expressions, but rather boxes and lines. Boxes represent any number of objects, including traditional variables and expressions but also encompassing MIDI input, audio control, and more. Lines represent connections between these objects.


In the image above, we see how easily pd generates a tone at 400 Hz, a task that takes considerably more effort in a traditional programming language. Pd also lets us change the tone by changing the frequency in real time.

pd is designed for use in live environments, in particular with music. That is why I chose to use it for this project. It works well with realtime-generated signals. For instance, if I want to generate a constant sound and vary its characteristics in realtime, this is very easy to do in pd. On the other hand, some extremely simple functions present in normal languages are difficult to deal with in pd, such as for loops.


Resources

I will not describe how Pure Data works in detail, because that is done much better elsewhere. That said, pd is old and relatively obscure software, so proper documentation can be difficult to find. Instead, I will point to the helpful resources that got me started on my way.

First of all, pd comes in two versions: vanilla, which is more up-to-date but lacks features, and extended, which is no longer being developed but is much more feature-rich. I would recommend pd-extended for the majority of projects, despite its shortcomings. Technically, with a very solid knowledge of C, you can manually import the missing libraries into vanilla, but most projects won't need the additional stability offered by the slightly newer vanilla version anyway. In my project, the delay lines and serial communication would not work properly without pd-extended.

Here is a list of the most useful resources I found during my explorations of pd.



DSP

Fundamentally, this project deals with sound, sound waves, and how those sounds interact with our brains to make music. The study of digitally producing and manipulating sound is called DSP, or Digital Signal Processing. In order to make sound, we first need to understand what sound is.


Sound

We hear sound with our ears. An ear is essentially a microphone, which makes sense, as they serve the same purpose. The eardrum is a membrane: vibrations in the air set it in motion, and small bones transmit those vibrations to the inner ear, where they are interpreted as sound. A speaker works in reverse: a magnetic transducer drives a membrane, creating disturbances in the air.


In other words, sound is just a vibration in the air. When these vibrations are regular, we interpret them as a tone. Sound waves are modelled as simple vibrations, i.e. sine waves. The amplitude of the sine wave dictates how loud the sound is, while the frequency dictates the pitch of the sound. Pitchless sound is either too irregular and noisy to carry a pitch, or has a frequency too low to be interpreted as a pitch by humans.
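To make this concrete, here is a small Python sketch (illustrative only, not part of the pd patches) that generates the samples of a sine tone, where the hypothetical `amp` parameter controls loudness and `freq_hz` controls pitch:

```python
import math

def sine_wave(freq_hz, amp, duration_s, sample_rate=44100):
    """Generate samples of a sine tone: amp sets loudness, freq_hz sets pitch."""
    n = int(duration_s * sample_rate)
    return [amp * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

tone = sine_wave(440, 0.5, 1.0)  # one second of A440 at half amplitude
```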

Other tones can be made through more complex vibrations. While a sine wave is the simplest, most organic model of a vibration, any sound can be looped an arbitrary number of times per second to create a frequency. We will come back to this when we talk about samplers, but for now we will introduce some simple waves.


As discussed, sine waves are the most simple, and have soft, floaty tones. Square waves have sharper tones. They can be modelled as a constant on-off throughout a second. Sawtooth waves also have sharper sounds, and can be modelled as a constant voltage increase followed by a steep drop. Triangle waves have a softer sound, and can be modelled as either a constant voltage increase followed by a constant voltage decrease, or as alternating between two opposite sawtooth waves.

It also turns out that every non-sine wave is just a sum of many sine waves; decomposing a wave this way is called Fourier analysis. It is beyond the scope of this documentation, as we will not concern ourselves deeply with the fundamental science behind the sounds we create.
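As a quick illustration of this idea (a hypothetical Python sketch, not something used in the project), a square wave can be approximated by summing its odd sine harmonics:

```python
import math

def square_approx(x, n_terms):
    """Approximate a square wave as a sum of odd sine harmonics (Fourier series)."""
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(n_terms)
    )

# With more terms, the sum approaches the ideal square wave value of 1.0 on (0, pi)
one_term = square_approx(math.pi / 2, 1)
many_terms = square_approx(math.pi / 2, 2000)
```

Adding more sine terms brings the sum closer and closer to the flat top of an ideal square wave.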

Harmonics and Music Theory

We will cover some very basic music theory here in order to contextualize our discussion on sound.

You may have heard of major and minor keys:


In a major key, most notes sit on the major scale, and likewise for minor keys and the minor scale. In the picture above, the C major and C minor scales are shown. Major scales tend to sound "happy" while minor scales tend to sound "sad." Why is this? This is the question that will motivate our discussion.

The inner ear has cilia that vibrate with sound. It turns out that these cilia are set up to vibrate in time with integer multiples of one frequency. These multiples are known as harmonics. For instance, 880 Hz and 1320 Hz are harmonics of 440 Hz.

The consequence of this ear structure is that integer multiples of the same frequency played at the same time blend together and are nearly indistinguishable to the human ear. The same is true for frequencies differing by a ratio of small integers. For instance, 400 and 600 Hz played at the same time blend into a single sound, since 600 is 3/2 of 400.

Very similarly, the minor and major scales, the two dominant scales in Western music, are built on small-integer ratios of a frequency. For instance, both the minor and major scales contain the interval of a perfect fifth, which is 3/2 of the fundamental frequency, and the perfect fourth, which is 4/3 of the fundamental frequency. Below is a table of ratios and frequencies for a just-intoned major scale built on A440.

Ratio     1:1  9:8  5:4  4:3  3:2  5:3  15:8  2:1
Freq (Hz) 440  495  550  587  660  733  825   880
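The frequencies in the table can be reproduced with a short Python sketch (illustrative only; the ratio list mirrors the table above):

```python
from fractions import Fraction

# Just-intonation ratios for a major scale, matching the table above
RATIOS = [Fraction(1, 1), Fraction(9, 8), Fraction(5, 4), Fraction(4, 3),
          Fraction(3, 2), Fraction(5, 3), Fraction(15, 8), Fraction(2, 1)]

def just_scale(fundamental_hz):
    """Frequencies of a just-intoned major scale built on the fundamental."""
    return [float(fundamental_hz * r) for r in RATIOS]

print([round(f) for f in just_scale(440)])
# → [440, 495, 550, 587, 660, 733, 825, 880]
```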

Answering this question of what makes major and minor scales sound good happens to also answer the question of what makes two instruments sound different. Why is an A on a violin different from an A on piano?

The answer has to do with the ratio of volumes between these blended harmonics. 440 and 880 Hz together will just sound like 440 Hz; however, if the 880 Hz component is twice as loud, the result sounds different than when the two are equally loud. Real sound consists of complex series of many harmonics, so different volume ratios give a diverse array of sounds. Thus, when we aim to create different sounds, we will essentially be working with the volumes of harmonics.


MIDI

MIDI, which stands for Musical Instrument Digital Interface, is an interface used to exchange musical information between devices. MIDI data is composed of a constant stream of events. Each note event carries two data bytes: the note number, which indicates the note being pressed, and the velocity, which indicates how hard it is pressed. Below is a table of MIDI note numbers, where each column represents a note and each row an octave.

C C# D D# E F F# G G# A A# B
0 0 1 2 3 4 5 6 7 8 9 10 11
1 12 13 14 15 16 17 18 19 20 21 22 23
2 24 25 26 27 28 29 30 31 32 33 34 35
3 36 37 38 39 40 41 42 43 44 45 46 47
4 48 49 50 51 52 53 54 55 56 57 58 59
5 60 61 62 63 64 65 66 67 68 69 70 71
6 72 73 74 75 76 77 78 79 80 81 82 83
7 84 85 86 87 88 89 90 91 92 93 94 95
8 96 97 98 99 100 101 102 103 104 105 106 107
9 108 109 110 111 112 113 114 115 116 117 118 119
10 120 121 122 123 124 125 126 127

A velocity of 0 means that the note is released. So, if I were to play the sequence of notes C-D-E in the 4th octave, getting louder, this might be represented in MIDI data as (48, 80), then (48, 0), (50, 100), then (50, 0), (52, 120), and finally (52, 0). Notice that every time a new note is played, the old note has to be turned off, unless we want simultaneous voices.
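Turning a note number into a frequency follows the standard MIDI convention (note 69 = A = 440 Hz, which is what pd's mtof object uses; note that the octave labels in the table above are offset from the usual naming). A hypothetical Python sketch:

```python
def mtof(note):
    """MIDI note number to frequency, like Pure Data's mtof (A = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

print(round(mtof(69)))  # → 440
print(round(mtof(57)))  # → 220 (one octave down)
```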

In Pure Data, MIDI input is extremely easy. The notein object outputs the note number and velocity through two outlets each time a MIDI event is received.


We can create a naive MIDI input in pd as follows:


The notein object will take each MIDI event, turn the note number into a frequency, give that frequency to a sawtooth oscillator and set the volume to be scaled by the velocity. The issue with this method is that it can only create monophonic synthesis. In other words, it cannot play two notes at once.

We can get around this using pd's poly object. The poly object will route multiple voices at once. In other words, if a C is being played on one voice, and a D is played before the C is let go, the poly object will send the D signal to an alternative route rather than overriding the C.


You will notice that we have to duplicate our note segment so that we have one instance for each voice. This is cumbersome, particularly as we add more voices.

Unfortunately, there is no easy way to get around this and create a truly polyphonic synth. Part of the reason is that on a real instrument, you can play an arbitrarily large number of notes at an arbitrarily large volume. On a computer, however, you can only add so many notes before the speaker membrane reaches its limit and begins to distort. This is already obvious when using sawtooth waves. The default phasor~ object in pd creates a sawtooth from 0 to 1, so adding two phasor~ signals whose frequencies are multiples of each other will produce a peak of 2 every period, which is not representable by a speaker membrane.

The best alternative is to encapsulate the note into a separate object so that it can be duplicated, and then multiply each poly voice by the reciprocal of the voice number. For instance, if we have 5 voices, multiply the volume of each voice by 0.2 so that it will never add up to be more than 1. Obviously, this makes the sound softer, but this way no data is lost through distortion, and the output sound can just be amplified.
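The scale-by-reciprocal trick can be sketched in Python (a hypothetical helper, not pd code):

```python
def mix_voices(voices):
    """Sum several voice signals, scaling each by 1/N so the mix never clips."""
    n = len(voices)
    return [sum(samples) / n for samples in zip(*voices)]

# Two full-scale ramps (like phasor~) would sum to 2.0; scaled, they stay <= 1.0
a = [0.0, 0.5, 1.0]
b = [0.0, 0.5, 1.0]
mixed = mix_voices([a, b])
print(mixed)  # → [0.0, 0.5, 1.0]
```

The mix is softer, but nothing is lost to clipping, so it can simply be amplified afterward.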

Synthesis Algorithms

We will cover a few simple synthesis algorithms before jumping into the details of the synthesizer that I built.

ADSR Envelope

Previously, we have been modelling sounds as sharply turning on and off. When we press a key on our rudimentary MIDI controller, we're just turning a constant oscillator on or off and perhaps setting the volume. However, this is neither an accurate representation of real sound nor particularly interesting.

In order to obtain a more organic sound, we should have an initial "attack," then a "sustained" portion, then eventually a "release." We don't want abrupt sounds, as they will interfere with the smoothness of our sound. We can model this with the following curve, called an ADSR envelope.


ADSR has four parameters: attack, decay, sustain, and release. For a given MIDI event, the velocity represents the maximum of the ADSR curve, the attack is the time to scale up from 0 to that maximum, the decay is the time to fall from the maximum to the sustained volume, and the release is the time to decay down to silence after a 0-velocity input is sent. For each voice, we will take the output of our synth and multiply it by an ADSR envelope to obtain a smoother sound. By changing A and R, we control how gradually the sound starts and fades.
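The envelope itself is easy to sketch in code. Below is a hypothetical Python version with piecewise-linear segments (the pd patch works with control ramps rather than per-sample lists, so this is only an illustration; `hold` stands in for how long the key is held):

```python
def adsr(attack, decay, sustain, release, hold, sample_rate=1000):
    """Piecewise-linear ADSR envelope (times in seconds, sustain as a 0..1 level)."""
    a = max(1, int(attack * sample_rate))
    d = max(1, int(decay * sample_rate))
    r = max(1, int(release * sample_rate))
    env = []
    env += [i / a for i in range(a)]                      # attack: 0 -> 1
    env += [1 - (1 - sustain) * i / d for i in range(d)]  # decay: 1 -> sustain
    env += [sustain] * int(hold * sample_rate)            # sustain while key held
    env += [sustain * (1 - i / r) for i in range(r)]      # release: sustain -> 0
    return env

env = adsr(attack=0.01, decay=0.02, sustain=0.7, release=0.05, hold=0.1)
```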

In Pure Data, the ADSR patch looks like this:


A, D, S, and R are sent as variables from our controller patch. The ADSR patch takes velocity as input through the inlet and outputs the envelope (from 0 to 1) through the outlet. The patch is fairly self-explanatory: it follows an A-D-S route if the velocity is nonzero, and an R route if the velocity is 0.

Using this patch by itself for each of our notes won't work, since our main patch is still setting the volume of each oscillator to the velocity of the note. So, we need to make an adjustment.


This patch would represent an encapsulated note object, taking a MIDI event as input and outputting a sound. In this case, volume is set if velocity is nonzero, but ignored if it is 0 so that the release parameter may smoothly reduce the volume.


Samplers

A sampler is a type of synthesizer that allows us to record a sound, then play it back and change the pitch in correspondence with the note. We can record a pitched tone, for instance, someone humming, and turn it into a note. We can also record an unpitched tone, for instance, hitting a table, and sample a tiny portion of it as a repeated cycle to create a tone.

The pd tutorial site has a section on samplers. Here is the patch they give:


We're not going to worry too much about the specific details of this patch, as they involve advanced Pure Data control, and it can be easily repurposed through encapsulation anyway. However, we will discuss its function.

On the right side, there are two sliders which allow for selecting which part of an audio segment to sample from. On the left side is the actual sampler. Essentially, it plays back the recorded sound starting from the start location and ending at the end location, and delays a small amount in between each portion of the sample in order to achieve the pitch change effect.

For instance, to decrease pitch by an octave, as discussed earlier, playback speed must be halved. So, the sound would be delayed by 1 sample per sample, or, equivalently, read at 22050 samples per second instead of 44100 samples per second.
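The speed/pitch relationship can be illustrated with a small Python resampler (a hypothetical sketch using linear interpolation, not the pd patch itself):

```python
def resample(samples, speed):
    """Play back samples at a different speed: 0.5 halves the pitch (octave down),
    2.0 doubles it (octave up). Linear interpolation between neighboring samples."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += speed
    return out

slow = resample([0.0, 1.0, 0.0, -1.0, 0.0], 0.5)  # twice as long, an octave lower
```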

Karplus-Strong String Synthesis

Karplus-Strong is an algorithm for simulating a plucked string, like that of a guitar. The Wikipedia article gives some history of the algorithm and also gives some example sounds made with it.

The Karplus-Strong algorithm is extremely similar to a sampler, since that's essentially all it is. The algorithm generates a burst of noise, then feeds this noise into a filter, waits a little bit, and feeds it back again, and so on. The initial noise burst gives the twang effect of the guitar, while the feedback loop is essentially the same as making a sampler. This algorithm is visualized in the following picture.
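The algorithm is compact enough to sketch directly. Below is a hypothetical Python version: a noise burst the length of one period, recirculated through an averaging low-pass filter with a damping factor (parameter names are my own, not from the pd patch):

```python
import random

def karplus_strong(freq_hz, n_samples, sample_rate=44100, damping=0.996):
    """Karplus-Strong: a noise burst fed through a delay line with an averaging
    filter. The delay length sets the pitch; damping sets how fast the twang decays."""
    period = int(sample_rate / freq_hz)
    buf = [random.uniform(-1, 1) for _ in range(period)]  # initial noise burst
    out = []
    for i in range(n_samples):
        out.append(buf[i % period])
        # average adjacent samples (low-pass), then feed back into the delay line
        buf[i % period] = damping * 0.5 * (buf[i % period] + buf[(i + 1) % period])
    return out

pluck = karplus_strong(220, 44100)  # one second of a simulated plucked string
```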


The patch for this is also given at the pd tutorial book website.


In this case, the vd~ and delwrite~ objects read from and write to a delay line, which is essentially an array of delayed samples. This has the equivalent effect of delaying for some amount of time at each sample. Again, we will not worry about the details, since we can simply encapsulate the object for our own use.

Making the Glove

In this section, I document and describe the construction and design of the glove.

Sewing

I cut and sewed the glove from an old t-shirt that I bought at Goodwill. I traced an outline around my hand on one side of the glove and went over the trace with a sewing machine. I also attached a button for fastening.


I used flex sensors (I will elaborate on the technology behind flex sensors in the next section), which had to be attached in a way such that they could bend in a regular manner while still being robust and difficult to break. At first, I tried to attach them to the glove using duct tape.


Since duct tape isn't particularly stretchy, this ended up inhibiting my range of movement. In addition, it caused the already problematic solid core wires to stick in place and be unable to move, causing them to constantly bend into new positions that would then stick. So, I eventually decided to superglue each flex sensor into place and sew little pockets for each sensor to keep it relatively fixed.


I discovered that for each of these little pockets, one easy way to hold down the ends of the sewing thread is to use little dabs of superglue. This pretty much guarantees that the end knot absolutely will not come undone. By the end, sewing gave a much cleaner and stronger final product, though it took much longer than using duct tape.



Circuitry

The primary concept used in the circuitry of this glove is called a voltage divider. A flex sensor is nothing more than a variable resistor. The default resistance of each flex sensor is 10k Ohms, and flexing it either way gives a variation of about 30%, for an effective range of 7k to 13k Ohms.

The challenge in designing the circuitry is that we would like to be able to measure the resistance of the flex sensor at any given time. Since we can simply calculate and remap values, we're less concerned with measuring resistance exactly, and more concerned with getting a continuous range of output with some degree of precision.

Unfortunately, the analog pins on the Arduino can measure neither resistance nor current - only voltage. So, we must design a circuit whose voltage varies as resistance changes. It turns out that this is essentially how multiple resistors in series work. Consider the following variations on the same circuit:


All we're doing is monitoring the voltage drop over each resistor. It turns out that we have a formula for the voltage at Vout:

$$V_{out} = V_{in} * \frac{R_2}{R_1+R_2}$$

We're not really concerned with trying to plug in values for our flex sensor into this formula, since we can just test them using the Arduino. However, the important thing to note is that the voltage change varies with the resistance of each resistor. In other words, if we can keep one resistor fixed and vary the other, then the voltage will vary, too.

Our formula shows that in order to keep the change in voltage from being either too large or too small, we should select a fixed resistor with a resistance similar to our flex sensor's. So, in our circuit, R1 will be a 10K resistor, and R2 will be our flex sensor. The basic flex sensor circuit is shown in the next image.
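Plugging our numbers into the voltage-divider formula shows the output range we can expect (a quick Python check, assuming the Arduino's 5 V supply):

```python
def divider_vout(v_in, r1, r2):
    """Voltage divider: Vout = Vin * R2 / (R1 + R2)."""
    return v_in * r2 / (r1 + r2)

# R1 fixed at 10k, flex sensor (R2) swinging from 7k to 13k at 5 V:
lo = divider_vout(5.0, 10_000, 7_000)   # ~2.06 V flexed one way
hi = divider_vout(5.0, 10_000, 13_000)  # ~2.83 V flexed the other way
```

A swing of roughly 0.8 V is comfortably readable by the Arduino's 10-bit ADC.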


We can easily generalize this to multiple flex sensors by running the same circuit in parallel. Our final circuit looks like this.



Cables

Since I was using electronics on flexible textiles, every aspect of the project had to be robust enough to survive the flexing and movement of my wrist and fingers. At first, I used solid core wire for the ground and power cables. This ended up being a terrible idea, as the wires tore the solder joints apart whenever I bent my fingers. Also, the distance between the Arduino, where voltages are measured, and the flex sensors, where they are changed, meant that cable management was a vital component of this project.

The circuit has one ground, one voltage supply, and 5 Vouts, for a total of 7 wires. This proved to be a nightmare to manage. First, in order to keep the circuit flexible (in a literal sense), I used a horizontal wire to connect all the ground and Vin wires.


This reduced the clutter from the five ground and voltage wires, but I still had 7 wires to deal with. I wanted to be able to neatly slot the end of the cable into the breadboard, so I taped 7 square jumpers together as a substitute for ribbon cable.


The end of this cable slots into the end of the breadboard, on the lower left side.


I soldered headers onto the end of each loose cable, allowing me to slot the end neatly into the bunch of wires that I made.


While everything ended up working out neatly, the main takeaway of this is to plan out construction beforehand. Desoldering every single solid core wire took a very long time.

Making the Synthesizer

This is the UI of the synthesizer. Certain features aren't available on the UI, such as the sequencer and parameters for Karplus-Strong.


The five sliders in the upper-left correspond to the five fingers on the glove. Each one is mapped to a different sound effect. Finger one is the wah-wah effect, and finger two is the Q factor for that wah-wah effect. Finger 3 is vibrato. Finger 4 increases the volume of the high harmonic, and finger 5 increases the volume of the low harmonic.

To the right of this, we have four sliders for an ADSR envelope. There are also a handful of sliders for vibrato and tuning. All the way on the right are the volume sliders. Everything in the middle has to do with the oscillator.


Sampler

I used the basic sampler patch that was introduced in an earlier section.


The variables start and end are set as sliders in the UI. The vibrato variable is calculated based on the position of the third finger, scaled by the "vibrato scale" slider.


The sampletune variable is used to alter the pitch of the sampler, since there is no way to guarantee that the sampler has the same pitch as the oscillator. Well, technically there is a way involving Fourier transforms, but there are various complications that prevent it from working properly.

The sampler writes a 3 second recording to file and reads it into a table. The writesf~, or "write sound file," object will open a filestream with the open message, start recording with the start message, and stop recording with the stop message.


I wanted to introduce an element of texture sensing, but all the texture sensors I found were quite expensive. The solution I came up with was to strap a latex glove as a high-friction membrane over a condenser microphone. Unfortunately, the membrane was not high-friction enough to pick up minute surface details.



Karplus-Strong

Again, I used the same Karplus-Strong patch as was introduced earlier. However, whereas I could encapsulate the sampler in an independent note array, I could not do the same for the Karplus-Strong.


Notice that each voice shares an encapsulated midtermnote object, but has its own separate kp object. The reason for this is that you cannot read from one delay line from multiple sources. As it is, the Karplus-Strong patch uses a delay line.


When trying to read from the same delay-line buffer from multiple different sources at once, only one sound ends up playing. The rather rudimentary solution is to create 5 separate delay lines - buffer1, buffer2, buffer3, buffer4, and buffer5 - and assign one to each voice.

The original goal of the project was to generatively assign a sound texture to any input sample, thus allowing for emergent sounds without careful human tuning. One way we can do this is by changing the Karplus-Strong damping factor. A damping factor close to 1 indicates a very "twangy" sound, while one close to 0 indicates a very soft click.

Thus, we want a way to calculate the Karplus-Strong damping based on the quality of the waveform. We do this through the "twang" variable, which is calculated as the variance of the waveform, treating the size of each sample as a value from 0 to 1.
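A hypothetical Python version of this calculation (the actual patch computes it with pd objects):

```python
def twang(samples):
    """Damping control derived from the waveform: variance of sample magnitudes,
    treating each sample's size as a value between 0 and 1."""
    mags = [abs(s) for s in samples]
    mean = sum(mags) / len(mags)
    return sum((m - mean) ** 2 for m in mags) / len(mags)

print(twang([0.5, 0.5, 0.5, 0.5]))  # → 0.0  (flat waveform: soft click)
print(twang([0.0, 1.0, 0.0, 1.0]))  # → 0.25 (spiky waveform: twangy)
```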



Oscillator

The oscillator follows a similar philosophy to the one mentioned before. The goal is to allow the oscillator to make a sharper sound.

The way we do this is by, once again, treating the series of samples as a variable. In this case, we essentially create a histogram of the sample values. If the input sample is "smoother," meaning the envelope is on average lower and the histogram is right-skewed, the resulting oscillator sound is softer and smoother: triangle and sine waves dominate. If the input sound is "spikier," meaning the histogram is left-skewed, the resulting oscillator sound is sharper and grainier: square and sawtooth waves dominate.

The process encapsulation iterates through the input sample-by-sample and creates a five-bar histogram based on four boundaries set in the program, which can also be changed using the sliders on the UI of the synthesizer. In this image, the bins are set to 0-30, 30-40, 40-50, 50-70, and 70-100. This is the default, rather than an even split at 20-40-60-80, because most audio recordings carry some background noise, so sound will rarely fall below an envelope of 20.
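The binning step can be sketched in Python (a hypothetical stand-in for the pd encapsulation; the default boundaries match those described above):

```python
def envelope_histogram(envelope, bounds=(30, 40, 50, 70)):
    """Five-bin histogram of envelope values (0..100) split at four boundaries;
    returns the bin weights normalized to sum to 1."""
    counts = [0] * 5
    for v in envelope:
        bin_index = sum(v >= b for b in bounds)  # how many boundaries v has passed
        counts[bin_index] += 1
    total = len(envelope)
    return [c / total for c in counts]

weights = envelope_histogram([10, 25, 35, 55, 90])
print(weights)  # → [0.4, 0.2, 0.0, 0.2, 0.2]
```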


The five outlets produce weights, which determine the ratio of the five oscillators in the note patch. These are indicated as w1, w2, w3, w4, and w5.


As is seen in the patch, the lower weights (1 and 2) influence softer sounds like cosine and triangle oscillators, while the higher weights influence grainier oscillators like square and sawtooth.

In a previous section, I mentioned that the fourth and fifth fingers are mapped to low and high harmonics. How do I select which harmonic to boost without violating the existing sound texture? I created a domosc object which selects the oscillator with the greatest weight. The fourth and fifth fingers thus amplify this dominating oscillator.


UI and Cleanup

There is a lot more to the synthesizer than the previous picture would indicate...


However, this gets hidden away by using Pure Data's "graph-on-parent" option. When this option is selected, only the objects in the red outline, as shown, are displayed on the canvas. This way, I can move all the calculations off of the canvas, and put only sliders and buttons (and graphs) on the canvas.


Interfacing with Arduino

I connected to the Arduino using the "Firmata" firmware and the "Pduino" library. Firmata is a pre-compiled sketch that runs on the Arduino and allows direct communication with Pure Data without having to manually set up serial communication; Pduino provides the corresponding arduino object in pd. It can be downloaded here.


In the above patch, the open 1 message opens the arduino object on serial port 1. The pinmode messages set up the analog pins for reading and outputting values. These analog values are then sent through ard variables, where they are received at the synthesizer.


The arduino object does not work by default on pd vanilla, only on pd-extended.

Playing Music

I composed a track in GarageBand with a friend. However, the piece was too hard for me to play by myself. So, I attached a sequencer to the synthesizer, which plays a pre-written score automatically while I adjust the parameters of the synthesizer.


The toggle square on the upper-mid-right starts a metro, which then regularly reads the next line of a pre-written score in score.txt. Each line of the score gives a MIDI event. The sequencer is monophonic, and disables the previous note when starting the next.
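The sequencer's note-off behavior follows the MIDI pattern described earlier; here is a hypothetical Python sketch of reading such a score (the line format is assumed, not taken from my score.txt):

```python
def play_score(lines):
    """Monophonic sequencer: each score line is 'note velocity'; before each new
    note starts, emit a note-off (velocity 0) for the previous one."""
    events = []
    current = None
    for line in lines:
        note, velocity = (int(x) for x in line.split())
        if current is not None:
            events.append((current, 0))  # turn off the previous note
        events.append((note, velocity))
        current = note
    if current is not None:
        events.append((current, 0))      # final note-off
    return events

score = ["48 80", "50 100", "52 120"]
print(play_score(score))
# → [(48, 80), (48, 0), (50, 100), (50, 0), (52, 120), (52, 0)]
```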


Conclusion

Overall, this has been a fun project and a great learning experience. It has demonstrated how an understanding of technology opens up new opportunities in music, and vice versa. Understanding the way signals work allows me to create the specific sounds that I desire, while understanding music allows me to frame my project musically.

There are still many, many techniques that I learned about that did not make it into the final project, including a variety of generative music algorithms.

If I do additional work on this project, I'd like to: