Wednesday, July 23, 2014

Beethoven's Piano Sonatas: an Application of music21

[This is a guest post by Derek Klinge who uses music21 in his research on music and disability. I thank him for his contribution. - MSC]

I am a researcher within the Performing Arts Medicine Association. I was interested in looking at Beethoven's use of range over time in his piano sonatas. Although several previous studies have looked at the question of how Beethoven's compositions were affected by his hearing loss, the results were far from conclusive. A study in the British Medical Journal counted the notes in the first movements of the first violin parts of Beethoven's string quartets by hand. For a number of reasons, I thought it might be better to look at the piano sonatas; among them, Beethoven wrote more piano sonatas than he did string quartets or symphonies, so the statistical power would be greater. Counting all of the notes in Beethoven's piano sonatas by hand would be a Herculean task for sure, but fortunately, with scores available from the Center for Computer Assisted Research in the Humanities and music21, sufficient coding skills would do the job.

Why music21?

In addition to the number of high notes, I was also interested in Beethoven's overall use of range, the average note, the average frequency, and the number of measures with high notes, calculating values both from raw note counts and with those counts weighted by the duration of the notes. The methods available in music21 allow this data to be collected very quickly: collecting the majority of the data I needed from all 103 movements of Beethoven's piano sonatas, counting over a quarter million individual notes, and organizing the data by sonata and by movement number takes about 11 minutes.
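
As a rough sketch of how this kind of data can be gathered (this is not the study's actual code; the file path and the "high note" cutoff are placeholders for illustration), music21 makes each statistic a few lines of Python:

from music21 import converter

# Load one sonata movement (the file path here is hypothetical)
movement = converter.parse('sonata01_mvmt1.krn')

# Expand every note and chord into its individual pitches
pitches = [p for n in movement.flat.notes for p in n.pitches]

# Average pitch as a MIDI number (.ps = "pitch space"; middle C = 60.0)
averagePs = sum(p.ps for p in pitches) / len(pitches)

# Average frequency in Hz
averageFrequency = sum(p.frequency for p in pitches) / len(pitches)

# Count of "high notes" -- the cutoff (above C6, MIDI 84) is arbitrary here
highNotes = sum(1 for p in pitches if p.ps > 84)

print(len(pitches), averagePs, averageFrequency, highNotes)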

Some Interesting Findings

Beethoven's use of high notes was lowest around 1800 (in all the graphs below, the colors of the dots represent the sonata numbers, going from red to purple as they run from 1 to 32):


The average frequency of each sonata follows a similar trend:

In general, as there are more notes per measure, there are more high notes per measure. This trend does not hold for many of the sonatas written before 1802.

Also, the relationship between the use of high notes and the average frequency was different between the earlier and later sonatas:


Conclusions

Technology like music21 is an invaluable tool for the empirical study of musicology. Data gathered relatively quickly can be used to analyze possible relationships between Beethoven's use of high notes and his overall range, and to compare them with what we understand about his hearing loss. These data suggest that Beethoven was significantly affected by his hearing loss, though it seems that sometime around 1802 he developed strategies to cope with his progressing disability.

Wednesday, June 25, 2014

Music21 v1.9 released

We are proud to release music21 v1.9.3, the latest and last release in the 1.x series.
There have been 147 commits in the two months since v1.8; here are some of the highlights:
  • MUCH faster .getContextByClass (KUDOS to Josiah Oberholtzer for this). Even if you don't use .getContextByClass in your own code, you're definitely calling something that calls it. This method figures out where the most recent key signature, time signature, clef, etc. is for any given object, finds relationships between notes in different voices, etc. For analysis of medium-sized scores (say, 3 voices, 100 measures) expect a 10-fold speedup. For larger pieces, the speedup can be over 100-fold.
  • A new stream/timespans module that makes the previous speedup possible by representing m21 Streams as AVL trees -- it's used in a few places so far (it needs more docs); forthcoming releases will use it in a lot more places.
  • Python 3 support (3.3 and later). The entire test/multiprocessTest.py suite passes on Python 3. N.B. to contributors: from now on, all contributions need to pass tests on both Python 2.7 and Python 3.3+. The negative: in the past you could make music21 run on unsupported older systems (2.6 and sometimes 2.5); now from music21 import * will fail on pre-2.7 (2.7 has been a requirement since music21 1.7). Fewer than 30% of Macs still in use are running Lion or earlier and thus will need to update to 2.7. This version of music21 runs about 25% faster on Python 3 than on Python 2, but otherwise no new features of Python 3 are used. Python 2.7 will be supported throughout the music21 2.x cycle, so no panicking -- it'll be years (if ever) before Python 3.3+ is a requirement.
  • Improvements to reductions of scores and to analyzing voice-leading motion (some of this is backwards incompatible).
  • Better, faster, and more consistent sorting of elements in a Stream
  • Changes to the derivations module that I doubt anyone else was using anyhow...
  • Removed obsolete files.
  • Stafflines import and export from musicxml (thanks Metalmike!)
  • Complete refactoring of converter.py to make it easier for users to write their own Subconverter formats (that can eventually be put into the system)
  • Complete serialization of Streams via a new version of jsonpickle. This has big implications down the line; for now it affects...
  • Vexflow output is much improved (unless you were counting on Voices, in which case do not upgrade), using the alpha version of music21j -- a Javascript reimplementation of music21's core features.
  • IPython improvements, allowing for robust and persistent communication between Javascript and Python. This will eventually (once I document it...) let you use the web browser as a UI for music21 python apps including live updating of music notation. It's too complex for most users right now, but I can attest that this will be one of the biggest perks of the 2.x development.
The usual bug fixes, documentation improvements, and the like are included as well. Thanks to MIT, the NEH, and the Seaver Institute for funding the project (and to MIT for tenuring me, in part on the basis of music21). This is the last release for which Josiah Oberholtzer was lead programmer; his considerable talents will still be on display in Abjad and the many other projects he works on, and the implications of the new storage system he has developed will continue to pay off for years.

What's next?

Starting work on music21 2.0 today. That release will have some backwards-incompatible changes that developers will need to deal with -- just as the path to 1.0 meant that some things originally thought of as good ideas were thrown out, the path to 2.0 will rely on 8 years of using music21 to fix some things that really should have been done differently from the beginning. Having just spent two weeks making m21 compatible with Python 3, I can give my assurance that as few incompatibilities as possible will be introduced. Most of the major changes will be in the core -- so if you've never messed with Sites, SpannerStorage, etc., you'll be fine.
  • Problems with 5 quintuplets = .99999999 of a beat will disappear (see the sketch after this list). Music21 2.X will store offsets and quarterLengths internally as rational numbers (actually a custom MixedNumeral class, so that the __repr__ is nicer...). All music21 objects will gain four properties: .offsetRational, .duration.quarterLengthRational, .offsetFloat, and .duration.quarterLengthFloat. In music21 2.0, .offset and .duration.quarterLength will be aliases for .offsetFloat and .duration.quarterLengthFloat, so no changes will be needed to existing code. This will give a period of time (6 months?) to switch .offset either to .offsetFloat or to .offsetRational. We'll have a tool to make the switch automatically. Then at a certain point, .offset will become an alias for .offsetRational. By music21 3.0, .offset will only support rational numbers.
  • Streams will store the position of notes, etc. in them. Right now this is all stored in the Note object itself. There are some great reasons for doing it that way, but significant speedups will take place by shifting this.
  • inPlace will be False by default for all operations on Notes, Streams, etc. -- you can plan for the migration by explicitly setting inPlace for every call now (see the example after this list).
  • Some changes to boundary cases in .getElementsByOffset will take place -- it will not change much, but for a few users this will be crucial.
  • NamedTuples and OrderedDicts will appear in a lot of places
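
To see the float problem that rational values fix, here is a minimal sketch using Python's built-in fractions.Fraction standing in for the custom MixedNumeral class mentioned above:

from fractions import Fraction

# Each note of a quintuplet lasts 1/5 = 0.2 of a quarter note.
# Accumulating binary floats drifts almost immediately:
print(0.2 + 0.2 + 0.2)          # 0.6000000000000001
print(0.2 + 0.2 + 0.2 == 0.6)   # False

# Exact rationals have no such drift:
q = Fraction(1, 5)
print(q + q + q == Fraction(3, 5))    # True
print(sum(q for _ in range(5)) == 1)  # True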
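
And for the inPlace change, a quick example of planning ahead by being explicit, using Note.transpose (which already takes the flag today):

from music21 import note

n = note.Note('C4')
m = n.transpose('M2', inPlace=False)  # returns a new Note (D4); n is unchanged
n.transpose('M2', inPlace=True)       # modifies n itself and returns None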
That's all for now, but more examples to come soon. - Myke

Sunday, May 25, 2014

Python reimports

We've been working a lot recently on two kinds of optimization in music21: improving speed and then using some of the speed increases to add functionality and stability, so that new features can be added without slowing down the process. One place where we found we could make changes is our over-cautious use of imports.

While everyone says that in Python you can import a module inside a function without it going through the overhead of actually reimporting, there is still some real overhead, especially if the function is called many times:

Here I compare ten million calls to reference an object vs. doing the same while also importing a module that is already imported:

>>> from timeit import timeit as t # number = ten million; output in secs to 3 decimals
>>> t('x', setup='import weakref; x=5', number=10000000)
0.278
>>> t('import weakref; x', setup='import weakref; x=5', number=10000000)
7.810

So it's approximately two orders of magnitude slower than direct access alone. Even when using the module and creating the weakref itself, the check-for-reimport time dominates five-fold over the creation of the weakref:

>>> t('weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
2.098
>>> t('import weakref; weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
9.823

For historical reasons (porting to systems without weakref, etc.), the common.wrapWeakref function of music21 (which does a try/except to see if a weakref can be made) did the import within the function. Moving it outside the function sped it up considerably, making it only half the speed of calling weakref.ref(x) directly -- worth it for the extra safety -- and only an order of magnitude slower than direct access to x itself:

Before, with common.wrapWeakref doing a safety "import weakref" call:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
17.112

After, without it:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
4.171
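
The shape of that change, in a simplified sketch (this is not music21's exact code; common.wrapWeakref also handles objects that cannot be weakly referenced):

import weakref  # module-level import: the import machinery runs once

def wrapWeakrefOld(obj):
    # old pattern: the import statement executes on every call; the module
    # is already cached, but the lookup still costs real time
    import weakref
    try:
        return weakref.ref(obj)
    except TypeError:
        return obj

def wrapWeakrefNew(obj):
    # new pattern: rely on the module-level import above
    try:
        return weakref.ref(obj)
    except TypeError:
        return obj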

So this is the speedup in music21 that you'd find if you managed to grab the GitHub repository right now.  But we're planning on using the speedup to make things more functional.

As a practical consideration, one of the things that I've never been able to fix in music21 is that elements embedded in a Stream can change their duration without telling their sites that anything has changed. There are expensive operations, such as calculating the length of a Measure, finding the last object, etc., which we cache as long as no .append(), .insert(), .remove(), etc. are called. But a Note inside the measure may have changed length, so that the information in the cache is no longer accurate. I've been wanting to fix this for a while.

The problem is that the Note object itself has no idea that its duration has changed, because while the Note has a reference to the Duration, the Duration does not have a reference to the Note -- it can't have a normal reference, because that would create a circular reference (Note.duration = Duration; Duration.client = Note). With a circular reference, neither the Note nor the Duration would ever disappear, even after they're no longer needed, causing memory leaks. The obvious solution is to use a weak reference, which behaves mostly like a normal reference but does not keep its target alive. If the Note should disappear, the Duration.client weakref is not strong enough to keep the two objects alive.
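
In outline, the scheme looks something like this (the attribute and method names here are hypothetical simplifications, not music21's actual API):

import weakref

class Duration(object):
    def __init__(self, client=None):
        # hold only a weak reference back to the owning Note
        self._client = weakref.ref(client) if client is not None else None

    @property
    def client(self):
        # dereference; returns None if the owner has been garbage collected
        return self._client() if self._client is not None else None

    def quarterLengthChanged(self):
        # tell the owner, if it still exists, that our value changed
        c = self.client
        if c is not None:
            c.informSites()  # hypothetical: owner clears its sites' caches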

With the speed increases, it should be possible to store a weakref on the Duration (and also on the Pitch) to the object it's attached to, so that it can inform its "client" that it has changed. The client can then inform its Sites (measures, etc.) that it has changed and clear the appropriate cache. The extra overhead of creating the weakref ends up being only about 20% of object creation time; a small price to pay for the security of knowing that nothing can change and screw up the overall system:

>>> t('d=duration.Duration();', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
19.382
>>> t('d=duration.Duration(); pitchRef = common.wrapWeakref(x)', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
23.787

Expect to see more functionality like this in a forthcoming release of music21.

Friday, May 23, 2014

Speed, Speed, Speed, ... and news.

The newest GitHub repository contains a huge change to the under-the-hood processing of .getContextByClass(), which is used in about a million places in music21. It is the function that lets any note know what its current TimeSignature (and thus beatStrength, etc.) is, lets us figure out whether the sharp on a given note should be displayed or not given the current KeySignature, etc. While we had tried to optimize the hell out of it, it has been a major bottleneck in music21 for working with very large scores. We sped up parsing (at least the second time through) a lot in the last commit; this was the time to speed up context searching. We now use a form of AVL tree implemented in a new stream.timespans module -- it's not well documented yet, so we're only exposing it directly in one place, stream.asTimespans(recurse=True|False). You don't need to know about this unless you're a developer, but I wanted to let you know that the results are extraordinary.
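
If you want to peek at the new structure yourself, it is a one-liner (the exact output format may change while the module remains under-documented):

>>> from music21 import corpus
>>> s = corpus.parse('bwv66.6')
>>> tsTree = s.asTimespans(recurse=True)  # the AVL-backed view of the stream's elements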

Here’s a code snippet that loads a score with three parts and 126 measures and many TimeSignatures and calculates the TimeSignature active for every note, clef, etc. and then prints the time it takes to run:

>>> c = corpus.parse('luca/gloria')
>>> def allContext(c):
...     for n in c.recurse():
...         k = n.getContextByClass('TimeSignature')
... 
>>> from time import time as t
>>> x = t(); allContext(c); print(t() - x)

With the 1.8 release of Music21:
42.9 seconds

With the newest version in GitHub:
0.70 seconds

There's a lot of caching that happens along the way, so the second call is much faster:

Second call with the 1.8 release:
44.6 seconds (the same, within the margin of error)

Second call with the newest version in GitHub, if the score hasn't changed:
0.18 seconds

You'll see the speedup most in places where every combination of notes, etc. needs to be found. For instance, finding all parallel fifths in a large score of 8 parts could have taken hours before; now you'll likely get results in under a few seconds.

I have not heard of any issues arising from the change in sorting from the last posting on April 26, so people who were afraid of updating can breathe a bit more easily and update to the version of music21 at least as of yesterday. The newer version, like all GitHub commits, should be used with caution until we make a release.

Thanks to the NEH and the Digging into Data Challenge for supporting the creation of tools for working with much bigger scores than before.

In other news: 

Music21j — a Javascript implementation of music21’s core features — is running rapidly towards a public release.  See http://prolatio.blogspot.com/2014/05/web-pages-with-musical-examples.html for an example of usage.  We’ll be integrating it with the Python version over the summer.

Ian Quinn's review of Music21 appeared in the Journal of the American Musicological Society yesterday. Prior to this issue, the journal had never reviewed a non-book. It's a great feeling to have people not on this list know about the software as well.

Oh, and MIT was foolhardy enough to give me tenure! Largely on the basis of music21. If you're an academic working on a large digital project, I still advise proceeding with caution, but know that it can be done. Thanks, everyone, for your support.

Sunday, February 16, 2014

Plotting pitches and durations continuously in music21

With music21, it's not hard to plot discrete data (pitches, durations, etc.) as continuous data. There isn't a built-in tool for doing this, but since music21 is written in Python, it is easy to take advantage of tools from matplotlib, numpy, and scipy to create cubic-spline curves that show these points in an easily visualized format.

In music21 you can easily plot the position of notes as a piano roll:

from music21 import corpus

bach = corpus.parse('bwv66.6')
bach.plot('pianoroll')



which preserves pitch names, measure numbers, etc.  But the case we're asking for requires a plot more like this:

The numbers at the left are MIDI note numbers, while the numbers along the bottom count quarter notes from the beginning. Here's some code to help you achieve this:

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from music21 import corpus

bach = corpus.parse('bwv66.6')

fig = plt.figure()
subplot = fig.add_subplot(1, 1, 1)  # a single axes filling the whole figure

for part in bach.parts:
    notes = part.flat.notes
    y = [n.ps for n in notes]  # pitch space: MIDI note numbers
    # center each note at the midpoint of its duration
    x = [n.offset + n.quarterLength / 2.0 for n in notes]

    # fit an interpolating cubic B-spline through the (x, y) points
    tck = interpolate.splrep(x, y, s=0)
    xnew = np.arange(0, max(x), 0.01)
    ynew = interpolate.splev(xnew, tck, der=0)

    subplot.plot(xnew, ynew)

plt.title('Bach motion')
plt.show()

With this sort of graph it's easy to isolate each voice (there's not much overlap of voices in this chorale) and to see the preponderance of similar motion among the Soprano, Alto, and Tenor, but the lack of coordination with the Bass (which would create forbidden parallels if it coordinated). More sophisticated examples with better labels are easily created by those with knowledge of matplotlib, but this simple demonstration should suffice to get things started.