A Statistical Analysis of Tonal Harmony

By David Temperley
2009

————

Overview

It is generally believed that harmony in common-practice music (i.e. 18th and 19th century Western art music) is characterized by certain basic principles. Dominant harmonies (V and vii) go to tonics (I), predominants (IV and ii) go to dominants, root motion by descending fifth is especially favored, and so on. But to what extent are these principles actually followed in common-practice composition? There has been surprisingly little empirical study of this question. [1]

This page presents a statistical analysis of harmonic progressions in a corpus of common-practice music. The data files and programs mentioned below are available in this zipped file.

The data comes from the workbook accompanying Stefan Kostka and Dorothy Payne’s theory textbook Tonal Harmony, 3rd edition (McGraw-Hill, 1995). The workbook contains a number of excerpts of common-practice pieces, to be analyzed by the student; an accompanying instructor’s manual contains “correct” analyses done by the textbook authors, in conventional Roman numeral notation. The analyses also show modulations, and represent each chord in relation to the local key.

I created a corpus consisting of all of the analyzed excerpts in the workbook of 8 measures or more in length; there were 46 such excerpts. I call this the “Kostka-Payne corpus.” (A list of the excerpts is at the end of this page, and also in the file kp-corpus-info.) I created midifiles and “notefiles” (textfiles listing the notes with pitches and on/off times) of all the excerpts. (This was done in connection with the testing of the Melisma music analysis system; the notefiles and midifiles are available at the Melisma download site.) The harmonic analyses of the excerpts were computationally encoded by Bryan Pardo, and added to the midifiles (these midifiles are available at Pardo’s website). I then converted Pardo’s analyses into another format, which I call “chord-list” format. The beginning of a chord-list (for the opening of the Minuet in G major from the Notebook for Anna Magdalena Bach) is shown here:

 0.000  2.608 -  0  1  7  7
 2.608  3.913 -  5  4  7  0
 3.913  5.217 -  0  1  7  7
 5.217  6.521 - 11  7  7  6

Each line represents a chord segment. The first number indicates the beginning of the segment, in seconds. (For each excerpt, I chose a tempo that I thought was reasonable, and then generated times for the chord segments using this tempo.) The second number represents the end time of the segment. Following this are four integers. The first is the “chromatic relative root”: the chromatic interval from the tonic up to the root. I use the usual pitch-class notation for intervals: I = 0, bII (or #I) = 1, II = 2, etc. The second integer indicates the “diatonic relative root” – the Roman numeral number (I = 1, bII = 2, II = 2, etc.). The third number indicates the tonic (assuming the usual pitch-class notation: C = 0, Db/C# = 1, etc.), and the fourth number indicates the _absolute_ root (again assuming the usual pitch-class notation). So the first chord statement above indicates I in the key of G major – a G major chord, in absolute terms. (Applied chords were relabeled in relation to the local key: for example, V/V was converted to II.)
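The format described above can be read with a few lines of code. The following is a minimal sketch, not the code used to build the corpus: the names ChordSegment, parse_chord_line and is_consistent are made up for illustration, and the "-" column is simply skipped because its meaning is not explained here. The consistency check relies only on the definitions above: the tonic plus the chromatic relative root, modulo 12, should equal the absolute root.

```python
from typing import NamedTuple


class ChordSegment(NamedTuple):
    start: float         # segment start, in seconds
    end: float           # segment end, in seconds
    chromatic_root: int  # semitones from the tonic up to the root (I = 0, bII = 1, ...)
    diatonic_root: int   # Roman numeral number (I = 1, II = 2, ...)
    tonic: int           # pitch class of the local tonic (C = 0)
    absolute_root: int   # pitch class of the chord root (C = 0)


def parse_chord_line(line: str) -> ChordSegment:
    """Parse one chord statement such as ' 0.000  2.608 -  0  1  7  7'."""
    fields = line.split()
    # fields[2] is the "-" column; its meaning is not documented, so it is ignored.
    start, end = float(fields[0]), float(fields[1])
    chrom, diat, tonic, absolute = (int(x) for x in fields[3:7])
    return ChordSegment(start, end, chrom, diat, tonic, absolute)


def is_consistent(seg: ChordSegment) -> bool:
    """Check that tonic + chromatic relative root gives the absolute root."""
    if seg.chromatic_root == -1:  # chords with no accepted root are labeled -1
        return True
    return (seg.tonic + seg.chromatic_root) % 12 == seg.absolute_root


# The four chord statements shown above (opening of the Minuet in G).
sample_lines = [
    " 0.000  2.608 -  0  1  7  7",
    " 2.608  3.913 -  5  4  7  0",
    " 3.913  5.217 -  0  1  7  7",
    " 5.217  6.521 - 11  7  7  6",
]
segments = [parse_chord_line(line) for line in sample_lines]
```

For the second statement, for instance, the tonic is 7 (G) and the chromatic relative root is 5 (IV), giving (7 + 5) mod 12 = 0, a C major chord.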

Note that this format contains no information about the quality of chords (major/minor/diminished) or extensions (e.g. sevenths, ninths). This information is available in Pardo’s midifiles, but I did not encode it. [2]

The file kp-chord-list contains the chord-lists for the complete KP corpus. The title of each excerpt (using the short names shown in kp-corpus-info) is indicated at the beginning of the excerpt. Dotted lines “—” separate one key section from another. (“Pivot chords” – chords at key boundaries that function in both the previous key and the following one – are represented in both key sections.) I also separated the corpus into major-key and minor-key sections; the file kp-chord-list-ma includes just the major-key ones, and kp-chord-list-mi includes just the minor-key ones.

A few chords in the corpus were given chord symbols for which there is no widely accepted root, such as “German 6th”. For such chords, the label -1 is used for the chromatic, diatonic, and absolute roots.

Some Aggregate Statistics

Once I had the KP corpus in “chord-list” form, I wrote a Perl script, tally.pl, which extracts various kinds of aggregate statistics.

The corpus contains 919 chords, and a total time of 1354.116 seconds.

First I extracted the total count of each chromatic relative root, and the total amount of time spent on that root.

                     proportion total
                     excluding  time
Root count proportion  tonic   (secs) proportion
I     318  0.346        ---  553.792  0.409
bII    17  0.018      0.029   29.805  0.022
II    104  0.113      0.180  118.766  0.088
bIII   10  0.011      0.017   16.668  0.012
III    21  0.023      0.036   25.104  0.019
IV     70  0.076      0.121   91.622  0.068
#IV    17  0.018      0.029   18.652  0.014
V     214  0.233      0.370  302.102  0.223
bVI    34  0.037      0.059   44.383  0.033
VI     50  0.054      0.087   76.706  0.057
bVII    6  0.007      0.010    8.301  0.006
VII    35  0.038      0.061   37.552  0.028

(The first “proportion” column shows the count of the chord as a proportion of the total count; the second “proportion” column shows the time spent on the chord as a proportion of the total time.)

There were also 23 “miscellaneous” chords, not assigned any explicit root (such as augmented-sixth chords), taking a total time of 30.663 seconds. (These are assigned a chromatic root of -1 in the chord list; the diatonic root and absolute root are also -1.)
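The proportions in the table can be recomputed from the counts. The sketch below is not taken from tally.pl; the denominators are inferred from the published figures. The plain proportion appears to divide by all 919 chords (including the 23 miscellaneous ones), while the "excluding tonic" proportion appears to divide by the rooted non-tonic chords only.

```python
# Root counts copied from the table above.
counts = {
    "I": 318, "bII": 17, "II": 104, "bIII": 10, "III": 21, "IV": 70,
    "#IV": 17, "V": 214, "bVI": 34, "VI": 50, "bVII": 6, "VII": 35,
}
MISC_CHORDS = 23  # chords with no explicit root

rooted_total = sum(counts.values())        # chords that have a root
total = rooted_total + MISC_CHORDS         # all chords in the corpus
non_tonic = rooted_total - counts["I"]     # rooted chords other than I


def proportion(root: str) -> float:
    """Share of all chords (assumed denominator: every chord, misc included)."""
    return counts[root] / total


def proportion_excluding_tonic(root: str) -> float:
    """Share of rooted non-tonic chords (assumed denominator)."""
    return counts[root] / non_tonic
```

With these denominators, proportion("V") rounds to 0.233 and proportion_excluding_tonic("V") rounds to 0.370, matching the table.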

Then I looked at the “chord transitions” — the number of times each chord moves to each other chord. “Antecedent” chords are shown on the vertical axis, “consequent” chords on the horizontal; for example, the number of occurrences of I moving to II is 31. (The data only reflects transitions within a single key section; no transition is recorded for moves from one key section to another.)

CHROMATIC ROOT TRANSITION COUNTS

 Cons   I  bII   II bIII  III   IV  #IV    V  bVI   VI bVII  VII
Ant 
   I    0    7   31    1    4   45    2  116   11   17    3   19 
 bII    3    0    8    0    0    0    1    2    0    0    0    1 
  II   22    3    0    1    4    1    7   45    2    8    0    6 
bIII    1    1    0    0    0    0    0    4    4    0    0    0 
 III    1    0    2    0    0    7    0    1    0    7    0    1 
  IV   32    2   10    0    4    0    3   11    0    1    1    4 
 #IV    7    0    0    0    0    0    0    9    0    0    0    0 
   V  167    0    8    1    2    4    0    0    7    6    0    2 
 bVI    5    2    8    0    1    3    0    2    0    3    2    0 
  VI    4    2   28    0    1    4    2    1    0    0    0    1 
bVII    0    0    0    5    0    0    0    1    0    0    0    0 
 VII   27    0    0    0    3    0    1    1    1    0    0    0 
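Counting transitions of this kind amounts to tallying adjacent pairs of roots within each key section. The following sketch illustrates the idea under stated assumptions; it is not tally.pl. The function name and the input layout (one list of chromatic roots per key section) are invented for illustration, and skipping pairs that involve a rootless chord (-1) is an assumption, since the treatment of such chords in the transition counts is not described here.

```python
from collections import Counter


def count_transitions(key_sections):
    """Tally (antecedent, consequent) root pairs, never crossing a key boundary."""
    tally = Counter()
    for roots in key_sections:
        # zip pairs each chord with its successor inside the same key section.
        for antecedent, consequent in zip(roots, roots[1:]):
            if antecedent == -1 or consequent == -1:
                continue  # assumption: ignore chords with no accepted root
            tally[(antecedent, consequent)] += 1
    return tally


# First section: the chromatic roots of the Minuet opening shown earlier
# (I, IV, I, VII). Second section: a made-up I-V-I for illustration.
sections = [[0, 5, 0, 11], [0, 7, 0]]
result = count_transitions(sections)
```

Here the move from the last chord of the first section (11) to the first chord of the second (0) is not counted, because it crosses a key-section boundary.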

It is useful to represent this data in two other ways. First, we represent chromatic root transitions as a proportion of the total count for the consequent chord. The values in each column sum to 1; thus one can see, for example, that I is approached by V 62.1% of the time.

CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR CONSEQUENT CHORD

 Cons     I   bII    II  bIII   III    IV   #IV     V   bVI    VI  bVII   VII
Ant 
   I  0.000 0.412 0.326 0.125 0.211 0.703 0.125 0.601 0.440 0.405 0.500 0.559 
 bII  0.011 0.000 0.084 0.000 0.000 0.000 0.062 0.010 0.000 0.000 0.000 0.029 
  II  0.082 0.176 0.000 0.125 0.211 0.016 0.438 0.233 0.080 0.190 0.000 0.176 
bIII  0.004 0.059 0.000 0.000 0.000 0.000 0.000 0.021 0.160 0.000 0.000 0.000 
 III  0.004 0.000 0.021 0.000 0.000 0.109 0.000 0.005 0.000 0.167 0.000 0.029 
  IV  0.119 0.118 0.105 0.000 0.211 0.000 0.188 0.057 0.000 0.024 0.167 0.118 
 #IV  0.026 0.000 0.000 0.000 0.000 0.000 0.000 0.047 0.000 0.000 0.000 0.000 
   V  0.621 0.000 0.084 0.125 0.105 0.062 0.000 0.000 0.280 0.143 0.000 0.059 
 bVI  0.019 0.118 0.084 0.000 0.053 0.047 0.000 0.010 0.000 0.071 0.333 0.000 
  VI  0.015 0.118 0.295 0.000 0.053 0.062 0.125 0.005 0.000 0.000 0.000 0.029 
bVII  0.000 0.000 0.000 0.625 0.000 0.000 0.000 0.005 0.000 0.000 0.000 0.000 
 VII  0.100 0.000 0.000 0.000 0.158 0.000 0.062 0.005 0.040 0.000 0.000 0.000 

The same can be done for the antecedent chord, so that the values in each row sum to 1. For example, I moves to V 45.3% of the time.

CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR ANTECEDENT CHORD

 Cons     I   bII    II  bIII   III    IV   #IV     V   bVI    VI  bVII   VII
Ant 
   I  0.000 0.027 0.121 0.004 0.016 0.176 0.008 0.453 0.043 0.066 0.012 0.074 
 bII  0.200 0.000 0.533 0.000 0.000 0.000 0.067 0.133 0.000 0.000 0.000 0.067 
  II  0.222 0.030 0.000 0.010 0.040 0.010 0.071 0.455 0.020 0.081 0.000 0.061 
bIII  0.100 0.100 0.000 0.000 0.000 0.000 0.000 0.400 0.400 0.000 0.000 0.000 
 III  0.053 0.000 0.105 0.000 0.000 0.368 0.000 0.053 0.000 0.368 0.000 0.053 
  IV  0.471 0.029 0.147 0.000 0.059 0.000 0.044 0.162 0.000 0.015 0.015 0.059 
 #IV  0.438 0.000 0.000 0.000 0.000 0.000 0.000 0.562 0.000 0.000 0.000 0.000 
   V  0.848 0.000 0.041 0.005 0.010 0.020 0.000 0.000 0.036 0.030 0.000 0.010 
 bVI  0.192 0.077 0.308 0.000 0.038 0.115 0.000 0.077 0.000 0.115 0.077 0.000 
  VI  0.093 0.047 0.651 0.000 0.023 0.093 0.047 0.023 0.000 0.000 0.000 0.023 
bVII  0.000 0.000 0.000 0.833 0.000 0.000 0.000 0.167 0.000 0.000 0.000 0.000 
 VII  0.818 0.000 0.000 0.000 0.091 0.000 0.030 0.030 0.030 0.000 0.000 0.000 
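Both proportion tables follow directly from the count table: divide each cell by its column total (consequent) or by its row total (antecedent). A minimal sketch, with the counts copied from the table of chromatic root transition counts and with function names chosen only for illustration:

```python
ROOTS = ["I", "bII", "II", "bIII", "III", "IV",
         "#IV", "V", "bVI", "VI", "bVII", "VII"]

# Rows are antecedents, columns are consequents.
COUNTS = [
    [0, 7, 31, 1, 4, 45, 2, 116, 11, 17, 3, 19],   # I
    [3, 0, 8, 0, 0, 0, 1, 2, 0, 0, 0, 1],          # bII
    [22, 3, 0, 1, 4, 1, 7, 45, 2, 8, 0, 6],        # II
    [1, 1, 0, 0, 0, 0, 0, 4, 4, 0, 0, 0],          # bIII
    [1, 0, 2, 0, 0, 7, 0, 1, 0, 7, 0, 1],          # III
    [32, 2, 10, 0, 4, 0, 3, 11, 0, 1, 1, 4],       # IV
    [7, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0],          # #IV
    [167, 0, 8, 1, 2, 4, 0, 0, 7, 6, 0, 2],        # V
    [5, 2, 8, 0, 1, 3, 0, 2, 0, 3, 2, 0],          # bVI
    [4, 2, 28, 0, 1, 4, 2, 1, 0, 0, 0, 1],         # VI
    [0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0],          # bVII
    [27, 0, 0, 0, 3, 0, 1, 1, 1, 0, 0, 0],         # VII
]


def by_consequent(antecedent: str, consequent: str) -> float:
    """Cell divided by its column total: how often the consequent is approached this way."""
    i, j = ROOTS.index(antecedent), ROOTS.index(consequent)
    return COUNTS[i][j] / sum(row[j] for row in COUNTS)


def by_antecedent(antecedent: str, consequent: str) -> float:
    """Cell divided by its row total: how often the antecedent moves this way."""
    i, j = ROOTS.index(antecedent), ROOTS.index(consequent)
    return COUNTS[i][j] / sum(COUNTS[i])
```

For example, by_consequent("V", "I") is 167/269 and by_antecedent("I", "V") is 116/256, which round to the 0.621 and 0.453 quoted in the text.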

As a final analysis, we consider the counts of different root interval motions. The left column below shows each chromatic interval (+m2 = ascending minor second, +M2 = ascending major second, etc.) along with its count. The right column groups these into diatonic intervals. (Each interval is represented by its smallest possible form; so a descending fifth is represented as an ascending fourth, +P4.)

INTERVAL COUNTS

Chromatic   Diatonic
+m2  72     +M/m2 127
+M2  55
+m3   7     +M/m3  32
+M3  25
+P4 308     +P4   308
-TT  25     TT     25
-P4 167     -P4   167
-M3  21     -M/m3  64
-m3  43
-M2  34     -M/m2  65
-m2  31
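The interval counts can be derived from the transition counts by taking the difference between consequent and antecedent roots modulo 12 and naming it by its smallest form. The sketch below assumes that this is how the figures were obtained (it reproduces them, but it is not taken from tally.pl); the label mapping and variable names are illustrative.

```python
from collections import Counter

# Chromatic root transition counts (rows = antecedent, columns = consequent),
# in the order I, bII, II, bIII, III, IV, #IV, V, bVI, VI, bVII, VII.
COUNTS = [
    [0, 7, 31, 1, 4, 45, 2, 116, 11, 17, 3, 19],
    [3, 0, 8, 0, 0, 0, 1, 2, 0, 0, 0, 1],
    [22, 3, 0, 1, 4, 1, 7, 45, 2, 8, 0, 6],
    [1, 1, 0, 0, 0, 0, 0, 4, 4, 0, 0, 0],
    [1, 0, 2, 0, 0, 7, 0, 1, 0, 7, 0, 1],
    [32, 2, 10, 0, 4, 0, 3, 11, 0, 1, 1, 4],
    [7, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0],
    [167, 0, 8, 1, 2, 4, 0, 0, 7, 6, 0, 2],
    [5, 2, 8, 0, 1, 3, 0, 2, 0, 3, 2, 0],
    [4, 2, 28, 0, 1, 4, 2, 1, 0, 0, 0, 1],
    [0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0],
    [27, 0, 0, 0, 3, 0, 1, 1, 1, 0, 0, 0],
]

# Semitones upward (mod 12) mapped to the smallest form of the interval:
# e.g. 7 semitones up (an ascending fifth) is a descending fourth, -P4.
LABELS = {1: "+m2", 2: "+M2", 3: "+m3", 4: "+M3", 5: "+P4", 6: "TT",
          7: "-P4", 8: "-M3", 9: "-m3", 10: "-M2", 11: "-m2"}

intervals = Counter()
for i, row in enumerate(COUNTS):
    for j, n in enumerate(row):
        step = (j - i) % 12
        if step:  # the diagonal (no root change) carries no interval
            intervals[LABELS[step]] += n

descending_thirds = intervals["-M3"] + intervals["-m3"]
ascending_seconds = intervals["+m2"] + intervals["+M2"]
```

Running this gives, for example, 308 for +P4 and 167 for -P4, in agreement with the table.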

Discussion

To a considerable extent, the conventional rules of harmony are supported by this data. This is perhaps most clearly seen in the table of root transition counts. The most common root motions, in order, are V-I, I-V, ii-V, and I-IV (the last two are equally common). All of these are standard, “correct” progressions of tonal harmony. “Incorrect” progressions such as V-IV are generally less common.

A few things are surprising. In particular, the frequencies of ii-I and IV-I are surprisingly high. Both of these represent “predominant-to-tonic” motions and are generally considered undesirable. IV-I progressions do occur in certain circumstances (such as plagal cadences and I-IV-I motions expanding an opening I) but their frequency here seems high. This appears to be largely due to cadential 6/4 chords; this is discussed further below.

The interval counts are also of interest. Traditional theory holds that certain intervallic root motions are preferred over others: descending fifths are most preferred (strongly favored over ascending fifths), descending thirds over ascending thirds, and ascending seconds over descending seconds. This data clearly shows all three of these preferences: descending fifths (+P4, 308) are much more common than ascending fifths (-P4, 167), descending thirds (64) are more common than ascending (32), and ascending seconds (127) are more common than descending (65). Overall, fourths are by far the most common (475); seconds (192) are much more common than thirds (96), and tritones least common of all (25).

Aggregate Statistics (with Cadential 6/4’s Reanalyzed)

A close inspection of the data revealed that the oddities noted above — the high frequency of ii-I and IV-I — were largely due to cadential 6/4 chords. Cadential 6/4’s, which are extremely common in the KP corpus (and in common-practice music generally), are analyzed in the Kostka-Payne text in a “two-level” fashion: A I6/4-V is placed inside a larger V. (This is in fact a common convention; under this convention, the cadential 6/4 is labeled as V6/4.) The encoding of the data by Pardo reflected the lower level (I6/4-V), and the data presented above reflects that as well. However, cadential 6/4’s are frequently (indeed normally) preceded by II or IV; thus it seemed likely that this largely accounted for the high frequency of II-I and IV-I motions. I thought that using the “V6/4” analysis might permit the conventional principles of tonal harmony to emerge more strongly. (This is surely one reason why many people prefer the V6/4 analysis.)

The data was therefore recoded, using the higher-level (V) analysis of cadential 6/4’s. That is, every two chord statements representing a cadential I6/4 followed by a V were replaced by a single statement representing V. The modified chord-list is kp-chord-list-2. Consider just the transition table:

 Cons   I  bII   II bIII  III   IV  #IV    V  bVI   VI bVII  VII
Ant 
   I    0    7   31    1    4   45    2   84   11   17    3   19 
 bII    2    0    8    0    0    0    1    3    0    0    0    1 
  II    5    3    0    1    4    1    7   62    2    8    0    6 
bIII    1    1    0    0    0    0    0    4    4    0    0    0 
 III    1    0    2    0    0    7    0    1    0    7    0    1 
  IV   27    2   10    0    4    0    3   16    0    1    1    4 
 #IV    3    0    0    0    0    0    0   13    0    0    0    0 
   V  166    0    8    1    2    4    0    0    7    6    0    2 
 bVI    3    2    8    0    1    3    0    4    0    3    2    0 
  VI    4    2   28    0    1    4    2    1    0    0    0    1 
bVII    0    0    0    5    0    0    0    1    0    0    0    0 
 VII   26    0    0    0    3    0    1    2    1    0    0    0 

The recoding of cadential 6/4’s has a significant effect. The count of II-I is reduced from 22 to 5; the count of IV-I is reduced from 32 to 27. The top 10 transitions are now V-I; I-V; II-V; I-IV; I-II; VI-II; IV-I; VII-I; I-VII; I-VI.
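The recoding step described above, in which each cadential I6/4 followed by V is replaced by a single V, can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: the chord-list format carries no inversion information, so the is_cadential_64 flag in each tuple is hypothetical and would have to come from the original analyses. The merged statement is assumed to span from the start of the I6/4 to the end of the V.

```python
TONIC, DOMINANT = 0, 7  # chromatic relative roots of I and V


def merge_cadential_64(segments):
    """Replace each flagged I6/4 + V pair with one V segment.

    Each segment is (start, end, chromatic_root, is_cadential_64);
    the boolean flag is a hypothetical annotation, not part of the chord-list.
    """
    merged = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg[3] and seg[2] == TONIC and nxt is not None and nxt[2] == DOMINANT:
            # One V statement covering both original segments.
            merged.append((seg[0], nxt[1], DOMINANT, False))
            i += 2
        else:
            merged.append(seg)
            i += 1
    return merged


# Made-up progression: II, cadential I6/4, V, I.
example = [
    (0.0, 2.0, 2, False),
    (2.0, 4.0, 0, True),
    (4.0, 6.0, 7, False),
    (6.0, 8.0, 0, False),
]
recoded = merge_cadential_64(example)
roots = [seg[2] for seg in recoded]
```

In this example the II-I transition of the original encoding becomes II-V after recoding, which is the effect seen in the revised table.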

Once the “V6/4” analysis of cadential 6/4’s is assumed, the conventional principles of tonal harmony appear to be very strongly confirmed. Not a very earth-shattering conclusion (which is why I decided to put this in a web page rather than trying to publish it!) but I think it’s good to know.

A number of other comments could be made about this data. For example, compare the transitional frequency of IV-II (10) to II-IV (1); IV-II is much more common, again confirming a conventional rule. But I will leave further explorations to the reader. The reader could also use tally.pl to reproduce these statistics, and to gather further statistics from the chord lists provided — for example, analyzing major and minor key sections separately. (In fact, the differences between the major and minor key distributions are fairly modest. Perhaps this should not surprise us, since the primary tonic/dominant/predominant harmonies – I, V, II, IV – are the same in both modes, and function similarly.)

Notes

1. A few sources deserve mention. Helen Budge’s (1943) dissertation, “A Study of Chord Frequencies Based on the Music of Representative Composers of the Eighteenth and Nineteenth Centuries,” presents an interesting statistical analysis of tonal harmony, systematically gathered from analyses by experts. But only data on the frequency of individual (diatonic) chords is provided; there is no data about transitions (motions from chord to chord). Allen Irvine McHose’s (1947) study “The Contrapuntal Harmonic Technique of the 18th Century” offers occasional statistics about the frequency of various chords and progressions, but presents no complete data (such as tables of chord or progression frequencies). Philip Norman’s 1945 study “A Quantitative Study of Harmonic Similarities in Certain Specified Works of Bach, Beethoven, and Wagner” has statistics about chord progressions, but he assumes a new chord on every note – that is, he makes no allowance for non-chord-tones; this goes against the modern practice of harmonic analysis. Dmitri Tymoczko’s paper “Root Motion, Function, Scale Degree” (Musurgia 2005, available in English at Tymoczko’s website) analyzes a set of progressions from major-key Bach chorales. Finally, David Huron, in his book Sweet Anticipation (2006), presents data about chord transitions for “a sample of Baroque music” (pp. 250-1; no further information is given about the sample).

2. The mftext program (available at the Melisma website) can be used to extract the chord labels from Pardo’s midifiles. While I have not analyzed the labels in detail with regard to mode and inversion, I did extract a few basic statistics. There are 949 chord labels total (this is slightly greater than my count, since in Pardo’s annotations, there may be two chords of the same root and key in succession). Chords built on major triads (including seventh chords that contain major triads, e.g. dominant sevenths) are 68.3% of the total; those built on minor triads, 21.2%; those built on diminished triads, 9.9%. Root-position chords are 60.7% of the total; first-inversion, 23.3%; second inversion, 12.9%; third inversion, 3.1%.

Downloads


(All in the zipped file kp-corpus-files.zip)

kp-corpus-info: List of excerpts in the Kostka-Payne corpus

kp-nbck: This directory contains “note-beat-chord-key” files for all excerpts in the corpus: A list of notes (“Note [ontime] [offtime] [pitch]”), beats (“Beat [time] [level]”), chords (“Chord [ontime] [offtime] [root]”) and key sections (“Key [start time] [end time] [tonic] [mode:ma=0,mi=1]”). I made these as an intermediate step towards making the “chord-lists” below. These files bring together the “beat list” and “note list” formats that I used with the Melisma system (see the Melisma website for explanation) with the harmonic and key information from the Kostka-Payne analyses.

kp-chord-list: Chord list (list of chord statements) for the KP corpus

kp-chord-list-ma: Chord list for the KP corpus, major key sections only

kp-chord-list-mi: Chord list for the KP corpus, minor key sections only

kp-chord-list-2: Chord list for the KP corpus with the “V6/4” analysis of cadential 6/4 chords

kp-chord-list-2-ma: The “V6/4” chord-list, major-key sections only

kp-chord-list-2-mi: The “V6/4” chord-list, minor-key sections only

tally.pl: a perl script for extracting aggregate data from chord lists. (The tables presented above are all outputs of tally.pl.)
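For readers who want to work with the “note-beat-chord-key” files in kp-nbck, the four record types listed above can be read with a small parser. This is a rough sketch based only on the field layouts given in the description; the function names are invented, the sample line in the comment is made up, and the units and numeric types of the fields are assumptions (the Melisma documentation should be consulted for the actual conventions).

```python
# Field names for each record type, following the layouts described above.
LAYOUTS = {
    "Note": ("ontime", "offtime", "pitch"),
    "Beat": ("time", "level"),
    "Chord": ("ontime", "offtime", "root"),
    "Key": ("start", "end", "tonic", "mode"),  # mode: major = 0, minor = 1
}


def _number(text):
    """Read a field as int when possible, otherwise as float (an assumption)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_nbck_line(line):
    """Turn one line, e.g. a hypothetical 'Key 0 5217 7 0', into a dict."""
    fields = line.split()
    kind, values = fields[0], fields[1:]
    names = LAYOUTS[kind]  # raises KeyError for an unknown record type
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields for {kind}: {line!r}")
    record = {"kind": kind}
    record.update(zip(names, (_number(v) for v in values)))
    return record
```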

List of excerpts in the Kostka-Payne Corpus

The Kostka-Payne corpus is a set of 46 excerpts from the workbook and
instructor's manual for _Tonal Harmony_ (1995, 3rd edition) by Stefan
Kostka and Dorothy Payne. The excerpts in the corpus are as follows.

Name of file		Composer, Title, Measure numbers		p.# in	p.# in 
									inst.	wkbk.
									manual

bach.annamin    	Anonymous (but often attributed to Bach),	49	79	
			minuet in G, mm. 1-16

bach.jesu    		Bach, Chorale, "Jesu, der du meine Seele"	104	163	

bach.kindlein    	Bach, Chorale, "Uns ist ein Kindlein heut' 	103	162	
			geborn"

beet.rondo    		Beethoven, Rondo Op. 51, no. 1, mm. 103-120	129	212	

beet.son10-1.II    	Beethoven, Sonata Op. 10 No. 1, II, mm. 1-8	62	92	

beet.son10-3.II    	Beethoven, Sonata Op. 10 No. 3, II, mm. 9-17	106	168	

beet.son13.II    	Beethoven, Sonata Op. 13, II, mm. 1-8		85	135	

beet.son14-1.III    	Beethoven, Sonata Op. 14 No. 1, III		101	-	

beet.son2-3.III    	Beethoven, Sonata Op. 2 No. 3, III, mm. 81-88	45	72	

beet.son27-2.I    	Beethoven, Sonata Op. 27 No. 2, I, mm. 1-9	129	209	

beet.sq135.III    	Beethoven, String Quartet Op. 135, III, 	87	138	
			mm. 1-10 

beet.strio    		Beethoven, String Trio Op. 9 No. 3, II, 	134	225	
			mm. 1-10

brahms.undgehst    	Brahms, "Und gehst du ueber den Kirchhof", 	95	152	
			Op. 44, mm. 29-37

campbell.barb    	Ayer (arr. by Campbell), "Oh! You Beautiful	150	249	
			Doll!", mm. 1-9

chop.maz63-2    	Chopin, Mazurka Op. 63, No. 2, mm. 1-16		149	248	

chop.maz67-2    	Chopin, Mazurka Op. 67, No. 2, mm. 1-16		86	137	

chop.noc27-1    	Chopin, Nocturne Op. 27 No. 1, mm. 41-52	144	238	

grieg.mountain    	Grieg, "The Mountain Maid", Op. 67 No. 2, 	150	255	
			mm. 1-11

haydn.son22.III    	Haydn, Sonata No. 22, III, mm. 1-8		120-1	-	

haydn.son30.I    	Haydn, Sonata No. 30, I, mm. 84-96		81	125	

haydn.sq20-4.I    	Haydn, String Quartet Op. 20 No. 4, I, 		76	118	
			mm. 13-24	

haydn.sq50-6.II    	Haydn, String Quartet Op. 50 No. 6, II,		76	117	
			mm. 55-63

haydn.sq74-3.II    	Haydn, String Quartet Op. 74 No. 3, II,		133	223	
			mm. 30-7

haydn.sq.76-6.II    	Haydn, String Quartet Op. 76 No. 6, II,		144	241	
			mm. 31-9

mzt.bsnconc    		Mozart, Bassoon Concerto K. 191, II, mm. 42-50	87	139-40	

mzt.ekn.II    		Mozart, "Eine Kleine Nachtmusik", K. 525, II,	61	92	
			mm. 1-8

mzt.pc488.II    	Mozart, Piano Concerto K. 488, II, mm. 1-12	131	-	

mzt.son330.II    	Mozart, Sonata K. 330, II, mm. 21-8		104	164	

mzt.son333.III    	Mozart, Sonata K. 333, III, mm. 91-8		116	-	

mzt.trio    		Mozart, Piano Trio K. 542, I, mm. 210-229	123	199	

mzt.voiche    		Mozart, Marriage of Figaro, "Voi che sapete",	105	167	
			mm. 41-52

schub.bfson.I    	Schubert, Sonata in Bb, D. 960, I, mm. 149-68	144	239-40	

schub.erlkonig.I    	Schubert, "Erlkonig", mm. 113-23		129	210	

schub.erlkonig.II    	Schubert, "Erlkonig", mm. 134-48		129	211	

schub.flusse    	Schubert, "Auf dem Flusse", mm. 14-21		112	179	

schub.imp1    		Schubert, Impromptu Op. 90 No. 1, mm. 42-55	124	201	

schub.strio    		Schubert, String Trio D. 471, mm. 187-201	138	233	

schub.tanze    		Schubert, Originaltanze Op. 9 No. 14, mm. 1-24	145	242	

schum.grenadiere    	Schumann, "Die beiden Grenadiere", mm. 23-37	133	221	

schum.sehnsucht    	Schumann, "Sehnsucht", mm. 2-11			133	219	

schum.thranen    	Schumann, "Aus meinen Thranen spriessen",	96	154	
			mm. 1-17

schum.tragodie    	Schumann, "Tragodie", mm. 1-9			134	224	

schum.wennich    	Schumann, "Wenn ich in deine Augen seh'", 	105	165	
			mm. 1-21

tchaik.morning    	Tchaikovsky, "Morning Prayer", mm. 1-17		95	-	

tchaik.nurse    	Tchaikovsky, "The Nurse's Tale", mm. 5-15	138	232	

tchaik.symph6    	Tchaikovsky, Symphony No. 6, I, mm. 89-97	150	251-3