Sunday, December 27, 2009

Groovy XML parser for iTunes Music Library

I work in the Video On Demand field, so I am very interested in the distribution of content selection: which content gets played, and how often.

A few years ago the standard thinking was that most people watched a small set of popular content. This was known as the 80/20 rule: 80% of people were watching 20% of the content. Then the notion of the Long Tail came along, which said that although some content would be popular, the set of content that got at least some plays (i.e. the long tail of the graph) was quite large.

Being a bit of a geek, I decided to write a Groovy-based program to analyze the data in my iTunes library (even though this week I switched to the Motorola Droid).

iTunes keeps all of its information about your music, including the play counts, in a file called library.xml. This file can get quite large, so I decided to go with a SAX XML parser approach to minimize memory consumption. The program reads the library.xml file, extracts the artist, song title and play count, and emits a semicolon-delimited CSV file that can later be read by Excel.

The program is shown below:

import javax.xml.parsers.SAXParserFactory
import org.xml.sax.*
import org.xml.sax.helpers.DefaultHandler

// Watches for the Name, Artist and Play Count keys and captures the value that follows each one.
class MyHandler extends DefaultHandler {
    String tempVal = ''
    boolean expectingArtist = false
    boolean expectingSong = false
    boolean expectingPlayCount = false
    String artist
    String song
    Integer playCount
    def outFile = new File('\\musicPlays')

    void startElement(String namespace, String localName, String qName, Attributes attributes) {
        // Start every element with an empty text buffer
        tempVal = ''
    }

    void endElement(String namespace, String localName, String qName) {
        if (expectingSong) {
            song = tempVal
            expectingSong = false
        }

        if (expectingArtist) {
            artist = tempVal
            expectingArtist = false
        }

        if (expectingPlayCount) {
            playCount = Integer.parseInt(tempVal.trim())
            expectingPlayCount = false
            // This relies on Play Count appearing after Name and Artist within a track
            String thisLine = artist + '; ' + song + '; ' + playCount + "\n"
            print(thisLine)
            outFile.append(thisLine)
        }

        // Only <key> elements name a field, so a song titled "Artist" is not mistaken for a key
        if (qName == 'key') {
            if (tempVal.equalsIgnoreCase('Artist')) {
                expectingArtist = true
            }
            if (tempVal.equalsIgnoreCase('Name')) {
                expectingSong = true
            }
            if (tempVal.equalsIgnoreCase('Play Count')) {
                expectingPlayCount = true
            }
        }
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // The parser may deliver one piece of text in several chunks, so append rather than overwrite
        tempVal += new String(ch, start, length)
    }
}

def handler = new MyHandler()
def reader = SAXParserFactory.newInstance().newSAXParser().xMLReader
reader.contentHandler = handler

// withInputStream closes the file even if parsing fails
new File('\\library.xml').withInputStream { inputStream ->
    reader.parse(new InputSource(inputStream))
}
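
The handler depends on the library file storing each track as alternating key and value elements. Here is a minimal sketch of that idea run over a tiny inline sample, so it can be tried without a real library file. The sample track is made up, and the TrackSketchHandler class and its field names are hypothetical, chosen only for this illustration.

```groovy
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Made-up sample in the key/value layout that the handler expects.
def sample = '''<plist version="1.0">
<dict>
  <key>Tracks</key>
  <dict>
    <key>1</key>
    <dict>
      <key>Name</key><string>Example Song</string>
      <key>Artist</key><string>Example Artist</string>
      <key>Play Count</key><integer>7</integer>
    </dict>
  </dict>
</dict>
</plist>'''

// Hypothetical handler: remembers the last <key> and stores the value that follows it.
class TrackSketchHandler extends DefaultHandler {
    StringBuilder text = new StringBuilder()
    String lastKey
    Map current = [:]
    List rows = []

    void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0)                      // new element, new text buffer
    }

    void characters(char[] ch, int start, int length) {
        text.append(ch, start, length)         // text may arrive in several chunks
    }

    void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim()
        if (qName == 'key') {
            lastKey = value
        } else if (qName in ['string', 'integer'] && lastKey in ['Name', 'Artist', 'Play Count']) {
            current[lastKey] = value
            if (lastKey == 'Play Count') {
                // Assumes Play Count comes after Name and Artist within a track
                rows << [artist: current['Artist'], song: current['Name'], plays: value as Integer]
                current = [:]
            }
            lastKey = null
        }
    }
}

def handler = new TrackSketchHandler()
def input = new ByteArrayInputStream(sample.getBytes('UTF-8'))
SAXParserFactory.newInstance().newSAXParser().parse(input, handler)
handler.rows.each { println "${it.artist}; ${it.song}; ${it.plays}" }
```

Collecting rows in a list instead of appending to a file keeps the sketch easy to check. For a large library, writing each row out as it is found, as the program above does, keeps memory use flat.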


This program let me analyze my listening preferences. It turns out I have about 8000 songs in my library, of which I've listened to about 3200 at least once.

I've listened to about 260 songs at least ten times and about 1000 songs at least five times.
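
Counts like these can also be computed in Groovy rather than Excel. This is a rough sketch, assuming lines in the "artist; song; playCount" layout that the program writes. The sample lines are made up, and the songsWithAtLeast helper is a hypothetical name. The last few lines also estimate how much of the total listening goes to the most-played 20% of songs, which is one way to look at the 80/20 question.

```groovy
// Made-up sample lines in the 'artist; song; playCount' layout.
def lines = [
    'Artist A; Song 1; 12',
    'Artist A; Song 2; 5',
    'Artist B; Song 3; 1',
    'Artist C; Song 4; 10',
    'Artist C; Song 5; 2'
]

// Read the play count from the last field so a ';' inside a title does not break parsing.
def playCounts = lines.collect { it.substring(it.lastIndexOf(';') + 1).trim() as Integer }

// Hypothetical helper: number of songs played at least `threshold` times.
def songsWithAtLeast = { int threshold -> playCounts.count { it >= threshold } }

println "at least 1 play:   ${songsWithAtLeast(1)}"
println "at least 5 plays:  ${songsWithAtLeast(5)}"
println "at least 10 plays: ${songsWithAtLeast(10)}"

// Share of all plays that goes to the most-played 20% of songs.
def sorted = playCounts.sort(false).reverse()          // sort(false) leaves playCounts untouched
int topN = Math.max(1, (int) Math.ceil(sorted.size() * 0.2d))
int topPlays = sorted.take(topN).sum()
int totalPlays = playCounts.sum()
long topSharePercent = Math.round(100.0d * topPlays / totalPlays)
println "top ${topN} song(s) account for ${topSharePercent}% of plays"
```

To run this against real data, replace the sample list with new File('\\musicPlays').readLines(), using whatever path the output file was actually written to.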
