Just the other day I was using Rhythmbox to play a few songs I had downloaded. I don’t remember what it was that struck me about the album cover image that was automatically loaded in the bottom left corner of my screen, but it made me wonder how it had got there. After all, not all of my songs had album cover images showing up. So what was special about this particular song (oh btw, the song was Amplifier by Imran Khan)? I also happened to notice that the recently downloaded music files were a little larger than the songs from yesteryear.
I knew about album art being put inside the same folder as the audio files (a lame image titled folder.jpg), but this was different. There was no such image present in the album directory, so I guessed that it must be part of the file header. A few Google searches later, I had confirmed my hypothesis and found myself going through the entire spec for the latest version of ID3 tags at http://www.id3.org/Developer_Information
ID3 tags have been around for slightly over a decade, with the first version (a simplistic “TAG” block appended to the end of the file) released in 1996. Over the years, they have evolved along with the audio they describe. The current ID3 tags (ID3v2.4) are capable of storing more information about a song than perhaps musicians and producers can even provide. What’s more interesting is that these tags allow for variable-length fields, so your information is not confined (unlike the earlier versions, where names and other information had to be squeezed into 30 bytes).
A review of the ID3 tag specification will tell you that all the metadata about a song is organized as “frames”. Each frame represents a unique field. For instance, you have one frame (TPE1) telling you the name of the performing artist and another frame (TALB) bearing the name of the album. The album art may be found under yet another frame (titled APIC; yes, the entire JPEG image is actually stored against this frame in the file). All frame IDs are four bytes long (TPE1, APIC, TALB, TDRC, etc.). You can find a more comprehensive list of frames at http://www.id3.org/id3v2.4.0-frames
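To make the APIC frame a little more concrete, here is a minimal sketch of how its payload could be split into a MIME type, a picture type and the raw image bytes. It assumes the payload layout given in the ID3v2.3/2.4 frame list (one text-encoding byte, a null-terminated MIME type, one picture-type byte, a null-terminated description, then the image data). The function name `split_apic` is my own invention for illustration.

```python
def split_apic(payload: bytes):
    """Split an APIC frame payload into (mime_type, picture_type, image_bytes).

    Hypothetical helper; expects the frame payload only, without the
    10-byte frame header.
    """
    encoding = payload[0]                       # 0 = ISO-8859-1, 1/2 = UTF-16, 3 = UTF-8
    mime_end = payload.index(b"\x00", 1)        # MIME type is always null-terminated Latin-1
    mime_type = payload[1:mime_end].decode("latin-1")
    picture_type = payload[mime_end + 1]        # e.g. 3 means "front cover"
    desc_start = mime_end + 2

    if encoding in (1, 2):
        # UTF-16 descriptions end with two zero bytes on a 2-byte boundary.
        desc_end = desc_start
        while desc_end < len(payload) and payload[desc_end:desc_end + 2] != b"\x00\x00":
            desc_end += 2
        image = payload[desc_end + 2:]
    else:
        desc_end = payload.index(b"\x00", desc_start)
        image = payload[desc_end + 1:]
    return mime_type, picture_type, image


# A made-up payload: Latin-1 text, JPEG MIME type, front cover, description "cover".
sample = b"\x00" + b"image/jpeg\x00" + b"\x03" + b"cover\x00" + b"\xff\xd8\xff\xe0fake"
mime_type, picture_type, image = split_apic(sample)
```

The image bytes that come back could be written straight to a file with the extension suggested by the MIME type.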
From a programmer’s perspective, what one needs to know is:
- how much of the file must be parsed? It makes little sense to go from the first byte to the last when the info you need is only present at the beginning of the file
- how do we know how much data to read after each frame ID (the data is variable length, right?)
Firstly, the size of the entire ID3 tag can be read from the size field in the ID3 header. The header is the first ten bytes of the file: the three characters “ID3”, two version bytes, one flags byte and four size bytes. The size is stored as raw bytes rather than as text (a hex dump merely displays them in hexadecimal), so if you read one byte at a time you have to combine the four size bytes to construct the value. Note that the spec makes this field “synchsafe”: only the lower seven bits of each byte are used, so the bytes are combined seven bits at a time, and the result does not include the ten header bytes themselves. I had to write a small routine that would convert a hex string to an integer so that I could compute the value of the size field. Getting hold of this value solves the first issue.
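As a rough illustration of reading the header, here is a small sketch that checks for the “ID3” marker and combines the four size bytes. It is not the routine I originally wrote; the names `synchsafe_to_int` and `read_id3_header` are made up for this example, and it assumes the ten-byte header layout from the ID3v2 spec.

```python
def synchsafe_to_int(data: bytes) -> int:
    """Combine bytes that each carry 7 useful bits into one integer."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def read_id3_header(raw: bytes):
    """Return (major, revision, flags, tag_size), or None if there is no ID3v2 tag.

    `raw` is assumed to hold at least the first ten bytes of the file.
    """
    if len(raw) < 10 or raw[:3] != b"ID3":
        return None
    major, revision, flags = raw[3], raw[4], raw[5]
    tag_size = synchsafe_to_int(raw[6:10])      # excludes the 10 header bytes
    return major, revision, flags, tag_size


# A hand-made header: version 2.4.0, no flags, size bytes 00 00 02 01.
header = b"ID3" + bytes([4, 0, 0]) + bytes([0, 0, 2, 1])
info = read_id3_header(header)                  # size is (2 << 7) | 1 = 257
```

With the tag size in hand, a parser only needs to read that many bytes after the header and can ignore the audio data that follows.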
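Each frame ID is followed by a size field as well (four bytes, followed in turn by two flag bytes). This value gives the length of the data belonging to that frame, so we simply use it to solve the second issue. In ID3v2.4 this size is synchsafe too, whereas ID3v2.3 stores it as a plain 32-bit big-endian integer.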
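Here is a minimal sketch of walking the frames one after another using the per-frame size field. It assumes the ID3v2.3/2.4 frame header layout (4-byte ID, 4-byte size, 2 flag bytes) and that the tag is neither compressed nor unsynchronised. `iter_frames` is a hypothetical name, and the album name in the sample data is a placeholder.

```python
def iter_frames(tag_body: bytes, major_version: int = 4):
    """Yield (frame_id, payload) pairs from the bytes that follow the ID3 header."""
    pos = 0
    while pos + 10 <= len(tag_body):
        if tag_body[pos] == 0:                  # zero bytes mean padding: no more frames
            break
        frame_id = tag_body[pos:pos + 4].decode("latin-1")
        size_bytes = tag_body[pos + 4:pos + 8]
        if major_version >= 4:
            size = 0
            for byte in size_bytes:             # synchsafe: 7 useful bits per byte
                size = (size << 7) | (byte & 0x7F)
        else:
            size = int.from_bytes(size_bytes, "big")
        start = pos + 10                        # skip ID, size and the 2 flag bytes
        yield frame_id, tag_body[start:start + size]
        pos = start + size                      # the next frame begins right after


# Two hand-made text frames followed by padding.
body = (
    b"TPE1" + (11).to_bytes(4, "big") + b"\x00\x00" + b"\x00Imran Khan"
    + b"TALB" + (11).to_bytes(4, "big") + b"\x00\x00" + b"\x00Some Album"
    + b"\x00" * 20
)
frames = list(iter_frames(body))
```

The leading `\x00` in each payload is the text-encoding byte that text frames carry before the actual string.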
Another point to note is that not all the frames listed in the specification will be present in a given file. Most mp3 files will contain basic info about the performers and producers, the album and, if you’re lucky, a JPEG for the album art.
I created a class called TagParser, which basically creates a sliding window (4 bytes long) and rolls all the bytes of the tag through this window. Whenever the window holds a valid frame ID, it looks for the size field of that frame and proceeds to extract the metadata for that frame based on the value of the size field.
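The sliding-window idea can be sketched roughly as follows. This is a simplified reconstruction rather than the actual TagParser code: the set `KNOWN_FRAMES` is a small hypothetical subset of frame IDs, and the size field is assumed to be a plain big-endian integer as in ID3v2.3.

```python
# Hypothetical subset of frame IDs the scanner should recognise.
KNOWN_FRAMES = {b"TIT2", b"TPE1", b"TALB", b"APIC"}


def scan_with_window(tag_body: bytes, known=KNOWN_FRAMES):
    """Slide a 4-byte window over the tag and collect every known frame."""
    found = {}
    pos = 0
    while pos + 10 <= len(tag_body):
        window = tag_body[pos:pos + 4]
        if window in known:
            # Assumption: ID3v2.3-style size (plain 32-bit big-endian).
            size = int.from_bytes(tag_body[pos + 4:pos + 8], "big")
            start = pos + 10                    # skip ID, size and flag bytes
            found[window.decode("ascii")] = tag_body[start:start + size]
            pos = start + size                  # jump past the payload
        else:
            pos += 1                            # slide the window by one byte
    return found


# An unrecognised frame ("XXXX") followed by a recognised one.
data = (
    b"XXXX" + (3).to_bytes(4, "big") + b"\x00\x00" + b"abc"
    + b"TPE1" + (11).to_bytes(4, "big") + b"\x00\x00" + b"\x00Imran Khan"
)
result = scan_with_window(data)
```

Sliding byte by byte means unknown frames are simply skipped over, at the cost of a possible false match if some payload happens to contain bytes that look like a frame ID. Jumping past each extracted payload reduces that risk.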
The extracted data is stored against the corresponding frame id in a dictionary. The parser scans the entire ID3 Tag in a file and extracts any frame that can be found. (I made an app using this module called "Appellation". Go to my project page and download the app.)
An object of this class exposes a method called ‘parser’, which accepts a path to a file as an argument and returns a dictionary containing the frame IDs as keys and a tuple of (description, metadata) as the value against each key.
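To show the shape of that interface, here is a compact sketch of a class with a `parser` method that returns such a dictionary. It is an approximation under stated assumptions, not the code that ships with the app: the `DESCRIPTIONS` table is a tiny made-up subset, text frames are decoded using their leading encoding byte, and flags such as compression or unsynchronisation are ignored.

```python
class TagParser:
    # Hypothetical subset of frame IDs and human-readable descriptions.
    DESCRIPTIONS = {
        "TIT2": "Title",
        "TPE1": "Lead performer",
        "TALB": "Album",
        "APIC": "Attached picture",
    }
    # Text-encoding byte at the start of every text frame.
    ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}

    def _frame_size(self, size_bytes, major):
        if major >= 4:                          # ID3v2.4: synchsafe size
            size = 0
            for byte in size_bytes:
                size = (size << 7) | (byte & 0x7F)
            return size
        return int.from_bytes(size_bytes, "big")

    def _decode(self, frame_id, data):
        if frame_id.startswith("T") and data:   # text frames become str
            codec = self.ENCODINGS.get(data[0], "latin-1")
            return data[1:].decode(codec, "replace").rstrip("\x00")
        return data                             # everything else stays raw bytes

    def parser(self, path):
        """Return {frame_id: (description, metadata)} for the file at `path`."""
        with open(path, "rb") as fh:
            header = fh.read(10)
            if len(header) < 10 or header[:3] != b"ID3":
                return {}
            major = header[3]
            tag_size = 0
            for byte in header[6:10]:           # tag size is always synchsafe
                tag_size = (tag_size << 7) | (byte & 0x7F)
            body = fh.read(tag_size)            # never touch the audio data

        result = {}
        pos = 0
        while pos + 10 <= len(body):
            frame_id = body[pos:pos + 4].decode("latin-1")
            if frame_id in self.DESCRIPTIONS:
                size = self._frame_size(body[pos + 4:pos + 8], major)
                data = body[pos + 10:pos + 10 + size]
                result[frame_id] = (self.DESCRIPTIONS[frame_id],
                                    self._decode(frame_id, data))
                pos += 10 + size
            else:
                pos += 1                        # slide the 4-byte window
        return result
```

Calling `TagParser().parser("song.mp3")` (the file name is only an example) would then give back something like `{"TPE1": ("Lead performer", "...")}` for each recognised frame found in the tag.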

