How GJay Works

Most of GJay's code simply describes the UI and basic data structures. The interesting algorithms are relatively small and self-contained. The frequency and BPM algorithms are derived from programs written by Daniel Franklin and Van Belle Werner, respectively, and I have only a cursory idea of how they work.

GJay is written in C and uses the Gtk+ GUI library. It is released under the GPL.

Background Processing

GJay analyzes new songs in a background process. The analysis process (daemon) forks to convert ogg and mp3 files to raw audio data; we don't decompress to disk, of course. Analysis processes are nice'd so they affect your system as little as possible.

Although mpg123 is the standard command line mp3 tool, I choose the clone mpg321 because it's more stable and offers some useful command line options that are lacking in mpg123.

If the UI application is running, the analysis process periodically sends its status so the user can see what files is being analyzed and how far along it is.

Note that frequency and BPM analysis are CPU intensive; decompressing mp3 and ogg files also entails a fair bit of CPU, disk access and memory bandwidth. The upshot is that analysis is slow, on the order of taking about 30 seconds for a 4 minute MP3 on a 1Ghz Athlon system. It took about 3 days to analyze my entire 30Gb music library.

Frequency

The frequency algorithm performs a FFT on each song frame, giving a frequency histogram. I sum all these histograms to create a histogram for the entire song, then reduce it to 30 bins. These bins follow a log frequency scale, i.e. the first bin covers 0-100hz, the last bin covers 18-22.5khz. The data is normalized because songs may have been recorded at different volumes. These 30 bins given the song's frequency "fingerprint."

We also sum across each frame and all frames to determine both the average volume and maximum volume. We store the percent difference betwee the two along with each song's data. This value is used to nudge frequency comparisons towards songs which have the same frequency fingerprint but have different volume characteristics. Empirically, "higher-energy" songs tend to vary less in volume during the course of the song than "lower-energy" songs. This is because most "lower-energy" songs maintain a constant volume except for some sort of climax about 2/3 of the way through the song.

BPM

I have very little idea how this algorithm works. It phase shifts a sample 4 beats over itself at varying BPMs, measuring the difference between the shift and the original, to determine at which BPM there is the best fit. It works pretty well at finding some beat, but this is often not the information you're looking for. The example I keep coming back to is a Gillian Welch (bluegrass) song which I mentally categorize as "slow," but clocked 159BPM because that's the strum of the guitar.

My sense is that BPM works well to match songs within a genre, but isn't a good way to match soungs outside of a genre. Users have suggested separately performing a BPM analysis at the start and end of a song, to better mesh with the songs around it. While this hasn't happened yet, I'm on it. Oh, yeah.

Playlist Algorithm

The playlist-generation algorithm is appallingly simple and inefficient, but does the job quite nicely.

  • Copy the list of songs to a working list
  • Remove songs below the rating cut-off, if it's applied
  • Pick the starting song. This could be random, assigned by the user, or based on a color. If we're starting from a color, sort the list by color (which is based primarily on hue and adjusted by brightness), find the list position that's closest to the color, then randomly pick a song within the user-specified variance range
  • Remove the starting song from the working list
  • (*) Pick a random subset of the working list. The random subset's size varies by the user's randomness paramater; the more random the playlist, the fewer songs in this subset.
  • Sort this subset with a ranking algorithm. The most appropriate song to go next in the list will be first in the list. Each factor is weighted by its user-defined importance. If a factor is marked as maintain, we calculate the difference between a given song and the first song in the playlist; otherwise, the most recently added song to the playlist is used for comparison.
  • Repeat (*) until the playlist is done

The user will need to tweak the playlist to generate "good" playlists. For example, if you have lots of songs by a particular artist and give them precisely the same hue and brightness, a search with low variance and strong emphasis on hue and brightness will get stuck in the same artist.

Useful Tidbits for Programmers

  • GLib is your friend. It's a library that provides all that typical foundation stuff (data structures, event handling, etc.) that people are constantly re-writing. I use the generic linked list (GList) everywhere.

  • If you're all set to fork() and exec() to get the stdout of an application (e.g. to decode an mp3 to RAW), consider popen(), a bit of POSIX that I didn't know about until recently but have found really useful.

  • Songs by the group Stereolab (bloopy indie, very French) offer excellent boundary test cases for song categorization, both from an automatic analysis and user categorization perspective.