How GJay Works
Most of GJay's code simply describes the UI and basic data structures. The interesting algorithms are relatively small and self-contained. The frequency and BPM algorithms are derived from programs written by Daniel Franklin and Werner Van Belle, respectively, and I have only a cursory idea of how they work.
GJay is written in C and uses the Gtk+ GUI library. It is released under the GPL.
GJay analyzes new songs in a background process. The analysis process (daemon) forks to convert ogg and mp3 files to raw audio data; we don't decompress to disk, of course. Analysis processes are nice'd so they affect your system as little as possible.
Although mpg123 is the standard command-line mp3 tool, I chose the clone mpg321 because it's more stable and offers some useful command-line options that mpg123 lacks.
If the UI application is running, the analysis process periodically sends its status so the user can see which file is being analyzed and how far along the analysis is.
Note that frequency and BPM analysis are CPU intensive; decompressing mp3 and ogg files also entails a fair bit of CPU, disk access, and memory bandwidth. The upshot is that analysis is slow: about 30 seconds for a 4-minute MP3 on a 1GHz Athlon system. It took about 3 days to analyze my entire 30GB music library.
The frequency algorithm performs an FFT on each song frame, giving a frequency histogram. I sum all these histograms to create a histogram for the entire song, then reduce it to 30 bins. These bins follow a log frequency scale, i.e. the first bin covers 0-100Hz and the last bin covers 18-22.5kHz. The data is normalized because songs may have been recorded at different volumes. These 30 bins give the song's frequency "fingerprint."
We also sum the signal within each frame, and across all frames, to determine both the average volume and the maximum volume. We store the percent difference between the two along with each song's data. This value nudges frequency comparisons so that songs with the same frequency fingerprint but different volume characteristics compare as less similar. Empirically, "higher-energy" songs tend to vary less in volume over the course of the song than "lower-energy" songs. This is because most "lower-energy" songs maintain a constant volume except for some sort of climax about 2/3 of the way through the song.
I have very little idea how this algorithm works. It phase shifts a sample 4 beats over itself at varying BPMs, measuring the difference between the shift and the original, to determine at which BPM there is the best fit. It works pretty well at finding some beat, but this is often not the information you're looking for. The example I keep coming back to is a Gillian Welch (bluegrass) song which I mentally categorize as "slow," but which clocked in at 159 BPM because that's the strum of the guitar.
My sense is that BPM works well to match songs within a genre, but isn't a good way to match songs outside of a genre. Users have suggested separately performing a BPM analysis at the start and end of a song, to better mesh with the songs around it. While this hasn't happened yet, I'm on it. Oh, yeah.
The playlist-generation algorithm is appallingly simple and inefficient, but does the job quite nicely.
The user will need to tweak the playlist settings to generate "good" playlists. For example, if you have lots of songs by a particular artist and give them precisely the same hue and brightness, a search with low variance and a strong emphasis on hue and brightness will get stuck on that artist.
Useful Tidbits for Programmers