My first thought here is to use MediaLab's WebTracker item type. You could create a simple HTML list with links to pages that play the various sound files. Either a Back button could return them to the index after the file plays, or an automatic redirect could return them after n seconds.

The WebTracker item type will track every click and the time at which it occurs. All of this takes just a single MediaLab item and a few simple HTML pages. The timing data would be quite adequate for the kind of task you're talking about.
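
If it helps, here is a rough sketch of how the index and the sound pages could be generated rather than typed by hand. Everything in it is an assumption for illustration: the function name build_pages, the file names (index.html, sound1.html, the .wav files), and the 10-second return delay. It uses a standard meta refresh for the automatic return and an audio tag for playback; I have not checked whether the browser embedded in WebTracker supports the audio tag, so you may need to swap in whatever embed method works on your machines.

```python
from html import escape
from pathlib import Path


def build_pages(sounds, out_dir, return_after_s=10):
    """Write index.html plus one player page per sound; return the index path.

    sounds is a list of (link label, sound file name) pairs. All names here
    are hypothetical placeholders, not anything MediaLab requires.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for i, (label, sound_file) in enumerate(sounds, start=1):
        page = f"sound{i}.html"
        links.append(f'<li><a href="{page}">{escape(label)}</a></li>')
        (out / page).write_text(
            "<html><head>\n"
            # Automatic return to the index after return_after_s seconds.
            f'<meta http-equiv="refresh" content="{return_after_s};url=index.html">\n'
            "</head><body>\n"
            # Assumption: the embedded browser can play audio via this tag.
            f'<audio src="{escape(sound_file)}" autoplay></audio>\n'
            # Manual alternative to the automatic return.
            '<p><a href="index.html">Back</a></p>\n'
            "</body></html>\n",
            encoding="utf-8",
        )
    index = out / "index.html"
    index.write_text(
        "<html><body>\n<ul>\n" + "\n".join(links) + "\n</ul>\n</body></html>\n",
        encoding="utf-8",
    )
    return index


if __name__ == "__main__":
    # Hypothetical labels and sound files; replace with your own.
    build_pages([("Sound A", "a.wav"), ("Sound B", "b.wav")], "webtracker_pages")
```

Each sound page carries both the Back link and the timed redirect, so you can delete whichever one you don't want. You would then point the WebTracker item at the generated index.html.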

The timing for thought listings is also just fine in ML. It's really an issue of timing error relative to DV variability. If your RTs are averaging 10 seconds +/- 5 seconds, then the effect of 50-100ms of timing error will be negligible. However, if they average 700ms +/- 200ms, then all of a sudden 100ms becomes very important and DirectRT becomes the easy choice.
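
To put rough numbers on that comparison, here is a small back-of-the-envelope calculation. It rests on two assumptions of mine: the "+/-" figures are read as standard deviations, and the timing error is treated as independent noise with an SD of 100ms (the worst case mentioned above), so the two sources of variability combine in quadrature. The function names are made up for this sketch.

```python
import math


def error_to_sd_ratio(timing_error_ms, rt_sd_ms):
    """Timing error expressed as a fraction of the RT standard deviation."""
    return timing_error_ms / rt_sd_ms


def sd_inflation(timing_error_ms, rt_sd_ms):
    """Factor by which the observed SD grows if the timing error acts as
    independent noise added to the true RTs (assumption, see above)."""
    return math.hypot(rt_sd_ms, timing_error_ms) / rt_sd_ms


for label, sd in [("10 s +/- 5 s", 5000), ("700 ms +/- 200 ms", 200)]:
    ratio = error_to_sd_ratio(100, sd)
    inflation = sd_inflation(100, sd)
    print(f"{label}: error/SD = {ratio:.2f}, SD inflated by {100 * (inflation - 1):.2f}%")
```

Under those assumptions the 100ms error is 2% of the SD in the slow task and barely changes the observed variability, while in the fast task it is half of the SD and inflates the observed variability by roughly 12%.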