OK, since you say "SOA of 50, 11, 245 or 542 ms following onset of the letter," it sounds like the SOA is measured from the onset of the letter and is therefore independent of the response to the letter, yes?

OK, if so, you can achieve that by creating one sound file for each of your SOAs, each starting with the corresponding number of milliseconds of silence. For example, a 245 ms SOA is achieved by inserting 245 ms of silence at the start of the sound file. This is easy to do in a free sound editor such as Audacity (let me know if you have any trouble with it).
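If you'd rather script this than do it by hand in Audacity, here is a minimal sketch using Python's standard `wave` module. It assumes your tone is an uncompressed PCM .wav file; the file names (`tone.wav`, `tone_soa245.wav`, etc.) and the `prepend_silence` helper are hypothetical, so adjust them to your own stimuli.

```python
import wave


def prepend_silence(src_path: str, dst_path: str, silence_ms: int) -> None:
    """Write a copy of a PCM WAV file with silence_ms of silence at the start."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(params.nframes)

    # Number of silent frames needed, rounded to the nearest whole frame.
    n_silent = round(params.framerate * silence_ms / 1000)
    frame_size = params.sampwidth * params.nchannels
    # 16/24/32-bit PCM is signed (0 = silence); 8-bit PCM is unsigned (128 = silence).
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silent_byte * (n_silent * frame_size)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # frame count in the header is updated on close
        dst.writeframes(silence + audio)


if __name__ == "__main__":
    # One padded copy of the (hypothetical) tone file per SOA.
    for soa in (50, 11, 245, 542):
        prepend_silence("tone.wav", f"tone_soa{soa}.wav", soa)
```

Note that the padding can only be as precise as one audio frame (about 0.02 ms at 44.1 kHz), which is far finer than any SOA you'd care about.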

Then... you play the sound file before the letter (yes, before!) with a time value of 0 ms. This causes the letter to be presented in sync with the start of the sound file. The reason is that, for sound files, the time value defines how long to wait before the next stimulus is presented; if the sound has not finished playing, it keeps playing while the trial continues. So the letter appears as soon as the sound file starts, but the tone only becomes audible after the leading silence, and that is what gives you the various SOAs.
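To make the timing logic concrete, here is a toy model of the behaviour described above. It is only an illustration under the assumptions stated in this post (a stimulus's time value is the wait before the next stimulus, and sounds keep playing in the background); it is not how the software actually works internally, and the `Stim` class and `audible_onsets` function are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Stim:
    name: str
    time_value_ms: int           # wait before presenting the next stimulus
    leading_silence_ms: int = 0  # only meaningful for padded sound files


def audible_onsets(trial: list[Stim]) -> dict[str, int]:
    """Return when each stimulus becomes perceptible, in ms from trial start."""
    t = 0
    onsets = {}
    for stim in trial:
        # A sound file starts playing at t but is silent for its padding.
        onsets[stim.name] = t + stim.leading_silence_ms
        # Sounds play on in the background; only the time value delays the next item.
        t += stim.time_value_ms
    return onsets


if __name__ == "__main__":
    # Sound file (245 ms of padding) listed first with a time value of 0 ms,
    # then the letter (its time value doesn't matter for the onsets here).
    trial = [Stim("tone", 0, 245), Stim("letter", 1000)]
    onsets = audible_onsets(trial)
    print(onsets, "SOA =", onsets["tone"] - onsets["letter"])
```

With these inputs the letter is at 0 ms and the tone becomes audible at 245 ms, so the SOA comes entirely from the padding in the sound file.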

You can then list a blank stimulus after the letter to capture a second RT. If the response codes for both the letter task and the sound task are listed for each stimulus, you should catch both responses regardless of the order in which they occur.
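Because either response can come first, it helps to sort them by key rather than by position when you analyse the data. A minimal sketch, assuming each trial gives you two (key, RT) pairs; the key names below are hypothetical placeholders for whatever codes you actually assign to each task.

```python
LETTER_KEYS = {"f", "j"}  # hypothetical response keys for the letter task
TONE_KEYS = {"space"}     # hypothetical response key for the tone task


def classify_responses(responses):
    """Map each captured (key, rt_ms) pair to the task it belongs to."""
    result = {}
    for key, rt in responses:
        if key in LETTER_KEYS:
            result["letter"] = (key, rt)
        elif key in TONE_KEYS:
            result["tone"] = (key, rt)
        # Keys belonging to neither task are ignored.
    return result


if __name__ == "__main__":
    # Here the tone response happened to come first.
    print(classify_responses([("space", 400), ("f", 250)]))
```

Keep in mind the RTs in each pair are still as recorded, so the second one is relative to the first response, not to the letter.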

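Just remember that the second RT will be the number of milliseconds elapsed since the first response, not since letter onset. So you will need to add the first and second RTs together when you process the data to get the time of the second response relative to the letter.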
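Here's that arithmetic as a small sketch (function names are hypothetical). The second helper assumes you also want the tone RT measured from tone onset, which, given the setup above, is just the time from letter onset minus the SOA.

```python
def rts_from_letter_onset(rt1_ms: int, rt2_ms: int) -> tuple[int, int]:
    """Convert (first RT, second RT measured from the first response)
    into times measured from letter onset."""
    return rt1_ms, rt1_ms + rt2_ms


def tone_locked_rt(rt_from_letter_ms: int, soa_ms: int) -> int:
    """Re-express a time measured from letter onset relative to tone onset."""
    return rt_from_letter_ms - soa_ms


if __name__ == "__main__":
    first, second = rts_from_letter_onset(520, 180)
    print(first, second)                # 520 700
    print(tone_locked_rt(second, 245))  # 455
```

So with a first RT of 520 ms and a second RT of 180 ms, the second response came 700 ms after the letter, or 455 ms after the tone on a 245 ms SOA trial.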

Hope that made sense, though I suspect it may have been a bit confusing. Let me know if you would like me to elaborate on or clarify any of it.