Would it suffice to have the three steps you mention occur within the same item? I ask because there is no way to eliminate the temporary blank between items (the only exception is the repeated components of thought-listing and recall items, which are actually treated as a single item). The three steps you described could be combined into one item. A question onset parameter could delay the appearance of the question by 3 seconds (i.e., until after the image appears), and a fixed delay could be inserted at the start of the sound file(s). However, I'm not sure whether you were simply using that sequence of events to illustrate your point. Hope that makes sense!