I see you've already been through this thread on how to yoke pairs of stimuli:

http://www.empirisoft.com/support/showthread.php?t=549

Unfortunately, DirectRT does not have a function to determine "correct" on the fly. Correctness would have to be calculated afterward, via post-test analysis of the recorded responses.
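
For what it's worth, here is a rough sketch of what that post-test scoring could look like in Python. Everything specific in it is an assumption on my part: the column names (trial, stim, response, rt), the stimulus names, and the stimulus-to-key mapping are made up for illustration and are not DirectRT's actual output format, so you would need to adapt them to whatever your data file really contains.

```python
import csv
import io

# Hypothetical mapping from stimulus name to the key that counts as correct.
# These names are placeholders, not anything DirectRT defines.
EXPECTED_KEY = {"pair1_a": "f", "pair1_b": "j"}


def score_trials(csv_text, expected=EXPECTED_KEY):
    """Add a 'correct' field to each trial row.

    correct is 1 if the response matches the expected key, 0 if it does not,
    and None if the stimulus is not in the mapping.
    Assumes columns named 'stim' and 'response' (an assumption, see above).
    """
    scored = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        want = expected.get(row["stim"])
        if want is None:
            row["correct"] = None
        else:
            row["correct"] = int(row["response"].strip().lower() == want)
        scored.append(row)
    return scored


# Made-up sample data in the assumed layout.
SAMPLE = (
    "trial,stim,response,rt\n"
    "1,pair1_a,f,512\n"
    "2,pair1_b,f,640\n"
    "3,pair1_b,j,587\n"
)

if __name__ == "__main__":
    for row in score_trials(SAMPLE):
        print(row["trial"], row["stim"], row["response"], row["correct"])
```

With real data you would read the file contents into csv_text instead of using SAMPLE. For yoked pairs, the lookup table could be keyed on whichever column identifies the pair member shown on that trial.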