I think you were on the right track in your email about trying this with DirectRT. Take a look at the attached input file. I've created a very simplified version that shows the logic you could use to build the fuller design you are describing. See if it makes sense to you and let me know if you have any questions. The input file uses the function that recalls randomly generated stimuli from other trials (e.g., ?b1t1s1), and it also uses a separate copy of the question files for each image so that you never get the same question twice for any given image.
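
In case it helps to see the logic outside of DirectRT, here is a rough Python sketch of the same idea. It is not DirectRT syntax and it is not the attached input file. The image names, question labels, and function names are all made up for illustration. It only shows the two pieces described above: a later trial reusing an image that an earlier trial picked at random, and each image drawing from its own copy of the question list so a question cannot repeat for that image.

```python
import random


def build_question_pools(images, questions, rng):
    """Give every image its own shuffled copy of the question list."""
    pools = {}
    for image in images:
        pool = list(questions)  # separate copy per image (hypothetical data)
        rng.shuffle(pool)
        pools[image] = pool
    return pools


def next_question(pools, image):
    """Remove and return the next unused question for this image."""
    return pools[image].pop()


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
images = ["img1.bmp", "img2.bmp"]  # hypothetical stimulus names
questions = ["q1", "q2", "q3"]     # hypothetical question labels

pools = build_question_pools(images, questions, rng)

# An earlier trial picks an image at random; later trials recall that
# same choice instead of drawing a new one.
shown = rng.choice(images)
asked = [next_question(pools, shown) for _ in range(len(questions))]
```

Because each image pops from its own list, asking every question once for one image leaves the other image's list untouched.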