If I have understood correctly, one way to achieve what you are trying to do would be as follows.
Rather than organizing your stimuli by position in the trial (sets M, A, B, C, X, Y in your example code), you could arrange them by yoked set. So, for example, you might have a stimulus set StimA containing an initial image, followed by the three stimuli to be presented if participants choose the first (jump) response option after that image, and then the three to be shown if they choose the second (non-jump) option. You would end up with as many lists as you have yoked sets of stimuli.
Each trial in the main body of the experiment would use one stimulus set. You would present the first stimulus in the set, then stimuli 2-4 if the participant presses up, or stimuli 5-7 if they press down. I've attached a simplified example to show what I mean. I cut it down to two yoked stimuli for each initial image (one each for the jump and no-jump options), but it would work with six as you originally had (three for the jump option, three for the no-jump option). [NB I also switched the response keys to up and down so that I could run it on my computer.]
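In case a sketch helps, the selection logic could look something like this in Python. This is only an illustration, not the attached example: the names `yoked_sets` and `follow_up_stimuli` and all the stimulus labels are placeholders I made up, and it assumes seven entries per set in the order described above (initial image, three jump stimuli, three no-jump stimuli).

```python
# Hypothetical yoked sets: one list per initial image.
# Position 0 is the initial image, positions 1-3 follow a jump (up) response,
# positions 4-6 follow a no-jump (down) response. All labels are placeholders.
yoked_sets = {
    "StimA": ["A_initial", "A_jump1", "A_jump2", "A_jump3",
              "A_stay1", "A_stay2", "A_stay3"],
    "StimB": ["B_initial", "B_jump1", "B_jump2", "B_jump3",
              "B_stay1", "B_stay2", "B_stay3"],
}


def follow_up_stimuli(stim_set, response):
    """Return the three stimuli to show after the initial image."""
    if response == "up":      # jump option: stimuli 2-4 of the set
        return stim_set[1:4]
    if response == "down":    # no-jump option: stimuli 5-7 of the set
        return stim_set[4:7]
    raise ValueError(f"unexpected response key: {response!r}")


if __name__ == "__main__":
    current = yoked_sets["StimA"]
    print("show first:", current[0])
    print("after up:", follow_up_stimuli(current, "up"))
    print("after down:", follow_up_stimuli(current, "down"))
```

Because each set carries its own follow-up stimuli, the branch taken on a trial depends only on the response key and never on which initial image was shown.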
You can still get random presentation order by using within-group randomization. The downside of this method is that you will need as many 'jump' trials as you have stimuli. The upside is that you have an absolute reference for each stimulus.
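As a rough illustration of the randomization point, here is a Python sketch. I am assuming that the randomization amounts to shuffling the order in which the yoked sets are drawn across trials while leaving the order inside each set fixed; your software's own randomization setting may behave differently. The function name `randomized_trial_order` and the set names are made up for the example.

```python
import random


def randomized_trial_order(set_names, seed=None):
    """Return the yoked-set names in a shuffled order, one trial per set.

    Assumption: only the order of the sets is randomized; the stimuli
    inside each set keep their fixed positions.
    """
    rng = random.Random(seed)   # a seed makes the order reproducible
    order = list(set_names)     # copy, so the caller's list is untouched
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    names = ["StimA", "StimB", "StimC", "StimD"]  # placeholder set names
    print(randomized_trial_order(names))
```

Each set still appears exactly once, so every stimulus keeps its fixed position within its own set regardless of trial order.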
I hope this solution works with your design.