Say you display three images back to back on a single trial, each for 10 ms. Each one would in fact be shown for 10 ms with no interval between them, so the whole sequence takes 30 ms. Does that answer your question?
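To make the arithmetic concrete, here is a minimal sketch of the onset and offset times for back-to-back stimuli. The function name `sequence_schedule` is made up for illustration and is not part of any experiment software; it only assumes each stimulus starts the moment the previous one ends.

```python
def sequence_schedule(durations_ms):
    """Return (onset, offset) times in ms for stimuli shown back to back.

    Hypothetical helper: assumes zero gap between consecutive stimuli.
    """
    schedule = []
    onset = 0
    for duration in durations_ms:
        schedule.append((onset, onset + duration))
        onset += duration  # next stimulus starts where this one ends
    return schedule


schedule = sequence_schedule([10, 10, 10])
total_ms = schedule[-1][1]  # offset of the last stimulus = total sequence time

for index, (start, end) in enumerate(schedule, start=1):
    print(f"image {index}: {start} ms to {end} ms")
print(f"total: {total_ms} ms")
```

With three 10 ms images, the last offset lands at 30 ms, which is the total for the sequence.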

Note that it gets a bit trickier with full-screen images or text than with partial-screen ones because, on a CRT, the top of the screen is drawn before the bottom. Consequently, with a full-screen image you see the top of the image earlier than the bottom, but the bottom stays on later because the top is also erased first. Every pixel is displayed for the same amount of time, but you need to consider how this top-to-bottom offset could affect perception. With a relatively small stimulus in the center of the screen, the issue is close to moot.
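A rough way to see the size of the effect is to model the top-to-bottom draw as a linear delay per row. This is only a sketch under stated assumptions: a 10 ms frame (a 100 Hz refresh, which is my assumption, not something stated above), 600 visible rows, no vertical blanking interval, and a pixel treated as "on" from when it is drawn until it is redrawn. The function `pixel_window_ms` is hypothetical.

```python
def pixel_window_ms(row, n_rows, frame_ms, stimulus_frames):
    """Return (onset, offset) in ms for a pixel on the given row.

    Simplified model: the beam reaches each row at a delay proportional
    to its position, and the same delay applies when the row is erased.
    Vertical blanking and phosphor decay are ignored (assumptions).
    """
    delay = frame_ms * row / n_rows
    onset = delay
    offset = delay + stimulus_frames * frame_ms
    return onset, offset


N_ROWS = 600      # assumed visible rows
FRAME_MS = 10.0   # assumed frame duration (100 Hz refresh)

top = pixel_window_ms(0, N_ROWS, FRAME_MS, stimulus_frames=1)
bottom = pixel_window_ms(N_ROWS - 1, N_ROWS, FRAME_MS, stimulus_frames=1)

# Full-screen image: bottom row lags the top row by almost a whole frame.
full_screen_lag = bottom[0] - top[0]

# Small central stimulus spanning rows 270 to 330: much smaller lag.
small_top = pixel_window_ms(270, N_ROWS, FRAME_MS, stimulus_frames=1)
small_bottom = pixel_window_ms(330, N_ROWS, FRAME_MS, stimulus_frames=1)
small_lag = small_bottom[0] - small_top[0]

print(f"full-screen lag: {full_screen_lag:.2f} ms")
print(f"small-stimulus lag: {small_lag:.2f} ms")
```

Under these assumptions every row is on for the same duration, but a full-screen image has nearly a full frame of lag between top and bottom, while a stimulus covering the middle tenth of the screen has about a tenth of that.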