Frequently asked questions about deinterlacing
by Billy Biggs <firstname.lastname@example.org>
I often find myself discussing deinterlacing on IRC, so this page will be updated and maintained as an FAQ on topics relating to deinterlacing video content for computing applications. The content on this page has been edited and co-authored with help from people on IRC, including Bart Dorsey and Tero Auvinen. Many thanks to them.
A more visual description
I have written up a new page on deinterlacing entitled interlaced video on a computer display which I believe helps explain some of the concepts described here more easily. I'll try and clean this up as soon as possible, but please check out that page if you find the descriptions here too technical.
What is the difference between interlaced and progressive video?
People often state that NTSC is 29.97 frames per second and PAL is 25 frames per second. However, both of these formats are interlaced, so these numbers describe only the amount of data, not the visible output framerate. Interlaced video is actually a sequence of fields. A field is half of the scanlines of a full image: either all of the even scanlines (0, 2, 4, ...) or all of the odd scanlines (1, 3, 5, ...), potentially of an individual and unique image. Therefore, it is better to say that NTSC is 59.94 fields per second, and PAL is 50 fields per second.
One way to think about interlaced video is to picture a camera that captures at 59.94 frames per second, but only stores half of the scanlines from each frame, and switches between storing the even scanlines to storing the odd scanlines. I find this analogy very helpful when thinking about motion in interlaced video sequences.
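The camera analogy above can be sketched in a few lines of Python. This is just an illustration with made-up helper names; a "frame" here is a list of scanlines, and the "camera" alternates which half of each captured frame it actually stores.

```python
# Sketch: simulating an interlaced camera (hypothetical helper names).
# A "frame" is just a list of scanlines; a field keeps every second one.

def top_field(frame):
    """Even scanlines (0, 2, 4, ...)."""
    return frame[0::2]

def bottom_field(frame):
    """Odd scanlines (1, 3, 5, ...)."""
    return frame[1::2]

def interlace(frames):
    """Alternate between storing the even and the odd scanlines
    of each captured frame, as the analogy above describes."""
    fields = []
    for i, frame in enumerate(frames):
        fields.append(top_field(frame) if i % 2 == 0 else bottom_field(frame))
    return fields

# Four captured frames of six scanlines each, labelled by (frame, line):
frames = [[(f, l) for l in range(6)] for f in range(4)]
fields = interlace(frames)
# Each field has half the scanlines, and the parity alternates:
assert fields[0] == [(0, 0), (0, 2), (0, 4)]
assert fields[1] == [(1, 1), (1, 3), (1, 5)]
```

Note that every field comes from a different captured frame, which is exactly why motion makes interlaced material hard to reassemble.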
Interlacing is a very effective form of compression. The human eye is less sensitive to high frequency motion. Consider a spinning bicycle wheel: at sufficient speeds your eye cannot distinguish the individual spokes. With interlaced video, if you rapidly switch between seeing the even scanlines and seeing the odd scanlines, the image will appear whole. Interlacing uses this to achieve 59.94fps video in the bandwidth of a 29.97fps sequence.
Progressive video, the opposite of interlaced video, is when every frame contains all of its scanlines. It's called progressive because the image progresses all the way from the first scanline to the last scanline without skipping any.
How do progressive scan cameras work?
A progressive scan camera simply captures a full resolution frame, and consecutively outputs two fields which together represent that image. The mapping between pairs of fields and frames is arbitrary. Some systems use top-field-first groupings into frames, others use bottom-field-first groupings. Deinterlacing progressive content formed like this is simply a matter of knowing which of these two systems is being used.
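Reassembling such content is usually called "weaving". Here is a minimal sketch of the idea, assuming the content really is progressive; the only thing that has to be known is the pairing convention (the function names are mine, not from any particular library):

```python
# Sketch: reassembling progressive frames from field pairs ("weave").

def weave(top, bottom):
    """Interleave a top field and a bottom field back into a full frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame

def reassemble(fields, top_field_first=True):
    """Pair consecutive fields into frames using the known field order."""
    frames = []
    for i in range(0, len(fields) - 1, 2):
        a, b = fields[i], fields[i + 1]
        frames.append(weave(a, b) if top_field_first else weave(b, a))
    return frames

# Two fields of one progressive frame: scanlines 0,2,4 and 1,3,5.
fields = [[0, 2, 4], [1, 3, 5]]
assert reassemble(fields, top_field_first=True) == [[0, 1, 2, 3, 4, 5]]
```

Using the wrong convention interleaves the scanlines in the wrong order, which is why getting the field order right matters.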
What does it mean to deinterlace video?
Deinterlacing is the process of taking a stream of interlaced frames and converting it to a stream of progressive frames. Ideally, each field becomes its own frame of video, so a 29.97 frame per second interlaced NTSC stream becomes a 59.94 frame per second progressive stream. Since each field contains only half the scanlines of a full frame, the missing scanlines must be interpolated. There are various methods of doing the interpolation, ranging from simply doubling scanlines to motion-adaptive methods.
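As a sketch of the simplest end of that range, here is linear interpolation of a single field into a full frame (sometimes called "bob" deinterlacing). The function name and the use of plain lists of brightness values are illustrative assumptions:

```python
# Sketch: expanding one field to a full frame by linear interpolation.

def bob_field(field, is_top):
    """Place the field's scanlines at even or odd rows, then fill each
    missing row with the average of its neighbours (edges just repeat
    the nearest line)."""
    frame = [None] * (2 * len(field))
    start = 0 if is_top else 1
    for i, line in enumerate(field):
        frame[start + 2 * i] = line
    for i, line in enumerate(frame):
        if line is None:
            above = frame[i - 1] if i > 0 else None
            below = frame[i + 1] if i < len(frame) - 1 else None
            if above is None:
                frame[i] = below
            elif below is None:
                frame[i] = above
            else:
                frame[i] = (above + below) / 2
    return frame

# A top field with scanline brightness values 10, 30, 50:
assert bob_field([10, 30, 50], is_top=True) == [10, 20, 30, 40, 50, 50]
```

Motion-adaptive methods keep this interpolation for moving regions but weave the two fields together wherever the image is still.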
In many computing applications, deinterlacing is part of a recompression process, for example, converting a DVD of music videos in interlaced MPEG2 format to a progressive MPEG4 stream. For these applications, an output framerate of 59.94fps or 50fps is too high. Instead, most deinterlacer filters convert every interlaced frame to one single progressive frame, effectively halving the framerate of the content.
Why do I need to deinterlace? My TV does fine without!
When I first decided to write a deinterlacer, I thought: "I'll start by doing exactly what my television does, that way, we'll start at a common baseline of quality, then I can start to improve picture quality from there". Televisions effectively use linear interpolation, that is, each refresh shows a new half-quality field of data, and the spot size is large enough that each missing "scanline" gets about half of the one above and below. Why doesn't linear interpolation "just work" on a PC?
The main reason for deinterlacing is framerate conversion. Televisions are locked to the refresh rate of the input; every field of video that arrives is shown for exactly one refresh on the television. A viewer will see equal amounts of the even- and odd-scanline fields, and so the missing information turns into slightly annoying flicker on detail and sharp edges.
In the computing world, the video card controls the refresh rate of the display, and software has no control over its exact timing. Monitors also run at refresh rates incompatible with video rates of 50hz or 59.94hz; they run at 70hz, 75hz, 80hz, or even the annoyingly-close-to-59.94hz rate of 60hz. Because of this, linearly interpolated fields won't be shown for exactly equal amounts of time by the video card: the deinterlaced output will waver between favouring the top fields and favouring the bottom fields. This causes the image to 'bob' up and down, with small detail bouncing in and out of two distorted images.
To compensate for this effect, deinterlacing algorithms try to make every image presented to the monitor look as close as possible to a complete frame; that way, if a frame is shown for slightly too long, it won't look out of place.
The other reason for deinterlacing is of course to improve clarity and reduce flicker. For NTSC content, most people don't notice interlaced flicker artifacts as much as they notice chroma artifacts due to the composite nature of the signal (use S-Video or component wherever possible!). Of course, using a deinterlacer can help improve clarity and stability of the picture even if your output is hardware synchronized with the input.
However, nothing can be done about some problems in smoothness (temporal aliasing): displaying 59.94fps video on a 75hz monitor will never be as smooth as on a display driven at 59.94hz by the signal itself.
What is 2-3 pulldown?
2-3 pulldown is a method of converting a 24 frame per second film into 59.94 field per second NTSC. It does not apply to films encoded for PAL. It works by showing every second frame for three fields, resulting in a frame-to-field pattern of 2-3-2-3-2-3 ...
2-3 pulldown is also known as 3:2 pulldown, but that name is confusing, since it is not a ratio.
If you're paying attention, you'll notice that this sequence shows 24fps film at 60 fields per second, but NTSC is actually 59.94 fields per second. To compensate, before the pulldown sequence is applied, the film is slowed down to 23.976fps, and the audio is slowed down to match. The difference is small enough that viewers don't notice (but if you're writing a DVD player or ripper, you should!).
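The cadence and the arithmetic above can be sketched briefly. This is an illustration only; the function name is made up, and the fields here are just letters standing in for which film frame they came from:

```python
# Sketch: 2-3 pulldown maps film frames A B C D ... onto fields as
# A A B B B C C D D D, so every four film frames become ten fields.

def pulldown_23(frames):
    fields = []
    for i, frame in enumerate(frames):
        fields.extend([frame] * (2 if i % 2 == 0 else 3))
    return fields

assert pulldown_23(list("ABCD")) == list("AABBBCCDDD")

# 24 fps * 10/4 would give exactly 60 fields per second, so the film
# is first slowed to 59.94 * 4/10 = 23.976 fps:
assert abs(23.976 * 10 / 4 - 59.94) < 1e-9
```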
What is inverse telecine?
Inverse telecine, also known as 2-3 pulldown correction, is the process of recovering the original progressive 24fps material from a source that has undergone pulldown and been converted to 59.94 fields per second.
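In the simplest possible terms, it means undoing the cadence from the previous answer: spotting the repeated fields and re-pairing what remains. The sketch below is heavily simplified (real implementations compare field contents, not labels), but it shows the shape of the operation:

```python
# Simplified inverse telecine: given the field stream produced by the
# 2-3 cadence, drop the repeats to recover the film frames. Fields
# here carry labels; real code compares pixel data.

def inverse_telecine(fields):
    frames = []
    for field in fields:
        if not frames or frames[-1] != field:
            frames.append(field)
    return frames

telecined = list("AABBBCCDDD")          # output of the 2-3 cadence
assert inverse_telecine(telecined) == list("ABCD")
```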
What is going on with NTSC DVD framerates for film content?
NTSC DVDs are always encoded for an output rate of 59.94 fields per second; however, frames may be marked as progressive and given a duration of three fields. There are two flags on each frame of video, top_field_first and repeat_first_field, which you must watch. They let you encode 23.976fps film sequences on a DVD. It usually looks like this:

  Frame 1   top_field_first=1   repeat_first_field=1
  Frame 2   top_field_first=0   repeat_first_field=0
  Frame 3   top_field_first=0   repeat_first_field=1
  Frame 4   top_field_first=1   repeat_first_field=0

So the output (in fields) looks like:

  top 1, bottom 1, top 1, bottom 2, top 2, bottom 3, top 3, bottom 3, top 4, bottom 4
This is a perfect pulldown pattern. Notice that if you use these flags, the framerate of the video is effectively 23.976fps progressive, if you just look at the encoded frames.
But if you expand everything, the output field rate is still exactly 59.94 fields per second, and that is all you need to keep constant on a DVD. That is, you can use these flags for a while, then switch back to normal interlaced content, then back to the flags again.
That is perfectly valid, and in fact, you see this a lot. A great example is the 'Mallrats' NTSC DVD. It has a special 'angle' track where, for some chapters, it shows a video sequence of the director and some of the actors discussing the movie. This is shot on video at 59.94 fields per second. While this angle is playing, the film sequence must be encoded as interlaced frames, otherwise it would fall out of sync with the other angle. So, some chapters are interlaced 29.97, and other chapters are progressive 23.976.
Even worse, some DVDs, like my copy of Akira, seem to have been touched up at video speed. That is, you'll see short sequences of perhaps ten interlaced frames, then a return to 23.976fps encoded material.
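The flag arithmetic from this section can be checked with a few lines. This is just a worked example (the function name is mine): a frame with repeat_first_field set lasts three fields, otherwise two, and the perfect pulldown pattern comes out to exactly the NTSC field rate.

```python
# Sketch: counting output fields from the repeat_first_field flags.

def output_fields(repeat_first_field_flags):
    """Total fields produced: 3 per flagged frame, 2 otherwise."""
    return sum(3 if rff else 2 for rff in repeat_first_field_flags)

# Four coded frames carrying a perfect pulldown pattern:
assert output_fields([True, False, True, False]) == 10

# 23.976 coded frames/s * 10 fields / 4 frames = 59.94 fields/s:
assert abs(23.976 * output_fields([1, 0, 1, 0]) / 4 - 59.94) < 1e-9
```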
How do I correct pulldown on DVD content?
You may think that all is lost, but no. There are two approaches. The first is to have the MPEG2 decoding layer expand all content on the DVD to 59.94 interlaced fields per second, and pass it to a heuristics engine. You may ask: why do that, when the flags tell you exactly what to do? Remember that if you do this, repeated fields will be byte-for-byte identical and therefore easy to test for, and this approach also gets around some problem cases.
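The byte-for-byte test in that first approach can be sketched as follows. This is an illustration with invented names; repeated fields from the pulldown are identical to the previous field of the same parity, so a trivial equality check finds them.

```python
# Sketch: spotting pulldown repeats in an expanded field stream by
# comparing each field against the previous field of the same parity.

def find_repeats(fields):
    """Indices of fields identical to the field two positions back."""
    return [i for i in range(2, len(fields)) if fields[i] == fields[i - 2]]

# Field stream for film frames A(2 fields), B(3), C(2), D(3):
fields = ["At", "Ab", "Bt", "Bb", "Bt", "Cb", "Ct", "Db", "Dt", "Db"]
assert find_repeats(fields) == [4, 9]
```

Once the repeats are located, the phase of the pulldown sequence is known and the original frames can be reassembled.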
The other method is to enable pulldown detection whenever you see interlaced frames, and to remember where you were in the pulldown sequence based on what the flags were telling you. I do this in my DVD player app 'movietime' (see http://www.sf.net/projects/movietime/). Whenever I see interlaced content, I fill my diff_factor history with the ideal values based on what I was decoding, assume the interlaced content is pulldown, and go at it. It works ok.
How can I mix video and film content without dynamically changing my refresh rate?
What is 2-2 pulldown?
2-2 pulldown is the process of converting 24 frame per second film to 50 field per second PAL. The film is sped up by about 4% to 25 frames per second, and then for every frame of film, two fields are sent.
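The arithmetic is simple enough to verify in one line each:

```python
# 2-2 pulldown arithmetic: 25 film frames per second, two fields each,
# gives the PAL field rate; the 24-to-25 change is roughly a 4% speed-up.
assert 25 * 2 == 50
assert abs(25 / 24 - 1) < 0.05
```

This is why films run slightly shorter, and audio plays slightly higher-pitched, on PAL releases.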
When should I deinterlace?
Deinterlacing is useful for viewing interlaced content on a progressive display, and should not be used as a preprocessing stage before TV output.