lva 187.089 vu2: ak d. i. u. k-systeme 5 aka "quicktime"

compression

  • "why you need to compress
    video, in its raw form, takes up huge amounts of space. for example, uncompressed ntsc video is about 27 megabytes per second! at this size, you could fit only about 24 seconds of video on a cd-rom ... and no cd-rom drive could transfer such a file fast enough to play it smoothly.
    on the audio side, compression is also important, especially for web use. for example, uncompressed cd audio is 150 kilobytes per second, which would completely saturate a t1 connection and leave no room for video.
    in order to make desktop movies feasible, compression algorithms were created. compression is the process by which large movie files are reduced in size by the removal of redundant audio and video data. for more dramatic size reduction, less important data may also be removed, resulting in image and/or sound degradation. the codec is the algorithm that handles the compression of your video or audio, as well as the decompression when it is played. quicktime has several codecs available within it for free, and there are professional versions of certain codecs which may be purchased for superior quality and options.
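the arithmetic behind those figures is easy to reproduce. a minimal sketch [the 640*480 frame size and 24-bit colour are assumptions, chosen to land near the quoted 27 megabytes per second]:

```python
# rough data-rate arithmetic behind the figures above [a sketch; the
# exact frame size the source assumes is not stated, so 640*480 is a guess]
width, height = 640, 480          # square-pixel ntsc-ish frame
bytes_per_pixel = 3               # 24-bit colour
fps = 30                          # ntsc frame rate [nominally 29.97]

video_rate = width * height * bytes_per_pixel * fps   # bytes per second
print(video_rate / 2**20)         # ~26.4 mb/s, i.e. roughly the quoted 27

cd_capacity = 650 * 2**20         # a 650 mb cd-rom
print(cd_capacity / video_rate)   # ~24.6 seconds of uncompressed video
```

the same sum explains the cd-rom transfer problem: a single-speed drive moves 150 kilobytes per second, several orders of magnitude short of the raw video rate.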

  • how video codecs work
    most codecs compress video using spatial and/or temporal compression techniques to remove redundant data. understanding the basics of how a codec compresses video can help you create and process your material to make the codec's job easier, which in turn will produce superior final quicktime movies.

  • spatial compression
    one method of compressing movies is to remove the redundant data within any given image. for example, in a given movie there may be areas of flat color with many identical pixels. instead of specifying each pixel and its color, a codec can generalize by specifying the coordinates of the area and the area's color; it doesn't have to note all the little details. this manner of reducing the size of an image is called 'spatial compression'.
    the less detail there is in the image, the better the codec is able to generalize the image and compress it. removing fine details in preprocessing can improve the spatial compression of an image. video noise often looks like fine detail to a codec, and should be removed to improve spatial compression. creating video with simple backgrounds will also improve how well the final movie compresses.
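the 'generalise flat areas' idea can be sketched as a toy run-length coder [not what any real codec does verbatim, just an illustration of why flat colour compresses well and noise doesn't]:

```python
# spatial compression in miniature: run-length coding of identical
# neighbouring pixels [a toy sketch, not an actual codec algorithm]
def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return runs

# a scanline with flat-colour regions compresses well ...
flat = ["blue"] * 12 + ["white"] * 4
print(rle_encode(flat))               # 2 runs instead of 16 pixels

# ... while noisy fine detail defeats the generalisation entirely
noisy = ["blue", "white"] * 8
print(len(rle_encode(noisy)))         # 16 runs: no saving at all
```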

  • temporal compression
    another way to make a frame smaller is to look for changes between consecutive frames and only store the differences instead of the entire image. the original reference frame on which these differences are based is called a keyframe. keyframes contain the entire image, and look just like a normal picture.
    the frames based on the changes between frames are called delta frames, or difference frames. they contain only information for the areas that are different from the last frame, and are usually much smaller than the keyframes.
    for example, the first frame of any movie is always a keyframe, and contains the entire image. after this initial keyframe, there normally follows a series of delta frames. these delta frames show only the differences between the previous frame and the current frame. the delta frame wouldn't contain information on a truly static background, because it wouldn't be changing. every second or so a new keyframe is added to correct for slight cumulative errors in the delta frames. this kind of compression tracks changes over a period of time and is therefore called 'temporal compression'. video content that changes very little from frame to frame is best suited for temporal compression. whenever possible, you should use a tripod when filming video for desktop playback and attempt to reduce camera and subject movement. you should also avoid complex transitions and fast edits to minimize the differences between frames and improve the final compression.
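the keyframe/delta-frame idea can be sketched in a few lines [frames reduced to flat lists of pixel values here; real codecs work on blocks and motion vectors, not single pixels]:

```python
# temporal compression in miniature: store only the pixels that changed
# since the previous frame [a toy sketch]
def delta_frame(prev, curr):
    # record (index, new_value) for every pixel that differs
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_delta(prev, delta):
    frame = list(prev)
    for i, value in delta:
        frame[i] = value
    return frame

keyframe = [0, 0, 0, 0, 7, 7, 7, 7]       # full image, stored whole
frame2   = [0, 0, 0, 0, 7, 9, 9, 7]       # only two pixels changed

delta = delta_frame(keyframe, frame2)
print(delta)                               # [(5, 9), (6, 9)] - much smaller
print(apply_delta(keyframe, delta) == frame2)   # decoder rebuilds the frame
```

the static background never appears in any delta, which is exactly why tripods and simple transitions help the codec.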

  • a|symmetry
    the actual process of analyzing each frame and creating a compressed version is what takes so long in video compression – for each frame, vast numbers of mathematical calculations are performed to generate the final compressed frame.
    the codec also controls the playback of the compressed video. it's no accident that the decompression routines are usually much faster than the compression routines – this allows the frames to be decompressed fast enough to play in real-time. a frame that took a couple of seconds to compress might take less than 1/30th of a second to decompress. codecs that take a long time to compress but decompress quickly are known as 'asymmetric'. for example, the sorenson video codec is extremely asymmetric, which means that movies made with sorenson video take a long time to compress, but decompress in real-time and play smoothly.
    codecs that are intended for 'live' broadcasts and video teleconferencing are usually 'symmetric', meaning they both compress and decompress in the same amount of time. fast compression and decompression is critical for real-time broadcasting. h.263 is very close to being a symmetric codec.
    because symmetric codecs don't have as long to optimize each frame during compression as asymmetric codecs, the results often don't look as good as movies made with asymmetric codecs. if you are planning to put video onto a web site for viewers to watch 'on demand', you should probably choose a high-quality asymmetric codec, such as sorenson video."

    [compiled from terran interactive's pdf 'how to produce high-quality quicktime', online as html here ... terran interactive is the former maker of sorenson video]
"video compression is the art of throwing as much data away as possible without it showing. video compression methods tend to be lossy – that is, what comes out after decoding isn't identical to what was originally encoded. by cutting video's resolution, colour depth and frame rate, pcs managed postage stamp-size windows at first, but then ways were devised to represent images more efficiently and reduce data without affecting physical dimensions. the technology by which video compression is achieved is known as a 'codec', an abbreviation of compression/decompression. various types of codec have been developed – implementable in either software or hardware, and sometimes utilising both – allowing video to be readily translated to and from its compressed state.

lossy techniques reduce data – both through complex mathematical encoding and through selective intentional shedding of visual information that our eyes and brain usually ignore – and can lead to perceptible loss of picture quality. 'lossless' compression, by contrast, discards only redundant information. codecs can be implemented in hardware or software, or a combination of both. they have compression ratios ranging from a gentle 2:1 to an aggressive 100:1, making it feasible to deal with huge amounts of video data. the higher the compression ratio, the worse the resulting image. colour fidelity fades, artefacts and noise appear in the picture, the edges of objects become over-apparent, until eventually the result is unwatchable.

by the end of the 1990s, the dominant techniques were based on a three-stage algorithm known as dct [discrete cosine transform]. dct uses the fact that adjacent pixels in a picture – either physically close in the image [spatial] or in successive images [temporal] – may be the same value. a mathematical transform – a relative of the fourier transform – is performed on grids of 8*8 pixels [hence the blocks of visual artefacts at high compression levels]. it doesn't reduce data but the resulting coefficient frequency values are no longer equal in their information-carrying roles. specifically, it's been shown that for visual systems, the lower frequency components are more important than high frequency ones. a quantisation process weights these accordingly and ejects those contributing least visual information, depending on the compression level required. for instance, losing 50 per cent of the transformed data may only result in a loss of five per cent of the visual information. then entropy encoding – a lossless technique – jettisons any truly unnecessary bits.
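the dct stage itself can be sketched directly from its textbook definition [pure python, one 8*8 block; the quantisation rule below is an arbitrary threshold, purely illustrative]:

```python
# the three-stage dct pipeline in miniature: transform an 8*8 block,
# then quantise away coefficients carrying little visual information
# [a sketch; real codecs use per-frequency quantisation tables]
import math

def dct_8x8(block):
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# a flat block: all its energy lands in the single dc coefficient, so
# quantisation can drop the other 63 values without any visual loss
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
quantised = [[round(f) if abs(f) > 1 else 0 for f in row] for row in coeffs]
kept = sum(1 for row in quantised for f in row if f != 0)
print(quantised[0][0], kept)    # dc value 800; only 1 coefficient survives
```

entropy encoding would then pack the long runs of zeros losslessly, which is where the final size win comes from.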

initially, compression was performed by software. limited cpu power constrained how clever an algorithm could be to perform its task in a 25th of a second – the time needed to draw a frame of full-motion video. nevertheless, avid technology and other pioneers of nle [non-linear editing] introduced pc-based editing systems at the end of the 1980s using software compression. although the video was a quarter of the resolution of broadcast tv, with washed-out colour and thick with blocky artefacts, nle signalled a revolution in production techniques. at first it was used for off-line editing, when material is trimmed down for a programme. up to 30 hours of video may be shot for a one-hour documentary, so it's best to prepare it on cheap, non-broadcast equipment to save time in an on-line edit suite.

although the quality of video offered by the first pc-based nle systems was worse than the vhs vcrs used for off-line editing, there were some advantages. like a word processor for video, they offered a faster and more creative way of working. a user could quickly cut and paste sections of video, trim them and make the many fine-tuning edits typical of the production process. what's more, importing an accurate edl [edit decision list] generated by an nle system into the on-line computer on a floppy disk was far better than having to type in a list of time-codes. not only was nle a better way to edit but, by delivering an off-line product closer to the final programme, less time was needed in the on-line edit suite.

nle systems really took off in 1991, however, when hardware-assisted compression brought vhs-quality video. the first hardware-assisted video compression is known as m-jpeg [motion jpeg]. it's a derivation of the dct standard developed for still images known as jpeg. it was never intended for video compression, but when c-cube introduced a codec chip in the early 1990s that could jpeg as many as 30 still images a second, nle pioneers couldn't resist. by squeezing data as much as 50 times, vhs-quality digital video could be handled by pcs.

in time, pcs got faster and storage got cheaper, meaning less compression had to be used so that better video could be edited. by compressing video by as little as 10:1 a new breed of non-linear solutions emerged in the mid-1990s. these systems were declared ready for on-line editing; that is, finished programmes could essentially be played out of the back of the box. their video was at least considered to be of broadcast quality for the sort of time and cost-critical applications that most benefited from nle, such as news, current affairs and low-budget productions.

the introduction of this technology proved controversial. most images compressed cleanly at 10:1, but certain material – such as that with a lot of detail and areas of high contrast – was degraded. few viewers would ever notice, but broadcast engineers quickly learnt to spot the so-called ringing and blocky artefacts dct compression produced. also, in order to change the contents of the video images, to add an effect or graphic, material must first be decompressed and then recompressed. this process, though digital, is akin to an analogue generation. artefacts are added like noise with each cycle in a process referred to as concatenation. sensibly designed systems render every effect in a single pass, but if several compressed systems are used in a production and broadcast environment, concatenation presents a problem.

compression technology arrived just as proprietary uncompressed digital video equipment had filtered into all areas of broadcasters and video facilities. though the cost savings of the former were significant, the associated degradation in quality meant that acceptance by the engineering community was slow at first. however, as compression levels dropped – to under 5:1 – objections began to evaporate and even the most exacting engineer conceded that such video was comparable to the widely used betasp analogue tape. mild compression enabled sony to build its successful digital betacam format video recorder, which is now considered a gold standard. with compression a little over 2:1, so few artefacts [if any] are introduced that video goes in and out for dozens of generations apparently untouched.

the cost of m-jpeg hardware has fallen steeply in the past few years and reasonably priced pci cards capable of a 3:1 compression ratio and bundled with nle software are now readily available. useful as m-jpeg is, the underlying jpeg algorithm wasn't designed for moving pictures. when it comes to digital distribution, where bandwidth is at a premium, the mpeg family of standards – specifically designed for video – offer significant advantages." [quote-source]



mpeg

"the moving picture experts group [mpeg] have defined a series of standards for compressing motion video and audio signals using dct [discrete cosine transform] compression which provide a common world language for high-quality digital video. these use the jpeg algorithm for compressing individual frames, then eliminate the data that stays the same in successive frames. the mpeg formats are asymmetrical – meaning that it takes longer to compress a frame of video than it does to decompress it – requiring serious computational power to reduce the file size. the results, however, are impressive:
  • mpeg-1 [aka white book standard] was designed to get vhs-quality video to a fixed data rate of 1.5 mbit/s so it could play from a regular cd [for the more or less defunct videocd format]. published in 1993, the standard supports video coding at bit-rates up to about 1.5 mbit/s and virtually transparent stereo audio quality at 192 kbit/s, providing 352*240 resolution at 30fps, with quality roughly equivalent to vhs videotape. the 352*240 resolution is typically scaled and interpolated. [scaling causes a blocky appearance when one pixel – scaled up – becomes four pixels of the same colour value. interpolation blends adjacent pixels by interposing pixels with 'best-guess' colour values.] most graphics chips can scale the picture for full-screen playback, however software-only half-screen playback is a useful trade-off. mpeg-1 enables more than 70 minutes of good-quality video and audio to be stored on a single cd-rom disc. prior to the introduction of pentium-based computers, mpeg-1 required dedicated hardware support. it is optimised for non-interlaced video signals.
  • during 1990, mpeg recognised the need for a second, related standard for coding video at higher data rates and in an interlaced format. the resulting mpeg-2 standard is capable of coding standard definition television at bit-rates from about 1.5 mbit/s to some 15 mbit/s. mpeg-2 also adds the option of multi-channel surround sound coding and is backwards compatible with mpeg-1. it is interesting to note that, for video signals coded at bitrates below about 3 mbit/s, mpeg-1 may be more efficient than mpeg-2. mpeg-2 has a resolution of 704*480 at 30fps – four times greater than mpeg-1 – and is optimised for the higher demands of broadcast and entertainment applications, such as dss satellite broadcast and dvd-video. at a data rate of around 10 mbit/s, the latter is capable of delivering near-broadcast-quality video with five-channel audio. resolution is about twice that of a vhs videotape and the standard supports additional features such as scalability and the ability to place pictures within pictures.
  • mpeg-3, intended for hdtv, was rolled into mpeg-2.
  • in 1993 work was started on mpeg-4, a low-bandwidth multimedia format akin to quicktime that can contain a mix of media, allowing recorded video images and sounds to co-exist with their computer-generated counterparts. importantly, mpeg-4 provides standardised ways of representing units of aural, visual or audio-visual content, as discrete 'media objects'. these can be of natural or synthetic origin, meaning, for example, they could be recorded with a camera or microphone, or generated with a computer. possibly the greatest of the advances made by mpeg-4 is that it allows viewers and listeners to interact with objects within a scene.
  • mpeg-7, formally named 'multimedia content description interface', aims to create a standard for describing multimedia content data that will support some degree of interpretation of the information's meaning, which can be passed on to, or accessed by, a device or a computer code.

mpeg video needs less bandwidth than m-jpeg because it combines two forms of compression. m-jpeg video files are essentially a series of compressed stills. using intraframe, or spatial compression, it disposes of redundancy within each frame of video. mpeg does this but also utilises another process known as interframe, or temporal compression. this eradicates redundancy between video frames. take two sequential frames of video and you'll notice that very little changes in a 25th of a second. so mpeg reduces the data rate by recording changes instead of complete frames.

mpeg video streams consist of a sequence of sets of frames known as a gop [group of pictures]. each group, typically eight to 24 frames long, has only one complete frame represented in full, which is compressed using only intraframe compression. it's just like a jpeg still and is known as an i frame. around it are temporally-compressed frames, representing only change data. during encoding, powerful motion prediction techniques compare neighbouring frames and pinpoint areas of movement, defining vectors for how each will move from one frame to the next. by recording only these vectors, the data which needs to be recorded can be substantially reduced. p [predictive] frames refer only to the previous frame, while b [bi-directional] frames rely on previous and subsequent frames. this combination of compression techniques makes mpeg highly scalable. not only can the spatial compression of each i frame be cranked up, but by using longer gops with more b and p frames, data rates are pushed even lower." [quote-source]
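the gop layout described above can be sketched as follows [the i/p/b spacing matches a common 12-frame pattern, but the per-frame byte costs are invented purely for illustration]:

```python
# a sketch of how a gop might be laid out, with invented per-frame
# sizes to show why longer gops push the data rate lower
def gop_pattern(length, p_spacing=3):
    # i frame first, then b frames with a p frame every p_spacing slots
    frames = ["i"]
    for n in range(1, length):
        frames.append("p" if n % p_spacing == 0 else "b")
    return frames

gop = gop_pattern(12)
print("".join(gop))          # ibbpbbpbbpbb - a common 12-frame pattern

# hypothetical byte costs per frame type [illustrative only]
size = {"i": 20_000, "p": 6_000, "b": 2_000}
print(sum(size[f] for f in gop))   # 54_000 bytes vs 240_000 for twelve i frames
```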



m-jpeg

"jpeg is a well-known standard for compressing stills. unlike mpeg, m-jpeg compresses and stores every frame rather than only the differences between one frame and the next. thus it requires more space than mpeg, but it is more efficient when rapid scene changes are involved, and easier to edit. it is capable of a variety of compression ratios, typically between 2:1 and 12:1. at 5:1 or lower, it's deemed broadcast quality. higher than that, up to about 12:1, is more than acceptable for semi-professional or consumer purposes.

the m-jpeg codec works best when contained in microcode on a video capture card chip. when implemented in hardware in this way the pc's main processor is left free to concentrate on other tasks, such as maintaining the required hard disk data transfer rates. the algorithm can also be worked into a software codec, which allows video to be seamlessly edited in applications such as adobe premiere.

despite its role as the workhorse of the digital video universe, the future is looking uncertain for m-jpeg. the new dv format has spread like wildfire through the professional and mid-range video market. it's totally digital, offers better picture quality than analogue-to-digital conversion can ever hope to achieve and has industry heavyweights sony and panasonic behind it. more importantly, it's custom-designed to bring real-time, high-quality video editing to the desktop pc." [quote-source]



cinepak

"cinepak is another asymmetric video compressor, developed jointly by apple and supermac [a company later acquired by radius]. the format outputs 320*240 [quarter screen] at 15fps with good quality, at a data rate that even slow single-speed and 2* cd-rom players can deliver. on high-performance computers, the playback rate can reach 30fps, but cinepak movies are usually recorded at intentionally low frame rates to accommodate the installed base of slower cd-rom players. scaling the window size requires additional processing power and the result tends to be pixelated [a blocky appearance]. this cross-platform, software-only, scaleable codec is licensed for several video players, including microsoft video for windows and apple's quicktime. with better colour definition than other codecs, cinepak is the choice for compressing 'natural' video, i.e., video without a lot of graphics or animation." [quote-source]



ivi/indeo video interactive

"shortly after the introduction of apple quicktime, intel responded with its indeo video interactive [ivi or indeo 4.0] codec. this format allows for scaleable software-only video playback. ivi can compress video symmetrically [in real time, larger file size] or asymmetrically [off-line, smaller file size, low data rates, highest quality]. compression times have been dramatically shortened by the new off-line quick compressor, which is up to 50 times faster than previous versions. the earlier indeo 3.1 and 3.2 codecs typically managed 320*240 at 15fps on intel 486-based computers, and scaling the window resulted in a pixelated image. the current version is optimised for pentium pro and pentium ii processors, resulting in smooth 30fps playback. indeo delivers good quality on low-end pentium-processor computers as well, employing special techniques for graceful scalability.

in contrast to quicktime, which drops frames intentionally to accommodate slower computers, indeo dynamically varies image quality according to processor power available during playback. the frame rate remains constant – with no dropped frames – instead trading off a degree of detail. additionally, indeo's 'alternate line zoom-by-two' doubles the window size by doubling each pixel horizontally, then drawing a row of black pixels between each row of image pixels. this smoothing technique minimises the pixelation associated with scaling the window. other innovative features include 'transparency', a compositing effect in which an object can be layered on top of video, just as a tv weatherman stands in front of a blue screen so that his image can be electronically cut and placed on top of a background layer, the weather map. indeo's sophisticated implementation includes compositing over moving backgrounds, moving objects [sprites] across frames, and more, comprising the 'interactive' features. indeo is supported by microsoft vfw and activemovie." [quote-source]



quicktime

"recognising the drawback of requiring a costly playback adapter, apple developed a video format that can be played without special add-on hardware. the result, quicktime, represents a major milestone for digital video. it provides a multimedia architecture that synchronises all types of digital media, including video, sound, text, graphics and music. on playback quicktime movies gracefully drop video frames as necessary to maintain continuous sound synchronisation. such scalability was a major breakthrough, transforming the macintosh into a viable video playback platform. while early quicktime movies were typically grainy postage-stamp-size windows [160*120 pixels] with jerky motion [12fps], the format has matured to deliver full-frame [640*480], full-motion [30fps] video suitable for professional applications. due to its well-defined hardware abstraction layer, quicktime is a cross-platform standard, with versions running on windows- and nt-based pcs and unix-based workstations in addition to its native apple macintosh environment. its open architecture supports many file formats and codecs, including cinepak, indeo, motion jpeg and mpeg-1, and is extensible to support future codecs, such as dvcam." [quote-source]



sorenson video

"sorenson video's single biggest advantage is its ability to deliver excellent quality video at low data rates. the first mistake people usually make with sorenson is to give it too much data rate – too much data per second can 'choke' the codec on playback and make it start skipping frames as it runs out of cpu power. if you're used to compressing 320*240 movies with cinepak at 200kbytes/second, try them with sorenson video at 100kbytes/second, or even 50 – you may be surprised with the resulting quality.

for the best results, always use variable bitrate [vbr] encoding with sorenson video. this is a two-pass technique which analyzes each clip to determine which sections are the hardest, then allocates bytes as efficiently as possible. it takes longer, and requires both sorenson video developer edition, and media cleaner pro, but it's worth it. some clips can retain their quality at half the data rate they'd otherwise require, and transitions in particular tend to look much better at low data rates. as a point of comparison, nearly every major dvd-video title released uses variable bitrate mpeg in order to get the best results – vbr is a really good thing. temporal compression is a real strength of the sorenson video codec. movies with relatively low motion [such as 'talking head' clips of interviews, etc] can compress extremely well. also, doubling the frame rate does not usually require doubling the data rate for comparable image quality.

sorenson video takes more computing power to get a pixel to the screen than cinepak. so it's important to be realistic about frame sizes and frame rates. 320*240*15fps will play fine on almost all g3 powermacs and pentium iii machines, while 640*480*30fps won't. on the bright side, sorenson-compressed movies scale up much more smoothly than cinepak. try doubling 320*240 to fullscreen for impressive results on fast powermacs. another way to keep the pixel rate lower is to take advantage of wide aspect-ratio movies. a theatrical trailer shot in 16*9 aspect ratio, properly cropped, and inverse-telecined to its original 24fps has almost half the pixel rate it would if left at 320*240*30fps ... but will look even better.
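the pixel-rate arithmetic behind that sizing advice [assuming the 16*9 trailer is cropped to 320*180, which is a guess consistent with the 320-wide examples above]:

```python
# pixel rate = how many pixels the codec must decode per second;
# the sorenson advice above is really about keeping this number down
def pixel_rate(width, height, fps):
    return width * height * fps

full = pixel_rate(320, 240, 30)
# a 16*9 trailer cropped to 320*180 and inverse-telecined to 24fps ...
wide = pixel_rate(320, 180, 24)
print(wide / full)   # 0.6 - the wide 24fps version needs 60% of the work
```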

note: sorenson video doesn't need nearly as many key frames as other codecs, such as cinepak. using too many key frames often results in poorer image quality. the difference in size between sorenson video key frames and delta frames is often much greater than with other codecs - sorenson key frames are usually very large relative to the small delta frames. this is normal, and doesn't cause problems in playback.

pros
  • performance is primarily constrained by the pixel rate of the compressed video
  • provides much higher image quality than cinepak, with smaller files. it is often possible to get twice the image quality at less than half the data rate
  • tuned to work well from 2–200 kbps
  • supports variable bitrate encoding, which provides the best possible results at any data rate

cons

  • playback of cd-rom video requires faster computers than cinepak
  • movies larger than 480*320, or at data rates above 150kbps, do not play smoothly except on high-end machines. while picture quality is usually outstanding at higher rates, you should test these movies on your target machines to determine if playback performance is acceptable
  • highly saturated colors [especially bright red] tend to produce blockiness and 'bleeding'

tips

  • always make your movies a multiple of four pixels in both dimensions. this allows sorenson video to use hardware acceleration for better results when it is available
  • be wary of bright saturated colors, especially primary red. they tend to 'smear' a bit regardless of how much data you give the codec. avoiding saturated colors in lettering is especially important
  • if you're familiar with cinepak and just starting with sorenson, you'll need to get used to working with data rates much lower than you're used to. for a 320*240*15fps cinepak movie, 300kbyte/s is a great rate. for sorenson, 75 would be a whole lot better
  • a good starting point for key frames is one every 10 seconds. this is 10 times your fps. for example, if you are making a 15fps movie, set the key frame field to 150
  • in some cases, scaling the video up by 1.5 may provide the best results. for example, encode at 192*144 and play back at 288*216
  • keep both horizontal and vertical dimensions a multiple of four [i.e. 240 wide is good; 241 is not]"
[compiled from terran interactive's pdf 'how to produce high-quality quicktime', online as html here ... terran interactive is the former maker of sorenson video]



avi/video for windows

"avi [audio video interleaved] is microsoft's generic format for digital video in windows, provided via its mci [media control interface]. avi allows for a number of compression methods, in real-time, non-real-time, and with or without hardware assistance. unlike quicktime, the video for windows [vfw] video player is not a cross-platform technology, but then windows is the dominant operating system. the initial release, introduced in late 1992, was capable of displaying 320*240 pixels at 15fps. the small window size and slow frame rate were largely a limitation of the hardware of the day, typically a 486-based computer with 4mb of ram. today's pentium processors are capable of full motion playback of avi files at the maximum resolution of the screen. codecs supported include cinepak, indeo and microsoft video 1." [quote-source]



activemovie

"activemovie, a microsoft api that was announced in march 1996, is receiving wide support in the computer industry as 'the next generation of cross-platform digital video technology for the desktop and the internet', according to a microsoft press release. it is being touted by industry observers as the cure for the deficiencies in microsoft's vfw and apple's quicktime. activemovie removes most of the limitations imposed by vfw, such as the small number of supported file formats, limited i/o throughput, inconsistent driver models, and the lack of driver compatibility between windows 95 and windows nt. activemovie solves these problems primarily by using the component object model [com] as its foundation, the most widely recognised implementation of which is object linking and embedding [ole]. various objects in the model control such actions as decompressing data, adjusting volume levels, and so forth.

by building activemovie on the com architecture, microsoft has provided application developers with a digital video api that has a number of benefits, such as independence from operating systems and programming languages, thus allowing the same or similar code to be used on multiple platforms. activemovie also supports more popular formats – including mpeg audio, .wav audio, mpeg video, and apple quicktime video – making it especially convenient for internet and intranet application builders. moreover, activemovie is integrated with microsoft's directx technology. this allows it to automatically take advantage of accelerating video and audio hardware to allow each computer to perform according to its capabilities. for example, activemovie improves the video playback quality of avi and quicktime movies by using directdraw, a directx component, along with features present on many standard graphics cards.

one of activemovie's most impressive features is the ability to decode mpeg video – including mpeg-2 – using either hardware or software. it can decode mpeg-1 entirely in software and provide high-quality playback on pentium-based systems. or, if the computer has hardware for decoding mpeg, activemovie can use directmpeg, another component of directx, to access this hardware and play back the video seamlessly.

activemovie has recently been enhanced and is now called directshow. the largest enhancement included in this change is that directshow supports dvd while activemovie did not." [quote-source]


modified 2003-11-13