Figure 1. HEVC divides video frames into a hierarchical quadtree coding structure that uses coding units (CUs), prediction units (PUs) and transform units (TUs). CUs, TUs and PUs are grouped in a treelike structure, with individual branches having different depths for different portions of a picture, all of which form a generic quadtree segmentation structure of large coding units.
NEW YORK - Internet video has become a practical medium for the delivery of video content to consumers. What has made this possible is the development of video compression, which lowers the enormous amount of bandwidth required to transport video to levels practical with most Internet connections. In this article, we'll examine some of the technical and business issues associated with two video codec frontrunners: HEVC and VP9.
HEVC, for High Efficiency Video Coding, is also called MPEG-H Part 2 and ITU-T H.265. It is a state-of-the-art video compression standard that provides about a 50 percent bitrate savings over H.264/MPEG-4 AVC, which in turn provided a similar efficiency gain over its MPEG-2 predecessor.
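The compounding effect of those two generational gains can be worked out in a few lines. This is only a back-of-the-envelope sketch: the 8 Mbps starting point is a hypothetical figure chosen for illustration, and real savings vary with content and encoder settings.

```python
def relative_bitrate(generations, saving_per_generation=0.5):
    """Bitrate relative to the oldest codec after a number of codec generations."""
    return (1.0 - saving_per_generation) ** generations

mpeg2_mbps = 8.0  # hypothetical MPEG-2 bitrate for some target quality
avc_mbps = mpeg2_mbps * relative_bitrate(1)   # one generation later
hevc_mbps = mpeg2_mbps * relative_bitrate(2)  # two generations later

print(f"MPEG-2 {mpeg2_mbps} Mbps -> AVC {avc_mbps} Mbps -> HEVC {hevc_mbps} Mbps")
```

Under that assumption, two successive 50 percent savings leave HEVC at roughly a quarter of the MPEG-2 bitrate for comparable quality.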
AVC solutions have already become widespread in many professional and consumer devices. HEVC, having been ratified by ISO and ITU in 2013, is similarly growing in the same applications, and would appear to be on the road to replacing the earlier codecs. But while MPEG and HEVC have been developed by standards committees representing a legion of strong industrial players, other forces have sought to displace their primacy, most notably Google, with its VP9. Let's look at the toolkit of each codec.
HEVC incorporates numerous improvements over AVC, including a new prediction block structure and updates to intra-prediction, inverse transforms, motion compensation, loop filtering and entropy coding. HEVC uses a new concept called coding units (CUs), which sub-partition a picture into square regions of variable size. The CU replaces the macroblock structure of previous video coding standards, which had been used to break pictures down into areas that could be coded using transform functions. Each CU can contain one or more transform units (TUs, the basic unit for transform and quantization) as well as one or more prediction units (PUs, the elementary unit for intra- and inter-prediction).
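To make the idea of sub-partitioning concrete, here is a minimal sketch of a quadtree split, the structure shown in Figure 1. The function names, the 64-pixel root, the 8-pixel minimum and the "detail point" splitting rule are all assumptions made for illustration; a real encoder decides where to split by weighing rate against distortion.

```python
def split_quadtree(x, y, size, should_split, min_size=8):
    """Return the leaf blocks (x, y, size) of a quadtree rooted at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_quadtree(x + dx, y + dy, half, should_split, min_size)
                )
        return leaves
    return [(x, y, size)]

# Hypothetical rule: keep splitting whichever block contains the point (5, 5),
# standing in for "this part of the picture has fine detail".
def contains_detail(x, y, size, px=5, py=5):
    return x <= px < x + size and y <= py < y + size

leaves = split_quadtree(0, 0, 64, contains_detail)
for leaf in leaves:
    print(leaf)
```

The branch covering the detailed corner ends up deeper than the others, so small blocks are spent only where the picture needs them while flat areas stay as large blocks.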
While AVC improved on MPEG-2 by allowing multiple block sizes for transform coding and motion compensation, HEVC coding tree blocks can be 64x64, 32x32 or 16x16 pixel regions, and the coding units within them can be hierarchically subdivided down to 8x8, with transform units as small as 4x4. The tree block structure also supports a technique called wavefront parallel processing, in which parallel processors decode and predict rows of tree blocks concurrently using data from neighboring partitions, enabling multi-threaded decode.
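A rough sketch of how a wavefront schedule plays out is shown below. It assumes the dependency pattern commonly described for wavefront processing, which the article does not spell out: a block can start once its left neighbor and the block above and to the right have finished. Each block is assumed to take one time step, and the function name is hypothetical.

```python
def wavefront_schedule(rows, cols):
    """Earliest time step at which each block can start, one step per block."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                # Left neighbour in the same row must be finished.
                deps.append(start[r][c - 1] + 1)
            if r > 0:
                # Above-right neighbour must be finished (clamped at the row end).
                above_right = min(c + 1, cols - 1)
                deps.append(start[r - 1][above_right] + 1)
            start[r][c] = max(deps, default=0)
    return start

schedule = wavefront_schedule(rows=3, cols=5)
for row in schedule:
    print(row)

total_steps = schedule[-1][-1] + 1
print(f"{total_steps} steps with one thread per row, versus {3 * 5} sequentially")
```

Each row trails the row above it by two blocks, which is what gives the schedule its diagonal "wavefront" shape.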
Because this new coding structure avoids the repetitive blocks of AVC, HEVC is better at reducing blocking artifacts, while at the same time providing a more efficient coding of picture details. HEVC also specifies planar, DC and directional intra-prediction modes, which reconstruct smooth regions or directional structures in a way that hides artifacts better. An internal bit-depth increase allows encoding of video pictures by processing them with a color depth higher than 8 bits.
Motion compensation is provided with two new methods, and luma and chroma motion vectors are calculated to quarter- and eighth-pixel accuracy, respectively. A new deblocking filter is also provided, which operates only on edges that are on the block grid. After the deblocking filter, HEVC provides two new optional filters, designed to minimize coding artifacts.
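The difference between quarter- and eighth-pixel accuracy follows from chroma subsampling: when the chroma planes have half the luma resolution, the same displacement measured in chroma samples is half as large, so it needs one more fractional bit. The sketch below assumes 4:2:0 video and a fixed-point vector stored in quarter-luma-sample units; the function name and sample value are hypothetical.

```python
def split_motion_vector(mv, fractional_bits):
    """Split a fixed-point motion vector component into (whole samples, fraction)."""
    whole = mv >> fractional_bits                  # arithmetic shift rounds toward -inf
    fraction = mv & ((1 << fractional_bits) - 1)   # remaining fractional steps
    return whole, fraction

mv_x = 13  # hypothetical horizontal vector, in quarter-luma-sample units

luma_whole, luma_frac = split_motion_vector(mv_x, 2)      # quarter-sample steps
chroma_whole, chroma_frac = split_motion_vector(mv_x, 3)  # eighth-sample steps

print(f"luma:   {luma_whole} + {luma_frac}/4 samples")
print(f"chroma: {chroma_whole} + {chroma_frac}/8 samples")
```

Here the luma displacement is 3.25 samples and the chroma displacement is 1.625 samples, exactly half, with the fractional part selecting which interpolation filter phase to use.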
VP9 IMPROVES ON VP8
With YouTube carrying so much video content, it stands to reason that the service's parent, Google, has a vested interest in not just the technology behind video compression, but also in some of the market considerations attached to it. To that end, VP9 has been developed to provide a royalty-free alternative to HEVC.
Many of the tools used in VP9 (and its predecessor, VP8) are similar to those used in HEVC but ostensibly avoid the intellectual property used in the latter. VP9 supports the image format used for many Web videos: 4:2:0 color sampling, 8-bit color depth, progressive scan, and image dimensions up to 16,383x16,383 pixels. It can go well past these specs, however, supporting 4:4:4 chroma and up to 12 bits per sample.
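To see what those format choices mean for the amount of data a codec has to handle, the uncompressed size of a frame can be estimated as follows. This is a simple sketch: the function name is hypothetical, the 1920x1080 frame size is an arbitrary example, and samples are counted as tightly packed bits, whereas real software often stores samples deeper than 8 bits in 16-bit words.

```python
def raw_frame_bytes(width, height, chroma_format="4:2:0", bit_depth=8):
    """Uncompressed size of one frame in bytes, with samples tightly packed."""
    # Average number of samples (luma plus chroma) carried per pixel.
    samples_per_pixel = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}[chroma_format]
    bits = width * height * samples_per_pixel * bit_depth
    return bits / 8

web_frame = raw_frame_bytes(1920, 1080, "4:2:0", 8)
high_end_frame = raw_frame_bytes(1920, 1080, "4:4:4", 12)

print(f"4:2:0,  8-bit: {web_frame / 1e6:.2f} MB per frame")
print(f"4:4:4, 12-bit: {high_end_frame / 1e6:.2f} MB per frame")
print(f"ratio: {high_end_frame / web_frame:.1f}x")
```

Moving from 4:2:0 at 8 bits to 4:4:4 at 12 bits triples the raw data per frame before any compression is applied.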
Figure 2. VP9 supports superblocks that can be recursively partitioned into rectangular blocks.
The Chromium, Chrome, Firefox and Opera browsers now all support playing VP9 video in the HTML5 video tag. Both VP8 and VP9 video are usually encapsulated in a format called WebM, a Matroska-based container also supported by Google, which can carry Vorbis or Opus audio.
VP8 uses a 4x4 block-based discrete cosine transform (DCT) for all luma and chroma residual pixels. The DC coefficients from 16x16 macroblocks can then undergo a 4x4 Walsh-Hadamard transform. Three reference frames are used for inter-prediction, limiting the buffer size requirement to three frame buffers, while storing a golden reference frame from an arbitrary point in the past.
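The second-stage transform can be illustrated with a textbook 4x4 Walsh-Hadamard transform applied to a 4x4 array of DC values, one from each 4x4 block in a macroblock. This is a sketch of the general technique only; it uses a plain Hadamard matrix and is not VP8's bit-exact integer routine, and the function names and sample values are hypothetical.

```python
# Rows are mutually orthogonal and contain only +1 and -1,
# so the transform needs additions and subtractions only.
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def wht_4x4(block):
    """Forward transform: H * block * H^T."""
    return matmul(matmul(H, block), transpose(H))

def inverse_wht_4x4(coeffs):
    """Inverse transform: H^T * coeffs * H, scaled by 1/16."""
    scaled = matmul(matmul(transpose(H), coeffs), H)
    return [[value // 16 for value in row] for row in scaled]

# Hypothetical DC values from a nearly flat macroblock.
dc_values = [
    [50, 50, 51, 50],
    [50, 50, 50, 50],
    [49, 50, 50, 50],
    [50, 50, 50, 50],
]
coeffs = wht_4x4(dc_values)
print(coeffs[0])  # most of the energy lands in the first coefficient
assert inverse_wht_4x4(coeffs) == dc_values
```

Because neighboring DC values tend to be similar, the transform concentrates them into one large coefficient and many small ones, which are cheaper to code.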
VP9 augments these tools by adding 32x32 and 64x64 superblocks, which can be recursively partitioned into rectangular blocks, with enhanced intra- and inter-modes, allowing for more efficient coding of arbitrary block patterns within a macroblock. VP9 introduces the larger 8x8 and 16x16 DCTs, as well as the asymmetric DST (discrete sine transform), both of which provide more coding options.
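One way to picture the rectangular partitioning in Figure 2 is as a choice made at each level of the recursion. The sketch below assumes four options per block (leave it whole, cut it horizontally, cut it vertically, or split it into four squares that can be partitioned again), which is how VP9 partitioning is commonly described; the function and option names are hypothetical.

```python
def partition(x, y, size, kind):
    """Sub-blocks (x, y, width, height) produced by one partition choice."""
    half = size // 2
    if kind == "none":
        return [(x, y, size, size)]
    if kind == "horizontal":
        return [(x, y, size, half), (x, y + half, size, half)]
    if kind == "vertical":
        return [(x, y, half, size), (x + half, y, half, size)]
    if kind == "split":
        # Only these four squares would be candidates for further partitioning.
        return [(x + dx, y + dy, half, half) for dy in (0, half) for dx in (0, half)]
    raise ValueError(f"unknown partition kind: {kind}")

for kind in ("none", "horizontal", "vertical", "split"):
    print(kind, partition(0, 0, 64, kind))
```

The horizontal and vertical options are what produce non-square blocks such as 64x32 or 32x64 inside a 64x64 superblock.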
Like HEVC, VP9 supports sub-pixel interpolation and adaptive in-loop deblocking filtering, where the type of filtering can be adjusted depending on other coding parameters, as well as data partitioning to allow parallel processing.
COMPARING HEVC WITH VP9
As you would expect, performance depends on who you ask. Google says VP9 delivers a 50 percent gain in compression levels over VP8 a










