H.264 Intra Coding

 

In this post, I will outline the algorithms used to encode macroblocks in intra mode. Intra coding is designed to exploit spatial correlation within a given slice. For example, when a slice has uniform color throughout, its macroblocks are highly correlated with one another. Inter coding, by contrast, is designed to exploit temporal correlation across multiple frames. You can read my earlier articles on video coding here:
We will start with intra coding for the luma blocks of a macroblock. For each 16x16 luma block, the encoder must decide whether to predict it as a single 16x16 block or as sixteen 4x4 blocks. The encoder runs the prediction algorithms for both choices and computes the SAE (sum of absolute errors) between the actual 16x16 block and the predicted 16x16 block, then picks the choice with the smaller SAE. Note that the predicted blocks are derived from neighbouring, previously encoded blocks, exploiting spatial correlation in the slice.
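
As a minimal sketch of the SAE measure described above, assuming blocks are represented as plain lists of rows (the function name `sae` and the toy 2x2 values are illustrative, not taken from any codec):

```python
def sae(actual, predicted):
    """Sum of absolute errors between two equally sized 2-D blocks."""
    return sum(
        abs(a - p)
        for row_a, row_p in zip(actual, predicted)
        for a, p in zip(row_a, row_p)
    )


# Toy 2x2 example; a real luma block would be 16x16 or 4x4.
actual = [[10, 12], [14, 16]]
predicted = [[11, 12], [12, 19]]
print(sae(actual, predicted))  # 1 + 0 + 2 + 3 = 6
```

A lower SAE means the prediction is closer to the actual block, so less residual information remains to be coded.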

For encoding as a 16x16 block, there are 4 possible predicted 16x16 blocks, and the one with the least SAE represents this choice. For encoding as sixteen 4x4 blocks, there are 9 possible predicted blocks for each 4x4 block, and again the one that minimizes the SAE is chosen for each 4x4 block. Thus, to make the choice by exhaustive search, the encoder needs the following number of per-pixel absolute-difference computations: 4 predictions * 256 pixels for the one 16x16 block + 9 predictions * 16 pixels for each 4x4 block * 16 4x4 blocks = 1024 + 2304 = 3328. Several algorithms have been proposed for reducing the complexity of this decision; these are beyond the scope of my blog posts.
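
The count above, and the "pick the least SAE" rule, can be sketched as follows. The helper names (`difference_ops_per_macroblock`, `pick_best_mode`) and the toy 2x2 candidates are assumptions made for illustration only; a real encoder would generate the candidates with the prediction modes described in the rest of this post.

```python
def difference_ops_per_macroblock():
    """Per-pixel absolute differences needed for an exhaustive mode search."""
    ops_16x16 = 4 * 16 * 16        # 4 modes, 256 pixels each
    ops_4x4 = 9 * (4 * 4) * 16     # 9 modes, 16 pixels, 16 sub-blocks
    return ops_16x16, ops_4x4, ops_16x16 + ops_4x4


def sae(actual, predicted):
    """Sum of absolute errors between two equally sized 2-D blocks."""
    return sum(abs(a - p)
               for ra, rp in zip(actual, predicted)
               for a, p in zip(ra, rp))


def pick_best_mode(block, candidates):
    """candidates maps a mode name to its predicted block.

    Returns (mode name, SAE) for the candidate with the smallest SAE.
    """
    return min(((name, sae(block, pred)) for name, pred in candidates.items()),
               key=lambda item: item[1])


print(difference_ops_per_macroblock())  # (1024, 2304, 3328)

# Toy 2x2 block whose rows are constant, so horizontal prediction fits exactly.
block = [[5, 5], [7, 7]]
candidates = {
    "vertical": [[5, 5], [5, 5]],
    "horizontal": [[5, 5], [7, 7]],
}
print(pick_best_mode(block, candidates))  # ('horizontal', 0)
```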

We will now examine the 4 predicted 16x16 blocks, which are illustrated in the picture below. The vertical mode replicates the row of pixels immediately above the block down every row of the block. The horizontal mode replicates the column of pixels immediately to the left of the block across every column of the block. The DC mode fills every pixel with the mean of the pixels from the left column and the pixels from the top row. The plane mode fits a linear function to the top row and left column of pixels and uses it to generate values for the entire 16x16 block (this mode works well when the luma varies smoothly over the block).
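
The four 16x16 modes can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function names and the argument layout (`top` and `left` as 16-element lists, `corner` as the above-left pixel) are my own assumptions, all neighbours are assumed to be available, and samples are assumed to be 8-bit. The integer plane formula is written as I understand it from the H.264 specification and should be checked against the standard before being relied on.

```python
def predict_vertical(top):
    """Copy the row above the block into every row."""
    return [list(top) for _ in range(16)]


def predict_horizontal(left):
    """Copy the column left of the block into every column."""
    return [[left[y]] * 16 for y in range(16)]


def predict_dc(top, left):
    """Fill the block with the rounded mean of the 32 neighbouring pixels."""
    dc = (sum(top) + sum(left) + 16) >> 5
    return [[dc] * 16 for _ in range(16)]


def predict_plane(top, left, corner):
    """Fit a plane to the top row and left column (8-bit samples assumed)."""
    def t(x):  # top row, with index -1 meaning the above-left corner pixel
        return corner if x < 0 else top[x]

    def l(y):  # left column, with index -1 meaning the above-left corner pixel
        return corner if y < 0 else left[y]

    h = sum((i + 1) * (t(8 + i) - t(6 - i)) for i in range(8))
    v = sum((i + 1) * (l(8 + i) - l(6 - i)) for i in range(8))
    a = 16 * (left[15] + top[15])
    b = (5 * h + 32) >> 6   # horizontal slope
    c = (5 * v + 32) >> 6   # vertical slope
    return [[max(0, min(255, (a + b * (x - 7) + c * (y - 7) + 16) >> 5))
             for x in range(16)]
            for y in range(16)]


# A horizontal ramp above the block and a flat column to its left.
top = [52 + 2 * x for x in range(16)]
left = [50] * 16
plane = predict_plane(top, left, corner=50)
print(plane[0][:4])  # [52, 54, 56, 58]
```

In this example the plane mode continues the ramp from the top row into the block, which is the smooth-gradient case the mode is meant for.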

The 9 modes for the 4x4 blocks are illustrated below. The vertical, horizontal, and DC modes work as described above for the 16x16 blocks, applied to the 4 pixels above and the 4 pixels to the left of each 4x4 block.
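
For reference, here is a small sketch that lists the nine 4x4 modes with their standard mode numbers and implements the three simple ones. The class and function names are illustrative assumptions; `top` holds pixels A-D and `left` holds pixels I-L, and all eight are assumed to be available.

```python
from enum import IntEnum


class Intra4x4Mode(IntEnum):
    VERTICAL = 0
    HORIZONTAL = 1
    DC = 2
    DIAGONAL_DOWN_LEFT = 3
    DIAGONAL_DOWN_RIGHT = 4
    VERTICAL_RIGHT = 5
    HORIZONTAL_DOWN = 6
    VERTICAL_LEFT = 7
    HORIZONTAL_UP = 8


def predict_simple_4x4(mode, top, left):
    """top = [A, B, C, D], left = [I, J, K, L]; handles modes 0-2 only."""
    if mode == Intra4x4Mode.VERTICAL:
        return [list(top) for _ in range(4)]
    if mode == Intra4x4Mode.HORIZONTAL:
        return [[left[y]] * 4 for y in range(4)]
    if mode == Intra4x4Mode.DC:
        dc = (sum(top) + sum(left) + 4) >> 3  # rounded mean of 8 pixels
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("directional modes 3-8 are not covered by this sketch")


top, left = [10, 20, 30, 40], [50, 60, 70, 80]
print(predict_simple_4x4(Intra4x4Mode.DC, top, left)[0])  # [45, 45, 45, 45]
```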

For modes 3-8, the value of each pixel is computed as a weighted average of neighboring pixels taken along the mode's prediction direction. The pictures below indicate how the predicted values are computed for the diagonal down-left (3), diagonal down-right (4), vertical-right (5), horizontal-down (6), vertical-left (7), and horizontal-up (8) modes.
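
To make "weighted average" concrete, here is a sketch of one directional mode, diagonal down-left (mode 3), using the 1-2-1 weighting as I understand it from the H.264 specification. The function name and the list layout (`top8` holds pixels A-H in order) are illustrative assumptions, and all eight pixels are assumed to be available.

```python
def predict_diag_down_left(top8):
    """Diagonal down-left prediction for a 4x4 block from pixels A..H."""
    t = list(top8)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            i = x + y  # pixels on the same anti-diagonal share one value
            if i == 6:
                # Bottom-right pixel: no neighbour beyond H, so H is weighted 3.
                pred[y][x] = (t[6] + 3 * t[7] + 2) >> 2
            else:
                pred[y][x] = (t[i] + 2 * t[i + 1] + t[i + 2] + 2) >> 2
    return pred


for row in predict_diag_down_left([10, 20, 30, 40, 50, 60, 70, 80]):
    print(row)
```

Each output pixel depends only on `x + y`, so the predicted values are constant along lines running from the upper right to the lower left. The other directional modes follow the same pattern with different neighbours and directions.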

Note that slices in H.264 are encoded independently (i.e., intra prediction never crosses a slice boundary), and hence pixels A-H, pixel M (or Q in the 4x4 pictures above), and pixels I-L might not always be available for prediction. If some pixels are not available, the modes which use those pixels are not considered. There is one exception: if E-H are unavailable but D is available, E-H are replaced by the value of D, so the modes that read them (diagonal down-left and vertical-left) can still be used. The DC mode adapts to what is available: if only the top row or only the left column is available, it uses the mean of those pixels alone. It is also the only mode which can be used when none of the neighboring pixels are available, in which case every pixel is predicted as the mid-range sample value (128 for 8-bit video).
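
The availability handling can be sketched as below for the 4x4 case. The function names are illustrative assumptions, and `None` is used here as a stand-in for "not available". The fallback DC value in this sketch is the mid-range sample value, which is 128 for 8-bit video.

```python
def dc_4x4(top=None, left=None, bit_depth=8):
    """DC prediction for a 4x4 block; top = [A..D], left = [I..L] or None."""
    if top is not None and left is not None:
        dc = (sum(top) + sum(left) + 4) >> 3
    elif top is not None:
        dc = (sum(top) + 2) >> 2
    elif left is not None:
        dc = (sum(left) + 2) >> 2
    else:
        dc = 1 << (bit_depth - 1)  # no neighbours: mid-range value
    return [[dc] * 4 for _ in range(4)]


def extend_top(top4, top_right4=None):
    """Return pixels A..H; if E..H are unavailable, repeat D four times."""
    if top_right4 is None:
        top_right4 = [top4[3]] * 4
    return list(top4) + list(top_right4)


print(dc_4x4()[0])                   # [128, 128, 128, 128]
print(extend_top([10, 20, 30, 40]))  # [10, 20, 30, 40, 40, 40, 40, 40]
```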

For the typical YUV 4:2:0 encoding, each macroblock has two 8x8 chroma blocks (Cb and Cr). The 8x8 chroma intra prediction schemes follow the four 16x16 luma modes described above, although the modes are numbered differently (DC is 0, horizontal is 1, vertical is 2, and plane is 3). The same prediction mode is used for both chroma blocks. Also, whenever any of the four 8x8 luma blocks of a 16x16 macroblock is intra coded, the chroma blocks are intra coded as well.

For some anomalous content, or when the quantization parameter is very small, the regular process can produce more bits than the raw samples themselves would take. For such cases there is an I_PCM mode, in which the macroblock's pixels are sent without any prediction, transformation, or quantization, because sending the original pixels is then more bit-efficient than the regular process.
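
As a rough illustration of when this pays off, the raw payload of an I_PCM macroblock is easy to compute for 8-bit 4:2:0 video. The decision helper `prefer_ipcm` below is a hypothetical encoder-side heuristic of my own, not something the standard prescribes, and it ignores macroblock header and alignment bits.

```python
def ipcm_payload_bits(bit_depth=8):
    """Raw sample bits in one 4:2:0 macroblock."""
    luma_samples = 16 * 16
    chroma_samples = 2 * 8 * 8  # one 8x8 block each for Cb and Cr
    return (luma_samples + chroma_samples) * bit_depth


def prefer_ipcm(estimated_coded_bits, bit_depth=8):
    """Hypothetical rule: use I_PCM when regular coding would cost more."""
    return estimated_coded_bits > ipcm_payload_bits(bit_depth)


print(ipcm_payload_bits())       # 3072 bits, i.e. 384 bytes
print(prefer_ipcm(4000))         # True
```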
