Alpha Channel
Objective
This document defines how WebM video supports alpha channel information for VP8 video.
Background
One of the most requested features for WebM as used in HTML5 is alpha channel support, i.e. a value for each pixel in a video frame that indicates the desired transparency, where 0 is completely transparent (the pixel is effectively masked) and 255 is completely opaque. Values in between specify degrees of opacity: the resulting pixel value is a weighted average of the normally occluding pixel and the normally occluded pixel, with the alpha value setting the weight of the occluding pixel.
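As an illustration of how an intermediate alpha value combines the two pixels, here is a minimal sketch of per-sample blending. It assumes straight (non-premultiplied) 8-bit alpha and rounding to nearest, neither of which this document specifies; the function name blend_pixel is hypothetical.

```python
def blend_pixel(fg: int, bg: int, alpha: int) -> int:
    """Composite one 8-bit sample of the occluding pixel (fg) over the occluded pixel (bg)."""
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be in 0..255")
    # Assumption: straight (non-premultiplied) alpha.
    # alpha weights the occluding sample, (255 - alpha) weights the occluded one.
    # Adding 127 before the integer division rounds to the nearest value.
    return (fg * alpha + bg * (255 - alpha) + 127) // 255
```

With alpha 0 the occluded sample shows through unchanged, and with alpha 255 only the occluding sample is visible.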
Use Cases
Alpha channel data should be pixel perfect
Loss in alpha channel data is acceptable
Design Ideas
Method 1 - VP8 encoding of A-channel
Encoding:
The YUV frame is sent to the encoder (without the A-channel). The encoded output is placed into a Block element in the container. The A-channel is also VP8 encoded (with the A-channel in the Y plane and dummy values in the U and V planes) and the encoded output is placed in the BlockAdditional element of the container. The A-channel uses a separate encoder instance from the one used for the YUV frames (to exploit temporal coherence). Though this method might not preserve pixel-perfect alpha information, it will be perceptually lossless given a sufficient data rate.
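The input to the second encoder could be prepared roughly as follows. This is a sketch only: it assumes a tightly packed I420 (4:2:0) layout with no row padding, and it picks 128 as the dummy chroma value, which this document does not prescribe. The function name alpha_to_i420 is hypothetical.

```python
def alpha_to_i420(alpha: bytes, width: int, height: int, dummy: int = 128) -> bytes:
    """Wrap an 8-bit alpha plane as an I420 frame: A in the Y plane, constant U and V."""
    if len(alpha) != width * height:
        raise ValueError("alpha plane size does not match width * height")
    # In 4:2:0, each chroma plane has half the resolution in both dimensions
    # (rounded up for odd sizes).
    chroma_size = ((width + 1) // 2) * ((height + 1) // 2)
    # Plane order is Y, then U, then V. The dummy value is an assumption.
    return bytes(alpha) + bytes([dummy]) * (2 * chroma_size)
```

The resulting buffer would be handed to the alpha encoder instance in place of a normal YUV frame.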
Decoding:
The “Block” data is sent to one VP8 decoder and the “BlockAdditional” data (if present) is sent to a second VP8 decoder. A component after the decoders reconstructs the appropriate YUV frame and A-channel, which are then passed on to the renderer.
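The decoding path might be sketched as below. The two decoder arguments are hypothetical callables standing in for separate VP8 decoder instances, each assumed to return a tightly packed I420 frame; returning None for the alpha plane when no BlockAdditional is present is also an assumption, since this document does not say what the renderer should receive in that case.

```python
from typing import Callable, Optional, Tuple

def decode_yuva(block: bytes,
                block_additional: Optional[bytes],
                decode_color: Callable[[bytes], bytes],
                decode_alpha: Callable[[bytes], bytes],
                width: int,
                height: int) -> Tuple[bytes, Optional[bytes]]:
    """Return (yuv_frame, alpha_plane); alpha_plane is None without BlockAdditional."""
    # The Block payload always goes to the color decoder instance.
    yuv = decode_color(block)
    if block_additional is None:
        return yuv, None
    # The BlockAdditional payload goes to the second decoder instance.
    alpha_frame = decode_alpha(block_additional)
    # The A-channel was carried in the Y plane; the dummy U and V planes are dropped.
    return yuv, alpha_frame[: width * height]
```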
Alternatives Considered
Method 2 - Lossless encoding of A-channel
This method is similar to Method 1, the only difference being that the A-channel is encoded by a lossless technique (to be defined later) and is placed into the BlockAdditional element in the container (exact spec to be defined later). This is optimized such that the BlockAdditional is present only when there is a change in the A-channel (e.g. if the A-channel does not change between frame 20 and frame 35, then there will be a BlockAdditional only on frame 20 and not on frames 21 through 35). The lossless encoding technique used can be similar to the one used for alpha channels in WebP, so essentially it will be the alpha part of a standard WebP frame. Again, this is tentative, and if there is a better lossless encoding method that exploits temporal redundancy, we can probably use that.
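The "only when the A-channel changes" optimization can be sketched as follows. The lossless codec itself is left out because it is not yet defined; the sketch only decides which frames would carry a BlockAdditional. It also ignores seeking and keyframe boundaries, which a real muxer would likely need to handle. The function name alpha_payloads is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

def alpha_payloads(alpha_planes: Iterable[bytes]) -> Iterator[Tuple[int, Optional[bytes]]]:
    """Yield (frame_index, payload); payload is None when the alpha plane is unchanged."""
    previous = None
    for index, plane in enumerate(alpha_planes):
        if plane == previous:
            # Same alpha as the previous frame: no BlockAdditional is written.
            yield index, None
        else:
            # Alpha changed (or first frame): this plane would be losslessly
            # encoded and stored in BlockAdditional.
            yield index, plane
            previous = plane
```

A decoder following this scheme would keep using the most recent alpha plane until a new BlockAdditional arrives.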
Methods 1 & 2 need vpx to support a new paradigm of having multiple encoder/decoder instances at the same time (one for video data and one for alpha channel).
Method 3 - Double height and VP8 encoding
Encoding:
The YUV frame height is doubled and the A-plane information is stored in the bottom half of the Y plane. The bottom halves of the U and V planes are not used and are set to fixed dummy values. This frame is then sent to the encoder, which is unaware of the presence of the A-channel in the raw YUV frame. A flag is added to the container indicating the presence of the alpha channel.
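The double-height packing might look like this in outline. The sketch assumes tightly packed 4:2:0 planes with even width and height, and uses 128 as the fixed dummy chroma value; none of these details are fixed by this document. The function name pack_double_height is hypothetical.

```python
from typing import Tuple

def pack_double_height(y: bytes, u: bytes, v: bytes, alpha: bytes,
                       width: int, height: int,
                       dummy: int = 128) -> Tuple[bytes, bytes, bytes]:
    """Return (Y, U, V) planes of a frame with twice the source height."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    luma = width * height
    chroma = (width // 2) * (height // 2)
    if len(y) != luma or len(alpha) != luma or len(u) != chroma or len(v) != chroma:
        raise ValueError("plane sizes do not match the frame dimensions")
    # Planes are row-major, so appending the alpha rows places them
    # directly below the original luma rows.
    y2 = bytes(y) + bytes(alpha)
    # The bottom halves of U and V carry no information: fill with the dummy value.
    padding = bytes([dummy]) * chroma
    return y2, bytes(u) + padding, bytes(v) + padding
```

The encoder then sees an ordinary frame of size width by 2 * height.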
Decoding:
The decoder will decode the encoded frame with the A-channel and output YUV frames of twice the original source height. Again, the decoder does not know about the presence of the A-channel in the decoded data. If the flag is set in the container, a component after the decoder must reconstruct the original YUVA frame. The YUV and A-channels are then passed on to the renderer.
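The reconstruction step after the decoder could be sketched as follows, again assuming tightly packed 4:2:0 planes with even dimensions. Here height is the original source height (half the decoded height), and the function name unpack_double_height is hypothetical.

```python
from typing import Tuple

def unpack_double_height(y2: bytes, u2: bytes, v2: bytes,
                         width: int, height: int) -> Tuple[bytes, bytes, bytes, bytes]:
    """Split a decoded double-height frame into (Y, U, V, A) at the source height."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    if len(y2) != 2 * luma or len(u2) != 2 * chroma or len(v2) != 2 * chroma:
        raise ValueError("decoded planes are not twice the source height")
    # Top half of the Y plane is the real luma; bottom half is the A-channel.
    y, alpha = y2[:luma], y2[luma:]
    # Only the top halves of U and V are meaningful; the dummy halves are discarded.
    return y, u2[:chroma], v2[:chroma], alpha
```

This step would run only when the container flag signals that alpha is present; otherwise the decoded frame is passed through unchanged.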
Pros and Cons