Alpha Channel

Objective

The purpose of this document is to define a method for supporting alpha channel information in WebM video encoded with VP8.

Background

One of the most requested features for WebM as used in HTML5 is alpha channel support, i.e. a per-pixel value in each video frame that indicates the desired transparency, where 0 is completely transparent (effectively masked) and 255 is completely opaque. Values in between specify degrees of opacity: the rendered pixel is a weighted blend of the normally occluding (foreground) pixel and the normally occluded (background) pixel, with the alpha value as the weight.
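
As a concrete illustration (not part of the spec), the blend described above can be written as a simple per-channel function in which the 8-bit alpha value weights the foreground against the background:

```python
def blend(fg, bg, alpha):
    """Blend one channel of a foreground sample over a background sample
    using an 8-bit alpha value (0 = fully transparent, 255 = fully opaque)."""
    return (fg * alpha + bg * (255 - alpha)) // 255

# alpha = 0 masks the foreground entirely; alpha = 255 shows only the foreground.
```

With alpha = 0 the result is the background value unchanged, and with alpha = 255 it is the foreground value unchanged; intermediate values interpolate between the two.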


Use Cases

  • Alpha channel data should be pixel-perfect
  • Loss in alpha channel data is acceptable


Design Ideas

Method 1 - VP8 encoding of A-channel

Encoding:
The YUV frame is sent to the encoder (without the A-channel), and the encoded output is placed into a Block element in the container. The A-channel is also VP8 encoded (with the A-channel in the Y plane and dummy values in the U and V planes), and the encoded output is placed in the BlockAdditional element of the container. The A-channel uses a separate encoder instance from that of the YUV frames (to exploit temporal coherence). Though this method might not give pixel-perfect alpha information, it will be perceptually lossless given enough datarate.
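
The split described above can be sketched as follows. The function name and plane representation (flat lists of 8-bit samples, with quarter-sized 4:2:0 chroma planes) are illustrative, not part of the design:

```python
def split_for_method1(y, u, v, a):
    """Split a planar YUVA frame into two VP8 encoder inputs (Method 1):
    the YUV planes for the Block element, and a frame carrying the alpha
    plane in Y with neutral dummy chroma for the BlockAdditional element."""
    DUMMY = 128  # neutral chroma placeholder for the alpha frame's U/V planes
    yuv_frame = (y, u, v)                                  # -> Block
    alpha_frame = (a, [DUMMY] * len(u), [DUMMY] * len(v))  # -> BlockAdditional
    return yuv_frame, alpha_frame
```

Each returned tuple would then be fed to its own VP8 encoder instance.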




Decoding:
The Block data is sent to one VP8 decoder and the BlockAdditional data (if present) is sent to another VP8 decoder; a component after the decoders reconstructs the appropriate YUV frame and A-channel, which are then passed on to the renderer.
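
A minimal sketch of the reconstruction step (illustrative names; planes as flat sample lists): the alpha decoder's Y plane becomes the A-channel, and its dummy chroma planes are discarded:

```python
def merge_method1(yuv_planes, alpha_planes):
    """Recombine the two decoder outputs into one YUVA frame: take Y/U/V
    from the Block decoder, and the Y plane of the BlockAdditional decoder
    as alpha, dropping its dummy chroma."""
    y, u, v = yuv_planes
    a, _dummy_u, _dummy_v = alpha_planes  # chroma of the alpha frame is ignored
    return y, u, v, a
```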




Alternatives Considered

Method 2 - Lossless encoding of A-channel
This method is similar to Method 1, the only difference being that the A-channel is encoded with a lossless technique (to be defined later) and placed into the BlockAdditional element of the container (exact spec to be defined later). This is optimized so that the BlockAdditional is present only when the A-channel changes (e.g. if the A-channel doesn't change between frame 20 and frame 35, there will be a BlockAdditional only on frame 20 and not on frames 21 through 35). The lossless encoding technique can be similar to the one used for alpha channels in WebP, so essentially it would be the alpha part of a standard WebP frame. Again, this is tentative; if there is a better lossless encoding method that exploits temporal redundancy, we can use that instead.
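
The "BlockAdditional only on change" optimization can be sketched as follows; `alpha_block_stream` and `lossless_encode` are illustrative names, with `lossless_encode` standing in for the yet-to-be-defined lossless codec:

```python
def alpha_block_stream(alpha_planes, lossless_encode):
    """For each frame's alpha plane, yield an encoded block only when the
    plane differs from the previous frame's; yield None (no BlockAdditional
    is written) when the alpha is unchanged."""
    prev = None
    for a in alpha_planes:
        if a != prev:
            yield lossless_encode(a)
            prev = a
        else:
            yield None
```

On the decode side, a missing BlockAdditional would mean "reuse the previously decoded alpha plane."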

Methods 1 and 2 need vpx to support a new paradigm of running multiple encoder/decoder instances at the same time (one for the video data and one for the alpha channel).

Method 3 - Double height and VP8 encoding
Encoding:

The YUV frame height is doubled and the A-plane information is stored in the bottom half of the Y plane. The bottom halves of the U and V planes are not used and are set to fixed dummy values. This frame is then sent to the encoder, which is unaware of the A-channel's presence in the raw YUV frame. A flag is added to the container indicating the presence of an alpha channel.
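
The packing step can be sketched as follows (illustrative names; planes as flat row-major sample lists, with 4:2:0 chroma):

```python
def pack_double_height(y, u, v, a):
    """Build a double-height YUV frame (Method 3): the alpha plane occupies
    the bottom half of the Y plane, and the bottom halves of U and V are
    filled with a fixed dummy value. The encoder sees an ordinary YUV frame
    of twice the source height."""
    DUMMY = 128  # fixed dummy value for the unused chroma halves
    y2 = y + a                         # alpha rows appended below the luma rows
    u2 = u + [DUMMY] * len(u)
    v2 = v + [DUMMY] * len(v)
    return y2, u2, v2
```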






Decoding:
The decoder will decode the encoded frame and output YUV frames of twice the original source height. Again, the decoder is unaware of the A-channel in the decoded data. If the flag is set in the container, a component after the decoder must reconstruct the original YUVA frame. The YUV and A-channels are then passed on to the renderer appropriately.
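
The post-decoder reconstruction is the inverse of the packing step; a minimal sketch with illustrative names, run only when the container flag signals an alpha channel:

```python
def unpack_double_height(y2, u2, v2):
    """Split a decoded double-height frame back into the original Y/U/V
    planes and the alpha plane, discarding the dummy bottom halves of the
    chroma planes."""
    half = len(y2) // 2
    y, a = y2[:half], y2[half:]   # top half is luma, bottom half is alpha
    u = u2[:len(u2) // 2]         # drop dummy chroma
    v = v2[:len(v2) // 2]
    return y, u, v, a
```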






Pros and Cons

Method 1 (VP8 encoding of A-channel)

Pros:
  • Least amount of overhead

Cons:
  • Multiple encoders/decoders may be necessary (one for YUV frames and one for alpha frames)
  • VP8's lossy nature may result in alpha channel data significantly different from the original (although this can be overcome by running the alpha encoder at a higher bitrate and clamping on the decoding side)
  • Most changes to tools and players

Method 2 (Lossless encoding of A-channel)

Pros:
  • No need to run multiple instances of the VP8 encoder/decoder
  • Alpha data will be pixel-perfect
  • If we use the same lossless method as WebP, libwebp code can be reused for lossless encoding/decoding

Cons:
  • Overhead from lossless compression may be high
  • Most changes to tools and players

Method 3 (Height doubling and VP8 encoding)

Pros:
  • Easy to implement; least amount of changes to tools and players to support this format

Cons:
  • Not 100% backward compatible (players that don't recognize the flag might play a video of twice the height, where the bottom half is junk)
  • VP8's lossy nature may result in alpha channel data significantly different from the original
  • Every frame must be encoded with alpha and bogus UV data; if the alpha does not change for a long time, the encoder may waste bits (hopefully not too much, since the data is not changing)
  • If the alpha does not change, the decoder will waste cycles decoding redundant data

Lou Quillio, Dec 7, 2012, 3:34 PM