
WebM Encryption RFC

Author: Frank Galligan


The purpose of this RFC is to get comments on a mechanism for adding AES encryption to the WebM specification.


There is a proposal at the W3C to add extensions for encrypted media. In order for WebM to be supported, we must define a system-independent way of encrypting the files.

Matroska has support for encrypting certain elements with AES (the ContentEncryption element) but does not define how they are encrypted with AES.
Please discuss on webm-discuss.

1.0 Definitions

1.1 AES
Advanced Encryption Standard

1.2 Block Cipher
An encryption algorithm that works on fixed length blocks of data.

1.3 Counter Block
This is the block used to generate the keystream with AES-CTR.

1.4 CTR
A mode of AES encryption that uses Counter Blocks to generate a key stream that is then XORed with the plaintext to produce the ciphertext.

1.5 Initialization Vector
A non-secret, fixed-size auxiliary input to a cryptographic algorithm, used to prevent certain classes of attacks.

1.6 Live Streaming
Media that is captured and sent to users at a specific time.

1.7 CENC
MPEG Common Encryption (ISO/IEC 23001-7)

1.8 VOD
Video on demand. Previously recorded media files that are watched when a user decides to watch them.

Table Definitions

Cells in orange = Proposed additions
L = Level
ID = Matroska/WebM Element ID
D = Default
T = Type

2.0 Use Cases

2.1 Playback of encrypted content over a network

In this use case a content distributor wants to serve protected content to users. The users want to watch the encrypted content, while also seeking to other times within the media.

2.2 Playback of encrypted content from a storage medium

In this use case the user wants to play back the encrypted content from local storage.

2.3 Out of order decryption

In this use case encrypted frames may arrive at a client out of order. The client may want to decrypt the frames as soon as they arrive. This use case comes from WebRTC, which decodes video frames that arrive out of order.

3.0 Requirements

3.1 Main Requirements

3.1.1 Propose as small a number of combinations of encryption parameters as possible. Ideally this would be one.
3.1.2 Try to add as little overhead to the stream data as possible.
3.1.3 Support for seeking within VOD files.
3.1.4 Keep the added latency after a seek as low as possible.
3.1.5 Support for live streaming.
3.1.6 Strive for compatibility with CENC.
3.1.7 Support as low a startup latency as possible.

Design Idea

4. WebM Common Encryption with Integrity Checking

Having one common encryption for WebM will have benefits on the delivery side as well as on clients.

4.1 Common Encryption Format

The WebM common encryption algorithm will be AES. The key size will be 128 bits. Information on how the blocks are encrypted will be stored in the Track element and interleaved with the Block’s data.

4.2 New Matroska/WebM elements

The idea is to add a master element named ContentEncAESSettings as a sub-element of the ContentEncryption element. It would contain elements representing the features of AES. ContentEncAESSettings will currently contain one sub-element, AESSettingsCipherMode, which conveys the block cipher mode used with the AES encryption. AESSettingsCipherMode will currently only allow one value, CTR.

Element Name          | L | ID       | D | T | Description
ContentEncryption     | 5 | [50][35] | - | m | Settings describing the encryption used. Must be present if the value of ContentEncodingType is 1 and absent otherwise.
ContentEncAESSettings | 6 | [47][E7] | - | m | Settings describing the encryption algorithm used. If ContentEncAlgo != 5 this must be absent.
AESSettingsCipherMode | 7 | [47][E8] | 1 | u | The cipher mode used in the encryption. Predefined values: 1 - CTR

With the new elements, clients should be able to decrypt frames encrypted with AES.

4.3 Supported Matroska Encryption Elements

Below is a list of Matroska elements and values that would be added to the WebM specification.

  • Add support for ContentEncryption.
  • Add support for ContentEncAlgo. Only a value of 5 (AES) will be supported.
  • Add support for ContentEncKeyID.
  • Add support for ContentEncAESSettings.
  • Add support for AESSettingsCipherMode. Only a value of 1 (CTR) will be supported.
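
The constraints in the list above can be sketched as a small check. This is only an illustration: it assumes the element values have already been parsed from the Track header into a plain dictionary, and both the dictionary layout and the function name are hypothetical. Only the element names, the values 5 and 1, and the default of 1 for AESSettingsCipherMode come from this proposal.

```python
# Hypothetical helper: the dict layout is an assumption for illustration,
# not the API of any existing WebM parser.
CONTENT_ENC_ALGO_AES = 5   # ContentEncAlgo value for AES
AES_CIPHER_MODE_CTR = 1    # AESSettingsCipherMode value for CTR


def is_supported_encryption(content_encryption: dict) -> bool:
    """Return True if parsed ContentEncryption settings match this proposal."""
    if content_encryption.get("ContentEncAlgo") != CONTENT_ENC_ALGO_AES:
        return False
    aes_settings = content_encryption.get("ContentEncAESSettings")
    if aes_settings is None:
        # Assumption: a client expects the master element to be present
        # for AES streams; the proposal does not state this explicitly.
        return False
    # AESSettingsCipherMode has a default of 1 (CTR) when it is absent.
    mode = aes_settings.get("AESSettingsCipherMode", AES_CIPHER_MODE_CTR)
    return mode == AES_CIPHER_MODE_CTR
```

A client could run such a check once per Track and refuse to decrypt any stream that reports other values.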

4.4 Encrypted Block Format

The payload of the Encrypted Blocks will consist of three parts. The first part is the Signal Byte. The second part is the IV. The last part of an encrypted Block payload will be the frame data.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Signal Byte  |                                               |
+-+-+-+-+-+-+-+-+             IV                                |
|                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+                                               |
:               Bytes 1..N of encrypted frame                   :
|                                                               |
|                                                               |

4.5 Unencrypted Block Format

The payload of the unencrypted Blocks will consist of two parts. The first part is the Signal Byte. The last part of an unencrypted Block payload will be the frame data.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Signal Byte  |                                               |
+-+-+-+-+-+-+-+-+                                               |
:               Bytes 1..N of unencrypted frame                 :
|                                                               |
|                                                               |

4.6 Signal Byte Format

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|X|   RSV     |E|
+-+-+-+-+-+-+-+-+
X bit
Extension bit. If set there will be another signal byte following this byte. Used for future expansion. Currently this MUST be set to 0.

RSV bits
Bits reserved for future use.  MUST be set to zero and MUST be ignored.

E bit
Encrypted bit. If set the Block MUST contain an IV immediately followed by an encrypted frame. If not set the Block MUST NOT include an IV and the frame MUST be unencrypted. The unencrypted frame MUST immediately follow the Signal Byte.
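
As a rough sketch, building and parsing a Block payload with this Signal Byte could look as follows. Two things are assumptions rather than statements of this RFC: bit 0 in the diagram is taken to be the most significant bit (so X is 0x80 and E is 0x01), and a reader that sees the X bit set raises an error. The 8-byte IV size follows from sections 4.7.1 and 4.8. The function names are hypothetical, and no encryption is performed here.

```python
from typing import Optional, Tuple

SIGNAL_EXTENSION_BIT = 0x80  # X bit; assumes diagram bit 0 is the MSB
SIGNAL_ENCRYPTED_BIT = 0x01  # E bit; assumes diagram bit 7 is the LSB
IV_SIZE = 8                  # bytes; the IV is a 64-bit value


def build_block_payload(frame: bytes, iv: Optional[bytes] = None) -> bytes:
    """Prefix frame data with the Signal Byte and, if encrypted, the IV.

    `frame` must already be ciphertext when `iv` is given.
    """
    if iv is None:
        return bytes([0x00]) + frame
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be 8 bytes")
    return bytes([SIGNAL_ENCRYPTED_BIT]) + iv + frame


def parse_block_payload(payload: bytes) -> Tuple[Optional[bytes], bytes]:
    """Split a payload into (iv or None, frame data)."""
    if not payload:
        raise ValueError("empty Block payload")
    signal = payload[0]
    if signal & SIGNAL_EXTENSION_BIT:
        # Assumption: treat an extension byte as an error in this revision.
        raise ValueError("extension bit set")
    # RSV bits are deliberately ignored.
    if signal & SIGNAL_ENCRYPTED_BIT:
        if len(payload) < 1 + IV_SIZE:
            raise ValueError("truncated IV")
        return payload[1:1 + IV_SIZE], payload[1 + IV_SIZE:]
    return None, payload[1:]
```

A returned IV of `None` tells the caller that the frame data can be passed to the decoder without decryption.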

4.7 Initialization Vector

The IV MUST be unique for every frame for a given key. The IV SHOULD start with a random value on the first encrypted frame.

4.7.1 Incrementing Initialization Vector

The IV MUST be increased by 1 for every encrypted frame. The IV MUST be stored as a raw stream of bytes. For the purpose of incrementing, the IV is treated as an unsigned 64-bit number that wraps around: if the IV value of the current encrypted frame is 0xFFFFFFFFFFFFFFFF, then the IV value of the next encrypted frame should be 0.
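
A minimal sketch of the increment with wraparound. It assumes the 8 raw IV bytes are interpreted most significant byte first; this RFC does not spell out the byte order, so treat that choice, and the function name, as illustrative only.

```python
def next_iv(iv: bytes) -> bytes:
    """Return the IV for the next encrypted frame (unsigned 64-bit, wrapping)."""
    if len(iv) != 8:
        raise ValueError("IV must be 8 bytes")
    value = int.from_bytes(iv, "big")  # byte order is an assumption
    return ((value + 1) % (1 << 64)).to_bytes(8, "big")
```

With this interpretation, an IV of eight 0xFF bytes is followed by an IV of eight zero bytes.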

4.8 CTR Counter Block Format Generation

The Counter Block Format generation is only valid if the stream has ContentEncAlgo=5 and AESSettingsCipherMode=1. If the stream has any other values, this Counter Block Format generation MUST NOT be used.

Every encrypted frame will need to reinitialize the decryptor with a unique Counter Block. Each Counter Block has a requirement that it must be unique within the same stream for the same encryption key. All Counter Blocks MUST be 16 bytes.

The most significant 8 bytes of the Counter Block are the IV, which is set from the IV data in the encrypted Block. The least significant 8 bytes are the Block Counter, which is initialized to 0.
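
The layout described above can be sketched as a simple concatenation. The function name is hypothetical, and writing the Block Counter most significant byte first is an assumption that follows from the "most significant / least significant" wording.

```python
def make_counter_block(iv: bytes, block_counter: int = 0) -> bytes:
    """Build the 16-byte Counter Block: 8-byte IV followed by 8-byte Block Counter."""
    if len(iv) != 8:
        raise ValueError("IV must be 8 bytes")
    if not 0 <= block_counter < (1 << 64):
        raise ValueError("Block Counter must fit in 64 bits")
    return iv + block_counter.to_bytes(8, "big")
```

For an IV of 0xFFFFFFFFFFFFFFFE this yields 0xFFFFFFFFFFFFFFFE0000000000000000 as the initial Counter Block of the frame.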

4.9 Excess Key Stream Data

After encrypting a frame there may be excess key stream data. This data MUST be discarded before the next frame is encrypted.
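
To illustrate where the excess comes from: AES produces key stream in 16-byte blocks, so any frame whose length is not a multiple of 16 leaves unused bytes. The sketch below only does the bookkeeping and the XOR step. It assumes the key stream bytes were produced elsewhere by AES over the frame's Counter Blocks, and both function names are hypothetical.

```python
from typing import Tuple

AES_BLOCK_SIZE = 16  # bytes of key stream produced per Counter Block


def keystream_usage(frame_len: int) -> Tuple[int, int]:
    """Return (counter blocks needed, excess key stream bytes) for one frame."""
    blocks = (frame_len + AES_BLOCK_SIZE - 1) // AES_BLOCK_SIZE
    return blocks, blocks * AES_BLOCK_SIZE - frame_len


def apply_keystream(frame: bytes, keystream: bytes) -> bytes:
    """XOR a frame with its key stream; unused trailing key stream is dropped."""
    if len(keystream) < len(frame):
        raise ValueError("not enough key stream for this frame")
    # zip() stops at the end of the frame, so the excess bytes are never
    # used and must not be carried over to the next frame.
    return bytes(f ^ k for f, k in zip(frame, keystream))
```

A 20-byte frame, for example, needs two Counter Blocks and leaves 12 key stream bytes to discard.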

4.10 Examples

4.10.1 Three Encrypted Frames

IV = 0xFFFFFFFFFFFFFFFE
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFE0000000000000000

IV = 0xFFFFFFFFFFFFFFFF
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFF0000000000000000

IV = 0x0000000000000000
Block Counter = 0x0000000000000000
Counter Block = 0x00000000000000000000000000000000

4.11 Fast startup recommendation

Acquiring keys for decryption may take longer than some clients deem acceptable. To facilitate faster startup, the recommendation is to create Tracks in which the first several frames are unencrypted.

5. Issues

5.1 Lacing

How should lacing be handled?

5.2 Integrity Check

Should we add this back in if devices will not be able to handle it? Should we make it an optional feature? If we add it should we spec it differently? Should we make it stronger?

5.2.1 Handling Integrity Check Failure

After a client encounters a verification failure what should the client do? Return an error and stop playing the stream? Drop the frame and continue playing without notifying the user?

5.3 Key Rotation

Do we need to add key rotation within a single Track?

6. Revision History

0.5  Changed storing of IV values to be a raw stream of bytes.
0.4  Removed HMAC.
0.3  Frames may be encrypted or unencrypted. Adding signal byte to every frame. Adding Use Cases.
0.2  Changing IV prepended to every frame.
0.1  First released revision. All frames encrypted. HMAC prepended to every frame. IV derived from Block timestamp.