Author: Frank Galligan

Objective
The purpose of this RFC is to get comments on a mechanism for adding AES encryption to the WebM specification.

Background
There is a proposal at the W3C to add extensions for encrypted media. In order for WebM to be supported, we must define a system-independent way of encrypting the files.
Matroska has support for encrypting certain elements with AES (the ContentEncryption element) but does not define how they are encrypted with AES.

Please discuss on webm-discuss.

1.0 Definitions

1.1 AES
Advanced Encryption Standard.

1.2 Block Cipher
An encryption algorithm that works on fixed-length blocks of data.

1.3 Counter Block
The block used to generate the key stream with AES-CTR.

1.4 CTR
A mode of AES encryption that uses Counter Blocks to generate a key stream that is then XORed with the plaintext to produce the ciphertext.

1.5 Initialization Vector
A non-secret, fixed-size auxiliary input to a cryptographic algorithm, used to prevent certain classes of attacks.

1.6 Live Streaming
Media that is captured and sent to users at a specific time.

1.7 CENC
MPEG Common Encryption (ISO/IEC 23001-7).

1.8 VOD
Video on demand. Previously recorded media files that are watched when a user decides to watch them.

Table Definitions
Cells in orange = Proposed additions
L = Level
ID = Matroska/WebM Element ID
D = Default
T = Type

2.0 Use Cases

2.1 Playback of encrypted content over a network
In this use case a content distributor wants to serve protected content to users. The users want to watch the encrypted content and also seek to other times within the media.

2.2 Playback of encrypted content from a storage medium
In this use case the user wants to play back the encrypted content from local storage.

2.3 Out of order decryption
In this use case encrypted frames may arrive at a client out of order. The client may want to decrypt the frames as soon as they arrive.
This use case comes from WebRTC, which decodes video frames out of order.

3.0 Requirements

3.1 Main Requirements
3.1.1 Propose as small a number of combinations of encryption parameters as possible. Ideally this would be one.
3.1.2 Add as little overhead to the stream data as possible.
3.1.3 Support seeking within VOD files.
3.1.4 Keep the added latency after a seek as low as possible.
3.1.5 Support live streaming.
3.1.6 Strive for compatibility with CENC.
3.1.7 Support as low a startup latency as possible.

Design Idea

4. WebM Common Encryption with Integrity Checking
Having one common encryption format for WebM will have benefits on the delivery side as well as on clients.

4.1 Common Encryption Format
The WebM common encryption algorithm will be AES. The key size will be 128 bits. Information on how the blocks are encrypted will be stored in the Track element and interleaved with the Block’s data.

4.2 New Matroska/WebM elements
The idea is to add a master element named ContentEncAESSettings as a sub-element of the ContentEncryption element. It would contain elements representing the features of AES. ContentEncAESSettings currently contains one sub-element, AESSettingsCipherMode, which conveys the block cipher mode used with the AES encryption. AESSettingsCipherMode currently has only one value, CTR.
With the new elements, clients should be able to decrypt frames encrypted with AES.

4.3 Supported Matroska Encryption Elements
Below is a list of Matroska elements and values that would be added to the WebM specification.

4.4 Encrypted Block Format
The payload of an encrypted Block consists of three parts. The first part is the Signal Byte. The second part is the IV. The last part is the frame data.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Signal Byte  |                                               |
+-+-+-+-+-+-+-+-+             IV                                |
|                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+                                               |
:               Bytes 1..N of encrypted frame                   :
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.5 Unencrypted Block Format
The payload of an unencrypted Block consists of two parts. The first part is the Signal Byte. The last part is the frame data.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Signal Byte  |                                               |
+-+-+-+-+-+-+-+-+                                               |
:               Bytes 1..N of unencrypted frame                 :
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.6 Signal Byte Format

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|X|   RSV     |E|
+-+-+-+-+-+-+-+-+

X bit
Extension bit. If set, another signal byte follows this byte. Used for future expansion. Currently this MUST be set to 0.

RSV bits
Bits reserved for future use. MUST be set to zero and MUST be ignored.

E bit
Encrypted bit. If set, the Block MUST contain an IV immediately followed by an encrypted frame. If not set, the Block MUST NOT include an IV and the frame MUST be unencrypted. The unencrypted frame MUST immediately follow the Signal Byte.

4.7 Initialization Vector
The IV MUST be unique for every frame for a given key. The IV SHOULD start with a random value on the first encrypted frame.

4.7.1 Incrementing Initialization Vector
The IV MUST be increased by 1 for every encrypted frame. The IV MUST be stored as a raw stream of bytes. For the purpose of incrementing, the IV should be treated as an unsigned 64-bit number.
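
The payload layout and IV handling described in 4.4 through 4.7.1 can be sketched in Python. This is a minimal illustration under stated assumptions, not a normative implementation. It assumes that bit 0 in the Signal Byte diagram is the most significant bit (so X is mask 0x80 and E is mask 0x01), that the IV occupies 8 bytes in line with the 64-bit increment rule, and that the raw IV bytes are read as a big-endian integer when incrementing. The names parse_block_payload and next_iv are hypothetical.

```python
SIGNAL_BYTE_SIZE = 1
IV_SIZE = 8            # assumption: 8-byte IV, matching the 64-bit increment rule
EXTENSION_BIT = 0x80   # assumption: X is the most significant bit of the Signal Byte
ENCRYPTED_BIT = 0x01   # assumption: E is the least significant bit of the Signal Byte


def parse_block_payload(payload: bytes):
    """Split a Block payload into (encrypted, iv, frame_data).

    iv is None for an unencrypted Block. Reserved (RSV) bits are not checked.
    """
    if len(payload) < SIGNAL_BYTE_SIZE:
        raise ValueError("payload is missing the Signal Byte")
    signal = payload[0]
    if signal & EXTENSION_BIT:
        # The X bit currently MUST be 0; further signal bytes are undefined.
        raise ValueError("X bit set, but no extension is defined")
    if not signal & ENCRYPTED_BIT:
        # Unencrypted Block: frame data immediately follows the Signal Byte.
        return False, None, payload[SIGNAL_BYTE_SIZE:]
    iv_end = SIGNAL_BYTE_SIZE + IV_SIZE
    if len(payload) < iv_end:
        raise ValueError("encrypted payload is too short to hold an IV")
    # Encrypted Block: IV, then the encrypted frame.
    return True, payload[SIGNAL_BYTE_SIZE:iv_end], payload[iv_end:]


def next_iv(iv: bytes) -> bytes:
    """Return the IV for the next encrypted frame, wrapping at 2**64."""
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be exactly 8 bytes")
    value = int.from_bytes(iv, "big")  # assumption: big-endian byte order
    return ((value + 1) % (1 << 64)).to_bytes(IV_SIZE, "big")
```

The reserved bits are skipped rather than validated because a reader MUST ignore them. A writer would typically call next_iv once per encrypted frame and leave the IV untouched for unencrypted frames.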
For example, if the IV value of the current encrypted frame is 0xFFFFFFFFFFFFFFFF, the IV value of the next encrypted frame is 0.

4.8 CTR Counter Block Format Generation
The Counter Block Format generation is only valid if the stream has ContentEncAlgo=5 and AESSettingsCipherMode=1. If either value is different, this Counter Block Format generation MUST NOT be used.
The decryptor must be reinitialized with a unique Counter Block for every encrypted frame. Each Counter Block MUST be unique within the same stream for the same encryption key. All Counter Blocks MUST be 16 bytes. The most significant 8 bytes of the Counter Block are the IV, which is set from the IV data in the encrypted Block. The least significant 8 bytes are the Block Counter, which is initialized to 0.

4.9 Excess Key Stream Data
After encrypting a frame there may be excess key stream data. This data MUST be discarded before the next frame is encrypted.

4.10 Examples

4.10.1 Three Encrypted Frames
IV = 0xFFFFFFFFFFFFFFFE
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFE0000000000000000

IV = 0xFFFFFFFFFFFFFFFF
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFF0000000000000000

IV = 0x0000000000000000
Block Counter = 0x0000000000000000
Counter Block = 0x00000000000000000000000000000000

4.11 Fast startup recommendation
Acquiring keys for decryption may take longer than some clients deem acceptable. To facilitate faster startup, the recommendation is to create Tracks whose first few frames are unencrypted.

5. Issues

5.1 Lacing
How should lacing be handled?

5.2 Integrity Check
Should we add this back in if devices will not be able to handle it? Should we make it an optional feature? If we add it, should we spec it differently? Should we make it stronger?

5.2.1 Handling Integrity Check Failure
After a client encounters a verification failure, what should the client do? Return an error and stop playing the stream?
Drop the frame and continue playing without notifying the user?

5.3 Key Rotation
Do we need to add key rotation within a single Track?

6. Revision History
