Fast HEVC-to-VVC video transcoding based on a machine learning approach
- Gabriel Cebrián Márquez (Supervisor)
- Antonio Jesús Díaz Honrubia (Co-supervisor)
Defending university: Universidad de Castilla-La Mancha
Defense date: 20 December 2021
- Manuel Jose Perez Malumbres (Chair)
- José Luis Martínez Martínez (Secretary)
- Glenn Van Wallendael (Member)
Type: Thesis
Abstract
In the last ten years, thanks to rapid progress in the field of video coding, a wide variety of applications have been developed that have substantially changed the way audiovisual content is consumed. Platforms such as YouTube, Netflix or HBO are examples of this, as is the growing demand for multimedia content, which accounted for more than 75% of all Internet traffic during 2020. For this reason, it is necessary to improve the performance of encoders so that they compress the video stream further while maintaining image quality. The High Efficiency Video Coding (HEVC) standard was released in 2013 to replace its predecessor, H.264/Advanced Video Coding (AVC), doubling its compression performance while maintaining the same subjective video quality. However, this increase in coding efficiency came at the expense of a high computational cost in the HEVC codec, especially in the encoder. Consequently, companies and researchers focused on reducing the computational complexity of the encoder to make the deployment of this standard feasible in real-world scenarios, using techniques based on machine learning and fast encoding, as well as the design of hardware-based algorithms. Given the exponential growth in demand for higher-quality and higher-resolution content such as 4K or even 8K, it was expected that HEVC would need to be replaced by a new standard within a few years. The international organisations responsible for video codec standardisation therefore began developing a new video coding standard, initially known as H.266 and later also as Versatile Video Coding (VVC). All coding tools developed to improve the coding efficiency of this new standard were implemented and tested on the so-called Joint Exploration Model (JEM), which achieved a 30% bit-rate reduction for the same objective quality.
However, these tools entailed an enormous increase in computational complexity. In particular, encoding with JEM took 12 times longer than with HEVC in random access scenarios. For this reason, the scientific community and the industry focused their efforts on developing an encoder with an acceptable computational cost. After the most promising algorithms were migrated to a new and faster reference software called the VVC Test Model (VTM), novel coding algorithms were integrated and the existing ones were improved, the VVC standard was finally released in 2020. Whenever a new video format is published, the need arises to convert existing content to it, in this case from HEVC to the new VVC standard. This process, called transcoding, converts a video signal into another by modifying its characteristics, such as bit-rate or resolution. The video platforms mentioned above make massive use of video transcoders to offer their users the same content in different formats, adapting it to the characteristics of the network and devices, which generates a huge demand for computation and storage. However, the high computational complexity introduced in VVC makes a traditional cascaded transcoder from HEVC to VVC unfeasible. To reduce this computational cost, machine learning-based solutions have proven to achieve good results in both fast encoding and transcoding environments with previous standards. These solutions involve a process of data collection and analysis to build accurate prediction models that achieve significant time savings. In the case of transcoding, this data is obtained from the initial bitstream, and the correlation between this information and the decisions made by the transcoder is the key to designing a prediction model that identifies the appropriate patterns to make the optimal decision, which the traditional transcoder would have reached by brute force.
In view of the above, the aim of this Thesis is to propose different techniques for the development of a heterogeneous HEVC-to-VVC transcoder that is efficient in terms of compression and considerably reduces the computational cost with respect to a traditional transcoder. The proposed transcoder is composed of two stages: the HEVC decoder, which extracts different information from the initial bitstream, and the VVC encoder, in which machine learning techniques use the information from the previous stage to speed up the decision-making process. The proposed algorithms focus on assisting the transcoder's decisions on the partitioning structure, since finding the optimal partitioning through a brute-force scheme is the most costly part in terms of complexity. Specifically, the proposed algorithm consists of a naive Bayes classifier for the first level of the quadtree partitioning, followed by the HEVC decisions for the remaining levels, which significantly reduces the computational cost of the transcoding process. The evaluation of the transcoder in random access scenarios shows a reduction in total encoding time of 57.08% with respect to the traditional cascaded transcoder, with a BD-rate penalty of only 2.40%. This Thesis presents one of the fastest transcoding algorithms in the literature, and the first transcoding algorithm from HEVC to the new VVC video coding standard.
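
The partitioning strategy described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from the Thesis: the Gaussian form of the naive Bayes model, the two features (mean HEVC CU depth in the co-located area and a normalised residual energy), the toy training samples and all class and function names are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes classifier (illustrative, pure Python)."""

    def fit(self, samples, labels):
        grouped = defaultdict(list)
        for row, label in zip(samples, labels):
            grouped[label].append(row)
        total = len(samples)
        self.stats = {}
        for label, rows in grouped.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # Small constant keeps the variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                for col, m in zip(columns, means)
            ]
            log_prior = math.log(len(rows) / total)
            self.stats[label] = (log_prior, means, variances)
        return self

    def predict(self, features):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            log_likelihood = sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(features, means, variances)
            )
            return log_prior + log_likelihood

        # Pick the class with the highest (unnormalised) log posterior.
        return max(self.stats, key=log_posterior)


def split_decision(level, features, hevc_split_flag, model):
    """Decide whether to split a block at a given quadtree level.

    Level 0 (the first quadtree level) is decided by the classifier;
    deeper levels simply reuse the split flag decoded from HEVC.
    """
    if level == 0:
        return model.predict(features) == 1
    return hevc_split_flag


# Hypothetical training data: (mean HEVC CU depth, normalised residual energy).
# Label 1 means "split", label 0 means "do not split".
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (2.5, 0.9), (3.0, 0.8), (2.8, 1.0)]
train_y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(train_x, train_y)

if __name__ == "__main__":
    print(split_decision(0, (2.9, 0.9), False, model))  # classifier decides
    print(split_decision(1, (2.9, 0.9), False, model))  # HEVC flag is reused
```

In this sketch the classifier is consulted only once per block, at the first quadtree level, so the costly exhaustive search is avoided there and the deeper levels inherit decisions already available from the decoded HEVC bitstream.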