Motion compensationMotion compensation in computing, is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video data for video compression, for example in the generation of MPEG-2 files. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. The reference picture may be previous in time or even from the future.
Video coding formatA video coding format (or sometimes video compression format) is a content representation format for storage or transmission of digital video content (such as in a data file or bitstream). It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A specific software, firmware, or hardware implementation capable of compression or decompression to/from a specific video coding format is called a video codec.
Sparse approximationSparse approximation (also known as sparse representation) theory deals with sparse solutions for systems of linear equations. Techniques for finding these solutions and exploiting them in applications have found wide use in , signal processing, machine learning, medical imaging, and more. Consider a linear system of equations , where is an underdetermined matrix and . The matrix (typically assumed to be full-rank) is referred to as the dictionary, and is a signal of interest.
Matching pursuitMatching pursuit (MP) is a sparse approximation algorithm which finds the "best matching" projections of multidimensional data onto the span of an over-complete (i.e., redundant) dictionary . The basic idea is to approximately represent a signal from Hilbert space as a weighted sum of finitely many functions (called atoms) taken from . An approximation with atoms has the form where is the th column of the matrix and is the scalar weighting factor (amplitude) for the atom . Normally, not every atom in will be used in this sum.
SignalIn signal processing, a signal is a function that conveys information about a phenomenon. Any quantity that can vary over space or time can be used as a signal to share messages between observers. The IEEE Transactions on Signal Processing includes audio, video, speech, , sonar, and radar as examples of signals. A signal may also be defined as observable change in a quantity over space or time (a time series), even if it does not carry information.
Advanced Video CodingAdvanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers . It supports a maximum resolution of 8K UHD. The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (i.
WaveletA wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed a "brief oscillation". A taxonomy of wavelets has been established, based on the number and direction of its pulses. Wavelets are imbued with specific properties that make them useful for signal processing. For example, a wavelet could be created to have a frequency of Middle C and a short duration of roughly one tenth of a second.
Sparse dictionary learningSparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than the one of the signals being observed.
VideoVideo is an electronic medium for the recording, copying, playback, broadcasting, and display of moving media. Video was first developed for mechanical television systems, which were quickly replaced by cathode-ray tube (CRT) systems which, in turn, were replaced by flat panel displays of several types. Video systems vary in display resolution, aspect ratio, refresh rate, color capabilities and other qualities. Analog and digital variants exist and can be carried on a variety of media, including radio broadcasts, magnetic tape, optical discs, , and network streaming.
Digital signal processingDigital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a sequence of numbers that represent samples of a continuous variable in a domain such as time, space, or frequency. In digital electronics, a digital signal is represented as a pulse train, which is typically generated by the switching of a transistor.
High Efficiency Video CodingHigh Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as part of the MPEG-H project as a successor to the widely used Advanced Video Coding (AVC, H.264, or MPEG-4 Part 10). In comparison to AVC, HEVC offers from 25% to 50% better data compression at the same level of video quality, or substantially improved video quality at the same bit rate. It supports resolutions up to 8192×4320, including 8K UHD, and unlike the primarily 8-bit AVC, HEVC's higher fidelity Main 10 profile has been incorporated into nearly all supporting hardware.
Cholesky decompositionIn linear algebra, the Cholesky decomposition or Cholesky factorization (pronounced ʃəˈlɛski ) is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful for efficient numerical solutions, e.g., Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices, and posthumously published in 1924. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.
QR decompositionIn linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthonormal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares problem and is the basis for a particular eigenvalue algorithm, the QR algorithm. Any real square matrix A may be decomposed as where Q is an orthogonal matrix (its columns are orthogonal unit vectors meaning ) and R is an upper triangular matrix (also called right triangular matrix).
LU decompositionIn numerical analysis and linear algebra, lower–upper (LU) decomposition or factorization factors a matrix as the product of a lower triangular matrix and an upper triangular matrix (see matrix decomposition). The product sometimes includes a permutation matrix as well. LU decomposition can be viewed as the matrix form of Gaussian elimination. Computers usually solve square systems of linear equations using LU decomposition, and it is also a key step when inverting a matrix or computing the determinant of a matrix.
Digital videoDigital video is an electronic representation of moving visual images (video) in the form of encoded digital data. This is in contrast to analog video, which represents moving visual images in the form of analog signals. Digital video comprises a series of s displayed in rapid succession, usually at 24, 30, or 60 frames per second. Digital video has many advantages such as easy copying, multicasting, sharing and storage. Digital video was first introduced commercially in 1986 with the Sony D1 format, which recorded an uncompressed standard-definition component video signal in digital form.
Signal processingSignal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing signals, such as sound, , potential fields, seismic signals, altimetry processing, and scientific measurements. Signal processing techniques are used to optimize transmissions, digital storage efficiency, correcting distorted signals, subjective video quality and to also detect or pinpoint components of interest in a measured signal. According to Alan V. Oppenheim and Ronald W.
Video Coding Experts GroupThe Video Coding Experts Group or Visual Coding Experts Group (VCEG, also known as Question 6) is a working group of the ITU Telecommunication Standardization Sector (ITU-T) concerned with standards for compression coding of video, images, audio, and other signals. It is responsible for standardization of the "H.26x" line of video coding standards, the "T.8xx" line of image coding standards, and related technologies.
Neural codingNeural coding (or neural representation) is a neuroscience field concerned with characterising the hypothetical relationship between the stimulus and the individual or ensemble neuronal responses and the relationship among the electrical activity of the neurons in the ensemble. Based on the theory that sensory and other information is represented in the brain by networks of neurons, it is thought that neurons can encode both digital and analog information.
Basis pursuitBasis pursuit is the mathematical optimization problem of the form where x is a N-dimensional solution vector (signal), y is a M-dimensional vector of observations (measurements), A is a M × N transform matrix (usually measurement matrix) and M < N. It is usually applied in cases where there is an underdetermined system of linear equations y = Ax that must be exactly satisfied, and the sparsest solution in the L1 sense is desired. When it is desirable to trade off exact equality of Ax and y in exchange for a sparser x, basis pursuit denoising is preferred.
Inter frameAn inter frame is a frame in a video compression stream which is expressed in terms of one or more neighboring frames. The "inter" part of the term refers to the use of Inter frame prediction. This kind of prediction tries to take advantage from temporal redundancy between neighboring frames enabling higher compression rates. An inter coded frame is divided into blocks known as macroblocks. After that, instead of directly encoding the raw pixel values for each block, the encoder will try to find a block similar to the one it is encoding on a previously encoded frame, referred to as a reference frame.