Nonlinear dimensionality reductionNonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa) itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
Dimensionality reductionDimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable (hard to control or deal with).
Six-dimensional spaceSix-dimensional space is any space that has six dimensions, six degrees of freedom, and that needs six pieces of data, or coordinates, to specify a location in this space. There are an infinite number of these, but those of most interest are simpler ones that model some aspect of the environment. Of particular interest is six-dimensional Euclidean space, in which 6-polytopes and the 5-sphere are constructed. Six-dimensional elliptical space and hyperbolic spaces are also studied, with constant positive and negative curvature.
Singular value decompositionIn linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any matrix. It is related to the polar decomposition. Specifically, the singular value decomposition of an complex matrix M is a factorization of the form where U is an complex unitary matrix, is an rectangular diagonal matrix with non-negative real numbers on the diagonal, V is an complex unitary matrix, and is the conjugate transpose of V.
Matrix normIn mathematics, a matrix norm is a vector norm in a vector space whose elements (vectors) are matrices (of given dimensions). Given a field of either real or complex numbers, let be the K-vector space of matrices with rows and columns and entries in the field . A matrix norm is a norm on . This article will always write such norms with double vertical bars (like so: ).
Euclidean spaceEuclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's Elements, it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidean spaces of any positive integer dimension n, which are called Euclidean n-spaces when one wants to specify their dimension. For n equal to one or two, they are commonly called respectively Euclidean lines and Euclidean planes.
Frobenius inner productIn mathematics, the Frobenius inner product is a binary operation that takes two matrices and returns a scalar. It is often denoted . The operation is a component-wise inner product of two matrices as though they are vectors, and satisfies the axioms for an inner product. The two matrices must have the same dimension - same number of rows and columns, but are not restricted to be square matrices. Given two complex number-valued n×m matrices A and B, written explicitly as the Frobenius inner product is defined as, where the overline denotes the complex conjugate, and denotes Hermitian conjugate.
QuaternionIn mathematics, the quaternion number system extends the complex numbers. Quaternions were first described by the Irish mathematician William Rowan Hamilton in 1843 and applied to mechanics in three-dimensional space. Hamilton defined a quaternion as the quotient of two directed lines in a three-dimensional space, or, equivalently, as the quotient of two vectors. Multiplication of quaternions is noncommutative. Quaternions are generally represented in the form where a, b, c, and d are real numbers; and 1, i, j, and k are the basis vectors or basis elements.
Curse of dimensionalityThe curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases.
Vector spaceIn mathematics and physics, a vector space (also called a linear space) is a set whose elements, often called vectors, may be added together and multiplied ("scaled") by numbers called scalars. Scalars are often real numbers, but can be complex numbers or, more generally, elements of any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called vector axioms. The terms real vector space and complex vector space are often used to specify the nature of the scalars: real coordinate space or complex coordinate space.
Compressed sensingCompressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Nyquist–Shannon sampling theorem. There are two conditions under which recovery is possible.
Matrix (mathematics)In mathematics, a matrix (plural matrices) is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns, which is used to represent a mathematical object or a property of such an object. For example, is a matrix with two rows and three columns. This is often referred to as a "two by three matrix", a " matrix", or a matrix of dimension . Without further specifications, matrices represent linear maps, and allow explicit computations in linear algebra.
Generative adversarial networkA generative adversarial network (GAN) is a class of machine learning framework and a prominent framework for approaching generative AI. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss. Given a training set, this technique learns to generate new data with the same statistics as the training set.
Sampling (statistics)In statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample (termed sample for short) of individuals from within a statistical population to estimate characteristics of the whole population. Statisticians attempt to collect samples that are representative of the population. Sampling has lower costs and faster data collection compared to recording data from the entire population, and thus, it can provide insights in cases where it is infeasible to measure an entire population.
Row and column spacesIn linear algebra, the column space (also called the range or ) of a matrix A is the span (set of all possible linear combinations) of its column vectors. The column space of a matrix is the or range of the corresponding matrix transformation. Let be a field. The column space of an m × n matrix with components from is a linear subspace of the m-space . The dimension of the column space is called the rank of the matrix and is at most min(m, n). A definition for matrices over a ring is also possible.
DimensionIn physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coordinate is needed to specify a point on it - for example, the point at 5 on a number line. A surface, such as the boundary of a cylinder or sphere, has a dimension of two (2D) because two coordinates are needed to specify a point on it - for example, both a latitude and longitude are required to locate a point on the surface of a sphere.
Machine learningMachine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.
Principal component analysisPrincipal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data. Formally, PCA is a statistical technique for reducing the dimensionality of a dataset. This is accomplished by linearly transforming the data into a new coordinate system where (most of) the variation in the data can be described with fewer dimensions than the initial data.
Convolutional neural networkConvolutional neural network (CNN) is a regularized type of feed-forward neural network that learns feature engineering by itself via filters (or kernel) optimization. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, for each neuron in the fully-connected layer 10,000 weights would be required for processing an image sized 100 × 100 pixels.
AutoencoderAn autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction. Variants exist, aiming to force the learned representations to assume useful properties.