Publication

Variance-Reduced Stochastic Learning Under Random Reshuffling

Related concepts (32)

In mathematics, there are many senses in which a sequence or a series is said to be convergent. This article describes various modes (senses or species) of convergence in the settings where they are defined. For a list of modes of convergence, see Modes of convergence (annotated index) Note that each of the following objects is a special case of the types preceding it: sets, topological spaces, uniform spaces, TAGs (topological abelian groups), normed spaces, Euclidean spaces, and the real/complex numbers.

Monotone convergence theorem

In the mathematical field of real analysis, the monotone convergence theorem is any of a number of related theorems proving the convergence of monotonic sequences (sequences that are non-decreasing or non-increasing) that are also bounded. Informally, the theorems state that if a sequence is increasing and bounded above by a supremum, then the sequence will converge to the supremum; in the same way, if a sequence is decreasing and is bounded below by an infimum, it will converge to the infimum.

Topologies on spaces of linear maps

In mathematics, particularly functional analysis, spaces of linear maps between two vector spaces can be endowed with a variety of topologies. Studying space of linear maps and these topologies can give insight into the spaces themselves. The article operator topologies discusses topologies on spaces of linear maps between normed spaces, whereas this article discusses topologies on such spaces in the more general setting of topological vector spaces (TVSs).

Variational autoencoder

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling. It is part of the families of probabilistic graphical models and variational Bayesian methods. Variational autoencoders are often associated with the autoencoder model because of its architectural affinity, but with significant differences in the goal and mathematical formulation. Variational autoencoders are probabilistic generative models that require neural networks as only a part of their overall structure.

Formal proof

In logic and mathematics, a formal proof or derivation is a finite sequence of sentences (called well-formed formulas in the case of a formal language), each of which is an axiom, an assumption, or follows from the preceding sentences in the sequence by a rule of inference. It differs from a natural language argument in that it is rigorous, unambiguous and mechanically verifiable. If the set of assumptions is empty, then the last sentence in a formal proof is called a theorem of the formal system.

Radius of convergence

In mathematics, the radius of convergence of a power series is the radius of the largest disk at the center of the series in which the series converges. It is either a non-negative real number or . When it is positive, the power series converges absolutely and uniformly on compact sets inside the open disk of radius equal to the radius of convergence, and it is the Taylor series of the analytic function to which it converges.

Formal system

A formal system is an abstract structure used for inferring theorems from axioms according to a set of rules. These rules, which are used for carrying out the inference of theorems from axioms, are the logical calculus of the formal system. A formal system is essentially an "axiomatic system". In 1921, David Hilbert proposed to use such a system as the foundation for the knowledge in mathematics. A formal system may represent a well-defined system of abstract thought.

Mean squared error

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate.

Continuous uniform distribution

In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters, and which are the minimum and maximum values. The interval can either be closed (i.e. ) or open (i.e. ). Therefore, the distribution is often abbreviated where stands for uniform distribution.

Stochastic gradient descent

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).

Inverse transform sampling

Inverse transform sampling (also known as inversion sampling, the inverse probability integral transform, the inverse transformation method, Smirnov transform, or the golden rule) is a basic method for pseudo-random number sampling, i.e., for generating sample numbers at random from any probability distribution given its cumulative distribution function. Inverse transformation sampling takes uniform samples of a number between 0 and 1, interpreted as a probability, and then returns the smallest number such that for the cumulative distribution function of a random variable.

Mathematical proof

A mathematical proof is a deductive argument for a mathematical statement, showing that the stated assumptions logically guarantee the conclusion. The argument may use other previously established statements, such as theorems; but every proof can, in principle, be constructed using only certain basic or original assumptions known as axioms, along with the accepted rules of inference. Proofs are examples of exhaustive deductive reasoning which establish logical certainty, to be distinguished from empirical arguments or non-exhaustive inductive reasoning which establish "reasonable expectation".

Conjugate gradient method

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

Root-mean-square deviation

The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed. The RMSD represents the square root of the second sample moment of the differences between predicted values and observed values or the quadratic mean of these differences. These deviations are called residuals when the calculations are performed over the data sample that was used for estimation and are called errors (or prediction errors) when computed out-of-sample.

Weak topology

In mathematics, weak topology is an alternative term for certain initial topologies, often on topological vector spaces or spaces of linear operators, for instance on a Hilbert space. The term is most commonly used for the initial topology of a topological vector space (such as a normed vector space) with respect to its continuous dual. The remainder of this article will deal with this case, which is one of the concepts of functional analysis. One may call subsets of a topological vector space weakly closed (respectively, weakly compact, etc.

Discrete uniform distribution

In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution wherein a finite number of values are equally likely to be observed; every one of n values has equal probability 1/n. Another way of saying "discrete uniform distribution" would be "a known, finite number of outcomes equally likely to happen". A simple example of the discrete uniform distribution is throwing a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of a given score is 1/6.

Formal language

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symbols, letters, or tokens that concatenate into strings of the language. Each string concatenated from symbols of this alphabet is called a word, and the words that belong to a particular formal language are sometimes called well-formed words or well-formed formulas.

Dominated convergence theorem

In measure theory, Lebesgue's dominated convergence theorem provides sufficient conditions under which almost everywhere convergence of a sequence of functions implies convergence in the L1 norm. Its power and utility are two of the primary theoretical advantages of Lebesgue integration over Riemann integration. In addition to its frequent appearance in mathematical analysis and partial differential equations, it is widely used in probability theory, since it gives a sufficient condition for the convergence of expected values of random variables.

Pointwise convergence

In mathematics, pointwise convergence is one of various senses in which a sequence of functions can converge to a particular function. It is weaker than uniform convergence, to which it is often compared. Suppose that is a set and is a topological space, such as the real or complex numbers or a metric space, for example. A net or sequence of functions all having the same domain and codomain is said to converge pointwise to a given function often written as if (and only if) The function is said to be the pointwise limit function of the Sometimes, authors use the term bounded pointwise convergence when there is a constant such that .

Non-uniform random variate generation

Non-uniform random variate generation or pseudo-random number sampling is the numerical practice of generating pseudo-random numbers (PRN) that follow a given probability distribution. Methods are typically based on the availability of a uniformly distributed PRN generator. Computational algorithms are then used to manipulate a single random variate, X, or often several such variates, into a new random variate Y such that these values have the required distribution.