Publication

Linear Complexity Self-Attention With 3rd Order Polynomials

Related concepts (32)

Attention (machine learning)

Machine learning-based attention is a mechanism that mimics cognitive attention. It computes "soft" weights for each word, more precisely for each word's embedding, in the context window. These weights can be computed either in parallel (as in transformers) or sequentially (as in recurrent neural networks). "Soft" weights are recomputed at runtime for every input, in contrast to "hard" weights, which are (pre-)trained, fine-tuned, and then kept frozen. Transformer-based large language models use multiple attention heads.
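To make the idea of runtime "soft" weights concrete, here is a minimal sketch of scaled dot-product attention for a single query in plain Python. It assumes the softmax-of-scaled-dot-products scoring popularized by transformers; the function names are hypothetical, and real models also apply learned query/key/value projections and multiple heads, which are omitted here.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Attention output for one query: a soft-weighted average of the value vectors.

    The weights are recomputed from the inputs on every call, unlike the
    trained ("hard") parameters of a model.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # non-negative, sums to 1
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
# The query resembles the first key, so the first value receives the larger weight.
print(attend([1.0, 0.0], keys, values))
```

Because the weights always sum to 1, the output is a convex combination of the value vectors; changing the query changes the weights without touching any trained parameter.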

Residual neural network

A Residual Neural Network (a.k.a. Residual Network, ResNet) is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs. A Residual Network is a network with skip connections that perform identity mappings, merged with the layer outputs by addition. It behaves like a Highway Network whose gates are opened through strongly positive bias weights. This enables deep learning models with tens or hundreds of layers to train easily and to achieve better accuracy as depth increases.
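The skip connection can be sketched as y = x + F(x), where F is a small stack of weight layers. This plain-Python version assumes a two-layer F with a ReLU in between (a common choice, not the only one); `residual_block` and its parameters are hypothetical names.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]

def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine map W x + b for a small dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def residual_block(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """y = x + F(x), with F(x) = W2 relu(W1 x + b1) + b2 (an assumed two-layer F)."""
    f = linear(W2, b2, relu(linear(W1, b1, x)))  # the learned residual function
    return [xi + fi for xi, fi in zip(x, f)]      # identity skip connection, merged by addition

zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.5, -2.0]
print(residual_block(x, zero, [0.0, 0.0], zero, [0.0, 0.0]))          # → [1.5, -2.0]
print(residual_block(x, identity, [0.0, 0.0], identity, [0.0, 0.0]))  # → [3.0, -2.0]
```

With all weights set to zero, F(x) = 0 and the block reduces to the identity mapping; one common intuition is that this makes very deep stacks easy to optimize, since each block only has to learn a correction to its input.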

Recurrent neural network

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, distinguished by the direction in which information flows between its layers. In contrast to a feedforward neural network, in which information flows in one direction only, an RNN contains cycles, so the output from some nodes can affect subsequent input to the same nodes. Its ability to use internal state (memory) to process input sequences of arbitrary length makes it applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
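The feedback loop can be illustrated with a scalar Elman-style recurrence, h_t = tanh(w_in·x_t + w_rec·h_{t-1} + b). This is a minimal sketch under that assumption (one hidden unit, tanh activation, hypothetical parameter names), not a general RNN implementation. The same loop handles a sequence of any length, and the state h carries information from earlier inputs forward.

```python
import math
from typing import List

def rnn_forward(inputs: List[float], w_in: float, w_rec: float,
                bias: float, h0: float = 0.0) -> List[float]:
    """Return the hidden state after each input of a one-unit recurrent network."""
    h = h0
    states = []
    for x in inputs:
        # The previous state h feeds back into the same unit: this cycle is the memory.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

print(rnn_forward([1.0, 1.0], w_in=1.0, w_rec=0.0, bias=0.0))  # no feedback
print(rnn_forward([1.0, 1.0], w_in=1.0, w_rec=1.0, bias=0.0))  # with feedback
```

With w_rec = 0 the two identical inputs produce identical states; with w_rec = 1 the second state differs because it also depends on the first.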

Polynomial

In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An example of a polynomial of a single indeterminate $x$ is $x^2 - 4x + 7$. An example with three indeterminates is $x^3 + 2xyz^2 - yz + 1$. Polynomials appear in many areas of mathematics and science.
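The two example polynomials can be evaluated with a short sketch. Univariate coefficients are listed from the highest degree down and evaluated with Horner's rule; the multivariate polynomial is stored, as an assumption of this sketch, as a mapping from exponent tuples (for x, y, z) to coefficients. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

def horner(coeffs: List[float], x: float) -> float:
    """Evaluate a univariate polynomial; coeffs run from the highest degree down."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

# Multivariate: map each exponent tuple (for x, y, z) to its coefficient.
Poly = Dict[Tuple[int, ...], float]

def evaluate(poly: Poly, point: Tuple[float, ...]) -> float:
    """Sum of coefficient * product of variable powers over all terms."""
    total = 0.0
    for exps, coeff in poly.items():
        term = coeff
        for value, e in zip(point, exps):
            term *= value ** e
        total += term
    return total

p = [1, -4, 7]                                                   # x^2 - 4x + 7
q = {(3, 0, 0): 1, (1, 1, 2): 2, (0, 1, 1): -1, (0, 0, 0): 1}   # x^3 + 2xyz^2 - yz + 1
print(horner(p, 2))            # → 3.0
print(evaluate(q, (1, 2, 3)))  # → 32.0
```

Horner's rule needs only one multiplication and one addition per coefficient, which is why it is the usual choice for univariate evaluation.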

Polynomial ring

In mathematics, especially in the field of algebra, a polynomial ring or polynomial algebra is a ring (which is also a commutative algebra) formed from the set of polynomials in one or more indeterminates (traditionally also called variables) with coefficients in another ring, often a field. Often, the term "polynomial ring" refers implicitly to the special case of a polynomial ring in one indeterminate over a field. The importance of such polynomial rings stems from the many properties they share with the ring of integers.
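The analogy with the integers shows up most directly in division with remainder: over a field, any polynomial a can be divided by a nonzero b to give a = q·b + r with deg r < deg b, just as integers admit division with remainder. The sketch below works in Q[x] using Python's `fractions.Fraction` for exact rational coefficients; coefficients are stored lowest degree first, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Poly = List[Fraction]  # coefficients, lowest degree first; [] is the zero polynomial

def normalize(p: Sequence) -> Poly:
    """Convert to exact rationals and drop trailing zero coefficients."""
    out = [Fraction(c) for c in p]
    while out and out[-1] == 0:
        out.pop()
    return out

def poly_mul(a: Sequence, b: Sequence) -> Poly:
    """Product of two polynomials: the coefficient lists are convolved."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return normalize(out)

def poly_divmod(a: Sequence, b: Sequence) -> Tuple[Poly, Poly]:
    """Return (q, r) with a = q*b + r and deg r < deg b, computed in Q[x]."""
    r, b = normalize(a), normalize(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]  # cancel the current leading term
        q[shift] = factor
        for i, bi in enumerate(b):
            r[shift + i] -= factor * bi
        r = normalize(r)
    return normalize(q), r

q, r = poly_divmod([-1, 0, 0, 1], [-1, 1])  # (x^3 - 1) / (x - 1)
print(q, r)                                  # quotient x^2 + x + 1, remainder []
```

Dividing x^2 + 1 by 2x gives the quotient x/2 and remainder 1; the non-integer coefficient is why the coefficients are taken from a field (here Q rather than Z) for division to always succeed.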

Artificial neural network

Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a class of machine learning models built on principles of neuronal organization that connectionism identified in the biological neural networks of animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

Monic polynomial

In algebra, a monic polynomial is a nonzero univariate polynomial (that is, a polynomial in a single variable) in which the leading coefficient (the nonzero coefficient of highest degree) is equal to 1. That is to say, a monic polynomial is one that can be written as $x^n + c_{n-1}x^{n-1} + \cdots + c_2x^2 + c_1x + c_0$, with $n \geq 0$. Monic polynomials are widely used in algebra and number theory, since they produce many simplifications and avoid divisions and denominators. Every nonzero polynomial over a field is associated with a unique monic polynomial, obtained by dividing it by its leading coefficient.
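Turning a nonzero polynomial over the rationals into its associated monic polynomial amounts to dividing every coefficient by the leading one. A minimal sketch, assuming coefficients listed from the highest degree down and exact arithmetic via `fractions.Fraction` (the function names are hypothetical):

```python
from fractions import Fraction
from typing import List, Sequence

def strip_leading_zeros(coeffs: Sequence) -> List[Fraction]:
    """Exact rational coefficients with any leading zeros removed."""
    c = [Fraction(x) for x in coeffs]
    while c and c[0] == 0:
        c.pop(0)
    return c

def is_monic(coeffs: Sequence) -> bool:
    """True if the leading (highest-degree, nonzero) coefficient equals 1."""
    c = strip_leading_zeros(coeffs)
    return bool(c) and c[0] == 1

def to_monic(coeffs: Sequence) -> List[Fraction]:
    """Divide a nonzero polynomial by its leading coefficient."""
    c = strip_leading_zeros(coeffs)
    if not c:
        raise ValueError("the zero polynomial has no associated monic polynomial")
    lead = c[0]
    return [x / lead for x in c]

print(to_monic([2, 4, -6]))  # 2x^2 + 4x - 6 becomes x^2 + 2x - 3
```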

Complexity class

In computational complexity theory, a complexity class is a set of computational problems "of related resource-based complexity". The two most commonly analyzed resources are time and memory. In general, a complexity class is defined in terms of a type of computational problem, a model of computation, and a bounded resource like time or memory. In particular, most complexity classes consist of decision problems that are solvable with a Turing machine, and are differentiated by their time or space (memory) requirements.

Local field

In mathematics, a field K is called a (non-Archimedean) local field if it is complete with respect to a topology induced by a discrete valuation v and if its residue field k is finite. Equivalently, a local field is a locally compact topological field with respect to a non-discrete topology. Sometimes the real numbers R and the complex numbers C (with their standard topologies) are also counted as local fields.

Attention

Attention is the concentration of awareness on some phenomenon to the exclusion of other stimuli. It is a process of selectively concentrating on a discrete aspect of information, whether considered subjective or objective. William James (1890) wrote that "Attention is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence."

Cyclotomic polynomial

In mathematics, the $n$th cyclotomic polynomial, for any positive integer $n$, is the unique irreducible polynomial with integer coefficients that is a divisor of $x^n - 1$ and is not a divisor of $x^k - 1$ for any $k < n$. Its roots are all $n$th primitive roots of unity $e^{2i\pi k/n}$, where $k$ runs over the positive integers not greater than $n$ and coprime to $n$ (and $i$ is the imaginary unit). In other words, the $n$th cyclotomic polynomial is equal to

$$\Phi_n(x) = \prod_{\substack{1 \le k \le n \\ \gcd(k, n) = 1}} \left(x - e^{2i\pi k/n}\right).$$

It may also be defined as the monic polynomial with integer coefficients that is the minimal polynomial over the field of the rational numbers of any primitive $n$th root of unity ($e^{2i\pi/n}$ is an example of such a root).
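One way to compute the integer coefficients of the $n$th cyclotomic polynomial $\Phi_n(x)$ is the standard identity $x^n - 1 = \prod_{d \mid n} \Phi_d(x)$: divide $x^n - 1$ by $\Phi_d(x)$ for every proper divisor $d$ of $n$. Because every $\Phi_d$ is monic, the divisions stay within the integers. This is a sketch of that approach, not the only algorithm; coefficients are stored lowest degree first and the helper names are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple

def poly_div_exact(num: List[int], den: List[int]) -> List[int]:
    """Exact division of integer polynomials (lowest degree first) by a monic divisor."""
    num = list(num)
    q = [0] * (len(num) - len(den) + 1)
    for shift in range(len(q) - 1, -1, -1):
        factor = num[shift + len(den) - 1]  # den is monic, so no division is needed
        q[shift] = factor
        for i, d in enumerate(den):
            num[shift + i] -= factor * d
    if any(num):
        raise ValueError("division left a nonzero remainder")
    return q

@lru_cache(maxsize=None)
def cyclotomic(n: int) -> Tuple[int, ...]:
    """Coefficients of Phi_n(x), lowest degree first, for n >= 1."""
    poly = [-1] + [0] * (n - 1) + [1]  # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = poly_div_exact(poly, list(cyclotomic(d)))
    return tuple(poly)

for n in (1, 2, 3, 4, 6, 12):
    print(n, cyclotomic(n))
```

For example, `cyclotomic(6)` returns `(1, -1, 1)`, that is $x^2 - x + 1$. The coefficients are not always in $\{-1, 0, 1\}$: $\Phi_{105}$ is the smallest case with a coefficient of $-2$.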

Irreducible polynomial

In mathematics, an irreducible polynomial is, roughly speaking, a polynomial that cannot be factored into the product of two non-constant polynomials. The property of irreducibility depends on the nature of the coefficients that are accepted for the possible factors, that is, the field to which the coefficients of the polynomial and its possible factors are supposed to belong. For example, the polynomial $x^2 - 2$ is a polynomial with integer coefficients, but, as every integer is also a real number, it is also a polynomial with real coefficients.

Self-harm

Self-harm is intentional behavior that is considered harmful to oneself. This is most commonly regarded as direct injury of one's own skin tissues usually without a suicidal intention. Other terms such as cutting, self-injury, and self-mutilation have been used for any self-harming behavior regardless of suicidal intent. The most common form of self-harm is using a sharp object to cut the skin. Other forms include scratching, hitting, or burning body parts.

Attention deficit hyperactivity disorder

Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder characterised by excessive amounts of inattention, hyperactivity, and impulsivity that are pervasive, impairing in multiple contexts, and otherwise age-inappropriate. ADHD symptoms arise from executive dysfunction, and emotional dysregulation is often considered a core symptom. In children, problems paying attention may result in poor school performance.

Time delay neural network

Time delay neural network (TDNN) is a multilayer artificial neural network architecture whose purpose is to 1) classify patterns with shift-invariance, and 2) model context at each layer of the network. Shift-invariant classification means that the classifier does not require explicit segmentation prior to classification. For the classification of a temporal pattern (such as speech), the TDNN thus avoids having to determine the beginning and end points of sounds before classifying them.
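Shift invariance comes from applying the same weights to every window of consecutive frames, which amounts to a one-dimensional convolution over time. The sketch below assumes a single layer with one hypothetical kernel and, as a simple pooling assumption for the classification step, takes the maximum response over time. Shifting the event in the input only shifts the layer's output, so the pooled score is unchanged and no start or end point has to be located first.

```python
from typing import List, Sequence

def tdnn_layer(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Apply the same weights to every window of len(kernel) consecutive frames.

    This is a 'valid' 1-D cross-correlation over time: the weights are shared
    across time steps, so shifting the input shifts the output.
    """
    width = len(kernel)
    return [
        sum(k * signal[t + i] for i, k in enumerate(kernel)) + bias
        for t in range(len(signal) - width + 1)
    ]

def detect(signal: Sequence[float], kernel: Sequence[float]) -> float:
    """Hypothetical pooling step: the strongest response anywhere in time."""
    return max(tdnn_layer(signal, kernel))

kernel = [1.0, -1.0]          # hypothetical weights: respond to a drop between frames
pattern = [0, 0, 5, 0, 0, 0]  # an event at frame 2
shifted = [0, 0, 0, 0, 5, 0]  # the same event two frames later
print(tdnn_layer(pattern, kernel))  # → [0.0, -5.0, 5.0, 0.0, 0.0]
print(tdnn_layer(shifted, kernel))  # → [0.0, 0.0, 0.0, -5.0, 5.0]
print(detect(pattern, kernel) == detect(shifted, kernel))  # → True
```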

Reciprocal polynomial

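In algebra, given a polynomial

$$p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$$

with coefficients from an arbitrary field, its reciprocal polynomial or reflected polynomial, denoted by $p^*$ or $p^R$, is the polynomial

$$p^*(x) = a_n + a_{n-1}x + \cdots + a_0x^n = x^n p(x^{-1}).$$

That is, the coefficients of $p^*$ are the coefficients of $p$ in reverse order. Reciprocal polynomials arise naturally in linear algebra as the characteristic polynomial of the inverse of a matrix. In the special case where the field is the complex numbers, when

$$p(z) = a_0 + a_1z + a_2z^2 + \cdots + a_nz^n,$$

the conjugate reciprocal polynomial, denoted $p^\dagger$, is defined by

$$p^\dagger(z) = \overline{a_n} + \overline{a_{n-1}}z + \cdots + \overline{a_0}z^n = z^n\,\overline{p\left(\bar{z}^{-1}\right)},$$

where $\overline{a_i}$ denotes the complex conjugate of $a_i$, and is also called the reciprocal polynomial when no confusion can arise.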
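Both constructions reduce to list operations on the coefficients. A minimal sketch, representing a polynomial by its coefficient list with the lowest degree first (an assumption of this sketch; the function names are hypothetical), that also checks the identity $p^*(x) = x^n p(x^{-1})$ at one point:

```python
from typing import List, Sequence

def reciprocal(coeffs: Sequence[complex]) -> List[complex]:
    """p*(x): the coefficient list (lowest degree first) in reverse order."""
    return list(reversed(coeffs))

def conjugate_reciprocal(coeffs: Sequence[complex]) -> List[complex]:
    """p-dagger(z): reverse the coefficients and take complex conjugates."""
    return [c.conjugate() for c in reversed(coeffs)]

def evaluate(coeffs: Sequence[complex], x: complex) -> complex:
    """Horner's rule, starting from the highest-degree coefficient."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

p = [1, 2, 3]         # p(x) = 1 + 2x + 3x^2, so n = 2
print(reciprocal(p))  # → [3, 2, 1], i.e. 3 + 2x + x^2
# Check p*(x) = x^n p(1/x) at x = 2.
print(evaluate(reciprocal(p), 2), 2 ** 2 * evaluate(p, 0.5))  # → 11 11.0
print(conjugate_reciprocal([1 + 2j, 3j]))  # p(z) = (1+2i) + 3i z
```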

Chebyshev polynomials

The Chebyshev polynomials are two sequences of polynomials related to the cosine and sine functions, notated as $T_n(x)$ and $U_n(x)$. They can be defined in several equivalent ways, one of which starts with trigonometric functions. The Chebyshev polynomials of the first kind are defined by

$$T_n(\cos\theta) = \cos(n\theta).$$

Similarly, the Chebyshev polynomials of the second kind are defined by

$$U_n(\cos\theta)\sin\theta = \sin\big((n+1)\theta\big).$$

That these expressions define polynomials in $\cos\theta$ may not be obvious at first sight, but it follows by rewriting $\cos(n\theta)$ and $\sin((n+1)\theta)$ using de Moivre's formula or by using the angle sum formulas for $\cos$ and $\sin$ repeatedly.
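The trigonometric definitions can be checked numerically. This sketch builds the coefficients with the standard three-term recurrence $P_{k+1}(x) = 2x P_k(x) - P_{k-1}(x)$, which both kinds share (with $T_0 = 1$, $T_1 = x$ and $U_0 = 1$, $U_1 = 2x$), then compares $T_n(\cos\theta)$ with $\cos(n\theta)$ and $U_n(\cos\theta)\sin\theta$ with $\sin((n+1)\theta)$ at one sample angle. The recurrence is a well-known equivalent definition not stated in this summary; the function names are hypothetical.

```python
import math
from typing import List

def _chebyshev(n: int, first: List[int]) -> List[int]:
    """Coefficients (lowest degree first) from P_{k+1} = 2x P_k - P_{k-1}, P_0 = 1."""
    prev, curr = [1], first
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in curr]  # multiply P_k by 2x
        for i, c in enumerate(prev):
            nxt[i] -= c                    # subtract P_{k-1}
        prev, curr = curr, nxt
    return curr

def chebyshev_t(n: int) -> List[int]:
    return _chebyshev(n, [0, 1])  # T_1(x) = x

def chebyshev_u(n: int) -> List[int]:
    return _chebyshev(n, [0, 2])  # U_1(x) = 2x

def evaluate(coeffs: List[int], x: float) -> float:
    return sum(c * x ** i for i, c in enumerate(coeffs))

theta = 0.7
x = math.cos(theta)
for n in range(6):
    t_err = abs(evaluate(chebyshev_t(n), x) - math.cos(n * theta))
    u_err = abs(evaluate(chebyshev_u(n), x) * math.sin(theta) - math.sin((n + 1) * theta))
    print(n, chebyshev_t(n), f"max error {max(t_err, u_err):.1e}")
```

For instance, `chebyshev_t(3)` returns `[0, -3, 0, 4]`, that is $T_3(x) = 4x^3 - 3x$, matching $\cos 3\theta = 4\cos^3\theta - 3\cos\theta$.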

Defence mechanism

In psychoanalytic theory, a defence mechanism (American English: defense mechanism) is an unconscious psychological operation that functions to protect a person from anxiety-producing thoughts and feelings related to internal conflicts and outer stressors. Defence mechanisms (Abwehrmechanismen) are unconscious psychological processes employed to defend against feelings of anxiety and unacceptable impulses at the level of consciousness.

Generative adversarial network

A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent approach to generative AI. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks compete with each other in a zero-sum game, where one agent's gain is the other agent's loss. Given a training set, this technique learns to generate new data with the same statistics as the training set.