Publication

Fast multi-view face tracking with pose estimation

Related concepts (30)

In virtual reality (VR) and augmented reality (AR), a pose tracking system detects the precise pose of head-mounted displays, controllers, other objects or body parts within Euclidean space. Pose tracking is often referred to as 6DOF tracking, for the six degrees of freedom in which the pose is often tracked. Pose tracking is sometimes referred to as positional tracking, but the two are separate. Pose tracking is different from positional tracking because pose tracking includes orientation whereas and positional tracking does not.

Facial recognition system

A facial recognition system is a technology potentially capable of matching a human face from a or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and works by pinpointing and measuring facial features from a given image. Development began on similar systems in the 1960s, beginning as a form of computer application. Since their inception, facial recognition systems have seen wider uses in recent times on smartphones and in other forms of technology, such as robotics.

Boosting (machine learning)

In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant (1988, 1989): "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing).

Gaussian blur

In , a Gaussian blur (also known as Gaussian smoothing) is the result of blurring an by a Gaussian function (named after mathematician and scientist Carl Friedrich Gauss). It is a widely used effect in graphics software, typically to reduce and reduce detail. The visual effect of this blurring technique is a smooth blur resembling that of viewing the image through a translucent screen, distinctly different from the bokeh effect produced by an out-of-focus lens or the shadow of an object under usual illumination.

Discriminative model

Discriminative models, also referred to as conditional models, are a class of logistical models used for classification or regression. They distinguish decision boundaries through observed data, such as pass/fail, win/lose, alive/dead or healthy/sick. Typical discriminative models include logistic regression (LR), conditional random fields (CRFs) (specified over an undirected graph), decision trees, and many others. Typical generative model approaches include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, generative adversarial networks and others.

Bootstrap aggregating

Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.

AdaBoost

AdaBoost, short for Adaptive Boosting, is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. Usually, AdaBoost is presented for binary classification, although it can be generalized to multiple classes or bounded intervals on the real line.

Weierstrass transform

In mathematics, the Weierstrass transform of a function f : R → R, named after Karl Weierstrass, is a "smoothed" version of f(x) obtained by averaging the values of f, weighted with a Gaussian centered at x. Specifically, it is the function F defined by the convolution of f with the Gaussian function The factor 1/√(4π) is chosen so that the Gaussian will have a total integral of 1, with the consequence that constant functions are not changed by the Weierstrass transform. Instead of F(x) one also writes Wf.

Motion capture

Motion capture (sometimes referred as mo-cap or mocap, for short) is the process of recording the movement of objects or people. It is used in military, entertainment, sports, medical applications, and for validation of computer vision and robots. In filmmaking and video game development, it refers to recording actions of human actors and using that information to animate digital character models in 2D or 3D computer animation. When it includes face and fingers or captures subtle expressions, it is often referred to as performance capture.

Gaussian function

In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the base form and with parametric extension for arbitrary real constants a, b and non-zero c. It is named after the mathematician Carl Friedrich Gauss. The graph of a Gaussian is a characteristic symmetric "bell curve" shape. The parameter a is the height of the curve's peak, b is the position of the center of the peak, and c (the standard deviation, sometimes called the Gaussian RMS width) controls the width of the "bell".

Decision tree learning

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.

Pose (computer vision)

In the fields of computing and computer vision, pose (or spatial pose) represents the position and orientation of an object, usually in three dimensions. Poses are often stored internally as transformation matrices. The term “pose” is largely synonymous with the term “transform”, but a transform may often include scale, whereas pose does not. In computer vision, the pose of an object is often estimated from camera input by the process of pose estimation.

Face ID

Face ID is a facial recognition system designed and developed by Apple Inc. for the iPhone and iPad Pro. The system allows biometric authentication for unlocking a device, making payments, accessing sensitive data, providing detailed facial expression tracking for Animoji, as well as six degrees of freedom (6DOF) head-tracking, eye-tracking, and other features. Initially released in November 2017 with the iPhone X, it has since been updated and introduced to several new iPhone models, and all iPad Pro models.

Random forest

Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set.

Maximum likelihood estimation

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

Gesture recognition

Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. It is a subdiscipline of computer vision. Gestures can originate from any bodily motion or state, but commonly originate from the face or hand. Focuses in the field include emotion recognition from face and hand gesture recognition since they are all expressions. Users can make simple gestures to control or interact with devices without physically touching them.

GPS tracking unit

A GPS tracking unit, geotracking unit, satellite tracking unit, or simply tracker is a navigation device normally on a vehicle, asset, person or animal that uses satellite navigation to determine its movement and determine its WGS84 UTM geographic position (geotracking) to determine its location. Satellite tracking devices may send special satellite signals that are processed by a receiver. Locations are stored in the tracking unit or transmitted to an Internet-connected device using the cellular network (GSM/GPRS/CDMA/LTE or SMS), radio, or satellite modem embedded in the unit or WiFi work worldwide.

Tracking system

A tracking system, also known as a locating system, is used for the observing of persons or objects on the move and supplying a timely ordered sequence of location data for further processing. A myriad of tracking systems exists. Some are 'lag time' indicators, that is, the data is collected after an item has passed a point for example a bar code or choke point or gate. Others are 'real-time' or 'near real-time' like Global Positioning Systems (GPS) depending on how often the data is refreshed.

Generative model

In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished, following : A generative model is a statistical model of the joint probability distribution on given observable variable X and target variable Y; A discriminative model is a model of the conditional probability of the target Y, given an observation x; and Classifiers computed without using a probability model are also referred to loosely as "discriminative".

Camera resectioning

Camera resectioning is the process of estimating the parameters of a pinhole camera model approximating the camera that produced a given photograph or video; it determines which incoming light ray is associated with each pixel on the resulting image. Basically, the process determines the pose of the pinhole camera. Usually, the camera parameters are represented in a 3 × 4 projection matrix called the camera matrix. The extrinsic parameters define the camera pose (position and orientation) while the intrinsic parameters specify the camera image format (focal length, pixel size, and image origin).