Reinforcement learningReinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected.
Markov decision processIn mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes.
MemoryMemory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, it would be impossible for language, relationships, or personal identity to develop. Memory loss is usually described as forgetfulness or amnesia. Memory is often understood as an informational processing system with explicit and implicit functioning that is made up of a sensory processor, short-term (or working) memory, and long-term memory.
Machine learningMachine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.
Memory consolidationMemory consolidation is a category of processes that stabilize a memory trace after its initial acquisition. A memory trace is a change in the nervous system caused by memorizing something. Consolidation is distinguished into two specific processes. The first, synaptic consolidation, which is thought to correspond to late-phase long-term potentiation, occurs on a small scale in the synaptic connections and neural circuits within the first few hours after learning.
False memoryIn psychology, a false memory is a phenomenon where someone recalls something that did not actually happen or recalls it differently from the way it actually happened. Suggestibility, activation of associated information, the incorporation of misinformation, and source misattribution have been suggested to be several mechanisms underlying a variety of types of false memory. The false memory phenomenon was initially investigated by psychological pioneers Pierre Janet and Sigmund Freud.
IncentiveIn general, incentives are anything that persuade a person to alter their behaviour in the desired manner. It is emphasised that incentives matter by the basic law of economists and the laws of behaviour, which state that higher incentives amount to greater levels of effort and therefore higher levels of performance. An incentive is a powerful tool to influence certain desired behaviors or action often adopted by governments and businesses. Incentives can be broadly broken down into two categories: intrinsic incentives and extrinsic incentives.
Deep reinforcement learningDeep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g.
Decision-makingIn psychology, decision-making (also spelled decision making and decisionmaking) is regarded as the cognitive process resulting in the selection of a belief or a course of action among several possible alternative options. It could be either rational or irrational. The decision-making process is a reasoning process based on assumptions of values, preferences and beliefs of the decision-maker. Every decision-making process produces a final choice, which may or may not prompt action.
Decision treeA decision tree is a decision support hierarchical model that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.
Heuristic (psychology)Heuristics is the process by which humans use mental short cuts to arrive at decisions. Heuristics are simple strategies that humans, animals, organizations, and even machines use to quickly form judgments, make decisions, and find solutions to complex problems. Often this involves focusing on the most relevant aspects of a problem or situation to formulate a solution. While heuristic processes are used to find the answers and solutions that are most likely to work or be correct, they are not always right or the most accurate.
Rational choice theoryRational choice theory refers to a set of guidelines that help understand economic and social behaviour. The theory originated in the eighteenth century and can be traced back to political economist and philosopher, Adam Smith. The theory postulates that an individual will perform a cost-benefit analysis to determine whether an option is right for them. It also suggests that an individual's self-driven rational actions will help better the overall economy. Rational choice theory looks at three concepts: rational actors, self interest and the invisible hand.
SequenceIn mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called elements, or terms). The number of elements (possibly infinite) is called the length of the sequence. Unlike a set, the same elements can appear multiple times at different positions in a sequence, and unlike a set, the order does matter. Formally, a sequence can be defined as a function from natural numbers (the positions of elements in the sequence) to the elements at each position.
Decision intelligenceDecision intelligence is an engineering discipline that augments data science with theory from social science, decision theory, and managerial science. Its application provides a framework for best practices in organizational decision-making and processes for applying machine learning at scale. The basic idea is that decisions are based on our understanding of how actions lead to outcomes. Decision intelligence is a discipline for analyzing this chain of cause and effect, and decision modeling is a visual language for representing these chains.
Sequential spaceIn topology and related fields of mathematics, a sequential space is a topological space whose topology can be completely characterized by its convergent/divergent sequences. They can be thought of as spaces that satisfy a very weak axiom of countability, and all first-countable spaces (especially metric spaces) are sequential. In any topological space if a convergent sequence is contained in a closed set then the limit of that sequence must be contained in as well. This property is known as sequential closure.
Q-learningQ-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
Discrete mathematicsDiscrete mathematics is the study of mathematical structures that can be considered "discrete" (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than "continuous" (analogously to continuous functions). Objects studied in discrete mathematics include integers, graphs, and statements in logic. By contrast, discrete mathematics excludes topics in "continuous mathematics" such as real numbers, calculus or Euclidean geometry.
Incentive programAn incentive program is a formal scheme used to promote or encourage specific actions or behavior by a specific group of people during a defined period of time. Incentive programs are particularly used in business management to motivate employees and in sales to attract and retain customers. Scientific literature also refers to this concept as pay for performance. Motivation Employee incentive programs are programs used to increase overall employee performance.
Influence diagramAn influence diagram (ID) (also called a relevance diagram, decision diagram or a decision network) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following the maximum expected utility criterion) can be modeled and solved. ID was first developed in the mid-1970s by decision analysts with an intuitive semantic that is easy to understand.
Bellman equationA Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the "value" of a decision problem at a certain point in time in terms of the payoff from some initial choices and the "value" of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into a sequence of simpler subproblems, as Bellman's “principle of optimality" prescribes.