Big dataBig data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe big data is the one associated with a large body of information that we could not comprehend when used only in smaller amounts.
DataIn common usage and statistics, data (USˈdætə; UKˈdeɪtə) is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data is usually organized into structures such as tables that provide additional context and meaning, and which may themselves be used as data in larger structures.
Deep learningDeep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
Unsupervised learningUnsupervised learning, is paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data. Neural network tasks are often categorized as discriminative (recognition) or generative (imagination). Often but not always, discriminative tasks use supervised methods and generative tasks use unsupervised (see Venn diagram); however, the separation is very hazy. For example, object recognition favors supervised learning but unsupervised learning can also cluster objects into groups.
Supervised learningSupervised learning (SL) is a paradigm in machine learning where input objects (for example, a vector of predictor variables) and a desired output value (also known as human-labeled supervisory signal) train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias).
MaintenanceThe technical meaning of maintenance involves functional checks, servicing, repairing or replacing of necessary devices, equipment, machinery, building infrastructure, and supporting utilities in industrial, business, and residential installations. Over time, this has come to include multiple wordings that describe various cost-effective practices to keep equipment operational; these activities occur either before or after a failure. Maintenance functions can be defined as maintenance, repair and overhaul (MRO), and MRO is also used for maintenance, repair and operations.
Predictive maintenancePredictive maintenance techniques are designed to help determine the condition of in-service equipment in order to estimate when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted. Thus, it is regarded as condition-based maintenance carried out as suggested by estimations of the degradation state of an item. The main promise of predictive maintenance is to allow convenient scheduling of corrective maintenance, and to prevent unexpected equipment failures.
Data processingData processing is the collection and manipulation of digital data to produce meaningful information. Data processing is a form of information processing, which is the modification (processing) of information in any manner detectable by an observer. The term "Data Processing", or "DP" has also been used to refer to a department within an organization responsible for the operation of data processing programs. Data processing may involve various processes, including: Validation – Ensuring that supplied data is correct and relevant.
Artificial neural networkArtificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.
Options strategyOption strategies are the simultaneous, and often mixed, buying or selling of one or more options that differ in one or more of the options' variables. Call options, simply known as Calls, give the buyer a right to buy a particular stock at that option's strike price. Opposite to that are Put options, simply known as Puts, which give the buyer the right to sell a particular stock at the option's strike price. This is often done to gain exposure to a specific type of opportunity or risk while eliminating other risks as part of a trading strategy.
Data analysisData analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
Data scienceData science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured, and unstructured data. Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession.
Data PreprocessingData preprocessing can refer to manipulation or dropping of data before it is used in order to ensure or enhance performance, and is an important step in the data mining process. The phrase "garbage in, garbage out" is particularly applicable to data mining and machine learning projects. Data collection methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and missing values, amongst other issues. Analyzing data that has not been carefully screened for such problems can produce misleading results.
Real options valuationReal options valuation, also often termed real options analysis, (ROV or ROA) applies option valuation techniques to capital budgeting decisions. A real option itself, is the right—but not the obligation—to undertake certain business initiatives, such as deferring, abandoning, expanding, staging, or contracting a capital investment project. For example, real options valuation could examine the opportunity to invest in the expansion of a firm's factory and the alternative option to sell the factory.
Option (finance)In finance, an option is a contract which conveys to its owner, the holder, the right, but not the obligation, to buy or sell a specific quantity of an underlying asset or instrument at a specified strike price on or before a specified date, depending on the style of the option. Options are typically acquired by purchase, as a form of compensation, or as part of a complex financial transaction.
Data warehouseIn computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise. This is beneficial for companies as it enables them to interrogate and draw insights from their data and make decisions.
Predictive codingIn neuroscience, predictive coding (also known as predictive processing) is a theory of brain function which postulates that the brain is constantly generating and updating a "mental model" of the environment. According to the theory, such a mental model is used to predict input signals from the senses that are then compared with the actual input signals from those senses. With the rising popularity of representation learning, the theory is being actively pursued and applied in machine learning and related fields.
Self-supervised learningSelf-supervised learning (SSL) is a paradigm in machine learning for processing data of lower quality, rather than improving ultimate outcomes. Self-supervised learning more closely imitates the way humans learn to classify objects. The typical SSL method is based on an artificial neural network or other model such as a decision list. The model learns in two steps. First, the task is solved based on an auxiliary or pretext classification task using pseudo-labels which help to initialize the model parameters.
Put optionIn finance, a put or put option is a derivative instrument in financial markets that gives the holder (i.e. the purchaser of the put option) the right to sell an asset (the underlying), at a specified price (the strike), by (or on) a specified date (the expiry or maturity) to the writer (i.e. seller) of the put. The purchase of a put option is interpreted as a negative sentiment about the future value of the underlying stock. The term "put" comes from the fact that the owner has the right to "put up for sale" the stock or index.
Online machine learningIn computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms.