Big data
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. The term is sometimes used loosely, partly because it lacks a formal definition; the interpretation that seems to describe it best is a large body of information that could not be comprehended if it were used only in smaller amounts.
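The link between many columns and false discoveries can be illustrated with a small simulation. The following is a minimal Python sketch, not a standard procedure. It assumes every column is pure noise, uses a 5% significance level, and approximates a correlation test with the rule |r| > 1.96/√n. The function names `pearson_r` and `count_false_discoveries` are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def count_false_discoveries(n_rows, n_cols, seed=0):
    """Count pure-noise columns that look 'significantly' correlated with a pure-noise target.

    Assumption: the large-sample rule |r| > 1.96 / sqrt(n_rows) stands in for
    a two-sided correlation test at roughly the 5% level.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    threshold = 1.96 / math.sqrt(n_rows)
    hits = 0
    for _ in range(n_cols):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(column, target)) > threshold:
            hits += 1
    return hits

# 1,000 columns unrelated to the target: roughly 5% of them (about 50)
# are expected to pass the test purely by chance.
print(count_false_discoveries(n_rows=200, n_cols=1000))
```

None of the flagged columns carries real signal. Adding more attributes raises the number of spurious "findings" unless the significance threshold is corrected for multiple comparisons.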
Open scientific data
Open scientific data or open research data is a type of open data focused on publishing observations and results of scientific activities so that anyone can analyze and reuse them. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to check the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge. The modern concept of scientific data emerged in the second half of the 20th century, with the development of large knowledge infrastructures to compute scientific information and observations.
Scientific community
The scientific community is a diverse network of interacting scientists. It includes many "sub-communities" working on particular scientific fields, and within particular institutions; interdisciplinary and cross-institutional activities are also significant. Objectivity is expected to be achieved by the scientific method. Peer review, through discussion and debate within journals and conferences, assists in this objectivity by maintaining the quality of research methodology and interpretation of results.
Open science
Open science is the movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of society, amateur or professional. Open science is transparent and accessible knowledge that is shared and developed through collaborative networks. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open-notebook science (such as openly sharing data and code), broader dissemination and engagement in science and generally making it easier to publish, access and communicate scientific knowledge.
Scientific method
The scientific method is an empirical method for acquiring knowledge that has characterized the development of science since at least the 17th century, with notable practitioners in previous centuries. It involves careful observation and the application of rigorous skepticism about what is observed, given that cognitive assumptions can distort how one interprets the observation.
Ensemble learning
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete, finite set of alternative models, but typically allows a much more flexible structure to exist among those alternatives.
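Majority voting is one of the simplest ways to combine models; bagging, boosting and stacking are other well-known ensemble methods. This minimal Python sketch shows how a vote is taken. It also shows why an ensemble can beat its members, under the strong and idealised assumption that the classifiers err independently. The helper names are hypothetical.

```python
from collections import Counter
from math import comb

def majority_vote(predictions):
    """Combine one predicted label per model into a single label by plurality vote.

    Ties go to the label encountered first (Counter.most_common keeps insertion order).
    """
    return Counter(predictions).most_common(1)[0][0]

def majority_accuracy(p, k):
    """Accuracy of a majority vote over k binary classifiers (k odd).

    Assumes each classifier is correct with probability p and that their errors
    are independent; real ensembles only approximate this.
    """
    need = k // 2 + 1  # minimum number of correct votes for a correct majority
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(need, k + 1))

print(majority_vote(["cat", "dog", "cat"]))  # cat
print(round(majority_accuracy(0.7, 3), 3))   # 0.784
```

Each member alone is right 70% of the time, but the three-model vote is right 78.4% of the time under the independence assumption. Correlated errors shrink this gain, which is why ensembles try to keep their members diverse.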
Open research
Open research is research that is openly accessible by others. Those who publish research in this way are often concerned with making research more transparent, more collaborative, more wide-reaching, and more efficient. Open research aims to make both research methods and the resulting data freely available, often via the internet, in order to support reproducibility and, potentially, massively distributed research collaboration. In this regard, it is related to both open source software and citizen science.
Machine learning
Machine learning (ML) is an umbrella term for solving problems for which the development of algorithms by human programmers would be cost-prohibitive; instead, the problems are solved by helping machines 'discover' their 'own' algorithms, without being explicitly told what to do by any human-developed algorithm. Recently, generative artificial neural networks have been able to surpass the results of many previous approaches.
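In its simplest form, the idea of a machine "discovering" a rule can be shown by fitting a line to examples. The program is given input/output pairs rather than the rule itself. The following is a minimal Python sketch using ordinary least squares, which is only one very simple learning method and not representative of machine learning in general. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from examples by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Training examples produced by a rule the program is never told: y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The learned parameters can then be applied to inputs that were not in the training data. That step, generalisation, is what distinguishes learning from memorisation.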
Science
Science is a rigorous, systematic endeavor that builds and organizes knowledge in the form of testable explanations and predictions about the universe. Modern science is typically divided into three major branches: the natural sciences (e.g., biology, chemistry, and physics), which study the physical world; the social sciences (e.g., economics, psychology, and sociology), which study individuals and societies; and the formal sciences (e.g., logic, mathematics, and theoretical computer science), which study formal systems governed by axioms and rules.
Scientific consensus
Scientific consensus is the generally held judgment, position, and opinion of the majority or the supermajority of scientists in a particular field of study at any particular time. Consensus is achieved through scholarly communication at conferences, the publication process, replication of reproducible results by others, scholarly debate, and peer review. A conference meant to create a consensus is termed a consensus conference.
Natural science
Natural science is one of the branches of science concerned with the description, understanding and prediction of natural phenomena, based on empirical evidence from observation and experimentation. Mechanisms such as peer review and repeatability of findings are used to try to ensure the validity of scientific advances. Natural science can be divided into two main branches: life science and physical science. Life science is alternatively known as biology, and physical science is subdivided into branches: physics, chemistry, earth science, and astronomy.
Computational science
Computational science, also known as scientific computing, technical computing or scientific computation (SC), is a division of science that uses advanced computing capabilities to understand and solve complex physical problems. This includes algorithms (numerical and non-numerical): mathematical models, computational models, and computer simulations developed to solve sciences (e.
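A computer simulation in this sense replaces an analytic solution with many small numerical steps. The following is a minimal Python sketch. It assumes exponential decay, dy/dt = -k·y, as a toy model problem and uses the explicit (forward) Euler method, the simplest time-stepping scheme. Production codes typically use higher-order or adaptive integrators. `simulate_decay` is a hypothetical helper.

```python
import math

def simulate_decay(y0, k, t_end, dt):
    """Integrate dy/dt = -k*y from t = 0 to t_end with the forward Euler method."""
    steps = round(t_end / dt)
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)  # one Euler step: y_{n+1} = y_n + dt * f(y_n)
    return y

approx = simulate_decay(y0=1.0, k=1.0, t_end=1.0, dt=0.001)
exact = math.exp(-1.0)  # analytic solution y0 * exp(-k * t) evaluated at t = 1
print(approx, exact, abs(approx - exact))
```

Because the model has a known closed-form solution, the numerical error can be measured directly. Shrinking `dt` reduces the error at the cost of more steps, which is the basic accuracy-versus-cost trade-off in scientific computing.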
Data science
Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured, and unstructured data. Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession.
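Extracting insight from noisy, semi-structured data usually starts with parsing and cleaning before any statistics are computed. The following is a minimal Python sketch over a hypothetical sensor log whose `key=value` format is assumed for illustration. It skips a missing value and then compares the mean with the median, a summary that is robust to a single glitch.

```python
import re
import statistics

# Hypothetical raw log: semi-structured text with one missing value and one glitch.
raw_records = [
    "sensor=a temp=21.5",
    "sensor=b temp=22.0",
    "sensor=c temp=n/a",
    "sensor=d temp=21.8",
    "sensor=e temp=95.0",
]

TEMP = re.compile(r"temp=(-?\d+(?:\.\d+)?)")

def extract_temperatures(records):
    """Pull numeric temperatures out of text records, skipping unparseable ones."""
    values = []
    for record in records:
        match = TEMP.search(record)
        if match:
            values.append(float(match.group(1)))
    return values

temps = extract_temperatures(raw_records)
print(temps)                     # [21.5, 22.0, 21.8, 95.0]
print(statistics.mean(temps))    # pulled upward by the 95.0 glitch
print(statistics.median(temps))  # stays near the typical readings
```

Whether 95.0 is an error or a real event is a domain question, which is where the domain knowledge mentioned above comes in.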
Open data
Open data is data that is openly accessible, exploitable, editable and shared by anyone for any purpose. Open data is licensed under an open license. The goals of the open data movement are similar to those of other "open(-source)" movements such as open-source software, open-source hardware, open content, open specifications, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. The growth of the open data movement is paralleled by a rise in intellectual property rights.
Online machine learning
In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques, which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique in areas of machine learning where it is computationally infeasible to train over the entire dataset, which requires out-of-core algorithms.
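The sequential update described above can be sketched with stochastic gradient descent on a linear model. The predictor is adjusted after every example, and each example is then discarded. This minimal Python sketch assumes squared-error loss, a fixed learning rate of 0.1, and a hypothetical noiseless stream generated from y = 2x + 1. Real online learners often decay the learning rate and must cope with noise.

```python
import random

def online_linear_regression(stream, lr=0.1):
    """Fit y ≈ w*x + b with stochastic gradient descent, one example at a time.

    Each (x, y) pair is used for a single update and then discarded, so memory
    use does not grow with the length of the stream.
    """
    w, b = 0.0, 0.0
    for x, y in stream:
        error = (w * x + b) - y  # prediction error on the newest example
        w -= lr * error * x      # gradient of 0.5 * error**2 with respect to w
        b -= lr * error          # gradient of 0.5 * error**2 with respect to b
    return w, b

def example_stream(n, seed=0):
    """Hypothetical data source: lazily yields noiseless samples of y = 2x + 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield x, 2 * x + 1

w, b = online_linear_regression(example_stream(5000))
print(round(w, 3), round(b, 3))  # should be close to 2.0 and 1.0
```

Because `example_stream` is a generator, the full dataset never exists in memory at once. This is the property that makes online updates useful when batch training over the whole dataset is infeasible.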
Feature learning
In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.
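Principal component analysis (PCA) is one of the simplest unsupervised ways to learn a representation from raw data, far simpler than the neural-network methods common today. This minimal Python sketch learns a single feature for 2-D points by finding the dominant eigenvector of their covariance matrix with power iteration. The data and helper names are hypothetical.

```python
import math

def principal_direction(points, iterations=100):
    """Learn a 1-D representation of 2-D points: the first principal component.

    Centres the data, builds the 2x2 covariance matrix, and finds its dominant
    eigenvector by power iteration.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    v = (1.0, 0.0)  # arbitrary starting vector
    for _ in range(iterations):
        vx = cxx * v[0] + cxy * v[1]
        vy = cxy * v[0] + cyy * v[1]
        norm = math.hypot(vx, vy)
        v = (vx / norm, vy / norm)
    return v, (mx, my)

def encode(point, direction, mean):
    """Map a 2-D point to its learned 1-D feature (projection onto the direction)."""
    return (point[0] - mean[0]) * direction[0] + (point[1] - mean[1]) * direction[1]

# Points close to the line y = x, with a small alternating offset across it.
points = [(t + 0.1 * (-1) ** t, t - 0.1 * (-1) ** t) for t in range(-5, 6)]
direction, mean = principal_direction(points)
print(direction)  # close to (0.7071, 0.7071), the direction of y = x
print([round(encode(p, direction, mean), 2) for p in points])
```

The learned direction was not specified by hand; it emerges from the data. Each point can then be described by one number instead of two, with little information lost.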
Unsupervised learning
Unsupervised learning is a paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data. Neural network tasks are often categorized as discriminative (recognition) or generative (imagination). Often, but not always, discriminative tasks use supervised methods and generative tasks use unsupervised methods; however, the separation is very hazy. For example, object recognition favors supervised learning, but unsupervised learning can also cluster objects into groups.
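Clustering is a standard example of learning from unlabeled data. The following is a minimal Python sketch of Lloyd's k-means algorithm on 2-D points. To keep the result deterministic, it assumes the first k points serve as initial centroids; k-means++ or random restarts are more common in practice. The data set is hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Cluster 2-D points into k groups with Lloyd's k-means algorithm.

    Initial centroids are the first k points (an assumption for determinism).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Unlabeled data: nothing says which group a point belongs to.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.8, 1.2), (8.2, 7.8), (7.8, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
print(centroids)
```

The cluster numbers 0 and 1 are arbitrary names. The algorithm discovers the grouping but not what the groups mean, which is the typical limitation of unsupervised methods.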
Center for Open Science
The Center for Open Science is a non-profit technology organization based in Charlottesville, Virginia, with a mission to "increase the openness, integrity, and reproducibility of scientific research." Brian Nosek and Jeffrey Spies founded the organization in January 2013; it was funded mainly by the Laura and John Arnold Foundation, among others. The organization began with work on the reproducibility of psychology research, through the large-scale initiative Reproducibility Project: Psychology.
History of science
The history of science covers the development of science from ancient times to the present. It encompasses all three major branches of science: natural, social, and formal. Science's earliest roots can be traced to Ancient Egypt and Mesopotamia around 3000 to 1200 BCE. These civilizations' contributions to mathematics, astronomy, and medicine influenced later Greek natural philosophy of classical antiquity, wherein formal attempts were made to provide explanations of events in the physical world based on natural causes.
Deep learning
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be supervised, semi-supervised or unsupervised.
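Stacking layers is what lets a network represent functions that a single layer cannot. This minimal Python sketch composes two fully connected layers to compute XOR, which is not linearly separable. The weights are chosen by hand purely for illustration, which is a simplifying assumption. In deep learning they would be learned from data, typically by backpropagation. The helper names are hypothetical.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b), written without libraries."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def xor_network(x1, x2):
    """Two stacked layers computing XOR, which no single linear layer can represent."""
    hidden = dense([x1, x2], weights=[[1.0, 1.0], [1.0, 1.0]], biases=[0.0, -1.0],
                   activation=relu)
    (out,) = dense(hidden, weights=[[1.0, -2.0]], biases=[0.0], activation=identity)
    return out

print([xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0.0, 1.0, 1.0, 0.0]
```

The hidden layer turns the raw inputs into an intermediate representation: the count of active inputs, and whether both are active. The output layer solves the task easily from that representation, which is the sense in which depth supports representation learning.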