Star cluster
Star clusters are large groups of stars held together by self-gravitation. Two main types of star clusters can be distinguished: globular clusters are tight groups of ten thousand to millions of old stars which are gravitationally bound, while open clusters are more loosely clustered groups of stars, generally containing fewer than a few hundred members, and are often very young.
Information content
In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative way of expressing probability, much like odds or log-odds, but which has particular mathematical advantages in the setting of information theory. The Shannon information can be interpreted as quantifying the level of "surprise" of a particular outcome.
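As a minimal sketch of how this quantity is usually computed, the snippet below uses the standard definition of self-information as the negative logarithm of the event's probability. The function name `self_information_bits` and the choice of base 2 (giving a result in bits) are assumptions made for illustration; they do not come from the text above.

```python
import math

def self_information_bits(p: float) -> float:
    """Information content of an event with probability p, in bits.

    Assumes the standard definition -log2(p), written here as log2(1/p).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1.0 / p)

# A certain event carries no surprise; rarer events carry more.
certain = self_information_bits(1.0)
fair_coin = self_information_bits(0.5)
one_in_eight = self_information_bits(0.125)

# For independent events the probabilities multiply and the information adds.
combined = self_information_bits(0.5 * 0.25)
separate = self_information_bits(0.5) + self_information_bits(0.25)
```

The logarithm is what turns products of probabilities into sums of information, which is one of the mathematical advantages over working with raw probabilities.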
Sombrero Galaxy
The Sombrero Galaxy (also known as Messier Object 104, M104 or NGC 4594) is a peculiar galaxy of unclear classification, located on the border between the constellations Virgo and Corvus. It is a member of the Virgo II Groups, a series of galaxies and galaxy clusters strung out from the southern edge of the Virgo Supercluster. Its D25 isophotal diameter makes it slightly larger than the Milky Way.
Inflation (cosmology)
In physical cosmology, cosmic inflation, cosmological inflation, or just inflation, is a theory of exponential expansion of space in the early universe. The inflationary epoch is believed to have lasted only a tiny fraction of a second after the Big Bang. Following the inflationary period, the universe continued to expand, but at a slower rate. The acceleration of this expansion due to dark energy began after the universe was already over 7.7 billion years old (5.4 billion years ago).
K-means clustering
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
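The partitioning described above can be sketched with the common alternating heuristic (often called Lloyd's algorithm), which the text does not name and which is swapped in here purely as an illustration. The helper names, the random initialization from the data points, and the toy `points` data are all assumptions; the procedure finds a local optimum only.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, iterations=100, seed=0):
    """Alternate assignment and update steps until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
```

Using the mean in the update step is what ties the method to squared Euclidean distances; replacing it with a geometric median would target plain Euclidean distances instead, at a higher computational cost.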
Spiral galaxy
Spiral galaxies form a class of galaxy originally described by Edwin Hubble in his 1936 work The Realm of the Nebulae and, as such, form part of the Hubble sequence. Most spiral galaxies consist of a flat, rotating disk containing stars, gas and dust, and a central concentration of stars known as the bulge. These are often surrounded by a much fainter halo of stars, many of which reside in globular clusters. Spiral galaxies are named for the spiral structures that extend from the center into the galactic disc.
Lambda-CDM model
The ΛCDM (Lambda cold dark matter) or Lambda-CDM model is a parameterization of the Big Bang cosmological model in which the universe contains three major components: first, a cosmological constant denoted by Lambda (Greek Λ) associated with dark energy; second, the postulated cold dark matter (abbreviated CDM); and third, ordinary matter.
Single-linkage clustering
In statistics, single-linkage clustering is one of several methods of hierarchical clustering. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining two clusters that contain the closest pair of elements not yet belonging to the same cluster as each other. This method tends to produce long thin clusters in which nearby elements of the same cluster have small distances, but elements at opposite ends of a cluster may be much farther from each other than two elements of other clusters.
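The bottom-up merging described above can be sketched naively as follows. This is an illustrative, unoptimized version: the function name `single_linkage`, the `num_clusters` stopping rule, and the one-dimensional sample values are assumptions, not something the text specifies.

```python
from itertools import combinations

def single_linkage(points, num_clusters, distance):
    """Merge clusters bottom-up until only num_clusters remain.

    At each step the two clusters containing the closest pair of elements
    (one element from each cluster) are combined.
    """
    clusters = [[p] for p in points]  # start with every element on its own
    while len(clusters) > num_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair of clusters
        for i, j in combinations(range(len(clusters)), 2):
            d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because i < j
    return clusters

# Hypothetical one-dimensional data forming a chain and a separate pair.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0]
result = single_linkage(values, num_clusters=2, distance=lambda a, b: abs(a - b))
```

In this toy run the values 1.0 through 4.0 are chained into one cluster through successive nearest neighbors, even though its two end elements are three units apart, which illustrates the tendency toward long, thin clusters.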
Open cluster
An open cluster is a type of star cluster made of tens to a few thousand stars that were formed from the same giant molecular cloud and have roughly the same age. More than 1,100 open clusters have been discovered within the Milky Way galaxy, and many more are thought to exist. They are loosely bound by mutual gravitational attraction and become disrupted by close encounters with other clusters and clouds of gas as they orbit the Galactic Center.
Multistage sampling
In statistics, multistage sampling is the taking of samples in stages using smaller and smaller sampling units at each stage. Multistage sampling can be viewed as a more complex form of cluster sampling, a type of sampling in which the population is divided into groups (or clusters). In cluster sampling, one or more clusters are chosen at random and everyone within each chosen cluster is sampled. Using all the sample elements in all the selected clusters may be prohibitively expensive or unnecessary, so multistage sampling instead draws a further random sample from within each selected cluster.
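A two-stage version of this idea can be sketched as below: clusters are drawn at random first, then a limited number of members is drawn from each selected cluster. The function name `two_stage_sample`, its parameters, and the `schools` data are hypothetical and serve only as an illustration.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: subsample within each one.

    `clusters` maps a cluster name to the list of its members.
    """
    rng = random.Random(seed)
    # Stage 1: choose which clusters to visit (names sorted for reproducibility).
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # Stage 2: take at most n_per_cluster members rather than everyone.
        size = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population: pupils grouped by school.
schools = {
    "school_a": ["a1", "a2", "a3", "a4", "a5"],
    "school_b": ["b1", "b2", "b3"],
    "school_c": ["c1", "c2", "c3", "c4"],
    "school_d": ["d1", "d2"],
}
picked = two_stage_sample(schools, n_clusters=2, n_per_cluster=3, seed=42)
```

Setting `n_per_cluster` at least as large as the biggest cluster reduces this sketch to plain one-stage cluster sampling, where everyone in each chosen cluster is included.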
Multivariate statistics
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., multivariate random variables. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied.
Determining the number of clusters in a data set
Determining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem. For a certain class of clustering algorithms (in particular k-means, k-medoids and the expectation–maximization algorithm), there is a parameter commonly referred to as k that specifies the number of clusters to detect.