DigitizationDigitization is the process of converting information into a digital (i.e. computer-readable) format. The result is the representation of an object, , sound, document, or signal (usually an analog signal) obtained by generating a series of numbers that describe a discrete set of points or samples. The result is called digital representation or, more specifically, a , for the object, and digital form, for the signal.
Information extractionInformation extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction Due to the difficulty of the problem, current approaches to IE (as of 2010) focus on narrowly restricted domains.
Book scanningBook scanning or book digitization (also: magazine scanning or magazine digitization) is the process of converting physical books and magazines into digital media such as , electronic text, or electronic books (e-books) by using an . Large scale book scanning projects have made many books available online. Digital books can be easily distributed, reproduced, and read on-screen. Common file formats are DjVu, Portable Document Format (PDF), and (TIFF).
Knowledge extractionKnowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, s) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema.
Terminology extractionTerminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc.
Internet ArchiveThe Internet Archive is an American digital library founded on May 10, 1996, and chaired by free information advocate Brewster Kahle. It provides free access to collections of digitized materials like websites, software applications, music, audiovisual and print materials. The Archive is also an activist organization, advocating a free and open Internet. , the Internet Archive holds more than 39 million print materials, 11.6 million pieces of audiovisual content, 2.6 million software programs, 15 million audio files, 4.
Automatic summarizationAutomatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data. Text summarization is usually implemented by natural language processing methods, designed to locate the most informative sentences in a given document.
MetadataMetadata (or metainformation) is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords. Structural metadata – metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters.
Preservation (library and archive)In conservation, library and archival science, preservation is a set of preventive conservation activities aimed at prolonging the life of a record, book, or object while making as few changes as possible. Preservation activities vary widely and may include monitoring the condition of items, maintaining the temperature and humidity in collection storage areas, writing a plan in case of emergencies, digitizing items, writing relevant metadata, and increasing accessibility.
HistoryHistory (derived ) is the systematic study and documentation of the human past. The period of events before the invention of writing systems is considered prehistory. "History" is an umbrella term comprising past events as well as the memory, discovery, collection, organization, presentation, and interpretation of these events. Historians seek knowledge of the past using historical sources such as written documents, oral accounts, art and material artifacts, and ecological markers.
Art historyArt history is the study of aesthetic objects and visual expression in historical and stylistic context. Traditionally, the discipline of art history emphasized painting, drawing, sculpture, architecture, ceramics and decorative arts; yet today, art history examines broader aspects of visual culture, including the various visual and conceptual outcomes related to an ever-evolving definition of art. Art history encompasses the study of objects created by different cultures around the world and throughout history that convey meaning, importance or serve usefulness primarily through visual representations.
Google BooksGoogle Books (previously known as Google Book Search, Google Print, and by its code-name Project Ocean) is a service from Google that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database. Books are provided either by publishers and authors through the Google Books Partner Program, or by Google's library partners through the Library Project. Additionally, Google has partnered with a number of magazine publishers to digitize their archives.
Information retrievalInformation retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.
Metadata registryA metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled method. A metadata repository is the database where metadata is stored. The registry also adds relationships with related metadata types. A metadata engine collects, stores and analyzes information about data and metadata (data about data) in use within a domain. Metadata registries are used whenever data must be used consistently within an organization or group of organizations.
Search engineA search engine is a software system that finds web pages that match a web search. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of hyperlinks to web pages, images, videos, infographics, articles, and other types of files. Some search engines also mine data available in databases or open directories.
Information scienceInformation science (also known as information studies) is an academic field which is primarily concerned with analysis, collection, classification, manipulation, storage, retrieval, movement, dissemination, and protection of information. Practitioners within and outside the field study the application and the usage of knowledge in organizations in addition to the interaction between people, organizations, and any existing information systems with the aim of creating, replacing, improving, or understanding the information systems.
InformationInformation is an abstract concept that refers to that which has the power to inform. At the most fundamental level, information pertains to the interpretation (perhaps formally) of that which may be sensed, or their abstractions. Any natural process that is not completely random and any observable pattern in any medium can be said to convey some amount of information. Whereas digital signals and other data use discrete signs to convey information, other phenomena and artefacts such as analogue signals, poems, pictures, music or other sounds, and currents convey information in a more continuous form.
Search engine (computing)A search engine is an information retrieval system designed to help find information stored on a computer system. It is an information retrieval software program that discovers, crawls, transforms, and stores information for retrieval and presentation in response to user queries. The search results are usually presented in a list and are commonly called hits. A search engine normally consists of four components, as follows: a search interface, a crawler (also known as a spider or bot), an indexer, and a database.
Metadata standardA metadata standard is a requirement which is intended to establish a common understanding of the meaning or semantics of the data, to ensure correct and proper use and interpretation of the data by its owners and users. To achieve this common understanding, a number of characteristics, or attributes of the data have to be defined, also known as metadata. Metadata is often defined as data about data.
Intellectual historyIntellectual history (also the history of ideas) is the study of the history of human thought and of intellectuals, people who conceptualize, discuss, write about, and concern themselves with ideas. The investigative premise of intellectual history is that ideas do not develop in isolation from the thinkers who conceptualize and apply those ideas; thus the intellectual historian studies ideas in two contexts: (i) as abstract propositions for critical application; and (ii) in concrete terms of culture, life, and history.