This article reports on the current state of the OBI DICT project, a bilingual e-dictionary of oracle-bone inscriptions (OBI), incorporating artificial intelligence (AI) image recognition technology. It first provides a brief overview of the development of ...
Natural language processing has experienced significant improvements with the development of Transformer-based models, which employ self-attention mechanism and pre-training strategies. However, these models still present several obstacles. A notable issue ...
Text-based games (TBGs) have emerged as useful benchmarks for evaluating progress at the intersection of grounded language understanding and reinforcement learning (RL). Recent work has proposed the use of external knowledge to improve the efficiency of RL ...
Most of the Natural Language Processing (NLP) algorithms involve use of distributed vector representations of linguistic units (primarily words and sentences) also known as embeddings in one way or another. These embeddings come in two flavours namely, sta ...
Voice communication is the main channel to exchange information between pilots and Air-Traffic Controllers (ATCos). Recently, several projects have explored the employment of speech recognition technology to automatically extract spoken key information suc ...
The current information landscape is characterised by a vast amount of relatively semantically homogeneous, when observed in isolation, data silos that are, however, drastically semantically fragmented when considered as a whole. Within each data silo, inf ...
We propose a novel sparse dictionary learning method for planar shapes in the sense of Kendall, namely configurations of landmarks in the plane considered up to similitudes. Our shape dictionary method provides a good trade-off between algorithmic simplici ...
Systems modelling and simulation are two important facets for thoroughly and effectively analysing manufacturing processes. The ever-growing complexity of the latter, the increasing amount of knowledge, and the use of Semantic Web techniques adhering meani ...
In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN). DDLCN has most of the standard deep learning layers (pooling, fully, connected, input/output, etc.) but the main difference is that the fundamental convolutional l ...
Different senses of source words must often be rendered by different words in the target language when performing machine translation (MT). Selecting the correct translation of polysemous words can be done based on the contexts of use. However, state-of-th ...
We propose methods to link automatically parsed linguistic data to the WordNet. We apply these methods on a trilingual dictionary in Fula, English and French. Dictionary entry parsing is used to collect the linguistic data. Then we connect it to the Open M ...
State-of-the-art automatic speech recognition (ASR) and text-to-speech systems require a pronunciation lexicon that maps each word to a sequence of phones. Manual development of lexicons is costly as it needs linguistic knowledge and human expertise. To fa ...
The data compiled through many Wordnet projects can be a rich source of seed information for a multilingual dictionary. However, the original Princeton WordNet was not intended as a dictionary per se, and spawning other languages from it introduces inheren ...
This paper reports on an approach and experiments to automatically build a cross-lingual multi-word entity resource. Starting from a collection of millions of acronym/expansion pairs for 22 languages where expansion variants were grouped into monolingual c ...
European Language Resources Association (ELRA)2016
This paper describes our conceptual framework of closed-loop lifecycle information sharing for product-service in the Internet of Things (IoT). The framework is based on the ontology model of product-service and a type of IoT message standard, Open Messagi ...
Machine Translation (MT) has progressed tremendously in the past two decades. The rule-based and interlingua approaches have been superseded by statistical models, which learn the most likely translations from large parallel corpora. System design does not ...
One of the key challenges involved in building statistical automatic speech recognition (ASR) systems is modeling the relationship between subword units or “lexical units” and acoustic feature observations. To model this relationship two types of resources ...
The product development process (PDP) is a complex process encompassing many very diverse activities, and involving a fairly big number of actors, spread across different professions, teams and companies. In todayâ s product development, more often than n ...
Automatic speech recognition (ASR) systems, through use of the phoneme as an intermediary unit representation, split the problem of modeling the relationship between the written form, i.e., the text and the acoustic speech signal into two disjoint processe ...
This paper looks at the challenges that the Kamusi Project faces for acquiring open lexical data for less-resourced languages (LRLs), of a range, depth, and quality that can be useful within Human Language Technology (HLT). These challenges include accessi ...