Concepts in Automatic identification of non-compositional phrases
Principle of compositionality
In mathematics, semantics, and philosophy of language, the Principle of Compositionality is the principle that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them. This principle is also called Frege's Principle, because Gottlob Frege is widely credited for the first modern formulation of it.
more from Wikipedia
Natural language processing
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output. In theory, natural language processing is a very attractive method of human–computer interaction.
Word
In language, a word is the smallest element that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning). This contrasts with a morpheme, which is the smallest unit of meaning but will not necessarily stand on its own. A word may consist of a single morpheme, or several (rocks, redness, quickly, running, unexpected), whereas a morpheme may not be able to stand on its own as a word (in the words just mentioned, these are -s, -ness, -ly, -ing, un-, -ed).
Text corpus
In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (now usually electronically stored and processed). They are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules on a specific universe. A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus).
Mutual information
In probability theory and information theory, the mutual information (sometimes known by the archaic term transinformation) of two random variables is a quantity that measures the mutual dependence of the two random variables. The most common unit of measurement of mutual information is the bit, when logarithms to the base 2 are used.
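The dependence measure described above can be computed directly from a joint probability table. As a minimal sketch (not taken from the paper this glossary accompanies), the function below sums p(x, y) · log2(p(x, y) / (p(x) · p(y))) over all outcome pairs, yielding mutual information in bits since base-2 logarithms are used:

```python
import math

def mutual_information(joint):
    """Mutual information I(X;Y) in bits, given a joint probability
    table joint[i][j] = P(X = i, Y = j)."""
    # Marginal distributions: row sums give P(X), column sums give P(Y).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # 0 * log(0) is taken as 0 by convention
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Independent coin flips carry no mutual information:
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Two perfectly correlated fair coins share exactly 1 bit:
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

For two independent variables the ratio p(x, y) / (p(x) p(y)) is 1 everywhere, so the measure is zero; it grows as the variables become more dependent.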
Hypothesis
A hypothesis (from Greek ὑπόθεσις; plural hypotheses) is a proposed explanation for a phenomenon. The term derives from the Greek ὑποτιθέναι, hypotithenai, meaning "to put under" or "to suppose". For a hypothesis to be put forward as a scientific hypothesis, the scientific method requires that one can test it. Scientists generally base scientific hypotheses on previous observations that cannot satisfactorily be explained with the available scientific theories.
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments. A statistician is someone who is particularly well versed in the ways of thinking necessary for the successful application of statistical analysis. Such people have often gained this experience through working in any of a wide number of fields.