
Corpus

In law, a corpus (Latin: "body") is a collection of documents and sources. See Corpus Juris Civilis.

In linguistics, a corpus (plural corpora) is a large, structured set of texts (now usually stored and processed electronically). A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora.
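As a rough illustration of what "formatted for side-by-side comparison" can mean, the sketch below pairs sentences from two languages by position. The names `AlignedPair` and `align_by_position` and the sample sentences are hypothetical, and position-based pairing is a simplifying assumption; real alignment work usually has to handle sentences that split, merge, or go missing in translation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One unit of a parallel corpus: a source sentence and its translation."""
    source: str
    target: str


def align_by_position(source_sentences, target_sentences):
    """Pair the i-th source sentence with the i-th target sentence.

    Assumes both sides were already split into the same number of
    sentences; this is an illustrative shortcut, not a real aligner.
    """
    if len(source_sentences) != len(target_sentences):
        raise ValueError("both sides must have the same number of sentences")
    return [AlignedPair(s, t) for s, t in zip(source_sentences, target_sentences)]


# Made-up sample data for illustration only.
english = ["The house is small.", "I like books."]
german = ["Das Haus ist klein.", "Ich mag Bücher."]

parallel_corpus = align_by_position(english, german)

# Print the pairs side by side, one aligned unit per line.
for pair in parallel_corpus:
    print(f"{pair.source} ||| {pair.target}")
```

Keeping each pair as a single record means a lookup on the source side returns its counterpart directly, which is the property that side-by-side comparison relies on.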

To make corpora more useful for linguistic research, they are often annotated.
One example of annotation is part-of-speech tagging (POS tagging), in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags.
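A minimal sketch of how such tags might be attached to a text follows. It uses a tiny hand-written lexicon (`TOY_LEXICON`, hypothetical) and a simple `word/TAG` notation chosen for illustration. Real taggers use context and statistical models rather than a fixed lookup, so this shows only the form of the annotation, not a practical tagging method.

```python
# Hypothetical toy lexicon mapping lowercase words to part-of-speech tags.
# The tag names (DET, ADJ, NOUN, VERB) are illustrative labels.
TOY_LEXICON = {
    "the": "DET",
    "small": "ADJ",
    "dog": "NOUN",
    "barks": "VERB",
}


def tag_tokens(tokens, lexicon=TOY_LEXICON, unknown_tag="X"):
    """Return (token, tag) pairs; words missing from the lexicon get unknown_tag."""
    return [(tok, lexicon.get(tok.lower(), unknown_tag)) for tok in tokens]


def to_annotated_text(tagged):
    """Render tagged tokens in word/TAG form, separated by spaces."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)


sentence = "The small dog barks"
tagged = tag_tokens(sentence.split())
annotated = to_annotated_text(tagged)
print(annotated)  # The/DET small/ADJ dog/NOUN barks/VERB
```

Storing the tag next to each word lets a researcher search the corpus by grammatical category, for example collecting every token tagged as a noun.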

Corpora are the main knowledge base in corpus linguistics.

In biology, corpus refers to the main body/mass/part of an organ or other anatomical structure, distinguished from the head or tail.

See also

  • translation memory
  • parallel text alignment
  • Bank of English
  • British National Corpus


    This article is from Wikipedia. All text is available
    under the terms of the GNU Free Documentation License.
    © 2008 Chamas Enterprises Inc.