How many words in the English Language?

    by: Daniel Burke

   

I found this interesting little snippet from the Oxford dictionary.

 

How many words are there in the English language?

There is no single sensible answer to this question. It's impossible to count the number of words in a language, because it's so hard to decide what actually counts as a word. Is dog one word, or two (a noun meaning 'a kind of animal', and a verb meaning 'to follow persistently')? If we count it as two, then do we count inflections separately too (e.g. dogs = plural noun, dogs = present tense of the verb)? Is dog-tired a word, or just two other words joined together?

Is hot dog really two words, since it might also be written as hot-dog or even hotdog?
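To make the counting problem concrete, here is a minimal Python sketch. The function name `distinct_words`, the sample string, and the two tokenization rules are all my own assumptions for illustration, not anything the dictionary uses. It only shows that the "number of words" in the same text changes with the rule you pick for hyphens.

```python
import re

def distinct_words(text, split_hyphens):
    """Return the set of distinct lowercase tokens in `text`.

    Hypothetical helper: if split_hyphens is True, 'hot-dog' is counted
    as 'hot' and 'dog'; otherwise it is kept as one hyphenated token.
    """
    if split_hyphens:
        pattern = r"[a-z]+"
    else:
        pattern = r"[a-z]+(?:-[a-z]+)*"
    return set(re.findall(pattern, text.lower()))

# Made-up sample built from the spellings mentioned in the text.
sample = "hot dog, hot-dog, hotdog, dog-tired"

print(len(distinct_words(sample, split_hyphens=True)))   # hyphens split words apart
print(len(distinct_words(sample, split_hyphens=False)))  # hyphenated forms kept whole
```

With hyphens split, the sample yields four distinct tokens (hot, dog, hotdog, tired); with hyphenated forms kept whole it yields five. Neither answer is "right", which is the point the passage is making.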

It's also difficult to decide what counts as 'English'. What about medical and scientific terms? Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Teenage slang? Abbreviations?

The Second Edition of the 20-volume Oxford English Dictionary contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries. Over half of these words are nouns, about a quarter adjectives, and about a seventh verbs; the rest is made up of exclamations, conjunctions, prepositions, suffixes, etc.

And these figures don't take account of entries with senses for different word classes (such as noun and adjective).

This suggests that there are, at the very least, a quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED, or words not yet added to the published dictionary, of which perhaps 20 per cent are no longer in current use. If distinct senses were counted, the total would probably approach three quarters of a million.
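The "quarter of a million" and "20 per cent" figures can be roughly checked from the numbers quoted above. The sketch below is just my own arithmetic on those three counts (the function name `summarize` is hypothetical), assuming the obsolete share is measured against the sum of all three categories.

```python
# Counts quoted from the OED Second Edition in the text above.
current = 171_476      # full entries for words in current use
obsolete = 47_156      # obsolete words
derivatives = 9_500    # approximate number of derivative subentries

def summarize(current, obsolete, derivatives):
    """Return the combined total and the fraction of it that is obsolete."""
    total = current + obsolete + derivatives
    obsolete_share = obsolete / total
    return total, obsolete_share

total, obsolete_share = summarize(current, obsolete, derivatives)
print(f"{total:,} words, {obsolete_share:.1%} obsolete")
# prints: 228,132 words, 20.7% obsolete
```

So the listed entries come to a little under a quarter of a million, with roughly a fifth of them obsolete, which is consistent with the rounded figures in the quote.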

It is very easy to see why machine translation is so hard: there are potentially 750,000 words to deal with in each language pair!

Posted by: Daniel Burke in [MachineTranslation] Digest Number 367 

