VLC Corpus Files for English


The files available for concordance searches are listed below.
| Corpus | Comments | Word count (MS Word) | Size |
| --- | --- | --- | --- |
| Starr Report | Report published by the US Government, written by independent prosecutor Kenneth Starr. | 98,856 | 7,735 kb |
| Brown | Widely used corpus of American English, compiled in 1961. 75% factual writing, 25% fiction. | 1,373,427 | 7,735 kb |
| LOB | The LOB (London / Oslo / Bergen Universities) Corpus is a British English counterpart of the Brown Corpus. It contains 500 text samples of about 2,000 words each. | 1,214,753 | 6,834 kb |
| The Times (Jan, Feb, Mar), 3 files | Articles published in The Times from January to March 1995. Includes business, home news, readers' letters and reviews. | (Jan) 3,567,629; (Feb) 3,351,646 | (Jan) 22,076 kb; (Feb) 20,751 kb; (Mar) 20,425 kb |
| SCMP | Miscellaneous texts from the South China Morning Post, compiled by Phil Benson of HKU. | 1,202,905 | 7,272 kb |
| Business & economy | Texts on business and economics, compiled from internet documents, 1998. | 119,972 | 738 kb |
| Computing | Texts on computing, compiled from internet documents, 1998. | 170,691 | 1,077 kb |
| Sports | Texts on sport, compiled from internet documents, 1998. | 155,539 | 919 kb |
| Health | Texts on health topics, compiled from internet documents, 1998. | 176,566 | 1,078 kb |
| Students' writing | Student writing from HKPU, collected by English Dept staff. Reports, letters and instructions. | 230,418 | 1,480 kb |
| Language & teaching | Articles relating to teaching and language, collected from The Independent newspaper. | 96,497 | 573 kb |
| HK Government reports (English) | Reports published on the internet in both languages (see parallel texts). | 301,218 | 1,909 kb |