CELEX
A GUIDE FOR USERS

GAVIN BURNAGE

CELEX – CENTRE FOR LEXICAL INFORMATION
Max Planck Institute for Psycholinguistics
Wundtlaan 1
6525 XD Nijmegen
The Netherlands
Telephone 31-(0)24-3615797, 31-(0)24-3615751
Fax 31-(0)24-3521213
Electronic mail (internet): celex@mpi.nl

First published in the Netherlands in 1990
© 1990 CELEX – CENTRE FOR LEXICAL INFORMATION
ISBN 90 373 0063 4

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

Typeset using the TeX computer typesetting system. Printed by drukkerij SSN, Nijmegen. TeX is a trademark of the American Mathematical Society.

INTRODUCTION

There can be no doubt that lexicography is a very difficult sphere of linguistic activity. Many lexicographers have given vent to their feelings in this respect. Perhaps the most colourful of these opinions based on a lexicographer’s long experience is that of J.J. Scaliger (16th–17th cent.) who says in fine Latin verses that the worst criminals should neither be executed nor sentenced to forced labour, but should be condemned to compile dictionaries, because all the tortures are included in this work.
LADISLAV ZGUSTA, Manual of Lexicography (1971)

The 1980s will one day be seen as a watershed in lexicography – the decade in which computer applications began to alter radically the methods and the potential of lexicography.
Gone are the days of painstaking manual transcription and sorting on paper slips: the future is on disk, in the form of vast lexical databases, continuously updated, that can generate a dictionary of a given size and scope in a fraction of the time it used to take.
DAVID CRYSTAL, The Cambridge Encyclopedia of Language (1988)

CONTENTS

1 DATABASES AND LEXICONS
1.1 Why use a database?
2 LEXICON TYPES
2.1 Dutch Lemmas
2.2 Dutch Wordforms
2.3 Dutch Abbreviations
2.4 Dutch INL Corpus Types
2.5 English Lemmas
2.6 English Wordforms
2.7 English COBUILD Corpus Types
2.8 German Lemmas
2.9 German Wordforms
2.10 German Mannheim Corpus Types

1 DATABASES AND LEXICONS

This introduction tries to do two things. In the first section, for those who aren’t familiar with the ideas and possibilities of databases and lexicons, there’s a description of the ways in which a computer database and lexicon are like (and, more importantly, unlike) a traditional paper dictionary. If you’re already familiar with such things, you may like to skip ahead to the second section, which describes each of the main lexicon types available to you in FLEX. Fundamental to this description is the difference between wordforms (the words we use in everyday speech and writing) and lemmas (words that represent whole families of wordforms, in the same way as the bold-type headings in a dictionary, which take the form of stems or headwords). Since the linguistic information available to you depends on the type of your lexicon, you should make sure you understand the differences between the various lexicon types before beginning your work. And when you start work with FLEX, the special program which helps you build and use your lexicons, you’ll be better off for having read these sections carefully.
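The relationship between lemmas and wordforms can be pictured as a mapping from each lemma to the family of wordforms it stands for. The Python sketch below is purely illustrative: the entries, the (headword, word class) keys and the helper name `build_wordform_index` are our own assumptions, not the way CELEX actually stores its lexicons. It inverts the mapping so that any wordform can be traced back to the lemma or lemmas it belongs to.

```python
from collections import defaultdict

# Hypothetical lemma table (illustrative entries, not CELEX data).
# Each lemma is keyed by (headword, word class); its value lists the
# wordforms in its family. Note that "walks" belongs to two lemmas.
LEMMAS = {
    ("walk", "V"): ["walk", "walks", "walked", "walking"],
    ("walk", "N"): ["walk", "walks"],
    ("child", "N"): ["child", "children"],
}


def build_wordform_index(lemmas):
    """Invert lemma -> wordforms into wordform -> set of lemmas."""
    index = defaultdict(set)
    for lemma, wordforms in lemmas.items():
        for form in wordforms:
            index[form].add(lemma)
    # Return a plain dict so that looking up an unknown wordform raises
    # KeyError instead of silently adding an empty entry.
    return dict(index)


index = build_wordform_index(LEMMAS)
print(sorted(index["walks"]))     # [('walk', 'N'), ('walk', 'V')]
print(sorted(index["children"]))  # [('child', 'N')]
```

Because a single wordform such as walks can belong to more than one lemma, the index maps each wordform to a set of lemmas rather than to a single one.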
In the third and last section of this introductory chapter, you can find out how to log in to CELEX using local, national, and international computer networks.

1.1 WHY USE A DATABASE?

Since we are dealing with words, we can start off by thinking of databases in terms of a paper dictionary. A book like the Van Dale Groot Woordenboek der Nederlandse Taal is essentially a long list of words with information supplied alongside each word. The key to a dictionary is the alphabetical order of its entries: you can only look up one particular word at a time and examine the information given for it. If you’ve got time, you can look at every page to find all the words with a certain grammatical code or pronunciation, but, quite understandably, most people don’t do this unless they’re really desperate.

In its simplest form, a database can be like a dictionary: just a list of words, with some information alongside each word. The first important difference between a computer database and a paper dictionary is that the database stores each type of information in a separate column, whereas the dictionary uses one paragraph of text and marks the different sorts of information within it by typeface, by coding systems, or by the order in which the information is given.

Dictionary text is fixed once it is printed. You can’t move bits of an entry around, or miss them out: you are presented with everything at once, and you may have to read a lot of irrelevant information before you find what you’re looking for. The columns which make up a database are much more regimented, but that, paradoxically, is what gives a database its flexibility. Each type of information keeps strictly to its own dedicated place, which makes it easy for the computer to locate and serve up one individual item, or several particular items, for each word that interests you.
So you can look up a word’s word class code and pronunciation, say, without even glancing at the rest of the information. The diagram below is a simple representation of how information is held in a database.

Headword      Class   Phonetics
aback         ADV     @-'b&k
abacus        N       '&-b@-k@s
abandon       N       @-'b&n-d@n
abandon       V       @-'b&n-d@n
abandoned     A       @-'b&n-d@nd
abandonment   N       @-'b&n-d@n-m@nt
abase         V       @-'beIs
abasement     N       @-'beIs-m@nt
abash         V       @-'b&S
abate         V       @-'beIt

The crucial difference between a database and a dictionary is the flexibility that a computer can achieve with properly defined rows and columns: you can gather together different parts of the database and display the information in any way you like. This illustration shows three vertical columns, headed ‘Headword’, ‘Class’, and ‘Phonetics’, and ten horizontal rows, each of which gives the information for one headword under the appropriate column heading. A row thus contains every type of information for one word, while a column contains one specific type of information for every headword.

The illustration is, of course, only a very simple example. To get an idea of what the whole CELEX Dutch, English or German database might look like, imagine three hundred or so more column headings added to the right-hand side, and a hundred thousand or so more rows added at the bottom. This diagram would then represent a small part of the top-left corner of an enormous grid packed with lexical information. Experts have calculated that if you printed out the rest of this table in full, you would end up with a piece of paper approximately 5.5 m wide and 2.4 km long, so you could probably walk round it (a circuit of about 4.8 km) in just under an hour.

Using FLEX, which itself uses a database management system to access the information in the grid, you can extract tiny bits of information, or long and detailed lists, just as you please. When you create a lexicon, you’re essentially creating a little dictionary, designed to your own specifications.
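The grid of rows and columns can be sketched in a few lines of Python, assuming each row is stored as a record whose keys are the column headings. The ten rows are the sample entries from the diagram; the helper `select_columns` is hypothetical and only illustrates the principle, not how FLEX itself is implemented. Asking for two columns of one headword returns just those items, one tuple per matching row.

```python
# Each row holds every type of information for one entry;
# each key ("Headword", "Class", "Phonetics") plays the role of a column.
ROWS = [
    {"Headword": "aback", "Class": "ADV", "Phonetics": "@-'b&k"},
    {"Headword": "abacus", "Class": "N", "Phonetics": "'&-b@-k@s"},
    {"Headword": "abandon", "Class": "N", "Phonetics": "@-'b&n-d@n"},
    {"Headword": "abandon", "Class": "V", "Phonetics": "@-'b&n-d@n"},
    {"Headword": "abandoned", "Class": "A", "Phonetics": "@-'b&n-d@nd"},
    {"Headword": "abandonment", "Class": "N", "Phonetics": "@-'b&n-d@n-m@nt"},
    {"Headword": "abase", "Class": "V", "Phonetics": "@-'beIs"},
    {"Headword": "abasement", "Class": "N", "Phonetics": "@-'beIs-m@nt"},
    {"Headword": "abash", "Class": "V", "Phonetics": "@-'b&S"},
    {"Headword": "abate", "Class": "V", "Phonetics": "@-'beIt"},
]


def select_columns(rows, headword, columns):
    """Return only the requested columns for each row with the given headword."""
    return [
        tuple(row[column] for column in columns)
        for row in rows
        if row["Headword"] == headword
    ]


print(select_columns(ROWS, "abandon", ["Class", "Phonetics"]))
# [('N', "@-'b&n-d@n"), ('V', "@-'b&n-d@n")]
```

Because abandon appears once as a noun and once as a verb, it comes back as two rows, and no column other than the two requested ones is returned.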
Unlike a dictionary, a database lets you use keys other than the headword when you look something up. At the simplest level, this means you can look up the verb walk rather than the noun walk. At another level, it means you can get a list of all the verbs in the database, leaving out every word that is not a verb. The individual printed paragraphs for each word in a dictionary are fixed, but the corresponding rows in a computer database can be moved about and rearranged just as you want them. So you can create a lexicon like the one illustrated below by using FLEX restrictions. You simply state that you want to see all the words which have the word class code V, and you can then get as much information as you like about the verbs in your list. The example below shows a list of verbs with their pronunciations:

Headword   Transcription
abandon    @-'b&n-d@n
abase      @-'beIs
abash      @-'b&S
abate      @-'beIt

Since you’ve specified that you only want verbs in your list, there’s no need to put the word class code column on display. The computer uses it in preparing the list, but you don’t have to look at it: it would just be a column of V’s.

The possibilities for creating all sorts of lexicons are seemingly endless. You have hundreds of columns to choose from, most of which contain information you might want to inspect on your screen. Other columns contain information which can be used to control what is shown on your screen.
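A restriction of this kind amounts to filtering the rows on one column, and choosing which columns to show amounts to a projection. The sketch below uses Python’s built-in sqlite3 module purely as a stand-in for the database management system behind FLEX; the table name `lexicon` and its column names are assumptions made for illustration. The WHERE clause uses the word class column to pick out the verbs, while the SELECT list leaves that column off the display.

```python
import sqlite3

# Stand-in lexicon table (hypothetical names); SQLite plays the role of
# the database management system, not FLEX itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lexicon (headword TEXT, word_class TEXT, phonetics TEXT)")
conn.executemany(
    "INSERT INTO lexicon VALUES (?, ?, ?)",
    [
        ("aback", "ADV", "@-'b&k"),
        ("abacus", "N", "'&-b@-k@s"),
        ("abandon", "N", "@-'b&n-d@n"),
        ("abandon", "V", "@-'b&n-d@n"),
        ("abandoned", "A", "@-'b&n-d@nd"),
        ("abandonment", "N", "@-'b&n-d@n-m@nt"),
        ("abase", "V", "@-'beIs"),
        ("abasement", "N", "@-'beIs-m@nt"),
        ("abash", "V", "@-'b&S"),
        ("abate", "V", "@-'beIt"),
    ],
)

# Restriction: keep only rows whose word class is V.
# Projection: display headword and transcription, but not the word class
# column that was used to do the filtering.
verbs = conn.execute(
    "SELECT headword, phonetics AS transcription FROM lexicon "
    "WHERE word_class = ? ORDER BY headword",
    ("V",),
).fetchall()
conn.close()

for headword, transcription in verbs:
    print(f"{headword:<10} {transcription}")
```

Run as written, it prints the same four verbs and transcriptions as the list above; the word class column is consulted by the query but never appears in the output.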