
Introductory Guide to S-Plus
Final Version

B.D. Ripley
Professor of Applied Statistics, University of Oxford

24 August 1994

Preface

This guide was originally written for graduate students in Statistics at the University of Oxford. The first versions were based closely on notes by Dr. Bill Venables of the Department of Statistics at the University of Adelaide, but have been updated to reflect later versions of S, the extensions of S-Plus and local facilities. Several sections, in particular 4, 6 and 11, remain close to Dr. Venables' original material.

This guide will no longer be updated, following the publication of Venables & Ripley (1994). [See p. 1. Where that takes a significantly better approach than earlier editions of these notes, the material formerly here has been dropped.]

The guide is to S-Plus, but much of it will be relevant to users of the underlying S. Extensions which are only in S-Plus include dynamic graphics (section 6.3, brush and spin) and the classical statistics functions (section 9). The terminology of this guide is intended to be precise, only referring to S-Plus rather than S for features unique to S-Plus.

These notes were written for a particular environment, S-Plus 3.2 on Sun SparcStations running the Open Windows windowing system. You will find a number of differences depending on your local environment.

It will help to have the library ripley available; it should be in the same source as these notes. It can also be obtained by anonymous ftp from markov.stats.ox.ac.uk in file pub/S/ripley.sh.Z. It is available from statlib (see Section A.2) as

    send ripley from S

Alternatively, library(MASS) from Venables & Ripley (1994) can be used.

This guide may be freely copied and redistributed for any educational purpose (including commercial courses) provided its authorship (B.D. Ripley and W.N. Venables) is clearly stated.
Where appropriate, a small charge to cover the costs of production and distribution, only, may be made.

B.D. Ripley, University of Oxford, 24th August, 1994.

Contents

1 Introduction
    1.1 Starting and Finishing
    1.2 Getting Help
    1.3 Hardcopy Output
2 Datasets
3 A First Session
4 Simple Data Manipulation
    4.1 Vectors
    4.2 Vector Arithmetic
    4.3 Generating Regular Sequences of Numbers
    4.4 Logical Vectors. Missing Values
    4.5 Character Vectors
    4.6 Index Vectors. Selecting and Modifying Subsets of a Data Set
    4.7 Arrays
    4.8 Lists
    4.9 Data Frames
5 Reading data into S
    5.1 Writing out data
6 Graphics
    6.1 Graphical Parameters
    6.2 Some Basic Plotting Functions
    6.3 Interaction with Plots
    6.4 Brush and Spin
    6.5 Equally-scaled plots
7 Statistical Summaries
    7.1 Arithmetical Summaries
    7.2 Histograms and Stem-and-Leaf Plots
    7.3 Boxplots
8 Distributions
    8.1 Q-Q Plots
9 Classical Statistics
10 Handling Categorical Data
    10.1 The Function tapply() and Ragged Arrays
11 Loops and Conditional Execution
12 Writing Your Own Functions
13 Statistical Models
    13.1 Model Formulas
    13.2 One-way Layouts
    13.3 Designed Experiments
    13.4 Generalized Linear Models
    13.5 Updating and Selecting Models
14 Multivariate Analysis
Appendix A Libraries
    A.1 Library ripley
    A.2 Sources of Libraries

1 Introduction

S is a statistical language developed at AT&T's Bell Laboratories. S-Plus is a binary distribution of S, with added functions, produced by the StatSci Division of MathSoft in Seattle.

The S system was radically re-designed in the 1988 release and known as 'New S'. In August 1991 a new release of what is once again called S consisted of a moderate revision of 'New S' together with far-ranging extensions. S-Plus 3.0 was introduced in late 1991, based on that release of S, with numerous additional features. S-Plus 3.1 was released at the very end of 1992, and S-Plus 3.2 in very early 1994.

The main references are:

    R.A. Becker, J.M. Chambers and A.R. Wilks (1988) The NEW S language. Wadsworth & Brooks/Cole.

    J.M. Chambers and T.J. Hastie (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

It is not the intention of this guide to replace the books.
Rather these notes are intended as a brief introduction to the capabilities of the S programming language and to how to perform some common statistical procedures within S. Users of S-Plus will need to consult both books, probably frequently. Both books contain some reference documentation, but the on-line versions (see section 1.2) are later and definitive. There are also manuals for S-Plus itself, whose organization differs from release to release.

Other books include

    W.N. Venables and B.D. Ripley (1994) Modern Applied Statistics with S-Plus. New York: Springer. ISBN 0-387-94350-1

which goes far beyond the coverage of this guide, including many topics (such as robust statistics, non-linear regressions, modern regression, survival analysis, tree-based models, time series and spatial statistics) not covered here, as well as going into greater depth on what is covered.

1.1 Starting and Finishing

To start S-Plus, type the command

    machine% Splus

After a short while (and, the first time, an initialization message) you get the S-Plus prompt (which can be changed, but the default is assumed here):

    >

This is waiting for input from you.

Technically S is a function language with a very simple syntax. Like most Unix based packages it is case sensitive, so A and a are different variables.

Elementary commands consist of either expressions or assignments. If an expression is given as a command, it is evaluated, printed, and the value is discarded. (In fact it is kept in the (hidden) variable .Last.value and so can be retrieved from the 'bin'.) An assignment also evaluates an expression and passes the value to a variable but the result is not printed automatically. An expression can be as simple as 2 + 3 or a complex function call. Assignments are indicated by the assignment operator <- or _. (As the first needs two keystrokes, lazy typists use the second. However, the first is easier to read.)
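The points above about case sensitivity and the difference between an expression and an assignment can be sketched as follows. This is a minimal illustration rather than part of the original notes: the variable names A, a and total are made up for the example, and only the <- form of the assignment operator is used.

```r
# Case sensitivity: A and a are two different variables.
A <- 10
a <- 2.5

# An expression given as a command is evaluated and its value printed.
A + a

# An assignment evaluates the expression and stores the value
# in a variable; nothing is printed automatically.
total <- A + a

# Typing the variable name is itself an expression, so the
# stored value is now printed.
total
```

Typing the commands one at a time at the prompt shows which of them produce printed output and which do not.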
For example,

    > 2 + 3
    [1] 5
    > mean(hstart)
    [1] 13?.????
    > m <- mean(hstart); v <- var(hstart)
    > m/sqrt(v)
    [1] 3.1??21

The [1] states that the answer is starting at the first element of a vector.

Commands are separated either by a semi-colon, ;, or by a newline. If a command is not complete at the end of a line, S will give a different prompt, namely + on second and subsequent lines, and continue to read input until the command is syntactically complete.

S can be extended by writing new functions, which then can be used exactly as built-in functions (and can even replace them). How to write your own functions is covered in section 12.

1.2 Getting Help

S has an inbuilt help facility similar to the man facility of Unix. To get more information on any specific named function or dataset, for example mean, the command is

    > help(mean)

For a feature specified by special characters, and in a few other cases, the argument must be enclosed in double quotes, making it a 'character string':

    > help("[[")

Help uses a window which overlays your main window. The pager accepts a number of options, including space for the next page and q to quit. (Other useful options are 1G to go to the top and control-B to go back a page.) If you prefer, a separate help window (which can be left up) can be obtained by the argument window=T.

Another way to get help is by

    > ?mean

Short help is given by the function args.

S-Plus also has a window-based help facility, started by

    > help.start(gui="openlook")

Click with the left mouse button on items to select categories and items. The help window can be left up, or removed by

    > help.off()

It is not advisable to quit S-Plus windows from the frame menu.

1.3 Hardcopy Output

Graphics are printed by holding down the right button on the graph menu in an openlook() window (see section 6) and releasing over the print item. This will print on the nearest laser printer (or that selected by your PRINTER environment variable).

To record a session cut-and-paste to a textedit window, then remove your mistakes (if any) and save as a
