Statistical Data Analysis Explained
Applied Environmental Statistics with R
By Clemens Reimann, Peter Filzmoser
1 145 kr
Made to order. Ships within 11-20 business days. Free shipping on orders over 249 kr.
Product information
- Publication date: 2008-04-07
- Dimensions: 177 x 252 x 26 mm
- Weight: 765 g
- Format: Hardcover
- Language: English
- Number of pages: 368
- Publisher: John Wiley & Sons Inc
- ISBN: 9780470985816
More about the authors
Clemens Reimann (born 1952) holds an M.Sc. in Mineralogy and Petrology from the University of Hamburg (Germany), a Ph.D. in Geosciences from Leoben Mining University, Austria, and a D.Sc. in Applied Geochemistry from the same university. He has worked as a lecturer in Mineralogy and Petrology and Environmental Sciences at Leoben Mining University, as an exploration geochemist in eastern Canada, and in contract research in environmental sciences in Austria, and managed the laboratory of an Austrian cement company before joining the Geological Survey of Norway in 1991 as a senior geochemist. From March to October 2004 he was director and professor at the German Federal Environment Agency (Umweltbundesamt, UBA), responsible for Division II, Environmental Health and Protection of Ecosystems. At present he is chairman of the EuroGeoSurveys geochemistry expert group, acting vice president of the International Association of GeoChemistry (IAGC), and associate editor of both Applied Geochemistry and Geochemistry: Exploration, Environment, Analysis.
Peter Filzmoser (born 1968) studied Applied Mathematics at the Vienna University of Technology, Austria, where he also wrote his doctoral thesis and habilitation, both devoted to multivariate statistics. His research led him to the area of robust statistics, resulting in many international collaborations and various scientific papers in this area. His interest in applications of robust methods resulted in the development of R software packages. He has been, and remains, involved in the organisation of several scientific events devoted to robust statistics. Since 2001 he has been dozent at the Statistics Department at Vienna University of Technology. He was a visiting professor at the universities of Vienna, Toulouse and Minsk.
Robert G. Garrett (Bob Garrett) studied Mining Geology and Applied Geochemistry at Imperial College, London, and joined the Geological Survey of Canada (GSC) in 1967 following post-doctoral studies at Northwestern University, Evanston. For the next 25 years his activities focused on regional geochemical mapping in Canada, and overseas for the Canadian International Development Agency, to support mineral exploration and resource appraisal. Throughout his work he has used computers and statistics to manage data, assess their quality, and maximise the knowledge extracted from them. In the 1990s he commenced collaboration [...] crops. Since then he has been involved in various Canadian Federal and university-based research initiatives aimed at providing sound science to support Canadian regulatory and international policy activities concerning risk assessments and risk management for metals. He retired in March 2005 but remains active as an Emeritus Scientist.
Rudolf Dutter is senior statistician and full professor at Vienna University of Technology, Austria. He studied Applied Mathematics in Vienna (M.Sc.) and Statistics at Université de Montréal, Canada (Ph.D.). He spent three years as a post-doctoral fellow at ETH, Zurich, working on computational robust statistics. Research and teaching activities followed at the Graz University of Technology, and as a full professor of statistics at Vienna University of Technology, both in Austria. He also taught and consulted at Leoben Mining University, Austria; currently he consults in many fields of applied statistics, with main interests in computational and robust statistics, development of statistical software, and geostatistics. He is author and coauthor of many publications and several books, e.g., an early booklet in German on geostatistics.
Table of contents
- Preface
- Acknowledgements
- About the authors
- 1 Introduction
  - 1.1 The Kola Ecogeochemistry Project
    - 1.1.1 Short description of the Kola Project survey area
    - 1.1.2 Sampling and characteristics of the different sample materials
    - 1.1.3 Sample preparation and chemical analysis
- 2 Preparing the Data for Use in R and DAS+R
  - 2.1 Required data format for import into R and DAS+R
  - 2.2 The detection limit problem
  - 2.3 Missing values
  - 2.4 Some "typical" problems encountered when editing a laboratory data report file to a DAS+R file
    - 2.4.1 Sample identification
    - 2.4.2 Reporting units
    - 2.4.3 Variable names
    - 2.4.4 Results below the detection limit
    - 2.4.5 Handling of missing values
    - 2.4.6 File structure
    - 2.4.7 Quality control samples
    - 2.4.8 Geographical coordinates, further editing and some unpleasant limitations of spreadsheet programs
  - 2.5 Appending and linking data files
  - 2.6 Requirements for a geochemical database
  - 2.7 Summary
- 3 Graphics to Display the Data Distribution
  - 3.1 The one-dimensional scatterplot
  - 3.2 The histogram
  - 3.3 The density trace
  - 3.4 Plots of the distribution function
    - 3.4.1 Plot of the cumulative distribution function (CDF-plot)
    - 3.4.2 Plot of the empirical cumulative distribution function (ECDF-plot)
    - 3.4.3 The quantile-quantile plot (QQ-plot)
    - 3.4.4 The cumulative probability plot (CP-plot)
    - 3.4.5 The probability-probability plot (PP-plot)
    - 3.4.6 Discussion of the distribution function plots
  - 3.5 Boxplots
    - 3.5.1 The Tukey boxplot
    - 3.5.2 The log-boxplot
    - 3.5.3 The percentile-based boxplot and the box-and-whisker plot
    - 3.5.4 The notched boxplot
  - 3.6 Combination of histogram, density trace, one-dimensional scatterplot, boxplot, and ECDF-plot
  - 3.7 Combination of histogram, boxplot or box-and-whisker plot, ECDF-plot, and CP-plot
  - 3.8 Summary
- 4 Statistical Distribution Measures
  - 4.1 Central value
    - 4.1.1 The arithmetic mean
    - 4.1.2 The geometric mean
    - 4.1.3 The mode
    - 4.1.4 The median
    - 4.1.5 Trimmed mean and other robust measures of the central value
    - 4.1.6 Influence of the shape of the data distribution
  - 4.2 Measures of spread
    - 4.2.1 The range
    - 4.2.2 The interquartile range (IQR)
    - 4.2.3 The standard deviation
    - 4.2.4 The median absolute deviation (MAD)
    - 4.2.5 Variance
    - 4.2.6 The coefficient of variation (CV)
    - 4.2.7 The robust coefficient of variation (CVR)
  - 4.3 Quartiles, quantiles and percentiles
  - 4.4 Skewness
  - 4.5 Kurtosis
  - 4.6 Summary table of statistical distribution measures
  - 4.7 Summary
- 5 Mapping Spatial Data
  - 5.1 Map coordinate systems (map projection)
  - 5.2 Map scale
  - 5.3 Choice of the base map for geochemical mapping
  - 5.4 Mapping geochemical data with proportional dots
  - 5.5 Mapping geochemical data using classes
    - 5.5.1 Choice of symbols for geochemical mapping
    - 5.5.2 Percentile classes
    - 5.5.3 Boxplot classes
    - 5.5.4 Use of ECDF- and CP-plot to select classes for mapping
  - 5.6 Surface maps constructed with smoothing techniques
  - 5.7 Surface maps constructed with kriging
    - 5.7.1 Construction of the (semi)variogram
    - 5.7.2 Quality criteria for semivariograms
    - 5.7.3 Mapping based on the semivariogram (kriging)
    - 5.7.4 Possible problems with semivariogram estimation and kriging
  - 5.8 Colour maps
  - 5.9 Some common mistakes in geochemical mapping
    - 5.9.1 Map scale
    - 5.9.2 Base map
    - 5.9.3 Symbol set
    - 5.9.4 Scaling of symbol size
    - 5.9.5 Class selection
  - 5.10 Summary
- 6 Further Graphics for Exploratory Data Analysis
  - 6.1 Scatterplots (xy-plots)
    - 6.1.1 Scatterplots with user-defined lines or fields
  - 6.2 Linear regression lines
  - 6.3 Time trends
  - 6.4 Spatial trends
  - 6.5 Spatial distance plot
  - 6.6 Spiderplots (normalised multi-element diagrams)
  - 6.7 Scatterplot matrix
  - 6.8 Ternary plots
  - 6.9 Summary
- 7 Defining Background and Threshold, Identification of Data Outliers and Element Sources
  - 7.1 Statistical methods to identify extreme values and data outliers
    - 7.1.1 Classical statistics
    - 7.1.2 The boxplot
    - 7.1.3 Robust statistics
    - 7.1.4 Percentiles
    - 7.1.5 Can the range of background be calculated?
  - 7.2 Detecting outliers and extreme values in the ECDF- or CP-plot
  - 7.3 Including the spatial distribution in the definition of background
    - 7.3.1 Using geochemical maps to identify a reasonable threshold
    - 7.3.2 The concentration-area plot
    - 7.3.3 Spatial trend analysis
    - 7.3.4 Multiple background populations in one data set
  - 7.4 Methods to distinguish geogenic from anthropogenic element sources
    - 7.4.1 The TOP/BOT-ratio
    - 7.4.2 Enrichment factors (EFs)
    - 7.4.3 Mineralogical versus chemical methods
  - 7.5 Summary
- 8 Comparing Data in Tables and Graphics
  - 8.1 Comparing data in tables
  - 8.2 Graphical comparison of the data distributions of several data sets
  - 8.3 Comparing the spatial data structure
  - 8.4 Subset creation – a mighty tool in graphical data analysis
  - 8.5 Data subsets in scatterplots
  - 8.6 Data subsets in time and spatial trend diagrams
  - 8.7 Data subsets in ternary plots
  - 8.8 Data subsets in the scatterplot matrix
  - 8.9 Data subsets in maps
  - 8.10 Summary
- 9 Comparing Data Using Statistical Tests
  - 9.1 Tests for distribution (Kolmogorov–Smirnov and Shapiro–Wilk tests)
    - 9.1.1 The Kola data set and the normal or lognormal distribution
  - 9.2 The one-sample t-test (test for the central value)
  - 9.3 Wilcoxon signed-rank test
  - 9.4 Comparing two central values of the distributions of independent data groups
    - 9.4.1 The two-sample t-test
    - 9.4.2 The Wilcoxon rank sum test
  - 9.5 Comparing two central values of matched pairs of data
    - 9.5.1 The paired t-test
    - 9.5.2 The Wilcoxon test
  - 9.6 Comparing the variance of two data sets
    - 9.6.1 The F-test
    - 9.6.2 The Ansari–Bradley test
  - 9.7 Comparing several central values
    - 9.7.1 One-way analysis of variance (ANOVA)
    - 9.7.2 Kruskal-Wallis test
  - 9.8 Comparing the variance of several data groups
    - 9.8.1 Bartlett test
    - 9.8.2 Levene test
    - 9.8.3 Fligner test
  - 9.9 Comparing several central values of dependent groups
    - 9.9.1 ANOVA with blocking (two-way)
    - 9.9.2 Friedman test
  - 9.10 Summary
- 10 Improving Data Behaviour for Statistical Analysis: Ranking and Transformations
  - 10.1 Ranking/sorting
  - 10.2 Non-linear transformations
    - 10.2.1 Square root transformation
    - 10.2.2 Power transformation
    - 10.2.3 Log(arithmic)-transformation
    - 10.2.4 Box–Cox transformation
    - 10.2.5 Logit transformation
  - 10.3 Linear transformations
    - 10.3.1 Addition/subtraction
    - 10.3.2 Multiplication/division
    - 10.3.3 Range transformation
  - 10.4 Preparing a data set for multivariate data analysis
    - 10.4.1 Centring
    - 10.4.2 Scaling
  - 10.5 Transformations for closed number systems
    - 10.5.1 Additive logratio transformation
    - 10.5.2 Centred logratio transformation
    - 10.5.3 Isometric logratio transformation
  - 10.6 Summary
- 11 Correlation
  - 11.1 Pearson correlation
  - 11.2 Spearman rank correlation
  - 11.3 Kendall-tau correlation
  - 11.4 Robust correlation coefficients
  - 11.5 When is a correlation coefficient significant?
  - 11.6 Working with many variables
  - 11.7 Correlation analysis and inhomogeneous data
  - 11.8 Correlation results following additive logratio or centred logratio transformations
  - 11.9 Summary
- 12 Multivariate Graphics
  - 12.1 Profiles
  - 12.2 Stars
  - 12.3 Segments
  - 12.4 Boxes
  - 12.5 Castles and trees
  - 12.6 Parallel coordinates plot
  - 12.7 Summary
- 13 Multivariate Outlier Detection
  - 13.1 Univariate versus multivariate outlier detection
  - 13.2 Robust versus non-robust outlier detection
  - 13.3 The chi-square plot
  - 13.4 Automated multivariate outlier detection and visualisation
  - 13.5 Other graphical approaches for identifying outliers and groups
  - 13.6 Summary
- 14 Principal Component Analysis (PCA) and Factor Analysis (FA)
  - 14.1 Conditioning the data for PCA and FA
    - 14.1.1 Different data ranges and variability, skewness
    - 14.1.2 Normal distribution
    - 14.1.3 Data outliers
    - 14.1.4 Closed data
    - 14.1.5 Censored data
    - 14.1.6 Inhomogeneous data sets
    - 14.1.7 Spatial dependence
    - 14.1.8 Dimensionality
  - 14.2 Principal component analysis (PCA)
    - 14.2.1 The scree plot
    - 14.2.2 The biplot
    - 14.2.3 Mapping the principal components
    - 14.2.4 Robust versus classical PCA
  - 14.3 Factor analysis
    - 14.3.1 Choice of factor analysis method
    - 14.3.2 Choice of rotation method
    - 14.3.3 Number of factors extracted
    - 14.3.4 Selection of elements for factor analysis
    - 14.3.5 Graphical representation of the results of factor analysis
    - 14.3.6 Robust versus classical factor analysis
  - 14.4 Summary
- 15 Cluster Analysis
  - 15.1 Possible data problems in the context of cluster analysis
    - 15.1.1 Mixing major, minor and trace elements
    - 15.1.2 Data outliers
    - 15.1.3 Censored data
    - 15.1.4 Data transformation and standardisation
    - 15.1.5 Closed data
  - 15.2 Distance measures
  - 15.3 Clustering samples
    - 15.3.1 Hierarchical methods
    - 15.3.2 Partitioning methods
    - 15.3.3 Model-based methods
    - 15.3.4 Fuzzy methods
  - 15.4 Clustering variables
  - 15.5 Evaluation of cluster validity
  - 15.6 Selection of variables for cluster analysis
  - 15.7 Summary
- 16 Regression Analysis (RA)
  - 16.1 Data requirements for regression analysis
    - 16.1.1 Homogeneity of variance and normality
    - 16.1.2 Data outliers, extreme values
    - 16.1.3 Other considerations
  - 16.2 Multiple regression
  - 16.3 Classical least squares (LS) regression
    - 16.3.1 Fitting a regression model
    - 16.3.2 Inferences from the regression model
    - 16.3.3 Regression diagnostics
    - 16.3.4 Regression with opened data
  - 16.4 Robust regression
    - 16.4.1 Fitting a robust regression model
    - 16.4.2 Robust regression diagnostics
  - 16.5 Model selection in regression analysis
  - 16.6 Other regression methods
  - 16.7 Summary
- 17 Discriminant Analysis (DA) and Other Knowledge-Based Classification Methods
  - 17.1 Methods for discriminant analysis
  - 17.2 Data requirements for discriminant analysis
  - 17.3 Visualisation of the discriminant function
  - 17.4 Prediction with discriminant analysis
  - 17.5 Exploring for similar data structures
  - 17.6 Other knowledge-based classification methods
    - 17.6.1 Allocation
    - 17.6.2 Weighted sums
  - 17.7 Summary
- 18 Quality Control (QC)
  - 18.1 Randomised samples
  - 18.2 Trueness
  - 18.3 Accuracy
  - 18.4 Precision
    - 18.4.1 Analytical duplicates
    - 18.4.2 Field duplicates
  - 18.5 Analysis of variance (ANOVA)
  - 18.6 Using maps to assess data quality
  - 18.7 Variables analysed by two different analytical techniques
  - 18.8 Working with censored data – a practical example
  - 18.9 Summary
- 19 Introduction to R and Structure of the DAS+R Graphical User Interface
  - 19.1 R
    - 19.1.1 Installing R
    - 19.1.2 Getting started
    - 19.1.3 Loading data
    - 19.1.4 Generating and saving plots in R
    - 19.1.5 Scatterplots
  - 19.2 R-scripts
  - 19.3 A brief overview of relevant R commands
  - 19.4 DAS+R
    - 19.4.1 Loading data into DAS+R
    - 19.4.2 Plotting diagrams
    - 19.4.3 Tables
    - 19.4.4 Working with "worksheets"
    - 19.4.5 Groups and subsets
    - 19.4.6 Mapping
  - 19.5 Summary
- References
- Index
You may also be interested in
Developments in Robust Statistics
Rudolf Dutter, Peter Filzmoser, Ursula Gather, Peter J. Rousseeuw
1 577 kr
Introduction to Multivariate Statistical Analysis in Chemometrics
Kurt Varmuza, Peter Filzmoser
2 223 kr