Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Document Type: Research Paper

Authors

Simulation and Data Processing Laboratory, University College of Engineering, School of Mining Engineering, University of Tehran, Tehran, Iran

Abstract

Most geochemical datasets contain missing data in varying proportions, which can cause significant problems in geostatistical modeling or multivariate analysis. Imputing the missing values is therefore common practice in geochemical studies. In this study, three approaches are used to impute the censored data: half detection (HD), multiple imputation (MI), and cosimulation based on Markov model 2 (MM2). Because the imputed datasets must preserve the underlying structure of the original data, multidimensional scaling (MDS) was used to assess the validity of each imputation method. An additive log-ratio (alr) transformation was applied to open the closed compositional data before MDS was performed. The MDS results showed that MI and MM2 preserved the original underlying structure of the dataset less well than HD, because these two approaches produced imputed values above the detection limits of the variables.
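The workflow summarized above (substitute censored values, open the closed data with alr, then compare datasets in an MDS configuration) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: censored cells are marked `None`, the last part is used as the alr denominator, and MDS is the classical (Torgerson) variant applied to Euclidean distances between alr-transformed samples. All function names are hypothetical, and the MI and MM2 approaches are not reproduced. The helper `count_above_limit` counts imputed values that exceed the detection limit, the symptom the abstract reports for MI and MM2.

```python
import math
import random


def half_detection_impute(rows, detection_limits):
    """Replace each censored cell (None) with half of that variable's detection limit."""
    return [[dl / 2.0 if x is None else x for x, dl in zip(row, detection_limits)]
            for row in rows]


def count_above_limit(original, imputed, detection_limits):
    """Count imputed cells whose value exceeds the variable's detection limit.

    A censored value is known to lie below its detection limit, so an imputed
    value above that limit contradicts the censoring information.
    """
    count = 0
    for orig_row, imp_row in zip(original, imputed):
        for x, y, dl in zip(orig_row, imp_row, detection_limits):
            if x is None and y > dl:
                count += 1
    return count


def alr(row, denom=-1):
    """Additive log-ratio transform: log(x_i / x_D) for every part i except D."""
    d = denom % len(row)  # assumption: the last part is the denominator by default
    return [math.log(x / row[d]) for i, x in enumerate(row) if i != d]


def classical_mds(points, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: embed points in k dimensions from Euclidean distances."""
    n = len(points)
    # Squared Euclidean distance matrix D^2.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    mean = [sum(r) / n for r in d2]  # row means (equal to column means, D^2 is symmetric)
    grand = sum(mean) / n
    # Double centering: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T.
    b = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration for the current largest eigenpair of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Eigenvalue estimate via the Rayleigh quotient v^T B v.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate B so the next pass finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy values made up for illustration (not data from the study): three elements in ppm,
# with None marking a result reported as below the detection limit.
raw = [[0.8, 120.0, None], [None, 95.0, 4.0], [1.5, None, 6.5], [0.4, 60.0, 2.0]]
dls = [0.05, 5.0, 1.0]  # hypothetical detection limits
filled = half_detection_impute(raw, dls)
print(count_above_limit(raw, filled, dls))  # → 0 (DL/2 is always below DL)
xy = classical_mds([alr(row) for row in filled], k=2)
```

Power iteration with deflation keeps the sketch free of third-party dependencies; with NumPy, `numpy.linalg.eigh` applied to `B` would be the more usual choice.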

Keywords

