Compound compositional data processes
Compositional data is non-negative data subject to the unit sum constraint. The logistic normal distribution provides a framework for compositional data when it satisfies sub-compositional coherence, in that the inference from a sub-composition should be the same whether it is based on the full composition or on the sub-composition alone. However, in many cases sub-compositions are not coherent because of additional structure on the compositions, which can be modelled as process(es) inducing change. Sometimes data are collected with a model already well validated, and hence with the focus on estimation of the model parameters. Alternatively, sometimes the appropriate model is unknown in advance and it is necessary to use the data to identify a suitable model. In both cases, a hierarchy of possible structure(s) is very helpful. This is evident in the evaluation of, for example, geochemical and household expenditure data. In the case of geochemical data, the structural process might be the stoichiometric constraints induced by the crystal lattice sites, which ensure that amalgamations of some elements are constant in molar terms. The choice of units (weight percent oxide or moles) has an impact on how the data can be modelled and interpreted. For simple igneous systems (e.g. Hawaiian basalt) mineral modes can be calculated, from which a valid geochemical interpretation can be obtained. For household expenditure data, the structural process might be how teetotal households have distinct spending patterns on discretionary items from non-teetotal households. Measurement error is an example of another underlying process that reflects how an underlying discrete distribution (e.g. for the number of molecules in a sample) is converted using a linear calibration into a non-negative measurement, where measurements below the stated detection limit are reported as zero.
Compositional perturbation involves additive errors on the log-ratio space and is the process that does show sub-compositional coherence. The mixing process involves the combination of compositions into a new composition, such as minerals combining to form a rock, where there may be considerable knowledge about the set of possible mixing processes. Finally, recording error may affect the composition, such as recording the components to a specified number of decimal digits, which implies interval censoring and hence an error that is close to uniform on the simplex.
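The perturbation process described above can be illustrated with a minimal sketch. It assumes a made-up four-part composition and a made-up perturbation vector; the helper names (closure, perturb, log_ratio) are illustrative and do not come from the abstract. The sketch checks two properties: a log-ratio of two parts is the same in the full composition and in a sub-composition, and a perturbation shifts that log-ratio additively.

```python
import math

def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def perturb(x, p):
    """Aitchison perturbation: component-wise product, then closure."""
    return closure([xi * pi for xi, pi in zip(x, p)])

def log_ratio(x, i, j):
    """Natural log of the ratio of part i to part j."""
    return math.log(x[i] / x[j])

# Hypothetical composition and the sub-composition of its first three parts.
full = closure([0.5, 0.3, 0.15, 0.05])
sub = closure(full[:3])

# Sub-compositional coherence: the log-ratio of parts 0 and 1 is unchanged.
lr_full = log_ratio(full, 0, 1)
lr_sub = log_ratio(sub, 0, 1)

# Perturbation acts additively on log-ratios.
shift = [1.2, 0.8, 1.0, 1.0]           # hypothetical perturbation vector
perturbed = perturb(full, shift)
lr_perturbed = log_ratio(perturbed, 0, 1)
expected_shift = math.log(shift[0] / shift[1])

print(lr_full, lr_sub)
print(lr_perturbed - lr_full, expected_shift)
```

The other processes listed in the abstract (mixing, detection limits, rounding) act on the raw parts rather than multiplicatively, which is why they are not expected to preserve log-ratios in the same way.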
Practical aspects of compositional data analysis using regional geochemical survey data
Government geological surveys and mineral exploration companies collect large amounts of geochemical data, which are used in the search for mineral commodities or for determining environmental
disturbances. These surveys consist of many thousands of samples (observations) with as many as
50 elements determined for each. Because the nature of the data is compositional, they must be
treated according to the protocols established by John Aitchison and others. This contribution details
an approach based on the application of the alr, clr and ilr transforms for process discovery and validation. Issues around the treatment of zeros and/or missing values are complicated by the
stoichiometric nature of the data. Case studies are presented where the use of logratio transforms
and the estimation of replacement values for missing data are considered in the context of stoichiometric constraints.
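As a rough illustration of the three transforms named above, the following sketch implements alr, clr and one possible ilr (pivot coordinates) using only the Python standard library. The sample values are invented, the function names are illustrative, and the ilr basis shown is one choice among many; the abstract does not say which basis the authors use. All parts must be strictly positive, which is why zeros and missing values need separate treatment before any of these transforms can be applied.

```python
import math

def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def alr(x):
    """Additive log-ratio: log of each part over the last part."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean."""
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def ilr_pivot(x):
    """Isometric log-ratio in pivot coordinates (an assumed basis choice)."""
    logs = [math.log(xi) for xi in x]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        k = len(rest)                       # number of parts after part i
        scale = math.sqrt(k / (k + 1))
        coords.append(scale * (logs[i] - sum(rest) / k))
    return coords

# Hypothetical four-part sample (arbitrary units before closure).
raw = [48.0, 12.0, 9.0, 31.0]
sample = closure(raw)

alr_coords = alr(sample)
clr_coords = clr(sample)
ilr_coords = ilr_pivot(sample)

print(alr_coords)
print(clr_coords)
print(ilr_coords)
```

The clr coordinates sum to zero, and the ilr coordinates have the same Euclidean norm as the clr coordinates, which is the sense in which the ilr is isometric.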
The single component geochemical map: Fact or fiction?
Single component geochemical maps are the most basic representation of spatial elemental distributions and are commonly used in environmental and exploration geochemistry. However, the compositional nature of geochemical data imposes several limitations on how the data should be presented. The problems relate to the constant sum constraint (closure) and to the inherently multivariate, relative information conveyed by compositional data. Well known, for instance, is the tendency of all heavy metals to show lower values in soils with significant contributions of diluting elements (e.g., the quartz dilution effect); or the contrary effect, apparent enrichment in many elements due to removal of potassium during weathering. The validity of classical single component maps is thus investigated, and reasonable alternatives that honour the compositional character of geochemical concentrations are presented. The first recommended method relies on knowledge-driven log-ratios, chosen to highlight certain geochemical relations or to filter known artefacts (e.g. dilution with SiO2 or volatiles). This is similar to the classical approach of normalising to a single element. The second approach uses so-called log-contrasts, which employ suitable statistical methods (such as classification techniques, regression analysis, principal component analysis, clustering of variables, etc.) to extract potentially interesting geochemical summaries. The caution from this work is that if a compositional approach is not used, it becomes difficult to guarantee that any identified pattern, trend or anomaly is not an artefact of the constant sum constraint. In summary, the authors recommend a chain of enquiry that involves searching for the appropriate statistical method that can answer the required geological or geochemical question whilst maintaining the integrity of the compositional nature of the data.
The required log-ratio transformations should be applied, followed by the chosen statistical method. Interpreting the results may require a closer working relationship between statisticians, data analysts and geochemists.
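The quartz dilution effect mentioned above can be sketched numerically. The example assumes an invented soil with four parts (the values and part names are hypothetical, not taken from the paper) and adds pure SiO2 before re-closing to 100%. The raw Cu value drops, while the Zn/Cu log-ratio is unaffected.

```python
import math

def closure(parts, total=100.0):
    """Rescale parts so that they sum to `total` (weight percent here)."""
    s = sum(parts)
    return [total * p / s for p in parts]

# Hypothetical soil in wt%: Cu, Zn, SiO2, everything else.
names = ["Cu", "Zn", "SiO2", "other"]
soil = closure([0.02, 0.08, 60.0, 39.9])

# Dilute with quartz: add 50 units of SiO2 to 100 units of soil, re-close.
diluted_raw = list(soil)
diluted_raw[names.index("SiO2")] += 50.0
diluted = closure(diluted_raw)

# Single-component view: Cu appears to fall by a third.
cu_ratio = diluted[0] / soil[0]

# Log-ratio view: the Zn/Cu relation is untouched by the dilution.
lr_before = math.log(soil[1] / soil[0])
lr_after = math.log(diluted[1] / diluted[0])

print(cu_ratio)
print(lr_before, lr_after)
```

A map of the raw Cu values would show the diluted sample as a low, whereas a map of the log-ratio would not, which is the kind of artefact the abstract warns about.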
The stoichiometry of mineral compositions
Previous work by John Aitchison (1999) showed how log-ratio compositional data analysis can illuminate the relationships between components of a composition based on mineral constituents.
However, his analysis was framed in terms of weight-based compositions, so it did not illustrate
directly the stoichiometric relationships of the olivine minerals he investigated. We show how applying log-ratio compositional data analysis to the mole-based composition illustrates the stoichiometric
relationships directly by investigating olivines, alkali feldspars and plagioclases. This approach has
the potential to provide much greater meaning to geochemists than one based on weight-based composition.
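The difference between weight-based and mole-based compositions can be sketched for ideal olivine, (Mg,Fe)2SiO4. The example below is not the authors' analysis: it builds synthetic weight-percent oxide compositions from the ideal formula, using approximate molar masses, and converts them back to moles. The function names and the chosen forsterite fractions are assumptions for illustration.

```python
# Approximate molar masses in g/mol.
MOLAR_MASS = {"MgO": 40.304, "FeO": 71.844, "SiO2": 60.084}

def weight_percent_from_moles(moles):
    """Convert oxide mole amounts to weight percent oxide."""
    grams = {ox: n * MOLAR_MASS[ox] for ox, n in moles.items()}
    total = sum(grams.values())
    return {ox: 100.0 * g / total for ox, g in grams.items()}

def moles_from_weight_percent(wt_percent):
    """Convert weight percent oxide to mole amounts per 100 g."""
    return {ox: w / MOLAR_MASS[ox] for ox, w in wt_percent.items()}

def ideal_olivine_wt(fo):
    """Weight percent oxides of ideal olivine with Mg/(Mg+Fe) = fo."""
    return weight_percent_from_moles(
        {"MgO": 2.0 * fo, "FeO": 2.0 * (1.0 - fo), "SiO2": 1.0}
    )

molar_ratios = []
weight_ratios = []
for fo in (0.9, 0.7, 0.5):                 # hypothetical forsterite fractions
    wt = ideal_olivine_wt(fo)
    mol = moles_from_weight_percent(wt)
    # Amalgamate MgO and FeO, then compare with SiO2 in each unit system.
    molar_ratios.append((mol["MgO"] + mol["FeO"]) / mol["SiO2"])
    weight_ratios.append((wt["MgO"] + wt["FeO"]) / wt["SiO2"])

print(molar_ratios)    # constant: the 2:1 lattice-site constraint
print(weight_ratios)   # varies with the Mg/Fe proportion
```

In molar units the amalgamation (MgO + FeO) stays in a fixed 2:1 ratio to SiO2 regardless of the Mg/Fe proportion, whereas the same ratio in weight units drifts, so the stoichiometric constraint is only directly visible in the mole-based composition.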
GeoCoDA: Recognizing and Validating Structural Processes in Geochemical Data. A Workflow on Compositional Data Analysis in Lithogeochemistry
Geochemical data are compositional in nature and are subject to the problems
typically associated with data that are restricted to the real non-negative
number space with constant-sum constraint, that is, the simplex. Geochemistry
can be considered a proxy for mineralogy, comprised of atomically ordered
structures that define the placement and abundance of elements in the mineral
lattice structure. Based on the innovative contributions of John Aitchison, who
introduced the logratio transformation into compositional data analysis, this
contribution provides a systematic workflow for assessing geochemical data in
an efficient way, such that significant geochemical (mineralogical) processes
can be recognized and validated. The results of a workflow, called GeoCoDA and
presented here in the form of a tutorial, enable the recognition of processes
from which models can be constructed based on the associations of elements that
reflect mineralogy. Both the original compositional values and their
transformation to logratios are considered. These models can reflect
rock-forming processes, metamorphism, alteration and ore mineralization. Moreover,
machine learning methods, both unsupervised and supervised, applied to an
optimized set of subcompositions of the data, provide a systematic, accurate,
efficient and defensible approach to geochemical data analysis. The workflow is
illustrated on lithogeochemical data from exploration of the Star kimberlite,
consisting of a series of eruptions with five recognized phases.
Comment: 38 pages, 18 figures (including Supplementary Material)
- …
