Addressing missing data in geochemistry: a non-linear approach

Martin Schroeder, Dan Cornford, Paul Farrimond, Chris Cornford

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods cannot easily handle missing data, nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
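To illustrate how a probabilistic model of this kind can replace missing values, the sketch below treats the fitted model as a mixture of isotropic Gaussians with equal weights, which is the general form a generative topographic mapping takes in data space. It is not the authors' implementation: the function name `impute_isotropic_mixture`, the `centres` array and the inverse variance `beta` are hypothetical, and the model is assumed to have been fitted already. Component responsibilities are computed from the observed variables only, and each missing variable is filled with the responsibility-weighted mean of the component centres.

```python
import numpy as np


def impute_isotropic_mixture(x, centres, beta):
    """Fill NaNs in one sample using an equal-weight isotropic Gaussian mixture.

    x       : (D,) sample, with np.nan marking missing values
    centres : (K, D) component means (assumed already fitted)
    beta    : inverse noise variance shared by all components (assumed known)
    """
    x = np.asarray(x, dtype=float)
    centres = np.asarray(centres, dtype=float)
    observed = ~np.isnan(x)

    # Squared distance to every centre, using observed variables only.
    diff = centres[:, observed] - x[observed]
    sq_dist = np.sum(diff ** 2, axis=1)

    # Posterior responsibilities; the shared normalising constant cancels.
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max()  # subtract the maximum for numerical stability
    r = np.exp(log_r)
    r /= r.sum()

    # Conditional mean of the missing variables under the mixture.
    filled = x.copy()
    filled[~observed] = r @ centres[:, ~observed]
    return filled, r


# Toy example with two well-separated components (illustrative values only).
centres = np.array([[0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0]])
sample = np.array([0.1, np.nan, -0.1])
filled, resp = impute_isotropic_mixture(sample, centres, beta=1.0)
print(filled, resp)
```

Because the components are isotropic, the observed variables influence the filled-in value only through the responsibilities, so the estimate is a blend of component centres rather than a linear regression on the observed variables.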
Original language: English
Pages (from-to): 1162-1169
Number of pages: 8
Journal: Organic Geochemistry
Volume: 39
Issue number: 8
Publication status: Published - Aug 2008

Bibliographical note

Advances in Organic Geochemistry 2007 — Proceedings of the 23rd International Meeting on Organic Geochemistry

Keywords

  • petroleum geochemical
  • range of molecular and isotopic properties
  • spatially and temporally unrepresentative sampling pattern
  • linked plots
  • brushing
  • non-linear probabilistic model
  • Generative topographic mapping
