Unsupervised Substructure Discovery in Metabolomics

Metabolomics is the large-scale, untargeted study of the small molecules involved in essential life-sustaining chemical processes (metabolites). Untargeted metabolomics has provided insights into a wide array of fields, such as medical diagnostics, drug discovery, personalised medicine and many others. Measurements in metabolomics studies are routinely performed using liquid chromatography mass spectrometry (LC-MS) instruments. Using tandem mass spectrometry, fragmentation peaks characteristic of a compound can be obtained and used to help establish the identity of the compound.

Fragmentation spectra, which provide the characteristic fingerprints of compounds, also contain structural information: a subset of fragment peaks may correspond to a chemical substructure shared by a class of compounds. The aim of this site is to provide an online platform that allows users to perform unsupervised substructure discovery in fragmentation experiments, decompose fragmentation experiments into characterized substructures (Mass2Motifs) found in MS/MS spectra of reference compounds, and integrate fragmentation analysis with comparative metabolomics experiments.

How does it work?

In our proposed method (MS2LDA), discrete fragment and neutral loss features are extracted from fragmentation spectra. Related features that tend to co-occur are detected using the Latent Dirichlet Allocation (LDA) model. The figure below shows the analogy between LDA for text and MS2LDA for fragment and neutral loss features. LDA finds topics interpreted as ‘football-related’, ‘business-related’ and ‘environment-related’. MS2LDA finds sets of co-occurring mass fragments or losses (Mass2Motifs) that can be interpreted as ‘Asparagine-related’, ‘Hexose-related’ and ‘Adenine-related’.
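The feature extraction step can be illustrated with a minimal sketch, in which one fragmentation spectrum becomes a bag of "words", in the same way a text document becomes word counts for LDA. The function name extract_features, the bin width, the neutral loss range and the scaling of intensities to counts between 0 and 100 are all assumptions made for this illustration, not necessarily the settings used by MS2LDA.

```python
from collections import Counter


def extract_features(precursor_mz, peaks, bin_width=0.01,
                     min_loss=10.0, max_loss=200.0):
    """Convert one MS/MS spectrum into fragment and neutral loss counts.

    peaks is a list of (mz, intensity) pairs. All parameter values are
    illustrative assumptions, not MS2LDA defaults.
    """
    doc = Counter()
    if not peaks:
        return doc
    max_intensity = max(intensity for _, intensity in peaks)
    for mz, intensity in peaks:
        # Scale intensity to an integer "word count" in the range 0..100.
        count = int(round(100 * intensity / max_intensity))
        if count == 0:
            continue  # peak too weak to contribute any counts
        # Fragment feature: the peak m/z snapped to the nearest bin.
        frag_bin = round(mz / bin_width) * bin_width
        doc[f"fragment_{frag_bin:.2f}"] += count
        # Neutral loss feature: precursor m/z minus fragment m/z.
        loss = precursor_mz - mz
        if min_loss <= loss <= max_loss:
            loss_bin = round(loss / bin_width) * bin_width
            doc[f"loss_{loss_bin:.2f}"] += count
    return doc


# Hypothetical spectrum: precursor at m/z 298.11 with three fragment peaks.
doc = extract_features(298.11, [(136.06, 1000.0), (119.03, 250.0), (50.0, 1.0)])
for feature, count in sorted(doc.items()):
    print(feature, count)
```

Stacking such counters for many spectra gives a spectrum-by-feature count matrix, which is the kind of input a standard LDA implementation expects; the fitted topics then play the role of Mass2Motifs.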


The tool currently accepts fragmentation experiments in several formats (mzML, MSP, MGF). Optionally, an MS1 peak list can be supplied; the MS1 peaks found in the fragmentation experiment are then matched to this list before running LDA or Decomposition.
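As a rough illustration of what such matching can look like, the sketch below pairs each MS1 peak from a fragmentation experiment with the closest entry in a user-supplied peak list, using an m/z tolerance in ppm and a retention time tolerance in seconds. The function name match_ms1_peaks, the tolerance values and the closest-m/z rule are assumptions for this example; the matching actually performed by the tool may differ.

```python
def match_ms1_peaks(frag_peaks, peak_list, mz_tol_ppm=5.0, rt_tol_s=10.0):
    """Match (mz, rt) peaks from a fragmentation file to a peak list.

    Returns a dict mapping each index in frag_peaks to the index of the
    best match in peak_list, or None if nothing is within tolerance.
    Tolerances are illustrative assumptions.
    """
    matches = {}
    for i, (mz, rt) in enumerate(frag_peaks):
        best, best_err = None, None
        for j, (ref_mz, ref_rt) in enumerate(peak_list):
            ppm_err = abs(mz - ref_mz) / ref_mz * 1e6
            within_tol = ppm_err <= mz_tol_ppm and abs(rt - ref_rt) <= rt_tol_s
            # Keep the candidate with the smallest m/z error.
            if within_tol and (best_err is None or ppm_err < best_err):
                best, best_err = j, ppm_err
        matches[i] = best
    return matches


# Hypothetical data: (m/z, retention time in seconds).
frag_peaks = [(298.1100, 120.0), (180.0634, 300.0)]
peak_list = [(298.1105, 118.0), (180.0634, 500.0), (298.2000, 120.0)]
matches = match_ms1_peaks(frag_peaks, peak_list)
print(matches)
```

Here the second fragmentation peak has an exact m/z match in the list but falls outside the retention time window, so it is left unmatched.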


The following literature is relevant to MS2LDA

Other papers that cite MS2LDA can be found here.

To run your own instance of ms2lda.org, please refer to the code available at http://github.com/glasgowcompbio/ms2ldaviz