Metabolomics is the large-scale study of metabolites, the small molecules involved in essential life-sustaining chemical processes. Untargeted metabolomics has provided insights in a wide array of fields, including medical diagnostics, drug discovery and personalised medicine. Measurements in metabolomics studies are routinely performed using liquid chromatography-mass spectrometry (LC-MS) instruments. With tandem mass spectrometry (MS/MS), fragmentation peaks characteristic of a compound can be obtained and used to help establish its identity.
Fragmentation spectra, which provide the characteristic fingerprints of compounds, also contain structural information: a subset of fragment peaks may correspond to a chemical substructure shared across a class of compounds. The aim of this site is to provide an online platform that allows users to perform unsupervised substructure discovery in fragmentation experiments, decompose fragmentation experiments into characterized substructures (Mass2Motifs) found in MS/MS spectra of reference compounds, and integrate fragmentation analysis with comparative metabolomics experiments.
How does it work?
In our proposed method (MS2LDA), discrete fragment and neutral loss features are extracted from fragmentation spectra. Related features that tend to co-occur are then detected using the Latent Dirichlet Allocation (LDA) model. The figure below shows the analogy between LDA for text and MS2LDA for fragment and neutral loss features. LDA finds topics that can be interpreted as ‘football-related’, ‘business-related’ and ‘environment-related’. MS2LDA finds sets of co-occurring mass fragments or losses (Mass2Motifs) that can be interpreted as ‘Asparagine-related’, ‘Hexose-related’ and ‘Adenine-related’.
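To make the text analogy concrete, the sketch below turns each fragmentation spectrum into a "document" of fragment and neutral loss "words". It is an illustration only, not the MS2LDA implementation: the function name spectrum_to_words, the two-decimal binning, the scaling of intensities to integer counts and the toy m/z values are all assumptions made for this example.

```python
from collections import Counter


def spectrum_to_words(precursor_mz, peaks, decimals=2, max_count=100):
    """Convert one MS/MS spectrum into a bag of fragment and loss 'words'.

    peaks is a list of (mz, intensity) pairs. Intensities are scaled so that
    the base peak gets max_count and are rounded to integer word counts
    (an assumption of this sketch, not a documented MS2LDA setting).
    """
    words = Counter()
    if not peaks:
        return words
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return words
    for mz, intensity in peaks:
        count = round(max_count * intensity / base)
        if count <= 0:
            continue
        # Fragment feature: the peak m/z binned to a fixed number of decimals.
        words[f"fragment_{mz:.{decimals}f}"] += count
        # Neutral loss feature: precursor m/z minus fragment m/z.
        loss = precursor_mz - mz
        if loss > 0:
            words[f"loss_{loss:.{decimals}f}"] += count
    return words


# Toy spectra (hypothetical values): name -> (precursor m/z, peaks).
spectra = {
    "doc_1": (268.1040, [(136.0618, 900.0), (119.0352, 90.0)]),
    "doc_2": (348.0704, [(136.0618, 500.0), (97.0284, 250.0)]),
}

# Each spectrum becomes one document of word counts.
corpus = {name: spectrum_to_words(mz, peaks) for name, (mz, peaks) in spectra.items()}

# Words shared by both documents are the kind of co-occurrence a topic model exploits.
shared = set(corpus["doc_1"]) & set(corpus["doc_2"])
print(shared)
```

The resulting word counts form a document-by-word table, which is the kind of input a general-purpose LDA implementation expects; features that repeatedly appear together across many spectra would then be grouped into the same topic.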
The tool currently accepts fragmentation experiments in .mzML and .MSP formats. Optionally, an MS1 peak list can also be supplied; the MS1 peaks found in the fragmentation experiment are then matched to this list before LDA or Decomposition is run.
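One plausible way to picture the MS1 matching step is a nearest-neighbour search within an m/z and retention time window. The sketch below is an assumption for illustration, not the site's actual matching code: the function name match_ms1_peaks, the ppm and retention time tolerances and the example peaks are all hypothetical.

```python
def match_ms1_peaks(fragmentation_peaks, peak_list, ppm_tol=5.0, rt_tol=10.0):
    """Match MS1 peaks from a fragmentation file to a user-supplied peak list.

    Both inputs are lists of (mz, rt_seconds) tuples. For each fragmentation
    peak, return the index of the peak-list entry with the smallest m/z error
    that lies within both tolerances, or None if there is no such entry.
    """
    matches = {}
    for i, (mz, rt) in enumerate(fragmentation_peaks):
        best_index = None
        best_error = None
        for j, (ref_mz, ref_rt) in enumerate(peak_list):
            # Relative m/z error in parts per million.
            ppm_error = abs(mz - ref_mz) / ref_mz * 1e6
            if ppm_error <= ppm_tol and abs(rt - ref_rt) <= rt_tol:
                if best_error is None or ppm_error < best_error:
                    best_index, best_error = j, ppm_error
        matches[i] = best_index
    return matches


# Hypothetical peaks: (m/z, retention time in seconds).
fragmentation_peaks = [(268.1040, 120.0), (348.0704, 300.0), (500.0000, 50.0)]
peak_list = [(268.1045, 123.0), (348.0700, 500.0), (499.9990, 52.0)]

matches = match_ms1_peaks(fragmentation_peaks, peak_list)
print(matches)
```

In this example the second fragmentation peak stays unmatched because, although its m/z agrees with a peak-list entry, the retention times differ by more than the assumed tolerance.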
Ms2lda.org provides access to the LDA and Decomposition models and includes the following visualisation features:
In addition, the following features are provided to facilitate integration with metabolomics experiments by:
The data and code for the paper can be found at http://dx.doi.org/10.5525/gla.researchdata.313. A new version of MS2LDA that allows topics (i.e. Mass2Motifs) to be inferred across multiple document collections (i.e. fragmentation files) at once can be found at http://github.com/sdrogers/lda. The rest of the pipeline code, which processes fragmentation data and loads it into the pipeline, can also be found there. The code for this website itself, alongside various visualisation modules, can be found at http://github.com/sdrogers/ms2ldaviz.