

15th January 2021: The site has finally been migrated to Python 3, and job submission is now working again! If you encounter any further problems submitting jobs, please let us know. Additionally, if you have any old, unused experiments, please delete them to help us save space on the database.

25th November 2018: We have introduced many new features to make interpreting Mass2Motifs easier:

  • A new upload function has been added to the Create Experiment page to allow large locally-run LDA experiments to be uploaded (script under development).
  • MotifDB (a database of curated and annotated Mass2Motifs) has been incorporated into the system, along with functionality to predict ClassyFire substituent terms for the spectra in your dataset using a neural network.
  • Automatic MAGMa annotations of fragment and neutral loss features have also been incorporated into the system.

4th August 2017: Our new paper on the use of MS2LDA to investigate the variability in substructure prevalence across large experiments is now published in Analytical Chemistry: Unsupervised Discovery and Comparison of Structural Families Across Multiple Samples in Untargeted Metabolomics. Additionally, we have changed the way the system handles thresholds for links between spectra and motifs. Every experiment can now set a threshold on both probability and overlap_score. If you do not wish to threshold on one or the other, set the respective threshold to zero. All existing experiments have been migrated so that the new settings give identical output to the old ones. We think that this improved flexibility will make the system more user-friendly.
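The two-threshold behaviour described above can be illustrated with a minimal sketch. This is not the site's actual code: the function name, the dictionary keys other than probability and overlap_score, and the use of "greater than or equal to" as the comparison are all assumptions made for illustration.

```python
def filter_links(links, min_probability=0.0, min_overlap_score=0.0):
    """Keep spectrum-motif links that pass both thresholds.

    Assumption: a link is kept when each value is >= its threshold, so a
    threshold of zero effectively switches that criterion off.
    """
    return [
        link
        for link in links
        if link["probability"] >= min_probability
        and link["overlap_score"] >= min_overlap_score
    ]


# Hypothetical links between spectra and Mass2Motifs.
example_links = [
    {"spectrum": "s1", "motif": "m1", "probability": 0.40, "overlap_score": 0.80},
    {"spectrum": "s2", "motif": "m1", "probability": 0.02, "overlap_score": 0.90},
    {"spectrum": "s3", "motif": "m2", "probability": 0.30, "overlap_score": 0.10},
]

# Threshold on both values, then on probability only (overlap threshold zero).
both = filter_links(example_links, min_probability=0.05, min_overlap_score=0.3)
prob_only = filter_links(example_links, min_probability=0.05, min_overlap_score=0.0)
```

With these made-up values, thresholding on both criteria keeps only the first link, while setting the overlap_score threshold to zero also keeps the third.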

26th September 2017: Our application note describing this Web application is now published in Bioinformatics: web-based topic modelling for substructure discovery in mass spectrometry.

Unsupervised Substructure Discovery in Metabolomics

Metabolomics is the large-scale, untargeted study of the small molecules involved in essential life-sustaining chemical processes (metabolites). Untargeted metabolomics has provided insights into a wide array of fields, such as medical diagnostics, drug discovery and personalised medicine. Measurements in metabolomics studies are routinely performed using liquid chromatography mass spectrometry (LC-MS) instruments. Using tandem mass spectrometry, fragmentation peaks characteristic of a compound can be obtained and used to help establish the identity of the compound.

Fragmentation spectra, which provide the characteristic fingerprints of compounds, also contain structural information where a subset of fragment peaks may correspond to a shared chemical substructure in a class of compounds. The aim of this site is to provide an online platform that allows users to perform unsupervised substructure discovery in fragmentation experiments, decompose fragmentation experiments into characterized substructures (Mass2Motifs) found in MS/MS spectra of reference compounds, and integrate fragmentation analysis with comparative metabolomics experiments.

How does it work?

In our proposed method (MS2LDA), discrete fragment and neutral loss features are extracted from fragmentation spectra. Related features that tend to co-occur are detected using the Latent Dirichlet Allocation (LDA) model. The figure below shows the analogy between LDA for text and MS2LDA for fragment and neutral loss features. LDA finds topics interpreted as ‘football-related’, ‘business-related’ and ‘environment-related’. MS2LDA finds sets of co-occurring mass fragments or losses (Mass2Motifs) that can be interpreted as ‘Asparagine-related’, ‘Hexose-related’ and ‘Adenine-related’.
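As a rough illustration of the feature-extraction step, the sketch below turns one fragmentation spectrum into a bag of discrete "words", in the same way a text document is a bag of words for LDA. It is only a sketch under stated assumptions: the bin width, the minimum loss, the intensity scaling and all function names are hypothetical and are not taken from the MS2LDA code.

```python
from collections import Counter

BIN_WIDTH = 0.005  # assumed bin width in Da, chosen for illustration only


def bin_label(kind, mz, bin_width=BIN_WIDTH):
    """Map a continuous m/z value to a discrete word, e.g. 'fragment_136.060'."""
    centre = round(mz / bin_width) * bin_width
    return f"{kind}_{centre:.3f}"


def spectrum_to_words(precursor_mz, peaks, min_loss=10.0):
    """Convert one MS/MS spectrum into fragment and neutral-loss word counts.

    peaks is a list of (mz, intensity) pairs. Intensities are scaled to the
    base peak and rounded to integers (0-100) that play the role of word
    counts. Neutral losses are precursor m/z minus fragment m/z.
    """
    words = Counter()
    if not peaks:
        return words
    base_intensity = max(intensity for _, intensity in peaks)
    for mz, intensity in peaks:
        count = int(round(100 * intensity / base_intensity))
        if count == 0:
            continue
        words[bin_label("fragment", mz)] += count
        loss = precursor_mz - mz
        if loss >= min_loss:  # assumed cut-off to skip very small losses
            words[bin_label("loss", loss)] += count
    return words


# Made-up spectrum: precursor at m/z 268.104 with two fragment peaks.
example_words = spectrum_to_words(268.104, [(136.062, 1000.0), (119.035, 250.0)])
```

Stacking such word counts for every spectrum gives a spectrum-by-feature count matrix, which is the kind of input an LDA implementation expects.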


The tool currently accepts fragmentation experiments in several formats (mzML, MSP, MGF). Optionally, an MS1 peak list can be added; the MS1 peaks found in the fragmentation experiment are then matched to this list prior to running LDA or Decomposition.
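One plausible way to match MS1 peaks against a supplied peak list is by m/z and retention-time tolerance, sketched below. The matching rule, the tolerance values and the function name are assumptions for illustration; the source does not specify how the site performs this step.

```python
def match_ms1(fragmentation_ms1, peak_list, ppm_tol=5.0, rt_tol=10.0):
    """Match each MS1 peak from the fragmentation file to a peak-list entry.

    Both inputs are lists of (mz, retention_time_seconds) pairs. A candidate
    matches when its m/z error is within ppm_tol and its retention time is
    within rt_tol seconds; the candidate with the smallest ppm error wins.
    Returns a dict mapping fragmentation-peak index to peak-list index,
    or to None when nothing matches.
    """
    matches = {}
    for i, (mz, rt) in enumerate(fragmentation_ms1):
        best_index = None
        best_error = None
        for j, (ref_mz, ref_rt) in enumerate(peak_list):
            ppm_error = abs(mz - ref_mz) / ref_mz * 1e6
            if ppm_error <= ppm_tol and abs(rt - ref_rt) <= rt_tol:
                if best_error is None or ppm_error < best_error:
                    best_index, best_error = j, ppm_error
        matches[i] = best_index
    return matches


# Made-up values: the first peak has two candidates, the second has none.
ms1_from_fragmentation = [(268.1040, 300.0), (150.0000, 120.0)]
ms1_peak_list = [(268.1045, 302.0), (268.1041, 304.0), (150.0100, 120.0)]
example_matches = match_ms1(ms1_from_fragmentation, ms1_peak_list)
```

In this example the first peak is matched to the closer of its two candidates, and the second peak is left unmatched because its m/z error exceeds the tolerance.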


The following literature is relevant to MS2LDA

To run your own instance, please refer to the code available at