Table of Contents

  1. Getting Started
  2. Prerequisites
  3. Analysing Your Data
  4. Creating an Experiment
  5. Summary Page
  6. Mass2Motif Matching
  7. Interactive Network Visualisation
  8. Combining MS1 Differential Expression or Prevalence with MS2LDA

1. Getting Started

To log into MS2lda.org, you need to create an account. However, a guest account is also available to explore the system without registration, and for most experiments its functionalities allow you to browse through example data sets. To create your own experiment, you will need to request an account, which ensures that your data is visible to you and the collaborators of your choice.

The following experiments are automatically linked to your account for browsing and exploration:


2. Prerequisites

To analyse your data in MS2lda.org, you first need:

  1. Your fragmentation data in mzML, MSP or MGF formats
  2. A list of MS1 peaks (optional). When fragmentation data in mzML format is provided, this list will be used to seed the MS1 peaks during feature extraction so only peaks that match the MS1 list within certain m/z and RT tolerances will be used.
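
To make the MS1 seeding step more concrete, here is a minimal Python sketch of matching a fragmented precursor against an MS1 peak list. It assumes an m/z tolerance expressed in ppm and a retention-time tolerance in seconds; the names `Peak` and `match_to_ms1` and the tolerance values are hypothetical illustrations, not MS2lda.org's actual code or defaults.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peak:
    """A peak with an m/z value and a retention time (RT, in seconds)."""
    mz: float
    rt: float


def ppm_diff(mz_a: float, mz_b: float) -> float:
    """Absolute m/z difference expressed in parts per million of mz_b."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def match_to_ms1(precursor: Peak, ms1_list: List[Peak],
                 mz_tol_ppm: float = 5.0, rt_tol: float = 10.0) -> Optional[Peak]:
    """Return the closest MS1 peak (by m/z) within both tolerances, or None.

    Hypothetical tolerances: 5 ppm and 10 s are illustrative values only.
    """
    candidates = [p for p in ms1_list
                  if ppm_diff(precursor.mz, p.mz) <= mz_tol_ppm
                  and abs(precursor.rt - p.rt) <= rt_tol]
    if not candidates:
        return None  # no matching MS1 peak, so this precursor would be skipped
    return min(candidates, key=lambda p: ppm_diff(precursor.mz, p.mz))


ms1_list = [Peak(mz=156.0768, rt=300.0), Peak(mz=204.0899, rt=450.0)]
print(match_to_ms1(Peak(mz=156.0770, rt=305.0), ms1_list))  # Peak(mz=156.0768, rt=300.0)
print(match_to_ms1(Peak(mz=156.0770, rt=400.0), ms1_list))  # None (RT too far away)
```

Requiring both tolerances to hold is what makes comparable retention times between the MS1 list and the fragmentation file important.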

Once you have these available, click on the Create Experiment button on the Experiment screen, shown in (A) below. A screen will appear asking you to upload your data and define the parameters for feature extraction and inference (see Section 4 for more details). Upon clicking submit, the experiment will be processed in a job queue. While it is processing, it is shown in the list of Pending Experiments, shown in (B) below, until it has finished. Depending on the size of the data and on whether other experiments are running, processing can take from a few hours to considerably longer. Completed experiments are listed on the main page.

Create Experiment


3. Analysing Your Data

Once processed, LDA experiments (where substructures are discovered in an unsupervised manner) can be found in the list of LDA experiments. Experiments that are editable are shown in bold in the list, while read-only experiments are shown in normal font weight. Clicking an experiment will expand it into tabs, where additional functionalities can be selected.

LDA Experiments

The following functionalities are available for an LDA experiment:

Most of these pages, including the visualisation, have an excellent Search Function with which Mass2Motifs, Mass2Motif annotations, and/or parent ions can be found quickly and conveniently.

The following sections describe all the functionalities of MS2lda.org in greater detail:


4. Creating an Experiment

To upload your own data, please take care to select and submit the correct format and to fill in the correct filters for RT and mass intensities in MS1 and MS2 (the defaults are suitable only for Thermo Q-Exactive spectral files). Including noise does not contribute to substructure discovery and makes the LDA process run much slower. It is therefore very important to check the noise level in your data and to set the minimum MS2 intensity to include accordingly. For example, ToF-based machines generate spectra with noise levels typically around 100 a.u., whereas the default is set to 5000 for Q-Exactive spectra. If possible, we recommend first submitting a small subset of the data to check that processing completes as expected. If you have MS1 peak information available, we also advise running MS2LDA first without the MS1 peak CSV to ensure that this works, before formatting the MS1 peak file correctly and submitting it as well.
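
The effect of the minimum MS2 intensity can be sketched as a simple threshold on fragment peaks. The snippet below is a hypothetical illustration (the function `filter_noise` and the example intensities are invented) and assumes that fragments at or above the threshold are kept. It shows how the Q-Exactive default of 5000 can discard genuine peaks in a ToF spectrum whose noise sits around 100 a.u.

```python
from typing import List, Tuple

# A spectrum as a list of (m/z, intensity) fragment peaks.
Spectrum = List[Tuple[float, float]]


def filter_noise(spectrum: Spectrum, min_ms2_intensity: float) -> Spectrum:
    """Keep only fragments whose intensity is at or above the threshold (assumed inclusive)."""
    return [(mz, intensity) for mz, intensity in spectrum
            if intensity >= min_ms2_intensity]


# Invented ToF-like spectrum: two genuine fragments and one noise peak.
tof_spectrum: Spectrum = [(110.0718, 850.0), (156.0768, 2400.0), (95.0500, 60.0)]

print(filter_noise(tof_spectrum, 5000))  # []  (the Q-Exactive default removes every peak here)
print(filter_noise(tof_spectrum, 100))   # [(110.0718, 850.0), (156.0768, 2400.0)]
```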

If you upload an MS1 peak list covering one or multiple files, those peaks will be matched to the extracted MS1-MS2 pairs according to thresholds that you can set. Please check the website for the requirements of the MS1 peak file. One of the experiments that you can view contains examples of MS1 comparisons, which you can use to find Mass2Motifs that contain metabolites discriminating between two groups. Please also note that MS2 masses are binned into 0.005 Da bins by default, so the masses displayed for them are no longer ‘accurate’ masses. There is now an option to choose a different bin size in case that is more appropriate for the data.
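
A minimal sketch of fixed-width binning shows why binned masses are no longer accurate masses. It assumes that each m/z value is replaced by the centre of its bin; whether MS2lda.org reports bin centres or some other representative value is an assumption here, and `bin_mass` is a hypothetical helper.

```python
import math


def bin_mass(mz: float, bin_width: float = 0.005) -> float:
    """Replace an m/z value by the centre of its fixed-width bin (rounded for display)."""
    index = math.floor(mz / bin_width)
    return round((index + 0.5) * bin_width, 6)


# Two fragments about 1.2 mDa apart fall into the same 0.005 Da bin...
print(bin_mass(110.07176), bin_mass(110.07300))  # 110.0725 110.0725
# ...and a wider 0.01 Da bin merges an even larger m/z range.
print(bin_mass(110.07176, bin_width=0.01))       # 110.075
```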

The following is a walkthrough on how to create your own experiment. First, click on Create Experiment and enter an experiment name and a description. Then select the format of the MS2 fragmentation file (.mzML, .MSP or .MGF). Finally, upload the fragmentation file in the correct format using the file selector.

Create Experiment

Depending on your choice of fragmentation file format, different fields will be shown to configure filtering and feature extraction parameters. For fragmentation data in .mzML format, the parameters are:

Create Experiment (mzML)

For fragmentation data in .MSP or .MGF format, the parameters are:

Create Experiment (MSP, MGF)

For all formats, the following parameters for LDA inference have to be specified:

Finally, press the Submit Your Experiment button to submit the experiment. Upon job completion, you can view all experiments in your account, and those you uploaded yourself you can also edit, which allows you to start annotating the Mass2Motifs from an LDA run. To help you on the way, you can perform motif matching against previously run experiments. All these functionalities become available once you click on a finished experiment.


5. Summary Page

A good starting point for exploring LDA results is the Summary Page, which shows all the key results for your dataset. From here you can get an idea of how many spectra are in each Mass2Motif, so that you can set a reasonable threshold for the network visualisation (the minimum degree; if it is set too high, little data will be displayed). In the tab called “View Experiment options”, you can set the thresholds for a fragmented spectrum (document) to belong to a Mass2Motif. In our experience, a probability threshold of 0.1 and an overlap threshold of 0.3 are a good starting point for exploring the data. By default, both are set to 0.05. A final note: the MS2LDA model requires every fragmented spectrum to be part of at least one Mass2Motif. Therefore, in some cases, fragmented molecules might have a very high probability but a very low overlap with a Mass2Motif; this happens to molecules whose fragmentation spectrum is unique compared to all other spectra in the data set.
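
The effect of the two thresholds can be sketched as a filter over document-motif pairs, followed by counting the remaining documents per motif (the degree). This assumes a pair is kept only when both values are at or above their thresholds; the rows, motif names and helper functions are hypothetical, and the probability and overlap values are taken as given rather than computed.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical (document, motif, probability, overlap) rows; values are invented.
Membership = Tuple[str, str, float, float]
memberships: List[Membership] = [
    ("spec_1", "motif_7", 0.62, 0.45),
    ("spec_2", "motif_7", 0.15, 0.35),
    ("spec_3", "motif_7", 0.90, 0.02),  # high probability, very low overlap
    ("spec_3", "motif_9", 0.08, 0.60),
]


def filter_memberships(rows: List[Membership], prob_thresh: float = 0.1,
                       overlap_thresh: float = 0.3) -> List[Membership]:
    """Keep document-motif pairs that pass both the probability and overlap thresholds."""
    return [row for row in rows
            if row[2] >= prob_thresh and row[3] >= overlap_thresh]


def motif_degrees(rows: List[Membership]) -> Counter:
    """Degree of a motif = number of documents assigned to it."""
    return Counter(motif for _, motif, _, _ in rows)


print(motif_degrees(filter_memberships(memberships)))              # Counter({'motif_7': 2})
print(motif_degrees(filter_memberships(memberships, 0.05, 0.05)))  # Counter({'motif_7': 2, 'motif_9': 1})
```

Note how `spec_3` is dropped from `motif_7` despite its probability of 0.90, which mirrors the high-probability, low-overlap case described above.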

From the Summary Page in particular, the discovered Mass2Motifs can be studied and annotated by clicking a Mass2Motif in the Mass2Motif Details table, which lists the degree and annotation (if any) of each Mass2Motif. Clicking a Mass2Motif link shows details on the selected Mass2Motif, and an annotation can also be assigned from this screen. In the example below, we assign the annotation "Histidine substructure" to this Mass2Motif based on the top fragments (110.07176, 156.07684, etc.) shown in the table. The Mass2Motifs can also be accessed through the Show Mass2Motifs Page.

Motif Annotation


6. Mass2Motif Matching

When annotating a large number of Mass2Motifs, manual annotation can be tedious. The motif matching functionality can be used to speed up this process. It is launched from the Start Motif Matching link in the functionality tabs of an experiment. Matching is based on cosine similarity, with a user-configurable minimum score. To begin motif matching, select a motifset to match against, specify the minimum cosine similarity score for selecting candidate matches, and click the Start matching button.
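
As a sketch of how cosine-based matching with a minimum score could work, the example below treats each Mass2Motif as a sparse vector of feature probabilities and keeps, for every query motif, its best match if the score reaches the threshold. The dictionary format, the feature names such as `fragment_110.0725`, and the helpers `cosine` and `match_motifs` are hypothetical; MS2lda.org's internal representation may differ.

```python
import math
from typing import Dict, List, Tuple

# A Mass2Motif as a mapping from feature name to probability (hypothetical format).
Motif = Dict[str, float]


def cosine(a: Motif, b: Motif) -> float:
    """Cosine similarity between two sparse feature-probability vectors."""
    dot = sum(a[f] * b[f] for f in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_motifs(query: Dict[str, Motif], motifset: Dict[str, Motif],
                 min_score: float) -> List[Tuple[str, str, float]]:
    """Best match in `motifset` for each query motif, kept if score >= min_score."""
    matches: List[Tuple[str, str, float]] = []
    if not motifset:
        return matches
    for q_name, q_motif in query.items():
        best_name, best_score = max(
            ((t_name, cosine(q_motif, t_motif)) for t_name, t_motif in motifset.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= min_score:
            matches.append((q_name, best_name, round(best_score, 3)))
    return matches


query = {
    "motif_7": {"fragment_110.0725": 0.5, "fragment_156.0775": 0.3, "loss_46.0075": 0.2},
    "motif_12": {"loss_17.0275": 1.0},  # shares no features with the motifset
}
motifset = {
    "histidine_motif": {"fragment_110.0725": 0.55, "fragment_156.0775": 0.35},
    "other_motif": {"fragment_91.0525": 1.0},
}
print(match_motifs(query, motifset, min_score=0.7))
```

Only `motif_7` yields a candidate here; `motif_12` scores 0 against every motif and is therefore not proposed as a match.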

Motif Matching (Start)

Matching is performed in the background. Upon completion, match results are shown in the Manage Motif Matches screen, as shown below. The first column shows the original Mass2Motifs discovered in this dataset. The second and third columns show the best-matching Mass2Motifs (according to cosine similarity) in the target dataset, and the match score is shown in the next column. Clicking Add Link creates a link between the pair of Mass2Motifs, transferring the annotation from the matched Mass2Motif to the original one. It is important to realise that if the annotation of the matched Mass2Motif changes, so will the annotation of the linked Mass2Motif.

Motif Matching (Manage)


7. Interactive Network Visualisation

Interactive visualisation can be launched from the Start Visualisation link in the functionality tabs of an experiment. The minimum degree is the threshold for drawing edges that connect a Mass2Motif to the spectra it can explain: for example, a value of 5 means that edges are drawn only for Mass2Motifs connected to at least 5 spectra (at the thresholds specified in the experiment options). Please note that if most Mass2Motifs are connected to more than 5 spectra, the network might take a while to load; in that case, we advise increasing the minimum degree for the interactive network visualisation.
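
The minimum degree filter can be sketched as follows, assuming that Mass2Motifs below the threshold are simply left out of the drawn network and that the (spectrum, motif) pairs have already passed the thresholds from the experiment options. The function `network_edges` and the example pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def network_edges(pairs: List[Tuple[str, str]], min_degree: int) -> List[Tuple[str, str]]:
    """Return (motif, spectrum) edges for motifs linked to at least `min_degree` spectra.

    `pairs` holds (spectrum, motif) memberships that already passed the
    probability/overlap thresholds from the experiment options.
    """
    spectra_per_motif: Dict[str, Set[str]] = defaultdict(set)
    for spectrum, motif in pairs:
        spectra_per_motif[motif].add(spectrum)
    return [(motif, spectrum)
            for motif, spectra in sorted(spectra_per_motif.items())
            if len(spectra) >= min_degree
            for spectrum in sorted(spectra)]


# Invented memberships: motif_7 explains 5 spectra, motif_9 only 2.
pairs = [(f"spec_{i}", "motif_7") for i in range(1, 6)]
pairs += [("spec_1", "motif_9"), ("spec_6", "motif_9")]

print(len(network_edges(pairs, min_degree=5)))  # 5 (only motif_7 is drawn)
print(len(network_edges(pairs, min_degree=2)))  # 7 (both motifs are drawn)
```

Lowering the minimum degree adds edges for every additional Mass2Motif that passes, which is why a low value can make large networks slow to load.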

Visualisation (Start)

The next screen shows the interactive visualisation. Circles are Mass2Motifs, while squares are spectra (fragmented metabolites). If the network appears as a small pile of circles, left-click a Mass2Motif and drag it slightly away from the pile; the network will then 'explode'. The network can be enlarged or made smaller by zooming in or out with the mouse wheel or a similar action. Selecting (double-clicking on) a Mass2Motif in the network displays more information in the other panels, including the fragmentation spectra explained by this motif and the counts of occurrences of this motif amongst the spectra. The associated spectra (fragmented metabolites) are highlighted as well once the Mass2Motif is selected. Annotated Mass2Motifs are coloured red in the network, and their annotations become visible when hovering over them with the mouse. Other Mass2Motifs appear orange, and their Mass2Motif numbers appear when hovering over them. Similarly, information on the fragmented metabolites, including the precursor ion, appears when hovering over the squares. Motif nodes, annotations, and fragmented ions in the network can also be searched using the search box at the top of the page and then quickly and conveniently selected in the network.

Visualisation (Network)


8. Combining MS1 Differential Expression or Prevalence with MS2LDA

Where available, MS1 analysis can be performed to map the differential expression or prevalence of metabolites based on their MS1 intensities. To do so, you will need to upload a CSV file that includes the fragmented sample. Please look carefully at the requirements for the CSV file at the bottom of the Create Experiment page. Please also note that this is currently only possible in combination with an mzML file. During preprocessing, the fragmented features from the mzML fragmentation file are matched to those present in the CSV file based on their m/z values and retention times, so ensure that the retention times are comparable and that the m/z and retention time matching parameters are set correctly. Once the MS1 features are matched, MS1 analysis can be done using the Create MS1 analysis functionality. The list of sample names is shown in the middle of the page (see figure below). After selecting samples (e.g., 5 treatment replicates, one of which was fragmented), use the arrows to move them to group 1 on the left. Group 2 (on the right) can be populated in the same way. A t-test comparative analysis is then performed with group 1 over group 2.
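
For a single metabolite, a group 1 over group 2 comparison could look like the sketch below: a log2 fold change of the mean intensities plus a t statistic. Which t-test variant MS2lda.org uses, and whether intensities are log-transformed first, is not stated here, so Welch's t statistic on raw intensities is an illustrative assumption; `welch_t`, `compare` and the intensity values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Welch's t statistic for mean(group1) - mean(group2)."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


def compare(group1: Sequence[float], group2: Sequence[float]) -> Tuple[float, float]:
    """Return (log2 fold change of group 1 over group 2, Welch t statistic)."""
    log2_fc = math.log2(mean(group1) / mean(group2))
    return log2_fc, welch_t(group1, group2)


# Invented MS1 intensities of one metabolite in 5 treatment and 5 control samples.
treatment = [4.0e6, 4.4e6, 3.8e6, 4.2e6, 4.1e6]
control = [1.0e6, 1.1e6, 0.9e6, 1.0e6, 1.05e6]

fc, t = compare(treatment, control)
print(f"log2 fold change = {fc:.2f}, t = {t:.1f}")
```

A positive fold change and t statistic here correspond to "up" in group 1, which is what the network colouring in the visualisation reflects.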

Differential Expression

To inspect the results of the MS1 analysis, they need to be mapped onto the network on the visualisation page. After loading the network on the visualisation page, the user can toggle the Show MS1 analysis in the network option at the bottom left of the page, which changes the appearance of the network (see figure below). Mass2Motifs are now coloured green: the greener a Mass2Motif, the more differential metabolites contain it. Additionally, differential metabolites are coloured (red is up, blue is down; the darker, the larger the fold change) and sized according to their significance (the more significant, the larger). As in the regular network, users can click on Mass2Motifs to view the spectra and other statistics. However, the Mass2Motif name and/or number is now accompanied by its PLAGE score: the higher the score, the more differential metabolites the Mass2Motif is present in, independent of the direction of the fold change. The user can return to the 'standard network view' by switching off the Show MS1 analysis in the network option.
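
PLAGE (Pathway Level Analysis of Gene Expression) summarises a set of features per sample as the first right singular vector of their standardised intensity matrix, and then compares the groups with a t-test. The sketch below applies that idea to the metabolites that contain one Mass2Motif, using power iteration instead of a full SVD so that only the standard library is needed. How MS2lda.org computes and reports its PLAGE score exactly is not described here, so the absolute Welch t statistic used below is an assumption; `plage_score` and the intensities are hypothetical.

```python
import math
from statistics import mean, pstdev, variance
from typing import List, Sequence

Matrix = List[List[float]]  # rows = metabolites containing the motif, columns = samples


def zscore_rows(matrix: Matrix) -> Matrix:
    """Standardise each metabolite (row) across samples."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


def leading_right_singular_vector(X: Matrix, iters: int = 200) -> List[float]:
    """Power iteration on the Gram matrix X^T X (samples x samples)."""
    n = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    # Non-constant start: a constant vector lies in the null space of X
    # because every z-scored row sums to zero.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n
        v = [x / norm for x in w]
    return v


def plage_score(matrix: Matrix, group1: Sequence[int], group2: Sequence[int]) -> float:
    """|Welch t| between the groups' per-sample activity values (sign of v is arbitrary)."""
    activity = leading_right_singular_vector(zscore_rows(matrix))
    a = [activity[i] for i in group1]
    b = [activity[i] for i in group2]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


intensities = [
    [10.0, 11.0, 12.0, 2.0, 3.0, 1.0],  # metabolite A: higher in group 1
    [20.0, 19.0, 22.0, 5.0, 4.0, 6.0],  # metabolite B: higher in group 1
    [5.0, 4.0, 6.0, 5.0, 6.0, 4.0],     # metabolite C: no group difference
]
print(round(plage_score(intensities, group1=[0, 1, 2], group2=[3, 4, 5]), 1))
```

Taking the absolute value reflects that the singular vector's sign is arbitrary, which is consistent with the score being independent of the direction of the fold change.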

Differential Expression (Network)