Applies to mineXpert 5.8.0

1 Generalities

In this chapter, I wish to introduce some general concepts around the mineXpert program and the way data elements are named in this manual and in the program.

A mass spectrometry experiment generally involves monitoring the m/z value of analytes injected into the mass spectrometer over a certain period of time. The m/z value of each detected analyte is recorded along with the corresponding signal intensity i, so that a mass spectrum is nothing but a series of (m/z,i) pairs recorded over the acquisition duration. Throughout the acquisition, the precise moment at which a given analyte is detected (and its (m/z,i) pair is recorded) is called the retention time of that analyte (rt). This retention time is not to be confused with the drift time of that analyte in an ion mobility mass spectrometry experiment.
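To make this vocabulary concrete, here is a minimal Python sketch of how such data might be modelled. The names `Peak`, `MassSpectrum`, `rt`, `dt` and `peaks` are assumptions chosen for illustration; they are not mineXpert's internal types.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical names for illustration; not mineXpert's actual types.
Peak = Tuple[float, float]  # one (m/z, i) pair


@dataclass
class MassSpectrum:
    """A mass spectrum: a series of (m/z, i) pairs plus its acquisition times."""

    rt: float                                        # retention time, in seconds here
    peaks: List[Peak] = field(default_factory=list)  # the (m/z, i) pairs
    dt: Optional[float] = None                       # drift time (ion mobility only)


# A toy spectrum recorded 12 s into the acquisition, without ion mobility.
spectrum = MassSpectrum(rt=12.0, peaks=[(500.25, 1200.0), (501.25, 640.0)])
```

Keeping the drift time optional reflects the distinction made above: every spectrum has a retention time, but only ion mobility experiments add a drift time.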

1.1 General concepts and terminologies

Most generally, the mass spectrometer acquires a large number of spectra in, say, one second. All these spectra are combined together, however, and on the surface the massist only sees a slow acquisition of 1 spectrum per second. This apparent acquisition rate is configurable. At the time of writing, generally 1 spectrum per second is recorded on disk. So if we record mass spectra for 5 minutes, we will have recorded 5 × 60 = 300 spectra.

1.2 Acquiring Mass Data Along Time: To Profile or Not To Profile?

As a mass spectrometry user, the reader of this manual certainly has used mass spectrometers where mass spectra are acquired and stored in different ways:

  • Mass spectra are acquired and summed, each new one with the previous ones, in such a manner that one is left, at the end of the acquisition, with a single spectrum whose peak intensities have been increasing all along the acquisition. Indeed, in this mode, each new spectrum is actually combined with the previously acquired ones. The resulting mass spectrum that is displayed on screen and that ultimately gets stored on disk is called a combined spectrum. This is typically the way MALDI-TOF mass spectrometers are used when acquiring data from samples deposited onto sample plates. We refer to this kind of acquisition as an accumulation mode acquisition;

  • Mass spectra are acquired and stored on disk as a single file containing all the spectra, appended one after the other. There is no combination of the spectra: each time a new spectrum is displayed on screen, that spectrum is appended to the file. [4] This is typically the case when mass spectra are acquired all along a chromatography run and is generally called a profile mode acquisition.

1.3 Mass Data Visualisation: To Combine or Not To Combine?

In the previous section, we mentioned spectrum combination a number of times. What does that mean, that spectra are combined together into a single combined spectrum? Say we have 200 spectra that need to be combined together into a single spectrum that summatively represents the data of these 200 spectra.

First, a new spectrum would be allocated (result spectrum), entirely empty at first. Then, the very first spectrum of the 200 spectra is literally copied into that result spectrum. At this point the combination occurs, according to an iterative process that has the following steps:

  • Pick the next spectrum of the 200-spectra dataset;

    1. Pick the first (m/z,i) pair of the currently iterated spectrum;

    2. Look up in the result spectrum whether an m/z value identical to that of the current (m/z,i) pair is already present;

    3. If the m/z value is found, increment its intensity by the intensity of the (m/z,i) pair;

    4. Else, if the m/z value is not found, add the current (m/z,i) pair to the result spectrum;

    5. Iterate over all the remaining (m/z,i) pairs of the current spectrum and redo these steps.

  • Iterate over all the 198 remaining spectra of the dataset and do the steps above for each single iterated spectrum.

At the end of the two nested loops above, the combined spectrum is still a single spectrum that represents, summatively, all the 200 spectra. This whole process is computationally very intensive, in particular if:

  • The m/z range is large: there are lots of points in each spectrum, which means that for each new (m/z,i) pair we need to search through the long list of m/z values that make up the result spectrum;

  • The resolving power of the mass spectrometer is high: there are many points per m/z range unit.
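The combination procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mineXpert's implementation: the function name `combine_spectra` is hypothetical, a dictionary lookup replaces the search through the result spectrum, and two m/z values are considered identical only if they are exactly equal as floating-point numbers (real software would more likely bin them). The sketch also starts from an empty result instead of copying the first spectrum, which yields the same sums as long as no spectrum repeats an m/z value.

```python
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[float, float]   # one (m/z, i) pair
Spectrum = List[Peak]


def combine_spectra(spectra: Iterable[Spectrum]) -> Spectrum:
    """Sum the intensities of identical m/z values over all the spectra."""
    result: Dict[float, float] = {}          # m/z -> summed intensity
    for spectrum in spectra:                 # outer loop: each spectrum
        for mz, intensity in spectrum:       # inner loop: each (m/z, i) pair
            # m/z found: increment its intensity; not found: add the pair.
            result[mz] = result.get(mz, 0.0) + intensity
    return sorted(result.items())            # pairs sorted by m/z


first = [(100.0, 10.0), (200.0, 5.0)]
second = [(100.0, 2.0), (300.0, 7.0)]
combined = combine_spectra([first, second])
# combined == [(100.0, 12.0), (200.0, 5.0), (300.0, 7.0)]
```

With a dictionary, each lookup takes roughly constant time, which sidesteps the cost of scanning a long result spectrum for every pair; the two nested loops themselves remain.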

When a profile mode acquisition is performed, the user gets a very large number of distinct spectra, all appended to a single file. These unitary spectra are virtually unusable unless an initial processing step is performed. This initial processing of the spectra is called total ion current chromatogram calculation. What is it? Let's say that the user has performed a profile mode mass spectrometry acquisition on the eluate of a chromatography column. Now, imagine that the spectrometer stores the mass data at a rate of one spectrum per second and that the chromatography gradient develops over 45 min: there would be a total of 45 × 60 = 2700 spectra in that file. The question is: how can we provide the user with a data representation that is both meaningful and useful to start mining the data?

The conventional way of doing so is to load all the mass spectra and compute the total ion current chromatogram (the TIC chromatogram). The analogy with chromatography is evident: the TIC chromatogram is the same as the UV chromatogram, except that optical density is not the physical property that is measured over time; instead, the amount of ions that are detected in the mass spectrometer is measured over time. That amount is actually the sum of the intensities of all the (m/z,i) pairs detected in each spectrum: for each retention time (RT), a TIC value is computed by summing the intensities of all the (m/z,i) pairs detected at that specific RT. When mass data are acquired during a chromatography run, the total ion current chromatogram often mirrors (mimics) the UV chromatogram[5].

How is this total ion current chromatogram computed? This is an iterative process: for each spectrum in turn, from the first one (retention time 0 s) and the second one (retention time 1 s) up to the last one (retention time 45 min), the program computes the sum of the intensities of all of that spectrum's (m/z,i) pairs. That computation ends up with a map that relates each RT value to the corresponding TIC value. The TIC chromatogram is nothing but a plot of the TIC values as a function of RT values. In that sense, it is indeed a chromatogram.
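As a rough illustration, the TIC computation can be written as follows in Python. The function name `tic_chromatogram` and the representation of a run as a list of (rt, peaks) tuples are assumptions made for this sketch only.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]               # one (m/z, i) pair
TimedSpectrum = Tuple[float, List[Peak]]  # (retention time, its pairs)


def tic_chromatogram(spectra: List[TimedSpectrum]) -> Dict[float, float]:
    """Map each RT value to the summed intensity of the spectra acquired at it."""
    tic: Dict[float, float] = {}
    for rt, peaks in spectra:
        # TIC of one spectrum: sum of the intensities of all its pairs.
        tic[rt] = tic.get(rt, 0.0) + sum(i for _, i in peaks)
    return tic


# Three spectra stored at 1 spectrum per second; the last one is empty.
run = [
    (0.0, [(100.0, 10.0), (200.0, 5.0)]),
    (1.0, [(100.0, 2.0)]),
    (2.0, []),
]
chromatogram = tic_chromatogram(run)
# chromatogram == {0.0: 15.0, 1.0: 2.0, 2.0: 0.0}
```

Plotting the values of that map against its keys gives the TIC chromatogram.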

mineXpert works exactly in this way. When mass spectrometry data are loaded from a file, the TIC chromatogram is computed and displayed. This TIC chromatogram serves as the basis for the mass data mining, as described in this manual. It is the starting point for operations that can be performed in various ways; not all of them are formally spectral combinations, which is why I prefer the term integrations. Some of these integrations are described below:

  • Integrating data from the TIC chromatogram to a single mass spectrum;

  • Integrating data from the TIC chromatogram to a single drift spectrum.

Note that the reverse actions are possible (and indeed necessary for thorough data mining): selecting a region of a mass spectrum and asking that the TIC chromatogram be reconstituted from there; or selecting a region of a drift spectrum and asking that the TIC chromatogram be reconstituted from there. Finally, integrations may, of course, be performed from a mass spectrum to a drift spectrum, and vice versa.
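One plausible reading of the first reverse action (mass spectrum region to chromatogram) is sketched below in Python: only the (m/z,i) pairs that fall inside the selected m/z range contribute to the ion current at each retention time. The function name `chromatogram_from_mz_range` and the data layout are hypothetical, and this is my assumption about the computation, not a description of mineXpert's code.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]               # one (m/z, i) pair
TimedSpectrum = Tuple[float, List[Peak]]  # (retention time, its pairs)


def chromatogram_from_mz_range(
    spectra: List[TimedSpectrum], mz_min: float, mz_max: float
) -> Dict[float, float]:
    """Map each RT value to the ion current carried by m/z values in the range."""
    chromatogram: Dict[float, float] = {}
    for rt, peaks in spectra:
        # Keep only the pairs whose m/z lies in [mz_min, mz_max].
        in_range = sum(i for mz, i in peaks if mz_min <= mz <= mz_max)
        chromatogram[rt] = chromatogram.get(rt, 0.0) + in_range
    return chromatogram


run = [
    (0.0, [(100.0, 10.0), (200.0, 5.0)]),
    (1.0, [(100.0, 2.0)]),
]
selected = chromatogram_from_mz_range(run, 150.0, 250.0)
# selected == {0.0: 5.0, 1.0: 0.0}
```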

1.4 Examples of Various Mass Spectral Data Integrations

In the sections below, the inner workings of mineXpert are described for some exemplary mass data integrations. For example, when doing ion mobility mass spectrometry data mining, it is essential to be able to characterize as finely as possible the drift time of each and every analyte. Since each analyte is actually defined as one or more (m/z,i) pairs, it is essential to be able to ask questions like the following:

  • What is the drift time of the ions below this mass peak?

  • What are all the drift times of all the analytes going through the mobility cell for a given retention time range?

  • What are all the ions that are responsible for this shoulder in the drift spectrum?

1.4.1 TIC -> MZ integration

What computation does mineXpert actually do when a mass spectrum is computed starting from a TIC chromatogram region, say between retention time RT minute 7 and RT minute 8.5?

  1. List all the mass spectra that were acquired between RT 7 and RT 8.5. The resulting list is the spectral set. It may contain many thousands of spectra, if we think that, in ion mobility mass spectrometry, ≈ 200 spectra are acquired and stored individually every second (I mean it, every 1 s time lapse): over 1.5 min, that is about 200 × 90 = 18,000 spectra;

  2. Allocate a new empty spectrum—the combined spectrum—and copy into it without modification the first spectrum of the spectral set;

  3. Go to the next spectrum of the spectral set and iterate over its (m/z,i) pairs:

    • Check if the m/z value of the iterated pair is already present in the combined spectrum. If so, increment the intensity of the combined spectrum's (m/z,i) pair by the intensity of the iterated pair. If not, simply copy the iterated (m/z,i) pair into the combined spectrum;

    • Iterate over all the remaining (m/z,i) pairs and perform the same action as above.

  4. Iterate over all the remaining spectra of the spectral set and perform step number 3.

mineXpert then displays the combined spectrum.
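The steps above can be sketched as follows in Python. This is an illustration under assumptions, not the program's source: the name `integrate_tic_to_mz` is hypothetical, retention times are expressed in minutes, the RT bounds are treated as inclusive, and m/z values are matched by exact equality through a dictionary.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]               # one (m/z, i) pair
TimedSpectrum = Tuple[float, List[Peak]]  # (retention time in min, its pairs)


def integrate_tic_to_mz(
    spectra: List[TimedSpectrum], rt_min: float, rt_max: float
) -> List[Peak]:
    """Combine into one spectrum all the spectra acquired in [rt_min, rt_max]."""
    combined: Dict[float, float] = {}
    for rt, peaks in spectra:
        if not (rt_min <= rt <= rt_max):
            continue                      # step 1: not part of the spectral set
        for mz, intensity in peaks:       # steps 3 and 4: merge every pair
            combined[mz] = combined.get(mz, 0.0) + intensity
    return sorted(combined.items())


run = [
    (6.9, [(100.0, 1.0)]),
    (7.0, [(100.0, 10.0), (200.0, 5.0)]),
    (8.0, [(100.0, 2.0)]),
    (9.0, [(300.0, 4.0)]),
]
mass_spectrum = integrate_tic_to_mz(run, 7.0, 8.5)
# mass_spectrum == [(100.0, 12.0), (200.0, 5.0)]
```

Only the spectra at RT 7.0 and 8.0 fall inside the selected region, so the pairs recorded at RT 6.9 and 9.0 do not contribute to the combined spectrum.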

1.4.2 TIC -> DT integration

What computation does mineXpert actually do when a drift spectrum is computed starting from a given TIC chromatogram region, say between retention time RT minute 7 and RT minute 8.5?

What is a drift spectrum? A drift spectrum (mobilogram) is a plot where the cumulated ion current of the detected ions is plotted against the drift time at which they were detected. Let's see how that computation is handled in mineXpert:

  1. Create a map to store all the (drift time, intensity) pairs that are to be computed below, the (dt,i) map;

  2. List all the mass spectra that were acquired between RT 7 and RT 8.5. The obtained list of mass spectra is called the spectral set;

  3. Go to the first spectrum of the spectral set and compute its TIC value (sum of all the intensities of all the (m/z,i) pairs of that spectrum). Get the drift time value at which this mass spectrum was acquired. We thus have a value pair: (dt, i), that is, for drift time dt, the intensity of the total ion current is i;

    At this point, we need to do a short digression: we saw earlier that, at the time of this writing, one of the commercial instruments on which the author of these lines does his experiments stores 200 spectra each second. These 200 spectra actually correspond to the way the drift cycle is divided into 200 time bins. That means that in the retention time range [7–8.5], there are 1.5 × 60 = 90 complete drift cycles. Thus there are 90 spectra with drift time x, the same number of spectra with drift time y, and so on for the remaining 198 time bins. Of course, a large number of these spectra might be almost empty, but these spectra are there and we need to cope with them.

    The paragraph above thus raises one question about the current (dt,i) pair: has the current dt value been seen before, during the previous iterations of this loop? If not, then create the (dt, i) pair and add it to the (dt,i) map; if yes, get the dt element in the map and increment its intensity value by the TIC value computed above;

  4. Iterate over all the remaining spectra of the spectral set and perform step number 3.

At the end of the loop above, we get a map in which each item relates a given drift time to a TIC value. This can be understood as the answer to the following question: for each drift time value, what is the accumulated ion current of all the ions having that specific drift time?

At this point, mineXpert displays the drift spectrum (mobilogram).
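The drift spectrum computation described in this section can be sketched as follows in Python. The name `integrate_tic_to_dt`, the (rt, dt, peaks) tuple layout, the units (RT in minutes, drift time in milliseconds) and the inclusive RT bounds are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]                    # one (m/z, i) pair
MobilitySpectrum = Tuple[float, float, List[Peak]]  # (rt, dt, its pairs)


def integrate_tic_to_dt(
    spectra: List[MobilitySpectrum], rt_min: float, rt_max: float
) -> Dict[float, float]:
    """Build the (dt, i) map for the spectra acquired in [rt_min, rt_max]."""
    dt_map: Dict[float, float] = {}           # step 1: the (dt, i) map
    for rt, dt, peaks in spectra:
        if not (rt_min <= rt <= rt_max):
            continue                           # step 2: outside the spectral set
        tic = sum(i for _, i in peaks)         # step 3: TIC of this spectrum
        # dt seen before: increment its intensity; otherwise create the pair.
        dt_map[dt] = dt_map.get(dt, 0.0) + tic
    return dt_map


# Two drift cycles of two time bins each, plus one spectrum outside the range.
run = [
    (7.0, 0.1, [(100.0, 10.0)]),
    (7.0, 0.2, [(200.0, 5.0), (201.0, 1.0)]),
    (8.0, 0.1, [(100.0, 3.0)]),
    (8.0, 0.2, []),
    (9.0, 0.1, [(100.0, 50.0)]),
]
mobilogram = integrate_tic_to_dt(run, 7.0, 8.5)
# mobilogram == {0.1: 13.0, 0.2: 6.0}
```

Note that the almost empty spectra mentioned in the digression pose no problem here: an empty spectrum simply adds 0 to its drift time bin.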

[4] There certainly is spectrum combination going on in the guts of the software, though, because the system actually acquires many more spectra than are visible on screen: each newly displayed spectrum is actually the combination of many spectra acquired under the surface.

[5] Unless eluted analytes absorb UV light but fail to desorb/desolvate, to ionize, or both.
