Applies to mineXpert 5.8.0

2 Visualizing mass spectral data

Data mining in mass spectrometry largely consists of the relentless scrutiny of mass spectra by an expert eye. Without a powerful mass spectrum viewer capable of numerous data display modes, the expert eye remains powerless.

After having completed this chapter you will be able to perform mass spectrum visualization and analysis, optionally reporting all the analysed peaks to a file on disk.

2.1 Opening mass spectrum files

To start a mineXpert session, open one or more mass spectra using the Open full mass spectrum file(s) item of the File menu.

mineXpert understands the mzML format; file loading is delegated to the excellent libpwiz library from the ProteoWizard project[6]. Simple text data (txt,asc), in which the m/z and i values are separated by any character that is neither a newline, a dot nor a digit, can be loaded either from a file or directly from the clipboard (loading is handled by a private parser). A third format is SQLite3, a private but open and documented database format (handled by a private parser) that mineXpert uses to slice very big datasets into smaller chunks. Incidentally, the SQLite3 format allows for faster data loads.
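The separator rule of the txt,asc format can be illustrated with a short sketch. The function name parse_xy_text and the regular expression are assumptions made for illustration; this is not mineXpert's private parser. The sketch treats any run of characters that are neither digits, dots nor newlines as the separator and silently skips lines that do not hold exactly two numbers.

```python
import re

# Hypothetical helper, not part of mineXpert. One line holds an m/z value and
# an intensity separated by one or more characters that are neither digits,
# dots nor newlines (space, tab, comma, semicolon...).
_NUMBER = r"(\d+(?:\.\d*)?|\.\d+)"
_PAIR = re.compile(r"^\s*" + _NUMBER + r"[^\d.\n]+" + _NUMBER + r"\s*$")


def parse_xy_text(text):
    """Return a list of (mz, intensity) tuples; skip lines that do not match."""
    pairs = []
    for line in text.splitlines():
        match = _PAIR.match(line)
        if match:
            pairs.append((float(match.group(1)), float(match.group(2))))
    return pairs
```

The same function could serve for file contents and for clipboard text, since both are plain strings.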

There are two variants of the mass spectrometry file opening menu, one for which all the mass data are read from file and stored in memory and one for which the mass data are read from file in streamed mode, used to compute the TIC chromatogram and discarded. The latter mode is useful when the mass data are so large that they cannot fit in memory. The TIC chromatogram that is computed in streamed mode is then used to access the mass data in the file according to criteria set by the user (retention time range, for example).
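The streamed mode can be pictured with the following minimal sketch. It assumes that a reader yields one spectrum at a time as a (retention time, intensities) item; the function and reader names are hypothetical and do not reflect mineXpert's actual code. Each spectrum contributes one TIC point and is then dropped, so only the chromatogram stays in memory.

```python
def tic_chromatogram(spectra):
    """Compute (rt, tic) points from an iterable of (rt, intensities) items.

    `spectra` can be a generator that reads a file piecemeal: each spectrum
    is summed and then discarded.
    """
    chromatogram = []
    for rt, intensities in spectra:
        chromatogram.append((rt, sum(intensities)))
    return chromatogram


def fake_reader():
    # Stand-in for a streaming file reader (made-up data).
    yield 0.1, [10.0, 20.0, 5.0]
    yield 0.2, [0.0, 50.0]
    yield 0.3, [1.0, 1.0, 1.0, 1.0]


tic = tic_chromatogram(fake_reader())
```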

2.2 The Window Layout

The graphical interface of mineXpert comprises a number of windows where data and information are displayed. These windows are described below (see Figure 2.1, “General view of the graphical user interface”):

  • mineXpert main program window: this is an unintrusive window sporting the main menu and a status bar where messages are displayed;

  • The Loaded mass spectrum files window, that lists all the mass spectrometry data files that are currently loaded in the program;

  • The TIC chromatogram window[7] where the various TIC chromatograms are displayed for the various mass spectrometry data files that have been loaded. There is, by definition, a single TIC chromatogram per data file currently loaded in the program. However, this window will also display TIC chromatograms that are computed as an integration step from the other windows, like from the Mass spectrum window or from the Drift spectrum window. In this case, the chromatogram is an extracted ion current chromatogram (XIC chromatogram);

  • The Mass spectrum window, where the various mass spectra are displayed. A given mass spectrum may originate from a TIC chromatogram or from a drift spectrum, or even from a color map. A given originating chromatogram or drift spectrum or color map may be the origin of more than one derived mass spectrum;

  • The Drift spectrum window, where the various drift spectra are displayed. Drift spectra can originate from the TIC chromatograms, from the mass spectra or from the color map;

  • The Color map window, that contains a single color map for each loaded mass data file. At the time of this writing, there is no way to produce a color map from any other window;

  • The m/z integration parameters window, where the parameters governing the mass data integrations to a mass spectrum are set;

  • The XIC extraction parameters window, where the parameters governing the XIC extractions to an XIC chromatogram are set;

  • The Console window, where the various messages or analysis data elements are displayed for the user to select, copy and paste in an electronic lab-book;

General view of the graphical user interface

The position and size of all the windows can be stored for a later session

Figure 2.1: General view of the graphical user interface

2.3 The Main Program Window Menu

The menu bar in the main program window displays a number of menu items, reviewed below:

  • File

    • File -> Open full mass spectrum file(s) Choose the mass spectrum file(s) to load. Note that the full descriptor indicates that the user wants to actually load the full data set in memory. This means that she explicitly knows that the system's memory will cope with all the data in the file;

    • File -> Open streamed mass spectrum file(s) Choose the mass spectrum file(s) to load. Note that the streamed descriptor indicates that the user does not want to load the full data set in memory. This is typically the case when the data file is so large that its data cannot fit in memory. The program then only looks at the data in the file and crafts, piecemeal, the TIC chromatogram and the color map. In this context, any other data integration will be performed by looking into the same mass data file since no data are available in memory;

    • File -> Mass spectrum from clipboard creates a mass spectrum from a textual representation of (m/z,i) pairs in the same format as described above for the txt,asc file format;

    • File -> Analysis preferences Define the analysis preferences. The analysis preferences govern how the data about scrutinized

  • Plot -> Clear plots Clears all the plots currently displayed in the program. The plot items in the Loaded mass spectrum files window are all removed. Note that this releases all the memory that was used by the data. This menu is equivalent to closing all files;

  • Windows

    The menus are self-explanatory, as they explicitly state which window is to be shown. The Save workspace menu records on disk the position and size of all the windows, so that upon reopening the program, the windows all position themselves at the recorded position and size;

  • Help

    This menu's items show help about the program itself and also about the Qt libraries that were used to build it. This information is essential in case the user wants to make a bug report.

2.4 The main data windows

This section succinctly describes the main data windows of mineXpert. Each window is described in greater detail along with the features of the program.

2.4.1 The TIC chromatogram window

Each time a new mass spectrum file is loaded, its corresponding TIC chromatogram is computed and then displayed in a new plot widget in the TIC chromatogram window (Figure 2.2, “The total ion current (TIC) chromatogram window”). Each new TIC chromatogram plot generated as a result of the loading of a mass spectrometry data file is plotted using a new color. That color encodes the filiation of the whole set of plots that are generated starting from that initial TIC chromatogram plot. For example, a red TIC chromatogram plot that serves as the starting point for a mass spectrum integration will trigger the creation of a mass spectrum plot widget that will have a red graph in it. The same is true for the color map widget, which has its axis and tick labels of the same color as that of the TIC chromatogram plot.

The total ion current (TIC) chromatogram window

As many traces as necessary can be shown in the window. Each plot has its own set of markers.

Figure 2.2: The total ion current (TIC) chromatogram window
Note: Creating a mass spectrum from non-profile acquisitions

The attention of the reader is drawn to the specific situation whereby the user loads mass spectrometric data from a non-profile acquisition data file or from clipboard data. For example, when a mass spectrum is opened from a txt,asc,xy text-based format file where the data correspond to a single spectrum, not a sequence of spectra, the TIC chromatogram really has a single (rt,i) pair denoting the TIC intensity at the single retention time of that unique spectrum. The TIC chromatogram window thus artificially creates and displays a TIC chromatogram that is a simple line, as shown in Figure 2.3, “Loading a single-spectrum data file or a mass spectrum from clipboard data”. Because there is a single point, the user has nothing to do other than integrate these data to a mass spectrum. This is the reason why this integration is actually performed automatically and the spectrum is thus shown in the Mass spectrum window.

Loading a single-spectrum data file or a mass spectrum from clipboard data

When data are loaded from a file containing a single spectrum or from the clipboard, the TIC chromatogram only contains a single data point. The mass spectrum is automatically displayed in the Mass spectrum window.

Figure 2.3: Loading a single-spectrum data file or a mass spectrum from clipboard data

2.4.2 The mass spectrum window

The mass spectrum window contains all the plot widgets that display mass spectra that originated in other windows. For example, the user might select a region in a TIC chromatogram and then ask that a mass spectrum integration be computed. In this case, the resulting mass spectrum is displayed in a new plot widget that is located in the mass spectrum window (Figure 2.4, “The mass spectrum window”).

The mass spectrum window

This mass spectrum window shows multiple mass spectra.

Figure 2.4: The mass spectrum window

2.4.3 The drift spectrum window

As described for the mass spectrum window, the drift spectrum window contains all the plot widgets that display drift spectra (Figure 2.5, “The drift spectrum window”).

The drift spectrum window

This drift spectrum window shows multiple drift spectra.

Figure 2.5: The drift spectrum window

2.4.4 The color map window

The color map window displays a color map view of the drift data in the form of m/z vs drift time (dt). The intensity of the m/z values is coded in colour. The axes can be switched, such that either the m/z vs dt or the dt vs m/z representation can be obtained (see toolbar button on Figure 2.6, “The color map window”).

The color map window

The color map window relates mass spectra to drift times.

Figure 2.6: The color map window

2.5 General structure of the windows containing plot widgets

The TIC chromatogram, mass spectrum and drift spectrum windows are all structured in a similar way. The window is divided vertically in two compartments. The bottom compartment will host all the plot widgets stacked vertically. The top compartment hosts a single plot widget where all the graphs that are displayed unitarily in the lower compartment are shown superimposed.

Note: The multi-graph plot widget at the top of the windows

The plot widget that is packed in the top compartment of the window is called the multi-graph plot widget because it can hold more than one graph. The plot widget(s) that is(are) packed in the bottom compartment of the windows is(are) called single-graph plot widget(s) because each plot contains only one graph.

The two vertical compartments of the window are resizable by dragging the sliding horizontal bar that separates them. It is possible to totally occlude one of the compartments by dragging that sliding bar all the way up (or down) to the window side.

The behaviour described above does not apply to the Color map window that has no upper compartment with all the color maps superimposed.

2.6 The data-displaying widgets

The TIC chromatogram, color map, mass spectrum and drift spectrum windows all contain plot widgets (or color map widgets) that share a general working scheme as to how the data can be visualized. The main visualization operations are succinctly described below. The following convention will be used to describe the mouse buttons:

  • left-mouse-button: left mouse button;

  • middle-mouse-button: middle mouse button;

  • right-mouse-button: right mouse button.

The different plot or colormap graph visualization methods are detailed below:

  • Zooming in and zooming out:

    • Zoom in: left-mouse-button-click-drag to draw a selection rectangle. When the mouse button is released, the new plot view contains the data contained in the selection rectangle;

    • Zoom in: right-mouse-button-click-drag along the X-axis over the region to zoom and release the mouse button. The new zoomed view does not automatically scale to full scale in the Y-axis direction. To ensure that the new view automatically scales on the Y-axis, press Shift while releasing the mouse button. When the zoom in operation is performed using Shift in the multi-graph widget, the Y-axis is set to full scale with respect to the point having the maximum intensity of all the graphs being shown at that moment;

      Figure 2.7:

      As seen in the figure above, the region defined by the left-mouse-button-click-dragging operation is delimited by green and red markers, respectively at the start and at the end of the selection. The distance between the start and end points is updated along the mouse move operation.

    • Zoom in/out: right-mouse-button-click-drag on the X- or Y-axis to interactively zoom in or out along the selected axis. In this mode, the zoom operates by contracting/expanding the data in such a manner that the left/bottom part of the graph (the origin of the graph) is anchored and does not move. When the drag occurs towards larger values on the clicked axis, the view is zoomed in along that axis. Conversely, it is possible to zoom out by dragging the mouse towards lower axis values. When the number of points in the plot is so large that the zoom operation is sluggish, pressing Ctrl makes the zoom operation more fluid;

    • Zoom in/out: The middle-mouse-button-wheel-rotation can be used to zoom in or out the whole plot on both the X- and Y-axis simultaneously. Note that the position of the mouse cursor when the wheel is rolled defines the new view of the plot. With a little practice, this zooming in/out mode becomes very powerful;

    • Zoom out: To reset the zoom along one axis, left-mouse-button-double-click that axis. In this case, only the clicked axis will be full-scale, the other axis remains unchanged. To reset the zoom such that the full scale is calculated on the data set displayed after the zoom, maintain the Shift key pressed when double-clicking. To reset the zoom on both axes in one go, left-mouse-button-double-click one of the axes maintaining the Ctrl key pressed;

  • Panning:

    left-mouse-button-click-drag on one of the axes to pan the plot view along that axis;

  • History:

    Each time a new zoomed in/out view of the plot is triggered, a history element is stored in the plot widget. To back-replay the various steps of the zoom in/out operations in sequence, from pre-last to first, hit the Backspace key. The exceptions to this mechanism are when the plot view is panned or when the mouse wheel is used.
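The history mechanism behaves like a stack of views. The sketch below is only an illustration under that assumption; the ZoomHistory class and its method names are hypothetical and say nothing about how the plot widget is actually implemented.

```python
class ZoomHistory:
    """Stack of views; each view is ((xmin, xmax), (ymin, ymax))."""

    def __init__(self, initial_view):
        self._views = [initial_view]

    def record(self, view):
        # Called after a zoom in/out; panning and mouse wheel zooms
        # would not call this, so they leave no history element.
        self._views.append(view)

    def back(self):
        # Backspace: drop the current view and return the previous one.
        # The very first view is never removed.
        if len(self._views) > 1:
            self._views.pop()
        return self._views[-1]


history = ZoomHistory(((0, 100), (0, 1000)))
history.record(((10, 50), (0, 1000)))
history.record(((20, 30), (0, 500)))
previous_view = history.back()
```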

Note: Locking the x and y axes of all the plots

The tool bar located at the top of the windows described above contains two buttons that lock the x axis (the button icon has the horizontal red line) and/or the y axis (red line is vertical) range throughout all the graphs displayed in the window. This is of great use when the user wants to compare a number of graphs that have been obtained on comparable samples. The movements and zooming-in or zooming-out operations in one graph are then synchronized to all the other graphs.

The third button performs a transpose operation. When the color map is initially created, the horizontal axis (the keys of the map) is the drift time axis and the vertical axis (the values of the map) is the m/z axis. The transpose operation switches the representation of the map such that the axes are inverted.

2.7 Data integrations featured by mineXpert

Analyzing mass spectrometric data (with or without drift data) usually involves performing various data integrations in sequence. We saw earlier that the first data that are plotted upon loading a mass spectrometry data file are the TIC chromatograms along with (if applicable) the m/z vs dt color maps. These two graphed data sets are the starting points for the mass spectrometric data mining, that may involve the following integration operations:

  • TIC chromatogram to mass spectrum This kind of operation is triggered upon right-mouse-button-click-dragging the mouse over the region of interest and maintaining the S key pressed. mineXpert integrates all the spectra that have been acquired at all the retention times between the start and the end of the selected region. A new mass spectrum is then plotted in a new plot widget in the mass spectrum window;

    Figure 2.8:

    As seen on the figure above, the region defined by the right-mouse-button-click-dragging operation is delimited by arrows, a green marker at the start and a red marker at the end.

  • TIC chromatogram to drift spectrum This kind of operation is similar to the one described above, except that the D key must be pressed. As above, a new drift spectrum is appended to the drift spectrum window.

  • Color map to mass spectrum This operation involves right-mouse-button-selecting a rectangular region of interest on the color map while maintaining the S key pressed. A new mass spectrum is then plotted in a new plot widget in the mass spectrum window. Note that, due to a bug in the plotting library, the rectangle is not currently drawn on top of the color map.

  • Color map to drift spectrum Same as above, but with the D key pressed. As above, a new drift spectrum is appended to the drift spectrum window. Same remark as above.

  • The same mechanics is at work in the other plot widget windows. For example, to trigger the integration of a mass spectrum starting from a drift spectrum, simply drag the mouse over the drift spectrum and maintain the S key pressed.

    Note

    Rule of thumb: when a maSs spectrum is to be generated, use the S key, when a Drift spectrum is to be generated, use the D key and, finally, when a Retention time TIC chromatogram is to be generated, use the R key.

  • One of the most interesting features for detailed mass data mining is the integration to a TIC intensity (right-mouse-button-click-drag with I pressed). That integration can be triggered from any of the data windows (any single-graph plot widget in any of these windows, that is). No plot is created; the data are simply displayed in the status bar of the window and in the Console window.

    Important

    It is important that the mouse is maintained still right after having triggered the intensity calculation because otherwise the status bar message displaying the result vanishes.

    This integration will be discussed later.

Figure 2.9, “The tool bar and its buttons” shows the various buttons of a plot window. The button with the ? character shows a tooltip describing the keyboard/mouse combinations that trigger the various data integrations described earlier.

The tool bar and its buttons

The ? button shows a tool tip helping the user to select the proper keyboard/mouse combination to perform a given integration task.

Figure 2.9: The tool bar and its buttons

The filiation of the plots is maintained using identifying colors. However, color is not enough to unambiguously identify the filiation of any given plot. Indeed, the same TIC chromatogram or Color map plot can be used multiple times to perform integrations. The newly created plots will have the same color as the originating plot, but it will not be possible to distinguish between all the child plots. This is why the plots maintain a history of the way they have derived from the initial TIC chromatogram/Color map plot. This history is shown in a small widget that shows up when the O key is pressed while the mouse cursor hovers over the widget at hand. One example of plot history is shown in Figure 2.10, “The plot filiation history widget”.

The plot filiation history widget

Each plot has a filiation history that can be displayed by keying O.

Figure 2.10: The plot filiation history widget

This figure shows the filiation history of plots. As shown, the first integration was performed when loading the mass data file in order to produce the TIC chromatogram. The first history item is thus [File -> RT], indicating that the plot widget graph originates from loading a file and computing the TIC chromatogram (RT stands for retention time).

The computed TIC chromatogram served as a starting point for an integration to a mass spectrum. The second history item shows just this, in addition to the first item that is still shown: [RT -> MZ]. The range shown indicates that the integration was performed over the specified range of RT values.

The final integration step was to compute a drift spectrum starting from the mass spectrum and that is denoted using the [MZ -> DT] expression. The concerned range is also shown.

As can be seen in the various filiation history items of Figure 2.10, “The plot filiation history widget”, there is always a History - innermost ranges section that lists the three ranges for RT, MZ and DT. What is that innermost range concept? The idea is that, when chaining integrations, each new plot reflects an ever smaller subset of the initial dataset that was loaded from file. The RT, MZ, DT ranges may thus be reduced progressively. For each of these RT, MZ, DT properties, the innermost range is just that: the smallest range that is currently plotted in the plot widget at hand. An example is worth a thousand words:

  1. An [RT -> MZ] integration starting from a TIC chromatogram;

  2. The m/z range obtained is [500-2500]. From that mass spectrum, the user integrates to a drift spectrum [MZ -> DT];

  3. Then, from the drift spectrum, a peak seems interesting and the user back-integrates to a mass spectrum [DT -> MZ];

  4. In the mass spectrum, a mass peak is of interest and the user wants to see at which retention times the m/z value elutes: she does an integration [MZ -> RT].

The innermost MZ range in the XIC chromatogram obtained at step 4 will be the last m/z range selected for the mass peak of interest, not the range at step 2.
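The four steps above can be sketched as follows. This is an illustration under a stated assumption, namely that the innermost range of an axis is the intersection of all the selections made on that axis so far; the function name and all numeric ranges other than [500-2500] are hypothetical.

```python
def innermost_ranges(history):
    """history: list of (source_axis, target_axis, (low, high)) steps.

    The range of each step is the selection made on `source_axis`. For each
    axis, keep the intersection of all the selections made on it so far.
    """
    innermost = {}
    for source_axis, _target_axis, (low, high) in history:
        if source_axis in innermost:
            prev_low, prev_high = innermost[source_axis]
            low, high = max(low, prev_low), min(high, prev_high)
        innermost[source_axis] = (low, high)
    return innermost


steps = [
    ("RT", "MZ", (2.0, 8.0)),        # step 1: TIC chromatogram to mass spectrum
    ("MZ", "DT", (500.0, 2500.0)),   # step 2: mass spectrum to drift spectrum
    ("DT", "MZ", (12.0, 14.0)),      # step 3: drift spectrum back to mass spectrum
    ("MZ", "RT", (1200.0, 1201.5)),  # step 4: mass peak to XIC chromatogram
]
ranges = innermost_ranges(steps)
```

With these made-up selections, the MZ entry of ranges is the narrow range of step 4 and not the [500-2500] range of step 2.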

2.8 Irks and Quirks of Data Integrations

Depending on the integrations that are triggered in the various data plot/map widgets, the computations vary significantly. This section will describe the general computation algorithms in such a manner that the mineXpert user can grasp what is actually going on in the guts of the software. The integrations to a mass spectrum are particularly sensitive to some parameters that will be described in detail in the following section.

2.8.1 Integrations to a mass spectrum

This integration occurs when the user right-mouse-button-selects a range in a given plot while pressing the S key. Integrations to a mass spectrum can be elicited from a TIC chromatogram plot, a color map plot or a drift spectrum plot. In all these cases, the integration computation (that is, a mass spectral combination) needs to be aware of the kind of data at hand.

In order to clarify what integration means in the context of the creation of a mass spectrum, that is, the summative integration (also known as combination) of any number of mass spectra, the following describes a combination in detail.

In this example, the user has loaded a mass data file obtained after an acquisition of mass data in profile mode. mineXpert calculates the TIC chromatogram right after having loaded the mass data. The user performs an integration for a given retention time range in the TIC chromatogram. If we consider an integration range [0–15] min, this is what would occur in the guts of mineXpert. In this example, we omit any step corresponding to any binning.

  • First of all, create a new mass spectrum (let's call it newMS, also known as the combination spectrum, that is, the result spectrum);

  • Extract from the mass spectrometry data all the spectra that have their internal rt value (retention time) contained in the [0–15] min interval. The list of extracted mass spectra (let's call that list msL) is then processed as follows:

  • Iterate in msL and for each iterated mass spectrum (iterMS):

    • Iterate in all the (m/z,i) pairs of iterMS and for each one check if the m/z value was already found in any of the previous mass spectra, that is, if a (m/z,i) pair in newMS has that m/z value. If:

      • the m/z was not found, copy the (m/z,i) pair in newMS;

      • else if the m/z value was already encountered in previously iterated mass spectra, increment the intensity of the corresponding (m/z,i) pair of newMS by the value of the iterated (m/z,i) pair. This is where the summative combination of mass spectra is at work.

At the end of this process, newMS will correspond to the summation of all the spectra contained in the msL list. The newMS mass spectrum is then plotted in the mass spectrum window as a new plot. The color of the newMS plot is the same as the color of the initial TIC chromatogram plot.
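The combination process described above, without any binning, can be sketched as follows. The function name and the data layout (a list of (rt, pairs) items) are assumptions made for illustration; the names newMS, msL and iterMS of the text map to new_ms, the filtered spectra and the loop variable.

```python
def combine_spectra(spectra, rt_min, rt_max):
    """Sum the spectra whose rt lies in [rt_min, rt_max], without binning.

    spectra: iterable of (rt, [(mz, intensity), ...]) items.
    Returns the combination spectrum as a list of (mz, intensity) sorted by mz.
    """
    new_ms = {}  # m/z value -> summed intensity
    for rt, pairs in spectra:
        if not rt_min <= rt <= rt_max:
            continue  # spectrum outside of the integration range
        for mz, intensity in pairs:
            # Unknown m/z: the pair is copied; known m/z: intensity is incremented.
            new_ms[mz] = new_ms.get(mz, 0.0) + intensity
    return sorted(new_ms.items())


spectra = [
    (1.0, [(500.0, 10.0), (500.5, 5.0)]),
    (2.0, [(500.0, 3.0), (500.51, 4.0)]),
    (20.0, [(500.0, 100.0)]),  # outside of the [0-15] min range
]
combined = combine_spectra(spectra, 0.0, 15.0)
```

In this made-up data set, the points at m/z 500.5 and 500.51 are not summed together because their m/z values differ, which hints at why a combination without binning rarely yields a usable spectrum.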

The process described above can only work in very limited circumstances, with data files generated with particular instruments. In general, this process does not lead to a usable mass spectrum, as described in Figure 2.11, “Unusable combination spectrum without binning”. In this combination mass spectrum, computed from Lumos Orbitrap-originating data, the plot shows what should have been a high resolution monoisotopic peak (the m/z delta of the whole signal is 0.009). As can be seen, the signal in this mass spectrum is totally useless and the integration to a mass spectrum requires binning to overcome the presence of so many peaks in that 0.009 m/z interval.

mineXpert provides a number of ways to configure mass spectral combinations such that the obtained mass spectrum is usable. The m/z integration parameters that might be set are described in the following sections.

Unusable combination spectrum without binning

The mass data used to compute this combination spectrum originate from a Lumos Orbitrap analyzer. The visible signal should have been a single high-resolution monoisotopic peak. The width of signal is 0.009, as shown by the ruler.

Figure 2.11: Unusable combination spectrum without binning

2.8.1.1 Considerations on the diversity of mass data contents

Loading data from mass data files in mzML format does not guarantee that the data will be of the same kind when they originate from different mass spectrometers. For example, data from Orbitrap mass spectrometers have the following characteristics:

  • Not all spectra start at the same m/z value;

  • Not all spectra have the same number of data points (they do not have the same size);

  • A large number of data points might have 0 values (intensity at a given m/z value is 0);

  • The m/z delta between two consecutive m/z values is not constant, and this is the major difficulty for data integration to a mass spectrum.

This is the output of the statistical analysis of the data loaded from a Lumos Orbitrap-originating file:


Spectral data set statistics:
Total number of spectra: 6203
Average of spectrum size: 391.311946
StdDev of spectrum size: 168.062934
Mininum m/z value: 400.007111
Average of first m/z value: 401.448935
StdDev of first m/z value: 1.590049
Maximum m/z value: 1999.928589
Average of last m/z value: 1901.852315
StdDev of last m/z value: 45.864131
Minimum m/z shift: -0.344452
Maximum m/z shift: 0.000000
Average of m/z shift: 1.097372
StdDev of m/z shift: 1.590049
Smallest Delta of m/z (step): 0.006195
Average of smallest Delta of m/z (step): 0.023757
StdDev of smallest Delta of m/z (step): 0.013179
Greatest Delta of m/z (step): 405.356934
Average of greatest Delta of m/z (step): 163.112057
StdDev of greatest Delta of m/z (step): 75.947334

As mentioned earlier, the most interesting bit of information is in the line reproduced below:

Smallest Delta of m/z (step): 0.006195

That 0.0062 value somehow gives an indication of the definition of the spectrum, that is, the smallest distance possible between two consecutive points in the m/z axis.

In general, the fact that the spectra of an acquisition do not all have the same m/z vector as the m/z axis is a great difficulty for mass spectral integration because it requires setting up binning prior to performing the mass spectral combination. That binning is nothing else than crafting a m/z value vector able to receive the intensities of all the m/z data points in the spectra to be combined. These concepts are developed in the following paragraphs.

2.8.1.2 Statistical analysis of mass data

At the end of the data file loading, mineXpert performs a rudimentary statistical analysis of the data. The main datum of interest is the smallest m/z step that is observed in the whole set of mass data loaded from disk (the mass spectrum list, which can hold thousands of mass spectra). For each mass spectrum in the list, the smallest m/z delta between any two consecutive data points is recorded. Then, the smallest of all the recorded m/z delta values is sought. Intuitively, that smallest m/z delta value provides an idea of the resolving power of the instrument that generated the mass spectra. Interestingly, this is not the proper value to configure binning. The best value is the median value of the smallest m/z delta values encountered over all the mass spectra of the data file. It is the value that is suggested by default to arbitrarily construct the bins during an integration to a mass spectrum, as described in Figure 2.12, “The m/z integration parameters window” (Arbitrary binning value with bin size unit MZ).
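The two values discussed above (the smallest m/z delta over the whole data set and the median of the per-spectrum smallest deltas) can be computed as in the following sketch. The function names and the input layout are assumptions; each spectrum is represented only by its sorted list of m/z values.

```python
from statistics import median


def smallest_delta(mz_values):
    """Smallest difference between two consecutive m/z values of one spectrum."""
    return min(b - a for a, b in zip(mz_values, mz_values[1:]))


def delta_statistics(spectra_mz):
    """spectra_mz: list of sorted m/z lists, each holding at least two points.

    Returns (smallest delta of all, median of the per-spectrum smallest deltas).
    """
    per_spectrum = [smallest_delta(mz) for mz in spectra_mz]
    return min(per_spectrum), median(per_spectrum)


stats = delta_statistics([
    [400.0, 400.5, 401.5],   # smallest delta 0.5
    [400.0, 400.25, 402.0],  # smallest delta 0.25
    [400.0, 401.0, 403.0],   # smallest delta 1.0
])
```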

The m/z integration parameters window

Integrations to a mass spectrum (whatever the source, TIC chromatogram, drift colormap or drift spectrum) can be configured to ensure the best results depending on the kind of mass data. Proper binning configuration is key to getting best results.

Figure 2.12: The m/z integration parameters window

The bin size units, when using Arbitrary binning, might be MZ, PPM or RES. In the two latter cases, the bin size changes along the m/z axis. It increases along with increasing m/z values. For example, if the bin size unit is PPM and the bin size is 10, then at m/z 300, the bin size would be m/z 0.003, while at m/z 2000, the bin size would be m/z 0.02. If the bin size unit is RES and the bin size is 10000, then, taking the resolution as the m/z value divided by the bin width, the bin size would be m/z 0.03 at m/z 300 and m/z 0.2 at m/z 2000.
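The dependence of the bin width on the m/z value can be written down as a small function. This is a sketch under stated assumptions: PPM is taken as parts per million of the m/z value and RES as the usual resolution, that is, the m/z value divided by the width. The function name bin_width is hypothetical.

```python
def bin_width(mz, bin_size, unit):
    """Width of a bin located at `mz` for the given bin size and unit."""
    if unit == "MZ":
        return bin_size             # constant width along the m/z axis
    if unit == "PPM":
        return mz * bin_size / 1e6  # parts per million of the m/z value
    if unit == "RES":
        return mz / bin_size        # assumes resolution = mz / width
    raise ValueError(f"unknown bin size unit: {unit}")
```

Under these assumptions, a bin size of 10 with unit PPM gives a width of about 0.003 at m/z 300 and about 0.02 at m/z 2000, and a resolution of 100000 yields the same widths as 10 ppm.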

Once Arbitrary binning is selected in the m/z integration parameters window and the bin size and bin size unit have been set, the program creates the bins in the combination mass spectrum according to these settings. The first bin and the last bin are simply the smallest and greatest m/z values found in all the spectra to be combined. The program fills in the void between these two values in steps matching the bin size, with or without PPM/RES weighting: if the bin size unit is MZ, there is no specific calculation; if the bin size unit is either PPM or RES, then the bins are calculated accordingly, as shown in the examples above. Once the bins have been set up in the combination spectrum, the actual combination of all the mass spectra can take place.
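A minimal sketch of the bin creation and of the subsequent combination is given below. It rests on assumptions that the text does not settle: each bin is identified by its starting m/z value, a data point is added to the bin whose start is the closest one at or below its m/z value, and RES is taken as m/z divided by the width. The function names are hypothetical.

```python
from bisect import bisect_right


def make_bins(mz_min, mz_max, bin_size, unit="MZ"):
    """Return the bin m/z values from mz_min until mz_max is reached."""
    bins = [mz_min]
    while bins[-1] < mz_max:
        mz = bins[-1]
        if unit == "MZ":
            step = bin_size
        elif unit == "PPM":
            step = mz * bin_size / 1e6
        elif unit == "RES":
            step = mz / bin_size  # assumes resolution = mz / width
        else:
            raise ValueError(f"unknown bin size unit: {unit}")
        bins.append(mz + step)
    return bins


def combine_binned(spectra, bins):
    """Add the intensity of each (mz, i) pair to the bin that holds mz."""
    intensities = [0.0] * len(bins)
    for pairs in spectra:
        for mz, intensity in pairs:
            index = bisect_right(bins, mz) - 1
            if index >= 0:  # ignore points below the first bin
                intensities[index] += intensity
    return list(zip(bins, intensities))


spectra = [[(400.0, 1.0), (400.6, 2.0)], [(400.1, 3.0), (402.0, 4.0)]]
bins = make_bins(400.0, 402.0, 0.5)
combined = combine_binned(spectra, bins)
```

In this made-up example, two of the five bins receive no data point and keep an intensity of 0.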

2.8.1.3 Effects of the m/z integration parameters

This section provides some examples of how the integration parameters might impact the mass spectrum resulting from combination of mass spectra. Also, this section details the general guidelines for ensuring the best combination calculation.

When Data-based binning is recommended. Data-based binning means that the bins in the combination spectrum are nothing but the m/z values of the first spectrum of the mass spectral set to be combined. This is the simplest integration mechanism and is recommended when the mass data are perfectly coherent, that is, when all the mass spectra are rooted in the (roughly) same m/z value and the vector of m/z values along the m/z axis is reproduced over all the mass spectra of the combination set. This situation is exemplified in Figure 2.13, “Bruker microQTof acquisition of a protein mass data”.
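Data-based binning can be sketched as follows: the bins are the m/z values of the first spectrum of the set. How a data point of a later spectrum is assigned to a bin is not specified in the text; the sketch assumes that it goes to the nearest bin m/z value. The function name is hypothetical and the first spectrum is assumed not to be empty.

```python
from bisect import bisect_left


def combine_data_based(spectra):
    """spectra: list of sorted [(mz, intensity), ...]; bins come from the first."""
    bins = [mz for mz, _ in spectra[0]]
    intensities = [0.0] * len(bins)
    for pairs in spectra:
        for mz, intensity in pairs:
            # Assumption: a data point is added to the nearest bin m/z value.
            pos = bisect_left(bins, mz)
            candidates = [i for i in (pos - 1, pos) if 0 <= i < len(bins)]
            index = min(candidates, key=lambda i: abs(bins[i] - mz))
            intensities[index] += intensity
    return list(zip(bins, intensities))


combined = combine_data_based([
    [(500.0, 1.0), (500.5, 2.0), (501.0, 3.0)],
    [(500.01, 10.0), (500.49, 20.0), (501.02, 30.0)],
])
```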

In this example, not all the m/z integration parameters produce acceptable results for the combination mass spectrum. The integration with no binning produces an unusable spectrum. Note that the best results are obtained with Data-based binning. This is because the m/z data from the Bruker microQTof instrument are very reproducible from one spectrum to the next. Setting the bins exactly as they are in the first spectrum of the mass spectrum list to be combined is thus efficient.

Figure 2.13: Bruker microQTof acquisition of a protein mass data

These combination spectra were obtained by performing a mass spectral integration of mass data acquired for a protein solution on a Bruker microQTof instrument. The mass acquisition settings were characteristic of a protein analysis in the 25–35 kDa range. The top spectrum was obtained by performing an integration with no binning at all. The spectrum is useless.

The statistical analysis performed right after loading the mass data had shown that the median smallest m/z delta value was 0.017. The middle spectrum was obtained after an integration with an arbitrary binning of size 0.017 and with bin size unit MZ (constant bin size throughout the m/z vector). The result is much better than the one obtained earlier. Some glitches are still visible, but the data are eminently usable. The bottom spectrum was obtained by performing a combination with Data-based binning. This result is the best.

Note: When removing 0-intensity m/z data points is useful

The setting up of bins ultimately consists in creating a mass spectrum out of preexisting data (the first mass spectrum of the set in the case of Data-based binning) or out of arbitrary values (the smallest and greatest m/z values of the spectral set, the bin size and finally the bin size unit). In the latter case, the data points making up the newly created mass spectrum have their m/z value calculated and their intensity set to 0. Because the m/z value is calculated starting from an arbitrary bin size value, it is possible that not a single data point in the whole set of mass spectra has an m/z value matching that bin m/z value. In that case, the data point still has a 0-intensity value at the end of the mass spectral combination. This is illustrated in Figure 2.14, “Removing 0-intensity data points”. When the 0-intensity data points are not removed (upper spectrum), the signal is deteriorated by these inverted spikes. Removing the 0-intensity data points cleans the trace perfectly.

When arbitrary binning is performed, residual 0-intensity data points might survive in the combination spectrum, which deteriorates the resulting mass spectrum. Removing these data points from the combined mass spectrum cleans the trace.

Figure 2.14: Removing 0-intensity data points
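As a minimal illustration of that clean-up step (the function name and the sample data are made up for this sketch and do not come from mineXpert), the removal of the 0-intensity data points from a combined spectrum stored as a list of (m/z,i) pairs amounts to a simple filter:

```python
def remove_zero_intensity(spectrum):
    """Return a copy of `spectrum` without the (m/z, i) pairs whose intensity is 0."""
    return [(mz, i) for mz, i in spectrum if i != 0]


# Hypothetical combination spectrum in which two bins never received any signal.
combined = [(500.00, 12.0), (500.01, 0.0), (500.02, 30.0), (500.03, 0.0), (500.04, 9.0)]
cleaned = remove_zero_intensity(combined)
print(cleaned)
```

Once the empty bins are gone, the plotted trace connects the remaining points directly and the inverted spikes disappear.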
Note: When Savitzky-Golay filtering of the data might be useful

The Savitzky-Golay filtering method is widely known for its effectiveness in removing noise from mass spectral data. It is possible to apply that filter at the end of a mass spectral combination. The m/z integration parameters window allows setting the Savitzky-Golay parameters:

  • nL: specifies the number of data points to the left of the point being filtered;

  • nR: specifies the number of data points to the right of the point being filtered. The total number of points in the window that is considered for the regression is thus nL + nR + 1;

  • m: specifies the order of the polynomial to use in the regression analysis leading to the Savitzky-Golay coefficients (typically between 2 and 6);

  • lD: specifies the order of the derivative to extract from the Savitzky-Golay smoothing algorithm (for regular smoothing, use 0).
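To make the role of the four parameters more concrete, here is a small Python sketch that derives the filter coefficients from nL, nR, m and lD by solving the least-squares polynomial fit exactly, and then applies them as a moving window. It follows the textbook formulation of the Savitzky-Golay filter and is not a transcription of the mineXpert code. The helper names (solve, savgol_coefficients, smooth) are hypothetical, and the choice to leave the edge points untouched is an assumption of this sketch.

```python
from fractions import Fraction
from math import factorial


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination on exact fractions."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def savgol_coefficients(nL, nR, m, lD=0):
    """Return the nL + nR + 1 coefficients, ordered from offset -nL to +nR."""
    if not 0 <= lD <= m <= nL + nR:
        raise ValueError("need 0 <= lD <= m <= nL + nR")
    offsets = range(-nL, nR + 1)
    # Normal equations of the polynomial fit: (A^T A)[j][k] = sum of n^(j+k).
    ata = [[Fraction(sum(n ** (j + k) for n in offsets)) for k in range(m + 1)]
           for j in range(m + 1)]
    unit = [Fraction(int(j == lD)) for j in range(m + 1)]
    b = solve(ata, unit)
    # lD! converts the fitted polynomial coefficient into the lD-th derivative.
    return [factorial(lD) * sum(b[j] * n ** j for j in range(m + 1)) for n in offsets]


def smooth(intensities, nL, nR, m):
    """Apply regular smoothing (lD = 0); points without a full window are kept as-is."""
    coeffs = [float(c) for c in savgol_coefficients(nL, nR, m)]
    out = list(intensities)
    for k in range(nL, len(intensities) - nR):
        window = intensities[k - nL:k + nR + 1]
        out[k] = sum(c * y for c, y in zip(coeffs, window))
    return out


print(savgol_coefficients(2, 2, 2))
```

For nL = nR = 2 and m = 2, the coefficients are the classical five-point quadratic smoothing weights (-3, 12, 17, 12, -3)/35. Widening the window (larger nL and nR) smooths more aggressively, while raising m lets the filter follow narrower peaks.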

2.8.2 Integrations to a drift spectrum

This integration occurs when the user right-mouse-button-selects a range in a given plot while pressing the D key. In the detailed example below, the integration occurs for a given retention time range in the TIC chromatogram (integration range [0–15] min):

  • First of all, create a <dt,tic> map to store all the drift time values encountered below, along with the cumulated total ion current intensity value of the spectra acquired at the corresponding dt drift time;

  • Extract from the mass spectrometry data all the spectra that have their internal rt value (retention time) contained in the [0–15] min interval. The list of extracted mass spectra (msL) is then processed in such a manner that each mass spectrum (ms) it contains is iterated over:

    • Get the dt at which the ms was acquired;

    • Calculate the total ion current (tic) for the ms;

    • In the <dt,tic> map, check if the dt value was already found. If:

      • The dt was not already found, create one (dt,tic) pair and insert it in the map;

      • Else if the dt was already encountered in previously iterated mass spectra, increment the tic value of the corresponding (dt,tic) pair in the map by the tic value calculated above for ms.

At the end of this process, the <dt,tic> map will correspond to the drift spectrum. That spectrum is then plotted in the drift spectrum window as a new plot with the same color as that of the initial TIC chromatogram plot.
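The steps above translate almost literally into a short Python sketch. The MassSpectrum record and the function name are hypothetical stand-ins for the internal mineXpert data structures, and the retention time limits are assumed to be inclusive.

```python
from collections import namedtuple

# Hypothetical in-memory record of one spectrum; peaks is a list of (m/z, i) pairs.
MassSpectrum = namedtuple("MassSpectrum", "rt dt peaks")


def integrate_to_drift_spectrum(spectra, rt_start, rt_end):
    """Return the drift spectrum as a list of (dt, tic) pairs sorted by dt."""
    dt_tic = {}  # the <dt,tic> map
    for ms in spectra:
        if not rt_start <= ms.rt <= rt_end:
            continue  # spectrum acquired outside of the integration range
        tic = sum(i for _mz, i in ms.peaks)
        # Create the (dt,tic) pair if needed, else increment the existing tic.
        dt_tic[ms.dt] = dt_tic.get(ms.dt, 0.0) + tic
    return sorted(dt_tic.items())


spectra = [
    MassSpectrum(rt=1.0, dt=2.5, peaks=[(500.0, 10.0), (501.0, 5.0)]),
    MassSpectrum(rt=2.0, dt=2.5, peaks=[(500.0, 20.0)]),
    MassSpectrum(rt=3.0, dt=3.0, peaks=[(700.0, 7.0)]),
    MassSpectrum(rt=20.0, dt=3.0, peaks=[(700.0, 100.0)]),  # outside [0-15] min
]
drift_spectrum = integrate_to_drift_spectrum(spectra, 0.0, 15.0)
print(drift_spectrum)
```

The two spectra acquired at dt 2.5 contribute to the same point of the drift spectrum, while the spectrum acquired at 20 min is ignored because it lies outside of the integration range.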

2.8.3 Integrations to a TIC intensity value

This integration occurs when the user right-mouse-button-selects a range in the TIC chromatogram plot while pressing the I key. The integration is performed by looking into the mass data for (m/z,i) pairs that match the current integration history of the current data plot and summing all their intensities to yield a final TIC intensity value. This value is printed in the status bar of the window. Be aware that if you move the cursor right after having performed the computation, the message in the status bar of the window will be erased. For this reason, that same value is also printed in the console window, in the same color as the plot from which it was computed.

Note

This kind of integration can be performed in exactly the same way in the various data plots (TIC chromatogram, mass spectrum, drift spectrum, mz=f(dt) color map).

2.9 Chained Integrations

The user, in the process of mining the data, will inevitably chain integrations to pinpoint a specific feature of interest. For example, let's say that the user is mining ion mobility mass spectrometry data. After having loaded the data file, the colormap is computed and displayed (see Figure 2.15, “Example of chained integrations”).

Figure 2.15: Example of chained integrations

There starts the exploration. The user sees that there are a number of species having discrete drift times at the m/z ratio around 1220 (lower region of the colormap). She thus integrates that horizontal lower region of the colormap to a single drift spectrum (right-mouse-button-click-drag with D pressed). The obtained drift spectrum is shown at the right hand side of Figure 2.16, “One example of chained integrations”.

Because there are multiple drift peaks in the drift spectrum, the user performs individual mass data integrations to a mass spectrum for each drift peak (right-mouse-button-click-drag with S pressed). The mass spectra obtained are all shown in the middle window of the same figure.

Figure 2.16: One example of chained integrations

Most interestingly, the various drift regions are integrated to almost identical m/z values in their respective mass spectrum. In order to know when the various molecular species eluted in the chromatogram, the user performs for each mass spectrum an integration to an XIC chromatogram (right-mouse-button-click-drag with R pressed). It is then visible that each molecular species eluted at discrete retention times (this was clearly not a true chromatography but instead an infusion with instrument parameters changed during the acquisition).

Note

The dataset used in Section 2.9, “Chained Integrations” is kind courtesy of Dr. Valérie Gabelica and corresponds to a work entitled Optimizing Native Ion Mobility Q-TOF in Helium and Nitrogen for Very Fragile Noncovalent Structures published in JASMS with DOI: 10.1007/s13361-018-2029-4.

2.10 Mass spectral feature analysis

When analysing a mass spectrum, two major deconvolutions are performed to get back to the Mr mass of the analyte while reading m/z values: the charge-based deconvolution and the monoisotopic cluster-based deconvolution. In the following sections, both deconvolutions are described.

2.10.1 Mass spectral deconvolution based on charge state

In this kind of deconvolution, at the present time, the software assumes that the ionization agent is the proton and that the ionization is positive.

The deconvolution is based on the determination of the distance between consecutive (or not) peaks of a given charge state envelope. When the user left-mouse-button-click-drags the cursor from one peak to another, the program tries to calculate if the distance between two peaks matches a charge difference. If so, it computes the molecular (Mr) mass of the analyte whose mass peak is located under the cursor. Figure 2.17, “Charge-based mass deconvolution (consecutive peaks)” shows that precise state for two consecutive peaks of a charge state envelope.

Note that the charge calculation almost never produces an integer value with no fractional part (say, charge z=15.0) because it is almost impossible to drag the mouse cursor over the exact m/z range that would lead to such an integral charge value. Almost always, the charge that is calculated looks like 14.995 or 15.001, for example. Why is it impossible to drag the mouse cursor over exactly the interval that would produce an integral charge value? Simply because the mouse moves at discrete positions on the screen that might be more or less far apart, depending on the mouse capabilities. The zoom state over the two peaks also plays a role.

It is advised to zoom in as much as possible over the peaks at hand so as to minimize the difficulties above. It may happen, however, that even zoomed-in peaks are not sufficiently distant to allow a charge calculation (this is the case in the upper spectrum of Figure 2.17, “Charge-based mass deconvolution (consecutive peaks)”, where the computation could not be performed). In this case, reduce the stringency over the fractional part that is allowed in the charge. By default, the stringency is set at 0.99, that is, any calculated value that has a fractional part either greater than or equal to 0.99 or less than or equal to 0.01 leads to a successful rounding up or down to the nearest integer value. When the fractional part falls inside the ]0.01–0.99[ interval, no charge calculation is performed and thus no deconvolution is performed. When the stringency is too high, reducing it will allow the deconvolution to be carried out (see bottom spectrum of Figure 2.17, “Charge-based mass deconvolution (consecutive peaks)”).

Approach using two consecutive mass peaks. Note that the 0.99 stringency set in the spin box in the status bar could not allow any deconvolution to be carried out, while reducing that stringency to 0.97 allowed it to proceed successfully.

Figure 2.17: Charge-based mass deconvolution (consecutive peaks)

The status bar of the window documents the current inter-peak distance measurement operation that is performed by left-mouse-button-click-drag of the cursor starting at the left peak towards the right peak. The start peak is marked with a green marker and the end peak is marked with a red marker. Start and end positions are documented in the form (m/z start, i) -> (m/z end, i). Then, the m/z delta, that is, the distance between both positions is provided. When the end position matches a theoretically expected distance corresponding to a charge difference of 1, then the charge z of the peak under the cursor is provided and the molecular mass (Mr) is provided for the analyte whose peak is under the cursor.

It might happen that two consecutive peaks of the charge state envelope are not well-shaped enough to point and click precisely at the center of the peaks. In that case, the software allows indicating the number of intervals that run between the two left-mouse-button-click-drag-connected peaks. This is illustrated in Figure 2.18, “Charge-based mass deconvolution (non-consecutive peaks)”. The user knew that she had to measure the distance between two peaks that were separated by two intervals. She thus incremented the interval value in the status bar to 2 and performed the measurement. The Mr value that is displayed is different from the previous one because, without enlarging the window, it is more difficult to click right at the center of the Gaussian shape of each peak. Theoretically, the Mr values should be identical, and they actually are when the measurements are performed cleanly in widely-laid mass spectra.

Approach using two non-consecutive mass peaks. Note the 2 interval value in the status bar of the window.

Figure 2.18: Charge-based mass deconvolution (non-consecutive peaks)
Note: The mouse drag position is significant

Note that the left-mouse-button-click-dragging direction (left -> right or right -> left) has an impact on the value of the charge (z) that is obtained, since that charge value is relative to the peak under the cursor at the moment of the deconvolution. Conversely, the mouse-dragging direction has no effect on the Mr (molecular mass) of the analyte obtained as a result of the deconvolution process.
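The arithmetic behind this deconvolution can be sketched as follows, under the assumptions stated in this section (positive ionization by protons). For two peaks of the same envelope separated by n intervals, with the lower-m/z peak bearing charge z + n and the higher-m/z peak bearing charge z, each m/z value equals Mr/charge + p (p being the proton mass), which leads to z = n × (lower m/z − p) / (m/z delta). The function name, the proton mass constant and the way the stringency test is written (fractional part ≥ stringency or ≤ 1 − stringency) are assumptions of this sketch, not the actual mineXpert implementation.

```python
import math

PROTON_MASS = 1.007276  # approximate mass of the proton, in Da


def charge_deconvolution(mz_start, mz_end, intervals=1, stringency=0.99):
    """Return (z, Mr) for the peak at `mz_end` (the peak under the cursor),
    or None when the computed charge is not close enough to an integer."""
    low, high = sorted((mz_start, mz_end))
    delta = high - low
    if delta == 0:
        return None
    # Charge of the higher-m/z peak, which bears the lower charge.
    z_high = intervals * (low - PROTON_MASS) / delta
    frac = z_high - math.floor(z_high)
    if not (frac >= stringency or frac <= 1 - stringency):
        return None  # fractional part too far from an integer
    z_high = round(z_high)
    # The charge reported is that of the peak where the drag ended.
    z = z_high if mz_end == high else z_high + intervals
    return z, z * (mz_end - PROTON_MASS)


def mz_of(mr, z):
    """m/z value of an analyte of mass `mr` bearing `z` protons (for the demo)."""
    return mr / z + PROTON_MASS


print(charge_deconvolution(mz_of(16951.0, 16), mz_of(16951.0, 15)))
print(charge_deconvolution(mz_of(16951.0, 15), mz_of(16951.0, 16)))
```

Dragging from left to right reports the charge of the right-hand peak, dragging from right to left reports the charge of the left-hand peak, and both directions yield the same Mr value, as explained in the note above.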

2.10.2 Mass spectral deconvolution based on isotopic cluster peaks

In this kind of deconvolution, the user left-mouse-button-click-drags the cursor between the first two peaks (when possible) of the isotopic cluster. The charge state of the ion is the inverse of the m/z delta value that is the distance between the two consecutive peaks. Figure 2.19, “Isotopic cluster-based mass deconvolution” shows that deconvolution process at work.

The user has performed a left-mouse-button-click-drag between the peak under the green marker and the peak under the red marker. The m/z distance between the two markers is computed and the inverse is the charge of the analyte under this isotopic cluster.

Figure 2.19: Isotopic cluster-based mass deconvolution
Note: The mouse drag position is not significant

Note that the left-mouse-button-click-dragging direction (left -> right or right -> left) has no impact on the value of monoisotopic mass computed because the software postulates that the lightest ion is the peak on the left.
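A minimal sketch of this calculation is given below. The charge is taken as the rounded inverse of the m/z delta, as described above, and the leftmost peak is taken as the monoisotopic peak. The function name is hypothetical, and the conversion to a monoisotopic mass assumes ionization by protons, as in the charge-based deconvolution.

```python
PROTON_MASS = 1.007276  # approximate mass of the proton, in Da


def isotopic_cluster_deconvolution(mz_a, mz_b):
    """Return (z, monoisotopic mass) from two consecutive isotopic cluster peaks."""
    delta = abs(mz_b - mz_a)
    if delta == 0:
        raise ValueError("the two peaks must have different m/z values")
    z = round(1 / delta)  # the charge is the inverse of the m/z delta
    if z < 1:
        raise ValueError("the peaks are too far apart to belong to one cluster")
    lightest = min(mz_a, mz_b)  # the lightest ion is the peak on the left
    return z, z * (lightest - PROTON_MASS)


print(isotopic_cluster_deconvolution(500.0, 500.5))
print(isotopic_cluster_deconvolution(500.5, 500.0))
```

Because the lightest peak is always picked as the smaller of the two m/z values, swapping the start and end positions of the drag gives the same result.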

2.10.3 Reading the resolving power based on mass spectral data

When left-mouse-button-click-dragging the mouse cursor between two mass spectrum locations of interest, the program computes the apparent resolving power. This process is shown in Figure 2.20, “Calculation of the resolving power”, where the resolving power is calculated by dragging the mouse cursor from one edge of a peak to the other at half maximum height (this is called full width at half maximum [FWHM] resolution).

Left-mouse-button-click-dragging the mouse cursor triggers the calculation of the resolving power of the instrument. That value is printed in the status bar.

Figure 2.20: Calculation of the resolving power
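The FWHM resolving power is the m/z value of the peak divided by its width at half maximum. The sketch below assumes that the m/z value used for the peak is the midpoint of the dragged interval; the manual does not state which m/z value mineXpert uses, so both that choice and the function name are assumptions.

```python
def resolving_power(mz_left, mz_right):
    """Return m/z divided by the peak width, for a drag between the two peak edges."""
    width = abs(mz_right - mz_left)
    if width == 0:
        raise ValueError("the two positions must differ")
    center = (mz_left + mz_right) / 2  # assumption: peak m/z taken as the midpoint
    return center / width


# A peak centered at m/z 1000 that is 0.02 wide at half maximum height.
print(resolving_power(999.99, 1000.01))
```

In this example the value is close to 50000. The narrower the peak at a given m/z value, the higher the apparent resolving power.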

2.11 Calculating Isotopic Clusters with IsoSpec

When the resolution of the mass spectrometer is good, zooming in on a mass peak may reveal that a given ion has given rise not to one peak but to a set of peaks. This set of peaks is called an isotopic cluster.

It is possible to predict how a given ion (of a given chemical formula) is supposed to be revealed in a mass spectrum, in the form of such an isotopic cluster. One such cluster is shown in Figure 2.21, “Calculation of the isotopic cluster of an analyte”, for the horse apomyoglobin protein, of elemental composition C769H1213N210O218S2 (this formula is typeset like this intentionally, to show how the formulæ need to be entered in the IsoSpec module).

Calculated isotopic cluster for the apomyoglobin protein.

Figure 2.21: Calculation of the isotopic cluster of an analyte

As of version 5.8.0, mineXpert provides an interface to the libIsoSpec++ library.

IsoSpec: Hyperfast Fine Structure Calculator

Mateusz K. Łącki, Michał Startek, Dirk Valkenborg, and Anna Gambin

Analytical Chemistry, 2017, 89, 3272–3277

DOI: 10.1021/acs.analchem.6b01459

This library performs high-resolution isotopic cluster calculations. In order to run the calculations, it is necessary to have the following items ready:

  • An elemental composition formula of the analyte (for example, H2O1). This formula needs to account for the ionization agent that is involved in the ionization of the analyte prior to its detection in the mass spectrometer.

    Tip

    The IsoSpec software requires that all the chemical elements of a chemical formula be indexed. This means that, for water, for example, the formula should be H2O1 (notice the index 1 after the O element symbol).

  • A detailed isotopic configuration of all the chemical elements that are used in the elemental composition formula. mineXpert provides two interfaces to define the isotopic characteristics of the chemical elements. These will be described in the following sections.
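The indexing requirement on the formula can be checked before submitting it to the calculation. The following Python sketch is an illustration only: the function names and the regular expression are hypothetical and are not part of mineXpert or of IsoSpec. It parses a fully indexed formula into element counts and shows how missing indices could be added.

```python
import re

# One element symbol (capital letter, optional lowercase letter) and its index.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_indexed_formula(formula):
    """Return {symbol: count}; raise ValueError if an element lacks its index."""
    counts = {}
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        symbol, index = match.groups()
        if not index:
            raise ValueError(f"element {symbol} has no index")
        counts[symbol] = counts.get(symbol, 0) + int(index)
        pos = match.end()
    return counts


def add_missing_indices(formula):
    """Append the index 1 to every element symbol that has none."""
    return TOKEN.sub(lambda m: m.group(1) + (m.group(2) or "1"), formula)


print(parse_indexed_formula("C769H1213N210O218S2"))
print(add_missing_indices("H2O"))
```

With this sketch, H2O1 is accepted while H2O is rejected until the missing index is added.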

2.11.1 The IsoSpec Graphical User Interface in mineXpert

Generating isotopic clusters using the IsoSpec software package is not easily carried out, in particular because this remarkable library is designed to be highly performant. The authors rightfully put their energy into optimizations for accuracy and speed instead of investing in a graphical user interface. mineXpert provides that graphical user interface, shown in Figure 2.22, “Isotopic cluster calculation dialog window”, which shows up upon selection of the Utilities -> Isotopic cluster calculator menu.

The dialog window contains two panels. The left hand side panel configures the charge for which the calculation is to be carried out and the maximum cumulative isotopic presence probability that IsoSpec must reach during the calculation. The right hand side panel contains a tab widget that contains the configuration tabs and the results tab.

Figure 2.22: Isotopic cluster calculation dialog window

An isotopic cluster calculation is most probably performed with the aim of simulating an expected isotopic cluster for an analyte that is being analyzed by mass spectrometry. It is thus logical that the analyte be in an ionized form. The way that the analyte has been ionized needs to be taken into account in the chemical formula that describes the ion for which the isotopic cluster is being calculated. For example, when determining the chemical formula of a protein in the positive ion mode, the number of protons used to ionize the protein needs to be included in the analyte elemental composition formula.

Warning

The IsoSpec software is charge-agnostic in the sense that it does not know what element in the chemical formula is responsible for the ionization of the analyte. Therefore, IsoSpec does not know of (and does not care about) the charge of the analyte. The ionization level of the analyte can be handled by mineXpert if that information is set in the Ion charge spin box widget. By default, the charge state of the analyte is 1.

The Max. cumulative probability spin box widget serves to configure the extent to which IsoSpec simulates the theoretically expected isotopic cluster. A value of 0.99 tells the software to simulate enough combinations of the analyte isotopes to represent 99 % of the theoretically expected combinations.

Tip

For large biopolymers, it might be prudent to start with a relatively low value for Max. cumulative probability, because setting this value too close to 1 would notably increase the calculation time.

To perform isotopic cluster calculations, the simulation software needs to be aware of all the isotopes of all the chemical elements that enter in the composition of the ionized analyte. An isotope is defined by its mass and by the probability that it is found in nature. Carbon, for example, has two major isotopes that can be found in nature: the most abundant one, 12C, and the less abundant one, 13C.

There are two ways that the user might configure the characteristics of the chemical elements that enter in the composition of their analyte. These two methods are reviewed in the next sections.

2.11.1.1 Element Tables Shipped within the IsoSpec Library

In order to document the isotopic characteristics of all the chemical elements, the IsoSpec library has in its own headers a number of arrays that mineXpert automatically loads when opening the dialog window shown in Figure 2.22, “Isotopic cluster calculation dialog window”. These data are displayed in the IsoSpec standard data tab. The table view widget is not editable. However, the user might save the selected rows to a TSV (tab-separated values) file (click Save selected to file) that can be edited using any spreadsheet program, like LibreOffice. The modified file can then be saved back to the same TSV text format and loaded back into mineXpert (click Load table from file).

2.11.1.2 Manual Configuration by the User in mineXpert

There is another way to provide IsoSpec with the detailed isotopic configuration of the chemical elements that enter in the chemical formula of the analyte: the manual configuration by the user. This method is slightly more involved than the previous one but also provides much greater flexibility: it allows one to create new chemical elements that might be required in specific labelling experiments. The manual configuration is carried out in the Manual configuration tab of the dialog, as shown in Figure 2.23, “Manual configuration of the chemical element isotopes”.

When the dialog is created, the tab is empty. To start creating element definitions, you click Add element.

Figure 2.23: Manual configuration of the chemical element isotopes

Upon creation of the dialog window, the Manual configuration tab is empty, with only two rows of buttons at the bottom of the tab. To start configuring chemical elements, you click Add element to create an element group box that contains a number of widgets organized in two rows:

  • Top row, a line edit widget to receive the chemical element symbol, C in the example;

  • A spin box widget in which to set the number of such atoms in the formula for which the isotopic cluster is being calculated. In the example, we set this value to 5;

  • A button with a minus image that removes the whole element group box in one go;

  • The bottom row contains an isotope frame widget with two spin boxes for the mass of the isotope being configured (left) and its corresponding abundance (right);

  • In addition to the spin boxes, two buttons, with a plus or a minus figure, allow one to respectively add or remove isotope frames.

    Note

    It is not possible to remove all the isotope frames from an element group box, otherwise that group box would become useless.

Once an isotope frame has been filled in, a new line might be required. To create a new isotope frame widget, click any plus-labelled button in any of the isotope frames. Once a new frame is created, the spin box widgets that it contains are set to 0.00000. Fill in these spin boxes with mass and abundance and proceed in the same way to create as many isotopes as required.

Once all the isotopes for a given chemical element have been defined, a new element might be needed. For this, click Add element and start the configuration of the new element as described above.

The manual isotopic configuration of the chemical elements required to perform an isotopic cluster calculation for a given formula is tedious. The user may want to save a given configuration to a file (click Save configuration) so that all the widgets can be recreated automatically upon loading of that saved configuration (click Load configuration).

The final configuration is shown in Figure 2.24, “Typical manual configuration of the isotopic characteristics of the chemical elements”. The experiment that was configured above is a labelling of a glucose molecule with Cz, an imaginary chemical element that is like carbon but that has an additional isotope of mass 14. The glucose molecule (normal formula: C6H12O6) is labelled on one single carbon atom with an efficiency of 95 %. This means that, when the labelling fails (in 5 % of the cases), the carbon atom has its isotopes with the usual probabilities (scaled by the fact that the normal atom is found at that position only in 5 % of the cases). The isotopic abundances for the Cz element are thus:

  • For the 12C isotope: 0.05 * normal 12C abundance;

  • For the 13C isotope: 0.05 * normal 13C abundance;

  • For the 14C isotope: 0.95;
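The three abundances listed above can be computed as follows. The natural abundances used for 12C and 13C (0.9893 and 0.0107) are commonly tabulated values that are assumed here for the sake of the example; use the values from the isotopic table that is actually loaded in mineXpert. The function name is hypothetical.

```python
CARBON_12_ABUNDANCE = 0.9893  # assumed natural abundance of 12C
CARBON_13_ABUNDANCE = 0.0107  # assumed natural abundance of 13C


def cz_abundances(labelling_efficiency):
    """Return the isotopic abundances of the imaginary Cz element."""
    unlabelled = 1 - labelling_efficiency  # fraction of positions left unlabelled
    return {
        "12C": unlabelled * CARBON_12_ABUNDANCE,
        "13C": unlabelled * CARBON_13_ABUNDANCE,
        "14C": labelling_efficiency,
    }


abundances = cz_abundances(0.95)
for isotope, abundance in abundances.items():
    print(isotope, round(abundance, 6))
```

The three abundances sum to 1, which is a quick way to check that an element configured manually is consistent.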

The user has configured a labelling experiment where the glucose molecule is labelled at a single carbon position with a 14C atom (the efficiency of the labelling is 95 %).

Figure 2.24: Typical manual configuration of the isotopic characteristics of the chemical elements
Note

Note that the normal carbon count is 5 (and not 6), that the hydrogen count is 13 (and not 12, because the glucose is protonated) and the labelling carbon is present only once.

2.11.1.3 The IsoSpec results are not shaped mass peaks

Once the configuration has been completed, the calculations can finally be performed by the IsoSpec library. In the manual configuration setting, the formula is automatically handled since each chemical element that is defined goes along with the count of the corresponding atoms. In the case of the standard IsoSpec configuration (either modified or not), the user has to enter the chemical formula of the analyte in the Formula line edit widget.

Click Run. If the configuration was correct and IsoSpec could run the calculation properly, then the dialog window switches to the IsoSpec results tab (Figure 2.25, “Results from the isotopic cluster calculation”). That tab contains a text edit widget in which the results are displayed.

Note that the m/z values calculated by IsoSpec are corrected for the charge level that was specified in the left panel of the dialog window prior to their display in the results tab (Figure 2.22, “Isotopic cluster calculation dialog window”).

The IsoSpec library computes the relative probability of the various combinations of all the isotopes that make the chemical formula submitted to it. The results are in the form of centroid peak values along with corresponding probabilities. The sum of the probabilities corresponds to the Max. cumulative probability value that was set by the user.

Figure 2.25: Results from the isotopic cluster calculation

The results that are produced by IsoSpec represent the centroid peaks of the isotopic cluster. The results are thus a set of (m/z,i) pairs that do not have the characteristic shape (the profile) that is found in mass spectra. mineXpert features the ability to give a shape to the centroid peaks. For that, click To peak shaper to open the Peak Shaper dialog window preloaded with the IsoSpec-generated peak centroids. The workings of this peak shaping feature are described in Section 2.12, “Shaping mass peak centroids into well-shaped peaks”.

2.12 Shaping mass peak centroids into well-shaped peaks

The shape of mass peaks is typically Gaussian or Lorentzian (or a mix thereof). There are some data simulation or analysis processes that lead to having mass peaks characterized by a single centroid m/z value and a corresponding intensity. Plotted to a graph, a centroid mass peak yields a bar. In order to convert mass centroid peaks into something that resembles a real “profile” mass spectrum, a mathematical formula can be applied (with some parameters) to configure the shapes generated. mineXpert now includes that feature, accessible via the Utilities -> Centroid peak shaper menu. The window that is opened is shown in Figure 2.26, “Setting-up of the centroid mass peak shaping process”.

The dialog window allows one to configure the shaping of mass centroid peaks. Setting a spectrum name in the Mass spectrum name line edit widget will help recognize the result mass spectrum once displayed in the Mass spectrum window (see below).

Figure 2.26: Setting-up of the centroid mass peak shaping process

The mass centroid peaks are listed in the Data centroid points (m/z,i) text edit widget. These values are pasted there by the user or copied automatically from the isotopic cluster calculation dialog window (see Section 2.11.1.3, “The IsoSpec results are not shaped mass peaks”). The width of the profile mass peak is determined either by setting the resolution of the instrument (in the example that is set to 45000) or by setting the width of the peak at half maximum of its height (FWHM).

Tip

It might be useful to have an idea of the FWHM value for a given pair of m/z and resolution values when defining the parameters for the peak shaping process. Double-click-selecting a single mass peak centroid m/z value from the text edit widget will automatically compute the FWHM value and display it in the corresponding spin box widget (the resolution value is set to 0 automatically because the resolution and the FWHM values are mutually exclusive). Do not forget to set the resolution value back to the one initially set!

The spectrum that is generated can be of a Gaussian or a Lorentzian shape. That parameter is configured by selecting the corresponding radio button widget. The number of points used to actually craft the shape of the peak is configurable. In the example, that parameter is set to 150. When the calculation is performed by clicking the Execute button, the mass spectrum that is calculated is displayed as a list of (m/z,i) pairs in the Results tab of the dialog window. In that tab widget, the Display mass spectrum button makes the spectrum available in the Mass spectrum window (see Figure 2.27, “Spectrum created using the peak shaping feature”).

The spectrum corresponds to a combination of each individual spectrum obtained by shaping each individual mass centroid peak in the input data list.

Figure 2.27: Spectrum created using the peak shaping feature
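The shaping of a single centroid can be sketched with the standard Gaussian and Lorentzian formulas. This is an illustration under stated assumptions and not the mineXpert implementation: the function names are hypothetical, the FWHM is derived from the resolution as m/z divided by the resolution, the apex of the shaped peak is set to the centroid intensity (height normalization rather than area normalization), and the profile is arbitrarily drawn over four FWHM values on each side of the centroid.

```python
import math


def fwhm_from_resolution(mz, resolution):
    """Full width at half maximum of a peak at `mz` for a given resolution."""
    return mz / resolution


def shape_centroid(mz, intensity, fwhm, shape="gaussian", points=150, span=4.0):
    """Return `points` (m/z, i) pairs describing one peak centered on `mz`."""
    if points < 2:
        raise ValueError("at least two points are needed")
    start = mz - span * fwhm
    step = 2 * span * fwhm / (points - 1)
    profile = []
    for k in range(points):
        x = start + k * step
        if shape == "gaussian":
            sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
            y = intensity * math.exp(-((x - mz) ** 2) / (2 * sigma ** 2))
        elif shape == "lorentzian":
            gamma = fwhm / 2  # half width at half maximum
            y = intensity * gamma ** 2 / ((x - mz) ** 2 + gamma ** 2)
        else:
            raise ValueError(f"unknown shape: {shape}")
        profile.append((x, y))
    return profile


fwhm = fwhm_from_resolution(900.0, 45000)  # 0.02 at m/z 900
peak = shape_centroid(900.0, 1000.0, fwhm, shape="gaussian", points=150)
print(fwhm, len(peak))
```

Shaping a whole list of centroids then consists in computing one such profile per centroid and combining the resulting profiles into a single spectrum.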

Note that if the requested resolution is very high, the resulting shaped mass peaks might appear a bit jagged. Tweaking the Bin size value might improve the situation through the binning of the spectra. Otherwise, using the contextual menu in the mass spectrum graph to apply a Savitzky-Golay filter, described in Section 2.8.1.3, “Effects of the m/z integration parameters”, should improve things. To apply such a filter, right-click on the mass spectrum trace and choose the corresponding menu item.

Tip

If the centroid peaks were not corrected for their charge in the previous generation step (as they are in the case of the isotopic cluster calculation), there is still time to apply this correction by setting the charge in the Charge spin box widget. If the charge was already accounted for, as described in Section 2.11.1.3, “The IsoSpec results are not shaped mass peaks”, then leave the charge at 1 and the results will be correct.

2.13 Recording the data mining work

When doing mass analysis work it is often desirable to store the painstakingly manually picked m/z or Mr values for later use. mineXpert provides a number of solutions to record the data mining work.

2.13.1 Feature labelling to the console window

The simplest way to record any graph feature is to point at that feature with the mouse and press the L key. That key shortcut prints to the console window the coordinates of the current mouse cursor location. To be able to trace back the graph source of that (x,y) pair, the text is printed in the console using the same color as the graph whence the labelling action came. The console is actually a rich text format editor in which it is possible to edit the text contents so as to copy/paste them in the lab-book or an email to a colleague, for example. This is shown in Figure 2.28, “Recording the peak feature coordinates to the console”. The label operation described here does not require any previous integration operation. This is in contrast to the requirements of the mass spectral data analysis recording described below.

The text color identifies the graph being analyzed.

Figure 2.28: Recording the peak feature coordinates to the console
Note: Single-graph vs Multi-graph

The label recording process works without ambiguity when the cursor is located in the single-graph plot widgets. However, when the cursor is located in the multi-graph plot widget (top part of the window displaying TIC chromatograms, mass spectra or drift spectra), only the graphs currently selected in the Loaded mass spectrum files window are affected by the label operation.

2.13.2 Recording the data mining discoveries

In order to record the numerous analysis steps that make up a data mining session, call the File -> Analysis preferences menu to display the window shown in Figure 2.29, “Setting-up of the recording of the data mining discoveries”. In that window, the user can select the destination of the data analysis recording system: console, clipboard, file or any combination of the three. When selecting file recording, the user might specify whether the recording should overwrite any preexisting file or, instead, append to that file. Depending on the kind of graph where data mining occurs, the format of the data to be recorded needs to change. Indeed, it would make no sense to record the charge z when mining data in the Drift spectrum window. This is why the text format of the data export needs to be defined for each one of the three kinds of graphs: TIC/XIC chromatogram, mass spectrum or drift spectrum.

Before delving into the configuration intricacies, let us state right away how to trigger the recording of the mining discoveries: press the space bar.

It is possible to configure the recording system to record to either the console, the clipboard, a file (in append mode or in overwrite mode) or any combination thereof. The format of the string is defined using special characters (see text) and might be defined specifically for the three main graphs: TIC/XIC chromatogram, mass spectrum and drift spectrum.

Figure 2.29: Setting-up of the recording of the data mining discoveries

The format used to define the text string to be stored on console and/or in file can contain particular tokens as described below:

  • %f : mass spectrometry data file name

  • %X : value on the X axis of the graph (no unit). For a drift spectrum, that would be drift times in milliseconds, for a mass spectrum, that would be m/z values, for a TIC/XIC chromatogram, that would be retention times in minutes;

  • %Y : value on the Y axis of the graph (no unit). In all the graph plots, that would be intensities in any unit provided by the mass spectrometer (typically, counts);

  • %x : delta value on the X axis (when appropriate, no unit)

  • %y : delta value on the Y axis (when appropriate, no unit)

  • %I : TIC intensity after TIC integration over a graph range

  • %s : X axis range start for computation (where applicable, for example for the TIC integration to a single value);

  • %e : X axis range end for computation (where applicable);

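  • For mass spec plot widgets:

      • %z : charge of the ion, as computed by a deconvolution;

      • %M : molecular mass (Mr), as computed by a deconvolution.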

Note

It is important to keep in mind that the %z and %M format strings can only work if the user is actually analyzing a mass spectrum and has actually performed a deconvolution operation that has allowed computing these two values. If the values are not available, the program shows nan (not a number) in the textual output upon hitting the space bar (see below).

In the drift spectrum window, the data recording processes data matching the cursor position at the last left-mouse-button-single-click. The program tries to define the intensity by looking at the graph ordinate (y axis) matching the nearest abscissa point (x axis) to the last left-mouse-button-clicked location.
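
The nearest-point lookup described above can be illustrated as follows. This is a minimal sketch under the assumption that the abscissa values of the trace are sorted in ascending order; the function name is hypothetical and the actual implementation in mineXpert might differ.

```python
from bisect import bisect_left

def intensity_at(xs, ys, clicked_x):
    """Return the (x, y) data point whose x is nearest to clicked_x.

    xs must be sorted in ascending order; ys holds the matching intensities.
    """
    if not xs:
        raise ValueError("empty trace")
    pos = bisect_left(xs, clicked_x)
    if pos == 0:
        nearest = 0                      # click left of the first point
    elif pos == len(xs):
        nearest = len(xs) - 1            # click right of the last point
    else:
        # Pick the closer of the two neighbouring points.
        right_gap = xs[pos] - clicked_x
        left_gap = clicked_x - xs[pos - 1]
        nearest = pos if right_gap < left_gap else pos - 1
    return xs[nearest], ys[nearest]
```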

Also, as stated above for the simple labelling of cursor location points (see Section 2.13.1, “Feature labelling to the console window”), the recording of data analysis steps works both in the multi-graph plot widgets (those at the top of the plot windows) and in the single-graph plot widgets (those at the bottom of the windows). When doing data analysis in the top multi-graph plot widget, it is necessary to select the traces to be analyzed in the Loaded mass spectrum files window, otherwise no data will be recorded. This is of course not necessary when working in bottom plot widgets because, in that case, there is no ambiguity on what data to record.

Once configured, the format strings might be stored in a drop-down box for later use. To that end, click the Add to history button while having the format text displayed in the text editor and it will be appended to the drop-down list. The list gets stored when the dialog window is closed and will be filled up again when the program is restarted.

As an example, if the user defined the following format string for a mass spectrum graph:


Mass spec. :
mz = (%X, %Y) z = %z
filename = %f
date = 20161021
session = 20161021
mslevel = 1 msion = esi msanal = tof
chrom = DEAE fraction = 25
seq =  pos =     oxlevel = 0 pos =
intensity =
comment =

Then, a resulting data mining stanza that would be recorded will look like this:


Mass spec. :
mz = (1051.8, 50863) z = 1
filename = 20161017-rusconi-frac-25-deae-20160712.db
date = 20161021
session = 20161021
mslevel = 1 msion = esi msanal = tof
chrom = DEAE fraction = 25
seq =  pos =     oxlevel = 0 pos =
intensity =
comment =
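
To make the token substitution more concrete, here is a small Python sketch that expands a format string like the one above. It is not the code used by mineXpert: the function name and the dictionary of values are hypothetical, and printing nan for every missing value is an assumption that the text above only states for %z and %M.

```python
import re

def expand_format(template, values):
    """Replace each %-token of the template by its value.

    `values` maps a token letter (for example "X" for %X) to a value.
    Any token without a value is rendered as "nan".
    """
    def substitute(match):
        value = values.get(match.group(1))
        return "nan" if value is None else str(value)

    # Only the token letters documented above are recognized; any other
    # text is copied verbatim to the output.
    return re.sub(r"%([fXYxyIsezM])", substitute, template)
```

For example, expand_format("mz = (%X, %Y) z = %z", {"X": 1051.8, "Y": 50863, "z": 1}) returns the second line of the stanza shown above, and leaving "z" out of the dictionary produces z = nan.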

Interestingly, the user can define any kind of format, leaving fields available to be filled in later. This feature is of great value when the analysis file is used later to fill in a database for easy storage and interrogation of the mining discoveries. In this case, it would be useful to have the file opened in an editor and, at each new stanza, edit the comment field if something needs to be commented, like the shape/intensity of a mass peak, for example.

Note that the program closes the file each time a new stanza has been written. This makes it possible to edit that file safely in between each stanza record. Remember to force the editor to reload the file from disk after each mining discovery recording.
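
The open-write-close behaviour described above can be sketched as follows. This is an illustration only, with a hypothetical function name; in particular, the text does not say exactly when the overwrite mode takes effect, so in this sketch it applies to the call that requests it and the caller would request it only for the first stanza of a session.

```python
def record_stanza(path, stanza, overwrite=False):
    """Write one stanza to the analysis file, then close the file.

    The file is opened and closed at each call, so that it can be edited
    safely in a text editor in between two recordings.
    """
    mode = "w" if overwrite else "a"  # "w" truncates, "a" appends
    with open(path, mode, encoding="utf-8") as handle:
        # Separate the stanzas with one blank line.
        handle.write(stanza.rstrip("\n") + "\n\n")
```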

When the recording involves sending the analysis data to the console, the data are sent to it as text colored the same as the spectrum that was under scrutiny.

When the mouse cursor has been placed at the proper location on the graph (with or without left-mouse-button-click-dragging, depending on the situation), the user hits the space bar and the data analysis stanza is recorded to the selected destination(s): console, clipboard, file.

2.14 Splitting files into smaller chunks

At the moment, this feature is only available for mass data files in the SQLite3 format, as obtained via the mineXpert conversion feature described at Section 2.15, “Converting mzML to SQLite3”.

Mass spectrometry data acquisitions performed in line with a chromatography setup generally yield massive data files holding mass data acquired all along a chromatographic development. Depending on the application, the data in the obtained file might not be of interest throughout the whole acquisition duration. For example, a size exclusion chromatography might have resolved the molecular species of interest only in a very short retention time range. In this case, it might be useful to extract the data corresponding to that retention time range of interest from the initial very large file and store them in another file that will be sufficiently light to be loaded and analyzed quickly and easily.

Note

The data file slicing feature is called by using the Export -> Data contextual menu (right-mouse-button-click the plot widget of interest) either from the TIC chromatogram plot widget corresponding to the file to be sliced or from the Drift spectrum plot widget of interest.

mineXpert can export data according to various modes, as illustrated in Figure 2.30, “Setting-up of the data export”.

It is possible to split very large files into smaller chunk files. In this example, the number of slices to be created is 3. The program computes automatically the size of the chunks by looking at the rangeStart and rangeEnd values.

Figure 2.30: Setting-up of the data export

The easiest operation is to first zoom in on the relevant data in the TIC chromatogram plot widget and leave the default Export the whole displayed region in a single file checkbox checked. This process basically prunes all the data outside of the currently zoomed-in region. It is possible to export a TIC chromatogram range different from the currently displayed one by entering the Range start and Range end values in the respective spin boxes. For this, uncheck the Export the whole displayed region in a single file checkbox and set the Slice count value to 1.

Another way to operate the slicer is to uncheck the Export the whole displayed region in a single file checkbox and define either the number of slices to be generated or their size. Depending on the slicing configuration, the program will calculate the missing configuration bits to perform the required action. If the user specifies the number of slices, the size of the slices is automatically calculated. Conversely, if the user specifies the size of the slices, the number of slices is deduced.

Note

Note that the Range start and Range end values, corresponding to the limits of the currently zoomed-in data range, are always honored when performing the slice number or slice size computations mentioned above.
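
The relation between the range, the slice count and the slice size can be sketched as follows. This is a minimal illustration with a hypothetical function name; how mineXpert handles a last slice that is smaller than the requested size is not documented here, so this sketch simply assumes that it is truncated at Range end.

```python
import math

def slice_ranges(range_start, range_end, slice_count=None, slice_size=None):
    """Return the list of (start, end) ranges of the slices.

    Exactly one of slice_count and slice_size must be given; the other
    one is deduced from the [range_start, range_end] range.
    """
    if (slice_count is None) == (slice_size is None):
        raise ValueError("give either slice_count or slice_size")
    span = range_end - range_start
    if slice_count is None:
        slice_count = math.ceil(span / slice_size)   # size given, count deduced
    else:
        slice_size = span / slice_count              # count given, size deduced
    ranges = []
    for index in range(slice_count):
        start = range_start + index * slice_size
        # The last slice always stops at range_end (assumption).
        is_last = index == slice_count - 1
        end = range_end if is_last else start + slice_size
        ranges.append((start, end))
    return ranges
```

With a range of 10 to 40 minutes and a slice count of 3, the sketch yields three slices of 10 minutes each; with a range of 0 to 25 minutes and a slice size of 10, it yields three slices, the last one being 5 minutes long.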

The file naming pattern used for the various data files generated in the process of the data export is governed by the format string displayed at the top of the window. By default, the generated files are located in the same directory as the source data file. If a new directory is to be used, it can be selected by pressing the Directory... push button.

Once the configuration is done, press the Validate push button to have a preview of the various file names that have been chosen for the various data slices to be written to. If the configuration is correct, click the Confirm and start data export button.

Note that the data export feature can only be used with the following two data ranges:

  • Retention time ranges: the data export configuration window must be activated from one of the TIC chromatogram plot widgets;

  • Ion mobility drift time ranges: the data export must be triggered from one of the Drift spectrum plot widgets.

This is because it does not make sense to split up a given data file into smaller files on the basis of m/z data ranges.

2.15 Converting mzML to SQLite3

The mzML format is very verbose and parsing it causes a notable delay during loading of mass spectrometry files. mineXpert allows one to convert mzML files to a private open file format based on the SQLite3 database software.

Using the SQLite3 format also allows slicing very large data files into smaller files on the basis of user-selected criteria (see Section 2.14, “Splitting files into smaller chunks”).

For this feature, mineXpert must be run in a system console window. To show a detailed help, type the following:

minexpert --help

Use the following parameters (or flags) to perform a data file conversion:

minexpert -x -o <db file name> <mzML file name>

For example, to convert file test-file.mzml into test-file.db, the command line would be:

minexpert -x -o test-file.db test-file.mzml

Alternatively, the -o flag can specify a directory, in which case the new file name is crafted from the mzML file and written into that directory. In that case, the extension of the mzML file needs to be either mzML or mzml for the automatic renaming to occur.

Note that batch conversion is possible using this kind of command line:

minexpert -x -o /tmp /home/<user>/lab/mzml/*.mzml

In this case, automatic file renaming happens and the new db files are all stored in the /tmp directory.
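
The automatic renaming rule can be sketched in Python as shown below. This is an illustration of the rule as described above, not the code of mineXpert: the function names are hypothetical, the db extension is inferred from the test-file.db example, and raising an error for any other extension is an assumption, because the behaviour of the program in that case is not documented here. Only the -x and -o flags shown above are used.

```python
from pathlib import PurePosixPath

def db_path_for(mzml_file, output_dir):
    """Craft the db file path from the mzML file name and the output directory."""
    source = PurePosixPath(mzml_file)
    if source.suffix not in (".mzML", ".mzml"):
        # Assumption: no automatic renaming for other extensions.
        raise ValueError("expected a file with the mzML or mzml extension")
    return PurePosixPath(output_dir) / (source.stem + ".db")

def conversion_command(mzml_file, output_dir):
    """Return the command line arguments to convert one file, as a list."""
    db_file = db_path_for(mzml_file, output_dir)
    return ["minexpert", "-x", "-o", str(db_file), str(mzml_file)]
```

The returned list can be handed over to subprocess.run() to convert the files one at a time, which is an alternative to letting the shell expand the *.mzml pattern.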



[7] TIC stands for total ion current.
