libXpertMass Developer Documentation
Formula Class

    class MsXpS::libXpertMass::Formula

    The Formula class provides sophisticated abstractions to work with formulas.

    Header: #include <Formula.hpp>
    Inherited By:

    MsXpS::libXpertMass::CrossLinker, MsXpS::libXpertMass::FragRule, MsXpS::libXpertMass::FragSpec, MsXpS::libXpertMass::IonizeRule, MsXpS::libXpertMass::Modif, and MsXpS::libXpertMass::Monomer

    Public Functions

    Formula(const QString &formula = QString())
    Formula(const Formula &other)
    virtual ~Formula()
    std::size_t accountFormula(const QString &text, IsotopicDataCstSPtr isotopic_data_csp, double times = 1)
    virtual bool accountMasses(IsotopicDataCstSPtr isotopic_data_csp, double *mono = Q_NULLPTR, double *avg = Q_NULLPTR, double times = 1)
    virtual bool accountMasses(IsotopicDataCstSPtr isotopic_data_csp, Ponderable *ponderable, double times = 1)
    bool accountSymbolCounts(IsotopicDataCstSPtr isotopic_data_csp, int times)
    QChar actions() const
    void appendFormula(const QString &formula)
    bool checkSyntax() const
    void clear()
    QString elementalComposition(std::vector<std::pair<QString, double>> *symbol_count_pairs_p = nullptr) const
    const std::map<QString, double> getSymbolCountMap() const
    bool hasNetMinusPart()
    bool renderXmlFormulaElement(const QDomElement &element)
    void setForceCountIndex(bool forceCountIndex)
    void setFormula(const QString &formula)
    void setFormula(const Formula &formula)
    double symbolCount(const QString &symbol) const
    virtual QString toString() const
    double totalAtoms() const
    double totalIsotopes(IsotopicDataCstSPtr isotopic_data_csp) const
    virtual bool validate(IsotopicDataCstSPtr isotopic_data_csp)
    virtual bool validate(IsotopicDataCstSPtr isotopic_data_csp, bool store, bool reset)
    virtual bool operator!=(const Formula &other) const
    virtual Formula &operator=(const Formula &other)
    virtual bool operator==(const Formula &other) const

    Static Public Members

    QChar actions(const QString &formula)
    bool checkSyntax(const QString &formula, bool force_count_index = false)

    Protected Functions

    double accountSymbolCountPair(const QString &symbol, double count = 1)
    const QString &minusFormula() const
    const QString &plusFormula() const
    int removeSpaces()
    int removeTitle()
    void setMinusFormula(const QString &formula)
    void setPlusFormula(const QString &formula)
    int splitActionParts(IsotopicDataCstSPtr isotopic_data_csp, double times = 1, bool store = false, bool reset = false)

    Protected Variables

    bool m_forceCountIndex
    QString m_formula
    QString m_minusFormula
    QString m_plusFormula
    std::map<QString, double> m_symbolCountMap
    IsotopicDataCstSPtr mcsp_isotopicData

    Static Protected Members

    QRegularExpression subFormulaRegExp

    Detailed Description

    There are two peculiarities with this Formula implementation:

    The actionformula: the main textual element in this Formula class is the actionformula (member m_formula). A formula is the description of the atomic composition of a compound. For example, the string C2H6 is a formula. While C2H6 describes a static chemical object, a Formula can also describe a dynamic chemical event, like a reaction, by describing what chemical entities are gained by the molecule during the reaction (the "plus" component of the actionformula) and what chemical entities are lost by the molecule (the "minus" component).

    For example, an acetylation reaction can be described by the loss of H2O with gain of CH3COOH: an acetyl group (CH3CO) replaces a hydrogen atom of the molecule, for a net gain of C2H2O. In this example, one would thus define the actionformula in the following way: -H2O+CH3COOH. The "minus" formula associated with the '-' action accounts for the leaving group of the reaction, while the "plus" formula associated with the '+' action accounts for the entering group.

    There is no limit on the number of such actions: an actionformula like -H+CO2-H2O+C2H6 is valid. An actionformula does not need to have any action sign (+ or -); if it has none, it is a plus-signed formula by default, which is what the reader would expect for a standard formula.

    The title: the actionformula may be documented with a title, a prefix text enclosed in double quotes, as in: "Decomposed adenine" C5H4N5 +H. The presence of a title does not change how the formula works, as long as the title is properly enclosed in double quotes. The title is not required for an actionformula to work correctly; it is mainly used in particular contexts, like the calculator.

    An actionformula behaves exactly the same as a simple formula from an end-user perspective. When created, a Formula has its formula string containing the formula (be it a pure formula or an actionformula). Behind the scenes, functions are called to separate all the '+'-associated formulas from all the '-'-associated formulas so that masses are correctly associated with each "leaving" or "entering" chemical group. Formulas that are '-'-associated are stored in the so-called "minus formula", while '+'-associated ones are stored in the "plus formula". Note that all the formulas in Formula are QString objects.

    Upon parsing of the formula, the m_minusFormula and the m_plusFormula members are populated with formulas (in the -H+CO2-H2O+C2H6 example, the "minus formula" would contain "H1H2O", while the "plus formula" would contain "CO2C2H6") and these are next used to account for the net formula.
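    The net accounting described above can be sketched in standard C++. This is a minimal illustration, not the library's code: netSymbolCounts is a hypothetical helper, it uses std::regex with a pattern equivalent to the documented subFormulaRegExp instead of QRegularExpression, and it assumes the text has no title, no spaces and is syntactically valid.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not part of libXpertMass): computes the net
// symbol / count map of an actionformula such as "-H+CO2-H2O+C2H6".
// Assumes a title-free, space-free, syntactically valid text.
std::map<std::string, double> netSymbolCounts(const std::string &text)
{
  // Optional sign, element symbol, optional (possibly decimal) count.
  static const std::regex token(R"re(([+-]?)([A-Z][a-z]*)(\d*\.?\d*))re");

  std::map<std::string, double> counts;
  double sign = 1.0; // an unsigned leading formula is "plus" by default

  for (std::sregex_iterator it(text.begin(), text.end(), token), end;
       it != end; ++it)
  {
    const std::smatch &match = *it;

    // A sign opens a new action; without one, the element belongs to
    // the action that is currently open.
    if (match[1].str() == "-")
      sign = -1.0;
    else if (match[1].str() == "+")
      sign = 1.0;

    // A missing count index means a count of 1.
    double count = 1.0;
    if (match[3].length() != 0)
      count = std::stod(match[3].str());

    counts[match[2].str()] += sign * count;
  }

  return counts;
}
```

    With this sketch, -H+CO2-H2O+C2H6 nets to three C, three H and one O, and the acetylation actionformula -H2O+CH3COOH nets to C2H2O.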

    Member Function Documentation

    Formula::Formula(const QString &formula = QString())

    Constructs a formula initialized with the formula actionformula string.

    formula need not be an actionformula, but it may be one. It is copied into m_formula without any further processing.

    See also setFormula().

    Formula::Formula(const Formula &other)

    Constructs a formula as a copy of other.

    The copy is deep with all the data copied from other to the new formula. There is no processing afterwards.

    [virtual noexcept] Formula::~Formula()

    Destructs this formula.

    There is nothing to be deleted explicitly.

    std::size_t Formula::accountFormula(const QString &text, IsotopicDataCstSPtr isotopic_data_csp, double times = 1)

    Accounts the actionformula in text, using isotopic_data_csp as reference data and times as a compounding factor.

    The text formula is converted into a temporary Formula and processed.

    Returns the member symbol/count m_symbolCountMap size.

    [virtual] bool Formula::accountMasses(IsotopicDataCstSPtr isotopic_data_csp, double *mono = Q_NULLPTR, double *avg = Q_NULLPTR, double times = 1)

    Accounts this formula's monoisotopic and average masses into mono and avg, using times as a compounding factor.

    The masses corresponding to the member actionformula m_formula are calculated first; the values pointed to by mono and avg are then incremented by the calculated masses, compounded by the times factor.

    The masses of m_formula are computed using data from isotopic_data_csp.

    Returns true if no error was encountered, false otherwise.

    See also splitActionParts().
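    The increment-and-compound behavior can be sketched as follows. Everything here is an assumption made for illustration: accountMassesSketch, ElementMasses and kMasses are hypothetical names, the small mass table (approximate values for H, C and O only) stands in for the isotopic reference data, and the formula is given as an already parsed symbol / count map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the isotopic reference data: approximate
// monoisotopic and average masses for a few elements only.
struct ElementMasses
{
  double mono;
  double avg;
};

const std::map<std::string, ElementMasses> kMasses = {
  {"H", {1.00782503, 1.00794}},
  {"C", {12.0, 12.0107}},
  {"O", {15.99491462, 15.9994}},
};

// Adds times * (masses of counts) to *mono and *avg when the pointers
// are non-null. Returns false, leaving both outputs untouched, if a
// symbol is missing from the reference table.
bool accountMassesSketch(const std::map<std::string, double> &counts,
                         double *mono = nullptr, double *avg = nullptr,
                         double times = 1)
{
  double mono_sum = 0.0;
  double avg_sum = 0.0;

  for (const auto &entry : counts)
  {
    const auto it = kMasses.find(entry.first);
    if (it == kMasses.end())
      return false;

    mono_sum += entry.second * it->second.mono;
    avg_sum += entry.second * it->second.avg;
  }

  // The outputs are incremented, not overwritten.
  if (mono != nullptr)
    *mono += times * mono_sum;
  if (avg != nullptr)
    *avg += times * avg_sum;

  return true;
}
```

    Because the outputs are incremented, a caller can sum the masses of several formulas into the same pair of variables by calling the function repeatedly.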

    [virtual] bool Formula::accountMasses(IsotopicDataCstSPtr isotopic_data_csp, Ponderable *ponderable, double times = 1)

    Accounts this formula's monoisotopic and average masses into ponderable, using times as a compounding factor.

    This function uses accountMasses().

    See also splitActionParts().

    [protected] double Formula::accountSymbolCountPair(const QString &symbol, double count = 1)

    Accounts for symbol and corresponding count in the member map.

    The m_symbolCountMap relates each atom (chemical element) symbol with its occurrence count as encountered while parsing the member actionformula.

    If the symbol was not encountered yet, a new key/value pair is created. Otherwise, the count value is updated.

    Returns the new count status for symbol.

    bool Formula::accountSymbolCounts(IsotopicDataCstSPtr isotopic_data_csp, int times)

    Accounts this Formula's actionformula m_formula in the symbol / count member m_symbolCountMap.

    Calls splitActionParts() to actually parse m_formula and account its components to m_symbolCountMap. The accounting of the symbol / count can be compounded by the times factor.

    While splitting the "plus" and "minus" components of the actionformula, their validity is checked against the reference isotopic data isotopic_data_csp.

    This function is used when successively accounting many different formulas into the symbol / count map. The formula is set to a new value and this function is called without resetting the symbol / count map, effectively adding formulas onto formulas sequentially.

    Returns true if no error was encountered, false otherwise.

    See also splitActionParts() and Polymer::elementalComposition().
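    The accumulation pattern described above, where formulas are added onto a map that is never reset between calls, can be sketched as follows. accumulateCounts and SymbolCountMap are hypothetical names, and the formulas are assumed to be already parsed into symbol / count maps.

```cpp
#include <cassert>
#include <map>
#include <string>

using SymbolCountMap = std::map<std::string, double>;

// Adds `times` copies of formula_counts to total. The total map is
// deliberately not cleared, so successive calls pile up.
void accumulateCounts(SymbolCountMap &total,
                      const SymbolCountMap &formula_counts,
                      double times = 1)
{
  for (const auto &entry : formula_counts)
    total[entry.first] += times * entry.second;
}
```

    Accounting water twice and then a lost hydrogen once into the same map leaves it with three H and two O.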

    [static] QChar Formula::actions(const QString &formula)

    Returns '+' if formula only contains "plus" elements or '-' if at least one "minus" element was found.

    If formula contains no sign at all, then it is considered to contain only '+' elements and the function returns '+'. If at least one element is found associated to a '-', then the "minus" action prevails and the function returns '-'.

    See also actions() and splitActionParts().

    QChar Formula::actions() const

    Calls actions(const QString &formula) on this Formula's actionformula m_formula. Returns '+' if it only contains "plus" elements or '-' if at least one "minus" element was found.

    If m_formula contains no sign at all, then it is considered to contain only '+' elements and the function returns '+'. If at least one element is found associated to a '-', then the "minus" action prevails and the function returns '-'.

    See also actions(const QString &formula) and splitActionParts().

    void Formula::appendFormula(const QString &formula)

    Appends formula to this formula.

    formula is first copied to a temporary string from which all spaces are removed (leading, trailing and internal). The result is then appended to m_formula without any check. No processing is performed afterwards.

    [static] bool Formula::checkSyntax(const QString &formula, bool force_count_index = false)

    Returns true if the formula actionformula is syntactically valid, false otherwise.

    If force_count_index is true, the syntax check accounts for the requirement that all the symbols in the formula must be indexed, even if that symbol's count is 1. This means that H2O would not pass the check, while H2O1 would.

    The formula is first stripped of its title (if any), then all the spaces are removed.

    MsXpS::libXpertMass::Formula::subFormulaRegExp is then used to extract each "plus" and / or "minus" component while checking its syntactic validity.

    Note: The syntax checking code does not verify that the actionformula is chemically valid, that is, the "Cz4" symbol / count pair would check even if the Cz chemical element does not exist.

    See also validate().
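    A syntax check of this kind can be sketched with std::regex. This is not the library's implementation: checkSyntaxSketch is a hypothetical function, it assumes the title and the spaces have already been removed, and its count pattern is slightly stricter than the documented one (a count must start with a digit).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true if text is a sequence of
// [sign] Symbol [count] tokens. With force_count_index, every symbol
// must carry an explicit count, even when that count is 1.
bool checkSyntaxSketch(const std::string &text,
                       bool force_count_index = false)
{
  // Integer or decimal count, such as 2 or 1.5.
  std::string count = R"((?:\d+(?:\.\d+)?))";
  if (!force_count_index)
    count += "?"; // the count index may be omitted

  const std::regex whole("(?:[+-]?[A-Z][a-z]*" + count + ")+");

  // The whole text must match; an empty text is rejected.
  return std::regex_match(text, whole);
}
```

    As in the note above, the check is purely syntactic: Cz4 passes although Cz is not a chemical element, while H2O fails as soon as force_count_index is true.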

    bool Formula::checkSyntax() const

    Returns true if the member actionformula is syntactically valid, false otherwise.

    See also checkSyntax(const QString &formula, bool force_count_index).

    void Formula::clear()

    Clears all the formula member data.

    QString Formula::elementalComposition(std::vector<std::pair<QString, double>> *symbol_count_pairs_p = nullptr) const

    Returns a formula matching the contents of the symbol / count member map.

    The returned formula is formatted according to the IUPAC convention about the ordering of the chemical elements: CxxHxxNxxOxxSxxPxx.

    The "plus" components are output first and the "minus" components after.

    If symbol_count_pairs_p is not nullptr, each symbol / count pair is added to it.
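    The ordering rules can be sketched on a plain symbol / count map. The code below is an assumption-laden illustration, not the library's algorithm: orderedPairs and symbolRank are hypothetical names, symbols other than C, H, N, O, S and P are assumed to follow in alphabetical order, and a negative count is taken to mean a "minus" component.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SymbolCount = std::pair<std::string, double>;

// Rank of a symbol in the C, H, N, O, S, P ordering. Any other symbol
// gets the same, last rank (assumption).
int symbolRank(const std::string &symbol)
{
  static const std::vector<std::string> order = {"C", "H", "N",
                                                 "O", "S", "P"};
  const auto it = std::find(order.begin(), order.end(), symbol);
  return static_cast<int>(it - order.begin());
}

// Returns the pairs with "plus" (non-negative) counts first, then the
// "minus" ones, each group in C, H, N, O, S, P order.
std::vector<SymbolCount>
orderedPairs(const std::map<std::string, double> &counts)
{
  std::vector<SymbolCount> pairs(counts.begin(), counts.end());

  std::sort(pairs.begin(), pairs.end(),
            [](const SymbolCount &a, const SymbolCount &b) {
              const bool a_minus = a.second < 0;
              const bool b_minus = b.second < 0;
              if (a_minus != b_minus)
                return b_minus; // the "plus" pair goes first

              const int rank_a = symbolRank(a.first);
              const int rank_b = symbolRank(b.first);
              if (rank_a != rank_b)
                return rank_a < rank_b;

              return a.first < b.first; // alphabetical tie-break
            });

  return pairs;
}
```

    The returned vector plays the role of symbol_count_pairs_p; how the library formats the final string is not covered by this sketch.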

    const std::map<QString, double> Formula::getSymbolCountMap() const

    Returns the atom symbol / count member map. Note that, according to the signature above, the map is returned by value (as a const copy), not by reference.

    bool Formula::hasNetMinusPart()

    Returns true if the member "minus" formula component is not empty, false otherwise.

    [protected] const QString &Formula::minusFormula() const

    Returns the m_minusFormula formula.

    See also setMinusFormula().

    [protected] const QString &Formula::plusFormula() const

    Returns the m_plusFormula formula.

    See also setPlusFormula().

    [protected] int Formula::removeSpaces()

    Removes all the space characters from the member actionformula.

    Spaces can be placed anywhere in the formula for better readability. However, it might be required that these space characters be removed. This function does just that, using a QRegularExpression.

    Returns the number of removed characters.

    [protected] int Formula::removeTitle()

    Removes the title from the member actionformula.

    The title of a formula is the string, enclosed in double quotes, that is located in front of the actual chemical actionformula. This function removes that title string from the member actionformula using a QRegularExpression.

    Returns the count of removed characters.
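    Title and space removal can be sketched with std::regex in place of QRegularExpression. removeTitleSketch and removeSpacesSketch are hypothetical free functions, and both patterns are assumptions: the title is taken to be a double-quoted string at the very start of the text.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Removes a leading double-quoted title from text.
// Returns the number of characters removed.
int removeTitleSketch(std::string &text)
{
  static const std::regex title(R"re(^"[^"]*")re");
  const std::size_t before = text.size();
  text = std::regex_replace(text, title, "");
  return static_cast<int>(before - text.size());
}

// Removes every whitespace character from text.
// Returns the number of characters removed.
int removeSpacesSketch(std::string &text)
{
  static const std::regex spaces(R"re(\s+)re");
  const std::size_t before = text.size();
  text = std::regex_replace(text, spaces, "");
  return static_cast<int>(before - text.size());
}
```

    Applied in that order to "Decomposed adenine" C5H4N5 +H, the two calls remove 20 and then 2 characters, leaving C5H4N5+H.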

    bool Formula::renderXmlFormulaElement(const QDomElement &element)

    Parses a formula XML element, sets the data to the member actionformula m_formula and checks its syntax.

    Returns true if parsing and syntax checking were successful, false otherwise.

    See also checkSyntax().

    void Formula::setForceCountIndex(bool forceCountIndex)

    Sets m_forceCountIndex to forceCountIndex.

    When a formula contains a chemical element in a single copy, it is standard practice to omit the count index: H2O is the same as H2O1. If forceCountIndex is true, then the formula has to be in the form H2O1. This is required for some specific calculations.

    void Formula::setFormula(const QString &formula)

    Sets the actionformula formula to this Formula.

    The formula is copied to this m_formula. No other processing is performed afterwards.

    void Formula::setFormula(const Formula &formula)

    Sets the actionformula from formula to this Formula.

    The actionformula from formula is copied to this m_formula. No processing is performed afterwards.

    [protected] void Formula::setMinusFormula(const QString &formula)

    Sets the m_minusFormula formula to formula.

    See also minusFormula().

    [protected] void Formula::setPlusFormula(const QString &formula)

    Sets the m_plusFormula formula to formula.

    See also plusFormula().

    [protected] int Formula::splitActionParts(IsotopicDataCstSPtr isotopic_data_csp, double times = 1, bool store = false, bool reset = false)

    Tells apart the "plus" ('+') and "minus" ('-') parts in the member actionformula.

    Parses the m_formula actionformula and separates all the minus components of that actionformula from all the plus components. The different components are set to their corresponding formula (m_minusFormula and m_plusFormula).

    At the end of the split work, each sub-formula (plus- and/or minus-) is actually parsed for validity, using the isotopic_data_csp IsotopicData as reference.

    If times is not 1, then the accounting of the plus/minus formulas is compounded by this factor.

    If store is true, the symbol/count data obtained while parsing of the plus/minus actionformula components are stored (m_symbolCountMap).

    If reset is true, the symbol/count data are reset before the parsing of the actionformula. Setting this parameter to false may be useful if the caller needs to "accumulate" the accounting of the formulas.

    The actionformula is parsed by deconstructing it with the subFormulaRegExp regular expression.

    Returns MsXpS::libXpertMass::FormulaSplitResult::FAILURE if the splitting failed. Otherwise, the returned value is MsXpS::libXpertMass::FormulaSplitResult::HAS_PLUS_COMPONENT if at least one component of the m_formula actionformula was found to be of type plus, and MsXpS::libXpertMass::FormulaSplitResult::HAS_MINUS_COMPONENT if at least one component was found to be of type minus. If both types were found, the result is the OR'ing of both values (MsXpS::libXpertMass::FormulaSplitResult::HAS_BOTH_COMPONENTS).
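    The OR'ed return value can be illustrated with a small sketch. The numeric values below are assumptions (the documentation only states that HAS_BOTH_COMPONENTS is the OR of the two other flags), classifyActions is a hypothetical function that does no chemical validation, treating an empty text as a failure is a choice made for this sketch, and the text is assumed to carry no title.

```cpp
#include <cassert>
#include <string>

namespace sketch
{
// Assumed bit values, chosen so that the flags can be OR'ed.
enum SplitResult : int
{
  FAILURE = 0,
  HAS_PLUS_COMPONENT = 1 << 0,
  HAS_MINUS_COMPONENT = 1 << 1,
  HAS_BOTH_COMPONENTS = HAS_PLUS_COMPONENT | HAS_MINUS_COMPONENT
};

// Reports which kinds of components a title-free actionformula has.
int classifyActions(const std::string &text)
{
  if (text.empty())
    return FAILURE;

  int result = 0;

  // A leading component without a '-' sign is "plus" by default.
  if (text[0] != '-')
    result |= HAS_PLUS_COMPONENT;

  for (const char c : text)
  {
    if (c == '+')
      result |= HAS_PLUS_COMPONENT;
    else if (c == '-')
      result |= HAS_MINUS_COMPONENT;
  }

  return result;
}
} // namespace sketch
```

    A caller can then test a single flag with a bitwise AND, for example `result & sketch::HAS_MINUS_COMPONENT`, regardless of whether the other flag is set.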

    double Formula::symbolCount(const QString &symbol) const

    Returns the count value associated with key symbol in the symbol / count member map m_symbolCountMap.

    [virtual] QString Formula::toString() const

    Returns the actionformula.

    double Formula::totalAtoms() const

    Returns the total count of symbols (atoms) in this formula.

    The determination is performed by summing up all the count values for all the symbols in the member symbol / count pairs in the member map m_symbolCountMap.

    double Formula::totalIsotopes(IsotopicDataCstSPtr isotopic_data_csp) const

    Returns the total count of isotopes in this formula using isotopic_data_csp as the reference isotopic data.

    The determination is performed by summing up all the isotope counts for all the symbols keys in the member symbol / count map m_symbolCountMap.

    [virtual] bool Formula::validate(IsotopicDataCstSPtr isotopic_data_csp)

    Returns true if the formula validates successfully, false otherwise.

    The overloaded function validate(IsotopicDataCstSPtr isotopic_data_csp, bool store, bool reset) is called with both store and reset set to false.

    The validation of this Formula instance is performed against the isotopic_data_csp isotopic reference data.

    [virtual] bool Formula::validate(IsotopicDataCstSPtr isotopic_data_csp, bool store, bool reset)

    Returns true if the formula validates successfully, false otherwise.

    The validation of the formula involves:

    If store is true, the symbol / count data obtained while splitting the "plus" and "minus" components of the actionformula are stored in the member m_symbolCountMap map.

    If reset is true, the member symbol / count map is first reset.

    isotopic_data_csp are the isotopic data used as reference to ensure chemical validity of the formula components.

    [virtual] bool Formula::operator!=(const Formula &other) const

    Returns true if this Formula and other are different, false otherwise.

    The comparison is only performed on the m_formula actionformula, not on any other member data that are derived from the processing of m_formula.

    [virtual] Formula &Formula::operator=(const Formula &other)

    Initializes all the member data of this formula by copying to it the data from other.

    The copy is deep with all the data from other being copied into this formula.

    There is no processing afterwards.

    [virtual] bool Formula::operator==(const Formula &other) const

    Returns true if this Formula and other are identical, false otherwise.

    The comparison is only performed on the m_formula actionformula, not on any other member data that are derived from the processing of m_formula.

    Member Variable Documentation

    bool Formula::m_forceCountIndex

    This variable tells whether, when defining a chemical composition formula, the index '1' must be written explicitly for a symbol whose count is 1 (a count that is otherwise left unspecified and considered to be '1' by default). If true, water must be described as "H2O1"; if false, it may be described as "H2O".

    QString Formula::m_formula

    String representing the actionformula.

    QString Formula::m_minusFormula

    String representing the "minus" component of the main m_formula.

    This member datum is set upon parsing of m_formula.

    QString Formula::m_plusFormula

    String representing the "plus" component of the main m_formula.

    This member datum is set upon parsing of m_formula.

    std::map<QString, double> Formula::m_symbolCountMap

    Map relating the symbols (as keys) found in the formula and their counts (atoms, in fact, as values).

    Note that the count value type is double, which allows for interesting things to be done with Formula. Also, the count value might be negative if the net mass of an actionformula is negative.

    See also Formula::splitActionParts().

    IsotopicDataCstSPtr Formula::mcsp_isotopicData

    This variable holds the isotopic data that the formula is based on.

    QRegularExpression Formula::subFormulaRegExp

    Regular expression used to deconstruct the main formula into minus and plus component subformulas.

    "([+-]?)([A-Z][a-z]*)(\\d*[\\.]?\\d*)"

    See also Formula::splitActionParts().