Multi-View Projection for metabolite annotation

HassounLab/MVP

title: MVP
emoji: 🏆
colorFrom: blue
colorTo: pink
sdk: streamlit
app_file: app.py
pinned: false
short_description: msms annotation tool
python_version: 3.11.7

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

MultiView Projection (MVP) for Spectra Annotation

Yan Zhou Chen, Soha Hassoun

Department of Computer Science, Tufts University

This repository provides the implementation of MultiView Projection (MVP). MVP can be used to rank a set of molecular candidates given a spectrum.
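
To illustrate what "ranking candidates given a spectrum" means in practice, here is a minimal sketch. It assumes the spectrum and each candidate molecule have already been projected into a shared embedding space and that candidates are scored by cosine similarity. Both points are assumptions made for illustration, since this README does not state the scoring function. The function names and toy vectors are hypothetical and not part of the MVP code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(spec_emb, cand_embs):
    """Return candidate names sorted from best to worst match."""
    scores = {name: cosine(spec_emb, emb) for name, emb in cand_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, made up for illustration only.
spectrum = [1.0, 0.0, 1.0]
candidates = {
    "cand_A": [0.9, 0.1, 1.1],
    "cand_B": [0.0, 1.0, 0.0],
    "cand_C": [1.0, 1.0, 0.0],
}

ranking = rank_candidates(spectrum, candidates)
print(ranking)
```

In the actual model the embeddings would come from the trained spectrum and molecule encoders rather than hand-written vectors.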

Table of Contents

  1. [Install & setup]
  2. [Data prep]
  3. [Use our pretrained model]
  4. [MassSpecGym data download]
  5. [Training from scratch]
  6. [References]

Install & setup

  1. Clone the repository: git clone <REPO_link>
  2. Install the full environment, or install only the key packages listed below:
conda env create -f environment.yml

Key packages

  • python
  • dgl
  • pytorch
  • rdkit
  • pytorch-geometric
  • numpy
  • scikit-learn
  • scipy
  • massspecgym
  • lightning

Data prep

We provide sample spectra data and candidates in data/sample. For preprocessing:

  1. If using formSpec, compute subformula labels.
  2. Run our preprocessing code to obtain the fingerprint and consensus spectra files.
# If using formSpec
python subformula_assign/assign_subformulae.py --spec-files ../data/sample/data.tsv --output-dir ../data/sample/subformulae_default --max-formulae 60 --labels-file ../data/sample/data.tsv
python data_preprocess.py --spec_type formSpec --dataset_pth ../data/sample/data.tsv --candidates_pth  ../data/sample/candidates_mass.json --subformula_dir_pth ../data/sample/subformulae_default/ --output_dir ../data/sample/

# If using binnedSpec
python data_preprocess.py --spec_type binnedSpec --dataset_pth ../data/sample/data.tsv --candidates_pth  ../data/sample/candidates_mass.json --output_dir ../data/sample/

We include sample subformula, fingerprint, and consensus spectra data in ../data/sample/.
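
For readers unfamiliar with binned spectra, the following sketch shows one common way to turn a peak list into a fixed-length vector. It is not the code in data_preprocess.py: the bin width, the m/z range, the max-normalization, and the function name are all assumptions chosen for illustration.

```python
def bin_spectrum(peaks, bin_width=1.0, max_mz=1000.0):
    """Sum peak intensities into fixed-width m/z bins, then scale so the largest bin is 1.

    peaks: iterable of (mz, intensity) pairs.
    bin_width and max_mz are illustrative defaults, not MVP settings.
    """
    n_bins = int(max_mz / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:  # drop peaks outside the covered range
            vec[idx] += intensity
    top = max(vec)
    return [v / top for v in vec] if top > 0 else vec


# Toy peak list, made up for illustration; the last peak falls outside the range.
peaks = [(100.2, 50.0), (100.7, 50.0), (250.4, 25.0), (1200.0, 10.0)]
binned = bin_spectrum(peaks)
print(len(binned), binned[100], binned[250])
```

Peaks that land in the same bin are summed, so the two peaks near m/z 100 contribute to a single entry.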

Use our pretrained model

You can use our pretrained model (on MassSpecGym) to rank molecular candidates by providing the spectra data and a list of candidates.

After preparing your data, update params_binnedSpec.yaml or params_formSpec.yaml with your dataset paths, then run:

# If using formSpec
python test.py --param_pth params_formSpec.yaml

# If using binnedSpec
python test.py --param_pth params_binnedSpec.yaml
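
If you prefer to launch these commands from Python, for example inside a larger pipeline, a small wrapper along these lines may help. The helper names `build_command` and `run` are hypothetical and not part of this repository; the script names, the `--param_pth` flag, and the params file names are taken from the commands shown in this README.

```python
import subprocess
import sys

SPEC_TYPES = ("formSpec", "binnedSpec")


def build_command(script, spec_type):
    """Build the argument list for test.py or train.py with the matching params file."""
    if spec_type not in SPEC_TYPES:
        raise ValueError(f"spec_type must be one of {SPEC_TYPES}, got {spec_type!r}")
    return [sys.executable, script, "--param_pth", f"params_{spec_type}.yaml"]


def run(script, spec_type):
    """Run the script and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(script, spec_type), check=True)


cmd = build_command("test.py", "formSpec")
print(" ".join(cmd[1:]))
```

Calling `run("test.py", "binnedSpec")` from the directory that contains the scripts would then be equivalent to the binnedSpec command above.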

We provide a notebook showing sample result files in notebooks/demo.ipynb.

MassSpecGym data download

Our model is trained on the MassSpecGym dataset. Follow the MassSpecGym instructions to download the spectra and candidate datasets.

You can preprocess the MassSpecGym dataset as described in the Data prep section, or download the preprocessed files as follows:

mkdir data/msgym/
cd data/msgym
wget 
wget 

Training from scratch

To train a model from scratch:

  1. Prepare data as described in the data prep section
  2. Modify the configuration in the params file as necessary
  3. Train using the following commands:
# If using formSpec
python train.py --param_pth params_formSpec.yaml

# If using binnedSpec
python train.py --param_pth params_binnedSpec.yaml

References

Contact
