In this tutorial, we use TopPIC suite to analyze two top-down MS/MS data files on a computer with a Windows 10 operating system. Annotated proteoform spectrum matches (PrSMs) identified by TopPIC from the data files can be browsed here.
Create the folders below for software packages and data sets used in this tutorial.
The resulting folder structure is shown in the screenshot below.
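The folder setup can also be scripted. The following is a minimal Python sketch, not part of the TopPIC suite. The folder names are an assumption inferred from the paths used later in this tutorial (C:\toppic_tutorial\toppic and C:\toppic_tutorial\tutorial_1 to tutorial_3), and the function name create_tutorial_folders is hypothetical.

```python
from pathlib import Path

# Folder names are inferred from the paths used later in this tutorial
# (an assumption; adjust the list if your layout differs).
TUTORIAL_FOLDERS = ["toppic", "tutorial_1", "tutorial_2", "tutorial_3"]


def create_tutorial_folders(root):
    """Create the tutorial folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in TUTORIAL_FOLDERS:
        folder = root / name
        # parents=True also creates root; exist_ok=True makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_tutorial_folders(r"C:\toppic_tutorial") on Windows would create the four folders. Running it a second time changes nothing.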
Msconvert is a software tool in ProteoWizard that converts raw files into various spectrum file formats.
Msconvert requires Microsoft .NET Framework 4.0 or a higher version, which is preinstalled on Windows 10.
Follow the steps below to download ProteoWizard:
In this tutorial, we will use TopFD and TopPIC to analyze a top-down MS/MS data set of Salmonella typhimurium for proteoform identification.
In the MS experiment, the protein extract of S. typhimurium was reduced with dithiothreitol and alkylated with iodoacetamide. The protein mixture was first separated by gas-phase fractionation, resulting in 7 fractions. Each fraction was separated by an HPLC system coupled to an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific). MS and MS/MS spectra were collected at a resolution of 60,000 and 30,000, respectively. In this tutorial, we use only the data files of two fractions (st_1.raw and st_2.raw).
Click here to download the data set, save it in the folder C:\toppic_tutorial\tutorial_1\, and unzip it in the same folder.
A S. typhimurium proteome database of 1,799 proteins was downloaded from the UniProt database.
Click here to download the protein database and save it in the folder C:\toppic_tutorial\tutorial_1\.
The folder C:\toppic_tutorial\tutorial_1\ is shown in the screenshot below.
We use MSConvertGUI to convert the raw files st_1.raw and st_2.raw to mzML files.
The screenshot of MSConvertGUI is shown below.
In the above file format conversion, the peak picking filter (step 3) is used to generate centroid, rather than profile, mzML data files. The spectral deconvolution tool TopFD requires centroid data.
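To confirm that a converted file contains centroid spectra, the spectrum annotations in the mzML file can be inspected. The sketch below is an illustration in Python and not part of ProteoWizard or TopFD. It relies on the PSI-MS controlled vocabulary accessions MS:1000127 (centroid spectrum) and MS:1000128 (profile spectrum), and it assumes that these annotations appear as cvParam elements directly under each spectrum element. The function name count_spectrum_types is hypothetical.

```python
import xml.etree.ElementTree as ET

# PSI-MS controlled vocabulary accessions used in mzML spectrum annotations.
CENTROID = "MS:1000127"  # centroid spectrum
PROFILE = "MS:1000128"   # profile spectrum


def count_spectrum_types(mzml_source):
    """Count spectra annotated as centroid or profile in an mzML file.

    mzml_source is a file path or a binary file object.
    Only cvParam elements that are direct children of <spectrum> are checked.
    """
    counts = {"centroid": 0, "profile": 0}
    for _event, elem in ET.iterparse(mzml_source, events=("end",)):
        # Strip the XML namespace, e.g. "{http://psi.hupo.org/ms/mzml}spectrum".
        if elem.tag.rsplit("}", 1)[-1] != "spectrum":
            continue
        accessions = {
            child.get("accession")
            for child in elem
            if child.tag.rsplit("}", 1)[-1] == "cvParam"
        }
        if CENTROID in accessions:
            counts["centroid"] += 1
        if PROFILE in accessions:
            counts["profile"] += 1
        elem.clear()  # release memory held by the parsed spectrum
    return counts
```

For a file converted with the peak picking filter, the "profile" count would be expected to be zero. A nonzero count suggests that the filter was not applied to every MS level.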
The resulting mzML files are
C:\toppic_tutorial\tutorial_1\st_1.mzML
and
C:\toppic_tutorial\tutorial_1\st_2.mzML
The sizes of the two files are about 41 MB and 47 MB, respectively. They can be downloaded here. The running time for the file format conversion is less than one minute.
We use topfd_gui for top-down mass spectral deconvolution.
The screenshot of topfd_gui is shown below.
TopFD reports eight text files and four folders.
The output files and folders can be downloaded here.
We use toppic_gui to search the MS/MS spectra in st_1_ms2.msalign and st_2_ms2.msalign against the protein database uniprot-st.fasta to identify PrSMs.
The screenshots of toppic_gui are shown below.
For each input msalign file, TopPIC reports two csv files, an xml file, and collections of html files for identified proteoforms. For example, the output files for st_1_ms2.msalign are
In addition, the identifications reported for st_1_ms2.msalign and st_2_ms2.msalign are combined and filtered by a 1% spectrum-level FDR and a 1% proteoform-level FDR. The combined results are reported in the following files.
In the analysis, C57 is selected as the fixed modification because proteins were reduced with dithiothreitol and alkylated with iodoacetamide before the MS experiment. When proteins are not reduced, C0 should be selected.
A shuffled decoy database is concatenated to the target database to estimate spectrum-level and proteoform-level FDRs. All identified PrSMs are first filtered by a 1% spectrum-level FDR, and the resulting PrSMs are reported in the file combined_ms2_toppic_prsm.csv. The proteoforms corresponding to these PrSMs are then filtered by a 1% proteoform-level FDR, and the resulting proteoforms and their corresponding best PrSMs are reported in the file combined_ms2_toppic_proteoform.csv. Both files can be opened with Microsoft Excel. To browse the PrSM identifications, go to the folder combined_html\topview and open the file index.html in Google Chrome (Internet Explorer and Firefox are not recommended).
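The idea behind decoy-based filtering can be illustrated with a short Python sketch. It uses the common target-decoy estimate, in which the FDR at a cutoff is the number of accepted decoy matches divided by the number of accepted target matches. This is a simplified illustration under that assumption and is not TopPIC's actual implementation. The function name filter_by_fdr and the (E-value, is_decoy) input format are hypothetical.

```python
def filter_by_fdr(matches, max_fdr=0.01):
    """Return the target matches accepted at the given FDR.

    matches: iterable of (e_value, is_decoy) pairs; a lower E-value is better.
    The FDR of a cutoff is estimated as (#decoys) / (#targets) among all
    matches ranked at or above that cutoff.
    """
    ranked = sorted(matches, key=lambda m: m[0])
    targets = decoys = 0
    best_cut = 0  # number of top-ranked matches to keep
    for i, (_e_value, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest prefix whose estimated FDR is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cut = i
    # Decoy matches are only used for estimation and are not reported.
    return [m for m in ranked[:best_cut] if not m[1]]
```

The same procedure can be applied at the proteoform level by passing one entry per proteoform (for example, its best match) instead of one entry per PrSM.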
The output files can be downloaded here.
We use topfd for top-down mass spectral deconvolution.
cd c:\toppic_tutorial\tutorial_1
..\toppic\topfd st_*.mzML
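Before moving on to the database search, it can be useful to check that TopFD produced an MS/MS msalign file for every input file. The Python sketch below assumes the naming pattern seen in this tutorial (st_1.mzML yields st_1_ms2.msalign). The function names expected_msalign and missing_msalign_files are hypothetical helpers and not part of the TopPIC suite.

```python
from pathlib import Path


def expected_msalign(mzml_path):
    """Map an mzML file name to the MS/MS msalign name used in this tutorial.

    Example: st_1.mzML -> st_1_ms2.msalign (pattern taken from the tutorial).
    """
    mzml_path = Path(mzml_path)
    return mzml_path.with_name(mzml_path.stem + "_ms2.msalign")


def missing_msalign_files(folder, pattern="st_*.mzML"):
    """List expected msalign files that are not present in folder."""
    folder = Path(folder)
    expected = [expected_msalign(p) for p in sorted(folder.glob(pattern))]
    return [p for p in expected if not p.exists()]
```

An empty list returned by missing_msalign_files(r"C:\toppic_tutorial\tutorial_1") indicates that both st_1_ms2.msalign and st_2_ms2.msalign are in place.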
We use toppic to search the MS/MS spectra in st_1_ms2.msalign and st_2_ms2.msalign against the protein database uniprot-st.fasta to identify PrSMs.
cd c:\toppic_tutorial\tutorial_1
..\toppic\toppic -f C57 -d -t FDR -T FDR -c combined uniprot-st.fasta st_*_ms2.msalign
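The same search can be launched from a script, which helps when several data sets are processed in a row. The Python sketch below only reassembles the command shown above. The comments on the flags reflect how this tutorial describes the analysis (C57 as the fixed modification, a concatenated decoy database, FDR-based cutoffs, and combined output files) and should be checked against the TopPIC manual. The function names build_toppic_command and run_toppic are hypothetical.

```python
import subprocess


def build_toppic_command(exe, fasta, msalign_files,
                         fixed_mod="C57", combined_name="combined"):
    """Assemble the TopPIC command line used in this tutorial as a list."""
    return [
        exe,
        "-f", fixed_mod,       # fixed modification (C57 in this tutorial)
        "-d",                  # search with a concatenated decoy database
        "-t", "FDR",           # spectrum-level cutoff type
        "-T", "FDR",           # proteoform-level cutoff type
        "-c", combined_name,   # name prefix of the combined output files
        fasta,
        *msalign_files,
    ]


def run_toppic(command, workdir):
    """Run the command in workdir; raises CalledProcessError on failure."""
    return subprocess.run(command, cwd=workdir, check=True)
```

The msalign files are listed explicitly here instead of using the st_*_ms2.msalign wildcard, so the script does not depend on how wildcards are expanded. A call such as run_toppic(build_toppic_command(r"..\toppic\toppic", "uniprot-st.fasta", ["st_1_ms2.msalign", "st_2_ms2.msalign"]), r"C:\toppic_tutorial\tutorial_1") would correspond to the command above.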
We will use TopMG to analyze the data set st_1.raw described in Tutorial 1. TopMG is still in the development stage. Please let us know if you find any bugs in it.
The description of the data file and its preprocessing steps can be found in Sections 4.1 - 4.4. Click here to download the data files used in the analysis, save the downloaded archive in the folder C:\toppic_tutorial\tutorial_2\, and unzip it. The archive includes the following files.
The screenshots of topmg_gui are shown below.
TopMG reports two csv files, an xml file, and collections of html files for identified proteoforms.
The output files can be downloaded here.
To browse the PrSM identifications, go to the folder st_1_html\topview and open the file index.html in Google Chrome (Internet Explorer and Firefox are not recommended).
cd c:\toppic_tutorial\tutorial_2
..\toppic\topmg -f C57 -d -t FDR -v 0.05 -T FDR -V 0.05 -i variable_mods.txt uniprot-st.fasta st_1_ms2.msalign
We will use TopPIC and TopDiff to compare the abundance of proteoforms and find differentially expressed proteoforms using two MS data files of Escherichia coli cells (ecoli_1.raw and ecoli_2.raw).
In the MS experiment, the protein extract of E. coli was reduced with dithiothreitol and alkylated with iodoacetamide. The protein mixture was separated by capillary zone electrophoresis and analyzed by an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific). Technical duplicates were generated for testing proteoform quantification in two runs of the same sample.
The raw data files were processed following the steps described in Sections 4.1 - 4.4. Click here to download the data files used in the analysis, save the downloaded archive in the folder C:\toppic_tutorial\tutorial_3\, and unzip it. The archive includes the following files.
We use toppic_gui to search the MS/MS spectra in ecoli_1_ms2.msalign and ecoli_2_ms2.msalign against the protein database uniprot-ecoli.fasta to identify PrSMs.
The screenshots of toppic_gui are shown below.
For each input msalign file, TopPIC reports two csv files, an xml file, and collections of html files for identified proteoforms. The output files for ecoli_1_ms2.msalign and ecoli_2_ms2.msalign are
The output files can be downloaded here.
The screenshots of topdiff_gui are shown below.
TopDiff reports one csv file containing the identified proteoforms and their abundances in the input mass spectrum data:
C:\toppic_tutorial\tutorial_3\sample_diff.csv
The output file can be downloaded here.
cd c:\toppic_tutorial\tutorial_3
..\toppic\toppic -f C57 -d -t FDR -T FDR uniprot-ecoli.fasta ecoli_*_ms2.msalign
cd c:\toppic_tutorial\tutorial_3
..\toppic\topdiff -f C57 uniprot-ecoli.fasta ecoli_1_ms2.msalign ecoli_2_ms2.msalign