Python and the holy grail: A proteomics venture with scripting
This is a poster for Monty Python and the Holy Grail. http://www.movieposterdb.com/poster/9961b8f1

Freaking happy with the data analysis! I got a case that required determining the sequence of a truncated protein by top-down. However, it turned out to be highly glycosylated, and the fragmentation pattern was driving me nuts: all I got were sky-high sugar peaks and nothing major from the peptide backbone. I have an in-house script (like always) that does de novo sequencing for stretches of 3-4 amino acids, but in this case all the low-mass peaks were buried by the sugars. Sucks...

Then one day I was looking at all the multiply charged peaks and told myself these are definitely peptide chunks. I just didn't have any tool to read out the sequence, because I didn't know where the N-terminus was: it could be anywhere in a protein of more than 400 AAs. I also know that top-down always produces truncated fragments that look like they don't start from the real N-terminus. So I had no choice but to use every AA in the full sequence as a pseudo N-term and, from there, add the next AA, then the one after that, and so on, to see whether the growing stretch matches any of the peptide fragments. This idea is a pure brute-force attack, but it is simple enough for my apprentice-level Python skills. Why not, since all my de novo attempts had failed?

So eventually I did it (thanks to the experience from ProSightPC). The script needs three inputs: 1. the fasta, 2. the modification table (from Uniprot), 3. the deconvoluted masses of the multiply charged peaks. It generates all possible proteoforms from the combinations of the modifications, then iterates using every AA as a pseudo N-term: add the next AA and check whether the mass matches the peak mass (within a user-defined mass tolerance); if not, keep adding the next AA from the sequence and check again, until the mass goes beyond the mass in query. If this pseudo N-term doesn't give any good matches, move on to the next AA as the new pseudo N-term.

So at 1:00 am, after testing on my Enolase top-down data (Enolase because we are also doing X-linking here; you are welcome to send your samples :) ), I got my hands on the cursed truncated glycoprotein. Thanks to my i7-9700, the brute-force attack ran smoothly! In a few minutes I got several good matches (May the HRAM be with you, young Skywalker). Some matches consistently pointed to the same threonine, and this happened often enough to suggest a theory.
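The pseudo N-term scan described above can be sketched roughly as follows. This is not the original script, only a minimal illustration under several assumptions: the function names, the `mods` format (0-based position mapped to a mass delta), and the `ion_offset` parameter are all hypothetical; fragments are treated as neutral b-type masses (plain sum of residue masses); and all peaks are checked in a single pass per start position instead of one peak at a time, which should give the same matches.

```python
from itertools import combinations

# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def proteoforms(mod_sites):
    """Yield every on/off combination of the optional modifications.

    mod_sites is a hypothetical format: {0-based position: mass delta in Da}.
    """
    sites = list(mod_sites.items())
    for r in range(len(sites) + 1):
        for combo in combinations(sites, r):
            yield dict(combo)


def scan_pseudo_n_termini(sequence, peak_masses, tol_da=0.01,
                          mods=None, ion_offset=0.0):
    """Try every residue as a pseudo N-term and extend one residue at a time.

    Returns a list of (start, end, observed_mass, error_da) tuples, with
    0-based inclusive indices. ion_offset=0.0 corresponds to a neutral
    b-type fragment. Assumes modification deltas are non-negative, so the
    running mass only grows and the early exit below is safe.
    """
    mods = mods or {}
    max_mass = max(peak_masses) + tol_da
    hits = []
    for start in range(len(sequence)):
        running = ion_offset
        for end in range(start, len(sequence)):
            running += MONO[sequence[end]] + mods.get(end, 0.0)
            if running > max_mass:
                break  # heavier than every query peak: next pseudo N-term
            for peak in peak_masses:
                if abs(running - peak) <= tol_da:
                    hits.append((start, end, peak, running - peak))
    return hits
```

To cover every proteoform, one would loop `for mods in proteoforms(mod_table): scan_pseudo_n_termini(seq, peaks, mods=mods)`. The number of combinations doubles with each modification site, so for a heavily modified protein the site list may need to be restricted.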

603.7843 (+4): 2.16 mDa, 705.2784 (+2): 4.98 mDa, 811.5366 (+6): 8.92 mDa, and within 10 mDa I got everything from b2 to b15. I then checked the theoretical mass of the proposed truncated sequence again, and it tallied with the intact mass result. Puzzle solved.
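For readers who want to reproduce this kind of check, here is a small sketch of the arithmetic. It assumes the numbers above are m/z values with the charge state in parentheses and positive-mode protonation (the post does not state this explicitly), and the helper names are made up for illustration.

```python
PROTON = 1.007276466812  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Deconvolute an m/z value to a neutral mass, assuming [M + zH]z+ ions."""
    return charge * (mz - PROTON)


def error_mda(observed, theoretical):
    """Mass error in millidaltons (observed minus theoretical)."""
    return (observed - theoretical) * 1000.0
```

A match would then be accepted when `abs(error_mda(observed, theoretical))` is below the chosen tolerance, 10 mDa in this case.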

Nicely done Li Xin! That aha moment at the end of an intense scripting session is priceless isn't it!
