what's
my contribution?
Journal papers
N. Gupta and P.A. Pevzner. Peptide versus protein identifications. A strike
against the two peptide rule. Submitted.
S. Kim, N. Gupta, N. Bandeira and P.A. Pevzner. Spectral Dictionaries:
Integrating De Novo Peptide Sequencing with Database Search of Tandem Mass
Spectra. To appear in Molecular and Cellular Proteomics.
N. Gupta, J. Benhamida, V. Bhargava, D. Goodman, E. Kain, I. Kerman, N. Nguyen,
N. Ollikainen, J. Rodriguez, J. Wang, M.S. Lipton, M. Romine, V. Bafna,
R.D. Smith and P.A. Pevzner (2008). Comparative Proteogenomics: Combining Mass
Spectrometry and Comparative Genomics to Analyze Multiple Genomes. Genome
Research. 18:1133-1142.
[Abstract] [Full Text] [Pubmed]
* Included in Research Highlights of Nature Reviews Genetics, 6:418, 2008 [link]
* I presented this research in a talk at ASMS 2008 in Denver on June 5, 2008.
* Press releases on undergraduate involvement: UCSD | Science Daily | HHMI
S. Kim, N. Gupta and P.A. Pevzner (2008). The Partition Function of Tandem Mass
Spectra: a New Approach to Peptide Identifications. Journal of Proteome
Research. 7(8):3354-3363.
J. Rodriguez, N. Gupta, R.D. Smith and P.A. Pevzner (2008). Does trypsin cut
before Proline? Journal of Proteome Research. 7(1):300-5.
[Abstract] [Full Text] [Pubmed]
* Noted as one of the 20 most accessed articles in the first quarter of 2008.
N. Gupta, S. Tanner, N. Jaitly, J.N. Adkins, M. Lipton, R. Edwards, M. Romine,
A. Osterman, V. Bafna, R.D. Smith and P.A. Pevzner (2007). Whole proteome
analysis of post-translational modifications: applications of
mass-spectrometry for proteogenomic annotation. Genome Research.
17(9):1362-77.
[Abstract] [Full Text] [Pubmed]
* Research highlight at Pacific Northwest National Labs [link].
K. Gaurav, N. Gupta and R. Sowdhamini (2005). "FASSM: Enhanced Function
Association in whole genome analysis using Sequence and Structural Motifs".
In Silico Biology 5, 0040.
[Abstract] [Full Text] [Pubmed]
N. Gupta, N. Mangal and S. Biswas (2005). "Evolution and similarity evaluation
of protein structures in contact map space". Proteins: Structure, Function and
Bioinformatics, 59(2):196-204.
[Abstract] [Full Text] [Pubmed]
A. Bhaduri, G. Pugalenthi, N. Gupta and R. Sowdhamini (2004). "iMOT: an
interactive package for the selection of spatially interacting motifs".
Nucleic Acids Research, 32, W602-W605.
[Abstract] [Full Text] [Pubmed]
N. Gupta and A. Irback (2004). Coupled folding-binding versus docking: A
lattice model study. Journal of Chemical Physics, 120, 3983-3989.
[Abstract] [Full Text] [Pubmed]
Conference papers
B. Dost, T. Shlomi, N. Gupta, V. Bafna, and Roded Sharan, "QNet: A tool for
querying biological networks", RECOMB 2007.
[Abstract] [Full Text]
* Also published in Lecture Notes in Bioinformatics, 4453, p. 1 ff.
N. Gupta, N. Mangal, K. Tiwari and P. Mitra. "Mining quantitative association
rules in protein sequences". Proceedings of the third Australasian Data Mining
Conference 2004 (AusDM'04), Cairns, Australia.
[Abstract]
* Also published in Lecture Notes in Computer Science, Volume 3755 / 2006,
pp. 273-281.
N. Gupta and V. K. Agrawal. "Two Criterion Optimization in state assignment
for synchronous finite state machines using NSGA-II". Proceedings of the
International Conference on Adaptive and Natural Computing Algorithms, 2005
(ICANNGA'05), Coimbra, Portugal.
[Abstract]
Patents
S. Kim, N. Gupta and P.A. Pevzner. Method for identifying peptides using
tandem mass spectra by dynamically determining the number of peptide
reconstructions required. Pending.
K. Gaurav, N. Gupta and R. Sowdhamini (2005). "FASSM: Enhanced Function
Association in whole genome analysis using Sequence and Structural Motifs".
In Silico Biology 5, 0040.
We present
an algorithm to detect remote homology, which arises through circular
permutation and discontinuous domains. It is also helpful in detecting
small domain proteins that are characterized by few conserved residues.
The input to the algorithm is a set of multiply aligned protein
sequence profiles. This method, coded as FASSM, examines the sequence
conservation and positions of protein family signatures or motifs for
the annotation of protein sequences and to facilitate the analysis of
their domains. The overall coverage of FASSM is 93% in comparison to
other validation tools like HMM and IMPALA. The method is especially
useful for difficult relationships such as discontinuous domains during
whole-genome surveys and is demonstrated to perform accurate family
associations at sequence identities as low as 15%.
N. Gupta, N. Mangal and S. Biswas (2005). "Evolution and similarity evaluation
of protein structures in contact map space". Proteins: Structure, Function and
Bioinformatics, 59(2):196-204.
Prediction
of fold from amino-acid sequence of
a protein has been an active area of research in the past few years,
but the limited accuracy of existing techniques emphasizes the need to
develop newer approaches to tackle this task. In this study, we use
contact map prediction as an intermediate step in fold prediction from
sequence. A contact map is a reduced graph-theoretic representation of
proteins which models the local and global inter-residue contacts in
the structure. We start with a population of random contact maps for
the protein sequence and "evolve" the population to a
"high-feasibility" configuration using a genetic algorithm. A neural
network is employed to assess the feasibility of contact maps based on
their four physically relevant properties. We also introduce five
parameters, based on algebraic graph theory and physical
considerations, that can be used to judge the structural similarity
between proteins through contact maps. To predict the fold of a given
amino acid sequence, we predict a contact map that will sufficiently
approximate the structure of the corresponding protein. Then we assess
the similarity of this contact map with the representative contact map
of each fold; the fold that corresponds to the closest match is our
predicted fold for the input sequence. We have found that our
feasibility measure is able to differentiate between feasible and
infeasible contact maps. Further, this novel approach is able to
predict the folds from sequences significantly better than a random
predictor.
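As a rough illustration of the contact map representation described in this abstract, the sketch below builds a binary contact map from residue coordinates. It is not the paper's code: the 8 angstrom distance cutoff, the function name `contact_map`, and the toy coordinates are all assumptions chosen for the example.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Return a symmetric 0/1 matrix for a list of 3D residue coordinates.

    Entry [i][j] is 1 when residues i and j (i != j) lie within `cutoff`
    angstroms of each other. The cutoff value is an assumption.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical coordinates (angstroms) for a straight four-residue toy chain.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
for row in toy_map:
    print(row)
```

In this toy chain only the two end residues are farther apart than the cutoff, so the map has a single pair of zero off-diagonal entries.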
A. Bhaduri, G. Pugalenthi, N. Gupta and R. Sowdhamini (2004). "iMOT: an
interactive package for the selection of spatially interacting motifs".
Nucleic Acids Research, 32, W602-W605.
Functional
selection and three-dimensional structural constraints of
proteins relate to the retention of significant sequence similarity
between proteins of similar fold and function despite poor
overall sequence identity and evolutionary pressures. We report
the availability of ‘iMOT’ (interacting MOTif) server,
an interactive package for the automatic identification of
spatially interacting motifs among distantly related proteins sharing
similar folds and possessing common ancestral lineage. Spatial
interactions between conserved stretches of a protein are
evaluated by calculations of pseudo-potentials that describe the
strength of interactions. Such an evaluation permits the automatic
identification of highly interacting conserved regions of a
protein. Interacting motifs have been shown to be useful in
searching for distant homologues and establishing remote homologies
among the largely unassigned sequences in genome databases.
Information on such motifs should also be of value in
protein folding, modelling and engineering experiments.
The iMOT server can be accessed from
http://www.ncbs.res.in/~faculty/mini/imot/iMOTserver.html.
N. Gupta and A. Irback (2004). "Coupled folding-binding versus docking: A
lattice model study". Journal of Chemical Physics, 120, 3983-3989.
Using a simple hydrophobic/polar protein model, we perform a Monte Carlo
study of the thermodynamics and kinetics of binding to a target structure for
two closely related sequences, one of which has a unique folded state while
the other is unstructured. We obtain significant differences in their binding
behavior. The stable sequence has rigid docking as its preferred binding mode,
while the unstructured chain tends to first attach to the target and then
fold. The free-energy profiles associated with these two binding modes are
compared.
N. Gupta, N. Mangal, K. Tiwari and P. Mitra. "Mining quantitative association
rules in protein sequences". Proceedings of the third Australasian Data Mining
Conference 2004 (AusDM'04), Cairns, Australia.
Although a lot of
research has gone into understanding the composition and nature of
proteins, many things are yet to be understood properly. It is
now generally believed that amino acid sequences of proteins are not
random, and thus the patterns of amino acids that we observe in the
protein sequences are non-random. In this study, we are trying to
decipher the nature of associations between different amino acids that
are present in a protein. This very basic analysis can provide some
insight into the co-occurrence of certain amino acids in a protein.
Such association rules are desirable for enhancing our understanding of
protein composition. They have the potential to give some clue
regarding global interactions among particular sets of amino acids
occurring in proteins. The presence of strong non-trivial associations
further suggests evidence for non-randomness of protein sequences.
N. Gupta and V. K. Agrawal. "Two Criterion Optimization in state assignment
for synchronous finite state machines using NSGA-II". Proceedings of the
International Conference on Adaptive and Natural Computing Algorithms, 2005
(ICANNGA'05), Coimbra, Portugal.
This project
aims at finding the best state assignment for implementing synchronous
sequential circuits, which are also represented as Finite State
Machines. This problem, commonly known as the State Assignment Problem
(S.A.P.), has been studied extensively because of its importance in reducing the
cost of implementation. Previous work on this problem assumes that the number of bits
used for state assignment is given beforehand. Thus the
problem has been treated as a single objective problem, with the only objective
being to reduce the cumulative
cost of transition between the connected states.
In this
work, we add another dimension to this optimization problem by introducing a second objective
of minimizing the number of bits used for assignment. This is desirable to reduce the
complexity and cost of the circuit. The second objective conflicts with
the first objective and thus the optimal solution requires a
tradeoff between the two. We have used different EMO methods to
tackle this problem. The results show that our NSGA-II based approach,
with some modifications to constraint handling, gives better results and running
time than NSGA. We
also gain some insights about the shape of the efficient frontier.
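The tradeoff between the two objectives can be sketched on a toy machine. The code below is an illustration only, not the paper's formulation: it assumes the transition cost is the sum of Hamming distances between the codes of connected states, and the names `objectives` and `dominates`, along with the four-state example, are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two integer state codes differ."""
    return bin(a ^ b).count("1")


def objectives(assignment, transitions, n_bits):
    """Return (transition cost, number of bits) for a state assignment.

    assignment maps each state name to an integer code. The cost model
    (summed Hamming distance over transitions) is an assumption.
    """
    codes = list(assignment.values())
    if len(set(codes)) != len(codes):
        raise ValueError("each state needs a unique code")
    if any(code < 0 or code >= 2 ** n_bits for code in codes):
        raise ValueError("code does not fit in n_bits")
    cost = sum(hamming(assignment[s], assignment[t]) for s, t in transitions)
    return cost, n_bits


def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical machine: a hub state connected to three other states.
transitions = [("hub", "s1"), ("hub", "s2"), ("hub", "s3")]
two_bit = objectives({"hub": 0b00, "s1": 0b01, "s2": 0b10, "s3": 0b11}, transitions, 2)
three_bit = objectives({"hub": 0b000, "s1": 0b001, "s2": 0b010, "s3": 0b100}, transitions, 3)
print(two_bit, three_bit)
```

Here the three-bit encoding lowers the transition cost at the price of an extra bit, so neither assignment dominates the other; both would sit on the efficient frontier of this toy problem.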
B. Dost, T. Shlomi, N. Gupta, V. Bafna, and Roded Sharan, "QNet: A tool for
querying biological networks", RECOMB 2007. Also published in Lecture Notes in
Bioinformatics, 4453, p. 1 ff.
Molecular interaction databases can be used to study the evolution of
molecular pathways across species. Querying such pathways is a
challenging computational problem, and recent efforts have been limited
to simple queries (paths), or simple networks (forests). In this paper,
we significantly extend the class of pathways that can be efficiently
queried to the case of trees, and graphs of bounded treewidth. Our
algorithm allows the identification of non-exact (homeomorphic)
matches, exploiting the color coding technique of Alon et al. We
implement a tool for tree queries, called QNet, and test its retrieval
properties in simulations and on real network data. We show that QNet
searches queries with up to 9 proteins in seconds on current networks,
and outperforms sequence-based searches. We also use QNet to perform
the first large scale cross-species comparison of protein complexes, by
querying known yeast complexes against a fly protein interaction
network. This comparison points to strong conservation between the two
species, and underscores the importance of our tool in mining protein
interaction networks.
N. Gupta, S. Tanner, N. Jaitly, J.N. Adkins, M. Lipton, R. Edwards, M. Romine,
A. Osterman, V. Bafna, R.D. Smith and P.A. Pevzner (2007). Whole proteome
analysis of post-translational modifications: applications of
mass-spectrometry for proteogenomic annotation. Genome Research.
17(9):1362-77.
While bacterial genome annotations have significantly improved in
recent years, techniques for bacterial proteome annotation (including
post-translational chemical modifications, signal peptides, proteolytic
events, etc.) are still in their infancy. At the same time, the number
of sequenced bacterial genomes is rising sharply, far outpacing our
ability to validate the predicted genes, let alone annotate bacterial
proteomes. In this study, we use tandem mass spectrometry (MS/MS) to
annotate the proteome of Shewanella oneidensis MR-1, an important
microbe for bioremediation. In particular, we provide the first
comprehensive map of post-translational modifications in a bacterial
genome, including a large number of chemical modifications, signal
peptide cleavages, and cleavages of N-terminal methionine residues. We
also detect multiple genes that were missed or assigned incorrect start
positions by gene prediction programs, and suggest corrections to
improve the gene annotation. This study demonstrates that complementing
every genome sequencing project by an MS/MS project would significantly
improve both genome and proteome annotations for a reasonable cost.
J. Rodriguez, N. Gupta, R.D. Smith and P.A. Pevzner (2008). Does trypsin cut
before Proline? Journal of Proteome Research. 7(1):300-5.
Trypsin is the most commonly used enzyme in mass spectrometry for
protein digestion with high substrate specificity. Many peptide
identification algorithms incorporate these specificity rules as
filtering criteria. A generally accepted "Keil rule" is that trypsin
cleaves next to arginine or lysine, but not before proline. Since this
rule was derived two decades ago based on a small number of
experimentally confirmed cleavages, we decided to re-examine it using
14.5 million tandem spectra (two orders of magnitude increase in the
number of observed tryptic cleavages). Our analysis revealed a
surprisingly large number of cleavages before proline. We examine
several hypotheses to explain these cleavages and argue that trypsin
specificity rules used in peptide identification algorithms should be
modified to "legitimatize" cleavages before proline. Our approach can
be applied to analyzing any protease and we further argue that
specificity rules for other enzymes should also be re-evaluated based
on statistical evidence derived from large MS/MS datasets.
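To make the effect of the specificity rule concrete, the sketch below digests a protein in silico with and without the "not before proline" restriction. It is a minimal illustration, not the analysis pipeline used in the paper; the function names, the `allow_before_proline` flag, and the toy sequence are assumptions.

```python
def tryptic_cleavage_sites(protein, allow_before_proline=False):
    """Return indices i where the chain is cut between protein[i] and protein[i + 1].

    Cuts fall after lysine (K) or arginine (R). Under the Keil rule
    (allow_before_proline=False) a cut is suppressed when the next residue
    is proline (P).
    """
    sites = []
    for i in range(len(protein) - 1):
        if protein[i] in "KR":
            if protein[i + 1] == "P" and not allow_before_proline:
                continue
            sites.append(i)
    return sites


def digest(protein, allow_before_proline=False):
    """Split a protein into fully tryptic peptides (no missed cleavages)."""
    peptides, start = [], 0
    for site in tryptic_cleavage_sites(protein, allow_before_proline):
        peptides.append(protein[start:site + 1])
        start = site + 1
    peptides.append(protein[start:])
    return peptides


toy_protein = "AAKPGGRSSK"  # hypothetical sequence with one K-P bond
keil_peptides = digest(toy_protein)
relaxed_peptides = digest(toy_protein, allow_before_proline=True)
print(keil_peptides)
print(relaxed_peptides)
```

A search tool that applies the Keil rule as a filter would never consider the peptides produced by the K-P cut in the relaxed digest, which is the kind of identification the abstract argues should be permitted.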
N. Gupta, J. Benhamida, V. Bhargava, D. Goodman, E. Kain, I. Kerman, N. Nguyen,
N. Ollikainen, J. Rodriguez, J. Wang, M.S. Lipton, M. Romine, V. Bafna,
R.D. Smith and P.A. Pevzner (2008). Comparative Proteogenomics: Combining Mass
Spectrometry and Comparative Genomics to Analyze Multiple Genomes. Genome
Research. 18:1133-1142.
Recent proliferation of low-cost DNA sequencing techniques will soon
lead to an explosive growth in the number of sequenced genomes and will
turn manual annotations into a luxury. Mass spectrometry recently
emerged as a valuable technique for proteogenomic annotations that
improves on the state-of-the-art in predicting genes and other
features. However, previous proteogenomic approaches were limited to a
single genome and did not take advantage of analyzing mass spectrometry
data from multiple genomes at once. We show that such a comparative
proteogenomics approach (like comparative genomics) allows one to
address the problems that remained beyond the reach of the traditional
"single proteome" approach in mass spectrometry. In particular, we show
how comparative proteogenomics addresses the notoriously difficult
problem of "one-hit-wonders" in proteomics, improves on the existing
gene prediction tools in genomics, and allows identification of rare
post-translational modifications. We therefore argue that complementing
DNA sequencing projects by comparative proteogenomics projects can be a
viable approach to improve both genomic and proteomic annotations.