Bioinformatics in India
Sequencing
Sequencing means determining the linear order of monomers in an unbranched biopolymer. The result is a symbolic, linear description that concisely summarizes much of the atomic-level structure of that biopolymer.
Methods to perform DNA or RNA sequencing include:
Methods for performing protein sequencing include:
- Edman degradation
- Mass spectrometry
- Peptide mass fingerprinting
- Protease digests
There are two types of databases used for bioinformatics work:
- Public repositories, such as GenBank for gene sequences or the Protein Data Bank for proteins
- Private databases belonging to individual research groups involved in gene mapping projects, or those held by biotech companies
There are many different file formats for the storage of sequence data.
Human-readable entries that can be printed out onto paper without
complicated layout and nesting are usually referred to as "flat file" formats.
Generally, there is a header section at the top of such a file with a description of the file, the species it came from, the way the sequence was obtained, and information about the authors of the original publication in which the sequence first appeared, plus a variety of other useful metadata (information about the information). The sequence itself is, of course, also part of the file. Luckily, the standard EMBL flat-file format is labelled fairly clearly, and there is a friendly and informative interface to the whole database, called SRS, where all of the header fields are described in more detail.
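The header-plus-sequence layout described above can be illustrated with a small parser. This is only a minimal sketch: it assumes the common EMBL convention of a two-letter line code at the start of each header line, an "SQ" line that introduces the sequence, and "//" as the record terminator. The record in SAMPLE_RECORD and the function name parse_flat_file are made up for illustration and do not come from any real database entry or library.

```python
from collections import defaultdict

# Hypothetical, hand-written record for illustration; not a real database entry.
SAMPLE_RECORD = """\
ID   EXAMPLE1; SV 1; linear; genomic DNA; STD; UNC; 24 BP.
AC   X00000;
DE   Hypothetical example sequence
OS   Example organism
SQ   Sequence 24 BP;
     acgtacgtac gtacgtacgt acgt                                               24
//
"""


def parse_flat_file(text):
    """Split one EMBL-style record into header fields and the sequence."""
    header = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # record terminator
            break
        if in_sequence:
            # Sequence lines hold blocks of bases followed by a running count;
            # keep only the letters.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("SQ"):
            in_sequence = True
        elif line.strip():
            # The first token is the line code, the rest is the field value.
            code, _, value = line.partition(" ")
            header[code].append(value.strip())
    return dict(header), "".join(sequence_parts)


header, sequence = parse_flat_file(SAMPLE_RECORD)
print(header["OS"], len(sequence))
```

Real entries carry many more line codes and multi-line fields, so a production workflow would normally rely on an established parsing library rather than a hand-written loop like this one.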
Nowadays, computer programs are used to search for similar sequences in the genomes of dozens of organisms,
within billions of nucleotides. These programs can compensate for mutations (exchanged, deleted or inserted
bases) in the DNA sequence, in order to identify sequences that are related, but not identical. A variant
of this sequence alignment is used in the sequencing process itself. So-called shotgun sequencing
(which was used, for example, by Celera Genomics to sequence the human genome) does not give a sequential
list of nucleotides, but instead the sequences of thousands of small DNA fragments (each about 600
nucleotides long).
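As a rough illustration of how a comparison can tolerate exchanged, deleted or inserted bases, the sketch below computes the edit distance between two sequences by dynamic programming. Edit distance is used here only as a simple stand-in; it is not the scoring scheme of any particular alignment program, and the function name edit_distance is chosen for this example.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string needs i deletions
        for j, base_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (base_a != base_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# One exchanged base and one deleted base each count as a single difference.
print(edit_distance("ACGTACGT", "ACGAACGT"))
print(edit_distance("ACGTACGT", "ACGACGT"))
```

Two sequences with a small distance relative to their length can be treated as related even though they are not identical.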
The ends of these fragments overlap and, when aligned in the right way, make up the complete genome. Shotgun
sequencing yields sequence data quickly, but the task of reassembling the fragments can be quite complicated for
larger genomes. In the case of the Human Genome Project, it took several months on a supercomputer array to align
them correctly. Shotgun sequencing is generally preferred for smaller genomes, such as those of bacteria,
and is often used at least partially on organisms with much larger genomes. Another aspect of
bioinformatics in sequence analysis is the automatic search for genes and regulatory sequences within
a genome. Not all of the nucleotides within a genome are genes. Within the genomes of higher
organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may,
however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between
genome and proteome projects, for example in the use of DNA sequences for protein identification.
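The reassembly of overlapping shotgun fragments described above can be sketched with a greedy merge: repeatedly join the two fragments whose ends overlap the most. This is a toy illustration under strong assumptions (error-free reads, no repeats, exact overlaps only) and is not how any specific genome project performed assembly. The names overlap_length, greedy_assemble and min_overlap are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up fragments cut from the toy sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
print(greedy_assemble(reads))
```

Every round compares all pairs of fragments, so the cost grows quickly with the number of reads, which gives some intuition for why assembling large genomes is computationally demanding.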
The initial analyses carried out on any protein sequence are: composition analysis, molecular
weight search, isoelectric point calculation, peptide mapping by exposure to proteases or chemical agents,
hydrophobicity and hydrophilicity of the sequence, secondary structure determination, fold prediction,
transmembrane region prediction, coil structure prediction, signal peptide prediction, motif prediction,
nonglobular region prediction, tertiary structure prediction, etc.
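Two of the analyses listed above, composition analysis and hydrophobicity, are simple enough to sketch directly. The example below assumes the widely used Kyte-Doolittle hydropathy scale and a sliding window average; the window size, the function names and the example peptide are illustrative choices, not part of any standard tool.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Fraction of each residue type in the sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: count / total for residue, count in counts.items()}


def hydropathy_profile(sequence, window=5):
    """Mean Kyte-Doolittle hydropathy for each full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# Hypothetical peptide, used only to exercise the functions.
peptide = "MKTLLVAGIILDEK"
print(composition(peptide))
print(hydropathy_profile(peptide, window=5))
```

Stretches of consistently high window averages are the kind of signal that hydrophobicity-based transmembrane region prediction looks for, although real predictors use longer windows and additional rules.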
Once we obtain a new sequence or if we have