OtherPapers.com - Other Term Papers and Free Essays

Human Genome

Essay by   •  November 9, 2012  •  Research Paper  •  2,196 Words (9 Pages)  •  1,953 Views

The sequencing of the human genome was widely anticipated for its contribution to the understanding of human evolution, of the roles of environment and heredity in the human condition, and of the causes of disease. These factors can be investigated by decoding the DNA that constitutes the human genome. In 1985, a project was launched with the aim of determining the complete sequence of nucleotides in the human genome, and in the following years the idea met with mixed reactions within the scientific community (Sinsheimer 1989). Other efforts to study the human genome included the Human Genome Project (HGP). The sequence of the human genome can be determined by a whole-genome random shotgun procedure, in which sequenced segments are gathered and assembled.

The history of DNA sequencing began in 1977, when Sanger reported his method for determining the order of nucleotides in DNA by using chain-terminating nucleotide analogs (Cook-Deegan 1994). In the same year, a human gene was isolated and sequenced for the first time (Seeburg, Shine et al. 1977). Sequencing of regions of the human genome has shown that cDNA sequences, which are reverse transcribed from RNA, are important for explaining and validating gene predictions in the human genome. Such studies laid the foundation for the expressed sequence tag (EST) technique, which was used to identify genes (Adams, Kelley et al. 1991). This method is a random-selection, very high-throughput approach to the characterization of cDNA libraries (Adams, Dubnick et al. 1992). It led to a rapid increase in the number of human EST sequences, which in turn required new computer algorithms to analyse vast amounts of sequence data. One such algorithm was developed at The Institute for Genomic Research in 1993 and allowed human genes to be characterized and annotated on the basis of 30,000 EST assemblies (Adams, Kerlavage et al. 1993).

An important part of the whole-genome shotgun sequencing process was the preparation of high-quality plasmid libraries in several insert sizes. This was done so that pairs of sequence reads (mate pairs) could be obtained, with one read from each end of every plasmid insert. High-quality libraries have an equal representation of all parts of the genome, a very small number of clones without inserts, and no contamination from sources such as the mitochondrial genome and E. coli genomic DNA. In one study, DNA from each donor was used to construct plasmid libraries in one or more of three sizes (2 kbp, 10 kbp, and 50 kbp) (Polymeropoulos, Xiao et al. 1993). In another study, the DNA sequencing process was designed around a simple system that could be applied in a robust and reproducible manner and monitored effectively (Figure 1).

Human genome sequencing began on 8 September 1999 and finished on 17 June 2000. Two different assembly approaches were used to assemble the 3 billion base pairs (bp) that make up the 23 pairs of chromosomes in the Homo sapiens genome. In addition, data derived from GenBank (an open-access sequence database) were shredded to prevent chimeric clones, misassembled contigs, or foreign DNA contamination from biasing the final sequence.
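The shredding step can be pictured with a short Python sketch. This is only an illustration of cutting an assembled sequence into fixed-length, overlapping fragments so that errors in the original assembly are not carried over; the function name shred and the fragment_length and step values are assumptions made for this example and are not taken from the studies cited here.

```python
def shred(sequence, fragment_length=500, step=250):
    """Cut a sequence into fixed-length, overlapping fragments.

    fragment_length and step are illustrative assumptions only.
    """
    if fragment_length <= 0 or step <= 0:
        raise ValueError("fragment_length and step must be positive")
    if not sequence:
        return []
    # A sequence shorter than one fragment is returned unchanged.
    if len(sequence) <= fragment_length:
        return [sequence]

    fragments = []
    start = 0
    while start + fragment_length < len(sequence):
        fragments.append(sequence[start:start + fragment_length])
        start += step
    # Anchor the last fragment to the end so that no bases are dropped.
    fragments.append(sequence[-fragment_length:])
    return fragments


if __name__ == "__main__":
    for fragment in shred("ACGT" * 5, fragment_length=8, step=4):
        print(fragment)
```

Because each fragment overlaps its neighbours, an assembler can in principle rejoin the shredded pieces, but it is free to place them differently if the sequence reads disagree with the original assembly.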

One method of assembling the genome combines all sequence reads with the shredded data from GenBank so as to create an independent and unbiased view of the genome. The other approach first groups (clusters) all of the fragments to a chromosome or region on the basis of mapping information. Both methods provide essentially the same reconstruction of the assembled DNA sequence, with the second giving slightly greater sequence coverage, in other words fewer gaps. It was therefore the principal sequence used in the analysis stage (White, Dunning et al. 1993).

An example of this is the anatomy of the whole-genome assembly (Figure 2). Internally derived reads from five different individuals (black lines) and overlapping shredded bactig fragments (red lines) are combined to produce a contig and a consensus sequence (green lines). Using mate pair information, the contigs are connected into scaffolds, which are then mapped to the genome (grey line) with STS (blue star) physical map information.

Figure 2. Anatomy of the whole-genome assembly.
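The way overlapping reads are combined into a contig can be shown with a toy example. The Python sketch below uses a simple greedy strategy that repeatedly merges the two reads with the longest exact suffix-to-prefix overlap. It is not the assembler used in the studies cited above, which also had to handle sequencing errors, repeats, and mate pair constraints; the function names, the example reads, and the min_length threshold are all assumptions for illustration.

```python
def overlap(a, b, min_length=3):
    """Return the length of the longest suffix of a that is a prefix of b."""
    max_length = min(len(a), len(b))
    for length in range(max_length, min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads greedily by longest exact overlap; return the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_length, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i == j:
                    continue
                length = overlap(a, b, min_length)
                if length > best_length:
                    best_length, best_i, best_j = length, i, j
        if best_length == 0:
            # No remaining pair overlaps: what is left are the contigs.
            return reads
        merged = reads[best_i] + reads[best_j][best_length:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical reads
    print(greedy_assemble(toy_reads))
```

In this toy case the three reads collapse into a single contig. Reads that share no sufficiently long overlap would remain as separate contigs, which is where mate pair information becomes necessary to order them into scaffolds.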

Figure 1. Flow diagram representing sequencing pipeline

It has been estimated that the average span of a 'typical' gene in human DNA is 27,894 bases. The most obvious and visible element of genome structure is the banding pattern produced by Giemsa stain. Studies of chromosomes have revealed that approximately 17%-20% of the human chromosome complement consists of constitutive heterochromatin, or C-bands (Miklos and John 1979). Most of this heterochromatin is extremely polymorphic and consists of different families of α-satellite DNA with different higher-order structures (White, Dunning et al. 1993). The pericentromeric regions of several chromosomes contain complex inter- and intra-chromosomal duplications (Horvath, Schwartz et al. 2000).

One of the central goals of biology and medicine is to understand the relationship between genotype and phenotype. The reference human genome sequence offers a basis for the study of human genetics; however, a systematic investigation of human variation requires full knowledge of DNA sequence variation across the whole spectrum of allele frequencies and types of DNA differences.

By 2008, approximately 11 million single nucleotide polymorphisms (SNPs) and 3 million short insertions and deletions (indels) were listed in the public catalogue of variant sites (dbSNP-129). In addition, the locations of large genomic variants were indexed by structural variant (SV) databases such as dbVar. Both allele frequencies and the correlation patterns between nearby variants, known as linkage disequilibrium (LD), were catalogued by the International HapMap Project (Frazer, Ballinger et al. 2007).
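To make allele frequency and linkage disequilibrium concrete, the Python sketch below computes two standard LD measures for a pair of biallelic sites from haplotype counts: D, the difference between the observed frequency of the AB haplotype and the frequency expected if the sites were independent, and r², the squared correlation between the two sites. The function name and the counts in the example are hypothetical and are not HapMap data.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (p_A, p_B, D, r_squared) from counts of the four haplotypes.

    A/a and B/b are the two alleles at the first and second site.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype must be observed")

    p_AB = n_AB / total            # observed frequency of haplotype AB
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at site 2

    # D is the departure from independence between the two sites.
    d = p_AB - p_A * p_B

    # r^2 is undefined when either site carries only one allele.
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return p_A, p_B, d, r_squared


if __name__ == "__main__":
    # Hypothetical counts for 100 sampled chromosomes.
    print(linkage_disequilibrium(40, 10, 10, 40))
```

An r² close to 1 means that the allele at one site almost fully predicts the allele at the other, which is why a catalogue of LD patterns allows a subset of variants to stand in for their neighbours in association studies.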

Association studies over the last five years have identified over a thousand genomic regions associated with susceptibility to disease and with other common traits. Genome-wide collections of both common and rare SVs have also been tested for association with disease (Craddock, Hurles et al. 2010).

Even though much success has been made throughout these studies, a considerable amount of work is still required in order to gain a deep understanding of the genetic

...

...
