dna

Automatically generated description.

Human genome

Human genome consists of 23 pairs of chromosomes.

Each chromosome pair, except of the 23d is identified by its number (1-22). The 23d pair is named as XX or XY, and identifies the gender.

Each chromosome is a DNA — a double helix shaped molecule, consisting of a long sequence of base pairs.

A base is a “letter” of DNA. There are four possible bases in the DNA: ACGT (Adenine, Cytosine, Guanine and Thymine).

There are different number of basepairs in each of the chromosome (more than 6 billion total).

ChromosomeNumber of pairs
1249250621
2243199373
3198022430
4191154276
5180915260
6171115067
7159138663
8146364022
9141213431
10135534747
11135006516
12133851895
13115169878
14107349540
15102531392
1690354753
1781195210
1878077248
1959128983
2063025520
2148129895
2251304566
X155270560
Y59373566

Some parts of base pairs sequence make up a gene — a logical unit that scientists relate to a specific function. There are roughly 20000 genes in the human genome. Each gene is located in its dedicated [locus](http://en.wikipedia.org/wiki/Locus_(genetics) — its “coordinates” in the genome.

The genes is a blueprint of how to produce a specific protein. Slight variations of the gene can lead to a different protein being produced. Such variations are called alleles.

The gene is divided into codons, and a protein consists of amino acids. (Almost) each codon corresponds to an amino acid. A sequence of amino acids define a specific protein.

A codon consists of 3 bases. Given 4 states for each letter, there are 4^3 = 64 codons. But the same amino acid can be encoded by different codons. In addition, some codons don’t encode any acid, they have a special meaning. For a corresponding between a codon and aminoacid, see DNA codon table. There are 22 standard proteino-genic amino acids, 20 are encoded by genes, but only 11 of them can be produced by humans. Other 9 are essential, i.e. should be supplied with food.

SNP: genotyping vs sequencing

To determine the full sequence of base pairs that defines the genome, it must be sequenced. The Human genome project was dedicated to sequence the human genome and define a complete map of its 20000-25000 genes. It took 10 years and huge amount of resources to sequence one human genome.

While the cost and time of complete genome sequencing has enormously decreased, it is still quite expensive. It is much cheaper to sequence only the parts of DNA that vary for each individual. Because 99.5% of the DNA is exactly the same for all human beings. The process of determining a “diff” between individual and the “common” DNA is called genotyping.

The genetic variation come from base insertions, deletions and substitutions. The substitution variation is called SNP. It defines 90% of all human genome variation. It is estimated that there are 10-30 millions SNPs in human genome.

It has become affordable to make SNP genotyping of your DNA for personal use. The 23andme service provides up to 1 million SNPs for just $99 USD.