  Frequently Asked Questions: Genome Browser Tracks
  Database/browser start coordinates differ by 1 base

"I am confused about the start coordinates for items in the RefSeq table. It looks like you need to add "1" to the starting point in order to get the same start coordinate as is shown by the Genome Browser. Why is this the case?"

Our internal database representations of coordinates always have a zero-based start and a one-based end. We add 1 to the start before displaying coordinates in the Genome Browser. Therefore, they appear as one-based start, one-based end in the graphical display. The refGene.txt file is a database file, and consequently is based on the internal representation.

We use this particular internal representation because it simplifies coordinate arithmetic, i.e. it eliminates the need to add or subtract 1 at every step. Unfortunately, it does create some confusion when the internal representation is exposed or when we forget to add 1 before displaying a start coordinate. However, it saves us from much trickier bugs.
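A small sketch may help show why the zero-based start, one-based end convention needs fewer corrections. The function names below are illustrative only and are not part of any Genome Browser code; the coordinates are made up.

```python
# Illustrative helpers, not Genome Browser code.

def length_internal(start, end):
    """Length of an item stored with a zero-based start and one-based end."""
    return end - start              # no correction term needed


def length_one_based(start, end):
    """Length of the same item when both coordinates are one-based."""
    return end - start + 1          # the +1 is easy to forget


def split_internal(start, end, point):
    """Split an internally stored item at 'point'; the halves share the boundary."""
    return (start, point), (point, end)


# The first 100 bases of a sequence in both conventions.
assert length_internal(0, 100) == 100
assert length_one_based(1, 100) == 100

# Splitting needs no adjustment: the end of the left half is the start of the right half.
left, right = split_internal(0, 100, 40)
assert length_internal(*left) + length_internal(*right) == 100
```

In the internal form, two adjacent items satisfy "end of the first equals start of the second", so lengths, splits, and adjacency checks can all be written without adding or subtracting 1.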

In summary, if you use a database dump file but would prefer to see the one-based start coordinates, you will always need to add 1 to each start coordinate.
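As a minimal sketch of that conversion, the helpers below turn an internal (database) start and end into the one-based pair shown in the display, and back again. The function names and the example coordinates are assumptions made for illustration.

```python
# Illustrative conversion helpers; names and example values are hypothetical.

def to_display(chrom, start, end):
    """Format an internal (zero-based start, one-based end) item as a
    one-based position string: only the start changes."""
    return f"{chrom}:{start + 1}-{end}"


def to_internal(display_start, display_end):
    """Convert a one-based start/end pair back to the internal form."""
    return display_start - 1, display_end


# A made-up item whose database start is 11873 and end is 14409.
position = to_display("chr1", 11873, 14409)
print(position)
```

Note that the end coordinate is the same in both forms; only the start shifts by 1.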

  Gene name / annotation search results

"Sometimes when I type in the name of a gene -- e.g. DAO (D-amino acid oxidase) -- the Genome Browser returns a list that includes the gene entry on the assembly, but also contains links to several other genes. What is the relationship between my gene of interest and these results?"

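The gene search results are obtained by scanning the name and description fields from GenBank RefSeq, TIGR, RNA, and most other available gene tracks. The additional results are therefore items whose name or description matches your search term; they are not necessarily biologically related to your gene of interest.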
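The effect of scanning both name and description fields can be sketched as follows. This assumes a simple case-insensitive substring match, which may not be how the Genome Browser actually matches terms; the records and the function name are made up for illustration.

```python
# Hypothetical sketch: the matching rule and the records are assumptions.

def search_genes(term, records):
    """Return names of records whose name or description contains 'term'."""
    needle = term.lower()
    return [
        rec["name"]
        for rec in records
        if needle in rec["name"].lower() or needle in rec["description"].lower()
    ]


# Made-up records standing in for rows drawn from several gene tracks.
RECORDS = [
    {"name": "DAO", "description": "D-amino acid oxidase"},
    {"name": "DAOA", "description": "D-amino acid oxidase activator"},
    {"name": "GENE_X", "description": "hypothetical protein similar to DAO"},
    {"name": "GENE_Y", "description": "unrelated kinase"},
]

hits = search_genes("DAO", RECORDS)
print(hits)
```

Under this assumption, a search for DAO returns the DAO entry itself plus any record that merely mentions the term in its name or description.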

  Protein doesn't begin with methionine

"I am looking at a protein that the Genome Browser associates with a particular gene. According to the Genome Browser, its amino acid sequence doesn't start with M (methionine). I thought proteins always began with ATG (Met)?"


Many legitimate archaeal genes and some bacterial genes start with the alternative codons GTG or TTG, which are translated as methionine when used as start codons. However, automated translation tracks do not attempt to identify start codons, so these cases are translated into their elongation-mode amino acids: valine (GTG) or leucine (TTG).
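The difference can be illustrated with a short translation sketch. It uses the standard genetic code and an optional flag that treats a leading ATG, GTG, or TTG as methionine; the function, the flag name, and the input sequences are assumptions for illustration and do not reflect how any particular translation track is implemented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Start codons considered in this sketch.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate(dna, start_aware=False):
    """Translate 'dna' from its first base until a stop codon.

    With start_aware=True, a first codon in START_CODONS is read as M;
    otherwise every codon gets its elongation-mode amino acid.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        if i == 0 and start_aware and codon in START_CODONS:
            aa = "M"
        if aa == "*":          # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


# Made-up open reading frames beginning with alternative start codons.
print(translate("GTGAAATAA"), translate("GTGAAATAA", start_aware=True))
print(translate("TTGAAATAA"), translate("TTGAAATAA", start_aware=True))
```

Without start-codon awareness the first residue comes out as V or L, which is the behavior described above for automated translation tracks.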

Also, the UCSC Genome Browser uses translated ORFs exactly as supplied to GenBank by the original sequencing authors. Any errors at GenBank propagate through many other databases and tools. When working in a bioinformatic area that is subject to such errors, it is a good idea to seek supporting data for any unusual observation.