|
Question:
"I am confused about the start coordinates for items in the RefSeq table.
It looks like you need to add "1" to the starting point in order to get
the same start coordinate as is shown by the Genome Browser. Why is this the case?"
Response:
Our internal database representations of coordinates always have a
zero-based start and a one-based end. We add 1 to the start before
displaying coordinates in the Genome Browser. Therefore, they appear as
one-based start, one-based end in the graphical display. The refGene.txt file
is a database dump file, so its coordinates use the internal representation.
We use this particular internal representation because it simplifies
coordinate arithmetic: the length of an item is simply its end minus its
start, with no need to add or subtract 1 at every step.
Unfortunately, it does create some confusion when the internal
representation is exposed or when we forget to add 1 before
displaying a start coordinate. However, it saves us from much trickier
bugs.
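As a rough illustration of the arithmetic described above, the following Python sketch compares the two conventions. The function names and the sample coordinates are hypothetical and are not taken from our code or from any real table.

```python
def to_display(start, end):
    """Convert internal coordinates (zero-based start, one-based end)
    to the one-based start, one-based end form shown in the display."""
    return start + 1, end


def length_internal(start, end):
    """Length of an item in the internal representation: no +1 needed."""
    return end - start


def length_display(start, end):
    """Length of the same item given one-based start and one-based end."""
    return end - start + 1


# Hypothetical item, stored internally as start=999, end=2000.
internal_start, internal_end = 999, 2000
display_start, display_end = to_display(internal_start, internal_end)

print(display_start, display_end)
print(length_internal(internal_start, internal_end))
print(length_display(display_start, display_end))
```

Both length functions give the same answer for the same item; the internal form simply avoids the extra correction term.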
In summary, if you use a database dump file but would prefer to
see the one-based start coordinates, you will always need to add 1
to each start coordinate.
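A minimal Python sketch of that adjustment is shown below. It assumes a tab-separated line in which the single-value start fields sit at column indexes 4 and 6 and a comma-separated list of starts sits at index 9; these positions, the function name, and the sample line are illustrative assumptions, so check them against the table definition for the file you are actually using.

```python
def shift_starts(line, start_cols=(4, 6), start_list_cols=(9,)):
    """Return the line with 1 added to every start coordinate.

    start_cols: indexes of fields holding a single start value (assumed).
    start_list_cols: indexes of fields holding comma-separated start
    values, possibly with a trailing comma (assumed).
    End coordinates are left untouched because they are already one-based.
    """
    fields = line.rstrip("\n").split("\t")
    for i in start_cols:
        fields[i] = str(int(fields[i]) + 1)
    for i in start_list_cols:
        parts = fields[i].split(",")
        # Empty pieces come from a trailing comma and are kept as-is.
        fields[i] = ",".join(str(int(p) + 1) if p else p for p in parts)
    return "\t".join(fields)


# Made-up example line, not real data.
example = "0\tNM_EXAMPLE\tchr1\t+\t999\t2000\t1099\t1900\t2\t999,1499,\t1200,2000,"
print(shift_starts(example))
```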
|