Article first published as Want to Know What Really Makes Us Human? Better Know Your SQL on Technorati.
Genetic research is hot stuff. From Homo sapiens to Drosophila melanogaster, everyone wants to know what life is really made of.
All that research data has to go somewhere. Often it goes straight into a database, gets tucked away, and (in theory) is accessed only by men with lab coats and clipboards.
The truth is different: genetic researchers are apparently some of the original open-source pioneers. From the Online Mendelian Inheritance in Man (OMIM) database and Entrez’s numerous genome databases to the genome of Illumina Corporation’s CEO, this data is available for public use. The secret: you had better know your SQL to access it.
The OMIM database is a trove of data about genetic illnesses and vulnerabilities. Unlike Entrez or other, more complex databases, it can be searched by keyword, including diagnoses. OMIM pulls data from several other major genetic databases and compiles the results.
Thankfully, it also cites the underlying research, in case you want to track down, for example, the location of the genetic vulnerability for schizophrenia (note that there are several possible culprits).
Entrez is technically the ‘life sciences search engine’ (read: database collection) of the National Center for Biotechnology Information (NCBI). Entrez hosts open-access databases for all things genetic. Some of these are accessible by keyword search, others through various forms of modified SQL. Be warned: users need SQL experience that goes past the surface level. Genetic research is not easily understood, but thankfully basic users can still glean some information.
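For a sense of what "past the surface level" means, here is a minimal practice sketch using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the `gene` and `association` tables, their columns, and the placeholder rows (GENE_A, disorder X, and so on) are hypothetical and are not the schema or content of OMIM, Entrez, or Illumina's data. It only shows the kind of join and grouped query you would want to be comfortable with.

```python
import sqlite3


def build_practice_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical gene/disorder schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id    INTEGER PRIMARY KEY,
            symbol     TEXT NOT NULL,
            chromosome TEXT NOT NULL
        );
        CREATE TABLE association (
            gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
            disorder TEXT NOT NULL
        );
        """
    )
    # Placeholder rows only: these are not real genes or real findings.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [(1, "GENE_A", "1"), (2, "GENE_B", "6"), (3, "GENE_C", "22")],
    )
    conn.executemany(
        "INSERT INTO association VALUES (?, ?)",
        [(1, "disorder X"), (2, "disorder X"), (2, "disorder Y"), (3, "disorder X")],
    )
    conn.commit()
    return conn


def candidate_genes(conn: sqlite3.Connection, disorder: str) -> list:
    """Join the two tables to list (symbol, chromosome) pairs for one disorder."""
    query = """
        SELECT g.symbol, g.chromosome
        FROM gene AS g
        JOIN association AS a ON a.gene_id = g.gene_id
        WHERE a.disorder = ?
        ORDER BY g.symbol
    """
    return conn.execute(query, (disorder,)).fetchall()


def disorders_with_multiple_candidates(conn: sqlite3.Connection, min_genes: int = 2) -> list:
    """Group by disorder and keep those linked to at least min_genes genes."""
    query = """
        SELECT a.disorder, COUNT(DISTINCT a.gene_id) AS n_genes
        FROM association AS a
        GROUP BY a.disorder
        HAVING COUNT(DISTINCT a.gene_id) >= ?
        ORDER BY a.disorder
    """
    return conn.execute(query, (min_genes,)).fetchall()


if __name__ == "__main__":
    db = build_practice_db()
    print(candidate_genes(db, "disorder X"))
    print(disorders_with_multiple_candidates(db))
```

The second query mirrors the "several possible culprits" situation: one diagnosis, more than one candidate location.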
Lastly, the not-so-open-source data: Illumina’s human genomes (two data sets). These are hosted on Amazon Web Services, and there is a nominal fee for using them. Amazon charges per GB of data transfer, as Illumina’s information is in the cloud.
Illumina does not allow mucking around with keywords: it is SQL or nothing with this data. Interestingly, one entire genome belongs to Illumina’s CEO, Jay Flatley. Yes, Illumina’s CEO lets any researcher with $2 play with his evolutionary code. It takes a dedicated man to publish his entire genetic makeup.
Many of OMIM’s human genome databases are available for download via FTP. The OMIM databases can also be mapped to users’ databases via XML, which is vital for a smooth transfer. Entrez has a whole utility suite for making remote queries and downloading results, but setting all this up requires some finesse.
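As a rough illustration of what a remote Entrez query involves, the sketch below uses only the Python standard library to build a search URL for NCBI's E-utilities `esearch` endpoint and to pull record IDs out of the XML it returns. The helper names (`esearch_url`, `parse_esearch_ids`) are made up for this example, and the sample response is a hand-written stand-in with placeholder IDs, not real output. The sketch deliberately makes no network call; check NCBI's current usage guidelines before sending real requests.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for an Entrez database and a keyword term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(xml_text: str) -> list:
    """Extract the record IDs from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_element.text for id_element in root.findall("./IdList/Id")]


# Hand-written stand-in for a response; the IDs are placeholders, not real records.
SAMPLE_RESPONSE = """
<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>100001</Id>
    <Id>100002</Id>
  </IdList>
</eSearchResult>
"""

if __name__ == "__main__":
    print(esearch_url("omim", "schizophrenia"))
    print(parse_esearch_ids(SAMPLE_RESPONSE))
```

In practice you would fetch the URL (for example with `urllib.request.urlopen`), pass the response text to the parser, and then load the resulting IDs into your own database for follow-up queries.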
Theoretically, the Illumina data could be downloaded from Amazon and mapped, and the cost would be fairly minor. This makes Mr. Flatley technically immortal, because his genetic code is now open-source for eternity.
There is really nothing preventing access to genetic data, even that of the H1N1 influenza virus. The crux is not the data, but knowing how to use it. Genetic scientists still have this domain locked tight. Still, if you want to research genetic illnesses, practice your SQL with some novel resources, or download genomic data, it is absolutely possible. In twenty years, certain diagnoses may routinely require genetic tests.
In that future, Entrez, OMIM, and even Illumina may slide into the mainstream Internet search collective. Until then, if you want to access the human genome, you had better know your SQL.