Banner created in canvas
I have wanted to write about this topic for a while. I have worked in bioinformatics since 2007, and every time I tell people, more than 90% of them get confused. People know what a biologist is and what a computer programmer is, but the term bioinformatics usually confuses them a bit. Some people here on Hive try to explain this field; the most recent one, by another user, you can find here. This post doesn't try to explain the basics of bioinformatics. Instead, it covers some products that anyone can use, if they are willing to pay.
Since I started, I have lost count of the human samples I have analyzed. All of them were anonymized, since they came from patients. The data usually came from DNA or RNA sequencing, and I looked for mutations that could be specific to the disease: known and novel mutations (from DNA sequencing) and aberrant RNA expression (from RNA sequencing). Why is that needed? To improve diagnosis and prognosis (the outcome for the patient, such as the response to a treatment) and to find new therapeutic targets. DNA or RNA sequencing is still an expensive experiment, even though the price per sample has dropped a lot in the last 20 years. Still, I was always curious to get the data from my own genetic code!
I don't know much about the history of all the available products, but I guess Ancestry was the first, or at least the longest-lasting, company to sell genetic analysis to anyone who wanted it. The company began in the late 90's, before the first draft of the human genome, although its early business was genealogy records, with DNA tests coming later. Their DNA analysis focuses on identifying single nucleotide polymorphisms (SNPs) associated with different human populations, so in the end you get a result such as 90% Caucasian and 10% Yoruba. What is a SNP?
Figure made by me for an old slide
Among the roughly 3.2 billion nucleotides of the human genome (3.2 billion letters like those in the picture above), there are variations that aren't pathogenic but are more common in some populations of the world than in others. The figure above shows two nucleotide sequences from different people: at one position, one person has a T (thymine) and the other a G (guanine). Identifying these SNPs in a genome shows what percentage of them is characteristic of population X, Y, or Z. Do we need to sequence the whole genome to identify them? No. Ancestry uses a technology called SNP-array. Instead of sequencing every nucleotide of a person's genome, a SNP-array carries probes for known SNPs, and the machine reports which of these known SNPs the probes captured. Some SNPs are already associated with traits and diseases, but Ancestry's extra report covers only traits. A SNP-array doesn't "look" at each of your 3.2 billion nucleotides; it reads only a preselected set of positions, well under 1% of the human genome, so it isn't suited for discovering new variants. Having your code analyzed by Ancestry costs around 100 USD.
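To make the SNP idea concrete, here is a toy Python sketch (my own illustration, not any company's pipeline). It compares two already-aligned sequences and reports the positions where they differ, then mimics the SNP-array idea by reading only a preselected list of positions instead of the whole sequence. The sequences and positions are made up for the example.

```python
from typing import Dict, Iterable, List, Tuple


def find_snps(seq_a: str, seq_b: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, base in A, base in B) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this toy comparison assumes sequences aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]


def probe_known_snps(seq: str, known_positions: Iterable[int]) -> Dict[int, str]:
    """SNP-array analogy: report the base only at preselected (1-based) positions."""
    return {pos: seq[pos - 1].upper() for pos in known_positions}


if __name__ == "__main__":
    person_1 = "ATGCTAGCTA"  # hypothetical sequence
    person_2 = "ATGCGAGCTA"  # same region in a second, hypothetical person
    print(find_snps(person_1, person_2))    # [(5, 'T', 'G')]
    print(probe_known_snps(person_2, [5]))  # {5: 'G'}
```

In reality the array never reads the surrounding sequence at all; it only tests whether the known alleles at each probed position are present, which is why a variant at a position without a probe goes unseen.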
The company 23andMe started a little after Ancestry and also uses SNP-array technology. It started as Ancestry's main competitor, but now it offers disease reports on top of traits (a small plus compared to Ancestry). Don't expect a huge list of diseases in the report, but there are some interesting ones, such as diabetes, Alzheimer's disease, variants in BRCA1/BRCA2 (associated with some cancers, mostly breast cancer), and a few others. They often run sales on Amazon (right now the Ancestry + Traits + Health package sells for 125 CAD, less than 100 USD). I took this test two years ago in a Black Friday deal. The ancestry result didn't surprise me, since I already knew my roots and there isn't much mixture in my ancestry. The trait results are fun to look at, but they aren't essential to our lives; here are some examples:
Part of my report generated by 23andMe
In my health report, some results were interesting, like a SNP associated with celiac disease. I don't have the disease (at least not yet), but my mom does. However, I have been diagnosed with another autoimmune disease, which leaves an open question: the genetics of autoimmune diseases are very complex, since many epigenetic events also influence them. Another result is that I am a carrier of a SNP associated with hyperinsulinism, which could cause trouble for my kids if my wife is also a carrier. Apart from that, there were no big concerns. The online report is very interactive and easy to navigate; my only advice is not to worry if you find a SNP associated with a disease, since that doesn't mean you will develop it. Another cool thing is that we can download all the processed data, a raw data file with hundreds of thousands of lines listing the result for every tested variant. So we can process it ourselves, if we are bioinformaticians, or upload it to another site (there are several) that looks for known variants linked to traits and diseases that 23andMe doesn't cover.
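As an example of "processing it by ourselves", here is a minimal Python sketch that reads a downloaded genotype file and checks a few SNPs of interest. It assumes a simple tab-separated layout with rsid, chromosome, position and genotype columns and '#' comment lines, which is how consumer SNP-array downloads are commonly laid out; check the header of your own file before relying on it. The rsIDs and alleles in the usage part are hypothetical placeholders, not real disease markers.

```python
from typing import Dict, Iterable, List, NamedTuple


class Genotype(NamedTuple):
    chromosome: str
    position: int
    genotype: str  # usually two letters, e.g. "AG"; "--" is assumed to mean no call


def parse_raw_genotypes(lines: Iterable[str]) -> Dict[str, Genotype]:
    """Parse assumed rsid<TAB>chromosome<TAB>position<TAB>genotype lines into a dict keyed by rsid."""
    calls: Dict[str, Genotype] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        rsid, chrom, pos, gt = line.split("\t")
        calls[rsid] = Genotype(chrom, int(pos), gt)
    return calls


def summarize_alleles(calls: Dict[str, Genotype], alleles_of_interest: Dict[str, str]) -> List[str]:
    """For each rsid, count how many copies (0, 1 or 2) of the given allele were called."""
    lines: List[str] = []
    for rsid, allele in alleles_of_interest.items():
        call = calls.get(rsid)
        if call is None or call.genotype == "--":
            lines.append(f"{rsid}: not genotyped")
            continue
        copies = call.genotype.count(allele)
        unit = "copy" if copies == 1 else "copies"
        lines.append(f"{rsid}: {call.genotype} ({copies} {unit} of {allele})")
    return lines


if __name__ == "__main__":
    raw = (
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs0000001\t1\t12345\tAG\n"   # hypothetical SNPs, for illustration only
        "rs0000002\t6\t67890\tCC\n"
        "rs0000003\t11\t24680\t--\n"
    )
    calls = parse_raw_genotypes(raw.splitlines())
    wanted = {"rs0000001": "G", "rs0000002": "T", "rs0000003": "A", "rs0000004": "C"}
    for line in summarize_alleles(calls, wanted):
        print(line)
```

This sketch ignores DNA strand orientation and reference-genome build; real interpretation tools have to match both against the database that defines which allele is associated with a trait or disease.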
There are other companies that provide whole-genome sequencing results; the two main ones I have heard of are Nebula and Dante Labs. Nebula has three sequencing products. The cheapest costs 99 USD but gives only 0.4x coverage (coverage is the average number of times each position of the genome is read), which is enough for some predisposition reports and ancestry. An intermediate 299 USD test gives a more complete result with 30x genome coverage, so there is more data to dig into for diseases, and a very expensive 1,000 USD test offers 100x coverage. Dante Labs currently has only one test, with 30x coverage, which costs around 500 EUR. For a typical person, I don't think whole-genome sequencing gives much more than a SNP-array company, since most people only read the reports, and the reports are mostly based on the same known SNPs. But the cool thing about these companies is that they provide the raw data, so if you are a bioinformatician you can play with it. Maybe I will wait for the next Black Friday to get one of them! I want to compare the results with my 23andMe results. I expect some differences, since SNP-arrays have their own technical biases. Maybe one day I can tell you whether I found any concerning differences between the tests!
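The coverage numbers can be unpacked with a little arithmetic. This Python sketch uses the classic Lander-Waterman idealization: the mean depth is c = N x L / G (N reads of length L over a genome of size G), and the depth at a given base roughly follows a Poisson distribution, so the expected fraction of bases read at least once is 1 - e^(-c). The 150-base read length is my assumption (a common short-read length), and real data deviates from this ideal because of duplicate reads, GC bias and regions that are hard to map.

```python
import math

GENOME_SIZE = 3_200_000_000  # ~3.2 billion bases, the figure used in this post
READ_LENGTH = 150            # assumed short-read length in bases


def reads_needed(coverage: float, read_length: int = READ_LENGTH,
                 genome_size: int = GENOME_SIZE) -> int:
    """Number of reads N so that N * L / G reaches the requested mean depth."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_covered(coverage: float, min_depth: int = 1) -> float:
    """Expected fraction of bases read at least min_depth times under a Poisson model."""
    # P(depth >= k) = 1 - sum_{i=0}^{k-1} e^{-c} c^i / i!
    p_below = sum(math.exp(-coverage) * coverage ** i / math.factorial(i)
                  for i in range(min_depth))
    return 1.0 - p_below


if __name__ == "__main__":
    for c in (0.4, 30, 100):
        print(f"{c:>5}x: {reads_needed(c):,} reads of {READ_LENGTH} bp, "
              f"{fraction_covered(c):.1%} of bases read at least once, "
              f"{fraction_covered(c, 10):.1%} read at least 10 times")
```

Under this model, 30x needs about 640 million reads of 150 bases, while 0.4x reads only about a third of the genome even once (1 - e^(-0.4) is roughly 0.33), which fits why the cheapest test is limited to some predisposition reports and ancestry.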
Please leave any questions below; there is a lot of information here!