And half the DNA of all of the siblings and parents of the people that submitted their DNA, a quarter of their grandparents and grandchildren and so on. That's what I really hate about these companies, they get people to submit their DNA and the customers do not realize it isn't a decision that affects just them.
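For anyone who wants the arithmetic behind "half" and "a quarter": a minimal sketch, assuming the usual expected-value rule that each meiosis on a path to a common ancestor halves the shared fraction, and that independent paths add up. The function name and the table of relatives are made up for illustration, and real sharing varies around these averages.

```python
from fractions import Fraction

def expected_shared_fraction(meioses: int, paths: int = 1) -> Fraction:
    """Expected fraction of autosomal DNA shared with a relative.

    meioses: number of parent-to-child steps along one connecting path.
    paths:   number of independent connecting paths (e.g. 2 for full
             siblings, one through each parent).
    """
    return paths * Fraction(1, 2) ** meioses

# (meioses per path, number of paths) -- illustrative table only
relatives = {
    "parent": (1, 1),
    "full sibling": (2, 2),
    "grandparent": (2, 1),
    "grandchild": (2, 1),
    "aunt/uncle": (3, 2),
    "first cousin": (4, 2),
}

for name, (meioses, paths) in relatives.items():
    print(f"{name}: {expected_shared_fraction(meioses, paths)}")
```

These are averages over many possible inheritances, not what any particular pair of relatives actually shares.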
In terms of medical data the amount of leakage is somewhat limited by the nature of DNA, e.g. because you get a random mix you can't conclude anything about the medical status of parents, etc.
By far the biggest practical knock-on effect is if you match someone who doesn't know their parentage (adoption/illegitimate children/etc) and who can figure out their parentage as a result of that match.
Familial DNA crime searches are probably the next biggest, but they're still very rare at the moment and many of the DNA platforms don't allow them (GEDMatch was one of the few that do).
> I’m assuming that the stolen information doesn’t have this limitation
Stolen information has provenance problems that make it difficult to use as evidence of any crime other than theft itself in any system with even rudimentary due process protections and presumption of innocence.
I mean, it's hardly as if you are going to be able to get the people who handled the data between the people who had it lawfully and the time it got to the police on the stand to attest to its integrity.
(That doesn't prevent its use in investigations, but it means that it would only lead to convictions in a contested case where the police used it to locate proof that was legally sufficient without the use of the DNA as evidence.)
Most law enforcement typically use a handful of commercial agencies (like Parabon NanoLabs, etc) for these kinds of searches; it seems highly unlikely that any of them would risk using illegally obtained data because it would put their entire business at risk.
(obviously if your threat model includes intelligence agencies, etc. then your calculus might be different)
It's not about law enforcement using the data, it's about the viability of running a business that provides illegal hacking services for law enforcement.
I’m cackling at the idea of GDPR compliant data thieves. “This data is only to be used for the purposes of: anything. The data controller is: whoever.”
> You know what parent a male's X and Y came from.
You can identify which parent any chromosome came from. They're all marked, and the same genetics may do sharply different things depending on whether it was inherited from the father or the mother.
Inability to recover this data has nothing to do with "the nature of DNA" -- the data is very much present in the DNA. It's unrecoverable because when we summarize DNA, we leave it out.
> You can identify which parent any chromosome came from. They're all marked, and the same genetics may do sharply different things depending on whether it was inherited from the father or the mother.
I did not know this. This sounds interesting! Can you provide any google search terms (or a link) where I can read more about this? (e.g. a name of what they are marked with)
This surprises me. I thought that there was a process by which portions of the two copies of a chromosome get switched between the two. Is that right? How does that fit together with these markings?
(If these questions would be answered by searching for whatever search term or reading whatever link you provide, I would consider providing said search term or link to be answering these questions)
> Can you provide any google search terms (or a link) where I can read more about this? (e.g. a name of what they are marked with)
The term I know related to this is "methylation". https://en.wikipedia.org/wiki/DNA_methylation . I don't know all that much about it; I would not want to claim that methylation is the only such mechanism, or that this is the only information expressed by DNA methylation.
> I thought that there was a process by which portions of the two copies of a chromosome get switched between the two. Is that right? How does that fit together with these markings?
Yes, that's correct. "Crossing over" does not occur during ordinary cell division ("mitosis"), in which one of your cells divides into two of your cells -- your chromosomes should stay the same (except for new mutations) through your life.
But it does occur during meiosis, the process by which one of your cells divides into four sperm or four eggs (these are "gametes", and in terms of chromosome content they are only half-cells, not full cells). Your children's chromosomes may therefore differ from yours.
So the interaction between parental marking and crossing over would broadly look like:
1. You are going to produce four gametes.
2. Remove the parental marking (indicating the sex of the gamete's grandparent) from the cell undergoing meiosis.
3. Do the crossing over.
4. Apply parental marking indicating your own sex (the gamete's parent, rather than grandparent).
5. Divide into four cells.
I don't actually know where the unmarking and remarking occur in the process; maybe reality is more like 2435, or 3254. But both crossing over and applying correct parental marking are part of meiosis -- since meiosis produces a cell that belongs to your child rather than a cell that belongs to you, it's easy to know what kind of marking should be applied.
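A toy sketch of the five steps above for a single chromosome pair, keeping the same hedge: the ordering follows the list as written, not a claim about when unmarking and remarking really happen. The function name, the list-of-letters representation, and the single crossover point are all simplifications made up for illustration. The one thing added beyond the list is that each copy is duplicated first, which is where the four cells come from.

```python
import random

def make_gametes(maternal, paternal, own_sex, rng):
    """Toy model of meiosis for one chromosome pair.

    maternal, paternal: equal-length lists of alleles (the two copies you
                        inherited); their old parental marks are not kept.
    own_sex:            "M" or "F", the mark your gametes will carry.
    Returns four (alleles, mark) tuples, one per gamete.
    """
    # Step 2: old marks are dropped. Only bare allele lists go forward.
    # Replication: each inherited copy is duplicated, giving four chromatids.
    chromatids = [list(maternal), list(maternal),
                  list(paternal), list(paternal)]

    # Step 3: crossing over between one maternal and one paternal chromatid
    # at a single random point (real meiosis can have several).
    point = rng.randrange(1, len(maternal))
    a, b = chromatids[1], chromatids[2]
    chromatids[1] = a[:point] + b[point:]
    chromatids[2] = b[:point] + a[point:]

    # Steps 4 and 5: apply your own mark and split into four gametes.
    return [(alleles, own_sex) for alleles in chromatids]

gametes = make_gametes(list("AAAA"), list("bbbb"), "F", random.Random(0))
for alleles, mark in gametes:
    print("".join(alleles), mark)
```

Whatever the mix of grandparental segments in each gamete, every one of them ends up carrying the same mark, which is the point being made above.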
Yes, but there's generally not much medical data you can infer from those.
You're right that you could reconstruct parental haplotypes, but that reveals a fairly limited amount of data, typically you'll share haplotypes with many millions of people.
What would insurance companies do with the data though? If they knew you were predisposed to obesity and cancer due to this data, would they be kind enough to ignore that info?
If we’re hypothetically considering what they could do if they weren’t one of the most regulated industries, they have exponentially better options for limiting their risk than requesting DNA.
It’s not only medical data that’s of concern; nation states could also try to use the data to identify embedded foreign agents/spies implanted in their country. Those are the ones without diplomatic cover.
Ignorant question here. How is this not regulated through HIPAA? Shouldn't these board members of this company face prison? DNA, a prosecutor could argue is a unique health identifier.
"Access to equipment containing health information should be carefully controlled and monitored."
People think of HIPAA as a generic cover-all medical privacy law for some reason.
It's not, not even close. It's a law that very narrowly applies mainly to insurance companies and healthcare entities that accept medical insurance.
As a general rule - if insurance is never involved HIPAA doesn't apply.
If you got a DNA test prescribed by your doctor for a diagnosis or even for genetic counseling then HIPAA applies. It's not the nature of the data, it's the nature of the organization dealing with the data.
I have no idea where this mass misunderstanding came from.
No, personally identifiable health information can be shared/exchanged without HIPAA applying. For example if I email my grandma information about my cancer diagnosis, Gmail isn't HIPAA compliant and doesn't need to be just because some people might use it to talk about their health. Grandma is also free to share my health information with impunity, she is free to, say, forward it to my boss because grandma doesn't have to abide by HIPAA either because she's a grandma.
The privacy rule only applies to covered entities. If a covered entity works with a cloud provider, they sign a BAA. The cloud provider is not a covered entity.
More accurately "DNA services for funsies" are not covered entities. Medical labs that sequence DNA in the realm of actual healthcare (and accept medical insurance) are covered entities.
"I'm standing here in this chalk circle where HIPAA does not apply, can't touch me, nyah nyah!" Sounds like that would work against a 5-year-old sibling, but that's rarely the case...
It's not covered. The ones at which we should be most angry are law enforcement officers using this information. This is simply a first step to the state collecting DNA on all citizens (see what it's done with fingerprints as an example.)
This is exactly how I feel about my friends/family having Facebook apps on their phone. I didn't consent to giving my contact info to Facebook. I wasn't given a choice.
Agreed. But at least I don't leak their data in return. I figure Facebook must have 99%+ coverage of the world's social graph by now, including all the holdouts. You may not have an account, but they know you exist, what you look like, what your phone number is and probably where you are just by observing the nodes that are still 'blank'. Shadow profiles should be assumed to be just as detailed as the rest. It's one reason why there are very few photographs of me online (or elsewhere). I'd like the option to go rogue one day to be open to me ;)
It is impossible to have an "expectation of privacy" over the DNA of your relatives. You need to live with it, and resolve your feelings to the reality of that situation.
You don't get a choice if your uncle, grandmother, aunt, niece or son share their DNA with law enforcement.
I'm not sure it's just feelings to be resolved. The problem is that even DNA matching, especially the SNP genotyping which tends to be used by consumer ancestry and heritage services, is not perfect. So your daughter can end up false-positively matching to a crime 20 years from now due to your uncle submitting his own DNA last year without you ever knowing. I'm not sure how anyone sufficiently familiar with the implications can just get over it and accept the reality of the situation. It is by no means an easy problem to solve.
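To put the false-positive worry in concrete terms, here is a back-of-the-envelope Bayes calculation. Every number in it is made up purely for illustration (not measured error rates for any real service or lab), and the function name is hypothetical. The only point is that when the database is huge and true matches are rare, even a small false-positive rate can dominate.

```python
def posterior_true_match(prior: float, sensitivity: float,
                         false_positive_rate: float) -> float:
    """P(person is the real source | database reports a match), via Bayes.

    prior:               chance a random person in the database is the source
    sensitivity:         chance a real source is reported as a match
    false_positive_rate: chance a non-source is reported as a match
    """
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Hypothetical inputs: one real source per million profiles, a 99% hit
# rate on the real source, and a 1-in-100,000 false-positive rate.
p = posterior_true_match(prior=1e-6, sensitivity=0.99,
                         false_positive_rate=1e-5)
print(f"Chance a reported match is the real source: {p:.1%}")
```

With those invented inputs, most reported matches would be false ones, which is why a match on its own is a weak basis for an accusation.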
That’s very true but it’s more of a criminal justice problem than a privacy problem. The same issue could happen (and has) with fingerprints or other biometric data.
Do they not realize? It's just served as a convenience for family trees and cure together services. Same mechanism as with any "social app" that asks for access to all your phone contacts for "easy friend discovery": you may have never used Tiktok or Facebook but they already know quite a few things about you thanks to an acquaintance. I know DNA stuff seems scarier in the long run but data from an address book is more easily exploitable.
With likely very dire results, yes I think you should. If your mother's insurance rate goes up, since you got one of these DNA tests for Christmas, she should be involved in the decision to publish this data in the first place.
And George W. Bush, a Republican, signed this into law. I remember thinking that strange at the time because I would have thought insurance companies would want to be able to use DNA information and the Republicans being more of a "big business" party would have supported that.
Also, I found out last time this discussion came up on HN that the law prevents it being used for regular insurance but does not apply to life insurance.
And life insurance could just simply demand a DNA sample from you before underwriting a policy just like they might demand a physical so the whole "concern" is entirely moot.
Insurance is highly regulated, and insurance companies have specific legal ways to underwrite policies. The idea that life insurance companies are going to secretly use stolen data of uncertain provenance in their underwriting instead of just making you submit a DNA sample is, quite frankly, silly.
Of course I realised the comparison would be somewhat controversial, it was actually the point of bringing it up. However, if you have the time I would appreciate it if you tried to articulate why you think the comparison is unfair, instead of just a general dismissal.
That's a fair point. However, I'm not entirely sure I buy the premise. With the advent of deepfakes and internet scraping facial recognition, I think a public photo collection of your entire likeness could be considered at least somewhat at risk for abuse, when compared to the risk that a confidential fingerprint with ~25% of your genes is leaked and then used against you.
Your data, your rules.
I put my 23andme raw data in github (https://github.com/sbassi/MiGenomaSbassi) for the world to see and use without asking anybody in my family.
I strongly disagree. I consider it comparable to something like financial administration. In which an "expense" or "exchange" has two sides. Me, paying you, you receiving the money.
It is not up to me to decide to just release such data, because it encodes other people's data too. If I were to release my financial records because "it's my data", I'd be exposing a lot of people, organisations and companies who I had interaction with.
But it is up to me to decide to release my financial records. All the parties I've dealt with have to expect the possibility (unless there is some signed agreement that prevents disclosing them).
I'm pretty sure that if <insert ecommerce platform here> were to leak all their financial transactions, that is considered a large data-breach and would be considered a privacy infringement.
I am aware that "an ecommerce platform" is something else than "your personal finance", but the principle is the same: X shouldn't release other people's financial transactions just because those were done with X.
The federal Genetic Information Nondiscrimination Act does prohibit insurers from asking for or using your genetic information to make decisions about whether to sell you health insurance or how much to charge you. But those privacy protections don't apply to long-term-care policies, life insurance or disability insurance.
That's ridiculous. Should I not use my surname because I identify my parents, my brothers and some of my cousins? I think bodily autonomy applies here.
You are giving far more away with DNA information; it's not remotely the same.
The point is, at the very least it's a grey area, so to dismiss the counter points so airily as you have done on such a serious subject indicates - at best - a lack of reflection and respect for the rights of others.
This is really another example of a claim of "genetic exceptionalism", that genetic information has a special status among other sorts of personal information, that mostly is not true. Your personal information, broadly, is informative about your relatives, your friends, etc. This includes your personal health information, your personal financial information, your online habits, etc. Any time you share personal data, you are disclosing information about people associated with you, without their consent, that might be used against them. And often these other classes of personal data are more informative than genetic information.
Yeah, that was my biggest fear with these services. How do I stop my family members from falling for it? In the end, I can't and just have to live with their mistake (if they used these services).
1. Police misconstruing DNA evidence and falsely accusing people of crimes. For example, a person's DNA can appear at a crime scene if they rode in a Lyft before a perpetrator.
1. Criminals extorting parents of sperm-donor children: Pay us or we'll reveal to your kids that he's not their dad.
1. Criminals extorting unfaithful parents: Pay us or we'll tell him that the kid isn't his. Pay us or we'll tell her about the child born from your affair. Pay us or we'll tell your religious group about your child born not to your spouse.
1. Criminals extorting people about their expected health outcomes: Pay us or we'll tell the shareholders about your 50% chance of getting disease X in the next 5 years. Pay us or we'll tell her that you're likely infertile. Pay us or we'll tell your kid that they will probably die by age 30.
1. Criminals extorting folks who have changed their identities: asylees, stalking victims, protected witnesses, etc.
1. Oppressive governments persecuting relatives of escaped asylees: Your brother who disappeared actually went to country X. We can't punish him so we're punishing you.
"This is a GDPR erasure request. Your site contains my PII by way of that of my father. Please erase this information and indicate that you have complied within 30 days."
I think this is somewhat analogous to the privacy issues around Google Street View. Almost nobody thought the image of the front of their house was really private, but the idea of it being catalogued and searchable bothered more than a few. Removing the barrier of someone having to physically do the work to get that information at least made them feel more vulnerable.
Has Street View been a problem for the world in that way? I haven't personally experienced that. That's probably why the DNA database idea doesn't scare me. If you want to live in the world it's essentially impossible to keep your DNA a secret. It seems to me that eventually someone will pick it all up and organize it.
Your street view doesn’t contain your entire genetic record (including propensities towards disease, mental and physical, which could very easily be used to discriminate against you). So they’re not really comparable whatsoever.
And what is with this “this terrible thing X will happen eventually, so why not have it happen now?” argument I keep seeing nowadays? Your argument was quite literally: “Eventually someone will collect all your DNA”, so who cares if it’s now or later?
> Your street view doesn’t contain your entire genetic record (including propensities towards disease, mental and physical, which could very easily be used to discriminate against you).
Isn't this a form of victim blaming? How is this different than saying Black people should try to hide their skin color since in many cases they will be discriminated against because of it? We should be working to suppress the discrimination at its source, not its target.
You're right, working to reduce discrimination at source is undoubtedly worthwhile. But data does not exist in a vacuum - it is collected on behalf of, and used by, people.
Until we reach zero intolerance nirvana, you can't ignore that personal data collection at scale simplifies discrimination, and also opens up new methods for discriminating. Will there be benefits to society from personal data collection at scale? Of course. But there are also costs. There are plenty of examples of people whose ideas or products became used in unforeseen ways and regretted their actions.
Discrimination should be suppressed at source and systems that simplify its manifestation in the real world should be handled extra carefully.
I'm a little confused about what exactly the point of debate is here.
* Is your DNA a secret? I think the fact that you leave it everywhere means no.
* Should people be allowed to aggregate that information? It literally cannot be stopped so I think the point is moot.
I guess what I'm missing is any addressing of the reality of the situation. I'm guessing from the content of your reply that you think that the practice of cataloging DNA should be banned. Great. What happens when they do it anyway?
I'm just looking for a helpful, actionable response. All I've seen so far is "X is bad" (not actionable) and "Let's ban X" (not helpful).
What good will it do you that there's an international ban on DNA databases when corporations use the impossible-to-stop one anyway to discriminate against and target you, or the police use it anyway to throw you in prison?
The most helpful course of action imo is to learn how best to cope with this new reality. How should we set our expectations when our DNA is public and searchable? Are there behaviors that would once be safe but will not be in the future? I think those are the more relevant questions.
To your first point, you can go out to the street and bring home someone’s random DNA, but there is no way you’d ever be able to know whose DNA it was.
... unless you were to look it up maybe, in this leaked DNA database.
DNA is not inherently an identifier. It needs the lookup code in order to act as one. A database like this MAKES it no longer a secret.
I'm not talking about taking random samples off a sidewalk. I'm saying if you follow a person you know and collect something they've discarded, now they're in the database. Do that enough times and everyone's in it. That's the exact technique the police use to collect people's DNA without their consent.
> Is your DNA a secret? I think the fact that you leave it everywhere means no.
There is a complicated procedure to convert these skin scales to data. Not everybody is able to do it, so even if it is not a secret, it is not exactly open data either.
> Yes your DNA is a secret, just like your fingerprint is a secret.
But is it really? I think the point being made here is that actually it is relatively easy to obtain someone's DNA. Is there a law that prevents someone who knows your name from picking up a discarded coffee cup and extracting your DNA? I think it's an interesting debate. Is your face private? Is the sound of your voice private? Those things are unique to you but anybody that interacts with you will be exposed to those features including possibly your DNA. I guess the concern is how the data is collected, what it is used for and in the case of DNA the impact it has on anybody that has a genetic link to us. I think it's fair to consider DNA in a separate category. There's only so much that can be deduced from your face as compared to DNA. It's tricky...
By targeted reconnaissance effort, do you mean trivial geographic correlation based on your phone's location data? If the Google Street View car had a DNA sequencer on the back and recorded any fragments along with their GPS location, it could trivially reconstruct quite a bit. No one has done this yet, but it's utterly doable. DNA is not private information; it's the most public information you can imagine, and it is not controllable in any way that's meaningful to traditional thoughts on data privacy.
If an action requires less investment and provides the same value, it will happen more frequently — economics. A database lookup requires less investment than a targeted DNA harvesting, sequencing, and location correlation operation.
So because it is supposed to be trivial to identify people based on GPS, phone and DNA (which I don't believe), it doesn't matter if one gets his data into a DB, which gets leaked to the internet and then can be found/used by anyone? I don't think I follow your reasoning.
I'll also state that DNA is hardly the most public information there is, surely your face/skin color/size/other physical characteristics are more public?