For years, researchers have been studying medical conditions using huge swaths of patient data with identifying information removed to protect people's privacy. But a new study suggests hackers may be able to match "de-identified" health information to patient identities.
In a test case described in JAMA Network Open, researchers used artificial intelligence to link health data with a medical record number. While the data in the test case was fairly innocuous (just the output of movement trackers like Fitbit), the result suggests that de-identified data may not be so anonymous after all.
"The study shows that machine learning can successfully re-identify the de-identified physical activity data of a large percentage of individuals, and this indicates that our current practices for de-identifying physical activity data are insufficient for privacy," said study coauthor Anil Aswani of the University of California, Berkeley. "More broadly it suggests that other types of health data that have been thought to be non-identifying could potentially be matched to individuals by using machine learning and other artificial intelligence technologies."
Aswani and colleagues used one of the largest publicly available patient databases, the National Health and Nutrition Examination Survey, or NHANES. Included in the database were recordings from physical activity monitors, during both a training run and an actual study mode, for 4,720 adults and 2,427 children.
The researchers showed their computer the data from the training runs for each person and included six demographic characteristics: age, gender, educational level, annual household income, race/ethnicity, and country of birth. The training data for each person was given a made-up record number.
Then Aswani and his colleagues fed the computer the second set of activity data, again including the six demographic factors but without the record numbers. For 95 percent of the adults and 86 percent of the children, the computer successfully matched the second set to the correct record from the first.
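To make the matching step concrete, here is a minimal sketch in Python. It is not the researchers' method: the article does not say which machine learning model they used, so this swaps in a much simpler nearest-neighbor comparison. The function name `reidentify`, the record numbers, and every value in the toy data are made up for illustration, and only two demographic fields are used instead of six.

```python
from math import dist  # Euclidean distance, Python 3.8+


def reidentify(labeled, unlabeled):
    """Guess a record number for each unlabeled activity trace.

    labeled:   list of (record_id, demographics, activity) tuples
    unlabeled: list of (demographics, activity) tuples
    Returns one guessed record_id per unlabeled entry, or None when
    no labeled record shares the same demographics.
    """
    guesses = []
    for demo, activity in unlabeled:
        # Step 1: demographics narrow the pool of candidate records.
        candidates = [(rid, act) for rid, d, act in labeled if d == demo]
        if not candidates:
            guesses.append(None)
            continue
        # Step 2: among those, pick the most similar activity pattern.
        best_id, _ = min(candidates, key=lambda c: dist(c[1], activity))
        guesses.append(best_id)
    return guesses


# Hypothetical toy data: (age, gender) plus three days of step counts.
labeled = [
    ("R1", (34, "F"), (5200, 6100, 4800)),
    ("R2", (34, "F"), (11000, 9800, 12500)),
    ("R3", (61, "M"), (3000, 2800, 3500)),
]
unlabeled = [
    ((34, "F"), (10400, 10100, 11900)),
    ((61, "M"), (3100, 2600, 3300)),
    ((34, "F"), (5000, 6400, 4500)),
]

print(reidentify(labeled, unlabeled))
```

The sketch shows why removing names alone may not be enough: demographics shrink the list of possible people, and a person's movement pattern can then single out one of them.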
What are the practical implications of that matchup?
Aswani offers a hypothetical situation. "Say your employer is giving a discount for participation in a wellness program and will be collecting demographic information as well as physical activity data," he said. "At the same time, your health insurance company might have a program to try to get insureds to lose weight. They also collect demographic information and physical activity data, but remove identifying information."
Theoretically, your employer could link the two data sets and "then they will accurately be able to link to the rest of your medical record," Aswani said.
Another scenario, Aswani said, is that your smartphone is collecting your movement data as part of a health app. If your insurer also has movement data, the app maker might be able to link your name to your medical record and then sell the information to others.
