Hopkins pair mines 2B tweets for patterns in health care information

Researchers from the Johns Hopkins University found in a recent study that looking at how people complain about headaches, allergies or even the flu on the social media website Twitter can yield valuable public health information not available through other channels.

Mark Dredze, an assistant research professor of computer science and researcher at the university’s Human Language Technology Center of Excellence, and doctoral student Michael J. Paul analyzed 2 billion tweets posted to the website between May 2009 and October 2010.

Dredze and Paul, who are also affiliated with the university’s Center for Language and Speech Processing, used data mining techniques to sort through the 2 billion tweets, each 140 characters or fewer, and keep only those that dealt with health care issues. That left them with about 1.5 million tweets.

“At first we looked at the tweets and filtered it for health-related topics,” Paul said. “That didn’t really work though; we kept having people talk about having ‘Bieber fever’ and other non-health related issues.”

The researchers did not include any personal information from the tweets in the study, except for geographical data where it was available.

The study was originally meant to be less about public health and more about showing how effectively computers can sort through millions of tweets to find relevant patterns. Studying the tweets, Paul and Dredze did find some national trends, such as seasonal allergies appearing to start earlier in the year in the hotter Southern and Southwestern states. They also found misconceptions, mainly that some people still appear not to know that antibiotics are ineffective against the influenza virus.

Additionally, the pair found anecdotal evidence that about 10 percent of the time, people were using the allergy medicine Benadryl not for allergies but to combat insomnia, taking advantage of its drowsiness-causing side effect.

“We had no expectations this would work,” Paul said. “So we were really surprised that it worked as well as it did. It wasn’t anything like ‘stop the presses,’ but it shows this approach can be used to find information like this.”

The researchers first filtered the tweets for specific health-related keywords such as “allergies” and “flu.” Paul and Dredze then went through the tweets by hand, sorting out the relevant ones while training the computers to pick them out as well.

“It’s kind of like a spam filter for email, the computers learned which ones were useless and which ones weren’t,” Paul said.
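The two stages described above, a keyword filter followed by a classifier trained on hand-labeled tweets, can be sketched roughly as follows. This is an illustration only, not the researchers' code: the article does not say which learning method they used, so a small naive Bayes classifier (a technique commonly used in spam filters) stands in for it here. The keyword set, the function and class names, and the example tweets are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; the article names only "allergies" and "flu".
HEALTH_KEYWORDS = {"allergies", "flu"}


def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_filter(tweets, keywords=HEALTH_KEYWORDS):
    """Stage 1: keep only tweets that mention at least one keyword."""
    return [t for t in tweets if keywords & set(tokenize(t))]


class NaiveBayes:
    """Stage 2: a tiny multinomial naive Bayes text classifier
    with add-one smoothing (an assumed stand-in method)."""

    def fit(self, texts, labels):
        self.word_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented, hand-labeled examples standing in for the manual sorting step.
LABELED = [
    ("home sick with the flu and a fever", "relevant"),
    ("my allergies are terrible, sneezing all day", "relevant"),
    ("flu shot made my arm sore", "relevant"),
    ("i have bieber fever so bad", "irrelevant"),
    ("caught the flu of love for this new song", "irrelevant"),
    ("bieber concert tonight, allergies to boring music", "irrelevant"),
]

texts, labels = zip(*LABELED)
model = NaiveBayes().fit(texts, labels)

candidates = keyword_filter(
    ["sneezing from allergies all day", "nice weather today"]
)
for tweet in candidates:
    print(tweet, "->", model.predict(tweet))
```

With only six training examples the classifier is a toy, but it shows why the second stage matters: a keyword match alone cannot separate a tweet about being ill from one about “Bieber fever.”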

Paul and Dredze will present their complete study on July 18 in Barcelona, Spain, at the International Conference on Weblogs and Social Media, sponsored by the Association for the Advancement of Artificial Intelligence.