The use of social media data and data science to gain insights into health care and medicine.
The current status report is in the main folder.
All of the tweets for this project have been processed and consolidated into a single file, which can be downloaded from the link below:
- https://s3-us-west-2.amazonaws.com/healthcare-twitter-analysis/HTA_noduplicates.gz
1.85 GB zipped / 15.80 GB unzipped
Each of the 4 million rows in this file is a tweet in JSON format containing the following information:
- All the Twitter data, in exactly the JSON format of the original
- Unix time stamp
- Data from the original files:
  - originating file name
  - score
  - author screen name
  - URLs
60% of the records also have geographic information (a rough access sketch follows this list):
- Latitude & Longitude
- Country name & ISO2 country code
- City
- For country code "US":
  - ZIP code
  - Telephone area code
  - Square miles inside the ZIP code
  - 2010 Census population of the ZIP code
  - County & FIPS code
  - State name & USPS abbreviation
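The exact key names under which these added fields are stored are covered in the Status Report rather than here, so the snippet below is only a rough sketch: it assumes illustrative key names such as "latitude", "longitude", and "country_code", which may differ from the real ones.

    import json

    # Rough sketch: count the tweets that carry coordinates.
    # NOTE: "latitude" and "longitude" are assumed key names for the added
    # geographic fields; check the Status Report for the actual names
    # before relying on this.
    geo_count = 0
    with open("HTA_noduplicates.json", "r") as f:
        for row in f:
            tweet = json.loads(row)
            if "latitude" in tweet and "longitude" in tweet:
                geo_count += 1

    print("tweets with coordinates:", geo_count)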
More generally, the basic technique for using this file in Python is the following:

    import json

    with open("HTA_noduplicates.json", "r") as f:
        # convert each row (one tweet) in turn from JSON into a Python
        # dictionary and process it
        for row in f:
            tweet = json.loads(row)
            text = tweet["text"]  # text of the original tweet
            ...                   # etc.
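If you would rather not unzip the 15.80 GB file first, Python's gzip module can stream the compressed download directly; this is just an alternative sketch of the same loop, assuming the .gz holds one tweet per line as described above.

    import gzip
    import json

    # Same loop as above, but reading the compressed download directly,
    # so the full 15.80 GB file never has to be written to disk.
    with gzip.open("HTA_noduplicates.gz", "rt") as f:
        for row in f:
            tweet = json.loads(row)
            # ... process the tweet exactly as in the example above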
Python provides very powerful analytical and plotting features, but R is also very handy. R does not work well with datasets this large, so Python can be used to create a targeted subset file that R can read (or Excel, or anything else for that matter).
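As a rough sketch of that workflow (the keyword, output columns, and file names below are illustrative only), a Python script can filter the tweets and write a small CSV that R's read.csv can load:

    import csv
    import json

    # Rough sketch: keep only tweets whose text mentions a keyword and
    # write a few columns to a small CSV for use in R or Excel.
    # The keyword, column choices, and file names are illustrative only.
    with open("HTA_noduplicates.json", "r") as f, \
         open("diabetes_tweets.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["created_at", "screen_name", "text"])
        for row in f:
            tweet = json.loads(row)
            if "diabetes" in tweet.get("text", "").lower():
                writer.writerow([
                    tweet.get("created_at", ""),
                    tweet.get("user", {}).get("screen_name", ""),
                    tweet.get("text", ""),
                ])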
The Status Report in the main part of the repo contains:
- a comprehensive explanation of the dataset
- examples of analyses done with this dataset
- a list of references to other healthcare-related Twitter analyses
- instructions for using Amazon Web Services
- sample programs using this file with Python, R and MongoDB.
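The MongoDB sample in the Status Report is the one to follow; purely as a rough sketch of the idea (the database and collection names here are made up), the tweets could be bulk-loaded into a local MongoDB instance with pymongo like this:

    import json
    from pymongo import MongoClient  # requires the pymongo package

    # Rough sketch (not the repo's sample program): bulk-load the tweets
    # into a local MongoDB collection in batches. The database and
    # collection names below are made up for illustration.
    collection = MongoClient("mongodb://localhost:27017/")["hta"]["tweets"]

    batch = []
    with open("HTA_noduplicates.json", "r") as f:
        for row in f:
            batch.append(json.loads(row))
            if len(batch) == 10000:
                collection.insert_many(batch)
                batch = []
    if batch:
        collection.insert_many(batch)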