How it starts
This pet project started from a desire to track what my daughter’s doctors were prescribing her. My daughter Hayoon spent most of her days in daycare and would often catch viruses or develop skin issues. Sometimes she needed to see a doctor once or twice a week. In Korea, it is incredibly easy and convenient to visit any doctor or clinic of your choice, but there is never enough time to discuss in depth the health issues that brought you in. Each time, we left the doctor’s office with a new prescription, and I began to wonder exactly what kind of medicine was being prescribed. I asked the doctors to share my daughter’s EMR data with me, but I could not get a digital version, so I had to collect all of her paper prescriptions and type them into a spreadsheet by hand.
Questions
- What diseases were afflicting her?
- What medicines were being prescribed?
- What are the trends in medicine type and amount?
Visualization procedure
I collected all of the medication prescriptions and analyzed the data to see how disease codes related to the medications prescribed. I first collected the data by hand, then analyzed and visualized it with R.
The project led me through the whole process of data analysis: collecting raw data, processing it for analysis, and then the analysis and visualization themselves. It encouraged me to join an R community and to study statistics and the R programming language.
This personal project also revealed to me the issue of health autonomy and medical data ownership, because I was not allowed to access my own daughter’s medical data. It showed me the true value of data and led me to believe that individuals should have some rights over their own. It made me want to learn more about the data market and data technology of the healthcare industry. It even led me to learn more about the controversies over medical data storage that erupted in Korea and about an uptick in medication overdoses due to “medical shopping”.
Data collection
Whenever you get a prescription in Korea, you are given two copies: one for the patient and one for the pharmacy. I collected almost a year’s worth of paper prescriptions from my daughter’s doctor visits and transcribed the data into a spreadsheet.
I used ten variables as column headers. The key variables were “disease code” and “medicine name”, which I used to check whether the two were related.
Data wrangling and analysis
Making a chart requires knowledge of statistics, and making it attractive requires visual literacy and graphic skills. I chose the R programming language so that I could use a variety of chart types to make beautiful visualizations. Using graphic tools to create charts is both slow and imprecise. By using R, I could become more familiar with statistics and programming at the same time.
Data visualization
Before compiling the data, I had no idea what to ask or suspect. While wrangling the data, questions began to arise, such as why a certain diagnosis occurred so frequently or how prescribing habits differed from doctor to doctor. Making these charts not only revealed trends but also brought new thoughts and questions to mind, deepening my desire to learn more.
Q. Which diseases were diagnosed?
Chart 01: Stacked Bar
Disease codes are selected by the doctor in order to prescribe medicine. The patient copy of the prescription listed two major disease codes. To find out which disease she had most often in a year, I made a stacked bar chart: the x axis is the disease code and the y axis is the number of times it appears. Because disease codes are combinations of letters and numbers, I converted them to factors before making the bar chart. The graph shows how often each disease code occurred.
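The counting behind this chart can be sketched as follows. This is an illustrative Python sketch rather than the original R code; the rows, the field names `visit` and `disease_codes`, and the function `count_disease_codes` are hypothetical, and the counts are made up. Only the disease codes themselves come from the prescriptions discussed in this article.

```python
from collections import Counter

# Hypothetical transcribed rows: one entry per prescription, each listing
# the disease codes written on it. The values are made up for illustration.
prescriptions = [
    {"visit": 1, "disease_codes": ["J209", "H6501"]},
    {"visit": 2, "disease_codes": ["J209"]},
    {"visit": 3, "disease_codes": ["L2088", "J459"]},
    {"visit": 4, "disease_codes": ["J209", "J459"]},
]

def count_disease_codes(rows):
    """Count how many prescriptions mention each disease code."""
    counts = Counter()
    for row in rows:
        # set() guards against a code being typed twice on one prescription.
        counts.update(set(row["disease_codes"]))
    return counts

code_counts = count_disease_codes(prescriptions)

# Print a plain-text bar per code, most frequent first.
for code, n in code_counts.most_common():
    print(f"{code:<6} {'#' * n} ({n})")
```

Treating each code as a category label rather than as a number or free text is what converting to a factor achieves in R; the dictionary keys of the `Counter` play that role here.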
Q. Which medications were prescribed for each disease code?
Chart 02: Bar, dodge
The bars are dodged: instead of being stacked, overlapping bars are placed side by side. This makes it easier to read the type and dosage of each drug prescribed for each disease code.
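A dodged bar chart needs one value per disease code and medication pair. A minimal Python sketch of that grouping step is shown below, assuming each transcribed row carries a disease code, a medicine name, and a number of daily doses. The rows, the placeholder medicine names, and the function `doses_by_code_and_medicine` are hypothetical, and this is not the original R code.

```python
from collections import defaultdict

# Hypothetical rows: (disease code, medicine name, daily doses).
# Medicine names and dose values are placeholders, not real data.
dose_rows = [
    ("J209", "Medicine A", 3),
    ("J209", "Medicine B", 2),
    ("J209", "Medicine A", 3),
    ("H6501", "Medicine A", 2),
]

def doses_by_code_and_medicine(rows):
    """Sum daily doses for every (disease code, medicine) pair."""
    grouped = defaultdict(lambda: defaultdict(int))
    for code, medicine, doses in rows:
        grouped[code][medicine] += doses
    # Convert the nested defaultdicts to plain dicts for easy inspection.
    return {code: dict(meds) for code, meds in grouped.items()}

grouped_doses = doses_by_code_and_medicine(dose_rows)

# Each disease code becomes a group; each medicine is one bar in that group.
for code, meds in grouped_doses.items():
    for medicine, total in meds.items():
        print(f"{code:<6} {medicine:<12} {total}")
```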
Q. Which medications were prescribed the most?
Chart 03: Bar with monotone
The prescribed quantities could not be compared directly between different medications, so I compared ratios instead. There were also 32 different medications, too many to convey information by assigning each one a distinct color. An overload of colors would not help to tell the medications apart. Using only one color was easier on the eyes, and I could still distinguish the medications by adjusting the opacity. I used a color palette package to achieve this effect.
Q. What kinds of medicine were prescribed per disease code?
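The idea of one color with varying opacity can be sketched in a few lines of Python. This sketch assumes that the ratio is each medication's share of all prescription lines, which the text does not state explicitly. The counts, the placeholder medicine names, and the helpers `medication_shares` and `opacity_for` are hypothetical, and this is not the original R code.

```python
# Hypothetical counts of prescription lines per medication (made-up values).
medication_counts = {"Medicine A": 6, "Medicine B": 3, "Medicine C": 1}

def medication_shares(counts):
    """Return each medication's share of all prescription lines."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def opacity_for(share, max_share, floor=0.2):
    """Map a share to an opacity in [floor, 1.0]; the largest share is fully opaque."""
    return floor + (1.0 - floor) * share / max_share

med_shares = medication_shares(medication_counts)
largest = max(med_shares.values())
opacities = {name: opacity_for(s, largest) for name, s in med_shares.items()}

# List medications from most to least prescribed with their opacity.
for name in sorted(med_shares, key=med_shares.get, reverse=True):
    print(f"{name}: share={med_shares[name]:.2f}, opacity={opacities[name]:.2f}")
```

The `floor` value keeps rarely prescribed medications faintly visible instead of letting them fade out entirely.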
Chart 04: Grid Map
I decided to sort each medication by disease classification code. Each box represents a disease code combination by doctor per visit. Multi-panel plots looked better than one bar chart to compare disease codes.
Visual literacy
Disease codes are like blocks. The blocks can be combined with different codes at each doctor encounter. The graph below shows each code combination. For example, the code H6501 was diagnosed together with J209, J459, and L2088 on different encounters. The color intensity represents the number of medication administrations per day. The code H6501 represents otitis media and J209 is bronchitis. The graph shows that on several of the occasions when bronchitis was diagnosed, otitis media was diagnosed as well.
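Counting which codes appear together can be sketched as follows. This is an illustrative Python sketch, not the original R code. The encounter list and the function `count_code_pairs` are hypothetical; the codes are the ones named above, but how often they co-occur here is made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encounters: the disease codes recorded at each doctor visit.
encounters = [
    ["H6501", "J209"],
    ["H6501", "J459"],
    ["H6501", "L2088"],
    ["J209", "H6501"],
    ["J209"],
]

def count_code_pairs(visits):
    """Count how often each pair of disease codes appears in the same visit."""
    pairs = Counter()
    for codes in visits:
        # Sorting makes (H6501, J209) and (J209, H6501) the same pair.
        for pair in combinations(sorted(set(codes)), 2):
            pairs[pair] += 1
    return pairs

pair_counts = count_code_pairs(encounters)

for (first, second), n in pair_counts.most_common():
    print(f"{first} + {second}: {n}")
```

A visit with a single code contributes no pairs, so it appears only in the per-code counts, not in the combinations.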
The result
The most important takeaway from all this was that ordinary citizens do not have much right to own or use their own data. If people were able to easily access and browse their own data it would help them to be healthier and protect them from overprescription of medications.
Second, this exercise helped me to see data from different perspectives.
An idle curiosity moved me to learn the R programming language in order to wrangle the data. It took more than a year to become well versed in R. This personal project motivated me to learn R, and I have become more savvy at data analysis and data visualization because of it.