Yi-Shin Sheu

Data Science & Analytics | Real-world Data | Healthcare

Hello! I thrive on unraveling healthcare challenges and improving population health. With a sharp critical mindset and advanced analytical skills, I turn real-world healthcare data into impactful, actionable insights.

About Me

Yi-Shin Sheu, Ph.D.

As a Research Data Analyst at Kaiser Permanente, I specialize in applying advanced data cleaning techniques, statistical modeling, and data visualization tools to decipher patterns and trends from our large patient database. I closely collaborate with physicians and research scientists to turn ideas into actionable insights, positively contributing to the health outcomes of our patients and the broader community in the realm of population health.

I come from a strong academic background, with a Ph.D. in Psychological and Brain Sciences and post-doctoral training in cognitive neuroscience at Johns Hopkins University. Through this journey, I’ve sharpened my ability to think critically, communicate effectively, and adapt quickly in challenging situations. These skills strengthen my delivery of impactful, data-driven insights and solutions.

Beyond my professional pursuits, I find joy in outdoor adventures like hiking, camping, and exploring state parks with my canine companion. I also indulge my creative side through hands-on DIY projects, learning the ropes from YouTube videos.

Research Projects & Publications

My recent publications, a result of collaborations with physicians at Kaiser Permanente, focus on population health and Health Economics and Outcomes Research (HEOR). Furthermore, I highlight my writing skills through earlier first-author publications from my time as a cognitive neuroscience researcher.


How do we use EHR to evaluate the value of a healthcare intervention and identify areas of improvement?

Using a text-mining tool to extract structured and unstructured data from EHRs.

Mining of EHR: EHR data is messy because there is no standard for how it is collected. Here, I used a combination of rule-based logic and regular expressions to extract information from patients' charts for HBV pathway evaluation.
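
As a minimal sketch of the rule-based plus regular-expression idea, the Python snippet below pulls an HBV DNA viral load out of free-text notes. The example notes, the pattern, and the function name `extract_hbv_dna` are hypothetical illustrations, not the rules used in the actual project.

```python
import re

# Hypothetical rule: "HBV DNA", an optional comparator, a number, then IU/mL.
HBV_DNA_PATTERN = re.compile(
    r"HBV\s+DNA\D{0,20}?(?P<comparator>[<>]?)\s*"
    r"(?P<value>\d[\d,]*(?:\.\d+)?)\s*IU/mL",
    re.IGNORECASE,
)


def extract_hbv_dna(note):
    """Return (comparator, value) for the first HBV DNA result in a note, or None."""
    match = HBV_DNA_PATTERN.search(note)
    if match is None:
        return None
    # Strip thousands separators before converting to a number.
    value = float(match.group("value").replace(",", ""))
    return match.group("comparator"), value


# Made-up notes showing how inconsistently the same lab can be written.
notes = [
    "Labs reviewed. HBV DNA: 2,340 IU/mL on 3/2.",
    "hbv dna <20 IU/mL (undetectable)",
    "No hepatitis B labs on file.",
]
results = [extract_hbv_dna(note) for note in notes]
```

In practice each extracted value would still need a rule-based check (for example, a plausible range or a matching lab date) before it is trusted.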

Read More

How do we evaluate the effect of a large-scale event like COVID-19 in our natural environment?

Interrupted time series: A quasi-experimental approach for evaluating longitudinal effects of interventions

My work at Kaiser allows me to use the large volume of Electronic Medical Record (EMR) data at our institute to compare trends in mental health utilization before vs. after COVID-19. The goal is to gain insight into how our members have changed the way they use mental health services since the pandemic.
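
For readers new to interrupted time series, here is a minimal sketch of the segmented regression at its core, written with NumPy. The monthly visit counts are simulated without noise, and the function name `fit_its` is my own for illustration. A real analysis would also need to account for autocorrelation and seasonality, which this sketch ignores.

```python
import numpy as np


def fit_its(y, interruption_index):
    """Fit y = b0 + b1*time + b2*post + b3*time_since_interruption by least squares."""
    y = np.asarray(y, dtype=float)
    time = np.arange(len(y), dtype=float)
    post = (time >= interruption_index).astype(float)       # 1 once the event has occurred
    time_since = np.where(post == 1, time - interruption_index, 0.0)
    design = np.column_stack([np.ones_like(time), time, post, time_since])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["baseline_level", "pre_slope", "level_change", "slope_change"]
    return dict(zip(names, coefs))


# Simulated data: 24 months, with the interruption at month 12 (made-up numbers).
months = np.arange(24)
visits = 100 + 2 * months                       # pre-event trend
visits[12:] += 15 + 3 * (months[12:] - 12)      # jump in level plus a steeper slope
result = fit_its(visits, interruption_index=12)
```

Here `level_change` estimates the immediate shift at the interruption and `slope_change` estimates how much the trend differs afterward.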

Read More

How does the brain make predictions?

Cerebellum: predicting the future

The brain is constantly making predictions of the future states in what we see, what we do, and what we think. My research focused on the predictive function of the cerebellum in cognition. We tested the cerebellum's contribution to sequence prediction using TMS and fMRI.

Read More

How do we optimize task performance?

Tapping into our predictive minds

Sometimes a simple tweak in the task design can make a tremendous difference in how efficiently we process stimuli. A feature that directly taps into the brain’s predictive power allows participants to make predictions, which greatly enhances task performance.

Read More

How do we choose between two task rules?

Neural Decoding of Task Rules

Machine learning and pattern classification techniques can help neuroscientists make inferences about the representational contents of the brain. For the first time, we showed a neural mechanism involved in the selection of a task rule.

Read More

Data Project Portfolio

In my portfolio, you'll discover practical Python code snippets in Jupyter Notebooks. Each example demonstrates how I approach and solve real-world problems through data, accompanied by explanations of the statistical models I used to solve them.

Linear Mixed-Effects Model

A mixed-effects model is ideal for repeated measures over time or nested observations within groups. It handles missing data and unbalanced designs, where subjects may have varying, unevenly distributed observations. In this demo, I showcase the linear mixed-effects analysis process, based on my work with a liver specialist physician.
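
To give a flavor of what such an analysis looks like, here is a minimal sketch of a random-intercept model using `statsmodels`. The data are simulated, and the column names (`subject`, `month`, `score`) are hypothetical. This is not the notebook's actual code or the clinical data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate an unbalanced design: each subject has 3 to 6 visits at uneven months.
rows = []
for subject in range(30):
    subject_intercept = 50 + rng.normal(0, 5)
    n_visits = rng.integers(3, 7)
    visit_months = np.sort(rng.choice(np.arange(12), size=n_visits, replace=False))
    for month in visit_months:
        score = subject_intercept - 1.5 * month + rng.normal(0, 1)
        rows.append({"subject": subject, "month": month, "score": score})
data = pd.DataFrame(rows)

# Fixed effect of month, random intercept per subject.
model = smf.mixedlm("score ~ month", data, groups=data["subject"])
fit = model.fit()
slope = fit.params["month"]   # estimated change in score per month
```

Because every subject contributes whatever visits they have, no one is dropped for missing a time point, which is the practical advantage over a repeated-measures ANOVA.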

Try my Jupyter Notebook on GitHub

Generalized Estimating Equation (GEE)

GEE provides population-averaged estimates for correlated data such as repeated measures and clustered data. Unlike likelihood-based models, GEE does not require the full distribution of the dependent variable to be specified. I demonstrate two examples using GEE, one with a numerical and the other with an ordinal dependent variable.

Try my Jupyter Notebook on GitHub

Eye Tracking

I’ve written some code to calculate the total amount of eye movement within a given time window and concatenate these values across time. The output can be incorporated into a GLM analysis as a regressor.
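
A minimal sketch of the idea, assuming gaze is recorded as x/y coordinates at a fixed sampling rate: sum the sample-to-sample displacement within each fixed-size window. The function name `eye_movement_per_window`, the window size, and the gaze values are hypothetical, and the linked code may compute this differently.

```python
import numpy as np


def eye_movement_per_window(x, y, window_size):
    """Sum sample-to-sample gaze displacement within consecutive fixed-size windows."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    step = np.hypot(np.diff(x), np.diff(y))     # distance between successive samples
    step = np.concatenate([[0.0], step])        # the first sample has no predecessor
    n_windows = len(step) // window_size        # drop any incomplete final window
    trimmed = step[: n_windows * window_size]
    return trimmed.reshape(n_windows, window_size).sum(axis=1)


# Made-up gaze samples: 8 samples split into two windows of 4.
gaze_x = [0, 3, 3, 3, 6, 6, 6, 6]
gaze_y = [0, 4, 4, 4, 8, 8, 8, 9]
regressor = eye_movement_per_window(gaze_x, gaze_y, window_size=4)
```

Setting the window to the number of samples per fMRI volume would give one value per volume, which is the shape a GLM regressor needs.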

Try it

Contact me ...

Also find me on ...