Baixue (Doris) Zhang

Statistics Grad Student | TA | Aspiring Data Analyst

Passionate about transforming data into meaningful insights, with a strong interest in human–AI cooperation for medical applications. Currently pursuing graduate studies in Statistics while gaining hands-on experience on innovative data analysis projects. I also serve as a Teaching Associate, where I strengthen my communication skills by explaining complex concepts clearly, fostering interactive learning, and guiding students to develop problem-solving strategies.

About Me

A passionate data enthusiast with a strong foundation in statistics and a drive to uncover meaningful insights from complex datasets.

Skills

Domain Knowledge

ANOVA
Hypothesis Testing
Linear & Logistic Regression
GLMM
Power Analysis
Multiple Testing Correction
Time Series
SARIMA
Stochastic Processes
Markov Models
Brownian Motion
Poisson Process
Dimensionality Reduction
Probability
Linear Algebra

Programming & Development

R Programming
Python
LaTeX
HTML/CSS
JavaScript
Flask
Streamlit
Prompt Engineering
Feature Engineering
Responsive Design

Tools & Frameworks

Pandas
PyPDF2
Git
Tableau
Cursor
ggplot2
lme4
forecast
tseries
R Markdown
LaTeX with R
Virtual Environments
LLMs & Prompt Engineering

Honors & Certifications

🏆

Beverly L. Mecklenburg Scholarship

SJSU, Dept. of Mathematics & Statistics (2025)
Merit-based departmental award recognizing academic excellence in mathematics/statistics.

📊

SOA Exam P (Probability)

Passed January 2025
Preliminary actuarial exam from the Society of Actuaries, demonstrating proficiency in probability theory.

🏅

Outstanding Achievement Award

Chinese American Biopharmaceutical Society (CABS) — Summer 2025
For contributions to clinical-trial analysis AI (DataCraft & EfficacyLens); featured in CABS magazine.

Featured Projects

A collection of data analysis and statistical modeling projects showcasing my skills in various domains.

📊

Simulation in R and GLMMs

Advanced statistical simulation project implementing Generalized Linear Mixed Models (GLMM) in R. Explored complex hierarchical data structures, random effects modeling, and simulation techniques for statistical inference and model validation.

R | GLMM | Simulation | Mixed Models | Statistical Inference
🤖

EfficacyLens AI Agent

Developed an AI tool to compare two clinical trials within five minutes. Automated extraction of endpoints, trial design, and efficacy outcomes to support strategic drug evaluation. Enhanced accuracy and speed of comparative analysis in oncology and cardiovascular trials.

Python | AI Agents | Google Gemini | Streamlit | Prompt Engineering | Healthcare Technology
⚕️

DataCraft AI Agent

Built an AI agent that programmatically generates customizable virtual patient records, with support for English and Chinese names and common diseases. Served through a Flask + Gemini API with a lightweight web UI and CSV export; exposes a REST endpoint for integration.

Python | Flask | Google Gemini | HTML/JavaScript | API | Healthcare Informatics

Time Series Models Identification, Forecasting and Confidence Intervals

Analyzed four time series datasets in R to identify suitable models. Applied SARIMA and other techniques to two real-world datasets for forecasting and interval estimation.

R | SARIMA | Time Series | Forecasting
♟️

Dimension Reduction for Chess Dataset

Engineered features from the variable "moves" and applied both linear (PCA, Factor Analysis) and nonlinear (t-SNE) dimensionality reduction methods to uncover patterns in chess game data and player behavior.

R | PCA | Factor Analysis | t-SNE
🚗

Car Price Prediction with Linear Regression and Categorical Variables

Led variable selection to identify 8 key predictors from 25 features. Addressed multicollinearity and imbalanced categorical variables in building robust regression models for automobile price prediction.

R | Python | Regression | Feature Selection
📺

TV Show "Friends" Data Visualization

Visualized line distributions, tested dialogue patterns against plot points, and modeled the relationship between IMDb ratings and viewership using advanced R visualization and inferential analysis techniques.

R | Data Visualization | Inferential Analysis | Statistical Testing
🎮

Bloxed - Fred's House Building Adventure

A web-based survival game where players help Fred gather resources and build a house in a forest environment. Features interactive gameplay with resource management, energy systems, and HTML5 Canvas rendering.

JavaScript | HTML5 Canvas | CSS3 | Game Development | Web Development

Student Feedback

I'm also passionate about communication and teaching. Students describe me as clear, approachable, and supportive.

Experience & Education

My academic journey and professional development in statistics and data science.

August 2025 - Present (Volunteer)

Website Editor

Chinese American Biopharmaceutical Society - Remote, CA

Help manage and update CABS websites, ensuring accurate and current information for the community. Contribute to digital presence and accessibility of resources for members.

June 2025 - August 2025

Data Science Summer Intern

Chinese American Biopharmaceutical Society - Remote, CA

Developed DataCraftAgent, an AI-powered virtual patient generator for breast cancer clinical trials. Built a full-stack web application using Python Flask, the Google AI API, and JavaScript to generate realistic patient profiles with 30+ clinical variables. Implemented batch processing and CSV export functionality to accelerate clinical trial planning and reduce patient recruitment bottlenecks.

January 2025 - Present

Teaching Associate (Part-time)

San José State University - San Jose, CA

Led undergraduate workshops for Calculus III, Calculus I, Precalculus, and College Algebra; graded quizzes and supported students in achieving academic success. Facilitated summer Zoom sessions, using breakout rooms to promote collaborative group work and provide individualized academic support.

January 2025 - May 2025

Grader (Part-time)

San José State University - San Jose, CA

Graded assignments and quizzes for two Probability Theory courses. Maintained records, gave feedback, and collaborated with instructors to ensure consistent evaluation standards.

August 2024 - Present

Master of Science in Statistics

San José State University

Advanced coursework in statistical theory, applied statistics, data mining, and machine learning. Focus on practical applications in data analysis and statistical modeling.

Spring 2024 - May 2024

Community College coursework

Cañada College

I took Introduction to Computer Science and Introduction to Java at Cañada College as the starting point of my data science career.

October 2022 - Present

Troop 61920 Cookie Chair (Volunteer)

Girl Scouts of Northern California - CA

Oversee the Cookie Program for my daughter's troop: coordinate family cookie meetings, manage cookie inventory, handle troop finances (including deposits), distribute girl recognitions, coordinate booth sales, and communicate with girls and families.

2015 - 2023

Full-time Parent 👧👧

Home - Dedicated to raising two kids

Devoted eight years to raising two kids, developing strong organizational, multitasking, and communication skills. Managed household operations, coordinated educational activities, and fostered a nurturing environment that supported their growth and development.

January 2013 - January 2014

Teaching Assistant for Foreign Teachers

Beile English Learning Center - Beijing, China

Assisted foreign teachers with lesson planning and classroom management. Hosted parent meetings, handled make-up lessons, and assessed over 100 students' academic progress. Acted as translator between parents/students and foreign teachers.

September 2008 - June 2012

Undergraduate Student

Hebei North University - Hebei, China

I hold a Bachelor's degree in Agricultural Science from Hebei North University.

Get In Touch

I'm always interested in discussing new opportunities, collaborations, or just connecting with fellow data enthusiasts.

Let's Connect

Whether you're looking for a data analyst, want to discuss statistical methods, or simply want to network, I'd love to hear from you.

I'm particularly interested in opportunities involving:

  • Statistical analysis and modeling
  • Data visualization and reporting
  • Machine learning applications
  • Research collaborations
  • Teaching and mentoring opportunities

Feel free to reach out via email or connect with me on LinkedIn. I typically respond within 24 hours.
