antibiotic_resistance

Antibiotic Resistance — A Data-Driven Exploration

Understanding resistance patterns before building a predictive model

This project explores a synthetic clinical microbiology dataset to understand how bacterial resistance, patient demographics, and temporal trends interact.
Rather than jumping straight into machine learning, the analysis focuses on what the data can reveal on its own, a crucial mindset in real-world data science.

The repository contains the full workflow, from cleaning and reshaping raw clinical data to building a first baseline resistance model and visualizing higher-order patterns such as resistance indexes and temporal dynamics.


Project Objectives


Key Insights

Resistance Index (multi-antibiotic measure)

By calculating the proportion of antibiotics for which each isolate is resistant, we uncover:
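
As a rough illustration of the index itself, the sketch below computes the proportion of tested antibiotics to which each isolate is resistant. The isolate IDs, antibiotic names, and the "R"/"S" coding are made up for the example and are not taken from the project's dataset. Skipping untested antibiotics is also an assumption.

```python
# Hypothetical isolates: antibiotic name -> susceptibility result.
# "R" = resistant, "S" = susceptible, None = not tested (assumed coding).
isolates = {
    "iso_001": {"ampicillin": "R", "ciprofloxacin": "S", "gentamicin": "R", "meropenem": "S"},
    "iso_002": {"ampicillin": "S", "ciprofloxacin": "S", "gentamicin": None, "meropenem": "S"},
    "iso_003": {"ampicillin": "R", "ciprofloxacin": "R", "gentamicin": "R", "meropenem": None},
}


def resistance_index(results):
    """Return the fraction of tested antibiotics with an "R" result.

    Untested antibiotics (None) are left out of the denominator.
    Returns None when no antibiotic was tested at all.
    """
    tested = [r for r in results.values() if r is not None]
    if not tested:
        return None
    return sum(r == "R" for r in tested) / len(tested)


# One index per isolate, between 0.0 (fully susceptible) and 1.0 (resistant to all tested).
indexes = {iso_id: resistance_index(res) for iso_id, res in isolates.items()}

for iso_id, value in indexes.items():
    print(iso_id, value)
```

Leaving untested antibiotics out of the denominator keeps isolates with incomplete panels comparable. Counting them as susceptible would bias the index downward.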

Temporal Dynamics

A multi-panel visualization shows how infections fluctuate year by year for each species.
These trends often reflect:
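
The per-species yearly counts behind such a multi-panel figure could be prepared along these lines. This is a minimal sketch: the species names, years, and record layout are invented for illustration, and the plotting step is left out.

```python
from collections import Counter

# Hypothetical records: one (species, year) pair per isolate.
records = [
    ("Escherichia coli", 2019),
    ("Escherichia coli", 2019),
    ("Escherichia coli", 2020),
    ("Klebsiella pneumoniae", 2019),
    ("Klebsiella pneumoniae", 2021),
    ("Staphylococcus aureus", 2020),
    ("Staphylococcus aureus", 2020),
]

# Count isolates for every (species, year) combination.
counts = Counter(records)

# Use a shared, sorted year axis so every panel covers the same range.
years = sorted({year for _, year in records})
species_list = sorted({species for species, _ in records})

# One series per species; years with no isolates get an explicit 0.
series = {
    species: [counts[(species, year)] for year in years]
    for species in species_list
}

for species, values in series.items():
    print(species, dict(zip(years, values)))
```

Each entry in `series` corresponds to one panel. Filling missing years with zero avoids gaps that could be misread as missing data.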

Bacterial Behaviour

Species maintain characteristic profiles:

Correlation Structure

A heatmap reveals few strong correlations, meaning:
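
The matrix behind a heatmap like this can be sketched as pairwise Pearson correlations. The column names and values below are hypothetical, and the 0.7 cutoff for "strong" is an arbitrary choice for the example, not a threshold stated by the project.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (spread_x * spread_y)


# Hypothetical numeric columns (resistance flags coded as 0/1).
columns = {
    "age": [34, 51, 67, 45, 72, 29],
    "resistant_ampicillin": [1, 0, 1, 0, 1, 0],
    "resistant_ciprofloxacin": [0, 0, 1, 1, 0, 1],
}

# Full symmetric matrix, as a heatmap would display it.
matrix = {
    a: {b: pearson(columns[a], columns[b]) for b in columns}
    for a in columns
}

# List distinct pairs whose absolute correlation reaches the (assumed) cutoff.
STRONG = 0.7
names = list(columns)
strong_pairs = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(matrix[a][b]) >= STRONG
]

print(strong_pairs)
```

An empty `strong_pairs` list is the numeric counterpart of a heatmap without dark off-diagonal cells.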

Baseline Predictive Model

A logistic regression model was used as a preliminary benchmark.
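
To show what such a baseline does, here is a dependency-free sketch of logistic regression fitted by batch gradient descent. It is not the project's code: the README does not say which library or features were used. The two features, the toy data, and the learning rate and epoch count are all assumptions chosen for illustration.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 0/1 labels: 1 means predicted resistant."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]


# Hypothetical features: [scaled age, prior antibiotic exposure (0/1)].
X = [[0.2, 0], [0.3, 0], [0.4, 0], [0.6, 1], [0.7, 1], [0.9, 1]]
y = [0, 0, 0, 1, 1, 1]  # 1 = resistant (toy labels)

w, b = fit_logistic(X, y)
predictions = predict(X, w, b)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(w, b, accuracy)
```

The accuracy printed here describes only the toy data above and says nothing about the project's actual results. In practice a library implementation with a held-out test split would be the more sensible choice.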

Performance Summary:

This prototype sets the stage for tree-based models, SHAP interpretability, and cost-sensitive learning.


Data Cleaning & Preparation

The notebook includes extensive preprocessing:


Future Work


📖 Related Blog Post

A full narrative walkthrough of this project is available here:
👉 https://lucascarpantonio.github.io/antibiotic_resistance/blog/


🧑‍💻 Author

Luca — Data Scientist in progress, focusing on real-world analytical pipelines.