Convolutional Neural Network (CNN) in the domain of dermatology #
Author: Nikita Kurkulskiu
Full implementation on GitHub.
Introduction #
Skin-cancer screening with deep learning can reduce diagnostic workload, but
black-box models hinder clinical adoption.
We fine-tune a ResNet-18 to separate melanoma from benign lesions on
the open HAM10000 dermatoscopic dataset, then make its decisions
explainable and auditable with T-CAV (Testing with Concept Activation
Vectors). T-CAV quantifies how much human-defined concepts
(age, sex, body location, diagnosis labels) drive predictions, exposing hidden
bias and validating clinically relevant factors.
Background #
Classical explainability (Grad-CAM, LRP) highlights where a network looks, but
does not answer what semantic attributes it uses.
T-CAV fills this gap:
- Concept definition – A user provides a small image set exemplifying a concept (e.g., “male patients”).
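- CAV learning – A linear classifier is trained to separate the concept's activations from those of random images at a chosen layer; the normal to its decision boundary is the Concept Activation Vector (CAV).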
- TCAV score – The directional derivative of the class logit along the CAV is computed; the proportion of positive derivatives estimates concept influence.
- Bootstrap p-value – The score is compared against scores obtained from random sign-flipped CAVs to test whether it is statistically significant.
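The CAV-learning and scoring steps above can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the project's code: the activations and gradients are random NumPy arrays standing in for a real network layer, the linear separator is a small hand-written logistic regression, and the names `learn_cav` and `tcav_score` are hypothetical.

```python
import numpy as np

def learn_cav(concept_acts, random_acts, lr=0.1, steps=500):
    """Fit a logistic-regression separator and return its unit-norm weight vector."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of the mean log-loss
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)

def tcav_score(logit_grads, cav):
    """Fraction of examples whose directional derivative along the CAV is positive."""
    return float(np.mean(logit_grads @ cav > 0))

rng = np.random.default_rng(0)
dim = 8
concept_dir = np.zeros(dim)
concept_dir[0] = 1.0  # synthetic "concept" lives along the first axis

# Synthetic layer activations: concept examples are shifted along concept_dir.
concept_acts = rng.normal(size=(50, dim)) + 3.0 * concept_dir
random_acts = rng.normal(size=(50, dim))
cav = learn_cav(concept_acts, random_acts)

# Synthetic gradients of the class logit w.r.t. the activations,
# deliberately built to point along the concept direction.
aligned_grads = rng.normal(scale=0.1, size=(200, dim)) + concept_dir
score = tcav_score(aligned_grads, cav)
print(f"TCAV score: {score:.3f}")
```

Because the gradients here were constructed to align with the concept, the score comes out close to 1; with a real model the gradients would come from back-propagating the class logit to the chosen layer.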
Section 1 – Dataset #
- HAM10000 (Kaggle): 10 015 dermatoscopic JPEGs across 7 diagnostic categories.
- We convert metadata into >100 concepts (age buckets, sex, localisation, diagnosis, etc.) using concept_prep.py.
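As a rough illustration of the metadata-to-concept mapping, the sketch below turns one metadata row into concept names. It is not the contents of concept_prep.py: the helper `concepts_for_row`, the 10-year age buckets, and the example row (including its image ID) are assumptions. The column names follow the HAM10000 metadata CSV, and the naming pattern mirrors concepts reported later, such as sex_male, dx_mel and localization_lower_extremity.

```python
def concepts_for_row(row, bucket_size=10):
    """Return the concept names that a single metadata row belongs to."""
    names = []
    # Rows with an unknown or missing sex contribute no sex concept.
    if row.get("sex") in ("male", "female"):
        names.append(f"sex_{row['sex']}")
    if row.get("localization"):
        names.append("localization_" + row["localization"].replace(" ", "_"))
    if row.get("dx"):
        names.append(f"dx_{row['dx']}")
    age = row.get("age")
    if age not in (None, ""):
        low = int(float(age)) // bucket_size * bucket_size
        names.append(f"age_{low}_{low + bucket_size - 1}")
    return names

# Made-up example row in the shape of a csv.DictReader record.
example_row = {
    "image_id": "ISIC_0000000",
    "dx": "mel",
    "age": "65.0",
    "sex": "male",
    "localization": "lower extremity",
}
example_concepts = concepts_for_row(example_row)
print(example_concepts)
```

Each returned name would correspond to one sub-folder under concepts/ into which the image is copied.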
Section 2 – Ethical Motivation #
Demographic or anatomical biases in datasets can translate into unfair or unsafe predictions. Quantifying concept influence lets us:
- Detect gender/age bias.
- Validate model focus on clinical features instead of artefacts (e.g., histology stamp).
Methodology #
3.1 Code Organisation #
Xai-Project/
├── archive/ # Raw HAM10000 images + metadata CSV
├── data_dir/ # train/val/test split (benign vs malignant)
├── concepts/ # >100 sub-folders with concept images
├── tcav_results/ # Auto-generated bar-charts & report
├── train.py # CNN training script
├── data_prep.py # Builds data_dir/
├── concept_prep.py # Builds concepts/
├── tcav.py # Basic TCAV (single class)
├── statistical_check.py # Full pipeline: CAV, bootstrap, markdown report
└── TCAV_Report.md # Auto-generated
3.2 Model & Code Snippets #
Training (excerpt from train.py)
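    import torch.nn as nn
    from torchvision import models

    # Start from ImageNet-pretrained weights and replace the classifier head
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, 2)  # 2 classes: benign vs malignant
    ...
    for epoch in range(epochs):
        train_loss, train_acc = train_one_epoch(...)
        val_loss, val_acc = evaluate(...)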
T-CAV scoring (excerpt)
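    import torch

    def tcav_score(loader, model, capt, cav, cls_idx):
        """Fraction of inputs whose class logit increases along the CAV direction."""
        cav_t = torch.from_numpy(cav).float().to(device)  # device: module-level global
        hits = total = 0
        with torch.enable_grad():
            for x, _ in loader:
                x = x.to(device).requires_grad_(True)
                model.zero_grad()
                model(x)[:, cls_idx].sum().backward()
                # Assumption: the capt hook called retain_grad() on the stored
                # activation, so capt.act.grad holds d(logit)/d(activation).
                grad = capt.act.grad.flatten(1)
                dd = (grad * cav_t).sum(1)  # directional derivative along the CAV
                hits += (dd > 0).sum().item()
                total += x.size(0)
        return hits / total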
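To judge whether a TCAV score is more than chance, it can be compared against scores obtained from perturbed CAVs. The sketch below is one plausible reading of the "random sign-flipped CAVs" test, not the procedure in statistical_check.py: flipping the sign of each CAV coordinate independently, the two-sided comparison around 0.5, the number of draws, and the function name `sign_flip_p_value` are all assumptions, and the gradients are synthetic NumPy arrays.

```python
import numpy as np

def sign_flip_p_value(logit_grads, cav, n_draws=200, seed=0):
    """Return (observed TCAV score, empirical p-value against sign-flipped CAVs)."""
    rng = np.random.default_rng(seed)
    observed = float(np.mean(logit_grads @ cav > 0))
    extreme = 0
    for _ in range(n_draws):
        # Null CAV: same magnitudes as the real CAV, random sign per coordinate.
        signs = rng.choice([-1.0, 1.0], size=cav.shape)
        null_score = np.mean(logit_grads @ (cav * signs) > 0)
        # Count null scores at least as far from 0.5 as the observed score.
        if abs(null_score - 0.5) >= abs(observed - 0.5):
            extreme += 1
    # The +1 correction keeps the empirical p-value strictly positive.
    return observed, (extreme + 1) / (n_draws + 1)

rng = np.random.default_rng(1)
dim = 64
cav = np.ones(dim) / np.sqrt(dim)  # synthetic unit-norm CAV

# Synthetic gradients: unit-variance noise plus a strong component along the CAV.
grads = rng.normal(size=(200, dim)) + 3.0 * cav
observed, p_value = sign_flip_p_value(grads, cav)
print(f"score={observed:.3f}, p={p_value:.3f}")
```

The smallest p-value this scheme can report is 1 / (n_draws + 1), so the number of draws bounds the resolution of the test.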
3.3 Pipeline #
flowchart LR
A[data_prep.py] --> B(train.py)
B -->|model_best.pth| C(statistical_check.py)
C --> D[TCAV_Report.md<br>bar charts]
E(concept_prep.py) --> C
Insights #
Key TCAV Findings (significance threshold: bootstrap p ≤ 0.05) #
| Class | Concept | Score | p-value | Interpretation |
|---|---|---|---|---|
| malignant | sex_male | 1.000 | 0.030 | Strong male bias – model nearly always boosts melanoma likelihood for males. |
| malignant | dx_mel | 0.974 | 0.056 | High influence, marginally above the 0.05 threshold (model recognizes “melanoma”). |
| malignant | localization_chest | 0.990 | 0.054 | Chest location strongly pushes toward malignant; marginally above the 0.05 threshold. |
No benign concepts passed p ≤ 0.05; the highest-scoring benign concept was localization_lower_extremity (score 0.984, p = 0.090).
Bias & Fairness Analysis #
- Gender: sex_male TCAV = 1 → network over-predicts melanoma for men.
- Age: Scores > 0.70 for age ≥ 60, but p > 0.14 – weak evidence.
- Location: Chest/trunk dominate; ear/hand minimal influence, indicating imbalance.
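One way to check whether such findings reflect label imbalance in the data is to tabulate the malignant rate per metadata group. The sketch below is an illustrative audit helper, not part of the project: the function name `malignant_rate_by_group`, the choice of `mel` as the only malignant diagnosis code, and the four example rows are assumptions.

```python
from collections import defaultdict

def malignant_rate_by_group(rows, key, malignant_dx=("mel",)):
    """Return {group: fraction of rows with a malignant diagnosis} for one metadata column."""
    counts = defaultdict(lambda: [0, 0])  # group -> [malignant count, total count]
    for row in rows:
        group = row.get(key) or "unknown"
        counts[group][1] += 1
        if row.get("dx") in malignant_dx:
            counts[group][0] += 1
    return {group: malignant / total for group, (malignant, total) in counts.items()}

# Made-up rows in the shape of csv.DictReader records.
rows = [
    {"sex": "male", "dx": "mel"},
    {"sex": "male", "dx": "nv"},
    {"sex": "female", "dx": "nv"},
    {"sex": "female", "dx": "nv"},
]
rates = malignant_rate_by_group(rows, "sex")
print(rates)
```

A large gap between groups in the training split would suggest that a high concept score partly mirrors the label distribution rather than lesion appearance.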
Conclusion #
- Achieved ~90 % validation accuracy with ResNet-18 on HAM10000.
- T-CAV uncovered pronounced demographic and anatomical biases beyond lesion visuals.
- Recommended next steps:
  - Apply adversarial debiasing (prototype in statistical_check.py).
  - Introduce visual ABCD concepts (asymmetry, border irregularity).
  - Validate findings on external datasets (PH2, Derm7pt).