Architecture
6-stage ResNet with 12 residual blocks (32→512 channels), BatchNorm, and 1×1 conv skip connections. ~4.2M parameters.
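One residual block with a 1×1 conv skip connection might look like the sketch below. This is an illustration, not the project's actual code: the class name `ResidualBlock`, the 3×3 kernels, the ReLU activations, the `stride` argument, and the BatchNorm on the skip path are all assumptions, and PyTorch is assumed as the framework.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Hypothetical residual block: two 3x3 convs with BatchNorm plus a skip path."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # A 1x1 conv projects the input when channels or resolution change,
        # so the skip tensor can be added to the main path.
        if stride != 1 or in_ch != out_ch:
            self.skip = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.skip = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.skip(x))
```

Stacking such blocks while doubling `out_ch` at each stage is one way to get from 32 to 512 channels; the exact stage layout is not specified here.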
Training
Adam optimizer (lr=1e-4) with StepLR scheduling (step_size=5, gamma=0.1), so the learning rate drops tenfold every 5 scheduler steps. Early stopping monitors validation loss with patience=5.
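The training setup can be sketched as the loop below, assuming PyTorch and one scheduler step per epoch. The function name `train`, its arguments, the `max_epochs` default, and the returned history format are hypothetical; only the optimizer, scheduler, and patience values come from the description above.

```python
import torch
from torch import nn


def train(model, train_batches, val_batches, loss_fn, max_epochs=100, patience=5):
    """Hypothetical loop: Adam + StepLR, stopping when validation loss stalls."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
    best_val, bad_epochs, history = float("inf"), 0, []

    for epoch in range(max_epochs):
        lr = optimizer.param_groups[0]["lr"]  # learning rate used in this epoch
        model.train()
        for x, y in train_batches:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        scheduler.step()  # assumption: the scheduler is stepped once per epoch

        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_batches)
            val /= len(val_batches)
        history.append((epoch, val, lr))

        if val < best_val:
            best_val, bad_epochs = val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # no improvement for `patience` epochs
                break
    return history
```

A fuller version would typically also checkpoint the weights from the best epoch and restore them after stopping; that is omitted here for brevity.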
Key Innovation
A NaN-aware MSE loss masks missing labels so that only annotated coordinates contribute to the loss and its gradient. This lets the model train on all 7,049 samples instead of discarding the ~70% with incomplete annotations.
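A minimal sketch of such a masked loss, assuming PyTorch and that missing coordinates are stored as NaN in the target tensor. The function name `nan_mse_loss` is hypothetical, and averaging over the number of labeled values (rather than over the batch) is an assumption about the normalization.

```python
import torch


def nan_mse_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE over labeled coordinates only; NaN entries in `target` are ignored."""
    mask = ~torch.isnan(target)
    # Replace NaNs before subtracting: NaN * 0 is still NaN, so multiplying
    # the error by the mask afterwards would not be enough on its own.
    safe_target = torch.where(mask, target, torch.zeros_like(target))
    sq_err = (pred - safe_target) ** 2 * mask
    # clamp guards against 0/0 when a batch contains no labels at all
    return sq_err.sum() / mask.sum().clamp(min=1)
```

With this form, a sample that has only some keypoints annotated still contributes gradient for those keypoints, and its unlabeled outputs receive zero gradient.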
Output
30 values (x,y for 15 keypoints): eyes, eyebrows, nose tip, mouth corners, and lips. Predictions shown as red dots overlaid on each face.
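To overlay the predictions, the 30 outputs first have to be grouped into 15 coordinate pairs. The sketch below assumes PyTorch and an interleaved layout (x1, y1, x2, y2, ...); the function name `to_keypoints` is hypothetical, and the actual column order should be checked against the label file.

```python
import torch


def to_keypoints(pred: torch.Tensor) -> torch.Tensor:
    """Reshape (N, 30) network outputs into (N, 15, 2) (x, y) pairs.

    Assumes the 30 values are interleaved as x1, y1, x2, y2, ...
    """
    return pred.reshape(-1, 15, 2)
```

Each row `kp[i]` can then be split into `kp[i, :, 0]` (x) and `kp[i, :, 1]` (y) and drawn as red dots on top of the i-th face image.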