Follow-Me Gesture-Controlled Drone System

EECS 206A · Autonomous Systems · Drones · UC Berkeley · Aug — Dec 2024

An autonomous drone system that follows a person using real-time face tracking and responds to hands-free gesture commands for media capture. Built on the DJI Tello platform with PID-based feedback control, Haar Cascade face detection, and IMU-based snap gesture recognition via a custom sensor glove.

Designed for scenarios where both hands are occupied — such as mountain biking or kayaking — eliminating the need for a separate controller operator. The drone autonomously maintains a consistent spatial relationship with the target, recovers when the face leaves the frame, and executes media commands triggered by hand snaps.

Demo Walkthrough

  1. Program starts — drone scans the room by rotating until a face is detected
  2. Face locked — drone adjusts yaw, height, and distance to maintain a constant transform
  3. Target exits frame — drone remembers last known position and rotates in exit direction to reacquire
  4. Single snap gesture — drone flies a circular path while recording panoramic video, then resumes tracking
  5. Double snap gesture — drone captures and saves a still photo
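
The behaviors above can be read as a small mission state machine. A minimal sketch of the transitions (state and function names here are illustrative, not the repo's actual identifiers):

```python
from enum import Enum, auto

class Mode(Enum):
    SEARCH = auto()     # rotate until a face is detected
    TRACK = auto()      # hold a constant transform to the face
    REACQUIRE = auto()  # face lost: rotate toward last exit direction
    CIRCLE = auto()     # single snap: fly a circle while recording
    PHOTO = auto()      # double snap: capture a still photo

def next_mode(mode, face_visible, snaps):
    """One transition step of the follow-me behavior."""
    if snaps == 1:
        return Mode.CIRCLE
    if snaps == 2:
        return Mode.PHOTO
    if face_visible:
        return Mode.TRACK
    # no face: reacquire if we were tracking, otherwise keep scanning
    return Mode.REACQUIRE if mode == Mode.TRACK else Mode.SEARCH
```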
5 Team Members · 9 Python Modules · ~1K Lines of Code · 3-Axis PID Control

Team & My Role

During the course project I built the media control module (panoramic recording, photo capture, command dispatch), contributed to the face detection & following module, and ran end-to-end system testing. After the course ended I independently refactored and rewrote the entire codebase, restructuring modules, cleaning up the architecture, and adding infrastructure for planned improvements. The current open-source repository reflects almost entirely my post-course rewrite.

Honghuai Ke
Media control, face detection & following, system testing, full codebase refactor
Zehan Ma
Face detection, following module, integration
Tianhao Wu
Face detection, following module, integration
Xiaowen Wang
Sensor monitoring modules, system testing
Nolan Lautrette
Sensor monitoring, media control modules

System Architecture

High-level system design diagram
High-level system design — Python backend connects DJI Tello, face tracking, sensor module, and command pipeline
System communication pipeline
Communication pipeline — IMU sensors → I²C → Microcontroller → Serial → Laptop → WiFi → Drone

Key Features

PID Control Response

PID step response for yaw and vertical control
Simulated PID step response — yaw and vertical controllers converge within ±5% in ~2 seconds
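
The plotted response can be reproduced with a toy closed-loop simulation using the gains from the control table (Kp=0.2, Ki=0.04, Kd=0.005). The plant model here (command acts directly as a velocity, Tello rc-style) and the timescale are illustrative assumptions, not the drone's actual dynamics:

```python
def simulate_step(kp=0.2, ki=0.04, kd=0.005, setpoint=100.0,
                  dt=0.1, steps=600):
    """Toy first-order plant driven by the PID gains from the control table."""
    pos, integral, prev_err = 0.0, 0.0, setpoint
    trace = []
    for _ in range(steps):
        err = setpoint - pos
        integral += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        pos += (kp * err + ki * integral + kd * deriv) * dt  # velocity command
        trace.append(pos)
    return trace

trace = simulate_step()   # 60 simulated seconds at 10 Hz
```

With these gains the toy loop overshoots and rings before settling on the setpoint; the real controllers operate on pixel errors at a different scale, so the convergence time differs from the plot.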

Gesture Detection

Gesture detection state machine diagram
Snap gesture state machine — single snap triggers circle recording, double snap takes a photo
Raw IMU sensor data showing snap gesture spikes
Raw dual-IMU data — acceleration spikes from snap gestures detected by the threshold-based state machine

Multi-Rate Timing

Multi-rate system timing diagram
Three concurrent subsystems at different rates — decoupled via dual-process architecture and file-based IPC
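
The decoupling can be sketched in miniature: a fast loop publishes complete snapshots via write-then-rename, and a slow loop samples every Nth tick. Rates are modeled as tick counts rather than real processes, and the names are illustrative:

```python
import os
import pickle
import tempfile

def atomic_write(path, obj):
    """Write-then-rename so readers never observe a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(obj, f)
    os.replace(tmp, path)  # atomic on POSIX

path = os.path.join(tempfile.mkdtemp(), "commands.pkl")
samples, latest = [], []
for tick in range(200):              # fast loop: 200 Hz for one second
    samples.append(tick)
    atomic_write(path, samples)      # publish a complete snapshot
    if tick % 20 == 0:               # slow loop: 10 Hz (every 20th tick)
        with open(path, "rb") as f:
            latest = pickle.load(f)  # always a consistent snapshot
```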

Design Decisions

Tradeoffs & Limitations

Code Highlights

PID Controller with Anti-Windup
class PIDController:
    def __init__(self, kp, ki, kd, limit=100):
        self.kp, self.ki, self.kd, self.limit = kp, ki, kd, limit
        self._integral = 0.0
        self._prev_error = 0.0

    def update(self, error, dt):
        self._integral += error * dt
        self._integral = max(-self.limit, min(self.limit, self._integral))  # anti-windup
        derivative = (error - self._prev_error) / dt if dt > 0 else 0
        self._prev_error = error
        output = self.kp * error + self.ki * self._integral + self.kd * derivative
        return max(-self.limit, min(self.limit, output))
IMU Snap Detector (State Machine)
class SnapDetector:
    """Detect snap gestures from dual MPU6050 acceleration differentials."""
    def __init__(self, threshold=50_000, window=0.7):
        self.threshold = threshold   # accel differential, raw sensor units
        self.window = window         # snap-grouping window in seconds
        self.state = "idle"
        self.snap_count = 0
        self.last_snap = 0.0

    def detect(self, accel_1, accel_2, timestamp):
        diff = abs(accel_1 - accel_2)
        if self.state == "idle" and diff > self.threshold:
            self.state = "detected"            # rising edge: count one snap
            self.snap_count += 1
            self.last_snap = timestamp
        elif diff <= self.threshold:
            self.state = "idle"                # re-arm for the next spike
            if self.snap_count and timestamp - self.last_snap > self.window:
                count, self.snap_count = self.snap_count, 0
                return count                   # 1 = circle recording, 2 = photo
Atomic File-Based IPC
import os, pickle

class CommandChannel:
    """Lock-free IPC via atomic os.replace — no corruption, no locks."""
    def __init__(self, path):
        self.path = path
        self._cursor = 0    # index of the first unread command

    def write(self, data):
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(data, f)
        os.replace(tmp, self.path)  # atomic on POSIX — no partial reads

    def read_new(self):
        with open(self.path, "rb") as f:
            data = pickle.load(f)
        new = data[self._cursor:]   # only unprocessed commands
        self._cursor = len(data)
        return new

How It Works

Face Detection: Each video frame is converted to grayscale and processed by a Haar Cascade classifier. When multiple faces are detected, the system selects the largest by bounding box area (the closest person). The face center coordinates and area are passed to the tracking controller.
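
The largest-face selection step can be sketched as a small helper operating on the (x, y, w, h) boxes that cv2.CascadeClassifier.detectMultiScale returns (the function name here is illustrative):

```python
def select_target(faces):
    """Pick the largest detection (the closest person) and return
    its center coordinates and bounding-box area.

    `faces` is a sequence of (x, y, w, h) boxes, the format returned by
    cv2.CascadeClassifier.detectMultiScale on a grayscale frame.
    """
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])
    return (x + w // 2, y + h // 2, w * h)
```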

PID Tracking: Two independent PID controllers compute yaw speed (to center the face horizontally) and vertical speed (to position the face at the upper quarter of the frame). Forward/backward movement uses threshold logic on the bounding box area to maintain the target within a 14,000–15,000 pixel range. When the face is lost for more than 15 consecutive frames, the drone rotates in the last known direction to search.
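
A sketch of the error computation described above, with illustrative frame dimensions, gains, and sign conventions (the real loop feeds yaw_err and vert_err through the PID controllers before sending commands):

```python
FRAME_W, FRAME_H = 960, 720          # assumed Tello stream resolution
AREA_RANGE = (14_000, 15_000)        # target bounding-box area (pixels)

def track_step(cx, cy, area):
    """Map a face observation (center + area) to raw tracking errors
    and a forward/backward command."""
    yaw_err = cx - FRAME_W // 2      # center the face horizontally
    vert_err = FRAME_H // 4 - cy     # hold the face at the upper quarter
    if area < AREA_RANGE[0]:
        fb = 20                      # face too small: move forward
    elif area > AREA_RANGE[1]:
        fb = -20                     # face too large: back away
    else:
        fb = 0                       # within range: hold distance
    return yaw_err, vert_err, fb
```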

Gesture Recognition: Two MPU6050 IMU sensors mounted on a glove detect snap gestures by monitoring the acceleration differential between sensors. When the difference exceeds a 50,000-unit threshold, a snap is registered. A state machine with hysteresis prevents false positives, and consecutive snaps within a 0.7-second window are grouped into a single gesture command.

Command Pipeline: The gesture process writes snap counts to a pickle cache file using atomic file operations (write to temp file, then os.replace). The tracking process polls for new commands: 1 snap triggers a circular panoramic recording flight, 2 snaps capture a photo. This decoupled architecture allows the 200Hz sensor loop and 10Hz control loop to operate independently.

Hardware

DJI Tello drone with 720p camera
DJI Tello — 720p camera, WiFi
Gesture recognition glove with labeled IMU sensors and Teensy
Gesture glove — 2× IMU + Teensy 4.1
Teensy 4.1 microcontroller
Teensy 4.1 — ARM Cortex-M7, 600MHz
MPU6050 6-axis IMU sensor
MPU6050 — 6-axis accel + gyro
Drone: DJI Tello (WiFi, 720p camera, ~10 min flight)
Microcontroller: Teensy 4.1 (ARM Cortex-M7, 600MHz)
Sensors: 2× MPU6050 IMU (6-axis accel + gyro)
Interface: Custom gesture glove (I2C wired to Teensy)

Software Stack

Language: Python 3.9+
Vision: OpenCV (Haar Cascade Classifier)
Drone SDK: djitellopy (DJI Tello Python SDK)
Control: PID (Kp=0.2, Ki=0.04, Kd=0.005)
IPC: Atomic pickle file channel
Sensors: PySerial @ 38400 baud (USB)
Python · OpenCV · PID Control · DJI Tello · Computer Vision · Autonomous Flight · Gesture Recognition · IMU · Robotics · EECS 206A

Challenges & Solutions

Future Work
