Most apps that use AI to analyze what you’re doing — your face, your voice, your gestures — send that data to a server. A camera feed goes up, gets processed on someone else’s hardware, and a result comes back. That’s cloud-based inference, and it’s the default architecture for most AI-powered products.
On-device AI detection is different. The machine learning model runs entirely on your computer. Your data never leaves your machine. There’s no upload, no server, no round trip. For an app that watches you through a camera to detect a habit, this distinction matters enormously.
Here’s how it works, using Nailed as a concrete example.
The Standard Cloud Approach (And Why It’s a Problem)
In a typical cloud-based AI pipeline, the flow looks like this:
- Your device captures input (camera frame, audio clip, sensor data)
- That input gets compressed and sent to a remote server
- The server runs the ML model on the input
- The server sends the result back to your device
- Your device acts on the result
This architecture has three fundamental problems for anything involving a camera or microphone:
Latency. The round trip takes time. Even on a fast connection, you’re looking at 100–500ms of delay. For something like nail biting detection, where the goal is to alert you before or as the behavior happens, that delay makes the system significantly less useful.
Privacy exposure. Your camera feed — frames of your face, your hands, your environment — travels across the internet to a third-party server. Even if the company promises encryption and deletion, the data physically exists on hardware you don’t control, processed by code you can’t inspect.
Dependency. No internet? No detection. Server goes down? No detection. Company shuts down? No detection. Cloud-dependent apps have a single point of failure that you can’t control.
On-Device: The Model Lives on Your Machine
On-device AI detection eliminates the server entirely. The machine learning model is bundled with the application and runs locally.
For Nailed, the architecture looks like this:
- Your Mac’s camera captures a frame
- The frame goes directly to a local ML model running in WebAssembly
- The model outputs hand landmarks and face landmarks
- Local code analyzes whether a hand-to-mouth gesture is occurring
- If detected, your Mac flashes the screen and plays an audio alert
- The frame is discarded. Nothing is saved or transmitted.
Every step happens on your hardware. The frame never touches a network interface. There’s no HTTP request, no WebSocket connection, no telemetry endpoint. The app works identically whether you’re connected to gigabit fiber or sitting in airplane mode.
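The flow above can be sketched as one function that handles a single frame. This is a minimal sketch, not Nailed’s actual code: `FrameLandmarks`, `Detector`, `processFrame`, and `fingertipNearMouth` are hypothetical names, the 0.05 threshold is an arbitrary illustration, and the model is injected as a plain function so the sketch doesn’t depend on any particular ML library.

```typescript
// Hypothetical types for this sketch; not taken from Nailed's source.
interface Point {
  x: number;
  y: number;
}

interface FrameLandmarks {
  fingertips: Point[]; // empty when no hand is visible
  mouth: Point | null; // null when no face is visible
}

// Stand-in for the local model. A real app would wrap its ML runtime here.
type Detector = (pixels: Uint8ClampedArray) => FrameLandmarks;

// Runs one frame through inference, decision, alert, and discard.
// Nothing in this function touches the disk or the network.
function processFrame(
  pixels: Uint8ClampedArray,
  detect: Detector,
  isGesture: (landmarks: FrameLandmarks) => boolean,
  onAlert: () => void,
): boolean {
  const landmarks = detect(pixels); // local inference
  const triggered = isGesture(landmarks); // local decision
  if (triggered) {
    onAlert(); // e.g. flash the screen, play a sound
  }
  pixels.fill(0); // overwrite the frame so no image data lingers
  return triggered;
}

// Illustrative decision rule: any fingertip within 0.05 (normalized units)
// of the mouth counts as a hand-to-mouth gesture.
function fingertipNearMouth(landmarks: FrameLandmarks): boolean {
  const mouth = landmarks.mouth;
  if (mouth === null) return false;
  return landmarks.fingertips.some(
    (tip) => Math.hypot(tip.x - mouth.x, tip.y - mouth.y) < 0.05,
  );
}

// Fake detector for demonstration: always reports a fingertip near the mouth.
const fakeDetect: Detector = () => ({
  fingertips: [{ x: 0.5, y: 0.62 }],
  mouth: { x: 0.5, y: 0.6 },
});
```

Because the only value that leaves `processFrame` is a boolean, the design makes it easy to audit that no image data survives past the call.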
MediaPipe: Google’s On-Device ML Framework
The specific ML framework that makes this practical is MediaPipe, developed by Google’s research team. MediaPipe provides pre-trained, optimized models for common perception tasks:
- Hand landmark detection — identifies 21 3D landmarks per hand (fingertip positions, knuckle joints, wrist)
- Face landmark detection — maps 478 facial landmarks including lip boundaries, jawline, and nose position
- Pose estimation — full-body skeletal tracking
- Object detection, gesture recognition, and more
These models are specifically designed for edge deployment. They’re small (typically 2–10 MB per model), fast (real-time inference on consumer hardware), and accurate enough for production use.
Nailed uses two MediaPipe models: the hand landmarker and the face landmarker. By tracking where your fingers are relative to your mouth in 3D space, it determines whether you’re about to bite your nails. The gesture detection logic runs on top of the raw landmark data — no cloud inference needed.
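To make the idea of gesture logic on top of raw landmarks concrete, here is a hedged sketch of one possible hand-to-mouth check. It is an assumption about how such logic could look, not Nailed’s implementation. The fingertip indices (4, 8, 12, 16, 20) follow MediaPipe’s 21-point hand model. The face indices in `DEFAULT_MOUTH` are commonly cited positions for the inner lips and mouth corners and should be verified against the model’s landmark map. The sketch also compares only x and y, on the assumption that the two models report depth relative to different origins, and it ignores image aspect ratio.

```typescript
interface Landmark {
  x: number; // normalized image coordinate
  y: number; // normalized image coordinate
  z: number; // model-relative depth, unused in this sketch
}

// Fingertips in MediaPipe's hand model: thumb, index, middle, ring, pinky.
const FINGERTIP_INDICES = [4, 8, 12, 16, 20];

// Hypothetical configuration. The indices are assumptions to verify.
interface MouthConfig {
  upperLip: number;
  lowerLip: number;
  leftCorner: number;
  rightCorner: number;
  ratio: number; // trigger when fingertip distance < ratio * mouth width
}

const DEFAULT_MOUTH: MouthConfig = {
  upperLip: 13,
  lowerLip: 14,
  leftCorner: 61,
  rightCorner: 291,
  ratio: 1.0,
};

function dist2d(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Returns true when any fingertip is within `ratio` mouth-widths of the
// mouth center. Scaling by mouth width keeps the check roughly independent
// of how far the user sits from the camera.
function isHandNearMouth(
  hand: Landmark[],
  face: Landmark[],
  cfg: MouthConfig = DEFAULT_MOUTH,
): boolean {
  const upper = face[cfg.upperLip];
  const lower = face[cfg.lowerLip];
  const left = face[cfg.leftCorner];
  const right = face[cfg.rightCorner];
  if (upper === undefined || lower === undefined) return false;
  if (left === undefined || right === undefined) return false;

  const center: Landmark = {
    x: (upper.x + lower.x) / 2,
    y: (upper.y + lower.y) / 2,
    z: 0,
  };
  const mouthWidth = dist2d(left, right);
  if (mouthWidth === 0) return false;

  return FINGERTIP_INDICES.some((i) => {
    const tip = hand[i];
    return tip !== undefined && dist2d(tip, center) / mouthWidth < cfg.ratio;
  });
}
```

A production version would likely add smoothing across frames and a tuned threshold; the ratio of 1.0 here is only a placeholder.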
WebAssembly: Near-Native Performance in Any App
Running ML models on-device used to require native code — compiled C++ or platform-specific frameworks like Core ML. WebAssembly (Wasm) changed that.
WebAssembly is a binary instruction format that runs at near-native speed in sandboxed environments. MediaPipe compiles to Wasm, which means it can run inside an Electron or browser-based application with performance close to what you’d get from native compiled code.
For Nailed, this means:
- No native dependencies — the ML models run in a Wasm runtime, making the app simpler to build and distribute
- Sandboxed execution — the model runs in an isolated environment with no direct access to your filesystem, network, or other processes
- Cross-architecture support — Wasm runs on both Intel and Apple Silicon, though Nailed targets Apple M1+ for optimal performance
- Small footprint — the entire detection pipeline adds a few megabytes to the app, not hundreds
Wasm performs particularly well on Apple Silicon. M-series chips have a powerful Neural Engine and GPU, but even the CPU execution path through Wasm provides smooth, real-time inference for MediaPipe’s lightweight models.
What “Real-Time” Actually Means
When Nailed processes camera frames, it operates on a continuous loop. Each frame from your Mac’s camera enters the pipeline, gets processed by the hand and face landmarkers, and either triggers an alert or doesn’t. The entire process — capture, inference, decision — takes single-digit milliseconds on an M1 or later Mac.
Cloud processing can’t match this. Even on a fast connection, a cloud round trip typically adds 100 milliseconds or more to every frame. For habit detection, where the window between hand-rising and nail-touching might be 1–2 seconds, that delay eats a meaningful share of the window and leaves far fewer chances to catch the behavior before it happens.
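A rough back-of-the-envelope calculation shows how per-frame latency limits the number of chances to catch the gesture. The sketch below is illustrative arithmetic only. It assumes frames are handled one at a time with no pipelining, a camera delivering a frame about every 33 ms (roughly 30 fps), 8 ms for local processing, and 250 ms for a cloud round trip. All of these figures are assumptions chosen from the ranges quoted in this article, and `decisionsInWindow` is a hypothetical helper.

```typescript
// Counts how many capture-to-decision cycles fit in a reaction window,
// assuming frames are handled one at a time (no pipelining).
function decisionsInWindow(
  windowMs: number,
  frameIntervalMs: number,
  processingMs: number,
): number {
  // A new decision can start only when a frame is available and the
  // previous one has finished, so the slower of the two sets the pace.
  const cycleMs = Math.max(frameIntervalMs, processingMs);
  return Math.floor(windowMs / cycleMs);
}

const WINDOW_MS = 1000; // lower end of the 1 to 2 second window
const FRAME_INTERVAL_MS = 33; // assumed camera frame interval (~30 fps)

// Local processing keeps up with the camera: 30 decisions per second.
const localDecisions = decisionsInWindow(WINDOW_MS, FRAME_INTERVAL_MS, 8);

// A 250 ms round trip becomes the bottleneck: 4 decisions per second.
const cloudDecisions = decisionsInWindow(WINDOW_MS, FRAME_INTERVAL_MS, 250);
```

Under these assumptions the local path gets 30 looks at the gesture in one second and the cloud path gets 4, which is where the practical difference comes from.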
On-device processing also means consistent timing. There’s no variable network latency, no server queue, no congestion-dependent jitter. Frame N takes approximately the same time to process as frame N+1. For real-time applications, this consistency often matters as much as raw speed.
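One way to put a number on that consistency is to record per-frame durations and compare their spread. The sketch below uses the population standard deviation as a simple jitter measure. The sample values are made up for illustration and are not measurements from Nailed or from any real network, and `timingStats` is a hypothetical helper.

```typescript
interface TimingStats {
  meanMs: number;
  jitterMs: number; // population standard deviation of the samples
}

// Summarizes a list of per-frame durations in milliseconds.
function timingStats(durationsMs: number[]): TimingStats {
  if (durationsMs.length === 0) {
    throw new Error("need at least one sample");
  }
  const count = durationsMs.length;
  const meanMs = durationsMs.reduce((sum, d) => sum + d, 0) / count;
  const variance =
    durationsMs.reduce((sum, d) => sum + (d - meanMs) * (d - meanMs), 0) / count;
  return { meanMs, jitterMs: Math.sqrt(variance) };
}

// Made-up samples: steady local frames versus variable round trips.
const localStats = timingStats([8, 9, 8, 9]);
const cloudStats = timingStats([120, 310, 150, 480]);
```

In a real app the samples could come from timestamps taken around each frame’s processing. A low `jitterMs` means alerts arrive at a predictable delay after the gesture starts.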
The Privacy Architecture
On-device processing isn’t just a performance choice — it’s a privacy architecture. Here’s what it means concretely:
No data at rest. Camera frames exist only in memory during processing. They’re never written to disk, never cached, never logged. Once the landmark extraction is complete, the frame is gone.
No data in transit. There is no network component in Nailed’s detection pipeline. The app doesn’t make HTTP requests for inference, analytics, telemetry, or any other purpose. Network monitoring tools will show zero outbound traffic from the detection system.
No data on servers. There is no server. Nailed doesn’t have a backend, a database, an API, or cloud infrastructure. There’s nothing to breach because nothing ever leaves your Mac.
No behavioral profiles. Some cloud-based habit apps build profiles of your behavior over time — when you bite, how often, what triggers it. Nailed doesn’t track any of this. Each detection is independent and ephemeral. The app has no memory of previous sessions.
This matters because camera data is among the most sensitive information an app can access. A camera feed of your face, focused on your mouth and hands, running for hours at a time while you work — that’s exactly the kind of data that should never leave your machine.
Edge Computing vs. Cloud: A Technical Comparison
| Factor | Cloud Processing | On-Device (Edge) |
|---|---|---|
| Latency | 100–500ms+ | 1–10ms |
| Offline support | No | Yes |
| Privacy | Data leaves device | Data stays local |
| Accuracy | Potentially higher (larger models) | Excellent for focused tasks |
| Scalability | Server costs scale with users | Zero server cost |
| Reliability | Depends on internet + server uptime | Always available |
| Cost to developer | Ongoing infrastructure costs | One-time model optimization |
For focused, real-time tasks like gesture detection, the edge computing approach is superior on almost every axis. The one area where cloud has a theoretical advantage, the ability to run much larger models, doesn’t apply to tasks where lightweight, optimized models already achieve the necessary accuracy.
When On-Device AI Makes Sense
Not every AI application should run on-device. Large language models with billions of parameters still generally need server-side hardware (though that’s changing fast). Complex multi-modal tasks that combine vision, language, and reasoning may benefit from cloud resources.
But for these categories, on-device is clearly the better choice:
- Camera and microphone processing — anything analyzing your face, voice, or environment
- Health-adjacent applications — habit tracking, posture monitoring, exercise form
- Real-time feedback systems — where latency directly affects usefulness
- Privacy-critical applications — where data sensitivity outweighs the benefits of cloud processing
- Offline-capable tools — where internet dependency is unacceptable
Nailed sits at the intersection of all five categories. It processes camera data, it’s health-adjacent, it needs real-time feedback, the data is highly sensitive, and it should work without internet. On-device processing isn’t a nice-to-have here — it’s the only architecture that makes sense.
What to Look for in Privacy Claims
Not all “on-device” claims are equal. Some apps process data locally but still phone home with metadata, usage statistics, or anonymized behavioral data. Here’s what to verify:
- Does the app work in airplane mode? If it does, the core functionality is genuinely local.
- Does it require an account? Account creation implies a server and data storage.
- What does network monitoring show? Tools like Little Snitch or Wireshark can verify whether an app makes outbound connections.
- Is the privacy policy specific? “We take your privacy seriously” means nothing. Look for concrete statements: “No data is collected. No data is transmitted. No servers are used.”
- Is the processing model documented? Apps with genuine on-device processing can and should explain their technical architecture.
Nailed’s approach: works offline, no account required, no outbound connections, privacy policy explicitly states zero data collection, and the technical architecture (MediaPipe + WebAssembly + Electron) is documented.
The Future of On-Device AI
The trend is moving toward more on-device processing, not less. Apple’s Neural Engine gets more powerful every chip generation. WebAssembly performance improves yearly. ML model optimization techniques (quantization, pruning, knowledge distillation) keep shrinking model sizes while maintaining accuracy.
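As an illustration of why quantization shrinks models, here is a minimal sketch of symmetric 8-bit quantization for a single weight array. It is a generic textbook scheme, not the specific method used by MediaPipe or any other framework, and `quantizeInt8` and `dequantize` are hypothetical helpers. Each 32-bit float becomes one signed byte plus a shared scale factor, so storage drops to about a quarter.

```typescript
interface QuantizedTensor {
  values: Int8Array; // one byte per weight
  scale: number; // multiply by this to recover approximate floats
}

// Maps floats to the range [-127, 127] using a single shared scale.
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  const maxAbs = weights.reduce((m, w) => Math.max(m, Math.abs(w)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127; // avoid dividing by zero
  const values = Int8Array.from(weights, (w) =>
    Math.max(-127, Math.min(127, Math.round(w / scale))),
  );
  return { values, scale };
}

// Recovers approximate floats; each is off by at most half a scale step.
function dequantize(tensor: QuantizedTensor): Float32Array {
  return Float32Array.from(tensor.values, (v) => v * tensor.scale);
}

const originalWeights = new Float32Array([0.5, -1, 0.25, 0]);
const quantized = quantizeInt8(originalWeights);
const restored = dequantize(quantized);
// originalWeights.byteLength is 16; quantized.values.byteLength is 4.
```

The rounding error is bounded by half the scale step per weight, which is why accuracy often holds up well for small, focused models.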
What required a data center five years ago runs on a laptop today. What requires a laptop today will run on a phone next year. The practical result is that fewer and fewer AI applications need to send your data to a server.
For users, this shift is unambiguously good. Faster responses, better privacy, offline capability, and no dependency on someone else’s infrastructure. The tradeoff — slightly smaller models with slightly narrower capabilities — is a tradeoff most people would happily make for a camera-based application that runs eight hours a day.
Nailed is built entirely on this principle. A $4.99 one-time purchase, running on your Mac, processing everything locally, collecting nothing. That’s what on-device AI detection looks like in practice.
Frequently Asked Questions
Does on-device AI detection work without an internet connection?
Yes. Once the model files are downloaded with the app, all processing happens locally on your hardware. No internet connection is required for detection to function. This is one of the core advantages of on-device processing — it works offline, every time.
Is on-device AI detection as accurate as cloud-based processing?
For many tasks, yes. The models that ship with frameworks like MediaPipe are specifically optimized for on-device performance and achieve high accuracy for tasks like hand tracking and gesture recognition. The accuracy gap between edge and cloud models has narrowed significantly, especially for focused, single-purpose detection tasks.
Does on-device processing drain battery or slow down my computer?
Modern on-device models are designed for efficiency. Nailed runs its MediaPipe models in a WebAssembly runtime, with minimal CPU overhead on Apple Silicon. Most users don’t notice any performance impact during normal use. The models are small (a few megabytes) and the processing is lightweight.
What data does Nailed collect during detection?
None. Zero. Nailed processes camera frames in real-time on your Mac, detects hand and face landmarks, runs the gesture analysis, and discards every frame immediately. No images are saved, no data is logged, no information leaves your device. There is no server to send data to.