You’re at the grocery store and someone calls: “Hey, can you pick up the dry cleaning by 5?” You say sure. You hang up. You grab your groceries. You drive home. You unpack. At 6:30, standing in the kitchen, it hits you.

The dry cleaning.
The problem wasn’t that you forgot. The problem was that the moment you heard it, you had a three-second window to capture it — and nothing in your toolkit could move that fast.
The Working Memory Window
Cognitive science has a useful concept: working memory capacity. It’s the number of things you can hold in your head at once. For most people, it’s 4-7 items. For some people, it’s fewer. Regardless of the number, there’s a universal rule: new information pushes old information out.
When someone says “dry cleaning by 5,” that occupies one slot. Then you put your phone away (slot replaced by next thought), walk to the produce section (replaced by “do I need onions?”), and the dry cleaning is gone. Not forgotten in the long-term sense — you’ll remember it later, probably too late. Gone from working memory, where action happens.
The only way to beat this: capture the information before working memory moves on. That means right now. Not in two minutes. Not when you’re done shopping. Now.
Measuring Input Speed
I timed the major input methods for adding “dry cleaning by 5pm” to a planning tool:
Typing into a calendar app:
- Unlock phone (1s)
- Open Calendar (2s)
- Tap + (1s)
- Type “Dry cleaning” (3s)
- Scroll to time picker, set 5pm (4s)
- Tap Save (1s)
Total: ~12 seconds
Typing into a todo app:
- Unlock phone (1s)
- Open app (2s)
- Tap + (1s)
- Type “Pick up dry cleaning by 5pm” (5s)
- Tap done (1s)
Total: ~10 seconds
Voice input:
- Trigger voice input (1s)
- Say “Pick up dry cleaning by 5” (2s)
Total: ~3 seconds
Twelve seconds vs. three seconds. That’s the difference between a tool you’ll use in a grocery store aisle and a tool you’ll use only when you’re sitting at your desk with time to spare.
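To check the arithmetic, here is a minimal Python sketch that tallies the step timings listed above. The names `METHODS`, `total_seconds`, and `within_window` are made up for illustration, and the three-second cutoff is this article's working threshold, not a measured constant.

```python
# Step timings in seconds, copied from the lists above.
METHODS = {
    "calendar app": [1, 2, 1, 3, 4, 1],
    "todo app": [1, 2, 1, 5, 1],
    "voice": [1, 2],
}

# Assumption: the three-second capture window argued for in this article.
THRESHOLD_S = 3


def total_seconds(method):
    """Sum the per-step timings for one input method."""
    return sum(METHODS[method])


def within_window(method, threshold=THRESHOLD_S):
    """True if the whole capture fits inside the threshold."""
    return total_seconds(method) <= threshold


for name in METHODS:
    print(f"{name}: {total_seconds(name)}s, within window: {within_window(name)}")
```

Swap in your own timings to see where your current tool lands.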
Why Three Seconds Is the Threshold
Three seconds is approximately:
- The length of a single spoken sentence
- The window before a new thought replaces the current one
- The maximum effort most people will exert for a “small” task while doing something else
Above three seconds, the cost-benefit calculation flips. The effort of capturing feels larger than the risk of forgetting. Your brain says “I’ll remember this” — not because it will, but because adding it to the tool feels like too much friction for something that seems simple.
Below three seconds, capture is essentially free. It costs less effort than switching apps on your phone. It happens before the thought fades. It becomes reflexive instead of deliberate.

The Friction Cascade
Every second of input friction has a compounding effect:
At 3 seconds: You capture things as they happen, even when busy. Your system is current. Nothing falls through the cracks because the crack doesn’t have time to open.
At 10 seconds: You capture things when it’s convenient. Some things get captured. Others get deferred (“I’ll add it later”) and forgotten. Your system is partially current. Gaps appear.
At 30 seconds: You capture things during dedicated planning time. Everything between planning sessions is untracked. Your system is a snapshot, not a live view. Major gaps. Missed appointments.
At 60+ seconds: You stop capturing. The tool is too cumbersome. You’re back to “I’ll just remember.”
Each slower tier loses a larger share of the things that happen in your life. The only tier that catches everything is the first one: three seconds or less.
Voice Is the Only Input That Hits Three Seconds
There’s no typing-based interface that can match voice speed for natural-language input. Here’s why:
Typing requires translation. You have to convert a thought (“oh right, dentist Thursday afternoon”) into discrete fields: title, date, time. That translation step is where friction lives.
Voice is pre-translated. You just say the thought. “Dentist Thursday at 2pm.” The words come out in the same form the thought exists in your head. No translation needed. The tool parses the natural language and creates the structured event.
Typing requires both hands and your eyes. You need to look at the screen, navigate the app, and type. You can’t do this while driving, carrying groceries, or walking into a meeting.
Voice requires only your mouth. You can capture while doing almost anything else. The physical barrier is nearly zero.
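To make “the tool parses the natural language and creates the structured event” concrete, here is a toy Python sketch of that parsing step. It is not how any particular product works. `Event` and `parse_capture` are hypothetical names, the parser only understands a weekday name and an “at/by N(am|pm)” time, and it assumes a bare hour like “by 5” means the afternoon.

```python
import re
from dataclasses import dataclass
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

_DAY = re.compile(r"\b(" + "|".join(WEEKDAYS) + r")\b", re.IGNORECASE)
_TIME = re.compile(r"\b(?:at|by)\s+(\d{1,2})\s*(am|pm)?\b", re.IGNORECASE)


@dataclass
class Event:
    title: str
    weekday: Optional[str]  # lowercase weekday name, or None
    hour: Optional[int]     # 24-hour clock, or None


def parse_capture(text: str) -> Event:
    """Split a spoken phrase into title, weekday, and hour."""
    weekday = None
    hour = None

    day_match = _DAY.search(text)
    if day_match:
        weekday = day_match.group(1).lower()
        text = text[:day_match.start()] + text[day_match.end():]

    time_match = _TIME.search(text)
    if time_match:
        hour = int(time_match.group(1)) % 12
        # Assumption: with no am/pm given, treat the hour as pm.
        if (time_match.group(2) or "pm").lower() == "pm":
            hour += 12
        text = text[:time_match.start()] + text[time_match.end():]

    # Whatever is left, with whitespace collapsed, becomes the title.
    return Event(title=" ".join(text.split()), weekday=weekday, hour=hour)


print(parse_capture("Dentist Thursday at 2pm"))
# Event(title='Dentist', weekday='thursday', hour=14)
```

The point is the division of labor: you speak the thought in one breath, and the translation into fields happens in software instead of in your thumbs.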

What This Means for Tool Selection
When evaluating a planning tool, the most important feature isn’t the number of views, the integration list, or the AI capabilities. It’s this:
How many seconds does it take to go from “I just learned about something” to “it’s captured”?
If the answer is more than three seconds, you will lose things. Not because you’re careless. Because the tool is slower than your life.
The tools that survive long-term adoption are the ones that match the speed of thought. Everything else is a demo that works great during a calm Sunday planning session and fails completely during a busy Tuesday.
Your life happens in real time. Your planning tool should too.


