Voice planning on iPhone in 2026 means saying “dentist Tuesday at 2pm” and having Composed create a calendar event, resolve the location, generate an AI prep checklist, and schedule a departure alert, all in under 4 seconds and without typing. Composed beats Siri at intent parsing and Apple Reminders at follow-through because Composed’s edge function parse-event reads natural language end-to-end instead of matching it against a fixed reminder schema.

The rest of this page is the deep version: what happens in each of those 4 seconds, what Composed parses that Siri cannot, where the audio goes, how the ambiguity (“tomorrow at 3,” “next Tuesday,” “in two weeks”) gets resolved, and what to do when voice planning gets it wrong. If you want the marketing tour instead, the voice-to-calendar landing page has the highlights with screenshots.

How does voice planning actually work on iPhone?

Voice planning on iPhone is a four-step pipeline: capture the audio, transcribe it to text with OpenAI Whisper, parse the text into a structured event with the parse-event edge function, and write the event to Apple Calendar with a generated prep checklist and a departure alert. The whole pipeline runs in under 4 seconds for a typical sentence like “dentist Tuesday at 2pm on Main Street.”

Here is the 4-second flow, step by step.

Second 1 — Capture. You tap the voice button in Composed and speak. The audio buffers locally on iPhone using AVAudioEngine while you talk. There is no wake word and no continuous listening — Composed only records when the button is held or tapped.

Second 2 — Transcribe. The audio uploads to Composed’s transcribe-audio edge function on Supabase, which forwards it to OpenAI Whisper. Whisper returns text that preserves natural phrasing: “dentist Tuesday at 2pm on Main Street” comes back exactly as spoken, not normalized to “Event: dentist. Date: Tuesday. Time: 2:00 PM.”

Second 3 — Parse. The transcribed text hits the parse-event edge function. This is the load-bearing part. Where Siri pattern-matches against a fixed reminder schema (title + date + optional location), parse-event reads the whole sentence as natural language and extracts five fields at once: title (“dentist”), event date (Tuesday’s actual ISO date), time of day (14:00), location (“Main Street”), and event category (appointment).

Second 4 — Place lookup + write. The location string (“Main Street”) gets resolved through the search-places edge function, which queries nearby venues to spell the address correctly. The event writes to Composed’s local data store and syncs to Apple Calendar. The generate-checklist edge function runs in parallel to produce three to five prep tasks (“Bring insurance card,” “Confirm parking validation,” “Review last cleaning notes”). A departure alert schedules itself via LeaveByCalculator using current travel time plus, for flights, the 120-minute domestic or 180-minute international airport buffer.

You watched the dictation appear, tapped confirm, and put your phone back in your pocket. Composed handled the calendar, the location, the prep, and the leave-by — without typing.
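As a rough illustration of the parse step, here is what the five extracted fields could look like as a typed record. This is a sketch only: the type name, the field names other than eventDate, and the validation helper are assumptions made for illustration, not Composed's actual schema.

```typescript
// Hypothetical shape of a parsed event. Field names are illustrative,
// not Composed's real schema.
type EventCategory = "appointment" | "flight" | "social" | "meeting" | "todo";

interface ParsedEvent {
  title: string;
  eventDate: string; // ISO calendar date, e.g. "2026-01-13"
  timeOfDay: string; // 24-hour "HH:mm"
  location: string | null;
  category: EventCategory;
}

// Checks that the title is non-empty and the date and time strings are well-formed.
function isWellFormed(e: ParsedEvent): boolean {
  const dateOk = /^\d{4}-\d{2}-\d{2}$/.test(e.eventDate);
  const timeOk = /^([01]\d|2[0-3]):[0-5]\d$/.test(e.timeOfDay);
  return e.title.trim().length > 0 && dateOk && timeOk;
}

// What "dentist Tuesday at 2pm on Main Street" might parse to,
// assuming it was spoken on Monday 2026-01-12.
const dentist: ParsedEvent = {
  title: "dentist",
  eventDate: "2026-01-13",
  timeOfDay: "14:00",
  location: "Main Street",
  category: "appointment",
};
```

A check like isWellFormed would sit between parsing and the calendar write, so a malformed date never reaches Apple Calendar.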

What can Composed parse that Siri cannot?

Composed parses five fields at once from a single conversational sentence, while Siri parses one to two and routes the rest into a generic “Reminders” item. The structural difference is that Siri matches against EKReminder (Apple’s reminder schema) where Composed parses against an event schema that includes event date, time, location, attendees, category, prep, and travel intent.

What this looks like in practice:

| You say | Composed parses | Siri parses |
| --- | --- | --- |
| “Dentist Tuesday at 2pm on Main Street” | Title, ISO date (next Tuesday), 14:00, location resolved, prep checklist generated | Reminder titled “Dentist Tuesday at 2pm on Main Street” set for Tuesday at 2pm |
| “Pick up Sarah from JFK at 6:30 next Wednesday” | Title, ISO date, 18:30, airport location, LeaveByCalculator schedules departure | Reminder titled with full sentence, no location intelligence |
| “Coffee with Maya at Stumptown Friday at 10” | Title, Friday’s ISO date, 10:00, place discovery surfaces nearby Stumptown locations | Reminder titled with full sentence |
| “Flight to LAX Friday morning, arrive at the gate by 8” | Flight event, ISO date, 8:00, LAX airport in IATA dictionary, 120-minute domestic airport buffer, 5 graduated alerts (check-in, summary, boarding, gate close, layover) | Reminder titled with full sentence, no flight handling |
| “Standup every Tuesday and Thursday at 9” | Recurrence pattern detected; currently creates the first occurrence, with recurring setup deferred to Apple Calendar | Recurring reminder if Siri catches the “every” pattern |

Apple Reminders is a single-trigger schema. It exists to remind you of one thing at one time. Composed’s parse-event is event-aware: a dentist appointment is not the same as a flight is not the same as “pick up dry cleaning before 6 today,” and the parser routes each into a different downstream behavior — prep checklist, IATA flight handling, floating todo without clock pressure.
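The routing idea can be sketched as a switch over the event category. The category names follow the examples above, but the step labels are hypothetical and only show the shape of the dispatch, not Composed's internal function names.

```typescript
type Category = "appointment" | "flight" | "todo";

// Maps a category to the downstream behaviors described in the text.
// Step labels are illustrative placeholders.
function downstreamSteps(category: Category): string[] {
  switch (category) {
    case "appointment":
      return ["prep-checklist", "leave-by-alert"];
    case "flight":
      return ["iata-lookup", "airport-buffer", "graduated-alerts"];
    case "todo":
      // Floating todos carry no clock pressure, so no departure alert.
      return ["floating-todo"];
  }
}
```

Because the switch covers every member of the union, adding a new category later forces a decision about its downstream behavior at compile time.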

The other gap is follow-through. Siri ends at “Reminder created.” Composed continues: the generated prep checklist appears below the event, the LeaveByCalculator schedules a smart departure alert, and the three-layer notification model surfaces gentle awareness 7 days out, action nudges in the final 7 days, and an urgency-tier alert in the final 24 hours.
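One way to picture the three-layer model is as a function of time remaining. The sketch below assumes the boundaries fall at exactly 7 days and 24 hours, and the tier names are made up for illustration; the real thresholds may be handled differently.

```typescript
type Tier = "awareness" | "action" | "urgency";

// Picks a notification tier from the hours left until the event.
// Assumed boundaries: more than 7 days = awareness, final 7 days = action,
// final 24 hours = urgency.
function notificationTier(hoursUntilEvent: number): Tier {
  if (hoursUntilEvent <= 24) return "urgency";
  if (hoursUntilEvent <= 7 * 24) return "action";
  return "awareness";
}
```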

Why does voice planning beat typing for ADHD brains?

Voice planning beats typing for ADHD brains because it removes the executive-function tax of opening an app, navigating to a form, and translating a thought into a structured input. The cognitive cost of typing an event into Apple Calendar is the cost of initiating the task — and initiation is exactly where executive dysfunction lives.

Russell Barkley’s work on executive function (from the 1997 book ADHD and the Nature of Self-Control through his 2023 papers) frames ADHD not as a deficit of knowledge but as a deficit of acting on what you know. The dentist appointment never gets entered not because the person forgot, but because the gap between “I should put this in the calendar” and the ten taps required to put it there is too wide to cross while in the middle of something else.

Voice planning collapses that gap. The mental cost of saying “dentist Tuesday at 2pm” out loud is roughly the cost of thinking it. There is no app to open in the traditional sense, no fields to navigate, no decision about which calendar to file it under. You speak, Composed parses, the event exists.

Time blindness — the structural deficit in subjective time perception that Barkley documents — also benefits from how voice planning surfaces what comes before the event. When the dentist appointment auto-generates a prep checklist (“Bring insurance card · Confirm appointment time · Leave 22 minutes early for parking”), the iPhone is doing the time-forward forecasting that the ADHD brain has trouble doing on its own. The prep tasks become visible artifacts of the future, anchored to a specific calendar slot, instead of vague future-state pressure.

Where does the audio actually go — is it private?

Audio captured by Composed’s voice input is uploaded to OpenAI Whisper through Composed’s transcribe-audio edge function for transcription, returns as text, and is then discarded. The audio file is not stored, not retained, and not associated with your account after the transcription completes. Whisper is a cloud service — voice planning requires an internet connection.

The transcribed text is what persists. It hits the parse-event edge function, gets structured into an event, and lands in your Composed local data store (which syncs through Supabase to Apple Calendar via the calendar import bridge). Per Composed’s edge-function rules, transcribe-audio is deployed with --no-verify-jwt, so the gateway skips JWT verification and the function verifies your session internally. As a result, the function works whether you have a fresh JWT, an expired-but-refreshable one, or are anonymous (anonymous voice transcription falls back to a credit-limited path).

What this means in plain terms:

  • The audio leaves your iPhone. It goes to Whisper. If that is not acceptable for your workflow, voice planning is not the right capture method — use the screenshot-import or typed-event paths instead.
  • The text persists. The transcribed sentence is stored on the event as the original capture, alongside the parsed structured fields.
  • The account is Sign in with Apple. Composed uses Sign in with Apple as the only auth path. Whisper sees the audio but never sees your Apple ID — the JWT that authorizes the upload is rotated per session.

This is a deliberate tradeoff. On-device transcription would be more private but currently performs worse than Whisper at handling the natural phrasings that make voice planning fast (“dentist Tuesday at 2 on Main”). The trade is a cloud round-trip in exchange for accuracy.

How does Composed handle the “tomorrow at 3” / “next Tuesday” / “in 2 weeks” ambiguity?

Composed resolves relative-time expressions (“tomorrow,” “next Tuesday,” “in 2 weeks,” “this Friday,” “tonight”) to absolute ISO dates inside the parse-event edge function, using the current device date and timezone as the anchor. The resolution is deterministic and matches the conventions most English speakers expect — “next Tuesday” said on a Wednesday means the Tuesday six days later, not the one in 13 days.

The hard cases:

“Tomorrow at 3.” Resolved to tomorrow’s date at 15:00 local time. If “3” is spoken without “am” or “pm,” Composed picks the more plausible reading. Because the date is already pinned to tomorrow, the current time of day does not change the result: said at 10am, “3” resolves to 15:00 tomorrow. Said at midnight, it still resolves to 15:00, because “tomorrow at 3am” is rarely the intended time.

“Next Tuesday.” Resolved to the Tuesday after the upcoming Sunday. Said on Monday, “next Tuesday” is 8 days out. Said on Wednesday, “next Tuesday” is 6 days out. Said on Sunday, “next Tuesday” is 2 days out. This matches the dominant US English convention; the alternative reading (“the Tuesday in next week”) is treated as a fallback only when the dominant reading produces a date in the past.
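The “Tuesday after the upcoming Sunday” rule reduces to a little weekday arithmetic. The sketch below is not the parse-event implementation; the function names are hypothetical, dates are handled in UTC for simplicity, and the treatment of “next Sunday” is an added assumption.

```typescript
// Weekdays use 0 = Sunday ... 6 = Saturday, matching Date.prototype.getUTCDay().
function daysUntilNext(todayWeekday: number, targetWeekday: number): number {
  // Days to the upcoming Sunday; 0 when today is already Sunday.
  const daysToSunday = (7 - todayWeekday) % 7;
  // Assumption: "next Sunday" means the Sunday one week after the upcoming one.
  const offsetAfterSunday = targetWeekday === 0 ? 7 : targetWeekday;
  return daysToSunday + offsetAfterSunday;
}

// Resolves "next <weekday>" from an ISO date string to an ISO date string.
function resolveNext(todayIso: string, targetWeekday: number): string {
  const today = new Date(`${todayIso}T00:00:00Z`);
  const days = daysUntilNext(today.getUTCDay(), targetWeekday);
  const resolved = new Date(today.getTime() + days * 24 * 60 * 60 * 1000);
  return resolved.toISOString().slice(0, 10);
}

// The three cases from the text all land on Tuesday 2026-01-20:
// Monday 2026-01-12 (8 days out), Wednesday 2026-01-14 (6), Sunday 2026-01-18 (2).
const TUESDAY = 2;
const fromMonday = resolveNext("2026-01-12", TUESDAY);
```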

“In 2 weeks.” Resolved to 14 days from today at the spoken time. “In 2 weeks at 9” means 14 days from today at 09:00: a bare “9” with no am/pm context defaults to the morning.
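The article gives two data points for bare hours: “3” becomes 15:00 and “9” becomes 09:00. The sketch below fills in a cutoff between them purely as an assumption (1 through 6 read as afternoon, 7 through 11 as morning) and combines it with the 14-day offset. Function names are hypothetical and UTC stands in for local time.

```typescript
// Assumed cutoff, not confirmed by the source: bare hours 1-6 are read as
// afternoon, 7-11 as morning, and 12 as noon.
function defaultBareHour(hour: number): number {
  if (hour >= 1 && hour <= 6) return hour + 12;
  return hour;
}

// Resolves "in N weeks at H" from an ISO date to "YYYY-MM-DDTHH:mm".
function resolveInWeeks(todayIso: string, weeks: number, bareHour: number): string {
  const start = new Date(`${todayIso}T00:00:00Z`);
  const target = new Date(start.getTime() + weeks * 7 * 24 * 60 * 60 * 1000);
  target.setUTCHours(defaultBareHour(bareHour), 0, 0, 0);
  return target.toISOString().slice(0, 16);
}

// "In 2 weeks at 9", said on 2026-01-12.
const inTwoWeeks = resolveInWeeks("2026-01-12", 2, 9);
```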

“This weekend.” Resolved to the upcoming Saturday at 10:00 by default. If “this weekend” is said on a Saturday or Sunday, the parser picks the current Sunday (or stays on Saturday if it’s currently morning).

“In an hour” or “this evening.” Vague time signals get reasonable defaults — “in an hour” is exact, “this evening” is 19:00, “lunchtime” is 12:00, “first thing tomorrow” is the next day at 09:00.
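These defaults amount to a small lookup. A minimal sketch, assuming exact phrase matches and using UTC in place of the device timezone; the function name is hypothetical and a real parser would handle far more variation.

```typescript
const MINUTE = 60 * 1000;

// Resolves a vague phrase to a concrete time using the defaults from the text.
// Returns null for phrases this sketch does not know.
function resolveVagueTime(phrase: string, now: Date): Date | null {
  // Builds a time at the given hour, dayOffset days from now.
  const atHour = (dayOffset: number, hour: number): Date => {
    const d = new Date(now.getTime() + dayOffset * 24 * 60 * MINUTE);
    d.setUTCHours(hour, 0, 0, 0);
    return d;
  };
  switch (phrase.trim().toLowerCase()) {
    case "in an hour":
      return new Date(now.getTime() + 60 * MINUTE); // exact, no default needed
    case "this evening":
      return atHour(0, 19);
    case "lunchtime":
      return atHour(0, 12);
    case "first thing tomorrow":
      return atHour(1, 9);
    default:
      return null;
  }
}
```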

Timezone resolution for flights. This is the hot zone of the entire app. When you say “flight to LAX Friday at 6pm,” Composed stores eventDate in the departure airport’s timezone (your current TZ) and endAt in the arrival airport’s timezone (resolved from the IATA dictionary). A Tokyo-to-LAX flight at 6pm Tokyo time arriving at 11am Pacific the same day displays correctly across the timeline because the parser knows the two endpoints live in different timezones.
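The two-timezone idea is easier to see with a concrete pair of instants. The sketch below uses the standard Intl.DateTimeFormat API to render one departure and one arrival in their own zones. The helper name, the chosen date, and the ten-hour flight time are assumptions for illustration, not FlightTimeFormatter itself.

```typescript
// Renders an instant as "YYYY-MM-DD HH:mm" wall-clock time in a given IANA zone.
// Note: with hour12: false, some engines print midnight as "24"; that does not
// affect the times used here.
function wallClock(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(instant);
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")}`;
}

// A Tokyo departure at 18:00 local on Friday 2026-01-16, stored as a UTC instant,
// and an arrival ten hours later.
const departure = new Date("2026-01-16T09:00:00Z");
const arrival = new Date("2026-01-16T19:00:00Z");

const departsLocal = wallClock(departure, "Asia/Tokyo");
const arrivesLocal = wallClock(arrival, "America/Los_Angeles");
```

Here departsLocal is 18:00 on the 16th in Tokyo and arrivesLocal is 11:00 on the same calendar day in Los Angeles, which is only coherent if each endpoint is displayed in its own zone.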

The full timezone-resolution rules are in Composed’s flight-handling layer — the IATA dictionary lookups, the FlightTimeFormatter for time-of-day display, the LeaveByCalculator for airport buffers. They sit underneath the voice pipeline and run for every flight event regardless of how it was captured.

What if voice planning gets it wrong?

Composed shows you the parsed event in a confirmation card before writing it to Apple Calendar, so most parsing errors are caught in the second between dictation and save. The card displays the parsed title, ISO date and time, location, and an Edit button. Tapping Edit drops you into a standard event-edit screen with all fields pre-filled and editable.

The common error modes:

Wrong date. Whisper hears “Tuesday” as “Thursday,” or you said “next” but meant “this.” Tap Edit on the confirmation card and adjust the date picker. The parser logs the correction internally (used to improve the model over time, not stored against your account).

Wrong location. “Main Street” can match three places within a mile of you. The search-places edge function picks the closest match by default; tap Edit to choose a different result from the dropdown.
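Picking the closest match can be sketched with a great-circle distance. This uses the standard haversine formula; the Place type, the function names, and the sample coordinates are hypothetical, and the real search-places function may rank results differently.

```typescript
interface Place {
  name: string;
  lat: number;
  lon: number;
}

// Great-circle distance in kilometres (haversine formula, Earth radius 6371 km).
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

// Returns the candidate nearest to the user, or null if there are none.
function closestPlace(userLat: number, userLon: number, candidates: Place[]): Place | null {
  let best: Place | null = null;
  let bestKm = Infinity;
  for (const c of candidates) {
    const km = haversineKm(userLat, userLon, c.lat, c.lon);
    if (km < bestKm) {
      best = c;
      bestKm = km;
    }
  }
  return best;
}

// Three made-up "Main Street" matches near the user.
const matches: Place[] = [
  { name: "Main Street Dental", lat: 45.53, lon: -122.68 },
  { name: "Main Street Family Dentistry", lat: 45.521, lon: -122.681 },
  { name: "Main St Orthodontics", lat: 45.5, lon: -122.7 },
];
const picked = closestPlace(45.52, -122.68, matches);
```

The other candidates would stay available in the Edit dropdown, since the nearest venue is only a default.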

Wrong category. “Lunch with Sam” might get categorized as social when you wanted it as a meeting for prep-task generation purposes. Edit the category to regenerate the prep checklist with generate-checklist re-running against the new category.

Misheard word. “Dennis Tuesday” instead of “dentist Tuesday.” Tap Edit, fix the title, save. Composed uses the corrected title to regenerate the prep checklist.

Recurrence not picked up. Composed currently detects recurrence patterns (“every Tuesday,” “weekly,” “the first Monday of every month”) but creates only the first occurrence — recurring setup must be finished in Apple Calendar manually. This is a known gap; the workaround is to create the first event with voice and then long-press in Apple Calendar to set the recurrence rule.
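To make “detects but does not create” concrete, here is a deliberately simple pattern check. It is not how parse-event works internally; the patterns and the function name are assumptions, and a language-model parser would catch many more phrasings.

```typescript
// Illustrative patterns only: "every <something>" or a bare frequency word.
const RECURRENCE_PATTERNS: RegExp[] = [
  /\bevery\s+\w+/i,
  /\b(daily|weekly|monthly)\b/i,
];

// True when the sentence appears to describe a repeating event.
function mentionsRecurrence(text: string): boolean {
  return RECURRENCE_PATTERNS.some((pattern) => pattern.test(text));
}
```

A flag like this could drive a hint on the confirmation card telling the user to finish the recurrence rule in Apple Calendar.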

Network down. Voice planning requires an internet connection because Whisper is cloud-based. When iPhone is offline, the voice capture screen displays a connection prompt and falls back to typed entry. The recording itself does not queue locally for later upload — the audio is discarded if the connection fails.

The deeper philosophy: Composed treats voice planning as the capture layer, not the final word. The point is to get the thought out of your head and into a system in under 4 seconds. Editing a misheard field takes 5 more seconds. The combined 9 seconds still beats the 45-second tap-through cost of opening Apple Calendar, finding the day, typing the title, choosing the location, and setting the alert.

How does voice planning compare to Apple Reminders?

Composed and Apple Reminders both accept voice input, but they do fundamentally different things with the result. Apple Reminders creates a single-trigger reminder. Composed creates a structured event with date, time, location, prep checklist, departure alert, and a three-layer notification model.

The specific gaps:

Schema. Apple Reminders has one schema (EKReminder) with title, due date, optional location, optional URL. Composed has an event schema with title, eventDate, endAt (for spanning events and flights), location coordinates, category, prep tasks, follow-up anchor, and travel intent for departure tracking.

Place intelligence. Apple Reminders accepts a location string but does not actively resolve venues. Composed’s search-places cross-references nearby venues so “dentist on Main Street” becomes a specific dental office address, not a freeform string.

Prep generation. Apple Reminders does not generate prep tasks. Composed runs generate-checklist automatically on every event, producing 3-5 prep tasks tuned to the event category.

Departure timing. Apple Reminders fires a single time-based alert. Composed’s LeaveByCalculator computes a real Leave-By time from current traffic via Apple MapKit, recalculates each time the app returns to the foreground within 8 hours of an event, and adds airport buffers automatically for flights.
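The leave-by arithmetic is simple enough to sketch. The version below takes travel time as a plain number instead of calling MapKit, applies the 120-minute domestic and 180-minute international buffers quoted earlier, and models the 8-hour recalculation window. All names are hypothetical and this is not the LeaveByCalculator source.

```typescript
const MINUTE_MS = 60 * 1000;

type FlightKind = "domestic" | "international";

// Airport buffer in minutes; zero for non-flight events.
function airportBufferMinutes(kind: FlightKind | null): number {
  if (kind === "domestic") return 120;
  if (kind === "international") return 180;
  return 0;
}

// Leave-by = event start minus travel time minus any airport buffer.
function leaveBy(eventStart: Date, travelMinutes: number, flight: FlightKind | null = null): Date {
  const totalMinutes = travelMinutes + airportBufferMinutes(flight);
  return new Date(eventStart.getTime() - totalMinutes * MINUTE_MS);
}

// True when the event is still ahead and no more than 8 hours away.
function shouldRecalculate(now: Date, eventStart: Date): boolean {
  const msUntil = eventStart.getTime() - now.getTime();
  return msUntil > 0 && msUntil <= 8 * 60 * MINUTE_MS;
}

// A domestic flight departing 18:00 UTC with a 35-minute drive to the airport.
const flightLeaveBy = leaveBy(new Date("2026-01-16T18:00:00Z"), 35, "domestic");
```

In this example the drive plus the domestic buffer totals 155 minutes, so the leave-by time lands at 15:25 UTC.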

Tone. Apple Reminders uses red badges and time-pressured language to flag delayed items. Composed uses Yellow #FDE047 for deadline cards and language like “Added 3 days ago” — calm by design, never accusatory.

The two apps are not equivalent. Apple Reminders is the right tool when you want a single nudge at a specific time and nothing more. Composed is the right tool when you want the event captured and the preparation handled and the leave-by computed and the gentle escalation across days until the moment arrives.

How does voice planning fit into a calmer planning practice?

Voice planning is the capture layer — it’s what gets things out of your head and into a system in under 4 seconds. The rest of the practice — what to prep, when to leave, what to drop this week — happens after the capture, not during it. The point of making capture fast is not to fill the calendar faster. It’s to let you keep using the calendar at all.

Most people abandon planners around day 7-10. The pattern is consistent: the friction of adding events accumulates faster than the value of having added them. By day 8, opening the app feels like a chore. By day 14, the calendar has a three-day gap and the user mentally writes it off. This is the abandonment curve every planning app fights.

Voice planning is the structural answer to that curve. When the cost of capturing an event approaches zero, the calendar stays current. When the calendar stays current, the gentle reminders are accurate. When the reminders are accurate, the user trusts them. When the user trusts them, they keep using the app.

The compounding effect is small per-event but large in aggregate. A 4-second voice capture used three times a day for a year produces 1,095 events that would have otherwise been dropped, forgotten, or typed in at midnight when the user finally remembered. Each one comes with a prep checklist, a Leave-By time, and a three-layer notification model. The calendar becomes a forecasting instrument instead of a reactive log.

This is the difference between planning and remembering. Remembering is what the iPhone does — notifications, calendar sync, graduated reminders. Planning is the human decision about what matters this week. Voice input makes the iPhone better at remembering so the human is freed up to plan.

The point of voice planning is not to get more done. It’s to keep the system honest enough that planning stays possible. The 4-second capture is the pivot point. Everything else follows.

If you want to see this in real life, the Composed-in-the-Wild April 14 post walks through a real founder’s week — a real dentist appointment, a real flight, a real Apple Watch notification at the gate — captured by voice and managed by the app, written by Jesse himself.

Frequently asked questions

Does voice planning work without an internet connection on iPhone?

Voice planning requires an internet connection because the OpenAI Whisper transcription runs in the cloud, not on-device. When iPhone is offline, Composed's voice capture screen falls back to typed entry. The audio is not queued for later upload — it is discarded if the connection fails.

Can voice planning create recurring events on Apple Calendar?

Composed's parse-event edge function detects recurrence patterns like 'every Tuesday' or 'the first Monday of every month' but currently creates only the first occurrence. Recurring setup must be finished manually in Apple Calendar — long-press the event after creation to set the recurrence rule.

How accurate is voice planning for accents and natural phrasings?

OpenAI Whisper handles accented English well across US, UK, Indian, Australian, and most European varieties. Composed's parse-event layer reads natural conversational phrasings — 'dentist Tuesday at 2,' 'pick up the kids at 3:30,' 'in two weeks at 9' all resolve correctly. Whisper transcribes 'Tuesday' as 'Thursday' occasionally; the confirmation card surfaces the parsed date before writing to Apple Calendar so misheard words are caught in the same second.

Does Composed store my voice recordings?

No. Audio uploaded for transcription is sent to OpenAI Whisper through the transcribe-audio edge function, returns as text, and is discarded. Only the transcribed text persists, stored alongside the parsed event in Composed's local data store and synced to Apple Calendar through the calendar import bridge.

Can I use voice planning for floating todos without a fixed time?

Yes. Composed parses both fixed events ('dentist Tuesday at 2pm') and floating tasks ('pick up the dry cleaning before 6 today,' 'remind me to call Mom this week'). Floating items skip the departure-alert step and file as todos with optional deadlines, which surface as Yellow #FDE047 cards in the Today view without time pressure.

Why is voice planning faster than typing on iPhone?

Typing an event into Apple Calendar takes roughly 45 seconds — open the app, choose the day, tap the title field, type the title, set the date picker, set the time picker, add the location, choose the alert. Voice planning in Composed takes under 4 seconds because the parse-event edge function extracts title, date, time, location, and category from a single conversational sentence, then generate-checklist produces the prep tasks in parallel. The bottleneck moves from input to confirmation.

If you are deciding whether to make voice your primary capture method on iPhone, the honest answer is: try it for a week on the appointments that surface mid-task — driving, cooking, mid-meeting — where typing isn’t an option anyway. That is where voice planning earns its keep. Composed’s voice input feature is built for that use case specifically, and the voice-to-calendar landing page has 15 worked examples you can read through if you want to see exactly what the parser does with different phrasings.