Voice to calendar — the method

Founder, Composed · Last updated May 23, 2026

Voice to calendar works best when you say the event as one natural sentence: what, when, where, and what to bring, in any order. On iPhone, Composed transcribes your voice with Apple’s on-device Speech framework, then runs the text through its parse-event service, which pulls out the title, date, time, place, and prep. Say “dentist Tuesday at 2 on Main Street, remember the insurance card” and you get a dated, placed event with a prep task in about two seconds, with no fields to tap.

The method matters because voice capture has exactly one failure mode worth designing around: a sentence that’s clear to a human but ambiguous to a parser. Get the phrasing right and the first try lands. This chapter covers that phrasing, the small, learnable vocabulary that makes Composed resolve a capture cleanly, plus the recovery moves for the times it doesn’t. It pairs with the voice-to-calendar hub: go there to see the feature in action and to download the app. This chapter is the editorial how-to the hub points to.

How voice to calendar works

Voice-to-calendar is a two-stage pipeline: transcription, then extraction. When you speak, Composed transcribes the audio on your iPhone using Apple’s on-device Speech framework, so the recording itself never leaves the device; only the resulting text moves on. That text goes to the parse-event service, which turns the sentence into structured fields: it decides which words are the title, which phrase is the date, which is the time, and which is the place, and it pulls any “remember to” clause out into a prep task.

Knowing there are two stages tells you where to aim. Apple’s Speech framework is very good at transcription but leans on context for proper nouns — so saying a venue’s name near a street or city helps it spell the venue correctly. The parse-event stage is good at relative dates and casual time phrasing — so you can talk the way you’d talk to a person, not the way you’d fill a form. The two-second round trip from “stop talking” to “event saved” is fast enough that you can capture mid-stride and keep walking. The feature page for voice input covers the mechanics; what follows is how to feed it.
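The two-stage split can be sketched as two function types run in order. This is a minimal Python illustration, not Composed’s code: `ParsedEvent`, `capture`, and both stand-in stages are hypothetical names, and the fake transcriber simply decodes bytes instead of calling Apple’s Speech framework. What it shows is structural: the extraction stage only ever receives text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParsedEvent:
    """Hypothetical shape of what the extraction stage hands back."""
    title: str
    when: Optional[str] = None
    place: Optional[str] = None
    prep: Optional[str] = None


# Stage 1: audio in, text out (the role Apple's on-device Speech framework plays).
Transcriber = Callable[[bytes], str]
# Stage 2: text in, structured fields out (the role of the parse-event service).
Extractor = Callable[[str], ParsedEvent]


def capture(audio: bytes, transcribe: Transcriber, extract: Extractor) -> ParsedEvent:
    """Run the stages in order; the extractor sees only text, never the audio."""
    text = transcribe(audio)
    return extract(text)


def fake_transcriber(audio: bytes) -> str:
    """Stand-in for speech recognition: pretend the bytes are already UTF-8 text."""
    return audio.decode("utf-8")


def title_only_extractor(text: str) -> ParsedEvent:
    """Trivial stand-in for extraction: the whole sentence becomes the title."""
    return ParsedEvent(title=text.strip())
```

Because each stage is just a function, either one can be swapped (a real recognizer, a smarter extractor) without touching the other, which mirrors the point that you aim your phrasing at two different stages.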

What to say

Say it as one sentence, the way you’d tell a friend, and include three things if you have them: what it is, when it is, and where it is. “Coffee with Priya Thursday at 10 at the place on Fifth.” That’s it. You don’t need to say “create an event” or “set a reminder for” — the act of capturing is the command.

Order doesn’t matter and neither does grammar. “Thursday 10am Priya coffee Fifth Street” resolves the same as a full sentence. What helps is keeping the pieces (what, day, time, place) distinct so the parser doesn’t have to guess which word does which job. Say it’s Wednesday and, walking to your car, you remember that your daughter’s recital is the following Saturday. You can say “Maya’s recital next Saturday at 6 at the community center” and keep walking: four pieces in one breath, done before you reach the door.
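Order-independence falls out naturally when each piece is found by its own pattern rather than by its position in the sentence. The sketch below is a hypothetical, deliberately tiny detector for just the day and the hour; the regexes and `detect_when` are illustrative assumptions, not the parse-event service. It gives the same answer for the terse phrasing and the full sentence.

```python
import re
from typing import Optional, Tuple

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")

# A weekday anywhere in the sentence, optionally preceded by "next" or "this".
DAY_RE = re.compile(r"\b(?:(next|this)\s+)?(" + "|".join(WEEKDAYS) + r")\b", re.IGNORECASE)
# An hour anywhere in the sentence, with optional minutes and am/pm: "10", "10am", "2:30 pm".
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\b", re.IGNORECASE)


def detect_when(sentence: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (day phrase, hour) wherever they appear; word order is irrelevant."""
    day_match = DAY_RE.search(sentence)
    time_match = TIME_RE.search(sentence)
    day = None
    if day_match:
        modifier, weekday = day_match.group(1), day_match.group(2).lower()
        day = f"{modifier.lower()} {weekday}" if modifier else weekday
    hour = int(time_match.group(1)) if time_match else None
    return day, hour
```

Both “Thursday 10am Priya coffee Fifth Street” and “Coffee with Priya Thursday at 10 at the place on Fifth” yield `("thursday", 10)`. A sentence that buries the day in a story still works only if the patterns can find it, which is why leading with the event helps.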

What to avoid is burying the event inside a story. “So I ran into Priya and we were saying we should finally get coffee, probably Thursday-ish” makes the parser hunt for the signal in the noise. Lead with the event. Tell the story to a human, not to the mic.

The time vocabulary

Composed resolves casual time phrasing cleanly, so you can speak in the vocabulary you actually think in. “Tuesday at 2” becomes the coming Tuesday at 2:00 p.m.; “next Tuesday” jumps a week further out. “Tomorrow morning,” “this Friday at noon,” “the 14th at 9,” and “in six weeks” all land, because relative dates are exactly what the parse-event stage is tuned for.
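The relative-date rules above reduce to weekday arithmetic. A minimal sketch, assuming that a bare weekday means its next occurrence after today (so saying today’s weekday means a week out) and that “next” adds exactly seven days. People and parsers disagree about “next Tuesday,” so treat `resolve_weekday` and `resolve_offset` as hypothetical helpers with stated assumptions, not Composed’s actual rules.

```python
from datetime import date, timedelta

WEEKDAY_INDEX = {name: i for i, name in enumerate(
    ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"])}


def resolve_weekday(today: date, weekday: str, next_week: bool = False) -> date:
    """'Tuesday' -> first Tuesday after today; 'next Tuesday' -> seven days after that.

    Assumption: naming today's weekday means a week from today, not today.
    """
    target = WEEKDAY_INDEX[weekday.lower()]
    days_ahead = (target - today.weekday()) % 7 or 7  # 0 would mean today; push a week
    if next_week:
        days_ahead += 7
    return today + timedelta(days=days_ahead)


def resolve_offset(today: date, weeks: int = 0, days: int = 0) -> date:
    """'in six weeks' and 'tomorrow' style phrases are plain date offsets."""
    return today + timedelta(weeks=weeks, days=days)
```

Taking this page’s last-updated date, Saturday, May 23, 2026, as today: “Tuesday” resolves to May 26, “next Tuesday” to June 2, and “in six weeks” to July 4.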

Two habits make time phrasing reliable: anchoring and disambiguating. Anchor to a named day or a relative offset rather than a bare number: “Thursday at 2” is more robust than “the 2nd,” a bare number that a parser can mistake for a time and that names no month. Disambiguate a.m. from p.m. when the hour is genuinely ambiguous (“2 in the afternoon,” “7 in the morning”), though for most appointments the daytime default is the sane guess, and you can fix it in one tap if it isn’t. When in doubt, say the day and the rough part of day together: “next Tuesday morning” gives the parser two confirming signals.
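The a.m./p.m. advice can be modeled as a rule where an explicit qualifier always wins and only a bare hour falls back to a default. The cutoffs in this `resolve_hour` sketch (7 through 11 read as morning, 12 as noon, 1 through 6 as afternoon) are hypothetical, chosen to illustrate a daytime default, and are not taken from Composed.

```python
from typing import Optional

MORNING = {"am", "a.m.", "morning"}
LATER = {"pm", "p.m.", "afternoon", "evening", "tonight"}


def resolve_hour(hour: int, qualifier: Optional[str] = None) -> int:
    """Map a spoken 12-hour clock hour to a 24-hour value.

    An explicit qualifier always wins. Without one, fall back to an assumed
    daytime default: 7-11 read as morning, 12 as noon, 1-6 as afternoon.
    """
    if not 1 <= hour <= 12:
        raise ValueError("expected a 12-hour clock hour (1-12)")
    q = qualifier.lower() if qualifier else None
    if q in MORNING:
        return hour % 12           # 12 a.m. becomes 0 (midnight)
    if q in LATER:
        return hour % 12 + 12      # 12 p.m. stays 12 (noon)
    if q is not None:
        raise ValueError(f"unknown qualifier: {qualifier!r}")
    if hour == 12 or 7 <= hour <= 11:
        return hour                # bare 7-11 read as morning, 12 as noon
    return hour + 12               # bare 1-6 read as afternoon
```

Under these assumed cutoffs a bare “2” becomes 14:00, while “7” stays 7:00 unless you say “7 in the evening,” which is exactly the case where naming the part of day pays off.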

The place vocabulary

Name the place out loud and Composed treats it as the event’s location, which is what later powers the leave-by math. “At the dentist on Main Street,” “at Riverside Park,” “the office on Cedar” — say the place and it becomes a real location attached to the event, not just words in the title. Voice input uses nearby venue names to help spell the place correctly, so a landmark plus a street usually beats a half-remembered exact name.
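Treating the place as its own field, rather than leftover words in the title, means finding the phrase after “at” or “on” that is not a time or a day. A rough sketch of that idea, using a hypothetical `PLACE_RE` pattern and `extract_place` helper; a real extractor would also resolve the phrase to an actual location, which this does not attempt.

```python
import re
from typing import Optional

_WEEKDAY = r"(?:mon|tues|wednes|thurs|fri|satur|sun)day"

# "at"/"on" followed by something that is not a number, a weekday, or noon/midnight,
# captured up to the next comma or the end of the sentence.
PLACE_RE = re.compile(
    r"\b(?:at|on) (?!\d)(?!" + _WEEKDAY + r"\b)(?!noon\b|midnight\b)([^,]+)",
    re.IGNORECASE,
)


def extract_place(sentence: str) -> Optional[str]:
    """Return the first place-like phrase, or None if the sentence names no place."""
    match = PLACE_RE.search(sentence)
    return match.group(1).strip() if match else None
```

For “Dentist Tuesday at 2 on Main Street, remember the insurance card” this skips “at 2” and returns “Main Street”; for “lunch on Tuesday at noon” it returns nothing, because neither phrase is a place.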

The payoff for naming the place is downstream and large. A located event is what lets Composed calculate when you need to leave using real travel time rather than a guess. If on a Friday you say “parent-teacher conference at 4 at Lincoln Elementary,” naming the school is what later turns into a leave-by alert that accounts for the 3:30 traffic. Skip the place and you still get the event; you just don’t get the part of the system that gets you out the door on time. If you don’t know the exact venue, a neighborhood or street is enough to anchor it, and you can refine it later.
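The leave-by math is arrival time minus travel time, with one wrinkle: travel time depends on when you leave. This sketch assumes a hypothetical `travel_at` function that returns minutes of travel for a given departure time (standing in for whatever routing estimate the app uses) and a five-minute buffer that is also an assumption. It re-estimates from each candidate departure until the answer settles.

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical: departure time -> estimated minutes of travel at that time.
TravelEstimate = Callable[[datetime], float]


def leave_by(start: datetime, travel_at: TravelEstimate,
             buffer_minutes: float = 5, max_rounds: int = 5) -> datetime:
    """Latest departure that arrives `buffer_minutes` before `start`.

    Travel time depends on when you leave, so re-estimate from each candidate
    departure until it stops changing or `max_rounds` is used up.
    """
    arrive_by = start - timedelta(minutes=buffer_minutes)
    departure = arrive_by  # first guess: as if travel took no time
    for _ in range(max_rounds):
        candidate = arrive_by - timedelta(minutes=travel_at(departure))
        if candidate == departure:
            break
        departure = candidate
    return departure


def rush_hour_travel(departure: datetime) -> float:
    """Toy estimate: 30 minutes when leaving at 3 p.m. or later, 15 minutes before."""
    return 30.0 if departure.hour >= 15 else 15.0
```

With the toy rush-hour model, a 4:00 p.m. conference on Friday, May 29, 2026 and the assumed 5-minute buffer give a 3:25 p.m. leave-by time; with a flat 20-minute trip it would be 3:35 p.m.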

The prep vocabulary

Add the prep by tacking on a “remember (to)” or “and I need to” clause, and Composed lifts it out of the sentence into the event. “Dentist Tuesday at 2, remember to bring the insurance card” creates the appointment and attaches the insurance card as something to do before you go. The prep clause is the one piece most people leave in their head, and it’s the piece that determines whether you arrive ready instead of just on time.
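Lifting the prep out comes down to spotting the trigger phrase and splitting the sentence there. A minimal sketch with a hypothetical `PREP_RE` pattern and `split_prep` helper, covering only the “remember (to)” and “and I need to” triggers used in this chapter; it is not Composed’s parser and ignores every other phrasing.

```python
import re
from typing import Optional, Tuple

# A trailing "remember (to) ..." or "and I need to ..." clause, with an optional comma before it.
PREP_RE = re.compile(r",?\s*\b(?:remember(?: to)?|and I need to)\s+(.+)$", re.IGNORECASE)


def split_prep(sentence: str) -> Tuple[str, Optional[str]]:
    """Return (event text, prep text); prep is None when there is no trigger clause."""
    match = PREP_RE.search(sentence)
    if not match:
        return sentence.strip(), None
    return sentence[: match.start()].strip(), match.group(1).strip()
```

“Dentist Tuesday at 2, remember to bring the insurance card” splits into the event text “Dentist Tuesday at 2” and the prep “bring the insurance card”; the event text can then go on to date, time, and place extraction.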

This is also where capture meets the AI prep checklist. Even without a “remember to” clause, Composed auto-generates three to five context-aware prep tasks the moment the event is created — a dentist gets “bring your insurance card,” a flight gets “verify your passport is valid for six months.” So the “remember to” clause isn’t carrying the whole load; it’s adding the one personal item the generic checklist wouldn’t know — “ask about the night guard,” “bring the form from the counter.” Say the thing you’d otherwise forget, and it stops being your job to remember it.
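The division of labor between the generated checklist and the spoken clause can be sketched as a merge: generic tasks for the event type, plus the personal item, without duplicates. Here a keyword lookup (`TEMPLATES`, a hypothetical table holding only the two examples from this chapter) stands in for the AI generation, which the sketch does not try to reproduce.

```python
from typing import Dict, List, Optional

# Hypothetical keyword table standing in for AI-generated prep; entries are the
# two examples this chapter gives, nothing more.
TEMPLATES: Dict[str, List[str]] = {
    "dentist": ["bring your insurance card"],
    "flight": ["verify your passport is valid for six months"],
}


def build_prep(title: str, personal: Optional[str] = None) -> List[str]:
    """Generic tasks matched from the title, then the spoken personal item if it is new."""
    lowered = title.lower()
    tasks: List[str] = []
    for keyword, items in TEMPLATES.items():
        if keyword in lowered:
            tasks.extend(items)
    already = {task.lower() for task in tasks}
    if personal and personal.lower() not in already:
        tasks.append(personal)
    return tasks
```

A dentist visit with “ask about the night guard” ends up with both the generic card reminder and the personal item, while repeating the generic item aloud adds nothing new.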

When voice misses

When voice misses, it almost always misses on one field, not the whole event — so the recovery is a one-tap edit, not a re-do. The capture lands as an event you can see before it’s final; if the time read as 2:00 a.m. instead of 2:00 p.m., or it parsed “Cedar” as the title, you fix that one field and keep the rest.
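Because a miss is confined to one field, the fix can be a copy of the event with that single field replaced. A sketch using a frozen Python dataclass; the `Event` shape, `toggle_meridiem`, and `fix_field` are hypothetical, and the toggle models the one-tap a.m./p.m. switch by shifting the start time twelve hours within the same day.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Event:
    title: str
    start: datetime
    place: Optional[str] = None
    prep: Optional[str] = None


def toggle_meridiem(event: Event) -> Event:
    """Flip a.m./p.m. on the start time; every other field is carried over unchanged."""
    shift = timedelta(hours=12) if event.start.hour < 12 else timedelta(hours=-12)
    return replace(event, start=event.start + shift)


def fix_field(event: Event, **changes: object) -> Event:
    """Correct only the named fields, e.g. fix_field(e, title="Office meeting")."""
    return replace(event, **changes)
```

Toggling a 2:00 a.m. dentist appointment gives 2:00 p.m. with the title, place, and prep untouched, and toggling twice returns the original event.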

Voice doesn’t have to be perfect to be faster. A capture that lands 90% right in two seconds and takes one tap to finish still beats six taps from a blank form — and it beats forgetting entirely.

The three common misses and their fixes: a wrong a.m./p.m. is a single toggle. A misheard venue is a quick correction in the location field — and saying the street next time helps Apple’s Speech framework get it. A run-on capture that smashed two plans into one event means you said too much at once; the fix is to capture one plan per sentence. None of these require deleting and starting over.
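A run-on capture usually shows itself as more than one day reference in a single sentence. This hypothetical heuristic (`looks_like_run_on`, with an assumed word list) flags such sentences so they can be split into one capture per plan; it sketches the idea and is not a check Composed is documented to run.

```python
import re

# Assumed vocabulary of day references; a real parser would know many more.
DAY_REF_RE = re.compile(
    r"\b(?:(?:mon|tues|wednes|thurs|fri|satur|sun)day|today|tomorrow|tonight)\b",
    re.IGNORECASE,
)


def looks_like_run_on(sentence: str) -> bool:
    """Heuristic: two or more day references suggest two plans in one capture."""
    return len(DAY_REF_RE.findall(sentence)) > 1
```

The heuristic has obvious false positives (“standup Tuesday and Thursday” is one plan), which is why the fix stays with the speaker: one plan per sentence.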

Voice vs keyboard: quiet edits

Reviewing what the AI captured is a quiet edit, not a re-entry — open the event, glance at the fields, adjust the one that’s off, and the keyboard only comes out for that single fix. This is the reason voice-first doesn’t mean voice-only: the keyboard is the precision instrument for editing, while voice is the speed instrument for capturing. You use the fast tool to get it in and the precise tool to clean it up.

In practice the review takes a few seconds and you’ll skip it entirely once you trust your phrasing. Speak the event, see it land, move on — that’s the loop. The single habit that makes voice-to-calendar reliable is to say the place out loud every time, because the place is the field that unlocks the leave-by math and it’s the one most people drop. Lead with the event, name the day and time the way you think them, name the place, and tack on the one thing to remember. Four pieces, one sentence, two seconds.

Next: Screenshot anything with a date — which images Composed extracts well, and how to handle the edge cases.
