Fixing Siri: Low-Risk Ways to Employ LLMs for Rapid Improvement

Ahead of next week's WWDC conference, everyone is excited to hear about the rumored AR/VR glasses, and so am I! However, I find the recent uptick in reporting that improvements are coming to Siri even more interesting. Some recent job postings point the same way.

Apple has historically been very cautious in deploying new tech; that caution led me to jailbreak every single iPhone I owned in the past decade, until the 13 Pro. Thankfully, iOS has since incorporated a lot of the features I really wanted, such as system-wide ad blocking, but there is still room for improvement, namely with Siri.

One of the most obvious use cases for LLMs is natural language processing. Siri currently responds only to a very narrow set of commands. I have a tough time teaching my close family those commands; if you're off by even a few words, you're led down a very frustrating path.

This is why, as soon as I saw Mate Marschalko's shortcut tweak that let Siri control Homebridge devices, I immediately started trying to build a shortcut that let you talk to ChatGPT on your iPhone. My brother is a much better developer than I am, so he ended up beating me to it with SiriGPT 😊.

I really doubt that Apple will allow users to talk directly to LLMs (for obvious reasons). Here is how I think they can improve Siri without fundamentally redesigning it.

The first thing Apple could do is put an LLM in front of Siri as an interpreter, the same way Mate's Homebridge shortcut does. The LLM would act as a parsing layer that translates a user's messy request into the clean commands the system is already known to understand.

"Hey Siri, uuh.... play apple music hits - on full blast and ... remind me to leave for target in 20 minutes"

I would never expect Siri in its current implementation to understand this command; I'd assume Siri would stop listening in the middle of it because of the pauses, umms, and ahhhs. However, you could definitely train a model to break that request down into constituent parts that are known Siri commands.

User Input -> Siri LLM Interpreter -> Known Siri Commands -> Siri Responds

User Input

"Hey Siri, uuh.... play apple music hits - on full blast and ... remind me to leave for target in 20 minutes"

Output from Siri LLM Interpreter = Known Siri Commands

"Hey Siri, play Apple Music hits"
"Hey Siri, increase volume to 100%"
"Hey Siri, remind me to 'go to Target' in 20 minutes"

Siri responds to the user

"Playing Apple Music hits"
"Media volume set to 100%"
"I've added a reminder to go to Target in 20 minutes"
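The pipeline above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not anything Apple has shipped: the prompt wording, the `interpret` function, and the stubbed `fake_llm` response are all hypothetical, and a real implementation would call an actual LLM API in place of the stub.

```python
# Hypothetical prompt for the "Siri LLM Interpreter" parsing layer.
INTERPRETER_PROMPT = """\
Rewrite the user's messy voice request as a list of canonical assistant
commands, one per line. Drop filler words ("uuh", "umm") and pauses.

User request: {request}
Canonical commands:"""


def interpret(request: str, llm) -> list[str]:
    """Run the messy request through the LLM parsing layer and return
    clean, known commands for the assistant to execute one by one."""
    raw = llm(INTERPRETER_PROMPT.format(request=request))
    return [line.strip() for line in raw.splitlines() if line.strip()]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns the decomposition from
    the worked example above."""
    return (
        "play Apple Music hits\n"
        "increase volume to 100%\n"
        "remind me to 'go to Target' in 20 minutes\n"
    )


messy = ("Hey Siri, uuh.... play apple music hits - on full blast "
         "and ... remind me to leave for target in 20 minutes")
for command in interpret(messy, fake_llm):
    print(command)
```

The key design point is that the LLM never executes anything itself: it only emits commands from Siri's existing, known vocabulary, which the current Siri backend then handles exactly as it does today.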

Implementing this approach would address some of Siri's limitations in the short term, pushing it beyond what it can do today and perhaps even past other voice assistants.

WWDC 2024 wishlist: Is it too much to ask for the ability to skip songs using volume buttons, like I could with my BlackBerry?