Existentialism aside, it's an exciting time in tech. Like many others, I've been working to better understand AI and LLMs over recent months and years. Emerging tech brings with it a lot of unknowns. To some this feels intimidating and scary, but to others it will feel like Christmas morning. The AI shift requires us to rethink many areas, but what intrigues me most is how the graphical user interface will evolve—we're already seeing conversational UI handle much of the work.
Something that has been interesting to me is the idea of unlocking the power of LLMs for casual users. That power is mostly harnessed through good prompts, and when prompting and follow-ups are done poorly, the results can be disappointing and often hurt users' creativity and productivity. With that in mind, I was curious about how it might be possible to narrow the gap between having an idea and executing it through a guided prompt pattern.
While checking to see whether anyone had done much work here, I came across this Stanford paper about NLIs (Natural Language Interfaces), which summed the problem up well:
Despite their potential, the lack of need for structured input can make it more challenging for users to translate their intentions into actions accurately. This is particularly true among inexperienced users who may not understand LLMs or how they work. A lot of the current NLI systems consist of various UI design flaws and/or do not incorporate key human-LLM design practices that help bridge the gap between a user’s intention and actions. This lack of consideration often leads to users producing poorly defined prompts, which results in unhelpful responses from the LLM. This discrepancy between intentions and actions limits the accessibility of LLMs and reduces their effectiveness.
Improving Human-LLM Interactions by Redesigning the Chatbot by Rashon Poole
Conversational UI is, by its nature, highly dynamic and works really well for the endless jobs that LLMs can do. But a multi-modal approach, where conversational UI hands off to graphical UI at the right moments, feels elegant and like the right way forward. In other words, thoughtful orchestration.
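To make that orchestration idea a little more concrete, here's a minimal TypeScript sketch of how a system might decide when chat should hand off to a guided surface. Everything in it (Surface, HandoffDecision, decideSurface) is a hypothetical name for illustration, not part of any real product or API.

```ts
// Hypothetical sketch: deciding which surface should handle a request.
type Surface = "chat" | "inline-guidance" | "modal";

interface HandoffDecision {
  surface: Surface;
  // Structured fields the GUI should collect before the prompt runs,
  // e.g. destination, dates and budget for a travel request.
  fieldsToCollect: string[];
}

// Very rough heuristic: well-specified requests stay conversational,
// while requests missing several parameters get a guided surface.
function decideSurface(userMessage: string, missingParams: string[]): HandoffDecision {
  if (missingParams.length === 0) {
    return { surface: "chat", fieldsToCollect: [] };
  }
  if (missingParams.length <= 2) {
    return { surface: "inline-guidance", fieldsToCollect: missingParams };
  }
  return { surface: "modal", fieldsToCollect: missingParams };
}
```

The thresholds here are arbitrary; the more interesting design question is who decides them, the model or the product.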
Below are seeds of thought for how this pattern could evolve. There is plenty not explored here and maybe I’ll post an update at some point.
Note: I’m using OpenAI’s ChatGPT UI for these explorations as a baseline, but these could be applied to any chat-based LLM.
Light approach: Inline guidance prompts
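One way to read the inline approach (purely my interpretation, not a description of any shipping design): when a prompt looks underspecified, the chat surface could render a few guidance chips inline instead of answering straight away. GuidanceChip and suggestGuidance below are invented names, and the checks are deliberately naive.

```ts
// Hypothetical sketch: inline guidance chips for an underspecified prompt.
interface GuidanceChip {
  label: string;        // short text shown on the chip, e.g. "Add a budget"
  refinement: string;   // text appended to the prompt when the chip is tapped
}

function suggestGuidance(userPrompt: string): GuidanceChip[] {
  const chips: GuidanceChip[] = [];
  // Naive string checks purely for illustration; a real system would likely
  // ask the model itself what is missing from the request.
  if (!/\$|budget|£|€/i.test(userPrompt)) {
    chips.push({ label: "Add a budget", refinement: "Keep the total under a set budget." });
  }
  if (!/\bfor (me|us|\d+)\b/i.test(userPrompt)) {
    chips.push({ label: "Who's it for?", refinement: "This is for two adults." });
  }
  return chips;
}
```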
Heavy approach: Modal-based
The modal approach is interesting to me because it allows for so much more control and feels closer to the advanced search functionality that Google offers for Drive.
It also made me question whether a user would need to see the composed prompt at all, or could just run it from the modal itself.
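As a sketch of that idea, the modal's structured fields could be composed into a prompt and run directly, with the user never reading the prompt text. The field names and the runPrompt callback below are placeholders I've made up for a travel-planning example, not a real API.

```ts
// Hypothetical sketch: a modal's fields compose into a prompt that runs directly.
interface TripModalFields {
  destination: string;
  nights: number;
  budget: number;      // total budget in the user's currency
  travellers: number;
}

function composePrompt(fields: TripModalFields): string {
  return [
    `Plan a ${fields.nights}-night trip to ${fields.destination}`,
    `for ${fields.travellers} people with a total budget of ${fields.budget}.`,
    `Include accommodation, food and activities within budget.`,
  ].join(" ");
}

// The modal's "Run" button could skip showing the prompt entirely.
async function onRunClicked(
  fields: TripModalFields,
  runPrompt: (prompt: string) => Promise<string>
): Promise<string> {
  const prompt = composePrompt(fields);   // the user never needs to read this
  return runPrompt(prompt);               // the response renders straight into the chat or Canvas
}
```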
Another curiosity I had was how all of this could work with Canvas (or Artifacts, if this were done for Claude). There are many parameters that could be surfaced in the Canvas as a means to tweak the initial output (e.g. a budget slider appearing when you hover over the price in the Canvas, with that one change having a large butterfly effect on the output).
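Here's a rough sketch of how that budget-slider interaction might work under the hood: a single parameter exposed on the Canvas that, when changed, regenerates the output with everything else held constant. onBudgetChange and regenerate are hypothetical names, not Canvas or Artifacts APIs.

```ts
// Hypothetical sketch: one Canvas parameter rippling through the whole output.
interface CanvasParams {
  budget: number;
  [key: string]: string | number;
}

function onBudgetChange(
  newBudget: number,
  currentParams: CanvasParams,
  regenerate: (params: CanvasParams) => Promise<void>
): Promise<void> {
  // One small tweak to the budget has the butterfly effect described above:
  // hotels, restaurants and activities all get re-planned against the new number.
  return regenerate({ ...currentParams, budget: newBudget });
}
```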
The cutting room floor
Various riffs on the previous explorations.