As someone who has tried almost all of the AI browsers that are accessible or in a relatively open beta, plus all the browser control frameworks and agents, I super agree with the notions behind this post.
Curious about your approach, though: is it a literal script, or an LLM being told to follow a deterministic script and only get subjective when necessary? Based on the blog, it looks like the former, but why not the latter? Have the LLM be pseudo-deterministic but still walk through the script step by step, so that it can handle UI changes and adjacent interfaces.
A workflow can have subjective parts too. For example, click on button A if it satisfies certain conditions I wrote in plain English, otherwise click on B.
These subjective elements can be defined with user inputs/prompts.
So a workflow is a literal script with embedded LLM calls for branching, or even for scraping details where a literal script feels tedious.
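To make that concrete, here is a rough sketch of the shape I mean, not the actual implementation. Everything in it is hypothetical: `FakePage` stands in for a real browser page, `keyword_judge` stands in for the LLM call, and the selectors and the condition text are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signature: (plain-English condition, page text) -> yes/no.
# In a real workflow this would wrap an LLM call.
Judge = Callable[[str, str], bool]


@dataclass
class FakePage:
    """Stand-in for a browser page; records clicks instead of driving a browser."""
    text: str
    clicks: List[str] = field(default_factory=list)

    def click(self, selector: str) -> None:
        self.clicks.append(selector)


def keyword_judge(condition: str, page_text: str) -> bool:
    """Toy judge used only for this sketch: ignores the condition and
    checks for a keyword. A real judge would send both strings to an LLM."""
    return "in stock" in page_text.lower()


def run_workflow(page: FakePage, judge: Judge) -> str:
    # Deterministic step: always runs, no LLM involved.
    page.click("#open-product")

    # Subjective step: the branch is decided by a plain-English condition.
    condition = "the product appears to be available for purchase"
    target = "#button-a" if judge(condition, page.text) else "#button-b"
    page.click(target)
    return target


if __name__ == "__main__":
    page = FakePage(text="Blue widget. In stock, ships tomorrow.")
    print(run_workflow(page, keyword_judge), page.clicks)
```

The point of passing the judge in as a parameter is that the script itself stays deterministic and replayable; only the marked branch depends on the model.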
Neat.