So I gave Goose a whirl and I actually really like the approach they are taking, especially because I use Emacs and not VS Code. I would recommend people try it out on an existing project—the results are quite good for small, additive features and even ones that are full stack.
Here's a short writeup of my notes from trying to use it: https://notes.alexkehayias.com/goose-coding-ai-agent/
Which LLM did you use with Goose? That really affects the quality of the outcome.
I'm using gpt-4o, which I think is the default for the OpenAI configuration.
Haven't tried other models yet, but I'd like to see how o3-mini performs once it's been added.
I'm confused what this does that Cursor doesn't. The example it shows on the front page is something Cursor can also easily do.
It advertises that it runs locally and that it is "extensible" but then requires you to set up a remote/external provider as the first step of installation? That's a rather weird use of "local" and "extensible". Do words mean anything anymore?
You went as far as checking how it works (hence "requires you to set up a remote/external provider as the first step").
But you didn't bother checking the very next section in the sidebar, Supported LLM Providers, where Ollama is listed.
The attention span issue today is amusing.
> The attention span issue today is amusing.
I find it rather depressing. I know it's a more complex thing, but it really does feel like people have no time for anything past a few seconds before moving on to the next thing. It shows in the results of their work all too often as well. Some programming requires a very long attention span, and if you don't have one, the result isn't going to be good.
But people really have no time. There is only one brain and thousands of AI startups pitching something every day.
Yeah, no need to try any of them until everyone says "you have to". Which is what happened with Aider and later Cline & Cursor.
Can't you just run Ollama and point it at a localhost endpoint? I don't think it's within scope to reproduce the whole local LLM stack when anyone wanting to do this today can easily use better existing tools to solve that part of it.
Did you not see Ollama?
Yeah, they seem to be referring to the Goose agent/CLI being local, not the models themselves.
You can run Ollama, so no, it's not only Goose itself that's local.
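For what it's worth, a minimal sketch of what "a localhost endpoint" means in practice: Ollama serves an OpenAI-compatible API on localhost, so any client that accepts a base URL can point at it. The model name below is just an example; this shows the endpoint itself, not Goose's own configuration.

    # Talk to a locally running Ollama server via its OpenAI-compatible endpoint.
    # Assumes `ollama serve` is running and the model has been pulled, e.g.
    # `ollama pull llama3.2` (the model name is only an example).
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's default port
        api_key="ollama",  # any non-empty string; Ollama ignores it
    )

    resp = client.chat.completions.create(
        model="llama3.2",
        messages=[{"role": "user", "content": "Say hello from a local model."}],
    )
    print(resp.choices[0].message.content)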
This looks very promising. I only played with it a little bit yesterday, but they really need to polish the UI. Compared to the desktop versions of ChatGPT or Perplexity, it's in a much lower league. Some feedback for the team:
1) use a better font and size
2) allow adjusting shortcuts, with nice defaults that are easy to change
3) integrate with a local Whisper model so I can type with my voice, triggered by a global shortcut
4) change the background to blend with the default OS theme so we don't have a useless, ugly top bar and bottom bar
5) shortcut buttons to easily copy part of a conversation or the full conversation, activate web search, star a conversation so it's easy to find in history, etc.
They should take more inspiration from Raycast, Perplexity, ChatGPT, Arc Browser, Warp's AI UI, and Cursor.
I don't know if anyone finds this useful, but it seems rather useless / not working? I tried it with numerous online and local LLMs for good measure. I installed the computerController extension and tried a couple dozen different variations of "open a website (url) in a browser and save a screenshot". Most of the time it wouldn't even open the website, and I never got a screenshot. At best it opened the website once and saved an HTML file (even though I asked for a screenshot); that was the one time in the bunch it did something instead of complaining it couldn't find AppleScript or whatever on a Linux machine. I qualified the ask by telling it it's on Linux, and it even managed to figure out which distro it was on. Really weird overall.
Running locally is such an important feature; running elsewhere is an instant non-starter for me. I also want the LLM to be able to read the code to build an in-context solution, but not to make changes unless they are explicitly accepted.
Have many other projects put MCP servers (https://modelcontextprotocol.io/introduction) to use since it was announced? I haven't seen very many others using it yet.
Cursor also just got support this week. Overall it’s still early (MCP only came out a couple of months ago) but seeing multiple clients that allow it to be used with non-Anthropic models, and getting good results, makes me bullish for MCP.
My colleague has been working on an MCP server that runs Python code in a sandbox (through https://forevervm.com, shameless plug). I've been using Goose a lot since it was announced last week for testing, and it's rough in some spots, but the results have been good.
This is the page I'd link to:
https://block.github.io/goose/docs/goose-architecture/
Today I decided that what I need is:
- prompt from command line directly to Claude
- suggestions dumped into a file under ./tmp/ (ignored by git)
- iterate on those files
- shuttle test results over to Claude
Getting those files merged with the source files is also important, but I’m not confident in a better way than copy-pasting at this point.
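For concreteness, here is a rough sketch of the first two steps of that workflow, assuming the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment (the model name and output path are placeholders):

    # Take a prompt from the command line, send it to Claude, and dump the
    # suggestion into a git-ignored file under ./tmp/ for later iteration.
    import sys
    import time
    from pathlib import Path

    import anthropic

    prompt = " ".join(sys.argv[1:])
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )

    out = Path("tmp") / f"suggestion-{int(time.time())}.md"
    out.parent.mkdir(exist_ok=True)
    out.write_text(message.content[0].text)
    print(f"wrote {out}")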
Aider is fantastic. Worth a look.
I’ve been playing with it and I don’t like it that much? I’m not sure why. It feels a little buggy and like it’s doing too much.
You're being downvoted for some reason, but I feel the same. It's cool tech, but I've found I often need to revert changes. It's far too aggressive with tweaking files. Maybe I can adjust that in the settings, I don't know. Also, it's expensive as hell to run with Claude Sonnet. It cost me something like $0.01 per action on a small project, which is insane. At this point I still prefer the chat interface.
You can get basically the same experience as Aider with an MCP server like https://github.com/rusiaaman/wcgw. It's not perfect - it sometimes has trouble with the exact find/replace syntax - but it's free to use up to your regular Claude subscription's usage limit. I actually use it more than Cursor, because it's easier to flip back and forth between architecting and editing.
Thanks! I’ll take a look at this. It always kind of annoyed me to pay for API credits on top of a subscription, lol.
$0.01 per action? Yeah, and I've gotten up to 10 cents or so, I think, in a "single" action, but so what? The most I've ever spent with it in one go has been about $5 for a couple of hours (maybe even 6-8) of on-and-off usage. That $5 was completely worth it.
Also you can use local models if you want it to be “free”.
$0.01 per action that can potentially save you tens of minutes to hours of work sounds like a pretty good deal to me, if I compare this to my hourly wage.
> $0.01 per action that can potentially save you tens of minutes to hours of work sounds like a pretty good deal to me
Save tens of hours in one commit?! No model is this good yet[1] - especially not Aider with its recommended models. I fully agree with parent - current SoTA models require lots of handholding in the domains I care about, and the AI chat/pairing setup works much better compared to the AI creating entire commits/PRs before a human gets to look at it.
1. If they were, Zuckerberg would have already announced another round of layoffs.
Ten hours in one commit? Nope, not yet, but it works great for testing out ideas and getting unstuck when trying to decide how to proceed. Instead of having to choose, just do both, or at least try one right away instead of bike-shedding.
I often get hung up on UI; I can't make a decision on what I think will look decent, so I just sort of lock up. Aider lets me focus on the logic and then have the LLM spit out the UI. Since I've given up on projects before because of the UI aspect (losing interest because I don't feel like I'm making progress, or getting overwhelmed by all the UI I'll need to write), this is a huge boon to me.
I'm not incapable of writing UI, I'm just slower at it, so Aider is like having a whiz junior developer who can crank out UI when I need it. I'm even fine rewriting every line of the UI by hand before "shipping"; the LLM just helps me not get stuck on what it should look like. It lets me focus on the feature.
There are many (ignored) requests to automatically pick the relevant files, like Cursor, Copilot, and Cline do, without having to specify them. Not having that makes it much worse than those others. I was a fan before the others existed, but having to add your files manually just isn't a thing anymore.
Hmm, I want to add my own files. This is because in my workflow I often turn to the web UI in order to get a fresh context.
I do like the idea of letting the model ask for source code.
It’s all about attention / context.
I’ve almost finished an interactive file selector inspired by git add interactive, with the addition of a tree display.
I’m giving myself the option to output collated code to a file, or copy it to clipboard, or just hold onto it for the next prompt.
I know aider does this stuff, but because I’m automating my own workflow, it’s worth doing it myself.
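A minimal sketch of the "collate code to a file or clipboard" step described above; the file list and the pbcopy call are illustrative placeholders, not the commenter's actual tool:

    # Join the selected files into one pasteable blob, each prefixed by its path,
    # then either write it to ./tmp/ or put it on the clipboard.
    import subprocess
    from pathlib import Path

    def collate(paths):
        return "\n\n".join(f"# --- {p} ---\n{Path(p).read_text()}" for p in paths)

    blob = collate(["src/app.py", "src/util.py"])  # selection would come from the picker
    Path("tmp").mkdir(exist_ok=True)
    Path("tmp/context.txt").write_text(blob)            # option 1: dump to a file
    subprocess.run(["pbcopy"], input=blob, text=True)   # option 2: clipboard (macOS only)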
Is it the same as Ollama?
I don't know how useful this feedback is, but my immediate reaction to the animation on the front page was "that's literally worse than the alternative".
Because the example given was "change the color of a component".
Now, it's obviously fairly impressive that a machine can go from plain text to identifying a React component and editing it... but the process literally doesn't save me any time.
"Can you change the current colour of headercomponent.tsx to <some color> and increase the size vertical to 15% of vh" is a longer to type sentence then the time it would take to just open the file and do that.
Moreover, the example is in a very "standard" format. What happens if I'm not using styled-components? What happens if that color is set from a function? In fact, none of the examples shown seem game-changing in any way (e.g. the Confluence example is also something a basic script, a workflow, or anything else could do, and it is still essentially "two mouse clicks" rather than writing out a longer English sentence and then, I would guess, waiting substantially longer for inference to run).
I agree the process doesn't save any of our time. However, aren't examples supposed to be simple?
Take the Aider example: https://github.com/Aider-AI/aider It was asked to add a parameter and typing to a function. Would that save us more time? I don't think so, but it's a good peek at what it can do.
Just like any other hello-world example, I suppose.
Examples are supposed to be simple when they illustrate a process we already know works.
With AI the challenge is that we need to convince the reader that the tool will work. So that calls for a different kind of example.
If you don't know how to implement it, how can you be sure the LLM will do it correctly?
If the task is not simple, then break it into simple tasks. Then each of them is as easy as a color change.
On the one hand, this isn’t a great example for you because you already knew how to do that. There’s probably no good way to automate trivial changes that you can make off the top of your head, and have it be faster than just doing it yourself.
I’ve found LLMs most useful for doing things with unfamiliar tooling, where you know what you want to achieve but not exactly how to do it.
On the other hand, it’s an okay test case because you can easily verify the results.
Yeah, the fact that just composing the first prompt would take me longer than just doing the thing is my biggest blocker to using any of these tools on a regular basis.
And that's assuming it gets it right on the first prompt, and not 15 minutes of prompt hacking later, giving up and doing it the old-fashioned way anyway.
The risk of wasted time is higher than the proposed benefit for most of my current use cases. I don't do heaps of glue code; it's mostly business logic and one-off fixes, so I have not found LLMs to be useful day to day at work.
Where it has been useful is when I need to do a task with tech I don't use often. I usually know exactly what I want to do but don't have the myriad arcane details. A great example would be needing to do a complex MongoDB query when I don't normally use Mongo.
Cursor + Sonnet has been great for scaffolding tests.
I'll stub out tests (just a name and `assert true`) and have it fill them in. It usually gets them wrong, but I can fix one and then have it update the rest to match.
Not perfect, but beats writing all the tests myself.
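To make the stub-and-fill loop concrete, a tiny illustrative example of the kind of stubs involved (pytest style; the test names are made up):

    # Stubs handed to the model before asking it to fill in the bodies.
    # Names are hypothetical; the point is just a name plus `assert True`.
    def test_parse_empty_input():
        assert True  # TODO: model fills this in

    def test_parse_rejects_negative_ids():
        assert True  # TODO: model fills this in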
How does this compare to Cline or Cursor's Composer agent?
Are people finding agent frameworks useful, or are they unnecessary dependencies like LangChain?
Regarding the "Extensible" claim: doesn't that completely moot its whole point?