Looks cool. Is it possible to use a local model (like whisper) to avoid leaking conversations to the cloud-based AI?
That’s what’s planned next :)
I was looking into something like this for Linux recently and didn't find anything obviously simple.
(I considered hooking up whisper.cpp and a bit of audio magic to at least get transcription, but it seemed like a fair bit of a pain, and I couldn't think of a nice way to do speaker detection.)
https://github.com/m-bain/whisperX looks promising - I'm hacking away on an always-on transcriber for my notes, for later search & recall. It has support for diarization (the speaker detection you're looking for).
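The basic whisperX flow is only a few calls; this is roughly what I mean, sketched from its README at the time I looked (the exact API may have drifted, and the model name, file path and Hugging Face token are placeholders):

```python
# Rough whisperX sketch: transcribe -> align -> diarize -> speaker-tagged segments.
import whisperx

device = "cuda"            # or "cpu" on a laptop without a decent GPU
audio_file = "meeting.wav" # placeholder path

# 1. Transcribe with a Whisper model
model = whisperx.load_model("large-v2", device)
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

# 2. Align words to precise timestamps
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Diarize and attach speaker labels (needs a Hugging Face token for the
#    pyannote models -- "HF_TOKEN" below is a placeholder)
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

for seg in result["segments"]:
    print(seg.get("speaker", "UNKNOWN"), seg["text"])
```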
I'm currently hacking away on a mix of https://github.com/speaches-ai/speaches + https://github.com/ufal/whisper_streaming though, mostly because my laptop doesn't have a decent GPU, so I stream the audio to a home server instead.
But overall it's pretty simple to do after you wrangle the Python dependencies - all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)
Any good solutions for capturing the audio streams and piping them where they're needed? I.e. both microphone and speakers. I was wondering if I needed to mess with PulseAudio and/or JACK (PipeWire under the hood, I know, but those APIs sit on top and might be clearer).
Never mind: I played around a little, and PulseAudio's CLI (pactl) makes it easy enough to sling some loopback/virtual devices around that you can then read from.
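Roughly like this, driving pactl from Python (the device names are placeholders; list yours with "pactl list short sources"):

```python
# Sketch: create a null sink, mix both the microphone and whatever is playing
# (speakers) into it, then record from the sink's monitor source.
import subprocess

def pactl(*args):
    subprocess.run(["pactl", *args], check=True)

# Virtual sink that will hold the mixed meeting audio
pactl("load-module", "module-null-sink", "sink_name=meeting_mix",
      "sink_properties=device.description=MeetingMix")

# Loop the microphone into the virtual sink (placeholder source name)
pactl("load-module", "module-loopback",
      "source=alsa_input.pci-0000_00_1f.3.analog-stereo",
      "sink=meeting_mix")

# Loop the monitor of the real output (what you hear) into the virtual sink
pactl("load-module", "module-loopback",
      "source=alsa_output.pci-0000_00_1f.3.analog-stereo.monitor",
      "sink=meeting_mix")

# Anything can now read the combined stream from "meeting_mix.monitor",
# e.g.: parecord --device=meeting_mix.monitor meeting.wav
```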
So which are you "hacking away on" in the end?
I don't think this tool can do what native AI transcription integrations can do: track who is speaking. Is there any novel way of addressing that gap?
We did a lot of work at https://www.quillmeetings.com to build a diarization & speaker-recognition pipeline that works locally on Mac and Windows. Basically, we create embeddings of segments of the audio, much like you might create embeddings of text for a RAG system, and cluster them (that glosses over a lot of the "last 80%" of details that took a lot of effort to get working...).
The speaker recognition can't be as accurate as listening to each participant's stream separately, the way Zoom itself can, but it also learns your contacts over time and can recognize voices in ad-hoc in-person meetings, etc., which I've found really magical since we launched it.
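The textbook version of the idea (not our actual pipeline, just a sketch; embed_segment stands in for whatever speaker-embedding model you use, e.g. x-vectors, ECAPA-TDNN, pyannote embeddings) looks something like this:

```python
# Generic diarization-by-clustering sketch: embed each speech segment,
# then cluster the embeddings without fixing the number of speakers.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def embed_segment(audio_chunk: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder: map a chunk of audio to a fixed-size speaker embedding."""
    raise NotImplementedError

def diarize(chunks: list[np.ndarray], distance_threshold: float = 0.7) -> list[int]:
    # Embed each segment and L2-normalise so Euclidean distance behaves like cosine
    emb = np.stack([embed_segment(c) for c in chunks])
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    # Agglomerative clustering with a distance threshold instead of a speaker count
    labels = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        linkage="average",
    ).fit_predict(emb)
    return labels.tolist()  # labels[i] = speaker id for chunks[i]
```

Recognizing known contacts over time then amounts to comparing new cluster centroids against the embeddings you've stored per contact.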
Ah yes, a locally-run, mostly-accurate speaker recognition pipeline that isn't open source. Love to see cool features locked away while the rest of us plebs make do with whatever scraps the OSS world has managed to build. But hey, at least it kind of works, so you can enjoy your slightly-wrong diarization in private.
Truly the future of meetings.
not open source :/
I'm not sure if you have any interest in porting this to Mac, but in case you do, here's some native Swift code that might help. It was originally built by a friend and me for Electron, but the repo should act as a general template. It's completely open source, and if you (or anyone) need any license modifications for any reason, just reach out: https://github.com/O4FDev/electron-system-audio-recorder/blo...
What does “no bot” mean? I don’t see any elaboration, tho maybe I’m just blind!
There's not a "bot" that needs to attend the meeting and show up in the list of attendees, thus giving away that the call is being recorded. Otter.ai, for instance, shows up as "Otter" (or another name) on a Zoom call when it is recording and taking notes.
Oh, so it is for more "seamlessly" helping people commit the crime of wiretapping in two-party consent jurisdictions, like California?
If you don't like people knowing you are recording them, you probably have a consent issue.
You could have said this exact same thing without it sounding like a personal attack, but you chose to be unkind instead. I wonder why?
Because crime is bad and I don't have to be nice to those who support criminals doing crimes. If your marketing differentiator vs all the AI recording bot products is that with your product, you can record people without them knowing you are recording them... then your business model is literally to facilitate crime in many jurisdictions, including California.
Let me be clear: if you have a bot capturing audio in a call you have with someone in California, and you do not tell that person you are recording them, then you have committed a felony, even if you are not in California.
And what is it about you that makes you so allergic to me calling this out? I wonder why....
See, I can do that too. How does that feel? We having a good conversation here?
Are you an attorney? I would be careful making such sweeping statements unless you are. A transcription is not a wiretap, and it's not obvious to me that an anti-recording law would apply here. Plus, if you're on a call with coworkers, they likely already know that transcription is taking place, even if you don't explicitly say so at the start of each meeting. This is why you should be more kind - you might not know all these things.
Thanks for your comments. I wish more people had a personal policy of not putting up with those who commit or endorse crime/fraud/bullshit. The world would be a better place.
I can think of at least five reasons to use this that aren't illegal, including the fact that the law doesn't work the way this person thinks it does.
The way he phrased it has turned me from someone eager to discuss the potential uses of this into someone unwilling to engage with him to discuss it any further. Even if it turns out that he's absolutely 100% correct (he's not) I'll talk to someone else about it instead.
I suspect this person regularly has "conversations" where the other party suddenly becomes silent, and he misinterprets that as a "victory" instead of the other person deciding he isn't worth the trouble.
I always ask if I can transcribe using an AI tool regardless of jurisdiction. Not sure what the other commenter’s intentions were but just throwing my two cents in.
I prefer non-bot transcription tools solely because they’re not a nuisance during the meeting — they take up valuable screen real estate and provide no input during the meeting so I’d rather them be invisible.
Whether it is actually a crime for a person in a one-party consent jurisdiction recording a call with a person in a two-party consent jurisdiction is not a consistently settled issue. At least in US courts, dunno about elsewhere.
Sometimes the courts have sided with "the stricter jurisdiction's law applies", while other times the courts have sided with "the law where the recording was made applies". The federal law is not any clearer: it makes one-party consent the default and lets states override it, but it offers no further guidance. I suspect this will someday be addressed in the Supreme Court.
If one state could make something illegal in the other 49 Florida would have already made life very painful for the blue states.
Extraterritorial effects are usually limited in scope for this kind of concern. If I had to place a bet, I'd say this is the main line of reasoning the current Supreme Court would use to side with "the law where the recording happens" as well. I may just be advertising my bias, though, as that's also the conclusion I think makes sense personally.
Until that actually gets reviewed by a higher court (or more explicit higher law comes about), what each regional court concludes remains the reality for cases in that region, though.
I'm not a lawyer myself; I just had to spend some time with the company's lawyers regarding this topic recently ("yay" for filling in to manage internal IT on the side).
Should your concern lie with individuals transcribing their own conversations, or with mass surveillance and wiretapping actively being executed by a broad range of official and corporate entities without your consent?
Woah, that's a classic logical fallacy you got there, buddy. I can't be upset about A because B is related and also bad. One of the all-time great ways to derail an argument.
Shouldn't you be more concerned about starving children or something than my post?
See how productive of a conversation we are having when we both use these fallacies?
You're welcome to care about as many things as you desire, at the same time, friend. It's a question of perspective and relative importance. The reply didn't comment about A and B and C, only A - implying A was the most important thing to consider and discuss.
Not affiliated, but I'd guess it doesn't have a "bot" account join the zoom/meets call
The other meeting note-takers usually have a bot join the meeting to take notes, which seemed a bit strange to me.
I’m using Granola for macOS and it’s limited to that platform. Hoping this is a good windows alternative.
Wondering if anyone out there has an OSS macOS client similar to this one so I can ditch payware.
Something I find annoying with automatic transcriptions and summaries, like the one built into Teams, is that they lack the context necessary to properly interpret what's being said. For example, if I have a meeting discussing products, abbreviations, or systems with "internal" names, it can't discern them, or statistically rejects them, replacing them with its best guess at a dictionary word instead. So say we have a long call involving frequent mentions of a metric called pNet, pronounced "Peenet" in the meeting. Then you end up with a transcription of a bunch of guys having a discussion about penises. Hilarious, the first few times. OK, always hilarious, but not so useful.
Being able to set the system prompt for these transcriptions would be very useful. Like "You are a friendly bot transcribing meetings at a software company. Some common terms and abbreviations you'll encounter are...".
My favourite was Kubernetes in our meeting being referred to as Cuban Eighties. ⎈
Anecdotally, if you have an accent and want to reference Maltese Falcon[1], your voice recognition software may understand it as “Maltese f* off”.
[1]: https://en.m.wikipedia.org/wiki/The_Maltese_Falcon_(1941_fil...
Perhaps these will be flagged for the CIA or DEA to investigate due to illegal importation of Cubans from the enemy!
This should be trivially solvable with a glossary as context, as you suggest. I bet the above repo would love a PR, too!
But the error happens in the audio-to-text part, so a text prompt won't solve it. The way to fix it is probably fine-tuning the underlying audio-to-text model.
Doing audio-to-text requires having a statistical model for what word or phrase a piece of sound is most likely to be. Without context, you can't do better than ranking the most likely candidates where a common word is more likely than an uncommon one. Having a task-specific dictionary at that point would help.
One could also imagine doing it at the summary step, where the AI could simply be asked to do phonetic analysis: "Here is a transcription of a meeting. Here is a list of terms/names/participants, etc. Given the transcription, the meeting context/topics, and assuming the transcriber has made errors, replace similar-sounding words and terms with more likely ones from the context."
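As a sketch of what I mean, something like this (using the OpenAI Python client purely as an example; the model name and glossary are placeholders, and any chat-capable LLM would do):

```python
# Post-correction sketch: hand an LLM the raw transcript plus a glossary and
# ask it to repair likely mis-hearings of internal terms.
from openai import OpenAI

client = OpenAI()

GLOSSARY = ["pNet (pronounced 'pee-net', an internal metric)", "Kubernetes"]

def correct_transcript(transcript: str) -> str:
    prompt = (
        "Here is a meeting transcript produced by speech-to-text software. "
        "The transcriber does not know our internal terms, so it has likely "
        "substituted similar-sounding dictionary words. Using the glossary, "
        "replace mis-heard words with the most plausible term and return only "
        f"the corrected transcript.\n\nGlossary: {', '.join(GLOSSARY)}\n\n"
        f"Transcript:\n{transcript}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```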
Whisper accepts an initial prompt (initial_prompt in the open-source package, prompt in the hosted API) that biases the decoder toward the terms you feed it.
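For example, with the open-source openai-whisper package (the glossary text here is just illustrative):

```python
# Bias Whisper's decoder toward domain terms via the initial prompt.
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "meeting.wav",
    initial_prompt=(
        "Meeting at a software company. Common internal terms: "
        "pNet (pronounced 'pee-net'), Kubernetes, whisperX."
    ),
)
print(result["text"])
```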
Gong has such a feature. It’ll even expand out acronyms the first time they show up in the transcript.
Looks awesome, love that it is a local native app
>transcribing it using the Groq API
It's not really local: it sends all the audio to some cloud AI API.
I'm not familiar with Groq, but it looks like:
https://sdk.vercel.ai/providers/ai-sdk-providers/groq
Some open models support it. It seems, in theory, that you could use your own cloud AI then, right?
That's true. The plan is to update it to transcribe locally next.
Microsoft Teams already provides similar built-in features, along with translation, and I have to say it is one of the rare AI tools from Microsoft that makes sense and actually works; I had a good experience using it for reviewing meetings in a non-English language. It's not hard to imagine that this will be a standard feature of all mainstream video-conference software. I wonder what the place for these tools will be.
I've thoroughly enjoyed not having to anoint a "note taker" in my meetings in the last few months.
There's still a surprising lack of good video call recording services that can be controlled programmatically, unlike the end-to-end SaaS apps like Read.ai or Otter.ai.
The only open-source one I could find is Amurex, which looks promising. It only supports Google Meet for now, does it a bit differently with a Chrome extension, and is generally rather immature, but I do wish them the best.
The only API services available are Recall.ai and MeetingBaaS, they both support the big three (Google Meet, Microsoft Teams and Zoom), but they are rather expensive at $0.5 - $1 per hour. The calendar-syncing feature is also locked behind enterprise tiers with additional monthly fees in the hundreds, and it is rather important for real-world use.
Hey there
The creator of Amurex here. Thank you for the kind words :D More platform support is coming very soon ;) (read: next week)
> The only API services available are Recall.ai and MeetingBaaS, they both support the big three (Google Meet, Microsoft Teams and Zoom), but they are rather expensive at $0.5 - $1 per hour.
seems like someone has told you our internal roadmap xD but I am glad to see we are on the right track to solve the problem :D
You are doing great work, and I do think making it open-source is a smart strategic choice. There's still so much potential for building AI intelligence products on top of video call recordings, and right now you are offering the only practical foundation to build such systems on.
I've been keeping a close eye because $1/h is unsustainable for what we are building, and there's no good reason why it should cost so much. It's manageable for early traction, but soon we'll need to consider either building all those integrations ourselves or building on top of Amurex. We might be contributing soon.
I did see in GitHub that Teams support was almost done, exciting! Do you plan to continue with the browser extension model, or are you also looking for solutions to record meetings that happen in the Teams/Zoom native client?
I think this is why most companies do it by creating a bot that joins the meeting, it's also great free advertising for them. Of course it's a bit awkward for the user, but it's becoming a normal thing, and ethically it's better to be explicit about the fact you are recording.
> You are doing great work, and I do think making it open-source is a smart strategic choice. There's still so much potential for building AI intelligence products on top of video call recordings, and right now you are offering the only practical foundation to build such systems on.
Thank you :D
> I've been keeping a close eye because $1/h is unsustainable for what we are building, and there's no good reason why it should cost so much. It's manageable for early traction, but soon we'll need to consider either building all those integrations ourselves or building on top of Amurex. We might be contributing soon.
Sounds great! We are super happy to support all the integrations. If you can message me on discord, I'd be super keen to hear what you have to say.
> I did see in GitHub that Teams support was almost done, exciting! Do you plan to continue with the browser extension model, or are you also looking for solutions to record meetings that happen in the Teams/Zoom native client?
Coming soon ;)
> I think this is why most companies do it by creating a bot that joins the meeting, it's also great free advertising for them. Of course it's a bit awkward for the user, but it's becoming a normal thing, and ethically it's better to be explicit about the fact you are recording.
The problem with bots is that:
- first, they are annoying;
- second, I have a tendency to reject all the bots joining my meetings because they are annoying, which renders the bot products practically useless.
And you raise a good point about ethics: we expect users to be grown up about their decisions and to act according to their state's laws.
I grew up reading about, and being influenced by, the ideals of freedom in FOSS software. I don't really want to impose our own "laws" on a user if their state says otherwise.
Creator of MeetingBaaS here (sorry for the double ping). Pricing actually starts lower than $1/hour, at $0.69/hour, and it scales down quite fast.
Unfortunately cloud infrastructure has a cost :/
Hey there, I'm building an open-source Recall.ai alternative at https://github.com/noah-duncan/attendee, designed for convenient self-hosting. It's fairly immature, but other engineers are starting to contribute and things are picking up. Pretty sure it's the only open-source example of a Google Meet bot that can extract audio, video, and a transcript, and speak in the meeting.
Hey :)
Creator of MeetingBaaS here.
We're actually thinking of open-sourcing our bots too!
Has anyone done this on the Mac? I hate sending audio to Otter; it creeps me out.
Granola. Best meeting app I've used. I have a notepad where I can add my own markup, and it intelligently fills in the notes I wrote.
E.g. I put in bullet points with something like "updates from Steve?" and do that for everyone during our check-in. When the meeting ends, it takes their conversation from the transcript and fills in my markup with the notes.
I've attended meetings where I had zero participation and focused on doing something else during the meeting. When it's over, it gives me a detailed summary of the meeting. It felt like I had an assistant taking detailed, ordered notes for me. It's almost like that scene from the movie Back to School, where Rodney Dangerfield sends his secretary to stenograph the lecture so he doesn't have to attend, and she gets called out by the professor. Felt just like that kind of transcribing.
Spellar.ai does a great job. There are others out there for Mac, but I like Spellar's calendar integration.
Interestingly, their initial raison d'être was to help with English pronunciation and speaking speed, giving you real-time feedback. They've downplayed this in recent releases, but the functionality is still there. Though I'm a native English speaker, it always flagged me as pronouncing words incorrectly, even though I've got little regional accent (I've been told this by others, so it's not just my own opinion; my mother was a speech therapist, hence the lack of accent).
We do this at quillmeetings.com: the audio stays on your device and is transcribed by Whisper. We also do speaker splitting and recognition with a combination of models. If you share or sync notes/meetings, they are end-to-end encrypted.
FYI, the transcript-only product is free forever (it's local, so why not?), but generating AI notes, interpreting screenshots if you enable that, etc. are in the Pro plan and do require using a cloud API.
https://speechpulse.com does fully local audio transcription. The UI and settings are not the most intuitive, but it works fairly well and they are making constant updates.
As an additional note, Spellar does let you bring your own OpenAI key but does not allow for purely local processing. You've still got to send the audio out for transcription and interpretation.
Also, I have no affiliation with Spellar, just a user.