Looks cool. Is it possible to use a local model (like whisper) to avoid leaking conversations to the cloud-based AI?
That’s what’s planned next :)
I was looking into something like this for Linux recently and didn't find anything obviously simple.
(I considered hooking up whisper.cpp and a bit of audio magic to at least get transcription, but it seemed like a fair bit of a pain, and I couldn't think of a nice way to do speaker detection.)
https://github.com/m-bain/whisperX looks promising - I'm hacking away on an always-on transcriber for my notes, for later search & recall. It has support for diarization (the speaker detection you're looking for).
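The basic whisperX flow is only a few calls; this is roughly what I mean, sketched from its README at the time I looked (the exact API may have drifted, and the model name, file path and Hugging Face token are placeholders):

```python
# Rough whisperX sketch: transcribe -> align -> diarize -> speaker-tagged segments.
import whisperx

device = "cuda"            # or "cpu" on a laptop without a decent GPU
audio_file = "meeting.wav" # placeholder path

# 1. Transcribe with a Whisper model
model = whisperx.load_model("large-v2", device)
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

# 2. Align words to precise timestamps
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Diarize and attach speaker labels (needs a Hugging Face token for the
#    pyannote models -- "HF_TOKEN" below is a placeholder)
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

for seg in result["segments"]:
    print(seg.get("speaker", "UNKNOWN"), seg["text"])
```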
I'm currently hacking away on a mix of https://github.com/speaches-ai/speaches + https://github.com/ufal/whisper_streaming though, mostly because my laptop doesn't have a decent GPU, so I stream the audio to a home server instead.
But overall it's pretty simple to do after you wrangle the Python dependencies - all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)
Any good solutions for capturing the audio streams and piping them where they're needed? I.e. both microphone and speakers. I was wondering if I needed to mess with PulseAudio and/or JACK (PipeWire under the hood, I know, but those APIs sit on top and might be clearer).
Never mind: I played around a little, and PulseAudio's CLI (pactl) makes it easy enough to sling some loopback/virtual devices around that you can then read from.
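Roughly like this, driving pactl from Python (the device names are placeholders; list yours with "pactl list short sources"):

```python
# Sketch: create a null sink, mix both the microphone and whatever is playing
# (speakers) into it, then record from the sink's monitor source.
import subprocess

def pactl(*args):
    subprocess.run(["pactl", *args], check=True)

# Virtual sink that will hold the mixed meeting audio
pactl("load-module", "module-null-sink", "sink_name=meeting_mix",
      "sink_properties=device.description=MeetingMix")

# Loop the microphone into the virtual sink (placeholder source name)
pactl("load-module", "module-loopback",
      "source=alsa_input.pci-0000_00_1f.3.analog-stereo",
      "sink=meeting_mix")

# Loop the monitor of the real output (what you hear) into the virtual sink
pactl("load-module", "module-loopback",
      "source=alsa_output.pci-0000_00_1f.3.analog-stereo.monitor",
      "sink=meeting_mix")

# Anything can now read the combined stream from "meeting_mix.monitor",
# e.g.: parecord --device=meeting_mix.monitor meeting.wav
```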
So which are you "hacking away on" in the end?
I don't think this tool can do what native AI transcription integrations can do: track who is speaking. Is there any novel way of addressing that gap?
We did a lot of work at https://www.quillmeetings.com to build a diarization & speaker-recognition pipeline that works locally on Mac and Windows. Basically, we create embeddings of segments of the audio, much like you might create embeddings of text for a RAG system, and cluster them (that glosses over a lot of the "last 80%" of details that took a lot of effort to get working...).
The speaker recognition can't be as accurate as listening to each participant's stream separately, the way Zoom itself can, but it also learns your contacts over time and can recognize voices in ad-hoc in-person meetings, etc., which I've found really magical since we launched it.
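The textbook version of the idea (not our actual pipeline, just a sketch; embed_segment stands in for whatever speaker-embedding model you use, e.g. x-vectors, ECAPA-TDNN, pyannote embeddings) looks something like this:

```python
# Generic diarization-by-clustering sketch: embed each speech segment,
# then cluster the embeddings without fixing the number of speakers.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def embed_segment(audio_chunk: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder: map a chunk of audio to a fixed-size speaker embedding."""
    raise NotImplementedError

def diarize(chunks: list[np.ndarray], distance_threshold: float = 0.7) -> list[int]:
    # Embed each segment and L2-normalise so Euclidean distance behaves like cosine
    emb = np.stack([embed_segment(c) for c in chunks])
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    # Agglomerative clustering with a distance threshold instead of a speaker count
    labels = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        linkage="average",
    ).fit_predict(emb)
    return labels.tolist()  # labels[i] = speaker id for chunks[i]
```

Recognizing known contacts over time then amounts to comparing new cluster centroids against the embeddings you've stored per contact.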
Ah yes, a locally-run, mostly-accurate speaker recognition pipeline that isn't open source. Love to see cool features locked away while the rest of us plebs make do with whatever scraps the OSS world has managed to build. But hey, at least it kind of works, so you can enjoy your slightly-wrong diarization in private.
Truly the future of meetings.
not open source :/
I'm not sure if you have any interest in porting this to Mac, but in case you do, here's some native Swift code that might help. It was originally built by a friend and me for Electron, but the repo should act as a general template. It's completely open source, and if you (or anyone) need any license modifications for any reason, just reach out: https://github.com/O4FDev/electron-system-audio-recorder/blo...
What does “no bot” mean? I don’t see any elaboration, tho maybe I’m just blind!
There's not a "bot" that needs to attend the meeting and show up in the list of attendees, thus giving away that the call is being recorded. Otter.ai, for instance, shows up as "Otter" (or another name) on a Zoom call when it is recording and taking notes.
Oh, so it is for more "seamlessly" helping people commit the crime of wiretapping in two-party consent jurisdictions, like California?
If you don't like people knowing you are recording them, you probably have a consent issue.
You could have said this exact same thing without it sounding like a personal attack, but you chose to be unkind instead. I wonder why?
Because crime is bad and I don't have to be nice to those who support criminals doing crimes. If your marketing differentiator vs all the AI recording bot products is that with your product, you can record people without them knowing you are recording them... then your business model is literally to facilitate crime in many jurisdictions, including California.
Let me be clear: if you have a bot capturing audio in a call you have with someone in California, and you do not tell that person you are recording them, then you have committed a felony, even if you are not in California.
And what is it about you that makes you so allergic to me calling this out? I wonder why....
See, I can do that too. How does that feel? We having a good conversation here?
Are you an attorney? I would be careful making such sweeping statements unless you are. A transcription is not a wiretap, and it's not obvious to me that an anti-recording law would apply here. Plus, if you're on a call with coworkers, they likely already know that transcription is taking place, even if you don't explicitly say so at the start of each meeting. This is why you should be more kind - you might not know all these things.
Thanks for your comments. I wish more people had a personal policy of not putting up with those who commit or endorse crime/fraud/bullshit. The world would be a better place.
I can think of at least five reasons to use this that aren't illegal, including the fact that the law doesn't work the way this person thinks it does.
The way he phrased it has turned me from someone eager to discuss the potential uses of this into someone unwilling to engage with him to discuss it any further. Even if it turns out that he's absolutely 100% correct (he's not) I'll talk to someone else about it instead.
I suspect this person regularly has "conversations" where the other party suddenly becomes silent, and he misinterprets that as a "victory" instead of the other person deciding he isn't worth the trouble.
I always ask if I can transcribe using an AI tool regardless of jurisdiction. Not sure what the other commenter’s intentions were but just throwing my two cents in.
I prefer non-bot transcription tools solely because they’re not a nuisance during the meeting — they take up valuable screen real estate and provide no input during the meeting so I’d rather them be invisible.
Whether it is actually a crime for a person in a one-party consent jurisdiction recording a call with a person in a two-party consent jurisdiction is not a consistently settled issue. At least in US courts, dunno about elsewhere.
Sometimes the courts have sided with "the stricter jurisdiction's law applies", while other times the courts have sided with "the law where the recording was made applies". The federal law is not any clearer: it makes one-party consent the default and lets states override it, but it offers no further guidance. I suspect this will someday be addressed in the Supreme Court.
If one state could make something illegal in the other 49 Florida would have already made life very painful for the blue states.
Extraterritorial effects are usually limited in scope for this kind of concern. If I had to place a bet, I'd say this is the main line of reasoning the current Supreme Court would use to side with "the law where the recording happens" as well. I may just be advertising my bias, though, as that's also the conclusion I think makes sense personally.
Until that actually gets reviewed by a higher court (or more explicit higher law comes about), what each regional court concludes remains the reality for cases in that region, though.
I'm not a lawyer myself; I just had to spend some time with the company's lawyers regarding this topic recently ("yay" for filling in to manage internal IT on the side).
Should your concern lie with individuals transcribing their own conversations, or with mass surveillance and wiretapping actively being executed by a broad range of official and corporate entities without your consent?
Woah, that's a classic logical fallacy you got there, buddy. I can't be upset about A because B is related and also bad. One of the all-time great ways to derail an argument.
Shouldn't you be more concerned about starving children or something than my post?
See how productive of a conversation we are having when we both use these fallacies?
You're welcome to care about as many things as you desire, at the same time, friend. It's a question of perspective and relative importance. The reply didn't comment about A and B and C, only A - implying A was the most important thing to consider and discuss.
Not affiliated, but I'd guess it doesn't have a "bot" account join the zoom/meets call
The other meeting note-takers usually have a bot join the meeting to take notes, which seemed a bit strange to me.
I’m using Granola for macOS and it’s limited to that platform. Hoping this is a good windows alternative.
Wondering if anyone out there has an OSS macOS client similar to this one so I can ditch payware.
Something I find annoying with automatic transcriptions and summaries, like the one built into Teams, is that they lack the context necessary to properly interpret what's being said. For example, if I have a meeting discussing products, abbreviations, or systems with "internal" names, it can't discern them, or statistically rejects them, replacing them with its best guess at a dictionary word instead. So say we have a long call involving frequent mentions of a metric called pNet, pronounced "Peenet" in the meeting. Then you end up with a transcription of a bunch of guys having a discussion about penises. Hilarious, the first few times. OK, always hilarious, but not so useful.
Being able to set the system prompt for these transcriptions would be very useful. Like "You are a friendly bot transcribing meetings at a software company. Some common terms and abbreviations you'll encounter are...".
My favourite was Kubernetes in our meeting being referred to as Cuban Eighties. ⎈
Anecdotally, if you have an accent and want to reference Maltese Falcon[1], your voice recognition software may understand it as “Maltese f* off”.
[1]: https://en.m.wikipedia.org/wiki/The_Maltese_Falcon_(1941_fil...
Perhaps these will be flagged for the CIA or DEA to investigate due to illegal importation of Cubans from the enemy!
This should be trivially solvable with a glossary as context, as you suggest. I bet the above repo would love a PR, too!
But the error happens in the audio-to-text part, so a text prompt won't solve it. The way to fix it is probably fine-tuning the underlying audio-to-text model.
Doing audio-to-text requires having a statistical model for what word or phrase a piece of sound is most likely to be. Without context, you can't do better than ranking the most likely candidates where a common word is more likely than an uncommon one. Having a task-specific dictionary at that point would help.
One could also imagine doing it at the summary step, where the AI could simply be asked to do phonetic analysis: "Here is a transcription of a meeting. Here is a list of terms/names/participants, etc. Given the transcription, the meeting context/topics, and assuming the transcriber has made errors, replace similar-sounding words and terms with more likely ones from the context."
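As a sketch of what I mean, something like this (using the OpenAI Python client purely as an example; the model name and glossary are placeholders, and any chat-capable LLM would do):

```python
# Post-correction sketch: hand an LLM the raw transcript plus a glossary and
# ask it to repair likely mis-hearings of internal terms.
from openai import OpenAI

client = OpenAI()

GLOSSARY = ["pNet (pronounced 'pee-net', an internal metric)", "Kubernetes"]

def correct_transcript(transcript: str) -> str:
    prompt = (
        "Here is a meeting transcript produced by speech-to-text software. "
        "The transcriber does not know our internal terms, so it has likely "
        "substituted similar-sounding dictionary words. Using the glossary, "
        "replace mis-heard words with the most plausible term and return only "
        f"the corrected transcript.\n\nGlossary: {', '.join(GLOSSARY)}\n\n"
        f"Transcript:\n{transcript}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```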
Whisper accepts an initial prompt (initial_prompt in the open-source package, prompt in the hosted API) that biases the decoder toward the terms you feed it.
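For example, with the open-source openai-whisper package (the glossary text here is just illustrative):

```python
# Bias Whisper's decoder toward domain terms via the initial prompt.
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "meeting.wav",
    initial_prompt=(
        "Meeting at a software company. Common internal terms: "
        "pNet (pronounced 'pee-net'), Kubernetes, whisperX."
    ),
)
print(result["text"])
```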
Gong has such a feature. It’ll even expand out acronyms the first time they show up in the transcript.
Looks awesome, love that it is a local native app
>transcribing it using the Groq API
It's not really local: it sends all the audio to some cloud AI API.
I'm not familiar with Groq, but it looks like:
https://sdk.vercel.ai/providers/ai-sdk-providers/groq
Some open models support it. It seems, in theory, that you could use your own cloud AI then, right?
That's true. The plan is to update it to transcribe locally next.
Microsoft Teams already provides similar built-in features, along with translation, and I have to say it is one of the rare AI tools from Microsoft that makes sense and actually works; I had a good experience using it for reviewing meetings in a non-English language. It's not hard to imagine that this will be a standard feature of all mainstream video-conference software. I wonder what the place for these tools will be.
I've thoroughly enjoyed not having to anoint a "note taker" in my meetings in the last few months.
There's still a surprising lack of good video call recording services that can be controlled programmatically, unlike the end-to-end SaaS apps like Read.ai or Otter.ai.
The only open-source one I could find is Amurex, which looks promising. It only supports Google Meet for now, does it a bit differently with a Chrome extension, and is generally rather immature, but I do wish them the best.
The only API services available are Recall.ai and MeetingBaaS, they both support the big three (Google Meet, Microsoft Teams and Zoom), but they are rather expensive at $0.5 - $1 per hour. The calendar-syncing feature is also locked behind enterprise tiers with additional monthly fees in the hundreds, and it is rather important for real-world use.
Hey there
The creator of Amurex here. Thank you for the kind words :D More platform support is coming very soon ;) (read: next week)
> The only API services available are Recall.ai and MeetingBaaS, they both support the big three (Google Meet, Microsoft Teams and Zoom), but they are rather expensive at $0.5 - $1 per hour.
seems like someone has told you our internal roadmap xD but I am glad to see we are on the right track to solve the problem :D
You are doing great work, and I do think making it open-source is a smart strategic choice. There's still so much potential for building AI intelligence products on top of video call recordings, and right now you are offering the only practical foundation to build such systems on.
I've been keeping a close eye because $1/h is unsustainable for what we are building, and there's no good reason why it should cost so much. It's manageable for early traction, but soon we'll need to consider either building all those integrations ourselves or building on top of Amurex. We might be contributing soon.
I did see in GitHub that Teams support was almost done, exciting! Do you plan to continue with the browser extension model, or are you also looking for solutions to record meetings that happen in the Teams/Zoom native client?
I think this is why most companies do it by creating a bot that joins the meeting, it's also great free advertising for them. Of course it's a bit awkward for the user, but it's becoming a normal thing, and ethically it's better to be explicit about the fact you are recording.
> You are doing great work, and I do think making it open-source is a smart strategic choice. There's still so much potential for building AI intelligence products on top of video call recordings, and right now you are offering the only practical foundation to build such systems on.
Thank you :D
> I've been keeping a close eye because $1/h is unsustainable for what we are building, and there's no good reason why it should cost so much. It's manageable for early traction, but soon we'll need to consider either building all those integrations ourselves or building on top of Amurex. We might be contributing soon.
Sounds great! We are super happy to support all the integrations. If you can message me on discord, I'd be super keen to hear what you have to say.
> I did see in GitHub that Teams support was almost done, exciting! Do you plan to continue with the browser extension model, or are you also looking for solutions to record meetings that happen in the Teams/Zoom native client?
Coming soon ;)
> I think this is why most companies do it by creating a bot that joins the meeting, it's also great free advertising for them. Of course it's a bit awkward for the user, but it's becoming a normal thing, and ethically it's better to be explicit about the fact you are recording.
The problem with bots is that:
- first, they are annoying;
- second, I have a tendency to reject all the bots joining my meetings because they are annoying, which renders the bot products practically useless.
And you raise a good point about ethics: we expect users to be grown up about their decisions and to act according to their state's laws.
I grew up reading about, and being influenced by, the ideals of freedom in FOSS software. I don't really want to impose our own "laws" on a user if their state says otherwise.
Creator of MeetingBaaS here (sorry for the double ping). Pricing actually starts lower than $1/hour, at $0.69/hour, and it scales down quite fast.
Unfortunately cloud infrastructure has a cost :/
Hey there, I'm building an open-source Recall.ai alternative at https://github.com/noah-duncan/attendee, designed for convenient self-hosting. It's fairly immature, but other engineers are starting to contribute and things are picking up. Pretty sure it's the only open-source example of a Google Meet bot that can extract audio, video, and a transcript, and speak in the meeting.
Hey :)
Creator of MeetingBaaS here.
We're actually thinking of open-sourcing our bots too!
Has anyone done this on the Mac? I hate sending audio to Otter; it creeps me out.
Granola. Best meeting app I've used. I have a notepad where I can add my own markup, and it intelligently fills in the notes I wrote.
E.g. I put in bullet points with something like "updates from Steve?" and do that for everyone during our check-in. When the meeting ends, it takes their conversation from the transcript and fills in my markup with the notes.
I've attended meetings where I had zero participation and focused on doing something else during the meeting. When it's over, it gives me a detailed summary of the meeting. It felt like I had an assistant taking detailed, ordered notes for me. It's almost like that scene from the movie Back to School, where Rodney Dangerfield sends his secretary to stenograph the lecture so he doesn't have to attend, and she gets called out by the professor. Felt just like that kind of transcribing.
Spellar.ai does a great job. There are others out there for Mac, but I like Spellar's calendar integration.
Interestingly, their initial raison d'être was to help with English pronunciation and speaking speed, giving you real-time feedback. They've downplayed this in recent releases, but the functionality is still there. Though I'm a native English speaker, it always flagged me as pronouncing words incorrectly, even though I've got little regional accent (I've been told this by others, so it's not just my own opinion; my mother was a speech therapist, hence the lack of accent).
We do this at quillmeetings.com: the audio stays on your device and is transcribed by Whisper. We also do speaker splitting and recognition with a combination of models. If you share or sync notes/meetings, they are end-to-end encrypted.
FYI, the transcript-only product is free forever (it's local, so why not?), but generating AI notes, interpreting screenshots if you enable that, etc. are in the Pro plan and do require using a cloud API.
https://speechpulse.com does fully local audio transcription. The UI and settings are not the most intuitive, but it works fairly well and they are making constant updates.
As an additional note, Spellar does let you bring your own OpenAI key but does not allow for purely local processing. You've still got to send the audio out for transcription and interpretation.
Also, I have no affiliation with Spellar, just a user.