I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models

213 points | by iliashad 7 hours ago

42 comments

asenna 3 hours ago
Funny, this is almost EXACTLY what I did a few days ago on the same machine using very similar techniques, and it was on the front page of HN as well:
https://news.ycombinator.com/item?id=48222733 https://blog.simbastack.com/indexed-a-year-of-video-locally/
I wasn't familiar with your project though, interesting stuff.
I'm trying to add more photography related features to Framedex but yeah there's so much we can do locally, exciting times.
justinram11 34 minutes ago
Something I've enjoyed more than I expected is Google and Apple Photos sending me photo memories and compilations of various things in my life and my kids' lives over the last decade.
I'm really bullish on taking more video of my kids, with the thought that it will become easier and easier for AI to put them into little compilations I can enjoy later.
robrain 3 hours ago
DaVinci 21 has indexing built-in (AI IntelliSearch). Not to diminish the work you did, but this is now available to many users (probably only Studio users since it has AI in the name)
- iliashad 3 hours ago
  Yes, I didn’t look at it. But does it upload your videos to the cloud or process them locally? And does it let you provide custom face data to help label faces in your videos?
  I think Adobe Premiere Pro has it as well, but cloud-processed.
  - teovall 3 hours ago
    The AI features in DaVinci Resolve are all processed locally. It does not currently have face tagging.
    - robrain 3 hours ago
      Haven’t tried it yet, and I don’t know if it matches OP’s requirements, but the blurb says “You can even search for individual faces”
      https://www.blackmagicdesign.com/products/davinciresolve/wha...
    - iliashad 3 hours ago
      That’s great to know, thank you!
Beijinger 6 hours ago
Does it work for porn collections too?
- pduggishetti 6 hours ago
  You'll need a LoRA for this; porn content rejection is heavy. Or you'll need an abliterated model, though I'm not sure vision also works.
  You might want to add something like a YOLO fine-tune to detect scenes, plus face recognition too.
  - dotancohen an hour ago
    For GP's purpose, can face recognition techniques be repurposed for, um, other body parts recognition? Sometimes the actresses are facing away from camera. There are exposed lips, if that helps.
  - vorticalbox 4 hours ago
    Vision still works perfectly fine in abliterated models.
    - pduggishetti 3 hours ago
      Never tried any of this for porn, just speaking out how I would go about it tbh!
- sarjann 5 hours ago
  Asking the important questions
  - fhdkweig 4 hours ago
    The internet is for porn. https://www.youtube.com/watch?v=LTJvdGcb7Fs
- iliashad 3 hours ago
  Why is it always the same question? Haha. I posted my project on Reddit and got the same one, haha.
- lifestyleguru 5 hours ago
  Last time I tried Whisper, it hallucinated an elaborate conversation from sounds of slapping and moaning, and it took minutes to spit out every single line of it.
  - 3eb7988a1663 4 hours ago
    Parakeet has been trained to detect non-voice sounds and exclude that from identification, so you might have better luck with that family.
  - dotancohen an hour ago
    If I remember correctly, the Whisper documentation actually recommends trimming non-speech portions, as the models hallucinate heavily during those portions.
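As a rough illustration of the trimming idea in this sub-thread, here is a minimal sketch that finds the loud spans in an audio signal so that only those spans are passed to a transcription model. It is an assumption-laden stand-in, not Whisper's or Parakeet's own method: the function names, frame size, and threshold are hypothetical, and a plain energy gate only drops silence. It cannot tell speech from other loud sounds, which is what a trained voice activity detector is for.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def loud_regions(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_s, end_s) spans whose frame energy reaches the threshold.

    frame_ms and threshold are illustrative defaults, not tuned values.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans = []
    start = None  # start time of the span currently being built, if any
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        active = rms(frame) >= threshold
        t = i / sample_rate
        if active and start is None:
            start = t                  # a loud span begins here
        elif not active and start is not None:
            spans.append((start, t))   # the loud span ended at this frame
            start = None
    if start is not None:              # signal ended while still loud
        spans.append((start, len(samples) / sample_rate))
    return spans
```

The returned spans could then be used to cut the audio before transcription, so the model never sees the long quiet stretches.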
- supertroop 5 hours ago
  Not sure if you’re being sarcastic, but I think this is an interesting question. Would DeepSeek be useful here since it is local?
  - fibers an hour ago
    Just because it is local does not mean it wouldn't reject explicit content. You can definitely try to find abliterated models, and you can attempt to use Unsloth or something similar to tune it properly.
  - okr an hour ago
    Depends how deep you wanna go.
WarOnPrivacy 4 hours ago
I was surprised to learn that the
```
    M1 Max CPU is an ARM/SoC, comparable to an 11th gen Intel i9
```
Do I have it right? Would Windows ARM performance be similar for those CPUs?
ref: https://www.cpubenchmark.net/compare/4585vs4245/Apple-M1-Max...
- pachouli-please 4 hours ago
  It's also a bit apples (heh) to oranges for a handful of reasons, but the most impactful are:
  - "unified" RAM makes all the system RAM available as VRAM
  - a dedicated AI co-accelerator thingy
  Both of these let Apple silicon chips crush conventional CPUs in this kind of AI model workload.
  No idea what the Windows ARM stuff is capable of. I know they use Qualcomm Snapdragon chips, though.
- owldown 3 hours ago
  “Comparable” is maybe true if we are talking about single core performance, but for memory bandwidth, the M1 Max is about 8 times faster. Wider bus, lower latency, not even close.
- iliashad 3 hours ago
  To your question, I can’t confirm or deny that, because I haven’t tried this project on a Windows machine or a machine with this config yet.
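For the "about 8 times faster" memory bandwidth figure in this sub-thread, a back-of-the-envelope check is possible from bus width and transfer rate. This sketch assumes the M1 Max uses a 512-bit LPDDR5-6400 interface and that the 11th gen i9 is paired with dual-channel (128-bit) DDR4-3200; the exact Intel part and memory configuration in the linked benchmark are not stated in the thread, so treat these numbers as illustrative theoretical peaks rather than measurements.

```python
def peak_bandwidth_gb_s(transfers_mt_s, bus_width_bits):
    """Theoretical peak bandwidth: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Assumed configurations (see the note above), not measured values.
m1_max = peak_bandwidth_gb_s(6400, 512)      # LPDDR5-6400, 512-bit bus
i9_ddr4 = peak_bandwidth_gb_s(3200, 128)     # DDR4-3200, dual channel

ratio = m1_max / i9_ddr4
print(f"M1 Max: {m1_max:.1f} GB/s, i9 + DDR4-3200: {i9_ddr4:.1f} GB/s, ratio: {ratio:.1f}x")
```

Under those assumptions the ratio comes out at roughly 8x, consistent with the figure quoted above. Real-world throughput depends on the workload and on how much of that bandwidth the CPU, GPU, or accelerator can actually use.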
cake-rusk an hour ago
I have an RTX 5090 card but it only has 32 GB RAM, can something like this work on my machine?
lgats 6 hours ago
the link https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vi...
- iliashad 5 hours ago
  Thank you
fl0id 4 hours ago
It is possible to use the Apple GPU with containers, either with podman + runkit + a recent mesa or with the recent vllm-metal from Docker: https://www.docker.com/blog/docker-model-runner-vllm-metal-m...
- iliashad 3 hours ago
  I was looking for a way to run Docker containers on MPS and use the GPU from inside them. I think this project will be the solution; I’ll try it very soon and add support for it. Thank you, much appreciated.
WhitneyLand 3 hours ago
I’d like to see embedding of actual video clips become practical in this type of workflow.
Frame-level embedding covers a lot, but it can miss many action-related searches.
wferrell 2 hours ago
https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vi...
tontonius 2 hours ago
If anyone is interested in searching large video collections locally and offline, I suggest taking a look at Jumper: https://docs.getjumper.io
It comes with some nifty features like NLE integrations, people search, MCP, API, etc.
Disclaimer: one of the co-founders
- dotancohen an hour ago
  The link just timed out for me. I'm in Israel, connecting via residential WiFi. All other sites that I regularly use connect just fine.
rho138 6 hours ago
This would fit best as a “Show HN:” post :)
- culi 5 hours ago
  The title should link to the "full article". I wonder if OP's domain name is banned or something and they're doing this to get around it
- iliashad 5 hours ago
  I tried to edit it and add Show HN, but it doesn't show the edited version. Thank you!
iliashad 4 hours ago
I would love your feedback and suggestions for improvements or new features you'd like to have, whether in the source-available version, the desktop app, or the blog post itself.
m3kw9 4 hours ago
Grab frames, lower res, classify, combine metadata. Write to SQL.
- iliashad 3 hours ago
  Not really. Grab frames, lower res, classify, combine metadata, transcribe the audio, convert that data (text, visual, and audio) to embeddings, and save them in a vector DB and a SQL DB. That lets me do semantic search, RAG, search using a screenshot of the video to find the exact moment in the video, and search using an audio file as well, plus other features unlocked by the vector DB.
  - ingvay7 2 hours ago
    Really cool work and workflow. I strongly prefer this kind of local, open pipeline that I control over a dependency on Adobe tools and lock-in.
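The embed-and-search part of the pipeline described in this sub-thread can be sketched roughly as below. This is a toy under stated assumptions, not the project's actual code: `toy_embed` is a hypothetical bag-of-words stand-in for a real text, image, or audio embedding model, `TinyIndex` is an in-memory stand-in for a vector DB, and the file names are made up.

```python
import math
from collections import Counter
from dataclasses import dataclass


def toy_embed(text):
    """Stand-in for a real embedding model: L2-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


@dataclass
class Segment:
    video: str      # source file (hypothetical names below)
    start_s: float  # where in the video this segment starts
    text: str       # frame caption or transcript snippet
    vector: dict    # embedding of `text`


class TinyIndex:
    """In-memory stand-in for a vector DB with cosine-similarity search."""

    def __init__(self):
        self.segments = []

    def add(self, video, start_s, text):
        self.segments.append(Segment(video, start_s, text, toy_embed(text)))

    def search(self, query, k=3):
        q = toy_embed(query)
        scored = [
            # Vectors are unit length, so the dot product is the cosine similarity.
            (sum(w * s.vector.get(tok, 0.0) for tok, w in q.items()), s)
            for s in self.segments
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = TinyIndex()
index.add("GX010001.MP4", 12.0, "surfing a big wave at sunset")
index.add("GX010002.MP4", 40.0, "kids playing in the snow")
index.add("GX010003.MP4", 5.0, "mountain bike trail descent")

for score, seg in index.search("big wave surfing", k=2):
    print(f"{score:.2f}  {seg.video} @ {seg.start_s}s  {seg.text}")
```

With a real embedding model in place of `toy_embed`, the same shape of index would support the screenshot and audio-file queries mentioned above, since images and audio clips would be mapped into vectors and compared the same way.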
nyxtom 2 hours ago
Now this ^^ is an awesome use case!