Hey, I'm the creator of video2docs, and this is my first "big" launch. I'm nervous (no, I'm fine actually) :D
The idea is quite simple: recently I've had to write more and more docs (mostly how-tos and guides for company systems) and got a bit tired of it. I decided it would be cool if I could just record a video of myself clicking through the app (or multiple apps, it doesn't matter) and then analyse the video content, even without audio narration.
That's how video2docs was born! I plan to add audio analysis too, for even better quality documentation, but for now I'm happy with how it works without it.
You can choose from 10 LLM models for video analysis, pick a documentation style (tutorial, how-to, quickstart...), and, of course, choose whether to include screenshots in the generated markdown docs. Yay, no more taking screenshots manually! :)
I hope someone else finds this useful. I'll keep working on this project!
Looks awesome!
Is there anything you can share about the architecture or pipeline you used for it? A high-level overview would be enough.
I’m guessing you’re doing video-to-image, image-to-text, and then text-to-docs, right? Since not all of the models you mentioned are multimodal.
Thanks! :)
More or less. I have a Python worker that handles the video processing: video into frames, frame deduplication, LLM analysis of each frame, and then generating docs from that information. Audio narration analysis will be added soon too!
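For anyone curious about the frames + dedup steps, here's a minimal sketch of how that could look in Python (my own illustration using OpenCV and a tiny average-hash; not the actual video2docs code, and the helper names are made up). The kept frames would then go to the LLM and the doc generator:

```python
import cv2  # pip install opencv-python


def average_hash(frame, size=8):
    """Tiny perceptual hash: downscale, grayscale, threshold on the mean."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (size, size))
    return tuple((small > small.mean()).flatten())


def extract_unique_frames(video_path, sample_every=30, max_hamming=5):
    """Sample every Nth frame and keep only frames that differ enough
    (in Hamming distance between hashes) from the last kept frame."""
    cap = cv2.VideoCapture(video_path)
    kept, last_hash, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            h = average_hash(frame)
            if last_hash is None or sum(a != b for a, b in zip(h, last_hash)) > max_hamming:
                kept.append(frame)
                last_hash = h
        idx += 1
    cap.release()
    return kept  # these frames would then be sent for LLM analysis
```

The nice thing about hashing downscaled frames is that dedup stays cheap, so only genuinely new screens end up in the (paid) LLM calls.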