Congratulations Aria & Tony, this is much needed for healthcare. I work at UK NHS Wales in software engineering, and would be happy to talk with you personally, and also happy to introduce you to the NHS Wales AI team. Personal email joel@joelparkerhenderson.com, work email joel.henderson@wales.nhs.uk. And we're hiring-- if anyone here is keen to code for social good healthcare, email me.
Hi jph! Not intending to subvert the thread, but I'd love to chat to someone like you. The non-profit I work at has been working on democratizing evals. This wouldn't be to ensure your in-house AI is up to scratch (parachute looks ideal!), but on ensuring the general landscape of models is up-to-date on best practice, e.g. NICE and other guidance, so that everyday model users aren't misled. Such a demo eval is here: https://weval.org/analysis/uk-clinical-scenarios/08278696ca2...
We're looking for domain experts especially in high risk domains like healthcare, education, therapy. Then we'd work together co-authoring an eval in your specialism to expose and motivate AI labs to do better.
Just to say - I was impressed chatting to someone in and around NHS Wales software a few years ago when I was leading the development side of a health app. I seem to remember there were some good plans in place to join up patient health records across the Welsh trusts, which sounded very sensible.
Thanks! Would love to see how we can help!
This is exactly the type of company that I like to see:
- Sounds very complicated/thorny to navigate (regulatory, medical, compliance)
- Not super "sexy", which keeps competition lower
- Clear pain points (fines) for customers that can and are willing to pay (hospitals)
Next up is just great execution by you all!
That list of logos you all have - are those paying customers today?
Best of luck!
> That list of logos you all have - are those paying customers today?
Doesn't look like it. The first list of logos is standards bodies. The second list of logos is integrations.
That's correct!
This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents? It’s hard to imagine it’s safe to entrust model validation and bias assessments to an automated system, especially in healthcare. Validating clinical AI is pretty complex between finding the right data, ensuring event timings are accurate, simulating the model, etc. That’s why I’m guessing Parachute is a little less automated than the landing page makes it out to be, which is maybe a good thing. Regardless, this is cool. Hope you make AI in healthcare safer.
That’s a great point. We don’t use AI agents to grade other models. Instead, we run in-house evaluations tailored to each category of clinical AI, giving hospitals an apples-to-apples comparison between similar vendors.
This line of thinking always leaves me confused about other people's experience in the pre-AI world. People and systems around me fail all the time because evaluation fails. Yes, the failure modes are different but I don't consider them favorable without AI. In fact, I consider that they are better.
For example, consider what happens in this video: https://www.youtube.com/watch?v=AZhCYisIQB8&t=2s
Please don't make this mistake of thinking "aha, but you see, a human intervened!" This will never happen in the real world for the vast majority of humans in a similar scenario.
I'm afraid I don't quite understand your point. What line of thinking are you referencing? Also risk scores and algorithms have been used in medicine for over 50 years, so evaluating them isn't anything new.
> This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents?
Usually you can run human-in-the-loop spot checks to ensure that there's parity between your LLM evaluators and the equivalent specialist human evaluator.
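To make the spot-check idea concrete, here is a minimal sketch, not anything Parachute has described. It assumes you draw a random sample of cases, have a specialist label the same cases the LLM evaluator labelled, and then measure chance-corrected agreement with Cohen's kappa. The function names (`sample_for_review`, `cohen_kappa`) and the pass/fail labels are hypothetical.

```python
import random
from collections import Counter


def sample_for_review(case_ids, k, seed=0):
    """Pick up to k cases at random for a human specialist to re-grade."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    ids = list(case_ids)
    return rng.sample(ids, min(k, len(ids)))


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four sampled cases.
llm_labels = ["pass", "pass", "fail", "fail"]
human_labels = ["pass", "pass", "fail", "pass"]
kappa = cohen_kappa(llm_labels, human_labels)
```

What counts as "parity" is a judgment call for the team running the check; a kappa threshold would need to be agreed up front and the sample re-drawn periodically.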
I wouldn't touch a YC company for this use case. All the marketing from the landing pages is just that - blabla.
My friend, you are overthinking! The funding round just came in and it's smooth sailing ahead. Well, for the next six months.
"mummmble murmble ummbble... but that can|will be easily fixed|addressed|solved by future models"
How did you get the numbers on your landing page? It looks like an AI-generated product with AI-generated "safety". Just like the "2000 clinical AI tools" that hit the US market, this looks like one of the "2000 governance tools" that hit the market. How are you vetting every AI scribe tool so your product itself isn't biased? Have you done any work with the companies you have listed on your landing page? It looks like a governance tool that the "trying-to-be" scribe companies would use to avoid getting legit audits.
Thanks for your question.
We use in-house evals (based on existing state-of-the-art benchmarks) to compare ambient scribes.
If you take a deeper look into the companies on our landing page, you will see that the first list refers to the compliance standards our workflows follow and the second refers to the existing tools we integrate with.
Hi,
> We use in-house evals (based on existing state-of-the-art benchmarks) to compare ambient scribes.
Have you validated that your in-house evals accurately reflect real-world performance?
> If you take a deeper look into the companies on our landing page, you will see that the first list refers to the compliance standards our workflows follow and the second refers to the existing tools we integrate with.
I am talking about your use of the Abridge Ambient Scribe, Nuance and Deepscribe brands on your landing page. You have figures for the number of beds, hourly efficiency and their costs next to the actual brands. I don't see any proper attributions or disclaimers.
Also, if you were to compare actual numbers you get from the websites, these companies can use different models for different users, have enterprise discounts for different organizations, etc. How are you planning to get access to these to make a proper comparison of what they would actually offer to their potential customers?
Fwiw, I am a fan of the "AI marketplace" runs. This one just raises a lot of questions for me.
But, good luck!
> Parachute evaluates vendors against a hospital’s clinical needs and flags compliance and security risks before a pilot even begins
Is this done by humans? I'm really not sure how this could be automated given the vast spectrum of applications and specific requirements complex organisations like hospitals have. It would have to boil down to "check box" compliance style analysis which in my experience usually leads to poor outcomes down the track (the worst product from every other point of view gets chosen because it checks the most arbitrary boxes on the security / compliance forms - then the integration bill dwarfs whatever it would have cost to address most of those things bespoke anyway).
> auditable proof that these models are safe, fair
Impossible to deliver
Evidence for safe and fair AI systems is possible as long as you define what "safe" and "fair" mean for your use case. Fairness might look like "no cohort has >5% higher false positive rate than another" and safety might mean "the model must have a false negative rate of less than 15%". Safety more so encompasses the workflows around the model, including human intervention, auditing, monitoring, etc.
Here's a good overview of fairness: https://learn.microsoft.com/en-us/azure/machine-learning/con... and there are plenty of papers discussing how to safely use predictive analytics and AI in healthcare.
I don't know if this product can give proof for safe and fair ML systems, but it's not impossible to use these things safely and fairly.
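As a rough illustration of how criteria like the two quoted above could be turned into a check, here is a minimal sketch. It makes assumptions that the comment does not spell out: ">5% higher" is read as an absolute gap of 0.05 between the highest and lowest per-cohort false positive rate, the 15% false negative limit is applied to the whole population, and `check_criteria` and its inputs are hypothetical names.

```python
from collections import defaultdict


def check_criteria(records, max_fpr_gap=0.05, max_fnr=0.15):
    """records: iterable of (cohort, y_true, y_pred) with 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for cohort, y_true, y_pred in records:
        # "t"/"f" for correct/incorrect, "p"/"n" for the predicted class.
        key = ("t" if y_true == y_pred else "f") + ("p" if y_pred == 1 else "n")
        counts[cohort][key] += 1

    # False positive rate per cohort: FP / (FP + TN).
    fpr = {}
    for cohort, c in counts.items():
        negatives = c["fp"] + c["tn"]
        fpr[cohort] = c["fp"] / negatives if negatives else 0.0

    # Overall false negative rate: FN / (FN + TP) across all cohorts.
    total_fn = sum(c["fn"] for c in counts.values())
    total_tp = sum(c["tp"] for c in counts.values())
    positives = total_fn + total_tp
    fnr = total_fn / positives if positives else 0.0

    fpr_gap = max(fpr.values()) - min(fpr.values()) if fpr else 0.0
    return {
        "fpr_by_cohort": fpr,
        "fpr_gap": fpr_gap,
        "overall_fnr": fnr,
        "fair": fpr_gap <= max_fpr_gap,  # assumed reading of the fairness rule
        "safe": fnr < max_fnr,           # assumed reading of the safety rule
    }
```

A relative reading of ">5% higher" (ratio of rates rather than difference) would be an equally defensible definition; the point is that the definition has to be written down before it can be audited.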
Thanks for your question. Parachute's workflows are built around widely accepted conventions for safety and fairness for AI models in healthcare such as NIST AI RMF, CHAI and HAIP's HEAAL framework.
You're right; we should not even try. Better to have 0% compliance coverage and your honor than 90% coverage as best-effort.
Acceptance criteria has entered the chat.
Where are your promises or goals for addressing the fear that these large language model medical paperwork assistants will implant subtle time bombs in their reports?
We've all seen how powerful language can be in legal defenses surrounding the for-profit healthcare industry of the United States.
What new phrases like "pre-existing conditions", which terminate both thought and legal argument, will these large language models come up with for future generations?
I don't understand your comment. What sort of time bombs do you mean?
I suppose “It is difficult to get a man to understand something” and all, but I’ll try to help you understand.
The OP provided you one such “time bomb”: pre-existing condition. This was, 40 years ago, a totally innocuous phrase and then it became a rallying cry of health insurers’ “delay, deny, defend” modus operandi.
If a large language model is taking notes for a doctor how will you defend against it slipping in phrases such as this to allow insurers to avoid their responsibilities?
Tell me how your product is designed to defend people from health insurers, or admit how your product is designed to help health insurers.
Your comment is a non sequitur. This is a policy issue, not something that clinical AI or monitoring thereof can solve. The Affordable Care Act (Obamacare) prevents health plans from charging more or denying coverage based on pre-existing conditions.
https://www.hhs.gov/healthcare/about-the-aca/pre-existing-co...
HIPAA also allows individuals to request amendments to their medical records if there are errors such as an incorrect diagnosis.
https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-...
Are you guys using AI to check on AI?
Yes but don’t worry there’s another YC startup who is building AI to check on AI that’s checking on AI.
Gold
Like I said above, we don’t use AI agents to grade other models. Instead, we run in-house evaluations tailored to each category of clinical AI, giving hospitals an apples-to-apples comparison between similar vendors.
How are you going to protect against AI that optimises against your tests instead of actual data?
These are extraordinary claims for a rapidly evolving field with a huge breadth of intended uses and technologies.
Here are a few questions that should be part of an evaluation of the Parachute platform to pressure test the claims made on the website and this post:
1) How many Parachute customers have passed regulatory audits by CMS, OCR, CLIA/CLAP, and the FDA?
2) What high quality peer-reviewed scientific evidence supports the claims of increased safety and detection of hallucinations and bias?
3) What liability does Parachute assume during production deployment? What are the SLAs?
4) How many years of regulatory experience does the team have with HIPAA, ISO, CFR, FDA, CMS, and state medical board compliance?
This is a good question. Parachute is not a certification body so we do not help you pass audits. Instead we help with internal decision-making surrounding the implementation of AI tools. We also help hospitals keep track of why decisions were made and who made them, so they can produce that record for regulators and litigators in the future. Parachute does not assume any liability during production deployment.