Can we bootstrap AI Safety despite being unable to even define it?

2 points | by cryptohell 12 hours ago

1 comment

cryptohell 12 hours ago
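Given several models, and assuming only that some unknown subset of them is "safe", can we construct a single model that is as safe as that subset? If so, obtaining a trustworthy model reduces to a plausibly easier task: assembling a collection of models in which some subset is safe, without needing to know which ones.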