I’ve been working on GAIA, an alternative to the Transformer architecture. It trains in seconds on a CPU and is based on hashing with π-driven partition regularization. The paper and implementation are available here: https://doi.org/10.17605/OSF.IO/2E3C4