I am presenting my book "Inside Solr and Lucene: Algorithms and Engineering Deep Dive". (I have three books out on Amazon; they happened to be released at the same time, and they complement each other, since all of them are about search and recommender systems.)
This book differs from other books on Solr/Lucene in that it describes the algorithms and data structures used inside Solr/Lucene. Each topic starts from the problem the developers faced, then examines the naive or straightforward solutions, and then shows how the Solr/Lucene architects solved it. In many cases these are interesting, research-heavy solutions backed by scientific papers (links are available in the book).
For example, I talk not only about the inverted index, posting lists, and how segments are stored (you can find that in other books), but also about how the Finite-State Transducer (FST) is used there. An FST is a graph-based structure for term dictionaries that stores terms and their metadata with shared prefixes and suffixes. It addresses high memory usage and slow prefix/range queries in large vocabularies by providing a compact representation with O(length of term) lookups. I also talk about Pulsing Codec, Delta Encoding, Skip Lists, PackedInts (a bit-packing algorithm that uses the minimal number of bits per integer in a block, based on the block's maximum value), Variable-Length Integers, LSM-Tree, HNSW for vectors, Roaring Bitmaps, LZ4 and DEFLATE compression, Memory-Mapped I/O, Scatter-Gather Query Execution, Hash-Based Routing, SIMD Vectorization, and many other things of this kind.
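To give a flavor of two of the ideas named above, here is a minimal Python sketch of delta encoding followed by PackedInts-style bit-packing. It is an illustration under simplifying assumptions, not Lucene's actual implementation: the function names (`delta_encode`, `pack_block`, and so on) are hypothetical, and the whole block is packed into one Python integer rather than into a byte array.

```python
def delta_encode(doc_ids):
    """Replace each sorted doc ID with its gap from the previous one."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def delta_decode(gaps):
    """Rebuild the original doc IDs by summing the gaps."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def pack_block(values):
    """Pack non-negative ints using the bit width of the block's maximum."""
    width = max(1, max(values, default=0).bit_length())
    packed = 0
    for i, value in enumerate(values):
        packed |= value << (i * width)  # value i occupies bits [i*width, (i+1)*width)
    return width, packed


def unpack_block(width, packed, count):
    """Read `count` values of `width` bits each back out of the packed int."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


doc_ids = [3, 7, 8, 15, 30]          # a sorted posting list (made-up data)
gaps = delta_encode(doc_ids)         # small gaps instead of growing IDs
width, packed = pack_block(gaps)     # every gap stored in `width` bits
restored = delta_decode(unpack_block(width, packed, len(gaps)))
```

In this toy example the largest gap is 15, so each of the five values fits in 4 bits (20 bits in total) instead of a fixed 32 bits per integer. Sorting and delta encoding are what keep the maximum small enough for the bit-packing to pay off.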
For solution architects and search engineers.
Paperback and e-book versions. Worldwide delivery.
I hope you will find it useful.