I recently wrote a command-line full-text search engine [1]. I needed to implement an inverted index. I choose what seems like the "dumb" solution at first glance: a trie (prefix tree).
There are "smarter" solutions like radix tries, hash tables, or even skip lists, but for any design choice, you also have to examine the tradeoffs. A goal of my project is to make the code simpler to understand and less of a black box, so a simpler data structure made sense, especially since other design choices would not have been all that much faster or use that much less memory for this application.
I guess the moral of the story is to just examine all your options during the design stage. Machine learning solutions are just that, another tool in the toolbox. If another simpler and often cheaper solution gets the job done without all of that fuss, you should consider using it, especially if it ends up being more reliable.
Still happens all the time in certain finance tasks (eg trying to predict stock prices), but I'm not sure how long that will hold. As for why that might be, I don't think I can do any better than linking to this comment about a comment about your question: <https://news.ycombinator.com/item?id=45306256>.
I suspect that locating the referenced comment would require a semantic search system that incorporates "fancy models with complex decision boundaries". A human applying simple heuristics could use that system to find the comment.
In the "Dictionary of Heuristic" chapter, Polya's "How to Solve it" says this: *The feeling that harmonious simple order cannot be deceitful guides the discover in both in mathematical and in other sciences, and is expressed by the Latin saying simplex sigillum veri (simplicity is the seal of truth).*
I have a silly little internal website I use for bookmarks, searching internal tools, and some little utilities. I keep getting pressure to put it into our heavy and bespoke enterprise CICD process. I’ve seen people quit over trying to onboard into this thing… more than one. It’s complete overkill for my silly little site.
My “dumb” solution is a little Ansible job that just runs a git pull on the server. It gets the new code and I’m done. The job also has an option to set everything up, so if the server is wiped out for some reason I can be back up and running in a couple minutes by running the job with a different flag.
I occasionally see people complaining about long TypeScript compile times where a small code base can take multiple minutes (possibly 10 minutes). I think to myself WTF, because large code bases should take no more than 20 seconds on ancient hardware.
On another note I recently wrote this large single page app that is just a collection of functions organized by page sections as a collection of functions according to a nearly flat typescript interface. It’s stupid simple to follow in the code and loads as fast as an eighth of a second. Of course that didn’t stop HN users from crying like children for avoiding use of their favorite framework.
I remember Scalyr, at least before they were bought by SentinelOne basically did parallel / SIMD grep for each search query and consistently beat data that was continually indexed by the likes of Splunk and ElasticSearch.
The common one I fought long ago was folks who always use regular expressions when what they want is a string match, or contains, or other string library function.
I recently wrote a command-line full-text search engine [1]. I needed to implement an inverted index. I choose what seems like the "dumb" solution at first glance: a trie (prefix tree).
There are "smarter" solutions like radix tries, hash tables, or even skip lists, but for any design choice, you also have to examine the tradeoffs. A goal of my project is to make the code simpler to understand and less of a black box, so a simpler data structure made sense, especially since other design choices would not have been all that much faster or use that much less memory for this application.
I guess the moral of the story is to just examine all your options during the design stage. Machine learning solutions are just that, another tool in the toolbox. If another simpler and often cheaper solution gets the job done without all of that fuss, you should consider using it, especially if it ends up being more reliable.
[1] https://github.com/atrettel/wosp
Still happens all the time in certain finance tasks (eg trying to predict stock prices), but I'm not sure how long that will hold. As for why that might be, I don't think I can do any better than linking to this comment about a comment about your question: <https://news.ycombinator.com/item?id=45306256>.
I suspect that locating the referenced comment would require a semantic search system that incorporates "fancy models with complex decision boundaries". A human applying simple heuristics could use that system to find the comment.
In the "Dictionary of Heuristic" chapter, Polya's "How to Solve it" says this: *The feeling that harmonious simple order cannot be deceitful guides the discover in both in mathematical and in other sciences, and is expressed by the Latin saying simplex sigillum veri (simplicity is the seal of truth).*
I have a silly little internal website I use for bookmarks, searching internal tools, and some little utilities. I keep getting pressure to put it into our heavy and bespoke enterprise CICD process. I’ve seen people quit over trying to onboard into this thing… more than one. It’s complete overkill for my silly little site.
My “dumb” solution is a little Ansible job that just runs a git pull on the server. It gets the new code and I’m done. The job also has an option to set everything up, so if the server is wiped out for some reason I can be back up and running in a couple minutes by running the job with a different flag.
I occasionally see people complaining about long TypeScript compile times where a small code base can take multiple minutes (possibly 10 minutes). I think to myself WTF, because large code bases should take no more than 20 seconds on ancient hardware.
On another note I recently wrote this large single page app that is just a collection of functions organized by page sections as a collection of functions according to a nearly flat typescript interface. It’s stupid simple to follow in the code and loads as fast as an eighth of a second. Of course that didn’t stop HN users from crying like children for avoiding use of their favorite framework.
I wrote a clone of battle zone the old Atari tank game. For the enemy tank “AI” I just used a simple state machine with some basic heuristics.
This gave a great impression of an intelligent adversary with very minimal code and low CPU overhead.
I’m mostly a hardware engineer.
I needed to test pumping water through a special tube, but didn’t have access to a pump. I spent days searching how to rig a pump to this thing.
Then I remembered I could just hang a bucket of water up high to generate enough head pressure. Free instant solution!
I remember Scalyr, at least before they were bought by SentinelOne basically did parallel / SIMD grep for each search query and consistently beat data that was continually indexed by the likes of Splunk and ElasticSearch.
The common one I fought long ago was folks who always use regular expressions when what they want is a string match, or contains, or other string library function.
Seen people tripped up with dynamodb like stores, especially when they have a misleading sql interface like Azure tables.
You cant be "agile" with them, you need to design your data storage upfront. Like a system design interview :).
Just use postgres (or friends) until you are webscale. Unless you really have a problem amenible to key/value storage.