I’ve been hacking on a commercial database at work recently. I spent a good week querying it for a decent sample of a decade of the company’s historical data. The thing that really kills me is this 15-minute query sniper that has been around since before I joined the company. It exists on every data platform we have ever had: MySQL, PostgreSQL, Hive, Vertica, Spark, Hadoop.
But in an ironic way, I am actually really glad that all this crap works at all! How recently was it that MySQL would just get stuck, or run out of memory, or hit some other hidden, unknown problem? How recently was it that Hive crashed? How recently was it that you had to query HBase for data? But all that is under control because the company managed to pay for a product that actually works. system-v (anonymized to protect the company) can actually handle the workload we throw at it: funky 5-level-deep subqueries, multiple mixed inner and outer joins, filters, group-bys, aggregations, string operations. And it never made a peep of complaint! It just runs until the query is killed. Production ETL was not impacted. Other analysts didn’t complain. All that happened was that when the query got big, it was killed.
Of course this two-dozen-person team, director-level senior management and all, plus all those servers and licensing fees, all those training classes, and maybe a few quarters of ramp-up, is more expensive than the two weeks I spent bringing up a Spark cluster on an HP laptop and a Dell server. My cluster, btw, handled the same-sized query fine on Spark. I only had to upgrade the disks slightly from factory default.
What this illustrates is that closed source software (CSS) is catching up with open source software (OSS)! This is the sweet spot where closed source is at parity with OSS in performance and features, lacking only the cheap install and maintenance. Everything works everywhere. All you choose is your price and reliability. This is where software should stay forever! Any geek can code up a new algo in a matter of seconds, test it, launch it in the next release, and beat the CSS to market by a quarter or two. Companies that choose not to use OSS have to wait those few quarters, but that is a choice they now have! CSS actually must keep up with OSS to stay afloat. OSS is no longer the only choice for real features, performance, and non-stupid implementations.
Competition is so awesome for consumers!