Equality of Benefit

I’ve been involved in a lot of discussion around bias, equality and fairness regarding algorithmic decision making. Without going into excessive amount of background and detail the gist of my believe at the current moment is that equality of utility is the safest thing for companies to aspire to.

What is equality of utility? Let’s degenerate into binary decision making: given individual x, who has observable features f(x) and protected feature p(x). Suppose the company has to choose among two actions to take {a,b}. What is a workable definite of fairness or equality in such a decision making effort with respect to protected properties p?

Let god bestow us, a neutral third party, with a utility functor u whose evaluation on the individual u(x) results in a function u(x)(a) is the utility of company taking action a to individual x, u(x)(b) is the utility to individual x of company taking action b.

Let g be the decision process of company, g(•) is the decision company makes either a or b for the situation. Then the right thing to do

g(f(x)) = argmax_{i\in{a,b}}(u(x)(i)) = g(f(x), p(x))

Simple, we do as god says is best for the customer, act as if we have the knowledge of an oracle–even when we know of some reason for discrimination.

The sweet spot between CSS and OSS

I’ve been hacking on a commercial database at work recently. Spent a good week of time querying the database for a good sample of a decade of the company’s historical data. The thing that really kills me is this 15 minute query sniper that has been around since before I joined the company–it exist on all data platforms we have ever had: MySQL, postgresql, hive, vertica, spark, Hadoop.

But in an ironic way, I am actually really glad that all this crap works at all! How recent was it that MySQL would just get stuck or run out of memory or some other hidden unknown problem? How recent was it when hive crashed? How recent was it when you had to query hbase for data? But all that is under control because the company managed to pay for a product that actually works. system-v (anonymize to protect the company) can actually handle the work load that we have doing funky 5-level deep subqueries, multiple mixed inner and outer joins, filters group-by’s, aggregations, string operations, and it never peeped a single complaint! Just runs until query is killed. Production etl was not impacted. Other analysts didn’t complain. All I got was when the query got big, it was killed. 

Of course this two dozen person team, director level senior management and all, and all those servers and licensing fee, and all those training classes, and maybe a few quarters of ramp up is more expensive than the two weeks I spent bringing up a spark cluster on an HP laptop and Dell server. My cluster, btw handled the same sized query fine on spark. I only had to upgrade the disks slightly from factory default.

What this illustrates is that closed source software is catching up with Open source software! This is the sweet spot where closed source is at parity in performance and feature when compared with OSS lacking only cheap install and maintenance. Everything works every where. All you choose is your price and reliability. This is where software should stay for ever! Any geek can code up any new algo in matter of seconds. Test it, launch it in the next release. Beat the CSS by a quarter or two to the market. Companies that chose not to use OSS has to wait the few quarters, but that is a choice that they now have! CSS actually must keep up with OSS to stay afloat. OSS is no longer the only choice for real features, performance and non-stupid implementations.

Competition is so awesome for consumers!