An Attempt at God’s Sign


Do you think it's fair to say that a god is one whose evil has a lower bound, and that the devil is one whose goodness has an upper bound?

Right? Because God can be angry sometimes and punish people and so on, but he is limited in how much nastiness he will bring upon humanity before he stops. The devil, we assume, will stop at no level of nastiness; however, the goodness he does is also bounded, and at some point he stops being good and starts doing bad things.

This is interesting because it took me a second to think through as well. Our cultures and religions teach us that God is all good and the devil is all evil. But, either because of our limited ability to comprehend or because our physical world lacks the expressive power to express God's will, God's acts sometimes appear evil, and the devil's work sometimes appears kind. Just look at all those pretty girls out there, so pleasing, so nice, they make you want to be nice, right? But often they are the devil's work, and the niceness disappears at some point, and then it's all evil.

Ahem… not speaking from personal experience.

But if you put your mind to it, despite these limitations, we are told that God will eventually recover and reveal to us that it is all good, and much better than before; that the evil we suffer in the meantime is completely overwhelmed by the greatness of what is to follow. If we think of it this way, that what comes later will be better than the present, then it would appear, in the stricter language of mathematics, that God's evil is bounded below, and in contrast the Devil, the polar opposite of God, has goodness bounded above.

Such beliefs have implications, of course. That God's evil is bounded below means he will never bring humanity to extinction. One can argue that the future of the universe may be brighter without us, that the next intelligence or being of sorts will be closer to God than we are, and so on, but that argument is just plain unscientific: it cannot be tested. The perpetuity of humanity, on the other hand, is testable; not conclusively, but with growing supporting evidence. I guess it's kind of pseudo-scientific, but increasing evidence seems better than unprovable, right?

Such beliefs also mean we can detect things. Suppose we find a cause whose effect has always been known to be limited in goodness but (essentially) unbounded in evil; then we can legitimately suspect that cause to be the Devil. We can actually detect the devil from the goodness of its effects!!

Such beliefs should also be defined more carefully: do the two infinities of goodness and evil add up to our finite existence?

A Serious Problem with Signs in Previous Entries

Astute readers may have found a significant problem with signs in my earlier posts. The signs of these value functions must be carefully chosen lest we exchange God and the Devil. It might happen. For instance, if you read my quantification-of-privacy blog entries, you will find that I did not correctly assign signs to the information. Suppose we continue with the example of dinner and the leaked email to my wife. Information theory is confusing in the sense that it cannot distinguish incriminating information from non-incriminating information. It is possible to structure "Dinner" such that entropy implies innocence and lack of entropy implies guilt, but in most natural cases, the output variable having low entropy could mean either very guilty or not guilty.

When I charge for my loss of privacy (when you rip open my pants and peek inside), I would only want to charge you money if the exposure is embarrassing to me. If it is show-worthy, I might pay you money for the exposure, right? Also, just to be clear, if the information is leaked as a summary of my private email to my wife, the same calculation would take place, but the conditioning will be on the summary of the email.

A purist would say that loss of privacy is loss of privacy without regard to guilt. If that is the case, the quantification will take the form:

IG(Dinner; private email to wife) = H(Dinner) – H(Dinner | private email to wife)

In the real world, this number is always non-negative, and we compute compensation based on this function. But as a conscientious person who wants an orderly society and safety for my family and my fellow beings, my original proposal was to charge for the private information only when it proves unhelpful to the cause of crime prevention. This is further strengthened by a system where law enforcement is punished only when the information proves me innocent. So the three grades of privacy quantification are:

Let a certain piece of private information be a random variable P (such as the dinner choice above, or my choice between Java and Pascal for my next project, Pascal being a crime to use), and let Q be a piece of data that is leaked or taken from me. The privacy loss PL is defined as the information gain regarding P:

PL = IG(P; Q) = H(P) – H(P | Q)

Strong Privacy: Any private information Q lost with PL >= 0 is privacy loss. (This says that anything private revealed to a non-private party against my direction is privacy loss, because IG is always non-negative.)

Medium Privacy: Any private information Q lost with PL > 0 is privacy loss.

Weak Privacy: Any information Q lost with PL > 0 where P becomes more certain regarding guilt. (For the purpose of punitive assurance, this is any certainty that reality matches the clandestine actor's desired outcome, whose truth will generate a reward for the clandestine actor.)

SP, MP, and WP for the lazy.
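The three grades are easy to operationalize. Here is a minimal Python sketch; the function names are my own, and for simplicity the posterior is taken to be the single observed conditional distribution rather than a full expectation over Q, with the guilt judgment supplied as an external flag:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def privacy_loss(prior, posterior):
    """PL = IG(P; Q) = H(P) - H(P | Q), given the prior over P and the
    posterior over P after the leaked data Q was observed."""
    return entropy(prior) - entropy(posterior)

def grade(pl, more_certain_of_guilt=False):
    """Which of the three grades (SP/MP/WP) classify this leak as privacy loss.
    `more_certain_of_guilt` is a hypothetical external judgment, per Weak Privacy."""
    grades = []
    if pl >= 0:
        grades.append("SP")   # Strong Privacy: any leak counts
    if pl > 0:
        grades.append("MP")   # Medium Privacy: only strictly informative leaks
    if pl > 0 and more_certain_of_guilt:
        grades.append("WP")   # Weak Privacy: informative AND guilt-revealing
    return grades
```

For example, `grade(privacy_loss([0.33, 0.33, 0.33, 0.01], [0.1, 0.1, 0.1, 0.7]), True)` returns all three grades.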

Punitive Privacy Assurance:

Strong Punitive Privacy Assurance: Penalize clandestine actor for my strong privacy loss.

Medium Punitive Privacy Assurance: Penalize clandestine actor only for my medium privacy loss.

Weak Punitive Privacy Assurance: Penalize clandestine actor only for my weak privacy loss.

SPPA, MPPA, WPPA for the lazy.

We should have at least Weak Punitive Privacy Assurance (WPPA) in America, IMHO.

IG and the Quantification of Privacy

A while back, I talked about computing IG (information gain) obtained by clandestine methods via an otherwise secret (personal) email. I will point to some other prior blog entries about what we can reasonably consider private, and some reasons why I think clandestine surveillance is bad (because it removes competition…).

The basic challenge is this: if your competitors can spy on what you do (unilaterally), then they will never be motivated to innovate. Their key strength will be their ability to hack your secrets, and they will work hard on that, but not on how to build a better product or cure a disease or solve a new problem. If you can both spy on each other with perfect information, then there is no need to innovate; just calculate the equilibrium and aim for that. If you can disinform your opponent, then all your effort will go into disinformation instead of innovation. Basically, it is much easier to do something sneaky and cheat than to do the right thing and innovate. This is why the government, a non-competing body whose interest is to make sure everyone competes (at least in American government this is the case), should provide for information security.


I realize in retrospect that IG may not make sense to most people based on the formulation I laid out. Let's review. IG is the change in entropy from a state without additional knowledge to a state with that knowledge:

IG = H(secret) – H(secret | private email)

This measurement seems to be of a quite abstract concept, entropy, measured in abstract units of bits. Why would I think it useful for any reason other than that it is called "Information Gain"? Well, truth be told, what I had in mind was more the IG of the machine learning literature: class purity after conditioning on some private information. There it is used more as a measure of how well a discrete output can be predicted than as an abstract change in the entropy of a distribution after conditioning. I will refer the reader to the excellent introductory books on "classification" algorithms.

… Some days pass, and the books have hopefully arrived on your desks …

So here is the example: my secret is the probability that I will have Chinese food tonight. Let's throw in a few more classes, say Italian and Mexican, to cover 99.9% of all possibilities. This probability may be internal to me, or it may be an externalizable model, like tossing a three-sided die to figure out what I will eat tonight.

Actually, this system forces us to think of a new class, which I will call the innovation class. It covers all cases where something new might happen, such as tonight, when I went off on a tangent and forgot to eat dinner completely. Or I might be abducted by aliens for demanding privacy, by a Japanese paramilitary for blogging, or by God for thinking all these awful things. The fact is, I do not know what will happen, but I do know that things I don't know will happen. So the class is called IC, the Innovation Class. Now we have a four-sided die: Chinese, Mexican, Italian, IC. Let's naively write the probability for each class as:

Chinese Mexican Italian IC
33% 33% 33% 1%

The entropy of this distribution is written as:

H(Dinner) = –[ p(Chinese)*log2(p(Chinese)) + p(Mexican)*log2(p(Mexican)) + p(Italian)*log2(p(Italian)) + p(IC)*log2(p(IC)) ]

The above evaluates to H(Dinner) = 1.6499060116098556 bits (logs taken base 2). This is slightly above log2(3) ≈ 1.585, the maximum possible entropy for the three main classes alone, because the rare IC class adds a little extra uncertainty.
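For the skeptical, this number is easy to reproduce. A small Python sketch (the helper function is my own; logs are base 2, which is what the figure above implies):

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Prior over dinner: Chinese, Mexican, Italian, IC
prior = [0.33, 0.33, 0.33, 0.01]
print(entropy_bits(prior))  # ≈ 1.6499 bits
```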

That's it. That's the formula for calculating entropy, and we will use it repeatedly. Now suppose you have read my email to my wife saying, "Oh man, look at this great deal on Groupon, 50% off on Indian food right near our home." What is the right way to think about the distribution of my dinner?


Indian food is not Chinese, Mexican, or Italian, but we have thought of that and put in IC to account for it:

Chinese Mexican Italian IC
10% 10% 10% 70%

H(Dinner | private email to wife) = –[ p(Chinese|email)*log2(p(Chinese|email)) + p(Mexican|email)*log2(p(Mexican|email)) + p(Italian|email)*log2(p(Italian|email)) + p(IC|email)*log2(p(IC|email)) ]

(writing "email" as shorthand for "private email to wife")

gives us the conditional entropy of my dinner after you have read my private email. With the table above, this entropy is H(Dinner | private email to wife) = 1.3568 bits.

IG(Dinner; private email to wife) = H(Dinner) – H(Dinner | private email to wife) = 1.6499 – 1.3568 = 0.2931 bits. This corresponds to an IGR of about 21.6%; that is, about a fifth more information after you saw the email than before.
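The whole calculation fits in a few lines of Python; a sketch recomputing from the two probability tables above:

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Classes in order: Chinese, Mexican, Italian, IC
prior     = [0.33, 0.33, 0.33, 0.01]  # before reading the email
posterior = [0.10, 0.10, 0.10, 0.70]  # after reading the email

ig = entropy_bits(prior) - entropy_bits(posterior)
print(ig)  # ≈ 0.2931 bits of information gained
```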


Great! So now we know how much information is gained by reading that one private email of mine. This number, I think, quantifies my loss of privacy.


By the way, this innocent example contains some hand-waving. H(Dinner), for example, is something we may or may not know; most people have trouble writing down a distribution over dinner choices. Also, P(Dinner | private email to wife), written here as a table, contains assumed values. What if, after reading my private email, you feel that P(IC) = 85%? Who is to say what the reality of this probability is? This is why I felt this model will not make it into the mainstream legal system: the link between the private email and the actual secret is not so obvious. You might use naive Bayes as the definition of reality (refer to the chapters in those books, or the wiki), logistic regression, or decision trees, or you might use something else… You might even use a discriminative system like an SVM or, God forbid, rule-based systems…

If you understand the computation above, then it will be easy for you to understand the continuous version. Let dinner be a continuous variable; we can still write the same expression

IG(Dinner; private email to wife) = H(Dinner) – H(Dinner|private email to wife)

and it would have the same meaning: how far are we from the truth? This idea, by the way, is indeed partially inspired by the name Information Gain, which also goes by Kullback–Leibler divergence when computed over distributions. The formulation above is exactly that, except that "private email to wife" is itself treated as a distribution; say, perhaps, my emails are generated randomly:

KL( Dinner|private email || Dinner )

But KL divergence does point us to some other interesting characterizations. It is a divergence: a distance lacking some of the properties of a distance. Namely, it is not a metric:

* Non-negativity, KL(x, y) >= 0: yes

* Indiscernibility, KL(x, y) = 0 iff x == y: yes

* Symmetry, KL(x, y) == KL(y, x): NO

* Triangle inequality, KL(x, y) + KL(y, z) >= KL(x, z): NO

This has some serious implications for this formulation of privacy. Some things that we naturally think should make sense do not.
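The asymmetry in particular is easy to see numerically. A short sketch using the dinner distributions from earlier (the helper is my own, not a library function):

```python
import math

def kl_bits(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

prior     = [0.33, 0.33, 0.33, 0.01]  # dinner before the email
posterior = [0.10, 0.10, 0.10, 0.70]  # dinner after the email

print(kl_bits(posterior, prior))  # ≈ 3.77 bits
print(kl_bits(prior, posterior))  # ≈ 1.64 bits: a different number, so not symmetric
```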

Let's say I have two emails, e1 and e2, and let's say dinner is still the subject of intense TLA investigation:

KL(d;e1) + KL(d;e2) != KL(d;e1,e2)

All private information must be considered together, because considering the pieces separately would yield an inconsistent measurement of privacy loss.

Let's say there are two secrets: d1 is my dinner choice and d2 is my wife's dinner choice:

KL(d1;e1,e2) + KL(d2;e1,e2) != KL(d1,d2; e1,e2)

All secrets must be computed together, because computing the IG of each separately and adding is not equal to the total information gain.

Let's say we have an intermediate decision called Mode of Transportation (mt), and it is a secret just like my dinner choice:

KL(mt; e1,e2) + KL(d; mt) != KL(d; e1,e2)

The intermediate secret can be calculated, but again, it must be calculated carefully and not by additively accumulating IG.
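A minimal sketch of why IG cannot be accumulated additively, using a hypothetical secret that is the XOR of two observed bits: each email alone reveals nothing about the secret, yet together they reveal it completely, so the per-email gains do not add up to the joint gain:

```python
import math
from itertools import product

def mi_bits(joint):
    """Mutual information I(X; Y) in bits from a joint distribution
    given as a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Toy model: e1, e2 are fair coin flips; the secret is d = e1 XOR e2.
joint_d_e1 = {}   # joint distribution of (d, e1)
joint_d_e12 = {}  # joint distribution of (d, (e1, e2))
for e1, e2 in product([0, 1], repeat=2):
    d = e1 ^ e2
    joint_d_e1[(d, e1)] = joint_d_e1.get((d, e1), 0) + 0.25
    joint_d_e12[(d, (e1, e2))] = 0.25

print(mi_bits(joint_d_e1))   # 0.0  — e1 alone tells us nothing about d
print(mi_bits(joint_d_e12))  # 1.0  — e1 and e2 together determine d fully
```

By symmetry I(d; e2) is also zero, so the sum of the individual gains is 0 while the joint gain is a full bit.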

Bummer, but fascinating!! Still, we must make some choices about how to proceed. Knowledge about the nature of information (and especially electronic information), I believe, informs how we make choices in our privacy laws:


  • Should the whole dataset be analyzed all at once?
  • Or should we only allow each individual's data to be processed all at once?
  • Or should we only allow one day's data from everyone to be processed together?
  • Or should we only allow one day's data from each individual to be processed separately?

Each of these choices (and many others) impacts the private-information loss due to clandestine activities.