Activities of a Clandestine Nature (4 of…

Recently I heard a really great argument against clandestine activities: they perpetuate the practice, the habits, the policies, and the systems that facilitate clandestine activities. Being something that we don’t want, systematic clandestine activities should be pointed out, and certainly be strictly live-audited by unbiased third parties.

Why are clandestine activities bad? The truth of the matter is that knowledge begotten of clandestine activities is inherently out of context and incomplete. Why spy on my computer when you can walk up to me and ask? When you take a small slice of what happens, you will surely miss the whole, as the whole is not represented by the few things that you are able to see as a clandestine agent.

A previously suggested problem is that those taking part in clandestine activities will, like all things in nature, fall into the path of least resistance. Some day, we will just waterboard every person we suspect, I mean why not? I’m sure there’s an email I sent once that says “I hate you” or “I’m gonna kill you” or “I hope you die”. And my constant opposition to clandestine activities is surely a sign that I plan something and desire that no one sees it.

What is the difference between this series of acts: passing a secret law that permits some person unknown to me, at a time unknown to me, to read my emails, gather all my past school and employment records, find copies of all the mail I’ve ever sent by USPS, and analyze all information about my past employment and my family and friends; and this second series of acts: passing a secret law that permits some person unknown to me, at a time unknown to me, to knock me out (perhaps it’s already happening in my sleep? or even on flights, god knows how often I fall asleep quite inexplicably moments before push off, with two air jets blowing cold air at me and two reading lights shining down! and only to come to quite suddenly for no reason), torture me, and get that information?

Well, you say, there is collateral damage: you feel pain when you are tortured, but you do not feel pain when your email is being scanned. This ought to be the most humane way of getting the information from you. Why are you not on your knees thanking all the people whose hard work went into making it so that you don’t have to be waterboarded? (rightfully or not)

Aha, thank you President Obama! The Constitution should save us… Let’s see, according to wiki it implicitly presumes US citizens innocent until proven guilty, but it provides wide leeway for authorities to investigate when suspicion is aroused.

We cannot pursue it through cruel and unusual punishments (8th Amendment), as reading my email can hardly be construed as cruel and unusual… even in my interpretation. Although I can imagine some feel it is cruel.

It appears in the Fourth Amendment against unreasonable search and seizure:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

It also falls under the Fifth Amendment’s guarantee of due process:

No person shall be held to answer for a capital, or otherwise infamous crime, unless on a presentment or indictment of a Grand Jury, except in cases arising in the land or naval forces, or in the Militia, when in actual service in time of War or public danger; nor shall any person be subject for the same offence to be twice put in jeopardy of life or limb; nor shall be compelled in any criminal case to be a witness against himself, nor be deprived of life, liberty, or property, without due process of law; nor shall private property be taken for public use, without just compensation.

There needs to be a Grand Jury of my peers, selected uniformly at random, who when presented with evidence agree to the search and seizure of my information. I should not be deprived of my liberty and (privacy) property without due process of law. And of course the Ninth Amendment says that we may have rights beyond those listed:

The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.

I should cover my behind and say, you guys in law enforcement are doing a heck of a job, which is much appreciated by the present author. And I really hate all those other people who invade my privacy. It’s just that I might have a small chance by conventional means (lawsuits, legal protests, policies, etc.) of changing those things you do that I don’t like, and I do not have methods to affect those others.

Everyone who participates in clandestine activities feels absolute righteousness as they proceed with an invasion of privacy that I do not want. Their feelings and their intentions absolutely annoy me, in addition to their act of invasion. Perhaps we should define invasion of privacy more formally so that these feelings regarding their feelings are processed rationally. If they can define information theoretic brain death, why can we not define more precisely what invasion of privacy is? What is personal privacy beyond those facts (bits, characters, words, sentences…) whose association with me is information that may cause me harm? Regardless of harm, can we take the entropy of those bits and say that is the quantity of privacy lost? Actually, we should take the information gain relative to a representative population, and that is the information I lose–that which you gain. The privacy loss as defined (the negative of your information gain by reading my email from knowledge of all emails) actually only qualifies the privacy. It does not quantify it sufficiently.

Sadly, this very truthful and fundamental definition takes us only a short way. If you are an English major trying to find a new phrasing of something, or a VC looking for cute new company names, this will definitely find information detrimental to those trying to keep it private. But if I am someone plotting the next Taliban attack, or someone discussing how the 21st century is a Marxist century, then the naive information loss does not help as much as you would like it to. (Certainly my email would give away less information under this definition than XYXYXZZZ.com inc.) If everyone writes emails using words representing their true meaning equally, and everyone has the same amount of total information (private+public) associated with them, then reading your email and reading my email decreases our privacy equally. So we have parameters I_pr for private information and I_pu for public information.

We should use Bayes’ rule to compute

P(I_pr|my emails, others’ emails, I_pu) = P(my emails | I_pr,I_pu, others’ emails)*P(I_pr,I_pu, others’ emails)/P(my emails, others’ emails, I_pu)

and

P(my emails|Others’ emails, I_pu)

and we can then calculate the information

IG(I_pr; my emails|others’ emails, I_pu)

based on these distributions, pending specification of relevant linking functions or mechanisms. But the problem with this much more convincing information gain is that you will never convince anyone that the link functions are representative of you. Too complicated for constitutional purposes for sure, and the courts will surely not be empathetic enough to follow the math… Maybe next century when everyone’s played with IG and done some modeling in grammar school.
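As a toy illustration of the Bayes’ rule step above, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the two candidate values of I_pr, the link function `likelihood`, and the simplification that others’ emails and I_pu are already folded into the prior and the likelihood. It only shows the mechanics for one observed email, not a realistic model.

```python
from math import log2

def entropy_bits(dist):
    """Shannon entropy in bits of a {value: probability} dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood, observation):
    """Bayes' rule: P(secret | obs) is proportional to P(obs | secret) * P(secret)."""
    unnormalized = {s: likelihood[s][observation] * p for s, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(obs), the denominator
    return {s: v / evidence for s, v in unnormalized.items()}

# Hypothetical prior over the private information I_pr.
prior = {"secret_a": 0.5, "secret_b": 0.5}

# Hypothetical link function: P(my email | I_pr).
likelihood = {
    "secret_a": {"email_x": 0.30, "email_y": 0.70},
    "secret_b": {"email_x": 0.10, "email_y": 0.90},
}

post = posterior(prior, likelihood, "email_x")
ig = entropy_bits(prior) - entropy_bits(post)
print(post)  # secret_a becomes three times as likely as secret_b
print(ig)    # reduction in uncertainty about I_pr, in bits
```

The whole argument about courts not following the math lives in `likelihood`: change those four numbers and the answer changes with them.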

For another example, the number $54,102,299.14 and the number $14,541,022.99 relieve me of the same character-wise entropy of privacy, yet they are quantitatively different. We need to rely on some oracle magic. Suppose there is a most concise way to describe the entirety of my privacy, say H, containing a series of bits an oracle produced. Your knowledge of H would be your complete knowledge about me. Ergo, we should have a vocabulary of engrams, minimal cognitive elements… H is a series of engrams that is the complete knowledge about me–its finiteness is not specified. Let’s also suppose that my emails (the thing that you use to access my privacy) are encoded by the same oracle using the same engram language, producing E, the complete knowledge about my emails. |H| is the theoretic maximum privacy I can lose; H*E is the information that I actually lost (an inner-product-like operation for a vector space, TBD for strings, perhaps LCS for a special oracle). It remains only to calculate a distance (such as edit_distance(H,E) for strings and euclidean_distance(H,E) for Euclidean spaces), which is the disinformation you gained by reading my email. H*E/|H| is the ratio of my privacy lost; H*E/|E| is the truthfulness of my emails.
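A rough Python sketch of the two ideas in this paragraph, under loud assumptions: there is no oracle, so plain characters stand in for engrams, the strings `H` and `E` are made up, and LCS is just one candidate for the H*E operation (the post itself only says “perhaps LCS”).

```python
from collections import Counter
from math import log2

def char_entropy_bits(text):
    """Per-character Shannon entropy of a string, in bits."""
    n = len(text)
    return -sum(c / n * log2(c / n) for c in Counter(text).values())

a = "$54,102,299.14"
b = "$14,541,022.99"
# Same characters in a different order: identical character-wise entropy,
# even though the two amounts differ by tens of millions of dollars.
print(char_entropy_bits(a), char_entropy_bits(b))

def lcs_length(h, e):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(e) + 1)
    for x in h:
        cur = [0]
        for j, y in enumerate(e, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Hypothetical engram strings: H is everything about me, E is my emails.
H = "ABCDEF"
E = "ACEXY"
overlap = lcs_length(H, E)          # stand-in for H*E
privacy_lost = overlap / len(H)     # H*E / |H|
truthfulness = overlap / len(E)     # H*E / |E|
print(overlap, privacy_lost, truthfulness)
```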

It remains to be seen how to find an oracle, the definition of the engram language, operations over it, campaign to enact law to monitor and compensate us for the privacy lost, etc. However, I am really really wishing that all these clandestine activities are like zits in the face of growing humanity reaching adulthood and will blow away as our vitalities settle into their respective places.

You have got to be Kidding me

So, Mandiant’s discovery of a Chinese hacker has led to the “discovery” of one of their blogs.

You have got to be fucking kidding me.

I mean, the obvious parallel one would draw is Mark Zuckerberg, who used his hacking skills to hack db’s and get pretty girls’ headshots and has now been accepted by society as a very successful and very good person… By that I mean, Mark is very rich and not that many people hate him the way they hate other billionaires.

His Chinese counterpart may be a lowly employee who finally joined his company, or maybe, he committed suicide after he was too embarrassed for not being able to find a wife or … actually more likely provide for a wife in the Chinese social/economic order.

But really, I am having trouble suspending disbelief and continuing that thought. Really? Would the Chinese censor allow this kind of stuff to be posted from a Chinese military installation? You have got to be kidding, right?

Hey, also, what’s with this thing where the US spy agencies are given access to US citizens’ financial information? Don’t they already have it and mine the shit out of it? Why the fuck would the CIA and NSA not already have access to this data? Seems really odd.

Anyway, I guess it’s nice that the Obama Administration decided to make the populace aware of this fact. Those who have anything to hide probably already know, and those who don’t know should be informed.

The other problem with monitoring and surveillance is that I really don’t trust my private information to a stranger. I don’t trust the information I keep private to anybody, and that’s why I keep it private. These law enforcement people, they all have, to a large or small extent, a perverse interest in power. The cool gadgets that enable them to snoop, to record, to change things, to have control over other people’s lives. The elitist feeling: I’m more important, I have higher authority because I am doing something more important than you.

Fundamentally, these are the factors that drive society. But since law enforcement is to prevent the problems caused by these factors, they cannot be motivated by these same factors. And if YOU tell ME that YOU are a law enforcement officer and that YOU do NOT find a deep attraction to your WEAPON, your VEHICLE, your COMPUTER, your CODE, your TOOLS, your BADGE, your next COMMAND, your next SUSPECT/VICTIM and that you dream about them and that you some times cum to the thoughts of them, then I DO NOT BELIEVE YOU.

And if I do believe you, then you are driven by the same forces that drive criminals to do the illegal things (much less bad things), which makes you no more trustworthy than them.

I do not want you to jerk off while looking at my bank accounts or my personal photos or my children’s personal photos. But what guarantees do I have that there is not a law enforcement officer doing that every day? It cases me no material harm, but I just don’t want that to happen. How do I explain this? Under what grounds can I justify my distrust and disgust ??? Is this a human right? is privacy a human right? It feels like it oughta be. It ought to be even more important for me to be able to keep my papers private than my right of speech regarding these papers.

I wish President Obama had an answer to this… I’m sure he does… I mean, he signed up to be the commander in chief of all of these perverts. Anyway, all this fussing on my personal blog is probably not going to do society any good… sigh, for a brief moment, some bits in some computer on some planet in some galaxy… these patterns formed and then vanished…

An Attempt at God’s Sign

 God!

Do you think it’s fair to say that gods are those that have a lower bound in evil and that the devil is one that has an upper bound in goodness?

Right? Because god can be angry sometimes and punish people, and stuff, but he is limited in how much nastiness he can bring onto humanity before he stops. Whereas the devil, we assume, will not stop at any level of nastiness. However, he will also have a bounded amount of goodness he does before he stops and starts doing bad things.

This is interesting because it took me a second to think through as well. Our cultures and religions teach us that God is all good and the devil is all evil. But, either because of our lack of ability to comprehend or because our physical world lacks the expressive power to express God’s will, sometimes God’s acts appear evil, and sometimes the devil’s work appears kind–just look at all those pretty girls out there, so pleasing, so nice, makes you want to be nice, right? But often they are the devil’s work and the niceness disappears at some point and then it’s all evil.

ehem… not speaking from personal experience.

So, but if you put your mind to it, despite these limitations, we are told that God will eventually recover and reveal to us that it is all good, and much better than before, that the evil we suffer in the mean time is completely overwhelmed by the greatness of what is to follow. If we think it this way that the latter will be better than present, then it would appear that, in our stricter language of mathematics, that God’s evil is bounded below, and in contrast, the Devil, the polar opposite of God has goodness bounded above.

Such beliefs have implications, of course. The fact that god is bounded below means that he will never bring humanity to extinction. One can argue that the future of the universe may be brighter without us, and that the next intelligence or being of sorts will be closer to God than us, etc., but that argument is just plain unscientific–it cannot be tested. On the other hand, the perpetuity of humanity is testable, not conclusively, but with growing supporting evidence. I guess it’s kind of pseudo-scientific, but increasing evidence seems better than unprovable, right?

Such beliefs also mean we can detect things. Suppose we find a cause whose effect has always been known to be limited in goodness but (essentially) unbounded in evil; then we can legitimately suspect that cause to be the Devil. We can actually detect the devil from the goodness of its effects!!

Such beliefs should be defined more carefully: do two infinities of goodness and evil add up to our finite existence?

A Serious Problem with Signs in Previous Entries

The astute reader may have found a significant problem with signs in my earlier posts. The sign of these value functions must be carefully selected lest we exchange God and Devil. It might happen. For instance, if you read my quantification of privacy blog entries, you will find that I did not correctly assign signs to the information. Suppose we continue with the example of dinner and the leaked email to my wife. Information theory is confusing in the sense that it cannot distinguish incriminating information from non-incriminating information. It is possible we can structure “Dinner” such that entropy implies innocence and lack of entropy implies guilt, but in most natural cases, the output variable having low entropy could mean either very guilty or not guilty.

When I charge for my loss of privacy, when you rip open my pants and peek into them, I would only want to charge you money if it is embarrassing to me. If it is show-worthy, I might pay you money for the exposure, right? Also, just to be clear, if the information is leaked as a summary of my private email to my wife, the same calculation would take place, but the conditional will be the summary of the email.

A purist would say, loss of privacy is loss of privacy without regard to guilt. If this is the case then the quantification will take the form:

IG(Dinner; private email to wife) = H(Dinner) – H(Dinner | private email to wife)

In the real world, this number is always non-negative, and we compute compensation based on this function. But as a conscientious person who wants an orderly society and safety for my family and my fellow beings, my original proposal was to only charge for the private information when it proves to be unhelpful to the cause of crime prevention. This is further strengthened by a system where law enforcement is punished only when the information proves me innocent. So the three grades of privacy quantification are:

Let a certain piece of private information be a random variable P (such as the dinner choice above, or my choice between java or pascal for my next project (pascal being a crime to use)) and let Q be a piece of data that is leaked or taken from me. The privacy loss PL is defined as the information gain regarding P:

PL = IG(P;Q) =  H( P ) – H( P | Q )

Strong Privacy: Any private information Q lost that has PL >= 0 is privacy loss. (This is saying that anything private revealed to a non-private party against my direction is privacy loss, because IG is always non-negative.)

Medium Privacy: Any private information Q lost that has a PL > 0 is privacy loss.

Weak Privacy: Any information Q lost that has PL > 0 and that makes P more certain regarding guilt is privacy loss. (For the purpose of punitive assurance, this is any certainty about reality being the same as the clandestine actor’s desired outcome, whose truth will generate reward for the clandestine actor.)

SP, MP, and WP for the lazy.
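The three grades can be written down as a tiny Python sketch. This is only one reading of the definitions above: `privacy_grades` and its parameter names are made up here, PL is assumed to be already computed in bits, and the “more certain regarding guilt” condition is reduced to a single hypothetical boolean, which hides all the hard parts.

```python
def privacy_grades(pl_bits, favors_actor):
    """Say which grades count a leak Q as a privacy loss.

    pl_bits:      PL = H(P) - H(P|Q), assumed to be computed elsewhere.
    favors_actor: True if Q made the clandestine actor's desired outcome
                  more certain (the "guilt" condition of Weak Privacy).
    """
    return {
        "SP": pl_bits >= 0,                 # any revealed Q counts
        "MP": pl_bits > 0,                  # Q must actually inform about P
        "WP": pl_bits > 0 and favors_actor, # and must point toward guilt
    }

print(privacy_grades(0.0, False))   # only SP counts this as a loss
print(privacy_grades(1.55, False))  # SP and MP
print(privacy_grades(1.55, True))   # SP, MP and WP
```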

Punitive Privacy Assurance:

Strong Punitive Privacy Assurance: Penalize clandestine actor for my strong privacy loss.

Medium Punitive Privacy Assurance: penalize clandestine actor only for my medium privacy loss.

Weak Punitive Privacy Assurance: Penalize clandestine actor only for my weak privacy loss.

SPPA, MPPA, WPPA for the lazy.

We should have at least Weak Punitive Privacy Assurance(WPPA) in America. IMHO

Should it be legal?

Time for another episode of “should it be legal ?”

 

Think of it… we’re in Philadelphia, no, the movie, not the city. And Tom Hanks discovers that the corporate email server is very slow… too slow in fact to receive the document he is trying to email to his assistant before the statute of limitations expires the next day. Would this count towards illegal discriminatory behavior based on race, age, sexual preference or country of origin?

 

Actually, a more important question to ask is: does anybody even care about fairness in the workplace? Are there any amongst you that would agree to racial discrimination just to receive some shares of stock or to feed your family? In this time of terrible economic crisis, I think most people in America do not have the liberty to act on concerns of unfairness.

 

Why have there been more frequent economic crises? I think I finally know why. It is not because corporate America cannot keep accounts straight or evaluate risk on mortgage loans! The crisis for all practical purposes legalizes discrimination. Everybody is holding their own mouths shut for fear of being seen as against the company.

 

Is it legal in America to restrict employee work-place internet connections and bandwidth based primarily on race, and place of origin?

 

Personally, having no law degree, I feel that it is race-based preferential treatment and unfairly biased against a certain group based on racial characteristics and place of origin.

 

Oh, I mean, I know it can’t be traced to the company… just like that fax was lost and recovered inexplicably in Philadelphia. But the mere fact of this capability should be announced publicly, like when police decide to arrest people they have to say out loud what they are doing and why. When the company inspects the employee’s connections from a workplace computer and delays or disrupts them, it must be done in an unbiased way.

 

Am I, like, the only one?

Dude, am I like the only one under the sun who doesn’t know who is “unsending” emails, or how?

 

The symptom is this: I type the email, hit send, it goes away. Next day (or several days later), I become aware that recipient did not receive the email. I look for the email and it is stored as an unsent “DRAFT” in gmail.

 

I did a quick search on google and didn’t see anybody else talk about this. But my email (gmail) often becomes unsent after I hit the send button. I doubt it is a bug on google’s side. I also doubt it is very widespread, since I have neither seen nor heard anybody mention this problem.

 

But it does happen often when the content of email is undesirable for the recipient. This happens both in google’s free accounts and in a paid enterprise version of gmail. It happens both in work email and in personal email.

 

I mean, I guess I should admit, now that I’m at it, that I also have occasional ED… Because for a computer guy, not knowing this crucial skill is probably about as embarrassing as ED is for a man’s sexual ability–naturally occurring but failing. Oh, and!?, btw!? I also have urinary incontinence. Experiencing all three, I can tell you that they don’t kill you, but all are very inconvenient and can be very very embarrassing.

 

Let’s see, what have I tried:

 

* Tried google’s 2-phase verification.

* Tried paying google for the gmail account.

* HTTPS always, so it cannot be a man-in-the-middle due to an invisible corporate proxy. And it happens at home too.

* And failing that, using a mobile device that goes through an entirely physically separate cellular network.

* Use chrome, which supposedly is more secure than other browsers.

* Bcc myself on all mail.

* porn, sex, not drinking water, and diapers.

 

Still, emails become unsent the next day. The problem with this is that if it is not a bug, then the people who cause this to happen are seriously detracting from my ability to work and live. I mean, I have thought about how it might be my boss who just wants to delay a few projects so that he doesn’t have to give me a bonus, or my coworker who wants to make me look bad so that he can get a bonus, or the HR/legal of the company who want to reduce the liability of the company by making it look like I didn’t communicate vital but damaging information.

 

But those are just suspicions of a really insane person. I mean, seriously, what are the chances that the silly secretary or office manager has more access to information and more control over my communications than I do? I mean, c’mon, I actually work and produce things that the company sells for money; it cannot possibly be that there is a person who sits there and reads every single email and evaluates them and selectively unsends them.

 

I don’t have trouble believing that shrewd corporate competitors and businessmen and an occasional hacker have the means to do this, but the unsending of email happens at several companies, on several accounts under management by different people. It happens enough to make me think that every company officially has the capability of unsending emails hosted by google?

 

Is this an attack by Microsoft? Part of the scroogle campaign? Some coworkers do come from the M$ family… Corporate conspiracy to defame google?

 

Despite these occasional intrusions, I have not been motivated to seek out a new email service provider (ESP) for my personal account, and certainly have no better alternative to recommend to work place.

 

Also, it could be that I just suffer from some kind of interruption in consciousness and somehow I have clicked on “INBOX” instead of “Send” on those occasions. But this is very unlikely as many of these emails contain important information. Also, there are occasions when I’ve checked that the email is in the “SENT” box before leaving work and then seeing the email in “DRAFT” folder several days later.

 

I know I won’t be the first or last guy to complain about ED… But how come there aren’t awareness campaigns and support groups for people whose emails get unsent?

 

 

p.s.

Btw, if you ever get raging hemorrhoids that stay for months and months or an anal fissure that reappears daily, try to use some baby diaper cream in addition to the fiber that the doctor prescribes. The cream helps you heal just as much as it helps a baby. fyi I guess… At least I have found some solutions regarding this embarrassing matter.

Code.org Advertisement and no-WFH

Recently code.org publicized a promotional video featuring ppl like Mark Zuckerberg of Facebook and Bill Gates of Micro$oft saying American schools should teach programming more.

 

I don’t like it.

 

I don’t think programming is for everyone, or that more programming is for social good or scientific advancement. It lowers the cost of labor for all those people in the advertisement, but it isn’t as good as it sounds.

 

As a person who completed a CS degree, I feel that computer languages can be made so much better that there won’t be a “computer programming” at all.

 

The day that I tried to teach my dad to program a for-loop in C and he turned around and teased me about forgetting the closed form expression for an arithmetic series was the first time that I thought about how stupid this stuff I do is. It was the expression on my dad’s face… I remember it vividly… For it was then that I realized that I had not comprehended the sheer vulgarity of

for(int x=0;x<100;++x);

so primitive, so stupid.

The next time is when I read about Map-Reduce–sooo freaking cool. I think tomorrow I will find another way to think, another way to say, and another way to program.

 

I want to make a better programming language. A better computer. That would be better than community colleges teaching Fortran IMHO.

 

Oh, and p.s.

I think Yahoo!’s new no-WFH policy is nice. I think it is real progress for the protection of civil liberty in America. Technology companies insist on ownership and monitoring of their employees while working, and are admittedly justified in doing so. Therefore when Marissa Mayer decided to cancel all WFH, she made a call that will end monitoring of employees’ home networks–because if you don’t work from home, the company will have no cause to instrument any kind of monitoring of your home network.

I think this is a really forward thinking technology leader who cares about her employees. I am buying myself some Yahoo! stock in support of this bold move.

IG and the Quantification of Privacy

A while back, I talked about computing IG–information gain–by clandestine methods via an otherwise secret (personal) email. I will point to some other prior blog entries about what we can reasonably consider private and some reasons why I think it’s bad (Because it removes competition….

The basic challenge is this: If your competitor can spy on what you do (unilaterally) then they will never be motivated to innovate. Their key strength will be their ability to hack your secrets and they will work hard on that, but not on how to build a better product or cure a disease or solve a new problem. If you can both spy on each other with perfect information then there is no need to innovate, just calculate the equilibrium and aim for that. If you can disinform your opponent then all your effort will go into disinformation instead of innovation. Basically it is much easier to do something sneaky and cheat than to do the right thing and innovate. This is why the government, a non-competing body whose interest is to make sure everyone competes (at least in the American government this is the case), should provide for information security.

)

I realize in retrospect that IG may not make sense to most people based on the formulation I laid out. Let’s review. IG is the change in entropy from a state without additional knowledge to a state with knowledge:

IG = H(secret) – H(secret | private email)

This measurement seems to be of a quite abstract concept of entropy–a unitless measurement. Why would I think this useful for any reason other than that it is called “Information Gain?” Well, truth be told, what I had in mind was more of the IG from the machine learning literature: class purity after conditioning on some private information. It is actually used more as a measurement of correctness of predicting a discrete output than as an abstract change in entropy of a distribution after conditioning. I will refer the reader to these excellent introductory books regarding “classification” algorithms.

… Some days pass and the books will hopefully have arrived on your desks…

So the example is this: my secret is the probability that I will have Chinese food tonight. Let’s throw in several more classes, say Italian and Mexican, so that together they cover 99.9% of all possibilities. This probability may be internal to me. Or it may be an externalizable model, like I will toss a three-sided die and figure out what I will eat tonight.

Actually, this system forces us to think of a new class. I will call this new class the innovation class. It covers all cases where something new might happen, such as tonight when I went off on a tangent and forgot to eat dinner completely. Or I might be abducted by Aliens for demanding privacy, Japanese paramilitary for blogging, or God for thinking all these awful things. The fact is, I do not know what will happen, but what I do know is that things I don’t know will happen. So the class is called IC, Innovation Class–now we have a 4 sided die: Chinese, Mexican, Italian, IC; Let’s write naively that the probability for each class is:

Chinese  Mexican  Italian  IC
33%      33%      33%      1%

The formula for the entropy of these classes, with logarithms taken base 2 so that the result is in bits, is written as:

-H(Dinner)= p(Chinese) * log(p(Chinese)) + p(Mexican) * log(p(Mexican)) + p(Italian) * log(p(Italian)) + p(IC)*log(p(IC))

The above evaluates to slightly more than log2(3) ≈ 1.585, the maximum possible entropy in a three-class situation, because of the small fourth class: H(Dinner) = 1.6499060116098556
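For anyone who wants to check the arithmetic, here is a minimal Python sketch of the entropy formula applied to the table above. The only assumption is the logarithm base: base 2 is what reproduces the quoted figure.

```python
from math import log2

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

dinner = {"Chinese": 0.33, "Mexican": 0.33, "Italian": 0.33, "IC": 0.01}
h_dinner = entropy_bits(dinner.values())

print(h_dinner)   # about 1.6499 bits
print(log2(3))    # ceiling for three equally likely classes, about 1.585
print(log2(4))    # ceiling for four equally likely classes, exactly 2.0
```

So the prior sits between the three-class and four-class ceilings: almost all of the uncertainty is in the three cuisines, and the 1% IC adds a little on top.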

That’s it. That’s the formula for calculating entropy that we will use repeatedly. Now, suppose that you have read my email to my wife saying “oh man, look at this great deal on groupon, 50% off on Indian food right near our home”. What is the right thing to think about the distribution of my dinner?

P(IC)=99%

Indian food is not Chinese or Mexican or Italian, but we have thought of that and put in IC to account for it.

Chinese  Mexican  Italian  IC
0.33%    0.33%    0.33%    99%

-H(Dinner|private email to wife) = p(Chinese|private email to wife) * log(p(Chinese|private email to wife)) + p(Mexican|private email to wife) * log(p(Mexican|private email to wife)) + p(Italian|private email to wife) * log(p(Italian|private email to wife)) + p(IC|private email to wife)*log(p(IC|private email to wife))

gives us the conditional entropy of the dinner distribution after reading my private email. This entropy is H(Dinner|private email to wife) = 0.09596342477405478

IG(Dinner; private email to wife) = H(Dinner) – H(Dinner|private email to wife) = 1.6499060116098556 – 0.09596342477405478 = 1.5539425868358008 bits. This corresponds to an IGR of 1619.31%, that is, the information gained is about 16X the uncertainty that remains after you saw the email.
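The numbers in this step can be reproduced with a short Python sketch. One assumption is needed: the quoted conditional entropy appears to correspond to a posterior of 0.33% for each cuisine and 99% for IC (which sums to 99.99%, so it is very slightly unnormalized), with logarithms base 2. The ratio at the end is simply IG divided by the remaining entropy, which is how the 1619.31% figure seems to have been obtained.

```python
from math import log2

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

before_email = [0.33, 0.33, 0.33, 0.01]       # Chinese, Mexican, Italian, IC
after_email = [0.0033, 0.0033, 0.0033, 0.99]  # assumed posterior, see above

h_before = entropy_bits(before_email)
h_after = entropy_bits(after_email)
ig = h_before - h_after   # information gained by reading the email
ratio = ig / h_after      # IG relative to the uncertainty that remains

print(h_before, h_after)
print(ig)                 # about 1.554 bits
print(f"{ratio:.2%}")
```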

 

Great! So now we know how much information is gained by reading that one private email of mine. This number, I think, quantifies my loss of privacy.

 

Btw, this innocent example contains some hand waving. H(Dinner), for example, is something that we may or may not know. Most people have trouble writing down a distribution for dinner choices. Also, P(Dinner|private email to wife), here written as a table, contains assumed values. What if after reading my private email you feel that P(IC)=85%? Who is to say what the reality of this probability is? This is why I felt that this model will not make it to the mainstream legal system: the link between the private email and the actual secret itself is not so obvious. You might use naive Bayes as the definition of reality (refer to chapter in books or wiki), logistic regression, decision trees, or you might use something else… You may even use a distributions system like SVM or, god forbid, rule based systems…

If you understand the computation above, then it will be easy for you to understand the continuous version. Let dinner be a continuous variable; we can still write the same expression

IG(Dinner; private email to wife) = H(Dinner) – H(Dinner|private email to wife)

and it would have the same meaning: how far are we from the truth? This idea, btw, is indeed partially inspired by the name Information Gain, which also goes by Kullback-Leibler divergence when computed over distributions. It is exactly the above formulation, with the exception that “private email to wife” is a distribution; say, perhaps, my emails are generated randomly.

KL( Dinner|private email || Dinner )

But KL divergence does point us to some other interesting characterizations. A divergence is a distance without some of the properties of a distance; namely, it is not a metric:

* Nonnegative dl(x,y)>=0:  yes

* Indiscernibility: dl(x,y)=0 iff x==y: yes

* Symmetric dl(x,y)==dl(y,x): NO

* Triangle inequality dl(x,y)+dl(y,z) >= dl(x,z): NO

This has some serious implications regarding this formulation of privacy. Some things that we naturally think should make sense do not.
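The two "NO" entries in the list above are easy to check numerically. Here is a minimal sketch; the three two-outcome distributions are arbitrary values picked for illustration, and base-2 logs are assumed.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions given as lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Arbitrary illustrative distributions over two outcomes.
p = [0.50, 0.50]
q = [0.75, 0.25]
r = [0.99, 0.01]

# Symmetry fails: KL(p||r) and KL(r||p) are different numbers.
kl_pr = kl_divergence(p, r)
kl_rp = kl_divergence(r, p)
print(f"KL(p||r) = {kl_pr:.4f}, KL(r||p) = {kl_rp:.4f}")

# Triangle inequality fails: going "through" q is shorter than going direct.
kl_pq = kl_divergence(p, q)
kl_qr = kl_divergence(q, r)
print(f"KL(p||q) + KL(q||r) = {kl_pq + kl_qr:.4f} < KL(p||r) = {kl_pr:.4f}")
```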

Let’s say I have two emails, e1 and e2, and let’s say dinner is still the subject of intense TLA investigation:

KL(d;e1) + KL(d;e2) != KL(d;e1,e2)

All private information must be considered together, because considering the pieces separately would yield an inconsistent measurement of privacy loss.
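A toy case makes the non-additivity concrete. The sketch below reads KL(d;e) as the information gain (mutual information) about the secret d from the evidence e, which is one interpretation of the notation. The setup is entirely hypothetical: two "emails" that are independent fair coin flips, and a secret that is their XOR, so each email alone reveals nothing while both together reveal everything.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint distribution onto the variable positions in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)

def information_gain(joint, secret_idx, evidence_idx):
    """I(secret; evidence) = H(secret) + H(evidence) - H(secret, evidence)."""
    return (entropy(marginal(joint, secret_idx))
            + entropy(marginal(joint, evidence_idx))
            - entropy(marginal(joint, secret_idx + evidence_idx)))

# Joint distribution over (e1, e2, d) with d = e1 XOR e2 (hypothetical toy model).
joint = {(e1, e2, e1 ^ e2): 0.25 for e1, e2 in product((0, 1), repeat=2)}

ig_e1 = information_gain(joint, (2,), (0,))       # email 1 alone
ig_e2 = information_gain(joint, (2,), (1,))       # email 2 alone
ig_both = information_gain(joint, (2,), (0, 1))   # both emails together

print(f"IG(d;e1) + IG(d;e2) = {ig_e1 + ig_e2:.1f} bits")
print(f"IG(d;e1,e2)         = {ig_both:.1f} bits")
```

Measuring each email on its own would report no privacy loss at all here, while the pair gives away the full bit.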

Let’s say there are two secrets: d1 is my dinner choice and d2 is my wife’s dinner choice

KL(d1;e1,e2) + KL(d2;e1,e2) != KL(d1,d2; e1,e2)

All secrets must be computed together, because computing IG separately and adding is not equal to the total information gain.

Let’s say we have an intermediate decision called Mode of Transportation (mt), and it is a secret just like my dinner choice.

KL(mt;e1,e2) + KL(d;mt) != KL(d;e1,e2)

The intermediate secret can be calculated, but again, it must be calculated carefully and not by additive increase of IG.

Bummer, but fascinating!! But we must make some choice about how to proceed. Knowledge about the nature of information (and especially electronic information), I believe, informs the choices we make in our privacy laws:

 

  • Should the whole data be analyzed all at once?
  • or should we only allow each individual’s data be processed all at once?
  • or should we only allow daily data of everyone to be processed together?
  • or should we only allow daily data  of each individual to be processed separately?

Each of these choices (and many others) impacts the private information lost due to clandestine activities.

 

 

EMR errors and Privacy

EMR and Troubles with Identity Privacy

So recently I received a bill from an out-of-state medical clinic. The bill charges me for treatments that were obviously not rendered unto me. The bill contained my name and physical address.

I called them and they very quickly rescinded the bill.

However, there is one remaining issue, which is that the US has promoted Electronic Medical Records (EMR) systems. In fact, it appears that there might even be a mandatory EMR system in the near future. Consider for a minute that such a thing happens when a mandatory EMR system is in force. What would happen?

I may be rejected for health insurance on the basis of a pre-existing condition, based on treatment a hospital claims to have rendered. Is this possible? Well, I have a bill here from an out-of-state clinic that says YES, they can make a mistake like that, and it can affect my permanent medical record.

Mandatory EMR/EHR is a godsend for insurance companies. It means they can receive the full history of a person’s past and perform sophisticated risk analysis that produces premium rates according to the person’s risk of illness or injury.

One would react to a $25 insurance premium by not buying insurance and react to a $5,000,000 premium by committing suicide. Because if it is entirely based on risk and incidence prediction, a $25 bill means that the insurance company expects to pay that or less for your treatment, and similarly a five million dollar bill would mean there is significant evidence that I will need to pay that much to live.

I had written a bit about how this company found my address and put me down for the bill, but I realize that by talking about it, I am letting the world know how that company found my personal address, and I don’t want that, so let us not talk about the privacy problems here and move on to a larger issue:

The problem it could cause for us when an error is made is that our permanent record will be marred forever. I hate telling horror stories but let me tell this one:

4 months ago I leased my new 2013 Chevy Volt. California has a law that allows drivers of these environmentally friendly cars to drive in the HOV lane using a “GREEN STICKER” which the DMV must issue. This is one of the main incentives that moved me to lease this car.

Needless to say it has been 4 months and I have not received the sticker to drive in the HOV lane. After two(3) series of phone calls and four(4) form submissions, I finally found out that my registration address had a problem from the start when I leased it from Boardwalk Chevy. I had a horrible experience leasing this car from them: being forced to sit through negotiation on a nationwide promotional program with my eight-month(8) pregnant wife; being forced to sign the contract three(3) times with numerous line items changing without due notice to me; the contract requiring me to pay upfront for 9 oil changes that must be used in 3 years (ON A FUCKING PLUGIN HYBRID that will not be burning gas most of the time); and then, after all that, having the wrong address entered into the computer so I cannot receive my green sticker.

But that is beside my current point, which is that there is a lot of racism and hatred and unkindness and merciless greed in this world, added on top of legitimate human error and the Devil Satan. I do not believe we, as a species, have overcome these hurdles sufficiently to instrument an EMR system that will be central to our medical treatment.

If the address is wrong in my EMR, and I don’t receive a prescription or communication from the doctor, it could be a matter of life or death. I mean, having to sit in traffic for an extra 60 minutes every day is a matter of wasted life, but at least it is not something that causes the complete cessation of life, as an EMR error of this sort could.

I am against universal mandatory EMR in the United States any time this decade.

Activities of a Clandestine Nature (4 of…

Recently I heard a really great argument against clandestine activities: it perpetuates the practice, the habits, the policies, and the systems that facilitate clandestine activities. Being something that we don’t want, systematic clandestine activities should be pointed out, and certainly be strictly live-audited by unbiased third parties.

Why are clandestine activities bad? The truth of the matter is that knowledge begotten of clandestine activities is inherently out-of-context and incomplete information. Why spy on my computer, when you can walk up to me and ask? When you take a small slice of what happens, you will surely miss the whole, as the whole is not represented by the few things that you are able to see as a clandestine agent.

I previously suggested the problem that those taking part in clandestine activities will, as all things in nature, fall into the path of least resistance. Some day, we will just waterboard every person we suspect, I mean why not? I’m sure there’s an email I sent once that says “I hate you” or “I’m gonna kill you” or “I hope you die”. And my constant opposition to clandestine activities is surely a sign that I plan something and desire that no one sees it.

What is the difference between this series of acts: passing a secret law that permits some person unknown to me, at a time unknown to me, to read my emails, gather all my past school and employment records, find copies of all emails I’ve ever sent by USPS, and analyze all information about all my past employment and my family and friends; and this second series of acts: passing a secret law that permits some person unknown to me, at a time unknown to me, to knock me out (perhaps it’s already happening in my sleep? or even on flights, god knows how often I fall asleep quite inexplicably moments before push-off, with two air jets blowing cold air at me and two reading lights shining down! and only to come to quite suddenly for no reason), and torture me and get that information?

Well, you say, there is collateral damage, you feel pain when you are tortured but you do not feel pain when your email is being scanned. This ought to be the most humane way of getting the information from you. Why are you not on your knees thanking all the people whose hard work went into making it so that you are not water boarded? (rightfully or not)

Aha, thank you President Obama! The constitution should save us… Let’s see, according to wiki it implicitly presumes innocence for US citizens until proven guilty, but it provides wide leeway for authorities to investigate when suspicion is aroused.

We cannot pursue it through cruel and unusual punishments (8th Amendment), as reading my email can hardly be construed as cruel and unusual… even in my interpretation. Although I can imagine some feel it is cruel.

It appears in the Fourth Amendment against unreasonable search and seizure:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

It also falls under the Fifth Amendment’s guarantee of due process:

No person shall be held to answer for a capital, or otherwise infamous crime, unless on a presentment or indictment of a Grand Jury, except in cases arising in the land or naval forces, or in the Militia, when in actual service in time of War or public danger; nor shall any person be subject for the same offence to be twice put in jeopardy of life or limb; nor shall be compelled in any criminal case to be a witness against himself, nor be deprived of life, liberty, or property, without due process of law; nor shall private property be taken for public use, without just compensation.

There should be a Grand Jury of my peers, selected uniformly at random, who when presented with evidence agree to the search and seizure of my information. I should not be deprived of my liberty and (privacy) property without due process of law. And of course the Ninth Amendment says that we may have rights beyond those listed:

The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.

I should cover my behind and say, you guys in law enforcement are doing a heck of a job, which is much appreciated by the present author. And I really hate all those other people who invade my privacy. It’s just that I might have a small chance by conventional means (lawsuits, legal protests, policies, etc.) of changing those things you do that I don’t like, and I do not have methods to affect those others.

Everyone who takes part in clandestine activities feels absolute righteousness as they proceed with an invasion of privacy that I do not want. Their feelings and their intentions absolutely annoy me, in addition to their act of invasion. Perhaps we should define invasion of privacy more formally so that these feelings about their feelings are processed rationally. If they can define information-theoretic brain death, why can we not define more precisely what invasion of privacy is? What is personal privacy beyond those facts (bits, characters, words, sentences…) whose association with me is information that may cause me harm? Regardless of harm, can we take the entropy of those bits and say that is the quantity of privacy lost? Actually, we should take the information gain relative to a representative population, and that is the information I lose, that which you gain. The privacy loss as defined (the negative of your information gain by reading my email from knowledge of all emails) actually only qualifies the privacy. It does not quantify it sufficiently.

Sadly, this very truthful and fundamental definition takes us only a short way. If you were an English major trying to find a new phrasing of something, or if you are a VC looking for new cute company names, this will definitely find information detrimental to those trying to keep it private. But if I am someone plotting the next Taliban attack, or someone discussing whether the 21st century is a Marxist century, then the naïve information loss does not help as much as you would like it to. (Certainly my email would give away less information under this definition than XYXYXZZZ.com inc.) If everyone writes emails using words representing their true meaning equally, and everyone has the same amount of total information (private+public) associated with them, then reading your email and reading my email decreases our privacy equally. So we have parameters I_pr for private information and I_pu for public information.

We should use Bayes’ rule to compute

P(I_pr|my emails, others’ emails, I_pu) =

P(my emails | I_pr,I_pu, others’ emails)*P(I_pr,I_pu, others’ emails)/P(my emails, others’ emails, I_pu)

and

P(my emails|Others’ emails, I_pu)

and we can then calculate the information

IG(I_pr; my emails|others’ emails, I_pu)

based on these distributions, pending specification of the relevant linking functions or mechanisms. But the problem with this much more convincing information gain is that you will never convince anyone that the link functions are representative of you. Too complicated for constitutional purposes for sure, and the courts will surely not be empathetic enough to follow the math… Maybe next century, when everyone’s played with IG and done some modeling in grammar school.
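To make the role of the link function concrete, here is a stripped-down sketch of the Bayes’ rule step. Everything in it is hypothetical: the private fact has only two values, the public information and others’ emails are folded into the prior, and the likelihood numbers (the "link function") are invented for illustration.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood):
    """Bayes' rule: P(secret | email) from P(secret) and P(email | secret)."""
    evidence = sum(prior[s] * likelihood[s] for s in prior)
    return {s: prior[s] * likelihood[s] / evidence for s in prior}

# Hypothetical prior over a private fact, given only public information.
prior = {"loves Indian food": 0.3, "does not": 0.7}
# Hypothetical link function: chance of writing the Groupon email in each case.
likelihood = {"loves Indian food": 0.8, "does not": 0.1}

post = posterior(prior, likelihood)
gain = entropy(prior) - entropy(post)

print(post)
print(f"information gained about the private fact: {gain:.4f} bits")
```

Change the two likelihood numbers and the measured privacy loss changes with them, which is exactly the part nobody will agree on.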

For another example, the number $54,102,299.14 and the number $14,541,022.99 relieve me of the same character-wise entropy of privacy, yet they are quantitatively different. We need to rely on some oracle magic. Suppose there is a most concise way to describe the entirety of my privacy, say H, containing a series of bits an oracle produced. Your knowledge of H would be your complete knowledge about me. ergmum, we should have a vocabulary of engrams, minimal cognitive elements… H is a series of engrams that is the complete knowledge about me; its finiteness is not specified. Let’s also suppose that my emails (the thing that you use to access my privacy) are encoded by the same oracle using the same engram language, producing E, the complete knowledge about my emails. |H| is the theoretical maximum privacy I can lose, and H*E is the information that I actually lost (an inner-product-like operation for a vector space, TBD for strings, perhaps LCS for a special oracle). It remains only to calculate a distance (such as edit_distance(H,E) for strings and euclidian_distance(H,E) for Euclidean spaces), which is the disinformation you gained by reading my email. H*E/|H| is the ratio of my privacy lost, and H*E/|E| is the truthfulness of my emails.
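For the string case, the ratios above can be sketched by taking H*E to be the length of the longest common subsequence (LCS), one of the options floated above. The engram strings below are made up, with each letter standing in for one hypothetical engram; no real oracle is implied.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

# Hypothetical oracle output: one letter per engram.
H = "ABCDEF"   # everything there is to know about me
E = "ACEXY"    # everything my emails say

overlap = lcs_length(H, E)           # stand-in for H*E
privacy_lost = overlap / len(H)      # H*E/|H|
truthfulness = overlap / len(E)      # H*E/|E|
disinformation = edit_distance(H, E)

print(privacy_lost, truthfulness, disinformation)
```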

It remains to be seen how to find an oracle, the definition of the engram language, operations over it, a campaign to enact law to account for and compensate us for the privacy lost, etc. However, I am really really wishing that all these clandestine activities are like zits on the face of a growing humanity reaching adulthood, and will blow away as our vitalities settle into their respective places.

Things of a Clandestine Nature (3 of…

Money
There should be a monetary value to losses of privacy. Every time an organized clandestine action is done unto me and that action is proven wrong, there should be consequences.

Having suspicion is a right, a duty of these law enforcement folks. But acting on an incorrect suspicion (whether justified or not) should carry consequences. Just as they are rewarded for following a hunch and catching a crook, there must be punishment for following a wrong hunch and negatively impacting a person’s life.

In fact, I feel that even the access and analysis of my private information (email, files, my personal space such as my home, the airspace above my head, signals sent into my person and my possessions) are invasions of privacy that must be punished when proven to be wrong.

Each violation must state a hypothesis and the condition of the test requiring invasion of privacy. If the test proves the hypothesis wrong, then a punishment is assessed. If it is proven right, then a reward is given.

For every kilobyte of my email you read, you should be paying me $x. If you retain the data, then you will be charged $y/year.
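As a back-of-the-envelope sketch of that tariff: the function name and the dollar amounts below are placeholders, not proposed rates, and the retention charge is assumed to be a flat fee per year rather than per kilobyte.

```python
def privacy_fee(kilobytes_read, years_retained, x_per_kb, y_per_year):
    """Total owed: $x per kilobyte read plus a flat $y per year of retention."""
    return kilobytes_read * x_per_kb + years_retained * y_per_year

# Placeholder numbers: 120 KB of email read, kept for 3 years, x = $2, y = $50.
owed = privacy_fee(kilobytes_read=120, years_retained=3, x_per_kb=2, y_per_year=50)
print(f"You owe me ${owed}")
```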

This belittles human privacy rights, but it is one way that we can use to quantify, regulate and monitor the clandestine sector.