What of Death in the Age of AI

I had a funny conversation with an old friend today. He mentioned that some Netflix show has, within its plot, a near future where an antagonist suffers a death in the family. The support for that future's human beings is an AI built upon the digital and social data recorded during the lifetime of the deceased. Said AI can talk with the living, mimicking the deceased.

Oh, the lovely thoughts that come to mind when death encroaches onto thy neurons. There is the slight chance that a digital recreation is better than a person's own recreations (by way of imagination). The main reasons being that it would have a better and more independent PRNG than the human brain, and that it would have more data than any individual is ever exposed to.

I would definitely spring for the Linear Algebra package, for I had just spent half an hour complaining to my father that my poor Linear Algebra skills are in the way of my advancement. I definitely want my avatar, the Huan Chang Memorial Chatbot, to know all of Linear Algebra, and I want my kids and my dad, in person or as their own AIs, to see me with Linear Algebra kung fu!

Next on the list would definitely be a spelling and grammar checker. 'Nuf said. Maybe a room simulator that gets messier and messier, just for those people in my life who hate messes.

Given how much time I spend online, I definitely want my bot to have redundant connections and lifetime subscriptions to things like arXiv, Wikipedia, Wolfram Alpha, Weather Underground, …, probably CBS All Access for future Star Trek shows. Maybe a low-latency feed to Wall Street so I can watch it crash repeatedly.

Another thing I may want is for my AI bot to run on a cloud having only servers physically located in my home towns. This is kind of a digital-age version of having your ashes brought home. I have made a few (and may make more) places home in my lifetime, so my gaibot will have plenty of physical redundancy on different continents with different geopolitical climates.

With all this effort, I should also charge a fee for conversing with The Huan Chang Memorial Chatbot. Let’s set the family and friends price at $0.02 per exchange.

Alright! I have got to get cracking on my social media and digital records. Thinking that these matter brings the issue of digital integrity to the forefront of my mind: more important than ever, I will demand that my gaibot have digital integrity!

Invest While You Spend: A Tale of Freedom Joe Forever

There are a lot of fintech companies doing hyper-personal financial management: certainly Acorns and Stash, and others like Wealthfront, Betterment, etc. The ideas implemented are simple but amazingly cool.

The trick being used is that the companies gain visibility into your purchases, either by sitting in the middle of all of your financial transactions via a debit card, or by otherwise accessing your spending through an aggregator like Yodlee. Then, at the time of each transaction, they can bring about an investment. Implementations differ between vendors, and I haven't found one that I really like, but essentially there are a few ideas behind this:

  1. The investment is made in the company you spent money at. This makes sense because in making a purchase, you are in some sense increasing the value of the company, and buying shares is just a way to recuperate the lost future investment gain on the spent money. "Invest in what you use/buy/love."
  2. The investment is made as a percentage of your spending. If we assume that you use toilet paper and drink coffee today, it is reasonable that you will do the same in 10, 20, 30, maybe even 50 years. Given any rate of return, you can calculate how much money you have to put aside to eventually be able to sustain that same consumption at the same frequency without adding more money.
  3. A completely automated process is used to execute investments according to a modern, data-driven, hyper-personal, scientific, effective, and safe design.

So let's say I drink a $5 Starbucks coffee daily. My financial advisor can guarantee with his life that he can provide a 10% inflation-adjusted annual return on any and all investments. The setup is then as follows: the fintech company will withdraw an additional 15% of each purchase from my balance and immediately invest the money. So, that's $5 to Starbucks and $0.75 to investments, for a total of $5.75 out of my bank account. That's pretax; you could also include tax in the calculations and pay future sales and income taxes as well, but the present calculation does not factor in tax. If we keep doing this daily, then in about 20 years we will have accumulated enough money to drink a coffee every day forever. NB, the design is for the daily per-purchase investment into the coffee fund to accumulate to a level where, in retirement, you will no longer need to grow the coffee fund: its investment earnings will pay for all future coffee drinking.

[Chart: How to get Free Coffee Forever!]

You can plug your tolerated savings rate and what you feel is a believable sustained inflation-adjusted return on investments into the chart above to see how long you'll have to save before retiring into the same lifestyle you have today. If you let your imagination run wild a little bit and believe in a stable cumulative return quoted as an inflation-adjusted annual percentage yield, then saving for retirement actually doesn't seem that bad! 20 years is how long I've worked already! If I had thought of this as a fresh college grad, I'd be sipping free coffee by now! But it's not too late: I can still work another 20 years and get my free coffee thereafter.
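The arithmetic above can be sketched in a few lines. This is a minimal sketch under the post's hypothetical numbers ($5 daily coffee, a 15% per-purchase savings rate, a guaranteed 10% inflation-adjusted return); the function name is my own, for illustration:

```python
# Sketch of the free-coffee-forever arithmetic (all figures hypothetical).
# Each day, $5 is spent on coffee and an extra 15% of the purchase is invested.
# "Retirement" is reached when the fund's annual return alone covers the coffee.

def years_until_free_coffee(daily_spend=5.0, savings_rate=0.15, annual_return=0.10):
    annual_spend = daily_spend * 365
    annual_savings = annual_spend * savings_rate
    target = annual_spend / annual_return  # fund whose earnings pay for the coffee
    fund, years = 0.0, 0
    while fund < target:
        fund = fund * (1 + annual_return) + annual_savings
        years += 1
    return years

print(years_until_free_coffee())  # 22 years under these assumptions
```

Roughly two decades, matching the chart's ballpark; a lower assumed return pushes the date out quickly, since it both slows the fund's growth and raises the target.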

There are also other issues, like the dollar-cost-averaging effect and the need for rebalancing the investments. There seem to be some additional games you can play to increase investment risk as you get older, because the fund will not need to sustain your caffeine needs for as long as forever. This is the opposite of all the investment advice you receive today: normally you are asked to reduce the risk of your portfolio as you draw nearer to death. But in reality, if you are sure that you don't have to pay for free coffee forever, the equation changes, and suddenly you can take on much more risk with the extra coffee money.

Another concern is that your tastes or lifestyle expectations may change; one should analyze the directive to prepare for the same cup of joe for the rest of your life. For another example, a woman may not need to buy as many feminine hygiene products after menopause. The proposal uses your every spending habit today as a surrogate for measuring your future lifestyle, but that may very well not be the retirement you look forward to. These are more advanced financial, psychological, physiological, and philosophical topics reserved for homework or future blogging.

Compute in deltas

So, for some years I've been stuck, unable to figure out delta computing. I use the symbol \Game in my notes because, on my phone, it looks similar to the symbol I want; here I will use \Delta in place of \Game.

The small delta denotes the difference between two programs: \delta(p_1,p_2) is a program that, when applied to the program p_1, produces another program that takes any input x of p_1 and produces r_1=\delta(p_1,p_2)(p_1)(x), a result equivalent to the second program run on the same input and environment, r_2=p_2(x), such that r_1\equiv r_2 for some useful definition of \equiv. This is the program difference (pd) between two programs.

The large delta then gives us the program differential operator (PD). \Delta(p,a) produces a function that yields the change in p when a pd of its argument a, \delta(a_1,a_2), is offered. That is: \Delta(p,a)(\delta(a_1,a_2))\equiv \delta(p(a=a_1),p(a=a_2)), where the RHS partial evaluations are performed by partially specifying just the parameter a and leaving the rest free.
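A toy sketch of the two operators, with hypothetical Python names delta and Delta standing in for \delta and \Delta. Here the pd is the trivial patch that ignores its argument and simply behaves like the second program (a real pd would be a structured, possibly reversible, edit), and Delta takes the endpoints (a_1, a_2) directly, in the reparameterized style mentioned in the p.p.s. below:

```python
# Toy model of the program difference (pd) and the program differential (PD).
# delta(p1, p2) returns a patch: applied to p1, it yields a program
# equivalent to p2 on any input. This trivial patch ignores its argument.
def delta(p1, p2):
    return lambda p: (lambda *args, **kwargs: p2(*args, **kwargs))

# Delta(p, a) maps a change in the named parameter a to the pd of the
# corresponding partial evaluations of p (a fixed, the rest left free).
def Delta(p, a):
    def on_pd(a1, a2):
        p_a1 = lambda **rest: p(**{a: a1}, **rest)
        p_a2 = lambda **rest: p(**{a: a2}, **rest)
        return delta(p_a1, p_a2)
    return on_pd

# Example: p(a, x) = a * x; the change in p as a moves from 2 to 3.
p = lambda a, x: a * x
patch = Delta(p, 'a')(2, 3)
doubled = lambda x: p(a=2, x=x)
tripled = patch(doubled)   # equivalent to p partially evaluated at a=3
print(tripled(x=10))       # 30
```

Here \equiv is just extensional equality on whatever inputs we try; the interesting structure the post describes lives in richer choices of patch and of \equiv.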

A pair of pd operators (\delta_1, \delta_2) allows for reversible change if \delta_1\circ \delta_2 is an appropriately typed identity function. A single \delta is an irreversible change. For one example, reading from a true random number generator is an irreversible program. Inside the realm of a computer, simply reading an input from outside the computer is an irreversible program within the computer, because the computer cannot effect the unpressing of the key. Even though, outside the computer, we may know that the state preceding the "no" was the question "are you sure?" from "rm -rf /", the computer cannot know that for sure with its own faculties. That is to say, you cannot either, if you are inside the computer with access to just the computer's memories and interfaces. Invertible pairs are intuitive, such as (sin(x), sin^{-1}(x)).

Our accessible realm of compute in an execution is therefore an accumulation of: an initial state, irreversibly computed outputs, and the compute graph of reversible deltas. By modeling information this way, we can explicitly consider more general changes of state, as well as give rise to a framework for understanding, interacting with, and developing software programs more effectively.

p.s. Btw, these ideas can be equally well expanded into operational and denotational semantics, each with their own idiosyncrasies.

p.p.s. Can we circumvent first-order logic by currying functions instead of using \forall? Elsewhere I have worked out the reparameterization to achieve \forall_{a_1,a_2} \Delta(p,a)(\delta(a_1,a_2))\equiv \delta(p(a=a_1),p(a=a_2)). One of several examples of this kind of reparameterization would be \Delta(p,\delta_a) \equiv \delta(p), where each of the LHS and RHS now takes two parameters typed for a and yields a function that computes the pd of p when its parameter a changes from the first to the second. To achieve the first-order-approximation effect of derivation in ordinary calculus on the reals, all we need is to specify a loose \equiv^1, the first-order equivalence, and so on. There are also sub-first-order equivalences, such as: having at least the same number of characters in the program code, being written in the same language, etc. First-order equivalence should minimally mean having sufficiently compatibly typed inputs and outputs. Subsequent higher-order equivalences include progressively more identical runtime behaviors, or progressively more matching meaning. Here, again, is another example of why the presently described paradigm is beneficial: if a program is stochastic, how do we determine whether another program is equivalent to it, other than that the code is identical? By isolating the irreversible compute of receiving (identical) external entropy, the remaining program can be evaluated in the f^{th} order using the conventional \equiv^f. Even higher-order equivalence may require that the programs have the same runtime/memory/resource complexities. Which, btw, inspires an n^{th} ordering \geq^n that requires all equivalences \equiv^k for k<n, and then at the n^{th} level requires the LHS to be better than the RHS, such as having lower runtime complexity, etc. The details of all these developments are documented more fully elsewhere.

p.p.p.s. Where is this headed? Well, aside from modeling the universe, one possibility is to achieve truly symbolic differentiation and do back-prop on program code. One can ask for the PD of a program's unit test wrt the program. We then pass in the pair (false, true) to arrive at a program (code) mutator that repairs the input program so that the unit test passes, after which we use the higher ordering to search for a better program.

One can dream…

Deep Universal Regressors Elsewhere

I just chanced upon a fascinating article called Neural Additive Models: Interpretable Machine Learning with Neural Nets (FAMX.3 for me due to my interest, but others may feel this draft is a 2 or 3 due to brevity). The proposed ExU is a layer that has a foreactivated parameter (see my own blog discussions on the need for nonlinearity over raw parameters here, here, and here, etc.)

h(x) = f(e^w \cdot (x - b))

I'm very excited that people like Geoffrey Hinton and Richard Caruana are thinking and writing about the stuff that I'm thinking and writing about, at about the same time, and arriving at similar solutions. In this case they performed foreactivation on a weight matrix. This paper, of course, contains a massive amount of experimentation, far more than I had the resources to accomplish. These smart folks also solved the problem of sign that I had struggled with a bit: the sign is washed out by having multiple layers (64 in their successful examples).

Oh! That is obvious, now that they say it. The tanh-autoactivated sign I wanted to multiply onto the front of e^W was not necessary after all. As long as there is at least one "linear" layer at the output of the subnetwork that does not use ExU or another sign-restricting foreactivation on its parameters, the output can have full range in R irrespective of input, and the network can therefore be a universal regressor.
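Here is a minimal sketch of an ExU-style unit as I read the formula above, in plain Python with ReLU standing in for the activation f; the function name and defaults are my own, for illustration:

```python
import math

# Sketch of an ExU-style unit: the raw weight w is foreactivated through
# exp, so the effective multiplier e^w is always positive; f is any
# activation (ReLU here). h(x) = f(e^w * (x - b)).
def exu(x, w, b, f=lambda z: max(z, 0.0)):
    return f(math.exp(w) * (x - b))

# With w = 0 the unit reduces to a plain shifted ReLU...
print(exu(3.0, 0.0, 1.0))               # 2.0
# ...while modest changes in w scale the slope multiplicatively, which is
# what lets the unit model very sharp jumps in a feature's shape function.
print(exu(3.0, math.log(10.0), 1.0))    # ~20.0
```

Note the e^w multiplier is always positive; that sign restriction is exactly what a final plain linear output layer (no foreactivation) lifts, restoring full range in R.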

My only concern is the effort required to arrive at their awesome results: no fewer than 4 hyperparameters had to be tuned using Bayesian optimization. I think my own laziness demands that there be a way to tune a model using only learning-rate warmup and decay; the dynamical nature of a model and its data should be taken care of entirely by the model and an automated training process. Foreactivation is one such mechanism.

Of course, I only have access to the initial draft posted on 2020-04-29. I am very hopeful that in subsequent revisions and sequels this highly flexible and highly interpretable modeling technique can be made easier to use.

Good luck

I just showed my kids the Apollo 11 launch to the moon, Apollo 13 around the moon, and the Space Shuttle Challenger launch. T minus 18 minutes.

I mean, yes, by my reckoning, Musk's company SolarCity conned $20k from me by selling me a wrongly designed solar and battery system. Their support and resolution teams are basically people acting like robots sending scripted emails. I was angry with this company throughout my entire purchase experience, and I attribute some of their behavior to Musk's leadership as well. What they do to their customers is just evil, in the sense that they suck money from customers using deception. These people should just retire and never work again (and they can, after Musk pumped their stock up), because what they did is wrong and evil.

I really do hope this launch goes smoothly, though. Watching the Apollo launches, my eyes are drawn to the brilliant fixed-width sans-serif letters USA emblazoned on the side of the rocket. What a great time, 1969, when USA had a meaning and spirit that could shine through even a few letters. The sight stirs an exuberant excitement within me today, even though I wasn't even alive then.

Today, you’d probably have to look at letters like T-e-s-l-a or S-p-a-c-e-X for anything with any spirit and meaning and hope and excitement and confidence and motion.

So, be that as it may, I wish them absolutely the best. May it prod mankind forward. I hope SpaceX is not evil to its customers and investors like Tesla is.

Electronic Management

There has been a lot of news about an increase in workplace management, as in remote monitoring, as many folks are either forced to or choose to work from home due to COVID-19.

It occurs to me that there might be some argument for additions to the rights and freedoms of the individual. The company is a very powerful entity with a lot of resources at its disposal. In past posts, I have argued for organizing labor for software engineers. It must be recognized that there is a gross disparity of power between individuals and their employers when it comes to modern electronic work.

Okay, sure, yes, they teach in business school that information asymmetry is the only way to get an edge. But considering the relationship as part of human society, we can clearly see that the individual has too little information and too little power to preserve their own human interests. If everyone had more power, there would not be a few billionaires and a lot of poorer people. Wealth and income inequality is really just a consequence of information and power inequality: secrecy and domination. So in some ways, by balancing the power of each individual against the collective singular entity of a company, we may influence those inequalities that we care about.

So, for one example: it is publicly suggested by news media that most workplace computers have technologies that can record what you type, live-view or record video of the screen as you use the computer, and remotely alter files on your work computer. Although this sounds like science fiction to many, it may very well be true by the time you read this; the technology required to accomplish it is not so far-fetched. Let's assume for this discussion that our world is one where such technology is in prevalent use.

These "solutions" that companies use were purchased for legitimate reasons. They need to keep people productive at work. They need to protect company secrets. They (hopefully) need to preserve workplace professional integrity. Sometimes they need to correct certain situations directly without passing through the chain of management. And there are many other reasons. The technology very directly solves these problems.

"They may be right about what happened two hundred thousand years ago. The past is written, but the future is left for us to write, and we have powerful tools, Rios: openness, optimism, and the spirit of curiosity. All they have is secrecy and fear, and fear is the great destroyer, not…"

Star Trek Picard S1

It seems that in the fictional 24th century, they have been enlightened to the fact that secretive actions driven by fear are not the path, and that there is a brighter way forward for everyone.

So, it is towards that end that I consider the state of workplace monitoring. Perhaps we have not advanced, each of us as individuals and all of us as a civilization, to a level where we need no workplace monitoring. I mean, if we had, we probably wouldn't need laws and law enforcement either. But when we must have monitoring, when the humans in a company must play with god-like powers, perhaps there should at least be transparency.

What I propose is very simple: employees who are under company surveillance and control must be given the information the company has collected on them. If the company makes a change to files on an employee's computer, he must be informed of those changes. Equal power also means access to the tools they use to analyze your activities. If your boss has a dashboard where he can look into your bathroom visits, by day of week, by time of day, as measured by frequency and duration, you should have access to the same dashboard. If he knows how fast you type on Monday mornings before and after your coffee break, so should you, and with the same latency that he has. If he changes a file, even a single letter, you should be informed of it.

Actually we can probably separate the employer activities into different levels of access:

  • Lowest level:
    • Computer IO (screen recording and keyboard logging)
    • audio-visual recordings
    • Live monitoring of screen should be accompanied by a clearly visible signal to the employee that someone is watching live.
    • Raw data files should be made available to the employee
  • Aggregation and analysis
    • Longitudinal data analysis
    • Alerts generated by the employer's systems
    • Access to dashboards and analytics tools should be made available to the employee
  • Decision making
    • Explanation of why an AI made a determination for or suggestion to the company.
    • I would ask for an explanation of why a human manager made a determination for or suggestion to the company,…, but that’s not the fairness I’m fighting for today.
    • How a decision or determination was made must be communicated to the employee.
  • Alterations and Interventions
    • All changes the company makes to an employee’s files stored on company device or company cloud storage are relevant alterations.
    • Alterations should be communicated to the employee immediately.
    • The company should not assume that moving the mouse on screen and typing the changes into the UI constitutes informing the employee.
    • Intentional retardation (or speeding up) of equipment performance: computation, inputs and commands to the computer, the computer's user interface (UI) responses, network transmission. These interventions, directed towards changing the employee or his direct activities, must be firmly recorded and promptly communicated to the employee.
    • The company should not represent, in any form, that the alterations and interventions it made remotely to an employee's computer data were made by said employee, and should record such acts on media such that they are knowable to said employee, other employees, management, or law enforcement.
    • Additionally, it would be very irresponsible to understate the meta-requirement of employment: the amount of effort employees or subjects should devote to the monitoring metrics the organization provides them, as part of their jobs or subjugation, or, as the case may be, as part of their "relationship" or "complication".

Also, to be perfectly clear, for all you knee-jerk reactionaries: I obviously mean employees should have access to recordings of their own activities and not other employees' activities. Employees in management roles with access to other employees' data bear an extra burden of integrity and responsibility.

I make these suggestions because I feel that they are essential to the preservation of human workplace digital rights and digital integrity. In America, we can still dream of human freedoms and rights. In America, we can speak openly about what we feel is right and just. We can still do right by ourselves and treat everyone with dignity and respect and trust and support.

And, I mean, think of it: you don't want street riots in America when China or Russia or India or the EU announces human digital integrity and management transparency laws, do you? Following "I can't breathe," may come "I can't ty…", or shall we just follow GDPR X when it comes to pass? The late-comer advantage is very clear here: it is much easier for technologically underdeveloped groups of people to establish new regimes in the technology that they build afresh than for an established technology industry with "stuff that (still) works." If you think you're afraid of China having 5G, just wait till the world starts copying the Russian constitution for rigorously defined and well-balanced laws regarding digital rights and integrity. What truths do we hold evident then?

If we can just get these things right, then our world will flourish with the truly free use of our computing technologies to advance us. If we can treat each other with dignity and fairness, we can fly to the stars.

Let’s make it so!

P.s.

The need for disclosure to the monitored subjects does not rise to the level of medical and scientific disclosure to human experiment subjects. Those pursuits tend to have higher-minded goals of universally improving human knowledge and life. Not all organizations have or need to be held to that standard.

The present demand for disclosure also does not invoke fundamental human rights, and leaves that open to argument.

What we do stake support in is the need for governance of power. Corporations and other legal entities have power over individual human beings. Those powers must be kept in check. Just as companies are required to disclose the results of credit and background checks made for the purpose of employment, advanced monitoring should also come with mandatory disclosure of its products. Clearly we have great precedent for mandatory disclosure of other sensitive and private information regarding a person. And certainly, when something is done to a person's data, he must be informed of those alterations. We should have the same for all monitoring, recording, and interventions targeting a person in the presence of material power and information disparity.

Oops, About "Judge Dear Critically"

The old saying, as I say it: "that which you hold very dear you judge very critically."

But I just watched the first two volumes of Disney Family Sing-Along, and there aren't any Chinese artists here either… and Disney is certainly, unarguably, the definitive story-telling, thought-inspiring, and (centennial in 4 years) money-making media giga-conglomerate.

I cannot count the number of times I have written about Star Trek not having any Chinese-resembling Asian people in the future.

The one Chinese character they do have has a misspelled name: her family name should be Hua, not Fa. I wonder where Disney gets its history? From Hun-speak?!

Star Trek still has another 50 years to exceed this, and I am confident it will!

The Hesitation

Suppose I want my chatbot to be conservative in its learning; what are the ways we can control that? There are dropout, weight decay, and normalization. One idea was to find a way for it to learn and then sleep for a while before learning again. If we look at the gradient of such a function, it would look like 1 + cos(x):

1 + cos(x)

This gradient is motivated by the want of a rest period: as the input passes a certain periodic magnitude, the progress of a gradient-based optimizer is gradually slowed down (but still pointed in the same direction). After the sleep, the function wants to progress fast to make up for the time spent sleeping, so the steps are bigger. A bit of manipulation of that expression produces the function with that gradient: x + sin(x).

x + sin(x)

This addition of a hesitation can be used as a foreactivation (defined in previous FAM entries). Or, in fact, the hesitation layer can be placed anywhere a dropout is normally used.

A certain amount of experimentation is required to inject a useful amount of randomization. This layer is particularly easy to instantiate after units that have known scaling, such as sigmoidal activations, softmax, batchnorm, and others. One knob in particular is the amplitude. For \alpha \in (0, 1], the hesitation layer H_\alpha(t) = t + \alpha \sin(2\pi t)/(2\pi) makes it possible to adjust the flat part of the sleep cycle. Here is \alpha = 0.5 in blue next to the original:

H_{\alpha=0.5}(t) = t + \alpha \sin(2\pi t)/(2\pi)

More work is needed to establish the precise effect this layer has on different types of optimizers. For optimizers like Adam, the added variability would normally increase the noise of the gradients passing through the layer, thereby effecting a reduction in the learning rate. For \alpha > 1 the hesitation layer becomes non-monotonic; but since Adam accumulates gradients, the gradient in the "wrong" direction will slow the progress of optimization more significantly than smaller \alpha's would, and it will not necessarily produce an unrecoverable valley of local minimum. For \alpha \in (0, 1) this layer will not introduce new univariate local minima or saddles to the optimization it is added to. With randomization, the units will sleep at different times, giving other gradient paths that are not sleeping a chance to explore their potential to improve.
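A minimal numerical sketch of the hesitation layer and its gradient bound, with a finite-difference check standing in for autograd (function names are my own; the beta parameter anticipates the learnable variant):

```python
import math

# Hesitation layer: H_a(t) = t + a*sin(2*pi*t)/(2*pi).
# Its derivative is 1 + a*cos(2*pi*t), so for a in (0, 1] the slope stays
# within [1-a, 1+a] and never goes negative: slowdowns, but no reversals.
def hesitation(t, alpha=0.5, beta=1.0):
    # beta scales the sleep-cycle frequency (the H_ab variant)
    return t + alpha * math.sin(2 * math.pi * beta * t) / (2 * math.pi * beta)

def grad(t, alpha=0.5, beta=1.0, eps=1e-6):
    # central finite difference as a stand-in for autograd
    return (hesitation(t + eps, alpha, beta) - hesitation(t - eps, alpha, beta)) / (2 * eps)

# The numerical gradient over a sweep stays in [1-alpha, 1+alpha].
slopes = [grad(t / 100.0) for t in range(-300, 300)]
print(min(slopes), max(slopes))  # ~0.5 and ~1.5 for alpha = 0.5
```

The sweep confirms the monotonicity claim for \alpha \le 1: the layer flattens periodically but never descends, so it cannot add new univariate minima of its own.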

Another direction to explore is learnable sleep cycles of the form H_{\alpha\beta}(x) = x + \alpha \sin(2\pi\beta x)/(2\pi\beta), where \alpha and \beta are either single scalars or properly sized tensors for element-wise application. Generally, the adjustment of \alpha and \beta will be towards quicker progress in the direction of the underlying gradients.

Wdyt?

Ps: one can work out the gradient of H_{\alpha}(f(x)) = f(x) + \alpha \sin(2\pi f(x))/(2\pi). Take the derivative wrt x: \partial H_{\alpha}(f(x))/\partial x = (1 + \alpha \cos(2\pi f(x))) \partial f(x)/\partial x. So you see the gradient varies between [1-\alpha, 1+\alpha] times f(x)'s gradient.