Machine Teaching

So, obviously all you smart people in science and technology have built internal algorithms for keeping up with this insane progress. The progress is especially visible in peer-reviewed software and publications. More stuff of incredible quality is being produced than any person can hold in their head. If you compute the bits of input you can take in, given the physical constraints of the neurons in your eyes, ears, nose, skin, etc., you receive less information in a 24-hour period than is produced in that time. Imho.

I am looking for how people think about the totality of human knowledge, its organization, advancement and teaching. There was a decade or so when Google was set to perform this task: organize the world’s information. But, apparently, they have abandoned that for Alphabet.

Let’s see. If I were to think about it, and I will disclaim that I know of a lot of people who are better than me at knowing more and thinking effectively, but for a person of my knowledge and skill, I can imagine all scientific knowledge and software as code.

OSS and scientific publications are versioned objects in this language. Each may inherit from Publication, with Peer Review. The Publication class may contain authors, title, abstract, sections, references, and appendix; it may also have a venue (Nature, GitHub), a genre (immunology, reinforcement learning, group theory, reverse web proxy), and a date. More generically, there is a vocabulary associated with each genre of publication. Commonly used subroutines may include metrics, sampling or experimentation methodology, and methods of comparison (i.e., a function that takes in the present paper and other papers and judges whether the present paper is worthy of publication, irrespective of peer reviewers).

For example

better_rocauc(this_paper_algo1, [other_papers], [Iris, mushroom, mnist])

smaller_amortized_runtime(this_paper_algo2, [algos from other referenced papers], [imagenet])

better_false_discovery_rate(this_paper_procedure_A, [text book approaches, procedures in popular use])

best_worst_case_performance(my_algo, your_algo)
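To make the idea a bit more concrete, here is a minimal Python sketch of a Publication class and one comparison subroutine. Everything in it is an assumption made for illustration: the field names, the `results` dictionary (scores keyed by metric name, then dataset name), and the rule that "better" means strictly higher on every listed dataset. It is one possible reading of the calls above, not a definitive design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Publication:
    """A versioned publication object (field names are hypothetical)."""
    title: str
    authors: List[str]
    venue: str   # e.g. "Nature", "GitHub"
    genre: str   # e.g. "immunology", "reinforcement learning"
    date: str
    abstract: str = ""
    references: List["Publication"] = field(default_factory=list)
    # Hypothetical export: results[metric][dataset] -> reported score
    results: Dict[str, Dict[str, float]] = field(default_factory=dict)


@dataclass
class PeerReviewedPublication(Publication):
    """A Publication that also records who reviewed it."""
    reviewers: List[str] = field(default_factory=list)


def better_rocauc(this_paper: Publication,
                  other_papers: List[Publication],
                  datasets: List[str]) -> bool:
    """Return True if this_paper reports a strictly higher ROC AUC than
    every other paper on every listed dataset. Papers that report no
    score for a dataset are skipped for that dataset."""
    mine = this_paper.results.get("rocauc", {})
    for dataset in datasets:
        if dataset not in mine:
            return False  # no claim can be made without a reported score
        for other in other_papers:
            theirs = other.results.get("rocauc", {}).get(dataset)
            if theirs is not None and mine[dataset] <= theirs:
                return False
    return True
```

The comparison only reads the exported `results`, so it never needs to rerun anyone's experiments. Whether reported scores can be trusted without rerunning them is a separate question that this sketch does not address.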

Most importantly, a publication object exposes to the public:

  • Context of the publication, including references to previous publications documenting relevant or comparable knowledge.
  • Export data for future publication comparison
  • Export of functionality via API
  • Explanation of what it improves on (which other publication it beats, on which metric)
  • A description sufficient to replicate the representation of knowledge in this publication.
  • Known limitations and directions for improvement.
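The six items above could be written down as an abstract interface. The Python sketch below is only one guess at how that might look: the class name `PublicationInterface`, every method name, and the return types are assumptions of mine, and `ToyPublication` is a made-up example with placeholder contents.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple


class PublicationInterface(ABC):
    """Hypothetical public surface of a publication object."""

    @abstractmethod
    def context(self) -> List[str]:
        """References to earlier, relevant or comparable publications."""

    @abstractmethod
    def export_data(self) -> Dict[str, Any]:
        """Data that future publications can compare against."""

    @abstractmethod
    def api(self) -> Dict[str, Callable[..., Any]]:
        """Named, callable functionality contributed by the publication."""

    @abstractmethod
    def improves_on(self) -> List[Tuple[str, str]]:
        """Pairs of (publication beaten, metric it is beaten on)."""

    @abstractmethod
    def replication_steps(self) -> List[str]:
        """Steps sufficient to replicate the represented knowledge."""

    @abstractmethod
    def limitations(self) -> List[str]:
        """Known limitations and directions for improvement."""


class ToyPublication(PublicationInterface):
    """A made-up publication that fills in every part of the interface."""

    def context(self):
        return ["Hypothetical earlier paper"]

    def export_data(self):
        return {"rocauc": {"Iris": 0.98}}

    def api(self):
        return {"double": lambda x: 2 * x}

    def improves_on(self):
        return [("Hypothetical earlier paper", "rocauc")]

    def replication_steps(self):
        return ["Load Iris", "Fit the model", "Score ROC AUC"]

    def limitations(self):
        return ["Evaluated on one small dataset only"]
```

Using an abstract base class means a publication that leaves out any of the six parts cannot even be instantiated, which is a crude stand-in for a submission checklist.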

For technical publications, the API might be a new procedure for distilling gold out of stream water. For scientific publications, we have to come up with a functional or procedural representation of knowledge. Consider a new measurement of the speed of light: it’s obvious that this can be done. What about the discovery of a new planet, organism, or organ?… What about a nonterminating program? What about a new useful transcendental function? It will rely on the establishment of a vocabulary and semantics: the language of knowledge.
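For the speed-of-light case, a rough Python sketch might look like the following. The `Measurement` class, the `consistent_with` function, and the two-standard-deviation threshold are all my own assumptions for illustration, and the sample measurements are invented numbers. The only hard fact used is that the SI defines the speed of light as exactly 299,792,458 m/s.

```python
from dataclasses import dataclass

# Exact by SI definition, in metres per second.
C_SI = 299_792_458.0


@dataclass(frozen=True)
class Measurement:
    """A measured value with its uncertainty (hypothetical representation)."""
    value: float
    uncertainty: float  # one standard deviation, same unit as value
    unit: str


def consistent_with(m: Measurement, reference: float, k: float = 2.0) -> bool:
    """True if the measurement lies within k standard deviations of the reference."""
    return abs(m.value - reference) <= k * m.uncertainty


# Invented numbers, for illustration only.
careful = Measurement(value=299_792_500.0, uncertainty=100.0, unit="m/s")
sloppy = Measurement(value=299_800_000.0, uncertainty=1_000.0, unit="m/s")

print(consistent_with(careful, C_SI))  # off by 42 m/s, allowed 200 m/s
print(consistent_with(sloppy, C_SI))   # off by 7542 m/s, allowed 2000 m/s
```

A measurement fits this mold easily because it reduces to a number, an error bar, and a unit. The harder cases in the paragraph above (a new planet, a new organ) have no such obvious shape.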

But there seem to be infinitely many ways to write these things. Is Occam’s razor the principle to adopt in designing this language? I would propose a few principles for approaching the problem of solving knowledge.

  • Communication: One important principle is to restrict consideration to knowledge as communication. Knowledge as we know it may give us powers to think and act. But the knowledge that this blog post is considering is mainly the communication of knowledge. Publications in scientific journals and OSS exist mainly to teach and empower other systems or people to think and act. Admittedly, the knowledge within, its representation in my head or in that cluster of massively parallel computers, is highly important to our success. It may be of great research interest to many, but we have no real power over it unless the communication of knowledge is established. Otherwise we can only stand idly by and watch each of our separate intelligences perform separately, based on its own separate knowledge.
  • Generalization: This is a further restriction on the knowledge of interest, within knowledge as communication. Roughly speaking, generalized communication is an effective broadcast of knowledge. This is in contrast with, for example, point-to-point communication, multicast communication, and encrypted communication, where in each case the communication is either intended or guaranteed to be comprehensible only by designated parties and not by a general, unrestricted population. Dissemination of knowledge is the sole goal of the present endeavor.
  • Efficiency: Another important principle is to balance efficiency and expandability. Occam’s razor, as great as it is, suffers from short-sightedness. The modern knowledge-base designer must be conscious of the present limits of total human-computer cognition on earth. The design must admit the imminent possibility of a redesign to include new knowledge that we do not yet know and knowledge that we have not anticipated. This is the only prudent path forward, and the design must be audited frequently.
  • Verification: A second balance is the one between theoretical guarantees and empirical verifiability. Properties of the language must be empirically verified. Theoretical analysis of its limits and powers is also very important. The two sides of this balance are not mutually exclusive, and this principle is not subservient to any other.
  • Usefulness: A final principle is that solutions must be requirements-driven.

One must always ask: what’s this for? Each of these designs seems monumental to pursue, and they certainly intermix and need to be translatable into each other.

  • The language is for human consumption.
  • The language is for human production.
  • The language is for machine consumption.
  • The language is for automated scientific or programming systems.

Each language will have dialects for different genres. For example, the human consumption language may have dialects for:

  • Biological Sciences
  • Numerical Algorithms
  • Psychology
  • Theology

The dialects for machine interpretation may be:

  • Python
  • Perl
  • Ada
  • Java

One note of caution, though: these organically grown programming languages often communicate meaning to both humans and computers. Additionally, the “source code” often explicitly stipulates the internal machine representation during execution. In this regard, we must rethink language design and separate all these concerns!

These abstractions can also result in other changes, such as a new formal peer review system, machine invention, machine experimentation, and the one thing I could really use: Machine Teaching. The machine should teach me the agglomeration of human knowledge and history, all that is interesting and necessary that I want to learn before 12… next lifetime perhaps.

Okay, that’s all for now. Time to hit the books on epistemology… I have to learn everything the old way before the machines can teach it all back to me a better way.

P.s. this blog post was Made on Earth, and © 2018 FAM Blog.
