Dynamic Interpretation of Random Forest Predictions

Notes on understanding why a Random Forest makes its decisions.

Understanding Random Forests

A good, visual explanation of how Random Forests work.

Model Feature Importances

Feature importances can be taken from the Scikit-learn and Spark MLlib implementations after training.

However, these importances describe each feature's effect as a whole, based on the training dataset, i.e. we still lack visibility into why the model made an individual prediction.
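To make the idea of explaining an individual prediction concrete, here is a minimal sketch on a single hand-built regression tree. The tree, its feature names (age, income) and its node values are all made up for illustration; nothing here comes from Scikit-learn or Spark. The sketch walks one row down the tree and credits each change in the node value to the feature that was split on, so the prediction decomposes into a bias (the root value) plus per-feature contributions. A forest-level explanation would average these over all trees.

```python
# Hypothetical hand-built regression tree (an assumption for illustration).
# Each node stores "value": the mean target of the training rows that reached it.
TREE = {
    "value": 0.50, "feature": "age", "threshold": 30,
    "left": {"value": 0.20},
    "right": {
        "value": 0.70, "feature": "income", "threshold": 50_000,
        "left": {"value": 0.60},
        "right": {"value": 0.90},
    },
}


def explain(tree, row):
    """Return (bias, contributions, prediction) for a single row."""
    bias = tree["value"]
    contributions = {}
    node = tree
    while "feature" in node:  # leaves have no "feature" key
        feature = node["feature"]
        child = node["left"] if row[feature] <= node["threshold"] else node["right"]
        # Credit the change in node value to the feature used for this split.
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return bias, contributions, node["value"]


bias, contributions, prediction = explain(TREE, {"age": 42, "income": 80_000})
print("bias:", bias)
for name, delta in contributions.items():
    print(f"{name}: {delta:+.2f}")
print("prediction:", prediction)
```

The useful property is that the bias plus the contributions always adds up to the prediction, which is what gives a per-prediction view rather than a dataset-wide one.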

Different methods of Explaining 

A good overview of ways to explain a random forest model.

Visual explanation for each prediction 

This library does the job.

Git LFS: Migrate from GitHub storage to Dropbox using lfs-folderstore

GitHub charges exorbitant prices for the storage of files on LFS.
50 GB on GitHub costs $5 a month.
2000 GB on Dropbox costs $9.99 a month.
GitHub is charging 20 times more per GB!
Hence join the rebellion! Move your Git LFS files to Dropbox, and I hope this will force GitHub to lower its prices to a more reasonable level!
Instructions here are for GitHub, but they should work with no or minimal modifications for Bitbucket or any other Git repo hosting provider.
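For anyone who wants to check the "20 times" figure, here is the arithmetic as a small sketch. It uses only the two prices quoted above; the variable names are mine.

```python
# Prices as quoted above (USD per month).
github_usd_per_gb = 5.00 / 50      # 50 GB for $5
dropbox_usd_per_gb = 9.99 / 2000   # 2000 GB for $9.99

ratio = github_usd_per_gb / dropbox_usd_per_gb
print(f"GitHub:  ${github_usd_per_gb:.4f} per GB per month")
print(f"Dropbox: ${dropbox_usd_per_gb:.4f} per GB per month")
print(f"GitHub costs about {ratio:.0f}x more per GB")
```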

Install lfs-folderstore

Get lfs-folderstore binary from:
Add lfs-folderstore to path (e.g. copy to /usr//bin)

Clone the GitHub repo whose LFS objects you want to migrate to Dropbox

git clone <url>

Pull all LFS objects from GitHub repo to local repo

git lfs fetch --all origin

Configure Git to use the lfs-folderstore LFS custom transfer agent

Sample config lines to add to .git/config:
[lfs]
    standalonetransferagent = lfs-folder
    url = https://localhost
[lfs "customtransfer.lfs-folder"]
    path = lfs-folderstore
    args = E:/Dropbox/git-lfs-folderstore/ns-test-data
Commands to run to make the above changes to the Git config:
git config --add lfs.customtransfer.lfs-folder.path lfs-folderstore
git config --add lfs.customtransfer.lfs-folder.args "C:/path/to/your/dropbox/folder"
git config --add lfs.standalonetransferagent lfs-folder
git config lfs.url "https://localhost"

Push to Dropbox LFS folder

GIT_TRACE=1 git lfs push origin master --all
Check “C:/path/to/your/dropbox/folder” to verify that lfs-folderstore has copied LFS objects there. (They will be in folders with hexadecimal names like “f5”, “ee”, “1d”, etc.)
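If you would rather not eyeball the folder, a small script can count what landed there. This is only a sketch: it assumes the copied objects are files named by their 64-character SHA-256 object ID somewhere under the store folder, and the function name and the placeholder path are mine.

```python
import os
import re

# Git LFS object IDs are SHA-256 digests: 64 lowercase hex characters.
OID_PATTERN = re.compile(r"^[0-9a-f]{64}$")


def count_lfs_objects(store_root):
    """Count files under store_root whose names look like Git LFS object IDs."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(store_root):
        count += sum(1 for name in filenames if OID_PATTERN.match(name))
    return count


if __name__ == "__main__":
    # Placeholder path from the steps above; replace it with your own folder.
    print(count_lfs_objects("C:/path/to/your/dropbox/folder"))
```

A count of zero after the push means nothing was copied, so do not prune until that is sorted out.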

Prune local LFS objects

They are not needed because they have been copied to the Dropbox folder in the previous step:
git lfs prune

Verify that you can check-out past versions of LFS files

For example:
Check out a commit 10 commits behind HEAD of master:
git checkout HEAD~10
Check that the files tracked by git-lfs have changed accordingly. The following command lists the files tracked by git-lfs:
git lfs ls-files
Reset back to HEAD of master:
git checkout master

Free up your GitHub LFS storage!

Send an email or ticket to GitHub and ask for the LFS objects in your repo to be deleted.
Give them the name of your repo (e.g. username/repo-name) and make sure that ONLY the LFS objects are deleted.

How Deep is your… Neural Network? How deep should it be?

How many hidden layers? How deep should your neural network be? How large or deep can or should a fully-connected neural network be?

All good questions. Here we explore some answers.

This book’s chapter takes the cake for how large or deep a fully-connected neural network can or should be:

At present day, it looks like theoretically demonstrating (or disproving) the superiority of deep networks is far outside the ability of our mathematicians.

One way of thinking about fully connected networks is that each fully connected layer effects a transformation of the feature space in which the problem resides. The idea of transforming the representation of a problem to render it more malleable is a very old one in engineering and physics. It follows that deep learning methods are sometimes called “representation learning.”

Some lively discussion that is more practical:

An answer quotes:

Determining the Number of Hidden Layers

Number of Hidden Layers | Result
0 | Only capable of representing linear separable functions or decisions
1 | Can approximate any function that contains a continuous mapping from one finite space to another
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy

From Introduction to Neural Networks for Java (second edition) by Jeff Heaton

Another answer says:

More than 2 [Number of Hidden Layers] – Additional layers can learn complex representations (sort of automatic feature engineering) for later layers.

These nice academic folks wrote a whole paper exploring heuristics and things like genetic algorithms to find the optimal size and depth of a fully-connected neural network:

Maximum accuracy was achieved with a network with 2 hidden layers, of which the topology was found using a genetic algorithm.

I have extracted Table 2 from the paper for your viewing pleasure:

Note that for deeper topologies (i.e. more hidden layers), the variance of accuracy and gap between max and min accuracies are far larger. This implies more time and effort is needed to figure out the best training method for a deeper network.

It seems that deeper networks can achieve higher accuracy due to better representation learning. However, they are much more unstable when training, and many training iterations may be required to exceed the performance of a shallower fully-connected neural network. This implies that a system should be in place to permute or learn the hyper-parameter search space.

As for how many nodes per hidden layer, the evidence seems to point towards larger numbers and taking advantage of the dropout hyperparameter to avoid overfitting the model.
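To give a feel for what such a search space looks like, here is a minimal sketch that enumerates fully-connected topologies up to a given depth and counts the weights and biases in each. The input size, output size and candidate widths are arbitrary assumptions, and the function names are mine. A real search would train each candidate (or let something like a genetic algorithm pick which ones to train).

```python
from itertools import product


def parameter_count(n_inputs, hidden_layers, n_outputs):
    """Weights plus biases of a fully-connected network with the given layer sizes."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


def candidate_topologies(max_depth, widths):
    """Yield every tuple of hidden-layer widths with 0..max_depth hidden layers."""
    for depth in range(max_depth + 1):
        yield from product(widths, repeat=depth)


if __name__ == "__main__":
    # Assumed problem size: 10 input features, 1 output, widths of 16 or 64.
    for topology in candidate_topologies(max_depth=2, widths=(16, 64)):
        print(topology, parameter_count(10, topology, 1))
```

Even this tiny grid grows quickly with depth, which is part of why deeper networks take more effort to tune.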


For Success – Start Doing These Things NOW

Some key points I like:

  • Start working on your emotional health, now
  • Save and invest as much money as you possibly can
  • Find friends that are going somewhere in life
  • Keep reading
  • Find a workout regime that supplements your primary aim in life
    • Instead of only associating exercise with “getting fit,” think of it as a routine to make you better in all aspects of your life.

  • Don’t let your hobbies die
  • Find a mentor — and forgo short-term rewards for knowledge that will last a lifetime
  • Nurture your relationship with your significant other
    • …having someone to share the journey with, to emotionally support you along the way, isn’t going to hold you back. If anything, a life partner will make you better

Best NLP Model – Not Best for The Job?


The post above examines current state-of-the-art (SOTA) models namely:

  • ELMo
  • USE (Universal Sentence Encoder)
  • BERT
  • XLNet

It goes on to introduce different methods to evaluate those models based on the task at hand.

A little explanation of why the models are different is also given.

They did not state which version of USE was used – there are two versions:

  • Deep Averaging Network (USE-DAN)
  • Transformer (USE-T)

The former is less accurate but more performant on longer sentences.

Another thing to note is that ELMo, while contextual, is not deeply contextual, as declared by the people who created BERT. Obviously, BERT is.

Also missing from the action is OpenAI’s GPT-2; I would have liked to see it included.

There is some buzz about XLNet, but I have not read enough about it to comment, other than that it promises the ability to learn longer-term dependencies in text. However, given that the compute cost of transformer models grows quadratically with input text length, I am curious how they handled that.
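A quick sketch of what "quadratically" means here: plain self-attention compares every token with every other token, so the number of attention scores per layer and head is the square of the input length. The function name is mine and the sketch ignores everything else in the model.

```python
def attention_score_count(n_tokens):
    """Number of pairwise attention scores in one plain self-attention head."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    for n in (128, 256, 512, 1024):
        print(n, attention_score_count(n))
```

Doubling the input length quadruples the count, which is why long inputs get expensive fast.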

Other takeaways:

…without specific fine-tuning, it seems that BERT is not suited to finding similar sentences.

…USE is trained on a number of tasks but one of the main tasks is to identify the similarity between pairs of sentences. The authors note that the task was to identify “semantic textual similarity (STS) between sentence pairs scored by Pearson correlation with human judgments”. This would help explain why the USE is better at the similarity task.

Pre-trained models are your friend: Most of the models published now are capable of being fine-tuned, but you should use the pre-trained model to get a quick idea of its suitability.
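The similarity takeaway above can be sketched in a few lines: score each sentence pair by the cosine similarity of its two embeddings, then check how well those scores track human judgments using Pearson correlation. The embeddings and human scores below are made-up numbers, not output from USE or any other model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical sentence-pair embeddings and human similarity ratings (0 to 5).
    pairs = [
        ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),
        ([0.1, 0.9, 0.2], [0.7, 0.1, 0.6]),
        ([0.3, 0.3, 0.9], [0.2, 0.4, 0.8]),
    ]
    human_scores = [4.6, 1.2, 4.9]
    model_scores = [cosine_similarity(a, b) for a, b in pairs]
    print("model scores:", [round(s, 3) for s in model_scores])
    print("Pearson vs. human:", round(pearson(model_scores, human_scores), 3))
```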

Machines – More creative than humans?

Being creative or artistic has long been the sole domain of humans. What if machines are able to be as creative? What if machines get better than humans at that?

Allow me to show you some contemporary developments in Artificial Intelligence that might just challenge our assumptions in these four areas:

  • Writing
  • Art
  • Music
  • Problem-solving and Teamwork

But first, let us have a brief discourse about creativity.

What is Creativity?

Mundane dictionary definition below:

For our purposes, I focus on the “create something not seen before” and “invent a new way to solve problems” part of creativity.


Writing

In February 2019, OpenAI released GPT-2 to the public – an AI that read 8 million web pages to learn English. It has about 1.5 billion parameters (roughly, the adjustable connection weights between its artificial neurons). For a loose comparison, the human brain has about 100 billion biological neurons.

In school, our writing skills are evaluated in a number of ways, for example, the ubiquitous “Complete the story” exercise. That is to say, we get a prompt, usually in the form of a paragraph like:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

We are then expected to write the rest of the story in a way that pleases our teachers. The following is one example of such a story:

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.

Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.

While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.”

Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.

While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization,” said the scientist.

What do you think? Not too shabby eh?

As you might have guessed, that was written by GPT-2. Its writing is not perfect, but it was pretty entertaining to me. It took one paragraph (the prompt) and created a story of nine original paragraphs, all along the same theme as the prompt. Along the way, it invented names and places, and introduced the notion that unicorns are “descendants of a lost alien race” and that the way of “knowing for sure” is through DNA.


Art

One way to learn art is to view many drawings of a particular art style. Our brain then does something wonderful – it generalizes features of an art style so we can transform, mix and match the features to create original art. In other words: copy from one artwork and it is plagiarism; copy from a number of artworks and that is being creative!

The neural network (or AI) in action here is called a DCGAN, or Deep Convolutional Generative Adversarial Network.

Generative Adversarial Networks (GANs) are one of the most interesting ideas in computer science today. Two models are trained simultaneously by an adversarial process. A generator (“the artist”) learns to create images that look real, while a discriminator (“the art critic”) learns to tell real images apart from fakes.

So we feed the AI one thousand anime faces – here’s a subset of them (see all of them here):

After just half an hour of training on a humble PC, here is a subset of its output:

And there we go – original art by an AI. It is not perfect, but some of its “drawings” are pretty good. It is important to note that none of the drawings existed before and that they are not just augmentations of the drawings fed to it – the AI learned the art style and the proportions found on anime faces and “day-dreamed” the new drawings. On a more technical note, the AI was fed random noise and out came those drawings.


Music

We have seen this scene many times: someone plays a short piece (aka riff/motif) on, say, a piano, then a fellow band member says, “Hey! That sounds rad.” and proceeds to play an extension of that motif in the same spirit.

So here is the thing – there exists an AI that can do just that – play an extension to a motif played on a piano. It is called a Music Transformer with Relative Self-attention.

Here is the motif we give the AI:

And here is what it comes out with:

How does it work? Well, it’s complicated but here is a glimpse:

Generating long pieces of music is a challenging problem, as music contains structure at multiple timescales, from millisecond timings to motifs to phrases to repetition of entire sections. We present Music Transformer, an attention-based neural network that can generate music with improved long-term coherence.

And here is a peek at how the AI is doing it:

To see the self-reference, we visualized the last layer of attention weights with the arcs showing which notes in the past are informing the future.

I wish I could play the piano half as well… the AI is paying “attention” to what is played to determine what to play next – hence the self-reference/self-attention.
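For the curious, here is a minimal sketch of the "attention" idea: for each note, score every earlier note (and itself) by a dot product, then turn the scores into weights with a softmax. It is deliberately stripped down. The note vectors are made up, the vectors serve as both queries and keys, and the learned projections and relative position information that the Music Transformer uses are left out entirely.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(vectors):
    """weights[i][j] is how much position i attends to position j, for j <= i."""
    dim = len(vectors[0])
    weights = []
    for i, query in enumerate(vectors):
        # Only look at the past and the present, never the future.
        scores = [
            sum(q * k for q, k in zip(query, vectors[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights.append(softmax(scores))
    return weights


if __name__ == "__main__":
    # Hypothetical 2-dimensional "note embeddings" for a four-note motif.
    motif = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
    for i, row in enumerate(causal_attention_weights(motif)):
        print(i, [round(w, 2) for w in row])
```

In the output, the third note puts most of its weight on itself and the first note, which is the kind of "this note refers back to that one" pattern the arcs in the visualization show.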

Problem-solving and Teamwork

In typical human fashion, we have left the best for last – problem-solving and teamwork.

Presenting *drumroll* – OpenAI Five!

At OpenAI, we’ve used the multiplayer video game Dota 2 as a research platform for general-purpose AI systems. Our Dota 2 AI, called OpenAI Five, learned by playing over 10,000 years of games against itself. It demonstrated the ability to achieve expert-level performance, learn human–AI cooperation, and operate at internet scale.

Let us extract three juicy bits from the grandiose paragraph above in the following order:

  1. Dota 2
  2. Expert-level Performance
  3. Human-AI cooperation  

Dota 2

  • On average, there are around 500,000 humans around the world playing Dota 2 at any moment (With an all-time high of 1.2 million players)
  • It is the most massive and competitive e-sport – The International 9 Dota 2 tournament has a prize pool of US$30 million.
  • It is a game that has 5 players on each of the two opposing teams (for a total of 10 players per game) and requires the best teamwork to win.
  • A game between 2 teams takes 40 to 50 minutes to complete (on average). This means some longer-term strategic planning is required to win. i.e. Instead of scoring immediately, do something that seems to compromise the chance of victory early in the game because it is important for victory 10 to 30 minutes later.

Expert-level Performance

Like any proper sport, every team in Dota 2 is ranked. Win games and your Matchmaking Rating (MMR) increases; lose games and it decreases. As its name implies, MMR is handy for matching a team with another team that has a similar rating, which makes for more enjoyable games.

With the above in mind, the AI (OpenAI Five) has beaten teams in the following order:

  • Best OpenAI employee team: 2.5k MMR (46th percentile)
  • Valve (Makers of Dota 2) employee team: 2.5-4k MMR (46th-90th percentile)
  • Amateur team: 4.2k MMR (93rd percentile)
  • Semi-pro team: 5.5k MMR (99th percentile)

Blitz – a professional Dota 2 commentator said that OpenAI Five used tactics that he only learned after 8 years of playing the game.

OpenAI Five was also observed to “deviate” from the current playstyle (i.e. the optimal way of playing the game as done by the pros). This suggests that it found a better way to win games that humans have not discovered yet.

Human-AI Cooperation

OpenAI Five was scaled up to play on the Internet as a competitor or teammate and won 99.4% of the 7656 games it played. It played against 15,000 players and played cooperatively with 18,700 players.

OpenAI Five’s ability to play with humans presents a compelling vision for the future of human-AI interaction, one where AI systems collaborate and enhance the human experience. Our testers reported feeling supported by their bot teammates, that they learned from playing alongside these advanced systems, and that it was generally a fun experience overall.

Wait, what? Players felt supported and learned from playing alongside the AI? Here is more fuel for this fire:

It actually felt nice; my Viper gave his life for me at some point. He tried to help me, thinking “I’m sure she knows what she’s doing” and then obviously I didn’t. But, you know, he believed in me. I don’t get that a lot with [human] teammates. —Sheever

“…he [the AI] believed in me. I don’t get that a lot with [human] teammates”

Yep, I totally see a future where my best buddy that I play games with is an AI. Curious about how our buddies of the future work? Read on!

Over-simplification/glossing of how OpenAI Five works

OpenAI Five uses a Large-Scale Reinforcement Learning Long-Short Term Memory Network trained using Proximal Policy Optimization. See the paper here.

Large-scale means they managed to make it train on a great many computers (128,000 CPU cores; the average computer these days has 6 CPU cores).

The primary reason why so many computers are required is that it learns by playing against itself! Playing with humans to learn the game is just too slow and expensive. Not to mention the AI might pick up their bad habits. It plays the equivalent of 900 years of games every day.
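To put those two numbers in perspective, here is some back-of-the-envelope arithmetic. It uses only the figures quoted above plus one assumption of mine: a year of 365 days.

```python
cpu_cores = 128_000
cores_per_average_computer = 6
equivalent_computers = cpu_cores / cores_per_average_computer

years_of_play_per_day = 900
days_per_year = 365  # assumption: ignore leap years
speedup_over_real_time = years_of_play_per_day * days_per_year

print(f"Roughly {equivalent_computers:,.0f} average computers")
print(f"About {speedup_over_real_time:,}x faster than playing in real time")
```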

Reinforcement Learning means it learns which actions are best to take given a particular state of the game. It does so by exploring (aka messing around) and exploiting (making use of what it has learned). Rewards (e.g. scores, or kills) from taking those actions are used to determine which action is best in each state.
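The explore/exploit trade-off is easiest to see on a toy problem. The sketch below is an epsilon-greedy "bandit" with three actions and no game state at all, so it is far simpler than anything OpenAI Five does. The reward values, the noise level and the function name are all made up for illustration.

```python
import random


def epsilon_greedy(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best by mixing random exploration with exploitation."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # current guess of each action's reward
    counts = [0] * len(true_rewards)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rewards))  # explore: mess around
        else:
            # exploit: pick the action that currently looks best
            action = max(range(len(true_rewards)), key=estimates.__getitem__)
        reward = rng.gauss(true_rewards[action], 0.1)  # noisy reward
        counts[action] += 1
        # Running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = epsilon_greedy([0.1, 0.5, 0.9])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times each action was chosen:", counts)
```

Without the exploration step the learner can get stuck on the first action that happens to pay anything at all.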

Long Short-Term Memory (LSTM) is quite like what it sounds – an artificial neural network that takes note of things that happen in the short term and long term.

Proximal Policy Optimization (PPO) effectively makes the AI learn slowly, by limiting how much its behavior can change in each learning step, so that it fairly explores as many possibilities as it can before learning that a particular way is better.
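One common way PPO is written down is the clipped surrogate objective, sketched below for a single action. Here ratio is how much more (or less) likely the new policy makes the action compared to the old one, and advantage is how much better the action turned out than expected. The clip range of 0.2 is a commonly used default, not a value taken from the OpenAI Five paper, and the post above does not say which PPO variant OpenAI Five uses.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for one action: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A good action (positive advantage): the gain stops growing past ratio 1.2.
    for ratio in (1.0, 1.2, 1.5, 3.0):
        print(ratio, clipped_surrogate(ratio, advantage=1.0))
```

Because the objective stops improving once the ratio leaves the clip range, there is no incentive to change the policy drastically in a single update, which is the "learn slowly" part.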

OpenAI Five Disclaimers

OpenAI Five does not play with all Dota 2 features; it uses a custom game type made specifically for the AI. In particular, hero selection is restricted to only 17 heroes versus 117 in the official game (the game is played by controlling the selected heroes). Invisibility effects are also removed.

The captain of the team OG (which made history as the first two-time world champion team) said this:

I don’t believe in comparing OpenAI Five to human performance, since it’s like comparing the strength we have to hydraulics. Instead of looking at how inhuman and absurd its reaction time is, or how it will never get tired or make the mistakes you’ll make as a human, we looked at the patterns it showed moving around the map and allocating resources.


Needless to say, we have just scratched the surface of it all. These are interesting times. My hope is that the future is one of humans cooperating with AI to make the world a better place to live in. However, in the long run, human-AI integration might be inevitable for humans to evolve past an AI-dominated landscape.



Getting Krusader to work on OSX 10.14 (Mojave)

Krusader is a very useful file management application in the vein of the “legendary” Midnight Commander.

Since I use my MacBook on OSX a lot, I accepted the challenge to get it to work on OSX 10.14 Mojave.

There are 5 things to do here:

  1. Use MacPorts to compile Krusader (this will fail, as you will see below).
  2. Apply patches to the Krusader source files downloaded by MacPorts and get them to compile successfully.
  3. At this point, Krusader “runs” but crashes at startup, so we now have to enable the dbus daemon on OSX.
  4. Now it runs, but the interface flickers/blinks, making it quite unusable, so we use qtconfig to configure the “Interface -> Default Graphics System” to “Native”.
  5. Profit.


Install Xcode

Use the App Store to install Xcode (the version I used is 11.3.1).

Install MacPorts

Sync MacPorts

Run the command below in Terminal to get the latest index of MacPorts packages:

sudo port -v sync

Get MacPorts to install Krusader. It installs packages and downloads source code for compilation.

sudo port install krusader

As mentioned earlier, it will fail to install due to a compilation error. MacPorts will print a vague message telling you to take a look at a .log file to see what happened. Open the .log file (using your favorite text editor – I use Sublime, but the TextEdit that comes with OSX will work), scroll down to near the end and note the filenames that gave an error during compilation.

Patch Krusader Source Files

Download Patch Files

This is the ticket that has the fix to krusader not compiling on OSX 10.13 –

Below that page, you will find a link to the patches that we need:

We want to download those 4 patches. Click on each of them and, near the top of the page, right-click on Download and save the files to your computer.


If you look at the first patch file, you will find a line like this at the top:

--- krusader/Dialogs/packgui.cpp.orig   2018-08-30 09:48:14 UTC

Remember the filename you got while looking at the MacPorts install log file? We now want to use Terminal to navigate to the folder two levels above the folder that contains that file, so that the relative path in the patch file resolves correctly.

For example, if the filename path is: /some/long/path/krusader/Dialogs/packgui.cpp.orig, we want to be in the folder: /some/long/path
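If counting folders by eye feels error-prone, the same answer can be worked out from the two paths. This is just a sketch: the function name is mine, and it assumes the path in the patch header is a suffix of the failing file's full path, as in the example above.

```python
from pathlib import PurePosixPath


def patch_working_dir(failing_file, path_in_patch):
    """Folder to run patch -p0 from, so that path_in_patch resolves to failing_file."""
    depth = len(PurePosixPath(path_in_patch).parts)
    # parents[0] is the file's own folder; go up one level per path component.
    return PurePosixPath(failing_file).parents[depth - 1]


if __name__ == "__main__":
    print(patch_working_dir(
        "/some/long/path/krusader/Dialogs/packgui.cpp.orig",
        "krusader/Dialogs/packgui.cpp.orig",
    ))
```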

Move the patch files you downloaded to the folder mentioned above.

Now run the following for each of the 4 patch files:

patch -p0 < the_name_of_patch_file_you_downloaded

The above command should say something along the lines that the patch was successfully applied.

Get MacPorts to compile Krusader again


sudo port install krusader

Congrats, it should go ahead and complete compilation of Krusader and install it in /Applications/MacPorts/KDE4/

Enable dbus Daemon

In Terminal, run:

sudo launchctl load -w /Library/LaunchDaemons/org.freedesktop.dbus-system.plist
launchctl load -w /Library/LaunchAgents/org.freedesktop.dbus-session.plist

If you run Krusader now, it should start. However, you might notice the interface blinking erratically. The fix is found below.

Configure QT/KDE Default Graphics System

Run qtconfig – Double-click qtconfig found in /Applications/MacPorts/Qt4

Navigate to the Interface tab and select OpenGL for the Default Graphics System (found below the tab page). Close the app and click on Save when it asks you if you want to.

Double click in Finder: /Applications/MacPorts/KDE4/

Voila! Krusader runs in OSX 10.14 as advertised. Enjoy and stay safe.

Measuring Performance is about Measuring Relationships

What is the underlying focus of a KPI? It could be a telltale sign of the organization’s culture.

The following post is a riveting and sometimes incriminating commentary on how relationship-based goals in an organization can prevent alienation of the people most important to it, namely:

  • Employees
  • Customers
  • Partners
  • Management

A handy “test” is included for anyone interested. 🙂

…measuring performance is measuring relationships.

Here was an organization with nearly 7,000 staff but none of its 29 KPIs related to employee satisfaction, safety, turnover, productivity, or innovation. This was not a good sign for the workforce, nor did it reflect positively on the way the CEO and the executive team thought about the organization.

The Case Against PowerPoint Presentations

How about digital blackboards?

I am going to try digital blackboard presentations for a bit to see how that goes, largely because research shows that fancy PowerPoint presentations do not help the audience retain more information than “old-school” teaching methods.

Kind of like this:

Personally, I have seen my share of PowerPoint presentations that made me absorb less information than if the speaker had just said it without visuals. Not to mention that being taught with visuals and text scribbled on a board on-the-fly feels a lot more personal and interactive.

That said, PowerPoint slides have their place. My favorite communicator who uses slides is Steve Jobs. His slides are used as segues or markers, and when a digital picture or animation makes something clearer. Here is one of his best:

However, don’t just take my word for it. Thankfully there are some really smart people that have done the homework to prove the point. Here are some excerpts from research done on this topic:

Information retention from PowerPoint and traditional lectures

…use in university lectures has influenced investigations of PowerPoint’s effects on student performance (e.g., overall quiz/exam scores) in comparison to lectures based on overhead projectors, traditional lectures (e.g., “chalk-and-talk”)…

Students retained 15% less information delivered verbally by the lecturer during PowerPoint presentations, but they preferred PowerPoint presentations over traditional presentations.

Does a High Tech (Computerized, Animated, Powerpoint) Presentation Increase Retention of Material Compared to a Low Tech (Black on Clear Overheads) Presentation?

The purpose was to determine if differences in (a) subjective evaluation; (b) short-term retention of material; and (c) long-term retention of material occurred with the use of static overheads versus computerized, animated PowerPoint for a presentation to medical students.

There were no significant differences between the groups on any parameter. Conclusions: In this study, students rated both types of presentation equally and displayed no differences in short- or long-term retention of material.

Deep Blue to AlphaGo – Why Is It So Much Better?

20 years after Deep Blue defeated the World Champion at Chess, AlphaGo did the same to the World Champion at Go. What are the key changes that make it so much better?

Deep Blue

Excerpts from:

The system derived its playing strength mainly from brute force computing power. It was a massively parallel, RS/6000 SP Thin P2SC-based system with 30 nodes, with each node containing a 120 MHz P2SC microprocessor, enhanced with 480 special purpose VLSI chess chips.

To be fair, it’s not just “brute force computing”: it does alpha-beta pruning with some neat heuristics programmed by the team. “Deep Blue employed custom VLSI chips to execute the alpha-beta search algorithm in parallel, an example of GOFAI (Good Old-Fashioned Artificial Intelligence) rather than of deep learning which would come a decade later. It was a brute force approach, and one of its developers even denied that it was artificial intelligence at all.”

Here’s the ground-breaking (back in the day) paper for Deep Blue –

Excerpts from:


Humans have been studying chess openings for centuries and developed their own favorite [moves]. The grand masters helped us choose a bunch of those to program into Deep Blue.

How did Deep Blue advance from 1996 to 1997 in order to beat Kasparov?

We did a couple of things. We more or less doubled the speed of the system by creating a new generation of hardware. And then we increased the chess knowledge of the system by adding features to the chess chip that enabled it to recognize different positions and made it more aware of chess concepts. Those chips could then search through a tree of possibilities to figure out the best move in a position. Part of the improvement between ‘96 and ‘97 is we detected more patterns in a chess position and could put values on them and therefore evaluate chess positions more accurately. The 1997 version of Deep Blue searched between 100 million and 200 million positions per second, depending on the type of position. The system could search to a depth of between six and eight pairs of moves—one white, one black—to a maximum of 20 or even more pairs in some situations.

Excerpts from:

Deep Blue’s evaluation function was initially written in a generalized form, with many to-be-determined parameters (e.g. how important is a safe king position compared to a space advantage in the center, etc.). The optimal values for these parameters were then determined by the system itself, by analyzing thousands of master games. The evaluation function had been split into 8,000 parts, many of them designed for special positions. In the opening book there were over 4,000 positions and 700,000 grandmaster games. The endgame database contained many six-piece endgames and five or fewer piece positions. Before the second match, the chess knowledge of the program was fine-tuned by grandmaster Joel Benjamin. The opening library was provided by grandmasters Miguel Illescas, John Fedorowicz, and Nick de Firmian.



AlphaGo Zero

Here’s a cheat sheet (click for higher resolution image):

Courtesy of:

The paper that the cheat sheet is based on was published in Nature and is available here.

Some key assertions of the paper:

Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules.

Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) =fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass)

Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte Carlo rollouts. To achieve these results, we introduce a new reinforcement learning algorithm that incorporates lookahead search inside the training loop, resulting in rapid improve­ment and precise and stable learning.


DeepMind’s AlphaZero replaces the simulation step with an evaluation based on a neural network. –

Effectively, rather than scoring positions using hand-crafted heuristics (i.e. human gameplay experience), AlphaGo encapsulates game-playing “experience” in the neural network. This means that AlphaGo learns its own evaluation heuristic function.

The neural network:

  • Intuitively predicts the next best move based on the state of the game board.
  • Learns that intuition by playing many games with itself without human intervention.
  • Reduces the search needed per move from ~200 million positions a second for an average of 170 seconds (an average of 34 billion positions per move) to 1600 positions in ~0.4 seconds.
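The size of that reduction is easy to check with the figures from the list above (the variable names are mine):

```python
deep_blue_positions_per_second = 200_000_000
deep_blue_seconds_per_move = 170
deep_blue_positions_per_move = deep_blue_positions_per_second * deep_blue_seconds_per_move

alphago_zero_positions_per_move = 1600

reduction = deep_blue_positions_per_move / alphago_zero_positions_per_move
print(f"Deep Blue: {deep_blue_positions_per_move:,} positions per move")
print(f"Reduction factor: about {reduction:,.0f}x")
```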

AlphaGo Zero took a few days to learn its “heuristic” function from tabula rasa, in contrast to Deep Blue, which had a database of chess moves from grandmasters over the years.


Deep Blue versus Garry Kasparov – Game 6 Log as released by IBM: 

Additional Reads