<p>Peculiar Coding Endeavours — Software engineer. Algorithm &amp; data structures nut. On my way to learning AI. Eternal student.<br />
<a href="https://www.peculiar-coding-endeavours.com/">https://www.peculiar-coding-endeavours.com/</a><br />
Tue, 02 May 2023 13:17:18 +0000</p>
<h3 id="traveling-salesman-problem">Traveling salesman problem</h3>
<p>The traveling salesman problem is one of those annoying interview questions you sometimes get hurled at you, accompanied by “How would you solve this”? I remember a time when I was a fresh hatchling straight out of college, finding myself in that exact situation. I’ll spare you the details, but I can tell you I didn’t get particularly far 😉</p>
<p>Times change, though, and over the years I grew into the type of software engineer that definitely prefers working with data structures and algorithms over the next “best” framework used to make text and images appear in a web browser. If that happens to be your cup of tea, don’t get too offended; I’m a UI-ignorant back-end kind of guy, and you will undoubtedly run circles around me when it comes to any front-end related tech.</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/tsp/salesmen.jpg" alt="salesmen" /></p>
<h3 id="some-solutions">Some solutions…</h3>
<p>Now, quite a few years later, I thought it’d be a nice moment to provide a few potential solutions, and give you a little bit of a summary regarding how to go about it. I assume you know what the <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem" target="_blank">Traveling Salesman Problem</a> entails, at least from a high level. In short, the goal is to find the Hamiltonian cycle with the lowest cost in an undirected weighted graph. It’s one of those nice annoying NP-hard problems, so there are more than enough ways to have fun with it. It has many applications, from planning and logistics to manufacturing microchips. In this article, I’ll go over 4 example implementations of possible solutions to the problem:</p>
<ul>
<li>a simple brute force algorithm, guaranteed to give you the optimal shortest distance, but obviously horribly slow</li>
<li>another guaranteed optimal solution, but quite a bit more efficient, using dynamic programming</li>
<li>an approximation of a solution using genetic algorithms</li>
<li>another approximation, using ant colony optimization</li>
</ul>
<p>I’ve been interested in genetic algorithms and ant colony optimization for a while now, and the traveling salesman problem is a nice use-case to apply these algorithms to. In case you are not familiar with one or both of them, you can find more information about <a href="https://en.wikipedia.org/wiki/Genetic_algorithm">Genetic algorithms</a> and <a href="https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms">Ant Colony Optimization</a> all over the internet. Or you can ask ChatGPT, apparently that’s quite the trend lately. There are several other ways to go about finding solutions, like simulated annealing, n-opt and several of its variations, but I wanted to stick to what triggered my curiosity the most.</p>
<h3 id="the-project">The project</h3>
<p>The full code of the project is available on my <a href="https://github.com/tomvanschaijk/travelingsalesman" target="_blank">GitHub</a>. The requirements to run it are pretty basic. It’s all Python code, and I mainly use PyGame, Numpy, Asyncio and Aiostream. Just install the requirements in requirements.txt and you’ll be set. You can preview the end result right here:</p>
<div class="embed-container">
<iframe width="640" height="390" src="https://www.youtube.com/embed/XCZSwM--vCA" frameborder="0" allowfullscreen=""></iframe>
</div>
<style>
.embed-container {
position: relative;
padding-bottom: 56.25%;
height: 0;
overflow: hidden;
max-width: 100%;
}
.embed-container iframe,
.embed-container object,
.embed-container embed {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
</style>
<p>In fact, give that a quick gander, keep it open in your browser to get back to every now and then, and the rest of the article will quickly become clear. Obviously, cloning it yourself and running it will help your understanding even more.</p>
<p>The PyGame window that pops up when running shows you 5 panes. The top center one is where you can left-click to add points. These points will be the destinations our supposed salesman will have to visit. The first point you create will be green, and will be the starting point and end destination. All other points will be blue: the cities to be visited before getting back to the starting point. Hitting the spacebar resets the screen; Enter starts the algorithms. The distance between two points is simply the Euclidean distance in pixels.</p>
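<p>The distances dictionary handed to each solver can be precomputed along these lines; this is a sketch, not the project’s actual code, and the helper name is made up:</p>

```python
import math

def euclidean_distances(points: list[tuple[int, int]]) -> dict[tuple[int, int], int]:
    # Hypothetical helper: pairwise pixel distances between numbered points,
    # rounded to whole pixels, keyed on (from_index, to_index)
    return {(i, j): round(math.dist(points[i], points[j]))
            for i in range(len(points))
            for j in range(len(points)) if i != j}

distances = euclidean_distances([(0, 0), (3, 4), (3, 0)])
assert distances[(0, 1)] == 5   # the classic 3-4-5 triangle
assert distances[(1, 2)] == 4
```

Since the graph is undirected, the dictionary is symmetric: `distances[(i, j)] == distances[(j, i)]` for every pair.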
<p>Since the brute force and dynamic programming solutions are inherently slow for anything more than a few points, you’ll notice there’s a cut-off point beyond which those algorithms are no longer taken along for the ride. It would simply take too much time, as the time complexity of both explodes when the number of points gets too high. Dynamic programming will be used for a bit longer than brute force, but once you get into the double digits in terms of destination count, neither will be executed anymore.</p>
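<p>That cut-off amounts to a simple node-count check before scheduling the exact algorithms. A sketch with hypothetical threshold values (the actual constants in the repo may differ):</p>

```python
# Hypothetical cut-off constants; the real values in the project may differ
BRUTE_FORCE_MAX_NODES = 8
DYNAMIC_PROGRAMMING_MAX_NODES = 13

def algorithms_for(node_count: int) -> list[str]:
    # The approximation algorithms always run; the exact ones only
    # while their runtime remains bearable
    algorithms = ["genetic", "ant_colony"]
    if node_count <= DYNAMIC_PROGRAMMING_MAX_NODES:
        algorithms.append("dynamic_programming")
    if node_count <= BRUTE_FORCE_MAX_NODES:
        algorithms.append("brute_force")
    return algorithms

assert "brute_force" in algorithms_for(6)
assert "brute_force" not in algorithms_for(10)
```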
<p>In case you do follow along in the code and check out the GitHub project, there aren’t a lot of files of interest:</p>
<ul>
<li>main.py: nothing you wouldn’t expect. The only code of slight interest there is how to run the 4 algorithms concurrently. All the rest is setup for PyGame, running the main “game loop”</li>
<li>graph.py: contains an implementation of an undirected weighted graph, slightly tailored to the current problem at hand</li>
<li>the folder /solvers contains the implementations of the 4 algorithms. The rest of the article will mostly focus on those</li>
</ul>
<h3 id="processing-the-results">Processing the results</h3>
<p>As stated earlier, the main.py file doesn’t contain much regarding the actual problem. However, maybe one little thing to touch on is how the 4 algorithms are executed concurrently, and results are processed as they come in.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">algorithms</span> <span class="o">=</span> <span class="p">[</span><span class="n">algorithm</span> <span class="k">for</span> <span class="n">algorithm</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__algorithms</span>
<span class="k">if</span> <span class="n">algorithm</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">]</span>
<span class="n">zipped</span> <span class="o">=</span> <span class="n">stream</span><span class="p">.</span><span class="n">ziplatest</span><span class="p">(</span><span class="o">*</span><span class="n">algorithms</span><span class="p">)</span>
<span class="n">merged</span> <span class="o">=</span> <span class="n">stream</span><span class="p">.</span><span class="nb">map</span><span class="p">(</span><span class="n">zipped</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
<span class="k">async</span> <span class="k">with</span> <span class="n">merged</span><span class="p">.</span><span class="n">stream</span><span class="p">()</span> <span class="k">as</span> <span class="n">streamer</span><span class="p">:</span>
<span class="k">async</span> <span class="k">for</span> <span class="n">resultset</span> <span class="ow">in</span> <span class="n">streamer</span><span class="p">:</span>
</code></pre></div></div>
<p>To explain what goes on there, we’ll consider the following simple example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">from</span> <span class="nn">aiostream</span> <span class="kn">import</span> <span class="n">stream</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">process_results_interleaved</span><span class="p">(</span><span class="n">races</span><span class="p">):</span>
    <span class="n">combine</span> <span class="o">=</span> <span class="n">stream</span><span class="p">.</span><span class="n">merge</span><span class="p">(</span><span class="o">*</span><span class="n">races</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">combine</span><span class="p">.</span><span class="n">stream</span><span class="p">()</span> <span class="k">as</span> <span class="n">streamer</span><span class="p">:</span>
        <span class="k">async</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">streamer</span><span class="p">:</span>
            <span class="k">print</span><span class="p">(</span><span class="n">item</span><span class="p">)</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">process_results_packet</span><span class="p">(</span><span class="n">races</span><span class="p">):</span>
    <span class="n">zipped</span> <span class="o">=</span> <span class="n">stream</span><span class="p">.</span><span class="n">ziplatest</span><span class="p">(</span><span class="o">*</span><span class="n">races</span><span class="p">)</span>
    <span class="n">merged</span> <span class="o">=</span> <span class="n">stream</span><span class="p">.</span><span class="nb">map</span><span class="p">(</span><span class="n">zipped</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">merged</span><span class="p">.</span><span class="n">stream</span><span class="p">()</span> <span class="k">as</span> <span class="n">streamer</span><span class="p">:</span>
        <span class="k">async</span> <span class="k">for</span> <span class="n">resultset</span> <span class="ow">in</span> <span class="n">streamer</span><span class="p">:</span>
            <span class="k">print</span><span class="p">(</span><span class="n">resultset</span><span class="p">)</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">race</span><span class="p">(</span><span class="n">racer</span><span class="p">,</span> <span class="n">sleep_time</span><span class="p">,</span> <span class="n">checkpoints</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">checkpoints</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
        <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">sleep_time</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">checkpoints</span><span class="p">:</span>
            <span class="k">yield</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">racer</span><span class="si">}</span><span class="s"> finished!"</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">yield</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">racer</span><span class="si">}</span><span class="s"> hits checkpoint: </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">"</span>

<span class="k">def</span> <span class="nf">create_races</span><span class="p">(</span><span class="n">checkpoints</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">race</span><span class="p">(</span><span class="s">"Turtle"</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">checkpoints</span><span class="p">),</span>
            <span class="n">race</span><span class="p">(</span><span class="s">"Hare"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">checkpoints</span><span class="p">),</span>
            <span class="n">race</span><span class="p">(</span><span class="s">"Dragster"</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="n">checkpoints</span><span class="p">)]</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">checkpoints</span> <span class="o">=</span> <span class="mi">10</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Starting race, processing results as they come in from each contestant:"</span><span class="p">)</span>
    <span class="n">races</span> <span class="o">=</span> <span class="n">create_races</span><span class="p">(</span><span class="n">checkpoints</span><span class="p">)</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">process_results_interleaved</span><span class="p">(</span><span class="n">races</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Starting race, processing results in packets:"</span><span class="p">)</span>
    <span class="n">races</span> <span class="o">=</span> <span class="n">create_races</span><span class="p">(</span><span class="n">checkpoints</span><span class="p">)</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">process_results_packet</span><span class="p">(</span><span class="n">races</span><span class="p">))</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>
<p>Just go over the code real quick, and it’ll become pretty clear what the setup is. We’ll have a race between a turtle, a hare and a fancy dragster car. They go through a number of checkpoints, and moving from one to the next simply takes them some amount of time, clumsily simulated by sleeping for an amount of time in the loop where they go through the checkpoints. The turtle takes 2 seconds to go from a checkpoint to the next, the hare takes 1 second, and the racing car only takes 300ms. Easy.</p>
<p>The whole reason behind this little fantasy is to show you how to process the results of this contest as each contestant hits a checkpoint. I first show an update every time a racer hits a checkpoint, in process_results_interleaved. Then, in process_results_packet, you’ll see that you get the full set of results whenever any of the racers hits a checkpoint. The output looks like this:</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/tsp/race_results.png" alt="race_results" /></p>
<p>The whole point of this is that I wanted a way to combine the results of several concurrently running asynchronous generators as they come in and, more importantly, to identify exactly which result comes from which generator, without putting code in each specific generator to help with that identification. I wanted to keep the implementation of each algorithm focused on what it’s supposed to do, not tailor it to the fact that I want to run several at the same time and compare them. That’s not a concern the algorithm needs to care about.</p>
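<p>Concretely, stream.ziplatest keeps the most recent value of each generator in a fixed slot of a tuple, and wrapping that tuple in dict(enumerate(...)) turns the slot positions into stable integer keys. A minimal sketch of that mapping step (the values here are made up for illustration):</p>

```python
# ziplatest yields one slot per source stream, filled with None until
# that stream has produced its first value
latest = ("brute force: 812px", None, "genetic: 790px")

# dict(enumerate(...)) turns the slot positions into stable integer keys,
# so slot 1 always belongs to the same generator
results = dict(enumerate(latest))
assert results == {0: "brute force: 812px", 1: None, 2: "genetic: 790px"}
```

Because the key is just the generator’s position in the input list, no generator ever has to know it is being tracked.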
<p>When you run the 4 different algorithms that will look for a solution to the TSP, you want to receive updates whenever an improvement is reached, or whenever something useful can be put on the screen, without polluting the generators with identifying code to update the right result set. So the approach from process_results_packet is the one used for running the 4 TSP algorithms together. I just put the 4 algorithms in a list, start them all at the same time, and every time any one of them yields a result, I get a dictionary back with the updated results. It’s these few lines that do the trick:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">zipped</span> <span class="o">=</span> <span class="n">stream</span><span class="p">.</span><span class="n">ziplatest</span><span class="p">(</span><span class="o">*</span><span class="n">races</span><span class="p">)</span>
<span class="n">merged</span> <span class="o">=</span> <span class="n">stream</span><span class="p">.</span><span class="nb">map</span><span class="p">(</span><span class="n">zipped</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
<span class="k">async</span> <span class="k">with</span> <span class="n">merged</span><span class="p">.</span><span class="n">stream</span><span class="p">()</span> <span class="k">as</span> <span class="n">streamer</span><span class="p">:</span>
    <span class="p">...</span>
</code></pre></div></div>
<p>Quite simple and quite powerful. Ok, let’s get to the interesting parts now!</p>
<h3 id="a-brute-force-approach">A brute force approach</h3>
<p>Whenever you start implementing a non-trivial algorithm, it’s always a good idea to begin with an implementation that is focused on getting the correct result, disregarding any tendency you might have to go for early optimizations. Make it as simple and clear as you can, but focus on achieving a 100% correct solution. Explore the full set of possible solutions in an exhaustive search, and simply keep updating the candidate solution every time there’s an improvement; once there are no more candidates left, you have found the optimal solution. It will help you understand the problem, and you can use this initial iteration as a reference to compare future solutions to, and double-check them for correctness.</p>
<p>In the case of the TSP, implementing a brute-force solution is quite easy. We simply generate all possible permutations of the set of destinations, evaluate the full path distance from the start through all nodes and back to the initial node, and store the optimal path length and the sequence of nodes traveled to achieve it. Contrary to what I just said a sentence or two ago, I’d say some small optimizations are permitted. One obvious one is that a sequence A - B - C - A is identical in length to the sequence A - C - B - A, so there is no need to evaluate both; we only consider unique permutations. We also don’t need to continuously rebuild the actual graph for the algorithm to work. However, I wanted to display the paths under evaluation as the algorithm does its work, so that little bit of extra effort is worth it for this project.</p>
<p>If you check the <a href="https://youtu.be/XCZSwM--vCA" target="_blank">clip</a> I linked before, you will see that during each iteration, a new path permutation (the lines in white) is being evaluated. Whenever a path is shorter than the known shortest path, that path is updated and shown in green. If you want to follow along with a bit more detail, just change the sleeping time at the end of the for-loop to something that gives you a bit more time to realize what’s occurring.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">brute_force</span><span class="p">(</span><span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">AsyncIterator</span><span class="p">[</span><span class="n">AlgorithmResult</span><span class="p">]:</span>
    <span class="s">"""Solve the TSP problem with a brute force implementation, running through all permutations"""</span>
    <span class="n">unique_permutations</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
    <span class="n">paths_evaluated</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">evaluations_until_solved</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">start</span> <span class="o">=</span> <span class="n">graph</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">key</span>
    <span class="n">keys</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">graph</span><span class="p">.</span><span class="n">nodes</span><span class="p">.</span><span class="n">keys</span><span class="p">())</span>
    <span class="n">sub_keys</span> <span class="o">=</span> <span class="n">keys</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
    <span class="k">for</span> <span class="n">permutation</span> <span class="ow">in</span> <span class="n">permutations</span><span class="p">(</span><span class="n">sub_keys</span><span class="p">):</span>
        <span class="n">permutation</span> <span class="o">=</span> <span class="p">(</span><span class="n">start</span><span class="p">,)</span> <span class="o">+</span> <span class="n">permutation</span> <span class="o">+</span> <span class="p">(</span><span class="n">start</span><span class="p">,)</span>
        <span class="k">if</span> <span class="n">permutation</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">unique_permutations</span> <span class="ow">and</span> <span class="n">permutation</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">unique_permutations</span><span class="p">:</span>
            <span class="n">paths_evaluated</span> <span class="o">+=</span> <span class="mi">1</span>
            <span class="n">unique_permutations</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span>
            <span class="n">current_path_length</span> <span class="o">=</span> <span class="mi">0</span>
            <span class="n">node</span> <span class="o">=</span> <span class="n">start</span>
            <span class="n">vertices</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
            <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">permutation</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
                <span class="n">distance</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">node</span><span class="p">,</span> <span class="n">key</span><span class="p">)]</span>
                <span class="n">current_path_length</span> <span class="o">+=</span> <span class="n">distance</span>
                <span class="n">vertices</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">node</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">distance</span><span class="p">))</span>
                <span class="n">node</span> <span class="o">=</span> <span class="n">key</span>
            <span class="n">graph</span><span class="p">.</span><span class="n">remove_vertices</span><span class="p">()</span>
            <span class="n">graph</span><span class="p">.</span><span class="n">add_vertices</span><span class="p">(</span><span class="n">vertices</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">current_path_length</span> <span class="o">&lt;</span> <span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle_length</span><span class="p">:</span>
                <span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">ShortestPath</span><span class="p">(</span><span class="n">current_path_length</span><span class="p">,</span> <span class="n">vertices</span><span class="p">)</span>
                <span class="n">evaluations_until_solved</span> <span class="o">=</span> <span class="n">paths_evaluated</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.0001</span><span class="p">)</span>
            <span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">paths_evaluated</span><span class="p">,</span> <span class="n">evaluations_until_solved</span><span class="p">)</span>
    <span class="n">graph</span><span class="p">.</span><span class="n">remove_vertices</span><span class="p">()</span>
    <span class="n">graph</span><span class="p">.</span><span class="n">add_vertices</span><span class="p">(</span><span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle</span><span class="p">.</span><span class="n">vertices</span><span class="p">)</span>
    <span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">paths_evaluated</span><span class="p">,</span> <span class="n">evaluations_until_solved</span><span class="p">)</span>
</code></pre></div></div>
<p>And there you have it. Problem solved! Well, kind of. For an ideal path between, let’s say, 5 or so points, this brute force approach is still ok. You’ll notice, though, that when you have 8 or more points, things become very painful. Since we check every possible ordering of the destinations, the runtime of the brute force algorithm grows as O(n!). Quite horrific, but at least we now have a way to test other, more optimal solutions for correctness.</p>
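<p>To put some numbers on that: with the start node fixed and a tour counted together with its reverse (as the code above does), the number of unique cycles to evaluate is (n−1)!/2, which explodes quickly. A quick sketch:</p>

```python
from math import factorial

def unique_tours(n: int) -> int:
    # Closed tours with a fixed start node, counting a tour and its
    # reverse (A-B-C-A vs A-C-B-A) as one
    return factorial(n - 1) // 2

assert unique_tours(5) == 12
assert unique_tours(8) == 2_520
assert unique_tours(11) == 1_814_400  # already far too many to animate
```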
<h3 id="dynamic-programming-to-the-rescue">Dynamic programming to the rescue</h3>
<p>That brings us to an approach we’ve all used as a first optimization for so many algorithmic problems: dynamic programming. It brings the time complexity down from O(n!) to O(n²·2ⁿ). If you’re one of the more curious readers who couldn’t help but scroll down instead of reading this, you’ll have noticed that this requires quite a bit more code than our naive brute force approach. That already hammers home the earlier statement about not diving into optimization too early: it’s easy to make a small mistake, and depending on how involved a solution you implement beyond brute force tactics, it can be quite cumbersome to debug.</p>
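<p>A back-of-the-envelope comparison makes clear why O(n²·2ⁿ) beats O(n!) so decisively, even though both eventually explode. A small sketch of the rough operation counts:</p>

```python
from math import factorial

def brute_force_ops(n: int) -> int:
    # All orderings of n nodes: n!
    return factorial(n)

def dp_ops(n: int) -> int:
    # Held-Karp style dynamic programming: n^2 * 2^n
    return n * n * 2 ** n

assert brute_force_ops(10) == 3_628_800
assert dp_ops(10) == 102_400
# At 20 nodes the factorial bound is over 5 billion times larger
assert brute_force_ops(20) // dp_ops(20) > 5_000_000_000
```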
<p>The specific implementation I went for is one such example. As is often the case with dynamic programming, memoization is a big help here. But not only that: you’ll also notice some bit-wise operations going on to further improve the speed of this solution. I will explain the steps we go through in this approach as best I can.</p>
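<p>Those bit-wise tricks boil down to representing a subset of nodes as an integer bitmask: bit i is set exactly when node i is in the subset. A sketch of what the helpers in the code below rely on (my implementation of is_not_in is an assumption based on how it is used, not the project’s exact code):</p>

```python
def is_not_in(node: int, subset: int) -> bool:
    # Bit `node` of the mask is 1 exactly when that node is in the subset
    return subset & (1 << node) == 0

subset = 0b1101          # a subset containing nodes 0, 2 and 3
assert is_not_in(1, subset)
assert not is_not_in(3, subset)

# XOR with a set bit removes that node from the subset, which is how
# subcycle_without_next_node is derived in the code below
assert subset ^ (1 << 2) == 0b1001
```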
<p>The overall approach here is that we will no longer consider all possible full paths through the set of nodes until we find the shortest one. As we do in dynamic programming, we solve a subset of the problem and keep expanding on it until we find a solution to the full problem. During this expansion, we use previously achieved results to avoid doing double work, using memoization. Here’s the full code (for all you copy-paste problem solvers out there 😉), read through it, and then I’ll focus on some specific parts. Just in case dynamic programming is a new concept to you, check out some articles or video tutorials online, maybe comparing greedy algorithms to dynamic programming for problems like the knapsack problem or coin change problem. That should help you get it down. The full code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">dynamic_programming</span><span class="p">(</span><span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">]</span>
                              <span class="p">)</span> <span class="o">-&gt;</span> <span class="n">AsyncIterator</span><span class="p">[</span><span class="n">AlgorithmResult</span><span class="p">]:</span>
    <span class="s">"""Solve the TSP problem with dynamic programming"""</span>
    <span class="n">node_count</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">graph</span><span class="p">)</span>
    <span class="n">start</span> <span class="o">=</span> <span class="n">graph</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">key</span>
    <span class="n">memo</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span> <span class="o">&lt;&lt;</span> <span class="n">node_count</span><span class="p">)]</span> <span class="k">for</span> <span class="n">__</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">)]</span>
    <span class="n">optimal_cycle</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">maxsize</span>
    <span class="n">cycles_evaluated</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">evaluations_until_solved</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">memo</span> <span class="o">=</span> <span class="n">setup</span><span class="p">(</span><span class="n">memo</span><span class="p">,</span> <span class="n">graph</span><span class="p">,</span> <span class="n">distances</span><span class="p">,</span> <span class="n">start</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">nodes_in_subcycle</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">node_count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">subcycle</span> <span class="ow">in</span> <span class="n">initialize_combinations</span><span class="p">(</span><span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">node_count</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">is_not_in</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">):</span>
                <span class="k">continue</span>
            <span class="n">cycles_evaluated</span> <span class="o">+=</span> <span class="mi">1</span>
            <span class="c1"># Look for the best next node to attach to the cycle</span>
            <span class="k">for</span> <span class="n">next_node</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
                <span class="k">if</span> <span class="n">next_node</span> <span class="o">==</span> <span class="n">start</span> <span class="ow">or</span> <span class="n">is_not_in</span><span class="p">(</span><span class="n">next_node</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">):</span>
                    <span class="k">continue</span>
                <span class="n">subcycle_without_next_node</span> <span class="o">=</span> <span class="n">subcycle</span> <span class="o">^</span> <span class="p">(</span><span class="mi">1</span> <span class="o">&lt;&lt;</span> <span class="n">next_node</span><span class="p">)</span>
                <span class="n">min_cycle_length</span> <span class="o">=</span> <span class="n">maxsize</span>
                <span class="k">for</span> <span class="n">last_node</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
                    <span class="k">if</span> <span class="p">(</span><span class="n">last_node</span> <span class="o">==</span> <span class="n">start</span> <span class="ow">or</span> <span class="n">last_node</span> <span class="o">==</span> <span class="n">next_node</span> <span class="ow">or</span> <span class="n">is_not_in</span><span class="p">(</span><span class="n">last_node</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">)):</span>
                        <span class="k">continue</span>
                    <span class="n">new_cycle_length</span> <span class="o">=</span> <span class="p">(</span><span class="n">memo</span><span class="p">[</span><span class="n">last_node</span><span class="p">][</span><span class="n">subcycle_without_next_node</span><span class="p">]</span> <span class="o">+</span> <span class="n">distances</span><span class="p">[(</span><span class="n">last_node</span><span class="p">,</span> <span class="n">next_node</span><span class="p">)])</span>
<span class="k">if</span> <span class="n">new_cycle_length</span> <span class="o"><</span> <span class="n">min_cycle_length</span><span class="p">:</span>
<span class="n">min_cycle_length</span> <span class="o">=</span> <span class="n">new_cycle_length</span>
<span class="n">memo</span><span class="p">[</span><span class="n">next_node</span><span class="p">][</span><span class="n">subcycle</span><span class="p">]</span> <span class="o">=</span> <span class="n">min_cycle_length</span>
<span class="n">evaluations_until_solved</span> <span class="o">=</span> <span class="n">cycles_evaluated</span>
<span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">calculate_optimal_cycle_length</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">memo</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">find_optimal_cycle</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">memo</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">vertices</span> <span class="o">=</span> <span class="n">create_vertices</span><span class="p">(</span><span class="n">optimal_cycle</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">ShortestPath</span><span class="p">(</span><span class="n">optimal_cycle_length</span><span class="p">,</span> <span class="n">vertices</span><span class="p">)</span>
<span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.0001</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">cycles_evaluated</span><span class="p">,</span> <span class="n">evaluations_until_solved</span><span class="p">)</span>
<span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">calculate_optimal_cycle_length</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">memo</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">find_optimal_cycle</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">memo</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">vertices</span> <span class="o">=</span> <span class="n">create_vertices</span><span class="p">(</span><span class="n">optimal_cycle</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="n">remove_vertices</span><span class="p">()</span>
<span class="n">graph</span><span class="p">.</span><span class="n">add_vertices</span><span class="p">(</span><span class="n">vertices</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">ShortestPath</span><span class="p">(</span><span class="n">optimal_cycle_length</span><span class="p">,</span> <span class="n">vertices</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">cycles_evaluated</span><span class="p">,</span> <span class="n">evaluations_until_solved</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">setup</span><span class="p">(</span><span class="n">memo</span><span class="p">:</span> <span class="nb">list</span><span class="p">,</span> <span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">],</span> <span class="n">start</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">:</span>
<span class="s">"""Prepare the array used for memoization during the dynamic programming algorithm"""</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">node</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">graph</span><span class="p">):</span>
<span class="k">if</span> <span class="n">start</span> <span class="o">==</span> <span class="n">node</span><span class="p">.</span><span class="n">key</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">memo</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">start</span> <span class="o">|</span> <span class="mi">1</span> <span class="o"><<</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">start</span><span class="p">,</span> <span class="n">i</span><span class="p">)]</span>
<span class="k">return</span> <span class="n">memo</span>
<span class="k">def</span> <span class="nf">initialize_combinations</span><span class="p">(</span><span class="n">nodes_in_subcycle</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Initialize the combinations to consider in the next step of the algorithm"""</span>
<span class="n">subcycle_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">initialize_combination</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">subcycle_list</span><span class="p">)</span>
<span class="k">return</span> <span class="n">subcycle_list</span>
<span class="k">def</span> <span class="nf">initialize_combination</span><span class="p">(</span><span class="n">subcycle</span><span class="p">,</span> <span class="n">at</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">subcycle_list</span><span class="p">)</span> <span class="o">-></span> <span class="bp">None</span><span class="p">:</span>
<span class="s">"""Initialize the combination to consider in the next step of the algorithm"""</span>
<span class="n">elements_left_to_pick</span> <span class="o">=</span> <span class="n">node_count</span> <span class="o">-</span> <span class="n">at</span>
<span class="k">if</span> <span class="n">elements_left_to_pick</span> <span class="o"><</span> <span class="n">nodes_in_subcycle</span><span class="p">:</span>
<span class="k">return</span>
<span class="k">if</span> <span class="n">nodes_in_subcycle</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">subcycle_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">subcycle</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">at</span><span class="p">,</span> <span class="n">node_count</span><span class="p">):</span>
<span class="n">subcycle</span> <span class="o">|=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="n">i</span>
<span class="n">initialize_combination</span><span class="p">(</span><span class="n">subcycle</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span>
<span class="n">node_count</span><span class="p">,</span> <span class="n">subcycle_list</span><span class="p">)</span>
<span class="n">subcycle</span> <span class="o">&=</span> <span class="o">~</span><span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">i</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">is_not_in</span><span class="p">(</span><span class="n">index</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
<span class="s">"""Checks if the bit at the given index is a 0"""</span>
<span class="k">return</span> <span class="p">((</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">index</span><span class="p">)</span> <span class="o">&</span> <span class="n">subcycle</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span>
<span class="k">def</span> <span class="nf">calculate_optimal_cycle_length</span><span class="p">(</span><span class="n">start</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">memo</span><span class="p">:</span> <span class="nb">list</span><span class="p">,</span>
<span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
<span class="s">"""Calculate the optimal cycle length"""</span>
<span class="n">end</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">node_count</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">maxsize</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">start</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">cycle_cost</span> <span class="o">=</span> <span class="n">memo</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">end</span><span class="p">]</span> <span class="o">+</span> <span class="n">distances</span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">start</span><span class="p">)]</span>
<span class="k">if</span> <span class="n">cycle_cost</span> <span class="o"><</span> <span class="n">optimal_cycle_length</span><span class="p">:</span>
<span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">cycle_cost</span>
<span class="k">return</span> <span class="n">optimal_cycle_length</span>
<span class="k">def</span> <span class="nf">find_optimal_cycle</span><span class="p">(</span><span class="n">start</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">memo</span><span class="p">:</span> <span class="nb">list</span><span class="p">,</span>
<span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">]):</span>
<span class="s">"""Recreate the optimal cycle"""</span>
<span class="n">last_index</span> <span class="o">=</span> <span class="n">start</span>
<span class="n">state</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">node_count</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">optimal_cycle</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">index</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">j</span> <span class="o">==</span> <span class="n">start</span> <span class="ow">or</span> <span class="n">is_not_in</span><span class="p">(</span><span class="n">j</span><span class="p">,</span> <span class="n">state</span><span class="p">):</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">index</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="n">index</span> <span class="o">=</span> <span class="n">j</span>
<span class="n">prev_cycle_length</span> <span class="o">=</span> <span class="n">memo</span><span class="p">[</span><span class="n">index</span><span class="p">][</span><span class="n">state</span><span class="p">]</span> <span class="o">+</span> <span class="n">distances</span><span class="p">[(</span><span class="n">index</span><span class="p">,</span> <span class="n">last_index</span><span class="p">)]</span>
<span class="n">new_cycle_length</span> <span class="o">=</span> <span class="n">memo</span><span class="p">[</span><span class="n">j</span><span class="p">][</span><span class="n">state</span><span class="p">]</span> <span class="o">+</span> <span class="n">distances</span><span class="p">[(</span><span class="n">j</span><span class="p">,</span> <span class="n">last_index</span><span class="p">)]</span>
<span class="k">if</span> <span class="n">new_cycle_length</span> <span class="o"><</span> <span class="n">prev_cycle_length</span><span class="p">:</span>
<span class="n">index</span> <span class="o">=</span> <span class="n">j</span>
<span class="n">optimal_cycle</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
<span class="n">state</span> <span class="o">=</span> <span class="n">state</span> <span class="o">^</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">index</span><span class="p">)</span>
<span class="n">last_index</span> <span class="o">=</span> <span class="n">index</span>
<span class="n">optimal_cycle</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">start</span><span class="p">)</span>
<span class="n">optimal_cycle</span><span class="p">.</span><span class="n">reverse</span><span class="p">()</span>
<span class="n">optimal_cycle</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">start</span><span class="p">)</span>
<span class="k">return</span> <span class="n">optimal_cycle</span>
<span class="k">def</span> <span class="nf">create_vertices</span><span class="p">(</span><span class="n">optimal_cycle</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">]</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Transform the list of visited node keys to something our graph can work with"""</span>
<span class="n">vertices</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">optimal_cycle</span><span class="p">)):</span>
<span class="n">weight</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">optimal_cycle</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">],</span> <span class="n">optimal_cycle</span><span class="p">[</span><span class="n">i</span><span class="p">])]</span>
<span class="n">vertices</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">optimal_cycle</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">],</span> <span class="n">optimal_cycle</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">weight</span><span class="p">))</span>
<span class="k">return</span> <span class="n">vertices</span>
</code></pre></div></div>
<p>Again, in case you want to understand the steps at a high level, change the sleep time to about a second or so, and you will see what happens. Starting with a cycle of 2 nodes and building up to a cycle including all the points, we find the shortest possible path through the points included in the cycle so far. Whenever an optimal solution is found for n points, we extend it to an optimal solution for those n points plus 1 more, reusing what we have already computed. We keep doing that until the cycle spans the full set of nodes. It’s really quite beautiful to see in action, in my opinion. How to achieve it, though?</p>
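<p>To make the idea concrete, here is a minimal, self-contained sketch of the same Held-Karp scheme, stripped of the visualization machinery. The 4-node distance matrix is a made-up example, not data from this article, and the function is a simplified stand-in for the full implementation above.</p>

```python
from itertools import combinations

def held_karp(dist: list[list[int]], start: int = 0) -> int:
    """Toy Held-Karp: length of the shortest Hamiltonian cycle."""
    n = len(dist)
    # memo[(bits, last)] = shortest path over the subset `bits`, ending at `last`
    memo = {}
    for k in range(n):
        if k != start:  # base case: the two-node path start -> k
            memo[(1 << start | 1 << k, k)] = dist[start][k]
    for size in range(3, n + 1):  # grow the subcycle one node at a time
        for subset in combinations(range(n), size):
            if start not in subset:
                continue
            bits = 0
            for node in subset:
                bits |= 1 << node
            for last in subset:
                if last == start:
                    continue
                prev = bits ^ (1 << last)  # same subset, without `last`
                memo[(bits, last)] = min(
                    memo[(prev, j)] + dist[j][last]
                    for j in subset if j not in (start, last))
    full = (1 << n) - 1  # finally, close the cycle back to the start node
    return min(memo[(full, k)] + dist[k][start]
               for k in range(n) if k != start)

distances = [[0, 10, 15, 20],
             [10, 0, 35, 25],
             [15, 35, 0, 30],
             [20, 25, 30, 0]]
print(held_karp(distances))  # 80 (e.g. 0 -> 1 -> 3 -> 2 -> 0)
```

<p>Note how each pass over <code>size</code> only reads memo entries written for subsets one node smaller: that is the “reuse what we have already learned” step in a nutshell.</p>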
<h4 id="getting-the-memoization-matrix-set-up">Getting the memoization matrix set up</h4>
<p>We start out by preparing our memoization matrix:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">memo</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">node_count</span><span class="p">)]</span> <span class="k">for</span> <span class="n">__</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">)]</span>
<span class="p">...</span>
<span class="k">def</span> <span class="nf">setup</span><span class="p">(</span><span class="n">memo</span><span class="p">:</span> <span class="nb">list</span><span class="p">,</span> <span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">],</span> <span class="n">start</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">node</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">graph</span><span class="p">):</span>
<span class="k">if</span> <span class="n">start</span> <span class="o">==</span> <span class="n">node</span><span class="p">.</span><span class="n">key</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">memo</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">start</span> <span class="o">|</span> <span class="mi">1</span> <span class="o"><<</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">start</span><span class="p">,</span> <span class="n">i</span><span class="p">)]</span>
<span class="k">return</span> <span class="n">memo</span>
</code></pre></div></div>
<p>Here, I create a matrix with a number of rows equal to the number of nodes and a number of columns equal to 2 raised to the power of the number of nodes, one column for every possible subset of nodes encoded as a bitmask. So for 5 nodes, that’s a 5x32 matrix. As you may know, raising 2 to the power of N equals shifting a 1-bit to the left N times, since each shift to the left doubles the binary value; bit shifting is simply faster. We then loop through each node, skipping the start node, and store the distance from the start node to every other node: these two-node paths are the base cases the rest of the algorithm builds on.</p>
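<p>A quick way to see the base case in isolation (the distances dict here is a hypothetical 4-node example, keyed by (from, to) pairs like the one in the article):</p>

```python
node_count, start = 4, 0
# Hypothetical symmetric distances keyed by (from, to) tuples
distances = {(i, j): abs(i - j) * 10
             for i in range(node_count) for j in range(node_count) if i != j}

# node_count rows, 2**node_count columns: one column per subset bitmask
memo = [[0 for _ in range(1 << node_count)] for _ in range(node_count)]
for i in range(node_count):
    if i == start:
        continue
    # The subset {start, i} is encoded as the bitmask 1 << start | 1 << i
    memo[i][1 << start | 1 << i] = distances[(start, i)]

print(memo[2][0b0101])  # path 0 -> 2 over subset {0, 2}: 20
```

<p>Every other cell stays 0 for now; the main loop fills them in, one subset size at a time.</p>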
<h4 id="other-helper-functions">Other helper functions</h4>
<p>Besides the preparation of the memoization matrix, other functions such as find_optimal_cycle and create_vertices are actually not terribly complicated or interesting. If we keep in mind that, during dynamic programming, we build on top of earlier results (optimal paths through n nodes) to get the optimal result for n+1 nodes, it’s not hard to imagine we need to consider a new set of cycles and nodes to evaluate whether adding a certain node to our cycle yields a more optimal result than adding another. Beyond that, in the visualization of the brute force algorithm, you noticed we kept on drawing the vertices under consideration. For the dynamic programming solution, we simply want to build the new optimal cycle and put it on the screen, so we have some functions for that. Stepping through them with a small number of nodes will make it very clear what they do. Be sure to brush off those bit manipulation operations first 😉</p>
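<p>Speaking of brushing off bit manipulation: these are the exact operations the listing relies on, demonstrated on a small made-up bitmask.</p>

```python
subcycle = 0b0110  # nodes 1 and 2 are in the subcycle

def is_not_in(index: int, subcycle: int) -> bool:
    """Same check as in the listing: is the bit at `index` a 0?"""
    return ((1 << index) & subcycle) == 0

assert is_not_in(0, subcycle)        # node 0 is absent
assert not is_not_in(1, subcycle)    # node 1 is present

subcycle |= 1 << 3                   # add node 3     -> 0b1110
subcycle &= ~(1 << 1)                # remove node 1  -> 0b1100
without_2 = subcycle ^ (1 << 2)      # toggle node 2 out, as the main loop
assert without_2 == 0b1000           # does with subcycle ^ (1 << next_node)
```
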
<h4 id="just-one-more-thing">Just one more thing</h4>
<p>Maybe a bit of explanation on the initialize_combinations and initialize_combination methods: initialize_combinations fills the subcycle_list variable using the initialize_combination method, that much is obvious. initialize_combination is a recursive method to generate bit sets, starting from an empty set (0). From that empty set, we want to set nodes_in_subcycle out of node_count bits to 1, for every possible combination. We keep track of which index position we’re at (indicated by the variable i), try setting it to 1 and keep moving forward. At the end of that, we should have exactly nodes_in_subcycle bits set. If we don’t, we backtrack, flip off (not like that) the i-th bit and move to the next position. This is a classic backtracking problem, and if you want to learn more about it, look for backtracking tutorials on power sets.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">initialize_combinations</span><span class="p">(</span><span class="n">nodes_in_subcycle</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Initialize the combinations to consider in the next step of the algorithm"""</span>
<span class="n">subcycle_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">initialize_combination</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">subcycle_list</span><span class="p">)</span>
<span class="k">return</span> <span class="n">subcycle_list</span>
<span class="k">def</span> <span class="nf">initialize_combination</span><span class="p">(</span><span class="n">subcycle</span><span class="p">,</span> <span class="n">at</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">subcycle_list</span><span class="p">)</span> <span class="o">-></span> <span class="bp">None</span><span class="p">:</span>
<span class="s">"""Initialize the combination to consider in the next step of the algorithm"""</span>
<span class="n">elements_left_to_pick</span> <span class="o">=</span> <span class="n">node_count</span> <span class="o">-</span> <span class="n">at</span>
<span class="k">if</span> <span class="n">elements_left_to_pick</span> <span class="o"><</span> <span class="n">nodes_in_subcycle</span><span class="p">:</span>
<span class="k">return</span>
<span class="k">if</span> <span class="n">nodes_in_subcycle</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">subcycle_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">subcycle</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">at</span><span class="p">,</span> <span class="n">node_count</span><span class="p">):</span>
<span class="n">subcycle</span> <span class="o">|=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="n">i</span>
<span class="n">initialize_combination</span><span class="p">(</span><span class="n">subcycle</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span>
<span class="n">node_count</span><span class="p">,</span> <span class="n">subcycle_list</span><span class="p">)</span>
<span class="n">subcycle</span> <span class="o">&=</span> <span class="o">~</span><span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
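<p>As a quick sanity check, here are the same two functions in plain form, called with a hypothetical 3-node graph so the generated bitmasks are easy to read:</p>

```python
def initialize_combinations(nodes_in_subcycle: int, node_count: int) -> list[int]:
    subcycle_list: list[int] = []
    initialize_combination(0, 0, nodes_in_subcycle, node_count, subcycle_list)
    return subcycle_list

def initialize_combination(subcycle, at, nodes_in_subcycle,
                           node_count, subcycle_list) -> None:
    if node_count - at < nodes_in_subcycle:
        return  # not enough positions left to complete this combination
    if nodes_in_subcycle == 0:
        subcycle_list.append(subcycle)
        return
    for i in range(at, node_count):
        subcycle |= 1 << i                      # try including node i
        initialize_combination(subcycle, i + 1, nodes_in_subcycle - 1,
                               node_count, subcycle_list)
        subcycle &= ~(1 << i)                   # backtrack: exclude node i again

# All ways to pick 2 nodes out of 3, as bitmasks:
print([bin(s) for s in initialize_combinations(2, 3)])  # ['0b11', '0b101', '0b110']
```

<p>Those three masks are exactly the subsets {0,1}, {0,2} and {1,2}; only the ones containing the start node survive the is_not_in check in the main loop.</p>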
<h4 id="solving-the-puzzle">Solving the puzzle</h4>
<p>This is where the actual work happens, and there’s a bit to it. Make sure the other functions are clear to you, and that you understand the general use of a memoization matrix in dynamic programming.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">for</span> <span class="n">nodes_in_subcycle</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">node_count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
<span class="k">for</span> <span class="n">subcycle</span> <span class="ow">in</span> <span class="n">initialize_combinations</span><span class="p">(</span><span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">is_not_in</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">):</span>
<span class="k">continue</span>
<span class="n">cycles_evaluated</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="c1"># Look for the best next node to attach to the cycle
</span> <span class="k">for</span> <span class="n">next_node</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">next_node</span> <span class="o">==</span> <span class="n">start</span> <span class="ow">or</span> <span class="n">is_not_in</span><span class="p">(</span><span class="n">next_node</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">):</span>
<span class="k">continue</span>
<span class="n">subcycle_without_next_node</span> <span class="o">=</span> <span class="n">subcycle</span> <span class="o">^</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">next_node</span><span class="p">)</span>
<span class="n">min_cycle_length</span> <span class="o">=</span> <span class="n">maxsize</span>
<span class="k">for</span> <span class="n">last_node</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="p">(</span><span class="n">last_node</span> <span class="o">==</span> <span class="n">start</span> <span class="ow">or</span> <span class="n">last_node</span> <span class="o">==</span> <span class="n">next_node</span> <span class="ow">or</span> <span class="n">is_not_in</span><span class="p">(</span><span class="n">last_node</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">)):</span>
<span class="k">continue</span>
<span class="n">new_cycle_length</span> <span class="o">=</span> <span class="p">(</span><span class="n">memo</span><span class="p">[</span><span class="n">last_node</span><span class="p">][</span><span class="n">subcycle_without_next_node</span><span class="p">]</span> <span class="o">+</span> <span class="n">distances</span><span class="p">[(</span><span class="n">last_node</span><span class="p">,</span> <span class="n">next_node</span><span class="p">)])</span>
<span class="k">if</span> <span class="n">new_cycle_length</span> <span class="o"><</span> <span class="n">min_cycle_length</span><span class="p">:</span>
<span class="n">min_cycle_length</span> <span class="o">=</span> <span class="n">new_cycle_length</span>
<span class="n">memo</span><span class="p">[</span><span class="n">next_node</span><span class="p">][</span><span class="n">subcycle</span><span class="p">]</span> <span class="o">=</span> <span class="n">min_cycle_length</span>
<span class="n">evaluations_until_solved</span> <span class="o">=</span> <span class="n">cycles_evaluated</span>
<span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">calculate_optimal_cycle_length</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">memo</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">find_optimal_cycle</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">nodes_in_subcycle</span><span class="p">,</span> <span class="n">memo</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">vertices</span> <span class="o">=</span> <span class="n">create_vertices</span><span class="p">(</span><span class="n">optimal_cycle</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">ShortestPath</span><span class="p">(</span><span class="n">optimal_cycle_length</span><span class="p">,</span> <span class="n">vertices</span><span class="p">)</span>
<span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.0001</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">cycles_evaluated</span><span class="p">,</span> <span class="n">evaluations_until_solved</span><span class="p">)</span>
</code></pre></div></div>
<p>As you can see, the outer loop increases the length of the cycle by 1 with each iteration. In every iteration, we find the optimal cycle of a certain length, growing it until we have an optimal cycle spanning all nodes. The first inner loop goes over all the distinct subcycles we want to consider. Obviously, we only care about subcycles that actually contain the start node; any other subcycle could not have started there. The function that checks this is called is_not_in (I know it’s a one-line function… in my defense, there’s an npm package called “is-even”. So hold your horses 😀):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">is_not_in</span><span class="p">(</span><span class="n">index</span><span class="p">,</span> <span class="n">subcycle</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
<span class="s">"""Checks if the bit at the given index is a 0"""</span>
<span class="k">return</span> <span class="p">((</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">index</span><span class="p">)</span> <span class="o">&</span> <span class="n">subcycle</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span>
</code></pre></div></div>
<p>All this does is check whether the bit at the index position in the subcycle is a 0. If it is, the node is not part of the subcycle.</p>
<p>After that, we loop over all the potential next nodes. Here, we only want to consider nodes that are actually in the subcycle and are not equal to our starting node. We then form the subcycle with this next node removed (by toggling its bit), so we can use our memoization matrix to look up the best partial tour length without the next node included. The last inner loop, which cycles the last_node variable over the range of possible nodes, tests all possible end nodes of the currently considered subcycle and determines which one best optimizes it. This last node can, of course, be neither the start node nor the next node fixed in the loop above, and it must be part of the current subcycle.</p>
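<p>In isolation, the bit tricks look like this (a tiny standalone illustration using the same conventions as the code above):</p>

```python
# Subsets of nodes are encoded as bitmasks: bit i is set iff node i is in the subcycle.
subcycle = 0b1011  # nodes 0, 1 and 3

def is_not_in(index, subcycle) -> bool:
    """True when the bit at the given index is 0, i.e. the node is absent."""
    return ((1 << index) & subcycle) == 0

print(is_not_in(2, subcycle))  # True: node 2 is missing
print(is_not_in(3, subcycle))  # False: node 3 is present

# XOR against a single set bit toggles exactly that bit,
# removing node 3 from the subset here:
subcycle_without_3 = subcycle ^ (1 << 3)
print(bin(subcycle_without_3))  # 0b11: only nodes 0 and 1 remain
```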
<p>Per potential new last node, we check if the cost of the cycle with this last node is better than the currently known lowest cost (or shortest distance in our case). If so, we set the new min_cycle_length and store the best subcycle in the memoization matrix.</p>
<p>At the end of each new subcycle length, we do some bookkeeping and a little bit of work to reconstruct the actual newly found shortest cycle of length nodes_in_subcycle and yield that back so it can be displayed on screen. The functions calculate_optimal_cycle_length and find_optimal_cycle are responsible for this. If you are familiar with dynamic programming, reconstructing a solution from a memoization table will be familiar to you. The only bit of added complexity here is that this reconstruction is done using our bitmasks in the memoization table.</p>
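<p>To see the reconstruction idea end to end, here is a self-contained miniature Held-Karp solver (my own sketch, not the post’s code; this memo layout stores the predecessor alongside each cost, which makes walking the table backwards trivial):</p>

```python
from itertools import combinations

def held_karp(dist: list[list[int]]) -> tuple[int, list[int]]:
    """Tiny Held-Karp with tour reconstruction (illustration only)."""
    n = len(dist)
    # memo[(mask, last)] = (cost, prev): cheapest path from node 0 that
    # visits exactly the nodes in `mask` and ends at `last`.
    memo = {(1 << k, k): (dist[0][k], 0) for k in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << k for k in subset)
            for last in subset:
                prev_mask = mask ^ (1 << last)
                memo[(mask, last)] = min(
                    (memo[(prev_mask, j)][0] + dist[j][last], j)
                    for j in subset if j != last)
    full = (1 << n) - 2  # every node except the start
    cost, last = min((memo[(full, k)][0] + dist[k][0], k) for k in range(1, n))
    # Walk the stored predecessors backwards to rebuild the tour.
    path, mask = [], full
    while last != 0:
        path.append(last)
        mask, last = mask ^ (1 << last), memo[(mask, last)][1]
    return cost, [0] + path[::-1] + [0]

distances = [[0, 10, 15, 20],
             [10, 0, 35, 25],
             [15, 35, 0, 30],
             [20, 25, 30, 0]]
print(held_karp(distances))  # (80, [0, 2, 3, 1, 0])
```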
<p>And that does it. If this is not entirely clear, I recommend you make sure the basics of dynamic programming and bitwise operators are under your belt, and simply step through the code using a graph with 5 or so nodes. That allows you to still reason about the whole concept and actually follow along with what is happening.</p>
<h4 id="beware-python-looping">Beware Python looping</h4>
<p>A final word of warning if you play around with this yourself: there’s quite a bit of looping going on in this implementation. Normally, that would be no issue. Python, however, is notoriously slow when it comes to looping. In my previous post about the <a href="https://www.peculiar-coding-endeavours.com/2023/game-of-life/" target="_blank">Game of Life</a>, I tackled that problem by using Numpy, Numba, and some search space optimizations. I decided not to take that route this time, since I didn’t want to make the implementation even more complex, and I also wanted a fair comparison of all 4 algorithms without any trickery going into it. I will probably make a Rust 🦀 version at some point, likely without too many visuals going on (unless I get good at Bevy real fast), so the raw performance potential of the various algorithms is more obvious. For now, though, bear with me (or Python, rather). I don’t execute the brute force and dynamic programming algorithms when the number of nodes exceeds 8 and 17, respectively, so things won’t be overly painful.</p>
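<p>If you want to feel that looping overhead for yourself, a quick stdlib-only comparison makes the point; on CPython an explicit Python-level loop is typically several times slower than the equivalent built-in, whose loop runs in C:</p>

```python
import timeit

def loop_sum(n: int) -> int:
    """Sum 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total

n = 1_000_000
t_loop = timeit.timeit(lambda: loop_sum(n), number=10)
t_builtin = timeit.timeit(lambda: sum(range(n)), number=10)  # loop happens in C
print(f"explicit loop: {t_loop:.3f}s, builtin sum: {t_builtin:.3f}s")
```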
<h3 id="genetic-algorithm">Genetic algorithm</h3>
<p>Finally, this is where the real fun starts… Now that we have two ways of solving the TSP that give us a guaranteed optimal result, even though it might take a while to get there, we’ll dig into some approximation algorithms. First up: genetic algorithms. You will see that the code isn’t terribly complex at all, which allows me to focus on the high-level concepts of these interesting algorithms and apply them to this problem. If the dynamic programming section was a bit of a head-scratcher, no worries; it’s all fun and games from here on out.</p>
<h4 id="genetic-evolution">Genetic? Evolution?</h4>
<p>First and foremost, what are genetic algorithms? They are an optimization technique that uses concepts from evolutionary biology to search for a globally optimal solution. As you would expect when drawing the analogy to evolution, we start with an initial population. We crossbreed certain entities from the population with each other to obtain a new generation. There are various techniques we can apply to this process, and as you would expect, strong specimens will be more likely to breed, and some random mutations will also occur. As such, it really mimics the process of evolutionary biology: non-random survival, non-random selection, and random mutation. The survival of a certain specimen (or the spreading of its ‘genes’) is steered by a fitness function that measures how well it optimizes for its purpose (in our case, finding the shortest route between a set of nodes). Stronger specimens have a higher chance of propagating their DNA through the generations, as they are more likely to be chosen for reproduction. To make sure there is some sense of discovery built into the whole system, random mutation is an important part of the process.</p>
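<p>The implementation below leans on two small helpers worth picturing up front: one that measures each tour, and the fitness function itself. Here is a minimal sketch of what they might look like (my own guess at their shape; the post’s actual get_cycle_lengths and determine_fitness may differ, e.g. by normalizing the fitness values):</p>

```python
def get_cycle_lengths(population: list[list[int]],
                      distances: dict[tuple[int, int], int]) -> list[int]:
    """Total length of each tour in the population."""
    return [sum(distances[(a, b)] for a, b in zip(cycle, cycle[1:]))
            for cycle in population]

def determine_fitness(cycle_lengths: list[int]) -> list[float]:
    """Shorter tours score higher; here fitness is simply 1/length."""
    return [1.0 / length for length in cycle_lengths]

# Two tours over a triangle with sides 2, 3 and 5:
distances = {(0, 1): 2, (1, 0): 2, (1, 2): 3, (2, 1): 3, (0, 2): 5, (2, 0): 5}
population = [[0, 1, 2, 0], [0, 2, 1, 0]]
print(get_cycle_lengths(population, distances))  # [10, 10]
```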
<p>When I first read about these algorithms, I got incredibly excited to learn about them and take a swing at them myself, and I’m still amazed at how well the whole concept works. You will see that close-to-optimal results can be achieved very quickly, both in terms of generations needed and in terms of processing time. Each step is fairly simple, very visual, and easy to implement. Most of the difficulty regarding the efficacy of these algorithms lies in the details and the many parameters. How big is my population? How many generations will I evolve before I stop and accept the result as close enough to optimal? What techniques will I use to steer the crossbreeding? How will I introduce genetic mutations in new specimens? A lot of these questions are often a matter of experimentation. Since I’m still very much a beginner with these algorithms, I’d encourage you to start your own learning journey if all of this piques your interest. For now, let me show you the implementation I came up with after experimenting with all of the above:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">genetic_algorithm</span><span class="p">(</span><span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">],</span> <span class="n">population_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="n">max_generations</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">max_no_improvement</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">AsyncIterator</span><span class="p">[</span><span class="n">AlgorithmResult</span><span class="p">]:</span>
<span class="s">"""Solve the TSP problem with a genetic algorithm"""</span>
<span class="n">generations_evaluated</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">generations_until_solved</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">generations_without_improvement</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">optimal_cycle</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">maxsize</span>
<span class="n">population</span> <span class="o">=</span> <span class="n">spawn</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">population_size</span><span class="p">)</span>
<span class="n">cycle_lengths</span> <span class="o">=</span> <span class="n">get_cycle_lengths</span><span class="p">(</span><span class="n">population</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_generations</span><span class="p">):</span>
<span class="k">if</span> <span class="n">generations_without_improvement</span> <span class="o">>=</span> <span class="n">max_no_improvement</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">fitness</span> <span class="o">=</span> <span class="n">determine_fitness</span><span class="p">(</span><span class="n">cycle_lengths</span><span class="p">)</span>
<span class="n">improved</span> <span class="o">=</span> <span class="bp">False</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">cycle_length</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">cycle_lengths</span><span class="p">):</span>
<span class="k">if</span> <span class="n">cycle_length</span> <span class="o"><</span> <span class="n">optimal_cycle_length</span><span class="p">:</span>
<span class="n">improved</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">optimal_cycle_length</span> <span class="o">=</span> <span class="n">cycle_length</span>
<span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">population</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">generations_evaluated</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">improved</span><span class="p">:</span>
<span class="n">generations_until_solved</span> <span class="o">=</span> <span class="n">generations_evaluated</span>
<span class="n">generations_without_improvement</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">vertices</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">optimal_cycle</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">optimal_cycle</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
<span class="n">distance</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">node</span><span class="p">,</span> <span class="n">key</span><span class="p">)]</span>
<span class="n">vertices</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">node</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">distance</span><span class="p">))</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">key</span>
<span class="n">graph</span><span class="p">.</span><span class="n">remove_vertices</span><span class="p">()</span>
<span class="n">graph</span><span class="p">.</span><span class="n">add_vertices</span><span class="p">(</span><span class="n">vertices</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">ShortestPath</span><span class="p">(</span><span class="n">optimal_cycle_length</span><span class="p">,</span> <span class="n">vertices</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">generations_without_improvement</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">population</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">population</span><span class="p">,</span> <span class="n">cycle_lengths</span> <span class="o">=</span> <span class="n">create_next_population</span><span class="p">(</span><span class="n">population</span><span class="p">,</span> <span class="n">cycle_lengths</span><span class="p">,</span>
<span class="n">fitness</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.0001</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">generations_evaluated</span><span class="p">,</span> <span class="n">generations_until_solved</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">generations_evaluated</span><span class="p">,</span> <span class="n">generations_until_solved</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">spawn</span><span class="p">(</span><span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">population_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Create the initial generation"""</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">graph</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">key</span>
<span class="n">keys</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">graph</span><span class="p">.</span><span class="n">nodes</span><span class="p">.</span><span class="n">keys</span><span class="p">())[</span><span class="mi">1</span><span class="p">:]</span>
<span class="n">max_size</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">factorial</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">keys</span><span class="p">))</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">unique_permutations</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">unique_permutations</span><span class="p">)</span> <span class="o"><</span> <span class="n">population_size</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">unique_permutations</span><span class="p">)</span> <span class="o"><</span> <span class="n">max_size</span><span class="p">:</span>
<span class="n">permutation</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">keys</span><span class="p">)</span>
<span class="n">shuffle</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span>
<span class="n">permutation</span> <span class="o">=</span> <span class="p">(</span><span class="n">start</span><span class="p">,)</span> <span class="o">+</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">start</span><span class="p">,)</span>
<span class="k">if</span> <span class="n">permutation</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">unique_permutations</span><span class="p">:</span>
<span class="n">unique_permutations</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span>
<span class="k">return</span> <span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span> <span class="k">for</span> <span class="n">permutation</span> <span class="ow">in</span> <span class="n">unique_permutations</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">create_next_population</span><span class="p">(</span><span class="n">current_population</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]],</span> <span class="n">cycle_lengths</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span>
<span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">]</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]],</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Create the next generation"""</span>
<span class="n">new_population</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">population_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">current_population</span><span class="p">)</span>
<span class="c1"># Create the offspring of the current generation
</span> <span class="n">offspring</span> <span class="o">=</span> <span class="n">create_offspring</span><span class="p">(</span><span class="n">current_population</span><span class="p">,</span> <span class="n">fitness</span><span class="p">,</span> <span class="n">population_size</span><span class="p">)</span>
<span class="c1"># Perform a variation of elitism where we add the offspring to the current generation
</span> <span class="c1"># and only continue with the fittest list of size population_size
</span> <span class="n">new_population</span> <span class="o">=</span> <span class="n">current_population</span> <span class="o">+</span> <span class="n">offspring</span>
<span class="n">offspring_cycle_lengths</span> <span class="o">=</span> <span class="n">get_cycle_lengths</span><span class="p">(</span><span class="n">offspring</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">new_population_cycle_lengths</span> <span class="o">=</span> <span class="n">cycle_lengths</span> <span class="o">+</span> <span class="n">offspring_cycle_lengths</span>
<span class="n">new_population_fitness</span> <span class="o">=</span> <span class="n">fitness</span> <span class="o">+</span> <span class="n">determine_fitness</span><span class="p">(</span><span class="n">offspring_cycle_lengths</span><span class="p">)</span>
<span class="n">survivor_candidates</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="n">new_population_fitness</span><span class="p">,</span> <span class="n">new_population</span><span class="p">)</span>
<span class="n">fittest_indices</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">heapq</span><span class="p">.</span><span class="n">nlargest</span><span class="p">(</span><span class="n">population_size</span><span class="p">,</span> <span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">survivor_candidates</span><span class="p">)))]</span>
<span class="n">new_population</span> <span class="o">=</span> <span class="p">[</span><span class="n">new_population</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">fittest_indices</span><span class="p">]</span>
<span class="n">new_population_cycle_lengths</span> <span class="o">=</span> <span class="p">[</span><span class="n">new_population_cycle_lengths</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">fittest_indices</span><span class="p">]</span>
<span class="k">return</span> <span class="n">new_population</span><span class="p">,</span> <span class="n">new_population_cycle_lengths</span>
<span class="k">def</span> <span class="nf">create_offspring</span><span class="p">(</span><span class="n">current_population</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]],</span> <span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span>
<span class="n">population_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Create a new generation"""</span>
<span class="n">offspring</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">offspring</span><span class="p">)</span> <span class="o"><</span> <span class="n">population_size</span><span class="p">:</span>
<span class="n">parent1</span> <span class="o">=</span> <span class="n">parent2</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">parent1</span> <span class="o">==</span> <span class="n">parent2</span><span class="p">:</span>
<span class="n">parent1</span> <span class="o">=</span> <span class="n">get_parent</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="n">parent2</span> <span class="o">=</span> <span class="n">get_parent</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="n">child1</span> <span class="o">=</span> <span class="n">crossover</span><span class="p">(</span><span class="n">current_population</span><span class="p">[</span><span class="n">parent1</span><span class="p">],</span> <span class="n">current_population</span><span class="p">[</span><span class="n">parent2</span><span class="p">])</span>
<span class="n">child2</span> <span class="o">=</span> <span class="n">crossover</span><span class="p">(</span><span class="n">current_population</span><span class="p">[</span><span class="n">parent2</span><span class="p">],</span> <span class="n">current_population</span><span class="p">[</span><span class="n">parent1</span><span class="p">])</span>
<span class="n">child1</span> <span class="o">=</span> <span class="n">mutate</span><span class="p">(</span><span class="n">child1</span><span class="p">)</span>
<span class="n">child2</span> <span class="o">=</span> <span class="n">mutate</span><span class="p">(</span><span class="n">child2</span><span class="p">)</span>
<span class="n">offspring</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">child1</span><span class="p">)</span>
<span class="n">offspring</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">child2</span><span class="p">)</span>
<span class="k">return</span> <span class="n">offspring</span>
<span class="k">def</span> <span class="nf">get_parent</span><span class="p">(</span><span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">]):</span>
<span class="s">"""Get a parent using either tournament selection or biased random selection"""</span>
<span class="k">if</span> <span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span>
<span class="k">return</span> <span class="n">tournament_selection</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="k">return</span> <span class="n">biased_random_selection</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">tournament_selection</span><span class="p">(</span><span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
<span class="s">"""Perform basic tournament selection to get a parent"""</span>
<span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">candidate1</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">)</span>
<span class="n">candidate2</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">)</span>
<span class="k">while</span> <span class="n">candidate1</span> <span class="o">==</span> <span class="n">candidate2</span><span class="p">:</span>
<span class="n">candidate2</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">)</span>
<span class="k">return</span> <span class="n">candidate1</span> <span class="k">if</span> <span class="n">fitness</span><span class="p">[</span><span class="n">candidate1</span><span class="p">]</span> <span class="o">></span> <span class="n">fitness</span><span class="p">[</span><span class="n">candidate2</span><span class="p">]</span> <span class="k">else</span> <span class="n">candidate2</span>
<span class="k">def</span> <span class="nf">biased_random_selection</span><span class="p">(</span><span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
<span class="s">"""Perform biased random selection to get a parent"""</span>
<span class="n">random_specimen</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">fitness</span><span class="p">):</span>
<span class="k">if</span> <span class="n">fitness</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">>=</span> <span class="n">fitness</span><span class="p">[</span><span class="n">random_specimen</span><span class="p">]:</span>
<span class="k">return</span> <span class="n">i</span>
<span class="k">return</span> <span class="n">random_specimen</span>
<span class="k">def</span> <span class="nf">crossover</span><span class="p">(</span><span class="n">parent1</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">parent2</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Cross-breed a new set of children from the given parents"""</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">end</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">parent1</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">parent2</span> <span class="o">=</span> <span class="n">parent2</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">parent2</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">split</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">child</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">split</span><span class="p">):</span>
<span class="n">child</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">remainder</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">parent2</span> <span class="k">if</span> <span class="n">i</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">child</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">data</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">remainder</span><span class="p">):</span>
<span class="n">child</span><span class="p">[</span><span class="n">split</span> <span class="o">+</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">data</span>
<span class="k">return</span> <span class="p">[</span><span class="n">start</span><span class="p">,</span> <span class="o">*</span><span class="n">child</span><span class="p">,</span> <span class="n">end</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">mutate</span><span class="p">(</span><span class="n">child</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Mutate the child sequence"""</span>
<span class="k">if</span> <span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">child</span> <span class="o">=</span> <span class="n">swap_mutate</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>
<span class="n">child</span> <span class="o">=</span> <span class="n">rotate_mutate</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>
<span class="k">return</span> <span class="n">child</span>
<span class="k">def</span> <span class="nf">swap_mutate</span><span class="p">(</span><span class="n">child</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Mutate the cycle by swapping 2 nodes"""</span>
<span class="n">index1</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">index2</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">child</span><span class="p">[</span><span class="n">index1</span><span class="p">],</span> <span class="n">child</span><span class="p">[</span><span class="n">index2</span><span class="p">]</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="n">index2</span><span class="p">],</span> <span class="n">child</span><span class="p">[</span><span class="n">index1</span><span class="p">]</span>
<span class="k">return</span> <span class="n">child</span>
<span class="k">def</span> <span class="nf">rotate_mutate</span><span class="p">(</span><span class="n">child</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Mutate the cycle by rotating a part nodes"""</span>
<span class="n">split</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">head</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">split</span><span class="p">]</span>
<span class="n">mid</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="n">split</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">][::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">tail</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:]</span>
<span class="n">child</span> <span class="o">=</span> <span class="n">head</span> <span class="o">+</span> <span class="n">mid</span> <span class="o">+</span> <span class="n">tail</span>
<span class="k">return</span> <span class="n">child</span>
<span class="k">def</span> <span class="nf">get_cycle_lengths</span><span class="p">(</span><span class="n">population</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]],</span>
<span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Get the lengths of all cycles in the graph"""</span>
<span class="n">cycle_lengths</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">specimen</span> <span class="ow">in</span> <span class="n">population</span><span class="p">:</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">specimen</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">cycle_length</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">specimen</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
<span class="n">key</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="n">cycle_length</span> <span class="o">+=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">node</span><span class="p">,</span> <span class="n">key</span><span class="p">)]</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">key</span>
<span class="n">cycle_lengths</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">cycle_length</span><span class="p">)</span>
<span class="k">return</span> <span class="n">cycle_lengths</span>
<span class="k">def</span> <span class="nf">determine_fitness</span><span class="p">(</span><span class="n">cycle_lengths</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
<span class="s">"""Determine the fitness of the specimens in the population"""</span>
<span class="c1"># Invert so that shorter paths get higher values
</span> <span class="n">fitness_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">cycle_lengths</span><span class="p">)</span>
<span class="n">fitness</span> <span class="o">=</span> <span class="p">[</span><span class="n">fitness_sum</span> <span class="o">/</span> <span class="n">cycle_length</span> <span class="k">for</span> <span class="n">cycle_length</span> <span class="ow">in</span> <span class="n">cycle_lengths</span><span class="p">]</span>
<span class="c1"># Normalize the fitness
</span> <span class="n">fitness_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="n">fitness</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span> <span class="o">/</span> <span class="n">fitness_sum</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">fitness</span><span class="p">]</span>
<span class="k">return</span> <span class="n">fitness</span>
</code></pre></div></div>
<h4 id="the-initial-population">The initial population</h4>
<p>If you take your time to go over the implementation and maybe step through it yourself, I think your first impression will be that the complexity here is fairly low. Let’s go over what exactly happens, and focus on some of the functions in detail. Our first task is to spawn an initial population:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">population</span> <span class="o">=</span> <span class="n">spawn</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">population_size</span><span class="p">)</span>
<span class="p">...</span>
<span class="k">def</span> <span class="nf">spawn</span><span class="p">(</span><span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">population_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Create the initial generation"""</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">graph</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">key</span>
<span class="n">keys</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">graph</span><span class="p">.</span><span class="n">nodes</span><span class="p">.</span><span class="n">keys</span><span class="p">())[</span><span class="mi">1</span><span class="p">:]</span>
<span class="n">max_size</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">factorial</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">keys</span><span class="p">))</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">unique_permutations</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">unique_permutations</span><span class="p">)</span> <span class="o"><</span> <span class="n">population_size</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">unique_permutations</span><span class="p">)</span> <span class="o"><</span> <span class="n">max_size</span><span class="p">:</span>
<span class="n">permutation</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">keys</span><span class="p">)</span>
<span class="n">shuffle</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span>
<span class="n">permutation</span> <span class="o">=</span> <span class="p">(</span><span class="n">start</span><span class="p">,)</span> <span class="o">+</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">start</span><span class="p">,)</span>
<span class="k">if</span> <span class="n">permutation</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">unique_permutations</span><span class="p">:</span>
<span class="n">unique_permutations</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span>
<span class="k">return</span> <span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="n">permutation</span><span class="p">)</span> <span class="k">for</span> <span class="n">permutation</span> <span class="ow">in</span> <span class="n">unique_permutations</span><span class="p">]</span>
</code></pre></div></div>
<p>So what does it mean to have a “population”? For our particular problem, each specimen in the population is basically a candidate route that starts out at our designated starting node, visits every other node exactly once, and arrives back at that starting node. Whether that’s an optimal route or not is of no significance at this point. We trust that an approximation of the ideal route can be achieved through evolution, and we won’t have to perform an exhaustive search through all permutations. That last part is the whole point of it, especially for problems that are NP-hard: we do not want to generate the full set of potential solutions and go through them one by one. We will simply generate a number of candidate solutions and start the process from there. Also, it’s important that these initial candidate solutions be chosen randomly. Using heuristics to pre-optimize candidates leads to low diversity in the population, which can yield suboptimal solutions. There is no need to increase the initial fitness of the population. It’s the diversity of the solutions that will lead to optimality.</p>
<p>That raises another question: how do we determine the size of our initial population? An overly large population makes every generation expensive and can cause the algorithm to perform very poorly. Then again, an overly small population may not be enough to create a high-quality and diverse mating pool. A lot of trial and error can go into finding a population size that suits the specific problem. In this case, I cap it at n!/2, where n is the number of nodes excluding the starting node: that’s the number of truly distinct tours, since a route and its reverse cover the same cycle. For example, with 8 nodes in total there are 7! = 5040 permutations of the remaining 7 nodes, of which 5040 / 2 = 2520 are distinct, so any population that stays below that cap is still far from an exhaustive enumeration of the candidates. In case you want to learn more, there are several whitepapers out there that focus purely on techniques for determining the ideal initial population size.</p>
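<p>As a quick sanity check on that arithmetic (the helper below is a hypothetical illustration, not part of the project’s code):</p>

```python
from math import factorial

def max_distinct_tours(node_count: int) -> int:
    """Number of distinct tours with a fixed starting node (hypothetical
    helper): (n-1)! permutations of the remaining nodes, halved because
    a tour and its reverse cover the same route."""
    return factorial(node_count - 1) // 2

print(max_distinct_tours(8))  # 7! / 2 = 2520
```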
<h4 id="keep-on-spinning-forever">Keep on spinning forever?</h4>
<p>So now that we have decided on an initial population, the next question is: how long will we keep this little ecosystem doing its thing? Remember two sentences back? Yeah, there’s also enough research going on regarding that question. At some point, either no improvement is possible because we converged on the optimal solution, or the breeding cycle simply stops yielding any improvement. Since we’ll never know whether we actually found an optimal solution (not without some “help” from the outside at least), we’ll just decide to stop running the algorithm when we haven’t seen any improvement for a certain number of generations:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_generations</span><span class="p">):</span>
<span class="k">if</span> <span class="n">generations_without_improvement</span> <span class="o">>=</span> <span class="n">max_no_improvement</span><span class="p">:</span>
<span class="k">break</span>
</code></pre></div></div>
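<p>The bookkeeping behind that check can be modeled in isolation. The sketch below feeds a pre-baked sequence of per-generation shortest cycle lengths into the stopping rule; everything except <code>max_generations</code> and <code>max_no_improvement</code> is an assumed name for illustration, not the project’s actual loop:</p>

```python
def run_until_stagnant(shortest_per_generation: list[int],
                       max_generations: int,
                       max_no_improvement: int) -> tuple[int, int]:
    """Toy model of the stopping rule: quit once max_no_improvement
    generations pass without finding a new shortest cycle."""
    best = float("inf")
    generations_without_improvement = 0
    generations_run = 0
    for shortest in shortest_per_generation[:max_generations]:
        if generations_without_improvement >= max_no_improvement:
            break
        generations_run += 1
        if shortest < best:
            best = shortest
            generations_without_improvement = 0
        else:
            generations_without_improvement += 1
    return generations_run, best

# improvement stalls at length 9, so we stop 3 stagnant generations later
print(run_until_stagnant([10, 9, 9, 9, 9, 8, 7], 100, 3))  # (5, 9)
```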
<p>What that number is will depend on the size of the problem and many other factors. In main.py, you can see that some experimentation led me to settle on values for the population size and max_generations that seemed to work well for a given number of nodes:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">def</span> <span class="nf">__determine_ga_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]:</span>
<span class="n">match</span> <span class="n">node_count</span><span class="p">:</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">5</span><span class="p">:</span> <span class="k">return</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">5</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">10</span><span class="p">:</span> <span class="k">return</span> <span class="mi">250</span><span class="p">,</span> <span class="mi">10</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">15</span><span class="p">:</span> <span class="k">return</span> <span class="mi">500</span><span class="p">,</span> <span class="mi">30</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">20</span><span class="p">:</span> <span class="k">return</span> <span class="mi">750</span><span class="p">,</span> <span class="mi">50</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">25</span><span class="p">:</span> <span class="k">return</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">75</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">30</span><span class="p">:</span> <span class="k">return</span> <span class="mi">1250</span><span class="p">,</span> <span class="mi">100</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">35</span><span class="p">:</span> <span class="k">return</span> <span class="mi">1500</span><span class="p">,</span> <span class="mi">150</span>
<span class="n">case</span> <span class="n">_</span> <span class="k">if</span> <span class="n">node_count</span> <span class="o"><=</span> <span class="mi">50</span><span class="p">:</span> <span class="k">return</span> <span class="mi">5000</span><span class="p">,</span> <span class="mi">250</span>
<span class="k">return</span> <span class="mi">25000</span><span class="p">,</span> <span class="mi">500</span>
</code></pre></div></div>
<h4 id="nature-calls">Nature calls</h4>
<p>And after that, you could say it’s off to the races. We first determine the fitness of our current population, which is fairly simple. All we do is take the total sum of the cycle lengths and divide that sum by each individual cycle length. That way, shorter paths get higher values. We then just normalize those values so they all sum up to 1.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">determine_fitness</span><span class="p">(</span><span class="n">cycle_lengths</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">]:</span>
<span class="s">"""Determine the fitness of the specimens in the population"""</span>
<span class="c1"># Invert so that shorter paths get higher values
</span> <span class="n">fitness_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">cycle_lengths</span><span class="p">)</span>
<span class="n">fitness</span> <span class="o">=</span> <span class="p">[</span><span class="n">fitness_sum</span> <span class="o">/</span> <span class="n">cycle_length</span> <span class="k">for</span> <span class="n">cycle_length</span> <span class="ow">in</span> <span class="n">cycle_lengths</span><span class="p">]</span>
<span class="c1"># Normalize the fitness
</span> <span class="n">fitness_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="n">fitness</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span> <span class="o">/</span> <span class="n">fitness_sum</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">fitness</span><span class="p">]</span>
<span class="k">return</span> <span class="n">fitness</span>
</code></pre></div></div>
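<p>A quick worked example makes the two steps concrete. For cycle lengths of 10, 20 and 40, the sum is 70, so the inverted scores are 7.0, 3.5 and 1.75; normalizing by their sum (12.25) gives roughly 0.571, 0.286 and 0.143: the shortest cycle gets the biggest share, and everything adds up to 1. The function is repeated here so the snippet runs standalone:</p>

```python
def determine_fitness(cycle_lengths: list[int]) -> list[float]:
    """Invert so shorter paths get higher values, then normalize"""
    total = sum(cycle_lengths)
    fitness = [total / cycle_length for cycle_length in cycle_lengths]
    fitness_sum = sum(fitness)
    return [f / fitness_sum for f in fitness]

print(determine_fitness([10, 20, 40]))
# [0.5714285714285714, 0.2857142857142857, 0.14285714285714285]
```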
<p>After this step, we check whether we found a shorter cycle in this generation and update the shortest known cycle to that specific candidate. We then do some bookkeeping to update our graph to reflect this newly found solution, which is just something we do for this particular project in order to display intermediate results. The next important thing we need to do is create a whole new generation. There are a lot of possibilities to experiment with here, from the methods for selection to the way we introduce mutation, and even the size of the new population.</p>
<h4 id="non-random-selection">Non-random selection</h4>
<p>What determines which specimens get to propagate their DNA through the generations? The temptation might be to simply select only the strongest candidate solutions. That, however, inevitably leads to getting stuck in local optima, low diversity in the population, and ultimately a result that’s far from optimal in the long run. There are two quite popular methods for selecting parents: tournament selection and biased random selection. We will randomly select either one of them. If you think “that sounds awfully random to me; where does the non-random part come in?”, no worries: both still favor more fit specimens.</p>
<p>Tournament selection works as follows: we pick two completely random candidate solutions out of the population, and whichever has the highest fitness wins. Yeah. That’s it. I told you the code was simple. Here it is:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">tournament_selection</span><span class="p">(</span><span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
<span class="s">"""Perform basic tournament selection to get a parent"""</span>
<span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">candidate1</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">)</span>
<span class="n">candidate2</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">)</span>
<span class="k">while</span> <span class="n">candidate1</span> <span class="o">==</span> <span class="n">candidate2</span><span class="p">:</span>
<span class="n">candidate2</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">)</span>
<span class="k">return</span> <span class="n">candidate1</span> <span class="k">if</span> <span class="n">fitness</span><span class="p">[</span><span class="n">candidate1</span><span class="p">]</span> <span class="o">></span> <span class="n">fitness</span><span class="p">[</span><span class="n">candidate2</span><span class="p">]</span> <span class="k">else</span> <span class="n">candidate2</span>
</code></pre></div></div>
<p>As an alternative, biased random selection works a bit differently. We select a random specimen, then look for the first specimen in the list with a fitness score at least as high as that random one’s. If we find one, that is the parent we return; if not, the initially selected random specimen is returned. Quite literally random, but… biased.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">biased_random_selection</span><span class="p">(</span><span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">])</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
<span class="s">"""Perform biased random selection to get a parent"""</span>
<span class="n">random_specimen</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">fitness</span><span class="p">):</span>
<span class="k">if</span> <span class="n">fitness</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">>=</span> <span class="n">fitness</span><span class="p">[</span><span class="n">random_specimen</span><span class="p">]:</span>
<span class="k">return</span> <span class="n">i</span>
<span class="k">return</span> <span class="n">random_specimen</span>
</code></pre></div></div>
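<p>As a quick sanity check on the claim that both methods still favor fitter specimens, here is a small self-contained sketch (with made-up toy fitness values, and tournament selection re-declared inline) that counts which index wins over many tournaments:</p>

```python
from random import randint
from collections import Counter

def tournament_selection(fitness: list[float]) -> int:
    """Pick two distinct random candidates; the fitter one wins."""
    start, end = 0, len(fitness) - 1
    candidate1 = randint(start, end)
    candidate2 = randint(start, end)
    while candidate1 == candidate2:
        candidate2 = randint(start, end)
    return candidate1 if fitness[candidate1] > fitness[candidate2] else candidate2

fitness = [0.1, 0.2, 0.3, 0.9]  # hypothetical toy fitness scores
wins = Counter(tournament_selection(fitness) for _ in range(10_000))
# index 3 (the fittest) wins every tournament it takes part in,
# so it should come out on top by a wide margin
print(wins.most_common())
```

The fittest specimen only loses a tournament it was never drawn into, so selection pressure emerges even though every draw is uniformly random.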
<h4 id="crossover">Crossover</h4>
<p>Great, so now we have two parents. We all know what comes next. A detail: we don’t have to limit ourselves to only two parents during crossover in genetic algorithms; there’s no reason why we couldn’t use three, four, or more parents when creating a new specimen for the next generation (also, wipe that smirk off your face). In this case, let’s be nice about it and stick to two parents. Our crossover function looks as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">crossover</span><span class="p">(</span><span class="n">parent1</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">parent2</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Cross-breed a new set of children from the given parents"""</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">end</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">parent1</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">parent2</span> <span class="o">=</span> <span class="n">parent2</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">parent2</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">split</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">child</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">parent1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">split</span><span class="p">):</span>
<span class="n">child</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">parent1</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">remainder</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">parent2</span> <span class="k">if</span> <span class="n">i</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">child</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">data</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">remainder</span><span class="p">):</span>
<span class="n">child</span><span class="p">[</span><span class="n">split</span> <span class="o">+</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">data</span>
<span class="k">return</span> <span class="p">[</span><span class="n">start</span><span class="p">,</span> <span class="o">*</span><span class="n">child</span><span class="p">,</span> <span class="n">end</span><span class="p">]</span>
</code></pre></div></div>
<p>“A part of mommy, a part of dad”. It sounds clumsily simple, but that is really all there is to it. We take a part of the first parent and assign it to our offspring. That “part”, in case you are wondering, is just the start of the route as described by the parent specimen. A certain sequence of nodes, starting from our designated starting node, up to some cut-off point that is randomly decided on. That is basically our DNA. After this, we need to complete the route from the cut-off point back to the starting node. In this case, though, we need to respect the problem at hand and make sure we only visit each node once. So we loop through the nodes of parent2 and only add each node to the child if it has not been visited yet. You could say that potentially scrambles up the second part of the offspring candidate, and that is true. We literally take the exact sequence of nodes of parent1 up to the cut-off point but will not respect the sequence of nodes of parent2 as we add them to the child.</p>
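<p>To make that concrete, here is a compact re-statement of the crossover above (same behavior, written with slices), applied to two made-up routes and seeded so the random cut-off is reproducible:</p>

```python
from random import randint, seed

def crossover(parent1: list[int], parent2: list[int]) -> list[int]:
    """Keep the shared start/end node, copy a prefix of parent1,
    then fill in the missing nodes in parent2's order."""
    start, end = parent1[0], parent1[-1]
    p1, p2 = parent1[1:-1], parent2[1:-1]
    split = randint(1, len(p1) - 1)
    child = p1[:split]
    # complete the route with parent2's nodes, skipping visited ones
    child += [n for n in p2 if n not in child]
    return [start, *child, end]

seed(7)  # reproducible cut-off for this demo only
parent1 = [0, 1, 2, 3, 4, 5, 0]
parent2 = [0, 4, 2, 5, 1, 3, 0]
child = crossover(parent1, parent2)
print(child)  # a valid cycle: starts and ends at 0, visits 1-5 exactly once
```

Whatever the cut-off, the child is always a valid Hamiltonian cycle: the prefix comes from parent1 verbatim, and the list comprehension only admits nodes not yet visited.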
<h4 id="mutation">Mutation</h4>
<p>As we all know, biology isn’t perfect, so let’s include some random mutations, just like they happen in nature. After generating our offspring per parent pair, we mutate them using one of two possible methods: swap mutation and rotate mutation. Why introduce mutations at all? We want to maintain genetic diversity in our population so we don’t get stuck in local minima. Usually, in genetic algorithms, mutation is only applied sporadically; here, you’ll see that I mutate every single offspring specimen. Experiment yourself when implementing genetic algorithms for your specific problem to see what works and what doesn’t.</p>
<p>Swap mutate is a really simple and low-impact mutation that will simply swap 2 nodes in the sequence, and it looks like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">swap_mutate</span><span class="p">(</span><span class="n">child</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Mutate the cycle by swapping 2 nodes"""</span>
<span class="n">index1</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">index2</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">child</span><span class="p">[</span><span class="n">index1</span><span class="p">],</span> <span class="n">child</span><span class="p">[</span><span class="n">index2</span><span class="p">]</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="n">index2</span><span class="p">],</span> <span class="n">child</span><span class="p">[</span><span class="n">index1</span><span class="p">]</span>
<span class="k">return</span> <span class="n">child</span>
</code></pre></div></div>
<p>On the other hand, rotate mutation really scrambles up the specimen. We choose a split point, excluding the start node, and divide our sequence into a head, a mid, and a tail. We then completely reverse the mid and reconstruct the sequence. This adds quite some diversity to the population.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">rotate_mutate</span><span class="p">(</span><span class="n">child</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]:</span>
<span class="s">"""Mutate the cycle by reversing part of the nodes"""</span>
<span class="n">split</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">head</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">split</span><span class="p">]</span>
<span class="n">mid</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="n">split</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">][::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">tail</span> <span class="o">=</span> <span class="n">child</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:]</span>
<span class="n">child</span> <span class="o">=</span> <span class="n">head</span> <span class="o">+</span> <span class="n">mid</span> <span class="o">+</span> <span class="n">tail</span>
<span class="k">return</span> <span class="n">child</span>
</code></pre></div></div>
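<p>With an explicit split point instead of a random one, the head/mid/tail decomposition is easy to see on a concrete, made-up six-node cycle:</p>

```python
def rotate_mutate_at(child: list[int], split: int) -> list[int]:
    """The rotate mutation above, with the split passed in explicitly:
    everything between the split point and the final node is reversed."""
    head = child[:split]          # untouched prefix, keeps the start node
    mid = child[split:-1][::-1]   # reversed middle section
    tail = child[-1:]             # the closing node of the cycle
    return head + mid + tail

cycle = [0, 1, 2, 3, 4, 5, 0]
print(rotate_mutate_at(cycle, 2))  # → [0, 1, 5, 4, 3, 2, 0]
```

Because the split starts at index 1 or later and the tail is always the final element, the start and end node of the cycle are never disturbed.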
<p>And that is all there is to it. As a reminder, the full create_offspring function, with all these building blocks put together, looks like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">create_offspring</span><span class="p">(</span><span class="n">current_population</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]],</span> <span class="n">fitness</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span>
<span class="n">population_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Create a new generation"""</span>
<span class="n">offspring</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">offspring</span><span class="p">)</span> <span class="o"><</span> <span class="n">population_size</span><span class="p">:</span>
<span class="n">parent1</span> <span class="o">=</span> <span class="n">parent2</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">parent1</span> <span class="o">==</span> <span class="n">parent2</span><span class="p">:</span>
<span class="n">parent1</span> <span class="o">=</span> <span class="n">get_parent</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="n">parent2</span> <span class="o">=</span> <span class="n">get_parent</span><span class="p">(</span><span class="n">fitness</span><span class="p">)</span>
<span class="n">child1</span> <span class="o">=</span> <span class="n">crossover</span><span class="p">(</span><span class="n">current_population</span><span class="p">[</span><span class="n">parent1</span><span class="p">],</span> <span class="n">current_population</span><span class="p">[</span><span class="n">parent2</span><span class="p">])</span>
<span class="n">child2</span> <span class="o">=</span> <span class="n">crossover</span><span class="p">(</span><span class="n">current_population</span><span class="p">[</span><span class="n">parent2</span><span class="p">],</span> <span class="n">current_population</span><span class="p">[</span><span class="n">parent1</span><span class="p">])</span>
<span class="n">child1</span> <span class="o">=</span> <span class="n">mutate</span><span class="p">(</span><span class="n">child1</span><span class="p">)</span>
<span class="n">child2</span> <span class="o">=</span> <span class="n">mutate</span><span class="p">(</span><span class="n">child2</span><span class="p">)</span>
<span class="n">offspring</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">child1</span><span class="p">)</span>
<span class="n">offspring</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">child2</span><span class="p">)</span>
<span class="k">return</span> <span class="n">offspring</span>
</code></pre></div></div>
<p>Now that we have covered each individual step, the complete function is easy to understand. We simply keep pairing up two selected parents, create two children per pair, apply mutation to each of them, and add them to the new generation.</p>
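<p>Two helpers called above, get_parent and mutate, are not spelled out in this article. Going by the earlier description (each call randomly picks one of the two available strategies), hypothetical dispatchers could look like the sketch below, with minimal inline versions of the four strategies so it runs on its own:</p>

```python
from random import random, randint

def tournament_selection(fitness: list[float]) -> int:
    """Two distinct random candidates; the fitter index wins."""
    a = randint(0, len(fitness) - 1)
    b = randint(0, len(fitness) - 1)
    while a == b:
        b = randint(0, len(fitness) - 1)
    return a if fitness[a] > fitness[b] else b

def biased_random_selection(fitness: list[float]) -> int:
    """First index at least as fit as a randomly chosen one."""
    pick = randint(0, len(fitness) - 1)
    for i, f in enumerate(fitness):
        if f >= fitness[pick]:
            return i
    return pick

def swap_mutate(child: list[int]) -> list[int]:
    """Swap two interior nodes, leaving start/end intact."""
    i = randint(1, len(child) - 2)
    j = randint(1, len(child) - 2)
    child[i], child[j] = child[j], child[i]
    return child

def rotate_mutate(child: list[int]) -> list[int]:
    """Reverse the section between a random split and the final node."""
    split = randint(1, len(child) - 2)
    return child[:split] + child[split:-1][::-1] + child[-1:]

# hypothetical dispatchers: the article says each call randomly
# picks one of the two strategies with equal probability
def get_parent(fitness: list[float]) -> int:
    if random() < 0.5:
        return tournament_selection(fitness)
    return biased_random_selection(fitness)

def mutate(child: list[int]) -> list[int]:
    if random() < 0.5:
        return swap_mutate(child)
    return rotate_mutate(child)

print(mutate([0, 1, 2, 3, 4, 0]))
```

The 50/50 split is an assumption; any weighting between the strategies (or a per-generation schedule) would slot in the same way.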
<h4 id="lets-get-fancy-about-it">Let’s get fancy about it</h4>
<p>Just one more thing, though. We did create a whole new generation of candidate solutions, equal in size to the previous generation. We introduced a bunch of randomization along the way, both in the selection of parents as well as in how their DNA was propagated to the next generation. And just like in nature, the parents have now played their parts, and they die a lonely death, assured in the knowledge that all that was best about them lives on in their children as they wither away. Wait, that got way too grim, way too fast. Also, this isn’t nature. Or is it? We’re still playing survival of the fittest, and there’s no need to brush aside our strong performers just because we’ve got a new generation waiting to take over.</p>
<p>So we’ll introduce another little technique called elitism, which is implemented as such:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">new_population</span> <span class="o">=</span> <span class="n">current_population</span> <span class="o">+</span> <span class="n">offspring</span>
<span class="n">offspring_cycle_lengths</span> <span class="o">=</span> <span class="n">get_cycle_lengths</span><span class="p">(</span><span class="n">offspring</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">new_population_cycle_lengths</span> <span class="o">=</span> <span class="n">cycle_lengths</span> <span class="o">+</span> <span class="n">offspring_cycle_lengths</span>
<span class="n">new_population_fitness</span> <span class="o">=</span> <span class="n">fitness</span> <span class="o">+</span> <span class="n">determine_fitness</span><span class="p">(</span><span class="n">offspring_cycle_lengths</span><span class="p">)</span>
<span class="n">survivor_candidates</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="n">new_population_fitness</span><span class="p">,</span> <span class="n">new_population</span><span class="p">)</span>
<span class="n">fittest_indices</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">heapq</span><span class="p">.</span><span class="n">nlargest</span><span class="p">(</span><span class="n">population_size</span><span class="p">,</span> <span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">survivor_candidates</span><span class="p">)))]</span>
<span class="n">new_population</span> <span class="o">=</span> <span class="p">[</span><span class="n">new_population</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">fittest_indices</span><span class="p">]</span>
<span class="n">new_population_cycle_lengths</span> <span class="o">=</span> <span class="p">[</span><span class="n">new_population_cycle_lengths</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">fittest_indices</span><span class="p">]</span>
</code></pre></div></div>
<p>We simply add the new generation of offspring to the current generation and run our fitness function again. If you’re a visual thinker, imagine we’re having ourselves a good old brawl that knocks out half of the population, leaving the strongest ones standing. It’s these specimens that we keep on board to continue the algorithm. Also, notice we’re using heapq.nlargest, which uses a binary heap under the hood to select the x largest elements without having to sort the full list. I actually implemented that data structure myself <a href="https://www.peculiar-coding-endeavours.com/2020/through-the-trees/" target="_blank">here</a> - well, it’s a min binary heap there, same idea though - before I knew that there was an implementation built-in 😉 Still, I strongly believe that you should know about and be able to implement these kinds of basic data structures on your own. Don’t depend on others to do all the thinking for you. I’ve always valued fundamentals, algorithms, and data structures way more than cobbling together packages and frameworks. And in these ChatGPT times, where example implementations of frameworks are a prompt away, that’ll be even more useful than ever.</p>
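<p>In isolation, the nlargest trick looks like this (with made-up fitness scores): we pair each fitness value with its index, let the heap keep only the top population_size pairs, and read the surviving indices back out:</p>

```python
import heapq

fitness = [0.5, 2.0, 1.0, 3.0, 0.1]  # hypothetical fitness scores
population_size = 3

# nlargest compares the (fitness, index) tuples by fitness first,
# so the indices of the fittest specimens come out on top
fittest_indices = [i for _, i in heapq.nlargest(
    population_size, ((f, i) for i, f in enumerate(fitness)))]
print(fittest_indices)  # → [3, 1, 2]
```

This runs in O(n log k) for k survivors out of n candidates, versus O(n log n) for a full sort, which matters once populations grow large.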
<p>In any case, that’s all in terms of implementation. As I said before, the code isn’t really that involved at all, and each step can be easily visualized. This is only a very basic implementation, though, and there’s so much to uncover about genetic algorithms, techniques to perform crossover, selection, mutation, optimization of the parameters and so much more. I hope this small example does trigger your interest to start researching more about them. It’s really quite wonderful how such a simple concept can yield very good results in a short time.</p>
<h3 id="thousands-of-little-creepers">Thousands of little creepers…</h3>
<p>Talking about wonderful… The next algorithm got me even more excited than genetic algorithms. We’ll stick to nature, but switch gears from the wonders of evolution to the behavior of one of the most numerous animals on the planet: ants! I was extremely excited to dig my fingers into this one, and I hope it will get you equally excited. Ant colony optimization is a very interesting algorithm where we mimic the behavior of ant swarms. We will unleash a number of waves of ants onto our starting node and have them explore our graph until they visit all nodes and end up back at the starting node. As they explore the graph, they leave behind a pheromone trail that will encourage other ants to also take that route. The better (shorter, in our case) a route is, the stronger the pheromone trail, and the more chance there is that another ant will prefer to take that route, leaving behind additional pheromones to further strengthen that path. To also introduce some sense of exploration and to avoid local minima, pheromones will evaporate over time. While getting myself familiar with the overall concept and the details behind it, several clips of HK Lam, <a href="https://www.youtube.com/watch?v=jNd7QJQH-kk" target="_blank">like this one and others</a> helped my understanding tremendously. He has playlists on several interesting topics in the same sphere, definitely check them out! The full code of the algorithm:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">ant_colony</span><span class="p">(</span><span class="n">graph</span><span class="p">:</span> <span class="n">Graph</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">max_swarms</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">max_no_improvement</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">AsyncIterator</span><span class="p">[</span><span class="n">AlgorithmResult</span><span class="p">]:</span>
<span class="s">"""Solve the TSP problem using ant colony optimization"""</span>
<span class="n">swarms_evaluated</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">evaluations_until_solved</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">swarms_without_improvement</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">node_count</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">graph</span><span class="p">.</span><span class="n">nodes</span><span class="p">)</span>
<span class="n">pheromones</span> <span class="o">=</span> <span class="n">initialize_pheromones</span><span class="p">(</span><span class="n">node_count</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_swarms</span><span class="p">):</span>
<span class="k">if</span> <span class="n">swarms_without_improvement</span> <span class="o">>=</span> <span class="n">max_no_improvement</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">vertices</span><span class="p">,</span> <span class="n">cycle_lengths</span> <span class="o">=</span> <span class="n">swarm_traversal</span><span class="p">(</span><span class="n">pheromones</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">pheromones</span> <span class="o">=</span> <span class="n">pheromone_evaporation</span><span class="p">(</span><span class="n">pheromones</span><span class="p">)</span>
<span class="n">best_cycle_index</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">cycle_lengths</span><span class="p">)</span>
<span class="n">best_cycle</span> <span class="o">=</span> <span class="n">vertices</span><span class="p">[</span><span class="n">best_cycle_index</span><span class="p">]</span>
<span class="n">best_cycle_length</span> <span class="o">=</span> <span class="n">cycle_lengths</span><span class="p">[</span><span class="n">best_cycle_index</span><span class="p">]</span>
<span class="n">pheromones</span> <span class="o">=</span> <span class="n">pheromone_release</span><span class="p">(</span><span class="n">vertices</span><span class="p">,</span> <span class="n">best_cycle</span><span class="p">,</span> <span class="n">best_cycle_length</span><span class="p">,</span> <span class="n">pheromones</span><span class="p">)</span>
<span class="n">pheromones</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">pheromones</span><span class="p">)</span>
<span class="n">swarms_evaluated</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">cycle_lengths</span><span class="p">[</span><span class="n">best_cycle_index</span><span class="p">]</span> <span class="o"><</span> <span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle_length</span><span class="p">:</span>
<span class="n">graph</span><span class="p">.</span><span class="n">remove_vertices</span><span class="p">()</span>
<span class="n">graph</span><span class="p">.</span><span class="n">add_vertices</span><span class="p">(</span><span class="n">best_cycle</span><span class="p">)</span>
<span class="n">graph</span><span class="p">.</span><span class="n">optimal_cycle</span> <span class="o">=</span> <span class="n">ShortestPath</span><span class="p">(</span><span class="n">best_cycle_length</span><span class="p">,</span> <span class="n">best_cycle</span><span class="p">)</span>
<span class="n">evaluations_until_solved</span> <span class="o">=</span> <span class="n">swarms_evaluated</span>
<span class="n">swarms_without_improvement</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.0001</span><span class="p">)</span>
<span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">swarms_evaluated</span><span class="p">,</span> <span class="n">evaluations_until_solved</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">swarms_without_improvement</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">yield</span> <span class="n">AlgorithmResult</span><span class="p">(</span><span class="n">swarms_evaluated</span><span class="p">,</span> <span class="n">evaluations_until_solved</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">initialize_pheromones</span><span class="p">(</span><span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="s">"""Initialize the pheromone array"""</span>
<span class="n">pheromones</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">node_count</span><span class="p">,</span> <span class="n">node_count</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">ndindex</span><span class="p">(</span><span class="n">pheromones</span><span class="p">.</span><span class="n">shape</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">j</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="n">row_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:])</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">j</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">row_sum</span> <span class="o">/</span> <span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span>
<span class="n">row_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:])</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">j</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">/=</span> <span class="n">row_sum</span>
<span class="k">return</span> <span class="n">pheromones</span>
<span class="k">def</span> <span class="nf">normalize</span><span class="p">(</span><span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-></span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="s">"""Normalize the pheromone matrix into probabilities"""</span>
<span class="n">node_count</span> <span class="o">=</span> <span class="n">pheromones</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="n">row_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:])</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">j</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">/=</span> <span class="n">row_sum</span>
<span class="k">return</span> <span class="n">pheromones</span>
<span class="k">def</span> <span class="nf">swarm_traversal</span><span class="p">(</span><span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Traverse the graph with a number of ants equal to the number of nodes"""</span>
<span class="n">node_count</span> <span class="o">=</span> <span class="n">pheromones</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">swarm_size</span> <span class="o">=</span> <span class="n">node_count</span> <span class="o">*</span> <span class="n">node_count</span>
<span class="n">vertices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)]</span> <span class="o">*</span> <span class="n">node_count</span><span class="p">]</span> <span class="o">*</span> <span class="n">swarm_size</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">"i,i,i"</span><span class="p">)</span>
<span class="n">cycle_lengths</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">swarm_size</span>
<span class="c1"># Traverse the graph swarm_size times
</span> <span class="k">with</span> <span class="n">ThreadPoolExecutor</span><span class="p">(</span><span class="n">max_workers</span><span class="o">=</span><span class="nb">min</span><span class="p">(</span><span class="n">swarm_size</span><span class="p">,</span> <span class="mi">50</span><span class="p">))</span> <span class="k">as</span> <span class="n">executor</span><span class="p">:</span>
<span class="n">futures</span> <span class="o">=</span> <span class="p">[</span><span class="n">executor</span><span class="p">.</span><span class="n">submit</span><span class="p">(</span><span class="n">traverse</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">pheromones</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">swarm_size</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">completed</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">as_completed</span><span class="p">(</span><span class="n">futures</span><span class="p">)):</span>
<span class="n">cycle_length</span><span class="p">,</span> <span class="n">cycle</span> <span class="o">=</span> <span class="n">completed</span><span class="p">.</span><span class="n">result</span><span class="p">()</span>
<span class="n">vertices</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">cycle</span>
<span class="n">cycle_lengths</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">cycle_length</span>
<span class="k">return</span> <span class="n">vertices</span><span class="p">,</span> <span class="n">cycle_lengths</span>
<span class="k">def</span> <span class="nf">traverse</span><span class="p">(</span><span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
<span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">]:</span>
<span class="s">"""Perform a traversal through the graph"""</span>
<span class="c1"># Each traversal consists of node_count vertices
</span> <span class="n">current_node</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">visited</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="n">current_node</span><span class="p">])</span>
<span class="n">cycle_length</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">vertices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)]</span> <span class="o">*</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">"i,i,i"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">row_sorted_indices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="p">:])</span>
<span class="n">row</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">take</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="p">:],</span> <span class="n">row_sorted_indices</span><span class="p">)</span>
<span class="n">cumul</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">row</span><span class="p">):</span>
<span class="k">if</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">==</span> <span class="mf">0.0</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">],</span> <span class="n">cumul</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">+</span> <span class="n">cumul</span><span class="p">,</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">+</span> <span class="n">cumul</span>
<span class="n">index</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">chance</span> <span class="o">=</span> <span class="n">random</span><span class="p">()</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="p">)):</span>
<span class="n">candidate</span> <span class="o">=</span> <span class="n">row_sorted_indices</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="n">k</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o"><</span> <span class="n">chance</span> <span class="o"><=</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="ow">and</span> <span class="n">candidate</span> <span class="o">!=</span> <span class="n">current_node</span> <span class="ow">and</span> <span class="n">candidate</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">):</span>
<span class="n">index</span> <span class="o">=</span> <span class="n">candidate</span>
<span class="k">break</span>
<span class="c1"># If no suitable index was found, the generated chance was probably too low
</span> <span class="c1"># Pick the first index that's not itself and not visited yet
</span> <span class="k">if</span> <span class="n">index</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">candidate</span> <span class="o">=</span> <span class="n">row_sorted_indices</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
<span class="k">if</span> <span class="n">candidate</span> <span class="o">!=</span> <span class="n">current_node</span> <span class="ow">and</span> <span class="n">candidate</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">index</span> <span class="o">=</span> <span class="n">candidate</span>
<span class="k">break</span>
<span class="n">distance</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="n">index</span><span class="p">]</span>
<span class="n">vertices</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">current_node</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">distance</span><span class="p">)</span>
<span class="n">cycle_length</span> <span class="o">+=</span> <span class="n">distance</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
<span class="n">current_node</span> <span class="o">=</span> <span class="n">index</span>
<span class="c1"># Add the last vertex back to the starting node
</span> <span class="n">distance</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="n">vertices</span><span class="p">[</span><span class="n">node_count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">current_node</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">distance</span><span class="p">)</span>
<span class="n">cycle_length</span> <span class="o">+=</span> <span class="n">distance</span>
<span class="k">return</span> <span class="n">cycle_length</span><span class="p">,</span> <span class="n">vertices</span>
<span class="k">def</span> <span class="nf">pheromone_evaporation</span><span class="p">(</span><span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-></span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="s">"""Evaporation of pheromones after each traversal"""</span>
<span class="k">for</span> <span class="n">index</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">ndindex</span><span class="p">(</span><span class="n">pheromones</span><span class="p">.</span><span class="n">shape</span><span class="p">):</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">*=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">pheromones</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
<span class="k">return</span> <span class="n">pheromones</span>
<span class="k">def</span> <span class="nf">pheromone_release</span><span class="p">(</span><span class="n">vertices</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">best_cycle</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
<span class="n">best_cycle_length</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-></span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="s">"""Perform pheromone release, with elitism towards shorter cycles"""</span>
<span class="k">for</span> <span class="n">cycle</span> <span class="ow">in</span> <span class="n">vertices</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">weight</span> <span class="ow">in</span> <span class="n">cycle</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">weight</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">best_cycle</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">best_cycle_length</span>
<span class="k">return</span> <span class="n">pheromones</span>
</code></pre></div></div>
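<p>The last two functions in the listing govern the pheromone dynamics: trails evaporate after every swarm, and the ants then deposit fresh pheromone, with an elitist bonus along the best cycle found so far. The evaporation rule is worth a closer look: each entry loses a fraction of itself equal to its own value, so the strongest trails decay the fastest. A toy example with made-up values:</p>

```python
import numpy as np

def evaporate(pheromones: np.ndarray) -> np.ndarray:
    """Same rule as in the listing: every entry x becomes x * (1 - x)."""
    for index in np.ndindex(pheromones.shape):
        pheromones[index] *= 1 - pheromones[index]
    return pheromones

# A toy 3x3 pheromone matrix (rows sum to 1, diagonal unused)
p = np.array([[0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])
p = evaporate(p)
# 0.6 -> 0.6 * 0.4 = 0.24, while 0.3 -> 0.3 * 0.7 = 0.21:
# the stronger trail loses more pheromone, both absolutely and relatively
```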
<h4 id="follow-the-crumbs-or-pheromones-really">Follow the crumbs, or pheromones really…</h4>
<p>You’ll notice we use ideas similar to those in the genetic algorithm. We’ll exit once a certain number of swarm traversals has gone by without finding a better route, and parameters such as the number of swarms and the size of each swarm are very much subject to trial and error. Just as we initialized a starting generation to kick off the genetic algorithm, we have some initialization to do here: the pheromone matrix that will drive the behavior of the ants as they choose their routes.</p>
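<p>The driver loop tying these pieces together isn’t shown here, but the stopping criterion just described can be sketched as follows. Note that <code>run_colony</code>, <code>step</code> and <code>max_stale</code> are hypothetical names for illustration, not taken from the actual code:</p>

```python
from random import random

def run_colony(step, max_stale: int = 20) -> float:
    """Hypothetical driver: stop once `max_stale` consecutive swarm
    traversals fail to produce a shorter best cycle."""
    best = float("inf")
    stale = 0
    while stale < max_stale:
        length = step()  # one swarm traversal; returns its best cycle length
        if length < best:
            best, stale = length, 0  # improvement found: reset the counter
        else:
            stale += 1
    return best

# Toy stand-in for a swarm traversal: random cycle lengths in [50, 150)
result = run_colony(lambda: 100 * random() + 50)
```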
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">initialize_pheromones</span><span class="p">(</span><span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="s">"""Initialize the pheromone array"""</span>
<span class="n">pheromones</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">node_count</span><span class="p">,</span> <span class="n">node_count</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">ndindex</span><span class="p">(</span><span class="n">pheromones</span><span class="p">.</span><span class="n">shape</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">j</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="n">row_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:])</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">j</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">row_sum</span> <span class="o">/</span> <span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span>
<span class="n">row_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:])</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">j</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">/=</span> <span class="n">row_sum</span>
<span class="k">return</span> <span class="n">pheromones</span>
</code></pre></div></div>
<p>We fill an N×N matrix with zeros, where N is the number of nodes to visit, and store the distance from each node to every other node in it. We then invert these distances and normalize each row into probabilities, just as we did for the genetic algorithm: a shorter distance between two nodes results in a higher probability of that edge being chosen. It’s really quite simple, and once that concept clicks, it’s not hard to imagine what follows. We use this pheromone matrix to influence the routes the ants take: whenever an ant sits at a node, its decision on where to go next depends on which nodes are still unvisited and on the probabilities assigned to them.</p>
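<p>To make the inversion-and-normalization step concrete, here is the same arithmetic applied to a single row of toy distances (the numbers are made up):</p>

```python
import numpy as np

# Distances from one node to three neighbors (toy numbers)
distances = np.array([4.0, 2.0, 8.0])

# Invert relative to the row total, as initialize_pheromones does:
# shorter edges end up with larger values
inverted = distances.sum() / distances

# Normalize the inverted values into probabilities
probabilities = inverted / inverted.sum()
# The shortest edge (distance 2.0) now carries the highest probability
```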
<h4 id="unleashing-the-swarms">Unleashing the swarms</h4>
<p>Now that our initial pheromone matrix is set up, we let loose a swarm of ants on the starting node and have them traverse the entire graph. Each ant visits every node exactly once and ends up back at the starting node. The number of ants in a swarm is open to theoretical debate, but we’ll limit ourselves to n² ants per swarm. We spin up a number of threads to process the swarm traversal and collect the result of each individual run: the cycle length and the vertices traveled in sequence.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">swarm_traversal</span><span class="p">(</span><span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="s">"""Traverse the graph with a number of ants equal to the number of nodes"""</span>
<span class="n">node_count</span> <span class="o">=</span> <span class="n">pheromones</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">swarm_size</span> <span class="o">=</span> <span class="n">node_count</span> <span class="o">*</span> <span class="n">node_count</span>
<span class="n">vertices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)]</span> <span class="o">*</span> <span class="n">node_count</span><span class="p">]</span> <span class="o">*</span> <span class="n">swarm_size</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">"i,i,i"</span><span class="p">)</span>
<span class="n">cycle_lengths</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">swarm_size</span>
<span class="c1"># Traverse the graph swarm_size times
</span> <span class="k">with</span> <span class="n">ThreadPoolExecutor</span><span class="p">(</span><span class="n">max_workers</span><span class="o">=</span><span class="nb">min</span><span class="p">(</span><span class="n">swarm_size</span><span class="p">,</span> <span class="mi">50</span><span class="p">))</span> <span class="k">as</span> <span class="n">executor</span><span class="p">:</span>
<span class="n">futures</span> <span class="o">=</span> <span class="p">[</span><span class="n">executor</span><span class="p">.</span><span class="n">submit</span><span class="p">(</span><span class="n">traverse</span><span class="p">,</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">pheromones</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">swarm_size</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">completed</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">as_completed</span><span class="p">(</span><span class="n">futures</span><span class="p">)):</span>
<span class="n">cycle_length</span><span class="p">,</span> <span class="n">cycle</span> <span class="o">=</span> <span class="n">completed</span><span class="p">.</span><span class="n">result</span><span class="p">()</span>
<span class="n">vertices</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">cycle</span>
<span class="n">cycle_lengths</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">cycle_length</span>
<span class="k">return</span> <span class="n">vertices</span><span class="p">,</span> <span class="n">cycle_lengths</span>
</code></pre></div></div>
<p>Each traversal is a sequence of decisions taking us from one node to the next, with the pheromone matrix guiding which candidate we visit. We draw a random number between 0 and 1 and walk over the candidates in the current node’s row of the pheromone matrix, sorted so that a cumulative comparison is possible. The first node whose cumulative probability reaches the drawn number is the one we go for, provided it hasn’t been visited yet and isn’t the node we’re currently at. As a fallback, we simply take the highest-probability node that hasn’t been visited. The code looks more involved than the idea behind it, and there are undeniably more terse ways to express it, but written out like this it is much easier to follow if you decide to step through it yourself.</p>
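<p>What happens inside that loop is essentially a roulette-wheel selection. Stripped of the visited-set bookkeeping, the core idea looks like this (a standalone sketch, not the article’s exact code):</p>

```python
from random import random

def roulette_pick(probabilities: list[float]) -> int:
    """Pick an index with chance proportional to its probability."""
    chance = random()
    cumul = 0.0
    for i, p in enumerate(probabilities):
        cumul += p
        if chance <= cumul:
            return i
    return len(probabilities) - 1  # guard against floating-point round-off

# The index with probability 0.5 should win about half the picks
picks = [roulette_pick([0.2, 0.5, 0.3]) for _ in range(10_000)]
share = picks.count(1) / len(picks)
```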
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">traverse</span><span class="p">(</span><span class="n">node_count</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
<span class="n">distances</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">])</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">]:</span>
<span class="s">"""Perform a traversal through the graph"""</span>
<span class="c1"># Each traversal consists of node_count vertices
</span> <span class="n">current_node</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">visited</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="n">current_node</span><span class="p">])</span>
<span class="n">cycle_length</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">vertices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)]</span> <span class="o">*</span> <span class="n">node_count</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">"i,i,i"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">node_count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">row_sorted_indices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="p">:])</span>
<span class="n">row</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">take</span><span class="p">(</span><span class="n">pheromones</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="p">:],</span> <span class="n">row_sorted_indices</span><span class="p">)</span>
<span class="n">cumul</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">row</span><span class="p">):</span>
<span class="k">if</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">==</span> <span class="mf">0.0</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">],</span> <span class="n">cumul</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">+</span> <span class="n">cumul</span><span class="p">,</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">+</span> <span class="n">cumul</span>
<span class="n">index</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">chance</span> <span class="o">=</span> <span class="n">random</span><span class="p">()</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="p">)):</span>
<span class="n">candidate</span> <span class="o">=</span> <span class="n">row_sorted_indices</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="n">k</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o"><</span> <span class="n">chance</span> <span class="o"><=</span> <span class="n">row</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="ow">and</span> <span class="n">candidate</span> <span class="o">!=</span> <span class="n">current_node</span> <span class="ow">and</span> <span class="n">candidate</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">):</span>
<span class="n">index</span> <span class="o">=</span> <span class="n">candidate</span>
<span class="k">break</span>
<span class="c1"># If no suitable index was found, the generated chance was probably too low
</span> <span class="c1"># Pick the first index that's not itself and not visited yet
</span> <span class="k">if</span> <span class="n">index</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">row</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">candidate</span> <span class="o">=</span> <span class="n">row_sorted_indices</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
<span class="k">if</span> <span class="n">candidate</span> <span class="o">!=</span> <span class="n">current_node</span> <span class="ow">and</span> <span class="n">candidate</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">index</span> <span class="o">=</span> <span class="n">candidate</span>
<span class="k">break</span>
<span class="n">distance</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="n">index</span><span class="p">]</span>
<span class="n">vertices</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">current_node</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">distance</span><span class="p">)</span>
<span class="n">cycle_length</span> <span class="o">+=</span> <span class="n">distance</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
<span class="n">current_node</span> <span class="o">=</span> <span class="n">index</span>
<span class="c1"># Add the last vertex back to the starting node
</span> <span class="n">distance</span> <span class="o">=</span> <span class="n">distances</span><span class="p">[</span><span class="n">current_node</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="n">vertices</span><span class="p">[</span><span class="n">node_count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">current_node</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">distance</span><span class="p">)</span>
<span class="n">cycle_length</span> <span class="o">+=</span> <span class="n">distance</span>
<span class="k">return</span> <span class="n">cycle_length</span><span class="p">,</span> <span class="n">vertices</span>
</code></pre></div></div>
<p>Once we find the next node to visit, we add it to the set of already visited nodes so we can avoid visiting the same node twice. At the end, we make the last hop back to our starting node and return the completed cycle and its length.</p>
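<p>The selection mechanism at the heart of this traversal is essentially a roulette wheel: the sorted probabilities are turned into a cumulative distribution, and whichever bucket the random draw lands in decides the next node. As a minimal standalone sketch (the variable names and toy values below are mine, not from the project code):</p>

```python
import random

# Toy pheromone row, already sorted ascending (illustrative values only)
probabilities = [0.1, 0.2, 0.3, 0.4]

# Turn the row into a cumulative distribution, like the in-place
# summation over the sorted row does in the traversal code
cumulative = []
total = 0.0
for p in probabilities:
    total += p
    cumulative.append(total)

# The first bucket that contains the random draw wins
chance = random.random()
chosen = next(k for k, c in enumerate(cumulative) if chance <= c)
```

<p>Stronger trails own wider buckets, so they get picked more often, while weak trails still catch the occasional draw, which keeps some exploration in the system.</p>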
<h4 id="pheromone-release-and-evaporation">Pheromone release and evaporation</h4>
<p>After each traversal, we are hopefully a bit wiser as to which sequences of nodes result in a shorter path. However, ants wouldn’t be ants if there weren’t some sort of overarching mechanism to guide the whole swarm towards more desirable circumstances. The pheromone matrix we initialized at the start enables us to do just that. Based on the results of each traversal, we can implement a feedback loop back into the swarm that will guide us to make even better decisions the next time we make the journey. The functions pheromone_evaporation and pheromone_release do exactly that. Let’s look at each of them, starting with pheromone evaporation:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">pheromone_evaporation</span><span class="p">(</span><span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-></span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="s">"""Evaporation of pheromones after each traversal"""</span>
<span class="k">for</span> <span class="n">index</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">ndindex</span><span class="p">(</span><span class="n">pheromones</span><span class="p">.</span><span class="n">shape</span><span class="p">):</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">*=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">pheromones</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
<span class="k">return</span> <span class="n">pheromones</span>
</code></pre></div></div>
<p>Why would we want to dissolve the pheromone trail in the first place? Well, specifically for our problem, we want to avoid strengthening paths that initially look more interesting but do not ultimately lead to a close-to-optimal solution. Remember, during the pheromone matrix initialization, we already favored shorter paths by giving them a higher probability. Although some randomization is built into the path choice, we want to keep some sense of exploration in the system, and as such, we do not want to infinitely strengthen already strong paths while discouraging paths that seemed unlikely to lead to ideal results. So we slightly lower the intensity of each pheromone trail over time. Practically, when a certain path has a probability of 0.45, its new value will be (1 - 0.45) x 0.45 = 0.2475. A path with a probability of 0.1 will evaporate to (1 - 0.1) x 0.1 = 0.09. Stronger paths undergo stronger evaporation than weak paths.</p>
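<p>To see those numbers fall out of the code, here is the same evaporation update run on a hypothetical toy matrix (the matrix values are simply the probabilities from the examples above):</p>

```python
import numpy as np

# Toy pheromone matrix holding the probabilities used in the examples above
pheromones = np.array([[0.0, 0.45],
                       [0.1, 0.0]])

# The same update pheromone_evaporation applies to every entry:
# each trail shrinks proportionally to its own strength
for index in np.ndindex(pheromones.shape):
    pheromones[index] *= 1 - pheromones[index]

# 0.45 evaporates to (1 - 0.45) * 0.45 = 0.2475
# 0.10 evaporates to (1 - 0.10) * 0.10 = 0.09
```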
<p>Of course, paths that lead to great overall results (meaning shorter cycle lengths) are to be encouraged. This is why we also have a positive feedback loop in the form of releasing pheromones:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">pheromone_release</span><span class="p">(</span><span class="n">vertices</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">best_cycle</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
<span class="n">best_cycle_length</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">pheromones</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">)</span> <span class="o">-></span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">:</span>
<span class="s">"""Perform pheromone release, with elitism towards shorter cycles"""</span>
<span class="k">for</span> <span class="n">cycle</span> <span class="ow">in</span> <span class="n">vertices</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">weight</span> <span class="ow">in</span> <span class="n">cycle</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">weight</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">best_cycle</span><span class="p">:</span>
<span class="n">pheromones</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">best_cycle_length</span>
<span class="k">return</span> <span class="n">pheromones</span>
</code></pre></div></div>
<p>For each vertex in the cycle, we increase the pheromones on that vertex by 1 divided by the weight (in our case, the distance between the nodes on each side of the vertex). Simply put, shorter paths will be rewarded more than longer paths, and their pheromone trail will be enforced much more strongly. On top of this, just like we implemented a form of elitism in the case of our genetic algorithm, we will do so here. We’ll also add some extra pheromone strength to each path in the best cycle we’ve found so far.</p>
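<p>Concretely, the effect of one release round can be followed on a hypothetical toy example (the cycles, distances, and matrix size below are made up for illustration; the update itself mirrors pheromone_release):</p>

```python
import numpy as np

# Two made-up ant cycles through a 3-node graph, as (from, to, distance) triples
vertices = np.array([[(0, 1, 2), (1, 2, 4), (2, 0, 3)],
                     [(0, 2, 3), (2, 1, 4), (1, 0, 2)]], dtype="i,i,i")
best_cycle = np.array([(0, 1, 2), (1, 2, 4), (2, 0, 3)], dtype="i,i,i")
best_cycle_length = 9
pheromones = np.zeros((3, 3))

# Every traversed edge is rewarded inversely to its length...
for cycle in vertices:
    for i, j, weight in cycle:
        pheromones[i, j] += 1 / weight
# ...and the edges of the best cycle so far get an extra elitist boost
for i, j, _ in best_cycle:
    pheromones[i, j] += 1 / best_cycle_length

# The short edge (0, 1) of length 2 ends up with 1/2 + 1/9
```

<p>Short edges on good cycles accumulate pheromones the fastest, which is exactly the bias we want the next wave of ants to inherit.</p>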
<p>The combination of this negative and positive feedback loop, applied to the results of the traversals of each ant in the swarm over multiple swarms in time, very quickly leads to a close-to-optimal shortest path through the entire graph. The code is quite simple and was quickly implemented. Compared to the genetic algorithm, though, the explainability of the algorithm as a whole (although the individual steps are fairly clear) is lower: intuitively, it’s much easier to reason about how and why genetic algorithms lead to a high-quality approximation than it is for ant colony optimization. Maybe you feel differently, though. In any case, both are amazing algorithms that are quite easy to implement, and it never ceases to amaze me how quickly both of them converge to a very high-quality approximation of the ideal solution.</p>
<h3 id="concluding">Concluding</h3>
<p>Well, that’s about all I have to say about this topic for now. Since I always wanted to dabble in genetic algorithms and ant colony optimization, this was a nice little project to keep me occupied for a few days. It took me a few months to start this actual blog, but I’m currently investing pretty much all my spare time into learning Rust, so this is definitely the last Python project I’ll post for a while. I already use Python in my day-to-day work, and since I do heavily prefer statically typed compiled languages and I’m a sucker for performance, expect a lot of Rust in the future. I’m currently playing around with cloth simulation, wave function collapse, and procedural generation, so expect several of those (and other) little adventures to appear here soon. I played around with a fractal zoomer for the Mandelbrot set using <a href="https://github.com/taichi-dev/taichi" target="_blank">Taichi</a> to get it to run on the GPU, but I’m thinking to<br />
<img src="https://www.peculiar-coding-endeavours.com/assets/tsp/rewriteinrust.jpg" alt="rewrite in rust" /><br />
and use <a href="https://bevyengine.org/" target="_blank">Bevy</a>, since that will probably enable me to zoom into fractals even further than what Python allows me to crunch out. We’ll see. If not, I’ll update this article with the GitHub link to my Python project. So much to do, so little time 😉</p>
<p>In any case, thank you for reading this (if you got through it at all; I wouldn’t blame you otherwise). I do hope it sparked some interest and you learned something! I’m 100% a rookie concerning genetic algorithms and ant colony optimization, but this project and the little bit of study I needed to crank it out at least familiarized me with the concepts involved. I hope this inspires at least some people to lay off the ChatGPTs of the world and learn new stuff the old-school way, because it’s just a whole lot of fun 😉 Take it from me, though: don’t learn a few new algorithms, make an example implementation in 3 days, and then set it aside for 4 months so you have to relearn it all real quick for the blog post 😀 Not the most efficient use of time.</p>
Mon, 01 May 2023 16:00:00 +0000
https://www.peculiar-coding-endeavours.com/2023/traveling-salesman/
A way of life<p>Everybody and their grandma has probably tried their hand at an implementation of Conway’s Game of Life at some point, in one language or another. It’s a straightforward and elegant little algorithm that can give you cool-looking results, and a nice goal to work towards while learning a language. Drilling plain syntax or frameworks is incredibly boring. When learning any new language, having a concrete goal in mind besides “just learning it” helps speed the process along and makes it far more enjoyable. So, while learning Python, I took a swing at it.</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/game_of_life/glider_gun.png" alt="glider" /></p>
<h3 id="game-of-what">Game of what?</h3>
<p>Just in case you haven’t Googled what I’m even talking about, the Game of Life is a cellular automaton designed by mathematician John Conway. It’s a game that requires just an initial input state, consisting of a number of alive and dead cells in a grid, after which the game’s rules take over and evolve the game from state to state:</p>
<ul>
<li>a live cell with fewer than 2 live neighbours dies by underpopulation</li>
<li>a live cell with 2 or 3 live neighbours lives on</li>
<li>a live cell with more than 3 live neighbours dies by overpopulation</li>
<li>a dead cell with 3 live neighbours becomes a live cell</li>
</ul>
<p>These few incredibly simple rules can manifest surprisingly complex patterns and really demonstrate the power of cellular automata. The Game of Life is also Turing complete, and has been studied extensively. For more info, you can read up on it on <a href="https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life" target="_blank">Wikipedia</a>, where you can take a much deeper dive into the subject.</p>
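<p>Those four rules collapse into a single decision per cell. As a minimal sketch (this helper is hypothetical, not taken from the project code):</p>

```python
def next_state(alive: int, alive_neighbours: int) -> int:
    """Apply Conway's rules to one cell: 1 = alive, 0 = dead"""
    if alive:
        # Survival needs exactly 2 or 3 live neighbours;
        # anything else is under- or overpopulation
        return 1 if alive_neighbours in (2, 3) else 0
    # A dead cell comes to life with exactly 3 live neighbours
    return 1 if alive_neighbours == 3 else 0

# A lonely live cell dies, a well-connected one survives,
# and a dead cell with exactly 3 neighbours is born
```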
<h3 id="my-own-spin-on-it">My own spin on it</h3>
<p>During this project, I familiarized myself with several language constructs and frameworks specific to Python, and tried several approaches to tackle the inherent slowness of Python for CPU-heavy computational work. You can check out a little summary video of what I’ll be working toward right here:</p>
<div class="embed-container">
<iframe width="640" height="390" src="https://www.youtube.com/embed/2HOLWExgwzU" frameborder="0" allowfullscreen=""></iframe>
</div>
<style>
.embed-container {
position: relative;
padding-bottom: 56.25%;
height: 0;
overflow: hidden;
max-width: 100%;
}
.embed-container iframe,
.embed-container object,
.embed-container embed {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
</style>
<p>All the code is available on my <a href="https://github.com/tomvanschaijk/wayoflife" target="_blank">GitHub</a>. The complete code contains all the optimizations and little features I came up with.</p>
<p>One change I introduced, compared to the usual implementations you will find online, is that there are not 2 states for each cell, but 3:</p>
<ul>
<li>New: a cell that just became alive because of the game rules</li>
<li>Survivor: an alive cell that hasn’t changed state in the current iteration</li>
<li>Dead: a cell that is marked as dead, and will be removed from the game in the next iteration</li>
</ul>
<p>This small change in how cells are visualized allows you to follow the evolution of each cell and the environment as a whole. Coupled with that, there are a number of ways you can interact with the grid:</p>
<ul>
<li>the game can be paused at any time</li>
<li>a random cell layout can be generated</li>
<li>several pre-built patterns can be added to the grid</li>
<li>new cells can be injected into the grid</li>
<li>alive and dead cells can switch places</li>
<li>all surviving cells can be removed</li>
<li>surviving cells that haven’t changed states in x iterations can be removed</li>
<li>while the game is paused, you can move through the states one step at a time</li>
<li>it’s possible to revert the previous x steps</li>
<li>colors of all cells can be changed or reset to default values</li>
<li>cell size can be changed to a predefined set of sizes</li>
<li>target framerate can be changed</li>
</ul>
<p>If you are interested in just a simple implementation with 2 states, you can check out <a href="https://github.com/tomvanschaijk/wayoflife/tree/just_2_states" target="_blank">this branch</a>. If you want to follow the evolution of the project from a basic implementation to using Numba to aid in faster computation of states, optimization of the search space, avoiding unnecessary recalculation of neighbours, and many of the other features above, there are a number of commits you can check out:</p>
<ul>
<li><a href="https://github.com/tomvanschaijk/wayoflife/commit/4fcbf96c61d2f2a7529652c65309eb730640dae5" target="_blank">A basic implementation</a></li>
<li><a href="https://github.com/tomvanschaijk/wayoflife/commit/705f115229768ec80cc675d7730f6463e5f43856" target="_blank">Use of Numba</a></li>
<li><a href="https://github.com/tomvanschaijk/wayoflife/commit/6ffe2844d5979c2afe74de21a74c3e0445dffd0a" target="_blank">Shrink the search space</a></li>
<li><a href="https://github.com/tomvanschaijk/wayoflife/commit/0a30d7abc94f2d1f51a703fe6a2df347b6dec260" target="_blank">Move neighbour counting out of loop</a></li>
</ul>
<p>or just check out the <a href="https://github.com/tomvanschaijk/wayoflife" target="_blank">develop</a> branch for the finished product.</p>
<p>In the rest of the article, I will look at each of these commits and highlight what I consider the most interesting and fun things I worked on. I assume you have some basic knowledge of Python and virtual environments. The requirements.txt file to spin up your own is included, so you can simply check out the code, create your own environment and run the code. Throughout the different sections and checkouts, some packages might be added to the requirements file, so you may occasionally need to install additional packages ;-)</p>
<h3 id="the-basic-implementation">The basic implementation</h3>
<p>So before we start getting creative, let’s get back to basics and look at what we will implement. In essence, all we need is a grid layout of cells, allow for some input by the user to mark cells as alive or dead, and a way to kick off the game. After that, it’s a matter of implementing the game rules iteratively, pushing each new state to the grid. Sounds simple enough. I used PyGame to implement the GUI. I’ve never been in love with front-end at all, but I was surprised by how easy PyGame was to get into and play around with. Besides PyGame, since we’re dealing with a grid of cells, the obvious choice is to go with NumPy to represent the data structure for the grid. Again, the code for this part can be found in <a href="https://github.com/tomvanschaijk/wayoflife/commit/4fcbf96c61d2f2a7529652c65309eb730640dae5" target="_blank">this commit</a>. Pretty much the only 2 functions that are worth talking about are the initialize and update functions:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">width</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">height</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">cell_size</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">pg</span><span class="p">.</span><span class="n">Surface</span><span class="p">,</span> <span class="n">pg</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">Clock</span><span class="p">]:</span>
<span class="s">"""Initialize all we need to start running the game"""</span>
<span class="n">pg</span><span class="p">.</span><span class="n">init</span><span class="p">()</span>
<span class="n">pg</span><span class="p">.</span><span class="n">display</span><span class="p">.</span><span class="n">set_caption</span><span class="p">(</span><span class="s">"Game of Life"</span><span class="p">)</span>
<span class="n">screen</span> <span class="o">=</span> <span class="n">pg</span><span class="p">.</span><span class="n">display</span><span class="p">.</span><span class="n">set_mode</span><span class="p">((</span><span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">))</span>
<span class="n">screen</span><span class="p">.</span><span class="n">fill</span><span class="p">(</span><span class="n">GRID_COLOR</span><span class="p">)</span>
<span class="n">columns</span><span class="p">,</span> <span class="n">rows</span> <span class="o">=</span> <span class="n">width</span> <span class="o">//</span> <span class="n">cell_size</span><span class="p">,</span> <span class="n">height</span> <span class="o">//</span> <span class="n">cell_size</span>
<span class="n">cells</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">rows</span><span class="p">,</span> <span class="n">columns</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
<span class="k">return</span> <span class="n">cells</span><span class="p">,</span> <span class="n">screen</span><span class="p">,</span> <span class="n">pg</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">Clock</span><span class="p">()</span>
</code></pre></div></div>
<p>It’s pretty clear what happens here. Besides setting up some boilerplate code to initialize PyGame, we create a simple matrix with zeros. A 0 at a certain position in the matrix represents a dead cell, whereas a 1 signifies an alive cell. This grid will then be taken as the input of the update function, in which we’ll calculate the new grid by applying the game rules to the current grid. We’ll also draw the rectangles that need to be redrawn based on the new situation:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">update</span><span class="p">(</span><span class="n">cells</span><span class="p">,</span> <span class="n">cell_size</span><span class="p">,</span> <span class="n">screen</span><span class="p">,</span> <span class="n">running</span><span class="p">):</span>
<span class="s">"""Update the screen"""</span>
<span class="n">updated_cells</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
<span class="k">for</span> <span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">ndindex</span><span class="p">(</span><span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">):</span>
<span class="n">alive_neighbours</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">])</span> <span class="o">-</span> <span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span>
<span class="n">color</span> <span class="o">=</span> <span class="n">BACKGROUND_COLOR</span> <span class="k">if</span> <span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span> <span class="k">else</span> <span class="n">NEW_COLOR</span>
<span class="k">if</span> <span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">2</span> <span class="ow">or</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">running</span><span class="p">:</span>
<span class="n">color</span> <span class="o">=</span> <span class="n">SURVIVOR_COLOR</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">running</span><span class="p">:</span>
<span class="n">color</span> <span class="o">=</span> <span class="n">DEAD_COLOR</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">running</span><span class="p">:</span>
<span class="n">color</span> <span class="o">=</span> <span class="n">NEW_COLOR</span>
<span class="n">pg</span><span class="p">.</span><span class="n">draw</span><span class="p">.</span><span class="n">rect</span><span class="p">(</span><span class="n">screen</span><span class="p">,</span> <span class="n">color</span><span class="p">,</span> <span class="p">(</span><span class="n">col</span> <span class="o">*</span> <span class="n">cell_size</span><span class="p">,</span> <span class="n">row</span> <span class="o">*</span> <span class="n">cell_size</span><span class="p">,</span>
<span class="n">cell_size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">cell_size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">return</span> <span class="n">updated_cells</span>
</code></pre></div></div>
<p>An important line of code here is:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">alive_neighbours</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">])</span> <span class="o">-</span> <span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span>
</code></pre></div></div>
<p>Here, we simply count the alive cells (the cells holding value 1) surrounding the cell under consideration. Since an alive cell holds value 1, a plain summation does the trick. This also happens to be one of the most computationally heavy lines of code in the whole algorithm: for each cell, we count how many alive neighbours it has. It’s easy to see how this approach becomes problematic as the grid size increases, especially if we want to perform many updates per second and make the whole experience somewhat enjoyable to look at.</p>
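<p>To make the slicing concrete, here is a minimal, self-contained sketch of that neighbour count on a tiny hypothetical grid (using an interior cell, so the slice doesn’t need clamping at the edges):</p>

```python
import numpy as np

# Tiny 3x3 grid: 1 = alive, 0 = dead
cells = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0]])

row, col = 1, 1  # an interior cell, so no edge clamping is needed
# Sum the 3x3 block around (row, col), then subtract the cell itself
alive_neighbours = np.sum(cells[row-1:row+2, col-1:col+2]) - cells[row, col]
print(alive_neighbours)  # 3: the three alive corner cells
```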
<p>The result of the count then determines the course of action for the considered cell. If it was alive (holding value 1), we color it as a survivor (if it has 2 or 3 alive neighbours) or as dead. A dead cell is colored as a new cell if it has exactly 3 alive neighbours. This is simply the application of the game rules, with the aforementioned change that we have 3 different cell states. Of course - for now - the states are only a visual feature, since we color a brand new cell differently than a ‘survivor cell’. In the grid data structure, we do not make this distinction yet.</p>
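<p>The ruleset itself boils down to a tiny decision function. As a sketch (the names are mine, not taken from the repository):</p>

```python
def next_state(alive: bool, alive_neighbours: int) -> bool:
    """Conway's rules: a live cell survives with 2 or 3 alive
    neighbours; a dead cell comes alive with exactly 3."""
    if alive:
        return alive_neighbours in (2, 3)
    return alive_neighbours == 3

print(next_state(True, 2))   # True: survivor
print(next_state(True, 4))   # False: overpopulation
print(next_state(False, 3))  # True: brand new cell
```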
<p>Besides these code snippets, you will not find much of interest in the basic implementation. There’s the main game loop, where the update function is called, and some event handling for marking and unmarking cells, and starting and stopping the game. Before starting the game, all you need to do is click in the grid to mark a cell as alive. Right-clicking said cell marks it as dead. Hitting spacebar starts and pauses the game. So it’s simple to make a few clusters of cells (do yourself a favor and don’t click individual cells, you can just click-and-drag over cells to mark them alive or dead), and hit space to make the game start.</p>
<h3 id="lets-speed-things-up">Let’s speed things up</h3>
<p>When you run the game yourself, you will notice how small the PyGame window is:
<img src="https://www.peculiar-coding-endeavours.com/assets/game_of_life/basic.png" alt="basic" />
We have a very small window of 800x600 pixels, with a cell size of 10 pixels. In short, our grid has 80 columns and 60 rows, for a total of 4800 cells. Even for such a small window, keep that one line of code in mind, where we count the number of neighbours of a cell… For each of those 4800 cells, we’ll have to perform a sum of the values of the 8 surrounding ones. It doesn’t seem too bright an idea to start doing that several times a second (at least 60 times, since 60 frames per second does sound like an enjoyable experience) on a grid bigger than a little thumbnail.</p>
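<p>A quick back-of-the-envelope calculation shows how much summing that means per second at a comfortable frame rate:</p>

```python
width, height, cell_size = 800, 600, 10
columns, rows = width // cell_size, height // cell_size
cells = columns * rows
print(columns, rows, cells)   # 80 60 4800

# At 60 updates per second, that's 288000 neighbour sums every second
sums_per_second = cells * 60
print(sums_per_second)        # 288000
```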
<p>So in the next section, we’ll make some big changes from our crude implementation into something that’s actually worth writing a blog about ;-) In case you do want to look at this step, check out <a href="https://github.com/tomvanschaijk/wayoflife/commit/705f115229768ec80cc675d7730f6463e5f43856" target="_blank">this commit</a>.</p>
<h2 id="a-dedicated-grid-class">A dedicated grid class</h2>
<p>Firstly, let’s create a class to express the grid, and hold all the game logic and ruleset. That way, the UI-related code is separated from the actual interesting stuff. So, from now on, all code in gameoflife.py will not be looked at anymore. Fiddle with and inspect it if you feel like it, but you will find nothing particularly interesting in there. All it does is set up PyGame, respond to user inputs, update the screen and handle the main game loop. The code that’s of actual interest all sits in conwaygolgrid.py. Most of the code in the class is fairly straightforward. The most important and interesting stuff happens in the update method, which changed pretty drastically from the previous iteration. In fact, the update function is now simply a pass-through to the static method __perform_update, in which the actual work is done.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">@</span><span class="nb">staticmethod</span>
<span class="o">@</span><span class="n">njit</span><span class="p">(</span><span class="n">fastmath</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__perform_update</span><span class="p">(</span><span class="n">cells</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">background_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">new_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">survivor_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">dead_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span>
<span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]]]:</span>
<span class="s">"""Updates the grid to the next step in the iteration, following
Conway's Game of Life rules. Evaluates each cell, and returns the
list of cells to be redrawn and their colors
"""</span>
<span class="n">cells_to_redraw</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">updated_cells</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span> <span class="o">*</span> <span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">for</span> <span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">ndindex</span><span class="p">(</span><span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">):</span>
<span class="n">alive_neighbours</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">cells</span><span class="p">[</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span>
<span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">])</span> <span class="o">-</span> <span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">])</span>
<span class="k">if</span> <span class="n">cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">2</span> <span class="ow">or</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">survivor_color</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">dead_color</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">new_color</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">background_color</span><span class="p">))</span>
<span class="k">return</span> <span class="n">updated_cells</span><span class="p">,</span> <span class="n">cells_to_redraw</span>
</code></pre></div></div>
<p>You will also notice the decorator called njit. Besides that, the function now returns a tuple of the updated cells and the cells to redraw on the screen. So what is that all about? Simply put: to remedy the fact that doing a lot of computation (however simple it is) in Python is always going to be slow, we use <a href="https://numba.pydata.org/" target="_blank">Numba</a>. In a nutshell, Numba is a jit-compiler for Python which allows you to compile a decorated function to machine code and execute it much faster. Numba works best on Numpy types, as well as basic for loops in Python. However, not all datatypes in Python are usable in Numba, and since the Numba-compiled code runs somewhat separately, you cannot simply pass in parameters and get them back as if you were writing plain Python. That is why the __perform_update function is marked as a static method, and data that is known in the class is still explicitly passed in instead of referenced through self.</p>
<p>Thanks to Numba, all the repetitive computation and looping that would be excruciatingly slow to suffer through in plain vanilla Python will now be handled just fine. The arguments added to the njit decorator are further optimizations that help performance along. In short, fastmath relaxes numerical rigour (acceptable here, since we are simply summing 8 neighbour values rather than calculating highly accurate decimal numbers), and setting cache to True caches the compiled code for subsequent executions. The very first time an njit-decorated function is called, it needs to be compiled, so this first call is slightly slower. All subsequent calls invoke the cached compiled function and execute very fast.</p>
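<p>As an illustration of the decorator’s usage (a minimal sketch, not taken from the repository; the try/except fallback just lets the snippet run even where Numba isn’t installed):</p>

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to a no-op decorator without Numba
    def njit(*_args, **_kwargs):
        return lambda func: func

@njit(fastmath=True, cache=True)
def count_alive(cells: np.ndarray) -> int:
    """Plain nested loops: slow in pure Python, fast once jit-compiled."""
    total = 0
    for row in range(cells.shape[0]):
        for col in range(cells.shape[1]):
            total += cells[row, col]
    return total

grid = np.ones((100, 100), dtype=np.int64)
print(count_alive(grid))  # 10000
```

The first call pays the compilation cost; after that, the cached machine code runs on every call.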
<p>If you care to convince yourself of the difference between using Numba and not using it: comment out the</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">@</span><span class="n">njit</span><span class="p">(</span><span class="n">fastmath</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<p>line of code. The program will run just fine. However, you will notice that for any decent amount of drawn cells before you hit space to kick off the game, the speed will be dreadful. Performing a lot of CPU-heavy work in a loop is simply not what Python was made for. For your suffering/amusement, I added an fps counter in the bottom left. You’ll notice that I kicked up the screen size to 1920x1200 pixels, giving us a grid of 192 columns and 120 rows, which translates to 23040 cells to recalculate every iteration. In pure Python, that results in a less-than-enjoyable 3fps. Do yourself a favor and uncomment that decorator. It will result in about 20-30 fps. Still not quite great, but at least there’s some movement to behold without wondering if there’s something wrong with your computer.</p>
<p>I’d even say, if your idea of “fun” and “cool” is to watch a cellular automaton evolve semi-quickly, you should start developing warm fuzzy feelings right about now. If so, we’re on the same frequency. For the 4 readers still following along to whom that applies: keep reading, because we can do better…</p>
<h3 id="lets-do-even-less">Let’s do even less</h3>
<p>As is often the case when speeding up algorithms, we can achieve that by doing less. Sure, it’s nice to see an existing framework do some work for us and speed things up. But let’s pretend it’s not 2023, and we can’t depend on others to do the work for us. Let’s actually think for a second, and see what we’re doing. We’ll also use the help of line-profiler, which we can use to see the running time of certain functions. I don’t want to dive into too much detail about this Python package here, but you can find more information about it <a href="https://pypi.org/project/line-profiler/" target="_blank">here</a>. In short, we will use it to diagnose the more costly operations inside of the __perform_update function. Since this is the main function of the game loop, it would be good to run it as fast as possible, to maximize the number of game state calculations per second and thus get a more fluid experience. Running the profiling.py file in the current state will yield an output as follows:
<img src="https://www.peculiar-coding-endeavours.com/assets/game_of_life/profiling1.png" alt="first_profiling_results" />
If you want to run this yourself, do make sure to comment out the njit-decorator first. line-profiler will only profile Python code. If you leave the njit-decorator active, the __perform_update function will be compiled to machine code, after which this compiled version will be called, and we will not get any valuable data back from line-profiler. The profiling.py file just runs one iteration of the function using an 800x600 grid with cell size 10, filled up with a random amount of alive and dead cells (well, there’s more chance for cells to be alive than dead, to at least have a decent chunk of data to calculate). Looking at the results, we can immediately draw 2 conclusions:</p>
<ol>
<li>The calculation to determine the alive neighbours of a cell (which basically steers our entire ruleset) is performed 4800 times. In short: for every cell.</li>
<li>Each individual calculation takes up a bit too much time. We slice up the Numpy array to get a subset of the cells, and then perform a sum on their values (0 or 1) to find out how many neighbours it has.</li>
</ol>
<p>Some thoughts about these things: if we envision the complete grid, it’s not hard to imagine that most of the cells in the grid are dead and/or surrounded by dead cells. For those cells, no calculation would need to happen at all, as they would not have to change state from one iteration to the next. Additionally, as we traverse the grid from row to row, and column to column, we are bound to do double work. The neighbours of the cell at coordinates [0, 1] are at least partially the same as the neighbours of the cell at coordinates [0, 0]. However, we still slice up the array around each of those coordinates and pretend the work from the past (being the previous cell) never happened. That’s definitely a waste of time.</p>
<p>In short, memoization and shrinking the search space are 2 classical approaches we can apply to this problem to make sure we limit the amount of work we do per iteration, and that the work we actually do is not lost. There are several approaches here, and it can sometimes become a balancing act between results on the one hand, and the readability and complexity of the resulting code on the other. Let’s go over the route I decided to take, and focus on the most important parts. You can find the actual code in <a href="https://github.com/tomvanschaijk/wayoflife/commit/6ffe2844d5979c2afe74de21a74c3e0445dffd0a" target="_blank">this commit</a>.</p>
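<p>For completeness: one classical way to kill the redundant per-cell slicing (not the route taken in this post, which shrinks the search space instead) is to compute all neighbour counts in one vectorized pass, by summing the eight shifted copies of a zero-padded grid. A sketch:</p>

```python
import numpy as np

def neighbour_counts(cells: np.ndarray) -> np.ndarray:
    """Count alive neighbours for every cell at once, instead of
    slicing the array separately for each cell."""
    padded = np.pad(cells, 1)  # zero border, so edges need no clamping
    counts = np.zeros_like(cells)
    rows, cols = cells.shape
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            counts += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return counts

# A vertical 'blinker': the middle cell has 2 alive neighbours,
# while the cells left and right of it have 3 (so they come alive)
blinker = np.array([[0, 1, 0],
                    [0, 1, 0],
                    [0, 1, 0]])
counts = neighbour_counts(blinker)
print(counts[1, 1], counts[1, 0], counts[1, 2])  # 2 3 3
```

Each shifted slice adds the neighbour at offset (dr, dc) to every cell simultaneously, so the cost per iteration is 8 full-array additions, no matter how large the grid gets.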
<h2 id="new-representation-of-cells">New representation of cells</h2>
<p>We’ll focus on the new __perform_update function:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">@</span><span class="nb">staticmethod</span>
<span class="o">@</span><span class="n">njit</span><span class="p">(</span><span class="n">fastmath</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__perform_update</span><span class="p">(</span><span class="n">cells</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">new_cells</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]],</span>
<span class="n">survivor_cells</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]],</span> <span class="n">dead_cells</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]],</span>
<span class="n">rows</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">columns</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">new_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">survivor_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">dead_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">background_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span>
<span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]]]:</span>
<span class="s">"""Updates the grid to the next step in the iteration, following
Conway's Game of Life rules. Evaluates each cell, and returns the
list of cells to be redrawn and their colors
"""</span>
<span class="c1"># Grab the coordinates of the non-background cells
</span> <span class="n">active_cells</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">new_cells</span><span class="p">.</span><span class="n">union</span><span class="p">(</span><span class="n">survivor_cells</span><span class="p">).</span><span class="n">union</span><span class="p">(</span><span class="n">dead_cells</span><span class="p">))</span>
<span class="c1"># Per active cell, grab the coordinates from surrounding cells, add them to a set
</span> <span class="c1"># to be able to evaluate each cell once.
</span> <span class="n">cells_to_evaluate</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">for</span> <span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">active_cells</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">rows</span><span class="p">)):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">columns</span><span class="p">)):</span>
<span class="n">cells_to_evaluate</span><span class="p">.</span><span class="n">add</span><span class="p">((</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">))</span>
<span class="n">updated_cells</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="bp">False</span><span class="p">]</span> <span class="o">*</span> <span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span> <span class="o">*</span> <span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">cells_to_redraw</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">cells_to_evaluate</span><span class="p">:</span>
<span class="n">cell</span> <span class="o">=</span> <span class="p">(</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">)</span>
<span class="c1"># Count the alive cells around the current cell
</span> <span class="n">alive_neighbours</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">cells</span><span class="p">[</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span>
<span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">])</span> <span class="o">-</span> <span class="n">cells</span><span class="p">[</span><span class="n">cell</span><span class="p">])</span>
<span class="k">if</span> <span class="n">cells</span><span class="p">[</span><span class="n">cell</span><span class="p">]:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">):</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">cell</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">survivor_color</span><span class="p">))</span>
<span class="n">survivor_cells</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">new_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">dead_color</span><span class="p">))</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">new_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">survivor_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">cell</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">new_color</span><span class="p">))</span>
<span class="n">new_cells</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">survivor_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">background_color</span><span class="p">))</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">return</span> <span class="n">updated_cells</span><span class="p">,</span> <span class="n">cells_to_redraw</span>
</code></pre></div></div>
<p>Quite the change, and quite a few more incoming arguments, but as you will hopefully realize, the complexity didn’t grow that much, whereas the speed increased substantially. To reiterate, our previous implementation had 2 issues:</p>
<ol>
<li>We perform a calculation for each cell in the grid. This doesn’t scale when the grid becomes large.</li>
<li>Each calculation ignores the previous work being done (being the neighbour count for cells that have overlapping neighbours)</li>
</ol>
<p>To remedy this, I have introduced a few simple changes:</p>
<ol>
<li>Instead of holding the grid in a simple Numpy matrix, I also introduced dedicated sets for new, survivor and dead cells. Each of these sets only holds the coordinates of the cells in that respective state. At any point in time, we know which cells are in which state.</li>
<li>The result of that change (which basically trades increased space complexity for better time complexity) enables us to just take the coordinates of cells that are either new since the previous iteration, surviving from the previous generation, or newly marked as dead by the application of the game rules in the previous iteration. Any cell that is just “part of the background”, and which wasn’t even changed in the previous iteration, is not even taken into consideration when calculating the next game state. As a reminder: for most of the runtime of the game, that’s actually most of the cells in the grid. In short, just like that, we could cut out most of the work in each loop.</li>
</ol>
<p>An important detail, of course, is that it’s not ONLY the active cells (new cells + survivor cells + dead cells) that can change state in the next iteration of the game, but also their immediate surrounding cells. So, once we know the coordinates of all active cells, we add them and their immediate surrounding cells to a set, which results in the collection of cells we have to calculate the next game state for. Using a set gives us an easy built-in way to avoid adding the same cell twice.</p>
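<p>That gathering step can be sketched in isolation. This is a minimal, self-contained illustration on a hypothetical 4x4 grid (the names and grid size are illustrative, not the project’s actual code): each active cell contributes itself plus its surrounding cells, clamped to the grid bounds, and the set deduplicates any overlap.</p>

```python
# Illustrative mini-grid: gather active cells plus their direct neighbours
# into a set, so each candidate cell is evaluated exactly once.
ROWS, COLUMNS = 4, 4
active_cells = {(0, 0), (2, 3)}

cells_to_evaluate = set()
for row, col in active_cells:
    # Clamp the 3x3 neighbourhood to the grid edges
    for i in range(max(0, row - 1), min(row + 2, ROWS)):
        for j in range(max(0, col - 1), min(col + 2, COLUMNS)):
            cells_to_evaluate.add((i, j))

# (0, 0) sits in a corner, so its clamped neighbourhood has 4 cells;
# (2, 3) sits on the right edge, so its clamped neighbourhood has 6 cells.
print(sorted(cells_to_evaluate))
```

Only 10 of the 16 grid cells end up being evaluated here; on a mostly-empty grid the savings are far more dramatic.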
<p>In essence, we still do the same thing in the new function: we iterate over the cells we want to take into consideration, count the alive neighbours of each cell and simply apply the basic game rules to it. We of course need some extra bookkeeping to make sure each of our sets holds the correct cells at any point in time, but that is trivial. If we run the profiling on these changes, an example output looks as follows:
<img src="https://www.peculiar-coding-endeavours.com/assets/game_of_life/profiling2.png" alt="second_profiling_results" />
Some conclusions here:</p>
<ol>
<li>Grabbing the active cells by taking the union of the individual sets for new, surviving and dead cells is cheap. Also, it’s just performed once.</li>
<li>Taking the surrounding cells of these active cells, yielding us the list of cells we want to perform the main calculations on, is also a lightweight operation.</li>
<li>Counting the neighbours per cell is still a costly operation (we didn’t really change this), but we perform it MUCH less often. As opposed to doing it for each cell in each iteration (which would be 4800 times for an 800x600 grid with cell size 10), we only do it 2260 times in this case. And remember: this is for a grid that is very filled up, which is quite rare and close to a worst-case scenario.</li>
</ol>
<p>Besides just looking at numbers, start the application (don’t forget to uncomment the njit-decorator if you commented it out to generate the profiling results). Notice I kept the screen size of 1920x1200 the same, but changed the cell size to 5x5 pixels. That means 384 columns and 240 rows, and as such, 92160 cells to handle. And still, instead of reaching 20-30fps, we are now reaching 30-40fps for more than 3 times the number of cells! In fact, to be snarky about it, comment out that njit-decorator again. You’ll reach about 10fps. That’s not great in itself, but think back to the first implementation. We had an 800x600 grid with 4800 cells, and reached 3fps using pure Python. Now, thanks to the optimization we built ourselves instead of relying on precompiled, highly-optimized code, we reach 10fps in a grid with 92160 cells. Sometimes, even if it’s just as an exercise for yourself, it’s more satisfying to think outside the box instead of relying on pre-built tools to do the job for you. And the combination of both can yield great results.</p>
<p>Now, this is by no means an upper limit of what you can reach. However, it’s a fun optimization to reach, despite having to deal with a language that is really not built or suited for this kind of work. Build this project in C++ or Rust if you want to see real speed. For Python though, I’d call this a satisfying result. Remember: we are still calculating the neighbours of each cell individually. You COULD perform some memoization there and re-use the work you performed for previous cells. There are many ways to do this, again with varying added complexity as a result.</p>
<h2 id="a-different-way-to-count-neighbours">A different way to count neighbours</h2>
<p>Let’s inspect how I decided to do it, and check out <a href="https://github.com/tomvanschaijk/wayoflife/commit/0a30d7abc94f2d1f51a703fe6a2df347b6dec260" target="_blank">this commit</a>. The new __perform_update function looks like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="o">@</span><span class="nb">staticmethod</span>
<span class="o">@</span><span class="n">njit</span><span class="p">(</span><span class="n">fastmath</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__perform_update</span><span class="p">(</span><span class="n">cells</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">neighbour_count</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
<span class="n">new_cells</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]],</span>
<span class="n">survivor_cells</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]],</span>
<span class="n">dead_cells</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]],</span>
<span class="n">rows</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">columns</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="n">new_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">survivor_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">dead_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span>
<span class="n">background_color</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]</span>
<span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span>
<span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]]]:</span>
<span class="s">"""Updates the grid to the next step in the iteration, following Conway's
Game of Life rules. Evaluates each cell, and returns the list of cells
to be redrawn and their colors
"""</span>
<span class="c1"># Grab the coordinates of the non-background cells
</span> <span class="n">active_cells</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">new_cells</span><span class="p">.</span><span class="n">union</span><span class="p">(</span><span class="n">survivor_cells</span><span class="p">).</span><span class="n">union</span><span class="p">(</span><span class="n">dead_cells</span><span class="p">))</span>
<span class="c1"># Per active cell, grab the coordinates from surrounding cells,
</span> <span class="c1"># add them to a set to be able to evaluate each cell once.
</span> <span class="n">cells_to_evaluate</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">for</span> <span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">active_cells</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">rows</span><span class="p">)):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">columns</span><span class="p">)):</span>
<span class="n">cells_to_evaluate</span><span class="p">.</span><span class="n">add</span><span class="p">((</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">))</span>
<span class="n">updated_cells</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="bp">False</span><span class="p">]</span> <span class="o">*</span> <span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span> <span class="o">*</span> <span class="n">cells</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">neighbour_count_to_update</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">int</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">cells_to_redraw</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]]]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">cells_to_evaluate</span><span class="p">:</span>
<span class="n">cell</span> <span class="o">=</span> <span class="p">(</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">)</span>
<span class="c1"># Count the alive cells around the current cell
</span> <span class="n">alive_neighbours</span> <span class="o">=</span> <span class="n">neighbour_count</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span>
<span class="k">if</span> <span class="n">cells</span><span class="p">[</span><span class="n">cell</span><span class="p">]:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">):</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">cell</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">survivor_color</span><span class="p">))</span>
<span class="n">survivor_cells</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">new_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">dead_color</span><span class="p">))</span>
<span class="n">neighbour_count_to_update</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">cell</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">))</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">new_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">survivor_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">alive_neighbours</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">updated_cells</span><span class="p">[</span><span class="n">cell</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">new_color</span><span class="p">))</span>
<span class="n">neighbour_count_to_update</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">cell</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">new_cells</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">survivor_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">cells_to_redraw</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">background_color</span><span class="p">))</span>
<span class="n">dead_cells</span><span class="p">.</span><span class="n">discard</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
<span class="k">for</span> <span class="n">coordinates</span><span class="p">,</span> <span class="n">delta</span> <span class="ow">in</span> <span class="n">neighbour_count_to_update</span><span class="p">:</span>
<span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="n">coordinates</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">rows</span><span class="p">)):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">columns</span><span class="p">)):</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="o">!=</span> <span class="n">coordinates</span><span class="p">:</span>
<span class="n">neighbour_count</span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)]</span> <span class="o">+=</span> <span class="n">delta</span>
<span class="k">return</span> <span class="n">updated_cells</span><span class="p">,</span> <span class="n">cells_to_redraw</span>
</code></pre></div></div>
<p>You will notice it’s pretty much the same as in the last commit, except for one crucial change:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">alive_neighbours</span> <span class="o">=</span> <span class="n">neighbour_count</span><span class="p">[</span><span class="n">row</span><span class="p">,</span> <span class="n">col</span><span class="p">]</span>
</code></pre></div></div>
<p>and later in the function:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">coordinates</span><span class="p">,</span> <span class="n">delta</span> <span class="ow">in</span> <span class="n">neighbour_count_to_update</span><span class="p">:</span>
<span class="n">row</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="n">coordinates</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">row</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">row</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">rows</span><span class="p">)):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">col</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="nb">min</span><span class="p">(</span><span class="n">col</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">columns</span><span class="p">)):</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="o">!=</span> <span class="n">coordinates</span><span class="p">:</span>
<span class="n">neighbour_count</span><span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)]</span> <span class="o">+=</span> <span class="n">delta</span>
</code></pre></div></div>
<p>In short, we have introduced a new matrix of the same size as the grid. Each entry simply holds the number of alive neighbours of the cell at that index. When the grid is initialized, each value is obviously zero. Then, as the user clicks a cell, or as we calculate the new state of each cell, we update that matrix. This approach to optimizing the “counting of neighbours” problem is called amortization. You can read about it <a href="https://www.geeksforgeeks.org/introduction-to-amortized-analysis/" target="_blank">here</a>, but simply put, it is a way to defer the cost of a (time-wise) expensive operation and spread it out over time, instead of suffering the full brunt of it all at once. It is closely related to memoization: we remember earlier work instead of redoing it.</p>
<p>Very simply put: instead of counting all neighbours of each cell in each iteration, we remember the number of alive neighbours of each cell, and update that value whenever it changes. This guarantees that whenever we want to know how many alive neighbours a cell has, we can just fetch that value in constant time. This trade-off (performing a bit of extra work when we change the state of a cell, as opposed to counting the neighbours of every cell even when it didn’t change state) results in even faster game state calculation.</p>
<p>If you run the program, you will notice that (unless you completely overload the grid with alive cells) it will reach 50-60fps now. And even while commenting out the njit-decorator, reverting us to pure Python and not having the luxury of a precompiled optimized C-function, we get close to 20fps.</p>
<h3 id="some-more-numbers">Some more numbers…</h3>
<p>Besides actually looking at the game play out, and seeing the obvious speed changes from commit to commit, let’s check out <a href="https://github.com/tomvanschaijk/wayoflife/commit/bceecfc3561093ea7174c4123d8d8903ad7385f2" target="_blank">a commit</a> with no code changes, but where I have added 3 profiling results:</p>
<ul>
<li>profiling_results_numba_only: the optimized results due to the use of Numba only</li>
<li>profiling_results_smaller_searchspace: as the previous, but with the search space limited as explained earlier</li>
<li>profiling_results_optimized_neighbour_count: more of the same, but we also changed the way we count neighbours</li>
</ul>
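<p>The exact profiling harness isn’t shown in the post, but if you want to reproduce this kind of breakdown yourself, the standard library’s cProfile is one common way to do it. A hedged sketch, where run_one_generation is a hypothetical stand-in for a single game-state update:</p>

```python
import cProfile
import pstats

def run_one_generation() -> int:
    """Hypothetical stand-in for one game-state update; any workload works."""
    return sum(i * i for i in range(10_000))

# Profile just the interesting call, not the whole program
profiler = cProfile.Profile()
profiler.enable()
run_one_generation()
profiler.disable()

# Sort by cumulative time and show the heaviest entries, similar to the
# screenshots in this post
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)
```

Note that profiling a Numba-compiled function only shows the compiled call as one opaque entry, which is why the njit-decorator has to be commented out to see the Python-level breakdown.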
<p>In fact, if you are still reading this, you MUST be a developer, and hence you are lazy. Instead of checking out the code, here are the screenshots of these profiling results:
<img src="https://www.peculiar-coding-endeavours.com/assets/game_of_life/profiling_numba_only.png" alt="numba_only" />
<img src="https://www.peculiar-coding-endeavours.com/assets/game_of_life/profiling_smaller_searchspace.png" alt="smaller_searchspace" />
<img src="https://www.peculiar-coding-endeavours.com/assets/game_of_life/profiling_optimized_neighbourcount.png" alt="optimized_neighbourcount" /></p>
<p>Now, this is just a sample of course, but a single calculation of a new game state on an 800x600 grid with a cell size of 10x10 pixels went from 0.084secs with Numba, to 0.055secs by limiting the search space, to 0.021secs by optimizing the neighbour count. So, although Numba definitely unlocked a big speed gain, some critical thinking about the actual work, and a few simple changes, granted us a 4x speed increase, even though we used a language that is not really built for heavy computation.</p>
<p>In reality, we kind of danced around that problem by moving the complexity from time to space, and simply did less computation. It’s not hard to imagine that a compiled language more suited to this kind of work, like C++ or Rust, would reach much better performance. Turning that argument on its head though: had we used those languages, we probably wouldn’t even have considered these optimizations, because the language would have done most of the work for us.</p>
<p>Using Python for this type of little project does unlock a lot of ways to have fun. You can use libraries that do a lot of work for you, like Numpy and Numba, but if you care to dig deeper, the sheer fun of the language and its easy syntax encourage you to try out new approaches. Before I converged on the current implementation, I tried many different approaches for counting neighbours and such. Because of how easy it is to write Python, you can iterate on solutions and prototype very quickly. And ironically, precisely because it wasn’t made for computationally heavy work, you can mimic the 80’s and 90’s again, when you had to depend on your own ingenuity and creativity to come up with a good solution under certain hardware (or in this case, language) constraints.</p>
<p>In fact, this frame of mind got me into solving the traveling salesman problem in Python in 4 different ways. That’s for my next blog though, although you can already see the results <a href="https://www.youtube.com/watch?v=XCZSwM--vCA" target="_blank">right here</a>. I had a lot of fun with that one, especially the genetic algorithm and ant colony optimization implementations. Something I had always wanted to try, and finally got around to.</p>
<h3 id="features">Features…</h3>
<p>However… before getting to that: after I implemented the optimizations mentioned above, and saw that I could reach quite a decent framerate, I decided it was time to stop thinking too much and just have some fun with it. If you check out the last version of the develop branch, you’ll find that the code changed quite a bit. Just start the program and press F1 when the PyGame window appears, and you will get a short summary of what you can do. Again, the results are summarized in this <a href="https://youtu.be/2HOLWExgwzU" target="_blank">YouTube video</a>, but I leave it to you to play around with it, and maybe use it as inspiration to optimize it even further and just run with it. I think especially the manual step forward and revert functionality can be handy to learn how the Game of Life works, if that’s your cup of tea. If that’s not your deal and you just want to experiment with self-induced epilepsy, have at it and change colors in an overly crowded grid. Don’t say I made you. Just a hint: throwing on a few of the eternal generator presets I built in, as well as overlaying the grid with the cross-pattern repeatedly as it runs, can yield pretty cool results and long-running games. Again, my idea of fun might slightly diverge from yours.</p>
<p>Thank you for reading this entire blog, I’m sure there are more enjoyable and relaxing ways to spend your time on the internet. There is still a lot I could talk about concerning the implementation of several features, the profiling, the caching of cells bloomed with OpenCV,… If you have any questions, remarks or interesting thoughts to share, let me know! If you improve on the implementation or have ideas on how to optimize it further, don’t hold back: either implement it yourself and share the results, or plant that seed in my mind by sharing. I might just be crazy enough to start doing it. Again, thanks for reading this blog, you did it to yourself, but I thank you all the same ;-)</p>
Sat, 11 Feb 2023 16:00:00 +0000
https://www.peculiar-coding-endeavours.com/2023/game-of-life/
Prime skills?<p>I was always charmed by the idea and the practice of algorithm-heavy programming. The heavy lifting, making seemingly complex and daunting tasks run like **** off a shovel.</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/prime_skills/eratosthenes.png" alt="eratosthenes" /></p>
<p>That’s what got me into programming 20 years ago, as I started ploughing my way through lines of TurboPascal code, and it’s what keeps me loving programming nowadays as well. Since that time, I underwent VB 6.0, learned a bit of Java, some VB.NET, dabbled in VBA, fell in love with C#, learned to hate JavaScript, played with meta-programming (Rascal) and functional programming (Haskell) in university, and because of my study endeavours in the field of data science and AI, am experiencing a fair bit of Python and R as well.</p>
<p>Now, because of all this, I do spend a fair amount of time wandering through sites like <a href="https://www.topcoder.com" target="_blank">TopCoder</a>, <a href="https://www.hackerrank.com" target="_blank">HackerRank</a> and <a href="https://www.codefights.com" target="_blank">CodeFights</a>. You know, all those nerdy coding websites for developers that don’t particularly enjoy learning all the ways to create flashy front-end web designs. The type of programmer who believes that high-quality code, efficiency and performance actually matter, and is convinced that mastering those skills will set you up for a rewarding career in the long haul (how naive we are, we’d better stick with current trends and learn Angular2 right? NO!! Although that’s a topic for another post). I do not proclaim to be great at all this stuff, but I like a challenge from time to time. I came across such a challenge very recently, and after being presented with the possible solution, I got that typical thought I sometimes get. It goes something like this:</p>
<blockquote>
<p>“Yeah, you can do it like that, but surely there’s a better way, I just don’t remember it right now, but let me dig into it for a minute and I’ll get right back to you.”</p>
</blockquote>
<p>We all have those moments. Now let’s face it, unless you work at Google, Apple, Microsoft, Amazon or some other company where you work through problems with complexity similar to theirs, you simply will not get confronted with algorithm-heavy programming. That’s a reality. Now for most people, that’s no problem at all. Most people don’t have the ambition to do so, and will spend their careers not once having to solve such a problem. But let’s ignore those people for now (sorry guys ;-) ) and start from the philosophy that, although you may not need to know all sorting/searching/pathfinding algorithms for your day-to-day work or side-projects, you can be damn sure that knowing them will make you a much better programmer. Additionally, depending on your goals and ambitions, it does increase your chances of getting involved in more complex areas of development, with higher impact, and brings you in contact with smart and highly skilled people.</p>
<p>Obviously, you forget a lot of the details of that stuff if you don’t work through it frequently enough. So, from time to time, you owe it to yourself to pick up a book or online course and brush up on those skills. It sure beats sitting on your couch watching tv. The question is: where to start? You can’t jump into the overly complex stuff right away. Although sorting and searching algorithms are a good start, my recent experience offers a chance to leave the beaten path a bit and introduce an alternative starting point for getting into algorithms.</p>
<h3 id="the-challenge">The challenge</h3>
<p>So, what’s the deal? Well, the deal/assignment/challenge was as follows:</p>
<blockquote>
<p>“Get me all the prime numbers below 100!”</p>
</blockquote>
<p>Now, after reading that sentence, 95% of you will ignore the rest of this post and be on your merry way. 4.9% will think</p>
<blockquote>
<p>“Alright, that’s nice and all, but that’s a beginner problem…”</p>
</blockquote>
<p>and move on to more interesting areas of the internet. Can’t blame them, since in terms of algorithms, this is indeed a beginner problem. The goal, however, is to shed some light on how to go about thinking about any algorithmic problem, instead of just handing you the solution and packing up. No matter how complex the problem is, the way to approach it is often very similar. So, for the 3 programmers who stuck around, and the 2 recruiters who spend their days browsing developer profiles, hoping to snatch them away (you know who you are), and who are interested in someone’s thought process, here goes…</p>
<h3 id="the-problem-area">The problem area</h3>
<p>First off, we might want to reiterate what the hell a prime number is. Most of us will obviously know, but to really make sure we understand completely and don’t forget about edge cases, let’s just define the problem area here.</p>
<blockquote>
<p>A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself.</p>
</blockquote>
<p>So, that’s pretty clear.</p>
<ul>
<li>Natural number.</li>
<li>Greater than one.</li>
<li>No positive divisors besides itself and 1.</li>
</ul>
<p>Ok, now that that’s obvious, let’s just have a first go at this…</p>
<h3 id="a-naive-solution">A naive solution</h3>
<p>No, that’s not an insult to the people that will come up with the solution I’ll explain below. A naive solution to a problem is basically the algorithm that solves the problem with no particular aim towards optimization. It’ll get the job done, but that’s it most of the time.</p>
<p>So one possible naive solution is the one below. By the way, I’m writing this in C#; Java or C++ programmers out there will have similar constructs for the collections I use. In general, any C-type language connoisseur (see how I used a relatively rare word there?) will have no problem understanding at least the syntax of my coding gibberish. Also, note that I’m using the pinnacle of front-end splendor (a console application), which is why my functions are static. If you don’t like that, go back and read the intro again; you’ll realize you should have known.</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">long</span><span class="p">[]</span> <span class="nf">SimplePrimes</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p"><</span> <span class="m">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="p">};</span> <span class="p">}</span>
<span class="kt">var</span> <span class="n">primes</span> <span class="p">=</span> <span class="k">new</span> <span class="n">List</span><span class="p"><</span><span class="kt">long</span><span class="p">>();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="m">2</span><span class="p">;</span> <span class="n">i</span> <span class="p"><=</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">isPrime</span> <span class="p">=</span> <span class="k">true</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">j</span> <span class="p">=</span> <span class="m">2</span><span class="p">;</span> <span class="n">j</span> <span class="p"><</span> <span class="n">i</span><span class="p">;</span> <span class="n">j</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="p">%</span> <span class="n">j</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">isPrime</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isPrime</span><span class="p">)</span> <span class="p">{</span> <span class="n">primes</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">i</span><span class="p">);</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">primes</span><span class="p">.</span><span class="nf">ToArray</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>That’s a pretty raw version, and as far as I can see it’s pretty much the most horrible solution that still gets the job done. Of course, this should immediately trigger the</p>
<blockquote>
<p>“That can’t be ideal…”</p>
</blockquote>
<p>itch that you just have to scratch, but it’s a place to start from. So, in short: what happens here?</p>
<ul>
<li>Right off the bat, I’m returning an empty array in the case that n is smaller than 2. Primes are greater than 1, remember?</li>
<li>Then we start looping i from 2 to n, which is our upper limit. Those are basically the numbers we have to test for being prime.</li>
<li>In our loop, we set isPrime to true, to assume i is a prime.</li>
<li>In an inner loop (red alert) we loop from 2 to i (well, i-1 really) and we check if i mod j equals 0. If that’s the case, guess what, i is not prime, we set isPrime to false, and we can break out of the loop. In the case of i being 2, the inner loop doesn’t even run once, so 2 gets added, we’re safe there.</li>
<li>If our flag survived the inner loop, i is prime, so we add it to the list.</li>
</ul>
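<p>For anyone who wants to poke at this outside a C# project, here is the same naive algorithm as a quick Python sketch (my own translation, not part of the original console application):</p>

```python
def simple_primes(n):
    """Naive trial division: test every i in [2, n] against all j < i."""
    if n < 2:
        return []
    primes = []
    for i in range(2, n + 1):
        is_prime = True
        # Inner loop (the "red alert"): try every candidate divisor below i
        for j in range(2, i):
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(i)
    return primes

print(simple_primes(10))  # [2, 3, 5, 7]
```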
<h3 id="some-issues">Some issues</h3>
<p>Now, obviously this is not a particularly smart way to go about things. Whenever you start writing inner loops, things potentially get dangerous. So we have something to inspect: what really happens in the inner loop? In practice it doesn’t do a full n² iterations: the inner loop never runs all the way to n (in fact, that wouldn’t even work) and breaks out early for composites, so that is at least something. Some immediate observations though:</p>
<ol>
<li>The outer loop runs from 2 to n. I don’t like that. Any n lower than 2 made us jump out of the function already. So we know n is at least 2 by the time we get to the loop (if we ever get there). So why not already add 2 to the list of primes, and start looping from 3? It’s a detail, but it counts.</li>
<li>Building on that, we know that even numbers higher than 2 cannot be prime. So instead of testing every even number, why not increase i by 2 every time, so i will be 3, 5, 7, 9, …? That already saves us half the trouble. I like that.</li>
<li>Now for the optimization that was in the solution I got presented, and that actually sparked my desire to write this blog. Concerning the inner loop: we don’t really need to loop all the way to i. Let’s say i equals 17, so we’re testing 17 for being prime. We don’t have to test 17 % 2, 17 % 3, 17 % 4 all the way to 17 % 16. It’s enough to let the inner loop go to the square root of i: any divisor larger than √i pairs with one smaller than √i, so by the time we reach the square root we have covered every possible factorization. In itself a pretty strong optimization, since it takes us from O(n²) down to roughly O(n·√n): the outer loop runs n times, the inner one at most √n times.</li>
<li>Just a thought on the side: when i is a prime, you have to run the inner loop all the way from 2 to the bound. If i is not prime, you break out of the loop much sooner. So it’s funny that a successful find is much more expensive.</li>
<li>Although the inner loop does not go all the way to n, it does get progressively worse the higher i gets. Sure, we avoid true quadratic complexity, but it just doesn’t feel right to me. Now, we are only running to 100, so we’re not going to prove much. In short, I’m thinking</li>
</ol>
<blockquote>
<p>“Who cares about all primes below 100? I’d have to write a pretty horrific piece of code for that to run slow… No, I’ll focus on primes below 1,000,000 (for starters). That’ll give me a better indication of how efficiently the code runs.”</p>
</blockquote>
<p>In case you wonder how this actually performs, bear with me; the results are shown at the bottom of this post, together with the results of the optimizations. But first, let’s take the remarks into account and refactor this into something that is likely to do a much better job.</p>
<h3 id="take-2">Take 2</h3>
<p>If we implement optimizations for remarks 1 to 3 (and some extra stuff) from the list above, we arrive at the following code:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">long</span><span class="p">[]</span> <span class="nf">OptimusPrime</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p"><</span> <span class="m">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p">==</span> <span class="m">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="m">2</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p">==</span> <span class="m">3</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span> <span class="p">};</span> <span class="p">}</span>
<span class="kt">var</span> <span class="n">primes</span> <span class="p">=</span> <span class="k">new</span> <span class="n">List</span><span class="p"><</span><span class="kt">long</span><span class="p">>()</span> <span class="p">{</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span> <span class="p">};</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="m">5</span><span class="p">;</span> <span class="n">i</span> <span class="p"><=</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span> <span class="p">+=</span> <span class="m">2</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">isPrime</span> <span class="p">=</span> <span class="k">true</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">j</span> <span class="p">=</span> <span class="m">2</span><span class="p">;</span> <span class="n">j</span> <span class="p"><=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="n">i</span><span class="p">);</span> <span class="n">j</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="p">%</span> <span class="n">j</span> <span class="p">==</span> <span class="m">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">isPrime</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isPrime</span><span class="p">)</span> <span class="p">{</span> <span class="n">primes</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">i</span><span class="p">);</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">primes</span><span class="p">.</span><span class="nf">ToArray</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The check for n being smaller than 2 has stayed the same. For giggles, I added two silly little micro-optimizations for the cases of n being 2 or 3, where we return finished arrays right away.</p>
<p>Only if n is 5 or higher does the actual algorithm kick in. We already add 2 and 3 to the list, since those are obviously prime and always part of the answer if we made it this far. Then we start looping from 5, adding 2 every iteration, ensuring we only test odd numbers.</p>
<p>Concerning the inner loop, we only run up to the square root of i, cutting the number of iterations of that loop way down. (One caveat: Math.Sqrt(i) is re-evaluated on every pass through the inner loop condition; computing it once before the loop, or comparing j * j <= i instead, avoids the repeated calls.)</p>
<p>Without even presenting you with test results right now, you can already tell that this implementation will be way more efficient. However, that’s not what I was thinking when faced with this solution (although the original one was slightly less optimized than this), since in the back of my mind, I remembered parts of a much better solution, with much higher efficiency.</p>
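<p>As a runnable illustration of remarks 1 to 3, here is the same optimized trial division sketched in Python. The j * j <= i comparison is my own substitute for the Math.Sqrt call; it stays in integer arithmetic while testing exactly the same bound:</p>

```python
def optimus_prime(n):
    """Trial division, skipping even candidates and stopping at sqrt(i)."""
    if n < 2:
        return []
    if n == 2:
        return [2]
    if n == 3:
        return [2, 3]
    primes = [2, 3]          # 2 and 3 are prime; start testing odd numbers at 5
    i = 5
    while i <= n:
        is_prime = True
        j = 2
        while j * j <= i:    # same as j <= sqrt(i), without floating point
            if i % j == 0:
                is_prime = False
                break
            j += 1
        if is_prime:
            primes.append(i)
        i += 2               # only odd candidates from here on
    return primes

print(optimus_prime(10))  # [2, 3, 5, 7]
```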
<h3 id="so-lets-try-that-again">So let’s try that again</h3>
<p>I did remember a much better solution that I learned once but had kind of forgotten. Weird as this may sound, I’m not calculating prime numbers in my spare time (except for this one time, I guess), so this one was pretty rusty. After checking online and brushing up on some things, I remembered the Sieve of Eratosthenes, a pretty old but simple algorithm to calculate primes very efficiently. Of course, you can just copy an implementation from a website and go with that, but I would strongly advise you to</p>
<blockquote>
<p><strong>NOT EVER DO THAT</strong></p>
</blockquote>
<p>It may be (partly) wrong and you won’t learn a thing. And frankly: if you think spending your free time checking out algorithms on prime numbers and writing a blog post about it is a bit weird, how does copying one from the internet make you feel? That’s just sad.</p>
<p>The way the algorithm works is actually pretty simple. You create a list of candidate numbers, from 2 all the way up to your upper limit. Then you start with 2: you mark it as prime, and then mark all its multiples as non-prime (since they are composite numbers after all). You then search for the first unmarked number in the list. That will be a prime, and you mark all of its multiples as non-prime in turn. You repeat this until the prime you’re working with passes the square root of the upper limit (every composite below the limit has a factor no larger than that). All remaining unmarked numbers are the primes.</p>
<p>The algorithm as I interpreted it:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">long</span><span class="p">[]</span> <span class="nf">SieveOfEratosthenes</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p"><</span> <span class="m">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p">==</span> <span class="m">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="m">2</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p">==</span> <span class="m">3</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span> <span class="p">};</span> <span class="p">}</span>
<span class="c1">// Fill array</span>
<span class="kt">var</span> <span class="n">numbers</span> <span class="p">=</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[</span><span class="n">n</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">numbers</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">=</span> <span class="n">i</span> <span class="p">+</span> <span class="m">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="p"><=</span> <span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="n">n</span><span class="p">);)</span>
<span class="p">{</span>
<span class="c1">// Calculate the current number</span>
<span class="kt">var</span> <span class="n">curNr</span> <span class="p">=</span> <span class="n">i</span> <span class="p">+</span> <span class="m">1</span><span class="p">;</span>
<span class="c1">// Set all multiples of curNr to 0 (they are not prime)</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">j</span> <span class="p">=</span> <span class="p">(</span><span class="kt">long</span><span class="p">)</span><span class="n">Math</span><span class="p">.</span><span class="nf">Pow</span><span class="p">(</span><span class="n">curNr</span><span class="p">,</span> <span class="m">2</span><span class="p">)</span> <span class="p">-</span> <span class="m">1</span><span class="p">;</span> <span class="n">j</span> <span class="p"><</span> <span class="n">n</span><span class="p">;</span> <span class="n">j</span> <span class="p">+=</span> <span class="n">curNr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">numbers</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Find the next unmarked (!= 0) entry in the array</span>
<span class="n">i</span><span class="p">++;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">j</span> <span class="p">=</span> <span class="n">i</span><span class="p">;</span> <span class="n">j</span> <span class="p"><</span> <span class="n">n</span><span class="p">;</span> <span class="n">j</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">numbers</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="p">!=</span> <span class="m">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">i</span> <span class="p">=</span> <span class="n">j</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">numbers</span><span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">e</span> <span class="p">=></span> <span class="n">e</span> <span class="p">!=</span> <span class="m">0</span><span class="p">).</span><span class="nf">ToArray</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>What in the hell happens here? Well, let’s go through it step by step:</p>
<ul>
<li>First, we go through the special cases where n is lower than 2, or equals 2 or 3.</li>
<li>Then we fill a numbers array with the values 2 to n (index 0 keeps its default of 0, so the non-prime 1 never makes it into the results).</li>
<li>Then we start looping until we reach the square root of n, basically for the same reason as we did when optimizing the simple algorithm.</li>
<li>Now, again we do an inner loop, which would normally be reason to get worried. The thing is, though, we only start from the square of the current number, because all smaller multiples were already marked as 0 by the inner loops for lower numbers. This combination of a limited outer loop and a limited inner loop makes for a very (guess what) limited number of operations, which in turn catapults the efficiency way up.</li>
<li>After setting all multiples of the current number to 0, we go and look for the next number in the list that is not marked as 0.</li>
</ul>
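<p>The steps above can be sketched in Python as well. This version is my own: it uses a boolean flag array instead of zeroing values in a numbers array, but the bookkeeping is the same:</p>

```python
def sieve_of_eratosthenes(n):
    """Classic sieve: for each prime p, mark its multiples starting at p*p."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False   # 0 and 1 are not prime
    p = 2
    while p * p <= n:                   # no need to sieve past sqrt(n)
        if is_prime[p]:
            # Multiples below p*p were already marked by smaller primes
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]

print(sieve_of_eratosthenes(10))  # [2, 3, 5, 7]
```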
<p>You will see (very shortly) from the results that this algorithm yields really big gains in execution time. Although it is definitely not the fastest algorithm possible, it is a really simple way to tackle the problem, with pretty good results.</p>
<p>Now, I wouldn’t be the idealist I am if I didn’t want to optimize this even further. Remember when I optimized the simple algorithm to make sure it only considered odd numbers (besides 2)? Well, we could do that here as well! And we should! Now, just for fun, I’m not even going to explain this one. I’ll leave it up to you to figure this one out. The obvious issue is that you will need to do some calculations to determine the right positions. If you visualize your array though, you’ll get it pretty quickly. The code looks as follows:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="kt">long</span><span class="p">[]</span> <span class="nf">OptimusSieveOfEratosthenes</span><span class="p">(</span><span class="kt">long</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p"><</span> <span class="m">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p">==</span> <span class="m">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="m">2</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="p">==</span> <span class="m">3</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[]</span> <span class="p">{</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span> <span class="p">};</span> <span class="p">}</span>
<span class="c1">// Calculate the number of elements in the array</span>
<span class="kt">var</span> <span class="n">elements</span> <span class="p">=</span> <span class="n">n</span> <span class="p">%</span> <span class="m">2</span> <span class="p">==</span> <span class="m">0</span> <span class="p">?</span> <span class="p">(</span><span class="n">n</span> <span class="p">-</span> <span class="m">1</span><span class="p">)</span> <span class="p">/</span> <span class="m">2</span> <span class="p">:</span> <span class="n">n</span> <span class="p">/</span> <span class="m">2</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">numbers</span> <span class="p">=</span> <span class="k">new</span> <span class="kt">long</span><span class="p">[</span><span class="n">elements</span> <span class="p">+</span> <span class="m">1</span><span class="p">];</span>
<span class="c1">// Already put 2 in the array, followed by all odd numbers starting from 3</span>
<span class="n">numbers</span><span class="p">[</span><span class="m">0</span><span class="p">]</span> <span class="p">=</span> <span class="m">2</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">elements</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">numbers</span><span class="p">[</span><span class="n">i</span> <span class="p">+</span> <span class="m">1</span><span class="p">]</span> <span class="p">=</span> <span class="m">3</span> <span class="p">+</span> <span class="p">(</span><span class="n">i</span> <span class="p">*</span> <span class="m">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">var</span> <span class="n">max</span> <span class="p">=</span> <span class="p">(</span><span class="kt">long</span><span class="p">)</span><span class="n">Math</span><span class="p">.</span><span class="nf">Sqrt</span><span class="p">(</span><span class="n">numbers</span><span class="p">.</span><span class="nf">Last</span><span class="p">())</span> <span class="p">/</span> <span class="m">2</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">i</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="p"><=</span> <span class="n">max</span><span class="p">;)</span>
<span class="p">{</span>
<span class="c1">// Calculate the current number</span>
<span class="kt">var</span> <span class="n">curNr</span> <span class="p">=</span> <span class="n">i</span> <span class="p">*</span> <span class="m">2</span> <span class="p">+</span> <span class="m">1</span><span class="p">;</span>
<span class="c1">// Set all multiples of curNr to 0 (they are not prime)</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">j</span> <span class="p">=</span> <span class="n">i</span> <span class="p">*</span> <span class="n">curNr</span> <span class="p">+</span> <span class="n">i</span><span class="p">;</span> <span class="n">j</span> <span class="p"><=</span> <span class="n">elements</span><span class="p">;</span> <span class="n">j</span> <span class="p">+=</span> <span class="n">curNr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">numbers</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Find the next unmarked (!= 0) entry in the array</span>
<span class="n">i</span><span class="p">++;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">j</span> <span class="p">=</span> <span class="n">i</span><span class="p">;</span> <span class="n">j</span> <span class="p"><</span> <span class="n">elements</span><span class="p">;</span> <span class="n">j</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">numbers</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="p">!=</span> <span class="m">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">i</span> <span class="p">=</span> <span class="n">j</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">numbers</span><span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">e</span> <span class="p">=></span> <span class="n">e</span> <span class="p">!=</span> <span class="m">0</span><span class="p">).</span><span class="nf">ToArray</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="results">Results</h3>
<p>So, after all that, how do those 4 algorithms actually stack up? Well, I decided to run them together. Let’s look at how they perform when setting n to 10, then going up an order of magnitude each time until n equals 1,000,000. The results quickly reveal the immense differences in speed (the first measure is milliseconds, the second one ticks):</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="m">4</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">787</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SimplePrime</span><span class="p">)</span>
<span class="m">4</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">889</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusPrime</span><span class="p">)</span>
<span class="m">4</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span> <span class="k">in</span> <span class="m">3</span><span class="n">ms</span><span class="p">,</span> <span class="m">9</span><span class="p">,</span><span class="m">427</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">4</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span> <span class="k">in</span> <span class="m">1</span><span class="n">ms</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span><span class="m">171</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">25</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">49</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SimplePrime</span><span class="p">)</span>
<span class="m">25</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">18</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusPrime</span><span class="p">)</span>
<span class="m">25</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">46</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">25</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">10</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">168</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span><span class="m">342</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SimplePrime</span><span class="p">)</span>
<span class="m">168</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">268</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusPrime</span><span class="p">)</span>
<span class="m">168</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">71</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">168</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">51</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">1</span><span class="p">,</span><span class="m">229</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">51</span><span class="n">ms</span><span class="p">,</span> <span class="m">129</span><span class="p">,</span><span class="m">781</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SimplePrime</span><span class="p">)</span>
<span class="m">1</span><span class="p">,</span><span class="m">229</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">2</span><span class="n">ms</span><span class="p">,</span> <span class="m">5</span><span class="p">,</span><span class="m">501</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusPrime</span><span class="p">)</span>
<span class="m">1</span><span class="p">,</span><span class="m">229</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">503</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">1</span><span class="p">,</span><span class="m">229</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">0</span><span class="n">ms</span><span class="p">,</span> <span class="m">299</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">9</span><span class="p">,</span><span class="m">592</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">3</span><span class="p">,</span><span class="m">848</span><span class="n">ms</span><span class="p">,</span> <span class="m">9</span><span class="p">,</span><span class="m">742</span><span class="p">,</span><span class="m">503</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SimplePrime</span><span class="p">)</span>
<span class="m">9</span><span class="p">,</span><span class="m">592</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">57</span><span class="n">ms</span><span class="p">,</span> <span class="m">145</span><span class="p">,</span><span class="m">474</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusPrime</span><span class="p">)</span>
<span class="m">9</span><span class="p">,</span><span class="m">592</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">1</span><span class="n">ms</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span><span class="m">705</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">9</span><span class="p">,</span><span class="m">592</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">1</span><span class="n">ms</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span><span class="m">866</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">78</span><span class="p">,</span><span class="m">498</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">337</span><span class="p">,</span><span class="m">699</span><span class="n">ms</span><span class="p">,</span> <span class="m">854</span><span class="p">,</span><span class="m">801</span><span class="p">,</span><span class="m">981</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SimplePrime</span><span class="p">)</span>
<span class="m">78</span><span class="p">,</span><span class="m">498</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">1</span><span class="p">,</span><span class="m">467</span><span class="n">ms</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span><span class="m">715</span><span class="p">,</span><span class="m">209</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusPrime</span><span class="p">)</span>
<span class="m">78</span><span class="p">,</span><span class="m">498</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">24</span><span class="n">ms</span><span class="p">,</span> <span class="m">63</span><span class="p">,</span><span class="m">136</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">78</span><span class="p">,</span><span class="m">498</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">13</span><span class="n">ms</span><span class="p">,</span> <span class="m">33</span><span class="p">,</span><span class="m">610</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
</code></pre></div></div>
<p>You can see that the Eratosthenes algorithm already edges out the simple algorithm at n = 100, and from n = 1,000 onward it leaves it far behind; the gap only widens as n grows.</p>
<p>Now for the big point I was trying to make all along: yes, the difference between the naive prime algorithm (SimplePrime) and its optimization (OptimusPrime, which is even further optimized than the suggestion I originally got) is immense. However, from the moment n reaches 1,000 or more, that so-called optimization becomes pretty damn slow itself! Compare the optimized basic algorithm with the basic sieve algorithm and notice the incredible difference in speed. Now that is what I call a fast algorithm! Forget about calculating a hundred primes: how about 78,498 primes in 24ms compared to almost 1.5 seconds?</p>
<p>Additionally, to really hammer the point home, take a look at the gains I got from optimizing the sieve algorithm even further. You have to see the two side by side, with an even bigger n, to appreciate what you gain there:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="m">1</span><span class="p">,</span><span class="m">229</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">4</span><span class="n">ms</span><span class="p">,</span> <span class="m">10</span><span class="p">,</span><span class="m">505</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">1</span><span class="p">,</span><span class="m">229</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">1</span><span class="n">ms</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span><span class="m">614</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">9</span><span class="p">,</span><span class="m">592</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">3</span><span class="n">ms</span><span class="p">,</span> <span class="m">8</span><span class="p">,</span><span class="m">469</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">9</span><span class="p">,</span><span class="m">592</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">1</span><span class="n">ms</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span><span class="m">418</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">78</span><span class="p">,</span><span class="m">498</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">34</span><span class="n">ms</span><span class="p">,</span> <span class="m">86</span><span class="p">,</span><span class="m">399</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">78</span><span class="p">,</span><span class="m">498</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">1</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">17</span><span class="n">ms</span><span class="p">,</span> <span class="m">45</span><span class="p">,</span><span class="m">368</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">664</span><span class="p">,</span><span class="m">579</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">353</span><span class="n">ms</span><span class="p">,</span> <span class="m">894</span><span class="p">,</span><span class="m">920</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">664</span><span class="p">,</span><span class="m">579</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">10</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">185</span><span class="n">ms</span><span class="p">,</span> <span class="m">469</span><span class="p">,</span><span class="m">128</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
<span class="m">5</span><span class="p">,</span><span class="m">761</span><span class="p">,</span><span class="m">455</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">3</span><span class="p">,</span><span class="m">406</span><span class="n">ms</span><span class="p">,</span> <span class="m">8</span><span class="p">,</span><span class="m">622</span><span class="p">,</span><span class="m">256</span><span class="nf">ts</span> <span class="p">(</span><span class="n">SieveOfEratosthenes</span><span class="p">)</span>
<span class="m">5</span><span class="p">,</span><span class="m">761</span><span class="p">,</span><span class="m">455</span> <span class="n">primes</span> <span class="p"><=</span> <span class="m">100</span><span class="p">,</span><span class="m">000</span><span class="p">,</span><span class="m">000</span> <span class="k">in</span> <span class="m">1</span><span class="p">,</span><span class="m">860</span><span class="n">ms</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span><span class="m">708</span><span class="p">,</span><span class="m">271</span><span class="nf">ts</span> <span class="p">(</span><span class="n">OptimusSieveOfEratosthenes</span><span class="p">)</span>
</code></pre></div></div>
<p>That’s 5,761,455 prime numbers, calculated in roughly the same time the optimized basic algorithm needed for 78,498: an input range almost two orders of magnitude larger. Case closed.</p>
<h3 id="o">Big O</h3>
<p>The reason for all of this is obvious: the simple algorithm has close to quadratic complexity. Granted, for primes under 100, or even 1,000 or 10,000, you won’t notice that, and you could defend it. But the point of an algorithmic challenge like this is finding a solution that keeps standing when the numbers get fairly high. It doesn’t have to be perfect or even the best known solution (as far as I’m aware, the best are optimized AKS algorithms, for which the original authors received a Gödel Prize in 2006; we’re not quite aiming for that), but a quadratic solution is not that. In fact, it’s pretty awful.</p>
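To make the comparison concrete, here is roughly what both basic approaches boil down to (a quick Python sketch of the idea, not the actual C# code I benchmarked; the function names are mine):

```python
def is_prime_naive(n):
    # Trial division by every candidate below n: O(n) per call,
    # so listing all primes up to N costs roughly O(N^2).
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

def is_prime_opt(n):
    # Only test 2 and odd divisors up to sqrt(n): O(sqrt(n)) per call,
    # so O(N * sqrt(N)) to list all primes up to N.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True
```

Both functions give identical answers; only the amount of work per number differs, which is exactly what the timings above expose.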
<p>The optimized basic algorithm still has O(n √n) complexity. The Sieve of Eratosthenes runs in O(n log log n) operations. Although it’s not extremely difficult to analyze what’s happening here and count how many iterations the inner loop does per outer loop, I’d suggest you read up on it <a href="https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes" target="_blank">here</a>.</p>
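For reference, the basic sieve fits in a handful of lines (again a Python sketch rather than the benchmarked C# code; starting the crossing-off at p*p is one of the standard tweaks):

```python
def sieve(limit):
    # Classic Sieve of Eratosthenes: O(n log log n) operations.
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Start at p*p: smaller multiples of p were already
            # crossed off by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]
```

Running it for the limits in the tables above reproduces the same prime counts (25 primes below 100, 168 below 1,000, and so on).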
<p>As I said, this algorithm is definitely not the fastest. It can be optimized much further than I did (although that would raise the complexity of the code quite a bit), and even then other sieves can outperform it. I do feel, though, that this is a great first algorithm to learn besides the obvious search and sort algorithms every course starts beating you over the head with. Although those are important, this is a nice alternative. Just remember: it’s very tempting to give up after a first attempt, but don’t! Stick with it, take a step back and think about the problem before coding it out. You’ll learn the most, and a bit of headache never killed anyone ;-)</p>
<p>If you notice anything silly or sloppy in this explanation, or have remarks or questions, do let me know! I’d love to hear your thoughts on this!</p>
Fri, 20 May 2022 16:00:00 +0000
https://www.peculiar-coding-endeavours.com/2022/prime-skills/
https://www.peculiar-coding-endeavours.com/2022/prime-skills/
algorithms, eratosthenes, prime numbers, mathematics, Tech, Algorithms
How artificial intelligence is transforming software engineering<h4 id="automation-of-the-automation-process"><strong>Automation of the automation process</strong></h4>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/haitse/ai.png" alt="ai" /></p>
<p><em>Abstract</em> — This paper explores the influences of the growing field of artificial intelligence (AI) on the software development process. Several techniques and their potential positive effects on multiple areas of software development will be explored. The main focus is on requirements engineering, the development process, testing and deployment, and the human factor in the field of software development as a whole. A number of scenarios concerning the future influence of AI on software engineering are presented.</p>
<h3 id="i----introduction">I. Introduction</h3>
<p>Software development has been a strongly evolving discipline from the moment of its inception, and it has certainly changed society for the better in a multitude of ways. Among many other things, software regulates our nuclear power plants and power grids, enables businesses to work more efficiently, monitors the thousands of airline flights traversing our skies daily, helps surgeons perform their procedures with a far higher degree of safety and precision, facilitates communication between people through social networks, and has changed the way we shop for groceries, clothing, electronics and much else.</p>
<p>Being a relatively new field of science, though, software engineering (SE) has displayed many growing pains and faced its fair share of challenges in a number of areas. SE has not yet benefited from the hundreds of years of collective experience that, for example, the manufacturing and construction industries have. One example is the perpetual state of inexperience of the available software engineering workforce. The number of software engineers has grown vastly over the last decades, as has the number of new technologies, programming paradigms and languages. Due to the combination of the relatively short shelf-life of frameworks, the constant emergence of new languages, and the enormous influx of newly graduated software engineers into the market, a relatively high percentage of software engineers have a very limited set of experiences. As a result, new generations of software engineers repeat many of the errors of the previous generations. Another challenge is the actual process of software development and the management of software development projects. Over the last decades, this set of activities has undergone an evolution of its own, moving from sequential design processes like Waterfall (in several incarnations and variations) to iterative approaches, with Agile being the predominant methodology nowadays. This evolution came about out of sheer necessity, after a multitude of reports, like the CHAOS report by the Standish Group, showed the low success rate of software development endeavors using sequential project management methods. The software implementation process itself has evolved strongly because of the larger scope of systems, the available hardware, and technological advancements such as cloud technologies. The role of software and information technology in many enterprises has changed thanks to the DevOps movement. While software development used to be an isolated set of practices, performed by operational units largely separated from the rest of the business, DevOps advocates a deeper involvement of other stakeholders in the development and deployment processes, and as such has changed the way software development is seen in many businesses.</p>
<p>In the fast-moving field of software development, it was only a matter of time before the very process of automation itself got automated. Artificial intelligence (AI), the field of computer science in which sophisticated algorithms and mathematics are used to let machines or software emulate human-like properties such as learning, cognitive functions and the seeming display of a certain degree of intelligence, has found its way into software engineering itself. Research on the use of various techniques from the field of artificial intelligence has demonstrated interesting new ways of improving the software development process <a href="#2">[2]</a><a href="#3">[3]</a>. This paper will focus on some of these initiatives. In section II, the common ground that SE and AI share in terms of their problem areas is explained. Sections III, IV and V explore several current contributions of AI to SE in the development process, testing and deployment, and the human factor of software engineering, respectively. In conclusion, we look ahead at potential new areas to explore and ways in which AI can further benefit software and requirements engineering, as well as some afterthoughts on the weight we put on AI as a whole.</p>
<h3 id="ii---common-ground">II. Common ground</h3>
<p>Software engineering is a complex process, with a vast amount of uncertainties and variables. During the requirements engineering phase, stakeholders with many different points of view must come together and try to clearly define goals, often using terms that are unfamiliar to the other party. The technologies that will be used to realize the defined requirements must be selected. Change scenarios must be drawn out, often for a considerable time into the future. Estimates and deadlines for the delivery of milestone features must be set. Lost time due to obstructions and unforeseen delays must be taken into account. Continuous feedback and communication lines between multiple stakeholders have to be created. The sheer number of unknowns in all stages of software development makes it an extremely challenging undertaking. The rise of project management methodologies such as Agile has partially accommodated these unknowns by incorporating them into the software development process through iterative approaches.</p>
<p>In <a href="#2">[2]</a>, Harman presented a number of examples of how AI is already being applied in SE, owing to the commonalities between the problem sets that AI and SE deal with. Both AI and SE tackle complex real-world problems, where a great deal of unknowns, variables and ill-defined data are often the starting point. Another similarity is the practice of optimization over time. In the field of AI, techniques and algorithms such as probabilistic reasoning and neural networks are aimed at evolving and learning, and can therefore optimize their output over time. In a very similar fashion, the experience of project managers, software architects and developer teams concerning the estimation of deadlines, the use of architectural patterns, and manageable sprint sizes (in the case of Agile methods) is a determining factor in the accuracy of those estimates and the quality of the outcome.</p>
<h3 id="iii--influence-on-the-development-process">III. Influence on the development process</h3>
<p>As clearly demonstrated in <a href="#1">[1]</a>, several parts of the software development process have already been benefiting greatly from automation. The specific areas that are most affected have been configuration, quality assurance, testing and deployment. Several developments and studies have shown the possibility of AI being successfully utilized in earlier stages of the software engineering process as well, such as requirements engineering and the actual development of the software. By taking a lot of work off the developer’s hands using AI techniques, the process of automation itself could be partially automated. Previous work, referred to in <a href="#2">[2]</a><a href="#3">[3]</a><a href="#5">[5]</a>, shows that applying AI to the software development process can also dramatically diminish work as well as risks in various parts of the process. In that context, AI presents itself as a valuable tool for increasing quality and efficiency in many stages.</p>
<p>For example, requirements elicitation, architectural design and code refactoring can be facilitated using AI techniques such as Search-Based Software Engineering (SBSE) <a href="#3">[3]</a><a href="#5">[5]</a>. In SBSE, search techniques such as genetic algorithms are applied by reformulating SE problems as optimization problems. One application of SBSE is the use of genetic algorithms for code generation: the generated code samples form the population, which is then evolved and tested for survivability and optimization.</p>
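As a purely illustrative sketch (not taken from the cited work; the bitstring encoding and the toy fitness function are my own assumptions), the core loop of such a genetic search can be expressed as follows. In an SBSE setting the genome would encode an SE artifact such as a test case, a refactoring sequence or a schedule, and the fitness function would score its quality:

```python
import random

def genetic_search(fitness, length=20, pop_size=30, generations=100, seed=42):
    # Minimal genetic algorithm over bitstrings:
    # truncation selection, one-point crossover, point mutation.
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)        # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(length)] ^= 1     # flip one bit (mutation)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness ("onemax"): maximize the number of set bits.
best = genetic_search(sum)
```

The same loop structure applies regardless of what the genome encodes; only the representation and the fitness function change per SE problem.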
<p>Using natural-language requirements (NLR) documents such as user stories and use cases, several attempts have been made to translate these documents into formal specifications <a href="#3">[3]</a>. However, much work remains to be done in this area, and the produced result should still only be seen as a guideline, yet to be approved by stakeholders with the relevant domain knowledge.</p>
<p>Estimation in SE is performed to plan deadlines, anticipate obstacles, and prepare for changes in architecture, requirements, technologies and the business domain itself. These estimates are often made using incomplete and blurry data, gathered from sources that may not always be as trustworthy as we would like. In this context, statistical methods such as Bayesian models have been used to make more accurate estimates more quickly <a href="#2">[2]</a>, instead of depending solely on the experience of project managers. Another application of these models, in the field of requirements engineering, is to test the quality of requirements specifications <a href="#3">[3]</a>.</p>
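A minimal illustration of this idea (an invented toy example, not a model from the cited work): a Beta-Bernoulli update that folds historical sprint outcomes into an estimate of the on-time delivery rate, rather than relying on gut feeling alone:

```python
def posterior_on_time(prior_a, prior_b, on_time, late):
    # Treat "sprint delivered on time" as a Bernoulli outcome with a
    # Beta(prior_a, prior_b) prior on its rate; the posterior is again
    # Beta, with the observed counts added to the prior parameters.
    a = prior_a + on_time
    b = prior_b + late
    mean = a / (a + b)  # posterior mean estimate of the on-time rate
    return a, b, mean

# Weak, symmetric prior Beta(2, 2); then 12 on-time and 4 late sprints
# observed (all numbers here are illustrative assumptions).
a, b, mean = posterior_on_time(2, 2, 12, 4)
```

The estimate sharpens automatically as more sprints are observed, which is exactly the "optimization over time" property the paper attributes to these models.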
<h3 id="iv--testing-and-deployment">IV. Testing and deployment</h3>
<p>A considerable portion of the development process goes to testing and verification of the produced software. Although techniques such as Test-Driven Development (TDD) have facilitated this by taking a test-first approach, integration testing and scenarios in which certain resources become unavailable at unpredictable times are still time-consuming and difficult to define and generate, especially for modern cloud-based distributed applications. Similar to using AI for code generation, SBSE can be applied to the generation of increasingly optimal test cases. AI algorithms can be used to take down certain services during test scenarios, or to introduce high loads on parts of the infrastructure to test the integrity of the architecture. An additional contribution of AI in this regard is the possibility of continuously making small adjustments to configurations in order to optimize performance.</p>
<p>With the emergence of new technologies, the sheer number of types of devices that software is deployed to keeps increasing. Desktop computers, smartphones, tablets and Internet of Things (IoT) devices can differ greatly in terms of hardware, processing power and bandwidth, while the functional requirements of the software often remain the same. AI can help bridge this gap and take a lot of work off the hands of developers in this area, as explained by Harman in <a href="#2">[2]</a>, by automatically patching and porting the software to the desired platform.</p>
<h3 id="v---the-human-factor">V. The human factor</h3>
<p>In <a href="#1">[1]</a>, Fuggetta and Di Nitto refer to a number of studies showing the success of Scrum and other Agile approaches. The strong focus on collaboration and interaction between stakeholders is a clear contributor to the success of these methods. In <a href="#4">[4]</a>, it became clear that motivation, salaries, training and career opportunities, working circumstances, cooperation between colleagues and other human factors have an overwhelming influence on the outcome of software development processes and the quality of the software they produce. Using AI techniques, the potential negative influence and unpredictability of these human factors can be more closely monitored, anticipated, and perhaps even diminished, resulting in higher-quality software with a higher degree of predictability.</p>
<p>A determining part of the effectiveness of quality assurance is obviously the quality of the incoming data concerning user experiences. Although testing is already being automated to a great extent using tools like Selenium and FitNesse, many elements in the user-experience area of testing are fuzzy and rather subjective. Another point of intensive contact with end users is the requirements engineering phase of a software development process. Many stakeholder-driven requirements elicitation techniques, such as interviews and group sessions, are both time-consuming and potentially fuzzy. For one, we must assume that the data our interviewees provide us with is factual and based on truth. Secondly, it is not far-fetched that many people would alter the intensity of their message depending on who sits in front of them. Of course, data can be collected anonymously, but a certain degree of bias will likely remain whenever people are aware they are being monitored or probed for feedback.</p>
<h3 id="vi--future-work">VI. Future work</h3>
<p>Although AI has been a field of study for over 50 years, its application in practice has only become truly viable in the last decade, due to the massive availability of data and data storage capacity, as well as the increasing processing power and cloud infrastructure to facilitate all of this. As a result, enterprises have been adopting AI on an increasing scale to tackle a varied array of problems. AI is quickly conquering many branches of business, due to the broad range of problems it can be applied to; it sometimes seems that the possibilities of this exciting technology are endless. E-commerce websites that adapt the list of shown products to the online behavior of the current user, streaming and video-on-demand services that adapt their offer per customer based on previously watched movies, and voice-activated smart-home hubs that control lighting, heating and many other functions at home are all fairly well established by now.</p>
<p>AI is definitely changing the way things are done in many areas, and a lot of the work that is now performed through manual labor can be done by intelligent AI algorithms (perhaps combined with robotics). Over the last decades, many jobs have already been replaced by this combination, creating a vastly different landscape. After many decades of automating our work, lifting the heavy burden off our human shoulders, we now seem to be entering an era where we start to automate the process of automation itself. The job of software developers will undoubtedly change in ways we cannot fully predict as of today. However, the possibilities of these new technologies are already driving an evolution of the software development process, with many changes still to come.</p>
<p>Using AI, new approaches such as intelligent chat bots, or face-recognition algorithms catching micro-expressions to determine the user experience of prototypes, can serve as an additional source of trustworthy data. Filling out a questionnaire is a single point of contact in time, potentially biased and dependent on current mood, most recent experiences, and exposure to other systems or alternatives. Data gathered in an unobtrusive manner over a longer period of time, interpreted using highly sophisticated AI techniques, and automatically aggregated to serve several stakeholders, is just one of the ways software engineers could gather higher quality data, and even more feedback, from end users. Feedback from certain sources could be weighted more heavily, based on experiences in the past, thus optimizing the requirements engineering process. This is clearly an interesting new area where AI can be used to cancel out issues that are inherent to the subjectivity of human interaction.</p>
<p>At the same time, we must be wary not to fall prey to an over-reliance on AI. Overemphasis on the use of any emerging technology is quite a common pitfall. The realization that AI is simply another tool, and that we should treat it as such, is an important one. As brought up in <a href="#5">[5]</a>, it is often beneficial to keep incorporating human knowledge, judgment and experience in the process. Concerning the specific application of AI in software development, it is obvious that many benefits can be gained. To make the most of these opportunities, and to build upon the knowledge and experience already gathered in the software engineering community, it would be wise to remember that successful software engineering is the product of strong collaborations between many stakeholders. To this list of stakeholders, artificial intelligence can definitely be added as a contributor, a facilitator and an enabler.</p>
<h3 id="references">References</h3>
<p>[1] <a name="1">Fuggetta A., Di Nitto E. 2014. Software process. In <em>Proceedings of the Future of Software Engineering</em>, May 2014, 1-12.</a></p>
<p>[2] <a name="2">Harman M. 2012. The role of Artificial Intelligence in Software Engineering. In <em>Proceedings of the First International Workshop on Realizing AI Synergies in Software Engineering (RAISE)</em>, June 2012, 1-6.</a></p>
<p>[3] <a name="3">Ammar H.H., Abdelmoez W., Hamdi M.S. 2012. Software engineering using artificial intelligence techniques: current state and open problems. In <em>The Second International Conference on Communications and Information Technology</em>, February 2012.</a></p>
<p>[4] <a name="4">Heemstra F.J., Kusters R.J., Trienekens J.J.M. 2002. Invloed van de factor mens op softwarekwaliteit. In <em>Softwarekwaliteit</em>, 2002, 253-271.</a></p>
<p>[5] <a name="5">Harman M. 2010. The relationship between search based software engineering and predictive modeling. In <em>6th International Conference on Predictive Models in Software Engineering</em>, Timisoara, Romania, 2010.</a></p>
Mon, 07 Mar 2022 17:30:00 +0000
https://www.peculiar-coding-endeavours.com/2022/how-ai-transforming-se/
Through the trees...<p>Some wise man - Jim Rohn, for the record - once said:</p>
<blockquote>
<p>“Celebrate your achievements”</p>
</blockquote>
<p>so since I’ve worked diligently on getting through my <a href="https://www.coursera.org/specializations/data-structures-algorithms" target="_blank">Coursera Data Structures and Algorithms Specialization</a> courses over the last months, and I just completed the specialization on graphs, I decided to spoil myself with a little treat. I own several overly nerdy hoodies and shirts, although I only ever wear them at home, since I don’t really get the “software engineers dressing like teenagers at work” thing.</p>
<p>So instead of buying another piece of apparel to wear once a year, I conjured up the incredibly original idea that I would buy my own mug for work, and a similar one for my car. The kind that makes it very obvious what I do for a living, but has an underlying tongue-in-cheek joke ingrained in it. People in my own field of specialty would immediately recognise it, whereas outsiders just shrug their shoulders and think of me as some kind of mad scientist computer magician man. Something like that. So, in short, I can share some knowledge regarding a topic I love, can reward myself with a fun little item, can write about it here which further ingrains that piece of knowledge, plus I get to differentiate myself from the general population of software engineers that think this kind of stuff isn’t too useful or have never heard of it at all.</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/through_the_trees/mug.png" alt="kruskal mug" /></p>
<h3 id="kruskal-trees-potato">Kruskal? Trees? Potato?</h3>
<p>So, what is this article actually about? My (basic) implementation of Kruskal’s algorithm. The reason I chose this particular algorithm is that it gave me the opportunity to combine several interesting things I’ve learned over the last year: I get to show some Python code and at the same time I can explain some of my favourite data structures and how I implemented them for this particular example. Also, there was no mug with an edgy (get it, edgy?) reference to a bidirectional A* search. So without much further delay, let’s dive into the forest (get it? trees, forest? wink wink!) of what Kruskal’s algorithm actually is, and why you or anyone should care.</p>
<p>So, who is this Kruskal guy, what’s with the forest, and what am I on about? Well, obviously he’s a pretty big deal, which explains the algorithm that carries his name. I think we can all agree that you kind of hit a home run when you get a number, algorithm, or organ named after you. In this case, we’re talking about an algorithm to compute the <strong>minimum spanning tree</strong> in a weighted graph. Now, I’m not going to elaborate on graph theory, so if you have no clue what I’m talking about, get a book on discrete math and read up on it. Alternatively, to have a quick condensed read: <a href="https://www.khanacademy.org/computing/computer-science/algorithms/graph-representation/a/describing-graphs" target="_blank">Khan Academy</a> has a nice short article to get you up to speed on the basic terminology. If you don’t care about the article, proceed with the section below. You can skip that section if you do read the article, or are in no need of further convincing.</p>
<h3 id="so-what">So what?</h3>
<p>That’s always a nice question, isn’t it? Why should you care about graphs, algorithms and data structures, all the boring stuff nobody talks about? Nowadays, all you hear about are tools, technologies and frameworks like Angular or React, NoSQL databases, cloud technologies, Docker, microservice architectures, big data, VR, blockchain, IoT, etc… So shouldn’t you focus on those over boring abstract dusty computer science stuff you hardly ever hear about?</p>
<p>Yes and no. Of course these things have great importance, depending on what you want to do and where your specialty lies. However, what we do is still called computer <strong>science</strong>, and software <strong>engineering</strong>. Like it or not, “abstract dusty computer science stuff” still lies at the core of all we do, whether we consciously think about it or not. If you are satisfied with learning a new framework or tool set every few years, and using that to make a nice living, by all means, have at it. Great and fulfilling careers can be had this way. However, there’s so much more exciting stuff out there, and many more satisfying rewards to reap. But obviously, you have to bite down and do more than play with the coolest new tools and toys that someone else created for you. Although we don’t all get the chance to develop the next big thing in technology, that’s no excuse to disregard the more grainy, abstract, math-related topics. Sure, it can hurt your brain a bit more, since we’re not talking about the instant gratification that many frameworks give you. But topics like these lead to amazing possibilities, and will at least demystify a lot of contemporary developments in the field. So let’s not settle for the low-hanging fruit and pay respect to the fundamentals. Or like another great man once said:</p>
<blockquote>
<p>“You can practice shooting eight hours a day, but if your technique is wrong, then all you become is very good at shooting the wrong way. <strong>Get the fundamentals down and the level of everything you do will rise.</strong>”</p>
</blockquote>
<p>That’s Michael Jordan for you. He didn’t become one of the greatest players of his sport ever by chance, nor by practicing dunks and trick shots all day. He did it by repeated and deliberate focus on the core building blocks of the game. That’s how to approach a craft, which is what software engineering ultimately is. So of course, we need to learn all these new technological developments that fit in our niche. But remember that they are the end product of creative and smart applications of core fundamentals.</p>
<p>I was lucky enough to partake in some job interviews for one of the top 5 IT companies. It won’t surprise you that I didn’t get questioned about my knowledge of C#, SOLID design principles, JavaScript frameworks, or whether I worked with this or that tool. We’re all assumed to be able to learn these quickly, as tools and frameworks come and go all the time. Being able to quickly understand, learn and apply them is a bare minimum of what is expected of us as software engineers or computer scientists. Instead, I was provided with a whiteboard and a marker and hammered with implementations of splay trees, optimisations of sort algorithms, and the worst-case time and space complexity of various search methods. To me, what defines our quality is the understanding of fundamentals and the ability to apply them in order to build the newest technologies. After all, what do you think lies at the core of AI, data science, VR, all these amazing new technologies that you hear about all the time and that are changing our world daily? (that was rhetorical, I think you get the point now, so if you haven’t already, please read up on graphs).</p>
<h3 id="back-to-the-forest">Back to the forest</h3>
<p>I’m going to overuse that reference to death. But back to minimum spanning trees, since you now know all about graphs. So what Kruskal’s algorithm entails is a nice, efficient way (it runs in O(|E| log |V|) time if you choose your data structures well) to find the minimum spanning tree in a connected weighted graph. Now, what is a spanning tree? I might want to explain that, since I’m already several paragraphs in ;-)</p>
<p>Put simply, a minimum spanning tree is a subset of the edges of a connected weighted graph that connects all the vertices in such a way that no cycles are created and the total edge weight is minimised. What’s with the forest then? Well, any weighted undirected graph has a minimum spanning forest, which is nothing more than the union of the minimum spanning trees for its connected components. That’s basically part of the joke on the mug. I know, it’s a home run of a joke, and if you need to pause this read to catch a breath because of all your laughter, I would understand.</p>
<p>But now we know what a minimum spanning tree is, why care? What is it used for? Well, the applications are pretty numerous. The standard application is that of a telephone, electrical, TV cable or road grid. We want to connect all points with the least amount of cable/wire/road possible, since that’s a costly commodity. That’s basically the example I will elaborate a bit further on. However, several more out-of-the-box or indirect use cases do exist. You can use the algorithm to obtain approximations for NP-hard problems such as the traveling salesman problem, for clustering data points in data science (although the K-means algorithm can also be used, depending on the application), in an autoconfig protocol for Ethernet bridging to avoid network cycles, and several others. To get the point across and explain the algorithm, we will use a classical road (or rather tunneling) problem.</p>
<h3 id="kruskals-algorithm">Kruskal’s algorithm</h3>
<p>So, finally, what you all came here for ;-) the algorithm itself. Kruskal’s algorithm uses a greedy approach (which means it follows the heuristic of making the optimal choice in each stage of the process) to solve the minimum spanning tree problem. It does so in a few, quite simple, steps:</p>
<ol>
<li>Starting from a number of vertices, create a graph G where each vertex V is a separate tree. In short: no edges exist, just the trees.</li>
<li>Create a collection C containing all possible edges in G.</li>
<li>While C is non-empty and G is not yet spanning: remove the edge E with minimum weight from C. If E connects 2 different trees in G, add it to the forest, thereby combining 2 trees in a single new tree.</li>
</ol>
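<p>The three steps above translate almost directly into code. What follows is a minimal, self-contained sketch, not the implementation built up in the rest of this article: it sorts a plain list of edges by weight instead of using a priority queue, and uses a simple dict-based union-find in place of the disjoint set we’ll get to later. The function name <code>kruskal</code> and the <code>(u, v, cost)</code> edge tuples are just my choices for this sketch.</p>

```python
# A minimal sketch of Kruskal's algorithm: a sorted edge list plus a
# dict-based union-find. Names and edge format are illustrative only.
def kruskal(vertices, edges):
    # step 1: every vertex starts out as its own one-node tree
    parent = {v: v for v in vertices}

    def find(v):
        # walk up to the root of v's tree, compressing the path as we go
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst, total = [], 0
    # steps 2 & 3: consider every edge in increasing order of weight
    for u, v, cost in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge connects two different trees
            parent[root_u] = root_v   # union: merge them into one tree
            mst.append((u, v, cost))
            total += cost
    return mst, total
```

<p>Sorting the edges dominates here, which gives the O(|E| log |E|) = O(|E| log |V|) bound mentioned earlier; popping edges from a priority queue, as the full implementation does, achieves the same bound.</p>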
<p>And that’s basically all there is to it. Additionally, you get the joke on the mug now. Starting from a forest of trees (non-connected vertices), you keep adding the remaining edges in increasing order of weight, until there’s only 1 tree left: the one containing all vertices, connecting those vertices with the cheapest possible edges that do not introduce cycles in the graph.</p>
<p>As I mentioned earlier - and as is often the case - the running time of the algorithm depends on the choice of your data structures. So let’s see which data structures I decided to use. Obviously we need a <strong>graph</strong>, which is the easy bit. I chose a straightforward <strong>adjacency list</strong> implementation. We also need a collection to store all edges in; we will perform repeated reads from that collection and pop out the cheapest remaining edge. I decided to use a priority queue in which the priority is the edge cost, implemented as a <strong>min binary heap</strong>. As a third and last data structure, I need something that lets me recognise which trees belong together, as this enables me to detect cycles. When considering connecting 2 vertices by admitting the edge under consideration into the spanning tree, all I need to check is whether they already belong to the same tree. A <strong>disjoint set</strong> is perfect for this. Let’s focus on each of the data structures first, after which we’ll look at the problem area, and the actual algorithm.</p>
<h3 id="the-graph">The graph</h3>
<p>First and foremost, let me remind you that I’m doing all of this in <strong>Python</strong>. For our graph, we don’t really need anything fancy at all. All we need in this case is a simple dictionary with the vertices as keys and, as the value for each key, a list of that vertex’s neighbours, so that the graph is stored as an adjacency list. I provided a number of simple <strong>dunder methods</strong> (the methods with the 2 underscores in front and back of them) for some basic operations we might want to execute on our graph, as well as a method to add a weighted edge to a vertex. For ease and simplicity, we just assume an <strong>undirected graph</strong> and delegate the burden of adding the edge back from neighbour to vertex to the graph. Vertices that don’t exist just get created on the spot. Again, just to keep things simple for the example. Check out the code below:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Graph</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vertices</span><span class="o">=</span><span class="p">[]):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span> <span class="o">=</span> <span class="p">{</span><span class="n">vertex</span><span class="p">:</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">vertex</span> <span class="ow">in</span> <span class="n">vertices</span><span class="p">}</span>
<span class="k">def</span> <span class="nf">__del__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__contains__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vertex</span><span class="p">):</span>
<span class="k">return</span> <span class="n">vertex</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span>
<span class="k">def</span> <span class="nf">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vertex</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">vertex</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">:</span>
<span class="k">return</span> <span class="p">[]</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">[</span><span class="n">vertex</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">__iter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">for</span> <span class="n">vertex</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">:</span>
<span class="k">yield</span> <span class="n">vertex</span>
<span class="k">def</span> <span class="nf">add_edge</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vertex</span><span class="p">,</span> <span class="n">neighbour</span><span class="p">,</span> <span class="n">cost</span><span class="p">):</span>
<span class="k">if</span> <span class="n">vertex</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">[</span><span class="n">vertex</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">if</span> <span class="n">neighbour</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">[</span><span class="n">neighbour</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">neighbour</span><span class="p">,</span> <span class="n">cost</span><span class="p">)</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">[</span><span class="n">vertex</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">[</span><span class="n">vertex</span><span class="p">].</span><span class="n">append</span><span class="p">((</span><span class="n">neighbour</span><span class="p">,</span> <span class="n">cost</span><span class="p">))</span>
<span class="k">if</span> <span class="p">(</span><span class="n">vertex</span><span class="p">,</span> <span class="n">cost</span><span class="p">)</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">[</span><span class="n">neighbour</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__vertices</span><span class="p">[</span><span class="n">neighbour</span><span class="p">].</span><span class="n">append</span><span class="p">((</span><span class="n">vertex</span><span class="p">,</span> <span class="n">cost</span><span class="p">))</span>
</code></pre></div></div>
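<p>To get a feel for how the class behaves, here is a quick usage sketch. So that it runs on its own, the snippet carries a condensed, unhighlighted copy of the class above (with the private dictionary renamed to <code>_vertices</code> purely for brevity); the behaviour is the same.</p>

```python
# Condensed copy of the article's Graph class, just enough for a demo.
class Graph:
    def __init__(self, vertices=()):
        self._vertices = {vertex: [] for vertex in vertices}

    def __len__(self):
        return len(self._vertices)

    def __contains__(self, vertex):
        return vertex in self._vertices

    def __getitem__(self, vertex):
        return self._vertices.get(vertex, [])

    def add_edge(self, vertex, neighbour, cost):
        # unknown vertices are created on the spot...
        self._vertices.setdefault(vertex, [])
        self._vertices.setdefault(neighbour, [])
        # ...and every edge is mirrored, since the graph is undirected
        if (neighbour, cost) not in self._vertices[vertex]:
            self._vertices[vertex].append((neighbour, cost))
        if (vertex, cost) not in self._vertices[neighbour]:
            self._vertices[neighbour].append((vertex, cost))

g = Graph()
g.add_edge("A", "B", 3)
g.add_edge("A", "C", 1)
print(len(g))    # 3 -- all three vertices were created on the fly
print(g["B"])    # [('A', 3)] -- the mirrored edge back to A
```

<p>Note how <code>add_edge</code> does the mirroring for you, which is exactly the convenience described above.</p>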
<h3 id="the-priority-queue">The priority queue</h3>
<p>As mentioned, our next main building block is the collection to store the edges in, as well as getting them out in ascending order of cost as cheaply as possible. This is pretty vital, as it greatly determines the running cost of the entire algorithm. That’s why I chose one of my favourite little data structures: the <strong>min binary heap</strong> to implement a <strong>priority queue</strong>. Again, it’s adjusted for this example, and I know it’s not Pythonic to leave out docstrings or use names like “i” or “j”, but bear with me here.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MinBinaryHeap</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">values</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">values</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__check_elements</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">value</span> <span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">values</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">values</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__build_heap</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">__del__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span> <span class="o">=</span> <span class="bp">None</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__check_elements</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">def</span> <span class="nf">__contains__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">element</span><span class="p">):</span>
<span class="k">return</span> <span class="n">element</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__check_elements</span>
<span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="s">", "</span><span class="p">.</span><span class="n">join</span><span class="p">([</span><span class="nb">str</span><span class="p">(</span><span class="n">el</span><span class="p">)</span>
<span class="k">for</span> <span class="n">el</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="mi">0</span><span class="p">:(</span><span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)]])</span>
<span class="k">def</span> <span class="nf">empty</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span>
<span class="k">def</span> <span class="nf">push</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__check_elements</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">></span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__sift_up</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">peek</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">pop</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="n">element</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">peek</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__remove</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">element</span>
<span class="k">def</span> <span class="nf">__build_heap</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">/</span> <span class="mi">2</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__sift_down</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__sift_up</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o"><=</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">i</span> <span class="o">></span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span><span class="p">:</span>
<span class="k">return</span>
<span class="n">parent</span> <span class="o">=</span> <span class="nb">int</span><span class="p">((</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">parent</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">></span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">1</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__swap</span><span class="p">(</span><span class="n">parent</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__sift_up</span><span class="p">(</span><span class="n">parent</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__sift_down</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">i</span> <span class="o">></span> <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span><span class="p">:</span>
<span class="k">return</span>
<span class="n">left</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">left</span> <span class="o">></span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span><span class="p">:</span>
<span class="k">return</span>
<span class="n">right</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">2</span>
<span class="k">if</span> <span class="p">(</span><span class="n">right</span> <span class="o">></span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="ow">or</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">left</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o"><</span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">right</span><span class="p">][</span><span class="mi">1</span><span class="p">]):</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">></span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">left</span><span class="p">][</span><span class="mi">1</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__swap</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">left</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__sift_down</span><span class="p">(</span><span class="n">left</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="p">(</span><span class="n">right</span> <span class="o"><=</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="ow">and</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">right</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o"><</span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="mi">1</span><span class="p">]):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__swap</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__sift_down</span><span class="p">(</span><span class="n">right</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__remove</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">i</span> <span class="o">></span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span><span class="p">:</span>
<span class="k">return</span>
<span class="n">element</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__check_elements</span><span class="p">.</span><span class="n">remove</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__swap</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ubound</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__sift_down</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__swap</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">):</span>
<span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">i</span><span class="p">],</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">j</span><span class="p">])</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">j</span><span class="p">],</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__elements</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
</code></pre></div></div>
<p>I will refrain from going too deep into the implementation of the data structure, because that would take an entire article on its own. Please read up on <strong>binary heaps</strong> to learn more. What’s important to know right now is that all edges and their costs are stored in a simple list, which is then reordered in such a way that it adheres to a few core properties of a min binary heap:</p>
<ul>
<li>The <strong>shape property</strong>: the list translates to a complete binary tree (which is basically already the case since I store all values in a list and calculate the locations of all children in a naive way)</li>
<li>The <strong>heap property</strong>: the value of each node in the tree is smaller than (or equal to) each of the nodes below it.</li>
</ul>
<p>This does <strong>NOT</strong> mean that the list is ordered, by the way! That’s the main strength of the min binary heap and the heapsort algorithm it enables. We do not need to sort the entire list. All we need to do is reorder the list to satisfy those properties, which is why the build_heap method only operates on edges in the first half of the list. After all, the second half of the elements represent the leaves in the tree (remember, the number of nodes doubles at each level of the binary tree), and the <strong>heap property</strong> is already satisfied for leaves. The 2 main components of the binary heap are the sift_up and sift_down operations, which make sure that a node is bubbled up or down the tree until it sits in the correct position to satisfy the min binary heap properties. They maintain these properties during all the pushing and popping we do when working with the tree. The worst-case runtime of these 2 operations is O(log n), where n is the number of elements in the tree; log n is essentially the height of the tree.</p>
<p>All in all, since we have to get all edges out of the priority queue and consider them, the eventual running time for this part of the algorithm is O(n log n) with n being the number of edges, which is basically optimal for a comparison sort.</p>
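<p>You can get the same pop-in-cost-order behaviour from Python’s standard library. The sketch below is not the article’s MinBinaryHeap class; it uses heapq on a hypothetical edge list in the same ((vertex1, vertex2), cost) shape, with the tuple flipped to (cost, edge) so that heapq compares on cost first:</p>

```python
import heapq

# Hypothetical edge list in the article's ((vertex1, vertex2), cost) shape
edges = [(("a", "b"), 4.0), (("b", "c"), 1.5), (("a", "c"), 2.0)]

# heapq compares tuples element by element, so put the cost first
heap = [(cost, edge) for edge, cost in edges]
heapq.heapify(heap)  # O(n): same idea as build_heap skipping the leaf half

ordered = []
while heap:
    cost, edge = heapq.heappop(heap)  # each pop sifts down in O(log n)
    ordered.append((edge, cost))

print(ordered)  # edges in increasing order of cost
```

Popping every element this way is exactly the O(n log n) heapsort mentioned above.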
<h3 id="the-disjoint-set">The disjoint set</h3>
<p>The last component before getting into the algorithm itself is the way we determine which trees belong together in order to detect cycles, as well as merging trees. A great one for that is the <strong>disjoint set</strong> data structure, which is also known as the <strong>union-find</strong> data structure. That basically sums up what we are trying to do, right? Again, no great detailed explanation here, since that would take this article too far, but I’ll at least provide some information. Read up on it if you want to expand your knowledge on this.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">DisjointSet</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">values</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__parents</span> <span class="o">=</span> <span class="p">[]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span> <span class="o">=</span> <span class="p">[]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__values_to_indices</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">values</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__parents</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__values_to_indices</span><span class="p">[</span><span class="n">value</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span>
<span class="k">def</span> <span class="nf">__del__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__parents</span> <span class="o">=</span> <span class="bp">None</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span> <span class="o">=</span> <span class="bp">None</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__values_to_indices</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">def</span> <span class="nf">__get_parent</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
<span class="k">if</span> <span class="n">index</span> <span class="o">!=</span> <span class="bp">self</span><span class="p">.</span><span class="n">__parents</span><span class="p">[</span><span class="n">index</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__parents</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">__get_parent</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__parents</span><span class="p">[</span><span class="n">index</span><span class="p">])</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">__parents</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">get_parent_index</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">element</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">__get_parent</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">__values_to_indices</span><span class="p">[</span><span class="n">element</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">merge</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">element1</span><span class="p">,</span> <span class="n">element2</span><span class="p">):</span>
<span class="n">index1</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_parent_index</span><span class="p">(</span><span class="n">element1</span><span class="p">)</span>
<span class="n">index2</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_parent_index</span><span class="p">(</span><span class="n">element2</span><span class="p">)</span>
<span class="k">if</span> <span class="n">index1</span> <span class="o">==</span> <span class="n">index2</span><span class="p">:</span>
<span class="k">return</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span><span class="p">[</span><span class="n">index1</span><span class="p">]</span> <span class="o">></span> <span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span><span class="p">[</span><span class="n">index2</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__parents</span><span class="p">[</span><span class="n">index2</span><span class="p">]</span> <span class="o">=</span> <span class="n">index1</span>
<span class="k">else</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__parents</span><span class="p">[</span><span class="n">index1</span><span class="p">]</span> <span class="o">=</span> <span class="n">index2</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span><span class="p">[</span><span class="n">index1</span><span class="p">]</span> <span class="o">==</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span><span class="p">[</span><span class="n">index2</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span><span class="p">[</span><span class="n">index2</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span><span class="p">[</span><span class="n">index1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span>
</code></pre></div></div>
<p>In short, we store our values in several ways. In values_to_indices (what’s in a name, right?) we basically have a dictionary that tells us at which index in the parents and ranks lists each value is stored. Each entry in parents holds the index of the parent of the value at that index. I know that sounds like gibberish, so I’ll clarify with an example. Let’s say we have 4 values: “a”, “b”, “c”, “d”. Then our variables look like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="bp">self</span><span class="p">.</span><span class="n">__parents</span> <span class="o">==</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span> <span class="o">==</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__values_to_indices</span> <span class="o">==</span> <span class="p">{</span> <span class="s">"a"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"b"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"c"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s">"d"</span><span class="p">:</span> <span class="mi">3</span><span class="p">}</span>
</code></pre></div></div>
<p>Which means that “a” points to index 0 in parents and ranks, “b” to index 1, and so on. In parents, we see that the element at index 1 has the element at index 1 as its parent. Same for 0, 2, 3. Basically, they all have themselves as parent, which means they each stand on their own. When merging 2 elements, we get their parent indices (using the <strong>path compression heuristic</strong> along the way - again, read up on it), and if they are different, it means they belong to different sets and can be merged. When doing the merging, we perform a <strong>union-by-rank</strong> (read, read, read) to keep our tree as shallow as possible in order to minimise running times. Let’s assume we want to merge elements “b” and “d”. The parent of “b” sits at index 1, the parent of “d” at index 3. So, they are not in the same set. We can then merge them, and we do so by setting the parent of “b” to be “d”. Since they both have the same rank (0), element “d” sees its rank increased to 1. After that, our variables look like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="bp">self</span><span class="p">.</span><span class="n">__parents</span> <span class="o">==</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__ranks</span> <span class="o">==</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">__values_to_indices</span> <span class="o">==</span> <span class="p">{</span> <span class="s">"a"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"b"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"c"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="s">"d"</span><span class="p">:</span> <span class="mi">3</span><span class="p">}</span>
</code></pre></div></div>
<p>Elements “a” and “c” at index 0 and 2, respectively, still have themselves as parent, meaning they represent separate sets. Also, they kept their ranks, as we didn’t touch them in this merge operation. However, element “b” at index 1 now has the element at index 3 as its parent. That would be “d”, as values_to_indices tells us. Element “d” itself didn’t change in terms of parent (meaning it’s still at the top of its tree), but did see its rank increase, since we took the rank of its new child “b”, increased it by 1 and assigned it to the rank of “d”. So we now have 3 sets: [“a”], [“c”] and [“d”,”b”]. That’s in simple terms how the <strong>disjoint set</strong> works. Thanks to the combination of the <strong>union-by-rank</strong> and <strong>path compression</strong> heuristics, our <strong>amortised running time</strong> per operation is lowered to O(α(n)), where α stands for the <strong>inverse Ackermann</strong> function. Details aside, this function stays below 5 for any value of n that can be written in our physical universe. That makes disjoint set operations with these heuristics in place practically operate in <strong>constant time</strong>. That’s pretty awesome.</p>
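<p>To see the walkthrough in running code, here is a compact standalone union-find sketch (names and structure are mine, not the article’s DisjointSet class, but it applies the same path compression and union-by-rank heuristics) reproducing the “a”/“b”/“c”/“d” merge:</p>

```python
class UnionFind:
    """Minimal disjoint set with path compression and union by rank."""

    def __init__(self, values):
        self.parents = list(range(len(values)))   # everyone starts as their own parent
        self.ranks = [0] * len(values)
        self.index = {v: i for i, v in enumerate(values)}

    def find(self, value):
        i = self.index[value]
        while self.parents[i] != i:
            # path compression: point i at its grandparent as we walk up
            self.parents[i] = self.parents[self.parents[i]]
            i = self.parents[i]
        return i

    def merge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return                      # already in the same set
        if self.ranks[ra] > self.ranks[rb]:
            self.parents[rb] = ra       # hang the shallower tree under the deeper one
        else:
            self.parents[ra] = rb
            if self.ranks[ra] == self.ranks[rb]:
                self.ranks[rb] += 1     # equal ranks: the new root grows by one

ds = UnionFind(["a", "b", "c", "d"])
ds.merge("b", "d")
print(ds.parents)  # [0, 3, 2, 3]: "b" now points at "d", as in the walkthrough
print(ds.ranks)    # [0, 0, 0, 1]: "d" had its rank bumped to 1
```

The two print statements match the values_to_indices example above exactly.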
<h3 id="the-algorithm-itself">The algorithm itself</h3>
<p>Now for the main event, the algorithm you all came to admire. For those expecting a great big codefest, spoiler alert: this will be a bummer. The code below is really all there is to it, thanks to good use of the data structures we created before.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_cost_minimal_spanning_tree</span><span class="p">(</span><span class="n">vertices</span><span class="p">,</span> <span class="n">edgecoll</span><span class="p">):</span>
<span class="c1"># Kruskal's algorithm, just so you know, you might miss it
</span> <span class="n">total_cost</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="n">edges</span> <span class="o">=</span> <span class="n">MinBinaryHeap</span><span class="p">(</span><span class="n">edgecoll</span><span class="p">)</span>
<span class="n">minimal_tree</span> <span class="o">=</span> <span class="n">Graph</span><span class="p">(</span><span class="n">vertices</span><span class="p">)</span>
<span class="n">disjoint_set</span> <span class="o">=</span> <span class="n">DisjointSet</span><span class="p">(</span><span class="n">vertices</span><span class="p">)</span>
<span class="k">while</span> <span class="ow">not</span> <span class="n">edges</span><span class="p">.</span><span class="n">empty</span><span class="p">():</span>
<span class="p">(</span><span class="n">vertex1</span><span class="p">,</span> <span class="n">vertex2</span><span class="p">),</span> <span class="n">cost</span> <span class="o">=</span> <span class="n">edges</span><span class="p">.</span><span class="n">pop</span><span class="p">()</span>
<span class="k">if</span><span class="p">(</span><span class="n">disjoint_set</span><span class="p">.</span><span class="n">get_parent_index</span><span class="p">(</span><span class="n">vertex1</span><span class="p">)</span> <span class="o">!=</span>
<span class="n">disjoint_set</span><span class="p">.</span><span class="n">get_parent_index</span><span class="p">(</span><span class="n">vertex2</span><span class="p">)):</span>
<span class="n">disjoint_set</span><span class="p">.</span><span class="n">merge</span><span class="p">(</span><span class="n">vertex1</span><span class="p">,</span> <span class="n">vertex2</span><span class="p">)</span>
<span class="n">minimal_tree</span><span class="p">.</span><span class="n">add_edge</span><span class="p">(</span><span class="n">vertex1</span><span class="p">,</span> <span class="n">vertex2</span><span class="p">,</span> <span class="n">cost</span><span class="p">)</span>
<span class="n">total_cost</span> <span class="o">+=</span> <span class="n">cost</span>
<span class="k">return</span> <span class="n">minimal_tree</span><span class="p">,</span> <span class="n">total_cost</span>
</code></pre></div></div>
<p>That’s messed up, right? All of this reading for a few lines of code. But remember: all the heavy lifting is done by the data structures. They are what make or break this algorithm. That being said, we are in pretty good shape due to the choices we made. For this implementation, we assume we get the vertices and the edges (with their costs) provided to us in the lists vertices and edgecoll. The function not only creates the minimal spanning tree, but also keeps track of the total cost of the tree, and returns them both as a tuple.</p>
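<p>Since the MinBinaryHeap, Graph and DisjointSet classes above aren’t packaged for import, here is a self-contained sketch of the same algorithm on a small hypothetical graph, with heapq standing in for the heap and a plain dict-based union-find (rank heuristic omitted for brevity) standing in for the disjoint set:</p>

```python
import heapq

def kruskal(vertices, edges):
    """Kruskal's algorithm: edges are ((v1, v2), cost) tuples, as in the article."""
    parent = {v: v for v in vertices}

    def find(v):
        # find the root of v's tree, compressing the path as we go
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    heap = [(cost, v1, v2) for (v1, v2), cost in edges]
    heapq.heapify(heap)  # stands in for the MinBinaryHeap
    tree, total_cost = [], 0.0
    while heap:
        cost, v1, v2 = heapq.heappop(heap)
        r1, r2 = find(v1), find(v2)
        if r1 != r2:          # different trees: taking the edge creates no cycle
            parent[r1] = r2   # merge the two trees
            tree.append((v1, v2, cost))
            total_cost += cost
    return tree, total_cost

tree, cost = kruskal(["a", "b", "c", "d"],
                     [(("a", "b"), 1.0), (("b", "c"), 2.0),
                      (("a", "c"), 2.5), (("c", "d"), 1.5)])
print(cost)  # 4.5: edges a-b, c-d and b-c form the cheapest spanning tree
```

The a-c edge (cost 2.5) gets rejected because a and c already share a root by the time it is popped — exactly the cycle check the disjoint set exists for.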
<h3 id="an-example">An example</h3>
<p>Let’s demonstrate with a simple example where Joseph, Robert, Otto, Jarnik and Peter (inside joke) live in the same neighbourhood, and they want to create an underground tunnel system between their houses. I don’t know why, but there have been blockbusters making millions of dollars that had worse plots than this, so bear with me. Now, since they’re scientists, they don’t want to put too much work into this, as they have to research some new algorithms (hint to the joke) and want to maximise the time for that. So they want to connect all their houses with the least amount of tunnel needed to get the job done. I can’t give you a more obvious hint as to how to measure that. Minimum spanning trees to the rescue!</p>
<p>To simplify, we will use simple (x,y) coordinates to determine the locations of their houses. The coordinates for each house are:</p>
<ul>
<li>Joseph: (0, 0)</li>
<li>Robert: (0, 2)</li>
<li>Otto: (1, 1)</li>
<li>Peter (3, 2)</li>
<li>Jarnik (3, 0)</li>
</ul>
<p>If we were to plot these locations on an imaginary chart, we would end up with something like this:
<img src="https://www.peculiar-coding-endeavours.com/assets/through_the_trees/graph_nodes.png" alt="graph nodes" /></p>
<p>which makes for a decent enough estimation of the relative locations of the houses. Next up, we need to consider all possible tunnels (edges in the graph) we can dig between each of the houses. Simple graph theory tells us we will have (n * (n-1)) / 2 edges to check, where n is the number of nodes or houses. Thus, we will have 10 edges to go through, which cover all possible edges between all of the houses. As an example, this will suffice, since we can easily reason about it as well as visualise it. When considering this type of problem in real life however, such as connecting hubs on a network or houses in a telephone or power grid, it’s easy to imagine the need for a fast algorithm, as the number of candidate edges will quickly run into the millions.</p>
<p>Besides building a list of all possible edges, we need to calculate the cost per edge. For our example, the costs are just Euclidean distances. I’m sure you remember your math from long ago, but as a reminder, we just take the Pythagorean theorem A² + B² = C² and give it a spin, giving us a simple way to calculate the distances in this 2-dimensional plane. By way of example, the distance between Otto and Peter is the root of ((3-1)² + (2-1)²), which equals 2.24 (rounded to 2 decimals). In this way, we calculate the costs between all nodes and add them to the edges, achieving the situation below:</p>
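<p>Generating those candidate edges and their costs takes only a few lines. This sketch (my own helper names, using the coordinates listed above) builds all (n * (n-1)) / 2 pairs and checks the Otto-Peter distance we just worked out by hand:</p>

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)

houses = {"Joseph": (0, 0), "Robert": (0, 2), "Otto": (1, 1),
          "Peter": (3, 2), "Jarnik": (3, 0)}

# all (n * (n - 1)) / 2 candidate edges for n = 5 houses
edges = [((a, b), dist(houses[a], houses[b]))
         for a, b in combinations(houses, 2)]

print(len(edges))  # 10
cost = dict(edges)[("Otto", "Peter")]
print(round(cost, 2))  # 2.24, matching the hand calculation above
```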
<p><img src="https://www.peculiar-coding-endeavours.com/assets/through_the_trees/graph_all_connections.png" alt="graph all connections" /></p>
<p>This is basically our starting point, our initial graph from which we need to distill our minimum spanning tree. I would invite you to once again read the algorithm steps, summarized for your convenience:</p>
<ol>
<li>Create a graph G where each vertex V is a separate tree. We did that when plotting out all houses on our imaginary chart.</li>
<li>Create a collection C containing <strong>all possible edges</strong> in G. That’s the graph you found just above this list of steps. Our minimum spanning tree will consist of a <strong>subset of those edges</strong>, thereby connecting all points with the <strong>cheapest possible edges</strong> while avoiding <strong>cycles</strong>.</li>
<li>While C is non-empty and G is not yet spanning: remove the <strong>edge E with minimum weight</strong> from C. If E connects 2 different trees in G, add it to the spanning tree, thereby combining 2 trees into a single new tree.</li>
</ol>
<p>And that’s what I leave up to you to work out for yourself. Again: we’re not in instant gratification land here ;-) it’s time to put those brain cells to work. But remember, it’s quite simple: you start from the graph without any edges. Then we consider each edge in increasing order of cost. That would be either the edge between Robert and Otto, or Joseph and Otto. It doesn’t really matter. After that, the cheapest edge is that between Peter and Jarnik or between Robert and Joseph. Then the edges between Otto and Peter, and Otto and Jarnik are considered. When thinking about all those, watch out, since adding some could create a cycle, which you must avoid. Which ones you pick (in the case of equal cost) doesn’t really matter for the end result: your spanning tree could look slightly different, but the total cost remains the same. The next cheapest after that are the edges between Joseph and Jarnik, and between Robert and Peter. Again, watch out for cycles, and follow the steps in the algorithm and the logic in the data structures! The last edges are those between Robert and Jarnik and between Joseph and Peter. And that makes 10 edges you checked, depleting your priority queue, ending the algorithm and leaving you with the total_cost variable containing 7.06 as the minimum total cost, and a minimum spanning tree looking something like this:</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/through_the_trees/graph_minimal_connections.png" alt="graph minimal connections" /></p>
<p>Again, the actual layout of your result could differ a bit, but the total cost should be the same. If the total cost you conjured up differs, you made an error in your reasoning somewhere, and I would encourage you to go over it again. So, to connect all their houses, the guys need to dig at least 7.06 (miles, meters, whatever your metric is) of tunnel. At least they found out fast enough, thanks to Joseph’s work. Now, some of the other guys also developed algorithms that are very similar, but I’ll leave that for you to discover ;-)</p>
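<p>If you’d rather check your reasoning against running code, this standalone sketch walks the same steps over all ten edges (a sorted list stands in for the heap here, and a plain dict union-find for the DisjointSet class) and reproduces the 7.06 total:</p>

```python
from itertools import combinations
from math import dist

houses = {"Joseph": (0, 0), "Robert": (0, 2), "Otto": (1, 1),
          "Peter": (3, 2), "Jarnik": (3, 0)}
# all candidate edges, cheapest first (a sorted list instead of a heap)
edges = sorted((dist(houses[a], houses[b]), a, b)
               for a, b in combinations(houses, 2))

parent = {h: h for h in houses}

def find(h):
    # root of h's tree, with path compression
    while parent[h] != h:
        parent[h] = parent[parent[h]]
        h = parent[h]
    return h

total, tunnels = 0.0, []
for cost, a, b in edges:          # consider edges in increasing order of cost
    ra, rb = find(a), find(b)
    if ra != rb:                  # different trees: no cycle, dig this tunnel
        parent[ra] = rb
        tunnels.append((a, b))
        total += cost

print(round(total, 2))  # 7.06 units of tunnel, as claimed
print(len(tunnels))     # 4 tunnels: n - 1 edges for n = 5 houses
```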
<h3 id="the-running-time">The running time</h3>
<p>Now, how slow or fast is Kruskal’s algorithm? Well, as I said several times: it depends on the data structures. But to look at the running time, we should consider the 2 main steps:</p>
<ol>
<li>Getting the sorted (by edge weight or cost) list of edges to consider: that’s where the binary heap comes in. It provides our edges in O(n log n) time, which is as good as it’s going to get for comparison sorting algorithms.</li>
<li>Processing each edge: thanks to the disjoint set, we can count on near-constant running times.</li>
</ol>
<p>As we know, the total running time is determined by the slowest part of our algorithm. In this case, that’s the sorting of the edges. That makes Kruskal’s algorithm perform in O(|E| log |V|). Now, if there were a scenario where some pre-processing had been done, and the edges were already provided to us in increasing order, we would not need to perform the sorting work. In that case we can achieve near-linear running times, since we would perform a near-constant time operation for each edge, which would make the algorithm perform in close to O(|E|) time.</p>
<p>And there you have it: my implementation and explanation of Kruskal’s algorithm. I hope you had a nice read and at least picked up some nice data structures to get you inspired. Feel free to copy the code and play around with it, keep in mind this is only really scratching the surface of all cool stuff out there, and remember that algorithms and data structures like these are the engines behind all the cool technology we have the privilege of using every day. So next time you consider a topic to study, give this a whirl. I promise you it will pay dividends in the future, and the JavaScript frameworks will still be there when you get back ;-)</p>
Sun, 20 Dec 2020 16:30:00 +0000
https://www.peculiar-coding-endeavours.com/2020/through-the-trees/
https://www.peculiar-coding-endeavours.com/2020/through-the-trees/
Tags: algorithms, data structures, kruskal, graph, binary heap, disjoint set, priority queue, Tech, Algorithms, Data structures

Image classification: MLP vs CNN

<p>In this article, I will make a short comparison between the use of a standard MLP (<strong>multi-layer perceptron</strong>, or <strong>feed forward network</strong>, or vanilla neural network, whatever term or nickname suits your fancy) and a CNN (<strong>convolutional neural network</strong>) for image recognition using supervised learning.</p>
<p>It’ll be clear that, although an MLP could be used, CNNs are much more suited for this task, since they take the dimensional information of a picture into account, something MLPs do not do.</p>
<p>When thinking about providing this article as an example, I went back and forth a few times, and ultimately decided to make it <em>somewhat</em> beginner-friendly. Thus, I will provide explanations of some basic concepts like perceptrons, layers, logistic regression, activation or cost functions, gradient descent, overfitting and possible counter-measures against it, and so on. Don’t expect a full in-depth tutorial about the full theory behind it all though. There are nowadays so many courses to be found, like the ones in the <a href="https://eu.udacity.com/school-of-ai" target="_blank">Udacity School of AI</a> (which I highly recommend!), or <a href="https://www.coursera.org/learn/machine-learning" target="_blank">Coursera Machine Learning</a>, where you can learn all about the basic concepts and even implement some examples. Now, I don’t have too much in-depth experience with online AI courses other than from those 2 sources, but what I did notice when looking at other courses is that a lot of them do explain the core concepts and theory behind it all, but fail to convert that into actual applicable skills (the Udacity ones being an exception to that, again, invest in them, they are well worth it).</p>
<p>So what I decided to go for is to limit the theory and focus on application. I will explain some concepts from time to time, but you’d do best to build up some basic theoretical knowledge about the core concepts first. Together with the explanations I give here, that should be enough to understand what is going on. If you really want to fully comprehend every single detail (and preferably much more than that), be realistic and dig further and deeper. The courses I referred to above are very interesting and quite thorough, although of course they are only the beginning of this broad topic. Explaining everything that goes on is not doable in a short article. I just hope to demonstrate some interesting stuff, explain some concepts, and ultimately entice you to start learning more about this amazing field of technology.</p>
<h3 id="the-tools">The tools</h3>
<p>All code in the article is using Python. In order to build the neural networks and do the training, I used <a href="https://keras.io" target="_blank">Keras</a>, with <a href="https://www.tensorflow.org" target="_blank">TensorFlow</a> as the backend. I went with TensorFlow-GPU to be more specific (to shorten the training time). Of course you can use the standard version of TensorFlow that will run on the CPU, but that will definitely take a bit more time. Alternatively, run this code in the cloud. AWS is a good start, they have several GPU machine learning options to choose from. Other Python libraries I used are <a href="http://www.numpy.org" target="_blank">NumPy</a> and <a href="https://matplotlib.org" target="_blank">Matplotlib</a>, which are basically the usual suspects when doing this line of work. You can check out the <a href="https://github.com/tomvanschaijk/mlp_vs_cnn" target="_blank">GitHub link</a> to the repository, where you will find this article in the form of a Jupyter notebook, together with the requirements.txt file that you can use to set up the virtual environment, so that you can start playing around with it for yourself.</p>
<h2 id="the-dataset">The dataset</h2>
<p>As you know (or maybe not, just take my word for it), having a clean, complete dataset is very important, so in order to avoid most of the clean-up and preprocessing steps and focus on the actual neural network, I used a prepared example, namely the <strong>CIFAR10</strong> dataset, which is included in the installation of Keras. This dataset consists of 50.000 (that’s 50,000 for you using a dot as a decimal separator) 32x32 pixel color training images, labeled over 10 categories, and 10k (see how I avoided the formatting problem there?) test images. Why we have training and testing images, and what that means, I’ll touch on in a minute.</p>
<h3 id="mlp">MLP</h3>
<p>Let’s start with a basic explanation of what an MLP actually is. In fact, the description “multi-layer perceptron” pretty much says it all (and says nothing at the same time). I trust the “multi-layer” part will sort itself out from the moment you know a bit more about what a perceptron is. Here it is for you: a perceptron is quite simply the most basic neural network you can think of. As you probably already know, a biological neuron has dendrites, a cell body and an axon, like this charming specimen:</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/neuron.jpg" alt="neuron" /></p>
<p>the equivalents of that for an artificial neuron are the input channels, a processing stage, and an output, as displayed here:</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/singleperceptron.png" alt="Perceptron" /></p>
<p>What happens here is really rather simple (and I’m going to speed through this, so sit tight): imagine a simple function with a number of inputs x<sub>1</sub>, x<sub>2</sub>, … all the way to x<sub>n</sub>. When the inputs come in, each gets multiplied by the weight value assigned to that particular input: so x<sub>1</sub> gets multiplied by w<sub>1</sub>, x<sub>2</sub> by w<sub>2</sub> and so on. All the resulting values are then summed up into a single value. We also have what’s called a bias (nothing crazy, it’s like the constant in linear regression), which is also added to the sum. Finally, the result of these calculations has to be turned into an output signal. This is done by feeding the result to an <strong>activation function</strong>.</p>
<p>The activation function in the last step, designated with the funny Greek symbol (just say <strong>sigmoid</strong>), is the function that will transform the result of the processing of our inputs to a signal for the outside world. That statement in and of itself doesn’t help much, I realize that. But think about what an axon in a biological neuron does. It “fires” a response or output, or it does not. The same must happen for our artificial neuron. The result of the calculation inside the neuron could be anything ranging from -inf to +inf. After all, we receive several inputs, which we then have to combine with the weights we chose, and add the results of those multiplications (and the bias) to get one single numeric result. Inside of the neuron, it doesn’t really matter what that result looks like or how big or small it is, but we obviously care, since we are actually trying to make the perceptron do something.</p>
<p>The most basic use case is simple classification into 2 categories. So for example: if the result of the function is above a certain value, classify into one category; otherwise, designate the other category. So, in reality, we are outputting a <strong>probability</strong>. We want to check the probability that the result of our function makes a certain set of input variables end up in one or the other category. So we use a sigmoid function for that (in this case, the logistic function as displayed below this paragraph), which takes any value and transforms it to a value between 0 and 1, which is basically our probability. There are many other activation functions (softmax for multi-class classification, the rectified linear unit or ReLU, the hyperbolic tangent, …), each with their own uses and properties. I’m not going to go deep into those; there are several sources out there to learn more about them. For now, you’ll find below what the logistic sigmoid function looks like, as a formula and as a chart.</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/logisticsigmoid.jpg" alt="Sigmoid" /></p>
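<p>The whole computation described above fits in a few lines. Here is a minimal sketch of a single perceptron doing a forward pass with the logistic sigmoid; the inputs, weights and bias are arbitrary example values, not learned ones:</p>

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    """Multiply each input by its weight, sum up, add the bias,
    then feed the result to the activation function."""
    return sigmoid(np.dot(w, x) + b)

# Example: three inputs with hand-picked weights and bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.6, -0.1])
b = 0.1
print(perceptron(x, w, b))  # a single probability between 0 and 1
```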
<p>So, in very simple terms, that is what a perceptron is and does. So what can you do with it? Well, as I told you above, you could use it to do 2-class classification. For example, to predict whether a student passes a course based on the points he or she scored on several tests. Or you could emulate AND, OR, or NOT functions. They basically do the same thing, right? Based on a number of inputs (for example: TRUE, FALSE, FALSE, TRUE, TRUE) you determine whether the end result of applying a logical operator to those inputs, results in the output being TRUE or FALSE. So you could take a dataset of the results of applying, for example, the AND function to 4 input variables (each having the value TRUE or FALSE), and predict for any combination of 4 values whether the output is TRUE or FALSE. Or translated to neural network terms: given 4 input variables, what is the probability of the output being TRUE?</p>
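<p>The AND example from the previous paragraph can be emulated with a single perceptron and a hard threshold. The weights and bias below are picked by hand (not learned) so the weighted sum only crosses zero when every input is TRUE:</p>

```python
def and_perceptron(inputs):
    """A single perceptron emulating the AND function over boolean
    inputs (given as 0s and 1s). Weights and bias are hand-picked:
    only the all-ones input pushes the weighted sum above zero."""
    weights = [1.0] * len(inputs)
    bias = -len(inputs) + 0.5
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return total > 0

print(and_perceptron([1, 1, 1, 1]))  # True
print(and_perceptron([1, 0, 1, 1]))  # False
```

OR and NOT can be emulated the same way by shifting the bias; XOR, famously, cannot be done with a single perceptron, which is one motivation for stacking them into layers.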
<p>Now, at the start, all neurons have random weights and a random bias. In an <strong>iterative process</strong> called <strong>feedforward</strong> and <strong>backpropagation</strong>, the weights and biases are gradually shifted so that each consecutive result is a bit closer to the desired output. In this fashion, the neural network gradually moves towards a state where the desired patterns are “learned”. That is, in one of the most extreme nutshells, what the process is all about. If you require a bit more detail, look up <strong>loss function</strong> (which is how we determine how “off” our network is in its estimations), and of course <strong>feedforward</strong> and <strong>backpropagation</strong>.</p>
<p>So what about the “multi-layer” part? Well, it’s clear that a simple perceptron has a limited set of use cases. The world starts to look extremely interesting however, if we start combining them. In very simple terms we can do that by</p>
<ul>
<li>sending the inputs not to 1 perceptron, but to several</li>
<li>treating the output of each of the perceptrons as inputs for another perceptron, or even multiple layers of perceptrons</li>
</ul>
<p>So whereas 1 perceptron can classify an input in 2 categories, achieving (if displayed on a graph) something like this (basically finding that ideal line that separates the two blobs of points, thereby being able to categorize each new point in one of those 2 categories):</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/2classification.png" alt="2 class classification" /></p>
<p>a more elaborate amalgam (I love that word) of perceptrons, orchestrated in several layers (I love English in general) as such:</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/mlp.png" alt="mlp" /></p>
<p>where we see all inputs (in the first column) be <strong>fully connected</strong> to each neuron in the second layer (or column), which is in turn connected to each neuron in the third layer, whose inputs then go to the output layer, which then outputs a probability, could look something like this:</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/hyperplane.png" alt="hyperplane" /></p>
<p>Now please realize this is all explained in very simple, basic terms, in order not to complicate matters in this tutorial. In practice, this kind of MLP will quickly turn into an almost obscure brew of linear functions, activation functions, backpropagation, hyperparameter optimization, matrix multiplications and a mix of funny terms and techniques. But in the end, it all boils down to as simple an explanation as provided above. Just with many more grimy details (that matter a lot).</p>
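<p>To make the “layers of perceptrons” idea concrete: a fully connected layer is just a matrix multiplication plus a bias, followed by an activation. Here is a minimal sketch of a forward pass through a tiny two-layer network; all sizes and the random weights are arbitrary illustrations, not a trained model:</p>

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    # Rectified linear unit: max(0, x) element-wise
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny fully connected network: 4 inputs -> 3 hidden neurons -> 1 output.
# Each layer is a weight matrix (one row per neuron) and a bias vector.
w1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
w2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

def forward(x):
    hidden = relu(w1 @ x + b1)        # layer 1: linear step + activation
    return sigmoid(w2 @ hidden + b2)  # output layer: a probability

x = np.array([0.2, -0.4, 0.9, 0.1])
print(forward(x))  # one value between 0 and 1
```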
<h3 id="just-one-more-thing">Just one more thing</h3>
<p>Before we start diving into code, just a few words. Remember how I told you in the beginning that we do <strong>supervised</strong> learning? Just a quick explanation of that: it basically means that we will provide our neural network with a number of, let’s say, “examples”. Translated to the CIFAR10 dataset, we will initialize the weights in our network (well, TensorFlow will), and start feeding it the images in the training dataset. For each image, we will check how the network classifies it (in which of the 10 categories it falls according to our network). No doubt, it’ll screw up often in the beginning. We know it screws up, because we actually already know which category an image falls in. Those are the <strong>labels</strong> of the images, basically identifying their category. We will compare the prediction our network makes to the actual value. Each mistake enables us to alter the weights of the neurons in our network. Doing this often enough should lead to an increasingly accurate estimation, until we hit a percentage of accuracy that satisfies our goals. For more details about this: look up <strong>maximum likelihood</strong> (which you would want to maximize) or even better <strong>cross-entropy</strong> (which you want to minimize, and which is more interesting to calculate because sums are better than multiplications, and logarithms are easier as well - just look it up, you’ll get what I mean). While you’re at it, look up <strong>logistic regression</strong> and <strong>gradient descent</strong>, which will clarify much more.</p>
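<p>To give the cross-entropy idea some shape: for a single one-hot-encoded sample, the categorical cross-entropy is just the negative log of the probability the network assigned to the true class. A minimal sketch (the prediction vectors are made-up examples):</p>

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """Categorical cross-entropy for one one-hot-encoded sample:
    the negative log of the probability given to the true class."""
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0, 0, 1, 0])  # the true label, one-hot encoded

# Two predictions: one confident and correct, one hedging its bets
confident = np.array([0.05, 0.05, 0.85, 0.05])
unsure = np.array([0.25, 0.25, 0.30, 0.20])

# The confident correct prediction is penalized far less
print(cross_entropy(y_true, confident))  # about 0.16
print(cross_entropy(y_true, unsure))     # about 1.20
```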
<p>Lastly, I just mentioned “a percentage of accuracy that satisfies our goals”. So what is satisfactory in this short little example? Well, we know we have 10 categories, which implies that random guessing would lead to an accuracy of 10%, right? So as far as I’m concerned, anything substantially higher than that - let’s say 20-25% - at the very least demonstrates the merit of the idea. We’ll be testing our network on the testing set of 10k images, so if our network classifies 2500 of those correctly, we’re at least on to something. Just keep that in mind ;-)</p>
<h3 id="enough-talk-show-me-the-code">Enough talk, show me the code!</h3>
<p>Remember how I said in the introduction I would like to keep this as practical as possible? Now scroll up. So much for promises, right? But trust me, I can’t really make it much shorter than that. But this is where the fun starts. I’ll guide you through every part of the code, explain what it does and why I decided to do it like that. We will first start with the MLP implementation. After that, it’s time to explain just a bit more for you to understand the CNN example. In the end, I’ll display the results of both approaches and you’ll see the difference is quite substantial. So, at long last, here we go!</p>
<h3 id="some-general-stuff">Some general stuff</h3>
<p>Let’s first do some imports and define some functions needed regardless of the MLP or CNN implementation, just to get them out of the way:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span> <span class="k">as</span> <span class="n">nt</span>
<span class="n">Data</span> <span class="o">=</span> <span class="n">nt</span><span class="p">(</span><span class="s">"Data"</span><span class="p">,</span> <span class="s">"x_train y_train x_valid y_valid x_test y_test"</span><span class="p">)</span>
<span class="o">%</span><span class="n">matplotlib</span> <span class="n">inline</span>
</code></pre></div></div>
<p>As you see, hardly anything crazy occurs here. We import NumPy and Matplotlib on the first 2 lines. On line 3, I import namedtuple from the collections library, which I use on line 4 to define my own little type. I do this because, in the end, I will need 3 sets of input x’s and 3 sets of y labels: 1 to do the training, 1 to validate my progress during training, and 1 to test the quality of my network after the training. Storing it all in one simple data variable is just handy. I could have used a plain tuple, but being able to refer to these things by name makes life so much easier. The last line tells Matplotlib to spit out the created plots in place in the notebook.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">visualize_data</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
<span class="n">images_to_show</span> <span class="o">=</span> <span class="mi">36</span>
<span class="n">per_row</span> <span class="o">=</span> <span class="mi">12</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">5</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">images_to_show</span><span class="p">):</span>
<span class="n">pos</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">//</span> <span class="n">per_row</span><span class="p">,</span> <span class="p">((</span><span class="n">i</span> <span class="o">%</span> <span class="n">per_row</span><span class="p">)</span> <span class="o">+</span> <span class="n">per_row</span><span class="p">)</span> <span class="o">%</span> <span class="n">per_row</span><span class="p">)</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplot2grid</span><span class="p">((</span><span class="nb">int</span><span class="p">(</span><span class="n">images_to_show</span> <span class="o">/</span> <span class="n">per_row</span><span class="p">),</span> <span class="n">per_row</span><span class="p">),</span>
<span class="n">pos</span><span class="p">,</span> <span class="n">xticks</span><span class="o">=</span><span class="p">[],</span> <span class="n">yticks</span><span class="o">=</span><span class="p">[])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p>This is a very simple little function that I will use to display 36 of the images in the CIFAR10 dataset, just so you can see what we are working with.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># A chart showing how the accuracy for the training and tests sets evolved
</span><span class="k">def</span> <span class="nf">visualize_training</span><span class="p">(</span><span class="n">hist</span><span class="p">):</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'acc'</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'val_acc'</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'accuracy'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'accuracy'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'epochs'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">([</span><span class="s">'training'</span><span class="p">,</span> <span class="s">'validation'</span><span class="p">],</span> <span class="n">loc</span><span class="o">=</span><span class="s">'lower right'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="c1"># A chart showing our training vs validation loss
</span> <span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'loss'</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'val_loss'</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'loss'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'loss'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'epochs'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">([</span><span class="s">'training'</span><span class="p">,</span> <span class="s">'validation'</span><span class="p">],</span> <span class="n">loc</span><span class="o">=</span><span class="s">'upper right'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p>After the training, I will display a chart where you see how the training process occurs, how the <strong>error loss</strong> decreases and the accuracy increases with each <strong>epoch</strong> (quickly snuck in a new term there; an epoch is an iteration during which all training samples are passed through the network, and the weights are updated with backpropagation). This function just takes the entire training history and displays the charts where we can see the accuracy and loss of the training and validation sets.</p>
<h3 id="mlp-implementation">MLP implementation</h3>
<p>That’s it for the helpers, that wasn’t so bad right? From here on, it’s all MLP related code, so hold on tight!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">keras.models</span> <span class="kn">import</span> <span class="n">Sequential</span>
<span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Activation</span><span class="p">,</span> <span class="n">Flatten</span><span class="p">,</span> <span class="n">Dropout</span>
<span class="kn">from</span> <span class="nn">keras.optimizers</span> <span class="kn">import</span> <span class="n">SGD</span>
<span class="kn">from</span> <span class="nn">keras.utils</span> <span class="kn">import</span> <span class="n">to_categorical</span>
<span class="kn">from</span> <span class="nn">keras.callbacks</span> <span class="kn">import</span> <span class="n">ModelCheckpoint</span>
<span class="kn">from</span> <span class="nn">keras.datasets</span> <span class="kn">import</span> <span class="n">cifar10</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Using TensorFlow backend.
</code></pre></div></div>
<p>What happens here? Well, we import everything from Keras that we need. We will use the Sequential model (another option is the functional API, but right now, the Sequential model is enough), since our neural network will simply be a linear ordering of layers. Furthermore, the layers we will actually use are Dense (fully connected), Activation (to apply an activation function such as sigmoid, relu, …), Flatten (which will flatten our dataset out to a vector) and Dropout (a regularization technique that helps prevent overfitting). I will refrain from explaining more about these layers, which may be a bit annoying to you, but explaining it all in detail would make this article just too big. Just the activation functions or <strong>regularization</strong> techniques alone would fill several chapters of a well-sized textbook, so know that they are extremely important. The same goes for the SGD optimizer (again, chapters to be filled with this, and all the <strong>hyperparameter optimization</strong> that goes on there). From the keras.utils library we dig up the to_categorical function, which we will use to perform <strong>one-hot encoding</strong>. Obviously you want to know what that is, so I suggest you look it up. The short version is that it is a data preprocessing step that converts categorical variables into a form a machine learning algorithm can make better predictions with. In our example we have 10 categories, right? So you can imagine a column filled with values from 0 to 9, each depicting the category id. In order for the algorithms to be able to take this data into account, you convert this one column into 10 columns, each holding a 0 or a 1, depending on whether or not the image belongs to that category. Again, in a nutshell. The last line imports the CIFAR10 dataset into our program.</p>
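<p>If you want to see what that one-hot conversion actually does without pulling in Keras, here is a plain-NumPy sketch of the same idea (the function name is my own, not part of any library):</p>

```python
import numpy as np

def one_hot(labels, categories):
    """Plain-NumPy equivalent of what keras.utils.to_categorical does:
    turn a column of category ids (0..categories-1) into a matrix with
    one column per category, holding a single 1 per row."""
    encoded = np.zeros((len(labels), categories))
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

labels = np.array([3, 0, 9])  # three CIFAR10-style category ids
print(one_hot(labels, 10))    # a 3x10 matrix with one 1 per row
```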
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">preprocess</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">categories</span><span class="p">):</span>
<span class="n">x_train</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span><span class="p">)</span> <span class="o">/</span> <span class="mi">255</span>
<span class="n">x_test</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">x_test</span><span class="p">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span><span class="p">)</span> <span class="o">/</span> <span class="mi">255</span>
<span class="n">y_train</span> <span class="o">=</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">y_train</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="n">y_test</span> <span class="o">=</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">y_test</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="k">return</span> <span class="n">Data</span><span class="p">(</span><span class="n">x_train</span><span class="p">[</span><span class="mi">5000</span><span class="p">:],</span> <span class="n">y_train</span><span class="p">[</span><span class="mi">5000</span><span class="p">:],</span>
<span class="n">x_train</span><span class="p">[:</span><span class="mi">5000</span><span class="p">],</span> <span class="n">y_train</span><span class="p">[:</span><span class="mi">5000</span><span class="p">],</span>
<span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
</code></pre></div></div>
<p>I alluded to it before, but data preprocessing is pretty much everything in deep learning. No matter how great an architecture (which layers, how many, …) we create, pushing poorly optimized or incorrect data through it can pretty much render your results useless. Even with the nicely cleaned-up dataset that Keras gives us, we need to massage it in such a way that our MLP can work with it. Our images are 32x32 pixels with 3 values per pixel (one for each color channel in RGB), each a value between 0 and 255. Now, since we are going to do a massive amount of multiplication, we want to convert our values to floats and divide by 255, so we have values to work with that won’t explode with each iteration or calculation. So we do that for the training and test values. Concerning our labels (the categories in this case), we perform one-hot encoding on them. In the end we return our preprocessed input data and labels as a variable of our predefined type. Note that we select the first 5000 images as validation data, and the remaining 45000 as training data. Our validation data is used to measure the accuracy during training on a different set than the actual training data. Our test data will be used to run the whole neural network against data it has never seen before, in order to have a true test of quality.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">build_mlp</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">categories</span><span class="p">):</span>
<span class="c1"># Create model architecture
</span> <span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Flatten</span><span class="p">(</span><span class="n">input_shape</span><span class="o">=</span><span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">:]))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">categories</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"softmax"</span><span class="p">))</span>
<span class="c1"># Compile the model
</span> <span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">"categorical_crossentropy"</span><span class="p">,</span> <span class="n">optimizer</span><span class="o">=</span><span class="s">"rmsprop"</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">"accuracy"</span><span class="p">])</span>
<span class="k">return</span> <span class="n">model</span>
</code></pre></div></div>
<p>This function is what builds the actual neural network architecture. It receives the data that we got from the preprocess function, as well as the number of categories (labels). Just following the code demonstrates the easy Keras API. We create a Sequential neural network in the first line. Then we add a Flatten layer. This is a fundamental difference between the MLP and a CNN: an MLP uses simple data vectors, arrays if you will, with the x-values of every input image provided. Because our image is a 32x32 matrix, we need to convert it to a flattened vector. That’s what the Flatten layer does for us. It requires (like any first layer in the model would) the input shape. The shape of x_train is 45000x32x32x3 since we have 45000 images of 32x32 pixels with 3 values (RGB) each. That will then get flattened to one long array of numbers.</p>
<p>After that we add a Dense (fully connected) layer with 1000 neurons (why 1000? I can only say hyperparameter optimization is an interesting topic, and this is just an example). The activation function (the function that will be performed on the result of multiplying all x-values with their weights, and then adding them up and adding the bias) is the relu function. I won’t go too deep into it, but what it does for a certain x value is: if the value is lower than 0, return 0. Otherwise return x. So it gives the max of 0 and x.</p>
<p>After that we get a Dropout layer with a parameter of 0.2. In short, a Dropout layer with 0.2 as an argument means that each neuron has a 20% chance of being disregarded during training in each epoch. This is a countermeasure against overfitting (which you should definitely look up). After those 2 layers, we get another fully connected layer, followed by another Dropout.</p>
<p>At the end, the last Dense layer contains “categories” neurons. We know that will be 10, since that is the number of categories. Each of those neurons will output the probability of the input image belonging to that category. The “softmax” activation function is a multiclass categorization function; a simple search online will show you the formula and the reasoning behind it (btw, e = 2.7182818 - and then some). We compile the model with the categorical_crossentropy loss function (the error function we will use to judge the quality of the model, basically the accuracy of the weights of our equations), and rmsprop as optimizer. RMSprop is only one of the optimizers available in Keras; there are others such as Adam, Adagrad, …, each of which has several hyperparameters such as <strong>learning rate</strong>, <strong>learning rate decay</strong>, ways to deal with local minima and so on.</p>
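<p>As a quick illustration (again plain NumPy, not the Keras implementation), softmax turns the raw outputs of the last layer into probabilities that sum to 1:</p>

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, exponentiate, then normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw outputs for 3 hypothetical categories
probs = softmax(scores)
print(probs.sum())  # the probabilities always sum to 1
```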
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Load data
</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">),</span> <span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span> <span class="o">=</span> <span class="n">cifar10</span><span class="p">.</span><span class="n">load_data</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">Data</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
</code></pre></div></div>
<p>So, with those functions out of the way, it’s time to get going. First up: loading the data and filling our predefined datatype variable, nothing complex going on here.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Visualize the data
</span><span class="n">visualize_data</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/cifarimages.png" alt="cifarimages" /></p>
<p>Let’s have a glimpse at our data. I think you can quite easily identify some categories here: trucks, cars, horses, frogs, cats. Now you know what we are working with. These are the images we want our network to identify, purely from the values of their pixels in the RGB range.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Preprocess the data
</span><span class="n">categories</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">unique</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">y_train</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of x_train pre-processing: "</span><span class="p">,</span> <span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of y_train pre-processing: "</span><span class="p">,</span> <span class="n">data</span><span class="p">.</span><span class="n">y_train</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">processed_data</span> <span class="o">=</span> <span class="n">preprocess</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of x_train post-processing: "</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of y_train post-processing: "</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">y_train</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of x_valid post-processing: "</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">x_valid</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of y_valid post-processing: "</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">y_valid</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of x_test post-processing: "</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">x_test</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Shape of y_test post-processing: "</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">y_test</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Shape of x_train pre-processing: (50000, 32, 32, 3)
Shape of y_train pre-processing: (50000, 1)
Shape of x_train post-processing: (45000, 32, 32, 3)
Shape of y_train post-processing: (45000, 10)
Shape of x_valid post-processing: (5000, 32, 32, 3)
Shape of y_valid post-processing: (5000, 10)
Shape of x_test post-processing: (10000, 32, 32, 3)
Shape of y_test post-processing: (10000, 10)
</code></pre></div></div>
<p>After loading the data, of course we need to do some preprocessing to make it usable. I displayed the dimensions for your viewing pleasure. Can you make sense of them? We start out with 50000 32x32 pixel RGB images as x_train data, and their 50000 labels. During preprocessing, we transformed the pixel values to floats between 0 and 1, and split the training set into 45000 training and 5000 validation images. The labels get one-hot encoded, so the 45000x1 training labels become a 45000x10 matrix: one column for each category, where 9 columns contain a 0 and a single one contains a 1. Our data is ready to pump through the network!</p>
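<p>The actual preprocess function was defined earlier in the article; as a hedged sketch of the same idea in plain NumPy (the function name here is illustrative, not the real helper), the three steps are: scale the pixels, one-hot encode the labels, and split off a validation set:</p>

```python
import numpy as np

def preprocess_sketch(x_train, y_train, categories, valid_size):
    # 1. Scale pixel values from [0, 255] to floats in [0, 1]
    x_train = x_train.astype("float32") / 255.0
    # 2. One-hot encode the integer labels (np.eye picks the matching row)
    y_train = np.eye(categories)[y_train.flatten()]
    # 3. Split off the last valid_size samples as a validation set
    x_valid, y_valid = x_train[-valid_size:], y_train[-valid_size:]
    x_train, y_train = x_train[:-valid_size], y_train[:-valid_size]
    return x_train, y_train, x_valid, y_valid

# Tiny fake dataset: 10 "images" of 32x32x3, labels 0-9
x = np.random.randint(0, 256, size=(10, 32, 32, 3))
y = np.arange(10).reshape(-1, 1)
xt, yt, xv, yv = preprocess_sketch(x, y, categories=10, valid_size=2)
print(xt.shape, yt.shape, xv.shape, yv.shape)
# (8, 32, 32, 3) (8, 10) (2, 32, 32, 3) (2, 10)
```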
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Build mlp
</span><span class="n">mlp</span> <span class="o">=</span> <span class="n">build_mlp</span><span class="p">(</span><span class="n">processed_data</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"MLP architecture:"</span><span class="p">)</span>
<span class="n">mlp</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MLP architecture:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_2 (Flatten) (None, 3072) 0
_________________________________________________________________
dense_3 (Dense) (None, 1000) 3073000
_________________________________________________________________
dropout_3 (Dropout) (None, 1000) 0
_________________________________________________________________
dense_4 (Dense) (None, 512) 512512
_________________________________________________________________
dropout_4 (Dropout) (None, 512) 0
_________________________________________________________________
dense_5 (Dense) (None, 10) 5130
=================================================================
Total params: 3,590,642
Trainable params: 3,590,642
Non-trainable params: 0
_________________________________________________________________
</code></pre></div></div>
<p>Here we call on the build_mlp function we built earlier. Do you understand the MLP architecture? Let’s go over it together:</p>
<ul>
<li>The flatten layer receives each image as a 32x32x3 matrix, which it flattens to a vector of length 3072. No trainable parameters here; this is just getting the image ready for our dense layers.</li>
<li>Our first dense layer had 1000 neurons, remember? Each of those 1000 neurons computes a linear function over all 3072 x-values (because a dense layer is fully connected, so every node in one layer is connected to every node in the next layer). But 3072x1000 is 3072000, so we are missing some. Those are the biases, remember those? Each neuron adds one, so it’s not 3072 parameters per neuron, but 3073, which results in 3073000 parameters. That’s 3073000 weights ready to be trained.</li>
<li>Then we have our first dropout layer. That’ll do what I explained above, with a likelihood of 20%.</li>
<li>Next up, another dense layer, in which each of the 512 nodes will receive the 1000 output x-values (+ a bias) of the previous layer, giving us 512x1001=512512 trainable parameters. That’s starting to add up, don’t you think..?</li>
<li>Another dropout; you know this one by now. Judging from the amount of parameters, I guess you understand why it’s good to use.</li>
<li>The last dense layer is our output layer. We have 10 categories, so 10 neurons, each outputting the probability of an image being in that category. The 512 outputs of the previous layer (+ a bias) get fed into each of those, giving us 10x513=5130 weights ready to be optimized.</li>
</ul>
<p>Adding all those up will result in the amount of parameters that are so finely summed up for you by Keras. I think you agree that’s a LOT of work to be done. Starting to see the problem..?</p>
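<p>You can verify Keras’s arithmetic yourself. As a quick check (the helper name here is just for illustration), a dense layer has (inputs + 1 bias) × neurons trainable parameters:</p>

```python
def dense_params(inputs, neurons):
    # Each neuron has one weight per input, plus one bias
    return (inputs + 1) * neurons

layer1 = dense_params(3072, 1000)  # dense layer after the 32*32*3 = 3072 flatten
layer2 = dense_params(1000, 512)
layer3 = dense_params(512, 10)     # output layer, one neuron per category
print(layer1, layer2, layer3)      # 3073000 512512 5130
print(layer1 + layer2 + layer3)    # 3590642, matching the Keras summary
```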
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mlp_weights_path</span> <span class="o">=</span> <span class="s">"saved_weights/cifar10_mlp_best.hdf5"</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Train the mlp
</span><span class="n">checkpointer_mlp</span> <span class="o">=</span> <span class="n">ModelCheckpoint</span><span class="p">(</span><span class="n">filepath</span><span class="o">=</span><span class="n">mlp_weights_path</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">save_best_only</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">hist_mlp</span> <span class="o">=</span> <span class="n">mlp</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">processed_data</span><span class="p">.</span><span class="n">x_train</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">y_train</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">processed_data</span><span class="p">.</span><span class="n">x_valid</span><span class="p">,</span>
<span class="n">processed_data</span><span class="p">.</span><span class="n">y_valid</span><span class="p">),</span>
<span class="n">callbacks</span><span class="o">=</span><span class="p">[</span><span class="n">checkpointer_mlp</span><span class="p">],</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Train on 45000 samples, validate on 5000 samples
Epoch 1/100
45000/45000 [==============================] - 15s 332us/step - loss: 1.7700 - acc: 0.3732 - val_loss: 1.7557 - val_acc: 0.3632
Epoch 00001: val_loss improved from inf to 1.75572, saving model to saved_weights/cifar10_mlp_best.hdf5
Epoch 2/100
45000/45000 [==============================] - 14s 321us/step - loss: 1.7658 - acc: 0.3761 - val_loss: 1.6900 - val_acc: 0.3952
Epoch 00002: val_loss improved from 1.75572 to 1.69003, saving model to saved_weights/cifar10_mlp_best.hdf5
Epoch 3/100
45000/45000 [==============================] - 14s 321us/step - loss: 1.7655 - acc: 0.3772 - val_loss: 1.9421 - val_acc: 0.3354
................
Epoch 00098: val_loss did not improve from 1.69003
Epoch 99/100
45000/45000 [==============================] - 14s 321us/step - loss: 1.8777 - acc: 0.3590 - val_loss: 1.9564 - val_acc: 0.3396
Epoch 00099: val_loss did not improve from 1.69003
Epoch 100/100
45000/45000 [==============================] - 14s 322us/step - loss: 1.8580 - acc: 0.3591 - val_loss: 2.0851 - val_acc: 0.3490
Epoch 00100: val_loss did not improve from 1.69003
</code></pre></div></div>
<p>So after we have our actual neural network structure defined, it’s time to start training it. ModelCheckpoint is a handy little utility that you can use to store the weights during the training process. This enables us to load them into our network later. The fit function that we call on the model takes several parameters, which I will go over briefly:</p>
<ul>
<li>The first argument is the x-values of the training set to be fed into our network</li>
<li>The second argument is the labels, the categories that each respective set of x-values should result in. Our network will make a prediction on the x-values, compare the prediction with the given label, and correct the weights in each neuron accordingly.</li>
<li>batch_size: instead of pumping our dataset through the network in one go, we can make smaller batches. The size of the batch is part of hyperparameter optimization; the implications of making it large or small require some understanding, but 32 is often a good starting point. A larger value gives you computational benefits because of matrix multiplication, but requires more memory, which increases the chances of running out of resources. Smaller batch sizes introduce more noise in the error calculation, which can be helpful (overcoming local minima). Experiment with 32, 64, 128 and other values depending on your application and data.</li>
<li>verbose=1 controls the amount of feedback you get during the training process</li>
<li>save_best_only tells our ModelCheckpoint to only save the neuron weights when improvement is being made (on our validation set)</li>
<li>epochs: just interpret this as iterations, to keep it simple. 1 epoch is equivalent to pumping the entire dataset through the network 1 single time.</li>
<li>validation_data are the x-values and y-labels that we want to validate our updated network against, to judge whether things are improving</li>
<li>callbacks: an array with, well guess what, callbacks to execute during training. We pass it the checkpointer to save the weights.</li>
<li>shuffle: whether to shuffle the training data before each epoch</li>
</ul>
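<p>One small sanity check on the batch_size argument: with 45000 training samples and batches of 32, every epoch performs ceil(45000 / 32) weight updates (the last batch is simply smaller):</p>

```python
import math

samples, batch_size = 45000, 32
steps_per_epoch = math.ceil(samples / batch_size)
print(steps_per_epoch)  # 1407 batches (and weight updates) per epoch
```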
<p>Now, if you execute this yourself in the notebook or in a program, now would be a good time to grab some coffee. Notice we do 100 epochs, so although it’s not too dramatic, your system will be busy for a bit. You can inspect the feedback of each epoch, which displays the time spent, the result of the loss function and the accuracy for the training and validation sets.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">visualize_training</span><span class="p">(</span><span class="n">hist_mlp</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/mlp_hist_acc.png" alt="mlp_hist_acc" /></p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/mlp_hist_loss.png" alt="mlp_hist_loss" /></p>
<p>Here, we visualize our training process. We see the accuracy on our training data move around quite a bit, and the validation set jumps all across the board. Same for the loss. This type of visualization can be handy to quickly pinpoint the exact moment we pass our <strong>goldilocks point</strong> and start overfitting to our training set. Over time, it’s definitely getting worse, which tells us that 100 epochs is pretty much overkill. There are options you can set up in TensorFlow and Keras to make sure the process stops when accuracy doesn’t improve for x number of epochs, so that can save you some time.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mlp</span><span class="p">.</span><span class="n">load_weights</span><span class="p">(</span><span class="n">mlp_weights_path</span><span class="p">)</span>
<span class="n">score_mlp</span> <span class="o">=</span> <span class="n">mlp</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">processed_data</span><span class="p">.</span><span class="n">x_test</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">y_test</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<p>So, after doing the entire training process, we load the optimal weights from the checkpoint file and evaluate the model with these weights against data it never even saw before, namely the test set. I know you are curious about the result, but let’s save that for last, where we compare it to what our next neural network does: the convolutional neural network.</p>
<h3 id="convolutional-neural-networks">Convolutional Neural Networks</h3>
<p>Before jumping into the CNN implementation, we do have to hit the brake a bit, and take a look at what’s not to like about the MLP implementation of this problem. As you might have guessed already, judging from the remarks I made in the MLP architecture paragraph, there were a <strong>whole lot of parameters</strong> to train: well over 3 million parameters. Now, that’s for small images of 32x32 pixels. A natural result is that it took quite some time.</p>
<p>Another very important disadvantage is this: we took those pixels, and we flattened them, instead of using the nice 32x32x3 matrix of data. In short: we <strong>disregarded the spatial properties</strong> of images, and just threw the values in one long row. When thinking about images, that can hardly be ideal. Images are spatially ordered data, so we want to treat them as such. Those dimensional properties will certainly help us in classification, and enable us to achieve a higher degree of precision in our predictions. And exactly that is where CNNs come in to save the day. The bad news is that we have to dive into a tiny bit of theory for that. This is fun stuff however, so let’s go for it. We already saw the use of Dense, Dropout, Activation and Flatten layers in vanilla feed-forward networks. In CNNs we will add just 2 more: Convolutional (big surprise) and Pooling layers. To make it clear what those do, we’ll shortly dive into how a CNN looks at image data to perform its magic.</p>
<h3 id="convolutional-layer">Convolutional layer</h3>
<p>First and foremost, I will keep this explanation rather short and simple. Although CNNs are not hard to understand, there’s a lot that can be said about them, and I just want to communicate the basic premise of how they work, as that is the main message I want to get across.</p>
<p>The main takeaway when using CNNs is that image pixels that are close to each other are more heavily related than pixels far away from each other. So what we do not want is to remove that spatial correlation and “unpack” or “flatten” those pixels to a long array, as we did in the case of MLPs. A better alternative is to keep that matrix alive and look at pixel groups in order to process the raw data points in that matrix. Spatial information matters, and that is exactly the core idea that drives Convolutional layers.</p>
<p>First, let’s introduce the concept of a <strong>feature</strong>. A feature is nothing more than a small 2-dimensional array of values which represents a pattern that we are looking for. In order for a CNN to classify images, we will look for patterns in the image by scanning it piece by piece. We slide a small 2-dimensional window (<strong>kernel</strong>) over the image, and we will look for features. This way, CNNs get a lot better at identifying parts of an image instead of taking the whole image in as one big chunk. Features recognize certain aspects of an image. In the case of our images, features at the start of our network could consist of horizontal, vertical or diagonal lines, and in later layers of our network, we could start recognizing ever more elaborate shapes, circles or squares, up to noses, eyes, lips, and so on.</p>
<p>When an image is pushed through a CNN, the network won’t know where the features we are looking for will be located, so it will look for them in every possible position. In order to calculate the matches to a feature across the whole image, we turn it into a filter. The actual math that is used is called <strong>convolution</strong>, from which Convolutional Neural Networks take their name. The simple gist of it is that each value in the feature is multiplied by the value of the corresponding pixel in the image; then all results are summed up and divided by the total number of pixels in the feature. This way we can identify matches to a certain pattern in a particular part of the image. This process is then repeated for every part of the image. Using all the outputs of those calculations, a new 2-dimensional array is created, which is in fact a filtered version of the original image, “highlighting” the specific features we are looking for. Moving through the layers, more complicated features will be identified, which, visualized, looks something like this:</p>
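<p>To make the sliding-window arithmetic tangible, here is a minimal NumPy sketch of that filtering step (strictly speaking a cross-correlation, which is what CNN frameworks actually compute, and without the division by the number of pixels): multiply the kernel element-wise with each image patch and sum the result.</p>

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over every valid position; at each position,
    # multiply element-wise with the patch underneath and sum the products.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge "feature": responds strongly where a dark column meets a bright one
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(convolve2d(image, kernel))
# the middle column lights up (value 18) exactly where the edge sits
```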
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/features.png" alt="features" /></p>
<p>An added benefit of this approach is that our layers no longer need to be fully connected: not every node in a layer is connected to every node in the next layer. This avoids an explosion in the amount of parameters, and thus we need to do much less work. If this isn’t clear, recall that you slide a window over an image, calculating a result using the pixels in that small window and the corresponding filter values. Within each window position, you calculate a certain result and don’t take the rest of the image into account. That only happens after you slide the window to its next position and redo the process. This principle is called <strong>local connectivity</strong>. An example of the actual way the calculation occurs can be seen here:</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/locality.jpg" alt="locality" /></p>
<p>I hope it’s somewhat clear that, by running a window of a certain small size (another hyperparameter) over an image and doing some simple elementwise matrix multiplications, we end up identifying certain features, those being ever more complicated lines, shapes, and so on. Because we add several filters in each layer, we end up deepening our images. When inputting, for example, a 32x32x3 pixel image (32 pixels wide, 32 pixels high, and 3 layers in the case of RGB images), and throwing 16 filters at it, we end up with a 32x32x16 matrix. Each slice will still represent the image, but will contain information about shapes and lines instead of simple color values.</p>
<h3 id="pooling-layer">Pooling layer</h3>
<p>While Convolutional layers are responsible for <strong>deepening</strong> an image by introducing multiple filters, Pooling layers make sure the width and height of an image get reduced. This is done by way of the same concept of sliding a window of certain dimensions over the input matrix, and outputting a value, depending on the type of Pooling layer:</p>
<ul>
<li>MaxPooling layers look at the values that the kernel is currently sliding over, and keep only the maximum value. Using an example, this looks like the picture below, effectively cutting the width and height dimensions in half. Note the term <em>stride</em>, which is nothing more than the number of steps the window takes. If it is 1, the window will just slide one pixel over after selecting the largest value. In this example, the window size is 2x2 and the stride is also 2, which means we will have a total of 4 positions to move to and calculate a new result.</li>
</ul>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/maxpooling.png" alt="maxpooling" /></p>
<ul>
<li>Global pooling layers do not use a stride or kernel size, but quite simply take the average (global average pooling) or the maximum (global max pooling) of the entire set of values in a single feature map, and output 1 value per map. This is a more extreme way of reducing dimensionality.</li>
</ul>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/globalpooling.png" alt="globalpooling" /></p>
<p>As such, the result of a Pooling layer is that an image gets reduced in width and height, while keeping the filters we added using Convolutional layers. Slowly but surely, as we push through the network, our image is transformed from a matrix that is much wider and higher than it is deep, into a very small but very deep matrix, until it’s ready to go to a Dense layer, which as we know requires a one-dimensional vector.</p>
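<p>The MaxPooling step described above can be sketched in a few lines of NumPy (2x2 window, stride 2, halving width and height):</p>

```python
import numpy as np

def max_pool_2x2(x):
    # Reshape so each 2x2 block gets its own pair of axes, then take the max
    # over those axes; width and height are both cut in half.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 8, 3],
              [1, 0, 4, 9]])
print(max_pool_2x2(x))
# Each 2x2 block is reduced to its largest value:
# [[6 5]
#  [7 9]]
```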
<p>That’s all the theoretical gibberish you’ll get thrown at you! I hope you can agree it wasn’t that bad, and quite interesting as well to see how a computer treats the information in an image, as compared to the way humans do it! For those waiting for more code, now’s the time to put all that stuff into practice. As a summary, to hammer the point home, a complete CNN architecture could look something like the picture below. Again, realize that this explanation (even more so for CNN’s than was the case for MLP’s) is extremely brief and simple. Much more is to be said about all the intricacies here, but this short introduction does give you an idea of the basic concepts.</p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/cnnexample.png" alt="cnnexample" /></p>
<h3 id="cnn-implementation">CNN implementation</h3>
<p>And here we are, let’s implement an actual CNN. As with the MLP, I’ll guide you through every step of the process. Surely, if you paid attention, some parts require much less explanation. I will provide more explanation in parts where I think it’s required though, no worries. Without further yapping on, here goes:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Conv2D</span><span class="p">,</span> <span class="n">MaxPooling2D</span>
</code></pre></div></div>
<p>First, we will import the 2D Convolutional and MaxPooling layers that we need to create our CNN architecture.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">build_cnn</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">categories</span><span class="p">):</span>
<span class="c1"># Create model architecture
</span> <span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">filters</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">,</span>
<span class="n">input_shape</span><span class="o">=</span><span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">:]))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">filters</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">filters</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.3</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Flatten</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="mi">500</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"relu"</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.4</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">categories</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"softmax"</span><span class="p">))</span>
<span class="c1"># Compile the model
</span> <span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">"categorical_crossentropy"</span><span class="p">,</span> <span class="n">optimizer</span><span class="o">=</span><span class="s">"rmsprop"</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">"accuracy"</span><span class="p">])</span>
<span class="k">return</span> <span class="n">model</span>
</code></pre></div></div>
<p>As expected, we now start implementing the architecture of our CNN. Note that we don’t have to do any special or extra preprocessing of our data; we can just start with the same preprocessed data we used for the MLP. For starters, we again define a Sequential model. Then we add our very first Convolutional layer. This consists of 16 filters, each of which uses a 2x2 window that we slide over the image. The padding is set to “same”, which means that blank pixels are added at the edges of the image in case the image width and height don’t let the convolutional window nicely cover each original pixel block. Our activation function is again the familiar rectified linear unit, and as always we provide the input shape of 32x32x3.</p>
<p>After each Convolutional layer, we place a MaxPooling layer with the same window size. As you can see, this combination is applied several times. The effect is that the width and height of our images get reduced (halved each time, because of the 2x2 window size), while the depth grows. Over the course of the entire network we slowly convert our 32x32x3 image into a long array of numbers.</p>
<p>After 3 sets of Convolutional+Pooling layers, we add a Dropout layer. Since the output at that point is not yet a 1-dimensional vector, we require a Flatten layer. We connect this to a Dense layer with 500 neurons, followed by another Dropout layer and a final Dense output layer that calculates the probabilities for each of the categories using softmax. We compile the model with the categorical_crossentropy loss function, using rmsprop as optimizer. As always, we aim for accuracy.</p>
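<p>If you want to convince yourself of that shrinking effect, the shape bookkeeping can be traced in a few lines of plain Python (a sketch that mirrors the filter counts and 2x2 pooling of the architecture above; no Keras involved):</p>

```python
# Trace the spatial dimensions through the Conv2D/MaxPooling2D stack.
# A "same"-padded convolution keeps width/height; each 2x2 max-pooling halves them.
width = height = 32
depth = 3                      # RGB input
for filters in [16, 32, 64]:   # the three convolutional blocks
    depth = filters            # Conv2D deepens the image to `filters` channels
    width //= 2                # MaxPooling2D(pool_size=2) halves the width...
    height //= 2               # ...and the height
print(width, height, depth)    # 4 4 64
print(width * height * depth)  # 1024 values after Flatten
```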
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Build cnn
</span><span class="n">cnn</span> <span class="o">=</span> <span class="n">build_cnn</span><span class="p">(</span><span class="n">processed_data</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"CNN architecture:"</span><span class="p">)</span>
<span class="n">cnn</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CNN architecture:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 32, 32, 16) 208
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 16, 32) 2080
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 8, 8, 64) 8256
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 4, 4, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 1024) 0
_________________________________________________________________
dense_1 (Dense) (None, 500) 512500
_________________________________________________________________
dropout_2 (Dropout) (None, 500) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 5010
=================================================================
Total params: 528,054
Trainable params: 528,054
Non-trainable params: 0
_________________________________________________________________
</code></pre></div></div>
<p>As we did for our MLP, we print out a summary of our CNN architecture and try to explain the numbers we are seeing. Each image enters the first Convolutional layer as a 32x32x3 matrix. Remember, that is a vast distinction from the MLP, which needed a flattened array. Our output shape here is 32x32x16, since we use 16 filters, effectively deepening the image. Concerning the number of parameters: remember our window size was 2x2, and we convolve it over each of the 3 RGB layers. As mentioned we have 16 filters, and we shouldn’t forget the biases either. That gives us ((2x2x3)+1)x16=208 parameters. Nowhere near as many as the first layer of the MLP.</p>
<p>Then we get our MaxPooling layer, which takes that 32x32x16 matrix and keeps the maximum of each 2x2 window, for each of the 16 filters, leaving us with a 16x16x16 matrix. On to the second Convolutional layer then. This outputs a 16x16x32 matrix, since we have 32 filters. Using the same formula for our parameters, we calculate ((2x2x16)+1)x32=2080 parameters. Feeding the 16x16x32 matrix into the next MaxPooling layer results in an 8x8x32 matrix. Getting leaner and meaner (I know that’s a bad joke, I don’t care). The last Convolutional layer then takes that 8x8x32 matrix and leaves the image as an 8x8x64 matrix, using ((2x2x32)+1)x64=8256 weights that we can optimize. The final MaxPooling layer thins that image matrix out even more, into a 4x4x64 matrix.</p>
<p>At this point, all CNN-specific layers are in place. We now want to add some fully connected layers, for which we obviously need to flatten our matrix, resulting in an array of length 1024. That’s no big drama by now: we squeezed out all the interesting dimensional data and used it to optimize the weights. Now it’s fine to flatten it out and throw a Dense layer at it. We use 500 nodes here, giving (1024+1)x500=512500 weights ready to be optimized. Again a weight explosion, but no fear, we’re nearly at the end.</p>
<p>After another Dropout layer to counteract overfitting, we move to the last Dense layer, the output layer that calculates the probabilities for the 10 categories, which requires (500+1)x10=5010 parameters. In total we hit around half a million parameters, substantially fewer than in the case of the MLP. Will it pay off though? Let’s train the network and find out.</p>
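<p>If you like to double-check such numbers, the parameter arithmetic can be verified in plain Python, using the ((window·window·depth)+1)·filters formula from the text (a sketch for verification, not Keras code):</p>

```python
def conv_params(kernel, in_depth, filters):
    # Each filter convolves a kernel x kernel window over every input channel,
    # plus one bias per filter.
    return ((kernel * kernel * in_depth) + 1) * filters

def dense_params(inputs, units):
    # Fully connected layer: one weight per input per unit, plus one bias per unit.
    return (inputs + 1) * units

layers = [
    conv_params(2, 3, 16),    # conv2d_1: 208
    conv_params(2, 16, 32),   # conv2d_2: 2080
    conv_params(2, 32, 64),   # conv2d_3: 8256
    dense_params(1024, 500),  # dense_1: 512500
    dense_params(500, 10),    # dense_2: 5010
]
print(sum(layers))  # 528054, matching "Total params" in the summary
```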
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cnn_weights_path</span> <span class="o">=</span> <span class="s">"saved_weights/cifar10_cnn_best.hdf5"</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Train the cnn
</span><span class="n">checkpointer_cnn</span> <span class="o">=</span> <span class="n">ModelCheckpoint</span><span class="p">(</span><span class="n">cnn_weights_path</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">save_best_only</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">hist_cnn</span> <span class="o">=</span> <span class="n">cnn</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">processed_data</span><span class="p">.</span><span class="n">x_train</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">y_train</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">processed_data</span><span class="p">.</span><span class="n">x_valid</span><span class="p">,</span>
<span class="n">processed_data</span><span class="p">.</span><span class="n">y_valid</span><span class="p">),</span>
<span class="n">callbacks</span><span class="o">=</span><span class="p">[</span><span class="n">checkpointer_cnn</span><span class="p">])</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Train on 45000 samples, validate on 5000 samples
Epoch 1/20
45000/45000 [==============================] - 13s 286us/step - loss: 1.0418 - acc: 0.6503 - val_loss: 0.9909 - val_acc: 0.6718
Epoch 00001: val_loss improved from inf to 0.99090, saving model to saved_weights/cifar10_cnn_best.hdf5
Epoch 2/20
45000/45000 [==============================] - 13s 281us/step - loss: 1.0583 - acc: 0.6434 - val_loss: 0.9868 - val_acc: 0.6794
Epoch 00002: val_loss improved from 0.99090 to 0.98677, saving model to saved_weights/cifar10_cnn_best.hdf5
Epoch 3/20
45000/45000 [==============================] - 13s 294us/step - loss: 1.0701 - acc: 0.6438 - val_loss: 1.1324 - val_acc: 0.6510
................
Epoch 00018: val_loss did not improve from 0.95311
Epoch 19/20
45000/45000 [==============================] - 12s 269us/step - loss: 1.3577 - acc: 0.5526 - val_loss: 1.4432 - val_acc: 0.5072
Epoch 00019: val_loss did not improve from 0.95311
Epoch 20/20
45000/45000 [==============================] - 12s 271us/step - loss: 1.3729 - acc: 0.5500 - val_loss: 1.2693 - val_acc: 0.5810
Epoch 00020: val_loss did not improve from 0.95311
</code></pre></div></div>
<p>By now, this hardly needs explanation. We define our checkpoint again, and start pushing our dataset through our CNN. Note we only use 20 epochs this time, which substantially cuts down training time as well.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">visualize_training</span><span class="p">(</span><span class="n">hist_cnn</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/cnn_hist_acc.png" alt="cnn_hist_acc" /></p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/cnn_hist_loss.png" alt="cnn_hist_loss" /></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cnn</span><span class="p">.</span><span class="n">load_weights</span><span class="p">(</span><span class="n">cnn_weights_path</span><span class="p">)</span>
<span class="n">score_cnn</span> <span class="o">=</span> <span class="n">cnn</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">processed_data</span><span class="p">.</span><span class="n">x_test</span><span class="p">,</span> <span class="n">processed_data</span><span class="p">.</span><span class="n">y_test</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<p>We load up the optimized weights, and evaluate our model against our test dataset. We’ve seen that before. Ultimately, let’s compare the accuracy of our MLP in the previous section with this CNN.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Compare the scores
</span><span class="k">print</span><span class="p">(</span><span class="s">"Accuracy mlp: {0:.2f}%"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">score_mlp</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="mi">100</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Accuracy cnn: {0:.2f}%"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">score_cnn</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="mi">100</span><span class="p">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Accuracy mlp: 39.42%
Accuracy cnn: 67.90%
</code></pre></div></div>
<p>Well, I think the difference is pretty clear. Although the MLP didn’t perform badly (well above the “random guessing” level of about 10%, which proves we are at least doing something right here), the CNN shatters those numbers, with less training time and far fewer parameters.</p>
<p>The reasons for this should be rather clear at this point. The sheer preservation of dimensional data, and the ability to take it into account during much of the training process, is obviously crucial in recognizing the images. Achieving 67% accuracy is pretty good for a first attempt!</p>
<p>Can we achieve more? Obviously we can, but remember we just trained this model from scratch, in a few minutes. Our architecture is rather simple, our hyperparameters are not really optimized or researched, and we didn’t even use techniques such as image augmentation or transfer learning, which would push our results to even higher levels, especially for real-life applications and more messy datasets. In short, it’s safe to say that at least for this type of application, we have a clear winner.</p>
<h3 id="concluding">Concluding</h3>
<p>That’s it for this little proof of concept! I do hope this sparked your interest to go and research more about deep learning, MLP’s, CNN’s and much more. Although it can easily be overlooked, remember what we did here: for 10000 images, using nothing but the simple 32x32x3 matrix of values between 0 and 255, we managed to determine, with an accuracy of 67%, what each image represents. That in itself is pretty awesome, especially taking into account that the network was trained in only a few minutes. It sure beats having to classify those 10000 images by hand, I’d say.</p>
<p>However, there’s so much more out there (RNN’s, GAN’s, auto-encoders, …) to apply to an enormous range of industries, and research is still very much ongoing in most of these fields. Time to get involved! Much more fun stuff, examples and tutorials will follow in time, using deep learning/machine learning techniques such as MLP’s, CNN’s, RNN’s, GAN’s and so on, or other AI applications. Stay tuned, and by all means, give me your feedback about what you thought of this example ;-)</p>
<h3 id="and-by-the-way">And by the way</h3>
<p>In case 67% accuracy does not overly impress you and you need some more persuasion, let’s aim a bit higher. I will implement a few techniques I already mentioned earlier:</p>
<ul>
<li>data augmentation</li>
<li>regularization: we’ll use <strong>L2 regularization</strong></li>
</ul>
<p>I will additionally increase the complexity of the network architecture by adding more layers. We’ll also take more control of the optimizer we use during training, and adapt the learning rate as we go. Lastly (or rather, firstly), we’ll preprocess our data a little differently than we did in our previous examples. I’ll add minimal explanation and highlight some important terms you’ll want to research in order to better understand what I’m doing. This part of the code is more a short demonstration of what’s possible than a full explanation of how to achieve it.</p>
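<p>To give a feel for what data augmentation means before we delegate it to Keras, here is a minimal, hypothetical sketch of one such transformation (a horizontal flip) in plain NumPy; the actual training code further down hands this job to ImageDataGenerator:</p>

```python
import numpy as np

def flip_horizontal(image):
    # Reverse the width axis of a (height, width, channels) image,
    # producing a mirrored training example "for free".
    return image[:, ::-1, :]

# A tiny 2x2 "image" with a single channel, just to show the effect
image = np.array([[[1], [2]],
                  [[3], [4]]])
flipped = flip_horizontal(image)
print(flipped[0, 0, 0], flipped[0, 1, 0])  # 2 1
```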
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">BatchNormalization</span>
<span class="kn">from</span> <span class="nn">keras.optimizers</span> <span class="kn">import</span> <span class="n">rmsprop</span>
<span class="kn">from</span> <span class="nn">keras.callbacks</span> <span class="kn">import</span> <span class="n">LearningRateScheduler</span><span class="p">,</span> <span class="n">EarlyStopping</span>
<span class="kn">from</span> <span class="nn">keras.preprocessing.image</span> <span class="kn">import</span> <span class="n">ImageDataGenerator</span>
<span class="kn">from</span> <span class="nn">keras</span> <span class="kn">import</span> <span class="n">regularizers</span>
</code></pre></div></div>
<p>You clearly see what we are doing here. We import all libraries and functions that we need to perform the higher level of customization to apply to our neural network.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">optimized_preprocess</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">categories</span><span class="p">):</span>
<span class="c1"># Z-score normalization of data
</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">))</span>
<span class="n">std</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">))</span>
<span class="n">x_train</span> <span class="o">=</span> <span class="p">((</span><span class="n">data</span><span class="p">.</span><span class="n">x_train</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">std</span> <span class="o">+</span> <span class="mf">1e-7</span><span class="p">)).</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span><span class="p">)</span>
<span class="n">x_test</span> <span class="o">=</span> <span class="p">((</span><span class="n">data</span><span class="p">.</span><span class="n">x_test</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">std</span> <span class="o">+</span> <span class="mf">1e-7</span><span class="p">)).</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span><span class="p">)</span>
<span class="n">y_train</span> <span class="o">=</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">y_train</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="n">y_test</span> <span class="o">=</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">y_test</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="k">return</span> <span class="n">Data</span><span class="p">(</span><span class="n">x_train</span><span class="p">[</span><span class="mi">5000</span><span class="p">:],</span> <span class="n">y_train</span><span class="p">[</span><span class="mi">5000</span><span class="p">:],</span>
<span class="n">x_train</span><span class="p">[:</span><span class="mi">5000</span><span class="p">],</span> <span class="n">y_train</span><span class="p">[:</span><span class="mi">5000</span><span class="p">],</span>
<span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
</code></pre></div></div>
<p>Here we perform the known necessary normalization of the data. However, we don’t just divide by 255 as we did before, but we use Z-score or standard score normalization. Read more about methods of normalization <a href="https://en.wikipedia.org/wiki/Normalization_(statistics)" target="_blank">here</a>.</p>
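<p>On dummy data, that Z-score step looks like this (a minimal NumPy sketch of the same formula as in optimized_preprocess; the random images are just stand-ins for x_train):</p>

```python
import numpy as np

# Dummy stand-in for x_train: 100 random 32x32 RGB images with values in [0, 255]
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 255, size=(100, 32, 32, 3))

# Z-score normalization: subtract the global mean, divide by the global
# standard deviation (the small epsilon avoids division by zero)
mean = np.mean(x_train, axis=(0, 1, 2, 3))
std = np.std(x_train, axis=(0, 1, 2, 3))
x_norm = ((x_train - mean) / (std + 1e-7)).astype("float32")

# The result has (approximately) zero mean and unit standard deviation
print(float(x_norm.mean()), float(x_norm.std()))
```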
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">learningrate_schedule</span><span class="p">(</span><span class="n">epoch</span><span class="p">):</span>
<span class="c1"># We use a standard learning rate of 0.001
</span> <span class="c1"># From the 51st epoch, we decrease it to 0.0007
</span> <span class="c1"># From the 101st epoch, we decrease it further to 0.0005
</span> <span class="c1"># From the 136th epoch, we decrease it further to 0.0003
</span> <span class="c1"># From the 176th epoch, we decrease it further to 0.0001
</span> <span class="n">rate</span> <span class="o">=</span> <span class="mf">0.001</span>
<span class="k">if</span> <span class="n">epoch</span> <span class="o">></span> <span class="mi">175</span><span class="p">:</span>
<span class="n">rate</span> <span class="o">=</span> <span class="mf">0.0001</span>
<span class="k">elif</span> <span class="n">epoch</span> <span class="o">></span> <span class="mi">135</span><span class="p">:</span>
<span class="n">rate</span> <span class="o">=</span> <span class="mf">0.0003</span>
<span class="k">elif</span> <span class="n">epoch</span> <span class="o">></span> <span class="mi">100</span><span class="p">:</span>
<span class="n">rate</span> <span class="o">=</span> <span class="mf">0.0005</span>
<span class="k">elif</span> <span class="n">epoch</span> <span class="o">></span> <span class="mi">50</span><span class="p">:</span>
<span class="n">rate</span> <span class="o">=</span> <span class="mf">0.0007</span>
<span class="k">return</span> <span class="n">rate</span>
</code></pre></div></div>
<p>Remember this function; we’ll use it later during training, as one of the callbacks. The learning rate starts out at 0.001 (which is often a reasonable starting point), and as we enter the higher epochs, we change it to lower values. This is one of the ways in which we can avoid getting stuck in <strong>local minima</strong> during gradient descent.</p>
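<p>As a quick sanity check, we can evaluate the schedule at the boundary epochs (the function restated in plain Python; Keras passes the 0-based epoch index to it):</p>

```python
def learningrate_schedule(epoch):
    # Same step schedule as above: lower the rate in the later epochs
    rate = 0.001
    if epoch > 175:
        rate = 0.0001
    elif epoch > 135:
        rate = 0.0003
    elif epoch > 100:
        rate = 0.0005
    elif epoch > 50:
        rate = 0.0007
    return rate

print([learningrate_schedule(e) for e in (0, 51, 101, 136, 176)])
# [0.001, 0.0007, 0.0005, 0.0003, 0.0001]
```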
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">build_optimized_cnn</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">categories</span><span class="p">):</span>
<span class="c1"># Create model architecture
</span> <span class="n">weight_decay</span> <span class="o">=</span> <span class="mf">1e-4</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"elu"</span><span class="p">,</span>
<span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">regularizers</span><span class="p">.</span><span class="n">l2</span><span class="p">(</span><span class="n">weight_decay</span><span class="p">),</span> <span class="n">input_shape</span><span class="o">=</span><span class="n">data</span><span class="p">.</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">:]))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">BatchNormalization</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"elu"</span><span class="p">,</span>
<span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">regularizers</span><span class="p">.</span><span class="n">l2</span><span class="p">(</span><span class="n">weight_decay</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">BatchNormalization</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"elu"</span><span class="p">,</span>
<span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">regularizers</span><span class="p">.</span><span class="n">l2</span><span class="p">(</span><span class="n">weight_decay</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">BatchNormalization</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"elu"</span><span class="p">,</span>
<span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">regularizers</span><span class="p">.</span><span class="n">l2</span><span class="p">(</span><span class="n">weight_decay</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">BatchNormalization</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.3</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"elu"</span><span class="p">,</span>
<span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">regularizers</span><span class="p">.</span><span class="n">l2</span><span class="p">(</span><span class="n">weight_decay</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">BatchNormalization</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="s">"same"</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"elu"</span><span class="p">,</span>
<span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">regularizers</span><span class="p">.</span><span class="n">l2</span><span class="p">(</span><span class="n">weight_decay</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">BatchNormalization</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.4</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Flatten</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">categories</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"softmax"</span><span class="p">))</span>
<span class="c1"># Compile the model, using an optimized rms this time, which we will adapt
</span> <span class="c1"># during training
</span> <span class="n">optimized_rmsprop</span> <span class="o">=</span> <span class="n">rmsprop</span><span class="p">(</span><span class="n">lr</span><span class="o">=</span><span class="mf">0.001</span><span class="p">,</span> <span class="n">decay</span><span class="o">=</span><span class="mf">1e-6</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">"categorical_crossentropy"</span><span class="p">,</span> <span class="n">optimizer</span><span class="o">=</span><span class="n">optimized_rmsprop</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">"accuracy"</span><span class="p">])</span>
<span class="k">return</span> <span class="n">model</span>
</code></pre></div></div>
<p>That’s a slightly more interesting architecture, isn’t it? Notice the use of additional 2D-convolutional layers, the kernel regularizers, batch normalization, the pool and kernel sizes, and the dropout layers. This builds on what we saw in the more basic cnn architecture. Be sure to research any terms that are not familiar to you:</p>
<ul>
<li><strong>kernel regularization</strong></li>
<li><strong>L1 and L2 regularization</strong></li>
<li><strong>batch normalization</strong></li>
</ul>
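<p>As a rough intuition for the first two items (this is a plain-NumPy sketch of the math, not the Keras internals): L1 and L2 regularization add a penalty on the weights to the loss, and batch normalization rescales each batch to zero mean and unit variance before applying a learnable scale and shift.</p>

```python
import numpy as np

def l2_penalty(weights, weight_decay):
    # L2 (ridge) penalty added to the loss: weight_decay * sum of squared weights
    return weight_decay * np.sum(np.square(weights))

def l1_penalty(weights, weight_decay):
    # L1 (lasso) penalty: weight_decay * sum of absolute weights
    return weight_decay * np.sum(np.abs(weights))

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch axis, then scale (gamma) and shift (beta)
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

<p>The L2 penalty is what <code>regularizers.l2(weight_decay)</code> attaches to each convolutional layer above; Keras adds it to the loss for you during training.</p>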
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Preprocess for optimized cnn
</span><span class="n">optimized_processed_data</span> <span class="o">=</span> <span class="n">optimized_preprocess</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
</code></pre></div></div>
<p>Not much to say here. We perform the preprocessing step as described above.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Build optimized cnn
</span><span class="n">optimized_cnn</span> <span class="o">=</span> <span class="n">build_optimized_cnn</span><span class="p">(</span><span class="n">optimized_processed_data</span><span class="p">,</span> <span class="n">categories</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Optimized CNN architecture:"</span><span class="p">)</span>
<span class="n">optimized_cnn</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Optimized CNN architecture:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_40 (Conv2D) (None, 32, 32, 32) 896
_________________________________________________________________
batch_normalization_25 (Batc (None, 32, 32, 32) 128
_________________________________________________________________
conv2d_41 (Conv2D) (None, 32, 32, 32) 9248
_________________________________________________________________
batch_normalization_26 (Batc (None, 32, 32, 32) 128
_________________________________________________________________
max_pooling2d_28 (MaxPooling (None, 16, 16, 32) 0
_________________________________________________________________
dropout_39 (Dropout) (None, 16, 16, 32) 0
_________________________________________________________________
conv2d_42 (Conv2D) (None, 16, 16, 64) 18496
_________________________________________________________________
batch_normalization_27 (Batc (None, 16, 16, 64) 256
_________________________________________________________________
conv2d_43 (Conv2D) (None, 16, 16, 64) 36928
_________________________________________________________________
batch_normalization_28 (Batc (None, 16, 16, 64) 256
_________________________________________________________________
max_pooling2d_29 (MaxPooling (None, 8, 8, 64) 0
_________________________________________________________________
dropout_40 (Dropout) (None, 8, 8, 64) 0
_________________________________________________________________
conv2d_44 (Conv2D) (None, 8, 8, 128) 73856
_________________________________________________________________
batch_normalization_29 (Batc (None, 8, 8, 128) 512
_________________________________________________________________
conv2d_45 (Conv2D) (None, 8, 8, 128) 147584
_________________________________________________________________
batch_normalization_30 (Batc (None, 8, 8, 128) 512
_________________________________________________________________
max_pooling2d_30 (MaxPooling (None, 4, 4, 128) 0
_________________________________________________________________
dropout_41 (Dropout) (None, 4, 4, 128) 0
_________________________________________________________________
flatten_18 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_39 (Dense) (None, 10) 20490
=================================================================
Total params: 309,290
Trainable params: 308,394
Non-trainable params: 896
_________________________________________________________________
</code></pre></div></div>
<p>Here we build our more advanced cnn (at least in terms of architecture; we’ll see about the results later). As you can see, there’s quite a bit more going on, and we once again end up with a sizeable number of parameters.</p>
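<p>You can verify the parameter counts in the summary by hand with the standard formulas: a Conv2D layer has <code>kernel_h * kernel_w * in_channels * filters</code> weights plus one bias per filter, a BatchNormalization layer has four values per channel (gamma and beta are trainable; the moving mean and variance are not), and a Dense layer has <code>inputs * units + units</code>. A quick sketch:</p>

```python
def conv2d_params(kernel, in_ch, filters):
    # kernel*kernel*in_ch weights per filter, plus one bias per filter
    return kernel * kernel * in_ch * filters + filters

def batchnorm_params(channels):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * channels

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return inputs * units + units

print(conv2d_params(3, 3, 32))      # 896  -> conv2d_40 (3-channel input)
print(batchnorm_params(32))         # 128  -> batch_normalization_25
print(dense_params(4 * 4 * 128, 10))  # 20490 -> dense_39 after Flatten
```

<p>These match the 896, 128 and 20,490 entries in the summary above; the 896 non-trainable parameters are exactly the moving means and variances of the six batch-norm layers.</p>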
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Perform data augmentation
</span><span class="n">datagen</span> <span class="o">=</span> <span class="n">ImageDataGenerator</span><span class="p">(</span><span class="n">rotation_range</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">width_shift_range</span><span class="o">=</span><span class="mf">0.15</span><span class="p">,</span>
<span class="n">height_shift_range</span><span class="o">=</span><span class="mf">0.15</span><span class="p">,</span> <span class="n">horizontal_flip</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">datagen</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">x_train</span><span class="p">)</span>
</code></pre></div></div>
<p>Here you can see one of the new concepts in the wild: data augmentation. In short, we rotate, shift and flip the images, thereby introducing more variation into our dataset, so that our network can generalize better. That is, in a single simple sentence, what data augmentation is about. Read more about how to do it in Keras <a href="https://keras.io/preprocessing/image/" target="_blank">here</a>.</p>
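<p>To make the idea concrete, here is a plain-NumPy sketch of two of these transforms, assuming images stored as height × width × channels arrays (ImageDataGenerator does this, and more, for you on the fly):</p>

```python
import numpy as np

def horizontal_flip(image):
    # Mirror the image along its width axis (axis 1 for HxWxC arrays)
    return image[:, ::-1, :]

def width_shift(image, fraction):
    # Shift the image right by a fraction of its width, zero-filling the gap
    h, w, c = image.shape
    pixels = int(w * fraction)
    shifted = np.zeros_like(image)
    shifted[:, pixels:, :] = image[:, :w - pixels, :]
    return shifted

img = np.arange(2 * 4 * 1).reshape(2, 4, 1)  # tiny 2x4 single-channel "image"
flipped = horizontal_flip(img)
shifted = width_shift(img, 0.25)
```

<p>Each augmented variant still shows the same object, so the labels stay valid while the network sees fresh pixel patterns every epoch.</p>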
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Train the optimized cnn
</span><span class="n">batch_size</span> <span class="o">=</span> <span class="mi">64</span>
<span class="n">optimized_cnn_path_best</span> <span class="o">=</span> <span class="s">"saved_weights/optimized_cifar10_cnn_best.hdf5"</span>
<span class="n">checkpointer_optimized_cnn</span> <span class="o">=</span> <span class="n">ModelCheckpoint</span><span class="p">(</span><span class="n">optimized_cnn_path_best</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">save_best_only</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">hist_optimized_cnn</span> <span class="o">=</span> <span class="n">optimized_cnn</span><span class="p">.</span><span class="n">fit_generator</span><span class="p">(</span><span class="n">datagen</span><span class="p">.</span><span class="n">flow</span><span class="p">(</span><span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">x_train</span><span class="p">,</span>
<span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">y_train</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">),</span>
<span class="n">steps_per_epoch</span><span class="o">=</span><span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">//</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">250</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">validation_data</span><span class="o">=</span> <span class="p">(</span><span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">x_valid</span><span class="p">,</span>
<span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">y_valid</span><span class="p">),</span> <span class="n">callbacks</span><span class="o">=</span><span class="p">[</span><span class="n">checkpointer_optimized_cnn</span><span class="p">,</span>
<span class="n">LearningRateScheduler</span><span class="p">(</span><span class="n">learningrate_schedule</span><span class="p">),</span> <span class="n">EarlyStopping</span><span class="p">(</span><span class="n">min_delta</span><span class="o">=</span><span class="mf">0.001</span><span class="p">,</span>
<span class="n">patience</span><span class="o">=</span><span class="mi">40</span><span class="p">)])</span>
<span class="n">optimized_cnn</span><span class="p">.</span><span class="n">load_weights</span><span class="p">(</span><span class="n">optimized_cnn_path_best</span><span class="p">)</span>
<span class="n">score_optimized_cnn</span> <span class="o">=</span> <span class="n">optimized_cnn</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">x_test</span><span class="p">,</span>
<span class="n">optimized_processed_data</span><span class="p">.</span><span class="n">y_test</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Epoch 00001: val_loss improved from inf to 1.44708, saving model to saved_weights/optimized_cifar10_cnn_best.hdf5
Epoch 00002: val_loss improved from 1.44708 to 1.22333, saving model to saved_weights/optimized_cifar10_cnn_best.hdf5
Epoch 00003: val_loss improved from 1.22333 to 0.95801, saving model to saved_weights/optimized_cifar10_cnn_best.hdf5
................
Epoch 00223: val_loss did not improve from 0.39229
Epoch 00224: val_loss did not improve from 0.39229
Epoch 00225: val_loss did not improve from 0.39229
</code></pre></div></div>
<p>Our training takes place here. We define the batch size to be 64 and prepare our checkpoint to save the best weights. The fit function has been replaced by a fit_generator function, since we use our data generator. That’s also the reason why we use the datagen.flow function instead of simply presenting the dataset’s x and y values. Also notice the addition of a LearningRateScheduler (referring to our function learningrate_schedule) to the callbacks. You can imagine what’s happening there. The EarlyStopping callback stops our training if there was no improvement for 40 epochs; what counts as “improvement” is defined by the min_delta parameter. Look at the number of epochs as well: obviously this training will take a bit more time. I changed the verbose parameter to 0, so you will only see the checkpoint message for each epoch.</p>
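<p>The body of <code>learningrate_schedule</code> was defined earlier and is not repeated here, so the milestones below are purely illustrative. A typical step-decay schedule for a run of this length drops the learning rate at fixed epoch thresholds, for example:</p>

```python
def learningrate_schedule(epoch):
    # Hypothetical step-decay schedule: start at the optimizer's initial
    # rate of 0.001 and lower it at fixed epoch milestones (the actual
    # thresholds and values used in this post may differ).
    if epoch > 150:
        return 0.0001
    if epoch > 100:
        return 0.0003
    return 0.001
```

<p>The LearningRateScheduler callback calls this function at the start of every epoch and sets the optimizer’s learning rate to whatever it returns.</p>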
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">visualize_training</span><span class="p">(</span><span class="n">hist_optimized_cnn</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/optimized_cnn_hist_acc.png" alt="optimized_cnn_hist_acc" /></p>
<p><img src="https://www.peculiar-coding-endeavours.com/assets/mlp_vs_cnn/optimized_cnn_hist_loss.png" alt="optimized_cnn_hist_loss" /></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">"Accuracy optimized cnn: {0:.2f}%"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">score_optimized_cnn</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="mi">100</span><span class="p">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Accuracy optimized cnn: 89.12%
</code></pre></div></div>
<p>And the moment of truth: printing our achieved degree of accuracy. I leave you with that, and I hope this sparked your eagerness to learn about this amazing topic even more.</p>
Thu, 12 Dec 2019 20:30:00 +0000
https://www.peculiar-coding-endeavours.com/2019/mlp_vs_cnn/
Tags: artificial intelligence, machine learning, deep learning, computer vision, Tech, AI, Deep learning