An Introduction to Quantum Machine Learning
Lie Algebra


https://github.com/JohnCook17/Quantum_Machine_Learning


There are several things someone can mean by "quantum machine learning." They can mean machine learning applied to the study of quantum mechanics. They can mean machine learning whose optimization is done with quantum mechanics, through a process called quantum annealing. They can mean using quantum properties to encode information and make predictions. Or they can mean any combination of the above. While I am most familiar with using quantum properties to make predictions, I will also discuss quantum annealing to some extent. I have read an analogy that quantum annealing is best suited to data sets whose loss landscape is a barren plateau riven with crags, while traditional machine learning is better suited to green valleys.


This means that if the data is plentiful and distinct, like handwriting, traditional optimization methods work well. However, if the data is sparse or indistinct, you may want to look into quantum annealing. An example of such a data set is detecting a planet from the dip in a star's brightness as the planet passes in front of it: the change is tiny, but it is there. Quantum annealing uses quantum tunneling to jump through the canyon walls and find the lowest point, while traditional machine learning has to follow the slopes. Now, on to the main focus of this post: quantum-based prediction.
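To see why following the slopes can fail, here is a toy, purely classical sketch of gradient descent on a rugged one-dimensional landscape; the function is something I made up for illustration, not anything from the paper:

```python
import numpy as np

# Toy rugged landscape: a broad bowl with high-frequency ridges.
def f(x):
    return x**2 + 3.0 * np.cos(5.0 * x)

def grad_f(x):
    return 2.0 * x - 15.0 * np.sin(5.0 * x)

# Plain gradient descent follows the local slope...
x = 2.0
for _ in range(200):
    x -= 0.01 * grad_f(x)
print(f"settled at x={x:.3f}, f(x)={f(x):.3f}")
# ...and stops in the nearest valley (around x ~ 1.8), well above the
# global minimum near |x| ~ 0.63. An annealer can "tunnel" past the
# ridges that trap a slope-follower.
```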

There are several ways to do this, but the best one that I personally know of is described in this paper. I am going to describe its contents in simpler terms, so that hopefully anyone with a basic understanding of machine learning can follow how it works.

Before I get ahead of myself, I should explain what a qubit is. A qubit is a quantum system that can be used to encode and represent information. A qubit can behave like a normal bit and be 0 or 1, but it can also be in a superposition of both at once; in that sense each qubit can be viewed as carrying two bits' worth of state. Another unique property of qubits is that they can be entangled, which is vital to our machine learning model. Entanglement is a very complicated subject, but for our purposes it is what lets us obtain equal probabilities across states.
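Here is a minimal sketch of superposition and entanglement in Qiskit, the library used in this project. It builds a Bell state and shows the equal 50/50 split between outcomes. Note this uses the older Qiskit API where Aer ships with the main package; newer releases moved it into a separate qiskit-aer package:

```python
from qiskit import QuantumCircuit, Aer, execute

qc = QuantumCircuit(2, 2)
qc.h(0)        # Hadamard: put qubit 0 into an equal superposition
qc.cx(0, 1)    # CNOT: entangle qubit 1 with qubit 0
qc.measure([0, 1], [0, 1])

backend = Aer.get_backend('qasm_simulator')
counts = execute(qc, backend, shots=1024).result().get_counts()
print(counts)  # roughly {'00': ~512, '11': ~512}, never '01' or '10'
```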

The model I tried to reproduce uses a quNit. A quNit is to a qubit roughly what a byte is to a bit, except that a quNit can have N levels. The larger N gets, though, the more unstable the system becomes, which allows more noise to creep into the experiment. The current upper limit, according to the paper's authors, lies around 27, which in most cases is enough classes to do quite a lot with. The quNit is used to encode our data, in this case images of handwriting from the MNIST data set. The quNit encodes our data into Hilbert space, a high-dimensional vector space. What does this mean? Put simply, it embeds the data in a space where the features are easier to extract. Confusing, I know, but I would watch this part of a video by 3Blue1Brown to better understand how vectors in space can make up complex things:


Put more simply, the model compares the smallest details of each class as it learns, and classifies based on them; it is like using a magnifying glass to see better, kind of. This is one of the hardest concepts in this project to wrap your head around.
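For a concrete picture of what "encoding into Hilbert space" can look like, here is a numpy sketch of amplitude encoding, one common scheme; the paper may use a different map, so treat this as an illustration only:

```python
import numpy as np

# Stand-in for a 28x28 grayscale MNIST digit.
image = np.random.rand(28, 28)

# Flatten to a 784-dimensional real vector...
x = image.reshape(-1).astype(np.float64)

# ...and normalize so the squared amplitudes sum to 1, as quantum
# state amplitudes must. The result lives in a 784-dimensional slice
# of Hilbert space (10 qubits give 2**10 = 1024 amplitudes, enough
# to hold it after zero-padding).
ket_x = x / np.linalg.norm(x)
print(np.sum(ket_x**2))  # 1.0, up to floating-point error
```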

A unique feature of quNits is that a single quNit has an associated structure called a special unitary group, or SU(N), which acts kind of like a map of this high-dimensional vector space. According to Wikipedia: "In mathematics, the special unitary group of degree n, denoted SU(n), is the Lie group of n × n unitary matrices with determinant 1." An important thing to know is that there is not just one unitary in SU(N) but many, possibly infinitely many, that will do the job for us. Think of it as directions to a destination: there might be a most efficient way to get somewhere, but if you arrive in a timely manner, does it matter which way you went?

The particular map, or in more technical terms the algorithm used to represent the data we want to classify, has another structure attached to it called a Lie algebra; again, there are infinitely many choices that will get the job done. Wikipedia defines it like this: "The Lie algebra of SU(n) consists of the n × n skew-Hermitian matrices with trace zero." This is also a complex idea to wrap your head around, but think of the Lie algebra as the route you want to take on the map. What this does to the quNit is rotate it by angles, first around the Z axis, then the Y axis, then the Z axis again. What does this accomplish? I like to think of it in terms similar to solving a Rubik's Cube: you have to rotate it about its axes in a certain way to solve it, and the idea here is similar.

If you look at the formulas provided in the paper I am referencing, you may start to notice some things. First, there is an alpha term; according to the paper this term is learnable, it scales the Lie algebra, and it acts much like the biases in traditional machine learning. The weights, meanwhile, are the Z term used while encoding the data we want to evaluate into Hilbert space. This is all very complicated, but if you understand that the weights and the biases are both encoded into the Hilbert space, you can see that this produces a vector in that space which gives a more generalized description of our data classes. Going back to the map analogy, the Z term takes into account the topography, like how fast you can ascend or descend a slope, whereas the alpha term acts like finding a quicker route between points. It is not a perfect analogy, but it is hard to think in high-dimensional spaces.
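To make the "rotate around Z, then Y, then Z" idea concrete, here is a small numpy sketch for the single-qubit case: composing the three rotations yields a 2 × 2 unitary with determinant 1, i.e. an element of SU(2). The paper works with larger N; this is just the easiest case to verify by hand:

```python
import numpy as np

def rz(theta):
    # Rotation about the Z axis of the Bloch sphere.
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]])

def ry(theta):
    # Rotation about the Y axis.
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])

# ZYZ Euler decomposition: any SU(2) element can be written this way.
alpha, beta, gamma = 0.3, 1.2, -0.7
U = rz(gamma) @ ry(beta) @ rz(alpha)

print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: U is unitary
print(np.linalg.det(U))                        # (1+0j): determinant 1
```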

We call this mapping of our data into Hilbert space ket x. Why "ket"? Because in quantum physics vectors are written in Dirac notation, and Dirac thought it was clever to call the two halves of an inner product "bra" and "ket," together forming the word "bra-ket." A ket is a column vector, and the corresponding bra is its conjugate transpose, a row vector of complex numbers. "Complex numbers" is, to my mind, a poor name, though better than "imaginary numbers"; the simplest way to think about complex numbers is as two-dimensional numbers. Instead of a number line you have a number plane, and this number plane is perfect for representing quantum state vectors, which is why bra-ket notation is used. We take ket x, the vector representation of our data, and perform a quantum measurement on it; this measurement takes the many representations of each class the vector could be and produces the most probable one. After this we calculate the error using formulas 5 and 6 from the paper, then use a traditional machine learning optimizer such as Adam to correct the weights and biases, i.e. the Z and alpha terms. That is one pass through the whole quantum machine learning model. The other unique feature of this model is a process called "single-shot training": it takes all representations of a class and teaches the model all of them at once, as opposed to one at a time in random sequence. This speeds things up significantly.
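In code, a ket is just a column vector, its bra is the conjugate transpose, and the probability of a measurement outcome is the squared magnitude of their inner product (the Born rule). A minimal numpy sketch; the class state here is a hypothetical stand-in, not one of the paper's actual templates:

```python
import numpy as np

# Hypothetical normalized state vectors (kets) in a 4-dim space.
ket_x = np.array([0.5, 0.5j, 0.5, -0.5])     # encoded data sample
ket_class = np.array([1, 1j, 1, -1]) / 2.0   # a class template state

# The bra <class| is the conjugate transpose of the ket |class>.
bra_class = ket_class.conj()

# Born rule: probability of measuring |x> "as" this class is
# the squared magnitude of the amplitude <class|x>.
amplitude = bra_class @ ket_x
probability = np.abs(amplitude) ** 2
print(probability)  # 1.0 here, since the two states are identical
```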

Now that we have the premise of the paper down, we can move on to how I attempted to implement it. I decided to use a Jupyter notebook, as this let me see the outputs of each step along the way and gave me checkpoints in my code so that I did not have to rerun everything over and over. I was also unable to calculate SU(N) myself where N was higher than 2, so I had to use the special orthogonal group SO(N), which is a subgroup of SU(N); I believe the model can still work using it, but it will not be as efficient or accurate. As of the writing of this post I was able to get the training to produce promising results, but the prediction part does not work. I also attempted to use Qiskit's quantum backend, but my circuit architecture is incompatible with the quantum computers I have access to, so I have to use a local simulator as my backend. As far as I am aware, though, there is no fundamental reason my architecture would not work on real hardware.

As mentioned above, I used Qiskit for the quantum computations and numpy for the math. I use TensorFlow's to_categorical to determine the number of classes, but that is the only thing I use it for; I decided to keep everything as low-level as possible so that I learned as much as possible. As mentioned, though, I could not calculate SU(N), so I used the Geomstats module to calculate SO(N) instead; this should get the job done, but it is not as clean. I used an Adam optimizer implementation I found here and modified it to isolate the results per class, recombining them later in the process. This lets me see whether there is any confusion among classes, such as 3's looking like 8's.
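For reference, here is a minimal numpy version of the kind of Adam optimizer I adapted. This is the textbook algorithm (Kingma and Ba), not my modified per-class code from the repository:

```python
import numpy as np

class Adam:
    """Textbook Adam optimizer in plain numpy."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None   # first and second moment estimates
        self.t = 0               # time step, for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1 - self.beta2) * grads**2
        m_hat = self.m / (1 - self.beta1**self.t)  # bias-corrected moments
        v_hat = self.v / (1 - self.beta2**self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Isolating results per class then just means keeping one optimizer instance per class, so each class's moment estimates stay separate until they are recombined.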

After importing all of my modules I prepare the data: I split it into train and test sets and sort the training data for the single-shot training method described above. I use matplotlib to plot a few images to verify that the data is properly labeled. I then implement the Adam optimizer class. After this comes the primary code, the class Forward_and_backward, which is the main body of the program.
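A sketch of that preparation step, assuming the Keras copy of MNIST; the repository may load the data differently:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Keras ships MNIST pre-split into train and test sets.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Sort the training data by label so all examples of each class sit
# contiguously, as the single-shot training scheme expects.
order = np.argsort(y_train, kind='stable')
x_train, y_train = x_train[order], y_train[order]

# Sanity check: plot a few images against their labels.
fig, axes = plt.subplots(1, 4, figsize=(8, 2))
for ax, idx in zip(axes, [0, 6000, 12000, 18000]):
    ax.imshow(x_train[idx], cmap='gray')
    ax.set_title(int(y_train[idx]))
    ax.axis('off')
plt.show()
```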

In the Forward_and_backward class I first initialize the variables shared by many of the methods, although I also initialize many variables as needed throughout the code; I would like to clean this up at some point, but for now it will have to do.

Next is the H method. H is short for Hadamard gate; a Hadamard gate is used to initialize the quantum states of qubits so that they have equal probabilities. My H method does this and then entangles N qubits using the map_circuit sub-method, preparing the quantum circuits and getting them ready for the quNit. The quNit method further prepares the quantum circuits and transforms them into a quNit; it also applies the weights and encodes the data into a quantum state vector. The A method is the A(k) term in formula 3 of the research paper I am implementing; it is used by H to add the alpha term and the Lie algebra to the quantum circuit, which behaves similarly to biases.

Next is the SU_of_N method, and here is where things get a little shaky. As I have said, I was unable to calculate a true SU(N) on my own and ended up using the Geomstats module for it. I then calculate the Lie algebra in the lie_algebra method. I am not sure my implementation of the Lie algebra is correct, but it seems to work for training at least; there may still be errors in it.

The forward method does just what it says: it takes the forward step, using all the above methods to encode the data into a quantum state vector. Next is the backward method, which handles backpropagation and the cost function. I deviated from the paper's cost function to some extent, but it should still work; I did this to see if I could improve on the original, though I do not yet know whether mine is better or worse, as I have not had time to test it sufficiently. The backward method takes ket x and turns it into a measurement of accuracy per class.

The training method trains the model on the data, and the predict method calls the training method with the optimizer and a few other settings turned off so that it can make predictions. The predictions are currently not working, but I think all I have to do is train for more than 3 iterations and sum the weights of all the classes into a master weight for faster predictions. As it stands, I believe my implementation trains properly, but I still need to work on the predict method.
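To ground the description of the H method, here is a minimal Qiskit sketch of what "Hadamard every qubit, then entangle via map_circuit" might look like. The method names mirror the ones above, but the bodies are my guesses for illustration, not the repository's actual code (same older-style Qiskit API as earlier):

```python
from qiskit import QuantumCircuit

def map_circuit(n_qubits):
    # Illustrative entangling step: chain CNOTs down the register.
    qc = QuantumCircuit(n_qubits)
    for q in range(n_qubits - 1):
        qc.cx(q, q + 1)
    return qc

def H(n_qubits):
    # Hadamard every qubit for equal superposition, then entangle.
    qc = QuantumCircuit(n_qubits)
    qc.h(range(n_qubits))
    qc.compose(map_circuit(n_qubits), inplace=True)
    return qc

print(H(3).draw())
```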

I have learned a lot from this project. I did not know any of the math used in quantum mechanics before starting it, though I did have a simple understanding of the basic concepts. I still need to do some work on it, graph my findings, and determine the approximate accuracy once I get it working, but for now it comes close without quite working. I also think I did admirably well, considering that calculating SU(N) in high dimensions is very hard and Lie algebra in general is hard to work with. Once I have had some time I will come back to this project and make it work. I also learned a lot about how Jupyter notebooks work and about cross-platform development in Anaconda. This project has taught me quite a lot, and I cannot wait to see where quantum computing goes, especially quantum machine learning. Ultimately I believe that quantum machine learning will be essential to creating true artificial intelligence instead of merely simulating intelligence.

