This tutorial is the third part of Martin N. P. Nilsson‘s “Quantum Computing for Beginners”. You can find the previous parts below. Find the original Arxiv article here. Here we discuss how you can implement a GPU in simulating a quantum computer. All code examples in this series of tutorials is licensed under the MIT license.

The tutorial was inspired by the “Quantum Autumn School” held in October 2023 and organised by ENCCS, Wallenberg Centre of Quantum Technologies (WACQT), and Nordic-Estonian Quantum Computing e-Infrastructure Quest (NordIQuEst). You can find the full recordings and training material of the Quantum Autumn School below

## A quantum computer simulator using GPUs

Looking at the code above, it’s clear that it would be beneficial to move workspace to a GPU. The operations performed on the workspace are perfect for the GPU, and there isn’t much com- munication needed between the CPU and the GPU.

We start by importing the PyTorch package. For resource requirements and installation instruc- tions, see the PyTorch documentation. PyTorch can also run on a CPU, so you don’t necessarily need an NVidia graphics card to try it out. The code to initialize PyTorch is simple:

```
import torch as pt
pt.autograd.set_grad_enabled(False) # disable autogradients
if pt.cuda.is_available(): # check if GPU is available
print("GPU available")
else:
print("Sorry, only CPU available")
```

Python`GPU available`

Below are the adjustments needed for the quantum computer simulator to use PyTorch, both on a CPU and a GPU with CUDA. The major difference from the previous code is that the workspace now lives entirely in the GPU’s memory. Consequently, many calls to numpy have been replaced with PyTorch calls. Only the lines whose comments begin with #! are new or changed.

```
def pushQubit(name,weights):
global workspace
global namestack
if (workspace.shape[0],workspace.shape[1]) == (1,1): #!
namestack = [] # reset if workspace empty
namestack.append(name)
weights = weights/np.linalg.norm(weights) # normalize
weights = pt.tensor(weights,device=workspace.device, #!
dtype=workspace[0,0].dtype) #!
workspace = pt.reshape(workspace,(1,-1)) #!
workspace = pt.kron(workspace,weights) #!
def tosQubit(name):
global workspace
global namestack
k = len(namestack)-namestack.index(name) # position of qubit
if k > 1: # if non-trivial
namestack.append(namestack.pop(-k))
workspace = pt.reshape(workspace,(-1,2,2**(k-1))) #!
workspace = pt.swapaxes(workspace,-2,-1) #!
def applyGate(gate,*names):
global workspace
if list(names) != namestack[-len(names):]: # reorder stack
for name in names: # if necessary
tosQubit(name)
workspace = pt.reshape(workspace,(-1,2**len(names))) #!
subworkspace = workspace[:,-gate.shape[0]:]
gate = pt.tensor(gate.T,device=workspace.device, #!
dtype=workspace[0,0].dtype) #!
if workspace.device.type == 'cuda': #!
pt.matmul(subworkspace,gate,out=subworkspace) #!
else: #! workaround for issue #114350 in torch.matmul
subworkspace[:,:]=pt.matmul(subworkspace,gate) #!
def probQubit(name): # Check probabilities
global workspace # of qubit being 0 or 1
tosQubit(name) # qubit to TOS
workspace = pt.reshape(workspace,(-1,2)) #! to 2 cols
prob = pt.linalg.norm(workspace,axis=0)**2 #! compute prob
prob = pt.Tensor.cpu(prob).numpy() #! convert to numpy
return prob/prob.sum() # make sure sum is one
def measureQubit(name): # Measure and pop qubit
global workspace
global namestack
prob = probQubit(name) # Compute probabilities
measurement = np.random.choice(2,p=prob) # 0 or 1
workspace = (workspace[:,[measurement]]/ # extract col
np.sqrt(prob[measurement]))
namestack.pop() # pop stacks
return measurement
```

PythonNow, we can test run Grover’s search with PyTorch on both the GPU and the CPU:

```
import time
workspace = pt.tensor([[1.]],device=pt.device('cuda'),
dtype=pt.float32)
t = time.process_time() # with GPU
groverSearch(16, printProb=False) # skip prob printouts
print("\nWith GPU:", time.process_time() - t, "s")
workspace = pt.tensor([[1.]],device=pt.device('cpu'),
dtype=pt.float32)
t = time.process_time() # with CPU
groverSearch(16, printProb=False) # skip prob printouts
print("\nWith CPU (single-core):", time.process_time() - t, "s")
```

Python```
1111111111111101
With GPU: 4.09375 s
1111111111111101
With CPU (single-core): 44.90625 s
```

With 25 qubits and using the float32 data type, the search takes 1200 seconds on my computer with an 8GB RTX2070 graphics card.

It doesn’t seem unlikely that the interface to a future QPU (‘Quantum Processing Unit’) will re- semble the interface to a GPU, perhaps by selecting `device=pt.device('qpu')`

.

## Conclusions

Alright, so you’ve got yourself a quantum computer, kind of. What’s next? There’s a bunch of stuff you can do to make it better. Take the implementation, for instance. Efficiency can be improved, especially in reducing how often you shuffle things around on the stack when it’s already almost set up for the next gate application. Imagine a debugger that shows you the odds of your qubits as you step through the code – wouldn’t that be neat?

Then, think about implementing an adder module using as few ancillary qubits as possible. What about multiplication? Or raising numbers to a power? Shor’s quantum algorithm for prime fac- torization needs modular exponentiation. Give that a whirl, or dive into other famous quantum algorithms. Like the Harrow–Hassidim–Loyd (HHL) algorithm solving linear equations, or the Quantum Approximate Optimization Algorithm (QAOA) for tackling combinatorial optimization problems. And hey, wouldn’t it be awesome to have a compiler that takes high-level concepts and compiles them into gate operations? The sky’s the limit!

As it turns out, a quantum computer is not really that complicated, but extracting results from the computations can be difficult due to all the physical restrictions. It may require quite clever algorithms, and it seems likely that in the future, general quantum algorithms from libraries will be used rather than developing a new quantum program for each domain-specific problem.

Quantum computers are currently programmed at the gate level, which can be painful. Probably, some specification or programming language at higher levels of abstraction will be developed for quantum computers, perhaps a language similar to VHDL.

Using a quantum computer is somewhat reminiscent of the situation with GPUs when they were new. One day, quantum computers may be used similarly.

It will take a while before we have practically useful quantum computers, as there are many techni- cal hurdles. I’m not holding my breath while waiting, but development is rapid, and breakthroughs can happen anytime. Meanwhile, it’s well worth getting into the technology and development of tools, for instance, by programming your own quantum computer simulator. I recommend the very well-written, concise, and freely available references [2-3].

## Appendix A: A Complete Implementation

Below is the complete code for a full implementation. The simulator consists of 40 lines of code and Grover’s search of 23 lines.

```
# DIY quantum computer simulator
# Martin Nilsson, RISE, 2023-11-26
import numpy as np
def pushQubit(name,weights):
global workspace
global namestack
if workspace.shape == (1,1): namestack = []
namestack.append(name)
weights = np.array(weights/np.linalg.norm(weights),
dtype=workspace[0,0].dtype)
workspace = np.kron(np.reshape(workspace,(1,-1)),weights)
def tosQubit(name):
global workspace
global namestack
k = len(namestack)-namestack.index(name)
if k > 1:
namestack.append(namestack.pop(-k))
workspace = np.reshape(workspace,(-1,2,2**(k-1)))
workspace = np.swapaxes(workspace,-2,-1)
def applyGate(gate,*names):
global workspace
if list(names) != namestack[-len(names):]:
[tosQubit(name) for name in names]
workspace = np.reshape(workspace,(-1,2**(len(names))))
subworkspace = workspace[:,-gate.shape[0]:]
np.matmul(subworkspace,gate.T,out=subworkspace)
def probQubit(name):
global workspace
tosQubit(name)
workspace = np.reshape(workspace,(-1,2))
prob = np.linalg.norm(workspace,axis=0)**2
return prob/prob.sum()
def measureQubit(name):
global workspace
global namestack
prob = probQubit(name)
measurement = np.random.choice(2,p=prob)
workspace = (workspace[:,[measurement]]/
np.sqrt(prob[measurement]))
namestack.pop()
return str(measurement)
# ---------- Grover search example
X_gate = np.array([[0, 1],[1, 0]])
H_gate = np.array([[1, 1],[1,-1]])*np.sqrt(1/2)
Z_gate = H_gate @ X_gate @ H_gate
def sample_phaseOracle(qubits):
applyGate(X_gate,qubits[1])
applyGate(Z_gate,*namestack)
applyGate(X_gate,qubits[1])
def zero_phaseOracle(qubits):
[applyGate(X_gate,q) for q in qubits]
applyGate(Z_gate,*namestack)
[applyGate(X_gate,q) for q in qubits]
def groverSearch(n, printProb=True):
qubits = list(range(n))
[pushQubit(q,[1,1]) for q in qubits]
for k in range(int(np.pi/4*np.sqrt(2**n)-1/2)):
sample_phaseOracle(qubits)
[applyGate(H_gate,q) for q in qubits]
zero_phaseOracle(qubits)
[applyGate(H_gate,q) for q in qubits]
if printProb: print(probQubit(qubits[0]))
[print(measureQubit(q),end="") for q in reversed(qubits)]
workspace = np.array([[1.]])
groverSearch(8)
```

Python```
[0.48449707 0.51550293]
[0.45445636 0.54554364]
[0.41174808 0.58825192]
[0.35903106 0.64096894]
[0.29958726 0.70041274]
[0.2371174 0.7628826]
[0.17551059 0.82448941]
[0.11860222 0.88139778]
[0.06993516 0.93006484]
[0.03253923 0.96746077]
[0.00874254 0.99125746]
[2.65827874e-05 9.99973417e-01]
11111101
```

## Appendix B: A Minimalist Quantum Computer

A complete implementation in just twelve lines, just to show it’s possible! Grover’s search here is 30 lines.

```
# Minimalist quantum computer
# Martin Nilsson, RISE, 2023-11-24
import numpy as np
def pushQubit(q,w):
return np.kron(np.reshape(w,(1,-1)),q)
def applyGate(g,w):
return np.matmul(np.reshape(w,(-1,g.shape[0])),g.T)
def tosQubit(k,w):
return np.swapaxes(np.reshape(w,(-1,2,2**(k-1))),-2,-1)
def measureQubit(w):
w = np.reshape(w,(-1,2))
p = np.linalg.norm(w,axis=0)
m = np.random.choice(2,p=p**2) return (m,w[:,[m]]/p[m])
# ---------- Grover search example
def sample_phaseOracle(w):
w = applyGate(X_gate,tosQubit(2,w))
w = applyGate(CZn_gate,tosQubit(2,w))
w = applyGate(X_gate,tosQubit(2,w))
return tosQubit(2,w)
def zero_phaseOracle(w):
for i in range(n):
w = applyGate(X_gate,tosQubit(n,w))
w = applyGate(CZn_gate,w)
for i in range(n):
w = applyGate(X_gate,tosQubit(n,w))
return w
def groverSearch(w):
for i in range(n):
w = pushQubit(H_gate@[1,0],w)
for k in range(int(np.pi/4*np.sqrt(2**n)-1/2)):
w = sample_phaseOracle(w)
for i in range(n):
w = applyGate(H_gate,tosQubit(n,w))
w = zero_phaseOracle(w)
for i in range(n):
w = applyGate(H_gate,tosQubit(n,w))
for i in range(n):
(m,w) = measureQubit(tosQubit(n-i,w))
print(m,end="")
n = 10
X_gate = np.array([[0, 1],[1, 0]])
H_gate = np.array([[1, 1],[1,-1]])*np.sqrt(1/2)
CZn_gate = np.diag(list(reversed(2*np.sign(range(2**n))-1)))
groverSearch(np.array([[1.]]))
```

Python`1111111101`