Program your own quantum computer – Part 3 – GPUs

This tutorial is the third part of Martin N. P. Nilsson's "Quantum Computing for Beginners". You can find the previous parts below. Find the original Arxiv article here. Here we discuss how you can implement a GPU in simulating a quantum computer. All code examples in this series of tutorials is licensed under the MIT license.

The tutorial was inspired by the "Quantum Autumn School" held in October 2023 and organised by ENCCS, Wallenberg Centre of Quantum Technologies (WACQT), and Nordic-Estonian Quantum Computing e-Infrastructure Quest (NordIQuEst). You can find the full recordings and training material of the Quantum Autumn School below


A quantum computer simulator using GPUs

Looking at the code above, it’s clear that it would be beneficial to move workspace to a GPU. The operations performed on the workspace are perfect for the GPU, and there isn’t much com- munication needed between the CPU and the GPU.

We start by importing the PyTorch package. For resource requirements and installation instruc- tions, see the PyTorch documentation. PyTorch can also run on a CPU, so you don’t necessarily need an NVidia graphics card to try it out. The code to initialize PyTorch is simple:

import torch as pt
pt.autograd.set_grad_enabled(False)      # disable autogradients
if pt.cuda.is_available():               # check if GPU is available
    print("GPU available") 
else:
    print("Sorry, only CPU available")
Python
GPU available

Below are the adjustments needed for the quantum computer simulator to use PyTorch, both on a CPU and a GPU with CUDA. The major difference from the previous code is that the workspace now lives entirely in the GPU’s memory. Consequently, many calls to numpy have been replaced with PyTorch calls. Only the lines whose comments begin with #! are new or changed.

def pushQubit(name,weights):
    global workspace
    global namestack
    if (workspace.shape[0],workspace.shape[1]) == (1,1): #!
        namestack = []                                   # reset if workspace empty 
    namestack.append(name)
    weights = weights/np.linalg.norm(weights)            # normalize 
    weights = pt.tensor(weights,device=workspace.device, #!
                        dtype=workspace[0,0].dtype)      #! 
    workspace = pt.reshape(workspace,(1,-1))             #! 
    workspace = pt.kron(workspace,weights)               #!
    
def tosQubit(name):
    global workspace
    global namestack
    k = len(namestack)-namestack.index(name)                  # position of qubit 
    if k > 1:                                                 # if non-trivial
        namestack.append(namestack.pop(-k))
        workspace = pt.reshape(workspace,(-1,2,2**(k-1)))     #! 
        workspace = pt.swapaxes(workspace,-2,-1)              #!
        
def applyGate(gate,*names):
    global workspace
    if list(names) != namestack[-len(names):]:                # reorder stack
        for name in names:                                    # if necessary 
            tosQubit(name)
    workspace = pt.reshape(workspace,(-1,2**len(names)))      #!
    subworkspace = workspace[:,-gate.shape[0]:]
    gate = pt.tensor(gate.T,device=workspace.device,          #! 
                     dtype=workspace[0,0].dtype)              #! 
    if workspace.device.type == 'cuda':                       #! 
        pt.matmul(subworkspace,gate,out=subworkspace)         #!
    else:    #! workaround for issue #114350 in torch.matmul 
        subworkspace[:,:]=pt.matmul(subworkspace,gate) #!
        
def probQubit(name):                             # Check probabilities
    global workspace                             # of qubit being 0 or 1
    tosQubit(name)                               # qubit to TOS
    workspace = pt.reshape(workspace,(-1,2))     #! to 2 cols
    prob = pt.linalg.norm(workspace,axis=0)**2   #! compute prob 
    prob = pt.Tensor.cpu(prob).numpy()           #! convert to numpy
    return prob/prob.sum()                       # make sure sum is one
    
def measureQubit(name):                          # Measure and pop qubit
    global workspace
    global namestack
    prob = probQubit(name)                      # Compute probabilities
    measurement = np.random.choice(2,p=prob)    # 0 or 1 
    workspace = (workspace[:,[measurement]]/    # extract col
                 np.sqrt(prob[measurement])) 
    namestack.pop()                             # pop stacks
    return measurement
    
Python

Now, we can test run Grover’s search with PyTorch on both the GPU and the CPU:

import time
workspace = pt.tensor([[1.]],device=pt.device('cuda'),
                             dtype=pt.float32) 
t = time.process_time()                               # with GPU
groverSearch(16, printProb=False)                     # skip prob printouts 
print("\nWith GPU:", time.process_time() - t, "s")
workspace = pt.tensor([[1.]],device=pt.device('cpu'),
                             dtype=pt.float32)
t = time.process_time()                               # with CPU
groverSearch(16, printProb=False)                     # skip prob printouts 
print("\nWith CPU (single-core):", time.process_time() - t, "s")
Python
1111111111111101
With GPU: 4.09375 s
1111111111111101
With CPU (single-core): 44.90625 s

With 25 qubits and using the float32 data type, the search takes 1200 seconds on my computer with an 8GB RTX2070 graphics card.

It doesn’t seem unlikely that the interface to a future QPU (‘Quantum Processing Unit’) will re- semble the interface to a GPU, perhaps by selecting device=pt.device('qpu').

Conclusions

Alright, so you’ve got yourself a quantum computer, kind of. What’s next? There’s a bunch of stuff you can do to make it better. Take the implementation, for instance. Efficiency can be improved, especially in reducing how often you shuffle things around on the stack when it’s already almost set up for the next gate application. Imagine a debugger that shows you the odds of your qubits as you step through the code - wouldn’t that be neat?

Then, think about implementing an adder module using as few ancillary qubits as possible. What about multiplication? Or raising numbers to a power? Shor’s quantum algorithm for prime fac- torization needs modular exponentiation. Give that a whirl, or dive into other famous quantum algorithms. Like the Harrow–Hassidim–Loyd (HHL) algorithm solving linear equations, or the Quantum Approximate Optimization Algorithm (QAOA) for tackling combinatorial optimization problems. And hey, wouldn’t it be awesome to have a compiler that takes high-level concepts and compiles them into gate operations? The sky’s the limit!

As it turns out, a quantum computer is not really that complicated, but extracting results from the computations can be difficult due to all the physical restrictions. It may require quite clever algorithms, and it seems likely that in the future, general quantum algorithms from libraries will be used rather than developing a new quantum program for each domain-specific problem.

Quantum computers are currently programmed at the gate level, which can be painful. Probably, some specification or programming language at higher levels of abstraction will be developed for quantum computers, perhaps a language similar to VHDL.

Using a quantum computer is somewhat reminiscent of the situation with GPUs when they were new. One day, quantum computers may be used similarly.

It will take a while before we have practically useful quantum computers, as there are many techni- cal hurdles. I’m not holding my breath while waiting, but development is rapid, and breakthroughs can happen anytime. Meanwhile, it’s well worth getting into the technology and development of tools, for instance, by programming your own quantum computer simulator. I recommend the very well-written, concise, and freely available references [2-3].

Appendix A: A Complete Implementation

Below is the complete code for a full implementation. The simulator consists of 40 lines of code and Grover’s search of 23 lines.

# DIY quantum computer simulator
# Martin Nilsson, RISE, 2023-11-26

import numpy as np

def pushQubit(name,weights):
    global workspace
    global namestack
    if workspace.shape == (1,1): namestack = [] 
    namestack.append(name)
    weights = np.array(weights/np.linalg.norm(weights),
                       dtype=workspace[0,0].dtype)
    workspace = np.kron(np.reshape(workspace,(1,-1)),weights)

def tosQubit(name):
    global workspace
    global namestack
    k = len(namestack)-namestack.index(name) 
    if k > 1:
        namestack.append(namestack.pop(-k))
        workspace = np.reshape(workspace,(-1,2,2**(k-1)))
        workspace = np.swapaxes(workspace,-2,-1)

def applyGate(gate,*names): 
    global workspace
    if list(names) != namestack[-len(names):]: 
        [tosQubit(name) for name in names]
    workspace = np.reshape(workspace,(-1,2**(len(names))))
    subworkspace = workspace[:,-gate.shape[0]:]
    np.matmul(subworkspace,gate.T,out=subworkspace)
    
def probQubit(name):
    global workspace
    tosQubit(name)
    workspace = np.reshape(workspace,(-1,2)) 
    prob = np.linalg.norm(workspace,axis=0)**2 
    return prob/prob.sum()
    
def measureQubit(name):
    global workspace
    global namestack
    prob = probQubit(name)
    measurement = np.random.choice(2,p=prob) 
    workspace = (workspace[:,[measurement]]/
                 np.sqrt(prob[measurement]))
    namestack.pop()
    return str(measurement)
    
# ---------- Grover search example

X_gate = np.array([[0, 1],[1, 0]])
H_gate = np.array([[1, 1],[1,-1]])*np.sqrt(1/2)
Z_gate = H_gate @ X_gate @ H_gate

def sample_phaseOracle(qubits): 
    applyGate(X_gate,qubits[1]) 
    applyGate(Z_gate,*namestack) 
    applyGate(X_gate,qubits[1])

def zero_phaseOracle(qubits): 
[applyGate(X_gate,q) for q in qubits] 
applyGate(Z_gate,*namestack) 
[applyGate(X_gate,q) for q in qubits]

def groverSearch(n, printProb=True):
    qubits = list(range(n))
    [pushQubit(q,[1,1]) for q in qubits]
    for k in range(int(np.pi/4*np.sqrt(2**n)-1/2)):
        sample_phaseOracle(qubits)
        [applyGate(H_gate,q) for q in qubits] 
        zero_phaseOracle(qubits) 
        [applyGate(H_gate,q) for q in qubits]
        if printProb: print(probQubit(qubits[0]))
    [print(measureQubit(q),end="") for q in reversed(qubits)]

workspace = np.array([[1.]])
groverSearch(8)
Python
[0.48449707 0.51550293]
[0.45445636 0.54554364]
[0.41174808 0.58825192]
[0.35903106 0.64096894]
[0.29958726 0.70041274]
[0.2371174 0.7628826]
[0.17551059 0.82448941]
[0.11860222 0.88139778]
[0.06993516 0.93006484]
[0.03253923 0.96746077]
[0.00874254 0.99125746]
[2.65827874e-05 9.99973417e-01]
11111101

Appendix B: A Minimalist Quantum Computer

A complete implementation in just twelve lines, just to show it’s possible! Grover’s search here is 30 lines.

# Minimalist quantum computer
# Martin Nilsson, RISE, 2023-11-24

import numpy as np 

def pushQubit(q,w):
    return np.kron(np.reshape(w,(1,-1)),q) 

def applyGate(g,w):
    return np.matmul(np.reshape(w,(-1,g.shape[0])),g.T) 

def tosQubit(k,w):
    return np.swapaxes(np.reshape(w,(-1,2,2**(k-1))),-2,-1)

def measureQubit(w):
    w = np.reshape(w,(-1,2))
    p = np.linalg.norm(w,axis=0)
    m = np.random.choice(2,p=p**2) return (m,w[:,[m]]/p[m])
    
# ---------- Grover search example

def sample_phaseOracle(w):
    w = applyGate(X_gate,tosQubit(2,w))
    w = applyGate(CZn_gate,tosQubit(2,w)) 
    w = applyGate(X_gate,tosQubit(2,w)) 
    return tosQubit(2,w)
    
def zero_phaseOracle(w):
    for i in range(n):
        w = applyGate(X_gate,tosQubit(n,w)) 
    w = applyGate(CZn_gate,w)
    for i in range(n):
        w = applyGate(X_gate,tosQubit(n,w)) 
    return w
    
def groverSearch(w): 
    for i in range(n):
        w = pushQubit(H_gate@[1,0],w)
    for k in range(int(np.pi/4*np.sqrt(2**n)-1/2)):
        w = sample_phaseOracle(w) 
        for i in range(n):
            w = applyGate(H_gate,tosQubit(n,w)) 
        w = zero_phaseOracle(w)
        for i in range(n):
            w = applyGate(H_gate,tosQubit(n,w)) 
    for i in range(n):
        (m,w) = measureQubit(tosQubit(n-i,w))
        print(m,end="")
        
n = 10
X_gate = np.array([[0, 1],[1, 0]])
H_gate = np.array([[1, 1],[1,-1]])*np.sqrt(1/2)
CZn_gate = np.diag(list(reversed(2*np.sign(range(2**n))-1)))

groverSearch(np.array([[1.]]))
Python
1111111101

Categories: