QuEST can be integrated into your C or C++ project, simply by including
#include <QuEST.h>
Your simulation code will look the same, and compile with the same build system, regardless of whether it runs in multithreaded, GPU-accelerated or distributed mode.
For example, here is a platform-agnostic simulation of a very simple circuit, which prepares and measures the entangled state (|00> + |11>)/sqrt(2):
#include <QuEST.h>
int main() {
    // load QuEST
    QuESTEnv env = createQuESTEnv();

    // create a 2-qubit register in the zero state
    Qureg qubits = createQureg(2, env);
    initZeroState(qubits);

    // apply circuit
    hadamard(qubits, 0);
    controlledNot(qubits, 0, 1);
    measure(qubits, 1);

    // unload QuEST
    destroyQureg(qubits, env);
    destroyQuESTEnv(env);
    return 0;
}
Of course, this code doesn't output anything!
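To see some output, we could for instance capture and print the outcome of the measurement; a minimal tweak, using only functions from the snippet above (plus <stdio.h> for printf):
int outcome = measure(qubits, 1);
printf("Qubit 1 measured in state %d\n", outcome);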
Let's walk through a more sophisticated circuit.
We first construct a QuEST environment with createQuESTEnv()
which abstracts away any preparation of multithreading, distribution or GPU-acceleration strategies.
QuESTEnv env = createQuESTEnv();
We then create a quantum register, in this case containing 3 qubits, via createQureg()
Qureg qubits = createQureg(3, env);
and initialise the register.
initZeroState(qubits);
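Other initialisations are available too. For example (assuming the v3 API; check your installed version), initPlusState() prepares the uniform superposition, and initClassicalState() a chosen basis state:
initPlusState(qubits);          // (|000> + |001> + ... + |111>)/sqrt(8)
initClassicalState(qubits, 5);  // |101>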
We can create multiple Qureg instances, and QuEST will handle allocating memory for the state-vectors, even over networks! If we want to simulate noise in our circuit, we can replace createQureg with createDensityQureg to create a more powerful density matrix, capable of representing mixed states and simulating decoherence, as sketched below.
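For instance, here is a hedged sketch of that density-matrix workflow; mixDephasing() is the v3 name of the dephasing channel, so check the API of your installed version:
Qureg rho = createDensityQureg(3, env);
initZeroState(rho);
hadamard(rho, 0);
mixDephasing(rho, 0, 0.1);  // dephase qubit 0 with probability 0.1
destroyQureg(rho, env);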
We're now ready to apply some unitaries to our qubits, which in this case have indices 0, 1 and 2. When applying an operator, we specify which quantum register to operate upon.
hadamard(qubits, 0);
controlledNot(qubits, 0, 1);
rotateY(qubits, 2, .1);
Some gates allow us to specify a general number of control qubits
int controls[] = {0, 1, 2};
multiControlledPhaseGate(qubits, controls, 3);
We can specify general single-qubit unitary operations as 2x2 matrices
// sqrt(X) with a pi/4 global phase
ComplexMatrix2 u = {
    .real = {{.5,  .5}, { .5, .5}},
    .imag = {{.5, -.5}, {-.5, .5}}};
unitary(qubits, 0, u);
or more compactly, foregoing the global phase factor,
Complex a = {.real = .5, .imag = .5};
Complex b = {.real = .5, .imag =-.5};
compactUnitary(qubits, 1, a, b);
or even more compactly, as a rotation around an arbitrary axis on the Bloch-sphere
Vector v = {.x=1, .y=0, .z=0};
rotateAroundAxis(qubits, 2, 3.14/2, v);
We can also apply general unitaries conditioned on a control qubit
controlledCompactUnitary(qubits, 0, 1, a, b);
even with multiple control qubits!
multiControlledUnitary(qubits, (int[]) {0, 1}, 2, 2, u);
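As a sanity check of the parameterisations above: as documented in the v3 API, compactUnitary(qureg, target, a, b) effects the matrix {{a, -conj(b)}, {b, conj(a)}}, which is unitary only when |a|^2 + |b|^2 = 1. The values used here satisfy this, since .5^2 + .5^2 + .5^2 + .5^2 = 1, and QuEST validates it at runtime.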
There are many questions we can now ask of our quantum register, and many quantities we can calculate.
qreal prob = getProbAmp(qubits, 7);
printf("Probability amplitude of |111>: %lf\n", prob);
Here, qreal is an alias for a real floating-point type, like double. This keeps our code precision-agnostic, so that we may change the numerical precision at compile time (by setting build option PRECISION) without any changes to our code. Changing the precision can be useful in verifying numerical convergence or studying rounding errors.
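One caveat: printf format specifiers are not precision-agnostic (for instance, %lf is incorrect for long double when PRECISION=4), so a portable pattern is to cast qreal values to double when printing:
printf("Probability amplitude of |111>: %f\n", (double) prob);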
How probable is measuring our final qubit (with index 2) in outcome 1?
prob = calcProbOfOutcome(qubits, 2, 1);
printf("Probability of qubit 2 being in state 1: %f\n", prob);
We can also perform non-unitary gates upon the state. Let's destructively measure the first qubit, randomly collapsing it into outcome 0 or 1:
int outcome = measure(qubits, 0);
printf("Qubit 0 was measured in state %d\n", outcome);
and now measure our final qubit, while also learning the probability of its outcome:
outcome = measureWithStats(qubits, 2, &prob);
printf("Qubit 2 collapsed to %d with probability %f\n", outcome, prob);
We could even apply non-physical operators to our register, breaking its normalisation, which can enable computational shortcuts; one example is sketched below.
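For instance, a hedged sketch, assuming v3's applyMatrix2() (which applies a general, possibly non-unitary matrix without validation) and calcTotalProb() to inspect the resulting norm:
// project qubit 0 onto |0>, leaving the state unnormalised
ComplexMatrix2 proj = {
    .real = {{1, 0}, {0, 0}},
    .imag = {{0, 0}, {0, 0}}};
applyMatrix2(qubits, 0, proj);
qreal norm = calcTotalProb(qubits);  // generally < 1 after the projection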
At the conclusion of our circuit, we should free up the memory used by our quantum registers.
destroyQureg(qubits, env);
destroyQuESTEnv(env);
The effect of the code above is to simulate the circuit just described, and after compiling (see the section below) and running, gives pseudo-random output like
Probability amplitude of |111>: 0.498751
Probability of qubit 2 being in state 1: 0.749178
Qubit 0 was measured in state 1
Qubit 2 collapsed to 1 with probability 0.998752
or, on another run,
Probability amplitude of |111>: 0.498751
Probability of qubit 2 being in state 1: 0.749178
Qubit 0 was measured in state 0
Qubit 2 collapsed to 1 with probability 0.499604
QuEST uses the Mersenne Twister algorithm to generate the random numbers used for randomly collapsing quantum states. The user can seed this RNG using seedQuEST(); otherwise QuEST will, by default, create a seed from the current time and the process id.
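For example (hedging on the exact signature, which has changed between v3 releases; recent versions take a pointer to the environment):
unsigned long int seeds[] = {12345};
seedQuEST(&env, seeds, 1);  // make the random collapses reproducible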
In distributed mode (see below), all code in your source files will be executed independently on every node. To execute some code (e.g. printing) only on one node, use
QuESTEnv env = createQuESTEnv();
if (env.rank == 0)
    printf("Only one node executes this print!");
Such conditions are valid and always satisfied in code run on a single node.
See this page to obtain the necessary compilers.
QuEST uses CMake (version 3.7
or higher) as its build system. Configure the build by supplying the below -D[VAR=VALUE]
options after the cmake ..
command. You can alternatively compile via GNU Make directly with the provided makefile.
Windows users should install CMake and Build Tools, and run the below commands in the Developer Command Prompt for VS.
To compile, run:
mkdir build
cd build
cmake .. -DUSER_SOURCE="[FILENAME]"
make
where [FILENAME]
is the name of your source file, including the file extension, relative to the root QuEST directory (above build
).
Windows users should replace the final two build commands with
cmake .. -G "NMake Makefiles" nmake
If compiling with MSVC and NMake in this way fails, users can forgo GPU acceleration, download MinGW-w64, and compile via
cmake .. -G "MinGW Makefiles"
make
Compiling directly with
make
and the provided makefile, copied to the root directory, may prove easier.
- If your project contains multiple source files, separate them with semicolons. For example:
-DUSER_SOURCE="source1.c;source2.cpp"
- To set the compilers used by cmake (e.g. to gcc-6), use
-DCMAKE_C_COMPILER=gcc-6
and similarly, to set the C++ compiler (as used in GPU mode), use
-DCMAKE_CXX_COMPILER=g++-6
- If you wish your executable to be named something other than demo, you can set this too, by adding the argument
-DOUTPUT_EXE="myExecutable"
- To compile your code to use multithreading, for parallelism on multi-core or multi-CPU systems, use
-DMULTITHREADED=1
Before launching your executable, set the number of participating threads using OMP_NUM_THREADS. For example:
export OMP_NUM_THREADS=16
./myExecutable
- To compile your code to run on distributed or networked systems, use
-DDISTRIBUTED=1
Depending on your MPI implementation, your executable can be launched via
mpirun -np [NUM_NODES] [EXEC]
where [NUM_NODES] is the number of distributed compute nodes to use, and [EXEC] is the name of your executable. Note that QuEST hybridises multithreading and distribution. Hence you should set [NUM_NODES] to equal exactly the number of distinct compute nodes (which don't share memory), and set OMP_NUM_THREADS as above to assign the number of threads used on each compute node.
- To compile for GPU, use
-DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=[CC]
where [CC] is the compute capability of your GPU, written without a decimal point. This can be looked up at the NVIDIA website; to check you have selected the right one, you should run the unit tests. Note that CUDA is not compatible with all compilers. To force cmake to use a compatible compiler, override CMAKE_C_COMPILER and CMAKE_CXX_COMPILER. For example, to compile for the Quadro P6000 with gcc-6:
cmake .. -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=61 \
    -DCMAKE_C_COMPILER=gcc-6 -DCMAKE_CXX_COMPILER=g++-6
QuEST can also leverage NVIDIA's cuQuantum and Thrust libraries for optimised GPU simulation on modern GPUs. You must first install cuQuantum (which includes the cuStateVec sub-library used by QuEST) here. When compiling QuEST, in addition to the above compiler options, simply specify
-DUSE_CUQUANTUM=1
QuEST can also run on AMD GPUs using HIP; for documentation, see the HIP programming guide. To compile for AMD GPUs, use
-DGPUACCELERATED=1 -DUSE_HIP=1 -DGPU_ARCH=[ARCH]
where [ARCH] is the architecture of your GPU, for example gfx90a. A table of AMD GPU architectures can be looked up here. To check you have used the correct GPU_ARCH, you should run the unit tests.
- You can additionally customise the floating-point precision used by QuEST's qreal type, via one of
-DPRECISION=1
-DPRECISION=2
-DPRECISION=4
which use single (qreal = float), double (qreal = double) and quad (qreal = long double) precision respectively. Greater precision means more precise computation, at the expense of additional memory and runtime. Checking that results are unchanged when switching precision can be a great test that your calculations are sufficiently precise; a rough memory estimate is given below.
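As a rough guide to the memory cost (an estimate, ignoring simulator overhead): an N-qubit state-vector stores 2^N complex amplitudes, each occupying 2 * sizeof(qreal) bytes. At PRECISION=2 that is 16 bytes per amplitude, so 30 qubits require 2^30 * 16 B = 16 GiB; PRECISION=1 halves this, while a density matrix of N qubits stores 2^(2N) amplitudes, squaring the count.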
After making changes to your code, you can quickly recompile using make
directly, within the build/
directory.
For a full list of available configuration parameters, use
cmake -LH ..
For manual configuration (not recommended), you can edit CMakeLists.txt in the root QuEST directory. You can also compile with GNU Make directly, by copying the provided makefile into the root repository directory, modifying it as needed, and running
make
Once compiled as above, the executable can be run locally from within the build directory.
./myExecutable
- In multithreaded mode, the number of threads QuEST will use can be set by modifying OMP_NUM_THREADS, ideally to the number of available cores on your machine:
export OMP_NUM_THREADS=8
./myExecutable
- In distributed mode, QuEST will uniformly divide every Qureg between a power-of-2 number of nodes, and can be launched with mpirun. For example, here using 8 nodes:
mpirun -np 8 ./myExecutable
If multithreading is also enabled, the number of threads used by each node can be set using OMP_NUM_THREADS. For example, here using 8 nodes with 16 threads on each (a total of 128 processors):
export OMP_NUM_THREADS=16
mpirun -np 8 ./myExecutable
In some circumstances, such as when large-memory multi-core nodes have multiple CPU sockets, it is worthwhile to deploy multiple MPI processes to each node.
- In GPU mode, the executable is launched directly via
./myExecutable
There are no special requirements for running QuEST through job submission systems, like SLURM. Just call ./myExecutable
as you would any other binary.
For example, the tutorial code can be run on 4 distributed nodes (each with 8 cores) on a SLURM system using the following submission script:
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
module load mvapich2
mkdir build
cd build
cmake .. -DDISTRIBUTED=1 -DMULTITHREADED=1
make
export OMP_NUM_THREADS=8
mpirun ./myExecutable
A PBS submission script is similar:
#PBS -l select=4:ncpus=8
module purge
module load mvapich2
mkdir build
cd build
cmake -DDISTRIBUTED=1 ..
make
export OMP_NUM_THREADS=8
aprun -n 4 -d 8 -cc numa_node ./myExecutable
Running QuEST on a GPU is just a matter of specifying resources and the appropriate compilers:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu ## name may vary
module purge
module load cuda ## name may vary
mkdir build
cd build
cmake -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=[Compute capability] ..
make
./myExecutable
On each platform, there is no change to our source code or our QuEST interface. We simply recompile, and QuEST will utilise the available hardware (a GPU, shared-memory CPUs, or distributed CPUs) to speed up our code.
QuEST includes a comprehensive set of unit tests, to ensure every function performs correctly. These are located in the tests directory (documented here), and compare QuEST's optimised routines to slower, algorithmically distinct methods (documented here). It is a good idea to run these tests on your machine to check QuEST is properly configured, especially in GPU mode, to confirm you have correctly set GPU_COMPUTE_CAPABILITY.
Tests should be compiled in a build directory within the root QuEST directory.
mkdir build
cd build
To compile, run:
cmake .. -DTESTING=ON
make
You can include additional CMake arguments to target your desired hardware, such as -DDISTRIBUTED=1.
Next, to launch all unit tests, run:
make test
You should see each function being tested in turn; some will be very fast, and some very slow. This is because the tests run each function with every one of its possible inputs (where possible), so functions with more possible inputs take longer to test. The resulting differences in testing time between functions can be very large, and indicate neither a testing nor a performance problem.
For example:
Start 1: calcDensityInnerProduct
1/117 Test #1: calcDensityInnerProduct ............. Passed 0.16 sec
Start 2: calcExpecDiagonalOp
2/117 Test #2: calcExpecDiagonalOp ................. Passed 0.07 sec
Start 3: calcExpecPauliHamil
3/117 Test #3: calcExpecPauliHamil ................. Passed 0.64 sec
Start 4: calcExpecPauliProd
4/117 Test #4: calcExpecPauliProd .................. Passed 94.88 sec
You can also run the executable build/tests/tests directly, to see more statistics, and to make use of the Catch2 command-line interface:
./tests/tests
===============================================================================
All tests passed (99700 assertions in 117 test cases)
Running the executable directly is also necessary in distributed mode:
mpirun -np 8 tests/tests
Using the command-line is especially useful for contributors to QuEST, for example to run only their new function:
./tests/tests myNewFunction
or a sub-test within:
./tests/tests myNewFunction -c "correctness" -c "density-matrix" -c "unnormalised"
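The standard Catch2 flags are available too; for instance, to list the names of all test cases before picking one:
./tests/tests --list-tests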
Ideally, a new function should have its unit test run in every configuration of hardware (including #threads and #nodes) and precision. The below bash script automates this.
export f=myNewFunction   # function to test
export cc=30             # GPU compute capability
export nt=16             # number of CPU threads

# recompiles and runs the chosen test in the current configuration;
# assumes it is executed from within the build directory
# (note this function shadows the shell builtin `test`, unused here)
test() {
    cmake .. -DTESTING=ON -DPRECISION=$p \
        -DMULTITHREADED=$mt -DDISTRIBUTED=$d \
        -DGPUACCELERATED=$ga -DGPU_COMPUTE_CAPABILITY=$cc
    # insert additional cmake params here, if needed
    make
    export OMP_NUM_THREADS=$nt
    if (( $d == 1 )); then
        mpirun -np $nn ./tests/tests $f
    else
        ./tests/tests $f
    fi
}

# precision
for p in 1 2 4; do
    # serial
    mt=0 d=0 ga=0 test
    # multithreaded
    mt=1 d=0 ga=0 test
    # gpu
    mt=0 d=0 ga=1 test
    # distributed (+multithreaded), over various node counts
    for nn in 2 4 8 16; do
        mt=1 d=1 ga=0 test
    done
done
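For example, saved as a (hypothetical) retest.sh inside build, the script runs via
bash retest.sh
Note that the larger node counts in the nn loop may require your MPI installation to permit oversubscribing ranks when testing on a single machine.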