- Use new gauge.transport also for staples, field strength, covariant shift
- Add cache to poke/peek
- The first instance of random should also be faster
- AlignedVector<Vector<index_type>> lut_vec_; is very wasteful in block.h (imagine osites * 2MB pages).
  Need an unaligned/minimally-aligned version of Vector!
- Complete the implementation of the new blocked memory system in a multi-node setup; also re-run the test in init.cc
- Improve random performance further; generate data in optimal ordering
- block_map: additional parameter for tensor index projection ([0,1], [2,3] for upper/lower-half chiral projection)
- Remark: mapping to a physical page happens on first write. With fixed OMP core binding it is crucial to have
  the appropriate thread layout on first write.
- General Wilson loop (described by direction, distance)
- Rework Multigrid based on sequence?
- Rework Cshift based on view interface
- QIS:
  - Symbolic circuit optimization
  - Combine all X/CNOT gates into a single sweep; similarly for Rphi gates
- matrix_operator <> vector_space
- coarse_matrix needs to pass lists of U's for sub-blocks to cgpt for speedup
- If we ever need to speed up inverse matrix operator creation, we can
  create a solver_cache class: sc = solver_cache(solver, cache); when
  sc(mat) gets called, it creates solver(mat) and stores it in the cache;
  if it is already in there, it reuses it (see the sketch at the end of this file).
- Gauge Fix class that takes w.propagator and gauge fixing matrices as input
- Grid-production-code/zmobius_2pt_hvp_con_gstore/Fourier... <-- First TM, then 5d TM, then FA
- Test applications on Summit with the new version + new Grid
- A2A meson fields
- Using the l[...] interface, a stencil could be implemented in Python (see the sketch at the
  end of this file). This may still be somewhat slower than the C++ operators but should be
  worth it for slightly less performance-critical code.
- Stout smearing, plaquette implementation using covariant shifts
- Complete sparse/split grid implementation
- Add an RLE-capable converter to cgpt from coordinates to different
  linear orders (see the sketch at the end of this file)
- Based on this, implement in Python a parallel reader for the nerscIO and openQCD file formats
- sources
- verbose=eval -> Bytes/s & Flops/s for expression evaluation
- For some machines it may be useful to be able to git clone the gpt repository and set an
  environment variable such that gpt.repository copies files from there instead of downloading
  them from the web (see the sketch below).
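
Sketch for the solver cache item above: a minimal Python sketch, assuming a solver factory
that maps a matrix operator to a callable inverse; the class name solver_cache, the dict-based
cache, and the use of object identity as cache key are assumptions, not existing gpt API.

    class solver_cache:
        def __init__(self, solver, cache=None):
            # solver is a factory: solver(mat) returns a callable inverse operator
            self.solver = solver
            self.cache = cache if cache is not None else {}

        def __call__(self, mat):
            key = id(mat)  # assume object identity is a stable cache key
            if key not in self.cache:
                self.cache[key] = self.solver(mat)  # create solver(mat) and store it
            return self.cache[key]  # reuse it if already cached

    # usage sketch: sc = solver_cache(my_inverter); x = sc(mat)(b)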
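Sketch for the Python stencil item above: a minimal sketch assuming l[coords] accepts an
(N, nd) integer coordinate array and reads/writes the corresponding site values; the helper
names shift_coords and stencil_apply and the periodic-wrapping convention are illustrative
assumptions, not existing gpt API.

    import numpy as np

    def shift_coords(coords, mu, d, dims):
        # shift an (N, nd) coordinate array by d in direction mu, with periodic wrapping
        shifted = coords.copy()
        shifted[:, mu] = (shifted[:, mu] + d) % dims[mu]
        return shifted

    def stencil_apply(dst, src, coords, dims, terms):
        # dst[x] = sum_i coeff_i * src[x + d_i * e_{mu_i}] for terms = [(mu, d, coeff), ...]
        acc = None
        for mu, d, coeff in terms:
            contrib = coeff * src[shift_coords(coords, mu, d, dims)]
            acc = contrib if acc is None else acc + contrib
        dst[coords] = acc

    # usage sketch: nearest-neighbor Laplacian in direction 0
    # stencil_apply(dst, src, coords, dims, [(0, +1, 1.0), (0, -1, 1.0), (0, 0, -2.0)])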
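Sketch for the RLE converter item above: a minimal Python/NumPy sketch; the lexicographic
linear order and the (start, length) run format are assumptions about what the cgpt converter
would produce.

    import numpy as np

    def coordinates_to_lexicographic(coords, dims):
        # map an (N, nd) coordinate array to lexicographic linear indices
        idx = np.zeros(len(coords), dtype=np.int64)
        stride = 1
        for mu in range(len(dims)):
            idx += coords[:, mu].astype(np.int64) * stride
            stride *= dims[mu]
        return idx

    def run_length_encode(indices):
        # compress sorted linear indices into (start, length) runs
        indices = np.sort(indices)
        breaks = np.where(np.diff(indices) != 1)[0] + 1
        return [(int(run[0]), len(run)) for run in np.split(indices, breaks)]

    # usage sketch: a parallel reader can then issue one contiguous read per run
    # runs = run_length_encode(coordinates_to_lexicographic(coords, [8, 8, 8, 16]))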
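Sketch for the local-repository item above: a minimal Python sketch; the environment variable
name GPT_REPOSITORY_PATH, the function name fetch, and the base_url parameter are assumptions,
not existing gpt behavior.

    import os, shutil, urllib.request

    def fetch(relative_path, destination, base_url):
        # copy from a local gpt clone if GPT_REPOSITORY_PATH is set,
        # otherwise download from the web as before
        local_root = os.environ.get("GPT_REPOSITORY_PATH")
        if local_root is not None:
            shutil.copy(os.path.join(local_root, relative_path), destination)
        else:
            urllib.request.urlretrieve(base_url + "/" + relative_path, destination)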