Examples Ep: add CUDA variant

If someone is enthusiastic about Kokkos, a Kokkos variant could be added as well, but so far there are
- directive-based (OpenMP offloading),
- language-based (Python-numba, *in progress*),
- "portable" kernel-based (SYCL) implementations,

but no CUDA or HIP variant, so a CUDA variant would be useful.