We need to determine, from a given declaration, whether using `calc_vec` (which will rely on Eigen's vectorized expressions) is valid.
Here are my somewhat tentative thoughts.
First, I'm not finding much documentation on how Eigen handles vectorized expressions such as `y[1:5] = exp(x[1:5])`. AI suggests that it sets up a for loop that avoids creating any temporary arrays and also relies on the processor's vectorized/SIMD instructions (e.g., AVX) to compute multiple values in parallel. It also suggests that, for best efficiency, the values should be contiguous in memory (but I think this would generally hold for `calc_one` too, and has more to do with the order in which the loops are executed).
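For reference, here is a rough sketch in plain C++ (no Eigen dependency) of the kind of fused loop that an expression like `y[1:5] = exp(x[1:5])` is conceptually reduced to: one pass over the range, writing straight into `y` with no temporary array. The function name `exp_segment` and its 0-based `start`/`len` arguments are made up for illustration. This is not Eigen's actual generated code, and whether the loop body ends up as SIMD instructions depends on the compiler, the flags, and the element type.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for y[start .. start+len-1] = exp(x[start .. start+len-1]).
// Indices are 0-based here, unlike the 1-based model code.
void exp_segment(const std::vector<double>& x, std::vector<double>& y,
                 std::size_t start, std::size_t len) {
    // Guard against running off the end of either vector.
    assert(start + len <= x.size());
    assert(start + len <= y.size());
    // Single pass: each result is written directly into y, so no
    // intermediate array holding exp(x) is ever allocated.
    for (std::size_t k = 0; k < len; ++k) {
        y[start + k] = std::exp(x[start + k]);
    }
}
```

Because the elements touched are adjacent in memory, a loop of this shape is the easy case for packing several values into one SIMD operation.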
So I think we want to consider the following:
- When there are multiple loops, choose an indexing variable to vectorize over. I suppose we would treat the index associated with the loop with the most iterations as the focal index and vectorize only with respect to that index (though I'm not clear on the mechanism by which `for(i in 1:8) exp(x[1:1000,i])` is better than `for(i in 1:1000) exp(x[i,1:8])`).
- Check that all indexing involves scalars, to avoid things like `x[i,2:5]` or `x[i:(i+3)]`, which presumably couldn't make use of vectorized instructions. That said, what would Eigen do with something like `y[1:3, 5:10] = exp(x[1:3,5:10])`? Perhaps it would run fine.
- Regarding nested indexing, it's not clear to me that this poses a problem, though it may not provide any efficiency gain over `calc_one`.
- Similarly, it's not clear there is a problem with something like `y[i+j]`, if we only have one index vectorized and the other index is looped over explicitly.
I guess one question in all this is whether we are trying to determine when vectorized code will work, or when it will be more efficient than `calc_one`.
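On the mechanism behind the `x[1:1000,i]` versus `x[i,1:8]` comparison, one plausible piece is storage order. Assuming column-major storage (Eigen's default, and what R uses), consecutive entries of a column sit next to each other in memory, while consecutive entries of a row are a full column length apart. The sketch below is plain C++ with hypothetical helper names and 0-based indices, and only works out those distances for a matrix with `nrow` rows. It says nothing about what Eigen does with a strided slice, only why the long contiguous slice is the friendlier case for SIMD loads.

```cpp
#include <cassert>
#include <cstddef>

// Offset (in elements) of entry (row, col) in a column-major array with
// nrow rows. Hypothetical helper, 0-based indices.
std::size_t col_major_offset(std::size_t row, std::size_t col, std::size_t nrow) {
    assert(row < nrow);
    return col * nrow + row;
}

// Distance between consecutive entries when moving down one column,
// i.e. the stride of a slice like x[1:1000, i].
std::size_t stride_down_column(std::size_t nrow) {
    assert(nrow >= 2);
    return col_major_offset(1, 0, nrow) - col_major_offset(0, 0, nrow);
}

// Distance between consecutive entries when moving across one row,
// i.e. the stride of a slice like x[i, 1:8].
std::size_t stride_across_row(std::size_t nrow) {
    assert(nrow >= 1);
    return col_major_offset(0, 1, nrow) - col_major_offset(0, 0, nrow);
}
```

For a 1000 x 8 matrix this gives a stride of 1 for `x[1:1000,i]` and a stride of 1000 for `x[i,1:8]`, so the first form is 8 long contiguous runs and the second is 1000 short slices whose 8 elements are scattered across memory.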