Yes. That is why you need to implement the functions on the host. Otherwise the computation is incomplete.
Frameworks like Pytorch have not been specifically developed for AI accelerators but for CPUs and GPUs. So, they can describe operators that are better run on a CPU. They are usually at the beginning and end of the network. We call it pre- and post-processing. And for popular networks we provide it as part of our examples or even add it back during model conversion e.g. NMS for the YOLO models.
For your own models you will need to do this yourself. It is some extra work but in return you get a much more efficient execution of your model compared to running the model on a CPU or GPU.
If my math is right your array seems to be a bit small to create the shape. 38 x 192 = 7296. So 12 x missing.