AI Systems Performance Engineering : Optimizing Model Training and Inference Workloads with Gpus, Cuda, and Pytorch