Batching is effective in GPU inference because of hardware parallelism. We implemented request aggregation and batching in Llama.cpp to explore whether the same benefits carry over to CPUs; our optimizations show a 20% speedup in request response time.
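A minimal sketch of the request-aggregation idea, assuming a simple queue-draining worker (names like `batch_requests` are hypothetical, not Llama.cpp's actual API):

```python
import queue

def batch_requests(q, max_batch=8, timeout=0.01):
    """Collect up to max_batch pending requests, waiting briefly for stragglers."""
    batch = [q.get()]  # block until at least one request arrives
    while len(batch) < max_batch:
        try:
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            break  # no more requests arrived in time; serve what we have
    return batch

# Usage: producer threads enqueue prompts; a worker drains them in batches.
q = queue.Queue()
for prompt in ["hello", "world", "foo"]:
    q.put(prompt)
print(batch_requests(q))
```

The short timeout trades a little latency on the first request for a larger batch, which amortizes per-request overhead across the batch.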

Implemented several Oblivious RAM (ORAM) techniques that mitigate memory access patterns as a side channel, and measured their performance overhead against an insecure baseline.
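The core ORAM idea can be illustrated with the simplest construction, a linear-scan ORAM: every logical access touches every physical slot, so the memory trace is independent of the target index. This is a sketch of the general technique, not the specific schemes measured here (practical ORAMs such as Path ORAM use a tree layout for polylogarithmic overhead):

```python
class LinearScanORAM:
    """Toy ORAM: each access reads and writes back every slot, hiding the index."""

    def __init__(self, size):
        self.store = [0] * size

    def access(self, index, write_value=None):
        result = None
        for i in range(len(self.store)):  # touch every slot, always
            v = self.store[i]
            if i == index:
                result = v
                if write_value is not None:
                    v = write_value
            self.store[i] = v             # unconditional write-back
        return result

oram = LinearScanORAM(8)
oram.access(3, write_value=42)
print(oram.access(3))  # → 42
```

The O(n) cost per access is exactly the security/performance trade-off such a baseline comparison quantifies.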

Built a simulator in C++ and Python to analyze memory access patterns and improve the performance of disaggregated server architectures by exploiting the predictable access patterns of data-oblivious algorithms.

Implemented a dataset expander, generator, and robustness checker based on reverse nearest neighbors, with augmentation techniques such as distance averaging and random noise addition.
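A minimal sketch of how reverse nearest neighbors can drive augmentation, shown for 1-D points for brevity (function names and the combination of averaging with noise are illustrative assumptions, not the project's exact pipeline):

```python
import random

def reverse_nearest_neighbors(points, q_idx):
    """Indices of points whose nearest neighbor (excluding self) is points[q_idx]."""
    def nn(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: abs(points[i] - points[j]))
    return [i for i in range(len(points)) if i != q_idx and nn(i) == q_idx]

def augment(points, noise=0.1, seed=0):
    """Generate new samples by averaging each point with its reverse nearest
    neighbors (distance averaging) and perturbing the result (noise addition)."""
    rng = random.Random(seed)
    out = []
    for i, p in enumerate(points):
        for j in reverse_nearest_neighbors(points, i):
            mid = (p + points[j]) / 2                     # distance averaging
            out.append(mid + rng.uniform(-noise, noise))  # random noise addition
    return out

print(augment([0.0, 1.0, 5.0]))
```

Using reverse (rather than forward) nearest neighbors biases new samples toward points that other samples already consider close, which keeps the augmented data near the existing distribution.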