Batching is effective for GPU inference because it exploits hardware parallelism. Implemented request aggregation and batching in llama.cpp to explore whether batching also benefits CPU inference; the optimizations reduced request response time by 20%.
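A minimal sketch of the request-aggregation idea, not the actual llama.cpp patch: incoming prompts are collected from a queue until either a batch-size cap or a short timeout is hit, so one forward pass can serve many requests. The function name and parameters (`batch_requests`, `max_batch`, `timeout_s`) are illustrative assumptions.

```python
import queue
import time

def batch_requests(req_queue, max_batch=8, timeout_s=0.01):
    """Collect up to max_batch requests, waiting at most timeout_s
    after the first arrival, so one inference pass serves many prompts.
    (Illustrative sketch; not the real llama.cpp scheduler.)"""
    batch = [req_queue.get()]  # block until at least one request arrives
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(req_queue.get(timeout=remaining))
        except queue.Empty:
            break  # timeout expired; ship a partial batch
    return batch
```

The timeout bounds the latency cost of waiting for a fuller batch, which is the usual throughput/latency trade-off in request aggregation.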
Implemented several Oblivious RAM (ORAM) techniques that mitigate memory-access-pattern side channels, and measured their performance overhead relative to an unprotected baseline.
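The simplest access-pattern defense, useful as a baseline before full ORAM, is a linear scan: touch every element on each read so the memory trace is independent of the secret index. A toy sketch (the function name `oblivious_read` is an assumption; a real implementation would also avoid the data-dependent comparison with constant-time bit tricks):

```python
def oblivious_read(arr, secret_index):
    """Linear-scan read: every element is accessed regardless of
    secret_index, so the address trace leaks nothing about it.
    (Baseline sketch; tree-based ORAMs amortize this O(n) cost.)"""
    result = 0
    for i, v in enumerate(arr):
        mask = 1 if i == secret_index else 0  # select without skipping accesses
        result += mask * v
    return result
```

Schemes like Path ORAM reduce the per-access cost from O(n) to polylogarithmic, which is the overhead gap such measurements typically quantify.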
Built a simulator in C++ and Python to analyze memory access patterns and improve the performance of disaggregated server architectures by exploiting the predictable access patterns of data-oblivious algorithms.
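A toy model of the underlying idea, with assumed names and latency numbers (`simulate`, `prefetch_depth`, 100 ns local vs 1000 ns remote): because a data-oblivious algorithm's access sequence is known in advance, the memory node can prefetch upcoming addresses and turn remote misses into local hits.

```python
def simulate(access_seq, prefetch_depth, local_hit_ns=100, remote_ns=1000):
    """Toy disaggregated-memory cost model: charge remote latency on a
    miss, local latency on a hit, and prefetch the next prefetch_depth
    addresses from the (predictable) oblivious schedule."""
    cache = set()
    total_ns = 0
    for t, addr in enumerate(access_seq):
        total_ns += local_hit_ns if addr in cache else remote_ns
        cache.add(addr)
        # prefetch future addresses revealed by the fixed schedule
        for future in access_seq[t + 1 : t + 1 + prefetch_depth]:
            cache.add(future)
    return total_ns
```

With `prefetch_depth=0` every access pays the remote cost; any positive depth hides the remote latency for all but the first access, which is the win predictability buys.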
Implemented a dataset expander, generator, and robustness checker based on reverse nearest neighbors, using augmentation techniques such as distance averaging and random noise addition.
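A small sketch of the two named augmentations on 2-D points, with assumed names (`augment`, `noise_scale`): "distance averaging" is rendered here as the midpoint of consecutive point pairs, and "random noise addition" as a Gaussian-jittered copy of each point.

```python
import random

def augment(points, noise_scale=0.01, seed=0):
    """Expand a list of (x, y) points with two simple augmentations:
    (1) the midpoint of each consecutive pair (distance averaging),
    (2) a Gaussian-jittered copy of each point (noise addition)."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = list(points)
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        out.append(((x1 + x2) / 2, (y1 + y2) / 2))  # averaged sample
    for (x, y) in points:
        out.append((x + rng.gauss(0, noise_scale),
                    y + rng.gauss(0, noise_scale)))  # noisy copy
    return out
```

A robustness check would then verify that model predictions (or reverse-nearest-neighbor sets) stay stable across the original and augmented copies.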