James Quinlan, Ph.D., Department of Computer Science, recently presented his latest research on iterative refinement using mixed-precision posit arithmetic at the Conference for Next-Generation Arithmetic (CoNGA’24). His work explores strategies to leverage new low-precision number formats for improved performance and efficiency.
Emerging low-precision formats, like posits, can offer roughly double the throughput while requiring only a fraction of the storage of standard IEEE 754 representations: because each value occupies fewer bits, more operands fit in caches and registers and less data moves through memory. Researchers in high-performance computing and artificial intelligence are actively examining these nontraditional floating-point options to address power and performance bottlenecks.
Major hardware manufacturers such as Intel and NVIDIA now offer hardware support for these new low-precision number formats, tailored to artificial intelligence applications like deep learning. For example, NVIDIA’s Tensor Cores perform efficient mixed-precision matrix multiplication using formats like half-precision FP16 to accelerate deep neural networks, and Intel has added bfloat16 support to its latest Xeon Scalable processors.
However, fully utilizing these formats requires adapting computational algorithms and mathematical software to work effectively with lower-precision representations. Quinlan’s research focuses on iterative refinement, a common technique in scientific computing for improving the accuracy of a computed solution to a linear system, and on making it converge correctly when the underlying calculations use mixed-precision arithmetic. This involves exploring different scaling strategies and preconditioners that transform the problem to improve numerical stability.
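For readers new to the technique, the sketch below shows the basic pattern of mixed-precision iterative refinement, using IEEE float as the low precision and double as the high precision rather than posits. The small test system, the naive Gaussian elimination routine, and the fixed iteration count are illustrative assumptions, not code from Quinlan’s work.

```cpp
// Mixed-precision iterative refinement, minimal sketch (illustrative only):
// solve in low precision (float), compute residuals and apply corrections
// in higher precision (double).
#include <array>
#include <cstdio>

constexpr int N = 3;

// Solve A x = b with naive Gaussian elimination in the working precision T.
template <typename T>
std::array<T, N> solve(std::array<std::array<T, N>, N> A, std::array<T, N> b) {
    for (int k = 0; k < N; ++k)
        for (int i = k + 1; i < N; ++i) {
            T m = A[i][k] / A[k][k];
            for (int j = k; j < N; ++j) A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    std::array<T, N> x{};
    for (int i = N - 1; i >= 0; --i) {
        T s = b[i];
        for (int j = i + 1; j < N; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return x;
}

int main() {
    // System stored in high precision; the exact solution is x = (1, 1, 1).
    std::array<std::array<double, N>, N> A = {{{4, 1, 2}, {1, 3, 0}, {2, 0, 5}}};
    std::array<double, N> b = {7, 4, 7};

    // Low-precision copies used for the cheap solve step.
    std::array<std::array<float, N>, N> Af;
    std::array<float, N> bf;
    for (int i = 0; i < N; ++i) {
        bf[i] = static_cast<float>(b[i]);
        for (int j = 0; j < N; ++j) Af[i][j] = static_cast<float>(A[i][j]);
    }

    // Initial low-precision solution, promoted to double.
    std::array<float, N> xf = solve(Af, bf);
    std::array<double, N> x;
    for (int i = 0; i < N; ++i) x[i] = xf[i];

    // Refinement loop: residual in double, correction solved in float.
    for (int it = 0; it < 5; ++it) {
        std::array<double, N> r;
        for (int i = 0; i < N; ++i) {
            r[i] = b[i];
            for (int j = 0; j < N; ++j) r[i] -= A[i][j] * x[j];
        }
        std::array<float, N> rf;
        for (int i = 0; i < N; ++i) rf[i] = static_cast<float>(r[i]);
        std::array<float, N> d = solve(Af, rf);   // correction in low precision
        for (int i = 0; i < N; ++i) x[i] += d[i]; // update in high precision
        std::printf("iter %d: x = (%.12f, %.12f, %.12f)\n", it, x[0], x[1], x[2]);
    }
}
```

The expensive step, solving the system, runs entirely in the cheaper format, while the residual that determines the final accuracy is evaluated in the wider format; the refined solution converges well beyond single-precision accuracy after a few iterations.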
His experiments use the Universal Numbers Library, an open-source C++ library developed by Stillwater Supercomputing that implements posits and other arithmetic types with configurable precision. The library provides the building blocks for experimenting with these new formats in numerical computations.
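As a rough illustration of how such code looks, the sketch below uses the library’s posit<nbits, es> class template; the header path and the sw::universal namespace are assumptions based on recent releases of the library and have changed across versions.

```cpp
// Minimal sketch of working with configurable-precision posits from the
// Universal Numbers Library (header path and namespace may vary by release).
#include <iostream>
#include <universal/number/posit/posit.hpp>

int main() {
    using namespace sw::universal;

    // posit<nbits, es> selects the total bit width and the exponent field size.
    posit<16, 1> third16 = 1.0 / 3.0;   // low-precision storage format
    posit<32, 2> third32 = 1.0 / 3.0;   // higher-precision working format

    std::cout << "1/3 as posit<16,1>: " << third16 << '\n'
              << "1/3 as posit<32,2>: " << third32 << '\n';

    // Arithmetic uses the ordinary C++ operators, so a numerical kernel can
    // often be retargeted by swapping the scalar type. Here a low-precision
    // value is promoted (via double) before a higher-precision accumulation.
    posit<32, 2> acc = posit<32, 2>(double(third16)) + third32;
    std::cout << "promoted sum = " << acc << '\n';
}
```

Because the precision is a template parameter, the same source code can be compiled against different posit configurations, which is what makes the library convenient for mixed-precision experiments like these.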
Opportunities exist for students to contribute to this research and to the Universal Numbers Library. Contact James Quinlan for more details.