Louis Ledoux

Ada and me (I am the human).

Hey there, I am Louis Ledoux (or Luis Eduardo Yves Ledoux Pardo for completeness), a PhD student at the Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC).

My academic focus is on floating-point arithmetic paradigms, spanning from high-level problems down to silicon implementation. I am familiar with systolic arrays for matrix multiplications, real number representation, and automated pipelines on ASIC and FPGA technologies. However, my interests extend far beyond these areas.

Day to day, my hobbies include DJ-ing, producing music, and even building synthesizers; more generally, I enjoy fusing art and science. I might have been too inspired by Ovid’s saying, “ars similis cassus.” I am also the lucky partner of the cutest cat, Ada, who appears in the picture.

news

No news so far...

latest posts

Oct 20, 2024	Music/DJ set
Oct 19, 2024	My Take on Manim Slides
Aug 24, 2024	ADA Wavetables

selected publications

LLMMMM: Large Language Models Matrix-Matrix Multiplications Characterization on Open Silicon

Louis Ledoux and Marc Casas

May 2024

Poster

@misc{ledoux_llmmmm_2024,
  title = {{LLMMMM: Large Language Models Matrix-Matrix Multiplications Characterization on Open Silicon}},
  author = {Ledoux, Louis and Casas, Marc},
  url = {https://hal.science/hal-04592229},
  note = {Poster},
  howpublished = {{11th BSC Symposium}},
  organization = {{Barcelona Supercomputing Center}},
  volume = {11},
  year = {2024},
  month = may,
  keywords = {Multi-Perspective Layout visualizations and Power Performance Area (PPA) Analysis introduction of PPAA (Power Performance Area Accuracy) metric PDK vs. Computer format ASAP7: 4096 PEs 8.2 TFLOP/s <1 mm$^2$ Congestion heatmap 64 PEs e4m3-$\beta$ 8},
  hal_id = {hal-04592229},
  hal_version = {v1},
}

FCCM
A Generator of Numerically-Tailored and High-Throughput Accelerators for Batched GEMMs

Louis Ledoux and Marc Casas

In 2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2022

We propose a hardware generator of GEMM accelerators. Our generator produces vendor-agnostic HDL describing highly customizable systolic arrays guided by accuracy and energy efficiency goals. The generated arrays have three main novel aspects. First, the accelerators handle a large variety of computer number formats using intermediate representations based on our Sign Scale Significand (S3) format. Second, the processing elements perform all intermediate dot-product arithmetic operations required by the GEMM kernel without any intermediate rounding, which makes it possible to deliver better energy efficiency than state-of-the-art approaches while offering more accuracy and reproducible results. Third, our accelerators feature the Half-Speed Sink Down (HSSD) mechanism, which maximizes the overlap of host-accelerator data transfers with GEMM computations. We evaluate our automatically generated designs in a cutting-edge setup composed of a POWER9 host, CAPI (Coherent Accelerator Processor Interface) link, and a Virtex Ultrascale Plus FPGA. Arrays can operate at the speed of the link and saturate it to reach a 13 GB/s throughput. Our fine-grain customization approach covers a wide range of accuracy versus efficiency scenarios and can reach 0.65 GOps/s/W while producing 1024 accurate bits or 148.7 GOps/s/W with 6 accurate bits. Our configurations achieve up to 1613 GOps/s system performance and power efficiencies of up to 240 GOps/s/W for the FPGA. This automatic generator is the first able to produce such a variety of designs. We improve the single-precision energy efficiency of state-of-the-art FPGA GEMM accelerators by 1.86×.
@inproceedings{ledoux_generator_2022,
  title = {A {Generator} of {Numerically}-{Tailored} and {High}-{Throughput} {Accelerators} for {Batched} {GEMMs}},
  doi = {10.1109/FCCM53951.2022.9786164},
  booktitle = {2022 {IEEE} 30th {Annual} {International} {Symposium} on {Field}-{Programmable} {Custom} {Computing} {Machines} ({FCCM})},
  author = {Ledoux, Louis and Casas, Marc},
  month = may,
  year = {2022},
  pages = {1--10},
  google_scholar_id = {u5HHmVD_uO8C},
}
FPL
An Open-Source Framework for Efficient Numerically-Tailored Computations

Louis Ledoux and Marc Casas

In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL), Sep 2023

ISSN: 1946-1488

We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed, without necessitating modifications. We employ this framework within a cutting-edge platform, comprising a Power9 host, an OpenCAPI link, and a Xilinx Virtex UltraScale+ FPGA. The framework demonstrates a systematic enhancement in accuracy per energy cost across diverse High Performance Computing (HPC) workloads displaying a variety of numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161, DenseNet169, and VGG11, in conjunction with two datasets, two computer formats, and 27 distinct intermediate arithmetic datapaths. Our approach consistently reduces energy consumption across all cases, with a notable example being the reduction by factors of 3.3× for IEEE754-32 and 1.4× for Bfloat16 during ImageNet inference with ResNet50. This is accomplished while maintaining accuracies of 82.3% and 86%, comparable to those achieved with conventional Floating-Point Units (FPUs). In the context of SSH computation, our method achieves fully-reproducible results using double-precision words, surpassing the accuracy of conventional double- and quad-precision arithmetic in FPUs. Our approach enhances SSH computation accuracy by a minimum of 5× and 27× compared to IEEE754-64 and IEEE754-128, respectively, resulting in 5.6× and 15.1× improvements in accuracy per power cost.
@inproceedings{ledoux_framework_2023,
  title = {An {Open}-{Source} {Framework} for {Efficient} {Numerically}-{Tailored} {Computations}},
  url = {https://ieeexplore.ieee.org/document/10296314},
  doi = {10.1109/FPL60245.2023.00011},
  booktitle = {2023 33rd {International} {Conference} on {Field}-{Programmable} {Logic} and {Applications} ({FPL})},
  author = {Ledoux, Louis and Casas, Marc},
  month = sep,
  year = {2023},
  note = {ISSN: 1946-1488},
  pages = {19--26},
  google_scholar_id = {d1gkVwhDpl0C},
}