Theodoros Rekatsinas
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Theodoros Rekatsinas* of *Axelera AI* presenting on *Data
Selection - Data Challenges when Training Generative Models*. Everyone
is welcome to attend (over Zoom)!
*When:* Thursday, 8th May, 9AM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* This talk explores how strategic data selection can improve
the efficiency of training generative AI models. I will cover approaches
for both pre-training and fine-tuning that achieve comparable
performance to full training while using only a fraction of the data.
I will then discuss key filtering techniques and data selection
methods for efficient pre-training, as well as the connection between
data selection and optimal transport for optimized fine-tuning. I will
conclude with promising future directions for adaptive data selection
research.
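To give a flavor of the kind of filtering the abstract alludes to (this is a minimal sketch, not the speaker's actual method; the scoring function and the kept fraction are placeholder assumptions), a score-and-select loop might look like this:

```python
import numpy as np

def select_top_fraction(examples, score_fn, fraction=0.1):
    """Keep only the highest-scoring fraction of a dataset.
    `score_fn` is any per-example quality or relevance score
    (e.g. a classifier probability); it and the default fraction
    are purely illustrative placeholders."""
    scores = np.array([score_fn(ex) for ex in examples])
    k = max(1, int(len(examples) * fraction))
    keep = np.argsort(-scores)[:k]   # indices of the k best-scoring examples
    return [examples[i] for i in keep]

# Toy usage: score by length as a stand-in for a real quality model.
corpus = ["short", "a much longer and more informative document", "mid length text"]
print(select_top_fraction(corpus, score_fn=len, fraction=0.5))
```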
*Biography:* Theo Rekatsinas is the VP of Machine Learning at Axelera
AI. Before that, he was a tech lead at Apple working on on-device
intelligence and a senior manager in the Apple Knowledge Graph (KG) team,
responsible for the KG construction and Graph Machine Learning teams.
Theo co-founded Inductiv (acquired by Apple), a company that developed
Generative AI solutions for identifying and correcting errors in data.
Theo was also a Professor of Computer Science at ETH Zürich and the
University of Wisconsin-Madison. Theo's research focuses on scalable
machine learning over billion-scale relational and graph-structured
data, exploring the fundamental connections of data preparation, data
integration, and knowledge management with statistical machine learning
and probabilistic inference.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>
John Mellor-Crummey
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *John Mellor-Crummey* of *Rice University* presenting on
*Measurement and Analysis of Application Performance on GPU-accelerated
Systems at Exascale*. Everyone is welcome to attend (over Zoom)!
*When:* Thursday, 13th March, 6PM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* As part of the US DOE's Exascale Computing Project, Rice
University began extending its HPCToolkit performance tools to support
instruction-level measurement and analysis of applications executing on
GPU-accelerated exascale supercomputers. Hardware support for
instruction-level performance measurement in AMD, Intel, and NVIDIA GPUs
was developed at the urging of the HPCToolkit project team. HPCToolkit
employs PC sampling or binary instrumentation to perform
instruction-level measurements of GPU computations. When measuring a
GPU-accelerated application, HPCToolkit employs a novel wait-free data
structure to communicate performance measurements between tool threads
and application threads. To help attribute performance information in
detail, HPCToolkit performs parallel analysis of large CPU and GPU
binaries involved in the execution of an exascale application to rapidly
recover mappings between machine instructions and source code. To
analyze terabytes of performance measurements gathered during executions
at exascale, HPCToolkit employs distributed-memory parallelism,
multithreading, sparse data structures, and out-of-core streaming
analysis algorithms. To support interactive exploration of profiles up
to terabytes in size, HPCToolkit's hpcviewer GUI uses out-of-core
methods to visualize performance data. These strategies have enabled
HPCToolkit to efficiently measure, analyze and explore terabytes of
performance data for executions using as many as 64K MPI ranks and 64K
GPU tiles on ORNL's Frontier supercomputer. This talk will describe key
aspects of HPCToolkit, successes analyzing applications, and some
challenges ahead.
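As background for the out-of-core, sparse streaming analysis mentioned above (a toy Python sketch of the general idea only, not HPCToolkit's actual code; the record format and chunk size are invented for illustration):

```python
from collections import defaultdict

def stream_profile(records, chunk_size=1_000_000):
    """Toy out-of-core-style aggregation: fold measurement records into a
    sparse (calling-context, metric) -> value map one chunk at a time, so
    memory is bounded by the aggregate, not by the raw measurement stream."""
    totals = defaultdict(float)        # sparse: only touched cells exist
    chunk = []
    for rec in records:                # `records` could be read lazily from disk
        chunk.append(rec)
        if len(chunk) >= chunk_size:
            _fold(chunk, totals)
            chunk.clear()
    _fold(chunk, totals)               # flush the last partial chunk
    return totals

def _fold(chunk, totals):
    for ctx, metric, value in chunk:
        totals[(ctx, metric)] += value

# Example with three fabricated samples attributed to two calling contexts.
samples = [("main>solve", "gpu_cycles", 120.0),
           ("main>solve", "gpu_cycles", 80.0),
           ("main>io", "bytes_moved", 4096.0)]
print(dict(stream_profile(samples, chunk_size=2)))
```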
*Biography:* John Mellor-Crummey is a Professor of Computer Science at
Rice University in Houston, TX, USA. His principal research focus at
present is tools for measurement and analysis of application
performance. His past work includes scalable synchronization algorithms
for shared-memory multiprocessors, compilers and runtime systems for
parallel computing, techniques for execution replay of parallel
programs, tools for dynamic data race detection, and techniques for
network performance analysis and optimization. Mellor-Crummey co-led
development of the OMPT tools interface for OpenMP 5. He is a
co-recipient of the 2006 Dijkstra Prize in Distributed Computing,
co-recipient of a 2024 Honor Award from the US Secretary of Energy, and
a Fellow of the ACM.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>
Jesper Larsson Träff
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Jesper Larsson Träff* of *TU Wien (Vienna University of
Technology)* presenting on *Broadcast, Reduction and beyond with Block
Schedules and Circulant Graphs*. Everyone is welcome to attend (over Zoom)!
*When:* Thursday, 12th December, 10AM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* We present a round-optimal algorithm for broadcasting n
indivisible blocks of data over p processors communicating in a regular,
logarithmic degree circulant graph pattern. This broadcast algorithm
immediately leads to partly new, likewise round-optimal algorithms for
the reduction to root, all-to-all broadcast (allgatherv) and irregular
and regular reduce-scatter operations. The broadcast algorithm relies on
block schedules with certain properties which we indicate can be
computed optimally in O(log p) operations per processor without
communication. The communication pattern and algorithms are attractive
for implementing most of the standard, dense collective operations of MPI.
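To make the setting concrete (background illustration only, not the algorithm from the talk): one common logarithmic-degree circulant pattern connects processor i to the processors at distances ±2^k mod p, and the classic round lower bound for broadcasting n blocks in a one-ported model is ceil(log2 p) + n - 1, presumably the bound that "round-optimal" matches here. Both assumptions are sketched below.

```python
import math

def circulant_neighbors(i, p):
    """Neighbors of processor i in a logarithmic-degree circulant graph with
    skips +/- 2^k (mod p) -- one common choice of such a pattern, assumed
    here purely for illustration."""
    skips = [1 << k for k in range(max(1, math.ceil(math.log2(p))))]
    dists = {(i + s) % p for s in skips} | {(i - s) % p for s in skips}
    return sorted(dists)

def broadcast_round_lower_bound(n, p):
    """Classic lower bound on communication rounds for broadcasting n
    indivisible blocks over p processors in a one-ported model."""
    return math.ceil(math.log2(p)) + n - 1

print(circulant_neighbors(0, 8))           # [1, 2, 4, 6, 7]
print(broadcast_round_lower_bound(10, 8))  # ceil(log2 8) + 10 - 1 = 12
```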
*Biography:* Jesper Larsson Träff has been professor of Parallel Computing
at TU Wien (Vienna University of Technology) since 2011. From 2010 to 2011
he was guest professor for Scientific Computing at the University of
Vienna. From 1998 until 2010 he worked at NEC Laboratories
Europe in Sankt Augustin, Germany, on efficient implementations of MPI
for NEC vector supercomputers; this work led to a doctorate (Dr.
Scient.; Habilitation) from the University of Copenhagen in 2009. From
1995 to 1998 he spent four years as PostDoc/Research Associate in the
Algorithms Group of the Max-Planck Institute for Computer Science in
Saarbrücken, and the Efficient Algorithms Group at the Technical
University of Munich. He received an M.Sc. in computer science in 1989,
and, after two interim years at the industrial research center ECRC in
Munich, a Ph.D. in 1995, both from the University of Copenhagen.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>
Mark Silberstein
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Mark Silberstein* of *Technion* presenting on *The evolution of
accelerator-centric GPU services - past, present, future*. Everyone is
welcome to attend (over Zoom)!
*When:* Thursday, 28th November, 6PM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* GPUs have come a long way, evolving from gaming processors
to the main driving force behind modern AI systems. However, from a
system design perspective, they remain co-processors: they cannot
operate independently of the host CPU, which is necessary to invoke
kernels, manage GPU memory, perform data transfers, and interact with
I/O devices. Thus, beyond the complexity of optimizing individual
kernels, GPU-accelerated application development faces fundamental
challenges in integrating GPU computations into complex data and control
flows involving networking and storage. Since 2013, my students in the
Accelerated Computing Systems Group (https://acsl.group) have been
exploring an alternative, accelerator-centric system design in which a
GPU runs specially crafted OS layers that allow GPU kernels to access
files, storage devices, SmartNICs, and network services, without CPU
involvement in the data and/or control path. We have demonstrated how
such an approach eases the programming burden while achieving high
performance. In this talk, I will survey the key ideas of the
accelerator-centric design, discuss the main takeaways, and explore
future trends.
*Biography:* Mark Silberstein is a professor in the Electrical and
Computer Engineering Department at the Technion - Israel Institute of
Technology. His research interests span a broad range of topics in
computer systems, including OS, networking, computer architecture, and
systems security. His projects have been published in top systems
venues, with some winning awards and others being adopted by the
industry. He regularly serves on the program committees of leading
systems conferences, including as a program co-chair of Eurosys '24 and
ASPLOS '26.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>
Oskar Mencer
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Oskar Mencer* of *Groq* presenting on *Programming Groq LPUs
without IEEE Floating Point*. Everyone is welcome to attend (over Zoom)!
*When:* Thursday, 2nd May, 6PM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* The IEEE floating-point standard was a great advance in the
early days of software. In those early days, the speed of software
development was imperative, and the Intel x86 instruction set became a
standard, as did IEEE floating point. Today, we have the first commodity
computing application, the LLM, and others are rapidly following. In the
commodity economy, efficiency and cost become the utmost imperative. As
we give up on the x86 instruction set, we also have to consider custom
number representations for each variable in our programs, opening the
world of Physics and Computer Science to a new dimension in computing
(as predicted in my talk at ETH in 2000). In this talk I will cover how
to find the (locally) optimal range and precision for each variable, and
how to optimally utilize custom precision arithmetic units in modern
leading compute chips such as the Groq LPU.
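As a rough illustration of what a per-variable range and precision analysis might compute (a simplistic stand-in, not the speaker's method; the error criterion and fixed-point format are assumptions made for this sketch):

```python
import math

def fixed_point_format(values, abs_error=1e-3):
    """Pick a signed fixed-point format wide enough for the observed range
    of `values` with quantization step <= abs_error. The range/precision
    criterion here is an illustrative choice, not the method from the talk."""
    magnitude = max(abs(v) for v in values)
    # Fractional bits: smallest f with 2^-f <= abs_error.
    frac_bits = max(0, math.ceil(-math.log2(abs_error)))
    # Integer bits: enough for the magnitude, plus a separate sign bit.
    int_bits = max(1, math.ceil(math.log2(magnitude + 1))) if magnitude > 0 else 1
    return 1 + int_bits + frac_bits, int_bits, frac_bits

total, i, f = fixed_point_format([-3.2, 0.05, 7.9], abs_error=1e-3)
print(f"Q{i}.{f} plus sign bit -> {total} bits per value")
```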
*Biography:* Oskar Mencer received a PhD in Computer Engineering from
Stanford University in 2000, interviewed unsuccessfully at ETH for an
Assistant Professor position, joined Bell Labs 1127, then became an EPSRC
Advanced Fellow at Imperial, founded Maxeler Technologies, and later
received major investments from, among others, JP Morgan and CME Group.
Maxeler was recently acquired by Groq, the leading AI inference company
in California. Oskar remains CEO of Maxeler, a Groq Company, and now
lives on Palm Jumeirah in Dubai.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>
Petar Veličković
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Petar Veličković* of *DeepMind and the University of Cambridge*
presenting on *Capturing Computation with Algorithmic Alignment*.
Everyone is welcome to attend (over Zoom)!
*When:* Thursday, 21st March, 6PM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* What makes a neural network better, or worse, at fitting
certain tasks? This question is arguably at the heart of neural network
architecture design, and it is remarkably hard to answer rigorously.
Over the past few years, there has been a plethora of attempts, using
various facets of advanced mathematics, to answer this question under
various assumptions. One of the most successful directions --
algorithmic alignment -- assumes that the target function, and a
mechanism for computing it, are completely well-defined and known (i.e.
the target is to learn to execute an algorithm). In this setting,
fitting a task is equated to capturing the computations of an algorithm,
inviting analyses from diverse branches of mathematics and computer
science. I will present some of my personal favourite works in
algorithmic alignment, along with their implications for building
intelligent systems of the future.
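A classic example from the algorithmic alignment literature (offered here as an illustration of "capturing the computations of an algorithm", not necessarily the exact material of the talk) is that one round of min-aggregation message passing coincides with one Bellman-Ford relaxation round. A minimal sketch:

```python
def bellman_ford_round(dist, edges):
    """One relaxation round: d[v] <- min(d[v], min over edges (u,v,w) of d[u] + w)."""
    new = dict(dist)
    for u, v, w in edges:
        new[v] = min(new[v], dist[u] + w)
    return new

def min_aggregation_round(dist, edges, message, update):
    """The same step phrased as message passing: each edge sends a message,
    each node aggregates with min and updates its state. A network whose
    learned `message`/`update` mimic addition and min 'aligns' with Bellman-Ford."""
    incoming = {v: [] for v in dist}
    for u, v, w in edges:
        incoming[v].append(message(dist[u], w))
    return {v: update(dist[v], min(msgs, default=float("inf")))
            for v, msgs in incoming.items()}

edges = [("s", "a", 2.0), ("a", "b", 1.0), ("s", "b", 5.0)]
dist = {"s": 0.0, "a": float("inf"), "b": float("inf")}
one = bellman_ford_round(dist, edges)
two = min_aggregation_round(dist, edges,
                            message=lambda d, w: d + w,
                            update=lambda old, agg: min(old, agg))
print(one == two)  # True: the two formulations coincide on this step
```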
*Biography:* Petar is a Staff Research Scientist at Google DeepMind, an
Affiliated Lecturer at the University of Cambridge, and an Associate of
Clare Hall, Cambridge. He holds a PhD in Computer Science from the
University of Cambridge (Trinity College), obtained under the
supervision of Pietro Liò. His research concerns geometric deep
learning—devising neural network architectures that respect the
invariances and symmetries in data (a topic he’s co-written a proto-book
about). For his contributions, he is recognized as an ELLIS Scholar in
the Geometric Deep Learning Program. In particular, he focuses on graph
representation learning and its applications in algorithmic reasoning
(featured in VentureBeat). He is the first author of Graph Attention
Networks—a popular convolutional layer for graphs—and Deep Graph
Infomax—a popular self-supervised learning pipeline for graphs (featured
in ZDNet). His research has been used in substantially improving
travel-time predictions in Google Maps (featured in CNBC, Engadget,
VentureBeat, CNET, The Verge, and ZDNet), and guiding the intuition of
mathematicians towards new top-tier theorems and conjectures (featured
in Nature, Science, Quanta Magazine, New Scientist, The Independent, Sky
News, The Sunday Times, la Repubblica, and The Conversation).
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>
Albert Cohen
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Albert Cohen of Google* presenting on *Can I Cook a 5 o'clock
Compiler Cake and Eat It at 2?* Everyone is welcome to attend (over Zoom)!
*When:* Thursday, 7th December, 9AM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* In high-performance computing words: can we build a compiler
that will eventually save a lot of performance engineering effort while
immediately delivering competitive results? Here, competitiveness refers
to achieving near hardware peak-performance for important applications.
The question is particularly hot in a domain-specific setting, where the
building blocks for constructing an effective optimizing compiler may be
inadequate, too generic, or too low-level. It is widely understood that
compiler construction has failed to deliver early afternoon sweets. I
personally feel bad about it, but until recently it remained an academic
exercise to challenge the status quo. Maybe it is now time to reconsider
this assumption: ML-enhanced compilers are becoming the norm rather than the
exception. New compiler frameworks reconcile optimizations for the
common case with application-specific performance. Domain-specific code
generators play an essential role in the implementation of dense and
sparse numerical libraries. But even with the help of domain-specific
compilers, peak performance can only be achieved at the expense of a
dramatic loss of programmability. Are we ever going to find a way out of
this programmability/performance dilemma? What about the velocity and
agility of compiler engineers? Can we make ML-based heuristics scalable
enough to compile billions of lines of code? Can we do so while enabling
massive code reuse across domains, languages and hardware? We will
review these questions, based on recent successes and half-successes in
academia and industry. We will also extend an invitation to tackle these
challenges in future research and software development.
*Biography:* Albert Cohen is a research scientist at Google. An alumnus
of École Normale Supérieure de Lyon and the University of Versailles, he
has been a research scientist at Inria, a visiting scholar at the
University of Illinois, an invited professor at Philips Research, and a
visiting scientist at Facebook Artificial Intelligence Research. Albert
works on parallelizing, optimizing and machine learning compilers, and
on dataflow and synchronous programming languages, with applications to
high-performance computing, artificial intelligence and reactive control.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>
Marian Verhelst
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Marian Verhelst of KU Leuven* presenting on *Heterogeneous
multi-core systems for efficient EdgeML*. Everyone is welcome to attend
(over Zoom)!
*When:* Thursday, 26th October, 9AM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* Embedded ML applications are characterized by increasingly
diverse workloads, forming a rich mixture of signal processing, GeMM and
conv kernels, attention layers, and even graph processing. Accelerator
efficiency suffers from supporting this wide variety of kernels.
Heterogeneous multi-core systems can offer a solution but come with their
own challenges, such as: (1) how to find the optimal combination of
cores; (2) how to efficiently map workloads across cores; and (3) how to
share data between these cores. This talk will report on a heterogeneous
multi-core system for embedded neural network processing taped out at
KU Leuven MICAS. Moreover, it will give an outlook on work in progress
towards expanding this system to cover more workloads and more
heterogeneous cores.
*Biography:* Marian Verhelst is a full professor at the MICAS laboratories
of KU Leuven and a research director at imec. Her research focuses on
embedded machine learning, hardware accelerators, HW-algorithm co-design
and low-power edge processing. She received a PhD from KU Leuven in
2008, and worked as a research scientist at Intel Labs, Hillsboro, OR,
from 2008 until 2010. Marian is a member of the board of directors of
tinyML, is active in the TPCs of DATE, ISSCC, VLSI and ESSCIRC, and was
the chair of tinyML 2021 and TPC co-chair of AICAS 2020. Marian is an IEEE
SSCS Distinguished Lecturer, was a member of the Young Academy of
Belgium, an associate editor for TVLSI, TCAS-II and JSSC and a member of
the STEM advisory committee to the Flemish Government. Marian received
the laureate prize of the Royal Academy of Belgium in 2016, the 2021
Intel Outstanding Researcher Award, the André Mischke YAE Prize for
Science and Policy in 2021, and two ERC grants.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch> X(Twitter)
<https://twitter.com/spcl_eth> YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>