Theodoros Rekatsinas
The Scalable Parallel Computing Lab's *SPCL_Bcast* seminar continues
with *Theodoros Rekatsinas* of *Axelera AI* presenting on *Data
Selection - Data Challenges when Training Generative Models*. Everyone
is welcome to attend (over Zoom)!
*When:* Thursday, 8th May, 9AM CET
*Where:* Zoom
Join <https://spcl.inf.ethz.ch/Bcast/join>
*Abstract:* This talk explores how strategic data selection can improve
the efficiency of training generative AI models. I will cover approaches
for both pre-training and fine-tuning that achieve comparable
performance to full training while using only a fraction of the data.
During the talk I will cover key filtering techniques and data selection
methods for efficient pre-training as well as the connection between
data selection and optimal transport for optimized fine-tuning. I will
conclude with promising future directions for adaptive data selection
research.
*Biography:* Theo Rekatsinas is the VP of Machine Learning at Axelera
AI. Before that, he was a tech lead at Apple working on on-device
intelligence and a senior manager in the Apple Knowledge Graph (KG) team
responsible for the KG construction and Graph Machine Learning teams.
Theo co-founded Inductiv (acquired by Apple), a company that developed
Generative AI solutions for identifying and correcting errors in data.
Theo was also a Professor of Computer Science at ETH Zürich and the
University of Wisconsin-Madison. Theo's research focuses on scalable
machine learning over billion-scale relational and graph-structured
data. His research focused on exploring the fundamental connections
between data preparation, data integration, and knowledge management
with statistical machine learning and probabilistic inference.
More details & future talks <https://spcl.inf.ethz.ch/Bcast/>
Scalable Parallel Computing Lab (SPCL)
Department of Computer Science, ETH Zurich
Website <https://spcl.inf.ethz.ch>
X (Twitter) <https://twitter.com/spcl_eth>
YouTube <https://www.youtube.com/@spcl>
GitHub <https://github.com/spcl>