What: SPCL_Bcast is an open, online seminar series that covers a broad range of topics around parallel and high-performance computing, scalable machine learning, and related areas.

Who: We invite top researchers and engineers from all over the world to speak.

Where: Anyone is welcome to join over Zoom! This link will always redirect to the right Zoom meeting. When possible, we make recordings available on our YouTube channel.

Social media: Follow along with #spcl_bcast on Twitter!

When: Every two weeks on Thursdays, in one of two slots (depending on speaker).

10 September – 22 October, 2020:

  • Morning: 9 AM Zurich, 4 PM Tokyo, 3 PM Beijing, 3 AM New York, 12 AM (midnight) San Francisco
  • Evening: 6 PM Zurich, 1 AM (Friday) Tokyo, 12 AM (midnight) Beijing, 12 PM (noon) New York, 9 AM San Francisco

5 November – 17 December, 2020:

  • Morning: 9 AM Zurich, 5 PM Tokyo, 4 PM Beijing, 3 AM New York, 12 AM (midnight) San Francisco
  • Evening: 6 PM Zurich, 2 AM (Friday) Tokyo, 1 AM (Friday) Beijing, 12 PM (noon) New York, 9 AM San Francisco

10 September, 2020 — Satoshi Matsuoka
9 AM Zurich, 4 PM Tokyo, 3 PM Beijing, 3 AM New York, 12 AM (midnight) San Francisco

Fugaku: The First 'Exascale' Supercomputer – Past, Present and Future

Abstract: Fugaku is the first 'exascale' supercomputer of the world, not due to its peak double precision flops, but rather, its demonstrated performance in real applications that were expected of exascale machines on their conceptions 10 years ago, as well as reaching actual exaflops in new breeds of benchmarks such as HPL-AI. But the importance of Fugaku is its "applications first" philosophy under which it was developed, and its resulting mission to be the centerpiece for rapid realization of the so-called Japanese 'Society 5.0' as defined by the Japanese S&T national policy. As such, Fugaku's immense power is directly applicable not only to traditional scientific simulation applications, but can be a target of Society 5.0 applications that encompasses conversion of HPC & AI & Big Data as well as Cyber (IDC & Network) vs. Physical (IoT) space, with immediate societal impact. In fact, Fugaku is already in partial operation a year ahead of schedule, primarily to obtain early Society 5.0 results including combatting COVID-19 as well as resolving other important societal issues. The talk will introduce how Fugaku had been conceived, analyzed, and built over the 10 year period, look at its current efforts regarding Society 5.0 and COVID, as well as touch upon our thoughts on the next generation machine, or "Fugaku NeXT".

Picture of Satoshi Matsuoka Bio: Satoshi Matsuoka is the director of RIKEN R-CCS, the top-tier HPC center in Japan which operates the K Computer and will host its successor Supercomputer Fugaku, and he is a Specially Appointed Professor at Tokyo Tech since 2018. He had been a Full Professor at the Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology, since 2000, where he has been the leader of the TSUBAME series of supercomputers that have won many accolades such as world #1 in power-efficient computing. Satoshi Matsuoka also leads various major supercomputing research projects in areas such as parallel algorithms and programming, resilience, green computing, and convergence of Big Data/AI with HPC. He has written over 500 articles, chaired numerous ACM/IEEE conferences, and has won many awards, such as the ACM Gordon Bell Prize in 2011 and the highly prestigious 2014 IEEE-CS Sidney Fernbach Memorial Award.

24 September, 2020 — Amir Gholami & Zhewei Yao
6 PM Zurich, 1 AM (Friday) Tokyo, Midnight Beijing, 12 PM New York, 9 AM San Francisco — Zoom

A Paradigm Shift to Second Order Methods for Machine Learning

Abstract: The amount of compute needed to train modern NN architectures has been doubling every few months. With this trend, it is no longer possible to perform brute force hyperparameter tuning to train the model to good accuracy. However, first-order methods such as Stochastic Gradient Descent are quite sensitive to such hyperparameter tuning and can easily diverge for challenging problems. However, many of these problems can be addressed with second-order optimizers. In this direction, we introduce AdaHessian, a new stochastic optimization algorithm. AdaHessian directly incorporates approximate curvature information from the loss function, and it includes several novel performance-improving features, including: (i) a fast Hutchinson based method to approximate the curvature matrix with low computational overhead; and (ii) a spatial/temporal block diagonal averaging to smooth out variations of second-derivate over different parameters/iterations. Extensive tests on NLP, CV, and recommendation system tasks, show that AdaHessian achieves state-of-the-art results, with 10x less sensitivity to hyperparameter tuning as compared to ADAM.
In particular, we find that AdaHessian:
(i) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14, 2.7/1.0 PPL on PTB/Wikitext-103;
(ii) outperforms AdamW for SqueezeBert by 0.41 points on GLUE;
(iii) achieves 1.45%/5.55% higher accuracy on ResNet32/ResNet18 on Cifar10/ImageNet as compared to Adam; and
(iv) achieves 0.032% better score than AdaGrad for DLRM on the Criteo Ad Kaggle dataset.
The cost per iteration of AdaHessian is comparable to first-order methods, and AdaHessian exhibits improved robustness towards variations in hyperparameter values.

Picture of Amir Gholami Bio: Amir Gholami is a senior research fellow at ICSI and Berkeley AI Research (BAIR). He received his PhD from UT Austin, working on large scale 3D image segmentation, a research topic which received UT Austin’s best doctoral dissertation award in 2018. He is a Melosh Medal finalist, the recipient of best student paper award in SC'17, Gold Medal in the ACM Student Research Competition, as well as best student paper finalist in SC’14. Amir is a recognized expert in industry with long lasting contributions. He was part of the Nvidia team that for the first time made FP16 training possible, enabling more than 10x increase in compute power through tensor cores. That technology has been widely adopted in GPUs today. Amir's current research focuses on exa-scale neural network training and efficient inference.
Picture of Zhewei Yao Zhewei Yao is a Ph.D. student in the BAIR, RISELab (former AMPLab), BDD and Math Department at University of California at Berkeley. He is advised by Michael Mahoney, and he is also working very closely with Kurt Keutzer. His research interest lies in computing statistics, optimization, and machine learning. Currently, he is interested in leveraging tools from randomized linear algebra to provide efficient and scalable solutions for large-scale optimization and learning problems. He is also working on the theory and application of deep learning. Before joining UC Berkeley, he received his B.S. in Math from Zhiyuan Honor College at Shanghai Jiao Tong University.

8 October, 2020 — Rio Yokota
9 AM Zurich, 4 PM Tokyo, 3 PM Beijing, 3 AM New York, 12 AM (midnight) San Francisco — Zoom

Distributed Deep Learning with Second Order Information

Abstract: As the scale of deep neural networks continues to increase exponentially, distributed training is becoming an essential tool in deep learning. Especially in the context of un/semi/self-supervised pretraining, larger models tend to achieve much higher accuracy. This trend is especially clear in natural language processing, where the latest GPT-3 model has 175 billion parameters. The training of such models requires hybrid data+model-parallelism. In this talk, I will describe two of our recent efforts; 1) second-order optimization and 2) reducing memory footprint, in the context of large-scale distributed deep learning.

Picture of Rio Yokota Bio: Rio Yokota is an Associate Professor at the Tokyo Institute of Technology. His research interest lie at the intersection of HPC and ML. On the HPC side, he has worked on hierarchical low-rank approximation methods such as FMM and H-matrices. He has worked on GPU computing since 2007 and won the Gordon Bell prize using the first GPU supercomputer in 2009. On the ML side, he works on distributed deep learning and second-order optimization. His work on training ImageNet in 2 minutes with second-order methods has been extended to various applications using second-order information.

22 October, 2020 — To be announced

Speaker to be announced.

5 November, 2020 — Jidong Zhai

Talk to be announced.

19 November, 2020 — To be announced

Speaker to be announced.

3 December, 2020 — To be announced

Speaker to be announced.

17 December, 2020 — To be announced

Speaker to be announced.