Copyright Notice:
The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Publications of SPCL
[1] S. Ashkboos, A. Mohtashami, M. L. Croci, B. Li, M. Jaggi, D. Alistarh, T. Hoefler, J. Hensman:
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS'24), presented in Vancouver, Canada, Dec. 2024.
[2] M. Khalilov, S. Di Girolamo, M. Chrapek, R. Nudelman, G. Bloch, T. Hoefler:
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'24), presented in Atlanta, GA, USA, pages 103:1-103:17, IEEE Press, ISBN: 979-8-3503-5291-7, Nov. 2024 (acceptance rate 21.1%, 99/470).
[3] S. Ashkboos, I. Markov, E. Frantar, T. Zhong, X. Wang, J. Ren, T. Hoefler, D. Alistarh:
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP'24), presented in Miami, FL, USA, pages 3355-3371, Association for Computational Linguistics, Nov. 2024.
[4] S. Shen, L. Huang, M. Chrapek, T. Schneider, J. Dayal, M. Gajbe, R. Wisniewski, T. Hoefler:
LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'24), presented in Atlanta, GA, USA, pages 1004-1021, IEEE Press, ISBN: 979-8-3503-5291-7, Nov. 2024 (acceptance rate 21.1%, 99/470). SC'24 Best Paper Award (1/99).
[5] P. Okanovic, G. Kwasniewski, P. Sylos Labini, M. Besta, F. Vella, T. Hoefler:
High Performance Unstructured SpMM Computation Using Tensor Cores
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'24), presented in Atlanta, GA, USA, pages 154:1-154:14, IEEE Press, ISBN: 979-8-3503-5291-7, Nov. 2024 (acceptance rate 21.1%, 99/470).
[6] M. Copik, A. Calotoiu, G. Rethy, R. Böhringer, R. Bruno, T. Hoefler:
Process-as-a-Service: Unifying Elastic and Stateful Clouds with Serverless Processes
Nov. 2024.
[7] A. Lepori, A. Calotoiu, T. Hoefler:
Iterating Pointers: Enabling Static Analysis for Loop-based Pointers
ACM Transactions on Architecture and Code Optimization. Oct. 2024.
[8] M. Besta, R. Gerstenberger, P. Iff, P. Sonawane, J. Gómez Luna, R. Kanakagiri, R. Min, O. Mutlu, T. Hoefler, R. Appuswamy, A. O'Mahony:
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments
arXiv:2408.12173. Aug. 2024.
[9] M. Khalilov, M. Chrapek, S. Shen, A. Vezzu, T. Benz, S. Di Girolamo, T. Schneider, D. De Sensi, L. Benini, T. Hoefler:
OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
In Proceedings of the 2024 USENIX Annual Technical Conference, USENIX: The Advanced Computing Systems Association, Jul. 2024 (acceptance rate 15.9%, 77/482).
[10] L. Huang, L. Gianinazzi, Y. Yu, P. D. Dueben, T. Hoefler:
DiffDA: a Diffusion model for weather-scale Data Assimilation
In Proceedings of the 41st International Conference on Machine Learning, Jul. 2024.
[11] M. Besta, L. Paleari, A. Kubicek, P. Nyczyk, R. Gerstenberger, P. Iff, T. Lehmann, H. Niewiadomski, T. Hoefler:
CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks
arXiv:2406.02524. Jun. 2024.
[12] M. Copik, A. Calotoiu, P. Zhou, K. Taranov, T. Hoefler:
FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example
Jun. 2024.
[13] K. Lakhotia, L. Monroe, K. Isham, M. Besta, N. Blach, T. Hoefler, F. Petrini:
PolarStar: Expanding the Horizon of Diameter-3 Networks
In Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'24), presented in Nantes, France, pages 345–357, Association for Computing Machinery, ISBN: 9798400704161, Jun. 2024 (acceptance rate 29.9%, 35/117).
[14] M. Besta, F. Scheidl, L. Gianinazzi, S. Klaiman, J. Müller, T. Hoefler:
Demystifying Higher-Order Graph Neural Networks
arXiv:2406.12841. Jun. 2024.
[15] M. Besta, A. Kubicek, R. Niggli, R. Gerstenberger, L. Weitzendorf, M. Chi, P. Iff, J. Gajda, P. Nyczyk, J. Müller, H. Niewiadomski, M. Chrapek, M. Podstawski, T. Hoefler:
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
arXiv:2406.05085. Jun. 2024.
[16] M. Besta, T. Hoefler:
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol 46, Nr. 5, pages 2584-2606, IEEE Press, May 2024.
[17] S. Ashkboos, M. L. Croci, M. Gennari do Nascimento, T. Hoefler, J. Hensman:
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
In The Twelfth International Conference on Learning Representations, May 2024.
[18] Y. Baumann, T. Ben-Nun, M. Besta, L. Gianinazzi, T. Hoefler, P. Luczynski:
Low-Depth Spatial Tree Algorithms
In Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS'24), presented in San Francisco, CA, USA, pages 180-192, IEEE Press, May 2024 (acceptance rate 26.1%, 88/337).
[19] T. Dettmers, R. A. Svirschevski, V. Egiazarian, D. Kuznedelev, E. Frantar, S. Ashkboos, A. Borzunov, T. Hoefler, D. Alistarh:
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
In The Twelfth International Conference on Learning Representations, May 2024.
[20] M. Copik, M. Chrapek, L. Schmid, A. Calotoiu, T. Hoefler:
Software Resource Disaggregation for HPC with Serverless Computing
In Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS'24), presented in San Francisco, CA, USA, IEEE, May 2024.
[21] P. Luczynski, L. Gianinazzi, P. Iff, L. Wilson, D. De Sensi, T. Hoefler:
Near-Optimal Wafer-Scale Reduce
In Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24), presented in Pisa, Italy, Association for Computing Machinery, May 2024.
[22] N. Blach, M. Besta, D. De Sensi, J. Domke, H. Harake, S. Li, P. Iff, M. Konieczny, K. Lakhotia, A. Kubicek, M. Ferrari, F. Petrini, T. Hoefler:
A High-Performance Design, Implementation, Deployment, and Evaluation of the Slim Fly Network
In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24), presented in Santa Clara, CA, USA, pages 1025-1044, USENIX Association, ISBN: 978-1-939133-39-7, Apr. 2024.
[23] B. Stevens et al.:
Earth Virtualization Engines (EVE)
Earth System Science Data (ESSD). Vol 16, Nr. 4, pages 2113-2122, Copernicus Publications, Apr. 2024.
[24] N. Abubaker, T. Hoefler:
SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
arXiv:2404.19638. Apr. 2024.
[25] D. De Sensi, T. Bonato, D. Saam, T. Hoefler:
Swing: Short-cutting Rings for Higher Bandwidth Allreduce
In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24), presented in Santa Clara, CA, USA, pages 1445-1462, USENIX Association, ISBN: 978-1-939133-39-7, Apr. 2024.
[26] T. Hoefler, M. Copik, P. Beckman, A. Jones, I. Foster, M. Parashar, D. Reed, M. Troyer, T. Schulthess, D. Ernst, J. Dongarra:
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
Computing in Science and Engineering (CiSE). IEEE Computer Society, ISSN: 1521-9615, Apr. 2024.
[27] P. Bauer, T. Hoefler, B. Stevens, W. Hazeleger:
Digital Twins of Earth and the Computing Challenge of Human Interaction
Nature Computational Science. Vol 4, Nr. 3, pages 154-157, Nature, ISSN: 2662-8457, Mar. 2024.
[28] L. Gianinazzi, A. Nikolaos Ziogas, P. Luczynski, L. Huang, S. Ashkboos, F. Scheidl, A. Carigiet, C. Ge, N. Abubaker, M. Besta, T. Ben-Nun, T. Hoefler:
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication
In Proceedings of the 29th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'24), presented in Edinburgh, United Kingdom, pages 404-416, Association for Computing Machinery, Mar. 2024.
[29] M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk, T. Hoefler:
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Proceedings of the AAAI Conference on Artificial Intelligence. Vol 38, Nr. 16, presented in Vancouver, Canada, pages 17682-17690, AAAI Press, Mar. 2024 (acceptance rate 23.75%, 2342/9862).
[30] M. Besta, F. Memedi, Z. Zhang, R. Gerstenberger, G. Piao, N. Blach, P. Nyczyk, M. Copik, G. Kwaśniewski, J. Müller, L. Gianinazzi, A. Kubicek, H. Niewiadomski, A. O'Mahony, O. Mutlu, T. Hoefler:
Demystifying Chains, Trees, and Graphs of Thoughts
arXiv:2401.14295. Jan. 2024.
[31] L. Möller, M. Copik, A. Calotoiu, T. Hoefler:
Cppless: Productive and Performant Serverless Programming in C++
arXiv:2401.10834. Jan. 2024.
[32] W. Qiu, M. Copik, Y. Wang, A. Calotoiu, T. Hoefler:
User-guided Page Merging for Memory Deduplication in Serverless Systems
In 2023 IEEE International Conference on Big Data (Big Data), Dec. 2023 (acceptance rate 17.5%, 92/526).
[33] M. Besta, A. Claudino Catarino, L. Gianinazzi, N. Blach, P. Nyczyk, H. Niewiadomski, T. Hoefler:
HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers
In Proceedings of the Second Learning on Graphs Conference (LOG'23), presented virtually, PMLR, Nov. 2023.
[34] M. Besta, P. Renc, R. Gerstenberger, P. Sylos Labini, A. Ziogas, T. Chen, L. Gianinazzi, F. Scheidl, K. Szenes, A. Carigiet, P. Iff, G. Kwasniewski, R. Kanakagiri, C. Ge, S. Jaeger, J. Wąs, F. Vella, T. Hoefler:
High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023 (acceptance rate 23.9%, 90/376).
[35] P. Schaad, T. Schneider, T. Ben-Nun, A. Nikolaos Ziogas, A. Calotoiu, T. Hoefler:
FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023 (acceptance rate 23.9%, 90/376).
[36] M. Besta, R. Gerstenberger, M. Fischer, M. Podstawski, N. Blach, B. Egeli, G. Mitenkov, W. Chlapek, M. Michalewicz, H. Niewiadomski, J. Müller, T. Hoefler:
The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023 (acceptance rate 23.9%, 90/376). Best Paper Finalist.
[37] R. L. Castro, A. Ivanov, D. Andrade, T. Ben-Nun, B. B. Fraguela, T. Hoefler:
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023 (acceptance rate 23.9%, 90/376).
[38] W. Jiang, S. Li, Y. Zhu, J. de Fine Licht, Z. He, R. Shi, C. Renggli, S. Zhang, T. Rekatsinas, T. Hoefler, G. Alonso:
Co-design Hardware and Algorithm for Vector Search
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023 (acceptance rate 23.9%, 90/376).
[39] P. Iff, B. Bruggmann, M. Besta, L. Benini, T. Hoefler:
RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures
arXiv:2311.06081. Nov. 2023.
[40] M. Chrapek, M. Khalilov, T. Hoefler:
HEAR: Homomorphically Encrypted Allreduce
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023 (acceptance rate 23.9%, 90/376). SC23 Best Student Paper, SC23 Reproducibility Advancement Award.
[41] P. Scheffler, F. Zaruba, F. Schuiki, T. Hoefler, L. Benini:
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra
IEEE Trans. Parallel Distrib. Syst. Vol 34, Nr. 12, pages 3147-3161, Oct. 2023.
[42] Y. Li, J. C. van Gemert, T. Hoefler, B. Moons, E. Eleftheriou, B. Verhoef:
Differentiable Transportation Pruning
2023 IEEE/CVF International Conference on Computer Vision (ICCV). Oct. 2023.
[43] M. Besta, R. Gerstenberger, E. Peter, M. Fischer, M. Podstawski, C. Barthels, G. Alonso, T. Hoefler:
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
ACM Comput. Surv. Vol 56, Nr. 2, Association for Computing Machinery, ISSN: 0360-0300, Sep. 2023.
[44] J. Bazinska, A. Ivanov, T. Ben-Nun, N. Dryden, M. Besta, S. Shen, T. Hoefler:
Cached Operator Reordering: A Unified View for Fast GNN Training
arXiv:2308.12093. Aug. 2023.
[45] T. Hoefler:
Towards smart(er) High-Performance Networking Driving Future Simulations
(Presentation) presented in Seattle, WA, USA, Aug. 2023. Invited talk at the MODSIM'23 workshop.
[46] T. Hoefler:
Scalable and Efficient AI: From Supercomputers to Smartphones
(Presentation) presented in Salt Lake City, UT, USA, Aug. 2023. Keynote talk at the 52nd International Conference on Parallel Processing.
[47] P. Bauer, P. D. Dueben, M. Chantry, F. Doblas-Reyes, T. Hoefler, A. McGovern, B. Stevens:
Deep learning and a changing economy in weather and climate prediction
Nature Reviews Earth and Environment. Vol 4, Nr. 1, pages 507-509, Aug. 2023.
[48] T. Hoefler, D. Roweth, K. Underwood, R. Alverson, M. Griswold, V. Tabatabaee, M. Kalkunte, S. Anubolu, S. Shen, M. McLaren, A. Kabbani, S. Scott:
Data Center Ethernet and Remote Direct Memory Access: Issues at Hyperscale
IEEE Computer. Vol 56, Nr. 7, pages 67-77, IEEE Computer Society, ISSN: 1521-9615, Jul. 2023. Cover Feature Technology Predictions.
[49] A. Ivanov, B. Rothenberger, A. Dethise, M. Canini, T. Hoefler, A. Perrig:
SAGE: Software-based Attestation for GPU Execution
In 2023 USENIX Annual Technical Conference (USENIX ATC 23), pages 485-499, USENIX Association, ISBN: 978-1-939133-35-9, Jul. 2023.
[50] P. Iff, M. Besta, M. Cavalcante, T. Fischer, L. Benini, T. Hoefler:
Sparse Hamming Graph: A Customizable Network-on-Chip Topology
In Proceedings of the 60th Annual Design Automation Conference, Jul. 2023.
[51] P. Iff, M. Besta, M. Cavalcante, T. Fischer, L. Benini, T. Hoefler:
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
In Proceedings of the 60th Annual Design Automation Conference, Jul. 2023.
[52] T. De Matteis, L. Gianinazzi, J. de Fine Licht, T. Hoefler:
Streaming Task Graph Scheduling for Dataflow Architectures
Jun. 2023.
[53] M. Copik, R. Böhringer, A. Calotoiu, T. Hoefler:
FMI: Fast and Cheap Message Passing for Serverless Functions
Jun. 2023.
[54] K. Lakhotia, K. Isham, L. Monroe, M. Besta, T. Hoefler, F. Petrini:
In-network Allreduce with Multiple Spanning Trees on PolarFly
In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'23), presented in Orlando, FL, USA, pages 165–176, Association for Computing Machinery, ISBN: 9781450395458, Jun. 2023.
[55] M. Besta, M. Fischer, V. Kalavri, M. Kapralov, T. Hoefler:
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems
IEEE Transactions on Parallel and Distributed Systems. Vol 34, Nr. 6, pages 1860-1876, IEEE, Jun. 2023.
[56] A. Ivanov, T. Schneider, L. Benini, T. Hoefler:
RIVETS: An Efficient Training and Inference Library for RISC-V with Snitch Extensions
In RISC-V Summit Europe, Jun. 2023.
[57] T. Hoefler:
Scalable and Efficient AI: From Supercomputers to Smartphones
(Presentation) presented in Orlando, FL, USA, Jun. 2023. Keynote talk at the 2023 Federated Computing Research Conference.
[58] T. Ben-Nun, L. Gianinazzi, T. Hoefler, Y. Oltchik:
Maximum Flows in Parametric Graph Templates
In Algorithms and Complexity - 13th International Conference, Jun. 2023.
[59] L. Truemper, T. Ben-Nun, P. Schaad, A. Calotoiu, T. Hoefler:
Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization
Jun. 2023.
[60] L. Huang, T. Hoefler:
Compressing multidimensional weather and climate data into neural networks
In The Eleventh International Conference on Learning Representations, May 2023. Notable Top 5% (Oral).
[61] T. Hoefler, B. Stevens, A. F. Prein, J. Baehr, T. Schulthess, T. F. Stocker, J. Taylor, D. Klocke, P. Manninen, P. M. Forster, T. Kölling, N. Gruber, H. Anzt, C. Frauen, F. Ziemen, M. Klöwer, K. Kashinath, C. Schär, O. Fuhrer, B. N. Lawrence:
Earth Virtualization Engines -- A Technical Perspective
Computing in Science and Engineering (CiSE). Vol 25, Nr. 3, IEEE Computer Society, ISSN: 1521-9615, May 2023.
[62] T. Hoefler, T. Häner, M. Troyer:
Disentangling hype from practicality: On realistically achieving quantum advantage
Communications of the ACM. Vol 66, Nr. 5, pages 82-87, ACM, May 2023.
[63] E. Frantar, S. Ashkboos, T. Hoefler, D. Alistarh:
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
In The Eleventh International Conference on Learning Representations, May 2023.
[64] M. Copik, K. Taranov, A. Calotoiu, T. Hoefler:
rFaaS: Enabling High Performance Serverless with RDMA and Leases
In Proceedings of the 37th IEEE International Parallel and Distributed Processing Symposium, May 2023.
[65] M. Copik, T. Hoefler:
High-Performance Serverless for HPC and Clouds
In 37th IEEE International Parallel & Distributed Processing Symposium (IPDPS), PhD Forum, May 2023.
[66] T. Ben-Nun, B. Ates, A. Calotoiu, T. Hoefler:
Bridging Control-Centric and Data-Centric Optimization
In 2023 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 173-185, Feb. 2023.
[67] S. Ashkboos, L. Huang, N. Dryden, T. Ben-Nun, P. Dueben, L. Gianinazzi, L. Kummer, T. Hoefler:
ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts
In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, presented in New Orleans, Louisiana, Dec. 2022.
[68] D. De Sensi, T. De Matteis, K. Taranov, S. Di Girolamo, T. Rahn, T. Hoefler:
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability
Proc. ACM Meas. Anal. Comput. Syst. Vol 6, Nr. 3, presented in New York, NY, USA, Association for Computing Machinery, Dec. 2022.
[69] M. Besta, P. Iff, F. Scheidl, K. Osawa, N. Dryden, M. Podstawski, T. Chen, T. Hoefler:
Neural Graph Databases
In Proceedings of the Learning on Graphs Conference (LOG'22), presented virtually, PMLR, Dec. 2022.
[70] N. Dryden, T. Hoefler:
Spatial Mixture-of-Experts
In Advances in Neural Information Processing Systems 35, presented in New Orleans, Louisiana, Dec. 2022.
[71] P. Schaad, T. Ben-Nun, T. Hoefler:
Boosting Performance Optimization with Interactive Data Movement Visualization
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), ISBN: 9784665454445, Nov. 2022.
[72] T. Ben-Nun, L. Groner, F. Deconinck, T. Wicky, E. Davis, J. Dahm, O. Elbert, R. George, J. McGibbon, L. Trümper, E. Wu, O. Fuhrer, T. Schulthess, T. Hoefler:
Productive Performance Engineering for Weather and Climate Modeling with Python
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), ISBN: 9784665454445, Nov. 2022.
[73] S. Di Girolamo, D. De Sensi, K. Taranov, M. Malesevic, M. Besta, T. Schneider, S. Kistler, T. Hoefler:
Building Blocks for Network-Accelerated Distributed File Systems
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), Nov. 2022. Best Paper Finalist.
[74] A. Nikolaos Ziogas, G. Kwasniewski, T. Ben-Nun, T. Schneider, T. Hoefler:
Deinsum: Practically I/O Optimal Multilinear Algebra
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), Nov. 2022.
[75] M. Besta, C. Miglioli, P. Sylos Labini, J. Tětek, P. Iff, R. Kanakagiri, S. Ashkboos, K. Janda, M. Podstawski, G. Kwasniewski, N. Gleinig, F. Vella, O. Mutlu, T. Hoefler:
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), Nov. 2022. SC22 Best Paper (1/82).
[76] K. Taranov, B. Rothenberger, D. De Sensi, A. Perrig, T. Hoefler:
NeVerMore: Exploiting RDMA Mistakes in NVMe-oF Storage Applications
In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS '22), Nov. 2022. Best Paper Honorable Mention.
[77] S. Li, K. Osawa, T. Hoefler:
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Nov. 2022. Best Paper Finalist.
[78] T. Hoefler, T. Bonato, D. De Sensi, S. Di Girolamo, S. Li, M. Heddes, J. Belk, D. Goel, M. Castro, S. Scott:
HammingMesh: A Network Topology for Large-Scale Deep Learning
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), Nov. 2022. SC22 Reproducibility Advancement Award; invited as a CACM Research Highlight.
[79] K. Lakhotia, M. Besta, L. Monroe, K. Isham, P. Iff, T. Hoefler, F. Petrini:
PolarFly: A Cost-Effective and Flexible Low-Diameter Topology
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), Nov. 2022.
[80] S. Cao, S. Di Girolamo, T. Hoefler:
Accelerating Data Serialization/Deserialization Protocols with In-Network Compute
In 2022 IEEE/ACM International Workshop on Exascale MPI (ExaMPI), Nov. 2022.
[81] M. E. Beverland, P. Murali, M. Troyer, K. M. Svore, T. Hoefler, V. Kliuchnikov, G. Hao Low, M. Soeken, A. Sundaram, A. Vaschillo:
Assessing requirements to scale to practical quantum advantage
Nov. 2022. Presented at the Quantum Information Processing (QIP) conference.
[82] C. Johnsen, T. De Matteis, T. Ben-Nun, J. de Fine Licht, T. Hoefler:
Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping
In 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD), Oct. 2022.
[83] M. Besta, R. Grob, C. Miglioli, N. Bernold, G. Kwasniewski, G. Gjini, R. Kanakagiri, S. Ashkboos, L. Gianinazzi, N. Dryden, T. Hoefler:
Motif Prediction with Graph Neural Networks
In Proceedings of the 28th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), presented in Washington DC, USA, pages 35–45, Association for Computing Machinery, ISBN: 9781450393850, Aug. 2022.
[84] T. Hoefler:
Benchmarking data science: Twelve ways to lie with statistics and performance on parallel computers
IEEE Computer. Vol 55, pages 49-56, Aug. 2022. Cover Feature Research Reproducibility.
[85] T. Hoefler, A. Hendel, D. Roweth:
The Convergence of Hyperscale Data Center and High-Performance Computing Networks
IEEE Computer. Vol 55, Nr. 7, pages 29-37, Jul. 2022. Cover Feature Technology Predictions.
[86] A. Calotoiu, T. Ben-Nun, G. Kwasniewski, J. de Fine Licht, T. Schneider, P. Schaad, T. Hoefler:
Lifting C Semantics for Dataflow Optimization
In Proceedings of the 2022 International Conference on Supercomputing (ICS'22), Jul. 2022.
[87] A. Ivanov, N. Dryden, T. Hoefler:
STen: An Interface for Efficient Sparsity in PyTorch
In Sparsity in Neural Networks workshop, Jul. 2022.
[88] O. Rausch, T. Ben-Nun, N. Dryden, A. Ivanov, S. Li, T. Hoefler:
A Data-Centric Optimization Framework for Machine Learning
In Proceedings of the 2022 International Conference on Supercomputing (ICS'22), Jul. 2022.
[89] L. Schmid, M. Copik, A. Calotoiu, D. Werle, A. Reiter, M. Selzer, A. Koziolek, T. Hoefler:
Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models
In Proceedings of the 2022 International Conference on Supercomputing (ICS'22), Jul. 2022.
[90] N. Gleinig, M. Besta, T. Hoefler:
I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication
In Proceedings of the 36th IEEE International Parallel and Distributed Processing Symposium (to appear), Jun. 2022.
[91] A. Lascu, A. F. Donaldson, T. Grosser, T. Hoefler:
Metamorphic Fuzzing of C++ Libraries
In IEEE International Conference on Software Testing, Verification and Validation, Jun. 2022.
[92] A. Strausz, F. Vella, S. Di Girolamo, M. Besta, T. Hoefler:
Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching
In Proceedings of the 36th IEEE International Parallel and Distributed Processing Symposium (to appear), Jun. 2022.
[93] K. Taranov, S. Byan, V. Marathe, T. Hoefler:
KafkaDirect: Zero-copy Data Access for Apache Kafka over RDMA Networks
In Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data, Jun. 2022.
[94] N. Gleinig, T. Hoefler:
The Red-Blue Pebble Game on Trees and DAGs with Large Input
In Structural Information and Communication Complexity - 29th International Colloquium, SIROCCO 2022, Proceedings (to appear), Jun. 2022.
[95] J. de Fine Licht, C. A. Pattison, A. Nikolaos Ziogas, D. Simmons-Duffin, T. Hoefler:
Fast Arbitrary Precision Floating Point on FPGA
In Proceedings of the 30th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM'22), May 2022.
[96] L. Gianinazzi, T. Ben-Nun, M. Besta, S. Ashkboos, Y. Baumann, P. Luczynski, T. Hoefler:
The spatial computer: A model for energy-efficient parallel computation
arXiv:2205.04934. May 2022.
[97] S. Li, T. Hoefler:
Near-Optimal Sparse Allreduce for Distributed Deep Learning
In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Apr. 2022.
[98] B. A. Plummer, N. Dryden, J. Frost, T. Hoefler, K. Saenko:
Neural Parameter Allocation Search
In Tenth International Conference on Learning Representations, Apr. 2022.
[99] M. Copik, T. Grosser, T. Hoefler, P. Bientinesi, B. Berkels:
Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration
IEEE Transactions on Parallel and Distributed Systems. Vol 33, Nr. 3, pages 523-535, IEEE, Mar. 2022.
[100] T. Hoefler, L. Groner, T. Ben-Nun, T. Wicky:
Weather and Climate Simulations in Python using GT4Py and DaCe
(Presentation) presented virtually, Jan. 2022.
[101] A. Cossettini, K. Taranov, C. Vogt, M. Magno, T. Hoefler, L. Benini:
A RDMA Interface for Ultra-Fast Ultrasound Data-Streaming over an Optical Link
In Proceedings of Design, Automation, and Test in Europe (DATE), 2022.
[102] N. Gleinig, T. Hoefler:
An Efficient Algorithm for Sparse Quantum State Preparation
In Proceedings of the 58th Annual Design Automation Conference, presented in San Francisco, CA, USA, ACM, Dec. 2021 (acceptance rate 23%).
[103] M. Copik, G. Kwasniewski, M. Besta, M. Podstawski, T. Hoefler:
SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing
In Proceedings of the 22nd International Middleware Conference, presented in Québec City, Canada, ACM, ISBN: 9781450385343, Dec. 2021.
[104] T. Häner, D. S. Steiger, T. Hoefler, M. Troyer:
Distributed Quantum Computing with QMPI
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), Nov. 2021 (acceptance rate 25.9%, 98/379).
[105] A. Pitchanathan, C. Ulmann, M. Weber, T. Hoefler, T. Grosser:
FPL: fast Presburger arithmetic through transprecision
OOPSLA '21: Proceedings of the ACM International Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM, Nov. 2021. OOPSLA Distinguished Paper Award (6/71).
[106] G. Kwasniewski, M. Kabić, T. Ben-Nun, A. Nikolaos Ziogas, J. Eirik Saethre, A. Gaillard, T. Schneider, M. Besta, A. Kozhevnikov, J. VandeVondele, T. Hoefler:
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), Nov. 2021 (acceptance rate 25.9%, 98/379).
[107] S. Li, T. Hoefler:
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), presented in St. Louis, Missouri, ACM, Nov. 2021 (acceptance rate 25.9%, 98/379). Best Paper Finalist.
[108] N. Dryden, R. Böhringer, T. Ben-Nun, T. Hoefler:
Clairvoyant Prefetching for Distributed Machine Learning I/O
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), presented in St. Louis, Missouri, ACM, Nov. 2021 (acceptance rate 25.9%, 98/379).
[109] A. Nikolaos Ziogas, T. Schneider, T. Ben-Nun, A. Calotoiu, T. De Matteis, J. de Fine Licht, L. Lavarini, T. Hoefler:
Productivity, Portability, Performance: Data-Centric Python
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), Nov. 2021 (acceptance rate 25.9%, 98/379).
[110] D. De Sensi, S. Di Girolamo, S. Ashkboos, S. Li, T. Hoefler:
Flare: Flexible In-Network Allreduce
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), presented in St. Louis, Missouri, ACM, Nov. 2021 (acceptance rate 25.9%, 98/379).
[111] M. Besta, R. Kanakagiri, G. Kwasniewski, R. Ausavarungnirun, J. Beránek, K. Kanellopoulos, K. Janda, Z. Vonarburg-Shmaria, L. Gianinazzi, I. Stefan, J. Gómez Luna, M. Copik, L. Kapp-Schwoerer, S. Di Girolamo, N. Blach, M. Konieczny, O. Mutlu, T. Hoefler:
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems
In Proceedings of the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2021.
[112] T. Hoefler, D. Alistarh, T. Ben-Nun, N. Dryden, A. Peste:
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Journal of Machine Learning Research. Vol 22, Nr. 241, pages 1-124, Sep. 2021.
[113] D. Ittah, T. Häner, V. Kliuchnikov, T. Hoefler:
QIRO: A Static Single Assignment-Based Quantum Program Representation for Optimization
In ACM Transactions on Quantum Computing, Association for Computing Machinery, ISSN: 2643-6809, Aug. 2021.
[114] M. Besta, Z. Vonarburg-Shmaria, Y. Schaffner, L. Schwarz, G. Kwasniewski, L. Gianinazzi, J. Beranek, K. Janda, T. Holenstein, S. Leisinger, P. Tatkowski, E. Ozdemir, A. Balla, M. Copik, P. Lindenberger, P. Kalvoda, M. Konieczny, O. Mutlu, T. Hoefler:
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
In Proceedings of the 47th International Conference on Very Large Data Bases (VLDB'21), Aug. 2021.
[115] K. Taranov, R. Bruno, G. Alonso, T. Hoefler:
Naos: Serialization-free RDMA networking in Java
In Proceedings of the 2021 USENIX Annual Technical Conference, USENIX, Jul. 2021 (acceptance rate 18.8%, 64/341).
[116] G. Kwasniewski, T. Ben-Nun, L. Gianinazzi, A. Calotoiu, T. Schneider, A. Nikolaos Ziogas, M. Besta, T. Hoefler: | ||
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs
In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'21), Jul. 2021, (acceptance rate 14.9%) |
[117] C. Cummins, Z. V. Fisches, T. Ben-Nun, T. Hoefler, M. O’Boyle, H. Leather: | ||
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations
In Thirty-eighth International Conference on Machine Learning, presented in Virtual, PMLR, Jul. 2021, (acceptance rate 21%) |
[118] M. Planeta, J. Bierbaum, L. Sahaya Daphne Antony, T. Hoefler, H. Härtig: | ||
MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications
In Proceedings of the 2021 USENIX Annual Technical Conference, USENIX, Jul. 2021, (acceptance rate 18.8%, 64/341) |
[119] L. Gianinazzi, M. Besta, Y. Schaffner, T. Hoefler: | ||
Parallel Algorithms for Finding Large Cliques in Sparse Graphs
In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'21), ACM, Jul. 2021, |
[120] L. Gianinazzi, M. Fries, N. Dryden, T. Ben-Nun, M. Besta, T. Hoefler: | ||
Learning Combinatorial Node Labeling Algorithms
arXiv:2106.03594. Jun. 2021, |
[121] S. Di Girolamo, A. Kurth, A. Calotoiu, T. Benz, T. Schneider, J. Beránek, L. Benini, T. Hoefler: | ||
A RISC-V in-network accelerator for flexible high-performance low-power packet processing
In Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA'21), Jun. 2021, |
[122] A. Nikolaos Ziogas, T. Ben-Nun, T. Schneider, T. Hoefler: | ||
NPBench: A Benchmarking Suite for High-Performance NumPy
In Proceedings of the 2021 International Conference on Supercomputing (ICS'21), Jun. 2021, |
[123] K. Taranov, S. Di Girolamo, T. Hoefler: | ||
CoRM: Compactable Remote Memory over RDMA
In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data, Jun. 2021, |
[124] M. Besta, M. Schneider, S. Di Girolamo, A. Singla, T. Hoefler: | ||
Towards Million-Server Network Simulations on Just a Laptop
May 2021, |
[125] J. de Fine Licht, M. Besta, S. Meierhans, T. Hoefler: | ||
Transformations of High-Level Synthesis Codes for High-Performance Computing
IEEE Transactions on Parallel and Distributed Systems. Vol 32, Nr. 5, pages 1014-1029, IEEE, May 2021, |
[126] M. Ritter, A. Geiss, J. Wehrstein, A. Calotoiu, T. Reimann, T. Hoefler, F. Wolf: | ||
Noise-Resilient Empirical Performance Modeling with Deep Neural Networks
In IPDPS '21: Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, May 2021, |
[127] M. Besta, J. Domke, M. Schneider, M. Konieczny, S. Di Girolamo, T. Schneider, A. Singla, T. Hoefler: | ||
High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks
IEEE Transactions on Parallel and Distributed Systems. Vol 32, Nr. 4, pages 943-959, IEEE, Apr. 2021, |
[128] A. Ivanov, N. Dryden, T. Ben-Nun, S. Li, T. Hoefler: | ||
Data Movement Is All You Need: A Case Study on Optimizing Transformers
In Proceedings of Machine Learning and Systems 3 (MLSys 2021), Apr. 2021, (acceptance rate: 23.5% (52/221)) Outstanding Paper Award (5/52) |
[129] P. Bauer, P. D. Dueben, T. Hoefler, T. Quintino, T. C. Schulthess, N. P. Wedi: | ||
The digital revolution of Earth-system science
Nature Computational Science. Vol 1, Nr. 1, pages 104-113, Feb. 2021, |
[130] M. Copik, A. Calotoiu, T. Grosser, N. Wicki, F. Wolf, T. Hoefler: | ||
Extracting Clean Performance Models from Tainted Programs
In PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb. 2021, (acceptance rate: 21% (31/150)) |
[131] P. Grönquist, C. Yao, T. Ben-Nun, N. Dryden, P. Dueben, S. Li, T. Hoefler: | ||
Deep Learning for Post-Processing Ensemble Weather Forecasts
Philosophical Transactions of the Royal Society A. Vol 379, Nr. 2194, The Royal Society, Feb. 2021, |
[132] S. Li, T. Ben-Nun, G. Nadiradze, S. Di Girolamo, N. Dryden, D. Alistarh, T. Hoefler: | ||
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
IEEE Transactions on Parallel and Distributed Systems. Vol 32, Nr. 7, pages 1725-1739, IEEE, 2021, |
[133] T. Gysi, C. Müller, O. Zinenko, S. Herhut, E. Davis, T. Wicky, O. Fuhrer, T. Hoefler, T. Grosser: | ||
Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-Accelerated Climate Simulation
ACM Trans. Archit. Code Optim. Vol 18, Nr. 4, Association for Computing Machinery, ISSN: 1544-3566, 2021, |
[134] P. Scheffler, F. Zaruba, F. Schuiki, T. Hoefler, L. Benini: | ||
Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra
In Proceedings of Design, Automation, and Test in Europe (DATE), 2021, |
[135] B. Rothenberger, K. Taranov, A. Perrig, T. Hoefler: | ||
ReDMArk: Bypassing RDMA Security Mechanisms
In Proceedings of the 2021 USENIX Security Symposium, USENIX, 2021, |
[136] J. de Fine Licht, A. Kuster, T. De Matteis, T. Ben-Nun, D. Hofer, T. Hoefler: | ||
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems
In Proceedings of the 19th ACM/IEEE International Symposium on Code Generation and Optimization (CGO'21), 2021, |
[137] T. Hoefler: | ||
A Data-Centric Approach to Performance Portability
(Presentation) presented in Virtual, Dec. 2020, Keynote talk at the Computing Insight UK 2020 Conference (CIUK'20) |
[138] T. De Matteis, J. de Fine Licht, T. Hoefler: | ||
FBLAS: Streaming Linear Algebra on FPGA
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), IEEE Press, ISBN: 9781728199986, Nov. 2020, (acceptance rate: 25.1% (95/378)) |
[139] Y. Jin, H. Wang, T. Yu, X. Tang, T. Hoefler, X. Liu, J. Zhai: | ||
SCALANA: Automating Scaling Loss Detection with Graph Analysis
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), Nov. 2020, (acceptance rate 25.1% (95/378)) |
[140] D. De Sensi, S. Di Girolamo, K. H. McMahon, D. Roweth, T. Hoefler: | ||
An In-Depth Analysis of the Slingshot Interconnect
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), Nov. 2020, (acceptance rate: 25.1% (95/378)) |
[141] T. Hoefler: | ||
Scientific Benchmarking of Parallel Computing Systems
(Presentation) presented in virtual, Nov. 2020, Keynote talk at the 2020 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench'20) |
[142] M. Besta, M. Schneider, M. Konieczny, K. Cynk, E. Henriksson, S. Di Girolamo, A. Singla, T. Hoefler: | ||
FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), Nov. 2020, (acceptance rate: 25.1% (95/378)) |
[143] A. Calotoiu, M. Geisenhofer, F. Kummer, M. Ritter, J. Weber, T. Hoefler, M. Oberlack, F. Wolf: | ||
Empirical Modeling of Spatially Diverging Performance
In 2020 IEEE/ACM International Workshop on HPC User Support Tools (HUST) and Workshop on Programming and Performance Visualization, Nov. 2020, |
[144] A. Ali Khan, H. Mewes, T. Grosser, T. Hoefler, J. Castrillon: | ||
Polyhedral Compilation for Racetrack Memories
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Vol 39, Nr. 11, IEEE, Nov. 2020, |
[145] T. Grosser, T. Theodoridis, M. Falkenstein, A. Pitchanathan, M. Kruse, M. Rigger, Z. Su, T. Hoefler: | ||
Fast Linear Programming through Transprecision Computing on Small and Sparse Data
OOPSLA '20: Proceedings of the ACM international conference on Object oriented programming systems languages and applications. ACM, Nov. 2020, |
[146] T. Häner, M. Troyer, T. Hoefler: | ||
Assertion-based optimization of quantum programs
OOPSLA '20: Proceedings of the ACM international conference on Object oriented programming systems languages and applications. ACM, Nov. 2020, |
[147] M. Besta, A. Carigiet, K. Janda, Z. Vonarburg-Shmaria, L. Gianinazzi, T. Hoefler: | ||
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), Nov. 2020, (acceptance rate: 25.1% (95/378)) |
[148] T. Hoefler: | ||
General in-network processing - time is ripe!
(Presentation) presented in hybrid/virtual, Oct. 2020, Keynote talk at the High-performance Interconnects Forum (in conjunction with HPC China 2020) |
[149] G. Kwasniewski, T. Ben-Nun, A. Nikolaos Ziogas, T. Schneider, M. Besta, T. Hoefler: | ||
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
arXiv:2010.05975. Oct. 2020, |
[150] L. Chelini, T. Gysi, T. Grosser, M. Kong, H. Corporaal: | ||
Automatic Generation of Multi-Objective Polyhedral Compiler Transformations
In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, presented in Virtual, ACM, Oct. 2020, |
[151] T. Hoefler: | ||
High-performance distributed memory systems – from supercomputers to data centers
(Presentation) presented in virtual, Oct. 2020, Keynote talk at the 2020 International Symposium on DIStributed Computing (DISC) |
[152] C. Barthels, I. Müller, K. Taranov, T. Hoefler, G. Alonso: | ||
Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores
In Proceedings of the VLDB Endowment, Vol. 12, No. 13, VLDB Endowment, Sep. 2020, |
[153] F. Zaruba, F. Schuiki, T. Hoefler, L. Benini: | ||
Snitch: A tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
IEEE Transactions on Computers (TOC). IEEE, Sep. 2020, Featured Paper in November 2021 issue |
[154] A. Nigay, L. Mosimann, T. Schneider, T. Hoefler: | ||
Communication and Timing Issues with MPI Virtualization
In 27th European MPI Users' Group Meeting, presented in Austin, TX, USA, pages 11–20, Association for Computing Machinery, ISBN: 9781450388801, Sep. 2020, |
[155] K. Taranov, B. Rothenberger, A. Perrig, T. Hoefler: | ||
sRDMA -- Efficient NIC-based Authentication and Encryption for Remote Direct Memory Access
In Proceedings of the 2020 USENIX Annual Technical Conference, USENIX, Jul. 2020, (acceptance rate 18.6%, 65/348) |
[156] L. Gianinazzi, T. Hoefler: | ||
Parallel Planar Subgraph Isomorphism and Vertex Connectivity
In Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'20), ACM, Jul. 2020, Best Paper Finalist (5/68) |
[157] T. Hoefler: | ||
Deep Learning for Post-Processing Ensemble Weather Forecasts
(Presentation) presented in virtual, Jun. 2020, Invited talk at the 2020 ESIWACE Workshop |
[158] T. Hoefler: | ||
High-Performance Communication in Machine Learning
(Presentation) presented in virtual, Jun. 2020, Keynote talk at the 2020 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS 2020) |
[159] E. Hoffer, T. Ben-Nun, I. Hubara, N. Giladi, T. Hoefler, D. Soudry: | ||
Increasing batch size through instance repetition improves generalization
In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, |
[160] A. Kurth, S. Riedel, F. Zaruba, T. Hoefler, L. Benini: | ||
ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor
In Proceedings of the 57th Annual Design Automation Conference, ACM, Jun. 2020, Best Paper Finalist (6/228) |
[161] M. Besta, R. Kanakagiri, H. Mustafa, M. Karasikov, G. Rätsch, T. Hoefler, E. Solomonik: | ||
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
In Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium, May 2020, |
[162] M. Ritter, A. Calotoiu, T. Reimann, T. Hoefler, F. Wolf: | ||
Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling
presented in New Orleans, LA, USA, IEEE, May 2020, The 34th IEEE International Parallel & Distributed Processing Symposium (IPDPS'20) |
[163] C. Osuna, T. Wicky, F. Thuering, T. Hoefler, O. Fuhrer: | ||
Dawn: a High Level Domain-Specific Language Compiler Toolchain for Weather and Climate Applications
Supercomputing Frontiers and Innovation. Vol 7, Nr. 2, May 2020, |
[164] F. Schuiki, F. Zaruba, T. Hoefler, L. Benini: | ||
Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores
IEEE Transactions on Computers (TOC). IEEE, Apr. 2020, |
[165] S. Li, T. Ben-Nun, S. Di Girolamo, D. Alistarh, T. Hoefler: | ||
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
In Proceedings of the 25th Symposium on Principles and Practice of Parallel Programming (PPoPP'20), Feb. 2020, (acceptance rate: 23.1% (28/121)) Best Paper Nomination (5/28) |
[166] J. de Fine Licht, G. Kwasniewski, T. Hoefler: | ||
Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis
In Proceedings of the 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2020, |
[167] M. Besta, M. Fischer, T. Ben-Nun, D. Stanojevic, J. de Fine Licht, T. Hoefler: | ||
Substream-Centric Maximum Matchings on FPGA
ACM Trans. Reconfig. Technol. Syst. Special Issue, Jan. 2020, Invited Paper |
[168] A. Calotoiu, M. Copik, T. Hoefler, M. Ritter, S. Shudler, F. Wolf: | ||
ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications
Springer. In Software for Exascale Computing - SPPEXA 2016-2019, pages 453–482, 2020, |
[169] C. Schär, O. Fuhrer, A. Arteaga, N. Ban, C. Charpilloz, S. Di Girolamo, L. Hentgen, T. Hoefler, X. Lapillonne, D. Leutwyler, K. Osterried, D. Panosetti, S. Rüdisühli, L. Schlemmer, T. Schulthess, M. Sprenger, S. Ubbiali, H. Wernli: | ||
Kilometer-scale climate models: Prospects and challenges
Bulletin of the American Meteorological Society. Vol 100, Nr. 12, American Meteorological Society, Dec. 2019, Early Online Release |
[170] P. Grönquist, T. Ben-Nun, N. Dryden, P. Dueben, L. Lavarini, S. Li, T. Hoefler: | ||
Predicting Weather Uncertainty with Deep Convnets
In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS), presented in Vancouver, BC, Canada, Dec. 2019, |
[171] C. Renggli, D. Alistarh, M. Aghagolzadeh, T. Hoefler: | ||
SparCML: High-Performance Sparse Communication for Machine Learning
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) |
[172] A. Nikolaos Ziogas, T. Ben-Nun, G. Indalecio Fernández, T. Schneider, M. Luisier, T. Hoefler: | ||
Optimizing the Data Movement in Quantum Transport Simulations via Data-Centric Parallel Programming
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) |
[173] M. Besta, S. Weber, L. Gianinazzi, R. Gerstenberger, A. Ivanov, Y. Oltchik, T. Hoefler: | ||
Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and Analytics
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) Best Paper Finalist, Best Student Paper Finalist |
[174] A. Nikolaos Ziogas, T. Ben-Nun, G. Indalecio Fernández, T. Schneider, M. Luisier, T. Hoefler: | ||
A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, Won ACM Gordon Bell Prize |
[175] S. Di Girolamo, K. Taranov, A. Kurth, M. Schaffner, T. Schneider, J. Beránek, M. Besta, L. Benini, D. Roweth, T. Hoefler: | ||
Network-Accelerated Non-Contiguous Memory Transfers
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) |
[176] D. De Sensi, S. Di Girolamo, T. Hoefler: | ||
Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) |
[177] T. Ben-Nun, J. de Fine Licht, A. Nikolaos Ziogas, T. Schneider, T. Hoefler: | ||
Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) |
[178] T. De Matteis, J. de Fine Licht, J. Beránek, T. Hoefler: | ||
Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) |
[179] G. Kwasniewski, M. Kabić, M. Besta, J. VandeVondele, R. Solcà, T. Hoefler: | ||
Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix-Matrix Multiplication
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344)) Best Paper Finalist, SC19 Best Student Paper (1/87) |
[180] J. de Fine Licht, T. Hoefler: | ||
hlslib: Software Engineering for Hardware Design
In Fifth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'19), presented in Denver, CO, United States, IEEE, Nov. 2019, |
[181] T. Hoefler: | ||
HPC for ML and ML for HPC - Scalability, Communication, and Programming
(Presentation) presented in Denver, CO, USA, Nov. 2019, Keynote talk at the International Workshop on Machine Learning in High-Performance Computing (MLHPC'19, in conjunction with ACM/IEEE Supercomputing, SC19) |
[182] T. Hoefler: | ||
Data-Centric Parallel Programming
(Presentation) presented in Prague, Czech Republic, Sep. 2019, Keynote talk at the 18th International Parallel Computing Conference (ParCo'19) |
[183] T. Hoefler: | ||
High-Performance Communication in Machine Learning
(Presentation) presented in Bialystok, Poland, Sep. 2019, Keynote talk at the 13th International Conference on Parallel Processing and Applied Mathematics (PPAM'19) |
[184] T. Gysi, T. Grosser, T. Hoefler: | ||
Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot
In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), presented in Seattle, WA, USA, IEEE, Sep. 2019, |
[185] T. Ben-Nun, T. Hoefler: | ||
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
ACM Comput. Surv. Vol 52, Nr. 4, pages 65:1-65:43, ACM, ISSN: 0360-0300, Aug. 2019, |
[186] S. Shudler, Y. Berens, A. Calotoiu, T. Hoefler, A. Strube, F. Wolf: | ||
Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations
IEEE Transactions on Parallel and Distributed Systems. Vol 30, Nr. 8, pages 1768-1785, IEEE, Aug. 2019, |
[187] E. Hoffer, B. Weinstein, I. Hubara, T. Ben-Nun, T. Hoefler, D. Soudry: | ||
Mix & match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Aug. 2019, |
[188] M. Besta, T. Hoefler: | ||
Towards high-performance processing, storage, and analytics of extreme-scale graphs
(Presentation) presented in Valencia, Spain, Jun. 2019, Invited talk at the 2019 International Congress on Industrial and Applied Mathematics (ICIAM'19) |
[189] T. Hoefler: | ||
Performance Reproducibility in HPC and Deep Learning
(Presentation) presented in Frankfurt, Germany, Jun. 2019, Keynote talk at the Numerical Reproducibility at Exascale Workshop (NRE2019), ISC’19 |
[190] T. Hoefler, T. Ben-Nun: | ||
Optimizing and Benchmarking Large-Scale Deep Learning
(Presentation) presented in Frankfurt, Germany, Jun. 2019, Invited talk at the Machine Learning day at the International Conference on Supercomputing (ISC'19) |
[191] T. Hoefler: | ||
The Green Graph500 List (June 2019)
(Presentation) presented in Frankfurt, Germany, Jun. 2019, Presented at the Green Graph 500 BoF at the International Conference on Supercomputing (ISC'19) |
[192] T. Hoefler, A. Nikolaos Ziogas, T. Ben-Nun, G. Indalecio Fernández, T. Schneider, M. Luisier, J. de Fine Licht: | ||
Data-Centric Parallel Programming
(Presentation) presented in Frankfurt, Germany, Jun. 2019, Invited talk at the International Conference on Supercomputing (ISC'19) |
[193] P. R. Eller, T. Hoefler, W. Gropp: | ||
Using Performance Models to Understand Scalable Krylov Solver Performance at Scale for Structured Grid Problems
In Proceedings of the 2019 ACM International Conference on Supercomputing (ICS'19), presented in Phoenix, AZ, ACM, Jun. 2019, |
[194] T. Gysi, T. Grosser, L. Brandner, T. Hoefler: | ||
A Fast Analytical Model of Fully Associative Caches
In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, presented in Phoenix, AZ, USA, pages 816-829, ACM, ISBN: 978-1-4503-6712-7, Jun. 2019, |
[195] N. Gleinig, F. Ann Hubis, T. Hoefler: | ||
Embedding Functions Into Reversible Circuits: A Probabilistic Approach to the Number of Lines
In Proceedings of the 56th Annual Design Automation Conference, presented in Las Vegas, NV, USA, ACM, ISBN: 978-1-4503-6725-7/19/06, Jun. 2019, |
[196] F. Thaler, S. Moosbrugger, C. Osuna, M. Bianco, H. Vogt, A. Afanasyev, L. Mosimann, O. Fuhrer, T. Schulthess, T. Hoefler: | ||
Porting the COSMO Weather Model to Intel KNL
presented in Zurich, Switzerland, ACM, Jun. 2019, Accepted at the ACM Platform for Advanced Scientific Computing Conference (PASC19) |
[197] S. Di Girolamo, P. Schmid, T. Schulthess, T. Hoefler: | ||
SimFS: A Simulation Data Virtualizing File System Interface
In Proceedings of the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), presented in Rio de Janeiro, Brazil, IEEE, May 2019, |
[198] T. Hoefler: | ||
Performance Portability with Data-Centric Parallel Programming
(Presentation) presented in Rio de Janeiro, Brazil, May 2019, Keynote talk at the Ninth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) (delayed online) |
[199] T. Ben-Nun, M. Besta, S. Huber, A. Nikolaos Ziogas, D. Peter, T. Hoefler: | ||
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
IEEE, May 2019, Accepted at the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19) |
[200] T. Hoefler: | ||
High-Performance Communication for Machine Learning
(Presentation) presented in Huddersfield, UK, Apr. 2019, Keynote talk at the 5th Conference on Emerging Technologies – EMiT2019 |
[201] T. Hoefler: | ||
RDMA, Scalable MPI-3 RMA, and Next-Generation Post-RDMA Interconnects
(Presentation) Apr. 2019, Best talk award winner at Swiss HPC Advisory Council Conference 2019 |
[202] T. Hoefler: | ||
Extreme-Scale Graphs
(Presentation) presented in Warsaw, Poland, Mar. 2019, Invited talk at Supercomputing Frontiers Europe 2019 |
[203] M. Besta, M. Fischer, T. Ben-Nun, J. de Fine Licht, T. Hoefler: | ||
Substream-Centric Maximum Matchings on FPGA
In Proceedings of the 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2019, (acceptance rate: 23%) Best Paper Finalist (4/30) |
[204] M. Kuettler, M. Planeta, J. Bierbaum, C. Weinhold, H. Haertig, A. Barak, T. Hoefler: | ||
Corrected Trees for Reliable Group Communication
Feb. 2019, Accepted at the ACM Symposium on Principles and Practice of Parallel Programming 2019 (PPoPP'19) (acceptance rate: 19% (29/152)) |
[205] T. Hoefler: | ||
High-Performance Communication in Machine Learning
(Presentation) presented in Knoxville, TN, Feb. 2019, |
[206] M. Besta, D. Stanojevic, J. de Fine Licht, T. Ben-Nun, T. Hoefler: | ||
Graph Processing on FPGAs: Taxonomy, Survey, Challenges
CoRR. Vol abs/1903.06697, Feb. 2019, |
[207] A. Nigay, T. Schneider, T. Hoefler: | ||
TinyMPI tasking prototype
Feb. 2019, |
[208] T. Hoefler: | ||
High-Performance Communication in Machine Learning
(Presentation) presented in Grundlsee, Austria, Feb. 2019, Keynote at the Austrian HPC meeting 2019 |
[209] T. Hoefler: | ||
An HPC Systems Guy’s View of Quantum Computing
(Presentation) presented in Darmstadt, Germany, Jan. 2019, |
[210] T. Hoefler: | ||
High-Performance Communication for Machine Learning
(Presentation) presented in Aachen, Germany, Jan. 2019, |
[211] T. Schulthess, P. Bauer, O. Fuhrer, T. Hoefler, C. Schaer, N. Wedi: | ||
Reflecting on the goal and baseline for exascale computing: a roadmap based on weather and climate simulations
Computing in Science and Engineering (CiSE). Vol 21, Nr. 1, IEEE Computer Society, ISSN: 1521-9615, Jan. 2019, |
[212] T. Hoefler: | ||
MPI Remote Memory Access Programming and Scientific Benchmarking of Parallel Codes
(Presentation) presented in Aachen, Germany, Jan. 2019, |
[213] T. Ben-Nun, A. Shoshana Jakobovits, T. Hoefler: | ||
Neural Code Comprehension: A Learnable Representation of Code Semantics
In Advances in Neural Information Processing Systems 31, presented in Montreal, Canada, pages 3589-3601, Curran Associates, Inc., Dec. 2018, |
[214] D. Alistarh, T. Hoefler, M. Johansson, S. Khirirat, N. Konstantinov, C. Renggli: | ||
The Convergence of Sparsified Gradient Methods
In Advances in Neural Information Processing Systems 31, presented in Montreal, Canada, Curran Associates, Inc., Dec. 2018, |
[215] M. Besta, D. Stanojevic, T. Zivic, J. Singh, M. Hoerold, T. Hoefler: | ||
Log(Graph): A Near-Optimal High-Performance Graph Representation
presented in Limassol, Cyprus, ACM, Nov. 2018, Accepted at the 27th International Conference on Parallel Architectures and Compilation (PACT'18) |
[216] T. Hoefler: | ||
Twelve ways to fool the masses when reporting performance of deep learning workloads
(Presentation) presented in Los Angeles, CA, Nov. 2018, Workshop III: HPC for Computationally and Data-Intensive Problems |
[217] H. Lin, X. Zhu, B. Yu, X. Tang, W. Xue, W. Chen, L. Zhang, T. Hoefler, X. Ma, X. Liu, W. Zheng, J. Xu: | ||
ShenTu: Processing Multi-Trillion Edge Graphs on Millions of Cores in Seconds
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC18), presented in Dallas, TX, USA, ACM, Nov. 2018, Gordon Bell Award Finalist |
[218] T. Hoefler: | ||
RDMA, Scalable MPI-3 RMA, and Next-Generation Post-RDMA Interconnects
(Presentation) presented in Dallas, TX, USA, Nov. 2018, Keynote at ExaMPI 2018 Workshop (in conjunction with SC18) |
[219] T. Hoefler: | ||
High-Performance Communication for Machine Learning
(Presentation) presented in Los Angeles, CA, Nov. 2018, Workshop III: HPC for Computationally and Data-Intensive Problems |
[220] T. Hoefler: | ||
High Level Programming Languages for Quantum Computation
(Presentation) presented in Dallas, TX, USA, Nov. 2018, |
[221] T. Hoefler: | ||
Will FPGAs make it this time?
(Presentation) presented in Dallas, TX, USA, Nov. 2018, |
[222] T. Hoefler: | ||
Deep500: An HPC Deep Learning Benchmark and Competition
(Presentation) presented in Dallas, TX, USA, Nov. 2018, |
[223] R. Gerstenberger, M. Besta, T. Hoefler: | ||
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
In Communications of the ACM, ACM, Oct. 2018, Research Highlights |
[224] A. Calotoiu, A. Graf, T. Hoefler, D. Lorenz, S. Rinke, F. Wolf: | ||
Lightweight Requirements Engineering for Exascale Co-design
In IEEE International Conference on Cluster Computing (CLUSTER 2018), Belfast, UK, September 10-13, 2018, IEEE, ISBN: 978-1-5386-8319-4, Sep. 2018, (acceptance rate: 28% (44/154)) |
[225] Y. Oyama, T. Ben-Nun, T. Hoefler, S. Matsuoka: | ||
Accelerating Deep Learning Frameworks with Micro-batches
In IEEE International Conference on Cluster Computing (CLUSTER 2018), Belfast, UK, September 10-13, 2018, IEEE, ISBN: 978-1-5386-8319-4, Sep. 2018, (acceptance rate: 28% (44/154)) |
[226] T. Hoefler: | ||
An HPC System's guy's view of Quantum Computing
(Presentation) presented in Redmond, WA, Aug. 2018, Presentation at the Microsoft Faculty Summit 2018 |
[227] T. Hoefler: | ||
Performance Modeling for Future Computing Technologies
(Presentation) Jun. 2018, Invited talk at 60 years of CS @ Tsinghua celebration |
[228] M. Besta, T. Hoefler: | ||
Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations
CoRR. Vol abs/1806.01799, Jun. 2018, |
[229] O. Fuhrer, T. Chadha, T. Hoefler, G. Kwasniewski, X. Lapillonne, D. Leutwyler, D. Luethi, C. Osuna, C. Schaer, T. Schulthess, H. Vogt: | ||
Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0
Geoscientific Model Development. Vol 11, Nr. 4, Copernicus Publications, May 2018, |
[230] T. Hoefler: | ||
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
(Presentation) Apr. 2018, Keynote at Swiss HPC Advisory Council Conference 2018 |
[231] K. Taranov, G. Alonso, T. Hoefler: | ||
Fast and strongly-consistent per-item resilience in key-value stores
In EuroSys '18: Thirteenth EuroSys Conference 2018, April 23-26, 2018, Porto, Portugal, ISBN: 978-1-4503-5584-1/18/04, Apr. 2018, (acceptance rate: 16% (43/262)) |
[232] S. Li, Y. Zhang, T. Hoefler: | ||
Cache-Oblivious MPI All-to-All Communications Based on Morton Order
IEEE Transactions on Parallel and Distributed Systems. Vol 29, Nr. 3, pages 542-555, IEEE, Mar. 2018, |
[233] M. Besta, S. M. Hassan, S. Yalamanchili, R. Ausavarungnirun, O. Mutlu, T. Hoefler: | ||
Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability
Mar. 2018, Accepted at the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'18) |
[234] M. Besta, E. Henriksson, T. Hoefler: | ||
Lowering Diameter Enables Cost-Effective and High-Performance Networks
(Presentation) presented in Williamsburg, USA, Mar. 2018, Presentation at the 2018 Warehouse-scale Memory Systems (WAMS) Workshop |
[235] T. Hoefler: | ||
Performance Portability - An Oxymoron?
(Presentation) presented in Kona, HI, USA, Mar. 2018, Invited talk at SOS'18 Workshop |
[236] C. Baumann, A. Marian Dan, Y. Meshman, T. Hoefler, M. Vechev: | ||
Automatic Verification of RMA Programs via Abstraction Extrapolation
Springer International Publishing, Feb. 2018, |
[237] I. Mueller, A. Arteaga, T. Hoefler, G. Alonso: | ||
Reproducible Floating-Point Aggregation in RDBMSs
In Proceedings of the 2018 IEEE 34th International Conference on Data Engineering, Feb. 2018, |
[238] J. de Fine Licht, M. Blott, T. Hoefler: | ||
Designing scalable FPGA architectures using high-level synthesis
In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, presented in Vienna, Austria, pages 403-404, ACM, ISBN: 978-1-4503-4982-6, Feb. 2018, |
[239] T. Hoefler: | ||
The three L's in modern high-performance networking: low latency, low cost, low processing load
(Presentation) presented in Vienna, Austria, Feb. 2018, Keynote at the HiPINEB workshop at HPCA'18 |
[240] T. Hoefler: | ||
Developing high-performance software, from modeling to programming
(Presentation) presented in Nuremberg, Germany, Feb. 2018, Invited opening presentation at the Multicore@Siemens conference |
[241] L. Gianinazzi, P. Kalvoda, A. De Palma, M. Besta, T. Hoefler: | ||
Communication-Avoiding Parallel Minimum Cuts and Connected Components
In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, presented in Vienna, Austria, pages 219-232, ACM, ISBN: 978-1-4503-4982-6, Feb. 2018, (acceptance rate: 20% (28/138)) |
[242] T. Hoefler, S. Ramos, C. Osuna, F. Thaler, S. Moosbrugger, O. Fuhrer: | ||
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL and the COSMO Weather Code
(Presentation) presented in Denver, CO, Nov. 2017, Presentation at the Intel HPC Developer's Conference 2017 |
[243] T. Hoefler, S. Di Girolamo, K. Taranov, R. E. Grant, R. Brightwell: | ||
sPIN: High-performance streaming Processing in the Network
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC17), Nov. 2017, (acceptance rate: 18% (61/327)) Best Paper Finalist at SC17 (5/61) |
[244] E. Solomonik, M. Besta, F. Vella, T. Hoefler: | ||
Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication
Nov. 2017, Accepted at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17) (acceptance rate: 18% (61/327)) |
[245] D. Unat, A. Dubey, T. Hoefler, J. Shalf, M. Abraham, M. Bianco, B. L. Chamberlain, R. Cledat, H. Carter Edwards, H. Finkel, K. Fuerlinger, F. Hannig, E. Jeannot, A. Kamil, J. Keasler, P. H. J. Kelly, V. Leung, H. Ltaief, N. Maruyama, C. J. Newburn, M. Pericas: | ||
Trends in Data Locality Abstractions for HPC Systems
IEEE Transactions on Parallel and Distributed Systems. Vol 28, Nr. 10, pages 3007-3020, IEEE, Oct. 2017, |
[246] T. Hoefler, S. Ramos, T. Ben-Nun: | ||
HPC Performance Optimization Advances at Extreme Scale
(Presentation) presented in Hefei, China, Oct. 2017, Invited talk at the Co-Design workshop (HPC China 2017) |
[247] T. Hoefler: | ||
A View on MPI's Recent Past, Present, and Future
(Presentation) presented in Chicago, IL, Sep. 2017, Invited talk at 25 Years of MPI Symposium |
[248] P. Yebenes, J. Escudero-Sahuquillo, P. J. Garcia, F. J. Quiles, T. Hoefler: | ||
Improving Non-Minimal and Adaptive Routing Algorithms in Slim Fly Networks
In Proceedings of the 25th Annual Symposium on High-Performance Interconnects (HOTI'17), Aug. 2017, Best Student Paper at HOTI'17 |
[249] T. Schneider, J. Dinan, M. Flajslik, K. D. Underwood, T. Hoefler: | ||
Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches
In Proceedings of the 25th Annual Symposium on High-Performance Interconnects (HOTI'17), Aug. 2017, |
[250] C. Barthels, T. Schneider, I. Mueller, G. Alonso, T. Hoefler: | ||
Distributed Join Algorithms on Thousands of Cores
Vol 10, Nr. 5, In Proc. VLDB Endow., presented in Munich, Germany, pages 517--528, VLDB Endowment, ISSN: 2150-8097, Aug. 2017, |
[251] E. Solomonik, G. Ballard, J. Demmel, T. Hoefler: | ||
A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem
Nr. 11, In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'17), presented in Washington, DC, USA, pages 111--121, ACM, ISBN: 978-1-4503-4593-4, Jun. 2017, |
[252] A. Arteaga, O. Fuhrer, T. Hoefler, T. Schulthess: | ||
Model-Driven Choice of Numerical Methods for the Solution of the Linear Advection Equation
In Proceedings of the International Conference on Computational Science (ICCS'17), presented in Zurich, Switzerland, Elsevier, Jun. 2017, |
[253] T. Hoefler: | ||
Progress in automatic GPU compilation and why you want to run MPI on your GPU.
(Presentation) presented in Orlando, FL, Jun. 2017, Invited talk at IPDRM Workshop (IPDPS'17) |
[255] T. Hoefler: | ||
Scientific Benchmarking of Parallel Computing Systems
(Presentation) presented in Orlando, FL, Jun. 2017, Keynote talk at EMBRACE Workshop (IPDPS'17) |
[256] M. Besta, M. Podstawski, L. Groner, E. Solomonik, T. Hoefler: | ||
To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations
In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'17), presented in Washington, DC, USA, ACM, Jun. 2017, (acceptance rate: 19%) |
[257] M. Poke, T. Hoefler, C. W. Glass: | ||
AllConcur: Leaderless Concurrent Atomic Broadcast
presented in Washington, DC, USA, ACM, Jun. 2017, (acceptance rate: 19%) |
[258] K. T. Foerster, L. Groner, T. Hoefler, M. Koenig, S. Schmid, R. Wattenhofer: | ||
Multi-agent Pathfinding with n Agents on Graphs with n Vertices: Combinatorial Classification and Tight Algorithmic Bounds
In Algorithms and Complexity - 10th International Conference, CIAC 2017, Athens, Greece, May 24-26, 2017, Proceedings, presented in Athens, Greece, May 2017, |
[259] S. Ramos, T. Hoefler: | ||
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL
In Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS'17), presented in Orlando, FL, USA, IEEE, May 2017, (acceptance rate: 22%, 116/516) |
[260] T. Wicky, E. Solomonik, T. Hoefler: | ||
Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations
In Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS'17), presented in Orlando, FL, USA, IEEE, May 2017, (acceptance rate: 22%, 116/516) |
[261] T. Hoefler, A. Barak, A. Shiloh, Z. Drezner: | ||
Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems
In Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS'17), presented in Orlando, FL, USA, IEEE, May 2017, (acceptance rate: 22%, 116/516) |
[262] M. Besta, F. Marending, E. Solomonik, T. Hoefler: | ||
SlimSell: A Vectorized Graph Representation for Breadth-First Search
In Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS'17), presented in Orlando, FL, USA, IEEE, May 2017, (acceptance rate: 22%, 116/516) |
[263] S. Di Girolamo, F. Vella, T. Hoefler: | ||
Transparent Caching for RMA Systems
In Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS'17), presented in Orlando, FL, USA, IEEE, May 2017, (acceptance rate: 22%, 116/516) |
[264] T. Hoefler: | ||
dCUDA: Hardware Supported Overlap of Computation and Communication
(Presentation) Apr. 2017, |
[265] C. Barthels, G. Alonso, T. Hoefler: | ||
Designing Databases for Future High-Performance Networks
IEEE Technical Committee on Data Engineering. Vol 40, Nr. 1, IEEE, Mar. 2017, |
[266] T. Hoefler: | ||
HiPINEB Panel (highly exaggerated)
(Presentation) presented in Austin, TX, USA, Feb. 2017, |
[267] S. Shudler, A. Calotoiu, T. Hoefler, F. Wolf: | ||
Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications
In Proceedings of the 22nd ACM SIGPLAN symposium on Principles and practice of parallel programming, presented in College Station, TX, ACM, Feb. 2017, (acceptance rate: 21%, 29/139) |
[268] T. Hoefler: | ||
High-Performance Distributed RMA Locks
(Presentation) presented in Austin, TX, Feb. 2017, Seminar at ARM Research |
[269] T. Hoefler: | ||
Progress in automatic GPU compilation and why you want to run MPI on your GPU.
(Presentation) presented in Austin, TX, Feb. 2017, Seminar at University of Texas at Austin |
[270] T. Hoefler: | ||
Accelerating weather and climate simulations on heterogeneous architectures
(Presentation) presented in Erlangen, Germany, Feb. 2017, Colloquium at the Friedrich-Alexander-Universitaet Erlangen-Nuernberg |
[271] T. Hoefler: | ||
Progress in automatic GPU compilation and why you want to run MPI on your GPU.
(Presentation) presented in Champaign, IL, Jan. 2017, Seminar at University of Illinois at Urbana-Champaign/NCSA |
[272] T. Hoefler: | ||
High-Performance Distributed RMA Locks
(Presentation) presented in Champaign, IL, Jan. 2017, Seminar at University of Illinois at Urbana-Champaign/NCSA |
[273] T. Hoefler: | ||
Progress in automatic GPU compilation and why you want to run MPI on your GPU.
(Presentation) presented in Beijing, China, Jan. 2017, Seminar at Tsinghua University, Beijing, China |
[274] T. Hoefler: | ||
Accelerating weather and climate simulations on heterogeneous architectures
(Presentation) presented in Beijing, China, Jan. 2017, Distinguished colloquium at the Institute of Computing Technology at the Chinese Academy of Sciences, Beijing, China |
[275] A. Marian Dan, P. Lam, T. Hoefler, M. Vechev: | ||
Modeling and Analysis of Remote Memory Access Programming
In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, presented in Amsterdam, Netherlands, pages 129--144, ACM, ISBN: 978-1-4503-4444-9, Nov. 2016, Outstanding Paper Award at OOPSLA'16 (4/52) |
[276] M. Martinasso, G. Kwasniewski, S. R. Alam, T. Schulthess, T. Hoefler: | ||
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), presented in Salt Lake City, Utah, pages 63:1--63:11, IEEE Press, ISBN: 978-1-4673-8815-3, Nov. 2016, (acceptance rate: 18% (82/446)) |
[277] W. Tang, B. Wang, S. Ethier, G. Kwasniewski, T. Hoefler, K. Z. Ibrahim, K. Madduri, S. Williams, L. Oliker, C. Rosales-Fernandez, T. Williams: | ||
Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), presented in Salt Lake City, Utah, pages 43:1--43:12, IEEE Press, ISBN: 978-1-4673-8815-3, Nov. 2016, (acceptance rate: 18% (82/446)) |
[278] J. Domke, T. Hoefler: | ||
Scheduling-Aware Routing for Supercomputers
Nov. 2016, Accepted at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16) (acceptance rate: 18% (82/446)) |
[279] T. Gysi, J. Baer, T. Hoefler: | ||
dCUDA: Hardware Supported Overlap of Computation and Communication
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), presented in Salt Lake City, Utah, pages 52:1--52:12, IEEE Press, ISBN: 978-1-4673-8815-3, Nov. 2016, (acceptance rate: 18% (82/446)) |
[280] T. Hoefler: | ||
Polly-ACC: Transparent Compilation to Heterogeneous Hardware.
(Presentation) presented in Salt Lake City, UT, Nov. 2016, Invited talk at the LLVM-HPC workshop and TiTech Booth at SC16 |
[281] T. Hoefler: | ||
Theory and Practice in HPC: Modeling, Programming, and Networking
(Presentation) presented in Xi'an, China, Oct. 2016, Keynote talk at HPC China 2016 |
[282] S. Ramos, T. Hoefler: | ||
Cache Line Aware Algorithm Design for Cache-Coherent Architectures
IEEE Transactions on Parallel and Distributed Systems. Vol 27, Nr. 10, pages 2824-2837, IEEE, Oct. 2016, |
[283] A. Calotoiu, D. Beckingsale, C. W. Earl, T. Hoefler, I. Karlin, M. Schulz, F. Wolf: | ||
Fast Multi-Parameter Performance Modeling
Oct. 2016, Accepted at IEEE International Conference on Cluster Computing (Cluster'16) (acceptance rate: 24% (39/162)) |
[284] T. Hoefler: | ||
Progress in automatic GPU compilation and why you want to run MPI on your GPU.
(Presentation) presented in Lyon, France, Oct. 2016, Invited talk at the CCDSC meeting |
[285] T. Hoefler: | ||
Accelerating weather and climate simulations on heterogeneous architectures
(Presentation) presented in Xi'an, China, Oct. 2016, Invited talk at the CoDesign Meeting at HPC China 2016 |
[286] T. Hoefler: | ||
High-Performance Distributed RMA Locks
(Presentation) presented in Wuxi, China, Sep. 2016, Seminar talk at Intl. Workshop on High-Performance Systems |
[287] T. Hoefler: | ||
Theory and Practice in HPC: Modeling, Programming, and Networking
(Presentation) presented in Taipei, Taiwan, Sep. 2016, Opening keynote talk at IEEE Cluster 2016 |
[288] T. Hoefler: | ||
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures
(Presentation) presented in Guangzhou, China, Sep. 2016, Seminar talk at Intl. Workshop on High-Performance Systems |
[289] T. Hoefler: | ||
Scientific Benchmarking of Parallel Computing Systems
(Presentation) presented in Knoxville, TN, USA, Aug. 2016, |
[290] T. Hoefler: | ||
Towards scalable RDMA locking on a NIC
(Presentation) presented in Palo Alto, CA, USA, Aug. 2016, |
[291] T. Hoefler: | ||
Network topologies for large-scale compute centers: It's the diameter, stupid!
(Presentation) presented in San Jose, CA, USA, Aug. 2016, Invited talk at the IEEE Hot Interconnects 2016 |
[292] T. Schneider, O. Bibartiu, T. Hoefler: | ||
Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks
In Proceedings of the 24th Annual Symposium on High-Performance Interconnects (HOTI'16), Aug. 2016, Best Student Paper at HOTI'16 |
[293] S. Di Girolamo, P. Jolivet, K. D. Underwood, T. Hoefler: | ||
Exploiting Offload Enabled Network Interfaces
IEEE MICRO. Vol 36, Nr. 4, IEEE, Jul. 2016, |
[294] J. Domke, T. Hoefler, S. Matsuoka: | ||
Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing
In Proceedings of the 25th Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Jun. 2016, (acceptance rate: 16% (20/129)) |
[295] T. Hoefler: | ||
Selecting Technical Papers for an Interdisciplinary Conference: The PASC Review Process
In Proceedings of the 3rd Platform of Advanced Scientific Computing Conference (PASC'16), Jun. 2016, |
[296] T. Hoefler: | ||
Progress in automatic GPU compilation and why you want to run MPI on your GPU.
(Presentation) presented in Haifa, Israel, Jun. 2016, Seminar talk at Israel Institute of Technology (Technion) |
[297] T. Hoefler: | ||
Progress in automatic GPU compilation and why you want to run MPI on your GPU.
(Presentation) presented in Cetraro, Italy, Jun. 2016, Invited talk at the Cetraro HPC conference |
[298] T. Hoefler: | ||
An Overview of Static & Dynamic Techniques for Automatic Performance Modeling
(Presentation) presented in Frankfurt, Germany, Jun. 2016, Invited talk at International Supercomputing Conference |
[299] T. Hoefler: | ||
The Eighth Green Graph500
(Presentation) presented in Frankfurt, Germany, Jun. 2016, |
[300] T. Grosser, T. Hoefler: | ||
Polly-ACC: Transparent compilation to heterogeneous hardware
In Proceedings of the 30th International Conference on Supercomputing (ICS'16), Jun. 2016, (acceptance rate: 24% (43/178)) |
[301] P. Schmid, M. Besta, T. Hoefler: | ||
High-Performance Distributed RMA Locks
In Proceedings of the 25th Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Jun. 2016, (acceptance rate: 16% (20/129)) Karsten Schwan Best Paper Award at HPDC'16 (1/20) |
[302] T. Hoefler: | ||
Scientific Benchmarking of Parallel Computing Systems
(Presentation) presented in Stuttgart, Germany, Apr. 2016, |
[303] T. Hoefler: | ||
Active RDMA - new tricks for an old dog
(Presentation) presented in Gleneden Beach, OR, USA, Apr. 2016, Invited talk at Salishan Meeting |
[304] P. M. Widener, S. Levy, K. B. Ferreira, T. Hoefler: | ||
On noise and the performance benefit of nonblocking collectives
The International Journal of High Performance Computing Applications. Vol 30, Nr. 1, pages 121-133, Sage, ISSN: 1094-3420, Jan. 2016, accepted for publication on Nov. 2nd 2015 |
[305] T. Hoefler: | ||
Automatic Performance Models for the Masses: Static and dynamic techniques for application performance modeling
(Presentation) presented in CoDesign Workshop, Wuxi, China, Nov. 2015, |
[306] T. Hoefler: | ||
Remote Memory Access Programming: Faster Parallel Computing Without Messages
(Presentation) presented in Tsinghua University, Beijing, China, Nov. 2015, |
[307] T. Hoefler: | ||
Performance Reproducibility Birds of a Feather
(Presentation) presented in Austin, TX, USA, Nov. 2015, |
[308] T. Hoefler: | ||
The Seventh Green Graph500
(Presentation) presented in Austin, TX, USA, Nov. 2015, |
[309] G. Kathareios, C. Minkenberg, B. Prisacari, G. Rodriguez, T. Hoefler: | ||
Cost-Effective Diameter-Two Topologies: Analysis and Evaluation
presented in Austin, TX, USA, ACM, ISBN: 978-1-4503-3723-6, Nov. 2015, In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) (acceptance rate: 22%, 79/358) |
[310] T. Hoefler, R. Belli: | ||
Scientific Benchmarking of Parallel Computing Systems
presented in Austin, TX, USA, pages 73:1--73:12, ACM, ISBN: 978-1-4503-3723-6, Nov. 2015, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) (acceptance rate: 22%, 79/358) |
[311] H. Schweizer, M. Besta, T. Hoefler: | ||
Evaluating the Cost of Atomic Operations on Modern Architectures
presented in San Francisco, CA, USA, ACM, Oct. 2015, Accepted at the 24th International Conference on Parallel Architectures and Compilation (PACT'15) (acceptance rate: 21%, 38/179) |
[312] A. Bhattacharyya, G. Kwasniewski, T. Hoefler: | ||
Using Compiler Techniques to Improve Automatic Performance Modeling
presented in San Francisco, CA, USA, ACM, Oct. 2015, Accepted at the 24th International Conference on Parallel Architectures and Compilation (PACT'15) (acceptance rate: 21%, 38/179) |
[313] S. Di Girolamo, P. Jolivet, K. D. Underwood, T. Hoefler: | ||
Exploiting Offload Enabled Network Interfaces
In Proceedings of the 23rd Annual Symposium on High-Performance Interconnects (HOTI'15), presented in Oracle Santa Clara Campus, CA, USA, IEEE, Aug. 2015, Best Student Paper at HOTI'15 |
[314] T. Hoefler: | ||
How fast will your application go? Static and dynamic techniques for application performance modeling
(Presentation) presented in Bloomington, IN, USA, Jul. 2015, |
[315] T. Hoefler: | ||
The Sixth Green Graph500 List
(Presentation) presented in Frankfurt, Germany, Jul. 2015, |
[316] T. Hoefler: | ||
Towards Remote Memory Access Programming for Data Analytics
(Presentation) presented in Chicago, IL, USA, Jul. 2015, |
[317] M. Besta, T. Hoefler: | ||
Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations
In Proceedings of the 29th International Conference on Supercomputing (ICS'15), presented in Newport Beach, CA, USA, pages 155--164, ACM, ISBN: 978-1-4503-3559-1, Jun. 2015, (acceptance rate: 25% (40/160)) |
[318] S. Ramos, T. Hoefler: | ||
Cache Line Aware Optimizations for ccNUMA Systems
In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'15) (short paper), presented in Portland, OR, USA, pages 85--88, ACM, ISBN: 978-1-4503-3550-8, Jun. 2015, |
[319] T. Gysi, T. Grosser, T. Hoefler: | ||
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures
In Proceedings of the 29th International Conference on Supercomputing (ICS'15), presented in Newport Beach, CA, USA, pages 177--186, ACM, ISBN: 978-1-4503-3559-1, Jun. 2015, (acceptance rate: 25% (40/160)) |
[320] M. Poke, T. Hoefler: | ||
DARE: High-Performance State Machine Replication on RDMA Networks
In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'15), presented in Portland, OR, USA, pages 107--118, ACM, ISBN: 978-1-4503-3550-8, Jun. 2015, (acceptance rate: 16% (19/116)) |
[321] M. Besta, T. Hoefler: | ||
Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages
In Proceedings of the 24th Symposium on High-Performance Parallel and Distributed Computing (HPDC'15), presented in Portland, OR, USA, pages 161--172, ACM, ISBN: 978-1-4503-3550-8, Jun. 2015, (acceptance rate: 16% (19/116)) Best Paper at HPDC'15 (1/19) |
[322] S. Shudler, A. Calotoiu, T. Hoefler, A. Strube, F. Wolf: | ||
Exascaling Your Library: Will Your Implementation Meet Your Expectations?
In Proceedings of the 29th International Conference on Supercomputing (ICS'15), presented in Newport Beach, CA, USA, pages 161--175, ACM, ISBN: 978-1-4503-3559-1, Jun. 2015, (acceptance rate: 25% (40/160)) |
[323] T. Hoefler: | ||
Remote Memory Access Programming: Faster Parallel Computing Without Messages
(Presentation) presented in San Diego, CA, USA, Jun. 2015, |
[324] T. Hoefler: | ||
How fast will your application go? Static and dynamic techniques for application performance modeling
(Presentation) presented in Oak Ridge, TN, USA, Jun. 2015, |
[325] T. Hoefler: | ||
Efficient networking and programming of large-scale computing systems
(Presentation) presented in Palo Alto, CA, USA, Jun. 2015, |
[326] T. Hoefler: | ||
Towards Remote Memory Access Programming for Data Analytics
(Presentation) presented in Berkeley, CA, USA, Jun. 2015, |
[327] T. Hoefler: | ||
Remote Memory Access Programming: Faster Parallel Computing Without Messages
(Presentation) presented in Atlanta, GA, USA, Jun. 2015, |
[328] R. Belli, T. Hoefler: | ||
Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization
In Proceedings of the 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS'15), presented in Hyderabad, India, IEEE, May 2015, (acceptance rate: 21.8%, 108/496) Best Paper at IPDPS'15 (4/108) |
[329] T. Hoefler, R. Ross, T. Roscoe: | ||
Distributing the Data Plane for Remote Storage Access
presented in Kartause Ittingen, Switzerland, USENIX, May 2015, Proceedings of the 15th Workshop on Hot Topics in Operating Systems (acceptance rate: 32% (29/90)) |
[330] T. Hoefler: | ||
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures
(Presentation) presented in Hyderabad, India, May 2015, |
[331] T. Lee, C. Pappas, C. Basescu, J. Han, T. Hoefler, A. Perrig: | ||
Source-Based Path Selection: The Data Plane Perspective
In Proceedings of the 10th International Conference on Future Internet, presented in Seoul, Republic of Korea, pages 41--45, ACM, ISBN: 978-1-4503-3564-5, May 2015, |
[332] T. Hoefler, J. Domke: | ||
Fail-in-Place Network Design
presented in Oslo, Norway, May 2015, |
[333] T. Hoefler: | ||
How fast will your application go? Static and dynamic techniques for application performance modeling
(Presentation) presented in Hyderabad, India, May 2015, Keynote talk at HIPS'15/LSPP'15 in conjunction with IPDPS'15 |
[334] T. Hoefler, J. Dinan, R. Thakur, B. Barrett, P. Balaji, W. Gropp, K. Underwood: | ||
Remote Memory Access Programming in MPI-3
ACM Transactions on Parallel Computing (TOPC). ACM, Jan. 2015, accepted for publication on Dec. 4th |
[335] T. Hoefler: | ||
Resilience Overheads at Scale and Scalability
(Presentation) presented in Dresden, Germany, Dec. 2014, |
[336] T. Hoefler: | ||
The Fourth Green Graph500 List
(Presentation) presented in New Orleans, Louisiana, USA, Nov. 2014, |
[337] J. Domke, T. Hoefler, S. Matsuoka: | ||
Fail-in-Place Network Design: Interaction between Topology, Routing Algorithm and Failures
presented in New Orleans, LA, USA, Nov. 2014, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC14) (acceptance rate: 21%, 82/394) |
[338] K. B. Ferreira, P. Widener, S. Levy, D. Arnold, T. Hoefler: | ||
Understanding the Effects of Communication and Coordination on Checkpointing at Scale
presented in New Orleans, LA, USA, Nov. 2014, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC14) (acceptance rate: 21%, 82/394) |
[339] M. Besta, T. Hoefler: | ||
Slim Fly: A Cost Effective Low-Diameter Network Topology
presented in New Orleans, LA, USA, Nov. 2014, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC14) (acceptance rate: 21%, 82/394) SC14 Best Student Paper (1/82) |
[340] W. Gropp, T. Hoefler, R. Thakur, E. Lusk: | ||
Using Advanced MPI: Modern Features of the Message-Passing Interface
presented in Cambridge, MA, MIT Press, ISBN: 978-0262527637, Nov. 2014, |
[341] T. Hoefler: | ||
IA3 Panel: HPC vs. Irregular Applications
(Presentation) presented in New Orleans, Louisiana, USA, Nov. 2014, |
[342] T. Hoefler: | ||
What about MPI + LLVM?
(Presentation) presented in New Orleans, Louisiana, USA, Nov. 2014, |
[343] T. Hoefler: | ||
LogGOPSim - Simple and Fast Large-Scale Simulations
(Presentation) presented in New Orleans, Louisiana, USA, Nov. 2014, |
[344] T. Hoefler: | ||
A case for runtime recompilation in HPC
(Presentation) presented in New Orleans, Louisiana, USA, Nov. 2014, The LLVM Compiler Infrastructure in HPC, Keynote Presentation |
[345] T. Hoefler, D. Moor: | ||
Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations
Journal of Supercomputing Frontiers and Innovations. Vol 1, Nr. 2, pages 58--75, SuperFri Open Journal, Oct. 2014, |
[346] P. Widener, K. Ferreira, S. Levy, T. Hoefler: | ||
Exploring the effect of noise on the performance benefit of nonblocking allreduce
In Proceedings of the 21st European MPI Users' Group Meeting, presented in Kyoto, Japan, pages 77:77--77:82, ACM, ISBN: 978-1-4503-2875-3, Sep. 2014, Invited to a journal special issue on top picks from EuroMPI'14. |
[347] A. Bhattacharyya, T. Hoefler: | ||
PEMOGEN: Automatic Adaptive Performance Modeling During Program Runtime
In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (PACT'14), presented in Edmonton, Alberta, Canada, pages 393-404, ACM, ISBN: 978-1-4503-2809-8, Aug. 2014, |
[348] T. Hoefler: | ||
Remote Memory Access Programming - Tools and Fault Tolerance
(Presentation) presented in Moscow, Russia, Jul. 2014, |
[349] M. Besta, T. Hoefler: | ||
Fault Tolerance for Remote Memory Access Programming Models
In Proceedings of the 23rd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'14), presented in Vancouver, Canada, ACM, Jun. 2014, (acceptance rate: 16%, 21/130) Best Paper Nominee at HPDC'14 (3/21) |
[350] T. Hoefler, G. Kwasniewski: | ||
Automatic Complexity Analysis of Explicitly Parallel Programs
In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'14), presented in Prague, Czech Republic, ACM, Jun. 2014, (acceptance rate: 25%, 30/122) |
[351] B. Prisacari, G. Rodriguez, P. Heidelberger, D. Chen, C. Minkenberg, T. Hoefler: | ||
Efficient Task Placement and Routing in Dragonfly Networks
In Proceedings of the 23rd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'14), presented in Vancouver, Canada, ACM, Jun. 2014, (acceptance rate: 16%, 21/130) |
[352] T. Hoefler: | ||
The Green Graph500 List
(Presentation) presented in Leipzig, Germany, Jun. 2014, |
[353] T. Hoefler: | ||
Using Simulation to Evaluate the Performance of Resilience Strategies at Scale
(Presentation) In ISC workshop on International Cooperation, presented in Leipzig, Germany, Jun. 2014, |
[354] D. Unat, J. Shalf, T. Hoefler, T. Schulthess, A. Dubey (Editors), M. Besta, and others: | ||
Programming Abstractions for Data Locality
Technical Report. presented in Lugano, Switzerland, Apr. 2014, |
[355] T. Schneider, R. Gerstenberger, T. Hoefler: | ||
Application-oriented ping-pong benchmarking: how to assess the real communication overheads
Journal of Computing. Vol 96, Nr. 4, pages 279-292, Springer Vienna, ISSN: 0010-485X, Apr. 2014, Special issue on top picks from EuroMPI'12. |
[356] A. Arteaga, O. Fuhrer, T. Hoefler: | ||
Designing Bit-Reproducible Portable High-Performance Applications
In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), presented in Phoenix, AZ, USA, IEEE Computer Society, Apr. 2014, (acceptance rate: 21.1%, 114/541) |
[357] S. Li, T. Hoefler, C. Hu, M. Snir: | ||
Improved MPI collectives for MPI processes in shared address spaces
Journal of Cluster Computing. pages 1-17, Springer US, ISSN: 1386-7857, Mar. 2014, |
[358] F. Wolf, C. Bischof, T. Hoefler, B. Mohr, G. Wittum, A. Calotoiu, C. Iwainsky, A. Strube, A. Vogel: | ||
Catwalk: A Quick Development Path for Performance Models
Springer. In Euro-Par 2014: Parallel Processing Workshops, pages 589-600, 2014, |
[359] B. Prisacari, G. Rodriguez, C. Minkenberg, T. Hoefler: | ||
Fast Pattern-Specific Routing for Fat Tree Networks
ACM Transactions on Architecture and Code Optimization. Vol 10, Nr. 4, presented in New York, NY, USA, pages 36:1--36:25, ACM, ISSN: 1544-3566, Dec. 2013, (acceptance rate: 24% (2011)) |
[360] T. Hoefler: | ||
MPI Beyond 3.0 and Towards Larger-Scale Computing
(Presentation) presented in Denver, CO, USA, Nov. 2013, Keynote at ExaMPI 2013 Workshop (in conjunction with SC13) |
[361] S. Levy, B. Topp, K. Ferreira, D. Arnold, T. Hoefler, P. Widener: | ||
Using Simulation to Evaluate the Performance of Resilience Strategies at Scale
presented in Denver, CO, USA, Nov. 2013, Proceedings of the 4th International Workshop in Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS13) |
[362] A. Friedley, G. Bronevetsky, A. Lumsdaine, T. Hoefler: | ||
Hybrid MPI: Efficient Message Passing for Multi-core Systems
In IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC13), presented in Denver, Colorado, USA, pages 18:1--18:11, ISBN: 978-1-4503-2378-9, Nov. 2013, (acceptance rate: 20%, 92/457) |
[363] R. Gerstenberger, M. Besta, T. Hoefler: | ||
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, presented in Denver, Colorado, USA, pages 53:1--53:12, ACM, ISBN: 978-1-4503-2378-9, Nov. 2013, (acceptance rate: 20%, 92/457) Best Student Paper Finalist (8/92) and SC13 Best Paper (1/92) |
[364] A. Calotoiu, T. Hoefler, M. Poke, F. Wolf: | ||
Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes
In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC13), presented in Denver, Colorado, USA, pages 45:1--45:12, ACM, ISBN: 978-1-4503-2378-9, Nov. 2013, (acceptance rate: 20%, 92/457) |
[365] T. Hoefler: | ||
The Green Graph500 List
(Presentation) presented in Denver, Colorado, Nov. 2013, |
[366] T. Schneider, T. Hoefler, R. Grant, B. Barrett, R. Brightwell: | ||
Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters
In Parallel Processing (ICPP), 2013 42nd International Conference on, presented in Lyon, France, pages 593-602, ISSN: 0190-3918, Oct. 2013, |
[367] T. Schneider, R. Gerstenberger, T. Hoefler: | ||
Compiler Optimizations for Non-Contiguous Remote Data Movement
presented in Santa Clara, CA, USA, Sep. 2013, Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing |
[368] T. Schneider, F. Kjolstad, T. Hoefler: | ||
MPI Datatype Processing using Runtime Compilation
In Proceedings of the 20th European MPI Users' Group Meeting, presented in Madrid, Spain, pages 19--24, ACM, ISBN: 978-1-4503-1903-4, Sep. 2013, Best Paper Award at EuroMPI'13 (1/25) |
[369] S. Li, T. Hoefler, M. Snir: | ||
NUMA-Aware Shared Memory Collective Communication for MPI
In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, presented in New York City, NY, USA, pages 85--96, ACM, ISBN: 978-1-4503-1910-2, Jun. 2013, (acceptance rate: 15%, 20/131) Nominated for Best Paper Award at HPDC'13 (3/20) |
[370] T. Hoefler: | ||
The Green Graph500 List
(Presentation) presented in Leipzig, Germany, Jun. 2013, |
[371] B. Prisacari, G. Rodriguez, C. Minkenberg, T. Hoefler: | ||
Bandwidth-optimal All-to-all Exchanges in Fat Tree Networks
In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, presented in Eugene, OR, USA, pages 139--148, ACM, ISBN: 978-1-4503-2130-3, Jun. 2013, (acceptance rate: 21%, 41/198) |
[372] S. Ramos, T. Hoefler: | ||
Modeling Communication in Cache-Coherent SMP Systems - A Case-Study with Xeon Phi
In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, presented in New York City, NY, USA, pages 97--108, ACM, ISBN: 978-1-4503-1910-2, Jun. 2013, (acceptance rate: 15%, 20/131) |
[373] T. Hoefler, J. Dinan, D. Buntinas, P. Balaji, B. Barrett, R. Brightwell, W. Gropp, V. Kale, R. Thakur: | ||
MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory
Journal of Computing. Springer, May 2013, doi: 10.1007/s00607-013-0324-2 |
[374] T. Hoefler: | ||
Application-Centric Benchmarking and Modeling for Co-Design
(Presentation) presented in Edinburgh, Great Britain, Apr. 2013, Presented at the Exascale Applications and Software Conference (EASC'13) |
[375] S. Ramos, T. Hoefler: | ||
Modelling Communications in Cache Coherent Systems
Technical Report. SPCL, ETH Zurich. presented in Zurich, Switzerland, Feb. 2013, |
[376] A. Friedley, T. Hoefler, G. Bronevetsky, A. Lumsdaine: | ||
Ownership Passing: Efficient Distributed Memory Programming on Multi-core Systems
In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, presented in Shenzhen, China, pages 177--186, ACM, ISBN: 978-1-4503-1922-5, Feb. 2013, (acceptance rate: 18%, 26/146) |
[377] T. Hoefler, T. Schneider: | ||
Optimization Principles for Collective Neighborhood Communications
In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, presented in Salt Lake City, Utah, USA, pages 98:1--98:10, IEEE Computer Society Press, ISBN: 978-1-4673-0804-5, Nov. 2012, (acceptance rate: 21%, 100/472) |
[378] S. Pellegrini, T. Hoefler, T. Fahringer: | ||
Exact Dependence Analysis for Increased Communication Overlap
In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, presented in Vienna, Austria, Springer, ISBN: 978-3-642-33517-4, Sep. 2012, |
[379] Message Passing Interface Forum: | ||
MPI: A Message-Passing Interface Standard Version 3.0
Sep. 2012, Chapter author for Collective Communication, Process Topologies, and One Sided Communications |
[380] T. Hoefler: | ||
MPI-3.0: A Response to New Challenges in Hardware and Software
(Presentation) presented in Stuttgart, Germany, Sep. 2012, Keynote at Multicore Challenge 2012 |
[381] T. Schneider, R. Gerstenberger, T. Hoefler: | ||
Micro-Applications for Communication Data Access Patterns and MPI Datatypes
Vol 7490, In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, presented in Vienna, Austria, pages 121-131, Springer, ISBN: 978-3-642-33517-4, Sep. 2012, Invited to a journal special issue on top picks from EuroMPI'12. |
[382] S. Pellegrini, T. Hoefler, T. Fahringer: | ||
On the Effects of CPU Caches on MPI Point-to-Point Communications
In Proceedings of the 2012 IEEE International Conference on Cluster Computing, presented in Beijing, China, pages 495--503, IEEE Computer Society, ISBN: 978-0-7695-4807-4, Sep. 2012, (acceptance rate: 28.9%, 58/200) |
[383] T. Hoefler, T. Schneider: | ||
Runtime Detection and Optimization of Collective Communication Patterns
In Proceedings of the 21st international conference on Parallel Architectures and Compilation Techniques (PACT), presented in Minneapolis, MN, USA, pages 263--272, ACM, ISBN: 978-1-4503-1182-3, Sep. 2012, (acceptance rate: 18.9%, 39/207) |
[384] T. Hoefler, J. Dinan, D. Buntinas, P. Balaji, B. Barrett, R. Brightwell, W. Gropp, V. Kale, R. Thakur: | ||
Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming
Vol 7490, In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, presented in Vienna, Austria, Springer, ISBN: 978-3-642-33517-4, Sep. 2012, Invited to a journal special issue on top picks from EuroMPI'12. |
[385] T. Hoefler: | ||
The Green Graph500
(Presentation) presented in Hamburg, Germany, Jul. 2012, |
[386] T. Hoefler: | ||
Optimized routing and process mapping for arbitrary network topologies
(Presentation) presented in Tokyo, Japan, Jun. 2012, Tokyo Institute of Technology |
[387] G. Bauer, S. Gottlieb, T. Hoefler: | ||
Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3 rmd
In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), presented in Ottawa, Canada, pages 652--659, IEEE Computer Society, ISBN: 978-0-7695-4691-9, May 2012, (acceptance rate: 27%, 83/302) |
[388] P. Gottschling, T. Hoefler: | ||
Productive Parallel Linear Algebra Programming with Unstructured Topology Adaption
In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), presented in Ottawa, Canada, pages 9--16, IEEE Computer Society, ISBN: 978-0-7695-4691-9, May 2012, (acceptance rate: 27%, 83/302) |
[389] G. Bauer, T. Hoefler, W. Kramer, B. Fiedler: | ||
Analyses and Modeling of Applications Used to Demonstrate Sustained Petascale Performance on Blue Waters
(Presentation) presented in Stuttgart, Germany, May 2012, Cray User Group |
[390] T. Hoefler: | ||
New and old Features in MPI-3.0: The Past, the Standard, and the Future
(Presentation) University of Illinois at Urbana-Champaign. presented in Munich, Germany, Apr. 2012, |
[391] T. Hoefler: | ||
Performance Modeling for Systematic Performance Tuning
(Presentation) presented in Aachen, Germany, Mar. 2012, |
[392] T. Hoefler: | ||
Performance-oriented Parallel Programming Integrating Hardware, Middleware, and Applications
(Presentation) presented in Savannah, GA, USA, Feb. 2012, SIAM SIAG/SC Junior Scientist Award Lecture |
[393] F. Kjolstad, T. Hoefler, M. Snir: | ||
Automatic Datatype Generation and Optimization
In Proceedings of the 17th ACM symposium on Principles and practice of parallel programming, Feb. 2012, (poster paper) (acceptance rate (posters): 17%, 32/185) |
[394] T. Hoefler, T. Schneider: | ||
Communication-Centric Optimizations by Dynamically Detecting Collective Operations
In Proceedings of the 17th ACM symposium on Principles and practice of parallel programming, Feb. 2012, (poster paper) (acceptance rate (posters): 17%, 32/185) |
[395] K. Kharbas, D. Kim, T. Hoefler, F. Mueller: | ||
Assessing HPC Failure Detectors for MPI Jobs
In Proceedings of the 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, presented in Munich, Germany, pages 81--88, IEEE Computer Society, ISBN: 978-0-7695-4633-9, Feb. 2012, |
[396] T. Hoefler: | ||
Energy-aware Software Development for Massive-Scale Systems
(Presentation) presented in Salt Lake City, Utah, USA, Jan. 2012, |
[397] T. Hoefler, W. Gropp, M. Snir, W. Kramer: | ||
Performance Modeling for Systematic Performance Tuning
In International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11), SotP Session, Nov. 2011, |
[398] T. Hoefler: | ||
Performance Modeling for the Masses
(Presentation) presented in Seattle, WA, USA, Nov. 2011, |
[399] W. Gropp, T. Hoefler, R. Thakur, J. Larsson Träff: | ||
Performance Expectations and Guidelines for MPI Derived Datatypes
Vol 6960, In Recent Advances in the Message Passing Interface (EuroMPI'11), presented in Santorini, Greece, pages 150-159, Springer, ISBN: 978-3-642-24448-3, Sep. 2011, |
[400] V. Venkatesan, M. Chaarawi, E. Gabriel, T. Hoefler: | ||
Design and Evaluation of Nonblocking Collective I/O Operations
Vol 6960, In Recent Advances in the Message Passing Interface (EuroMPI'11), presented in Santorini, Greece, pages 90-98, Springer, ISBN: 978-3-642-24448-3, Sep. 2011, |
[401] T. Hoefler: | ||
Energy-aware Software Development for Massive-Scale Systems
(Presentation) presented in Hamburg, Germany, Sep. 2011, Keynote at the International Conference on Energy-Aware High Performance Computing (EnA-HPC'11) |
[402] T. Hoefler, M. Snir: | ||
Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions
Vol 6960, In Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings, presented in Santorini, Greece, pages 345--355, Springer, ISBN: 978-3-642-24448-3, Sep. 2011, Keynote paper at IMUDI/EuroMPI 2011. |
[403] T. Hoefler: | ||
Writing Parallel Libraries with MPI - The Good, the Bad, and the Ugly
presented in Santorini, Greece, Sep. 2011, Keynote talk at the 18th European MPI Users' Group Meeting (EuroMPI 2011). |
[404] T. Schneider, S. Eckelmann, T. Hoefler, W. Rehm: | ||
Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned
In Proceedings of the 17th international conference on Parallel processing - Volume Part II, presented in Bordeaux, France, pages 264--275, Springer-Verlag, ISBN: 978-3-642-23396-8, Aug. 2011, (acceptance rate 29.9%, 81/271) |
[405] S. Harrell, P. Smith, D. Smith, T. Hoefler, A. Labutina, T. Overmeyer: | ||
Methods of Creating Student Cluster Competition Teams
In Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery, presented in Salt Lake City, Utah, pages 50:1--50:6, ACM, Jul. 2011, |
[406] T. Hoefler, M. Snir: | ||
Performance Engineering: A Must for Petaflops and Beyond
Jun. 2011, Extended abstract for the keynote at the Large-scale System and Application Performance Workshop 2011 (LSAP'11) |
[407] J. Willcock, T. Hoefler, N. Edmonds, A. Lumsdaine: | ||
Active Pebbles: Parallel Programming for Data-Driven Applications
In Proceedings of the 2011 ACM International Conference on Supercomputing (ICS'11), presented in Tucson, AZ, pages 235--245, ACM, ISBN: 978-1-4503-0102-2, Jun. 2011, (acceptance rate 21.7%, 35/161) |
[408] T. Hoefler, M. Snir: | ||
Generic Topology Mapping Strategies for Large-scale Parallel Architectures
In Proceedings of the 2011 ACM International Conference on Supercomputing (ICS'11), presented in Tucson, AZ, pages 75--85, ACM, ISBN: 978-1-4503-0102-2, Jun. 2011, (acceptance rate 21.7%, 35/161) |
[409] J. Domke, T. Hoefler, W. Nagel: | ||
Deadlock-Free Oblivious Routing for Arbitrary Topologies
In Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS), presented in Anchorage, AK, USA, pages 613--624, IEEE Computer Society, ISBN: 0-7695-4385-7, May 2011, (acceptance rate: 19.6%, 112/571) |
[410] T. Hoefler: | ||
Characterizing the Influence of System Noise on Large-Scale Parallel Applications
(Presentation) presented in Aachen, Germany, Apr. 2011, Talk at RWTH Aachen University |
[411] T. Hoefler: | ||
Model-Driven, Performance-Centric HPC Software and System Design and Optimization
(Presentation) presented in Juelich, Germany, Apr. 2011, Talk at Juelich Supercomputing Center (JSC) |
[412] P. Balaji, D. Buntinas, D. Goodell, W. Gropp, T. Hoefler, S. Kumar, E. Lusk, R. Thakur, J. Larsson Träff: | ||
MPI on Millions of Cores
Parallel Processing Letters (PPL). Vol 21, Nr. 1, pages 45-60, World Scientific Publishing Company, Mar. 2011, |
[413] W. Gropp, T. Hoefler, M. Snir: | ||
Performance Modeling for Systematic Performance Tuning
(Presentation) In SIAM Conference on Computational Science and Engineering 2011 (Abstracts), presented in Reno, NV, SIAM, Feb. 2011, |
[414] J. Willcock, T. Hoefler, N. Edmonds, A. Lumsdaine: | ||
Active Pebbles: A Programming Model For Highly Parallel Fine-Grained Data-Driven Computations
In Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, pages 305--306, ISBN: 978-1-4503-0119-0, Feb. 2011, (poster paper) (acceptance rate: 25%, 26/165 papers + 16/165 poster) PPoPP'11 Best Poster Award |
[415] E. Holk, W. E. Byrd, J. Willcock, T. Hoefler, A. Chauhan, A. Lumsdaine: | ||
Kanor -- A Declarative Language for Explicit Communication
In Proceedings of the 13th international conference on Practical aspects of declarative languages, presented in Austin, TX, USA, pages 190--204, Springer-Verlag, ISBN: 978-3-642-18377-5, Jan. 2011, |
[416] N. Edmonds, T. Hoefler, A. Lumsdaine: | ||
A Space-Efficient Parallel Algorithm for Computing Betweenness Centrality in Distributed Memory
In International Conference on High Performance Computing, presented in Goa, India, pages 1-10, ISBN: 978-1-4244-8518-5, Dec. 2010, (acceptance rate: 19.2%) |
[417] N. Edmonds, J. Willcock, T. Hoefler, A. Lumsdaine: | ||
Design of a Large-Scale Hybrid-Parallel Graph Library
In International Conference on High Performance Computing, Student Research Symposium, presented in Goa, India, IEEE, Dec. 2010, |
[418] T. Hoefler: | ||
Bridging Performance Analysis Tools and Analytic Performance Modeling for HPC
In Proceedings of Workshop on Productivity and Performance (PROPER 2010), presented in Ischia, Italy, Springer, Dec. 2010, Keynote extended abstract for PROPER'10. |
[419] T. Hoefler: | ||
Software and Hardware Techniques for Power-Efficient HPC Networking
Computing in Science and Engineering (CiSE). Vol 12, Nr. 6, pages 30-37, IEEE Computer Society, ISSN: 0740-7475, Dec. 2010, |
[420] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation
In International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), Nov. 2010, (acceptance rate 19.8%, 50/253) SC10 Best Paper Award |
[421] T. Hoefler: | ||
Optimizing Communication on Blue Waters
(Presentation) In Talk at the Blue Waters PRAC Workshop, presented in Urbana, IL, USA, Oct. 2010, |
[422] T. Hoefler, S. Gottlieb: | ||
Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient using MPI Datatypes
Vol LNCS 6305, In Recent Advances in the Message Passing Interface (EuroMPI'10), presented in Stuttgart, Germany, pages 132--141, Springer, ISSN: 0302-9743, ISBN: 978-3-642-15645-8, Sep. 2010, |
[423] T. Hoefler, G. Bronevetsky, B. Barrett, B. R. de Supinski, A. Lumsdaine: | ||
Efficient MPI Support for Advanced Hybrid Programming Models
Vol LNCS 6305, In Recent Advances in the Message Passing Interface (EuroMPI'10), presented in Stuttgart, Germany, pages 50--61, Springer, ISSN: 0302-9743, ISBN: 978-3-642-15645-8, Sep. 2010, |
[424] T. Hoefler, W. Gropp, R. Thakur, J. Larsson Träff: | ||
Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues
Vol LNCS 6305, In Recent Advances in the Message Passing Interface (EuroMPI'10), presented in Stuttgart, Germany, pages 21--30, Springer, ISSN: 0302-9743, ISBN: 978-3-642-15645-8, Sep. 2010, |
[425] J. Willcock, T. Hoefler, N. Edmonds, A. Lumsdaine: | ||
AM++: A Generalized Active Message Framework
In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, presented in Vienna, Austria, pages 401--410, ACM, ISBN: 978-1-4503-0178-7, Sep. 2010, (acceptance rate: 17%, 46/266) |
[426] T. Hoefler: | ||
Analytical Performance Modeling and Simulation for Blue Waters
(Presentation) In Keynote at Workshop on Productivity and Performance (PROPER 2010), presented in Ischia, Italy, Aug. 2010, PROPER'10 Keynote Presentation |
[427] B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel, B. Drerup, T. Hoefler, J. Joyner, J. Lewis, J. Li, N. Ni, R. Rajamony: | ||
The PERCS High-Performance Interconnect
IBM. In Proceedings of the 18th Symposium on High-Performance Interconnects (Hot Interconnects 2010), IEEE, Aug. 2010, |
[428] T. Hoefler, R. Rabenseifner, H. Ritzdorf, B. R. de Supinski, R. Thakur, J. Larsson Träff: | ||
The Scalable Process Topology Interface of MPI 2.2
Concurrency and Computation: Practice and Experience. Vol 23, Nr. 4, pages 293-310, John Wiley & Sons, Ltd., ISSN: 1532-0634, Aug. 2010, |
[429] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
Accurately Measuring Overhead, Communication Time and Progression of Blocking and Nonblocking Collective Operations at Massive Scale
International Journal of Parallel, Emergent and Distributed Systems. Vol 25, Nr. 4, pages 241-258, Taylor & Francis Group, ISSN: 1744-5779, Jul. 2010, |
[430] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
LogGOPSim - Simulating Large-Scale Applications in the LogGOPS Model
In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, presented in Chicago, Illinois, pages 597--604, ACM, ISBN: 978-1-60558-942-8, Jun. 2010, LSAP'10 Best Paper Award |
[431] T. Hoefler: | ||
Nonblocking and Sparse Collective Operations on Petascale Computers
(Presentation) presented at Argonne National Laboratory, Jun. 2010, |
[432] T. Hoefler, J. Willcock, A. Chauhan, A. Lumsdaine: | ||
The Case for Collective Pattern Specification
Jun. 2010, Accepted at the 1st ACM Workshop on Advances in Message Passing (AMP'10) |
[433] R. Thakur, P. Balaji, D. Buntinas, D. Goodell, W. Gropp, T. Hoefler, S. Kumar, E. Lusk, J. Larsson Träff: | ||
MPI at Exascale
In Proceedings of SciDAC 2010, presented in Chattanooga, Tennessee, Jun. 2010, |
[434] T. Hoefler: | ||
2010 Blue Waters Performance Modeling Workshop -- Opening and Introduction
(Presentation) In Opening Slides for the Blue Waters Modeling Workshop, presented in Urbana, IL, USA, Mar. 2010, |
[435] T. Hoefler, C. Siebert, A. Lumsdaine: | ||
Scalable Communication Protocols for Dynamic Sparse Data Exchange
In Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'10), presented in Bangalore, India, pages 159--168, ACM, ISBN: 978-1-60558-708-0, Jan. 2010, (acceptance rate 16.8%, 29/173) |
[436] P. Kambadur, A. Gupta, T. Hoefler, A. Lumsdaine: | ||
Demand-driven Execution of Static Directed Acyclic Graphs Using Task Parallelism
presented in Kochi, India, pages 284-293, ISBN: 978-1-4244-4922-4, Dec. 2009, (acceptance rate 11%, 35/320) |
[437] T. Hoefler: | ||
Selected MPI-2.2 and MPI-3 Features
(Presentation) presented in Portland, OR, USA, Nov. 2009, MPICH Birds of a Feather Supercomputing 2009 (SC09), host: Darius Buntinas |
[438] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
LogGP in Theory and Practice - An In-depth Analysis of Modern Interconnection Networks and Benchmarking Methods for Collective Operations
Elsevier Journal of Simulation Modelling Practice and Theory (SIMPAT). Vol 17, Nr. 9, pages 1511-1521, Elsevier, ISSN: 1569-190X, Oct. 2009, |
[439] T. Hoefler: | ||
Improving Parallel Computing Platforms
(Presentation) presented in Munich, Germany, Oct. 2009, Presentation at the Technical University of Munich, Host: Prof. M. Gerndt |
[440] Message Passing Interface Forum: | ||
MPI: A Message-Passing Interface Standard Version 2.2
Sep. 2009, Chapter author for Collective Communication and Process Topologies |
[441] T. Hoefler, C. Siebert, A. Lumsdaine: | ||
Group Operation Assembly Language - A Flexible Way to Express Collective Communication
In ICPP-2009 - The 38th International Conference on Parallel Processing, presented in Vienna, Austria, IEEE, ISBN: 978-0-7695-3802-0, Sep. 2009, (acceptance rate 32%, 71/220) |
[442] T. Hoefler, A. Lumsdaine, J. Dongarra: | ||
Towards Efficient MapReduce Using MPI
In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users' Group Meeting, presented in Helsinki, Finland, Springer, Sep. 2009, |
[443] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
The Effect of Network Noise on Large-Scale Collective Communications
Parallel Processing Letters (PPL). Vol 19, Nr. 4, pages 573-593, World Scientific Publishing Company, Aug. 2009, |
[444] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
Optimized Routing for Large-Scale InfiniBand Networks
In 17th Annual IEEE Symposium on High Performance Interconnects (HOTI 2009), presented in New York, NY, Aug. 2009, |
[445] T. Hoefler, J. Larsson Träff: | ||
Sparse Collective Operations for MPI
In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, HIPS'09 Workshop, presented in Rome, Italy, ISSN: 1530-2075, ISBN: 978-1-4244-3750-4, May 2009, |
[446] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
The Impact of Network Noise at Large-Scale Communication Performance
In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, LSPP'09 Workshop, presented in Rome, Italy, ISSN: 1530-2075, ISBN: 978-1-4244-3750-4, May 2009, Invited to a journal special issue on top picks from LSPP'09. |
[447] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
A Power-Aware, Application-Based, Performance Study Of Modern Commodity Cluster Interconnection Networks
In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, CAC'09 Workshop, presented in Rome, Italy, ISSN: 1530-2075, ISBN: 978-1-4244-3750-4, May 2009, |
[448] C. Kaiser, T. Hoefler, B. Bierbaum, T. Bemmerl: | ||
Implementation and Analysis of Nonblocking Collective Operations on SCI Networks
In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, CAC'09 Workshop, presented in Rome, Italy, ISSN: 1530-2075, ISBN: 978-1-4244-3750-4, May 2009, |
[449] T. Hoefler on behalf of the MPI Forum: | ||
MPI: A Message-Passing Interface Standard -- Working-Draft for Nonblocking Collective Operations
MPI Forum, Apr. 2009, |
[450] J. Mueller, T. Schneider, J. Domke, R. Geyer, M. Haesing, T. Hoefler, S. Hoehlig, G. Juckeland, A. Lumsdaine, M. Mueller, W. Nagel: | ||
Cluster Challenge 2008: Optimizing Cluster Configuration and Applications to Maximize Power Efficiency
In Proceedings of the 10th LCI International Conference on High-Performance Clustered Computing, presented in Boulder, CO, Mar. 2009, LCI'09 Best Paper Award |
[451] T. Schneider, T. Hoefler, A. Lumsdaine: | ||
ORCS: An Oblivious Routing Congestion Simulator
Indiana University. Nr. 675, Indiana University Computer Science, Feb. 2009, |
[452] D. Gregor, T. Hoefler, B. Barrett, A. Lumsdaine: | ||
Fixing Probe for Multi-Threaded MPI Applications
Indiana University. Nr. 674, Indiana University Computer Science, Jan. 2009, |
[453] T. Hoefler: | ||
MPI-3 Collective Working Group - December'08 Meeting
(Presentation) Indiana University. presented in Menlo Park, CA, USA, Dec. 2008, Activity Report to the MPI Forum |
[454] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
Multistage Switches are not Crossbars: Effects of Static Routing in High-Performance Networks
In Proceedings of the 2008 IEEE International Conference on Cluster Computing, presented in Tsukuba, Japan, IEEE Computer Society, ISSN: 1552-5244, ISBN: 978-1-4244-2640, Oct. 2008, (acceptance rate 30%, 28/92) |
[455] T. Hoefler, A. Lumsdaine: | ||
Message Progression in Parallel Computing - To Thread or not to Thread?
In Proceedings of the 2008 IEEE International Conference on Cluster Computing, presented in Tsukuba, Japan, IEEE Computer Society, ISSN: 1552-5244, ISBN: 978-1-4244-2640, Oct. 2008, (acceptance rate 30%, 28/92) |
[456] T. Hoefler: | ||
MPI-3 Collective Working Group - October'08 Meeting
(Presentation) Indiana University. presented in Chicago, IL, USA, Oct. 2008, Activity Report to the MPI Forum |
[457] T. Hoefler: | ||
Principles for Coordinated Optimization of Computation and Communication in Large-Scale Parallel Systems
Indiana University. presented in Bloomington, IN, USA, Sep. 2008, |
[458] T. Hoefler: | ||
MPI-3 Collective Working Group - September'08 Meeting
(Presentation) Indiana University. presented in Dublin, Ireland, Sep. 2008, Activity Report to the MPI Forum |
[459] T. Hoefler, M. Schellmann, S. Gorlatch, A. Lumsdaine: | ||
Communication Optimization for Medical Image Reconstruction Algorithms
Vol LNCS 5205, In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, presented in Dublin, Ireland, pages 75-83, Springer, ISSN: 0302-9743, ISBN: 978-3-540-87474-4, Sep. 2008, |
[460] T. Hoefler, F. Lorenzen, A. Lumsdaine: | ||
Sparse Non-Blocking Collectives in Quantum Mechanical Calculations
Vol LNCS 5205, In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, presented in Dublin, Ireland, pages 55-63, Springer, ISSN: 0302-9743, ISBN: 978-3-540-87474-4, Sep. 2008, |
[461] T. Hoefler: | ||
Non-blocking Collective Operations for MPI
(Presentation) Lawrence Livermore National Lab. presented in Livermore, CA, USA, Aug. 2008, |
[462] T. Hoefler: | ||
Multistage Interconnection Networks are not Crossbars
(Presentation) Lawrence Berkeley National Lab. presented in Berkeley, CA, USA, Aug. 2008, |
[463] T. Hoefler: | ||
The effects of common communication patterns in large-scale networks with switch-based static routing
(Presentation) Nerd Lunch at Cisco Systems. presented in San Jose, CA, USA, Aug. 2008, |
[464] P. Geoffray, T. Hoefler: | ||
Adaptive Routing Strategies for Modern High Performance Networks
In 16th Annual IEEE Symposium on High Performance Interconnects, HOTI'08, presented in Stanford, CA, USA, pages 165-172, IEEE Computer Society, ISBN: 978-0-7695-3380-3, Aug. 2008, (acceptance rate 30%, 14/47) |
[465] T. Hoefler, J. Larsson Träff, C. Siebert, A. Lumsdaine: | ||
MPI-3 Collective Working Group - June'08 Meeting
(Presentation) Indiana University. presented in Menlo Park, CA, USA, Jun. 2008, Activity Report to the MPI Forum |
[466] T. Hoefler: | ||
Towards coordinated optimization of computation and communication in parallel applications
(Presentation) Fakultaet fuer Informatik, Universität Münster. presented in Muenster, Germany, Jun. 2008, |
[467] T. Hoefler, P. Gottschling, A. Lumsdaine: | ||
Brief Announcement: Leveraging Non-blocking Collective Communication in High-performance Applications
In Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures, SPAA'08, presented in Munich, Germany, pages 113-115, Association for Computing Machinery (ACM), ISBN: 978-1-59593-973-9, Jun. 2008, (short paper) (acceptance rate: 28%, 36/128) |
[468] T. Hoefler, A. Lumsdaine: | ||
Overlapping Communication and Computation with High Level Communication Routines
In Proceedings of the 8th IEEE Symposium on Cluster Computing and the Grid (CCGrid 2008), presented in Lyon, France, May 2008, (acceptance rate: 32%) |
[469] T. Hoefler: | ||
Non-Blocking Collectives for MPI
(Presentation) Institut fuer Wissenschaftliches Rechnen, Technische Universitaet Dresden. presented in Dresden, Germany, May 2008, |
[470] T. Hoefler, T. Schneider, A. Lumsdaine: | ||
Accurately Measuring Collective Operations at Massive Scale
In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium, PMEO'08 Workshop, presented in Miami, FL, ISSN: 1530-2075, ISBN: 978-1-4244-1694-3, Apr. 2008, Invited to a journal special issue on top picks from PMEO'08. |
[471] T. Hoefler, A. Lumsdaine: | ||
Optimizing non-blocking Collective Operations for InfiniBand
In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium, CAC'08 Workshop, presented in Miami, FL, ISSN: 1530-2075, ISBN: 978-1-4244-1694-3, Apr. 2008, |
[472] T. Hoefler, A. Lumsdaine: | ||
MPI-3 Collective Working Group - April'08 Meeting
(Presentation) Indiana University. presented in Chicago, IL, USA, Apr. 2008, Slides with proposals to the MPI-3 collective WG, all preliminary, published on request |
[473] T. Hoefler, A. Lumsdaine: | ||
MPI-3 Collective Working Group - January'08 Meeting
(Presentation) Indiana University. presented in Chicago, IL, USA, Mar. 2008, Slides with proposals to the MPI-3 collective WG, all preliminary, published on request |
[474] T. Hoefler, F. Lorenzen, D. Gregor, A. Lumsdaine: | ||
Topological Collectives for MPI-2
Open Systems Lab, Indiana University. MPI Forum, Feb. 2008, |
[475] D. Gregor, T. Hoefler, A. Lumsdaine: | ||
Dynamically-Sized Messages in MPI-3
Open Systems Lab, Indiana University. MPI Forum, Feb. 2008, |
[476] T. Schneider, T. Hoefler, S. Wunderlich, T. Mehlan, W. Rehm: | ||
An optimized ZGEMM implementation for the Cell BE
In Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA), presented in Dresden, Germany, ISSN: 1617-5468, ISBN: 978-3-88579-218-5, Feb. 2008, |
[477] T. Hoefler: | ||
Accurately Measuring Collective Operations at Massive Scale
(Presentation) C&C Research Laboratories, NEC Europe Ltd. presented in Sankt Augustin, Germany, Dec. 2007, |
[478] T. Hoefler: | ||
Non-blocking Collectives for MPI-2
(Presentation) High Performance Computing Center Stuttgart (HLRS). presented in Stuttgart, Germany, Dec. 2007, |
[479] F. Mietke, T. Mehlan, T. Hoefler, W. Rehm: | ||
Design and Evaluation of a 2048 Core Cluster System
In Proceedings of 3rd KiCC Workshop 2007, presented in Aachen, Germany, RWTH Aachen, Dec. 2007, |
[480] T. Hoefler, M. Mosch, T. Mehlan, W. Rehm: | ||
CollGM - A Myrinet/GM optimized collective component for Open MPI
In Proceedings of 3rd KiCC Workshop 2007, presented in Aachen, Germany, RWTH Aachen, Dec. 2007, |
[481] A. Friedley, T. Hoefler, M. Leininger, A. Lumsdaine: | ||
Scalable High Performance Message Passing over InfiniBand for Open MPI
In Proceedings of 3rd KiCC Workshop 2007, presented in Aachen, Germany, RWTH Aachen, Dec. 2007, |
[482] T. Hoefler, A. Lumsdaine, W. Rehm: | ||
Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI
In Proceedings of the 2007 International Conference on High Performance Computing, Networking, Storage and Analysis, SC07, presented in Reno, USA, IEEE Computer Society/ACM, Nov. 2007, (acceptance rate 20%, 54/268) |
[483] T. Hoefler, P. Kambadur, R. L. Graham, G. Shipman, A. Lumsdaine: | ||
A Case for Standard Non-Blocking Collective Operations
Vol 4757, In Recent Advances in Parallel Virtual Machine and Message Passing Interface, EuroPVM/MPI 2007, presented in Paris, France, pages 125-134, Springer, ISSN: 0302-9743, ISBN: 978-3-540-75415-2, Oct. 2007, |
[484] T. Schneider, S. Wunderlich, W. Rehm, T. Hoefler, H. Schick: | ||
Code Optimization for Cell/B.E. - Opportunities for ABINIT
In IBM CASCON 2006 Symposium, presented in Dublin, Ireland, IBM, Oct. 2007, Research poster. |
[485] T. Hoefler: | ||
Non-blocking Collectives for MPI-2
(Presentation) Dresden University of Technology, Center for Information Services and High Performance Computing (ZIH). presented in Dresden, Germany, Oct. 2007, |
[486] T. Hoefler, T. Mehlan, A. Lumsdaine, W. Rehm: | ||
Netgauge: A Network Performance Measurement Framework
Vol 4782, In Proceedings of High Performance Computing and Communications, HPCC'07, presented in Houston, USA, pages 659-671, Springer, ISBN: 978-3-540-75443-5, Sep. 2007, |
[487] T. Hoefler, P. Gottschling, A. Lumsdaine, W. Rehm: | ||
Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations
Elsevier Journal of Parallel Computing (PARCO). Vol 33, Nr. 9, pages 624-633, Elsevier, ISSN: 0167-8191, Sep. 2007, |
[488] F. Mietke, T. Mehlan, T. Hoefler, W. Rehm: | ||
Stand HPC Cluster CHiC
(Presentation) TU Chemnitz. presented in Chemnitz, Germany, Apr. 2007, |
[489] F. Mietke, T. Hoefler, T. Mehlan, W. Rehm: | ||
Diskless Cluster und Lustre - Erfahrungsbericht zum CHiC
(Presentation) TU Chemnitz. presented in Chemnitz, Germany, Apr. 2007, |
[490] T. Hoefler, C. Siebert, W. Rehm: | ||
A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast
TU Chemnitz. In Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium (CAC'07 Workshop), presented in Long Beach, CA, USA, pages 232, IEEE Computer Society, ISBN: 1-4244-0909-8, Mar. 2007, |
[491] T. Hoefler, A. Lichei, W. Rehm: | ||
Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks
TU Chemnitz. In Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium, PMEO'07 Workshop, presented in Long Beach, CA, USA, IEEE Computer Society, ISBN: 1-4244-0909-8, Mar. 2007, |
[492] T. Hoefler, G. Zerah: | ||
Transforming the high-performance 3d-FFT in ABINIT to enable the use of non-blocking collective operations
Commissariat a l'Energie Atomique - Direction des applications militaires (CEA-DAM). presented in Bruyeres-le-Chatel, France, Feb. 2007, |
[493] F. Mietke, D. Dunger, T. Mehlan, T. Hoefler, W. Rehm: | ||
A native InfiniBand Transporter for MySQL Cluster
TU Chemnitz. In Proceedings of the 2nd Workshop 'Kommunikation in Clusterrechnern und Clusterverbundsystemen' (KiCC'07), presented in Chemnitz, Germany, Feb. 2007, |
[494] T. Hoefler: | ||
Application Optimization with non-blocking Collectives
(Presentation) Commissariat a l'Energie Atomique - Direction des applications militaires (CEA-DAM). presented in Bruyeres-le-Chatel, France, Jan. 2007, |
[495] T. Hoefler, G. Zerah: | ||
Optimization of a parallel 3d-FFT with non-blocking collective operations
(Presentation) Invited to the 3rd International ABINIT Developer Workshop. presented in Liege, Belgium, Jan. 2007, |
[496] T. Hoefler: | ||
Non-Blocking Collectives for MPI-2
(Presentation) Commissariat a l'Energie Atomique - Direction des applications militaires (CEA-DAM). presented in Bruyeres-le-Chatel, France, Jan. 2007, |
[497] T. Hoefler, J. Squyres, W. Rehm, A. Lumsdaine: | ||
A Case for Non-Blocking Collective Operations
Vol 4331/2006, In Frontiers of High Performance Computing and Networking - ISPA'06 Workshops, presented in Sorrento, Italy, pages 155-164, Springer Berlin / Heidelberg, ISBN: 978-3-540-49860-5, Dec. 2006, |
[498] T. Hoefler: | ||
Non-blocking Collectives for MPI-2
(Presentation) C&C Research Laboratories, NEC Europe Ltd. presented in Sankt Augustin, Germany, Nov. 2006, |
[499] T. Hoefler, R. Janisch, W. Rehm: | ||
Parallel scaling of Teter's minimization for Ab Initio calculations
presented in Tampa, FL, USA, Nov. 2006, Presented at the HPC Nano workshop in conjunction with the IEEE International Conference on Supercomputing (SC'06) |
[500] T. Mehlan, J. Strunk, T. Hoefler, F. Mietke, W. Rehm: | ||
IRS - A portable Interface for Reconfigurable Systems
In Proceedings of IEEE International Conference on Parallel Computing in Electrical Engineering (PARELEC'06), presented in Bialystok, Poland, pages 187-191, IEEE Computer Society, ISBN: 0-7695-2554-7, Sep. 2006, |
[501] T. Hoefler, P. Gottschling, W. Rehm, A. Lumsdaine: | ||
Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations
In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User's Group Meeting, Proceedings, LNCS 4192, presented in Bonn, Germany, pages 374-382, Springer, ISSN: 0302-9743, ISBN: 3-540-39110-X, Sep. 2006, Invited to a journal special issue on top picks from EuroMPI'06. |
[502] T. Hoefler, C. Viertel, T. Mehlan, F. Mietke, W. Rehm: | ||
Assessing Single-Message and Multi-Node Communication Performance of InfiniBand
In Proceedings of IEEE International Conference on Parallel Computing in Electrical Engineering (PARELEC'06), presented in Bialystok, Poland, pages 227-232, IEEE Computer Society, ISBN: 0-7695-2554-7, Sep. 2006, |
[503] T. Hoefler, J. Squyres, G. Fagg, G. Bosilca, W. Rehm, A. Lumsdaine: | ||
A New Approach to MPI Collective Communication Implementations
In Distributed and Parallel Systems - From Cluster to Grid Computing (DAPSYS'06), presented in Innsbruck, Austria, pages 45-54, Springer, ISBN: 978-0-387-69857-1, Sep. 2006, |
[504] T. Hoefler, A. Lumsdaine: | ||
Design, Implementation, and Usage of LibNBC
Open Systems Lab, Indiana University. presented in Bloomington, IN, USA, School of Informatics, Aug. 2006, |
[505] T. Hoefler, J. Squyres, G. Bosilca, G. Fagg, A. Lumsdaine, W. Rehm: | ||
Non-Blocking Collective Operations for MPI-2
Open Systems Lab, Indiana University. presented in Bloomington, IN, USA, School of Informatics, Aug. 2006, |
[506] F. Mietke, R. Baumgartl, R. Rex, T. Mehlan, T. Hoefler, W. Rehm: | ||
Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack
In Proceedings of Euro-Par 2006 Parallel Processing, presented in Dresden, Germany, pages 124-133, Springer-Verlag Berlin, ISBN: 3-540-37783-2, Aug. 2006, (acceptance rate 37.9%, 110/290) |
[507] T. Hoefler, M. Reinhardt, F. Mietke, T. Mehlan, W. Rehm: | ||
Low Overhead Ethernet Communication for Open MPI on Linux Clusters
TU Chemnitz. Vol CSR-06, Nr. 06, In Chemnitzer Informatik Berichte, presented in Chemnitz, Germany, ISSN: 0947-5125, Jul. 2006, |
[508] R. Riesen, C. Vaughan, T. Hoefler: | ||
What if MPI Collective Operations Were Instantaneous?
Cray Inc. In Proceedings of the 2006 Cray User Group Meeting, presented in Lugano, Switzerland, May 2006, |
[509] T. Hoefler, T. Mehlan, F. Mietke, W. Rehm: | ||
Fast Barrier Synchronization for InfiniBand
In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS), CAC'06 Workshop, presented in Rhodes, Greece, ISBN: 1-4244-0054-6, Apr. 2006, |
[510] T. Hoefler: | ||
Open MPI - Collv2 Design
(Presentation) Cisco Systems. presented in San Jose, CA, USA, Apr. 2006, |
[511] T. Hoefler, T. Mehlan, F. Mietke, W. Rehm: | ||
LogfP - A Model for small Messages in InfiniBand
In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS), PMEO-PDS'06 Workshop, presented in Rhodes, Greece, ISBN: 1-4244-0054-6, Apr. 2006, |
[512] T. Hoefler, T. Mehlan, F. Mietke, W. Rehm: | ||
Adding Low-Cost Hardware Barrier Support to Small Commodity Clusters
In Proceedings of the 19th International Conference on Architecture of Computing Systems - ARCS'06, presented in Frankfurt, Germany, pages 343-350, ISBN: 3-88579-175-7, Mar. 2006, |
[513] T. Hoefler: | ||
Parallelization Options for the Band-by-Band Minimization of Teter et al.
(Presentation) Universite catholique de Louvain. presented in Louvain-la-Neuve, Belgium, Feb. 2006, |
[514] R. Kullmann, T. Hoefler: | ||
A short Performance Analysis of Abinit under different build environments
TU Chemnitz. presented in Chemnitz, Germany, Jan. 2006, |
[515] T. Hoefler: | ||
The Cell Processor
22nd Chaos Communication Congress. In 22C3 Proceedings, presented in Berlin, Germany, pages 286-292, ISBN: 3-934636-04-7, Dec. 2005, |
[516] T. Hoefler, R. Janisch, W. Rehm: | ||
A Performance Analysis of ABINIT on a Cluster System
TU Chemnitz. In Parallel Algorithms and Cluster Computing, presented in Chemnitz, Germany, pages 37-51, Springer, Lecture Notes in Computational Science and Engineering, ISBN: 3-540-33539-0, Dec. 2005, |
[517] T. Hoefler, R. Janisch, W. Rehm: | ||
Improving the parallel scaling of ABINIT
CINECA Consorzio Interuniversitario. In Science and Supercomputing in Europe - Report 2005, presented in Casalecchio di Reno, Italy, pages 551-559, CINECA Consorzio Interuniversitario, ISBN: 88-86037-17-1, Dec. 2005, |
[518] T. Hoefler, J. Squyres, T. Mehlan, F. Mietke, W. Rehm: | ||
Implementing a Hardware-based Barrier in Open MPI
TU Chemnitz. In Proceedings of 2005 KiCC Workshop, Chemnitzer Informatik Berichte, presented in Chemnitz, Germany, ISSN: 0947-5125, Nov. 2005, |
[519] T. Mehlan, T. Hoefler, F. Mietke, W. Rehm: | ||
Integration of the SISCI Shared Memory Interface into Open MPI
TU Chemnitz. In Proceedings of 2005 KiCC Workshop, Chemnitzer Informatik Berichte, presented in Chemnitz, Germany, ISSN: 0947-5125, Nov. 2005, |
[520] F. Mietke, R. Rex, T. Hoefler, T. Mehlan, W. Rehm: | ||
Reducing the Impact of Memory Registration in InfiniBand
TU Chemnitz. In Proceedings of 2005 KiCC Workshop, Chemnitzer Informatik Berichte, presented in Chemnitz, Germany, ISSN: 0947-5125, Nov. 2005, |
[521] T. Hoefler, W. Rehm: | ||
Communication/Computation Overlap in MPI
(Presentation) Technical University of Chemnitz. presented in Chemnitz, Germany, Nov. 2005, |
[522] T. Hoefler: | ||
Fast Barrier Synchronization for InfiniBand
(Presentation) Technical University of Chemnitz. presented in Munich, Germany, Sep. 2005, |
[523] T. Hoefler, W. Rehm: | ||
A short Performance Analysis of Abinit on a Cluster System
Computer Architecture Technical Report. Technical University of Chemnitz. presented in Chemnitz, Germany, Jul. 2005, |
[524] T. Hoefler, W. Rehm: | ||
A Communication Model for Small Messages with InfiniBand
PARS. In PARS Mitteilungen, presented in Luebeck, Germany, pages 32-41, PARS, ISSN: 0177-0454, Jun. 2005, PARS Junior Researcher Prize |
[525] T. Hoefler, L. Cerquetti, T. Mehlan, F. Mietke, W. Rehm: | ||
A practical approach to the rating of barrier algorithms using the LogP model and Open-MPI
In Proceedings of the 2005 International Conference on Parallel Processing Workshops, presented in Oslo, Norway, pages 562-569, ISBN: 0-7695-2381-1, Jun. 2005, |
[526] F. Mietke, M. Steiger, T. Mehlan, T. Hoefler, W. Rehm: | ||
SHIBA Shared Memory Support for InfiniBand MPICH2 Device
In PARS Mitteilungen 2005, presented in Luebeck, Germany, pages 14-23, ISSN: 0177-0454, Jun. 2005, |
[527] T. Hoefler: | ||
Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks
Technical University of Chemnitz. presented in Chemnitz, Germany, Apr. 2005, TU Chemnitz Best Student Award, 2005 |
[528] T. Hoefler: | ||
Remote Network Analysis
21st Chaos Communication Congress. In 21C3 Proceedings, presented in Berlin, Germany, pages 33-37, ISBN: 3-934636-02-0, Dec. 2004, |
[529] T. Hoefler, T. Mehlan, F. Mietke, W. Rehm: | ||
A Survey of Barrier Algorithms for Coarse Grained Supercomputers
Chemnitzer Informatik Berichte. Technical University of Chemnitz. Vol 04, Nr. 03, presented in Chemnitz, Germany, ISSN: 0947-5125, Dec. 2004, |
[530] T. Hoefler, W. Rehm: | ||
A Meta Analysis of Gigabit Ethernet over Copper Solutions for Cluster-Networking
Chemnitzer Informatik Berichte. Technical University of Chemnitz. Vol 04, Nr. 04, presented in Chemnitz, Germany, ISSN: 0947-5125, Dec. 2004, |