Publication

Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

Related publications (32)

FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort

Paolo Ienne, Mikhail Asiatici, Damian Maiorano

String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as sorting of fixed-length keys. Handling variable-length keys in hardware is challenging and it is no surprise that no string sorte ...

IEEE COMPUTER SOC, 2020

Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks

Felix Schürmann, Michael Lee Hines

Modern asynchronous runtime systems allow the re-thinking of large-scale scientific applications. With the example of a simulator of morphologically detailed neural networks, we show how detaching from the commonly used bulk-synchronous parallel (BSP) exec ...

SPRINGER INTERNATIONAL PUBLISHING AG, 2019

Asynchronous Branch-Parallel Simulation of Detailed Neuron Models

Felix Schürmann, Michael Lee Hines

Simulations of electrical activity of networks of morphologically detailed neuron models allow for a better understanding of the brain. State-of-the-art simulations describe the dynamics of ionic currents and biochemical processes within branching topologi ...

2019

HetExchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines

Anastasia Ailamaki, Manolis Karpathiotakis, Raja Appuswamy, Periklis Chrysogelos

Modern server hardware is increasingly heterogeneous as hardware accelerators, such as GPUs, are used together with multicore CPUs to meet the computational demands of modern data analytics workloads. Unfortunately, query parallelization techniques used by ...

2019

Asynchronous Simulation of Neuronal Activity

Bruno Ricardo Da Cunha Magalhães

Simulations of the electrical activity of networks of morphologically-detailed neuron models allow for a better understanding of the brain. Short time to solution is critical in order to study long biological processes such as synaptic plasticity and learn ...

EPFL, 2019

Improving Main-memory Database System Performance through Cooperative Multitasking

Georgios Psaropoulos

Database systems access memory either sequentially or randomly. Contrary to sequential access and despite the extensive efforts of computer architects, compiler writers, and system builders, random access to data larger than the processor cache has been s ...

EPFL, 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

Thijs Vogels, Martin Jaggi

We study lossy gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well, or fail to achieve the target ...

2019

Computational characteristics and hardware implications of brain tissue simulations

Francesco Cremonesi

Understanding the link between the brain's anatomy and its function through computer simulations of neural tissue models is a widely used approach in computational neuroscience. This technique enables rapid prototyping and testing of hypotheses, allowing r ...

EPFL, 2019

Stop Crying Over Your Cache Miss Rate: Handling Efficiently Thousands of Outstanding Misses in FPGAs

Paolo Ienne, Mikhail Asiatici

FPGAs rely on massive datapath parallelism to accelerate applications even with a low clock frequency. However, applications such as sparse linear algebra and graph analytics have their throughput limited by irregular accesses to external memory for which ...

ASSOC COMPUTING MACHINERY, 2019

Linebacker: Preserving Victim Cache Lines in Idle Register Files of GPUs

Yunho Oh

Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size, prior techniques proposed warp throttling, which limits the number of active warps. Warp thr ...

ASSOC COMPUTING MACHINERY, 2019

Network-Compute Co-Design for Distributed In-Memory Computing

Alexandros Daglis

The booming popularity of online services is rapidly raising the demands for modern datacenters. In order to cope with data deluge, growing user bases, and tight quality of service constraints, service providers deploy massive datacenters with tens to hund ...

EPFL, 2018

Scalable Synchronization in Shared-Memory Systems: Extrapolating, Adapting, Tuning

Georgios Chatzopoulos

As hardware evolves, so do the needs of applications. To increase the performance of an application, there exist two well-known approaches. These are scaling up an application, using a larger multi-core platform, or scaling out, by distributing work to mul ...

EPFL, 2018