Eu Nu Ma Tem NEW!
Eu nu mă tem că voi cădea, Oricât de-aproape-aş fi de val.Când El conduce luntrea mea, Eu liniştit privesc spre mal.Cor:(:Să fie furtună şi valuri, şi negură grea, Eu nu mă tem când Isus e-n corabia mea:)Eu nu mă tem, eu nu mă tem, Căci ştiu în cine am crezut.Când vin talazuri eu îl chem, Şi nici nu ştiu cum am trecut.Eu nu mă tem, şi vreau să spunCă Salvatorul meu e bun.Furtuna El a liniştit, Şi valurile le-a oprit.
Eu Nu Ma Tem
Since 2014, I'm full professor at Telecom SudParis/IP Paris. My research focuses on virtualization, operating systems, concurrency and language runtimes. I am particularly interested in improving the performance, the design and the safety of the systems. I lead the Parallel and Distributed Systems group of the computer science department of Telecom SudParis. After having been the chair of the french chapter of the ACM SIGOPS from 2011 to 2014, I acted as treasurer from 2014 to 2016. I received the PhD degree from UPMC Sorbonne Université in 2005 and my "Habilitation à diriger les recherche", also from UPMC Sorbonne Université, in 2012. From 2006 to 2014, I was associate professor at UPMC Sorbonne Université, in the LIP6 laboratory, and in 2005, I performed postdoctoral research at the Université de Grenoble "Jospeh Fourier".
If you are interested in a PhD, send me a note. You will need to demonstrate real scientific curiosity, and interest in research topics such as concurrent programming, virtualization, operating systems, language runtimes, etc.
During the last decade, the need for computational power has increased due to the emergence and fast evolution of fields such as data analysis or artificial intelligence. This tendency is also reinforced by the growing number of services and end-user devices. Due to physical constraints, the trend for new hardware has shifted from an increase in processor frequency to an increase in the number of cores per machine.
Directly concerned by this change, operating systems have evolved to include complex rules each pertaining to different hardware configurations. However, more often than not, resources management units are responsible for one specific resource and make a decision in isolation. Moreover, because of the complexity and fast evolution rate of hardware, operating systems, not designed to use a generic approach have trouble keeping up. Given the advance of virtualization technology, we propose a new approach to resource management in complex topologies using virtualization to add a small software layer dedicated to resources placement in between the hardware and a standard operating system.
Similarly, in user space applications, parallelism is an important lever to attain high perfor- mances, which is why high performance computing runtimes, such as MPI, are built to increase parallelism in applications. The recent changes in modern architectures combined with fast networks have made overlapping CPU-bound computation and network communication a key part of parallel applications. While some degree of overlap might be attained manually, this is often a complex and error prone procedure. Our proposal automatically transforms blocking communications into nonblocking ones to increase the overlapping potential. To this end, we use a separate communication thread responsible for handling communications and a memory protection mechanism to track memory accesses in communication buffers. This guarantees both progress for these communications and the largest window during which communication and computation can be processed in parallel.
The virtualization technology and the NUMA architecture both evolved independently to tackle different issues: reduce hardware usage cost for the first, produce more powerful hardware for the second. Nonetheless, nowadays, the hardware used in the cloud data centers uses NUMA architectures and thus, the virtual machines are executed atop such hardware. The virtualization software has, however, not been designed for NUMA architectures. Because of this poor integration, the applications executed inside a virtual machine running atop of a NUMA architecture may have low performance. As the combined use of NUMA architectures and virtualization is relatively recent, because of the cloud computing emergence, only a few works address this performance issue.
My PhD thesis addresses the challenge of efficiently virtualizing a NUMA architecture in a cloud infrastructure. In detail, my research is twofold. On the first side, my research has the goal of measuring how virtualization behaves on a NUMA architecture, and how and why a NUMA architecture changes the performance of virtualized applications.
Understanding the performance of a multi-threaded application is difficult. The threads interfere when they access the same hardware resource or the same synchronization primitive, which slows down their execution. Unfortunately, current profiling tools reports the hardware components or the synchronization primitives that saturate, but they cannot tell if the saturation is the cause of a performance bottleneck.
In this PhD these, I propose a holistic metric able to pinpoint the blocks of code that suffer interference the most, regardless of the interference cause. The metric relies on differential execution, but instead of comparing previously identified inefficient runs with efficient ones, I consider performance variation as a universal indicator of interference problems. With an evaluation of 27 applications I show that the metric can identify interference problems caused by 6 different kinds of interactions in 9 applications.
Large-scale multicore architectures create new challenges for garbage collectors (GCs). On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. In this thesis, we address this problem with NumaGiC, a GC with a mostly-distributed design.
In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism and increase memory access imbalance, by allowing threads to steal from other nodes when they are idle. NumaGiC strives to find a perfect balance between local access, memory access balance, and parallelism.
In this work, we compare NumaGiC with Parallel Scavenge and some of its incrementally improved variants on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160 GB to 350 GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves over- all performance by up to 94% over Parallel Scavenge, and increases the performance of the collector itself by up to 5.4 over Parallel Scavenge. In terms of scalability of GC throughput with increasing number of NUMA nodes, NumaGiC scales substantially better than Parallel Scavenge for all the applications. In fact in case of SPECjbb2005, where inter-node object references are the least among all, NumaGiC scales almost linearly.
Today, Java is regularly used to implement large multithreaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.
Other contributions presented in this thesis include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool developed with Julia Lawall that transforms POSIX locks into RCL locks. Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. Using RCL locks, performance is improved by up to 2.5 times with respect to POSIX locks on Memcached, and up to 11.6 times with respect to Berkeley DB with the TPC-C client. On an SPARC machine with two Sun Ultrasparc T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to POSIX locks on Memcached, and up to 7.9 times with respect to Berkeley DB with the TPC-C client.
Our first contribution, called Jasmin, aims at preventing resource sharing conflicts by isolating applications. Jasmin is a middleware for development, deployment and isolation of native, component-based and service-oriented applications targeted at embedded systems. Jasmin enables fast and easy cross-application communication, and uses Linux containers for lightweight isolation. Our second contribution, called Incinerator, is a subsystem in the Java Virtual Machine (JVM) aiming to resolve the problem of Java stale references, i.e., references to objects that should no more be used. Stale references can cause significant memory leaks in an OSGi-based smart home gateway, hence decreasing the amount of available memory, which increases the risks of memory sharing conflicts. With less than 4% overhead, Incinerator not only detects stale references, making them visible to developers, but also eliminates them, hence lowering the risks of resource sharing conflicts. Even in Java, memory sharing conflicts happen. Thus, in order to detect them, we propose our third contribution: a memory monitoring subsystem integrated into the JVM. Our subsystem is mostly transparent to application developers and also aware of the component model composing smart home applications. The system accurately accounts for resources consumed during cross-application interactions, and provides on-demand snapshots of memory usage statistics for the different service providers sharing the gateway. 041b061a72