Previous events


Modelling and simulation of complex fluids

May, 2016

Visiting Nelder fellow Prof. John Tsamopoulos delivered a series of lectures on complex fluids:

Rheology of complex fluids: introduction

Part 2: Shear flows

Part 3: Extensional flows

Part 4: Generalized Newtonian fluids

Part 5: Linear viscoelasticity

Part 6: Nonlinear viscoelasticity


HPC summer school

September 28th - October 2nd, 2015

The HPC summer school brought together the HPC support team, scientific community and businesses for one week of tutorials, lectures, community sessions and exchange of ideas.

Abstractions for Data-Centric Computing

Didem Unat, Koç University
Wednesday July 29th at 15:30 in Huxley 217

Programming models play a crucial role in providing the necessary tools to express locality and minimize data movement, while also abstracting complexity from programmers. Unfortunately, existing compute-centric programming environments provide few abstractions to manage data movement and locality, and rely on a large shared cache to virtualize data movement. We propose three programming abstractions (tiles, layout and loop traversal) that address data locality and increased parallelism on emerging parallel computing systems. The TiDA library implements these abstractions in the data structures through domain decomposition and provides performance-portable codes. We demonstrate how TiDA provides high performance with minimal coding effort on current and future NUMA node architectures.
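As a rough illustration of the tiling and loop-traversal ideas mentioned in the abstract, the sketch below splits a 2D grid into rectangular tiles and visits the grid one tile at a time. It is a generic example and does not use the TiDA API; the names `tiles` and `tiled_sum` and the tile sizes are assumptions made for illustration only.

```python
def tiles(n_rows, n_cols, tile_rows, tile_cols):
    """Yield (row_range, col_range) pairs that together cover the grid once."""
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            # Clip the last tile in each direction to the grid boundary.
            yield (range(r0, min(r0 + tile_rows, n_rows)),
                   range(c0, min(c0 + tile_cols, n_cols)))


def tiled_sum(grid, tile_rows=2, tile_cols=2):
    """Sum all grid entries, traversing the grid tile by tile."""
    n_rows, n_cols = len(grid), len(grid[0])
    total = 0
    for rows, cols in tiles(n_rows, n_cols, tile_rows, tile_cols):
        # All work on one tile is done before moving to the next,
        # so the data touched at any moment stays within a small block.
        for i in rows:
            for j in cols:
                total += grid[i][j]
    return total


if __name__ == "__main__":
    grid = [[i * 5 + j for j in range(5)] for i in range(3)]
    print(tiled_sum(grid))
```

Separating the tile generator from the loop body means the traversal order can be changed without touching the computation, which is the kind of separation the abstract describes.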

Bio: Didem Unat joined Koç University in Istanbul in September 2014 as a full-time faculty member. Previously she was at the Lawrence Berkeley National Laboratory. She received the Luis Alvarez Fellowship in 2012 at the Berkeley Lab. Her research interests lie primarily in high performance computing, parallel programming models, compiler analysis and performance modeling. She is currently working on designing and evaluating programming models for future exascale architectures, as part of a hardware/software co-design project. She received her Ph.D. in Prof. Scott B. Baden's research group at the University of California, San Diego. In her thesis, she developed the Mint programming model and its source-to-source compiler to facilitate GPGPU programming. She holds a B.S. in computer engineering from Boğaziçi University.

ExaStencils: Towards the automatic generation of highly parallel multigrid implementations

Hannah Rittich (Univ Wuppertal)
Friday June 26th 2015 at 10:30 in Huxley 218

In an ideal (scientific computing) world we would have software that requires only a partial differential equation (PDE) as input to compute its solution efficiently on a highly parallel computer. ExaStencils is one of the research projects that work towards this ideal world. Within the project we are especially focused on efficient and highly parallel multigrid solvers. To implement them we design domain specific languages (DSLs) that allow us to conveniently describe the problems and the algorithms that solve them. These DSLs are designed to be translated into efficient and parallel code easily. Furthermore, to achieve good performance of the solver, we employ methods that automatically choose the more efficient algorithms that solve the given PDE on the given hardware. I would like to present the current state of the project, and the lessons we have learned so far.

Hannah Rittich is a PhD student at the University of Wuppertal in the group of Andreas Frommer, supervised by Matthias Bolten. She is working on the ExaStencils project.

A Nonhydrostatic Atmospheric Dynamical Core in CAM-SE

Henry Tufo, University of Colorado
Friday June 5th at 4pm, location to be announced

In order to perform climate simulations in the Community Earth System Model (CESM) with atmospheric resolutions well beyond 1/8th of a degree, a number of new technologies must be put into place. Most fundamentally, the primitive equations of motion utilized by the Community Atmosphere Model (CAM) dynamical core must be replaced or augmented with one or more nonhydrostatic models of the rotating, compressible Euler / Navier-Stokes system. Additionally, these solvers must perform at extremely high levels of throughput and scale with near-optimal parallel efficiency in order to make such expensive high-resolution simulations practical. In this talk, we will discuss our efforts to develop a nonhydrostatic model in CAM-SE, its parallel scaling, and related supporting technologies.

Dr. Henry Tufo received his Ph.D. from Brown University in 1998. Henry was a member of the DOE ASC Center for Astrophysical Thermonuclear Flashes at the University of Chicago and Argonne National Laboratory from 1998 to 2002, and from 2002 to 2013 was a Scientist and Computer Science Section Head at the National Center for Atmospheric Research. He is currently a Full Professor of Computer Science at the University of Colorado and Director and founder of the Computational Science Center. Henry conducts research in high-performance scientific computing, parallel architectures and algorithms, Linux clusters, scalable solvers, high-order numerical methods, computational fluid dynamics, and flow visualization. He is co-developer of NEK5000, long-time contributor to the HOMME effort, lead architect for the Janus computer system and its co-designed facility, manager of NCAR’s TeraGrid resources, and recipient of the Gordon Bell award in 1999 and 2000 for demonstrated excellence in high-performance and large-scale parallel computing.

Two talks by Stephane Popinet

Adaptive numerical methods for fluid mechanics

Date: Thursday June 4
Time & place: 4pm Huxley 340

The equations of fluid mechanics can be used to describe natural processes over a wide range of scales, from the behaviour of micro-organisms to astrophysics. Each of these processes is in turn often controlled by internal interactions on widely different scales. Numerical methods able to efficiently resolve these interactions are, in combination with theoretical analysis and lab experiments, an essential tool for advancing our understanding. I will give a general overview of the hierarchical numerical methods I have worked on, as implemented within the free software Gerris Flow Solver and Basilisk, and discuss a range of applications including microscale high-energy droplet dynamics, multiphase and complex flows, tsunamis and climate dynamics.

Continuum modelling of granular materials

Date: Friday June 5
Time and place: 11am Huxley 340

Granular materials such as sand/gravel are notoriously difficult to model. One approach is to treat them as fluids with a specific "non-Newtonian" rheology. Although this approach dates back to the early 20th century, rheologies suited to dry granular materials have only been discovered very recently. I will show how these advances can be combined with numerical methods to obtain very accurate models of avalanches and other granular flows.

Stephane Popinet is a member of the Complex Fluids and Hydrodynamic Instabilities group of the Institut Jean Le Rond d'Alembert, which is part of the Universite Pierre et Marie Curie (Paris 6) and CNRS. Stephane is the author of the public domain hydrodynamics software Gerris and Basilisk, and has published extensively in the area of multiphase flows, complex fluids and droplet dynamics, among others. He is one of the leading figures in the numerical simulation of fluid flows and combines expertise in both fluid dynamics and scientific computation to address complex nonlinear hydrodynamics problems.

Sneaking up on reliable and effective jet noise control

Daniel J Bodony (Univ Illinois)
Huxley 130 2-3pm on Tuesday May 5th

Abstract: The loudest source of high-speed jet noise, such as found on naval tactical fighters, appears to be unsteady wavepackets that are acoustically efficient but relatively weak compared to the main jet turbulence. These wavepackets can be usefully described by linear dynamics and connected to transient growth mechanisms. Through a component-wise structural sensitivity analysis of the turbulent jet baseflow, using both the equilibrium and time-average fields, estimates are given as to what location and kind of actuators and sensors are most effective, in a linear feedback context, to control the wavepackets to reduce their noise. Low and high frequency approaches are examined where the controlling mechanisms differ: the low-frequency control indirectly targets the slow variation of the mean on which the wavepackets propagate, while the high-frequency control targets the wavepackets themselves. The predicted control strategy is evaluated using direct numerical simulations on a series of Mach 1.5 turbulent jets.

Bio: Daniel J. Bodony is the Blue Waters Associate Professor and Donald Biggar Willett Faculty Scholar in the Department of Aerospace Engineering at the University of Illinois. Prior to joining the University of Illinois he was an engineering research associate at the NASA Ames/Stanford Center for Turbulence Research. He received his PhD from Stanford University in 2005 and an NSF CAREER award in fluid dynamics in 2012, and he is an Associate Fellow of the AIAA.

OPS - An abstraction from multi-block structured mesh computations

Istvan Reguly (Oxford University)
Monday April 27th, 11am in Huxley 144

In this talk Istvan will introduce the OPS abstraction for multi-block structured grids and discuss some of the targeted applications, primarily CloverLeaf. CloverLeaf is a mini-app from AWE with several hand-tuned implementations for various architectures, which serve as good baselines for comparison with high-level approaches such as OPS. He will discuss performance with MPI, OpenMP, CUDA, OpenCL and OpenACC, both on a single node, including IBM's Power8, and on large-scale systems. Referring back to OP2, he will discuss strategies for GPU and vectorised CPU code generation and the challenges involved from the perspective of a simple code-to-code transformation tool in Python. Finally, he will look at OPS's checkpointing functionality and how it can work near-optimally with very little user input, by relying on the loop chaining abstraction and the high-level description of computations in OPS.

Bio: Istvan has been working as an RA at Oxford on high-level abstractions and Domain Specific Languages for scientific computing, and also as a GPU specialist associated with the GPU-accelerated Emerald supercomputer. He has a PhD from the Pázmány Péter Catholic University in Hungary.

A bilevel programming problem occurring in smart grids

Prof. Leo Liberti (CNRS LIX, Ecole Polytechnique) on
7th May, 11am, CPSE Seminar Room, RODH C615, Roderic Hill Building. Refreshments beforehand in the CPSE Common Room (Centre for Process Systems Engineering 2015 Seminar Series).

Abstract: A key property that makes a power grid "smart" is its real-time, fine-grained monitoring capability. For this reason, a variety of monitoring equipment must be installed on the grid. We look at the problem of fully monitoring a power grid by means of Phasor Measurement Units (PMUs), which is a graph covering problem with some equipment-specific constraints. We show that, surprisingly, a bilevel formulation turns out to provide the most efficient algorithm.

Computational Progress in Linear and Mixed Integer Programming

Prof. Robert Bixby
13th May, 11am, CPSE Seminar Room, RODH C615, Roderic Hill Building. Refreshments beforehand in the CPSE Common Room (Centre for Process Systems Engineering 2015 Seminar Series).

Abstract: We will look at the progress in linear and mixed-integer programming software over the last 25 years. As a result of this progress, modern linear programming codes are now capable of robustly and efficiently solving instances with multiple millions of variables and constraints. With these linear programming advances as a foundation, mixed-integer programming then provides the modeling framework and solution technology that enables the overwhelming majority of present-day business planning and scheduling applications, and is the key technology behind prescriptive analytics. The performance improvements in mixed-integer programming codes over the last 25 years have been nothing short of remarkable, well beyond those of linear programming, and have transformed this technology into an out-of-the-box tool with applications to an almost unlimited range of real-world problems.

Towards better HPC with Allinea tools

Date: 22nd April 2015, venue: RSM 3.35

Abstract: Allinea offers a wide range of products for production and development environments to help the HPC community improve supercomputer workloads. Allinea Forge is a professional development environment designed for the challenges that face software developers with multi-threaded and multi-process codes. This toolkit is here to resolve any defects during the development workflow, from initial design to the integration with testing facilities. With Allinea Forge, HPC developers can easily:

  • Discover the critical performance bottlenecks
  • Measure the scaling properties of codes across threads and processes
  • Observe and control threads and processes simultaneously
  • Resolve complex destructive bugs

Once an application is optimized and debugged, it goes to production. At this stage of an application's life cycle, Allinea Performance Reports is very useful to make sure that the application is running to its full capabilities. By finding the sweet spot, it is possible to drastically increase the number of runs within a given timeframe and make better use of the allocations available.

During this hands-on workshop, we will discover Allinea tools on the ICL cluster. At the end of the session, the attendees will:

  • have the basic knowledge necessary to get started with the tools in a real-life environment
  • know how to choose the right feature to be used during the development activity
  • be able to conduct benchmarks easily and analyze performance reports to make the right submission choices

During this workshop, an equal amount of time will be spent on Allinea DDT (the parallel debugger), Allinea MAP (the parallel profiler), and Allinea Performance Reports (the application analysis tool).

Proposed Agenda:
14:00 - 14:15: Roundtable and introduction
14:15 - 14:30: Presentation of Allinea tools (ppt)
14:30 - 15:15: Getting started with Allinea Forge: optimizing a simple code
15:15 - 16:00: From profiling to debugging with Allinea Forge
16:00 - 16:45: Improving an HPC workload with Allinea Performance Reports

BEM++ - Efficient solution of boundary integral equation problems

Date: Thursday 26 Feb 2015
Time: 16:00 - 17:00
Venue: Huxley 340
Campus: South Kensington Campus
Speaker: Timo Betcke, UCL
Contact: Pavel Berloff

The BEM++ boundary element library is a software project that was started in 2010 at University College London to provide an open-source general purpose BEM library for a variety of application areas. In this talk we introduce the underlying design concepts of the library and discuss several applications, including high-frequency preconditioning for ultrasound applications, the solution of time-domain problems via convolution quadrature, light-scattering from ice crystals, and the solution of coupled FEM/BEM problems with FEniCS and BEM++.

Worst-case complexity of nonlinear optimization: Where do we stand?

Speaker: Philippe Toint, Uni de Namur
Wed 11th February at 11am, CPSE Seminar Room RODH C615, Roderic Hill Bldg
Organiser: Ruth Misener, DoC

We review the available results on the evaluation complexity of algorithms using Lipschitz-continuous Hessians for the approximate solution of nonlinear and potentially nonconvex optimization problems. Here, evaluation complexity is a bound on the largest number of problem function (objective, constraints) and derivative evaluations that are needed before an approximate first-order critical point of the problem is guaranteed to be found. We start by considering the unconstrained case and examine classical methods (such as Newton's method) and the more recent ARC2 method, which we show is optimal under reasonable assumptions. We then turn to constrained problems and analyze the case of convex constraints first, showing that a suitable adaptation ARC2CC of the ARC2 approach also possesses remarkable complexity properties. We finally extend the results obtained in simpler settings to the general equality and inequality constrained nonlinear optimization problem by constructing a suitable ARC2GC algorithm whose evaluation complexity exhibits the same remarkable properties.

Philippe L. Toint (born 1952) received his degree in Mathematics from the University of Namur (Belgium) in 1974 and his Ph.D. in 1978 under the guidance of Prof M.J.D. Powell. He was appointed as lecturer at the University of Namur in 1979, where he became associate professor in 1987 and full professor in 1993. Since 1979, he has been the co-director of the Numerical Analysis Unit and director of the Transportation Research Group in this department. He was in charge of the University Computer Services from 1998 to 2000 and director of the Department of Mathematics from 2006 to 2009. He currently serves as Vice-rector for Research and IT for the university. His research interests include numerical optimization, numerical analysis and transportation research. He has published four books and more than 280 papers and technical reports. Elected as SIAM Fellow (2009), he was also awarded the Beale-Orchard-Hayes Prize (1994, with Conn and Gould) and the Lagrange Prize in Continuous Optimization (2006, with Fletcher and Leyffer). He is the past Chairman (2010-2013) of the Mathematical Programming Society, the international scientific body gathering most researchers in mathematical optimization world-wide. Married and father of two girls, he is a keen music and poetry lover as well as an enthusiastic scuba diver.

Electron correlation in van der Waals interactions

Ridgway Scott, Professor of Computer Science and of Mathematics at the University of Chicago
Coordinates: Room G01, Royal School of Mines, 12:00 Monday January 26th 2015

We examine a technique of Slater and Kirkwood which provides an exact resolution of the asymptotic behavior of the van der Waals attraction between two hydrogen atoms. We modify their technique to make the problem more tractable analytically and more easily solvable by numerical methods. Moreover, we prove rigorously that this approach provides an exact solution for the asymptotic electron correlation. The proof makes use of recent results that utilize the Feshbach-Schur perturbation technique. We provide visual representations of the asymptotic electron correlation (entanglement) based on the use of Laguerre approximations.


Formal Alchemy for Real-World HPC Parallelism

Ganesh Gopalakrishnan Professor, School of Computing, University of Utah
11:00 Friday 31 October in Huxley 217

Long-lived parallel programming notations for high performance computing tend to grow by accretion of features. Deconstructing these notations into their elemental components may allow us to view what once appeared ugly and confusing as basically beautifully put together, with only a mild amount of tarnish. We will demonstrate that there is some truth to this assertion by presenting how we once deconstructed MPI - a venerable API for HPC - and built an active testing tool that helps "predict" the behavior of MPI programs. The key binding principle turned out to be a somewhat non-traditional "matches before" relation that is aware of not only program control flow but also the status of message buffer availability, helping us explain deadlocks that occur either when there is limited buffering or also when there is excessive buffering.

Unfortunately, the magic of such theories always seems to fall short of the full scope of any practical API; yet, we believe that the daunting world of HPC concurrency desperately needs such acts of alchemy. I will raise the question of how to deconstruct future APIs that may, in addition to behavioral elements, contain aspects of allowed degrees of floating-point result non-determinism or even fault-handling methods. I hope to explain our current projects in this context.

A Computational Viewpoint on Classical Density Functional Theory

Dr Matt Knepley, Senior Research Associate, Computation Institute, Univ. of Chicago
2.42 at 4pm, Thursday October 9th 2014.

Classical Density Functional Theory (CDFT) has become a powerful tool for investigating the physics of molecular scale biophysical systems. It can accurately represent the entropic contribution to the free energy. When augmented with an electrostatic contribution, it can treat complex systems such as ion channels. We will look at the basic computational steps involved in the Rosenfeld formulation of CDFT, using the Reference Fluid Density (RFD) method of Gillespie. We present a scalable and accurate implementation using a hybrid numerical-analytic Fourier method.

Workshop: Computational Cardiac Electrophysiology

Organisers: Dr Chris Cantwell, Dr Richard Clayton, Professor Spencer Sherwin
15-16 May 2014, room RH266, Roderic Hill Building

With recent advances in computer technology it is now becoming feasible not only to simulate the highly complex electrophysiological processes which occur in myocardium, but also to do so on timescales which could enable modelling to directly influence and assist clinical intervention and mechanistic understanding across multiple scales.

This two-day workshop aims to bring together UK groups with an interest in using computer modelling to tackle basic science and clinical challenges in cardiac electrophysiology and to discuss recent advances.

Topics include:

  • Challenges in patient-specific modelling
  • Image acquisition, pre-processing and mesh generation
  • Clinical interfacing: assisting diagnosis and treatment
  • High-performance computing: towards real-time simulation
  • Continuous modelling: monodomain/bidomain, FEM, SEM, finite difference
  • Discrete modelling: cellular automata, modelling cell coupling and ion channel changes
  • Multiscale modelling from cell to organ: the right model for the job

As a secondary focus of the workshop, we hope to discuss and define a range of benchmark problems of increasing complexity, covering both cellular and tissue aspects of the electrophysiology models. A number of cardiac electrophysiology codes exist and some are widely used, but few benchmark problems have been designed to verify these codes and assess their performance in preparation for research and clinical use. By designing a suite of test problems, we can further increase confidence in the available software tools.

The Pochoir Stencil Compiler

Bradley Kuszmaul, MIT Computer Science and Artificial Intelligence Laboratory and Tokutek Inc
Friday January 17th at 11am, room 212 in the William Penney Building

A stencil computation repeatedly updates each point of a $d$-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on trapezoidal decompositions are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++, which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general $d$-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallel-loop implementations, typically running 2 to 10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making hyperspace cuts, which yield asymptotically more parallelism for the same cache efficiency.
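For readers unfamiliar with the term, a stencil computation as defined in the abstract can be sketched in a few lines. The example below is a plain, serial, three-point stencil on a 1D grid with periodic boundaries; it is the naive loop form, not Pochoir's specification language or its cache-oblivious algorithm. The function names and the coefficient `alpha` are illustrative assumptions.

```python
def heat_step(u, alpha=0.25):
    """Apply one three-point stencil update with periodic boundaries.

    Each new value depends only on the old value at the same point
    and at its two nearest neighbours.
    """
    n = len(u)
    return [
        u[i] + alpha * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n])
        for i in range(n)
    ]


def run_stencil(u0, steps, alpha=0.25):
    """Repeat the stencil update for a fixed number of time steps."""
    u = list(u0)
    for _ in range(steps):
        u = heat_step(u, alpha)
    return u


if __name__ == "__main__":
    print(heat_step([0.0, 0.0, 1.0, 0.0, 0.0]))  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

Each time step here sweeps the whole grid before the next step starts, which is the access pattern that cache-efficient decompositions such as those mentioned in the abstract aim to improve on.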


Workshop on Global Optimisation

Thursday December 19th, Centre for Process Systems Engineering Seminar Room C615

Global optimization problems arise in all disciplines of science, applied science and engineering, including molecular biology, computational chemistry, thermodynamics, process systems design and control, finance and transportation. Examples of important applications of global optimization are the structure prediction of molecules, protein folding, peptide docking, phase and chemical equilibrium, portfolio selection, air traffic control, chip layout and compaction problems, and the optimal design and control of non-ideal separations, to name a few.
This one-day workshop will provide a unique opportunity, for both academic and industrial communities, to discuss the latest developments in global optimization, meet experts in the field and explore new research directions. The workshop is supported by the Centre for Computational Science and Engineering (CMSE) at Imperial and jointly organised by the Department of Computing (DoC) and the Centre for Process Systems Engineering (CPSE) of Imperial College London.


  • Claire Adjiman (Imperial College London)
  • Roy L. Johnston (University of Birmingham)
  • Ruth Misener (Imperial College London)
  • Panos Parpas (Imperial College London)
  • Fabio Schoen (Universita degli Studi di Firenze)
  • David Wales (University of Cambridge)

Two talks by Jan Treibig on multicore performance engineering

Jan Treibig

The first is more of a research talk, while the second is more of a tutorial in making software run fast on multicore processors.

Talk 1: 11:00-12:00 on Monday December 9th 2013, Room 212, William Penney Building, Imperial College:

Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern Multi- and Manycore chips - Exploring performance and power properties of modern multicore chips via simple machine models

Talk 2: 11:00-13:00 on Tuesday Dec 10th 2013 in the Grantham Institute's Boardroom:

Performance Engineering for Multi-Core Processors

Performance optimization for multi-core processors is a complex task requiring intimate knowledge about algorithm, implementation level, and processor architecture. This talk will introduce the principles of modern computer architecture illustrated with micro-benchmarking explorations. Based on this knowledge, a pattern-driven performance engineering process will be described which focuses on a resource-utilization-based view of optimization. The proposed process will be illustrated with several examples.

Prospects for next-generation multigridding: low communication, low memory, adaptive, and resilient

Jed Brown, Argonne National Laboratory
5pm Thursday 26th September 2013, Room G.39, Royal School of Mines, Imperial College

Several hardware trends jeopardize traditional multilevel solvers: data motion rather than flops is becoming the leading hardware and energy cost, hybrid architectures move the primary compute capability and local memory bandwidth further from the global network, and the frequency of hardware failures is expected to increase. We propose to address the challenge of versatile multilevel solver performance at extreme scale with a radical departure from the status quo: inverting the multilevel philosophy from "coarse accelerates fine" to "fine improves accuracy of coarse", allowing the removal of "horizontal" communication and merging infrastructure with multiscale modeling. This approach is derived from a prescient and nearly forgotten observation by Achi Brandt over 30 years ago that a multigrid solve (specifically, evaluating functionals of the solution) can be performed in poly-logarithmic memory while preserving O(N) complexity. This ideal of ephemeral state is rarely feasible in real applications, but the same principle removes communication and breaks dependencies. This allows local recovery from faults even though implicit solves are globally coupled, is naturally suited to architectures in which only a small fraction of total compute capability is latency-optimized next to a fast network, and has the potential to transform the postprocessing workflow for PDE-based models to drastically reduce the amount of data that must be stored to disk without compromising analysis capability.

The Periodic Table of Finite Elements

Douglas Arnold, University of Minnesota
4pm Wednesday 12th June 2013, Room 340, Huxley Building, Imperial College

Finite element methodology, reinforced by deep mathematical analysis, provides one of the most important and powerful toolsets for numerical simulation. Over the past forty years a bewildering variety of different finite element spaces have been invented to meet the demands of many different problems. The relationship between these finite elements has often not been clear, and the techniques developed to analyze them can seem like a collection of ad hoc tricks. The finite element exterior calculus, developed over the last decade, has elucidated the requirements for stable finite element methods for a large class of problems, clarifying and unifying this zoo of methods, and enabling the development of new finite elements suited to previously intractable problems. In this talk, we will discuss the big picture that emerges, providing a sort of periodic table of finite element methods.

A Comparative Study of the Construction of Global Climate Models

Steve Easterbrook, University of Toronto
4pm Tuesday March 12th 2013, Room 217, Huxley Building, Imperial College

In the literature, comparisons between climate models tend to focus on how well each model captures various physical processes, and how skillful they are in reproducing the climatology of observational datasets. In this talk, we present a different type of comparison, based on an analysis of the software architecture of global climate models. As with the architecture of a historic building, the architecture of a typical climate model includes a mix of older and newer elements, and evidence of re-purposing as new generations of scientists have adapted the model to new uses. An analysis of the development history of each model shows a mix of 'essence' and 'accident'. The essence includes explicit design decisions that all climate modellers face (e.g. the choice of grids, selection of numerical methods for the dynamical core, use of particular coupling technologies, etc), while the accidental aspects represent constraints that are beyond the immediate control of a model's designers (e.g. available funding and expertise, the demands of different user groups, the rhythms imposed by collaborations between different research groups, etc). The talk will draw on observations from case studies of four major models from four different countries: the UK Met Office Hadley Centre (UKMO); the US National Center for Atmospheric Research (NCAR); the German Max-Planck Institute for Meteorology (MPI-M); and the French Institut Pierre Simon Laplace (IPSL). We will use differences in the architecture of these four models to illustrate some of the different organizational constraints faced by climate research labs. The result of the analysis suggests that there may be much more structural diversity among the current generation of earth system models than is revealed in recent comparisons of their climatology.

TAU - open-source performance tools for HPC

Sameer Shende will be giving a two-day training course on TAU at Imperial College London, 28-29 January.

TAU is one of the best known tools for performance tuning and analysis of parallel codes for high end computing. It is an essential tool for anyone serious about developing high performance parallel software.

Location: Room 3.38, Royal School of Mines, Imperial College London, SW7 2AZ
Date: 28-29 January 2013
Time: 09:00-17:00 each day with breaks for lunch, and morning and afternoon coffee.

To register for the event please contact Gerard Gorman. This is a free event and places are allocated on a first-come, first-served basis.

About TAU

TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, Python.

TAU (Tuning and Analysis Utilities) is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements.

All C++ language features are supported including templates and namespaces. The API also provides selection of profiling groups for organizing and controlling instrumentation. The instrumentation can be inserted in the source code using an automatic instrumentor tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java Virtual Machine, or manually using the instrumentation API.

TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. The user can quickly identify sources of performance bottlenecks in the application using the graphical interface. In addition, TAU can generate event traces that can be displayed with the Vampir, Paraver or JumpShot trace visualization tools.

About Sameer Shende

Sameer Shende serves as the Director of the Performance Research Lab, NIC, University of Oregon as well as being one of the developers of TAU.