Following our great experiences in Google Summer of Code in 2011, 2012 and 2013, we are applying again in 2014!
Project ideas are grouped into categories: ideas related to the SixTrack accelerator simulation engine used in the LHC@Home application, to the Cling interpreter inside ROOT, to the performant and versatile IgProf profiler, to the CERN Virtual Machine (CernVM), and to the Geant4 simulation toolkit. We also have so-called 'Blue sky' ideas, which are rather raw and need further work before they become full projects. And we are open to other great ideas and look forward to hearing from students with new perspectives on our projects. The list of our mentors (and their areas of expertise) can be found below.
Our projects are all related to code used for the LHC accelerator and its experiments. They are almost as diverse as the software stack of the LHC experiments, ranging from adapting the CernVM file system to tracking hundreds of protons precisely over millions of turns around a model of the LHC accelerator. The LHC experiments have software frameworks that make use of common scientific software libraries for high-energy physics (HEP) and many other open source tools. CernVM and CernVM-FS provide a uniform runtime environment based on Scientific Linux. The software stack is fully open source; many parts of it are used outside the world of particle physics, for example in simulating detectors for medical imaging or in estimating the dose deposited in the sensitive electronics of satellites as they fly through the Earth's radiation belts.
We encourage students who plan to apply to contact us about their interest and to explain their project ideas as early as possible. Our experience from previous GSoC participations is that an initial student application frequently either needs to be reworked in close collaboration with a future mentor, or at least benefits from feedback and discussion. Please do not forget to provide some relevant information about yourself (for example a CV, past projects, personal page or blog, LinkedIn profile, GitHub account, etc.).
Before submitting an application please consult the official GSoC FAQ, where you can find good advice on writing a successful application. The application should be submitted through the GSoC web app before the 21st of March (19:00 UTC).
Project ideas
ROOT Object-Oriented data analysis framework
The ROOT system provides a set of object-oriented frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficient way. With the data defined as a set of objects, specialized storage methods give direct access to the separate attributes of the selected objects, without having to touch the bulk of the data. Included are histogramming methods in an arbitrary number of dimensions, curve fitting, function evaluation, minimization, and graphics and visualization classes that allow the easy setup of an analysis system able to query and process the data interactively or in batch mode, as well as a general parallel processing framework, PROOF, that can considerably speed up an analysis. Thanks to the built-in C++ interpreter, the command language, the scripting (or macro) language and the programming language are all C++. The interpreter allows fast prototyping of macros since it removes the time-consuming compile/link cycle. It also provides a good environment for learning C++. If more performance is needed, interactively developed macros can be compiled.
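To give a flavour of the macro workflow described above, here is a minimal sketch of a ROOT macro (file name and contents are purely illustrative). Saved as gauss_demo.C, it can be run interpreted with "root -l gauss_demo.C" or compiled on the fly with "root -l gauss_demo.C+" when more performance is needed.

    // gauss_demo.C -- illustrative ROOT macro: fill a histogram, fit it, save a plot.
    #include "TCanvas.h"
    #include "TF1.h"
    #include "TH1D.h"

    void gauss_demo() {
      TH1D h("h", "Gaussian sample;x;entries", 100, -4.0, 4.0);
      h.FillRandom("gaus", 10000);     // fill with 10k values sampled from a Gaussian
      TF1 f("f", "gaus", -4.0, 4.0);
      h.Fit(&f, "Q");                  // quiet Gaussian fit
      TCanvas c("c", "gauss_demo");
      h.DrawCopy();
      c.SaveAs("gauss_demo.png");
    }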
ROOT's new C++11 standard-compliant interpreter is Cling, an interpreter built on top of the Clang (www.clang.llvm.org) and LLVM (www.llvm.org) compiler infrastructure. Cling is being developed at CERN as a standalone project and is being integrated into the ROOT data analysis framework (root.cern.ch), giving ROOT access to a C++11 standard-compliant interpreter.
ROOT is an open system that can be dynamically extended by linking external libraries. This makes ROOT a premier platform on which to build data acquisition, simulation and data analysis systems. ROOT is the de facto standard data storage and processing system for all High Energy Physics labs and experiments worldwide. It is also used in other fields of science and beyond (e.g. finance, insurance, etc.).
Complete ROOT — R interface
Description: Complete the interface in ROOT for calling R functions using the R C++ interface (Rcpp, see http://dirk.eddelbuettel.com/code/rcpp.html). Make available in ROOT many of the statistical packages provided by R, such as those performing regression and/or multivariate analysis. Developing this interface opens the possibility of using from ROOT the very large set of mathematical and statistical tools provided by R. (A small illustrative sketch of embedding R in C++ follows the requirements below.)
Expected Results: Implement classes that follow the existing algorithm interfaces in ROOT, so that R functionality can be used directly from ROOT classes for statistical studies (e.g. multivariate analysis) without knowing the low-level details. Another objective is to package the ROOT-R interface in a library that can optionally be distributed with ROOT.
Mentor: Lorenzo Moneta
Requirements: Basic knowledge of C++. Knowledge of ROOT and/or R would be an advantage.
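As an illustration of the kind of coupling this project targets, the sketch below embeds R in a plain C++ program through RInside (which builds on Rcpp); it is not the ROOT-R interface itself, and the variable names are purely illustrative.

    // Minimal sketch: embed R in C++ via RInside/Rcpp (illustrative, not the ROOT-R API).
    #include <RInside.h>
    #include <iostream>
    #include <vector>

    int main(int argc, char* argv[]) {
      RInside R(argc, argv);                       // start the embedded R session

      std::vector<double> v = {1.2, 0.7, 3.4, 2.1};
      R["v"] = v;                                  // copy the C++ vector into R

      // Call R statistical functions and bring the results back to C++.
      double mean = Rcpp::as<double>(R.parseEval("mean(v)"));
      double sd   = Rcpp::as<double>(R.parseEval("sd(v)"));
      std::cout << "mean = " << mean << ", sd = " << sd << std::endl;
      return 0;
    }

The planned ROOT classes would hide such an embedding behind the existing ROOT algorithm interfaces.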
Coding rules and style checker based on the Clang Static Analyzer
Description: Code maintenance is much easier if coding and style rules are followed. This is the case for the ROOT project, for which a number of rules have been defined since the early days of the project. A commercial tool was used to check the code for rule violations and to present the results in a form that is easy for developers to act on. With a commercial solution, adding new rules has become a real problem. The idea is to re-implement the existing rules with an open source tool, which can be extended and adapted to also fulfill the needs of other software development projects.
Expected Results: Develop a new C++ code checker tool, possibly based on the Clang Static Analyzer, which initially implements the set of ROOT coding rules and is extensible to other sets of rules for different projects. The current coding rules that will need to be implemented are listed at http://root.cern.ch/root/nightly/codecheck/rules.html, and the results of the analysis should be presented in an easy and accessible way for developers to identify which rules have been violated by the latest commit to the repository. The existing (commercial) tool produces the following table: http://root.cern.ch/root/nightly/codecheck/codecheck.html. (A minimal sketch of a Clang-based checker follows the requirements below.)
Mentor: Pere Mato, Olivier Couet
Requirements: Basic knowledge of C++, basic knowledge of Clang/Clang Static Analyzer
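To illustrate one possible direction (not a required design), the sketch below uses Clang's LibTooling and AST matchers to flag a single ROOT naming rule, namely that data members start with 'f'; exact Clang APIs differ slightly between releases, so treat this as a starting point only.

    // Illustrative checker built on Clang LibTooling/AST matchers (a sketch, not existing code).
    #include "clang/AST/ASTContext.h"
    #include "clang/ASTMatchers/ASTMatchFinder.h"
    #include "clang/ASTMatchers/ASTMatchers.h"
    #include "clang/Tooling/CommonOptionsParser.h"
    #include "clang/Tooling/Tooling.h"

    using namespace clang;
    using namespace clang::ast_matchers;
    using namespace clang::tooling;

    static llvm::cl::OptionCategory CheckerCategory("codecheck options");

    class FieldNameChecker : public MatchFinder::MatchCallback {
    public:
      void run(const MatchFinder::MatchResult& Result) override {
        const FieldDecl* FD = Result.Nodes.getNodeAs<FieldDecl>("field");
        if (!FD || FD->getName().startswith("f"))
          return;                                   // rule satisfied
        DiagnosticsEngine& Diags = Result.Context->getDiagnostics();
        unsigned ID = Diags.getCustomDiagID(DiagnosticsEngine::Warning,
            "data member '%0' does not start with 'f'");
        Diags.Report(FD->getLocation(), ID) << FD->getName();
      }
    };

    int main(int argc, const char** argv) {
      CommonOptionsParser Options(argc, argv, CheckerCategory);
      ClangTool Tool(Options.getCompilations(), Options.getSourcePathList());

      FieldNameChecker Checker;
      MatchFinder Finder;
      Finder.addMatcher(fieldDecl().bind("field"), &Checker);
      return Tool.run(newFrontendActionFactory(&Finder).get());
    }

A real tool would register one such callback per coding rule and aggregate the diagnostics into a web report like the one referenced above.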
Code copy/paste detection
Mentor: Vassil Vassilev
Cling projects
As described above, Cling is ROOT's new C++11 standard-compliant interpreter, built on top of the Clang (www.clang.llvm.org) and LLVM (www.llvm.org) compiler infrastructure. Cling is being developed at CERN as a standalone project and is being integrated into the ROOT data analysis framework (root.cern.ch).
Cling bundle for most popular platforms
Mentor: Vassil Vassilev
Cling name autodetection and library autoloading
Mentor: Vassil Vassilev
LINQ 4 ROOT and Cling
Mentor: Vassil Vassilev
Cling language support
Mentor: Vassil Vassilev
Implement Automatic Differentiation library using Cling
Mentor: Vassil Vassilev
CernVM & CernVM-FS
CernVM is a Virtual Software Appliance designed to provide a complete and portable environment for developing and running the data analysis of the CERN Large Hadron Collider (LHC) on any end-user computer (laptop, desktop) as well as on the Grid and on Cloud resources, independently of host operating systems. This "thin" appliance contains only a minimal operating system required to bootstrap and run the experiment software. The experiment software is delivered using the CernVM File System (CernVM-FS), a Fuse file system developed to deliver High Energy Physics (HEP) software stacks onto (virtual) worker nodes.
CernVM-FS is used, for instance, by the LHC experiments on their world-wide distributed computing infrastructure. HEP software is quite complex, with tens of gigabytes per release and 1-2 releases per week, while at the same time almost all the individual files are very small, resulting in hundreds of thousands of files per release. CernVM-FS uses content-addressable storage (like the Git version control system). For distribution it uses HTTP. File metadata is organized in trees of SQLite file catalogs. Files and file metadata are downloaded on demand and aggressively cached. The CernVM File System is part of the CernVM appliance, but it compiles and runs on physical Linux/OS X boxes as well. It is mostly BSD licensed, with small GPL parts. The CernVM-FS source code of the current development branch can be downloaded from here; the documentation is available here.
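The following toy sketch only illustrates the content-addressed idea that CernVM-FS builds on (a blob is stored under a hash of its content, so identical files are de-duplicated and a name never changes meaning); it is not CernVM-FS code, and std::hash merely stands in for the cryptographic hashes actually used.

    // Toy content-addressed store (illustrative only; CernVM-FS uses cryptographic
    // hashes, SQLite catalogs and HTTP distribution).
    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>

    class ToyContentStore {
    public:
      // Store a blob and return the content address under which it is now reachable.
      std::string Put(const std::string& blob) {
        std::string key = std::to_string(std::hash<std::string>{}(blob));
        fObjects[key] = blob;          // idempotent: same content, same key
        return key;
      }
      const std::string& Get(const std::string& key) const { return fObjects.at(key); }
    private:
      std::map<std::string, std::string> fObjects;
    };

    int main() {
      ToyContentStore store;
      const std::string key = store.Put("#!/bin/sh\necho experiment software\n");
      std::cout << key << " -> " << store.Get(key);
      return 0;
    }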
Streamline CernVM Contextualization Plug-ins
Description: CernVM is a virtual appliance that can be used by the four LHC experiments in order to run simulation and data processing applications in the cloud. Unlike standard virtual machine images, CernVM provides a uniform configuration interface across the most relevant cloud infrastructures (Amazon EC2, Google Compute Engine, CERN OpenStack, Nimbus ScienceCloud, ...). This is achieved by so-called "contextualization plugins", lightweight agents inside the virtual machine that detect the infrastructure at hand and dynamically adapt the image. With the transition of most cloud infrastructures from an early exploitation phase to a production service, the contextualization plugins also need to be evolved, streamlined, and optimized. For this project, the student will develop or evolve a common framework for contextualization plugins. The student is expected to measure and optimize the virtual machine boot time delay introduced by these plugins. The student will also get in touch with the computing groups of the LHC experiments to ensure CernVM fits within their distributed computing environments.
Expected Result: .
Mentors: Gerardo Ganis
Requirements: Good knowledge of Linux/Unix, experience with scripting languages (Perl/Python/BASH). Experience with virtualization technology is a plus.
Geant4 Simulation Toolkit and Geant Vector Prototype
The Geant4 toolkit simulates the interactions of elementary particles and radiation with matter. It is used to simulate the detectors of the LHC and other High Energy Physics (HEP) experiments. It also finds application in other areas, from assessing the effects of radiation on the electronics of satellites to improving the design of medical detectors with the custom GATE application. LHC experiments use Geant4 to compare the signatures of rare events from new physics (such as the Higgs boson) to those coming from known interactions. The open source toolkit is developed by the Geant4 collaboration, which brings together 90 physicists and engineers from laboratories and universities around the world. Developments are ongoing to improve its precision and scope of application, and to better utilise current and emerging computer architectures.
The simulation programs of the LHC experiments use the Geant4 simulation toolkit to produce simulated LHC events, running continuously on about 100,000 CPU cores throughout the year. Even so, the available statistics remain a limitation on the analysis potential for some interesting types of new physics. The goal of these projects is therefore to explore different ways to reduce the execution time of Geant4 on today's complex commodity CPUs, and to prototype how to use it efficiently on the many-core hardware of the future (tens, hundreds of cores, threads or 'warps'). The code required to model the diverse types of particles and interactions, and to model the complex geometries of detectors, is large. As a result it overwhelms the caches of current CPUs, significantly reducing the efficiency with which today's hardware can be used. This effort is focused on identifying ways to spread the work between threads and/or to reorganise it. By using less code in each core we aim to make better use of the memory architectures of today's hardware, and at the same time to prepare the way for good performance on tomorrow's hardware.
Reengineer Propagation of Charged Tracks in a Magnetic Field for Vector and GPU
Description: A significant part of the CPU time in simulations of large detectors is spent integrating the motion of particles in an electromagnetic field, using numerical integration techniques. Our idea is to redesign the classes used in propagation to avoid virtual function calls and to aid optimization, including vectorisation. We target common code which can be used in several modes: in sequential simulation for a single particle, in vector mode for a set of tracks, or on a GPU for a single track. (A small sketch of the templating idea follows the requirements below.)
Expected Result: New implementations, using template techniques and vectorization, that can be used across vector and non-vector CPUs and GPUs, together with a benchmark of the speed for a vector of particles propagating in a magnetic field against the existing sequential version.
Mentors: John Apostolakis, Sandro Wenzel
Requirements: Good knowledge of C++ and Object-Oriented Programming and experience with parallel programming for vector units or GPUs are essential. Knowledge of the numerical solution of ordinary differential equations will be beneficial.
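The sketch below shows the kind of restructuring this project aims at: a 4th-order Runge-Kutta step for a charged particle in a magnetic field, templated on the field type so the compiler can inline the field evaluation instead of going through a virtual call. The equations, normalisation and class names are illustrative assumptions, not the existing Geant4 code.

    // Illustrative templated field propagation (assumed names and normalisation).
    #include <array>
    #include <cmath>

    struct UniformBField {
      std::array<double, 3> fB{{0.0, 0.0, 1.0}};          // field value, arbitrary units
      void Value(const double /*pos*/[3], double B[3]) const {
        B[0] = fB[0]; B[1] = fB[1]; B[2] = fB[2];
      }
    };

    // State y = (x, y, z, px, py, pz); the momentum change follows dp/ds ~ k * (p x B),
    // with charge and unit conversions folded into the constant k for illustration.
    template <class Field>
    void EquationOfMotion(const Field& field, const double y[6], double k, double dydx[6]) {
      double B[3];
      field.Value(y, B);
      const double px = y[3], py = y[4], pz = y[5];
      const double p = std::sqrt(px * px + py * py + pz * pz);
      dydx[0] = px / p;  dydx[1] = py / p;  dydx[2] = pz / p;   // direction of motion
      dydx[3] = k * (py * B[2] - pz * B[1]);
      dydx[4] = k * (pz * B[0] - px * B[2]);
      dydx[5] = k * (px * B[1] - py * B[0]);
    }

    // One classical RK4 step of length h; no virtual calls, so the same code can be
    // instantiated for scalar, vectorised or GPU-resident field types.
    template <class Field>
    void RK4Step(const Field& field, double y[6], double k, double h) {
      double k1[6], k2[6], k3[6], k4[6], yt[6];
      EquationOfMotion(field, y, k, k1);
      for (int i = 0; i < 6; ++i) yt[i] = y[i] + 0.5 * h * k1[i];
      EquationOfMotion(field, yt, k, k2);
      for (int i = 0; i < 6; ++i) yt[i] = y[i] + 0.5 * h * k2[i];
      EquationOfMotion(field, yt, k, k3);
      for (int i = 0; i < 6; ++i) yt[i] = y[i] + h * k3[i];
      EquationOfMotion(field, yt, k, k4);
      for (int i = 0; i < 6; ++i)
        y[i] += h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    }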
SixTrack numerical accelerator simulation
SixTrack is a simulation tool for the trajectory of high energy particles in accelerators. It has been used in the design and optimization of the LHC and is now being used to design its upgrade that will be installed in the next decade, the High-Luminosity LHC (HL-LHC). SixTrack has been adapted to take advantage of the large scale volunteer computing resources provided by the LHC@Home project. It has been engineered to give exactly the same results after millions of operations on several, very different computer platforms. The source code is written in Fortran and is pre-processed by two programs that assemble the code blocks and provide automatic differentiation of the equations of motion. The code relies on the crlibm library, careful arrangement of parentheses, dedicated input/output and selected compilation flags for the most common compilers to provide identical results on different platforms and operating systems. An option enables the use of the BOINC library for volunteer computing. A running environment, SixDesk, is used to generate input files, split simulations for LHC@Home or the CERN cluster, and collect the results for the user. SixTrack is licensed under LGPLv2.1.
A strong background in computer science and programming languages, as well as an interest in understanding the computational physics methods implemented in the code, are sought. The student will face the unique challenge of working with a high-performance production code that is used for the highest energy accelerator in the world, whose reliability and backward compatibility cannot be compromised. There will be the opportunity to learn about the methods used in simulating the motion of particles in accelerators.
Simulating time dependent functions
Description: Implement, test and put in production the ability to change the strength or misalignment of any element by applying a function composed of predefined branches such as linear, parabolic, sinusoidal, white noise and colored noise. (A small sketch of such function composition follows the requirements below.)
Expected results: The user will have the option to define magnet strength as a function of time from the input files.
Mentors: Riccardo De Maria, Eric McIntosh
Requirements: Fortran, calculus.
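A very small sketch (in C++ rather than the Fortran used inside SixTrack, with purely illustrative names and branch types) of how a time-dependent magnet strength could be composed from predefined pieces:

    // Illustrative composition of a time-dependent strength k(t) from predefined branches.
    #include <cmath>
    #include <functional>
    #include <iostream>
    #include <vector>

    using Branch = std::function<double(double)>;   // maps time t to a contribution to k(t)

    Branch Constant(double value) { return [=](double) { return value; }; }
    Branch Linear(double slope)   { return [=](double t) { return slope * t; }; }
    Branch Sinusoid(double amp, double freq) {
      const double twoPi = 6.283185307179586;
      return [=](double t) { return amp * std::sin(twoPi * freq * t); };
    }

    // The total strength is taken here as the sum of all branches.
    double Strength(const std::vector<Branch>& branches, double t) {
      double k = 0.0;
      for (const auto& b : branches) k += b(t);
      return k;
    }

    int main() {
      const std::vector<Branch> kOfT = {Constant(1.0), Linear(0.01), Sinusoid(1e-3, 50.0)};
      for (double t = 0.0; t <= 0.1; t += 0.02)
        std::cout << "t = " << t << "  k = " << Strength(kOfT, t) << "\n";
      return 0;
    }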
New physics models
Description: Implement, test and put in production a new solver for exact bending dipoles, include per-particle mass and charge state, and track the total time.
Expected results: The user can build accelerator simulations with more accurate models for low energy machines.
Mentors: Riccardo De Maria, Eric McIntosh
Requirements: Fortran, calculus, accelerator physics.
Database infrastructure for large scale simulation
Description: Develop a database interface to collect the results sent from volunteers or computer clusters and provide them to the users. The code will allow splitting a study into smaller units, submitting the study, querying its state, re-submitting missing or invalid results, and collecting the data into the user's SQLite database. (A minimal sketch of collecting results into SQLite follows the requirements below.)
Expected results: The user will have a complete set of tools to submit, follow and collect results from distributed computer resources.
Mentors: Riccardo De Maria, Eric McIntosh
Requirements: SQL, Python, C/C++, Linux.
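A minimal sketch (assumed table layout, not the actual SixDesk schema) of how results returned by volunteers could be collected into a per-user SQLite database using the sqlite3 C API:

    // Illustrative result collection into SQLite (schema and names are assumptions).
    #include <sqlite3.h>
    #include <cstdio>

    int main() {
      sqlite3* db = nullptr;
      if (sqlite3_open("study.db", &db) != SQLITE_OK) return 1;

      const char* schema =
          "CREATE TABLE IF NOT EXISTS unit ("
          "  study TEXT, unit_id INTEGER, status TEXT, result BLOB,"
          "  PRIMARY KEY (study, unit_id));";
      char* err = nullptr;
      if (sqlite3_exec(db, schema, nullptr, nullptr, &err) != SQLITE_OK) {
        std::fprintf(stderr, "schema error: %s\n", err);
        sqlite3_free(err);
      }

      // A real tool would insert the results received from volunteers or clusters here,
      // and re-submit any unit whose status is still 'missing' or 'invalid'.
      sqlite3_close(db);
      return 0;
    }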
Create a Standalone Tracking Library
Description: Define an API for particle tracking, implement the existing model as a standalone module in C, and refactor the existing code to use it. The API and the module should open the way to GPU calculations. (A hypothetical sketch of such an API follows the requirements below.)
Expected results: Test runs which rely only on the newly rewritten library to perform tracking simulations.
Mentors: Riccardo De Maria, Eric McIntosh
Requirements: Experience with Fortran and C, calculus, and a background in physics are important.
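Purely as a starting point for discussion, the header sketch below shows what a minimal C-compatible tracking API might look like; all names and fields are illustrative assumptions, not existing SixTrack symbols.

    /* sixtracklib_track.h -- hypothetical tracking API sketch (illustrative only). */
    #ifndef SIXTRACKLIB_TRACK_H
    #define SIXTRACKLIB_TRACK_H

    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct {
      double x, px;    /* horizontal position and normalised momentum              */
      double y, py;    /* vertical position and normalised momentum                */
      double sigma;    /* longitudinal position relative to the reference particle */
      double delta;    /* relative momentum deviation                              */
    } st_particle;

    typedef struct st_lattice st_lattice;   /* opaque machine description */

    /* Build a lattice from a machine description file; returns NULL on error. */
    st_lattice* st_lattice_load(const char* filename);
    void        st_lattice_free(st_lattice* lattice);

    /* Track n particles through the lattice for a number of turns; returns 0 on success. */
    int st_track(const st_lattice* lattice, st_particle* particles, int n, int turns);

    #ifdef __cplusplus
    }
    #endif
    #endif /* SIXTRACKLIB_TRACK_H */

Keeping the interface in plain C with an opaque lattice handle would leave the door open for GPU back-ends behind the same calls.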
Performance Monitoring
IgProf (https://igprof.org) is a lightweight performance profiling and analysis tool. It can be run in one of three modes: as a performance profiler, as a memory profiler, or in instrumentation mode. When used as a performance profiler it provides statistical-sampling-based performance profiles of the application. In the memory profiling mode it can be used to obtain information about the total number of dynamic memory allocations, profiles of the "live" memory allocations in the heap at any given time, and information about memory leaks. The memory profiling is particularly important for C/C++ programs, where large amounts of dynamic memory allocation can affect performance and where very complex memory footprints need to be understood. In nearly all cases no code changes are needed to obtain profiles. IgProf currently supports Linux on the x86/x86-64 architectures and provides initial support for ARMv7. It correctly handles dynamically loaded shared libraries, threaded applications and subprocesses. It can be run entirely in user space without special privileges and produces fully navigable call stacks.
The profile reports can be visualized in one of several ways. A simple text report can be produced from the raw data saved during the profiling run. Alternatively a web-browsable profile report can be produced which allows easy navigation of the call stack. Both allow one to see profile data ordered by symbol in terms of "cumulative" cost (function plus children) and "self" cost (time in the function itself), as well as a full call graph view showing the functions which called, and which were called by, any given function. An important feature of the web-navigable reports is the ability to point via URL at particular places in the call graph, which facilitates collaboration between individuals at different locations. While there are obviously many profilers out there, the goal of IgProf is to provide a reliable profiler for large applications like the ones found in HEP, and to tackle the challenges posed by heterogeneous computing architectures.
Profiling mixed python / C++ programs
Mentors: Giulio Eulisse, Peter Elmer, Vincenzo Innocente
Data driven profiling
Mentors: Giulio Eulisse, Peter Elmer, Vincenzo Innocente
Support for CUDA / OpenCL profiling
Mentors: Giulio Eulisse, Peter Elmer, Vincenzo Innocente
Enhanced support for ARM, x32 and Mac OS X architectures.
Mentors: Giulio Eulisse, Peter Elmer, Vincenzo Innocente
'Blue sky' ideas
- General GPU-vectorized solid: Create a vectorised/GPU-capable solid that can be used in place of the most popular existing solids (box, tube, conical & spherical shells), for use in Geant4-GPU and vector simulation. Inspired by the approach of James Tickner (see article in Computer Physics Communications, Vol 181, Issue 11, Nov 2010, pages 1821–1832 available at http://dx.doi.org/10.1016/j.cpc.2010.07.001 ).
Mentor: John Apostolakis, Gabriele Cosmo
Requirements: Experience with C/C++, vectorization
- New simulation engine: Create a prototype geometry navigator for one particle type with a limited geometry choice (1-2 types of volumes) using a high-productivity parallel language (Chapel, X10). Benchmark this against existing solutions for the same problem. Document the development effort and the pitfalls of the language, tools and implementation (with the potential for a scientific report on the coding experience, in addition to the results).
Mentor: John Apostolakis
Requirements: Experience with C/C++ and either Chapel, X10, Cilk+, SplitC or a similar language is required.
Mentors
Here is the list of our mentors and their areas of expertise:
- John Apostolakis, Geant simulation (admin)
- Lorenzo Moneta, ROOT (admin)
- Peter Elmer, IgProf
- Giulio Eulisse, IgProf
- Vincenzo Innocente, IgProf
- Gerardo Ganis, CernVM
- Riccardo De Maria, SixTrack
- Eric McIntosh, SixTrack
- Vassil Vassilev, Cling
- Sandro Wenzel, Geant simulation
Contact information
Please do not hesitate to contact us if you are planning to apply for any of the above projects:
- SFT GSoC mailing list: sft-gsoc-AT-cern-DOT-ch (no subscription needed).
- SFT GSoC Jabber/XMPP chat room: gsoc-sft@conference.jabber.org . We have noticed that Gmail/Google Talk XMPP accounts have problems posting messages to the chat room, so please register an XMPP account for yourself on some other server (it takes no time to register an account at http://www.jabber.org).
- IRC is restricted at CERN - please use Jabber instead.