courses

I offer various courses distilling the most important aspects of software engineering for research groups. Courses can be held in person or online; they usually consist of a presentation part and a workshop part, but can be fully customized to your needs. I have so far given courses at:

  • Technical University of Munich
  • University of California, Berkeley

Do not hesitate to reach out and we can discuss potential courses for your group! Just write me an e-mail.

Topics

The topics of the courses can be adjusted to your particular needs. In my experience, the Best practices for programming for scientists course is highly useful for most research groups. Even though it is mostly a presentation, it is still interactive, with quizzes and small tasks. The other courses combine presentation and hands-on workshop parts.

Generally, we can adapt the courses to your needs, as not all aspects will be relevant for every group. Here are some courses that I have taught:


Best practices for programming for scientists (2 x 2h presentation)

Part of this course shows how to use an IDE to debug your code efficiently.
  • Coding pitfalls (classic bugs you need to know about)
  • Programming paradigms (writing maintainable code)
  • Using Integrated Development Environments (IDEs) for fast and efficient programming
  • Debugging code
  • Version control with git
  • Testing your code
  • Leveraging AI tools

See also my blog posts on writing proper code and programming paradigms.
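
As a small taste of the coding-pitfalls topic above, here is a classic Python bug of the kind covered in this course: mutable default arguments. This is a generic illustration, not material from the course slides.

```python
def append_measurement(value, measurements=[]):
    # Pitfall: the default list is created once, at function definition time,
    # and is then shared between all calls that rely on the default.
    measurements.append(value)
    return measurements

print(append_measurement(1.0))  # [1.0]
print(append_measurement(2.0))  # [1.0, 2.0] -- the previous value leaks in


def append_measurement_fixed(value, measurements=None):
    # Fix: use None as a sentinel and create a fresh list on every call.
    if measurements is None:
        measurements = []
    measurements.append(value)
    return measurements

print(append_measurement_fixed(1.0))  # [1.0]
print(append_measurement_fixed(2.0))  # [2.0]
```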


Version control with git for scientists (integrated presentation and workshop, 4h total)

Illustration of the difference between the two `git` concepts, `merge` and `rebase`.
  • What are the benefits of version control and why should all code be in version control?
  • What is git, and what is GitHub?
  • Setting everything up
  • How to make use of version control in your daily workflow
  • How to properly use git
    • commits
    • branches
    • merging
    • checking what has changed


Monitoring and optimizing resource usage of scientific code (~3h presentation, potentially with workshop)

Profiling a Python script to find memory leaks.
  • Understanding the memory architecture of computers
  • How to monitor the overall resource usage of your programs
  • Professional profiling tools
  • A brief introduction to data structures and runtime analysis (O-notation)
  • Programming memory-efficiently
    • chunking
    • data types
    • lazy loading
    • in-place operations

See also my blog post on this topic: memory aspects in scientific code.
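
To give a flavour of the memory-efficiency topics (chunking, data types, lazy loading), here is a minimal sketch using pandas; the file name, chunk size, and the assumption of an all-numeric CSV are purely illustrative.

```python
import pandas as pd

def column_means(path, chunk_rows=100_000):
    """Compute column means of a large CSV without ever loading it fully into memory."""
    total, count = None, 0
    # chunksize turns read_csv into a lazy iterator over smaller DataFrames (chunking),
    # and float32 halves the memory footprint compared to the default float64 (data types).
    for chunk in pd.read_csv(path, chunksize=chunk_rows, dtype="float32"):
        total = chunk.sum() if total is None else total + chunk.sum()
        count += len(chunk)
    return total / count

# means = column_means("large_table.csv")  # hypothetical input file
```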


Using the command line and bash scripting to speed up scientific workflows (integrated presentation and workshop, 4h total)

The command line is a powerful tool to find and manipulate data swiftly and to automate workflows. This example shows a simple way to check how many grid cells across all sub-processes of a modeling exercise have been successfully simulated up to the year 2010.

This course covers topics that are helpful whether you work locally or on a supercomputer.

  • Navigating through files and directories
  • Finding data quickly
  • Superfast data manipulation in the command line
  • Writing scripts to automate workflows
  • Monitoring resource usage of data analyses


Introduction to Continuous Integration and Continuous Deployment (~1h presentation)

CI offers numerous tools to automatically ensure the quality of your code and to foster collaboration.
  • Collaborating
  • Automated documentation
  • Code linting
  • Automated testing
  • Issue tracking
  • Versioning
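
To illustrate the automated-testing part: in a typical CI setup, a small test suite like the hypothetical pytest module below runs automatically on every push, so broken analysis code is caught before it affects your results. The function and file names are placeholders.

```python
# test_statistics.py -- a tiny test module that a CI job could run with `pytest`
import math
import pytest

def weighted_mean(values, weights):
    """Stand-in for a real analysis function (normally imported from your package)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def test_weighted_mean_simple():
    assert math.isclose(weighted_mean([1.0, 3.0], [1.0, 1.0]), 2.0)

def test_weighted_mean_rejects_mismatched_lengths():
    with pytest.raises(ValueError):
        weighted_mean([1.0], [1.0, 2.0])
```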


Using Docker, snakemake, and other tools to make research reproducible (2h presentation, potentially with workshop)

Tools like `snakemake` can help make your workflows 100% reproducible. This figure shows the automated pipeline from my ISIMIP project, including downloading, cropping, merging, and mapping data, and combining them into model input files.
  • Making scientific workflows reproducible with snakemake (Python) or targets (R)
  • Dockerizing your code to make it run anywhere