Course

Parallel Computing and Big data


  • Teacher(s)
    Jeroen Engelberts
  • Research field
    Data Science
  • Dates
    Period 2 - Oct 30, 2023 to Dec 22, 2023
  • Course type
    Field
  • Program year
    First
  • Credits
    3

Course description

Nowadays, even mobile phones and tablets have multi-core central processing units (CPUs), as do the simplest laptops and desktop PCs. Using their combined compute power, however, is not trivial. This is as true for small systems as it is for the world's largest compute systems. In data science, making efficient use of all available compute power is a skill that must be learned. In this course you will learn how to have all cores take part in a single task, or to have each core work on its own share of the total task.

Many data scientists run either macOS or Linux on their desktop or laptop. Both of these operating systems are UNIX or UNIX-like. Furthermore, many researchers in the Netherlands make use of the national supercomputer clusters, Lisa and Snellius, at SURFsara. Like most other large shared computer systems in research, these systems run UNIX or Linux as their operating system. Therefore, students will get hands-on experience with UNIX during the course.

After practicing with UNIX, students will learn the different types of parallel programming, with Python as the programming language. Although C and Fortran are very common in high-performance computing (HPC), it is also possible to use parallelism in Python, the language of choice for many researchers in the data science field.
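
As an illustration of having each core work on its own share of a total task, the following minimal sketch uses Python's standard `concurrent.futures` module. It is not taken from the course material; the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) and the example task are hypothetical and chosen only for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(chunk):
    """Work done by one worker process: sum the squares of its share."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, rest = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `rest` chunks each take one extra element.
        end = start + size + (1 if i < rest else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, workers=None):
    """Give each worker process one chunk and combine the partial sums."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum_of_squares, split(data, workers))
        return sum(partial_sums)


if __name__ == "__main__":
    # The guard is needed so that worker processes can safely import this file.
    numbers = list(range(1_000_000))
    print(parallel_sum_of_squares(numbers))
```

Separate processes are used here rather than threads because, in standard CPython, threads do not run CPU-bound Python code on several cores at the same time.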

The course comprises a Bash (Unix shell) course, an introduction to Jupyter Notebooks and NumPy, and a programming course on working with different parallel modules and packages in Python. For the latter, the “Python Parallel Programming Cookbook” is used, which will be provided by the teacher.

Prerequisites

Programming Basics, Mathematics, Statistics

Course literature

The following mandatory readings (presented in alphabetical order) are considered essential for your learning experience. They are also part of the exam material. Changes to the reading list will be communicated on Canvas.

Book:

G. Zaccone (2019), Python Parallel Programming Cookbook, 2nd Edition, Packt Publishing, ISBN-13: 978-1-78953-373-6. The book will be provided in PDF form by the teacher because, according to the publisher, “[This book] has been retired”:

https://www.packtpub.com/product/python-parallel-p...