Learn to Code for Data Analysis

last edited May 03 2017 by Michel Wermelinger | Created by Michel Wermelinger | Other contributors:

A free online introduction to reproducible data analysis using Python and open data

“Learn to Code for Data Analysis” is a free online course by The Open University, continuously available here.

It introduces how to obtain, clean, process, analyse, and visualise open data, and how to share the results publicly following a reproducible research approach. The course uses real health, weather, development and economic data from the World Health Organization, the Weather Underground, the World Bank, and the United Nations Comtrade databases.

The course assumes no prior knowledge of programming or statistics and does not require any software installation: it uses an online Python-based environment. The course uses pandas (a state-of-the-art data analysis library for Python) and Jupyter notebooks as the programming and documentation environment. All of these are used by professional scientists.

The course aims to promote data literacy and may be of interest to teachers and A-level students, not just in Computing but also in maths, natural sciences, engineering, sociology, human geography and other disciplines where data analysis can be used to investigate a topic in more depth. I’d appreciate it if you could share this in your school.

The pedagogy of this course is outlined here.

Level: Beginner

Duration: 20-30h (depending on how many of the exercises and projects are tackled)

Learning outcomes:

  • Understanding basic programming and data analysis concepts (e.g. correlation)
  • Awareness of open data sources as a public resource
  • Using a programming environment to develop programs
  • Writing simple programs to analyse large bodies of data and produce useful results
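As an illustration of the kind of concept named in these outcomes, here is a minimal sketch of computing a correlation coefficient in plain Python. It is not taken from the course materials: the function name `pearson`, the sample values and the 0.7 threshold for calling a relationship "strong" are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not real weather data
hours_of_sun = [2, 4, 6, 8, 10]
max_temp_c = [11, 14, 18, 21, 25]

r = pearson(hours_of_sun, max_temp_c)

# The 0.7 cut-off is an arbitrary rule of thumb chosen for this sketch
if r > 0.7:
    strength = "strong positive"
elif r < -0.7:
    strength = "strong negative"
else:
    strength = "weak or no"

print(f"r = {r:.2f} ({strength} linear relationship)")
```

A value of r close to 1 or -1 suggests a strong linear relationship between the two lists, and a value close to 0 suggests little or none. A high correlation does not by itself show that one quantity causes the other.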


Topics covered:

  • Python: variables, assignments, expressions, basic data types, if-statements, functions
  • Programming: using Jupyter Notebooks, writing readable and documented code, testing code
  • Data analysis: using pandas to read CSV and Excel files, to clean, filter, partition, aggregate and summarise data
  • Visualisation: using matplotlib to produce simple line, bar and scatter charts
  • Reproducible research: writing up and publicly sharing data analyses
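The data analysis steps listed above (read, clean, filter, aggregate) might look roughly like the following pandas sketch. It is not from the course itself: the column names and the tiny inline CSV are made up for illustration, and stand in for a real open-data file that would normally be loaded from disk or a URL.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a downloaded open-data file
csv_text = """country,year,population,cases
A,2012,1000,50
A,2013,1100,
B,2012,2000,80
B,2013,2100,90
"""

# Read: pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows where the number of cases is missing
clean = df.dropna(subset=["cases"])

# Derive a new column: cases per 1000 people
clean = clean.assign(rate=clean["cases"] / clean["population"] * 1000)

# Filter: keep only the rows for one year
year_2012 = clean[clean["year"] == 2012]

# Aggregate: total cases per country across the remaining years
totals = clean.groupby("country")["cases"].sum()

print(year_2012)
print(totals)
```

In a Jupyter notebook with matplotlib installed, `totals.plot(kind="bar")` would draw a simple bar chart of the aggregated values.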


By downloading this resource you agree to the CAS resource guidelines and to use it appropriately.

Note: Unless otherwise specified, this resource and all associated files are published under the Creative Commons Attribution-Share Alike 3.0 Licence.

