Learn to Code for Data Analysis

A free online introduction to reproducible data analysis using Python and open data

Created by Michel Wermelinger
last edited May 03 2017 by Michel Wermelinger


“Learn to Code for Data Analysis” is a free online course by The Open University, continuously available here.

It introduces how to obtain, clean, process, analyse and visualise open data, and how to share the results publicly, following a reproducible research approach. The course uses real health, weather, development and economic data from the World Health Organisation, the Weather Underground, the World Bank, and the United Nations Comtrade databases.

The course assumes no knowledge of programming or statistics and does not require any software installation: it uses an online Python-based environment. The course uses pandas (a state-of-the-art data analysis library for Python) and Jupyter notebooks as the programming and documentation environment. All of these are used by professional scientists.

The course aims to promote data literacy and may be of interest to teachers and A-level students, not just in Computing but also in maths, natural sciences, engineering, sociology, human geography and other disciplines where data analysis can be used to investigate a topic in more depth. I’d appreciate it if you could share this in your school.

The pedagogy of this course is outlined here.

Level: Beginner

Duration: 20 to 30 hours (depending on how many of the exercises and projects are tackled)

Learning outcomes:

  • Understanding basic programming and data analysis concepts (e.g. correlation)
  • Awareness of open data sources as a public resource
  • Using a programming environment to develop programs
  • Writing simple programs to analyse large bodies of data and produce useful results
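
To give a flavour of the first outcome, here is a minimal sketch of what "correlation" means in code. It is not taken from the course materials: the function name `pearson` and the sample numbers are made up for illustration, and the course itself would most likely rely on a library for this rather than a hand-written function.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not real data
hours_of_sun = [1, 2, 3, 4, 5]
ice_creams_sold = [2, 4, 6, 8, 10]

r = pearson(hours_of_sun, ice_creams_sold)
print(r)  # a value close to 1 means the two series rise together
```

A coefficient near 1 or -1 indicates a strong linear relationship and a value near 0 indicates a weak one. It says nothing about whether one quantity causes the other.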

Syllabus:

  • Python: variables, assignments, expressions, basic data types, if-statement, functions
  • Programming: using Jupyter Notebooks, writing readable and documented code, testing code
  • Data analysis: using pandas to read CSV and Excel files, to clean, filter, partition, aggregate and summarise data
  • Visualisation: using matplotlib to produce simple line, bar and scatter charts
  • Reproducible research: writing up and publicly sharing data analyses
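
The data analysis steps in the syllabus might look roughly like the sketch below. It is an illustration under stated assumptions, not an excerpt from the course: the column names (`country`, `year`, `cases`) and all values are invented, and the data is read from an in-memory string rather than a real CSV file so that the example is self-contained.

```python
import io

import pandas as pd

# Made-up sample data standing in for a downloaded CSV file
csv_text = """country,year,cases
A,2012,100
A,2013,
B,2012,40
B,2013,70
"""

# Read: load the CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows where the 'cases' value is missing
clean = df.dropna(subset=["cases"])

# Filter: keep only the rows for one year
only_2012 = clean[clean["year"] == 2012]

# Partition and aggregate: total cases per country
totals = clean.groupby("country")["cases"].sum()

# Summarise: count, mean, min, max and quartiles of the 'cases' column
summary = clean["cases"].describe()

print(totals)
print(summary)
```

In a Jupyter notebook with matplotlib installed, a call such as `totals.plot(kind="bar")` would then draw a simple bar chart of the aggregated values.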
