WELCOME!
This is the second learning path prepared by the Dataviz.Shef team. It is designed for those who have completed Learning path - Concept, or who have some experience with data visualisation and a programming language such as Python, R, or Matlab. If that is not you, we recommend going through the first learning path before you read on.
You will soon notice that we often refer to external resources. This is because there is already an enormous number of excellent resources available on the internet, and we have organised them into relevant sections for you to check out. In addition, the university has a partnership with LinkedIn Learning, which provides thousands of online training courses to staff and students through MUSE, so we have also included some useful courses to help you get started.
Unlike the previous learning path, where most of the resources concentrated on data visualisation concepts and coding guides, this learning path focuses on what we can do with each programming language to produce suitable data visualisations. For each language there are three sections to explore: Data processing, Data visualisation, and Share. Choose a programming language to get started.
R
Two articles[1][2] list many useful R packages that we recommend you look at. When you start using R, you will commonly be advised to install RStudio, an integrated development environment (IDE) for R, as it is a great tool for source code editing, build automation and debugging. Learn more about RStudio at rstudio.
"The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures."
The tidyverse contains some powerful packages that cover almost every stage of producing a data visualisation; read more here to understand the tidyverse. Bear in mind that this collection of packages can be difficult to grasp for those with no previous coding experience, so make sure you have a good understanding of R and its basic functions first.
The tidyverse provides readr for reading data files, tidyr for tidying data, dplyr for data manipulation, and tibble for presenting data frames, among others. Here are some useful resources for learning the tidyverse in more depth:
There are many data visualisation tools in R, offering you a broad range of choices. Some popular packages, libraries, and frameworks are:
Packages | Description | Learning Curve | Written in |
---|---|---|---|
ggplot2 | Data visualisation package, part of the Tidyverse ecosystem | ★★☆☆☆ | R |
Plotly | Graphing library that creates interactive web-based graphs, available in both R and Python | ★★★☆☆ | Plotly.js, based on d3.js and stack.gl. Rendered locally through htmlwidgets |
Dash for R | A productive framework for building web applications in both R and Python | ★★★☆☆ | Plotly.js, React.js, d3.js |
Shiny | A powerful package that allows you to create web-based applications in simple steps | ★★★☆☆ | R, JavaScript, HTML, CSS |
Leaflet | One of the most popular JavaScript libraries for interactive maps | ★☆☆☆☆ | JavaScript, htmlwidgets |
Rgl | A package for producing interactive 3-D plots | ★☆☆☆☆ | R |
Dygraphs | A JavaScript charting library capable of handling dense datasets and producing interactive plots | ★★★★☆ | JavaScript |
Lattice | A data visualisation library with an emphasis on multivariate data | ★★★★☆ | R, C |
[Learning Curve] ★☆☆☆☆: Shallowest ★★★★★: Steepest
Most of the time, knowing one or two packages or frameworks is sufficient for building an interactive visualisation. For example, a common pairing is Shiny and ggplot2 working together to produce a web-based application.
Note: Difficulty ratings are subjective. They are based mainly on how much extra knowledge is required, the quality of the documentation and examples, and the size of the community. They also depend heavily on the user's background. The following are some useful resources for you to explore:
It is a good idea to share your data visualisation with the world through a web hosting service or a website. Shiny and Dash (by Plotly) both provide solutions that allow people around the world to interact with your data visualisations. Most other JavaScript-based packages can be embedded into your website directly or via an iframe.
Shiny:
If you want to publish your own Shiny app, please read this public guidance/policy statement before continuing.
Dash:
*: Although this is a Dash app written in Python, you can replace all of the code with your R code and keep most of the existing Dash component libraries.
We also recommend storing your code on a source code hosting website such as GitHub, GitLab, or Bitbucket, so that people can reproduce your work easily.
Python
Many of you may have already used Jupyter Notebook or JupyterLab for writing Python code. If not, we suggest you take a look at this website to learn more about them and why they are great for developing data visualisations with Python.
Pandas - a Python package known for its easy-to-use data structures and data analysis tools.
NumPy - a package for working with arrays, with support for linear algebra and many high-level mathematical functions.
Scikit-learn - a popular machine learning package built on NumPy, SciPy, and Matplotlib.
Note that all the packages mentioned above are part of the SciPy ecosystem, a Python-based ecosystem of open-source software for mathematics, science, and engineering.
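To give a feel for how the data-processing packages listed above fit together, here is a minimal sketch using Pandas and NumPy. The data, the column names (`city`, `temp_c`) and the `summarise` helper are all made up for illustration; they are assumptions, not part of any resource linked from this page.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: temperature readings with one missing value.
raw = pd.DataFrame(
    {
        "city": ["Sheffield", "Sheffield", "Leeds", "Leeds", "Leeds"],
        "temp_c": [12.5, np.nan, 11.0, 13.0, 15.0],
    }
)


def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Drop missing readings, then compute the mean and count per city."""
    clean = df.dropna(subset=["temp_c"])
    summary = (
        clean.groupby("city")["temp_c"]
        .agg(mean_temp="mean", n_readings="count")
        .reset_index()
    )
    # NumPy functions can be applied directly to Pandas columns.
    summary["mean_temp_f"] = np.round(summary["mean_temp"] * 9 / 5 + 32, 1)
    return summary


summary = summarise(raw)
print(summary)
```

A tidy summary table like this is usually the starting point for a plot: one row per group, one column per quantity you want to show.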
If you prefer video tutorials:
There are many data visualisation tools in Python, offering you a broad range of choices. Here are some packages, libraries, and frameworks that are used by researchers:
Packages | Description | Learning Curve | Written in |
---|---|---|---|
Matplotlib | Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python | ★★☆☆☆ | Python, C++ |
Glueviz | Glueviz is a Python library with its own graphical user interface. This tool is worth trying if you want to compare different, interrelated datasets. Check out our GlueViz blog tag page to find out more. | ★☆☆☆☆ | Python |
Seaborn | A Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics | ★★☆☆☆ | Python |
Bokeh | Bokeh is an interactive visualization library for modern web browsers | ★★★☆☆ | Python, TypeScript |
Shiny | A powerful package that allows you to create web-based applications in simple steps | ★★★☆☆ | R, JavaScript, HTML, CSS |
Plotly | An interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases | ★★★☆☆ | Python, JavaScript |
[Learning Curve] ★☆☆☆☆: Shallowest ★★★★★: Steepest
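As a starting point for the packages in the table above, here is a minimal Matplotlib sketch that draws two labelled lines and saves the figure to disk. The output filename `sine_cosine.png` is a placeholder chosen for this example, and the `Agg` backend line is only there so that the script runs without a display; in a Jupyter notebook you would normally leave it out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Sample data: one full period of sine and cosine.
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")

# Label the axes and add a title and legend so the plot is self-explanatory.
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

fig.tight_layout()
fig.savefig("sine_cosine.png", dpi=150)  # placeholder filename
```

Seaborn builds on the same figure and axes objects, so what you learn about labelling and saving figures here carries over.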
Note: Difficulty ratings are subjective. They are based mainly on how much extra knowledge is required, the quality of the documentation and examples, and the size of the community. They also depend heavily on the user's background. The following are some articles for you to read:
It is a good idea to share your data visualisation with the world through a web hosting service or a website. If you have decided to work with Plotly, you can use Dash (also provided by Plotly) for hosting your application. For other packages, you can share your notebook through platforms such as Binder, Microsoft Azure Notebooks, and Google Colaboratory.
Host Jupyter Notebook:
Dash:
We also recommend storing your code on a source code hosting website such as GitHub, GitLab, or Bitbucket, so that people can reproduce your work easily.
Matlab
Matlab is different from the previous languages described here, in that it is an interactive programming environment with the Matlab language at its core. Matlab was designed for engineers and scientists, so it focuses on making data manipulation, analysis and visualisation tools as intuitive and natural as possible.
MathWorks has produced a number of excellent training courses, webinars and tutorials which can help you build skills and confidence when processing your data with Matlab.
Essential skills:
Matlab also has a number of advanced functions and additional toolboxes that can help researchers make use of modern computing power. These include, but are not limited to, machine learning, tall arrays for big data management, GPU computing, and image processing.
Visualising data with Matlab is enjoyable, and it offers a considerable amount of flexibility. Visit the Matlab plot gallery to look at some of the many ways you can visualise data with Matlab. There is also a LiveScript Gallery where it's possible to access interesting and informative data visualisation workflows.
You may find some of the following resources useful:
Like Dash for Python and Shiny for R, Matlab also has an App Designer whose apps can be hosted on the web, letting users explore your data using visualisations. You can find an introduction to the Matlab App Designer here, along with a helpful catalogue of components that can be easily built into your applications.
Getting started with App Designer:
Congratulations!
You have completed the learning path - Lab. You now know how to create data visualisations with your chosen programming languages and packages. As you continue to explore data visualisation, perhaps we should think about how to create a reproducible process...