Data Management with R using tidyr, dplyr and ggvis – Part 0

Hey folks,

In the upcoming weeks I would like to talk about data management in R. I will show you how to tidy, transform and visualise your data using three popular and powerful data management packages: tidyr, dplyr and ggvis. This is what you can expect in the next weeks:

In general, data management can be divided into three main steps:

1. Tidy: Data are often messy, and before they can be analysed they have to be tidied up. The basic rule is that each variable is stored in its own column and each observation in its own row. For this purpose I will show you the package tidyr, developed by the RStudio team, which makes cleaning and tidying data much easier.

2. Transform: Usually, before you can jump into your data analysis, you first have to transform, subset and filter your data. dplyr is a package that specialises in data frames and is designed to be considerably faster than plyr and other comparable packages. I will show you how to use it by explaining its most frequently used functions, such as select, filter and mutate.

3. Visualise: Once your data are tidy and transformed, you can start building your models and running your analyses. The resulting data and findings usually need to be visualised. For this purpose I will show you the package ggvis. Like ggplot2, it is built on concepts from the grammar of graphics. In addition, it adds interactivity and a new data pipeline, and it renders in a web browser, which makes it easy to share and publish your results.

I hope to see you in the upcoming weeks!




About This Author

Martin was born in the Czech Republic and studied at the University of Natural Resources and Life Sciences, Vienna. He is currently working at an Earth Observation company in Austria specialising in land monitoring. His main interests are open-source applications such as R, (geospatial) statistics and data management, web mapping and visualisation. He loves travelling, geocaching, photography and TV series.
