Hey folks,
in the upcoming weeks I would like to talk about data management in R. I will show you how to tidy, transform and visualise your data using three popular and powerful data management packages: tidyr, dplyr and ggvis. Here is what you can expect over the next few weeks.
In general, data management can be divided into three main steps:
1. Tidy: Data are often messy, and before they can be analysed they have to be tidied up. The basic rule is to have each variable in its own column and each observation in its own row. For this purpose I will show you the package tidyr, developed by the RStudio team, which makes cleaning and tidying data much easier.
2. Transform: Usually, before you can jump into your data analysis, you first have to transform, subset and filter your data. dplyr is a package specialised for data frames and intended to be much faster than plyr and other comparable packages. I will show you how to use it by explaining the most frequently used functions, such as select, filter, mutate, …
3. Visualise: Once your data are tidy and transformed, you can start on your models and analysis. The resulting data and information usually have to be visualised. For this purpose I will show you the package ggvis. Like ggplot2, it is built on concepts from the grammar of graphics. In addition, it adds interactivity and a new data pipeline, and it renders in a web browser, which makes it easy to share and publish your results.
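To give a first taste of the tidy and transform steps, here is a minimal sketch. The data frame `messy` and its column names are made up for illustration, and I am assuming a recent tidyr (version 1.0 or later) where `pivot_longer()` is available; in older versions the same reshaping is done with `gather()`. Take it as a rough preview rather than the way the upcoming posts will necessarily do it.

```r
library(tidyr)
library(dplyr)

# Hypothetical messy data: one column per year,
# so the variable "year" is hidden in the column names
messy <- data.frame(
  country = c("A", "B"),
  `2013` = c(10, 20),
  `2014` = c(15, 25),
  check.names = FALSE
)

# Tidy: every variable gets its own column, every observation its own row
tidy <- pivot_longer(messy, cols = -country,
                     names_to = "year", values_to = "value")

# Transform: keep some rows, add a column, keep some columns
result <- tidy %>%
  filter(year == "2014") %>%
  mutate(value_doubled = value * 2) %>%
  select(country, value_doubled)

print(tidy)
print(result)
```

After the tidy step there is one row per country and year, which is the shape that `filter`, `mutate` and `select` are meant to work on.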
I hope to see you in the upcoming weeks!
Cheers
Martin