In the upcoming weeks I would like to talk about data management in R. I will show you how to tidy, transform and visualise your data using three popular and powerful data management packages: tidyr, dplyr and ggvis. This is what you can expect in the coming weeks:
In general, data management can be divided into three main steps:
1. Tidy: Data are often messy, and before they can be analysed they have to be tidied up. The basic rule is to store each variable in its own column and each observation in its own row. For this purpose I will show you the package tidyr, developed by the RStudio team, which makes cleaning and tidying data much easier.
2. Transform: Usually, before you can jump into your data analysis, you first have to transform, subset and filter your data. dplyr is a package that specialises in data frames and is intended to be much faster than plyr and other comparable packages. I will show you how to use this package by explaining its most frequently used functions.
3. Visualise: Once your data are tidy and transformed, you can start your models and analysis. The resulting data and information usually have to be visualised. For this purpose I will show you the package ggvis. Like ggplot2, it is built on concepts from the grammar of graphics. In addition, it adds interactivity and a new data pipeline, and it renders in a web browser, which makes it easy to share and publish your results.
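To make the three steps a little more concrete, here is a minimal sketch of how they might fit together. The data set and its column names are made up for illustration. The functions shown (pivot_longer() from tidyr 1.0.0 or later, filter(), group_by() and summarise() from dplyr, ggvis() and layer_points() from ggvis) are only my picks for this example, not necessarily the ones the upcoming posts will focus on. The ggvis step is skipped if that package is not installed.

```r
library(tidyr)
library(dplyr)

# Hypothetical messy data: one column per year instead of one row per observation
messy <- data.frame(
  country = c("A", "B", "C"),
  `2012` = c(10, 20, 30),
  `2013` = c(12, 18, 33),
  check.names = FALSE
)

# 1. Tidy: reshape so that each row is one country-year observation
tidy <- messy %>%
  pivot_longer(cols = c(`2012`, `2013`), names_to = "year", values_to = "value") %>%
  mutate(year = as.integer(year))

# 2. Transform: keep rows with value >= 12, then summarise per country
result <- tidy %>%
  filter(value >= 12) %>%
  group_by(country) %>%
  summarise(mean_value = mean(value), n_obs = n())

# 3. Visualise: build a scatter plot of value against year (only if ggvis is available)
if (requireNamespace("ggvis", quietly = TRUE)) {
  plot <- ggvis::ggvis(as.data.frame(tidy), ~year, ~value)
  plot <- ggvis::layer_points(plot)
}
```

In an interactive session, printing `plot` should render the chart in the viewer or a web browser.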
I hope to see you in the upcoming weeks!