Colloquium 2023 Data Wrangling with Tidyverse

From SHARCNETHelp

Tidyverse is a cohesive set of packages for doing data science in R. We have demonstrated the graphics portion of it (ggplot2) in prior talks. In this one we demonstrate the data munging portions (dplyr, forcats, tibble, readr, stringr, tidyr, and purrr) by restoring the underlying data hierarchy implicit in the layout of a 500-page reference PDF file, given only the words on each page and their bounding boxes.
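To give a flavour of the kind of input described above, here is a minimal sketch, not the talk's actual code. It assumes a hypothetical table with one row per word, where the column names (`page`, `text`, `x_min`, `y_min`, `x_max`, `y_max`) and the sample values are invented for illustration. It groups words into lines by their shared top edge and uses the left edge of each line as a rough indentation level, which is one possible starting point for recovering hierarchy.

```r
library(dplyr)
library(tibble)

# Hypothetical input: one row per word with its page and bounding box.
# Column names and values are assumptions made for this illustration.
words <- tribble(
  ~page, ~text,     ~x_min, ~y_min, ~x_max, ~y_max,
  1,     "Chapter",     72,     90,    130,    104,
  1,     "1",          134,     90,    140,    104,
  1,     "Section",     90,    120,    140,    132,
  1,     "1.1",        144,    120,    160,    132
)

# Treat words that share a page and a top edge as one line of text.
# The smallest left edge in each line is a crude indentation level.
lines <- words %>%
  arrange(page, y_min, x_min) %>%   # reading order: top to bottom, left to right
  group_by(page, y_min) %>%
  summarise(
    indent = min(x_min),
    line   = paste(text, collapse = " "),
    .groups = "drop"
  )

print(lines)
```

Grouping on exact `y_min` values only works for this toy data. Real extracted coordinates usually vary slightly within a line, so some rounding or tolerance would likely be needed before grouping.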