This week's data was about R packages on CRAN: package descriptions, dependencies, licenses, etc. There were 4 data sets in total: one on the packages, one on the authors of the packages, and two interesting ones storing the edges and nodes of a network generated by collaboration on packages.
The network stuff looked interesting, but I decided to look into the package dependencies instead, i.e. anything listed under depends, imports, or suggests. I had a theory that pretty much every package beginning with ‘gg’ must use a few standard packages, plus more niche ones depending on what it needs to do. I could have just looked at how many (or what percentage of) ‘gg’ packages each dependency is used in, but I didn’t think that would make an interesting chart.
Instead, I decided to create a chart looking at pairs of packages (still not sure why other than that I thought a tile chart would look cool).
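For anyone curious how pairs like this can be counted, here is a minimal base R sketch. It is not the code behind the chart: the `deps` data frame, its column names, and the package names in it are made up for illustration. The idea is to build a package-by-dependency incidence matrix and take its cross product, which gives the number of ‘gg’ packages that use each pair of dependencies.

```r
# Toy stand-in for the CRAN data: one row per (package, dependency) pair.
# Column names and values are hypothetical, not taken from the real data set.
deps <- data.frame(
  package    = c("ggA", "ggA", "ggA", "ggB", "ggB", "ggC", "ggC", "other"),
  dependency = c("ggplot2", "dplyr", "rlang", "ggplot2", "rlang",
                 "ggplot2", "dplyr", "dplyr"),
  stringsAsFactors = FALSE
)

# Keep only packages whose name starts with "gg".
gg_deps <- deps[startsWith(deps$package, "gg"), ]

# Incidence matrix: rows are 'gg' packages, columns are dependencies,
# 1 where the package uses the dependency and 0 otherwise.
incidence <- 1 * (unclass(table(gg_deps$package, gg_deps$dependency)) > 0)

# Cross product: entry [i, j] is the number of 'gg' packages using both
# dependency i and dependency j (the diagonal is the count for i alone).
co_use <- crossprod(incidence)

# Long format with each pair listed once (upper triangle only),
# which is the shape a tile chart would want.
idx <- which(upper.tri(co_use), arr.ind = TRUE)
pairs <- data.frame(
  dep_a      = rownames(co_use)[idx[, "row"]],
  dep_b      = colnames(co_use)[idx[, "col"]],
  n_packages = co_use[idx]
)
pairs <- pairs[order(-pairs$n_packages), ]
print(pairs)
```

Keeping only the upper triangle is what gives the chart its triangle shape, since the pair (a, b) is the same as (b, a).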
I’m happy with how it turned out. The interesting part is at the top of the triangle, so I added a zoomed-in view of the top 40 packages using patchwork::inset_element.
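As a rough illustration of the inset idea, here is a small sketch with {ggplot2} and {patchwork}. The data, the plot styling, and the inset position are all invented for the example (and it zooms on a 5 x 5 corner rather than the top 40 packages); only the use of `inset_element()` reflects what is described above.

```r
library(ggplot2)
library(patchwork)

# Made-up pair counts on a small grid, standing in for the real data.
grid <- expand.grid(x = 1:20, y = 1:20)
grid <- grid[grid$x <= grid$y, ]      # keep one triangle of the grid
grid$n <- (21 - grid$y) * grid$x      # arbitrary values for the fill

# Full triangle of tiles.
full <- ggplot(grid, aes(x, y, fill = n)) +
  geom_tile() +
  theme_minimal()

# Zoomed-in view of one corner, with axes and legend stripped.
zoom <- ggplot(grid[grid$x <= 5 & grid$y <= 5, ], aes(x, y, fill = n)) +
  geom_tile() +
  theme_void() +
  theme(legend.position = "none")

# Place the zoomed plot in the empty lower-right part of the main panel.
# left/bottom/right/top are fractions of the panel area.
combined <- full +
  inset_element(zoom, left = 0.55, bottom = 0.05, right = 0.98, top = 0.45)

combined
```

The four position arguments are relative to the main plot's panel by default, so the inset can sit in whichever part of the triangle chart is empty.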
There are a few unexpected things. For example, out of the 194 ‘gg’ packages, {dplyr} and {ggplot2} appear together in only ~50. Wild. I guess most just don’t use {dplyr}. A bar chart would have worked that out!
Anyway, while I’m not sold on what this is showing, it does look kind of cool.
Code 👉 GitHub
