Pie charts within a list - r

Let me start off by saying that I know pie charts are terrible methods of accurately displaying data but I have been asked to produce this as part of a report. I have a data set that contains information about location, injury type, and then several fields of personal data. I would like to display a pie chart of the percentage of each type of injury that occurs at each location. I've tried this where facility2 is a list of 52 elements created by splitting the full dataframe by ServiceSite.x. This partially works but the pie charts created only contain the count for one "initial type".
summarized_list <- lapply(facility2, function(x){
x %>% group_by(InitialType) %>% summarize(length(InitialType))
})
pies <- function(z) {
ggplot(z, aes(x = "", fill = length(InitialType)))+
geom_bar(width = 1, na.rm = TRUE)+
coord_polar(theta = "y")
}
lapply(summarized_list, pies)
This also partially works and would be perfect, but only prints out 13 charts instead of all 52
pies2 <- function(x) {
ggplot(x, aes(x = "", fill = InitialType))+
geom_bar(width = 1, na.rm = TRUE)+
coord_polar(theta = "y")+
xlab(x$ServiceSite.x)
}
lapply(facility2, pies2)
and gives this error
Error: Aesthetics must be either length 1 or the same as the data (1): x, fill
I know the first method splits the data perfectly while providing correct counts, I just can't figure out what I need to change in the ggplot() to have all injury types display for each facility. I would also like to add a label of percentages if possible or at least just the counts.
Sample data:
ServiceSite.x InitialType
2 Dermatitis
2 Diabetic
2 Pressure Injury
2 Pressure Injury
3 Pressure Injury
3 Other
3 Laceration
3 Other
4 Pressure Injury
4 MASD
4 Blister (Non-Pressure)
4 Skin Tear
4 Pressure Injury
5 Skin Tear
5 Other
5 Contusion
5 Skin Tear
5 Surgical(Non-Healing)
5 Pressure Injury
6 Pressure Injury
1 Pressure Injury
6 Pressure Injury
6 MASD
1 Surgical(Non-Healing)
1 Pressure Injury
1 Skin Tear
1 Contusion
facility2 <- split(full, full$ServiceSite.x)
both variables are factors.

I'm not getting the error you're showing, so you might want to go through and find the culprit, which may not be in the sample you posted. I'm using purrr::map rather than lapply; that's partially preference, and partially how well it fits in with a piped workflow. I also find map functions easy for debugging, since it will print the name or index of each item as it's being mapped over; this often helps me figure out where in a list a problem is.
The first set of plots here just come from rewriting your code using purrr::imap, which maps over two lists: the list itself, and its names. split names the list based on the values in ServiceSite.x, so you now have access to them to set the xlab. I'm not sure if one bug might have been in setting the xlab with x$ServiceSite.x, which seems like it should have returned an entire vector, not just a single string.
library(tidyverse)
library(patchwork)
pies1 <- df %>%
split(.$ServiceSite.x) %>%
imap(function(data, site) {
ggplot(data, aes(x = "", fill = InitialType)) +
geom_bar(width = 1) +
coord_polar(theta = "y") +
xlab(site)
})
I'm using the patchwork library to stick all the plots together just for easier display here.
reduce(pies1, `+`) + plot_layout(ncol = 2, byrow = T)
For the labels, do a little data prep first to calculate counts and percentages. Here I did this with a couple dplyr functions. Then add a geom_text with position_stack(vjust = 0.5) so the texts will 1. stack going around the circle, the same as the bars do, and 2. be centered in the wedges. I'll leave it up to you to format the text as you want, including adding count labels instead or in addition.
pies2 <- df %>%
split(.$ServiceSite.x) %>%
imap(function(data, site) {
data %>%
count(ServiceSite.x, InitialType) %>%
mutate(share = round(n / sum(n), digits = 2)) %>%
ggplot(aes(x = "", y = n, fill = InitialType)) +
geom_col(width = 1) +
geom_text(aes(label = scales::percent(share)), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y")
})
pies2[[1]]

Related

Error: "replacement has 1 row, data has 0" attempting to create animation using gganimate

I am attempting to create an animation of NFL plays using R's "gganimate" package, however whenever I attempt to run the gganimate package (by calling the "transition_time" function), I receive the following error:
`
ERROR while rich displaying an object: Error in `$<-.data.frame`(`*tmp*`, "group", value = ""): replacement has 1 row, data has 0
`
According to the research I've done, this error occurs when data is attempted to a new column in a dataframe before that column has been created. In this case, I assume the column "group" is the issue column, but I'm unsure how/why gganimate is creating a new column on the dataframe in the first place, so I'm having trouble getting anywhere with solving the issue.
Dataset
The function I have created takes in a dataframe that is a filtered version of the week1_merged_v2.csv file attached (referred to as "df_week1_full" in the code). Each instance of this dataset represents a specific player's location (represented by "x" and "y" columns) at every tenth of a second (each tenth of a second is represented by the "frameId" column). I am attempting to animate how those locations change over time (as "frameId" changes).
Here is the function as currently constructed ("animation stuff" is where the animation is created):
plot_play <- function(df_one_play) {
# unique list of team colors
colores <- unique(df_week1_full$color1)
names(colores) <- colores
# play metadata
play_dir <- df_one_play$playDirection %>% .[1]
yards_togo <- df_one_play$yardsToGo %>% .[1]
los <- df_one_play$absoluteYardlineNumber %>% .[1]
togo_line <- if(play_dir=="left") los-yards_togo else los+yards_togo
# field background
field_plot +
# plot players
geom_point(data = df_one_play, aes(x = x, y = y, color = color1)) +
geom_point(size = 6) +
# color players according to team color
scale_colour_manual(values = colores) +
# get rid of color legend
theme(legend.position="none") +
# line of scrimmage
annotate(
"segment",
x = los, xend = los, y = 0, yend = 160/3,
colour = "#0d41e1"
) +
# 1st down marker
annotate(
"segment",
x = togo_line, xend = togo_line, y = 0, yend = 160/3,
colour = "#f9c80e"
) +
# animation stuff
transition_time(frameId) +
ease_aes('linear') +
NULL
}
Does anyone have knowledge of gganimate and perhaps what I am doing incorrectly when calling its functions? Thank you in advance! Also, I'm a long-time user of Stack Overflow, but this is first time asking a question, so any recommendations on how to more appropriately ask questions would be great as well.

ggplotting multiple lines of same data type on one line graph

So i have this data, that I would like to plot onto a graph - all the lines on the same graph
>ndiveristy
Quadrant nta.shannon ntb.shannon ntc.shannon
1 1 2.188984 0.9767274 1.8206140
2 2 1.206955 1.3240481 1.3007058
3 3 1.511083 0.5805081 0.7747041
4 4 1.282976 1.4222243 0.4843907
5 5 1.943930 1.7337267 1.5736545
6 6 2.030524 1.8604619 1.6860711
7 7 2.043356 1.5707110 1.5957869
8 8 1.421275 1.4363365 1.5456799
here is the code that I am using to try to plot it:
ggplot(ndiversity,aes(x=Quadrant,y=Diversity,colour=Transect))+
geom_point()+
geom_line(aes(y=nta.shannon),colour="red")+
geom_line(aes(y=ntb.shannon),colour="blue")+
geom_line(aes(y=ntc.shannon),colour="green")
But all I am getting is the error
data must be a data frame, or other object coercible by fortify(), not a numeric vector.
Can someone tell me what I'm doing wrong
Typically, rather than using multiple geom_line calls, we would only have a single call, by pivoting your data into long format. This would create a data frame of three columns: one for Quadrant, one containing labels nta.shannon, ntb.shannon and ntc.shannon, and a column for the actual values. This allows a single geom_line call, with the label column mapped to the color aesthetic, which automatically creates an instructive legend for your plot too.
library(tidyverse)
as.data.frame(ndiversity) %>%
pivot_longer(-1, names_to = 'Type', values_to = 'Shannon') %>%
mutate(Type = substr(Type, 1, 3)) %>%
ggplot(aes(Quadrant, Shannon, color = Type)) +
geom_line(size = 1.5) +
theme_minimal(base_size = 16) +
scale_color_brewer(palette = 'Set1')
For posterity:
convert to data frame
ndiversity <- as.data.frame(ndiversity)
get rid of the excess code
ggplot(ndiversity,aes(x=Quadrant))+
geom_line(aes(y=nta.shannon),colour="red")+
geom_line(aes(y=ntb.shannon),colour="blue")+
geom_line(aes(y=ntc.shannon),colour="green")
profit
not the prettiest graph I ever made

R, ggplot, How do I keep related points together when using jitter?

One of the variables in my data frame is a factor denoting whether an amount was gained or spent. Every event has a "gain" value; there may or may not be a corresponding "spend" amount. Here is an image with the observations overplotted:
Adding some random jitter helps visually, however, the "spend" amounts are divorced from their corresponding gain events:
I'd like to see the blue circles "bullseyed" in their gain circles (where the "id" are equal), and jittered as a pair. Here are some sample data (three days) and code:
library(ggplot2)
ccode<-c(Gain="darkseagreen",Spend="darkblue")
ef<-data.frame(
date=as.Date(c("2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03")),
site=c("Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace","Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace"),
id=c("C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99","C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99"),
gainspend=c("Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend"),
amount=c(6,14,34,31,3,10,6,14,2,16,16,14,1,1,15,11,8,7,2,10,15,4,3,NA,NA,4,5,NA,NA,NA,NA,NA,NA,2,NA,1,NA,3,NA,NA,2,NA,NA,2,NA,3))
#▼ 3 day, points centered
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
#▼ 3 day, jitted
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5,position=position_jitter(w=0,h=0.2)) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
My main idea is the old "add jitter manually" approach. I'm wondering if a nicer approach could be something like plotting little pie charts as points a la package scatterpie.
In this case you could add a random number for the amount of jitter to each ID so points within groups will be moved the same amount. This takes doing work outside of ggplot2.
First, draw the "jitter" to add for each ID. Since a categorical axis is 1 unit wide, I choose numbers between -.3 and .3. I use dplyr for this work and set the seed so you will get the same results.
library(dplyr)
set.seed(16)
ef2 = ef %>%
group_by(id) %>%
mutate(jitter = runif(1, min = -.3, max = .3)) %>%
ungroup()
Then the plot. I use a geom_blank() layer so that the categorical site axis is drawn before I add the jitter. I convert site to be numeric from a factor and add the jitter on; this only works for factors so luckily categorical axes in ggplot2 are based on factors.
Now paired ID's move together.
ggplot(ef2, aes(x = date, y = site)) +
geom_blank() +
geom_point(aes(size = amount, color = gainspend,
y = as.numeric(factor(site)) + jitter),
alpha=0.5) +
scale_color_manual(values = ccode) +
scale_size_continuous(range = c(1, 15), breaks = c(5, 10, 20))
#> Warning: Removed 15 rows containing missing values (geom_point).
Created on 2021-09-23 by the reprex package (v2.0.0)
You can add some jitter by id outside the ggplot() call.
jj <- data.frame(id = unique(ef$id), jtr = runif(nrow(ef), -0.3, 0.3))
ef <- merge(ef, jj, by = 'id')
ef$sitej <- as.numeric(factor(ef$site)) + ef$jtr
But you need to make site integer/numeric to do this. So when it comes to making the plot, you need to manually add axis labels with scale_y_continuous(). (Update: the geom_blank() trick from aosmith above is a better solution!)
ggplot(ef,aes(date,sitej)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20)) +
scale_y_continuous(breaks = 1:3, labels= sort(unique(ef$site)))
This seems to work, but there are still a few gain/spend circles without a partner--perhaps there is a problem with the id variable.
Perhaps someone else has a better approach!

Split one massive plot into smaller sub-plots for better visualisation in ggplot

I've got data on survival/sampling dates of over 500 dogs, each dog having been sampled at least once, and several having been sampled three or four times. For e.g.
Microchip_number Date Sampling_occasion
White notched fatso 20,11,2018 First
White notched fatso 28,12,2018 Second
White notched fatso 09,04,2019 Third
White notched fatso 23,10,2019 Fourth
Tuttu Jeevan 06,12,2018 First
Tuttu Jeevan 03,01,2019 Second
Tuttu Jeevan 04,05,2019 Third
Tuppy 22,10,2018 First
Tuppy 20,11,2018 Second
Tuppy 17,04,2019 Third
Tuppy 31,07,2019 Lost to study
I've managed to plot this in ggplot, but it's a very large image which requires zooming in and scrolling to view the sampling times of each individual dog.
Plot of outcomes for all dogs
I've found suggestions to split large dataframes based on a certain variable (e.g. month) or to use facet_wrap, but in my case, I don't have any such variable to use. Is there a way to split this large plot into multiple smaller plots that don't need to be zoomed in to view all the details clearly, such as below (without having to separately plot subsets of the dataframe)?
How I'd like each split/sub-plot to appear
This is the code I'm using
outcomes <- read_xlsx("Dog outcomes.xlsx", col_types = c("text", "date", "text"))
outcomes$Microchip_number<- as.factor(outcomes$Microchip_number)
outcomes$Sampling_occasion<- factor(outcomes$Sampling_occasion,
levels = c("First", "Second", "Third", "Fourth", "Lost to study", "Died"))
g<- ggplot(outcomes)
g + geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
theme_bw()
Thanks so much, Jrm FRL, the code to add the counter and subgroup columns was exactly what I needed! As Gregor mentioned, facet_wrap just made things more difficult to view, so I used a for loop using subgroup to plot 50 dogs per pdf page (or any other device). This is the code I used, and it's worked perfectly, although for some reason, the 'Microchip_number's are displaying in reverse sequence / alphabetical order (68481, 68480, 68479 etc.), despite being organised the other way round in the main dataframe 'outcomes'. Minor quibble, however! This makes it so much easier to visualise outcomes for specific individuals. Cheers!
outcomes2 <- outcomes %>%
mutate(counter = 1 + cumsum(c(0,as.numeric(diff(Microchip_number))!=0)), # this counter starting at 1 increments for each new dog
subgroup = as.factor(ceiling(counter/50)))
pdf(file = "All_outcomes_50.pdf") #
for (i in 1:length(unique(outcomes2$subgroup))) {
outcomes2 %>%
filter(subgroup == i) -> df
ggplot(df) + geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
theme_bw() -> wow
print(wow)
}
dev.off()
New plot after using 'for' loop
You can simply divide your dasatet in sub-groups containing the same number of dogs (e.g. 10).
Add an intermediate counter column to overcome the small difficulty that there is not necessarly the same number of rows for each dog.
I would suggest :
library('dplyr')
outcomes <- outcomes %>%
mutate(counter = 1 + cumsum(c(0,as.numeric(diff(Microchip_number))!=0)), # this counter starting at 1 increments for each new dog
subgroup = as.factor(ceiling(counter/10)))
You will obtain a new dataset with a factor subgroup column whose value is different every 10th dog. Then just add a + facet_wrap(.~subgroup) to your plot.
Hope this will help.

How to melt a dataframe into multiple factors

I have been trying to plot a line plot with ggplot.
My data looks something like this:
I04 F04 I05 F05 I06 F06
CAT 3 12 2 6 6 20
DOG 0 0 0 0 0 0
BIEBER 1 0 0 1 0 0
and can be found here.
Basically, we have a certain number of CATs (or other creatures) initially in a year (this is I04), and a certain number of CATs at the end of the year (this is F04). This goes on for some time.
I can plot something like this fairly simply using the code below, and get this:
This is fantastic, but doesn't work very well for me. After all, I have these staring and ending inventory for each year. So I am interested in seeing how the initial values (I04, I05, I06) change over time. So, for each animal, I would like to create two different lines, one for initial quantity and one for final quantity (F01, F05, F06). This seems to me like now I have to consider two factors.
This is really difficult given the way my data is set up. I'm not sure how to tell ggplot that all the I prefixed years are one factor, and all the F prefixed years are another factor. When the dataframe gets melted, it's too late. I'm not sure how to control this situation.
Any advice on how I can separate these values or perhaps another, better way to tackle this situation?
Here is the code I have:
library(ggplot2)
library(reshape2)
DF <- read.csv("mydata.csv", stringsAsFactors=FALSE)
## cleaning up, converting factors to numeric, etc
text_names <- data.frame(as.character(DF$animals))
names(text_names) <- c("animals")
numeric_cols <- DF[, -c(1)]
numeric_cols <- sapply(numeric_cols, as.numeric)
plot_me <- data.frame(cbind(text_names, numeric_cols))
plot_me$animals <- as.factor(plot_me$animals)
meltedDF <- melt(plot_me)
p <- ggplot()
p <- p + geom_line(aes(seq(1:36), meltedDF$value, group=meltedDF$animals, color=meltedDF$animals))
p
Using your original data from the link:
nd <- reshape(mydata, idvar = "animals", direction = "long", varying = names(mydata)[-1], sep = "")
ggplot(nd, aes(x = time, y = I, group = animals, colour = animals)) + geom_line() + ggtitle("Development of initial inventories")
ggplot(nd, aes(x = time, y = F, group = animals, colour = animals)) + geom_line() + ggtitle("Development of final inventories")
I think from a data analyst perspective the following approach might provide better insight.
For each animal we visualize the initial and the final quantity in a separate panel. Moreover, each subplot has its own y scale because the values of the different animal types are radically different. Like this, differences within and across animal types are easier to spot.
Given the current structure of your data, we do not need two different factors. After the gather call the indicator column includes data like I04, F04, etc. We just need to separate the first character from the rest resulting in two columns type and time. We can use type as the argument for color in the ggplot call. time provides a unified x-axis across all animal types.
library(tidyr)
library(dplyr)
library(ggplot2)
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep = 1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = type)) +
geom_line() +
facet_grid(animals ~ ., scales = "free_y")
Of course, you might also do it the other way round, namely using a subplot for the initial and the final quantities like this:
data %>% gather(indicator, value, -animals) %>%
separate(indicator, c('type', 'time'), sep=1) %>%
mutate(
time = as.numeric(time)
) %>% ggplot(aes(time, value, color = animals)) +
geom_line() +
facet_grid(type ~ ., scales = "free_y")
But as described above, I would not recommend that because the y scale varies too much across animal types.

Resources