I have a data set similar to the one below where I have a lot of data for certain groups and then only single observations for other groups. I would like my single observations to show up as points but the other groups with multiple observations to show up as lines (no points). My code is below:
EDIT: I'm attempting to find a way to do this without using multiple datasets in the geom_* calls because of the issues it causes with the legend. There was an answer that has since been deleted that was able to handle the legend but didn't get rid of the points on the lines. I would potentially like a single legend with points only showing up if they are a single observation.
library(tidyverse)
dat <- tibble(x = runif(10, 0, 5),
y = runif(10, 0, 20),
group = c(rep("Group1", 4),
rep("Group2", 4),
"Single Point 1",
"Single Point 2")
)
dat %>%
ggplot(aes(x = x, y = y, color = group)) +
geom_point() +
geom_line()
Created on 2019-04-02 by the reprex package (v0.2.1)
Only plot the data with 1 point in geom_point() and the data with >1 point in geom_line(). These can be precalculated in mutate().
dat = dat %>%
group_by(group) %>%
mutate(n = n() )
dat %>%
ggplot(aes(x = x, y = y, color = group)) +
geom_point(data = filter(dat, n == 1) ) +
geom_line(data = filter(dat, n > 1) )
Having the legend match this is trickier. This is the sort of thing that the override.aes argument in guide_legend() can be useful for.
In your case I would separately calculate the number of observations in each group first, since that is what the line vs point is based on.
sumdat = dat %>%
group_by(group) %>%
summarise(n = n() )
The result is in the same order as the factor levels in the legend, which is why this works.
Now we need to remove lines and keep points whenever the group has only a single observation. For linetype, 0 stands for a blank line, and for shape, NA stands for no shape. I use an ifelse() statement to build the linetype and shape vectors for override.aes, based on the number of observations per group.
dat %>%
ggplot(aes(x = x, y = y, color = group)) +
geom_point(data = filter(dat, n == 1) ) +
geom_line(data = filter(dat, n > 1) ) +
guides(color = guide_legend(override.aes = list(linetype = ifelse(sumdat$n == 1, 0, 1),
shape = ifelse(sumdat$n == 1, 19, NA) ) ) )
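The override only works because the two ifelse() vectors line up with the legend keys, which follow the factor levels of group. Here is a small base-R sketch of that alignment. It assumes the same group labels as the question, and the names counts, linetype and shape are just for illustration:

```r
group <- c(rep("Group1", 4), rep("Group2", 4), "Single Point 1", "Single Point 2")

# table() counts rows per factor level, in level order, which is also the legend order
counts <- as.vector(table(factor(group)))
levels(factor(group))  # "Group1" "Group2" "Single Point 1" "Single Point 2"

# linetype 0 = blank line; shape NA = no point drawn in the key
linetype <- ifelse(counts == 1, 0, 1)
shape <- ifelse(counts == 1, 19, NA)
```

If the groups were ever reordered (for example with a custom factor level order), the counts would need to be computed on that same factor so the vectors stay aligned with the keys.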
I am trying to create a plot in R that shows post-surgical outcomes over time. Each row in the dataframe has up to 8 measurements at different time points post-surgery (with some missing values), and for each row, I want to create a line graph that shows the change in the measurement over time. Here is an example dataframe:
dat <- data.frame(Preop=c(-2,0.5,-0.25,1.5), PO_1M=c(-1.5,0.2,-0.1,1.0), PO_6M=c(-1.2,0.1,-0.05,0.5), PO_1Y=c(-1.0,0.05,0,0.25))
dat
I have tried the following code to rearrange the data and get a plot with points over time, but I want to change this so that each row is maintained and can create a line graph.
library(tidyverse)
dat2<-dat %>% tidyr::pivot_longer(cols=Preop:PO_1Y)
dat2$nummonths<-ifelse(dat2$name=='Preop',0,
ifelse(dat2$name=='PO_1M',1,
ifelse(dat2$name=='PO_6M',6,
ifelse(dat2$name=='PO_1Y',12,NA))))
ggplot(dat2, aes(nummonths,value))+geom_point()
I want the graph to look something like this:
Currently, I have the points plotted, but I do not know how to connect these points to create a line graph. Thanks so much for any help!
You need an id of sorts in the data in order to group by it and plot it accordingly. Here's a dplyr suggestion to get you started:
library(dplyr)
# library(tidyr) # pivot_longer
library(ggplot2)
dat %>%
mutate(id = factor(row_number())) %>%
tidyr::pivot_longer(cols=Preop:PO_1Y) %>%
mutate(NumMonths = case_when(name == "Preop" ~ 0, name == "PO_1M" ~ 1, name == "PO_6M" ~ 6, name == "PO_1Y" ~ 12, TRUE ~ NA_real_)) %>%
ggplot(aes(NumMonths, value)) + geom_path(aes(group = id, color = id))
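The id column is what lets geom_path() draw one line per row of the original data. As a rough base-R sketch of the same reshaping, consider the following. The name long is hypothetical, and the result holds the same id/name/value triples as mutate(id = ...) plus pivot_longer(), only in a different row order:

```r
dat <- data.frame(Preop = c(-2, 0.5, -0.25, 1.5), PO_1M = c(-1.5, 0.2, -0.1, 1.0),
                  PO_6M = c(-1.2, 0.1, -0.05, 0.5), PO_1Y = c(-1.0, 0.05, 0, 0.25))

# unlist() walks the data frame column by column, so the row id repeats
# 1..nrow for every measurement column, and each column name repeats nrow times
long <- data.frame(
  id = factor(rep(seq_len(nrow(dat)), times = ncol(dat))),
  name = rep(names(dat), each = nrow(dat)),
  value = unlist(dat, use.names = FALSE)
)
```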
An alternative (in place of case_when) is to define a lookup table of Months that maps names to number of months, and then you can easily use this to add some context to your plot:
Months <- tibble(
name = c("Preop", "PO_1M", "PO_6M", "PO_1Y"),
NumMonths = c(0, 1, 6, 12)
)
dat %>%
mutate(id = factor(row_number())) %>%
tidyr::pivot_longer(cols=Preop:PO_1Y) %>%
left_join(., Months, by = "name") %>%
ggplot(aes(NumMonths, value)) +
geom_text(aes(y = -Inf, label = name), data = Months, hjust = 0, vjust = 0, angle = 90) +
geom_path(aes(group = id, color = id)) +
geom_point(aes(group = id, color = id))
While it can arguably be improved aesthetically, I think the structure of it should be clear enough.
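If you'd rather avoid both case_when() and a join, a named vector can serve as a base-R lookup table. This is an alternative technique, not what the answer above uses, and month_map is a hypothetical name:

```r
# Names are the column labels, values are months after surgery
month_map <- c(Preop = 0, PO_1M = 1, PO_6M = 6, PO_1Y = 12)

# Indexing by name returns the matching months; unname() drops the labels
unname(month_map[c("Preop", "PO_6M", "PO_1Y")])  # 0 6 12

# An unknown label gives NA, like the TRUE ~ NA_real_ fallback in case_when()
unname(month_map["PO_2Y"])  # NA
```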
I am trying to avoid plotting lines beyond the first and last zero, to avoid this overlap. Please note that this is just toy data from a much bigger data set, and that simply filtering out the 0s does not work in this case.
dta <-
data.frame(grp = c(rep("a",10), rep("b",10),rep("c",10)),
lines = c(rep(seq(1,10,1),3)),
vc = c(c(0,0,0,0,.3,.3,.1, 0,0,0),
c(.1,.3,.3,.3,.1, 0,0,0,0,0),
c(0,0,0,0,0, 0,0,0,0,0)))
dta %>%
ggplot(aes(lines, vc, color = grp))+
geom_line()+
scale_x_continuous(
breaks = seq(0, 10, 1)
)+
scale_y_continuous(
limits = c(-0.01, 1),
breaks = seq(0, 1, 0.1)
)
Any ideas on how to remove these lines, please? For example, the blue line should stop at x=6.
If I set 0 to NA, the lines do not go down to the x-axis.
dta %>%
mutate(vc = ifelse(vc==0, NA, vc)) %>%
ggplot(aes(lines, vc, color = grp))+
geom_line()+
scale_x_continuous(
breaks = seq(0, 10, 1)
) +
scale_y_continuous(
limits = c(-0.01, 1),
breaks = seq(0, 1, 0.1)
)
I need the blue line to go down to the x-axis and then stop. This goes for all other lines.
Here is a working solution with tidyverse:
library(dplyr)
library(ggplot2)
dta %>%
group_by(grp) %>%
# NA out a zero only when the values on both sides of it are also zero
mutate(across(-lines,
~ ifelse(lag(.) == 0 & . == 0 & lead(.) == 0, NA, .))) %>%
ggplot(aes(lines, vc, color = grp)) +
geom_line()
Produces this plot:
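To see what the lag()/lead() condition does to a single group, here is a base-R sketch on group "a" from the toy data. blank_inner_zeros() is a hypothetical helper; head() and tail() stand in for lag() and lead() and pad the ends with NA the same way:

```r
blank_inner_zeros <- function(v) {
  prev <- c(NA, head(v, -1))   # previous value, NA for the first element
  nxt <- c(tail(v, -1), NA)    # next value, NA for the last element
  # A zero becomes NA when both neighbours are zero. At either end the missing
  # neighbour makes the test NA (not FALSE) when the other two parts are TRUE,
  # and ifelse() returns NA there too, so an end zero next to another zero is blanked
  ifelse(prev == 0 & v == 0 & nxt == 0, NA, v)
}

blank_inner_zeros(c(0, 0, 0, 0, .3, .3, .1, 0, 0, 0))
# returns NA NA NA 0 0.3 0.3 0.1 0 NA NA
```

One difference from trimming: this rule also blanks the middle of any interior run of three or more zeros, so a group that drops to zero and later rises again would be drawn as two separate line pieces.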
This solution is somewhat verbose, but I believe it does what you need. It can be applied to a grouped data frame. For each group, given a column name as input, it trims away rows at the beginning and end where that column is equal to zero, but importantly it retains a zero at the beginning and end.
Function definition
The function uses tidy evaluation for the column name by which to trim the data frame. The statements using which() find the runs of zeroes at the beginning and end, if present, and retain the last zero before the nonzero entries and the first zero after them.
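trim_zero <- function(data, column) {
x0 <- pull(data, {{ column }}) == 0
# An all-zero group has no nonzero run to trim around, so return it as-is
# (otherwise min()/max() on an empty index vector warn and the rows come back reversed)
if (all(x0)) return(data)
beginning_0 <- max(which(x0)[which(x0) < min(which(!x0))], 1)
ending_0 <- min(which(x0)[which(x0) > max(which(!x0))], length(x0))
data[beginning_0:ending_0, ]
}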
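To make the which() logic concrete, here is the same index arithmetic applied to group "a" of the toy data as a plain vector. This is only a sketch of the core of trim_zero() without the data frame handling, and the intermediate names are hypothetical:

```r
vc <- c(0, 0, 0, 0, .3, .3, .1, 0, 0, 0)
x0 <- vc == 0

first_nonzero <- min(which(!x0))  # 5
last_nonzero <- max(which(!x0))   # 7

# Last zero before the nonzero run (falls back to 1 if there is none)
beginning_0 <- max(which(x0)[which(x0) < first_nonzero], 1)      # 4
# First zero after the nonzero run (falls back to the last index if there is none)
ending_0 <- min(which(x0)[which(x0) > last_nonzero], length(x0))  # 8

vc[beginning_0:ending_0]  # 0 0.3 0.3 0.1 0
```

For group "b", which has no leading zeros, the fallback of 1 keeps the first row and the kept range runs from x = 1 to x = 6.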
Applying the function to your data
library(dplyr)
library(ggplot2)
dta_trimmed <- dta %>%
group_by(grp) %>%
group_modify(~ trim_zero(., vc))
ggplot(dta_trimmed, aes(lines, vc, color = grp))+
geom_line()+
scale_x_continuous(
breaks = seq(0, 10, 1)
)+
scale_y_continuous(
limits = c(-0.01, 1),
breaks = seq(0, 1, 0.1)
)
library(tidyverse)
dta <-
data.frame(grp = c(rep("a",10), rep("b",10)),
lines = c(rep(seq(1,10,1),2)),
vc = c(c(0,0,0,0,.3,.3,.1, 0,0,0),
c(.1,.3,.3,.3,.1, 0,0,0,0,0)))
dta %>%
filter(vc > 0) %>%
ggplot(aes(lines, vc, color = grp))+
geom_line()
Created on 2021-06-05 by the reprex package (v2.0.0)
Here is code to give context to my question:
set.seed(1); tibble(x=factor(sample(LETTERS[1:7],7,replace = T),levels = LETTERS[1:7])) %>% group_by_all() %>% count(x,.drop = F) %>%
ggplot(mapping = aes(x=x,y=n))+geom_bar(stat="identity")+geom_text(
aes(label = n, y = n + 0.05),
position = position_dodge(1),
vjust = 0)
I want ALL of the levels of the variable x to be displayed on the x-axis (LETTERS[1:7]). For each Level with n>0, I want the value to display atop the bar for that level. For each level with n==0, I want the value label to NOT be displayed. Currently, the plot displays the 0 for 'empty' factor levels c("C","F"), and I want to suppress the display of '0's for those levels, but still display "C", and "F" on the x-axis.
I hope someone might be able to help me.
Thanks.
A simple ifelse() will do it. You can enter any text you like for the empty levels, for example ifelse(n > 0, n, "No Data").
library( tidyr)
library( ggplot2)
library( dplyr )
set.seed(1); tibble(x=factor(sample(LETTERS[1:7],7,replace = T),levels = LETTERS[1:7])) %>% group_by_all() %>% count(x,.drop = F) %>%
ggplot(mapping = aes(x=x,y=n))+geom_bar(stat="identity")+
geom_text(
aes(label = ifelse( n>0, n , ""), y = n + 0.05),
position = position_dodge(1),
vjust = 0)
You can pass a function to the data argument inside geom_text(); for this example, you can subset the piped data (referred to as .x):
set.seed(1);
tibble(x=factor(sample(LETTERS[1:7],7,replace = T),levels = LETTERS[1:7])) %>% group_by_all() %>% count(x,.drop = F) %>%
ggplot(mapping = aes(x=x,y=n))+geom_bar(stat="identity")+
geom_text(data=~subset(.x,n>0),
aes(label = n, y = n + 0.05),
position = position_dodge(1),
vjust = 0)
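The two answers differ in what reaches geom_text(): ifelse() keeps every row and gives empty levels a blank label, while data = ~subset(.x, n > 0) drops those rows from the text layer. Here is a base-R sketch with hypothetical counts that shows both:

```r
# Hypothetical per-level counts, with B and D empty
counts <- data.frame(x = factor(LETTERS[1:4]), n = c(2L, 0L, 1L, 0L))

# ifelse() keeps all four rows; note that it coerces n to character
labels <- ifelse(counts$n > 0, counts$n, "")

# subset() keeps only the rows the text layer would draw
label_rows <- subset(counts, n > 0)
```

In both cases the bar layer still receives the full data, so the empty levels stay on the x-axis.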
I am using ggforce to create a plot like this.
My goal is to facet this type of plot.
For background on how the chart was made, check out update 3 on this question. The only modification that I have made was adding a geom_segment between the x axis and the Y value positions.
The reason I believe faceting this graph is difficult, or even impossible, is that continuous x coordinates are used to determine where each geom_arc_bar is positioned.
My only idea for getting this to work has been to supply each "characteristic" that I want to facet with its own set of x coordinates (1,2,3). Initially, as I will demonstrate in my code, I worked with a set of highly curated data. Ideally, I would like to scale this to a dataset with many variables.
In the example graph that I have provided, the Y value is from table8, filtered for rows with "DFT". The area of the half-circles is proportional to the values of DDFS and FDFS from table9. Ideally, I would like to be able to create a function allowing for the easy creation of these graphs, with perhaps 3 parameters: the data for the y value, and the data for each of the two half circles.
Here is my data.
Here is the code that I have written thus far.
For making a single plot
#Filter desired Age and Measurement
table9 %>%
filter(Age == "6-11" & Measurement != 'DFS' ) %>%
select( SurveyYear, Total , Measurement ) %>%
arrange(SurveyYear) %>%
dplyr::rename(Percent = Total) -> table9
#Do the same for table 8.
table8 %>%
filter(Age == "6-11" & Measurement != "DS" & Measurement != "FS") %>%
select(SurveyYear, Total) %>%
dplyr::rename(Y = Total)-> table8
table8 <- table8 %>%
bind_rows(table8) %>%
arrange(Y) %>%
add_column(start = rep(c(-pi/2, pi/2), 3), x = c(1,1,2,2,3,3))
table8_9 <- bind_cols(table8,table9) %>%
select(-SurveyYear1)
#Create the plot
ggplot(table8_9) + geom_segment( aes(x=x, xend=x, y=0, yend=Y), size = 0.5, linetype="solid") +
geom_arc_bar(aes(x0 = x, y0 = Y, r0 = 0, r = sqrt((Percent*2)/pi)/20,
start = start, end = start + pi, fill = Measurement),
color = "black") + guides(fill = guide_legend(title = "Type", reverse = T)) +
guides(fill = guide_legend(title = "Measurement", reverse = F)) +
xlab("Survey Year") + ylab("Mean dfs") + coord_fixed() + theme_pubr() +
scale_y_continuous(expand = c(0, 0), limits = c(0, 5.5)) +
scale_x_continuous(breaks = 1:3, labels = paste0(c("1988-1994", "1999-2004", "2011-2014"))) +
scale_fill_discrete(labels = c("ds/dfs", "fs/dfs")) -> lolliPlot
lolliPlot
Attempt at many plots
#Filter for "DFS"
table8 <- table8 %>%
filter(Measurement=="DFS")
#Duplicate DF vertically, and add column specifying the start point for the arcs.
table8 <- table8 %>%
bind_rows(table8) %>%
add_column(start = rep(c(-pi/2, pi/2), length(.$SurveyYear)/2), x = rep(x = c(1,2,3),length(.$SurveyYear)/3)) %>%
arrange(Age, x)
#Bind the two tables together, removing all of the characteristic columns from table 8.
table8_9 <- bind_cols(table8,table9) %>%
select(-Age1, -SurveyYear1, -Measurement) %>%
gather(key = Variable, value = Y, -x,-start,-Age, -SurveyYear, -Measurement1, -Total1, -Male1, -Female1, -'White, non-Hispanic1', -'Black, non-hispanic1', -'Mexican American1', -'Less than 100% FPG1', -'100-199% FPG1', -'Greater than 200% FPG1')
This is where I get stuck. I can't figure out a way to format the data so that I can facet the graph. If anybody has any ideas or advice, I would greatly appreciate it.
I've got a question regarding an edge case with ggplot2 in R.
ggplot2 doesn't really support separate legends for the same aesthetic, but I think this is a valid use case.
I've got a large economic dataset with the following variables.
year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of labor (eg. plumbers OR building supplies respectively)
value = percentage of industry spending
And I'm building an area chart covering approximately 15 years. There are 39 different input descriptions, so I'd like the user to see the two major components (internal employee spending OR outsourcing/supply spending) in two major color families (say green and blue), but ggplot won't let me group my colors that way.
Here are a few things I tried.
Junk code to reproduce
spec_trend_pie<- data.frame("year"=c(2006,2006,2006,2006,2007,2007,2007,2007,2008,2008,2008,2008),
"input_type" = c("labor", "labor", "supply", "supply", "labor", "labor","supply","supply","labor","labor","supply","supply"),
"input_desc" = c("plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck"),
"value" = c(1,2,3,4,4,3,2,1,1,2,3,4))
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()
Which gave me
Error in f(...) : Aesthetics can not vary with a ribbon
And then I tried this
sff4 <- ggplot() +
geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc))
Which gave me this image...so closer...but not quite there.
To give you an idea of what is desired, here's an example of something I was able to do in Google Sheets a long time ago.
It's a bit of a hack but forcats might help you out. I did a similar post earlier this week:
How to factor sub group by category?
First some base data
library(tidyverse)
set.seed(123)
raw_data <-
tibble(
x = rep(1:20, each = 6),
rand = sample(1:120, 120) * (x/20),
group = rep(letters[1:6], times = 20),
cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
) %>%
group_by(group) %>%
mutate(y = cumsum(rand)) %>%
ungroup()
Now, use factor levels to create gradients within colors
df <-
raw_data %>%
# create factors for group and category
mutate(
group = fct_reorder(group, y, max),
cat = fct_reorder(cat, y, max) # ordering in the stack
) %>%
arrange(cat, group) %>%
mutate(
group = fct_inorder(group), # takes the category into account first
group_fct = as.integer(group), # factor as integer
hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
light_base = 1-(group_fct)/(n_distinct(group)+2), # falls as group_fct rises; the +2 keeps it above 0
light = floor(light_base * 100) # new L value for hcl()
) %>%
mutate(hex = hcl(h = hue, l = light))
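The lightness line is the least obvious part, so here is its arithmetic worked through in base R for the setup above: 6 groups in 2 categories. grDevices::hcl() is the same function the answer calls; the variable names only mirror the answer's columns:

```r
n_groups <- 6
group_fct <- 1:n_groups  # groups numbered in factor order, category 1 first

# 1 - k / (n + 2) stays strictly between 0 and 1, so every group gets a
# distinct lightness; the +2 keeps the darkest value at 2 / (n + 2)
light <- floor((1 - group_fct / (n_groups + 2)) * 100)
light  # 87 75 62 50 37 25

# Base hues: category 1 -> 180, category 2 -> 360 (same as 0 on the colour wheel)
hue <- (1:2) * (360 / 2)

# Category 1 holds groups 1-3, category 2 holds groups 4-6
shades_cat1 <- hcl(h = hue[1], l = light[1:3])
shades_cat2 <- hcl(h = hue[2], l = light[4:6])
```

Because group_fct runs across both categories, the second category gets the darker half of the lightness range rather than restarting from light shades.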
Create a lookup table for scale_fill_manual()
area_colors <-
df %>%
distinct(group, hex)
Lastly, make your plot
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "stack") +
scale_fill_manual(
values = area_colors$hex,
labels = area_colors$group
)