I want to create plots from the following data so that each plot's title is the Site. I have the following data frame:
> head(sum_stats)
Season Site Isotope Time n mean sd se
1 Summer Afon Cadnant 14CAA 0 3 100.00000 0.000000 0.0000000
2 Summer Afon Cadnant 14CAA 2 3 68.26976 4.375331 2.5260988
3 Summer Afon Cadnant 14CAA 5 3 69.95398 7.885443 4.5526627
4 Summer Afon Cadnant 14CAA 24 3 36.84054 2.421846 1.3982532
5 Summer Afon Cadnant 14CAA 48 3 27.96619 0.829134 0.4787008
6 Summer Afon Cadnant 14CAA 72 3 26.28713 1.454819 0.8399404
> str(sum_stats)
'data.frame': 648 obs. of 8 variables:
$ Season : Factor w/ 1 level "Summer": 1 1 1 1 1 1 1 1 1 1 ...
$ Site : Factor w/ 27 levels "Afon Cadnant",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Isotope: Factor w/ 4 levels "14CAA","14CGlu",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Time : num 0 2 5 24 48 72 0 2 5 24 ...
$ n : int 3 3 3 3 3 3 3 3 3 3 ...
$ mean : num 100 68.3 70 36.8 28 ...
$ sd : num 0 4.375 7.885 2.422 0.829 ...
$ se : num 0 2.526 4.553 1.398 0.479 ...
I have written a function to create plots of the above data:
plot_func <- function(T){ggplot(data = T) + geom_point(aes(Time, mean, colour = Season)) +
geom_line(aes(Time, mean, colour = Season)) +
geom_errorbar(aes(Time, mean, ymax = (mean + se), ymin = (mean - se)), width = 0.1) +
labs(title = unique(levels(sum_stats$Site)), y = "Percentage of isotope remaining in solution", x = "Time (h)") +
facet_wrap(~Isotope, ncol = 2)} + theme(axis.title.y = element_text(vjust = 1)) +
theme(axis.title.x = element_text(vjust = -0.1)) + theme(plot.title = element_text(vjust = 1)) +
theme_bw()
I then use the function in a by call to run the function over each level of the Site factor:
by(sum_stats, sum_stats$Site, plot_func)
I get 27 graphs of the following form:
However, all the titles are the same. How can I make each title reflect the factor level that it is plotting? Can this be done inside the plotting function?
Thanks
Right now you are setting the title using the original data.frame, not the subset of data passed to your function. If all the sites are the same in the subset you receive, you can just use the first one as the title. Use
...
labs(title = T$Site[1], ...)
...
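For reference, here is a minimal self-contained sketch of that fix. The toy data only mimics the columns of sum_stats: the values and the second site name ("Site B") are made up, and the argument is named d here purely for illustration. Wrapping the title in as.character() is an extra precaution of mine, since Site is a factor.

```r
library(ggplot2)

# Hypothetical toy data with the same columns as sum_stats (values are made up)
toy <- expand.grid(Time = c(0, 2, 5),
                   Isotope = c("14CAA", "14CGlu"),
                   Site = c("Afon Cadnant", "Site B"),
                   stringsAsFactors = TRUE)
toy$Season <- factor("Summer")
toy$mean <- 100 - 5 * toy$Time
toy$se <- 1

plot_func <- function(d) {
  ggplot(d, aes(Time, mean, colour = Season)) +
    geom_point() +
    geom_line() +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1) +
    facet_wrap(~Isotope, ncol = 2) +
    # Title comes from the subset handed to the function, not from the full data
    labs(title = as.character(d$Site[1]),
         x = "Time (h)",
         y = "Percentage of isotope remaining in solution") +
    theme_bw()
}

# One plot per level of Site; each plot carries its own site name as the title
plots <- by(toy, toy$Site, plot_func)
```

Printing plots (or an individual element such as plots[[1]]) draws the figures.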
I'm a relatively new user of R.
I have a data frame "Lossr" as follows:
'data.frame': 100 obs. of 18 variables:
$ plot : chr "3" "1" "5" "1" ...
$ day : Factor w/ 3 levels "0","218","365": 1 1 1 1 1 1 1 1 2 2 ...
$ ID : chr "A014" "A047" "A110" "A125" ...
$ type : chr "litter" "litter" "litter" "litter" ...
$ species : Factor w/ 4 levels "birch leaves",..: 2 3 1 3 4 1 4 2 2 2 ...
$ treat : Factor w/ 2 levels "char","control": 2 2 2 1 2 1 1 1 2 2 ...
$ inimass : num 4.02 4 4.02 4 4.02 4 4.02 4 4.01 4.02 ...
$ inichar : num 0 0 0 0 0 0 0 0 0 0 ...
$ fresh.mass: num 4.02 4 4.02 4 4.02 4 4.02 4 4.62 4.46 ...
$ rem_g : num 4.02 4 4.02 4 4.02 4 4.02 4 3.45 3.55 ...
$ rem : num 100 100 100 100 100 ...
$ W : num 0 0 0 0 0 ...
$ Cot : num NA NA NA NA NA ...
I'm trying to create a bar plot faceted by the factor 'day'.
ggplot(data=Lossr, aes(x=species, y=W)) +coord_cartesian(ylim=c(0,80)) +
scale_colour_manual(values=c("black", "3"))+
stat_summary(fun = mean, geom = 'bar', aes(fill=treat), colour='black', width=0.5, position=position_dodge(0.6)) +
scale_fill_manual(values=c("grey", "green")) +
stat_summary(fun.data = mean_se, geom = 'errorbar', width=0.5, position=position_dodge(0.6), aes(fill=treat)) +
facet_wrap(. ~day) + theme_bw()
The result is
So I wish to exclude "0", which is level 1 of the factor "day".
If I do as follows:
ggplot(data=Lossr, aes(x=species, y=W)) +coord_cartesian(ylim=c(0,80)) + scale_colour_manual(values=c("black", "3"))+
stat_summary(fun = mean, geom = 'bar', aes(fill=treat), colour='black', width=0.5, position=position_dodge(0.6)) +
scale_fill_manual(values=c("grey", "green")) +
stat_summary(fun.data = mean_se, geom = 'errorbar', width=0.5, position=position_dodge(0.6), aes(fill=treat)) +
facet_wrap(~day==c("218", "365")) + theme_bw()
This gives the factor levels I need, but the facet labels turned into logical TRUE and FALSE rather than the days shown in the previous figure.
Could someone help me fix this problem with the labels?
Thank you.
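One common way to approach this is to subset the data before plotting instead of putting a comparison inside facet_wrap(); the facets then keep the original day labels. The sketch below uses a made-up stand-in for Lossr (the values and the second species name are hypothetical). Note also that day == c("218", "365") recycles the two values along the column element by element, so %in% is the safer test for "is one of these levels".

```r
library(ggplot2)

# Hypothetical stand-in for Lossr (values are made up)
set.seed(1)
toy <- data.frame(
  day     = factor(rep(c("0", "218", "365"), each = 8)),
  species = rep(c("birch leaves", "species B"), times = 12),
  treat   = factor(rep(rep(c("char", "control"), each = 2), times = 6)),
  W       = c(rep(0, 8), runif(16, 10, 70))
)

# Keep only the days of interest and drop the now-unused level "0"
keep <- subset(toy, day %in% c("218", "365"))
keep$day <- droplevels(keep$day)

p <- ggplot(keep, aes(x = species, y = W, fill = treat)) +
  stat_summary(fun = mean, geom = "bar", colour = "black", width = 0.5,
               position = position_dodge(0.6)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.5,
               position = position_dodge(0.6)) +
  scale_fill_manual(values = c("grey", "green")) +
  coord_cartesian(ylim = c(0, 80)) +
  facet_wrap(~day) +   # facet labels are now "218" and "365"
  theme_bw()
```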
I am trying to create a series of Violin plots which show average concentration across different regions (separating out hemispheres and conditions).
I keep getting the following error: Error: Discrete value supplied to continuous scale. Any thoughts would be greatly appreciated.
Take care and stay well.
Here is a look at the structure of my data frame:
> str(Oxyhb_V2)
'data.frame': 1028 obs. of 7 variables:
$ ID : chr "B1" "B1" "B1" "B1" ...
$ Name : chr "Happy_HbO_LeftParietal_Value" "Happy_HbO_RightParietal_Value" "Happy_HbO_LeftSTC_Value" "Happy_HbO_RightSTC_Value" ...
$ Values : num -59.33 1.94 -33.85 21.11 -135.14 ...
$ Condition : Factor w/ 2 levels "Happy","ThreatAngryFearful": 1 1 1 1 1 1 1 1 2 2 ...
$ Chromophore: Factor w/ 1 level "HbO": 1 1 1 1 1 1 1 1 1 1 ...
$ Hemisphere : Factor w/ 2 levels "Left","Right": 1 2 1 2 1 2 1 2 1 2 ...
$ ROI : Factor w/ 4 levels "DLPFC","IFC",..: 3 3 4 4 1 1 2 2 3 3 ...
- attr(*, "na.action")= 'omit' Named int [1:520] 9 18 27 36 40 41 43 44 45 49 ...
..- attr(*, "names")= chr [1:520] "9" "27" "45" "63" ...
Here is my current ggplot code
q <- ggplot(Oxyhb_V2, aes(x=Hemisphere, y=Values, color=Condition)) +
facet_wrap(~ROI, scales='free') +
geom_vline(xintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
geom_hline(yintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
labs(x = "Condition", y = "Mean Oxy-Hb (uM)") + #label axes
theme(text=element_text(size=12)) + #set label font size
geom_violin(trim=FALSE) +
geom_boxplot(width=0.1)+
geom_point() +
theme_minimal() #set theme
plot(q)
The error is caused by the geom_vline(xintercept = 0) layer: the x aesthetic (Hemisphere) is discrete, while 0 is a continuous value. Replace 0 with one of the values of your x, for example geom_vline(xintercept = "Left").
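A small reproducible sketch of the corrected plot, assuming made-up data with the same column names as Oxyhb_V2 (only two of the four ROI levels are included to keep it short):

```r
library(ggplot2)

# Hypothetical toy data mimicking the columns of Oxyhb_V2 (values are made up)
set.seed(42)
toy <- data.frame(
  Hemisphere = factor(rep(c("Left", "Right"), each = 40)),
  Condition  = factor(rep(c("Happy", "ThreatAngryFearful"), times = 40)),
  ROI        = factor(rep(rep(c("DLPFC", "IFC"), each = 20), times = 2)),
  Values     = rnorm(80, mean = 0, sd = 30)
)

q <- ggplot(toy, aes(x = Hemisphere, y = Values, color = Condition)) +
  facet_wrap(~ROI, scales = "free") +
  # Discrete x axis, so the vertical line is placed at one of its values
  geom_vline(xintercept = "Left", linetype = "dotted", color = "black", alpha = 0.2) +
  geom_hline(yintercept = 0, linetype = "dotted", color = "black", alpha = 0.2) +
  geom_violin(trim = FALSE) +
  labs(x = "Hemisphere", y = "Mean Oxy-Hb (uM)") +
  theme_minimal()
```

With xintercept = 0 in the first layer, building this plot stops with the "Discrete value supplied to continuous scale" error from the question.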
I have a data visualization question regarding ggplot2.
I'm trying to figure out how I can shade a specific area in my density plot. I googled it a lot and tried all the solutions I found.
My code is:
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
ggplot(data=original_12, aes(original_12$sum)) + geom_density() +
facet_wrap(~sex) +
geom_vline(data=original_12, aes(xintercept=cutoff_12),
linetype="dashed", color="red", size=1)
So, from this:
I want this:
The question on ggplot2 shade area under density curve by group is different than mine because they use different groups and graphs.
Similar to this SO question except the facet adds an additional complexity.
You need to rename the PANEL data as "sex" and convert it to a factor whose labels match your existing faceting variable. Your original "sex" factor is ordered alphabetically (default data.frame option), so "F" comes before "M", which is a little confusing at first.
Make sure you assign your plot to "p" to create a ggplot object:
p <- ggplot(data=original_12, aes(x=sum)) + # refer to the column by name rather than original_12$sum
geom_density() +
facet_wrap(~sex) +
geom_vline(xintercept=cutoff_12, # constant cutoff, so it does not need aes()
linetype="dashed", color="red", size=1)
The ggplot object data can be extracted...here is the structure of the data:
str(ggplot_build(p)$data[[1]])
'data.frame': 1024 obs. of 16 variables:
$ y : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ x : num 17 17 17.1 17.1 17.2 ...
$ density : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ scaled : num 0.0121 0.0128 0.0137 0.0145 0.0154 ...
$ count : num 0.0568 0.0604 0.0644 0.0684 0.0727 ...
$ n : int 50 50 50 50 50 50 50 50 50 50 ...
$ PANEL : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
$ ymax : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ fill : logi NA NA NA NA NA NA ...
$ weight : num 1 1 1 1 1 1 1 1 1 1 ...
$ colour : chr "black" "black" "black" "black" ...
$ alpha : logi NA NA NA NA NA NA ...
$ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
It cannot be used directly because you need to rename the PANEL data and factor it to match your original dataset. You can extract the data from the ggplot object here:
# Build the plot once and reuse the computed density layer
dens <- ggplot_build(p)$data[[1]]
to_fill <- data.frame(
x = dens$x,
y = dens$y,
sex = factor(dens$PANEL, levels = c(1,2), labels = c("F","M")))
# Shade the area at or above the cutoff
p + geom_area(data = to_fill[to_fill$x >= cutoff_12, ],
aes(x=x, y=y), fill = "red")
#DATA
set.seed(2)
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
#Calculate density for each sex
temp = do.call(rbind, lapply(split(original_12, original_12$sex), function(a){
d = density(a$sum)
data.frame(sex = a$sex[1], x = d$x, y = d$y)
}))
#For each sex, separate the data for the shaded area
temp2 = do.call(rbind, lapply(split(temp, temp$sex), function(a){
rbind(data.frame(sex = a$sex[1], x = cutoff_12, y = 0), a[a$x > cutoff_12,])
}))
#Plot
ggplot(temp) +
geom_line(aes(x = x, y = y)) +
geom_vline(xintercept = cutoff_12) +
geom_polygon(data = temp2, aes(x = x, y = y)) +
facet_wrap(~sex) +
theme_classic()
Hej hej,
I would like to calculate growth rates, storing them in a new column of my data frame e.g. named growth.per.day. I am - as always - looking for a way that doesn't include hundreds and hundreds of lines of manually edited code.
I have six levels of algae and 25 levels of nutrients.
This means I have 150 "subgroups" for which I want to calculate the rates. Those subsets differ in length depending on the individual algae.
So, basically:
Algae A ->
Nutrient (1) -> C.mikro.gr.L (Day 2) - C.mikro.gr.L (Day 1),C.mikro.gr.L (Day 3) - C.mikro.gr.L (Day 2) ... ;
Nutrient (2) -> C.mikro.gr.L (Day 2) - C.mikro.gr.L (Day 1),C.mikro.gr.L (Day 3) - C.mikro.gr.L (Day 2) ... etc.
I already split the data frame by algae
X <- split(data, data$ALGAE)
names(X) <- c("ANKI", "CHLAMY", "MIX_A", "MIX_B", "SCENE", "STAURA")
list2env(X, envir = .GlobalEnv)
and I have also split those again, creating the aforementioned lovely 150 subsets. Then I applied
ratio1$growth.per.day <- c(NA,ratio1[2:nrow(ratio1), 16] - ratio1[1:(nrow(ratio1)-1), 16])
which is perfect and does what I want, BUT I would really very much appreciate a shorter, more elegant way without butchering my data frame.
'data.frame': 3550 obs. of 16 variables:
$ SAMPLE.ID : Factor w/ 150 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ COMMUNITY : chr "com.1" "com.1" "com.1" "com.1" ...
$ NUTRIENT : Factor w/ 25 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ RATIO : Factor w/ 23 levels "3.2","4","5.4",..: 11 9 6 4 1 14 10 8 5 2 ...
$ PHOS : Factor w/ 5 levels "0.09","0.195",..: 5 5 5 5 5 4 4 4 4 4 ...
$ NIT : Factor w/ 5 levels "1.5482","3.0964",..: 5 4 3 2 1 5 4 3 2 1 ...
$ DATUM : Factor w/ 35 levels "30.08.16","31.08.16",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DAY : int 0 0 0 0 0 0 0 0 0 0 ...
$ TYPE : chr "mono" "mono" "mono" "mono" ...
$ ALGAE : Factor w/ 6 levels "ANK","CHLA","MIX A",..: 5 5 5 5 5 5 5 5 5 5 ...
$ MEAN : num 864 868 882 873 872 ...
$ GROW : num 0.00116 0.00115 0.00113 0.00115 0.00115 ...
$ FLUORO : num NA NA NA NA NA NA NA NA NA NA ...
$ MEAN.MQ : num 0.964 0.969 0.985 0.975 0.973 ...
$ GROW.MQ : num 1.04 1.03 1.02 1.03 1.03 ...
$ C.mikro.gr.L: num -764 -913 -1394 -1085 -1039 ...
I hope this sufficiently describes the problem,
Thanks so much!
Hope it is what you asked for:
df = data.frame(algae = sort(rep(LETTERS[1:6], 20)),
nutrient = rep(letters[22:26], 24),
day = rep(c(rep(1, 5),
rep(2, 5),
rep(3, 5),
rep(4, 5)), 6),
growth = runif(120, 30, 60))
library(dplyr)
df = df %>% group_by(algae, nutrient) %>% mutate(rate = c(NA, diff(growth, lag = 1)))
And here is the table for alga A and nutrient v:
algae nutrient day growth rate
<fctr> <fctr> <dbl> <dbl> <dbl>
1 A v 1 48.68547 NA
2 A v 2 55.63570 6.950232
3 A v 3 53.28569 -2.350013
4 A v 4 44.83022 -8.455465
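Applied to the column names from the question, the same idea might look like the sketch below. It uses base R's ave() instead of dplyr, which is a swap on my part rather than part of the answer above. The toy values are made up, and it assumes there is one row per DAY within each ALGAE/NUTRIENT combination. The rows are sorted first so that diff() always compares consecutive days.

```r
# Hypothetical toy data using the question's column names (values are made up)
toy <- data.frame(
  ALGAE        = rep(c("ANK", "CHLA"), each = 6),
  NUTRIENT     = rep(rep(c("1", "2"), each = 3), times = 2),
  DAY          = rep(0:2, times = 4),
  C.mikro.gr.L = c(10, 13, 19, 5, 6, 9, 20, 24, 30, 1, 4, 4)
)

# Sort so that days are in order within every ALGAE/NUTRIENT subgroup
toy <- toy[order(toy$ALGAE, toy$NUTRIENT, toy$DAY), ]

# Difference to the previous day within each subgroup; the first day has no predecessor
toy$growth.per.day <- ave(toy$C.mikro.gr.L, toy$ALGAE, toy$NUTRIENT,
                          FUN = function(v) c(NA, diff(v)))
```

If the sampling days are not evenly spaced, dividing each difference by the corresponding difference in DAY would give a true per-day rate.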
I need some help with these lines of code.
My data set:
> str(data.tidy)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 9480 obs. of 11 variables:
$ Country.Name : Factor w/ 248 levels "Afghanistan",..: 234 12 13 20 22 31 17 16 25 28 ...
$ Country.Code : Factor w/ 248 levels "ABW","AFG","AGO",..: 7 12 13 16 17 18 19 21 27 28 ...
$ Year : Factor w/ 56 levels "1960","1961",..: 1 1 1 1 1 1 1 1 1 1 ...
$ InfantMortality : num 137.3 20.3 37.3 29.5 186.9 ...
$ AdolFertilityRate: num 176.9 44.8 48.4 27.1 85.8 ...
$ FertilityRate : num 6.93 3.45 2.69 2.54 6.28 ...
$ LifeExpectancy : num 52.2 70.8 68.6 69.7 37.3 ...
$ TotalUnemp : num NA NA NA NA NA NA NA NA NA NA ...
$ TotalPop : num 92612 10276477 7047539 9153489 2431620 ...
$ Region : Factor w/ 8 levels "","East Asia & Pacific",..: 5 2 3 3 8 8 7 5 4 4 ...
$ IncomeGroup : Factor w/ 6 levels "","High income: nonOECD",..: 2 3 3 3 4 4 5 2 5 6 ...
Reference code that I want to 'functionize':
ggplot(data.tidy,aes(as.numeric(as.character(Year)),y=InfantMortality))+
geom_line(aes(color=Country.Name))+
facet_grid(.~IncomeGroup)+
theme(legend.position="none")+
theme(strip.text.x = element_text(size = 7))+
labs(x='Year', title='Change in mortality rate over time')+
geom_smooth(color='black')
I want to replace data.tidy, InfantMortality, IncomeGroup and title in the example above.
Here was my attempt at the code:
facetedlineplot <- function(df,y,facet,title){
ggplot(df,aes(as.numeric(as.character(Year)),y=y))+
geom_line(aes(color=Country.Name))+
facet_grid(.~facet)+
theme(legend.position="none")+
theme(strip.text.x = element_text(size = 7))+
labs(x='Year',title=title)+
geom_smooth(color='black')
}
The error:
> facetedlineplot(data.tidy,y = 'InfantMortality',facet = 'IncomeGroup',title = 'Title goes here')
Error in layout_base(data, cols, drop = drop) :
At least one layer must contain all variables used for facetting
I have tried aes_string, but I couldn't get it to work. What does the error mean? How can I work around this issue?
Update:
I have some code that partially works now, using reformulate()
facetedlineplot <- function(df,y,facet,title){
year <- as.numeric(as.character(df$Year))
ggplot(df,aes(x=year,y=y))+
geom_line(aes(color=Country.Name))+
facet_grid(paste('.~',reformulate(facet)))+
theme(legend.position="none")+
theme(strip.text.x = element_text(size = 7))+
labs(x='Year',title=title)+
geom_smooth(color='black')
}
> facetedlineplot(data.tidy,y = 'InfantMortality', facet = 'IncomeGroup', title = 'Title goes here')
Warning message:
Computation failed in `stat_smooth()`:
x has insufficient unique values to support 10 knots: reduce k.
>
This still produces an incorrect plot.
Thank you in advance,
Rahul
I have the solution. Three steps worked for me:
- Change the datatype of the Year variable in data.tidy from factor to numeric.
- Use aes_string for the ggplot argument.
- For facet_grid(), several things worked:
  - Use as.formula() to pass '~IncomeGroup'
  - Just pass '~IncomeGroup' directly to facet_grid()
Final code:
facetedlineplot <- function(df,y,facet,title){
ggplot(df,aes_string(x = 'Year', y = y))+
geom_line(aes(color=Country.Name))+
facet_grid(facet)+
theme(legend.position="none")+
theme(strip.text.x = element_text(size = 9))+
labs(x='Year',title=title)+
geom_smooth(color='black')
}
d <- data.tidy
d$Year <- as.numeric(as.character(d$Year))
facetedlineplot(d,'InfantMortality','~IncomeGroup','Title')
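Some background on the earlier attempts: facet_grid(.~facet) looks for a column literally named facet, which is why ggplot complains that no layer contains the facetting variable, and aes(y = y) with y = 'InfantMortality' maps the constant string rather than the column. Since aes_string() is deprecated in newer ggplot2 releases, a variant using the .data pronoun may be worth considering. The sketch below is only an illustration: the function name facetedlineplot2 and the toy data are made up, and geom_smooth() is left out because the toy data is too small for it.

```r
library(ggplot2)

# Hypothetical toy data standing in for data.tidy (values are made up)
toy <- data.frame(
  Country.Name    = rep(c("A", "B", "C", "D"), each = 5),
  Year            = factor(rep(1960:1964, times = 4)),
  InfantMortality = seq(100, 43, by = -3),
  IncomeGroup     = rep(c("High", "Low"), each = 10)
)
# Same first step as in the answer: convert Year from factor to numeric
toy$Year <- as.numeric(as.character(toy$Year))

facetedlineplot2 <- function(df, y, facet, title) {
  ggplot(df, aes(x = Year, y = .data[[y]])) +    # look the column up by its name
    geom_line(aes(color = Country.Name)) +
    facet_grid(as.formula(paste(". ~", facet))) + # build the facet formula from a string
    theme(legend.position = "none") +
    theme(strip.text.x = element_text(size = 9)) +
    labs(x = "Year", y = y, title = title)
}

p <- facetedlineplot2(toy, "InfantMortality", "IncomeGroup", "Title")
```

On the real data, geom_smooth(color = 'black') can be added back as in the final code above.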