I have a data visualization question regarding ggplot2.
I'm trying to figure out how to shade a specific area in my density plot. I searched a lot and tried every solution I found.
My code is:
library(ggplot2)
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
# map the column by name; original_12$sum inside aes() is fragile with facets
ggplot(data=original_12, aes(x=sum)) + geom_density() +
  facet_wrap(~sex) +
  geom_vline(xintercept=cutoff_12,
             linetype="dashed", color="red", size=1)
So, from the plot this produces, I want a version with the area to the right of the cutoff shaded in each facet.
The question "ggplot2 shade area under density curve by group" is different from mine because it uses different groups and graphs.
Similar to this SO question except the facet adds an additional complexity.
You need to rename the PANEL data as "sex" and convert it to a factor that matches the faceting variable you already use. The panels follow the alphabetical order of "sex" (F, then M), not the order in which the values appear in the data, which is a little confusing at first.
Make sure you assign your plot to a name, here "p", so that you have a ggplot object to work with:
# map the column by name; original_12$sum inside aes() is fragile with facets
p <- ggplot(data=original_12, aes(x=sum)) +
  geom_density() +
  facet_wrap(~sex) +
  geom_vline(xintercept=cutoff_12,
             linetype="dashed", color="red", size=1)
The data computed for the plot can be extracted from the ggplot object. Here is its structure:
str(ggplot_build(p)$data[[1]])
'data.frame': 1024 obs. of 16 variables:
$ y : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ x : num 17 17 17.1 17.1 17.2 ...
$ density : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ scaled : num 0.0121 0.0128 0.0137 0.0145 0.0154 ...
$ count : num 0.0568 0.0604 0.0644 0.0684 0.0727 ...
$ n : int 50 50 50 50 50 50 50 50 50 50 ...
$ PANEL : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
$ ymax : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ fill : logi NA NA NA NA NA NA ...
$ weight : num 1 1 1 1 1 1 1 1 1 1 ...
$ colour : chr "black" "black" "black" "black" ...
$ alpha : logi NA NA NA NA NA NA ...
$ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
The extracted data cannot be used directly because PANEL has to be renamed and converted to a factor that matches the facet variable in your original dataset. You can extract the data from the ggplot object like this:
built <- ggplot_build(p)$data[[1]]  # build once and reuse the result
to_fill <- data.frame(
  x = built$x,
  y = built$y,
  sex = factor(built$PANEL, levels = c(1,2), labels = c("F","M")))
# shade from the cutoff upwards instead of repeating the literal 35
p + geom_area(data = to_fill[to_fill$x >= cutoff_12, ],
              aes(x=x, y=y), fill = "red")
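The labels c("F","M") above are hard-coded, so they rely on knowing the panel order. A minimal sketch of looking the facet value up instead, assuming a ggplot2 version (3.0 or later) in which the built plot exposes a layout table at ggplot_build(p)$layout$layout with one row per panel and a column for the facet variable. The names built, dens, panels and p_shaded are only illustrative.

```r
library(ggplot2)

set.seed(2)
original_12 <- data.frame(sum = rnorm(100, 30, 5), sex = c("M", "F"))
cutoff_12 <- 35

p <- ggplot(original_12, aes(x = sum)) +
  geom_density() +
  facet_wrap(~sex) +
  geom_vline(xintercept = cutoff_12, linetype = "dashed", color = "red")

built  <- ggplot_build(p)
dens   <- built$data[[1]]      # computed density, one row per x value and panel
panels <- built$layout$layout  # assumed columns: PANEL and the facet variable sex

# Look up each row's facet value through its PANEL instead of hard-coding labels
to_fill <- data.frame(
  x   = dens$x,
  y   = dens$y,
  sex = panels$sex[match(dens$PANEL, panels$PANEL)]
)

p_shaded <- p +
  geom_area(data = to_fill[to_fill$x >= cutoff_12, ],
            aes(x = x, y = y), fill = "red", alpha = 0.5)
```

Printing p_shaded draws the plot. If the layout table has a different shape in your ggplot2 version, the factor() call from the answer above is the fallback.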
#DATA
set.seed(2)
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
#Calculate density for each sex
temp = do.call(rbind, lapply(split(original_12, original_12$sex), function(a){
d = density(a$sum)
data.frame(sex = a$sex[1], x = d$x, y = d$y)
}))
#For each sex, separate the data for the shaded area
temp2 = do.call(rbind, lapply(split(temp, temp$sex), function(a){
rbind(data.frame(sex = a$sex[1], x = cutoff_12, y = 0), a[a$x > cutoff_12,])
}))
#Plot
ggplot(temp) +
geom_line(aes(x = x, y = y)) +
geom_vline(xintercept = cutoff_12) +
geom_polygon(data = temp2, aes(x = x, y = y)) +
facet_wrap(~sex) +
theme_classic()
I'm a relatively new user of R.
I have a data frame "Lossr" as follows:
'data.frame': 100 obs. of 18 variables:
$ plot : chr "3" "1" "5" "1" ...
$ day : Factor w/ 3 levels "0","218","365": 1 1 1 1 1 1 1 1 2 2 ...
$ ID : chr "A014" "A047" "A110" "A125" ...
$ type : chr "litter" "litter" "litter" "litter" ...
$ species : Factor w/ 4 levels "birch leaves",..: 2 3 1 3 4 1 4 2 2 2 ...
$ treat : Factor w/ 2 levels "char","control": 2 2 2 1 2 1 1 1 2 2 ...
$ inimass : num 4.02 4 4.02 4 4.02 4 4.02 4 4.01 4.02 ...
$ inichar : num 0 0 0 0 0 0 0 0 0 0 ...
$ fresh.mass: num 4.02 4 4.02 4 4.02 4 4.02 4 4.62 4.46 ...
$ rem_g : num 4.02 4 4.02 4 4.02 4 4.02 4 3.45 3.55 ...
$ rem : num 100 100 100 100 100 ...
$ W : num 0 0 0 0 0 ...
$ Cot : num NA NA NA NA NA ...
I'm trying to create a bar plot faceted by the factor 'day'.
ggplot(data=Lossr, aes(x=species, y=W)) +coord_cartesian(ylim=c(0,80)) +
scale_colour_manual(values=c("black", "3"))+
stat_summary(fun = mean, geom = 'bar', aes(fill=treat), colour='black', width=0.5, position=position_dodge(0.6)) +
scale_fill_manual(values=c("grey", "green")) +
stat_summary(fun.data = mean_se, geom = 'errorbar', width=0.5, position=position_dodge(0.6), aes(fill=treat)) +
facet_wrap(. ~day) + theme_bw()
The result is
I want to exclude "0", which is the first level of the factor "day".
If I do the following:
ggplot(data=Lossr, aes(x=species, y=W)) +coord_cartesian(ylim=c(0,80)) + scale_colour_manual(values=c("black", "3"))+
stat_summary(fun = mean, geom = 'bar', aes(fill=treat), colour='black', width=0.5, position=position_dodge(0.6)) +
scale_fill_manual(values=c("grey", "green")) +
stat_summary(fun.data = mean_se, geom = 'errorbar', width=0.5, position=position_dodge(0.6), aes(fill=treat)) +
facet_wrap(~day==c("218", "365")) + theme_bw()
These are the levels of the factor that I need, but the facet labels turned into the logical values TRUE and FALSE rather than the days shown in the previous figure.
Could someone help me fix this problem with the labels?
Thank you.
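One common way to drop a facet is to subset the data before plotting rather than putting a comparison inside facet_wrap(). A minimal sketch, not a definitive answer: the real Lossr is not shown in full, so the data frame below is an invented stand-in (the second species name and all W values are made up), and Loss_sub and p_sub are illustrative names.

```r
library(ggplot2)

# Invented stand-in for Lossr: only the columns the plot needs
set.seed(1)
Lossr <- data.frame(
  day     = factor(rep(c("0", "218", "365"), each = 8)),
  species = factor(rep(c("birch leaves", "species B"), times = 12)),
  treat   = factor(rep(c("char", "control"), each = 2, times = 6)),
  W       = c(rep(0, 8), runif(16, 10, 70))
)

# Keep only the days of interest. %in% tests membership row by row, whereas
# == against a length-2 vector recycles it and compares element by element.
Loss_sub <- droplevels(subset(Lossr, day %in% c("218", "365")))

p_sub <- ggplot(Loss_sub, aes(x = species, y = W, fill = treat)) +
  stat_summary(fun = mean, geom = "bar", colour = "black",
               width = 0.5, position = position_dodge(0.6)) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               width = 0.5, position = position_dodge(0.6)) +
  scale_fill_manual(values = c("grey", "green")) +
  coord_cartesian(ylim = c(0, 80)) +
  facet_wrap(~day) +
  theme_bw()
```

Because the facet variable is still day, the strip labels stay "218" and "365" instead of TRUE and FALSE.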
I am trying to create a series of violin plots which show average concentration across different regions (separating out hemispheres and conditions).
I keep getting the following error: Error: Discrete value supplied to continuous scale. Any thoughts would be greatly appreciated.
Take care and stay well.
Here is a look at the structure of my data frame:
> str(Oxyhb_V2)
'data.frame': 1028 obs. of 7 variables:
$ ID : chr "B1" "B1" "B1" "B1" ...
$ Name : chr "Happy_HbO_LeftParietal_Value" "Happy_HbO_RightParietal_Value" "Happy_HbO_LeftSTC_Value" "Happy_HbO_RightSTC_Value" ...
$ Values : num -59.33 1.94 -33.85 21.11 -135.14 ...
$ Condition : Factor w/ 2 levels "Happy","ThreatAngryFearful": 1 1 1 1 1 1 1 1 2 2 ...
$ Chromophore: Factor w/ 1 level "HbO": 1 1 1 1 1 1 1 1 1 1 ...
$ Hemisphere : Factor w/ 2 levels "Left","Right": 1 2 1 2 1 2 1 2 1 2 ...
$ ROI : Factor w/ 4 levels "DLPFC","IFC",..: 3 3 4 4 1 1 2 2 3 3 ...
- attr(*, "na.action")= 'omit' Named int [1:520] 9 18 27 36 40 41 43 44 45 49 ...
..- attr(*, "names")= chr [1:520] "9" "27" "45" "63" ...
Here is my current ggplot code
q <- ggplot(Oxyhb_V2, aes(x=Hemisphere, y=Values, color=Condition)) +
  facet_wrap(~ROI, scales='free') +
  geom_vline(xintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
  geom_hline(yintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
  labs(x = "Condition", y = "Mean Oxy-Hb (uM)") + #label axes
  geom_violin(trim=FALSE) +
  geom_boxplot(width=0.1) +
  geom_point() +
  theme_minimal() + #set theme
  theme(text=element_text(size=12)) #set label font size (after theme_minimal() so it is not overridden)
plot(q)
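The error is caused by the geom_vline(xintercept = 0) layer: Hemisphere is a discrete variable on the x axis, so the numeric intercept 0 conflicts with it. Replace 0 with one of the values of your x, for example geom_vline(xintercept = "Left").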
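A small sketch of that fix on made-up data, since Oxyhb_V2 itself is not available. The data frame demo and its values are invented and only mirror the column types from the question.

```r
library(ggplot2)

# Invented stand-in for Oxyhb_V2 with the same kinds of columns
set.seed(42)
demo <- data.frame(
  Hemisphere = factor(rep(c("Left", "Right"), each = 50)),
  Condition  = factor(rep(c("Happy", "ThreatAngryFearful"), times = 50)),
  Values     = rnorm(100, mean = 0, sd = 40)
)

q <- ggplot(demo, aes(x = Hemisphere, y = Values, color = Condition)) +
  geom_violin(trim = FALSE) +
  # y is continuous, so a numeric intercept is fine here
  geom_hline(yintercept = 0, linetype = "dotted", color = "black", alpha = 0.2) +
  # x is discrete, so the intercept is given as one of its levels
  geom_vline(xintercept = "Left", linetype = "dotted", color = "black", alpha = 0.2) +
  theme_minimal()
```

If the vertical line was only meant to mark an origin, dropping the geom_vline() layer altogether is another option, because a discrete axis has no zero.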
I have a small data set, local, (5 observations) with two types of principal: P and V.
Each observation has a Date field (p.start), a ratio, and a duration.
local
principal p.start duration allocated.days ratio
1 P 2015-03-18 1 162.0000 162.0000
2 V 2015-08-28 4 24.0000 6.0000
3 V 2015-09-03 1 89.0000 89.0000
4 V 2015-03-30 1 32.0000 32.0000
5 P 2015-01-29 1 150.1667 150.1667
str(local)
'data.frame': 5 obs. of 5 variables:
$ principal : chr "P" "V" "V" "V" ...
$ p.start : Date, format: "2015-03-18" "2015-08-28" "2015-09-03" "2015-03-30" ...
$ duration : Factor w/ 10 levels "1","2","3","4",..: 1 4 1 1 1
$ allocated.days: num 162 24 89 32 150
$ ratio : num 162 6 89 32 150
I have another data frame, stats, with text to be added to a faceted plot.
stats
principal xx yy zz
1 P 2015-02-28 145.8 Average = 156
2 V 2015-02-28 145.8 Average = 24
str(stats)
'data.frame': 2 obs. of 4 variables:
$ principal: chr "P" "V"
$ xx : Date, format: "2015-02-28" "2015-02-28"
$ yy : num 146 146
$ zz : chr "Average = 156" "Average = 24"
The following code fails:
p = ggplot (local, aes (x = p.start, y = ratio, size = duration))
p = p + geom_point (colour = "blue"); p
p = p + facet_wrap (~ principal, nrow = 2); p
p = p + geom_text(aes(x=xx, y=yy, label=zz), data= stats)
p
Error: Continuous value supplied to discrete scale
Any ideas? I'm missing something obvious.
The problem is that you are plotting from 2 data.frames, but your initial ggplot call includes aes parameters referring to just the local data.frame.
Every layer inherits those aesthetics, so although your geom_text specifies data=stats, it is still looking for size=duration.
The following code works for me:
ggplot(local) +
geom_point(aes(x=p.start, y=ratio, size=duration), colour="blue") +
facet_wrap(~ principal, nrow=2) +
geom_text(data=stats, aes(x=xx, y=yy, label=zz))
Just remove size = duration from ggplot (local, aes (x = p.start, y = ratio, size = duration)) and add it into geom_point (colour = "blue"). Then, it should work.
ggplot(local, aes(x=p.start, y=ratio))+
geom_point(colour="blue", aes(size=duration))+
facet_wrap(~principal, nrow=2)+
geom_text(aes(x=xx, y=yy, label=zz), data=stats)
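Another option, if you would rather keep size = duration in the ggplot() call, is to stop the text layer from inheriting the plot-level aesthetics with inherit.aes = FALSE. A sketch that rebuilds the two data frames from the printed output in the question (allocated.days is left out because the plot does not use it):

```r
library(ggplot2)

local <- data.frame(
  principal = c("P", "V", "V", "V", "P"),
  p.start   = as.Date(c("2015-03-18", "2015-08-28", "2015-09-03",
                        "2015-03-30", "2015-01-29")),
  duration  = factor(c(1, 4, 1, 1, 1), levels = 1:10),
  ratio     = c(162, 6, 89, 32, 150.1667)
)
stats <- data.frame(
  principal = c("P", "V"),
  xx        = as.Date(c("2015-02-28", "2015-02-28")),
  yy        = c(145.8, 145.8),
  zz        = c("Average = 156", "Average = 24")
)

p <- ggplot(local, aes(x = p.start, y = ratio, size = duration)) +
  geom_point(colour = "blue") +
  facet_wrap(~principal, nrow = 2) +
  # inherit.aes = FALSE keeps this layer from picking up size = duration
  geom_text(data = stats, aes(x = xx, y = yy, label = zz), inherit.aes = FALSE)
```

With inherit.aes = FALSE the text layer has to spell out every aesthetic it needs, here x, y and label.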
My goal is to plot this shapefile colored by a specific column.
It contains 100 polygons. I apply fortify() on it and join some missing columns
library(ggplot2)  # fortify()
library(dplyr)    # left_join()
# convert the SpatialPolygonsDataFrame into a normal data frame for plotting
data.df = fortify(data)
# join missing columns
data@data$id = rownames(data@data)
data.df$perc_ch = data@data$perc_ch
data.df = left_join(data.df, data@data, by=c('id'='id'))
After calling fortify(), every entry exists five times. (see 'order').
Calling str() on 'data.df':
'data.frame': 500 obs. of 11 variables:
$ long : num 421667 421667 416057 416057 421667 ...
$ lat : num 8064442 8060421 8060421 8064442 8064442 ...
$ order : int 1 2 3 4 5 1 2 3 4 5 ...
$ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ piece : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
$ id : chr "0" "0" "0" "0" ...
$ group : Factor w/ 100 levels "0.1","1.1","2.1",..: 1 1 1 1 1 2 2 2 2 2 ...
$ perc_ch.x: num 17.4 11.4 20.5 12 15 ...
$ z : int 1 1 1 1 1 2 2 2 2 2 ...
$ Ch_area : num 3914498 3914498 3914498 3914498 3914498 ...
$ perc_ch.y: num 17.4 17.4 17.4 17.4 17.4 ...
This is introduced by fortify(). However, it does not change the plot outcome as long as I join the missing columns on a matching column (giving perc_ch.y).
If I add missing columns without a matching index (giving perc_ch.x), I run into trouble: because of the redundant entries, wrong values are assigned to the polygons.
I do not see a reason for this copy effect.
No need to bind the data to the polygons:
library(ggplot2)
library(rgeos)
library(maptools)
library(rgdal)
URL <- "https://www.dropbox.com/s/rsr49jwm1pf9abu/data.zip?dl=1"
fil <- "sodata.zip"
if (!file.exists(fil)) download.file(URL, fil)
fils <- unzip(fil)
shp <- grep("shp$", fils, value=TRUE)
geo <- readOGR(shp, ogrListLayers(shp)[[1]], stringsAsFactors=FALSE, verbose=FALSE)
geo_map <- fortify(geo, region="z")
gg <- ggplot()
gg <- gg + geom_map(data=geo_map, map=geo_map,
aes(x=long, y=lat, map_id=id),
color=NA, size=0, fill=NA)
gg <- gg + geom_map(data=geo@data, map=geo_map,
aes(fill=perc_ch, map_id=z),
color="#2b2b2b", size=0.15)
gg <- gg + viridis::scale_fill_viridis()
gg <- gg + ggthemes::theme_map()
gg <- gg + theme(legend.position="right")
gg
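On the "copy effect" from the question: the str() output shows the first and fifth long/lat pair of each polygon are identical, which is what a closed four-cornered ring looks like, so fortify() is returning one row per vertex rather than duplicating entries. A base R sketch of why assigning by position goes wrong while joining on id does not. The three polygons are hypothetical; only the three perc_ch values are taken from the output above.

```r
# Three hypothetical square polygons; a closed ring repeats its first vertex,
# so each square contributes 5 rows, like the fortify() output in the question
ids <- c("0", "1", "2")
verts <- data.frame(
  id    = rep(ids, each = 5),
  order = rep(1:5, times = 3)
)
attrs <- data.frame(id = ids, perc_ch = c(17.4, 11.4, 20.5))

# Direct assignment recycles the 3 values down the 15 rows and ignores id
by_position <- verts
by_position$perc_ch <- attrs$perc_ch

# Joining on id gives every vertex of a polygon that polygon's own value
by_id <- merge(verts, attrs, by = "id")

# Number of distinct values within each polygon under each approach
n_by_position <- tapply(by_position$perc_ch, by_position$id,
                        function(v) length(unique(v)))
n_by_id <- tapply(by_id$perc_ch, by_id$id, function(v) length(unique(v)))
```

n_by_id is 1 for every polygon, while n_by_position is greater than 1, which is the mismatch visible in perc_ch.x.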
I want to create plots from the following data, so that each plot Title is the Site. I have the following dataframe:
> head(sum_stats)
Season Site Isotope Time n mean sd se
1 Summer Afon Cadnant 14CAA 0 3 100.00000 0.000000 0.0000000
2 Summer Afon Cadnant 14CAA 2 3 68.26976 4.375331 2.5260988
3 Summer Afon Cadnant 14CAA 5 3 69.95398 7.885443 4.5526627
4 Summer Afon Cadnant 14CAA 24 3 36.84054 2.421846 1.3982532
5 Summer Afon Cadnant 14CAA 48 3 27.96619 0.829134 0.4787008
6 Summer Afon Cadnant 14CAA 72 3 26.28713 1.454819 0.8399404
> str(sum_stats)
'data.frame': 648 obs. of 8 variables:
$ Season : Factor w/ 1 level "Summer": 1 1 1 1 1 1 1 1 1 1 ...
$ Site : Factor w/ 27 levels "Afon Cadnant",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Isotope: Factor w/ 4 levels "14CAA","14CGlu",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Time : num 0 2 5 24 48 72 0 2 5 24 ...
$ n : int 3 3 3 3 3 3 3 3 3 3 ...
$ mean : num 100 68.3 70 36.8 28 ...
$ sd : num 0 4.375 7.885 2.422 0.829 ...
$ se : num 0 2.526 4.553 1.398 0.479 ...
I have written a function to create plots of the above data:
plot_func <- function(T) {
  ggplot(data = T) +
    geom_point(aes(Time, mean, colour = Season)) +
    geom_line(aes(Time, mean, colour = Season)) +
    geom_errorbar(aes(Time, mean, ymax = (mean + se), ymin = (mean - se)), width = 0.1) +
    labs(title = unique(levels(sum_stats$Site)), y = "Percentage of isotope remaining in solution", x = "Time (h)") +
    facet_wrap(~Isotope, ncol = 2) +
    theme(axis.title.y = element_text(vjust = 1)) +
    theme(axis.title.x = element_text(vjust = -0.1)) +
    theme(plot.title = element_text(vjust = 1)) +
    theme_bw()
}
I then use the function in a by call to run the function over each level of the Site factor:
by(sum_stats, sum_stats$Site, plot_func)
I get 27 graphs of the following form:
However, all the titles are the same. How can I make each title reflect the factor level that it is plotting? Can this be done inside the plotting function?
Thanks
Right now you are setting the title using the original data.frame, and not the subset of data passed to your function. If all the sites are the same in the subset you receive, you can just use the first as the title. Use
...
labs(title = T$Site[1], ...)
...
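Put together, the function could look like the sketch below. It runs on a small invented stand-in for sum_stats ("Site B" and all the numbers are made up), names the argument d rather than T to avoid shadowing the shorthand for TRUE, wraps the title in as.character() because Site is a factor, and uses split() with lapply() in place of by(). All of these are choices for the sketch, not requirements.

```r
library(ggplot2)

# Small invented stand-in for sum_stats with two sites
sum_stats <- data.frame(
  Season  = factor("Summer"),
  Site    = factor(rep(c("Afon Cadnant", "Site B"), each = 6)),
  Isotope = factor(rep(c("14CAA", "14CGlu"), each = 3, times = 2)),
  Time    = rep(c(0, 24, 72), times = 4),
  mean    = c(100, 37, 26, 100, 40, 30, 100, 50, 35, 100, 45, 28),
  se      = c(0, 1.4, 0.8, 0, 1.1, 0.9, 0, 2.0, 1.2, 0, 1.5, 0.7)
)

plot_func <- function(d) {
  ggplot(d, aes(Time, mean, colour = Season)) +
    geom_point() +
    geom_line() +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1) +
    # The title comes from the subset passed in, not from the full data frame
    labs(title = as.character(d$Site[1]),
         y = "Percentage of isotope remaining in solution", x = "Time (h)") +
    facet_wrap(~Isotope, ncol = 2) +
    theme_bw()
}

# One plot per site, returned as a list named by site
plots <- lapply(split(sum_stats, sum_stats$Site), plot_func)
```

plots[["Afon Cadnant"]] then prints the plot for that site with its own title.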