ggplot2 facet_wrap geom_text not accepting date values - r

I have a small data set, local, (5 observations) with two types: a and b.
Each observation has a Date field (p.start), a ratio, and a duration.
local
principal p.start duration allocated.days ratio
1 P 2015-03-18 1 162.0000 162.0000
2 V 2015-08-28 4 24.0000 6.0000
3 V 2015-09-03 1 89.0000 89.0000
4 V 2015-03-30 1 32.0000 32.0000
5 P 2015-01-29 1 150.1667 150.1667
str(local)
'data.frame': 5 obs. of 5 variables:
$ principal : chr "P" "V" "V" "V" ...
$ p.start : Date, format: "2015-03-18" "2015-08-28" "2015-09-03" "2015-03-30" ...
$ duration : Factor w/ 10 levels "1","2","3","4",..: 1 4 1 1 1
$ allocated.days: num 162 24 89 32 150
$ ratio : num 162 6 89 32 150
I have another data frame, stats, with text to be added to a faceted plot.
stats
principal xx yy zz
1 P 2015-02-28 145.8 Average = 156
2 V 2015-02-28 145.8 Average = 24
str(stats)
'data.frame': 2 obs. of 4 variables:
$ principal: chr "P" "V"
$ xx : Date, format: "2015-02-28" "2015-02-28"
$ yy : num 146 146
$ zz : chr "Average = 156" "Average = 24"
The following code fails:
p = ggplot (local, aes (x = p.start, y = ratio, size = duration))
p = p + geom_point (colour = "blue"); p
p = p + facet_wrap (~ principal, nrow = 2); p
p = p + geom_text(aes(x=xx, y=yy, label=zz), data= stats)
p
Error: Continuous value supplied to discrete scale
Any ideas? I'm missing something obvious.

The problem is that you are plotting from 2 data.frames, but your initial ggplot call includes aes parameters referring to just the local data.frame.
So although your geom_text specifies data=stats, it is still looking for size=duration.
The following line works for me:
ggplot(local) +
geom_point(aes(x=p.start, y=ratio, size=duration), colour="blue") +
facet_wrap(~ principal, nrow=2) +
geom_text(data=stats, aes(x=xx, y=yy, label=zz))

Just remove size = duration from ggplot(local, aes(x = p.start, y = ratio, size = duration)) and move it into an aes() inside geom_point(colour = "blue"). Then it should work.
ggplot(local, aes(x=p.start, y=ratio))+
geom_point(colour="blue", aes(size=duration))+
facet_wrap(~principal, nrow=2)+
geom_text(aes(x=xx, y=yy, label=zz), data=stats)
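
Another option, sketched here on the assumption that you want to keep size = duration in the top-level ggplot() call, is to stop the text layer from inheriting the plot-level aesthetics with inherit.aes = FALSE. The data frames are re-typed from the printout in the question, so treat this as an illustration and not as either answerer's method. Mapping size to a factor may produce a warning, but the plot should still build.

```r
library(ggplot2)

# Data re-typed from the question's printout
local <- data.frame(
  principal = c("P", "V", "V", "V", "P"),
  p.start   = as.Date(c("2015-03-18", "2015-08-28", "2015-09-03",
                        "2015-03-30", "2015-01-29")),
  duration  = factor(c(1, 4, 1, 1, 1), levels = 1:10),
  ratio     = c(162, 6, 89, 32, 150.1667)
)
stats <- data.frame(
  principal = c("P", "V"),
  xx = as.Date(c("2015-02-28", "2015-02-28")),
  yy = c(145.8, 145.8),
  zz = c("Average = 156", "Average = 24")
)

p <- ggplot(local, aes(x = p.start, y = ratio, size = duration)) +
  geom_point(colour = "blue") +
  facet_wrap(~ principal, nrow = 2) +
  # inherit.aes = FALSE: this layer ignores size = duration from ggplot()
  geom_text(aes(x = xx, y = yy, label = zz), data = stats,
            inherit.aes = FALSE)
p
```

Because stats has no duration column, the text layer only works when it does not look for one, which is what both answers above and this variant achieve.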

Related

Plotting time-series data with a gap in r?

I have a data set that is missing data from about July 7th to July 19th (see the graph of my dataset). You can see the data gap pretty easily. I would like to truncate it so that the gap isn't there and the before and after data butt up against each other, something like the linked example. I did try to follow the linked example, but I don't understand how they set up xseq. I also tried just removing the offending dates and creating a data frame without them, but that didn't solve the problem.
I'm not sure if it's helpful, but here is the existing code for the graph:
together <- ggplot() +
stat_summary(data = grid_pad, aes(x = DTT, y = grid_value, fill = 'Ambient'), geom='ribbon', fun.data = mean_cl_quantile, alpha = 0.25) +
stat_summary(data = grid_pad, aes(x = DTT, y = grid_value, color = 'Ambient'), geom='line', fun = mean, size = 0.9) +
stat_summary(data = turtle_pad, aes(x = DTT, y = turtle_value, fill = 'Turtle'), geom='ribbon', fun.data = mean_cl_quantile, alpha = 0.25) +
stat_summary(data = turtle_pad, aes(x = DTT, y = turtle_value, color = 'Turtle'), geom='line', fun = mean, size = 0.9) +
labs(x = "Date", y = "Temperature")+
scale_color_manual("Legend", values = c('Ambient' = '#1b9e77', 'Turtle' = '#d95f02'), labels = c(Ambient = 'Ambient Temp', Turtle = 'Turtle Temp')) +
scale_fill_manual("Legend", values = c('Ambient' = '#1b9e77', 'Turtle' = '#d95f02'), labels = c(Ambient = 'Ambient Temp', Turtle = 'Turtle Temp')) +
theme_classic() +
ggtitle("Ambient and Turtle Temperatures")+
ggeasy::easy_center_title()+
easy_remove_legend_title()
together
and here is the structure of my data:
> str(grid_pad)
grouped_df [142,800 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ Logger : Factor w/ 50 levels "TL1","TL11","TL12",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DTT : POSIXct[1:142800], format: "2021-05-28 00:00:00" "2021-05-28 01:00:00" "2021-05-28 02:00:00" "2021-05-28 03:00:00" ...
$ grid_value: num [1:142800] NA NA NA NA NA 19.5 19.5 19.5 20 22 ...
- attr(*, "groups")= tibble [50 x 2] (S3: tbl_df/tbl/data.frame)
..$ Logger: Factor w/ 50 levels "TL1","TL11","TL12",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(turtle_pad)
grouped_df [57,120 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ Name : Factor w/ 20 levels "F1","F11","F12",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DTT : POSIXct[1:57120], format: "2021-05-28 00:00:00" "2021-05-28 01:00:00" "2021-05-28 02:00:00" "2021-05-28 03:00:00" ...
$ turtle_value: num [1:57120] NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "groups")= tibble [20 x 2] (S3: tbl_df/tbl/data.frame)
..$ Name : Factor w/ 20 levels "F1","F11","F12",..: 1 2 3 4 5 6 7 8 9 10 ...
With base R, verbose:
df_with_gap <- data.frame(Name = gl(41, 1),
DTT = as.Date(Sys.Date()) + (-20:20),
turtle_value = c(runif(20), rep(NA, 5), runif(16))
)
rows_to_keep <- !is.na(df_with_gap$turtle_value)
## remove NAs
df_without_gap <- df_with_gap[rows_to_keep,]
## create an index to use as x-values in ggplot
df_without_gap$pseudo_date <- seq_len(nrow(df_without_gap))
Please note:
while you could use DTT of the remaining values to label your axis (see the labels argument in ?scale_x_continuous), the plot will be misleading, as it covers up missing information
a scatter plot would be the way to go if you want to show the association between ambient and turtle temperature
to show seasonality instead, consider adding a smoother (?geom_smooth for ggplot)
to convey variability, a boxplot might be more instructive
there are helpful chart pickers on the web
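
To make the index idea concrete, here is a minimal sketch with made-up data (one value per day, not the hourly multi-logger data from the question). It plots against the row index, labels the axis with the real dates, and marks where the gap was removed so the reader is not misled. The names brk and gap_at are introduced here for illustration only.

```r
library(ggplot2)

set.seed(1)
# Toy series: 20 values, a 5-day gap of NAs, then 16 more values
df_with_gap <- data.frame(
  DTT = as.Date("2021-07-01") + 0:40,
  turtle_value = c(runif(20), rep(NA, 5), runif(16))
)

# Drop the missing rows and index what is left
df_without_gap <- df_with_gap[!is.na(df_with_gap$turtle_value), ]
df_without_gap$pseudo_date <- seq_len(nrow(df_without_gap))

# Label every 5th index position with its real date
brk <- seq(1, nrow(df_without_gap), by = 5)

# Position of the removed gap: halfway between the two neighbouring indices
gap_at <- which(as.numeric(diff(df_without_gap$DTT)) > 1) + 0.5

g <- ggplot(df_without_gap, aes(x = pseudo_date, y = turtle_value)) +
  geom_line() +
  geom_vline(xintercept = gap_at, linetype = "dashed") +
  scale_x_continuous(breaks = brk,
                     labels = format(df_without_gap$DTT[brk], "%b %d")) +
  labs(x = "Date (gap removed at dashed line)", y = "Temperature")
g
```

The dashed line is a design choice: once the axis is an index, the spacing no longer reflects elapsed time, so the break should be flagged explicitly.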

Violin plot error - Discrete value supplied to continuous scale

I am trying to create a series of Violin plots which show average concentration across different regions (separating out hemispheres and conditions).
I keep getting the following error: Error: Discrete value supplied to continuous scale. Any thoughts would be greatly appreciated.
Take care and stay well.
Here is a look at the structure of my data frame:
> str(Oxyhb_V2)
'data.frame': 1028 obs. of 7 variables:
$ ID : chr "B1" "B1" "B1" "B1" ...
$ Name : chr "Happy_HbO_LeftParietal_Value" "Happy_HbO_RightParietal_Value" "Happy_HbO_LeftSTC_Value" "Happy_HbO_RightSTC_Value" ...
$ Values : num -59.33 1.94 -33.85 21.11 -135.14 ...
$ Condition : Factor w/ 2 levels "Happy","ThreatAngryFearful": 1 1 1 1 1 1 1 1 2 2 ...
$ Chromophore: Factor w/ 1 level "HbO": 1 1 1 1 1 1 1 1 1 1 ...
$ Hemisphere : Factor w/ 2 levels "Left","Right": 1 2 1 2 1 2 1 2 1 2 ...
$ ROI : Factor w/ 4 levels "DLPFC","IFC",..: 3 3 4 4 1 1 2 2 3 3 ...
- attr(*, "na.action")= 'omit' Named int [1:520] 9 18 27 36 40 41 43 44 45 49 ...
..- attr(*, "names")= chr [1:520] "9" "27" "45" "63" ...
Here is my current ggplot code
q <- ggplot(Oxyhb_V2, aes(x=Hemisphere, y=Values, color=Condition)) +
facet_wrap(~ROI, scales='free') +
geom_vline(xintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
geom_hline(yintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
labs(x = "Condition", y = "Mean Oxy-Hb (uM)") + #label axes
theme(text=element_text(size=12)) +
geom_violin(trim=FALSE) +
geom_boxplot(width=0.1)+
geom_point() +#set label font size
theme_minimal() #set theme
plot(q)
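The error is caused by the geom_vline(xintercept = 0) layer: x is mapped to Hemisphere, which is a factor, so a numeric intercept of 0 does not fit the discrete x axis. Replace 0 with one of the values of your x, for example geom_vline(xintercept = "Left").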
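
Since a categorical x axis has no origin at 0, a simpler variant is to drop the vertical line and keep only the horizontal one. The sketch below uses a small made-up data frame (toy) standing in for Oxyhb_V2, so the column values are assumptions; only the layer structure follows the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for Oxyhb_V2 with a factor x and numeric y
toy <- data.frame(
  Hemisphere = factor(rep(c("Left", "Right"), each = 20)),
  Condition  = factor(rep(c("Happy", "ThreatAngryFearful"), times = 20)),
  Values     = rnorm(40)
)

q <- ggplot(toy, aes(x = Hemisphere, y = Values, color = Condition)) +
  # y is continuous, so a numeric yintercept is fine
  geom_hline(yintercept = 0, linetype = "dotted", color = "black", alpha = .2) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.1) +
  labs(x = "Hemisphere", y = "Mean Oxy-Hb (uM)") +
  theme_minimal()
q
```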

Formula notation for scatterplot producing unexpected results

I am working on a map, where the color of each point is proportional to one response variable, and the size of the point is proportional to another. I've noticed that when I try to plot the points using formula notation things go haywire, while default notation performs as expected. I have used formula notation to plot maps many times before, and thought that the notations were nearly interchangeable. Why would these produce different results? I have read through the plot.formula and plot.default documentation and haven't been able to figure it out. Based on this I am wondering if it has to do with the columns of dat being coerced to factors, but I'm not sure why that would be happening. Any ideas?
Consider the following example data frame, dat:
latitude <- c(runif(10, min = 45, max = 48))
latitude[9] <- NA
longitude <- c(runif(10, min = -124.5, max = -122.5))
longitude[9] <- NA
color <- c("#00FFCCCC", "#99FF00CC", "#FF0000CC", "#3300FFCC", "#00FFCCCC",
"#00FFCCCC", "#3300FFCC", "#00FFCCCC", NA, "#3300FFCC")
size <- c(4.916667, 5.750000, 7.000000, 2.000000, 5.750000,
4.500000, 2.000000, 4.500000, NA, 2.000000)
dat <- as.data.frame(cbind(longitude, latitude, color, size))
Plotting according to formula notation
plot(latitude ~ longitude, data = dat, type = "p", pch = 21, col = 1, bg = color, cex = size)
produces
this mess and the following error: graphical parameter "type" is obsolete.
Plotting according to the default notation
plot(longitude, latitude, type = "p", pch = 21, col = 1, bg = color, cex = size)
works as expected, though with the same error.
There are a couple of problems with this. First is that your use of cbind is turning this into a matrix, albeit temporarily, which is converting your numbers to character. See:
dat <- as.data.frame(cbind(longitude, latitude, color, size))
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: Factor w/ 9 levels "-122.855375511572",..: 6 8 9 1 4 3 2 7 NA 5
# $ latitude : Factor w/ 9 levels "45.5418886151165",..: 6 2 4 1 3 7 5 9 NA 8
# $ color : Factor w/ 4 levels "#00FFCCCC","#3300FFCC",..: 1 3 4 2 1 1 2 1 NA 2
# $ size : Factor w/ 5 levels "2","4.5","4.916667",..: 3 4 5 1 4 2 1 2 NA 1
If instead you just use data.frame, you'll get:
dat <- data.frame(longitude, latitude, color, size)
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: num -124 -124 -124 -123 -124 ...
# $ latitude : num 47.3 45.9 46.3 45.5 46 ...
# $ color : Factor w/ 4 levels "#00FFCCCC","#3300FFCC",..: 1 3 4 2 1 1 2 1 NA 2
# $ size : num 4.92 5.75 7 2 5.75 ...
plot(latitude ~ longitude, data = dat, pch = 21, col = 1, bg = color, cex = size)
But now the colors are all dorked. Okay, the problem is likely because your $color is a factor, which is being interpreted internally as integers. Try stringsAsFactors=FALSE (the default since R 4.0.0):
dat <- data.frame(longitude, latitude, color, size, stringsAsFactors=FALSE)
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: num -124 -124 -124 -123 -124 ...
# $ latitude : num 47.3 45.9 46.3 45.5 46 ...
# $ color : chr "#00FFCCCC" "#99FF00CC" "#FF0000CC" "#3300FFCC" ...
# $ size : num 4.92 5.75 7 2 5.75 ...
plot(latitude ~ longitude, data = dat, pch = 21, col = 1, bg = color, cex = size)
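
The two coercions described above can be checked in isolation with base R. This is a small sketch with made-up two-row vectors, not the data from the question:

```r
longitude <- c(-124.1, -123.2)
color     <- c("#00FFCCCC", "#FF0000CC")

# cbind() builds a matrix, and a matrix holds one type only,
# so the numbers are converted to character alongside the colors
m <- cbind(longitude, color)
is.character(m)

# data.frame() keeps each column's own type
dat <- data.frame(longitude, color, stringsAsFactors = FALSE)
sapply(dat, class)

# A factor is stored as integer codes, which is what a graphical
# parameter such as bg may end up seeing instead of the hex strings
codes <- as.integer(factor(color))
codes
```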

Shading a specific area using a density plot - ggplot2

I have a data visualization question regarding ggplot2.
I'm trying to figure out how I can shade a specific area in my density plot. I googled it a lot and I tried all the solutions I found.
My code is:
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
ggplot(data=original_12, aes(original_12$sum)) + geom_density() +
facet_wrap(~sex) +
geom_vline(data=original_12, aes(xintercept=cutoff_12),
linetype="dashed", color="red", size=1)
So, from this:
I want this:
The question on ggplot2 shade area under density curve by group is different than mine because they use different groups and graphs.
Similar to this SO question except the facet adds an additional complexity.
You need to rename the PANEL data as "sex" and factor it correctly to match your already existing aesthetic option. Your original "sex" factor is ordered alphabetically (default data.frame option), which is a little confusing at first.
Make sure you assign your plot to "p" to create a ggplot object (note aes(x=sum) rather than aes(original_12$sum); using $ inside aes() can break faceting):
p <- ggplot(data=original_12, aes(x=sum)) +
geom_density() +
facet_wrap(~sex) +
geom_vline(data=original_12, aes(xintercept=cutoff_12),
linetype="dashed", color="red", size=1)
The ggplot object data can be extracted...here is the structure of the data:
str(ggplot_build(p)$data[[1]])
'data.frame': 1024 obs. of 16 variables:
$ y : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ x : num 17 17 17.1 17.1 17.2 ...
$ density : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ scaled : num 0.0121 0.0128 0.0137 0.0145 0.0154 ...
$ count : num 0.0568 0.0604 0.0644 0.0684 0.0727 ...
$ n : int 50 50 50 50 50 50 50 50 50 50 ...
$ PANEL : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
$ ymax : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ fill : logi NA NA NA NA NA NA ...
$ weight : num 1 1 1 1 1 1 1 1 1 1 ...
$ colour : chr "black" "black" "black" "black" ...
$ alpha : logi NA NA NA NA NA NA ...
$ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
It cannot be used directly because you need to rename the PANEL data and factor it to match your original dataset. You can extract the data from the ggplot object here:
built <- ggplot_build(p)$data[[1]]  # build once and reuse
to_fill <- data.frame(
x = built$x,
y = built$y,
sex = factor(built$PANEL, levels = c(1,2), labels = c("F","M")))
p + geom_area(data = to_fill[to_fill$x >= cutoff_12, ],
aes(x=x, y=y), fill = "red")
#DATA
set.seed(2)
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
#Calculate density for each sex
temp = do.call(rbind, lapply(split(original_12, original_12$sex), function(a){
d = density(a$sum)
data.frame(sex = a$sex[1], x = d$x, y = d$y)
}))
#For each sex, separate the data for the shaded area
temp2 = do.call(rbind, lapply(split(temp, temp$sex), function(a){
rbind(data.frame(sex = a$sex[1], x = cutoff_12, y = 0), a[a$x > cutoff_12,])
}))
#Plot
ggplot(temp) +
geom_line(aes(x = x, y = y)) +
geom_vline(xintercept = cutoff_12) +
geom_polygon(data = temp2, aes(x = x, y = y)) +
facet_wrap(~sex) +
theme_classic()

Error in stat_summary(fun.y) when plotting outliers in a modified ggplot-boxplot

I want to plot boxplots showing the 95 percentile instead of the IQR, including outliers as defined by exceeding the 95% criterion.
This code is working fine, and based on several answers found here and on the web:
f1 <- function(x) {
subset(x, x < quantile(x, probs=0.025)) # only for low outliers
}
f2 <- function(x) {
r <- quantile(x, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
r
}
d <- data.frame(x=gl(2,50), y=rnorm(100))
library(ggplot2)
p0 <- ggplot(d, aes(x,y)) +
stat_summary(fun.data = f2, geom="boxplot") + coord_flip()
p1 <- p0 + stat_summary(fun.y = f1, geom="point")
The structure of d is:
'data.frame': 100 obs. of 2 variables:
$ x: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ y: num 2.275 0.659 -0.821 -0.129 1.997 ...
Now, coming to my real data, which is structured essentially the same:
str(test)
'data.frame': 11830917 obs. of 2 variables:
$ x: Ord.factor w/ 34 levels "SG26"<"SG22"<..: 18 18 18 18 18 18 18 18 18 18 ...
$ y: num 84.6 84.1 93.3 84 93.2 94.3 83.3 92.5 94.5 98.8 ...
Now, if I apply the same plot command, I get:
p0 <- ggplot(test, aes(x,y)) + stat_summary(fun.data = f2, geom="boxplot") + coord_flip()
p1 <- p0 + stat_summary(fun.y = f1, geom="point")
p1
Warning message:
Computation failed in `stat_summary()`:
Argumente implizieren unterschiedliche Anzahl Zeilen: 1, 0
The final line is the German version of "arguments imply differing number of rows: 1, 0". p0 is produced just fine.
What could be the difference between the two datasets?
The problem, as identified by @Heroka and @bdemarest, arose from one factor level having only one value.
My workaround is to skip those factor levels:
f1 <- function(x) {
if (length(x) > 7) {
return(subset(x, x < quantile(x, probs=0.025))) # only for low outliers
} else {
return(NA)
}
}
For unknown reasons, the problem persisted until there were at least 7 values per factor level.
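
A possible alternative, sketched under the assumption that the failure comes from f1 returning a zero-length vector (which is what happens for a level with a single value, since that value is never below its own quantile), is to test the result itself instead of the group size. The name f1_safe is made up for this illustration. Note also that fun.y has been deprecated in favor of fun since ggplot2 3.3.0.

```r
# Return the low outliers, or a single NA when there are none,
# so the summary never has zero rows
f1_safe <- function(x) {
  out <- x[x < quantile(x, probs = 0.025)]  # only for low outliers
  if (length(out) == 0) NA_real_ else out
}

f1_safe(5)      # a single value has no low outlier, so this gives NA
f1_safe(1:100)  # values below the 2.5% quantile
```

This was not checked against the 34-level data set from the question, so whether it removes the need for the length(x) > 7 guard there is untested.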
