Plotting time-series data with a gap in r? - r

I have a data set that has missing data from about July 7th to July 19th. Graph of my dataset. You can see the data gap pretty easily. I would like to truncate it so that the gap isnt there and the before and after data butt up against each other. Something like this . I did try to follow the linked example but I dont understand how they set up xseq. I also tried just removing the offending dates and creating a dataframe without them but that didnt solve the problem.
Im not sure if its helpful but here is the existing code for the graph:
together <- ggplot() +
stat_summary(data = grid_pad, aes(x = DTT, y = grid_value, fill = 'Ambient'), geom='ribbon', fun.data = mean_cl_quantile, alpha = 0.25) +
stat_summary(data = grid_pad, aes(x = DTT, y = grid_value, color = 'Ambient'), geom='line', fun = mean, size = 0.9) +
stat_summary(data = turtle_pad, aes(x = DTT, y = turtle_value, fill = 'Turtle'), geom='ribbon', fun.data = mean_cl_quantile, alpha = 0.25) +
stat_summary(data = turtle_pad, aes(x = DTT, y = turtle_value, color = 'Turtle'), geom='line', fun = mean, size = 0.9) +
labs(x = "Date", y = "Temperature")+
scale_color_manual("Legend", values = c('Ambient' = '#1b9e77', 'Turtle' = '#d95f02'), labels = c(Ambient = 'Ambient Temp', Turtle = 'Turtle Temp')) +
scale_fill_manual("Legend", values = c('Ambient' = '#1b9e77', 'Turtle' = '#d95f02'), labels = c(Ambient = 'Ambient Temp', Turtle = 'Turtle Temp')) +
theme_classic() +
ggtitle("Ambient and Turtle Temperatures")+
ggeasy::easy_center_title()+
easy_remove_legend_title()
together
and here is the structure of my data:
> str(grid_pad)
grouped_df [142,800 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ Logger : Factor w/ 50 levels "TL1","TL11","TL12",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DTT : POSIXct[1:142800], format: "2021-05-28 00:00:00" "2021-05-28 01:00:00" "2021-05-28 02:00:00" "2021-05-28 03:00:00" ...
$ grid_value: num [1:142800] NA NA NA NA NA 19.5 19.5 19.5 20 22 ...
- attr(*, "groups")= tibble [50 x 2] (S3: tbl_df/tbl/data.frame)
..$ Logger: Factor w/ 50 levels "TL1","TL11","TL12",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(turtle_pad)
grouped_df [57,120 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ Name : Factor w/ 20 levels "F1","F11","F12",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DTT : POSIXct[1:57120], format: "2021-05-28 00:00:00" "2021-05-28 01:00:00" "2021-05-28 02:00:00" "2021-05-28 03:00:00" ...
$ turtle_value: num [1:57120] NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "groups")= tibble [20 x 2] (S3: tbl_df/tbl/data.frame)
..$ Name : Factor w/ 20 levels "F1","F11","F12",..: 1 2 3 4 5 6 7 8 9 10 ...

with base R, verbose:
df_with_gap <- data.frame(Name = gl(41, 1),
DTT = as.Date(Sys.Date()) + (-20:20),
turtle_value = c(runif(20), rep(NA, 5), runif(16))
)
rows_to_keep <- !is.na(df_with_gap$turtle_value)
## remove NAs
df_without_gap <- df_with_gap[rows_to_keep,]
## create some index to use for x-values ggplot
df_without_gap$pseudo_date <- rownames(df)
Please note:
while you could use DTT of the remaining values to label your axis (see label argument in ?scale_x_continuous`, the plot will be misleading as it covers up missing information)
a scatter plot would be the way to go if you want to show the association between ambient and turtle temperature.
to show seasonality of instead, consider adding a smoother (?geom_smooth for ggplot)
to convey variability, a boxplot might be more instructive
helpful chart pickers on the web

Related

Violin plot error - Discrete value supplied to continuous scale

I am trying to create a series of Violin plots which show average concentration across different regions (separating out hemispheres and conditions).
I keep getting the following error: Error: Discrete value supplied to continuous scale. Any thoughts would be greatly appreciated.
Take care and stay well.
Here is a look at the structure of my data frame:
> str(Oxyhb_V2)
'data.frame': 1028 obs. of 7 variables:
$ ID : chr "B1" "B1" "B1" "B1" ...
$ Name : chr "Happy_HbO_LeftParietal_Value" "Happy_HbO_RightParietal_Value" "Happy_HbO_LeftSTC_Value" "Happy_HbO_RightSTC_Value" ...
$ Values : num -59.33 1.94 -33.85 21.11 -135.14 ...
$ Condition : Factor w/ 2 levels "Happy","ThreatAngryFearful": 1 1 1 1 1 1 1 1 2 2 ...
$ Chromophore: Factor w/ 1 level "HbO": 1 1 1 1 1 1 1 1 1 1 ...
$ Hemisphere : Factor w/ 2 levels "Left","Right": 1 2 1 2 1 2 1 2 1 2 ...
$ ROI : Factor w/ 4 levels "DLPFC","IFC",..: 3 3 4 4 1 1 2 2 3 3 ...
- attr(*, "na.action")= 'omit' Named int [1:520] 9 18 27 36 40 41 43 44 45 49 ...
..- attr(*, "names")= chr [1:520] "9" "27" "45" "63" ...
Here is my current ggplot code
q <- ggplot(Oxyhb_V2, aes(x=Hemisphere, y=Values, color=Condition)) +
facet_wrap(~ROI, scales='free') +
geom_vline(xintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
geom_hline(yintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
labs(x = "Condition", y = "Mean Oxy-Hb (uM)") + #label axes
theme(text=element_text(size=12)) +
geom_violin(trim=FALSE) +
geom_boxplot(width=0.1)+
geom_point() +#set label font size
theme_minimal() #set theme
plot(q)
The error is caused by geom_vline(xintercept = 0) layer. Replace 0 with one of the values of your x, for example geom_vline(xintercept = "Left")

Formula notation for scatterplot producing unexpected results

I am working on a map, where the color of each point is proportional to one response variable, and the size of the point is proportional to another. I've noticed that when I try to plot the points using formula notation things go haywire, while default notation performs as expected. I have used formula notation to plot maps many times before, and thought that the notations were nearly interchangeable. Why would these produce different results? I have read through the plot.formula and plot.default documentation and haven't been able to figure it out. Based on this I am wondering if it has to do with the columns of dat being coerced to factors, but I'm not sure why that would be happening. Any ideas?
Consider the following example data frame, dat:
latitude <- c(runif(10, min = 45, max = 48))
latitude[9] <- NA
longitude <- c(runif(10, min = -124.5, max = -122.5))
longitude[9] <- NA
color <- c("#00FFCCCC", "#99FF00CC", "#FF0000CC", "#3300FFCC", "#00FFCCCC",
"#00FFCCCC", "#3300FFCC", "#00FFCCCC", NA, "#3300FFCC")
size <- c(4.916667, 5.750000, 7.000000, 2.000000, 5.750000,
4.500000, 2.000000, 4.500000, NA, 2.000000)
dat <- as.data.frame(cbind(longitude, latitude, color, size))
Plotting according to formula notation
plot(latitude ~ longitude, data = dat, type = "p", pch = 21, col = 1, bg = color, cex = size)
produces
this mess and the following error: graphical parameter "type" is obsolete.
Plotting according to the default notation
plot(longitude, latitude, type = "p", pch = 21, col = 1, bg = color, cex = size)
works as expected, though with the same error.
There are a couple of problems with this. First is that your use of cbind is turning this into a matrix, albeit temporarily, which is converting your numbers to character. See:
dat <- as.data.frame(cbind(longitude, latitude, color, size))
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: Factor w/ 9 levels "-122.855375511572",..: 6 8 9 1 4 3 2 7 NA 5
# $ latitude : Factor w/ 9 levels "45.5418886151165",..: 6 2 4 1 3 7 5 9 NA 8
# $ color : Factor w/ 4 levels "#00FFCCCC","#3300FFCC",..: 1 3 4 2 1 1 2 1 NA 2
# $ size : Factor w/ 5 levels "2","4.5","4.916667",..: 3 4 5 1 4 2 1 2 NA 1
If instead you just use data.frame, you'll get:
dat <- data.frame(longitude, latitude, color, size)
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: num -124 -124 -124 -123 -124 ...
# $ latitude : num 47.3 45.9 46.3 45.5 46 ...
# $ color : Factor w/ 4 levels "#00FFCCCC","#3300FFCC",..: 1 3 4 2 1 1 2 1 NA 2
# $ size : num 4.92 5.75 7 2 5.75 ...
plot(latitude ~ longitude, data = dat, pch = 21, col = 1, bg = color, cex = size)
But now the colors are all dorked. Okay, the problem is likely because your $color is a factor, which is being interpreted internally as integers. Try stringsAsFactors=F:
dat <- data.frame(longitude, latitude, color, size, stringsAsFactors=FALSE)
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: num -124 -124 -124 -123 -124 ...
# $ latitude : num 47.3 45.9 46.3 45.5 46 ...
# $ color : chr "#00FFCCCC" "#99FF00CC" "#FF0000CC" "#3300FFCC" ...
# $ size : num 4.92 5.75 7 2 5.75 ...
plot(latitude ~ longitude, data = dat, pch = 21, col = 1, bg = color, cex = size)

Plotting quantity of teachers per county on a map in R

I have two data sets MASS and MASS2 to create a map in R. I got the first one with the help of library(ggmap).
counties<-map_data('county')
MASS<-map_data('county', 'massachusetts')
str(MASS)
data.frame': 744 obs. of 6 variables:
$ long : num -70.7 -70.5 -70.5 -70.5 -70.5 ...
$ lat : num 41.7 41.8 41.8 41.8 41.8 ...
$ group : num 1 1 1 1 1 1 1 1 1 1 ...
$ state : chr "massachusetts" "massachusetts" "massachusetts" ...
$ county_name: chr "barnstable" "barnstable" "barnstable" "barnstable" ...
The second consists of 14 points one per each county and has a teacher's quantity data per that county.
str(MASS2)
'data.frame': 14 obs. of 6 variables:
$ state : chr "massachusetts" "massachusetts" "massachusetts" ...
$ county_name : chr "barnstable" "berkshire" "bristol" "dukes" ...
$ long : num -70.7 -73.5 -71.2 -70.5 -71 ...
$ lat : num 41.7 42 41.7 41.4 42.4 ...
$ group : num 1 2 3 4 5 6 7 8 9 10 ...
$ teacher_count: int 62 40 47 ...
I need to create a map where each teacher_count point will be represented by a circle in accordance with teacher's amount. So far I'm getting just one size circles.
My code is next:
ggplot(MASS, aes(long,lat, group = group)) +
geom_polygon(aes(fill = county_name),colour = "black") +
geom_point(data = MASS2, aes(x = long, y = lat), color = "red", size = 5)+
theme(legend.position="none") +
coord_quickmap()
This is the map I get
I found one solution online which offers to represent the size in geom_point as
+geom_point(......, size = MASS2$teacher_count*circle_scale_amt)+
scale_size_continuous(range=range(MA$teacher_count))
but R can't find circle_scale_amt.
I am a new to R and trying to learn. Will appreciate ideas for any other ways to represent the teachers by their quantity! Thank you!
This works for me after setting a value for circle_scale_amt to rescale the size of the points otherwise they would be too big.
library(ggmap)
counties <- map_data('county')
MASS <- map_data('county', 'massachusetts')
circle_scale_amt <- 0.05
ggplot(MASS, aes(long,lat, group = group)) +
geom_polygon(aes(fill = subregion),colour = "black") +
geom_point(data = MASS2, aes(x = long, y = lat),
size = MASS2$teacher_count * circle_scale_amt,
color = "red", alpha = 0.6)+
scale_size_continuous(range = range(MASS2$teacher_count)) +
theme(legend.position="none") +
coord_quickmap()
Created on 2018-03-16 by the reprex package (v0.2.0).

Shading a specific area using a density plot - ggplot2

I have a data visualization question regarding ggplot2.
I'm trying to figure out how can I shade a specificity area in my density_plot. I googled it a lot and I tried all solutions.
My code is:
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
ggplot(data=original_12, aes(original_12$sum)) + geom_density() +
facet_wrap(~sex) +
geom_vline(data=original_12, aes(xintercept=cutoff_12),
linetype="dashed", color="red", size=1)
So, from this:
I want this:
The question on ggplot2 shade area under density curve by group is different than mine because they use different groups and graphs.
Similar to this SO question except the facet adds an additional complexity.
You need to rename the PANEL data as "sex" and factor it correctly to match your already existing aesthetic option. Your original "sex" factor is ordered alphabetically (default data.frame option), which is a little confusing at first.
make sure you name your plot "p" to create a ggplot object:
p <- ggplot(data=original_12, aes(original_12$sum)) +
geom_density() +
facet_wrap(~sex) +
geom_vline(data=original_12, aes(xintercept=cutoff_12),
linetype="dashed", color="red", size=1)
The ggplot object data can be extracted...here is the structure of the data:
str(ggplot_build(p)$data[[1]])
'data.frame': 1024 obs. of 16 variables:
$ y : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ x : num 17 17 17.1 17.1 17.2 ...
$ density : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ scaled : num 0.0121 0.0128 0.0137 0.0145 0.0154 ...
$ count : num 0.0568 0.0604 0.0644 0.0684 0.0727 ...
$ n : int 50 50 50 50 50 50 50 50 50 50 ...
$ PANEL : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
$ ymax : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ fill : logi NA NA NA NA NA NA ...
$ weight : num 1 1 1 1 1 1 1 1 1 1 ...
$ colour : chr "black" "black" "black" "black" ...
$ alpha : logi NA NA NA NA NA NA ...
$ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
It cannot be used directly because you need to rename the PANEL data and factor it to match your original dataset. You can extract the data from the ggplot object here:
to_fill <- data_frame(
x = ggplot_build(p)$data[[1]]$x,
y = ggplot_build(p)$data[[1]]$y,
sex = factor(ggplot_build(p)$data[[1]]$PANEL, levels = c(1,2), labels = c("F","M")))
p + geom_area(data = to_fill[to_fill$x >= 35, ],
aes(x=x, y=y), fill = "red")
#DATA
set.seed(2)
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
#Calculate density for each sex
temp = do.call(rbind, lapply(split(original_12, original_12$sex), function(a){
d = density(a$sum)
data.frame(sex = a$sex[1], x = d$x, y = d$y)
}))
#For each sex, seperate the data for the shaded area
temp2 = do.call(rbind, lapply(split(temp, temp$sex), function(a){
rbind(data.frame(sex = a$sex[1], x = cutoff_12, y = 0), a[a$x > cutoff_12,])
}))
#Plot
ggplot(temp) +
geom_line(aes(x = x, y = y)) +
geom_vline(xintercept = cutoff_12) +
geom_polygon(data = temp2, aes(x = x, y = y)) +
facet_wrap(~sex) +
theme_classic()

ggplot2 facet_wrap geom_text not accepting date values

I have a small data set, local, (5 observations) with two types: a and b.
Each observation has a Date field (p.start), a ratio, and a duration.
local
principal p.start duration allocated.days ratio
1 P 2015-03-18 1 162.0000 162.0000
2 V 2015-08-28 4 24.0000 6.0000
3 V 2015-09-03 1 89.0000 89.0000
4 V 2015-03-30 1 32.0000 32.0000
5 P 2015-01-29 1 150.1667 150.1667
str(local)
'data.frame': 5 obs. of 5 variables:
$ principal : chr "P" "V" "V" "V" ...
$ p.start : Date, format: "2015-03-18" "2015-08-28" "2015-09-03" "2015-03-30" ...
$ duration : Factor w/ 10 levels "1","2","3","4",..: 1 4 1 1 1
$ allocated.days: num 162 24 89 32 150
$ ratio : num 162 6 89 32 150
I have another data frame, stats, with text to be added to a faceted plot.
stats
principal xx yy zz
1 P 2015-02-28 145.8 Average = 156
2 V 2015-02-28 145.8 Average = 24
str(stats)
'data.frame': 2 obs. of 4 variables:
$ principal: chr "P" "V"
$ xx : Date, format: "2015-02-28" "2015-02-28"
$ yy : num 146 146
$ zz : chr "Average = 156" "Average = 24"
The following code fails:
p = ggplot (local, aes (x = p.start, y = ratio, size = duration))
p = p + geom_point (colour = "blue"); p
p = p + facet_wrap (~ principal, nrow = 2); p
p = p + geom_text(aes(x=xx, y=yy, label=zz), data= stats)
p
Error: Continuous value supplied to discrete scale
Any ideas? I'm missing something obvious.
The problem is that you are plotting from 2 data.frames, but your initial ggplot call includes aes parameters referring to just the local data.frame.
So although your geom_text specifies data=stats, it is still looking for size=duration.
The following line works for me:
ggplot(local) +
geom_point(aes(x=p.start, y=ratio, size=duration), colour="blue") +
facet_wrap(~ principal, nrow=2) +
geom_text(data=stats, aes(x=xx, y=yy, label=zz))
Just remove size = duration from ggplot (local, aes (x = p.start, y = ratio, size = duration)) and add it into geom_point (colour = "blue"). Then, it should work.
ggplot(local, aes(x=p.start, y=ratio))+
geom_point(colour="blue", aes(size=duration))+
facet_wrap(~principal, nrow=2)+
geom_text(aes(x=xx, y=yy, label=zz), data=stats)

Resources