Plotting quantity of teachers per county on a map in R - r

I have two data sets MASS and MASS2 to create a map in R. I got the first one with the help of library(ggmap).
counties<-map_data('county')
MASS<-map_data('county', 'massachusetts')
str(MASS)
'data.frame': 744 obs. of 6 variables:
$ long : num -70.7 -70.5 -70.5 -70.5 -70.5 ...
$ lat : num 41.7 41.8 41.8 41.8 41.8 ...
$ group : num 1 1 1 1 1 1 1 1 1 1 ...
$ state : chr "massachusetts" "massachusetts" "massachusetts" ...
$ county_name: chr "barnstable" "barnstable" "barnstable" "barnstable" ...
The second has 14 rows, one per county, and holds the teacher count for that county.
str(MASS2)
'data.frame': 14 obs. of 6 variables:
$ state : chr "massachusetts" "massachusetts" "massachusetts" ...
$ county_name : chr "barnstable" "berkshire" "bristol" "dukes" ...
$ long : num -70.7 -73.5 -71.2 -70.5 -71 ...
$ lat : num 41.7 42 41.7 41.4 42.4 ...
$ group : num 1 2 3 4 5 6 7 8 9 10 ...
$ teacher_count: int 62 40 47 ...
I need to create a map where each county's point is drawn as a circle sized according to its teacher_count. So far all my circles come out the same size.
My code so far:
ggplot(MASS, aes(long,lat, group = group)) +
geom_polygon(aes(fill = county_name),colour = "black") +
geom_point(data = MASS2, aes(x = long, y = lat), color = "red", size = 5)+
theme(legend.position="none") +
coord_quickmap()
This is the map I get
I found one solution online which offers to represent the size in geom_point as
+geom_point(......, size = MASS2$teacher_count*circle_scale_amt)+
scale_size_continuous(range=range(MA$teacher_count))
but R can't find circle_scale_amt.
I am new to R and trying to learn. I would appreciate ideas for any other ways to represent the teachers by their quantity. Thank you!

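This works for me once circle_scale_amt is given a value. It rescales the point sizes, which would otherwise be far too big. Note that because size is set outside aes(), the scale_size_continuous() call has no effect on these points.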
library(ggmap)
counties <- map_data('county')
MASS <- map_data('county', 'massachusetts')
circle_scale_amt <- 0.05
ggplot(MASS, aes(long,lat, group = group)) +
geom_polygon(aes(fill = subregion),colour = "black") +
geom_point(data = MASS2, aes(x = long, y = lat),
size = MASS2$teacher_count * circle_scale_amt,
color = "red", alpha = 0.6)+
scale_size_continuous(range = range(MASS2$teacher_count)) +
theme(legend.position="none") +
coord_quickmap()
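An alternative sketch, not the answer's method: map teacher_count to size inside aes() and let a size scale do the rescaling, so no manual circle_scale_amt is needed. It assumes the maps package is installed (map_data() needs it). The MASS2 here holds only the three counties whose values are visible in the str() output above, and max_size = 10 is an arbitrary choice.

```r
library(ggplot2)  # map_data() additionally needs the 'maps' package installed

# County outlines, as in the question
MASS <- map_data("county", "massachusetts")

# Stand-in for MASS2: only the three rows visible in the str() output above
MASS2 <- data.frame(
  county_name   = c("barnstable", "berkshire", "bristol"),
  long          = c(-70.7, -73.5, -71.2),
  lat           = c(41.7, 42.0, 41.7),
  teacher_count = c(62L, 40L, 47L)
)

p <- ggplot() +
  geom_polygon(data = MASS,
               aes(x = long, y = lat, group = group, fill = subregion),
               colour = "black") +
  # size is mapped inside aes(), so the size scale controls the circles
  geom_point(data = MASS2,
             aes(x = long, y = lat, size = teacher_count),
             colour = "red", alpha = 0.6) +
  scale_size_area(max_size = 10) +  # circle area proportional to the count
  guides(fill = "none") +           # hide the county legend, keep the size legend
  coord_quickmap()
p
```

Keeping the size legend lets readers translate circle area back into a teacher count, which a hard-coded size vector cannot do.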
Created on 2018-03-16 by the reprex package (v0.2.0).

Related

Plotting time-series data with a gap in r?

I have a data set that is missing data from about July 7th to July 19th. Graph of my dataset. You can see the data gap pretty easily. I would like to truncate the x-axis so that the gap isn't there and the data before and after it butt up against each other. Something like this. I tried to follow the linked example, but I don't understand how they set up xseq. I also tried removing the offending dates and creating a data frame without them, but that didn't solve the problem.
I'm not sure if it's helpful, but here is the existing code for the graph:
together <- ggplot() +
stat_summary(data = grid_pad, aes(x = DTT, y = grid_value, fill = 'Ambient'), geom='ribbon', fun.data = mean_cl_quantile, alpha = 0.25) +
stat_summary(data = grid_pad, aes(x = DTT, y = grid_value, color = 'Ambient'), geom='line', fun = mean, size = 0.9) +
stat_summary(data = turtle_pad, aes(x = DTT, y = turtle_value, fill = 'Turtle'), geom='ribbon', fun.data = mean_cl_quantile, alpha = 0.25) +
stat_summary(data = turtle_pad, aes(x = DTT, y = turtle_value, color = 'Turtle'), geom='line', fun = mean, size = 0.9) +
labs(x = "Date", y = "Temperature")+
scale_color_manual("Legend", values = c('Ambient' = '#1b9e77', 'Turtle' = '#d95f02'), labels = c(Ambient = 'Ambient Temp', Turtle = 'Turtle Temp')) +
scale_fill_manual("Legend", values = c('Ambient' = '#1b9e77', 'Turtle' = '#d95f02'), labels = c(Ambient = 'Ambient Temp', Turtle = 'Turtle Temp')) +
theme_classic() +
ggtitle("Ambient and Turtle Temperatures")+
ggeasy::easy_center_title()+
ggeasy::easy_remove_legend_title()
together
and here is the structure of my data:
> str(grid_pad)
grouped_df [142,800 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ Logger : Factor w/ 50 levels "TL1","TL11","TL12",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DTT : POSIXct[1:142800], format: "2021-05-28 00:00:00" "2021-05-28 01:00:00" "2021-05-28 02:00:00" "2021-05-28 03:00:00" ...
$ grid_value: num [1:142800] NA NA NA NA NA 19.5 19.5 19.5 20 22 ...
- attr(*, "groups")= tibble [50 x 2] (S3: tbl_df/tbl/data.frame)
..$ Logger: Factor w/ 50 levels "TL1","TL11","TL12",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(turtle_pad)
grouped_df [57,120 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ Name : Factor w/ 20 levels "F1","F11","F12",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DTT : POSIXct[1:57120], format: "2021-05-28 00:00:00" "2021-05-28 01:00:00" "2021-05-28 02:00:00" "2021-05-28 03:00:00" ...
$ turtle_value: num [1:57120] NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "groups")= tibble [20 x 2] (S3: tbl_df/tbl/data.frame)
..$ Name : Factor w/ 20 levels "F1","F11","F12",..: 1 2 3 4 5 6 7 8 9 10 ...
With base R (verbose):
df_with_gap <- data.frame(Name = gl(41, 1),
DTT = as.Date(Sys.Date()) + (-20:20),
turtle_value = c(runif(20), rep(NA, 5), runif(16))
)
rows_to_keep <- !is.na(df_with_gap$turtle_value)
## remove NAs
df_without_gap <- df_with_gap[rows_to_keep,]
## create a numeric index to use as x-values in ggplot
df_without_gap$pseudo_date <- seq_len(nrow(df_without_gap))
Please note:
While you could use the DTT of the remaining values to label your axis (see the labels argument in ?scale_x_continuous), the plot will be misleading because it hides the missing information.
A scatter plot would be the way to go if you want to show the association between ambient and turtle temperature.
To show seasonality instead, consider adding a smoother (?geom_smooth for ggplot).
To convey variability, a boxplot might be more instructive.
There are helpful chart pickers on the web.
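A minimal sketch of the index-based approach above, on simulated data rather than the asker's loggers. The dashed line marking where the gap was removed and the every-fifth-row break spacing are my own additions, intended to make the cut visible instead of hiding it.

```r
library(ggplot2)

set.seed(1)
# Simulated daily series with a 5-day gap, mirroring the answer's example
df_with_gap <- data.frame(
  DTT = Sys.Date() + (-20:20),
  turtle_value = c(runif(20), rep(NA, 5), runif(16))
)

# Drop the missing rows and index the remaining ones consecutively
df_without_gap <- df_with_gap[!is.na(df_with_gap$turtle_value), ]
df_without_gap$pseudo_date <- seq_len(nrow(df_without_gap))

# Position of the removed gap: halfway between the two rows that now touch
gap_at <- which(diff(df_without_gap$DTT) > 1) + 0.5

# Label every 5th index with its real date
brk <- seq(1, nrow(df_without_gap), by = 5)

p <- ggplot(df_without_gap, aes(x = pseudo_date, y = turtle_value)) +
  geom_line() +
  geom_vline(xintercept = gap_at, linetype = "dashed") +  # marks the cut
  scale_x_continuous(breaks = brk,
                     labels = format(df_without_gap$DTT[brk], "%b %d")) +
  labs(x = "Date (gap removed)", y = "Temperature")
p
```

With stat_summary() layers like those in the question, the same index would have to be shared by all loggers for a given timestamp, for example by building it from the sorted unique DTT values.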

Violin plot error - Discrete value supplied to continuous scale

I am trying to create a series of Violin plots which show average concentration across different regions (separating out hemispheres and conditions).
I keep getting the following error: Error: Discrete value supplied to continuous scale. Any thoughts would be greatly appreciated.
Take care and stay well.
Here is a look at the structure of my data frame:
> str(Oxyhb_V2)
'data.frame': 1028 obs. of 7 variables:
$ ID : chr "B1" "B1" "B1" "B1" ...
$ Name : chr "Happy_HbO_LeftParietal_Value" "Happy_HbO_RightParietal_Value" "Happy_HbO_LeftSTC_Value" "Happy_HbO_RightSTC_Value" ...
$ Values : num -59.33 1.94 -33.85 21.11 -135.14 ...
$ Condition : Factor w/ 2 levels "Happy","ThreatAngryFearful": 1 1 1 1 1 1 1 1 2 2 ...
$ Chromophore: Factor w/ 1 level "HbO": 1 1 1 1 1 1 1 1 1 1 ...
$ Hemisphere : Factor w/ 2 levels "Left","Right": 1 2 1 2 1 2 1 2 1 2 ...
$ ROI : Factor w/ 4 levels "DLPFC","IFC",..: 3 3 4 4 1 1 2 2 3 3 ...
- attr(*, "na.action")= 'omit' Named int [1:520] 9 18 27 36 40 41 43 44 45 49 ...
..- attr(*, "names")= chr [1:520] "9" "27" "45" "63" ...
Here is my current ggplot code
q <- ggplot(Oxyhb_V2, aes(x=Hemisphere, y=Values, color=Condition)) +
facet_wrap(~ROI, scales='free') +
geom_vline(xintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
geom_hline(yintercept = 0, linetype = "dotted", color="black", alpha = .2) + #accentuate origin
labs(x = "Condition", y = "Mean Oxy-Hb (uM)") + #label axes
theme(text=element_text(size=12)) +
geom_violin(trim=FALSE) +
geom_boxplot(width=0.1)+
geom_point() +#set label font size
theme_minimal() #set theme
plot(q)
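The error is caused by the geom_vline(xintercept = 0) layer: your x aesthetic (Hemisphere) is discrete, but a numeric xintercept in the first layer sets up a continuous x scale. Replace 0 with one of the values of your x, for example geom_vline(xintercept = "Left"), or drop the layer.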
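A sketch of the plot with the numeric vertical line removed, on simulated data whose column names follow the question's str() output. The values are random, only two of the four ROI levels are included, and the dodge width on the boxplots is my own assumption to line them up with the violins.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for Oxyhb_V2 (random values, two of the four ROIs)
demo <- data.frame(
  Hemisphere = factor(rep(c("Left", "Right"), each = 40)),
  Condition  = factor(rep(c("Happy", "ThreatAngryFearful"), times = 40)),
  ROI        = factor(rep(c("DLPFC", "IFC"), each = 20, times = 2)),
  Values     = rnorm(80, mean = 0, sd = 50)
)

q <- ggplot(demo, aes(x = Hemisphere, y = Values, colour = Condition)) +
  facet_wrap(~ROI, scales = "free") +
  # y is continuous, so a numeric yintercept is fine; there is no numeric vline
  geom_hline(yintercept = 0, linetype = "dotted", colour = "black", alpha = 0.2) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.1, position = position_dodge(width = 0.9)) +
  labs(x = "Hemisphere", y = "Mean Oxy-Hb (uM)") +
  theme_minimal() +                     # complete theme first ...
  theme(text = element_text(size = 12)) # ... then tweaks, so they are not overwritten
q
```

In the question's code theme_minimal() comes after theme(text = ...), which resets the font size. Placing the complete theme first avoids that.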

Shading a specific area using a density plot - ggplot2

I have a data visualization question regarding ggplot2.
I'm trying to figure out how I can shade a specific area in my density plot. I googled it a lot and tried all the solutions I found.
My code is:
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
ggplot(data=original_12, aes(original_12$sum)) + geom_density() +
facet_wrap(~sex) +
geom_vline(data=original_12, aes(xintercept=cutoff_12),
linetype="dashed", color="red", size=1)
So, from this:
I want this:
The question on ggplot2 shade area under density curve by group is different than mine because they use different groups and graphs.
Similar to this SO question except the facet adds an additional complexity.
You need to rename the PANEL data as "sex" and factor it correctly to match your already existing aesthetic option. Your original "sex" factor is ordered alphabetically (default data.frame option), which is a little confusing at first.
Make sure you assign your plot to "p" so that you have a ggplot object to work with:
p <- ggplot(data=original_12, aes(x=sum)) +
geom_density() +
facet_wrap(~sex) +
geom_vline(data=original_12, aes(xintercept=cutoff_12),
linetype="dashed", color="red", size=1)
The ggplot object data can be extracted...here is the structure of the data:
str(ggplot_build(p)$data[[1]])
'data.frame': 1024 obs. of 16 variables:
$ y : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ x : num 17 17 17.1 17.1 17.2 ...
$ density : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ scaled : num 0.0121 0.0128 0.0137 0.0145 0.0154 ...
$ count : num 0.0568 0.0604 0.0644 0.0684 0.0727 ...
$ n : int 50 50 50 50 50 50 50 50 50 50 ...
$ PANEL : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
$ ymax : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ fill : logi NA NA NA NA NA NA ...
$ weight : num 1 1 1 1 1 1 1 1 1 1 ...
$ colour : chr "black" "black" "black" "black" ...
$ alpha : logi NA NA NA NA NA NA ...
$ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
The built data cannot be used directly: the PANEL column has to be renamed to sex and turned into a factor whose labels match the original data (panel 1 is "F", panel 2 is "M"). Extract the data from the ggplot object and add the shaded area:
built <- ggplot_build(p)$data[[1]]
to_fill <- data.frame(
x = built$x,
y = built$y,
sex = factor(built$PANEL, levels = c(1,2), labels = c("F","M")))
p + geom_area(data = to_fill[to_fill$x >= cutoff_12, ],
aes(x=x, y=y), fill = "red")
#DATA
set.seed(2)
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
#Calculate density for each sex
temp = do.call(rbind, lapply(split(original_12, original_12$sex), function(a){
d = density(a$sum)
data.frame(sex = a$sex[1], x = d$x, y = d$y)
}))
#For each sex, separate the data for the shaded area
temp2 = do.call(rbind, lapply(split(temp, temp$sex), function(a){
rbind(data.frame(sex = a$sex[1], x = cutoff_12, y = 0), a[a$x > cutoff_12,])
}))
#Plot
ggplot(temp) +
geom_line(aes(x = x, y = y)) +
geom_vline(xintercept = cutoff_12) +
geom_polygon(data = temp2, aes(x = x, y = y)) +
facet_wrap(~sex) +
theme_classic()

ggplot2 facet_wrap geom_text not accepting date values

I have a small data set, local (5 observations), with two types of principal: P and V.
Each observation has a Date field (p.start), a ratio, and a duration.
local
principal p.start duration allocated.days ratio
1 P 2015-03-18 1 162.0000 162.0000
2 V 2015-08-28 4 24.0000 6.0000
3 V 2015-09-03 1 89.0000 89.0000
4 V 2015-03-30 1 32.0000 32.0000
5 P 2015-01-29 1 150.1667 150.1667
str(local)
'data.frame': 5 obs. of 5 variables:
$ principal : chr "P" "V" "V" "V" ...
$ p.start : Date, format: "2015-03-18" "2015-08-28" "2015-09-03" "2015-03-30" ...
$ duration : Factor w/ 10 levels "1","2","3","4",..: 1 4 1 1 1
$ allocated.days: num 162 24 89 32 150
$ ratio : num 162 6 89 32 150
I have another data frame, stats, with text to be added to a faceted plot.
stats
principal xx yy zz
1 P 2015-02-28 145.8 Average = 156
2 V 2015-02-28 145.8 Average = 24
str(stats)
'data.frame': 2 obs. of 4 variables:
$ principal: chr "P" "V"
$ xx : Date, format: "2015-02-28" "2015-02-28"
$ yy : num 146 146
$ zz : chr "Average = 156" "Average = 24"
The following code fails:
p = ggplot (local, aes (x = p.start, y = ratio, size = duration))
p = p + geom_point (colour = "blue"); p
p = p + facet_wrap (~ principal, nrow = 2); p
p = p + geom_text(aes(x=xx, y=yy, label=zz), data= stats)
p
Error: Continuous value supplied to discrete scale
Any ideas? I'm missing something obvious.
The problem is that you are plotting from 2 data.frames, but your initial ggplot call includes aes parameters referring to just the local data.frame.
So although your geom_text specifies data=stats, it is still looking for size=duration.
The following works for me:
ggplot(local) +
geom_point(aes(x=p.start, y=ratio, size=duration), colour="blue") +
facet_wrap(~ principal, nrow=2) +
geom_text(data=stats, aes(x=xx, y=yy, label=zz))
Just remove size = duration from ggplot (local, aes (x = p.start, y = ratio, size = duration)) and add it into geom_point (colour = "blue"). Then, it should work.
ggplot(local, aes(x=p.start, y=ratio))+
geom_point(colour="blue", aes(size=duration))+
facet_wrap(~principal, nrow=2)+
geom_text(aes(x=xx, y=yy, label=zz), data=stats)
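A third option, sketched here as an alternative to both answers: keep size = duration in the ggplot() call and set inherit.aes = FALSE on geom_text(), so the text layer uses only its own aesthetics. The data frames are rebuilt from the printed values above and renamed local_df and stats_df (my naming, to avoid masking base R's local()). The allocated.days column is left out.

```r
library(ggplot2)

# Rebuilt from the printed data in the question
local_df <- data.frame(
  principal = c("P", "V", "V", "V", "P"),
  p.start   = as.Date(c("2015-03-18", "2015-08-28", "2015-09-03",
                        "2015-03-30", "2015-01-29")),
  duration  = factor(c(1, 4, 1, 1, 1), levels = 1:10),
  ratio     = c(162, 6, 89, 32, 150.1667)
)
stats_df <- data.frame(
  principal = c("P", "V"),
  xx = as.Date(c("2015-02-28", "2015-02-28")),
  yy = c(145.8, 145.8),
  zz = c("Average = 156", "Average = 24")
)

p <- ggplot(local_df, aes(x = p.start, y = ratio, size = duration)) +
  geom_point(colour = "blue") +
  facet_wrap(~principal, nrow = 2) +
  # inherit.aes = FALSE stops this layer from picking up size = duration
  geom_text(data = stats_df, aes(x = xx, y = yy, label = zz),
            inherit.aes = FALSE)
p
```

ggplot2 warns that using size for a discrete variable is not advised. That warning comes from the factor duration and is unrelated to the original error.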

ggplot2_Error: geom_point requires the following missing aesthetics: y

I am trying to run the rWBclimate package in RStudio. I copied the code below from rOpenSci and pasted it into RStudio, but I get this message and error: Don't know how to automatically pick scale for object of type list. Defaulting to continuous
Error: geom_point requires the following missing aesthetics: y
gbr.dat.t <- get_ensemble_temp("GBR", "annualavg", 1900, 2100)
## Loading required package: rjson
### Subset to just the median percentile
gbr.dat.t <- subset(gbr.dat.t, gbr.dat.t$percentile == 50)
## Plot and note the past is the same for each scenario
ggplot(gbr.dat.t,aes(x=fromYear,y=data,group=scenario,colour=scenario)) +
geom_point() +
geom_path() +
theme_bw() +
xlab("Year") +
ylab("Annual Average Temperature in 20 year increments")
I also tried to use geom_point(stat="identity") in the following way but didn't work:
ggplot(gbr.dat.t,aes(x=fromYear,y=data,group=scenario,colour=scenario)) +
geom_point(stat="identity") +
geom_path() +
theme_bw() +
xlab("Year") +
ylab("Annual Average Temperature in 20 year increments")
I still get the same message "Don't know how to automatically pick scale for object of type list. Defaulting to continuous
Error: geom_point requires the following missing aesthetics: y"
Also, the result from str(gbr.dat.t) is given below:
> str(gbr.dat.t)
'data.frame': 12 obs. of 6 variables:
$ scenario : chr "past" "past" "past" "past" ...
$ fromYear : int 1920 1940 1960 1980 2020 2020 2040 2040 2060 2060 ...
$ toYear : int 1939 1959 1979 1999 2039 2039 2059 2059 2079 2079 ...
$ data :List of 12
..$ : num 9.01
..$ : num 9.16
..$ : num 9.05
..$ : num 9.36
..$ : num 10
..$ : num 9.47
..$ : num 9.92
..$ : num 10.7
..$ : num 10.3
..$ : num 11.4
..$ : num 12.1
..$ : num 10.4
$ percentile: int 50 50 50 50 50 50 50 50 50 50 ...
$ locator : chr "GBR" "GBR" "GBR" "GBR" ...
Looking for your helpful answers.
Hope this helps. The data column is a list, which ggplot2 cannot map to y, so all I did was convert gbr.dat.t$data to a numeric vector with unlist():
library('rWBclimate')
library("ggplot2")
gbr.dat.t <- get_ensemble_temp("GBR", "annualavg", 1900, 2100)
## Loading required package: rjson
### Subset to just the median percentile
gbr.dat.t <- subset(gbr.dat.t, gbr.dat.t$percentile == 50)
#This is the line you were missing
gbr.dat.t$data <- unlist(gbr.dat.t$data)
## Plot and note the past is the same for each scenario
ggplot(gbr.dat.t,aes(x=fromYear,y=data,group=scenario,colour=scenario)) + geom_point() +
geom_path() +
theme_bw() +
xlab("Year") +
ylab("Annual Average Temperature in 20 year increments")
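A self-contained sketch of the same fix that does not call the World Bank API. The data frame gbr is a hypothetical stand-in for the subset returned by get_ensemble_temp(), filled with the first four values shown in the str() output above.

```r
library(ggplot2)

# Hypothetical stand-in for gbr.dat.t (first four rows of the str() output)
gbr <- data.frame(
  scenario = c("past", "past", "past", "past"),
  fromYear = c(1920L, 1940L, 1960L, 1980L)
)
gbr$data <- list(9.01, 9.16, 9.05, 9.36)  # list column, as in the question

is.list(gbr$data)             # TRUE: this is what triggers the scale message
gbr$data <- unlist(gbr$data)  # flatten to a plain numeric vector
is.numeric(gbr$data)          # TRUE

p <- ggplot(gbr, aes(x = fromYear, y = data,
                     group = scenario, colour = scenario)) +
  geom_point() +
  geom_path() +
  theme_bw() +
  xlab("Year") +
  ylab("Annual Average Temperature in 20 year increments")
p
```

unlist() is only safe here because every list element holds exactly one number. If some elements were NULL or longer than one, the result would no longer line up with the rows.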
