Unwanted lines between groups in ggplot - r

I have climate-logger data for several years and want to plot the daily temperature cycle for each day in one month. I am using ggplot2 and grouping the data by day.
When I plot data from a single year, everything is fine. When I plot data from multiple years, I get lines running from 23:00 back to 00:00. If I use facet_wrap it works, but I have multiple sites and want to facet by site, not by year.
clim2 <- structure(list(date = structure(c(1404172980, 1404176580, 1404180180,
1404183780, 1404187380, 1404190980, 1404194580, 1404198180, 1404201780,
1404205380, 1404208980, 1404212580, 1404216180, 1404219780, 1404223380,
1404226980, 1404230580, 1404234180, 1404237780, 1404241380, 1404244980,
1404248580, 1404252180, 1404255780, 1404259380, 1404262980, 1404266580,
1404270180, 1404273780, 1404277380, 1404280980, 1404284580, 1404288180,
1404291780, 1404295380, 1404298980, 1404302580, 1404306180, 1404309780,
1404313380, 1404316980, 1404320580, 1404324180, 1404327780, 1404331380,
1404334980, 1404338580, 1404342180, 1435708980, 1435712580, 1435716180,
1435719780, 1435723380, 1435726980, 1435730580, 1435734180, 1435737780,
1435741380, 1435744980, 1435748580, 1435752180, 1435755780, 1435759380,
1435762980, 1435766580, 1435770180, 1435773780, 1435777380, 1435780980,
1435784580, 1435788180, 1435791780, 1435795380, 1435798980, 1435802580,
1435806180, 1435809780, 1435813380, 1435816980, 1435820580, 1435824180,
1435827780, 1435831380, 1435834980, 1435838580, 1435842180, 1435845780,
1435849380, 1435852980, 1435856580, 1435860180, 1435863780, 1435867380,
1435870980, 1435874580, 1435878180), class = c("POSIXct", "POSIXt"
), tzone = "NMT"), value = c(-0.1, 0, 0, 0, 0, 0, 0, 0, 0.2,
0.3, 0.7, 2.2, 2.6, 2.6, 3.3, 3, 1.9, 1.7, 1.1, 2.1, 0.7, 0.3,
-0.3, -0.4, -0.3, -1, -0.9, -1, -1, -1.1, -1.2, -0.5, -0.6, -1.2,
1.1, 3, 3.4, 4.5, 1.9, 1.9, 3.8, 3.4, 1.3, -0.1, 0.2, -0.6, -0.8,
-0.9, -0.4, -0.3, -0.3, -0.3, -0.2, -0.3, -0.6, -0.8, -0.7, -1.1,
1.2, 2.9, 1.9, 1.4, 1.7, 1.9, 1.6, 1.5, 0.9, 1.1, -0.5, -1.4,
-1.2, -1.1, -1.6, -1.3, -1.4, -1.4, -1.5, -1.3, -1.3, -1.6, -1.9,
-1.8, 0.9, 1.4, 0.9, 0.7, 0.4, -0.5, 0.1, 0.2, 0.1, -0.1, -0.6,
-0.9, -0.9, -0.7)), .Names = c("date", "value"), row.names = c(NA,
-96L), class = "data.frame")
library(ggplot2)
library(lubridate)
g <- ggplot(clim2, aes(x = hour(date) + minute(date)/60, y = value,
                       colour = factor(year(date)), group = factor(day(date)))) +
  geom_path() +
  xlab("Time")
print(g)

If you want to remove these lines, you have to make sure that the group aesthetic contains a unique value for each path (roughly speaking, a non-overlapping id), e.g.
clim2$year <- year(clim2$date)
clim2$day_id <- paste0(day(clim2$date), "_", clim2$year)
ggplot(clim2, aes(x = hour(date) + minute(date)/60,
y = value, colour = factor(year), group = day_id)) +
geom_path() +
xlab("Time")
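As a variant of the same idea (a sketch, not the only option), interaction() can build the unique per-path id directly inside aes(), without adding helper columns. This assumes the clim2 data frame from the question.

```r
library(ggplot2)
library(lubridate)

# interaction() combines day and year into one factor, so each
# day-of-each-year gets its own path; no helper columns needed
g <- ggplot(clim2, aes(x = hour(date) + minute(date)/60,
                       y = value,
                       colour = factor(year(date)),
                       group = interaction(day(date), year(date)))) +
  geom_path() +
  xlab("Time")
print(g)
```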


How to model quantiles regression curves for probabilities depending on a predictor in R?

I'd like to model the 25th, 50th and 75th quantile regression curves (q25, q50, q75) for 241 values of probability ('prob') depending on x0.
For that purpose, I used the qgamV() function as follows. However, this approach leads to some q25, q50 and q75 values < 0 and > 1, which is not expected for probabilities.
Graphically, one would expect the q25 and q75 regression curves to approach the 'prob' limits 0 and 1 more tangentially (see below).
How can I model these quantile curves as well as possible, knowing that they represent probabilities?
Thanks for your help.
Initial dataframe (df0):
df0 <- structure(list(x0 = c(2.65, 3.1, 2.15, 2.45, 2.9, 1.55, 2.05,
2.75, 2, 2.45, 4.05, 1.95, 3.35, 2.15, 2.5, 1.75, 1.6, 2.3, 3.35,
3.55, 2.1, 3.15, 2.5, 1.05, 2.3, 2.3, 2.95, 0.8, 1.75, 2.95,
2.55, 1.65, 2.4, 2.8, 2.2, 3.45, 2.15, 2.9, 1.7, 2.7, 2.05, 2.75,
2.35, 3.75, 2.2, 1.1, 2.35, 2.5, 3.05, 1, 4.4, 1.3, 2.2, 2.5,
1.35, 1.95, 1.95, 5.45, 2, 1.65, 2.7, 2, 1.5, 1.05, 4.15, 2.15,
1.9, 1.85, 4.2, 2.2, 3.35, 1.55, 1.95, 2.3, 1.9, 3.45, 2.2, 3.55,
1.4, 2.5, 2.35, 2.5, 2.4, 3.35, 2, 2.6, 3.05, 2.75, 1.6, 1.65,
2.45, 1.55, 1.65, 2.25, 0.9, 2.4, 2.2, 2, 1.65, 1.35, 1.95, 2.5,
1.6, 1.25, 3.8, 2.25, 2.85, 1.45, 2.4, 2.8, 3.75, 3.05, 1.8,
1.25, 1.55, 2, 2.55, 2.75, 3.55, 2.2, 2.1, 3.55, 3.65, 2.3, 1.25,
2.45, 2.2, 1.95, 1.65, 0.7, 2, 1.5, 2.8, 3.4, 3.95, 2.55, 2.45,
2.65, 1.75, 1.7, 2.5, 2.05, 2.75, 2.05, 3, 2.25, 3.6, 2.35, 3.25,
1.6, 3.3, 2.05, 1.95, 2.15, 2.3, 4.1, 2.45, 1.6, 2.3, 0.6, 2.35,
2.45, 1.9, 2.5, 1.35, 3.2, 2.25, 1.65, 2.75, 1.8, 3, 0.95, 2.7,
2.15, 3.75, 2.5, 1.95, 2.7, 3.75, 2.4, 2.4, 3.05, 1.8, 3.6, 2.05,
2.75, 2.15, 1.35, 3.15, 2.25, 3.1, 2, 2.35, 3.3, 2.05, 0.75,
2.55, 2.2, 3.15, 3.1, 1.75, 3.2, 3.15, 2.8, 2.5, 1.8, 2.2, 1.85,
3.35, 1.35, 2.75, 1.85, 2.8, 2.65, 3.15, 1.15, 2.5, 3.75, 2.75,
4.55, 2.3, 2.65, 3.1, 3.65, 0.8, 2.45, 3.25, 3.65, 3.75, 1.75,
2.55, 1.15, 2.05, 2.05, 3.5, 0.75, 2.55, 2.2, 2.1, 2.15, 2.75
), prob = c(0.043824528975438, 0.0743831343145038, 0.0444802301649798,
0.0184204002808217, 0.012747152819121, 0.109320069103749, 0.868637913750677,
0.389605665620339, 0.846536935687218, 0.104932383728924, 0.000796924809569913,
0.844673988202945, 0.00120791067227541, 0.91751061807481, 0.0140582427585067,
0.61360854266884, 0.55603090737844, 0.0121424615930165, 0.000392412410090414,
0.00731972612592678, 0.450730636411052, 0.0111896050578429, 0.0552971757296455,
0.949825608148576, 0.00216318997302124, 0.620876890784462, 0.00434032271743834,
0.809464444601336, 0.890796570916792, 0.0070834616944228, 0.0563350845256127,
0.913156468748195, 0.00605085671490011, 0.00585882020388307,
0.0139577135093548, 0.0151356267602558, 0.00357231467872644,
0.000268107682417655, 0.047883018897558, 0.137688264298974, 0.846219411361109,
0.455395192661041, 0.440089914302649, 0.312776912863294, 0.721283899836456,
0.945808616162847, 0.160122538485323, 0.274966581834218, 0.223500907500226,
0.957169102670141, 3.29173412975754e-05, 0.920710197397359, 0.752055893010363,
0.204573327883464, 0.824869881489217, 0.0336636091577387, 0.834235793851965,
0.00377210373002217, 0.611370672834389, 0.876156793482752, 0.04563653558985,
0.742493995255321, 0.42035122692417, 0.916359628728296, 0.182755925347698,
0.139504394672643, 0.415836463269909, 0.0143112277191436, 0.00611022961831899,
0.794529254262237, 0.000295836911230635, 0.88504245090271, 0.0320097205131667,
0.386424550101868, 0.724747784339428, 0.0374198694261709, 0.772894216412908,
0.243626917726206, 0.884082536765856, 0.649357153222083, 0.651665475576256,
0.248153637183556, 0.621116026311962, 0.254679380328883, 0.815492354289526,
0.00384382735772974, 0.00098493832845314, 0.0289740210412282,
0.919537164719931, 0.029914235716672, 0.791051705450356, 0.535062926433525,
0.930153425256182, 0.739648381556949, 0.962078822556967, 0.717404075711021,
0.00426200695619151, 0.0688025266083751, 0.30592683399928, 0.76857384388609,
0.817428136470741, 0.0101583095649087, 0.190150584186769, 0.949353043876038,
0.000942385744019884, 0.00752842476126574, 0.451811230189468,
0.878142444707428, 0.085390660867941, 0.705492062082986, 0.00776625091631656,
0.120499683875168, 0.871558791341612, 0.204175216963286, 0.88865934672351,
0.735067195665991, 0.111767657566763, 0.0718305257427526, 0.001998068594943,
0.726375812318976, 0.628064249939129, 0.0163105011142307, 0.585565544471761,
0.225632568540361, 0.914834452659588, 0.755043268549628, 0.44993311080756,
0.876058522964169, 0.876909380258345, 0.935545943209396, 0.856566304797687,
0.891579321327903, 0.67586664661773, 0.305274362445618, 0.0416387565225755,
0.244843991055886, 0.651782914419153, 0.615583040148267, 0.0164959661557421,
0.545479687527543, 0.0254178939123714, 0.00480000384583597, 0.0256296636591875,
0.776444262284288, 0.00686736233661002, 0.738267311816833, 0.00284628668554737,
0.0240371572079387, 0.00549270830047392, 0.91880163437759, 0.336534358175717,
0.276841848679916, 0.718008645244615, 0.0897424253787563, 0.0719730540202573,
0.00215797941000608, 0.0219160132143199, 0.797680147185277, 0.66612383359622,
0.946965411044528, 0.133399527090937, 0.343056247984854, 0.202570454449074,
0.00349712323805031, 0.919979740593237, 0.577123238372546, 0.759418264563034,
0.904569159000302, 0.0179587619909363, 0.785657258439329, 0.235867625712547,
0.959688292861383, 0.668060191654474, 0.0014774986557077, 0.00831528722028647,
0.669655207261098, 0.157824457113222, 0.110637023939517, 0.262525772704882,
0.112654002253028, 0.22606090266161, 0.157513622503487, 0.25688454756606,
0.00201570863346944, 0.70318409224183, 0.25568985167711, 0.810637054896326,
0.92708070974999, 0.608664352336801, 0.707490903842404, 0.00094520948858089,
0.106177223644193, 0.582785205597368, 0.0585327568963445, 0.377814739935042,
0.972447647118833, 0.0111118791692372, 0.58947840090326, 0.0111189166236961,
0.00317374095338712, 0.0664218007312096, 0.00227258301798719,
0.00198861129291917, 0.337443337988163, 0.750708293355867, 0.837530172974158,
0.627428065068903, 0.744110974625108, 0.00320417425932798, 0.871800026765784,
0.613647987816266, 0.808457030433619, 0.00486495461698562, 0.597950577021363,
0.000885253981642748, 0.0800527366346806, 0.00951706823839207,
0.125222576598629, 0.346018567766834, 0.0376933970313487, 0.157903106929268,
0.0371982251307384, 0.00407175432189843, 0.0946588147179984,
0.967274516618573, 0.169109953293894, 0.00124072042059317, 0.00259042255361196,
0.000400511359506596, 0.841289470209085, 0.807106898740506, 0.926962245924993,
0.814160745645036, 0.662558468801531, 0.000288068688170646, 0.698932091902567,
0.00242011818508616, 0.645573844423654, 0.517121859568318, 0.0931231998319089,
0.000877774529895907)), row.names = c(NA, -241L), class = "data.frame")
Quantiles regressions and plot:
library(mgcViz)
library(qgam)
library(ggplot2)
# Quantile regressions
q50 <- qgamV(prob ~ s(x0, bs="cr", k=10), data = df0, qu = 0.5)
q25 <- qgamV(prob ~ s(x0, bs="cr", k=10), data = df0, qu = 0.25)
q75 <- qgamV(prob ~ s(x0, bs="cr", k=10), data = df0, qu = 0.75)
# New dataframe including fitted quantile values
df1 <- df0
df1$q50 <- q50[["fitted.values"]]
df1$q25 <- q25[["fitted.values"]]
df1$q75 <- q75[["fitted.values"]]
# Plot
x_brk <- seq(0, 6, 1); x_lab <- seq(0, 6, 1)
y_brk <- seq(0, 1, 0.1); y_lab <- seq(0, 1, 0.1)
ggplot(df1, aes(x = x0, y = prob))+
scale_x_continuous(limits=c(0, 20), expand=c(0, 0), breaks=x_brk, labels=x_lab)+
scale_y_continuous(limits=c(-1, 2),expand=c(0, 0), breaks=y_brk, labels=y_lab)+
geom_vline(xintercept=x_brk, colour="grey25", size=0.2)+
geom_hline(yintercept=y_brk, colour="grey50", size=0.2)+
geom_hline(yintercept=0.5, linetype="solid", color = "black", size=0.2)+
geom_point(data = df1, aes(x = x0, y = prob), colour = "grey50", size=0.75, inherit.aes = TRUE)+
xlab(~paste("x0"))+
ylab(~paste("Prob"))+
theme(plot.title = element_blank())+
theme(plot.margin=unit(c(0.2,0.5,0.01,0.3),"cm"))+
theme(axis.text.x=element_text(colour="black", size=9.5, margin=margin(b=10),vjust=-1))+
theme(axis.text.y=element_text(colour="black", size=9.5,hjust=0.5))+
theme(axis.title.x=element_text(colour="black", size=11.5, margin=margin(b=2), vjust=1))+
theme(axis.title.y=element_text(colour="black", size=11.5, margin=margin(b=2), vjust=4))+
theme(panel.background=element_rect(fill="white"), panel.border = element_rect(colour = "black", fill=NA))+
geom_line(aes(x=x0, y = q50), data=df1, colour="black",size=0.8, inherit.aes = TRUE)+
geom_line(aes(x=x0, y = q25), data=df1, colour="black",size=0.6, linetype = "longdash")+
geom_line(aes(x=x0, y = q75), data=df1, colour="black",size=0.6, linetype = "longdash")+
coord_cartesian(xlim = c(0, 6), ylim = c(0, 1))
Continuation of the solution proposed by user2974951:
Given the non-normal distribution of 'prob', I think it is better to use qgam rather than quantreg, taking inspiration from user2974951's solution.
The difference between the two quantile regression approaches is very slight for the example predictor x0, but much more obvious with another predictor, x1:
Example x0:
Example x1:
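A hedged sketch of what that combination might look like, fitting the quantile GAM on the logit scale and back-transforming, by analogy with user2974951's quantreg answer. The toy data and the names probL and q25 are my own, not from the original posts:

```r
library(mgcViz)  # provides qgamV(), which wraps qgam::qgam()

set.seed(42)
toy <- data.frame(x0 = runif(200, 0, 5))
toy$prob <- plogis(rnorm(200, mean = toy$x0 - 2.5))  # toy probabilities in (0, 1)

# 1) move to the logit scale
toy$probL <- log(toy$prob / (1 - toy$prob))

# 2) fit the quantile GAM on the logit scale
q25L <- qgamV(probL ~ s(x0, bs = "cr", k = 10), data = toy, qu = 0.25)

# 3) back-transform the fitted values into (0, 1)
toy$q25 <- 1 / (1 + exp(-q25L[["fitted.values"]]))
```

Because the inverse logit maps the whole real line into (0, 1), the back-transformed curve is guaranteed to stay strictly between 0 and 1.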
You can use the logit transform and then apply regular quantile regression:
library(quantreg)
df0 <- df0[order(df0$x0), ] # ordering just for easier visualization
df0$probL <- log(df0$prob/(1 - df0$prob))
t <- c(0.25, 0.5, 0.75)
mod <- lapply(t, function(x){rq(probL ~ x0, data=df0, tau=x)})
names(mod) <- paste0("Q_", t)
pre <- as.data.frame(do.call(cbind, lapply(mod, function(x){1/(1 + exp(-predict(x)))})))
plot(prob ~ x0, data=df0)
lines(pre$Q_0.25 ~ df0$x0, col="red")
lines(pre$Q_0.5 ~ df0$x0, col="green")
lines(pre$Q_0.75 ~ df0$x0, col="red")
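The 1/(1 + exp(-predict(x))) in the lapply() call above is just the inverse of the logit transform applied earlier. Written out as small helper functions (the names are mine, for illustration):

```r
# logit: map a probability in (0, 1) onto the whole real line
logit <- function(p) log(p / (1 - p))

# inverse logit: map a real value back into (0, 1);
# this is the 1 / (1 + exp(-x)) used on the rq() predictions
inv_logit <- function(x) 1 / (1 + exp(-x))

# round trip recovers the original probability
inv_logit(logit(0.25))  # 0.25
```

This trick is valid for quantile regression but not for mean regression: quantiles commute with monotone transformations such as the logit, while means in general do not.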

Making quarto html documents use the full window width

I am creating a quarto html document using R and publishing it to the web at rpubs.com. The document uses only the middle third of the window, leaving the left and right thirds blank. I am showing a table, but it does not start at the far-left side and as a result is cut off partway through. How can I tell quarto to use the full window width? Here is a reproducible example.
---
title: "Test2"
author: "Philip Smith"
format: html
html:
  page-layout: custom
editor: visual
code-fold: true
link-external-icon: true
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    R.options:
      knitr.graphics.auto_pdf: true
---
```{r set-options, echo=FALSE, warning=FALSE, cache=FALSE}
library(lubridate)
library(gt)
FdfT <- structure(list(Name = c("LFS participation rate", "LFS employment rate",
"LFS unemployment rate", "Job vacancy rate", "LFS payroll", "LFS payroll employment",
"LFS payroll average hours worked", "LFS payroll average hourly earnings",
"SEPH payroll", "SEPH payroll employment", "SEPH payroll average hours worked",
"SEPH payroll average hourly earnings", "LFS payroll", "LFS payroll employment",
"LFS payroll average hours worked", "LFS payroll average hourly earnings",
"SEPH payroll", "SEPH payroll employment", "SEPH payroll average hours worked",
"SEPH payroll average hourly earnings"), V1 = c(65.4, 61.5, 6,
5.6, 8.9, 5.8, 0.3, 2.7, 9.2, 6.8, -0.3, 2.3, 0.8, 0.3, 0.1,
0.4, 1.3, 0.6, 0.6, 0.1), V2 = c(65, 60.8, 6.5, 5.4, 7.7, 5.8,
-0.6, 2.3, 9.3, 7.8, -0.6, 2, -0.2, -1.5, 0.4, 0.9, 0.9, 0.1,
0, 0.9), V3 = c(65.4, 61.8, 5.5, 5.3, 9.5, 6.3, 0, 3.1, 10.1,
8, -0.6, 2.5, 1.9, 2.3, -0.3, -0.1, 0.8, 0.7, 0, 0), V4 = c(65.4,
61.9, 5.3, 5.7, 9.1, 5.1, 0.3, 3.5, 11.8, 7.2, -0.6, 4.9, 0.7,
0.3, 0.1, 0.3, 2.2, 0.8, -0.3, 1.7), V5 = c(65.3, 61.9, 5.2,
5.7, 9.3, 6.3, -0.5, 3.3, 10.2, 7, -1.2, 4.3, -0.2, 0, -0.6,
0.4, 0.2, 0.7, 0, -0.4), V6 = c(65.3, 61.9, 5.1, 5.7, 10.9, 7.2,
-0.6, 3.9, 11.5, 8.6, -0.9, 3.5, 1.2, 0.2, 0.2, 0.8, -0.4, 0,
-0.3, -0.3), V7 = c(64.9, 61.7, 4.9, 5.6, 11.3, 5.2, 0.6, 5.2,
11.2, 7.8, -0.6, 3.9, 1.3, 0.1, 0.5, 0.7, 1, 0.8, 0, 0.4), V8 = c(64.7,
61.6, 4.9, 5.5, 10.5, 4.2, 0.9, 5.2, 9, 5.9, -0.6, 3.3, 0, -0.4,
-0.1, 0.4, 0.1, 0.1, 0, -0.1), V9 = c(64.8, 61.3, 5.4, 5.3, 9.4,
3.5, 0.3, 5.4, 9.2, 5.5, 0, 3.6, 0.2, -0.1, -0.4, 0.6, 0.8, 0,
0, 0.9), V10 = c(64.7, 61.3, 5.2, 5.2, 8.5, 2.6, 0.6, 5.2, 9.1,
5.5, -0.6, 3.8, 0.5, 0.2, 0.3, 0, 0.8, 0.5, -0.3, 0.4), V11 = c(64.9,
61.6, 5.2, 4.9, 8.8, 2.8, 0.3, 5.5, 7.9, 4.7, -1.2, 4.1, 1.1,
0.5, -0.1, 0.7, -0.6, 0, -0.6, 0.1), V12 = c(64.8, 61.5, 5.1,
NA, 8.3, 2, 0.5, 5.6, NA, NA, NA, NA, 0.7, 0, 0.3, 0.5, NA, NA,
NA, NA), V13 = c(65, 61.8, 5, NA, 7.7, 2.1, 0.3, 5.2, NA, NA,
NA, NA, 0.2, 0.4, -0.1, 0, NA, NA, NA, NA)), row.names = c(NA,
-20L), class = "data.frame")
LASTdate <-"2022-12-01"
NumMths <- 12
Dates <- seq.Date(as.Date("2021-12-01"),as.Date("2022-12-01"),by="month")
colls <- c("V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13")
MyTitle <- paste0("**Labour market indicators<br>",format(Dates[1],"%B %Y"),
" to ",format(Dates[13],"%B %Y"),"**")
gt_tbl1 <- gt(data=FdfT)
gt_tbl1 <- gt_tbl1 %>%
tab_options(table.font.size=12,container.width = 1450) %>%
tab_header(
title=md(html(MyTitle))
) %>%
cols_align(
align=c("left"),
columns=c(`Name`)
) %>%
fmt_number(
columns=all_of(colls),
decimals=1,
use_seps=TRUE
) %>%
cols_label(
`Name`="",
`V1`=md("**Dec<br>2021**"),
`V2`=md("**Jan<br>2022**"),
`V3`=md("**Feb<br>2022**"),
`V4`=md("**Mar<br>2022**"),
`V5`=md("**Apr<br>2022**"),
`V6`=md("**May<br>2022**"),
`V7`=md("**Jun<br>2022**"),
`V8`=md("**Jul<br>2022**"),
`V9`=md("**Aug<br>2022**"),
`V10`=md("**Sep<br>2022**"),
`V11`=md("**Oct<br>2022**"),
`V12`=md("**Nov<br>2022**"),
`V13`=md("**Dec<br>2022**")
) %>%
sub_missing(columns=everything(),rows=everything(),
missing_text="---") %>%
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = cells_title()
) %>%
tab_style( # column label style
style = list(
cell_text(weight = "bold")
),
locations = cells_column_labels(
columns=c(Name,all_of(colls)))
) %>%
tab_row_group(label="Ratio, per cent",
rows=c(1:4),id="Levels") %>%
tab_row_group(label="12-month percentage change, per cent",
rows=c(5:12),id="PC12") %>%
tab_row_group(label="1-month percentage change, per cent",
rows=c(13:20),id="PC01") %>%
opt_row_striping(row_striping = TRUE) %>%
opt_vertical_padding(scale = 0.25) %>%
tab_footnote(
footnote = paste0("Dashes mean 'data not yet available'. Source for ",
"data: Statistics Canada. #PhilSmith26. Prepared: ",Sys.time()),
locations = cells_title()
)
gt_tbl1
```
I discovered that one must update to the Quarto 1.3 pre-release. Then the following YAML code does the job.
format:
  html:
    grid:
      sidebar-width: 0px
      body-width: 2000px
      margin-width: 0px
      gutter-width: 1.5rem
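If exact pixel widths are not required, Quarto also ships a simpler page-layout: full option, which stretches the content area to the full page width and may be enough on its own (I have not tested it against the 1450 px table above):

```yaml
format:
  html:
    page-layout: full
```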

Is there a simple way to draw a graph in Christmas tree form in R

Is there a mathematical function, or some other way in R, to get a graph in the form of a Christmas tree, like this?
Thanks for your help.
Here's one of many options:
tree <- data.frame(x = c(-5, 5, 2, 4, 1.5, 3, 0, -3, -1.5, -4, -2, -5,
-0.75, 0.75, 0.75, -0.75),
y = c(1, 1, 3, 3, 5, 5, 7, 5, 5, 3, 3, 1, 0, 0, 1, 1),
part = rep(c("branches", "trunk"), times = c(12, 4)))
baubles <- data.frame(x = c(-1.9, -2.4, 0.5, -0.3, -0.2, -1.3, 0.5,
1.2, -2.2, -1, 1.7, -1.4, -1.4, 0.4, 2.1, 0.4,
-0.8, -3.3, 0.5, -2.2, -0.1, -1.5, 2, 3.9, 1.3,
-1.7, 3.7, 2.8, 1, -0.1, 3.8, -2.9, -1.9, -1.7,
-2.6, -2.3, 0.9, 1, -0.4, 1.5, 1.8, -0.5, -1.4,
-0.4, -0.5, -0.9, -1.7, 0.7, 1.6, 1.2, -0.4, 1,
0.8, 2.3, -2.5, -2, -2.9, -1.4, -1.1, 0.2),
y = c(3, 3.3, 1.2, 4.4, 5.1, 5.2, 1.1, 6, 1.5, 2.4, 1.2,
5.4, 2.2, 3.4, 3.4, 3.8, 3.1, 1.2, 4.3,
1.6, 2.4, 5.4, 4.5, 1.1, 1.3, 5, 1.5, 1.9, 1.7,
5.4, 1.3, 1.1, 2.2, 4, 1.8, 2, 4.6, 1.1, 5.9, 4.4,
2, 1.5, 2, 1.2, 5.3, 3.6, 3.5, 4.5, 5.8, 3, 2.7,
5.3, 3.1, 1.7, 1.6, 2.8, 3.6, 2.2, 2.8, 1.7),
color = sample(c("white", "yellow", "red"), 60, TRUE))
library(ggplot2)
ggplot(tree, aes(x, y)) +
geom_polygon(aes(fill = part)) +
geom_point(data = baubles, aes(color = color), size = 4) +
scale_fill_manual(values = c("green4", "brown4"), name = "Parts of tree") +
scale_color_identity(guide = guide_legend(), labels = c("red bauble",
"white bauble", "yellow bauble"), name = "Decorations") +
theme_minimal(base_size = 20)
Created on 2022-11-20 with reprex v2.0.2
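If you specifically want a single mathematical function rather than hand-placed coordinates, here is a rough sketch of my own construction (not a standard formula): a cone whose width shrinks with height, modulated by a sawtooth term to suggest tiers of branches.

```r
# tree silhouette from a function: width(h) = (7 - h) * sawtooth wobble
h <- seq(0, 7, length.out = 400)
width <- pmax(0, (7 - h) * (0.6 + 0.4 * (h %% 1)))

# trace the left edge bottom-to-top, then the right edge top-to-bottom
plot(c(-width, rev(width)), c(h, rev(h)), type = "l",
     col = "green4", asp = 1, xlab = "", ylab = "")
```

Tweaking the 0.6/0.4 mix and the sawtooth period changes how pronounced the tiers look.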

How to average 7 years of data

I'm new to R programming and to this website, so please bear with my incompetence. I pulled atmospheric data from the past 7 years for 7 variables: ozone, CO, NO, NO2, wind speed, PM 2.5 and PM 10. I am trying to graph this data to see whether the government-mandated stay-at-home orders during the COVID-19 pandemic had any effect on atmospheric composition. From the graph I'm not quite sure what to do next. I believe I need an average of the past 7 years? The data is hourly, displayed in a 24-hour format.
NO.dat data frame
dput(head(NO.dat,10))
structure(list(Date = c("3/1/2014", "3/2/2014", "3/3/2014", "3/4/2014",
"3/5/2014", "3/6/2014", "3/7/2014", "3/8/2014", "3/9/2014", "3/10/2014"
), X0.00 = c(3.6, NA, 2.3, 17.1, 0.4, 0.9, 110.9, 0.1, NA, 0.4
), X1.00 = c(6.3, NA, 1.4, 18.7, 0.2, 0.2, 15.8, 0, NA, 0.6),
X2.00 = c(2.3, 0.6, 0.4, 13.9, 0.2, 0.1, 13.5, 0, 0.8, 0.3
), X3.00 = c(0.9, 0.3, 0.9, 4.2, 0.5, 0.3, 22.7, 0.2, 0.5,
0.7), X4.00 = c(0.2, 0.1, 2.8, 5.2, 0.7, 0, 40.1, 0.1, 0.8,
2.8), X5.00 = c(0.4, 0, 4.4, 11.1, 2.4, 1.8, 22.2, 0.1, 0.8,
4.1), X6.00 = c(11.8, 0.1, 17.6, 51.8, 3.6, 8.2, 2.8, 0.3,
1, 20.1), X7.00 = c(39.5, 0.6, 30.3, 118.6, 15.7, 12, 3.7,
1, 1.9, 39.1), X8.00 = c(23.9, 0.7, 25.8, 35.6, 20.6, 11.4,
6.3, 1.5, 1.2, 33.5), X9.00 = c(8.4, 1.1, 20.8, 28.7, 5.1,
9.4, 3.7, 1.3, 0.8, 9.6), X10.00 = c(4.3, 0.5, 13.3, 17.1,
1.1, 6, 1.3, 2.4, 1.4, 2.5), X11.00 = c(3.9, 0.3, 8.3, 13.9,
0.5, 5.6, 0.9, 2.3, 1.3, 1.2), X12.00 = c(4.1, 0.6, 6.3,
12.2, 0.6, 4.3, 0.8, 1.6, 1, 1.1), X13.00 = c(2.6, 0.6, 9.1,
9, 0.6, 3.6, 0.7, 2, 1.6, 1.1), X14.00 = c(3.7, 0.5, 9.3,
1.4, 0.9, 2.3, 0.9, 1.1, 1.1, 1.3), X15.00 = c(3.4, 0.5,
9.4, 0.8, 0.8, 1.8, 1.2, 1.8, 1.2, 1.1), X16.00 = c(1, 0.3,
5.7, 0.6, 2.5, 2.3, 1.1, 2.3, 1.2, 1), X17.00 = c(0.9, 0.3,
13.4, 0.5, 3.2, 1.8, 0.7, 1.4, 0.6, 0.7), X18.00 = c(0.8,
0.2, 22.1, 0.5, 3.9, 0.7, 0.7, 0.9, 0.4, 0.5), X19.00 = c(0.5,
0.2, 24.2, 1.8, 15.4, 1.1, 0.1, 0.8, 10.1, 0.6), X20.00 = c(0.5,
1, 18.4, 17.1, 5.1, 33.4, 0.3, 0.1, 45.3, 0.5), X21.00 = c(1,
0.5, 15.4, 55.7, 2, 39.5, 4.1, 0, 49.5, 0.4), X22.00 = c(0.4,
0.2, 8.1, 52.6, 2.7, 25.2, 0.9, 0.3, 27.2, 0.5), X23.00 = c(0.4,
6, 11.9, 2.2, 2.5, 62.1, 0.2, 0.1, 3.3, 0.4)), row.names = c(NA,
10L), class = "data.frame")
NO2.dat data frame
dput(head(NO2.dat,10))
structure(list(Date = c("3/1/2014", "3/2/2014", "3/3/2014", "3/4/2014",
"3/5/2014", "3/6/2014", "3/7/2014", "3/8/2014", "3/9/2014", "3/10/2014"
), X0.00 = c(5, 0.5, 3.2, 16.3, 0.4, 2, 91.2, 0.2, 0.5, 0.2),
X1.00 = c(7, 0.4, 2.4, 18.4, 0.3, 0.6, 17.7, 0.2, 0.5, 0.1
), X2.00 = c(1.7, 0.4, 0.3, 16.3, 0.1, 0.4, 10.3, 0.3, 0.6,
0.2), X3.00 = c(0.8, 0.6, 0.7, 4.4, 0.8, 0.6, 8.5, 0.4, 0.5,
0.6), X4.00 = c(0.6, 0.2, 2.6, 4.4, 1, 0.6, 43.7, 0.3, 0.7,
2.6), X5.00 = c(0.6, 0.3, 5, 12.8, 2.7, 2.8, 15.7, 0.4, 0.7,
4.3), X6.00 = c(5.8, 0.4, 18.6, 60.5, 3.8, 9.5, 3, 0.6, 0.9,
22.1), X7.00 = c(32, 0.7, 27.4, 117.5, 15.3, 12.6, 4.4, 1.7,
2.2, 36.2), X8.00 = c(21.3, 1, 22.7, 37.1, 20.3, 12.5, 7.6,
2.1, 1.4, 33.2), X9.00 = c(7.9, 1.4, 19.4, 28.7, 5, 10.5,
4.8, 2.3, 0.9, 11), X10.00 = c(4.2, 0.6, 12.4, 19, 1.6, 8.1,
1.9, 3.1, 1.8, 2.9), X11.00 = c(4.2, 0.8, 9.6, 15.7, 1.1,
7.4, 1.6, 3.4, 1.8, 1), X12.00 = c(4.2, 0.9, 6.6, 14.2, 1.2,
6.1, 1.4, 2.7, 1.3, 1.2), X13.00 = c(NA, 0.8, 9.4, NA, 1.4,
3.9, 1.2, NA, 1.9, 1.2), X14.00 = c(NA, 0.9, 9.6, NA, 1.9,
3.1, 1.3, NA, 1.3, 1.3), X15.00 = c(NA, 0.9, 9.6, NA, 1.7,
2.9, 1.9, NA, 1.6, 1), X16.00 = c(1, 0.8, 6.4, 1.2, 3.8,
3, 1.8, 3, 1.2, 1.1), X17.00 = c(1.2, 0.7, 12, 1, 4, 1.5,
1.5, 2, 0.5, 0.6), X18.00 = c(0.9, 0.5, 20.2, 0.9, 5, 1,
1.3, 1.5, 0.3, 0.3), X19.00 = c(0.5, 0.5, 19.1, 2.1, 15.8,
1.1, 0.6, 1.3, 5.1, 0.4), X20.00 = c(0.4, 1.1, 17.5, 7, 4.2,
24.9, 0.5, 0.7, 32.1, 0.4), X21.00 = c(0.7, 0.7, 13.3, 28.4,
2.4, 31.7, 3.4, 0.7, 37, 0.3), X22.00 = c(0.4, 0.4, 7.3,
21, 2.9, 18.5, 1.2, 0.6, 20, 0.3), X23.00 = c(0.4, 5.8, 11.6,
0.8, 2.9, 47.8, 0.5, 0.6, 2.1, 0.2)), row.names = c(NA, 10L
), class = "data.frame")
Any help would be much appreciated!
library(reshape2)
library(dplyr)
library(lubridate)
library(ggplot2)
#remove summary stats
NO.dat <- NO.dat[,1:25]
NO2.dat <- NO2.dat[,1:25]
#reorganize data using reshape
x<-melt(NO.dat, id="Date")
colnames(x) <- c("Date","Hour","NO")
x$Hour <- as.numeric(x$Hour)-1
y<-melt(NO2.dat, id="Date")
colnames(y) <- c("Date","Hour","NO2")
y$Hour<- as.numeric(y$Hour)-1
x <- cbind(x,y$NO2)
colnames(x)[4] <- "NO2"
x$min <- ":00"
x$time <- paste(x$Hour, x$min, sep="")
x$DT <- paste(x$Date, x$time)
x %>% select(DT, NO, NO2) %>% mutate(NOx=NO + NO2) %>% mutate(DT =
mdy_hm(DT)) %>% arrange(DT) -> x
p <- ggplot(x, aes(x=DT, y=NOx)) + geom_line() + xlab("")
x$index <- 1:nrow(x)
loessMod10 <- loess(NOx ~ index, data=x, span=0.10)
x <- na.omit(x)
x$smoothed10 <- predict(loessMod10)
#pdf("El Paso NOx.pdf",w=6,h=3,useDingbats= FALSE)
p <- ggplot(x) + geom_line(aes(x=DT, y=NOx), linetype = "dashed",
size=0.3) + xlab("") +
geom_line(aes(x=DT, y=smoothed10), color = "red") + labs(y="NOx
(ppbv)") +
ggtitle("NOx concentrations at Chamizal TCEQ Site") +
theme(plot.title = element_text(hjust = 0.5)) +
annotate("text", x=as.POSIXct("2020-03-24 17:00:00"), y=130,
label="Stay Home 1", angle=90, size=2.5) +
annotate("text", x=as.POSIXct("2020-04-01 17:00:00"), y=130,
label="Stay Home 2", angle=90, size=2.5) +
annotate("segment", x = as.POSIXct("2020-03-24 17:00:00"), xend=
as.POSIXct("2020-03-24 17:00:00"), y = 0, yend = 105, colour =
"blue") +
annotate("segment", x = as.POSIXct("2020-04-01 17:00:00"), xend=
as.POSIXct("2020-04-01 17:00:00"), y = 0, yend = 105, colour =
"blue")
Graph that I think needs to be averaged??
Perhaps you can add a year column and use this to group your data by year. You could then overlay the data by year.
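To make the grouping idea concrete, here is a minimal sketch on toy data (the names DT and NOx follow the question's code; everything else is illustrative): add a year column with lubridate::year(), then average within each year.

```r
library(dplyr)
library(lubridate)

# toy hourly NOx readings spanning two years
toy <- data.frame(
  DT  = as.POSIXct(c("2019-03-01 00:00", "2019-03-01 01:00",
                     "2020-03-01 00:00", "2020-03-01 01:00"), tz = "UTC"),
  NOx = c(4, 6, 2, 4)
)

yearly <- toy %>%
  mutate(year = year(DT)) %>%          # extract the year from the timestamp
  group_by(year) %>%
  summarise(mean_NOx = mean(NOx, na.rm = TRUE), .groups = "drop")

yearly  # 2019 -> 5, 2020 -> 3
```

The same year column also works as a colour or group aesthetic in ggplot2 if you prefer to overlay one line per year.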
For fun, I had a go at reproducing the plots avoiding reshape.
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
# stack raw data for NO and NO2
NO_stacked <- NO.dat %>%
pivot_longer(cols = starts_with("X"),
names_to = "hours",
values_to = "NO")
NO2_stacked <- NO2.dat %>%
pivot_longer(cols = starts_with("X"),
names_to = "hours",
values_to = "NO2")
# combine into one data frame
data <- bind_cols(NO_stacked, NO2_stacked) %>%
select(Date, hours, NO, NO2)
# coerce dates to POSIXct and add hours; remove hours; reshape to long format using pivot_longer
data <- data %>%
mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y", tz = "UTC"),
hours = as.numeric(str_sub(hours, start = 2, end = -1))) %>%
mutate(Date = Date + 60*60*hours) %>%
select(-hours) %>%
pivot_longer(cols = contains("NO"),
names_to = "Contaminant",
values_to = "Concentration")
# plot
ggplot(data = data, aes(x = Date, y = Concentration)) +
geom_line() +
geom_smooth(method = "loess", formula = y ~ x) +
facet_wrap(vars(Contaminant), nrow = 2) +
ggtitle("NOx concentrations at Chamizal TCEQ Site")

Make table/matrix of probability densities and associated breaks

The following code takes vector V1 and bootstraps 10,000 randomized normal samples based on V1, storing the results in a matrix with 10,000 columns. It then creates a histogram of that matrix.
V1 <- c(0.18, 0.2, 0.24, 0.35, -0.22, -0.17, 0.28, -0.28, -0.14, 0.03, 0.87, -0.2, 0.06, -0.1, -0.72, 0.18, 0.01, 0.31, -0.36, 0.61, -0.16, -0.07, -0.13, 0.01, -0.09, 0.26, -0.14, 0.08, -0.62, -0.2, 0.3, -0.21, -0.11, 0.05, 0.06, -0.28, -0.27, 0.17, 0.42, -0.05, -0.15, 0.05, -0.07, -0.22, -0.34, 0.16, 0.34, 0.1, -0.12, 0.24, 0.45, 0.37, 0.61, 0.9, -0.25, 0.02)
xxx <- round(sapply(1:10000, function(i) rnorm(length(V1), mean=sample(V1, length(V1), replace=TRUE), sd(V1))), 2)
h <- hist(xxx, plot=T)
I would like to create a printout of its probability density function in matrix or table format, i.e. get a matrix with 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0, -0.1, -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.8, -0.9, -1.0 as breaks, and the associated probability densities in the next column.
I have two problems. First, specifying the breaks I want fails. Second, making a matrix from h$breaks and h$density also fails. Any insights would be much appreciated. Thank you.
#This works, but I want to specify the breaks
h <- hist(xxx, plot=T, breaks=20)
#This gives error "some 'x' not counted; maybe 'breaks' do not span range of 'x'"
h <- hist(xxx, plot=T, breaks=c(1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0, -0,1 -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.8, -0.9, -1.0))
#This gives error number of rows of result is not a multiple of vector length
ddd<- cbind(h$breaks, h$density)
First, you have a typo: you confused "." and "," in your breaks vector.
Second, the breaks must span the whole range of the observations, like this:
h <- hist(xxx, plot=F, breaks=c(max(xxx),1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0, -0.1, -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.8, -0.9, -1.0,min(xxx)))
Finally, you want to pick "mids" instead of "breaks":
ddd<- cbind(h$mids, h$density)
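A toy illustration of why mids pairs with density (standard-normal data, nothing from the question): breaks stores bin edges, so it has one more element than the number of bins, while mids and density each have exactly one entry per bin, so cbind() lines them up cleanly.

```r
set.seed(1)
z <- rnorm(100)                                # toy data, well inside (-4, 4)
h <- hist(z, breaks = seq(-4, 4, 0.5), plot = FALSE)

length(h$breaks)   # 17 bin edges
length(h$mids)     # 16 bin centres
length(h$density)  # 16 densities, one per bin

ddd <- cbind(midpoint = h$mids, density = h$density)  # a 16 x 2 matrix
```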
