How to calculate the average slope within a moving window in R - r

My dataset contains two variables, y and t [0.05 s]; y was measured every 0.05 seconds.
I am trying to calculate the average slope within a moving 0.20-second window: after the first 0.20-second slope is calculated, the window moves forward one time unit (0.05 seconds) and the next 0.20-second slope is calculated, producing successive 0.20-second slope values at 0.05-second increments.
I thought that a rolling regression with rollapply (zoo package) would do the trick, but I get the same intercept and slope values for every window. What can I do?
My data:
dput(DataExample)
structure(list(t = c(0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35,
0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95,
1, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3, 1.35, 1.4, 1.45, 1.5, 1.55,
1.6, 1.65, 1.7, 1.75, 1.8, 1.85, 1.9, 1.95, 2, 2.05, 2.1, 2.15,
2.2, 2.25, 2.3, 2.35, 2.4, 2.45, 2.5, 2.55, 2.6, 2.65, 2.7, 2.75,
2.8, 2.85, 2.9, 2.95, 3, 3.05, 3.1, 3.15, 3.2, 3.25, 3.3, 3.35,
3.4, 3.45, 3.5, 3.55, 3.6, 3.65, 3.7, 3.75, 3.8, 3.85, 3.9, 3.95,
4, 4.05, 4.1, 4.15, 4.2, 4.25, 4.3, 4.35, 4.4, 4.45, 4.5, 4.55,
4.6, 4.65, 4.7, 4.75, 4.8, 4.85, 4.9, 4.95, 5, 5.05, 5.1, 5.15,
5.2, 5.25, 5.3, 5.35, 5.4, 5.45, 5.5, 5.55, 5.6, 5.65, 5.7, 5.75,
5.8, 5.85, 5.9, 5.95, 6, 6.05, 6.1, 6.15, 6.2, 6.25, 6.3, 6.35,
6.4, 6.45, 6.5, 6.55, 6.6, 6.65, 6.7, 6.75, 6.8, 6.85, 6.9, 6.95,
7, 7.05, 7.1, 7.15, 7.2, 7.25, 7.3, 7.35, 7.4, 7.45, 7.5, 7.55,
7.6, 7.65, 7.7, 7.75, 7.8, 7.85, 7.9, 7.95, 8, 8.05, 8.1, 8.15,
8.2, 8.25, 8.3, 8.35, 8.4, 8.45, 8.5, 8.55, 8.6, 8.65, 8.7, 8.75,
8.8, 8.85, 8.9, 8.95, 9, 9.05, 9.1, 9.15, 9.2, 9.25, 9.3, 9.35,
9.4, 9.45, 9.5, 9.55, 9.6, 9.65, 9.7, 9.75, 9.8, 9.85, 9.9, 9.95,
10, 10.05, 10.1, 10.15, 10.2, 10.25, 10.3), y = c(3.05, 3.04,
3.02, 3.05, 3.01, 3.02, 3.02, 3.05, 3.02, 3.01, 3.04, 3.04, 3.03,
3.03, 3.03, 3.02, 3.02, 3.03, 3.03, 3.03, 3.04, 3.03, 3.03, 3.03,
3.03, 3.02, 3.02, 3.02, 3.01, 3.03, 3.03, 3.03, 3.03, 3.03, 3.02,
3.01, 3.02, 3.02, 3.01, 3.02, 3.02, 3.02, 3.03, 3.02, 3.02, 3.01,
3.01, 3.02, 3.01, 3.02, 3.02, 3.02, 3.02, 3.01, 3.01, 3.01, 3.01,
3.02, 3, 3.01, 3.02, 3.02, 3.02, 3.01, 3.01, 3.01, 3.01, 3.02,
3, 3.01, 3.01, 3.01, 3.01, 3.01, 3.01, 3, 3, 3.01, 3, 3, 3.01,
3.01, 3.01, 3.01, 3, 3, 3, 3.01, 3, 3, 3.01, 3.01, 3.01, 3.01,
3.01, 3.01, 3, 3.02, 3, 3.01, 3.02, 3.04, 3.05, 3.08, 3.04, 3.06,
3.08, 3.06, 3.08, 3.09, 3.04, 3.05, 3.07, 3.08, 3.06, 3.08, 3.08,
3.07, 3.08, 3.08, 3.05, 3.06, 3.07, 3.07, 3.06, 3.08, 3.08, 3.08,
3.08, 3.08, 3.05, 3.06, 3.08, 3.08, 3.06, 3.09, 3.07, 3.08, 3.08,
3.08, 3.06, 3.07, 3.07, 3.07, 3.06, 3.09, 3.07, 3.07, 3.08, 3.08,
3.06, 3.07, 3.07, 3.07, 3.06, 3.09, 3.07, 3.07, 3.07, 3.08, 3.07,
3.07, 3.07, 3.07, 3.06, 3.08, 3.07, 3.07, 3.06, 3.08, 3.07, 3.07,
3.07, 3.07, 3.06, 3.08, 3.07, 3.07, 3.06, 3.08, 3.06, 3.07, 3.06,
3.07, 3.06, 3.08, 3.07, 3.07, 3.06, 3.07, 3.06, 3.07, 3.06, 3.07,
3.06, 3.07, 3.06, 3.06, 3.06, 3.07, 3.04, 3.04, 3.04, 3.06, 3.06,
3.04, 3.04)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-207L), .Names = c("t", "y"))
R-Code:
require(zoo)
library("zoo", lib.loc="~/R/win-library/3.3")
rollapply(zoo(DataExample),
width=5,
FUN = function(Z)
{
z = lm(formula=y~t, data = as.data.frame(DataExample));
return(z$coef)
}, by=1,
by.column=FALSE, align="right")

The comment seems to have been deleted, but it pointed out that the function passed to rollapply in the question does not use its argument Z; it refits the model to the full DataExample for every window. With that fixed and some other minor improvements, the following returns the intercept and the slope in columns 1 and 2, respectively.
library(zoo)
Coef <- function(Z) coef(lm(y ~ t, as.data.frame(Z)))
rollapplyr(zoo(DataExample), 5, Coef, by.column = FALSE)
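To make the effect of the fix described above concrete, here is a minimal base-R sketch on synthetic data. The data frame dat and the helpers bad and good are hypothetical names used only for illustration; the windows are built with a plain sapply loop rather than rollapply so that the example has no package dependencies.

```r
# Synthetic stand-in for DataExample (assumption): y = t^2 sampled every 0.05 s
dat <- data.frame(t = seq(0, 1, by = 0.05))
dat$y <- dat$t^2

width  <- 5                                  # 5 samples span 0.20 s
starts <- seq_len(nrow(dat) - width + 1)     # first row of each window

# Buggy pattern: the window argument Z is ignored and the full data set is refitted
bad  <- function(Z) coef(lm(y ~ t, data = dat))[["t"]]
# Fixed pattern: the model is fitted to the window that was passed in
good <- function(Z) coef(lm(y ~ t, data = Z))[["t"]]

slopes_bad  <- sapply(starts, function(i) bad(dat[i:(i + width - 1), ]))
slopes_good <- sapply(starts, function(i) good(dat[i:(i + width - 1), ]))

length(unique(round(slopes_bad, 10)))   # a single value repeated for every window
head(round(slopes_good, 3))             # the slope changes from window to window
```

With the buggy version every window reports the slope of the whole series, which matches the symptom in the question.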

Here is complete code to illustrate what I meant about the speed of .lm.fit versus lm,
as well as an example of use with data.table.
library(zoo)
library(data.table)
library(ggplot2)
theme_set(theme_bw())
library(microbenchmark)
# fit a linear regression of the window against its index and return the slope
rollingSlope.lm <- function(vector) {
a <- coef(lm(vector ~ seq_along(vector)))[2]
return(a)
}
# same slope via the low-level .lm.fit, which skips the formula and model-frame overhead
rollingSlope.lm.fit <- function(vector) {
a <- coef(.lm.fit(cbind(1, seq_along(vector)), vector))[2]
return(a)
}
# create data example
test <- data.table(x = seq(100), y = dnorm(seq(100), mean=75, sd=30))
ggplot(test, aes(x, y))+ geom_point()
# rolling slopes over a window of 5 points
test[, ':=' (Slope.lm.fit = rollapply(y, width=5, FUN=rollingSlope.lm.fit, fill=NA),
Slope.lm = rollapply(y, width=5, FUN=rollingSlope.lm, fill=NA))]
# same calculation with a window of 50 points
test[, ':=' (Slope.lm.fit.50 = rollapply(y, width=50, FUN=rollingSlope.lm.fit, fill=NA),
Slope.lm.50 = rollapply(y, width=50, FUN=rollingSlope.lm, fill=NA))]
# melt data for plotting
test2 <- melt.data.table(test, measure.vars=c("Slope.lm.fit", "Slope.lm", "Slope.lm.fit.50", "Slope.lm.50"))
ggplot(test2, aes(x, value))+ geom_point(aes(color=variable))
# efficiency of the 2 lm
mb <- microbenchmark(lm.fit = a <- rollapply(test$y, 5, rollingSlope.lm.fit, fill=NA),
lm = b <- rollapply(test$y, 5, rollingSlope.lm, fill=NA))
# check if they equal
all.equal(a, b, check.attributes=FALSE)
# TRUE
# plot results
boxplot(mb, unit="ms", notch=TRUE)
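As a side note to the benchmark, the slope of a simple linear regression also has a closed form, cov(x, y) / var(x), which avoids fitting a model at all. The sketch below is an assumption-laden illustration, not part of the original answer: slope_per_sample is a hypothetical helper, and dt = 0.05 is the sampling interval assumed from the question. Because the functions above regress against the index 1..n, their slopes are per sample; dividing by dt converts them to slopes per second.

```r
# Closed-form OLS slope of v against its index 1..n (slope per sample)
slope_per_sample <- function(v) {
  x <- seq_along(v)
  cov(x, v) / var(x)
}

set.seed(1)
v   <- cumsum(rnorm(5))                        # hypothetical window of 5 values
ref <- unname(coef(lm(v ~ seq_along(v)))[2])   # reference slope from lm()

all.equal(slope_per_sample(v), ref)            # same estimate as lm()

dt <- 0.05                                     # assumed sampling interval in seconds
slope_per_sample(v) / dt                       # slope per second
```

This function can be passed as FUN to rollapply in the same way as the two functions above.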

This is how I would go about doing it without the zoo library:
## Modified version of your function that does not rely on accessing
## variables that are external to its environment.
slopes<-function(data) {
z = lm(formula=y~t, data=data );
z$coef ## Implicit return of last variable
}
## The number of frames to take the windowed slope of
windowsize<-4
do.call(rbind,lapply(seq(dim(data)[1]-windowsize),
function(x) slopes(data[x:(x+windowsize),])))
It iterates over the indices from 1 to nrow(data) - windowsize, subsetting data into overlapping windows of windowsize + 1 rows (5 rows here, since x:(x+windowsize) includes both ends). Each subset is passed to the slopes function, and the results are bound into a single matrix.

I tried to plot the slopes with geom_segment() but failed. At least I got a data frame with a different slope value for each window:
slope <- function(dat){
return(data.frame(t = sprintf("[%f,%f]", min(dat$t), max(dat$t)),
slope = lm(y~t-1, data = dat)$coef,
row.names = NULL)
)
}
mw <- function(dtf, wdth = 0.2, incr = 0.05){
if(!nrow(dtf)){
return(data.frame())
}
return(rbind(slope(dtf[dtf$t <= min(dtf$t) + wdth,]),
mw(dtf[dtf$t >= min(dtf$t) + incr,])
)
)
}
slp <- mw(dtf)
head(slp)
tail(slp)
# t slope
# 1 [0.000000,0.200000] 20.180000
# 2 [0.050000,0.250000] 16.498182
# 3 [0.100000,0.300000] 13.433333
# 4 [0.200000,0.400000] 9.554737
# 5 [0.250000,0.450000] 8.299608
# 6 [0.300000,0.500000] 7.340606
# ...
#175 [9.900000,10.100000] 0.3049778
#176 [10.000000,10.200000] 0.3017733
#177 [10.050000,10.250000] 0.3002829
#178 [10.150000,10.300000] 0.2982748
#179 [10.250000,10.300000] 0.2958620
#180 [10.300000,10.300000] 0.2951456
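Note that the formula y ~ t - 1 in the code above drops the intercept, so each line is forced through the origin; with y close to 3 and small t, this produces large slopes such as the 20.18 in the first row. The following is a minimal sketch of the same idea with an intercept, using row indices instead of comparisons on t (which can skip windows because of floating-point rounding). The data frame dtf here is a synthetic stand-in and window_slopes is a hypothetical helper name.

```r
# Hypothetical stand-in for the question's data: t every 0.05 s, y = 3 + 0.5 * t
dtf <- data.frame(t = seq(0, 1, by = 0.05))
dtf$y <- 3 + 0.5 * dtf$t

# Slope of y ~ t (with intercept) in windows of n_pts rows, advancing one row at a time
window_slopes <- function(dtf, n_pts = 5) {
  starts <- seq_len(nrow(dtf) - n_pts + 1)
  do.call(rbind, lapply(starts, function(i) {
    w <- dtf[i:(i + n_pts - 1), ]
    data.frame(t_start = min(w$t),
               t_end   = max(w$t),
               slope   = unname(coef(lm(y ~ t, data = w))["t"]))
  }))
}

slp <- window_slopes(dtf)
head(slp, 3)

# For comparison: the through-origin fit on the first window
origin_slope <- unname(coef(lm(y ~ t - 1, data = dtf[1:5, ])))
origin_slope
```

On this synthetic line the windowed slopes recover the true value of 0.5, while the through-origin fit on the first window is far larger.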

Related

How to model quantiles regression curves for probabilities depending on a predictor in R?

I'd like to model the 25th, 50th and 75th quantile regression curves (q25, q50, q75) for 241 probability values ('prob') as a function of x0.
For that purpose, I used qgamV (mgcViz package) as follows. However, this approach produced some q25, q50 and q75 values below 0 or above 1, which is not expected for probabilities.
Graphically, one would expect the q25 and q75 regression curves to approach the 'prob' limits 0 and 1 more tangentially (see below).
How can these quantile curves best be modelled, given that they represent probabilities?
Thanks for your help.
Initial dataframe (df0):
df0 <- structure(list(x0 = c(2.65, 3.1, 2.15, 2.45, 2.9, 1.55, 2.05,
2.75, 2, 2.45, 4.05, 1.95, 3.35, 2.15, 2.5, 1.75, 1.6, 2.3, 3.35,
3.55, 2.1, 3.15, 2.5, 1.05, 2.3, 2.3, 2.95, 0.8, 1.75, 2.95,
2.55, 1.65, 2.4, 2.8, 2.2, 3.45, 2.15, 2.9, 1.7, 2.7, 2.05, 2.75,
2.35, 3.75, 2.2, 1.1, 2.35, 2.5, 3.05, 1, 4.4, 1.3, 2.2, 2.5,
1.35, 1.95, 1.95, 5.45, 2, 1.65, 2.7, 2, 1.5, 1.05, 4.15, 2.15,
1.9, 1.85, 4.2, 2.2, 3.35, 1.55, 1.95, 2.3, 1.9, 3.45, 2.2, 3.55,
1.4, 2.5, 2.35, 2.5, 2.4, 3.35, 2, 2.6, 3.05, 2.75, 1.6, 1.65,
2.45, 1.55, 1.65, 2.25, 0.9, 2.4, 2.2, 2, 1.65, 1.35, 1.95, 2.5,
1.6, 1.25, 3.8, 2.25, 2.85, 1.45, 2.4, 2.8, 3.75, 3.05, 1.8,
1.25, 1.55, 2, 2.55, 2.75, 3.55, 2.2, 2.1, 3.55, 3.65, 2.3, 1.25,
2.45, 2.2, 1.95, 1.65, 0.7, 2, 1.5, 2.8, 3.4, 3.95, 2.55, 2.45,
2.65, 1.75, 1.7, 2.5, 2.05, 2.75, 2.05, 3, 2.25, 3.6, 2.35, 3.25,
1.6, 3.3, 2.05, 1.95, 2.15, 2.3, 4.1, 2.45, 1.6, 2.3, 0.6, 2.35,
2.45, 1.9, 2.5, 1.35, 3.2, 2.25, 1.65, 2.75, 1.8, 3, 0.95, 2.7,
2.15, 3.75, 2.5, 1.95, 2.7, 3.75, 2.4, 2.4, 3.05, 1.8, 3.6, 2.05,
2.75, 2.15, 1.35, 3.15, 2.25, 3.1, 2, 2.35, 3.3, 2.05, 0.75,
2.55, 2.2, 3.15, 3.1, 1.75, 3.2, 3.15, 2.8, 2.5, 1.8, 2.2, 1.85,
3.35, 1.35, 2.75, 1.85, 2.8, 2.65, 3.15, 1.15, 2.5, 3.75, 2.75,
4.55, 2.3, 2.65, 3.1, 3.65, 0.8, 2.45, 3.25, 3.65, 3.75, 1.75,
2.55, 1.15, 2.05, 2.05, 3.5, 0.75, 2.55, 2.2, 2.1, 2.15, 2.75
), prob = c(0.043824528975438, 0.0743831343145038, 0.0444802301649798,
0.0184204002808217, 0.012747152819121, 0.109320069103749, 0.868637913750677,
0.389605665620339, 0.846536935687218, 0.104932383728924, 0.000796924809569913,
0.844673988202945, 0.00120791067227541, 0.91751061807481, 0.0140582427585067,
0.61360854266884, 0.55603090737844, 0.0121424615930165, 0.000392412410090414,
0.00731972612592678, 0.450730636411052, 0.0111896050578429, 0.0552971757296455,
0.949825608148576, 0.00216318997302124, 0.620876890784462, 0.00434032271743834,
0.809464444601336, 0.890796570916792, 0.0070834616944228, 0.0563350845256127,
0.913156468748195, 0.00605085671490011, 0.00585882020388307,
0.0139577135093548, 0.0151356267602558, 0.00357231467872644,
0.000268107682417655, 0.047883018897558, 0.137688264298974, 0.846219411361109,
0.455395192661041, 0.440089914302649, 0.312776912863294, 0.721283899836456,
0.945808616162847, 0.160122538485323, 0.274966581834218, 0.223500907500226,
0.957169102670141, 3.29173412975754e-05, 0.920710197397359, 0.752055893010363,
0.204573327883464, 0.824869881489217, 0.0336636091577387, 0.834235793851965,
0.00377210373002217, 0.611370672834389, 0.876156793482752, 0.04563653558985,
0.742493995255321, 0.42035122692417, 0.916359628728296, 0.182755925347698,
0.139504394672643, 0.415836463269909, 0.0143112277191436, 0.00611022961831899,
0.794529254262237, 0.000295836911230635, 0.88504245090271, 0.0320097205131667,
0.386424550101868, 0.724747784339428, 0.0374198694261709, 0.772894216412908,
0.243626917726206, 0.884082536765856, 0.649357153222083, 0.651665475576256,
0.248153637183556, 0.621116026311962, 0.254679380328883, 0.815492354289526,
0.00384382735772974, 0.00098493832845314, 0.0289740210412282,
0.919537164719931, 0.029914235716672, 0.791051705450356, 0.535062926433525,
0.930153425256182, 0.739648381556949, 0.962078822556967, 0.717404075711021,
0.00426200695619151, 0.0688025266083751, 0.30592683399928, 0.76857384388609,
0.817428136470741, 0.0101583095649087, 0.190150584186769, 0.949353043876038,
0.000942385744019884, 0.00752842476126574, 0.451811230189468,
0.878142444707428, 0.085390660867941, 0.705492062082986, 0.00776625091631656,
0.120499683875168, 0.871558791341612, 0.204175216963286, 0.88865934672351,
0.735067195665991, 0.111767657566763, 0.0718305257427526, 0.001998068594943,
0.726375812318976, 0.628064249939129, 0.0163105011142307, 0.585565544471761,
0.225632568540361, 0.914834452659588, 0.755043268549628, 0.44993311080756,
0.876058522964169, 0.876909380258345, 0.935545943209396, 0.856566304797687,
0.891579321327903, 0.67586664661773, 0.305274362445618, 0.0416387565225755,
0.244843991055886, 0.651782914419153, 0.615583040148267, 0.0164959661557421,
0.545479687527543, 0.0254178939123714, 0.00480000384583597, 0.0256296636591875,
0.776444262284288, 0.00686736233661002, 0.738267311816833, 0.00284628668554737,
0.0240371572079387, 0.00549270830047392, 0.91880163437759, 0.336534358175717,
0.276841848679916, 0.718008645244615, 0.0897424253787563, 0.0719730540202573,
0.00215797941000608, 0.0219160132143199, 0.797680147185277, 0.66612383359622,
0.946965411044528, 0.133399527090937, 0.343056247984854, 0.202570454449074,
0.00349712323805031, 0.919979740593237, 0.577123238372546, 0.759418264563034,
0.904569159000302, 0.0179587619909363, 0.785657258439329, 0.235867625712547,
0.959688292861383, 0.668060191654474, 0.0014774986557077, 0.00831528722028647,
0.669655207261098, 0.157824457113222, 0.110637023939517, 0.262525772704882,
0.112654002253028, 0.22606090266161, 0.157513622503487, 0.25688454756606,
0.00201570863346944, 0.70318409224183, 0.25568985167711, 0.810637054896326,
0.92708070974999, 0.608664352336801, 0.707490903842404, 0.00094520948858089,
0.106177223644193, 0.582785205597368, 0.0585327568963445, 0.377814739935042,
0.972447647118833, 0.0111118791692372, 0.58947840090326, 0.0111189166236961,
0.00317374095338712, 0.0664218007312096, 0.00227258301798719,
0.00198861129291917, 0.337443337988163, 0.750708293355867, 0.837530172974158,
0.627428065068903, 0.744110974625108, 0.00320417425932798, 0.871800026765784,
0.613647987816266, 0.808457030433619, 0.00486495461698562, 0.597950577021363,
0.000885253981642748, 0.0800527366346806, 0.00951706823839207,
0.125222576598629, 0.346018567766834, 0.0376933970313487, 0.157903106929268,
0.0371982251307384, 0.00407175432189843, 0.0946588147179984,
0.967274516618573, 0.169109953293894, 0.00124072042059317, 0.00259042255361196,
0.000400511359506596, 0.841289470209085, 0.807106898740506, 0.926962245924993,
0.814160745645036, 0.662558468801531, 0.000288068688170646, 0.698932091902567,
0.00242011818508616, 0.645573844423654, 0.517121859568318, 0.0931231998319089,
0.000877774529895907)), row.names = c(NA, -241L), class = "data.frame")
Quantile regressions and plot:
library(mgcViz)
library(qgam)
library(ggplot2)
# Quantile regressions
q50 <- qgamV(prob ~ s(x0, bs="cr", k=10), data = df0, qu = 0.5)
q25 <- qgamV(prob ~ s(x0, bs="cr", k=10), data = df0, qu = 0.25)
q75 <- qgamV(prob ~ s(x0, bs="cr", k=10), data = df0, qu = 0.75)
# New dataframe including fitted quantile values
df1 <- df0
df1$q50 <- q50[["fitted.values"]]
df1$q25 <- q25[["fitted.values"]]
df1$q75 <- q75[["fitted.values"]]
# Plot
x_brk <- seq(0, 6, 1); x_lab <- seq(0, 6, 1)
y_brk <- seq(0, 1, 0.1); y_lab <- seq(0, 1, 0.1)
ggplot(df1, aes(x = x0, y = prob))+
scale_x_continuous(limits=c(0, 20), expand=c(0, 0), breaks=x_brk, labels=x_lab)+
scale_y_continuous(limits=c(-1, 2),expand=c(0, 0), breaks=y_brk, labels=y_lab)+
geom_vline(xintercept=x_brk, colour="grey25", size=0.2)+
geom_hline(yintercept=y_brk, colour="grey50", size=0.2)+
geom_hline(yintercept=0.5, linetype="solid", color = "black", size=0.2)+
geom_point(data = df1, aes(x = x0, y = prob), colour = "grey50", size=0.75, inherit.aes = TRUE)+
xlab(~paste("x0"))+
ylab(~paste("Prob"))+
theme(plot.title = element_blank())+
theme(plot.margin=unit(c(0.2,0.5,0.01,0.3),"cm"))+
theme(axis.text.x=element_text(colour="black", size=9.5, margin=margin(b=10),vjust=-1))+
theme(axis.text.y=element_text(colour="black", size=9.5,hjust=0.5))+
theme(axis.title.x=element_text(colour="black", size=11.5, margin=margin(b=2), vjust=1))+
theme(axis.title.y=element_text(colour="black", size=11.5, margin=margin(b=2), vjust=4))+
theme(panel.background=element_rect(fill="white"), panel.border = element_rect(colour = "black", fill=NA))+
geom_line(aes(x=x0, y = q50), data=df1, colour="black",size=0.8, inherit.aes = TRUE)+
geom_line(aes(x=x0, y = q25), data=df1, colour="black",size=0.6, linetype = "longdash")+
geom_line(aes(x=x0, y = q75), data=df1, colour="black",size=0.6, linetype = "longdash")+
coord_cartesian(xlim = c(0, 6), ylim = c(0, 1))
Continuation of the solution proposed by user2974951:
Given the non-normal distribution of prob, I think it is better to use qgam rather than quantreg, taking inspiration from user2974951's solution.
The difference between these two quantile regression approaches is very slight for the example x0, but much more obvious with another predictor, x1:
[Figure: example x0]
[Figure: example x1]
You can use the logit transform and then fit a regular quantile regression:
library(quantreg)
df0 <- df0[order(df0$x0), ] # ordering just for easier visualization
df0$probL <- log(df0$prob/(1 - df0$prob))
t <- c(0.25, 0.5, 0.75)
mod <- lapply(t, function(x){rq(probL ~ x0, data=df0, tau=x)})
names(mod) <- paste0("Q_", t)
pre <- as.data.frame(do.call(cbind, lapply(mod, function(x){1/(1 + exp(-predict(x)))})))
plot(prob ~ x0, data=df0)
lines(pre$Q_0.25 ~ df0$x0, col="red")
lines(pre$Q_0.5 ~ df0$x0, col="green")
lines(pre$Q_0.75 ~ df0$x0, col="red")
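The reason this works can be sketched in base R: the logit maps (0, 1) onto the whole real line, its inverse maps any fitted value back into (0, 1), and quantiles commute with monotone transforms. The vectors below are hypothetical values chosen for illustration; qlogis and plogis are the logit and inverse logit from the stats package. The transform requires prob to lie strictly between 0 and 1.

```r
p <- c(0.001, 0.02, 0.3, 0.5, 0.85, 0.999)   # hypothetical probabilities
z <- qlogis(p)                               # logit: log(p / (1 - p))
all.equal(plogis(z), p)                      # the inverse logit recovers p

# Any fitted value on the logit scale maps back into (0, 1)
back <- plogis(c(-10, -2, 0, 3, 10))
range(back)

# Quantiles commute with monotone transforms (order-statistic quantiles, type = 1)
q_logit <- quantile(z, probs = 0.25, type = 1)
q_prob  <- quantile(p, probs = 0.25, type = 1)
all.equal(unname(plogis(q_logit)), unname(q_prob))
```

The same back-transformation could in principle be applied to fitted values from any quantile regression on the logit scale, which keeps the curves inside the probability range.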

How to apply complete.cases per variable in gtsummary for a paired t-test, instead of applying it to the full data frame?

I am trying to run a paired t-test on my data for a pre-post analysis, and I am using the gtsummary package to create the table. Because I have missing data, I filter the data frame with complete.cases(.), but since that filters across all columns I am losing a lot of data. Instead, I want complete.cases() applied only to the variable being tested each time; e.g. when the test is run for variable1, only variable1 should be checked for missing values. Can someone please help me accomplish this? The following is the code I am using now.
trial_paired <-
df %>% filter(OSAclass == 'OSA') %>% select(c('time1', 'CPAP','Cholesterol', 'Triglyceride','HDL_chol','LDL_chol'))%>%
group_by(time1) %>%
mutate(id = row_number()) %>%
ungroup()
t2 <-
trial_paired %>%
# delete missing values
filter(complete.cases(.)) %>%
# keep IDs with both measurements
group_by(id) %>%
filter(n() == 2) %>%
ungroup() %>%
# summarize data
tbl_summary(by = time1 , include = -id, type = all_continuous() ~ "continuous2", statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}", "{mean} ({sd})")) %>%
add_p(test = list(all_continuous() ~ "paired.t.test",
all_categorical() ~ "mcnemar.test"),
group = id)
structure(list(time1 = c("first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second", "first", "second", "first",
"second", "first", "second", "first", "second", "first", "second",
"first", "second", "first", "second"), CPAP = c(1, 1, 1, 1, 0,
0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
1, 0, 1, 1, 1, 1, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0), Cholesterol = c(4.83, 4.83, 4.81, 4.81, 4.48, 4.48,
4.25, 4.25, 4.93, 4.93, 5.57, 5.57, 5.52, 5.52, 5.47, 5.47, 4.61,
4.61, 5.4, 5.4, 5.31, 5.31, 4.89, 4.89, 6.62, 6.62, 5.15, 5.15,
4.7, 4.7, 4.62, 4.62, 4.66, 4.66, 5.17, 5.17, 4.78, 4.78, 8.82,
8.82, 4.28, 4.28, 4.9, 4.9, 2.9, 2.9, 5.92, 5.92, 5.39, 5.39,
4.92, 4.92, 3.75, 3.75, 3.87, 3.87, 6.1, 6.1, 6.05, 6.05, 5.18,
5.18, 4.57, 4.57, 5.42, 5.42, 6.08, 6.08, 5.48, 5.48, 4.78, 4.78,
3.89, 3.89, 4.62, 4.62, 4.6, 4.6, 6.02, 6.02, 3.67, 3.67, 6.06,
6.06, 6.12, 6.12, 4.84, 4.84, 5.86, 5.86, 5.9, 5.9, 6.27, 6.27,
3.87, 3.87, 7.4, 7.4, 5.55, 5.55, 4.45, 4.45, 5.26, 5.26, 4.62,
4.62, 7.17, 7.17, 5.35, 5.35, 5.99, 5.99, 5.94, 5.94, 4.38, 4.38,
5.2, 5.2, 4.68, 4.68, 3.29, 3.29, 4.85, 4.85, 4.83, 4.83, 5.21,
5.21, 6.61, 6.61, 6.33, 6.33, 5.59, 5.59, 7.14, 7.14, 4.8, 4.8,
4.22, 4.22, 5.45, 5.45, 4.87, 4.87, 5.89, 5.89, 5.1, 5.1, 4.18,
4.18, 5.58, 5.58, 6.41, 6.41, 4.26, 4.26, 4.88, 4.88, 4.3, 4.3,
6.51, 6.51, 5.19, 5.19, 6, 6, 4.39, 4.39, 6, 6, 4.73, 4.73, 6.23,
6.23, 4.51, 4.51), Triglyceride = c(4.62, 4.62, 1.16, 1.16, 2.29,
2.29, 2.41, 2.41, 2.88, 2.88, 2.89, 2.89, 5.22, 5.22, 2.3, 2.3,
0.95, 0.95, 2.21, 2.21, 2.54, 2.54, 1.98, 1.98, 3.4, 3.4, 1.77,
1.77, 1.95, 1.95, 3.53, 3.53, 1.17, 1.17, 1.04, 1.04, 2.53, 2.53,
2.69, 2.69, 0.71, 0.71, 1.32, 1.32, 0.82, 0.82, 2.75, 2.75, 1.76,
1.76, 3.59, 3.59, 2.38, 2.38, 1.87, 1.87, 2.06, 2.06, 15.53,
15.53, 1.66, 1.66, 1.57, 1.57, 1.23, 1.23, 1.99, 1.99, 1.98,
1.98, 2, 2, 1.52, 1.52, 0.92, 0.92, 1.49, 1.49, 3.4, 3.4, 1.39,
1.39, 1.06, 1.06, 3.37, 3.37, 0.9, 0.9, 1.49, 1.49, 1.8, 1.8,
1.45, 1.45, 1.44, 1.44, 3.9, 3.9, 0.95, 0.95, 0.89, 0.89, 0.74,
0.74, 2.42, 2.42, 3.99, 3.99, 1.32, 1.32, 2.27, 2.27, 2.09, 2.09,
1.53, 1.53, 2.02, 2.02, 2.38, 2.38, 1.06, 1.06, 1.71, 1.71, 1.16,
1.16, 1.41, 1.41, 2.9, 2.9, 1.17, 1.17, 1.41, 1.41, 2.84, 2.84,
2.94, 2.94, 0.67, 0.67, 1.83, 1.83, 2.33, 2.33, 2.82, 2.82, 1.47,
1.47, 0.82, 0.82, 2.96, 2.96, 2.84, 2.84, 2.04, 2.04, 3.14, 3.14,
1.44, 1.44, 2.14, 2.14, 0.85, 0.85, 2.39, 2.39, 1.1, 1.1, 1.52,
1.52, 1.41, 1.41, 2.64, 2.64, 1.06, 1.06), HDL_chol = c(0.81,
0.81, 0.86, 0.86, 1.3, 1.3, 0.99, 0.99, 1.06, 1.06, 1.31, 1.31,
1.01, 1.01, 1.02, 1.02, 1.38, 1.38, 1.31, 1.31, 1.63, 1.63, 1.63,
1.63, 1.27, 1.27, 1.28, 1.28, 0.99, 0.99, 0.94, 0.94, 1.14, 1.14,
2.14, 2.14, 1.74, 1.74, 1.19, 1.19, 1.03, 1.03, 1.19, 1.19, 1.75,
1.75, 0.93, 0.93, 1.85, 1.85, 0.88, 0.88, 1.02, 1.02, 1.05, 1.05,
1.1, 1.1, 0.38, 0.38, 0.95, 0.95, 1.15, 1.15, 1.38, 1.38, 1.34,
1.34, 0.86, 0.86, 1.02, 1.02, 1.19, 1.19, 1.89, 1.89, 1.22, 1.22,
1.37, 1.37, 0.92, 0.92, 1.33, 1.33, 1.44, 1.44, 1.28, 1.28, 1.28,
1.28, 1.18, 1.18, 1.32, 1.32, 1.98, 1.98, 1.23, 1.23, 1.93, 1.93,
0.76, 0.76, 1.72, 1.72, 1.24, 1.24, 1.13, 1.13, 1.88, 1.88, 1.27,
1.27, 1.34, 1.34, 1.28, 1.28, 0.9, 0.9, 1.07, 1.07, 1.25, 1.25,
1.41, 1.41, 1.59, 1.59, 1.35, 1.35, 1.47, 1.47, 1.41, 1.41, 2.37,
2.37, 1.17, 1.17, 1.35, 1.35, 1.02, 1.02, 1.32, 1.32, 0.86, 0.86,
1.62, 1.62, 1.11, 1.11, 1.17, 1.17, 1, 1, 1.28, 1.28, 1.16, 1.16,
0.93, 0.93, 1.13, 1.13, 1.24, 1.24, 1.76, 1.76, 0.89, 0.89, 1.55,
1.55, 1.76, 1.76, 1.34, 1.34, 1.86, 1.86, 1.29, 1.29), LDL_chol = c(2.49,
2.49, 3.58, 3.58, 2.7, 2.7, 2.42, 2.42, 3.25, 3.25, 3.58, 3.58,
3.15, 3.15, 3.78, 3.78, 3.06, 3.06, 3.56, 3.56, 2.97, 2.97, 2.74,
2.74, 4.72, 4.72, 3.34, 3.34, 3.17, 3.17, 2.87, 2.87, 3.09, 3.09,
2.87, 2.87, 2.56, 2.56, 7.19, 7.19, 2.87, 2.87, 3.28, 3.28, 1.2,
1.2, 4.2, 4.2, 3.22, 3.22, 3.1, 3.1, 2.27, 2.27, 2.43, 2.43,
4.49, 4.49, 1.52, 1.52, 3.67, 3.67, 2.97, 2.97, 3.67, 3.67, 4.3,
4.3, 3.96, 3.96, 3.2, 3.2, 2.41, 2.41, 2.64, 2.64, 3.03, 3.03,
3.82, 3.82, 2.28, 2.28, 4, 4, 3.91, 3.91, 3.27, 3.27, 4.07, 4.07,
4.11, 4.11, 4.47, 4.47, 2.39, 2.39, 5.23, 5.23, 3.43, 3.43, 3.13,
3.13, 3.13, 3.13, 2.55, 2.55, 4.99, 4.99, 3.16, 3.16, 4.05, 4.05,
4.15, 4.15, 2.6, 2.6, 3.54, 3.54, 2.74, 2.74, 1.59, 1.59, 2.79,
2.79, 2.77, 2.77, 3.32, 3.32, 4.3, 4.3, 4.56, 4.56, 2.87, 2.87,
5.29, 5.29, 2.7, 2.7, 2.85, 2.85, 3.55, 3.55, 3.26, 3.26, 3.4,
3.4, 3.49, 3.49, 2.59, 2.59, 3.74, 3.74, 4.24, 4.24, 2.73, 2.73,
2.98, 2.98, 2.87, 2.87, 4.89, 4.89, 3.38, 3.38, 4.35, 4.35, 2.51,
2.51, 4.16, 4.16, 2.99, 2.99, 3.92, 3.92, 2.77, 2.77), ANGPTL8 = c(3337.5,
3962.5, 2737.5, 962.5, 1775, 3737.5, 1025, 962.5, 1175, 912.5,
1662.5, 2075, 2862.5, 1950, 2337.5, 1875, 350, 14412.5, 962.5,
787.5, 1650, 2150, 3250, 1150, 1425, 1162.5, 975, 762.5, 5562.5,
2662.5, 1450, 787.5, 387.5, 475, 1037.5, 1125, 1462.5, 1750,
1137.5, 800, 812.5, 1637.5, 750, 4850, 1112.5, 1187.5, 662.5,
462.5, 4125, 1825, 1275, 750, 6275, 1062.5, 737.5, 3650, 1650,
1425, 2925, 1512.5, 1100, 887.5, 662.5, 825, 487.5, 662.5, 400,
600, 1077.77777777778, 1211.11111111111, 555.555555555556, 511.111111111111,
1066.66666666667, 1311.11111111111, 277.777777777778, 1822.22222222222,
1000, 1055.55555555556, 1255.55555555556, 1000, 1555.55555555556,
1266.66666666667, 1233.33333333333, 1422.22222222222, 1655.55555555556,
800, 555.555555555556, 677.777777777778, 411.111111111111, 344.444444444445,
766.666666666667, 800, 333.333333333333, 1011.11111111111, 455.555555555555,
955.555555555556, 833.333333333333, 777.777777777778, 844.444444444444,
866.666666666667, 755.555555555556, 1011.11111111111, 722.222222222222,
888.888888888889, 255.555555555556, 244.444444444445, 1433.33333333333,
1033.33333333333, 488.888888888889, 477.777777777778, 1600, 1022.22222222222,
1077.77777777778, 988.888888888889, 622.222222222222, 2500, 2077.77777777778,
688.888888888889, 788.888888888889, 1155.55555555556, 1288.88888888889,
1633.33333333333, 1744.44444444445, 2011.11111111111, 366.666666666667,
466.666666666667, 522.222222222222, 1222.22222222222, 477.777777777778,
788.888888888889, 994.444444444445, 1383.33333333333, 2183.33333333333,
661.111111111111, 2350, 1772.22222222222, 672.222222222222, 1183.33333333333,
494.444444444445, 883.333333333333, 416.666666666667, 338.888888888889,
2005.55555555555, 594.444444444444, NA, 305.555555555555, 961.111111111111,
1138.88888888889, 616.666666666667, 583.333333333333, 1405.55555555556,
705.555555555555, 1605.55555555556, 1594.44444444445, 1094.44444444444,
1272.22222222222, 3127.77777777778, 961.111111111111, 750, 661.111111111111,
916.666666666667, 572.222222222222, 1150, 1094.44444444444, 683.333333333333,
827.777777777778, 972.222222222222, 238.888888888889, NA, 327.777777777778,
850, 750, 672.222222222222, 827.777777777778, 983.333333333333,
1038.88888888889), BMP_2 = c(23, 26.92, 25.62, 26.27, 25.62,
26.92, 24.97, 26.92, 25.62, 28.2, NA, 26.92, 22.34, 23, 26.92,
24.32, 24.32, 25.62, 24.32, 25.62, 24.32, 23, 25.62, 28.2, 25.62,
24.32, 23, 26.92, 25.62, 28.2, 24.32, 26.92, 18.95, 23, 23, 25.62,
23, 24.32, 24.32, 23, 25.62, 25.62, 21.67, 26.92, 24.32, 25.62,
21.67, 23, 23, 26.92, 28.2, 24.32, 28.2, 28.2, 26.92, 26.92,
25.62, 25.62, 24.32, 24.32, 24.32, 24.32, 25.62, 23, 17.57, 20.32,
30.61, 27.33, 20.94, 26.16, 23.68, 26.16, 26.16, 28.46, 23.68,
26.16, 20.94, 32.65, 26.16, 28.46, 28.46, 30.61, 26.16, 32.65,
23.68, 28.46, 23.68, 28.46, 19.43, 22.35, 26.16, 28.46, 23.68,
28.46, 26.16, 30.61, 26.16, 28.46, 23.68, 23.68, 28.46, 30.61,
30.61, 30.61, 26.16, 28.46, 20.94, 26.16, 23.68, 30.61, 26.16,
28.46, 20.94, 23.68, 31.64, 26.16, 23.68, 30.61, 23.68, 28.46,
26.16, 30.61, 20.94, 26.16, 14.02, 26.16, 20.94, 23.68, 30.61,
34.58, 23.39, 26.67, 19.74, 19.74, 3, 15.48, 15.48, 23.39, 17.71,
15.48, 15.48, 19.74, 3, 10, NA, 23.39, 19.74, 26.67, 19.74, 19.74,
19.74, 23.39, 17.71, 23.39, 23.39, 26.67, 3, 3, 3, 23.39, 19.74,
19.74, 19.74, 29.69, 33.85, 23.39, 10, 10, 15.48, 23.39, 10,
19.74, 15.48, 15.48, 19.74, 19.74), IGFBP_3_1 = c(441353.12,
NA, 393869.87, NA, NA, NA, 579939.36, NA, 456112.02, NA, NA,
610080.87, NA, NA, 533744.22, 628064.64, 523351.47, NA, 517877.29,
NA, 486315.82, NA, NA, 542659.7, 508437.67, 589967.34, 536282.89,
512564.26, 436271.69, 601179.52, 504448.47, 506264.97, 420330.98,
NA, 538394.66, NA, NA, NA, NA, NA, 495111.88, 549340.97, 672083.18,
NA, 591978.44, NA, NA, 571958.24, 507324.12, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 475288.45, NA, 536037.9, 548109.89,
559995.14, NA, 473616.64, 542571.78, 465343.85, 1127900, 714496.84,
NA, 646959.05, 4856100, 443062.73, 542179.38, 579299.18, 1142900,
564875.53, 1037100, 1174200, NA, 548298.03, 874608.37, 902414.03,
1471500, NA, NA, 1668200, NA, 3153500, 1527000, 534397.71, 556715.71,
1016800, 703025.17, NA, NA, 161911.33, 126486.58, 682462.8, NA,
1365000, NA, 977538.37, NA, 3348600, NA, 1022700, 783787.11,
NA, NA, 859094.87, NA, 1056900, 953743.93, 363547.86, 422392.66,
796697.33, 804929.76, 686250.79, 859712.77, 726741.92, 2091000,
568594.78, 644119.63, 1139000, NA, 802047.77, NA, 1256800, 1442100,
1058500, 974033.9, 967920.77, 981304.96, 1107000, 1197400, 1019800,
1346600, 1135800, 1261900, 1203600, 1352600, NA, 1335400, 1100400,
1398300, 924378.25, 1194500, 1384400, 1186500, 1360700, 1222800,
843925.82, 1232900, 1600800, 1489200, 1133700, 1451700, 1182700,
1445100, 1732100, 1528500, 1321900, 1313500, 1101500, 1422500,
1344700, 1460200, 1224900, 1225100, 1167800, 1155800, 1149200,
1278700)), row.names = c(NA, -176L), class = c("tbl_df", "tbl",
"data.frame"))
You can use !is.na(variable) to drop rows with NA values for one specific variable only.
library(dplyr)
library(gtsummary)
t2 <-
trial_paired %>%
# delete missing values in variable1
filter(!is.na(variable1)) %>%
# keep IDs with both measurements
group_by(id) %>%
filter(n() == 2) %>%
ungroup() %>%
# summarize data
tbl_summary(by = time1 , include = -id, type = all_continuous() ~ "continuous2", statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}", "{mean} ({sd})")) %>%
add_p(test = list(all_continuous() ~ "paired.t.test",
all_categorical() ~ "mcnemar.test"),
group = id)
To do this dynamically we can create a function.
summary_data <- function(data, var) {
data %>%
# delete missing values
filter(!is.na(.data[[var]])) %>%
# keep IDs with both measurements
group_by(id) %>%
filter(n() == 2) %>%
ungroup() %>%
# summarize data
tbl_summary(by = time1 , include = -id, type = all_continuous() ~ "continuous2", statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}", "{mean} ({sd})")) %>%
add_p(test = list(all_continuous() ~ "paired.t.test",
all_categorical() ~ "mcnemar.test"),
group = id)
}
#apply function to single column
summary_data(trial_paired, 'Cholesterol')
summary_data(trial_paired, 'Triglyceride')
#apply function to multiple columns
cols <- c('Cholesterol', 'Triglyceride', 'HDL_chol')
#Or drop only the first column
#cols <- names(trial_paired)[-1]
res <- lapply(cols, summary_data, data = trial_paired)
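To see why filtering per variable keeps more pairs than complete.cases() on the whole data frame, here is a small base-R sketch. The data frame d, its columns a and b, and the helper complete_pairs are hypothetical and only mirror the structure of trial_paired (an id, a time1 column and measurements with scattered NAs).

```r
# Hypothetical long-format data: 4 subjects measured at two time points
d <- data.frame(
  id    = rep(1:4, each = 2),
  time1 = rep(c("first", "second"), times = 4),
  a     = c(5.1, 4.8, NA, 5.0, 6.2, 6.0, 4.4, 4.1),
  b     = c(1.2, 1.1, 2.0, 1.7, 3.1, NA, 0.9, 1.0)
)

# Keep only ids whose two rows are both non-missing for ONE variable
complete_pairs <- function(data, var) {
  ok   <- !is.na(data[[var]])
  keep <- as.logical(ave(ok, data$id, FUN = all))
  data[keep, c("id", "time1", var)]
}

pairs_a   <- nrow(complete_pairs(d, "a")) / 2             # pairs usable for a
pairs_b   <- nrow(complete_pairs(d, "b")) / 2             # pairs usable for b
pairs_all <- sum(tapply(complete.cases(d), d$id, all))    # pairs left when filtering on all columns
c(a = pairs_a, b = pairs_b, all_columns = pairs_all)

# Paired t-test on the pairs that are complete for a
pa <- complete_pairs(d, "a")
t.test(pa$a[pa$time1 == "first"], pa$a[pa$time1 == "second"], paired = TRUE)
```

Each variable loses only the subjects that are missing for that variable, whereas filtering on all columns discards a subject as soon as any column has an NA.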

cbind() in R gives a "mts" "ts" "matrix" object instead of a "mts" "ts" object

I have two ts objects.
structure(c(5.92, 5.97, 5.92, 6.04, 6.32, 6.43, 6.48, 6.04, 6.2,
6.09, 5.29, 5.05, 5.13, 5, 4.81, 4.86, 5.42, 5.22, 5.19, 5.06,
4.95, 4.88, 4.93, 5.03, 4.99, 4.97, 5.1, 4.89, 4.74, 4.56, 4.43,
4.35, 4.22, 4.3, 4.71, 4.76, 4.95, 4.84, 4.84, 4.64, 4.51, 4.54,
4.27, 4.11, 4.07, 3.99, 3.96, 3.92, 3.89, 3.95, 3.91, 3.8, 3.67,
3.55, 3.6, 3.5, 3.38, 3.35, 3.34, 3.41, 3.53, 3.56, 3.45, 3.54,
4.07, 4.37, 4.46, 4.49, 4.19, 4.26, 4.46, 4.43, 4.3, 4.34, 4.34,
4.19, 4.16, 4.13, 4.12, 4.16, 4.04, 4, 3.86, 3.67, 3.71, 3.77,
3.67, 3.84, 3.98, 4.05, 3.91, 3.89, 3.8, 3.94, 3.96, 3.87, 3.66,
3.69, 3.6, 3.6, 3.57, 3.44, 3.44, 3.46, 3.47, 3.77, 4.2, 4.15,
4.17, 4.2, 4.04, 4.01, 3.9, 3.97, 3.88, 3.8, 3.9, 3.92, 3.95,
4.03, 4.33, 4.44, 4.47, 4.59, 4.57, 4.53, 4.55, 4.63, 4.83, 4.87,
4.64, 4.46, 4.37, 4.26, 4.14, 4.07, 3.8, 3.77, 3.62, 3.6, 3.69,
3.7, 3.72, 3.62, 3.47, 3.45, 3.31, 3.23, 3.16, 3.02, 2.94), .Tsp = c(2008.08333333333,
2020.58333333333, 12), class = "ts")
and
structure(c(250000, 246000, 249000, 250000, 255000, 258500, 255000,
245000, 235000, 225000, 226000, 215000, 205000, 215000, 215000,
218000, 225000, 220300, 220000, 211000, 202000, 200000, 200000,
196000, 187000, 200000, 201000, 207500, 220000, 210000, 216600,
200000, 199700, 190000, 190750, 180000, 178000, 171500, 185000,
184000, 195000, 200000, 195000, 175000, 170000, 163000, 160000,
153000, 150000, 165000, 175000, 180500, 198000, 195000, 186000,
177000, 162000, 166000, 165000, 153000, 149000, 164000, 179000,
192000, 213750, 213000, 208500, 197500, 185400, 182000, 180000,
175000, 167000, 183000, 192500, 207500, 225000, 220000, 222500,
206000, 192500, 190000, 190000, 182000, 179000, 199000, 212000,
220000, 228500, 224900, 218000, 210000, 203000, 198000, 199000,
190000, 185995, 206250, 223000, 227750, 232000, 230000, 223000,
210000, 210000, 207500, 208000, 205500, 204000, 222500, 235000,
235000, 245000, 240000, 229700, 222000, 218000, 215000, 219875,
215000, 217500, 230000, 239000, 240000, 249900, 243000, 233000,
229000, 222000, 223000, 220000, 216000, 222000, 232500, 240000,
246000, 249500, 249000, 242000, 235000, 231000, 231250, 230000,
223000, 225000, 248000, 260000, 251000, 254000, 262000, 270000
), .Tsp = c(2008.08333333333, 2020.58333333333, 12), class = "ts")
Both are imported as ts objects.
Median_price_ts <- ts(chicago$Median_price, start = c(2008, 2), end = c(2020, 8), frequency = 12)
Average_rate_ts <- ts(mortgage_data_monthly$Average_rate, start = c(2008, 2), end = c(2020, 8), frequency = 12)
When I use cbind() to combine them, I get the following:
class(Median_price_ts)
class(Average_rate_ts)
foo <- cbind(Median_price_ts, Average_rate_ts)
class(foo)
I get:
[1] "ts"
[1] "ts"
[1] "mts" "ts" "matrix"
and instead I just want
"mts" "ts"
Why is this important? I am building a VAR model, and apparently my data must have class "mts" "ts" only for forecasting.
Fitting the VAR itself works fine with a data frame and with "mts" "ts" "matrix".
This tutorial uses cbind() and gets only "mts" "ts". I have also worked with other such datasets before and was able to forecast with just "mts" "ts", but not with "mts" "ts" "matrix".
I'd appreciate any tips!
You can force the class on foo:
class(foo) <- c('mts', 'ts')
class(foo)
#[1] "mts" "ts"
foo
# Median_price_ts Average_rate_ts
#Feb 2008 5.92 250000
#Mar 2008 5.97 246000
#Apr 2008 5.92 249000
#May 2008 6.04 250000
#Jun 2008 6.32 255000
#Jul 2008 6.43 258500
#Aug 2008 6.48 255000
#Sep 2008 6.04 245000
#Oct 2008 6.20 235000
#...
#...
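A small self-contained sketch of the same idea, using two made-up monthly series (the names a and b and their values are illustrative, not the question's data). It shows that overwriting the class vector changes only the class attribute; the dimensions, column names and time-series attributes are left alone. Whether the forecasting step then accepts the object depends on the package, which is not checked here.

```r
# Two toy monthly series starting Feb 2008 (illustrative values only)
a <- ts(c(5.92, 5.97, 5.92, 6.04), start = c(2008, 2), frequency = 12)
b <- ts(c(250000, 246000, 249000, 250000), start = c(2008, 2), frequency = 12)

foo <- cbind(a, b)
class(foo)        # may include "matrix" (and "array"), depending on the R version

# Overwrite the class vector; dim, dimnames and tsp are not touched
class(foo) <- c("mts", "ts")

class(foo)
dim(foo)
frequency(foo)
inherits(foo, "mts")
```

If you would rather not hard-code the class vector, setdiff(class(foo), c("matrix", "array")) should give the same result, assuming the remaining classes are exactly "mts" and "ts".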

ggpairs error: Error in cor.test.default(x, y, method = method, use = use) : not enough finite observations

I am trying to create a scatterplot matrix using package GGally and ggpairs. In my dataset tol, I have several demographic variables that are categorical, and several that are continuous. I created a data frame with the variables I wanted and tried to omit NA values because I keep getting this error:
Error in cor.test.default(x, y, method = method, use = use) : not enough finite observations
When I don't include the aesthetic mapping, the scatterplot works just fine. Even when I mess with my csv file to make sure there are no empty cells, I still get this error.
Here is the code:
cs <- tol[c("location","comp_sat_avg","burnout_avg","sec_stress_avg","burnout_ee_avg","burnout_dp_avg","burnout_pa_avg","obs_avg","desc_avg","aware_avg","nonjudg_avg","nonreac_avg","wkplre_wc_avg","Efficacy_avg","Lotr_avg","hsecontrol_avg","hsemsupport_avg","hsepsupport_avg","hserole_avg","hsedemands_avg")]
csdata <- na.omit(cs)
ggpairs(csdata,lower=list(continuous="smooth"),mapping=ggplot2::aes(color= location)) +
theme_bw()
I have three other categorical variables I need to group by separately so any help is extremely appreciated.
Per stefan's comment, here is a sample of my dataset:
tol <- structure(list(location = c("Mukono Health Center IV", "Mukono Health Center IV",
"Goma Health Center III", "Goma Health Center III", "Goma Health Center III",
"Kawolo General Hospital", "Kawolo General Hospital", "Mukono Health Center IV",
"Mukono Health Center IV", "Lwanyonyi VHT", "Mukono Health Center IV",
"Goma Health Center III", "Mukono Health Center IV", "Mukono Health Center IV",
"Goma Health Center III", "Mukono Health Center IV", "Mukono Health Center IV",
"Mukono Health Center IV", "Mukono Health Center IV", "Lwanyonyi VHT"
), comp_sat_avg = c(4.6, 4.9, 4.4, 4.2, 3.7, 4.2, 3, 4.3, 3.8,
4.4, 2.8, 3.9, 4.7, 4.4, 3.22, 4.6, 1.8, 4.67, 3, 4.8), burnout_avg = c(2.2,
3.2, 2.1, 2.7, 3.4, 2.1, 3.11, 2.4, 2.6, 2.5, 2.89, 2, 1.8, 1.8,
2.78, 2.6, 3.5, 2.7, 2.56, 2.1), sec_stress_avg = c(2.6, 1.4,
2.44, 3.1, 3.5, 2.8, 3.1, 2.4, 3.1, 3.33, 2.56, 1.8, 2.8, 1.9,
3.1, 2.8, 1.5, 3.8, 3.9, 2.6), burnout_ee_avg = c(2.11, 2.33,
2.78, 2.67, 4.67, 1.22, 1, 3.33, 1.78, 4.33, 3.33, 1.78, 2.78,
1.11, 1.67, 2.89, 5.89, 1.78, 3, 0.78), burnout_dp_avg = c(1.6,
0.4, 1.2, 2.4, 1.8, 0.75, 1.2, 2.8, 0.6, 2.4, 4.2, 2.4, 1.2,
0.6, 3.8, 3.2, 5.6, 1, 1.6, 0.4), burnout_pa_avg = c(5.13, 5.75,
4.75, 2.88, 5.25, 4.67, 5.75, 5, 5.5, 5.25, 4.88, 4.5, 3.75,
4.13, 3.13, 4, 4, 3, 4.88, 5.88), obs_avg = c(3.63, 3.25, 2,
4.38, 2.88, 4, 3.75, 2.38, 2.13, 2.75, 4.63, 3.88, 3, 2.14, 3.83,
3.5, 2.25, 2.63, 4.13, 3.88), desc_avg = c(3, 3.38, 4.5, 3.88,
3.38, 3.13, 3.63, 2.63, 3.75, 4.25, 3.5, 4.38, 2.57, 3.63, 3.25,
3.63, 3.13, 4.13, 4.25, 3.38), aware_avg = c(2.5, 4.25, 4.63,
4.25, 4.13, 3.5, 4.13, 3.25, 3.25, 4.75, 4.13, 4.75, 3.5, 3.88,
2.13, 4.13, 3.5, 4.13, 3.57, 3.25), nonjudg_avg = c(1.88, 3.63,
4.38, 1.88, 2.63, 3.25, 3, 3, 3.25, 4, 2, 3, 3, 4.88, 1.86, 2.88,
3.25, 2.5, 2.38, 1.63), nonreac_avg = c(3.71, 3.57, 2.43, 4.29,
3, 3.43, 3.86, 3.86, 2.86, 4.29, 3.86, 3, 3, 3.14, 4.43, 3.43,
2.8, 3.71, 3.57, 3.43), wkplre_wc_avg = c(5.07, 6.13, 5.8, 5.27,
4.33, 6.2, 4.07, 7, 6.27, 2.29, 5.14, 4.4, 4.73, 5.47, 5.07,
4.93, 3.07, 5.6, 5.73, 4.8), Efficacy_avg = c(4, 1.4, 3.6, 3.1,
3.1, 2.9, 3.6, 2, 2.5, 3.3, 3.7, 3.6, 1.9, 3.7, 3.5, 3.6, 3.2,
3.6, 3.5, 3.9), Lotr_avg = c(2.17, 2.33, 3.6, 0.5, 2.67, 1.67,
3.2, 2.17, 2.5, 3.67, 2.33, 3.67, 1.17, 1.83, 2, 2.67, 1.83,
2.67, 2.83, 3.5), hsecontrol_avg = c(3.67, 4.5, 3.5, 3.5, 3.17,
3.83, 4.5, 4.33, 3.83, 3.83, 3.67, 4.67, 4.5, 3.67, 3.83, 3.17,
3, 4.17, 3.83, 3.17), hsemsupport_avg = c(3.6, 4, 3.2, 3.6, 3.2,
4.2, 3.6, 4, 3.8, 3.6, 3, 4.2, 3.4, 4.2, 3.8, 3.2, 2.4, 4, 4,
3.8), hsepsupport_avg = c(3.25, 4, 3.75, 3.5, 3, 4.75, 4.25,
4.75, 3.75, 3.5, 4.67, 4.25, 3.75, 4, 4, 3.25, 1.5, 4, 4, 4),
hserole_avg = c(4.8, 5, 4.4, 4.2, 5, 4, 4, 4.2, 4, 4.6, 4.6,
4.8, 4.2, 4.2, 3.2, 4.4, 2.8, 4, 4.2, 5), hsedemands_avg = c(2,
3.29, 3.29, 4, 1.86, 3.57, 3.29, 1.71, 3.14, 1.71, 3.71,
3.71, 3.43, 3.86, 1.86, 2.71, 4, 3.29, 3.57, 2.57)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`1` = 1L,
`5` = 5L, `11` = 11L, `15` = 15L, `19` = 19L, `24` = 24L, `27` = 27L,
`30` = 30L, `46` = 46L, `47` = 47L), class = "omit"))
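You need to take two steps to make this work. Two locations (Kawolo General Hospital and Lwanyonyi VHT) have only two observations each, which is too few for cor.test.default: with the default Pearson method it needs at least three complete pairs per group. First, subset your data to remove those observations: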
library(dplyr)
# drop the two locations that have only two observations each
csdata <-
csdata %>%
filter(
location != "Kawolo General Hospital"
, location != "Lwanyonyi VHT"
)
However, if location is stored as a factor, the filtered dataset still retains those two levels with 0 observations each. Converting the variable location with factor() drops the unused levels:
csdata$location <- factor(csdata$location)
Now your ggpairs call with the aesthetic mapping will run without the error:
ggpairs(csdata,lower=list(continuous="smooth"),mapping=ggplot2::aes(color= location)) +
theme_bw()
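Since you have three other categorical variables to group by, it may help to wrap the two steps in a small function instead of naming the sparse levels by hand. The following is only a sketch in base R: drop_small_groups is a hypothetical helper (not part of GGally or dplyr), the toy data frame is made up, and the threshold of 3 assumes the default Pearson correlation.

```r
# Hypothetical helper: drop groups with fewer than `min_n` rows and
# rebuild the grouping column as a factor without empty levels.
drop_small_groups <- function(data, group, min_n = 3) {
  counts <- table(data[[group]])
  keep <- names(counts)[counts >= min_n]
  out <- data[data[[group]] %in% keep, , drop = FALSE]
  out[[group]] <- factor(out[[group]])
  out
}

# Made-up example data: group B has only two rows
toy <- data.frame(
  location = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  score = c(4.6, 4.9, 4.4, 4.2, 3.7, 4.2, 3.0, 4.3, 3.8)
)

table(toy$location)
filtered <- drop_small_groups(toy, "location")
table(filtered$location)     # only A and C remain
```

You would then pass the filtered data to ggpairs once per grouping variable, e.g. drop_small_groups(csdata, "location"). Note that rows with missing values inside a group can still push a pair of columns below the threshold, so run this after na.omit().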

How do I plot a linear regression line in a specified bin in a histogram?

We are trying to determine speciation rate as a function of animal weight. Animal weight follows a Gaussian distribution when all animals are plotted together; hence, we only want to fit the regression line to the decreasing part of the histogram. Specifically, the line should start from x = 2.1 and y = 3.0. Fig. 1 is my current plot using the code below, while Fig. 2 is the outcome I would like to get (line superimposed in Paint), which I don't know how to produce. Any help on the matter will be greatly appreciated.
Attached is my code:
x.log = c(-2.9, -2.7, -2.5, -2.3, -2.1, -1.9, -1.7, -1.5, -1.3, -1.1,
-0.9,-0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.5, 0.7, 0.9, 1.1,
1.3, 1.5, 1.7, 1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7,
3.9, 4.1, 4.3, 4.5, 4.7, 4.9, 5.1, 5.3, 5.5, 5.7, 5.9, 6.1,
6.3, 6.5,6.9, 7.1, 7.3, 7.5, 7.7, 7.9)
y.log = c(0, 0, 0, 0.47, 0.60, 0.95, 1.14, 1.38, 1.68, 1.79, 2.10, 2.26,
2.29, 2.39, 2.48, 2.52, 2.79, 2.68, 2.80, 2.84, 2.96, 2.92,
2.91, 3.01, 2.95, 3.05, 2.94, 2.96, 2.98, 2.83, 2.85, 2.83,
2.71, 2.63, 2.61, 2.57, 2.37, 2.26, 2.17, 1.99, 1.87, 1.74,
1.62, 1.36, 1.30, 1.07, 1.20, 0.90, 0.30, 0.69, 0.30, 0.47, 0,
0.30, 0)
# plot the histogram
names(log.nspecies) = logbio
log.nspecies = log.nspecies[order (as.numeric(names(log.nspecies)))]
xpos = barplot(log.nspecies, las = 2, space = 0, col = 'red',
xlab = 'ln Weight', ylab = 'ln Number of species')
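One possible approach, as a rough sketch rather than a definitive answer: fit lm() only to the bins at or to the right of the starting x, then draw the fitted values at the bar midpoints that barplot() returns. The data below are a made-up hump standing in for your binned values, and start_x and sel are illustrative names. This is an ordinary least-squares fit over the selected bins, so it does not force the line through (2.1, 3.0).

```r
# Toy hump-shaped data standing in for the binned values (illustrative only)
x <- seq(-2.9, 7.9, by = 0.2)
y <- pmax(0, 3 - 0.12 * (x - 2.1)^2)

# Select the bins at or to the right of the chosen start
# (small tolerance guards against floating-point error in seq())
start_x <- 2.1
sel <- x >= start_x - 1e-8
fit <- lm(y ~ x, subset = sel)

# barplot() returns the bar midpoints; use them as x positions for the line,
# because the plot's x axis is in bar units, not in units of x
xpos <- barplot(y, names.arg = round(x, 1), space = 0, las = 2, col = "red",
                xlab = "ln Weight", ylab = "ln Number of species")
lines(xpos[sel], fitted(fit), lwd = 2)
```

With your own data you would replace x and y by the bin positions and heights, assuming they are sorted by x and have the same length.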
