Removing outliers 3 SDs from the mean of a monoexponential function in R

I have a large data set that analyzes exercising subjects' oxygen consumption over time (x= Time, y = VO2). This data fits a monoexponential function.
Here is a brief, sample data frame:
'''
VO2 <- c(11.71,9.84,17.96,18.87,14.58,13.38,5.89,20.28,20.03,31.17,22.07,30.29,29.08,32.89,29.01,29.21,32.42,25.47,30.51,37.86,23.48,40.27,36.25,19.34,36.53,35.19,22.45,30.23,3,19.48,25.35,22.74)
Time <- c(0,2,27,29,31,33,39,77,80,94,99,131,133,134,135,149,167,170,177,178,183,184,192,222,239,241,244,245,250,251,255,256)
DF <- data.frame(VO2,Time)
'''
Visual representation of the data -- note that this sample data set is much smaller than the full data set (and therefore might not fit the function as well).
I am somewhat new to R and very much not a mathematical expert. I would appreciate your help with the two goals of this data set.
Based on typical conventions of the laboratory I work in, this data should be fit to a monoexponential function.
I would love some insight into fitting data to a function such as this. Note that I have many similar data sets (for different subjects) and need to fit a monoexponential function to each of them. It would be best if the fit could be applied generically across my data sets.
Based on this monoexponential function, I would like to identify and remove any outlying points. Here I will define an outlier as any point more than 3 standard deviations from the fitted monoexponential curve.
So far, I have this (unsuccessful) code to fit a function to the above data. Not only does it not fit well, but I am also unable to create a smooth function.
'''
fit <- lm(VO2 ~ poly(Time, 2, raw = TRUE), data = DF)
xx <- seq(0, 256, length.out = 200)  # dense grid so the fitted line plots smoothly
plot(Time, VO2, pch = 19, ylim = c(0, 50))
lines(xx, predict(fit, newdata = data.frame(Time = xx)), col = "red")
'''
Thank you to all the individuals who have commented and provided their valuable feedback. As I continue to learn and research, I will add to this post with successful/less successful attempts at the code for this process. Thank you for your knowledge, assistance and understanding.
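One approach worth exploring (a sketch, not a verified solution): base R's stats::SSasymp() is a self-starting asymptotic regression model of the form Asym + (R0 - Asym) * exp(-exp(lrc) * input), i.e. a monoexponential, so nls() needs no hand-picked starting values. The outlier rule below is interpreted as "more than 3 residual standard deviations from the fitted curve"; the helper names fit_monoexp and remove_outliers (and the all_subjects list) are hypothetical, and the column names follow the sample data above.
'''
# Fit a monoexponential with a self-starting model, then drop points more than
# 3 residual SDs from the fitted curve. Wrapping the steps in functions lets
# them be applied to every subject's data frame.
fit_monoexp <- function(df) {
  nls(VO2 ~ SSasymp(Time, Asym, R0, lrc), data = df)
}

remove_outliers <- function(df) {
  fit <- fit_monoexp(df)
  res <- residuals(fit)
  df[abs(res) <= 3 * sd(res), ]   # keep points within 3 residual SDs
}

fit <- fit_monoexp(DF)
DF_clean <- remove_outliers(DF)

# Plot the data, the fitted curve, and the points that would be dropped
xx <- seq(min(DF$Time), max(DF$Time), length.out = 200)
plot(VO2 ~ Time, data = DF, pch = 19, ylim = c(0, 50))
lines(xx, predict(fit, newdata = data.frame(Time = xx)), col = "red")
flagged <- abs(residuals(fit)) > 3 * sd(residuals(fit))
points(DF$Time[flagged], DF$VO2[flagged], col = "blue", pch = 4, cex = 2)

# For many subjects stored in a list of data frames, e.g. all_subjects:
# cleaned <- lapply(all_subjects, remove_outliers)
'''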

Related

Using bootstrapping to compare full and sample datasets

This is a fairly complicated situation, so I'll try to explain it succinctly, but feel free to ask for clarification.
I have several datasets of biological data that vary significantly in sample size (e.g., 253-1221 observations/dataset). I need to estimate individual breeding parameters and compare them (for a different analysis), but because of the large sample-size differences, I took a subset of data from each dataset so the sample sizes were equal for each comparison. For example, the smallest dataset had 253 observations, so for all the others I used the following code
AT_EABL_subset <- Atlantic_EABL[sample(1:nrow(Atlantic_EABL), 253,replace=FALSE),]
to take a subset of 253 observations from the full dataset (in this case AT_EABL originally had 1,221 observations).
It's now suggested that I use bootstrapping to check if the parameter estimates from my subsets are similar to the full dataset estimates. I'm looking for code that will run, say, 200 iterations of the above subset data and calculate the average of the coefficients so I can compare them to the coefficients from my model with the full dataset. I found a site that uses the sample function to achieve this (https://towardsdatascience.com/bootstrap-regression-in-r-98bfe4ff5007), but when I get to this portion of the code
sample_coef_intercept <-
  c(sample_coef_intercept, model_bootstrap$coefficients[1])
sample_coef_x1 <-
  c(sample_coef_x1, model_bootstrap$coefficients[2])
}
I get
Error: $ operator not defined for this S4 class
Below is the code I'm using. I don't know if I'm getting the above error because of the type of model I'm running (glmer vs. lm used in the link), or if there's a different function that will give me the data I need. Any advice is greatly appreciated.
sample_coef_intercept <- NULL
sample_coef_x1 <- NULL
for (i in 1:2) {
  boot.sample = AT_EABL_subset[sample(1:nrow(AT_EABL_subset), nrow(AT_EABL_subset), replace = FALSE), ]
  model_bootstrap <- glmer(cbind(YOUNG_HOST_TOTAL_ATLEAST, CLUTCH_SIZE_HOST_ATLEAST - YOUNG_HOST_TOTAL_ATLEAST) ~ as.factor(YEAR) + (1 | LatLong), binomial, data = boot.sample)
}
sample_coef_intercept <-
  c(sample_coef_intercept, model_bootstrap$coefficients[1])
sample_coef_x1 <-
  c(sample_coef_x1, model_bootstrap$coefficients[2])
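The error comes from model_bootstrap$coefficients: glmer() returns an S4 merMod object, so the $ extractor does not work on it; the fixed effects are retrieved with fixef(). Below is a sketch of one way the loop could look, assuming the same formula and data as above, running the 200 iterations mentioned in the question, and resampling with replacement as a bootstrap normally does:
library(lme4)
n_boot <- 200
sample_coef_intercept <- numeric(n_boot)
sample_coef_x1 <- numeric(n_boot)
for (i in seq_len(n_boot)) {
  # resample rows with replacement for each bootstrap iteration
  boot.sample <- AT_EABL_subset[sample(nrow(AT_EABL_subset), replace = TRUE), ]
  model_bootstrap <- glmer(cbind(YOUNG_HOST_TOTAL_ATLEAST,
                                 CLUTCH_SIZE_HOST_ATLEAST - YOUNG_HOST_TOTAL_ATLEAST) ~
                             as.factor(YEAR) + (1 | LatLong),
                           family = binomial, data = boot.sample)
  # merMod is S4: use fixef(), not $coefficients
  coefs <- fixef(model_bootstrap)
  sample_coef_intercept[i] <- coefs[1]
  sample_coef_x1[i] <- coefs[2]
}
mean(sample_coef_intercept)  # compare to the intercept of the full-data model
mean(sample_coef_x1)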

How to create a rolling linear regression in R?

I am trying to create (as the title suggests) a rolling linear regression on a set of data: daily returns of two variables, 257 observations each, joined by date, with a rolling window of 100 observations. I have searched for rolling regression packages, but I have not found one that works on my data. The two series are stored within one data frame.
Also, I am pretty new to programming, so any advice would help.
Some of the code I have used is below.
WeightedIMV_VIX_returns_combined_ID100896 <- left_join(ID100896_WeightedIMV_daily_returns, ID100896_VIX_daily_returns, by=c("Date"))
head(WeightedIMV_VIX_returns_combined_ID100896, n=20)
plot(WeightedIMV_returns ~ VIX_returns, data = WeightedIMV_VIX_returns_combined_ID100896) # the data seem correlated enough to run a regression; it doesn't matter which variable you put first
ID100896.lm <- lm(WeightedIMV_returns ~ VIX_returns, data = WeightedIMV_VIX_returns_combined_ID100896)
summary(ID100896.lm) # the estimated intercept is 1.2370 and the estimated slope is 5.8266
termplot(ID100896.lm)
Again, sorry if this code is poor, or if I am missing any information that some of you may need to help. This is my first time on here! Just let me know what I can do better. Thanks!
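One common approach is zoo::rollapply() with by.column = FALSE, so that each 100-row window is passed to lm() as a whole. A sketch, assuming the data frame and column names from the question:
library(zoo)
dat <- WeightedIMV_VIX_returns_combined_ID100896
roll_coefs <- rollapply(
  dat[, c("WeightedIMV_returns", "VIX_returns")],
  width = 100,
  by.column = FALSE,  # hand the whole 100-row window to FUN, not one column at a time
  align = "right",
  FUN = function(w) coef(lm(WeightedIMV_returns ~ VIX_returns, data = as.data.frame(w)))
)
head(roll_coefs)  # one (Intercept, slope) pair per window
With 257 daily returns and a window of 100 observations, this yields 158 coefficient pairs, one per window.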

Converting R script to SAS

I want to add noise to a dataset. This is a fairly straightforward procedure in R. I sample from a Laplace distribution and then add/multiply/whatever that vector to the vector I want to add noise to.
The issue is, my colleague is asking for the code in SAS. I have not used SAS since graduate school and my project has been put on hold until I can get my colleague up to speed in SAS.
My code is pretty simple:
library(rmutil)                      # provides rlaplace()
vector <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
vector_prop <- vector / sum(vector)  # convert to proportions
noise <- rlaplace(9, m = 1, s = 0.1) # 9 draws from a Laplace distribution
new_vector <- vector_prop * noise    # scale the proportions by the noise
I am turning the vector I want to add noise to into a proportion, then drawing from a Laplace distribution. Finally, I multiply those draws by my proportion vector.
Any ideas would be helpful, as the SAS documentation was difficult to follow. I imagine they feel the same way about the R documentation.
Assuming your data are in a data set called have with a variable called vector_prop, the following code should be close. Because of the nature of random numbers and streams you can't replicate the exact R draws, though; don't you end up with a different data set each time anyway?
data want;
  set have;
  call streaminit(24); * fix the random number stream for reproducibility;
  new_var = vector_prop * rand('laplace', 1, 0.1);
run;

Where does R store subset information in glm object?

I'm trying to do some post-processing of a large number of glm models that I am working with, but I need to extract information about the data subset from the glm objects.
As a toy example:
x <- rnorm(100)
y <- rnorm(100, x, 0.5)
s <- sample(c(TRUE, FALSE), 100, replace = TRUE)
myGlm <- glm(y ~ x, subset = s)
From this, I need to know which of the 100 observations were used by getting the information out of myGlm. I thought that myGlm$data would have the subsetted data, but it actually has all 100 observations in it. I looked through str(myGlm) to no avail. However, it is quite clear that somewhere in the object, information about the subset s is stored.
This seems like it should be totally trivial!
Thanks in advance.
as.numeric(rownames(myGlm$model))
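That works because glm() stores the model frame (myGlm$model), which contains only the rows that actually entered the fit, and the row names of that frame are the positions of those rows in the original data. A quick check on the toy example above:
used <- as.numeric(rownames(myGlm$model))
all(used %in% which(s))      # TRUE: every fitted row had s == TRUE
setdiff(seq_along(s), used)  # the observations excluded by subset = s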

R: Use VAR model to predict response to change in values of certain variables

I've fitted a VECM model in R, and converted in to a VAR representation. I would like to use this model to predict the future value of a response variable based on different scenarios for the explanatory variables.
Here is the code for the model:
library(urca)
library(vars)
input <- read.csv("data.csv")
ts <- ts(input[16:52, ], start = c(2000, 1), frequency = 4)
dat1 <- cbind(ts[, "dx"], ts[, "u"], ts[, "cci"], ts[, "bci"], ts[, "cpi"], ts[, "gdp"])
args("ca.jo")  # check the arguments of ca.jo
vecm <- ca.jo(dat1, type = "trace", K = 2, season = NULL, spec = "longrun", dumvar = NULL)
vecm.var <- vec2var(vecm, r = 2)
Now what I would like to do is predict "dx" into the future by varying the others. I am not sure if something like "predict dx if u=30, cpi=15, bci=50, gdp=..." in the next period would work. So what I have in mind is something along these lines: increase "u" by 15% in the next period (which would obviously impact all the other variables as well, including "dx") and predict the impact into the future.
Also, I am not sure if the "vec2var" step is necessary, so please ignore it if you think it is redundant.
Thanks
Karl
This subject is covered very nicely in Chapters 4 and 8 of Bernhard Pfaff's book, "Analysis of Integrated and Cointegrated Time Series with R", for which the vars and urca packages were written.
The vec2var step is necessary if you want to use the predict functionality that's available.
A more complete answer was provided on the R-Sig-Finance list. See also this related thread.
Here you go: ??forecast points to vars::predict, the predict method for objects of class varest and vec2var, which looks like precisely what you want. Increasing u sounds like impulse response analysis, so look that up!
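A sketch of both pieces, assuming the vecm.var object from the question and that the columns of dat1 are actually named dx, u, and so on (naming them explicitly, e.g. dat1 <- cbind(dx = ts[, "dx"], u = ts[, "u"], ...), makes the impulse/response arguments match):
fc <- predict(vecm.var, n.ahead = 8, ci = 0.95)  # forecasts for all variables, 8 quarters ahead
plot(fc)
# the effect of a shock to "u" on "dx" is an impulse-response question
ir <- irf(vecm.var, impulse = "u", response = "dx", n.ahead = 8, boot = TRUE)
plot(ir)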
