Linear interpolation in loops - r

I have two files. One file has five observations of "Flux" for each of three treatments (A, B, and C) in which temperature was manipulated; the observations of Flux were taken at five points during a 24 hr period. The second file (Temp) contains the temperatures for the three treatments across the 24 hour period.
I would like to use linear interpolation to predict the Flux values at every hour of the 24 hour period. Note that the interpolation equations will differ slightly between the three treatments.
Can this be done in a loop, so that the values of flux are estimated for each hour in the Temp.csv file and then integrated (summed) across the 24 hour period?
The files are available on Dropbox here: Temp Data
This shows the different slopes of the best-fit linear relationships between flux and temperature across the three treatments:
# Subset the flux data by treatment
fluxA <- flux[flux$Treatment == "A", ]
fluxB <- flux[flux$Treatment == "B", ]
fluxC <- flux[flux$Treatment == "C", ]
# Regress Flux on Temperature within each treatment
modelA <- lm(Flux ~ Temperature, data = fluxA)
summary(modelA)
modelB <- lm(Flux ~ Temperature, data = fluxB)
summary(modelB)
modelC <- lm(Flux ~ Temperature, data = fluxC)
summary(modelC)
# Plot the regressions
plot(Flux ~ Temperature, data = fluxA, pch = 16, xlim = c(0, 28), ylim = c(0, 20))
abline(modelA)
points(Flux ~ Temperature, data = fluxB, pch = 16, col = "orange")
abline(modelB, col = "orange")
points(Flux ~ Temperature, data = fluxC, pch = 16, col = "red")
abline(modelC, col = "red")
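As an aside, the three subset-and-fit steps above can be collapsed into a loop (a small sketch):
models <- lapply(split(flux, flux$Treatment),
                 function(d) lm(Flux ~ Temperature, data = d))
lapply(models, summary)  # one summary per treatment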

caldat <- read.csv(text="Treatment,Temperature,Flux
A,18.64,7.75
A,16.02,8.49
A,17.41,9.24
A,21.06,4.42
A,22.8,5.61
B,19.73,5.7
B,17.45,8.37
B,19.2,5.27
B,20.97,3.37
B,27.6,2.26
C,23.79,9.91
C,15.89,15.8
C,21.93,10.28
C,24.79,6.33
C,26.64,6.64
")
plot(Flux~Temperature, data=caldat, col=Treatment)
mod <- lm(Flux~Temperature*Treatment, data=caldat)
summary(mod)
points(rep(seq(16, 28, length.out = 1e3), 3),
       predict(mod, newdata = data.frame(Temperature = rep(seq(16, 28, length.out = 1e3), 3),
                                         Treatment = rep(c("A", "B", "C"), each = 1e3))),
       pch = ".", col = rep(1:3, each = 1e3))
You'll need to consider carefully if this is an appropriate and "good" model. Use standard regression diagnostics.
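For example, the standard base-R diagnostic plots (a minimal sketch):
par(mfrow = c(2, 2))
plot(mod)  # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1))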
preddata <- read.csv(text="Time,A,B,C
100,17.8,21.64,23.04
200,17.5,21.3,22.7
300,17.23,21,22.39
400,16.92,20.67,22.08
500,16.47,20.3,21.74
600,15.78,19.75,21.24
700,15.19,19.14,20.63
800,14.58,18.47,20
900,14.22,17.99,19.49
1000,13.77,17.55,19.08
1100,13.39,17.02,18.62
1200,13.34,16.76,18.26
1300,13.17,16.62,18.05
1400,13.24,16.58,17.91
1500,13.31,16.63,17.86
1600,13.26,16.61,17.81
1700,13.12,16.57,17.75
1800,12.9,16.45,17.65
1900,12.74,16.32,17.54
2000,12.57,16.2,17.42
2100,12.36,16.04,17.28
2200,12.1,15.83,17.1
2300,11.79,15.57,16.88
2400,11.53,15.3,16.64
")
library(reshape2)
preddata <- melt(preddata, id = "Time",
                 variable.name = "Treatment", value.name = "Temperature")
preddata$Flux <- predict(mod, newdata=preddata)
plot(Flux~Time, data=preddata, col=Treatment)
Sum the fluxes:
aggregate(Flux ~ Treatment, data=preddata, FUN=sum)
# Treatment Flux
#1 A 247.5572
#2 B 159.6803
#3 C 309.6186
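If a true time integral (flux x hours) is wanted rather than a plain sum of the hourly predictions, a trapezoidal rule is one option (a sketch; note Time here is hours x 100):
trap <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
sapply(split(preddata, preddata$Treatment),
       function(d) trap(d$Time / 100, d$Flux))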

Related

Unmarked colext function: detection probability = 1

I'm building a single-species dynamic occupancy model with the R package "unmarked", using an unmarkedMultFrame and the colext() function, for pika den occupancy across 4 years. However, I want to specify that detection probability (p) = 1 (perfect detection), because the detection probability for known dens is near-perfect (many other studies make this assumption too). In theory this is just a simpler model and somewhat defeats the purpose of an occupancy model, but I'm using it because I need to estimate colonization and extinction probabilities.
Another specification is that all dens we are monitoring were occupied in the first year we have data for them (so occupancy = 1 for all dens in the first year, and I am monitoring extinction and re-colonization rates after the first year).
Does anyone know how to specify in unmarked that p = 1 when using the colext function? The function estimates detection probability, but I'm not sure what it bases the estimate on, so I don't know how to either eliminate it from the model entirely or force it to be 1. Here is an example of my data and code:
dets1 <- as.matrix(dets1)  # detections: 179 dens sampled once per year for 4 years (lots of NAs)
year <- factor(rep(c(2018, 2019, 2020, 2021), 179))  # the 4 years we surveyed
UMFdets <- unmarkedMultFrame(y = dets1, numPrimary = 4)
m4 <- colext(psiformula = ~1,          # first-year occupancy
             gammaformula = ~year,     # colonization
             epsilonformula = ~year,   # extinction
             pformula = ~1,            # detection
             data = UMFdets)
Note: simply removing "pformula" doesn't work.
Any ideas or knowledge about this would be much appreciated! Thank you!

How to make linear regression for time intervals?

I have two to three hours of data measured in seconds. I want to split this into 11 intervals and fit a linear regression on each interval.
The first time interval might be from 7-17 minutes and the next from 18-27 minutes. My data has a column of seconds and a column for the measurement in the chamber.
I have started by making a plot:
# read.delim() is base R, so no extra package is needed
s24kul05p <- read.delim("C:/Data/24skulp05.txt", quote = "")
View(s24kul05p)
head(s24kul05p)
tail(s24kul05p)
plot(Ch1 ~ Min, data = s24kul05p, ylim = c(170, 250), xlim = c(1, 151), col = "red")
abline(lm(Ch1 ~ Min, data = s24kul05p))
After this I get a plot with one linear model; it would be nice if it were possible to make 11 linear models.
Drop it into a matrix of 11 columns, then turn it into a data.frame again. You'll have 11 variables to run regressions on, as sketched below.
Y <- runif(231)            # example response: 231 points = 11 intervals of 21
M <- matrix(Y, ncol = 11)  # one column per interval
M <- as.data.frame(M)      # columns V1..V11
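From there, fit one model per column (a sketch; the shared within-interval index x is an assumption):
x <- seq_len(nrow(M))                     # within-interval time index
fits <- lapply(M, function(y) lm(y ~ x))  # 11 models, one per column V1..V11
sapply(fits, coef)                        # 2 x 11 matrix of intercepts and slopes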

How to run a regression row by row

I just started using R for statistical purposes and I appreciate any kind of help.
As a first step, I ran a time series regression over my columns: the Y values are the dependent variables and X is the explanatory variable.
# example
Y1 <- runif(100, 5.0, 17.5)
Y2 <- runif(100, 4.0, 27.5)
Y3 <- runif(100, 3.0, 14.5)
Y4 <- runif(100, 2.0, 12.5)
Y5 <- runif(100, 5.0, 17.5)
X <- runif(100, 5.0, 7.5)
df1 <- data.frame(X, Y1, Y2, Y3, Y4, Y5)
# calculating log returns to provide data for the first regression
n <- nrow(df1)
X_logret <- log(X[2:n])-log(X[1:(n-1)])
Y1_logret <- log(Y1[2:n])-log(Y1[1:(n-1)])
Y2_logret <- log(Y2[2:n])-log(Y2[1:(n-1)])
Y3_logret <- log(Y3[2:n])-log(Y3[1:(n-1)])
Y4_logret <- log(Y4[2:n])-log(Y4[1:(n-1)])
Y5_logret <- log(Y5[2:n])-log(Y5[1:(n-1)])
# bringing the calculated log returns together in one data frame
df2 <- data.frame(X_logret, Y1_logret, Y2_logret, Y3_logret, Y4_logret, Y5_logret)
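# Side note: the per-series log returns above can equivalently be computed
# in one step, since diff(log(v)) is log(v[2:n]) - log(v[1:(n-1)]):
df2 <- as.data.frame(lapply(df1, function(v) diff(log(v))))
colnames(df2) <- paste0(names(df1), "_logret")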
# running the time series regression (all five Y series against X in one lm call)
Regression <- lm(as.matrix(df2[c('Y1_logret', 'Y2_logret', 'Y3_logret', 'Y4_logret', 'Y5_logret')]) ~ df2$X_logret)
# extracting the coefficients for further calculation
Regression$coefficients[2,(1:5)]
As a second step I want to run a regression row by row, i.e. day by day, since the data contains daily observations. I also have a column "DATE" (POSIXct format) but I didn't know how to bring it into this example; maybe someone has an idea how to refer to a certain period of it over which the regression should be run.
In the row-by-row regression I would like to use the 5 coefficients from the first regression as the explanatory variable, and the 5 Y_logret values of each day as the dependent variable:
Y_logret(i) = Beta * Regression$coefficients[2, i] + error, for i = 1, ..., 5. The intercept is not needed, so I would set it to zero by adding + 0 in the lm formula; a sketch follows below.
My goal is to run this regression over a period of time, for example over 20 days. Day by day, this would give 20 Beta estimates (one regression per day), but I also need all the errors for further calculation: 5 errors per day, so 20*5 error values in total.
This is just an example; in the original dataset I have 20 Y values and over 4000 rows, and I would like to run the regression over intervals of 900-1000 days. Since I am completely new to R, I have no idea how to proceed, especially how to code this in a few lines.
I really appreciate any kind of help.
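A minimal sketch of the day-by-day step described above (the helper names and the 20-day window are illustrative; df2 and Regression come from the first part):
slopes <- Regression$coefficients[2, 1:5]  # the five slope estimates from the first regression
ycols <- c("Y1_logret", "Y2_logret", "Y3_logret", "Y4_logret", "Y5_logret")
fits <- lapply(1:20, function(i)           # one cross-sectional regression per day
  lm(unlist(df2[i, ycols]) ~ 0 + slopes))  # + 0 suppresses the intercept
betas <- sapply(fits, coef)                # 20 Beta estimates, one per day
errors <- sapply(fits, residuals)          # 5 residuals per day (a 5 x 20 matrix)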

ARIMA forecasts with R - how to update data

I've been trying to develop an ARIMA model to forecast wind speed values. I have a four-year data series (from January 2008 until December 2011) at 10-minute resolution, which means 144 observations per day. I'm using the first three years (observations 1 to 157157) to fit the model and the last year to validate it.
The thing is, I want to update the forecast: in other words, when one forecast ends, more data is added to the sample and another forecast is performed. But the result looks like I had just lagged the original series. Here's the code:
#1 - Load data:
z <- read.csv('D:/Faculdade/Mestrado/Dissertação/velocidade/tudo_10m.csv', header = TRUE, dec = ".")
vel <- ts(z, start = c(2008, 1), frequency = 52000)
# 5 - ARIMA Forecasts:
library(forecast)
n <- 157157
while (n <= 157200) {
  amostra <- vel[1:n]  # only data until 2010
  pred <- auto.arima(amostra, seasonal = TRUE,
                     ic = "aicc", stepwise = FALSE, trace = TRUE,
                     approximation = TRUE, xreg = NULL,
                     test = "adf",
                     allowdrift = TRUE, lambda = NULL, parallel = TRUE, num.cores = 4)
  velpred <- arima(pred)  # is this step really necessary?
  velpred
  predvel <- forecast(pred, h = 12)  # h is the number of forecast steps ahead
  predvel
  plot(amostra, xlim = c(157158, n), ylim = c(0, 20), col = "blue",
       main = "Previsões e Observações", type = "l", lty = 1)
  lines(fitted(predvel), xlim = c(157158, n), ylim = c(0, 20), col = "red", lty = 2)
  n <- n + 12
}
But when I plot the results (I couldn't post the picture here), the forecast series looks just like the observed series, only shifted one step behind.
Can anyone help by examining my code and/or giving tips on how to get the best out of my model? Thanks! (Hope my English is understandable...)
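A possible explanation: fitted(predvel) returns the one-step-ahead in-sample fits, which by construction track the observed series with a one-step lag, while the out-of-sample forecasts live in predvel$mean. The velpred <- arima(pred) step is also unnecessary, as forecast() works directly on the auto.arima fit. A minimal sketch:
predvel <- forecast(pred, h = 12)
plot(predvel)                     # history plus the 12-step-ahead forecast fan
lines(predvel$mean, col = "red")  # the forecast path itself, not the in-sample fits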

Gompertz Aging analysis in R

I have survival data from an experiment in flies which examines rates of aging in various genotypes. The data is available to me in several layouts, so the choice of which is up to you, whichever suits the answer best.
One dataframe (wide.df) looks like this, where each genotype (Exp, of which there are ~640) has a row, and the days run in sequence horizontally from day 4 to day 98, with counts of new deaths every two days:
Exp Day4 Day6 Day8 Day10 Day12 Day14 ...
A 0 0 0 2 3 1 ...
I make the example using this:
wide.df2 <- data.frame("A", 0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2)
colnames(wide.df2) <- c("Exp", paste0("Day", seq(4, 36, 2)))
Another version is like this, where each day has a row for each "Exp" and the number of deaths on that day is recorded:
Exp Deaths Day
A 0 4
A 0 6
A 0 8
A 2 10
A 3 12
.. .. ..
To make this example:
df2 <- data.frame(Exp = rep("A", 17),
                  Deaths = c(0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2),
                  Day = seq(4, 36, 2))
What I would like to do is perform a Gompertz analysis (see the second paragraph of "the life table" here). The equation is:
μ_x = α e^{β x}
where μ_x is the mortality rate at age x, α is the initial mortality rate, and β is the rate of aging.
I would like to be able to get a dataframe which has α and β estimates for each of my ~640 genotypes for further analysis later.
I need help going from the above dataframes to an output of these values for each of my genotypes in R.
I have looked through the package flexsurv, which may house the answer, but I have failed in my attempts to find and implement it.
This should get you started...
Firstly, for the flexsurvreg function to work, you need to specify your input data as a Surv object (from package:survival). This means one row per observation.
The first thing is to re-create the 'raw' data from the summary tables you provide.
(I know rbind is not efficient, but you can always switch to data.table for large sets).
### get rows with >1 death
df3 <- df2[df2$Deaths>1, 2:3]
### expand to give one row per death per time
df3 <- sapply(df3, FUN=function(x) rep(df3[, 2], df3[, 1]))
### each death is 1 (occurs once)
df3[, 1] <- 1
### add this to the rows with <=1 death
df3 <- rbind(df3, df2[!df2$Deaths>1, 2:3])
### convert to Surv object
library(survival)
s1 <- with(df3, Surv(Day, Deaths))
### get parameters for Gompertz distribution
library(flexsurv)
f1 <- flexsurvreg(s1 ~ 1, dist="gompertz")
giving
> f1$res
               est         L95%        U95%
shape  0.165351912 0.1281016481 0.202602176
rate   0.001767956 0.0006902161 0.004528537
Note that this is an intercept-only model as all your genotypes are A.
You can loop this over multiple survival objects once you have re-created the per-observation data as above.
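For instance (a sketch: df_all is a hypothetical long-format frame with columns Exp, Deaths and Day covering all genotypes, and zero-death rows are simply dropped here rather than kept as censored):
library(survival)
library(flexsurv)
fit_gompertz <- function(d) {
  expanded <- data.frame(Day = rep(d$Day, d$Deaths),  # one row per death
                         Deaths = 1)                  # each death is an event
  flexsurvreg(Surv(Day, Deaths) ~ 1, data = expanded, dist = "gompertz")$res[, "est"]
}
params <- t(sapply(split(df_all, df_all$Exp), fit_gompertz))  # shape (a) and rate (b) per genotype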
From the flexsurv docs:
The Gompertz distribution with shape parameter a and rate parameter b has hazard function h(x; a, b) = b e^{a x}.
So it appears your alpha is b, the rate, and beta is a, the shape.
