Lowess function and plyr - r

I am working with a data frame I created and want to expand it to include a lowess fit. I have been able to add a lowess curve to the plot of my data along with an lm fit but I cannot figure out how to add the lowess values to my data frame.
Please forgive the ugliness of my code as I do everything by brute force (i.e. suggestions for simplification/efficiency are appreciated.) I apologize as I am not allowed to post images. I have a scatterplot with an lm fit yielding an R^2 of .7897 and a lowess curve which well replicates the qqplot of the lm fit. When I get a reputation of "10" I will post it for edification/ease of visualization:
##Load packages
library(lubridate) # month()
library(plyr) # ddply()
##Read in Data
OPM.df <- read.csv("On Peak Mod TMAX.csv", header = TRUE)
## Data frame 2008-2012
OPM5.df <- OPM.df[4606:6140,]
##Verify headings
OPM5.df[1,]
SummerOPM5.df <- OPM5.df[month(OPM5.df$Date) >= 6 & month(OPM5.df$Date) <= 9, ]
###Fit Linear Regression to Data
fitsummerX <- lm(SummerOPM5.df$MaxLoad~SummerOPM5.df$TMAX)
summary(fitsummerX)
##Plot data
windows()
plot(SummerOPM5.df$TMAX, SummerOPM5.df$MaxLoad, main="Linear Regression Adjusted R-squared: 0.7897")
## Add fit lines
abline(fitsummerX, col="red") # regression line (y~x)
lines(lowess(SummerOPM5.df$TMAX,SummerOPM5.df$MaxLoad), col="blue") # lowess line (x,y)
## plyr augmentation of df
SummerOPM5.df <- ddply(SummerOPM5.df, .(Date, MaxLoad, TMAX, OnPeakTotal), transform,
Lowess = (lowess(TMAX,MaxLoad)$y))
##Verify headings and values
SummerOPM5.df [1:5,]
This gives the following:
Date MaxLoad TMAX OnPeakTotal Lowess
1 2008-06-02 2880 214.0 43307 2880
2 2008-06-03 2860 197.0 43166 2860
3 2008-06-04 2787 172.5 42088 2787
4 2008-06-05 2902 216.5 43333 2902
5 2008-06-06 3078 275.0 45325 3078
The Lowess values are the same as the MaxLoad values.
If I run the lowess separately:
Lowess = (lowess(SummerOPM5.df$TMAX,SummerOPM5.df$MaxLoad))
Lowess$y[1:5]
I get this:
[1] 2522.221 2569.523 2603.625 2622.795 2622.795
Here the vector is in rank order (sorted by TMAX).
I am unsure whether it otherwise preserves the sequence, so I am unsure how to match these values to the corresponding rows of my data frame.
Suggestions?
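One possible way to line the values up, sketched on made-up data (the data frame `df` and its numbers are illustrative, not the asker's file). `lowess()` returns its fit sorted by increasing x, so indexing with `order()` on the same x vector should put each fitted value back in its original row. Grouping by every column in `ddply` leaves one row per group, which is probably why the smoothed value came out equal to MaxLoad.

```r
# Illustrative data only: 50 made-up TMAX / MaxLoad pairs
set.seed(1)
df <- data.frame(TMAX = runif(50, 150, 300))
df$MaxLoad <- 2000 + 4 * df$TMAX + rnorm(50, sd = 50)

# lowess() sorts by x internally and returns x and y in that order
fit <- lowess(df$TMAX, df$MaxLoad)

# order() gives the row index behind each sorted position,
# so assigning through it restores the original row order
ord <- order(df$TMAX)
df$Lowess <- NA_real_
df$Lowess[ord] <- fit$y

head(df)
```

No plyr call is needed for a single lowess fit over the whole data frame; `ddply` would only come in if a separate fit per group were wanted.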

Related

Writing linear and exponential equations for fitted SMA model

I have fitted a standard major axis model to my data, and I need to add an equation to the plot, but I can't figure out what it should be.
My data looks like this:
> head(d1)
x y
1 5.379431 10.263158
2 5.863559 5.287081
3 10.367855 4.186603
4 11.459073 5.669856
5 14.477543 6.387560
6 16.713999 4.377990
My model gives the following:
library(smatr)
m1 <- sma(y~x, data = d1, log="xy")
#Coefficients:
# elevation slope
#estimate -0.23978063 0.8576100
#lower limit -0.54266508 0.6786154
#upper limit 0.06310381 1.0838170
#H0 : variables uncorrelated
#R-squared : 0.3228417
#P-value : 1.3667e-05
So based on this I have plotted the data and added an equation for the line that looks like this:
[Figure: plot of the data with the SMA fit]
My problem is that the regression equation y=-0.240+0.858x doesn't make sense with the trendline. I have also been asked to provide an exponential equation (e.g. y = ab^x) and I have no idea how to convert it.
Any help would be much appreciated!
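Following the comments: with log="xy" the model is fitted on log10-transformed axes, so the fitted line is log10(y) = -0.240 + 0.858*log10(x), which back-transforms to y = 10^(-0.240) * x^0.858.
This is a power law in the y = a*x^b format, with:
a = 10^(-0.240), approximately 0.575
b = 0.858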
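A small sketch of the back-transformation, assuming that `sma(..., log="xy")` fits the line on log10-transformed x and y. The coefficients are copied from the model output above; the function name `power_law` is made up for illustration.

```r
# Coefficients reported by the fitted SMA model (log10-log10 scale assumed)
elevation <- -0.23978063
slope <- 0.8576100

# log10(y) = elevation + slope * log10(x)  =>  y = 10^elevation * x^slope
a <- 10^elevation
b <- slope
power_law <- function(x) a * x^b  # hypothetical helper name

# Check that the back-transformed curve agrees with the line on the log scale
x <- c(5, 10, 15)
all.equal(log10(power_law(x)), elevation + slope * log10(x))

# To draw it on an untransformed plot of the data:
# curve(power_law(x), add = TRUE)
```

On log-log axes this curve is the straight trendline; on untransformed axes it is a gently curving power law, not an exponential of the form y = ab^x.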

How to make linear regression for time intervals?

I have two to three hours of data measured in seconds. I want to split this up into 11 intervals and fit a linear regression on each interval.
The first time interval could be from 7-17 minutes and the next from 18-27 minutes. My data has a column of seconds and a column for the measurement in the chamber.
I have started to make a plot
library(readr)
s24kul05p <- read.delim("C:/Data/24skulp05.txt", quote="")
View(s24kul05p)
s24kul05p
head(s24kul05p)
tail(s24kul05p)
# data("s24kul05p") is not needed: the object was already created by read.delim()
plot(Ch1~Min, data=s24kul05p, ylim =c(170,250), xlim=c(1, 151), col="red")
abline(lm(Ch1~Min, data=s24kul05p))
After this I get a plot with one linear model. Is it possible to fit 11 linear models, one per interval?
Drop it into a matrix of 11 columns, then turn it into a data.frame again. You'll have 11 variables to run regression.
Y <- runif(231)
M <- matrix(Y, ncol = 11)
M <- as.data.frame(M)
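An alternative sketch that keeps the data in long form: cut the time column into 11 intervals and fit one `lm` per interval. The data below is simulated and only borrows the column names `Min` and `Ch1` from the question; `cut(..., breaks = 11)` makes equal-width intervals, which is an assumption about how the intervals should be chosen.

```r
# Simulated stand-in for the chamber data (values are made up)
set.seed(42)
d <- data.frame(Min = seq(7, 116, by = 0.5))
d$Ch1 <- 170 + 0.4 * d$Min + rnorm(nrow(d), sd = 2)

# Label each row with one of 11 equal-width time intervals
d$interval <- cut(d$Min, breaks = 11)

# Fit a separate linear regression within each interval
fits <- lapply(split(d, d$interval), function(g) lm(Ch1 ~ Min, data = g))

# One row per interval: intercept and slope
coefs <- t(sapply(fits, coef))
coefs
```

To use specific boundaries such as 7-17 and 18-27 minutes, a vector of cut points can be passed to `breaks` instead of the number 11.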

R: Calculate sill, range and nugget from a raster object

I need to calculate the sill, range and nugget from a raster layer. I have explored the gstat and usdm packages, where one can create a variogram; however, I couldn't find a function which, given a raster layer, will estimate these parameters. In most of the functions (e.g. kriging) these parameters have to be supplied.
I have raster data layers for different heights which look similar to
I would like to get the sill, nugget and range from the parameters of a semivariogram fitted to these data layers to create a plot similar to this:
The original data layers are available here as a multiband tiff. Here is a figure from this paper which further illustrates the concept.
Using gstat, here is an example:
library(raster)
library(gstat)
demo(meuse, ask = FALSE, echo = FALSE)
set.seed(131) # make random numbers reproducible
# add some noise with .1 variance
meuse.grid$dist = meuse.grid$dist + rnorm(nrow(meuse.grid), sd=sqrt(.1))
r = raster(meuse.grid["dist"])
v = variogram(dist~1, as(r, "SpatialPixelsDataFrame"))
(f = fit.variogram(v, vgm("Sph")))
# model psill range
# 1 Nug 0.09035948 0.000
# 2 Sph 0.06709838 1216.737
f$psill[2] # partial sill (total sill = nugget + partial sill, i.e. sum(f$psill))
# [1] 0.06709838
f$range[2] # range
# [1] 1216.737
f$psill[1] # nugget
# [1] 0.09035948
Plug in your own raster for r, and it should work. Change the Sph to fit another variogram model, try plot(v,f) to verify the plot.
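This is just a guess, but this is how I would estimate the semivariance: sem = (1/n) * (m - r)^2,
where n is the number of layers whose mean is less than the total mean, m is the total mean across all the layers, and r is the mean of each layer that falls below the total mean.
s <- stack("old_gap_.tif")
m <- cellStats(mean(s), stat="mean", na.rm=TRUE) # total mean across all layers: 0.5620522
layer_means <- cellStats(s, stat="mean", na.rm=TRUE) # mean of each layer
r <- layer_means[layer_means < m] # layers whose mean is below the total mean
sem <- 1/length(r) * (m - r)^2 # n = length(r) (53 for this data)
plot(sem, r)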

ARIMA forecasts with R - how to update data

I've been trying to develop an ARIMA model to forecast wind speed values. I have a four year data series (from january 2008 until december 2011). The series presents 10 minute data, which means that in a day we have 144 observations. Well, I'm using the first three years (observations 1 to 157157) to generate the model and the last year to validate the model.
The thing is, I want to update the forecast. In other words, when one forecast ends, more data is added to the dataset and another forecast is performed. But the result looks as if I had just lagged the original series. Here's the code:
#1 - Load data:
z=read.csv('D:/Faculdade/Mestrado/Dissertação/velocidade/tudo_10m.csv', header=T, dec=".")
vel=ts(z, start=c(2008,1), frequency=52000)
# 5 - ARIMA Forecasts:
library(forecast)
n=157157
while(n<=157200){
amostra <- vel[1:n] # Only data until 2010
pred <- auto.arima(amostra, seasonal=TRUE,
ic="aicc", stepwise=FALSE, trace=TRUE,
approximation=TRUE, xreg=NULL,
test="adf",
allowdrift=TRUE, lambda=NULL, parallel=TRUE, num.cores=4)
velpred <- arima(pred) # Is this step really necessary?
velpred
predvel<- forecast(pred, h=12) # h means the forecast steps ahead
predvel
plot(amostra, xlim=c(157158, n), ylim=c(0,20), col="blue", main="Previsões e Observações", type="l", lty=1)
lines(fitted(predvel), xlim=c(157158, n), ylim=c(0,20), col="red", lty=2)
n=n+12
}
But when I plot the results (I couldn't post the picture here), the plot shows the observed series and the forecast series, and the forecast looks just like the observed series lagged by one step.
Can anyone help me examine my code and/or give tips on how to get the best out of my model? Thanks! (Hope my English is understandable...)
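A possible explanation, offered as a sketch and not as a diagnosis of the exact data: `fitted()` on a forecast object returns the in-sample one-step-ahead fitted values, which for a persistent series tend to look like the observations shifted by one step. The out-of-sample forecasts are stored in the `mean` component (`predvel$mean`). The example below shows the rolling-origin idea with base R's `arima()` and `predict()` on a simulated series; the AR(1) order, the series `y` and the origins are all made up for illustration.

```r
# Simulated stand-in for the wind speed series (not real data)
set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.8), n = 300)) + 8

h <- 12                            # forecast horizon, as in the question
origins <- seq(240, 276, by = h)   # last observation used for each refit

# Refit on data up to each origin, then forecast the next h steps
fc <- lapply(origins, function(n) {
  fit <- arima(y[1:n], order = c(1, 0, 0))
  p <- predict(fit, n.ahead = h)$pred   # out-of-sample forecasts
  data.frame(t = (n + 1):(n + h),
             forecast = as.numeric(p),
             observed = y[(n + 1):(n + h)])
})
fc <- do.call(rbind, fc)

# Compare true forecasts with what was actually observed
plot(fc$t, fc$observed, type = "l", col = "blue", xlab = "t", ylab = "value")
lines(fc$t, fc$forecast, col = "red", lty = 2)
```

With the forecast package, a similar loop could keep the `auto.arima()` fit and plot `predvel$mean` against the held-out observations; the `arima(pred)` step does not appear to be needed for that.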

How to plot the individual trajectories of an lme model

I have the example data and model
x<-rep(seq(0,100,by=1),10)
y<-15+2*rnorm(1010,10,4)*x+rnorm(1010,20,100)
id<-NULL
for (i in 1:10){
id<-c(id, rep(i,101))}
dtfr<-data.frame(x=x,y=y, id=id)
library(nlme)
with (dtfr, summary(lme((y)~x,random=~1+x|id, na.action=na.omit )))
model.mx<-with (dtfr, (lme((y)~x,random=~1+x|id, na.action=na.omit )))
pd<-predict(model.mx, newdata=data.frame(x=0:100),level=0)
with (dtfr, plot(x, y))
lines(0:100,predict(model.mx, newdata=data.frame(x=0:100),level=0), col="darkred", lwd=7)
How can I extract the modelled intercept and slope of each individual ID and plot the individual trajectories of each ID?
Not sure what you want to do because all your coefficients are almost identical:
> coef(model.mx)
(Intercept) x
1 54.88302 19.18001
2 54.88298 19.18000
3 54.88299 19.18000
4 54.88299 19.18000
5 54.88302 19.18001
6 54.88300 19.18000
7 54.88301 19.18000
8 54.88300 19.18000
9 54.88299 19.18000
10 54.88300 19.18000
Maybe your real data gives you results that differ more between IDs. If that is the case, I would use abline inside a mapply call:
with (dtfr, plot(x, y))
mapply(abline,a=coef(model.mx)[,1],b=coef(model.mx)[,2], col=1:10)
Here's the result. Since all coefficients are almost the same, the lines are plotted on top of each other. You only see the last one.
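The near-identical coefficients likely come from the simulation itself: `rnorm(1010, 10, 4)` draws a new slope for every observation, not one per ID, so there is no between-ID variation for the model to pick up. Below is a sketch with per-ID intercepts and slopes (all simulated values are made up) that shows the same `coef()` plus `abline` approach producing distinct trajectories.

```r
library(nlme)

# Simulate 10 IDs, each with its own intercept and slope (illustrative values)
set.seed(123)
slopes <- rnorm(10, mean = 20, sd = 5)
intercepts <- rnorm(10, mean = 50, sd = 20)
dtfr <- expand.grid(x = 0:100, id = 1:10)
dtfr$y <- intercepts[dtfr$id] + slopes[dtfr$id] * dtfr$x +
  rnorm(nrow(dtfr), sd = 30)

# Random intercept and slope per id, as in the question
model.mx <- lme(y ~ x, random = ~ 1 + x | id, data = dtfr)

# coef() returns one row per id: fixed effect plus that id's random effect
cf <- coef(model.mx)

# Plot the data and one fitted line per id
plot(y ~ x, data = dtfr, col = "grey")
for (i in seq_len(nrow(cf))) {
  abline(a = cf[i, 1], b = cf[i, 2], col = i)
}
```

`predict(model.mx, level = 1)` would give the same individual trajectories as fitted values per row, which can be handier when the lines are not straight.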
