Let's say I have 4 raster layers with the same extent, holding data for 4 different years: 2006, 2008, 2010 and 2012:
library(raster)
r2006<-raster(ncol=3, nrow=3)
values(r2006)<-1:9
r2008<-raster(ncol=3, nrow=3)
values(r2008)<-3:11
r2010<-raster(ncol=3, nrow=3)
values(r2010)<-5:13
r2012<-raster(ncol=3, nrow=3)
values(r2012)<-7:15
Now I want to create raster layers for every year between 2006 and 2013 (or an even longer range) by inter-/extrapolating the values of the 4 raster layers (a linear method should be a good start). The result should look like this:
r2006<-raster(ncol=3, nrow=3)
values(r2006)<-1:9
r2007<-raster(ncol=3, nrow=3)
values(r2007)<-2:10
r2008<-raster(ncol=3, nrow=3)
values(r2008)<-3:11
r2009<-raster(ncol=3, nrow=3)
values(r2009)<-4:12
r2010<-raster(ncol=3, nrow=3)
values(r2010)<-5:13
r2011<-raster(ncol=3, nrow=3)
values(r2011)<-6:14
r2012<-raster(ncol=3, nrow=3)
values(r2012)<-7:15
r2013<-raster(ncol=3, nrow=3)
values(r2013)<-8:16
Using lm() or approxExtrap doesn't seem to help a lot.
One way to do this is to separate your problem into two parts: first, perform the numerical interpolation on the raster values; then, apply the interpolated values to the appropriate intermediate raster layers.
Idea: build a data frame from the values() of the raster layers, index it by time, and then apply linear interpolation to those numbers. For the linear interpolation I use approxTime from the simecol package.
For your example above,
library(raster)
library(simecol)
df <- data.frame("2006" = 1:9, "2008" = 3:11, "2010" = 5:13, "2012"=7:15)
#transpose since we want time to be the first col, and the values to be columns
new <- data.frame(t(df))
times <- seq(2006, 2012, by=2)
new <- cbind(times, new)
# Now, apply Linear Interpolate for each layer of the raster
approxTime(new, 2006:2012, rule = 2)
This gives:
# times X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 2006 1 2 3 4 5 6 7 8 9
#2 2007 2 3 4 5 6 7 8 9 10
#3 2008 3 4 5 6 7 8 9 10 11
#4 2009 4 5 6 7 8 9 10 11 12
#5 2010 5 6 7 8 9 10 11 12 13
#6 2011 6 7 8 9 10 11 12 13 14
#7 2012 7 8 9 10 11 12 13 14 15
You can then store this, take each row, and assign it to the values of that year's raster object, as sketched below.
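A minimal sketch of that second step (interp and rasters are my names; I assume the 3x3 geometry from the question):
interp <- approxTime(new, 2006:2012, rule = 2)
rasters <- lapply(seq_len(nrow(interp)), function(i) {
  r <- raster(ncol = 3, nrow = 3)      # same geometry as the input layers
  values(r) <- unlist(interp[i, -1])   # drop the times column, keep the 9 cell values
  r
})
names(rasters) <- paste0("r", interp[, 1])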
Note: approxTime does not do linear extrapolation. With rule = 2 it simply repeats the closest observed value outside the data range, so you need to account for that.
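If you also need values outside the observed range (e.g. 2013), one option is true per-cell linear extrapolation, for instance with approxExtrap from the Hmisc package. A sketch, assuming Hmisc is installed and reusing the new data frame built above:
library(Hmisc)
years_out <- 2006:2013
ext <- sapply(new[-1], function(v) approxExtrap(new$times, v, xout = years_out)$y)
ext <- data.frame(times = years_out, ext)
# The 2013 row should come out as 8:16, matching the desired r2013 above.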
I have a dataset of customer ages, and I want to make a frequency distribution with 9-year age bins.
Ages=c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)
My desired outcome would be similar to the table shared below; the variable names can differ (as you wish).
Could I use binCounts for this? If so, could you help me with the code? I'm not sure what bx and idxs are in:
binCounts(x, idxs = NULL, bx, right = FALSE)
Age Count
38-46 3
47-55 7
56-64 7
65-73 14
74-82 10
83-91 6
92-100 3
Much Appreciated!
I don't know binCounts or even the package it comes from, but here is a bare base-R one-liner:
data.frame(table(cut(Ages, 0:7*9 + 37)))  # breaks at 37, 46, 55, ..., 100
Var1 Freq
1 (37,46] 3
2 (46,55] 7
3 (55,64] 7
4 (64,73] 14
5 (73,82] 10
6 (82,91] 6
7 (91,100] 3
To exactly duplicate your results:
lowerlimit=c(37,46,55,64,73,82,91,101)
Labels=paste(head(lowerlimit,-1)+1,lowerlimit[-1],sep="-")#I add one to have 38 47 etc
group=cut(Ages,lowerlimit,Labels)#Determine which group the ages belong to
tab=table(group)#Form a frequency table
as.data.frame(tab)# transform the table into a dataframe
group Freq
1 38-46 3
2 47-55 7
3 56-64 7
4 65-73 14
5 74-82 10
6 83-91 6
7 92-100 3
All this can be combined as:
data.frame(table(cut(Ages,s<-0:7*9+37,paste(head(s+1,-1),s[-1],sep="-"))))
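To answer the binCounts part directly: assuming you mean binCounts from the matrixStats package, bx is the vector of bin boundaries and idxs optionally restricts the counting to a subset of x (NULL means use all values). A sketch that should reproduce the same counts:
library(matrixStats)
bx <- 0:7*9 + 37                        # boundaries 37, 46, 55, ..., 100
binCounts(Ages, bx = bx, right = TRUE)  # right = TRUE counts intervals (37,46], (46,55], ...
# expected: 3 7 7 14 10 6 3, as in the table above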
This may be a simple question, but I haven't been able to find an answer. Consider a data frame with n columns of molecular features, where the last row of each column holds a coefficient of variation (CV).
Example data set:
a <- data.frame(matrix(runif(30),ncol=3))
b <- c(50.23,45.23,21)
a<-rbind(a,b)
X1 X2 X3
1 0.1097075 0.78584027 0.20925033
2 0.6081752 0.39669748 0.65559913
3 0.9912855 0.68462073 0.54741795
4 0.8543848 0.53776889 0.43789447
5 0.2579654 0.92188090 0.61292895
6 0.6203840 0.73152279 0.82866311
7 0.6643195 0.84953926 0.62192976
8 0.5760624 0.30949900 0.11032929
9 0.8888167 0.04530598 0.08089825
10 0.8926815 0.61736284 0.19834310
11 50.2300000 45.23000000 21.00000000
How do I subset so I only get the columns with CV>50 in the last row? So my new data.frame would be:
X1
1 0.1097075
2 0.6081752
3 0.9912855
4 0.8543848
5 0.2579654
6 0.6203840
7 0.6643195
8 0.5760624
9 0.8888167
10 0.8926815
11 50.230000
We can do
# The last row compared against 50 gives a logical vector that selects columns;
# drop = FALSE keeps the result a data frame even when only one column matches.
a[, a[nrow(a), ] > 50, drop = FALSE]
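An equivalent base-R alternative, if you prefer to spell the condition out (same result, just a different idiom):
Filter(function(col) col[length(col)] > 50, a)  # keep columns whose last value exceeds 50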
I want to compute a linear regression on every column (or on a selected set of columns) of a dataset. The first column represents the X axis of the regression; each of the other columns holds one subject's responses. The second step is to extract, for each subject, the coefficients of the regression (linear or logistic).
At the moment I do this manually for each column, using lm() (or glm()) and extracting the coefficients into separate variables and a data frame.
Example using lm:
dataset <- as.data.frame(matrix(c(
   1,  1,  3,  7,    2,  1,  4,  5,    3,  2,  4,  6,
   4,  2,  5,  8,    5,  5,  9,  9,    6,  4, 12, 10,
   7,  6, 15, 11,    8,  6, 15, 15,    9,  8, 16, 10,
  10,  9, 18,  9,   11, 12, 20, 12,   12, 15, 21, 16,
  13, 18, 22, 15,   14, 22, 21, 10,   15, 29, 24, 12
), nrow = 15, ncol = 4, byrow = TRUE))
colnames(dataset) <- c("X","Sj1","Sj2","Sj3")
Output:
dataset
X Sj1 Sj2 Sj3
1 1 1 3 7
2 2 1 4 5
3 3 2 4 6
4 4 2 5 8
5 5 5 9 9
6 6 4 12 10
7 7 6 15 11
8 8 6 15 15
9 9 8 16 10
10 10 9 18 9
11 11 12 20 12
12 12 15 21 16
13 13 18 22 15
14 14 22 21 10
15 15 29 24 12
Regressions:
attach (dataset)
mod1 <- lm(Sj1~X)
mod2 <- lm(Sj2~X)
mod3 <- lm(Sj3~X)
Intercept <- 0
Intercept[1] <- mod1$coefficients[[1]]
Intercept[2] <- mod2$coefficients[[1]]
Intercept[3] <- mod3$coefficients[[1]]
Slope <- 0
Slope[1] <- mod1$coefficients[[2]]
Slope[2] <- mod2$coefficients[[2]]
Slope[3] <- mod3$coefficients[[2]]
data.frame(Intercept,Slope,row.names=colnames(dataset)[-1])
and the final output is
Intercept Slope
Sj1 -4.580952 1.7392857
Sj2 1.104762 1.6035714
Sj3 6.104762 0.5285714
Is there a way to do this automatically, independently of the number of columns? I tried apply() and writing a function, without success.
What is the best way to do this?
lm accepts a matrix on the LHS. See the documentation.
# Build the formula cbind(Sj1,Sj2,Sj3) ~ X from the column names
f <- as.formula(paste0("cbind(", paste(names(dataset)[-1], collapse = ","), ") ~ X"))
mods <- lm(f, data = dataset)
coef(mods)
# Sj1 Sj2 Sj3
#(Intercept) -4.580952 1.104762 6.1047619
#X 1.739286 1.603571 0.5285714
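To get the exact Intercept/Slope layout from your manual version, you can transpose the coefficient matrix (a small usage sketch; coefs and glm_coefs are my names). Note that the matrix-LHS shortcut is specific to lm; for glm (e.g. logistic regression) you would fit one model per column instead, e.g. via reformulate():
coefs <- as.data.frame(t(coef(mods)))
names(coefs) <- c("Intercept", "Slope")
coefs
#     Intercept     Slope
# Sj1 -4.580952 1.7392857
# Sj2  1.104762 1.6035714
# Sj3  6.104762 0.5285714

# glm version: one model per response column
glm_coefs <- t(sapply(names(dataset)[-1], function(nm)
  coef(glm(reformulate("X", nm), data = dataset))))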
PS: You should get out of the habit of using attach.
I am using approx() to interpolate values.
x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)
df <- cbind.data.frame(x,y)
> df
x y
1 1 3
2 2 8
3 3 2
4 4 6
5 5 8
6 6 2
7 7 4
8 8 7
9 9 9
10 10 9
11 11 1
12 12 3
13 13 1
14 14 9
15 15 6
16 16 2
17 17 8
18 18 7
19 19 6
20 20 2
interpolated <- approx(x=df$x, y=df$y, method="linear", n=5)
gets me this:
interpolated
$x
[1] 1.00 5.75 10.50 15.25 20.00
$y
[1] 3.0 3.5 5.0 5.0 2.0
Now, the first and last values are duplicates of my real data. Is there any way to prevent this, or is it something I don't understand properly about approx()?
You may want to specify xout to avoid this. For instance, if you want to always exclude the first and the last points, here's how you can do that:
specify_xout <- function(x, n) {
seq(from=min(x), to=max(x), length.out=n+2)[-c(1, n+2)]
}
plot(df$x, df$y)
points(approx(df$x, df$y, xout=specify_xout(df$x, 5)), pch = "*", col = "red")
It does not prevent an interpolated point in the middle from coinciding with one of the existing points (which is exactly what can happen in the plot produced above).
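For reference, these are the xout values the helper generates here; the endpoints 1 and 20 are excluded:
specify_xout(df$x, 5)
# [1]  4.166667  7.333333 10.500000 13.666667 16.833333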
approx() will fit through all your original data points if you give it a chance (change n=5 to xout=df$x to see this). Interpolation is the process of generating y values for unobserved values of x, but the results agree with the data wherever an x value has been observed before.
With method="linear", approx() 'draws' straight segments that join your original coordinates exactly, so it returns your input y values at integer x. You only observe 'new' y values because n=5 places all output points other than the first and last at non-integer x (which are not among your input values), and so they get interpolated.
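A quick check of that claim: evaluating approx() at the original x values reproduces the original y exactly.
out <- approx(df$x, df$y, xout = df$x, method = "linear")
all.equal(out$y, df$y)
# [1] TRUE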
If you want the observed values not to be reproduced exactly, then maybe add some noise via rnorm()?
I am looking to detrend multivariate time series data under a common trend in R.
Time series data sample:
> head(d)
T x1 x2 x3 x4
1 1 2 4 3 1
2 2 3 5 4 4
3 3 6 6 6 6
4 4 8 9 10 7
5 5 10 13 20 9
I would like to detrend the multivariate time series dataset d above under a common trend. I hope the problem I am facing is clear.
Thanks!
You can use multivariate regression and solve for the constants. Because the trend is assumed common, the coefficients are the same for every series (i.e. in Y = Xβ, where X holds an intercept column and the time index, the columns of the 2 × k coefficient matrix β are constrained to be identical), and you need to account for that constraint. The simplest way is to string all the Y columns together into one long vector.
Y <- as.matrix(d[, -1])   # drop the time column T and keep the series
dvec <- as.numeric(Y)     # stack the k series into one long vector
n <- nrow(Y)
k <- ncol(Y)
x <- rep(d$T, k)          # the common time index, repeated once per series
model <- lm(dvec ~ x)
Then you can recover the detrended series as a matrix of residuals:
detrended <- matrix(model$residuals, nrow = n, dimnames = list(NULL, colnames(Y)))
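If you want the result back in the original layout, with the time column reattached (d_detrended is my name for it):
d_detrended <- data.frame(T = d$T, detrended)
head(d_detrended)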