I have a set of observations on an irregular grid. I want to put them on a regular grid with a resolution of 5. Here is an example:
d <- data.frame(x=runif(1e3, 0, 30), y=runif(1e3, 0, 30), z=runif(1e3, 0, 30))
## interpolate xy grid to change irregular grid to regular
library(akima)
d2 <- with(d,interp(x, y, z, xo=seq(0, 30, length = 500),
yo=seq(0, 30, length = 500), duplicate="mean"))
How can I get d2 into the SpatialPixelsDataFrame class, with three columns: the two coordinates and the interpolated values?
You can use code like this (thanks to the comment by @hadley):
d3 <- data.frame(x=d2$x[row(d2$z)],
y=d2$y[col(d2$z)],
z=as.vector(d2$z))
The idea here is that a matrix in R is just a vector with a bit of extra information about its dimensions. The as.vector call drops that information, turning the 500x500 matrix into a linear vector of length 500*500=250000. The subscript operator [ does the same, so although row and col originally return a matrix, that is treated as a linear vector as well. So in total, you have three matrices, turn them all to linear vectors with the same order, use two of them to index the x and y vectors, and combine the results into a single data frame.
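To make the row/col indexing concrete, here is a minimal sketch on a toy grid. The list `grid` is a hypothetical stand-in for the `interp()` result, with only 2 x values and 3 y values so the output can be checked by eye.

```r
# Toy stand-in for the interp() result: 2 x values, 3 y values
grid <- list(x = c(10, 20),
             y = c(1, 2, 3),
             z = matrix(1:6, nrow = 2))   # z[i, j] belongs to (x[i], y[j])

# row() and col() give the row and column number of every cell;
# indexing with them flattens everything in the same column-major order
long <- data.frame(x = grid$x[row(grid$z)],
                   y = grid$y[col(grid$z)],
                   z = as.vector(grid$z))
long
```

Each row of `long` is one grid cell, so the value at x = 20, y = 3 is `grid$z[2, 3]`.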
My original solution didn't use row and col, but instead rep to formulate the x and y columns. It is a bit more difficult to understand and remember, but might be a bit more efficient, and give you some insight useful for more difficult applications.
d3 <- data.frame(x=rep(d2$x, times=500),
y=rep(d2$y, each=500),
z=as.vector(d2$z))
For this formulation, you have to know that a matrix in R is stored in column-major order. The second element of the linearized vector therefore is d2$z[2,1], so the row number changes between two subsequent values, while the column number stays the same for a whole column. Consequently, you want to repeat the x vector as a whole, but repeat each element of y by itself. That's what the two rep calls do.
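A small check of the column-major claim, and of the equivalence between the rep version and the row/col version, might look like this. The names `m`, `by_rep` and `by_idx` are made up for the illustration.

```r
m <- matrix(1:6, nrow = 2)            # 2 rows, 3 columns
as.vector(m)[2] == m[2, 1]            # TRUE: the second linear element is row 2, column 1

x <- c(10, 20)                        # one x value per row of m
y <- c(1, 2, 3)                       # one y value per column of m

# x is repeated as a whole, each y value is repeated for a full column
by_rep <- data.frame(x = rep(x, times = length(y)),
                     y = rep(y, each  = length(x)))

# the same coordinates built with row() and col()
by_idx <- data.frame(x = x[row(m)], y = y[col(m)])

identical(by_rep, by_idx)
```

Using `length(x)` and `length(y)` instead of a hard-coded 500 keeps the rep version correct if the grid size changes.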
Related
EDITED: Suppose I have a symmetric matrix such as the one below.
dat<-c(NA,2,3,4,5,2,NA,8,9,10,3,8,NA,14,15,4,9,14,NA,20,5,10,15,20,NA)
x<-matrix(dat,nrow = 5,dimnames = list(c("A","B","C","D","E"),c("A","B","C","D","E")))
x
I'm trying to see whether R can reorder the matrix so that the highest values are closest to the diagonal, with the maximum value of each column of the lower triangle as the first item in the diagonal, while keeping the matrix symmetric. This is a problem in card sorting.
Here is the desired output:
result<-c(NA,20,15,10,5,20,NA,14,9,4,15,14,NA,8,3,10,9,8,NA,2,5,4,3,2,NA)
y<-matrix(result,nrow = 5,dimnames = list(c("E","D","C","B","A"),c("E","D","C","B","A")))
y
I had a similar requirement when examining a matrix containing similarities between documents.
k <- apply(x, 1, max, na.rm=TRUE)
order <- sort(k, decreasing=TRUE, index.return=TRUE)$ix
x[order, order]
I use max on each row to find the maximum value per row. na.rm ensures that the diagonal is not considered. sort then provides the desired order as a vector. Reorganising the matrix according to that order is as simple as x[order, order].
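The same idea can be sketched with order(), which avoids naming a variable `order` and shadowing the base function. The names `row_max`, `ord` and `x2` are illustrative. Note that rows D and E both have a maximum of 20, so their relative order depends on how ties are broken and may not match the desired output exactly.

```r
dat <- c(NA, 2, 3, 4, 5,  2, NA, 8, 9, 10,  3, 8, NA, 14, 15,
         4, 9, 14, NA, 20,  5, 10, 15, 20, NA)
x <- matrix(dat, nrow = 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))

row_max <- apply(x, 1, max, na.rm = TRUE)   # largest off-diagonal value per row
ord <- order(row_max, decreasing = TRUE)    # permutation, largest row maximum first

x2 <- x[ord, ord]                           # same permutation on rows and columns
identical(x2, t(x2))                        # symmetry is preserved
```

Applying one permutation to both rows and columns is what keeps the matrix symmetric.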
I am working in R with a series of data values that have an x position (distance along a transect) and a z position (distance from the ground for a given x position). There is not a measurement at every x, z coordinate, and for the analysis I need to perform, I need to code a 0 wherever one is missing. Here is a short code example; real data is usually 14,000-20,000 rows. In Matlab we solve this issue by creating an empty matrix and filling it. I need an x,z matrix normalized to max(z). So in the sample below, max z is 8 and max x is 4, so I need a 4 x 8 matrix where 0 is entered wherever no value is given. I'm just not sure of the best, most efficient way to do this in R.
x <- c(1,1,1,1,1,2,2,3,3,4,4,4)
z <- c(1,4,5,6,7,1,4,2,8,1,2,5)
value <- c(9,9,9,9,9,9,9,9,9,9,9,9)
data.frame(x,z, value)
Thanks ahead of time!
In R you would do it much the same way as you describe in Matlab. First, create a matrix with all zeroes:
df <- data.frame(x, z, value)
mat <- matrix(0, 4, 8)
And then the tricky part, where you build a two-column matrix of (row, column) indices that selects the elements to fill:
mat[cbind(df$x, df$z)] <- df$value
What the cbind is essentially doing is creating a 2-column matrix that is used to identify a set of elements in the matrix, and then assigning the corresponding value.
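A self-contained sketch of the same matrix-indexing step, with the dimensions taken from the data rather than hard-coded. The name `idx` is illustrative.

```r
x <- c(1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4)
z <- c(1, 4, 5, 6, 7, 1, 4, 2, 8, 1, 2, 5)
value <- rep(9, 12)

# start from all zeroes, sized by the largest x and z positions
mat <- matrix(0, nrow = max(x), ncol = max(z))

# each row of idx is one (row, column) pair; value[k] goes to mat[x[k], z[k]]
idx <- cbind(x, z)
mat[idx] <- value
mat
```

Cells that never appear in `idx` keep their initial 0.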
I am trying to calculate some statistics for a moving window and am using rollapply in the zoo package. My question is how do I get rollapply to apply that function to the previous n observations instead of the current observation and the previous n-1 observations as align right seems to do.
require(zoo)
z <- data.frame(x1=11:111, x2=111:211, x3=as.Date(31:131))#generate data
output<-data.frame(dates=z$x3,
rollapply(z[,1:2],by.column=TRUE, 5, max, fill=NA, align='right'))
I have a hunch this is answered by ?rollapply: "If width is a plain numeric vector its elements are regarded as widths to be interpreted in conjunction with align whereas if width is a list its components are regarded as offsets. In the above cases if the length of width is 1 then width is recycled for every by-th point. If width is a list its components represent integer offsets such that the i-th component of the list refers to time points at positions i + width[[i]]." But I have no idea what that means in terms of R code, and no example is provided.
Never mind, I deciphered the 'help.' Adding the parameter width to rollapply like this:
width=list(-1:-5)
accomplishes it.
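To spell out what the offsets -1:-5 mean, here is a plain base R loop that computes the maximum of the previous five observations, excluding the current one. It is a sketch of the intended semantics as I read the help text, not zoo's implementation; `n_prev` and `prev_max` are made-up names.

```r
x1 <- 11:111                 # same x1 column as in the example data
n_prev <- 5                  # how many previous observations to use

prev_max <- rep(NA_real_, length(x1))
for (i in seq_along(x1)) {
  if (i > n_prev) {
    # positions i-5 ... i-1, i.e. offsets -5:-1 relative to i
    prev_max[i] <- max(x1[(i - n_prev):(i - 1)])
  }
}
head(prev_max, 7)
```

The first five positions stay NA because they do not have five earlier observations, which is the role fill=NA plays in the rollapply call.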
If I'm reading correctly, you just want the column "shifted" down by one, so that the value for row n is the value that row n-1 has now.
This can be done with a lag function. Note that base R's lag only changes the time attributes of a series and leaves the values of a plain vector where they are, so the code below uses lag from the dplyr package, which does shift the values:
z <- data.frame(x1=11:111, x2=111:211, x3=as.Date(31:131))#generate data
output<-data.frame(dates=z$x3,
rollapply(z[,1:2],by.column=TRUE, 5, max, fill=NA, align='right'))
output$x1 <- dplyr::lag(output$x1, 1)
output$x2 <- dplyr::lag(output$x2, 1)
I have a question and hope that some of you can help me. The issue is this: for a given data frame that includes a vector y of length n and a factor f with k different levels, I want to assign a new variable z which has length k to the data frame, based on f.
Example:
df <- data.frame(y=rnorm(12), f=rep(1:3, length.out=12))
z <- c(-1,0,5)
Note that my real z has been constructed to correspond to the unique factor levels, which is why length(z) = length(unique(df$f)). I now want to create a vector of length n=12 that contains the value of z that corresponds to the factor level f. (Note: my real factor values are not ordered like in the above example, so just repeating the vector z won't work.)
Now, an obvious solution would be to create a vector f outside the data frame, combine it with z in a new data frame, and then use merge. For instance,
newdf <- data.frame(z=z, f=c(1,2,3))
df <- merge(df, newdf, by="f")
However, I need to repeat this procedure several thousand times, and this merge solution seems like shooting with cannons at microbes. Hence my question: there almost surely is an easier and more efficient way to do this, but I just don't know how. Could anyone point me in the right direction? I am looking for something like the "inverse" of aggregate or by.
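Assuming that the values in z are in the same order as the levels of f:
# factor() is needed in R >= 4.0, where data.frame() keeps strings as character
df <- data.frame(y=rnorm(12), f=factor(sample(c("a","b","c"),12,replace=TRUE)))
z <- c(-1,0,5)
df$newz<-z[df$f]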
In case this is not clear: this works because factors are stored under the covers as integers. When you index z with that vector of factors you are effectively indexing with the underlying integers, which point to the right z value for that factor value.
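A tiny deterministic example of indexing by a factor, plus a named-vector variant that does not rely on the integer codes. It assumes z is given in the order of levels(f); `z_by_level` is an illustrative name.

```r
f <- factor(c("b", "a", "c", "a"))
z <- c(-1, 0, 5)             # assumed order: levels(f), i.e. "a", "b", "c"

as.integer(f)                # 2 1 3 1: the underlying integer codes
z[f]                         # 0 -1 5 -1: indexing by a factor uses those codes

# alternative: attach the level names to z and look values up by name
z_by_level <- setNames(z, levels(f))
unname(z_by_level[as.character(f)])
```

The named lookup gives the same result and also works when f is a plain character vector rather than a factor.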
I'm trying to find an apply() type function that can run a function that operates on two arrays instead of one.
Sort of like:
apply(X1 = doy_stack, X2 = snow_stack, MARGIN = 2, FUN = r_part(a, b))
The data is a stack of band arrays from Landsat tiles that are stacked together using rbind. Each row contains the data from a single tile, and in the end, I need to apply a function on each column (pixel) of data in this stack. One such stack contains whether each pixel has snow on it or not, and the other stack contains the day of year for that row. I want to run a classifier (rpart) on each pixel and have it identify the snow free day of year for each pixel.
What I'm doing now is pretty silly: mapply(paste, doy, snow_free) concatenates the day of year and the snow status together for each pixel as a string, apply(strstack, 2, FUN) runs the classifier on each pixel, and inside the apply function, I'm exploding each string using strsplit. As you might imagine, this is pretty inefficient, especially on 1 million pixels x 300 tiles.
Thanks!
I wouldn't try to get too fancy. A for loop might be all you need.
n <- ncol(doy_stack)  # one column per pixel
out <- numeric(n)
for(i in seq_len(n)) {
out[i] <- snow_free(doy_stack[,i], snow_stack[,i])
}
Or, if you don't want to do the bookkeeping yourself,
sapply(1:n, function(i) snow_free(doy_stack[,i], snow_stack[,i]))
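Here is a runnable toy version of the column-by-column approach. The two small matrices and the function `first_snow_free` are hypothetical stand-ins for the real stacks and the rpart-based classifier; vapply is used instead of sapply so the result type is fixed.

```r
# 3 tiles (rows) x 2 pixels (columns)
doy_stack  <- matrix(c(10, 50, 90,   20, 60, 100), nrow = 3)
snow_stack <- matrix(c( 1,  0,  0,    1,  1,   0), nrow = 3)   # 1 = snow, 0 = snow free

# hypothetical per-pixel function: earliest day of year without snow
first_snow_free <- function(doy, snow) min(doy[snow == 0])

out <- vapply(seq_len(ncol(doy_stack)),
              function(i) first_snow_free(doy_stack[, i], snow_stack[, i]),
              numeric(1))
out
```

Because the index i selects the same column from both matrices, no string pasting or splitting is needed.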
I've just encountered the same problem and, if I understood the question correctly, I may have solved it using mapply.
We'll use two 10x10 matrices populated with uniform random values.
set.seed(1)
X <- matrix(runif(100), 10, 10)
set.seed(2)
Y <- matrix(runif(100), 10, 10)
Next, determine how operations between the matrices will be performed. If it is row-wise, you need to transpose X and Y then cast to data.frame. This is because a data.frame is a list with columns as list elements. mapply() assumes that you are passing a list. In this example I'll perform correlation row-wise.
res.row <- mapply(function(x, y){cor(x, y)}, as.data.frame(t(X)), as.data.frame(t(Y)))
res.row[1]
V1
0.36788
should be the same as
cor(X[1,], Y[1,])
[1] 0.36788
For column-wise operations exclude the t():
res.col <- mapply(function(x, y){cor(x, y)}, as.data.frame(X), as.data.frame(Y))
This obviously assumes that X and Y have dimensions consistent with the operation of interest (i.e. they don't have to be exactly the same dimensions). For instance, one could require a statistical test row-wise but having differing numbers of columns in each matrix.
Wouldn't it be more natural to implement this as a raster stack? With the raster package you can use entire rasters in functions (eg ras3 <- ras1^2 + ras2), as well as extract a single cell value from XY coordinates, or many cell values using a block or polygon mask.
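apply can work on arrays with more than two dimensions, but not on a list, so the two stacks have to be combined into a 3-D array first. Not sure how your data is set up, but something like this might be what you are looking for:
stack3d <- array(c(doy_stack, snow_stack), dim = c(dim(doy_stack), 2))
apply(stack3d, c(1,2), function(x) r_part(x[1], x[2]))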
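Since apply needs an array, one way to hand it both stacks at once is to bind them into a 3-D array and apply over the pixel dimension. The sketch below assumes both matrices have the same dimensions; the toy data and the min-based function are hypothetical stand-ins for the real stacks and classifier.

```r
# 3 tiles (rows) x 2 pixels (columns)
doy_stack  <- matrix(c(10, 50, 90,   20, 60, 100), nrow = 3)
snow_stack <- matrix(c( 1,  0,  0,    1,  1,   0), nrow = 3)   # 1 = snow, 0 = snow free

# third dimension: layer 1 = day of year, layer 2 = snow flag
stack3d <- array(c(doy_stack, snow_stack), dim = c(dim(doy_stack), 2))

# MARGIN = 2 hands the function one pixel at a time as a tiles x 2 matrix
out <- apply(stack3d, 2, function(m) min(m[m[, 2] == 0, 1]))
out
```

With MARGIN = 2 the function sees all tiles for one pixel, which matches the per-pixel classification described in the question; MARGIN = c(1, 2) would instead pass a single (doy, snow) pair per call.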