This question is for a project and has nothing to do with homework or academics. I am a working statistician.
So my question is: how would you write an R function that, given a matrix with 400 rows and two columns, where each consecutive block of 20 rows (starting from the first) forms one row of coordinates in a grid of points like below, returns the four corners of each individual square/rectangle within the grid?
Hence the output would have four columns, and each row would indicate one rectangle. I am only looking at adjacent rectangles of the same size. For example, suppose the numbers below denote the row indices of the example matrix (which has two columns):
Example of row indices:
1 2 3
4 5 6
7 8 9
Would have to be traversed in the following order:
[1,2,4,5],[2,3,5,6],[4,5,7,8],[5,6,8,9] and
return the corresponding 2D points from the example input data set, which would have 9 rows and 2 columns. Here the grid is specified to be 3 by 3, while in my actual example the grid is 20 by 20 and my input dataset is 400 rows by 2 columns. If you look at the traversed result, there is a pattern where the row indices in each 4-point block are incremented by 1. I just want to generalize this to the 400 by 2 case, or any setting where there is a 2-column matrix of points and a given grid dimension.
Here is a solution, if I have understood you correctly. It was a very interesting problem, to be honest. :D
The idea is to make a box of a given edge length and then move this box around the grid, recording its vertices. Please see the following:
# Assuming the grid is always a square grid.
grid.size <- 20
# The matrix of row indices.
rindex.grid <- matrix(1:(grid.size * grid.size),
nrow=grid.size, ncol=grid.size, byrow=TRUE)
# We can traverse the grid by moving any given square either right or down in any
# single move. We choose to go right.
move.square.right <- function (this.square, steps=1) {
  this.square + steps
}
# Going right, capture co-ordinates of all squares in this row.
collect.sq.of.edge.length.in.row.number <- function (grid.size, elength,
rownum=1) {
first.square.in.row <- (rownum - 1) * grid.size + c(1, elength)
first.square.in.row <- c(first.square.in.row,
first.square.in.row + grid.size * (elength - 1))
squares.in.row <- t(sapply(X=seq_len(grid.size - (elength - 1)) - 1,
FUN=move.square.right,
this.square=first.square.in.row))
squares.in.row
}
# Now we go down the rows, using the function above to collect the squares in
# each row. Then we rbind the per-row squares into one matrix. What we get is a
# ((grid.size - (elength - 1))^2) x 4 matrix where each row holds the
# co-ordinates of a square of edge length elength.
collect.sq.of.edge.length.in.grid <- function (grid.size, elength) {
all.squares <- lapply(X=seq_len(grid.size - (elength - 1)),
FUN=collect.sq.of.edge.length.in.row.number,
grid.size=grid.size, elength=elength)
all.squares <- do.call(rbind, all.squares)
all.squares
}
This seems to show that we are getting the right number of boxes for all edge lengths:
tmp <- lapply(1:20, collect.sq.of.edge.length.in.grid, grid.size=grid.size)
sapply(tmp, nrow)
[1] 400 361 324 289 256 225 196 169 144 121 100 81 64 49 36 25 16 9 4 1
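As an extra sanity check, these counts match (grid.size - (elength - 1))^2 for elength = 1, ..., 20:
all(sapply(tmp, nrow) == (20:1)^2)
[1] TRUE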
Plus, it works well in your 3x3 example:
collect.sq.of.edge.length.in.grid(grid.size=3, elength=2)
[,1] [,2] [,3] [,4]
[1,] 1 2 4 5
[2,] 2 3 5 6
[3,] 4 5 7 8
[4,] 5 6 8 9
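Since each row of that matrix holds row indices into your 400 x 2 input, recovering the actual 2-D corner points is a single indexing step; a minimal sketch, where pts stands for your input matrix (a hypothetical name):
squares <- collect.sq.of.edge.length.in.grid(grid.size=20, elength=2)
corner.points <- function(i, pts) pts[squares[i, ], ] # 4 x 2 matrix of corners of square i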
If you want to create a movable 20 x 20 "window" that can scroll down and/or across a 400x400 space, then use:
mcorners <- function(xidx, yidx) mat[xidx:(xidx+19),
                                     yidx:(yidx+19)]
mcorners(1,1) # should return mat[1:20, 1:20]
Then supply mcorners() with arguments to fit your somewhat vaguely described needs. The traversal down the first column might involve:
sapply(1:381, function(ix) yourfunc( mcorners(ix, 1) ) )
I will try not to complicate things too much with my explanations, but I'm confused about how best to fill a triangular correlation matrix (with no repeated values) with existing correlation values derived from another package. This involves extracting specific values from a list of text files. This is what I have done so far:
# read in the list of file names (they are named '1_1', '1_2', ... up to '47_48',
# with no repeats); read_table() is from the readr package
library(readr)
filenames <- read_table('/home/filenames.txt', col_names = 'file_id')
# create symmetrical matrix
M <- diag(48)
ct <- 1
for (sub in (filenames$file_id)) {
  subj <- read.table(paste0(dat_dir, '/ht_', sub, '.HEreg'), sep="", fill=TRUE)
  # extract the specific value in that column for each text file
  ht <- as.character(subj$V2[grep("rG", subj$V1)])
  M[ct,] <- as.numeric(ht) # input this value into the appropriate location
  ct <- ct + 1
}
This obviously does not give me the triangular output I envision. I know there is an error in how the variable ht is inserted into the matrix, but I am not sure how to fix it. Ideally, the correlation value from file 1_1 should be inserted at row 1, col 1; the one from file 1_2 at row 2, col 1; and so forth, avoiding repeats (those should be 0's).
Should I turn to nested loops?
Much help would be appreciated from this R newbie here, I hope I didn't complicate things unnecessarily!
I think the easiest way would be to read in all your values into a vector. You can do this using a variation of your existing loop.
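For instance, something along these lines (only a sketch, reusing the file-reading idiom from your question; dat_dir and the .HEreg layout are taken from your code):
# collect one rG value per file, in file order, into a vector
x <- sapply(filenames$file_id, function(sub) {
  subj <- read.table(paste0(dat_dir, '/ht_', sub, '.HEreg'), sep = "", fill = TRUE)
  as.numeric(as.character(subj$V2[grep("rG", subj$V1)]))
})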
Let us assume that your desired correlation matrix is 5x5 (I know you have 48x48 judging by your code, but to keep the example simple I will work with a smaller matrix).
Let us also assume that you have read all of your correlation values into the vector x in column-major order (the same order R uses), i.e. the first element of x is row 2 column 1, the second element is row 3 column 1, and so on. I am further assuming that you are creating a symmetric correlation matrix with ones on the diagonal, which is why the indexing starts below the diagonal and why you use the diag() function. Let's assume your vector x contains the following values:
x <- 1:10
I know that these are not correlations, but they will make it easy to see how we fill the matrix, i.e. which vector element goes into which position in the resulting matrix.
Now, let us create the identity matrix and zero matrices for the upper and lower triangular correlations (off diagonal).
# Assuming 5x5 matrix
n_elements <- 5
m <- diag(n_elements)
m_upper <- m_lower <- matrix(0, n_elements, n_elements)
To quickly fill the lower triangle, we can use lower.tri():
m_lower[lower.tri(m_lower, diag = FALSE)] <- x
This will yield the following output:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 1 0 0 0 0
[3,] 2 5 0 0 0
[4,] 3 6 8 0 0
[5,] 4 7 9 10 0
As you can see, we have successfully filled the lower triangle. Also note the order in which the elements of the vector are filled into the matrix; this is crucial for your results to be correct. The upper triangle is simply the transpose of the lower triangle, and then we can add our three matrices together to form your symmetric correlation matrix.
m_upper <- t(m_lower)
M <- m_lower + m + m_upper
Which yields the desired output:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 2 3 4
[2,] 1 1 5 6 7
[3,] 2 5 1 8 9
[4,] 3 6 8 1 10
[5,] 4 7 9 10 1
As you can see, there is no need for nested loops to fill these matrices. The only loop you need is the one reading the results from the files (which it appears you have a handle on). If you only want the triangular output, you can simply stop at the lower triangular matrix above. If your vector of estimated correlations (in my example x) includes the diagonal elements, simply set diag = TRUE in the lower.tri() function and you are good to go.
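In that case the fill is again a one-liner; a small sketch, where x15 is a hypothetical vector of the 15 values for the 5x5 case, diagonal included:
m <- matrix(0, n_elements, n_elements)
m[lower.tri(m, diag = TRUE)] <- x15
M <- m + t(m) - diag(diag(m)) # subtract once so the diagonal isn't counted twice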
Suppose I have a vector V1 (with two or more elements):
V1 <- 1:10
I can reorder the original vector with the function sample. This function, however, cannot guarantee that no element of the new vector ends up in the same position as in the original vector. For example:
set.seed(4)
V2 <- sample(V1)
This results in a vector that has two elements in the same position as the original one:
V1[V1 == V2]
[1] 3 5
My question is: is it possible to generate a random vector such that no element ends up in the same position in the two vectors?
Your requirement that no element stays at its original index means you don't want a purely random permutation, where that can happen (you want what is known as a derangement). The best I could come up with is to just loop, using sample until we find a vector where every element shifts:
v1 <- 1:10
v1_perm <- v1
cnt <- 0
while (sum(v1 == v1_perm) > 0) {
v1_perm <- sample(v1)
cnt <- cnt + 1
}
v1
v1_perm
paste0("It took ", cnt, " tries to find a suitable vector")
[1] 1 2 3 4 5 6 7 8 9 10
[1] 3 10 4 7 8 1 6 2 5 9
[1] "It took 3 tries to find a suitable vector"
Note that I have implemented the requirement of shifting positions as shifting values. That isn't strictly the same thing, because two values could be equal. But assuming all your entries are unique, checking for zero overlap of values is equivalent to checking for zero overlap of indices.
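If your entries can repeat, the same rejection loop can instead be run on the indices; a sketch along the same lines:
# permute positions rather than values, so duplicate values are handled correctly
idx <- seq_along(v1)
perm <- idx
while (any(perm == idx)) {
  perm <- sample(idx)
}
v1_perm <- v1[perm]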
I have some data in a 3D grid identified by simple i,j,k locations (no real-world spatial information). These data are in a RasterStack right now.
b <- stack(system.file("external/rlogo.grd", package="raster"))
# add more layers
b <- stack(b,b)
# dimensions
dim(b)
[1] 77 101 6
yields 77 rows, 101 columns, 6 layers.
# upscale by 2
up <- aggregate(b,fact=2)
dim(up)
[1] 39 51 6
yields 39 rows, 51 columns, 6 layers.
Hoped-for behavior: 3 layers.
I'm looking for a method to aggregate across layers in addition to the present behavior, which is to aggregate within each layer. I'm open to other data structures, but would prefer an existing upscaling/resampling/aggregation algorithm to one I write myself.
Potentially related are http://quantitative-advice.gg.mq.edu.au/t/fast-way-to-grid-and-sum-coordinates/110/5 or the spacetime package, which assumes the layers are temporal rather than spatial, adding more complexity.
Suppose you define an agg.fact variable holding the value 2:
agg.fact <- 2
up <- aggregate(b, fact = agg.fact)
dim(up)
[1] 39 51 6
Now we generate a table indicating which layers will be aggregated together, based on agg.fact:
positions <- matrix(1:nlayers(b), nrow = nlayers(b)/agg.fact, byrow = TRUE)
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
And apply a function (in this case mean, but it could be max, sum, or another) to each pair of layers:
up2 <- stack(apply(positions, 1, function(x){
mean(b[[x[1]]], b[[x[2]]])
}))
dim(up2)
[1] 77 101 3
Or, if you want to aggregate in all 3 dimensions (choosing whether to aggregate the layers first and then rows/columns, or vice versa):
up3 <- stack(apply(positions, 1, function(x){
  aggregate(mean(b[[x[1]]], b[[x[2]]]), fact = agg.fact) # layers first, then rows/columns
  #mean(aggregate(b[[x[1]]], fact = agg.fact), aggregate(b[[x[2]]], fact = agg.fact)) # rows/columns first
}))
dim(up3)
[1] 39 51 3
I did not read the documentation correctly. To aggregate across layers:
For example, fact=2 will result in a new Raster* object with 2*2=4 times fewer cells. If two numbers are supplied, e.g., fact=c(2,3), the first will be used for aggregating in the horizontal direction, and the second for aggregating in the vertical direction, and the returned object will have 2*3=6 times fewer cells. Likewise, fact=c(2,3,4) aggregates cells in groups of 2 (rows) by 3 (columns) and 4 (layers).
It may be necessary to play with expand=TRUE vs expand=FALSE to get it to work, but this seems inconsistent (I have reported it as a bug).
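For the example above, that would look something like this sketch (relying on the documented length-3 fact behaviour, and ignoring the expand quirk just mentioned):
library(raster)
up <- aggregate(b, fact = c(1, 1, 2)) # leave rows/columns alone, merge layers in pairs
dim(up) # expected: 77 101 3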
I have two sets of matrices. Each matrix is 100x100 in dimension and I have 240 of them (imagine each matrix was collected in a month and I have a dataset composed of 240 months of 100x100 matrices).
The values in the matrices range from 1 to 15, representing vegetation types (grass, tropical forest, tundra etc).
My first set of matrices, m1, is my control experiment. My second set of matrices, m2, is a climate change experiment where changes in climate induce changes in the values of the matrices.
Therefore, the data is represented like this:
m1: set of 240 100x100 matrices, each matrix corresponding to a month (therefore 240 months of data). This is my control data
m2: same as m1, but the values are different because of some changes in climate. This is my experimental data.
Here is some data:
# generate dataset 1
set.seed(4)
someData1 <- round(runif(100 * 100 * 240, min=1, max=15),digits=0)
# generate dataset2
set.seed(5)
someData2 <- round(runif(100 * 100 * 240, min=1, max=15),digits=0)
# create matrices
k = 240; n=100; m = 100
m1 <- array(someData1, c(n,m,k))
m2 <- array(someData2, c(n,m,k))
What I would like to do is compare each cell of m2 relative to m1 in this way:
is the value different? yes/no
if yes, what was the change? for example 1 to 10, or 2 to 7 and so on.
and do the same for all 240 matrices in m2 relative to all 240 matrices in m1.
By the end, I would like to be able to:
have a binary matrix showing whether or not there has been changes in the values;
have a table with the frequency of changes in each class (i.e. 1 to 10, 2 to 7 etc).
Conceptually, what I need to achieve would be something like the figure in the original post, where for simplicity's sake I drew 5x5 matrices instead of 100x100 matrices.
How to achieve this in R?
To compare two matrices, use == or !=.
what.changed <- m1 != m2 # T if changed F if not
changes <- ifelse(what.changed, paste(m1, 'to', m2), NA)
changes # for your little matrices not the 100x100
[,1] [,2] [,3]
[1,] NA "7 to 10" "6 to 7"
[2,] NA NA NA
[3,] "3 to 4" "6 to 8" NA
Your matrices seem rather large, so some sort of sparse approach might be better. Instead of storing the changes as strings ("3 to 4") in a large matrix where most elements are NA, you could record only the cells where there is in fact a change. For example, you could create a CSV/dataframe summarising your changes (using your 100x100x240 matrices to demonstrate the 3 coordinates):
# find coordinates of changes
change.coords <- which(m1 != m2, arr.ind=T)
colnames(change.coords) <- c('x', 'y', 'time') # whatever makes sense to your application
changes <- data.frame(change.coords, old=m1[change.coords], new=m2[change.coords])
head(changes)
x y time old new
1 1 1 1 9 4
2 2 1 1 1 11
3 3 1 1 5 14
4 5 1 1 12 2
5 6 1 1 5 11
6 7 1 1 11 8
Then you can print it out as you wish without having to store heaps of strings ("X to Y") and NAs, e.g. (don't do this with your big example matrices; there are far too many changes and it will print them all):
with(changes, message(sprintf("Coords (%i, %i, %i): %i to %i\n",
x, y, time, old, new)))
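The frequency table from your second bullet then drops straight out of this dataframe, for example:
# count how often each "old to new" transition occurs
change.freq <- table(paste(changes$old, 'to', changes$new))
head(sort(change.freq, decreasing = TRUE))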
I have a distance matrix dMat and want to find the 5 nearest samples to the first one. What function can I use in R? I know how to find the closest sample (cf. 3rd line of code), but can't figure out how to get the other 4 samples.
The code:
Mat <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))
which(dMat[,1]==min(dMat[,1]))
The 3rd line of code finds the index of the closest sample to the first sample.
Thanks for any help!
Best,
Chega
You can use order to do this:
head(order(dMat[-1,1]),5)+1
[1] 10 3 4 8 6
Note that I removed the first one, as you presumably don't want to include the fact that your reference point is 0 distance away from itself.
Alternative using sort:
sort(dMat[,1], index.return = TRUE)$ix[1:6]
It would be nice to add a set.seed(.) call when generating the random matrix so that the results are reproducible; I will skip the results here.
Edit (a more general solution): the solutions above rank by the raw values in the first column, which is only right when the first element is the smallest (true for a genuine distance matrix, where dMat[1,1] is 0). Here is a solution that always gives the 5 values closest to the first element of the column:
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
Example:
> dMat <- matrix(c(70,4,2,1,6,80,90,100,3), ncol=1)
# James' solution
> head(order(dMat[-1,1]),5) + 1
[1] 4 3 9 2 5 # values are 1,2,3,4,6 (wrong)
# old sort solution
> sort(dMat[,1], index.return = TRUE)$ix[1:6]
[1] 4 3 9 2 5 1 # values are 1,2,3,4,6,70 (wrong)
# Correct solution
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
[1] 6 7 8 5 2 # values are 80,90,100,6,4 (right)
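For an actual distance matrix, dMat[1,1] is 0 and distances are non-negative, so the approaches coincide; a quick sketch using the random dMat from the question:
set.seed(1)
Mat <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))
identical(head(order(dMat[-1, 1]), 5) + 1,
          sort(abs(dMat[-1, 1] - dMat[1, 1]), index.return = TRUE)$ix[1:5] + 1)
# should print TRUE (no ties with continuous random data)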