Suppose that I have an n-row, m-column matrix A, and I want to reorder every column of A according to the sort order of one specific column.
For instance, order(A[,k]) gives me the numeric or alphabetical order of the elements in column k. I now want to sort every column in matrix A according to that ordering, so that elements 1...n in every column are arranged to correspond to elements 1...n (by rank) in column k. Is there a simple way to do this without looping over all columns?
Just use:
A[order(A[,k]),]
For example:
set.seed(21)
A <- matrix(rnorm(50),10,5)
A[order(A[,1]),]
To elaborate on @joshua's answer: I think the confusion may arise from the fact that you are ordering on a column but then passing that ordering as an index to the rows.
That's likely why you tried A[, order(A[,k])] instead of A[order(A[,k]),].
Contrary to its name, order(x) does not actually reorder x; it returns the permutation of indices that would put x in sorted order.
For example:
set.seed(1)
A <- matrix(sample(LETTERS[2:8], 24, T), ncol=6)
print(A, quote=F)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] C C F F G H
[2,] D H B D H C
[3,] F H C G D F
[4,] H F C E G B
order(A[, 2])
[1] 1 4 2 3
*Note that the output is only 4 elements long, which is the number of rows of A, not columns.*
The output essentially says that within column 2 of A,
the 1st element goes first,
the 4th element goes second,
the 2nd element goes third,
etc.
But each element of that column is attached to a row. We need to re-order the rows, not the columns.
To apply that ordering to the entire matrix (or data frame), we use the ordering as a row index:
rowIndex <- order(A[, 2])
# Note that these are all equivalent
A[rowIndex, ]
A[order(A[, 2]), ]
A[c(1, 4, 2, 3), ]
Lastly, we can pass order() more than one vector, and it will use subsequent vectors to break ties.
However, regardless of the number of columns from A we give it, order will still give us a single vector, equal in size to the number of rows of A:
# Order according to column 2; ties are left according to their original order
order(A[, 2])
[1] 1 4 2 3
# Order according to column 2; ties are ordered according to column 5
order(A[, 2], A[, 5])
[1] 1 4 3 2
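To make the difference between order() and sort() concrete, here is a minimal sketch on a small hand-built matrix (my own example, chosen so the result does not depend on the random number generator). It reuses the values of columns 2 and 5 shown above as a two-column matrix.

```r
# Hand-built two-column matrix (an assumption for illustration):
# column 1 holds C, H, H, F and column 2 holds G, H, D, G
A <- matrix(c("C", "H", "H", "F",
              "G", "H", "D", "G"), ncol = 2)

x <- A[, 1]
idx <- order(x)             # a permutation of row positions, not sorted values
sorted_by_index <- x[idx]   # indexing x with that permutation sorts it

# Reorder whole rows: sort on column 1, break ties using column 2
A_sorted <- A[order(A[, 1], A[, 2]), , drop = FALSE]
print(A_sorted, quote = FALSE)
```

Here idx is 1 4 2 3, and x[idx] gives the same values as sort(x). The two H rows end up with D before H in the second column.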
Related
I have a matrix, and I want to randomly pick 10% of the elements in the matrix, and store these elements into a dataframe indexing row, column and value.
I want to remark that I am interested in randomly sampling both row and column, so I am not interested in partial solutions to sample 10% of the rows and picking all the columns, or the other way around, sampling 10% of the columns and picking all the rows.
For example,
M = matrix(rnorm(30), 10, 3)
Given this matrix, that has 30 different elements, I would like to randomly sample 10% of them (0.1 * 30 = 3) and store those in a dataset of the form
row column value
4 2 x
7 1 x
2 1 x
You can sample linear indices from seq_along(M), recover the row and column of each with arrayInd, and cbind them with the corresponding values of the matrix.
i <- sample(seq_along(M), length(M) %/% 10)
cbind(arrayInd(i, dim(M)), M[i])
#cbind(arrayInd(i, dim(M), c("row", "column"), TRUE), value = M[i]) #Alternative with names
# [,1] [,2] [,3]
#[1,] 5 1 -0.72818419
#[2,] 9 1 1.14609041
#[3,] 2 2 0.01162598
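Since the question asks for a data frame with row, column and value columns, the same idea can be sketched as follows. The seed and the column names are my own choices, not part of the answer above.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
M <- matrix(rnorm(30), 10, 3)

k <- length(M) %/% 10               # 10% of the elements, rounded down
i <- sample(length(M), k)           # k distinct linear indices into M
rc <- arrayInd(i, dim(M))           # convert linear indices to (row, column)

res <- data.frame(row = rc[, 1], column = rc[, 2], value = M[i])
res
```

Because sample() draws without replacement by default, no element of M is picked twice.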
View the 2D matrix as one long 1D array, i.e. ravel it logically without actually flattening it.
Then draw 0.1 * len(matrix) * len(matrix[0]) random numbers (without repeats) from 0 to len(matrix) * len(matrix[0]) - 1.
For example, given one such random number, randVar,
it can be converted to a row and column (0-based, row-major):
row = randVar // len(matrix[0])   (integer division)
col = randVar % len(matrix[0])
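The pseudocode above assumes 0-based, row-major indexing. R matrices are 1-based and stored column by column, so the arithmetic changes slightly. A minimal sketch, using a small matrix of my own that stores its own linear indices so the result is easy to check:

```r
M <- matrix(1:12, nrow = 4, ncol = 3)   # M[i] == i for every linear index i

i <- c(1, 5, 8, 12)                     # some 1-based linear indices
row <- (i - 1) %% nrow(M) + 1           # position within the column
col <- (i - 1) %/% nrow(M) + 1          # which column

cbind(row, col)
```

This gives the same row and column pairs as arrayInd(i, dim(M)).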
An error message pops up when assigning values from dataframe A to matrix B.
A is a dataframe containing 9000 observations of 3 variables. The data are simulated values from 1000 iterations. Each iteration contains 9 values, i.e. 9 * 1000 = 9000.
V1 is the iteration ID, V2 is the variable name (which is not useful for now), and V3 is the variable I need.
I create a matrix B to hold values from A[,3]. However, the first value in each iteration is discarded, so only 8 values per iteration are kept.
B <- matrix(NA, nrow = 1000, ncol = 8)
for(i in 1:iter){
for(m in 1:8){
B[i,m] <- A[9*(i-1)+m+1,3]
}
}
Then I got the error message and couldn't figure it out. Any help, suggestions, or ideas are most welcome!
So, if I understand correctly, you basically want to fill the matrix row by row with all values of A[,3] except the first value of each group of 9 values.
Instead of using two for loops, you can fill the matrix directly with A[,3] when creating the matrix object B. R fills a matrix column by column, so you just have to transpose the matrix and remove the first column to get your result. The code looks like this:
B <- t(matrix(A$V3, nrow = 9, ncol = 1000))
B <- B[,-1]
Example
We define a dataframe A with 3 variables and 9000 observations:
A = data.frame(V1 = rnorm(9000),
V2 = rnorm(9000),
V3 = rnorm(9000))
> head(A)
V1 V2 V3
1 1.0755625 2.82414180 1.76860717
2 0.3421535 0.85857695 0.05682035
3 1.3747495 -0.01151905 0.90259357
4 1.1589849 0.91009114 0.35132258
5 -0.1107268 1.38244412 0.76163226
6 -1.5551836 1.27199029 -0.56923898
Then we apply the code above to generate B, and we can check that B is:
> head(B[,1:5])
[,1] [,2] [,3] [,4] [,5]
[1,] 0.05682035 0.9025936 0.35132258 0.7616323 -0.5692390
[2,] -0.75018285 -0.6160903 -1.43556979 -0.3983150 2.0722279
[3,] 0.97226064 1.5366989 0.06546405 -0.5666010 2.3127568
[4,] -0.66904980 -1.9877136 -0.49963116 0.9217295 -0.6338961
[5,] 0.42339924 -0.6077871 0.16467356 -0.3301223 -0.6031495
[6,] 0.82212429 0.3383385 -0.26872905 1.1513397 -0.2644223
Notice that the first row of B corresponds to the first values of A$V3 WITHOUT the very first one. If we check the dimensions of B, we see:
> dim(B)
[1] 1000 8
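As a sanity check, here is a scaled-down sketch (4 iterations of 3 values, with sizes and names of my own choosing) comparing the double loop from the question with a vectorized fill. It uses byrow = TRUE, which should be equivalent to filling by column and then transposing.

```r
n_iter <- 4   # hypothetical: number of iterations
n_per  <- 3   # hypothetical: values per iteration

A <- data.frame(V1 = rep(seq_len(n_iter), each = n_per),
                V2 = "x",
                V3 = as.numeric(seq_len(n_iter * n_per)))

# Loop version, as in the question (skips the first value of each iteration)
B_loop <- matrix(NA_real_, nrow = n_iter, ncol = n_per - 1)
for (i in seq_len(n_iter)) {
  for (m in seq_len(n_per - 1)) {
    B_loop[i, m] <- A[n_per * (i - 1) + m + 1, 3]
  }
}

# Vectorized version: one row per iteration, then drop the first column
B_vec <- matrix(A$V3, nrow = n_iter, ncol = n_per, byrow = TRUE)[, -1, drop = FALSE]

all(B_loop == B_vec)
```

Note that the loop here runs over seq_len(n_iter), a variable that is defined explicitly in the sketch.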
I have a dataframe with multiple rows, and I want to call a function on any two of its rows. For example, let's say I have this data and this myFunc, which accepts two args:
df <- data.frame(q1=c(1,2,5), q2=c(5,5,5), q3=c(5,2,5), q4=c(5,5,5), q5=c(2,3,1))
df
q1 q2 q3 q4 q5
1 1 5 5 5 2
2 2 5 2 5 3
3 5 5 5 5 1
myFunc<-function(a,b) sum((df[a,]==df[b,] & df[a,]==5)*1)
I want to apply myFunc to rows 1 and 2, i.e. myFunc(1,2), and I expect 2: myFunc computes how many 5s rows 1 and 2 have in common in the same column.
Since I have thousands of rows and I want to match all pairs, I want to do this without writing a for loop, maybe with do.call or the apply family of functions.
I tried this:
a=c(1,2) # match the row 1 and 2
b=c(2,3) # match the row 2 and 3
my_list=list(a,b)
do.call("myFunc", my_list)
But I got 4 instead of 2 and 2. Any ideas?
The question recently changed. My understanding of it is that the input should be a list of pairs of row numbers and the output should be the same length as that list such that each component of the output is the number of columns with both entries equal to 5 in both rows defined by the corresponding pair. Thus for df shown in the question the list L shown below would correspond to c(myFunc(1, 2), myFunc(2, 3)) where myFunc is as defined in the question.
L <- list(1:2, 2:3)
myFunc2 <- function(x) myFunc(x[1], x[2])
sapply(L, myFunc2)
## [1] 2 2
Note that *1 in myFunc is unnecessary since sum will coerce a logical argument to numeric.
An alternative might be to specify the first row numbers as a vector and the second row numbers as another vector. In terms of L that would be a <- sapply(L, "[", 1); b <- sapply(L, "[", 2). Then use mapply.
a <- c(1, 2) # L[[1]][1], L[[2]][1]
b <- c(2, 3) # L[[1]][2], L[[2]][2]
mapply(myFunc, a, b)
## [1] 2 2
Try passing the rows themselves instead of the row indices:
df <- data.frame(q1=c(1,2,5), q2=c(5,5,5), q3=c(5,2,5), q4=c(5,5,5), q5=c(2,3,1))
myFunc<-function(a,b) sum((a==b & a==5)*1)
myFunc(df[1,],df[2,])
This worked for me (returned 2)
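For the "all pairs" part of the question, one possible sketch (a technique I am swapping in, not taken from the answers above) turns the data into a 0/1 matrix marking the 5s and uses a matrix product to count shared 5s for every pair of rows at once. The names X, shared, pairs and result are my own.

```r
df <- data.frame(q1 = c(1, 2, 5), q2 = c(5, 5, 5), q3 = c(5, 2, 5),
                 q4 = c(5, 5, 5), q5 = c(2, 3, 1))

X <- (as.matrix(df) == 5) * 1       # 1 where the entry is 5, else 0
shared <- tcrossprod(X)             # shared[a, b] = columns where rows a and b are both 5

pairs <- t(combn(nrow(df), 2))      # every unordered pair of row numbers
result <- data.frame(a = pairs[, 1], b = pairs[, 2],
                     n_shared = shared[pairs])
result
```

For the three rows in the question this gives 2 for rows (1, 2), 3 for rows (1, 3) and 2 for rows (2, 3). With thousands of rows the full pair table gets large, so you may only want shared itself.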
I am currently working through an intro class and was having some difficulty with this particular problem:
Create a function that takes in a vector of numbers V.Size and a single number N as inputs and outputs a list object of size N where each list member is a vector that contains elements of V.Size such that the largest value in V.Size is in the vector of the first list item, the second largest value in V.Size is in the vector of the second list item, etc. The (N+1) ordered value of V.Size should be in the first vector of the list, the (N+2) ordered value of V.Size should be in the second vector of the list and so on.
Now, this is what I have done thus far, I am trying to make an example code:
V.Size <- c(5,4,2,3,1)
n <- 5
Function <- c(V.Size, n)
Function
[1] 5 4 2 3 1 5
sort(Function, decreasing=TRUE)
[1] 5 5 4 3 2 1
The issue I am having is with (N+1), (N+2) and its ordering.
The first step to addressing this would be to create a vector giving the list position for each element of the sorted V.Size. This is basically the vector (1, 2, ..., N, 1, 2, ..., N, ...), of total length length(V.Size). You can get that with:
V.Size <- c(5,4,2,3,1)
n <- 2
rep(1:n, length.out=length(V.Size))
# [1] 1 2 1 2 1
Now you can use the split function to create a list based on these assignments:
split(sort(V.Size, decreasing=TRUE), rep(1:n, length.out=length(V.Size)))
# $`1`
# [1] 5 3 1
#
# $`2`
# [1] 4 2
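Since the assignment asks for a function, the two steps can be wrapped up like this. It is only a sketch: the function name split_by_rank is my own, and I use a factor with explicit levels so the list always has N items, even when N exceeds the number of values.

```r
# Hypothetical helper: distribute the values of V.Size, largest first,
# round-robin over N list items
split_by_rank <- function(V.Size, N) {
  ranked <- sort(V.Size, decreasing = TRUE)
  groups <- rep(seq_len(N), length.out = length(ranked))
  # factor() with fixed levels keeps empty items when N > length(V.Size)
  unname(split(ranked, factor(groups, levels = seq_len(N))))
}

out <- split_by_rank(c(5, 4, 2, 3, 1), 2)
out
```

With N = 2 the first list item holds 5, 3, 1 and the second holds 4, 2, matching the split() output above.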
With the help of people on this site I have a matrix y that looks similar to this (but much more simplified).
1,3
1,3
1,3
7,1
8,2
8,2
I have created a third column of random numbers (drawn without replacement for each of the repeating chunks) using this code: j=cbind(y,sample(1:99999,y[,2],replace=FALSE)).
Matrix j looks like this:
1,3,4520
1,3,7980
1,3,950
7,1,2
8,3,4520
8,3,7980
8,3,950
How do I obtain truly random numbers for my third column such that for each of the repeating chunks (of size 3, then 1, then 2) I get random numbers that are not replicated within that chunk (replace = FALSE)?
Why this happens:
The problem is that the structure of the sample command is:
sample(vector of values, how many?, replace = FALSE or TRUE)
Here, "how many?" is supposed to be ONE value. Since you provide the whole of the second column of y, it just picks the first value, which is 3, and so it reads as:
set.seed(45) # just for reproducibility
sample(1:99999, 3, replace = F)
And for this seed, the values are:
# [1] 63337 31754 24092
And since there are only 3 values and you're binding them to your matrix with 6 rows, it "recycles" the values (meaning, it repeats the values in the same order). So, you get:
# [,1] [,2] [,3]
# [1,] 1 3 63337
# [2,] 1 3 31754
# [3,] 1 3 24092
# [4,] 7 1 63337
# [5,] 8 2 31754
# [6,] 8 2 24092
See that the values repeat. For the matrix you've shown, I've no idea how the 7,1,2 occurs, as the first value of y[,2] in your matrix is 3.
What you should do instead:
y <- cbind(y, sample(1:99999, nrow(y), replace = FALSE))
This asks sample to generate nrow(y) = 6 (here) values without replacement. This generates 6 distinct values, which are then bound to your matrix y.
This should get you what you want:
j <- cbind(y, unlist(sapply(unique(y[,2]), function(n) sample(1:99999, n))))
edit: There was an error in code. Function unique is of course needed.
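A variant sketch, in case two different chunks happen to have the same size (in which case unique(y[,2]) would drop one of them): take the chunk sizes from the run lengths of the first column instead. This assumes each chunk is a consecutive run of identical values in y[,1]; the seed is arbitrary.

```r
# The matrix y from the question
y <- matrix(c(1, 1, 1, 7, 8, 8,
              3, 3, 3, 1, 2, 2), ncol = 2)

set.seed(45)                                  # just for reproducibility
runs <- rle(y[, 1])$lengths                   # chunk sizes: 3 1 2
# One independent draw without replacement per chunk
draws <- unlist(lapply(runs, function(n) sample(99999, n)))
j <- cbind(y, draws)
j
```

Values cannot repeat within a chunk, but the same value could in principle appear in two different chunks, since each chunk is sampled independently.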
I can't get this without a loop. Maybe someone else can find a more elegant solution. As I understand it, the problem is to sample with repetition within a group and without repetition between groups.
ll <- split(dat, paste(dat$V1,dat$V2,sep=''))
ll.length <- by(dat, paste(dat$V1,dat$V2,sep=''),nrow)
z <- rep(0,nrow(dat))
SET <- seq(1,100) ## we can change 100 by 99999 for example
v =1
for (i in seq_along(ll)){
SET <- SET[!SET %in% z]  # drop values already used by earlier groups
nn <- nrow(ll[[i]])
z[v:(v+nn-1)] <- sample(SET,nn,replace=TRUE)
v <- v+nn
}
z
[1] 35 77 94 100 23 59