I've been trying to apply a function to the last two values of each row in a data frame, repeating the process for every row. Here's the function I'm trying to apply.
Basically, I wrote a function that uses a Kish grid. A Kish grid is a way of randomly selecting participants in a household survey. It's a 10x8 matrix that looks like this:
kishvalues <- c(1,1,1,1,1,1,1,1,
1,2,2,2,2,2,2,2,
1,1,3,3,3,3,3,3,
1,2,1,4,4,4,4,4,
1,1,2,1,5,5,5,5,
1,2,3,2,1,6,6,6,
1,1,1,3,2,1,7,7,
1,2,2,4,3,2,1,8,
1,1,3,1,4,3,2,1,
1,2,1,2,5,4,3,2)
kishtable <- matrix(kishvalues, nrow=10, ncol=8, byrow=T); kishtable
> kishtable <- matrix(kishvalues, nrow=10, ncol=8, byrow=T); kishtable
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 1 1 1 1 1
[2,] 1 2 2 2 2 2 2 2
[3,] 1 1 3 3 3 3 3 3
[4,] 1 2 1 4 4 4 4 4
[5,] 1 1 2 1 5 5 5 5
[6,] 1 2 3 2 1 6 6 6
[7,] 1 1 1 3 2 1 7 7
[8,] 1 2 2 4 3 2 1 8
[9,] 1 1 3 1 4 3 2 1
[10,] 1 2 1 2 5 4 3 2
Suppose I'm doing household interviews. I visit my 7th house of the day (rows), and there are 4 eligible participants for the interview (columns). I use the Kish table to decide which of the four participants, ordered from youngest to oldest, I have to choose in order to maintain some randomness. Here I would have to select the 3rd participant of that household for the interview.
> kishtable <- matrix(kishvalues, nrow=10, ncol=8, byrow=T); kishtable
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 1 1 1 1 1
[2,] 1 2 2 2 2 2 2 2
[3,] 1 1 3 3 3 3 3 3
[4,] 1 2 1 4 4 4 4 4
[5,] 1 1 2 1 5 5 5 5
[6,] 1 2 3 2 1 6 6 6
[7,] 1 1 1 (3) 2 1 7 7
[8,] 1 2 2 4 3 2 1 8
[9,] 1 1 3 1 4 3 2 1
[10,] 1 2 1 2 5 4 3 2
Now, here is the function I'm using
kish <- function(house,ep){
x <- kishtable[house,ep]
print(x)
}
kish(house=7, ep=4)
[1] 3
How can I apply this function to two vectors instead of two single values? One vector holds the sequential number of the homes visited (1 to 10) and the other holds the number of eligible participants, which can vary from 1 to 8.
Hope I made sense, let me know if you need anything else to better understand the problem.
Use cbind to build a two-column index matrix for [. Each row of that matrix is one (row, column) pair to look up.
homes <- c(7,8,10)
participants <- c(4,6,5)
kishtable[cbind(homes, participants)]
# [1] 3 2 5
Mapping to the following:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 1 1 1 1 1
[2,] 1 2 2 2 2 2 2 2
[3,] 1 1 3 3 3 3 3 3
[4,] 1 2 1 4 4 4 4 4
[5,] 1 1 2 1 5 5 5 5
[6,] 1 2 3 2 1 6 6 6
[7,] 1 1 1 (3) 2 1 7 7
[8,] 1 2 2 4 3 (2) 1 8
[9,] 1 1 3 1 4 3 2 1
[10,] 1 2 1 2 (5) 4 3 2
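The matrix-indexing idea above can be folded back into the original function so that it accepts whole vectors, for example two columns of a data frame. This is only a minimal sketch: the name kish_select and the visits data frame are made up for illustration, and I am assuming that out-of-range inputs should raise an error, since the question does not say what happens after the 10th house.

```r
# Rebuild the Kish grid from the question (10 houses x 8 eligible participants).
kishvalues <- c(1,1,1,1,1,1,1,1,
                1,2,2,2,2,2,2,2,
                1,1,3,3,3,3,3,3,
                1,2,1,4,4,4,4,4,
                1,1,2,1,5,5,5,5,
                1,2,3,2,1,6,6,6,
                1,1,1,3,2,1,7,7,
                1,2,2,4,3,2,1,8,
                1,1,3,1,4,3,2,1,
                1,2,1,2,5,4,3,2)
kishtable <- matrix(kishvalues, nrow = 10, ncol = 8, byrow = TRUE)

# kish_select is a hypothetical name, not part of the original post.
kish_select <- function(house, ep, table = kishtable) {
  # Assumption: inputs outside the grid are an error rather than wrapped around.
  stopifnot(length(house) == length(ep),
            all(house >= 1 & house <= nrow(table)),
            all(ep >= 1 & ep <= ncol(table)))
  # A two-column matrix used as an index picks one cell per (row, column) pair.
  table[cbind(house, ep)]
}

# Hypothetical data frame whose last two columns hold house and participant counts.
visits <- data.frame(house = c(7, 8, 10), ep = c(4, 6, 5))
visits$selected <- kish_select(visits$house, visits$ep)
```

Returning the values instead of printing them, as the original kish() does, makes it possible to store the result in a new column.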
Related
I have a list of 6 with 10 values in each. I would like to fill a 10x6 (10 rows, 6 columns) matrix with these values. I've tried some things but it's not working. I'm sure there must be an easy way to do it, but I haven't found it yet. Could anyone please help?
Here some example data:
l = lapply(1:6, rep, 10)
then use ?do.call and cbind to paste the list elements as columns:
do.call(cbind, l)
and you get a matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 1 2 3 4 5 6
[3,] 1 2 3 4 5 6
[4,] 1 2 3 4 5 6
[5,] 1 2 3 4 5 6
[6,] 1 2 3 4 5 6
[7,] 1 2 3 4 5 6
[8,] 1 2 3 4 5 6
[9,] 1 2 3 4 5 6
[10,] 1 2 3 4 5 6
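For comparison, here is a small sketch of two other base R routes that I believe give the same 10x6 matrix as do.call(cbind, l). The variable names m1, m2 and m3 are only for illustration.

```r
# Same example data as above: six vectors of length 10.
l <- lapply(1:6, rep, 10)

# Each list element becomes one column.
m1 <- do.call(cbind, l)

# unlist() concatenates the elements in order; matrix() fills column by column,
# so each list element again lands in its own column.
m2 <- matrix(unlist(l), nrow = 10, ncol = length(l))

# sapply() simplifies equal-length results into a matrix, one column per element.
m3 <- sapply(l, identity)
```

All three rely on every list element having the same length; with unequal lengths the matrix() route would recycle or warn rather than fail clearly.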
I have a table which I want to transform:
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 4 9 1 2
[5,] 2 3 5 1 2
[6,] 2 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
I want to filter the data so that rows which differ only by their number in the first column are removed (not completely, only the duplicates). So for rows 1 and 4, only row 1 should remain in the table, and for rows 3 and 6, only row 3 should remain. It is important that the information in the first column is retained and that the earliest occurrence of a row remains in the table, not the later ones.
You can use duplicated:
mat[!duplicated(as.data.frame(mat[, -1])), ]
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
where mat is the name of your matrix.
Try using the duplicated function:
mymx <- matrix(c(1,4,9,1,2 ,1,3,5,1,2 ,1,1,6,1,2 ,2,4,9,1,2 ,2,3,5,1,2 ,2,1,6,1,2 ,2,7,2,2,2 ,3,3,5,3,4 ,3,1,6,3,4 ,3,7,2,3,5 ,3,4,9,3,5), ncol=5, byrow=T)
mymx[!duplicated(mymx[,-1]),]
> mymx[!duplicated(mymx[,-1]),]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 7 2 2 2
[5,] 3 3 5 3 4
[6,] 3 1 6 3 4
[7,] 3 7 2 3 5
[8,] 3 4 9 3 5
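To see which rows duplicated() actually flags, it can help to keep the logical vector around before subsetting. A short sketch using the same data; the column names are taken from the question, and dup and result are illustrative names:

```r
mymx <- matrix(c(1,4,9,1,2, 1,3,5,1,2, 1,1,6,1,2, 2,4,9,1,2,
                 2,3,5,1,2, 2,1,6,1,2, 2,7,2,2,2, 3,3,5,3,4,
                 3,1,6,3,4, 3,7,2,3,5, 3,4,9,3,5),
               ncol = 5, byrow = TRUE)
colnames(mymx) <- c("t", "LabelA", "LabelB", "start", "stop")

# duplicated() on a matrix compares whole rows; dropping column 1 first means
# rows are compared on LabelA, LabelB, start and stop only.
dup <- duplicated(mymx[, -1])

# Original row numbers that are later repeats of an earlier row.
dropped <- which(dup)

# drop = FALSE keeps the result a matrix even if only one row survives.
result <- mymx[!dup, , drop = FALSE]
```

Because duplicated() marks only the second and later occurrences, the earliest row of each group is the one that stays, along with its value in the first column.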
I am trying to accomplish two things. First if I have a vector 1:5 I want to get a matrix (or two vectors) indicating the unique combinations of these elements including twice the same number but excluding repetitions.
Right now I can do this using a matrix:
foo <- matrix(1:5,5,5)
cbind(foo[upper.tri(foo,diag=TRUE)],foo[lower.tri(foo,diag=TRUE)])
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 2 3
[4,] 1 4
[5,] 2 5
[6,] 3 2
[7,] 1 3
[8,] 2 4
[9,] 3 5
[10,] 4 3
[11,] 1 4
[12,] 2 5
[13,] 3 4
[14,] 4 5
[15,] 5 5
But there has to be a simpler way. I tried to use Vectorize on seq but this gives me an error:
cbind(Vectorize(seq,"from")(1:5,5),Vectorize(seq,"to")(5,1:5))
Error in Vectorize(seq, "from") :
must specify formal argument names to vectorize
A second thing I want to do: given a list of vectors, bar, I want a vector in which the index of each list element is repeated as many times as that element has entries. I can do this with:
unlist(apply(rbind(1:length(bar),sapply(bar,length)),2,function(x)rep(x[1],x[2])))
[1] 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
But again there must be an easier way. I tried Vectorize again here but with the same error:
Vectorize(rep,"each")(1:length(bar),each=sapply(bar,length))
Error in Vectorize(rep, "each") :
must specify formal argument names to vectorize
To your first question: what about the simple combn() function (it lives in the utils package, which is attached by default):
> combn(1:5,2)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 2 2 2 3 3 4
[2,] 2 3 4 5 3 4 5 4 5 5
If you need a matrix arranged like the one you built, just transpose it with t(), as in t(combn(1:5,2)).
Note: this will not give you the pairs in which an element is combined with itself, but you can easily add those to the matrix.
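One way the self-pairs mentioned in the note could be added back is sketched below. The names off_diag, on_diag and all_pairs are illustrative, and the final ordering by first then second element is my own choice, not something the question asks for.

```r
x <- 1:5

# The 10 pairs with first < second, one pair per row after transposing.
off_diag <- t(combn(x, 2))

# The 5 pairs of an element with itself: (1,1), (2,2), ..., (5,5).
on_diag <- cbind(x, x)

# Stack both sets and drop the column names that cbind(x, x) introduced.
all_pairs <- unname(rbind(on_diag, off_diag))

# Sort by first element, then by second, for a predictable layout.
all_pairs <- all_pairs[order(all_pairs[, 1], all_pairs[, 2]), ]
```

The result has choose(5, 2) + 5 = 15 rows, the same count as the upper triangle including the diagonal.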
> unlist(lapply(1:5, seq, from=1))
[1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5
> unlist(lapply(1:5, seq, 5))
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
and
> bar = lapply(1:5, seq, from=1)
> rep(seq_along(bar), sapply(bar, length))
[1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5
A faster variation of Martin Morgan's solution to the first part:
rep(1:5,5:1)
[1] 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5
unlist(lapply(1:5,function(x) x:5))
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
Roughly 7 and 3 times faster respectively.
I'm not sure I follow what you mean in the second part, but the following seems to fit your description:
lapply(bar,function(x) rep(x,length(x)))
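As far as I can tell, the Vectorize errors in the question come from the fact that seq only has ... as a formal argument and rep is a primitive, so there is no formal called "from" or "each" for Vectorize to pick up. mapply matches arguments by name at call time and does not have that restriction. A sketch, where bar is a made-up list whose element lengths (5, 7, 10) are my assumption based on the output shown in the question:

```r
# Why Vectorize complains: neither function exposes the named formals it needs.
seq_formals <- names(formals(seq))   # only "..."
rep_formals <- formals(rep)          # NULL, because rep is a primitive

# First part: both columns of the pair table without building a 5x5 matrix.
firsts  <- rep(1:5, times = 5:1)
seconds <- unlist(mapply(seq, from = 1:5, to = 5))
pairs   <- cbind(firsts, seconds)

# Second part: repeat each list index once per entry of that element.
bar <- list(1:5, 1:7, 1:10)          # hypothetical data
idx <- rep(seq_along(bar), times = lengths(bar))
```

lengths(bar) does the same job as sapply(bar, length) here and is available in R 3.2.0 and later.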
I'm not sure what this is called, but the default 'flow' when a vector is multiplied into a matrix is downwards (as seen below):
matrix(1,7,5)*(1:7)
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
What if your intention is to multiply the vector to the right instead of downwards? Is there a better way to write the command below? Is there a toggle for column instead of row? The same goes for replicate(7,1:7): it assumes a downward flow (pasting row vectors downwards instead of column vectors to the right). Is transposing the solution?
t(t(matrix(1,7,5))*(1:5))
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
If you really want to do this a lot after defining the matrix you can always make an operator yourself:
'%mat%'<- function(x,y)t(t(x)*y)
matrix(1,7,5)%mat%1:5
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 1 2 3 4 5
[3,] 1 2 3 4 5
[4,] 1 2 3 4 5
[5,] 1 2 3 4 5
[6,] 1 2 3 4 5
[7,] 1 2 3 4 5
But I think it is easier to just transpose twice, as you said in the question:
t(t(matrix(1,7,5))*1:5)
Or of course opt to transpose the matrix once in the beginning, do everything you need to do with it and then transpose it back.
As far as I know there is no way to change the default behaviour of *, nor would you probably want to.
A matrix is simply a vector with a dim attribute. The elements of the matrix are stored in the vector in column-major order and there is no way to change this. * is an element-by-element operator that recycles its arguments as necessary. You can see the recycling rule at work via:
> x <- matrix(1,7,5)
> x*1:5
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 2 4
[2,] 2 4 1 3 5
[3,] 3 5 2 4 1
[4,] 4 1 3 5 2
[5,] 5 2 4 1 3
[6,] 1 3 5 2 4
[7,] 2 4 1 3 5
You can see that the multiplication takes place column by column, with the vector (1:5) recycled to the length of the matrix. Rather than transposing, you could use the matrix function to refill the result by row.
> matrix(x*1:5,nrow(x),ncol(x),byrow=TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 1 2 3 4 5
[3,] 1 2 3 4 5
[4,] 1 2 3 4 5
[5,] 1 2 3 4 5
[6,] 1 2 3 4 5
[7,] 1 2 3 4 5
I'm not sure that's the most efficient solution, but it's the best I can think of at the moment and it's slightly faster than using t twice.
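For what it's worth, base R also has sweep(), which applies a vector along a chosen margin and so avoids relying on the recycling order. The sketch below compares it with the double transpose and two other variants; the variable names are only illustrative and I have not timed any of them.

```r
x <- matrix(1, 7, 5)

# Double transpose, as in the question.
by_transpose <- t(t(x) * 1:5)

# sweep() with MARGIN = 2 multiplies column j by the j-th element of STATS.
by_sweep <- sweep(x, 2, 1:5, "*")

# Expand the vector so that column-major recycling lines up with the columns.
by_rep <- x * rep(1:5, each = nrow(x))

# Right-multiplying by a diagonal matrix also scales the columns.
by_diag <- x %*% diag(1:5)
```

sweep() states the intent (work along columns) directly, which is why I would reach for it first when readability matters more than speed.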
Do you mean this?
> matrix(rep(1:7,5), nrow=7, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 2 2 2 2 2
[3,] 3 3 3 3 3
[4,] 4 4 4 4 4
[5,] 5 5 5 5 5
[6,] 6 6 6 6 6
[7,] 7 7 7 7 7
> matrix(rep(1:7,5), nrow=5, ncol=7, byrow=TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 3 4 5 6 7
[2,] 1 2 3 4 5 6 7
[3,] 1 2 3 4 5 6 7
[4,] 1 2 3 4 5 6 7
[5,] 1 2 3 4 5 6 7
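Another way to pick the direction explicitly is with row() and col(), which return the row or column index of every cell. A minimal sketch, where the vectors v_down and v_across are made-up values used only for illustration:

```r
x <- matrix(1, 7, 5)

v_down   <- 1:7                      # one value per row
v_across <- c(10, 20, 30, 40, 50)    # one value per column

# row(x) holds each cell's row index, so v_down[row(x)] varies downwards.
down <- x * v_down[row(x)]

# col(x) holds each cell's column index, so v_across[col(x)] varies to the right.
across <- x * v_across[col(x)]
```

Indexing the vector by row(x) or col(x) yields one value per cell in column-major order, so the elementwise product lines up without any transposing.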
Goal: from a list of vectors of equal length, create a matrix where each vector becomes a row.
Example:
> a <- list()
> for (i in 1:10) a[[i]] <- c(i,1:5)
> a
[[1]]
[1] 1 1 2 3 4 5
[[2]]
[1] 2 1 2 3 4 5
[[3]]
[1] 3 1 2 3 4 5
[[4]]
[1] 4 1 2 3 4 5
[[5]]
[1] 5 1 2 3 4 5
[[6]]
[1] 6 1 2 3 4 5
[[7]]
[1] 7 1 2 3 4 5
[[8]]
[1] 8 1 2 3 4 5
[[9]]
[1] 9 1 2 3 4 5
[[10]]
[1] 10 1 2 3 4 5
I want:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
One option is to use do.call():
> do.call(rbind, a)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
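If you want the conversion to fail loudly when the vectors are not all the same length, vapply() can check that for you. This is only a sketch alongside do.call(rbind, a), with illustrative variable names; the expected length of 6 is taken from the example list.

```r
a <- lapply(1:10, function(i) c(i, 1:5))

# Plain row binding, as above.
m_rbind <- do.call(rbind, a)

# vapply() insists that every element yields a numeric vector of length 6.
# It builds one column per list element, so transpose to get rows.
m_vapply <- t(vapply(a, function(v) as.numeric(v), FUN.VALUE = numeric(6)))

# With a ragged list the length check makes vapply() stop with an error.
ragged <- list(1:3, 1:2)
ragged_fails <- inherits(
  try(vapply(ragged, function(v) as.numeric(v), FUN.VALUE = numeric(3)),
      silent = TRUE),
  "try-error")
```

do.call(rbind, ...) on the same ragged list would instead recycle the shorter vector with a warning, which is easy to miss.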
simplify2array is a base function that is fairly intuitive. However, since R's default is to fill in data by columns first, you will need to transpose the output. (sapply uses simplify2array, as documented in help(sapply).)
> t(simplify2array(a))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
The built-in matrix function has the nice option to enter data byrow. Combining that with unlist on your source list gives you a matrix. We also need to specify the number of rows so it can break up the unlisted data. That is:
> matrix(unlist(a), byrow=TRUE, nrow=length(a) )
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
Not straightforward, but it works:
> t(sapply(a, unlist))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
t(sapply(a, '[', 1:max(sapply(a, length))))
where 'a' is a list.
This also works for vectors of unequal length: shorter rows are padded with NA.
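A small sketch of the unequal-length case, using a made-up list called ragged (not from the question) so the padding is visible:

```r
# Hypothetical list with elements of length 3, 2 and 1.
ragged <- list(c(1, 2, 3), c(4, 5), 6)

# Width of the widest element.
width <- max(lengths(ragged))

# Indexing past the end of a vector returns NA, so every element comes back
# with exactly `width` entries and sapply() can simplify to a matrix.
padded <- t(sapply(ragged, `[`, seq_len(width)))
```

Here the second row ends in one NA and the third row in two, so the matrix stays rectangular.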
> library(plyr)
> as.matrix(ldply(a))
V1 V2 V3 V4 V5 V6
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5