Get specific column value for each row - r

I want to get a "m" length vector that, considering a m x n matrix, for each row, gives the value on the column identified by another column (say column "Z").
I made it using a for loop:
for (i in 1:dim(data.frame)[1]){vector[i] <- data.frame[i,data.frame$Z[i]]}
Do you see a simpler way to code it avoiding the loop?

"apply" is a possibility:
> M <- cbind( matrix(1:15,3,5), "Z"=c(3,1,2) )
> M
Z
[1,] 1 4 7 10 13 3
[2,] 2 5 8 11 14 1
[3,] 3 6 9 12 15 2
> v <- apply(M,1,function(x){x[x["Z"]]})
> v
[1] 7 2 6
>

Related

Crosschecking numbers of a matrix in R

I'm currently working with a large matrix of two columns, and what I want to check is If every line/combination (two columns) is also present in a dataframe loaded (two columns as well).
Example,
(obj_design <- matrix(c(2,5,4,7,6,6,20,12,4,0), nrow = 5, ncol = 2))
[,1] [,2]
[1,] 2 6
[2,] 5 20
[3,] 4 12
[4,] 7 4
[5,] 6 0
(refined_grid <- data.frame(i=1:4, j=1:12))
i j
1 1 1
2 2 2
3 3 3
4 4 4
5 1 5
6 2 6
7 3 7
8 4 8
9 1 9
10 2 10
11 3 11
12 4 12
Following the reproducible example, it would be selected (2,6) and (4,12).
I'm wondering if there's a function that I can use to check the whole matrix, and see if a specific line is in the dataframe, and (if possible) write separately (new dataset) which elements of the matrix it is in.
Any assistance would be wonderful.
Here is an option with match
i1 <- match(do.call(paste, as.data.frame(obj_design)),
do.call(paste, refined_grid), nomatch = 0)
refined_grid[i1,]
This code will give you which rows of the matrix exist in the dataframe.
which(paste(obj_design[,1], obj_design[,2]) %in%
paste(refined_grid$i, refined_grid$j)
)
Then you can just assign it to a vector!

cumsum the opposite of diff in r

I have a question and I'm not sure if I'm being totally stupid here or if this is a genuine problem, or if I've misunderstood what these functions do.
Is the opposite of diff the same as cumsum? I thought it was. However, using this example:
dd <- c(17.32571,17.02498,16.71613,16.40615,
16.10242,15.78516,15.47813,15.19073,
14.95551,14.77397)
par(mfrow = c(1,2))
plot(dd)
plot(cumsum(diff(dd)))
> dd
[1] 17.32571 17.02498 16.71613 16.40615 16.10242 15.78516 15.47813 15.19073 14.95551
[10] 14.77397
> cumsum(diff(dd))
[1] -0.30073 -0.60958 -0.91956 -1.22329 -1.54055 -1.84758 -2.13498 -2.37020 -2.55174
These aren't the same. Where have I gone wrong?
AHHH! Fridays.
Obviously
The functions are quite different: diff(x) returns a vector of length (length(x)-1) which contains the difference between one element and the next in a vector x, while cumsum(x) returns a vector of length equal to the length of x containing the sum of the elements in x
Example:
x <- c(1:10)
#[1] 1 2 3 4 5 6 7 8 9 10
> diff(x)
#[1] 1 1 1 1 1 1 1 1 1
v <- cumsum(x)
> v
#[1] 1 3 6 10 15 21 28 36 45 55
The function cumsum() is the cumulative sum and therefore the entries of the vector v[i] that it returns are a result of all elements in x between x[1] and x[i]. In contrast, diff(x) only takes the difference between one element x[i] and the next, x[i+1].
The combination of cumsum and diff leads to different results, depending on the order in which the functions are executed:
> cumsum(diff(x))
# 1 2 3 4 5 6 7 8 9
Here the result is the cumulative sum of a sequence of nine "1". Note that if this result is compared with the original vector x, the last entry 10 is missing.
On the other hand, by calculating
> diff(cumsum(x))
# 2 3 4 5 6 7 8 9 10
one obtains a vector that is again similar to the original vector x, but now the first entry 1 is missing.
In none of the cases the original vector is restored, therefore it cannot be stated that cumsum() is the opposite or inverse function of diff()
You forgot to account for the impact of the first element
dd == c(dd[[1]], dd[[1]] + cumsum(diff(dd)))
#RHertel answered it well, stating that diff() returns a vector with length(x)-1.
Therefore, another simple workaround would be to add 0 to the beginning of the original vector so that diff() computes the difference between x[1] and 0.
> x <- 5:10
> x
#[1] 5 6 7 8 9 10
> diff(x)
#[1] 1 1 1 1 1
> diff(c(0,x))
#[1] 5 1 1 1 1 1
This way it is possible to use diff() with c() as a representation of the inverse of cumsum()
> cumsum(diff(c(0,x)))
#[1] 1 2 3 4 5 6 7 8 9 10
> diff(c(0,cumsum(x)))
#[1] 1 2 3 4 5 6 7 8 9 10
If you know the value of "lag" and "difference".
x<-5:10
y<-diff(x,lag=1,difference=1)
z<-diffinv(y,lag=1,differences = 1,xi=5) #xi is first value.
k<-as.data.frame(cbind(x,z))
k
x z
1 5 5
2 6 6
3 7 7
4 8 8
5 9 9
6 10 10

append a constant and all lower natural numbers to the numbers to a list

Is there an easy way to add all the numbers (1:constant) to a list, as in the example below, without typing them out by hand (or using a for loop)?
> list <- c(1:10)
> constant <- 3
> unique(c(list,list+1,list+2,list+3))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13
I'd like to be able to assign a constant, and append the constant and all lower natural numbers to the numbers in my list.
Thanks!
Edit:
Here's what I'm trying to do, except without a for loop:
> list <- c(1:10)
> list2 <- c(1:10)
> constant <- 3
> for(i in 1:constant){
+ list2<-unique(c(list2,list+i))
+ }
> list2
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13
This is not perfectly efficient, but it's better than a for loop:
v <- 1:10 ## better not to call this "list"
constant <- 3
unique(c(outer(0:constant,v,"+")))

Choose one cell per row in data frame

I have a vector that tells me, for each row in a date frame, the column index for which the value in this row should be updated.
> set.seed(12008); n <- 10000; d <- data.frame(c1=1:n, c2=2*(1:n), c3=3*(1:n))
> i <- sample.int(3, n, replace=TRUE)
> head(d); head(i)
c1 c2 c3
1 1 2 3
2 2 4 6
3 3 6 9
4 4 8 12
5 5 10 15
6 6 12 18
[1] 3 2 2 3 2 1
This means that for rows 1 and 4, c3 should be updated; for rows 2, 3 and 5, c2 should be updated (among others). What is the cleanest way to achieve this in R using vectorized operations, i.e, without apply and friends? EDIT: And, if at all possible, without R loops?
I have thought about transforming d into a matrix and then address the matrix elements using an one-dimensional vector. But then I haven't found a clean way to compute the one-dimensional address from the row and column indexes.
With your example data, and using only the first few rows (D and I below) you can easily do what you want via a matrix as you surmise.
set.seed(12008)
n <- 10000
d <- data.frame(c1=1:n, c2=2*(1:n), c3=3*(1:n))
i <- sample.int(3, n, replace=TRUE)
## just work with small subset
D <- head(d)
I <- head(i)
First, convert D into a matrix:
dmat <- data.matrix(D)
Next compute the indices of the vector representation of the matrix corresponding to rows and columns indicated by I. For this, it is easy to generate the row indices as well as the column index (given by I) using seq_along(I) which in this simple example is the vector 1:6. To compute the vector indices we can use:
(I - 1) * nrow(D) + seq_along(I)
where the first part ( (I - 1) * nrow(D) ) gives us the correct multiple of the number of rows (6 here) to index the start of the Ith column. We then add on the row index to get the index for the n-th element in the Ith column.
Using this we just index into dmat using "[", treating it like a vector. The replacement version of "[" ("[<-") allows us to do the replacement in a single line. Here I replace the indicated elements with NA to make it easier to see that the correct elements were identified:
> dmat
c1 c2 c3
1 1 2 3
2 2 4 6
3 3 6 9
4 4 8 12
5 5 10 15
6 6 12 18
> dmat[(I - 1) * nrow(D) + seq_along(I)] <- NA
> dmat
c1 c2 c3
1 1 2 NA
2 2 NA 6
3 3 NA 9
4 4 8 NA
5 5 NA 15
6 NA 12 18
If you are willing to first convert your data.frame to a matrix, you can index elements-to-be-replaced using a two-column matrix. (Beginning with R-2.16.0, this will be possible with data.frames directly.) The indexing matrix should have row indices in its first column and column indices in its second column.
Here's an example:
## Create a subset of the your data
set.seed(12008); n <- 6
D <- data.frame(c1=1:n, c2=2*(1:n), c3=3*(1:n))
i <- seq_len(nrow(D)) # vector of row indices
j <- sample(3, n, replace=TRUE) # vector of column indices
ij <- cbind(i, j) # a 2-column matrix to index a 2-D array
# (This extends smoothly to higher-D arrays.)
## Convert it to a matrix
Dmat <- as.matrix(D)
## Replace the elements indexed by 'ij'
Dmat[ij] <- NA
Dmat
# c1 c2 c3
# [1,] 1 2 NA
# [2,] 2 NA 6
# [3,] 3 NA 9
# [4,] 4 8 NA
# [5,] 5 NA 15
# [6,] NA 12 18
Beginning with R-2.16.0, you will be able to use the same syntax for dataframes (i.e. without having to first convert dataframes to matrices).
From the R-devel NEWS file:
Matrix indexing of dataframes by two column numeric indices is now supported for replacement as well as extraction.
Using the current R-devel snapshot, here's what that looks like:
D[ij] <- NA
D
# c1 c2 c3
# 1 1 2 NA
# 2 2 NA 6
# 3 3 NA 9
# 4 4 8 NA
# 5 5 NA 15
# 6 NA 12 18
Here's one way:
d[which(i == 1), "c1"] <- "one"
d[which(i == 2), "c2"] <- "two"
d[which(i == 3), "c3"] <- "three"
c1 c2 c3
1 1 2 three
2 2 two 6
3 3 two 9
4 4 8 three
5 5 two 15
6 one 12 18

Quickly generate the cartesian product of a matrix

Let's say I have a matrix x which contains 10 rows and 2 columns. I want to generate a new matrix M that contains each unique pair of rows from x - that is, a new matrix with 55 rows and 4 columns.
E.g.,
x <- matrix (nrow=10, ncol=2, 1:20)
M <- data.frame(matrix(ncol=4, nrow=55))
k <- 1
for (i in 1:nrow(x))
for (j in i:nrow(x))
{
M[k,] <- unlist(cbind (x[i,], x[j,]))
k <- k + 1
}
So, x is:
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
[6,] 6 16
[7,] 7 17
[8,] 8 18
[9,] 9 19
[10,] 10 20
And then M has 4 columns, the first two are one row from x and the next 2 are another row from x:
> head(M,10)
X1 X2 X3 X4
1 1 11 1 11
2 1 11 2 12
3 1 11 3 13
4 1 11 4 14
5 1 11 5 15
6 1 11 6 16
7 1 11 7 17
8 1 11 8 18
9 1 11 9 19
10 1 11 10 20
Is there either a faster or simpler (or both) way of doing this in R?
The expand.grid() function useful for this:
R> GG <- expand.grid(1:10,1:10)
R> GG <- GG[GG[,1]>=GG[,2],] # trim it to your 55 pairs
R> dim(GG)
[1] 55 2
R> head(GG)
Var1 Var2
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
R>
Now you have the 'n*(n+1)/2' subsets and you can simple index your original matrix.
I'm not quite grokking what you are doing so I'll just throw out something that may, or may not help.
Here's what I think of as the Cartesian product of the two columns:
expand.grid(x[,1],x[,2])
You can also try the "relations" package. Here is the vignette. It should work like this:
relation_table(x %><% x)
Using Dirk's answer:
idx <- expand.grid(1:nrow(x), 1:nrow(x))
idx<-idx[idx[,1] >= idx[,2],]
N <- cbind(x[idx[,2],], x[idx[,1],])
> all(M == N)
[1] TRUE
Thanks everyone!
Inspired from the other answers, here is a function implementing cartesian product of two matrices, in the case of two matrices, the full cartesian product, for only one argument, omitting one of each pair:
cartesian_prod <- function(M1, M2) {
if(missing(M2)) { M2 <- M1
ind <- expand.grid(1:NROW(M1), 1:NROW(M2))
ind <- ind[ind[,1] >= ind[,2],] } else {
ind <- expand.grid(1:NROW(M1), 1:NROW(M2))}
rbind(cbind(M1[ind[,1],], M2[ind[,2],]))
}

Resources