Column-wise element selection in R

I need to select elements of a data frame using row indices stored in a vector. In other words, I have a vector of length equal to the number of columns in the data frame, and that vector contains the row numbers of the elements I need to extract (one element per column, in order).
How would I go about doing this?
Example:
vec <- c(1,2,1)
df <- data.frame(matrix(1:6, ncol = 3, nrow = 2))
That would look like this:
X1 X2 X3
1 1 3 5
2 2 4 6
And I would need to get elements (1,4,5) using the indices from vec = 1,2,1

We can use :
df[cbind(vec, 1:ncol(df))]
#[1] 1 4 5
Using cbind, we create a row and column index to subset values from df.
cbind(vec, 1:ncol(df))
vec
#[1,] 1 1
#[2,] 2 2
#[3,] 1 3
Using this matrix, we subset the values at (row 1, column 1), (row 2, column 2) and (row 1, column 3).
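One caveat worth keeping in mind: indexing a data frame with a two-column matrix first converts the data frame with as.matrix, so columns of mixed types would be coerced to a common type. The sketch below reproduces the example and adds a per-column alternative with mapply. The names idx, picked and picked_by_col are illustrative and not part of the original answer.

```r
vec <- c(1, 2, 1)
df <- data.frame(matrix(1:6, ncol = 3, nrow = 2))

# Two-column index: row numbers from vec, column numbers 1..ncol(df).
idx <- cbind(vec, seq_along(df))
picked <- df[idx]

# Per-column alternative: take element vec[j] from column j,
# without converting the whole data frame to a matrix.
picked_by_col <- unname(mapply(function(column, i) column[i], df, vec))
```

Both picked and picked_by_col hold the values 1, 4, 5 for this input. The mapply form is only a suggestion for data frames whose columns have different types and should keep them.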

Related

I want to create a vector by using values of variables from a dataframe

data <- data.frame(cust_id=1:5,
a=c(10,20,30,40,50),
b=c(5,10,15,20,25),
c=c(20,40,60,80,100),
d=c(2,4,6,8,10))
vect <- c(a*2,b*3,c*4,d*5)
Here I need to get the values of a, b, c, d from the dataset for each cust_id (1:5) as a vector, then multiply them by the constants 2, 3, 4, 5. For example:
vect <- c(40*2,20*3,80*4,8*5)
*Note: vect is a vector, I need output in vector form.
a,b,c,d are the values of variables from data for each cust_id
We can use crossprod (if we need to multiply and then sum)
crossprod(t(data[-1]), 2:5)
If we only need to multiply each column by the corresponding element of the vector:
sweep(data[-1], 2, 2:5, "*")
Or with mapply
mapply(`*`, data[-1], 2:5)
You want to take the vectorized dot-product of a matrix by a vector:
as.matrix(data[,c('a','b','c','d')]) %*% c(2,3,4,5)
[,1]
[1,] 125
[2,] 250
[3,] 375
[4,] 500
[5,] 625
Since you also want to carry the cust_id column through, it's simplest to take the cust_id column from the input data frame separately and combine it with the result in a new data frame:
df.out <- data.frame(cust_id = data[,'cust_id'],
result = as.matrix(data[,c('a','b','c','d')]) %*% c(2,3,4,5))
cust_id result
1 1 125
2 2 250
3 3 375
4 4 500
5 5 625
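To tie these answers together, here is a small sketch, assuming the constants 2, 3, 4, 5 belong to columns a, b, c, d in that order. It shows that the row sums of the sweep result equal the matrix product, and how to flatten the per-customer products into a plain vector. The names w, m, scaled, totals and flat are illustrative.

```r
data <- data.frame(cust_id = 1:5,
                   a = c(10, 20, 30, 40, 50),
                   b = c(5, 10, 15, 20, 25),
                   c = c(20, 40, 60, 80, 100),
                   d = c(2, 4, 6, 8, 10))

# Multipliers, named after the columns they apply to (assumed pairing).
w <- c(a = 2, b = 3, c = 4, d = 5)

# Numeric matrix of just the value columns, in the order given by w.
m <- as.matrix(data[names(w)])

# Column-wise scaling: every column j is multiplied by w[j].
scaled <- sweep(m, 2, w, "*")

# One total per customer; same numbers as rowSums(scaled).
totals <- as.vector(m %*% w)

# Plain vector of products, customer by customer: a*2, b*3, c*4, d*5, ...
flat <- as.vector(t(scaled))
```

If a single vector per customer is what is needed, scaled[i, ] gives the four products for row i, while flat concatenates them for all customers.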

R, conditional summing of every second cell in each row

I have a data frame and want for each row the sum of every second cell (beginning with the second cell), whose left neighbor is greater than zero. Here's an example:
a <- c(-2,1,1,-2)
b <- c(1,2,3,4)
c <- c(-2,1,-1,2)
d <- c(5,6,7,8)
df <- data.frame(a,b,c,d)
This gives:
> df
a b c d
1 -2 1 -2 5
2 1 2 1 6
3 1 3 -1 7
4 -2 4 2 8
For the first row the correct sum is 0 (the left neighbor of 1 is -2 and the left neighbor of 5 is also -2); for the second it's 8; for the third it's 3; for the fourth it's again 8.
I want to do it without loops, so I tried it with sum() and which() like in Conditional Sum in R, but could not find a way through.
We subset the alternating columns with the recycled logical vector c(TRUE, FALSE) to get the 1st, 3rd, ... columns, and turn them into a logical matrix by checking whether each value is greater than 0 (> 0). We then multiply that matrix by the other set of alternating columns (the 2nd, 4th, ...), selected with c(FALSE, TRUE). Wherever the left neighbor is not greater than 0, the logical matrix holds FALSE, which is coerced to 0 in the multiplication, so that cell drops out of the sum. Finally, rowSums gives the expected output:
rowSums((df[c(TRUE, FALSE)]>0)*df[c(FALSE, TRUE)])
#[1] 0 8 3 8
The recycled logical index can also be replaced with seq:
rowSums((df[seq(1, ncol(df), by = 2)]>0)*df[seq(2, ncol(df), by = 2)])
#[1] 0 8 3 8
Or another option is Reduce with Map
Reduce(`+`, Map(`*`, lapply(df[c(TRUE, FALSE)], `>`, 0), df[c(FALSE, TRUE)]))
#[1] 0 8 3 8
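The steps described above can be broken out one at a time, which may make the intermediate mask easier to inspect. This is only a sketch of the same idea, and it assumes an even number of columns arranged as (left, right) pairs; the names left, right, mask and res are illustrative.

```r
df <- data.frame(a = c(-2, 1, 1, -2),
                 b = c(1, 2, 3, 4),
                 c = c(-2, 1, -1, 2),
                 d = c(5, 6, 7, 8))

# Odd-numbered columns are the "left neighbors", even-numbered the values.
left  <- as.matrix(df[c(TRUE, FALSE)])
right <- as.matrix(df[c(FALSE, TRUE)])

# TRUE where the left neighbor is strictly positive.
mask <- left > 0

# TRUE/FALSE act as 1/0 in the product, so masked-out cells contribute 0.
res <- unname(rowSums(mask * right))
```

Printing mask shows which cells survive: for the second row both entries are TRUE, so the sum is 2 + 6.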

R subset elements specifying column and row of each element

How should I subset a matrix by specifying both the row and the column of each item? I'm currently using sapply, but I don't find that particularly elegant:
> mat <- data.frame(a=c(1,2,3),b=c(7,6,5))
> mat
a b
1 1 7
2 2 6
3 3 5
> rowSel <- 1:3
> colSel <- c(1,2,1)
> sapply(rowSel,function(i){mat[i,colSel[i]]})
[1] 1 6 3
A shorter way:
mat[cbind(rowSel, colSel)]
#[1] 1 6 3
This uses indexing by a two-column matrix. The first column contains the row index and the second column contains the column index. Each row of the two-column matrix indexes one element of mat.
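To illustrate what the two-column index does, the following sketch compares it with the equivalent column-major (linear) positions. Note that mat in the question is actually a data frame, so it is converted with as.matrix here; a single vector index on a data frame would select columns instead. The names by_pairs and by_linear are illustrative.

```r
mat <- as.matrix(data.frame(a = c(1, 2, 3), b = c(7, 6, 5)))
rowSel <- 1:3
colSel <- c(1, 2, 1)

# One (row, column) pair per selected element.
by_pairs <- mat[cbind(rowSel, colSel)]

# Same elements via linear positions: matrices are stored column by column,
# so element (i, j) sits at position (j - 1) * nrow(mat) + i.
by_linear <- mat[(colSel - 1) * nrow(mat) + rowSel]
```

Both vectors contain 1, 6, 3, matching the sapply result in the question.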

Multiply columns with rows by matching column name and row name in R

I have a data frame which looks like this
> data
A B
1 1 2
2 2 1
I have a reference data frame which looks like this
> ref
Names Values
1 A 5
2 B 10
I want to multiply each column by the value in the row of ref whose Names entry matches the column name.
The result should be this:
> result
A B
1 5 20
2 10 10
What is the fastest way to achieve this in R? Any help would be greatly appreciated
We can match the column names of 'data' with 'Names' column of 'ref', get the corresponding 'Values' based on the numeric index and then multiply by replicating the ref$Values
data*ref$Values[match(names(data), ref$Names)][col(data)]
# A B
#1 5 20
#2 10 10
If you use Names for rownames instead in ref, you could do something like this as well
rownames(ref) <- ref$Names #assign rownames
ref$Names <- NULL #drop col
i <- intersect(rownames(ref), colnames(data)) #find intersect
mapply(`*`, ref[i, ], data[, i]) #perform multiplication
# [,1] [,2]
#[1,] 5 20
#[2,] 10 10
This should work (note that it overwrites the columns of data in place):
for (n in seq_along(names(data))) {
data[,n] <- data[,n] * ref$Values[which(ref$Names == names(data)[n])]
}
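Another way to express the name matching is a named lookup vector, sketched below. It assumes every column of data has exactly one matching row in ref; the check with stopifnot and the names mult and result are additions for illustration, not part of the answers above.

```r
data <- data.frame(A = c(1, 2), B = c(2, 1))
ref  <- data.frame(Names = c("A", "B"), Values = c(5, 10),
                   stringsAsFactors = FALSE)

# Named lookup vector: names are column names, values are multipliers.
mult <- setNames(ref$Values, ref$Names)

# Fail early if some column of `data` has no multiplier in `ref`.
stopifnot(all(names(data) %in% names(mult)))

# Multiply each column by its own multiplier; Map keeps the column names.
result <- as.data.frame(Map(`*`, data, mult[names(data)]))
```

Because the lookup is by name, the order of rows in ref does not have to match the order of columns in data.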

R matrix getting row and column number and actual value

I have a matrix as below
B = matrix(
c(2, 4, 3, 1, 5, 7),
nrow=3,
ncol=2)
B # B has 3 rows and 2 columns
# [,1] [,2]
#[1,] 2 1
#[2,] 4 5
#[3,] 3 7
I would like to create a data.frame with 3 columns: row number, column number and actual value from above matrix. I am thinking of writing 2 for loops. Is there a more efficient way to do this?
The output that i want (i am showing only first 2 rows below)
rownum columnnum value
1 1 2
1 2 1
Try
cbind(c(row(B)), c(col(B)), c(B))
Or
library(reshape2)
melt(B)
As per @nicola's comment, the output may be needed in row-major order. In that case, take the transpose of the matrix and do the same:
TB <- t(B)
cbind(rownum = c(col(TB)), colnum = c(row(TB)), value = c(TB))
Another option uses which with arr.ind = TRUE:
data.frame(which(B==B, arr.ind=TRUE), value=as.vector(B))
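Since the question asks for a data.frame with the rows listed in row-major order, the pieces above can be combined as in this sketch. The column names follow the desired output shown in the question; the variable name long is illustrative.

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)

# row(B) and col(B) give the row and column number of every cell;
# c() flattens them in the same column-major order as c(B).
long <- data.frame(rownum = c(row(B)),
                   columnnum = c(col(B)),
                   value = c(B))

# Reorder so that all cells of row 1 come first, then row 2, and so on.
long <- long[order(long$rownum, long$columnnum), ]
rownames(long) <- NULL
```

The first two rows of long are then (1, 1, 2) and (1, 2, 1), as in the expected output.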
