R subset elements specifying column and row of each element

How should I subset a matrix by specifying both the row and the column of each item? I'm currently using sapply, but I don't find that particularly elegant:
> mat <- data.frame(a=c(1,2,3),b=c(7,6,5))
> mat
  a b
1 1 7
2 2 6
3 3 5
> rowSel <- 1:3
> colSel <- c(1,2,1)
> sapply(rowSel,function(i){mat[i,colSel[i]]})
[1] 1 6 3

A shorter way:
mat[cbind(rowSel, colSel)]
#[1] 1 6 3
This uses indexing by a two-column matrix. The first column contains the row index, the second column contains the column index. Each row of the two-column matrix indexes one element of mat.
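As a quick check on the idea, here is a minimal sketch that stores the same values in a plain matrix and confirms that indexing with a two-column matrix returns the same elements as the sapply loop. The names m, idx, picked and looped are introduced here for illustration and are not from the question.

```r
# Same values as in the question, stored as a plain numeric matrix.
# `m` is a name introduced for this sketch.
m <- matrix(c(1, 2, 3, 7, 6, 5), ncol = 2)
rowSel <- 1:3
colSel <- c(1, 2, 1)

# Each row of idx is one (row, column) pair.
idx <- cbind(rowSel, colSel)
picked <- m[idx]

# Element-by-element lookup, for comparison.
looped <- sapply(rowSel, function(i) m[i, colSel[i]])

print(picked)
stopifnot(all(picked == looped))
```

The matrix form does a single vectorised lookup instead of one function call per element.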

Related

Column-wise element selection in R

I need to select elements of a data frame using row indices stored in a vector. In other words, I have a vector of length equal to the number of columns in the data frame, and that vector contains the row numbers of the elements I need to extract (one element per column, in order).
How would I go about doing this?
Example:
vec <- c(1,2,1)
df <- data.frame(matrix(1:6, ncol = 3, nrow = 2))
That would look like this:
  X1 X2 X3
1  1  3  5
2  2  4  6
And I would need to get elements (1,4,5) using the indices from vec = 1,2,1
We can use:
df[cbind(vec, 1:ncol(df))]
#[1] 1 4 5
Using cbind, we create a row and column index to subset values from df.
cbind(vec, 1:ncol(df))
#     vec
#[1,]   1 1
#[2,]   2 2
#[3,]   1 3
Using this matrix, we subset the values at (row 1, column 1), (row 2, column 2) and (row 1, column 3).
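One detail may matter in practice: indexing a data frame with a matrix goes through a matrix conversion, so with mixed column types the values come back as character. The sketch below compares the matrix index with a column-by-column alternative using mapply. The names by_matrix, by_column, df2 and mixed are made up for this illustration.

```r
vec <- c(1, 2, 1)
df <- data.frame(matrix(1:6, ncol = 3, nrow = 2))

# Matrix indexing: one (row, column) pair per row of the index matrix.
by_matrix <- df[cbind(vec, seq_len(ncol(df)))]

# Column-by-column alternative: take element vec[j] from column j.
by_column <- mapply(function(column, row) column[row], df, vec)

stopifnot(all(by_matrix == unname(by_column)))

# With mixed column types (df2 is a made-up example), matrix indexing
# returns character values because the data frame is converted to a matrix first.
df2 <- data.frame(num = 1:2, chr = c("a", "b"))
mixed <- df2[cbind(c(1, 2), c(1, 2))]
stopifnot(is.character(mixed))
```

If the columns have different types and those types need to be kept, extracting column by column into a list (for example with Map instead of mapply) is probably the safer route.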

R, conditional summing of every second cell in each row

I have a data frame and want for each row the sum of every second cell (beginning with the second cell), whose left neighbor is greater than zero. Here's an example:
a <- c(-2,1,1,-2)
b <- c(1,2,3,4)
c <- c(-2,1,-1,2)
d <- c(5,6,7,8)
df <- data.frame(a,b,c,d)
This gives:
> df
   a b  c d
1 -2 1 -2 5
2  1 2  1 6
3  1 3 -1 7
4 -2 4  2 8
For the first row the correct sum is 0 (the left neighbor of 1 is -2 and the left neighbor of 5 is also -2); for the second it's 8; for the third it's 3; for the fourth it's again 8.
I want to do it without loops, so I tried it with sum() and which() like in Conditional Sum in R, but could not find a way through.
We split the data frame into alternating columns. The recycled logical vector c(TRUE, FALSE) selects the 1st, 3rd, ... columns, and comparing that subset with 0 (> 0) gives a logical matrix. We multiply it by the other subset, the 2nd, 4th, ... columns, selected with the recycled vector c(FALSE, TRUE). Wherever the left-hand value is not greater than 0, the logical matrix holds FALSE, which is coerced to 0 in the multiplication, so the corresponding right-hand value drops out. Finally, rowSums gives the expected output:
rowSums((df[c(TRUE, FALSE)]>0)*df[c(FALSE, TRUE)])
#[1] 0 8 3 8
The recycled logical vectors can also be replaced with explicit column positions from seq:
rowSums((df[seq(1, ncol(df), by = 2)]>0)*df[seq(2, ncol(df), by = 2)])
#[1] 0 8 3 8
Another option is Reduce with Map:
Reduce(`+`, Map(`*`, lapply(df[c(TRUE, FALSE)], `>`, 0), df[c(FALSE, TRUE)]))
#[1] 0 8 3 8
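To make the one-liner easier to follow, here is a sketch that performs the same computation in separate steps. The intermediate names left, right, mask, kept and result are introduced here for illustration.

```r
df <- data.frame(a = c(-2, 1, 1, -2),
                 b = c(1, 2, 3, 4),
                 c = c(-2, 1, -1, 2),
                 d = c(5, 6, 7, 8))

left  <- as.matrix(df[c(TRUE, FALSE)])  # columns a and c (the left neighbours)
right <- as.matrix(df[c(FALSE, TRUE)])  # columns b and d (the values to sum)

mask <- left > 0      # TRUE where the left neighbour is positive
kept <- right * mask  # FALSE acts as 0, TRUE as 1
result <- rowSums(kept)

print(result)
```

The result should match the 0 8 3 8 shown above. This sketch assumes an even number of columns; with an odd number the two subsets would have different widths and the multiplication would fail.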

How to set data frame column names to the column index

It is a very basic question. How can you set the column names of a data frame to the column index? So if you have 4 columns, the column names will be 1 2 3 4. The data frame I am using can have up to 100 columns.
It is not a good idea to use column names that start with a digit. Suppose we set the names to seq_along(D). Extracting a column then becomes unnecessarily complicated. For example,
names(D) <- seq_along(D)
D$1
#Error: unexpected numeric constant in "D$1"
In that case, we need backticks or quotes:
D$"1"
#[1] 1 2 3
D$`1`
#[1] 1 2 3
However, [[ with the name as a string works:
D[["1"]]
#[1] 1 2 3
I would use
names(D) <- paste0("Col", seq_along(D))
D$Col1
#[1] 1 2 3
Or
D[["Col1"]]
#[1] 1 2 3
data
D <- data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
Just use names:
D <- data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
names(D) <- 1:ncol(D) # sequence from 1 through the number of columns
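The following sketch illustrates two points from the answers above: numeric indices assigned as names are stored as character strings, and base R's make.names() can turn them into syntactically valid names. The variable syntactic is a name made up for this example.

```r
D <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), d = c(10, 11, 12))

names(D) <- seq_along(D)
# The names are stored as character strings, not numbers.
stopifnot(identical(names(D), c("1", "2", "3", "4")))

# Lookup by name (character) and by position (numeric) reach the same column here.
stopifnot(identical(D[["2"]], D[[2]]))

# make.names() turns the indices into syntactically valid names.
syntactic <- make.names(seq_along(D))
print(syntactic)
```

Names produced this way can be used with $ without backticks, much like the paste0("Col", ...) names suggested above.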

Multiply columns with rows by matching column name and row name in R

I have a data frame which looks like this
> data
  A B
1 1 2
2 2 1
I have a reference data frame which looks like this
> ref
  Names Values
1     A      5
2     B     10
I want to multiply each column by the value in the row of ref that has the same name.
The result should be this:
> result
   A  B
1  5 20
2 10 10
What is the fastest way to achieve this in R? Any help would be greatly appreciated
We can match the column names of 'data' against the 'Names' column of 'ref', use the resulting numeric index to pick the corresponding 'Values', and then replicate those values with col(data) so that each column is multiplied by its own value:
data*ref$Values[match(names(data), ref$Names)][col(data)]
#   A  B
#1  5 20
#2 10 10
If you use Names for rownames instead in ref, you could do something like this as well
rownames(ref) <- ref$Names #assign rownames
ref$Names <- NULL #drop col
i <- intersect(rownames(ref), colnames(data)) #find intersect
mapply(`*`, ref[i, ], data[, i]) #perform multiplication
#     [,1] [,2]
#[1,]    5   20
#[2,]   10   10
This should work:
for (n in seq_along(names(data))) {
  # multiply column n by the ref value whose name matches that column
  data[, n] <- data[, n] * ref$Values[which(ref$Names == names(data)[n])]
}
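As an illustration of why matching by name matters, here is a sketch in which the rows of ref are deliberately listed in a different order from the columns of data. It uses Map rather than the col() trick; this is a substituted technique, not one of the answers above, and the names multipliers and result are introduced for the example.

```r
data <- data.frame(A = c(1, 2), B = c(2, 1))
# ref rows deliberately listed in a different order than the columns of data
ref <- data.frame(Names = c("B", "A"), Values = c(10, 5))

# One multiplier per column of data, in column order.
multipliers <- ref$Values[match(names(data), ref$Names)]

# Multiply column j by multipliers[j]; Map keeps the column names.
result <- as.data.frame(Map(`*`, data, multipliers))
print(result)
```

Because match() looks each column name up in ref$Names, the result does not depend on the row order of ref.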

Extract every n-th + x subsequent elements from a vector

I would like to create a vector in which each element is the n-th element plus the x following elements of another vector.
For example, if I have the vector a:
a <- c(1,2,3,4,5,6,7,8,9,10)
My new vector b should have the elements
b <- c(1,2,5,6,9,10)
that is, the first pair of elements, the third pair, and so on.
Any help would be much appreciated!
Logical indexing with recycling easily does this:
a <- c(1,2,3,4,5,6,7,8,9,10)
a[c(TRUE, TRUE, FALSE, FALSE)]
## [1] 1 2 5 6 9 10
From your comment to the question:
n <- 4
x <- 2
a[c(rep(TRUE, n - x), rep(FALSE, x))]
## [1] 1 2 5 6 9 10
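The recycling idea can be wrapped in a small function. The sketch below uses a hypothetical helper, take_blocks, with explicit keep and skip arguments (these names are not from the question), and cross-checks it against an equivalent selection written with modular arithmetic.

```r
# Hypothetical helper: keep `keep` elements, then skip `skip`, and repeat.
take_blocks <- function(v, keep, skip) {
  pattern <- rep(c(TRUE, FALSE), times = c(keep, skip))
  v[pattern]  # the logical pattern is recycled along v
}

a <- 1:10
b <- take_blocks(a, keep = 2, skip = 2)

# Same selection written with modular arithmetic instead of recycling.
b_mod <- a[(seq_along(a) - 1) %% 4 < 2]

stopifnot(identical(b, b_mod))
print(b)
```

The helper assumes v is at least as long as keep + skip; the length of v does not need to be a multiple of the pattern length.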

Resources