Get column index from label in a data frame - r

Say we have the following data frame:
> df
  A B C
1 1 2 3
2 4 5 6
3 7 8 9
We can select column 'B' from its index:
> df[,2]
[1] 2 5 8
Is there a way to get the index (2) from the column label ('B')?

You can get the index via grep and colnames:
grep("B", colnames(df))
[1] 2
or use
grep("^B$", colnames(df))
[1] 2
to get only the column named exactly "B", excluding names that merely contain a B, e.g. "ABC".

The following will do it:
which(colnames(df)=="B")
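To see how the approaches above differ, here is a small sketch. The data frame `demo` with its extra column `ABC` is a made-up example and not part of the question.

```r
# Hypothetical data frame with a column whose name merely contains "B"
demo <- data.frame(A = 1:3, B = 4:6, ABC = 7:9)

loose  <- grep("B", colnames(demo))      # unanchored pattern: matches "B" and "ABC"
strict <- grep("^B$", colnames(demo))    # anchored pattern: matches only "B"
exact  <- which(colnames(demo) == "B")   # plain string comparison, no regular expression

loose    # 2 3
strict   # 2
exact    # 2
```

The exact comparison avoids regular expressions entirely, so it is also safe for column names containing characters such as "." or "(".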

I wanted to see all the indices for the colnames because I needed to do a complicated column rearrangement, so I printed the colnames as a dataframe. The rownames are the indices.
as.data.frame(colnames(df))
1 A
2 B
3 C

Following on from chimeric's answer above:
To get all the column indices in the df, I used:
which(!names(df)%in%c())
or store them in a vector:
indexLst<-which(!names(df)%in%c())
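As a possibly simpler alternative (a sketch, not the answerer's method), seq_along() returns the same indices because the length of a data frame is its number of columns. The data frame below just rebuilds the one from the question.

```r
df <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9))

idx_which <- which(!names(df) %in% c())   # the approach shown above
idx_seq   <- seq_along(df)                # one index per column

# Optionally label each index with its column name
idx_named <- setNames(seq_along(df), names(df))
idx_named
```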

This seems to be an efficient way to list vars with column number:
cbind(names(df))
Output:
[,1]
[1,] "A"
[2,] "B"
[3,] "C"
Sometimes I like to copy variables with position into my code so I use this function:
varnums<- function(x) {w=as.data.frame(c(1:length(colnames(x))),
paste0('# ',colnames(x)))
names(w)= c("# Var/Pos")
w}
varnums(df)
Output:
# Var/Pos
# A 1
# B 2
# C 3

match("B", names(df))
This also works if you have a vector of names.

To generalize @NPE's answer slightly:
which(colnames(dat) %in% var)
where var is of the form
c("colname1","colname2",...,"colnamen")
returns the indices of whichever column names one needs.
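The match() and which(... %in% ...) approaches behave differently when the requested names are out of order or missing. A minimal sketch, assuming a three-column data frame and a made-up vector `wanted` that deliberately includes a name that is not a column:

```r
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)
wanted <- c("C", "A", "Z")   # "Z" is deliberately not a column of df

by_match <- match(wanted, names(df))       # keeps the order of `wanted`, NA for missing names
by_which <- which(names(df) %in% wanted)   # follows column order, missing names dropped silently

by_match   # 3 1 NA
by_which   # 1 3
```

Which behaviour is preferable depends on whether you need the indices in the order you asked for them and whether a missing name should be visible as NA.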

Use t function:
t(colnames(df))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "var1" "var2" "var3" "var4" "var5" "var6"

Here is an answer that will generalize Henrik's answer.
df=data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100))
numeric_columns<-c('A', 'B', 'C')
numeric_index<-sapply(1:length(numeric_columns), function(i)
grep(numeric_columns[i], colnames(df)))
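One thing to watch with the grep-based loop is that each name is treated as a pattern, so a name can match several columns. The sketch below uses a made-up data frame with an `AB` column to show this, and match() as one possible exact alternative.

```r
# Hypothetical data frame in which the column "AB" contains both letters
df <- data.frame(A = rnorm(5), B = rnorm(5), AB = rnorm(5))
numeric_columns <- c("A", "B")

# Pattern matching: "A" hits columns 1 and 3, "B" hits columns 2 and 3
by_grep <- lapply(numeric_columns, function(p) grep(p, colnames(df)))

# Exact matching: one index per requested name
by_match <- match(numeric_columns, colnames(df))

by_grep
by_match   # 1 2
```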

#I wanted the column index instead of the column name. This line of code worked for me:
which (data.frame (colnames (datE)) == colnames (datE[c(1:15)]), arr.ind = T)[,1]
#with datE being a regular dataframe with 15 columns (variables)
data.frame(colnames(datE))
#> colnames.datE.
#> 1 Ce
#> 2 Eu
#> 3 La
#> 4 Pr
#> 5 Nd
#> 6 Sm
#> 7 Gd
#> 8 Tb
#> 9 Dy
#> 10 Ho
#> 11 Er
#> 12 Y
#> 13 Tm
#> 14 Yb
#> 15 Lu
which(data.frame(colnames(datE))==colnames(datE[c(1:15)]),arr.ind=T)[,1]
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Related

Extract specific elements in a matrix with a character vector in R

I want to extract specific elements column-wise from the matrix A, using a character vector B whose elements are row names of the matrix, such as:
A <- matrix(seq(1,12),ncol=4)
rownames(A) <- letters[1:3]
A
  [,1] [,2] [,3] [,4]
a    1    4    7   10
b    2    5    8   11
c    3    6    9   12
B <- c("a","c","c","b")
I want to get 1,6,9,11. Thanks :)
Two possible ways:
> A[cbind(match(B, rownames(A)), seq_len(ncol(A)))]
[1] 1 6 9 11
>
> diag(A[B, seq_along(B)]) # or diag(A[B, seq_len(ncol(A))])
[1] 1 6 9 11
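The first approach relies on indexing a matrix with a two-column matrix of (row, column) pairs. A short sketch that spells out the intermediate index matrix; the name `idx` is only illustrative:

```r
A <- matrix(1:12, ncol = 4)
rownames(A) <- letters[1:3]
B <- c("a", "c", "c", "b")

# One (row, column) pair per element of B: (1,1), (3,2), (3,3), (2,4)
idx <- cbind(row = match(B, rownames(A)), col = seq_along(B))

picked <- A[idx]   # indexing with a two-column matrix selects one element per pair
picked             # 1 6 9 11
```

Unlike diag(A[B, ...]), this never builds the intermediate length(B) x length(B) matrix, which may matter for large inputs.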

Filtering data in R with same ID and determining rows which are in both data frames and which rows are not in both data frames

Does anyone know another method for filtering data when the same ID (column X) appears twice in a data frame but with a different associated value (column Y)?
Basically I want to know which rows are in both data frames, and after that I want to know which row is not in both data frames (actually I want the values of X and Y for this particular row).
Thank you in advance for your help!
> x <- seq(1:10)
> x[5] <- 4
> y <- (seq.int(1,19,2))
>
> x<- cbind(x,y)
> x
x y
[1,] 1 1
[2,] 2 3
[3,] 3 5
[4,] 4 7
[5,] 4 9
[6,] 6 11
[7,] 7 13
[8,] 8 15
[9,] 9 17
[10,] 10 19
>
> z <- x[1:4,]
> y <- x[6:10,]
>
> z <- rbind(z,y)
> z
x y
[1,] 1 1
[2,] 2 3
[3,] 3 5
[4,] 4 7
[5,] 6 11
[6,] 7 13
[7,] 8 15
[8,] 9 17
[9,] 10 19
>
> df1 <- z[z[,1] %in% x[,1]]
>
> matrix(df1,9,2) # As expected I'm getting 9 rows
[,1] [,2]
[1,] 1 1
[2,] 2 3
[3,] 3 5
[4,] 4 7
[5,] 6 11
[6,] 7 13
[7,] 8 15
[8,] 9 17
[9,] 10 19
>
> # Now I want to know what is the value inside the missing row
> df2 <- z[!z[,1] %in% x[,1]]
>
> matrix(df2,1,2) # I'm getting NA and NA, but I was expecting the values 4 and 9
[,1] [,2]
[1,] NA NA
To use @hansjaneinvielleicht's method:
xlist <- paste(x[,1], x[,2])
zlist <- paste(z[,1], z[,2])
setdiff(xlist, zlist)
# [1] "4 9"
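What you're doing here is filtering for values of z[,1] that are not present in x[,1]. However, since 4 is in there, it's also filtered out.
Instead, I assume you'd probably want to work with the setdiff method from dplyr, which compares whole rows of data frames (see the doc here).
Since x and z are matrices here, convert them first, then use df2 <- setdiff(as.data.frame(x), as.data.frame(z))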
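For readers without dplyr, the same idea (compare whole rows rather than IDs only) can be sketched in base R with a pasted row key. The objects below rebuild the question's x and z; `key_x`, `key_z` and `missing_rows` are illustrative names.

```r
x <- cbind(x = c(1, 2, 3, 4, 4, 6, 7, 8, 9, 10), y = seq(1, 19, by = 2))
z <- x[-5, ]   # same as x but without the row (4, 9)

# Comparing IDs only: every ID of x also occurs in z, so nothing is flagged
id_only <- any(!x[, "x"] %in% z[, "x"])   # FALSE

# Comparing whole rows through a combined key
key_x <- paste(x[, "x"], x[, "y"])
key_z <- paste(z[, "x"], z[, "y"])
missing_rows <- x[!key_x %in% key_z, , drop = FALSE]   # keep matrix shape
missing_rows
```

The `drop = FALSE` keeps the result a matrix even when only one row is missing, so no reshaping with matrix() is needed afterwards.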
I am using a cumulative count here to add another key that distinguishes the duplicated values in x[,1]:
v=ave(x[,1]==x[,1], x[,1], FUN=cumsum)
t=ave(z[,1]==z[,1], z[,1], FUN=cumsum)
df2 <- x[!paste(x[,1],v) %in% paste(z[,1],t)]
matrix(df2,1,2)
[,1] [,2]
[1,] 4 9
x <- data.frame(x)
z <- data.frame(z)
x$from <- "x"
z$from <- "z"
df2 <- merge(x, z, by = c("x", "y"), all.x = T)
df2
# x y from.x from.y
# 1 1 1 x z
# 2 2 3 x z
# 3 3 5 x z
# 4 4 7 x z
# 5 4 9 x <NA>
# 6 6 11 x z
# 7 7 13 x z
# 8 8 15 x z
# 9 9 17 x z
# 10 10 19 x z
df2 <- df2[is.na(df2$from.y),]
df2
# x y from.x from.y
# 5 4 9 x <NA>
My real problem was not the one posted, since it was too complicated.
Basically, I was not able to apply any of the solutions to my real problem, since my real data frames contained all data types and had a lot of columns.
But I was able to find a solution that works for my real problem and also for the problem posted in the question, so I am posting the answer that solved my real problem in case it is useful for someone!
> dup <- which(duplicated(x[,1]) == TRUE)
> ans <- matrix(x[dup,],1,2)
> ans
[,1] [,2]
[1,] 4 9
> # I'm doing this in case the answer was not NA in df2 at the previous step, without
# providing the row "missing"
> df2 <- rbind(df2, ans)
> df2
[,1] [,2]
[1,] 4 9

Getting all the combination of numbers from a list that would sum to a specific number

I have the following list of numbers (1,3,4,5,7,9,10,12,15) and I want to find out all the possible combinations of 3 numbers from this list that would sum to 20.
My research on stackoverflow has led me to this post:
Finding all possible combinations of numbers to reach a given sum
There is a solution provided by Mark which reads as follows:
subset_sum = function(numbers,target,partial=0){
if(any(is.na(partial))) return()
s = sum(partial)
if(s == target) print(sprintf("sum(%s)=%s",paste(partial[-1],collapse="+"),target))
if(s > target) return()
for( i in seq_along(numbers)){
n = numbers[i]
remaining = numbers[(i+1):length(numbers)]
subset_sum(remaining,target,c(partial,n))
}
}
However, I am having a hard time trying to tweak this code to match my problem. Or maybe there is a simpler solution?
I want the output in R to show me the list of numbers.
Any help would be appreciated.
You can use the combn function and filter the result to meet your criteria. I have performed the calculation below in two steps, but it can also be done in a single step.
v <- c(1,3,4,5,7,9,10,12,15)
AllComb <- combn(v, 3) #generates all combination taking 3 at a time.
PossibleComb <- AllComb[,colSums(AllComb) == 20] #filter those with sum == 20
#Result: 6 sets of 3 numbers (column-wise)
PossibleComb
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 1 1 3 3 4
# [2,] 4 7 9 5 7 7
# [3,] 15 12 10 12 10 9
#
# Result in list
split(PossibleComb, col(PossibleComb))
# $`1`
# [1] 1 4 15
#
# $`2`
# [1] 1 7 12
#
# $`3`
# [1] 1 9 10
#
# $`4`
# [1] 3 5 12
#
# $`5`
# [1] 3 7 10
#
# $`6`
# [1] 4 7 9
combn also has a FUN parameter, which we can set to list so that the output is a list, and then Filter the list elements based on the condition:
Filter(function(x) sum(x) == 20, combn(v, 3, FUN = list))
#[[1]]
#[1] 1 4 15
#[[2]]
#[1] 1 7 12
#[[3]]
#[1] 1 9 10
#[[4]]
#[1] 3 5 12
#[[5]]
#[1] 3 7 10
#[[6]]
#[1] 4 7 9
data
v <- c(1,3,4,5,7,9,10,12,15)
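The two answers above can be wrapped into a small reusable function. This is only a sketch: `combos_with_sum` and its arguments are hypothetical names, and it assumes `v` has more than one element (combn treats a single positive integer as seq_len of that number).

```r
# Hypothetical helper: all k-element combinations of v whose sum equals target
combos_with_sum <- function(v, k, target) {
  combos <- combn(v, k)                                       # one combination per column
  hits <- combos[, colSums(combos) == target, drop = FALSE]   # keep matrix shape for 0 or 1 hits
  lapply(seq_len(ncol(hits)), function(j) hits[, j])          # return a list of vectors
}

v <- c(1, 3, 4, 5, 7, 9, 10, 12, 15)
res <- combos_with_sum(v, k = 3, target = 20)
length(res)   # 6
res[[1]]      # 1 4 15
```

Note that combn enumerates every combination before filtering, so this is fine for short vectors like the one in the question but grows quickly with the length of `v`.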

Create a specific number of vectors for a list

I need to create a list of "N" vectors of length "L" that begin at number "B". If I specify N=3, L=4 and B=5, I would need a list of the following three vectors:
5, 6, 7, 8
9, 10, 11, 12
13, 14, 15, 16
I can do it manually one by one, but I sometimes have 20 or 30 vectors to create, always with different lengths.
I would appreciate it if someone could give me a hand with this.
Cheers
Carlos
If you are happy with matrix as an output...
N <- 3
L <- 4
B <- 5
x <- seq(from = B, to = B + N * L - 1)
y <- matrix(x, nrow = N, byrow = TRUE)
y
# [,1] [,2] [,3] [,4]
# [1,] 5 6 7 8
# [2,] 9 10 11 12
# [3,] 13 14 15 16
Taking the matrix to list via transposition and data.frame...
as.list(as.data.frame(t(y)))
# $V1
# [1] 5 6 7 8
#
# $V2
# [1] 9 10 11 12
#
# $V3
# [1] 13 14 15 16
I'm showing it this way partly because I've never liked the coercion of numbers to column names; there are certainly other ways to handle that. The transposition can be skipped if you set y <- matrix(x, nrow = L) instead, and you can drop the as.list because a data.frame is technically already a list.
as.data.frame(y)
# V1 V2 V3
# 1 5 9 13
# 2 6 10 14
# 3 7 11 15
# 4 8 12 16
You can use split() to get a list output.
split(seq(B, B + L*N - 1), (1:(L*N)-1) %/% L)
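The split() idea can be sketched as a small function that puts every L consecutive values into one group. The name `make_vectors` is hypothetical, and the grouping vector is built with rep() rather than integer division to make the intent explicit.

```r
# Hypothetical helper: N consecutive vectors of length L, starting at B
make_vectors <- function(N, L, B) {
  values <- seq(from = B, length.out = N * L)   # B, B+1, ..., B + N*L - 1
  groups <- rep(seq_len(N), each = L)           # 1,1,...,1, 2,2,...,2, ... (L copies each)
  unname(split(values, groups))                 # list of N vectors
}

res <- make_vectors(N = 3, L = 4, B = 5)
res
```

For N = 3, L = 4 and B = 5 this gives the three vectors 5 to 8, 9 to 12 and 13 to 16 from the question.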

Extract elements from a vector within lists

I have a list of length 30000 and each list element contains one vector of length 6.
Example (with a length of just 2):
trainLists <- list(c(1,2,3,4,5,6),c(7,8,9,10,11,12))
I want to "flatten" these lists into a dataframe and create 6 factors (one corresponding to each of the elements in the vectors in the list).
Thus, the result would be:
I can accomplish this with a loop such as
for (i in 1:length(trainLists)) {
factor1 [i] <- trainLists[[i]][1]
factor2 [i] <- trainLists[[i]][2]
factor3 [i] <- trainLists[[i]][3]
factor4 [i] <- trainLists[[i]][4]
factor5 [i] <- trainLists[[i]][5]
factor6 [i] <- trainLists[[i]][6]
}
but it is horribly slow. How best to accomplish this?
As noted in the comments, most of what you want to do is achieved with a simple do.call(rbind, ...), like this:
> trainLists <- list(c(1,2,3,4,5,6),c(7,8,9,10,11,12))
> trainLists
[[1]]
[1] 1 2 3 4 5 6
[[2]]
[1] 7 8 9 10 11 12
> do.call(rbind, trainLists)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
Taking things a few steps forward, you can do something like this:
cbind(example = seq_along(trainLists),
setNames(data.frame(do.call(rbind, trainLists)),
paste0("Factor_", sequence(
max(sapply(trainLists, length))))))
# example Factor_1 Factor_2 Factor_3 Factor_4 Factor_5 Factor_6
# 1 1 1 2 3 4 5 6
# 2 2 7 8 9 10 11 12
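If every vector in the list is known to have the same length (6 here, as stated in the question), another option is to unlist once and reshape. This is a sketch under that assumption, not the answerer's method; `out` is an illustrative name.

```r
trainLists <- list(c(1, 2, 3, 4, 5, 6), c(7, 8, 9, 10, 11, 12))

# unlist() concatenates the vectors; byrow = TRUE puts each original vector on its own row
m <- matrix(unlist(trainLists), ncol = 6, byrow = TRUE)

out <- as.data.frame(m)
names(out) <- paste0("Factor_", seq_len(ncol(out)))
out
```

If the vectors could differ in length, the reshaping would silently misalign values, so checking lengths(trainLists) first is a sensible precaution.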

Resources