Say we have the following data frame:
> df
A B C
1 1 2 3
2 4 5 6
3 7 8 9
We can select column 'B' from its index:
> df[,2]
[1] 2 5 8
Is there a way to get the index (2) from the column label ('B')?
You can get the index via grep and colnames:
grep("B", colnames(df))
[1] 2
or use
grep("^B$", colnames(df))
[1] 2
to match only the columns named exactly "B", excluding names that merely contain a B, e.g. "ABC".
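A minimal sketch of the difference, assuming a data frame that also has a column named "ABC" (that column is made up for illustration and is not in the question's data):

```r
# Hypothetical data frame with a column whose name merely contains "B"
df <- data.frame(A = 1:3, B = 4:6, ABC = 7:9)

partial_idx <- grep("B", colnames(df))     # matches "B" and "ABC"
exact_idx   <- grep("^B$", colnames(df))   # anchored: matches only "B"
equal_idx   <- which(colnames(df) == "B")  # exact comparison, no regex involved

partial_idx  # 2 3
exact_idx    # 2
equal_idx    # 2
```

The exact comparison avoids regular expressions entirely, which may be safer if column names contain characters such as "." or "(".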
The following will do it:
which(colnames(df)=="B")
I wanted to see all the indices for the colnames because I needed to do a complicated column rearrangement, so I printed the colnames as a dataframe. The rownames are the indices.
as.data.frame(colnames(df))
1 A
2 B
3 C
Following on from chimeric's answer above:
To get all the column indices in the data frame, I used:
which(!names(df)%in%c())
or store them in a vector:
indexLst<-which(!names(df)%in%c())
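If nothing needs to be excluded, seq_along() seems to give the same indices more directly. The sketch below also builds a name-to-index lookup; the variable names are only illustrative, and the data frame is the one from the question:

```r
df <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9))

# All column indices, equivalent to which(!names(df) %in% c())
all_idx <- seq_along(df)

# Named integer vector: look up an index by column label
idx_by_name <- setNames(seq_along(df), names(df))
idx_by_name[["B"]]  # 2

# The which(!... %in% ...) form becomes useful once some names are excluded
keep_idx <- which(!names(df) %in% c("A"))
keep_idx  # 2 3
```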
This seems to be an efficient way to list vars with column number:
cbind(names(df))
Output:
[,1]
[1,] "A"
[2,] "B"
[3,] "C"
Sometimes I like to copy variables with position into my code so I use this function:
varnums<- function(x) {w=as.data.frame(c(1:length(colnames(x))),
paste0('# ',colnames(x)))
names(w)= c("# Var/Pos")
w}
varnums(df)
Output:
# Var/Pos
# A 1
# B 2
# C 3
match("B", names(df))
Can work also if you have a vector of names.
To generalize @NPE's answer slightly:
which(colnames(dat) %in% var)
where var is of the form
c("colname1","colname2",...,"colnamen")
returns the indices of whichever column names one needs.
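A small sketch comparing the two vectorised lookups, match() and which() with %in%. The name "Z" is a made-up column that does not exist, included to show how each approach handles a missing name:

```r
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)
wanted <- c("C", "A", "Z")  # "Z" is deliberately absent from df

# match() keeps the order of 'wanted' and returns NA for missing names
match_idx <- match(wanted, names(df))

# which(... %in% ...) returns indices in column order and silently drops misses
in_idx <- which(names(df) %in% wanted)

match_idx  # 3 1 NA
in_idx     # 1 3
```

Which one fits depends on whether the order of the requested names matters and whether a missing name should be noticed.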
Use t function:
t(colnames(df))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "var1" "var2" "var3" "var4" "var5" "var6"
Here is an answer that will generalize Henrik's answer.
df=data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100))
numeric_columns<-c('A', 'B', 'C')
numeric_index<-sapply(1:length(numeric_columns), function(i)
grep(numeric_columns[i], colnames(df)))
#I wanted the column index instead of the column name. This line of code worked for me:
which (data.frame (colnames (datE)) == colnames (datE[c(1:15)]), arr.ind = T)[,1]
#with datE being a regular dataframe with 15 columns (variables)
data.frame(colnames(datE))
#> colnames.datE.
#> 1 Ce
#> 2 Eu
#> 3 La
#> 4 Pr
#> 5 Nd
#> 6 Sm
#> 7 Gd
#> 8 Tb
#> 9 Dy
#> 10 Ho
#> 11 Er
#> 12 Y
#> 13 Tm
#> 14 Yb
#> 15 Lu
which(data.frame(colnames(datE))==colnames(datE[c(1:15)]),arr.ind=T)[,1]
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I want to extract specific elements column-wise from the matrix A, using a character vector B whose elements are row names of the matrix, such as:
A <- matrix(seq(1,12),ncol=4)
rownames(A) <- letters[1:3]
A
[,1] [,2] [,3] [,4]
a 1 4 7 10
b 2 5 8 11
c 3 6 9 12
B <- c("a","c","c","b")
I want to get 1,6,9,11. Thanks :)
Two possible ways:
> A[cbind(match(B, rownames(A)), seq_len(ncol(A)))]
[1] 1 6 9 11
>
> diag(A[B, seq_along(B)]) # or diag(A[B, seq_len(ncol(A))])
[1] 1 6 9 11
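To unpack the first approach: indexing a matrix with a two-column numeric matrix picks one element per row of the index, read as (row, column) pairs. A sketch with the intermediate step spelled out (the names idx and picked are only for illustration):

```r
A <- matrix(1:12, ncol = 4)
rownames(A) <- letters[1:3]
B <- c("a", "c", "c", "b")

# One (row, column) pair per column of A
idx <- cbind(match(B, rownames(A)), seq_len(ncol(A)))
idx
#      [,1] [,2]
# [1,]    1    1
# [2,]    3    2
# [3,]    3    3
# [4,]    2    4

picked <- A[idx]
picked  # 1 6 9 11
```

The diag() version builds the full 4 x 4 submatrix A[B, ] first and then keeps its diagonal, so the index-matrix form is probably the lighter of the two for large inputs.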
Does anyone know another method for filtering data when the same ID (column X) appears twice in a data frame but with a different associated value (column Y)?
Basically, I want to know which rows are in both data frames, and then which row is not in both (I actually want the X and Y values of that particular row).
Thank you in advance for your help!
> x <- seq(1:10)
> x[5] <- 4
> y <- (seq.int(1,19,2))
>
> x<- cbind(x,y)
> x
x y
[1,] 1 1
[2,] 2 3
[3,] 3 5
[4,] 4 7
[5,] 4 9
[6,] 6 11
[7,] 7 13
[8,] 8 15
[9,] 9 17
[10,] 10 19
>
> z <- x[1:4,]
> y <- x[6:10,]
>
> z <- rbind(z,y)
> z
x y
[1,] 1 1
[2,] 2 3
[3,] 3 5
[4,] 4 7
[5,] 6 11
[6,] 7 13
[7,] 8 15
[8,] 9 17
[9,] 10 19
>
> df1 <- z[z[,1] %in% x[,1]]
>
> matrix(df1,9,2) # As expected I'm getting 9 rows
[,1] [,2]
[1,] 1 1
[2,] 2 3
[3,] 3 5
[4,] 4 7
[5,] 6 11
[6,] 7 13
[7,] 8 15
[8,] 9 17
[9,] 10 19
>
> # Now I want to know what is the value inside the missing row
> df2 <- z[!z[,1] %in% x[,1]]
>
> matrix(df2,1,2) # I'm getting NA and NA, but I was expecting the values 4 and 9
[,1] [,2]
[1,] NA NA
To use @hansjaneinvielleicht's method:
xlist <- paste(x[,1], x[,2])
zlist <- paste(z[,1], z[,2])
setdiff(xlist, zlist)
# [1] "4 9"
What you're doing here is filtering for values of z[,1] that are not present in x[,1]. However, since 4 is present in both, it is filtered out as well, so nothing is left.
Instead, I assume you want a row-wise set difference, such as setdiff from dplyr, which compares whole rows when given data frames.
After converting x and z to data frames, use df2 <- setdiff(x, z)
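To see why the column-wise filter comes back empty, and what a whole-row comparison does instead, here is a base-R sketch. It does not use dplyr; row_key is a hypothetical helper that pastes each row into one string, and the data are rebuilt to match the question:

```r
# Rebuild the question's matrices: x has the ID 4 twice, z lacks row 5 of x
x <- cbind(x = c(1, 2, 3, 4, 4, 6, 7, 8, 9, 10), y = seq(1, 19, 2))
z <- x[-5, ]

# Comparing only the first column: every ID of x also occurs in z
in_first_col <- x[, 1] %in% z[, 1]
all(in_first_col)  # TRUE, so negating it selects nothing

# Comparing whole rows: build one key per row, then look for unmatched keys
row_key <- function(m) paste(m[, 1], m[, 2])
missing_rows <- x[!row_key(x) %in% row_key(z), , drop = FALSE]
missing_rows
#      x y
# [1,] 4 9
```

Note the comma in x[rows, , drop = FALSE]: it keeps the result a matrix with both columns, rather than indexing the matrix as a plain vector.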
Here I use a cumulative count as an additional key to distinguish the duplicated values in x[,1]:
v=ave(x[,1]==x[,1], x[,1], FUN=cumsum)
t=ave(z[,1]==z[,1], z[,1], FUN=cumsum)
df2 <- x[!paste(x[,1],v) %in% paste(z[,1],t)]
matrix(df2,1,2)
[,1] [,2]
[1,] 4 9
x <- data.frame(x)
z <- data.frame(z)
x$from <- "x"
z$from <- "z"
df2 <- merge(x, z, by = c("x", "y"), all.x = T)
df2
# x y from.x from.y
# 1 1 1 x z
# 2 2 3 x z
# 3 3 5 x z
# 4 4 7 x z
# 5 4 9 x <NA>
# 6 6 11 x z
# 7 7 13 x z
# 8 8 15 x z
# 9 9 17 x z
# 10 10 19 x z
df2 <- df2[is.na(df2$from.y),]
df2
# x y from.x from.y
# 5 4 9 x <NA>
My real problem was more complicated than the one I posted.
I was not able to apply any of the solutions to it, since my real data frames contained all kinds of data types and had a lot of columns.
However, I found a solution that works for my real problem and also for the problem posted in the question, so I am posting it in case it is useful for someone!
> dup <- which(duplicated(x[,1]) == TRUE)
> ans <- matrix(x[dup,],1,2)
> ans
[,1] [,2]
[1,] 4 9
> # I'm doing this in case the answer was not NA in df2 at the previous step, without
# providing the row "missing"
> df2 <- rbind(df2, ans)
> df2
[,1] [,2]
[1,] 4 9
I have the following list of numbers (1,3,4,5,7,9,10,12,15) and I want to find out all the possible combinations of 3 numbers from this list that would sum to 20.
My research on stackoverflow has led me to this post:
Finding all possible combinations of numbers to reach a given sum
There is a solution provided by Mark, which is as follows:
subset_sum = function(numbers,target,partial=0){
if(any(is.na(partial))) return()
s = sum(partial)
if(s == target) print(sprintf("sum(%s)=%s",paste(partial[-1],collapse="+"),target))
if(s > target) return()
for( i in seq_along(numbers)){
n = numbers[i]
remaining = numbers[(i+1):length(numbers)]
subset_sum(remaining,target,c(partial,n))
}
}
However, I am having a hard time tweaking this code to match my problem. Or maybe there is a simpler solution?
I want the output in R to show me the list of numbers.
Any help would be appreciated.
You can use the combn function and filter the result to meet your criteria. I have done the calculation below in two steps, but it can be done in a single step too.
v <- c(1,3,4,5,7,9,10,12,15)
AllComb <- combn(v, 3) #generates all combination taking 3 at a time.
PossibleComb <- AllComb[,colSums(AllComb) == 20] #filter those with sum == 20
#Result: 6 sets of 3 numbers (column-wise)
PossibleComb
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 1 1 3 3 4
# [2,] 4 7 9 5 7 7
# [3,] 15 12 10 12 10 9
#
# Result in list
split(PossibleComb, col(PossibleComb))
# $`1`
# [1] 1 4 15
#
# $`2`
# [1] 1 7 12
#
# $`3`
# [1] 1 9 10
#
# $`4`
# [1] 3 5 12
#
# $`5`
# [1] 3 7 10
#
# $`6`
# [1] 4 7 9
combn also has a FUN parameter, which we can set to list so that the output is a list, and then Filter the list elements based on the condition
Filter(function(x) sum(x) == 20, combn(v, 3, FUN = list))
#[[1]]
#[1] 1 4 15
#[[2]]
#[1] 1 7 12
#[[3]]
#[1] 1 9 10
#[[4]]
#[1] 3 5 12
#[[5]]
#[1] 3 7 10
#[[6]]
#[1] 4 7 9
data
v <- c(1,3,4,5,7,9,10,12,15)
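Both answers can be wrapped in a small helper. This is only a sketch: combos_with_sum is a hypothetical name, and the equality test assumes whole numbers (with non-integer values a tolerance would probably be needed):

```r
# Return every k-element combination of 'values' whose sum equals 'target',
# one combination per row
combos_with_sum <- function(values, k, target) {
  combos <- combn(values, k)            # one combination per column
  keep <- colSums(combos) == target     # exact comparison: fine for integers
  t(combos[, keep, drop = FALSE])
}

v <- c(1, 3, 4, 5, 7, 9, 10, 12, 15)
res <- combos_with_sum(v, 3, 20)
nrow(res)     # 6
rowSums(res)  # all 20
```

Since combn enumerates every combination before filtering, this is fine for a list of nine numbers but may become slow for long inputs.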
I need to create a list of "N" vectors with a length "L" that begin in number "B" . If I specify that N=3, L=4 and B=5. I would need a list of the following three vectors.
5, 6, 7, 8
9, 10, 11, 12
13, 14, 15, 16
I can do it manually one by one but I have sometimes 20 or 30 vectors to create with always different lengths.
I would appreciate if someone could give me a hand with this.
Cheers
Carlos
If you are happy with matrix as an output...
N <- 3
L <- 4
B <- 5
x <- seq(from = B, to = B + N * L - 1)
y <- matrix(x, nrow = N, byrow = TRUE)
y
# [,1] [,2] [,3] [,4]
# [1,] 5 6 7 8
# [2,] 9 10 11 12
# [3,] 13 14 15 16
Taking the matrix to list via transposition and data.frame...
as.list(as.data.frame(t(y)))
# $V1
# [1] 5 6 7 8
#
# $V2
# [1] 9 10 11 12
#
# $V3
# [1] 13 14 15 16
I'm showing it this way partly because I've never liked the coercion of numbers to column names; there are certainly other ways to handle that. The transposition can be dropped if you set y <- matrix(x, nrow = L) instead, and so can as.list, because a data.frame is technically already a list.
as.data.frame(y)
# V1 V2 V3
# 1 5 9 13
# 2 6 10 14
# 3 7 11 15
# 4 8 12 16
You can use split() to get a list output.
split(seq(B, B + L*N - 1), (1:(L*N) - 1) %/% L)
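The split() idea could be wrapped in a reusable helper along these lines. This is a sketch: make_vectors is a hypothetical name, and the grouping index is built with rep(..., each = L) so that each group holds exactly L consecutive values:

```r
# N vectors of length L, counting up from B
make_vectors <- function(N, L, B) {
  values <- seq(from = B, length.out = N * L)
  groups <- rep(seq_len(N), each = L)   # 1,1,..,1, 2,2,..,2, ..., N
  unname(split(values, groups))
}

res <- make_vectors(N = 3, L = 4, B = 5)
res[[1]]  # 5 6 7 8
res[[2]]  # 9 10 11 12
res[[3]]  # 13 14 15 16
```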
I have a list of length 30000 and each list element contains one vector of length 6.
Example (with a length of just 2):
trainLists <- list(c(1,2,3,4,5,6),c(7,8,9,10,11,12))
I want to "flatten" these lists into a dataframe and create 6 factors (one corresponding to each of the elements in the vectors in the list).
Thus, the result would be:
I can accomplish this with a loop such as
for (i in 1:length(trainLists)) {
factor1 [i] <- trainLists[[i]][1]
factor2 [i] <- trainLists[[i]][2]
factor3 [i] <- trainLists[[i]][3]
factor4 [i] <- trainLists[[i]][4]
factor5 [i] <- trainLists[[i]][5]
factor6 [i] <- trainLists[[i]][6]
}
but it is horribly slow. How best to accomplish this?
As noted in the comments, most of what you want to do is achieved with a simple do.call(rbind, ...), like this:
> trainLists <- list(c(1,2,3,4,5,6),c(7,8,9,10,11,12))
> trainLists
[[1]]
[1] 1 2 3 4 5 6
[[2]]
[1] 7 8 9 10 11 12
> do.call(rbind, trainLists)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
Taking things a few steps forward, you can do something like this:
cbind(example = seq_along(trainLists),
setNames(data.frame(do.call(rbind, trainLists)),
paste0("Factor_", sequence(
max(sapply(trainLists, length))))))
# example Factor_1 Factor_2 Factor_3 Factor_4 Factor_5 Factor_6
# 1 1 1 2 3 4 5 6
# 2 2 7 8 9 10 11 12
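For a sense of how this behaves at roughly the size mentioned in the question, here is a sketch on 30000 simulated vectors. The random data and the names mat, mat2 and trainDf are made up for illustration; the second construction with unlist() is an alternative that assumes every element has exactly 6 values:

```r
set.seed(1)
# Simulated stand-in for the real list: 30000 vectors of length 6
trainLists <- replicate(30000, runif(6), simplify = FALSE)

# Bind all vectors as rows in one call instead of looping over elements
mat <- do.call(rbind, trainLists)

# Alternative: flatten and refill row by row (needs equal-length vectors)
mat2 <- matrix(unlist(trainLists), ncol = 6, byrow = TRUE)

trainDf <- as.data.frame(mat)
names(trainDf) <- paste0("Factor_", seq_len(ncol(trainDf)))
dim(trainDf)  # 30000 6
```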