Subsetting columns based on column name in R

I have a data frame with the following column names:
colnames(Data)
[1] "ID" "A" "B" "C" "D" "E" "F" "G"
I want to select column D and all columns after it.
Currently those are columns D, E, F and G, but more columns may be added later, both after D and before it, so I cannot rely on D being at a fixed position.
Is there a subset command in R that can do this? Something like the following:
Datanew <- subset(Data,select=c("D","E","F","G"))
Please advise.

Find which column is D and select all the following columns (using ncol):
columnToSelect <- which(names(Data) == "D"):ncol(Data)
Datanew <- subset(Data, select = columnToSelect)
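The same idea can be wrapped in a small function that also reports a missing column. This is a minimal sketch in base R; the helper name `select_from` and the sample data are assumptions made for illustration and do not come from the question.

```r
# Hypothetical helper: keep the named column and everything to its right.
select_from <- function(data, start) {
  pos <- match(start, names(data))  # position of the first matching name, NA if absent
  if (is.na(pos)) {
    stop("column '", start, "' not found")
  }
  data[, pos:ncol(data), drop = FALSE]  # drop = FALSE keeps a data frame even for one column
}

# Made-up sample data: the position of D is not fixed in advance
Data <- data.frame(ID = 1:3, A = 4:6, D = 7:9, E = 10:12, G = 13:15)
Datanew <- select_from(Data, "D")
names(Datanew)
```

Using `match()` rather than `which()` gives a single position (or `NA`), so the case where D is absent can be caught before indexing.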

You can use tail to get the last n names of the data frame once you find where column D is. We can utilize it like this
tail(1:5, 3) # return the last three elements
The following is equivalent
tail(1:5, -2) # don't return the first two elements
If we use which to find column D
columnToSelect <- which(names(Data) == "D")
We can use tail to get all of the columns from D and following.
tail(names(Data), -(columnToSelect - 1))
The column selection, then, can be wrapped up in one neat little call
Data[tail(names(Data), -(which(names(Data) == "D") - 1))]
A fully reproducible example:
Data <-
  lapply(LETTERS[1:10],
         function(l){
           x <- data.frame(l = rnorm(10))
           names(x) <- l
           x
         })
Data <- as.data.frame(Data)
Data[tail(names(Data), -(which(names(Data) == "D") - 1))]

Related

Reference R data frame column name as a string, given only the column name

I have a data frame df. It has a column named b. I know this column name, although I do not know its position in the data frame. I know that colnames(df) will give me a vector of character strings that are the names of all the columns, but I do not know how to get a string for this particular column. In other words, I want to obtain the string "b". How can I do that? I imagine this may involve the rlang package, which I have difficulty understanding.
Here's an example:
library(rlang)
library(tidyverse)
a <- c(1:8)
b <- c(23,34,45,43,32,45,68,78)
c <- c(0.34,0.56,0.97,0.33,-0.23,-0.36,-0.11,0.17)
df <- data.frame(a,b,c)
tf <- function(df,MYcol) {
print(paste0("The name of the input column is ",MYcol)) # does not work
print(paste0("The name of the input column is ",{{MYcol}})) # does not work
y <- {{MYcol}} # This gives the values in column b as it should
}
z <- tf(df,b) # Gives undesired values - I want the string "b"
If you cannot pass the column name to the function as a string directly (tf(df,"b")), you can use deparse + substitute.
tf <- function(df,MYcol) {
col <- deparse(substitute(MYcol))
print(paste0("The name of the input column is ",col))
return(col)
}
z <- tf(df,b)
#[1] "The name of the input column is b"
z
#[1] "b"
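One caveat with deparse + substitute is that a quoted argument such as "b" is deparsed with its quotes included. The sketch below handles both a bare name and a string; the helper name `col_name` and the existence check are assumptions added for illustration.

```r
# Hypothetical helper: return the column name whether it is passed bare or quoted.
col_name <- function(df, col) {
  expr <- substitute(col)  # the unevaluated argument
  # A quoted argument arrives as a character constant, a bare one as a symbol
  name <- if (is.character(expr)) expr else deparse(expr)
  if (!name %in% names(df)) {
    stop("no column named '", name, "' in the data frame")
  }
  name
}

df <- data.frame(a = 1:3, b = c(23, 34, 45))
bare   <- col_name(df, b)    # bare column name
quoted <- col_name(df, "b")  # column name as a string
```

Both calls are meant to yield the same string, so callers do not need to remember which form the function expects.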
We can use as_string with ensym
tf <- function(df, MYcol) {
mycol <- rlang::as_string(rlang::ensym(MYcol))
print(glue::glue("The name of the input column is {mycol}"))
return(mycol)
}
z <- tf(df,b)
The name of the input column is b
z
#[1] "b"

Find if a row contain a character and create a new column to label data

I have a data frame with one column, and I want to check every row for the character "#" in its text. Example data:
"https://example.com/test-ability/321#321"
"https://example.com/test-ability/"
"anothertext#"
"notwithwhatyousearch"
How can I check whether each row contains the character "#" and create a second column that labels the rows containing it with "A" and the rows without it with "B"?
Example of expected output:
"https://example.com/test-ability/321#321", "A"
"https://example.com/test-ability/", "B"
"anothertext#", "A"
"notwithwhatyousearch", "B"
df = data.frame(x = c("https://example.com/test-ability/321#321",
                      "https://example.com/test-ability/",
                      "anothertext#",
                      "notwithwhatyousearch"), stringsAsFactors = FALSE)
library(dplyr)
df %>% mutate(flag = ifelse(grepl("#", x), "A", "B"))
# x flag
# 1 https://example.com/test-ability/321#321 A
# 2 https://example.com/test-ability/ B
# 3 anothertext# A
# 4 notwithwhatyousearch B
Or a base R solution:
df$flag = ifelse(grepl("#", df$x), "A", "B")
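The base R version can be packaged as a small function. This is a sketch under stated assumptions: the helper name `label_hash` is made up, and `fixed = TRUE` is added so the pattern is matched literally. "#" is not a regular-expression metacharacter, so this matters only if the same approach is reused for characters such as "." or "+".

```r
# Hypothetical helper: "A" when the text contains a literal "#", otherwise "B".
label_hash <- function(x) {
  ifelse(grepl("#", x, fixed = TRUE), "A", "B")
}

df <- data.frame(x = c("https://example.com/test-ability/321#321",
                       "https://example.com/test-ability/",
                       "anothertext#",
                       "notwithwhatyousearch"),
                 stringsAsFactors = FALSE)
df$flag <- label_hash(df$x)
df$flag
```

Note that grepl() returns FALSE for missing values, so an NA entry would be labelled "B" by this sketch.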
Here is a Python version; I don't know much about R, but it may help as a reference:
ls = ["https://example.com/test-ability/321#321",
      "https://example.com/test-ability/",
      "anothertext#",
      "notwithwhatyousearch"]  # a plain list of strings (not a data frame)
for s in ls:  # iterate over the strings
    if '#' in s:
        print(s, ' A')
    else:
        print(s, ' B')

How to replace existing values by new values from look-up list without causing NA?

I have a data frame. One column contains the following values:
df$current_column=(A,B,C,D,E)
A vector contains a look up value:
v <- c(A=X, B=Y)
I want to replace the values in this column so that the result is (X, Y, C, D, E).
I am thinking of creating a new column like this:
df$new_column <- v[df$current_column]
It replaces A and B, but it also turns C, D and E into NA (X, Y, NA, NA, NA).
How can I keep C, D and E, or is there another way?
It looks like ifelse() could help:
df$current_column <- ifelse(df$current_column == "A", "X",
                            ifelse(df$current_column == "B", "Y", df$current_column))
We can create a logical index with %in% and then do the conversion
i1 <- df$current_column %in% names(v)
df$new_column <- df$current_column
df$new_column[i1] <- v[df$new_column[i1]]
df$new_column
#[1] "X" "Y" "C" "D" "E"
Or use a single ifelse
with(df, ifelse(current_column %in% names(v),
v[current_column], current_column))
Update
If 'current_column' is of class factor, convert it to character and it should work.
df$new_column <- as.character(df$current_column)
df$new_column[i1] <- v[df$new_column[i1]]
data
df <- data.frame(current_column = LETTERS[1:5],
stringsAsFactors=FALSE)
v <- setNames(c('X', 'Y'), LETTERS[1:2])
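The index-based replacement can also be written as a reusable function. The following is a minimal sketch, assuming the lookup is a named character vector like `v`; the function name `replace_from_lookup` is hypothetical.

```r
# Hypothetical helper: replace values found in the lookup, keep all others unchanged.
replace_from_lookup <- function(x, lookup) {
  x <- as.character(x)            # also handles factor input
  hit <- match(x, names(lookup))  # NA where x has no entry in the lookup
  found <- !is.na(hit)
  x[found] <- unname(lookup[hit[found]])
  x
}

v <- c(A = "X", B = "Y")
df <- data.frame(current_column = factor(LETTERS[1:5]))
df$new_column <- replace_from_lookup(df$current_column, v)
df$new_column
```

Because only the matched positions are overwritten, values without a lookup entry never pass through `v[...]` and so cannot become NA.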
user2029709,
I was working from your small example; for a more generic approach it would help to see a snippet of the real data or a close simulation. In any case, here is something that may work better for you, without manually coding every ifelse() option, and it is still a relatively straightforward solution:
vd <- data.frame(current_column = names(v), new_column = v, stringsAsFactors = FALSE)
df <- merge(df, vd, by = 'current_column', all.x = TRUE)
# keep the looked-up value where one exists, otherwise fall back to the original value
df$new_column <- ifelse(is.na(df$new_column), df$current_column, df$new_column)
You may have to modify data types when creating the vd data.frame to ensure a proper merge.
Best,
oleg

Keep quotes when creating data frame in R

I have a data frame df containing many columns. From these, I extract two (col1 and col2) and use df2 = data.frame(df$col1, df$col2) for this.
It works: a new dataframe made of those two columns is created. But df$col1 was made of strings as:
"test1"
"test2"
df2$col1 instead contains values (not sure what to call them) like:
test1
test2
The intersection of df$col1 and df2$col1 is empty. How do I keep the column as a string in the new data frame?
I tried adding stringsAsFactors = FALSE but nothing changed.
'df' is your data frame and you do not want to change the original data type, i.e., you want to keep the string data type.
So you should subset those columns from the original data frame instead of creating a new data frame with 'data.frame'.
> df2<-df[,c("col1","col2")]
You can check the data type of each column in data frame by
> str(df2)
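Your first data.frame has col1 set as character. When you create a second data.frame, this character column is coerced to factor (the default behaviour of data.frame() in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). Here's a possible short proof.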
> df1 <- data.frame(col1 = c("a", "b", "c"), col2 = 1:3)
> df1$col1
[1] a b c
Levels: a b c
> df1$col1 <- as.character(df1$col1)
> df1$col1
[1] "a" "b" "c" # this is what you have
>
> df2 <- data.frame(col1 = df1$col1)
> df2$col1
[1] a b c # coerced to factor
Levels: a b c
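Both ways of keeping the strings can be checked side by side. This is a sketch with made-up sample data (the values and the extra column `col3` are assumptions); it sets `stringsAsFactors = FALSE` explicitly so that it behaves the same regardless of the R version's default.

```r
# Made-up sample data with a character column
df <- data.frame(col1 = c("test1", "test2"),
                 col2 = 1:2,
                 col3 = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)

# Option 1: subset the original data frame, which keeps the column types
df2 <- df[, c("col1", "col2")]

# Option 2: build a new data frame and disable factor conversion explicitly
df3 <- data.frame(col1 = df$col1, col2 = df$col2, stringsAsFactors = FALSE)

sapply(df2, class)            # column classes after subsetting
intersect(df$col1, df3$col1)  # strings shared by the two columns
```

Naming the columns in the data.frame() call also avoids automatically generated names such as df.col1.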

Get row and column name of data.frame according to condition in R

This is probably much easier than I am making it. It is one of those little problems which hangs you up and you wonder why.
Given dataframe as so:
a <- c(1,2,3)
b <- c(4,5,6)
test.df <- data.frame(a,b)
How could one iterate through the values in each column and return the column name and row name if the value is 1?
Something like this:
for (i in test.df) {
  for (j in i) {
    if (j == 1) {
      print(rowname,columnname)
    }
  }
}
Where rowname and columnname are actual values.
Using which with arr.ind=TRUE is one way:
Example Data
a <- c(1,2,3)
b <- c(4,5,6)
test.df <- data.frame(a,b)
Solution and Output
#test.df==1 gives a TRUE/FALSE matrix
#which with the arr.ind argument=TRUE gives the row/col of the TRUE elements
a <- which(test.df==1,arr.ind=T)
> a
row col
[1,] 1 1
And then you use the above to get the row and column names:
> row.names(test.df[a[,1],] ) #row names
[1] "1"
> names(test.df[a[,2]]) #column names
[1] "a"
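When several cells match, it can help to keep each row name paired with its column name. The sketch below builds on the same which(..., arr.ind = TRUE) call; the sample data with multiple matches and the names `hits` and `result` are assumptions for illustration.

```r
# Made-up sample data with several cells equal to 1
test.df <- data.frame(a = c(1, 2, 1), b = c(1, 5, 6))

# One row per matching cell, with columns named "row" and "col"
hits <- which(test.df == 1, arr.ind = TRUE)

# Translate the positions into names, keeping each pair together
result <- data.frame(row = rownames(test.df)[hits[, "row"]],
                     column = colnames(test.df)[hits[, "col"]],
                     stringsAsFactors = FALSE)
result
```

Each row of `result` then describes exactly one matching cell, which the separate row-name and column-name vectors cannot express once there is more than one match.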
Another approach:
> col = colnames(test.df)[apply(test.df, 2, function(u) any(u==1))]
> col
[1] "a"
> row = row.names(test.df)[apply(test.df, 1, function(u) any(u==1))]
> row
[1] "1"
This is an old thread but I am not very satisfied with the previous solutions.
For me the most intuitive is:
row.names(data)[which(data$feature1==value)]
Which is basically saying: Given the row names of all the data give me those where a given condition is satisfied.

Resources