In a data frame, I want to replace a value based on a condition in another column.
Example: when the value in column A is above x, then both values in column A and B are replaced by NA.
I can't find the proper way to do this with any of the functions I've tried: na_if, ifelse, if_else, case_when...
Subscript the data frame by a logical vector having the condition:
DF[DF$A > x, c("A", "B")] <- NA
Here's a working answer:
d <- data.frame("A" = 1:10, "B" = 11:20)
x <- 5
d[d$A > x, c("A", "B")] <- NA
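One caveat worth sketching: if column A already contains NA, the comparison d$A > x yields NA for those rows, and as far as I know data frame assignment refuses a logical index containing NA. A minimal variant, assuming you simply want to leave such rows alone, wraps the condition in which() (the toy values below are made up for illustration):

```r
# Toy data with an NA already present in A (illustrative values)
d <- data.frame(A = c(1, 7, NA, 9), B = c(10, 20, 30, 40))
x <- 5

# which() keeps only the positions where the condition is TRUE,
# silently dropping rows where the comparison is NA
rows <- which(d$A > x)

# Replace both columns in the matching rows
d[rows, c("A", "B")] <- NA
d
```

Row 3 keeps its B value because its A was never compared successfully.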
I have a dataframe with two columns of related data. I want to create a third column that combines them, but there are lots of NAs in one or both columns. If both columns have a non-NA value, I want the new third column to paste both values. If either of the first two columns has an NA, I want the third column to contain just the non-NA value. An example with a toy data frame is below:
x <- c("a", NA, "c", "d")
y <- c("l", "m", NA, "o")
df <- data.frame(x, y)
# this is the new column I want to produce from columns x and y above
df$z <- c("al", "m", "c", "do")
I thought coalesce would solve my problem, but I can't find a way to keep both values if there is a value in both columns. Thanks in advance for any assistance.
One possible solution:
df$z <- gsub("NA", "",paste0(df$x, df$y))
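Note that the gsub approach removes every literal "NA" substring, so a real value such as "NATO" would be mangled. A hedged base R sketch that blanks out only true missing values before pasting is shown here; blank_na is a hypothetical helper name, not part of any package:

```r
# Toy data where one real value happens to contain the letters "NA"
x <- c("a", NA, "NATO", "d")
y <- c("l", "m", NA, "o")
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Hypothetical helper: turn missing values into empty strings
blank_na <- function(v) ifelse(is.na(v), "", v)

# Paste the two columns; real "NA" substrings are left untouched
df$z <- paste0(blank_na(df$x), blank_na(df$y))
df$z
```

If both columns are NA in a row, this yields an empty string rather than NA, which may or may not be what you want.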
Another possible solution:
library(dplyr)
df %>%
mutate(z = ifelse(is.na(x) | is.na(y), coalesce(x,y), paste0(x,y)))
An option with unite:
library(tidyr)
library(dplyr)
df %>%
unite(z, everything(), na.rm = TRUE, sep = "", remove = FALSE)
z x y
1 al a l
2 m <NA> m
3 c c <NA>
4 do d o
My goal is to get a concise way to rename multiple columns in a data frame. Let's consider a small data frame df as below:
df <- data.frame(a=1, b=2, c=3)
df
Let's say we want to change the names from a, b, and c to Y, Z, and E respectively.
I define a named character vector that maps the new names to the old names:
df_names <- c(Y = "a", Z = "b", E = "c")
I would use this to rename the columns,
rename(df, !!!names)
df
Any suggestions?
Splice the named vector with !!! (and make sure you pass the vector df_names, not names):
df <- data.frame(a=1, b=2, c=3)
df_names <- c(Y = "a", Z ="b", E = "c")
library(dplyr)
df %>% rename(!!!df_names)
## Y Z E
##1 1 2 3
A non-tidy way might be through match:
names(df) <- names(df_names)[match(names(df), df_names)]
df
## Y Z E
##1 1 2 3
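The match approach assumes every column appears in df_names; a column missing from the lookup would end up with an NA name. A small sketch that renames only the matched columns, assuming an extra column d that is not in the lookup:

```r
# Data frame with one column (d) that has no entry in the lookup
df <- data.frame(a = 1, b = 2, c = 3, d = 4)
df_names <- c(Y = "a", Z = "b", E = "c")

# Position of each current name in the lookup (NA when absent)
idx <- match(names(df), df_names)
found <- !is.na(idx)

# Rename only the columns that were found
names(df)[found] <- names(df_names)[idx[found]]
names(df)
```

The unmatched column keeps its original name.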
You could try:
sample(LETTERS[which(LETTERS %in% names(df) == FALSE)], size= length(names(df)), replace = FALSE)
[1] "S" "D" "N"
Here, you don't really care what the new names are, since you're using sample. Otherwise, a straightforward names(df) <- c('name1', 'name2'...
I have a df like this:
a <- c(4,5,3,5,1)
b <- c(8,9,7,3,5)
c <- c(6,7,5,4,3)
df <- data.frame(rbind(a,b,c))
I want a new df, df2, containing the difference between the values in each cell in rows a and b and the value in row c in their respective columns.
df2 would look like this:
a <- c(-2,-2,-2,1,-2)
b <- c(2,2,2,-1,2)
df2 <- data.frame(rbind(a,b))
Here is where I'm getting stuck:
df2 <- data.frame(apply(df,c(1,2),function(x) x - df[nrow(df),the col index of x]))
How do I reference the column index of x? Is there something like JavaScript's this?
We can do this by replicating the third row so that the lengths match before subtracting it from the first two rows:
out <- df[c("a", "b"),] - df["c",][col(df[c("a", "b"),])]
identical(df2, out)
#[1] TRUE
Or explicitly using rep
df[c("a", "b"),] - rep(unlist(df["c",]), each = 2)
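Another way to express "subtract one row from every other row" is sweep(), which applies a vector of statistics along a margin. The sketch below assumes it is acceptable to work on a matrix copy of the data (m and out are names introduced here for illustration):

```r
a <- c(4, 5, 3, 5, 1)
b <- c(8, 9, 7, 3, 5)
c <- c(6, 7, 5, 4, 3)
df <- data.frame(rbind(a, b, c))

# Work on a numeric matrix so row and column arithmetic is unambiguous
m <- as.matrix(df)

# Subtract row "c" from each column of rows "a" and "b"
# (MARGIN = 2 means the statistic varies by column; FUN defaults to "-")
out <- sweep(m[c("a", "b"), ], 2, m["c", ])
out
```

Wrap the result in as.data.frame() if a data frame is needed afterwards.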
I have one data frame with column name as below
colnames(Data)
[1] "ID" "A" "B" "C" "D" "E" "F" "G"
I want to select column D and every column after it.
Currently those are columns E, F, and G, but more columns may be added after them, and more may be added before D as well, so I can't rely on D being at a fixed position.
Is there any subset command in R we can use? Like below
Datanew <- subset(Data,select=c("D","E","F","G"))
Please advise.
Find which column is D and select all the following columns (using ncol):
columnToSelect <- which(names(Data) == "D"):ncol(Data)
Datanew <- subset(Data, select = columnToSelect)
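A slightly more defensive sketch of the same idea, assuming you want a clear failure when column D is absent and a data frame back even when D is the last column (the toy Data below is made up for illustration):

```r
# Toy data; the real Data may have more columns before or after D
Data <- data.frame(ID = 1:2, A = 1:2, D = 3:4, E = 5:6, F = 7:8)

# match() returns NA when the column is missing, so we can check for it
start <- match("D", names(Data))
stopifnot(!is.na(start))

# drop = FALSE keeps the result a data frame even for a single column
Datanew <- Data[, start:ncol(Data), drop = FALSE]
names(Datanew)
```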
You can use tail to get the last n names of the data frame once you find where column D is. We can utilize it like this
tail(1:5, 3) # return the last three elements
The following is equivalent
tail(1:5, -2) # don't return the first two elements
If we use which to find column D
columnToSelect <- which(names(Data) == "D")
We can use tail to get all of the columns from D and following.
tail(names(Data), -(columnToSelect - 1))
The column selection, then, can be wrapped up in one neat little call
Data[tail(names(Data), -(which(names(Data) == "D") - 1))]
A fully reproducible example:
Data <-
  lapply(LETTERS[1:10],
         function(l) {
           x <- data.frame(l = rnorm(10))
           names(x) <- l
           x
         })
Data <- as.data.frame(Data)
Data[tail(names(Data), -(which(names(Data) == "D") - 1))]
I have a data frame. One column contains the following values:
df$current_column=(A,B,C,D,E)
A vector contains a look up value:
v <- c(A=X, B=Y)
I want to replace this column to come up with a list of (X, Y, C,D,E)
I am thinking to create a new column like
df$new_column <- v[df$current_column]
It does the replacement of A and B but it also makes C,D,E as NA (X,Y, NA, NA, NA).
How to keep C,D and E or is there any other way?
Looks like ifelse() could help:
df$current_column <- ifelse(df$current_column == "A", "X",
ifelse(df$current_column == "B", "Y", df$current_column))
We can create a logical index with %in% and then do the conversion
i1 <- df$current_column %in% names(v)
df$new_column <- df$current_column
df$new_column[i1] <- v[df$new_column[i1]]
df$new_column
#[1] "X" "Y" "C" "D" "E"
Or use a single ifelse
with(df, ifelse(current_column %in% names(v),
v[current_column], current_column))
Update
If 'current_column' is of class factor, convert it to character first and it should work.
df$new_column <- as.character(df$current_column)
df$new_column[i1] <- v[df$new_column[i1]]
data
df <- data.frame(current_column = LETTERS[1:5],
stringsAsFactors=FALSE)
v <- setNames(c('X', 'Y'), LETTERS[1:2])
user2029709,
I was working off of your small example; for a more generic approach it would be nice to see a snippet of the real data or a close simulation. In any case, here is something that may work better for you, without manually coding all the ifelse() options, and it is still a relatively straightforward solution:
vd <- data.frame(current_column = names(v), new_column = v, stringsAsFactors = FALSE)
df <- merge(df, vd, by = 'current_column', all.x = TRUE)
df$new_column <- ifelse(is.na(df$new_column), df$current_column, df$new_column)
You may have to adjust the data types when creating the vd data.frame to ensure a proper merge.
Best,
oleg
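One thing to keep in mind with the merge approach is that merge() sorts the result by the key by default, so the original row order may change. A hedged alternative that keeps the rows in place uses match() against the names of the lookup vector (hit is a name introduced here for illustration, and the row order is shuffled on purpose):

```r
# Rows deliberately out of alphabetical order
df <- data.frame(current_column = c("C", "A", "E", "B", "D"),
                 stringsAsFactors = FALSE)
v <- c(A = "X", B = "Y")

# Position of each value in the lookup names (NA when there is no entry)
hit <- match(df$current_column, names(v))

# Keep the original value where there is no lookup entry
df$new_column <- ifelse(is.na(hit), df$current_column, unname(v[hit]))
df
```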