I'd like to assign NA to columns based on their name and the value of another column, as in the following example:
Given the data frame iris, I would like to assign NA to all columns whose name starts with "Sepal", in the rows where Species == "setosa".
A solution using dplyr mutate_at/mutate_if is preferable, but any other solution is also welcome.
I tried
iris %>%
mutate_if(str_detect(names(.), pattern = "Sepal") & (.$Species == "setosa") , function(x){x <- NA})
Error in tbl_if_vars(.tbl, .p, .env, ..., .include_group_vars = .include_group_vars) :
length(.p) == length(tibble_vars) is not TRUE
In dplyr, select vars that contain "Sepal" and assign NA to those rows where Species is "setosa":
iris %>%
mutate_at(vars(contains("Sepal")), funs(ifelse(Species == "setosa", NA, .)))
Or, without ifelse():
iris %>%
mutate_at(vars(contains("Sepal")),
~ replace(., Species == "setosa", NA))
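For reference, the original attempt fails because mutate_if() expects a predicate that gives one TRUE/FALSE per column, whereas `.$Species == "setosa"` has one value per row. In dplyr 1.0.0 and later the same replacement can be sketched with across(), which supersedes mutate_at(). This is a minimal sketch assuming dplyr >= 1.0.0 is installed; `iris_na` is just an illustrative name:

```r
library(dplyr)

# Column selection (starts_with) and the row condition (Species == "setosa")
# are kept separate: across() picks the columns, replace() handles the rows.
iris_na <- iris %>%
  mutate(across(starts_with("Sepal"), ~ replace(.x, Species == "setosa", NA)))

head(iris_na)
```

The Petal columns and the non-setosa rows are left untouched.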
I have a dataset with two columns, in one of them are missing values.
I load it using
data <- read_excel("file.xlsx") %>%
select("ID", "Value")
The tibble looks like this:
ID    Value
1     2
NA    4
32    1
The NAs are recognized as such.
However, I use
data["ID"=="NA"] <- NA
to ensure that this is not the problem (R: is.na() does not pick up NA value).
When I try to filter:
data %>%
filter(!is.na(ID))
the whole tibble stays the same, and no row is deleted.
So I try
data %>%
mutate(
isna <- is.na(ID)
)
and all isna values are FALSE.
Why doesn't dplyr recognize the NAs?
I am grateful for any help!
data["ID"=="NA"] <- NA
does nothing. The condition "ID"=="NA" is always FALSE, since you are comparing two unequal string literals ("ID" and "NA"). To fix it, use e.g.
data[data$ID == "NA", "ID"] <- NA
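A small sketch of the difference, using a made-up data frame (the values are hypothetical, with the string "NA" in one row and a real NA in another). It uses %in% instead of == because == returns NA for rows that are already missing, and NA is not allowed as a row index when assigning into a data frame:

```r
# Hypothetical data: "NA" as text in row 2, a real missing value in row 4
data <- data.frame(ID = c("1", "NA", "32", NA),
                   Value = c(2, 4, 1, 7),
                   stringsAsFactors = FALSE)

"ID" == "NA"                        # compares two string literals: FALSE

# %in% never returns NA, so the logical index is safe to assign with
data$ID[data$ID %in% "NA"] <- NA

is.na(data$ID)                      # rows 2 and 4 are now both real NAs
```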
Welcome to SO! Use this to turn the "NA" strings into real NAs and then drop those rows:
data <- data %>%
mutate(ID = ifelse(ID == "NA",NA,ID)) %>%
filter(!is.na(ID))
Why not directly
data %>%
filter(ID != "NA")
or
subset(data, ID != "NA")
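One caveat worth checking on your own data: a comparison with a real NA evaluates to NA, and both filter() and subset() drop rows where the condition is NA. So these one-liners remove real missing values as well as the string "NA". A minimal illustration with hypothetical values:

```r
library(dplyr)

# Hypothetical data: one "NA" string and one real missing value
data <- data.frame(ID = c("1", "NA", "32", NA),
                   Value = c(2, 4, 1, 7),
                   stringsAsFactors = FALSE)

data$ID != "NA"                 # the last element is NA, not TRUE

kept <- data %>% filter(ID != "NA")
kept$ID                         # only "1" and "32" remain
```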
I would like to apply three functions to the same variables in my data in a single piece of code.
My data set has certain columns, and I want to apply these functions to all of them:
1- make them all factors
2- replace blanks in the columns with missing (convert blank values to NA)
3- give missing values an explicit factor level using fct_explicit_na
I have done this in separate lines of code, but I want to merge all of them using dplyr's mutate functions. I tried the following, but it didn't work:
cols <- c("id12", "id13", "id14", "id15")
data_new <- data_old %>%
mutate_if(cols=="", NA) %>% # replace space with NA for cols
mutate_at(cols, factor) %>% # then turn them into factors
mutate_at(cols, fct_explicit_na) # give NAs explicit factor level
I get the error:
Error in tbl_if_vars(.tbl, .p, .env, ..., .include_group_vars = .include_group_vars) :
length(.p) == length(tibble_vars) is not TRUE
The mutate_if step is not doing what the OP intends. Instead, we can do this in a single step with
library(dplyr)
library(forcats) # provides fct_explicit_na
data_old %>%
mutate_at(vars(cols), ~ na_if(., "") %>%
factor %>%
fct_explicit_na)
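With dplyr 1.0.0 or later, the same three steps can also be written with across(). The sketch below uses a small made-up `data_old` (only two columns, hypothetical values) and assumes forcats is installed. Note that, as far as I know, recent forcats releases mark fct_explicit_na() as deprecated in favour of fct_na_value_to_level(), so a deprecation warning may appear:

```r
library(dplyr)
library(forcats)

# Hypothetical stand-in for the OP's data
data_old <- data.frame(id12 = c("a", "", "b"),
                       id13 = c("", "x", "x"),
                       stringsAsFactors = FALSE)
cols <- c("id12", "id13")

data_new <- data_old %>%
  mutate(across(all_of(cols),
                # "" becomes NA, the column becomes a factor,
                # and NA gets the explicit level "(Missing)"
                ~ fct_explicit_na(factor(na_if(.x, "")))))

levels(data_new$id12)
```

all_of() makes it explicit that `cols` is an external character vector rather than a column of the data.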
Why didn't the OP's code work?
Using a reproducible example, the code below converts factor columns to character and inserts a few empty strings into Species:
iris1 <- iris %>%
mutate_if(is.factor, as.character) %>%
mutate(Species = replace(Species, c(1, 3, 5), ""))
Now, if we do
iris1 %>%
mutate_if("Species" == "", NA)
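it is comparing two strings instead of checking the column values. Also, the predicate passed to mutate_if should return a logical vector of length 1 for each column, indicating whether that column is selected.
Instead, a predicate that checks the column values replaces the empty strings with NA:
iris1 %>%
mutate_if(~ any(. == ""), ~ na_if(., "")) %>%
head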
Here I'm attempting to remove NA values from a tibble:
mc = as_tibble(c("NA" , NA , "ls", "test"))
mc <- filter(mc , is.na == TRUE)
But an error is returned:
> mc = as_tibble(c("NA" , NA , "ls", "test"))
> mc <- filter(mc , is.na == TRUE)
Error in filter_impl(.data, quo) :
Evaluation error: comparison (1) is possible only for atomic and list types.
How can I remove NA values from this tibble?
Try:
library(tidyverse)
mc %>%
mutate(value = replace(value, value == "NA", NA)) %>%
drop_na()
Which gives:
# A tibble: 2 x 1
value
<chr>
1 ls
2 test
The second line replaces every "NA" string with a real <NA>. The third line then drops all <NA> values.
If you simply want to remove actual NA values:
library(dplyr)
filter(mc, !is.na(value))
Alternatively (this will check all columns, not just the specified column as above):
na.omit(mc)
If you want to remove both NA values, and values equaling the string "NA":
library(dplyr)
filter(mc, !is.na(value), value != "NA")
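For a self-contained check, the sketch below builds the tibble with tibble(value = ...) so the column name is explicit (with as_tibble() on a vector, the name `value` is only implied), then removes both kinds of "NA". `cleaned` is just an illustrative name:

```r
library(dplyr)
library(tibble)

mc <- tibble(value = c("NA", NA, "ls", "test"))

# is.na() catches the real NA; the != comparison catches the string "NA"
cleaned <- filter(mc, !is.na(value), value != "NA")
cleaned
```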
The solutions given by @tyluRp and @danh work perfectly fine.
Just wanted to add another alternative solution, with the advantages of being simpler and shorter (for the lazy ones like me :)
See this one-liner:
mc %>% replace(. == "NA", NA) %>% na.omit
I am trying to replace some filtered values of a data set. So far, I wrote these lines of code:
df %>%
filter(group1 == uniq[i]) %>%
mutate(values = ifelse(sum(values) < 1, 2, NA)),
where uniq is just a list containing the variable names I want to focus on (and group1 and values are column names). This is actually working. However, it only outputs the altered filtered rows and does not replace anything in the data set df. Does anyone have an idea where my mistake is? Thank you so much! The following code reproduces the example:
group1 <- c("A","A","A","B","B","C")
values <- c(0.6,0.3,0.1,0.2,0.8,0.9)
df = data.frame(group1, values)
uniq <- unique(unlist(df$group1))
for (i in 1:length(uniq)){
df <- df %>%
filter(group1 == uniq[i]) %>%
mutate(values = ifelse(sum(values) < 1, 2, NA))
}
What I would like to get is that it leaves all values except the last one since it is one unique group (group1 == C) and 0.9 < 1. So I'd like to get the exact same data frame here except that 0.9 is replaced with NA. Moreover, would it be possible to just use if instead of ifelse?
dplyr won't create a new object unless you use an assignment operator (<-).
Compare
require(dplyr)
data(mtcars)
mtcars %>% filter(cyl == 4)
with
mtcars4 <- mtcars %>% filter(cyl == 4)
mtcars4
The data are the same, but in the second example the filtered data is stored in a new object mtcars4
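To change values inside df without losing the other rows, one option is to skip filter() altogether and let group_by() apply the condition per group. This is only a sketch of that idea, not the OP's exact code: it assumes the goal is to set values to NA in every group whose sum is below 1, and it uses hypothetical values that are exactly representable in binary so that the `< 1` comparison is not affected by floating-point rounding. Because the condition is a single TRUE/FALSE per group, a plain if/else works in place of ifelse():

```r
library(dplyr)

# Hypothetical values: groups A and B sum to exactly 1, group C to 0.75
df <- data.frame(group1 = c("A", "A", "A", "B", "B", "C"),
                 values = c(0.5, 0.25, 0.25, 0.5, 0.5, 0.75))

df <- df %>%
  group_by(group1) %>%
  # sum(values) is one number per group, so if/else is enough here
  mutate(values = if (sum(values) < 1) NA_real_ else values) %>%
  ungroup()

df
```

All six rows are kept; only the value in group C is replaced.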
How can I simplify or perform the following operations using dplyr:
1. Run a function on all data.frame names, like mutate_each(funs()) for values, e.g.
names(iris) <- make.names(names(iris))
2. Delete columns that do NOT exist (i.e. delete nothing), e.g.
iris %>% select(-matches("Width")) # ok
iris %>% select(-matches("X")) # returns empty data.frame, why?
3. Add a new column by name (string), e.g.
iris %>% mutate_("newcol" = 0) # ok
x <- "newcol"
iris %>% mutate_(x = 0) # adds a column with name "x" instead of "newcol"
4. Rename a data.frame colname that does not exist
names(iris)[names(iris)=="X"] <- "Y"
iris %>% rename(sl=Sepal.Length) # ok
iris %>% rename(Y=X) # error, instead of no change
1. I would use setNames for this:
iris %>% setNames(make.names(names(.)))
2. Include everything() as an argument for select:
iris %>% select(-matches("Width"), everything())
iris %>% select(-matches("X"), everything())
3. To my understanding there's no other shortcut than explicitly naming the string like you already do:
iris %>% mutate_("newcol" = 0)
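In dplyr 0.7.0 and later, tidy evaluation offers another route for adding a column whose name is stored in a string: unquote the string on the left-hand side with `!!` and use `:=` instead of `=`. A minimal sketch, assuming a reasonably recent dplyr (`out` is just an illustrative name):

```r
library(dplyr)

x <- "newcol"

# !! injects the value of x as the column name; := is needed because
# R does not allow an expression on the left-hand side of =
out <- iris %>% mutate(!!x := 0)

names(out)
```

The new column is called "newcol", not "x".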
I came up with the following solution for #4:
iris %>%
rename_at(vars(everything()),
function(nm)
recode(nm,
Sepal.Length="sl",
Sepal.Width = "sw",
X = "Y")) %>%
head()
The last line is just for convenient output, of course.
1 through 3 are answered above. I came here because I had the same problem as number 4. Here is my solution:
df <- iris
Set a name key with the columns to be renamed and the new values:
name_key <- c(
sl = "Sepal.Length",
sw = "Sepal.Width",
Y = "X"
)
Set values not in data frame to NA. This works for my purpose better. You could probably just remove it from name_key.
for (var in names(name_key)) {
if (!(name_key[[var]] %in% names(df))) {
name_key[var] <- NA
}
}
Get a vector of the new names whose original columns exist in the data frame.
cols <- names(name_key[!is.na(name_key)])
Rename columns
for (nm in names(name_key)) {
names(df)[names(df) == name_key[[nm]]] <- nm
}
Select columns
df2 <- df %>%
select(all_of(cols))
I'm almost positive this can be done more elegantly, but this is what I have so far. Hope this helps, if you haven't solved it already!
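In dplyr 1.0.0 and later, a more compact sketch of the renaming step is possible: rename() accepts a named character vector wrapped in any_of(), which silently skips names that are not present. This assumes dplyr >= 1.0.0 and uses the same new = "old" layout for `name_key`. Unlike the loop version, it keeps the columns that were not renamed:

```r
library(dplyr)

name_key <- c(sl = "Sepal.Length", sw = "Sepal.Width", Y = "X")

# any_of() ignores "X" because iris has no such column
df2 <- iris %>% rename(any_of(name_key))

names(df2)
```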
Answer to question no. 2:
You can use the function any_of if you want to give explicitly the full names of the columns.
iris %>%
select(-any_of(c("X", "Sepal.Width","Petal.Width")))
This will not remove the non-existing column X and will remove the other two listed.
Otherwise, you are good with the solution with matches or a combination of any_of and matches.
iris %>%
select(-any_of("X")) %>%
select(-matches("Width"))
This removes X if it exists, plus every column matching the pattern. Multiple matches are also possible.
iris %>%
select(-any_of("X")) %>%
select(-matches(c("Width", "Spec"))) # use c for multiple matches