Using mutate to output value from named vector - r

I have a data frame with one column and two rows, holding the values 'sent1' and 'sent2'.
test.df <- data.frame(sentence = c('sent1', 'sent2'))
I also have a reference vector that has values for the combination of the 2 sentences and 3 categories (a, b, c).
test.vec <- c(sent1_a = 1,
sent1_b = 0,
sent1_c = 1,
sent2_a = 0,
sent2_b = 1,
sent2_c = 1)
I would like to create a new df that looks like this:
output.df <- data.frame(sentence = c('sent1', 'sent2'),
a = c(1,0),
b = c(0,1),
c = c(1,1))
output.df
# sentence a b c
#1 sent1 1 0 1
#2 sent2 0 1 1
Ideally, I would like to use mutate to select the relevant value from the vector based on the sentence in each row:
results <- test.df %>%
mutate(a = test.vec[[paste0(sentence, '_a')]])
However, this throws an error:
Error in mutate_impl(.data, dots) :
Evaluation error: attempt to select more than one element in vectorIndex.

You can reshape test.vec to the output you need:
library(tidyverse)
data.frame(test.vec) %>%
tibble::rownames_to_column() %>%
separate(rowname, c('sentence', 'vars')) %>%
spread(vars, test.vec)
# sentence a b c
#1 sent1 1 0 1
#2 sent2 0 1 1
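The error in the question comes from [[, which extracts exactly one element, while paste0(sentence, '_a') evaluates to one name per row. Single brackets [ accept a whole vector of names and return one element per name. A minimal base R sketch of that lookup, assuming the test.df and test.vec from the question (the loop variable category is just an illustrative name):

```r
# Data from the question (stringsAsFactors = FALSE only to keep sentence as character)
test.df <- data.frame(sentence = c('sent1', 'sent2'), stringsAsFactors = FALSE)
test.vec <- c(sent1_a = 1, sent1_b = 0, sent1_c = 1,
              sent2_a = 0, sent2_b = 1, sent2_c = 1)

# `[` takes a character vector of names and returns one element per name,
# whereas `[[` accepts only a single name (hence "more than one element")
for (category in c('a', 'b', 'c')) {
  keys <- paste0(test.df$sentence, '_', category)  # e.g. "sent1_a", "sent2_a"
  # unname() drops the names carried over from test.vec
  test.df[[category]] <- unname(test.vec[keys])
}

test.df
```

Inside mutate, the same change (single brackets, test.vec[paste0(sentence, '_a')]) should avoid the error for the same reason.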

Related

Unexpected behavior with case_when and is.na

I want to change all NA values in a column to 0 and all other values to 1. However, I can't get the combination of case_when and is.na to work.
# Create dataframe
a <- c(rep(NA,9), 2, rep(NA, 10))
b <- c(rep(NA,9), "test", rep(NA, 10))
df <- data.frame(a,b, stringsAsFactors = F)
# Create new column (c), where all NA values in (a) are transformed to 0 and other values are transformed to 1
df <- df %>%
mutate(
c = case_when(
a == is.na(.$a) ~ 0,
FALSE ~ 1
)
)
I expect column (c) to contain all 0 values and one 1 value, but it's all 0s.
It does work when I use an if_else statement with is.na, like:
df <- df %>%
mutate(
c = if_else(is.na(a), 0, 1)
)
What is going on here?
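You should be doing this instead. Each condition on the left of ~ in case_when must itself be TRUE for the rows it should match. a == is.na(.$a) compares the values of a with TRUE/FALSE, which gives NA for the missing values and FALSE for the 2, never TRUE, so the first branch never matches; FALSE ~ 1 cannot match either, because its condition is always FALSE. Use is.na(a) as the condition and TRUE ~ 1 as the catch-all: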
df %>%
mutate(
c = case_when(
is.na(a) ~ 0,
TRUE ~ 1
)
)
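To see why the original condition never selects the first branch, it helps to evaluate it outside case_when. A small base R sketch using the column a from the question (the names buggy and c_col are only for illustration):

```r
a <- c(rep(NA, 9), 2, rep(NA, 10))

# The question's condition compares each value of `a` with TRUE/FALSE:
# NA == TRUE is NA, and 2 == FALSE is FALSE, so no element is TRUE
buggy <- a == is.na(a)
print(table(buggy, useNA = "ifany"))

# The intended indicator: 0 where `a` is NA, 1 otherwise
c_col <- ifelse(is.na(a), 0, 1)
print(c_col)
```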

using replace_na() with indeterminate number of columns

My data frame looks like this:
df <- tibble(x = c(1, 2, NA),
y = c(1, NA, 3),
z = c(NA, 2, 3))
I want to replace NA with 0 using tidyr::replace_na(). As this function's documentation makes clear, it's straightforward to do this once you know which columns you want to perform the operation on.
df <- df %>% replace_na(list(x = 0, y = 0, z = 0))
But what if you have an indeterminate number of columns? (I say 'indeterminate' because I'm trying to create a function that does this on the fly using dplyr tools.) If I'm not mistaken, the base R equivalent to what I'm trying to achieve using the aforementioned tools is:
df[, 1:ncol(df)][is.na(df[, 1:ncol(df)])] <- 0
But I always struggle to get my head around this code. Thanks in advance for your help.
We can do this by creating a list of 0s with one element per column of the dataset, named after the columns:
library(tidyverse)
df %>%
replace_na(set_names(as.list(rep(0, length(.))), names(.)))
# A tibble: 3 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 1 1 0
#2 2 0 2
#3 0 3 3
Another option is mutate_all (or mutate_at for selected columns, or mutate_if for columns chosen by a condition) with replace_na applied to each column:
df %>%
mutate_all(replace_na, replace = 0)
With base R, it is even more straightforward:
df[is.na(df)] <- 0
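Since the goal is a function that works for any number of columns, here is a base R sketch of such a helper. The name replace_all_na and its fill argument are hypothetical, not part of tidyr; it simply applies the is.na() replacement column by column with lapply():

```r
# Hypothetical helper: replace NA with `fill` in every column of a data frame
replace_all_na <- function(df, fill = 0) {
  # Assigning into df[] keeps the data frame structure (class, names, row names)
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- fill
    col
  })
  df
}

df <- data.frame(x = c(1, 2, NA),
                 y = c(1, NA, 3),
                 z = c(NA, 2, 3))
res <- replace_all_na(df)
res
```

Because the column list is never spelled out, the helper does not need to know how many columns there are.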

add new column to dataframe by referencing name of existing columns

I have a dataframe of this form:
df <- data.frame(abc = c(1, 0, 3, 2, 0),
foo = c(0, 4, 2, 1, 0),
glorx = c(0, 0, 0, 1, 2))
Here, the column names are strings and the values in the data frame are the number of times I would like to concatenate that string in a new data column. The new column I'd like to create would be a concatenation across all existing columns, with each column name being repeated according to the data.
For example, I'd like to create this new column and add it to the dataframe.
new_col <- c('abc', 'foofoofoofoo', 'abcabcabcfoofoo', 'abcabcfooglorx', 'glorxglorx')
also_acceptable <- c('abc', 'foofoofoofoo', 'abcfooabcfooabc', 'abcfooglorxabc', 'glorxglorx')
df %>% mutate(new_col = new_col, also_acceptable = also_acceptable)
The order of concatenation does not matter. The core problem I have is I don't know how to reference the name of a column by row when constructing a purrr::map() or dplyr::mutate() function to build a new column. Thus, I'm not sure how to programmatically construct this new column.
(The core application here is combinatorial construction of chemical formulae in case anyone wonders why I would need such a thing.)
Here is an option using Map and strrep:
mutate(df, new_col = do.call(paste, c(sep="", Map(strrep, names(df), df))))
# abc foo glorx new_col
#1 1 0 0 abc
#2 0 4 0 foofoofoofoo
#3 3 2 0 abcabcabcfoofoo
#4 2 1 1 abcabcfooglorx
#5 0 0 2 glorxglorx
Or a simpler version, following @thelatemail's comment:
df %>% mutate(new_col = do.call(paste0, Map(strrep, names(.), .)))
Map gives a list with one character vector per column; viewed as a tibble:
Map(strrep, names(df), df) %>% as_tibble()
# A tibble: 5 x 3
#   abc       foo          glorx
#   <chr>     <chr>        <chr>
# 1 abc
# 2           foofoofoofoo
# 3 abcabcabc foofoo
# 4 abcabc    foo          glorx
# 5                        glorxglorx
Then do.call(paste, ...) pastes these strings together row-wise.
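Stepping through the two pieces separately may make the row-wise behaviour clearer. A base R sketch with the question's df (the intermediate name pieces is only illustrative): Map pairs each column name with its column, and do.call(paste0, pieces) expands to paste0(pieces$abc, pieces$foo, pieces$glorx), which pastes element by element, i.e. row by row:

```r
df <- data.frame(abc = c(1, 0, 3, 2, 0),
                 foo = c(0, 4, 2, 1, 0),
                 glorx = c(0, 0, 0, 1, 2))

# One character vector per column: strrep("abc", c(1, 0, 3, 2, 0)), etc.
# strrep(x, 0) gives "", so a zero count contributes nothing
pieces <- Map(strrep, names(df), df)

# paste0 is vectorised, so element i of the result joins row i of every piece
df$new_col <- do.call(paste0, pieces)
df
```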

If any column in a row meets a condition, then mutate() a column

Using dplyr, I am trying to conditionally update values in a column using ifelse and mutate. I am trying to say that, in a data frame, if any variable (column) in a row is equal to 7, then variable c should become 100, otherwise c remains the same.
df <- data.frame(a = c(1,2,3),
b = c(1,7,3),
c = c(5,2,9))
df <- df %>% mutate(c = ifelse(any(vars(everything()) == 7), 100, c))
This gives me the error:
Error in mutate_impl(.data, dots) :
Evaluation error: (list) object cannot be coerced to type 'double'.
The output I'd like is:
a b c
1 1 1 5
2 2 7 100
3 3 3 9
Note: this is an abstract example of a larger data set with more rows and columns.
EDIT:
This code gets me a bit closer, but it does not apply the ifelse statement by each row. Instead, it is changing all values to 100 in column c if 7 is present anywhere in the data frame.
df <- df %>% mutate(c = ifelse(any(select(., everything()) == 7), 100, c))
a b c
1 1 1 100
2 2 7 100
3 3 3 100
Perhaps this is not possible to do using dplyr?
I think this should work. We can check which values in df are equal to 7, then use rowSums to count the matches in each row; a count greater than 0 means the row contains at least one 7.
df <- df %>% mutate(c = ifelse(rowSums(df == 7) > 0, 100, c))
Or we can use apply:
df <- df %>% mutate(c = ifelse(apply(df == 7, 1, any), 100, c))
A base R equivalent:
df$c[apply(df == 7, 1, any)] <- 100
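The rowSums approach can be checked step by step in base R. A sketch using the question's data (hits and has_seven are illustrative names), showing that df == 7 yields a logical matrix and that the test is applied per row rather than to the whole data frame:

```r
df <- data.frame(a = c(1, 2, 3),
                 b = c(1, 7, 3),
                 c = c(5, 2, 9))

# df == 7 returns a logical matrix with one row per data-frame row
hits <- df == 7

# rowSums() counts the TRUE values in each row; > 0 means the row has a 7
has_seven <- rowSums(hits) > 0

df$c <- ifelse(has_seven, 100, df$c)
df
```

If the data can contain NA, rowSums(df == 7, na.rm = TRUE) avoids NA results for those rows; to test only some columns, subset first, e.g. df[c('a', 'b')] == 7.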
You could also try purrr::map_dbl:
library(purrr)
df$c <- map_dbl(1:nrow(df), ~ifelse(any(df[.x,]==7), 100, df[.x,]$c))
Output
a b c
1 1 1 5
2 2 7 100
3 3 3 9
In a dplyr::mutate statement this would be:
library(purrr)
library(dplyr)
df %>%
mutate(c = map_dbl(1:nrow(df), ~ifelse(any(df[.x,]==7), 100, df[.x,]$c)))

Deleting columns from a data frame based on presence/absence of certain values

I would like to subset a data frame by removing columns that meet or do not meet a certain condition. For example, given the following data:
df <- data.frame(w = c('a', 'b', 'c'),
x = c(1, 0, 0),
y = c(0, 1, 0),
z = c(0, 0, 1))
Which gives:
w x y z
a 1 0 0
b 0 1 0
c 0 0 1
I would like to remove columns that contain a 0 after subsetting the rows. For example:
df %>% filter(., w == 'a')
Produces:
w x y z
a 1 0 0
Which I would like to then reduce to:
x
1
I am looking to do this using dplyr, thus the next step should be piped after the filter command. I have tried using summarise in conjunction with apply, but that has not worked.
You can use select_if():
df %>% filter(w == 'a') %>% select_if(function(col) is.numeric(col) && all(col != 0))
# x
#1 1
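The same predicate can be applied in base R, which may help show what select_if is doing: it evaluates the function once per column and keeps the columns where it returns TRUE. A sketch using the question's data (sub, keep and result are illustrative names):

```r
df <- data.frame(w = c('a', 'b', 'c'),
                 x = c(1, 0, 0),
                 y = c(0, 1, 0),
                 z = c(0, 0, 1),
                 stringsAsFactors = FALSE)

# Filter the rows first (drop = FALSE keeps a data frame even for one row)
sub <- df[df$w == 'a', , drop = FALSE]

# One TRUE/FALSE per column: numeric and free of zeros in the remaining rows
keep <- vapply(sub, function(col) is.numeric(col) && all(col != 0), logical(1))

result <- sub[, keep, drop = FALSE]
result
```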
