How to use ifelse to replace the values in a column with values in another column in a dataframe in R?

For example, I merged two data frames using full_join() from dplyr as follows:
library(dplyr)
df_1 <- data.frame(id = c(1,2,3,4,5), x = c('a', 'b', 'c', 'd', 'e'))
df_2 <- data.frame(id = c(2,4,5,6,7,8), y = c('f', 'g', 'h', 'i', 'j', 'k'))
df <- full_join(df_2, df_1, by = 'id')
I want to use ifelse() to do the following:
For each row, check whether the x column has a missing value
If so, put "NO" into the y column
If not, put the value of x into the y column
I tried this code:
df$y <- ifelse(is.null(x), "NO", x)
But the result was not what I wanted.
What did I do wrong? Could you suggest how to fix the code?
Thank you very much.

The following will do what you want:
df$y <- ifelse(is.na(df$x), "NO", df$x)
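The problem is is.null(), where is.na() should be used. is.null() checks whether the whole object is NULL and returns a single TRUE or FALSE, so ifelse() returns a single value that is then recycled to every row; is.na() returns one TRUE/FALSE per element, which is what ifelse() needs. The columns also need the df$ prefix, since ifelse() does not look up x inside df.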
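A self-contained sketch of the fix, which also shows the difference between the two checks. To avoid depending on dplyr, it uses base R's merge(all = TRUE) as a stand-in for full_join(): both perform a full outer join, but merge() additionally sorts the rows by id.

```r
# Example data from the question; stringsAsFactors = FALSE keeps x a character
# column on R versions older than 4.0.0, where strings became factors by default
df_1 <- data.frame(id = c(1, 2, 3, 4, 5), x = c("a", "b", "c", "d", "e"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(id = c(2, 4, 5, 6, 7, 8), y = c("f", "g", "h", "i", "j", "k"),
                   stringsAsFactors = FALSE)

# Full outer join in base R (stand-in for dplyr::full_join); rows are sorted by id
df <- merge(df_2, df_1, by = "id", all = TRUE)

# is.null() asks "is the whole object NULL?" and gives a single answer
null_check <- is.null(df$x)   # FALSE, length 1
# is.na() asks "is this element missing?" for every row
na_check <- is.na(df$x)       # one TRUE/FALSE per row

# Vectorised replacement: "NO" where x is missing, otherwise the value of x
df$y <- ifelse(is.na(df$x), "NO", df$x)
print(df)
```

If dplyr is loaded anyway, df$y <- coalesce(df$x, "NO") gives the same result, since coalesce() replaces each NA with the supplied value.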

Related

assertr: Automatically verify assumptions on columns

I want to automatically verify assumptions about the columns of my tibble using assertr. The problem is that I have hundreds of columns, so I can't write the checks column by column. The data looks like this:
library(tibble)
df <- tibble(
  x = c(1,0,1,1,2),
  y = c('A', 'B', 'C', 'D', 'A'),
  z = c(1/3, 4, 5/7, 100, 3))
Then I have another tibble which describes types of columns:
df_map <- tibble(
col = c('x','y','z'),
col_type = c('POSSIBLE VALUES 0 AND 1', 'LESS THAN 1 MISSING VALUE', 'POSITIVE VALUE')
)
Written out by hand for each column, the checks would be:
library(assertr)
library(dplyr)  # for %>%
df %>%
  assert(in_set(0,1), x) %>%
  assert_rows(num_row_NAs, within_bounds(0,1), y) %>%
  verify(z > 0)
My question is how to apply these verifications (or any others) using this mapping, so that I don't need to write them for every single column.
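One way to drive the checks from df_map is a lookup table of predicates keyed by col_type, applied in a loop over the rows of df_map. The sketch below uses plain base-R predicates rather than assertr, so it is a swapped-in technique and not assertr's API. The rule list and the helper check_columns() are hypothetical, and reading "LESS THAN 1 MISSING VALUE" as "no missing values" is an assumption.

```r
# Example data from the question (a base data.frame here, so the sketch does not need tibble)
df <- data.frame(
  x = c(1, 0, 1, 1, 2),
  y = c("A", "B", "C", "D", "A"),
  z = c(1/3, 4, 5/7, 100, 3),
  stringsAsFactors = FALSE
)
df_map <- data.frame(
  col = c("x", "y", "z"),
  col_type = c("POSSIBLE VALUES 0 AND 1", "LESS THAN 1 MISSING VALUE", "POSITIVE VALUE"),
  stringsAsFactors = FALSE  # keep character, so the values work as lookup keys
)

# Hypothetical rule table: one predicate per col_type label, each taking a whole column
rules <- list(
  "POSSIBLE VALUES 0 AND 1"   = function(v) all(v %in% c(0, 1)),
  "LESS THAN 1 MISSING VALUE" = function(v) sum(is.na(v)) < 1,  # assumed: no NAs
  "POSITIVE VALUE"            = function(v) all(v > 0, na.rm = TRUE)
)

# Apply the rule named in df_map to each listed column; returns TRUE/FALSE per column
check_columns <- function(data, map, rules) {
  ok <- vapply(seq_len(nrow(map)), function(i) {
    rule <- rules[[map$col_type[i]]]
    if (is.null(rule)) stop("No rule defined for column type: ", map$col_type[i])
    rule(data[[map$col[i]]])
  }, logical(1))
  setNames(ok, map$col)
}

result <- check_columns(df, df_map, rules)
print(result)  # x fails because it contains the value 2
```

With hundreds of columns, only df_map grows; the rule table needs one entry per distinct type. To stop on the first failure instead of returning a report, wrap the call in stopifnot(all(...)).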

Very simple pivot of a data frame in R

Can someone tell me how to get from data frame df to data frame desired in the reproducible example below? It is basically just a simple pivot. Thanks!
id <- c(1, 2)
x <- c('a', 'b')
y <- c('c', 'd')
df <- data.frame(id, x, y)
head(df)
c1 <- c('id', 'x', 'y')
c2 <- c('1', 'a', 'c')
c3 <- c('2', 'b', 'd')
desired <- data.frame(c1, c2, c3)
head(desired)
This seems to work:
library(data.table)  # assuming transpose() here is data.table::transpose()
test <- transpose(df)
colnames(test) <- rownames(df)
rownames(test) <- colnames(df)
but is it the best approach?
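A one-liner would be:
setNames(data.frame(names(df), t(df)), paste0("c", 1:3))
A more general solution, without hard-coding 1:3, is below. The result has nrow(df) + 1 columns (the old column names plus one column per original row), so that is the count to use; seq_along(names(df)) only matches it here by coincidence, because df has 3 columns and 2 rows.
setNames(data.frame(names(df), t(df)), paste0("c", seq_len(nrow(df) + 1)))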
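The one-liner can be wrapped in a small helper that derives the names from the width of its own result, so it works for any number of rows. A minimal sketch; pivot_to_columns() is a hypothetical name, and a three-row input is added to show the naming scaling up.

```r
# Hypothetical helper: transpose a data frame so that each original column
# becomes a row, keeping the old column names in the first column
pivot_to_columns <- function(d) {
  m <- t(d)  # character matrix when column types are mixed
  # row.names = NULL gives plain 1, 2, 3 row names instead of id, x, y
  out <- data.frame(names(d), m, stringsAsFactors = FALSE, row.names = NULL)
  # Name the columns c1, c2, ... from the result's own width (nrow(d) + 1)
  setNames(out, paste0("c", seq_len(ncol(out))))
}

df <- data.frame(id = c(1, 2), x = c("a", "b"), y = c("c", "d"),
                 stringsAsFactors = FALSE)
res <- pivot_to_columns(df)
print(res)

# Three rows give four columns, where seq_along(names(df3)) would give only three names
df3 <- data.frame(id = 1:3, x = c("a", "b", "e"), y = c("c", "d", "f"),
                  stringsAsFactors = FALSE)
res3 <- pivot_to_columns(df3)
print(res3)
```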

Why isn't a data.table sorted properly after calling setDT?

When a data.table is turned into a data.frame and then back into a data.table, it may keep the "sorted" attribute even though it is not actually sorted (see the example below). This leads to incorrect results when merging data.tables and to bugs that may go undetected.
Is this the expected behavior? What is the best way to turn a data.frame into a sorted data.table and verify that it is indeed sorted?
library(data.table)
library(dplyr)
a <- data.table(id = c('a', 'B', 'c'), value = c(1,2,3))
b <- data.table(id = c('a', 'B', 'c'))
setkey(a,id)
a_sum <- a %>%
group_by(id) %>%
summarize_at(vars(value), sum)
setDT(a_sum, key = "id")
a_sum_nokey = setkey(copy(a_sum), NULL)
merged_key_fails = merge(a_sum, b, by="id")
merged_no_key_works = merge(a_sum_nokey, b, by="id")
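For the "verify that it is indeed sorted" part, one option that does not trust the sorted attribute at all is to recompute the order. data.table sorts character keys in the C locale (so "B" comes before "a"), and base R's order(method = "radix") also orders strings in the C locale, so a column is in key order exactly when that ordering is the identity permutation. A minimal sketch, assuming a single key column without NAs; in_key_order() is a hypothetical helper, not part of data.table.

```r
# Hypothetical helper: TRUE when x is already in data.table key order,
# i.e. ascending in the C locale (byte order), which is how setkey() sorts strings.
# Assumes x has no NA values (data.table would place NAs first).
in_key_order <- function(x) {
  # method = "radix" orders character vectors in the C locale regardless of the
  # session's collation settings; it is stable, so ties keep their positions
  identical(order(x, method = "radix"), seq_along(x))
}

in_key_order(c("B", "a", "c"))  # TRUE: "B" sorts before "a" in the C locale
in_key_order(c("a", "B", "c"))  # FALSE: this is a locale-aware, case-insensitive order
```

Applied to a keyed table, in_key_order(a_sum$id) checks the rows against the key without relying on the attribute. If it fails, dropping the key with setkey(a_sum, NULL) (as a_sum_nokey does) and then calling setkey(a_sum, id) should re-sort the rows physically.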

R: Match a dataframe with 3 others, and create a column

I have a big data frame (df) with a variable id, and three other data frames (df1, df2, df3) that contain some of these id values. For example, the big data frame has ids 1:100, and df1 might have 1, 2, 4, 11, etc.
What I need to do is add a column to the big data frame that says which of the smaller data frames each id came from.
df$new[df$id %in% df1$id] <- 1
df$new[df$id %in% df2$id] <- 2
df$new[df$id %in% df3$id] <- 3
df$new<- factor(df$new, labels = c('a', 'b', 'c'))
This is my solution, but I don't really like it. Any other ideas?
We can use a nested ifelse() (the first matching data frame wins, and ids found in none of them get NA):
df$new <- with(df, ifelse(id %in% df1$id, 'a',
                   ifelse(id %in% df2$id, 'b',
                   ifelse(id %in% df3$id, 'c', NA))))
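A lookup-table alternative avoids both the nested ifelse() and the order-dependent assignments: stack the ids from the small data frames together with their labels, then look each id up with match(). match() returns the first hit, so an id present in several small data frames gets the label of the earliest one (df1 before df2 before df3), the same priority as the nested ifelse(). The data below is made up for illustration.

```r
# Made-up example data: the big data frame and three smaller ones
df  <- data.frame(id = 1:6)
df1 <- data.frame(id = c(1, 2))
df2 <- data.frame(id = c(3, 4))
df3 <- data.frame(id = c(5, 2))   # id 2 also appears in df1

# Lookup table: every id from the small data frames with the label of its source
lookup <- rbind(
  data.frame(id = df1$id, src = "a", stringsAsFactors = FALSE),
  data.frame(id = df2$id, src = "b", stringsAsFactors = FALSE),
  data.frame(id = df3$id, src = "c", stringsAsFactors = FALSE)
)

# match() gives the position of the first matching id (NA when there is none)
df$new <- lookup$src[match(df$id, lookup$id)]

# Optional: a factor with a fixed level order, as in the question
df$new <- factor(df$new, levels = c("a", "b", "c"))
print(df)
```

Adding a fourth source data frame only means adding one more row block to lookup.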

Combine vector and data.frame matching column values and vector values

I have
vetor <- c(1,2,3)
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
I need a data.frame that matches each vector value to a specific id, resulting in:
id vector1
1 a 1
2 b 2
3 a 1
4 c 3
5 a 1
Here are two approaches I often use for similar situations:
vetor <- c(1,2,3)
key <- data.frame(vetor=vetor, mat=c('a', 'b', 'c'))
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
data$vector1 <- key[match(data$id, key$mat), 'vetor']
# or with merge() (note: the result is sorted by id and the new column is named vetor)
merge(data, key, by.x = "id", by.y = "mat")
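A further option in the same spirit as the match() version is a named vector used as a lookup table: indexing it with character ids returns the matching values in row order. The pairing a = 1, b = 2, c = 3 is assumed, as in the key above.

```r
vetor <- c(1, 2, 3)
# Name each value after the id it belongs to (assumed pairing: a = 1, b = 2, c = 3)
lookup <- setNames(vetor, c("a", "b", "c"))

# id must be character: indexing by a factor would use its integer codes instead
data <- data.frame(id = c("a", "b", "a", "c", "a"), stringsAsFactors = FALSE)
# Index by name; unname() drops the names carried over from the lookup vector
data$vector1 <- unname(lookup[data$id])
print(data)
```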
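So you want one unique integer for each distinct value in the id column?
That is what a factor stores in R. Before R 4.0.0, data.frame() converted strings to factors by default, so your id column already was one; since R 4.0.0 the default is stringsAsFactors = FALSE, so convert explicitly with factor() before as.numeric():
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
data$vector1 <- as.numeric(factor(data$id))
This works because factor() stores each distinct string as an integer code, with the levels sorted alphabetically (a = 1, b = 2, c = 3).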
Here's an answer I found that follows the "mathematical.coffee" tip:
vector1 <- c('b','a','a','c','a','a') # 3 distinct values to be labeled: a, b and c
labels <- factor(vector1, labels = c('char a', 'char b', 'char c'))
data.frame(vector1, labels)
The only thing to watch is that factor(vector1, ...) sorts the distinct values of vector1 to form its levels, so the labels must be given in that sorted order (here a, b, c).
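Relying on the sorted order is fragile, because it depends on which values happen to be present and, for mixed-case strings, on the collation locale. Passing levels explicitly pins each label to a value; a minimal sketch with the same data:

```r
vector1 <- c("b", "a", "a", "c", "a", "a")
# levels fixes which value each label belongs to, independent of sort order
labels <- factor(vector1,
                 levels = c("a", "b", "c"),
                 labels = c("char a", "char b", "char c"))
data.frame(vector1, labels)
```

Any value of vector1 that is missing from levels becomes NA, which makes typos in the mapping easy to spot.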
