I have a dataframe df of integers across 6 variables.
a <- c(NA, NA, NA, 0, 0, 1, 1, 1)
b <- c(NA, NA, NA, 2, 2, 3, 3, 3)
c <- c(NA, NA, NA, 2, 2, 3, 3, 3)
d <- c(NA, NA, NA, 1, 1, 2, 2, 2)
e <- c(NA, NA, NA, 0, 0, 1, 1, 1)
f <- c(NA, NA, NA, 0, 0, 1, 1, 1)
df <- data.frame(a, b, c, d, e, f)
print(df)
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 0 2 2 1 0 0
5 0 2 2 1 0 0
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
I would like to add 1 to each row that contains a zero, resulting in:
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 3 3 2 1 1
5 1 3 3 2 1 1
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
I've been able to test if a row contains a zero with the following code, which adds a new column of "TRUE" or "FALSE".
df$cont0 <- apply(df, 1, function(x) any(x %in% "0"))
I thought I would use this new value to then add 1 to each row where df$cont0 == "TRUE":
ifelse(df$cont0 == "TRUE", df + 1, df)
This ends up creating a nested list that still does not perform the correct operation. I understand that ifelse is already vectorized, but other than that I'm not sure how to approach this issue. I am open to splitting apart the df into "TRUE" and "FALSE" conditions, then performing the operation on df$cont0 == "TRUE", but they need to be re-merged in the original order as the data are chronological and row order therefore matters. However I suspect there's an easier solution. Thank you!
Create a logical index with rowSums on the logical matrix df == 0 (with na.rm = TRUE, so the all-NA rows count as having no zeros) and use it as the row index for the addition:
i1 <- rowSums(df == 0, na.rm = TRUE) > 0
df[i1,] <- df[i1, ] + 1
-output
> df
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 3 3 2 1 1
5 1 3 3 2 1 1
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
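Regarding ifelse: it returns a result with the same length as the logical test and expects yes and no to be vectors of that length (shorter ones are recycled). In the OP's case yes and no are whole data frames, i.e. lists of columns, so that condition is not met: ifelse ends up picking whole columns as elements, which is why the result is a nested list instead of modified rows.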
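If you want to keep the OP's ifelse idea, one option (a sketch, not taken from the answers above) is to call ifelse once per column with lapply, so that test, yes and no are all vectors of length nrow(df). The data frame is rebuilt from the question; the name has_zero is only an illustrative choice.

```r
# Example data from the question
df <- data.frame(
  a = c(NA, NA, NA, 0, 0, 1, 1, 1),
  b = c(NA, NA, NA, 2, 2, 3, 3, 3),
  c = c(NA, NA, NA, 2, 2, 3, 3, 3),
  d = c(NA, NA, NA, 1, 1, 2, 2, 2),
  e = c(NA, NA, NA, 0, 0, 1, 1, 1),
  f = c(NA, NA, NA, 0, 0, 1, 1, 1)
)

# One TRUE/FALSE per row: does the row contain a 0? (all-NA rows give FALSE)
has_zero <- rowSums(df == 0, na.rm = TRUE) > 0

# ifelse is called once per column, so test, yes and no all have length
# nrow(df); assigning into df[] keeps the data.frame and its row order
df[] <- lapply(df, function(col) ifelse(has_zero, col + 1, col))
df
```

Row order is never touched, so there is no need to split the data into TRUE/FALSE parts and merge them back.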
Just get the row index first:
index <- rowSums(df == 0, na.rm = TRUE) > 0
df[index, ] <- df[index, ] + 1
It should work.
Related
I have a dataframe with two columns:
df <- data.frame (a = c(NA, 0, NA, NA, NA, NA, 0, 0, NA),
b = c(1, 2, 5, 3, 6, 3, 2, 1, 4))
a b
1 NA 1
2 0 2
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 2
8 0 1
9 NA 4
When the value in column a is 0, I want to replace the value in column b with 0; the desired end result is:
a b
1 NA 1
2 0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 0
8 0 0
9 NA 4
I tried various combinations of mutate with ifelse and case_when, and all but one replace all of column b with column a's values, 0 as well as NA.
Failed attempts:
df %>%
mutate(b = case_when(a == 0 ~ 0))
df %>%
mutate(b = case_when(a == 0 ~ 0,
TRUE ~ as.numeric(as.character(a))))
df %>%
mutate(b = ifelse(a==0, a, b))
All result in:
a b
1 NA NA
2 0 0
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 0 0
8 0 0
9 NA NA
After much consternation, I finally found a solution that produces the result I'm after:
df <- df %>%
mutate(b = ifelse(is.na(a), b, a))
a b
1 NA 1
2 0 0
3 NA 5
4 NA 3
5 NA 6
6 NA 3
7 0 0
8 0 0
9 NA 4
But I'm still perplexed as to why the others did not work as expected. Would love some insight here.
Using %in% instead of == can be useful where there are NA values.
In base R the following will give you what you want.
df$b[df$a %in% 0] <- 0
Using this in dplyr is slightly more complicated than base R, but simpler than the previous solutions:
library(dplyr)
df <- df %>% mutate(b = if_else(a %in% 0, 0, b))
The reason for the problems is that NA == 0 gives NA, not FALSE. NA %in% 0 gives FALSE.
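A small base R sketch of the difference, using short hypothetical vectors a and b in place of the data frame columns. ifelse returns NA (not the no value) wherever the test itself is NA, which is what happened with ifelse(a == 0, a, b). case_when(a == 0 ~ 0) fails in a similar way: rows where the condition is NA or FALSE match no formula, so they get NA.

```r
a <- c(NA, 0, NA, 0)
b <- c(1, 2, 5, 3)

a == 0                  # NA TRUE NA TRUE: comparing NA with 0 gives NA
a %in% 0                # FALSE TRUE FALSE TRUE: %in% never returns NA
ifelse(a == 0, a, b)    # NA 0 NA 0: an NA test yields NA, not b
ifelse(a %in% 0, 0, b)  # 1 0 5 0
```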
A possible solution:
library(dplyr)
df %>%
mutate(b = if_else(a == 0 & !is.na(a), 0, b))
#> a b
#> 1 NA 1
#> 2 0 0
#> 3 NA 5
#> 4 NA 3
#> 5 NA 6
#> 6 NA 3
#> 7 0 0
#> 8 0 0
#> 9 NA 4
In general, most operations involving NA return NA, so when comparing vectors that contain NA, the result is NA wherever either of the compared elements is NA.
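There are exceptions, and they explain why the condition a == 0 & !is.na(a) in the answer above works: & returns FALSE whenever one side is FALSE, even if the other side is NA. A minimal check with a hypothetical vector a:

```r
a <- c(NA, 0, NA, 0)

a == 0              # NA TRUE NA TRUE
!is.na(a)           # FALSE TRUE FALSE TRUE
a == 0 & !is.na(a)  # FALSE TRUE FALSE TRUE: NA & FALSE is FALSE
NA | TRUE           # TRUE: the missing value cannot change the result
```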
If you're willing to eschew dplyr you can do this in base R:
df$b <- ifelse(
is.na(df$a),
df$b,
ifelse(
df$a == 0,
0,
df$b
)
)
I have ratings by different raters:
df <- structure(list(SZ = c(1, 1, NA, 0, NA, 1, 1),
SZ_ptak = c(1, 1, NA, NA, NA, 1, 0)),
row.names = c(NA, 7L), class = "data.frame")
I need to compare them to find ratings that differ. This code works fine as long as both raters assigned either 1 or 0. If one rating is NA and the other is 1 or 0, I also want to obtain the value 1 in column diff_SZ - how can that be done?
df %>%
mutate(diff_SZ = +(SZ != SZ_ptak))
SZ SZ_ptak diff_SZ
1 1 1 0
2 1 1 0
3 NA NA NA
4 0 NA NA
5 NA NA NA
6 1 1 0
7 1 0 1
Desired:
SZ SZ_ptak diff_SZ
1 1 1 0
2 1 1 0
3 NA NA NA
4 0 NA 1 <--
5 NA NA NA
6 1 1 0
7 1 0 1
Maybe it would be easier to understand if you list out the conditions.
library(dplyr)
df %>%
mutate(diff_SZ = case_when(is.na(SZ) & is.na(SZ_ptak) ~ NA_real_,
is.na(SZ) | is.na(SZ_ptak) ~ 1,
SZ != SZ_ptak ~ 1,
TRUE ~ 0))
# SZ SZ_ptak diff_SZ
#1 1 1 0
#2 1 1 0
#3 NA NA NA
#4 0 NA 1
#5 NA NA NA
#6 1 1 0
#7 1 0 1
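For comparison, a base R sketch of the same rules on plain vectors (values copied from the question): start from the ordinary != comparison, then set the result to 1 wherever exactly one of the two ratings is missing, using xor. Rows where both ratings are NA stay NA.

```r
SZ      <- c(1, 1, NA, 0, NA, 1, 1)
SZ_ptak <- c(1, 1, NA, NA, NA, 1, 0)

# Plain comparison: NA wherever either rating is NA
diff_SZ <- as.numeric(SZ != SZ_ptak)
# Exactly one rating missing counts as a difference
diff_SZ[xor(is.na(SZ), is.na(SZ_ptak))] <- 1
diff_SZ  # 0 0 NA 1 NA 0 1
```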
Suppose I have this dataframe
df <- data.frame(
x=c(1, NA, NA, 4, 5, NA),
y=c(NA, 2, 3, NA, NA, 6))
which looks like this
x y
1 1 NA
2 NA 2
3 NA 3
4 4 NA
5 5 NA
6 NA 6
How can I merge the two columns into one? Basically the NA values are in complementary rows. It would be nice to also obtain (in the process) a flag column containing 0 if the entry comes from x and 1 if the entry comes from y.
We can try using the coalesce function from the dplyr package:
library(dplyr)
df$merged <- coalesce(df$x, df$y)
df$flag <- ifelse(is.na(df$y), 0, 1)
df
x y merged flag
1 1 NA 1 0
2 NA 2 2 1
3 NA 3 3 1
4 4 NA 4 0
5 5 NA 5 0
6 NA 6 6 1
We can also use base R: call max.col on the logical matrix !is.na(df) to get the column index of the non-NA value in each row, cbind it with the row index, and use that matrix to extract the values that are not NA:
df$merged <- df[cbind(seq_len(nrow(df)), max.col(!is.na(df)))]
df$flag <- +(!is.na(df$y))
df
# x y merged flag
#1 1 NA 1 0
#2 NA 2 2 1
#3 NA 3 3 1
#4 4 NA 4 0
#5 5 NA 5 0
#6 NA 6 6 1
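To see what the max.col approach does step by step, here is a sketch that rebuilds the original two-column df and shows the intermediate column index and the (row, column) index matrix; the names col_idx and idx are illustrative.

```r
df <- data.frame(x = c(1, NA, NA, 4, 5, NA),
                 y = c(NA, 2, 3, NA, NA, 6))

# Column position of the single non-NA value in each row
col_idx <- max.col(!is.na(df))
col_idx  # 1 2 2 1 1 2

# Two-column (row, column) matrix: one cell per row
idx <- cbind(seq_len(nrow(df)), col_idx)
df[idx]  # 1 2 3 4 5 6
```

Because every row has exactly one non-NA value, max.col never has to break a tie here; with rows containing two non-NA values its default ties.method = "random" would pick one at random.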
Or we can use fcoalesce from data.table which is written in C and is multithreaded for numeric and factor types.
library(data.table)
setDT(df)[, c('merged', 'flag' ) := .(fcoalesce(x, y), +(!is.na(y)))]
df
# x y merged flag
#1: 1 NA 1 0
#2: NA 2 2 1
#3: NA 3 3 1
#4: 4 NA 4 0
#5: 5 NA 5 0
#6: NA 6 6 1
You can do that using dplyr as follows:
library(dplyr)
# Creating dataframe
df <-
data.frame(
x = c(1, NA, NA, 4, 5, NA),
y = c(NA, 2, 3, NA, NA, 6))
df %>%
# If x is NA then use y
mutate(merged = coalesce(x, y),
# If x is NA then put 1, else put 0
flag = if_else(is.na(x), 1, 0))
#    x  y merged flag
# 1  1 NA      1    0
# 2 NA  2      2    1
# 3 NA  3      3    1
# 4  4 NA      4    0
# 5  5 NA      5    0
# 6 NA  6      6    1
In the R data frame below, I would like to replace every row where both columns are NA so that both columns are 0.
So I would like to change this:
Col 1 Col 2
1 1
3 2
NA NA
3 NA
NA 3
NA NA
and would like the result to be:
Col 1 Col 2
1 1
3 2
0 0
3 NA
NA 3
0 0
One option is to create a logical index with rowSums on the logical matrix !is.na(df1), which is TRUE for non-NA values and FALSE for NA. rowSums therefore counts the non-NA values in each row: rows that are entirely NA (all FALSE) return 0, and all other rows return a value greater than 0. Negating (!) that vector returns TRUE for the 0 counts and FALSE for everything else; use it to assign 0 to those rows:
df1[!rowSums(!is.na(df1)),] <- 0
df1
# Col 1 Col 2
#1 1 1
#2 3 2
#3 0 0
#4 3 NA
#5 NA 3
#6 0 0
Or it can also be done the other way round: count the NAs per row with is.na (no negation) and compare the count with the number of columns.
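A minimal sketch of the non-negated variant. df1 is not defined in the thread, so it is rebuilt here from the question's table, assuming the columns are literally named "Col 1" and "Col 2" (check.names = FALSE keeps the spaces).

```r
df1 <- data.frame(`Col 1` = c(1, 3, NA, 3, NA, NA),
                  `Col 2` = c(1, 2, NA, NA, 3, NA),
                  check.names = FALSE)

# A row is all-NA when its NA count equals the number of columns
all_na <- rowSums(is.na(df1)) == ncol(df1)
df1[all_na, ] <- 0
df1
```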
Another option is to loop through the columns, check for the NAs with is.na and then Reduce it to a logical vector to assign the rows that are TRUE based on it to 0
df1[Reduce(`&`, lapply(df1, is.na)), ] <- 0
In case you want the columns explicitly referenced, you can also do:
df <- data.frame(col1=c(1, 3, NA, 3, NA, NA), col2=c(1, 2, NA, NA, 3, NA))
df[is.na(df$col1) & is.na(df$col2), ] <- 0
df
## col1 col2
## 1 1 1
## 2 3 2
## 3 0 0
## 4 3 NA
## 5 NA 3
## 6 0 0
For the case of changing just specific columns to zero, you can reference those columns by index or name inside the brackets, e.g.:
df <- data.frame(col1=c(1, 3, NA, 3, NA, NA), col2=c(1, 2, NA, NA, 3, NA), col3=rep(1, 6))
df[is.na(df$col1) & is.na(df$col2), c("col1", "col2")] <- 0
df
## col1 col2 col3
## 1 1 1 1
## 2 3 2 1
## 3 0 0 1
## 4 3 NA 1
## 5 NA 3 1
## 6 0 0 1
How do I assign progressive numbers to a column every time a given condition is met in another one? Given this data:
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
I would like to construct a column that is progressively augmented by 1 every time var == 1 and returns NAs for the rest. The new column should then be filled with:
1, NA, NA, 2, NA, NA, NA, 3, NA, 4, 5
I thought about ifelse but I didn't manage to make it work.
Thanks!
You can use ifelse and cumsum:
a$newvar <- ifelse(a$var==1, cumsum(a$var==1), NA)
var newvar
1 1 1
2 0 NA
3 0 NA
4 1 2
5 4 NA
6 5 NA
7 6 NA
8 1 3
9 7 NA
10 1 4
11 1 5
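The trick is that cumsum(a$var == 1) keeps a running count of the 1s seen so far; ifelse then keeps that count only in the rows where var is 1. A sketch showing the intermediate vector (the name running is illustrative):

```r
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))

# Running count of matches, one value per row
running <- cumsum(a$var == 1)
running  # 1 1 1 2 2 2 2 3 3 4 5

# Keep the count only where var == 1, NA elsewhere
a$newvar <- ifelse(a$var == 1, running, NA)
```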
I'd just like to add a more general approach, for the case where you want to do the same thing for 4, 5, or any other value:
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
a$New <- ifelse(a$var == 1,1,NA)
a$New[!is.na(a$New)] <- cumsum(a$New[!is.na(a$New)])
Output:
> print(a)
var New
1 1 1
2 0 NA
3 0 NA
4 1 2
5 4 NA
6 5 NA
7 6 NA
8 1 3
9 7 NA
10 1 4
11 1 5
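The same idea can be wrapped in a small helper so the target value is a parameter. number_matches is a hypothetical name, not a function from any package; this sketch uses %in% so NA values simply count as non-matches.

```r
# Hypothetical helper: number the positions where x equals value, NA elsewhere
number_matches <- function(x, value) {
  hit <- x %in% value               # NA in x gives FALSE, not NA
  out <- rep(NA_integer_, length(x))
  out[hit] <- seq_len(sum(hit))     # 1, 2, 3, ... at the matching positions
  out
}

number_matches(c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1), 1)
number_matches(c(1, 0, 4, 4, NA, 4), 4)
```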
We can also do this with a variation of cumsum
a$newVar <- with(a, cumsum(var ==1) * NA^(var!=1))
a$newVar
#[1] 1 NA NA 2 NA NA NA 3 NA 4 5
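The NA^(var != 1) part works because R defines anything raised to the power 0 as 1, even NA, while NA^1 is NA. The logical var != 1 is treated as 0/1, so the factor is 1 where var == 1 and NA elsewhere. A short check on a shortened vector:

```r
var <- c(1, 0, 0, 1)

NA^0                              # 1
NA^1                              # NA
NA^(var != 1)                     # 1 NA NA 1
cumsum(var == 1) * NA^(var != 1)  # 1 NA NA 2
```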
Or using data.table: we convert the 'data.frame' to a 'data.table' (setDT(a)), specify the logical condition in 'i' (var == 1), and assign the cumsum of 'var' to 'newvar' with := (which is efficient because it assigns in place). By default, the elements of 'newvar' in rows that do not meet the condition are filled with NA.
library(data.table)
setDT(a)[var==1, newvar := cumsum(var)]
a
# var newvar
# 1: 1 1
# 2: 0 NA
# 3: 0 NA
# 4: 1 2
# 5: 4 NA
# 6: 5 NA
# 7: 6 NA
# 8: 1 3
# 9: 7 NA
#10: 1 4
#11: 1 5
Or instead of cumsum we can use the sequence of rows
setDT(a)[var==1, newvar := seq_len(.N)]