How to change variables to 0 if BOTH columns = NA - r

In the R data frame below, I would like to set both columns to 0 in every row where both columns are NA.
So I would like to change this:
Col 1 Col 2
1 1
3 2
NA NA
3 NA
NA 3
NA NA
and would like the result to be:
Col 1 Col 2
1 1
3 2
0 0
3 NA
NA 3
0 0

One option is to create a logical index with rowSums on the logical matrix !is.na(df1), which is TRUE for non-NA values and FALSE for NA. After rowSums, rows that are entirely NA (all FALSE) return 0 and all other rows return a value greater than 0. Negating (!) that vector gives TRUE for the 0 values and FALSE for everything else, and the TRUE rows are then assigned 0:
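df1 <- data.frame(`Col 1` = c(1, 3, NA, 3, NA, NA), `Col 2` = c(1, 2, NA, NA, 3, NA), check.names = FALSE)
df1[!rowSums(!is.na(df1)), ] <- 0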
df1
# Col 1 Col 2
#1 1 1
#2 3 2
#3 0 0
#4 3 NA
#5 NA 3
#6 0 0
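To make the rowSums logic above concrete, here is the same one-liner split into its intermediate steps. It is a sketch that rebuilds the example data; the names present, n_present and all_na are only illustrative.

```r
# Rebuild the example data (names with spaces need check.names = FALSE)
df1 <- data.frame(`Col 1` = c(1, 3, NA, 3, NA, NA),
                  `Col 2` = c(1, 2, NA, NA, 3, NA),
                  check.names = FALSE)

# Step 1: TRUE where a value is present, FALSE where it is NA
present <- !is.na(df1)

# Step 2: number of non-NA values per row; all-NA rows give 0
n_present <- rowSums(present)
n_present  # 2 2 0 1 1 0

# Step 3: `!` turns 0 into TRUE and any positive count into FALSE
all_na <- !n_present

# Overwrite only the all-NA rows
df1[all_na, ] <- 0
df1
```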
Or it can also be done the other way round: instead of negating, count the NAs in each row and compare that count with the number of columns.
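The count-and-compare variant might look like this. It is a minimal sketch that rebuilds the example data; n_missing is an illustrative name.

```r
df1 <- data.frame(`Col 1` = c(1, 3, NA, 3, NA, NA),
                  `Col 2` = c(1, 2, NA, NA, 3, NA),
                  check.names = FALSE)

# Count the NAs in each row (no negation needed)
n_missing <- rowSums(is.na(df1))

# A row is entirely NA when its NA count equals the number of columns
df1[n_missing == ncol(df1), ] <- 0
df1
```

Because the comparison uses ncol(df1), the same line keeps working if more columns are added.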
Another option is to loop through the columns with lapply, check each one for NAs with is.na, Reduce the resulting list to a single logical vector with &, and assign 0 to the rows that are TRUE:
df1[Reduce(`&`, lapply(df1, is.na)), ] <- 0
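Here is the Reduce version broken into steps, so you can see what lapply and Reduce each return. It is a sketch on the rebuilt example data; na_by_col and all_na are illustrative names.

```r
df1 <- data.frame(`Col 1` = c(1, 3, NA, 3, NA, NA),
                  `Col 2` = c(1, 2, NA, NA, 3, NA),
                  check.names = FALSE)

# One logical vector per column: TRUE where that column is NA
na_by_col <- lapply(df1, is.na)

# Reduce folds the list with `&`; with more columns this chains
# (col1 & col2) & col3 & ..., so a row is TRUE only if every column is NA
all_na <- Reduce(`&`, na_by_col)
all_na  # FALSE FALSE TRUE FALSE FALSE TRUE

df1[all_na, ] <- 0
df1
```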

If you want to reference the columns explicitly, you can also do:
df <- data.frame(col1=c(1, 3, NA, 3, NA, NA), col2=c(1, 2, NA, NA, 3, NA))
df[is.na(df$col1) & is.na(df$col2), ] <- 0
df
## col1 col2
## 1 1 1
## 2 3 2
## 3 0 0
## 4 3 NA
## 5 NA 3
## 6 0 0
To change only specific columns to zero, reference those columns by index or name inside the brackets, e.g.:
df <- data.frame(col1=c(1, 3, NA, 3, NA, NA), col2=c(1, 2, NA, NA, 3, NA), col3=rep(1, 6))
df[is.na(df$col1) & is.na(df$col2), c("col1", "col2")] <- 0
df
## col1 col2 col3
## 1 1 1 1
## 2 3 2 1
## 3 0 0 1
## 4 3 NA 1
## 5 NA 3 1
## 6 0 0 1
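When more than two columns are involved, writing is.na(df$colX) & ... for each one gets long. One option is to combine the rowSums idea with a vector of column names. This is a sketch; cols and all_na are illustrative names.

```r
df <- data.frame(col1 = c(1, 3, NA, 3, NA, NA),
                 col2 = c(1, 2, NA, NA, 3, NA),
                 col3 = rep(1, 6))

# Columns to check and to overwrite; any number of names works here
cols <- c("col1", "col2")

# A row qualifies when none of the selected columns has a value
all_na <- rowSums(!is.na(df[cols])) == 0

# Only the selected columns are set to 0; col3 is left alone
df[all_na, cols] <- 0
df
```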

Related

compare ratings involving integers and NA

I have ratings by different raters:
df <- structure(list(SZ = c(1, 1, NA, 0, NA, 1, 1),
SZ_ptak = c(1, 1, NA, NA, NA, 1, 0)),
row.names = c(NA, 7L), class = "data.frame")
I need to compare them to find ratings that differ. This code works fine as long as both raters assigned either 1 or 0. If one rating is NA and the other is 1 or 0, I also want to obtain the value 1 in column diff_SZ. How can that be done?
library(dplyr)
df %>%
mutate(diff_SZ = +(SZ != SZ_ptak))
SZ SZ_ptak diff_SZ
1 1 1 0
2 1 1 0
3 NA NA NA
4 0 NA NA
5 NA NA NA
6 1 1 0
7 1 0 1
Desired:
SZ SZ_ptak diff_SZ
1 1 1 0
2 1 1 0
3 NA NA NA
4 0 NA 1 <--
5 NA NA NA
6 1 1 0
7 1 0 1
It may be easier to understand if you list the conditions explicitly:
library(dplyr)
df %>%
mutate(diff_SZ = case_when(is.na(SZ) & is.na(SZ_ptak) ~ NA_real_,
is.na(SZ) | is.na(SZ_ptak) ~ 1,
SZ != SZ_ptak ~ 1,
TRUE ~ 0))
# SZ SZ_ptak diff_SZ
#1 1 1 0
#2 1 1 0
#3 NA NA NA
#4 0 NA 1
#5 NA NA NA
#6 1 1 0
#7 1 0 1
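Without dplyr, the same rules can be written in base R. This relies on `|` returning TRUE whenever one operand is TRUE, even if another operand is NA. It is a sketch; both_na and differs are illustrative names.

```r
df <- data.frame(SZ = c(1, 1, NA, 0, NA, 1, 1),
                 SZ_ptak = c(1, 1, NA, NA, NA, 1, 0))

# Rows where neither rater gave a value stay NA
both_na <- is.na(df$SZ) & is.na(df$SZ_ptak)

# TRUE | NA is TRUE, so a single missing rating marks the pair as different;
# when both ratings are present this reduces to SZ != SZ_ptak
differs <- is.na(df$SZ) | is.na(df$SZ_ptak) | df$SZ != df$SZ_ptak

df$diff_SZ <- ifelse(both_na, NA, as.numeric(differs))
df$diff_SZ  # 0 0 NA 1 NA 0 1
```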

add value to each element in row if row contains match

I have a dataframe df of integers across 6 variables.
a <- c(NA, NA, NA, 0, 0, 1, 1, 1)
b <- c(NA, NA, NA, 2, 2, 3, 3, 3)
c <- c(NA, NA, NA, 2, 2, 3, 3, 3)
d <- c(NA, NA, NA, 1, 1, 2, 2, 2)
e <- c(NA, NA, NA, 0, 0, 1, 1, 1)
f <- c(NA, NA, NA, 0, 0, 1, 1, 1)
df <- data.frame(a, b, c, d, e, f)
print(df)
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 0 2 2 1 0 0
5 0 2 2 1 0 0
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
I would like to add 1 to every value in each row that contains a zero, resulting in:
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 3 3 2 1 1
5 1 3 3 2 1 1
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
I've been able to test if a row contains a zero with the following code, which adds a new column of "TRUE" or "FALSE".
df$cont0 <- apply(df, 1, function(x) any(x %in% "0"))
I thought I would use this new value to then add 1 to each row where df$cont0 == "TRUE":
ifelse(df$cont0 == "TRUE", df + 1, df)
This ends up creating a nested list that still does not perform the correct operation. I understand that ifelse is already vectorized, but other than that I'm not sure how to approach this issue. I am open to splitting apart the df into "TRUE" and "FALSE" conditions, then performing the operation on df$cont0 == "TRUE", but they need to be re-merged in the original order as the data are chronological and row order therefore matters. However I suspect there's an easier solution. Thank you!
Create a logical index with rowSums on the logical matrix df == 0 (na.rm = TRUE ignores the NA cells), and use it as the row index for the addition:
i1 <- rowSums(df == 0, na.rm = TRUE) > 0
df[i1,] <- df[i1, ] + 1
-output
> df
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 3 3 2 1 1
5 1 3 3 2 1 1
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
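A fully vectorized base R variant can skip the subsetting and add the logical index directly to the data frame. Arithmetic between a data frame and a vector of length nrow(df) recycles the vector down each column, and TRUE/FALSE count as 1/0. This sketch assumes the six-column df from the question without the extra cont0 column; has_zero is an illustrative name.

```r
a <- c(NA, NA, NA, 0, 0, 1, 1, 1)
b <- c(NA, NA, NA, 2, 2, 3, 3, 3)
c <- c(NA, NA, NA, 2, 2, 3, 3, 3)
d <- c(NA, NA, NA, 1, 1, 2, 2, 2)
e <- c(NA, NA, NA, 0, 0, 1, 1, 1)
f <- c(NA, NA, NA, 0, 0, 1, 1, 1)
df <- data.frame(a, b, c, d, e, f)

# TRUE for rows containing at least one zero (NA cells are ignored)
has_zero <- rowSums(df == 0, na.rm = TRUE) > 0

# The length-nrow vector is recycled down each column, so every cell in a
# flagged row gets +1; all-NA rows stay NA because NA + 0 is NA
df <- df + has_zero
df
```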
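Regarding the use of ifelse: it works element-wise on its test vector. It returns one element per element of test (one per row here) and takes the corresponding element of yes or no, recycling them if needed. A data frame is a list of columns, so df + 1 and df are treated as lists of columns. The result is therefore a list of whole columns rather than modified rows, which is the nested list seen in the question.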
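The ifelse behavior described above can be seen on a small, made-up data frame. The values below are hypothetical and chosen only for illustration.

```r
df <- data.frame(a = c(NA, 0, 1), b = c(NA, 2, 3))
has_zero <- c(FALSE, TRUE, FALSE)

# ifelse builds its result element-wise over `has_zero` (3 elements) and
# takes element i of `yes`/`no`; for a data frame, element i is a whole
# column (recycled), so the result is a plain list of columns
res <- ifelse(has_zero, df + 1, df)
str(res)
```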
Just get the row index first:
index <- rowSums(df == 0, na.rm = TRUE) > 0
df[index, ] <- df[index, ] + 1
This adds 1 only to the rows that contain a zero and leaves the all-NA rows untouched.

Merge two columns containing NA values in complementing rows

Suppose I have this dataframe
df <- data.frame(
x=c(1, NA, NA, 4, 5, NA),
y=c(NA, 2, 3, NA, NA, 6))
which looks like this
x y
1 1 NA
2 NA 2
3 NA 3
4 4 NA
5 5 NA
6 NA 6
How can I merge the two columns into one? Basically the NA values are in complementary rows. It would be nice to also obtain (in the process) a flag column containing 0 if the entry comes from x and 1 if the entry comes from y.
We can try using the coalesce function from the dplyr package:
library(dplyr)
df$merged <- coalesce(df$x, df$y)
df$flag <- ifelse(is.na(df$y), 0, 1)
df
x y merged flag
1 1 NA 1 0
2 NA 2 2 1
3 NA 3 3 1
4 4 NA 4 0
5 5 NA 5 0
6 NA 6 6 1
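For exactly two columns, a plain base R ifelse gives the same merged column as coalesce(x, y) without loading dplyr. This is a sketch that assumes, as in the question, that x and y are never both present in the same row.

```r
df <- data.frame(x = c(1, NA, NA, 4, 5, NA),
                 y = c(NA, 2, 3, NA, NA, 6))

# Take x where present, otherwise fall back to y
df$merged <- ifelse(is.na(df$x), df$y, df$x)

# 1 when x is missing, i.e. the value came from y; 0 when it came from x
df$flag <- as.integer(is.na(df$x))
df
```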
We can also use base R: max.col on the logical matrix !is.na(df) gives the column index of the non-NA value in each row. cbind that with the row index and use the result to extract the values that are not NA:
df$merged <- df[cbind(seq_len(nrow(df)), max.col(!is.na(df)))]
df$flag <- +(!is.na(df$y))
df
# x y merged flag
#1 1 NA 1 0
#2 NA 2 2 1
#3 NA 3 3 1
#4 4 NA 4 0
#5 5 NA 5 0
#6 NA 6 6 1
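One detail of the max.col approach is that its default ties.method is "random". That default only matters if a row has values in both columns or in neither, and the question rules this out, but passing ties.method = "first" makes the choice deterministic by preferring x. This sketch also reuses the column index as the flag; present and col_idx are illustrative names.

```r
df <- data.frame(x = c(1, NA, NA, 4, 5, NA),
                 y = c(NA, 2, 3, NA, NA, 6))

# Logical matrix: TRUE where a value is present
present <- !is.na(df[c("x", "y")])

# Column index of the TRUE cell in each row; ties go to the first column
col_idx <- max.col(present, ties.method = "first")

# Matrix indexing with (row, column) pairs pulls one value per row
df$merged <- df[cbind(seq_len(nrow(df)), col_idx)]

# Column 1 (x) -> 0, column 2 (y) -> 1
df$flag <- col_idx - 1L
df
```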
Or we can use fcoalesce from data.table which is written in C and is multithreaded for numeric and factor types.
library(data.table)
setDT(df)[, c('merged', 'flag' ) := .(fcoalesce(x, y), +(!is.na(y)))]
df
# x y merged flag
#1: 1 NA 1 0
#2: NA 2 2 1
#3: NA 3 3 1
#4: 4 NA 4 0
#5: 5 NA 5 0
#6: NA 6 6 1
You can do that using dplyr as follows:
library(dplyr)
# Creating dataframe
df <-
data.frame(
x = c(1, NA, NA, 4, 5, NA),
y = c(NA, 2, 3, NA, NA, 6))
df %>%
# Where x is NA, take the value from y
mutate(merged = coalesce(x, y),
# If x is NA put 1, else put 0
flag = if_else(is.na(x), 1, 0))
# x y merged flag
# 1 NA 1 0
# NA 2 2 1
# NA 3 3 1
# 4 NA 4 0
# 5 NA 5 0
# NA 6 6 1

Assign progressive number if a condition is met - R

How do I assign progressive numbers to a column every time a given condition is met in another one? Given this data:
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
I would like to construct a column that is progressively augmented by 1 every time var == 1 and returns NAs for the rest. The new column should then be filled with:
1, NA, NA, 2, NA, NA, NA, 3, NA, 4, 5
I thought about ifelse but I didn't manage to make it work.
Thanks!
You can use ifelse and cumsum:
a$newvar <- ifelse(a$var==1, cumsum(a$var==1), NA)
var newvar
1 1 1
2 0 NA
3 0 NA
4 1 2
5 4 NA
6 5 NA
7 6 NA
8 1 3
9 7 NA
10 1 4
11 1 5
I'd just like to add one more approach for the general case, where you want to do the same thing for 4, 5, or any other value (change the value compared in a$var == 1):
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
a$New <- ifelse(a$var == 1,1,NA)
a$New[!is.na(a$New)] <- cumsum(a$New[!is.na(a$New)])
Output:
> print(a)
var New
1 1 1
2 0 NA
3 0 NA
4 1 2
5 4 NA
6 5 NA
7 6 NA
8 1 3
9 7 NA
10 1 4
11 1 5
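The ifelse/cumsum idea can also be wrapped in a small function so the target value is a parameter. progressive_index is a hypothetical helper name, not an existing function, and the sketch assumes x contains no NAs (an NA would propagate through cumsum).

```r
# Hypothetical helper: number the occurrences of `value` in `x`, NA elsewhere
progressive_index <- function(x, value) {
  hit <- x == value
  # cumsum over the logical vector counts the matches seen so far;
  # ifelse keeps that count only on the matching positions
  ifelse(hit, cumsum(hit), NA)
}

a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
a$new1 <- progressive_index(a$var, 1)  # 1 NA NA 2 NA NA NA 3 NA 4 5
a$new4 <- progressive_index(a$var, 4)  # 1 at row 5, NA elsewhere
a
```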
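We can also do this with a variation of cumsum. NA^(var != 1) is 1 where var == 1 (because NA^0 is 1) and NA everywhere else, so multiplying it by cumsum(var == 1) keeps the counter only on the matching rows: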
a$newVar <- with(a, cumsum(var ==1) * NA^(var!=1))
a$newVar
#[1] 1 NA NA 2 NA NA NA 3 NA 4 5
Or using data.table: convert the 'data.frame' to a 'data.table' (setDT(a)), specify the logical condition in 'i' (var == 1), and assign the cumsum of 'var' to 'newvar' with := (which is efficient because it assigns in place). Elements of 'newvar' in rows that do not meet the condition are filled with NA by default.
library(data.table)
setDT(a)[var==1, newvar := cumsum(var)]
a
# var newvar
# 1: 1 1
# 2: 0 NA
# 3: 0 NA
# 4: 1 2
# 5: 4 NA
# 6: 5 NA
# 7: 6 NA
# 8: 1 3
# 9: 7 NA
#10: 1 4
#11: 1 5
Or instead of cumsum we can use the sequence of rows
setDT(a)[var==1, newvar := seq_len(.N)]

Table by row with R

I would like to tabulate by row within a data frame. I can obtain adequate results using table within apply in the following example:
df.1 <- read.table(text = '
state county city year1 year2 year3 year4 year5
1 2 4 0 0 0 1 2
2 5 3 10 20 10 NA 10
2 7 1 200 200 NA NA 200
3 1 1 NA NA NA NA NA
', na.strings = "NA", header=TRUE)
tdf <- t(df.1)
apply(tdf[4:nrow(tdf),1:nrow(df.1)], 2, function(x) {table(x, useNA = "ifany")})
Here are the results:
[[1]]
x
0 1 2
3 1 1
[[2]]
x
10 20 <NA>
3 1 1
[[3]]
x
200 <NA>
3 2
[[4]]
x
<NA>
5
However, in the following example each row consists of a single value.
df.2 <- read.table(text = '
state county city year1 year2 year3 year4 year5
1 2 4 0 0 0 0 0
2 5 3 1 1 1 1 1
2 7 1 2 2 2 2 2
3 1 1 NA NA NA NA NA
', na.strings = "NA", header=TRUE)
tdf.2 <- t(df.2)
apply(tdf.2[4:nrow(tdf.2),1:nrow(df.2)], 2, function(x) {table(x, useNA = "ifany")})
The output I obtain is:
# [1] 5 5 5 5
As such, I cannot tell from this output that the first 5 is for 0, the second 5 is for 1, the third 5 is for 2 and the last 5 is for NA. Is there a way I can have R return the value represented by each 5 in the second example?
You can use lapply, which always returns a list. Loop over the row indices:
sub.df <- as.matrix(df.2[grepl("year", names(df.2))])
lapply(seq_len(nrow(sub.df)),
function(i)table(sub.df[i, ], useNA = "ifany"))
Alternatively, stop apply from simplifying the result by wrapping each table in list():
apply(tdf.2[4:nrow(tdf.2),1:nrow(df.2)], 2,
function(x) {list(table(x, useNA = "ifany")) })
Here's a table solution:
table(
rep(rownames(df.1),5),
unlist(df.1[,4:8]),
useNA="ifany")
This gives
0 1 2 10 20 200 <NA>
1 3 1 1 0 0 0 0
2 0 0 0 3 1 0 1
3 0 0 0 0 0 3 2
4 0 0 0 0 0 0 5
...and for your df.2:
0 1 2 <NA>
1 5 0 0 0
2 0 5 0 0
3 0 0 5 0
4 0 0 0 5
Well, this is a solution unless you really like having a list of tables for some reason.
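The table solution above hard-codes the number of year columns in rep(rownames(df.1), 5). A variant uses row() on the matrix of year columns, which returns each cell's row number in the same shape, so the two arguments line up cell by cell whatever the column count. This is a sketch on df.2; m and tab are illustrative names.

```r
df.2 <- read.table(text = '
state county city year1 year2 year3 year4 year5
1 2 4 0 0 0 0 0
2 5 3 1 1 1 1 1
2 7 1 2 2 2 2 2
3 1 1 NA NA NA NA NA
', na.strings = "NA", header = TRUE)

# Keep only the year columns, as a matrix
m <- as.matrix(df.2[grepl("year", names(df.2))])

# row(m) has the same dimensions as m and holds each cell's row number
tab <- table(row = row(m), value = m, useNA = "ifany")
tab
```

For df.2 each row has five identical values, so every count sits on the diagonal of the 4 x 4 table, labeled 0, 1, 2 and <NA>.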
I think the problem is stated in apply's help:
... If n equals 1, apply returns a vector if MARGIN has length 1 and
an array of dimension dim(X)[MARGIN] otherwise ...
The inconsistent return values of base R's apply family are the reason why I switched completely to plyr's **ply functions. So this works as desired:
library(plyr)
alply( df.2[ 4:8 ], 1, function(x) table( unlist(x), useNA = "ifany" ) )
