Suppose I have this dataframe
df <- data.frame(
x=c(1, NA, NA, 4, 5, NA),
y=c(NA, 2, 3, NA, NA, 6))
which looks like this
x y
1 1 NA
2 NA 2
3 NA 3
4 4 NA
5 5 NA
6 NA 6
How can I merge the two columns into one? Basically the NA values are in complementary rows. It would be nice to also obtain (in the process) a flag column containing 0 if the entry comes from x and 1 if the entry comes from y.
We can try using the coalesce function from the dplyr package:
library(dplyr)
df$merged <- coalesce(df$x, df$y)
df$flag <- ifelse(is.na(df$y), 0, 1)
df
x y merged flag
1 1 NA 1 0
2 NA 2 2 1
3 NA 3 3 1
4 4 NA 4 0
5 5 NA 5 0
6 NA 6 6 1
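For readers without dplyr, the same idea can be sketched in base R. This only illustrates what coalesce does for two vectors (take x where it is present, otherwise y) and is not dplyr's implementation; the names merged_base and flag_base are made up for this example.

```r
# Same data as in the question
df <- data.frame(
  x = c(1, NA, NA, 4, 5, NA),
  y = c(NA, 2, 3, NA, NA, 6)
)

# Take x where it is present, otherwise fall back to y
merged_base <- ifelse(is.na(df$x), df$y, df$x)

# 0 if the value came from x, 1 if it came from y
flag_base <- as.integer(is.na(df$x))

merged_base
# [1] 1 2 3 4 5 6
flag_base
# [1] 0 1 1 0 0 1
```

This relies on the NA values being in complementary rows, as stated in the question.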
We can also use base R methods with max.col on the logical matrix to get the column index, cbind with row index and extract the values that are not NA
df$merged <- df[cbind(seq_len(nrow(df)), max.col(!is.na(df)))]
df$flag <- +(!is.na(df$y))
df
# x y merged flag
#1 1 NA 1 0
#2 NA 2 2 1
#3 NA 3 3 1
#4 4 NA 4 0
#5 5 NA 5 0
#6 NA 6 6 1
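To see what the max.col one-liner does, it may help to unpack it into its intermediate steps. This is a sketch on the same data; the variable names present, col_idx and idx are only for illustration.

```r
df <- data.frame(
  x = c(1, NA, NA, 4, 5, NA),
  y = c(NA, 2, 3, NA, NA, 6)
)

# TRUE where a value is present
present <- !is.na(df)

# Column position of the TRUE in each row (1 = x, 2 = y).
# Each row here has exactly one TRUE, so there are no ties to break.
col_idx <- max.col(present)
col_idx
# [1] 1 2 2 1 1 2

# Two-column matrix of (row, column) pairs used for matrix indexing
idx <- cbind(seq_len(nrow(df)), col_idx)

df[idx]
# [1] 1 2 3 4 5 6
```

If a row could hold values in both columns, max.col would need an explicit ties.method (for example "first") to be deterministic.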
Or we can use fcoalesce from data.table which is written in C and is multithreaded for numeric and factor types.
library(data.table)
setDT(df)[, c('merged', 'flag' ) := .(fcoalesce(x, y), +(!is.na(y)))]
df
# x y merged flag
#1: 1 NA 1 0
#2: NA 2 2 1
#3: NA 3 3 1
#4: 4 NA 4 0
#5: 5 NA 5 0
#6: NA 6 6 1
You can do that using dplyr as follows:
library(dplyr)
# Creating dataframe
df <-
data.frame(
x = c(1, NA, NA, 4, 5, NA),
y = c(NA, 2, 3, NA, NA, 6))
df %>%
# If x is NA, take the value from y
mutate(merged = coalesce(x, y),
# 1 if x is NA (the value came from y), otherwise 0
flag = if_else(is.na(x), 1, 0))
# x y merged flag
# 1 NA 1 0
# NA 2 2 1
# NA 3 3 1
# 4 NA 4 0
# 5 NA 5 0
# NA 6 6 1
Related
I have a dataframe df of integers across 6 variables.
a <- c(NA, NA, NA, 0, 0, 1, 1, 1)
b <- c(NA, NA, NA, 2, 2, 3, 3, 3)
c <- c(NA, NA, NA, 2, 2, 3, 3, 3)
d <- c(NA, NA, NA, 1, 1, 2, 2, 2)
e <- c(NA, NA, NA, 0, 0, 1, 1, 1)
f <- c(NA, NA, NA, 0, 0, 1, 1, 1)
df <- data.frame(a, b, c, d, e, f)
print(df)
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 0 2 2 1 0 0
5 0 2 2 1 0 0
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
I would like to add 1 to each row that contains a zero, resulting in:
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 3 3 2 1 1
5 1 3 3 2 1 1
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
I've been able to test if a row contains a zero with the following code, which adds a new column of "TRUE" or "FALSE".
df$cont0 <- apply(df, 1, function(x) any(x %in% "0"))
I thought I would use this new value to then add 1 to each row where df$cont0 == "TRUE"
ifelse(df$cont0 == "TRUE", df + 1, df)
This ends up creating a nested list that still does not perform the correct operation. I understand that ifelse is already vectorized, but other than that I'm not sure how to approach this issue. I am open to splitting apart the df into "TRUE" and "FALSE" conditions, then performing the operation on df$cont0 == "TRUE", but they need to be re-merged in the original order as the data are chronological and row order therefore matters. However I suspect there's an easier solution. Thank you!
Create a logical index with rowSums on the logical matrix and use it as a row index to add 1 to the matching rows
i1 <- rowSums(df == 0, na.rm = TRUE) > 0
df[i1,] <- df[i1, ] + 1
-output
> df
a b c d e f
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 3 3 2 1 1
5 1 3 3 2 1 1
6 1 3 3 2 1 1
7 1 3 3 2 1 1
8 1 3 3 2 1 1
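The row index can be unpacked step by step on a smaller, made-up data frame (the columns a and b below are not the OP's full data, just a three-row sketch):

```r
df <- data.frame(
  a = c(NA, 0, 1),
  b = c(NA, 2, 3)
)

# Logical matrix: TRUE where a cell equals 0, NA where the cell is NA
is_zero <- df == 0

# Count zeros per row, ignoring NA cells
zeros_per_row <- rowSums(is_zero, na.rm = TRUE)
unname(zeros_per_row)
# [1] 0 1 0

# Rows with at least one zero
i1 <- zeros_per_row > 0

# Add 1 only to those rows; row order is untouched
df[i1, ] <- df[i1, ] + 1
df
#    a  b
# 1 NA NA
# 2  1  3
# 3  1  3
```

Because the assignment happens in place, there is no need to split the data and merge it back in order.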
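Regarding the use of ifelse on a logical vector: ifelse returns a result with the same length as its test argument, so it works as intended only when test, yes, and no all have that length. That is not the case in the OP's code, where test has one element per row while yes and no are whole data frames.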
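A small sketch of that behaviour, with made-up values: the result of ifelse follows the length of test, and ifelse does work if it is applied column by column so that all three arguments line up.

```r
# The result has the length of test, not of yes
test <- c(TRUE, FALSE)
out <- ifelse(test, c(10, 20, 30, 40), 0)
out
# [1] 10  0

# Column-by-column use, where test and each column have equal length
df <- data.frame(a = c(0, 1, 5), b = c(2, 3, 7))
has_zero <- df$a == 0 | df$b == 0
df[] <- lapply(df, function(col) ifelse(has_zero, col + 1, col))
df
#   a b
# 1 1 3
# 2 1 3
# 3 5 7
```

The logical row index is still the simpler option; this only shows where the original attempt went wrong.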
Just try to get row index first :
index <- rowSums(df == 0, na.rm = TRUE) > 0
df[index, ] <- df[index, ] + 1
It should work.
data = data.frame(STUDENT=c(1,2,3,4,5,6,7,8),
CAT=c(NA,NA,1,2,3,NA,NA,0),
DOG=c(NA,NA,2,3,2,NA,1,NA),
MOUSE=c(2,3,NA,NA,NA,NA,NA,NA),
WANT=c(2,3,2,2,3,NA,NA,NA))
I have 'data' and wish to create the 'WANT' variable, which takes the first non-NA value that is not equal to '1' or '0' and stores it in 'WANT'. The code above shows an example of what I hope to get.
We can use coalesce after changing the values 0, 1 in the selected columns to NA, then bind the column with the original dataset
library(dplyr)
data %>%
transmute(across(CAT:MOUSE, ~ replace(., . %in% 0:1, NA))) %>%
transmute(WANT2 = coalesce(!!! .)) %>%
bind_cols(data, .)
# STUDENT CAT DOG MOUSE WANT WANT2
#1 1 NA NA 2 2 2
#2 2 NA NA 3 3 3
#3 3 1 2 NA 2 2
#4 4 2 3 NA 2 2
#5 5 3 2 NA 3 3
#6 6 NA NA NA NA NA
#7 7 NA 1 NA NA NA
#8 8 0 NA NA NA NA
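The same two steps (mask 0 and 1 as NA, then take the first non-NA value across the columns) can also be sketched without dplyr. The Reduce call below is a hand-rolled stand-in for coalesce, not its actual implementation, and the names masked and first_valid are just for this example.

```r
data <- data.frame(
  STUDENT = 1:8,
  CAT   = c(NA, NA, 1, 2, 3, NA, NA, 0),
  DOG   = c(NA, NA, 2, 3, 2, NA, 1, NA),
  MOUSE = c(2, 3, NA, NA, NA, NA, NA, NA)
)

# Step 1: mask the unwanted values (0 and 1) as NA in each column
masked <- lapply(data[c("CAT", "DOG", "MOUSE")],
                 function(col) replace(col, col %in% 0:1, NA))

# Step 2: walk the columns left to right, keeping the first non-NA value
first_valid <- Reduce(function(a, b) ifelse(is.na(a), b, a), masked)
first_valid
# [1]  2  3  2  2  3 NA NA NA
```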
Or using data.table with fcoalesce. Convert the 'data.frame' to a 'data.table' (setDT(data)), specify the columns of interest in .SDcols, loop over .SD to replace the values 0 and 1 with NA, apply fcoalesce, and assign (:=) the result to create the new column 'WANT2'
library(data.table)
setDT(data)[, WANT2 := do.call(fcoalesce, lapply(.SD, function(x)
replace(x, x %in% 0:1, NA))), .SDcols = CAT:MOUSE]
or with base R, we can use a vectorized option with row/column indexing to extract the first non-NA element after replacing the values 0, 1 with NA
m1 <- !is.na(replace(data[2:4], data[2:4] == 1|data[2:4] == 0, NA))
data$WANT2 <- data[2:4][cbind(seq_len(nrow(m1)), max.col(m1, "first"))]
# rows with no valid value fall back to column 1, so clear any leftover 0 or 1
data$WANT2[data$WANT2 %in% 0:1] <- NA
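The clean-up step after the extraction is needed because max.col with "first" still returns a column for rows that contain no TRUE at all. A minimal sketch on a made-up logical matrix:

```r
# Row 1 has a TRUE in column 2, row 2 has none
m <- matrix(c(FALSE, TRUE,  FALSE,
              FALSE, FALSE, FALSE),
            nrow = 2, byrow = TRUE)

max.col(m, ties.method = "first")
# [1] 2 1

# One possible guard: mark rows without any TRUE as NA
col_idx <- max.col(m, ties.method = "first")
col_idx[rowSums(m) == 0] <- NA
col_idx
# [1]  2 NA
```

In the all-FALSE row every column ties for the maximum, so "first" picks column 1, and whatever value sits there gets extracted.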
Try this:
data$Want2 <- apply(data[,-c(1,5)],1,function(x) x[which(!is.na(x) & x!=0 & x!=1)[1]])
STUDENT CAT DOG MOUSE WANT Want2
1 1 NA NA 2 2 2
2 2 NA NA 3 3 3
3 3 1 2 NA 2 2
4 4 2 3 NA 2 2
5 5 3 2 NA 3 3
6 6 NA NA NA NA NA
7 7 NA 1 NA NA NA
8 8 0 NA NA NA NA
I'm trying to subset a dataframe in R by checking if each value is present in a specific list and keeping it if it is. For instance in the following dataframe:
x <- data.frame(A = sample(1:5, 5),
B = sample(1:5, 5),
C = sample(1:5, 5))
A B C
1 2 2 1
2 3 3 3
3 1 4 4
4 4 5 2
5 5 1 5
How could I subset it to include only the values 1, 3 and 4, giving the following as a result:
A B C
1 1
2 3 3 3
3 4 4
4 4
5 1
It doesn't matter what happens to the missing values - they could be changed to NA if this is easier. From browsing similar questions it seems that lapply might do it, but as a novice I'm struggling to apply what I've seen to this scenario.
set.seed(47)
x <- data.frame(A = sample(1:5, 5),
B = sample(1:5, 5),
C = sample(1:5, 5))
# with lapply
keep_vals = c(1, 3, 4)
x[] = lapply(x, function(y) {
y[! y %in% keep_vals] = NA
return(y)
})
x
# A B C
# 1 3 1 1
# 2 1 NA NA
# 3 NA NA 4
# 4 4 3 NA
# 5 NA 4 3
Or with a for loop:
set.seed(47) # reset data
x <- data.frame(A = sample(1:5, 5),
B = sample(1:5, 5),
C = sample(1:5, 5))
keep_vals = c(1, 3, 4)
for (i in 1:ncol(x)) {
x[, i][!x[, i] %in% keep_vals] <- NA
}
x
# A B C
# 1 3 1 1
# 2 1 NA NA
# 3 NA NA 4
# 4 4 3 NA
# 5 NA 4 3
With dplyr
library(dplyr)
x %>% mutate_all(
~replace(., !. %in% keep_vals, NA)
)
# A B C
# 1 3 1 1
# 2 1 NA NA
# 3 NA NA 4
# 4 4 3 NA
# 5 NA 4 3
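Another base R sketch builds a logical matrix of the cells to keep and blanks out the rest in one assignment. The data below are typed in from the table printed in the question (not regenerated with sample), and keep is just an illustrative name.

```r
x <- data.frame(A = c(2, 3, 1, 4, 5),
                B = c(2, 3, 4, 5, 1),
                C = c(1, 3, 4, 2, 5))
keep_vals <- c(1, 3, 4)

# Logical matrix: TRUE where the cell holds one of the wanted values
keep <- sapply(x, function(col) col %in% keep_vals)

# Blank out everything else in one assignment
x[!keep] <- NA
x
#    A  B  C
# 1 NA NA  1
# 2  3  3  3
# 3  1  4  4
# 4  4 NA NA
# 5 NA  1 NA
```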
using dplyr::bind_rows
do.call(bind_rows,apply(x,1, function(a) a[a %in% c(1,3,4)]))
# A tibble: 5 x 3
A B C
<int> <int> <int>
1 4 NA NA
2 1 1 1
3 3 3 NA
4 NA NA 4
5 NA 4 3
Collapsing each row to the matching numbers, and adjusting each length to ncol. (Assuming you want to "left-align" your numbers, as shown in your expected output.)
d <- setNames(as.data.frame(t(apply(d, 1, function(x) {
x <- x[x %in% c(1, 3, 4)]
`length<-`(x, ncol(d))
}))), names(d))
d
# A B C
# 1 1 NA NA
# 2 3 3 3
# 3 1 4 4
# 4 4 NA NA
# 5 NA NA NA
Since apply returns its result as a matrix with one column per input row, we transpose it with t, convert it with as.data.frame, and use setNames to restore the column names.
Note that I changed row 5 of your example data so that it doesn't contain any of the matching numbers, so as not to make it too easy.
Data
d <- read.table(text="A B C
1 2 2 1
2 3 3 3
3 1 4 4
4 4 5 2
5 5 2 5", header=TRUE)
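The padding step in that answer uses the replacement function `length<-` called as an ordinary function. A tiny sketch with made-up values shows what it does to one filtered row:

```r
# Values kept from one row after filtering
kept <- c(1, 4)

# `length<-` used as a function: extend to length 3, padding with NA
padded <- `length<-`(kept, 3)
padded
# [1]  1  4 NA

# Equivalent two-step form
kept2 <- c(1, 4)
length(kept2) <- 3
identical(padded, kept2)
# [1] TRUE
```

Padding every row to ncol(d) is what makes all results the same length, so apply can return a regular matrix.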
I want to set column A to NA in every row where column B is 2:
data
A B
1 2
2 4
NA 5
6 2
output
A B
NA 2
2 4
NA 5
NA 2
The first and last rows of B are 2, so A becomes NA in those rows.
Here's a way using ifelse in base R -
df$A <- ifelse(df$B == 2, NA_real_, df$A)
set.seed(0)
df <- data.frame(A = sample(1:10, size = 5, replace = TRUE),
B = sample(1:10, size = 5, replace = TRUE))
df
A B
1 9 7
2 4 2
3 7 3
4 1 1
5 2 5
df$A[df$B == 2] <- NA
df
A B
1 9 7
2 NA 2
3 7 3
4 1 1
5 2 5
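For comparison, here is a sketch on the exact data from the question using replace, which returns a modified copy of the vector instead of assigning into it:

```r
# Data from the question
df <- data.frame(A = c(1, 2, NA, 6),
                 B = c(2, 4, 5, 2))

# replace() puts NA at the positions where the condition is TRUE
df$A <- replace(df$A, df$B == 2, NA)
df
#    A B
# 1 NA 2
# 2  2 4
# 3 NA 5
# 4 NA 2
```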
How do I assign progressive numbers to a column every time a given condition is met in another one? Given this data:
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
I would like to construct a column that is progressively augmented by 1 every time var == 1 and returns NAs for the rest. The new column should then be filled with:
1, NA, NA, 2, NA, NA, NA, 3, NA, 4, 5
I thought about ifelse but I didn't manage to make it work.
Thanks!
You can use ifelse and cumsum:
a$newvar <- ifelse(a$var==1, cumsum(a$var==1), NA)
var newvar
1 1 1
2 0 NA
3 0 NA
4 1 2
5 4 NA
6 5 NA
7 6 NA
8 1 3
9 7 NA
10 1 4
11 1 5
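It may help to look at the two pieces separately. The names hit and running below are only for illustration:

```r
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))

hit <- a$var == 1

# Running count of hits so far; it stays flat on non-matching rows
running <- cumsum(hit)
running
# [1] 1 1 1 2 2 2 2 3 3 4 5

# Keep the running count only on the matching rows
ifelse(hit, running, NA)
# [1]  1 NA NA  2 NA NA NA  3 NA  4  5
```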
I'd just like to add a more general approach, in case you want to do the same thing for 4, 5, or any other value:
a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
a$New <- ifelse(a$var == 1,1,NA)
a$New[!is.na(a$New)] <- cumsum(a$New[!is.na(a$New)])
Output:
> print(a)
var New
1 1 1
2 0 NA
3 0 NA
4 1 2
5 4 NA
6 5 NA
7 6 NA
8 1 3
9 7 NA
10 1 4
11 1 5
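To make the target value a parameter, the logic could be wrapped in a small function. count_matches is a hypothetical helper written for this sketch, not something from base R or the answers here:

```r
# Hypothetical helper: number the elements of x that equal `value`,
# leaving NA everywhere else
count_matches <- function(x, value) {
  hit <- !is.na(x) & x == value
  out <- rep(NA_integer_, length(x))
  out[hit] <- seq_len(sum(hit))
  out
}

a <- data.frame(var = c(1, 0, 0, 1, 4, 5, 6, 1, 7, 1, 1))
count_matches(a$var, 1)
# [1]  1 NA NA  2 NA NA NA  3 NA  4  5
count_matches(a$var, 4)
# [1] NA NA NA NA  1 NA NA NA NA NA NA
```

The !is.na(x) guard is an added assumption so that missing values in x are treated as non-matches.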
We can also do this with a variation of cumsum
a$newVar <- with(a, cumsum(var ==1) * NA^(var!=1))
a$newVar
#[1] 1 NA NA 2 NA NA NA 3 NA 4 5
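The NA^ trick works because anything raised to the power 0 is 1 in R, even NA, while NA^1 stays NA. Multiplying by it therefore keeps some positions and blanks the others. A short sketch with made-up values:

```r
NA^0
# [1] 1
NA^1
# [1] NA

# FALSE counts as 0 and TRUE as 1 in the exponent
keep <- c(TRUE, FALSE, TRUE)
res <- c(10, 20, 30) * NA^(!keep)
res
# [1] 10 NA 30
```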
Or using data.table, we convert the 'data.frame' to 'data.table' (setDT(a)), specify the logical condition in 'i' (var == 1), and assign (:= it is efficient as it assigns in place) the cumsum of 'var' to 'newvar'. By default, the other elements in 'newvar' that do not correspond to the logical condition will be filled by NA.
library(data.table)
setDT(a)[var==1, newvar := cumsum(var)]
a
# var newvar
# 1: 1 1
# 2: 0 NA
# 3: 0 NA
# 4: 1 2
# 5: 4 NA
# 6: 5 NA
# 7: 6 NA
# 8: 1 3
# 9: 7 NA
#10: 1 4
#11: 1 5
Or instead of cumsum we can use the sequence of rows
setDT(a)[var==1, newvar := seq_len(.N)]