I'm trying to drop (set to NA) values in one column based on values in another column, and to do this over a large set of columns. The idea is to then pass the data to a plotting function to generate different plots for different cuts of the data.
Here's a reproducible example:
d <- data.frame("A_agree" = sample(1:7, 20, replace=T),
"B_agree" = sample(1:7, 20, replace=T),
"C_agree" = sample(1:7, 20, replace=T),
"A_change" = sample(1:5, 20, replace=T),
"B_change" = sample(1:5, 20, replace=T),
"C_change" = sample(1:5, 20, replace=T))
I've already found the following solution using base R, but it's slow, and since I'm trying to learn more dplyr, I was wondering how to achieve this in dplyr:
d.positive <- d
for (n in (c("A","B","C"))) {
for (i in 1:nrow(d.positive)) {
d.positive[i, paste0(n, "_agree")] <- ifelse(d.positive[i, paste0(n, "_change")] > 3,
d.positive[i, paste0(n, "_agree")],
NA)
}
}
d.neutral <- d
for (n in (c("A","B","C"))) {
for (i in 1:nrow(d.neutral)) {
d.neutral[i, paste0(n, "_agree")] <- ifelse(d.neutral[i, paste0(n, "_change")] == 3,
d.neutral[i, paste0(n, "_agree")],
NA)
}
}
d.negative <- d
for (n in (c("A","B","C"))) {
for (i in 1:nrow(d.negative)) {
d.negative[i, paste0(n, "_agree")] <- ifelse(d.negative[i, paste0(n, "_change")] < 3,
d.negative[i, paste0(n, "_agree")],
NA)
}
}
I thought I would use gather(), and then check for each row whether the corresponding column (hence the !!dimension) is bigger than a certain value (3 in this case), but it doesn't seem to work?
d %>%
gather(dimension,
value,
paste0(c("A","B","C"), "_agree")
) %>%
case_when(!!dimension > 3 ~ value=NA)
Alternatively, I thought I'd use map2_dfr from purrr, but I don't think it iterates over cells, just takes the entire column, hence this doesn't work:
map2_dfr(.x = d %>%
select( paste0(c("A","B","C"), "_agree") ),
.y = d %>%
select( paste0(c("A","B","C"), "_change") ),
~ if_else(.y > 3, x, NA)} )
Any pointers would be really helpful, to keep learning about the wonderful world of dplyr !
I get that you want to learn about purrr, but base R is just easier here:
d.positive <- d
check <- d.positive[4:6] <= 3 # TRUE where change is not > 3, i.e. the values to drop
d.positive[,1:3][check] <- NA
> d.positive
A_agree B_agree C_agree A_change B_change C_change
1 1 NA NA 4 3 2
2 2 2 NA 4 5 2
3 4 NA NA 4 3 1
4 1 NA NA 4 1 2
5 NA 1 NA 2 4 1
6 NA 7 NA 3 5 1
7 NA 6 NA 1 5 1
8 NA 6 4 2 5 5
9 4 NA NA 4 1 2
10 1 NA NA 5 1 2
11 NA NA NA 3 1 2
12 NA NA NA 1 3 3
13 NA NA NA 1 1 1
14 NA NA NA 3 2 3
15 1 NA NA 5 3 3
16 2 NA NA 4 3 2
17 NA NA 6 1 1 4
18 NA NA NA 1 1 2
19 NA NA NA 2 3 1
20 NA NA NA 1 3 1
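Since the three cuts differ only in the condition, the masking step above can be wrapped in a small helper. This is only a sketch, not part of the original answer: the name mask_agree and its keep argument are made up for illustration, the seed is arbitrary, and it assumes every X_agree column has a matching X_change column.

```r
# Hypothetical helper: keep `_agree` values only where the matching
# `_change` value satisfies `keep`; everything else becomes NA.
mask_agree <- function(data, keep) {
  agree  <- grep("_agree$", names(data), value = TRUE)
  change <- sub("_agree$", "_change", agree)  # assumes matching names exist
  drop   <- !keep(data[change])               # logical matrix, one column per pair
  out <- data
  out[agree][drop] <- NA
  out
}

set.seed(42)  # arbitrary seed, only for reproducibility
d <- data.frame(A_agree  = sample(1:7, 20, replace = TRUE),
                B_agree  = sample(1:7, 20, replace = TRUE),
                C_agree  = sample(1:7, 20, replace = TRUE),
                A_change = sample(1:5, 20, replace = TRUE),
                B_change = sample(1:5, 20, replace = TRUE),
                C_change = sample(1:5, 20, replace = TRUE))

d.positive <- mask_agree(d, function(x) x > 3)
d.neutral  <- mask_agree(d, function(x) x == 3)
d.negative <- mask_agree(d, function(x) x < 3)
```

Passing the condition as a function keeps the three calls identical apart from the comparison, which is the only thing that actually changes between the cuts.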
I would suggest using the tidyr package in combination with dplyr. It provides the newer functions pivot_longer and pivot_wider, which replace the older gather and spread.
Using a combination of both, the solution could be as follows:
d.neutral1 =
d %>%
mutate(row = row_number() ) %>%
pivot_longer(-row, names_sep = "_", names_to = c("name","type") ) %>%
pivot_wider(names_from = type, values_from = value) %>%
mutate(result = if_else(change == 3, agree, NA_integer_))
And if you want a shape similar to the original:
d.neutral1 %>%
select(-agree, -change) %>%
pivot_wider(names_from = name, values_from = result)
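The column-pairing idea behind the map2 attempt in the question can also work, because ifelse (like if_else) is vectorised: given two whole columns, it already decides element by element. Below is a minimal base R sketch using Map, which walks over the two sets of columns in parallel; the seed is arbitrary and only there for reproducibility.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
d <- data.frame(A_agree  = sample(1:7, 20, replace = TRUE),
                B_agree  = sample(1:7, 20, replace = TRUE),
                C_agree  = sample(1:7, 20, replace = TRUE),
                A_change = sample(1:5, 20, replace = TRUE),
                B_change = sample(1:5, 20, replace = TRUE),
                C_change = sample(1:5, 20, replace = TRUE))

agree  <- paste0(c("A", "B", "C"), "_agree")
change <- paste0(c("A", "B", "C"), "_change")

d.positive <- d
# Each call receives one whole `_agree` column and its `_change` partner
d.positive[agree] <- Map(function(a, ch) ifelse(ch > 3, a, NA),
                         d[agree], d[change])
```

This relies on agree and change listing the columns in the same order, so that each pair lines up.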
I have a dataframe with two columns. I need to check whether the other column is non-NA wherever one column is NA. Thanks
Edited.
I would like to know, for each row of the dataframe, if there are rows with both columns not NA.
You can use the following code to find the rows that have no NA values:
df <- data.frame(x = c(1, NA),
y = c(2, NA))
which(rowSums(is.na(df)) == 0)
Output:
[1] 1
As you can see, the first row has no NA values, so both columns are non-NA in that row.
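The question can be read two ways (both columns present, or exactly one of them missing), and both checks are one-liners in base R. A small sketch with made-up data; the data frame name checks and the result names are only for illustration.

```r
# Made-up example data covering all four NA combinations
checks <- data.frame(x = c(1, NA, 3, NA),
                     y = c(2, NA, NA, 4))

# TRUE where neither column is NA in that row
both_present <- complete.cases(checks)

# TRUE where exactly one of the two columns is NA
exactly_one_na <- xor(is.na(checks$x), is.na(checks$y))

both_present    # TRUE FALSE FALSE FALSE
exactly_one_na  # FALSE FALSE TRUE TRUE
```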
Here's some simple code to generate a column with the NA count for each row:
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)
df$NA_Count <- apply(df, 1, function(x) sum(is.na(x)))
df
x y NA_Count
1 NA 1 1
2 NA NA 2
3 1 NA 1
4 1 NA 1
5 NA NA 2
6 1 NA 1
7 1 1 0
8 1 1 0
9 1 1 0
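The same count can be computed without apply, since is.na on a data frame returns a logical matrix and rowSums adds up the TRUE values per row. A short sketch with made-up data (the name na_demo is only for illustration):

```r
# Made-up data: rows with 1, 2, 1 and 0 missing values
na_demo <- data.frame(x = c(NA, NA, 1, 1),
                      y = c(1, NA, NA, 1))

# Vectorised NA count per row; computed before the new column is added
na_demo$NA_Count <- rowSums(is.na(na_demo))
na_demo
```

On larger data frames this usually avoids the per-row function call overhead of apply.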
I have the following dataframe:
df <- data.frame(var1_lag0 = c(1,2,3,4,5,6)
, var1_lag1 = c(0,1,2,3,4,5)
, var2_lag0 = c(34,5,45,7,2,1)
, var2_lag2 = c(0,0,34,5,45,7)
)
I want to change specific values in each column using the following logic:
If the variable name contains "_lag1", the first element of the column has to become NA
If the variable name contains "_lag2", the first and second elements of the column have to become NA
Otherwise the column remains as it is
The expected result should look like:
df_new <- data.frame(var1_lag0 = c(1,2,3,4,5,6)
, var1_lag1 = c(NA,1,2,3,4,5)
, var2_lag0 = c(34,5,45,7,2,1)
, var2_lag2 = c(NA,NA,34,5,45,7)
)
As you have the original unlagged variables in your df you could simply recompute the lagged values using e.g. dplyr::lag which by default will give you NAs:
df <- data.frame(var1_lag0 = c(1,2,3,4,5,6)
, var1_lag1 = c(0,1,2,3,4,5)
, var2_lag0 = c(34,5,45,7,2,1)
, var2_lag2 = c(0,0,34,5,45,7)
)
library(dplyr)
df %>% mutate(var1_lag1 = dplyr::lag(var1_lag0, n = 1), var2_lag2 = dplyr::lag(var2_lag0, n = 2))
#> var1_lag0 var1_lag1 var2_lag0 var2_lag2
#> 1 1 NA 34 NA
#> 2 2 1 5 NA
#> 3 3 2 45 34
#> 4 4 3 7 5
#> 5 5 4 2 45
#> 6 6 5 1 7
A base R solution might look like this:
df <- data.frame(var1_lag0 = c(1,2,3,4,5,6)
, var1_lag1 = c(0,1,2,3,4,5)
, var2_lag0 = c(34,5,45,7,2,1)
, var2_lag2 = c(0,0,34,5,45,7)
)
df_new <- df
df_new[1 , grep(pattern="_lag1", colnames(df))] <- NA
df_new[c(1,2) , grep(pattern="_lag2", colnames(df))] <- NA
df_new
#> var1_lag0 var1_lag1 var2_lag0 var2_lag2
#> 1 1 NA 34 NA
#> 2 2 1 5 NA
#> 3 3 2 45 34
#> 4 4 3 7 5
#> 5 5 4 2 45
#> 6 6 5 1 7
Created on 2021-01-06 by the reprex package (v0.3.0)
Here is a for loop that checks the column names of the df for the key words "_lag1" and "_lag2" and turns the corresponding values to NA.
for (i in 1:length(df)){
if (grepl("_lag1",colnames(df)[i])){
df[1,i] = NA
}
else if (grepl("_lag2",colnames(df)[i])){
df[1:2,i] = NA
}
}
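If more lag suffixes are expected, the number of leading NAs can be read from the column name instead of being hard-coded per suffix. This is a sketch under the assumption that every column name ends in "_lag" followed by a number; the variable n_lag is introduced here for illustration.

```r
df <- data.frame(var1_lag0 = c(1, 2, 3, 4, 5, 6),
                 var1_lag1 = c(0, 1, 2, 3, 4, 5),
                 var2_lag0 = c(34, 5, 45, 7, 2, 1),
                 var2_lag2 = c(0, 0, 34, 5, 45, 7))

# Number after "_lag" in each column name (assumes every name has one)
n_lag <- as.integer(sub(".*_lag", "", names(df)))

# Blank out the first k elements of each column
df[] <- Map(function(col, k) {
  col[seq_len(k)] <- NA  # seq_len(0) is empty, so lag0 columns are untouched
  col
}, df, n_lag)
df
```

Using seq_len(k) rather than 1:k matters for the lag0 columns: 1:0 evaluates to c(1, 0) and would blank the first element.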
You can try to wrap a case_when inside a helper function and use mutate_at with contains to get the proper columns.
df %>%
mutate_at(vars(contains("lag1")),
function(x, lag) fix(x, "lag1")) %>%
mutate_at(vars(contains("lag2")),
function(x, lag) fix(x, "lag2"))
Which produces
var1_lag0 var1_lag1 var2_lag0 var2_lag2
1 1 NA 34 NA
2 2 1 5 NA
3 3 2 45 34
4 4 3 7 5
5 5 4 2 45
6 6 5 1 7
Here is the helper function, called fix (define it before running the pipeline above; note that this name masks utils::fix):
fix <- function(x, lag){
real_lag <- case_when(stringr::str_detect("lag1", lag) ~ 1,
stringr::str_detect("lag2", lag) ~ 2)
x[1:real_lag] <- NA
return(x)
}
I want to merge the elements of several named atomic vectors stored in a list, matching them by element name. See the example:
ls = list(a = c(a = 1, b = 2, d = 2), b = c(b = 2, c = 3), c = c(a = 1, b = 2))
Now, I wanted to get output like this:
a b c
a 1 NA 1
b 2 2 2
c NA 3 NA
d 2 NA NA
I tried Reduce, but it is not working. I do not want to use any external package for this problem.
Thanks
You can use [ in sapply after you have extracted all element names.
i <- sort(unique(unlist(lapply(ls, names))))
x <- sapply(ls, "[", i)
rownames(x) <- i
x
# a b c
#a 1 NA 1
#b 2 2 2
#c NA 3 NA
#d 2 NA NA
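This works because indexing a named vector by a name it does not contain returns NA, so every vector is stretched to the full set of names. A small sketch of that behaviour, plus a vapply variant that fixes the result type; the names vecs, keys and merged are chosen here for illustration (and avoid masking base::ls).

```r
# Indexing by a missing name yields NA
v <- c(b = 2, c = 3)
v[c("a", "b", "c", "d")]  # values: NA 2 3 NA

vecs <- list(a = c(a = 1, b = 2, d = 2),
             b = c(b = 2, c = 3),
             c = c(a = 1, b = 2))
keys <- sort(unique(unlist(lapply(vecs, names))))

# vapply guarantees a numeric matrix with one row per key
merged <- vapply(vecs, function(x) unname(x[keys]), numeric(length(keys)))
rownames(merged) <- keys
merged
```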
We could also use bind_rows here
library(dplyr)
library(tibble)
bind_rows(ls, .id = 'x') %>%
column_to_rownames('x') %>%
t
a b c
a 1 NA 1
b 2 2 2
d 2 NA NA
c NA 3 NA
Or using base R (note that xtabs fills missing combinations with 0 rather than NA):
xtabs(values ~ ind + x, do.call(rbind, Map(cbind, x = names(ls), lapply(ls, stack))))
x
ind a b c
a 1 0 1
b 2 2 2
d 2 0 0
c 0 3 0
A data.table option using rbindlist
library(data.table)
t(rbindlist(Map(function(x) data.table(t(x)), ls), fill = TRUE))
[,1] [,2] [,3]
a 1 NA 1
b 2 2 2
d 2 NA NA
c NA 3 NA
I have to find all columns that consist entirely of NA values. If a column is not entirely NA, I have to replace its NAs with 0.
My solution is:
NA_check <- colSums(is.na(frame)) == nrow(frame) #True or False - all NA or not
frame[is.na(frame) & which(names(frame) %in% names(NA_check)[which(NA_check == FALSE, arr.ind=T)])] <- 0
These conditions work separately, but they don't work together or I get some errors combining them. How can I solve my problem?
P.S. This modification also doesn't work if NA_check is not all FALSE:
frame[is.na(frame[which(names(frame) %in% names(NA_check)[which(NA_check == FALSE, arr.ind=T)])])] <- 0
You can find the columns that have at least one non-NA value (i.e. not all values are NA) and replace the NAs in that subset with 0.
not_all_NA <- colSums(!is.na(frame)) > 0
frame[not_all_NA][is.na(frame[not_all_NA])] <- 0
We can check this with an example :
frame <- data.frame(a = c(NA, NA, 3, 4), b = NA, c = c(NA, 1:3), d = NA)
frame
# a b c d
#1 NA NA NA NA
#2 NA NA 1 NA
#3 3 NA 2 NA
#4 4 NA 3 NA
not_all_NA <- colSums(!is.na(frame)) > 0
frame[not_all_NA][is.na(frame[not_all_NA])] <- 0
frame
# a b c d
#1 0 NA 0 NA
#2 0 NA 1 NA
#3 3 NA 2 NA
#4 4 NA 3 NA
We can also do this with dplyr :
library(dplyr)
frame %>% mutate(across(where(~ any(!is.na(.x))), ~ tidyr::replace_na(.x, 0)))
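The same rule can also be written column by column in base R, which makes the "skip columns that are entirely NA" condition explicit. A minimal sketch; the function name fill_partial_na is made up for illustration.

```r
# Hypothetical helper: replace NA with 0, but only in columns
# that contain at least one non-NA value.
fill_partial_na <- function(data) {
  data[] <- lapply(data, function(col) {
    if (!all(is.na(col))) col[is.na(col)] <- 0
    col
  })
  data
}

frame <- data.frame(a = c(NA, NA, 3, 4), b = NA, c = c(NA, 1:3), d = NA)
filled <- fill_partial_na(frame)
filled
```

Assigning into data[] keeps the data frame structure and column names while replacing the columns one by one.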
I'd like to remove the NA values from my columns and merge all columns into four columns, while keeping NAs where a row has fewer than four values.
Say I have data like this,
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
#> a b c d e f g
#> 1 1 3 NA 4 NA 1 NA
#> 2 4 NA 2 2 5 NA NA
#> 3 NA 3 NA NA 3 NA NA
#> 4 3 NA NA NA NA 4 4
My desired outcome would be,
df.desired <- data.frame('a' = c(1,4,3,3),
'b' = c(3,2,3,4),
'c' = c(4,2,NA,4),
'd' = c(1,5,NA,NA))
df.desired
#> a b c d
#> 1 1 3 4 1
#> 2 4 2 2 5
#> 3 3 3 NA NA
#> 4 3 4 4 NA
You could probably have explored SO a bit more and adapted two existing answers:
1. Shifting all the numbers with NAs
2. Removing the columns where you've got all NAs
Result:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
df.new<-do.call(rbind,lapply(1:nrow(df),function(x) t(matrix(df[x,order(is.na(df[x,]))])) ))
colnames(df.new)<-colnames(df)
df.new
df.new[,colSums(is.na(df.new))<nrow(df.new)]
Output:
> df.new[,colSums(is.na(df.new))<nrow(df.new)]
a b c d
[1,] 1 3 4 1
[2,] 4 2 2 5
[3,] 3 3 NA NA
[4,] 3 4 4 NA
I believe there are more efficient ways, but here is my attempt:
x00=sapply(1:nrow(df),function(x) df[x,][!is.na( df[x,])])
x01=lapply(x00,function(x) x=c(x,rep(NA,7-length(x)-1)))
x02=as.data.frame(do.call("rbind",x01))
x02 <- x02[,colSums(is.na(x02))<nrow(x02)]
I have the following solution:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
df
x <-list()
for(i in 1:nrow(df)){
x[[i]] <- df[i,]
x[[i]] <- x[[i]][!is.na(x[[i]])]
# x[[i]] <- as.data.frame(x[[i]], stringsAsFactors = FALSE)
x[[i]] <- c(x[[i]], rep(NA, 4 - length(x[[i]]))) # pad with NA up to the four columns asked for
}
result <- do.call(rbind, x)
result
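The row-wise shifting used in the answers above can also be packaged so that the number of output columns is derived from the data rather than hard-coded. This is only a sketch, assuming all columns are numeric and there is more than one column; the function name shift_left is made up for illustration.

```r
# Hypothetical helper: move the non-NA values of each row to the left,
# then drop the columns that end up NA in every row.
shift_left <- function(data) {
  # order(is.na(r)) puts non-NA first and keeps the original order within ties
  shifted <- t(apply(data, 1, function(r) unname(r[order(is.na(r))])))
  keep <- colSums(!is.na(shifted)) > 0
  out <- as.data.frame(shifted[, keep, drop = FALSE])
  names(out) <- names(data)[seq_len(ncol(out))]
  out
}

df <- data.frame(a = c(1, 4, NA, 3),
                 b = c(3, NA, 3, NA),
                 c = c(NA, 2, NA, NA),
                 d = c(4, 2, NA, NA),
                 e = c(NA, 5, 3, NA),
                 f = c(1, NA, NA, 4),
                 g = c(NA, NA, NA, 4))
shift_left(df)
```

Because every row is padded to the full width first and the all-NA columns are dropped afterwards, the result has as many columns as the fullest row (four for this data).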