I need a neat way to merge many columns into fewer columns.
My df looks something like this (but with many more similar columns):
df <- data.frame(
A1 = c(1,1,1,NA,NA,NA),
A2 = c(NA,NA,NA,1,1,1),
B1 = c("text","text","text",NA,NA,NA),
B2 = c(NA,NA,NA,"text","text","text")
)
# which looks like this
A1 A2 B1 B2
1 NA "text" NA
1 NA "text" NA
1 NA "text" NA
NA 1 NA "text"
NA 1 NA "text"
NA 1 NA "text"
I would like to merge all the A columns into one A column and all the B columns into one B column, like this:
A B
1 "text"
1 "text"
1 "text"
1 "text"
1 "text"
1 "text"
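I can do this for one set of columns with this code (using magrittr's `%<>%`, dplyr's `mutate()` and tidyr's `unite()`):
library(magrittr); library(dplyr); library(tidyr)
df %<>% mutate(A1 = ifelse(is.na(A1), A2, A1))
# or possibly
df %<>% unite(A, A1, A2, sep = "", na.rm = TRUE) %>% mutate(A = as.numeric(A))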
However, I have a lot of columns that need to be merged like this, which results in a huge mutate command. Is there a cleaner or shorter way to do this?
Note: the columns in the example are named A1 and A2 for clarity; in my original df, the names are not that easy to pair up.
You can try this base R code:
unstack(
transform(
subset(u <- stack(df), complete.cases(u)),
ind = gsub("\\d+$", "", ind)
)
)
which gives
A B
1 1 text
2 1 text
3 1 text
4 1 text
5 1 text
6 1 text
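One caveat with `stack()`: it puts every value into a single `values` column, so when numeric and character columns are mixed, everything is coerced to character and `A` comes back as character rather than numeric. If column types matter, here is a minimal base R sketch that groups columns by the same rule (name with trailing digits stripped) and keeps the first non-NA value in each group. `coalesce_by_prefix` is a hypothetical helper, and it assumes the grouping can be read off the column names, which the question says is not true for the real data; there, an explicit list of groups would be needed instead.

```r
# Hypothetical helper: coalesce columns that share a name prefix
# (the column name with trailing digits removed), keeping each column's type.
coalesce_by_prefix <- function(df) {
  prefixes <- sub("\\d+$", "", names(df))
  groups <- split(names(df), prefixes)  # list(A = c("A1", "A2"), B = c("B1", "B2"))
  out <- lapply(groups, function(cols) {
    # Fold left to right: keep a's value unless it is NA, otherwise take b's
    Reduce(function(a, b) ifelse(is.na(a), b, a), df[cols])
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

df <- data.frame(
  A1 = c(1, 1, 1, NA, NA, NA),
  A2 = c(NA, NA, NA, 1, 1, 1),
  B1 = c("text", "text", "text", NA, NA, NA),
  B2 = c(NA, NA, NA, "text", "text", "text")
)
res <- coalesce_by_prefix(df)
res
```

Column order inside a group sets the priority (the leftmost non-NA value wins). Note that `ifelse()` drops attributes such as the Date class, so date columns would need extra handling.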
Here's one approach: use a named list of column pairs and `fcoalesce()` from data.table. The names of the list become the column names in the final data.frame.
pairs = list(A = 2:3, B = c(4, 6), C = c(5, 1))
data.frame(lapply(pairs, function(x) data.table::fcoalesce(df[x])))
# A B C
# 1 1 text 1.2
# 2 1 text 1.2
# 3 1 text 1.2
# 4 1 text 1.2
# 5 1 text 1.2
# 6 1 text 1.2
Sample data used for this example:
df <- data.frame(
C2 = c(1.2, NA, 1.2, 1.2, NA, NA), A1 = c(1,1,1,NA,NA,NA),
A2 = c(NA,NA,NA,1,1,1), B1 = c("text","text","text",NA,NA,NA),
C1 = c(NA, 1.2, NA, NA, 1.2, 1.2), B2 = c(NA,NA,NA,"text","text","text")
)
df
# C2 A1 A2 B1 C1 B2
# 1 1.2 1 NA text NA <NA>
# 2 NA 1 NA text 1.2 <NA>
# 3 1.2 1 NA text NA <NA>
# 4 1.2 NA 1 <NA> NA text
# 5 NA NA 1 <NA> 1.2 text
# 6 NA NA 1 <NA> 1.2 text
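If you would rather not depend on data.table, the same named-list-of-pairs idea can be sketched in base R by folding `ifelse()` over each group of columns with `Reduce()`. This swaps in a different technique from `fcoalesce()`; `coalesce_cols` is a hypothetical helper, and the sample data is repeated so the block runs on its own:

```r
# Sample data from above
df <- data.frame(
  C2 = c(1.2, NA, 1.2, 1.2, NA, NA), A1 = c(1, 1, 1, NA, NA, NA),
  A2 = c(NA, NA, NA, 1, 1, 1), B1 = c("text", "text", "text", NA, NA, NA),
  C1 = c(NA, 1.2, NA, NA, 1.2, 1.2), B2 = c(NA, NA, NA, "text", "text", "text")
)

# Hypothetical helper: first non-NA value across the given columns, left to right
coalesce_cols <- function(cols) Reduce(function(a, b) ifelse(is.na(a), b, a), cols)

# Names become output columns; values are column positions to coalesce
pairs <- list(A = 2:3, B = c(4, 6), C = c(5, 1))
res <- data.frame(lapply(pairs, function(x) coalesce_cols(df[x])))
res
```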
I have an array with a few dimensions. I want to replace values based on the values at the first index of the first dimension. In the example below, I want to set to NA every value in the other rows whose corresponding a1 entry equals 2. If I change only one index:
set.seed(2)
arr <- array(data=sample(1:2, 18, replace = TRUE), dim=c(3,3,2), dimnames=list(paste0("a",1:3),paste0("b",1:3),paste0("c",1:2)))
# replace second index according to first index of dimension 1
arr[2,,][arr[1,,]==2] <- NA
The result is as expected:
> arr
, , c1
b1 b2 b3
a1 1 2 1
a2 1 NA 1
a3 2 2 1
, , c2
b1 b2 b3
a1 2 2 1
a2 NA NA 2
a3 1 1 2
But if I try to change all other indexes like this:
set.seed(2)
arr <- array(data=sample(1:2, 18, replace = TRUE), dim=c(3,3,2), dimnames=list(paste0("a",1:3),paste0("b",1:3),paste0("c",1:2)))
# replace 2nd & 3rd index according to first index of dimension 1
arr[2:3,,][arr[1,,]==2] <- NA
It doesn't work as I expect; array indexing is hard to follow. How do I do this correctly (naturally, without changing each index separately)? Thanks.
I expect the result to be:
> arr
, , c1
b1 b2 b3
a1 1 2 1
a2 1 NA 1
a3 2 NA 1
, , c2
b1 b2 b3
a1 2 2 1
a2 NA NA 2
a3 NA NA 2
You can use rep to get the right indices for subsetting.
arr[2:3,,][rep(arr[1,,]==2, each=2)] <- NA
arr
#, , c1
#
# b1 b2 b3
#a1 1 2 1
#a2 1 NA 1
#a3 2 NA 1
#
#, , c2
#
# b1 b2 b3
#a1 2 2 1
#a2 NA NA 2
#a3 NA NA 2
Or, more generally:
i <- 2:dim(arr)[1]
arr[i,,][rep(arr[1,,]==2, each=length(i))] <- NA
Or (thanks to @jblood94 for this variant):
arr[-1,,][rep(arr[1,,]==2, each = nrow(arr) - 1)] <- NA
Or using a loop:
for(i in 2:nrow(arr)) arr[i,,][arr[1,,]==2] <- NA
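The `rep(..., each = ...)` trick works because R stores arrays in column-major order, so the first dimension varies fastest. A minimal sketch of the same idea that builds one full-size logical mask and then assigns through it; `mask`, `res` and `ref` are illustrative names, and the loop from the answer is used only as a cross-check:

```r
set.seed(2)
arr <- array(data = sample(1:2, 18, replace = TRUE), dim = c(3, 3, 2),
             dimnames = list(paste0("a", 1:3), paste0("b", 1:3), paste0("c", 1:2)))

# Repeat each a1 test result down dimension 1 (the fastest-varying one)
mask <- array(rep(arr[1, , ] == 2, each = dim(arr)[1]), dim = dim(arr))
mask[1, , ] <- FALSE  # never overwrite the reference row a1 itself

res <- arr
res[mask] <- NA

# Cross-check against the loop version
ref <- arr
for (i in 2:nrow(ref)) ref[i, , ][ref[1, , ] == 2] <- NA
res
```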
It would be
arr[2:3,,][rep(arr[1,,]==2, each = 2)] <- NA
Or, more generally, to replace all rows based on the first row:
arr[-1,,][rep(arr[1,,]==2, each = nrow(arr) - 1)] <- NA
I'm trying to convert the values in a data frame to rank order values by row. Take this:
df = data.frame(A = c(10, 20, NA), B = c(NA, 10, 20), C = c(20, NA, 10))
When I do this:
t(apply(df, 1, rank))
I get this:
[1,] 1 3 2
[2,] 2 1 3
[3,] 3 2 1
But I want the NA values to continue showing as NA, like so:
[1,] 1 NA 2
[2,] 2 1 NA
[3,] NA 2 1
Try setting the `na.last` argument to "keep":
t(apply(df, 1, rank, na.last='keep'))
Output:
A B C
[1,] 1 NA 2
[2,] 2 1 NA
[3,] NA 2 1
As mentioned in the documentation of rank:
na.last:
for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.
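A quick sketch of what each `na.last` setting does on a single hypothetical vector with one NA (base R only):

```r
x <- c(10, NA, 20)

rank(x)                    # 1 3 2   (default na.last = TRUE: NA ranked last)
rank(x, na.last = FALSE)   # 2 1 3   (NA ranked first)
rank(x, na.last = NA)      # 1 2     (NA removed, so the result is shorter)
rank(x, na.last = "keep")  # 1 NA 2  (NA kept in place with rank NA)
```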
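Here is a dplyr approach. Note that `across()` applies `rank()` to each column, so it ranks within columns rather than within rows; it reproduces the desired output here only because of how the sample data happens to be laid out.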
Libraries
library(dplyr)
Data
df <- tibble(A = c(10, 20, NA), B = c(NA, 10, 20), C = c(20, NA, 10))
Code
df %>%
mutate(across(everything(), ~ rank(.x, na.last = "keep")))
Output
# A tibble: 3 x 3
A B C
<dbl> <dbl> <dbl>
1 1 NA 2
2 2 1 NA
3 NA 2 1
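Ranking within columns and ranking within rows give the same result on the sample data, but not in general. A small base R sketch with hypothetical data makes the difference visible: `lapply()` over the columns computes what `mutate(across(...))` computes, while `apply(..., 1, ...)` ranks within each row. Wrapping the `apply()` result in `as.data.frame()` turns the matrix back into a data frame.

```r
df2 <- data.frame(A = c(3, 1), B = c(2, 5), C = c(1, NA))  # hypothetical data

# Rank within each column (what mutate(across(...)) computes)
col_ranks <- as.data.frame(lapply(df2, rank, na.last = "keep"))

# Rank within each row (what the question asks for)
row_ranks <- as.data.frame(t(apply(df2, 1, rank, na.last = "keep")))

col_ranks
row_ranks
```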
I have a data frame with two columns. I need to check whether, where one column is NA, the other is not. Thanks
Edit: I would like to know, for each row of the data frame, whether both columns are non-NA.
You can use the following code to check which rows have no NA values:
df <- data.frame(x = c(1, NA),
y = c(2, NA))
which(rowSums(is.na(df)) == 0)
Output:
[1] 1
As you can see, the first row has no NA values, so both of its columns are non-NA.
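Since the original question also asks whether one column is NA where the other is not, a per-row classification may be closer to what is needed. A minimal base R sketch on hypothetical sample data covering all cases; `n_na`, `one_missing` and the `status` labels are illustrative names:

```r
# Hypothetical sample data: both present, one NA (each way), both NA
df <- data.frame(x = c(1, NA, 3, NA),
                 y = c(2, 5, NA, NA))

n_na <- rowSums(is.na(df))  # number of NA values in each row

# Rows where both columns are non-NA
both_present <- which(n_na == 0)

# Rows where exactly one column is NA (one is NA where the other is not)
one_missing <- xor(is.na(df$x), is.na(df$y))

df$status <- ifelse(n_na == 0, "both present",
                    ifelse(n_na == 1, "one NA", "both NA"))
df
```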
Here's some simple code to add a column with the NA count for each row (the data are random, so your output will differ; only the first rows are shown below):
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)
df$NA_Count <- apply(df, 1, function(x) sum(is.na(x)))
df
x y NA_Count
1 NA 1 1
2 NA NA 2
3 1 NA 1
4 1 NA 1
5 NA NA 2
6 1 NA 1
7 1 1 0
8 1 1 0
9 1 1 0
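`rowSums()` on the logical matrix returned by `is.na()` gives the same per-row count without calling a function on every row. A minimal sketch; the `set.seed()` call is an addition, only there so the random sample is reproducible:

```r
set.seed(42)  # only for reproducibility of the random sample
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)

# is.na(df) is a logical matrix; summing each row counts its NA values
df$NA_Count <- rowSums(is.na(df))
head(df)
```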
I'm trying to add the results of a for loop to a data frame as new rows, but I get an error when a new result has more columns than the original data frame. How can I add a result with extra columns, adding the extra column names to the original data frame?
For example, the original data frame:
   A B C
x1 1 1 1
x2 2 2 2
x3 3 3 3
I want to get:
   A B C  D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
x4 4 4 4  4
I tried rbind (Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match),
rbind.fill (Error: All inputs to rbind.fill must be data.frames)
and bind_rows (Argument 2 must have names).
In base R, this can be done by first creating a new column 'D' filled with NA and then assigning a new row of 4s:
df1$D <- NA
df1['x4', ] <- 4
Output:
> df1
A B C D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
x4 4 4 4 4
Or in a single line:
rbind(cbind(df1, D = NA), x4 = 4)
A B C D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
x4 4 4 4 4
The error in bind_rows happens when the for loop output is not a named vector:
library(dplyr)
> vec1 <- c(4, 4, 4, 4)
> bind_rows(df1, vec1)
Error: Argument 2 must have names.
Run `rlang::last_error()` to see where the error occurred.
If it is a named vector, then it should work
> vec1 <- c(A = 4, B = 4, C = 4, D = 4)
> bind_rows(df1, vec1)
A B C D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
...4 4 4 4 4
Data:
df1 <- structure(list(A = 1:3, B = 1:3, C = 1:3),
class = "data.frame", row.names = c("x1",
"x2", "x3"))
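Inside a loop, it is usually simpler to collect each result in a list and bind once at the end. A minimal base R sketch, assuming each loop result is a one-row data frame with named columns; `rbind_fill_base` is a hypothetical helper (similar in spirit to `plyr::rbind.fill()`, not a function from base R), and the loop body is made up for illustration:

```r
# Hypothetical helper: row-bind data frames whose columns differ,
# filling columns missing from a given piece with NA
rbind_fill_base <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  dfs <- lapply(dfs, function(d) {
    missing_cols <- setdiff(all_cols, names(d))
    if (length(missing_cols)) d[missing_cols] <- NA
    d[all_cols]  # same column order in every piece
  })
  do.call(rbind, dfs)
}

df1 <- data.frame(A = 1:3, B = 1:3, C = 1:3, row.names = c("x1", "x2", "x3"))

results <- list(df1)
for (i in 4:5) {
  # Hypothetical loop body: row 4 gains a column D, row 5 does not
  new_row <- data.frame(A = i, B = i, C = i, row.names = paste0("x", i))
  if (i == 4) new_row$D <- i
  results[[length(results) + 1]] <- new_row
}
out <- rbind_fill_base(results)
out
```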
If you collect the results of your for loop in a list, you probably have something like this:
(l <- list(x1, x2, x3, x4, x5))
# [[1]]
# [1] 1 1 1
#
# [[2]]
# [1] 2 2 2 2
#
# [[3]]
# [1] 3 3
#
# [[4]]
# [1] 4
#
# [[5]]
# NULL
Multiple elements can be bound row-wise with a do.call(rbind, .) approach; your problem is how to rbind elements that differ in length.
The `length<-` function adjusts the length of a vector, padding it with NA when it is lengthened. To find the target length, the lengths function returns the length of each list element, and you want the maximum.
I include the special case of an element that is NULL (the 5th element of l): since the length of NULL cannot be changed, those elements are first replaced with NA.
So altogether you may do:
do.call(rbind, lapply(replace(l, lengths(l) == 0L, NA), `length<-`, max(lengths(l))))
# [,1] [,2] [,3] [,4]
# [1,] 1 1 1 NA
# [2,] 2 2 2 2
# [3,] 3 3 NA NA
# [4,] 4 NA NA NA
# [5,] NA NA NA NA
Or, since you probably want a data frame with pretty row and column names:
ml <- max(lengths(l))
do.call(rbind, lapply(replace(l, lengths(l) == 0L, NA), `length<-`, ml)) |>
as.data.frame() |> `dimnames<-`(list(paste0('x', 1:length(l)), LETTERS[1:ml]))
# A B C D
# x1 1 1 1 NA
# x2 2 2 2 2
# x3 3 3 NA NA
# x4 4 NA NA NA
# x5 NA NA NA NA
Note: the native pipe |> requires R >= 4.1.
Data:
x1 <- rep(1, 3); x2 <- rep(2, 4); x3 <- rep(3, 2); x4 <- rep(4, 1); x5 <- NULL
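The two building blocks of this answer can be checked in isolation: `lengths()` reports each element's length, and `length<-` pads a vector with NA when it is lengthened. A minimal sketch using the same data as above:

```r
l <- list(rep(1, 3), rep(2, 4), rep(3, 2), rep(4, 1), NULL)

lengths(l)       # 3 4 2 1 0 (the NULL element has length 0)

v <- c(3, 3)
length(v) <- 4   # lengthening a vector pads it with NA
v                # 3 3 NA NA

# Swap NULL elements for NA, then pad every element to the maximum length
padded <- lapply(replace(l, lengths(l) == 0L, NA), `length<-`, max(lengths(l)))
lengths(padded)  # 4 4 4 4 4
```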
I'd like to remove the NA values from my columns and merge all the columns into four, keeping NAs where a row has fewer than four values.
Say I have data like this:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
#> a b c d e f g
#> 1 1 3 NA 4 NA 1 NA
#> 2 4 NA 2 2 5 NA NA
#> 3 NA 3 NA NA 3 NA NA
#> 4 3 NA NA NA NA 4 4
My desired outcome would be,
df.desired <- data.frame('a' = c(1,4,3,3),
'b' = c(3,2,3,4),
'c' = c(4,2,NA,4),
'd' = c(1,5,NA,NA))
df.desired
#> a b c d
#> 1 1 3 4 1
#> 2 4 2 2 5
#> 3 3 3 NA NA
#> 4 3 4 4 NA
You could probably have searched SO a bit more and adapted two existing answers:
1. Shifting all the numbers with NAs
2. Removing the columns where all values are NA
Result:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
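# For each row, reorder the values so that the non-NA ones come first
df.new <- do.call(rbind, lapply(1:nrow(df), function(x) t(matrix(df[x, order(is.na(df[x, ]))]))))
colnames(df.new) <- colnames(df)
df.new
# Keep only the columns that are not entirely NA
df.new[, colSums(is.na(df.new)) < nrow(df.new)]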
Output:
> df.new[,colSums(is.na(df.new))<nrow(df.new)]
a b c d
[1,] 1 3 4 1
[2,] 4 2 2 5
[3,] 3 3 NA NA
[4,] 3 4 4 NA
There are probably more efficient ways, but here is my attempt:
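x00 <- lapply(1:nrow(df), function(x) df[x, ][!is.na(df[x, ])])      # non-NA values of each row
x01 <- lapply(x00, function(x) c(x, rep(NA, ncol(df) - length(x))))  # pad every row with NA to ncol(df)
x02 <- as.data.frame(do.call("rbind", x01))
x02 <- x02[, colSums(is.na(x02)) < nrow(x02)]                        # drop the columns that are entirely NA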
I have the following solution:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
df
n <- max(rowSums(!is.na(df)))  # longest row once NAs are dropped (4 here)
x <- list()
for(i in 1:nrow(df)){
x[[i]] <- df[i,]
x[[i]] <- x[[i]][!is.na(x[[i]])]                    # keep only the non-NA values of row i
x[[i]] <- c(x[[i]], rep(NA, n - length(x[[i]])))    # pad with NA up to n values
}
result <- do.call(rbind, x)
result
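For comparison, the same shift-left-and-pad idea can be sketched with a single `apply()` call over the rows instead of growing a list in a loop. This is a minimal base R sketch; `n`, `shifted` and `result` are illustrative names, and naming the output columns a to d is an assumption taken from the desired output:

```r
df <- data.frame('a' = c(1,4,NA,3),
                 'b' = c(3,NA,3,NA),
                 'c' = c(NA,2,NA,NA),
                 'd' = c(4,2,NA,NA),
                 'e' = c(NA,5,3,NA),
                 'f' = c(1,NA,NA,4),
                 'g' = c(NA,NA,NA,4))

n <- max(rowSums(!is.na(df)))  # widest row once NAs are dropped (4 here)

shifted <- t(apply(df, 1, function(r) {
  v <- unname(r[!is.na(r)])  # non-NA values, in their original column order
  length(v) <- n             # pad with NA up to n values
  v
}))

result <- as.data.frame(shifted)
names(result) <- letters[seq_len(n)]  # a, b, c, d as in the desired output
result
```

This assumes the widest row keeps at least two values; with n = 1, `apply()` would simplify to a plain vector and the `t()` step would no longer give one row per input row.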