How to add a new row with an extra column in R?

I am trying to add the results of a for loop to a data frame as new rows, but I get an error when a new result has more columns than the original data frame. How can I add a result with extra columns and add the extra column names to the original data frame?
For example, the original data frame:
   A B C
x1 1 1 1
x2 2 2 2
x3 3 3 3
I want to get:
   A B C  D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
x4 4 4 4  4
I tried:
rbind: Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
rbind.fill: Error: All inputs to rbind.fill must be data.frames
bind_rows: Argument 2 must have names

In base R, this can be done by first creating a new column 'D' filled with NA and then assigning a new row of 4s:
df1$D <- NA
df1['x4', ] <- 4
Output:
> df1
A B C D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
x4 4 4 4 4
Or in a single line:
rbind(cbind(df1, D = NA), x4 = 4)
A B C D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
x4 4 4 4 4
Regarding the bind_rows error: it happens when the for loop output is not a named vector.
library(dplyr)
> vec1 <- c(4, 4, 4, 4)
> bind_rows(df1, vec1)
Error: Argument 2 must have names.
Run `rlang::last_error()` to see where the error occurred.
If it is a named vector, it works:
> vec1 <- c(A = 4, B = 4, C = 4, D = 4)
> bind_rows(df1, vec1)
A B C D
x1 1 1 1 NA
x2 2 2 2 NA
x3 3 3 3 NA
...4 4 4 4 4
Data:
df1 <- structure(list(A = 1:3, B = 1:3, C = 1:3),
class = "data.frame", row.names = c("x1",
"x2", "x3"))
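If the rows arrive one at a time inside a for loop, the two base R steps above (add the missing column as NA, then add the row) can be wrapped in a small helper. This is a minimal sketch: add_row_fill is a hypothetical name, not an existing function, and it assumes each loop result is a named vector (or list) whose names are column names.

```r
# Hypothetical helper (not part of base R or any package): append one named
# row to df, adding columns that are missing on either side as NA
add_row_fill <- function(df, row, row_name = NULL) {
  row <- as.list(row)
  for (nm in setdiff(names(row), names(df))) df[[nm]] <- NA   # new columns in df
  for (nm in setdiff(names(df), names(row))) row[[nm]] <- NA  # missing fields in row
  new <- as.data.frame(row[names(df)])
  if (!is.null(row_name)) rownames(new) <- row_name
  rbind(df, new)  # rbind.data.frame matches columns by name
}

df1 <- structure(list(A = 1:3, B = 1:3, C = 1:3),
                 class = "data.frame", row.names = c("x1", "x2", "x3"))
# Stand-in for the for loop's output: one named vector per new row
results <- list(x4 = c(A = 4, B = 4, C = 4, D = 4))
for (nm in names(results)) {
  df1 <- add_row_fill(df1, results[[nm]], nm)
}
df1
```

Growing a data frame row by row is fine for a handful of rows; for many iterations it is usually cheaper to collect the results in a list and bind once at the end.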

You probably have something like this if you collect the elements of your for loop in a list:
(l <- list(x1, x2, x3, x4, x5))
# [[1]]
# [1] 1 1 1
#
# [[2]]
# [1] 2 2 2 2
#
# [[3]]
# [1] 3 3
#
# [[4]]
# [1] 4
#
# [[5]]
# NULL
Multiple elements can be row-bound with a do.call(rbind, .) approach; your problem is how to rbind elements that differ in length.
There's a `length<-` function with which you can change the length of a vector (it pads with NA). To know which length to pad to, the lengths function gives the length of each list element, and we need the maximum.
I include the special case where an element is NULL (length 0, like the 5th element of l): since the length of NULL cannot be changed, those elements are first replaced with NA.
So altogether you may do:
do.call(rbind, lapply(replace(l, lengths(l) == 0L, NA), `length<-`, max(lengths(l))))
# [,1] [,2] [,3] [,4]
# [1,] 1 1 1 NA
# [2,] 2 2 2 2
# [3,] 3 3 NA NA
# [4,] 4 NA NA NA
# [5,] NA NA NA NA
Or, since you probably want a data frame with pretty row and column names:
ml <- max(lengths(l))
do.call(rbind, lapply(replace(l, lengths(l) == 0L, NA), `length<-`, ml)) |>
as.data.frame() |> `dimnames<-`(list(paste0('x', 1:length(l)), LETTERS[1:ml]))
# A B C D
# x1 1 1 1 NA
# x2 2 2 2 2
# x3 3 3 NA NA
# x4 4 NA NA NA
# x5 NA NA NA NA
Note: R >= 4.1 is needed for the native pipe |>.
Data:
x1 <- rep(1, 3); x2 <- rep(2, 4); x3 <- rep(3, 2); x4 <- rep(4, 1); x5 <- NULL
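In a real for loop, the results first have to be collected into such a list. A minimal sketch, where make_result() is a hypothetical stand-in for the loop body. One R detail matters here: out[[i]] <- NULL deletes slot i from a list, so the sketch assigns with out[i] <- list(...) to keep NULL results in place.

```r
# Hypothetical loop body: returns vectors of different lengths, and NULL on the last call
make_result <- function(i) if (i < 5) rep(i, c(3, 4, 2, 1)[i]) else NULL

out <- vector("list", 5)  # pre-allocate one slot per iteration
for (i in seq_along(out)) {
  # out[i] <- list(...) keeps a NULL result; out[[i]] <- NULL would drop the slot
  out[i] <- list(make_result(i))
}

ml <- max(lengths(out))
# NULL -> NA, then pad every element to length ml and stack the rows
m <- do.call(rbind, lapply(replace(out, lengths(out) == 0L, NA), `length<-`, ml))
res <- as.data.frame(m)
dimnames(res) <- list(paste0("x", seq_along(out)), LETTERS[seq_len(ml)])
res
```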

Related

Using IFELSE function across multiple columns [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I want to create a new column based on multiple columns of different data types.
Names  1    2    3
A      000  NA   030
B      100  DDD  NA
C      XXX  000  050
Based on columns 1-3, I want to add another column: 1 if any value is >= 30, else 0.
The output will be:
Names  1    2    3    4
A      000  NA   030  1
B      100  DDD  NA   1
C      XXX  000  015  0
Note: there are 36 such columns (1-36) to which I want to apply the condition before creating the new column.
Some more details: these variables are extracted from one long string such as "030060000XXX010", which is split into 030, 060, 000, XXX, 010. Using an IFELSE condition, if any of the (number-looking) values is >= 30, the result should be 1, else 0.
Consider using if_any. Loop over the columns other than 'Names', create a logical condition after converting each column to integer, replace NA with FALSE, and coerce the logical output of if_any to binary with +:
library(dplyr)
library(tidyr)
df1 %>%
mutate(new = +(if_any(-Names, ~ replace_na(as.integer(.) >= 30, FALSE) ) ))
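The same rule can be sketched in base R without dplyr or tidyr. This sketch assumes the data columns are character and named X1-X3 (data.frame() would rename columns called 1-3), and uses the values from the question's expected-output table; Reduce(`|`, ...) plays the role of if_any.

```r
df1 <- data.frame(Names = c("A", "B", "C"),
                  X1 = c("000", "100", "XXX"),
                  X2 = c(NA, "DDD", "000"),
                  X3 = c("030", NA, "015"),
                  stringsAsFactors = FALSE)
# For each data column: TRUE where the value parses as an integer >= 30;
# non-numeric strings ("XXX", "DDD") and NA count as FALSE
hits <- lapply(df1[-1], function(col) {
  v <- suppressWarnings(as.integer(col)) >= 30
  !is.na(v) & v
})
# Row-wise OR across all data columns, stored as 0/1
df1$new <- as.integer(Reduce(`|`, hits))
df1
```

Because it loops over df1[-1], it works unchanged for 36 data columns.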
Since you want to work in groups of three columns, one way is to split.default the columns into groups of three, operate on one three-pack at a time, then recombine them.
I'll demonstrate on the sample data, repeating the three data columns so that we can show the iteration.
dat <- structure(list(Names = c("A", "B", "C"), X1 = c("000", "100", "XXX"), X2 = c(NA, "DDD", "000"), X3 = c(30L, NA, 50L), X1 = c("000", "100", "XXX"), X2 = c(NA, "DDD", "000"), X3 = c(30L, NA, 50L)), class = "data.frame", row.names = c(NA, -3L))
split.default(dat[,-1], (seq_along(dat)[-1]-2) %/% 3)
# $`0`
# X1 X2 X3
# 1 000 <NA> 30
# 2 100 DDD NA
# 3 XXX 000 50
# $`1`
# X1.1 X2.1 X3.1
# 1 000 <NA> 30
# 2 100 DDD NA
# 3 XXX 000 50
With this, we'll work on one three-pack at a time.
func <- function(x, lim = 30) {
  x <- as.matrix(x)
  # convert to numeric; non-numeric strings such as "XXX" and "DDD" become NA
  x <- `dim<-`(suppressWarnings(as.numeric(x)), dim(x))
  # append a 0/1 column: 1 if any value in the row is >= lim
  cbind(x, +(rowSums(x >= lim, na.rm = TRUE) > 0))
}
lapply(split.default(dat[,-1], (seq_along(dat)[-1]-2) %/% 3), func)
# $`0`
# [,1] [,2] [,3] [,4]
# [1,] 0 NA 30 1
# [2,] 100 NA NA 1
# [3,] NA 0 50 1
# $`1`
# [,1] [,2] [,3] [,4]
# [1,] 0 NA 30 1
# [2,] 100 NA NA 1
# [3,] NA 0 50 1
Now we just need to recombine them all again:
do.call(cbind, c(list(dat[,1,drop=FALSE]), lapply(split.default(dat[,-1], (seq_along(dat)[-1]-2) %/% 3), func)))
# Names 0.1 0.2 0.3 0.4 1.1 1.2 1.3 1.4
# 1 A 0 NA 30 1 0 NA 30 1
# 2 B 100 NA NA 1 100 NA NA 1
# 3 C NA 0 50 1 NA 0 50 1
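The question also says the columns come from one long string such as "030060000XXX010". If that string is still available, the 0/1 flag can be computed from it directly. A sketch assuming every field is exactly three characters wide; flag_ge is a hypothetical helper name, and the second input string is made up for illustration.

```r
# Hypothetical helper: split s into fixed-width fields and flag any numeric field >= lim
flag_ge <- function(s, width = 3, lim = 30) {
  starts <- seq(1, nchar(s), by = width)
  fields <- substring(s, starts, starts + width - 1)
  vals <- suppressWarnings(as.integer(fields))  # "XXX" becomes NA
  as.integer(any(vals >= lim, na.rm = TRUE))
}

substring("030060000XXX010", seq(1, 15, by = 3), seq(3, 15, by = 3))
# [1] "030" "060" "000" "XXX" "010"

codes <- c("030060000XXX010", "000XXX015000000")  # second string is made up
vapply(codes, flag_ge, integer(1), USE.NAMES = FALSE)
# [1] 1 0
```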

How to merge elements of atomic vectors in R?

I want to merge atomic vectors stored in a list by their element names. For example:
ls = list(a = c(a = 1, b = 2, d = 2), b = c(b = 2, c = 3), c = c(a = 1, b = 2))
I want to get output like this:
a b c
a 1 NA 1
b 2 2 2
c NA 3 NA
d 2 NA NA
I tried Reduce, but it is not working. I do not want to use any external package for this problem.
You can use `[` in sapply after extracting all element names:
i <- sort(unique(unlist(lapply(ls, names))))
x <- sapply(ls, "[", i)
rownames(x) <- i
x
# a b c
#a 1 NA 1
#b 2 2 2
#c NA 3 NA
#d 2 NA NA
We could also use bind_rows here
library(dplyr)
library(tibble)
bind_rows(ls, .id = 'x') %>%
column_to_rownames('x') %>%
t
a b c
a 1 NA 1
b 2 2 2
d 2 NA NA
c NA 3 NA
Or, in base R (note that xtabs fills missing combinations with 0 rather than NA):
xtabs(values ~ ind + x, do.call(rbind, Map(cbind, x = names(ls), lapply(ls, stack))))
x
ind a b c
a 1 0 1
b 2 2 2
d 2 0 0
c 0 3 0
A data.table option using rbindlist:
> library(data.table)
> t(rbindlist(Map(function(x) data.table(t(x)), ls), fill = TRUE))
[,1] [,2] [,3]
a 1 NA 1
b 2 2 2
d 2 NA NA
c NA 3 NA
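The question mentions that Reduce did not work. One base R way to make Reduce work is to turn each vector into a two-column key/value data frame and fold them together with merge(..., all = TRUE), which is a full outer join. A sketch; "key" is a column name chosen here for illustration.

```r
ls <- list(a = c(a = 1, b = 2, d = 2), b = c(b = 2, c = 3), c = c(a = 1, b = 2))

# One data frame per list element: element names in 'key', values in a column named after the element
dfs <- Map(function(v, nm) {
  d <- data.frame(key = names(v), value = unname(v))
  names(d)[2] <- nm
  d
}, ls, names(ls))

# Full outer join on 'key'; merge() sorts by the key by default
res <- Reduce(function(x, y) merge(x, y, by = "key", all = TRUE), dfs)
rownames(res) <- res$key
res$key <- NULL
res
```

Unlike xtabs, this keeps NA for missing names, and the result is a data frame rather than a matrix.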

Follow-up: Separate columns with constant numbers and condense them to one row in R data.frame

This question is a follow-up to my previous question. After the split.default() call below, I get a named list of data.frames called L.
Question: how can I condense each data.frame in L whose columns each consist of a single constant number into one row? (What if I already know the names of the data.frames whose columns are constant?)
My desired output is shown further below.
r <- list(
data.frame(study.name = rep("Jacob", 6),
X = c(2,2,1,1,NA, NA),
Y = c(1,1,1,2,1,NA),
A = rep(1, 6),
B = rep(4, 6)),
data.frame(study.name = rep("Jon", 6),
X = c(1,NA,3,1,NA,NA),
G = c(1,1,1,2,NA,NA),
A = rep(3, 6),
B = rep(7, 6)))
DATA <- do.call(cbind, r)
nm1 <- Reduce(intersect, lapply(r, colnames))[-1]
L <- split.default(DATA[names(DATA) %in% nm1], names(DATA)[names(DATA) %in% nm1])
Desired output:
# $A
# A A.1
# 1 1 3
# $B
# B B.1
# 1 4 7
# $X
# X X.1
# 1 2 1
# 2 2 NA
# 3 1 3
# 4 1 1
# 5 NA NA
# 6 NA NA
Assuming the all-NA rows should be preserved, loop over the list and apply duplicated, keeping each row that is either not a duplicate or consists entirely of NA:
lapply(L, function(x) x[(rowSums(is.na(x)) == ncol(x))|!duplicated(x),])
#$A
# A A.1
#1 1 3
#$B
# B B.1
#1 4 7
#$X
# X X.1
#1 2 1
#2 2 NA
#3 1 3
#4 1 1
#5 NA NA
#6 NA NA
If we also need to check for constant values:
is_constant <- function(x) length(unique(x)) == 1L
lapply(L, function(x) if(all(sapply(x, is_constant))) x[1,, drop = FALSE] else x)
#$A
# A A.1
#1 1 3
#$B
# B B.1
#1 4 7
#$X
# X X.1
#1 2 1
#2 2 NA
#3 1 3
#4 1 1
#5 NA NA
#6 NA NA
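The parenthetical question, what to do when the names of the constant data frames are already known, can be handled by subsetting L by those names. A sketch, assuming the constant ones are A and B as in the example; const_names is a hypothetical variable.

```r
r <- list(
  data.frame(study.name = rep("Jacob", 6), X = c(2,2,1,1,NA,NA),
             Y = c(1,1,1,2,1,NA), A = rep(1, 6), B = rep(4, 6)),
  data.frame(study.name = rep("Jon", 6), X = c(1,NA,3,1,NA,NA),
             G = c(1,1,1,2,NA,NA), A = rep(3, 6), B = rep(7, 6)))
DATA <- do.call(cbind, r)
nm1 <- Reduce(intersect, lapply(r, colnames))[-1]
L <- split.default(DATA[names(DATA) %in% nm1], names(DATA)[names(DATA) %in% nm1])

# Names of the data frames known in advance to hold only constant columns
const_names <- c("A", "B")
# Keep just the first row of those data frames; the others are left untouched
L[const_names] <- lapply(L[const_names], function(x) x[1, , drop = FALSE])
L
```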

Issue with local variables in an R custom function

I've got a dataset:
>view(interval)
# V1 V2 V3 ID
# 1 NA 1 2 1
# 2 2 2 3 2
# 3 3 NA 1 3
# 4 4 2 2 4
# 5 NA 5 1 5
>dput(interval)
structure(list(V1 = c(NA, 2, 3, 4, NA),
V2 = c(1, 2, NA, 2, 5),
V3 = c(2, 3, 1, 2, 1), ID = 1:5), row.names = c(NA, -5L), class = "data.frame")
For every row containing an NA, I would like to extract the previous non-NA value in that column (or the next one, if the NA is in the first row) and store it as a local variable in a custom function, because I have to perform other operations on every row based on this value (which should change for every row I apply the function to).
I've written this function to print the local variables, but when I apply it, the output is not what I want:
myFunction<- function(x){
position <- as.data.frame(which(is.na(interval), arr.ind=TRUE))
tempVar <- ifelse(interval$ID == 1, interval[position$row+1,
position$col], interval[position$row-1, position$col])
return(tempVar)
}
I was expecting to get something like this
# [1] 2
# [2] 2
# [3] 4
But I get something pretty messed up instead.
Here's attempt number 1:
dat <- read.table(header=TRUE, text='
V1 V2 V3 ID
NA 1 2 1
2 2 3 2
3 NA 1 3
4 2 2 4
NA 5 1 5')
myfunc1 <- function(x) {
ind <- which(is.na(x), arr.ind=TRUE)
# since it appears you want them in row-first sorted order
ind <- ind[order(ind[,1], ind[,2]),]
# catch first-row NA
ind[,1] <- ifelse(ind[,1] == 1L, 2L, ind[,1] - 1L)
x[ind]
}
myfunc1(dat)
# [1] 2 2 4
The problem with this is when there is a second "stacked" NA:
dat2 <- dat
dat2[2,1] <- NA
dat2
# V1 V2 V3 ID
# 1 NA 1 2 1
# 2 NA 2 3 2
# 3 3 NA 1 3
# 4 4 2 2 4
# 5 NA 5 1 5
myfunc1(dat2)
# [1] NA NA 2 4
One fix/safeguard against this is zoo::na.locf, which does "last observation carried forward". Since the top row is a special case, we apply it twice, the second time in reverse (fromLast = TRUE). This gives the nearest non-NA value in the column: the previous one, or the next one when no earlier value exists.
library(zoo)
myfunc2 <- function(x) {
ind <- which(is.na(x), arr.ind=TRUE)
# since it appears you want them in row-first sorted order
ind <- ind[order(ind[,1], ind[,2]),]
# this is to guard against stacked NA
x <- apply(x, 2, zoo::na.locf, na.rm = FALSE)
# this special-case is when there are one or more NAs at the top of a column
x <- apply(x, 2, zoo::na.locf, fromLast = TRUE, na.rm = FALSE)
x[ind]
}
myfunc2(dat2)
# [1] 3 3 2 4
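If adding the zoo dependency is not an option, the same two-pass fill can be sketched in base R. locf below is a hypothetical helper meant to mimic zoo::na.locf(v, na.rm = FALSE) for plain vectors: cummax finds, for every position, the index of the latest non-NA value so far.

```r
# Hypothetical base R stand-in for zoo::na.locf(v, na.rm = FALSE):
# carry the last non-NA value forward; leading NAs stay NA
locf <- function(v) {
  idx <- cummax(seq_along(v) * !is.na(v))  # position of latest non-NA so far, 0 if none
  idx[idx == 0L] <- NA
  v[idx]
}

myfunc3 <- function(x) {
  ind <- which(is.na(x), arr.ind = TRUE)
  # row-first order; drop = FALSE keeps a matrix even when there is a single NA
  ind <- ind[order(ind[, 1], ind[, 2]), , drop = FALSE]
  m <- apply(as.matrix(x), 2, locf)                # fill down
  m <- apply(m, 2, function(v) rev(locf(rev(v))))  # fill up any leading NAs
  m[ind]
}

dat2 <- data.frame(V1 = c(NA, NA, 3, 4, NA), V2 = c(1, 2, NA, 2, 5),
                   V3 = c(2, 3, 1, 2, 1), ID = 1:5)
myfunc3(dat2)
# [1] 3 3 2 4
```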

Merge data.frame columns into a set number of columns, removing NAs unless there are not enough values in a row

I'd like to remove the NA values from my columns and merge all columns into four columns, keeping NAs where a row has fewer than four values.
Say I have data like this:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
#> a b c d e f g
#> 1 1 3 NA 4 NA 1 NA
#> 2 4 NA 2 2 5 NA NA
#> 3 NA 3 NA NA 3 NA NA
#> 4 3 NA NA NA NA 4 4
My desired outcome would be,
df.desired <- data.frame('a' = c(1,4,3,3),
'b' = c(3,2,3,4),
'c' = c(4,2,NA,4),
'd' = c(1,5,NA,NA))
df.desired
#> a b c d
#> 1 1 3 4 1
#> 2 4 2 2 5
#> 3 3 3 NA NA
#> 4 3 4 4 NA
You could probably have explored SO a bit more and tweaked these two answers:
1. Shifting all the Numbers with NAs
2. Remove the columns where you've got All NAs
Result:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
df.new<-do.call(rbind,lapply(1:nrow(df),function(x) t(matrix(df[x,order(is.na(df[x,]))])) ))
colnames(df.new)<-colnames(df)
df.new
df.new[,colSums(is.na(df.new))<nrow(df.new)]
Output:
> df.new[,colSums(is.na(df.new))<nrow(df.new)]
a b c d
[1,] 1 3 4 1
[2,] 4 2 2 5
[3,] 3 3 NA NA
[4,] 3 4 4 NA
There are probably more efficient ways, but here is my attempt:
x00 = lapply(1:nrow(df), function(x) df[x,][!is.na(df[x,])])  # non-NA values of each row
x01 = lapply(x00, function(x) c(x, rep(NA, ncol(df) - length(x))))  # pad every row to ncol(df)
x02 = as.data.frame(do.call("rbind", x01))
x02 <- x02[, colSums(is.na(x02)) < nrow(x02)]  # drop the all-NA columns
I have the following solution:
df <- data.frame('a' = c(1,4,NA,3),
'b' = c(3,NA,3,NA),
'c' = c(NA,2,NA,NA),
'd' = c(4,2,NA,NA),
'e'= c(NA,5,3,NA),
'f'= c(1,NA,NA,4),
'g'= c(NA,NA,NA,4))
df
x <- list()
for (i in seq_len(nrow(df))) {
  x[[i]] <- df[i, ]
  # keep only the non-NA values of row i
  x[[i]] <- x[[i]][!is.na(x[[i]])]
  # pad with NA up to four values (assumes at most four non-NA values per row)
  x[[i]] <- c(x[[i]], rep(NA, 4 - length(x[[i]])))
}
result <- do.call(rbind, x)
result
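A more compact base R sketch does the same per row with apply: drop the NAs, then pad to four values with `length<-`. It assumes four output columns, as in the question; shift_left is a hypothetical helper name, and note that `length<-` silently truncates a row that has more than four non-NA values.

```r
df <- data.frame('a' = c(1,4,NA,3),
                 'b' = c(3,NA,3,NA),
                 'c' = c(NA,2,NA,NA),
                 'd' = c(4,2,NA,NA),
                 'e'= c(NA,5,3,NA),
                 'f'= c(1,NA,NA,4),
                 'g'= c(NA,NA,NA,4))

# Hypothetical helper: keep a row's non-NA values in order, then pad to k values
shift_left <- function(row, k = 4) {
  vals <- unname(row[!is.na(row)])
  length(vals) <- k  # pads with NA; truncates rows with more than k values
  vals
}

m <- t(apply(df, 1, shift_left))  # apply returns one column per row, so transpose
colnames(m) <- c("a", "b", "c", "d")
df.new <- as.data.frame(m)
df.new
```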
