Create dataframe with columns of unequal length from other dataframes [duplicate] - r

This question already has answers here:
cbind a dataframe with an empty dataframe - cbind.fill?
(10 answers)
Closed 9 years ago.
Say I have 5 dataframes with identical columns but different numbers of rows. I want
to make 1 dataframe that takes a specific column from each of the 5 dataframes and
fills in with NAs (or whatever) where the lengths don't match. I've seen questions
on here that show how to do this with one-off vectors, but I'm looking for a way to
do it with bigger sets of data.
Ex: 2 dataframes of equal length:
long <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
long2 <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
I can create a list that combines them, then create an empty dataframe and populate
it with a common variable from the dataframes in the list:
list1 <- list(long, long2)
df1 <- as.data.frame(matrix(0, ncol = 5, nrow = 350))
df1[,1:2] <- sapply(list1, '[[', 'accepted')
And it works.
But when I have more dataframes of unequal length, this approach fails:
long <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
long2 <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
medlong <- data.frame(accepted = rnorm(300, 2000), cost = rnorm(300,5000))
medshort <- data.frame(accepted = rnorm(150, 2000), cost = rnorm(150,5000))
short <- data.frame(accepted = rnorm(50, 2000), cost = rnorm(50,5000))
Now making the list and combined dataframe:
list2 <- list(long, long2, medlong, medshort, short)
df2 <- as.data.frame(matrix(0, ncol = 5, nrow = 350))
df2[,1:5] <- sapply(list2, '[[', 'accepted')
I get the error about size mismatch:
Error in `[<-.data.frame`(`*tmp*`, , 1:5, value = c(1998.77096640377,  :
  replacement has 700 items, need 1750
The only solution I've found to populating this dataframe with columns of unequal
length from other dataframes is something along the lines of:
combined.df <- as.data.frame(matrix(0, ncol = 5, nrow = 350))
combined.df[,1] <- long[,2]
combined.df[,2] <- c(medlong[,2], rep(NA, nrow(long) - nrow(medlong)))
But there's got to be a more elegant and faster way to do it. I know I'm missing something huge conceptually here.

One way would be to find the length of the longest column and then pad the shorter columns with the appropriate number of NAs, like this (with data of a more reasonable size for an MWE!):
out <- lapply( list1 , '[[', 'accepted')
# Find length of longest column
len <- max( sapply( out , length ) )
# Pad shorter columns with NA at the end (lapply keeps a list, which do.call needs)
dfs <- lapply( out , function(x) c( x , rep( NA , len - length(x) ) ) )
# Make data.frame and set column names at same time
setNames( do.call( data.frame , dfs ) , paste0("V" , 1:length(out) ) )
V1 V2 V3
1 -1.0913212 -2.4864497 0.04220331
2 -0.5252874 0.8030984 0.21774515
3 0.6914167 0.9685629 1.47159957
4 NA NA -0.89809670
5 NA NA 0.51140539
6 NA NA -0.46833136
7 NA NA -0.40085707
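The same padding idea can be wrapped in a small reusable function. This is a sketch, not code from any package: `extract_padded()` is a hypothetical name, and it assumes every data frame in the list has the requested column. It relies on base R's `length<-`, which extends a vector with NA when given a larger length.

```r
# Hypothetical helper: take one column from each data frame in a list and
# bind them side by side, padding the shorter ones with NA.
extract_padded <- function(dfs, col) {
  cols <- lapply(dfs, `[[`, col)
  n <- max(lengths(cols))
  # Assigning a larger length extends a vector with NA values
  padded <- lapply(cols, function(x) {
    length(x) <- n
    x
  })
  names(padded) <- paste0(col, seq_along(dfs))
  as.data.frame(padded)
}
```

With the question's data, `extract_padded(list2, "accepted")` should give a 350-row data frame with columns accepted1 to accepted5.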

You could also "subset" each dataframe beyond its last row, like df[nrow(df) + n,], which returns rows of NAs:
#dataframes of different rows
long <- data.frame(accepted = rnorm(15, 2000), cost = rnorm(15,5000))
long2 <- data.frame(accepted = rnorm(10, 2000), cost = rnorm(10,5000))
long3 <- data.frame(accepted = rnorm(12, 2000), cost = rnorm(12,5000))
#insert all dataframes in list to manipulate
myls <- list(long, long2, long3)
#maximum number of rows
max.rows <- max(nrow(long), nrow(long2), nrow(long3))
#insert the needed `NA`s to each dataframe
new_myls <- lapply(myls, function(x) { x[1:max.rows,] })
#create wanted dataframe
do.call(cbind, lapply(new_myls, `[`, "accepted"))
# accepted accepted accepted
#1 2001.581 1999.014 2001.810
#2 2000.071 2000.033 2000.588
#3 1999.931 2000.188 2000.833
#4 1998.467 1999.891 1997.645
#5 2000.682 2000.144 1999.639
#6 1999.693 1999.341 1998.959
#7 2000.222 1998.939 2002.271
#8 1999.104 1998.530 1997.600
#9 1998.435 2001.496 2001.129
#10 1998.160 2000.729 2001.602
#11 1999.267 NA 1999.733
#12 2000.048 NA 2001.431
#13 1999.504 NA NA
#14 2000.660 NA NA
#15 2000.160 NA NA
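The row-indexing trick above can be packaged as a helper. This is a sketch under the assumption that `n` is at least `nrow(df)` (a smaller `n` would truncate instead); `pad_rows()` is a hypothetical name. Indexing past the last row gives NA rows whose row names are "NA", "NA.1", and so on, so the sketch resets them.

```r
# Hypothetical helper: pad a data frame to n rows. Indexing past the last
# row returns rows filled with NA (with row names "NA", "NA.1", ...).
pad_rows <- function(df, n) {
  out <- df[seq_len(n), , drop = FALSE]
  rownames(out) <- NULL  # reset row names to 1..n
  out
}
```

With the answer's objects, `do.call(cbind, lapply(myls, function(x) pad_rows(x, max.rows)["accepted"]))` should reproduce the table above with clean row names.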

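You can try using merge, joining on the row names stored in a new rn column:
cols <- c('rn', 'accepted')  # columns to keep from each dataframe
long$rn <- rownames(long)
long2$rn <- rownames(long2)
medlong$rn <- rownames(medlong)
medshort$rn <- rownames(medshort)
short$rn <- rownames(short)
result <- (merge(merge(merge(merge(
long[, cols], long2[, cols], by=c('rn'), all=T),
medlong[, cols], by=c('rn'), all=T),
medshort[, cols], by=c('rn'), all=T),
short[, cols], by=c('rn'), all=T))
# rn is character, so rows come back in lexical order ("1", "10", "100", ...)
result <- result[order(as.integer(result$rn)), ]
# repeated merges reuse the .x/.y suffixes, so rename the accepted columns afterwards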
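A variant of the merge idea that avoids nesting the calls by hand: the hypothetical helper below gives each data frame an integer row key and a unique column name, then folds merge() over the list with Reduce(). The integer key is an assumption about the desired order; it keeps rows in numeric rather than lexical order, and the unique names avoid the accepted.x/accepted.y suffixes.

```r
# Hypothetical helper: full outer join of one column from each data frame,
# keyed on row position.
merge_by_rowname <- function(dfs, col) {
  keyed <- Map(function(df, i) {
    out <- data.frame(rn = seq_len(nrow(df)))  # integer key sorts numerically
    out[[paste0(col, i)]] <- df[[col]]        # unique name per source frame
    out
  }, dfs, seq_along(dfs))
  Reduce(function(x, y) merge(x, y, by = "rn", all = TRUE), keyed)
}
```

For the question's data this would be `merge_by_rowname(list(long, long2, medlong, medshort, short), "accepted")`.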

Related

Add/match rows with NA to matrix based on missing unique IDs

I am using a panel data set and intend to model it as a dynamic affiliation network using SAOMs. The data is unfortunately very messy and a pain to deal with.
I have managed to create adjacency matrices for each panel wave. However, over time the panel grew in size and people left. I need every matrix to have the same number of rows, in the same order, according to the unique IDs, which are stored as the matrices' row names. All "added IDs" should show 10s across the whole row.
Here is a reproducible example that should make the issue clear and also shows what I aim for. I assume this can be solved by smart use of the merge() function, but I could not get it to work:
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2, dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2, dimnames = list(c("1","4","8","9"), c("group1","group2")))
wave1_c <- matrix(c(0,0,1,1,10,0,1,1,0,0,10,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))
wave2_c <- matrix(c(0,10,1,10,1,0,1,10,0,10,1,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))
Thanks in advance. Numbers in the matrices are arbitrary except for the 10s.
Solution in base R using dataframes and merge.
Merge and outer join.
dwave1_c <- merge(wave1, wave2, by = 'row.names', all = TRUE, suffixes = c("", ".y"))[2:3]
dwave2_c <- merge(wave2, wave1, by = 'row.names', all = TRUE, suffixes = c("", ".y"))[2:3]
dwave1_c[is.na(dwave1_c)] <- 10
dwave2_c[is.na(dwave2_c)] <- 10
as.matrix(dwave1_c)
as.matrix(dwave2_c)
Update.
both <- merge(wave1, wave2, by = 'row.names', all = TRUE)
Output.
Row.names group1.x group2.x group1.y group2.y
1 1 0 1 0 1
2 2 0 1 NA NA
3 4 1 0 1 0
4 5 1 1 NA NA
5 8 NA NA 1 1
6 9 0 1 0 1
dwave1_c <- both[,2:3]; colnames(dwave1_c) <- colnames(wave1)
dwave2_c <- both[,4:5]; colnames(dwave2_c) <- colnames(wave2)
dwave1_c[is.na(dwave1_c)] <- 10
dwave2_c[is.na(dwave2_c)] <- 10
Show result.
as.matrix(dwave1_c)
as.matrix(dwave2_c)
First try.
## Convert matrix to dataframe.
df1 <- as.data.frame(wave1)
df2 <- as.data.frame(wave2)
## Merge df1 and df2 by row name.
m_df1_df2 <- merge(df1, df2, by = 'row.names', all = TRUE)
rownames(m_df1_df2) <- m_df1_df2$Row.names
# Rows not in df1, but in df2,
# rows not in df2, but in df1
not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")] # not in df1, in df2
not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")] # not in df2, in df1
## Same column names.
colnames(not1_2) <- colnames(df1)
colnames(not2_1) <- colnames(df2)
## append
df1_c <- rbind(df1, not1_2)
df2_c <- rbind(df2, not2_1)
## order by row name
df1_c <- df1_c[order(row.names(df1_c)), ]
df2_c <- df2_c[order(row.names(df2_c)), ]
## replace NA by 10
df1_c[is.na(df1_c)] <- 10
df2_c[is.na(df2_c)] <- 10
as.matrix(df1_c)
as.matrix(df2_c)
Converting wave1 and wave2 to data frames in my first attempt is redundant and can be omitted, though at the expense of implicit coercions.
## merge wave1 and wave2 by row name.
m_df1_df2 <- merge(wave1, wave2, by = 0, all = TRUE)
rownames(m_df1_df2) <- m_df1_df2$Row.names
# rows not in set 1, but in set 2,
# rows not in set 2, but in set 1.
not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")]
not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")]
## Same column names.
colnames(not1_2) <- colnames(wave1)
colnames(not2_1) <- colnames(wave2)
## append.
wave1_c <- rbind(wave1, not1_2)
wave2_c <- rbind(wave2, not2_1)
## order by row name.
wave1_c <- wave1_c[order(row.names(wave1_c)), ]
wave2_c <- wave2_c[order(row.names(wave2_c)), ]
## replace NA by 10.
wave1_c[is.na(wave1_c)] <- 10
wave2_c[is.na(wave2_c)] <- 10
## show result.
wave1_c
wave2_c
Solution using setdiff.
## rownames not in set 1, but in set 2,
## rownames not in set 2, but in set 1.
rn_not2_1 <- setdiff(rownames(wave1), rownames(wave2))
rn_not1_2 <- setdiff(rownames(wave2), rownames(wave1))
## missing rows to add.
add_to_1 <- wave2[rn_not1_2,,drop=FALSE]
add_to_2 <- wave1[rn_not2_1,,drop=FALSE]
add_to_1[,] <- 10
add_to_2[,] <- 10
## append.
wave1_c <- rbind(wave1, add_to_1)
wave2_c <- rbind(wave2, add_to_2)
## order by row name.
wave1_c <- wave1_c[order(row.names(wave1_c)), ]
wave2_c <- wave2_c[order(row.names(wave2_c)), ]
## show result.
wave1_c
wave2_c
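Another base-R sketch, assuming the IDs are the matrices' row names and that 10 is the fill value: build a matrix pre-filled with 10 for the full set of IDs and copy in the rows that exist. `complete_rows()` is a hypothetical name. This avoids merge() and data-frame coercion, so the result stays a numeric matrix.

```r
# Hypothetical helper: expand matrix m to all ids, filling missing rows
complete_rows <- function(m, ids, fill = 10) {
  out <- matrix(fill, nrow = length(ids), ncol = ncol(m),
                dimnames = list(ids, colnames(m)))
  present <- intersect(ids, rownames(m))
  out[present, ] <- m[present, , drop = FALSE]  # copy the existing rows
  out
}

wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2,
                dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2,
                dimnames = list(c("1","4","8","9"), c("group1","group2")))

# Union of all IDs, ordered numerically (assumes the IDs are numeric strings)
ids <- union(rownames(wave1), rownames(wave2))
ids <- ids[order(as.numeric(ids))]
wave1_c <- complete_rows(wave1, ids)
wave2_c <- complete_rows(wave2, ids)
```

For more than two waves, the ID set could be built with `Reduce(union, lapply(list(wave1, wave2, ...), rownames))`.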

Drop columns with a 'NA' header from data frames in a list?

I have a list of data frames that are pulled in from an Excel file. Some of the columns in the data frames are named 'NA', contain no data, and are useless; therefore, I would like to drop them. The list contains 9 data frames and most have columns with 'NA' as their title.
Across multiple attempts, R has returned errors or warnings, including:
all_list <- all_list[!is.na(colnames(all_list))]
Warning message:
In is.na(colnames(all_list)) :
is.na() applied to non-(list or vector) of type 'NULL'
The above did not serve its intended purpose, as the NA columns are still in each data frame.
all_list <- lapply(all_list, function(x){
colnames(x) <- x[!is.na(colnames(x))]
return(x)
})
This seems closer to the intended output, but it reformats the data frame columns to be filled with NAs instead.
Here is a sample of my data showcasing the aforementioned NA's:
str(all_list)
List of 8
$ Retail :'data.frame': 305 obs. of 25 variables:
$ NA : chr [1:305] NA "Variable" "Variable" "Variable" ...
$ TIMEPERIOD : chr [1:305] NA "41640" "41671" "41699" ...
Edit: In case it wasn't clear, these blank columns filled with NA are the result of formatting within Excel for the sake of spacing; however, they serve no purpose for analysis within R.
You are pretty close to the solution. A slight change in the function used with lapply will give the expected result.
lapply traverses each dataframe, and your function needs to keep only the columns whose names are not "NA".
all_list <- lapply(all_list, function(x){
x[, colnames(x) != "NA", drop = FALSE]  # drop = FALSE keeps a data frame even if one column remains
})
# Verify the changed data
all_list[[1]]
# col1 col2
# 1 g x
# 2 j z
# 3 n p
# 4 u o
# 5 e b
Data:
set.seed(1)
df1 <- data.frame(sample(letters, 5), sample(letters, 5), 1:5,
stringsAsFactors = FALSE)
names(df1) <- c("col1","col2","NA")
df2 <- data.frame(sample(letters, 5), sample(letters, 5), 11:15,
stringsAsFactors = FALSE)
names(df2) <- c("col1","col2","NA")
df3 <- data.frame(sample(letters, 5), sample(letters, 5), rep(NA, 5),
stringsAsFactors = FALSE)
names(df3) <- c("col1","col2","NA")
df4 <- data.frame(sample(letters, 5), sample(letters, 5), rep(NA, 5),
stringsAsFactors = FALSE)
names(df4) <- c("col1","col2","NA")
all_list <- list(df1,df2,df3,df4)
#check data
all_list[[1]]
# col1 col2 NA
#1 g x 1
#2 j z 2
#3 n p 3
#4 u o 4
#5 e b 5
# all_list[[2]], all_list[[3]] and all_list[[4]] contains similar values
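Depending on how the Excel reader names blank headers, the column name may be the string "NA" or a real missing value; `colnames(x) != "NA"` returns NA for the latter. A sketch that assumes either form can occur (`drop_na_named()` is a hypothetical name):

```r
# Hypothetical helper: drop columns whose name is a real NA or the string "NA"
drop_na_named <- function(df) {
  nm <- names(df)
  keep <- !is.na(nm) & nm != "NA"  # FALSE & NA is FALSE, so NA names are dropped
  df[, keep, drop = FALSE]
}
```

Usage: `all_list <- lapply(all_list, drop_na_named)`. If the spacer columns are also entirely empty, `Filter(function(col) !all(is.na(col)), df)` would drop them by content rather than by name.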

create empty dataframe and error replacement has 1 row, data has 0 occurs

I need to create an empty dataframe, and set up columns for appending values later on.
My code:
df <- data.frame()
varNames <- c("rho", "lambda", "counts")
colnames(df)<- varNames
df['$rho'] <- NA
df["lambda"] <- NA
df["counts"] <- NA
But the following error occurs:
Error in `[<-.data.frame`(`*tmp*`, "$rho", value = NA) :
  replacement has 1 row, data has 0
We can use
df1 <- data.frame(rho = numeric(), lambda = numeric(), counts = numeric())
rbind(df1, list(rho = NA, lambda = NA, counts = NA))
# rho lambda counts
#1 NA NA NA
If we are assigning separately, then a list would be useful:
lst <- setNames(vector("list", 3), varNames)
lst[['rho']] <- NA
lst[['lambda']] <- NA
lst
#$rho
#[1] NA
#$lambda
#[1] NA
#$counts
#NULL
because list elements can have different lengths, whereas a data.frame is a list of equal-length columns. Once the assignments are complete and all elements have equal lengths, convert the list with data.frame(lst) and write it back to file.
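If the goal is an empty data frame whose columns already exist, one sketch (assuming all three columns should be numeric) builds it from the column names and then appends rows by assigning to row `nrow(df) + 1`, which `[<-` creates:

```r
varNames <- c("rho", "lambda", "counts")
# Zero-row data frame whose (numeric) columns already exist
df <- as.data.frame(setNames(
  replicate(length(varNames), numeric(0), simplify = FALSE), varNames))
# Append a row later on: row nrow(df) + 1 does not exist yet, so it is created
df[nrow(df) + 1, ] <- list(0.5, 2, 10)
```

Growing a data frame one row at a time gets slow for many rows; collecting results in a list and binding once at the end is usually faster.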

How to merge several columns of the same dataframe?

I have one big data frame containing different measurements performed by several probes.
The timings of the measurements are not exactly the same. As I want to compare both measurements at a given time and plot them in an animation, I need my data to be "synchronized".
Here is an example of the dataframe I get (in real life I have way more columns that I read directly from a text file):
time1.in.s <- seq(0.010, 100, length.out = 100)
time2.in.s <- seq(0.022, 100, length.out = 100)
data1 <- seq(-10, 100, length.out = 100)
data2 <- seq(-25, 80, length.out = 100)
my.df <- data.frame(time1.in.s, data1, time2.in.s, data2)
Which gives:
time1.in.s data1 time2.in.s data2
1 0.01 -10.000000 0.022000 -25.0000000
2 1.02 -8.888889 1.031879 -23.9393939
3 2.03 -7.777778 2.041758 -22.8787879
4 3.04 -6.666667 3.051636 -21.8181818
5 4.05 -5.555556 4.061515 -20.7575758
6 5.06 -4.444444 5.071394 -19.6969697
What I want to do is merge the two timeX.in.s columns in a single "time" column. Where data is not available, I would have NAs that I could fill in with something like na.approx(my.df$data1, x = my.df$time).
This code is given so that you can reproduce the problem, but in real life, time1.in.s, time2.in.s, data1 and data2 are not available separately. What I actually do is my.df <- read.table(my.file, header = TRUE), and I get the same result. So I can't build the separate data frames directly; I need to split the one big data frame into several manually:
df.list <- list()
for (i in seq(1, ncol(my.df), 2)) {
df.list[[ceiling(i/2)]] <- data.frame(time = my.df[, i], data = my.df[, i+1])
}
Then merge the dataframes one by one:
merged.df <- data.frame(time = as.numeric(NA), data = as.numeric(NA))
for (i in 1:length(df.list)) {
merged.df <- merge(merged.df, df.list[[i]], by = "time", all = TRUE)
}
And finally fill in the gaps:
library(zoo)  # provides na.approx()
merged.df$data.y <- na.approx(merged.df$data.y, x = merged.df$time, na.rm = FALSE)
That definitely works (except the names of the columns are a big mess). But it is cumbersome and doesn't look very R to me. Is there a simpler way to do this?
Here is the result obtained with the above commands:
> head(merged.df)
time data.x data.y data
1 0.010000 NA -10.000000 NA
2 0.022000 NA -9.986799 -25.00000
3 1.020000 NA -8.888889 NA
4 1.031879 NA -8.875821 -23.93939
5 2.030000 NA -7.777778 NA
6 2.041758 NA -7.764843 -22.87879
Column data.x comes from the initial empty merged.df. It can be dumped.
Column data.y is the my.df$data1 column.
In the above dataframe, I did not use the na.approx command on column data (which corresponds to my.df$data2 column)
Additional note on OmaymaS' proposed solution:
To make this work in the general case (i.e. with any number of columns), here is what I did. First, I defined a 6-column data frame:
time1.in.s <- seq(0.010, 100, length.out = 100)
time2.in.s <- seq(0.022, 100, length.out = 100)
time3.in.s <- seq(0.017, 99.8, length.out = 100)
data1 <- seq(-10, 100, length.out = 100)
data2 <- seq(-25, 80, length.out = 100)
data3 <- seq(-15, 70, length.out = 100)
my.df <- data.frame(time1.in.s, data1, time2.in.s, data2, time3.in.s, data3)
This leads to:
head(my.df)
time1.in.s data1 time2.in.s data2 time3.in.s data3
1 0.01 -10.000000 0.022000 -25.00000 0.017000 -15.00000
2 1.02 -8.888889 1.031879 -23.93939 1.024909 -14.14141
3 2.03 -7.777778 2.041758 -22.87879 2.032818 -13.28283
4 3.04 -6.666667 3.051636 -21.81818 3.040727 -12.42424
5 4.05 -5.555556 4.061515 -20.75758 4.048636 -11.56566
6 5.06 -4.444444 5.071394 -19.69697 5.056545 -10.70707
I changed the name of all columns containing the time to the same name (this way I don't have to tell the merge function which column to merge by):
colnames(my.df)[seq(1, ncol(my.df), 2)] <- "Time"
Then I loop on a slightly modified Reduce function:
df.merged <- my.df[, 1:2]
for (i in seq(3, ncol(my.df), 2)) {
df.merged <- Reduce(function(x,y) merge(x,y,
all = TRUE),
list(df.merged,
my.df[, i:(i+1)])
)
}
This gives:
> head(df.merged)
Time data1 data2 data3
1 0.010000 -10.000000 NA NA
2 0.017000 NA NA -15.00000
3 0.022000 NA -25.00000 NA
4 1.020000 -8.888889 NA NA
5 1.024909 NA NA -14.14141
6 1.031879 NA -23.93939 NA
Finally, I apply the na.approx function:
df.interp <- df.merged
df.interp[, 2:ncol(df.interp)] <- na.approx(df.interp[, 2:ncol(df.interp)],
x = df.interp$Time,
na.rm = FALSE)
Here is the final result:
> head(df.interp)
Time data1 data2 data3
1 0.010000 -10.000000 NA NA
2 0.017000 -9.992299 NA -15.00000
3 0.022000 -9.986799 -25.00000 -14.99574
4 1.020000 -8.888889 -23.95187 -14.14560
5 1.024909 -8.883488 -23.94671 -14.14141
6 1.031879 -8.875821 -23.93939 -14.13548
I still have NAs at the beginning of some data columns, but I can get rid of them with the na.omit function.
Try merge, it should help you accomplish what you need:
First, create two dataframes with the data and corresponding time:
df1 <- data.frame(time1.in.s, data1)
df2 <- data.frame(time2.in.s, data2)
Second, merge the two dataframes, specifying the key columns with by.x and by.y, and keep all values:
df.merged <- merge(df1,df2,
by.x = "time1.in.s",
by.y = "time2.in.s",
all.x = TRUE,
all.y = TRUE)
Note: to clarify, as per Sotos' recommendation:
all.x = TRUE,
all.y = TRUE
is equivalent to
all = TRUE
So if you want to exclude values from either dataframe that do not exist in the other, you can set all.x or all.y to FALSE.
Now you will have time in one column, and you can rename the columns as you like.
> head(df.merged)
time1.in.s data1 data2
1 0.010000 -10.000000 NA
2 0.022000 NA -25.00000
3 1.020000 -8.888889 NA
4 1.031879 NA -23.93939
5 2.030000 -7.777778 NA
6 2.041758 NA -22.87879
EDIT: If you want to apply this to more columns, i.e. several timeN.in.s/dataN pairs, you can try Reduce as follows. Add as many selections as you need to the list, and they will all be merged on the time column, assuming it is always the first column in each selection (select() comes from dplyr).
library(dplyr)
df.merged <- Reduce(function(x,y) merge(x,y,
by.x = names(x)[1],
by.y = names(y)[1],
all = TRUE),
list(select(my.df,time1.in.s, data1),
select(my.df,time2.in.s, data2))
)
> head(df.merged)
time1.in.s data1 data2
1 0.010000 -10.000000 NA
2 0.022000 NA -25.00000
3 1.020000 -8.888889 NA
4 1.031879 NA -23.93939
5 2.030000 -7.777778 NA
6 2.041758 NA -22.87879
Additional NOTE:
If you want to use column indices instead, you can use:
df.merged <- Reduce(function(x,y) merge(x,y,
by.x = names(x)[1],
by.y = names(y)[1],
all = TRUE),
list(select(my.df,1,2),
select(my.df,3,4))
)
Also, if your column names are consistent and you want to build the list automatically, you can create a function that takes an integer and returns the names of the columns you want to select:
getDF <- function(x)
{
c1 <- paste0("time",x,".in.s")
c2 <- paste0("data",x)
return(c(c1,c2))
}
For example:
> getDF(1)
[1] "time1.in.s" "data1"
Then you can use this in Reduce:
df.merged <- Reduce(function(x,y) merge(x,y,
by.x = names(x)[1],
by.y = names(y)[1],
all = TRUE),
list(my.df[,getDF(1)],
my.df[,getDF(2)])
)
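To build the list automatically instead of typing each `getDF()` call, one sketch (assuming the columns come strictly in time/data pairs, so `ncol(my.df)` is even) generates one selection per pair with `lapply()`. The small `my.df` here is made-up illustration data.

```r
# getDF() as defined above, repeated so this block runs on its own
getDF <- function(x) c(paste0("time", x, ".in.s"), paste0("data", x))

# Small example frame with the same naming pattern (made-up values)
my.df <- data.frame(time1.in.s = c(0, 2), data1 = c(1, 3),
                    time2.in.s = c(1, 2), data2 = c(5, 6))

# One (time, data) selection per pair
pairs <- lapply(seq_len(ncol(my.df) %/% 2), function(k) my.df[, getDF(k)])

df.merged <- Reduce(function(x, y) merge(x, y,
                                         by.x = names(x)[1],
                                         by.y = names(y)[1],
                                         all = TRUE),
                    pairs)
```

The merged time column keeps the first pair's name (time1.in.s here), since merge() names the key after `by.x`.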
A bit of code, assuming that you would like to split your data.frame every two columns:
library(magrittr)
library(dplyr)
...
my.df <- data.frame(time1.in.s, data1, time2.in.s, data2)
my.df %<>% t %>% data.frame %>%
mutate(x = (seq_along(row.names(.)) %% 2 +
seq_along(row.names(.))) / 2) %>% split(., .$x) %>% lapply(t)
for (i in 1:length(my.df)) colnames(my.df[[i]]) <- c("time", paste0("data",i))
# drop the last row of each piece, which holds the grouping variable x
my.df %<>% lapply(function(x) x[-nrow(x), ])
final = Reduce(function(...) merge(..., all=T), my.df)
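Putting the pieces together without extra packages: the sketch below renames each time column to a common "time" key, full-outer-merges all pairs, and then fills the gaps with base R's `approx()` instead of `zoo::na.approx()`. `merge_time_pairs()` and `fill_by_time()` are hypothetical names; the sketch assumes strict time/data column pairs and at least two non-NA values per data column (otherwise `approx()` stops with an error). With the default `rule = 1`, values outside a column's observed time range stay NA, matching the leading NAs described in the question.

```r
# Hypothetical helper: split (time, data) column pairs and merge them on "time"
merge_time_pairs <- function(wide) {
  idx <- seq(1, ncol(wide), by = 2)
  pairs <- lapply(idx, function(i) {
    out <- wide[, c(i, i + 1)]
    names(out)[1] <- "time"  # common key name for every pair
    out
  })
  Reduce(function(x, y) merge(x, y, by = "time", all = TRUE), pairs)
}

# Hypothetical helper: linear interpolation of each data column at every time
fill_by_time <- function(merged) {
  for (col in setdiff(names(merged), "time")) {
    ok <- !is.na(merged[[col]])
    merged[[col]] <- approx(merged$time[ok], merged[[col]][ok],
                            xout = merged$time)$y
  }
  merged
}
```

With the question's data this would be `fill_by_time(merge_time_pairs(my.df))`.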

Sum observations from two columns, looping over many columns in R

I have searched high and low, but am stuck on how to approach this. I have two sets of columns that I want to sum, row by row, and I want to do this across many columns. If I were to do this manually, I would want:
df1[1,1]+df2[1,1]
df1[2,1]+df2[2,1]
etc... I've found many helpful examples on how to do something like:
apply(df[,c("a","d")], 1, sum)
though I want to do this over lots of columns. Also, while it's not entirely relevant, I want to phrase my question as close to my reality as possible, so my example below includes NAs, since my actual data contains many missing values.
# make a data frame, df1, with three columns
a <- sample(1:100, 50, replace = T)
b <- sample(100:300, 50, replace = T)
c <- sample(2:50, 50, replace = T)
df1 <- cbind(a,b,c)
# make another data frame, df2, with three columns
x <- sample(1:100, 50, replace = T)
y <- sample(100:300, 50, replace = T)
z <- sample(2:50, 50, replace = T)
df2 <- cbind(x,y,z)
To randomly throw in a few NAs, I used this function from http://www.r-bloggers.com/function-to-generate-a-random-data-set/:
NAins <- NAinsert <- function(df, prop = .1){
n <- nrow(df)
m <- ncol(df)
num.to.na <- ceiling(prop*n*m)
id <- sample(0:(m*n-1), num.to.na, replace = FALSE)
rows <- id %/% m + 1
cols <- id %% m + 1
sapply(seq(num.to.na), function(x){
df[rows[x], cols[x]] <<- NA
}
)
return(df)
}
Add the NAs to the frames (NAins() returns a modified copy, so assign the result):
df1 <- NAins(df1, .2)
df2 <- NAins(df2, .14)
Then I tried to loop with seq_along over the columns of each data frame and call apply() with MARGIN = 1 to sum each row. This doesn't work:
for(i in seq_along(df1)){
for(j in seq_along(df2)){
apply(c(df1[,i], col2[j]), 1, function(x) sum(x, na.rm = T))}}
Thanks for any help!
You should be able to just replace NA with 0, and then add with "+":
replace(df1, is.na(df1), 0) + replace(df2, is.na(df2), 0)
# X Y Z
# 1 7 19 6
# 2 11 12 1
# 3 16 14 11
# 4 13 7 13
# 5 10 2 11
Alternatively, if you have more than just two data.frames, you can collect them in a list and use Reduce:
Reduce("+", lapply(mget(c("df1", "df2", "df3")), function(x) replace(x, is.na(x), 0)))
Here's some sample data (and what I think is an easier way to create it):
set.seed(1) ## Set a seed so others can reproduce your sample data
dfmaker <- function() {
setNames(
data.frame(
replicate(3, sample(c(NA, 1:10), 5, TRUE), FALSE)),
c("X", "Y", "Z"))
}
df1 <- dfmaker()
df1
# X Y Z
# 1 2 9 2
# 2 4 10 1
# 3 6 7 7
# 4 9 6 4
# 5 2 NA 8
df2 <- dfmaker()
df2
# X Y Z
# 1 5 10 4
# 2 7 2 NA
# 3 10 7 4
# 4 4 1 9
# 5 8 2 3
df3 <- dfmaker()
You can also bind the data frames into a 3-D array and sum across the third dimension with apply():
install.packages('abind')
library(abind)
df <- abind(list(df1,df2), along = 3)
results <- apply(df, MARGIN = c(1,2), FUN = function(x) sum(x, na.rm = TRUE))
results
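Closer in spirit to the column loop in the question, this sketch pairs up matching columns with `Map()` and sums each pair row-wise with `rowSums(na.rm = TRUE)`. `sum_na_rm()` is a hypothetical name, and it assumes both data frames have the same dimensions and column order. As with the replace() answer, a cell that is NA in both inputs becomes 0.

```r
# Hypothetical helper: element-wise sum of two data frames, ignoring NAs
sum_na_rm <- function(a, b) {
  # Map() walks the columns of a and b in parallel
  out <- as.data.frame(Map(function(x, y) rowSums(cbind(x, y), na.rm = TRUE),
                           a, b))
  names(out) <- names(a)
  out
}
```

For more than two frames, the same function can be folded over a list: `Reduce(sum_na_rm, list(df1, df2, df3))`.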
