Merging data frames into another dataframe - r

I'm working in R. I'm trying to build a data frame that combines three other data frames. Those three data frames have different column names and different numbers of rows (they don't have row names).
I tried originally to do:
Namenewdf <- data.frame(dataframe1, dataframe2, dataframe3)
R raised an error because the data frames have differing numbers of rows.
Then I tried the merge function, but it also didn't work.
How do I combine the data frames so that the resulting data frame includes the original information from the data frames used as arguments, without filling the 'void' rows of the data frames that have fewer rows?

library(rowr)
finaldataframe<-cbind.fill(dataframe1,dataframe2, dataframe3,fill = NA)
finaldataframe[is.na(finaldataframe)]<-""
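If the rowr package is not available, the same padding idea can be sketched in base R. This is only an illustration under assumptions: the three small data frames and the helper name pad_rows are made up for the example, not taken from the question.

```r
# Hypothetical example data with different numbers of rows
dataframe1 <- data.frame(a = 1:3)
dataframe2 <- data.frame(b = c("x", "y"), stringsAsFactors = FALSE)
dataframe3 <- data.frame(c = c(1.5, 2.5, 3.5, 4.5))

dfs <- list(dataframe1, dataframe2, dataframe3)
n_max <- max(sapply(dfs, nrow))

# Hypothetical helper: indexing rows past the end of a data frame yields NA rows
pad_rows <- function(df, n) {
  padded <- df[seq_len(n), , drop = FALSE]
  rownames(padded) <- NULL
  padded
}

# Pad every data frame to n_max rows, then bind them side by side
finaldataframe <- do.call(cbind, lapply(dfs, pad_rows, n = n_max))
```

The shorter data frames end up with NA in the extra rows, which can then be replaced with "" as in the answer above if blank cells are preferred.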

Related

How do I apply the same changes on multiple data frames in R?

I have a file (named subdatlob) containing a list of four data frames (named 1, 2, 3, and 4). For each data frame, I want to run the following:
tri_I=as.triangle(subdatlob[["I"]],origin="AY",dev="DY",value="paid")
triLoB_I = incr2cum(tri_I)
for I = 1,2,3,4 or more generally, for each data frame I in the given list.
How do I do this? I will also be doing this step for a list containing 1,000,000+ data frames.
This inquiry involves applying a function to every data frame and naming the necessary variables for the computation.
@shs's suggestion
lapply(subdatlob, \(x) incr2cum(as.triangle(x, origin="AY", dev="DY", value="paid")))
worked for me and for my larger list containing more data frames.
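The general pattern can be sketched without ChainLadder. In this illustration the data and the to_cumulative function are hypothetical stand-ins for the real triangles and for incr2cum(as.triangle(...)); the point is only that lapply applies one function to every element and keeps the list names.

```r
# Hypothetical stand-in data: a named list of data frames
subdatlob <- list(
  "1" = data.frame(AY = c(2001, 2001), DY = c(1, 2), paid = c(10, 5)),
  "2" = data.frame(AY = c(2002, 2002), DY = c(1, 2), paid = c(7, 3))
)

# Stand-in transformation (not the ChainLadder functions):
# add a cumulative version of the paid column
to_cumulative <- function(df) {
  df$cum_paid <- cumsum(df$paid)
  df
}

# One call handles every data frame; the result is a list with the same names
result <- lapply(subdatlob, to_cumulative)
```

Individual results are then available by name, for example result[["2"]], so there is no need to create a separately named variable per data frame.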

How do I merge 2 data frames on R based on 2 columns?

I am looking to merge 2 data frames based on 2 columns in R. The two data frames are called popr and droppedcol, and they share the same 2 variables, USUBJID and TRTAG2N, which are the variables I want to combine the 2 data frames by.
The merge function works when I am only trying to do it based off of one column:
merged <- merge(popr,droppedcol,by="USUBJID")
When I attempt to merge by using 2 columns and view the data frame "Duration", the table is empty and there are no values, only column headers. It says "no data available in table".
I am tasked with replicating the SAS code for this in R:
data duration;
set pop combined1 ;
by usubjid trtag2n;
run;
On R, I have tried the following
duration<- merge(popr,droppedcol,by.x="USUBJID","TRTAG2N",by.y="USUBJID","TRTAG2N")
duration <- full_join(popr,droppedcol,by = c("USUBJID","TRTAG2N"))
duration <- merge(popr,droppedcol,by = c("USUBJID","TRTAG2N"))
I would like to see a data frame with the columns USUBJID, TRTAG2N, TRTAG2, and FUDURAG2, sorted by first FUDURAG2 and then USUBJID.
Per the SAS documentation (Combining SAS Data Sets), and as confirmed by @Tom in the comments above, set with by simply means you are interleaving the data sets. No merge (which, by the way, also exists in SAS but which you do not use here) is taking place:
Interleaving uses a SET statement and a BY statement to combine
multiple data sets into one new data set. The number of observations
in the new data set is the sum of the number of observations from the
original data sets. However, the observations in the new data set are
arranged by the values of the BY variable or variables and, within
each BY group, by the order of the data sets in which they occur. You
can interleave data sets either by using a BY variable or by using an
index.
Therefore, the best translation of set without by in R is rbind(), and set with by is rbind + order (on the rows):
duration <- rbind(pop, combined1) # STACK DFs
duration <- with(duration, duration[order(usubjid, trtag2n),]) # ORDER ROWS
Note, however, that rbind does not allow unmatched columns between the concatenated data sets. Third-party packages do allow unmatched columns, including plyr::rbind.fill, dplyr::bind_rows, and data.table::rbindlist (with fill = TRUE).
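As a small worked illustration of interleaving, assuming made-up contents for pop and combined1 (the src column is added only to show which data set each row came from):

```r
# Hypothetical example data; src marks the originating data set
pop <- data.frame(usubjid = c("A01", "A03"), trtag2n = c(1, 2),
                  src = "pop", stringsAsFactors = FALSE)
combined1 <- data.frame(usubjid = c("A02", "A01"), trtag2n = c(1, 1),
                        src = "combined1", stringsAsFactors = FALSE)

# Stack the rows, then order by the BY variables
duration <- rbind(pop, combined1)
duration <- duration[order(duration$usubjid, duration$trtag2n), ]
rownames(duration) <- NULL
```

The result has as many rows as the two inputs together. Because order() leaves ties in their original order, rows with the same usubjid and trtag2n stay in data set order (pop before combined1), which matches the interleaving behavior quoted above.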

How do I loop through multiple Data Frames in r to create a vector?

This is the code I am currently using to move data from multiple data frames into a time-ordered vector which I then perform analysis on and graph:
TotalLoans <- c(
sum(as.numeric(HCD2001$loans_all)), sum(as.numeric(HCD2002$loans_all)),
sum(as.numeric(HCD2003$loans_all)), sum(as.numeric(HCD2004$loans_all)),
sum(as.numeric(HCD2005$loans_all)), sum(as.numeric(HCD2006$loans_all)),
sum(as.numeric(HCD2007$loans_all)), sum(as.numeric(HCD2008$loans_all)),
sum(as.numeric(HCD2009$loans_all)), sum(as.numeric(HCD2010$loans_all)),
sum(as.numeric(HCD2011$loans_all)), sum(as.numeric(HCD2012$loans_all)),
sum(as.numeric(HCD2013$loans_all)), sum(as.numeric(HCD2014$loans_all)),
sum(as.numeric(HCD2015$loans_all)), sum(as.numeric(HCD2016$loans_all))
)
I do this four more times with similar data frames that also are similarly formatted as:
Varname$year
Is there a way to loop through these 16 data frames, select an individual column, perform a function on it, and put it into a vector? This is what I have tried so far:
AllList <- list(HCD2001, HCD2002, HCD2003, HCD2004, HCD2005, HCD2006, HCD2007, HCD2008, HCD2009, HCD2010, HCD2011, HCD2012, HCD2013, HCD2014, HCD2015, HCD2016)
TotalLoans <- lapply(AllList,
function(df){
sum(as.numeric(df$loans_all))
return(df)
}
)
However, it returns a Large List with every column from the data frames. All the other posts related to this were for modifying data frames, not creating a new vector with modified values of the data frames.
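One possible fix, sketched with two small made-up data frames in place of the sixteen real ones: the attempt above computes the sum but then returns df, so lapply hands back the data frames themselves. Returning the sum and using vapply (or sapply) gives a plain numeric vector instead.

```r
# Hypothetical example data standing in for HCD2001 ... HCD2016
HCD2001 <- data.frame(loans_all = c("10", "20"), stringsAsFactors = FALSE)
HCD2002 <- data.frame(loans_all = c("5", "15", "25"), stringsAsFactors = FALSE)

AllList <- list(HCD2001 = HCD2001, HCD2002 = HCD2002)

# Return the sum itself (the last expression), not the data frame;
# numeric(1) tells vapply that each result is a single number
TotalLoans <- vapply(
  AllList,
  function(df) sum(as.numeric(df$loans_all)),
  numeric(1)
)
```

TotalLoans is a named numeric vector in the same order as AllList. For the other columns, the column name could be passed as an extra argument and accessed with df[[column]]. If the data frames already exist under names such as HCD2001 to HCD2016, mget(paste0("HCD", 2001:2016)) is one way to collect them into a named list.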

R: multiple merge with big data frames

I have two big data frames: DBa and DBb. All columns of DBb are in DBa.
I want to merge these two data frames by all of DBb's columns.
I'm trying:
new <- merge(DBa, DBb, by=colnames(DBb))
but it gives me the error:
Elements listed in `by` must be valid column names in x and y
How can I do it?
I don't think you are looking to merge the data frames; you should stack them on top of each other with rbind. With merge you put two data frames next to each other, and you only need one common column (the key), which should be unique, otherwise the results will be a mess.
So use row bind (rbind). Both data frames must have the same column names (rbind matches data frame columns by name, not by position), and one data frame must not have more columns than the other.
new_data <- rbind(data1, data2)
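A minimal sketch with made-up data1 and data2, showing that rbind on data frames lines columns up by name even when they appear in a different order:

```r
# Hypothetical example data; data2 lists the same columns in a different order
data1 <- data.frame(id = 1:2, value = c("a", "b"), stringsAsFactors = FALSE)
data2 <- data.frame(value = "c", id = 3L, stringsAsFactors = FALSE)

# Columns are matched by name; the result uses the column order of data1
new_data <- rbind(data1, data2)
```

If the column names differ, or one data frame has extra columns, rbind stops with an error instead of guessing.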

How to multiply columns of same names belonging to different data.frame

I am having a problem. I have two data.frames with a lot of columns, and they are of different lengths: one has many rows and the other has only one row. In both data frames there are columns with the same names. I want to multiply the matching columns with each other, but I have not managed to do it. Please help me.
The command
mapply("*", DataFrame1, DataFrame2)
should work if you want to multiply all columns. If the relevant columns are only a subset of all columns in the data frames, we first need to identify the columns that are present in both data frames.
mapply("*", DataFrame1[intersect(names(DataFrame1), names(DataFrame2))],
DataFrame2[intersect(names(DataFrame1), names(DataFrame2))])
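As a worked illustration with made-up data (one data frame with three rows, one with a single row), storing the shared names once keeps the call shorter:

```r
# Hypothetical example data: many rows vs. a single row
DataFrame1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DataFrame2 <- data.frame(b = 10, a = 2, d = 100)

# Column names present in both data frames
common <- intersect(names(DataFrame1), names(DataFrame2))

# Multiply matching columns; the single-row values are recycled down each column.
# mapply returns a matrix here, so convert it back to a data frame.
product <- as.data.frame(mapply("*", DataFrame1[common], DataFrame2[common]))
```

Subsetting both data frames with the same common vector also puts the columns in the same order, so a is multiplied by a and b by b even though DataFrame2 lists them the other way round.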
