R: multiple merge with big data frames

I have two big data frames, DBa and DBb. All columns of DBb are also in DBa.
I want to merge the two data frames on all of DBb's columns.
I'm trying:
new <- merge(DBa, DBb, by=colnames(DBb))
but it gives me the error:
Elements listed in `by` must be valid column names in x and y
How can I do it?

I don't think you want to merge the data frames; it sounds like you want to stack them on top of each other with rbind. merge puts two data frames next to each other, and for that you only need one common column (the key), which should be unique, otherwise rows get duplicated for every matching pair and the result will be a mess.
So use row bind (rbind). Both data frames must have the same column names, and one data frame must not have more columns than the other. For data frames, rbind matches columns by name, so the column order does not have to be identical.
new_data <- rbind(data1, data2)
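If a key-based merge is what is needed after all, the error message suggests that at least one name in colnames(DBb) is not actually a column of DBa (for example because of a trailing space or different capitalisation). A minimal sketch for checking this, using small made-up stand-ins for DBa and DBb:

```r
# Toy stand-ins for the real data frames (made up for illustration)
DBa <- data.frame(id = 1:3, grp = c("a", "b", "c"), value = c(10, 20, 30))
DBb <- data.frame(id = 2:4, grp = c("b", "c", "d"))

# Names in DBb that are NOT columns of DBa; this should be empty before merging
missing_cols <- setdiff(colnames(DBb), colnames(DBa))
print(missing_cols)

# Merge on the columns the two data frames really have in common
common <- intersect(colnames(DBa), colnames(DBb))
merged <- merge(DBa, DBb, by = common)
print(merged)
```

If missing_cols is not empty, those are the names to fix before calling merge.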

Related

Remove rows that exist in another dataframe using two columns

Using this:
list_one[!list_one$letters %in% list_two$letters,]
it is possible to remove rows from one dataframe that exist in another.
How can we adapt this command in order to check two columns from both data frames?
Example (pseudocode):
list_one[![list_one$letters, list_one$id] %in% [list_two$letters, list_two$letters],]
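One way to express the two-column check in base R is to build a combined key from both columns and compare the keys. This is a minimal sketch with made-up data; it assumes the second pair of columns is meant to be letters and id, and that the separator "\r" never occurs in the data:

```r
# Made-up example data with the column names used in the question
list_one <- data.frame(letters = c("a", "b", "c", "a"), id = c(1, 2, 3, 2))
list_two <- data.frame(letters = c("a", "c"), id = c(1, 9))

# Combine both columns into a single key per row
key_one <- paste(list_one$letters, list_one$id, sep = "\r")
key_two <- paste(list_two$letters, list_two$id, sep = "\r")

# Keep only the rows of list_one whose (letters, id) pair is absent from list_two
result <- list_one[!key_one %in% key_two, ]
print(result)
```

Only the row with letters "a" and id 1 is dropped here, because it is the only pair that appears in both data frames.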

Merging data frames into another dataframe

I'm working with R. I'm trying to build a data frame that combines three other data frames. The three data frames have different column names and different numbers of rows (they don't have row names).
I tried originally to do:
Namenewdf <- data.frame(dataframe1, dataframe2, dataframe3)
R raised an error because the numbers of rows differ.
Then I tried the merge function, but that didn't work either.
How do I combine the data frames so that the result keeps all the original information of the data frames used as arguments, without filling the 'void' rows of the data frames that have fewer rows?
library(rowr)
finaldataframe<-cbind.fill(dataframe1,dataframe2, dataframe3,fill = NA)
finaldataframe[is.na(finaldataframe)]<-""
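If the rowr package is not available, the same idea can be sketched in base R by padding each data frame with NA rows up to the longest one and then column-binding. The helper pad_rows and the toy data below are hypothetical, introduced only for illustration:

```r
# Made-up data frames with different column names and row counts
dataframe1 <- data.frame(a = 1:3)
dataframe2 <- data.frame(b = c("x", "y"))
dataframe3 <- data.frame(c = 1.5)

# Hypothetical helper: extend a data frame to n rows; indexing past the
# last row yields rows filled with NA
pad_rows <- function(df, n) {
  padded <- df[seq_len(n), , drop = FALSE]
  rownames(padded) <- NULL
  padded
}

dfs <- list(dataframe1, dataframe2, dataframe3)
n_max <- max(sapply(dfs, nrow))

# Pad every data frame to n_max rows, then bind them side by side
finaldataframe <- do.call(cbind, lapply(dfs, pad_rows, n = n_max))
print(finaldataframe)
```

Leaving the padding as NA keeps numeric columns numeric; replacing NA with "" turns the affected columns into character.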

Add a new row to R dataframe - but if does not exist already?

Is there any way to add only new rows that do not already exist in a data frame? I create a big data frame (100k records) with different, randomly selected combinations of the variables, and I want to add them to the existing data frame on one condition: all rows must be different (at least one variable must differ).
You can use the following if, for example, the data frame df1 may or may not exist and you want to append the data frame df2 to it:
rbind(if(exists("df1")) df1, df2)
But this can often be avoided by simply creating an empty data frame first. For example, if your data frame contains a single column x of type character, then you could do:
df1 <- data.frame(x = character())
rbind(df1, df2)
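For the condition in the question (every row must differ in at least one variable), one possible approach is to append everything and then drop duplicated rows with unique. A minimal sketch with made-up data:

```r
# Made-up existing data and new candidate rows
df1 <- data.frame(x = c("a", "b"), y = c(1, 2))
df2 <- data.frame(x = c("b", "c"), y = c(2, 3))

# Append, then keep only the first occurrence of each distinct row
combined <- unique(rbind(df1, df2))
print(combined)
```

The row ("b", 2) exists in both data frames, so it appears only once in combined.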

Extract columns with same names from multiple data frames [R]

I am dealing with about 10 data frames that have the same column names, but different number of rows. I would like to create a list of all columns with the same names.
So, say I have 2 data frames with the same column names.
a<-seq(0,20,1)
b<-seq(20,40,1)
c<-seq(10,30,1)
df.abc.1<-data.frame(a,b,c)
a<-seq(20,50,1)
b<-seq(10,40,1)
c<-seq(30,60,1)
df.abc.2<-data.frame(a,b,c)
I know I can create a list from this data such as,
list(df.abc.1$a, df.abc.2$a)
but I don't want to type out my long data frame names and column names.
I was hoping to do something like this,
list(c(df.abc.1, df.abc.2)$a)
But it returns a list containing only df.abc.1$a.
Perhaps there could be a way to use the grep function across multiple data.frames?
Perhaps a loop could accomplish this task?
Not sure if it's any better, but maybe
lapply(list(df.abc.1, df.abc.2), function(x) x$a)
For more than one column
lapply(list(df.abc.1, df.abc.2), function(x) x[, c("a","b")])
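To avoid typing the data frame names at all, the data frames could be collected by a name pattern first. This is only a sketch; it assumes every data frame of interest has a name starting with "df.abc." and that no other object in the workspace does:

```r
# The two example data frames from the question
df.abc.1 <- data.frame(a = seq(0, 20, 1), b = seq(20, 40, 1), c = seq(10, 30, 1))
df.abc.2 <- data.frame(a = seq(20, 50, 1), b = seq(10, 40, 1), c = seq(30, 60, 1))

# Collect all objects whose names match the pattern into a named list
dfs <- mget(ls(pattern = "^df\\.abc\\."))

# Extract column "a" from every data frame in the list
a_cols <- lapply(dfs, `[[`, "a")
str(a_cols)
```

The result is a list named after the data frames, which makes it easy to tell which vector came from where.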

How to multiply columns of same names belonging to different data.frame

I have two data frames with a lot of columns. They have different lengths: one has many rows and the other has only one row. Both data frames contain columns with the same names, and I want to multiply the matching columns with each other. I have not managed to solve it. Please help me.
The command
mapply("*", DataFrame1, DataFrame2)
should work if you want to multiply all columns (mapply pairs the columns by position, so both data frames need the same column order). If the relevant columns are only a subset of all columns in the data frames, we first need to identify the columns present in both data frames.
common <- intersect(names(DataFrame1), names(DataFrame2))
mapply("*", DataFrame1[common], DataFrame2[common])
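A small worked example of the subset case, with made-up data in which the shared columns appear in a different order and each data frame also has a column the other lacks:

```r
# Made-up data: DataFrame1 has many rows, DataFrame2 has a single row
DataFrame1 <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))
DataFrame2 <- data.frame(y = 10, x = 2, w = 5)

# Columns present in both data frames, in the order of DataFrame1
common <- intersect(names(DataFrame1), names(DataFrame2))

# Selecting by name aligns the columns; the single-row values are recycled
product <- mapply("*", DataFrame1[common], DataFrame2[common])
print(product)
```

Because both data frames are indexed with the same vector of names, the differing column order no longer matters. The result is a matrix; wrap it in as.data.frame if a data frame is needed.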
