Merging dataframes in R is only merging half of the data? - r

I'm trying to merge two dataframes in R. When I try to combine them, the result has all the correct rows and columns, but the data doesn't carry over. I've tried full_join and merge() on the data.
The data I have is all in each original dataframe, and both dataframes share a column in common (which is what I'm using to merge with). Is there a function out there that will let me merge the dataframes and the data, despite only having one column in common?
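For reference, a minimal sketch of a full join on a single shared column in base R. The data frames df1 and df2 and the key column id are made up for illustration; they are not from the question. If the key values differ between the two data frames (for example in type, case, or stray whitespace), the rows will not match and the other columns come back as NA, which is one possible cause of data not carrying over.

```r
# Hypothetical data frames that share only the key column "id"
df1 <- data.frame(id = c(1, 2, 3), height = c(150, 160, 170))
df2 <- data.frame(id = c(2, 3, 4), weight = c(55, 65, 75))

# all = TRUE keeps rows from both sides (a full join);
# unmatched rows get NA in the columns from the other data frame
merged <- merge(df1, df2, by = "id", all = TRUE)
print(merged)
```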

Related

Is there an R function for merging multiple data frames vertically by rows?

I wanted to know if there's an R function that appends the rows of multiple data frames into a single data frame, vertically. The columns of my datasets are different and have different names, so I can't use rbind(). I have tried bind_rows() and smartbind() but still got the output in a horizontal fashion.
This is less obvious than I thought: if you try
rbind(unname(x), unname(y))
then R throws an error.
The easiest way I found was to convert the data.frames into matrices before binding them:
as.data.frame(rbind(as.matrix(x), as.matrix(y)))
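The resulting data.frame has the same column names as the first data.frame given, in this case x. You still need the data.frames to have the same number of columns, however. Also note that as.matrix() coerces every column to a single type, so if the data.frames mix numeric and character columns, all columns come back as character.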
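As a small illustration of the matrix approach, here is a sketch with two made-up data frames, x and y, that have the same number of columns but different names. The last line shows an alternative that is not part of the answer above: renaming y's columns with setNames() before calling rbind(), which keeps the column types intact.

```r
# Hypothetical toy data: same number of columns, different column names
x <- data.frame(a = 1:2, b = c(10, 20))
y <- data.frame(c = 3:4, d = c(30, 40))

# Matrix approach: rows are stacked by position, names are taken from x
stacked <- as.data.frame(rbind(as.matrix(x), as.matrix(y)))
print(stacked)

# Alternative (an assumption, not from the answer): rename y to match x first
stacked2 <- rbind(x, setNames(y, names(x)))
print(stacked2)
```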
Outside of base R, the data.table package has a version of rbind that can ignore names when binding:
library(data.table)
rbind(as.data.table(x), as.data.table(y), use.names = FALSE)
However, this will return a data.table rather than a data.frame, so you'll probably want to convert it back afterwards.

Compare two dataframes to extract the new columns

I have two dataframes. As an example:
iris1<-iris[1:3]
iris2<-iris[1:4]
I want to extract the new column by comparing the two dataframes.
I have tried using the compare function from the eponymous package but no joy; it seems that comparing rows is more common. Is there an easy way to do this?
We can use setdiff
setdiff(union(names(iris1), names(iris2)), names(iris1))
Or, if one of the datasets contains all the columns of the other plus some extra ones:
setdiff(names(iris2), names(iris1))
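To go from the column names to the column itself, the result of setdiff() can be used to index the larger data frame. A short sketch using the iris example from the question; the variable names new_cols and extracted are made up for illustration:

```r
iris1 <- iris[1:3]
iris2 <- iris[1:4]

# Names that are in iris2 but not in iris1
new_cols <- setdiff(names(iris2), names(iris1))
print(new_cols)

# Indexing with the names returns a data frame holding only the new column(s)
extracted <- iris2[new_cols]
head(extracted)
```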

Merging data frames into another dataframe

I'm working with R statistics. I'm trying to make a data frame that merges three other data frames. Those three data frames have different column names and different numbers of rows (they don't have row names).
I tried originally to do:
Namenewdf <- data.frame(dataframe1, dataframe2, dataframe3)
R raised an error because the numbers of rows differ.
Then I tried with the merge function but it also didn't work.
How do I merge the data frames so that the resulting data frames include the original information of the data frames used as arguments, not filling the 'void' rows from the data frames that have fewer rows?
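library(rowr)
# cbind.fill() binds the data frames side by side, padding the shorter ones with NA
finaldataframe <- cbind.fill(dataframe1, dataframe2, dataframe3, fill = NA)
# Optionally replace the NA padding with empty strings (columns that held NA become character)
finaldataframe[is.na(finaldataframe)] <- ""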
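If the rowr package is not available, the same padding idea can be sketched in base R. This is an illustrative alternative, not the method from the answer above: pad_rows is a hypothetical helper, and the three data frames are made-up stand-ins for the ones in the question.

```r
# Hypothetical stand-ins with different column names and row counts
dataframe1 <- data.frame(a = 1:3)
dataframe2 <- data.frame(b = c("x", "y"))
dataframe3 <- data.frame(c = 4.5)

# Hypothetical helper: extend a data frame to n rows.
# Indexing rows beyond nrow(df) yields rows filled with NA.
pad_rows <- function(df, n) {
  padded <- df[seq_len(n), , drop = FALSE]
  rownames(padded) <- NULL  # discard the "NA", "NA.1", ... row names
  padded
}

dfs <- list(dataframe1, dataframe2, dataframe3)
n_max <- max(vapply(dfs, nrow, integer(1)))

# Pad every data frame to the longest length, then bind column-wise
combined <- do.call(cbind, lapply(dfs, pad_rows, n = n_max))
print(combined)
```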

Adding a column with function values to Spark dataframes with SparkR

I am using SparkR on a project that includes R and Spark in its technology stack.
I have to create new columns with boolean values returned from validation functions. I can do this easily with Spark dataframes and a single expression like:
sdf1$result <- sdf1$value == sdf2$value
The problem is when I have to compare two dataframes of different lengths.
What is the best way to apply a function across the sdf1 and sdf2 dataframes and assign the result to a new column of sdf1? Let's suppose that I want to generate a column with the minimum length between sdf1 and sdf2.
If you have dataframes of different lengths, I logically assume that you have some column(s) that determines how to line up the values between the two dataframes. You will have to perform a join between the two dataframes on these columns (see SparkR::merge / SparkR::join) and then do your comparison operation to create your new column on the resulting dataframe.

How to multiply columns of same names belonging to different data.frame

I am having a problem. I have two data.frames with a lot of columns, and they are of different lengths: one has many rows and the other has only one row. Both data frames have columns with the same names. I want to multiply the matching columns with each other, but I can't work out how. Please help me.
The command
mapply("*", DataFrame1, DataFrame2)
should work if you want to multiply all columns and both data frames have them in the same order, since mapply pairs the columns by position. If the relevant columns are only a subset of all columns in the data frames, we first need to identify the columns that are present in both data frames.
common <- intersect(names(DataFrame1), names(DataFrame2))
mapply("*", DataFrame1[common], DataFrame2[common])
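A worked sketch of the subset case with made-up data (the values and the names common and product are illustrative only). Indexing both data frames with the same vector of names puts the shared columns in the same order, so the positional pairing done by mapply lines up by name, and the single row of the second data frame is recycled across all rows of the first.

```r
# Hypothetical data: DataFrame2 has one row and a different column order
DataFrame1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DataFrame2 <- data.frame(b = 10, a = 2)

# Columns present in both data frames
common <- intersect(names(DataFrame1), names(DataFrame2))

# mapply returns a matrix here; convert it back to a data frame
product <- as.data.frame(mapply("*", DataFrame1[common], DataFrame2[common]))
print(product)
```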
