Compare two dataframes to extract the new columns - r

I have two dataframes. As an example:
iris1<-iris[1:3]
iris2<-iris[1:4]
I want to extract the new column by comparing the two dataframes.
I have tried the compare function from the package of the same name, but with no luck; it seems that comparing rows is the more common use case. Is there an easy way to do this?

We can use setdiff on the column names:
setdiff(union(names(iris1), names(iris2)), names(iris1))
Or, if one of the datasets contains all the columns of the other plus some extra ones:
setdiff(names(iris2), names(iris1))
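To pull out the new column itself rather than just its name, the result of setdiff can be used to index the larger data frame. This is a minimal sketch using the iris1 and iris2 objects from the question; the names new_names and new_cols are only illustrative.

```r
# Rebuild the two example data frames from the question
iris1 <- iris[1:3]
iris2 <- iris[1:4]

# Names of columns that are in iris2 but not in iris1
new_names <- setdiff(names(iris2), names(iris1))

# Index by name; single-bracket list-style indexing keeps a data frame
new_cols <- iris2[new_names]
head(new_cols)
```

Indexing with `iris2[new_names]` rather than `iris2[, new_names]` avoids the result being dropped to a plain vector when only one column is new.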

Related

Merging dataframes in R is only merging half of the data?

I'm trying to merge two dataframes in R. When I combine them, the result has all the correct rows and columns, but the data itself doesn't carry over. I've tried full_join and merge() on the data.
All of the data is present in the original dataframes, and both dataframes share one column in common (which is the one I'm merging on). Is there a function that will let me merge the dataframes and their data, despite having only one column in common?

Is there an R function for merging multiple data frames vertically by rows?

I wanted to know if there's an R function that appends the rows of multiple data frames into a single data frame, vertically. The columns of my datasets are different and have different names, so I can't use rbind(). I have tried bind_rows() and smartbind() but still got the output in a horizontal fashion.
This is less obvious than I expected. If you try
rbind(unname(x), unname(y))
then R throws an error.
The easiest way I found was to convert the data.frames into matrices before binding them:
as.data.frame(rbind(as.matrix(x), as.matrix(y)))
The resulting data.frame has the same column names as the first data.frame given, in this case x. You still need the data.frames to have the same number of columns, however.
Outside of base R, the data.table package has a version of rbind that can ignore names when binding:
library(data.table)
rbind(as.data.table(x), as.data.table(y), use.names = FALSE)
However, this will return a data.table rather than a data.frame, so you'll probably want to convert it back afterwards.
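As a rough illustration of the matrix approach above, here is a sketch with two small hypothetical data frames, x and y, that have the same number of columns but different names. It also shows a second base R option, renaming y with setNames before calling rbind, which is not from the answer above but avoids the matrix conversion.

```r
# Hypothetical example data: same number of columns, different names
x <- data.frame(a = 1:2, b = 3:4)
y <- data.frame(c = 5:6, d = 7:8)

# Matrix approach: column names are taken from the first argument, x
res_matrix <- as.data.frame(rbind(as.matrix(x), as.matrix(y)))

# Alternative: give y the names of x, then bind as usual
res_rename <- rbind(x, setNames(y, names(x)))
```

One caveat with the matrix route: as.matrix coerces all columns to a single type, so data frames that mix numeric and character columns end up entirely character. The setNames variant keeps each column's type.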

Adding a column with function values to Spark dataframes with SparkR

I am using SparkR on a project that includes R and Spark in its technology stack.
I have to create new columns with boolean values returned from validation functions. I can do this easily with Spark dataframes and one expression like:
sdf1$result <- sdf1$value == sdf2$value
The problem arises when I have to compare two dataframes of different lengths.
What is the best way to apply a function to the sdf1 and sdf2 dataframes and assign the result to a new column of sdf1? Let's suppose that I want to generate a column with the minimum length between sdf1 and sdf2.
If you have dataframes of different lengths, I assume that you have some column(s) that determine how to line up the values between the two dataframes. You will have to join the two dataframes on these columns (see SparkR::merge / SparkR::join) and then do your comparison on the resulting dataframe to create the new column.
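The join-then-compare idea can be sketched with ordinary base R data frames, which run without a Spark session. This is only an analogy, not SparkR code: df1, df2, and the key column id are hypothetical, and with SparkR the join step would go through SparkR::merge or SparkR::join instead.

```r
# Hypothetical local data frames of different lengths, sharing a key column
df1 <- data.frame(id = 1:4, value = c(10, 20, 30, 40))
df2 <- data.frame(id = c(2L, 4L), value = c(20, 99))

# Join on the key; the suffixes distinguish the two value columns
merged <- merge(df1, df2, by = "id", suffixes = c(".1", ".2"))

# Comparison is now row-aligned, so it can become a new column
merged$result <- merged$value.1 == merged$value.2
```

An inner join keeps only the keys present in both inputs; passing all.x = TRUE to merge would keep every row of df1, with NA in result where df2 has no match.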

Split dataframe based on one column in r

I have a huge dataframe of around 1M rows and want to split the dataframe based on one column & different ranges.
Example dataframe:
length<-sample(1:400,100)
var1<-rnorm(100)
var2<-sample(rep(letters[1:25],4))
test<-data.frame(length,var1,var2)
I want to split the dataframe based on length at different ranges (ex: all rows for length between 1 and 50).
range_length<-list(1:50,51:100,101:150,151:200,201:250,251:300,301:350,351:400)
I can do this by subsetting the dataframe, ex: test1<-test[test$length>=1 & test$length<=50,]
But I am looking for a more efficient way using "split" (just one line)
range = seq(0,400,50)
split(test, cut(test$length, range))
But do heed Justin's suggestion and look into using data.table instead of data.frame. I'll also add that it's very unlikely that you actually need to split the data.frame/table.
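A short runnable sketch of the split/cut approach on data shaped like the question's example. The seed and the names breaks and parts are assumptions added for illustration.

```r
set.seed(1)  # assumed seed, only to make the example reproducible

# Example data in the shape described in the question
test <- data.frame(
  length = sample(1:400, 100),
  var1   = rnorm(100),
  var2   = sample(rep(letters[1:25], 4))
)

# cut() assigns each length to an interval (0,50], (50,100], ..., (350,400]
breaks <- seq(0, 400, 50)
parts  <- split(test, cut(test$length, breaks))

names(parts)       # one list element per interval
parts[["(0,50]"]]  # rows with length between 1 and 50
```

Because cut uses right-closed intervals by default, a length of exactly 50 falls into the first group and 51 into the second, which matches the ranges 1:50, 51:100, and so on.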

How to multiply columns of same names belonging to different data.frame

I have two data.frames with a lot of columns. They have different lengths: one has many rows and the other has only one row. Both data frames contain columns with the same names, and I want to multiply the matching columns with each other. I have not been able to solve it. Please help me.
The command
mapply("*", DataFrame1, DataFrame2)
should work if you want to multiply all columns; note that mapply pairs the columns by position, so both data frames need the same columns in the same order. If the relevant columns are only a subset of all columns in the data frames, we first need to identify the columns that are present in both data frames.
common <- intersect(names(DataFrame1), names(DataFrame2))
mapply("*", DataFrame1[common], DataFrame2[common])
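As a worked example of the intersect approach, here is a sketch with two small hypothetical data frames: one with several rows and one with a single row, sharing only some column names. The data and the names common and res are made up for illustration.

```r
# Hypothetical data: DataFrame1 has many rows, DataFrame2 has one row
DataFrame1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DataFrame2 <- data.frame(b = 10, a = 2, d = 5)

# Columns present in both, in the order they appear in DataFrame1
common <- intersect(names(DataFrame1), names(DataFrame2))

# Selecting by name aligns the columns before mapply pairs them by position
res <- mapply("*", DataFrame1[common], DataFrame2[common])

# mapply returns a matrix here; convert back if a data frame is needed
res_df <- as.data.frame(res)
```

Selecting both data frames with the same common vector is what makes this safe even though the columns appear in a different order in DataFrame2.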

Resources