How to merge two dataframes in R [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 6 years ago.
I have two data frames with some overlapping variables and some not. Each variable has an attribute (its frequency), and I need to combine the two into one data frame that contains the union of all the variables and two attribute columns: the first from the first data frame and the second from the second data frame.
dataframe 1:
var frequency
a 3
b 2
d 5
dataframe 2:
var frequency
a 2
b 3
c 3
Resulting dataframe:
var frequency1 frequency2
a 3 2
b 2 3
c 0 3
d 5 0
Thanks for your help.

This seems to work for me:
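df1 = read.csv('df1.csv')
df2 = read.csv('df2.csv')
# Copy each 'frequency' column under a distinct name...
df1$frequency1 = df1$frequency
df2$frequency2 = df2$frequency
# ...and drop the original, so the two data frames have no clashing columns
df1$frequency = NULL
df2$frequency = NULL
# Full outer join on 'var': all = TRUE keeps variables found in only one data frame
df = merge(df1, df2, by = 'var', all = TRUE)
print(df)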
The idea is that if you want frequency1 and frequency2 to be the names in the final merged dataframe, you can rename them in df1 and df2 before merging. This produces:
var frequency1 frequency2
1 a 3 2
2 b 2 3
3 d 5 NA
4 c NA 3
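The question's expected result shows 0 rather than NA for variables missing from one data frame. A minimal base R sketch that gets there directly, assuming the data are typed in rather than read from df1.csv/df2.csv: the suffixes argument of merge() renames the clashing frequency columns, so the manual renaming step is not needed, and the NAs are then replaced with 0.

```r
# Example data typed in from the question (assumption: the answer reads CSV files instead)
df1 <- data.frame(var = c("a", "b", "d"), frequency = c(3, 2, 5), stringsAsFactors = FALSE)
df2 <- data.frame(var = c("a", "b", "c"), frequency = c(2, 3, 3), stringsAsFactors = FALSE)

# Full outer join on 'var'; suffixes turn the clashing 'frequency' columns
# into 'frequency1' and 'frequency2'
df <- merge(df1, df2, by = "var", all = TRUE, suffixes = c("1", "2"))

# Variables missing from one data frame come back as NA; use 0 as the question asks
df[is.na(df)] <- 0
print(df)
```

With character keys, merge() sorts the rows by var, so c appears before d here, unlike the factor-based output above.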

Related

R, dataframe manipulation, sort [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 2 years ago.
I have the following two DFs, each with two columns (stringIDs and counts). Data looks like:
I'd like to transform this into one DF, sorted A to Z, containing all stringIDs from both DFs, the counts from DF1, and the counts from DF2. If a stringID does not exist in one of the DFs, the corresponding count should be 0. Is there a package in R that will allow me to do this transformation?
I have:
I'd like the data transformed to:
Try this. It is a merge task:
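#Data
df1 <- data.frame(stringid = paste0('string', 1:4), counts = c(10, 11, 11, 13), stringsAsFactors = FALSE)
df2 <- data.frame(stringid = paste0('string', c(1, 3:5)), counts = c(10, 11, 11, 10), stringsAsFactors = FALSE)
#Merge: all = TRUE keeps stringids present in only one DF, merge() sorts rows by stringid,
#and suffixes label which DF each counts column came from
dfmerged <- merge(df1, df2, by = 'stringid', all = TRUE, suffixes = c('_df1', '_df2'))
#Missing counts come back as NA; replace them with 0
dfmerged[is.na(dfmerged)] <- 0
dfmerged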
stringid counts_df1 counts_df2
1 string1 10 10
2 string2 11 0
3 string3 11 11
4 string4 13 11
5 string5 0 10

Reshaping dataframe to list values over unique id - back and forth [duplicate]

This question already has answers here:
Collapse text by group in data frame [duplicate]
(2 answers)
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 3 years ago.
I want to condense information in a dataframe to reduce the number of rows.
Consider the dataframe:
df <- data.frame(id=c("A","A","A","B","B","C","C","C"),b=c(4,5,6,1,2,7,8,9))
df
id b
1 A 4
2 A 5
3 A 6
4 B 1
5 B 2
6 C 7
7 C 8
8 C 9
I want to collapse the dataframe to all unique values of "id" and list the values in variable b. The result should look like
df.results <- data.frame(id=c("A","B","C"),b=c("4,5,6","1,2","7,8,9"))
df.results
id b
1 A 4,5,6
2 B 1,2
3 C 7,8,9
A solution for the first step is:
library(dplyr)
df.results <- df %>%
group_by(id) %>%
summarise(b = toString(b)) %>%
ungroup()
How would you turn df.results back into df?
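One base R sketch for the reverse step splits each string with strsplit() and repeats each id once per piece. The ",\\s*" pattern is an assumption chosen to accept both "4,5,6" (as typed in the question) and "4, 5, 6" (what toString() produces); df.back is a hypothetical name.

```r
# df.results as typed in the question
df.results <- data.frame(id = c("A", "B", "C"), b = c("4,5,6", "1,2", "7,8,9"),
                         stringsAsFactors = FALSE)

# Split each string on a comma plus any following spaces
pieces <- strsplit(df.results$b, ",\\s*")

# Repeat each id once per piece, then flatten the pieces into one numeric column
df.back <- data.frame(
  id = rep(df.results$id, lengths(pieces)),
  b  = as.numeric(unlist(pieces)),
  stringsAsFactors = FALSE
)
print(df.back)
```

If tidyr is installed, tidyr::separate_rows(df.results, b, convert = TRUE) should produce the same long format in one call.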

Delete Duplicates when Merging DF [duplicate]

This question already has answers here:
Select only the first row when merging data frames with multiple matches
(4 answers)
Closed 5 years ago.
I know, I know... another merging DF question. Please hear me out: I have searched SO for an answer to this but haven't found one.
I am merging two DFs, one smaller than the other, with a left merge to match up the longer DF to the smaller DF.
This works well except for one issue: rows get added to the left (smaller) DF when the right (longer) DF has duplicates.
An Example:
Row<-c("a","b","c","d","e")
Data<-(1:5)
df1<-data.frame(Row,Data)
Row2<-c("a","b","b","c","d","e","f","g","h")
Data2<-(1:9)
df2<-data.frame(Row2,Data2)
names(df2)<-c("Row","Data2")
DATA<-merge(x = df1, y = df2, by = "Row", all.x = TRUE)
>DATA
Row Data Data2
1 a 1 1
2 b 2 2
3 b 2 3
4 c 3 4
5 d 4 5
6 e 5 6
See the extra "b" row? That is what I want to get rid of. I want to keep the left DF very strictly: if there are 5 rows in DF1, I want there to be only 5 rows after the merge.
Like this...
Row Data Data2
1 a 1 1
2 b 2 2
3 c 3 4
4 d 4 5
5 e 5 6
Where it only takes the first match and moves on.
I realize merge() is only doing its job here, so is there another way to get my expected result? Or is there a post-merge modification I should make instead?
Thank you for your help and time.
Research:
How to join (merge) data frames (inner, outer, left, right)?
deleting duplicates
Merging two data frames with different sizes and missing values
We can use the duplicated function as follows:
DATA[!duplicated(DATA$Row),]
Row Data Data2
1 a 1 1
2 b 2 2
4 c 3 4
5 d 4 5
6 e 5 6
It's also possible to drop the duplicates from df2 before merging:
merge(x = df1, y = df2[!duplicated(df2$Row), ], by = "Row", all.x = TRUE)
# Row Data Data2
#1 a 1 1
#2 b 2 2
#3 c 3 4
#4 d 4 5
#5 e 5 6
Since you only want the first row and don't care which of the duplicate rows is kept, you can use this code before you merge:
Row2<-c("a","b","b","c","d","e","f","g","h")
Data2<-(1:9)
df2<-data.frame(Row2,Data2)
library(dplyr)
df2 %>%
group_by(Row2) %>%
slice(1)
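Another base R option, not taken from the answers above, is a lookup with match(): for each Row of df1 it returns the position of the first match in df2 (NA if there is none), so the result has exactly as many rows as df1. The data are the question's; stringsAsFactors = FALSE is added so the sketch behaves the same on older R versions.

```r
Row <- c("a", "b", "c", "d", "e")
df1 <- data.frame(Row, Data = 1:5, stringsAsFactors = FALSE)
df2 <- data.frame(Row = c("a", "b", "b", "c", "d", "e", "f", "g", "h"),
                  Data2 = 1:9, stringsAsFactors = FALSE)

# Index of the FIRST occurrence of each df1$Row in df2$Row (NA when absent)
idx <- match(df1$Row, df2$Row)

# Copy df1 and attach the matching Data2 values; row count and order stay as in df1
DATA <- df1
DATA$Data2 <- df2$Data2[idx]
print(DATA)
```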

How to associate the values of a column to another column of a different data frame [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 6 years ago.
I have two different data frames: one contains only an id column, and the other contains id and a vector n. I would like to associate the values of n with each id in the first data frame.
For example:
df1 <-data.frame(
id = c(1,1,1,2,2,3,3,3,3)
)
df2 <- data.frame(
id = c(1,2,3),
n = c(5,9,8)
)
I would like as output:
df1:
id n
1 5
1 5
1 5
2 9
2 9
3 8
3 8
3 8
3 8
df1 <- merge(df1, df2, by = c("id") )
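Note that merge() defaults to an inner join, so an id in df1 with no entry in df2 would silently lose its rows; all.x = TRUE keeps them with n = NA. A small sketch using the question's data plus a hypothetical id 4 that is missing from df2:

```r
# Question's data, plus a hypothetical id 4 that has no entry in df2
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4))
df2 <- data.frame(id = c(1, 2, 3), n = c(5, 9, 8))

inner <- merge(df1, df2, by = "id")                # drops the id-4 row
left  <- merge(df1, df2, by = "id", all.x = TRUE)  # keeps it, with n = NA
print(left)
```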

How can I merge two dataframes if two cols have to be the same? [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 6 years ago.
I have two data frames. For example, df1 looks like:
Name Month Number
1.H 1 8
2.H 2 7
3.H 3 6
4.A 1 9
5.A 2 10
6.A 3 11
And df2 looks like:
Name Month index
1.H 1 3
2.H 2 2
3.H 3 1
4.A 1 3
5.A 2 5
6.A 3 9
And I want to merge them into the following df:
Name Month Number index
1.H 1 8 3
2.H 2 7 2
3.H 3 6 1
4.A 1 9 3
5.A 2 10 5
6.A 3 11 9
How can I merge the two dfs into this df?
I have already tried merge() with by.x and by.y, but that only seems to allow merging by one column, and I also want to match on the second column.
You can merge on more than one column at a time:
merge(df1, df2, by = c('Name', 'Month'))
In fact, that should be the default, as the default value of by is intersect(names(df1), names(df2)).
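A quick check with the question's data, typed in as printed (assumption: Name values such as "1.H" may really be row numbers fused onto the names). It also shows that by.x and by.y accept a vector of column names, so they can match on two columns as well:

```r
df1 <- data.frame(Name = c("1.H", "2.H", "3.H", "4.A", "5.A", "6.A"),
                  Month = c(1, 2, 3, 1, 2, 3),
                  Number = c(8, 7, 6, 9, 10, 11),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = df1$Name,
                  Month = c(1, 2, 3, 1, 2, 3),
                  index = c(3, 2, 1, 3, 5, 9),
                  stringsAsFactors = FALSE)

# Rows are matched only when BOTH Name and Month agree
explicit <- merge(df1, df2, by = c("Name", "Month"))
# With no 'by', merge() uses intersect(names(df1), names(df2)): Name and Month
implicit <- merge(df1, df2)
# by.x / by.y also take vectors of column names
via_xy <- merge(df1, df2, by.x = c("Name", "Month"), by.y = c("Name", "Month"))
print(explicit)
```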
There are a lot of different ways to do this. The other two answers show base R ways to do it. Here are two other ways with packages.
You can also use the sqldf package:
library(sqldf)
# "index" is double-quoted because INDEX is an SQL keyword
sqldf('select a.*, b."index" from df1 as a join df2 as b on a.Name = b.Name and a.Month = b.Month')
You can use the dplyr package:
library(dplyr)
# Column names are case-sensitive, so they must match df1/df2 exactly
inner_join(df1, df2, by = c("Name", "Month"))
