Match data points from two datasets [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 5 months ago.
I've got two datasets with different numbers of observations that I need to merge. The first one looks like this:
state1 state2
1 2
2 3
3 4
4 1
The second one is like this:
state A state B distance
1 1 0
1 2 1
1 3 1
1 4 2
2 1 1
2 2 0
2 3 2
2 4 2
3 1 1
3 2 2
3 3 0
3 4 3
4 1 2
4 2 2
4 3 3
4 4 0
... ... ...
I would like to add the distance column from the second dataset to the first one, like this:
state1 state2 distance
1 2 1
2 3 2
3 4 3
4 1 2
Is there a way in R to add the distance information to the first dataset based on the second dataset? Thanks.

We can use dplyr::left_join
library(dplyr)
left_join(df1, df2, by = c('state1' = 'state A', 'state2' = 'state B'))
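For a fully reproducible check, here is a minimal base R sketch; the object names df1/df2 and the syntactic column names stateA/stateB are assumptions based on the printed data, not taken from the original post:
# assumed reconstruction of the two datasets shown above
df1 <- data.frame(state1 = c(1, 2, 3, 4),
                  state2 = c(2, 3, 4, 1))
df2 <- data.frame(stateA = rep(1:4, each = 4),
                  stateB = rep(1:4, times = 4),
                  distance = c(0, 1, 1, 2,
                               1, 0, 2, 2,
                               1, 2, 0, 3,
                               2, 2, 3, 0))
# base R equivalent of the left join above
merge(df1, df2, by.x = c("state1", "state2"),
      by.y = c("stateA", "stateB"), all.x = TRUE)
#   state1 state2 distance
# 1      1      2        1
# 2      2      3        2
# 3      3      4        3
# 4      4      1        2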

Related

Keep rows with duplicated id in R [duplicate]

This question already has answers here:
Filtering a dataframe showing only duplicates
(4 answers)
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
Closed 3 months ago.
I want to select rows with duplicated id but keep both rows in the resulting dataset. Here is the original dataset:
dd <- data.frame(id = c(1, 1, 2, 2, 3, 4, 4, 5, 6, 7, 7),
                 coder = c(1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2))
dd
id coder
1 1
1 2
2 1
2 2
3 1
4 1
4 2
5 1
6 1
7 1
7 2
In the end, I want this:
id coder
1 1
1 2
2 1
2 2
4 1
4 2
7 1
7 2
I tried subset(dd, duplicated(id)), but it only kept one row per id:
id coder
1 2
2 2
4 2
7 2
How can I achieve that?
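The thread was closed without an answer here; for reference, a minimal sketch of the usual approaches (not taken from this thread, the linked duplicates cover the same ground):
# keep every row whose id occurs more than once
dd[dd$id %in% dd$id[duplicated(dd$id)], ]

# equivalent: flag duplicates from both directions
dd[duplicated(dd$id) | duplicated(dd$id, fromLast = TRUE), ]

# dplyr version
library(dplyr)
dd %>% group_by(id) %>% filter(n() > 1) %>% ungroup()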

number similar/duplicated rows in R [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
How to convert three columns into single one
(2 answers)
Assign unique ID per multiple columns of data table
(2 answers)
Closed 4 years ago.
Hi, I'm using R and I have data like this:
1 2 3 4 5
1 2 1 2 2
3 4 1 2 3
1 2 3 4 5
3 4 1 2 3
I want to number identical rows with the same number; for the example above:
1 2 3 4 5 --> 1
1 2 1 2 2 --> 2
3 4 1 2 3 --> 3
1 2 3 4 5 --> 1
3 4 1 2 3 --> 3
Does anyone know how to do this in R (for both numeric and character columns)?
Your help is really appreciated!
This is your data:
df <- data.frame(a = c(1, 1, 3, 1, 3),
                 b = c(2, 2, 4, 2, 4),
                 c = c(3, 1, 1, 3, 1),
                 d = c(4, 2, 2, 4, 2),
                 e = c(5, 2, 3, 5, 3))
Approach 1:
You would need the data.table package for the approach below:
library(data.table)
i <- interaction(data.table(df), drop=TRUE)
df.out <- cbind(df, id=factor(i,labels=length(unique(i)):1))
This would give you the following:
# a b c d e id
#1 1 2 3 4 5 1
#2 1 2 1 2 2 3
#3 3 4 1 2 3 2
#4 1 2 3 4 5 1
#5 3 4 1 2 3 2
Approach 2:
Another approach is by using the plyr package, as follows:
library(plyr)
.id <- 0
df.out <- ddply(df, colnames(df), transform, id=(.id<<-.id+1))
This will give you the following output:
# a b c d e id
#1 1 2 1 2 2 1
#2 1 2 3 4 5 2
#3 1 2 3 4 5 2
#4 3 4 1 2 3 3
#5 3 4 1 2 3 3
Hope it helps.
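A base R alternative (not part of the original answer): paste each row into a single key and number rows by the first occurrence of that key. This gives the first-seen numbering the question asked for and works for numeric and character columns alike.
key <- do.call(paste, c(df, sep = "\r"))
df$id <- match(key, unique(key))
df
#   a b c d e id
# 1 1 2 3 4 5  1
# 2 1 2 1 2 2  2
# 3 3 4 1 2 3  3
# 4 1 2 3 4 5  1
# 5 3 4 1 2 3  3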

Merge Two data frames column by column. (creating a table out of two data frames) [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 5 years ago.
I have two data frames of this format
df1 = data.frame(Date = c(1,2,3,4,5), customer1 = c(6,7,8,4,2),
                 customer2 = c(2,3,6,5,3), ...)
df2 = data.frame(Date = c(1,2,3,4,5), order1 = c(0,1,3,0,1),
                 order2 = c(0,1,0,0,2), ...)
I want a result that interleaves the two data frames, along with the Date column:
Date Customer1 Order1 Date Customer2 Order2 Date ....
1 6 0 1 2 0 1
2 7 1 2 3 1 2
3 8 3 3 6 0 3
4 4 0 4 5 0 4
5 2 1 5 3 2 5
I used a for loop running over the number of columns and cbind to achieve the desired result. I wanted to know if there are simpler, more efficient ways to do it.
We can cbind the two data frames and then build a vector of column names in the interleaved order: concatenate the non-Date names of both data frames, reorder them by their original column positions so that each customerN is followed by its orderN, reshape them into a matrix with one column per pair, rbind 'Date' on top, and use the flattened result to index the combined data set (cbind(df1, df2)).
# interleave the non-Date names: customer1, order1, customer2, order2
nm <- c(names(df1)[-1], names(df2)[-1])[order(c(seq_along(df1)[-1], seq_along(df2)[-1]))]
cbind(df1, df2)[c(rbind('Date', matrix(nm, ncol = 2)))]
# Date customer1 order1 Date.1 customer2 order2
#1 1 6 0 1 2 0
#2 2 7 1 2 3 1
#3 3 8 3 3 6 0
#4 4 4 0 4 5 0
#5 5 2 1 5 3 2
NOTE: It is better to have unique column names in the dataset
The following solution does not keep the interleaved column order, and Date appears only once. I don't see much reason to want either, so I am leaving this solution as well.
It assumes Date is unique; if it is not, don't use this approach.
merge(df1, df2, by = "Date")
# Date customer1 customer2 order1 order2
# 1 1 6 2 0 0
# 2 2 7 3 1 1
# 3 3 8 6 3 0
# 4 4 4 5 0 0
# 5 5 2 3 1 2
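Combining the two ideas, a sketch (column names taken from the question; adjust for the real data) that merges on Date once and then restores the interleaved customer/order order:
m <- merge(df1, df2, by = "Date")
pairs <- seq_len(ncol(df1) - 1)  # one entry per customer/order pair
m[, c("Date", c(rbind(paste0("customer", pairs), paste0("order", pairs))))]
#   Date customer1 order1 customer2 order2
# 1    1         6      0         2      0
# 2    2         7      1         3      1
# 3    3         8      3         6      0
# 4    4         4      0         5      0
# 5    5         2      1         3      2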

Find minimal value for multiple identical keys in a table [duplicate]

This question already has answers here:
Extract row corresponding to minimum value of a variable by group
(9 answers)
Closed 5 years ago.
I have a table which contains multiple rows of data for each key, where the key is made up of multiple columns.
The table looks like this:
A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2
I have already discovered how to remove all of the duplicate rows using the unique command over multiple columns, so data duplication is not a problem.
I would like to know how, for every key (columns A and B in the example), to keep only the row with the minimum value in the third column (column C).
At the end the table should look like this:
A B C
1 1 1 2
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
Thanks for any help, it is really appreciated.
If you have any questions, feel free to ask.
con <- textConnection(" A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2")
df <- read.table(con, header = T)
df <- df[with(df, order(A, B, C)), ]  # sort so the smallest C comes first within each (A, B) key
df[!duplicated(df[1:2]), ]
#   A B C
# 1 1 1 2
# 4 1 2 4
# 3 2 1 4
# 5 2 2 3
# 6 2 3 1
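A couple of alternatives (not in the original answer) that give the same minimum-per-key result; the row order may differ:
# base R: minimum of C within each (A, B) key
aggregate(C ~ A + B, data = df, FUN = min)

# dplyr (>= 1.0.0): keep the row with the smallest C per key
library(dplyr)
df %>% group_by(A, B) %>% slice_min(C, with_ties = FALSE) %>% ungroup()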

merge data by groups and by common ID (IDs duplicated outside groups) [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 6 years ago.
This is not a duplicate of How to join (merge) data frames. You can perform the left merge inside each group but not on the whole data set: the ids are unique within a group, not across groups. Merging without grouping will mess up the data.
I have data with many groups (panel data / time series). Within each group, I want to merge the data by a common ID, and apply the same merge across all the groups I have.
#sample data
a <- data.frame(c(1:4, 1:4), 1, c('a','a','a','a','b','b','b','b'))
b <- data.frame(c(2, 4, 2, 4), 10, c('a','a','b','b'))
colnames(a) <- c('id', 'v', 'group')
colnames(b) <- c('id', 'v1', 'group')
> a
id v group
1 1 1 a
2 2 1 a
3 3 1 a
4 4 1 a
5 1 1 b
6 2 1 b
7 3 1 b
8 4 1 b
> b
id v1 group
1 2 10 a
2 4 10 a
3 2 10 b
4 4 10 b
I tried to use dplyr group_by(group) and then merge(a, b, by='id', all.x=TRUE), but I am not sure how to apply dplyr grouping to two data sets.
desired output (left merge)
id v group.x v1 group.y
1 1 a NA <NA>
2 1 a 10 a
3 1 a NA <NA>
4 1 a 10 a
1 1 b NA <NA>
2 1 b 10 b
3 1 b NA <NA>
4 1 b 10 b
You can just include group in the by argument for the join:
a %>% left_join(b, by=c("id","group"))
id v group v1
1 1 1 a NA
2 2 1 a 10
3 3 1 a NA
4 4 1 a 10
5 1 1 b NA
6 2 1 b 10
7 3 1 b NA
8 4 1 b 10
This gives you only one "group" column, but v1 will be NA for cases where there's no matching row in b, so creating two separate "group" columns is redundant. Isn't that better, given that group (presumably) represents the same underlying division of the data in both data frames?
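For completeness, a base R equivalent (not part of the original answer); note that merge() re-sorts the result by the key columns, unlike left_join():
merge(a, b, by = c("id", "group"), all.x = TRUE)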
