Problems when trying to join multiple dataframes in R


I have three data frames, df1, df2 and df3, with the same number of columns and rows, in the same order. Their column names are exactly the same except for the last three columns (41:43), which are specific to each df (e.g. col41df1, col42df1, col43df1 ... col41df2, col42df2, col43df2 ... col41df3, col42df3, col43df3).
I wanted to join the three data frames so that the columns specific to each would be appended at the end, giving a data frame with 49 columns rather than 43. I managed that with:
library(dplyr)
df_merged <- df1 %>%
  left_join(df2) %>%
  left_join(df3)
However, something goes wrong during the join because df_merged appears to have 6 NA values while none of the original data frames I joined had any.
Help please?
Thanks!

Since the rows are in the same order across all 3 dataframes, there's no need to use a join. Instead, simply grab the 3 columns you want from the second and third dataframes and attach them to the first, as such:
df_merged <- cbind(df1, df2[, 41:43], df3[, 41:43])
Here is an example:
df1 <- data.frame(id=c(1,2,3), value=c(5,10,25))
df2 <- data.frame(id=c(1,2,3), value=c(3,6,9), morevalues=c(4,5,9))
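# data.frame() is base R, so no extra package is needed here;
# the duplicated column name "value" from df2 is made unique automatically
merged_df <- data.frame(df1, df2[, 2:3])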
merged_df
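
One possible reason for the NAs, offered as a guess since the original data isn't shown: left_join() without a by argument matches on every column the two frames share, so a single shared value that differs between frames leaves that row without a partner. A minimal base-R sketch with made-up column names (col_df1, col_df2), using merge(..., all.x = TRUE) as a stand-in for left_join():

```r
# Toy data: two shared columns (id, x) plus one frame-specific column each.
# All names and values here are hypothetical.
df1 <- data.frame(id = 1:3, x = c(0.1, 0.2, 0.3), col_df1 = c(10, 20, 30))
df2 <- data.frame(id = 1:3, x = c(0.1, 0.2, 0.3), col_df2 = c(11, 21, 31))

# Simulate one shared value that differs between the two frames.
df2$x[3] <- 0.31

# Left join on all shared columns: row 3 finds no partner, so col_df2 is NA.
joined <- merge(df1, df2, by = c("id", "x"), all.x = TRUE)

# Binding by position ignores the shared values and keeps every row intact.
bound <- cbind(df1, df2["col_df2"])

sum(is.na(joined$col_df2))  # NAs introduced by the join
sum(is.na(bound$col_df2))   # NAs introduced by cbind
```

If that is what is happening, comparing the shared columns of the frames (for example with all.equal(df1[, 1:40], df2[, 1:40])) should show where they differ.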

Related

merge datasets with conditions in r

I have a data set which looks like the following:
'X19' holds row numbers of another data set. How can I merge these two data sets so that 'FNUMM' is added to each row whose number appears in 'X19'?
Thanks.

This is a merge where one of the keys is the rownames of one dataset. You can do this:
cbind(df1, df2[, "FNUMM"][match(rownames(df1), df2$X19)])
Here is a reproducible example
df1 <- data.frame(ID=c(1L,1L,1L,1L,2L,2L,3L,3L),
var=c(1:8), Smoke=c('No','No','Yes','No','No','No','Yes','No'))
df2 <- data.frame(X19=c(2,5,8), FNUMM=c('a','b','c'))
cbind(df1, df2[, "FNUMM"][match(rownames(df1), df2$X19)])

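If both data frames have an 'X19' column, try merge(df1, df2, by = 'X19'), where df1 and df2 are your two data frames. If the key is the row names of one of them, merge() also accepts by.x = "row.names" together with by.y = 'X19'.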
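
To get a properly named column instead of the auto-generated one from cbind(), the same match() lookup can be assigned directly. A small sketch on made-up data, assuming X19 holds row positions of df1:

```r
# Hypothetical data: df2$X19 points at row positions in df1.
df1 <- data.frame(ID = c(1L, 1L, 2L, 2L, 3L), var = 1:5)
df2 <- data.frame(X19 = c(2, 5), FNUMM = c("a", "b"),
                  stringsAsFactors = FALSE)

# For each row position of df1, find its index in df2$X19 (NA if absent),
# then use that index to pull the matching FNUMM value.
df1$FNUMM <- df2$FNUMM[match(seq_len(nrow(df1)), df2$X19)]
df1
```

Rows whose position is not listed in X19 get NA in the new column.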

Merge data frame based on column names in r

I have 4 data frames all with the same number of columns and identical column names.
The order of the columns is different.
I want to combine all 4 data frames together and match them with the column name.

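Working in Azure ML, this was the best option I found to automate this merge.
library(data.table)  # the := syntax and rbind(use.names=TRUE) below need data.table
df <- as.data.table(maml.mapInputPort(1))
df2 <- as.data.table(maml.mapInputPort(2))
# add to each table, filled with NA, the columns that only the other one has
if (length(df2.toAdd <- setdiff(names(df), names(df2))))
  df2[, (df2.toAdd) := NA]
if (length(df.toAdd <- setdiff(names(df2), names(df))))
  df[, (df.toAdd) := NA]
# stack the rows, matching columns by name rather than by position
df3 <- rbind(df, df2, use.names=TRUE)
maml.mapOutputPort("df3");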

Suppose your 4 data frames are named df1, df2, df3 and df4. Since the number of columns and the column names are identical, why not this:
cl <- sort(colnames(df1))
mrg <- rbind(df1[,cl], df2[,cl], df3[,cl], df4[,cl])
If you want to have them in a specific order of columns, for example the order of columns in df2, then you can do this:
mrg <- mrg[,colnames(df2)]
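
A sketch of the same idea for any number of data frames, assuming they are collected in a list (frames is a hypothetical name) and all share the same column names:

```r
# Two toy frames with identical column names in a different order.
df1 <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
df2 <- data.frame(b = c("z", "w"), a = 3:4, stringsAsFactors = FALSE)
frames <- list(df1, df2)

# Use the column order of the first frame as the reference order.
cols <- colnames(frames[[1]])

# Reorder every frame to that order, then stack them row-wise.
combined <- do.call(rbind, lapply(frames, function(d) d[, cols]))
combined
```

Reordering explicitly before rbind() keeps the result predictable even if the list later holds objects whose rbind method matches by position.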

Select columns matching names in a list

I have a data.frame
DF1
a.x.c b.y.l c.z.n d.a.pl f.e.cl
which consists of numeric columns
I also have a list
DF2
a.x.c c.z.n f.e.cl
which contains the names of certain columns in DF1.
I need to create DF3 that would store only those columns of DF1 which have matching names in DF2.
I have tried which() to find the indexes of the columns I need, but the list of column names is long, so which() becomes impractical.
Could you please help? Thank you in advance.

We can use intersect to get the names that are common in both the datasets and use that to subset the columns of 'DF1' to create 'DF3'.
DF3 <- DF1[intersect(names(DF1),names(DF2))]
DF3
# a.x.c c.z.n
#1 1 7
#2 2 8
#3 3 9
data
DF1 <- data.frame(a.x.c = 1:3, b.y.l= 4:6, c.z.n=7:9)
DF2 <- list(a.x.c= 1:5, c.z.n=8:15, z.l.y=22:29)

How to substitute some values of one dataframe with other data frame in R?

I have two large data frames with the same columns and rows, and I need to replace the NAs in the first with the values from the second. For example, assume data frame "DF1" is
DF1 <- data.frame(a=c(1,NA,3), b=c(4,NA,6))
and "DF2" is
DF2 <- data.frame(a=c(NA,2,NA), b=c(3,5,6))
Where there is an NA in "DF1", I want to substitute the value from "DF2" and create a new "DF3", i.e.
a b
1 4
2 5
3 6
Could you help me with this please?

This should do the trick:
DF3 <- DF1
replace.bool.matrix <- is.na(DF1)
DF3[replace.bool.matrix] <- DF2[replace.bool.matrix]
Explanation:
We create DF3, which is a copy of DF1. Then we make a logical matrix replace.bool.matrix which we use to select the values in DF3 to replace, as well as the values in DF2 to replace them with.
This relies on indexing a data frame with a logical matrix, which selects only the cells where the matrix is TRUE.
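
Applied to the sample data from the question (with the second frame named DF2), the steps above look like this; na_mask is just a shorter, made-up name for the logical matrix:

```r
# Sample data from the question.
DF1 <- data.frame(a = c(1, NA, 3), b = c(4, NA, 6))
DF2 <- data.frame(a = c(NA, 2, NA), b = c(3, 5, 6))

# Start from a copy of DF1 and mark the cells that are NA.
DF3 <- DF1
na_mask <- is.na(DF1)

# Fill only those cells with the values at the same positions in DF2.
DF3[na_mask] <- DF2[na_mask]
DF3
```

Cells that are NA in both frames stay NA, since the replacement value is itself NA.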

This is much easier with the match() function:
df1 <- data.frame(steps=c(NA,NA,NA,NA,NA,NA,NA,NA),date=c('2012-10-01','2012-10-01','2012-10-01','2012-10-01','2012-10-01','2012-10-01','2012-10-02','2012-10-02'), interval=c(0,5,10,15,20,25,0,5))
df2 <- data.frame(Interval=c(0,5,10,15,20,25),x=c(1.716,0.339,0.132,0.151,0.075,2.094))
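# if() needs a single TRUE/FALSE, so index the NA rows instead of testing the whole vector
na_rows <- is.na(df1$steps)
df1$steps[na_rows] <- df2$x[match(df1$interval[na_rows], df2$Interval)]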

creating new dataframe with matching ids in two different table that do not match

I am trying to merge two data frames by id. I want to first merge all the ids that match and then find those that don't. I found the merge function, which can merge the common ids. For example:
m1 = merge(df1, df2, by=c("id"))
Now I am trying to create a new dataframe with ids of dataframe 2 that do not match dataframe 1.
Could you please advise me which command should I look for?
For example:
I have the following two datasets:
df1
df2
I am trying to create a new data frame with the ids from df2 that are not in df1, for example id = "a3" and "c3" in df2.
my sample data:
df1 =data.frame(id= c("a1","a2","b1","b2","c1","c2"), value= 1:6)
df2 =data.frame(id= c("a1","a2","a3","b1","c1","c3"), value= 7:12)
Many thanks, Ayan

If you want to use merge, here is one way to do it:
df_merged <- merge(df2, df1, by.x="id", by.y="id", all.x=TRUE)
df_merged[is.na(df_merged$value.y),]
id value.x value.y
3 a3 9 NA
6 c3 12 NA
Since the column names are identical in both data.frames and merge joins on all common column names by default, you have to tell the function explicitly which column to use, here id.
But you should ask yourself whether you really want to merge here. If you just want those rows in df2 that are not in df1, why not use something like this?
df2[!(df2$id %in% df1$id), ]
id value
3 a3 9
6 c3 12
