Merge datasets with conditions in R

I have a data set with a column 'X19' that holds row numbers of another data set. How can I merge these two data sets so that 'FNUMM' is added to each row whose number appears in 'X19'?
Thanks.

This is a merge where one of the keys is the row names of one data set. You can use match() to look up each row name of df1 in df2$X19:
cbind(df1, FNUMM = df2$FNUMM[match(rownames(df1), df2$X19)])
Here is a reproducible example:
df1 <- data.frame(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L),
                  var = 1:8,
                  Smoke = c('No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No'))
df2 <- data.frame(X19 = c(2, 5, 8), FNUMM = c('a', 'b', 'c'))
# rows 2, 5 and 8 of df1 get 'a', 'b' and 'c'; every other row gets NA
cbind(df1, FNUMM = df2$FNUMM[match(rownames(df1), df2$X19)])

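Try merge(df1, df2, by.x = "row.names", by.y = "X19", all.x = TRUE), where df1 and df2 are your two data frames. Because 'X19' holds row numbers of df1 rather than the name of a column that df1 has, by = 'X19' on its own fails; the special name "row.names" tells merge to use df1's row names as the key.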
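Since df1 has no column called X19, a runnable sketch of the merge keys df1 on its row names, which merge() accepts through the special name "row.names" (or the number 0). It reuses the reproducible data from the first answer; all.x = TRUE keeps the rows of df1 that have no entry in X19.

```r
# Reproducible data from the first answer
df1 <- data.frame(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L),
                  var = 1:8,
                  Smoke = c("No", "No", "Yes", "No", "No", "No", "Yes", "No"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(X19 = c(2, 5, 8), FNUMM = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# "row.names" as by.x means: use df1's row names as the key column.
# all.x = TRUE keeps unmatched rows of df1, with FNUMM set to NA.
res <- merge(df1, df2, by.x = "row.names", by.y = "X19", all.x = TRUE)
res
```

The key comes back as a character column named Row.names, so the result is sorted as text: with ten or more rows, "10" sorts before "2". If the original row order matters, reorder with res[order(as.integer(res$Row.names)), ].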

Related

Reshape a data.frame in R

How can I reshape this data.frame so that I get a new column with the cell lines and one column for each gene, without changing the rest?
Assuming your data frame is called df:
df2 <- as.data.frame(t(df))[-1,]
colnames(df2) <- df$Geneid
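Calling t() on the whole data frame turns every value into character, because the Geneid column is character. A sketch that keeps the values numeric and adds the cell-line column the question asks for, assuming a hypothetical layout with one row per gene (Geneid) and one numeric column per cell line (lineA and lineB are made-up names):

```r
# Hypothetical input: one row per gene, one numeric column per cell line
df <- data.frame(Geneid = c("g1", "g2"),
                 lineA = c(1.5, 2.0),
                 lineB = c(3.1, 0.4),
                 stringsAsFactors = FALSE)

# Transpose only the numeric columns so the values stay numeric
m <- t(as.matrix(df[, -1]))
colnames(m) <- df$Geneid

# Cell-line names (now the matrix row names) become a regular column
df2 <- data.frame(CellLine = rownames(m), m,
                  row.names = NULL, stringsAsFactors = FALSE)
df2
```

If the real data has other non-numeric columns besides Geneid, they would need to be dropped from df[, -1] before the transpose.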

Problems when trying to join multiple dataframes in R

I have three data frames, df1, df2 and df3, with the same number of rows and columns, in the same order. Their column names are identical except for the last three columns (41:43), which are specific to each data frame (e.g. col41df1, col42df1, col43df1 ... col41df2, col42df2, col43df2 ... col41df3, col42df3, col43df3).
I wanted to join the three data frames so that the columns specific to each are appended at the end, giving a data frame with 49 columns rather than 43. I managed that with:
library(dplyr)
df_merged <- df1 %>%
  left_join(df2) %>%
  left_join(df3)
However, something goes wrong during the join: df_merged has 6 NA values, while none of the original data frames had any.
Help please?
Thanks!
Since the rows are in the same order across all three data frames, there's no need for a join. Instead, take the three specific columns from the second and third data frames and attach them to the first:
df_merged <- cbind(df1, df2[, 41:43], df3[, 41:43])
Here is an example:
df1 <- data.frame(id=c(1,2,3), value=c(5,10,25))
df2 <- data.frame(id=c(1,2,3), value=c(3,6,9), morevalues=c(4,5,9))
# data.frame() renames the duplicated 'value' column to 'value.1'
merged_df <- data.frame(df1, df2[, 2:3])
merged_df
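A sketch of the cbind() approach that avoids hard-coding column positions, assuming the frames share a key column (a hypothetical id here) that can confirm the rows really are aligned before binding. One possible cause of the NAs in the original left_join() (not confirmed in the thread): without a by argument it joins on every shared column, so any row whose shared values differ at all between frames finds no match and gets NA in the new columns.

```r
# Hypothetical frames: shared columns id and value, plus one frame-specific column each
df1 <- data.frame(id = 1:3, value = c(5, 10, 25), extra1 = c(1, 2, 3))
df2 <- data.frame(id = 1:3, value = c(5, 10, 25), extra2 = c(4, 5, 9))
df3 <- data.frame(id = 1:3, value = c(5, 10, 25), extra3 = c(7, 8, 6))

# Names of the columns in `other` that `base` does not already have
new_cols <- function(base, other) setdiff(names(other), names(base))

# cbind() assumes the rows line up, so check the shared key first
stopifnot(identical(df1$id, df2$id), identical(df1$id, df3$id))

merged <- cbind(df1,
                df2[, new_cols(df1, df2), drop = FALSE],
                df3[, new_cols(df1, df3), drop = FALSE])
merged
```

drop = FALSE keeps each selection a data frame even when only one new column is picked, so the column name survives the bind.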

Merge data frames based on column names in R

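I have 4 data frames, all with the same number of columns and identical column names.
The order of the columns is different.
I want to combine all 4 data frames and match their columns by name.
Working in Azure ML, this was the best option I found to automate the merge:
# := and rbind(use.names = TRUE) are data.table features, so the inputs must be
# data.tables (assumes data.table is available in the Azure ML R environment)
library(data.table)
df <- as.data.table(maml.mapInputPort(1))
df2 <- as.data.table(maml.mapInputPort(2))
# add any column missing from one table as NA so both share the same names
if (length(df2.toAdd <- setdiff(names(df), names(df2))))
  df2[, c(df2.toAdd) := NA]
if (length(df.toAdd <- setdiff(names(df2), names(df))))
  df[, c(df.toAdd) := NA]
# stack the rows, matching columns by name
df3 <- rbind(df, df2, use.names = TRUE)
maml.mapOutputPort("df3");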
Suppose your 4 data frames are named df1, df2, df3 and df4. Since the number of columns and the column names are identical, why not sort the columns by name and stack them:
cl <- sort(colnames(df1))
mrg <- rbind(df1[,cl], df2[,cl], df3[,cl], df4[,cl])
If you want the columns in a specific order, for example the column order of df2, you can then do:
mrg <- mrg[,colnames(df2)]
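The sorting step is a safe habit, but the data.frame method of rbind() already matches columns by name rather than by position, taking the column order from the first argument. A minimal sketch with four hypothetical frames whose columns are in different orders:

```r
# Hypothetical frames with identical column names in different orders
df1 <- data.frame(a = 1, b = "x", c = TRUE,  stringsAsFactors = FALSE)
df2 <- data.frame(c = FALSE, a = 2, b = "y", stringsAsFactors = FALSE)
df3 <- data.frame(b = "z", c = TRUE, a = 3,  stringsAsFactors = FALSE)
df4 <- data.frame(a = 4, c = FALSE, b = "w", stringsAsFactors = FALSE)

# rbind() on data frames matches columns by name; the result uses df1's order
mrg <- rbind(df1, df2, df3, df4)

# Reorder to another frame's column order if needed
mrg2 <- mrg[, colnames(df2)]
mrg2
```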

Extract list of non-matches in R

So I have two dataframes, and both have one column that represents an ID number linked to a DNA sequence, and another column has the DNA sequence. My two dataframes are either the raw data, or data that have been filtered to only include a subset of the raw data. What I'm now interested in doing is generating a .csv of all the sequences in the raw dataframe that don't have a match to the stuff in the filtered dataframe.
So as an example of the goal, I'll define a couple dataframes here with two columns (col1 and col2):
col1a<-c(1,2,3,4,5,6)
col2a<-c("a","t","a","t","a","g")
col1b<-c(1,3,5,6)
col2b<-c("a","a","a","g")
df1<-data.frame(col1a,col2a)
df2<-data.frame(col1b,col2b)
My desired output is this third data frame (df3):
col1c <- c(2,4)
col2c <- c("t","t")
df3 <- data.frame(col1c,col2c)
I know I can use %in%. I can get this far:
IN <- sum(df1$col1a %in% df2$col1b) #Output = 4
NOTIN <- sum(!df1$col1a %in% df2$col1b) #Output = 2
So now I'm looking for a way to export the rows referred to from "NOTIN" such that they can be written as a table. I want to generate the example dataframe I called df3 earlier, as my output.
Any help or suggestions are much appreciated :)
If df1 contains all the entries in df2, it's as simple as
df1[!df1$col1a %in% df2$col1b, ]
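The question's end goal is a .csv of the unmatched sequences. A base-R sketch building on the %in% answer with the question's example data; the output path is a temporary file standing in for whatever path you actually want, and the renaming to col1c/col2c only reproduces the df3 shown in the question:

```r
# Example data from the question
col1a <- c(1, 2, 3, 4, 5, 6)
col2a <- c("a", "t", "a", "t", "a", "g")
col1b <- c(1, 3, 5, 6)
col2b <- c("a", "a", "a", "g")
df1 <- data.frame(col1a, col2a, stringsAsFactors = FALSE)
df2 <- data.frame(col1b, col2b, stringsAsFactors = FALSE)

# Keep the rows of df1 whose ID has no match in df2
df3 <- df1[!df1$col1a %in% df2$col1b, ]
names(df3) <- c("col1c", "col2c")

# Write the unmatched rows to CSV (placeholder path)
out <- tempfile(fileext = ".csv")
write.csv(df3, out, row.names = FALSE)
```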
You can use an anti_join:
library(dplyr)
anti_join(df1, df2, by = c("col1a" = "col1b"))
You can do this in data.table as well:
library(data.table)
df1 <- data.table(df1, key = "col1a")
df2 <- data.table(df2, key = "col1b")
df1[!df2]
Since data.table 1.9.6 (1.9.5 was its development version on GitHub), you can use the on = syntax instead of setting a key:
df1[!df2, on = c(col1a = "col1b")]

Creating a new data frame with the IDs of one table that do not match another

I am trying to merge two data frames by ID: first all the IDs that match, and then the ones that don't. I found that merge() handles the common IDs, for example:
m1 = merge(df1, df2, by=c("id"))
Now I am trying to create a new dataframe with ids of dataframe 2 that do not match dataframe 1.
Could you please advise me which command should I look for?
For example, given the two data sets df1 and df2 below, I am trying to create a new data frame with the IDs from df2 that are not in df1, here id = "a3" and "c3".
My sample data:
df1 =data.frame(id= c("a1","a2","b1","b2","c1","c2"), value= 1:6)
df2 =data.frame(id= c("a1","a2","a3","b1","c1","c3"), value= 7:12)
Many thanks, Ayan
If you want to use merge, here is one way to do it:
df_merged <- merge(df2, df1, by.x="id", by.y="id", all.x=TRUE)
df_merged[is.na(df_merged$value.y),]
id value.x value.y
3 a3 9 NA
6 c3 12 NA
Since both data frames have identical column names (id and value) and merge joins on all common column names by default, you have to tell it explicitly which column to use, here id.
But ask yourself whether you really need a merge here. If you just want the rows of df2 that are not in df1, why not use something like this:
df2[!(df2$id %in% df1$id), ]
id value
3 a3 9
6 c3 12
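The question asks for both steps, the matched IDs first and then the unmatched ones. A base-R sketch doing both with the sample data, merging on id only so the shared value column is not treated as a key:

```r
# Sample data from the question
df1 <- data.frame(id = c("a1", "a2", "b1", "b2", "c1", "c2"), value = 1:6,
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c("a1", "a2", "a3", "b1", "c1", "c3"), value = 7:12,
                  stringsAsFactors = FALSE)

# Step 1: inner join on id only; value columns get .x / .y suffixes
matched <- merge(df1, df2, by = "id")

# Step 2: rows of df2 whose id never appears in df1
unmatched <- df2[!(df2$id %in% df1$id), ]

matched
unmatched
```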
