Can I use join with across from dplyr? - r

I do not want to use all the variables in the data frame. I was thinking of something like this, but it throws an error:
df1 %>%
full_join(df2, by = 'DATE':'Vz')
Both data frames contain the same variables from DATE to Vz. I am interested in bringing the non-zero values of df2 into df1.
Thank you.

You can join by multiple columns with dplyr. Let me know if this answers your question:
library(dplyr)
full_join(df1, df2,
by = colnames(df1)[which(colnames(df1) == "DATE"):which(colnames(df1) == "Vz")])
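As a minimal sketch of the same idea on toy data: the column values and the extra1/extra2 columns below are hypothetical, and only the names DATE and Vz come from the question. The key columns are looked up by position with match(), so every column between DATE and Vz becomes a join key.

```r
library(dplyr)

# Hypothetical toy data: both frames share the columns DATE..Vz.
df1 <- data.frame(DATE = c("2020-01-01", "2020-01-02"),
                  Vx = c(1, 2), Vz = c(0, 3), extra1 = c("a", "b"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DATE = c("2020-01-02", "2020-01-03"),
                  Vx = c(2, 5), Vz = c(3, 7), extra2 = c("c", "d"),
                  stringsAsFactors = FALSE)

# Names of every column from DATE through Vz, by position in df1.
first_key <- match("DATE", names(df1))
last_key  <- match("Vz", names(df1))
key_cols  <- names(df1)[first_key:last_key]

# Join on all of those columns at once.
joined <- full_join(df1, df2, by = key_cols)
```

Note that this relies on the key columns being contiguous in df1; if they are not, listing the names explicitly in by is safer.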

Related

Adding a column of a dataframe to another dataframe if they match in another column

For a university project, I'm working with large stock price data frames.
I have two data frames.
Data frame df1 contains the daily close prices over a certain period. The header contains each stock's shortcut.
Data frame df2 contains the stock's shortcut in the first column and the industry name of the stock's firm in the second column. It is IMPORTANT to know that df2 contains more values than df1 (but every value in df1 should also be in df2).
Is there a way to insert the second column of df2 into the first row of df1 where they match (that is, where a value from the df1 header equals a value in the first column of df2)?
# Example Code
df1=as.data.frame(matrix(runif(20,min=0,max=1), nrow = 4))
df1
df2 <- as.data.frame(c("V1","V829","V2","V3","V493","V4","V5","V6","V992","V7"))
df2$insert <- c("test1","test2","test3","test4","test5","test6","test7","test8","test9","test10")
names(df2) <- c("Column2","test")
df1
df2
# Now insert/combine df2$test in (or over) df1[1,] as a row, if names(df1) and df2$Column2 matches
(Image: DataFrame df1)
(Image: DataFrame df2)
Thank you for your answers guys!
Nino
I would recommend you reshape your df1 into long format (see Reshaping data.frame from wide to long format).
library(tidyr)
df1_long <- df1 %>% gather(Instrument, value, -X)
I would organize the data this way because it makes it easier to use left_join() to match the data frames (see the description of mutating joins on the data wrangling cheat sheet).
library(dplyr)
df <- left_join(df1_long, df2, by = "Instrument")
If you want you can then make your dataframe wide again using the spread() function, which is the reverse of gather().
For the future, I recommend that you provide a reproducible example rather than linking to image files of your data frames: the links might expire, and images generally make it less likely that you will get an answer on Stack Overflow.
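The reshape-then-join approach can be sketched on the example data from the question. This is only an illustration under some assumptions: the toy df1 has no X column, so every column is gathered, and the df2 key column is called Column2 rather than Instrument, so the join names both sides explicitly. The set.seed() call is added only to make the random values repeatable.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: added only for repeatable random values
df1 <- as.data.frame(matrix(runif(20, min = 0, max = 1), nrow = 4))  # columns V1..V5
df2 <- data.frame(
  Column2 = c("V1", "V829", "V2", "V3", "V493", "V4", "V5", "V6", "V992", "V7"),
  test    = paste0("test", 1:10),
  stringsAsFactors = FALSE
)

# Wide to long: one row per (Instrument, value) pair.
df1_long <- gather(df1, key = "Instrument", value = "value")

# Attach the matching df2$test entry to every row of the long data.
df <- left_join(df1_long, df2, by = c("Instrument" = "Column2"))
```

If only the lookup itself is needed, df2$test[match(names(df1), df2$Column2)] returns the matching labels in the column order of df1 without reshaping.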

Unable to perform merge: what is the difference in these dataframes?

I have two data frames, annotatedFile and subOutFile, that contain similar data. I read annotatedFile from an xlsx file using readxl::read_xlsx. subOutFile is read from a tab-separated text file using read.delim2. They contain similar columns, but annotatedFile has an extra column, Accuracy, that I want to merge into the subOutFile data frame.
This is what the data frames look like:
My merge command was:
subOutFile = subOutFile %>% merge(subOutFile, annotatedFile[,c("StimName", "Accuracy")], by = "StimName", all.x = TRUE)
From the images above, you can see that the structure of the two dataframes looks different. One shows the vector-like notification [1:180] and the other does not. Is there something different about these dataframes which is why I am not able to perform the merge? Or is there another reason?
When you write df1 %>% merge(df1, df2), there is one df1 too many: the pipe already passes df1 as the first argument of merge().
It's either df1 <- merge(df1, df2) or df1 <- df1 %>% merge(df2). For the latter, there is a shortcut, but you will have to load the magrittr package: df1 %<>% merge(df2).
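A small sketch of the two equivalent forms, using hypothetical toy data (only the names subOutFile, annotatedFile, StimName and Accuracy come from the question; the RT column and all values are made up):

```r
library(magrittr)

# Hypothetical stand-ins for the two files described in the question.
subOutFile <- data.frame(StimName = c("s1", "s2", "s3"),
                         RT = c(512, 430, 655),
                         stringsAsFactors = FALSE)
annotatedFile <- data.frame(StimName = c("s1", "s2"),
                            RT = c(512, 430),
                            Accuracy = c(1, 0),
                            stringsAsFactors = FALSE)

accuracy_cols <- annotatedFile[, c("StimName", "Accuracy")]

# Form 1: plain function call, both data frames passed explicitly.
merged_call <- merge(subOutFile, accuracy_cols, by = "StimName", all.x = TRUE)

# Form 2: piped call, the left-hand side becomes the first argument.
merged_pipe <- subOutFile %>%
  merge(accuracy_cols, by = "StimName", all.x = TRUE)
```

Because all.x = TRUE is set, rows of subOutFile without a match in annotatedFile are kept and get NA in Accuracy.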

how do I create a dataframe based on values from another dataframe?

I have a data frame, and from it I created another data frame,
which consists of a selection made after applying a number of conditions. The result is a number of countries that satisfy the condition n > 11.
Now I want to continue working with only these countries. How can I copy values from the first dataset based on the countries in the selection?
My first df looks like:
and the second (so the selection of countries):
In my final df I need every column and row from my 1st df (but only for the countries present in the second df)
I'm not sure about your data or your reason for using a second data frame, but calling the first and second data sets df1 and df2:
library(dplyr)
df1 %>%
filter(Country.o... %in% df2$Country.o...)
(I cannot tell what the column name is. You should not post your data as an image.)
Two options:
1. Do an inner join.
a) Base R:
df3 <- merge(df1, df2, by = 'Country')
b) dplyr:
library(dplyr)
df3 <- inner_join(df1, df2, by = 'Country')
2. Instead of creating df2 from df1, just filter the first data frame to get the result directly:
df3 <- df1 %>% group_by(Country) %>% filter(n() > 11)
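To illustrate, here is a sketch on hypothetical toy data (the Country column name is taken from the answers above; the value column and the group sizes are made up). It assumes df2 was built by counting rows per country, which the question does not confirm, and checks that filtering by lookup and filtering by group size select the same rows.

```r
library(dplyr)

# Hypothetical data: country "A" has 12 rows, country "B" has 3.
df1 <- data.frame(
  Country = rep(c("A", "B"), times = c(12, 3)),
  value   = 1:15,
  stringsAsFactors = FALSE
)

# Assumed construction of the selection: countries with more than 11 rows.
df2 <- df1 %>% count(Country) %>% filter(n > 11)

# Keep the rows of df1 whose country appears in the selection.
by_lookup <- df1 %>% filter(Country %in% df2$Country)

# Same result without the intermediate data frame.
by_group <- df1 %>%
  group_by(Country) %>%
  filter(n() > 11) %>%
  ungroup()
```

The ungroup() call is optional; it only drops the grouping so that later operations act on the whole data frame.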

Merging of dataframes with different number of columns

I have these two dataframes.
DF1:
DF2:
I want my output DF to be DF1 along with the value of X1 from DF2. That is, this is how I want the output to look:
I have tried using merge and join, but am unable to get this required output. The primary problem seems to be due to the fact that the ID in DF1 has multiple matches in DF2. The resulting dataframe I get has all the rows, somewhat like this:
How do I fix this?
Thanks.
(apologies for table images, I wasn't able to figure out how to create a table on the fly)
You can use match to return the first hit in DF2.
DF1$X1 <- DF2$X1[match(DF1$ID, DF2$ID)]
Keep unique values in terms of ID in the second data frame and then join:
library(tidyverse)
DF2 <- DF2 %>%
distinct(ID, .keep_all = TRUE) %>%
select(ID, X1)
res <- DF1 %>%
inner_join(DF2, by = "ID")
glimpse(res)
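Both suggestions can be checked on a small example in base R. The data below are hypothetical (only the names DF1, DF2, ID and X1 come from the question); the sketch shows how duplicate IDs in DF2 inflate a plain merge, and that match() and de-duplicating DF2 first both keep one row per DF1 row.

```r
# Hypothetical data: IDs 1 and 3 occur twice in DF2.
DF1 <- data.frame(ID = c(1, 2, 3), value = c(10, 20, 30))
DF2 <- data.frame(ID = c(1, 1, 2, 3, 3),
                  X1 = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = FALSE)

# A plain merge returns one row per matching pair, so duplicates multiply.
inflated <- merge(DF1, DF2, by = "ID")

# Option 1: match() returns the position of the first hit only.
DF1_matched <- DF1
DF1_matched$X1 <- DF2$X1[match(DF1$ID, DF2$ID)]

# Option 2: keep the first row per ID in DF2, then merge.
DF2_first <- DF2[!duplicated(DF2$ID), c("ID", "X1")]
deduped <- merge(DF1, DF2_first, by = "ID")
```

Both options pick the first occurrence of each ID in DF2; if a different row should win, DF2 needs to be sorted or filtered accordingly beforehand.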

Compare two data frame values for retrieve extra values from one them in R

I have two data frames: df1 with 76349 rows and 4 columns (long, lat, country, year), and df2 with 2999 rows and 2 columns (long, lat). All the coordinates in df2 also occur in df1. I need to obtain the country and year values from df1 for the coordinates in df2. I have tried to solve this using the merge function. The output appears to be correct, showing the country and year values from df1 for the coordinates identical to those in df2; however, the output has more rows than df2 (the reference data). I tried removing NA values and duplicated values, but the output is still larger than df2.
How can I obtain the country and year values from df1 for exactly the coordinates in df2?
I use the command:
x = merge(df1,df2, by=c('long','lat'))
Thanks for helping!
Here: link for data download.
https://www.dropbox.com/sh/zr9n56by0qs3h4l/AABjUO6wVi4zzrY2LWHH5P65a?dl=0
The package dplyr has several join options that may be helpful.
If I understand your question, the function 'inner_join' in that package should return what you want, i.e.:
library(dplyr)
x = inner_join(df2, df1)
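One possible reason for the extra rows, which the linked data would have to confirm, is that some (long, lat) pairs occur more than once in df1, for example once per year. In that case any join returns one row per match. The sketch below uses hypothetical toy data in base R to show the effect and one way to count and collapse the repeated keys.

```r
# Hypothetical data: the coordinate (1, 10) appears twice in df1.
df1 <- data.frame(long = c(1, 1, 2, 3),
                  lat = c(10, 10, 20, 30),
                  country = c("A", "A", "B", "C"),
                  year = c(2000, 2001, 2000, 2000),
                  stringsAsFactors = FALSE)
df2 <- data.frame(long = c(1, 2), lat = c(10, 20))

# The join returns every matching df1 row, so it has more rows than df2.
x <- merge(df1, df2, by = c("long", "lat"))

# Count how many df1 rows repeat an earlier coordinate pair.
n_repeated <- sum(duplicated(df1[, c("long", "lat")]))

# Keep only the first row per coordinate pair, then join again.
df1_first <- df1[!duplicated(df1[, c("long", "lat")]), ]
x_one <- merge(df1_first, df2, by = c("long", "lat"))
```

Keeping only the first row per coordinate discards the other years, so whether this is appropriate depends on which year is actually wanted.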
