I have 5 different data frames and want to merge them on the columns they have in common. How can I do that? I read about multimerge, but I am not quite sure how to use it. For example, my CSV files look like this:
1st data frame df1
country year weather temperature
2nd data frame df2
country year region
3rd data frame df3
country humidity weather year temperature region
4th data frame df4
country region weather humidity temperature
Thus my final data frame should look like
country year region
(since these columns are in common)
should I use
total < - multimerge(df1,df2,df3,df4, by = ["country,year,region"]
However this throws an error.
Do you suggest another way that perhaps automatically finds the common columns and drops the rest?
The multimerge function you refer to is from package mergeutils.
You can use a pure base R solution like this:
Reduce(function(x, y) merge(x, y, by = c("country", "year", "region")), list(df1, df2, df3, df4))
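To find the shared columns automatically, one option is to intersect the column names of all the frames and merge on whatever survives. This is a minimal sketch, assuming small made-up frames that follow the column layouts from the question (the values are invented for illustration). Note that with those layouts only country appears in all four frames, since df1 has no region and df4 has no year.

```r
# Toy data: column layouts follow the question, values are invented
df1 <- data.frame(country = c("AT", "BE"), year = c(2015, 2015),
                  weather = c("sun", "rain"), temperature = c(20, 15))
df2 <- data.frame(country = c("AT", "BE"), year = c(2015, 2015),
                  region = c("West", "West"))
df3 <- data.frame(country = c("AT", "BE"), humidity = c(40, 70),
                  weather = c("sun", "rain"), year = c(2015, 2015),
                  temperature = c(20, 15), region = c("West", "West"))
df4 <- data.frame(country = c("AT", "BE"), region = c("West", "West"),
                  weather = c("sun", "rain"), humidity = c(40, 70),
                  temperature = c(20, 15))

dfs <- list(df1, df2, df3, df4)

# Columns present in every data frame
common <- Reduce(intersect, lapply(dfs, names))

# Keep only the common columns (dropping duplicate rows), then merge pairwise
trimmed <- lapply(dfs, function(d) unique(d[, common, drop = FALSE]))
total <- Reduce(function(x, y) merge(x, y, by = common), trimmed)
```

If you would rather keep the other columns as well, skip the trimming step and merge the full frames on common; merge() then adds suffixes to columns that share a name but are not keys.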
Related
I have a dataframe which structure looks like this
I would like to reshuffle the way the data is presented by creating a new data frame that summarises the data above and looks like this:
Therefore, for each European country, I will be creating 4 variables, each of which is a sum of the capital expenditure variable under different conditions. Let's take the first one as an example:
This is the sum of total capital expenditure that is directed to Austria (so Destination country= 'Austria') from EU countries (Source country continent=EU).
Can someone indicate the code to create a new df with this structure and create the variable explained above?
Thanks a lot!
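One way to build such a conditional sum is to filter first and then aggregate by destination country. The original data structure is not shown, so this is only a sketch: the data frame fdi and the column names destination_country, source_continent and capex are hypothetical stand-ins, and the values are invented.

```r
# Hypothetical data: names and values are assumptions, not the asker's data
fdi <- data.frame(
  destination_country = c("Austria", "Austria", "Austria", "Belgium"),
  source_continent    = c("EU", "EU", "Asia", "EU"),
  capex               = c(10, 5, 7, 3)
)

# Keep only investments coming from EU countries
from_eu <- fdi[fdi$source_continent == "EU", ]

# Sum capital expenditure per destination country
capex_from_eu <- aggregate(capex ~ destination_country, data = from_eu, FUN = sum)
```

The other three variables could be built the same way with a different filter, and the resulting frames combined with merge() on destination_country.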
So I have one dataset (DF1) that includes baseball players, the year, and their stats in that year. I have another (DF2) that lists the players, the year, and their salary in that year.
I would like to add the salary column information to DF1 when player name AND year match in both datasets.
I tried
DF1$Salary <- DF2$salary[match(Pitching$playerID, Salaries$playerID)]
But realized that if I did this the information was only correct for the first year. I need to only make the match if year and player ID are the same. Can someone help me? Thanks!
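match() only looks at one key, which is why the salary is right for a single year only. Merging on both keys at once is one way around that. A minimal sketch, assuming the year column is called yearID in both data frames (that name and the toy values are assumptions):

```r
# Toy data: the yearID column name and all values are assumptions
DF1 <- data.frame(playerID = c("p1", "p1", "p2"),
                  yearID   = c(2001, 2002, 2001),
                  ERA      = c(3.10, 2.95, 4.20))
DF2 <- data.frame(playerID = c("p1", "p1", "p2"),
                  yearID   = c(2001, 2002, 2001),
                  salary   = c(100000, 250000, 150000))

# all.x = TRUE keeps every row of DF1; salary is NA where no match exists
DF1 <- merge(DF1, DF2, by = c("playerID", "yearID"), all.x = TRUE)
```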
I'm trying to assign a variable in one dataframe into multiple rows of another dataframe - namely the AWND variable here (average wind speed).
I'm trying to obtain the AWND from
here
And I am trying to match it with multiple dates based on the date
here
Here's what I've tried so far.
dfNew <- merge(dfWeather, dfFlight, by="DATE")
I'm not sure how to proceed with this.
Should I do a join?
EDIT: Here's the data: https://shrib.com/#-7dXevTkb12Bt6Kdfxim (this is the dput output of the data I am getting AWND from).
I got the flights data (that I am trying to match dates with) from the nycflights13 package, and then I subset the flights data to include only the carriers that had at least 1000 flights depart from LaGuardia.
The flights data has the date-time class, as shown in your tibble. First, make sure that the values you want to join on are in the same format; for example, 2013-01-01 05:00:00 will not match 2013-01-01 in your dfWeather data frame.
# Make sure dates match between data.frames
dfFlight$DATE <- stringr::str_extract(dfFlight$DATE, "\\S*")
# Join AWND wherever dates match to left-hand side
dfNew <- dplyr::left_join(dfFlight, dfWeather, by = "DATE")
I did assume some things about your data, since I couldn't fully see what you're working with from the screenshot. This is my first answer on Stack Overflow, so feel free to edit or leave me suggestions.
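The same idea can also be sketched in base R by converting both DATE columns to the Date class before joining. The toy frames below are assumptions standing in for the real data (only the DATE and AWND column names come from the question):

```r
# Toy data: values and the carrier column are invented for illustration
dfFlight <- data.frame(
  DATE    = as.POSIXct(c("2013-01-01 05:00:00", "2013-01-01 09:30:00",
                         "2013-01-02 06:15:00"), tz = "UTC"),
  carrier = c("AA", "DL", "AA")
)
dfWeather <- data.frame(DATE = as.Date(c("2013-01-01", "2013-01-02")),
                        AWND = c(4.5, 6.1))

# Drop the time-of-day part so both DATE columns are plain dates
dfFlight$DATE <- as.Date(dfFlight$DATE, tz = "UTC")

# Left join: every flight keeps its row and picks up that day's AWND
dfNew <- merge(dfFlight, dfWeather, by = "DATE", all.x = TRUE)
```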
I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Patients and has 80 rows and the columns are PatientID, P_Date.
Each patient ID in Patients is unique. For each row in Patients, I want to look at the set of measurements in Measurements with the same PatientID (there are maybe 6-7 per patient).
From this set of measurements, I want to identify the one with M_Date closest to P_Date. I want to append this value to Patients in a new column. How do I do this? I tried using ddplyr but can't figure out how to access two data frames at once within this function.
You probably want to install the survival package with install.packages("survival") and use its neardate function to solve your problem.
It has a good example in the documentation
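If you would rather not add a package, the lookup can also be sketched in base R: for each patient, take that patient's measurements and pick the one with the smallest absolute date difference. This swaps in a plain loop instead of neardate, and the toy data and the ClosestValue column name are assumptions.

```r
# Toy data: values are invented, column names follow the question
Patients <- data.frame(PatientID = c(1, 2),
                       P_Date = as.Date(c("2020-01-10", "2020-03-01")))
Measurements <- data.frame(
  PatientID = c(1, 1, 2, 2),
  Value     = c(5, 6, 7, 8),
  M_Date    = as.Date(c("2020-01-01", "2020-01-12", "2020-02-01", "2020-03-05"))
)

# For each row of Patients, find the measurement closest in time to P_Date
Patients$ClosestValue <- vapply(seq_len(nrow(Patients)), function(i) {
  m <- Measurements[Measurements$PatientID == Patients$PatientID[i], ]
  if (nrow(m) == 0) return(NA_real_)  # patient without measurements
  gap <- abs(as.numeric(m$M_Date - Patients$P_Date[i]))  # difference in days
  m$Value[which.min(gap)]
}, numeric(1))
```

With ties, which.min() keeps the first of the equally close measurements.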
I need to create a bunch of subset data frames out of a single big df, based on a date column (e.g. "Aug 2015", in month-year format). It should work like the subset function, except that the number of subset dfs should change dynamically with the values available in the date column.
All the subset data frames need to have the same structure, and within each subset df the date column should hold a single value.
Suppose my big df currently has the last 10 months of data: I need 10 subset data frames now, and 11 dfs if I run the same command next month (with 11 months of base data).
I have tried something like the code below, but after each iteration the subset subdf_i gets overwritten. Thus I end up with only one subset df, which holds the last value of the month column.
I thought this would create 45 subset dfs named subdf_1, subdf_2, ... and subdf_45, one for each of the 45 unique values of the month column.
uniqmnth <- unique(df$mnth)
for (i in 1:length(uniqmnth)){
subdf_i <- subset(df, mnth == uniqmnth[i])
i==i+1
}
I hope there should be some option in the subset function or any looping might do. I am a beginner in R, not sure how to arrive at this.
I think the right solution here is to use assign() so that the loop index i gets appended to the name of each of the 45 subsets. Thanks to my friend for the note. Here is the solution that stops the subset data frame from being overwritten on each run of the loop.
uniqmnth <- unique(df$mnth)
for (i in seq_along(uniqmnth)) {
  # assign() builds the variable name (subdf_1, subdf_2, ...) at run time
  assign(paste0("subdf_", i), subset(df, mnth == uniqmnth[i]))
}
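An alternative that avoids creating many separate variables is split(), which returns a named list with one data frame per unique month, so the number of subsets follows the data automatically. A minimal sketch with an invented df (only the mnth column name comes from the question):

```r
# Toy data: values are invented for illustration
df <- data.frame(mnth  = c("Aug 2015", "Aug 2015", "Sep 2015", "Oct 2015"),
                 value = c(1, 2, 3, 4))

# One list element per unique value of mnth, named after that value
subdfs <- split(df, df$mnth)

# Access a single subset by name
aug <- subdfs[["Aug 2015"]]
```

Keeping the subsets in a list also makes it easy to process all of them with lapply() instead of referring to subdf_1, subdf_2 and so on by name.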