I have a data frame whose structure looks like this:
I would like to reshape the way the data is presented by creating a new data frame that summarises the data above and looks like this:
Therefore, for each European country, I will create 4 variables, each of which is a sum of the capital expenditure variable under a different condition. Let's take the first one as an example:
This is the sum of total capital expenditure that is directed to Austria (so Destination country = 'Austria') from EU countries (Source country continent = 'EU').
Can someone indicate the code to create a new df with this structure and create the variable explained above?
Thanks a lot!
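One possible approach, as a minimal sketch: filter the rows on the condition, then sum per destination country with base R's aggregate. The original table is not shown, so the data and the column names (source_continent, destination_country, capex) are assumptions made up for illustration.

```r
# Hypothetical toy data; the column names are assumptions, not taken from the post.
fdi <- data.frame(
  source_country      = c("Germany", "France", "USA", "Italy", "Japan"),
  source_continent    = c("EU", "EU", "AM", "EU", "AS"),
  destination_country = c("Austria", "Austria", "Austria", "Belgium", "Belgium"),
  capex               = c(10, 5, 7, 3, 4)
)

# Keep only the flows that originate in the EU.
from_eu <- fdi[fdi$source_continent == "EU", ]

# Sum capital expenditure per destination country.
capex_from_eu <- aggregate(capex ~ destination_country, data = from_eu, FUN = sum)
names(capex_from_eu)[2] <- "capex_from_eu"

print(capex_from_eu)
```

The other three variables could presumably be built the same way by changing the filter, and the resulting data frames joined with merge on destination_country.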
Related
I have the following data table in R, which I need to collapse for streamlined data processing. I can do this manually, but I am looking for the most efficient way possible. The data frame looks like this:
and so on. Each age group has 4 observations, 2 male and 2 female (1 of each type). And region consists of city1, city2, city3, etc. which are all ordered the same as the example above. After all age groups are exhausted, the next cityX begins.
I need to combine gender into the total, summing males and females (within type). I also need to combine all age groups to give a population total (sum all age groups). I need to keep type separate, and then later combine them as an additional column. I want the final rows output to be the region. I need the population totals for each year column. So the final output would be like this:
I know this could be done manually by splitting the data frame repeatedly, but what would be the most efficient way to do this?
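A minimal sketch of one way to do this without splitting the data frame by hand, using base R's aggregate. The data and column names (region, age, gender, type, y2010, y2011) are assumptions, since the original table is not shown.

```r
# Hypothetical toy data; column names are assumptions for illustration only.
pop <- data.frame(
  region = rep(c("city1", "city2"), each = 8),
  age    = rep(rep(c("0-4", "5-9"), each = 4), times = 2),
  gender = rep(c("male", "male", "female", "female"), times = 4),
  type   = rep(c("A", "B"), times = 8),
  y2010  = 1:16,
  y2011  = 101:116
)

# Sum each year column over gender and age group, keeping region and type separate.
by_type <- aggregate(cbind(y2010, y2011) ~ region + type, data = pop, FUN = sum)

# Sum over type as well, giving one row per region.
by_region <- aggregate(cbind(y2010, y2011) ~ region, data = pop, FUN = sum)

print(by_type)
print(by_region)
```

With many year columns, listing them inside cbind gets tedious; the by_type step can also be written as aggregate(. ~ region + type, data = pop[, c("region", "type", "y2010", "y2011")], FUN = sum), selecting the year columns by name or pattern first.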
I have 5 different data frames. I want to merge them together based on the columns they have in common. How can I do this? I read about multimerge, but I am not quite sure how to use it. For example, my CSV files look like this:
1st data frame df1
country year weather temperature
2nd data frame df2
country year region
3rd data frame df3
country humidity weather year temperature region
4th data frame df4
country region weather humidity temperature
Thus my final data frame should look like
country year region
(since these columns are in common)
Should I use
total < - multimerge(df1,df2,df3,df4, by = ["country,year,region"]
However this throws an error.
Do you suggest another way that perhaps automatically finds the common columns and drops the rest?
The multimerge function you refer to is from the mergeutils package.
You can use a purely base R solution like this:
Reduce(function(x,y) merge(x,y,by=c("country","year","region")),list(df1,df2,df3,df4))
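To find the common columns automatically and drop the rest, something along these lines may work. It is only a sketch on made-up data that mirrors the column layouts listed in the question. Note that with those layouts only country is shared by all four data frames (df1 has no region and df4 has no year), so merging by all three names would fail on them.

```r
# Hypothetical toy data following the column layouts described in the question.
df1 <- data.frame(country = c("A", "B"), year = c(2000, 2001),
                  weather = c("sun", "rain"), temperature = c(20, 15))
df2 <- data.frame(country = c("A", "B"), year = c(2000, 2001),
                  region = c("north", "south"))
df3 <- data.frame(country = c("A", "B"), humidity = c(40, 80),
                  weather = c("sun", "rain"), year = c(2000, 2001),
                  temperature = c(20, 15), region = c("north", "south"))
df4 <- data.frame(country = c("A", "B"), region = c("north", "south"),
                  weather = c("sun", "rain"), humidity = c(40, 80),
                  temperature = c(20, 15))

dfs <- list(df1, df2, df3, df4)

# Column names present in every data frame.
common <- Reduce(intersect, lapply(dfs, names))

# Drop the non-shared columns, then merge everything on the shared ones.
trimmed <- lapply(dfs, function(d) d[, common, drop = FALSE])
total <- Reduce(function(x, y) merge(x, y, by = common), trimmed)

print(common)
print(total)
```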
So I have one dataset (DF1) that includes baseball players, the year, and their stats in that year. I have another (DF2) that lists the players, the year, and their salary in that year.
I would like to add the salary column information to DF1 when player name AND year match in both datasets.
I tried
DF1$Salary <- DF2$salary[match(Pitching$playerID, Salaries$playerID)]
But realized that if I did this the information was only correct for the first year. I need to only make the match if year and player ID are the same. Can someone help me? Thanks!
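One way to match on both keys at once is merge with two by columns. This is a minimal sketch on made-up data; the column names playerID, yearID, ERA and salary are assumptions about how the two datasets are laid out.

```r
# Hypothetical toy data; column names are assumptions for illustration.
DF1 <- data.frame(playerID = c("p1", "p1", "p2"),
                  yearID   = c(2010, 2011, 2010),
                  ERA      = c(3.1, 2.9, 4.0))
DF2 <- data.frame(playerID = c("p1", "p1", "p2"),
                  yearID   = c(2010, 2011, 2010),
                  salary   = c(100, 200, 150))

# Left join: keep every row of DF1 and attach salary where both player and year match.
DF1 <- merge(DF1, DF2, by = c("playerID", "yearID"), all.x = TRUE)

print(DF1)
```

With all.x = TRUE, rows of DF1 that have no matching player and year in DF2 are kept and get NA for salary.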
I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Patients and has 80 rows and the columns are PatientID, P_Date.
Each patient ID in Patients is unique. For each row in Patients, I want to look at the set of measurements in Measurements with the same PatientID (there are maybe 6-7 per patient).
From this set of measurements, I want to identify the one with M_Date closest to P_Date. I want to append this value to Patients in a new column. How do I do this? I tried using ddplyr but can't figure out how to access two data frames at once within this function.
You probably want to install the survival package with install.packages("survival") and use its neardate function to solve your problem.
It has a good example in the documentation.
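If you would rather avoid an extra package, here is a base R sketch that does the lookup row by row. The data is made up, and the new column name ClosestValue is an assumption; the other column names follow the question.

```r
# Hypothetical toy data using the column names from the question.
Patients <- data.frame(
  PatientID = c(1, 2),
  P_Date    = as.Date(c("2020-01-10", "2020-03-01"))
)
Measurements <- data.frame(
  PatientID = c(1, 1, 1, 2, 2),
  Value     = c(5, 6, 7, 8, 9),
  M_Date    = as.Date(c("2020-01-01", "2020-01-12", "2020-02-01",
                        "2020-02-20", "2020-03-15"))
)

# For each patient, pick the measurement whose date is nearest to P_Date.
Patients$ClosestValue <- vapply(seq_len(nrow(Patients)), function(i) {
  m <- Measurements[Measurements$PatientID == Patients$PatientID[i], ]
  if (nrow(m) == 0) return(NA_real_)  # patient without any measurement
  gap <- abs(as.numeric(m$M_Date - Patients$P_Date[i]))  # distance in days
  m$Value[which.min(gap)]
}, numeric(1))

print(Patients)
```

which.min returns the first minimum, so if two measurements are equally close the earlier row wins. For 80 patients and 500 measurements the row-by-row loop should be fast enough.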
I have a data set with a number of survey variables. I am looking at the data of one column, but need to split it by the factors in another column. The survey asked gender, and also asked how much the person smoked. I need to compare how much males smoke vs females, but I cannot figure out how to split the data in the column based on the information in another column.
Can someone help?
I don't know if this is what you want:
# some reproducible data
survey <- data.frame(gender = c('m','m','f','m','f','f','m','m','f'),
                     smoke = c('little','much','much','little','no','some','little','little','much'))
# gender vs. smoke:
table(survey)
# or:
table(survey$gender, survey$smoke)
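If you literally want the smoking column split into one piece per gender, split does that, and prop.table turns the counts into within-gender shares so the two groups can be compared despite different sizes. A small sketch on the same toy data (the names by_gender and shares are just illustrative):

```r
# Same toy data as above, repeated so the snippet runs on its own.
survey <- data.frame(
  gender = c('m','m','f','m','f','f','m','m','f'),
  smoke  = c('little','much','much','little','no','some','little','little','much')
)

# Split the smoke column into a list with one element per gender.
by_gender <- split(survey$smoke, survey$gender)

# Share of each smoking level within each gender (each row sums to 1).
shares <- prop.table(table(survey$gender, survey$smoke), margin = 1)

print(by_gender)
print(shares)
```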