So I have one dataset (DF1) that includes baseball players, the year, and their stats in that year. I have another (DF2) that lists the players, the year, and their salary in that year.
I would like to add the salary column information to DF1 when player name AND year match in both datasets.
I tried:
DF1$Salary <- DF2$salary[match(DF1$playerID, DF2$playerID)]
But I realized that this only gave the correct salary for the first year, because match() returns only the first row with a matching playerID. I need the match to require both the same player ID and the same year. Can someone help me? Thanks!
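Here is a minimal base R sketch of matching on both columns. It assumes the year column is called year in both data frames, and the column names and toy values are hypothetical. You can either let merge() do a left join on both columns or build a combined player-year key for match().

```r
# Toy data; the column names (playerID, year, W, salary) are assumptions.
DF1 <- data.frame(
  playerID = c("a", "a", "b", "c"),
  year     = c(2001, 2002, 2001, 2003),
  W        = c(10, 12, 7, 3)
)
DF2 <- data.frame(
  playerID = c("b", "a", "a"),
  year     = c(2001, 2002, 2001),
  salary   = c(50, 200, 100)
)

# Option 1: left join on both columns; rows without a salary get NA.
joined <- merge(DF1, DF2, by = c("playerID", "year"), all.x = TRUE)

# Option 2: match on a combined player-year key, keeping DF1's row order.
key1 <- paste(DF1$playerID, DF1$year, sep = "_")
key2 <- paste(DF2$playerID, DF2$year, sep = "_")
DF1$Salary <- DF2$salary[match(key1, key2)]
```

By default merge() sorts the result by the key columns, while the match() version keeps DF1's original row order.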
I have a data frame whose structure looks like this:
I would like to reshape how the data are presented by creating a new data frame that summarises the data above and looks like this:
Therefore, for each European country, I will be creating 4 variables, each of which is a sum of the capital expenditure variable under a different condition. Let's take the first one as an example:
This is the sum of total capital expenditure directed to Austria (Destination country = 'Austria') from EU countries (Source country continent = EU).
Can someone show me the code to create a new data frame with this structure and compute the variable explained above?
Thanks a lot!
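The original tables are not shown, so this is only a minimal base R sketch. It assumes the data are in long format with one row per source-destination flow, and the column names source_continent, destination_country and capex, as well as the toy values, are hypothetical. Each summary variable is a conditional sum: select the rows that satisfy one condition, then add up capex per destination country.

```r
# Hypothetical long-format data: one row per source -> destination flow.
df <- data.frame(
  source_country      = c("Germany", "USA", "France", "China", "Italy"),
  source_continent    = c("EU", "North America", "EU", "Asia", "EU"),
  destination_country = c("Austria", "Austria", "Austria", "Spain", "Spain"),
  capex               = c(10, 5, 7, 3, 4)
)

destinations <- sort(unique(df$destination_country))

# Sum capex per destination country over the rows selected by keep
# (a logical vector with one entry per row of df); destinations with
# no matching rows get 0.
sum_by_destination <- function(keep) {
  vapply(destinations, function(d) {
    sum(df$capex[keep & df$destination_country == d])
  }, numeric(1), USE.NAMES = FALSE)
}

summary_df <- data.frame(destination_country = destinations)
# Variable 1: capital expenditure directed to each country from EU sources.
summary_df$capex_from_EU <- sum_by_destination(df$source_continent == "EU")
# A second, hypothetical condition: capital expenditure from non-EU sources.
summary_df$capex_from_non_EU <- sum_by_destination(df$source_continent != "EU")
```

The remaining variables would follow the same pattern, with their own logical condition passed to sum_by_destination().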
I am trying to calculate animal home ranges using movement data from the 30 days before two animals encountered each other, using R. So, for example, if animal1 meets animal2 on the 15th of June, I would like to select all movement data available between the 16th of May and the 14th of June for each animal. The problem I have is that I do not know how to program the subsetting of the movement data based on the date and animal id.
I would like to end up with two new datasets of movement data for each encounter, one per animal. Each new dataset would contain all movement data recorded for one of the encountering animals in the 30 days before the encounter.
I have shared part of the data in this Wetransfer link. The workbook contains 2 tabs:
encounters: contains one line per encounter, with a column for the date, another with the ID of group1 and another with the ID of group2. I would use the date and the IDs in this dataset to select the data from the other dataset (movement_data).
movement_data: contains one line per GPS point collected. There are columns for the ID of the point, the ID of the group, the date on which the GPS point was taken, the latitude and the longitude.
Does anybody know how to do this? I don't even know where to start.
Thank you very much!
To subset by animal ID, you can use dplyr's filter():
library(dplyr)
data %>% filter(ID == "A")
To get the dates, you could add a column in Excel that subtracts 30 days from each encounter date, and then filter for dates between that column and the encounter date:
data %>%
filter(ID == "A") %>%
# between() is inclusive at both ends; replace the placeholders with the window start and end dates
filter(between(date_column, as.Date('YYYY-MM-DD'), as.Date('YYYY-MM-DD')))
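Instead of adding a helper column in Excel, you can also compute the window in R. Here is a minimal base R sketch. It assumes both tabs have already been read into data frames (for example with readxl's read_excel(path, sheet = ...)) and that the date columns are of class Date. The column names group1, group2, group_id and date and the toy rows are hypothetical. For each encounter, it keeps each animal's points from the 30 days before the encounter and excludes the encounter day itself, which matches the 16 May to 14 June example.

```r
# Hypothetical encounters tab: one row per encounter.
encounters <- data.frame(
  date   = as.Date(c("2021-06-15", "2021-07-01")),
  group1 = c("A", "B"),
  group2 = c("B", "C")
)
# Hypothetical movement_data tab (latitude/longitude omitted for brevity).
movement_data <- data.frame(
  point_id = 1:6,
  group_id = c("A", "A", "B", "B", "C", "A"),
  date     = as.Date(c("2021-05-10", "2021-05-20", "2021-06-01",
                       "2021-06-15", "2021-06-20", "2021-06-14"))
)

# GPS points for one animal from `days` days before the encounter up to
# the day before it (e.g. 15 June -> 16 May to 14 June).
window_before <- function(id, encounter_date, days = 30) {
  keep <- movement_data$group_id == id &
    movement_data$date >= encounter_date - days &
    movement_data$date < encounter_date
  movement_data[keep, ]
}

# One list element per encounter, holding a data frame for each animal.
subsets <- lapply(seq_len(nrow(encounters)), function(i) {
  list(
    animal1 = window_before(encounters$group1[i], encounters$date[i]),
    animal2 = window_before(encounters$group2[i], encounters$date[i])
  )
})
```

You can then pass each subsets[[i]]$animal1 and subsets[[i]]$animal2 data frame to whichever home-range function you use.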
This question already has answers here: Select the row with the maximum value in each group (19 answers). Closed 1 year ago.
I am trying to merge two datasets for my senior thesis on corporate political activity. One contains all of the data I have on each company and is made up of several previously merged datasets. The other shows the year, the company's ticker, and a variable called "dirnbr". "dirnbr" is supposed to show how many people were on the board in a given year, but it is structured like this:
Basically, it creates several entries per year, one for each person on the board, numbered from 1 to the total number of board members (which is the only number I really care about). I just want my dataset to show the total number of people on the board in a given year, the year, and the ticker. That would let me merge the two datasets with inner_join and then see what percentage of the people on a board of directors in a given year were formerly involved in politics (I have that information in my larger dataset).
In other words, I would like to drop every observation except the largest "dirnbr" entry per year and ticker. Is there a way to do this, or to achieve the same result another way?
Any help is very much appreciated.
You could use
library(dplyr)
df %>%
group_by(ticker, year) %>%
filter(dirnbr == max(dirnbr))
or
df %>%
group_by(ticker, year) %>%
slice_max(dirnbr)
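Only the board size per ticker and year is needed, so another option is to collapse each group to a single row that holds its maximum dirnbr. Unlike filter() or slice_max(), this returns exactly one row per ticker-year even if the maximum is tied. Here is a minimal base R sketch with hypothetical toy data. The dplyr equivalent would be group_by(ticker, year) followed by summarise(board_size = max(dirnbr)).

```r
# Hypothetical board data: one row per director, numbered 1..n per ticker-year.
df <- data.frame(
  ticker = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  year   = c(2010, 2010, 2010, 2011, 2011, 2010, 2010),
  dirnbr = c(1, 2, 3, 1, 2, 1, 2)
)

# Collapse to one row per ticker-year, keeping the largest dirnbr (the board size).
board_size <- aggregate(dirnbr ~ ticker + year, data = df, FUN = max)
names(board_size)[names(board_size) == "dirnbr"] <- "board_size"
```

You can then join the result to the larger dataset on ticker and year, for example with inner_join(..., by = c("ticker", "year")).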
Firstly, I'm new to R, so I apologize in advance. I'm working with prescription data. Since it's on a secure VM, I can't copy and paste, but the data structure looks like this:
Patient ID | Medication | Start Date | End Date
There are multiple rows for each patient, since each patient has been prescribed more than one medication.
What I want to do is the following:
For each patient, find which medications (and how many) have overlapping time frames, and return the number of overlapping prescriptions that patient has. Is there a way to do this in R?
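Here is a minimal base R sketch. It assumes the columns are named patient_id, medication, start and end (hypothetical names standing in for Patient ID, Medication, Start Date and End Date) and that the dates are already of class Date. Two prescriptions [s1, e1] and [s2, e2] overlap when s1 <= e2 and s2 <= e1. The code flags every prescription that overlaps at least one other prescription for the same patient, then counts the flags per patient. The comparison is inclusive, so a prescription that ends on the same day another starts counts as overlapping.

```r
# Hypothetical prescription data (toy values).
rx <- data.frame(
  patient_id = c(1, 1, 1, 2, 2),
  medication = c("A", "B", "C", "A", "D"),
  start = as.Date(c("2020-01-01", "2020-01-15", "2020-03-01",
                    "2020-01-01", "2020-02-01")),
  end   = as.Date(c("2020-01-31", "2020-02-15", "2020-03-31",
                    "2020-01-31", "2020-02-28"))
)

# TRUE for each interval that overlaps at least one other interval in the set.
overlaps_any <- function(start, end) {
  n <- length(start)
  vapply(seq_len(n), function(i) {
    others <- seq_len(n)[-i]
    any(start[i] <= end[others] & start[others] <= end[i])
  }, logical(1))
}

# Run the check within each patient and put the flags back in row order.
rx$overlaps <- unsplit(
  lapply(split(rx, rx$patient_id), function(p) overlaps_any(p$start, p$end)),
  rx$patient_id
)

# Which prescriptions overlap, and how many overlapping prescriptions per patient.
overlapping_rx <- rx[rx$overlaps, c("patient_id", "medication")]
n_overlapping  <- tapply(rx$overlaps, rx$patient_id, sum)
```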
I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Patients and has 80 rows and the columns are PatientID, P_Date.
Each patient ID in Patients is unique. For each row in Patients, I want to look at the set of measurements in Measurements with the same PatientID (there are maybe 6-7 per patient).
From this set of measurements, I want to identify the one with M_Date closest to P_Date. I want to append this value to Patients in a new column. How do I do this? I tried using ddplyr but can't figure out how to access two data frames at once within this function.
You probably want to install the survival package (install.packages("survival")) and use its neardate() function to solve your problem.
Its documentation has a good example.
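If you would rather avoid the dependency, here is a minimal base R sketch. As far as I know, neardate() searches in one direction (after or before each target date) rather than for the closest date on either side. The sketch uses the column names from the question with hypothetical toy values. For each patient, it takes that patient's measurements, computes the absolute gap in days between M_Date and P_Date, and keeps the Value with the smallest gap. Ties go to the first such row, and patients with no measurements get NA.

```r
# Hypothetical toy data using the column names from the question.
Measurements <- data.frame(
  PatientID = c(1, 1, 1, 2, 2),
  Value     = c(5.1, 5.4, 6.0, 7.2, 7.9),
  M_Date    = as.Date(c("2021-01-01", "2021-02-01", "2021-03-01",
                        "2021-01-10", "2021-05-10"))
)
Patients <- data.frame(
  PatientID = c(1, 2, 3),
  P_Date    = as.Date(c("2021-02-10", "2021-04-01", "2021-01-01"))
)

# For each patient, the Value whose M_Date is closest (in either direction) to P_Date.
Patients$ClosestValue <- vapply(seq_len(nrow(Patients)), function(i) {
  m <- Measurements[Measurements$PatientID == Patients$PatientID[i], ]
  if (nrow(m) == 0) return(NA_real_)  # no measurements for this patient
  gap <- abs(as.numeric(m$M_Date - Patients$P_Date[i]))  # gap in days
  m$Value[which.min(gap)]  # which.min() picks the first row on ties
}, numeric(1))
```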