Find closest datapoint to a date in another dataframe - r

I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Patients and has 80 rows and the columns are PatientID, P_Date.
Each patient ID in Patients is unique. For each row in Patients, I want to look at the set of measurements in Measurements with the same PatientID (there are maybe 6-7 per patient).
From this set of measurements, I want to identify the one with M_Date closest to P_Date. I want to append this value to Patients in a new column. How do I do this? I tried using ddplyr but can't figure out how to access two data frames at once within this function.

you probably want to install the install.packages("survival") and the neardate function within it to solve your problem.
It has a good example in the documentation

Related

Make a new dataset that has the same variables and value types, but random observations based on a source dataset

I have five tables, each with between 20 to 30 variables and 1,500 to 2,300 observations. I want to make a new dataset similar in values to the original dataset but not the same. Meaning if one variable is gender. I want the new dataset to have gender data, but the values would differ.
I'm unsure what functions to use or even how to search for some methods. My Google searches are coming up with how to make subsets of data, but I need new datasets with the same number of variables and observations with random values based on the values of the source dataset.
Any advice would be helpful.

How do I gather data that is spread across in various rows to a single row?

I have a dataframe that has 23 columns of various parameters defining a patient which I extracted using dplyr from a larger dataframe after pivoting it such that each of the parameters forms the columns of the new dataframe.
Now I am facing an issue. I am getting a lot of rows for the same patient. For each parameter, one of the rows shows the required value and the rest is denoted as NA. So if the same patient is repeated, say 10 times, in every parameter column there is one row with the actual value and the rest is NA.
How do I remove these NAs and gather the information that is scattered in this manner?
I want the 1 and 2 to be on the same row. All the rows seen in this image of dataframe are of the same person.

Obtaining proportions within subsets of a data frame

I am trying to obtain proportions within subsets of a data frame. The inputs are Grade, Fully Paid and Charged Off. I tried using
DF$proportion<-as.vector(unlist(tapply(DF$Grade,paste(DF$Fully Paid ,DF$ Charged Off,sep="."),FUN=function(x){x/sum(x)}))
based on an answer given to this same question in a previous post Calculate proportions within subsets of a data frame but not having luck. I am guessing because Grade is a character not a number in my data.
Based on your comments, Here is the code you should try for each column.
DF$Charged_off_proportion <- as.vector(unlist(tapply(DF$Charged_Off,DF$Grade,FUN=function(x){x/sum(x)})))
Similarly you can change the column names for other columns like
DF$Fully_Paid_proportion <- as.vector(unlist(tapply(DF$Fully_Paid,DF$Grade,FUN=function(x){x/sum(x)})))

How to merge multi level datasets and repeat country-year observations?

I have a survey data frame with individual-level observations. I want to merge this data frame with another data frame with country-level variables (one row per country-year). This latter should repeat the rows according to the number of rows in the individual-level data frame. I have been tried this using the rep() function but it didn't work. I don't know if I was clear, but I would thank you so much if anyone can help.

Conditional operation on two data frames (R)

I'm having some difficulty executing a conditional operation on two dataframes. For problem illustration, I have three variables: Price, State, and Item, which are stored in a data frame (data1) with those column names. I use ddply to generate a dataframe (data2) that includes columns State and Item, and the average price(or some other function) for that State/Item combination.
What I then want to do is fill in a column in the originating data frame(i.e. a simple prediction vector), where the column's value is the mean value for a given observations combination of State and Item in data1. (e.g., if an observation in data1 has state="Arizona" and item="pen", I then want to retrieve the average price stored in data2 that corresponds to that state/item combination, and insert it into the column.)
Thank you for any help.
The plyr package comes with a great little function called join. You can use this to complete your task.
join(dat1,dat2, by=c('State','Item'))
Review ?join to see the different types of joins possible. I'm pretty sure you want a left join.

Resources