Firstly, I'm new to R and I apologize. So I'm working with data involving prescriptions. Since it's on a secure VM, I can't copy and paste, but the data structure looks like this:
Patient ID | Medication | Start Date | End Date
There are multiple rows for each patient, since each patient has been precribed more than one medication.
What I want to do is the following:
Find out how many medications/which medications the patients are on that overlap each other in terms of time frame, and then return how many overlapping prescriptions the patients has. Is there a way to do this in R?
Related
I am a cross country runner on a high school team, and I am using my limited knowledge of R and linear algebra to create a ranking index for xc teams.
I get my data from milesplit.com, but I am unsure if I am formatting this data properly. So far I created matrices for each race, with odd columns including runner score and even columns including time, where each team has a team_score and team_time column. I want to analyze growth of teams in a time series, but I have two questions about this:
(1): can I combine all of these "race matrices" into a time series? Can I assign all the data in a race matrix a certain date, then make one big time series including all 25 race matrices I made?
(2): Am I closing myself off to insights by not including name and grade for each runner (as I only record time and score)? If so, how can I write a matrix that contains all this information?
I have the following data table in R, which I need to collapse for streamlined data processing. I can do this manually, but I am looking for the most efficient way possible. The data frame looks like this:
and so on. Each age group has 4 observations, 2 male and 2 female (1 of each type). And region consists of city1, city2, city3, etc. which are all ordered the same as the example above. After all age groups are exhausted, the next cityX begins.
I need to combine gender into the total, summing males and females (within type). I also need to combine all age groups to give a population total (sum all age groups). I need to keep type separate, and then later combine them as an additional column. I want the final rows output to be the region. I need the population totals for each year column. So the final output would be like this:
I know this could be done manually by splitting the data frame repeatedly, but what would be the most efficient way to do this?
So I have one dataset (DF1) that includes baseball players, the year, and their stats in that year. I have another (DF2) that lists the players, the year, and their salary in that year.
I would like to add the salary column information to DF1 when player name AND year match in both datasets.
I tried
DF1$Salary <- DF2$salary[match(Pitching$playerID, Salaries$playerID)]
But realized that if I did this the information was only correct for the first year. I need to only make the match if year and player ID are the same. Can someone help me? Thanks!
Here's a portion of a dataset that I have of daily closes for certain stocks within a common period of time in .xlsx format:
What I need is an R script that would produce something like this:
So I need a row for each stock everyday for the time period and the corresponding prices for them in the third column like above. Of course, I have more than 100 stocks for a period of 4 years. So that makes more than 100 rows for each day for 4 years. For example, a hundred rows of the day 5.01.2015 and so forth.
I'm still very new to R so help is very much appreciated.
I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Patients and has 80 rows and the columns are PatientID, P_Date.
Each patient ID in Patients is unique. For each row in Patients, I want to look at the set of measurements in Measurements with the same PatientID (there are maybe 6-7 per patient).
From this set of measurements, I want to identify the one with M_Date closest to P_Date. I want to append this value to Patients in a new column. How do I do this? I tried using ddplyr but can't figure out how to access two data frames at once within this function.
you probably want to install the install.packages("survival") and the neardate function within it to solve your problem.
It has a good example in the documentation