Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am trying to measure prediction accuracy for hospital discharge timeframes.
For example, I think Mr. Smith will be discharged within 3-7 days, which would mean a discharge on any day from 11/9 to 11/13 would count as correct. If he discharges in 2 days, I would say I was 1 day off, and if he discharges in 10 days, I was 3 days off...
Is there any good method to do this using dplyr, base R, and lubridate? TIA. Sample data is at the link:
Sample data
A possible solution would be to express your need in a case_when.
library(dplyr)
df %>%
  mutate(DIF = case_when(
    # as.numeric() keeps every branch a plain number of days (assumes Date columns)
    discharge_calender_date < discharge_prediction_lower_bound ~
      as.numeric(discharge_calender_date - discharge_prediction_lower_bound),
    discharge_calender_date <= discharge_prediction_upper_bound ~ 0,
    TRUE ~ as.numeric(discharge_calender_date - discharge_prediction_upper_bound)
  ))
This way you get a negative value if the patient left before the lower bound, zero if he left within the prediction and a positive result if he left after the prediction.
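To check the logic against the Mr. Smith example, here is a minimal base R sketch of the same idea. The three rows are made up (the linked sample data is not available here), and the column names are simply copied from the answer above, so treat both as assumptions:

```r
# Hypothetical data: predicted window 11/9 to 11/13, three possible outcomes
df <- data.frame(
  discharge_calender_date          = as.Date(c("2020-11-08", "2020-11-11", "2020-11-16")),
  discharge_prediction_lower_bound = as.Date(rep("2020-11-09", 3)),
  discharge_prediction_upper_bound = as.Date(rep("2020-11-13", 3))
)

# Days before the lower bound (negative or 0) and after the upper bound (0 or positive)
early <- pmin(as.numeric(df$discharge_calender_date - df$discharge_prediction_lower_bound), 0)
late  <- pmax(as.numeric(df$discharge_calender_date - df$discharge_prediction_upper_bound), 0)

# At most one of the two is non-zero, so their sum is the signed error in days
df$DIF <- early + late
df$DIF
```

If only the size of the miss matters, abs(df$DIF) gives the number of days off regardless of direction.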
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 17 days ago.
I have a dataset with a time column, where the readings are 6 hours apart, and a heaterstatus column.
The dataset:
I would like to know the percentage of zeros that occurred each day for heaterstatus. I'm new to R, so any suggestion will be helpful.
Not tested since you only provided data as an image, but this should do what you want:
library(dplyr)
dat %>%
  group_by(day = as.Date(Time)) %>%
  # mean() of a logical gives the proportion of zeros; multiply by 100 for a percentage
  summarize(pct_0 = mean(HeaterStatus == 0))
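Since the data was only posted as an image, here is a small made-up example to check the idea against. The readings are invented, the column names Time and HeaterStatus are assumed from the answer, and this sketch uses base R's tapply() instead of dplyr:

```r
# Hypothetical data: one reading every 6 hours over two days
dat <- data.frame(
  Time = as.POSIXct("2021-01-01 00:00:00", tz = "UTC") + 6 * 3600 * (0:7),
  HeaterStatus = c(0, 1, 0, 0, 1, 1, 0, 1)
)

# Calendar day of each reading (same time zone as the timestamps)
day <- as.Date(dat$Time, tz = "UTC")

# Share of zero readings per day, scaled to a percentage
pct_0 <- 100 * tapply(dat$HeaterStatus == 0, day, mean)
pct_0
```

With this data the first day has three zeros out of four readings and the second day has one out of four.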
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I am trying to create a plot in R that shows post-surgical outcomes over time. I want to plot a certain data point at pre-op, 1 month post-op, 6 months post-op, etc. Here is an example dataframe:
dat <- data.frame(Preop=c(-2,0.5,-0.25,1.5), PO_1M=c(-1.5,0.2,-0.1,1.0), PO_6M=c(-1.2,0.1,-0.05,0.5), PO_1Y=c(-1.0,0.05,0,0.25))
dat
Ideally, the x axis will have markings for the time (preop, 1 month post-op, etc.), and the y axis will have the value at that time. The data should converge around y=0 coming from either the positive or negative direction, and I imagine a plot looking something like this:
My actual dataframe also has many missing values, so this would need to be accounted for somehow. I would appreciate if anyone could help approach this problem using either ggplot or base R plotting functions. Thanks so much!
Your data should be restructured. Use the tidyr package to turn your columns into rows (wide to long format). Then use ifelse logic to convert the column names into a number of months. I assigned pre-op to zero months.
library(tidyverse)
dat2 <- dat %>% tidyr::pivot_longer(cols = Preop:PO_1Y)
dat2$nummonths <- ifelse(dat2$name == 'Preop', 0,
                  ifelse(dat2$name == 'PO_1M', 1,
                  ifelse(dat2$name == 'PO_6M', 6,
                  ifelse(dat2$name == 'PO_1Y', 12, NA))))
ggplot(dat2, aes(nummonths, value)) + geom_point() + theme_dark()
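For the base R route mentioned in the question, a rough sketch could look like the following. It is not the approach from the answer above: it swaps the nested ifelse for a named lookup vector and draws one line per patient with matplot(). One value is set to NA on purpose to mimic the missing data, and the months vector is an assumption about how the column names map to time:

```r
# Example data from the question, with one value set to NA to mimic missing data
dat <- data.frame(Preop = c(-2, 0.5, -0.25, 1.5),
                  PO_1M = c(-1.5, 0.2, NA, 1.0),
                  PO_6M = c(-1.2, 0.1, -0.05, 0.5),
                  PO_1Y = c(-1.0, 0.05, 0, 0.25))

# Lookup vector: column name -> months since surgery (pre-op treated as 0)
months <- c(Preop = 0, PO_1M = 1, PO_6M = 6, PO_1Y = 12)
x <- unname(months[names(dat)])

# One line per patient (row); matplot() draws one series per matrix column,
# so the data frame is transposed first
matplot(x, t(as.matrix(dat)), type = "b", pch = 16, lty = 1,
        xaxt = "n", xlab = "Months since surgery", ylab = "Value")
axis(1, at = x, labels = names(dat))
abline(h = 0, lty = 2)  # reference line at y = 0
```

Missing values are skipped when drawing, which leaves a gap in that patient's line rather than raising an error.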
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm trying to calculate the number of retained students using R. The two variables I'm working with are 'registration_date' (mm/dd/yr) and 'date_of_last_login' (mm/dd/yr). A student is considered retained if they logged-in in the preceding 30 days.
ID   registration_date   date_of_last_login
1    2/1/15              2/3/15
2    2/1/15              3/15/15
3    3/15/15             4/30/15
4    2/10/15             4/25/15
5    4/15/15             5/16/15
I imagine the idea is to create a new variable: 'retained students' but I am not sure how to set up the formula in R.
Assuming you mean the 30 days previous to today:
last_login <- c("2/3/15","3/15/15","4/30/15")
login <- as.Date(last_login, format = '%m/%d/%y')
retained_students <- (Sys.Date()-login < 30)
retained_students
retained_students is then a vector with either TRUE or FALSE for each login
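Because Sys.Date() changes every day, the result above is not reproducible. A small sketch with a fixed reference date may be easier to check; the as_of date below is an arbitrary assumption, and the logins are the five from the question:

```r
# Data from the question
students <- data.frame(
  ID = 1:5,
  date_of_last_login = c("2/3/15", "3/15/15", "4/30/15", "4/25/15", "5/16/15")
)
students$date_of_last_login <- as.Date(students$date_of_last_login, format = "%m/%d/%y")

# Hypothetical reference date standing in for "today"
as_of <- as.Date("2015-05-20")

# TRUE if the last login was fewer than 30 days before the reference date
students$retained <- as.numeric(as_of - students$date_of_last_login) < 30

# Number of retained students
sum(students$retained)
```

Swapping as_of for Sys.Date() gives the same behaviour as the answer above.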
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have a problem.
I have a data frame byDays, which consists of two columns: day and money.
Day is a sequence from 0 to 100, and money is the amount of money our customers spent on that day.
I plotted the distribution, but I can't link it because I don't have enough reputation.
I need to find the day(!) to the left of which lies 80% of the area of my distribution.
If you want the point at which 80% of the total is reached this will give you the answer:
set.seed(1)
day <- 1:100
profit <- runif(100, 0, 15)
## Point at which 80% of the total is reached:
pct <- max(day[cumsum(profit) / sum(profit) <= 0.8])
plot(day, cumsum(profit)/sum(profit))
abline(v=pct, col="red")
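Note that "the day where 80% is reached" can be read two ways: the last day on which the cumulative share is still at or below 80%, or the first day on which it reaches 80%. A small sketch with made-up numbers (the money values are hypothetical, chosen so the arithmetic is easy to follow) shows both:

```r
# Hypothetical data: 10 days, total spend of 300
day   <- 0:9
money <- c(10, 20, 30, 40, 50, 50, 30, 30, 20, 20)

# Cumulative share of the total spend up to and including each day
share <- cumsum(money) / sum(money)

# Last day whose cumulative share is still <= 80% (the reading used above)
last_below <- max(day[share <= 0.8])

# First day on which the cumulative share is >= 80%
first_reach <- day[which(share >= 0.8)[1]]

c(last_below = last_below, first_reach = first_reach)
```

Here the cumulative share is about 77% after day 6 and about 87% after day 7, so the two readings differ by one day.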
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm working with a dataset in R where the main area of interest is the date. (It has to do with army skirmishes and the date of the skirmish is recorded). I wanted to check if these were more likely to happen in a given season, or near a holiday, etc, so I want to be able to see how many dates there are in the summer, winter, etc but I'm sort of at a loss for how to do that.
A general recommendation: use the lubridate package to convert strings to dates if you're having trouble with that. Use cut() to divide dates into ranges, like so:
someDates <- c('1-1-2013',
               '2-14-2013',
               '3-5-2013',
               '8-21-2013',
               '9-15-2013',
               '11-28-2013',
               '12-22-2013')
cutpoints <- c('1-1-2013',   # start of range 'winter'
               '3-20-2013',  # spring
               '6-21-2013',  # summer
               '9-23-2013',  # fall
               '12-21-2013', # winter
               '1-1-2014')   # end of range
library(lubridate)
temp <- cut(mdy(someDates),
            mdy(cutpoints),
            labels = FALSE)
someSeasons <- c('winter',
                 'spring',
                 'summer',
                 'fall',
                 'winter')[temp]
Now use 'someSeasons' to group your data into date ranges for your favorite statistical analysis. As for the choice of analysis, Poisson regression adjusting for exposure (i.e. the length of the season) comes to mind, but that is probably a better question for Cross Validated.
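As a first look before any modelling, the season labels can simply be counted. The sketch below repeats the 2013 example using base R's as.Date() instead of lubridate, so it runs without extra packages; the factor levels are only there so that seasons with no events still show up with a count of zero:

```r
# Same example dates and cut points as above, written in ISO format
someDates <- as.Date(c("2013-01-01", "2013-02-14", "2013-03-05", "2013-08-21",
                       "2013-09-15", "2013-11-28", "2013-12-22"))
cutpoints <- as.Date(c("2013-01-01", "2013-03-20", "2013-06-21",
                       "2013-09-23", "2013-12-21", "2014-01-01"))

# Index of the interval each date falls into (intervals are closed on the left)
idx <- cut(someDates, cutpoints, labels = FALSE)
seasons <- c("winter", "spring", "summer", "fall", "winter")[idx]

# Count events per season, keeping seasons that have no events
counts <- table(factor(seasons, levels = c("winter", "spring", "summer", "fall")))
counts
```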
You can make a vector of cut points with regular intervals like so:
cutpoints <- c('3-20-2013',  # spring
               '6-21-2013',  # summer
               '9-23-2013',  # fall
               '12-21-2013') # winter
# breaks are the 2013 cut points shifted by 1 to 5 years; dates outside them come back as NA
temp <- cut(mdy(someDates),
            outer(mdy(cutpoints), years(1:5), `+`),
            labels = FALSE)
someSeasons <- c('spring',
                 'summer',
                 'fall',
                 'winter')[(temp - 1) %% 4 + 1] # the index is just a little tricky...
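If the data spans many years, another option is to ignore the year altogether and classify each date by its month and day. This is a different technique from the cut() approach above, shown only as a sketch: season_of is a hypothetical helper name, and it assumes the same fixed season start dates (3/20, 6/21, 9/23, 12/21) for every year:

```r
# Hypothetical helper: map dates to seasons using month and day only
season_of <- function(d) {
  # Encode month and day as one number, e.g. March 20 -> 320
  md <- as.integer(format(d, "%m")) * 100L + as.integer(format(d, "%d"))
  # Number of season start dates that are <= md (0 to 4)
  idx <- findInterval(md, c(320, 621, 923, 1221))
  c("winter", "spring", "summer", "fall", "winter")[idx + 1]
}

season_of(as.Date(c("2013-01-01", "2015-03-20", "2016-08-21",
                    "2017-11-28", "2018-12-22")))
```

Because no year-specific break points are needed, dates from any year get a label and nothing falls outside the range.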