Categorizing dates in R [closed]

I'm working with a dataset in R where the main variable of interest is the date. (The data describe army skirmishes, and the date of each skirmish is recorded.) I want to check whether skirmishes were more likely to happen in a given season, near a holiday, etc., so I need to count how many dates fall in the summer, winter, and so on, but I'm at a loss for how to do that.

A general recommendation: use the lubridate package for converting strings to dates if you're having trouble with that. Then use cut() to divide the dates into ranges, like so:
someDates <- c('1-1-2013',
               '2-14-2013',
               '3-5-2013',
               '8-21-2013',
               '9-15-2013',
               '11-28-2013',
               '12-22-2013')
cutpoints <- c('1-1-2013',   # start of range 'winter'
               '3-20-2013',  # spring
               '6-21-2013',  # summer
               '9-23-2013',  # fall
               '12-21-2013', # winter
               '1-1-2014')   # end of range
library(lubridate)
# labels = FALSE returns the index of the interval each date falls into
temp <- cut(mdy(someDates),
            mdy(cutpoints),
            labels = FALSE)
someSeasons <- c('winter',
                 'spring',
                 'summer',
                 'fall',
                 'winter')[temp]
Now use someSeasons to group your data into date ranges for your favorite statistical analysis. As for the choice of analysis, Poisson regression adjusting for exposure (i.e. the length of each season) comes to mind, but that is probably a better question for Cross Validated.
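As a rough illustration of "adjusting for exposure", here is a minimal base R sketch that counts the example dates per season and divides by the number of days in each season. It rebuilds the example with as.Date() so it runs without lubridate; the names idx, counts, days and rate are made up for this sketch, and it is a starting point rather than a substitute for a proper Poisson model.

```r
# Same dates and cut points as in the example, written in ISO format
someDates <- as.Date(c('2013-01-01', '2013-02-14', '2013-03-05', '2013-08-21',
                       '2013-09-15', '2013-11-28', '2013-12-22'))
cutpoints <- as.Date(c('2013-01-01', '2013-03-20', '2013-06-21',
                       '2013-09-23', '2013-12-21', '2014-01-01'))
labels <- c('winter', 'spring', 'summer', 'fall', 'winter')

# Interval index per date, then season label as a factor with a fixed level order
idx <- cut(someDates, cutpoints, labels = FALSE)
seasons <- factor(labels[idx], levels = c('winter', 'spring', 'summer', 'fall'))

# Number of events per season (seasons with no events show up as 0)
counts <- table(seasons)

# Exposure: total days covered by each season (the two winter pieces are summed)
days <- tapply(as.numeric(diff(cutpoints)),
               factor(labels, levels = levels(seasons)), sum)

# Events per day of exposure
rate <- as.vector(counts) / as.vector(days)
print(data.frame(season = levels(seasons), n = as.vector(counts),
                 days = as.vector(days), rate = rate))
```

The days column is what you would pass (on the log scale) as an offset if you go on to fit a Poisson regression.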
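To cover several years, you can repeat the same four cut points for each year, like so:
cutpoints <- mdy(c('3-20-2013',   # spring
                   '6-21-2013',   # summer
                   '9-23-2013',   # fall
                   '12-21-2013')) # winter
# shift the 2013 cut points by -1 to +4 years (2012 through 2017);
# starting one year early puts January-March 2013 inside a winter interval
breaks <- sort(do.call(c, lapply(-1:4, function(k) cutpoints + years(k))))
temp <- cut(mdy(someDates),
            breaks,
            labels = FALSE)
someSeasons <- c('spring',
                 'summer',
                 'fall',
                 'winter')[(temp - 1) %% 4 + 1] # the index is just a little tricky...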
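If the season boundaries are assumed to fall on the same calendar days every year (as in the cut points used here), a simpler route is to ignore the year and compare only month and day. The following is a minimal base R sketch under that assumption; season_of is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: season from month and day only, using the same
# fixed boundaries as the example (3-20, 6-21, 9-23, 12-21)
season_of <- function(d) {
  md <- as.integer(format(d, "%m%d"))  # e.g. 2013-03-05 becomes 305
  ifelse(md >= 1221 | md < 320, "winter",
  ifelse(md < 621, "spring",
  ifelse(md < 923, "summer", "fall")))
}

dates <- as.Date(c("2013-01-01", "2015-07-04", "2020-12-25",
                   "2016-03-20", "2014-10-31"))
print(season_of(dates))
```

This works for any year without building a vector of breaks, at the cost of not tracking the small year-to-year shifts in the actual equinox and solstice dates.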


Plotting measurements over time in R [closed]

I am trying to create a plot in R that shows post-surgical outcomes over time. I want to plot a certain data point at pre-op, 1 month post-op, 6 months post-op, etc. Here is an example dataframe:
dat <- data.frame(Preop=c(-2,0.5,-0.25,1.5), PO_1M=c(-1.5,0.2,-0.1,1.0), PO_6M=c(-1.2,0.1,-0.05,0.5), PO_1Y=c(-1.0,0.05,0,0.25))
dat
Ideally, the x axis will have markings for the time (pre-op, 1 month post-op, etc.), and the y axis will have the value at that time. The data should converge around y=0, coming from either the positive or the negative direction.
My actual dataframe also has many missing values, so this would need to be accounted for somehow. I would appreciate if anyone could help approach this problem using either ggplot or base R plotting functions. Thanks so much!
Your data should be restructured. Use the tidyr package to turn your columns into rows, then use ifelse logic to convert the column names into a number of months. I assigned pre-op to zero months.
library(tidyverse)
# reshape from one column per time point to one row per measurement
dat2 <- dat %>% tidyr::pivot_longer(cols = Preop:PO_1Y)
# map each original column name to months since surgery (pre-op = 0)
dat2$nummonths <- ifelse(dat2$name == 'Preop', 0,
                  ifelse(dat2$name == 'PO_1M', 1,
                  ifelse(dat2$name == 'PO_6M', 6,
                  ifelse(dat2$name == 'PO_1Y', 12, NA))))
ggplot(dat2, aes(nummonths, value)) + geom_point() + theme_dark()
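Since the question also allows base R, here is a minimal sketch that draws one line per patient with matplot(). It uses the example data frame with one value replaced by NA (my assumption, only to show how missing values behave); months and y are names made up for this sketch. Missing values are simply skipped, leaving a gap in that patient's line.

```r
# Example data from the question, with one value set to NA for illustration
dat <- data.frame(Preop = c(-2, 0.5, -0.25, 1.5),
                  PO_1M = c(-1.5, 0.2, NA, 1.0),
                  PO_6M = c(-1.2, 0.1, -0.05, 0.5),
                  PO_1Y = c(-1.0, 0.05, 0, 0.25))

# Months since surgery for each column (pre-op = 0)
months <- c(Preop = 0, PO_1M = 1, PO_6M = 6, PO_1Y = 12)

# matplot() draws one series per column, so put time points in rows
y <- t(as.matrix(dat))

matplot(months[names(dat)], y, type = "b", pch = 19, lty = 1,
        xaxt = "n", xlab = "Time since surgery", ylab = "Value")
axis(1, at = months, labels = c("Pre-op", "1 month", "6 months", "1 year"))
abline(h = 0, lty = 2)  # reference line the values should converge to
```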

calculating timeframe accuracy in R [closed]

I am trying to measure the accuracy of predicted timeframes for hospital discharge.
For example, I think Mr. Smith will be discharged within 3-7 days, which would mean any discharge date from 11/9 to 11/13 would be correct. If he is discharged in 2 days, I would say I was 1 day off, and if he is discharged in 10 days, I was 3 days off...
Is there any good method to do this using dplyr, base R, and lubridate? TIA. Sample data was provided at a link.
A possible solution would be to express your need in a case_when.
library(dplyr)
df %>%
  dplyr::mutate(DIF = case_when(
    # as.numeric() turns the date differences into plain numbers so that
    # all three branches return the same type
    discharge_calender_date < discharge_prediction_lower_bound ~ as.numeric(discharge_calender_date - discharge_prediction_lower_bound),
    discharge_calender_date <= discharge_prediction_upper_bound ~ 0,
    TRUE ~ as.numeric(discharge_calender_date - discharge_prediction_upper_bound)))
This way you get a negative value if the patient left before the lower bound, zero if he left within the prediction and a positive result if he left after the prediction.
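For a quick check of that sign convention, here is a minimal base R sketch on made-up data. The column names follow the answer; the year 2021 and the three discharge dates are assumptions chosen to mirror the Mr. Smith example (predicted window 11/9 to 11/13), not the linked sample data.

```r
# Toy data: one early discharge, one inside the window, one late
df <- data.frame(
  discharge_calender_date          = as.Date(c("2021-11-08", "2021-11-11", "2021-11-16")),
  discharge_prediction_lower_bound = as.Date("2021-11-09"),
  discharge_prediction_upper_bound = as.Date("2021-11-13")
)

actual <- df$discharge_calender_date
lower  <- df$discharge_prediction_lower_bound
upper  <- df$discharge_prediction_upper_bound

# Days before the lower bound (negative or 0) plus days after the upper bound
# (positive or 0); at most one of the two terms is non-zero for any row
df$DIF <- pmin(as.numeric(actual - lower), 0) + pmax(as.numeric(actual - upper), 0)
print(df$DIF)
```

Taking abs(df$DIF) gives the "days off" figure from the question if the direction of the miss does not matter.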

Duplicating elements of vector based on variable from different dataset [closed]

I have a dataset with monthly returns of various stocks over a certain period of time, where the months are already formatted as consecutively numbered month IDs. To compare those, I have imported a .csv file with the unique one-month interest rates during that time and saved it as a vector. Now I want to add this vector to my dataset. The problem is the difference in length.
My question is: how can I extend this vector to the length of my data by duplicating the elements such that every rate is correctly assigned to the corresponding month?
Say the stock dataset is called stocks, with a variable called months, and the interest vector is called interest. I assume they're both in the same order and cover the same months.
Then attach the months to the interest rates with int_dat <- data.frame(months = unique(stocks$months), interest = interest), and add the rates to the stocks data with stocks_new <- merge(stocks, int_dat, by = 'months', all = TRUE).
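Here is a minimal runnable sketch of that idea on made-up data. The ret column and all values are invented for illustration, and it relies on the same assumption as above: the interest vector has one rate per month, in the order the months first appear in stocks.

```r
# Toy data: five stock-month rows covering three distinct months
stocks <- data.frame(months = c(1, 1, 2, 2, 3),
                     ret    = c(0.02, -0.01, 0.03, 0.00, 0.01))
interest <- c(0.001, 0.002, 0.003)  # one rate per month

# Lookup table: one row per month with its rate
int_dat <- data.frame(months = unique(stocks$months), interest = interest)

# Join on the shared key; every stock row picks up its month's rate
stocks_new <- merge(stocks, int_dat, by = "months", all.x = TRUE)
print(stocks_new)

# Alternative without merge(): index the vector by month position,
# which keeps the original row order of stocks
stocks$interest <- interest[match(stocks$months, unique(stocks$months))]
```

Note that merge() sorts the result by the key column, so use the match() version if the original row order matters.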

conditionally selecting two numeric row values in R [closed]

I'm trying to subset a data set to remove all values before the 7th month of the year 2011. I have Years and Months in different columns.
I know what I am doing is logically wrong (I am also getting the wrong output), but I can't seem to figure out the right way to do this:
state_in2_check <- subset(state_in2, Month > 6 & Year > 2011)
@thelatemail has given you a workable solution in the comments. Your problem is that you're asking R to apply two logical checks independently, when each of those checks actually depends on the other. You won't, for example, get any "January" dates (because you're only accepting months greater than 6), even though "Jan-2013" would be fine, and Year > 2011 also drops all of 2011. @thelatemail's solution separates the checks by year, such that months of 6 or lower will be accepted, as long as they're in years greater than 2011.
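The comment itself isn't reproduced here, so the following is only a sketch of the kind of condition described, run on a made-up state_in2 data frame: keep every row from a year after 2011, plus the rows from 2011 whose month is after June.

```r
# Toy stand-in for the real data
state_in2 <- data.frame(Year  = c(2010, 2011, 2011, 2012, 2013),
                        Month = c(12,   6,    7,    1,    3))

# Later years pass regardless of month; 2011 passes only from July onward
state_in2_check <- subset(state_in2,
                          Year > 2011 | (Year == 2011 & Month > 6))
print(state_in2_check)
```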
Another way would be to convert to a date at the same time as subsetting, which makes the process a little more logical:
Month <- 7
Year <- 2011
as.Date( paste( Year, Month, 15, sep = "-" ) )
[1] "2011-07-15"
You can use that simple conversion to subset in a more (in my opinion) logical way:
state_in2_check <- subset(state_in2,
as.Date( paste( Year, Month, 15, sep = "-" ) ) >
as.Date( "2011-06-15" )
)
Note I've made the day of the month the same in both date conversions, which will mean they're compared only according to month/year.

I found ways to plot a graph in R using plot function. but I am looking to create plot with part of the data [closed]

I am trying to plot a graph in R for data containing the years between 1900 and 2010, with an output value for each month of the year. I need to select years between 1950-2001 against months of Nov-february. How can I select part of the data for plotting this graph?
Since I am a rookie at R programming (or any programming), an easy-to-follow example would be of great help.
thanks
GRV
I am not sure exactly what you mean by "select years between 1950-2001 against months of Nov-february", but the following should get you started on a reproducible example...
#create a vector of months from 1900 through 2010
months <- seq(as.Date("1900/1/1"), as.Date("2010/12/31"), "months")
#assign a random vector of equal length
output <- rnorm(length(months))
#combine both vectors into a data frame
data <- data.frame(months = months, output = output)
Based on your description, your data should look something like the dataframe, called data.
From here, you can make use of the subset function to help you on your way. The first example subsets to data from 1950 through 2001. The next further restricts that subset to the months of November through February.
#subset to just 1950 through 2001
data_sub <- subset(data, months >= as.Date("1950-01-01") & months <= as.Date("2001-12-31"))
#subset the 1950 to 2001 data to just Nov-feb months (i.e. c(11,12,1,2))
data_sub_nf <- subset(data_sub, as.numeric(format(data_sub$months, "%m")) %in% c(11,12,1,2))
You should also read Why is `[` better than `subset`? to move beyond subset.
As stated, after the data has been subset, you can use plot or any other plotting function to graph your data.
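To round this off, here is a minimal self-contained sketch that does the same selection with `[` and a logical vector instead of subset(), and then plots the result. The random output values, the seed, and the helper names yr, mo and keep are assumptions for illustration only.

```r
set.seed(1)  # only so the random example is repeatable

# Monthly dates from 1900 through 2010 with random output values
months <- seq(as.Date("1900-01-01"), as.Date("2010-12-01"), by = "month")
data <- data.frame(months = months, output = rnorm(length(months)))

# Pull out year and month as integers for filtering
yr <- as.integer(format(data$months, "%Y"))
mo <- as.integer(format(data$months, "%m"))

# Keep 1950 through 2001, and only November through February
keep <- yr >= 1950 & yr <= 2001 & mo %in% c(11, 12, 1, 2)
data_sub_nf <- data[keep, ]

# Plot the selected part of the data
plot(data_sub_nf$months, data_sub_nf$output,
     xlab = "Month", ylab = "Output",
     main = "Nov-Feb output, 1950-2001")
```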
