I have data written in a specific format. To simplify it, here is an example I made:
df<-data.frame(date=c(2012034,2012044,2012051,2012063,2012074),
math=c(100,100,23,46,78))
2012034 means the 4th week of March 2012. Likewise, 2012044 means the 4th week of April 2012. I am trying to turn the values of date into something that expresses their order in time. I need this because, when I don't convert them to time values, the x axis of the scatter plot looks really weird.
My goal is this:
Find the oldest date in the date column and label it 1. In this case, 2012034 should be 1. Next, find the second oldest date in the date column and calculate how many weeks have passed since the oldest date. The second oldest date is 2012044, which is 5 weeks after the oldest date 2012034, so it should become 1+5=6. In the same way, I want to number every date to indicate how many weeks have passed since the oldest date.
One way to do it is to also specify the day of the week and subtract it at the end, i.e.
as.Date(paste0(df$date, '-1'), '%Y%m%U-%u') - 1
#[1] "2012-03-22" "2012-04-22" "2012-05-01" "2012-06-15" "2012-07-22"
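To get from dates like these to the week index asked for in the question, one possible base R sketch follows. It assumes the last digit is the week within the month and approximates week w as starting (w - 1) * 7 days after the first of the month; the names approx_date and week_index are made up for illustration. Under that approximation the second date falls 4 full weeks after the first, so it gets index 5 rather than the 6 worked out by hand in the question.

```r
df <- data.frame(date = c(2012034, 2012044, 2012051, 2012063, 2012074),
                 math = c(100, 100, 23, 46, 78))

# Split YYYYMMW into its parts with integer arithmetic
year  <- as.integer(df$date %/% 1000)
month <- as.integer((df$date %/% 10) %% 100)
week  <- as.integer(df$date %% 10)

# Assumption: week w of a month starts (w - 1) * 7 days after the 1st
approx_date <- as.Date(sprintf("%d-%02d-01", year, month)) + (week - 1) * 7

# Whole weeks elapsed since the oldest date, with the oldest date labelled 1
days_since_first <- as.numeric(approx_date - min(approx_date))
df$week_index <- days_since_first %/% 7 + 1

df$week_index
# [1]  1  5  6 13 18
```

Plotting math against week_index (or against approx_date directly) should then give an evenly spaced x axis.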
I am lost with the following problem:
I have a data frame in which one column (STARTED) contains the starting time of a survey, and several others contain information about the survey schedule of that participant (D5 to D10: only the planned survey dates; D17 to D50: planned send-out times of each measurement per day). I'd like to create two columns that indicate which survey day (1-6) and which measurement per day (1-6) this survey corresponds to.
First problem is the format (!)...
STARTED has the format %Y-%m-%d %H:%M:%S, D5 to D10 %d.%m.%Y and D17 to D50 %d.%m.%Y %H:%M.
I tried dmy_hms() from lubridate, parse_date_time(), and simply as.POSIXct(), but I always fail to get STARTED and the D17 to D50 section into a comparable format. Any solutions on this one?
After just separating STARTED into date & time columns, I was able to compare using ifelse() with D5 to D10 and to create the column of day running from 1 to 6.
This could probably be done more elegantly with something like which(), but I was not able to create a vectorized version of it, as which(<<D5:D10>> == STARTED) would need to do the comparison per row. Does anyone have a solution for this?
And lastly, how on earth can I set up the second column indicating the measurement time? The first and last surveys of the day are easy, as they are also uniquely labelled, but for the other four I would need to compare, per day, whether the starting time is before the planned time of the following survey. I could imagine just checking whether STARTED falls between two adjacent planned survey times; with POSIXct objects that might work, if I can parse the different formats.
Help is greatly appreciated, thanks!
A screenshot from the beginning of the data:
[Screenshot from R data using View()]
For these first few rows, the intended variable day would need to be c(1,2,1,1,1,2,2) and measurement c(3,2,4,2,1,2,3).
Your other columns are not formatted with %d.%m.%Y; they are either %d.%m.%y (date only) or %d.%m.%y %H:%M. Note the change from %Y to %y.
Try:
as.Date("20.05.22", format = "%d.%m.%y")
# [1] "2022-05-20"
as.POSIXct("20.05.22 06:00", format = "%d.%m.%y %H:%M")
# [1] "2022-05-20 06:00:00 EDT"
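Once everything parses, the per-row comparison from the question could be sketched roughly as below. The two-row data frame is invented for illustration (only the column names come from the question), the time zone is fixed to UTC as an assumption, and findInterval() relies on each row's planned times being in increasing order.

```r
# Hypothetical data: values are made up, column names follow the question
surveys <- data.frame(
  STARTED = c("2022-05-20 09:15:00", "2022-05-21 13:40:00"),
  D5  = c("20.05.22", "20.05.22"),
  D6  = c("21.05.22", "21.05.22"),
  D17 = c("20.05.22 06:00", "21.05.22 06:00"),
  D18 = c("20.05.22 09:00", "21.05.22 09:00"),
  D19 = c("20.05.22 12:00", "21.05.22 12:00"),
  stringsAsFactors = FALSE
)

started <- as.POSIXct(surveys$STARTED, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Survey day: position of the planned date that equals the start date, per row
planned_days <- sapply(surveys[c("D5", "D6")],
                       function(x) as.numeric(as.Date(x, format = "%d.%m.%y")))
matches <- planned_days == as.numeric(as.Date(started, tz = "UTC"))
surveys$day <- apply(matches, 1, function(r) which(r)[1])

# Measurement: number of planned send-out times at or before the start, per row
planned_times <- sapply(surveys[c("D17", "D18", "D19")], function(x)
  as.numeric(as.POSIXct(x, format = "%d.%m.%y %H:%M", tz = "UTC")))
surveys$measurement <- sapply(seq_len(nrow(surveys)), function(i)
  findInterval(as.numeric(started[i]), planned_times[i, ]))

surveys[c("STARTED", "day", "measurement")]
```

With the real D17 to D50 columns the slot index would still need to be mapped back to a measurement number within the day, which depends on how those columns are laid out.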
I have a couple of week numbers of interest. Let's take '202124' (this week) as an example. How can I subtract x weeks from this week number?
Let's say I want to know the week number 2 weeks earlier; ideally I would like to do 202124 - 2, which would give me 202122. This is fine for most of the year, but 202101 - 2 gives 202099, which is obviously not a valid week number. This would happen on a large scale, so a more elegant solution is required. How could I go about this?
Convert the year-week values to dates, subtract in days, and format the output.
x <- c('202124', '202101')
# %G is the year that belongs to ISO week %V; %Y can disagree with it around New Year
format(as.Date(paste0(x, 1), '%Y%W%u') - 14, '%G%V')
#[1] "202122" "202052"
To convert a year-week value to a date we also need a day of the week; I have used 1, the first day of the week (Monday).
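If the week numbers are ISO 8601 weeks (an assumption; the question does not say), the Monday of a week can also be computed by hand from the rule that ISO week 1 is the week containing January 4. The helper names iso_week_monday and shift_iso_week below are made up for this sketch.

```r
# Monday of an ISO year-week string such as "202124"
iso_week_monday <- function(yw) {
  year <- as.integer(substr(yw, 1, 4))
  week <- as.integer(substr(yw, 5, 6))
  jan4 <- as.Date(sprintf("%d-01-04", year))
  # %u is the weekday number with Monday = 1, so this steps back to Monday
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  week1_monday + (week - 1) * 7
}

# Shift by a number of weeks (negative = earlier) and format as ISO year-week
shift_iso_week <- function(yw, weeks) {
  format(iso_week_monday(yw) + 7 * weeks, "%G%V")
}

shift_iso_week(c("202124", "202101"), -2)
# [1] "202122" "202052"
```

Parsing and formatting both follow the ISO convention here, so the result stays consistent in years where January 1 does not fall on a Monday.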
I want to create a time series with date and quantity as variables. However, I always have zero observations on Sundays. Therefore I want to define the week as 6 days long in R. Any suggestions?
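One possible approach, sketched here with invented data, is to drop the Sunday rows and then tell ts() that a seasonal cycle is 6 observations long. The data frame daily and its values are hypothetical.

```r
# Two weeks of made-up daily data starting on Monday 2023-01-02
daily <- data.frame(
  date     = seq(as.Date("2023-01-02"), by = "day", length.out = 14),
  quantity = c(5, 3, 4, 6, 2, 7, 0, 4, 5, 6, 3, 2, 8, 0)
)

# %u is the weekday number with Monday = 1 and Sunday = 7
no_sunday <- daily[format(daily$date, "%u") != "7", ]

# frequency = 6 makes one "week" six observations long
series <- ts(no_sunday$quantity, frequency = 6)
series
```

The resulting series no longer carries calendar dates, so the original date column is worth keeping alongside it for labelling.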
There is a data frame like this:
The first two columns in the df describe the start date (month and year) and the end date (month and year). Column names describe every single month and year of a certain time period.
I need a function/loop that inserts "1" or "0" in each cell: "1" when the date from the given column name is within the period described by the first two columns, and "0" if not.
I would appreciate any help.
You want to do two different things: (a) create a dummy variable and (b) check whether a particular date is in an interval.
Making a dummy variable is the easier one; in base R you can use ifelse. For example, in the iris data frame:
iris$dummy <- ifelse(iris$Sepal.Width > 2.5, 1, 0)
Working with dates is more complicated. In this answer we will use the lubridate library. First you need to convert all those dates from the 'Month Year' format to something that R can understand. For example, for February you could do:
library(lubridate)
new_format_february_2016 <- interval(ymd('2016-02-01'), ymd('2016-03-01') - dseconds(1))
#[1] 2016-02-01 UTC--2016-02-29 23:59:59 UTC
This is February: the interval of time from the 1st of February to one second before the 1st of March. You can do the same with your start date column and your end date column.
To compare two intervals of time (that is, to see whether a particular month falls into your other intervals) you can do:
int_overlaps(new_format_february_2016, other_interval)
If this returns TRUE, the two intervals (one particular month and another one) overlap. This is not the same as one being inside the other, but in your case it will work. Using this you can iterate over the different columns and rows and build your dummy variable.
But before doing so, I would recommend cleaning your data, as your current format is complicated to work with. To get all the power that vector types in R provide, ideally you want one row per observation and one variable per column. This does not seem to be the case with your data frame. Take a look at the chapter 'Tidy data' of 'R for Data Science', especially the spreading and gathering subsection:
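A base R sketch of that iteration is given below as one possible reading of the question. The data frame periods, its "YYYY-MM" values and the month column names are all invented, and plain Date comparison on the first day of each month is swapped in for lubridate intervals to keep the example self-contained.

```r
# Hypothetical input: start and end as "YYYY-MM", one column to fill per month
periods <- data.frame(start = c("2016-01", "2016-02"),
                      end   = c("2016-02", "2016-03"),
                      stringsAsFactors = FALSE)
month_cols <- c("2016-01", "2016-02", "2016-03")

# Represent each month by its first day so months can be compared as Dates
to_date <- function(ym) as.Date(paste0(ym, "-01"))
start_d <- to_date(periods$start)
end_d   <- to_date(periods$end)

# One vectorised comparison per month column, 1 if the month is in the period
for (m in month_cols) {
  month_d <- to_date(m)
  periods[[m]] <- as.integer(month_d >= start_d & month_d <= end_d)
}

periods
```

Each column is filled in a single vectorised step, so only the month columns need a loop, not the rows.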
Tidy data
Is there a way to window-filter dates by a number of days, excluding weekends?
I know you can use the between function to filter between two specific dates, but I only know one of the two dates. The other date should be 4 days earlier, counted in business days only (not counting weekends).
A pseudo-example of what I am looking for: given this Wednesday, I want to filter everything up to 4 business days beforehand:
window(z, start = as.POSIXct("2017-09-13"), end = as.POSIXct("2017-09-20"))
Another example would be if I am given this Friday's date, the start date would be Monday.
Ideally, I want to be able to play with the window value.
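As a rough sketch, the start of such a window can be found by walking back from the known date and skipping Saturdays and Sundays. The helper name business_days_before is made up, public holidays are ignored, and the end date is assumed to be a weekday itself.

```r
# Date that lies n business days (Mon-Fri) before end_date
business_days_before <- function(end_date, n) {
  # 2 * n + 7 calendar days always contain at least n + 1 weekdays
  candidates <- end_date - 0:(2 * n + 6)
  # %u is the weekday number: 6 = Saturday, 7 = Sunday
  weekdays_only <- candidates[!format(candidates, "%u") %in% c("6", "7")]
  weekdays_only[n + 1]
}

end_date   <- as.Date("2017-09-22")            # a Friday
start_date <- business_days_before(end_date, 4)
start_date
# [1] "2017-09-18"
```

The two dates can then be passed as start and end to window(), or used in an ordinary date comparison, and n is the window size to play with.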