Disambiguating day of the week in R

I have a certain start time and a specified day of the week.
start = as.POSIXct(1234567, origin = "1970-1-1")
format(start, format = "%A %c")
target1 = "TUE"
target2 = "Wednesday"
What I want is to find the first day after start that falls on the target day of the week, ideally with some flexibility in how the user might write the target day. Any ideas? I imagine a string lookup table might work, but there's gotta be a neater way.
Bonus points if the solution can be made to vectorise....

I haven't tried vectorizing this yet (not sure if I can), but here's an attempt:
find_day <- function(start,target){
target <- tolower(target)
next_week <- as.Date(start) + 1:7
next_week[match(target,substr(tolower(weekdays(next_week)),1,nchar(target)))]
}
It should accept an abbreviation of any length, in any capitalization. How to use it:
> find_day(start,"TUE")
[1] "1970-01-20"
> find_day(start,"friday")
[1] "1970-01-16"
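For the vectorising bonus, here is one possible sketch (not the answer's code). It reads the start weekday from as.POSIXlt()$wday (0 = Sunday) and does the arithmetic modulo 7, so start and target recycle against each other like any R vectors. It assumes English day names, hard-coded in the hypothetical vector day_names, and uses pmatch() for prefix matching, so an ambiguous prefix such as "t" gives NA instead of whichever day happens to come first. The example uses 1970-01-15 (a Thursday), the date part of start in UTC.

```r
# Hypothetical vectorised variant of find_day()
find_day_vec <- function(start, target) {
  # Assumption: targets are English day names or unambiguous prefixes of them
  day_names <- c("sunday", "monday", "tuesday", "wednesday",
                 "thursday", "friday", "saturday")
  start <- as.Date(start)
  # pmatch() returns the index of the unique partial match, or NA;
  # subtract 1 to use the same 0 = Sunday coding as $wday
  target_wday <- pmatch(tolower(target), day_names, duplicates.ok = TRUE) - 1L
  start_wday <- as.POSIXlt(start)$wday
  # Offset of 1..7 days, so a target equal to start's weekday means next week
  start + (target_wday - start_wday - 1L) %% 7L + 1L
}

find_day_vec(as.Date("1970-01-15"), c("TUE", "friday", "Wed"))
# [1] "1970-01-20" "1970-01-16" "1970-01-21"
```

Like the original, a target on the same weekday as start returns the date one week later.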

Related

Change date format with format() in R

So here's a basic algorithm in R that prints out the dates between two dates.
initial_date <- as.Date(toString((readline(prompt = "Enter a starting date in the format year-month-day:"))))
final_date <- as.Date(toString((readline(prompt = "Enter a final in the format year-month-day:"))))
dates <- seq(final_date, initial_date, by = "-1 day")
rev(dates[dates > initial_date & dates < final_date])
max.print = length(dates)
print(dates)
I would like to modify it so that the dates are in the format month-day-year like this: nov 27 2008. So I add "format(dates, format="%b %d %Y")".
initial_date <- as.Date(toString((readline(prompt = "Enter a starting date in the format year-month-day:"))))
final_date <- as.Date(toString((readline(prompt = "Enter a final in the format year-month-day:"))))
dates <- seq(final_date, initial_date, by = "-1 day")
format(dates, format="%b %d %Y")
rev(dates[dates > initial_date & dates < final_date])
max.print = length(dates)
print(dates)
But this keeps printing the same output as the previous code. How do I fix it?
There are a few points being misunderstood here:
1. format(dates, format="%b %d %Y") might format the dates the way you want them to look, but the result is not stored, so the next command using dates sees the object as it was before the call to format(..). format, like most R functions, is functional: its effect is realized only when its return value is stored in an object, and calling the function by itself has no side effect. The "right" way to use format is either to print its result right away (see far below) or to store it in the same or another variable. While I do not recommend doing this, a working use of it would have been
dates <- format(dates, format="%b %d %Y")
2. Ditto for rev(dates[...]): you need to use it immediately (as in print(rev(...)), i.e., as the argument of an immediate function call) or store it somewhere, such as
reversed_dates <- rev(dates[...])
3. In R, dates (proper Date-class objects) are number-like, so one can safely make continuous-number comparisons such as date1 < date2 and date2 >= date3. If you accidentally compare a %Y-%m-%d string with another similarly formatted string, it will still work, because strings are compared lexicographically. When comparing the strings "2020-01-01" and "2019-01-01", R first compares "2" with "2", a tie; likewise the "0"s; then it sees that "2" > "1", so "2019-01-01" sorts before the other.
This still works with strings because the most significant component, the year, comes first, so relative ordering (>, sort, order) is preserved. It keeps working as long as the month and day are zero-padded. It does not work if they are not zero-padded: "2021-2-1" > "2021-11-1" is reported as TRUE, because the comparison reaches the month portion, compares the "2" with the first "1" of "11", and never considers that 11 is larger than 2.
The moment one brings in month names, ordering breaks in the same way, since month names are not in lexicographic order (this is certainly true in English and, I suspect, in many/most western languages; I'm not a polyglot, so I cannot speak for other languages). This means that "2020-Apr-01" < "2020-Jan-01" will again be TRUE, unfortunately.
4. Combine #3 with the fact that R always prints a Date-class object as "%Y-%m-%d": there is no (trivial) way to get it to print a Date-class object as your "%b %d %Y" without either (a) converting it to a string and losing proper ordering, or (b) super-classing it so that it presents the way you want on the console while still being a number underneath.
As for (a), this is a common thing to do for reports and labeling in plots, and I'm perfectly fine with that. I am not trying to convince the world that it should always see a date as %Y-%m-%d. However, what I am saying is that it is much easier to keep it as a proper Date-class object until you actually render it, and then format it at the last second. For this, do all of your filtering and ordering and then print(format(..)), such as this. I recommend this method.
dates <- seq(as.Date("2020-02-02"), as.Date("2020-02-06"), by = "day")
dates <- rev(dates[ dates > as.Date("2020-02-03") ])
print(format(dates, format = "%b %d %Y"))
# [1] "Feb 06 2020" "Feb 05 2020" "Feb 04 2020"
Again, above is the technique I recommend.
As for (b), yes, you can do it, but this approach is fragile: some functions that expect Date-class objects may not recognize these as close enough to keep working, or they may strip the new class, at which point printing reverts to the "%Y-%m-%d" format. You can use the code below, which requires that you change the class (see the ## important! line) of every Date object whose formatting you want to personalize. I recommend against doing this.
format.myDATE <- function(x, ...) { # fashioned after format.Date
xx <- format.Date(x, format = "%b %d %Y")
names(xx) <- names(x)
xx
}
print.myDATE <- function(x, max = NULL, ...) { # fashioned after print.Date
if (is.null(max))
max <- getOption("max.print", 9999L)
if (max < length(x)) {
print(format.myDATE(x[seq_len(max)]), ...)
cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
length(x) - max, "entries ]\n")
} else if (length(x))
print(format.myDATE(x), ...)
else cat(class(x)[1L], "of length 0\n")
invisible(x)
}
dates <- seq(as.Date("2020-02-02"), as.Date("2020-02-06"), by = "day")
class(dates) <- c("myDATE", class(dates)) ## important!
dates <- rev(dates[ dates > as.Date("2020-02-03") ])
print(dates) ## no need for format!
# [1] "Feb 06 2020" "Feb 05 2020" "Feb 04 2020"
### and number-like operations still tend to work
diff(dates)
# Time differences in days
# [1] -1 -1
Again, I recommend against doing this for data that you are working with. Many packages that pretty-print tables and plots and such may choose to override our preference for formatting, so there is no guarantee that this is honored across the board. This is why I suggest "accepting" the R way while working with it, regardless of your locale, and formatting it for your aesthetic preferences immediately before printing/rendering.
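The points above about unstored results and string ordering are easy to check at the console. A small sketch with made-up dates; note that R compares strings using the collation of the current locale (see ?Comparison), although for these particular inputs the results should hold in common locales:

```r
# format() returns a new character vector; `dates` itself is untouched
dates <- as.Date(c("2020-02-04", "2020-02-05"))
format(dates, format = "%b %d %Y")             # printed, but not stored
class(dates)                                   # still "Date"

# Zero-padded %Y-%m-%d strings sort like the dates they represent ...
"2019-01-01" < "2020-01-01"                    # TRUE
# ... but unpadded months are compared character by character
"2021-2-1" > "2021-11-1"                       # TRUE, although February comes first
# ... and month abbreviations are not in calendar order
"2020-Apr-01" < "2020-Jan-01"                  # TRUE
# Proper Date objects compare by their underlying day counts
as.Date("2021-02-01") > as.Date("2021-11-01")  # FALSE
```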
A couple of other minor points:
remove toString; I think it's doing nothing for you here;
your use of max.print = ... suggests you expect it to change how things print, but it only creates a variable named max.print. Most R things that have global options use options(...) for this, so you need to either set it globally in this R session with options(max.print=length(dates)), or set a one-time limit with print(dates, max = length(dates)).
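Putting these points together, here is a sketch of the question's script with the changes applied. The readline() prompts are replaced by fixed dates (an assumption, so the example runs non-interactively); for interactive use, swap them back in without toString().

```r
# Fixed inputs stand in for the readline() prompts; interactively you could use
# as.Date(readline(prompt = "Enter a starting date in the format year-month-day: "))
initial_date <- as.Date("2008-11-25")
final_date <- as.Date("2008-11-30")

dates <- seq(final_date, initial_date, by = "-1 day")
# Keep only the dates strictly between the inputs; rev() restores ascending order.
# The result is assigned, so later lines see the filtered vector.
dates <- rev(dates[dates > initial_date & dates < final_date])

# Format at the last moment, with a one-off print limit via 'max'
print(format(dates, format = "%b %d %Y"), max = length(dates))
```

In an English locale this prints the four dates Nov 26 2008 through Nov 29 2008, while dates itself stays a Date vector for any further filtering.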

as.Date function gives different result in a for loop

Slight problem: my as.Date call gives a different result when I put it in a for loop. I'm looking in a folder with subfolders (one per date) that contain images. I build date_list to organize all the dates (for plotting options at a later stage). The Julian Day counts from the first of January of each year, so because I have 4 years of data, the year must be flexible.
# Set up list with 4 columns and counter Q. jan is used to set all dates to the first of january
date_list <- outer(1:52, 1:4)
q = 1
jan <- "-01-01"
for (scene in folders){
year <- as.numeric(substr(scene, start=10, stop=13))
day <- as.numeric(substr(scene, start=14, stop=16))
datum <- paste(year, day, sep='_')
date_list[q, 1] <- datum
date_list[q, 2] <- year
date_list[q, 3] <- day
date_list[q, 4] <- as.Date(day, origin = as.Date(paste(year,jan, sep="")))
q = q+1
}
Output final row:
[52,] "2016_267" "2016" "267" "17068"
What am I missing in date_list[q, 4] that keeps my integer from being turned into a date?
Running the following code on its own does work, but because of the large number of scenes and folders I'd like to automate this:
as.Date(day, origin = as.Date(paste(year,jan, sep="")))
Thank you for your time!
Well, I assume this would answer your first question:
date_list[q, 4] <- as.character(as.Date(datum,format="%Y_%j"))
as.Date accepts a format argument (the %Y and %j codes are documented in ?strptime); %j is the day of the year (the "Julian day"), which is a little easier to read than using origin and multiple paste calls.
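One detail worth checking (an observation added here, not part of the original answer): as.Date(day, origin = ...) treats the origin itself as day 0, while %j numbers January 1 as day 001, so the question's loop and the %j approach differ by one day. The 17068 in the question's output is the day count for 2016-09-24.

```r
# Question's approach: the origin is day 0, so day 267 lands one day later
as.Date(267, origin = as.Date("2016-01-01"))              # "2016-09-24"
as.numeric(as.Date(267, origin = as.Date("2016-01-01")))  # 17068
# %j approach: January 1 is day 001
as.Date("2016_267", format = "%Y_%j")                     # "2016-09-23"
```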
Your problem is actually linked to what a Date object is:
> dput(as.Date("2016-01-10"))
structure(16810, class = "Date")
When it is entered into a matrix (your date_list), it is coerced to character without any special treatment, as if you did this:
> d<-as.Date("2016-01-10")
> class(d)<-"character"
> d
[1] "16810"
Hence you get only the number of days since 1970-01-01. When you ask for the character representation of the date with as.character, you get the correct value, because the Date class has an as.character method that first computes the date in human-readable form before returning a character value.
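A minimal sketch reproducing the coercion, with a small character matrix standing in for date_list (the names d and m are illustrative):

```r
d <- as.Date("2016-01-10")
m <- matrix("", nrow = 1, ncol = 2)  # character matrix, like date_list after a string is stored
m[1, 1] <- d                         # the Date class is dropped: only the day count "16810" is kept
m[1, 2] <- as.character(d)           # as.character() formats the date first: "2016-01-10"
m
```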
If I understand your problem correctly, I would go this way:
First create a function to work on one string:
name_to_list <- function(name) {
dpart <- substr(name, start=10, stop=16)
date <- as.POSIXlt(dpart, format="%Y%j")
c("datum"=paste(date$year+1900,date$yday+1,sep="_"), "year"=date$year+1900, "julian_day"=date$yday+1, "date"=as.character(date) )
}
This function just gets your substring and converts it to the POSIXlt class, which gives us the year, the day of the year and the date in one pass. Since the year is stored as the number of years since 1900 (which could be negative), we add 1900 when storing the year; since yday counts from 0 (January 1 is day 0), we add 1 to get the Julian day as numbered by %j.
Then, if your folders variable is a vector of strings:
lapply(folders,name_to_list)
which, for folders=c("LC81730382016267LGN00","LC81730382016287LGN00","LC81730382016167LGN00"), gives:
[[1]]
datum year julian_day date
"2016_267" "2016" "267" "2016-09-23"
[[2]]
datum year julian_day date
"2016_287" "2016" "287" "2016-10-13"
[[3]]
datum year julian_day date
"2016_167" "2016" "167" "2016-06-15"
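As an alternative to filling a matrix (a swapped-in approach, not the answer's), a data frame keeps each column's type, so the date column can stay a real Date instead of collapsing to a day count. A sketch using the three example folder names above; date_df is a hypothetical name:

```r
folders <- c("LC81730382016267LGN00", "LC81730382016287LGN00", "LC81730382016167LGN00")
# Characters 10-16 hold the year and day of year, e.g. "2016267"
d <- as.Date(substr(folders, 10, 16), format = "%Y%j")
date_df <- data.frame(
  datum = paste(format(d, "%Y"), format(d, "%j"), sep = "_"),
  year = as.integer(format(d, "%Y")),
  julian_day = as.integer(format(d, "%j")),
  date = d  # stays a Date, unlike a column of a character matrix
)
date_df
```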
Do you mean to output your day as 3 numbers? Should it not be 2 numbers?
day <- as.numeric(substr(scene, start=15, stop=16))
or
day <- as.numeric(substr(scene, start=14, stop=15))
That could at least be part of the issue. Providing an example of what typical values of "scene" are would be helpful here.

Opposite of timeNthNdayInMonth

Trying to find a function which can take a date and tell me which day it is,
e.g. if I input today's date, which is "12/29/2014", it will say "it is the 5th Monday of the month" (it doesn't have to be a string output; 5,1 representing the 5th Monday is fine). It is kinda the opposite of timeNthNdayInMonth in the timeDate library, which gives you the date of the nth given weekday in a month.
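Here's a function f() that returns the week-of-month number (the day of the month divided by 7, rounded up) and the weekday from %w (0 = Sunday, so Monday is 1), separated by a comma: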
f <- function(date)
paste(ceiling(as.numeric(format(date, "%d")) / 7), format(date, "%w"), sep = ",")
f(Sys.Date())
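Since the question allows a numeric answer such as 5,1, here is a hypothetical numeric variant (nth_weekday is an illustrative name) that returns a two-column integer matrix instead of strings, one row per input date. It uses the mday and wday components of POSIXlt; wday has the same 0 = Sunday coding as %w.

```r
# Hypothetical numeric variant of f()
nth_weekday <- function(date) {
  lt <- as.POSIXlt(as.Date(date))
  cbind(nth = (lt$mday - 1L) %/% 7L + 1L,  # same as ceiling(day / 7)
        wday = lt$wday)                     # 0 = Sunday, ..., 6 = Saturday
}

nth_weekday(as.Date(c("2014-12-29", "2014-12-07")))
```

For 2014-12-29 this gives nth 5 and wday 1 (the 5th Monday); for 2014-12-07 it gives nth 1 and wday 0 (the 1st Sunday).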

converting numbers to time

I entered my data by hand, and to save time I didn't include any punctuation in my times. So, for example, I entered 8:32am as 832 and 3:34pm as 1534. I'm trying to use the 'chron' package (http://cran.r-project.org/web/packages/chron/chron.pdf) in R to convert these to time format, but chron seems to require a delimiter between the hour and minute values. How can I work around this, or use another package to convert my numbers into times?
And if you'd like to criticize me for asking a question that's already been answered before, please provide a link to said answer, because I've searched and haven't been able to find it. Then criticize away.
I don't think you necessarily need the chron package. Given:
x <- c(834, 1534)
Then:
time <- format(as.POSIXct(sprintf("%04.0f", x), format='%H%M'), '%H:%M')
time
[1] "08:34" "15:34"
should give you the desired result. If you also want to include a variable that represents the date, you can use the following line of code (assuming df$yymmdd holds dates written like 20141229; the format has no %S because the times have no seconds):
df$datetime <- as.POSIXct(paste(df$yymmdd, sprintf("%04.0f", df$x)), format='%Y%m%d %H%M')
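A self-contained sketch of combining a date column with the unpunctuated times. The data frame, its values, and the tz = "UTC" choice are hypothetical, picked so the result does not depend on the local time zone:

```r
# Hypothetical data: dates as YYYYMMDD strings, times typed without a colon
df <- data.frame(yymmdd = c("20141229", "20141230"), x = c(834, 1534))
# Zero-pad the times to four digits, then parse date and time together.
# The format has no %S because the input has no seconds.
df$datetime <- as.POSIXct(paste(df$yymmdd, sprintf("%04.0f", df$x)),
                          format = "%Y%m%d %H%M", tz = "UTC")
format(df$datetime, "%Y-%m-%d %H:%M")
# [1] "2014-12-29 08:34" "2014-12-30 15:34"
```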
Here's a sub solution using a regular expression:
set.seed(1); times <- paste0(sample(0:23,10), sample(0:59,10)) # ex. data
sub("(\\d+)(\\d{2})", "\\1:\\2", times) # put in delimiter
# [1] "6:12" "8:10" "12:39" "19:21" "4:43" "17:27" "18:38" "11:52" "10:19" "0:57"
Say
x <- c('834', '1534')
The last two characters represent minutes, so you can extract them using
mins <- substr(x, nchar(x)-1, nchar(x))
Similarly, extract hours with
hour <- substr(x, 1, nchar(x)-2)
Then create a fixed vector of time values with
time <- paste0(hour, ':', mins)
I think chron requires you to specify dates, so, assuming a date value, you can convert to chron with this:
chron(dates.=rep('02/02/02', 2),
times.=paste0(hour, ':', mins, ':00'),
format=c(dates='m/d/y',times='h:m:s'))
I thought I'd throw out a non-regex solution that uses lubridate. This is probably overkill.
library(lubridate)
library(stringr)
time.orig <- c('834', '1534')
# zero pad times before noon
time.padded <- str_pad(time.orig, 4, pad="0")
# parse using lubridate
time.period <- hm(time.padded)
# make it look like time
time.pretty <- paste(hour(time.period), minute(time.period), sep=":")
And you end up with
> time.pretty
[1] "8:34" "15:34"
Here are two solutions that do not use regular expressions:
library(chron)
x <- c(832, 1534, 101, 110) # test data
# 1
times( sprintf( "%d:%02d:00", x %/% 100, x %% 100 ) )
# 2
times( ( x %/% 100 + x %% 100 / 60 ) / 24 )
Either gives the following chron "times" object:
[1] 08:32:00 15:34:00 01:01:00 01:10:00
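If chron isn't needed, the same integer arithmetic works with base R's sprintf() alone. This sketch returns "HH:MM" character strings rather than a time class, using the same test data:

```r
x <- c(832, 1534, 101, 110)  # times typed as HMM or HHMM
# %/% 100 gives the hours, %% 100 the minutes; %02d zero-pads both
sprintf("%02d:%02d", x %/% 100, x %% 100)
# [1] "08:32" "15:34" "01:01" "01:10"
```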

R - character string with week-Year: week is lost when converting to Date format

I have a character string of the date in Year-week format as such:
weeks.strings <- c("2002-26", "2002-27", "2002-28", "2002-29", "2002-30", "2002-31")
However, converting this character to Date class results in a loss of week identifier:
> as.Date(weeks.strings, format="%Y-%U")
[1] "2002-08-28" "2002-08-28" "2002-08-28" "2002-08-28" "2002-08-28"
[6] "2002-08-28"
As shown above, the result is the year combined with today's month and day, so any information about the original week is lost (e.g., when using the format function or strptime to try to convert back to the original format).
One solution I found in a help group is to specify the day of the week:
as.Date(weeks.strings, format="%Y-%u %U")
[1] "2002-02-12" "2002-02-19" "2002-02-26" "2002-03-05" "2002-01-02"
[6] "2002-01-09"
But it looks like this results in incorrect week numbering (doesn't match the original string).
Any guidance would be appreciated.
You just need to add a weekday to your weeks.strings in order to make the dates unambiguous (adapted from Jim Holtman's answer on R-help).
as.Date(paste(weeks.strings,1),"%Y-%U %u")
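A quick round-trip check of this approach for the question's weeks: with %U, weeks start on Sunday, so the Monday chosen by %u 1 falls inside the same week and formatting it back with "%Y-%U" should reproduce the original strings.

```r
weeks.strings <- c("2002-26", "2002-27", "2002-28", "2002-29", "2002-30", "2002-31")
# Append weekday 1 (Monday, per %u) so each year-week names a single date
d <- as.Date(paste(weeks.strings, 1), "%Y-%U %u")
d
# [1] "2002-07-01" "2002-07-08" "2002-07-15" "2002-07-22" "2002-07-29" "2002-08-05"
# Formatting back with %Y-%U recovers the original strings
format(d, "%Y-%U")
```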
As pointed out in the comments, the Date class is not appropriate if the dates span a long horizon because, at some point, the chosen weekday will not exist in the first/last week of the year. In that case you could use a numeric vector where the whole-number part is the year and the fractional part is the week as a fraction of the year (week/53). For example:
wkstr <- sprintf("%d-%02d", rep(2000:2012,each=53), 0:52)
yrwk <- lapply(strsplit(wkstr, "-"), as.numeric)
yrwk <- sapply(yrwk, function(x) x[1]+x[2]/53)
Obviously, there's no unique solution, since each week could be represented by any of up to 7 different dates. That said, here's one idea:
weeks.strings <- c("2002-26", "2002-27", "2002-28", "2002-29",
"2002-30", "2002-31")
x <- as.Date("2002-1-1", format="%Y-%m-%d") + (0:52*7)
x[match(weeks.strings, format(x, "%Y-%U"))]
# [1] "2002-07-02" "2002-07-09" "2002-07-16" "2002-07-23"
# [5] "2002-07-30" "2002-08-06"
