R: Smaller than operator wrong with dates

I have a data frame climdata in which one of the columns contains hourly measurement timestamps. The format is "2015-12-30 17:00:00".
With my current dataset, the range goes from "2015-01-01 00:00:00" to "2016-11-15 07:00:00" and I want to extract the dates from 2015*.
I used this code to extract 2015**:
climdata <- climdata[climdata$Meetdatum > paste(2015, "-01-01", sep = "") &
climdata$Meetdatum < paste(2015, "-12-31", sep = ""),]
The new climdata ranges from "2015-01-01 00:00:00" to "2015-12-30 22:00:00".
This is odd, because I asked for dates after 1/1/'15. However this is not a real problem, since I need that day anyway.
The problem is the last part. I need 31/12/'15, but when I use <= instead of < I get "2015-12-30 23:00:00" as last element. Which makes no sense at all. First of all, this element should've already been in the dataset when I used < and second, where did the equal to 31/12/'15 go?
*This process should later be automated: I'll have a bigger dataset and the year to be extracted will be chosen from a selectInput() in a shiny app.
**I used paste() because 2015 will be replaced by the inputId later on.
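One likely contributor to the missing day: "2015-12-31" on its own stands for midnight at the start of 31 December, so neither < nor <= can match anything later on that day. A minimal sketch of a half-open year filter follows, assuming Meetdatum is stored as character in the format shown and that UTC is an acceptable time zone. The sample climdata and the year variable are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical stand-in for climdata: hourly stamps across the 2015/2016 boundary
climdata <- data.frame(
  Meetdatum = format(seq(as.POSIXct("2015-12-30 00:00:00", tz = "UTC"),
                         as.POSIXct("2016-01-02 00:00:00", tz = "UTC"),
                         by = "hour"),
                     "%Y-%m-%d %H:%M:%S"),
  stringsAsFactors = FALSE
)

year <- 2015  # would later come from the shiny input (assumption)

# Parse once, with an explicit time zone, instead of comparing strings
stamps <- as.POSIXct(climdata$Meetdatum, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Half-open interval: [1 Jan of year, 1 Jan of year + 1)
from <- as.POSIXct(paste0(year, "-01-01 00:00:00"), tz = "UTC")
to   <- as.POSIXct(paste0(year + 1, "-01-01 00:00:00"), tz = "UTC")

subset2015 <- climdata[stamps >= from & stamps < to, , drop = FALSE]
range(subset2015$Meetdatum)
```

Using the first instant of the next year as an exclusive upper bound avoids having to decide what the last timestamp of 31 December is.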

Related

Formatting 24-hour time variable to capture observations in different ranges

I currently have a data frame with a column for Start.Time (imported from a *.csv file), and the format is in 24 hour format (e.g., 20:00:00 equals 8pm). My goal is to capture observations with a start time in various intervals (e.g., between 9:00:00 and 10:00:00), which also meet other criteria. However, it seems that R sorts this 'character' variable in a way that does not align with how our day goes (e.g., 14:00:00 is considered a lower value than 9:00:00).
For example, below is a line of code that works as intended, where I am capturing observations on two different trail segments, which had a start time between 8:00:00 and 9:00:00.
RLLtoMist8.9<-sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) &
(dataset1$Start.Time>="8:00" & dataset1$Start.Time < "9:00"),
na.rm=TRUE)
RLLtoMist8.9
But, this code below does not work as intended, as R is 'valuing' 9:00:00 as greater than 10:00:00.
RLLtoMist9.10 <-
sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) &
(dataset1$Start.Time>="9:00:00 AM" & dataset1$Start.Time < "10:00:00 AM"),
na.rm=TRUE)
It's certainly true that character types are sorted so that "14:00" is less than "9:00". However R has a datetime class which would sort times correctly once a character representation has been parsed.
a <- as.POSIXct("14:00", format="%H:%M")
b <- as.POSIXct("8:00", format="%H:%M")
# test
> a < b
[1] FALSE
You would be able to convert an entire column with:
dataset1$Start.Time <- as.POSIXct(dataset1$Start.Time, format="%H:%M")
The dates of a and b were the system date at the time of conversion, so if you printed them you would see dates and times in the default format. There are packages, such as chron, that let you use just times, but POSIXt objects necessarily have both dates and times. See ?DateTimeClasses. The lubridate package also has an 'interval' class, and there exists a difftime function in base R.
There are also seq.POSIXt and cut.POSIXt functions, either of which could be used to create multiple time or date boundaries for categorical transformations of datetimes.
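To make the parse-then-compare idea concrete, here is a small sketch. It assumes the times are pinned to a fixed dummy date (2000-01-01) and to UTC so the result does not depend on when or where the code runs. The vectors start_chr and segment and the helper to_time() are hypothetical, standing in for dataset1$Start.Time and dataset1$Trail.Segment.

```r
# Hypothetical sample data
start_chr <- c("8:15:00", "9:05:00", "9:59:59", "10:00:00", "14:00:00")
segment   <- c(52, 55, 40, 52, 55)

# Helper (illustrative only): attach a fixed dummy date and parse as POSIXct
to_time <- function(x) {
  as.POSIXct(paste("2000-01-01", x), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

start <- to_time(start_chr)

# As character, "14:00:00" sorts before "9:00:00"; as POSIXct it does not
"14:00:00" < "9:00:00"
to_time("14:00:00") < to_time("9:00:00")

# Count observations on segments 52 or 55 that start in [9:00, 10:00)
in_window <- start >= to_time("9:00:00") & start < to_time("10:00:00")
n_9_10 <- sum(segment %in% c(52, 55) & in_window, na.rm = TRUE)
n_9_10
```

Pinning the date explicitly means both sides of every comparison share the same day, which is not guaranteed if the column and the boundaries are converted on different calendar days.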
Using the data.table library:
library(data.table)
# convert to data table
dataset1 <- data.table(dataset1)
# parse to a date-time class rather than character
dataset1[, Start.Time := as.POSIXct(Start.Time, format="%H:%M:%S")]
# now do your filtering (note: between() includes both endpoints by default)
dataset1[between(Start.Time, as.POSIXct("09:00:00", format="%H:%M:%S"), as.POSIXct("10:00:00", format="%H:%M:%S")) & (Trail.Segment==52 | Trail.Segment==55)]

Mixed Date formats in R data frame

How do you work with a column of mixed date formats, for example 8/2/2020 and 2/7/2020, where all the dates are in February?
I have tried zoo::as.Date(mixeddatescolumn, "%d/%m/%Y"). The first one is right but the second is wrong.
I have also tried the solutions in
"Fixing mixed date formats in data frame?", but that question seems different from what I am handling.
It is really tricky, even for a human, to know whether a date like '8/2/2020' is the 8th of February or the 2nd of August. However, we can leverage the fact that you know all these dates are in February: remove the "2" part that represents the month, rearrange the date into one standard format, and then convert it to an actual Date object.
x <- c('8/2/2020','2/7/2020')
lubridate::mdy(paste0('2/', sub('2/', '', x, fixed = TRUE)))
#[1] "2020-02-08" "2020-02-07"
Or same in base R :
as.Date(paste0('2/', sub('2/', '', x, fixed = TRUE)), "%m/%d/%Y")
Since we know that every date is in February, search for /2/ or /02/. If it is found, the middle number is the month; otherwise, the first number is the month. In either case, set the format appropriately and use as.Date. No packages are used.
dates <- c("8/2/2020", "2/7/2020", "2/28/2000", "28/2/2000") # test data
as.Date(dates, ifelse(grepl("/0?2/", dates), "%d/%m/%Y", "%m/%d/%Y"))
## [1] "2020-02-08" "2020-02-07" "2000-02-28" "2000-02-28"
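If this needs to run on a whole column, the same idea can be wrapped in a small function with a sanity check. This is only a sketch: parse_feb_dates() is a hypothetical name, and it assumes, as in the question, that every value really is a February date written with "/" separators.

```r
# Hypothetical helper: pick the format per element, then verify the assumption
parse_feb_dates <- function(x) {
  # If the middle field is 2 or 02, treat the string as day/month/year
  fmt <- ifelse(grepl("/0?2/", x), "%d/%m/%Y", "%m/%d/%Y")
  out <- as.Date(x, format = fmt)
  # Fail loudly if anything parsed to a month other than February
  stopifnot(all(format(out, "%m") == "02", na.rm = TRUE))
  out
}

dates <- c("8/2/2020", "2/7/2020", "2/28/2000", "28/2/2000")
parsed <- parse_feb_dates(dates)
parsed
```

The check does not make ambiguous input safe; it only catches values that break the "everything is in February" assumption.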

How to work with POSIXlt in R

I am trying to do some analysis with a csv file that I have loaded into R. I was doing the following to access specific values via test[[3]][[1]] for example to get the specific value:
test <- read.csv(file = "test.csv")
test <- data.frame(lapply(test, as.character), stringsAsFactors=FALSE)
Otherwise I would have gotten something like this:
> chicago[[3]][[1]]
[1] 08/02/2002 11:00:00 AM
19747 Levels: 01/01/2001 03:49:00 AM 01/01/2001 06:17:00 PM 01/01/2001 12:00:00 AM ... 12/31/2015 11:46:00 AM
Since one column is saving dates I was converting it to POSIXlt.
test[[3]] <- strptime(test[[3]], format='%m/%d/%Y %I:%M:%S %p')
The values are now being changed as expected, for example:
01/28/2004 06:30:00 PM -> 2004-01-28 18:30:00
Trying to access the values now, I realised though that for example test[[3]][[1]] doesn't give the specific date - instead I get a list that contains every second of each row.
Testing around a bit, I found out that the POSIXlt type is a bit "different", meaning the value mentioned above seems to be some kind of list, like this:
> unlist(unclass(value))
sec min hour mday mon year wday yday isdst zone gmtoff
"0" "0" "11" "2" "7" "102" "5" "213" "1" "CEST" NA
So my question is: is there a way to get values like "2004-01-28 18:30:00" instead of a list about the whole column?
You are making your life too difficult. You can parse to either Date or Datetime for an entire column. No need for lapply.
In general, you do not want the POSIXlt representation. Look into an existing package such as my (relatively recent) anytime package (also on CRAN), which even converts from factor for you and does not require explicit format strings, origin values or other holdups.
But as your post does not contain a reproducible example, I cannot help with more concrete steps.
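For illustration, a minimal base R sketch of the POSIXct route is below. The two-row data frame is made up to mimic the column described in the question, the time zone is set to UTC as an assumption, and parsing %p relies on a locale that recognises AM/PM (for example C or English).

```r
# Hypothetical stand-in for the CSV: third column holds the date-time strings
test <- data.frame(
  id   = 1:2,
  type = c("a", "b"),
  when = c("08/02/2002 11:00:00 AM", "01/28/2004 06:30:00 PM"),
  stringsAsFactors = FALSE
)

# as.POSIXct stores one number per row, so element access works as expected;
# strptime() alone returns POSIXlt, which is a list of components
test[[3]] <- as.POSIXct(test[[3]], format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

second_value <- test[[3]][[2]]
format(second_value, "%Y-%m-%d %H:%M:%S")
```

If a column has already been converted with strptime(), wrapping it in as.POSIXct() should give the same atomic representation.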

Strip the date and keep the time

Lots of people ask how to strip the time and keep the date, but what about the other way around? Given:
myDateTime <- "11/02/2014 14:22:45"
I would like to see:
myTime
[1] "14:22:45"
Time zone not necessary.
I've already tried (from other answers)
as.POSIXct(substr(myDateTime, 12,19),format="%H:%M:%S")
[1] "2013-04-13 14:22:45 NZST"
The purpose is to analyse events recorded over several days by time of day only.
Thanks
Edit:
It turns out there's no pure "time" object, so every time must also have a date.
In the end I used
as.POSIXct(as.numeric(as.POSIXct(myDateTime)) %% 86400, origin = "2000-01-01")
rather than the character solution, because I need to do arithmetic on the results. This solution is similar to my original one, except that the date can be controlled consistently - "2000-01-01" in this case, whereas my attempt just used the current date at runtime.
I think you're looking for the format function.
(x <- strptime(myDateTime, format="%d/%m/%Y %H:%M:%S"))
#[1] "2014-02-11 14:22:45"
format(x, "%H:%M:%S")
#[1] "14:22:45"
That's character, not "time", but would work with something like aggregate if that's what you mean by "analyse events recorded over several days by time of day only."
If the time within a GMT day is useful for your problem, you can get this with %%, the remainder operator, taking the remainder modulo 86400 (the number of seconds in a day).
stamps <- c("2013-04-12 19:00:00", "2010-04-01 19:00:01", "2018-06-18 19:00:02")
as.numeric(as.POSIXct(stamps)) %% 86400
## [1] 0 1 2
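The remainder approach depends on the time zone, because POSIXct counts seconds from midnight UTC. A short sketch that makes this explicit, assuming the timestamp is day/month/year and can be treated as UTC (both are assumptions about the data):

```r
myDateTime <- "11/02/2014 14:22:45"

# Parse with an explicit format and time zone
x <- as.POSIXct(myDateTime, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Seconds since midnight UTC: 14*3600 + 22*60 + 45
secs <- as.numeric(x) %% 86400
secs

# Re-attach a fixed dummy date so arithmetic and comparisons still work
time_only <- as.POSIXct(secs, origin = "2000-01-01", tz = "UTC")
format(time_only, "%Y-%m-%d %H:%M:%S")
```

For timestamps in another zone, the remainder gives the time within the UTC day rather than the local day, so the local offset would need to be accounted for separately.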

Converting time format to numeric with R

In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a numeric value (e.g., the Julian time), perhaps something like the number of seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit, the answer is clearly what @joran wrote, except that you would first need to convert to a DateTime class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
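Because the result depends on the time zone used for parsing, it can help to pass tz explicitly. A small sketch, assuming the system has the usual time zone database (the zone names are examples, not something from the question):

```r
begin <- "2001-03-13 10:31:00"

# Interpret the string as UTC
secs_utc <- as.numeric(as.POSIXct(begin, tz = "UTC"))
secs_utc  # 984479460

# Interpret the same string as US Eastern time (UTC-5 on that date)
secs_ny <- as.numeric(as.POSIXct(begin, tz = "America/New_York"))

# The two readings of the same wall-clock string differ by the zone offset
secs_ny - secs_utc
```

The value 984497460 shown earlier is consistent with a local zone five hours behind UTC.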
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() will parse your data into a period object, which as.numeric() then converts into seconds. See full documentation.
I tried this because I had trouble with data which was in that format but over 24 hours.
The example from ?as.POSIX help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))
