How to work with POSIXlt in R - r

I am trying to do some analysis with a csv file that I have loaded into R. I was doing the following to access specific values via test[[3]][[1]] for example to get the specific value:
test <- read.csv(file = "test.csv")
test <- data.frame(lapply(test, as.character), stringsAsFactors=FALSE)
Otherwise I would have gotten something like this:
> chicago[[3]][[1]]
[1] 08/02/2002 11:00:00 AM
19747 Levels: 01/01/2001 03:49:00 AM 01/01/2001 06:17:00 PM 01/01/2001 12:00:00 AM ... 12/31/2015 11:46:00 AM
Since one column is saving dates I was converting it to POSIXlt.
test[[3]] <- strptime(test[[3]], format='%m/%d/%Y %I:%M:%S %p')
The values are now being changed as expected, for example:
01/28/2004 06:30:00 PM -> 2004-01-28 18:30:00
Trying to access the values now, I realised though that for example test[[3]][[1]] doesn't give the specific date - instead I get a list that contains every second of each row.
Testing a bit around, I found out that the POSIXit type is a bit "different"; meaning the value mentioned above seems to be some kind of list, being like this:
> unlist(unclass(value))
sec min hour mday mon year wday yday isdst zone gmtoff
"0" "0" "11" "2" "7" "102" "5" "213" "1" "CEST" NA
So my question is: is there a way to get values like "2004-01-28 18:30:00" instead of a list about the whole column?

You are making your life too difficult. You can parse to either Date or Datetime for an entire column. No need for lapply.
You (in general) do not want POSIXlt representation. Look into existing package such as my (relatively recent) anytime package (also on CRAN) which even converts from factor for you -- and does not require explicit format strings, origin values or other holdups.
But as your post does not contain a reproducible example I cannot help with more concrete steps.

Related

Convert time format from chr to hours/minutes/seconds in R

See link attached. I want to change it form character to a time format so i can add a column with day/night.
Values are in this format: 00:30:00
enter image description here
Not really sure what you're going for here, but...
The as.ITime(...) function in the data.table package does this:
as.ITime('00:30:00')
## [1] "00:30:00"
The result is stored internally as seconds:
as.integer(as.ITime('00:30:00'))
## [1] 1800

Recode "date & time variable" into two separate variables

I'm a PhD student (not that experienced in R), and I'm trying to recode a string variable, called RecordedDate into two separate variables: a Date variable and a Time variable. I am using RStudio.
An example of values are:
8/6/2018 18:56
7/26/2018 10:43
7/28/2018 8:36
I would like to you the first part of the value (example: 08/6/2018) to reformat this into a date variable, and the second part of the value (example: 18:56) into a time variable.
I'm thinking the first step would be to create code that can break this up into two variables, based on some rule. I’m thinking maybe I can separate separate everything before the "space" into the Date variable, and after the "space" in the Time variable. I am not able to figure this out.
Then, I'm looking for code that would change the Date from a "string" variable to a "date" type variable. I’m not sure if this is correct, but I’m thinking something like:
better_date <- as.Date(Date, "%m/%d/%Y")
Finally, then I would like to change theTime variable to a "time" type format (if this exists). Not sure how to do this part either, but something that indicates hours and minutes. This part is less important than getting the date variable.
Two immediate ways:
strsplit() on the white space
The proper ways: parse, and then format back out.
Only 2. will guarantee you do not end up with hour 27 or minute 83 ...
Examples:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> strsplit(data, " ")
[[1]]
[1] "8/6/2018" "18:56"
[[2]]
[1] "7/26/2018" "10:43"
[[3]]
[1] "7/28/2018" "8:36"
R>
And:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> df <- data.frame(data)
R> df$pt <- anytime::anytime(df$data) ## anytime package used
R> df$time <- format(df$pt, "%H:%M")
R> df$day <- format(df$pt, "%Y-%m-%d")
R> df
data pt time day
1 8/6/2018 18:56 2018-08-06 18:56:00 18:56 2018-08-06
2 7/26/2018 10:43 2018-07-26 10:43:00 10:43 2018-07-26
3 7/28/2018 8:36 2018-07-28 00:00:00 00:00 2018-07-28
R>
I often collect data in a data.frame (or data.table) and then add column by column.

Extract time from factor column in R

I would like to extract the time from a table column sd_data$start in R with the following characteristics:
str(sd_data$start)
Factor w/ 122 levels "01/03/2017 08:00",..: 1 2 5 10 12 14 18 19 20 21 ...
I found similar questions on the forum but so far all the answers have only given me NAs or blank values (00:00:00) so I see no other option than raise the question again specifically for my dataset.
I have managed to extract the dates and move them to a new column in the table with little effort and I am very surprised how difficult it is (for me at least) to do the same for hours, minutes and seconds. I must be overlooking something.
sd_data$start_date <- as.Date(sd_data$start,format='%d/%m/%Y')
sd_data$start_time <-
Thanks in advance for helping me to find the right lines of code to complete this task.
Here an example of what I am trying to do and where I am failing to get the time out.
smpldata <- "01/03/2017 08:00"
smpltime <-as.Date(as.character(smpldata),format='%d/%m/%Y %M:%S')
smpltime
# [1] 08:00 = what I would like to see
# [1] "2017-03-01" = what I am seeing
Maybe using as.character() to convert to character before convert to date, because the factor type is not well transformed. And including the other string elements on the date format as suggested above by Sotos.
sd_data$start_date <-
as.Date(as.character(sd_data$start),
format='%d/%m/%Y %H:%M:%S')
Another tip is to take a look at lubridate package. It's very usefull for this kind of task.
library(lubridate)
smpldata <- as.factor("01/03/2017 08:00")
(smpltime <-dmy_hm(as.character(smpldata)))
[1] "2017-03-01 08:00:00 UTC"
Here you still see the date. You can handle just the time for plots and other needs using hour() and minute().
hour(smpltime)
[1] 8
minute(smpltime)
[1] 0
Or you can use the format() function to get exactly what you want.
format(smpltime, "%H:%M:%S")
[1] "08:00:00"
format(smpltime, "%H:%M")
[1] "08:00"

R: Smaller than operator wrong with dates

I have a dataframe climdata with 1 of the columns containing hourly measure moments. The format is "2015-12-30 17:00:00".
With my current dataset, the range goes from "2015-01-01 00:00:00" to "2016-11-15 07:00:00" and I want to extract the dates from 2015*.
I used this code to extract 2015**:
climdata <- climdata[climdata$Meetdatum > paste(2015, "-01-01", sep = "") &
climdata$Meetdatum < paste(2015, "-12-31", sep = ""),]
The new climdata ranges from "2015-01-01 00:00:00" to 2015-12-30 22:00:00".
This is odd, because I asked for dates after 1/1/'15. However this is not a real problem, since I need that day anyway.
The problem is the last part. I need 31/12/'15, but when I use <= instead of < I get "2015-12-30 23:00:00" as last element. Which makes no sense at all. First of all, this element should've already been in the dataset when I used < and second, where did the equal to 31/12/'15 go?
*This process should later be automated: I'll have a bigger dataset and the year to be extracted will be chosen from a selectInput() in a shiny app.
**I used paste() because 2015 will be replaced by the inputId later on.

Extract numbers from string as numeric or dates in R

I am working with some hdf5 data sets. However, the dates are stored in the file and no hint of these dates from the file name. The attribute file consists of day of the year, month of the year, day of the month and year columns.
I would like to pull out data to create time series identity for each of the files i.e.year month date format that can be used for time series.
A sample of the data can be downloaded here:
[ ftp://l5eil01.larc.nasa.gov/tesl1l2l3/TES/TL3COD.003/2007.08.31/TES-Aura_L3-CO_r0000006311_F01_09.he5 ]
There is an attribute group file and a data group file.
I use the R library "rhdf5" to explore the hdf5 files. E.g
CO1<-h5ls ("TES-Aura_L3-CO_r0000006311_F01_09.he5")
Attr<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5","HDFEOS INFORMATION/coremetadata")
Data<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5", "HDFEOS\SWATHS\ColumnAmountNO2\Data Fields\ColumnAmountNO2Trop")
The Attr when read consist of a long string with the only required information being "2007-08-31" which is the date of acquisition. I have been able to extract this using the Stringr library:
regexp <- "([[:digit:]]{4})([-])([[:digit:]]{2})([-])([[:digit:]]{2})"
Date<-str_extract(Attr,pattern=regexp)
which returns the Date as:
"2007-08-31"
The only problem left now is that the Date isnt recognised as numeric or date. How do I change this as I need to bind the Date with the data for all days to create a time series (more like an identifier as the data sets are irregular), please? a sample of how it looks after extracting the dates from string and binding with the CO values for each date is below
Dates CO3b
[1,] "2011-03-01" 1.625811e+18
[2,] "2011-03-04" 1.655504e+18
[3,] "2011-03-11" 1.690428e+18
[4,] "2011-03-15" 1.679871e+18
[5,] "2011-03-17" 1.705987e+18
[6,] "2011-03-17" 1.661198e+18
[7,] "2011-03-17" 1.662694e+18
[8,] "2011-03-20" 1.520328e+18
[9,] "2011-03-21" 1.510642e+18
[10,] "2011-03-21" 1.556637e+18
However, R recognises these dates as character and not as date. I need to convert them to a time series I can work with.
Seems like you've already done all the hard work! Based off your comment, here's how you could take it across the finish line.
From your comment, seems like you have the strings in a good format. Given that your variable is named date, simply go
dateObjects<-as.Date(Date) #where Date is your variable
and either the single value or vector of character strings (as the format you gave in the comment) will now be date objects, which you could use with a library like zoo to create time series.
If your strings are not necessarily in the format you've described, then refer to the following link to see how to format other string forms as dates.
http://www.statmethods.net/input/dates.html
Given your example data frame you can create a time series in the following way, using the package zoo.
library(zoo)
datavect<-as.zoo(df$CO3b)
index(datavect)<-as.Date(df$Date)
here we take your CO data, covert it to a zoo object, then assign the appropriate date to each entry, converting it from a character to a date object. Now if you print datavect, you'll see each data entry attached to a date. This allows you to take advantage of zoo methods, such as merge and window.
Here is one approach not using string extraction. If you know how long your time series should be, which you should based on the length of your dataset and knowledge of its periodicity, you could just create a regular date series and then add that into a data.frame with other variables of interest. Assuming you have daily data the below would work. Obviously your length.out would be different.
d1 <- ISOdate(year=2007,month=8,day=31)
d2 <- as.Date(format(seq(from=d1,by="day",length.out=10),"%Y-%m-%d"))
[1] "2007-08-31" "2007-09-01" "2007-09-02" "2007-09-03" "2007-09-04" "2007-09-05" "2007-09-06" "2007-09-07" "2007-09-08" "2007-09-09"
class(d2)
[1] "Date"
Edit of Original:
Oh I see. Well after reading in your new data example the below worked for me. It was a pretty straight forward transform. cheers
library(magrittr) # Needed for the pipe operator %>% it makes it really easy to string steps together.
dateData
Dates CO3b
1 2011-03-01 1.63e+18
2 2011-03-04 1.66e+18
3 2011-03-11 1.69e+18
4 2011-03-15 1.68e+18
5 2011-03-17 1.71e+18
6 2011-03-17 1.66e+18
7 2011-03-17 1.66e+18
8 2011-03-20 1.52e+18
9 2011-03-21 1.51e+18
10 2011-03-21 1.56e+18
dateData %>% sapply(class) # classes before transforming (character,numeric)
dateData[,1] <- as.Date(dateData[,1]) # Transform to date
dateData %>% sapply(class) # classes after transforming (Date,numeric)
str(dateData) # one more check
'data.frame': 10 obs. of 2 variables:
$ Dates: Date, format: "2011-03-01" "2011-03-04" "2011-03-11" "2011-03-15" ...
$ CO3b : num 1.63e+18 1.66e+18 1.69e+18 1.68e+18 1.71e+18 ...

Resources