Convert date with Time Zone formats in R - r

I have my dates in the following formats: Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time), or 43167, or Fri May 18 2018 00:00:00 GMT-0700 (PDT), all mixed in one column. What would be the easiest way to convert all of these to a simple YYYY-mm-dd (2018-04-13) format? Here is the column:
dates <- c('Fri May 18 2018 00:00:00 GMT-0700 (PDT)',
'43203',
'Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'43167','43201',
'Fri May 18 2018 00:00:00 GMT-0700 (PDT)',
'Tue May 29 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'Tue May 01 2018 00:00:00 GMT-0700 (PDT)',
'Fri May 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'Fri Apr 06 2018 00:00:00 GMT-0700 (PDT)','43173')
Expected format:2018-05-18, 2018-04-13, 2018-04-25, ...

I believe similar questions have been asked several times before. However, there
is a crucial point which needs special attention:
What is the origin for the dates given as integer (or as character string which can be converted to integer to be exact)?
If the data is imported from the Windows version of Excel, origin = "1899-12-30" has to be used. For details, see the Example section in help(as.Date) and the Other Applications section of the R Help Desk article by Gabor Grothendieck and Thomas Petzoldt.
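The effect of the origin can be checked on one of the serial numbers from the question. This is only a small sketch; the second origin, "1904-01-01", is the one help(as.Date) lists for the Mac version of Excel and is shown purely for comparison.

```r
# One serial number from the question, interpreted with two different origins
serial <- 43203

win_excel <- as.Date(serial, origin = "1899-12-30")  # Windows Excel
mac_excel <- as.Date(serial, origin = "1904-01-01")  # Mac Excel (1904 date system)

print(win_excel)  # 2018-04-13, matches the expected output in the question
print(mac_excel)  # more than four years later, clearly not what is wanted here
```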
For conversion of the date time strings, the mdy_hms() function from the lubridate package is used. In addition, I am using data.table syntax for its conciseness:
library(data.table)
data.table(dates)[!dates %like% "^\\d+$", new_date := as.Date(lubridate::mdy_hms(dates))][
is.na(new_date), new_date := as.Date(as.integer(dates), origin = "1899-12-30")][]
dates new_date
1: Fri May 18 2018 00:00:00 GMT-0700 (PDT) 2018-05-18
2: 43203 2018-04-13
3: Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-04-25
4: 43167 2018-03-08
5: 43201 2018-04-11
6: Fri May 18 2018 00:00:00 GMT-0700 (PDT) 2018-05-18
7: Tue May 29 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-05-29
8: Tue May 01 2018 00:00:00 GMT-0700 (PDT) 2018-05-01
9: Fri May 25 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-05-25
10: Fri Apr 06 2018 00:00:00 GMT-0700 (PDT) 2018-04-06
11: 43173 2018-03-14
The converted integers match the expected output in the question (for example, 43203 becomes 2018-04-13), so the assumption that the origin is the one used by the Windows version of Excel seems to hold.
If only a vector of Date values is required:
data.table(dates)[!dates %like% "^\\d+$", new_date := as.Date(lubridate::mdy_hms(dates))][
is.na(new_date), new_date := as.Date(as.integer(dates), origin = "1899-12-30")][, new_date]
[1] "2018-05-18" "2018-04-13" "2018-04-25" "2018-03-08" "2018-04-11" "2018-05-18"
[7] "2018-05-29" "2018-05-01" "2018-05-25" "2018-04-06" "2018-03-14"
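If neither lubridate nor data.table is available, roughly the same result can be obtained in base R. The following is a minimal sketch, not the method used above. It assumes that every date-time string starts with a three-letter weekday followed by "Mon dd yyyy" (so characters 5 to 15 hold the date), that the session uses an English locale (needed for %b), and that the integers are Excel serial numbers with the Windows origin.

```r
# Shortened version of the vector from the question
dates <- c("Fri May 18 2018 00:00:00 GMT-0700 (PDT)",
           "43203",
           "Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)")

# TRUE for entries that consist of digits only (Excel serial numbers)
is_serial <- grepl("^[0-9]+$", dates)

# Start with a Date vector of NAs and fill both kinds of entries separately
new_date <- rep(as.Date(NA), length(dates))
new_date[is_serial] <- as.Date(as.integer(dates[is_serial]),
                               origin = "1899-12-30")
# Assumption: characters 5..15 are e.g. "May 18 2018"
new_date[!is_serial] <- as.Date(substr(dates[!is_serial], 5, 15),
                                format = "%b %d %Y")

print(new_date)
```

Cutting the date out by position ignores the time and the time zone suffix, which is acceptable here because only the calendar date as written is wanted.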

Related

Formatting and cleaning up a long date column in R

I have a date column that looks like this:
Dates
Sun Jan 30 04:00:35 UTC 2022
Thu Sep 02 20:21:52 UTC 2021
Tue Sep 20 14:41:17 UTC 2022
Thu Apr 08 16:19:21 UTC 2021
Wed Nov 03 16:20:45 UTC 2021
I was trying the following method but cannot figure out how to get rid of the hours, minutes, seconds and the two ":". In the end I just want to have the month (preferably in a 1-12 format), day and year.
mutate(last_login_date = gsub("UTC","",Dates),
last_login_date = substr(Dates,5,25))
It may be easier to automatically convert to Date class with parse_date from parsedate
library(parsedate)
df1$Dates <- as.Date(parse_date(df1$Dates))
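For comparison, the same strings can probably be parsed in base R with an explicit format. This is a sketch under two assumptions: the time zone label is always the literal text "UTC", and the session uses an English locale (for %a and %b). The vector name dates_chr is made up for this example.

```r
# Two of the values from the question
dates_chr <- c("Sun Jan 30 04:00:35 UTC 2022",
               "Thu Sep 02 20:21:52 UTC 2021")

# "UTC" in the format string is matched literally; tz = "UTC" sets the zone
parsed <- strptime(dates_chr, format = "%a %b %d %H:%M:%S UTC %Y", tz = "UTC")

# Drop the time of day
only_date <- as.Date(parsed)

# Month as a number from 1 to 12, as asked for in the question
month_num <- as.integer(format(only_date, "%m"))

print(only_date)
print(month_num)
```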

Momentjs date '06/05/2021' with format 'M/D/YY' considered as Fri Jun 05 2020 00:00:00 GMT+0300.... Why and how handle that?

Momentjs parses the date '06/05/2021' with format 'M/D/YY' as 'Fri Jun 05 2020 00:00:00 GMT+0300...'. Wouldn't it be more logical for the year to be 2021? Is this a bug or not? Why does it convert dates this way, and how should I handle that?

Moment.js is not showing right time

I'm getting the wrong time from momentjs. When I pass a UTC date time, I expect it to convert to 7:30pm in Melbourne.
Ex:
var myUTCTime = moment("2020-12-02 09:30:00.0000000 +00:00").utc();
Wed Dec 02 2020 09:30:00 GMT+0000
var melbourne = moment("2020-12-02 09:30:00.0000000 +00:00").utc().tz("Australia/Melbourne");
Wed Dec 02 2020 20:30:00 GMT+1100
 
Expecting Melbourne to be Wed Dec 02 2020 19:30:00 GMT+1000
 
7:30 PM (19:30) Melbourne Time = 9:30 AM (9:30) UTC
Following url shows the graph
http://www.timebie.com/timezone/universalmelbourne.php
The result is correct: it takes daylight saving time into account, which Melbourne observes in December (GMT+1100).
For dates outside daylight saving time the offset is GMT+1000, as expected. I tried May:
moment("2020-05-02 09:30:00.000 +00:00").utc().toString()
"Sat May 02 2020 09:30:00 GMT+0000"
moment("2020-05-02 09:30:00.000 +00:00").utc().tz("Australia/Melbourne").toString();
"Sat May 02 2020 19:30:00 GMT+1000"

Converting UTC Time to Local Time with Days of Week and Date Included

I have the following 2 columns as part of a larger data frame. The Timezone_Offset is the difference in hours for the local time (US West Coast in the data I'm looking at). In other words, UTC + Offset = Local Time.
I'm looking to convert the UTC time to the local time, while also correctly changing the day of the week and date, if necessary. For instance, here are the first 5 rows of the two columns.
UTC Timezone_Offset
Sun Apr 08 02:42:03 +0000 2012 -7
Sun Jul 01 03:27:20 +0000 2012 -7
Wed Jul 11 04:40:18 +0000 2012 -7
Sat Nov 17 01:31:36 +0000 2012 -8
Sun Apr 08 20:50:30 +0000 2012 -7
Things get tricky when the day of the week and date also have to be changed. For instance, looking at the first row, the local time should be Sat Apr 07 19:42:03 +0000 2012. In the second row, the month also has to be changed.
Sorry, I'm fairly new to R. Could someone possibly explain how to do this? Thank you so much in advance.
Parse as UTC, then apply the offset in seconds, i.e. the offset in hours multiplied by 60*60:
data <- read.csv(text="UTC, Timezone_Offset
Sun Apr 08 02:42:03 +0000 2012, -7
Sun Jul 01 03:27:20 +0000 2012, -7
Wed Jul 11 04:40:18 +0000 2012, -7
Sat Nov 17 01:31:36 +0000 2012, -8
Sun Apr 08 20:50:30 +0000 2012, -7", stringsAsFactors=FALSE)
data$pt <- as.POSIXct(strptime(data$UTC, "%a %b %d %H:%M:%S %z %Y", tz="UTC"))
data$local <- data$pt + data$Timezone_Offset*60*60
Result:
> data[,3:4]
pt local
1 2012-04-08 02:42:03 2012-04-07 19:42:03
2 2012-07-01 03:27:20 2012-06-30 20:27:20
3 2012-07-11 04:40:18 2012-07-10 21:40:18
4 2012-11-17 01:31:36 2012-11-16 17:31:36
5 2012-04-08 20:50:30 2012-04-08 13:50:30
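To also show the weekday and month of the shifted time, as the question asks, the result can be formatted explicitly. This is a small sketch on two of the rows; the names utc and offset_hours are made up for this example, and the weekday and month abbreviations assume an English locale. Note that the shifted values are still stored as UTC instants, so tz = "UTC" is passed to format() to print them without a further shift.

```r
# Two rows from the question
utc <- c("Sun Apr 08 02:42:03 +0000 2012",
         "Sat Nov 17 01:31:36 +0000 2012")
offset_hours <- c(-7, -8)

# Parse as UTC, then add the offset converted to seconds
pt <- as.POSIXct(strptime(utc, "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
local <- pt + offset_hours * 60 * 60

# Weekday, month and day now reflect the local wall-clock time
local_label <- format(local, "%a %b %d %H:%M:%S %Y", tz = "UTC")
print(local_label)
```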

How to read in .csv data, then create a subset of that data based on conditional filtering?

I'm new to R programming, although I have been programming a number of other languages for years. I'm having a hard time finding any relevant information on this simple problem through searching the R documentation and stack overflow etc., so some help would be very much appreciated.
Here's the problem:
After reading in data from a .csv, I need to create a new dataset that contains only those observations where the "value" field is between 0 and 100 inclusive (there are 4 fields and ~2500 rows of data). I have no problem reading in the data and displaying it. My problem is when I try to take the list of input data and filter it based on the range condition for the "value" column.
Here's my input:
#read in the data from the sensor file
data = read.csv("C:/Code/sensor.txt", header=TRUE)
for (i in seq(4, nrow(data), 4)) {
if (as.integer(data[i])>0) {
print(data[i])
}
}
I am getting the error output:
> for (i in seq(4, nrow(data), 4)) {
+ if (as.integer(data[i])>0) {
+ print(data[i])
+ }
+ }
Error: (list) object cannot be coerced to type 'integer'
EDIT:
Here is some sample data:
timestamp, siteid, sensorid, value
Thu Jan 07 00:00:00 PST 2016,1,1,24
Thu Jan 07 00:00:00 PST 2016,1,2,5
Thu Jan 07 00:00:00 PST 2016,1,3,60
Thu Jan 07 00:00:00 PST 2016,2,1,0
Thu Jan 07 00:00:00 PST 2016,2,2,5
Thu Jan 07 00:00:00 PST 2016,2,3,100
Thu Jan 07 00:00:00 PST 2016,3,1,36
Thu Jan 07 00:00:00 PST 2016,3,2,5
Thu Jan 07 00:00:00 PST 2016,3,3,38
Thu Jan 07 00:00:00 PST 2016,4,1,99
Thu Jan 07 00:00:00 PST 2016,4,2,5
Thu Jan 07 00:00:00 PST 2016,4,3,84
Thu Jan 07 00:15:00 PST 2016,1,1,#ERROR#
Thu Jan 07 00:15:00 PST 2016,1,2,5
Thu Jan 07 00:15:00 PST 2016,1,3,96
Thu Jan 07 00:15:00 PST 2016,2,1,28
Thu Jan 07 00:15:00 PST 2016,2,2,5
Thu Jan 07 00:15:00 PST 2016,2,3,94
Thu Jan 07 00:15:00 PST 2016,3,1,3
Thu Jan 07 00:15:00 PST 2016,3,2,5
Thu Jan 07 00:15:00 PST 2016,3,3,95
Thu Jan 07 00:15:00 PST 2016,4,1,72
Thu Jan 07 00:15:00 PST 2016,4,2,5
Thu Jan 07 00:15:00 PST 2016,4,3,21
Thu Jan 07 00:30:00 PST 2016,1,1,160
Thu Jan 07 00:30:00 PST 2016,1,2,5
Thu Jan 07 00:30:00 PST 2016,1,3,34
First of all, always try to give us a reproducible example of your data.
data.between0and100 <- data[data$column.with.values >= 0 & data$column.with.values <= 100, ]
This is how you get the rows with the desired values. Also, a data frame has two dimensions, rows and columns, so data[i] selects the i-th column, whereas data[i, ] is the i-th row of the data frame.
print(data[i,]) #will work
With your data:
#read in the data from the sensor file
data = read.csv("C:/Code/sensor.txt", header=TRUE)
for (i in seq(4, nrow(data), 4)) {
if (as.integer(data[i,numberofvaluecolumn])>0) {
print(data[i,numberofvaluecolumn])
}
}
For starters, loops in R are usually pretty slow and should be used with caution. With a dataset of only 2,500 records it probably isn't an issue, but worth mentioning if you are going to start using larger datasets.
If you are going to be doing a lot of data manipulation I would recommend becoming familiar with the dplyr library, https://cran.r-project.org/web/packages/dplyr/dplyr.pdf. It makes data manipulation very quick and easy.
library(dplyr)
# keep rows where value is between 0 and 100 inclusive
data <- data %>%
  filter(value >= 0, value <= 100)
If the function as.integer is throwing an error then perhaps read.csv() hasn't read the values in a format which as.integer() can handle.
Use str(data) or head() and tail() to see what read.csv() is producing.
Looking at your example data, adding the argument
na.strings = "#ERROR#"
to read.csv() might solve the issue.
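Putting the suggestions together, a minimal sketch in base R might look like this. It reads a few rows of the sample data from a string instead of the file (the names csv_text, sensor and in_range are made up for this example), treats "#ERROR#" as missing, and keeps the rows where value is between 0 and 100 inclusive.

```r
# A few rows of the sample data, including the "#ERROR#" entry
csv_text <- "timestamp,siteid,sensorid,value
Thu Jan 07 00:00:00 PST 2016,1,1,24
Thu Jan 07 00:00:00 PST 2016,2,1,0
Thu Jan 07 00:00:00 PST 2016,2,3,100
Thu Jan 07 00:15:00 PST 2016,1,1,#ERROR#
Thu Jan 07 00:30:00 PST 2016,1,1,160"

# na.strings turns "#ERROR#" into NA, so the value column is read as integer
sensor <- read.csv(text = csv_text, na.strings = "#ERROR#",
                   stringsAsFactors = FALSE)

# Logical index: not missing and within [0, 100]
in_range <- !is.na(sensor$value) & sensor$value >= 0 & sensor$value <= 100
subset_0_100 <- sensor[in_range, ]

str(sensor)
print(subset_0_100)
```

Without the !is.na() check, rows whose value is NA would come back as rows filled with NA, because indexing a data frame with NA keeps a placeholder row.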
