I have a character vector of dates that I would like to convert to a different time zone:
pubdate
[1] "Fri, 10 Jul 2015 03:21:23 +0000" "Fri, 10 Jul 2015 03:04:55 +0000"
[3] "Thu, 09 Jul 2015 23:49:01 +0000" "Thu, 09 Jul 2015 23:30:37 +0000"
[5] "Thu, 09 Jul 2015 23:27:44 +0000" "Thu, 09 Jul 2015 23:16:46 +0000"
[7] "Thu, 09 Jul 2015 23:14:06 +0000" "Thu, 09 Jul 2015 23:10:20 +0000"
[9] "Thu, 09 Jul 2015 23:07:52 +0000" "Thu, 09 Jul 2015 22:37:41 +0000"
[11] "Thu, 09 Jul 2015 22:35:06 +0000"
I wrote a loop to do this:
temp <- as.matrix(0)
for (i in 1:length(pubdate)){
tmp_dta <- strptime(pubdate[[i]],format="%a, %d %b %Y %H:%M:%S", tz="GMT")
tmp_dta$hour <- tmp_dta$hour - 1
tmp_dta <- as.POSIXct(tmp_dta)
attributes(tmp_dta)$tzone <- "Asia/Manila"
temp[i] <- tmp_dta
}
However, when I print temp, it shows numbers of seconds instead of dates:
> temp
[1] 1436494883 1436493895 1436482141 1436481037 1436480864 1436480206 1436480046 1436479820
[9] 1436479672 1436477861 1436477706
How can I get dates back instead, for example "2015-07-10 10:21:23 PHT"?
Thanks!
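The values printed above are seconds since 1970-01-01 00:00:00 UTC, so one way to get dates back is to re-attach the POSIXct class and the time zone. A minimal sketch, using the first two numbers from the output above:

```r
# First two values printed by the loop: seconds since 1970-01-01 00:00:00 UTC
secs <- c(1436494883, 1436493895)

# Re-attach the POSIXct class; an explicit origin is needed on older R versions
dates <- as.POSIXct(secs, origin = "1970-01-01", tz = "Asia/Manila")

format(dates, "%Y-%m-%d %H:%M:%S")
# [1] "2015-07-10 10:21:23" "2015-07-10 10:04:55"
```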
UPDATE: As suggested by Nicola below, I removed the loop and used his suggested code. The code below works:
tmp_dta <- strptime(pubdate,format="%a, %d %b %Y %H:%M:%S", tz="GMT")
x <- as.POSIXct(tmp_dta)
attributes(x)$tzone <- "Asia/Manila"
newpubdate <- x - 3600
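The loop returned numbers because temp starts as as.matrix(0), a numeric matrix, and assigning a POSIXct value into it keeps only the underlying seconds. If you want to keep the loop, a sketch that preallocates a POSIXct vector instead (shown with two of the timestamps; the vectorized code above is still simpler):

```r
# %a and %b match names in the LC_TIME locale; "C" gives English names
invisible(Sys.setlocale("LC_TIME", "C"))

pubdate <- c("Fri, 10 Jul 2015 03:21:23 +0000", "Fri, 10 Jul 2015 03:04:55 +0000")

# Preallocate a POSIXct vector (all NA) rather than a numeric matrix, so that
# assigning into it keeps the date-time class and the Manila time zone
temp <- .POSIXct(rep(NA_real_, length(pubdate)), tz = "Asia/Manila")

for (i in seq_along(pubdate)) {
  # The trailing "+0000" is not in the format, so strptime() ignores it
  tmp_dta <- as.POSIXct(strptime(pubdate[i], format = "%a, %d %b %Y %H:%M:%S", tz = "GMT"))
  temp[i] <- tmp_dta - 3600  # the one-hour shift applied in the question
}

temp
```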
You can do the whole operation in a single line:
pubdate <- c( "Fri, 10 Jul 2015 03:21:23 +0000" ,"Fri, 10 Jul 2015 03:04:55 +0000","Thu, 09 Jul 2015 23:49:01 +0000", "Thu, 09 Jul 2015 23:30:37 +0000","Thu, 09 Jul 2015 23:27:44 +0000", "Thu, 09 Jul 2015 23:16:46 +0000","Thu, 09 Jul 2015 23:14:06 +0000","Thu, 09 Jul 2015 23:10:20 +0000", "Thu, 09 Jul 2015 23:07:52 +0000", "Thu, 09 Jul 2015 22:37:41 +0000","Thu, 09 Jul 2015 22:35:06 +0000")
strptime(pubdate,"%a, %d %b %Y %H:%M:%S %z",tz="Asia/Manila")
# [1] "2015-07-10 11:21:23 PHT" "2015-07-10 11:04:55 PHT" "2015-07-10 07:49:01 PHT" "2015-07-10 07:30:37 PHT" "2015-07-10 07:27:44 PHT" "2015-07-10 07:16:46 PHT" "2015-07-10 07:14:06 PHT" "2015-07-10 07:10:20 PHT"
# [9] "2015-07-10 07:07:52 PHT" "2015-07-10 06:37:41 PHT" "2015-07-10 06:35:06 PHT"
If you think there is a one-hour discrepancy (which I don't think there is: have you thought of daylight-saving time?), then, as @nicola suggested, subtract one hour:
strptime(pubdate,"%a, %d %b %Y %H:%M:%S %z",tz="Asia/Manila") - as.difftime(1,units="hours")
# [1] "2015-07-10 10:21:23 PHT" "2015-07-10 10:04:55 PHT" "2015-07-10 06:49:01 PHT" "2015-07-10 06:30:37 PHT" "2015-07-10 06:27:44 PHT" "2015-07-10 06:16:46 PHT" "2015-07-10 06:14:06 PHT" "2015-07-10 06:10:20 PHT"
# [9] "2015-07-10 06:07:52 PHT" "2015-07-10 05:37:41 PHT" "2015-07-10 05:35:06 PHT"
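Because %z reads the offset, a non-zero offset in the input is applied automatically. A small sketch with a made-up +0100 timestamp (hypothetical input, not from the question) shows it landing one hour earlier than the +0000 version:

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # English names for %a and %b

x <- c("Fri, 10 Jul 2015 03:21:23 +0000",  # from the question
       "Fri, 10 Jul 2015 03:21:23 +0100")  # hypothetical: UTC+1 offset

# %z shifts each string to UTC by its own offset; tz only sets the output zone
manila <- as.POSIXct(strptime(x, "%a, %d %b %Y %H:%M:%S %z", tz = "Asia/Manila"))

format(manila, "%Y-%m-%d %H:%M:%S")
# [1] "2015-07-10 11:21:23" "2015-07-10 10:21:23"
```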
Related
I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.
head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019
I can apply as.Date to an individual element:
as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')
[1] "2020-03-31"
But when I try to use as.Date on the entire column, I get:
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Error in strptime(x, format, tz = "GMT") : input string is too long
What am I doing wrong? Is there another command I'm missing here?
(Too long for a comment.)
It works fine for the data you've shown us, so there must be something wrong further down your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:(round(nrow(df)/2)), "created_at"], ...), and then bisecting: if the problem doesn't occur in the first half of the data set, try rows 1:(round(0.75*nrow(df))), and so on.
You could also try plotting nchar(df$created_at) to see if anything pops out.
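A sketch of a related check on a small made-up vector (the third element is a hypothetical bad row): tabulate the string lengths, then list the entries that fail to parse.

```r
# %a and %b match names in the current LC_TIME locale; "C" gives English names
invisible(Sys.setlocale("LC_TIME", "C"))

created_at <- c("Tue Mar 31 13:42:58 +0000 2020",
                "Sat Mar 14 05:15:56 +0000 2020",
                "not a timestamp")                # hypothetical bad row

# Well-formed timestamps in this format are all 30 characters long
table(nchar(created_at))

# Unparseable entries come back as NA rather than raising an error
parsed <- as.Date(created_at, format = "%a %b %d %H:%M:%S %z %Y")
which(is.na(parsed))
# [1] 3
```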
df <- data.frame(created_at=c(
"Tue Mar 31 13:42:58 +0000 2020", "Sat Mar 14 05:15:56 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020", "Tue Mar 24 09:06:12 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020", "Thu Oct 24 18:47:10 +0000 2019"))
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Absent issues with your data as alluded to by Ben, here is a solution using parse_date_time from the lubridate package which parses the date variable into POSIXct date-time.
library(tibble)
library(lubridate)
df <- tibble(date = c("Tue Mar 31 13:42:58 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020",
"Sat Mar 14 05:15:56 +0000 2020",
"Tue Mar 24 09:06:12 +0000 2020",
"Thu Oct 24 18:47:10 +0000 2019"))
df$date <- parse_date_time(df$date, "%a %b %d %H:%M:%S %z %Y")
df
date
<dttm>
1 2020-03-31 13:42:58
2 2020-04-05 14:02:10
3 2020-04-28 01:14:28
4 2020-03-14 05:15:56
5 2020-03-24 09:06:12
6 2019-10-24 18:47:10
Created on 2020-11-13 by the reprex package (v0.3.0)
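For comparison, base R can produce the same POSIXct values without lubridate by passing the same format to as.POSIXct(). A sketch, with tz = "UTC" chosen to match the +0000 offsets and parse_date_time()'s default:

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # English names for %a and %b

date <- c("Tue Mar 31 13:42:58 +0000 2020",
          "Thu Oct 24 18:47:10 +0000 2019")

# %z applies each string's offset; tz only controls the display time zone
parsed <- as.POSIXct(date, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
parsed
# [1] "2020-03-31 13:42:58 UTC" "2019-10-24 18:47:10 UTC"
```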
My file contains a list of timestamps:
Fri Feb 14 19:07:31 +0000 2014
Fri Feb 14 19:07:46 +0000 2014
Fri Feb 14 19:07:50 +0000 2014
Fri Feb 14 19:08:04 +0000 2014
and I read it into R using:
dataset <- read.csv(file="Data.csv")
I then try to parse the timestamps with:
time <- strptime(dataset,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
but I keep getting this error:
Error in strptime(dataset, format = "%a %b %d %H:%M:%S %z %Y") :
input string is too long
It was working at first, but after I ran:
defaults write org.R-project.R force.LANG en_US.UTF-8
in my terminal to fix some R preferences on Mac OS X, the command stopped working and keeps producing the error above.
This is your original data, with myDates stored as character:
dtData<-data.frame(myDates=c( "Fri Feb 14 19:07:31 +0000 2014",
"Fri Feb 14 19:07:46 +0000 2014",
"Fri Feb 14 19:07:50 +0000 2014",
"Fri Feb 14 19:08:04 +0000 2014"), stringsAsFactors = FALSE)
> dtData
myDates
1 Fri Feb 14 19:07:31 +0000 2014
2 Fri Feb 14 19:07:46 +0000 2014
3 Fri Feb 14 19:07:50 +0000 2014
4 Fri Feb 14 19:08:04 +0000 2014
You need to pass the dtData$myDates column to strptime, not the whole data frame:
time <- strptime(dtData$myDates,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT");time
[1] "2014-02-14 19:07:31 GMT" "2014-02-14 19:07:46 GMT"
[3] "2014-02-14 19:07:50 GMT" "2014-02-14 19:08:04 GMT"
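Why the whole data frame fails: strptime() calls as.character() on its input, and on a data frame that yields one deparsed string per column rather than one string per timestamp, which is most likely what triggers the "input string is too long" error. A sketch that uses read.csv(text = ...) as a stand-in for Data.csv and assumes the file has no header row (by default read.csv() would treat the first timestamp as the column name); the "C" locale is an added safeguard because %a and %b are locale-dependent:

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # English names for %a and %b

# Stand-in for Data.csv: one timestamp per line, no header row (assumption)
csv_text <- "Fri Feb 14 19:07:31 +0000 2014
Fri Feb 14 19:07:46 +0000 2014"

dataset <- read.csv(text = csv_text, header = FALSE,
                    col.names = "timestamp", stringsAsFactors = FALSE)

# A one-column data frame becomes a single long string, not two timestamps
length(as.character(dataset))
# [1] 1

# Passing the column gives one date-time per row
time <- strptime(dataset$timestamp, format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
time
```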
I have timestamps in one column of my data frame. They look like this:
"Tue May 14 21:57:04 +0000 2013"
I want to replace the whole timestamp with just the month name. How can I do this in R? Let's say the column is named "timestamp" and the data frame is named "Df".
Below is a sample of some more entries:
"Wed Jul 10 01:30:36 +0000 2013"
"Fri Apr 20 01:46:59 +0000 2012"
"Sat Jul 07 17:56:34 +0000 2012"
"Sat Mar 16 02:12:30 +0000 2013"
"Sat Feb 16 02:29:11 +0000 2013"
I want these to look like
Jul
Apr
Jul
Mar
Feb
Your help will be highly appreciated.
Assign the source data using Akrun's strings:
R> dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
R> dates
[1] "Tue May 14 21:57:04 +0000 2013"
[2] "Wed Jul 10 01:30:36 +0000 2013"
[3] "Fri Apr 20 01:46:59 +0000 2012"
[4] "Sat Jul 07 17:56:34 +0000 2012"
[5] "Sat Mar 16 02:12:30 +0000 2013"
[6] "Sat Feb 16 02:29:11 +0000 2013"
R>
Parse using the appropriate strptime format:
R> pt <- strptime(dates, "%a %b %d %H:%M:%S +0000 %Y")
R> pt
[1] "2013-05-14 21:57:04 CDT" "2013-07-10 01:30:36 CDT"
[3] "2012-04-20 01:46:59 CDT" "2012-07-07 17:56:34 CDT"
[5] "2013-03-16 02:12:30 CDT" "2013-02-16 02:29:11 CST"
R>
Reformat to keep just the month:
R> strftime(pt, "%m")
[1] "05" "07" "04" "07" "03" "02"
R> strftime(pt, "%b")
[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
R> strftime(pt, "%B")
[1] "May" "July" "April" "July" "March"
[6] "February"
R>
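Note that %b and %B are printed (and parsed) in the current LC_TIME locale. A sketch that always returns English abbreviations uses the 0-based mon field of the POSIXlt result with the built-in month.abb constant; parsing under the "C" locale is an added assumption:

```r
dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013")

# Parse with English day/month names, then restore the previous locale
old <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
pt <- strptime(dates, "%a %b %d %H:%M:%S +0000 %Y", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old))

# POSIXlt months run 0-11, so add 1 before indexing month.abb
month.abb[pt$mon + 1]
# [1] "May" "Jul"
```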
You can use strptime along with format.
Assuming x is your character vector, we can first convert it to "POSIXlt" with strptime and then extract the month (%b) part with format:
format(strptime(x, "%a %b %d %H:%M:%S +0000 %Y"), "%b")
#[1] "Jul" "Apr" "Jul" "Mar" "Feb"
We can use sub. Match one or more non-white-space characters (\\S+) followed by one or more white-space characters (\\s+), then capture the next run of non-white-space characters as a group ((\\S+)), followed by any characters to the end of the string (.*), and replace the whole match with the backreference (\\1) to the captured group.
sub("\\S+\\s+(\\S+).*", "\\1", v1)
#[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
It may be better to use date-time conversions (as @DirkEddelbuettel mentioned in the comments) if we know the correct format.
data
v1 <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
Assuming your timestamp is text:
df<-data.frame(timestamp=c("Tue May 14 21:57:04 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013"),stringsAsFactors = FALSE)
df$month<-sapply(df$timestamp,function(sx)strsplit(sx,split=" ")[[1]][2])
df
timestamp month
1 Tue May 14 21:57:04 +0000 2013 May
2 Fri Apr 20 01:46:59 +0000 2012 Apr
3 Sat Mar 16 02:12:30 +0000 2013 Mar
1) The month name always occupies character positions 5 through 7 (inclusive) of the timestamp column, so this replaces the timestamp column with a character column of months:
transform(DF, timestamp = format(substr(timestamp, 5, 7)))
The output is:
timestamp
1 Jul
2 Apr
3 Jul
4 Mar
5 Feb
2) If you wanted a factor column instead then use this variation which ensures that the factor levels are Jan=1, Feb=2, etc. rather than being assigned alphabetically:
transform(DF, timestamp = factor(substr(timestamp, 5, 7), levels = month.abb))
Note: We have assumed input in the following reproducible form:
DF <- data.frame(timestamp = c("Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013"))
I've downloaded tweets in JSON format, converted them to CSV, and read them into R. The timestamps are stored as a factor, as shown below. How can I convert them into date-times that I can plot against?
[1] Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014
[4] Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014
516 Levels: Fri May 09 07:55:12 +0000 2014 ... Fri May 09 09:15:07 +0000 2014
I think your question has already been answered here: Convert Twitter Timestamp in R
But if you want something simpler, you can use the twitteR library:
> tweets <- userTimeline("BarackObama",n=100)
> df <- do.call("rbind",lapply(tweets, as.data.frame))
> names(df)
[1] "text" "favorited" "favoriteCount" "replyToSN" "created" "truncated"
[7] "replyToSID" "id" "replyToUID" "statusSource" "screenName" "retweetCount"
[13] "isRetweet" "retweeted" "longitude" "latitude"
We can plot the created column (the status date) directly.
You can remove the unnecessary parts of the string before applying as.POSIXct. This can be done with gsub:
x <- as.factor(c("Fri May 09 07:55:12 +0000 2014",
"Fri May 09 07:55:12 +0000 2014"))
as.POSIXct(gsub("^.+? | \\+\\d{4}","", x),
format = "%b %d %X %Y")
# [1] "2014-05-09 07:55:12 CEST" "2014-05-09 07:55:12 CEST"
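Alternatively, since %z parses the +0000 offset, the full string can be converted without gsub(). A sketch that converts the factor with as.character() first and assumes English day/month names (the second timestamp is the last factor level shown in the question):

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # %a and %b expect English names here

x <- as.factor(c("Fri May 09 07:55:12 +0000 2014",
                 "Fri May 09 09:15:07 +0000 2014"))

# Convert factor -> character -> POSIXct, keeping the times in UTC
created <- as.POSIXct(as.character(x), format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
created
# [1] "2014-05-09 07:55:12 UTC" "2014-05-09 09:15:07 UTC"
```

The resulting POSIXct vector can be passed straight to plotting functions, e.g. hist(created, breaks = "hours").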
Here's some of my data, read in from a file named AttReport_all:
Registration.Date Join.Time Leave.Time
1 Jul 05, 2011 09:30 PM EDT Jul 07, 2011 01:05 PM EDT Jul 07, 2011 01:53 PM EDT
2 Jul 05, 2011 10:20 AM EDT Jul 07, 2011 01:04 PM EDT Jul 07, 2011 01:53 PM EDT
3 Jul 04, 2011 02:41 PM EDT Jul 07, 2011 12:49 PM EDT Jul 07, 2011 01:53 PM EDT
4 Jul 04, 2011 11:38 PM EDT Jul 07, 2011 12:49 PM EDT Jul 07, 2011 01:54 PM EDT
5 Jul 05, 2011 11:41 AM EDT Jul 07, 2011 12:54 PM EDT Jul 07, 2011 01:54 PM EDT
6 Jul 07, 2011 11:08 AM EDT Jul 07, 2011 01:16 PM EDT Jul 07, 2011 01:53 PM EDT
If I do strptime(AttReport_all$Registration.Date, "%b %m, %Y %H:%M %p", tz="") I get an array of NAs where I'm expecting dates.
Sys.setlocale("LC_TIME", "C") returns "C"
typeof(AttReport_all$Registration.Date) returns "integer"
is.factor(AttReport_all$Registration.Date) returns TRUE.
What am I missing?
Here's version output, if it helps:
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 13.0
year 2011
month 04
day 13
svn rev 55427
language R
version.string R version 2.13.0 (2011-04-13)
strptime automatically runs as.character on the first argument (so it doesn't matter that it's a factor) and any trailing characters not specified in format= are ignored (so "EDT" doesn't matter).
The only issues are the typo @Ben Bolker identified (%m should be %d) and that %H should be %I (?strptime says you should not use %H with %p).
# %b and %m are both *month* formats
strptime("Jul 05, 2011 09:30 PM EDT", "%b %m, %Y %H:%M %p", tz="")
# [1] NA
# change %m to %d and we no longer get NA, but the time is wrong (AM, not PM)
strptime("Jul 05, 2011 09:30 PM EDT", "%b %d, %Y %H:%M %p", tz="")
# [1] "2011-07-05 09:30:00"
# use %I (not %H) with %p
strptime("Jul 05, 2011 09:30 PM EDT", "%b %d, %Y %I:%M %p", tz="")
# [1] "2011-07-05 21:30:00"
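To apply the corrected format to all three columns at once, a sketch using lapply(). It assumes the "EDT" suffix means US Eastern time, so tz = "America/New_York" is an assumption, and it parses under the "C" locale so that %b and %p match English text; the two rows are copied from the data above:

```r
invisible(Sys.setlocale("LC_TIME", "C"))

# Two rows from the question, stored as factors like the original columns
AttReport_all <- data.frame(
  Registration.Date = c("Jul 05, 2011 09:30 PM EDT", "Jul 05, 2011 10:20 AM EDT"),
  Join.Time         = c("Jul 07, 2011 01:05 PM EDT", "Jul 07, 2011 01:04 PM EDT"),
  Leave.Time        = c("Jul 07, 2011 01:53 PM EDT", "Jul 07, 2011 01:53 PM EDT"),
  stringsAsFactors = TRUE
)

cols <- c("Registration.Date", "Join.Time", "Leave.Time")

# %d for the day, %I with %p for the 12-hour clock; trailing " EDT" is ignored
AttReport_all[cols] <- lapply(AttReport_all[cols], function(x)
  as.POSIXct(as.character(x), format = "%b %d, %Y %I:%M %p",
             tz = "America/New_York"))

AttReport_all$Registration.Date
```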