I have a dataset with a column of automatically generated timestamps in the following format:
head(tweets$V2)
[1] Fri Oct 30 18:33:50 +0000 2015 Fri Oct 30 18:33:51 +0000 2015 Fri Oct 30 18:33:52 +0000 2015
[4] Fri Oct 30 18:33:54 +0000 2015 Fri Oct 30 18:33:55 +0000 2015 Fri Oct 30 18:33:56 +0000 2015
I want to convert these to a POSIX date-time format. Any pointers on how to go about this?
Once they are in a standard time format, I want to look at trends in the subjects of the tweets.
See ?strptime for more details:
tweets$V2 <- as.POSIXct(strptime(tweets$V2, "%a %b %d %H:%M:%S %z %Y"))
This converts the strings to POSIXct, displayed in your system's default time zone. If you want a different time zone, pass the tz argument.
a <- c("Fri Oct 30 18:33:50 +0000 2015")
as.POSIXct(strptime(a, "%a %b %d %H:%M:%S %z %Y"))
[1] "2015-10-30 14:33:50 EDT"
as.POSIXct(strptime(a, "%a %b %d %H:%M:%S %z %Y", tz = "GMT"))
[1] "2015-10-30 18:33:50 GMT"
Note: convert the column with as.character first if it is of class factor.
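To illustrate the note about factors, here is a minimal sketch. The small tweets data frame is made up for the example, and it assumes an English or C locale so that %a and %b match the day and month abbreviations.

```r
# Hypothetical sample data: the timestamp column is stored as a factor
tweets <- data.frame(
  V2 = factor(c("Fri Oct 30 18:33:50 +0000 2015",
                "Fri Oct 30 18:33:51 +0000 2015"))
)

fmt <- "%a %b %d %H:%M:%S %z %Y"

# Convert factor -> character explicitly, then parse.
# tz = "UTC" keeps the result independent of the system time zone.
tweets$V2 <- as.POSIXct(strptime(as.character(tweets$V2), fmt, tz = "UTC"))

class(tweets$V2)
tweets$V2
```

The explicit as.character() call is harmless if the column is already character, so it is safe to leave in.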
I have a string formatted as below:
Tue Feb 11 12:28:36 +0000 2014
I am trying to convert this string to a timestamp in R using:
timeobj <- strptime(df[1], format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT")
where df[1] is in the format Tue Feb 11 12:28:36 +0000 2014
However, I got an error as below:
Error in strptime(df[1], format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT") :
input string is too long
How can I fix this?
dput(df[ 1:5, 1]) =
c("Tue Feb 11 12:47:26 +0000 2014", "Tue Feb 11 12:55:09 +0000 2014", "Tue Feb 11 13:22:29 +0000 2014", "Tue Feb 11 13:24:31 +0000 2014", "Tue Feb 11 13:34:00 +0000 2014")
It looks like your locale does not match the abbreviated weekday and month names in the strings.
x <- c("Tue Feb 11 12:47:26 +0000 2014",
"Tue Feb 11 12:55:09 +0000 2014", "Tue Feb 11 13:22:29 +0000 2014",
"Tue Feb 11 13:24:31 +0000 2014", "Tue Feb 11 13:34:00 +0000 2014")
Sys.setlocale("LC_ALL", "de_AT.UTF-8")
strptime(x, format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT")
#[1] NA NA NA NA NA
Sys.setlocale("LC_ALL", "C")
strptime(x, format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT")
#[1] "2014-02-11 12:47:26 GMT" "2014-02-11 12:55:09 GMT"
#[3] "2014-02-11 13:22:29 GMT" "2014-02-11 13:24:31 GMT"
#[5] "2014-02-11 13:34:00 GMT"
The manual of strptime says: '%a' Abbreviated weekday name in the current locale on this platform.
Also, it looks like you are passing a data.frame with df[1] rather than a vector, which you can get with df[,1].
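The difference between df[1] and df[,1] can be checked directly. This is a small sketch with a made-up two-row data frame (the column name created is an assumption), again assuming a locale where %a and %b match English abbreviations:

```r
# Hypothetical data frame with a single character column
df <- data.frame(created = c("Tue Feb 11 12:47:26 +0000 2014",
                             "Tue Feb 11 12:55:09 +0000 2014"),
                 stringsAsFactors = FALSE)

class(df[1])    # a one-column data.frame, not what strptime expects
class(df[, 1])  # a character vector
class(df[[1]])  # also a character vector

# Pass the vector, not the data frame
parsed <- strptime(df[[1]], format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT")
parsed
```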
%T is equivalent to %H:%M:%S, so it is enough for the time part. Note that the column is passed as a vector with df[, 1]:
timeobj <- strptime(df[, 1], format = "%a %b %e %T %z %Y", tz = "GMT")
I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.
head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019
I can apply as.Date to an individual row:
as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')
[1] "2020-03-31"
But when I try to use as.Date on the entire column, I get:
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Error in strptime(x, format, tz = "GMT") : input string is too long
What am I doing wrong? Is there another command I'm missing here?
(Too long for a comment.)
It works fine for the data you've shown us. There must be something wrong later in your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:(round(nrow(df)/2)), "created_at"], ...), then bisect to find the problem: if the problem doesn't occur in the first half of the data set, try rows 1:(round(0.75*nrow(df))), and so on.
You could also try plotting nchar(df$created_at) to see if anything pops out.
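As an alternative to bisecting by hand, here is a rough sketch that parses one element at a time and reports which rows fail. The sample vector, including the deliberately malformed third entry, is made up for illustration, and the code assumes a locale where %a and %b match English abbreviations.

```r
# Hypothetical column: the third entry is deliberately malformed
created_at <- c("Tue Mar 31 13:42:58 +0000 2020",
                "Sat Mar 14 05:15:56 +0000 2020",
                "not a timestamp",
                "Sun Apr 05 14:02:10 +0000 2020")

fmt <- "%a %b %d %H:%M:%S %z %Y"

# Parse element by element so one bad value cannot abort the whole call;
# any error is turned into NA, just like an ordinary parse failure
parsed <- vapply(created_at, function(s) {
  tryCatch(as.numeric(as.POSIXct(strptime(s, fmt, tz = "GMT"))),
           error = function(e) NA_real_)
}, numeric(1), USE.NAMES = FALSE)

bad <- which(is.na(parsed))
bad              # positions of the entries that could not be parsed
created_at[bad]  # the offending values

# Byte lengths can also reveal odd entries
nchar(created_at, type = "bytes")
```

This is slower than a single vectorised call, but for a one-off check on about 20000 rows that should not matter much.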
df <- data.frame(created_at=c(
"Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019"))
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Absent issues with your data as alluded to by Ben, here is a solution using parse_date_time from the lubridate package which parses the date variable into POSIXct date-time.
library(tibble)
df <- tibble(date = c("Tue Mar 31 13:42:58 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020",
"Sat Mar 14 05:15:56 +0000 2020",
"Tue Mar 24 09:06:12 +0000 2020",
"Thu Oct 24 18:47:10 +0000 2019"))
library(lubridate)
df$date <- parse_date_time(df$date, "%a %b %d %H:%M:%S %z %Y")
date
<dttm>
1 2020-03-31 13:42:58
2 2020-04-05 14:02:10
3 2020-04-28 01:14:28
4 2020-03-14 05:15:56
5 2020-03-24 09:06:12
6 2019-10-24 18:47:10
Created on 2020-11-13 by the reprex package (v0.3.0)
I have a csv file whose rows contain strings.
The strings are timestamps, which I want to read into R.
Tue Feb 10 12:18:39 +0000 2015
Tue Feb 10 12:19:56 +0000 2015
Tue Feb 10 12:19:57 +0000 2015
I know we use:
%a %b %d %x %z %Y
But how to actually write it in R?
I've tried:
strptime("file.csv"[ ],format="%a %b %d %x %z %Y")
Almost. You should read the csv into an object first, e.g. with read.csv(). Then you can parse the column:
strptime(df$V1, "%a %b %d %H:%M:%S %z %Y")
[1] "2015-02-10 13:18:39" "2015-02-10 13:19:56" "2015-02-10 13:19:57"
Test data
df <- read.table(text = "Tue Feb 10 12:18:39 +0000 2015
Tue Feb 10 12:19:56 +0000 2015
Tue Feb 10 12:19:57 +0000 2015", sep = ";") # so it will not interpret space as separator
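Putting the two steps together, a minimal sketch of reading and then parsing might look like this. It uses read.csv(text = ...) in place of a real file, assumes the file has no header row, and assumes a locale where %a and %b match English abbreviations; tz = "UTC" is added so the result does not depend on the system time zone.

```r
# Stand-in for the contents of the csv file (no header row assumed)
csv_text <- "Tue Feb 10 12:18:39 +0000 2015
Tue Feb 10 12:19:56 +0000 2015
Tue Feb 10 12:19:57 +0000 2015"

# With a real file this would be read.csv("file.csv", header = FALSE, ...)
df <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Parse the single column V1 and store the result as POSIXct
df$time <- as.POSIXct(strptime(df$V1, "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
str(df)
```

Storing the result as POSIXct rather than POSIXlt tends to be more convenient inside a data frame.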
My file contains a list of timestamps:
Fri Feb 14 19:07:31 +0000 2014
Fri Feb 14 19:07:46 +0000 2014
Fri Feb 14 19:07:50 +0000 2014
Fri Feb 14 19:08:04 +0000 2014
and reading it into R using:
dataset <- read.csv(file="Data.csv")
I then parse the timestamps with:
time <- strptime(dataset,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
but I'm constantly getting an error saying:
Error in strptime(dataset, format = "%a %b %d %H:%M:%S %z %Y") :
input string is too long
It was working well at first, but after I ran:
defaults write org.R-project.R force.LANG en_US.UTF-8
in my terminal to fix some preferences in R for Mac OS X, the timestamp command stopped working and keeps producing the error I mentioned above.
This is your original data, with myDates as character.
dtData<-data.frame(myDates=c( "Fri Feb 14 19:07:31 +0000 2014",
"Fri Feb 14 19:07:46 +0000 2014",
"Fri Feb 14 19:07:50 +0000 2014",
"Fri Feb 14 19:08:04 +0000 2014"))
> dtData
myDates
1 Fri Feb 14 19:07:31 +0000 2014
2 Fri Feb 14 19:07:46 +0000 2014
3 Fri Feb 14 19:07:50 +0000 2014
4 Fri Feb 14 19:08:04 +0000 2014
You need to select the dtData$myDates column rather than pass the whole data frame:
time <- strptime(dtData$myDates,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT");time
[1] "2014-02-14 19:07:31 GMT" "2014-02-14 19:07:46 GMT"
[3] "2014-02-14 19:07:50 GMT" "2014-02-14 19:08:04 GMT"
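Since the problem appeared after a change to the language settings, the locale may also play a part, because %a and %b are matched against the current locale's names. As a sketch only, the helper below (parse_twitter_time is a made-up name, not part of any package) switches LC_TIME to "C" while parsing and restores the previous setting afterwards:

```r
# Hypothetical helper: parse Twitter-style timestamps under the C locale
parse_twitter_time <- function(x, tz = "GMT") {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous LC_TIME setting when the function exits
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(strptime(as.character(x), "%a %b %d %H:%M:%S %z %Y", tz = tz))
}

dtData <- data.frame(myDates = c("Fri Feb 14 19:07:31 +0000 2014",
                                 "Fri Feb 14 19:07:46 +0000 2014"))
time <- parse_twitter_time(dtData$myDates)
time
```

Whether the locale is actually the cause here is a guess; selecting the column is needed either way.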
The question is as in the title:
The date I have is in a format like "Sat Mar 17 11:27:57 +0000 2012".
How can I convert it into R's date type?
You just need to specify the correct format (as documented in strptime):
fmt <- "%a %b %d %H:%M:%S %z %Y"
# POSIXct
as.POSIXct("Sat Mar 17 11:27:57 +0000 2012", format=fmt, tz="UTC")
[1] "2012-03-17 11:27:57 UTC"
# POSIXlt
strptime("Sat Mar 17 11:27:57 +0000 2012", format=fmt, tz="UTC")
[1] "2012-03-17 11:27:57 UTC"