Input string too long error when using strptime in R

My file contains a list of timestamps:
Fri Feb 14 19:07:31 +0000 2014
Fri Feb 14 19:07:46 +0000 2014
Fri Feb 14 19:07:50 +0000 2014
Fri Feb 14 19:08:04 +0000 2014
I read it into R using:
dataset <- read.csv(file="Data.csv")
and I then try to get R to parse the timestamps with:
time <- strptime(dataset,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
but I'm constantly getting an error saying:
Error in strptime(dataset, format = "%a %b %d %H:%M:%S %z %Y") :
input string is too long
It was working well at first, but after I ran:
defaults write org.R-project.R force.LANG en_US.UTF-8
in my terminal to fix some preferences in R for Mac OS X, the timestamp command stopped working and keeps producing the error I mentioned above.

This is your original data, with myDates stored as character strings:
dtData<-data.frame(myDates=c( "Fri Feb 14 19:07:31 +0000 2014",
"Fri Feb 14 19:07:46 +0000 2014",
"Fri Feb 14 19:07:50 +0000 2014",
"Fri Feb 14 19:08:04 +0000 2014"))
> dtData
myDates
1 Fri Feb 14 19:07:31 +0000 2014
2 Fri Feb 14 19:07:46 +0000 2014
3 Fri Feb 14 19:07:50 +0000 2014
4 Fri Feb 14 19:08:04 +0000 2014
You need to select the dtData$myDates column rather than passing the whole data frame:
time <- strptime(dtData$myDates,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT");time
[1] "2014-02-14 19:07:31 GMT" "2014-02-14 19:07:46 GMT"
[3] "2014-02-14 19:07:50 GMT" "2014-02-14 19:08:04 GMT"
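To see why the whole data frame fails while the column works, it can help to inspect what each one actually is. The following is a minimal sketch, assuming a one-column data frame like dtData above; LC_TIME is set to "C" only so that %a and %b match English names regardless of the machine's locale, and the `whole` and `fmt` names are just for illustration.

```r
# Make %a/%b match English abbreviations regardless of the system locale.
invisible(Sys.setlocale("LC_TIME", "C"))

dtData <- data.frame(
  myDates = c("Fri Feb 14 19:07:31 +0000 2014",
              "Fri Feb 14 19:07:46 +0000 2014"),
  stringsAsFactors = FALSE
)

class(dtData)          # a data.frame: a list of columns, not a vector of strings
class(dtData$myDates)  # a character vector: this is what strptime expects

# Coercing the whole data frame to character collapses each column into
# a single long deparsed string, far longer than one timestamp.
whole <- as.character(dtData)
nchar(whole) > max(nchar(dtData$myDates))

# Parsing the column itself works.
fmt <- "%a %b %d %H:%M:%S %z %Y"
time <- strptime(dtData$myDates, format = fmt, tz = "GMT")
format(time, "%Y-%m-%d %H:%M:%S")
```

The long collapsed string is presumably what strptime ends up seeing when it is handed the data frame, which would explain the "input string is too long" message.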

Related

Convert string to timestamps in R

I have the string which is formatted as below:
Tue Feb 11 12:28:36 +0000 2014
I try to convert this string to timestamps in R by using:
timeobj <- strptime(df[1], format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT")
where df[1] is in format of Tue Feb 11 12:28:36 +0000 2014
However, I got an error as below:
Error in strptime(df[1], format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT") :
input string is too long
How can I fix this?
dput(df[ 1:5, 1]) =
c("Tue Feb 11 12:47:26 +0000 2014", "Tue Feb 11 12:55:09 +0000 2014", "Tue Feb 11 13:22:29 +0000 2014", "Tue Feb 11 13:24:31 +0000 2014", "Tue Feb 11 13:34:00 +0000 2014")
It looks like your locale does not match the abbreviated weekday and month names in the data.
x <- c("Tue Feb 11 12:47:26 +0000 2014",
"Tue Feb 11 12:55:09 +0000 2014", "Tue Feb 11 13:22:29 +0000 2014",
"Tue Feb 11 13:24:31 +0000 2014", "Tue Feb 11 13:34:00 +0000 2014")
Sys.setlocale("LC_ALL", "de_AT.UTF-8")
strptime(x, format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT")
#[1] NA NA NA NA NA
Sys.setlocale("LC_ALL", "C")
strptime(x, format = "%a %b %e %H:%M:%S %z %Y", tz = "GMT")
#[1] "2014-02-11 12:47:26 GMT" "2014-02-11 12:55:09 GMT"
#[3] "2014-02-11 13:22:29 GMT" "2014-02-11 13:24:31 GMT"
#[5] "2014-02-11 13:34:00 GMT"
The manual for strptime says: '%a' Abbreviated weekday name in the current locale on this platform.
It also looks like you are passing a data.frame with df[1] rather than a vector, which you can get with df[,1].
%T is enough for the time:
timeobj <- strptime(df[, 1], format = "%a %b %e %T %z %Y", tz = "GMT")
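If changing the locale for the whole session is undesirable, one option is to switch only LC_TIME while parsing and restore it afterwards. This is a sketch under that assumption; `parse_en_timestamp` is a hypothetical helper name, not something from base R.

```r
# Hypothetical helper: parse English-language timestamps such as
# "Tue Feb 11 12:47:26 +0000 2014" independent of the session locale.
parse_en_timestamp <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous LC_TIME setting when the function exits.
  on.exit(invisible(Sys.setlocale("LC_TIME", old)), add = TRUE)
  invisible(Sys.setlocale("LC_TIME", "C"))
  as.POSIXct(strptime(as.character(x),
                      format = "%a %b %d %H:%M:%S %z %Y",
                      tz = "GMT"))
}

x <- c("Tue Feb 11 12:47:26 +0000 2014",
       "Tue Feb 11 12:55:09 +0000 2014")
parsed <- parse_en_timestamp(x)
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "GMT")
```

Only LC_TIME is touched here rather than LC_ALL, so number formatting and string collation keep whatever the session was already using.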

As.Date returns error when applied to column

I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.
head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019
I can apply as.Date to an individual row:
as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')
[1] "2020-03-31"
But when I try to use as.Date on the entire column, I get:
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Error in strptime(x, format, tz = "GMT") : input string is too long
What am I doing wrong? Is there another command I'm missing here?
(Too long for a comment.)
It works fine for the data you've shown us. There must be something wrong later in your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:(round(nrow(df)/2)), "created_at"], ...), then bisect to find the problem: if the problem doesn't occur in the first half of the data set, try rows 1:(round(0.75*nrow(df))) and so on.
You could also try plotting nchar(df$created_at) to see if anything pops out.
df <- data.frame(created_at=c(
"Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019"))
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
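The nchar() suggestion can be turned into a quick check that lists the suspicious rows directly instead of bisecting by hand. This is a rough sketch on made-up data: it assumes most rows are well formed, so the most common string length is taken as the expected one, and the names `len`, `expected` and `suspect` are illustrative.

```r
# Example column with one malformed entry (two timestamps run together).
created_at <- c("Tue Mar 31 13:42:58 +0000 2020",
                "Sat Mar 14 05:15:56 +0000 2020",
                "Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020")

len <- nchar(created_at)
table(len)   # how many rows there are of each length

# Assumption: the most frequent length is the length of a valid timestamp.
expected <- as.integer(names(which.max(table(len))))

# Rows whose length differs from the expected one deserve a closer look.
suspect <- which(len != expected)
suspect
created_at[suspect]
```

For a factor column, wrap it in as.character() before calling nchar().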
Absent issues with your data as alluded to by Ben, here is a solution using parse_date_time from the lubridate package which parses the date variable into POSIXct date-time.
library(tibble)     # provides tibble()
library(lubridate)  # provides parse_date_time()
df <- tibble(date = c("Tue Mar 31 13:42:58 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020",
"Sat Mar 14 05:15:56 +0000 2020",
"Tue Mar 24 09:06:12 +0000 2020",
"Thu Oct 24 18:47:10 +0000 2019"))
df$date <- parse_date_time(df$date, "%a %b %d %H:%M:%S %z %Y")
date
<dttm>
1 2020-03-31 13:42:58
2 2020-04-05 14:02:10
3 2020-04-28 01:14:28
4 2020-03-14 05:15:56
5 2020-03-24 09:06:12
6 2019-10-24 18:47:10
Created on 2020-11-13 by the reprex package (v0.3.0)

Using strptime() for reading csv string into time format for R

I have a csv file with rows filled with strings.
The string are time format which I want to read in R.
Tue Feb 10 12:18:39 +0000 2015
Tue Feb 10 12:19:56 +0000 2015
Tue Feb 10 12:19:57 +0000 2015
I know the format is something like:
%a %b %d %x %z %Y
But how do I actually write it in R?
I've tried:
strptime("file.csv"[ ],format="%a %b %d %x %z %Y")
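Almost. You should read the csv into an object first, e.g. with read.csv(). Then you can parse the column, spelling the time out as %H:%M:%S (%x is the locale's date format, not a time). Without a tz argument the result is shown in your system's local time zone: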
strptime(df$V1, "%a %b %d %H:%M:%S %z %Y")
[1] "2015-02-10 13:18:39" "2015-02-10 13:19:56" "2015-02-10 13:19:57"
Test data
df <- read.table(text = "Tue Feb 10 12:18:39 +0000 2015
Tue Feb 10 12:19:56 +0000 2015
Tue Feb 10 12:19:57 +0000 2015", sep = ";") # so it will not interpret space as separator
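For an actual file on disk, the same steps might look like the sketch below. It writes the three sample lines to a temporary file so the example is self-contained; in practice `path` would point at your own csv. It assumes the file has no header row and one timestamp per line, and it sets LC_TIME to "C" so that %a and %b match the English names.

```r
# Make %a/%b match English abbreviations regardless of the system locale.
invisible(Sys.setlocale("LC_TIME", "C"))

# Stand-in for the real csv file: one timestamp per line, no header.
path <- tempfile(fileext = ".csv")
writeLines(c("Tue Feb 10 12:18:39 +0000 2015",
             "Tue Feb 10 12:19:56 +0000 2015",
             "Tue Feb 10 12:19:57 +0000 2015"), path)

# No commas in the lines, so read.csv yields a single column named V1.
df <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)

# Parse the column (not the file name, and not the whole data frame).
times <- strptime(df$V1, "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
format(times, "%Y-%m-%d %H:%M:%S")

unlink(path)  # remove the temporary file
```

Passing tz = "GMT" keeps the printed times identical to the +0000 values in the file instead of shifting them to the local time zone.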

Convert timestamp to a date-time format in R

I have a dataset with a column of automatically generated timestamps in the following format:
head(tweets$V2)
[1] Fri Oct 30 18:33:50 +0000 2015 Fri Oct 30 18:33:51 +0000 2015 Fri Oct 30 18:33:52 +0000 2015
[4] Fri Oct 30 18:33:54 +0000 2015 Fri Oct 30 18:33:55 +0000 2015 Fri Oct 30 18:33:56 +0000 2015
I want to convert these to a POSIX date-time format. Any pointers on how to go about this?
After converting these to a standard time format, I wanted to observe trends in the subjects of the tweets.
See ?strptime for more details:
tweets$V2 <- as.POSIXct(strptime(tweets$V2, "%a %b %d %H:%M:%S %z %Y"))
This will convert the strings to POSIXct format with the default time zone of your system. If you want to specify a different time zone, include the tz argument:
a <- c("Fri Oct 30 18:33:50 +0000 2015")
as.POSIXct(strptime(a, "%a %b %d %H:%M:%S %z %Y"))
[1] "2015-10-30 14:33:50 EDT"
as.POSIXct(strptime(a, "%a %b %d %H:%M:%S %z %Y", tz = "GMT"))
[1] "2015-10-30 18:33:50 GMT"
Note: Convert the column with as.character if it is of class factor
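Putting the factor note and the tz argument together, a sketch might look like this. The vector `v2` stands in for tweets$V2 and is made up for illustration; "America/New_York" is just an example time zone name and assumes the system's time zone database knows it.

```r
# Make %a/%b match English abbreviations regardless of the system locale.
invisible(Sys.setlocale("LC_TIME", "C"))

# Stand-in for tweets$V2, stored as a factor as older read.csv defaults would do.
v2 <- factor(c("Fri Oct 30 18:33:50 +0000 2015",
               "Fri Oct 30 18:33:51 +0000 2015"))

# Convert to character explicitly, then parse as GMT.
parsed <- as.POSIXct(strptime(as.character(v2),
                              "%a %b %d %H:%M:%S %z %Y",
                              tz = "GMT"))
class(parsed)

# The stored instant is the same; only the display changes with tz.
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "GMT")
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
```

Parsing once with tz = "GMT" and choosing the display zone in format() avoids re-parsing the strings for every time zone of interest.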

How to transform the date using R

Currently, I have a lot of data. Associated with the data, I also have dates. Unfortunately, the dates are in the following format (day (Monday-Sunday), month (January-December) date (1-31) Hour:Minute:Second timezone Year). I would like to convert this into just Month/Day(1-31)/Year. Following is the sample data.
created_data
Sat Jun 20 23:45:03 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:08 +0000 2015
Sat Jun 20 23:45:11 +0000 2015
Sat Jun 20 23:45:13 +0000 2015
Sat Jun 20 23:45:14 +0000 2015
Sat Jun 20 23:45:15 +0000 2015
This is currently in the form of a dataframe. The format in which I am trying to see the dataframe is the following:
Results
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Following is the code that I have tried but the result was just NA
strptime(x = created_data, format = "%m/%d/%Y")
Result = NA
First you have to convert your character string to something that R knows how to deal with, such as a POSIXct object.
Given your format you can do as.POSIXct(created_data, format="%a %b %d %X %z %Y")
Once it is in that format, you can convert it back to a character string in the layout you want using format(), for example:
format(as.POSIXct(created_data, format="%a %b %d %X %z %Y"), format = "%Y/%m/%d")
The following should work, assuming the datetimes are stored in a character vector.
library("stringr")
library("dplyr")
dates <- c("Sat Jun 20 23:45:03 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:08 +0000 2015",
"Sat Jun 20 23:45:11 +0000 2015",
"Sat Jun 20 23:45:13 +0000 2015",
"Sat Jun 20 23:45:14 +0000 2015",
"Sat Jun 20 23:45:15 +0000 2015")
str_split_fixed(dates, pattern = " ", n=6) %>%
as.data.frame() %>%
mutate(new.date = as.Date(paste(V2, V3, V6), format = "%b %d %Y"))
The basic idea being to split the string into its individual pieces using str_split_fixed(), then recombine the pieces in as.Date()
Just a base R solution without other packages.
x <- "Sat Jun 20 23:45:03 +0000 2015"
x1 <- format(strptime(x, "%a %b %d %H:%M:%S %z %Y", tz = "GMT"), "%b %d %Y")
x1
[1] "Jun 20 2015"
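The same base R approach extends to a whole column and to either of the layouts mentioned in the question. A small sketch, with a made-up vector `x` standing in for the created_data column and LC_TIME set to "C" so that "Jun" is both read and written in English; the column names in `results` are illustrative.

```r
# Make %a/%b match English abbreviations regardless of the system locale.
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("Sat Jun 20 23:45:03 +0000 2015",
       "Sat Jun 20 23:45:06 +0000 2015")

# Parse once, then format as often as needed.
parsed <- strptime(x, "%a %b %d %H:%M:%S %z %Y", tz = "GMT")

results <- data.frame(
  month_name = format(parsed, "%b %d %Y"),   # e.g. Jun 20 2015
  slashes    = format(parsed, "%m/%d/%Y"),   # e.g. 06/20/2015
  stringsAsFactors = FALSE
)
results
```

Both columns are plain character strings; keep `parsed` around if you still need to sort or do arithmetic on the dates.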
