I have time-stamps in one column of my dataframe. They look like
"Tue May 14 21:57:04 +0000 2013"
I want to replace the whole timestamp with only the month name. How can I do it in R? Let's say the column name is "timestamp" and the dataframe name is "Df".
Below is the sample of some more entries.
"Wed Jul 10 01:30:36 +0000 2013"
"Fri Apr 20 01:46:59 +0000 2012"
"Sat Jul 07 17:56:34 +0000 2012"
"Sat Mar 16 02:12:30 +0000 2013"
"Sat Feb 16 02:29:11 +0000 2013"
I want these to look like
Jul
Apr
Jul
Mar
Feb
Your help will be highly appreciated.
Assign the source data using Akrun's string
R> dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
R> dates
[1] "Tue May 14 21:57:04 +0000 2013"
[2] "Wed Jul 10 01:30:36 +0000 2013"
[3] "Fri Apr 20 01:46:59 +0000 2012"
[4] "Sat Jul 07 17:56:34 +0000 2012"
[5] "Sat Mar 16 02:12:30 +0000 2013"
[6] "Sat Feb 16 02:29:11 +0000 2013"
R>
Parse using the appropriate strptime format:
R> pt <- strptime(dates, "%a %b %d %H:%M:%S +0000 %Y")
R> pt
[1] "2013-05-14 21:57:04 CDT" "2013-07-10 01:30:36 CDT"
[3] "2012-04-20 01:46:59 CDT" "2012-07-07 17:56:34 CDT"
[5] "2013-03-16 02:12:30 CDT" "2013-02-16 02:29:11 CST"
R>
Re-format just the desired month
R> strftime(pt, "%m")
[1] "05" "07" "04" "07" "03" "02"
R> strftime(pt, "%b")
[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
R> strftime(pt, "%B")
[1] "May" "July" "April" "July" "March"
[6] "February"
R>
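Applied to the data frame from the question, the same two steps might look like the sketch below. It assumes a data frame `Df` with a character column `timestamp` (names taken from the question, data rebuilt here for illustration) and an English `LC_TIME` locale so that `%a` and `%b` match "Tue" and "May". Using `%z` with `tz = "UTC"` is a variation on the literal `+0000` above, not the original answer's code.

```r
# Hypothetical data frame named Df with a character column "timestamp",
# rebuilt from the sample values in the question.
Df <- data.frame(
  timestamp = c("Tue May 14 21:57:04 +0000 2013",
                "Wed Jul 10 01:30:36 +0000 2013",
                "Fri Apr 20 01:46:59 +0000 2012"),
  stringsAsFactors = FALSE
)

# %z consumes the "+0000" offset; tz = "UTC" keeps the result in UTC
# instead of the session's local time zone.
pt <- strptime(Df$timestamp, "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Overwrite the column with the abbreviated month name.
Df$timestamp <- format(pt, "%b")
Df$timestamp
# [1] "May" "Jul" "Apr"
```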
You can use strptime along with format.
Assuming you have character strings, we can first convert them to class "POSIXlt" "POSIXt" and then extract the month (%b) part:
format(strptime(x, "%a %b %d %H:%M:%S +0000 %Y"), "%b")
#[1] "Jul" "Apr" "Jul" "Mar" "Feb"
We can use sub. Match one or more non-white space characters(\\S+) followed by one or more white space (\\s+), then capture the non-white space as a group ((\\S+)) followed by characters until the end of the string and replace it with the backreference (\\1) for the captured group.
sub("\\S+\\s+(\\S+).*", "\\1", v1)
#[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
It may be better to use date-time conversions (as @DirkEddelbuettel mentioned in the comments) if we know how to get the format correct.
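One thing to keep in mind with the sub() approach: when the pattern does not match, sub() returns the input unchanged, and a malformed string can still match and yield something that is not a month. A minimal sketch of a sanity check against the built-in `month.abb` follows. The anchored pattern and the malformed third entry are illustrative assumptions, not part of the answer above.

```r
v1 <- c("Tue May 14 21:57:04 +0000 2013",
        "Wed Jul 10 01:30:36 +0000 2013",
        "not a timestamp")            # deliberately malformed entry

# Same idea as above, anchored at both ends: skip the first token,
# capture the second one, drop the rest.
months <- sub("^\\S+\\s+(\\S+).*$", "\\1", v1)

# Anything that is not a known month abbreviation becomes NA.
months[!(months %in% month.abb)] <- NA
months
# [1] "May" "Jul" NA
```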
data
v1 <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
Assuming your timestamp is text:
df<-data.frame(timestamp=c("Tue May 14 21:57:04 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013"),stringsAsFactors = FALSE)
df$month<-sapply(df$timestamp,function(sx)strsplit(sx,split=" ")[[1]][2])
df
> df
timestamp month
1 Tue May 14 21:57:04 +0000 2013 May
2 Fri Apr 20 01:46:59 +0000 2012 Apr
3 Sat Mar 16 02:12:30 +0000 2013 Mar
1) The month name is always in character positions 5 through 7 inclusive of the timestamp column, so this replaces the timestamp column with a character column of months:
transform(DF, timestamp = format(substr(timestamp, 5, 7)))
The output is:
timestamp
1 Jul
2 Apr
3 Jul
4 Mar
5 Feb
2) If you wanted a factor column instead then use this variation which ensures that the factor levels are Jan=1, Feb=2, etc. rather than being assigned alphabetically:
transform(DF, timestamp = factor(substr(timestamp, 5, 7), levels = month.abb))
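To see what the `levels = month.abb` argument buys, here is a small sketch on the same five timestamps: the integer codes are the month numbers, and sorting follows calendar order rather than alphabetical order.

```r
DF <- data.frame(timestamp = c("Fri Apr 20 01:46:59 +0000 2012",
                               "Sat Feb 16 02:29:11 +0000 2013",
                               "Sat Jul 07 17:56:34 +0000 2012",
                               "Sat Mar 16 02:12:30 +0000 2013",
                               "Wed Jul 10 01:30:36 +0000 2013"),
                 stringsAsFactors = FALSE)

m <- factor(substr(DF$timestamp, 5, 7), levels = month.abb)

# Integer codes are positions in month.abb, i.e. month numbers.
as.integer(m)
# [1] 4 2 7 3 7

# Sorting uses the level order, so Feb comes before Apr.
as.character(sort(m))
# [1] "Feb" "Mar" "Apr" "Jul" "Jul"
```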
Note: We have assumed input in the following reproducible form:
DF <- data.frame(timestamp = c("Fri Apr 20 01:46:59 +0000 2012",
"Sat Feb 16 02:29:11 +0000 2013", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013"))
Related
I am trying to convert the dates in the created_at column to a number-of-seconds column, created_at_dt, using POSIXct.
created_at
<chr>
Fri May 26 17:30:01 +0000 2017
Fri May 26 17:30:05 +0000 2017
Fri May 26 17:30:05 +0000 2017
Fri May 26 17:30:04 +0000 2017
Fri May 26 17:30:12 +0000 2017
Example of what I want to achieve:
created_at_dt
<dbl>
1495819801
1495819805
1495819805
1495819804
1495819812
I tried the following line but got only NA values introduced.
tweets <- tweets %>%
mutate(created_at_dt = asPOSIXct(as.numeric('created_at')))
Any help would be much appreciated. Thank you!
You just need to specify the correct format string for as.POSIXct.
Also, created_at should not be in quotes for mutate().
library(dplyr)
tweets <- tweets %>%
mutate(created_at_dt = as.POSIXct(created_at,
format = "%a %b %d %H:%M:%S %z %Y") %>%
as.numeric())
Result:
created_at created_at_dt
1 Fri May 26 17:30:01 +0000 2017 1495819801
2 Fri May 26 17:30:05 +0000 2017 1495819805
3 Fri May 26 17:30:05 +0000 2017 1495819805
4 Fri May 26 17:30:04 +0000 2017 1495819804
5 Fri May 26 17:30:12 +0000 2017 1495819812
The data:
tweets <- structure(list(created_at = c("Fri May 26 17:30:01 +0000 2017",
"Fri May 26 17:30:05 +0000 2017", "Fri May 26 17:30:05 +0000 2017",
"Fri May 26 17:30:04 +0000 2017", "Fri May 26 17:30:12 +0000 2017"
)), class = "data.frame", row.names = c(NA, -5L))
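For anyone wondering where the NA values came from, here is a base R sketch (no dplyr) of both the failure and the fix. It assumes an English `LC_TIME` locale. The explicit `tz = "UTC"` is an addition for reproducibility and does not change the epoch seconds, since `%z` already reads the offset.

```r
created_at <- c("Fri May 26 17:30:01 +0000 2017",
                "Fri May 26 17:30:05 +0000 2017")

# The original attempt quoted the column name, so R tried to turn the
# literal text "created_at" into a number, which gives NA.
suppressWarnings(as.numeric("created_at"))
# [1] NA

# Parse the strings first, then convert the date-times to seconds
# since 1970-01-01 00:00:00 UTC.
parsed <- as.POSIXct(created_at,
                     format = "%a %b %d %H:%M:%S %z %Y",
                     tz = "UTC")
created_at_dt <- as.numeric(parsed)
created_at_dt
# [1] 1495819801 1495819805
```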
I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.
head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019
I can apply as.Date to an individual row:
as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')
[1] "2020-03-31"
But when I try to use as.Date on the entire column, I get:
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Error in strptime(x, format, tz = "GMT") : input string is too long
What am I doing wrong? Is there another command I'm missing here?
(Too long for a comment.)
It works fine for the data you've shown us. There must be something wrong later in your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:(round(nrow(df)/2)), "created_at"], ...) and then bisecting to find the problem: if the problem doesn't occur in the first half of the data set, try rows 1:(round(0.75*nrow(df))), and so on.
You could also try plotting nchar(df$created_at) to see if anything pops out.
df <- data.frame(created_at=c(
"Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019"))
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
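The nchar() idea can also be used to flag suspicious rows directly instead of plotting. Every well-formed timestamp in this format is exactly 30 characters long, so anything else is worth a look. This is a rough sketch: the vector is made up, with one deliberately doubled entry, and `type = "bytes"` is used so the check also runs on strings with encoding problems.

```r
created_at <- c("Tue Mar 31 13:42:58 +0000 2020",
                # hypothetical bad row: two timestamps stuck together
                "Sat Mar 14 05:15:56 +0000 2020 Sun Apr 05 14:02:10 +0000 2020",
                "Tue Apr 28 01:14:28 +0000 2020")

# Length of every entry in bytes; as.character() guards against the
# column being a factor.
len <- nchar(as.character(created_at), type = "bytes")

# Row numbers whose length differs from the expected 30.
suspect <- which(len != 30L)
suspect
# [1] 2

created_at[suspect]
```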
Absent issues with your data as alluded to by Ben, here is a solution using parse_date_time from the lubridate package which parses the date variable into POSIXct date-time.
library(tibble)
df <- tibble(date = c("Tue Mar 31 13:42:58 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020",
"Sat Mar 14 05:15:56 +0000 2020",
"Tue Mar 24 09:06:12 +0000 2020",
"Thu Oct 24 18:47:10 +0000 2019"))
library(lubridate)
df$date <- parse_date_time(df$date, "%a %b %d %H:%M:%S %z %Y")
date
<dttm>
1 2020-03-31 13:42:58
2 2020-04-05 14:02:10
3 2020-04-28 01:14:28
4 2020-03-14 05:15:56
5 2020-03-24 09:06:12
6 2019-10-24 18:47:10
Created on 2020-11-13 by the reprex package (v0.3.0)
My file contains a list of timestamps:
Fri Feb 14 19:07:31 +0000 2014
Fri Feb 14 19:07:46 +0000 2014
Fri Feb 14 19:07:50 +0000 2014
Fri Feb 14 19:08:04 +0000 2014
and I read it into R using:
dataset <- read.csv(file="Data.csv")
I then try to parse the timestamps with:
time <- strptime(dataset,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
but I'm constantly getting an error saying:
Error in strptime(dataset, format = "%a %b %d %H:%M:%S %z %Y") :
input string is too long
It was working well at first, but after I ran:
defaults write org.R-project.R force.LANG en_US.UTF-8
in my terminal to fix some preferences in R for Mac OS X, the timestamp command stopped working and keeps producing the error I mentioned above.
This is your original Data. myDates as Character.
dtData<-data.frame(myDates=c( "Fri Feb 14 19:07:31 +0000 2014",
"Fri Feb 14 19:07:46 +0000 2014",
"Fri Feb 14 19:07:50 +0000 2014",
"Fri Feb 14 19:08:04 +0000 2014"))
> dtData
myDates
1 Fri Feb 14 19:07:31 +0000 2014
2 Fri Feb 14 19:07:46 +0000 2014
3 Fri Feb 14 19:07:50 +0000 2014
4 Fri Feb 14 19:08:04 +0000 2014
You need to pass the dtData$myDates column to strptime, not the whole data frame:
time <- strptime(dtData$myDates,format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT");time
[1] "2014-02-14 19:07:31 GMT" "2014-02-14 19:07:46 GMT"
[3] "2014-02-14 19:07:50 GMT" "2014-02-14 19:08:04 GMT"
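For the read.csv() case from the question, a sketch could look like this. It assumes the file has a single column and no header line (both are guesses about Data.csv), and uses `text =` in place of the file so the example is self-contained. Without a header, read.csv() names the column `V1`.

```r
# Stand-in for the contents of Data.csv (assumed: one column, no header).
csv_text <- paste("Fri Feb 14 19:07:31 +0000 2014",
                  "Fri Feb 14 19:07:46 +0000 2014",
                  sep = "\n")

dataset <- read.csv(text = csv_text, header = FALSE,
                    stringsAsFactors = FALSE)

# read.csv() returns a data frame, so pick out the column before parsing.
time <- strptime(dataset$V1,
                 format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
format(time, "%Y-%m-%d %H:%M:%S")
# [1] "2014-02-14 19:07:31" "2014-02-14 19:07:46"
```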
Currently, I have a lot of data. Associated with the data, I also have dates. Unfortunately, the dates are in the following format (day (Monday-Sunday), month (January-December) date (1-31) Hour:Minute:Second timezone Year). I would like to convert this into just Month/Day(1-31)/Year. Following is the sample data.
created_data
Sat Jun 20 23:45:03 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:08 +0000 2015
Sat Jun 20 23:45:11 +0000 2015
Sat Jun 20 23:45:13 +0000 2015
Sat Jun 20 23:45:14 +0000 2015
Sat Jun 20 23:45:15 +0000 2015
This is currently in the form of a dataframe. The format in which I am trying to see the dataframe is the following:
Results
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Following is the code that I have tried but the result was just NA
strptime(x = created_data, format = "%m/%d/%Y")
Result = NA
First you have to convert your character string to something that R knows how to deal with such as a POSIXct object.
Given your format you can do as.POSIXct(created_data, format="%a %b %d %X %z %Y")
Once it is in that format you can convert it back to a character string of the format you want using format such as...
format(as.POSIXct(created_data, format="%a %b %d %X %z %Y"), format = "%Y/%m/%d")
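One caveat worth a sketch: without a `tz` argument the parsed time is shown in the session's local time zone, and for timestamps close to midnight that can change the calendar date. The example below assumes an English locale, spells `%X` out as `%H:%M:%S`, and uses "Asia/Tokyo" purely as an illustrative zone.

```r
created_data <- "Sat Jun 20 23:45:03 +0000 2015"

# Parse with the offset and keep the result in UTC.
x <- as.POSIXct(created_data,
                format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Formatted in UTC the date is unchanged.
format(x, "%b %d %Y")
# [1] "Jun 20 2015"

# The same instant viewed nine hours ahead of UTC is already the next day.
format(x, "%b %d %Y", tz = "Asia/Tokyo")
# [1] "Jun 21 2015"
```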
The following should work, assuming the datetimes are stored in a character vector.
library("stringr")
library("dplyr")
dates <- c("Sat Jun 20 23:45:03 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:08 +0000 2015",
"Sat Jun 20 23:45:11 +0000 2015",
"Sat Jun 20 23:45:13 +0000 2015",
"Sat Jun 20 23:45:14 +0000 2015",
"Sat Jun 20 23:45:15 +0000 2015")
str_split_fixed(dates, pattern = " ", n=6) %>%
as.data.frame() %>%
mutate(new.date = as.Date(paste(V2, V3, V6), format = "%b %d %Y"))
The basic idea being to split the string into its individual pieces using str_split_fixed(), then recombine the pieces in as.Date()
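The same split-and-recombine idea can be sketched in base R if stringr and dplyr are not available. This is an alternative rendering of the approach above, assuming an English locale for `%b`. The variable names are made up for illustration.

```r
dates <- c("Sat Jun 20 23:45:03 +0000 2015",
           "Sat Jun 20 23:45:06 +0000 2015")

# Split each string on spaces and stack the pieces into a matrix with
# one row per timestamp and six columns.
parts <- do.call(rbind, strsplit(dates, " ", fixed = TRUE))

# Columns 2, 3 and 6 hold month, day and year.
new_date <- as.Date(paste(parts[, 2], parts[, 3], parts[, 6]),
                    format = "%b %d %Y")
format(new_date, "%b %d %Y")
# [1] "Jun 20 2015" "Jun 20 2015"
```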
Just a base R solution without other packages.
x <- "Sat Jun 20 23:45:03 +0000 2015"
x1 <- format(strptime(x, "%a %b %d %H:%M:%S %z %Y", tz = "GMT"), "%b %d %Y")
x1
[1] "Jun 20 2015"
I've downloaded tweets in json format, converted it into csv, and read it into R. The existing time stamps are in factor format as shown below. How should I convert it into a timestamp that can be plotted against?
[1] Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014
[4] Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014
516 Levels: Fri May 09 07:55:12 +0000 2014 ... Fri May 09 09:15:07 +0000 2014
I think your question is already answered here: Convert Twitter Timestamp in R
But if you want something simpler, you can use the twitteR library.
> tweets <- userTimeline("BarackObama",n=100)
> df <- do.call("rbind",lapply(tweets, as.data.frame))
> names(df)
[1] "text" "favorited" "favoriteCount" "replyToSN" "created" "truncated"
[7] "replyToSID" "id" "replyToUID" "statusSource" "screenName" "retweetCount"
[13] "isRetweet" "retweeted" "longitude" "latitude"
We can then plot the created column directly.
You can remove the unnecessary parts of the string before applying as.POSIXct. This can be done with gsub:
x <- as.factor(c("Fri May 09 07:55:12 +0000 2014",
"Fri May 09 07:55:12 +0000 2014"))
as.POSIXct(gsub("^.+? | \\+\\d{4}","", x),
format = "%b %d %X %Y")
# [1] "2014-05-09 07:55:12 CEST" "2014-05-09 07:55:12 CEST"
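If you would rather keep the UTC offset than strip it out, a possible variation is to convert the factor to character and let `%z` read the `+0000`. The sketch below assumes an English locale. The second timestamp is taken from the last factor level shown in the question.

```r
x <- factor(c("Fri May 09 07:55:12 +0000 2014",
              "Fri May 09 09:15:07 +0000 2014"))

# as.character() turns the factor back into the original strings;
# %z reads the offset and tz = "UTC" fixes the display zone.
created <- as.POSIXct(as.character(x),
                      format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

class(created)
# [1] "POSIXct" "POSIXt"

# POSIXct values support arithmetic, which is what plotting relies on.
as.numeric(difftime(created[2], created[1], units = "secs"))
# [1] 4795

# plot(created, seq_along(created))  # e.g. cumulative tweet count over time
```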