I have a vector b of strings:
> b
[1] "Jan 01 2016 00:26:00" "Jan 01 2016 03:06:00" "Jan 01 2016 22:36:00" "Jan 01 2016 17:46:00"
[5] "Jan 01 2016 18:06:00" "Jan 01 2016 23:16:00" "Jan 01 2016 03:16:00" "Jan 01 2016 09:46:00"
[9] "Jan 01 2016 00:06:00" "Jan 01 2016 03:56:00"
I want to convert them into date-time objects. I tried:
> as.Date(b, "%b %d %Y %H:%M:%S")
[1] "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01"
[9] "2016-01-01" "2016-01-01"
Why do I not get the H:M:S part? By the way, each element of b was extracted, using substring, from a string of the form "Fri Jan 01 00:26:00 UTC 2016".
A solution to convert directly from string "Fri Jan 01 00:26:00 UTC 2016" to a date/time object of the format "2016-01-01 23:59:00" would be helpful. I will use this date/time column to order the entire dataframe.
I would use as.POSIXct() instead of as.Date(): a Date object stores only the calendar day, so the time component is dropped. See ?as.POSIXct and ?strptime for the available format codes.
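A minimal sketch of that suggestion, covering both the trimmed strings and the original "Fri Jan 01 00:26:00 UTC 2016" form. It assumes English month and weekday names (hence the LC_TIME setting) and that every raw string contains the literal text "UTC"; the data frame df and its raw and time columns are hypothetical names used for illustration.

```r
# %b and %a match month/weekday names in the current locale; assume English
invisible(Sys.setlocale("LC_TIME", "C"))

b <- c("Jan 01 2016 00:26:00", "Jan 01 2016 03:06:00", "Jan 01 2016 22:36:00")

# as.POSIXct keeps the time of day that as.Date discards
bt <- as.POSIXct(b, format = "%b %d %Y %H:%M:%S", tz = "UTC")

# Parse the untrimmed strings directly; "UTC" in the format is matched as literal text
raw <- c("Fri Jan 01 22:36:00 UTC 2016", "Fri Jan 01 00:26:00 UTC 2016")
df <- data.frame(raw = raw, stringsAsFactors = FALSE)
df$time <- as.POSIXct(df$raw, format = "%a %b %d %H:%M:%S UTC %Y", tz = "UTC")

# Order the whole data frame chronologically by the parsed column
df <- df[order(df$time), ]
format(df$time, "%Y-%m-%d %H:%M:%S")
```

Parsing the full string avoids the substring step entirely; if some strings carry a zone other than UTC, the literal "UTC" in the format will not match them and they will come back as NA.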
Related
I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.
head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019
I can apply as.Date to an individual row:
as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')
[1] "2020-03-31"
But when I try to use as.Date on the entire column, I get:
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Error in strptime(x, format, tz = "GMT") : input string is too long
What am I doing wrong? Is there another command I'm missing here?
(Too long for a comment.)
It works fine for the data you've shown us, so there must be something wrong further down your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:round(nrow(df)/2), "created_at"], ...), and then bisecting: if the problem doesn't occur in the first half of the data set, try rows 1:round(0.75*nrow(df)), and so on.
You could also try plotting nchar(df$created_at) to see if anything pops out.
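As an alternative to bisecting by hand, a sketch that tries each value separately and reports the rows that either raise an error or parse to NA. The helper find_bad_rows is hypothetical, it assumes English day and month names, and it is slow on very large columns because it calls as.Date once per element; the flagged rows can then be inspected, e.g. with nchar().

```r
# %a and %b depend on the locale; assume English names
invisible(Sys.setlocale("LC_TIME", "C"))

fmt <- "%a %b %d %H:%M:%S %z %Y"

# Hypothetical helper: row numbers whose value errors or gives NA under `format`
find_bad_rows <- function(x, format) {
  x <- as.character(x)  # also handles factor columns
  bad <- vapply(x, function(s) {
    res <- tryCatch(as.Date(s, format = format), error = function(e) NULL)
    is.null(res) || is.na(res)
  }, logical(1), USE.NAMES = FALSE)
  which(bad)
}

df <- data.frame(created_at = c("Tue Mar 31 13:42:58 +0000 2020",
                                "not a timestamp",
                                "Thu Oct 24 18:47:10 +0000 2019"),
                 stringsAsFactors = FALSE)
find_bad_rows(df$created_at, fmt)
```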
df <- data.frame(created_at=c(
"Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019"))
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Absent issues with your data as alluded to by Ben, here is a solution using parse_date_time from the lubridate package which parses the date variable into POSIXct date-time.
library(tibble)
library(lubridate)
df <- tibble(date = c("Tue Mar 31 13:42:58 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020",
"Sat Mar 14 05:15:56 +0000 2020",
"Tue Mar 24 09:06:12 +0000 2020",
"Thu Oct 24 18:47:10 +0000 2019"))
df$date <- parse_date_time(df$date, "%a %b %d %H:%M:%S %z %Y")
df
date
<dttm>
1 2020-03-31 13:42:58
2 2020-04-05 14:02:10
3 2020-04-28 01:14:28
4 2020-03-14 05:15:56
5 2020-03-24 09:06:12
6 2019-10-24 18:47:10
Created on 2020-11-13 by the reprex package (v0.3.0)
I have time-stamps in one column of my dataframe. They look like
"Tue May 14 21:57:04 +0000 2013"
I want to replace the whole timestamp with only the month name. How can I do it in R? Let's say the column name is "timestamp" and the data frame name is "Df".
Below is a sample of some more entries.
"Wed Jul 10 01:30:36 +0000 2013"
"Fri Apr 20 01:46:59 +0000 2012"
"Sat Jul 07 17:56:34 +0000 2012"
"Sat Mar 16 02:12:30 +0000 2013"
"Sat Feb 16 02:29:11 +0000 2013"
I want these to look like
Jul
Apr
Jul
Mar
Feb
Your help will be highly appreciated.
Assign the source data, using Akrun's vector:
R> dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
R> dates
[1] "Tue May 14 21:57:04 +0000 2013"
[2] "Wed Jul 10 01:30:36 +0000 2013"
[3] "Fri Apr 20 01:46:59 +0000 2012"
[4] "Sat Jul 07 17:56:34 +0000 2012"
[5] "Sat Mar 16 02:12:30 +0000 2013"
[6] "Sat Feb 16 02:29:11 +0000 2013"
R>
Parse using the appropriate strptime format:
R> pt <- strptime(dates, "%a %b %d %H:%M:%S +0000 %Y")
R> pt
[1] "2013-05-14 21:57:04 CDT" "2013-07-10 01:30:36 CDT"
[3] "2012-04-20 01:46:59 CDT" "2012-07-07 17:56:34 CDT"
[5] "2013-03-16 02:12:30 CDT" "2013-02-16 02:29:11 CST"
R>
Re-format just the desired month
R> strftime(pt, "%m")
[1] "05" "07" "04" "07" "03" "02"
R> strftime(pt, "%b")
[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
R> strftime(pt, "%B")
[1] "May" "July" "April" "July" "March"
[6] "February"
R>
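Applied to the data frame column from the question, a sketch that reads the offset with %z instead of matching "+0000" literally, and keeps the result in UTC so that a timestamp near midnight on a month boundary is not moved into a neighbouring month by the session's local time zone. It assumes English day and month names and a character (not factor) column; Df is the name used in the question.

```r
# %a and %b depend on the locale; assume English names
invisible(Sys.setlocale("LC_TIME", "C"))

Df <- data.frame(timestamp = c("Tue May 14 21:57:04 +0000 2013",
                               "Wed Jul 10 01:30:36 +0000 2013",
                               "Fri Apr 20 01:46:59 +0000 2012"),
                 stringsAsFactors = FALSE)

# %z applies the offset; tz = "UTC" reports the result in UTC
pt <- strptime(Df$timestamp, "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Replace the column in place with the abbreviated month name
Df$timestamp <- format(pt, "%b")
Df$timestamp
```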
You can use strptime along with format.
Assuming x is a character vector, we can first convert it to "POSIXlt" "POSIXt" with strptime and then extract the month (%b) part of it with format:
format(strptime(x, "%a %b %d %H:%M:%S +0000 %Y"), "%b")
#[1] "Jul" "Apr" "Jul" "Mar" "Feb"
We can use sub. Match one or more non-whitespace characters (\\S+) followed by one or more whitespace characters (\\s+), then capture the next run of non-whitespace characters as a group ((\\S+)), followed by any characters until the end of the string, and replace the whole match with the backreference (\\1) to the captured group.
sub("\\S+\\s+(\\S+).*", "\\1", v1)
#[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
It may be better to use date-time conversions (as @DirkEddelbuettel mentioned in the comments) if we know the correct format.
data
v1 <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
Assuming your timestamp is text:
df<-data.frame(timestamp=c("Tue May 14 21:57:04 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013"),stringsAsFactors = FALSE)
df$month<-sapply(df$timestamp,function(sx)strsplit(sx,split=" ")[[1]][2])
df
timestamp month
1 Tue May 14 21:57:04 +0000 2013 May
2 Fri Apr 20 01:46:59 +0000 2012 Apr
3 Sat Mar 16 02:12:30 +0000 2013 Mar
1) The month name always occupies character positions 5 through 7 (inclusive) of the timestamp column, so this replaces the timestamp column with a character column of months:
transform(DF, timestamp = format(substr(timestamp, 5, 7)))
The output is:
timestamp
1 Jul
2 Apr
3 Jul
4 Mar
5 Feb
2) If you wanted a factor column instead then use this variation which ensures that the factor levels are Jan=1, Feb=2, etc. rather than being assigned alphabetically:
transform(DF, timestamp = factor(substr(timestamp, 5, 7), levels = month.abb))
Note: We have assumed input in the following reproducible form:
DF <- data.frame(timestamp = c("Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013"))
I'm trying to create a function that gets all the months between two months into a list:
date1<- 201305
date2<- 201511
months <- function(date1,date2){}
And I want it return a list like this:
201305
201306
201307
...
201509
201510
201511
First, we need to create dates. What you supply is not yet a date because it lacks a day, so we add one:
R> d1 <- as.Date(paste0("201305","01"), "%Y%m%d")
R> d2 <- as.Date(paste0("201511","01"), "%Y%m%d")
Given two dates, getting a sequence of dates is trivial: a call to seq(). Formatting it the way you want is equally trivial:
R> dat <- format(seq(d1,d2,by="month"), "%Y%m")
We check the beginning and end:
R> head(dat)
[1] "201305" "201306" "201307" "201308" "201309" "201310"
R> tail(dat)
[1] "201506" "201507" "201508" "201509" "201510" "201511"
R>
Now, as a function:
datseq <- function(t1, t2) {
format(seq(as.Date(paste0(t1,"01"), "%Y%m%d"),
as.Date(paste0(t2,"01"), "%Y%m%d"),by="month"),
"%Y%m")
}
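For comparison, a sketch of a different technique that skips Date conversion altogether and works on the yyyymm numbers with integer arithmetic. The helper month_seq is hypothetical and assumes from <= to and well-formed yyyymm input (month 01 to 12); it returns numbers rather than strings, matching the numeric form used in the question.

```r
# Hypothetical helper: consecutive months between two yyyymm numbers
month_seq <- function(from, to) {
  # Map yyyymm to a running month count: year * 12 + zero-based month
  to_index <- function(ym) (ym %/% 100) * 12 + (ym %% 100) - 1
  idx <- seq(to_index(from), to_index(to))
  # Map each running count back to yyyymm
  (idx %/% 12) * 100 + idx %% 12 + 1
}

r <- month_seq(201305, 201511)
head(r)
```

The year rollover is handled by the division: after month index 11 (December) the next count carries into the following year's January.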
This can be done using the yearmon class in the zoo package:
library(zoo)
ym1 <- as.yearmon(as.character(date1), "%Y%m") # convert to yearmon
ym2 <- as.yearmon(as.character(date2), "%Y%m") # ditto
s <- seq(ym1, ym2, 1/12) # create yearmon sequence
as.numeric(format(s, "%Y%m")) # convert to numeric yyyymm
giving:
[1] 201305 201306 201307 201308 201309 201310 201311 201312 201401 201402
[11] 201403 201404 201405 201406 201407 201408 201409 201410 201411 201412
[21] 201501 201502 201503 201504 201505 201506 201507 201508 201509 201510
[31] 201511
Alternatively, you might prefer to keep s, a yearmon object, which prints like this but sorts correctly and can be used in plots:
> s
[1] "May 2013" "Jun 2013" "Jul 2013" "Aug 2013" "Sep 2013" "Oct 2013"
[7] "Nov 2013" "Dec 2013" "Jan 2014" "Feb 2014" "Mar 2014" "Apr 2014"
[13] "May 2014" "Jun 2014" "Jul 2014" "Aug 2014" "Sep 2014" "Oct 2014"
[19] "Nov 2014" "Dec 2014" "Jan 2015" "Feb 2015" "Mar 2015" "Apr 2015"
[25] "May 2015" "Jun 2015" "Jul 2015" "Aug 2015" "Sep 2015" "Oct 2015"
[31] "Nov 2015"
For example this works:
plot(seq(31) ~ s)
I know this is an old post, but in case someone has a similar issue to mine, I hope this helps. If you're looking for the months between two dates, let's say
d1 = as.Date("2022-01-25")
d2 = as.Date("2022-02-13")
seq(as.Date(d1), as.Date(d2), by = "month")
[1] "2022-01-25"
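It does not return both January and February as desired. This is because seq() steps forward one calendar month at a time from d1, so the next value, 2022-02-25, falls after d2 and is dropped; the day portion of the first date needs to be the same as, or earlier than, that of the second.
What I opted to do was change the first date to the first day of its month: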
d1 = paste0(substr(d1, 1,7), "-01")
seq(as.Date(d1), as.Date(d2), by = "month")
[1] "2022-01-01" "2022-02-01"
This way you will always get the year months that you desire but not necessarily the day portion.
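The same idea in a sketch that keeps d1 a Date throughout, using format() to snap it back to the first of its month instead of substr() and paste0(). The names start and months_between are illustrative; lubridate::floor_date(d1, "month") is another way to do the snapping if that package is loaded.

```r
d1 <- as.Date("2022-01-25")
d2 <- as.Date("2022-02-13")

# Snap d1 back to the first day of its month
start <- as.Date(format(d1, "%Y-%m-01"))

# Every month from start up to d2, shown as year-month
months_between <- seq(start, d2, by = "month")
format(months_between, "%Y-%m")
```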
The question is as in the title:
The date data I got is in a format like "Sat Mar 17 11:27:57 +0000 2012".
How can I convert it into an R date?
You just need to specify the correct format (as documented in strptime):
fmt <- "%a %b %d %H:%M:%S %z %Y"
# POSIXct
as.POSIXct("Sat Mar 17 11:27:57 +0000 2012", format=fmt, tz="UTC")
[1] "2012-03-17 11:27:57 UTC"
# POSIXlt
strptime("Sat Mar 17 11:27:57 +0000 2012", format=fmt, tz="UTC")
[1] "2012-03-17 16:27:57 UTC"
I've downloaded tweets in json format, converted it into csv, and read it into R. The existing time stamps are in factor format as shown below. How should I convert it into a timestamp that can be plotted against?
[1] Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014
[4] Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014 Fri May 09 07:55:12 +0000 2014
516 Levels: Fri May 09 07:55:12 +0000 2014 ... Fri May 09 09:15:07 +0000 2014
I think your question has already been answered: see "Convert Twitter Timestamp in R".
But if you want something simpler, you can use the twitteR library.
> tweets <- userTimeline("BarackObama",n=100)
> df <- do.call("rbind",lapply(tweets, as.data.frame))
> names(df)
[1] "text" "favorited" "favoriteCount" "replyToSN" "created" "truncated"
[7] "replyToSID" "id" "replyToUID" "statusSource" "screenName" "retweetCount"
[13] "isRetweet" "retweeted" "longitude" "latitude"
We can then plot the created status dates directly.
You can remove the unnecessary parts of the string before applying as.POSIXct. This can be done with gsub:
x <- as.factor(c("Fri May 09 07:55:12 +0000 2014",
"Fri May 09 07:55:12 +0000 2014"))
as.POSIXct(gsub("^.+? | \\+\\d{4}","", x),
format = "%b %d %X %Y")
# [1] "2014-05-09 07:55:12 CEST" "2014-05-09 07:55:12 CEST"
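A variant of that answer that keeps the "+0000" and reads it with %z, so the result is anchored to UTC regardless of the session time zone, followed by an hourly count that could be passed to a bar plot. It assumes English day and month names; ts and counts are illustrative names, and the second timestamp is taken from the last factor level shown in the question.

```r
x <- as.factor(c("Fri May 09 07:55:12 +0000 2014",
                 "Fri May 09 09:15:07 +0000 2014"))

# %a and %b depend on the locale; assume English names
invisible(Sys.setlocale("LC_TIME", "C"))

# Convert the factor to character first, then let %z read the offset
ts <- as.POSIXct(as.character(x), format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
format(ts, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Tweets per hour, e.g. as input to barplot(counts)
counts <- table(cut(ts, "hour"))
counts
```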