I have data of the form:
[1] "Mon Feb 01 09:11:55 +0000 2016" "Mon Feb 01 09:12:11 +0000 2016" ""
[4] "Mon Feb 01 09:14:25 +0000 2016" "" "Mon Feb 01 09:15:40 +0000 2016"
and I want to plot it using R.
I want an hourly plot of counts, so that all timestamps between 9 and 10 AM are counted in one bucket, and so on. The data spans several days, but the date is unimportant; only the hour matters. I might also want to change the bucket width from an hour to, say, 30 minutes.
I've tried various things but I'm a little out of my depth and would be very grateful for a few basic steps to get it to work.
I've tried:
str <- strptime(dt, "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
# head(str,3)
( dt.gmt <- as.POSIXct(str, tz = "GMT") )
format(dt.gmt, tz = "EST", usetz = TRUE)
hms <- format(dt.gmt , format = "%H:%M:%S")
hms<-as.numeric(hms)
head(hms,3)
hms <- table(cut(hms, breaks="hour"))
which gives the error:
Error in breaks + 1 : non-numeric argument to binary operator
I've also tried:
aggdata <-aggregate(hms, by=(hms), FUN=mean, na.rm=TRUE)
which gives:
Error in aggregate.data.frame(as.data.frame(x), ...) : 'by' must be a list
OK, I just tried this; maybe it can help you:
dt <- c("Mon Feb 01 09:11:55 +0000 2016", "Mon Feb 01 10:12:11 +0000 2016",
        "Mon Feb 01 09:21:55 +0000 2016")
df <- data.frame('time' = dt,
                 'id' = c(1, 3, 2))
# drop the leading weekday and the "+0000" offset, then parse what is left
df$time <- as.POSIXct(gsub("^.+? | \\+\\d{4}", "", df$time),
                      format = "%B %d %X %Y")
df$time <- as.POSIXlt(df$time)
df$hour <- format(df$time, format = '%H')
df
# count the rows that fall in each hour
pivot <- aggregate(df$id, by = list(df$hour), FUN = length)
pivot
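The answer above counts rows per hour but stops short of the plot and the 30-minute option the question asks about. A minimal base R sketch of both follows. It assumes an English LC_TIME locale (so that %a and %b match "Mon" and "Feb") and that every offset is +0000; bucket_counts and width_min are hypothetical names introduced for illustration, and the last sample timestamp is made up to give a second bucket.

```r
# Sample input in the question's format; empty strings stand for missing values.
dt <- c("Mon Feb 01 09:11:55 +0000 2016", "Mon Feb 01 09:12:11 +0000 2016", "",
        "Mon Feb 01 09:14:25 +0000 2016", "", "Tue Feb 02 10:45:40 +0000 2016")

# Empty strings parse to NA, so drop them after parsing.
parsed <- as.POSIXct(dt, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
parsed <- parsed[!is.na(parsed)]

# Count timestamps per time-of-day bucket, ignoring the date.
# width_min is the bucket width in minutes and should divide 1440 evenly.
bucket_counts <- function(times, width_min = 60) {
  lt <- as.POSIXlt(times, tz = "UTC")
  mins <- lt$hour * 60 + lt$min                    # minutes since midnight
  starts <- seq(0, 24 * 60 - width_min, by = width_min)
  labels <- sprintf("%02d:%02d", starts %/% 60, starts %% 60)
  # A factor with all levels keeps the empty buckets in the table.
  table(factor(labels[mins %/% width_min + 1], levels = labels))
}

hourly <- bucket_counts(parsed)
half_hourly <- bucket_counts(parsed, width_min = 30)

barplot(hourly, las = 2, xlab = "Hour of day (UTC)", ylab = "Count")
```

Keeping every bucket as a factor level means hours with no observations still show up as zero-height bars, which is usually what an hour-of-day plot needs.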
Hey guys, I have a date stored as a string like "Wed May 23 22:58:46 +0000 2019" and I need to change it into the format %m/%d/%Y %I:%M:%S
My code looks like this:
x <- "Wed May 01 23:59:45 +0000 2019"
xx <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y")
but it does not work...
Here is a step-by-step solution that combines stringr with lubridate, wrapped in a function. In essence, we use the stringr functions to bring the string x into a form that lubridate's mdy_hms function can parse. Note: this solution is not elegant!
x <- "Wed May 01 23:59:45 +0000 2019"
library(lubridate)
library(stringr)
special_datetime_function <- function(x){
  x1 <- str_remove(x, "\\d{2}\\:\\d{2}\\:\\d{2}\\s\\+\\d+")
  x2 <- str_extract(x, "\\d{2}\\:\\d{2}\\:\\d{2}")
  y <- str_c(x1, x2, sep = " ")
  y1 <- str_squish(str_replace(y, "^\\S* ", ""))
  mdy_hms(y1)
}
special_datetime_function(x)
#[1] "2019-05-01 23:59:45 UTC"
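For comparison, base R alone may be enough here. The sketch below assumes an English LC_TIME locale; if the session's day and month names are in another language, the parse returns NA, which is one possible reason the attempt in the question did not work. The variable names are illustrative only.

```r
x <- "Wed May 01 23:59:45 +0000 2019"

# Parse, honouring the +0000 offset through %z, and keep the result in UTC.
parsed <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Re-format with a 12-hour clock, as requested in the question.
out <- format(parsed, "%m/%d/%Y %I:%M:%S")
out
# [1] "05/01/2019 11:59:45"

# %I is ambiguous without an AM/PM marker; %p adds one.
format(parsed, "%m/%d/%Y %I:%M:%S %p")
```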
[Fri Aug 07, 2020 05:12 UTC]
I have this date format in a column. How can I modify it to be 08, 07,2020 05:12?
Also, how can I remove UTC from all columns?
Check ?strptime for the various format options. First convert the data to POSIXct; you can then use format to get it in any format that you want.
x <- 'Fri Aug 07, 2020 05:12 UTC'
x1 <- as.POSIXct(x, format = '%a %b %d, %Y %H:%M UTC', tz = 'UTC')
x1
#[1] "2020-08-07 05:12:00 UTC"
format(x1, '%m,%d,%Y %H:%M')
#[1] "08,07,2020 05:12"
If we want to apply this to multiple columns we can use lapply. For example, for the first 4000 columns of a data frame called df we can do:
cols <- 1:4000
df[cols] <- lapply(df[cols], function(x) format(as.POSIXct(x,
format = '%a %b %d, %Y %H:%M UTC', tz = 'UTC'), '%m,%d,%Y %H:%M'))
Currently, my dataset has a time variable (factor) in the following format:
weekday month day hour min seconds +0000 year
I don't know what the "+0000" field is but all observations have this. For example:
"Tues Feb 02 11:05:21 +0000 2018"
"Mon Jun 12 06:21:50 +0000 2017"
"Wed Aug 01 11:24:08 +0000 2018"
I want to convert these values to POSIXlt or POSIXct objects (year-month-day hour:min:sec) and make them numeric. Currently, using as.numeric(as.character(time-variable)) outputs incorrect values.
Thank you for the great responses! I really appreciate it.
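I'm not sure how to reproduce the conversion from factor to character, but starting from a character string this code should work:
# split on spaces into: weekday, month, day, time, offset, year
parts <- unlist(strsplit(as.character("Tues Feb 02 11:05:21 +0000 2018"), " "))
# the +0000 offset means the time is in UTC, so parse it as such
strptime(paste(parts[6], parts[2], parts[3], parts[4]), format = '%Y %b %d %H:%M:%S', tz = 'UTC')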
PS: More on date formats and conversion: https://www.stat.berkeley.edu/~s133/dates.html
For this problem you can get by without using lubridate. First, to extract individual dates we can use regmatches and gregexpr:
date_char <- 'Tue Feb 02 11:05:21 +0000 2018 Mon Jun 12 06:21:50 +0000 2017'
ptrn <- '([[:alpha:]]{3} [[:alpha:]]{3} [[:digit:]]{2} [[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2} \\+[[:digit:]]{4} [[:digit:]]{4})'
date_vec <- unlist( regmatches(date_char, gregexpr(ptrn, date_char)))
> date_vec
[1] "Tue Feb 02 11:05:21 +0000 2018" "Mon Jun 12 06:21:50 +0000 2017"
You can learn more about regular expressions here.
In the above example the +0000 field is the offset from UTC, written as a sign followed by hours and minutes (hhmm), e.g. it would be -0500 for the EST time zone. To convert to an R date-time object:
> as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC')
[1] "2018-02-02 11:05:21 UTC" "2017-06-12 06:21:50 UTC"
which is the desired output. The formats can be found here or you can use lubridate::guess_formats(). If you don't specify the tz, you'll get the output in your system's time zone (e.g. for me that would be EST). Since the offset is specified in the format, R correctly carries out the conversion.
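To see the offset conversion at work, here is a small sketch with two made-up strings that describe the same instant, one written with +0000 and one with -0500. It assumes an English LC_TIME locale; the strings and variable names are illustrative only.

```r
same_instant <- c("Fri Feb 02 11:05:21 +0000 2018",
                  "Fri Feb 02 06:05:21 -0500 2018")

# %z reads the offset, so both strings are shifted onto the same UTC time.
parsed <- as.POSIXct(same_instant, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

utc_text <- format(parsed, "%Y-%m-%d %H:%M:%S")
utc_text
# [1] "2018-02-02 11:05:21" "2018-02-02 11:05:21"
```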
To get numeric values, the following works:
> as.numeric(as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC'))
[1] 1517569521 1497248510
Note: this is based on uniform string structure. In the OP there was Tues instead of Tue which wouldn't work. The above example is based on the three-letter abbreviation which is the standard reporting format.
If however, your data is a mix of different formats, you'd have to extract individual time strings (customized regexes, of course), then use lubridate::guess_formats() to get the formats and then use those to carry out the conversion.
Hope this is helpful!!
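Two details from this thread can be sketched together: the column is a factor, and some rows use "Tues" rather than "Tue". Calling as.numeric() directly on a factor returns its internal level codes, not times, so the values have to go through as.character() and a date parser first. One possible workaround for the weekday, shown here under the assumption of an English LC_TIME locale for the month names, is to drop the weekday token before parsing, since the date is fully determined without it. The variable names are hypothetical.

```r
raw <- factor(c("Tues Feb 02 11:05:21 +0000 2018",
                "Mon Jun 12 06:21:50 +0000 2017"))

# as.numeric() on a factor gives level codes, not timestamps.
codes <- as.numeric(raw)

# Drop the first whitespace-delimited token (the weekday), whatever its spelling.
no_weekday <- sub("^\\S+\\s+", "", as.character(raw))

parsed <- as.POSIXct(no_weekday, format = "%b %d %H:%M:%S %z %Y", tz = "UTC")
secs <- as.numeric(parsed)   # seconds since 1970-01-01 00:00:00 UTC
secs
# [1] 1517569521 1497248510
```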
I tried ISOdatetime() but it's not working.
error: argument "min" is missing, with no default
for example: Tue Jan 31 17:38:10 +0000 2017 -> 31/01/2017 or 31-01-2017
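We can use strptime:
str1 <- "Tue Jan 31 17:38:10 +0000 2017"
# tz = "UTC" keeps the calendar date from shifting with the local time zone
format(strptime(str1, format = "%a %b %d %H:%M:%OS %z %Y", tz = "UTC"), "%d/%m/%Y")
#[1] "31/01/2017"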
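As for the error in the question: ISOdatetime() expects the year, month, day, hour, minute and second as separate numeric arguments, so handing it the whole string leaves min (and the rest) missing, which is most likely what triggered the message. A sketch of how it could be used on this input, with month.abb doing the month lookup so that no locale setting is involved; the variable names are illustrative only.

```r
str1 <- "Tue Jan 31 17:38:10 +0000 2017"

# Tokens: weekday, month, day, time, offset, year
parts <- strsplit(str1, "\\s+")[[1]]
hms <- as.integer(strsplit(parts[4], ":")[[1]])

stamp <- ISOdatetime(
  year  = as.integer(parts[6]),
  month = match(parts[2], month.abb),  # "Jan" -> 1
  day   = as.integer(parts[3]),
  hour  = hms[1], min = hms[2], sec = hms[3],
  tz    = "UTC"                        # assumes the offset is always +0000
)

slash_date <- format(stamp, "%d/%m/%Y")
dash_date  <- format(stamp, "%d-%m-%Y")
slash_date
# [1] "31/01/2017"
```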
I am trying to convert the created_at string but it returns NA
as.POSIXct("Tue Jun 07 23:27:12 +0000 2016", format="%a %b %d %H:%M:%S +0000 %Y", tz="GMT")
[1] NA
Any idea what's going wrong, seems fairly straightforward!
Conversion of dates depends on your locale. For me, this is Slovene, so your case doesn't work.
> as.POSIXct("Tue Jun 07 23:27:12 +0000 2016", format="%a %b %d %H:%M:%S +0000 %Y", tz="GMT")
[1] NA
However, if I change the date to Slovene (Tor = torek = Tuesday)
> as.POSIXct("Tor Jun 07 23:27:12 +0000 2016", format="%a %b %d %H:%M:%S +0000 %Y", tz="GMT")
[1] "2016-06-07 23:27:12 GMT"
In short, change your locale to English and you're set.
> Sys.setlocale("LC_TIME", "English")
[1] "English_United States.1252"
> as.POSIXct("Tue Jun 07 23:27:12 +0000 2016", format="%a %b %d %H:%M:%S +0000 %Y", tz="GMT")
[1] "2016-06-07 23:27:12 GMT"
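The locale name "English" is, as far as I know, only understood on Windows. A more portable sketch is to switch LC_TIME to the "C" locale, which uses English day and month abbreviations, and to restore the previous setting afterwards. parse_in_c_locale is a hypothetical helper name, not part of any package.

```r
parse_in_c_locale <- function(x, format, tz = "GMT") {
  old <- Sys.getlocale("LC_TIME")
  # Restore the caller's locale even if parsing fails.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(x, format = format, tz = tz)
}

res <- parse_in_c_locale("Tue Jun 07 23:27:12 +0000 2016",
                         "%a %b %d %H:%M:%S +0000 %Y")
res_text <- format(res, "%Y-%m-%d %H:%M:%S")
res_text
# [1] "2016-06-07 23:27:12"
```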
A solution that doesn't involve changing your locale:
library(dplyr)
library(magrittr)
# Handles one string at a time and returns the formatted time as text.
twitter_to_POSIXct <- function(x, timezone = Sys.timezone()){
  x %>%
    strsplit("\\s+") %>%
    unlist %>%
    t %>%
    as.data.frame(stringsAsFactors = FALSE) %>%
    set_colnames(c("week_day", "month_abb",
                   "day", "hour", "tz",
                   "year")) %>%
    # month.abb is a built-in English constant, so no locale is involved
    mutate(month_num = match(month_abb, month.abb)) %>%
    mutate(date_str = paste0(year, "-", month_num, "-", day, " ",
                             hour)) %>%
    # "+0000" is an offset, not a valid time zone name, so parse as UTC
    mutate(date = format(as.POSIXct(date_str, tz = "UTC"),
                         tz = timezone)) %>%
    pull(date)
}
twitter_to_POSIXct("Tue Jun 07 23:27:12 +0000 2016")
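A base R variant of the same idea, sketched so that it accepts a whole vector and reads the offset field through %z rather than discarding it. parse_created_at is a hypothetical name, and the sketch assumes every string has exactly six whitespace-separated fields with an English three-letter month.

```r
parse_created_at <- function(x) {
  # One row per input string; columns: weekday, month, day, time, offset, year
  parts <- do.call(rbind, strsplit(x, "\\s+"))
  month_num <- match(parts[, 2], month.abb)   # locale-independent lookup
  iso <- sprintf("%s-%02d-%s %s %s",
                 parts[, 6], month_num, parts[, 3], parts[, 4], parts[, 5])
  # Only numeric fields remain, so the LC_TIME locale no longer matters.
  as.POSIXct(iso, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
}

stamps <- parse_created_at(c("Tue Jun 07 23:27:12 +0000 2016",
                             "Tor Jun 07 23:27:12 +0200 2016"))
stamp_text <- format(stamps, "%Y-%m-%d %H:%M:%S")
stamp_text
# [1] "2016-06-07 23:27:12" "2016-06-07 21:27:12"
```

The weekday token is never read, so "Tue", "Tues" and "Tor" are all accepted.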