Subsetting results from sapply - r

After I use sapply, I get a list, and I would like to access individual elements of those lists. So far, I have:
large.list <- sapply(1:length(visit_num), function(x)
seq(enter.shift.want[x], to= exit.prime[x], by= 'hour'))
where enter.shift.want and exit.prime are vectors of dates.
head(large.list, 2)
[[1]]
[1] "1982-05-17 13:00:00 PDT" "1982-05-17 14:00:00 PDT" "1982-05-17 15:00:00 PDT"
[4] "1982-05-17 16:00:00 PDT" "1982-05-17 17:00:00 PDT" "1982-05-17 18:00:00 PDT"
[7] "1982-05-17 19:00:00 PDT" "1982-05-17 20:00:00 PDT" "1982-05-17 21:00:00 PDT"
[10] "1982-05-17 22:00:00 PDT"
[[2]]
[1] "1982-07-14 13:00:00 PDT" "1982-07-14 14:00:00 PDT" "1982-07-14 15:00:00 PDT"
[4] "1982-07-14 16:00:00 PDT" "1982-07-14 17:00:00 PDT" "1982-07-14 18:00:00 PDT"
[7] "1982-07-14 19:00:00 PDT" "1982-07-14 20:00:00 PDT" "1982-07-14 21:00:00 PDT"
[10] "1982-07-14 22:00:00 PDT"
I would like to have large.list[1] as a vector of dates/time.
Then I would like to do
large.list[1]<=enter.shift.want[1]
and get a vector of TRUE and FALSE results. Then I would like to generalize and do
large.list[n] <= enter.shift.want[n] for each n in 1:length(visit_num), and add up the TRUE values.
Thanks in advance.

If enter.shift.want is a list or a vector with the same number of elements as large.list, here is one way to apply the comparison to the whole list.
res <- Map(`<=`, large.list, enter.shift.want)
res1 <- Map(`<=`, large.list, enter.shift.want1)
To get the total number of TRUE per list element
colSums(do.call(cbind, res))
#[1] 3 3
Or
sapply(res, sum)
#[1] 3 3
sapply(res1,sum)
#[1] 3 7
data
large.list <- list(structure(c(390488400, 390492000, 390495600, 390499200,
390502800, 390506400, 390510000, 390513600, 390517200, 390520800
), class = c("POSIXct", "POSIXt"), tzone = "PDT"), structure(c(395499600,
395503200, 395506800, 395510400, 395514000, 395517600, 395521200,
395524800, 395528400, 395532000), class = c("POSIXct", "POSIXt"
), tzone = "PDT"))
v1 <- c('1982-05-17 00:00:00', '1982-07-14 00:00:00')
enter.shift.want <- lapply(v1, function(x) seq(as.POSIXct(x, tz='PDT'),
length.out=10, by='3 hour'))
enter.shift.want1 <- as.POSIXct(c('1982-05-17 15:00:00',
'1982-07-14 19:00:00'), tz='PDT')
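On the first part of the question, a hedged note: single brackets return a one-element list, while double brackets return the POSIXct vector itself, which is what a comparison needs. Below is a minimal sketch with made-up times in UTC; the names times_list and cutoffs are hypothetical and not from the original post.

```r
# Hypothetical data: two hourly sequences and one cutoff per sequence.
start <- as.POSIXct(c("1982-05-17 13:00:00", "1982-07-14 13:00:00"), tz = "UTC")
times_list <- lapply(seq_along(start), function(i) {
  seq(start[i], by = "hour", length.out = 10)
})
cutoffs <- as.POSIXct(c("1982-05-17 15:00:00", "1982-07-14 19:00:00"), tz = "UTC")

# `[` keeps the list wrapper, `[[` extracts the POSIXct vector.
class(times_list[1])
class(times_list[[1]])

# Logical vector for the first element only.
first_flags <- times_list[[1]] <= cutoffs[1]

# Count of TRUE values for every element, pairing each with its own cutoff.
counts <- vapply(seq_along(times_list), function(i) {
  sum(times_list[[i]] <= cutoffs[i])
}, integer(1))
counts
```

The explicit index loop does the same pairing as Map above; it is only spelled out to make the role of `[[` visible.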

Related

R matrix column to fill with timestamp

I'm trying to fill one matrix column with a date-time called up from another column.
B <- matrix(0, nrow(A) - 1, 3)
B[, 1] <- "Anne"
times <- as.POSIXct(tname$DT[1:955], format = "%Y-%m-%d %H:%M:%S")
B[, 2] <- times
When returning times, it lists them in the format "%Y-%m-%d %H:%M:%S",
[1] "2017-05-19 11:01:00 EDT" "2017-05-19 12:01:00 EDT" "2017-05-19 12:31:00 EDT" "2017-05-19 13:01:00 EDT"
[5] "2017-05-19 13:31:00 EDT" "2017-05-19 14:01:00 EDT" "2017-05-19 14:31:00 EDT" "2017-05-19 15:01:00 EDT"
[9] "2017-05-20 08:01:00 EDT" "2017-05-20 09:01:00 EDT" "2017-05-20 10:01:00 EDT" "2017-05-20 11:01:00 EDT" ....
However, when I print B[, 2], it shows numbers instead of dates:
[1] "1495206060" "1495209660" "1495211460" "1495213260" "1495215060" "1495216860" "1495218660" "1495220460"
[9] "1495281660" "1495285260" "1495288860" "1495292460" "1495296060" ....
How do I copy my dates and times into my matrix in the right format?
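No answer is recorded here for this question. As a hedged sketch of one likely explanation: a matrix holds a single atomic type, so assigning POSIXct values into a character matrix drops the class and stores the underlying seconds since 1970 as text. Formatting the times before the assignment, or keeping them in a data frame, avoids that. The sample times and the UTC time zone below are assumptions made for illustration.

```r
# Made-up sample times; the original data (tname$DT) is not available.
times <- as.POSIXct(c("2017-05-19 11:01:00", "2017-05-19 12:01:00"), tz = "UTC")

# Character matrix: store the times as formatted strings.
B <- matrix("", nrow = length(times), ncol = 3)
B[, 1] <- "Anne"
B[, 2] <- format(times, "%Y-%m-%d %H:%M:%S")
B

# Alternative: a data frame keeps the column as real POSIXct values.
df <- data.frame(name = "Anne", time = times)
class(df$time)
```

The matrix version only holds text, so date arithmetic needs the data frame version or a conversion back with as.POSIXct.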

Total items in a list

Can anyone tell me why I have only 1895 elements instead of 1896 (79 days x 24 hours)?
time_index <- seq(from = as.POSIXct("2017-01-02 01:00"),
to = as.POSIXct("2017-03-21 24:00"), by = "hour")
length(time_index)
# >[1] 1895
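Daylight saving time? The local clock jumps from 02:00 to 03:00 on 2017-03-12, so that hour is missing from the sequence: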
time_index[1655:1660]
[1] "2017-03-11 23:00:00 EST" "2017-03-12 00:00:00 EST"
[3] "2017-03-12 01:00:00 EST" "2017-03-12 03:00:00 EDT"
[5] "2017-03-12 04:00:00 EDT" "2017-03-12 05:00:00 EDT"
To stop this from happening, choose a time zone that has no daylight saving. Here is an example:
time_index <- seq(from = as.POSIXct("2017-01-02 01:00",tz = 'UTC'),
to = as.POSIXct("2017-03-21 24:00", tz = 'UTC'),
by = "hour")
length(time_index)
[1] 1896
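A hedged way to check this explanation is to build the same sequence in an explicit daylight-saving zone and in UTC, then look for the place where the wall-clock hour skips. The zone name "America/New_York" is an assumption (the original post used the session's local zone), and the end point is written as 2017-03-22 00:00 rather than 24:00 of the previous day.

```r
# Same range in a zone with daylight saving and in UTC.
local_index <- seq(as.POSIXct("2017-01-02 01:00", tz = "America/New_York"),
                   as.POSIXct("2017-03-22 00:00", tz = "America/New_York"),
                   by = "hour")
utc_index <- seq(as.POSIXct("2017-01-02 01:00", tz = "UTC"),
                 as.POSIXct("2017-03-22 00:00", tz = "UTC"),
                 by = "hour")

length(local_index)
length(utc_index)

# Find where the wall-clock hour jumps by two instead of one.
hours <- as.integer(format(local_index, "%H"))
skip <- which(diff(hours) == 2)
format(local_index[c(skip, skip + 1)], "%Y-%m-%d %H:%M")
```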

Create a time series by 30 minute intervals

I am trying to create a time series with 30 min intervals. I used the following command with the output also shown:
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = "hour")
"2010-02-21 12:00:00 EST" "2010-02-21 13:00:00 EST" "2010-02-21 14:00:00 EST"
When I change it to by ="min" it changes to be every minute.
How do I create a time series with every 30 minute intervals?
You can specify minutes in the by argument, and pass the time zone "UTC" as Adrian pointed out. Check ?seq.POSIXt for more details about the by argument specified as a character string:
A character string, containing one of "sec", "min", "hour", "day",
"DSTday", "week", "month", "quarter" or "year". This can optionally be
preceded by a (positive or negative) integer and a space, or followed
by "s".
ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
as.POSIXct("2017-01-02", tz = "UTC"),
by = "30 min")
head(ts)
Output
[1] "2017-01-01 00:00:00 UTC"
[2] "2017-01-01 00:30:00 UTC"
[3] "2017-01-01 01:00:00 UTC"
[4] "2017-01-01 01:30:00 UTC"
[5] "2017-01-01 02:00:00 UTC"
[6] "2017-01-01 02:30:00 UTC"
Default units are seconds. So just do 1800 seconds to get 30 minutes.
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = 1800)
ts[1:20]
[1] "2009-01-01 00:00:00 EST" "2009-01-01 00:30:00 EST" "2009-01-01 01:00:00 EST" "2009-01-01 01:30:00 EST" "2009-01-01 02:00:00 EST"
[6] "2009-01-01 02:30:00 EST" "2009-01-01 03:00:00 EST" "2009-01-01 03:30:00 EST" "2009-01-01 04:00:00 EST" "2009-01-01 04:30:00 EST"
[11] "2009-01-01 05:00:00 EST" "2009-01-01 05:30:00 EST" "2009-01-01 06:00:00 EST" "2009-01-01 06:30:00 EST" "2009-01-01 07:00:00 EST"
[16] "2009-01-01 07:30:00 EST" "2009-01-01 08:00:00 EST" "2009-01-01 08:30:00 EST" "2009-01-01 09:00:00 EST" "2009-01-01 09:30:00 EST"
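The two answers should give the same result. A small sketch, assuming a one-day range in UTC, that compares the string form, the plain number of seconds, and a difftime step (the difftime variant is an addition here, not something from the answers above):

```r
start <- as.POSIXct("2017-01-01", tz = "UTC")
end <- as.POSIXct("2017-01-02", tz = "UTC")

by_string <- seq(start, end, by = "30 min")    # character step
by_seconds <- seq(start, end, by = 1800)       # numeric step, in seconds
by_difftime <- seq(start, end, by = as.difftime(30, units = "mins"))

length(by_string)
all(by_string == by_seconds)
all(by_string == by_difftime)
```

The string form is arguably the most readable, since the unit is visible in the call.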

splicing time intervals posixct

I have the following time intervals that I would like to split into 10 equally spaced instances.
head(data)
stoptime starttime
1 2014-08-19 14:52:04 2014-08-19 15:22:04
2 2014-08-19 16:27:14 2014-08-19 17:17:33
3 2014-08-19 18:05:59 2014-08-19 18:09:12
4 2014-08-19 17:25:35 2014-08-19 17:29:06
5 2014-08-19 18:23:29 2014-08-19 18:57:34
6 2014-08-19 07:39:15 2014-08-19 07:48:49
I am able to take the midpoint using this code
one_day$midtime = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) /2 , origin = '1970-01-01')
However, when I try to extend this code to ten equally spaced instances, it goes completely wrong. Why is this happening and how can I fix this code?
one_day$first = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .1 , origin = '1970-01-01')
one_day$second = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .2, origin = '1970-01-01')
one_day$thrid = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .3, origin = '1970-01-01')
one_day$fourth = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .4, origin = '1970-01-01')
one_day$fifth = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .5, origin = '1970-01-01')
one_day$sixth = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .6, origin = '1970-01-01')
one_day$seventh = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .7, origin = '1970-01-01')
one_day$eighth = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .8, origin = '1970-01-01')
one_day$ninth = as.POSIXct((as.numeric(one_day$stoptime) + as.numeric(one_day$starttime)) * .9, origin = '1970-01-01')
head(one_day)
diff.time stoptime starttime midtime first
1 1800 2014-08-19 14:52:04 2014-08-19 15:22:04 2014-08-19 15:07:04 1978-12-05 03:49:24
2 3019 2014-08-19 16:27:14 2014-08-19 17:17:33 2014-08-19 16:52:23 1978-12-05 04:10:28
3 193 2014-08-19 18:05:59 2014-08-19 18:09:12 2014-08-19 18:07:35 1978-12-05 04:25:31
4 211 2014-08-19 17:25:35 2014-08-19 17:29:06 2014-08-19 17:27:20 1978-12-05 04:17:28
5 2045 2014-08-19 18:23:29 2014-08-19 18:57:34 2014-08-19 18:40:31 1978-12-05 04:32:06
6 574 2014-08-19 07:39:15 2014-08-19 07:48:49 2014-08-19 07:44:02 1978-12-05 02:20:48
second thrid fourth fifth sixth
1 1987-11-08 12:38:49 1996-10-11 21:28:14 2005-09-15 06:17:39 2014-08-19 15:07:04 2023-07-23 23:56:28
2 1987-11-08 13:20:57 1996-10-11 22:31:26 2005-09-15 07:41:54 2014-08-19 16:52:23 2023-07-24 02:02:52
3 1987-11-08 13:51:02 1996-10-11 23:16:33 2005-09-15 08:42:04 2014-08-19 18:07:35 2023-07-24 03:33:06
4 1987-11-08 13:34:56 1996-10-11 22:52:24 2005-09-15 08:09:52 2014-08-19 17:27:20 2023-07-24 02:44:48
5 1987-11-08 14:04:12 1996-10-11 23:36:18 2005-09-15 09:08:25 2014-08-19 18:40:31 2023-07-24 04:12:37
6 1987-11-08 09:41:36 1996-10-11 17:02:25 2005-09-15 00:23:13 2014-08-19 07:44:02 2023-07-23 15:04:50
seventh eighth ninth
1 2032-06-26 08:45:53 2041-05-30 17:35:18 2050-05-04 02:24:43
2 2032-06-26 11:13:20 2041-05-30 20:23:49 2050-05-04 05:34:18
3 2032-06-26 12:58:37 2041-05-30 22:24:08 2050-05-04 07:49:39
4 2032-06-26 12:02:16 2041-05-30 21:19:44 2050-05-04 06:37:12
5 2032-06-26 13:44:44 2041-05-30 23:16:50 2050-05-04 08:48:56
6 2032-06-25 22:25:38 2041-05-30 05:46:27 2050-05-03 13:07:15
dput(data1)
structure(list(stoptime = structure(c(1408477924, 1408483634,
1408489559, 1408487135, 1408490609, 1408451955, 1408452727, 1408498708,
1408486644, 1408454996), class = c("POSIXct", "POSIXt"), tzone = "EST"),
starttime = structure(c(1408479724, 1408486653, 1408489752,
1408487346, 1408492654, 1408452529, 1408455826, 1408501153,
1408488389, 1408458514), class = c("POSIXct", "POSIXt"), tzone = "EST")), .Names = c("stoptime",
"starttime"), row.names = c(NA, 10L), class = "data.frame")
1: Seq
First of all, you have to convert the columns of your data frame to the POSIXct or POSIXlt class, because the base R function seq has a method for objects of those classes.
See this simplified code:
library(lubridate)
a <- "2014-08-19 14:52:04"
b <- "2014-08-19 15:22:04"
a <- ymd_hms(a)
b <- ymd_hms(b)
a
[1] "2014-08-19 14:52:04 UTC"
b
[1] "2014-08-19 15:22:04 UTC"
Then you just use the seq function and set the length.out argument to the number of points you want. seq will then create a sequence of equally spaced values from the start to the end.
seq(a, b, length.out = 10)
[1] "2014-08-19 14:52:04 UTC" "2014-08-19 14:55:24 UTC"
[3] "2014-08-19 14:58:44 UTC" "2014-08-19 15:02:04 UTC"
[5] "2014-08-19 15:05:24 UTC" "2014-08-19 15:08:44 UTC"
[7] "2014-08-19 15:12:04 UTC" "2014-08-19 15:15:24 UTC"
[9] "2014-08-19 15:18:44 UTC" "2014-08-19 15:22:04 UTC"
2: Vectorize step 1
Now that you know how to achieve your goal for one interval, it is just a matter of vectorizing it over all rows.
I bet there are several approaches; here is one. With the mapply function you can loop through the elements, pairing the first element of the first object with the first element of the second object, and so on. Keep in mind that arguments that stay fixed have to be passed through the MoreArgs argument.
Here is the code:
mapply(seq,
to = data1$starttime,
from = data1$stoptime,
MoreArgs = list(length.out = 10),
SIMPLIFY = FALSE)
This produces a list containing the desired values, though sadly not in the desired format:
[[1]]
[1] "2014-08-19 14:52:04 UTC" "2014-08-19 14:55:24 UTC"
[3] "2014-08-19 14:58:44 UTC" "2014-08-19 15:02:04 UTC"
[5] "2014-08-19 15:05:24 UTC" "2014-08-19 15:08:44 UTC"
[7] "2014-08-19 15:12:04 UTC" "2014-08-19 15:15:24 UTC"
[9] "2014-08-19 15:18:44 UTC" "2014-08-19 15:22:04 UTC"
[[2]]
[1] "2014-08-19 16:27:14 UTC" "2014-08-19 16:32:49 UTC"
[3] "2014-08-19 16:38:24 UTC" "2014-08-19 16:44:00 UTC"
[5] "2014-08-19 16:49:35 UTC" "2014-08-19 16:55:11 UTC"
[7] "2014-08-19 17:00:46 UTC" "2014-08-19 17:06:22 UTC"
[9] "2014-08-19 17:11:57 UTC" "2014-08-19 17:17:33 UTC"
[[3]]
[1] "2014-08-19 18:05:59 UTC" "2014-08-19 18:06:20 UTC"
[3] "2014-08-19 18:06:41 UTC" "2014-08-19 18:07:03 UTC"
[5] "2014-08-19 18:07:24 UTC" "2014-08-19 18:07:46 UTC"
[7] "2014-08-19 18:08:07 UTC" "2014-08-19 18:08:29 UTC"
[9] "2014-08-19 18:08:50 UTC" "2014-08-19 18:09:12 UTC"
[[4]]
[1] "2014-08-19 17:25:35 UTC" "2014-08-19 17:25:58 UTC"
[3] "2014-08-19 17:26:21 UTC" "2014-08-19 17:26:45 UTC"
[5] "2014-08-19 17:27:08 UTC" "2014-08-19 17:27:32 UTC"
[7] "2014-08-19 17:27:55 UTC" "2014-08-19 17:28:19 UTC"
[9] "2014-08-19 17:28:42 UTC" "2014-08-19 17:29:06 UTC"
[[5]]
[1] "2014-08-19 18:23:29 UTC" "2014-08-19 18:27:16 UTC"
[3] "2014-08-19 18:31:03 UTC" "2014-08-19 18:34:50 UTC"
[5] "2014-08-19 18:38:37 UTC" "2014-08-19 18:42:25 UTC"
[7] "2014-08-19 18:46:12 UTC" "2014-08-19 18:49:59 UTC"
[9] "2014-08-19 18:53:46 UTC" "2014-08-19 18:57:34 UTC"
[[6]]
[1] "2014-08-19 07:39:15 UTC" "2014-08-19 07:40:18 UTC"
[3] "2014-08-19 07:41:22 UTC" "2014-08-19 07:42:26 UTC"
[5] "2014-08-19 07:43:30 UTC" "2014-08-19 07:44:33 UTC"
[7] "2014-08-19 07:45:37 UTC" "2014-08-19 07:46:41 UTC"
[9] "2014-08-19 07:47:45 UTC" "2014-08-19 07:48:49 UTC"
[[7]]
[1] "2014-08-19 07:52:07 UTC" "2014-08-19 07:57:51 UTC"
[3] "2014-08-19 08:03:35 UTC" "2014-08-19 08:09:20 UTC"
[5] "2014-08-19 08:15:04 UTC" "2014-08-19 08:20:48 UTC"
[7] "2014-08-19 08:26:33 UTC" "2014-08-19 08:32:17 UTC"
[9] "2014-08-19 08:38:01 UTC" "2014-08-19 08:43:46 UTC"
[[8]]
[1] "2014-08-19 20:38:28 UTC" "2014-08-19 20:42:59 UTC"
[3] "2014-08-19 20:47:31 UTC" "2014-08-19 20:52:03 UTC"
[5] "2014-08-19 20:56:34 UTC" "2014-08-19 21:01:06 UTC"
[7] "2014-08-19 21:05:38 UTC" "2014-08-19 21:10:09 UTC"
[9] "2014-08-19 21:14:41 UTC" "2014-08-19 21:19:13 UTC"
[[9]]
[1] "2014-08-19 17:17:24 UTC" "2014-08-19 17:20:37 UTC"
[3] "2014-08-19 17:23:51 UTC" "2014-08-19 17:27:05 UTC"
[5] "2014-08-19 17:30:19 UTC" "2014-08-19 17:33:33 UTC"
[7] "2014-08-19 17:36:47 UTC" "2014-08-19 17:40:01 UTC"
[9] "2014-08-19 17:43:15 UTC" "2014-08-19 17:46:29 UTC"
[[10]]
[1] "2014-08-19 08:29:56 UTC" "2014-08-19 08:36:26 UTC"
[3] "2014-08-19 08:42:57 UTC" "2014-08-19 08:49:28 UTC"
[5] "2014-08-19 08:55:59 UTC" "2014-08-19 09:02:30 UTC"
[7] "2014-08-19 09:09:01 UTC" "2014-08-19 09:15:32 UTC"
[9] "2014-08-19 09:22:03 UTC" "2014-08-19 09:28:34 UTC"
At this point I guess it is just a matter of some data manipulation, but I can't figure out a way (for now).
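One possible way to finish that manipulation, offered as a sketch rather than the answerer's method: convert each sequence to plain seconds, bind the rows into a matrix, and rebuild one POSIXct column per position. The two sample intervals, the UTC time zone, and the column names t1 to t10 are assumptions for illustration.

```r
# Two sample intervals taken from the question, assumed to be in UTC.
stoptime <- as.POSIXct(c("2014-08-19 14:52:04", "2014-08-19 16:27:14"), tz = "UTC")
starttime <- as.POSIXct(c("2014-08-19 15:22:04", "2014-08-19 17:17:33"), tz = "UTC")

# One sequence of 10 points per row, as in the answer above.
pts <- lapply(seq_along(stoptime), function(i) {
  seq(stoptime[i], starttime[i], length.out = 10)
})

# Bind as numeric seconds: one row per interval, one column per point.
secs <- do.call(rbind, lapply(pts, as.numeric))

# Restore the POSIXct class column by column.
cols <- lapply(seq_len(ncol(secs)), function(j) {
  as.POSIXct(secs[, j], origin = "1970-01-01", tz = "UTC")
})
names(cols) <- paste0("t", seq_len(ncol(secs)))
wide <- as.data.frame(cols)
str(wide)
```

The detour through numeric seconds is needed because rbind on POSIXct vectors would drop the class anyway.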
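You can't just multiply by 0.1. The code in the question scales the sum of the two timestamps (seconds since 1970), which is why the results land in 1978 and later decades. Instead, add 0.1 of the time interval to the earlier time. For example: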
one_day$firstexample = one_day$stoptime + 0.1*difftime(one_day$starttime, one_day$stoptime, units = "mins")
As a side note, if you find yourself typing out very similar things multiple times, that's usually a sign that you should turn it into a function.
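As a sketch of what such a function could look like (the name point_at and the sample times are hypothetical), the helper below adds a fraction of the interval length to the earlier time and is then applied for each fraction:

```r
# Hypothetical helper: the time `frac` of the way from `start` to `end`.
point_at <- function(start, end, frac) {
  start + frac * as.numeric(difftime(end, start, units = "secs"))
}

# Sample interval from the question, assumed to be in UTC.
a <- as.POSIXct("2014-08-19 14:52:04", tz = "UTC")
b <- as.POSIXct("2014-08-19 15:22:04", tz = "UTC")

point_at(a, b, 0.5)   # midpoint of the interval

# All nine interior points in one go.
fracs <- seq(0.1, 0.9, by = 0.1)
points <- lapply(fracs, function(f) point_at(a, b, f))
names(points) <- paste0("p", seq_along(fracs))
```

Because point_at is vectorized over start and end, it can also be given whole data frame columns instead of single times.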

Appending list of lists to data frame in single column in R

I am working on the code below in R to scrape information from a web page:
library(rvest)
crickbuzz <- read_html(httr::GET("http://www.cricbuzz.com/cricket-match/live-scores"))
matches_dates <- crickbuzz %>%
html_nodes(".schedule-date:nth-child(1)")%>%
html_attr("timestamp")
matches_dates
[1] "1452268800000" "1452132000000" "1452247200000" "1452242400000" "1452327000000" "1452290400000" "1452310200000" "1452310200000" "1452310200000"
[10] "1452310200000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452150000000" "1452153600000" "1452153600000"
Now I am converting it to a proper date and time format:
dates <- lapply(X = matches_dates, function(timestamp_match){
(as.POSIXct(as.numeric(timestamp_match)/1000, origin="1970-01-01")) })
Now I have the dates in the form below:
dates
[[1]]
[1] "2016-01-10 07:30:00 IST"
[[2]]
[1] "2016-01-10 21:30:00 IST"
[[3]]
[1] "2016-01-09 12:00:00 IST"
[[4]]
[1] "2016-01-10 13:55:00 IST"
[[5]]
[1] "2016-01-10 10:50:00 IST"
[[6]]
[1] "2016-01-07 12:30:00 IST"
[[7]]
[1] "2016-01-07 13:30:00 IST"
[[8]]
[1] "2016-01-10 09:00:00 IST"
[[9]]
[1] "2016-01-10 09:00:00 IST"
[[10]]
[1] "2016-01-10 09:00:00 IST"
[[11]]
[1] "2016-01-10 09:00:00 IST"
[[12]]
[1] "2016-01-10 09:00:00 IST"
[[13]]
[1] "2016-01-10 13:00:00 IST"
[[14]]
[1] "2016-01-10 13:00:00 IST"
[[15]]
[1] "2016-01-10 13:00:00 IST"
[[16]]
[1] "2016-01-10 13:00:00 IST"
[[17]]
[1] "2016-01-10 03:30:00 IST"
[[18]]
[1] "2016-01-10 03:30:00 IST"
Now I am appending this to one column of a data frame:
matches_info[,"Date And Time"] <- dates
But only the first date gets copied over the whole column, with the warning below.
Warning message:
In `[<-.data.frame`(`*tmp*`, , "Date And Time", value = list(1452391200, :
provided 18 variables to replace 1 variables
And if I use unlist(dates), it gives me timestamps again. How can I extract the date and time?
Try do.call(c, dates) instead of unlist(dates), so that the list elements stay POSIXct instead of being converted to numeric:
matches_date <- c("1452268800000", "1452132000000")
dates <- lapply(X = matches_date , function(timestamp_match){
(as.POSIXct(as.numeric(timestamp_match)/1000, origin="1970-01-01")) })
do.call(c, dates)
# [1] "2016-01-08 17:00:00 CET" "2016-01-07 03:00:00 CET"
matches_info[,"Date And Time"] <- do.call(c, dates)
or simply
matches_date <- c("1452268800000", "1452132000000")
matches_info[,"Date And Time"] <- as.POSIXct(as.numeric(matches_date)/1000, origin="1970-01-01")
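To make the difference visible, here is a minimal sketch comparing the two calls on the same two timestamps. The explicit tz = "UTC" is an assumption added so the result does not depend on the local time zone.

```r
stamps <- c("1452268800000", "1452132000000")
dates <- lapply(stamps, function(s) {
  as.POSIXct(as.numeric(s) / 1000, origin = "1970-01-01", tz = "UTC")
})

flat_numeric <- unlist(dates)    # class is dropped, plain seconds remain
flat_times <- do.call(c, dates)  # c() has a POSIXct method, class is kept

class(flat_numeric)
class(flat_times)

# If unlist() was already used, the class can be restored afterwards.
restored <- as.POSIXct(flat_numeric, origin = "1970-01-01", tz = "UTC")
format(restored, "%Y-%m-%d %H:%M")
```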
