My goal is to apply the geosphere::bearing function to a very large data frame,
but because the data frame covers multiple individuals, I split it using the purrr package and the split function.
I have seen 'lists' and 'for loops' used in the past, but I have no experience with them.
Below is a fraction of my dataset. I have split the data frame by ID into a list with 43 elements. I have attached long and lat in WGS84 (the x and y columns) to the initial data frame.
ID Date Time Datetime Long Lat x y
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 -91.72272 46.35156
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885 -91.7044 46.34891
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 -91.72297 46.35134
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 -91.72298 46.35134
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 -91.7242 46.34506
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 -91.72515 46.34738
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 -91.7184 46.32236
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 -91.65361 46.34712
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266 -91.66127 46.3485
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909 -91.70303 46.35451
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361 -91.6685 46.32941
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873 -91.70263 46.35481
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883 -91.67099 46.34138
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376 -91.66324 46.34763
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948 -91.73075 46.3684
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966 -91.70413 46.35429
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232 -91.66452 46.37274
I then tried this code:
library(geosphere)
library(sf)
library(magrittr)
dis_list <- split(data, data$ID)
answer <- lapply(dis_list, function(df) {
start <- df[-1 , c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
end <- df[-nrow(df), c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
angles <-geosphere::bearing(start, end)
df$angles <- c(NA, angles)
df
})
answer
which gives the error:
Error in .pointsToMatrix(p1) :
'list' object cannot be coerced to type 'double'
A Google search for "pass sf points to geosphere bearing" turns up this GIS Stack Exchange answer, which seems to address what I would characterize as "how to extract numeric vectors from sf POINT objects": https://gis.stackexchange.com/questions/416316/compute-east-west-or-north-south-orientation-of-polylines-sf-linestring-in-r
I first worked with a single list element and then applied the lessons from @Spacedman to this task:
> st_coordinates( st_as_sf(dis_list[[1]], coords = c('x', 'y')) )
X Y
1 -91.72272 46.35156
2 -91.70440 46.34891
3 -91.72297 46.35134
4 -91.72420 46.34506
5 -91.65361 46.34712
So st_coordinates will extract the POINT geometries into a two-column numeric matrix, which can then be passed to geosphere::bearing:
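library(geosphere)
library(sf)
library(magrittr)
dis_list <- split(data, data$ID)
answer <- lapply(dis_list, function(df) {
  # rows 2..n and rows 1..n-1 as two-column lon/lat matrices
  start <- df[-1 , c("x", "y")] %>%
    st_as_sf(coords = c('x', 'y')) %>% st_coordinates
  end1 <- df[-nrow(df), c("x", "y")] %>%
    st_as_sf(coords = c('x', 'y')) %>% st_coordinates
  # bearing(p1, p2) is the direction from p1 to p2, so this gives the bearing
  # from each point back to the previous one; bearing(end1, start) would give
  # the direction of travel instead
  angles <- geosphere::bearing(start, end1)
  df$angles <- c(NA, angles)
  df
})
answer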
#------------------------
$`10_17`
ID Date Time date time Long Lat x y
1 10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 -91.72272 46.35156
2 10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409.0 5179885 -91.70440 46.34891
3 10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 -91.72297 46.35134
5 10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 -91.72420 46.34506
8 10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 -91.65361 46.34712
Datetime angles
1 4/18/2017 15:02 NA
2 4/20/2017 6:00 -78.194383
3 4/21/2017 21:02 100.694352
5 4/23/2017 12:01 7.723513
8 4/26/2017 18:02 -92.387473
$`10_24`
ID Date Time date time Long Lat x y
4 10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 -91.72298 46.35134
6 10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 -91.72515 46.34738
7 10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 -91.71840 46.32236
Datetime angles
4 4/22/2017 10:03 NA
6 4/24/2017 1:00 20.77910
7 4/25/2017 16:01 -10.58228
$`10_36`
ID Date Time date time Long Lat x y
9 10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266 -91.66127 46.34850
10 10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909 -91.70303 46.35451
11 10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361 -91.66850 46.32941
Datetime angles
9 4/27/2017 20:00 NA
10 4/29/2017 11:01 101.72602
11 4/30/2017 0:00 -43.60192
$`10_40`
ID Date Time date time Long Lat x y
12 10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873 -91.70263 46.35481
13 10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883 -91.67099 46.34138
14 10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376 -91.66324 46.34763
Datetime angles
12 4/30/2017 13:02 NA
13 5/2/2017 17:02 -58.48235
14 5/3/2017 6:01 -139.34297
$`10_88`
ID Date Time date time Long Lat x y
15 10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948 -91.73075 46.36840
16 10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966 -91.70413 46.35429
17 10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232 -91.66452 46.37274
Datetime angles
15 5/3/2017 19:02 NA
16 5/4/2017 8:01 -52.55217
17 5/4/2017 21:03 -123.91920
The help page for st_coordinates characterizes its function as "retrieve coordinates in matrix form".
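Given that the data are already in longitude and latitude form (the x and y columns; Long and Lat hold projected values, not degrees), you can simply use bearing(df[, c("x", "y")]) and distGeo(df[, c("x", "y")]) from geosphere on each of the split data frames.
Called with a single set of points, both functions work on consecutive rows, so there is no need to create start and end points.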
library(geosphere)
dfs <- split(data, data$ID)
answer <- lapply(dfs, function(df) {
  # x and y hold longitude and latitude; with one argument, each value is
  # from point i to point i + 1, and the last row gets NA
  df$distances <- distGeo(df[, c("x", "y")])
  df$bearings <- bearing(df[, c("x", "y")])
  df
})
answer
The sf package is useful for converting between coordinate systems, but with the data set above that step can be skipped. I find the geosphere package more straightforward and simpler to use.
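For comparison, here is a minimal base-R sketch of the same consecutive-point pattern without geosphere. It uses the spherical initial-bearing formula, which is an assumption on my part and not what geosphere does (as I understand it, bearing() works on the WGS84 ellipsoid by default), so values may differ slightly. The helper name `consecutive_bearing` is hypothetical; like bearing(p1) with one argument, it returns the bearing from each point to the next and NA for the last row.

```r
# Hypothetical helper: initial bearing (degrees, -180 to 180) from each
# lon/lat point to the next one, using a spherical-earth formula.
# Like geosphere::bearing(p1) with a single argument, the last value is NA.
consecutive_bearing <- function(lon, lat) {
  n <- length(lon)
  if (n < 2) return(rep(NA_real_, n))
  rad <- pi / 180
  lon1 <- lon[-n] * rad; lat1 <- lat[-n] * rad
  lon2 <- lon[-1] * rad; lat2 <- lat[-1] * rad
  dlon <- lon2 - lon1
  east  <- sin(dlon) * cos(lat2)
  north <- cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dlon)
  c(atan2(east, north) / rad, NA)
}

# Individual 10_17 from the sample (x = longitude, y = latitude)
df <- data.frame(
  x = c(-91.72272, -91.70440, -91.72297, -91.72420, -91.65361),
  y = c( 46.35156,  46.34891,  46.35134,  46.34506,  46.34712)
)
df$bearings <- consecutive_bearing(df$x, df$y)
df
```

With the split list this becomes `lapply(dfs, function(df) { df$bearings <- consecutive_bearing(df$x, df$y); df })`.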
My goal is to apply the st_distance function to a very large data frame,
but because the data frame covers multiple individuals, I split it using the purrr package and the split function.
I have seen 'lists' and 'for loops' used in the past, but I have no experience with them.
Below is a fraction of my dataset. I have split the data frame by ID into a list with 43 elements.
The st_distance code I plan to use looks something like this, if it were applied to the full data frame rather than to a list:
PART 2:
I want to do the same as explained by Dave2e, but now for geosphere::bearing
I have attached long and lat in wgs84 to the initial data frame, which now looks like this:
ID Date Time Datetime Long Lat x y
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 -91.72272 46.35156
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885 -91.7044 46.34891
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 -91.72297 46.35134
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 -91.72298 46.35134
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 -91.7242 46.34506
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 -91.72515 46.34738
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 -91.7184 46.32236
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 -91.65361 46.34712
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266 -91.66127 46.3485
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909 -91.70303 46.35451
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361 -91.6685 46.32941
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873 -91.70263 46.35481
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883 -91.67099 46.34138
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376 -91.66324 46.34763
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948 -91.73075 46.3684
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966 -91.70413 46.35429
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232 -91.66452 46.37274
I then tried a function similar to the st_distance one below, but with the coordinates changed to x and y, and it led to an error:
dis_list <- split(data, data$ID)
answer <- lapply(dis_list, function(df) {
start <- df[-1 , c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
end <- df[-nrow(df), c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
angles <-geosphere::bearing(start, end)
df$angles <- c(NA, angles)
df
})
answer
which gives the error:
Error in .pointsToMatrix(p1) :
'list' object cannot be coerced to type 'double'
Here is a basic solution. I split the original data into multiple data frames using split() and then wrapped the distance calculation in lapply().
data <- read.table(header=TRUE, text="ID Date Time Datetime time2 Long Lat
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232")
# Long and Lat are treated as projected coordinates in EPSG:32615 (WGS 84 / UTM zone 15N)
library(sf)
library(magrittr)
dfs <- split(data, data$ID)
answer <- lapply(dfs, function(df) {
  # convert to an sf object and specify the coordinate system
start <- df[-1 , c("Long", "Lat")] %>%
st_as_sf(coords = c('Long', 'Lat')) %>%
st_set_crs(32615)
end <- df[-nrow(df), c("Long", "Lat")] %>%
st_as_sf(coords = c('Long', 'Lat')) %>%
st_set_crs(32615)
#long_lat <-st_transform(start, 4326)
distances <-sf::st_distance(start, end, by_element = TRUE)
df$distances <- c(NA, distances)
df
})
answer
$`10_17`
ID Date Time Datetime time2 Long Lat distances
1 10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 NA
2 10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409.0 5179885 3777.132
3 10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 1937.282
5 10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 6201.824
8 10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 3471.400
$`10_24`
ID Date Time Datetime time2 Long Lat distances
4 10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 NA
6 10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 218.6377
7 10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 275.9153
There is probably an easier way to calculate distances between consecutive rows than creating two series of points.
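One simpler option, assuming (as the answer does) that Long and Lat are projected coordinates in metres: the distance between consecutive rows is then just the Euclidean length of the row-to-row differences, which diff() gives directly, with no sf points needed. A minimal base-R sketch; the helper name `step_dist` is hypothetical, and for the rows checked it reproduces the st_distance values printed above.

```r
# Hypothetical helper: Euclidean distance between consecutive rows of
# projected (metre-based) coordinates; the first row gets NA, matching the
# c(NA, distances) layout of the sf answer
step_dist <- function(easting, northing) {
  c(NA, sqrt(diff(easting)^2 + diff(northing)^2))
}

data <- data.frame(
  ID   = c("10_17", "10_17", "10_17", "10_24", "10_24", "10_24"),
  Long = c(379800.5, 383409.0, 383191.2, 383448.6, 383647.4, 383407.9),
  Lat  = c(5181001, 5179885, 5177960, 5179918, 5180009, 5179872)
)

# Same split/lapply pattern, one data frame per individual
answer <- lapply(split(data, data$ID), function(df) {
  df$distances <- step_dist(df$Long, df$Lat)
  df
})
answer
```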
Referenced: Converting table columns to spatial objects
OK, this is making me crazy.
I have several datasets with time values that need to be rolled up into 15-minute intervals.
I found a solution here that works beautifully on one dataset, but on the next one I tried I'm getting weird results. I have a column of character data representing dates:
BeginTime
-------------------------------
1 1/3/19 1:50 PM
2 1/3/19 1:30 PM
3 1/3/19 4:56 PM
4 1/4/19 11:23 AM
5 1/6/19 7:45 PM
6 1/7/19 10:15 PM
7 1/8/19 12:02 PM
8 1/9/19 10:43 PM
And I'm using the following code (exactly what I used on the other dataset, except for the names):
df$by15 = cut(mdy_hm(df$BeginTime), breaks="15 min")
but what I get is:
BeginTime by15
-------------------------------------------------------
1 1/3/19 1:50 PM 2019-01-03 13:36:00
2 1/3/19 1:30 PM 2019-01-03 13:21:00
3 1/3/19 4:56 PM 2019-01-03 16:51:00
4 1/4/19 11:23 AM 2019-01-04 11:21:00
5 1/6/19 7:45 PM 2019-01-06 19:36:00
6 1/7/19 10:15 PM 2019-01-07 22:06:00
7 1/8/19 12:02 PM 2019-01-08 11:51:00
8 1/9/19 10:43 PM 2019-01-09 22:36:00
9 1/10/19 11:25 AM 2019-01-10 11:21:00
Any suggestions on why I'm getting such random times instead of the 15-minute intervals I'm looking for? Like I said, this worked fine on the other data set.
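You can use the lubridate::round_date() function, which will roll up your datetime data as follows. Note that round_date() rounds to the nearest interval; use floor_date() instead if each time should go into the interval it falls in, as with cut():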
library(lubridate) # To handle datetime data
library(dplyr) # For data manipulation
# Creating dataframe
df <-
data.frame(
BeginTime = c("1/3/19 1:50 PM", "1/3/19 1:30 PM", "1/3/19 4:56 PM",
"1/4/19 11:23 AM", "1/6/19 7:45 PM", "1/7/19 10:15 PM",
"1/8/19 12:02 PM", "1/9/19 10:43 PM")
)
df %>%
  # First parse the strings as month/day/year (as mdy_hm() does in the question)
  mutate(by15 = parse_date_time(BeginTime, '%m/%d/%y %I:%M %p'),
         # Then roll up the data by rounding to the nearest 15-minute interval
         by15 = round_date(by15, "15 mins"))
#
#       BeginTime                by15
#  1/3/19 1:50 PM 2019-01-03 13:45:00
#  1/3/19 1:30 PM 2019-01-03 13:30:00
#  1/3/19 4:56 PM 2019-01-03 17:00:00
# 1/4/19 11:23 AM 2019-01-04 11:30:00
#  1/6/19 7:45 PM 2019-01-06 19:45:00
# 1/7/19 10:15 PM 2019-01-07 22:15:00
# 1/8/19 12:02 PM 2019-01-08 12:00:00
# 1/9/19 10:43 PM 2019-01-09 22:45:00
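As for why cut() gave odd times in the question: when breaks is a string such as "15 min", cut() starts its grid at the earliest timestamp in the data (truncated to the whole minute), not at a quarter hour. If the earliest BeginTime in a dataset falls at, say, :06, every bin starts at :06, :21, :36 or :51, which matches the output in the question; a dataset whose earliest time happens to sit on a quarter hour looks fine. A base-R sketch of both behaviours follows. It assumes an English or C locale for parsing AM/PM, the 13:06 reading is hypothetical, and `floor_15` is a hypothetical helper that floors to a fixed clock grid.

```r
# Month/day/year strings from the question; parsing AM/PM assumes an
# English or C locale
begin <- c("1/3/19 1:50 PM", "1/3/19 1:30 PM", "1/3/19 4:56 PM",
           "1/4/19 11:23 AM", "1/6/19 7:45 PM", "1/7/19 10:15 PM",
           "1/8/19 12:02 PM", "1/9/19 10:43 PM")
t <- as.POSIXct(begin, format = "%m/%d/%y %I:%M %p", tz = "UTC")

# Add a hypothetical earlier reading at 13:06: cut() anchors its 15-minute
# grid there, so the bins fall at :06, :21, :36 and :51
t_demo <- c(as.POSIXct("2019-01-03 13:06", tz = "UTC"), t)
cut_bins <- as.POSIXct(as.character(cut(t_demo, breaks = "15 min")), tz = "UTC")

# Fixed clock grid instead: floor the seconds since the epoch to a multiple
# of 900 s (15 min); in UTC this lines up with :00, :15, :30 and :45
floor_15 <- function(x) {
  as.POSIXct(floor(as.numeric(x) / 900) * 900, origin = "1970-01-01", tz = "UTC")
}
by15 <- floor_15(t)
data.frame(BeginTime = begin, by15 = by15)
```

For UTC times, lubridate::floor_date(x, "15 mins") should give the same fixed-grid result as floor_15.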
I have data on swimming times that I would like to plot over time. Is there a quick way to change these values from character to numeric?
I started by trying to convert the times to a POSIX date-time format, but that was not helpful, especially because I would like to do some ARIMA predictions on the data.
Here is my data
times <- c("47.45","47.69",
"47.69","47.82",
"47.84","47.92",
"47.96","48.13",
"48.16","48.16",
"48.16","48.31",
"49.01","49.27",
"49.33","49.40",
"49.48","49.51",
"52.85","52.89",
"53.14","54.31",
"54.63","56.91",
"1:18.39","1:20.26",
"1:38.30")
dates <- c("2017-02-24 MST",
"2017-02-24 MST",
"2016-02-26 MST",
"2018-02-23 MST",
"2015-12-04 MST",
"2015-03-06 MST",
"2015-03-06 MST",
"2016-12-02 MST",
"2016-02-26 MST",
"2017-11-17 MST",
"2016-12-02 MST",
"2017-11-17 MST",
"2014-11-22 MST",
"2017-01-13 MST",
"2017-01-21 MST",
"2015-10-17 MDT",
"2017-01-27 MST",
"2016-01-29 MST",
"2017-10-20 MDT",
"2016-11-05 MDT",
"2015-11-07 MST",
"2015-10-30 MDT",
"2014-11-22 MST",
"2016-11-11 MST",
"2014-02-28 MST",
"2014-02-28 MST",
"2014-02-28 MST")
df <- cbind(as.data.frame(dates),as.data.frame(times))
I hope to get a column for time, probably in seconds, so the first 24 observations would stay the same, but the last 3 would change to 78.39, 80.26, and 98.30.
One way is to prepend "00:" to the times that don't have minutes.
Then you can use lubridate::ms to do the time conversion.
library(dplyr)
library(lubridate)
data.frame(times = times,
stringsAsFactors = FALSE) %>%
mutate(times2 = ifelse(grepl(":", times), times, paste0("00:", times)),
seconds = as.numeric(ms(times2)))
Result:
times times2 seconds
1 47.45 00:47.45 47.45
2 47.69 00:47.69 47.69
3 47.69 00:47.69 47.69
4 47.82 00:47.82 47.82
5 47.84 00:47.84 47.84
6 47.92 00:47.92 47.92
7 47.96 00:47.96 47.96
8 48.13 00:48.13 48.13
9 48.16 00:48.16 48.16
10 48.16 00:48.16 48.16
11 48.16 00:48.16 48.16
12 48.31 00:48.31 48.31
13 49.01 00:49.01 49.01
14 49.27 00:49.27 49.27
15 49.33 00:49.33 49.33
16 49.40 00:49.40 49.40
17 49.48 00:49.48 49.48
18 49.51 00:49.51 49.51
19 52.85 00:52.85 52.85
20 52.89 00:52.89 52.89
21 53.14 00:53.14 53.14
22 54.31 00:54.31 54.31
23 54.63 00:54.63 54.63
24 56.91 00:56.91 56.91
25 1:18.39 1:18.39 78.39
26 1:20.26 1:20.26 80.26
27 1:38.30 1:38.30 98.30
as.difftime, and a quick regex to add the minutes when they are not present, should handle it:
as.difftime(sub("(^\\d{1,2}\\.)", "0:\\1", times), format="%M:%OS")
#Time differences in secs
# [1] 47.45 47.69 47.69 47.82 47.84 47.92 47.96 48.13 48.16 48.16 48.16 48.31
#[13] 49.01 49.27 49.33 49.40 49.48 49.51 52.85 52.89 53.14 54.31 54.63 56.91
#[25] 78.39 80.26 98.30
You can use separate() from the tidyverse tidyr package to split the strings into minutes and seconds:
library(tidyr)
library(dplyr)
separate(tibble(times = times), times, sep = ":",
into = c("min", "sec"), fill = "left", convert = TRUE) %>%
mutate(min = ifelse(is.na(min), 0, min),
seconds = 60 * min + sec)
# A tibble: 27 x 3
min sec seconds
<dbl> <dbl> <dbl>
1 0 47.4 47.4
2 0 47.7 47.7
3 0 47.7 47.7
4 0 47.8 47.8
5 0 47.8 47.8
6 0 47.9 47.9
7 0 48.0 48.0
8 0 48.1 48.1
9 0 48.2 48.2
10 0 48.2 48.2
# ... with 17 more rows
The new column seconds is the total number of seconds: the minutes multiplied by 60, plus the seconds.
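If you would rather avoid extra packages, here is a base-R sketch of the same idea: split each string on ":" and weight the pieces by powers of 60 from the right, so "47.45" stays 47.45 and "1:18.39" becomes 60 + 18.39. The helper name `to_seconds` is hypothetical, and it assumes every value has the form ss.ss, m:ss.ss or h:mm:ss.ss.

```r
# Hypothetical helper: convert "ss.ss", "m:ss.ss" or "h:mm:ss.ss" strings
# to seconds by weighting the ":"-separated pieces by powers of 60
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(p) {
    p <- as.numeric(p)
    sum(p * 60^(rev(seq_along(p)) - 1))
  }, numeric(1))
}

times <- c("47.45", "56.91", "1:18.39", "1:20.26", "1:38.30")
to_seconds(times)
```

With the question's data frame, `df$seconds <- to_seconds(as.character(df$times))` should add the column; the as.character() call guards against times having been stored as a factor.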
I have electricity sensor readings at 15-minute intervals, but the start time is not fixed: for example,
on this day the readings start at minute 13, and on another day they start at a different minute.
dateTime KW
1/1/2013 1:13 34.70
1/1/2013 1:28 43.50
1/1/2013 1:43 50.50
1/1/2013 1:58 57.50
...
... (from here on, the readings start at minute 02)
1/30/2013 0:02 131736.30
1/30/2013 0:17 131744.30
1/30/2013 0:32 131751.10
1/30/2013 0:47 131759.00
I have data for one year, and I need regular 30-minute intervals starting from midnight (00:00).
I am new to R; can anyone help me?
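Maybe you can try:
# parse the character timestamps (month/day/year hour:minute)
dT <- as.POSIXct(strptime(df$dateTime, '%m/%d/%Y %H:%M'))
# pad with midnight of the first day and midnight of the day after the last
# reading, so that cut()'s 30-minute breaks start at 00:00
grp <- as.POSIXct(cut(c(as.POSIXct(gsub(' +.*', '', min(dT))), dT,
      as.POSIXct(gsub(' +.*', '', max(dT)+24*3600))), breaks='30 min'))
# drop the two padding values again
df$grp <- grp[-c(1,length(grp))]
df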
df
# dateTime KW grp
#1 1/1/2013 1:13 34.7 2013-01-01 01:00:00
#2 1/1/2013 1:28 43.5 2013-01-01 01:00:00
#3 1/1/2013 1:43 50.5 2013-01-01 01:30:00
#4 1/1/2013 1:58 57.5 2013-01-01 01:30:00
#5 1/30/2013 0:02 131736.3 2013-01-30 00:00:00
#6 1/30/2013 0:17 131744.3 2013-01-30 00:00:00
#7 1/30/2013 0:32 131751.1 2013-01-30 00:30:00
#8 1/30/2013 0:47 131759.0 2013-01-30 00:30:00
data
df <- structure(list(dateTime = c("1/1/2013 1:13", "1/1/2013 1:28",
"1/1/2013 1:43", "1/1/2013 1:58", "1/30/2013 0:02", "1/30/2013 0:17",
"1/30/2013 0:32", "1/30/2013 0:47"), KW = c(34.7, 43.5, 50.5,
57.5, 131736.3, 131744.3, 131751.1, 131759)), .Names = c("dateTime",
"KW"), class = "data.frame", row.names = c(NA, -8L))
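A possibly simpler variant of the same idea, sketched under the assumption that the timestamps can be treated as UTC (which avoids daylight-saving gaps): build the 30-minute grid explicitly from midnight of the first day with seq(), then look up each reading's slot with findInterval(). This needs no padding values and returns POSIXct directly rather than going through a factor.

```r
df <- data.frame(
  dateTime = c("1/1/2013 1:13", "1/1/2013 1:28", "1/1/2013 1:43", "1/1/2013 1:58",
               "1/30/2013 0:02", "1/30/2013 0:17", "1/30/2013 0:32", "1/30/2013 0:47"),
  KW = c(34.7, 43.5, 50.5, 57.5, 131736.3, 131744.3, 131751.1, 131759)
)

# Parse month/day/year hour:minute, treating the times as UTC (assumption)
dT <- as.POSIXct(df$dateTime, format = "%m/%d/%Y %H:%M", tz = "UTC")

# 30-minute grid starting at 00:00 on the first day and covering the last reading
day0 <- as.POSIXct(format(min(dT), "%Y-%m-%d"), tz = "UTC")
brks <- seq(day0, max(dT) + 1800, by = "30 min")

# findInterval() gives, for each reading, the index of the last break <= it
df$grp <- brks[findInterval(dT, brks)]
df
```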
I am attempting to study the clustering of daily high/low points by time of day. I attached the daily values to the intraday data by using to.daily on the intraday data and merging the two:
intraday.merge <- merge(intraday,daily)
intraday.merge <- na.locf(intraday.merge)
intraday.merge <- intraday.merge["T08:30:00/T16:30:00"] # remove record at 00:00:00
Next, I tried to obtain the records where High == Daily.High (or Low == Daily.Low) using:
intradayhi <- intraday.merge[intraday.merge$High == intraday.merge$Daily.High]
intradaylo <- intraday.merge[intraday.merge$Low == intraday.merge$Daily.Low]
Resulting data resembles the following:
Open High Low Close Volume Daily.Open Daily.High Daily.Low Daily.Close Daily.Volume
2012-06-19 08:45:00 258.9 259.1 258.5 258.7 1424 258.9 259.1 257.7 258.7 31523
2012-06-20 13:30:00 260.8 260.9 260.6 260.6 1616 260.4 260.9 259.2 260.8 35358
2012-06-21 08:40:00 260.7 260.8 260.4 260.5 493 260.7 260.8 257.4 258.3 31360
2012-06-22 12:10:00 255.9 256.2 255.9 256.1 626 254.5 256.2 253.9 255.3 50515
2012-06-22 12:15:00 256.1 256.2 255.9 255.9 779 254.5 256.2 253.9 255.3 50515
2012-06-25 11:55:00 254.5 254.7 254.4 254.6 1589 253.8 254.7 251.5 253.9 65621
2012-06-26 08:45:00 253.4 254.2 253.2 253.7 5849 253.8 254.2 252.4 253.1 70635
2012-06-27 11:25:00 255.6 256.0 255.5 255.9 973 251.8 256.0 251.8 255.2 53335
2012-06-28 09:00:00 257.0 257.3 256.9 257.1 601 255.3 257.3 255.0 255.1 23978
2012-06-29 13:45:00 253.0 253.4 253.0 253.4 451 247.3 253.4 246.9 253.4 52539
The subset contains duplicate results (for example, two records on 2012-06-22). How do I keep only the first record of each day? I would then be able to plot the count of records for each time of day.
Also, are there alternative methods to get the results I want? Thanks in advance.
Edit:
Sample output should look like this; the count could be based either on the first result of each day or on all occurrences (more than one per day):
Time Count
08:40:00 60
08:45:00 54
08:50:00 60
...
14:00:00 20
14:05:00 12
14:10:00 30
You can get the first observation of each day via:
y <- apply.daily(x, first)
Then you can count the records for each hour and minute (summing a column of ones; summing 1:NROW(y) would add up row indices rather than count rows):
z <- aggregate(rep(1, NROW(y)), by=list(Time=format(index(y),"%H:%M")), sum)
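A base-R sketch of the same two steps, using the timestamps of the sample rows above as a plain POSIXct vector; this vector is only a stand-in for index(intradayhi). Keeping the first row of each calendar day with !duplicated() on the date keeps that row's own timestamp, and table() then counts how often each time of day occurs.

```r
# Stand-in for index(intradayhi): timestamps of the sample rows above
ts <- as.POSIXct(c("2012-06-19 08:45", "2012-06-20 13:30", "2012-06-21 08:40",
                   "2012-06-22 12:10", "2012-06-22 12:15", "2012-06-25 11:55",
                   "2012-06-26 08:45", "2012-06-27 11:25", "2012-06-28 09:00",
                   "2012-06-29 13:45"), tz = "UTC")

# First record of each day: drop rows whose calendar date has already
# appeared (assumes the timestamps are sorted, as an xts index is)
first_of_day <- ts[!duplicated(format(ts, "%Y-%m-%d"))]

# Count how often the first daily high falls at each time of day
counts <- table(Time = format(first_of_day, "%H:%M"))
counts
```

On the xts object itself, the same subset should be `intradayhi[!duplicated(format(index(intradayhi), "%Y-%m-%d"))]`.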