Subset dataframe by most number of daily records - r

I am working with a large dataset; an example is shown below. For the majority of the individual files I will have to process, there should be more than one day's worth of data.
Date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
"05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
"07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
"07/12/2012 08:00:00")
Date <- strptime(Date, "%d/%m/%Y %H:%M:%S")
c <- c("0","1","5","4","6","8","0","3","10","6")
c <- as.numeric(c)
df1 <- data.frame(Date,c,stringsAsFactors = FALSE)
I wish to be left with data for a single day only. This day will be chosen as the one with the most data points. If for any reason two days are tied (on the maximum number of data points), I wish to select the day with the highest individual value recorded.
In the example dataframe given above, I would be left with 7th Dec. It has 4 data points (as does the 5th Dec), but it has the higher maximum value recorded of these two days (i.e. 10).

Here's a solution with tapply.
# count rows per day and find maximum c value
res <- with(df1, tapply(c, as.Date(Date), function(x) c(length(x), max(x))))
# order these two values in decreasing order and find the associated day
# (at top position):
maxDate <- names(res)[order(sapply(res, "[", 1),
                            sapply(res, "[", 2), decreasing = TRUE)[1]]
# subset data frame:
subset(df1, as.character(as.Date(Date)) %in% maxDate)
Date c
7 2012-12-07 05:00:00 0
8 2012-12-07 06:00:00 3
9 2012-12-07 07:00:00 10
10 2012-12-07 08:00:00 6

A data.table solution:
library(data.table)
dt <- data.table(df1)
# get just the date
dt[, day := as.Date(Date)]
setkey(dt, "day")
# get total entries (N) and max(c) for each day-group
dt <- dt[, `:=`(N = .N, mc = max(c)), by=day]
setkey(dt, "N")
# filter by maximum of N
dt <- dt[J(max(N))]
setkey(dt, "mc")
# settle ties with maximum of c
dt <- dt[J(max(mc))]
dt[, c("N", "mc", "day") := NULL]
print(dt)
# Date c
# 1: 2012-12-07 05:00:00 0
# 2: 2012-12-07 06:00:00 3
# 3: 2012-12-07 07:00:00 10
# 4: 2012-12-07 08:00:00 6
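
For reference, the two keyed join steps can also be collapsed into a single ordering pass; a minimal sketch using the same objects and column names as above:
dt <- data.table(df1)
dt[, day := as.Date(Date)]
# rank days by row count, then by the maximum of c, and keep the top day
best <- dt[, .(N = .N, mc = max(c)), by = day][order(-N, -mc)][1, day]
dt[day == best, .(Date, c)]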

And to be complete, here's one with plyr:
library(plyr)
df1$day <- strftime(df1$Date, "%d/%m/%Y")
tmp <- ddply(df1[,c("day","c")], .(day), summarize, nb=length(c), max=max(c))
tmp <- tmp[order(tmp$nb, tmp$max, decreasing=TRUE),]
df1[df1$day==tmp$day[1],]
Which gives:
Date c day
7 2012-12-07 05:00:00 0 07/12/2012
8 2012-12-07 06:00:00 3 07/12/2012
9 2012-12-07 07:00:00 10 07/12/2012
10 2012-12-07 08:00:00 6 07/12/2012
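
plyr has since been superseded by dplyr; for completeness, a sketch of the same logic in dplyr (assuming it is installed):
library(dplyr)
df1 %>%
  group_by(day = as.Date(Date)) %>%
  mutate(n = n(), mc = max(c)) %>%
  ungroup() %>%
  filter(n == max(n)) %>%   # keep the day(s) with the most rows
  filter(mc == max(mc)) %>% # settle ties by the highest recorded value
  select(Date, c)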

Related

Summarizing across overlapping dates

I am trying to see how I can create a variable which summarizes observations across multiple dates.
library(data.table)
library(lubridate)
library(magrittr)
sample <- data.table(start = c("2018-12-22 23:00:00",
"2018-12-23 06:00:00",
"2018-12-22 06:00:00",
"2018-12-23 06:00:00"),
end = c("2018-12-23 06:00:00",
"2018-12-23 13:00:00",
"2018-12-23 12:00:00",
"2018-12-24 01:00:00"),
store = c("A", "A", "B", "B"))
sample[, start:= ymd_hms(start)]
sample[, end := ymd_hms(end)]
sample
> sample
start end store
1: 2018-12-22 23:00:00 2018-12-23 06:00:00 A
2: 2018-12-23 06:00:00 2018-12-23 13:00:00 A
3: 2018-12-22 06:00:00 2018-12-23 12:00:00 B
4: 2018-12-23 06:00:00 2018-12-24 01:00:00 B
Here, sample is a time card of "shifts" used across each store. We see that store A has two observations, each with a start and end time. If there were no "bleeding" across dates (e.g. the first observation begins on 2018-12-22 and ends on 2018-12-23), I would simply subtract the start and end times and sum across observations to get the total amount of minutes used for each store. Something like:
worked_mins <- sample %>%
.[, date := ymd(substr(start,1,10))] %>%
.[, minutes := end - start] %>%
.[, .(worked_mins = sum(minutes)), by = .(store,date)]
However, I am trying to see how to best sum the number of minutes when shifts overlap across multiple days (potentially even >=2 days).
From the above, the desired output would be:
worked_mins = data.table(store = c("A","A", "B", "B", "B"),
date = c("2018-12-22", "2018-12-23",
"2018-12-22", "2018-12-23",
"2018-12-24"),
worked_mins = c(1, 13, 18, 30, 1))
> worked_mins
store date worked_mins
1: A 2018-12-22 1
2: A 2018-12-23 13
3: B 2018-12-22 18
4: B 2018-12-23 30
5: B 2018-12-24 1
Thanks!
An updated solution that measures actual elapsed time instead of just counting hours, so it also accounts for fractional hours.
library(lubridate) # ceiling_date, floor_date
func <- function(st, en, units = "hours") {
  midns <- ceiling_date(seq(st, en, by = "day"), unit = "day")
  times <- unique(sort(c(midns[st < midns & midns < en], st, en)))
  if (length(times) < 2) {
    data.table(date = as.Date(floor_date(st)), d = structure(0, class = "difftime", units = units))
  } else {
    data.table(date = as.Date(floor_date(times[-length(times)], unit = "days")), d = `units<-`(diff(times), units))
  }
}
sample[, rbindlist(Map(func, start, end)), by = .(store)
][, .(d = sum(d)), by = .(store, date)]
# store date d
# <char> <Date> <difftime>
# 1: A 2018-12-22 1 hours
# 2: A 2018-12-23 13 hours
# 3: B 2018-12-22 18 hours
# 4: B 2018-12-23 30 hours
# 5: B 2018-12-24 1 hours
(The 1 hours column is still numeric underneath; it just carries a label of its units. The label can be removed easily by wrapping the diff in as.numeric.)
func works by including the midnights between st and en; creating an ordered vector (times) of these unique timestamps allows us to diff across them, then floor_date the interval starts so that we know the date on which each diff began.
You can see what func is doing with this quick demo, one that makes the first line a 0-second difference (for testing and validation):
copy(sample)[1, end:=start][, rbindlist(Map(func, start, end)), by = .(store)]
# store date d
# <char> <Date> <difftime>
# 1: A 2018-12-22 0 hours
# 2: A 2018-12-23 7 hours
# 3: B 2018-12-22 18 hours
# 4: B 2018-12-23 12 hours
# 5: B 2018-12-23 18 hours
# 6: B 2018-12-24 1 hours
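Since the question is phrased in minutes, func's units argument can be forwarded through Map, which recycles length-1 arguments (a usage sketch):
sample[, rbindlist(Map(func, start, end, units = "mins")), by = .(store)
][, .(worked_mins = sum(d)), by = .(store, date)]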
Does this achieve what you need?
library(dplyr)  # this answer also needs dplyr, purrr and tidyr; the question only loads data.table, lubridate and magrittr
library(purrr)
library(tidyr)
sample %>%
rowwise() %>%
mutate(
worked_hours = map2(start, end, ~seq(.x, .y, "hours") %>% head(-1))
) %>%
unnest(cols = c(worked_hours)) %>%
select(store, worked_hours) %>%
mutate(date = floor_date(worked_hours, "days")) %>%
group_by(store, date) %>%
count(name = "worked_mins")
# A tibble: 5 x 3
# Groups: store, date [5]
store date worked_mins
<chr> <dttm> <int>
1 A 2018-12-22 00:00:00 1
2 A 2018-12-23 00:00:00 13
3 B 2018-12-22 00:00:00 18
4 B 2018-12-23 00:00:00 30
5 B 2018-12-24 00:00:00 1

extract data from another data frame based on nearest timestamp and conditions in R [duplicate]

I have 2 data sets, each containing a date-time value in POSIXlt format, and some other numeric and character variables.
I want to combine both data sets based on the date-time column.
But the date stamps of both data sets do not match, so I need to combine them by nearest date (before or after).
In my example, data value "e" from 2016-03-01 23:52:00 needs to be combined with "binH" at 2016-03-02 00:00:00, not "binG".
Is there a function that would allow me to combine my data sets by nearest date-time value, even if it is after?
I have found ways of matching dates to the nearest previous date using the cut() function, or roll=Inf in data.table. But I couldn't get my timestamps into any format that roll='nearest' would accept.
>df1
date1 value
1 2016-03-01 17:52:00 a
2 2016-03-01 18:01:30 b
3 2016-03-01 18:05:00 c
4 2016-03-01 20:42:30 d
5 2016-03-01 23:52:00 e
>df2
date2 bin_name
1 2016-03-01 17:00:00 binA
2 2016-03-01 18:00:00 binB
3 2016-03-01 19:00:00 binC
4 2016-03-01 20:00:00 binD
5 2016-03-01 21:00:00 binE
6 2016-03-01 22:00:00 binF
7 2016-03-01 23:00:00 binG
8 2016-03-02 00:00:00 binH
9 2016-03-02 01:00:00 binI
data.table should work for this (can you explain the error you're coming up against?), although it does tend to convert POSIXlt to POSIXct on its own (perhaps do that conversion on your datetime column manually to keep data.table happy). Also make sure you're setting the key column before using roll.
(I've created my own example tables here to make my life that little bit easier. If you want to use dput on yours, I'm happy to update this example with your data):
new <- data.table( date = as.POSIXct( c( "2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00" ) ), data.new = c( "t","u","v" ) )
head( new, 2 )
date data.new
1: 2016-03-02 12:20:00 t
2: 2016-03-07 12:20:00 u
old <- data.table( date = as.POSIXct( c( "2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00", "2015-03-02 12:20:00" ) ), data.old = c( "a","b","c","d" ) )
head( old, 2 )
date data.old
1: 2016-03-02 12:20:00 a
2: 2016-03-07 12:20:00 b
setkey( new, date )
setkey( old, date )
combined <- new[ old, roll = "nearest" ]
combined
date data.new data.old
1: 2015-03-02 12:20:00 t d
2: 2016-03-02 12:20:00 t a
3: 2016-03-07 12:20:00 u b
4: 2016-04-02 12:20:00 v c
I've intentionally made the two tables different row lengths, in order to show how the rolling join deals with multiple matches. You can switch the way it joins with:
combined <- old[ new, roll = "nearest" ]
combined
date data.old data.new
1: 2016-03-02 12:20:00 a t
2: 2016-03-07 12:20:00 b u
3: 2016-04-02 12:20:00 c v
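Applied to the frames in the question (a sketch, assuming df1 and df2 hold the data exactly as printed), the same pattern attaches the nearest bin to each observation:
df1$date1 <- as.POSIXct(df1$date1)  # data.table does not support POSIXlt columns
df2$date2 <- as.POSIXct(df2$date2)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1, date1)
setkey(dt2, date2)
# each df1 row gets its nearest bin; "e" at 23:52:00 matches binH at 00:00:00
dt2[dt1, roll = "nearest"]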
I had a similar problem, but instead of using data.table or tidyverse I created my own function amerge for "approximate merge". It takes 4 arguments:
two data frames,
a vector of column names for "firm" (not approximate) merge - these must exist in both data frames,
and the name of a single column (in both data frames) for approximate merge. It will work for any numeric values, including dates.
The idea was to merge rows 1-to-1 by best match, and not lose any rows from either data frame. Here is my commented code with a working example.
amerge <- function(d1, d2, firm=NULL, approx=NULL) {
  rt = Sys.time()
  # Take care of conflicting column names
  n2 = data.frame(oldname = names(d2), newname = names(d2))
  n2$newname = as.character(n2$newname)
  n2$newname[(n2$oldname %in% names(d1)) & !(n2$oldname %in% firm)] =
    paste(n2$newname[(n2$oldname %in% names(d1)) & !(n2$oldname %in% firm)], "2", sep=".")
  # Add unique row IDs
  if (length(c(firm, approx)) > 1) {
    d1$ID1 = factor(apply(d1[, c(approx, firm)], 1, paste, collapse=" "))
    d2$ID2 = factor(apply(d2[, c(approx, firm)], 1, paste, collapse=" "))
  } else {
    d1$ID1 = factor(d1[, c(approx, firm)])
    d2$ID2 = factor(d2[, c(approx, firm)])
  }
  # Perform initial merge on the 'firm' parameters, if any
  # Otherwise match all to all
  if (length(firm) > 0) {
    t1 = merge(d1, d2, by=firm, all=TRUE, suffixes=c("", ".2"))
  } else {
    names(d2) = c(n2$newname, "ID2")
    t1 = data.frame()
    for (i1 in 1:nrow(d1)) {
      trow = d1[i1, ]
      t1 = rbind(t1, cbind(trow, d2))
    }
  }
  # Match by the closest approximate record
  if (length(approx) == 1) {
    # Calculate the differential for approximate merging
    t1$DIFF = abs(t1[, approx] - t1[, n2$newname[n2$oldname == approx]])
    # Sort data by ascending DIFF, so that best-matching records are used first
    t1 = t1[order(t1$DIFF, t1$ID1, t1$ID2), ]
    t2 = data.frame()
    d2$used = 0
    # For each record of d1, find a match from d2
    for (i1 in na.omit(unique(t1$ID1))) {
      tx = t1[!is.na(t1$DIFF) & t1$ID1 == i1, ]
      # If there are non-missing records, get the one with minimum DIFF (top one)
      if (nrow(tx) > 0) {
        tx = tx[1, ]
        # If a matching record is found, remove it from the pool, so it's not used again
        t1[!is.na(t1$ID2) & t1$ID2 == tx$ID2, c(n2$newname[!(n2$newname %in% firm)], "DIFF")] = NA
        # And mark it as used
        d2$used[d2$ID2 == tx$ID2] = 1
      } else {
        # If there are no non-missing records, just get the first one from the top
        tx = t1[!is.na(t1$ID1) & t1$ID1 == i1, ][1, ]
      }
      t2 = rbind(t2, tx)
    }
  } else {
    t2 = t1
  }
  # Make the records the same order as d1
  t2 = t2[match(d1$ID1, t2$ID1), ]
  # Add unmatched records from d2 to the end of the output
  if (any(d2$used == 0)) {
    tx = t1[t1$ID2 %in% d2$ID2[d2$used == 0], ]
    tx = tx[!duplicated(tx$ID2), ]
    tx[, names(d1)[!(names(d1) %in% c(firm))]] = NA
    t2 = rbind(t2, tx)
    t2[is.na(t2[, approx]), approx] = t2[is.na(t2[, approx]), n2$newname[n2$oldname == approx]]
  }
  t2$DIFF = t2$ID1 = t2$ID2 = NULL
  # units= must be named here; the third positional argument of difftime() is tz, not units
  cat("* Run time: ", round(difftime(Sys.time(), rt, units="secs"), 1), " seconds.\n", sep="")
  return(t2)
}
And the example:
new <- data.frame(ID=c(1,1,1,2), date = as.POSIXct( c("2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00", "2016-04-12 11:03:00")), new = c("t","u","v","x"))
old <- data.frame(ID=c(1,1,1,1,1), date = as.POSIXct( c("2016-03-07 12:20:00", "2016-04-02 12:20:00", "2016-03-01 10:09:00", "2015-04-12 10:09:00","2016-03-03 12:20:00")), old = c("a","b","c","d","e"))
amerge(old, new, firm="ID", approx="date")
It outputs:
ID date old date.2 new
2 1 2016-03-07 12:20:00 a 2016-03-07 12:20:00 u
6 1 2016-04-02 12:20:00 b 2016-04-02 12:20:00 v
7 1 2016-03-01 10:09:00 c <NA> <NA>
10 1 2015-04-12 10:09:00 d <NA> <NA>
13 1 2016-03-03 12:20:00 e 2016-03-02 12:20:00 t
16 2 2016-04-12 11:03:00 <NA> 2016-04-12 11:03:00 x
So it works for my purpose as intended: there is exactly one copy of each row from both data frames, matched by shortest time difference. One note: the function copies date.2 into the date column where date would otherwise be missing.

R data.table add column as function of another data.table

I have one data table which contains just a sequence of times. I have another data table containing two columns: start_time and end_time. I want to take the first data table and add a column where the value is the count of all of the rows in the second data table where the time from the first data table fits within the start and end time. Here is my code
start_date <- as.POSIXct(x = "2017-01-31 17:00:00", format = "%Y-%m-%d %H:%M:%S")
end_date <- as.POSIXct(x = "2017-02-01 09:00:00", format = "%Y-%m-%d %H:%M:%S")
all_dates <- as.data.table(seq(start_date, end_date, "min"))
colnames(all_dates) <- c("Bin")
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- data.table(start_times, end_times)
Here is what I want to do, but this is wrong and gives an error. What's the right way to write this?
all_dates[, BinCount := input_data[start_times < Bin & end_times > Bin, .N] ]
In the end I should get something like
Bin BinCount
2017-01-31 17:00:00 1
2017-01-31 17:01:00 5
...
The problem can be solved very easily using sqldf, as it provides an easy way to join tables with range checking. Hence one solution could be:
The data from OP:
library(data.table)
start_date <- as.POSIXct(x = "2017-01-31 17:00:00", format = "%Y-%m-%d %H:%M:%S")
end_date <- as.POSIXct(x = "2017-02-01 09:00:00", format = "%Y-%m-%d %H:%M:%S")
all_dates <- as.data.table(seq(start_date, end_date, "min"))
colnames(all_dates) <- c("Bin")
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- data.table(start_times, end_times)
library(sqldf)
result <- sqldf("SELECT all_dates.bin, COUNT(*) as BinCount
                 FROM all_dates, input_data
                 WHERE all_dates.bin > input_data.start_times AND
                       all_dates.bin < input_data.end_times
                 GROUP BY bin")
result
Bin BinCount
1 2017-01-31 17:01:00 1
2 2017-01-31 17:02:00 1
3 2017-01-31 17:03:00 1
4 2017-01-31 17:04:00 1
5 2017-01-31 17:05:00 1
6 2017-01-31 17:06:00 1
...........
...........
497 2017-02-01 01:17:00 6
498 2017-02-01 01:18:00 5
499 2017-02-01 01:19:00 5
500 2017-02-01 01:20:00 4
[ reached getOption("max.print") -- omitted 460 rows ]
In data.table you're after a range join.
library(data.table)
start_date <- as.POSIXct(x = "2017-01-31 17:00:00", format = "%Y-%m-%d %H:%M:%S")
end_date <- as.POSIXct(x = "2017-02-01 09:00:00", format = "%Y-%m-%d %H:%M:%S")
all_dates <- as.data.table(seq(start_date, end_date, "min"))
colnames(all_dates) <- c("Bin")
set.seed(123)
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- data.table(start_times, end_times)
## doing the range-join and calculating the number of items per bin in one chained step
input_data[
all_dates
, on = .(start_times < Bin, end_times > Bin)
, nomatch = 0
, allow.cartesian = T
][, .N, by = start_times]
# start_times N
# 1: 2017-01-31 17:01:00 1
# 2: 2017-01-31 17:02:00 1
# 3: 2017-01-31 17:03:00 1
# 4: 2017-01-31 17:04:00 1
# 5: 2017-01-31 17:05:00 1
# ---
# 956: 2017-02-01 08:56:00 6
# 957: 2017-02-01 08:57:00 4
# 958: 2017-02-01 08:58:00 4
# 959: 2017-02-01 08:59:00 5
# 960: 2017-02-01 09:00:00 5
Note:
I've put the all_dates object on the right-hand side of the join, so the result contains the names of the input_data columns, even though they are your Bins (see this issue for the discussion on this topic); a renaming sketch follows after these notes
I've used set.seed(), as you're taking samples
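If the Bin name is wanted back in the result, the key column can simply be renamed after the join (a small sketch):
res <- input_data[
  all_dates
  , on = .(start_times < Bin, end_times > Bin)
  , nomatch = 0
  , allow.cartesian = TRUE
][, .N, by = start_times]
setnames(res, "start_times", "Bin")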
Wasn't requested, but here is a compact alternative solution using the tidyverse. Uses lubridate parsers, interval, and %within%, as well as purrr::map_int to generate the desired bin counts.
library(tidyverse)
library(lubridate)
start_date <- ymd_hms(x = "2017-01-31 17:00:00") # lubridate parsers
end_date <- ymd_hms(x = "2017-02-01 09:00:00")
all_dates <- tibble(seq(start_date, end_date, "min")) # tibble swap for data.table
colnames(all_dates) <- c("Bin")
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- tibble(
start_times,
end_times,
intvl = interval(start_times, end_times) # Add interval column
)
all_dates %>% # Checks date in Bin and counts intervals it lies within
mutate(BinCount = map_int(.$Bin, ~ sum(. %within% input_data$intvl)))
# A tibble: 961 x 2
Bin BinCount
<dttm> <int>
1 2017-01-31 17:00:00 0
2 2017-01-31 17:01:00 0
3 2017-01-31 17:02:00 0
4 2017-01-31 17:03:00 0
5 2017-01-31 17:04:00 0
6 2017-01-31 17:05:00 0
7 2017-01-31 17:06:00 0
8 2017-01-31 17:07:00 1
9 2017-01-31 17:08:00 1
10 2017-01-31 17:09:00 1
# ... with 951 more rows

Aggregate data frame by sequence of events per day

I have a data frame (df) like this:
TIMESTAMP STATUS
2016-01-01 00:00:00 OFF
2016-01-01 01:00:00 ON
2016-01-01 02:00:00 ON
2016-01-01 03:00:00 OFF
2016-01-02 00:00:00 ON
2016-01-02 01:00:00 OFF
...
I need to aggregate(?) the sequence of statuses for each day. For example, the first day in df gives the sequence OFF-ON-ON-OFF, whereas the second day just gives ON-OFF.
So I need an aggregated data frame by date like this:
DAY SEQUENCE
2016-01-01 OFF-ON-ON-OFF
2016-01-02 ON-OFF
...
library(dplyr)
df %>%
arrange(TIMESTAMP) %>%
mutate(date = as.Date(TIMESTAMP)) %>%
group_by(date) %>%
summarise(sequence = paste(status, collapse = "-"))
data
df <- data.frame(
TIMESTAMP = c("2016-01-01 00:00:00", "2016-01-01 01:00:00", "2016-01-01 02:00:00", "2016-01-01 03:00:00", "2016-01-02 00:00:00", "2016-01-02 01:00:00"),
status = c("OFF", "ON", "ON", "OFF", "ON", "OFF")
)
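Run against this data, the pipeline should return something like (illustrative; tibble formatting varies by version):
# A tibble: 2 x 2
  date       sequence
  <date>     <chr>
1 2016-01-01 OFF-ON-ON-OFF
2 2016-01-02 ON-OFF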
By tradition I'll add a data.table solution here:
library(data.table)
library(lubridate)
s <- "TIMESTAMP, STATUS
2016-01-01 00:00:00, OFF
2016-01-01 01:00:00, ON
2016-01-01 02:00:00, ON
2016-01-01 03:00:00, OFF
2016-01-02 00:00:00, ON
2016-01-02 01:00:00, OFF"
dt <- fread(s)
dt[, day_time := ymd_hms(TIMESTAMP)]
# better to make sure the events are in the right order
setorder(dt, day_time)
dt[, DAY := date(day_time)]
dt[, paste0(STATUS, collapse = "-"), by = DAY]
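As written, the pasted column comes back as V1; to match the SEQUENCE header in the desired output, the last line can name it explicitly (a small tweak):
dt[, .(SEQUENCE = paste(STATUS, collapse = "-")), by = DAY]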
Based on your desired result, I assume that you want to remove the time stamps as well. If that is the case, you can use aggregate, as.Date, and paste from base R.
df <- data.frame(TIMESTAMP =
c('2016-01-01 00:00:00','2016-01-01 01:00:00',
'2016-01-01 02:00:00','2016-01-01 03:00:00',
'2016-01-02 00:00:00','2016-01-02 01:00:00'),
STATUS = c('OFF','ON','ON','OFF','ON','OFF'))
aggregate(df$STATUS, list(as.Date(df$TIMESTAMP)), paste, collapse="-")
## Group.1 x
## 2016-01-01 OFF-ON-ON-OFF
## 2016-01-02 ON-OFF
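The Group.1/x headers that aggregate produces can then be renamed to match the desired output (a small follow-up sketch):
res <- aggregate(df$STATUS, list(as.Date(df$TIMESTAMP)), paste, collapse="-")
names(res) <- c("DAY", "SEQUENCE")
res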

How do I subset datetimes and pivot the measurement column in R

I have a dataframe like this
Datetime <- c("2015-12-31 08:30:13", "2015-12-31 12:45:00", "2016-01-01 02:53:20", "2016-01-01 03:22:18",
"2016-01-01 09:42:10", "2016-01-01 20:55:50", "2016-01-01 21:14:10", "2016-01-02 05:42:16",
"2016-01-02 08:31:15", "2016-01-02 09:13:10", "2016-01-03 00:45:14", "2016-01-03 05:56:00",
"2016-01-03 13:44:00", "2016-01-03 14:41:20", "2016-01-03 15:33:10", "2016-01-04 04:24:00",
"2016-01-04 17:24:12", "2016-01-04 17:28:16", "2016-01-04 18:22:34", "2016-01-05 02:34:31")
Measurement <- c("Length","Breadth","Height","Length",
"Breadth","Breadth","Breadth","Length",
"Length","Breadth","Height","Height",
"Height","Length","Height","Length",
"Length","Breadth","Breadth","Breadth")
df1 <- data.frame(Datetime,Measurement)
I am trying to subset the dates in this format:
Day1 = December 31st, 2015 at 6:30AM to January 1st, 2016 at 6:30AM
Day2 = January 1st, 2016 at 6:30AM to January 2nd, 2016 at 6:30AM
etc..
While doing this, I would also like to pivot the Measurement column into its own columns with a count of each category.
My desired output is
Days Length Breadth Height
Day1 2 1 1
Day2 1 3 0
Day3 1 1 2
Day4 2 0 2
Day5 1 3 0
I tried something like this to get the date ranges
today <- as.POSIXlt(Sys.time())
today$mday <- today$mday + (today$wday-(today$wday+27))
today$hour = "6";today$min = "30";today$sec = "0"
Back1Day <- today
Back1Day$mday <- today$mday-1
How do I subset according to this problem? I tried to do it using dcast but am not getting it right.
df2 <- dcast(df1, Datetime ~ Measurement)
Kindly provide some directions on this.
This seems to satisfy your needs (according to your comments). I'm just creating a sequence from the first date to the last one by day, and then utilizing the findInterval function to match the days. Then a simple dcast gives you what you need.
library(data.table)
setDT(df1)[, Datetime := as.POSIXct(Datetime)] ## First need to convert to POSIXct class
df1[, Days := paste0("Day", findInterval(Datetime,
seq(as.POSIXct(paste(as.Date(Datetime[1L]), "6:30")),
as.POSIXct(paste(as.Date(Datetime[.N]), "6:30")),
by = "day")))]
dcast(df1, Days ~ Measurement)
# Days Breadth Height Length
# 1: Day1 1 1 2
# 2: Day2 3 0 1
# 3: Day3 1 2 1
# 4: Day4 0 2 2
# 5: Day5 3 0 1
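One detail worth noting: because each Days/Measurement combination occurs multiple times, dcast falls back to length as its aggregate function (and prints a message saying so). Passing it explicitly makes the counting intent clear and silences the message (a sketch):
dcast(df1, Days ~ Measurement, fun.aggregate = length)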
