I have a dataframe like this
Datetime <- c("2015-12-31 08:30:13", "2015-12-31 12:45:00", "2016-01-01 02:53:20", "2016-01-01 03:22:18",
"2016-01-01 09:42:10", "2016-01-01 20:55:50", "2016-01-01 21:14:10", "2016-01-02 05:42:16",
"2016-01-02 08:31:15", "2016-01-02 09:13:10", "2016-01-03 00:45:14", "2016-01-03 05:56:00",
"2016-01-03 13:44:00", "2016-01-03 14:41:20", "2016-01-03 15:33:10", "2016-01-04 04:24:00",
"2016-01-04 17:24:12", "2016-01-04 17:28:16", "2016-01-04 18:22:34", "2016-01-05 02:34:31")
Measurement <- c("Length","Breadth","Height","Length",
"Breadth","Breadth","Breadth","Length",
"Length","Breadth","Height","Height",
"Height","Length","Height","Length",
"Length","Breadth","Breadth","Breadth")
df1 <- data.frame(Datetime,Measurement)
I am trying to subset the dates into day windows like this
Day1 = December 31st, 2015 at 6:30 AM to January 1st, 2016 at 6:30 AM
Day2 = January 1st, 2016 at 6:30 AM to January 2nd, 2016 at 6:30 AM
etc.
While doing this, I would also like to pivot the Measurement column into individual columns with the count of each category.
My desired output is
Days Length Breadth Height
Day1 2 1 1
Day2 1 3 0
Day3 1 1 2
Day4 2 0 2
Day5 1 3 0
I tried something like this to get the date ranges
today <- as.POSIXlt(Sys.time())
today$mday <- today$mday + (today$wday-(today$wday+27))
today$hour = "6";today$min = "30";today$sec = "0"
Back1Day <- today
Back1Day$mday <- today$mday-1
How do I subset according to these day windows? I tried to do it using dcast but I am not getting it right.
df2 <- dcast(df1, Datetime ~ Measurement)
Kindly provide some directions on this.
This seems to satisfy your needs (according to your comments). I'm just creating a sequence from the first date to the last one by day, and then using the findInterval function to match each timestamp to its day. Then a simple dcast gives you what you need.
library(data.table)
setDT(df1)[, Datetime := as.POSIXct(Datetime)] ## First need to convert to POSIXct class
df1[, Days := paste0("Day", findInterval(Datetime,
seq(as.POSIXct(paste(as.Date(Datetime[1L]), "6:30")),
as.POSIXct(paste(as.Date(Datetime[.N]), "6:30")),
by = "day")))]
dcast(df1, Days ~ Measurement)
# Days Breadth Height Length
# 1: Day1 1 1 2
# 2: Day2 3 0 1
# 3: Day3 1 2 1
# 4: Day4 0 2 2
# 5: Day5 3 0 1
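If you also want the columns in the same order as the desired output (Days, Length, Breadth, Height), setcolorder from data.table can reorder them by reference. This is a small follow-up sketch; the name df2 for the dcast result is my own assumption:
df2 <- dcast(df1, Days ~ Measurement)
setcolorder(df2, c("Days", "Length", "Breadth", "Height"))  # reorder columns by reference
df2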
I have a data set with sales values for N products from some yyyy-mm-dd to some yyyy-mm-dd, and I just want to filter the data for the last 12 months for each product in the data set.
Eg:
Say, I have values from 2016-01-01 to 2020-02-01
So now I want to filter the sales values for the last 12 months, that is from 2019-02-01 to 2020-02-01.
I cannot simply write filter(Month >= as.Date("2019-04-01") & Month <= as.Date("2020-04-01")) because the end date keeps changing as each month passes, so I need to automate this.
You can use :
library(dplyr)
library(lubridate)
data %>%
group_by(Product) %>%
filter(between(date, max(date) - years(1), max(date)))
#filter(date >= (max(date) - years(1)) & date <= max(date))
You can test whether the date is greater than or equal to the maximum date per product minus 365 days:
library(dplyr)
df %>%
group_by(Products) %>%
filter(Date >= max(Date)-365)
# A tibble: 6 x 2
# Groups: Products [3]
Products Date
<dbl> <date>
1 1 2002-01-21
2 1 2002-02-10
3 2 2002-02-24
4 2 2002-02-10
5 2 2001-07-01
6 3 2005-03-10
Data
df <- data.frame(
Products = c(1,1,1,1,2,2,2,3,3,3),
Date = as.Date(c("2000-02-01", "2002-01-21", "2002-02-10",
"2000-06-01", "2002-02-24", "2002-02-10",
"2001-07-01", "2003-01-02", "2005-03-10",
"2002-05-01")))
If your aim is to just capture entries from today back to the same day last year, then:
The function Sys.Date() returns the current date as an object of class Date. You can then convert that to POSIXlt form to adjust the year and get the start date. For example:
end.date <- Sys.Date()                    # current date, class Date
end.date.lt <- as.POSIXlt(end.date)       # POSIXlt so the individual fields can be edited
start.date.lt <- end.date.lt
start.date.lt$year <- start.date.lt$year - 1
start.date <- as.POSIXct(start.date.lt)   # back to POSIXct for filtering
Now this does have one potential failure case: if today is February 29th, there is no corresponding date in the previous year. One way to deal with that would be to write a "today.last.year" function that does the above conversion but gives leap years explicit treatment - possibly including an option to count "today last year" as either February 28th or March 1st, depending on which gives you the desired behaviour.
Alternatively, if you want to filter based on a start-of-month date, you can make your function also set start.date.lt$mday <- 1, and so forth if you need to adjust in different ways.
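A minimal sketch of such a helper (the name today.last.year and the choice to map February 29th to February 28th are my own assumptions, not prescribed above):
today.last.year <- function(end.date = Sys.Date()) {
  lt <- as.POSIXlt(end.date)
  # POSIXlt months are 0-based, so mon == 1 means February
  if (lt$mon == 1 && lt$mday == 29) {
    lt$mday <- 28   # assumption: treat "today last year" as Feb 28th; use March 1st if preferred
  }
  lt$year <- lt$year - 1
  as.Date(lt)
}
today.last.year(as.Date("2020-02-29"))   # "2019-02-28"
today.last.year(as.Date("2020-06-15"))   # "2019-06-15"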
Input:
product date
1: a 2017-01-01
2: b 2017-04-01
3: a 2017-07-01
4: b 2017-10-01
5: a 2018-01-01
6: b 2018-04-01
7: a 2018-07-01
8: b 2018-10-01
9: a 2019-01-01
10: b 2019-04-01
11: a 2019-07-01
12: b 2019-10-01
Code:
library(lubridate)
library(data.table)
DT <- data.table(
product = rep(c("a", "b"), 6),
date = seq(as.Date("2017-01-01"), as.Date("2019-12-31"), by = "quarter")
)
yearBefore <- function(x){
year(x) <- year(x) - 1
x
}
# last observed date per product
date_DT <- DT[, .(last_date = last(date)), by = product]
date_DT[, year_before := yearBefore(last_date)]
# non-equi join: keep the rows of DT on or after each product's cut-off date
result <- date_DT[DT, on = .(product, year_before <= date), nomatch = 0]
result[, last_date := NULL]
setnames(result, "year_before", "date")
Output:
product date
1: a 2018-07-01
2: b 2018-10-01
3: a 2019-01-01
4: b 2019-04-01
5: a 2019-07-01
6: b 2019-10-01
Is this what you are looking for?
I'm trying to summarize values for overlapping time periods.
I can use only tidyr, ggplot2 and dplyr libraries. Base R is preferred though.
My data looks like this, but usually it has around 100 records:
df <- structure(list(Start = structure(c(1546531200, 1546531200, 546531200, 1546638252.6316, 1546549800, 1546534800, 1546545600, 1546531200, 1546633120, 1547065942.1053), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Stop = structure(c(1546770243.1579, 1546607400, 1547110800, 1546670652.6316, 1547122863.1579, 1546638252.6316, 1546878293.5579, 1546416000, 1546849694.4, 1547186400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Value = c(12610, 520, 1500, 90, 331380, 27300, 6072, 4200, 61488, 64372)), .Names = c("Start", "Stop", "Value"), row.names = c(41L, 55L, 25L, 29L, 38L, 28L, 1L, 20L, 14L, 31L), class = c("tbl_df", "tbl", "data.frame"))
head(df) and str(df) gives:
Start Stop Value
2019-01-03 16:00:00 2019-01-06 10:24:03 12610
2019-01-03 16:00:00 2019-01-04 13:10:00 520
2019-01-03 16:00:00 2019-01-10 09:00:00 1500
2019-01-04 21:44:12 2019-01-05 06:44:12 90
2019-01-03 21:10:00 2019-01-10 12:21:03 331380
2019-01-03 17:00:00 2019-01-04 21:44:12 27300
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 3 variables:
$ Start: POSIXct, format: "2019-01-03 16:00:00" "2019-01-03 16:00:00" ...
$ Stop : POSIXct, format: "2019-01-06 10:24:03" "2019-01-04 13:10:00" ...
$ Value: num 12610 520 1500 90 331380 ...
So there are overlapping time periods with "Start" and "Stop" dates and an assigned value. In any given record the value applies between df$Start and df$Stop; outside of that interval the value is 0.
I want to create another data frame from which I could show how these values sum up and change over time. The desired output would look like this (the "sum" column is made up):
> head(df2)
timestamp sum
"2019-01-02 09:00:00 CET" 14352
"2019-01-03 17:00:00 CET" 6253
"2019-01-03 18:00:00 CET" 23465
"2019-01-03 21:00:00 CET" 3241
"2019-01-03 22:10:00 CET" 23235
"2019-01-04 14:10:00 CET" 123321
To get unique timestamps:
timestamps <- sort(unique(c(df$`Start`, df$`Stop`)))
With the df2 data frame I could easily draw a graph with ggplot, but how do I get these sums?
I think I should iterate over the df data frame with either some custom function or a built-in summarize function that would work like this:
fnct <- function(date, min, max, value) {
if (date >= min && date <=max) {
a <- value
}
else {
a <- 0
}
return(a)
}
...for every given date from timestamps iterate through df and give me a sum of values for the timestamp.
It looks really simple and I'm missing something very basic.
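For reference, a brute-force version of that idea in base R (just a sketch, reusing the df and timestamps objects defined above) would be to loop over the timestamps and sum every Value whose interval contains them:
df2 <- data.frame(
  timestamp = timestamps,
  sum = vapply(seq_along(timestamps),
               function(i) sum(df$Value[df$Start <= timestamps[i] & timestamps[i] <= df$Stop]),
               numeric(1))
)
head(df2)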
Here's a tidyverse solution similar to my response to this recent question. I gather to bring the timestamps (Starts and Stops) into one column, with another column specifying which. The Starts add the value and the Stops subtract it, and then we just take the cumulative sum to get values at all the instants when the sum changes.
For 100 records, there won't be any perceivable speed improvement from using data.table; in my experience it starts to make more of a difference around 1M records, especially when grouping is involved.
library(dplyr); library(tidyr)
df2 <- df %>%
gather(type, time, Start:Stop) %>%
mutate(chg = if_else(type == "Start", Value, -Value)) %>%
arrange(time) %>%
mutate(sum = cumsum(chg)) # EDIT: corrected per OP comment
> head(df2)
## A tibble: 6 x 5
# Value type time chg sum
# <dbl> <chr> <dttm> <dbl> <dbl>
#1 1500 Start 1987-04-27 14:13:20 1500 1500
#2 4200 Stop 2019-01-02 08:00:00 -4200 -2700
#3 12610 Start 2019-01-03 16:00:00 12610 9910
#4 520 Start 2019-01-03 16:00:00 520 10430
#5 4200 Start 2019-01-03 16:00:00 4200 14630
#6 27300 Start 2019-01-03 17:00:00 27300 41930
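To draw the graph mentioned in the question from this result, a step plot is a natural fit for these event-driven sums (a short usage sketch, assuming df2 from the pipeline above):
library(ggplot2)
ggplot(df2, aes(x = time, y = sum)) +
  geom_step() +
  labs(x = "Time", y = "Sum of overlapping values")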
In the past I have tried to solve similar problems using the tidyverse/base R... But nothing comes even remotely close to the speeds that data.table provides for these kinds of operations, so I encourage you to give it a try...
For questions like this, my favourite function is foverlaps() from the data.table package. With this function you can (fast!) perform an overlap join. If you want more flexibility in your joining than foverlaps() provides, a non-equi join (again using data.table) is probably the best (and fastest!) option. But foverlaps() will do here (I guess).
I used the sample data you provided, but filtered out the rows where Stop <= Start (probably a typo in your sample data). When df$Start is not before df$Stop, foverlaps gives a warning and won't execute.
library( data.table )
#create data.table with the periods you wish to summarise on
#NB: UTC is used as timezone, since this is also the case in the sample data provided!!
dt.dates <- data.table( id = paste0( "Day", 1:31 ),
Start = seq( as.POSIXct( "2019-01-01 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ),
as.POSIXct( "2019-01-31 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ),
by = "1 days"),
Stop = seq( as.POSIXct( "2019-01-02 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ) - 1,
as.POSIXct( "2019-02-01 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ) - 1,
by = "1 days") )
If you do not want to summarise on a daily basis but by hour, minute, second, or year, just change the values (and step size) in the dt.dates data.table so that they match your periods.
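For example, an hourly version of dt.dates could look like this (just a sketch along the same lines; the object name dt.hours is my own):
dt.hours <- data.table( id    = paste0( "Hour", 1:(31 * 24) ),
                        Start = seq( as.POSIXct( "2019-01-01 00:00:00", tz = "UTC" ),
                                     as.POSIXct( "2019-01-31 23:00:00", tz = "UTC" ),
                                     by = "1 hours" ),
                        Stop  = seq( as.POSIXct( "2019-01-01 01:00:00", tz = "UTC" ),
                                     as.POSIXct( "2019-02-01 00:00:00", tz = "UTC" ),
                                     by = "1 hours" ) - 1 )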
#set df as data.table
dt <- as.data.table( df )
#filter out any row where Stop is smaller than Start
dt <- dt[ Start < Stop, ]
#perform overlap join
#first set keys
setkey(dt, Start, Stop)
#then perform join
result <- foverlaps( dt.dates, dt, type = "within" )
#summarise
result[, .( Value = sum( Value , na.rm = TRUE ) ), by = .(Day = i.Start) ]
output
# Day Value
# 1: 2019-01-01 1500
# 2: 2019-01-02 1500
# 3: 2019-01-03 1500
# 4: 2019-01-04 351562
# 5: 2019-01-05 413050
# 6: 2019-01-06 400440
# 7: 2019-01-07 332880
# 8: 2019-01-08 332880
# 9: 2019-01-09 332880
# 10: 2019-01-10 64372
# 11: 2019-01-11 0
# 12: 2019-01-12 0
# 13: 2019-01-13 0
# 14: 2019-01-14 0
# 15: 2019-01-15 0
# 16: 2019-01-16 0
# 17: 2019-01-17 0
# 18: 2019-01-18 0
# 19: 2019-01-19 0
# 20: 2019-01-20 0
# 21: 2019-01-21 0
# 22: 2019-01-22 0
# 23: 2019-01-23 0
# 24: 2019-01-24 0
# 25: 2019-01-25 0
# 26: 2019-01-26 0
# 27: 2019-01-27 0
# 28: 2019-01-28 0
# 29: 2019-01-29 0
# 30: 2019-01-30 0
# 31: 2019-01-31 0
# Day Value
plot
#summarise for plot
result.plot <- result[, .( Value = sum( Value , na.rm = TRUE ) ), by = .(Day = i.Start) ]
library( ggplot2 )
ggplot( data = result.plot, aes( x = Day, y = Value ) ) + geom_col()
I have one data table which contains just a sequence of times. I have another data table containing two columns: start_time and end_time. I want to take the first data table and add a column where the value is the count of all of the rows in the second data table where the time from the first data table fits within the start and end time. Here is my code
start_date <- as.POSIXct(x = "2017-01-31 17:00:00", format = "%Y-%m-%d %H:%M:%S")
end_date <- as.POSIXct(x = "2017-02-01 09:00:00", format = "%Y-%m-%d %H:%M:%S")
all_dates <- as.data.table(seq(start_date, end_date, "min"))
colnames(all_dates) <- c("Bin")
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- data.table(start_times, end_times)
Here is what I want to do, but this is wrong and gives an error. What's the right way to write this?
all_dates[, BinCount := input_data[start_times < Bin & end_times > Bin, .N] ]
In the end I should get something like
Bin BinCount
2017-01-31 17:00:00 1
2017-01-31 17:01:00 5
...
The problem can be solved very easily using sqldf, as it provides an easy way to join tables with range checking. Hence one solution could be:
The data from OP:
library(data.table)
start_date <- as.POSIXct(x = "2017-01-31 17:00:00", format = "%Y-%m-%d %H:%M:%S")
end_date <- as.POSIXct(x = "2017-02-01 09:00:00", format = "%Y-%m-%d %H:%M:%S")
all_dates <- as.data.table(seq(start_date, end_date, "min"))
colnames(all_dates) <- c("Bin")
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- data.table(start_times, end_times)
library(sqldf)
result <- sqldf("SELECT all_dates.bin, count(*) as BinCount
FROM all_dates, input_data
WHERE all_dates.bin > input_data.start_times AND
all_dates.bin < input_data.end_times
GROUP BY bin" )
result
Bin BinCount
1 2017-01-31 17:01:00 1
2 2017-01-31 17:02:00 1
3 2017-01-31 17:03:00 1
4 2017-01-31 17:04:00 1
5 2017-01-31 17:05:00 1
6 2017-01-31 17:06:00 1
...........
...........
497 2017-02-01 01:17:00 6
498 2017-02-01 01:18:00 5
499 2017-02-01 01:19:00 5
500 2017-02-01 01:20:00 4
[ reached getOption("max.print") -- omitted 460 rows ]
In data.table you're after a range join.
library(data.table)
start_date <- as.POSIXct(x = "2017-01-31 17:00:00", format = "%Y-%m-%d %H:%M:%S")
end_date <- as.POSIXct(x = "2017-02-01 09:00:00", format = "%Y-%m-%d %H:%M:%S")
all_dates <- as.data.table(seq(start_date, end_date, "min"))
colnames(all_dates) <- c("Bin")
set.seed(123)
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- data.table(start_times, end_times)
## doing the range-join and calculating the number of items per bin in one chained step
input_data[
all_dates
, on = .(start_times < Bin, end_times > Bin)
, nomatch = 0
, allow.cartesian = T
][, .N, by = start_times]
# start_times N
# 1: 2017-01-31 17:01:00 1
# 2: 2017-01-31 17:02:00 1
# 3: 2017-01-31 17:03:00 1
# 4: 2017-01-31 17:04:00 1
# 5: 2017-01-31 17:05:00 1
# ---
# 956: 2017-02-01 08:56:00 6
# 957: 2017-02-01 08:57:00 4
# 958: 2017-02-01 08:58:00 4
# 959: 2017-02-01 08:59:00 5
# 960: 2017-02-01 09:00:00 5
Note:
I've put the all_dates object on the right-hand-side of the join, so the result contains the names of the input_data columns, even though they are your Bins (see this issue for the discussion on this topic)
I've used set.seed(), as you're taking samples
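If you would rather see the original column name Bin on the result, you can store the chained result and rename the join column (a small hedged follow-up; the object name bin_counts is my own):
bin_counts <- input_data[
  all_dates
  , on = .(start_times < Bin, end_times > Bin)
  , nomatch = 0
  , allow.cartesian = TRUE
][, .N, by = start_times]
setnames(bin_counts, c("start_times", "N"), c("Bin", "BinCount"))
head(bin_counts)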
Wasn't requested, but here is a compact alternative solution using the tidyverse. Uses lubridate parsers, interval, and %within%, as well as purrr::map_int to generate the desired bin counts.
library(tidyverse)
library(lubridate)
start_date <- ymd_hms(x = "2017-01-31 17:00:00") # lubridate parsers
end_date <- ymd_hms(x = "2017-02-01 09:00:00")
all_dates <- tibble(seq(start_date, end_date, "min")) # tibble swap for data.table
colnames(all_dates) <- c("Bin")
start_times <- sample(seq(start_date,end_date,"min"), 100)
offsets <- sample(seq(60,7200,60), 100)
end_times <- start_times + offsets
input_data <- tibble(
start_times,
end_times,
intvl = interval(start_times, end_times) # Add interval column
)
all_dates %>% # Checks date in Bin and counts intervals it lies within
mutate(BinCount = map_int(.$Bin, ~ sum(. %within% input_data$intvl)))
# A tibble: 961 x 2
Bin BinCount
<dttm> <int>
1 2017-01-31 17:00:00 0
2 2017-01-31 17:01:00 0
3 2017-01-31 17:02:00 0
4 2017-01-31 17:03:00 0
5 2017-01-31 17:04:00 0
6 2017-01-31 17:05:00 0
7 2017-01-31 17:06:00 0
8 2017-01-31 17:07:00 1
9 2017-01-31 17:08:00 1
10 2017-01-31 17:09:00 1
# ... with 951 more rows
I have a big data.table that looks like:
dt<-data.table(start=c("2012-07-13 23:45:00", "2012-07-14 15:30:00",
"2012-07-14 23:57:00"),
end=c("2012-07-14 00:02:00", "2012-07-14 15:35:00",
"2012-07-15 00:05:00"), id=c(1,2,1),cat=c("a","b","a"))
dt
start end id cat
1: 2012-07-13 23:45:00 2012-07-14 00:02:00 1 a
2: 2012-07-14 15:30:00 2012-07-14 15:35:00 2 b
3: 2012-07-14 23:57:00 2012-07-15 00:05:00 1 a
I need to get an output that shows total minutes of event on each calendar day by id and category. Using the example above the output should be:
day id cat V1
1: 13.07.2012 1 a 15
2: 14.07.2012 1 a 5
3: 14.07.2012 2 b 5
4: 15.07.2012 1 a 5
I used the adply function from the plyr package to split the duration into one-minute intervals:
fn<-function(x){
s<-seq(from = as.POSIXct(x$start),
to = as.POSIXct(x$end)-1,by = "mins")
# here s is a sequence of all minutes in the given interval
df<-data.table(x$id,x$cat,s)
# return new data.table that contains each calendar minute for each id
# and category of the original data
df
}
# run the function above for each row in the data.table
dd<-adply(dt,1,fn)
# extract the date from calendar minutes
dd[, day := format(s, "%d.%m.%Y")]
#calculate sum of all minutes of event for each day, id and category
dd[,.N,by=c("day","id","cat")][order(day,id,cat)]
The solution above perfectly suits my needs except for the time the calculation takes. When adply is run on very big data with several categories defined in the fn function, it feels like the CPU runs forever.
I will highly appreciate any hint on how to use pure data.table functionality in this problem.
I would suggest a few things:
Convert with as.POSIXct only once instead of once per row.
Instead of adply, which creates a whole data.table in each iteration, just use by within the data.table scope.
In order to do so, simply create a row index using .I.
Here's a quick attempt (I've used substr because it will probably be faster than as.Date or as.POSIXct. If you want it to be of class Date again, use res[, Date := as.IDate(Date)] on the result instead of doing it by group).
dt[, `:=`(start = as.POSIXct(start), end = as.POSIXct(end), indx = .I)]
dt[, seq(start, end - 1L, by = "mins"), by = .(indx, id, cat)
][, .N, by = .(Date = substr(V1, 1L, 10L), id, cat)]
# Date id cat N
# 1: 2012-07-13 1 a 15
# 2: 2012-07-14 1 a 5
# 3: 2012-07-14 2 b 5
# 4: 2012-07-15 1 a 5
Try to see if this is faster.
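A rough way to compare the two approaches is to wrap each in system.time() (a sketch only, assuming the := conversion step above has already been run; for a meaningful comparison you would repeat it on data of a realistic size):
# the data.table approach from this answer
system.time(
  res_dt <- dt[, seq(start, end - 1L, by = "mins"), by = .(indx, id, cat)
               ][, .N, by = .(Date = substr(V1, 1L, 10L), id, cat)]
)
# and, for comparison, the original plyr approach from the question
library(plyr)
system.time(dd <- adply(dt, 1, fn))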
It's still data.table in the background, but I'm using a dplyr syntax for the process.
library(data.table)
dt<-data.table(start=c("2012-07-13 23:45:00", "2012-07-14 15:30:00",
"2012-07-14 23:57:00"),
end=c("2012-07-14 00:02:00", "2012-07-14 15:35:00",
"2012-07-15 00:05:00"), id=c(1,2,1),cat=c("a","b","a"))
fn<-function(x){
s<-seq(from = as.POSIXct(x$start),
to = as.POSIXct(x$end)-1,by = "mins")
# here s is a sequence of all minutes in the given interval
df<-data.table(x$id,x$cat,s)
# return new data.table that contains each calendar minute for each id
# and category of the original data
df
}
library(dplyr)
dt %>%
rowwise() %>% # for each row
do(fn(.)) %>% # apply your function
select(day=s, id=V1, cat=V2) %>% # rename columns
mutate(day = substr(day,1,10)) %>% # keep only the day
ungroup %>%
group_by(day,id,cat) %>%
summarise(N=n()) %>%
ungroup
# Source: local data frame [4 x 4]
#
# day id cat N
# (chr) (dbl) (chr) (int)
# 1 2012-07-13 1 a 15
# 2 2012-07-14 1 a 5
# 3 2012-07-14 2 b 5
# 4 2012-07-15 1 a 5
I am working with a large dataset; an example is shown below. For the majority of the individual files I will have to process, there should be more than one day's worth of data.
Date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
"05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
"07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
"07/12/2012 08:00:00")
Date <- strptime(Date, "%d/%m/%Y %H:%M")
c <- c("0","1","5","4","6","8","0","3","10","6")
c <- as.numeric(c)
df1 <- data.frame(Date,c,stringsAsFactors = FALSE)
I wish to be left with data from only a single day. This day will be chosen as the one with the most data points. If for any reason two days are tied (with the maximum number of data points), I wish to select the day with the highest individual value recorded.
In the example data frame given above, I would be left with 7th Dec. It has 4 data points (as does 5th Dec), but it has the highest value recorded out of these two days (i.e. 10).
Here's a solution with tapply.
# count rows per day and find maximum c value
res <- with(df1, tapply(c, as.Date(Date), function(x) c(length(x), max(x))))
# order these two values in decreasing order and find the associated day
# (at top position):
maxDate <- names(res)[order(sapply(res, "[", 1),
sapply(res, "[", 2), decreasing = TRUE)[1]]
# subset data frame:
subset(df1, as.character(as.Date(Date)) %in% maxDate)
Date c
7 2012-12-07 05:00:00 0
8 2012-12-07 06:00:00 3
9 2012-12-07 07:00:00 10
10 2012-12-07 08:00:00 6
A data.table solution:
dt <- data.table(df1)
# get just the date
dt[, day := as.Date(Date)]
setkey(dt, "day")
# get total entries (N) and max(c) for each day-group
dt <- dt[, `:=`(N = .N, mc = max(c)), by=day]
setkey(dt, "N")
# filter by maximum of N
dt <- dt[J(max(N))]
setkey(dt, "mc")
# settle ties with maximum of c
dt <- dt[J(max(mc))]
dt[, c("N", "mc", "day") := NULL]
print(dt)
# Date c
# 1: 2012-12-07 05:00:00 0
# 2: 2012-12-07 06:00:00 3
# 3: 2012-12-07 07:00:00 10
# 4: 2012-12-07 08:00:00 6
And to be complete, here's one with plyr:
library(plyr)
df1$day <- strftime(df1$Date, "%d/%m/%Y")
tmp <- ddply(df1[,c("day","c")], .(day), summarize, nb=length(c), max=max(c))
tmp <- tmp[order(tmp$nb, tmp$max, decreasing=TRUE),]
df1[df1$day==tmp$day[1],]
Which gives :
Date c day
7 2012-12-07 05:00:00 0 07/12/2012
8 2012-12-07 06:00:00 3 07/12/2012
9 2012-12-07 07:00:00 10 07/12/2012
10 2012-12-07 08:00:00 6 07/12/2012
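For reference, the same "most rows, ties broken by the highest value" rule can also be expressed with dplyr (a sketch, not one of the answers above; the helper column names n_points and max_c are my own):
library(dplyr)
df1 %>%
  mutate(day = as.Date(Date)) %>%
  group_by(day) %>%
  mutate(n_points = n(), max_c = max(c)) %>%
  ungroup() %>%
  filter(n_points == max(n_points)) %>%   # keep the day(s) with the most data points
  filter(max_c == max(max_c)) %>%         # break ties by the highest recorded value
  select(Date, c)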