I have two datasets: one with values at specific time points for different IDs, and another with several time frames per ID. I want to check whether each time point in the first data frame falls within any of the time frames in the second data frame for the matching ID.
For example:
df1:
ID date time
1 2020-04-14 11:00:00
1 2020-04-14 18:00:00
1 2020-04-15 10:00:00
1 2020-04-15 20:00:00
1 2020-04-16 11:00:00
1 ...
2 ...
df2:
ID start end
1 2020-04-14 16:00:00 2020-04-14 20:00:00
1 2020-04-15 18:00:00 2020-04-16 13:00:00
2 ...
2
What I want:
df1_new:
ID date time mark
1 2020-04-14 11:00:00 0
1 2020-04-14 18:00:00 1
1 2020-04-15 10:00:00 0
1 2020-04-15 20:00:00 1
1 2020-04-16 11:00:00 1
1 ...
2 ...
Any help would be appreciated!
An option could be:
library(tidyverse)
library(lubridate)
df_1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L), date = c("14.04.2020",
"14.04.2020", "15.04.2020", "15.04.2020", "16.04.2020"), time = c("11:00:00",
"18:00:00", "10:00:00", "20:00:00", "11:00:00"), date_time = structure(c(1586862000,
1586887200, 1586944800, 1586980800, 1587034800), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-5L))
df_2 <- structure(list(ID = c(1L, 1L), start = c("14.04.2020 16:00",
"15.04.2020 18:00"), end = c("14.04.2020 20:00", "16.04.2020 13:00"
)), class = "data.frame", row.names = c(NA, -2L))
df_22 <- df_2 %>%
mutate(across(c("start", "end"), dmy_hm)) %>%
group_nest(ID)
left_join(x = df_1, y = df_22, by = "ID") %>%
as_tibble() %>%
mutate(mark = map2_dbl(date_time, data, ~+any(.x %within% interval(.y$start, .y$end)))) %>%
select(-data)
#> # A tibble: 5 x 5
#> ID date time date_time mark
#> <int> <chr> <chr> <dttm> <dbl>
#> 1 1 14.04.2020 11:00:00 2020-04-14 11:00:00 0
#> 2 1 14.04.2020 18:00:00 2020-04-14 18:00:00 1
#> 3 1 15.04.2020 10:00:00 2020-04-15 10:00:00 0
#> 4 1 15.04.2020 20:00:00 2020-04-15 20:00:00 1
#> 5 1 16.04.2020 11:00:00 2020-04-16 11:00:00 1
Created on 2021-05-25 by the reprex package (v2.0.0)
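For comparison, the same check can be sketched in base R without any packages. This is only a sketch under assumptions: the example data is rebuilt from the question with times read as UTC, the frame bounds are treated as inclusive, and the helper name in_any_frame is made up for illustration.

```r
# Example data rebuilt from the question (times assumed to be UTC)
df1 <- data.frame(
  ID = 1L,
  date_time = as.POSIXct(c("2020-04-14 11:00:00", "2020-04-14 18:00:00",
                           "2020-04-15 10:00:00", "2020-04-15 20:00:00",
                           "2020-04-16 11:00:00"), tz = "UTC")
)
df2 <- data.frame(
  ID = c(1L, 1L),
  start = as.POSIXct(c("2020-04-14 16:00:00", "2020-04-15 18:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2020-04-14 20:00:00", "2020-04-16 13:00:00"), tz = "UTC")
)

# Hypothetical helper: 1 if row i of df1 lies inside any frame with the same ID
in_any_frame <- function(i) {
  frames <- df2[df2$ID == df1$ID[i], ]
  as.integer(any(df1$date_time[i] >= frames$start &
                 df1$date_time[i] <= frames$end))
}

df1$mark <- vapply(seq_len(nrow(df1)), in_any_frame, integer(1))
df1$mark
```

An ID with no frames in df2 gets mark 0, because any() over an empty comparison is FALSE.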
I've seen various solutions for this question based on date only, but the time component is tripping me up. I have two data frames with POSIXct columns called 'datetime'. For DF1 that column has data rounded to the nearest hour. For DF2, the time component is not rounded to the nearest hour and can occur anytime. The dataframes look like this:
DF1
datetime             X   Y   Z
2020-09-01 03:00:00  1   3   4
2020-09-02 12:00:00  12  3   5
2020-09-02 22:00:00  4   9   19
2020-09-03 01:00:00  4   10  2
2020-09-04 06:00:00  4   12  1
2020-09-04 08:00:00  11  13  10
DF2
datetime             Var
2020-09-01 02:23:14  A
2020-09-01 03:12:09  B
2020-09-02 11:52:15  A
2020-09-02 12:15:44  B
2020-09-02 22:31:56  A
2020-09-02 21:38:05  B
2020-09-03 01:11:39  A
2020-09-03 00:59:33  B
2020-09-04 05:12:19  A
2020-09-04 06:07:09  B
2020-09-04 08:22:28  A
2020-09-04 07:50:17  B
What I want is to merge these two dataframes based on this column using the date and time that are closest in time to 'datetime' in DF1, so that it looks like this:
datetime             X   Y   Z   Var
2020-09-01 03:00:00  1   3   4   B
2020-09-02 12:00:00  12  3   5   A
2020-09-02 22:00:00  4   9   19  B
2020-09-03 01:00:00  4   10  2   B
2020-09-04 06:00:00  4   12  1   B
2020-09-04 08:00:00  11  13  10  B
Thank you!
Add helper columns for merge and group_by, merge the two data frames on the calendar date, then use dplyr to keep the closest match within each group:
library(dplyr)
df1$tmp <- as.Date(df1$datetime)
df2$tmp <- as.Date(df2$datetime)
df1$grp <- 1:(nrow(df1))
merge(df1, df2, "tmp") %>%
group_by(grp) %>%
slice(which.min(abs(difftime(datetime.x, datetime.y)))) %>%
ungroup() %>%
select(-c(tmp,grp,datetime.y))
# A tibble: 6 × 5
datetime.x X Y Z Var
<chr> <int> <int> <int> <chr>
1 2020-09-01 03:00:00 1 3 4 B
2 2020-09-02 12:00:00 12 3 5 A
3 2020-09-02 22:00:00 4 9 19 B
4 2020-09-03 01:00:00 4 10 2 B
5 2020-09-04 06:00:00 4 12 1 B
6 2020-09-04 08:00:00 11 13 10 B
Data
df1 <- structure(list(datetime = c("2020-09-01 03:00:00", "2020-09-02 12:00:00",
"2020-09-02 22:00:00", "2020-09-03 01:00:00", "2020-09-04 06:00:00",
"2020-09-04 08:00:00"), X = c(1L, 12L, 4L, 4L, 4L, 11L), Y = c(3L,
3L, 9L, 10L, 12L, 13L), Z = c(4L, 5L, 19L, 2L, 1L, 10L)), class = "data.frame", row.names = c(NA,
-6L))
df2 <- structure(list(datetime = c("2020-09-01 02:23:14", "2020-09-01 03:12:09",
"2020-09-02 11:52:15", "2020-09-02 12:15:44", "2020-09-02 22:31:56",
"2020-09-02 21:38:05", "2020-09-03 01:11:39", "2020-09-03 00:59:33",
"2020-09-04 05:12:19", "2020-09-04 06:07:09", "2020-09-04 08:22:28",
"2020-09-04 07:50:17"), Var = c("A", "B", "A", "B", "A", "B",
"A", "B", "A", "B", "A", "B")), class = "data.frame", row.names = c(NA,
-12L))
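Note that merging on the calendar date only considers candidates from the same day, so a closer reading just across midnight would be missed. A minimal base R sketch that searches all of df2 instead, assuming the timestamps can be read as UTC and that df2 is small enough to scan once per row of df1 (only the X column of df1 is kept here for brevity):

```r
df1 <- data.frame(
  datetime = c("2020-09-01 03:00:00", "2020-09-02 12:00:00", "2020-09-02 22:00:00",
               "2020-09-03 01:00:00", "2020-09-04 06:00:00", "2020-09-04 08:00:00"),
  X = c(1L, 12L, 4L, 4L, 4L, 11L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  datetime = c("2020-09-01 02:23:14", "2020-09-01 03:12:09", "2020-09-02 11:52:15",
               "2020-09-02 12:15:44", "2020-09-02 22:31:56", "2020-09-02 21:38:05",
               "2020-09-03 01:11:39", "2020-09-03 00:59:33", "2020-09-04 05:12:19",
               "2020-09-04 06:07:09", "2020-09-04 08:22:28", "2020-09-04 07:50:17"),
  Var = rep(c("A", "B"), 6),
  stringsAsFactors = FALSE
)

# Work in seconds since the epoch so the distance is a plain number
t1 <- as.numeric(as.POSIXct(df1$datetime, tz = "UTC"))
t2 <- as.numeric(as.POSIXct(df2$datetime, tz = "UTC"))

# For each df1 time, the row of df2 with the smallest absolute time difference
nearest <- vapply(t1, function(t) which.min(abs(t2 - t)), integer(1))
df1$Var <- df2$Var[nearest]
df1
```

On ties, which.min picks the first candidate in df2, so the row order of df2 decides which one wins.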
I have a table with multiple datetime columns. For each of those columns, I want to extract the weekday and add it as a new column.
Sample dataset:
structure(list(mealTime = structure(c(1542492000, 1578852000,
1604253600, 1545901200, 1549821600, 1544306400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), weight_measure_time = structure(c(1542226000, 1578812400,
1594710000, 1545896762, 1546416823, 1544227245), tzone = "UTC", class = c("POSIXct",
"POSIXt")), height_measure_time = structure(c(1542106434, 1543337043,
1543337043, 1542387988, 1542366547, 1542802228), tzone = "UTC", class = c("POSIXct",
"POSIXt")), hba1c_measure_time = structure(c(1542106860, 1573455600,
1594625400, 1544781600, 1545920520, 1544096580), tzone = "UTC", class = c("POSIXct",
"POSIXt")), bpMeasureTime = structure(c(1542380623, 1578812400,
1583218800, 1545896774, 1546416837, 1544266110), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
which looks something like this:
> smple
# A tibble: 6 x 5
mealTime weight_measure_time height_measure_time
<dttm> <dttm> <dttm>
1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54
2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03
3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03
4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28
5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07
6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28
# ... with 2 more variables: hba1c_measure_time <dttm>, bpMeasureTime <dttm>
For the above dataset, the expected result is that the weekday is extracted from each datetime column and added as a corresponding new column:
glimpse(smple)
Rows: 6
Columns: 10
$ mealTime <dttm> 2018-11-17 22:00:00, 2020-01-12 18:00:00, 20~
$ weight_measure_time <dttm> 2018-11-14 20:06:40, 2020-01-12 07:00:00, 20~
$ height_measure_time <dttm> 2018-11-13 10:53:54, 2018-11-27 16:44:03, 20~
$ hba1c_measure_time <dttm> 2018-11-13 11:01:00, 2019-11-11 07:00:00, 20~
$ bpMeasureTime <dttm> 2018-11-16 15:03:43, 2020-01-12 07:00:00, 20~
$ mealTime_day <chr> "Saturday", "Sunday", "Sunday", "Thursday", "~
$ weight_measure_time_day <chr> "Wednesday", "Sunday", "Tuesday", "Thursday",~
$ height_measure_time_day <chr> "Tuesday", "Tuesday", "Tuesday", "Friday", "F~
$ hba1c_measure_time_day <chr> "Tuesday", "Monday", "Monday", "Friday", "Thu~
$ bpMeasureTime_day <chr> "Friday", "Sunday", "Tuesday", "Thursday", "W~
In base R, I can achieve the above as follows:
smple[paste(colnames(smple), "day", sep="_")] = apply(smple, 2, lubridate::wday, label=TRUE, abbr=FALSE)
I would like to know whether there is a similar way in the tidyverse that adds the columns dynamically, evaluating both the LHS and the RHS.
Making use of across and where you could do:
library(dplyr)
library(lubridate)
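# A lambda is used because passing extra arguments through across() is deprecated in dplyr >= 1.1.0
mutate(smple, across(where(is.POSIXct),
~ lubridate::wday(.x, label = TRUE, abbr = FALSE), .names = "{.col}_day"))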
#> # A tibble: 6 x 10
#> mealTime weight_measure_time height_measure_time
#> <dttm> <dttm> <dttm>
#> 1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54
#> 2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03
#> 3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03
#> 4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28
#> 5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07
#> 6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28
#> # … with 7 more variables: hba1c_measure_time <dttm>, bpMeasureTime <dttm>,
#> # mealTime_day <dbl>, weight_measure_time_day <dbl>,
#> # height_measure_time_day <dbl>, hba1c_measure_time_day <dbl>,
#> # bpMeasureTime_day <dbl>
Here is one way to solve your problem:
df[paste0(names(df), "_day")] <- lapply(df, weekdays)
Base R solution:
cbind(
df,
setNames(
data.frame(Map(weekdays, df)),
paste0(names(df), ifelse(grepl("_", names(df)), "_day_of_week", "DayOfWeek"))
)
)
dplyr solution only using weekdays from base R
library(dplyr)
df %>%
mutate(across(everything(), weekdays, .names = "{.col}_day"))
Output:
mealTime weight_measure_time height_measure_time hba1c_measure_time bpMeasureTime mealTime_day weight_measure_time_day
<dttm> <dttm> <dttm> <dttm> <dttm> <chr> <chr>
1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54 2018-11-13 11:01:00 2018-11-16 15:03:43 Samstag Mittwoch
2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03 2019-11-11 07:00:00 2020-01-12 07:00:00 Sonntag Sonntag
3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03 2020-07-13 07:30:00 2020-03-03 07:00:00 Sonntag Dienstag
4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28 2018-12-14 10:00:00 2018-12-27 07:46:14 Donnerstag Donnerstag
5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07 2018-12-27 14:22:00 2019-01-02 08:13:57 Sonntag Mittwoch
6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28 2018-12-06 11:43:00 2018-12-08 10:48:30 Samstag Samstag
# ... with 3 more variables: height_measure_time_day <chr>, hba1c_measure_time_day <chr>, bpMeasureTime_day <chr>
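As the German day names in the output above show, weekdays() follows the session locale. If English names are needed regardless of locale, one option is to index a fixed vector with the weekday number from as.POSIXlt(). The sketch below is an illustration under assumptions: it uses only two rows and two of the datetime columns from the sample data, the extra column n and the helper english_weekday are made up, and only POSIXct columns are picked.

```r
smple <- data.frame(
  mealTime = as.POSIXct(c(1542492000, 1578852000),
                        origin = "1970-01-01", tz = "UTC"),
  weight_measure_time = as.POSIXct(c(1542226000, 1578812400),
                                   origin = "1970-01-01", tz = "UTC"),
  n = 1:2  # hypothetical non-datetime column, to show it is skipped
)

day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Hypothetical helper: as.POSIXlt()$wday is 0 for Sunday up to 6 for Saturday
english_weekday <- function(x) day_names[as.POSIXlt(x)$wday + 1L]

# Pick the datetime columns and add one "<name>_day" column for each
is_dt <- vapply(smple, inherits, logical(1), what = "POSIXct")
smple[paste0(names(smple)[is_dt], "_day")] <- lapply(smple[is_dt], english_weekday)
smple
```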
Here is my data
sampleData <- structure(list(Category = c("A", "B", "C", "D", "E", "F", "G",
"H", "I", "J", "K"), Date = structure(c(1546300800, 1547510400,
1547769600, 1548288000, 1548979200, 1549756800, 1550188800, 1551398400,
1552348800, 1552608000, 1553472000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 11 x 2
Category Date
<chr> <dttm>
1 A 2019-01-01
2 B 2019-01-15
3 C 2019-01-18
4 D 2019-01-24
5 E 2019-02-01
6 F 2019-02-10
7 G 2019-02-15
8 H 2019-03-01
9 I 2019-03-12
10 J 2019-03-15
11 K 2019-03-25
lookupData <- structure(list(`Original Date` = structure(c(1546560000, 1547769600,
1548979200, 1550188800, 1551398400, 1552608000, 1553817600, 1555027200,
1556236800, 1557446400, 1558656000, 1559865600), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 12 x 1
`Original Date`
<dttm>
1 2019-01-04
2 2019-01-18
3 2019-02-01
4 2019-02-15
5 2019-03-01
6 2019-03-15
7 2019-03-29
8 2019-04-12
9 2019-04-26
10 2019-05-10
11 2019-05-24
12 2019-06-07
Currently I have multiple ifelse() statements something like this to get this working.
sampleData$ModifiedDate <- ifelse(sampleData$Date <= "2019-01-04", "2019-01-04",
ifelse(sampleData$Date <= "2019-01-18", "2019-01-18",
ifelse(sampleData$Date <= "2019-02-01", "2019-02-01",
ifelse(sampleData$Date <= "2019-02-15", "2019-02-15",
ifelse(sampleData$Date <= "2019-03-01", "2019-03-01",
ifelse(sampleData$Date <= "2019-03-15", "2019-03-15",
ifelse(sampleData$Date <= "2019-03-29", "2019-03-29",
ifelse(sampleData$Date <= "2019-04-12", "2019-04-12",
ifelse(sampleData$Date <= "2019-04-26", "2019-04-26","")))))))))
This works, but it is not how I would like to do it. Is there a more efficient way? I tried merge() and fuzzy_left_join(), but neither gave me the desired results.
Here's an attempt with fuzzyjoin:
library(dplyr)
lookupData %>%
mutate(z = lag(`Original Date`, default = as.POSIXct("1970-01-01"))) %>%
fuzzyjoin::fuzzy_left_join(
sampleData, .,
by = c(Date = "z", Date = "Original Date"),
match_fun = list(`>`, `<=`)) %>%
select(-z)
# # A tibble: 11 x 3
# Category Date `Original Date`
# <chr> <dttm> <dttm>
# 1 A 2019-01-01 00:00:00 2019-01-04 00:00:00
# 2 B 2019-01-15 00:00:00 2019-01-18 00:00:00
# 3 C 2019-01-18 00:00:00 2019-01-18 00:00:00
# 4 D 2019-01-24 00:00:00 2019-02-01 00:00:00
# 5 E 2019-02-01 00:00:00 2019-02-01 00:00:00
# 6 F 2019-02-15 00:00:00 2019-02-15 00:00:00
# 7 G 2019-02-10 00:00:00 2019-02-15 00:00:00
# 8 H 2019-03-12 00:00:00 2019-03-15 00:00:00
# 9 I 2019-03-01 00:00:00 2019-03-01 00:00:00
# 10 J 2019-03-15 00:00:00 2019-03-15 00:00:00
# 11 K 2019-03-25 00:00:00 2019-03-29 00:00:00
This would be better served by a formula, as it appears you are advancing every date to the next Friday in a fortnightly (every second Friday) sequence. If that is correct, the following accomplishes it and works no matter how long a period the dates span.
Setting baseDate that is used to determine what is the first date for reference:
baseDate <- structure(1546560000, class = c("POSIXct", "POSIXt"), tzone = "UTC")
Using ceiling to advance the date to the following, 2nd Friday:
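# Fix the difference to days explicitly; the automatic difftime units switch to seconds if any Date equals baseDate
sampleData$NewDate <- baseDate + ceiling(as.numeric(difftime(sampleData$Date, baseDate, units = "days")) / 14) * 14 * 86400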
Category Date NewDate
1 A 2019-01-01 2019-01-04
2 B 2019-01-15 2019-01-18
3 C 2019-01-18 2019-01-18
4 D 2019-01-24 2019-02-01
5 E 2019-02-01 2019-02-01
6 F 2019-02-15 2019-02-15
7 G 2019-02-10 2019-02-15
8 H 2019-03-12 2019-03-15
9 I 2019-03-01 2019-03-01
10 J 2019-03-15 2019-03-15
11 K 2019-03-25 2019-03-29
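If the lookup dates were not evenly spaced, a fixed 14-day formula would no longer apply. A sketch of a general lookup in base R uses findInterval() to find, for each date, the first lookup date that is on or after it. Assumptions: plain Date vectors are used instead of the POSIXct columns from the question, only the first five sample dates are included, and the lookup vector must be sorted in increasing order.

```r
dates <- as.Date(c("2019-01-01", "2019-01-15", "2019-01-18",
                   "2019-01-24", "2019-02-01"))
lookup <- seq(as.Date("2019-01-04"), by = "14 days", length.out = 12)

# With left.open = TRUE, findInterval() counts the lookup dates strictly
# before each date, so adding 1 gives the first lookup date on or after it.
idx <- findInterval(as.numeric(dates), as.numeric(lookup), left.open = TRUE) + 1L

# Dates later than the last lookup date index past the end and become NA
modified <- lookup[idx]
data.frame(Date = dates, ModifiedDate = modified)
```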
In a data frame that I've loaded into R, I'm trying to replace certain dates with different ones. For example, I want 2020-06-04 to become 2020-06-03.
Below are the attempts I've written to do this, none of which has succeeded.
Beforehand, I converted the Date column like this:
AbsoluteCover$Date <- as.Date(AbsoluteCover$Date,
format = "%m/%d/%y")
1:
AC <- mutate(AbsoluteCover, NewDate = c("2020-06-04" == "2020-06-03" & "2020-06-19" == "2020-06-18" & "2020-07-12" == "2020-07-28"))
This just creates a new column called "NewDate" but with all FALSE in the cells. This outcome makes sense, but it's not what I want.
2:
AC <- AbsoluteCover %>% mutate(Date, "2020-06-04" == "2020-06-03" & "2020-06-19" == "2020-06-18" & "2020-07-12" == "2020-07-28")
This does the same thing as 1 above.
3:
AC <- replace(AbsoluteCover$Date, c("2020-06-04", "2020-06-19", "2020-07-12"), c("2020-06-03", "2020-06-18", "2020-07-28"))
This just returns a data frame with one column with dates.
Here is an example of my data frame:
dput(head(AbsoluteCover))
structure(list(Plot = c("A1", "A1", "A1", "A2", "A2", "A2"),
Date = structure(c(18417, 18432, 18455, 18417, 18432, 18455
), class = "Date"), Cover = c(12L, 34L, 17L, 2L, 50L, 3L)), row.names = c(NA,
-6L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
Plot = c("A1", "A2"), .rows = list(1:3, 4:6)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
As you haven't provided a sample dataframe, I have worked out the example using a test dataset.
You can use the which function to select the rows that meet a condition and then assign the new value to them:
dates = c(as.Date('2020-06-04'), as.Date('2020-01-03'))
df = data.frame('a' = sample(dates, 15, replace= TRUE))
df
#> a
#> 1 2020-01-03
#> 2 2020-06-04
#> 3 2020-06-04
#> 4 2020-01-03
#> 5 2020-06-04
#> 6 2020-06-04
#> 7 2020-01-03
#> 8 2020-06-04
#> 9 2020-01-03
#> 10 2020-01-03
#> 11 2020-01-03
#> 12 2020-01-03
#> 13 2020-01-03
#> 14 2020-06-04
#> 15 2020-06-04
df[which(df$a == as.Date('2020-06-04')), 'a'] = as.Date('2020-06-03')
df
#> a
#> 1 2020-01-03
#> 2 2020-06-03
#> 3 2020-06-03
#> 4 2020-01-03
#> 5 2020-06-03
#> 6 2020-06-03
#> 7 2020-01-03
#> 8 2020-06-03
#> 9 2020-01-03
#> 10 2020-01-03
#> 11 2020-01-03
#> 12 2020-01-03
#> 13 2020-01-03
#> 14 2020-06-03
#> 15 2020-06-03
Created on 2020-07-09 by the reprex package (v0.3.0)
You can use mutate and case_when:
library(dplyr)
df %>% mutate(Date = case_when(
Date == "2020-06-04" ~ "2020-06-03",
Date == "2020-06-19" ~ "2020-06-18",
Date == "2020-07-12" ~ "2020-07-28",
TRUE ~ as.character(Date))) # keep any other date instead of turning it into NA
# A tibble: 6 x 3
# Groups: Plot [2]
Plot Date Cover
<chr> <chr> <int>
1 A1 2020-06-03 12
2 A1 2020-06-18 34
3 A1 2020-07-28 17
4 A2 2020-06-03 2
5 A2 2020-06-18 50
6 A2 2020-07-28 3
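The case_when call above returns Date as character. If the column should stay a Date, one possible base R sketch uses match() against a vector of old dates and assigns the paired new dates. Assumptions: the data frame below is a cut-down version of the question's data with one extra, made-up row (2020-08-01) to show that unmatched dates are left alone, and the names old and new are illustrative.

```r
df <- data.frame(
  Plot = c("A1", "A1", "A1", "A2"),
  Date = as.Date(c("2020-06-04", "2020-06-19", "2020-07-12", "2020-08-01")),
  Cover = c(12L, 34L, 17L, 2L)
)

# Paired vectors: old[i] is replaced by new[i]
old <- as.Date(c("2020-06-04", "2020-06-19", "2020-07-12"))
new <- as.Date(c("2020-06-03", "2020-06-18", "2020-07-28"))

# Position of each date in `old`, or NA when the date should not change
idx <- match(df$Date, old)
hit <- !is.na(idx)
df$Date[hit] <- new[idx[hit]]
df
```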
I would like to compute the spatial average over a region of data that I define, by defining a longitude/latitude gridbox.
The data I have is ECMWF Sea-ice data, so it's spatio-temporal data for each .75x.75 lon/lat coordinate over the whole Northern Hemisphere. I've changed the data from NetCDF format into an R dataframe, so the head(var.df) looks like this with columns: Date, longitude, latitude, value
date_time lon lat ci
1 2016-01-01 18:00:00 0 87.75 1
2 2016-01-02 18:00:00 0 87.75 1
3 2016-01-03 18:00:00 0 87.75 1
4 2016-01-04 18:00:00 0 87.75 1
5 2016-01-05 18:00:00 0 87.75 1
6 2016-01-06 18:00:00 0 87.75 1
There is therefore a value for each lon/lat coordinate across the northern hemisphere (df is ordered by date, rather than lon for some reason).
How would I extract the spatial area that I want i.e.
BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)
and then average all the values that fall within that area, for each timestep (day)? So I'd have the mean of a gridbox that I define.
Thanks in advance, I hope this wasn't phrased terribly.
Update
Using GGamba's suggested code below, I got the following output. It has multiple values for the same day, so it had not averaged the whole region per time slice.
date_time binlat binlon mean
<dttm> <fctr> <fctr> <dbl>
1 2016-01-01 18:00:00 [80,90) [0,10) 0.4200042
2 2016-01-01 18:00:00 [80,90) [10,20) 0.4503899
3 2016-01-01 18:00:00 [80,90) [20,30) 0.5614429
4 2016-01-01 18:00:00 [80,90) [30,40) 0.6118528
5 2016-01-01 18:00:00 [80,90) [40,50) 0.5809092
6 2016-01-01 18:00:00 [80,90) [50,60) 0.5617919
7 2016-01-01 18:00:00 [80,90) [60,70) 0.6071370
8 2016-01-01 18:00:00 [80,90) [70,80) 0.6011818
9 2016-01-01 18:00:00 [80,90) [80,90] 0.5442770
10 2016-01-01 18:00:00 [80,90) NA 0.4120862
# ... with 610 more rows
I also had to add na.rm = TRUE to the mean() function at the end, as the averages were NA.
Using dplyr we can do:
library(dplyr)
df %>%
mutate(binlon = cut(lon, seq(from = min(lon), to = max(lon), by = .75), include.lowest = TRUE, right = FALSE),
binlat = cut(lat, seq(from = min(lat), to = max(lat), by = .75), include.lowest = TRUE, right = FALSE)) %>%
group_by(date_time, binlat, binlon) %>%
summarise(mean = mean(ci))
Data:
structure(list(date_time = structure(1:6, .Label = c("2016-01-01 18:00:00",
"2016-01-02 18:00:00", "2016-01-03 18:00:00", "2016-01-04 18:00:00",
"2016-01-05 18:00:00", "2016-01-06 18:00:00"), class = "factor"),
lon = c(0L, 0L, 0L, 0L, 0L, 90L), lat = c(0, 87.75, 87.75,
87.75, 87.75, 90), ci = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("date_time",
"lon", "lat", "ci"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
Results:
# date_time binlat binlon mean
# <fctr> <fctr> <fctr> <dbl>
# 1 2016-01-01 18:00:00 [0,0.75) [0,0.75) 1
# 2 2016-01-02 18:00:00 [87.8,88.5) [0,0.75) 1
# 3 2016-01-03 18:00:00 [87.8,88.5) [0,0.75) 1
# 4 2016-01-04 18:00:00 [87.8,88.5) [0,0.75) 1
# 5 2016-01-05 18:00:00 [87.8,88.5) [0,0.75) 1
# 6 2016-01-06 18:00:00 [89.2,90] [89.2,90] 1
# 6 2016-01-06 18:00:00 [80,90) [0,10) 1
This creates two new columns that bin lat and lon into the intervals defined in the cut calls.
It then groups by date_time and the new columns and calculates the mean of ci for each group.
Of course, you should adapt the cut calls to suit your needs.
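To get a single mean per time step for one user-defined box, as the question asks, the binning can be replaced by a filter on the box followed by grouping on date_time only. A minimal base R sketch, assuming the box is 30 to 105 degrees longitude and 70 to 80 degrees latitude, and using made-up ci values for illustration:

```r
# Made-up example values; the real var.df would come from the NetCDF file
var.df <- data.frame(
  date_time = rep(c("2016-01-01 18:00:00", "2016-01-02 18:00:00"), each = 3),
  lon = rep(c(30, 60, 120), times = 2),
  lat = rep(c(75, 75.75, 75), times = 2),
  ci  = c(1, 0.5, 0, 0.8, 0.4, 0.2)
)

# Keep only the grid points inside the box
BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)

# One mean per time step; the formula interface drops rows with NA in ci
BK_mean <- aggregate(ci ~ date_time, data = BK, FUN = mean)
BK_mean
```

This is an unweighted mean over grid points. On a regular lon/lat grid the cells shrink towards the pole, so an area-weighted mean (for example, weighting by the cosine of latitude) may be more appropriate, depending on the purpose.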