I am trying to join 2 data sets by the closest TimeDate (POSIXct format).
Some of the DateTime values match exactly, while others differ by 5 min.
df1 (every 5 min, at specific time points):
# A tibble: 6 × 3
TimeDate TimeDateAnimal Event
<dttm> <chr> <dbl>
1 2015-03-01 00:55:00 2015-03-01 00:55:00 G 1
2 2015-03-01 03:40:00 2015-03-01 03:40:00 G 1
3 2015-03-01 03:45:00 2015-03-01 03:45:00 G 1
4 2015-03-01 13:35:00 2015-03-01 13:35:00 G 1
5 2015-03-01 18:45:00 2015-03-01 18:45:00 G 1
6 2015-03-01 19:10:00 2015-03-01 19:10:00 G 1
df2 (every 10 min):
# A tibble: 52 × 3
TimeDate TimeDateAnimal Temperature
<dttm> <chr> <dbl>
1 2015-03-01 00:05:00 2015-03-01 00:05:00 G 38.52000
2 2015-03-01 00:15:00 2015-03-01 00:15:00 G 38.65333
3 2015-03-01 00:25:00 2015-03-01 00:25:00 G 38.78667
4 2015-03-01 00:35:00 2015-03-01 00:35:00 G 38.86000
5 2015-03-01 00:45:00 2015-03-01 00:45:00 G 38.92667
6 2015-03-01 00:55:00 2015-03-01 00:55:00 G 38.99333
..
34 2015-03-01 03:35:00 2015-03-01 03:35:00 G 38.80000
35 2015-03-01 03:45:00 2015-03-01 03:45:00 G 38.80000
I would like this output:
Merged df:
TimeDate TimeDateAnimal Temperature Event
<dttm> <chr> <dbl> <dbl>
1 2015-03-01 00:05:00 2015-03-01 00:05:00 G 38.52000 NA
2 2015-03-01 00:15:00 2015-03-01 00:15:00 G 38.65333 NA
3 2015-03-01 00:25:00 2015-03-01 00:25:00 G 38.78667 NA
4 2015-03-01 00:35:00 2015-03-01 00:35:00 G 38.86000 NA
5 2015-03-01 00:45:00 2015-03-01 00:45:00 G 38.92667 NA
6 2015-03-01 00:55:00 2015-03-01 00:55:00 G 38.99333 NA
..
34 2015-03-01 03:35:00 2015-03-01 03:35:00 G 38.80000 1
35 2015-03-01 03:45:00 2015-03-01 03:45:00 G 38.80000 1
I tried fuzzyjoin and data.table, but I always get extra rows instead of a merge by the nearest TimeDate:
Test <- merge(df1, df2, by = "TimeDate", roll = "nearest", all = TRUE)
head(Test, n = 15)
TimeDate TimeDateAnimal.x Event TimeDateAnimal.y Temperature
10 2015-03-01 00:45:00 <NA> NA 2015-03-01 00:45:00 G 38.92667
11 2015-03-01 00:55:00 2015-03-01 00:55:00 G 1 <NA> NA
12 2015-03-01 00:55:00 <NA> NA 2015-03-01 00:55:00 G 38.99333
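For reference, merge() does not perform a rolling join here (roll = "nearest" only applies inside data.table's x[i] join syntax), so the call above just does a full outer join and produces the extra rows. A minimal sketch of the kind of nearest-time match I am after, assuming df1 and df2 are converted to data.tables and TimeDate is POSIXct in both:
library(data.table)
setDT(df1); setDT(df2)
# for each event in df1, find the row index of the nearest df2 timestamp
idx <- df2[df1, on = "TimeDate", roll = "nearest", which = TRUE]
# attach the events to those rows; every other df2 row keeps Event = NA
df2[, Event := NA_real_]
df2[idx, Event := df1$Event]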
Thanks in advance.
Related
I am trying to identify clusters in a dataframe that are within 4 subsequent days of the first event. Additionally we have a grouping variable.
Here is an example:
library(data.table)
startDate <- as.POSIXct("2022-10-01")
dt1 <- data.table(
  id = 1:20,
  timestamp = startDate + lubridate::days(rep(1:10, 2)) + lubridate::hours(1:20),
  group_id = rep(c("A", "B"), each = 10)
)
id timestamp group_id t_diff
1: 1 2022-10-02 01:00:00 A 0.000000 days
2: 2 2022-10-03 02:00:00 A 1.041667 days
3: 3 2022-10-04 03:00:00 A 2.083333 days
4: 4 2022-10-05 04:00:00 A 3.125000 days
5: 5 2022-10-06 05:00:00 A 4.166667 days
6: 6 2022-10-07 06:00:00 A 5.208333 days
7: 7 2022-10-08 07:00:00 A 6.250000 days
8: 8 2022-10-09 08:00:00 A 7.291667 days
9: 9 2022-10-10 09:00:00 A 8.333333 days
10: 10 2022-10-11 10:00:00 A 9.375000 days
11: 11 2022-10-02 11:00:00 B 0.000000 days
12: 12 2022-10-03 12:00:00 B 1.041667 days
13: 13 2022-10-04 13:00:00 B 2.083333 days
14: 14 2022-10-05 14:00:00 B 3.125000 days
15: 15 2022-10-06 15:00:00 B 4.166667 days
16: 16 2022-10-07 16:00:00 B 5.208333 days
17: 17 2022-10-08 17:00:00 B 6.250000 days
18: 18 2022-10-09 18:00:00 B 7.291667 days
19: 19 2022-10-10 19:00:00 B 8.333333 days
20: 20 2022-10-11 20:00:00 B 9.375000 days
The result should look like this:
id timestamp group_id t_diff cluster_id
1: 1 2022-10-02 01:00:00 A 0.000000 days 1
2: 2 2022-10-03 02:00:00 A 1.041667 days 1
3: 3 2022-10-04 03:00:00 A 2.083333 days 1
4: 4 2022-10-05 04:00:00 A 3.125000 days 1
5: 5 2022-10-06 05:00:00 A 4.166667 days 2
6: 6 2022-10-07 06:00:00 A 5.208333 days 2
7: 7 2022-10-08 07:00:00 A 6.250000 days 2
8: 8 2022-10-09 08:00:00 A 7.291667 days 2
9: 9 2022-10-10 09:00:00 A 8.333333 days 3
10: 10 2022-10-11 10:00:00 A 9.375000 days 3
11: 11 2022-10-02 11:00:00 B 0.000000 days 4
12: 12 2022-10-03 12:00:00 B 1.041667 days 4
13: 13 2022-10-04 13:00:00 B 2.083333 days 4
14: 14 2022-10-05 14:00:00 B 3.125000 days 4
15: 15 2022-10-06 15:00:00 B 4.166667 days 5
16: 16 2022-10-07 16:00:00 B 5.208333 days 5
17: 17 2022-10-08 17:00:00 B 6.250000 days 5
18: 18 2022-10-09 18:00:00 B 7.291667 days 5
19: 19 2022-10-10 19:00:00 B 8.333333 days 6
20: 20 2022-10-11 20:00:00 B 9.375000 days 6
I have tried an approach with lapply, but the code is ugly and very slow. I am looking for a data.table approach, but I don't know how to dynamically refer to the "first" observation.
By first observation I mean the first observation of the 4-day interval.
You can use integer division.
Note that as.numeric(), when applied to a difftime object, has a units argument that converts the difference to the desired time unit.
startDate <- as.POSIXct("2022-10-01")
dt1 <- data.table::data.table(
id = 1:20,
timestamp = startDate + lubridate::days(rep(1:10,2)) + lubridate::hours(1:20),
group_id = rep(c("A","B"), each= 10)
)
#
dt1[, GRP := as.numeric(timestamp - min(timestamp),
units = "days") %/% 4,
by = group_id][]
#> id timestamp group_id GRP
#> 1: 1 2022-10-02 01:00:00 A 0
#> 2: 2 2022-10-03 02:00:00 A 0
#> 3: 3 2022-10-04 03:00:00 A 0
#> 4: 4 2022-10-05 04:00:00 A 0
#> 5: 5 2022-10-06 05:00:00 A 1
#> 6: 6 2022-10-07 06:00:00 A 1
#> 7: 7 2022-10-08 07:00:00 A 1
#> 8: 8 2022-10-09 08:00:00 A 1
#> 9: 9 2022-10-10 09:00:00 A 2
#> 10: 10 2022-10-11 10:00:00 A 2
#> 11: 11 2022-10-02 11:00:00 B 0
#> 12: 12 2022-10-03 12:00:00 B 0
#> 13: 13 2022-10-04 13:00:00 B 0
#> 14: 14 2022-10-05 14:00:00 B 0
#> 15: 15 2022-10-06 15:00:00 B 1
#> 16: 16 2022-10-07 16:00:00 B 1
#> 17: 17 2022-10-08 17:00:00 B 1
#> 18: 18 2022-10-09 18:00:00 B 1
#> 19: 19 2022-10-10 19:00:00 B 2
#> 20: 20 2022-10-11 20:00:00 B 2
# When you want a single cluster ID
# alternatively, just use the combination of group_id and GRP in subsequent `by`s
dt1[, cluster_id := .GRP, by = .(group_id, GRP)][]
#> id timestamp group_id GRP cluster_id
#> 1: 1 2022-10-02 01:00:00 A 0 1
#> 2: 2 2022-10-03 02:00:00 A 0 1
#> 3: 3 2022-10-04 03:00:00 A 0 1
#> 4: 4 2022-10-05 04:00:00 A 0 1
#> 5: 5 2022-10-06 05:00:00 A 1 2
#> 6: 6 2022-10-07 06:00:00 A 1 2
#> 7: 7 2022-10-08 07:00:00 A 1 2
#> 8: 8 2022-10-09 08:00:00 A 1 2
#> 9: 9 2022-10-10 09:00:00 A 2 3
#> 10: 10 2022-10-11 10:00:00 A 2 3
#> 11: 11 2022-10-02 11:00:00 B 0 4
#> 12: 12 2022-10-03 12:00:00 B 0 4
#> 13: 13 2022-10-04 13:00:00 B 0 4
#> 14: 14 2022-10-05 14:00:00 B 0 4
#> 15: 15 2022-10-06 15:00:00 B 1 5
#> 16: 16 2022-10-07 16:00:00 B 1 5
#> 17: 17 2022-10-08 17:00:00 B 1 5
#> 18: 18 2022-10-09 18:00:00 B 1 5
#> 19: 19 2022-10-10 19:00:00 B 2 6
#> 20: 20 2022-10-11 20:00:00 B 2 6
I have a data frame with two date/time columns:
BeginTime EndTime Value
-----------------------------------------------------
1 2019-01-03 13:45:00 2019-01-03 17:30:00 41
2 2019-01-03 13:30:00 2019-01-03 14:30:00 20
3 2019-01-03 16:45:00 2019-01-03 19:00:00 23
That I need to transform into this:
Time Value
--------------------------------
1 2019-01-03 13:45:00 41
2 2019-01-03 14:00:00 41
3 2019-01-03 14:15:00 41
4 2019-01-03 14:30:00 41
5 2019-01-03 14:45:00 41
6 2019-01-03 15:00:00 41
7 2019-01-03 15:15:00 41
8 2019-01-03 13:30:00 20
9 2019-01-03 13:45:00 20
10 2019-01-03 14:00:00 20
11 2019-01-03 14:15:00 20
12 2019-01-03 16:45:00 23
But not sure how to do that. Any suggestions?
Code to create testdf
testdf <- data.frame(c("2019-01-03 13:45:00", "2019-01-03 13:30:00", "2019-01-03 16:45:00"),
c("2019-01-03 15:30:00", "2019-01-03 14:30:00", "2019-01-03 17:00:00"),
c(41,20,23))
colnames(testdf) <-c("BeginTime", "EndTime", "Value")
testdf$BeginTime <- as.POSIXct(testdf$BeginTime)
testdf$EndTime <- as.POSIXct(testdf$EndTime)
(I know there's probably a way to create the columns as POSIXct initially but this works)
We can use seq between the two columns to create a list of complete 15-minute intervals for each Begin/End pair, and then use rep based on their lengths to get the value, i.e.
l1 <- Map(function(x, y)seq(x, y, by = '15 mins'), testdf$BeginTime, testdf$EndTime)
data.frame(Time = do.call(c, l1), value = rep(testdf$Value, lengths(l1)))
which gives,
Time value
1 2019-01-03 13:45:00 41
2 2019-01-03 14:00:00 41
3 2019-01-03 14:15:00 41
4 2019-01-03 14:30:00 41
5 2019-01-03 14:45:00 41
6 2019-01-03 15:00:00 41
7 2019-01-03 15:15:00 41
8 2019-01-03 15:30:00 41
9 2019-01-03 13:30:00 20
10 2019-01-03 13:45:00 20
11 2019-01-03 14:00:00 20
12 2019-01-03 14:15:00 20
13 2019-01-03 14:30:00 20
14 2019-01-03 16:45:00 23
15 2019-01-03 17:00:00 23
Using the tidyverse
testdf <- data.frame(c("2019-01-03 13:45:00", "2019-01-03 13:30:00", "2019-01-03 16:45:00"),
c("2019-01-03 15:30:00", "2019-01-03 14:30:00", "2019-01-03 17:00:00"),
c(41,20,23))
colnames(testdf) <-c("BeginTime", "EndTime", "Value")
testdf$BeginTime <- as.POSIXct(testdf$BeginTime)
testdf$EndTime <- as.POSIXct(testdf$EndTime)
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
testdf %>%
mutate(periods = map2(BeginTime,EndTime,seq,by = '15 mins')) %>%
unnest(periods)
#> # A tibble: 15 x 4
#> BeginTime EndTime Value periods
#> <dttm> <dttm> <dbl> <dttm>
#> 1 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 13:45:00
#> 2 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:00:00
#> 3 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:15:00
#> 4 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:30:00
#> 5 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:45:00
#> 6 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 15:00:00
#> 7 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 15:15:00
#> 8 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 15:30:00
#> 9 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 13:30:00
#> 10 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 13:45:00
#> 11 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 14:00:00
#> 12 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 14:15:00
#> 13 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 14:30:00
#> 14 2019-01-03 16:45:00 2019-01-03 17:00:00 23 2019-01-03 16:45:00
#> 15 2019-01-03 16:45:00 2019-01-03 17:00:00 23 2019-01-03 17:00:00
Created on 2020-01-07 by the reprex package (v0.3.0)
I am working on a problem where I need to calculate the time difference in minutes. I have the time values in hh:mm:ss format in a column (more than 28,000 values).
I have been using the following function to calculate the time difference.
tdiff <- dt[dt, Time_Diff := c(abs(diff(as.numeric(Time))), 0.30), Student_ID]
where dt is the ordered data table, and 0.30 is the 30 minutes assigned to the last activity of the student in a course.
This works, but it does not handle times that cross midnight.
Thanks to @niko for his help, that part is now solved; however, the 30 minutes that should be assigned to each student's last activity is still not handled. Any help in this direction will be greatly appreciated. Thank you.
The expected output is described below
S_Id Date Time Time_Diff Time_Diff(minutes)
A 10/08/2018 23:49:00 00:01:00 1 minutes
A 10/08/2018 23:50:00 00:09:00 9
A 10/08/2018 23:59:00 00:02:00 2
A 10/09/2018 00:01:00 00:09:00 9
A 10/09/2018 00:10:00 08:02:00 482
A 10/09/2018 08:12:00 04:08:00 248
A 10/09/2018 12:20:00 10:01:00 601
A 10/09/2018 22:21:00 01:35:00 95
A 10/09/2018 23:56:00 00:09:00 9
A 10/10/2018 00:05:00 00:05:00 5
A 10/10/2018 00:10:00 00:02:00 2
A 10/10/2018 00:12:00 00:30:00 30
B 10/08/2018 23:49:00 00:01:00 1
B 10/08/2018 23:50:00 00:09:00 9
B 10/08/2018 23:59:00 00:02:00 2
B 10/09/2018 00:01:00 00:09:00 9
B 10/09/2018 00:10:00 08:02:00 482
B 10/09/2018 08:12:00 04:08:00 248
B 10/09/2018 12:20:00 10:01:00 601
B 10/09/2018 22:21:00 01:35:00 95
B 10/09/2018 23:56:00 00:09:00 9
B 10/10/2018 00:05:00 00:05:00 5
B 10/10/2018 00:10:00 00:02:00 2
B 10/10/2018 00:12:00 00:30:00 30
C 10/08/2018 23:49:00 00:01:00 1
C 10/08/2018 23:50:00 00:09:00 9
C 10/08/2018 23:59:00 00:02:00 2
C 10/09/2018 00:01:00 00:09:00 9
C 10/09/2018 00:10:00 08:02:00 482
C 10/09/2018 08:12:00 04:08:00 248
C 10/09/2018 12:20:00 10:01:00 601
C 10/09/2018 22:21:00 01:35:00 95
C 10/09/2018 23:56:00 00:09:00 9
C 10/10/2018 00:05:00 00:05:00 5
C 10/10/2018 00:10:00 00:02:00 2
C 10/10/2018 00:12:00 00:30:00 30
Try converting date and time to POSIXct
# dt is your data frame
diff(as.POSIXct(paste(dt$Date, dt$Time), format='%m/%d/%Y %H:%M:%S')) # or '%d/%m/%Y %H:%M:%S'
That should do the trick.
Data
dt <- structure(list(Date = c("10/08/2018", "10/08/2018", "10/08/2018", "10/09/2018", "10/09/2018",
"10/09/2018", "10/09/2018", "10/09/2018", "10/09/2018", "10/10/2018",
"10/10/2018", "10/10/2018"),
Time = c("23:49:00", "23:50:00", "23:59:00", "00:01:00", "00:10:00", "08:12:00",
"12:20:00", "22:21:00", "23:56:00", "00:05:00", "00:10:00", "00:12:00")),
class = "data.frame", row.names = c(NA, -12L))
I have a dataframe that contains hourly weather information. I would like to increase the granularity of the time measurements (5-minute intervals instead of 60-minute intervals) while copying the other columns' data into the newly created rows:
Current Dataframe Structure:
Date Temperature Humidity
2015-01-01 00:00:00 25 0.67
2015-01-01 01:00:00 26 0.69
Target Dataframe Structure:
Date Temperature Humidity
2015-01-01 00:00:00 25 0.67
2015-01-01 00:05:00 25 0.67
2015-01-01 00:10:00 25 0.67
.
.
.
2015-01-01 00:55:00 25 0.67
2015-01-01 01:00:00 26 0.69
2015-01-01 01:05:00 26 0.69
2015-01-01 01:10:00 26 0.69
.
.
.
What I've Tried:
for (i in 1:nrow(df)) {
  five.minutes <- seq(df$date[i], length = 12, by = "5 mins")
  for (j in 1:length(five.minutes)) {
    df$date[i] <- rbind(five.minutes[j])
  }
}
Error I'm getting:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
One possible solution is to use fill from tidyr and right_join from dplyr.
The approach is to create a date/time series between the min and max + 55 min times from the dataframe. Joining the dataframe to this time series provides all the desired rows, but with NA for Temperature and Humidity. Then use fill to populate the NA values with the previous valid values.
# Data
df <- read.table(text = "Date Temperature Humidity
'2015-01-01 00:00:00' 25 0.67
'2015-01-01 01:00:00' 26 0.69
'2015-01-01 02:00:00' 28 0.69
'2015-01-01 03:00:00' 25 0.69", header = T, stringsAsFactors = F)
df$Date <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M:%S")
# Create a dataframe with all possible date/times at intervals of 5 mins
Dates <- data.frame(Date = seq(min(df$Date), max(df$Date) + 3540, by = 5*60))
library(dplyr)
library(tidyr)
result <- df %>%
  right_join(Dates, by = "Date") %>%
  fill(Temperature, Humidity)
result
# Date Temperature Humidity
#1 2015-01-01 00:00:00 25 0.67
#2 2015-01-01 00:05:00 25 0.67
#3 2015-01-01 00:10:00 25 0.67
#4 2015-01-01 00:15:00 25 0.67
#5 2015-01-01 00:20:00 25 0.67
#6 2015-01-01 00:25:00 25 0.67
#7 2015-01-01 00:30:00 25 0.67
#8 2015-01-01 00:35:00 25 0.67
#9 2015-01-01 00:40:00 25 0.67
#10 2015-01-01 00:45:00 25 0.67
#11 2015-01-01 00:50:00 25 0.67
#12 2015-01-01 00:55:00 25 0.67
#13 2015-01-01 01:00:00 26 0.69
#14 2015-01-01 01:05:00 26 0.69
#.....
#.....
#44 2015-01-01 03:35:00 25 0.69
#45 2015-01-01 03:40:00 25 0.69
#46 2015-01-01 03:45:00 25 0.69
#47 2015-01-01 03:50:00 25 0.69
#48 2015-01-01 03:55:00 25 0.69
I think this might do:
library(tibble)
library(lubridate)
df <- tibble(DateTime = c("2015-01-01 00:00:00", "2015-01-01 01:00:00"), Temperature = c(25, 26), Humidity = c(.67, .69))
df$DateTime <- ymd_hms(df$DateTime)
DateTime <- as.POSIXct(sapply(1:(nrow(df) - 1), function(x) seq(from = df$DateTime[x], to = df$DateTime[x + 1], by = "5 min")),
                       origin = "1970-01-01", tz = "UTC")
Temperature <- c(sapply(1:(nrow(df) - 1), function(x) rep(df$Temperature[x], 12)), df$Temperature[nrow(df)])
Humidity <- c(sapply(1:(nrow(df) - 1), function(x) rep(df$Humidity[x], 12)), df$Humidity[nrow(df)])
tibble(as.character(DateTime), Temperature, Humidity)
<chr> <dbl> <dbl>
1 2015-01-01 00:00:00 25.0 0.670
2 2015-01-01 00:05:00 25.0 0.670
3 2015-01-01 00:10:00 25.0 0.670
4 2015-01-01 00:15:00 25.0 0.670
5 2015-01-01 00:20:00 25.0 0.670
6 2015-01-01 00:25:00 25.0 0.670
7 2015-01-01 00:30:00 25.0 0.670
8 2015-01-01 00:35:00 25.0 0.670
9 2015-01-01 00:40:00 25.0 0.670
10 2015-01-01 00:45:00 25.0 0.670
11 2015-01-01 00:50:00 25.0 0.670
12 2015-01-01 00:55:00 25.0 0.670
13 2015-01-01 01:00:00 26.0 0.690
I am trying to fill in the gaps in one of my time series by merging a full day time series into my original time series. But for some reason I get duplicate entries and all the rest of my data is NA.
My data looks like this:
> head(data)
TIME Water_Temperature
1 2016-08-22 00:00:00 81.000
2 2016-08-22 00:01:00 80.625
3 2016-08-22 00:02:00 85.000
4 2016-08-22 00:03:00 80.437
5 2016-08-22 00:04:00 85.000
6 2016-08-22 00:05:00 80.375
> tail(data)
TIME Water_Temperature
1398 2016-08-22 23:54:00 19.5
1399 2016-08-22 23:55:00 19.5
1400 2016-08-22 23:56:00 19.5
1401 2016-08-22 23:57:00 19.5
1402 2016-08-22 23:58:00 19.5
1403 2016-08-22 23:59:00 19.5
Some minutes are missing in between (1403 rows instead of 1440). I tried to fill them in using:
data.length <- length(data$TIME)
time.min <- data$TIME[1]
time.max <- data$TIME[data.length]
all.dates <- seq(time.min, time.max, by="min")
all.dates.frame <- data.frame(list(TIME=all.dates))
merged.data <- merge(all.dates.frame, data, all=T)
But that gives me a result of 1449 rows instead of 1440. The first eight minutes are duplicates in the time stamp column and all other values in Water_Temperature are NA. Looks like this:
> merged.data[1:25,]
TIME Water_Temperature
1 2016-08-22 00:00:00 NA
2 2016-08-22 00:00:00 81.000
3 2016-08-22 00:01:00 NA
4 2016-08-22 00:01:00 80.625
5 2016-08-22 00:02:00 NA
6 2016-08-22 00:02:00 85.000
7 2016-08-22 00:03:00 NA
8 2016-08-22 00:03:00 80.437
9 2016-08-22 00:04:00 NA
10 2016-08-22 00:04:00 85.000
11 2016-08-22 00:05:00 NA
12 2016-08-22 00:05:00 80.375
13 2016-08-22 00:06:00 NA
14 2016-08-22 00:06:00 80.812
15 2016-08-22 00:07:00 NA
16 2016-08-22 00:07:00 80.812
17 2016-08-22 00:08:00 NA
18 2016-08-22 00:08:00 80.937
19 2016-08-22 00:09:00 NA
20 2016-08-22 00:10:00 NA
21 2016-08-22 00:11:00 NA
22 2016-08-22 00:12:00 NA
23 2016-08-22 00:13:00 NA
24 2016-08-22 00:14:00 NA
25 2016-08-22 00:15:00 NA
> tail(merged.data)
TIME Water_Temperature
1444 2016-08-22 23:54:00 NA
1445 2016-08-22 23:55:00 NA
1446 2016-08-22 23:56:00 NA
1447 2016-08-22 23:57:00 NA
1448 2016-08-22 23:58:00 NA
1449 2016-08-22 23:59:00 NA
Does anyone have an idea what's going wrong?
EDIT:
I am now using the xts and zoo packages to do the job:
library(xts)
library(zoo)
df1.zoo<-zoo(data[,-1],data[,1])
df2 <- as.data.frame(as.zoo(merge(as.xts(df1.zoo), as.xts(zoo(,seq(start(df1.zoo),end(df1.zoo),by="min"))))))
Very easy and effective!
Instead of merge, use rbind, which gives you an irregular time series without NAs to start with. If you really want a regular time series with a frequency of, say, 1 minute, you can build a time-based sequence as an index, merge it with your data afterwards (after using rbind), and fill the resulting NAs with na.locf. Hope this helps.
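A minimal sketch of the second suggestion (build a regular one-minute index, merge, then fill with na.locf), assuming data has a POSIXct TIME column as in the question:
library(zoo)
# full one-minute index covering the observed range
all.times <- data.frame(TIME = seq(min(data$TIME), max(data$TIME), by = "min"))
merged <- merge(all.times, data, by = "TIME", all.x = TRUE)
# carry the last observation forward into the gaps
merged$Water_Temperature <- na.locf(merged$Water_Temperature, na.rm = FALSE)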
You can also try merging with full_join from the tidyverse.
This works for me with data frames (daily values) sharing a column named Date; here my_data is a list of those data frames:
library(dplyr)
library(purrr)
big_data <- my_data %>%
  reduce(full_join, by = "Date")