I have a dataset of coordinates merged by time into one data frame, with the individual IDs in the column names. For example:
> Date_time<-c("2015/03/04 01:00:00","2015/03/04 02:00:00","2015/03/04 03:00:00","2015/03/04 04:00:00")
> lat.1<-c(63.81310,63.83336,63.83250,63.82237)
> long.1<-c(-149.1176,-149.0193,-149.0249,-149.0408)
> lat.2<-c(63.85893 ,63.85885,63.86108,63.86357)
> long.2<-c(-151.1336,-151.1336,-151.1236,-151.1238)
> lat.3<-c(63.87627,63.87670, 63.85044,63.85052)
> long.3<-c(-149.5029,-149.5021,-149.5199,-149.5199)
>
> data<-data.frame(Date_time,lat.1,long.1,lat.2,long.2,lat.3,long.3)
> data
Date_time lat.1 long.1 lat.2 long.2 lat.3 long.3
1 2015/03/04 01:00:00 63.81310 -149.1176 63.85893 -151.1336 63.87627 -149.5029
2 2015/03/04 02:00:00 63.83336 -149.0193 63.85885 -151.1336 63.87670 -149.5021
3 2015/03/04 03:00:00 63.83250 -149.0249 63.86108 -151.1236 63.85044 -149.5199
4 2015/03/04 04:00:00 63.82237 -149.0408 63.86357 -151.1238 63.85052 -149.5199
I want to calculate the distance between each pair of individuals: between 1 and 2, 1 and 3, and 2 and 3. My data frame has many more individuals than this, so I am hoping to use a loop or apply-style function.
I can do them individually using
> library(geosphere)
> data$distbetween12<-distHaversine(cbind(data$long.1,data$lat.1), cbind(data$long.2,data$lat.2))
> data$distbetween12
[1]  99083.48 103778.13 103020.61 102317.93
But can I calculate all the pairwise distances without typing out every pair combination?
Thank you!
Here's a solution that relies on the combn function to generate the necessary combinations. If you have more than 3 pairs of lat/long columns, just change the first argument to combn to the correct number of pairs.
Note this solution also requires that your columns strictly follow the naming pattern lat.1, long.1, lat.2, long.2, etc.
combos <- combn(3, 2)  # all pairs of individuals: columns (1,2), (1,3), (2,3)

# for each pair, look up the lat/long columns by name, compute the distance,
# and label the resulting column "i v j"
cbind(data, as.data.frame(`colnames<-`(apply(combos, 2, function(x) {
  lats <- paste0("lat.", x)
  lons <- paste0("long.", x)
  geosphere::distHaversine(cbind(data[[lons[1]]], data[[lats[1]]]),
                           cbind(data[[lons[2]]], data[[lats[2]]]))
}), apply(combos, 2, paste, collapse = " v "))))
#> Date_time lat.1 long.1 lat.2 long.2 lat.3 long.3
#> 1 2015/03/04 01:00:00 63.81310 -149.1176 63.85893 -151.1336 63.87627 -149.5029
#> 2 2015/03/04 02:00:00 63.83336 -149.0193 63.85885 -151.1336 63.87670 -149.5021
#> 3 2015/03/04 03:00:00 63.83250 -149.0249 63.86108 -151.1236 63.85044 -149.5199
#> 4 2015/03/04 04:00:00 63.82237 -149.0408 63.86357 -151.1238 63.85052 -149.5199
#> 1 v 2 1 v 3 2 v 3
#> 1 99083.48 20172.13 79974.87
#> 2 103778.13 24168.80 80014.97
#> 3 103020.61 24374.46 78669.90
#> 4 102317.93 23724.27 78680.61
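If you have many individuals, you don't need to hard-code the number of pairs either; a small sketch (assuming the strict lat.i/long.i naming above) that derives it from the column names:

n_ind <- sum(grepl("^lat\\.", names(data)))  # one lat.i column per individual
combos <- combn(n_ind, 2)                    # then proceed exactly as above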
I'm having trouble using the function as.POSIXct.
When I apply it to my dataset, the year appears with two leading zeros, like this:
datu1$timestamp <- as.POSIXct(datu1$date.sec, origin = "1970-01-01", tz="GMT")
datu1$timestamp <- as.POSIXct(datu1$timestamp,
format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
head(datu1)
ID date.sec lon lat lon.025 lat.025 lon.5 lat.5 lon.975 lat.975
1 102211.10 -61827840000 -38.6616 -13.59272 -40.5025 -15.25025 -38.7 -13.76 -36.9000 -10.88950
2 102211.10 -61827818400 -38.6647 -13.60312 -40.4000 -15.17025 -38.7 -13.77 -37.0975 -11.03975
3 102211.10 -61827796800 -38.6723 -13.64505 -40.3000 -15.10000 -38.7 -13.79 -37.0000 -11.29950
4 102211.10 -61827775200 -38.6837 -13.68972 -40.2000 -14.98025 -38.7 -13.83 -37.2000 -11.45975
5 102211.10 -61827753600 -38.7030 -13.73054 -40.2000 -14.98100 -38.7 -13.84 -37.3000 -11.62925
6 102211.10 -61827732000 -38.7221 -13.77846 -40.0000 -15.04050 -38.7 -13.88 -37.5000 -11.69950
bmode bmode.5 timestamp
1 1.556 2 0010-10-03 00:00:00
2 1.565 2 0010-10-03 06:00:00
3 1.571 2 0010-10-03 12:00:00
4 1.571 2 0010-10-03 18:00:00
5 1.589 2 0010-10-04 00:00:00
6 1.599 2 0010-10-04 06:00:00
How can I fix this to get the full year (like: 2010) instead of two zeros?
Perhaps your data was encoded with an unusual origin (e.g. Excel uses "1899-12-30"). Just adjust origin= until the date matches what you require.
as.POSIXct(-61827840000, origin="1970-01-01", tz="GMT")
# [1] "0010-10-03 GMT"
as.POSIXct(-61827840000, origin="3970-01-01", tz="GMT")
# [1] "2010-10-03 GMT"
I am trying to use the prepData function in the R package moveHMM. I am getting "Error in prepData(x, coordNames = c("lon", "lat")) : Each animal's obervations must be contiguous."
x is a data.frame with column names "ID", "lon", "lat". The ID column holds each animal's name as a character, and lon/lat are numeric. There are no NA values and no missing rows.
I do not know what this error means, nor how to fix it. Help please.
x <- data.frame(ID = dat$ID, lon = dat$lon, lat = dat$lat)
hmmgps <- prepData(x, coordNames=c("lon", "lat"))
The prepData function assumes that the rows for each track (or each animal) are grouped together in the data frame. The error message indicates that this is not the case: at least one track is split. For example, the following (artificial) data set would cause this error:
> data
ID lon lat
1 1 54.08658 12.190313
2 1 54.20608 12.101203
3 1 54.18977 12.270896
4 2 55.79217 9.943341
5 2 55.88145 9.986028
6 2 55.91742 9.887342
7 1 54.25305 12.374541
8 1 54.28061 12.190078
This is because the track with ID "1" is split into two parts, separated by the track with ID "2".
The tracks need to be contiguous, i.e. all observations with ID "1" should come first, followed by all observations with ID "2". One possible solution would be to order the data by ID and by date.
Consider the same data set, with a "date" column:
> data
ID lon lat date
1 1 54.08658 12.190313 2019-09-06 14:20:00
2 1 54.20608 12.101203 2019-09-06 15:20:00
3 1 54.18977 12.270896 2019-09-06 16:20:00
4 2 55.79217 9.943341 2019-09-04 07:55:00
5 2 55.88145 9.986028 2019-09-04 08:55:00
6 2 55.91742 9.887342 2019-09-04 09:55:00
7 1 54.25305 12.374541 2019-09-06 17:20:00
8 1 54.28061 12.190078 2019-09-06 18:20:00
You can then define the ordered data set with:
> data_ordered <- data[with(data, order(ID, date)),]
> data_ordered
ID lon lat date
1 1 54.08658 12.190313 2019-09-06 14:20:00
2 1 54.20608 12.101203 2019-09-06 15:20:00
3 1 54.18977 12.270896 2019-09-06 16:20:00
7 1 54.25305 12.374541 2019-09-06 17:20:00
8 1 54.28061 12.190078 2019-09-06 18:20:00
4 2 55.79217 9.943341 2019-09-04 07:55:00
5 2 55.88145 9.986028 2019-09-04 08:55:00
6 2 55.91742 9.887342 2019-09-04 09:55:00
Then, the ordered data (excluding the date column) can be passed to prepData:
> hmmgps <- prepData(data_ordered[,1:3], coordNames = c("lon", "lat"))
> hmmgps
ID step angle x y
1 1 16.32042 NA 54.08658 12.190313
2 1 18.85560 2.3133191 54.20608 12.101203
3 1 13.37296 -0.6347523 54.18977 12.270896
4 1 20.62507 -2.4551318 54.25305 12.374541
5 1 NA NA 54.28061 12.190078
6 2 10.86906 NA 55.79217 9.943341
7 2 11.60618 -1.6734604 55.88145 9.986028
8 2 NA NA 55.91742 9.887342
I hope that this helps.
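For completeness, the same ordering can be written with dplyr (a sketch, assuming the data frame above):

> library(dplyr)
> data_ordered <- arrange(data, ID, date)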
I have a dataframe of datetimes
tdata_df <- data.frame(timestamp=seq(c(ISOdate(2018,4,20)), by = (60*229), length.out = 6))
tdata_df
timestamp
1 2018-04-20 21:00:00
2 2018-04-21 00:49:00
3 2018-04-21 04:38:00
4 2018-04-21 08:27:00
5 2018-04-21 12:16:00
6 2018-04-21 16:05:00
Then I would like to look up each timestamp's value in this time-range table
time_range_df <- data.frame(start=c("08:30","11:35","15:10","05:00"),
end=c("11:29","15:09","02:29","08:29"),value=c(1,2,3,4))
time_range_df
start end value
1 08:30 11:29 1
2 11:35 15:09 2
3 15:10 02:29 3
4 05:00 08:29 4
to get a result like this
timestamp value
1 2018-04-20 21:00:00 3
2 2018-04-21 00:49:00 3
3 2018-04-21 04:38:00 NA
4 2018-04-21 08:27:00 4
5 2018-04-21 12:16:00 2
6 2018-04-21 16:05:00 3
Any help would be greatly appreciated.
The sqldf package provides the flexibility needed for this kind of join. The approach is:
1. Convert the times in time_range_df to seconds since midnight.
2. Add a column to tdata_df with the time elapsed since midnight.
3. Left-join the two data frames on overlapping time since midnight.
library(lubridate)

# ranges as seconds since midnight
time_range_df$start <- as.numeric(seconds(hm(time_range_df$start)))
time_range_df$end <- as.numeric(seconds(hm(time_range_df$end)))

# timestamps as seconds since midnight
tdata_df$timeSinceMidnight <- as.numeric(seconds(hms(format(ymd_hms(tdata_df$timestamp),
                                                            format = "%H:%M:%S"))))

library(sqldf)
sqlquery <- "SELECT D1.timestamp, Q.value FROM tdata_df D1
             LEFT JOIN (SELECT * FROM tdata_df D, time_range_df R
                        WHERE (R.start < R.end AND D.timeSinceMidnight BETWEEN R.start AND R.end) OR
                              (R.start > R.end AND D.timeSinceMidnight BETWEEN R.start AND 86400) OR
                              (R.start > R.end AND D.timeSinceMidnight BETWEEN 0 AND R.end)) Q
             ON D1.timestamp = Q.timestamp"
sqldf(sqlquery)
# timestamp value
# 1 2018-04-20 13:00:00 2
# 2 2018-04-20 16:49:00 3
# 3 2018-04-20 20:38:00 3
# 4 2018-04-21 00:27:00 3
# 5 2018-04-21 04:16:00 NA
# 6 2018-04-21 08:05:00 4
Data:
tdata_df <- data.frame(timestamp=seq(c(ISOdate(2018,4,20)), by = (60*229), length.out = 6))
time_range_df <- data.frame(start=c("08:30","11:35","15:10","05:00"),
end=c("11:29","15:09","02:29","08:29"),value=c(1,2,3,4))
I have the following data: a list of POSIXct times spanning one month. Each represents a bike delivery. My aim is to find the average number of bike deliveries per ten-minute interval over a 24-hour period (producing a total of 144 rows). First all of the trips need to be summed and binned into intervals, then divided by the number of days. So far I've managed to write code that sums trips per 10-minute interval, but it produces incorrect values, and I am not sure where it went wrong.
The data looks like this:
head(start_times)
[1] "2014-10-21 16:58:13 EST" "2014-10-07 10:14:22 EST" "2014-10-20 01:45:11 EST"
[4] "2014-10-17 08:16:17 EST" "2014-10-07 17:46:36 EST" "2014-10-28 17:32:34 EST"
length(start_times)
[1] 1747
The code looks like this:
library(lubridate)
library(dplyr)
tripduration <- floor(runif(1747) * 1000)
time_bucket <- start_times - minutes(minute(start_times) %% 10) - seconds(second(start_times))
df <- data.frame(tripduration, start_times, time_bucket)
summarized <- df %>%
group_by(time_bucket) %>%
summarize(trip_count = n())
summarized <- as.data.frame(summarized)
out_buckets <- data.frame(out_buckets = seq(as.POSIXct("2014-10-01 00:00:00"),
                                            as.POSIXct("2014-10-31 23:00:00"), by = 600))
out <- left_join(out_buckets, summarized, by = c("out_buckets" = "time_bucket"))
out$trip_count[is.na(out$trip_count)] <- 0
head(out)
out_buckets trip_count
1 2014-10-01 00:00:00 0
2 2014-10-01 00:10:00 0
3 2014-10-01 00:20:00 0
4 2014-10-01 00:30:00 0
5 2014-10-01 00:40:00 0
6 2014-10-01 00:50:00 0
dim(out)
[1] 4459 2
test <- format(out$out_buckets,"%H:%M:%S")
test2 <- out$trip_count
test <- cbind(test, test2)
colnames(test)[1] <- "interval"
colnames(test)[2] <- "count"
test <- as.data.frame(test)
test$count <- as.numeric(test$count)
test <- aggregate(count~interval, test, sum)
head(test, n = 20)
interval count
1 00:00:00 32
2 00:10:00 33
3 00:20:00 32
4 00:30:00 31
5 00:40:00 34
6 00:50:00 34
7 01:00:00 31
8 01:10:00 33
9 01:20:00 39
10 01:30:00 41
11 01:40:00 36
12 01:50:00 31
13 02:00:00 33
14 02:10:00 34
15 02:20:00 32
16 02:30:00 32
17 02:40:00 36
18 02:50:00 32
19 03:00:00 34
20 03:10:00 39
but this is impossible because when I sum the counts
sum(test$count)
[1] 7494
I get 7494, whereas the total should be 1747.
I'm not sure where I went wrong, or how this code could be simplified to get the right result.
I've done what I can, but I can't reproduce your issue without your data. (One likely culprit, though: cbind(test, test2) produces a character matrix, so after as.data.frame the count column becomes a factor, and as.numeric(test$count) then returns the factor's internal level codes rather than the original counts. as.numeric(as.character(test$count)) avoids that.)
library(dplyr)
I created the full sequence of 10 minute blocks:
blocks.of.10mins <- data.frame(out_buckets=seq(as.POSIXct("2014/10/01 00:00"), by="10 mins", length.out=30*24*6))
Then split the start_times into the same bins. Note: I created a baseline time of midnight to force the blocks to align to 10 minute intervals. Removing this later is an exercise for the reader. I also changed one of your data points so that there was at least one example of multiple records in the same bin.
start_times <- as.POSIXct(c("2014-10-01 00:00:00", ## added
"2014-10-21 16:58:13",
"2014-10-07 10:14:22",
"2014-10-20 01:45:11",
"2014-10-17 08:16:17",
"2014-10-07 10:16:36", ## modified
"2014-10-28 17:32:34"))
trip_times <- data.frame(start_times) %>%
mutate(out_buckets = as.POSIXct(cut(start_times, breaks="10 mins")))
The start_times and all the 10 minute intervals can then be merged
trips_merged <- merge(trip_times, blocks.of.10mins, by="out_buckets", all=TRUE)
These can then be grouped by 10 minute block and counted
trips_merged %>% filter(!is.na(start_times)) %>%
group_by(out_buckets) %>%
summarise(trip_count=n())
Source: local data frame [6 x 2]
out_buckets trip_count
(time) (int)
1 2014-10-01 00:00:00 1
2 2014-10-07 10:10:00 2
3 2014-10-17 08:10:00 1
4 2014-10-20 01:40:00 1
5 2014-10-21 16:50:00 1
6 2014-10-28 17:30:00 1
Instead, if we only consider time, not date
trips_merged2 <- trips_merged
trips_merged2$out_buckets <- format(trips_merged2$out_buckets, "%H:%M:%S")
trips_merged2 %>% filter(!is.na(start_times)) %>%
group_by(out_buckets) %>%
summarise(trip_count=n())
Source: local data frame [6 x 2]
out_buckets trip_count
(chr) (int)
1 00:00:00 1
2 01:40:00 1
3 08:10:00 1
4 10:10:00 2
5 16:50:00 1
6 17:30:00 1
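To finish the original goal (average deliveries per time-of-day bin), the per-bin totals just need dividing by the number of days; a sketch, assuming trips_merged2 from above and a full month of data:

n_days <- 31  # assumption: the data covers all of October
trips_merged2 %>% filter(!is.na(start_times)) %>%
  group_by(out_buckets) %>%
  summarise(trip_count = n()) %>%
  mutate(avg_per_day = trip_count / n_days)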
I have a time variable: "00:00:29", "00:06:39", "20:43:15", ...
and I want to recode it into a new vector of time-based work shifts:
07:00:00 - 13:00:00 - 1
13:00:00 - 20:00:00 - 2
23:00:00 - 07:00:00 - 3
Thanks for any ideas :)
Assuming the time variables are strings as shown, this seems to work:
# seconds since midnight for one "HH:MM:SS" string
secNr <- function(x){ sum(as.numeric(unlist(strsplit(x,":",fixed=TRUE))) * c(3600,60,1)) }

# shift for one time string: which.max picks the first (highest) threshold reached
workShift <- function(x)
{
  n <- which.max(secNr(x) >= c(secNr("23:00:00"),secNr("20:00:00"),secNr("13:00:00"),
                               secNr("07:00:00"),secNr("00:00:00")))
  c(3,NA,2,1,3)[n]  # >=23h -> 3, 20-23h -> unassigned, 13-20h -> 2, 7-13h -> 1, 0-7h -> 3
}
"workShift" computes the work shift of one such time string. If you have a vector of time strings, use "sapply". Example:
> Time <- sprintf("%i:%02i:00", 0:23, sample(0:59,24))
> Shift <- sapply(Time,"workShift")
> Shift
0:37:00 1:17:00 2:35:00 3:09:00 4:08:00 5:28:00 6:03:00 7:43:00 8:27:00 9:38:00 10:48:00 11:50:00 12:58:00 13:32:00 14:05:00 15:39:00 16:56:00
3 3 3 3 3 3 3 1 1 1 1 1 1 2 2 2 2
17:00:00 18:22:00 19:02:00 20:42:00 21:11:00 22:15:00 23:01:00
2 2 2 NA NA NA 3
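A vectorized alternative (a sketch using the same shift boundaries): convert the strings to seconds since midnight and bin them with findInterval, so no sapply over workShift is needed:

toSec <- function(x) sapply(strsplit(x, ":", fixed = TRUE),
                            function(p) sum(as.numeric(p) * c(3600, 60, 1)))
shiftOf <- function(times) {
  # breaks at 00:00, 07:00, 13:00, 20:00, 23:00; the 20:00-23:00 gap stays NA
  c(3, 1, 2, NA, 3)[findInterval(toSec(times), c(0, 7, 13, 20, 23) * 3600)]
}
shiftOf(c("00:00:29", "00:06:39", "20:43:15"))
# [1]  3  3 NA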