Averaging the value with respect to time - r

I have the dataset below with a date-time and the corresponding value. The readings are 10 minutes apart, and I need to generate rows at 15-minute intervals instead.
For example, the value at 15:40 is 599 and the value at 15:50 is 594, so a new row is needed between the two, i.e. 15:45, with the average of 599 and 594, which is 596.5.
In other words, I need to average the readings at :10 and :20 to get the value for, say, 16:15, and the readings at :40 and :50 to get the value for 16:45. The values at :00 and :30 stay the same.
Date...Time RA.CO2
6/15/2017 15:40 599
6/15/2017 15:50 594
6/15/2017 16:00 606
6/15/2017 16:10 594
6/15/2017 16:20 594
6/15/2017 16:30 594
6/15/2017 16:40 594
6/15/2017 16:50 594
6/16/2017 0:00 496.25
6/16/2017 0:10 500
6/16/2017 0:20 496.25
6/16/2017 0:30 496.25
6/16/2017 0:40 600
6/16/2017 0:50 650
6/16/2017 1:00 700
str(df)
'data.frame': 6092 obs. of 2 variables:
$ Date...Time: chr "6/15/2017 15:40" "6/15/2017 15:50" "6/15/2017 16:00" "6/15/2017 16:10" ...
$ RA.CO2 : num 599 594 606 594 594 594 594 594 594 594 ...
Output
Date...Time RA.CO2
6/15/2017 15:45 596.5
6/15/2017 16:00 606
6/15/2017 16:15 594
6/15/2017 16:30 594
6/15/2017 16:45 594
6/16/2017 0:00 496.25
6/16/2017 0:15 498.125
6/16/2017 0:30 496.25
6/16/2017 0:45 625
6/16/2017 1:00 700

We can use tidyr to expand the data frame and imputeTS to impute the missing values by linear interpolation.
library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)
dt2 <- dt %>%
  mutate(Date...Time = mdy_hm(Date...Time)) %>%
  mutate(Date = as.Date(Date...Time)) %>%
  group_by(Date) %>%
  # add the missing 5-minute time stamps within each day
  complete(Date...Time = seq(min(Date...Time), max(Date...Time), by = "5 min")) %>%
  # linear interpolation (called na_interpolation() in imputeTS >= 3.0)
  mutate(RA.CO2 = na.interpolation(RA.CO2)) %>%
  ungroup() %>%
  select(Date...Time, RA.CO2)
dt2
# A tibble: 22 x 2
Date...Time RA.CO2
<dttm> <dbl>
1 2017-06-15 15:40:00 599.0
2 2017-06-15 15:45:00 596.5
3 2017-06-15 15:50:00 594.0
4 2017-06-15 15:55:00 600.0
5 2017-06-15 16:00:00 606.0
6 2017-06-15 16:05:00 600.0
7 2017-06-15 16:10:00 594.0
8 2017-06-15 16:15:00 594.0
9 2017-06-15 16:20:00 594.0
10 2017-06-15 16:25:00 594.0
# ... with 12 more rows
My output is not entirely the same as your desired output, because some of the rules are unclear:
It is not clear how you get the value at 6/16/2017 0:10.
Why is the interval sometimes 5 minutes and sometimes 10 minutes?
Why do you include the last three rows? The rule for filling in their values is also unclear.
Nevertheless, I think my solution shows one possible way to do this. You may need to adjust the code yourself to fit those unclear rules.
Data
dt <- read.table(text = "Date...Time RA.CO2
'6/15/2017 15:40' 599
'6/15/2017 15:50' 594
'6/15/2017 16:00' 606
'6/15/2017 16:10' 594
'6/15/2017 16:20' 594
'6/15/2017 16:30' 594
'6/15/2017 16:40' 594
'6/15/2017 16:50' 594
'6/16/2017 0:00' 496.25
'6/16/2017 0:10' 496.25
'6/16/2017 0:20' 496.25
'6/16/2017 0:30' 496.25",
header = TRUE, stringsAsFactors = FALSE)
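If the goal is exactly the quarter-hour grid from the question (:00, :15, :30, :45), one option is to interpolate onto that grid within each day. The following is a minimal base R sketch rather than part of the answer above: the helper name resample_15 and the short column names DT and RA are made up for illustration, and it assumes that linear interpolation between neighbouring readings is what the question means by "average".

```r
# Sample readings copied from the question (10-minute spacing, two days).
df <- data.frame(
  DT = as.POSIXct(c("2017-06-15 15:40", "2017-06-15 15:50", "2017-06-15 16:00",
                    "2017-06-15 16:10", "2017-06-15 16:20", "2017-06-15 16:30",
                    "2017-06-15 16:40", "2017-06-15 16:50", "2017-06-16 00:00",
                    "2017-06-16 00:10", "2017-06-16 00:20", "2017-06-16 00:30",
                    "2017-06-16 00:40", "2017-06-16 00:50", "2017-06-16 01:00"),
                  format = "%Y-%m-%d %H:%M", tz = "UTC"),
  RA = c(599, 594, 606, 594, 594, 594, 594, 594,
         496.25, 500, 496.25, 496.25, 600, 650, 700)
)

# Hypothetical helper: interpolate one day's readings onto a 15-minute grid.
resample_15 <- function(d) {
  step  <- 15 * 60                                  # grid spacing in seconds
  secs  <- as.numeric(d$DT)
  first <- ceiling(min(secs) / step) * step         # first quarter hour in range
  last  <- floor(max(secs) / step) * step           # last quarter hour in range
  if (first > last) return(NULL)                    # no grid point inside this day
  grid <- seq(first, last, by = step)
  data.frame(DT = as.POSIXct(grid, origin = "1970-01-01", tz = "UTC"),
             RA = approx(secs, d$RA, xout = grid)$y)
}

# Work day by day so that nothing is interpolated across the overnight gap.
by_day <- split(df, format(df$DT, "%Y-%m-%d"))
out <- do.call(rbind, lapply(by_day, resample_15))
rownames(out) <- NULL
out
```

Readings that already sit on the grid (:00 and :30) pass through unchanged, while :15 and :45 fall midway between two readings and so come out as their mean.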

Here are some solutions. Having re-read the question, I am assuming that a new intermediate time should be inserted only before times that are 20 or 50 minutes past the hour, and only when the immediately preceding time (before any insertion) is 10 minutes earlier. If that is not what the question intends, then it, the vector of intermediate times, will need to be changed from what is shown.
1) zoo Merge df with a data frame having the intermediate times it and then run na.approx from the zoo package on the RA column to fill in the NA values:
library(zoo)
it <- with(df, DT[c(FALSE, diff(DT) == 10) & as.POSIXlt(DT)$min %in% c(20, 50)] - 5 * 60)
M <- merge(df, data.frame(DT = it), all = TRUE)
transform(M, RA = na.approx(RA))
giving:
DT RA
1 2017-06-15 15:40:00 599.00
2 2017-06-15 15:45:00 596.50
3 2017-06-15 15:50:00 594.00
4 2017-06-15 16:00:00 606.00
5 2017-06-15 16:10:00 594.00
6 2017-06-15 16:15:00 594.00
7 2017-06-15 16:20:00 594.00
8 2017-06-15 16:30:00 594.00
9 2017-06-15 16:40:00 594.00
10 2017-06-15 16:45:00 594.00
11 2017-06-15 16:50:00 594.00
12 2017-06-16 00:00:00 496.25
13 2017-06-16 00:10:00 496.25
14 2017-06-16 00:15:00 496.25
15 2017-06-16 00:20:00 496.25
16 2017-06-16 00:30:00 496.25
1a) Note that if df were converted to zoo, i.e. z <- read.zoo(df, tz = ""), then this could be written as just this giving a zoo object result:
na.approx(merge(z, zoo(, it)))
2) approx This one uses no packages. it is from above.
with(df, data.frame(approx(DT, RA, xout = sort(c(DT, it)))))
giving:
x y
1 2017-06-15 15:40:00 599.00
2 2017-06-15 15:45:00 596.50
3 2017-06-15 15:50:00 594.00
4 2017-06-15 16:00:00 606.00
5 2017-06-15 16:10:00 594.00
6 2017-06-15 16:15:00 594.00
7 2017-06-15 16:20:00 594.00
8 2017-06-15 16:30:00 594.00
9 2017-06-15 16:40:00 594.00
10 2017-06-15 16:45:00 594.00
11 2017-06-15 16:50:00 594.00
12 2017-06-16 00:00:00 496.25
13 2017-06-16 00:10:00 496.25
14 2017-06-16 00:15:00 496.25
15 2017-06-16 00:20:00 496.25
16 2017-06-16 00:30:00 496.25
Note: The input used for the above is:
df <- structure(list(DT = structure(c(1497555600, 1497556200, 1497556800,
1497557400, 1497558000, 1497558600, 1497559200, 1497559800, 1497585600,
1497586200, 1497586800, 1497587400), class = c("POSIXct", "POSIXt"
)), RA = c(599, 594, 606, 594, 594, 594, 594, 594, 496.25, 496.25,
496.25, 496.25)), .Names = c("DT", "RA"), row.names = c(NA, -12L
), class = "data.frame")
Update: Have revised assumption of which intermediate times to include.

Here's a solution using dplyr:
library(dplyr)
df %>%
  # calculate interpolated value between each row & next row
  mutate(DT.next = lead(DT),
         RA.next = lead(RA)) %>%
  # fix the units so the comparison below is always in minutes
  mutate(diff = difftime(DT.next, DT, units = "mins")) %>%
  filter(as.numeric(diff) == 10) %>% # keep only 10 min intervals
  mutate(DT.interpolate = DT + diff/2,
         RA.interpolate = (RA + RA.next) / 2) %>%
  # bind to original dataframe & sort by date
  select(DT.interpolate, RA.interpolate) %>%
  rename(DT = DT.interpolate, RA = RA.interpolate) %>%
  rbind(df) %>%
  arrange(DT)
DT RA
1 2017-06-15 15:40:00 599.00
2 2017-06-15 15:45:00 596.50
3 2017-06-15 15:50:00 594.00
4 2017-06-15 15:55:00 600.00
5 2017-06-15 16:00:00 606.00
6 2017-06-15 16:05:00 600.00
7 2017-06-15 16:10:00 594.00
8 2017-06-15 16:15:00 594.00
9 2017-06-15 16:20:00 594.00
10 2017-06-15 16:25:00 594.00
11 2017-06-15 16:30:00 594.00
12 2017-06-15 16:35:00 594.00
13 2017-06-15 16:40:00 594.00
14 2017-06-15 16:45:00 594.00
15 2017-06-15 16:50:00 594.00
16 2017-06-16 00:00:00 496.25
17 2017-06-16 00:05:00 496.25
18 2017-06-16 00:10:00 496.25
19 2017-06-16 00:15:00 496.25
20 2017-06-16 00:20:00 496.25
21 2017-06-16 00:25:00 496.25
22 2017-06-16 00:30:00 496.25
Dataset:
df <- data.frame(
DT = c(seq(from = as.POSIXct("2017-06-15 15:40"),
to = as.POSIXct("2017-06-15 16:50"),
by = "10 min"),
seq(from = as.POSIXct("2017-06-16 00:00"),
to = as.POSIXct("2017-06-16 00:30"),
by = "10 min")),
RA = c(599, 594, 606, rep(594, 5), rep(496.25, 4))
)

Here is a different idea using zoo library,
library(zoo)
df1 <- df[rep(rownames(df), each = 2),]
df1$DateTime[c(FALSE, TRUE)] <- df1$DateTime[c(FALSE, TRUE)]+5*60
df1$RA.CO2[c(FALSE, TRUE)] <- rollapply(df$RA.CO2, 2, by = 2, mean)
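which gives the output below. Note that by = 2 makes rollapply return only six means (one per non-overlapping pair of rows), and R recycles them across the twelve inserted rows, so some inserted values (for example rows 3.1 and 5.1) are not the averages of their neighbours. Dropping by = 2 and adding fill = NA, align = "left" would give one mean per consecutive pair.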
DateTime RA.CO2
1 2017-06-15 15:40:00 599.00
1.1 2017-06-15 15:45:00 596.50
2 2017-06-15 15:50:00 594.00
2.1 2017-06-15 15:55:00 600.00
3 2017-06-15 16:00:00 606.00
3.1 2017-06-15 16:05:00 594.00
4 2017-06-15 16:10:00 594.00
4.1 2017-06-15 16:15:00 594.00
5 2017-06-15 16:20:00 594.00
5.1 2017-06-15 16:25:00 496.25
6 2017-06-15 16:30:00 594.00
6.1 2017-06-15 16:35:00 496.25
7 2017-06-15 16:40:00 594.00
7.1 2017-06-15 16:45:00 596.50
8 2017-06-15 16:50:00 594.00
8.1 2017-06-15 16:55:00 600.00
9 2017-06-16 00:00:00 496.25
9.1 2017-06-16 00:05:00 594.00
10 2017-06-16 00:10:00 496.25
10.1 2017-06-16 00:15:00 594.00
11 2017-06-16 00:20:00 496.25
11.1 2017-06-16 00:25:00 496.25
12 2017-06-16 00:30:00 496.25
12.1 2017-06-16 00:35:00 496.25
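For comparison, here is a minimal base R sketch that inserts a midpoint row only between readings exactly 10 minutes apart, so each inserted value is the mean of its two neighbours. It is an illustration under assumptions, not the answer's own code: the column names DateTime and RA.CO2 follow the output above, the data is retyped from that output, and the names gap_secs and mid are made up.

```r
# Twelve readings retyped from the output above (original rows only).
df <- data.frame(
  DateTime = as.POSIXct(c("2017-06-15 15:40", "2017-06-15 15:50", "2017-06-15 16:00",
                          "2017-06-15 16:10", "2017-06-15 16:20", "2017-06-15 16:30",
                          "2017-06-15 16:40", "2017-06-15 16:50", "2017-06-16 00:00",
                          "2017-06-16 00:10", "2017-06-16 00:20", "2017-06-16 00:30"),
                        format = "%Y-%m-%d %H:%M", tz = "UTC"),
  RA.CO2 = c(599, 594, 606, 594, 594, 594, 594, 594, 496.25, 496.25, 496.25, 496.25)
)

n <- nrow(df)
gap_secs <- diff(as.numeric(df$DateTime))      # seconds between consecutive rows

# One candidate midpoint per consecutive pair of rows.
mid <- data.frame(
  DateTime = df$DateTime[-n] + gap_secs / 2,
  RA.CO2   = (df$RA.CO2[-n] + df$RA.CO2[-1]) / 2
)
mid <- mid[gap_secs == 600, ]                  # keep only genuine 10-minute gaps

out <- rbind(df, mid)
out <- out[order(out$DateTime), ]
rownames(out) <- NULL
out
```

Filtering on the gap length keeps the overnight jump from 16:50 to 00:00 from producing a spurious midpoint row.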

Find the value of a variable 30 days into future

Hi, below is my R data frame. It is a data excerpt from S&P 500 data for the past 10 years or so. As you can see, I have created a column called Date30, which is the date + 30 days. I want to add a new column (using dplyr if I can) called Close30, which is the "Close" value on the date in "Date30"; that is, I want to look into the future from a given date (obviously this won't work for the last 30 days of the data...). It is a bit like offsetting a column, but it needs a filter/lookup function: the data only has business days and I need to add 30 calendar days, so a constant offset won't work.
I have tried a few things but am getting nowhere...
Thanks so much if you can help!
tidySP500 = na.omit(SP500_Raw) # remove NAs in case future data have NAs
tidySP500$Date = as.Date(tidySP500$Date)
tidySP500 = tidySP500 %>%
select("Date", "Open", "High", "Low", "Price") %>% # select and re-order required variables
rename("Close" = "Price") %>%
filter(Date >= as.Date("2014-01-05") & Date <= (as.Date("2014-01-05")+100)) %>%
mutate(Date30 = Date + 30)# %>% #WORKS UP TO HERE
mutate(Close30 = Close[Date == Date30]) %>% # FAILS
mutate(Close30 = filter(Close, Date == Date30)) #FAILS
Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05
Something like this?
library(tidyverse)
tidySP500 %>% left_join(select(tidySP500, Close, Date30 = Date), by = c('Date30'))
#> # A tibble: 70 x 7
#> Date Open High Low Close.x Date30 Close.y
#> <date> <dbl> <dbl> <dbl> <dbl> <date> <dbl>
#> 1 2014-04-15 1831. 1844. 1816. 1843. 2014-05-15 NA
#> 2 2014-04-14 1818. 1834. 1816. 1831. 2014-05-14 NA
#> 3 2014-04-11 1831. 1835. 1814. 1816. 2014-05-11 NA
#> 4 2014-04-10 1872. 1873. 1831. 1833. 2014-05-10 NA
#> 5 2014-04-09 1853. 1872. 1852. 1872. 2014-05-09 NA
#> 6 2014-04-08 1845. 1855. 1837. 1852. 2014-05-08 NA
#> 7 2014-04-07 1864. 1864. 1841. 1845. 2014-05-07 NA
#> 8 2014-04-04 1890. 1897. 1863. 1865. 2014-05-04 NA
#> 9 2014-04-03 1891. 1894. 1883. 1889. 2014-05-03 NA
#> 10 2014-04-02 1887. 1893. 1884. 1891. 2014-05-02 NA
#> # … with 60 more rows
Created on 2020-02-22 by the reprex package (v0.3.0)
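An exact join like the one above returns NA whenever Date30 falls on a weekend or holiday. If a rolling lookup is acceptable, one option is to take the Close of the first trading day on or after Date30. The sketch below uses base R's findInterval; that rolling rule is an assumption (the question does not say which neighbouring day to use), and the data frame sp holds only a handful of rows copied from the question.

```r
# A few rows copied from the question, sorted by ascending date.
sp <- data.frame(
  Date  = as.Date(c("2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09",
                    "2014-01-10", "2014-02-05", "2014-02-06", "2014-02-07",
                    "2014-02-10")),
  Close = c(1826.77, 1837.88, 1837.49, 1838.13, 1842.37,
            1751.64, 1773.43, 1797.02, 1799.84)
)
sp <- sp[order(sp$Date), ]          # findInterval needs a sorted lookup vector
sp$Date30 <- sp$Date + 30

# With left.open = TRUE, findInterval counts the trading days strictly before
# Date30, so adding 1 gives the index of the first trading day on or after it.
idx <- findInterval(as.numeric(sp$Date30), as.numeric(sp$Date), left.open = TRUE) + 1

# An index past the end of the data yields NA, i.e. no future price is known yet.
sp$Close30 <- sp$Close[idx]
sp
```

Here 2014-01-09 + 30 days is Saturday 2014-02-08, so the lookup rolls forward to Monday 2014-02-10. Rolling backward to the last trading day on or before Date30 would be the same call without left.open and without the + 1.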
DATA
tidySP500 <- read.so::read_so('Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05')

How to calculate distance and time between two locations

Here's a sample of some data
Tag.ID TimeStep.coa Latitude.coa Longitude.coa
<chr> <dttm> <dbl> <dbl>
1 1657 2017-08-17 12:00:00 72.4 -81.1
2 1657 2017-08-17 18:00:00 72.3 -81.1
3 1658 2017-08-14 18:00:00 72.3 -81.2
4 1658 2017-08-15 00:00:00 72.3 -81.3
5 1659 2017-08-14 18:00:00 72.3 -81.1
6 1659 2017-08-15 00:00:00 72.3 -81.2
7 1660 2017-08-20 18:00:00 72.3 -81.1
8 1660 2017-08-21 00:00:00 72.3 -81.2
9 1660 2017-08-21 06:00:00 72.3 -81.2
10 1660 2017-08-21 12:00:00 72.3 -81.3
11 1661 2017-08-28 12:00:00 72.4 -81.1
12 1661 2017-08-28 18:00:00 72.3 -81.1
13 1661 2017-08-29 06:00:00 72.3 -81.2
14 1661 2017-08-29 12:00:00 72.3 -81.2
15 1661 2017-08-30 06:00:00 72.3 -81.2
16 1661 2017-08-30 18:00:00 72.3 -81.2
17 1661 2017-08-31 00:00:00 72.3 -81.2
18 1661 2017-08-31 06:00:00 72.3 -81.2
19 1661 2017-08-31 12:00:00 72.3 -81.2
20 1661 2017-08-31 18:00:00 72.4 -81.1
I'm looking for a method to obtain distances travelled for each ID. I will be using the ComputeDistance function within VTrack package (could use a different function though). The function looks like this:
ComputeDistance( Lat1, Lat2, Lon1, Lon2)
This calculates a straight line distance between lat/lon coordinates.
I eventually want a dataframe with four columns Tag.ID, Timestep1, Timestep2, and distance. Here's an example:
Tag.ID Timestep1 Timestep2 Distance
1657 2017-08-17 12:00:00 2017-08-17 18:00:00 ComputeDistance(72.4,72.3,-81.1,-81.1)
1658 2017-08-14 18:00:00 2017-08-15 00:00:00 ComputeDistance(72.3,72.3,-81.2,-81.3)
1659 2017-08-14 18:00:00 2017-08-15 00:00:00 ComputeDistance(72.3,72.3,-81.1,-81.2)
1660 2017-08-20 18:00:00 2017-08-21 00:00:00 ComputeDistance(72.3,72.3,-81.1,-81.2)
1660 2017-08-21 00:00:00 2017-08-21 06:00:00 ComputeDistance(72.3,72.3,-81.1,-81.2)
And so on
EDIT:
This is the code I used (thanks AntoniosK). COASpeeds2 is exactly the same as the sample df above:
test <- COASpeeds2 %>%
group_by(Tag.ID) %>%
mutate(Timestep1 = TimeStep.coa,
Timestep2 = lead(TimeStep.coa),
Distance = ComputeDistance(Latitude.coa, lead(Latitude.coa),
Longitude.coa, lead(Longitude.coa))) %>%
ungroup() %>%
na.omit() %>%
select(Tag.ID, Timestep1, Timestep2, Distance)
This is the df I'm getting.
Tag.ID Timestep1 Timestep2 Distance
<fct> <dttm> <dttm> <dbl>
1 1657 2017-08-17 12:00:00 2017-08-17 18:00:00 2.76
2 1657 2017-08-17 18:00:00 2017-08-14 18:00:00 1.40
3 1658 2017-08-14 18:00:00 2017-08-15 00:00:00 6.51
4 1658 2017-08-15 00:00:00 2017-08-14 18:00:00 10.5
5 1659 2017-08-14 18:00:00 2017-08-15 00:00:00 7.51
6 1659 2017-08-15 00:00:00 2017-08-20 18:00:00 7.55
7 1660 2017-08-20 18:00:00 2017-08-21 00:00:00 3.69
8 1660 2017-08-21 00:00:00 2017-08-21 06:00:00 4.32
9 1660 2017-08-21 06:00:00 2017-08-21 12:00:00 3.26
10 1660 2017-08-21 12:00:00 2017-08-28 12:00:00 10.5
11 1661 2017-08-28 12:00:00 2017-08-28 18:00:00 1.60
12 1661 2017-08-28 18:00:00 2017-08-29 06:00:00 1.94
13 1661 2017-08-29 06:00:00 2017-08-29 12:00:00 5.22
14 1661 2017-08-29 12:00:00 2017-08-30 06:00:00 0.759
15 1661 2017-08-30 06:00:00 2017-08-30 18:00:00 1.94
16 1661 2017-08-30 18:00:00 2017-08-31 00:00:00 0.342
17 1661 2017-08-31 00:00:00 2017-08-31 06:00:00 0.281
18 1661 2017-08-31 06:00:00 2017-08-31 12:00:00 4.21
19 1661 2017-08-31 12:00:00 2017-08-31 18:00:00 8.77
library(tidyverse)
library(VTrack)
# example data
dt = read.table(text = "
Tag.ID TimeStep.coa Latitude.coa Longitude.coa
1 1657 2017-08-17_12:00:00 72.4 -81.1
2 1657 2017-08-17_18:00:00 72.3 -81.1
3 1658 2017-08-14_18:00:00 72.3 -81.2
4 1658 2017-08-15_00:00:00 72.3 -81.3
5 1659 2017-08-14_18:00:00 72.3 -81.1
6 1659 2017-08-15_00:00:00 72.3 -81.2
7 1660 2017-08-20_18:00:00 72.3 -81.1
8 1660 2017-08-21_00:00:00 72.3 -81.2
9 1660 2017-08-21_06:00:00 72.3 -81.2
10 1660 2017-08-21_12:00:00 72.3 -81.3
", header=T)
dt %>%
group_by(Tag.ID) %>%
mutate(Timestep1 = TimeStep.coa,
Timestep2 = lead(TimeStep.coa),
Distance = ComputeDistance(Latitude.coa, lead(Latitude.coa),
Longitude.coa, lead(Longitude.coa))) %>%
ungroup() %>%
na.omit() %>%
select(Tag.ID, Timestep1, Timestep2, Distance)
As a result you get this:
# # A tibble: 6 x 4
# Tag.ID Timestep1 Timestep2 Distance
# <int> <fct> <fct> <dbl>
# 1 1657 2017-08-17_12:00:00 2017-08-17_18:00:00 11.1
# 2 1658 2017-08-14_18:00:00 2017-08-15_00:00:00 3.38
# 3 1659 2017-08-14_18:00:00 2017-08-15_00:00:00 3.38
# 4 1660 2017-08-20_18:00:00 2017-08-21_00:00:00 3.38
# 5 1660 2017-08-21_00:00:00 2017-08-21_06:00:00 0.0000949
# 6 1660 2017-08-21_06:00:00 2017-08-21_12:00:00 3.38
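You could use geosphere::distGeo in a by approach. Note that distGeo expects points as (longitude, latitude), whereas columns 3:4 here are (latitude, longitude), and that the time difference is reported in automatically chosen units, so the 3.25 shown for tag 1661 is in days rather than hours.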
library(geosphere)
do.call(rbind.data.frame, by(dat, dat$Tag.ID, function(s) {
t.diff <- (s$TimeStep.coa[length(s$TimeStep.coa)] - s$TimeStep.coa[1])
d.diff <- sum(mapply(function(x, y)
distGeo(s[x, 3:4], s[y, 3:4]), x=1:(nrow(s)-1), y=2:nrow(s)))/1e3
`colnames<-`(cbind(t.diff, d.diff), c("hours", "km"))
}))
# hours km
# 1657 6.00 1.727882
# 1658 6.00 11.166785
# 1659 6.00 11.166726
# 1660 18.00 22.333511
# 1661 3.25 24.192753
Data:
dat <- structure(list(Tag.ID = c(1657L, 1657L, 1658L, 1658L, 1659L,
1659L, 1660L, 1660L, 1660L, 1660L, 1661L, 1661L, 1661L, 1661L,
1661L, 1661L, 1661L, 1661L, 1661L, 1661L), TimeStep.coa = structure(c(1502964000,
1502985600, 1502726400, 1502748000, 1502726400, 1502748000, 1503244800,
1503266400, 1503288000, 1503309600, 1503914400, 1503936000, 1503979200,
1504000800, 1504065600, 1504108800, 1504130400, 1504152000, 1504173600,
1504195200), class = c("POSIXct", "POSIXt"), tzone = ""), Latitude.coa = c(72.4,
72.3, 72.3, 72.3, 72.3, 72.3, 72.3, 72.3, 72.3, 72.3, 72.4, 72.3,
72.3, 72.3, 72.3, 72.3, 72.3, 72.3, 72.3, 72.4), Longitude.coa = c(-81.1,
-81.1, -81.2, -81.3, -81.1, -81.2, -81.1, -81.2, -81.2, -81.3,
-81.1, -81.1, -81.2, -81.2, -81.2, -81.2, -81.2, -81.2, -81.2,
-81.1)), row.names = c(NA, -20L), class = "data.frame")
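As a dependency-free cross-check, the leg-by-leg table from the question can also be built with a hand-written haversine formula. This is a rough sketch under assumptions, not a replacement for VTrack or geosphere: the function haversine_km, the spherical Earth radius of 6371 km, and the tracks sample (a few rows copied from the question) are all illustrative choices.

```r
# Great-circle distance in km on a sphere of radius r (an approximation).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Rows for tags 1657 and 1660, copied from the question.
tracks <- data.frame(
  Tag.ID = c(1657, 1657, 1660, 1660, 1660, 1660),
  TimeStep.coa = as.POSIXct(c("2017-08-17 12:00:00", "2017-08-17 18:00:00",
                              "2017-08-20 18:00:00", "2017-08-21 00:00:00",
                              "2017-08-21 06:00:00", "2017-08-21 12:00:00"),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  Latitude.coa  = c(72.4, 72.3, 72.3, 72.3, 72.3, 72.3),
  Longitude.coa = c(-81.1, -81.1, -81.1, -81.2, -81.2, -81.3)
)

# One output row per consecutive pair of fixes within each tag.
legs <- do.call(rbind, lapply(split(tracks, tracks$Tag.ID), function(s) {
  n <- nrow(s)
  if (n < 2) return(NULL)                       # a single fix has no leg
  data.frame(
    Tag.ID    = s$Tag.ID[-n],
    Timestep1 = s$TimeStep.coa[-n],
    Timestep2 = s$TimeStep.coa[-1],
    # explicit units so every leg is reported in hours
    Hours     = as.numeric(difftime(s$TimeStep.coa[-1], s$TimeStep.coa[-n],
                                    units = "hours")),
    Distance_km = haversine_km(s$Latitude.coa[-n], s$Longitude.coa[-n],
                               s$Latitude.coa[-1], s$Longitude.coa[-1])
  )
}))
rownames(legs) <- NULL
legs
```

Splitting by Tag.ID before pairing rows guarantees that no leg runs from the last fix of one tag to the first fix of the next.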
Assuming the start and ending points are in order and have a matching pair.
Here is another option:
#identify the start and end of each trip
df$leg<-rep(c("Start", "End"), nrow(df)/2)
#label each trip
df$trip <- rep(1:(nrow(df)/2), each=2)
#change the shape
library(tidyr)
output<-pivot_wider(df, id_cols = c(Tag.ID, trip),
names_from = leg,
values_from = c(TimeStep.coa, Latitude.coa, Longitude.coa))
#calculate distance (use your package of choice)
library(geosphere)
output$distance<-distGeo(output[ ,c("Longitude.coa_Start", "Latitude.coa_Start")],
output[ ,c("Longitude.coa_End", "Latitude.coa_End")])
# #remove undesired columns
# output <- output[, -c(5, 6, 7, 8)]
output
> output[, -c(5, 6, 7, 8)]
# A tibble: 10 x 5
Tag.ID trip TimeStep.coa_Start TimeStep.coa_End distance
<int> <int> <fct> <fct> <dbl>
1 1657 1 2017-08-17 12:00:00 2017-08-17 18:00:00 11159.
2 1658 2 2017-08-14 18:00:00 2017-08-15 00:00:00 3395.
3 1659 3 2017-08-14 18:00:00 2017-08-15 00:00:00 3395.
4 1660 4 2017-08-20 18:00:00 2017-08-21 00:00:00 3395.
5 1660 5 2017-08-21 06:00:00 2017-08-21 12:00:00 3395.
6 1661 6 2017-08-28 12:00:00 2017-08-28 18:00:00 11159.
7 1661 7 2017-08-29 06:00:00 2017-08-29 12:00:00 0
8 1661 8 2017-08-30 06:00:00 2017-08-30 18:00:00 0
9 1661 9 2017-08-31 00:00:00 2017-08-31 06:00:00 0
10 1661 10 2017-08-31 12:00:00 2017-08-31 18:00:00 11661.

Fill Missing Interval Values in r

I have data with 4 variables, 2 of which are date variables. I would like to check whether the intervals for rows with TYPE == "OT" or TYPE == "NON-OT" fall within the interval of the preceding row with TYPE == "ICU".
Data:
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1), TYPE = c("NON-OT", "NON-OT", "OT", "ICU", "OT",
"NON-OT", "OT", "NON-OT", "ICU", "OT", "OT", "ICU", "OT", "OT",
"NON-OT", "OT", "NON-OT"), DATE1 = structure(c(1427214540, 1427216280,
1427279700, 1427370420, 1427543700, 1427564520, 1427800800, 1427849280,
1427850240, 1427927400, 1428155400, 1428166380, 1428514500, 1428927000,
1429167600, 1429264500, 1429388160), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), DATE2 = structure(c(1427216280, 1427370420,
1427279700, 1427564520, 1427543700, 1427849280, 1427800800, 1427850240,
1428166380, 1427927400, 1428155400, 1429388160, 1428514500, 1428927000,
1429167600, 1429264500, 1430362020), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), .Names = c("id", "TYPE", "DATE1", "DATE2"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-17L))
# id TYPE DATE1 DATE2
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00
This is what I have done:
Obtain a new variable, INT, that gives the interval between DATE1 and DATE2 for every row.
Obtain another variable, INT_ICU, that gives the interval for rows with TYPE == "ICU" only, and fill it down. (This is where the problem arises: the fill function in tidyr does not fill in the missing interval values.)
Obtain a logical variable, WITHIN_ICU, which is TRUE if the interval lies within the ICU interval and FALSE otherwise.
Code:
library(tidyverse)
library(lubridate) # interval() and %within% come from lubridate
df %>%
mutate(INT = interval(DATE1, DATE2),
INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
fill(INT_ICU) %>%
mutate(WITHIN_ICU = INT %within% INT_ICU)
Output:
As you can see, INT_ICU still has a lot of missing values even after applying the fill function.
# id TYPE DATE1 DATE2 INT INT_ICU WITHIN_ICU
# <dbl> <chr> <dttm> <dttm> <S4: Interval> <S4: Interval> <lgl>
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 2015-03-24 16:29:00 UTC--2015-03-24 16:58:00 UTC NA--NA NA
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 2015-03-24 16:58:00 UTC--2015-03-26 11:47:00 UTC NA--NA NA
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00 2015-03-25 10:35:00 UTC--2015-03-25 10:35:00 UTC NA--NA NA
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-28 11:55:00 UTC--2015-03-28 11:55:00 UTC NA--NA NA
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-28 17:42:00 UTC--2015-04-01 00:48:00 UTC NA--NA NA
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-31 11:20:00 UTC--2015-03-31 11:20:00 UTC NA--NA NA
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-04-01 00:48:00 UTC--2015-04-01 01:04:00 UTC NA--NA NA
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 22:30:00 UTC--2015-04-01 22:30:00 UTC NA--NA NA
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-04 13:50:00 UTC--2015-04-04 13:50:00 UTC NA--NA NA
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-08 17:35:00 UTC--2015-04-08 17:35:00 UTC NA--NA NA
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-13 12:10:00 UTC--2015-04-13 12:10:00 UTC NA--NA NA
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-16 07:00:00 UTC--2015-04-16 07:00:00 UTC NA--NA NA
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-17 09:55:00 UTC--2015-04-17 09:55:00 UTC NA--NA NA
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-18 20:16:00 UTC--2015-04-30 02:47:00 UTC NA--NA NA
Desired Output:
# id TYPE DATE1 DATE2 WITHIN_ICU
# <dbl> <chr> <dttm> <dttm> <lgl>
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 NA
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 NA
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00 NA
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 TRUE
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00 TRUE
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 FALSE
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00 FALSE
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 FALSE
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 TRUE
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00 TRUE
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00 TRUE
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 TRUE
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00 TRUE
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00 TRUE
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 TRUE
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00 TRUE
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 FALSE
This should work
# use a custom function to fill, rather than tidyr's fill
f2 <- function(x) {
  for (i in seq_along(x)[-1]) {
    # the start slot of the S4 Interval object is NA for a missing interval
    if (is.na(x@start[i])) x[i] <- x[i - 1]
  }
  x
}
df %>%
  mutate(INT = interval(DATE1, DATE2),
         INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
  mutate(INT_ICU = f2(INT_ICU)) %>% # instead of fill
  mutate(WITHIN_ICU = INT %within% INT_ICU)
The output:
# A tibble: 17 x 6
id TYPE DATE1 DATE2 INT_ICU WITHIN_ICU
<dbl> <chr> <dttm> <dttm> <S4: Interval> <lgl>
1 1. NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 NA--NA NA
2 1. NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 NA--NA NA
3 1. OT 2015-03-25 10:35:00 2015-03-25 10:35:00 NA--NA NA
4 1. ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
5 1. OT 2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
6 1. NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
7 1. OT 2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
8 1. NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
9 1. ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
10 1. OT 2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
11 1. OT 2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
12 1. ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
13 1. OT 2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
14 1. OT 2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
15 1. NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
16 1. OT 2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
17 1. NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC FALSE
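The same check can also be sketched without filling an S4 Interval column, by carrying forward the row index of the most recent ICU stay and comparing the endpoints directly. This is only a minimal base R sketch under stated assumptions, not the method used in the answer above: the names `last_icu`, `icu_start` and `icu_end` are made up for illustration, the sample rows are the first seven rows of the example data, and the rows are assumed to be ordered so that each row is compared with the latest ICU row at or above it.

```r
# First seven rows of the example data (times parsed as UTC)
df <- data.frame(
  TYPE  = c("NON-OT", "NON-OT", "OT", "ICU", "OT", "NON-OT", "OT"),
  DATE1 = as.POSIXct(c("2015-03-24 16:29:00", "2015-03-24 16:58:00",
                       "2015-03-25 10:35:00", "2015-03-26 11:47:00",
                       "2015-03-28 11:55:00", "2015-03-28 17:42:00",
                       "2015-03-31 11:20:00"), tz = "UTC"),
  DATE2 = as.POSIXct(c("2015-03-24 16:58:00", "2015-03-26 11:47:00",
                       "2015-03-25 10:35:00", "2015-03-28 17:42:00",
                       "2015-03-28 11:55:00", "2015-04-01 00:48:00",
                       "2015-03-31 11:20:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Index of the most recent ICU row at or above each row (NA before the first ICU row)
last_icu <- cummax(ifelse(df$TYPE == "ICU", seq_len(nrow(df)), 0L))
last_icu[last_icu == 0] <- NA

# Start and end of that ICU stay, aligned with the rows of df
icu_start <- df$DATE1[last_icu]
icu_end   <- df$DATE2[last_icu]

# TRUE when the row's whole span lies inside that ICU stay (endpoints inclusive)
df$WITHIN_ICU <- df$DATE1 >= icu_start & df$DATE2 <= icu_end
df$WITHIN_ICU
# NA NA NA TRUE TRUE FALSE FALSE
```

Rows before the first ICU stay come out as NA, which matches the desired output shown in the question.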

extract the remaining time period

I have two data frames.
df1
Tstart Tend start_temp
2012-12-19 21:12:00 2012-12-20 02:48:00 17.7637930350627
2013-01-31 17:36:00 2013-01-31 22:54:00 18.9618654078963
2013-02-14 09:12:00 2013-02-14 09:48:00 18.2361739981826
2013-02-21 15:36:00 2013-02-21 16:36:00 20.9938186870285
2013-03-21 03:54:00 2013-03-21 05:18:00 16.7130008152092
2013-03-30 23:42:00 2013-03-31 02:30:00 15.3775459369926
df2
datetime airtemp
2012-12-11 23:00:00 14.40
2012-12-11 23:06:00 14.22
2012-12-11 23:12:00 14.04
2012-12-11 23:18:00 13.86
2012-12-11 23:24:00 13.68
2012-12-11 23:30:00 13.50
......
2015-03-31 23:24:00 15.46
2015-03-31 23:30:00 15.90
2015-03-31 23:36:00 15.82
2015-03-31 23:42:00 15.74
I want to extract the remaining datetimes from df2 (df2 is a time series), i.e. everything other than the periods between Tstart and Tend in df1.
Can you please help me do this?
Many thanks.
With base R we can try the following, using these sample versions of df1 and df2:
df1 <- read.csv(text='Tstart, Tend, start_temp
2012-12-19 21:12:00, 2012-12-20 02:48:00, 17.7637930350627
2013-01-31 17:36:00, 2013-01-31 22:54:00, 18.9618654078963
2013-02-14 09:12:00, 2013-02-14 09:48:00, 18.2361739981826
2013-02-21 15:36:00, 2013-02-21 16:36:00, 20.9938186870285
2013-03-21 03:54:00, 2013-03-21 05:18:00, 16.7130008152092
2013-03-30 23:42:00, 2013-03-31 02:30:00, 15.3775459369926', header=TRUE)
df2 <- read.csv(text='datetime, airtemp
2012-12-11 23:00:00, 14.40
2012-12-11 23:06:00, 14.22
2012-12-11 23:12:00, 14.04
2012-12-11 23:18:00, 13.86
2012-12-11 23:24:00, 13.68
2012-12-19 23:30:00, 13.50
2013-03-21 04:24:00, 15.46
2013-03-21 23:30:00, 15.90
2015-03-31 23:36:00, 15.82
2015-03-31 23:42:00, 15.74', header=TRUE)
# parse the timestamp columns
df1$Tstart <- strptime(as.character(df1$Tstart), '%Y-%m-%d %H:%M:%S')
df1$Tend <- strptime(as.character(df1$Tend), '%Y-%m-%d %H:%M:%S')
df2$datetime <- strptime(as.character(df2$datetime), '%Y-%m-%d %H:%M:%S')
# keep a df2 row only if its datetime lies outside every [Tstart, Tend] period in df1
indices <- sapply(1:nrow(df2), function(j)
  all(sapply(1:nrow(df1), function(i)
    df2[j,]$datetime < df1[i,]$Tstart | df2[j,]$datetime > df1[i,]$Tend)))
df2[indices,]
# datetime airtemp
#1 2012-12-11 23:00:00 14.40
#2 2012-12-11 23:06:00 14.22
#3 2012-12-11 23:12:00 14.04
#4 2012-12-11 23:18:00 13.86
#5 2012-12-11 23:24:00 13.68
#8 2013-03-21 23:30:00 15.90
#9 2015-03-31 23:36:00 15.82
#10 2015-03-31 23:42:00 15.74
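The nested sapply above tests each df2 row against each df1 period one pair at a time. The same logic can be sketched with outer(), which builds the whole rows-by-periods comparison in one step. This is a hedged sketch rather than a drop-in replacement: it assumes the columns are POSIXct (here parsed as UTC), it uses a shortened subset of the sample data, and the names `t2`, `inside`, `keep` and `remaining` are illustrative.

```r
# Two of the excluded periods from df1
df1 <- data.frame(
  Tstart = as.POSIXct(c("2012-12-19 21:12:00", "2013-03-21 03:54:00"), tz = "UTC"),
  Tend   = as.POSIXct(c("2012-12-20 02:48:00", "2013-03-21 05:18:00"), tz = "UTC")
)
# Four observations from df2: the 2nd and 3rd fall inside a period
df2 <- data.frame(
  datetime = as.POSIXct(c("2012-12-11 23:00:00", "2012-12-19 23:30:00",
                          "2013-03-21 04:24:00", "2013-03-21 23:30:00"), tz = "UTC"),
  airtemp  = c(14.40, 13.50, 15.46, 15.90)
)

t2 <- as.numeric(df2$datetime)
# inside[j, i] is TRUE when df2 row j lies within period i of df1 (endpoints inclusive)
inside <- outer(t2, as.numeric(df1$Tstart), ">=") &
          outer(t2, as.numeric(df1$Tend), "<=")
# keep the rows that fall inside none of the periods
keep <- rowSums(inside) == 0
remaining <- df2[keep, ]
remaining
```

The comparison matrix has nrow(df2) * nrow(df1) cells, so this trades memory for speed and suits moderately sized inputs.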

How to convert a ts into a data.frame?

> print( ts(as.character(seq(as.Date("2013-9-1"),length.out=30,by=1)), frequency = 7, start = c(1, 7)), calendar = TRUE)
p1 p2 p3 p4 p5 p6 p7
1 2013-09-01
2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
6 2013-09-30
I want to get a data.frame from the ts above, with two features:
1. rownames are 1 2 3 4 5 6
2. colnames are Mon Tue Wed Thu Fri Sat Sun
How can I get it?
Mon Tue Wed Thu Fri Sat Sun
1 2013-09-01
2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
6 2013-09-30
Ideally, I would like the quickest way to get this data.frame from my code.
I would try something like this:
## Your daily time series data
out <- ts(as.character(seq(as.Date("2013-9-1"),
length.out = 30, by = 1)),
frequency = 7, start = c(1, 7))
## Comes in useful later
WD <- c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday")
## Create your data as a long data.frame
## Extract the weekdays using the weekdays function
out2 <- data.frame(weekday = weekdays(as.Date(as.character(out))), out)
## Use cumsum to determine the weeks. We'll start our weeks on Monday
out2$week <- cumsum(out2$weekday == "Monday")
## This is your new "long" dataset
head(out2)
# weekday out week
# 1 Sunday 2013-09-01 0
# 2 Monday 2013-09-02 1
# 3 Tuesday 2013-09-03 1
# 4 Wednesday 2013-09-04 1
# 5 Thursday 2013-09-05 1
# 6 Friday 2013-09-06 1
From there, it is pretty easy to "reshape" your data (either with base R's reshape, or more conveniently, with dcast from "reshape2").
library(reshape2)
dcast(out2, week ~ weekday, value.var="out", fill="")[WD]
# Monday Tuesday Wednesday Thursday Friday Saturday Sunday
# 1 2013-09-01
# 2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
# 3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
# 4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
# 5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
# 6 2013-09-30
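If the dates are consecutive, the calendar layout can also be sketched without reshaping at all: pad the vector with blanks so that it starts on a Monday and ends on a Sunday, then fold it into a 7-column matrix. This is a minimal base R sketch under that assumption (no missing days), with the illustrative names `dates`, `dow`, `padded` and `cal`. It uses the "%u" format code (ISO weekday, 1 = Monday), so it does not depend on locale-specific day names.

```r
dates <- seq(as.Date("2013-09-01"), length.out = 30, by = 1)

# ISO weekday number: 1 = Monday, ..., 7 = Sunday
dow <- as.integer(format(dates, "%u"))

# Pad the front so the first row starts on Monday, and the back to fill the last week
padded <- c(rep("", dow[1] - 1), as.character(dates))
padded <- c(padded, rep("", (-length(padded)) %% 7))

# One row per week, one column per weekday
cal <- as.data.frame(matrix(padded, ncol = 7, byrow = TRUE),
                     stringsAsFactors = FALSE)
names(cal) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
cal
```

The default row names are 1 to 6 here, and the column names are the short weekday labels asked for in the question.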
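This should work:
# `time` is assumed to hold the daily series from the question
time<-ts(as.character(seq(as.Date("2013-9-1"),length.out=30,by=1)),frequency=7,start=c(1,7))
time.df<-data.frame(date=as.Date(c(time)))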
time.df$day<-strftime(time.df$date,'%A')
time.df$year.week<-strftime(time.df$date,'%Y-%W') # Monday starts week.
# Just to avoid locale differences, get the names of the days of week in current locale.
dows<-strftime(seq(as.Date('2013-11-18'),(as.Date('2013-11-18')+6),by=1),'%A')
dow.order<-paste('date',dows,sep='.')
calendar<-reshape(time.df,idvar='year.week',timevar='day',direction='wide') [dow.order]
rownames(calendar)<-NULL
colnames(calendar)<-dows
calendar
# Monday Tuesday Wednesday Thursday Friday Saturday Sunday
# 1 <NA> <NA> <NA> <NA> <NA> <NA> 2013-09-01
# 2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
# 3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
# 4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
# 5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
# 6 2013-09-30 <NA> <NA> <NA> <NA> <NA> <NA>
But I wonder why you would ever need this.

Resources