How to convert a ts into a data.frame? - r

> print( ts(as.character(seq(as.Date("2013-9-1"),length.out=30,by=1)), frequency = 7, start = c(1, 7)), calendar = TRUE)
p1 p2 p3 p4 p5 p6 p7
1 2013-09-01
2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
6 2013-09-30
I want to turn the ts shown above into a data.frame with two features:
1. the row names are 1 2 3 4 5 6
2. the column names are Mon Tue Wed Thu Fri Sat Sun
How can I get it?
Mon Tue Wed Thu Fri Sat Sun
1 2013-09-01
2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
6 2013-09-30
Ideally I would like the quickest way to get this data.frame starting from my code.

I would try something like this:
## Your daily time series data
out <- ts(as.character(seq(as.Date("2013-9-1"),
length.out = 30, by = 1)),
frequency = 7, start = c(1, 7))
## Comes in useful later
WD <- c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday")
## Create your data as a long data.frame
## Extract the weekdays using the weekdays function
out2 <- data.frame(weekday = weekdays(as.Date(as.character(out))), out)
## Use cumsum to determine the weeks. We'll start our weeks on Monday
out2$week <- cumsum(out2$weekday == "Monday")
## This is your new "long" dataset
head(out2)
# weekday out week
# 1 Sunday 2013-09-01 0
# 2 Monday 2013-09-02 1
# 3 Tuesday 2013-09-03 1
# 4 Wednesday 2013-09-04 1
# 5 Thursday 2013-09-05 1
# 6 Friday 2013-09-06 1
From there, it is pretty easy to "reshape" your data (either with base R's reshape, or more conveniently, with dcast from "reshape2").
library(reshape2)
dcast(out2, week ~ weekday, value.var="out", fill="")[WD]
# Monday Tuesday Wednesday Thursday Friday Saturday Sunday
# 1 2013-09-01
# 2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
# 3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
# 4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
# 5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
# 6 2013-09-30

This should work:
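# `time` is the daily ts from the question
time<-ts(as.character(seq(as.Date("2013-9-1"),length.out=30,by=1)),frequency=7,start=c(1,7))
time.df<-data.frame(date=as.Date(c(time)))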
time.df$day<-strftime(time.df$date,'%A')
time.df$year.week<-strftime(time.df$date,'%Y-%W') # Monday starts week.
# Just to avoid locale differences, get the names of the days of week in current locale.
dows<-strftime(seq(as.Date('2013-11-18'),(as.Date('2013-11-18')+6),by=1),'%A')
dow.order<-paste('date',dows,sep='.')
calendar<-reshape(time.df,idvar='year.week',timevar='day',direction='wide') [dow.order]
rownames(calendar)<-NULL
colnames(calendar)<-dows
calendar
# Monday Tuesday Wednesday Thursday Friday Saturday Sunday
# 1 <NA> <NA> <NA> <NA> <NA> <NA> 2013-09-01
# 2 2013-09-02 2013-09-03 2013-09-04 2013-09-05 2013-09-06 2013-09-07 2013-09-08
# 3 2013-09-09 2013-09-10 2013-09-11 2013-09-12 2013-09-13 2013-09-14 2013-09-15
# 4 2013-09-16 2013-09-17 2013-09-18 2013-09-19 2013-09-20 2013-09-21 2013-09-22
# 5 2013-09-23 2013-09-24 2013-09-25 2013-09-26 2013-09-27 2013-09-28 2013-09-29
# 6 2013-09-30 <NA> <NA> <NA> <NA> <NA> <NA>
But I wonder why you would ever need this.
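If the dates are consecutive days, the same calendar layout can also be sketched in base R without any reshaping: pad the front so the first date lands under its weekday, then fill a 7-column matrix row by row. This is only a minimal sketch under that assumption; the names `dates`, `cells` and `cal` are made up for illustration.

```r
# Assumption: `dates` holds consecutive days with no gaps
dates <- seq(as.Date("2013-09-01"), length.out = 30, by = 1)

# "%u" is the ISO weekday number: 1 = Monday ... 7 = Sunday (locale-independent)
lead <- as.integer(format(dates[1], "%u")) - 1L

# Pad with empty cells before the first date and after the last one
cells <- c(rep("", lead), as.character(dates))
cells <- c(cells, rep("", (-length(cells)) %% 7))

# Fill week by week; row names default to 1, 2, 3, ...
cal <- as.data.frame(matrix(cells, ncol = 7, byrow = TRUE),
                     stringsAsFactors = FALSE)
names(cal) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
cal
```

Because the column names are hard-coded, the result does not depend on the locale, unlike the approaches that rely on weekdays() or strftime(..., '%A').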

Related

Find the value of a variable 30 days into future

Hi, below is my R data frame. It is an excerpt from S&P 500 data for the past 10 years or so. As you can see, I have created a column called Date30, which is the date + 30 days. I want to add a new column (using dplyr if I can) called Close30, which is the "Close" value on the date in "Date30". In other words, I want to look into the future from a given date (obviously it won't work for the last 30 days). It is a bit like offsetting a column, but it needs a filter/lookup function: the data contains business days only and I need to add 30 calendar days, so a constant offset won't do.
I have tried a few things but am getting nowhere...
Thanks so much if you can help!
tidySP500 = na.omit(SP500_Raw) # remove NAs in case future data have them
tidySP500$Date = as.Date(tidySP500$Date)
tidySP500 = tidySP500 %>%
select("Date", "Open", "High", "Low", "Price") %>% # select and re-order required variables
rename("Close" = "Price") %>%
filter(Date >= as.Date("2014-01-05") & Date <= (as.Date("2014-01-05")+100)) %>%
mutate(Date30 = Date + 30)# %>% #WORKS UP TO HERE
mutate(Close30 = Close[Date == Date30]) %>% # FAILS
mutate(Close30 = filter(Close, Date == Date30)) #FAILS
Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05
Something like this?
library(tidyverse)
tidySP500 %>% left_join(select(tidySP500, Close, Date30 = Date), by = c('Date30'))
#> # A tibble: 70 x 7
#> Date Open High Low Close.x Date30 Close.y
#> <date> <dbl> <dbl> <dbl> <dbl> <date> <dbl>
#> 1 2014-04-15 1831. 1844. 1816. 1843. 2014-05-15 NA
#> 2 2014-04-14 1818. 1834. 1816. 1831. 2014-05-14 NA
#> 3 2014-04-11 1831. 1835. 1814. 1816. 2014-05-11 NA
#> 4 2014-04-10 1872. 1873. 1831. 1833. 2014-05-10 NA
#> 5 2014-04-09 1853. 1872. 1852. 1872. 2014-05-09 NA
#> 6 2014-04-08 1845. 1855. 1837. 1852. 2014-05-08 NA
#> 7 2014-04-07 1864. 1864. 1841. 1845. 2014-05-07 NA
#> 8 2014-04-04 1890. 1897. 1863. 1865. 2014-05-04 NA
#> 9 2014-04-03 1891. 1894. 1883. 1889. 2014-05-03 NA
#> 10 2014-04-02 1887. 1893. 1884. 1891. 2014-05-02 NA
#> # … with 60 more rows
Created on 2020-02-22 by the reprex package (v0.3.0)
DATA
tidySP500 <- read.so::read_so('Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05')
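
The same lookup can also be sketched without a join, using match() to find the row whose Date equals each Date30. This is a minimal sketch on four rows taken from the data above; the name `prices` is made up for illustration. As with the join, dates that fall on a weekend or outside the data simply come back as NA.

```r
# Four rows from the question's data, enough to show the lookup
prices <- data.frame(
  Date  = as.Date(c("2014-01-06", "2014-01-07", "2014-02-05", "2014-02-06")),
  Close = c(1826.77, 1837.88, 1751.64, 1773.43)
)
prices$Date30 <- prices$Date + 30

# match() returns the row index where Date30 appears in Date (NA if absent)
prices$Close30 <- prices$Close[match(prices$Date30, prices$Date)]
prices
```

Inside a dplyr pipeline the equivalent step would be mutate(Close30 = Close[match(Date30, Date)]).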

Fill Missing Interval Values in r

I have a data set with 4 variables, 2 of which are date variables. I would like to check whether the intervals for rows with TYPE == "OT" or TYPE == "NON-OT" fall within the interval of the preceding row with TYPE == "ICU".
Data:
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1), TYPE = c("NON-OT", "NON-OT", "OT", "ICU", "OT",
"NON-OT", "OT", "NON-OT", "ICU", "OT", "OT", "ICU", "OT", "OT",
"NON-OT", "OT", "NON-OT"), DATE1 = structure(c(1427214540, 1427216280,
1427279700, 1427370420, 1427543700, 1427564520, 1427800800, 1427849280,
1427850240, 1427927400, 1428155400, 1428166380, 1428514500, 1428927000,
1429167600, 1429264500, 1429388160), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), DATE2 = structure(c(1427216280, 1427370420,
1427279700, 1427564520, 1427543700, 1427849280, 1427800800, 1427850240,
1428166380, 1427927400, 1428155400, 1429388160, 1428514500, 1428927000,
1429167600, 1429264500, 1430362020), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), .Names = c("id", "TYPE", "DATE1", "DATE2"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-17L))
# id TYPE DATE1 DATE2
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00
This is what I have done:
1. Obtain a new variable, INT, that gives the interval between DATE1 and DATE2 for every row.
2. Obtain another variable, INT_ICU, that gives the interval for rows with TYPE == "ICU" only, and fill it down. (This is where the problem arises: the fill function in tidyr does not fill in the missing interval values.)
3. Obtain a logical variable, WITHIN_ICU, which is TRUE if the interval lies within the ICU interval and FALSE otherwise.
Code:
library(tidyverse)
library(lubridate) # interval() and %within%
df %>%
mutate(INT = interval(DATE1, DATE2),
INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
fill(INT_ICU) %>%
mutate(WITHIN_ICU = INT %within% INT_ICU)
Output:
As you can see, INT_ICU still contains a lot of missing values even after applying the fill function.
# id TYPE DATE1 DATE2 INT INT_ICU WITHIN_ICU
# <dbl> <chr> <dttm> <dttm> <S4: Interval> <S4: Interval> <lgl>
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 2015-03-24 16:29:00 UTC--2015-03-24 16:58:00 UTC NA--NA NA
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 2015-03-24 16:58:00 UTC--2015-03-26 11:47:00 UTC NA--NA NA
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00 2015-03-25 10:35:00 UTC--2015-03-25 10:35:00 UTC NA--NA NA
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-28 11:55:00 UTC--2015-03-28 11:55:00 UTC NA--NA NA
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-28 17:42:00 UTC--2015-04-01 00:48:00 UTC NA--NA NA
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-31 11:20:00 UTC--2015-03-31 11:20:00 UTC NA--NA NA
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-04-01 00:48:00 UTC--2015-04-01 01:04:00 UTC NA--NA NA
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 22:30:00 UTC--2015-04-01 22:30:00 UTC NA--NA NA
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-04 13:50:00 UTC--2015-04-04 13:50:00 UTC NA--NA NA
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-08 17:35:00 UTC--2015-04-08 17:35:00 UTC NA--NA NA
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-13 12:10:00 UTC--2015-04-13 12:10:00 UTC NA--NA NA
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-16 07:00:00 UTC--2015-04-16 07:00:00 UTC NA--NA NA
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-17 09:55:00 UTC--2015-04-17 09:55:00 UTC NA--NA NA
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-18 20:16:00 UTC--2015-04-30 02:47:00 UTC NA--NA NA
Desired Output:
# id TYPE DATE1 DATE2 WITHIN_ICU
# <dbl> <chr> <dttm> <dttm> <lgl>
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 NA
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 NA
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00 NA
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 TRUE
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00 TRUE
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 FALSE
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00 FALSE
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 FALSE
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 TRUE
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00 TRUE
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00 TRUE
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 TRUE
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00 TRUE
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00 TRUE
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 TRUE
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00 TRUE
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 FALSE
This should work:
library(tidyverse)
library(lubridate)
# Use our own function to fill, rather than tidyr's fill
f2 <- function(x) {
  for (i in seq_along(x)[-1]) {
    # An NA start in the S4 Interval object marks a missing interval
    if (is.na(x@start[i])) x[i] <- x[i - 1]
  }
  x
}
df %>%
  mutate(INT = interval(DATE1, DATE2),
         INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
  mutate(INT_ICU = f2(INT_ICU)) %>% # instead of fill
  mutate(WITHIN_ICU = INT %within% INT_ICU)
The output:
# A tibble: 17 x 6
id TYPE DATE1 DATE2 INT_ICU WITHIN_ICU
<dbl> <chr> <dttm> <dttm> <S4: Interval> <lgl>
1 1. NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 NA--NA NA
2 1. NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 NA--NA NA
3 1. OT 2015-03-25 10:35:00 2015-03-25 10:35:00 NA--NA NA
4 1. ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
5 1. OT 2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
6 1. NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
7 1. OT 2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
8 1. NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
9 1. ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
10 1. OT 2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
11 1. OT 2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
12 1. ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
13 1. OT 2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
14 1. OT 2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
15 1. NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
16 1. OT 2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
17 1. NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC FALSE
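
Another way to sidestep the problem of filling Interval objects is to carry forward the row index of the most recent ICU row and compare the plain date-times directly. The following is a minimal base R sketch on four rows from the question (rows 3 to 6); `is_icu` and `last_icu` are made-up names, and the bounds are treated as inclusive.

```r
# Rows 3 to 6 of the question's data
df <- data.frame(
  TYPE  = c("OT", "ICU", "OT", "NON-OT"),
  DATE1 = as.POSIXct(c("2015-03-25 10:35:00", "2015-03-26 11:47:00",
                       "2015-03-28 11:55:00", "2015-03-28 17:42:00"), tz = "UTC"),
  DATE2 = as.POSIXct(c("2015-03-25 10:35:00", "2015-03-28 17:42:00",
                       "2015-03-28 11:55:00", "2015-04-01 00:48:00"), tz = "UTC")
)

is_icu <- df$TYPE == "ICU"

# Index of the most recent ICU row at or before each row (NA before the first ICU)
last_icu <- cummax(ifelse(is_icu, seq_along(is_icu), 0L))
last_icu[last_icu == 0] <- NA

# A row is within the ICU stay if it starts and ends inside that ICU row's dates
df$WITHIN_ICU <- df$DATE1 >= df$DATE1[last_icu] &
                 df$DATE2 <= df$DATE2[last_icu]
df
```

For these four rows the result is NA, TRUE, TRUE, FALSE, which agrees with rows 3 to 6 of the desired output. With several ids, the index would have to be computed per id.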

Aggregate on a daily basis in R

I'm borrowing the reproducible example given here:
Aggregate daily level data to weekly level in R
since it's pretty much close to what I want to do.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
That question asks how to aggregate on weekly intervals; what I'd like to do is aggregate on a "day of the week" basis.
So I'd like a table similar to this one, adding up the values for each day of the week:
Day of the week value
1 "Sunday" 60000
2 "Monday" 50000
3 "Tuesday" 60000
4 "Wednesday" 50000
5 "Thursday" 60000
6 "Friday" 50000
7 "Saturday" 60000
You can try:
aggregate(d$value, list(weekdays(as.Date(d$Interval))), sum)
We can group by day of the week using weekdays:
library(dplyr)
df %>%
group_by(Day_Of_The_Week = weekdays(as.Date(Interval))) %>%
summarise(value = sum(value))
# Day_Of_The_Week value
# <chr> <int>
#1 Friday 16903
#2 Monday 26368
#3 Saturday 4738
#4 Sunday 2975
#5 Thursday 17858
#6 Tuesday 23772
#7 Wednesday 13560
We can do this with data.table
library(data.table)
setDT(df1)[, .(value = sum(value)), .(Dayofweek = weekdays(as.Date(Interval)))]
# Dayofweek value
#1: Sunday 2975
#2: Monday 26368
#3: Tuesday 23772
#4: Wednesday 13560
#5: Thursday 17858
#6: Friday 16903
#7: Saturday 4738
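Using lubridate (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html):
library(lubridate)
# lubridate::wday is called explicitly because data.table also exports a wday()
df1$Weekday=lubridate::wday(as.Date(df1$Interval),label=TRUE)
library(data.table)
df1=data.table(df1)
df1[,sum(value),Weekday]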
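
Note that weekdays() returns locale-dependent names and the grouped result comes back in alphabetical order. If a fixed Monday-to-Sunday order is wanted, one option is to group on the numeric weekday instead. A minimal base R sketch with made-up values (1 to 14 over the first two weeks of the question's date range):

```r
# Made-up values for illustration; 2012-06-10 is a Sunday
d <- data.frame(Interval = as.Date("2012-06-10") + 0:13, value = 1:14)

# "%u" is the ISO weekday number: 1 = Monday ... 7 = Sunday (locale-independent)
d$dow <- as.integer(format(d$Interval, "%u"))

# Sum per weekday; aggregate() returns the groups sorted by dow
totals <- aggregate(value ~ dow, data = d, FUN = sum)

# Attach readable names afterwards
totals$day <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")[totals$dow]
totals
```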

cut by interval and aggregate over one month in R

I have the data below: all bike trips that started from a particular station during October 2013. I'd like to count the number of trips that occurred within ten-minute time intervals. There should be a total of 144 rows, each with the sum of all trips that occurred within that interval over the entire month. How would one cut the data.frame and then aggregate by interval (so that trips occurring between 00:00:01 and 00:10:00 are counted in the second row, between 00:10:01 and 00:20:00 in the third row, and so on)?
head(one.station)
tripduration starttime stoptime start.station.id start.station.name
59 803 2013-10-01 00:11:49 2013-10-01 00:25:12 521 8 Ave & W 31 St
208 445 2013-10-01 00:40:05 2013-10-01 00:47:30 521 8 Ave & W 31 St
359 643 2013-10-01 01:25:57 2013-10-01 01:36:40 521 8 Ave & W 31 St
635 388 2013-10-01 05:30:30 2013-10-01 05:36:58 521 8 Ave & W 31 St
661 314 2013-10-01 05:38:00 2013-10-01 05:43:14 521 8 Ave & W 31 St
768 477 2013-10-01 05:54:49 2013-10-01 06:02:46 521 8 Ave & W 31 St
start.station.latitude start.station.longitude end.station.id end.station.name
59 40.75045 -73.99481 2003 1 Ave & E 18 St
208 40.75045 -73.99481 505 6 Ave & W 33 St
359 40.75045 -73.99481 508 W 46 St & 11 Ave
635 40.75045 -73.99481 459 W 20 St & 11 Ave
661 40.75045 -73.99481 462 W 22 St & 10 Ave
768 40.75045 -73.99481 457 Broadway & W 58 St
end.station.latitude end.station.longitude bikeid usertype birth.year gender
59 40.73416 -73.98024 15139 Subscriber 1985 1
208 40.74901 -73.98848 20538 Subscriber 1990 2
359 40.76341 -73.99667 19935 Customer \\N 0
635 40.74674 -74.00776 14781 Subscriber 1955 1
661 40.74692 -74.00452 17976 Subscriber 1982 1
768 40.76695 -73.98169 19022 Subscriber 1973 1
So that the output looks like this
output
interval total_trips
1 00:00:00 0
2 00:10:00 1
3 00:20:00 2
4 00:30:00 3
5 00:40:00 4
Here it is using only start time:
library(lubridate)
library(dplyr)
tripduration <- floor(runif(6) * 1000)
start_times <- as.POSIXlt(
c("2013-10-01 00:11:49"
,"2013-10-01 00:40:05"
,"2013-10-01 01:25:57"
,"2013-10-01 05:30:30"
,"2013-10-01 05:38:00"
,"2013-10-01 05:54:49")
)
time_bucket <- start_times - minutes(minute(start_times) %% 10) - seconds(second(start_times))
df <- data.frame(tripduration, start_times, time_bucket)
summarized <- df %>%
group_by(time_bucket) %>%
summarize(trip_count = n())
summarized <- as.data.frame(summarized)
out_buckets <- data.frame(out_buckets = seq(as.POSIXct("2013-10-01 00:00:00"), as.POSIXct("2013-10-01 06:00:00"), by = 600))
out <- left_join(out_buckets, summarized, by = c("out_buckets" = "time_bucket"))
out$trip_count[is.na(out$trip_count)] <- 0
out
out_buckets trip_count
1 2013-10-01 00:00:00 0
2 2013-10-01 00:10:00 1
3 2013-10-01 00:20:00 0
4 2013-10-01 00:30:00 0
5 2013-10-01 00:40:00 1
6 2013-10-01 00:50:00 0
7 2013-10-01 01:00:00 0
8 2013-10-01 01:10:00 0
9 2013-10-01 01:20:00 1
10 2013-10-01 01:30:00 0
11 2013-10-01 01:40:00 0
12 2013-10-01 01:50:00 0
13 2013-10-01 02:00:00 0
14 2013-10-01 02:10:00 0
15 2013-10-01 02:20:00 0
16 2013-10-01 02:30:00 0
17 2013-10-01 02:40:00 0
18 2013-10-01 02:50:00 0
19 2013-10-01 03:00:00 0
20 2013-10-01 03:10:00 0
21 2013-10-01 03:20:00 0
22 2013-10-01 03:30:00 0
23 2013-10-01 03:40:00 0
24 2013-10-01 03:50:00 0
25 2013-10-01 04:00:00 0
26 2013-10-01 04:10:00 0
27 2013-10-01 04:20:00 0
28 2013-10-01 04:30:00 0
29 2013-10-01 04:40:00 0
30 2013-10-01 04:50:00 0
31 2013-10-01 05:00:00 0
32 2013-10-01 05:10:00 0
33 2013-10-01 05:20:00 0
34 2013-10-01 05:30:00 2
35 2013-10-01 05:40:00 0
36 2013-10-01 05:50:00 1
37 2013-10-01 06:00:00 0
The lubridate library can provide one solution. It has a nice function for interval overlap logic. The below uses lapply to loop through the intervals provided in the data then buckets them accordingly.
library(lubridate)
start_times <- as.POSIXlt(
c("2013-10-01 00:11:49"
,"2013-10-01 00:40:05"
,"2013-10-01 01:25:57"
,"2013-10-01 05:30:30"
,"2013-10-01 05:38:00"
,"2013-10-01 05:54:49")
)
stop_times <- as.POSIXlt(
c("2013-10-01 00:25:12"
,"2013-10-01 00:47:30"
,"2013-10-01 01:36:40"
,"2013-10-01 05:36:58"
,"2013-10-01 05:43:14"
,"2013-10-01 06:02:46")
)
start_bucket <- seq(as.POSIXct("2013-10-01 00:00:00"), as.POSIXct("2013-10-01 06:0:00"), by = 600)
end_bucket <- start_bucket + 600
bucket_interval <- interval(start_bucket, end_bucket)
data_interval <- interval(start_times, stop_times)
int_list <- lapply(data_interval, function(x) ifelse(int_overlaps(x, bucket_interval),1,0))
rides_per_bucket <- rowSums(do.call(cbind, int_list))
out_df <- data.frame(bucket_interval, rides_per_bucket)
out_df
bucket_interval rides_per_bucket
1 2013-10-01 00:00:00 PDT--2013-10-01 00:10:00 PDT 0
2 2013-10-01 00:10:00 PDT--2013-10-01 00:20:00 PDT 1
3 2013-10-01 00:20:00 PDT--2013-10-01 00:30:00 PDT 1
4 2013-10-01 00:30:00 PDT--2013-10-01 00:40:00 PDT 0
5 2013-10-01 00:40:00 PDT--2013-10-01 00:50:00 PDT 1
6 2013-10-01 00:50:00 PDT--2013-10-01 01:00:00 PDT 0
7 2013-10-01 01:00:00 PDT--2013-10-01 01:10:00 PDT 0
8 2013-10-01 01:10:00 PDT--2013-10-01 01:20:00 PDT 0
9 2013-10-01 01:20:00 PDT--2013-10-01 01:30:00 PDT 1
10 2013-10-01 01:30:00 PDT--2013-10-01 01:40:00 PDT 1
11 2013-10-01 01:40:00 PDT--2013-10-01 01:50:00 PDT 0
12 2013-10-01 01:50:00 PDT--2013-10-01 02:00:00 PDT 0
13 2013-10-01 02:00:00 PDT--2013-10-01 02:10:00 PDT 0
14 2013-10-01 02:10:00 PDT--2013-10-01 02:20:00 PDT 0
15 2013-10-01 02:20:00 PDT--2013-10-01 02:30:00 PDT 0
16 2013-10-01 02:30:00 PDT--2013-10-01 02:40:00 PDT 0
17 2013-10-01 02:40:00 PDT--2013-10-01 02:50:00 PDT 0
18 2013-10-01 02:50:00 PDT--2013-10-01 03:00:00 PDT 0
19 2013-10-01 03:00:00 PDT--2013-10-01 03:10:00 PDT 0
20 2013-10-01 03:10:00 PDT--2013-10-01 03:20:00 PDT 0
21 2013-10-01 03:20:00 PDT--2013-10-01 03:30:00 PDT 0
22 2013-10-01 03:30:00 PDT--2013-10-01 03:40:00 PDT 0
23 2013-10-01 03:40:00 PDT--2013-10-01 03:50:00 PDT 0
24 2013-10-01 03:50:00 PDT--2013-10-01 04:00:00 PDT 0
25 2013-10-01 04:00:00 PDT--2013-10-01 04:10:00 PDT 0
26 2013-10-01 04:10:00 PDT--2013-10-01 04:20:00 PDT 0
27 2013-10-01 04:20:00 PDT--2013-10-01 04:30:00 PDT 0
28 2013-10-01 04:30:00 PDT--2013-10-01 04:40:00 PDT 0
29 2013-10-01 04:40:00 PDT--2013-10-01 04:50:00 PDT 0
30 2013-10-01 04:50:00 PDT--2013-10-01 05:00:00 PDT 0
31 2013-10-01 05:00:00 PDT--2013-10-01 05:10:00 PDT 0
32 2013-10-01 05:10:00 PDT--2013-10-01 05:20:00 PDT 0
33 2013-10-01 05:20:00 PDT--2013-10-01 05:30:00 PDT 0
34 2013-10-01 05:30:00 PDT--2013-10-01 05:40:00 PDT 2
35 2013-10-01 05:40:00 PDT--2013-10-01 05:50:00 PDT 1
36 2013-10-01 05:50:00 PDT--2013-10-01 06:00:00 PDT 1
37 2013-10-01 06:00:00 PDT--2013-10-01 06:10:00 PDT 1
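
Both answers above bucket by full date-time, so a month of data would give one row per ten minutes of the whole month. To get the 144 time-of-day rows the question asks for, one option is to bucket on seconds since midnight instead. This is a minimal base R sketch using start times only; the third timestamp is made up to show two different days landing in the same bucket, and buckets are labelled by their start time (the same convention as the first answer, not the "second row" convention in the question).

```r
start_times <- as.POSIXct(c("2013-10-01 00:11:49", "2013-10-01 00:40:05",
                            "2013-10-02 00:15:00"), tz = "UTC")

# Seconds since midnight, ignoring the date
lt <- as.POSIXlt(start_times, tz = "UTC")
secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Ten-minute bucket number: 0 (00:00) ... 143 (23:50)
bucket <- secs %/% 600

# Labels for all 144 buckets, so empty buckets still appear with a count of 0
idx <- 0:143
labels <- sprintf("%02d:%02d:00", idx %/% 6L, (idx %% 6L) * 10L)

counts <- table(factor(bucket, levels = idx, labels = labels))
out <- data.frame(interval = names(counts), total_trips = as.vector(counts))
head(out)
```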

R: find a range of values and replace them with NA in a CSV file

My data, collected from field measurements, are a bit messy and need some cleaning up before further calculation.
Can you please give me an example in R of how to find a range of values and replace them with NA in a CSV file?
Reasonable values should be between 1800 and 2200, or NA. Any value outside this range should be replaced with NA.
The dataset looks like this:
timestamp tr ts
1 2015-07-08 02:29:00 -40.5 1978.62
2 2015-07-08 02:30:00 1936.74 30.5
3 2015-07-08 02:31:00 1937.14 1978.99
4 2015-07-08 02:32:00 1937.66 1978.83
5 2015-07-08 02:33:00 402.4 1979.15
6 2015-07-08 02:45:00 1937.00 1979.00
7 2015-07-08 02:46:00 1937.75 1979.29
8 2015-07-08 02:47:00 1937.84 1978.44
9 2015-07-08 02:48:00 -30.23 3.5
10 2015-07-08 02:49:00 1937.82 1978.68
11 2015-07-08 02:50:00 1937.55 1979.60
12 2015-07-08 02:51:00 1937.55 1979.13
13 2015-07-08 02:52:00 1937.65 1979.12
14 2015-07-08 02:53:00 1937.56 1978.28
15 2015-07-08 02:54:00 1937.38 1978.99
16 2015-07-08 02:58:00 -22.34 1978.61
17 2015-07-08 02:59:00 1937.78 1978.85
18 2015-07-08 03:00:00 1937.71 100.42
19 2015-07-08 03:01:00 1937.14 1979.04
20 2015-07-08 03:02:00 2500.00 0.13
Dataset after screening and replacement.
timestamp tr ts
1 2015-07-08 02:29:00 NA 1978.62
2 2015-07-08 02:30:00 1936.74 NA
3 2015-07-08 02:31:00 1937.14 1978.99
4 2015-07-08 02:32:00 1937.66 1978.83
5 2015-07-08 02:33:00 NA 1979.15
6 2015-07-08 02:45:00 1937.00 1979.00
7 2015-07-08 02:46:00 1937.75 1979.29
8 2015-07-08 02:47:00 1937.84 1978.44
9 2015-07-08 02:48:00 NA NA
10 2015-07-08 02:49:00 1937.82 1978.68
11 2015-07-08 02:50:00 1937.55 1979.60
12 2015-07-08 02:51:00 1937.55 1979.13
13 2015-07-08 02:52:00 1937.65 1979.12
14 2015-07-08 02:53:00 1937.56 1978.28
15 2015-07-08 02:54:00 1937.38 1978.99
16 2015-07-08 02:58:00 NA 1978.61
17 2015-07-08 02:59:00 1937.78 1978.85
18 2015-07-08 03:00:00 1937.71 NA
19 2015-07-08 03:01:00 1937.14 1979.04
20 2015-07-08 03:02:00 NA NA
Thanks a lot guys.
It's a similar philosophy to @Ranalyst's answer, but I'm using an ifelse approach combined with sapply to update multiple columns.
dt = read.table(text = "timestamp tr ts
2015-07-08 -40.5 1978.62
2015-07-08 1936.74 30.5
2015-07-08 1937.14 1978.99
2015-07-08 1937.66 1978.83
2015-07-08 402.4 1979.15
2015-07-08 1937.00 1979.00", header=T)
dt
# timestamp tr ts
# 1 2015-07-08 -40.50 1978.62
# 2 2015-07-08 1936.74 30.50
# 3 2015-07-08 1937.14 1978.99
# 4 2015-07-08 1937.66 1978.83
# 5 2015-07-08 402.40 1979.15
# 6 2015-07-08 1937.00 1979.00
# select positions of columns to update
cols_to_update = 2:3
# update those columns
dt[,cols_to_update] = sapply(cols_to_update, function(x) ifelse(dt[,x] <= 1800 | dt[,x] >= 2200, NA, dt[,x]))
dt
# timestamp tr ts
# 1 2015-07-08 NA 1978.62
# 2 2015-07-08 1936.74 NA
# 3 2015-07-08 1937.14 1978.99
# 4 2015-07-08 1937.66 1978.83
# 5 2015-07-08 NA 1979.15
# 6 2015-07-08 1937.00 1979.00
# simulate some data
set.seed(123)
ts=rnorm(15,2000,300)
ts
1 1831.857
2 1930.947
3 2467.612
4 2021.153
5 2038.786
6 2514.519
7 2138.275
8 1620.482
9 1793.944
10 1866.301
11 2367.225
12 2107.944
13 2120.231
14 2033.205
15 1833.248
# then convert all numbers at or below 1800, or at or above 2200, to NA
ts[ts <= 1800 | ts >= 2200] = NA
as.data.frame(list(ts=ts))
ts
1 1831.857
2 1930.947
3 NA
4 2021.153
5 2038.786
6 NA
7 2138.275
8 NA
9 NA
10 1866.301
11 NA
12 2107.944
13 2120.231
14 2033.205
15 1833.248
or in your case if your data frame is called data
data$ts[data$ts <= 1800 | data$ts >= 2200] = NA
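
Since the question mentions a CSV file, the whole round trip could look roughly like the sketch below: read the file, blank out the out-of-range values in the numeric columns, and write it back. The helper name `out_of_range_to_na` and the temporary file are made up for illustration, and the bounds 1800 and 2200 are treated as valid values here (unlike the `<=`/`>=` comparisons above), which is an assumption about what "between" means.

```r
# Hypothetical helper: values outside [lo, hi] become NA, existing NAs stay NA
out_of_range_to_na <- function(x, lo = 1800, hi = 2200) {
  x[!is.na(x) & (x < lo | x > hi)] <- NA
  x
}

# Stand-in for the real file: three rows from the question written to a temp CSV
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(timestamp = c("2015-07-08 02:29:00",
                                   "2015-07-08 02:30:00",
                                   "2015-07-08 02:31:00"),
                     tr = c(-40.5, 1936.74, 1937.14),
                     ts = c(1978.62, 30.5, 1978.99)),
          tmp, row.names = FALSE)

dat <- read.csv(tmp)

# Clean only the measurement columns, leaving the timestamp untouched
num_cols <- c("tr", "ts")
dat[num_cols] <- lapply(dat[num_cols], out_of_range_to_na)

write.csv(dat, tmp, row.names = FALSE)
dat
```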
