Accessing data in a data frame based on dates in R

I have these measurements from my temperature sensor, which I put in a data frame called data.
Time Temperature
1 2012-06-28 12:49:00 23.04
2 2012-06-28 12:49:34 23.06
3 2012-06-28 12:49:38 23.06
4 2012-06-28 12:49:39 23.08
5 2012-06-28 12:49:40 23.08
6 2012-06-28 12:49:56 23.09
7 2012-06-28 13:49:00 23.02
8 2012-06-28 14:49:00 22.73
9 2012-06-28 15:49:00 22.50
10 2012-06-28 16:49:00 22.38
11 2012-06-28 17:49:00 22.31
12 2012-06-28 18:49:00 22.16
13 2012-06-28 19:49:00 22.11
14 2012-06-28 20:49:00 22.04
15 2012-06-28 21:49:00 21.89
16 2012-06-28 22:49:00 21.78
17 2012-06-28 23:49:00 21.66
18 2012-06-29 00:49:00 21.64
19 2012-06-29 01:49:00 21.52
20 2012-06-29 02:49:00 21.42
21 2012-06-29 03:49:00 21.36
22 2012-06-29 04:49:00 21.34
23 2012-06-29 05:49:00 21.24
24 2012-06-29 06:49:00 21.29
25 2012-06-29 07:27:08 21.32
26 2012-06-29 07:49:00 21.38
27 2012-06-29 08:49:00 21.39
28 2012-06-29 09:49:00 21.44
29 2012-06-29 10:49:00 21.42
30 2012-06-29 11:49:00 21.58
31 2012-06-29 12:49:00 21.96
32 2012-06-29 13:49:00 22.22
33 2012-06-29 14:49:00 22.33
34 2012-06-29 15:49:00 22.51
The class of data$Time is POSIXlt.
I want to create a new data frame that includes only the measurements for a given day, for example 2012-06-28. That would be data[1:17,].
I tried to use the which() function based on examples from the internet, but I failed to find a solution.
What function should I use?

In order to do that I used this:
library(lubridate)
data[date(data$Time)==ymd("2012-06-28"),]
It works just fine.

We can use as.Date
subset(data, as.Date(Time) == as.Date("2012-06-28"))
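Since the question mentions which(), here is a minimal base-R sketch along the same lines (assuming data$Time is POSIXlt or POSIXct as described):
# as.Date() drops the time of day, so we can compare against the target day directly
idx <- which(as.Date(data$Time) == as.Date("2012-06-28"))
data[idx, ]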

Related

Expand / impute time series in R

I have a time series that I need on a weekly basis; however, I currently have the data on a quarterly basis. For example:
R> test
Quarter week totA totB totC totD
1 1 2015-12-28 1745 1720 11 1714
2 2 2016-03-28 1736 1718 7 1710
3 3 2016-06-27 1777 1768 5 1750
4 4 2016-09-26 1833 1815 13 1795
5 1 2016-12-26 1708 1697 6 1677
R>
What I want is to have the information on a weekly basis. Each of the totals (totA to totD) needs to be divided by the number of weeks until the next quarter (i.e. 13, as there are 13 weeks per quarter in 2016, but very occasionally it might be 14 if there is a year with 53 weeks, such as 2015) so that the quarterly total stays the same. So, from the example above, the first 26 weeks become:
1 1 2015-12-28 134.9 132.3 0.8462 131.8
2 2 2016-01-04 134.9 132.3 0.8462 131.8
3 3 2016-01-11 134.9 132.3 0.8462 131.8
4 4 2016-01-18 134.9 132.3 0.8462 131.8
5 5 2016-01-25 134.9 132.3 0.8462 131.8
6 6 2016-02-01 134.9 132.3 0.8462 131.8
7 7 2016-02-08 134.9 132.3 0.8462 131.8
8 8 2016-02-15 134.9 132.3 0.8462 131.8
9 9 2016-02-22 134.9 132.3 0.8462 131.8
10 10 2016-02-29 134.9 132.3 0.8462 131.8
11 11 2016-03-07 134.9 132.3 0.8462 131.8
12 12 2016-03-14 134.9 132.3 0.8462 131.8
13 13 2016-03-21 134.9 132.3 0.8462 131.8
14 14 2016-03-28 133.5 132.2 0.5385 131.5
15 15 2016-04-04 133.5 132.2 0.5385 131.5
16 16 2016-04-11 133.5 132.2 0.5385 131.5
17 17 2016-04-18 133.5 132.2 0.5385 131.5
18 18 2016-04-25 133.5 132.2 0.5385 131.5
19 19 2016-05-02 133.5 132.2 0.5385 131.5
20 20 2016-05-09 133.5 132.2 0.5385 131.5
21 21 2016-05-16 133.5 132.2 0.5385 131.5
22 22 2016-05-23 133.5 132.2 0.5385 131.5
23 23 2016-05-30 133.5 132.2 0.5385 131.5
24 24 2016-06-06 133.5 132.2 0.5385 131.5
25 25 2016-06-13 133.5 132.2 0.5385 131.5
26 26 2016-06-20 133.5 132.2 0.5385 131.5
R>
That was obtained using:
rbind(
  data.frame(Week_number = 1:13,
             Week_commencing = seq(as.Date("2015-12-28"), by = 7, len = 13),
             totA = rep(1754/13, 13),
             totB = rep(1720/13, 13),
             totC = rep(11/13, 13),
             totD = rep(1714/13, 13)),
  data.frame(Week_number = 14:26,
             Week_commencing = seq(as.Date("2016-03-28"), by = 7, len = 13),
             totA = rep(1736/13, 13),
             totB = rep(1718/13, 13),
             totC = rep(7/13, 13),
             totD = rep(1710/13, 13))
)
But there's clearly a better way of doing it than manually... The data set is, of course, much larger!
I've tried a few things, but other than creating a sequence of weeks and then filling it in manually as above, I'm going around in circles. I'm sure there's a way to do it in the tidyverse, but I can't figure out how (most of my R is self-taught, and from before the tidyverse was available). Any help would be appreciated!
As all information used in this solution comes from the data, the number of days (or weeks) in the final quarter is assumed to be the same as in the quarter preceding it. So you might want to check up on that ...
library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
test %>%
  mutate(week = ymd(week),
         weekCount = if_else(week != max(week),
                             as.double(abs(week - lead(week))) / 7,
                             as.double(abs(week - lag(week))) / 7),
         weekInQuarter = map(weekCount, ~ seq_len(.)),
         across(totA:totD, ~ . / weekCount)) %>%
  unnest(weekInQuarter) %>%
  mutate(week = week + weeks(weekInQuarter - 1)) %>%
  select(-weekCount)
# Quarter week totA totB totC totD weekInQuarter
# <dbl> <date> <dbl> <dbl> <dbl> <dbl> <int>
# 1 1 2015-12-28 134. 132. 0.846 132. 1
# 2 1 2016-01-04 134. 132. 0.846 132. 2
# 3 1 2016-01-11 134. 132. 0.846 132. 3
# 4 1 2016-01-18 134. 132. 0.846 132. 4
# 5 1 2016-01-25 134. 132. 0.846 132. 5
# 6 1 2016-02-01 134. 132. 0.846 132. 6
# 7 1 2016-02-08 134. 132. 0.846 132. 7
# 8 1 2016-02-15 134. 132. 0.846 132. 8
# 9 1 2016-02-22 134. 132. 0.846 132. 9
# 10 1 2016-02-29 134. 132. 0.846 132. 10
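As a quick usage check, the weekly rows can be rolled back up to confirm the quarterly totals are preserved. A sketch, assuming the pipeline above has been assigned to a hypothetical object result:
result %>%
  group_by(quarterStart = week - weeks(weekInQuarter - 1)) %>%  # recover the original quarter start date
  summarise(across(totA:totD, sum))                             # should reproduce the quarterly totals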

Find the value of a variable 30 days into the future

Hi, below is my R data frame. It is a date excerpt from S&P 500 data for the past 10 years or so. As you can see, I have created a column called Date30, which is the date plus 30 days. I want to add a new column (using dplyr if I can) called Close30, which is the "Close" value on the date given by Date30; in other words, I want to look into the future from a given date (obviously it won't work for the most recent 30 days of data). It is sort of like offsetting a column, but it needs a filter/lookup, because the data only covers business days and I need to add 30 calendar days, so a constant offset won't do; it needs to be a lookup.
I have tried a few things but am getting nowhere...
Thanks so much if you can help!
tidySP500 = na.omit(SP500_Raw)  # remove NAs in case future data has NAs
tidySP500$Date = as.Date(tidySP500$Date)
tidySP500 = tidySP500 %>%
  select("Date", "Open", "High", "Low", "Price") %>%  # select and re-order required variables
  rename("Close" = "Price") %>%
  filter(Date >= as.Date("2014-01-05") & Date <= (as.Date("2014-01-05") + 100)) %>%
  mutate(Date30 = Date + 30) # %>%                    # WORKS UP TO HERE
  mutate(Close30 = Close[Date == Date30]) %>%         # FAILS
  mutate(Close30 = filter(Close, Date == Date30))     # FAILS
Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05
Something like this?
library(tidyverse)
tidySP500 %>% left_join(select(tidySP500, Close, Date30 = Date), by = c('Date30'))
#> # A tibble: 70 x 7
#> Date Open High Low Close.x Date30 Close.y
#> <date> <dbl> <dbl> <dbl> <dbl> <date> <dbl>
#> 1 2014-04-15 1831. 1844. 1816. 1843. 2014-05-15 NA
#> 2 2014-04-14 1818. 1834. 1816. 1831. 2014-05-14 NA
#> 3 2014-04-11 1831. 1835. 1814. 1816. 2014-05-11 NA
#> 4 2014-04-10 1872. 1873. 1831. 1833. 2014-05-10 NA
#> 5 2014-04-09 1853. 1872. 1852. 1872. 2014-05-09 NA
#> 6 2014-04-08 1845. 1855. 1837. 1852. 2014-05-08 NA
#> 7 2014-04-07 1864. 1864. 1841. 1845. 2014-05-07 NA
#> 8 2014-04-04 1890. 1897. 1863. 1865. 2014-05-04 NA
#> 9 2014-04-03 1891. 1894. 1883. 1889. 2014-05-03 NA
#> 10 2014-04-02 1887. 1893. 1884. 1891. 2014-05-02 NA
#> # … with 60 more rows
Created on 2020-02-22 by the reprex package (v0.3.0)
DATA
tidySP500 <- read.so::read_so('Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05')
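Because the data only contains trading days, an exact match on Date30 is often missing (weekends, holidays, dates past the end of the sample), which is why the joined Close.y values above are NA. If "the most recent close on or before Date30" is acceptable, a rolling join is one option. A sketch using data.table, not part of the answer above, with column names assumed to match the data shown:
library(data.table)
dt <- as.data.table(tidySP500)
lookup <- dt[, .(Date30 = Date, Close30 = Close)]  # close price keyed by actual trading date
res <- lookup[dt, on = "Date30", roll = TRUE]      # rolls back to the latest trading day on or before Date30
# note: Date30 values after the last trading day are filled with the final close rather than NA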

Difference between the first date and last date within the same individual in R

Good afternoon
I am not an R user, but I need to get the difference between the first date and the last date within each RFID, to create a new column X. The first value needs to be 1 (not zero), the second 2, ..., n.
Here is an example of the data.
Thanks in advance.
RFID visit_date ADFI location
985152014315936 2017-11-25 2133 16
985152014315936 2017-11-26 2186 16
985152014315936 2017-11-27 3489 16
985152014315936 2017-11-28 2432 16
985152014315937 2017-11-24 15 17
985152014315937 2017-11-25 1512 17
985152014315937 2017-11-26 2378 17
985152014315937 2017-11-27 3241 17
985152014315938 2017-11-24 584 17
985152014315938 2017-11-25 1689 17
985152014315938 2017-11-26 2807 17
985152014315938 2017-11-27 2369 17
985152014315938 2017-11-28 2576 17
985152014315939 2017-11-25 1084 17
985152014315939 2017-11-26 3489 17
985152014315939 2017-11-27 2630 17
985152014315939 2017-11-28 3585 17
985152014315939 2017-11-29 3433 17
985152014315939 2017-11-30 2962 17
Here is a solution using dplyr and lubridate:
require(tidyverse)
require(lubridate)
df %>%
  group_by(RFID) %>%
  mutate(X = max(ymd(visit_date)) - min(ymd(visit_date)))
## A tibble: 19 x 5
## Groups: RFID [4]
# RFID visit_date ADFI location X
# <dbl> <fct> <int> <int> <time>
# 1 985152014315936 2017-11-25 2133 16 3
# 2 985152014315936 2017-11-26 2186 16 3
# 3 985152014315936 2017-11-27 3489 16 3
# 4 985152014315936 2017-11-28 2432 16 3
# 5 985152014315937 2017-11-24 15 17 3
# 6 985152014315937 2017-11-25 1512 17 3
# 7 985152014315937 2017-11-26 2378 17 3
# 8 985152014315937 2017-11-27 3241 17 3
# 9 985152014315938 2017-11-24 584 17 4
#10 985152014315938 2017-11-25 1689 17 4
#11 985152014315938 2017-11-26 2807 17 4
#12 985152014315938 2017-11-27 2369 17 4
#13 985152014315938 2017-11-28 2576 17 4
#14 985152014315939 2017-11-25 1084 17 5
#15 985152014315939 2017-11-26 3489 17 5
#16 985152014315939 2017-11-27 2630 17 5
#17 985152014315939 2017-11-28 3585 17 5
#18 985152014315939 2017-11-29 3433 17 5
#19 985152014315939 2017-11-30 2962 17 5
Sample data
df <- read.table(text =
"RFID visit_date ADFI location
985152014315936 2017-11-25 2133 16
985152014315936 2017-11-26 2186 16
985152014315936 2017-11-27 3489 16
985152014315936 2017-11-28 2432 16
985152014315937 2017-11-24 15 17
985152014315937 2017-11-25 1512 17
985152014315937 2017-11-26 2378 17
985152014315937 2017-11-27 3241 17
985152014315938 2017-11-24 584 17
985152014315938 2017-11-25 1689 17
985152014315938 2017-11-26 2807 17
985152014315938 2017-11-27 2369 17
985152014315938 2017-11-28 2576 17
985152014315939 2017-11-25 1084 17
985152014315939 2017-11-26 3489 17
985152014315939 2017-11-27 2630 17
985152014315939 2017-11-28 3585 17
985152014315939 2017-11-29 3433 17
985152014315939 2017-11-30 2962 17", header = T)
Using data.table on the same sample data df:
library(data.table)
df <- as.data.table(df)
df[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)), by = RFID]
or, if you want to add 1:
df[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)) + 1, by = RFID]
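If the goal is instead a running index within each RFID (first visit = 1, second = 2, ..., n, as the question describes), one hedged dplyr sketch on the same sample df, counting days since the first visit:
library(dplyr)
library(lubridate)
df %>%
  group_by(RFID) %>%
  mutate(X = as.integer(ymd(visit_date) - min(ymd(visit_date))) + 1) %>%  # 1 on the first day, 2 the next day, ...
  ungroup()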

Aggregate a data frame on variance

Say I have this data frame, df,
Day value
1 2012-06-10 552
2 2012-06-10 4850
3 2012-06-11 4642
4 2012-06-11 4132
5 2012-06-11 4190
6 2012-06-12 4186
7 2012-06-13 1139
8 2012-06-13 490
9 2012-06-13 5156
10 2012-06-13 4430
11 2012-06-13 4447
12 2012-06-14 4256
13 2012-06-14 3856
14 2012-06-14 1163
15 2012-06-17 564
16 2012-06-17 4866
17 2012-06-17 4421
18 2012-06-19 4206
19 2012-06-20 4272
20 2012-06-20 3993
21 2012-06-20 1211
22 2012-07-21 698
23 2012-07-21 5770
24 2012-07-21 5103
25 2012-07-21 775
26 2012-07-21 5140
27 2012-07-22 4868
I would like to create a data frame, dfvar, that contains the daily variance, something like:
Day Variance
1 2012-06-10 9236402
2 2012-06-11 X
3 2012-06-12 4186
4 2012-06-13 1139
5 2012-06-14 4256
6 2012-06-17 564
7 2012-06-19 4206
8 2012-06-20 4272
9 2012-07-21 698
10 2012-07-22 4868
So, for example, I computed the first entry as
dfvar$Variance[1] = var(c(552, 4850))
I tried to do
dfvar <- aggregate(df, by = list(Day), FUN = var)
but this isn't the output I expected. I really want the variance of the values within each day, without mixing in the other days...
Any ideas about that?
Is this what you want?
library(dplyr)
df %>%
  group_by(Day) %>%
  dplyr::summarise(Variance = var(value))  # returns NA if there is only one value in the group
Day Variance
<fctr> <dbl>
1 2012-06-10 9236402.00
2 2012-06-11 77961.33
3 2012-06-12 NA
4 2012-06-13 4615704.30
5 2012-06-14 2829816.33
6 2012-06-17 5596946.33
7 2012-06-19 NA
8 2012-06-20 2864514.33
9 2012-07-21 6422224.70
10 2012-07-22 NA
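For comparison, the aggregate() approach from the question also works once the grouping is written as a formula. A minimal base-R sketch on the same df:
dfvar <- aggregate(value ~ Day, data = df, FUN = var)  # days with a single value give NA, as above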

Plot with dates as X axis in R

I maintain my journal electronically and I'm trying to get an idea of how consistent I've been with my journal writing over the last few months. I have the following data file, which shows how many journal entries (Entry Count) and words (Word Count) I recorded over the preceding 30-day period.
Date Entry Count Word Count
2010-08-25 22 4205
2010-08-26 21 4012
2010-08-27 20 3865
2010-08-28 20 4062
2010-08-29 19 3938
2010-08-30 18 3759
2010-08-31 17 3564
2010-09-01 17 3564
2010-09-02 16 3444
2010-09-03 17 3647
2010-09-04 17 3617
2010-09-05 16 3390
2010-09-06 15 3251
2010-09-07 15 3186
2010-09-08 15 3186
2010-09-09 16 3414
2010-09-10 15 3228
2010-09-11 14 3006
2010-09-12 13 2769
2010-09-13 13 2781
2010-09-14 12 2637
2010-09-15 13 2774
2010-09-16 13 2808
2010-09-17 12 2732
2010-09-18 12 2664
2010-09-19 13 2931
2010-09-20 13 2751
2010-09-21 13 2710
2010-09-22 14 2950
2010-09-23 14 2834
2010-09-24 14 2834
2010-09-25 14 2834
2010-09-26 14 2834
2010-09-27 14 2834
2010-09-28 14 2543
2010-09-29 14 2543
2010-09-30 15 2884
2010-10-01 16 3105
2010-10-02 16 3105
2010-10-03 16 3105
2010-10-04 15 2902
2010-10-05 14 2805
2010-10-06 14 2805
2010-10-07 14 2805
2010-10-08 14 2812
2010-10-09 15 2895
2010-10-10 14 2667
2010-10-11 15 2876
2010-10-12 16 2938
2010-10-13 17 3112
2010-10-14 16 2894
2010-10-15 16 2894
2010-10-16 16 2923
2010-10-17 15 2722
2010-10-18 15 2722
2010-10-19 14 2544
2010-10-20 13 2277
2010-10-21 13 2329
2010-10-22 12 2132
2010-10-23 11 1892
2010-10-24 10 1764
2010-10-25 10 1764
2010-10-26 10 1764
2010-10-27 10 1764
2010-10-28 10 1764
2010-10-29 9 1670
2010-10-30 10 1969
2010-10-31 10 1709
2010-11-01 10 1624
2010-11-02 11 1677
2010-11-03 11 1677
2010-11-04 11 1677
2010-11-05 11 1677
2010-11-06 12 1786
2010-11-07 12 1786
2010-11-08 11 1529
2010-11-09 10 1446
2010-11-10 11 1682
2010-11-11 11 1540
2010-11-12 11 1673
2010-11-13 11 1765
2010-11-14 12 1924
2010-11-15 13 2276
2010-11-16 12 2110
2010-11-17 13 2524
2010-11-18 14 2615
2010-11-19 14 2615
2010-11-20 15 2706
2010-11-21 14 2549
2010-11-22 15 2647
2010-11-23 16 2874
2010-11-24 16 2874
2010-11-25 16 2874
2010-11-26 17 3249
2010-11-27 18 3421
2010-11-28 18 3421
2010-11-29 19 3647
I'm trying to plot this data with R to get a graphical representation of my journal-writing consistency. I load it into R with the following command.
d <- read.table("journal.txt", header=T, sep="\t")
I can then graph the data with the following command.
plot(seq(from=1, to=length(d$Entry.Count), by=1), d$Entry.Count, type="o", ylim=c(0, max(d$Entry.Count)))
However, in this plot the X axis is just a number, not a date. I tried adjusting the command to show dates on the X axis like this.
plot(d$Date, d$Entry.Count, type="o", ylim=c(0, max(d$Entry.Count)))
However, not only does the plot look strange, but the labels on the X axis are not very helpful. What is the best way to plot this data so that I can clearly associate dates with points on the plotted curve?
Based on your code, the dates are just character strings.
Try converting them to Dates:
plot(as.Date(d$Date), d$Entry.Count)
This is quite simple in your case, since the "%Y-%m-%d" format is the default for as.Date. See ?strptime for more general format options.
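A slightly fuller base-graphics sketch with custom date labels on the x axis (assuming d has the Date and Entry.Count columns shown above):
d$Date <- as.Date(d$Date)
plot(d$Date, d$Entry.Count, type = "o",
     ylim = c(0, max(d$Entry.Count)),
     xaxt = "n", xlab = "Date", ylab = "Entries in the last 30 days")
# place labelled ticks every two weeks
axis.Date(1, at = seq(min(d$Date), max(d$Date), by = "2 weeks"), format = "%b %d")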
You could use zoo. ?plot.zoo has several examples of how to create custom axis labels.
library(zoo)
z <- zoo(d[, -1], as.Date(d[, 1]))
plot(z)
# Example of custom axis labels
plot(z$Entry.Count, screen = 1, col = 1:2, xaxt = "n")
ix <- seq(1, length(time(z)), 3)
axis(1, at = time(z)[ix], labels = format(time(z)[ix], "%b-%d"), cex.axis = 0.7)
