I maintain my journal electronically and I'm trying to get an idea of how consistent I've been with my journal writing over the last few months. I have the following data file, which shows how many journal entries (Entry Count) and words (Word Count) I recorded over the preceding 30-day period.
Date Entry Count Word Count
2010-08-25 22 4205
2010-08-26 21 4012
2010-08-27 20 3865
2010-08-28 20 4062
2010-08-29 19 3938
2010-08-30 18 3759
2010-08-31 17 3564
2010-09-01 17 3564
2010-09-02 16 3444
2010-09-03 17 3647
2010-09-04 17 3617
2010-09-05 16 3390
2010-09-06 15 3251
2010-09-07 15 3186
2010-09-08 15 3186
2010-09-09 16 3414
2010-09-10 15 3228
2010-09-11 14 3006
2010-09-12 13 2769
2010-09-13 13 2781
2010-09-14 12 2637
2010-09-15 13 2774
2010-09-16 13 2808
2010-09-17 12 2732
2010-09-18 12 2664
2010-09-19 13 2931
2010-09-20 13 2751
2010-09-21 13 2710
2010-09-22 14 2950
2010-09-23 14 2834
2010-09-24 14 2834
2010-09-25 14 2834
2010-09-26 14 2834
2010-09-27 14 2834
2010-09-28 14 2543
2010-09-29 14 2543
2010-09-30 15 2884
2010-10-01 16 3105
2010-10-02 16 3105
2010-10-03 16 3105
2010-10-04 15 2902
2010-10-05 14 2805
2010-10-06 14 2805
2010-10-07 14 2805
2010-10-08 14 2812
2010-10-09 15 2895
2010-10-10 14 2667
2010-10-11 15 2876
2010-10-12 16 2938
2010-10-13 17 3112
2010-10-14 16 2894
2010-10-15 16 2894
2010-10-16 16 2923
2010-10-17 15 2722
2010-10-18 15 2722
2010-10-19 14 2544
2010-10-20 13 2277
2010-10-21 13 2329
2010-10-22 12 2132
2010-10-23 11 1892
2010-10-24 10 1764
2010-10-25 10 1764
2010-10-26 10 1764
2010-10-27 10 1764
2010-10-28 10 1764
2010-10-29 9 1670
2010-10-30 10 1969
2010-10-31 10 1709
2010-11-01 10 1624
2010-11-02 11 1677
2010-11-03 11 1677
2010-11-04 11 1677
2010-11-05 11 1677
2010-11-06 12 1786
2010-11-07 12 1786
2010-11-08 11 1529
2010-11-09 10 1446
2010-11-10 11 1682
2010-11-11 11 1540
2010-11-12 11 1673
2010-11-13 11 1765
2010-11-14 12 1924
2010-11-15 13 2276
2010-11-16 12 2110
2010-11-17 13 2524
2010-11-18 14 2615
2010-11-19 14 2615
2010-11-20 15 2706
2010-11-21 14 2549
2010-11-22 15 2647
2010-11-23 16 2874
2010-11-24 16 2874
2010-11-25 16 2874
2010-11-26 17 3249
2010-11-27 18 3421
2010-11-28 18 3421
2010-11-29 19 3647
I'm trying to plot this data with R to get a graphical representation of my journal-writing consistency. I load it into R with the following command.
d <- read.table("journal.txt", header=T, sep="\t")
I can then graph the data with the following command.
plot(seq(from=1, to=length(d$Entry.Count), by=1), d$Entry.Count, type="o", ylim=c(0, max(d$Entry.Count)))
However, in this plot the X axis is just a number, not a date. I tried adjusting the command to show dates on the X axis like this.
plot(d$Date, d$Entry.Count, type="o", ylim=c(0, max(d$Entry.Count)))
However, not only does the plot look strange, but the labels on the X axis are not very helpful. What is the best way to plot this data so that I can clearly associate dates with points on the plotted curve?
Based on your code the dates are just characters.
Try converting them to Dates:
plot(as.Date(d$Date), d$Entry.Count)
This is simple in your case because "%Y-%m-%d" is the default format for as.Date. See ?strptime for more general format options.
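If the default axis labels are still too sparse, one base R variation is to suppress the automatic axis and draw your own with axis.Date. This is only a sketch: it inlines the first four rows of the question's data so it runs without journal.txt, and the tick spacing and label format are my own choices.

```r
# First four rows of the question's data, inlined so the sketch is self-contained
d <- data.frame(
  Date        = c("2010-08-25", "2010-08-26", "2010-08-27", "2010-08-28"),
  Entry.Count = c(22, 21, 20, 20),
  Word.Count  = c(4205, 4012, 3865, 4062),
  stringsAsFactors = FALSE
)

# Convert the text column to real Date objects
d$Date <- as.Date(d$Date)

# Plot without the automatic x axis, then add date labels by hand
plot(d$Date, d$Entry.Count, type = "o", xaxt = "n",
     ylim = c(0, max(d$Entry.Count)),
     xlab = "Date", ylab = "Entries in preceding 30 days")
ticks <- seq(min(d$Date), max(d$Date), by = "day")
axis.Date(1, at = ticks, format = "%b %d")
```

With the full three months of data, something like by = "2 weeks" in seq() would probably keep the labels readable.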
You could use zoo. ?plot.zoo has several examples of how to create custom axis labels.
library(zoo)
z <- zoo(d[,-1], as.Date(d[,1]))
plot(z)
# Example of custom axis labels
plot(z$Entry.Count, screen = 1, col = 1:2, xaxt = "n")
ix <- seq(1, length(time(z)), 3)
axis(1, at = time(z)[ix], labels = format(time(z)[ix],"%b-%d"), cex.axis = 0.7)
Hi, below is my R data frame. It is an excerpt from S&P 500 data for the past 10 years or so. As you can see, I have created a column called Date30, which is the date + 30 days. I want to add a new column (using dplyr if I can) called Close30, which is the "Close" value on the date given by "Date30"; that is, I want to look into the future from a given date (obviously it won't work for the most recent 30 days). It is sort of like offsetting a column, but it needs a filter/lookup function: the data covers business days only and I need to add 30 calendar days, so a constant offset won't do.
I have tried a few things but am getting nowhere...
Thanks so much if you can help!
tidySP500 = na.omit(SP500_Raw) # remove NAs in case future data have them
tidySP500$Date = AsDate(tidySP500$Date)
tidySP500 = tidySP500 %>%
select("Date", "Open", "High", "Low", "Price") %>% # select and re-order required variables
rename("Close" = "Price") %>%
filter(Date >= as.Date("2014-01-05") & Date <= (as.Date("2014-01-05")+100)) %>%
mutate(Date30 = Date + 30)# %>% #WORKS UP TO HERE
mutate(Close30 = Close[Date == Date30]) %>% # FAILS
mutate(Close30 = filter(Close, Date == Date30)) #FAILS
Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05
Something like this?
library(tidyverse)
tidySP500 %>% left_join(select(tidySP500, Close, Date30 = Date), by = c('Date30'))
#> # A tibble: 70 x 7
#> Date Open High Low Close.x Date30 Close.y
#> <date> <dbl> <dbl> <dbl> <dbl> <date> <dbl>
#> 1 2014-04-15 1831. 1844. 1816. 1843. 2014-05-15 NA
#> 2 2014-04-14 1818. 1834. 1816. 1831. 2014-05-14 NA
#> 3 2014-04-11 1831. 1835. 1814. 1816. 2014-05-11 NA
#> 4 2014-04-10 1872. 1873. 1831. 1833. 2014-05-10 NA
#> 5 2014-04-09 1853. 1872. 1852. 1872. 2014-05-09 NA
#> 6 2014-04-08 1845. 1855. 1837. 1852. 2014-05-08 NA
#> 7 2014-04-07 1864. 1864. 1841. 1845. 2014-05-07 NA
#> 8 2014-04-04 1890. 1897. 1863. 1865. 2014-05-04 NA
#> 9 2014-04-03 1891. 1894. 1883. 1889. 2014-05-03 NA
#> 10 2014-04-02 1887. 1893. 1884. 1891. 2014-05-02 NA
#> # … with 60 more rows
Created on 2020-02-22 by the reprex package (v0.3.0)
DATA
tidySP500 <- read.so::read_so('Date Open High Low Close Date30
1 2014-04-15 1831.45 1844.02 1816.29 1842.98 2014-05-15
2 2014-04-14 1818.18 1834.19 1815.80 1830.61 2014-05-14
3 2014-04-11 1830.65 1835.07 1814.36 1815.69 2014-05-11
4 2014-04-10 1872.28 1872.53 1830.87 1833.08 2014-05-10
5 2014-04-09 1852.64 1872.43 1852.38 1872.18 2014-05-09
6 2014-04-08 1845.48 1854.95 1837.49 1851.96 2014-05-08
7 2014-04-07 1863.92 1864.04 1841.48 1845.04 2014-05-07
8 2014-04-04 1890.25 1897.28 1863.26 1865.09 2014-05-04
9 2014-04-03 1891.43 1893.80 1882.65 1888.77 2014-05-03
10 2014-04-02 1886.61 1893.17 1883.79 1890.90 2014-05-02
11 2014-04-01 1873.96 1885.84 1873.96 1885.52 2014-05-01
12 2014-03-31 1859.16 1875.18 1859.16 1872.34 2014-04-30
13 2014-03-28 1850.07 1866.63 1850.07 1857.62 2014-04-27
14 2014-03-27 1852.11 1855.55 1842.11 1849.04 2014-04-26
15 2014-03-26 1867.09 1875.92 1852.56 1852.56 2014-04-25
16 2014-03-25 1859.48 1871.87 1855.96 1865.62 2014-04-24
17 2014-03-24 1867.67 1873.34 1849.69 1857.44 2014-04-23
18 2014-03-21 1874.53 1883.97 1863.46 1866.52 2014-04-20
19 2014-03-20 1860.09 1873.49 1854.63 1872.01 2014-04-19
20 2014-03-19 1872.25 1874.14 1850.35 1860.77 2014-04-18
21 2014-03-18 1858.92 1873.76 1858.92 1872.25 2014-04-17
22 2014-03-17 1842.81 1862.30 1842.81 1858.83 2014-04-16
23 2014-03-14 1845.07 1852.44 1839.57 1841.13 2014-04-13
24 2014-03-13 1869.06 1874.40 1841.86 1846.34 2014-04-12
25 2014-03-12 1866.15 1868.38 1854.38 1868.20 2014-04-11
26 2014-03-11 1878.26 1882.35 1863.88 1867.63 2014-04-10
27 2014-03-10 1877.86 1877.87 1867.04 1877.17 2014-04-09
28 2014-03-07 1878.52 1883.57 1870.56 1878.04 2014-04-06
29 2014-03-06 1874.18 1881.94 1874.18 1877.03 2014-04-05
30 2014-03-05 1874.05 1876.53 1871.11 1873.81 2014-04-04
31 2014-03-04 1849.23 1876.23 1849.23 1873.91 2014-04-03
32 2014-03-03 1857.68 1857.68 1834.44 1845.73 2014-04-02
33 2014-02-28 1855.12 1867.92 1847.67 1859.45 2014-03-30
34 2014-02-27 1844.90 1854.53 1841.13 1854.29 2014-03-29
35 2014-02-26 1845.79 1852.65 1840.66 1845.16 2014-03-28
36 2014-02-25 1847.66 1852.91 1840.19 1845.12 2014-03-27
37 2014-02-24 1836.78 1858.71 1836.78 1847.61 2014-03-26
38 2014-02-21 1841.07 1846.13 1835.60 1836.25 2014-03-23
39 2014-02-20 1829.24 1842.79 1824.58 1839.78 2014-03-22
40 2014-02-19 1838.90 1847.50 1826.99 1828.75 2014-03-21
41 2014-02-18 1839.03 1842.87 1835.01 1840.76 2014-03-20
42 2014-02-14 1828.46 1841.65 1825.59 1838.63 2014-03-16
43 2014-02-13 1814.82 1830.25 1809.22 1829.83 2014-03-15
44 2014-02-12 1820.12 1826.55 1815.97 1819.26 2014-03-14
45 2014-02-11 1800.45 1823.54 1800.41 1819.75 2014-03-13
46 2014-02-10 1796.20 1799.94 1791.83 1799.84 2014-03-12
47 2014-02-07 1776.01 1798.03 1776.01 1797.02 2014-03-09
48 2014-02-06 1752.99 1774.06 1752.99 1773.43 2014-03-08
49 2014-02-05 1753.38 1755.79 1737.92 1751.64 2014-03-07
50 2014-02-04 1743.82 1758.73 1743.82 1755.20 2014-03-06
51 2014-02-03 1782.68 1784.83 1739.66 1741.89 2014-03-05
52 2014-01-31 1790.88 1793.88 1772.26 1782.59 2014-03-02
53 2014-01-30 1777.17 1798.77 1777.17 1794.19 2014-03-01
54 2014-01-29 1790.15 1790.15 1770.45 1774.20 2014-02-28
55 2014-01-28 1783.00 1793.87 1779.49 1792.50 2014-02-27
56 2014-01-27 1791.03 1795.98 1772.88 1781.56 2014-02-26
57 2014-01-24 1826.96 1826.96 1790.29 1790.29 2014-02-23
58 2014-01-23 1842.29 1842.29 1820.06 1828.46 2014-02-22
59 2014-01-22 1844.71 1846.87 1840.88 1844.86 2014-02-21
60 2014-01-21 1841.05 1849.31 1832.38 1843.80 2014-02-20
61 2014-01-17 1844.23 1846.04 1835.23 1838.70 2014-02-16
62 2014-01-16 1847.99 1847.99 1840.30 1845.89 2014-02-15
63 2014-01-15 1840.52 1850.84 1840.52 1848.38 2014-02-14
64 2014-01-14 1821.36 1839.26 1821.36 1838.88 2014-02-13
65 2014-01-13 1841.26 1843.45 1815.52 1819.20 2014-02-12
66 2014-01-10 1840.06 1843.15 1832.43 1842.37 2014-02-09
67 2014-01-09 1839.00 1843.23 1830.38 1838.13 2014-02-08
68 2014-01-08 1837.90 1840.02 1831.40 1837.49 2014-02-07
69 2014-01-07 1828.71 1840.10 1828.71 1837.88 2014-02-06
70 2014-01-06 1832.31 1837.16 1823.73 1826.77 2014-02-05')
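Note that an exact join only matches when Date30 is itself a trading day, so weekends and holidays come back as NA. If you would rather fall back to the most recent trading day on or before Date30, one option is findInterval from base R. This is a rough sketch using seven rows taken from the question. The roll-back rule is my assumption about what is wanted, and dates beyond the end of the data are left as NA.

```r
# Seven rows from the question (Date and Close only)
prices <- data.frame(
  Date  = as.Date(c("2014-01-06", "2014-01-07", "2014-01-10",
                    "2014-02-05", "2014-02-06", "2014-02-07", "2014-02-10")),
  Close = c(1826.77, 1837.88, 1842.37, 1751.64, 1773.43, 1797.02, 1799.84)
)

# findInterval needs the lookup vector sorted in increasing order
prices <- prices[order(prices$Date), ]
prices$Date30 <- prices$Date + 30

# Index of the last trading day that is <= Date30
idx <- findInterval(as.numeric(prices$Date30), as.numeric(prices$Date))

# Assumption: if Date30 is past the end of the data, there is no value yet
idx[prices$Date30 > max(prices$Date)] <- NA

prices$Close30 <- prices$Close[idx]
print(prices)
```

Here 2014-01-10 + 30 days is 2014-02-09, a Sunday, so Close30 for that row is taken from Friday 2014-02-07.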
Good afternoon
I am not an R user, but I need to get the difference between the first date and the last date within each RFID and store it in a new column X. The first value needs to be 1 (not zero), the second 2, ..., n.
Here is an example of the data.
Thanks in advance.
RFID visit_date ADFI location
985152014315936 2017-11-25 2133 16
985152014315936 2017-11-26 2186 16
985152014315936 2017-11-27 3489 16
985152014315936 2017-11-28 2432 16
985152014315937 2017-11-24 15 17
985152014315937 2017-11-25 1512 17
985152014315937 2017-11-26 2378 17
985152014315937 2017-11-27 3241 17
985152014315938 2017-11-24 584 17
985152014315938 2017-11-25 1689 17
985152014315938 2017-11-26 2807 17
985152014315938 2017-11-27 2369 17
985152014315938 2017-11-28 2576 17
985152014315939 2017-11-25 1084 17
985152014315939 2017-11-26 3489 17
985152014315939 2017-11-27 2630 17
985152014315939 2017-11-28 3585 17
985152014315939 2017-11-29 3433 17
985152014315939 2017-11-30 2962 17
Here is a solution using dplyr and lubridate:
library(tidyverse)
library(lubridate)
df %>% group_by(RFID) %>% mutate(X = max(ymd(visit_date)) - min(ymd(visit_date)))
## A tibble: 19 x 5
## Groups: RFID [4]
# RFID visit_date ADFI location X
# <dbl> <fct> <int> <int> <time>
# 1 985152014315936 2017-11-25 2133 16 3
# 2 985152014315936 2017-11-26 2186 16 3
# 3 985152014315936 2017-11-27 3489 16 3
# 4 985152014315936 2017-11-28 2432 16 3
# 5 985152014315937 2017-11-24 15 17 3
# 6 985152014315937 2017-11-25 1512 17 3
# 7 985152014315937 2017-11-26 2378 17 3
# 8 985152014315937 2017-11-27 3241 17 3
# 9 985152014315938 2017-11-24 584 17 4
#10 985152014315938 2017-11-25 1689 17 4
#11 985152014315938 2017-11-26 2807 17 4
#12 985152014315938 2017-11-27 2369 17 4
#13 985152014315938 2017-11-28 2576 17 4
#14 985152014315939 2017-11-25 1084 17 5
#15 985152014315939 2017-11-26 3489 17 5
#16 985152014315939 2017-11-27 2630 17 5
#17 985152014315939 2017-11-28 3585 17 5
#18 985152014315939 2017-11-29 3433 17 5
#19 985152014315939 2017-11-30 2962 17 5
Sample data
df <- read.table(text =
"RFID visit_date ADFI location
985152014315936 2017-11-25 2133 16
985152014315936 2017-11-26 2186 16
985152014315936 2017-11-27 3489 16
985152014315936 2017-11-28 2432 16
985152014315937 2017-11-24 15 17
985152014315937 2017-11-25 1512 17
985152014315937 2017-11-26 2378 17
985152014315937 2017-11-27 3241 17
985152014315938 2017-11-24 584 17
985152014315938 2017-11-25 1689 17
985152014315938 2017-11-26 2807 17
985152014315938 2017-11-27 2369 17
985152014315938 2017-11-28 2576 17
985152014315939 2017-11-25 1084 17
985152014315939 2017-11-26 3489 17
985152014315939 2017-11-27 2630 17
985152014315939 2017-11-28 3585 17
985152014315939 2017-11-29 3433 17
985152014315939 2017-11-30 2962 17", header = T)
Using data.table:
library(data.table)
data <- data.table(data)
data[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)), by = RFID]
or, if you want to add 1:
data[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)) + 1, by = RFID]
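Both answers above give one span per RFID. If what is wanted is instead a running day number within each RFID (1 on the first visit date, 2 on the next day, and so on), which is how I read "the first value needs to be 1, the second 2, ..., n", a base R sketch with ave could look like this. It uses a cut-down copy of the sample data, and X counts calendar days from the first visit, so a skipped day would leave a gap in the numbering.

```r
# Cut-down copy of the sample data (two animals only)
df <- read.table(text = "RFID visit_date
985152014315936 2017-11-25
985152014315936 2017-11-26
985152014315936 2017-11-27
985152014315937 2017-11-24
985152014315937 2017-11-25",
  header = TRUE, colClasses = c("character", "character"))

# Dates as day numbers, so they can be subtracted within each group
day_num <- as.numeric(as.Date(df$visit_date))

# For each RFID: days since that animal's first visit, starting at 1
df$X <- ave(day_num, df$RFID, FUN = function(v) v - min(v) + 1)
print(df)
```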
Say I have this data frame, df,
Day value
1 2012-06-10 552
2 2012-06-10 4850
3 2012-06-11 4642
4 2012-06-11 4132
5 2012-06-11 4190
6 2012-06-12 4186
7 2012-06-13 1139
8 2012-06-13 490
9 2012-06-13 5156
10 2012-06-13 4430
11 2012-06-13 4447
12 2012-06-14 4256
13 2012-06-14 3856
14 2012-06-14 1163
15 2012-06-17 564
16 2012-06-17 4866
17 2012-06-17 4421
18 2012-06-19 4206
19 2012-06-20 4272
20 2012-06-20 3993
21 2012-06-20 1211
22 2012-07-21 698
23 2012-07-21 5770
24 2012-07-21 5103
25 2012-07-21 775
26 2012-07-21 5140
27 2012-07-22 4868
I would like to create a data.frame, dfvar, that would contain the daily variance, something like:
Day Variance
1 2012-06-10 9236402
2 2012-06-11 X
3 2012-06-12 4186
4 2012-06-13 1139
5 2012-06-14 4256
6 2012-06-17 564
7 2012-06-19 4206
8 2012-06-20 4272
9 2012-07-21 698
10 2012-07-22 4868
For example, I computed the first entry by hand as
dfvar$Variance[1] = var(c(552, 4850))
I tried to do
dfvar <- aggregate(df, by = list(Day), FUN = var)
but this isn't the output I expected. I really want the variance of the values within each day, without the other days...
Any ideas about that?
Is this what you want?
library(dplyr)
df %>% group_by(Day) %>% dplyr::summarise(Variance = var(value)) # returns NA for groups with only one value
Day Variance
<fctr> <dbl>
1 2012-06-10 9236402.00
2 2012-06-11 77961.33
3 2012-06-12 NA
4 2012-06-13 4615704.30
5 2012-06-14 2829816.33
6 2012-06-17 5596946.33
7 2012-06-19 NA
8 2012-06-20 2864514.33
9 2012-07-21 6422224.70
10 2012-07-22 NA
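For completeness, the aggregate attempt from the question can probably be made to work in base R with the formula interface, which applies var to value within each Day rather than to every column. This is a minimal sketch on the first six rows of the question's data, with Day kept as plain text.

```r
# First six rows of the question's data
df <- data.frame(
  Day   = c("2012-06-10", "2012-06-10",
            "2012-06-11", "2012-06-11", "2012-06-11",
            "2012-06-12"),
  value = c(552, 4850, 4642, 4132, 4190, 4186),
  stringsAsFactors = FALSE
)

# Variance of value within each Day; a day with one value gives NA
dfvar <- aggregate(value ~ Day, data = df, FUN = var)
names(dfvar)[2] <- "Variance"
print(dfvar)
```

The first row reproduces the var(c(552, 4850)) value from the question, and 2012-06-12 comes out as NA because a single observation has no sample variance.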
I have a weekly dataset of prices of a product. This product has many varieties, each with its own price. I am interested in calculating a weighted price depending on the sales volume of each.
I tried to do it with a loop, but it does not work.
Can someone help me?
Here is a minimal example of my dataset:
nrow week variety price volume
1 10 Semiduro 911 15550
2 10 Semiduro 809 13400
3 10 Semiduro 611 15200
4 10 Semiduro 517 17250
5 10 Semiduro 389 4550
6 10 Semiduro 300 1500
7 10 Paisana(o) 1100 19200
8 10 Paisana(o) 726 22900
9 10 Paisana(o) 452 10450
10 11 Semiduro 1362 13250
11 11 Semiduro 1163 7100
12 11 Semiduro 1032 15580
13 11 Semiduro 768 9700
14 11 Semiduro 703 3670
15 11 Semiduro 550 1450
16 11 Paisana(o) 1825 20200
17 11 Paisana(o) 1402 30650
18 11 Paisana(o) 838 9750
19 12 Semiduro 1050 11350
20 12 Semiduro 878 9200
We could use dplyr
library(dplyr)
df1 %>%
group_by(week, variety) %>%
summarise(wprice = weighted.mean(price, volume))
# week variety wprice
# <int> <chr> <dbl>
#1 10 Paisana(o) 808.1598
#2 10 Semiduro 673.5663
#3 11 Paisana(o) 1452.2574
#4 11 Semiduro 1048.4625
#5 12 Semiduro 972.9976
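To see what weighted.mean is doing, it may help to check one group by hand. The volume-weighted price is sum(price * volume) / sum(volume). The sketch below uses the three week 10 Paisana(o) rows from the question.

```r
# Week 10, Paisana(o) rows from the question
price  <- c(1100, 726, 452)
volume <- c(19200, 22900, 10450)

# Volume-weighted price written out explicitly
by_hand <- sum(price * volume) / sum(volume)

# Same calculation via the built-in helper
built_in <- weighted.mean(price, volume)

print(c(by_hand = by_hand, built_in = built_in))
```

Both values agree with the 808.1598 shown for that group in the output above.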
My data follows this sequence:
deptime .count
1 4.5 6285
2 14.5 5901
3 24.5 6002
4 34.5 5401
5 44.5 5080
6 54.5 4567
7 104.5 3162
8 114.5 2784
9 124.5 1950
10 134.5 1800
11 144.5 1630
12 154.5 1076
13 204.5 738
14 214.5 556
15 224.5 544
16 234.5 650
17 244.5 392
18 254.5 309
19 304.5 356
20 314.5 364
My ggplot code:
ggplot(pplot, aes(x=deptime, y=.count)) + geom_bar(stat="identity",fill='#FF9966',width = 5) + labs(x="time", y="count")
output figure
There is a gap at every multiple of 100. Does anyone know how to fix it?
Thank You
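One possible explanation, and this is only a guess from the values shown, is that deptime encodes clock time as hours * 100 + minutes (so 104.5 would mean 1 h 4.5 min). On a numeric axis nothing falls between 60 and 99, which would produce exactly this kind of gap. Under that assumption, converting to minutes since midnight gives evenly spaced values:

```r
# A few deptime values from the question
deptime <- c(4.5, 54.5, 104.5, 154.5, 204.5)

# Assumption: the hundreds are hours and the remainder is minutes
minutes <- (deptime %/% 100) * 60 + deptime %% 100
print(minutes)

# The converted column could then be used as x in the original plot, e.g.
# pplot$minutes <- (pplot$deptime %/% 100) * 60 + pplot$deptime %% 100
# ggplot(pplot, aes(x = minutes, y = .count)) + geom_bar(stat = "identity", width = 5)
```

If deptime means something else, this conversion does not apply.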