R: Smoothing Time Series Data by Item

I have a data series that shows purchases over time by item ID. Purchasing habits are irregular, so I would like to smooth this data over time and by item ID.
If orders were placed more regularly (e.g. every day), we could better plot and evaluate our ordering and set stocking levels. However, some people purchase an excess of an item so they don't have to restock as often. This skews our par level data, since a one-day total could really be a week's worth of product if they only order once per week.
Reproducible Example:
POData <- structure(list(a = structure(c(1499918400, 1499918400, 1499918400,
1499918400, 1499918400, 1499918400, 1496376000, 1497412800, 1497412800,
1497412800, 1497412800, 1497412800, 1497240000, 1497412800, 1497412800,
1497412800, 1501214400, 1496376000, 1496376000, 1496376000, 1496289600,
1496289600, 1496289600, 1496289600, 1496289600, 1496289600, 1501214400,
1501214400, 1501214400, 1501214400), class = c("POSIXct", "POSIXt"
), tzone = ""), b = c(446032L, 101612L, 37740L, 482207L, 152360L,
4483L, 482207L, 141729L, 81192L, 482207L, 85273L, 142955L, 460003L,
142955L, 17752L, 29763L, 309189L, 361905L, 17396L, 410762L, 437420L,
17752L, 18002L, 150698L, 163342L, 433332L, 150587L, 44159L, 433332L,
446032L), c = c(4, 1, 25, 1, 1, 1, 3, 12, 12, 1, 1, 1, 300, 1,
1, 2, 6, 6, 2, 1, 1, 1, 1, 1, 1, 1, 40, 2, 1, 2)), .Names = c("PO Date",
"PS Item ID", "PO Qty"), row.names = c(NA, 30L), class = "data.frame")
This is probably a simple question, but I hope someone has a simple way to do this.

You could use something like this
library(zoo)
library(dplyr)
df2 <- POData %>%
  arrange(`PS Item ID`, `PO Date`) %>%
  group_by(`PS Item ID`) %>%
  # previous order quantity for the same item
  mutate(temp_lag1 = lag(`PO Qty`)) %>%
  # rolling mean over the lagged quantities
  mutate(temp.5.previous = rollapply(data = temp_lag1,
                                     width = 2,
                                     FUN = mean,
                                     align = "left",
                                     fill = `PO Qty`,
                                     na.rm = TRUE))
It essentially sorts by PS Item ID and PO Date and groups by PS Item ID, so the lag and the rolling mean are computed within each item. The width argument of rollapply() specifies how many observations go into the moving average. As of now it's set to 2 because your data is not that extensive by product ID.
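If the underlying problem is that one order may cover several days of usage, another option is to divide each order by the number of days since the previous order of the same item. The following is only a minimal base R sketch of that idea on made-up data; the names orders, gap_days and daily_rate are hypothetical, and it assumes that each order replaces what was consumed since the previous one.

```r
# Toy data in the same shape as POData (values are made up for illustration).
orders <- data.frame(
  date = as.Date(c("2017-06-01", "2017-06-08", "2017-06-15",
                   "2017-06-02", "2017-06-04")),
  item = c(1L, 1L, 1L, 2L, 2L),
  qty  = c(7, 14, 7, 4, 2)
)

# Sort by item, then date, so consecutive rows of an item are consecutive orders.
orders <- orders[order(orders$item, orders$date), ]

# Days elapsed since the previous order of the same item (NA for the first order).
orders$gap_days <- ave(as.numeric(orders$date), orders$item,
                       FUN = function(d) c(NA, diff(d)))

# Quantity per day covered by each order (assumption: an order replaces
# what was consumed since the previous order of that item).
orders$daily_rate <- orders$qty / orders$gap_days
```

The first order of each item has no previous order, so its rate stays NA. A rolling mean as in the answer above could then be applied to daily_rate instead of the raw quantity.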

Related

Export manually edited htmlwidget to SVG or similar

I often create Sankey-diagrams in R via {sankeyD3}, because it seems to be the package with the most options/features to do so. However, one feature that is missing is the ability to set the order of nodes on the y-axis (although this issue tried to fix that?).
Therefore, I must arrange the nodes manually afterwards. I can do this by setting dragY = TRUE when creating the diagram and then exporting it to an HTML file via htmlwidgets::saveWidget(). This allows me to drag the nodes manually when I open the HTML file.
reprex
links <- data.frame(
source = c(0, 0, 0, 1, 2, 3, 4, 4),
target = c(1, 2, 3, 4, 4, 4, 5, 6),
value = c(2, 3, 4, 2, 3 , 4, 4, 5)
)
nodes <- data.frame(
label = c("A1", "B1", "B3", "B2", "C1", "D1", "D2"),
yOrder = c(1, 1, 3, 2, 1, 1, 2)
)
out <- sankeyD3::sankeyNetwork(
Links = links,
Nodes = nodes,
Source = "source",
Target = "target",
Value = "value",
NodeID = "label",
fontFamily = "Arial",
fontSize = 12,
numberFormat = ",.1s",
height = 500,
width = 700,
dragY = TRUE)
htmlwidgets::saveWidget(out,
file = here::here("out.html"),
selfcontained = TRUE)
and here is a screenshot showing the exported html on the left and the one where I manually rearranged the nodes on the right:
Question
My goal is to insert the edited diagram into a Word document in the best possible quality. How can I export the edited HTML file to SVG or a similar format?
Open the result in a browser, make any manual adjustments you want, then use an SVG extractor like https://nytimes.github.io/svg-crowbar/ to save it as an SVG.

Change Size of Additional Flextable Header

I have been using add_header_lines() to add a header to my flextable. I would like this header to have a different text format than the table as a whole. Unfortunately, when I use fontsize() it formats text for both the header and the column names. Is there a way to only change the text format for the header produced with add_header_lines()?
Here is some sample code:
df <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(5, 2, 7),
Feb = c(2, 4, 0),
Mar = c(1, 2, 1))
flex <- flextable(df)
flex <- add_header_lines(flex, values = "This is a header")
flex <- fontsize(flex, size = 15, part = "header")
As you can see, the fontsize function affects both the added header and the column names. Any idea how to change just the added header's text format?
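You have to use the row selector i. With part = "header", i = 1 refers to the first header row, which is the line added by add_header_lines().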
library(flextable)
df <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(5, 2, 7),
Feb = c(2, 4, 0),
Mar = c(1, 2, 1))
flex <- flextable(df)
flex <- add_header_lines(flex, values = "This is a header")
flex <- fontsize(flex, size = 15, part = "header", i = 1)
flex <- color(flex, color = "red", part = "header", i = 1)
flex

R haven: accessing column label from imported SPSS file

I have a dataset in SPSS that I am reading into R using the 'haven' library
df <- structure(list(SC155Q09HA = structure(c(2, 1, 1, 2, 1, 2, 3,
4, 3, 1), label = "School's capacity using digital devices: An effective online learning support platform is available", labels = c(`Strongly disagree` = 1,
Disagree = 2, Agree = 3, `Strongly agree` = 4, `Valid Skip` = 5,
`Not Applicable` = 7, Invalid = 8, `No Response` = 9), class = "haven_labelled")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
I'm trying to extract the label from the dataframe and can do this in base R:
library(tidyverse)
library(magrittr)
library(haven)
focus <- quo(SC156Q05HA)
attr(df$SC155Q09HA,"label")
>[1] "School's capacity using digital devices: An effective online learning support platform is available"
But not in a dplyr-style way with a variable for selection:
df[quo_name(focus)] %>% attr("label")
>NULL
df %>% select(!!focus) %>% attr("label")
>NULL
I understand that the two non-working examples return tibbles, whilst the first returns a labelled double. How do I make them equivalent?
You can do:
focus <- quo(SC155Q09HA) # Changed to match the data provided
df %>% pull(!!focus) %>% attr("label")
[1] "School's capacity using digital devices: An effective online learning support platform is available"
Your attempt using select() passes the whole tibble to attr(). The label is stored on the column, not on the tibble, so attr() returns NULL.
If you have multiple labels to extract, use purrr::map_chr():
df %>% purrr::map_chr(attr, "label")
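Note that map_chr() expects every column to yield exactly one string, so a column without a label attribute would likely make the call above fail. A minimal base R sketch that returns NA for such columns instead is shown here; the helper name get_labels and the toy data frame are hypothetical, not part of haven or purrr.

```r
# Toy data frame: one labelled column and one without a label.
d <- data.frame(q1 = c(1, 2, 3), q2 = c(4, 5, 6))
attr(d$q1, "label") <- "First question"

# Return each column's "label" attribute, or NA when the attribute is absent.
get_labels <- function(data) {
  vapply(data, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
}

labels <- get_labels(d)
```

The result is a character vector named after the columns, so labels[["q1"]] gives the label of q1.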

Joining multiple tables using dplyr

I am working on healthcare data. For the sake of simplicity, I am providing data on only one patient ID. Every patient has a unique ID and over a period of time, the doctors monitor the BCR_ABL value as shown in the table below.
structure(list(PatientId = c("Hospital1_124", "Hospital1_124",
"Hospital1_124", "Hospital1_124", "Hospital1_124", "Hospital1_124",
"Hospital1_124"), TestDate = c("2007-11-13", "2008-09-01", "2011-02-24",
"2013-05-01", "2016-02-16", "2017-05-12", "2017-08-29"), BCR_ABL = c(0.029,
0, 0, 0, 0, 100, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .Names = c("PatientId", "TestDate",
"BCR_ABL"))
At the start of the treatment, each patient has a BCR_ABL value of 100 and ideally post treatment, this value should drop down to 0. The patients undergo tests for BCR_ABL at various stages as shown in the TestDate column.
The patients also visit the hospital for follow up visits and this is recorded in another table which contains the followup date as well as the date of starting of the medication. The table looks like this:
structure(list(PatientId = c("Hospital1_124", "Hospital1_124",
"Hospital1_124", "Hospital1_124"), FollowupDate = structure(c(11323,
17298, 17407, 17553), class = "Date"), dateofStarting = structure(c(11323,
17318, 17318, 17318), class = "Date"), nameTKI = c("Imatinib",
"Imatinib", "Imatinib", "Imatinib"), brandTKI = c("Glivec", "Glivec",
"Glivec", "Glivec"), dailydose = c("100", "400", "400", "400"
)), class = "data.frame", row.names = c(NA, -4L), .Names = c("PatientId",
"FollowupDate", "dateofStarting", "nameTKI", "brandTKI", "dailydose"
))
Now the aim of the analysis is to find out the efficacy of the drug (nameTKI) being prescribed. To my mind, the best representation would be a line graph with Date on the x-axis and BCR_ABL on the y-axis. However, I am stuck on how to combine the dates. I am looking for a new table with the following variables: PatientId, Date, BCR_ABL, nameTKI, brandTKI and dailydose. I don't think the follow-up date has much significance. So, neglecting it, the Date variable needs to be a combination of TestDate from the first table and dateofStarting from the second table, arranged chronologically for each individual patient (I could use group_by() for that). The value for BCR_ABL would start off as 100 until the value obtained at the first test, and then follow the test values for all subsequent Date entries.
I have been trying various joins from dplyr without any success. Would appreciate some help please.
A bit hard to follow what you have there, but you could join the tables together using PatientId as the join key. However, you should think carefully about the structure of the data as well. If the first table is at the patient/test level and the second is supposed to be at the patient level only, why are there multiple dateofStarting values for a single PatientId?
library(tidyverse)
t1 <- data.frame(PatientId = rep("Hospital1_124", 7),
TestDate = as.Date(c("2007-11-13", "2008-09-01", "2011-02-24", "2013-05-01",
"2016-02-16", "2017-05-12", "2017-08-29")),
BCR_ABL = c(0.029, 0, 0, 0, 0, 100, 0),
stringsAsFactors = FALSE)
t2 <- data.frame(PatientId = rep("Hospital1_124", 4),
FollowupDate = as.Date(c(11323, 17298, 17407, 17553), origin = "1970-01-01"),
dateofStarting = as.Date(c(11323, 17318, 17318, 17318), origin = "1970-01-01"),
nameTKI = rep("Imatinib", 4),
brandTKI = rep("Glivec", 4),
dailydose = c(100, 400, 400, 400),
stringsAsFactors = FALSE)
data <- t2 %>%
select(-FollowupDate) %>%
inner_join(t1, by = c("PatientId" = "PatientId"))
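The join above repeats every test row for every row of t2. If the goal is the single chronological Date column described in the question, one alternative is to stack the rows rather than join them. The following base R sketch makes two assumptions that are not confirmed by the question: only the earliest dateofStarting per patient counts as the treatment start, and BCR_ABL is 100 on that date. The names baseline, tests, drug and timeline are hypothetical.

```r
t1 <- data.frame(
  PatientId = "Hospital1_124",
  TestDate  = as.Date(c("2007-11-13", "2008-09-01", "2011-02-24", "2013-05-01",
                        "2016-02-16", "2017-05-12", "2017-08-29")),
  BCR_ABL   = c(0.029, 0, 0, 0, 0, 100, 0)
)
t2 <- data.frame(
  PatientId      = "Hospital1_124",
  dateofStarting = as.Date(c("2001-01-01", "2017-06-01", "2017-06-01", "2017-06-01")),
  nameTKI        = "Imatinib",
  brandTKI       = "Glivec"
)

# Assumption: the earliest dateofStarting per patient is the treatment start,
# and BCR_ABL is taken to be 100 on that date.
t2 <- t2[order(t2$PatientId, t2$dateofStarting), ]
first_start <- t2[!duplicated(t2$PatientId), ]
baseline <- data.frame(PatientId = first_start$PatientId,
                       Date      = first_start$dateofStarting,
                       BCR_ABL   = 100)

# Test results with the date column renamed so the two pieces can be stacked.
tests <- data.frame(PatientId = t1$PatientId,
                    Date      = t1$TestDate,
                    BCR_ABL   = t1$BCR_ABL)

# One row of drug information per patient, attached to every timeline row.
drug <- unique(t2[, c("PatientId", "nameTKI", "brandTKI")])
timeline <- merge(rbind(baseline, tests), drug, by = "PatientId")
timeline <- timeline[order(timeline$PatientId, timeline$Date), ]
```

dailydose is left out here because it changes over time, so attaching it would need a rule for matching each Date to the dose in effect on that day.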

Highlighting the maximum value of each column of data frame in R

I have a data.frame in R :
p=structure(list(WSbin01 = c(214.98151752527, -46.9493685420515,
154.726947679253), WSbin02 = c(1093.46050365665, 420.318207941967,
927.97317496775), WSbin03 = c(2855.24990411661, 2035.57575481323,
2662.2595957214), WSbin04 = c(5863.91399544626, 4881.81544665127,
5625.17650575444), WSbin05 = c(9891.70254019722, 8845.32506336827,
9666.14583347469), WSbin06 = c(14562.1527820802, 13401.1727730953,
14321.601249974), WSbin07 = c(19091.1307681137, 18003.2115315665,
18903.0179613827), WSbin08 = c(24422.7094972645, 23694.5453703207,
24357.8071162775), WSbin09 = c(30215.4088114124, 30214.3195264298,
30310.242671113), WSbin10 = c(36958.2122031382, 37964.9044838778,
37239.6908819524), WSbin11 = c(41844.810779792, 43701.2643596447,
42343.7442683171), WSbin12 = c(37616.8187087318, 39348.3188777835,
38178.9009247311), WSbin13 = c(20953.0973658833, 21720.1930292221,
21251.8654076726), WSbin14 = c(7155.3786781173, 7262.61983182254,
7233.60584469268), WSbin15 = c(2171.61052809769, 2120.97045661101,
2173.49396732091), WSbin16 = c(779.72276608943, 745.52198490267,
767.81436310063)), .Names = c("WSbin01", "WSbin02", "WSbin03",
"WSbin04", "WSbin05", "WSbin06", "WSbin07", "WSbin08", "WSbin09",
"WSbin10", "WSbin11", "WSbin12", "WSbin13", "WSbin14", "WSbin15",
"WSbin16"), class = "data.frame", row.names = c(NA, -3L))
I would like to set a background color for the maximum value of each column.
Using DT::datatable would return the table but I don't know how to set the formatStyle parameters to return the max value in each column in different color.
Furthermore, I have a vector z = c(1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 3, 1). I want the background color of each column to depend on z: if z[i] = 1, column i should be green; if z[i] = 2, column i should be red; and if z[i] = 3, column i should be blue.
Combining parts of the dt guide (https://rstudio.github.io/DT/010-style.html) and this q (Datatable: apply different formatStyle to each column), I get this:
library(DT)
colors <- apply(col2rgb(rainbow(n=ncol(p))),2,function(x)paste0("rgb(",paste(x,collapse=","),")"))
data <- datatable(p)
sapply(c(1:ncol(p)),function(x){
data <<- data %>% formatStyle(colnames(p)[[x]],backgroundColor = styleEqual(max(p[[x]]), colors[x]))
})
data
The answer to your second q is similar-
z= c(1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 3, 1)
colors <- c("green", "red", "blue") # colors[k] is used for columns where z == k
data <- datatable(p)
sapply(c(1:ncol(p)),function(x){
data <<- data %>% formatStyle(
colnames(p)[[x]],
backgroundColor = colors[z[x]]
)
})
data
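Since formatStyle() accepts several columns at once, the per-column loop can also be replaced by one call per color. This is only a sketch on a small made-up table; p_small, z_small, palette and cols_by_color are hypothetical names, and the DT part is guarded so the grouping step also runs where DT is not installed.

```r
# Small stand-in for p and z (values are made up for illustration).
p_small <- data.frame(A = c(1, 5, 3), B = c(9, 2, 4),
                      C = c(7, 8, 6), D = c(0, 1, 2))
z_small <- c(1, 3, 2, 1)
palette <- c("green", "red", "blue")  # palette[k] is the color for z == k

# Column names grouped by the color they should receive.
cols_by_color <- split(names(p_small), palette[z_small])

if (requireNamespace("DT", quietly = TRUE)) {
  tbl <- DT::datatable(p_small)
  # One formatStyle() call per color, each covering all columns of that color.
  for (bg in names(cols_by_color)) {
    tbl <- DT::formatStyle(tbl, cols_by_color[[bg]], backgroundColor = bg)
  }
  # Print tbl in an interactive session to display the widget.
}
```

With the real data, p and z would take the place of p_small and z_small.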
