Best practices for drake pipeline experimentation - r

I'm new to drake but loving it so far. One thing I'm having trouble with is how to best go about experimenting with different pipeline configurations. That is, my plans consist purely of a chain of targets where the output from the first target is the input for the second, the second forms the input for the third, etc. My targets all have the same basic structure (dynamic targets with tibbles as individual entries) expected as input and supplied as output, and I want to experiment with different orderings, inclusion/exclusion of certain steps, etc. For example:
plan = drake::drake_plan(
a_transformed = target(
compute_a_transform(list_of_input_data)
, dynamic = map(list_of_input_data)
)
, b_transformed = target(
compute_b_transform(a_transformed)
, dynamic = map(a_transformed)
)
, c_transformed = target(
compute_c_transform(c_transformed)
, dynamic = map(c_transformed)
)
)
The way I've been using drake so far is that each target has a unique/meaningful name, so when I, for example, remove a target, I have to rename the input supplied to the subsequent target:
plan = drake::drake_plan(
a_transformed = target(
compute_a_transform(list_of_input_data)
, dynamic = map(list_of_input_data)
)
#, b_transformed = target(
# compute_b_transform(a_transformed)
# , dynamic = map(a_transformed)
#)
#note the b-transform step has been removed (commented-out), requiring inputs to c_transform to be changed from `b_transform` to `a_transform`
, c_transformed = target(
compute_c_transform(a_transformed) #had to rename things here
, dynamic = map(a_transformed) #and here
)
)
Would it be too much to hope that there's a better way of experimenting that doesn't require this manual commenting-out and renaming?

I worked out a method that is a bit of a hack but works for me. I simply add a skip argument to each function that triggers return of the input if TRUE:
compute_a_transform = function(x,skip=F){
if(skip){
return(x)
}
... #regular compute_a_transform stuff here
}
Then, when I want to skip a step in the processing chain, I simply set skip=TRUE without commenting-out or renaming anything
plan = drake::drake_plan(
a_transformed = target(
compute_a_transform(list_of_input_data)
, dynamic = map(list_of_input_data)
)
, b_transformed = target(
compute_b_transform(a_transformed, skip=TRUE) #skip=TRUE means the b-transform isn't actually applied
, dynamic = map(a_transformed)
)
, c_transformed = target(
compute_c_transform(c_transformed)
, dynamic = map(c_transformed)
)
)

Related

Run [[processors.regex]] over multiple measurements

Is it possible to run the regex preprocessor over multiple measurements like that?
[[processors.regex]]
namepass = ["measure1", "measure2"]
[[processors.regex.fields]]
key = "agent"
pattern = '^.*$'
replacement = "NORMAL"
result_key = "agent_type"
In my case two measurements both have an Access-Log as source ([[inputs.tail]]) but I want to keep them seperate as I want to compare both eventually.
To answer my own question: I'm not sure if this is how it's meant to be but a quickfix would looke like that:
[[processors.regex]]
namepass = ["measure1"]
[[processors.regex.fields]]
key = "agent"
pattern = '^.*$'
replacement = "NORMAL"
result_key = "agent_type"
[[processors.regex]]
namepass = ["measure2"]
[[processors.regex.fields]]
key = "agent"
pattern = '^.*$'
replacement = "NORMAL"
result_key = "agent_type"
Unfortunately it contains duplicated code which is bad.

R Shiny - Operation not allowed without an active reactive context

This is my first time creating a shiny application, and I am getting this error message:
Error in .getReactiveEnvironment()$currentContext() :
Operation not allowed without an active reactive context. (You tried to do something that can only be done from inside a reactive expression or observer.)
I suspect it has something to do with using the slider's value to do my calculation. Please assist, and thanks in advance!
UI Start
header = dashboardHeader(title = "Fifa player valuation")
sidebar = dashboardSidebar(
sidebarMenu(
menuSubItem("Player Valuation - Striker", tabName = "StrikerTab"),
menuSubItem("Player Valuation - Midfielder", tabName = "MidfielderTab"),
menuSubItem("Player valuation - Goalkeeper", tabName = "GkTab")
)
)
ageSlider = sliderInput('age', 'Enter your age in years', min = 0, max = 150, value = 25)
dribblingSlider = sliderInput('dribbling_skills', 'Enter your dribbling skills', min = 0, max = 100, value = 70)
tacklingSlider = sliderInput('tackling_skills', 'Enter your tackling skills', min = 0, max = 100, value = 70)
shootingSlider= sliderInput("shooting_skills", "Enter your shooting skills", min = 0, max = 100, value = 70)
body = dashboardBody(
tabItems(
tabItem(tabName = "StrikerTab",ageSlider,dribblingSlider, tacklingSlider, shootingSlider,verbatimTextOutput("strikerValue")),
tabItem(tabName = "MidfielderTab",ageSlider,dribblingSlider, tacklingSlider),
tabItem(tabName = "GkTab", ageSlider, gkSlider)
),
verbatimTextOutput("playerValueTxt")
)
ui = dashboardPage(header, sidebar, body, skin = "green")
UI end
Server Start
strikerProjectedValue = function(age, dribbling, tackling = 0, shooting = 0){
return (-250.19*age) + (1987.33*dribbling) + (4439.32*tackling) + (3232.44*shooting)
}
server = function(input, output) {
predictedStrikerValue = strikerProjectedValue(input$age, 150)
output$strikerValue = renderPrint(predictedStrikerValue)
}
Probably someone with more programming, and especially shiny programming, can give you a better answer. Your question can use quite some clarification in my opinion, it is a bit unclear to me what the exact results is that you want / what your rational is behind this code. Despite that I will try to give some answer. Maybe some of the things I mention are 'personal' as I would not program a shiny like this.
Personally, I would not use a function in Shiny, rather use the reactive({}) statement. I do not know why you would prefer the function over reactive, especially as you have a lot of sliders that you do not use anywhere. So what is the point of having all these expressions in your function?
In addition I do not really understand why you want every step to be written into a separate vector. Specifically, I mean the part of
predictedStrikerValue = strikerProjectedValue(input$age, 150)
output$strikerValue = renderPrint(predictedStrikerValue)
Maybe you have some plans to use them later but again that is not clear from your description. Why not putting renderPrint(strikerProjectedValue(input$age, 150))? (this is why I stated in the beginning your question might need some extra explanation ;), because there could be a good rational to do this).
In order to get your code working I replaced everything in your code of server with
output$strikerValue = renderText((-250.19*input$age) +
(1987.33*input$dribbling_skills) +
(4439.32*input$tackling_skills) +
(3232.44*input$shooting_skills))
In order to get exactly the same as in your function you will have to put the sliders initially on 0.
Another way to do it, which is more like your initial function:
predictedStrikerValue<- reactive({(-250.19*input$age) + (1987.33*input$dribbling_skills)
})
output$strikerValue = renderText(predictedStrikerValue())
Similar to the code above you can add all the other parts of the function to get it completely reactive.

Format number into K(thousand), M(million) in Shiny DataTables

I'm looking for a straight forward way to change the formatting of numbers into K,M in shiny dataTables. Preferably with something like formatCurrency. I don't want to write k, m functions to convert number into string in order to do the formatting as it makes it difficult to sort rows by value.
There's no built-in way to do this, but it's not too bad to write your own format function in JavaScript that doesn't break row sorting.
See Column Rendering in the DT docs for how to do this: https://rstudio.github.io/DT/options.html
And this will also help:
https://datatables.net/reference/option/columns.render
Here's an example of a custom thousands formatter that rounds to 1 decimal place:
library(DT)
formatThousands <- JS(
"function(data) {",
"return (data / 1000).toFixed(1) + 'K'",
"}")
datatable(datasets::rock, rownames = FALSE, options = list(
columnDefs = list(list(
targets = 0:1, render = formatThousands
))
))
Alternatively, if you want a non-JavaScript method, you could use the colFormat function used with the reactable package. Unfortunately, there is no automatic millions option but it's pretty easy to replicate if you divide the original data and add the labels on with colFormat.
Product <- c('Apples','Oranges','Pears')
Revenue <- c(212384903, 23438872, 26443879)
df <- data.frame(Product,Revenue)
df$Revenue_millions <- dfeg$Revenue/1000000
reactable(df,
showSortable = TRUE,
columns = list(
Revenue_millions = colDef(format = colFormat(prefix = "£", separators = TRUE,digits=1,suffix = "m"))))
The data should now sort correctly
If you are using DataTables, to get the data as Unit format i.e
10000 -> 10K
we can use render function
"render": function ( data ) {
if(data > 999 && data < 1000000) {
return data/1000+' K'
}
else if(data > 1000000){
return data/1000000+' M'
}
else{
return data
}
}
}

DT conditional formatting for column

I need some help with conditional formatting for DT::datatable. I would like to highlight a couple of names in the following example by printing them in italics. The names which need to be highlighted are in a vector name.highlight <- c("ABC","JKL")
require(DT)
mydf <- data.frame(name=c("ABC","DEF","GHI","JKL","MNO","PQR"), value=1:6)
DT::datatable(mydf)
Based on what I see here and here, seems like I need to use render. I have no idea how to write the JS code or how I can pass in a vector/container with all the strings which need to be highlighted.
datatable(mydf, options = list(columnDefs = list(list(
targets = 0, render = JS("function(data, type, full, meta) {", ..., "}")
))))
Thanks.
datatable(mydf, options = list(columnDefs = list(list(
targets = 0, render = JS(
"function(data, type, full, meta) {",
"italic_words=['ABC','JKL']",
"return type === 'display' && italic_words.indexOf(data) != -1 ?",
"'<i>'+data+'</i>' : data;",
"}")
))))
I defined the italic_words variable in the javascript function. The variable contains an array of all the words you want in italic. Then I used the indexOf() javascript function. If the name isn't in the variable italic_words, this function will return -1, and the name will not be italicized.

Avoiding consideration of enclosing frames when retrieving field value of a S4 Reference Class

I'm a huge fan of S4 Reference Classes as they allow for a hybrid programming style (functional/pass-by-value vs. oop/pass-by-reference; example) and thus increase flexibility dramatically.
However, I think I just came across an undesired behavior with respect to the way R scans through environments/frames when you ask it to retrieve a certain field value via method $field() (see help page). The problem is that R also seems to look in enclosing environments/frames if the desired field is not found in the actual local/target environment (which would be the environment making up the S4 Reference Class), i.e. it's just like running get(<objname>, inherits=TRUE) (see help page).
Actual question
In order to have R just look in the local/target environment, I was thinking something like $field(name="<fieldname>", inherits=FALSE) but $field() doesn't have a ... argument that would allow me to pass inherits=FALSE along to get() (which I'm guessing is called somewhere along the way). Is there a workaround to this?
Code Example
For those interested in more details: here's a little code example illustrating the behavior
setRefClass("A", fields=list(a="character"))
x <- getRefClass("A")$new(a="a")
There is a field a in class A, so it's found in the target environment and the value is returned:
> x$field("a")
[1] "a"
Things look differently if we try to access a field that is not a field of the reference class but happens to have a name identical to that of some other object in the workspace/searchpath (in this case "lm"):
require("MASS")
> x$field("lm")
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
ret.x <- x
ret.y <- y
[omitted]
if (!qr)
z$qr <- NULL
z
}
<bytecode: 0x02e6b654>
<environment: namespace:stats>
Not really what I would expect at this point. IMHO an error or at least a warning would be much better. Or opening method $field() for arguments that can be passed along to other functions via .... I'm guessing somewhere along the way get() is called when calling $field(), so something like this could prevent the above behavior from occurring:
x$field("digest", inherits=FALSE)
Workaround: own proposal
This should do the trick, but maybe there's something more elegant that doesn't involve the specification of a new method on top of $field():
setRefClass("A", fields=list(a="character"),
methods=list(
myField=function(name, ...) {
# VALIDATE NAME //
if (!name %in% names(getRefClass(class(.self))$fields())) {
stop(paste0("Invalid field name: '", name, "'"))
}
# //
.self$field(name=name)
}
)
)
x <- getRefClass("A")$new(a="a")
> x$myField("a")
[1] "a"
> x$myField("lm")
Error in x$myField("lm") : Invalid field name: 'lm'
The default field() method can be replaced with your own. So adding an inherits argument to avoid the enclosing frames is simply a matter of grabbing the existing x$field definition and adding it...
setRefClass( Class="B",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
if( is.na( match( name, names( .refClassDef#fieldClasses ) ) ) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
assign(name, value, envir = .self)
}
}
),
)
Or you could have a nice error message with a little rearranging
setRefClass( Class="C",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( is.na( match( name, names( .refClassDef#fieldClasses ) ) ) &&
( !missing(value) || inherits==FALSE) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
assign(name, value, envir = .self)
}
}
),
)
Since you can define any of your own methods to replace the defaults pretty much any logic you want can be implemented for your refclasses. Perhaps an error if the variable is acquired using inheritance but the mode matches to c("expression", "name", "symbol", "function") and warning if it doesn't directly match the local refClass field names?

Resources