Recycle function parameters instead of writing out each time - r

I have a custom function fit_xgb which takes several parameters:
fit_xgb(train_df = training_data,
test_df = testing_data,
training_features = c("spender", "spend_7d", "spend_30d", "d7_utility_sum", "recent_utility_ratio", "IOS",
"is_publisher_organic", "is_publisher_facebook"),
hyper_param = hyperparam_value,
binary_target = "spender",
regression_target = paste0("spend_", day_m, "d"),
spend_from = paste0("spend_", day_n, "d"),
spend_to = paste0("spend_", day_m, "d"))
I have another custom function fit_rf that takes the same parameters:
fit_rf(train_df = training_data,
test_df = testing_data,
training_features = c("spender", "spend_7d", "spend_30d", "d7_utility_sum", "recent_utility_ratio", "IOS",
"is_publisher_organic", "is_publisher_facebook"),
hyper_param = hyperparam_value,
binary_target = "spender",
regression_target = paste0("spend_", day_m, "d"),
spend_from = paste0("spend_", day_n, "d"),
spend_to = paste0("spend_", day_m, "d"))
Rather than spell out the params each time I call either of these two functions, I'd like to create a single variable that I can call once:
model_function_params <- list(
train_df = training_data,
test_df = testing_data,
training_features = c("spender", "spend_7d", "spend_30d", "d7_utility_sum", "recent_utility_ratio", "IOS",
"is_publisher_organic", "is_publisher_facebook"),
hyper_param = hyperparam_value,
binary_target = "spender",
regression_target = paste0("spend_", day_m, "d"),
spend_from = paste0("spend_", day_n, "d"),
spend_to = paste0("spend_", day_m, "d"))
fit_rf(model_function_params)
fit_xgb(model_function_params)
This does not work. I know I could specify each list component individually with
train_df = model_function_params$train_df
test_df = model_function_params$test_df
etc
But that would almost defeat the purpose of writing less code and keeping my script minimal.
Is there an elegant way of defining the function parameters once, then passing to either fit_rf or fit_xgb without having to specify the parameters twice in my code?
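One common base R approach to this is do.call(), which calls a function with a named list as its arguments. The sketch below is only an illustration: fit_rf and fit_xgb are hypothetical stand-ins that echo a few inputs (the real functions are not shown in the question), and the data frames, day_n, day_m and hyper_param values are placeholder assumptions.

```r
# Hypothetical stand-ins for the real fit_rf() and fit_xgb(); they only echo
# a few of their inputs so the call pattern can be checked.
fit_rf <- function(train_df, test_df, training_features, hyper_param,
                   binary_target, regression_target, spend_from, spend_to) {
  list(model = "rf", n_features = length(training_features),
       hyper_param = hyper_param, regression_target = regression_target)
}
fit_xgb <- function(train_df, test_df, training_features, hyper_param,
                    binary_target, regression_target, spend_from, spend_to) {
  list(model = "xgb", n_features = length(training_features),
       hyper_param = hyper_param, regression_target = regression_target)
}

# Placeholder values (assumptions, not taken from the question)
day_n <- 7
day_m <- 30

# Define the arguments once, as a named list
model_function_params <- list(
  train_df = data.frame(spender = c(0, 1)),
  test_df = data.frame(spender = c(1, 0)),
  training_features = c("spender", "spend_7d"),
  hyper_param = 1,
  binary_target = "spender",
  regression_target = paste0("spend_", day_m, "d"),
  spend_from = paste0("spend_", day_n, "d"),
  spend_to = paste0("spend_", day_m, "d")
)

# do.call() matches the list names to the function's argument names
rf_fit <- do.call(fit_rf, model_function_params)
xgb_fit <- do.call(fit_xgb, model_function_params)

# Override a single argument for one call without redefining the list
xgb_fit2 <- do.call(fit_xgb, modifyList(model_function_params, list(hyper_param = 2)))
```

The list names must match the argument names of the function being called, which is why the same list can be reused for both functions here.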

Related

How can I make the group-by code to call a function from another package faster?

I have the code below to compute a meta value using the meta package:
probMetaControl <- long %>% group_by(ID, sample) %>% group_split() %>% mclapply(mc.cores = 10 ,function(endf){
message(endf$ID[1])
res <- meta::metagen(data = endf, studlab = ID, TE = expression , seTE = sd, sm = "SMD",
n.e = rep(1,nrow(endf)),
method.tau = "REML",
hakn = TRUE,
control = list(maxiter=1000))
data.frame(
ID = endf$ID[1],
sample = endf$sample[1],
meta.exprs = res$TE.fixed,
stringsAsFactors = F
)
}) %>% do.call(what = rbind) %>% as.data.frame()
The long data frame has around 800,000 rows. A small part of it looks like this:
as.data.table(structure(list(ID = c("h:5982", "h:3310", "h:7849", "h:2978",
"h:7318"), pID = c("X1053_at", "X117_at", "X121_at", "X1255_g_at",
"X1294_at"), sd = c(0.228908614809978, 0.436455554523966, 0.210542866430305,
0.672545478318169, 0.26926204466525), sample = c("A", "B", "A",
"C", "A"), expression = c(6.53920197406645, 6.12380136266864,
8.01553257692446, 4.62636832157394, 7.58222133679378)), row.names = c(NA,
-5L), class = c("data.table", "data.frame")))
At the moment, this code takes 23 mins to run. Is there any way to make it faster?
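One possible direction, offered as a sketch rather than a verified replacement: the code above keeps only res$TE.fixed, and under the usual inverse-variance method the fixed-effect estimate is the mean of TE weighted by 1 / seTE^2. If that assumption holds for your data, the estimate can be computed with a grouped summary and no per-group call to metagen. The toy data below is made up for illustration and only mirrors the column names of long; it is worth checking the result against metagen on a few groups before relying on it.

```r
library(dplyr)

# Toy data with the same column names as `long` (values are invented)
long <- data.frame(
  ID = c("h:1", "h:1", "h:1", "h:2", "h:2"),
  sample = c("A", "A", "A", "B", "B"),
  expression = c(6.5, 6.1, 8.0, 4.6, 7.6),
  sd = c(0.2, 0.4, 0.2, 0.5, 0.5)
)

# Inverse-variance weighted mean per (ID, sample) group.
# Assumption: this matches the fixed-effect estimate kept from metagen().
probMetaControl <- long %>%
  group_by(ID, sample) %>%
  summarise(meta.exprs = weighted.mean(expression, 1 / sd^2), .groups = "drop") %>%
  as.data.frame()
```

This does not reproduce the random-effects results (method.tau, hakn), so it only applies when the fixed-effect value is all that is needed.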

Create an R function that normalizes data based on input values

I don't write too many complicated functions and typically stick with very basic ones. How do I create a function that takes a dataset, normalizes it with a chosen normalization method, and boxplots the output? Currently norm_method differs between the normalization methods, and I was wondering if there is a way to set it at the start of the function so that the correct method is used. Below is the code I created, but I am stuck on how to proceed.
library(reshape2) # for melt
library(cowplot)
demoData;
# target_demoData will need to be changed at some point
TestFunc <- function(demoData) {
# Q3 norm (75th percentile)
target_demoData <- normalize(demoData ,
norm_method = "quant",
desiredQuantile = .75,
toElt = "q_norm")
# Background normalization without spike
target_demoData <- normalize(demoData ,
norm_method = "neg",
fromElt = "exprs",
toElt = "neg_norm")
boxplot(assayDataElement(demoData[,1:10], elt = "q_norm"),
col = "red", main = "Q3",
log = "y", names = 1:10, xlab = "Segment",
ylab = "Counts, Q3 Normalized")
boxplot(assayDataElement(demoData[,1:10], elt = "neg_norm"),
col = "blue", main = "Neg",
log = "y", names = 1:10, xlab = "Segment",
ylab = "Counts, Neg. Normalized")
}
You might want to consider designing your normalize() and assayDataElement() functions to take ..., which provides more flexibility.
In lieu of that, given the examples above, you could make a simple configuration list, and elements of that configuration are passed to your normalize() and assayDataElement() functions, like this:
TestFunc <- function(demoData, method = c("quant", "neg")) {
method <- match.arg(method)
# per-method arguments for normalize() and boxplot()
method_config <- list(
"quant" = list("norm_args" = list(norm_method = "quant", desiredQuantile = 0.75, toElt = "q_norm"),
"plot_args" = list(col = "red", main = "Q3", ylab = "Counts, Q3 Normalized")),
"neg" = list("norm_args" = list(norm_method = "neg", fromElt = "exprs", toElt = "neg_norm"),
"plot_args" = list(col = "blue", main = "Neg", ylab = "Counts, Neg. Normalized"))
)
mcn <- method_config[[method]][["norm_args"]]
mcp <- method_config[[method]][["plot_args"]]
# normalize the data
target_demoData <- do.call(normalize, c(list(demoData), mcn))
# plot the first 10 segments of the normalized element
boxplot(assayDataElement(target_demoData[, 1:10], elt = mcn[["toElt"]]),
col = mcp[["col"]], main = mcp[["main"]],
log = "y", names = 1:10, xlab = "Segment", ylab = mcp[["ylab"]])
}
Again, this approach is not as flexible as using ... in your functions. Also consider splitting this into two functions: one that returns the normalized data, and a second that generates the plot.
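The two-function split suggested above can be sketched with plain numeric vectors. Everything here is an assumption made for illustration: normalize_values is a hypothetical stand-in for normalize() (not the real API), and run_normalization and plot_normalized are made-up names. The point is only the pattern of one configuration list feeding do.call() in two separate functions.

```r
# Hypothetical normaliser standing in for normalize(); not the real API
normalize_values <- function(x, norm_method, desiredQuantile = 0.75) {
  scale <- switch(norm_method,
    quant = unname(quantile(x, desiredQuantile)),  # scale by a quantile
    neg   = min(x)                                 # scale by the smallest value
  )
  x / scale
}

# One configuration entry per method: arguments for normalising and plotting
method_config <- list(
  quant = list(norm_args = list(norm_method = "quant", desiredQuantile = 0.75),
               plot_args = list(col = "red", main = "Q3")),
  neg   = list(norm_args = list(norm_method = "neg"),
               plot_args = list(col = "blue", main = "Neg"))
)

# Function 1: returns the normalized data only
run_normalization <- function(x, method = c("quant", "neg")) {
  method <- match.arg(method)
  do.call(normalize_values, c(list(x), method_config[[method]]$norm_args))
}

# Function 2: draws the boxplot using the plot settings for the same method
plot_normalized <- function(x, method = c("quant", "neg")) {
  method <- match.arg(method)
  do.call(boxplot, c(list(x), method_config[[method]]$plot_args))
}

counts <- c(2, 4, 6, 8, 10)
q_norm <- run_normalization(counts, "quant")
neg_norm <- run_normalization(counts, "neg")
```

Calling plot_normalized(q_norm, "quant") would then draw the red "Q3" boxplot, and the normalized values stay available for other uses.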

Using map and pluck to get values from nested list

I have the following nasty, nested list
Edit: updated to include value_I_dont_want
mylist <- list(
list(
nested_1 = list(
nested_2 = list(
list( value_I_want = "a", value_I_dont_want = "f"),
list( value_I_want = "b", value_I_dont_want = "g")
)
)
),
list(
nested_1 = list(
nested_2 = list(
list( value_I_want = "c", value_I_dont_want = "h"),
list( value_I_want = "d", value_I_dont_want = "i"),
list( value_I_want = "e", value_I_dont_want = "j")
)
)
)
)
And I want to get all the value_I_wants
I know I can use the following code within a for loop
mylist[[x]]$nested_1$nested_2[[y]]$value_I_want
But I want to improve my map skills. I understand how to use map_chr when the list is a single level but I haven't found many resources on plucking from very nested lists. I also know I can use [[ but haven't found good documentation for when this is appropriate?
Any help appreciated!
If we need the values that start with 'yay', flatten each element and filter by pattern:
library(purrr)
library(dplyr)
map(mylist, ~ .x$nested_1$nested_2 %>% unlist() %>% grep("^yay", ., value = TRUE))
Or use pluck to extract the elements based on the key 'value_I_want' after looping over the list with map
map(mylist, ~ .x$nested_1$nested_2 %>%
map(pluck, "value_I_want") )
A more general solution that requires we only know how deeply the desired values are nested:
map(mylist, ~pluck(.,1,1) %>% map(pluck, "value_I_want"))
The second pluck operates on the nesting level set by the first pluck.
This can also work on nested lists that are missing intermediate names, as often found in JSON data pulled from the internet.
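If a single flat character vector is wanted rather than a nested list, the same idea can be sketched as below. It assumes the structure of mylist from the question and relies on purrr's shorthand where a character vector passed to map() extracts by name level by level.

```r
library(purrr)

mylist <- list(
  list(nested_1 = list(nested_2 = list(
    list(value_I_want = "a", value_I_dont_want = "f"),
    list(value_I_want = "b", value_I_dont_want = "g")
  ))),
  list(nested_1 = list(nested_2 = list(
    list(value_I_want = "c", value_I_dont_want = "h"),
    list(value_I_want = "d", value_I_dont_want = "i"),
    list(value_I_want = "e", value_I_dont_want = "j")
  )))
)

# Step 1: descend to nested_2 in every top-level element
inner <- map(mylist, c("nested_1", "nested_2"))

# Step 2: remove one level of nesting so every item is a single record
records <- unlist(inner, recursive = FALSE)

# Step 3: extract the wanted field from each record as a character vector
wanted <- map_chr(records, "value_I_want")
wanted
# → "a" "b" "c" "d" "e"
```

Using map_chr() in the last step also fails loudly if any record is missing value_I_want, which can be useful when the input comes from JSON.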

Trouble creating lists in R for the networkD3 package

I'd like to create the radial network above utilizing the R package networkD3. I read the guide here which utilizes lists to create radial networks. Unfortunately my R skills with lists are lacking. They're actually non-existent. Fortunately there's the R4DS guide here.
After reading everything I came up with the code below to create the diagram above.
library(networkD3)
nd3 <- list(Start = list(A = list(1, 2, 3), B = "B"))
diagonalNetwork(List = nd3, fontSize = 10, opacity = 0.9)
Alas, my attempt fails. And subsequent attempts fail to generate anything that's close to the diagram above. I'm pretty sure it's my list that's wrong. Maybe you can show me the right list and things will start to make sense.
Jason!
The issue here is that the List argument (here nd3) expects a very specific structure: each node is a list with a name and, optionally, a list of children. So your code should look like this:
library(networkD3)
nd3 <- list(name = "Start", children = list(list(name = "A",
children = list(list(name = "1"),
list(name = "2"),
list(name = "3")
)),
list(name = "B")))
diagonalNetwork(List = nd3, fontSize = 10, opacity = 0.9)
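To make the name/children structure less error-prone to type, a small helper can build each node. This is only a convenience sketch: node is a hypothetical function defined here, not part of networkD3.

```r
# Hypothetical helper (not part of networkD3): builds one node of the tree.
# Any further arguments are taken as child nodes.
node <- function(name, ...) {
  children <- list(...)
  if (length(children) == 0) {
    list(name = name)                       # leaf: name only
  } else {
    list(name = name, children = children)  # inner node: name plus children
  }
}

nd3 <- node("Start",
            node("A", node("1"), node("2"), node("3")),
            node("B"))

# The result has the same shape as the hand-written list
str(nd3)
```

The resulting nd3 can be passed to diagonalNetwork(List = nd3, fontSize = 10, opacity = 0.9) in the same way as the hand-written list.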
If you're like me and the data frame/spreadsheet format is easier to wrap your head around, you could build an easy data frame with your data and then use data.tree functions to convert it to the list/json format...
library(data.tree)
library(networkD3)
source <- c("Start", "Start", "A", "A", "A")
target <- c("A", "B", "1", "2", "3")
df <- data.frame(source, target)
nd3 <- ToListExplicit(FromDataFrameNetwork(df), unname = T)
diagonalNetwork(List = nd3, fontSize = 10, opacity = 0.9)

Add values to rCharts hPlot tooltip

I would like to add some extra values to the standard Highcharts tooltip via rCharts. Example code:
require(rCharts)
df <- data.frame(x = c(1:5), y = c(5:1),
z = c("A", "B", "C", "D", "E"),
name = c("K", "L", "M", "N", "O"))
h1 <- hPlot(x = "x", y = "y", data = df, type = "scatter", group = "z")
This generates a tooltip with the x and y values, and the series name z as the title. Now I also want to add the name values to the tooltip. However, I have no idea how this is done.
rCharts is a great package, but it is still not well documented (maybe I am missing something). I think you need to define a new JS function for the tooltip formatter attribute.
Any JS literals need to be wrapped between #! and !#. Here is a start; it doesn't work as I imagined, but I think it is a good starting point:
h1$tooltip( formatter = "#! function() { return 'x: ' + this.point.x +
'y: ' + this.point.y +
'name: ' + this.point.group; } !#")
After several years, I have an answer.
It seems that wrapper functions like hPlot() do not support additional tooltip variables, even with a simple custom formatter function. See the working example below, based on the dataset from the question.
require(rCharts)
# create data frame
df <- data.frame(x = c(1:5), y = c(5:1),
z = c("A", "B", "C", "D", "E"),
name = c("K", "L", "M", "N", "O"))
# Plot using hPlot() approach
h1 <- hPlot(x = "x", y = "y", data = df, type = "scatter", group = "z")
h1$tooltip(borderWidth=0, followPointer=TRUE, followTouchMove=TRUE, shared = FALSE,
formatter = "#! function(){return 'X: ' + this.point.x + '<br>Y: ' + this.point.y + '<br>Z: ' + this.point.z + '<br>Name: ' + this.point.name;} !#")
h1
Tooltips do not work in the above example because the variables in the array are not named. See str(h1).
# Plot using manual build
h1 <- rCharts:::Highcharts$new()
dlev <- levels(factor(as.character(df$z)))
for (i in seq_along(dlev))
{
h1$series(data = toJSONArray2(df[df$z == dlev[i], , drop = FALSE], json = FALSE, names = TRUE), name = dlev[i], type = "scatter", marker = list(radius = 3))
}
h1$tooltip(borderWidth=0, followPointer=TRUE, followTouchMove=TRUE, shared = FALSE,
formatter = "#! function(){return 'X: ' + this.point.x + '<br>Y: ' + this.point.y + '<br>Z: ' + this.point.z + '<br>Name: ' + this.point.name;} !#")
h1
This works because the array variables are named using names=T in the line starting h1$series.... See str(h1).
This sort of solves the tooltip issue, but there might be other problems with the named arrays. For example, it breaks things in a shiny-app environment. There must be a reason why hPlot() does not use the named arrays.
