How to better analyze contents of a `call`? - r

In the package rootSolve, the function multiroot requires an input argument which is a collection of functions. I found a way to determine dynamically how many functions are contained in that input argument, but would like some help from the R community as to a cleaner approach. The example input function here (underdefined, but that doesn't matter so far as parsing it goes)
Kfunc<-function(x) {
z<-c( z1<- 4*var1 -3*var2 +5*var3, z2<-8*var1 +5*var2 -2*var3 )
}
Where the z1,z2,z3 are the outputs and "varJ" are the parameters to be determined
I came up with this stack of functions to find out how many separate functions are "inside" the definition of z :
bar <- parse(text = (parse(text = body(Kfunc)[2] )[[1]][3]))
length(bar[[1]])
#[1] 3
bar[[1]][1]
#c()
bar[[1]][2]
#(z1 <- 4 * var1 - 3 * var2 + 5 * var3)()
bar[[1]][3]
#(z2 <- 8 * var1 + 5 * var2 - 2 * var3)()
Showing that the number of equations is equal to length(bar[[1]]) - 1
Is there a faster/shorter/cleaner way, preferably without requiring a non-base library?

Related

r - check if columns renamed when unnesting

Is there a way to know if a table was renamed in the process of unnesting? I want to know if there is something where I can intercept any messages that come through with New names: and give more context about solutions
# min reprex
library(tidyverse)
f <- function() {
tibble(
x = 1:2,
y = 2:1,
z = tibble(x = 1)
) |>
unnest_wider(z, names_repair = "unique")
}
f()
New names:
• `x` -> `x...1`
• `x` -> `x...3`
x...1 y x...3
----- ----- -----
1 2 1
2 1 1
More context:
The message stems from
vctrs::vec_as_names(c("x", "x"), repair = "universal")
I see information about withCallingHandlers() but not sure if that is the right route. I thought there was a way for errors/messages to have classes that you can intercept but I can't remember what I read.
Something in testthat::expect_message() may help. I thought there would be a has_message() function out there.
There is a lot of tidy evaluation and comparing names before and after might be tricky. I could look for the names with the regex "\\.+\\d+$" but not sure that is robust enough since data could have fields with that syntax already.
Thank you!
Taking inspiration from hadley's answer, on the question that #ritchie-sacramento linked, you should check out the evaluate package.
> eval_res <- evaluate::evaluate("f()")
> eval_res[[2]]$message
[1] "New names:\n* x -> x...1\n* x -> x...3\n"
This will require more testing to see what happens to the data structure when there are errors, warnings, or even multiple messages. But this seems like the right track.

Changing Default dots argument in R

This question is related to this question and this question.
I need to assign a default to the ... argument in a function. I have successfully been able to to use the default package to accomplish this for specific arguments. For instance, lets say I want to allow toJSON from the jsonlite package to show more than four digits. The default is 4, but I want to show 10.
library(jsonlite)
library(default)
df <- data.frame(x = 2:5,
y = 2:5 / pi)
df
#> x y
#> 1 2 0.6366198
#> 2 3 0.9549297
#> 3 4 1.2732395
#> 4 5 1.5915494
# show as JSON - defaults to four digits
toJSON(df)
#> [{"x":2,"y":0.6366},{"x":3,"y":0.9549},{"x":4,"y":1.2732},{"x":5,"y":1.5915}]
# use default pacakge to change to 10
default(toJSON) <- list(digits = 10)
toJSON(df)
#> [{"x":2,"y":0.63661977237},{"x":3,"y":0.95492965855},{"x":4,"y":1.2732395447},{"x":5,"y":1.5915494309}]
There is another function called stream_out which uses toJSON but only uses the digits argument in ....
> stream_out(df)
{"x":2,"y":0.63662}
{"x":3,"y":0.95493}
{"x":4,"y":1.27324}
{"x":5,"y":1.59155}
Complete! Processed total of 4 rows.
>
> stream_out(df, digits = 10)
{"x":2,"y":0.63661977237}
{"x":3,"y":0.95492965855}
{"x":4,"y":1.2732395447}
{"x":5,"y":1.5915494309}
Complete! Processed total of 4 rows.
So even though I have changed the digits in toJSON, it isn't passed to the ... in stream_out. I cannot change this in the same manner as with toJSON.
> default(stream_out) <- list(digits = 10)
Error: 'digits' is not an argument of this function
This is not strictly a jsonlite question, but that is my use case here. I need to somehow change the ... argument of the stream_out function so that any time it is used, 10 digits are returned, rather than 4. However, any examples that show how to change defaults of ... arguments could probably be used to get to what I need.
Thanks!

Using cpquery function for several pairs from dataset

I am relatively beginner in R and trying to figure out how to use cpquery function for bnlearn package for all edges of DAG.
First of all, I created a bn object, a network of bn and a table with all strengths.
library(bnlearn)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
Then I tried to create a new variable in sttbl dataset, which was the result of cpquery function.
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)
It looks pretty good (especially on bigger data), but when I am trying to automate this process somehow, I am struggling with errors, such as:
Error in sampling(fitted = fitted, event = event, evidence = evidence, :
logical vector for evidence is of length 1 instead of 10000.
In perfect situation, I need to create a function that fills the prob generated variable of sttbl dataset regardless it's size. I tried to do it with for loop to, but stumbled over the error above again and again. Unfortunately, I am deleting failed attempts, but they were smt like this:
for (i in 1:nrow(sttbl)) {
j = sttbl[i,1]
k = sttbl[i,2]
sttbl[i,4]=cpquery(fit, fit$j %in% sttbl[i,1]==1, fit$k %in% sttbl[i,2]==1)
}
or this:
for (i in 1:nrow(sttbl)) {
sttbl[i,4]=cpquery(fit, sttbl[i,1] == 1, sttbl[i,2] == 1)
}
Now I think I misunderstood something in R or bnlearn package.
Could you please tell me how to realize this task with filling the column by multiple cpqueries? That would help me a lot with my research!
cpquery is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist to draw samples and reweighting to get the probabilities.
Your example
library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
This uses cpquery and the much maligned eval(parse(...)) -- this is the
approach the the bnlearn author takes to do this programmatically in the ?cpquery examples. Anyway,
# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)
# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
sttbl$prob[i] = eval(parse(text=qtxt))
}
I find it preferable to work with cpdist which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programatically (and without evil(parse(...))).
The evidence is added in a named list i.e. list(A='a').
# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
w = attr(temp, "weights")
sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
}
sttbl
# from to strength prob prob2
# 1 A D -1938.9499 0.6186238 0.6233387
# 2 A B -1153.8796 0.6050552 0.6133448
# 3 C D -823.7605 0.7027782 0.7067417
# 4 B E -720.8266 0.7332107 0.7328657
# 5 F E -549.2300 0.5850828 0.5895373

R, getting an invalid argument to unary operator when using order function

I'm essentially doing the exact same thing 3 times, and when adding a new variable I get this error
Error in -emps$EV : invalid argument to unary operator
The code chunk causing this is
evps<-aggregate(EV~player,s1k,mean)
sort2<-evps[order(-evps$EV),]
head(sort2,10)
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EV),]
head(sort3,10)
Works like a charm for the first list, but the identical code thereafter causes the error.
This specific line is causing the error
sort3<-emps[order(-emps$EV),]
How can I fix/workaround this?
Full Code
url <- getURL("https://raw.githubusercontent.com/M-ttM/Basketball/master/class.csv")
shots <- read.csv(text = url)
shots$make<-shots$points>0
shots2<-shots[which(!(shots$player=="Luc Richard Mbah a Moute")),]
fit1<-glm(make~factor(type)+factor(period), data=shots2,family="binomial")
summary(fit1)
shots2$makeodds<-fitted(fit1)
shots2$EV<-shots2$makeodds*ifelse(shots2$type=="3pt",3,2)
shots3<-shots2[which(shots2$y>7),]
locmakes<-data.frame(table(shots3[, c("x", "y")]))
s1k <- shots2[with(shots2, player %in% names(which(table(player)>=1000))), ]
pps<-aggregate(points~player,s1k,mean)
sort<-pps[order(-PPS$points),]
head(sort,10)
evps<-aggregate(EV~player,s1k,mean)
sort2<-evps[order(-evps$EV),]
head(sort2,10)
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EV),]
head(sort3,10)
The error message seems to occur when trying to order columns including chr type data. A possible workaround is to use the reverse function rev() instead of the minus sign, like so:
column_a = c("a","a","b","b","c","c")
column_b = seq(6)
df = data.frame(column_a, column_b)
df$column_a = as.character(df$column_a)
df[with(df, order(-column_a, column_b)),]
> Error in -column_a : invalid argument to unary operator
df[with(df, order(rev(column_a), column_b)),]
column_a column_b
5 c 5
6 c 6
3 b 3
4 b 4
1 a 1
2 a 2
Let me know if it works in your case.
On this line, emps$EV doesn't exist.
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EV),]
head(sort3,10)
You probably meant
s1k$EM<-s1k$points-s1k$EV
emps<-aggregate(EM~player,s1k,mean)
sort3<-emps[order(-emps$EM),]
head(sort3,10)

subsetting data.cube inside custom function

I am trying to make a function of my own to subset a data.cube in R, and format the result automatically for some predefined plots I aim to build.
This is my function.
require(data.table)
require(data.cube)
secciona <- function(cubo = NULL,
fecha_valor = list(),
loc_valor = list(),
prod_valor = list(),
drop = FALSE){
cubo[fecha_valor, loc_valor, prod_valor, drop = drop]
## The line above will really be an asignment of type y <- format(cubo[...drop])
## Rest of code which will end up plotting the subset of the function
}
The thing is I keep on getting the error: Error in eval(expr, envir, enclos) : object 'fecha_valor' not found
What is most strange for me, is that on the console everything works fine, but not when defined inside the subsetting function of mine.
In console:
> dc[list(as.Date("2013/01/01"))]
> dc[list(as.Date("2013/01/01")),]
> dc[list(as.Date("2013/01/01")),,]
> dc[list(as.Date("2013/01/01")),list(),list()]
all give as result:
<data.cube>
fact:
5627 rows x 2 dimensions x 1 measures (0.32 MB)
dimensions:
localizacion : 4 entities x 3 levels (0.01 MB)
producto : 153994 entities x 3 levels (21.29 MB)
total size: 21.61 MB
But whenever I try
secciona(dc)
secciona(dc, fecha_valor = list(as.Date("2013/01/01")))
secciona(dc, fecha_valor = list())
I always get the error above mentioned.
Any ideas why this is happening? should I proceed in else way for my approach of editing the subset for plotting?
This is the standard issue that R users will face when dealing with non-standard evaluation. This is a consequence of Computing on the language R language feature.
[.data.cube function expects to be used in interactive way, that extends the flexibility of the arguments passed to it, but gives some restrictions. In that aspect it is similar to [.data.table when passing expressions from wrapper function to [ subset operator. I've added dummy example to make it reproducible.
I see you are already using data.cube-oop branch, so just to clarify for other readers. data.cube-oop branch is 92 commits ahead of master branch, to install use the following.
install.packages("data.cube", repos = paste0("https://", c(
"jangorecki.gitlab.io/data.cube",
"Rdatatable.github.io/data.table",
"cran.rstudio.com"
)))
library(data.cube)
set.seed(1)
ar = array(rnorm(8,10,5), rep(2,3),
dimnames = list(color = c("green","red"),
year = c("2014","2015"),
country = c("IN","UK"))) # sorted
dc = as.data.cube(ar)
f = function(color=list(), year=list(), country=list(), drop=FALSE){
expr = substitute(
dc[color=.color, year=.year, country=.country, drop=.drop],
list(.color=color, .year=year, .country=country, .drop=drop)
)
eval(expr)
}
f(year=list(c("2014","2015")), country="UK")
#<data.cube>
#fact:
# 4 rows x 3 dimensions x 1 measures (0.00 MB)
#dimensions:
# color : 2 entities x 1 levels (0.00 MB)
# year : 2 entities x 1 levels (0.00 MB)
# country : 1 entities x 1 levels (0.00 MB)
#total size: 0.01 MB
You can track the expression just by putting print(expr) before/instead eval(expr).
Read more about non-standard evaluation:
- R Language Definition: Computing on the language
- Advanced R: Non-standard evaluation
- manual of substitute function
And some related SO questions:
- Passing on non-standard evaluation arguments to the subset function
- In R, why is [ better than subset?

Resources