Lexical Error while running featuretoolsR - r

In an effort to test the working of featuretools, I installed featuretoolsR through RStudio,and installed numpy and featuretools in Python.
However on trying to create an entitiy following error is coming
# Libs
library(featuretoolsR)
library(magrittr)
# Create some mock data
set_1 <- data.frame(key = 1:100, value = sample(letters, 100, T))
set_2 <- data.frame(key = 1:100, value = sample(LETTERS, 100, T))
# Create entityset
es <- as_entityset(set_1, index = "key", entity_id = "set_1", id = "demo")```
Error: lexical error: invalid char in json text.
WARNING: The conda.compat modul
(right here) ------^
Kindly help in diagnosing and providing solution to same.

The same warning happened to me after updating to conda version 4.6.11. I think the problem is generated because of the print statement at the end of the compat.py script. I know this is not a great fix but I accessed the compat.py file and removed the print statement:
print("WARNING: The conda.compat module is deprecated and will be removed in a future release.", file=sys.stderr)
The file should be located here: \Anaconda3\pkgs\conda-4.6.11-py37_0\Lib\site-packages\conda
I hope it helps.

Related

Consistent error message while running grouping analysis in 'plspm' package

I am looking for some help in resolving an error using the partial least squares path modeling package ('plspm').
I can get results running a basic PLS-PM analysis but run into issues when using the grouping function, receiving the error message:
Error in if (w_dif < specs$tol || iter == specs$maxiter) break : missing value where TRUE/FALSE needed
I have no missing values and all variables have the proper classification. Elsewhere I read that there is a problem with processing observations with the exact same values across all variables, I have deleted those and still face this issue. I seem to be facing the issue only when I run the groups using the "bootstrap" method as well.
farmwood = read.csv("farmwood_groups(distance).csv", header = TRUE) %>%
slice(-c(119:123))
Control = c(0,0,0,0,0,0)
Normative = c(0,0,0,0,0,0)
B_beliefs = c(0,0,0,0,0,0)
P_control = c(1,0,0,0,0,0)
S_norm = c(0,1,0,0,0,0)
Behavior = c(0,0,1,1,1,0)
farmwood_path = rbind(Control, Normative, B_beliefs, P_control, S_norm, Behavior)
colnames(farmwood_path) = rownames(farmwood_path)
farmwood_blocks = list(14:18,20:23,8:13,24:27,19,4:7)
farmwood_modes = rep("A", 6)
farmwood_pls = plspm(farmwood, farmwood_path, farmwood_blocks, modes = farmwood_modes)
ames(farmwood)[names(farmwood) == "QB3"] <- "Distance"
farmwood$Distance <- as.factor(farmwood$Distance)
distance_boot = plspm.groups(farmwood_pls, farmwood$Distance, method = "bootstrap")
distance_perm = plspm.groups(farmwood_pls, farmwood$Distance, method = "permutation")
The data is contained here:
https://www.dropbox.com/s/8vewuupywpi1jkt/farmwood_groups%28distance%29.csv?dl=0
Any help would be appreciated. Thank you in advance

googledrive::drive_mv gives error "Parent specified via 'path' is invalid: x Does not exist"

This is a weird one and I am hoping someone can figure it out. I have written a function that uses googlesheets4 and googledrive. One thing I'm trying to do is move a googledrive document (spreadsheet) from the base folder to a specified folder. I had this working perfectly yesterday so I don't know what happened as it just didn't when I came in this morning.
The weird thing is that if I step through the function, it works fine. It's just when I run the function all at once that I get the error.
I am using a folder ID instead of a name and using drive_find to get the correct folder ID. I am also using a sheet ID instead of a name. The folder already exists and like I said, it was working yesterday.
outFolder <- 'exact_outFolder_name_without_slashes'
createGoogleSheets <- function(
outFolder
){
folder_id <- googledrive::drive_find(n_max = 10, pattern = outFolder)$id
data <- data.frame(Name = c("Sally", "Sue"), Data = c("data1", "data2"))
sheet_id <- NA
nameDate <- NA
tempData <- data.frame()
for (i in 1:nrow(data)){
nameDate <- data[i, "Name"]
tempData <- data[i, ]
googlesheets4::gs4_create(name = nameDate, sheets = list(sheet1 = tempData)
sheet_id <- googledrive::drive_find(type = "spreadsheet", n_max = 10, pattern = nameDate)$id
googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
} end 'for'
} end 'function'
I don't think this will be a reproducible example. The offending code is within the for loop that is within the function and it works fine when I run through it step by step. folder_id is defined within the function but outside of the for loop. sheet_id is within the for loop. When I move folder_id into the for loop, it still doesn't work although I don't know why it would change anything. These are just the things I have tried. I do have the proper authorization for google drive and googlesheets4 by using:
googledrive::drive_auth()
googlesheets4::gs4_auth(token = drive_token())
<error/rlang_error>
Error in as_parent():
! Parent specified via path is invalid:
x Does not exist.
Backtrace:
global createGoogleSheets(inputFile, outPath, addNames)
googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
googledrive:::as_parent(path)
Run rlang::last_trace() to see the full context.
Backtrace:
x
-global createGoogleSheets(inputFile, outPath, addNames)
-googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
\-googledrive:::as_parent(path)
\-googledrive:::drive_abort(c(invalid_parent, x = "Does not exist."))
\-cli::cli_abort(message = message, ..., .envir = .envir)
\-rlang::abort(message, ..., call = call, use_cli_format = TRUE)
I have tried changing the folder_id to the exact path of my google drive W:/My Drive... and got the same error. I should mention I have also tried deleting the folder and re-creating it fresh.
Anybody have any ideas?
Thank you in advance for your help!
I can't comment because I don't have the reputation yet, but I believe you're missing a parenthesis in your for-loop.
You need that SECOND parenthesis below:
for (i in 1:nrow(tempData) ) {
...
}

R6Class bug __Deferred_Default_Marker__

I'm implementing a new R6Class and trying to add new members dynamically (https://cran.r-project.org/web/packages/R6/vignettes/Introduction.html#adding-members-to-an-existing-class) but I get this error "__Deferred_Default_Marker__" (whether it be dynamic or not) when I implement the getx2 function.
Simple <- R6Class("Simple",
public = list(
x = 1,
getx = function() self$x,
getx2 = function() return(self$x * 2)
)
)
# To replace an existing member, use overwrite=TRUE
Simple$set("public", "x", 10, overwrite = TRUE)
s <- Simple$new()
s$getx2() # this returns "__Deferred_Default_Marker__"
Any ideas on this? It's exactly like in the documentation
The solution was to update the package. The problem with the following instruction:
devtools::install_github('r-lib/R6', build_vignettes = FALSE)
was it threw me the following error: namespace 'R6' is imported by 'CompatibilityAPI', 'mrsdeploy' so cannot be unloaded"
so i closed RStudio, and opened R.exe (C:\Program Files\R\R-3.3.3\bin) and ran the same command. Now, I have this package:
Package: R6
Version: 2.2.2.9000
URL: https://github.com/r-lib/R6/
and it works as in the specification.

slurm_apply a RefClass method from within a RefClass method

EDIT: New version of rslurm makes the solution very easy. See my answer below.
Apologies for the somewhat longer than desired MWE, and a title that I realize after submitting the question may be needlessly complicated. I believe the real issue is getting the environment of a RefClass object into rslurm::slurm_apply.
MWE
Here I define a toy reference class called BankAccount. It has two fields and two methods.
The fields are transactions, a list of all transactions associated with the account and suspicion_threshold the value above which the bank will investigate the transaction.
The two methods are is_suspicious which compares the transactions with the suspicion_threshold on the local machine and is_suspicious_slurm, which uses rslurm::slurm_apply to spread many calls to is_suspicious over a cluster of computers managed by SLURM. You can imagine if there were many transactions or if the is_suspicious function were more complex, this might be necessary.
So, here's the setup
BankAccount <- setRefClass(
Class = 'BankAccount',
fields = list(
transactions = 'numeric',
suspicion_threshold = 'numeric'
)
)
BankAccount$methods(
is_suspicious = function(start_idx = 1, stop_idx = length(transactions)) {
return(start_idx + which(transactions[start_idx:stop_idx] > suspicion_threshold) - 1)
}
)
BankAccount$methods(
is_suspicious_slurm = function(num_nodes) {
usingMethods(is_suspicious)
t <- length(transactions)
t_per_n <- floor(t/num_nodes)
starts <- seq(from = 1, length.out = num_nodes, by = t_per_n)
stops <- seq(from = t_per_n, length.out = num_nodes, by = t_per_n)
stops[num_nodes] <- t
sjob <- rslurm::slurm_apply(f = is_suspicious,
params = data.frame(start_idx = starts,
stop_idx = stops),
nodes = num_nodes,
add_objects = .self)
results_list <- rslurm::get_slurm_out(slr_job = sjob,
outtype = "raw",
wait = TRUE)
return(unlist(results_list))
}
)
Now, on my local machine I can run:
library(RCexampleforSE)
set.seed(27599)
b <- BankAccount$new()
b$transactions <- rnorm(n = 500)
b$suspicion_threshold <- 2
b$is_suspicious()
b$is_suspicious_slurm(num_nodes = 3)
and it works as expected:
62 103 155 171 182 188 297 398 493 499
If I run:
b$is_suspicious_slurm(num_nodes = 3)
I get an error, since my personal computer is not connected to a SLURM cluster.
sh: squeue: command not found
Cannot submit; no SLURM workload manager on path
Submission scripts output in directory _rslurm_13ba46e3c70b0
Error in rslurm::get_slurm_out(slr_job = sjob, outtype = "raw", wait = TRUE):
slr_job has not been submitted
If I logon to my university cluster, which uses SLURM, and run the same script, the setup and local methods work just as they did on my personal computer. When I run:
b$is_suspicious_slurm(num_nodes = 3)
it sends jobs to the cluster, as hoped for:
Submitted batch job 6363868
But these jobs error immediately with the following error message in slurm_0.out, slurm_1.out, and slurm_2.out:
Error in attr(, "mayCall") : argument 1 is empty
Execution halted
Thoughts and Attempts
I figure the job probably needs, but doesn't have available, the BankAccount object. So I tried passing it in as add_objects parameter to rslurm::slurm_apply:
sjob <- rslurm::slurm_apply(f = is_suspicious,
params = data.frame(start_idx = starts,
stop_idx = stops),
nodes = num_nodes,
add_objects = .self)
I also tried it in quotes and inside eval(), neither of which worked.
How can I make the object accessible to the worker jobs created with rslurm::slurm_apply?
Version 0.4.0 of rslurm completely solved this problem.
Define is_suspicious_slurm() as:
BankAccount$methods(
is_suspicious_slurm = function(num_nodes) {
usingMethods(is_suspicious)
t <- length(transactions)
t_per_n <- floor(t/num_nodes)
starts <- seq(from = 1, length.out = num_nodes, by = t_per_n)
stops <- seq(from = t_per_n, length.out = num_nodes, by = t_per_n)
stops[num_nodes] <- t
sjob <- rslurm::slurm_apply(f = is_suspicious,
params = data.frame(start_idx = starts,
stop_idx = stops),
nodes = num_nodes)
results_list <- rslurm::get_slurm_out(slr_job = sjob,
outtype = "raw",
wait = TRUE)
return(unlist(results_list))
}
)
The only change is that in the call to rslurm::slurm_apply, the add_objects parameter is not specified. It does not need to be specified because as #Ian pointed out:
"...you don't need to pass self at all when slurm_apply sends the serialized function, which appears to include both ".self" and "transactions" in the enclosing environment."
EDIT: OP's answer is all you need to know.
The add_objects parameter is used for passing a character vector, not the objects themselves. All the objects are then saved in one RData file, assuming they can be found by name. In theory, you should be able to use add_objects = c('.self') within your method definition.
The key here is, "assuming they can be found". I will edit this post once a pending update to the rslurm package (which should make that finding more successful) is released.
Be very careful passing objects to cluster nodes: they do not come back. Not only will any side effects be lost, there's no inter-node communication implemented by rslurm.
Also be careful with which :) Your is_suspicious method will be wrong for arguments that don't start at 1. Try this version:
BankAccount$methods(
is_suspicious = function(i = 1:length(transactions)) {
idx <- which(transactions[i] > suspicion_threshold)
i[idx]
}
)

Getting NULL trying to return object from eqmcc() function in QCA package

I keep getting a NULL returned when running the exact same script provided in the QCA package documentation:
data(d.represent)
Krook<-d.represent
KrookTT <- truthTable(Krook, outcome = "WNP")
KrookSI <- eqmcc(KrookTT, include = "?", direxp = c(1,1,1,1,1), details =TRUE)
KrookSI$PIchart$i.sol$C1P1
That returns a NULL as does the following:
KrookSI$pims$i.sol$C1P1
Any help is much appreciated!
Apparently all I needed to was the following:
KrookSP$pims
KrookSP$PIchart

Resources