Interpreting evaluationScheme from recommenderlab in R

I have created an evaluation scheme using the recommenderlab package with a binaryRatingMatrix. How can I see which users from the original data ended up in the unknown test set?
scheme <- evaluationScheme(data = data1, method = "split", train = 0.9, given = 3)
where data1 is a binaryRatingMatrix. I would like to extract the list of users who are in the unknown set returned by getData(scheme, "unknown").

This prints the first column of the unknown set, with the user IDs as its row names.
getRatingMatrix(getData(scheme, "unknown")[,1])
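If only the IDs are needed, reading the row names of the unknown set directly may be simpler. The following is a minimal sketch, assuming the rating matrix carries user IDs as row names; the toy matrix, the user and item names, and the variable unknown_users are made up for illustration.

```r
library(recommenderlab)

# Toy binary data: 10 users x 6 items, each user has exactly 4 items
# (hypothetical data, chosen only so that given = 3 can be satisfied).
set.seed(42)
m <- matrix(0, nrow = 10, ncol = 6,
            dimnames = list(paste0("user", 1:10), paste0("item", 1:6)))
for (i in seq_len(nrow(m))) m[i, sample(6, 4)] <- 1
data1 <- as(m, "binaryRatingMatrix")

scheme <- evaluationScheme(data = data1, method = "split", train = 0.9, given = 3)

# Row names of the "unknown" part: IDs of the held-out test users.
unknown_users <- rownames(getData(scheme, "unknown"))
print(unknown_users)
```

The same users should also appear in getData(scheme, "known"), since both parts describe the test users and differ only in which items are withheld.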

R mlr3 TaskClassif 'termlabels' must be a character vector of length at least one

I am using mlr3 for a simple classification model. But I encounter errors with several different models which mlr3 gives access to. Here I provide one reprex to illustrate the problem:
library(data.table)
library(mlr3extralearners)
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(mlr3pipelines)
library(mlr3filters)
#Make example data
DT = data.table(target = c(0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),pred = c(0.05767878,0.05761652,0.06508700,0.06531820,0.07050699,0.07098812,0.07150984,0.07845767,0.07891081,0.07873572,0.08035471,0.08039300,0.08040480,0.08040480,0.08472619,0.08489135,0.08517742,0.08612768,0.08728675,0.08790671,0.08913434,0.08911522,0.09036788,0.09147726,0.09154964,0.09236259,0.09299088,0.09499589,0.09748171,0.09756818,0.09756818,0.09861013,0.10193147,0.10211796,0.10277547,0.10379659,0.10393602,0.10397469,0.10364373,0.10368016,0.10362235,0.10387504,0.10385431,0.10387288,0.10423139,0.10483475,0.10570517,0.10573617,0.10569312,0.10572714,0.10597040,0.10573924,0.10551367,0.10573499,0.10602269,0.10765947,0.10721005,0.10703524,0.10824609,0.10933141,0.10936178,0.10957693,0.10874663,0.10875077))
DT[, target := as.factor(target)] #Target Variable as factor is required
task <- TaskClassif$new(id='pizza', backend = DT, target = "target", positive = '1')
#Select an algo and a filter
randF = lrn("classif.randomForest", predict_type = "prob")
filter1 = mlr_pipeops$get("filter", filter = mlr3filters::FilterVariance$new(),param_vals = list(filter.cutoff = 0.05))
#Construct a simple graph
graph = filter1 %>>%
PipeOpLearner$new(lrn("classif.randomForest"), id = "randF")
#graph$plot()
#Construct a learner and train it
learner = GraphLearner$new(graph)
learner$train(task)
This gives the error:
'Error in reformulate(attributes(Terms)$term.labels) :
'termlabels' must be a character vector of length at least one'
I have the impression that the mlr3 task object somehow doesn't interact well with the graph. The error then comes from the randomForest classifier, but to me it seems like the data was not properly handed over to it. But that's just a theory of mine. I may alter the question if it's not clear enough.
Your filter is removing the only feature, and feature filtering is not necessary if there is only a single feature.
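To see why the feature is dropped, it may help to compare the variance of pred with the cutoff. This is a base-R sketch, assuming the variance filter scores each feature by its plain variance and that filter.cutoff acts as a minimum score; only a handful of the pred values from the question are used here.

```r
# A few of the pred values from the question (subset, for illustration only).
pred <- c(0.05767878, 0.07050699, 0.08472619, 0.09748171, 0.10957693)

cutoff <- 0.05               # same value as filter.cutoff in the question
pred_variance <- var(pred)   # score a variance-based filter would assign

# TRUE means the feature falls below the cutoff and would be filtered out,
# leaving the learner with no features at all.
removed <- pred_variance < cutoff
print(c(variance = pred_variance, cutoff = cutoff))
print(removed)
```

With no features left, the formula handed to randomForest has no terms, which matches the 'termlabels' error in the question.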

R read csv with association rules not as data frame, but arules class

I have created a set of association rules in Julia and saved them as a CSV file. Now I would like to use R to visualize them, but when I read the CSV I get a data frame, not an arules class (which is kind of obvious!).
How can I convert the data frame into the rules class (https://www.rdocumentation.org/packages/arules/versions/1.6-6/topics/rules-class) to use the visualizations from the arulesViz library (https://cran.r-project.org/web/packages/arulesViz/vignettes/arulesViz.pdf)?
# reproduce some data
dat <- data.frame(
lhs_ids = c("{}", "{B}"),
rhs_id = c("{A}", "{A}"),
confidence = c(0.25, 0.2),
support = c(0.25, 0.03),
lift = c(1, 0.5)
)
# convert
a_rules <- as(dat, "rules")
Error in as(., "rules") :
no method or default for coercing “data.frame” to “rules”
You should use PMML, the standard way to transfer rules between tools. See: https://rdrr.io/cran/arules/man/pmml.html
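Whichever transfer route is chosen, strings such as "{B}" read from a CSV are plain text and have to be split back into item labels before any rules object can be built from them. A minimal base-R sketch of that parsing step follows; parse_itemset is a hypothetical helper, not part of arules, and it assumes items inside the braces are separated by commas.

```r
# Hypothetical helper: turn "{A,B}" into c("A", "B") and "{}" into character(0).
parse_itemset <- function(x) {
  inner <- gsub("^\\{|\\}$", "", trimws(x))   # strip the surrounding braces
  if (!nzchar(inner)) return(character(0))     # empty set
  trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
}

dat <- data.frame(
  lhs_ids = c("{}", "{B}"),
  rhs_id = c("{A}", "{A}"),
  stringsAsFactors = FALSE
)

# One character vector of item labels per rule side.
lhs_items <- lapply(dat$lhs_ids, parse_itemset)
rhs_items <- lapply(dat$rhs_id, parse_itemset)
str(lhs_items)
str(rhs_items)
```

The resulting lists only hold the item labels; the quality columns (support, confidence, lift) would still have to be carried along separately.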

R nlstools Regression, preview function doesn't take variables

I'm quite new to R but wanted to use the packages "nls" and "nlstools" since they have nice tools for analysis and evaluation.
The code I use is:
conB1_2015 = read.csv("C:\\Path_to_File\\conB1_2015.csv")
conB1_2015 = na.omit(conB1_2015)
tRef <- mean(conB1_2015$Mean_Soil_Temp_V2..C., na.rm=TRUE)
rRef <- conB1_2015$Lin_Flux..mymol.m.2.s.1.[which.min(abs(conB1_2015$Mean_Soil_Temp_V2..C.-tRef))]
rMax <- max(conB1_2015$Lin_Flux..mymol.m.2.s.1., na.rm=TRUE)
half <- rMax/2
half_SM <- conB1_2015$Soil_Moist_V3[which.min(abs(conB1_2015$Lin_Flux..mymol.m.2.s.1.-half))]
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
preview(form, data = conB1_2015, start = c(a = -1.98, b = -0.05), variable = 1)
The problem is that I get this error when running this code:
Error in data.frame(value, row.names = rn, check.names = FALSE) :
row names supplied are of the wrong length
When I change the variables in form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
to form <- as.formula(Lin_Flux..mymol.m.2.s.1.~(rRef<-4.41)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM<-7.19)+Soil_Moist_V3)
the function works fine.
I wanted to automate the script to run over several csv's to test different models on different data. Is it really not possible to pass variables into the preview function or am I missing something? There can't be a problem with headers or the data table since it's working fine in the second example.
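Since the version with literal numbers works, one possible workaround is to substitute the computed values into the formula before it is handed to preview. The sketch below uses base R's bquote for that; the short column names flux, temp and moist are placeholders for the real CSV columns, the numbers are the ones from the question, and it has not been checked against nlstools itself.

```r
# Values that would normally be computed from each CSV file.
rRef <- 4.41
half_SM <- 7.19

# .(name) inside bquote() is replaced by the current value of `name`,
# so the resulting formula contains numbers instead of free variables.
form <- eval(bquote(
  flux ~ .(rRef) * a * exp(b * temp) * moist / .(half_SM) + moist
))

print(form)
print(all.vars(form))   # variables the formula still refers to
```

After the substitution the formula refers only to data columns and the parameters a and b, so nothing has to be looked up outside the data frame and the start vector.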

Using mRMRe in R

I am currently working on a project where I have to do some feature selection for building a predictive model. I was led to a package in R called mRMRe. I am just trying to work the example but cannot get it working. The example can be found here - http://www.inside-r.org/packages/cran/mRMRe/docs/mRMR.ensemble.
Here is my code -
data(cgps)
data <- data.frame(target=cgps.ic50, cgps.ge)
mRMR.ensemble(data, 1, rep.int(1, 30))
When I run this code I get the error -
Error in .local(.Object, ...) : data must be of type mRMRe.Data.
I dug a little further and found that you actually have to convert the data to the mRMRe.Data type. So I made this update -
# Update
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
but I still get the same error. When I look at the class I have -
> class(data)
[1] "mRMRe.Data"
attr(,"package")
[1] "mRMRe"
So the data is the requested type but the code is still not functional.
My question is whether anyone has experience using this package. Any help or comments would be appreciated!
Also want to note that in the example from the link - when I load the data
cgps_ic50 -> cgps.ic50
cgps_ge -> cgps.ge
so the names of the data objects aren't the same as those in the example.
With the code you wrote:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
The function mRMR.ensemble receives the data as its first positional argument, but the first parameter of this function is solution_count.
I assume your intention in running that example is to find 30 relevant, non-redundant features using the classic mRMR feature selection algorithm, so try this:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data = data, target_indices = 1,
feature_count = 30, solution_count = 1)
The target_indices are the positions in the original data.frame of the variables against which relevance is maximized (correlation or another quality measure), so the features selected in the end will be good at explaining the variables indicated in target_indices.
For example, in a classification problem, we would choose the position of the class variable as the value for the target_indices parameter.
The feature_count parameter indicates the number of variables to be chosen.
The solution_count is not a parameter of the classic mRMR. It indicates the number of mRMR algorithms to be ensembled to get a final feature selection, so if set to 1 it performs only one classic mRMR.
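To make the roles of the target index and the feature count concrete, here is a rough base-R sketch of a single greedy mRMR-style pass. It is not what mRMRe computes internally: absolute Pearson correlation is used as a stand-in for the relevance and redundancy measure, and mrmr_sketch and the toy data are invented for this illustration.

```r
# Greedy selection: at each step pick the candidate with the highest
# relevance to the target minus mean redundancy with features already chosen.
mrmr_sketch <- function(df, target_index, feature_count) {
  cors <- abs(cor(df))                                   # pairwise |correlation|
  candidates <- setdiff(seq_len(ncol(df)), target_index)
  selected <- integer(0)
  while (length(selected) < feature_count && length(candidates) > 0) {
    relevance <- cors[candidates, target_index]
    redundancy <- if (length(selected) == 0) {
      rep(0, length(candidates))
    } else {
      rowMeans(cors[candidates, selected, drop = FALSE])
    }
    best <- candidates[which.max(relevance - redundancy)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  names(df)[selected]
}

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.01)   # near-duplicate of x1
x3 <- rnorm(100)
toy <- data.frame(target = x1 + x3 + rnorm(100, sd = 0.1),
                  x1 = x1, x2 = x2, x3 = x3)

picked <- mrmr_sketch(toy, target_index = 1, feature_count = 2)
print(picked)
```

Here target_index plays the role of target_indices and feature_count limits how many features are returned; because x2 is almost a copy of x1, the redundancy penalty keeps both from being chosen together.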

How do I set missing values using pmmlTransformations in R

I have a data frame in R which has some NA values in it. How can I use pmmlTransformations to set a missing value treatment for these fields? I've seen that you can set missingValue treatments when transforming the data (normalization, field mapping, etc.), but I would like to know how to just set the missing values without having to normalize the data.
library(pmml)
library(pmmlTransformations)
df <- data.frame(id=1:5, y=1:5, x=c(2,4,3,NA,8))
dataBox <- WrapData(df)
# update the wrapped data to set x=1 when it is NA
fit <- glm(formula=y~x, data = dataBox$data)
pmml(fit, transforms=dataBox)
Many thanks in advance
Andrew
If you just want to add the missingValueReplacement=1 attribute to all MiningField elements in the PMML document, then append unknownValue = 1 to your pmml::pmml.glm function call:
library(pmml)
df <- data.frame(id=1:5, y=1:5, x=c(2,4,3,NA,8))
# Set missing values to 1 before training a GLM model
df$x[is.na(df$x)] = 1
fit <- glm(formula=y~x, data = df)
# Encode information about the missing value transformation into the PMML document
pmml = pmml.glm(fit, unknownValue = 1)
saveXML(pmml, "glm.pmml")
Sure, the unknownValue parameter appears to be deprecated, but it does exactly what you need without firing up a complex sequence of transformations.
You can use the unknownValue parameter:
pmml.glm(glm, transforms = dataBox, unknownValue = 0)
but this will be applied to all your variables, including your target variable.
I wrote a fix that allows specifying replacement values for each of the variables:
https://github.com/guleatoma/pmml
Using this version of the package you can do this:
pmml.glm(glm, transforms = dataBox, unknownValue = list("x1" = 0, "x2" = 100))
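On the training side, the same per-variable idea can be mirrored in plain R before the model is fitted, so that the data and the replacement values recorded in the PMML stay consistent. A minimal sketch, using only base R; the replacements list is a hypothetical mapping from column name to fill value.

```r
df <- data.frame(id = 1:5, y = 1:5, x = c(2, 4, 3, NA, 8))

# Hypothetical per-column fill values; the target y is deliberately left out.
replacements <- list(x = 1)

for (col in names(replacements)) {
  missing_rows <- is.na(df[[col]])
  df[[col]][missing_rows] <- replacements[[col]]
}

# Fit on the completed data, as in the first answer.
fit <- glm(y ~ x, data = df)
print(df)
```

Keeping the fill values in one named list makes it easy to reuse exactly the same numbers when they are passed on for the PMML export.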
