I have created a set of association rules in Julia and saved them as a CSV file. Now I would like to use R to visualize them, but when I read the CSV I get a data frame, not an arules rules object (which is kind of obvious!).
How can I convert the data frame into a rules-class object (https://www.rdocumentation.org/packages/arules/versions/1.6-6/topics/rules-class) so that I can use the visualizations from the arulesViz library (https://cran.r-project.org/web/packages/arulesViz/vignettes/arulesViz.pdf)?
# reproduce some data
dat <- data.frame(
lhs_ids = c("{}", "{B}"),
rhs_id = c("{A}", "{A}"),
confidence = c(0.25, 0.2),
support = c(0.25, 0.03),
lift = c(1, 0.5)
)
# convert
a_rules <- as(dat, "rules")
Error in as(., "rules") :
no method or default for coercing “data.frame” to “rules”
You should use PMML, the standard way to transfer rules between tools. See: https://rdrr.io/cran/arules/man/pmml.html
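If a PMML export is not available on the Julia side, another route is to assemble the rules object by hand. The following is a minimal sketch, not a definitive method. It assumes the arules package is installed, that the item strings follow the `{A,B}` pattern from the question, and that `new("rules", ...)` accepts `lhs`, `rhs` and `quality` as described on the rules-class help page. `parse_items` and `to_item_matrix` are hypothetical helpers written for this example.

```r
library(arules)

dat <- data.frame(
  lhs_ids = c("{}", "{B}"),
  rhs_id = c("{A}", "{A}"),
  confidence = c(0.25, 0.2),
  support = c(0.25, 0.03),
  lift = c(1, 0.5)
)

# Hypothetical helper: "{A,B}" -> c("A", "B"), "{}" -> character(0)
parse_items <- function(x) {
  inner <- gsub("^\\{|\\}$", "", x)
  lapply(strsplit(inner, ",", fixed = TRUE), function(items) {
    items <- trimws(items)
    items[nzchar(items)]
  })
}

# Hypothetical helper: list of item vectors -> itemMatrix over shared labels
to_item_matrix <- function(item_list, labels) {
  m <- matrix(0L, nrow = length(item_list), ncol = length(labels),
              dimnames = list(NULL, labels))
  for (i in seq_along(item_list)) {
    m[i, labels %in% item_list[[i]]] <- 1L
  }
  as(m, "itemMatrix")
}

lhs_items <- parse_items(dat$lhs_ids)
rhs_items <- parse_items(dat$rhs_id)

# Both sides must be encoded against the same set of item labels
labels <- sort(unique(unlist(c(lhs_items, rhs_items))))

a_rules <- new(
  "rules",
  lhs = to_item_matrix(lhs_items, labels),
  rhs = to_item_matrix(rhs_items, labels),
  quality = dat[, c("support", "confidence", "lift")]
)

inspect(a_rules)
```

The resulting object can then be passed to the arulesViz plotting functions in the same way as rules returned by apriori.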
I'm quite new to R but wanted to use "nls" and the "nlstools" package, since they have nice tools for analysis and evaluation.
The code I use is:
conB1_2015 = read.csv("C:\\Path_to_File\\conB1_2015.csv")
conB1_2015 = na.omit(conB1_2015)
tRef <- mean(conB1_2015$Mean_Soil_Temp_V2..C., na.rm=TRUE)
rRef <- conB1_2015$Lin_Flux..mymol.m.2.s.1.[which.min(abs(conB1_2015$Mean_Soil_Temp_V2..C.-tRef))]
rMax <- max(conB1_2015$Lin_Flux..mymol.m.2.s.1., na.rm=TRUE)
half <- rMax/2
half_SM <- conB1_2015$Soil_Moist_V3[which.min(abs(conB1_2015$Lin_Flux..mymol.m.2.s.1.-half))]
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
preview(form, data = conB1_2015, start = c(a = -1.98, b = -0.05), variable = 1)
The problem is that I get this error when running the code:
Error in data.frame(value, row.names = rn, check.names = FALSE) :
row names supplied are of the wrong length
When I change the variables in
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
to
form <- as.formula(Lin_Flux..mymol.m.2.s.1.~(rRef<-4.41)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM<-7.19)+Soil_Moist_V3)
the function works fine.
I wanted to automate the script to run over several csv's to test different models on different data. Is it really not possible to pass variables into the preview function or am I missing something? There can't be a problem with headers or the data table since it's working fine in the second example.
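One way to keep the script automated while handing preview the same formula as the hard-coded version is to splice the computed constants into the formula before passing it on. The sketch below uses base R's `bquote` for that. It assumes the error comes from `rRef` and `half_SM` not being resolved inside preview, which the question does not confirm, and it uses made-up column names (`flux`, `temp`, `moist`) and values in place of the real CSV.

```r
# Toy stand-in for the CSV; column names and values are illustrative only
d <- data.frame(
  flux  = c(1.2, 2.5, 3.1, 4.4, 5.0),
  temp  = c(5, 8, 11, 14, 17),
  moist = c(6.0, 6.5, 7.2, 7.9, 8.3)
)

# Same reference values as in the question, computed from the data
tRef    <- mean(d$temp)
rRef    <- d$flux[which.min(abs(d$temp - tRef))]
half_SM <- d$moist[which.min(abs(d$flux - max(d$flux) / 2))]

# .(x) is replaced by the current value of x, so the resulting formula
# contains numeric constants instead of references to rRef and half_SM
form <- eval(bquote(
  flux ~ .(rRef) * a * exp(b * temp) * moist / .(half_SM) + moist
))

print(form)
all.vars(form)  # only the data columns and the parameters a and b remain
```

Because the formula no longer refers to any workspace variables, it should behave like the hand-edited version when passed on, for example as `preview(form, data = d, start = c(a = -1.98, b = -0.05), variable = 1)`.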
I want to export an R model in pmml format and use it elsewhere. The other software requires some variables as integers but all numeric variables are exported as double instead, even when they are explicitly integer in my dataset.
I tried to work around this by editing the file manually (or with a regex) and deleting every decimal. The software accepts the new format, but the prediction is not what I expect (because I just deleted decimals), so I want to solve this directly inside R.
How can I force my variables to be a certain dataType (particularly "integer")?
This is a code example that exports a .pmml:
# Required packages -------------------------------------------------------
library(tidyverse)
library(r2pmml)
library(randomForest)
library(nnet)
# Dataset creation --------------------------------------------------------
set.seed(1)
data = data.frame(
var1 = round(runif(10) * 100),
var2 = round(runif(10) * 100),
y = round(runif(10) * 100)
)
data =
data %>%
mutate(var1 = as.integer(var1),
var2 = as.integer(var2))
# Structure check ---------------------------------------------------------
str(data)
# Neural Network and Random Forest models ---------------------------------
nn =
nnet(
y ~ .,
data = data,
size = 2,
linout = TRUE
)
rf =
randomForest(y ~ .,
data = data)
# pmml export -------------------------------------------------------------
r2pmml(rf,
file = "rf.pmml",
dataset = data,
verbose = TRUE)
r2pmml(nn,
file = "nn.pmml",
dataset = data,
verbose = TRUE)
I expect the PMML to declare var1 and var2 as integer, but they end up as double in this section of the output:
<DataDictionary>
<DataField name="y" optype="continuous" dataType="double"/>
<DataField name="var1" optype="continuous" dataType="double"/>
<DataField name="var2" optype="continuous" dataType="double"/>
and I got decimal numbers in
<NeuralLayer activationFunction="logistic">
<Neuron id="hidden/1" bias="-0.4112317232771385">
<Con from="input/1" weight="-6.591508925328581"/>
<Con from="input/2" weight="-31.805468580606753"/>
</Neuron>
but I'm not sure if that should be integer or double.
With the R2PMML package, and its underlying JPMML-R library being open source, you can always take a look into the source code (of the version that you're using) to see how things are implemented. In case of the nnet model type, you could take a look into the org.jpmml.rexp.NNetConverter class.
Essentially, there are two options. First, the R model object (nnet objects saved into an RDS file) may not contain any feature type information at all. Second, this information might be there, but the converter is not using it yet. In that case it falls back to the default data type of the nnet algorithm (all numeric computation is done using the double data type, so it seems like a good choice for storing in the PMML document).
Where exactly is it recorded in your R model object(s) that features var1 and var2 are integers (instead of doubles)? If you think you've found the answer, consider opening a feature request with the JPMML-R project.
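A quick way to start investigating that question is to look at the metadata a formula-based fit keeps about its predictors. The sketch below uses `lm` from base R purely as a stand-in (an assumption made for this example; nnet and randomForest objects may store more or less than this). It shows that the `dataClasses` attribute of the model terms reports integer columns simply as numeric.

```r
set.seed(1)
data <- data.frame(
  var1 = as.integer(round(runif(10) * 100)),
  var2 = as.integer(round(runif(10) * 100)),
  y = round(runif(10) * 100)
)

# The data frame itself knows which columns are integers
column_types <- sapply(data, typeof)
print(column_types)

# Stand-in model; lm is used only because it ships with base R
fit <- lm(y ~ ., data = data)

# The terms metadata does not distinguish integer from double
data_classes <- attr(terms(fit), "dataClasses")
print(data_classes)
```

Running `str()` on the actual nnet or randomForest object is the equivalent check for the models in the question.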
I am currently working on an item-item based recommendation system in R, using the arules package. I have built my basic models, but I want to modify them according to the following criteria:
The apriori algorithm returns rules with only a single item on the RHS. I want rules with multiple items on the RHS. For example:
lhs rhs
{GH DAILY MOONG DAL PREMIUM 1kg,
MDH POW SPICE DEGHI CHILLI 100g,PREM 1kg} => {DAILY OTH PULSE CHANA DAL...
Rice}
My recommendation system is entirely item-item based. Is there any other algorithm or package in R that would give me better business output?
How are the confidence and support values calculated? In my case I am using default values.
My code is given below:
#Create Sparse Matrix
dataset = read.transactions('/Users/Nikita/Downloads/Reco_System/market_basket_before_model.csv', sep = ',', rm.duplicates = TRUE)
summary(dataset)
itemFrequencyPlot(dataset, topN = 20, type = 'absolute')
#1st cut
# Training Apriori on the dataset
rules = apriori(data = dataset, parameter = list(support = 0.001, confidence = 0.8))
# Visualising the results
inspect(sort(rules, by = 'lift')[1:30])
Thanks in advance.
Most implementations of association rule mining algorithms restrict the RHS of the rules to a single item to avoid further combinatorial explosion.
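On the question of how support and confidence are calculated: for a rule X => Y, support is the fraction of transactions that contain every item of X and Y, confidence is that support divided by the support of X alone, and lift is the confidence divided by the support of Y. A minimal base R sketch on made-up transactions (the item names are illustrative and not taken from the asker's data):

```r
transactions <- list(
  c("dal", "rice", "chilli"),
  c("dal", "rice"),
  c("dal", "chilli"),
  c("rice"),
  c("dal", "rice", "oil")
)

# Fraction of transactions that contain every item in `items`
support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

lhs <- c("dal")
rhs <- c("rice")

supp_rule <- support(c(lhs, rhs), transactions)       # 3 of 5 transactions
conf_rule <- supp_rule / support(lhs, transactions)   # 0.6 / 0.8
lift_rule <- conf_rule / support(rhs, transactions)   # 0.75 / 0.8

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

The `support = 0.001` and `confidence = 0.8` arguments in the apriori call are minimum thresholds on these two quantities.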
I have created an evaluation scheme using the recommenderlab package with a binaryRatingMatrix. How can I see which users from the actual data are in the unknown test set?
scheme <- evaluationScheme(data = data1, method = "split", train = 0.9, given = 3)
where data1 is a binaryRatingMatrix. I would like to extract the list of users who are in the unknown set, getData(scheme, "unknown").
This will print out the first column of the unknown set; its row names are the user IDs.
getRatingMatrix(getData(scheme, "unknown")[,1])
I have a data frame in R which has some NA values in it. How can I use pmmlTransformations to set a missing value treatment for these fields? I've seen that you can set missingValue treatments when transforming the data (normalization, field mapping, etc.), but I would like to know how to set just the missing value replacement without having to normalize the data.
library(pmml)
library(pmmlTransformations)
df <- data.frame(id=1:5, y=1:5, x=c(2,4,3,NA,8))
dataBox <- WrapData(df)
# update the wrapped data to set x=1 when it is NA
fit <- glm(formula=y~x, data = dataBox$data)
pmml(fit, transforms=dataBox)
Many thanks in advance
Andrew
If you just want to add the missingValueReplacement=1 attribute to all MiningField elements in the PMML document, then append unknownValue = 1 to your pmml::pmml.glm function call:
library(pmml)
df <- data.frame(id=1:5, y=1:5, x=c(2,4,3,NA,8))
# Set missing values to 1 before training a GLM model
df$x[is.na(df$x)] = 1
fit <- glm(formula=y~x, data = df)
# Encode information about the missing value transformation into the PMML document
pmml = pmml.glm(fit, unknownValue = 1)
saveXML(pmml, "glm.pmml")
Sure, the unknownValue parameter appears to be deprecated, but it does exactly what you need without firing up a complex sequence of transformations.
You can use the unknownValue parameter:
pmml.glm(glm, transforms = dataBox, unknownValue = 0)
but this will be applied to all your variables, including your target variable.
I wrote a fix that allows specifying replacement values for each of the variables:
https://github.com/guleatoma/pmml
Using this version of the package you can do this:
pmml.glm(glm, transforms = dataBox, unknownValue = list("x1" = 0, "x2" = 100))