I'm using RapidMiner, and after a WordList to Data operator I want to create a word cloud with Execute R using the script below. I get the R execution failure: "no applicable method for 'TermDocumentMatrix' applied to an object of class \"c('data.table', 'data.frame')\"".
The data table has a column of words, a column of document occurrences, and a column of total word occurrences.
Can anyone please advise how to resolve this error?
rm_main = function(data)
{
wordcloud::wordcloud(data, scale=c(5,0.5), max.words=100, random.order=FALSE,
rot.per=0.35, use.r.layout=FALSE, colors="Dark2")
}
Passing the whole data frame data as the 1st argument to wordcloud does not work. (It would work if it were an object of class TermDocumentMatrix, which is part of R's tm package, but that's another story; that is what the error message is about.) From within RapidMiner you have to pass the words and their frequencies as the 1st and 2nd arguments respectively.
So you could use something like
rm_main = function(data)
{
windows() # on MS Windows systems
wordcloud::wordcloud(data$word, data$total, min.freq = 1)
Sys.sleep(5)
}
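If you want to sanity-check the word/total pairs outside RapidMiner, the frequencies can be recomputed from the raw text. Here is a minimal Python sketch: the tokenization on non-letters mirrors the Tokenize operator in the process below, and the sample text is the one used in its Create Document operator.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Tokenize on non-letter characters and count occurrences,
    mirroring the word/total columns produced by WordList to Data."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return Counter(tokens)

freqs = word_frequencies("Hello world world!")
# freqs == Counter({'world': 2, 'hello': 1})
```

Each (word, count) pair corresponds to one row of the example set that Execute R receives.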
Here's an example process that creates a word cloud from a small sample document and displays it in a plot window for a few seconds (on Windows):
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document" width="90" x="179" y="187">
<parameter key="text" value="Hello world world!"/>
<parameter key="add label" value="false"/>
<parameter key="label_type" value="nominal"/>
</operator>
<operator activated="true" class="text:process_documents" compatibility="7.3.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="238">
<parameter key="create_word_vector" value="true"/>
<parameter key="vector_creation" value="TF-IDF"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="false"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.3.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:wordlist_to_data" compatibility="7.3.000" expanded="true" height="82" name="WordList to Data" width="90" x="514" y="289"/>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="715" y="289">
<parameter key="script" value="rm_main = function(data)
{
windows()
wordcloud::wordcloud(data$word, data$total, min.freq = 1)
Sys.sleep(3)
}
"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
<connect from_op="WordList to Data" from_port="example set" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I want to process a dataset like this, in RapidMiner:
order_id | items1 | items2 | items3
1 | book | book | pencil
2 | pencil | book | eraser
I want to process those data using FP-Growth and association rules. What is the appropriate dataset format for RapidMiner?
Have you taken a look at the tutorial process for the FP-Growth operator in RapidMiner? Follow the link in the operator's help text and you will find a detailed sample process. Its data are already very similar to your example. [1]
Getting such structured data into RapidMiner is easy. Either use the "Import Data" button, or load and prepare your data with the "Turbo Prep" assistance tool.
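To see what FP-Growth's "item list in a column" input format expects, the wide rows from the question can be collapsed into one pipe-separated entry per order. A minimal Python sketch (the column names order_id/items1..items3 come from the question; the Products output name is an assumption borrowed from the tutorial process below):

```python
# Convert wide transaction rows (order_id, items1..items3) into the
# pipe-separated "item list in a column" format that FP-Growth accepts.
rows = [
    {"order_id": 1, "items1": "book", "items2": "book", "items3": "pencil"},
    {"order_id": 2, "items1": "pencil", "items2": "book", "items3": "eraser"},
]

def to_item_list(row):
    # Collect the item columns in order and join them with the pipe separator,
    # matching the item_separators="|" setting of the FP-Growth operator.
    items = [v for k, v in row.items() if k.startswith("items") and v]
    return {"order_id": row["order_id"], "Products": "|".join(items)}

transactions = [to_item_list(r) for r in rows]
# transactions[0]["Products"] == "book|book|pencil"
```

In RapidMiner itself this transformation corresponds to the Aggregate (concatenation) step of the tutorial process below.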
[1] Just copy and paste the XML into your process design window:
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" breakpoints="after" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Load Transactions" origin="GENERATED_TUTORIAL" width="90" x="112" y="187">
<parameter key="repository_entry" value="//Samples/Templates/Market Basket Analysis/Transactions"/>
</operator>
<operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="82" name="Aggregate" origin="GENERATED_TUTORIAL" width="90" x="112" y="336">
<parameter key="use_default_aggregation" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default_aggregation_function" value="average"/>
<list key="aggregation_attributes">
<parameter key="product 1" value="concatenation"/>
</list>
<parameter key="group_by_attributes" value="Invoice"/>
<parameter key="count_all_combinations" value="false"/>
<parameter key="only_distinct" value="false"/>
<parameter key="ignore_missings" value="true"/>
</operator>
<operator activated="true" class="rename" compatibility="9.1.000" expanded="true" height="82" name="Rename" origin="GENERATED_TUTORIAL" width="90" x="246" y="340">
<parameter key="old_name" value="concat(product 1)"/>
<parameter key="new_name" value="Products"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role" origin="GENERATED_TUTORIAL" width="90" x="380" y="340">
<parameter key="attribute_name" value="Invoice"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" breakpoints="before" class="concurrency:fp_growth" compatibility="9.1.000" expanded="true" height="82" name="FP-Growth" origin="GENERATED_TUTORIAL" width="90" x="648" y="289">
<parameter key="input_format" value="item list in a column"/>
<parameter key="item_separators" value="|"/>
<parameter key="use_quotes" value="false"/>
<parameter key="quotes_character" value="""/>
<parameter key="escape_character" value="\"/>
<parameter key="trim_item_names" value="true"/>
<parameter key="positive_value" value="true"/>
<parameter key="min_requirement" value="support"/>
<parameter key="min_support" value="0.005"/>
<parameter key="min_frequency" value="100"/>
<parameter key="min_items_per_itemset" value="1"/>
<parameter key="max_items_per_itemset" value="0"/>
<parameter key="max_number_of_itemsets" value="1000000"/>
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_number_of_itemsets" value="100"/>
<parameter key="max_number_of_retries" value="15"/>
<parameter key="requirement_decrease_factor" value="0.9"/>
<enumeration key="must_contain_list"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="9.1.000" expanded="true" height="82" name="Create Association Rules" origin="GENERATED_TUTORIAL" width="90" x="648" y="442">
<parameter key="criterion" value="confidence"/>
<parameter key="min_confidence" value="0.1"/>
<parameter key="min_criterion_value" value="0.8"/>
<parameter key="gain_theta" value="2.0"/>
<parameter key="laplace_k" value="1.0"/>
</operator>
<connect from_op="Load Transactions" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="147"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="42"/>
<description align="left" color="yellow" colored="false" height="70" resized="false" width="850" x="20" y="25">MARKET BASKET ANALYSIS<br>Model associations
between products by determining sets of items frequently purchased together and building
association rules to derive recommendations.
</description>
<description align="left" color="blue" colored="true" height="185" resized="true" width="550" x="20" y="105">Step 1:<br/>Load transaction data containing a
transaction id, a product id and a quantifier. The data denotes how many times a certain
product has been purchased as part of a transactions.
</description>
<description align="left" color="purple" colored="true" height="341" resized="true" width="549" x="20" y="300"><br> <br> <br> <br> <br>
<br> <br> <br> Step 2:<br>Edit, transform & load (ETL) -
Aggregate transaction data via concatenation so that the products in a transaction are
in one entry, separated by the pipe symbol.<br>
</description>
<description align="left" color="green" colored="true" height="310" resized="true" width="290" x="580" y="105">Step 3:<br/>Using FP-Growth, determine
frequent item sets. A frequent item sets denotes that the items (products) in the set
have been purchased together frequently, i.e. in a certain ratio of transactions. This
ratio is given by the support of the item set.
</description>
<description align="left" color="green" colored="true" height="215" resized="true" width="286" x="579" y="425"><br> <br> <br> <br> <br>
<br> Step 4:<br/>Create association rules which can be used for product
recommendations depending on the confidences of the rules.<br>
</description>
<description align="left" color="yellow" colored="false" height="35" resized="true" width="849" x="20" y="655">Outputs: association rules, frequent item set<br>
</description>
</process>
</operator>
</process>
I have an NLog database target that looks like this:
<target xsi:type="Database" name="database"
connectionString="Server=.\SQLEXPRESS;Database=ApplicationOne;Trusted_Connection=True;MultipleActiveResultSets=true;User Id=User0101;Password=PW0101"
commandText="INSERT INTO [SchemaOne].[EventLogs](Id, Message, Level, Logger )VALUES(NewID(), #Message, #Level, #Logger)">
<parameter name="#Message" layout="${message}" />
<parameter name="#Level" layout="${level}" />
<parameter name="#Logger" layout="${logger}" />
</target>
Is it possible to change the connectionString to use connectionStringName from my appsettings instead?
My appsettings is called dssettings.json and it contains the connection details here:
"DatabaseConfiguration": {
"DatabaseName": "ApplicationOne",
"ConnectionName": "DefaultConnection",
"ConnectionString": "Server=.\\SQLEXPRESS;Database=ApplicationOne;Trusted_Connection=True;MultipleActiveResultSets=true;User Id=User0101;Password=PW0101"
},
Update: NLog.Extensions.Logging ver. 1.4.0
With NLog.Extensions.Logging ver. 1.4.0 you can now use ${configsetting}.
See also: https://github.com/NLog/NLog/wiki/ConfigSetting-Layout-Renderer
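As a sketch of this newer approach, the target's connection string could then read the value straight from the settings file. The option name (item=) is my recollection of the wiki above and should be verified there:

```xml
<!-- Sketch: reads DatabaseConfiguration.ConnectionString from the app settings.
     Verify the exact ${configsetting} option syntax against the NLog wiki. -->
<target xsi:type="Database" name="database"
        connectionString="${configsetting:item=DatabaseConfiguration.ConnectionString}"
        commandText="INSERT INTO [SchemaOne].[EventLogs](Id, Message, Level, Logger) VALUES(NewID(), #Message, #Level, #Logger)">
  <parameter name="#Message" layout="${message}" />
  <parameter name="#Level" layout="${level}" />
  <parameter name="#Logger" layout="${logger}" />
</target>
```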
Original Answer
With help from the NuGet package NLog.Appsettings.Standard, you can normally do this:
<extensions>
<add assembly="NLog.Appsettings.Standard" />
</extensions>
<targets>
<target xsi:type="Database" name="database"
connectionString="${appsettings:name=DatabaseConfiguration.ConnectionString}"
commandText="INSERT INTO [SchemaOne].[EventLogs](Id, Message, Level, Logger )VALUES(NewID(), #Message, #Level, #Logger)">
<parameter name="#Message" layout="${message}" />
<parameter name="#Level" layout="${level}" />
<parameter name="#Logger" layout="${logger}" />
</target>
</targets>
But because you are using a special dssettings.json (instead of appsettings.json), you probably have to implement your own custom NLog layout renderer:
https://github.com/NLog/NLog/wiki/How-to-write-a-custom-layout-renderer
Maybe you can use the source code of the above NuGet package as inspiration for loading dssettings.json, or create a pull request that adds support for specifying a non-default config filename.
I tried tuning parameters for H2O GBM in R using
https://github.com/h2oai/h2o-3/blob/3.10.0.7/h2o-docs/src/product/tutorials/gbm/gbmTuning.Rmd
I then tried applying the tuned hyper-parameters in RapidMiner on the same data set. In R I got 97% accuracy, whereas in RapidMiner with the same parameters I get only 91%. Both R and RapidMiner use the same version of the H2O package. Why is there this difference in accuracy?
My RapidMiner process is given below:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Mode_of_Labor_Data" width="90" x="45" y="34">
<parameter key="repository_entry" value="../Data/Mode_of_Labor_Data"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="Mode of Delivery"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="split_data" compatibility="9.0.002" expanded="true" height="103" name="Split Data" width="90" x="447" y="238">
<enumeration key="partitions">
<parameter key="ratio" value="0.8"/>
<parameter key="ratio" value="0.2"/>
</enumeration>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1234"/>
</operator>
<operator activated="true" class="h2o:gradient_boosted_trees" compatibility="9.0.000" expanded="true" height="103" name="Gradient Boosted Trees" width="90" x="581" y="136">
<parameter key="number_of_trees" value="10000"/>
<parameter key="reproducible" value="false"/>
<parameter key="maximum_number_of_threads" value="4"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="maximal_depth" value="15"/>
<parameter key="min_rows" value="16.0"/>
<parameter key="min_split_improvement" value="1.0E-4"/>
<parameter key="number_of_bins" value="1024"/>
<parameter key="learning_rate" value="0.05"/>
<parameter key="sample_rate" value="1.0"/>
<parameter key="distribution" value="bernoulli"/>
<parameter key="early_stopping" value="true"/>
<parameter key="stopping_rounds" value="5"/>
<parameter key="stopping_metric" value="AUC"/>
<parameter key="stopping_tolerance" value="1.0E-4"/>
<parameter key="max_runtime_seconds" value="0"/>
<list key="expert_parameters">
<parameter key="nbins_cats" value="2048"/>
<parameter key="learn_rate_annealing" value="0.99"/>
<parameter key="col_sample_rate" value="0.76"/>
<parameter key="col_sample_rate_per_tree" value="0.91"/>
<parameter key="col_sample_rate_change_per_level" value="0.97"/>
</list>
</operator>
<operator activated="true" class="apply_model" compatibility="9.0.002" expanded="true" height="82" name="Apply Model" width="90" x="715" y="340">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="9.0.002" expanded="true" height="82" name="Performance" width="90" x="782" y="34">
<parameter key="main_criterion" value="accuracy"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="false"/>
<parameter key="kappa" value="false"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="false"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_op="Retrieve Mode_of_Labor_Data" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Gradient Boosted Trees" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Gradient Boosted Trees" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
How can I change the order of the levels of a factor in a label attribute? I want to implement this R command:
Label <- relevel(Label, ref = "Yes")
How can I do that?
You could use the Execute R operator. Here's an example that changes the levels of the label in the iris data set. If you view the RapidMiner log you can see the print statements that show the structures.
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.2.003" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="103" name="Execute R" width="90" x="246" y="85">
<parameter key="script" value="# rm_main is a mandatory function,
# the number of arguments has to be the number of input ports (can be none)
rm_main = function(data)
{
print('Hello, world!')
# output can be found in Log View
# your code goes here
data$label = factor(data$label)
data2 = data
data2$label = relevel(data2$label, ref = 'Iris-virginica')
print(str(data))
print(str(data2))
return(list(data, data2))
}
"/>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Execute R" to_port="input 1"/>
<connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
<connect from_op="Execute R" from_port="output 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
I observe that the id and label roles are removed after the R script, but this can be fixed with the Set Role operator.
I set up Selenium Grid 2 and it works well with the following parallelized TestNG test suite:
<!DOCTYPE suite SYSTEM "http://beust.com/testng/testng-1.0.dtd" >
<suite name="Sample Test Suite" parallel="classes" thread-count="2">
<test name="Test in Chrome" preserve-order="true">
<parameter name="browser" value="chrome" />
<classes>
<class name="testCases.SampleCase1" />
<class name="testCases.SampleCase2" />
</classes>
</test>
</suite>
But the parallelism no longer works when I execute the tests via the TestNG Ant task.
It works again after I change the parallel mode in the suite file to "tests", as below:
<!DOCTYPE suite SYSTEM "http://beust.com/testng/testng-1.0.dtd" >
<suite name="Sample Test Suite" parallel="tests" thread-count="2">
<test name="Test1 in Chrome" preserve-order="true">
<parameter name="browser" value="chrome" />
<classes>
<class name="testCases.SampleCase1" />
</classes>
</test>
<test name="Test2 in Chrome" preserve-order="true">
<parameter name="browser" value="chrome" />
<classes>
<class name="testCases.SampleCase2" />
</classes>
</test>
</suite>
So does that mean the TestNG Ant task doesn't support suites parallelized by "classes"?
The issue was solved by replacing testng.jar version 6.2 with version 6.8.
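For context, a minimal Ant build fragment for running such a suite might look like the sketch below (the jar and suite-file paths are assumptions). With testng.jar 6.8 on the taskdef classpath, the parallel="classes" suite also ran in parallel:

```xml
<!-- Sketch of an Ant build fragment (paths are assumptions).
     The <testng> task is defined by the "testngtasks" resource inside testng.jar. -->
<taskdef resource="testngtasks" classpath="lib/testng.jar"/>
<target name="run-tests">
  <testng classpath="lib/testng.jar:build/classes">
    <xmlfileset dir="." includes="testng.xml"/>
  </testng>
</target>
```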