When is the ExtractCMLObject test step used in CATT, and what is its purpose?

A test script contains the following steps:
DSConnect
SelectTradeByID
ExtractCMLObject
According to its description, ExtractCMLObject creates a new data document from the trade's data. However, the data document was not created.


How can we use multiple csv files in one spec file using Taiko+Gauge

I am getting an error when I declare different csv files, according to test data, for different scenarios at the spec level while using Taiko + Gauge.
Can anyone help with this?
Reference:
specs with multiple csv files
e.g.:
table:specs/case_sclm.csv
Verify test method1
table:specs/case_creation_ts_record.csv
Verify test method2
Here, both of the above methods have different csv files, containing test data, passed as arguments.
The error shown is: Multiple data table present, ignoring table
Thanks in advance for any leads or help!
Reference :
https://github.com/getgauge/gauge/issues/1518#issue-513703584
A spec file can have only one table in this form:
table:some/table.csv
because Gauge will use this table for data-driven execution.
In order to pass tables as arguments, you need to use table parameters.
In your example, try something like this:
* Verify test method <table:specs/case_sclm.csv>
* Verify test method <table:specs/case_creation_ts_record.csv>
The same step implementation will receive a Table object with data from the respective csv files.

Can AZ ML workbench reference multiple data sources from Data Prep Transform Dataflow expression

Using AZ ML Workbench for a class project (it's a required tool), I coded the desired logic below in an exploration notebook, but I cannot find a way to include it in a Data Prep Transform Dataflow.
all_columns = df.columns
sum_columns = [col_name for col_name in all_columns if col_name not in ['NPI', 'Gender', 'State', 'Credentials', 'Specialty']]
sum_op_columns = list(set(sum_columns) & set(df_op['Drug Name'].values))
The logic uses the column names from one data source, df_op (opioid drugs), to choose which subset of columns to include from another data source, df (all drugs). When adding a Python script/expression Transform Data Flow, I only see the ability to reference the single df. Alternatives?
I may have a way for you to access both data frames.
In Workbench, once you have the data sources that you need loaded, right click on one and select "Generate Data Access Code File".
Once there, you're automatically given code to access that specific file. However, you can use the same pattern to access the other files.
For example, with two data sources loaded, I can use the code below to access them both as pandas data frames and manipulate them as needed.
df_salary = datasource.load_datasource('SalaryData.dsource')
df_startup = datasource.load_datasource('50-Startups.dsource')
I believe from there you can save your updated data frame to a CSV and then use that in the train script.
Hope that helps or at least points you to another solution.
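Once both frames are loaded that way, the question's own column-selection logic can run across them. A minimal pandas sketch, with made-up sample data standing in for the two .dsource files:

```python
import pandas as pd

# Made-up stand-ins for the two Workbench data sources.
df = pd.DataFrame({
    'NPI': [1, 2],
    'Gender': ['F', 'M'],
    'State': ['NY', 'CA'],
    'Credentials': ['MD', 'DO'],
    'Specialty': ['GP', 'GP'],
    'DrugA': [10, 20],
    'DrugB': [1, 2],
    'DrugC': [5, 6],
})
df_op = pd.DataFrame({'Drug Name': ['DrugA', 'DrugC']})

# Columns that are not identifiers ...
id_columns = ['NPI', 'Gender', 'State', 'Credentials', 'Specialty']
sum_columns = [c for c in df.columns if c not in id_columns]
# ... restricted to drugs that appear in the opioid list.
sum_op_columns = sorted(set(sum_columns) & set(df_op['Drug Name'].values))
result = df[id_columns + sum_op_columns]
result.to_csv('opioid_subset.csv', index=False)  # feed this to the train script
```

The filename opioid_subset.csv and the sample columns are illustrative, not from the original project.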

PowerBI: How to save result of R script?

Is it possible to implement the following scenario in Power BI Desktop?
Load data from Excel file to several tables
Make calculation with R script from several data sources
Store results of calculation to new table in Power BI (.pbix)
The idea is to use Power BI Desktop for solving a "transportation problem" with linear programming in R. Before the solver runs, we need to transform data from several sources. I'm new to Power BI. I see that it is possible to apply R scripts for loading and transforming data, and for visualizations, but I need to save the results of the calculation for subsequent visualization by the regular means of Power BI. Is that possible?
As I mentioned in my comment, this post would have solved most of your challenges. That approach replaces one of the tables with a new one after the R script, but you're specifically asking to produce a new table, presumably leaving the input tables untouched. I've recently written a post where you can do this using Python in the Power Query Editor. The only difference in your case would be the R script itself.
Here's how I would do it with an R script:
Data samples:
Table1
Date,Value1
2108-10-12,1
2108-10-13,2
2108-10-14,3
2108-10-15,4
2108-10-16,5
Table2
Date,Value2
2108-10-12,10
2108-10-13,11
2108-10-14,12
2108-10-15,13
2108-10-16,14
Power Query Editor:
With these tables loaded either from Excel or CSV files, you've got this setup in the Power Query Editor:
Now you can follow these steps to get a new table using an R script:
1. Change the data type of the Date column to Text.
2. Click Enter Data and click OK to get an empty table, named Table3 by default.
3. Select the Transform tab and click Run R Script to open the Run R Script editor.
4. Leave it empty and click OK.
5. Remove = R.Execute("# 'dataset' holds the input data for this script",[dataset=#"Changed Type"]) from the Formula Bar and insert this: = R.Execute("# R Script:",[df1=Table1, df2=Table2]).
6. If you're prompted to do so, click Edit Permission and Run.
7. Click the gear symbol next to Run R Script under APPLIED STEPS and insert the following snippet:
R script:
# Left join the two input tables on Date
df3 <- merge(x = df1, y = df2, by = "Date", all.x = TRUE)
# Sum from the joined frame so the rows stay aligned
df3$Value3 <- df3$Value1 + df3$Value2
This snippet produces a new data frame, df3, by joining df1 and df2, and adds a new column Value3. This is a very simple setup, but now you can do pretty much anything by just replacing the join and calculation methods.
8. Click Home > Close & Apply to get back to Power BI Desktop. (Consider changing the data type of the Date column in Table3 from Text to Date before you do that, depending on how you'd like your tables, charts and slicers to behave.)
9. Insert a simple table visual to make sure everything went smoothly.
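For reference, the join-and-sum that the R snippet performs can be sketched the same way in pandas (the linked post uses Run Python Script in the Power Query Editor in exactly this fashion; df1 and df2 correspond to Table1 and Table2):

```python
import pandas as pd

# Sample data matching the first rows of Table1 and Table2 above.
df1 = pd.DataFrame({'Date': ['2108-10-12', '2108-10-13', '2108-10-14'],
                    'Value1': [1, 2, 3]})
df2 = pd.DataFrame({'Date': ['2108-10-12', '2108-10-13', '2108-10-14'],
                    'Value2': [10, 11, 12]})

# Left join on Date, then add the calculated column from the joined frame.
df3 = df1.merge(df2, on='Date', how='left')
df3['Value3'] = df3['Value1'] + df3['Value2']
```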
I hope this was exactly what you were looking for. Let me know if not and I'll take another look at it.

How to get ngrams with word2phrase function in wordVectors package in R?

I want to use the word2phrase() function in the wordVectors package to generate some ngrams for subsequent training using train_word2vec().
library(wordVectors)
word2phrase(train_file="txt.csv", output_file="ngrams.txt", min_count=10, threshold=50, force=TRUE)
The first time I ran it, I got the following output message:
Vocab size (unigrams + bigrams): 20868
Words in train file: 193569
The second time I ran it, I got the following output message:
Vocab size (unigrams + bigrams): 20868
Words in train file: 258092
So every time I run it, "Words in train file" keeps increasing while "Vocab size" stays the same. But when I check the output file ngrams.txt, nothing really changes: I only have 1-grams and 2-grams in the file. How can I get higher-order n-grams stored in ngrams.txt?
To compute n-grams, you need to run word2phrase n-1 times in succession.
After the first run, you have a vocabulary of 1-grams and 2-grams. The second run, working on the output of the first, can create 2-grams from tokens in that vocabulary, giving you 3-grams (and even 4-grams, if it joins two bigrams).
Thankfully, this is already implemented in the function prep_word2vec.
You can simply run:
library("wordVectors")
max_n_grams <- 4
prep_word2vec("txt.csv", "ngrams.txt", bundle_ngrams = max_n_grams,
              min_count = 10, threshold = 50, force = TRUE)
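Under the hood, the idea is just repeated bigram merging. A toy, pure-Python illustration of why one pass yields only bigrams while a second pass over its output yields longer n-grams (the scoring below is a deliberate simplification, not word2phrase's actual formula):

```python
from collections import Counter

def merge_phrases(tokens, min_count=2, threshold=0.5):
    """One word2phrase-style pass: join frequent adjacent pairs with '_'."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    word_counts = Counter(tokens)
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            a, b = tokens[i], tokens[i + 1]
            score = pair_counts[(a, b)] / (word_counts[a] * word_counts[b])
            if pair_counts[(a, b)] >= min_count and score > threshold:
                out.append(a + "_" + b)   # merge the pair into one token
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out

tokens = "new york is big new york is old".split()
# First pass: can only form bigrams such as 'new_york'.
once = merge_phrases(tokens, min_count=2, threshold=0.1)
# Second pass over the output: a bigram can now pair with its neighbour,
# producing trigrams like 'new_york_is'.
twice = merge_phrases(once, min_count=1, threshold=0.0)
```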

R scripting in SPSS Modeler 16: change default "rowCount=1000" for modelerData

When applying an R Transform node in SPSS Modeler 16, for every script the system automatically adds the following code on top of my own script to interface with the R add-on:
while(ibmspsscfdata.HasMoreData()){
modelerDataModel <- ibmspsscfdatamodel.GetDataModel()
modelerData <- ibmspsscfdata.GetData(rowCount=1000,missing=NA,rDate="None",logicalFields=FALSE)
Please note rowCount=1000. When I process a table with more than 1,000 rows (which is very common), errors occur.
I'm looking for a way to change the default setting, or any other way to process tables with more than 1,000 rows!
I've tried adding this at the beginning of my code, and it works just fine:
while(ibmspsscfdata.HasMoreData())
{
modelerData <-rbind(modelerData,ibmspsscfdata.GetData(rowCount=1000,missing=NA,rDate="None",logicalFields=FALSE))
}
Note that you will consume a lot of memory with "big data", and the parameters of the .GetData() function should be set according to the "Read Data Options" in the node settings.
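For comparison, the same accumulate-in-a-loop pattern in Python/pandas: read a source 1,000 rows at a time and concatenate the chunks, rather than keeping only the first one (io.StringIO stands in for a real data source here):

```python
import io
import pandas as pd

# A fake 2,500-row CSV source with a single column 'x'.
csv = io.StringIO("x\n" + "\n".join(str(i) for i in range(2500)))

# Read in 1,000-row chunks and accumulate, mirroring the rbind loop above.
chunks = [chunk for chunk in pd.read_csv(csv, chunksize=1000)]
df = pd.concat(chunks, ignore_index=True)
```

As with the R version, the whole table ends up in memory at once, so this only moves the limit, it doesn't remove it.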
