How can we use multiple CSV files in one spec file using Taiko + Gauge?

I am getting an error when I declare different CSV files (one per scenario's test data) at the spec level while using Taiko + Gauge.
Can anyone help with this?
Reference: specs with multiple CSV files, e.g.

table:specs/case_sclm.csv
Verify test method1

table:specs/case_creation_ts_record.csv
Verify test method2

Here both of the above methods have two different CSV files passed as arguments with test data.
The error shown is: "Multiple data table present, ignoring table".
Thanks in advance for any leads or help!
Reference:
https://github.com/getgauge/gauge/issues/1518#issue-513703584

A spec file can have only one table in this form:

table:some/table.csv

because Gauge will use that table for data-driven execution.
In order to pass tables as arguments, you need to use table parameters.
In your example, try something like this:

* Verify test method <table:specs/case_sclm.csv>
* Verify test method <table:specs/case_creation_ts_record.csv>

The same step implementation will receive a Table object with the data from the respective CSV file.
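For illustration, a complete spec file using this approach might look like the following sketch. The spec and scenario headings here are hypothetical placeholders; the step text and CSV paths come from the question.

# Case verification

## Verify SCLM cases
* Verify test method <table:specs/case_sclm.csv>

## Verify TS record creation
* Verify test method <table:specs/case_creation_ts_record.csv>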

Related

How to have a user prompt accept a phyloseq-type file as dataframe?

I am trying to write an R script (to be run from RStudio by my students) that can accept a phyloseq object (several tables connected to each other and stored in one object) so I can run my code on that object. Since it seems I cannot accept a phyloseq file directly, I decided to strip the object into the 3 tables stored inside it, like so:
input <- as.data.frame(readline(prompt = "Enter phyloseq object name "))
input.taxtab <- as.data.frame(tax_table(input))  # tax_table is a phyloseq accessor that extracts the taxonomy table
input.otutab <- as.data.frame(input@otu_table)   # OTU table slot
input.samdat <- as.data.frame(input@sam_data)    # sample data slot
However, the obvious issues I see with my code are that the input cannot hold all the information, and that readline() only accepts characters, i.e. it takes the object name literally rather than the object itself.
I have tried to get the data frames separately (by prompting for input 3 times), but that doesn't coerce the file either. Here is a snippet of that:
input.taxtab <- as.data.frame(tax_table(readline(prompt = "Enter phyloseq object name ")))
I hope someone can help me evolve my code to accept an actual file instead of its name.
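A minimal sketch of one way to do this, assuming the phyloseq package is loaded and the object already exists in the global environment under the typed name: get() resolves the name string to the object itself, after which the usual phyloseq accessors work. The name and ps variables are just illustrative.

library(phyloseq)

name <- readline(prompt = "Enter phyloseq object name ")
ps <- get(name)  # resolve the typed name to the object (assumes it exists in this session)

input.taxtab <- as.data.frame(tax_table(ps))   # taxonomy table
input.otutab <- as.data.frame(otu_table(ps))   # OTU abundance table
input.samdat <- as.data.frame(sample_data(ps)) # sample metadata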

SASxport to R: Errors while reading XPT SAS file

Does anyone know how to ignore/skip errors while reading a SAS export format (XPT) file into R?
require(SASxport)
asc = SASxport::read.xport("..\\LLCP2018.XPT_", keep = cols)
Checking if the specified file has the appropriate header
Extracting data file information...
Reading the data file...
Error in `[.data.frame`(ds, whichds) : undefined columns selected
I have plenty of columns and don't want to check one by one whether each really exists.
I would like to ignore missing columns, but the function has no option for that.
EDIT
Found an easy solution:
lu = SASxport::lookup.xport(xfile)
Now I can choose from lu$names and intersect with cols. Still, not every variable can be read, but it's better.
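A minimal sketch of that intersection, mirroring the lu$names field mentioned above and assuming cols holds the wanted column names and xfile the file path:

lu <- SASxport::lookup.xport(xfile)             # read only the file's metadata
keep <- intersect(cols, lu$names)               # drop requested columns that don't exist in the file
asc <- SASxport::read.xport(xfile, keep = keep) # read just the surviving columns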
But when I choose a few (verified) columns, I get another error I am unable to skip:

Error in if (any(tooLong)) { : missing value where TRUE/FALSE needed

Why does this stop the reading process and return NULL?
EDIT 2
Found a workaround: the same-named function from a different package:
asc <- foreign::read.xport(xfile)
It works; unfortunately, it loads the whole data set. If there is some size limitation, there's probably nothing I can do about it.
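If memory allows loading everything, the wanted columns can simply be subset afterwards; a sketch, again assuming cols and xfile from above (and a single dataset in the file):

asc <- foreign::read.xport(xfile)                        # reads the entire XPT file
asc <- asc[, intersect(cols, names(asc)), drop = FALSE]  # then keep only the wanted columns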

Can AZ ML workbench reference multiple data sources from Data Prep Transform Dataflow expression

Using Azure ML Workbench for a class project (the required tool), I coded the desired logic below in an exploration notebook, but I cannot find a way to include it in a Data Prep Transform Dataflow.
all_columns = df.columns
sum_columns = [col_name for col_name in all_columns if col_name not in ['NPI', 'Gender', 'State', 'Credentials', 'Specialty']]
sum_op_columns = list(set(sum_columns) & set(df_op['Drug Name'].values))
The logic uses the column names from one data source, df_op (opioid drugs), to choose which subset of columns to include from another data source, df (all drugs). When adding a Python script/expression Transform Dataflow, I only see the ability to reference the single df. Are there alternatives?
I may have a way for you to access both data frames.
In Workbench, once you have the data sources that you need loaded, right click on one and select "Generate Data Access Code File".
Once there, you're automatically given code to access that specific file. However, you can use the same code to access the other files.
For example, with two data sources loaded, I can use the code below to access both as pandas data frames and manipulate them as I need.
df_salary = datasource.load_datasource('SalaryData.dsource')
df_startup = datasource.load_datasource('50-Startups.dsource')
I believe from there you can save your updated data frame to a CSV and then use that in the training script.
Hope that helps, or at least points you toward another solution.

R: give data frames new names based on contents of their current name

I'm writing a script to plot data from multiple files. Each file is named using the same format, where the strings between "." give some information about what is in the file. For example: SITE.TT.AF.000.52.000.001.002.003.WDSD_30.csv.
These data will come from multiple sites, so SITE, or WDSD_30, or any other string may differ depending on where the data are from, though its position in the file name will always indicate a specific feature such as location or measurement.
So far I have each file read into R and saved as a data frame named the same as the file. I'd like to get something like the following to work: if there is a data frame in the global environment whose name contains WDSD_30, then plot a specific column from that data frame. The column will always have the same name, so I could write plot(WDSD_30$meas), and no matter which site's files were loaded in the global environment, the script would find the WDSD_30 file and plot the meas variable. My goal is to be able to point this script at any folder of files from a particular site and, whatever the site, have it read in the data and find the files containing the variables I'm interested in plotting.
A colleague suggested I try using strsplit() to break up the file name and extract the element I want, then use that to rename the data frame containing that element. I'm stuck on how exactly to do this, and on whether it is the best approach.
Here's what I have so far:
site.files <- basename(list.files(pattern = ".csv", recursive = TRUE, full.names = FALSE))
sfsplit <- lapply(site.files, function(x) strsplit(x, ".", fixed = TRUE)[[1]])
for (i in seq_along(site.files)) assign(site.files[i], read.csv(site.files[i]))
for (i in seq_along(site.files)) {
  if (grepl("PARQL", sfsplit[[i]][10])) {
    assign("PARQL", get(site.files[i]))  # data frame getting named PARQL
  } else if (grepl("IRBT", sfsplit[[i]][10])) {
    assign("IRBT", get(site.files[i]))   # data frame getting named IRBT
  }
}
...and so on for each data frame I'd like to eventually plot from. Is this a good approach, or is there a better way? I'm also unclear on how to refer to the objects I made up for this example (the data frames getting named PARQL, IRBT, and so on) without using the entire file name as it was read into R. Is there something like data.frame[1] to generically refer to the first data frame in the global environment?
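One alternative sketch, assuming (as in the question's code) that the identifying string is always the 10th dot-separated field of the file name: keep the data frames in a named list keyed by that field instead of assigning free-standing objects. The dat and keys names are illustrative.

site.files <- list.files(pattern = "\\.csv$", recursive = TRUE)
# Extract the 10th dot-separated token of each base name as the key
# (assumes the naming scheme shown in the question)
keys <- vapply(basename(site.files),
               function(x) strsplit(x, ".", fixed = TRUE)[[1]][10],
               character(1))
dat <- setNames(lapply(site.files, read.csv), keys)
# Whatever the site, the wind-speed file is reachable by its code:
plot(dat[["WDSD_30"]]$meas)

This avoids assign()/get() entirely, and the list gives a generic handle to every loaded data frame (dat[[1]], names(dat)), which answers the data.frame[1] part of the question.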

R scripting in SPSS Modeler 16: change default "rowCount=1000" for modelerData

When applying an R Transform Field operation node in SPSS Modeler, for every script the system automatically adds the following code at the top of my own script to interface with the R add-on:
while(ibmspsscfdata.HasMoreData()){
modelerDataModel <- ibmspsscfdatamodel.GetDataModel()
modelerData <- ibmspsscfdata.GetData(rowCount=1000,missing=NA,rDate="None",logicalFields=FALSE)
Please note "rowCount=1000". When I process a table with more than 1,000 rows (which is very normal), errors occur.
I'm looking for a way to change this default setting, or for any other way to process tables with more than 1,000 rows!
I've tried adding this at the beginning of my code, and it works just fine:
while (ibmspsscfdata.HasMoreData()) {
  modelerData <- rbind(modelerData,
                       ibmspsscfdata.GetData(rowCount = 1000, missing = NA, rDate = "None", logicalFields = FALSE))
}
Note that you will consume a lot of memory with "big data", and the parameters of the .GetData() function should be set according to the "Read Data Options" in the node settings.
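Putting the auto-generated preamble and the fix together, the effective pattern is roughly the following sketch, using only the calls shown above (the first batch is fetched by the generated code; the loop appends the rest):

modelerDataModel <- ibmspsscfdatamodel.GetDataModel()
# First batch of up to 1,000 rows, as fetched by the generated preamble
modelerData <- ibmspsscfdata.GetData(rowCount = 1000, missing = NA,
                                     rDate = "None", logicalFields = FALSE)
# Append the remaining batches until the node reports no more data
while (ibmspsscfdata.HasMoreData()) {
  modelerData <- rbind(modelerData,
                       ibmspsscfdata.GetData(rowCount = 1000, missing = NA,
                                             rDate = "None", logicalFields = FALSE))
}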
