How do I add displayName as an Attribute in pmml file using RStudio? - rstudio-server

Error: The PMML code generated in R was imported as a predictive model into our Electronic Medical Record application. On import we received an error similar to the one below (I substituted Sepal.Length, Species, etc. for the real field names):
PMML>DataDictionary>DataField [Sepal.Length]: Recommended attribute 'displayName' missing.
PMML>DataDictionary>DataField [Sepal.Width]: Recommended attribute 'displayName' missing.
PMML>DataDictionary>DataField [Petal.Length]: Recommended attribute 'displayName' missing.
PMML>DataDictionary>DataField [Species]: Recommended attribute 'displayName' missing.
The error suggests that I need to add displayName to my PMML. I tried to add it using the iris data and the code below, but without success.
library(pmml)
#####-- run and fit the model
fit <- lm(Sepal.Length ~ ., data = iris)
fit_pmml <- pmml(fit)
#####--- create a list of row names from the data table---
list_of_names = colnames(iris)
rownames(attributes) <- c("displayName")
#####---- add row names to pmml
# ####-- I'm not sure how to write code to match the list_of_names to the names in the pmml code
add_data_field_attributes(fit_pmml, c("displayName" = c(list_of_names), list_of_names))
This is the format I am trying to produce so that I can try re-importing the PMML file:
<Application name="SoftwareAG PMML Generator" version="2.5.1"/>
<Timestamp>2022-05-10 10:08:33</Timestamp>
</Header>
<DataDictionary numberOfFields="4">
<DataField name="Sepal.Length" optype="continuous" dataType="double" displayName="Sepal.Length" isCyclic="1"/>
<DataField name="Sepal.Width" optype="continuous" dataType="double" displayName="Sepal.Width" isCyclic="1"/>
<DataField name="Petal.Length" optype="continuous" dataType="double" displayName="Petal.Length" isCyclic="0"/>
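One possible approach, offered as a sketch rather than a verified fix for the EMR importer: as far as I can tell from the pmml package documentation, add_data_field_attributes() expects its second argument to be a data frame whose row names are attribute names and whose column names are DataField names. The names field_names, attrs and fit_pmml_named are made up for this example, and reusing each field name as its own displayName is an assumption.

```r
library(pmml)

# Fit the model and convert it to PMML, as in the question.
fit <- lm(Sepal.Length ~ ., data = iris)
fit_pmml <- pmml(fit)

# Build a one-row data frame: one column per field, one row per attribute.
field_names <- colnames(iris)
attrs <- as.data.frame(t(field_names), stringsAsFactors = FALSE)
colnames(attrs) <- field_names    # must match the DataField names
rownames(attrs) <- "displayName"  # the attribute to add

# Returns a new PMML object; fit_pmml itself is left unchanged.
fit_pmml_named <- add_data_field_attributes(fit_pmml, attrs)
```

Printing fit_pmml_named should then show whether each DataField in the DataDictionary carries a displayName attribute.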

Related

Function to check if model exists and load if not in R

I am trying to write a function that predicts on data with multiple models. Rather than manually loading every model before running the function, I want code that checks whether the model object is present in the environment: if it is not, load the model; if it is, do nothing. So far I have tried:
ifelse(exists("cs_gam_l"), NULL, readRDS("cs_gam_l.rds"))
However, this only loads in the first element of the model object (which is a list):
[[1]]
(Intercept) s(plate_x,plate_z).1 s(plate_x,plate_z).2 s(plate_x,plate_z).3
-7.0647984 -16.9798528 -18.2271461 -22.4370534
s(plate_x,plate_z).4 s(plate_x,plate_z).5 s(plate_x,plate_z).6 s(plate_x,plate_z).7
-19.2327755 -37.1586233 -30.2684923 4.6120894
s(plate_x,plate_z).8 s(plate_x,plate_z).9 s(plate_x,plate_z).10 s(plate_x,plate_z).11
28.8297008 45.2086431 -20.3660257 -16.2869315
s(plate_x,plate_z).12 s(plate_x,plate_z).13 s(plate_x,plate_z).14 s(plate_x,plate_z).15
-18.5888112 -1.4086437 -36.6699351 36.9229236
s(plate_x,plate_z).16 s(plate_x,plate_z).17 s(plate_x,plate_z).18 s(plate_x,plate_z).19
-7.2833487 41.9148567 11.3234356 -18.1201592
s(plate_x,plate_z).20 s(plate_x,plate_z).21 s(plate_x,plate_z).22 s(plate_x,plate_z).23
44.4676843 -6.3082659 -3.9912789 25.9499761
s(plate_x,plate_z).24 s(plate_x,plate_z).25 s(plate_x,plate_z).26 s(plate_x,plate_z).27
12.7216310 -20.5901414 42.3215648 309.0042229
s(plate_x,plate_z).28 s(plate_x,plate_z).29
-10.2289569 -0.5119436
What am I doing wrong/ how can I fix this?
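ifelse() is vectorized: it returns a value with the same shape as its test, and exists("cs_gam_l") has length one, so only the first element of the list read by readRDS() comes back. It also never assigns the result to a name. A plain if combined with assign() avoids both problems. A minimal sketch, where load_if_missing() and the temporary .rds file are hypothetical stand-ins for your own helper and model files:

```r
# Hypothetical stand-in for a saved model: any list written with saveRDS().
model_path <- tempfile(fileext = ".rds")
saveRDS(list(coefficients = c(a = 1, b = 2), family = "gaussian"), model_path)

# Load the object stored at `path` into `envir` under `name`,
# but only if no object of that name exists there yet.
load_if_missing <- function(name, path, envir = globalenv()) {
  if (!exists(name, envir = envir, inherits = FALSE)) {
    assign(name, readRDS(path), envir = envir)
  }
  invisible(NULL)
}

load_if_missing("cs_gam_l", model_path)  # loads the whole list
load_if_missing("cs_gam_l", model_path)  # already present, does nothing
```

For a one-off check without a helper, if (!exists("cs_gam_l")) cs_gam_l <- readRDS("cs_gam_l.rds") does the same thing.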

Saving .maf file as a table

I am trying to save a .maf file as a table, but I always get the error below:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘structure("MAF", package = "maftools")’ to a data.frame
This is the code I am using:
library(maftools)
laml.maf <- "/Users/PC/mc3.v0.2.8.PUBLIC.maf"
laml = read.maf(maf = laml.maf)
write.table(laml, file="/Users/PC/tp53atm.txt")
I understand that the .maf file has several fields, but I am not sure how to isolate them to save as a table. Any help would be much appreciated!
The problem is that the write.table function doesn't know how to deal with an object of class MAF.
However, you can access the underlying data through the object's data slot, using the @ operator:
write.table(laml@data, file="/Users/PC/tp53atm.txt")
But note that this only exports the raw data, whereas the MAF object contains various other metadata:
> slotNames(laml)
[1] "data" "variants.per.sample" "variant.type.summary" "variant.classification.summary"
[5] "gene.summary" "summary" "maf.silent" "clinical.data"
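To export more than the raw data, each slot can be written to its own file. A sketch using a toy S4 class: ToyMAF is hypothetical and only mimics the slot layout, so with a real object you would loop over slotNames(laml) instead, and any slot that is not tabular would need its own handling.

```r
library(methods)

# Hypothetical stand-in for a MAF object: an S4 class with two tabular slots.
setClass("ToyMAF", representation(data = "data.frame", gene.summary = "data.frame"))
toy <- new("ToyMAF",
  data = data.frame(Hugo_Symbol = c("TP53", "ATM"), Variant_Type = c("SNP", "DEL")),
  gene.summary = data.frame(Hugo_Symbol = c("TP53", "ATM"), total = c(1, 1))
)

out_dir <- tempfile("maf_export_")
dir.create(out_dir)

# Write each slot to a tab-separated file named after the slot.
# slot(obj, "name") is the programmatic equivalent of obj@name.
for (slot_name in slotNames(toy)) {
  write.table(slot(toy, slot_name),
    file = file.path(out_dir, paste0(slot_name, ".txt")),
    sep = "\t", quote = FALSE, row.names = FALSE
  )
}
```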

R removing duplicate siblings in xml data

I am working on a bug-report XML data set:
</short_desc>
<report id="322231">
<update>
<when>1136281841</when>
<what>When uploading a objectice-c++ file (.mm) bugzilla sets the MIME type as application/octet-stream</what>
</update>
<update>
<when>1136420901</when>
<what>When uploading a objective-c++ file (.mm) bugzilla sets the MIME type as application/octet-stream</what>
</update>
</report>
</short_desc>
I am creating a data frame from the above XML data, keeping only the <when> and <what> node data. Because the <what> nodes can contain duplicate content, I want to keep only the last (most recent) node when the <what> text of two <update> elements is similar. The comparison is supposed to use cosine similarity in R. If the <what> text differs, I want to keep both rows in the data frame. Please advise; there are cases where a single <report> has more than two updates with approximately similar text.
Try the following.
library(xml2)
sample data
doc <- read_xml( '<report id="322231">
<update>
<when>1136281841</when>
<what>When uploading a objective-c++ file (.mm) bugzilla sets the MIME type as application/octet-stream</what>
</update>
<update>
<when>1136420901</when>
<what>When uploading a objective-c++ file (.mm) bugzilla sets the MIME type as application/octet-stream</what>
</update>
</report>')
code
# create a nodeset with all 'what' nodes
what.nodes <- xml_find_all( doc, ".//what" )
# now make a data.frame
df <- data.frame(
  # get the report attribute "id" by walking up the ancestor tree from each 'what' node
  report_id = xml_attr( xml_find_first( what.nodes, ".//ancestor::report" ), "id" ),
  # get the sibling 'when' for each 'what' node
  when = xml_text( xml_find_first( what.nodes, ".//preceding-sibling::when" ) ),
  # get the 'what' text
  what = xml_text( what.nodes ),
  # keep strings as character vectors, not factors
  stringsAsFactors = FALSE )
# keep rows with unique 'what' values, scanning from the bottom up so the latest one wins
df[ !duplicated( df$what, fromLast = TRUE ), ]
output
# report_id when what
# 2 322231 1136420901 When uploading a objective-c++ file (.mm) bugzilla sets the MIME type as application/octet-stream
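Note that duplicated() only removes exact matches. For approximately similar text, as asked in the question, a cosine similarity over word counts can be used instead. A base-R sketch: cosine_sim(), drop_near_duplicates(), the 0.9 threshold, and the third sample row are assumptions made for illustration, and rows are assumed to be ordered oldest to newest within each report.

```r
# Cosine similarity of two strings, computed on lower-cased word counts.
cosine_sim <- function(a, b) {
  tokenize <- function(x) {
    tok <- strsplit(tolower(x), "[^[:alnum:]+]+")[[1]]
    tok[nzchar(tok)]
  }
  ta <- tokenize(a)
  tb <- tokenize(b)
  vocab <- union(ta, tb)
  va <- vapply(vocab, function(w) sum(ta == w), numeric(1))
  vb <- vapply(vocab, function(w) sum(tb == w), numeric(1))
  denom <- sqrt(sum(va^2) * sum(vb^2))
  if (denom == 0) 0 else sum(va * vb) / denom
}

# Drop a row when a later row of the same report has similar 'what' text.
drop_near_duplicates <- function(df, threshold = 0.9) {
  if (nrow(df) < 2) return(df)
  keep <- rep(TRUE, nrow(df))
  for (i in seq_len(nrow(df) - 1)) {
    later <- seq(i + 1, nrow(df))
    same_report <- df$report_id[later] == df$report_id[i]
    sims <- vapply(df$what[later], cosine_sim, numeric(1), b = df$what[i])
    if (any(same_report & sims >= threshold)) keep[i] <- FALSE
  }
  df[keep, , drop = FALSE]
}

# First two rows are from the question; the third row is invented.
df <- data.frame(
  report_id = c("322231", "322231", "322231"),
  when = c("1136281841", "1136420901", "1136500000"),
  what = c(
    "When uploading a objectice-c++ file (.mm) bugzilla sets the MIME type as application/octet-stream",
    "When uploading a objective-c++ file (.mm) bugzilla sets the MIME type as application/octet-stream",
    "Attachment cannot be downloaded"
  ),
  stringsAsFactors = FALSE
)

result <- drop_near_duplicates(df)
```

The threshold is a tuning choice: the two near-identical texts differ in one word out of sixteen, so they fall above 0.9, while unrelated text scores close to zero.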

Reading xml data from oracle table column and parsing it in R

I have the following scenario in R.
I have connected to an Oracle database from R through the RODBC package, and one column of a table contains XML data. When I call the xmlParse function on it, I get the error "XML content does not seem to be XML", and class(xmldata) is data.frame.
When I copy the XML data into a new XML file and parse that with xmlParse, it is parsed correctly and class(sourcefile) is XMLInternalDocument.
The error is raised because you are running XML::xmlParse on a data frame, which is what RODBC::sqlQuery() returns, rather than on the underlying XML content. Index the column and row to get the specific XML string.
As an example, the code below builds a data frame from an XML string (the top 5 Stack Overflow users in the R tag), then calls xmlParse once to reproduce the error and again to resolve it.
Dataframe Build (replicating sqlQuery)
library(XML)
txt <- '<?xml version="1.0"?>
<stackoverflow>
<group lang="r">
<topusers>
<user>akrun</user>
<link>https://stackoverflow.com/users/3732271/akrun</link>
<location>Bengaluru, Karnataka, India</location>
<year_rep>15,900</year_rep>
<total_rep>328,573</total_rep>
<tag1>r</tag1>
<tag2>dataframe</tag2>
<tag3>dplyr</tag3>
</topusers>
<topusers>
<user>Dirk Eddelbuettel</user>
<link>https://stackoverflow.com/users/143305/dirk-eddelbuettel</link>
<location>Chicago, IL, United States </location>
<year_rep>5,588</year_rep>
<total_rep>253,481</total_rep>
<tag1>r</tag1>
<tag2>rcpp</tag2>
<tag3>c++</tag3>
</topusers>
<topusers>
<user>42-</user>
<link>https://stackoverflow.com/users/1855677/42</link>
<location>Alameda, CA</location>
<year_rep>4,143</year_rep>
<total_rep>193,407</total_rep>
<tag1>r</tag1>
<tag2>dataframe</tag2>
<tag3>plot</tag3>
</topusers>
<topusers>
<user>A5C1D2H2I1M1N2O1R2T1</user>
<link>https://stackoverflow.com/users/1270695/a5c1d2h2i1m1n2o1r2t1</link>
<location>Chennai, India</location>
<year_rep>3,982</year_rep>
<total_rep>141,425</total_rep>
<tag1>r</tag1>
<tag2>dataframe</tag2>
<tag3>reshape</tag3>
</topusers>
<topusers>
<user>Gavin Simpson</user>
<link>https://stackoverflow.com/users/429846/gavin-simpson</link>
<location>Regina, Canada </location>
<year_rep>2,780</year_rep>
<total_rep>124,779</total_rep>
<tag1>r</tag1>
<tag2>plot</tag2>
<tag3>dataframe</tag3>
</topusers>
</group>
</stackoverflow>'
res <- data.frame(Col1 = txt, stringsAsFactors = FALSE)
Error line
result1 <- xmlParse(res, asText=TRUE)
# Error: XML content does not seem to be XML: '1'
Resolved line (which yields no error)
# SINGLE XML
result1 <- xmlParse(res$Col1[[1]], asText=TRUE)
# MULTIPLE XML (ACROSS ALL ROWS)
result_list <- lapply(res$Col1, xmlParse, asText=TRUE)

Save a data frame to a file addressing by name

I have a data frame and a text variable containing the name of this data frame:
adsl = data.frame(a=2, b=7, w=17)
ds_name = "adsl"
I want to save my data frame from the workspace to a file named "dest_file". The code should be wrapped in a function get_r() that takes the data frame name as an argument:
get_r(ds_name="adsl")
So I need to avoid using the explicit name "adsl" inside the code.
The following works almost correctly but the resulting data frame is called "temp_dataset", not "adsl":
get_r = function(ds_name){
temp_dataset = eval(parse(text=ds_name))
save(temp_dataset, file = "dest_file")
}
Here is another attempt, which does not work either (the text string is saved, not the data frame):
get_r = function(ds_name){
save(ds_name, file = "dest_file")
}
What should I do to make R just execute
save(adsl, file="dest_file")
inside the function? Thank you for any help.
Try
save(list = ds_name, file = "dest_file")
The list argument in save() allows you to pass the name of the data as a character string. See help(save) for more.
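Put together with the function from the question, this could look like the sketch below. The file argument and the temporary file are additions for illustration, and envir = parent.frame() is an optional extra so that the name is looked up where get_r() was called.

```r
adsl <- data.frame(a = 2, b = 7, w = 17)

# Save the object whose name is given as a string, under that same name.
get_r <- function(ds_name, file = "dest_file") {
  save(list = ds_name, file = file, envir = parent.frame())
}

dest_file <- tempfile()
get_r(ds_name = "adsl", file = dest_file)

# Load into a fresh environment to check what was stored.
check_env <- new.env()
saved_names <- load(dest_file, envir = check_env)
```

load() returns the names of the objects it restored, so saved_names shows which name the data frame was saved under.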
