Export PMML to a text file? - r

Simple question, I have stored PMML code of an R object using pmmlcode <- pmml(my.object), and I would like some way to save it directly to a text file. The usual write.table method isn't working because the data is not a table.

You can simply use saveXML (from the XML package, which pmml loads), as in the example below:
library(randomForest)
library(pmml)
data(airquality)
ozone.out <- randomForest(Ozone ~ Wind+Temp+Month, data=na.omit(airquality), ntree=200)
saveXML(pmml(ozone.out, data=airquality), "airquality_rf.pmml")

Try toString.XMLNode from XML package and then write to file with writeLines. You'll need to provide example data for a more complete answer.

I am using the iris data just to generate a dummy PMML file, and the sink command to put the pmml output into a .pmml file:
R > library(pmml)
R > lml <- lm(iris$Sepal.Length~iris$Sepal.Width)
R > sink("myPmml.pmml")
R > cat("<?xml version=\"1.0\"?>\n")
R > pmml(lml)
R > sink()
The output file myPmml.pmml is saved in your current working directory (whatever setwd() is set to, for example in your .Rprofile); the default on Windows is "My Documents". Of course, this also works if you use .txt instead of .pmml in the sink() call, something like:
sink("mypmml.txt")
Edit: Added the cat command to put the XML declaration at the top. Thanks to J.Dimeo.
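A variation on the sink() approach, sketched with base R only: capture.output() collects what print() would show and returns it as a character vector, so there is no open sink to forget to close. The lm object and the file name below are stand-ins for illustration; a pmml object's printed XML should be captured the same way, though that is an assumption not tested here.

```r
# Stand-in object; replace with pmml(my.object) once the pmml package is loaded.
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Hypothetical output path in the session's temporary directory.
out_file <- file.path(tempdir(), "printed_object.txt")

# capture.output() evaluates print(fit) and returns the printed lines.
printed <- capture.output(print(fit))

# Write the lines to disk; for PMML, prepend '<?xml version="1.0"?>' here.
writeLines(printed, out_file)
```

Unlike sink(), nothing is redirected globally, so an error halfway through does not leave the console output diverted.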

I had no test code to create such an object, but after solving my earlier problem with the availability of the pmml package on the UCLA CRAN mirror, I found that this produces output acceptable for human readability, although not in a format that a PMML-aware application can interpret:
cat(paste(unlist(pmmlcode),"\n"), file="yourfile.txt")
Neither of these worked:
If it's just a character vector:
cat(pmmlcode, file="yourfile.txt")
Or if it's a list:
lapply(pmmlcode, cat, file="yourfile.txt", append=TRUE)

Related

R language Amelia specify prefix of output files

This R statement uses the Amelia package to create output data files containing imputed data:
ds.im <- amelia(ds, m=5, p2s=2)
The names of the 5 output files are: output1.csv to output5.csv
In the Amelia package, is there a way to specify the prefix of the output files to something more meaningful? For example, boat_impute1.csv to boat_impute5.csv
I could not locate such a command in the amelia documentation (https://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf)
Thanks.
My question was not a good one.
The output files are not written by the amelia command. Rather, they are written by the write.amelia command. What I called the prefix is supplied through the file.stem argument, such as:
write.amelia(ds.im, file.stem = "boat_impute", format = "csv")
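As a rough sketch of what that call should leave on disk, assuming write.amelia appends the imputation number and the extension to file.stem as its help page describes, the expected names can be built in base R. The variable names here are only illustrative.

```r
# Values taken from the question: five imputations, CSV output.
file_stem <- "boat_impute"
m <- 5

# Expected file names under the stated assumption about write.amelia.
expected_files <- paste0(file_stem, seq_len(m), ".csv")
print(expected_files)
```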

How to scrape a downloaded PDF file with R

I’ve recently gotten into scraping (and programming in general) for my internship, and I came across PDF scraping. Every time I try to read a scanned pdf with R, I can never get it to work. I’ve tried using the file.choose() function to no avail. Do I need to change my directory, or how can I get the pdf from my files into R?
The code looks something like this:
> library(pdftools)
> text=pdf_text("C:/Users/myname/Documents/renewalscan.pdf")
> text
[1] ""
Also, using pdftables leads me here:
> library(pdftables)
> convert_pdf("C:/Users/myname/Documents/renewalscan.pdf","my.csv")
Error in get_content(input_file, format, api_key) :
Bad Request (HTTP 400).
You should use the packages pdftools and pdftables.
If you are trying to read text inside the PDF, then use the pdf_text() function. Its argument is the path to the PDF, either on your computer or on the web. For example:
tt = pdf_text("C:/Users/Smith/Documents/my_file.pdf")
It would be nice if you were more specific and also gave us a reproducible example.
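One likely reason for the empty string in the question: pdf_text() reads the text layer of a PDF, and a scanned document is usually just page images with no text layer. A minimal base R check on the returned vector is sketched below; the helper name has_no_text_layer is made up for this example, and the commented OCR call assumes a pdftools version that ships pdf_ocr_text() together with the tesseract package.

```r
# Returns TRUE when every page string is empty or whitespace only.
has_no_text_layer <- function(pages) {
  all(!nzchar(trimws(pages)))
}

scanned_like <- c("", "  \n")          # like the "" that pdf_text() returned in the question
text_like <- c("Renewal notice", "")   # at least one page with real text

has_no_text_layer(scanned_like)
has_no_text_layer(text_like)

# If the check is TRUE, OCR is the usual next step, e.g. (not run):
# text <- pdftools::pdf_ocr_text("C:/Users/myname/Documents/renewalscan.pdf")
```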
To use the PDFTables R package, you need to run the following command:
convert_pdf('test/index.pdf', output_file = NULL, format = "xlsx-single", message = TRUE, api_key = "insert_API_key")
If you are looking to get tabular data, you might try tabulizer. Here is a full code tutorial: https://www.business-science.io/code-tools/2019/09/23/tabulizer-pdf-scraping.html
Basically, you can use this code from the tutorial:
library(tabulizer)
extract_tables(
file = "2019-09-23-tabulizer/endangered_species.pdf",
method = "decide",
output = "data.frame")

Error: could not find function "read_excel" using R on Mac

I am trying to link up my excel data set to R for statistical analysis. I am running on OSX Sierra (10.12.6) with R studio (1.0.153) and Java 8 (update 144).
The function "read_excel" was able to open my excel document a week ago. When I moved the excel and the R document together to another folder, it no longer worked. Reloading the libraries has had no effect. After multiple attempts (and restarting R studio and computer), something finally worked but function "lmer" was no longer found. After reloading library "lme4", "read_excel" no longer worked!
I have also tried using "read.xlsx" and "readWorksheet(loadWorkbook(...))", which didn't work. "read.csv" also did not work properly since the commas were creating disorganized columns and I am dealing with a larger excel workbook with ongoing changes.
Reading the Stack Overflow question "Importing .xlsx file into R" has not resolved my issue! Please help!
Libraries loaded:
library(multcomp)
library(nlme)
library(XLConnect)
library(XLConnectJars)
library(lme4)
library(car)
library(rJava)
library(xlsx)
library(readxl)
R data file:
Dataset <- read_excel("Example.xlsx",sheet="testing")
#alternative line: Dataset <- read.xlsx("~/Desktop/My Stuff/Sample/Example.xlsx", sheet=7)
Dataset$AAA <- as.factor(Dataset$AAA)
Dataset$BBB <- as.factor(Dataset$BBB)
Dataset$CCC <- as.numeric(Dataset$CCC)
Dataset$DDD <- as.numeric(Dataset$DDD)
Dataset_lme = lmer(CCC ~ AAA + BBB + (1|DDD), data=Dataset)
Even though you loaded the library, try calling the function with its namespace, readxl::read_excel(path = "yourPath",sheet=1), or drop the sheet argument altogether; read_excel then reads the first sheet.
Perhaps, when you moved the Excel and R files to another folder, the path needed to change as well.
Try updating the path, or replace it with file.choose() and select the Excel file manually.
You loaded the package "xlsx", which can also do what you need. Maybe you are typing the call wrong.
Dataset <- read.xlsx("Example.xlsx",sheetName="testing")
or
Dataset <- read.xlsx("Example.xlsx",sheetIndex=1)  # sheetIndex is the number of the Excel sheet
I hope it helps.
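Try loading library(readxl) before calling read_excel(). Note that library(tidyverse) does not attach readxl, and readr does not provide read_excel(), so loading those two alone will not make the function available.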
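The answers above point at two separate causes: the file path changed, or the function is not visible in the session. A small base R sketch that checks both is shown here; diagnose_import is a hypothetical helper written for this example, not part of any package.

```r
# Report the usual suspects behind "could not find function" and "file not found".
diagnose_import <- function(path, fun_name, pkg) {
  list(
    working_dir   = getwd(),                              # where relative paths resolve
    file_found    = file.exists(path),                    # FALSE: fix the path or use file.choose()
    fun_visible   = exists(fun_name, mode = "function"),  # FALSE: the package is not attached
    pkg_installed = requireNamespace(pkg, quietly = TRUE) # FALSE: install.packages(pkg) first
  )
}

str(diagnose_import("Example.xlsx", "read_excel", "readxl"))
```

If fun_visible is FALSE while pkg_installed is TRUE, either call library(readxl) again or use the readxl::read_excel() form.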

Attach date to PDF generated with Sweave

I generate a daily report via Sweave. I would like to append the current date, in the format YYYYMMDD, to the PDF's name. I am using the following code to generate the file:
rnwfile <- system.file("Sweave", "Margin.Rnw", package = "utils")
Sweave(rnwfile)
tools::texi2pdf("Margin.tex")
Margin.Rnw is my master copy of the report I want to generate (mixing LaTeX with R code). The output I get is the file Margin.pdf. I would like instead to have a file named *Margin_YYYYMMDD.pdf*.
I would appreciate any advice.
See the output argument to ?RweaveLatex.
This is untested but should (?) work:
rnwfile <- system.file("Sweave", "Margin.Rnw", package = "utils")
outfn <- paste0("Margin_",format(Sys.time(),"%Y%m%d"),".tex")
Sweave(rnwfile,output=outfn)
tools::texi2pdf(outfn)
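If the output argument turns out not to behave as hoped, a different route is to compile as before and rename the PDF afterwards. This is only a sketch using base R; the empty file created below is a stand-in for the PDF that texi2pdf("Margin.tex") would produce, and the temporary directory is used purely for illustration.

```r
# Date stamp in YYYYMMDD form.
stamp <- format(Sys.Date(), "%Y%m%d")

work_dir <- tempdir()
plain_pdf <- file.path(work_dir, "Margin.pdf")
dated_pdf <- file.path(work_dir, paste0("Margin_", stamp, ".pdf"))

# Stand-in for the compiled report; in real use this file comes from texi2pdf().
file.create(plain_pdf)

# Rename Margin.pdf to Margin_YYYYMMDD.pdf; returns TRUE on success.
renamed <- file.rename(plain_pdf, dated_pdf)
```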

Read SPSS file into R

I am trying to learn R and want to bring in an SPSS file, which I can open in SPSS.
I have tried using read.spss from foreign and spss.get from Hmisc. Both error messages are the same.
Here is my code:
## install.packages("Hmisc")
library(foreign)
## change the working directory
getwd()
setwd('C:/Documents and Settings/BTIBERT/Desktop/')
## load in the file
## ?read.spss
asq <- read.spss('ASQ2010.sav', to.data.frame=T)
And the resulting error:
Error in read.spss("ASQ2010.sav", to.data.frame = T) : error
reading system-file header In addition: Warning message: In
read.spss("ASQ2010.sav", to.data.frame = T) : ASQ2010.sav: position
0: character `\000' (
Also, I tried saving out the SPSS file as a SPSS 7 .sav file (was previously using SPSS 18).
Warning messages: 1: In read.spss("ASQ2010_test.sav", to.data.frame =
T) : ASQ2010_test.sav: Unrecognized record type 7, subtype 14
encountered in system file 2: In read.spss("ASQ2010_test.sav",
to.data.frame = T) : ASQ2010_test.sav: Unrecognized record type 7,
subtype 18 encountered in system file
I had a similar issue and solved it following a hint in read.spss help.
Using package memisc instead, you can import a portable SPSS file like this:
data <- as.data.set(spss.portable.file("filename.por"))
Similarly, for .sav files:
data <- as.data.set(spss.system.file('filename.sav'))
although in this case I seem to miss some string values, while the portable import works seamlessly. The help page for spss.portable.file claims:
The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. They are also adapted to load efficiently large data sets. Most importantly, importer objects support the labels, missing.values, and descriptions, provided by this package.
The read.spss function seems to be a little outdated, so I used a package called memisc.
To get this to work, do this:
install.packages("memisc")
library(memisc)
data <- as.data.set(spss.system.file('yourfile.sav'))
You may also try this:
setwd("C:/Users/rest of your path")
library(haven)
data <- read_sav("data.sav")
and if you want to read all files from one folder:
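temp <- list.files(pattern = "\\.sav$")  # pattern is a regular expression, not a glob
read.all <- sapply(temp, read_sav, simplify = FALSE)  # named list, one data frame per file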
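The read-every-file idea can be wrapped in a small helper that returns a named list. The sketch below is demonstrated with CSV files and read.csv so that it runs with base R alone; read_all and the demo directory are made up for this example, and passing haven::read_sav as the reader is the assumed way to apply it to .sav files.

```r
# Read every file in `dir` whose name matches the regular expression `pattern`.
read_all <- function(dir, pattern, reader) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  out <- lapply(files, reader)
  names(out) <- basename(files)  # keep track of which table came from which file
  out
}

# Demo data: two small CSV files in a fresh temporary folder.
demo_dir <- file.path(tempdir(), "read_all_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(demo_dir, "b.csv"), row.names = FALSE)

tables <- read_all(demo_dir, "\\.csv$", read.csv)
sapply(tables, nrow)

# For SPSS files (not run): read_all(".", "\\.sav$", haven::read_sav)
```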
I know this post is old, but I also had problems loading a Qualtrics SPSS file into R. R's read.spss code came from PSPP a long time ago, and hasn't been updated in a while. (And Hmisc's code uses read.spss(), too, so no luck there.)
The good news is that PSPP 0.6.1 should read the files fine, as long as you specify a "String Width" of "Short - 255 (SPSS 12.0 and earlier)" on the "Download Data" page in Qualtrics. Read it into PSPP, save a new copy, and you should be in business. Awkward, but free.
You can read an SPSS file in R using the solutions above or the one you are currently using. Just make sure the command is given a file that it can read properly. I had the same error, and the problem was that SPSS could not access that file. Make sure the file path is correct, the file is accessible, and it is in the correct format.
library(foreign)
asq <- read.spss('ASQ2010.sav', to.data.frame=TRUE)
As far as the warning message is concerned, it does not affect the data. Record type 7 is used by newer SPSS versions to store features in a way that still lets older SPSS software read the file. I have used this numerous times and no data was lost.
You can also read about this at http://r.789695.n4.nabble.com/read-spss-warning-message-Unrecognized-record-type-7-subtype-18-encountered-in-system-file-td3000775.html#a3007945
It looks like the R read.spss implementation is incomplete or broken. R2.10.1 does better than R2.8.1, however. It appears that R gets upset about custom attributes in a sav file even with 2.10.1 (The latest I have). R also may not understand the character encoding field in the file, and in particular it probably does not work with SPSS Unicode files.
You might try opening the file in SPSS, deleting any custom attributes, and resaving the file.
You can see whether there are custom attributes with the SPSS command
display attributes.
If so, delete them (see VARIABLE ATTRIBUTE and DATAFILE ATTRIBUTE commands), and try again.
HTH,
Jon Peck
If you have access to SPSS, save the file as .csv, then import it with read.csv or read.table. I can't recall ever having a problem importing .sav files; so far it has worked like a charm with both read.spss and spss.get. I reckon that spss.get will not give different results, since it depends on foreign::read.spss.
Can you provide some info on SPSS/R/Hmisc/foreign version?
Another solution not mentioned here is to read SPSS data in R via ODBC. You need to:
1) Install the IBM SPSS Statistics Data File Driver. The standalone driver is enough.
2) Import the SPSS data using the RODBC package in R.
See the example here. However, I have to admit that there could be problems with very big data files.
For me it works well using memisc!
install.packages("memisc")
library(memisc)
Daten.Februar <-as.data.set(spss.system.file("NPS_Februar_15_Daten.sav"))
names(Daten.Februar)
I agree with @SDahm that the haven package would be the way to go. I myself have struggled a bit with string values when starting to use it, so I thought I'd share my approach on that here, too.
The "semantics" vignette has some useful information on this topic.
library(tidyverse)
library(haven)
# Some interesting information in here
vignette('semantics')
# Get data from spss file
df <- read_sav(path_to_file)
# get value labels
df <- map_df(.x = df, .f = function(x) {
if (is.labelled(x)) as_factor(x)
else x})
# get column names from the variable labels (assumes every column has one)
colnames(df) <- map_chr(.x = df, .f = function(x) {attr(x, 'label')})
There is no such problem with the packages you are using. The only requirement for reading an SPSS file is to save it in PORTABLE format. That is, SPSS files have the *.sav extension, and you need to convert your SPSS file to a portable file, which uses the *.por extension.
There is more info in http://www.statmethods.net/input/importingdata.html
In my case this warning was combined with the appearance of a new variable before the first column of my data with values -100, 2, 2, 2, ..., a shift in the correspondence between labels and values, and the deletion of the last variable. A solution that worked was (using SPSS) to create a new dummy variable in the last column of the file, fill it with random values, and execute the following code:
(filename is the path to the sav file; in my case the original SPSS file had 62 columns, thus 63 with the additional dummy variable)
library(memisc)
data <- as.data.set(spss.system.file(filename))
copyofdata = data
for(i in 2:63){
names(data)[i] <- names(copyofdata)[i-1]
}
data[[1]] <- NULL
newcopyofdata = data
for(i in 2:62){
labels(data[[i]]) <- labels(newcopyofdata[[i-1]])
}
labels(data[[1]]) <- NULL
Hope the above code will help someone else.
Turn your UNICODE in SPSS off
Open SPSS without any data open and run the code below in your syntax editor
SET UNICODE OFF.
Open the data set and resave it to remove the Unicode
read.spss('yourdata.sav', to.data.frame=T) works correctly then
I just came across an SPSS file that I couldn't open using haven, foreign, or memisc, but readspss::read.por did the trick for me:
download.file("http://www.tcd.ie/Political_Science/elections/IMSgeneral92.zip",
"IMSgeneral92.zip")
unzip("IMSgeneral92.zip", exdir = "IMSgeneral92")
# rio, haven, foreign, memisc pkgs don't work on this file! But readspss does:
if(!require(readspss)) remotes::install_git("https://github.com/JanMarvin/readspss.git")
ims92 <- readspss::read.por("IMSgeneral92/IMS_Nov7 92.por", convert.factors = FALSE)
Nice! Thanks, @JanMarvin!
1) I've found the program, stat-transfer, useful for importing SPSS and Stata files into R. It resolves the issue you mention by converting the SPSS file to an R dataset. It is also very useful for subsetting very large datasets into smaller portions that R can handle. Not free, but a very useful tool for working with datasets from different programs, especially if you don't have access to those programs.
2) The memisc package also has an SPSS import function worth trying.

Resources