Writing rules generated by Apriori

Writing rules generated by Apriori - r

I'm working with some large transactions data. I've been using read.transactions and apriori (parts of the arules package) to mine for frequent item pairings.
My problem is this: when rules are generated (using "inspect()") I can easily view them in the R console. Right now I'm manually copying the results into a text file, then saving and opening in excel. I'd like to just save the generated rules using write.csv, or something similar, but when I try, I receive an error that the data cannot be coerced into data.frame.
Does anyone have experience doing this successfully in R?

I know I'm answering my own question, but I found out that the solution is to use as() to convert the rules into a data frame. [I'm new to R, so I missed this my first time searching for a solution.] From there, it can easily be manipulated in any way you'd like (sub setting, sorting, exporting, etc.).
> mba = read.transactions(file="Book2.csv",rm.duplicates=FALSE, format="single", sep=",",cols=c(1,2));
> rules_1 <- apriori(mba,parameter = list(sup = 0.001, conf = 0.01, target="rules"));
> as(rules_1, "data.frame");

Another way to achieve that would be:
write(rules_1,
file = "association_rules.csv",
sep = ",",
quote = TRUE,
row.names = FALSE)

I found this post when struggling with writing my rules to excel. My solution is:
library(writexl)
write_xlsx(as(rules_1, "data.frame"), "rules_1.xlsx")
It is much easier to read and report when it is in excel.

Related

uploading CVS document into R

I am an R beginner. I am trying to upload a CSV file into R. However, When I upload the dataset, I am getting strange structures and I cannot figure out how to solve this problem. My CSV original document looks like so:
1,"male","other",39.1500015258789,"yes","no","yes","yes",6.19999980926514,8.09000015258789,0.200000002980232,0.889150023460388,12,"high","other"
When upload into R my data frame looks like so:
"\"1" "\"\"male\"\"" "\"\"other\"\"" 39.2 "\"\"yes\"\""
I tried to remove the slash and the quotation mark by using the following function: new <- read_csv("CollegeDistance.csv", TRUE, quote = ""). However , it does not really help.
Does someone now how to solve this problem. Thanks, by advance.

You have to change the second argument from TRUE to FALSE, because the file is only one row and can therefore not use it as column names.
new <- read_csv("CollegeDistance.csv", FALSE, quote = "")

Saving text from webpage for word cloud in R

I'm trying to practice making word clouds in R and I've seen the process nicely explained in sites like this (http://www.r-bloggers.com/building-wordclouds-in-r/) and in some videos on YouTube. So I thought I'd pick some random long document to practice myself.
I chose the script for Good Will Hunting. It is available here (https://finearts.uvic.ca/writing/websites/writ218/screenplays/award_winning/good_will_hunting.html). What I did is copy that into Notepad++ and start removing blank lines, names, etc. to try to clean up the data before saving. Saving as a .csv file doesn't seem to be an option so I saved it as a .txt file and R doesn't seem to want to read it in.
Both of the following lines return errors in R.
goodwillhunting <- read.csv("C:/Users/MyName/Desktop/goodwillhunting.txt", sep="", stringsAsFactors=FALSE)
goodwillhunting <- read.table("C:/Users/MyName/Desktop/goodwillhunting.txt", sep="", stringsAsFactors=FALSE)
My question is based on an html document what is the best way to save it to be read in to be used for something like this? I know with the rvest package you can read in webpages. The tutorials for word clouds have used .csv files so I'm not sure if that's what my end goal needs to be.
This might be a way to read in the data going that route?
test = read_html("https://finearts.uvic.ca/writing/websites/writ218/screenplays/award_winning/good_will_hunting.html")
text = html_text(test)
Any help is appreciated!

Here's one way:
library(rvest)
library(wordcloud)
test <- read_html("https://finearts.uvic.ca/writing/websites/writ218/screenplays/
award_winning/good_will_hunting.html")
text <- html_text(test)
content <- stringi::stri_extract_all_words(text, simplify = TRUE)
wordcloud(content, min.freq = 10, colors = RColorBrewer::brewer.pal(5,"Spectral"))
Which gives:

Here is a simple example:
library(wordcloud)
text = scan("fulltext.txt", character(0), strip.white = TRUE)
frequency_table = as.data.frame(table(text))
wordcloud(frequency_table$text, frequency_table$Freq)

Read ADCP data in R

I have ADCP measured data for a river and I am wondering if it is possible to read the ADCP file in R. I found a package called "oce" but I couldn't read the ADCP file.
The function I found in oce package is as follows:
read.oce
read.adp
I have uploaded the sample file here https://www.dropbox.com/sh/owian354auah6h3/379D5spA2X.
If anyone could help me how to read this kind of ADCP, I would highly appreciate.
Thanks.

Since it's just a text file, the simplest thing to do is to do something like
my_header <- readlines(myfile,n=7)
followed by
my_data <- read.table(myfile,skip=7,...)
(You'll need a few more parameters, probably, in those calls).
That way the metadata is separated from the array of data, which should simplify subsequent processing operations.

I'm very late to the party here, but hopefully, this will be useful for others.
data<-read.adp("my.prf", from = 1, to = 10, by=1, tz="UTC")
the arguments from = and to = correspond to the record number in the .sen file.

Can scan (or any import function) return partial results after it bump into errors?

Is there anything I can do to get partial results from after bumping into errors in a big file? I am using the following command to import data from files. This is the fastest way I know, but it's not robust. It can easily screw up everything because of a small error. I hope at least there is way that scan(or any reader) can quickly return which row/line has the error, or partial results it read (than I will have an idea where the error is). Then, I can skip enough lines to recover over 99% good data.
rawData = scan(file = "rawData.csv", what = scanformat, sep = ",", skip = 1, quiet = TRUE, fill = TRUE, na.strings = c("-", "NA", "Na","N"))
All importing data tutorials I found seem to assume the files are in good shape. I didn't find a useful hint to deal with dirty files.
I will sincerely appreciate any hint or suggestion! It was really frustrating.

Idea1: Open a file connection (with file function) and then scan line by line (with nlines=1). Put each scan into try to recover after reading a bad line.
Idea2: Use readLines to read the file in raw format; then use strsplit to parse. You can analyse this output to find bad lines and remove it.

The count.fields function will preprocess a table like file and give you how many fields it found on each line (in the sense that read.table will look for fields). This is often a quick way to identify lines that have a problem because they will show a different number of fields from what is expected (or just different from the majority of other lines).

Read SPSS file into R

I am trying to learn R and want to bring in an SPSS file, which I can open in SPSS.
I have tried using read.spss from foreign and spss.get from Hmisc. Both error messages are the same.
Here is my code:
## install.packages("Hmisc")
library(foreign)
## change the working directory
getwd()
setwd('C:/Documents and Settings/BTIBERT/Desktop/')
## load in the file
## ?read.spss
asq <- read.spss('ASQ2010.sav', to.data.frame=T)
And the resulting error:
Error in read.spss("ASQ2010.sav", to.data.frame = T) : error
reading system-file header In addition: Warning message: In
read.spss("ASQ2010.sav", to.data.frame = T) : ASQ2010.sav: position
0: character `\000' (
Also, I tried saving out the SPSS file as a SPSS 7 .sav file (was previously using SPSS 18).
Warning messages: 1: In read.spss("ASQ2010_test.sav", to.data.frame =
T) : ASQ2010_test.sav: Unrecognized record type 7, subtype 14
encountered in system file 2: In read.spss("ASQ2010_test.sav",
to.data.frame = T) : ASQ2010_test.sav: Unrecognized record type 7,
subtype 18 encountered in system file

I had a similar issue and solved it following a hint in read.spss help.
Using package memisc instead, you can import a portable SPSS file like this:
data <- as.data.set(spss.portable.file("filename.por"))
Similarly, for .sav files:
data <- as.data.set(spss.system.file('filename.sav'))
although in this case I seem to miss some string values, while the portable import works seamlessly. The help page for spss.portable.file claims:
The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. They are also adapted to load efficiently large data sets. Most importantly, importer objects support the labels, missing.values, and descriptions, provided by this package.

The read.spss seems to be outdated a little bit, so I used package called memisc.
To get this to work do this:
install.packages("memisc")
data <- as.data.set(spss.system.file('yourfile.sav'))

You may also try this:
setwd("C:/Users/rest of your path")
library(haven)
data <- read_sav("data.sav")
and if you want to read all files from one folder:
temp <- list.files(pattern = "*.sav")
read.all <- sapply(temp, read_sav)

I know this post is old, but I also had problems loading a Qualtrics SPSS file into R. R's read.spss code came from PSPP a long time ago, and hasn't been updated in a while. (And Hmisc's code uses read.spss(), too, so no luck there.)
The good news is that PSPP 0.6.1 should read the files fine, as long as you specify a "String Width" of "Short - 255 (SPSS 12.0 and earlier)" on the "Download Data" page in Qualtrics. Read it into PSPP, save a new copy, and you should be in business. Awkward, but free.
,

You can read SPSS file from R using above solutions or the one you are currently using. Just make sure that the command is fed with the file, that it can read properly. I had same error and the problem was, SPSS could not access that file. You should make sure the file path is correct, file is accessible and it is in correct format.
library(foreign)
asq <- read.spss('ASQ2010.sav', to.data.frame=TRUE)
As far as warning message is concerned, It does not affect the data. The record type 7 is used to store features in newer SPSS software to make older SPSS software able to read new data. But does not affect data. I have used this numerous times and data is not lost.
You can also read about this at http://r.789695.n4.nabble.com/read-spss-warning-message-Unrecognized-record-type-7-subtype-18-encountered-in-system-file-td3000775.html#a3007945

It looks like the R read.spss implementation is incomplete or broken. R2.10.1 does better than R2.8.1, however. It appears that R gets upset about custom attributes in a sav file even with 2.10.1 (The latest I have). R also may not understand the character encoding field in the file, and in particular it probably does not work with SPSS Unicode files.
You might try opening the file in SPSS, deleting any custom attributes, and resaving the file.
You can see whether there are custom attributes with the SPSS command
display attributes.
If so, delete them (see VARIABLE ATTRIBUTE and DATAFILE ATTRIBUTE commands), and try again.
HTH,
Jon Peck

If you have access to SPSS, save file as .csv, hence import it with read.csv or read.table. I can't recall any problem with .sav file importing. So far it was working like a charm both with read.spss and spss.get. I reckon that spss.get will not give different results, since it depends on foreign::read.spss
Can you provide some info on SPSS/R/Hmisc/foreign version?

Another solution not mentioned here is to read SPSS data in R via ODBC. You need:
IBM SPSS Statistics Data File Driver. Standalone driver is enough.
Import SPSS data using RODBC package in R.
See the example here. However I have to admit that, there could be problems with very big data files.

For me it works well using memisc!
install.packages("memisc")
load('memisc')
Daten.Februar <-as.data.set(spss.system.file("NPS_Februar_15_Daten.sav"))
names(Daten.Februar)

I agree with #SDahm that the haven package would be the way to go. I myself have struggled a bit with string values when starting to use it, so I thought I'd share my approach on that here, too.
The "semantics" vignette has some useful information on this topic.
library(tidyverse)
library(haven)
# Some interesting information in here
vignette('semantics')
# Get data from spss file
df <- read_sav(path_to_file)
# get value labels
df <- map_df(.x = df, .f = function(x) {
if (class(x) == 'labelled') as_factor(x)
else x})
# get column names
colnames(df) <- map(.x = spss_file, .f = function(x) {attr(x, 'label')})

There is no such problem with packages you are using. The only requirement for read a spss file is to put the file into a PORTABLE format file. I mean, spss file have *.sav extension. You need to transform your spss file in a portable document that uses *.por extension.
There is more info in http://www.statmethods.net/input/importingdata.html

In my case this warning was combined with a appearance of a new variable before first column of my data with values -100, 2, 2, 2, ..., a shift in the correspondence between labels and values and the deletion of the last variable. A solution that worked was (using SPSS) to create a new dump variable in the last column of the file, fill it with random values and execute the following code:
(filename is the path to the sav file and in my case the original SPSS file had 62 columns, thus 63 with the additional dumb variable)
library(memisc)
data <- as.data.set(spss.system.file(filename))
copyofdata = data
for(i in 2:63){
names(data)[i] <- names(copyofdata)[i-1]
}
data[[1]] <- NULL
newcopyofdata = data
for(i in 2:62){
labels(data[[i]]) <- labels(newcopyofdata[[i-1]])
}
labels(data[[1]]) <- NULL
Hope the above code will help someone else.

Turn your UNICODE in SPSS off
Open SPSS without any data open and run the code below in your syntax editor
SET UNICODE OFF.
Open the data set and resave it to remove the Unicode
read.spss('yourdata.sav', to.data.frame=T) works correctly then

I just came came across an SPSS file that I couldn't get open using haven, foreign, or memisc, but readspss::read.por did the trick for me:
download.file("http://www.tcd.ie/Political_Science/elections/IMSgeneral92.zip",
"IMSgeneral92.zip")
unzip("IMSgeneral92.zip", exdir = "IMSgeneral92")
# rio, haven, foreign, memisc pkgs don't work on this file! But readspss does:
if(!require(readspss)) remotes::install_git("https://github.com/JanMarvin/readspss.git")
ims92 <- readspss::read.por("IMSgeneral92/IMS_Nov7 92.por", convert.factors = FALSE)
Nice! Thanks, #JanMarvin!

1)
I've found the program, stat-transfer, useful for importing spss and stata files into R.
It resolves the issue you mention by converting spss to R dataset. Also very useful for subsetting super large datasets into smaller portions consumable by R. Not free, but a very useful tool for working with datasets from different programs -- especially if you don't have access to them.
2)
Memisc package also has an spss function worth trying.