asRules(tree) R save rules - r

I have the following problem:
I built a decision tree in R with the rpart library, and since I have a long list of variables, the rules form an endless list.
Calling asRules(tree) from the rattle library gives nicer output than just printing tree once the tree is computed.
The problem is that the set of rules is longer than the number of lines the console can display, so I can't copy them with Ctrl + C. When I save the result into a variable, for instance:
t <- asRules(tree)
I would expect something like
Rule number: 1 [target=0 cover=500 (4%) prob=0.8]
var1 < 10
var2 < 2
var3 >=45
var4 >=5
However, the result is
[1] 297 242 295 126 127 124
And obviously this isn't what I am looking for.
So I see 3 ways of solving this:
1. Increase the limit of printable lines in the console (I don't know how to do that).
2. Print to the console with a key press to continue, so that I can copy, paste, and then press the key to get the next results (I don't know how to do that either).
3. Save the set of rules into a txt file or something similar instead of [1] 297 242 295 126 127 124.
Guys, any help is very much appreciated!
Thank you!

For #3 use
sink(file='somefile.txt')
asRules(tree)
sink()
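A variation on the same idea, as a minimal sketch: capture.output() collects whatever a call prints into a character vector, which can then be written to a file. The function print_rules() below is a hypothetical stand-in that mimics what the question describes (text printed to the console, numbers returned invisibly); with the real objects the call would presumably be capture.output(asRules(tree)).

```r
# Hypothetical stand-in for asRules(): prints rule text to the console
# and invisibly returns a numeric vector, as described in the question.
print_rules <- function() {
  cat("Rule number: 1 [target=0 cover=500 (4%) prob=0.8]\n")
  cat("  var1 < 10\n")
  cat("  var2 < 2\n")
  invisible(c(297, 242))
}

# Collect the printed lines instead of letting them scroll past
rules_text <- capture.output(print_rules())

# Write them to a text file (a temporary file here; any path works)
out_file <- tempfile(fileext = ".txt")
writeLines(rules_text, out_file)
```

Unlike sink(), nothing has to be switched back on afterwards, and rules_text stays available in the session for searching or subsetting.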

Related

R: Getting a value from a table based on a loop

I have a loop where I am trying to build a table by grabbing information from a driver table I import. What I'm stuck on is that I want to pick the column based on the loop index, something like:
On the first pass through the loop I want it to work like
df$a <- Driver$M1[i]
and then on the second pass like
df$a <- Driver$M2[i]
and so on.
Through searching I thought I had found the solution:
df$a <- get(paste0("Driver$M",j,"[i]"))
but I get the error
object 'Driver$M1[i]' not found
so I don't think get() works the way I thought it did.
Could someone help me find out how to make this work?
Thanks
Iterating over the columns of a table "smoke"
> smoke
        High Low Middle
current   51  43     22
former    92  28     21
never     68  22      9
is as simple as
> for (i in colnames(smoke)) {t = smoke[,i]; print(i); print(t)}
[1] "High"
current  former   never
     51      92      68
[1] "Low"
current  former   never
     43      28      22
[1] "Middle"
current  former   never
     22      21       9
Thanks to everyone for looking at this. I kept looking and came across a different way of writing it that seems to do what I was looking for: Driver[i,paste0("M",j)]
I'm not very experienced, so I don't want to share incorrect information, but it seems the $ operator can't accept variables. Written as Driver[row, column], the column argument expects a string anyway, so paste0() now works like I want it to.
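To make the difference concrete, here is a small sketch with a made-up Driver data frame (the values and the picked vector are assumptions for illustration only). get() looks up an object by its name, and "Driver$M1[i]" is an expression rather than a name, which is why it fails; indexing with a string works.

```r
# Made-up driver table for illustration
Driver <- data.frame(M1 = c(10, 20, 30), M2 = c(1, 2, 3))

i <- 2                 # row of interest
picked <- numeric(2)   # one value per column visited

for (j in 1:2) {
  col_name <- paste0("M", j)         # "M1", then "M2"
  picked[j] <- Driver[i, col_name]   # same value as Driver[[col_name]][i]
}

picked
```

Both Driver[i, col_name] and Driver[[col_name]][i] take the column name as a character string, so the name can be built at run time.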

Why do I have to do the "<-"? Can't I design my function to bypass that?

One of the things I don't like in R is the save process. Since I am always developing, I have large working environments, and I like to save a specific object frequently. One of the most annoying things to me is how complicated that can be. The object (one of up to 10 at a time) is a list of 10 to 20 data frames (ranging from rasterized images to medium and large data frames) that are all used in different ways by different functions, which can get very complex.
One thing I have not been able to figure out: when my function changes that data, I would like it to save the changed object back to the directory automatically, instead of having to do something like the following. Note that this is fine to do with a list of objects in a for loop, but I would like to do it for the object I pass into the function.
# obtain the name of the object you will be inputting into
# the function, in character form
dat.name <- ls(pattern = "dat")
# or select it from a list if there are multiple
dat.name <- select.list(ls(pattern = "dat"))
# run the function on the object and assign the result to a new name
# just in case something doesn't work
tmp.dat <- cell.creator(dat)
# next assign the tmp to the real object (assign() needs the name as a string)
assign(dat.name, tmp.dat)
## or ## just do the straight-up overwrite if you are brave,
# and I am starting to get pretty brave with some of my functions
dat <- cell.creator(dat)
# put .rdata on the back to create a file name
# (paste0() avoids the space that paste() would insert)
file.name <- paste0(dat.name, ".rdata")
# then... FINALLY save it
save(list = dat.name, file = file.name)
What I really want is to internalize those commands into the function, but (unless I am misunderstanding this) nothing stores the name my object had when it was passed in, unless I pass it in quotation marks, which doesn't let me use tab autocomplete in RGui. :(
So, let's say dat is
bob<-sample(seq(1,1000))
and my function sorts my object
bob.sorter<-function(dat){
dat<-sort(dat)
return(dat)}
So now when I input bob, I would like something to just go ahead and save bob
for me, basically doing the equivalent of
dat<-cell.creator(dat)
Am I missing something here?
I don't fully understand your question, but this seems to address part of it. The following function takes an object assigned to a variable (e.g. bob) and automatically saves it to a file whose name is the variable name followed by .rdata (e.g. "bob.rdata"), without the need to actually type the file name:
qsave <- function(dat){
dat.name <- deparse(substitute(dat))
file.name <- paste0(dat.name,".rdata")
save(list = dat.name, file=file.name)
}
To test it:
> bob <- islands
> qsave(bob)
> rm(bob) #bob is now gone
> load("bob.rdata") #the file is written to the working directory; you can check that this restores bob
You can do this:
set.seed(1492) # reproducible science
bob <- sample(1:1000, 500) # the actual way sample() shld be called
str(bob)
## int [1:500] 278 216 185 111 52 9 848 507 388 763 ...
bob_sorter <- function(dat) {
dat <- dat[order(dat)] # actual sorting happening
dat
}
str(bob_sorter(bob))
## int [1:500] 3 6 7 8 9 10 11 13 14 17 ...
bobs_silly_sorter <- function(dat) {
passed_in_name <- as.character(substitute(dat)) # pls never do this
dat <- dat[order(dat)]
assign(passed_in_name, dat, envir=.GlobalEnv) # pls never do this
}
str(bob)
## int [1:500] 278 216 185 111 52 9 848 507 388 763 ...
bobs_silly_sorter(bob)
str(bob)
## int [1:500] 3 6 7 8 9 10 11 13 14 17 ...
It's horribad. Your future self will prbly hate you for doing it. And, anyone else who has to work with your code will also end up muttering obscenities under their breath at you every time you walk by them.
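If the real goal is "change the object and keep a copy on disk", one middle ground is to let the function write the file as a side effect but still return the value, so the assignment stays visible at the call site. This is only a sketch: sort_and_save() and the file location are hypothetical names, and saveRDS()/readRDS() are used here in place of the save()/load() pair from the question.

```r
# Hypothetical helper: sorts `dat`, writes the result to `path`,
# and returns the sorted vector so the caller decides where it goes.
sort_and_save <- function(dat, path) {
  sorted <- sort(dat)
  saveRDS(sorted, path)   # side effect: a copy on disk
  sorted                  # explicit return value
}

set.seed(1492)
bob <- sample(1:1000, 500)

bob_path <- file.path(tempdir(), "bob.rds")   # any writable path works
bob <- sort_and_save(bob, bob_path)           # the "<-" stays, on purpose

restored <- readRDS(bob_path)                 # later: read the saved copy back
```

The "<-" is still there, but nothing in the global environment changes behind the caller's back, which is the part the answer above warns about.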

avoid string printed to console getting truncated (in RStudio)

I want to print a long string to the RStudio console so that it does not get truncated.
> paste(1:300, letters, collapse=" ")
[1] "1 a 2 b 3 c 4 d 5 e 6 f 7 g 8 h 9 i
...
181 y 182 z 183 a 184 b... <truncated>
I assumed this would be fairly simple, but I cannot figure out how. I tried
options(max.print = 10000)
and looked through the args on the print help pages. Still no luck.
What parameter / setting do I have to change to achieve this?
This is an RStudio-specific feature, intended to help resolve problems where printing overly long strings could cause IDE sluggishness. (I believe it was added with the latest release, v0.99.896)
You can opt out of this truncation by setting the "Limit length of lines displayed in the console to:" option to 0 (it is the final option in the dialog).
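A workaround that does not depend on the IDE setting, sketched under the assumption that the limit applies to each displayed line: wrap the string into shorter lines with base R's strwrap() before printing. The width of 80 is an arbitrary choice.

```r
long_string <- paste(1:300, letters, collapse = " ")

# Break the single long string into lines shorter than 80 characters
wrapped <- strwrap(long_string, width = 80)

# Print one wrapped line per console line
cat(wrapped, sep = "\n")
```

Note that this changes where the line breaks fall, so it suits reading or copying the content rather than reproducing the string exactly.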

Object not found error

Firstly I need to explain that I've had MINIMAL training on R and have 0 knowledge of coding languages or programmes like R so please excuse me if I ask silly questions or don't understand something basic.
Also, I have tried to look at past topics/answers on this but I'm having a hard time relating the answers to my data so I apologise if this question has already been answered.
Basically I have a data set and I'm trying to find the mean of two variables (Peak flow before a walk in the cold, and peak flow after a walk in the cold) in this set. This is the entire code I've used so far:
drugs <- read.table(file = "C:\\Users\\Becky\\My Documents\\Asthmadata.txt", header = TRUE)
drugs
str(drugs)
mean.Asthmadata <- tapply (Asthmadata$trial1, list(Asthmadata$PEFR1), mean)
mean.Asthmadata
It works fine until the mean.Asthmadata. The data comes up in R just fine with the other codes but when I get to the mean and do the mean.Asthmadata [...] code, I keep getting the same error: "object 'mean.Asthmadata' not found"
My friend used the same code I did and it worked for him so I'm confused. Am I doing something wrong?
Thanks
EDIT:
@BenBolker
This is my data set
trial1 PEFR1 trial2 PEFR2
Before   310  After   299
Before   242  After   201
Before   340  After   232
Before   388  After   312
Before   294  After   221
Before   251  After   256
Before   391  After   327
Before   401  After   331
Before   287  After   231
And here's all the code I've used:
drugs <- read.table(file = "C:\\Users\\Becky\\My Documents\\Asthmadata.txt", header = TRUE)
drugs
str(drugs)
mean.drugs <- tapply (drugs$trial1, list(drugs$PEFR1), mean)
mean.drugs
The R version I have has two versions: i386 3.1.3, and x64 3.1.3 – I've tried both but neither seem to do what I want. I'm also using Windows 7 Home Premium 64bit. Hope I've included everything you need and I apologise if my formatting is off – I can't quite figure out how to format properly on here yet.
And the error I'm getting NOW is: “Error in split.default(X, group) : first argument must be a vector” when running the code Roland kindly provided. So I'm getting a different error each time I try it – it must be something I'm doing wrong.
Hope I've formatted that all correctly and included everything you need. Thanks :)
drugs <- read.table(header=TRUE,text="
trial1 PEFR1 trial2 PEFR2
Before 310 After 299
Before 242 After 201
Before 340 After 232
Before 388 After 312
Before 294 After 221
Before 251 After 256
Before 391 After 327
Before 401 After 331
Before 287 After 231")
In the current format you can calculate the mean before and after just by doing
mean(drugs$PEFR1)
and
mean(drugs$PEFR2)
What you may have had in mind was this shape:
drugs2 <- with(drugs,
               data.frame(trial=c(as.character(trial1),
                                  as.character(trial2)),
                          PEFR=c(PEFR1,PEFR2)))
(I used with() for convenience; it's a way to temporarily attach a data frame so you can refer directly to the variables therein.)
There's a bit of a pitfall in combining trial1 and trial2: if they are factors, they get coerced to their numeric codes, which are all 1s in both cases, unless you use as.character().
You also had the order of the variable to aggregate and the variable to group by backwards (you want to aggregate PEFR by trial, not the other way around):
mean.drugs <- with(drugs2,
                   tapply(PEFR, list(trial), mean))
##    After   Before
## 267.7778 322.6667

Read.CSV not working as expected in R

I am stumped. Normally, read.csv works as expected, but I have come across an issue where the behavior is unexpected. It most likely is user error on my part, but any help will be appreciated.
Here is the URL for the file
http://nces.ed.gov/ipeds/datacenter/data/SFA0910.zip
Here is my code to get the file, unzip, and read it in:
URL <- "http://nces.ed.gov/ipeds/datacenter/data/SFA0910.zip"
download.file(URL, destfile="temp.zip")
unzip("temp.zip")
tmp <- read.table("sfa0910.csv",
header=T, stringsAsFactors=F, sep=",", row.names=NULL)
Here is my problem. When I open the csv data in Excel, the data look as expected. When I read the data into R, the first column is actually named row.names. R is reading in one extra column of data, but I can't figure out where the "error" occurs that is causing row.names to be a column. Simply put, it looks like the data shifted over.
What is strange, however, is that the last column in R does appear to contain the proper data.
Here are a few rows from the first few columns:
tmp[1:5,1:7]
  row.names UNITID XSCUGRAD SCUGRAD XSCUGFFN SCUGFFN XSCUGFFP
1    100654      R     4496       R     1044       R       23
2    100663      R    10646       R     1496       R       14
3    100690      R      380       R        5       R        1
4    100706      R     6119       R      774       R       13
5    100724      R     4638       R     1209       R       26
Any thoughts on what I could be doing wrong?
My tip: use count.fields() as a quick diagnostic when delimited files do not behave as expected.
First, count the number of fields using table():
table(count.fields("sfa0910.csv", sep = ","))
#  451  452
#    1 6852
That tells you that all but one of the lines contains 452 fields. So which is the aberrant line?
which(count.fields("sfa0910.csv", sep = ",") != 452)
# [1] 1
The first line is the problem. On inspection, all lines except the first are terminated by 2 commas.
The question now is: what does that mean? Is there supposed to be an extra field in the header row which was omitted? Or were the 2 commas appended to the other lines in error? It may be best to contact whoever generated the data, if possible, to clarify the ambiguity.
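The same diagnostic can be tried on a tiny file to see the pattern in isolation. The sketch below writes a made-up CSV (the column names and values are assumptions, not the IPEDS data) whose data lines end in an extra comma while the header does not.

```r
# Made-up file: the header has 3 fields, each data line ends in an extra comma
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,flag,value",
             "1,R,10,",
             "2,R,20,",
             "3,R,30,"), csv_path)

# Number of fields on each line
n_fields <- count.fields(csv_path, sep = ",")
table(n_fields)

# Lines whose field count differs from the most common one
usual <- as.integer(names(which.max(table(n_fields))))
odd_lines <- which(n_fields != usual)
odd_lines
```

Here the header is the odd line out, with one field fewer than every data line, which is the situation in which read.table treats the first column as row names.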
I have a possible fix, based on mnel's comments:
dat<-readLines(paste("sfa", '0910', ".csv", sep=""))
ncommas<-sapply(seq_along(dat),function(x){sum(attributes(gregexpr(',',dat[x])[[1]])$match.length)})
> head(ncommas)
[1] 450 451 451 451 451 451
All lines after the first have an extra separator, which Excel ignores.
for(i in seq_along(dat)[-1]){
dat[i]<-gsub('(.*),','\\1',dat[i])
}
write(dat,'temp.csv')
tmp<-read.table('temp.csv',header=T, stringsAsFactors=F, sep=",")
> tmp[1:5,1:7]
  UNITID XSCUGRAD SCUGRAD XSCUGFFN SCUGFFN XSCUGFFP SCUGFFP
1 100654        R    4496        R    1044        R      23
2 100663        R   10646        R    1496        R      14
3 100690        R     380        R       5        R       1
4 100706        R    6119        R     774        R      13
5 100724        R    4638        R    1209        R      26
the moral of the story .... listen to Joshua Ulrich ;)
Quick fix: open the file in Excel and save it. This will also delete the extra separators.
Alternatively
dat<-readLines(paste("sfa", '0910', ".csv", sep=""),n=1)
dum.names<-unlist(strsplit(dat,','))
tmp <- read.table(paste("sfa", '0910', ".csv", sep=""),
header=F, stringsAsFactors=F,col.names=c(dum.names,'XXXX'),sep=",",skip=1)
tmp1<-tmp[,-dim(tmp)[2]]
I know you've found an answer, but since your answer helped me figure this out, I'll share:
If you read into R a file with a different number of columns in different rows, like this:
1,2,3,4,5
1,2,3,4
1,2,3
the missing columns are filled with NAs when it is read in, like this:
1,2,3,4,5
1,2,3,4,NA
1,2,3,NA,NA
BUT!
If the row with the most columns is not the first row, like this:
1,2,3,4
1,2,3,4,5
1,2,3
then it is read in a somewhat confusing way:
1,2,3,4
1,2,3,4
5,NA,NA,NA
1,2,3,NA
(overwhelming before you figure out the problem, and quite simple after!)
I just hope it helps someone!
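One way to avoid depending on which row happens to be widest, as a sketch: the read.table documentation notes that the number of columns is determined from the first five lines of input unless col.names is given and is longer, so passing col.names sized to the widest line fixes the column count up front. The file contents and the V1..V5 names below are made up for illustration.

```r
# Made-up ragged file: the widest row is not the first one
ragged_path <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,4",
             "1,2,3,4,5",
             "1,2,3"), ragged_path)

# Find the widest line, then tell read.table how many columns to expect
widest <- max(count.fields(ragged_path, sep = ","))
ragged <- read.table(ragged_path, sep = ",", header = FALSE, fill = TRUE,
                     col.names = paste0("V", seq_len(widest)))
ragged
```

Each input line then stays on its own row, with NA padding on the right where a line is short.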
If you are using local data, also make sure that it's in the right place. To be sure, put it in your working directory, for instance, which you can change via
setwd("C:/[User]/[MyFolder]")
directly in your R console.
