Avoid string printed to console getting truncated (in RStudio)

I want to print a long string to the RStudio console so that it does not get truncated.
> paste(1:300, letters, collapse=" ")
[1] "1 a 2 b 3 c 4 d 5 e 6 f 7 g 8 h 9 i
...
181 y 182 z 183 a 184 b... <truncated>
I suppose this should be fairly simple, but I cannot figure out how. I tried
options(max.print = 10000)
and looked through the arguments on the print help pages. Still no luck.
What parameter or setting do I have to change to achieve this?

This is an RStudio-specific feature, intended to help with cases where printing overly long strings makes the IDE sluggish. (I believe it was added with the latest release, v0.99.896.)
You can opt out of this truncation by setting the "Limit length of lines displayed in the console to:" option to 0 (it is the final option in the dialog).
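Independently of that setting, one possible workaround (my own suggestion, not something the answer above relies on) is to break the string into short pieces before printing, so that no single console line is long enough to hit the limit. A minimal sketch with base R's strwrap(); the width of 80 is an arbitrary choice:

```r
# The long string from the question: a single element, well over a thousand characters
x <- paste(1:300, letters, collapse = " ")

# Break it at spaces into pieces shorter than 80 characters each
wrapped <- strwrap(x, width = 80)

# Print one piece per console line, without the [1] prefix or surrounding quotes
writeLines(wrapped)
```

Since strwrap() only breaks at whitespace, joining the pieces again with single spaces gives back the original string.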

getting < table of extent 0 > when using table() function to get table of frequency

Using RStudio from Anaconda, I am trying to generate a table of frequencies from a CSV file. When I run the code, instead of the expected table of frequencies, I get < table of extent 0 > as a result.
I tried running the same code in R (instead of RStudio) and it works as expected there. I am using RStudio from Anaconda, which has already caused me a few problems when reading code files, so I suspect the two might be linked.
Code:
sn <- read.csv("social_network.csv", header = T)
table(sn$Site)
File content > head(sn):
ID.Gender.Age.Site.Times
1 1;male;24;None;0
2 2;female;26;Facebook;20
3 3;male;54;Facebook;2
4 4;female;42;Facebook;7
5 5;male;54;None;
6 6;female;21;Facebook;3
Expected result:
Facebook LinkedIn MySpace None Other Twitter
93 3 22 70 11 3
Actual result:
< table of extent 0 >
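The column delimiter is not set correctly. The file is semicolon-separated, but read.csv() defaults to sep = ",", so every row is read into a single column named ID.Gender.Age.Site.Times (as the head(sn) output shows). There is then no Site column, sn$Site is NULL, and table(NULL) is a table of extent 0. Read the file with the correct delimiter:
sn <- read.csv('social_network.csv', header = TRUE, sep = ';')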
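The effect is easy to reproduce on a small scale. The sketch below writes a hypothetical stand-in for social_network.csv to a temporary file (three rows copied from the head(sn) output in the question; the file itself is made up) and reads it with and without sep = ";":

```r
# Hypothetical miniature of the semicolon-separated file
path <- tempfile(fileext = ".csv")
writeLines(c("ID;Gender;Age;Site;Times",
             "1;male;24;None;0",
             "2;female;26;Facebook;20",
             "3;male;54;Facebook;2"), path)

wrong <- read.csv(path, header = TRUE)             # default sep = ","
right <- read.csv(path, header = TRUE, sep = ";")  # correct delimiter

names(wrong)       # a single column; the semicolons in the header became dots
table(wrong$Site)  # wrong$Site is NULL, hence a table of extent 0
names(right)       # five separate columns
table(right$Site)  # frequencies per site
```

read.csv2() is another option here, since it uses sep = ";" by default, but note that it also treats "," as the decimal mark.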

Why do I have to do the "<-"? Can't I design my function to bypass that?

One of the things I don't like in R is the save process. Since I am always developing, I have large working environments, and I like to save a specific object frequently. One of the most annoying things to me is how complicated that can be. The object (one of up to 10 at a time) is a list of 10 to 20 data frames (ranging from rasterized images to medium and large data frames) that are all used in different ways by different functions, which can get very complex.
One thing I have not been able to figure out: when a function of mine changes that data, I would like it to save the changed object back to the directory automatically, instead of my having to do something like the following. Note that this is fine to do with a list of objects through a for loop, but I would like to do it for the object I pass into the function.
# obtain the name of the object you will be passing into
# the function, in character form
dat.name<-ls(pattern="dat")
# or select it from a list if there are multiple
dat.name<-select.list(ls(pattern="dat"))
# run the function and assign the result to a new name, just in case
# something doesn't work
tmp.dat<-cell.creator(dat)
# next assign the tmp to the real object (assign() needs the name as a string)
assign(dat.name, tmp.dat)
## or ## just do the straight-up rename if you are brave,
# and I am starting to get pretty brave with some of my functions
dat<-cell.creator(dat)
# put .rdata on the back to create a file name
# (paste0(), since paste() would insert a space before the extension)
file.name<-paste0(dat.name, ".rdata")
# then... FINALLY save it
save(dat, file=file.name)
What I really want to do is internalize those commands into the function, but (unless I am misunderstanding this) nothing stores the name my object had when it was passed in, unless I pass it in quotes, which doesn't allow me to use tab autocomplete in RGui. :(
So, let's say dat is
bob<-sample(seq(1,1000))
and my function sorts my object
bob.sorter<-function(dat){
  dat<-sort(dat)
  return(dat)}
So now when I input bob, I would like something to just go ahead and save bob
for me, basically doing the equivalent of
dat<-cell.creator(dat)
Am I missing something here?
I don't fully understand your question, but this seems to address part of it. The following is a function which takes an object assigned to a variable (e.g. bob) and automatically saves it to a file whose name is the variable name followed by .rdata (e.g. "bob.rdata"), without the need to actually type the file name:
qsave <- function(dat){
  dat.name <- deparse(substitute(dat))
  file.name <- paste0(dat.name,".rdata")
  save(list = dat.name, file=file.name)
}
To test it:
> bob <- islands
> qsave(bob)
> rm(bob) #bob is now gone
> load("bob.rdata") #qsave() wrote to the working directory; you can check that this restores bob
You can do this:
set.seed(1492) # reproducible science
bob <- sample(1:1000, 500) # the actual way sample() should be called
str(bob)
## int [1:500] 278 216 185 111 52 9 848 507 388 763 ...
bob_sorter <- function(dat) {
  dat <- dat[order(dat)] # actual sorting happening
  dat
}
str(bob_sorter(bob))
## int [1:500] 3 6 7 8 9 10 11 13 14 17 ...
bobs_silly_sorter <- function(dat) {
  passed_in_name <- as.character(substitute(dat)) # pls never do this
  dat <- dat[order(dat)]
  assign(passed_in_name, dat, envir=.GlobalEnv) # pls never do this
}
str(bob)
## int [1:500] 278 216 185 111 52 9 848 507 388 763 ...
bobs_silly_sorter(bob)
str(bob)
## int [1:500] 3 6 7 8 9 10 11 13 14 17 ...
It's horribad. Your future self will probably hate you for doing it. And anyone else who has to work with your code will also end up muttering obscenities under their breath at you every time you walk by them.
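On the question in the title: an R function works on its own copy of the argument, so changing dat inside the function never changes the caller's variable. That is why the result has to be assigned back with <- unless you resort to assign(). A middle ground, sketched here as one possible design rather than a recommendation from either answer, keeps the explicit <- but lets the function do the saving, deriving the file name from the argument name. The function name sort_and_save, the dir argument and the use of saveRDS() are all my own choices for illustration:

```r
# Sort the input, save the sorted copy to <dir>/<argument name>.rds, return it.
sort_and_save <- function(dat, dir = tempdir()) {
  # Capture the caller's variable name first, before dat is reassigned below
  name <- deparse(substitute(dat))
  dat <- sort(dat)
  saveRDS(dat, file.path(dir, paste0(name, ".rds")))
  dat
}

set.seed(1492)
bob <- sample(1:1000, 500)

# The caller still decides what happens to the result; nothing is overwritten silently
bob <- sort_and_save(bob)

# The saved copy can be read back under any name
restored <- readRDS(file.path(tempdir(), "bob.rds"))
identical(restored, bob)
```

Because the argument is still passed unquoted, tab completion keeps working, and the only side effect is the file on disk.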

Saving the rules printed by asRules(tree) in R

I have the following problem:
I created a decision tree in R with the rpart library, and since I have a long list of variables, the rules are an endless list.
The output of asRules(tree) from the rattle library is nicer than what I get by just printing tree once it is computed.
The problem is that the set of rules is longer than the number of lines the console keeps, so I can't copy them with Ctrl+C. When I save the result into a variable, for instance:
t <- asRules(tree)
I would expect something like
Rule number: 1 [target=0 cover=500 (4%) prob=0.8]
var1 < 10
var2 < 2
var3 >=45
var4 >=5
However, the result is
[1] 297 242 295 126 127 124
And obviously this isn't what I am looking for.
So I see 3 ways of solving this:
1. Increase the limit of printable lines accessible from the console (I don't know how to do that).
2. Print to the console with a key press to continue, so that I can copy and paste, then press the key to get the next results (I don't know how to do that either).
3. Save the whole set of rules into a txt file or something similar instead of [1] 297 242 295 126 127 124.
Guys, any help is very much appreciated!
Thank you!
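For #3: judging by the output in the question, asRules() prints the rules as a side effect and returns only the vector of rule numbers, so capture what is printed rather than what is returned. Use sink() to divert the console output to a file: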
sink(file='somefile.txt')
asRules(tree)
sink()
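If you would rather have the rules in an R variable as well as in a file, capture.output() is an alternative to sink(). The sketch below does not need rattle: print_rules() is a made-up stand-in that behaves the way asRules() appears to in the question (it prints with cat() and invisibly returns rule numbers), and the file location is arbitrary:

```r
# Hypothetical stand-in for asRules(tree): prints rules, invisibly returns rule numbers
print_rules <- function() {
  cat("Rule number: 1 [target=0 cover=500 (4%) prob=0.8]\n")
  cat("  var1 < 10\n  var2 < 2\n\n")
  invisible(c(297L, 242L))
}

# Collect everything the call prints as a character vector, one element per line
rules_text <- capture.output(print_rules())

# Write those lines to a text file
rules_file <- file.path(tempdir(), "rules.txt")
writeLines(rules_text, rules_file)
```

With the real function, the call would be capture.output(asRules(tree)), assuming it prints in the same way.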

Optimizing the sum of a variable in R given a constraint

Using the following dataset:
ID=c(1:24)
COST=c(85,109,90,104,107,87,99,95,82,112,105,89,101,93,111,83,113,81,97,97,91,103,86,108)
POINTS=c(113,96,111,85,94,105,105,95,107,88,113,100,96,89,89,93,100,92,109,90,101,114,112,109)
mydata=data.frame(ID,COST,POINTS)
I need an R function that will consider all combinations of rows where the sum of 'COST' is less than a fixed value (in this case, $500) and make the optimal selection based on the summed 'POINTS'.
Your help is appreciated.
Since this post is still open, I thought I would give my solution. These kinds of problems are always fun. You can try to brute-force the solution by checking all possible combinations (some 2^24, or over 16 million) one by one. For each combination, a row is either in it or not, so thinking in binary you could use the following code, which was inspired by this post:
#DO NOT RUN THIS CODE
sum_points<-numeric(2^24) # preallocate the result vector
for(i in 1:2^24)
  sum_points[i]<-ifelse(sum(as.numeric((intToBits(i)))[1:24] * mydata$COST) < 500,
                        sum(as.numeric((intToBits(i)))[1:24] * mydata$POINTS),
                        0)
I estimate this would take many hours to run. Improvements could be made with parallelization, etc., but this is still a rather intense calculation. The method also does not scale well, as adding one more row (25 IDs instead of 24) doubles the computation time. Another option is to cheat a little. For example, we know that we have to stay under $500. If we added up the n cheapest items, at what n would we definitely be over $500?
which(cumsum(sort(mydata$COST))>500)
[1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
So with any more than 5 IDs chosen we are definitely over $500. What else?
Well, we can run the brute-force code on just the first 10,000 combinations, restricted to those with fewer than 6 IDs, take the max, and see what that tells us.
sum_points<-1:10000
for(i in 1:10000)
  sum_points[i]<-ifelse(sum(as.numeric((intToBits(i)))[1:24]) <6,
                        ifelse(sum(as.numeric((intToBits(i)))[1:24] * mydata$COST) < 500,
                               sum(as.numeric((intToBits(i)))[1:24] * mydata$POINTS),
                               0),
                        0)
sum_points[which.max(sum_points)]
[1] 549
So we have to try to get over 549 points with the remaining 2^24 - 10000 choices. But:
which(cumsum(rev(sort(mydata$POINTS)))<549)
[1] 1 2 3 4
Even if we sum the 4 highest point values, we still don't beat 549, so there is no reason to search combinations of 4 or fewer IDs. The number of IDs to choose must therefore be greater than 4 but less than 6, which leaves exactly 5. Instead of looking at all 16 million choices, we can just look at all of the ways to pick 5 out of 24, which is 24 choose 5:
num<-1:choose(24,5)
combs<-combn(24,5)
sum_points<-1:length(num)
for(i in num)
  sum_points[i]<-ifelse(sum(mydata[combs[,i],]$COST) < 500,
                        sum(mydata[combs[,i],]$POINTS),
                        0)
which.max(sum_points)
[1] 2582
sum_points[2582]
[1] 563
We have a new max on the 2582nd iteration. To retrieve the IDs:
mydata[combs[,2582],]$ID
[1] 1 3 11 22 23
And to verify that nothing went wrong:
sum(mydata[combs[,2582],]$COST)
[1] 469 #less than 500
sum(mydata[combs[,2582],]$POINTS)
[1] 563 #what we expected.
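The same search can be written without the explicit per-combination loop. The following is a sketch of my own rather than the answer's code: it uses the bound of at most 5 items derived above, enumerates every combination of 1 to 5 rows with combn(), and computes the cost and points sums with colSums(). The names max_items and best are introduced purely for illustration:

```r
COST   <- c(85,109,90,104,107,87,99,95,82,112,105,89,101,93,111,83,113,81,97,97,91,103,86,108)
POINTS <- c(113,96,111,85,94,105,105,95,107,88,113,100,96,89,89,93,100,92,109,90,101,114,112,109)

# Upper bound on the selection size: how many of the cheapest items stay under $500
max_items <- sum(cumsum(sort(COST)) < 500)

best <- list(points = 0, ids = integer(0), cost = 0)
for (k in seq_len(max_items)) {
  combs  <- combn(length(COST), k)                  # one combination of row numbers per column
  cost   <- colSums(matrix(COST[combs],   nrow = k))
  points <- colSums(matrix(POINTS[combs], nrow = k))
  points[cost >= 500] <- 0                          # not under budget: disqualify
  j <- which.max(points)
  if (points[j] > best$points) {
    best <- list(points = points[j], ids = combs[, j], cost = cost[j])
  }
}
best
```

This agrees with the result above: IDs 1, 3, 11, 22 and 23, for 563 points at a cost of $469. Those happen to be the five rows with the highest POINTS, which is why no cheaper trade-off can beat them.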

R readHTMLTable() function error

I'm running into a problem when trying to use the readHTMLTable function in the R package XML. When running
library(XML)
baseurl <- "http://www.pro-football-reference.com/teams/"
team <- "nwe"
year <- 2011
theurl <- paste(baseurl,team,"/",year,".htm",sep="")
readurl <- getURL(theurl)
readtable <- readHTMLTable(readurl)
I get the error message:
Error in names(ans) = header :
'names' attribute [27] must be the same length as the vector [21]
I'm running 64-bit R 2.15.1 through RStudio 0.96.330. It seems there are several other questions that have been asked about the readHTMLTable() function, but none addresses this specific problem. Does anyone know what's going on?
When readHTMLTable() complains about the 'names' attribute, it's a good bet that it's having trouble matching the data with what it's parsed for header values. The simplest way around this is to simply turn off header parsing entirely:
table.list <- readHTMLTable(theurl, header=F)
Note that I changed the name of the return value from "readtable" to "table.list". (I also skipped the getURL() call since 1. it didn't work for me and 2. readHTMLTable() knows how to handle URLs). The reason for the change is that, without further direction, readHTMLTable() will hunt down and parse every HTML table it can find on the given page, returning a list containing a data.frame for each.
The page you pointed it at is fairly rich, with 8 separate tables:
> length(table.list)
[1] 8
If you are only interested in a single table on the page, you can use the which argument to specify it and receive its contents as a data.frame directly.
This could also cure your original problem if it had choked on a table you're not interested in. Many pages still use tables for navigation, search boxes, etc., so it's worth taking a look at the page first.
But this is unlikely to be the case in your example, since it actually choked on all but one of them. In the unlikely event that the stars aligned and you were only interested in the successfully parsed third table on the page (passing statistics), you could grab it like this, keeping header parsing on:
> passing.df = readHTMLTable(theurl, which=3)
> print(passing.df)
No. Age Pos G GS QBrec Cmp Att Cmp% Yds TD TD% Int Int% Lng Y/A AY/A Y/C Y/G Rate Sk Yds NY/A ANY/A Sk% 4QC GWD
1 12 Tom Brady* 34 QB 16 16 13-3-0 401 611 65.6 5235 39 6.4 12 2.0 99 8.6 9.0 13.1 327.2 105.6 32 173 7.9 8.2 5.0 2 3
2 8 Brian Hoyer 26 3 0 1 1 100.0 22 0 0.0 0 0.0 22 22.0 22.0 22.0 7.3 118.7 0 0 22.0 22.0 0.0
