Sometimes I read posts where people use the print() function and I don't understand why it is used. Here for example in one answer the code is
print(fitted(m))
# 1 2 3 4 5 6 7 8
# 0.3668989 0.6083009 0.4677463 0.8685777 0.8047078 0.6116263 0.5688551 0.4909217
# 9 10
# 0.5583372 0.6540281
But using fitted(m) would give the same output. I know there are situations where we need print(), for example if we want to create plots inside of loops. But why is the print() function used in cases like the one above?
I guess that in many cases the use of print is just a bad/redundant habit. However, print has a couple of interesting options:
Data:
x <- rnorm(5)
y <- rpois(5, exp(x))
m <- glm(y ~ x, family="poisson")
m2 <- fitted(m)
m2
# 1 2 3 4 5
# 0.8268702 1.0523189 1.9105627 1.0776197 1.1326286
digits - sets the minimum number of significant digits to show
print(m2, digits = 3) # here the output happens to match round(m2, 3)
# 1 2 3 4 5
# 0.827 1.052 1.911 1.078 1.133
na.print - turns NA values into a specified value (very similar to zero.print argument)
m2[1] <- NA
print(m2, na.print = "Failed")
# 1 2 3 4 5
# Failed 1.052319 1.910563 1.077620 1.132629
max - limits the number of values printed
print(m2, max = 2) # similar to head(m2, 2)
# 1 2
# NA 1.052319
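A minimal sketch of how digits differs from round(): digits counts significant digits, while round() counts decimal places. The value 123.456 is made up for illustration, and capture.output() is used only so the printed text can be inspected.

```r
x <- 123.456  # illustrative value, not taken from the model above

# digits counts significant digits, so the decimals are dropped here
shown <- capture.output(print(x, digits = 3))

# round() counts decimal places, so all three decimals survive
rounded <- round(x, 3)

shown
rounded
```

For values below 1, such as the fitted values above, the two tend to look the same, which is probably why they are easy to confuse.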
I'm guessing, as I rarely use print myself:
using print() makes it obvious which lines of your code do printing and which ones do the actual stuff. It might make re-reading your code later easier.
using print() explicitly might make it easier to later refactor your code into a function; you just need to change the print into a return
programmers coming from a language with strict syntax might have a strong dislike of R's automatic printing feature
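For reference, a small sketch of where automatic printing stops working, which is the case the question mentions for loops. capture.output() is only used here to collect what would have appeared on the console.

```r
# At top level, a bare expression is auto-printed.
# Inside a for loop, the value of a bare expression is discarded.
silent <- capture.output(for (i in 1:3) i)

# With an explicit print(), each value is shown.
loud <- capture.output(for (i in 1:3) print(i))

length(silent)
loud
```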
When we want a sequence in R, we use either construction:
> 1:5
[1] 1 2 3 4 5
> seq(1,5)
[1] 1 2 3 4 5
This produces a sequence from start to stop (inclusive).
Is there a way to generate a sequence from start to stop (exclusive)? Like:
[1] 1 2 3 4
Also, I don't want to use a workaround like a minus operator, like:
seq(1,5-1)
This is because I would like to have statements in my code that are elegant and concise. In my real-world example the start and stop are not hardcoded integers but descriptive variable names. Using the variable_name - 1 construction just makes my script uglier and more difficult to read for a reviewer.
PS: The difference between this question and the one at "remove the last element of a vector" is that I am asking about sequence generation, while the former focuses on removing the last element of a vector.
Moreover, the answers provided here are different and relevant to my problem.
One possible solution would be
head(1:5, -1)
# [1] 1 2 3 4
or you could define your own function
seq_last_exclusive <- function(x) return(x[-length(x)])
seq_last_exclusive(1:5)
# [1] 1 2 3 4
We can use the following function
f <- function(start, stop, ...) {
  if(identical(start, stop)) {
    return(vector("integer", 0))
  }
  seq.int(from = start, to = stop - 1L, ...)
}
Test
f(1, 5)
# [1] 1 2 3 4
f(1, 1)
# integer(0)
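One edge case worth checking is start greater than stop: seq.int(from = 5, to = 0) counts down rather than returning an empty result. A possible variant, assuming whole-number inputs, builds the sequence from seq_len() instead. The name seq_excl is my own and not from any package.

```r
# Hypothetical helper: half-open sequence [start, stop)
seq_excl <- function(start, stop) {
  # seq_len(0) is empty, so start >= stop gives a zero-length result
  start + seq_len(max(0, stop - start)) - 1L
}

seq_excl(1, 5)
seq_excl(1, 1)
seq_excl(5, 1)
```

The subtraction is still there, but it is hidden inside the helper, so the calling code only sees descriptive variable names.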
I've read the other answers for issues related to the "promise already under evaluation" warning, but I am unable to see how they can help me avoid this problem.
Here I have a function that for one method, takes a default argument value that is a function of another value.
myfun <- function(x, ones = NULL) {
UseMethod("myfun")
}
myfun.list <- function(x, ones = NA) {
data.frame(x = x[[1]], ones)
}
ones <- function(x) {
rep(1, length(x))
}
So far, so good:
myfun(list(letters[1:5]))
## x ones
## 1 a NA
## 2 b NA
## 3 c NA
## 4 d NA
## 5 e NA
But when I define another method that sets the default for the ones argument as the function ones(x), I get an error:
myfun.character <- function(x, ones = ones(x)) {
myfun(as.list(x), ones)
}
myfun(letters[1:5])
## Error in data.frame(x = x[[1]], ones) :
## promise already under evaluation: recursive default argument reference or earlier problems?
For various reasons, I need to keep the argument name the same as the function name (for ones). How can I force evaluation of the argument within myfun.character? I also need this to work (which it does):
myfun(letters[1:5], 1:5)
## x ones
## 1 a 1
## 2 a 2
## 3 a 3
## 4 a 4
## 5 a 5
Thanks!
One would need to look deep into R's (notorious) environments to understand exactly where it tries to find ones. The problem is located in the way supplied and default arguments are evaluated within a function. You can see this link from the R manual and also an explanation here.
The easy solution is to tell R where to look for it. It will save you the hassle. In your case that's the global environment.
Changing method myfun.character to tell it to look for ones in the global environment:
myfun.character <- function(x, ones = get('ones', envir = globalenv())(x)) {
myfun(as.list(x), ones)
}
will be enough here.
Out:
myfun(letters[1:5])
# x ones
#1 a 1
#2 a 1
#3 a 1
#4 a 1
#5 a 1
myfun(letters[1:5], 1:5)
# x ones
#1 a 1
#2 a 2
#3 a 3
#4 a 4
#5 a 5
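A stripped-down sketch of the same error, without S3 dispatch, may make the cause easier to see. The default ones(x) is evaluated inside the function, where the name ones first resolves to the argument itself, which is still being evaluated. The functions bad and good and the alias ones_fn are hypothetical names used only for illustration; aliasing the function is another option besides get(), since the alias does not clash with the argument name.

```r
ones <- function(x) rep(1, length(x))

# The default refers to its own argument name, so forcing `ones`
# re-enters the promise that is currently being evaluated.
bad <- function(x, ones = ones(x)) ones
res <- tryCatch(bad(1:3), error = function(e) e)

# Alias under a different name: the argument can keep the name `ones`.
ones_fn <- ones
good <- function(x, ones = ones_fn(x)) ones

inherits(res, "error")
good(1:3)
good(1:3, 4:6)
```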
I am trying to run simulation scenarios which in turn should provide me with the best scenario for a given date, back tested a couple of months. The input for a specific scenario has 4 input variables with each of the variables being able to be in 5 states (625 permutations). The flow of the model is as follows:
Simulate 625 scenarios to get each of their profit
Rank each of the scenarios according to their profit
Repeat the process through a 1-day expanding window for the last 2 months starting on the 1st Dec 2015 - creating a time series of ranks for each of the 625 scenarios
The unfortunate result of this is 5 nested for loops, which can take extremely long to run. I had a look at the foreach package, but I am concerned about how the combining of the outputs will work in my scenario.
The current code that I am using works as follows, first I create the possible states of each of the inputs along with the window
a<-seq(as.Date("2015-12-01", "%Y-%m-%d"),as.Date(Sys.Date()-1, "%Y-%m-%d"),by="day")
#input variables
b<-seq(1,5,1)
c<-seq(1,5,1)
d<-seq(1,5,1)
e<-seq(1,5,1)
set.seed(3142)
tot_results<-NULL
Next the nested for loops proceed to run through the simulations for me.
for(i in 1:length(a))
{
  cat(paste0("\n","Current estimation date: ", a[i]),";iteration:",i," \n")
  #subset data for backtesting
  dataset_calc<-dataset[which(dataset$Date<=a[i]),]
  p=1
  results<-data.frame(rep(NA,625))
  for(j in 1:length(b))
  {
    for(k in 1:length(c))
    {
      for(l in 1:length(d))
      {
        for(m in 1:length(e))
        {
          if(i==1)
          {
            #create a unique ID to merge onto later
            unique_ID<-paste0(replicate(1, paste(sample(LETTERS, 5, replace=TRUE), collapse="")),round(runif(n=1,min=1,max=1000000)))
          }
          #Run profit calculation
          post_sim_results<-profit_calc(dataset_calc, param1=e[m],param2=d[l],param3=c[k],param4=b[j])
          #Extract the final profit amount
          profit<-round(post_sim_results[nrow(post_sim_results),],2)
          results[p,]<-data.frame(unique_ID,profit)
          p=p+1
        }
      }
    }
  }
  #extract the ranks for all scenarios
  rank<-rank(results$profit)
  #bind the ranks for the expanding window
  if(i==1)
  {
    tot_results<-data.frame(ID=results[,1],rank)
  }else{
    tot_results<-cbind(tot_results,rank)
  }
  suppressMessages(gc())
}
My biggest concern is the binding of the results given that the outer loop's actions are dependent on the output of the inner loops.
Any advice on how to proceed would be greatly appreciated.
So I think that you can vectorize most of this, which should give a big reduction in run time.
Currently, you use for-loops (5, to be exact) to create every combination of values, and then run the values one by one through profit_calc (a function that is not specified). Ideally, you'd just take all possible combinations in one go and push them through profit_calc in one single operation.
-- Rationale --
a <- 1:10
b <- 1:10
d <- rep(NA,10)
for (i in seq(a)) d[i] <- a[i] * b[i]
d
# [1] 1 4 9 16 25 36 49 64 81 100
Since * also works on vectors, we can rewrite this to:
a <- 1:10
b <- 1:10
d <- a*b
d
# [1] 1 4 9 16 25 36 49 64 81 100
While it may save us only one line of code, it actually reduces the problem from 10 steps to 1 step.
-- Application --
So how does that apply to your code? Well, given that we can vectorize profit_calc, you can basically generate a data frame where each row is every possible combination of your parameters. We can do this with expand.grid:
foo <- expand.grid(b,c,d,e)
head(foo)
# Var1 Var2 Var3 Var4
# 1 1 1 1 1
# 2 2 1 1 1
# 3 3 1 1 1
# 4 4 1 1 1
# 5 5 1 1 1
# 6 1 2 1 1
Let's say we have a formula (a - b) * (c + d). Then it would work like:
bar <- (foo[,1] - foo[,2]) * (foo[,3] + foo[,4])
head(bar)
# [1] 0 2 4 6 8 -2
So basically, try to find a way to replace for-loops with vectorized options. If you cannot vectorize something, try looking into the apply family instead, as that can also save you some time in most cases. If your code is running too slow, you'd ideally first see if you can write a more efficient script. Also, you may be interested in the microbenchmark package, or ?system.time.
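As a rough sketch of how the pieces could fit together, assuming profit_calc cannot be vectorized: build the 625 combinations once with expand.grid, call the function per row with mapply, and let sapply collect one column of ranks per date, which avoids the manual cbind. Here profit_stub is a made-up stand-in for profit_calc, and days stands in for the date window; neither comes from the question.

```r
# Hypothetical stand-in for profit_calc(); `day` mimics the window index
profit_stub <- function(day, param1, param2, param3, param4) {
  day * (param1 - param2) * (param3 + param4)
}

# All 5^4 = 625 parameter combinations, one per row
grid <- expand.grid(param1 = 1:5, param2 = 1:5, param3 = 1:5, param4 = 1:5)

days <- 1:3  # placeholder for the expanding date window

# One column of ranks per day; sapply binds the columns for us
ranks <- sapply(days, function(day) {
  profit <- mapply(profit_stub, day,
                   grid$param1, grid$param2, grid$param3, grid$param4)
  rank(profit)
})

dim(ranks)
```

Because each column depends only on its own day, the outer sapply is also the natural place to swap in a parallel loop later.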
I found out that zoo's transform is not able to use additional (i.e. not part of the zoo object) variables when used in a function body.
Let me explain:
I entered the following code at the prompt to create a small two-column zoo object z and to add a new column calculated from an existing column and a variable x:
library(zoo)
z <- zoo(matrix(1:10, ncol=2, dimnames=list(NULL, c("a", "b"))), order.by=2001:2005)
x <- 2
transform(z, c = x*a)
I got the desired result, a zoo object with a new column c. No problem here.
Now I'd like to use transform in a function body; the variable for the calculation is passed as a parameter to the function:
rm(x)
f <- function(data, x) { transform(data, c = x*a) }
f(z, 2)
This stops with Error in eval(expr, envir, enclos) (from #1) : object 'x' not found. If I assign x <- 2 at the prompt, it works (therefore the rm(x) above).
With dataframes (i.e. transform.data.frame) there is no problem.
I think that when transform.zoo calls transform.data.frame, the bindings of the formals of f are lost. I don't understand R's environments well enough to find out what exactly is wrong here.
Edited to add: Not only can transform not get the formals but also no variables from inside the function body.
Is there a way to make transform see x? (I know I could work without transform, but it's a nice tool for short, succinct code.)
I think the best advice is: Don't do that!
transform(), with(), subset() are really sugar for use at the top level, to make things somewhat easier to write data manipulation code. If you are writing functions you should use the general replacement functions [<- and [[<- depending on what you are doing.
If you don't believe me, see the Warning in ?transform
Warning:
This is a convenience function intended for use interactively.
For programming it is better to use the standard subsetting
arithmetic functions, and in particular the non-standard
evaluation of argument ‘transform’ can have unanticipated
consequences.
What I mean by using [<- or [ or other functions is to write f like this
f <- function(obj, x) {
cd <- coredata(obj)
cd <- cbind(cd, c = x * cd[, "a"])
zoo(cd, index(obj), attr(obj, "frequency"))
}
f(z, 2)
Which gives the desired result
> transform(z, c = x*a)
a b c
2001 1 6 2
2002 2 7 4
2003 3 8 6
2004 4 9 8
2005 5 10 10
> f(z, 2)
a b c
2001 1 6 2
2002 2 7 4
2003 3 8 6
2004 4 9 8
2005 5 10 10
f is complicated because coredata(obj) is a matrix. It might be neater to convert it to a data frame first:
f2 <- function(obj, x) {
cd <- as.data.frame(coredata(obj))
cd[, "c"] <- x * cd[, "a"] ## or cd$c <- x * cd$a
zoo(cd, index(obj), attr(obj, "frequency"))
}
f2(z, 2)
> f2(z, 2)
a b c
2001 1 6 2
2002 2 7 4
2003 3 8 6
2004 4 9 8
2005 5 10 10
You really need to understand environments and evaluation frames to use transform() inside a function. Well, with transform() itself you can't; you'd need to learn to use eval(), which is what transform() calls internally, and specify the correct values for envir (the environment in which to evaluate) and enclos (the enclosure). See ?eval.
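To give an idea of what that looks like, here is a minimal sketch using a plain data frame rather than a zoo object. The function name f_eval is hypothetical. The columns of data are searched first (envir), and anything not found there, such as x, is looked up in the function's own frame (enclos).

```r
f_eval <- function(data, x) {
  # envir = data: column `a` is found in the data frame
  # enclos = environment(): `x` is found in this function's frame
  data$c <- eval(quote(x * a), envir = data, enclos = environment())
  data
}

df <- data.frame(a = 1:5, b = 6:10)
f_eval(df, 2)
```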
I have a bunch of large dataframes, so every time I want to display them, I have to use head:
head( blahblah(somedata) )
Typing head all the time gets old after the first few hundred times, so I'd like an easy way to do this if possible. One of the cool things about R compared to Java is that things like this are often really easy, if you know the secret incantation.
I searched in options, and found max.print, which almost works, except there is now a time delay.
head( blahblah(somedata) )
.... is instantaneous (to within the limits of my perception)
options(max.print=100)
blahblah(somedata)
.... takes about 3 seconds, so longer than typing head
Is there some way of making head be applied automatically when printing large data structures?
A piece of code that reproduces this behavior:
long_dataset = data.frame(a = runif(10e5),
b = runif(10e5),
c = runif(10e5))
system.time(head(long_dataset))
options(max.print = 6)
system.time(print(long_dataset))
Putting my comment into an answer: using the data.table package (with data.table rather than data.frame objects) will automatically print only the first 5 and last 5 rows (once the data.table is larger than 100 rows).
library(data.table)
DT <- data.table(long_dataset)
DT
1: 0.19613138 0.88714284 0.25715067
2: 0.25405787 0.76544909 0.75632468
3: 0.24841384 0.22095875 0.52588596
4: 0.72766161 0.79696771 0.88802759
5: 0.02448372 0.77885568 0.38199993
---
999996: 0.28230967 0.09410921 0.84420162
999997: 0.73598931 0.86043537 0.30147089
999998: 0.86314546 0.90334347 0.08545391
999999: 0.85507851 0.46621131 0.23892566
1000000: 0.33172155 0.43060483 0.44173400
The data.table FAQ 2.11 deals with this explicitly.
EDIT to deal with existing data.frame objects you don't want to convert.
If you are hesitant about converting existing data.frame objects to data.table objects, you could simply define print.data.frame as data.table:::print.data.table
print.data.frame <- data.table:::print.data.table
long_dataset
1: 0.19613138 0.88714284 0.25715067
2: 0.25405787 0.76544909 0.75632468
3: 0.24841384 0.22095875 0.52588596
4: 0.72766161 0.79696771 0.88802759
5: 0.02448372 0.77885568 0.38199993
---
999996: 0.28230967 0.09410921 0.84420162
999997: 0.73598931 0.86043537 0.30147089
999998: 0.86314546 0.90334347 0.08545391
999999: 0.85507851 0.46621131 0.23892566
1000000: 0.33172155 0.43060483 0.44173400
I'd go along with @thelatemail's suggestion, i.e. redefine print.data.frame:
print.data.frame <- function(df, ...) {
  # print methods are called with extra arguments, so accept and pass on ...
  if (nrow(df) > 10) {
    base::print.data.frame(head(df, 5), ...)
    cat("----\n")
    base::print.data.frame(tail(df, 5), ...)
  } else {
    base::print.data.frame(df, ...)
  }
}
data.frame(x=1:100, y=1:100)
# x y
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# ----
# x y
# 96 96 96
# 97 97 97
# 98 98 98
# 99 99 99
# 100 100 100
A more elaborate version could line everything up together and avoid the repeated header, but you get the idea.
You could put such a function in your .Rprofile or Rprofile.site file (see ?Startup) so it will be there every time you start an R session.