I am trying to develop my first package in R and I am facing some issues with "myclass" generic functions that I will try to describe.
Assume a data.frame X with n <- nrow(X) rows and K <- ncol(X) columns.
My main package function (too big to include in this post), let's say
fun1 <- function(X){
# do stuff...
out <- list(index = index, A = A, B = B) # ...etc; index is a character vector
class(out) <- "myclass" # set the class before returning, otherwise it never takes effect
return(out)
}
returns a list as output. Then I want to use that output with the generic print method via a print.myclass function. However, inside my print function I want to use the data frame X that was passed to the main function, without asking the user to provide it as an argument (i.e., print(out, X)) and without having it in my output list out (at least not visibly to the user). Is there any way to do that? Thanks in advance!
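One minimal sketch of a possible approach (assuming it fits your design) is to attach X to the returned object as an attribute, so print.myclass can retrieve it without X being an element of the list:
fun1 <- function(X){
# do stuff...
out <- list(index = index, A = A, B = B) # ...etc
attr(out, "data") <- X   # "data" is just an illustrative attribute name
class(out) <- "myclass"
out
}
print.myclass <- function(x, ...){
X <- attr(x, "data")     # recover the data frame inside the print method
# ... print using X and the elements of x ...
invisible(x)
}
The attribute travels with the object but does not appear as a list element when the user inspects out.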
I'm sure the question is a bit dumb (sorry)... I'm trying to create a function using different variables I have stored in a data frame. The function looks like this:
mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o){
Coag = (+0.032690 + 0.090289*Cond_in + 0.003229*Flow_in - 0.021980*pH_in - 0.037486*pH_out
+0.016031*Turb_in -0.026006*nm250_i +0.093138*nm400_o - 0.397858*nm250_o - 0.109392*nm400_o)/0.167304
return(Coag)
}
m4_turb <- mlr_turb(dataset)
The problem is when I try to run my function on a data frame (whose columns have the same names as the arguments). It doesn't detect my variables and shows this message:
Error in mlr_turb(dataset) :
argument "Flow_in" is missing, with no default
But the variable is actually there, and so are all the others.
I think I have misplaced something or am missing some piece in the function that would let it take the variables from the dataset. I have searched a lot about this but I have not found any answer...
No dumb questions!
I think you're looking for do.call. This function allows you to unpack values into a function as arguments. Here's a really simple example.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15))
# unpack the values into the function using do.call
do.call('myFun', myData)
Output:
[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
You have run into a standard problem when writing R code, related to the question of standard evaluation (SE) vs non-standard evaluation (NSE). If you want more background, you can have a look at this blog post I wrote.
I think the most convenient way to write such a function is to pass the column names as arguments of the function.
Let's take @Muon's example again.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
The question is where R should find the values behind the names x, y and z. Inside a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the global environment, and then in the attached packages.
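A tiny illustration of that lookup order (myFun2 and the global z are names used only for this example):
z <- 10                    # defined in the global environment
myFun2 <- function(x, y){  # z is deliberately not a parameter here
(x + y)/z                  # z is not found inside the function, so R uses the global z
}
myFun2(1, 2)               # 0.3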
In myFun, R expects vectors. If you pass a column name, you will get an error. So what if you do want to pass column names? You must tell R that the names you give refer to columns within a data frame. You can for instance do something like this:
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- (df[,col1] + df[,col2])/df[,col3]
return(result)
}
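For example, with the myData data frame from the earlier answer:
myFun(myData)                  # uses the default column names "x", "y" and "z"
myFun(myData, "x", "y", "z")   # or pass the column names explicitly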
You can go much further in this direction with the data.table package. If you start writing functions that need to use variables from a data frame, I recommend having a look at that package.
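For instance, a minimal sketch of the same computation with data.table (assuming the package is installed):
library(data.table)
myDT <- as.data.table(myData)
myDT[, (x + y)/z]   # column names are evaluated inside the data.table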
I like Muon's answer, but I couldn't get it to work when the data.frame contains columns that are not arguments of the function. Using the with() function is a simple way to make this work as well...
#Code from Muon:
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15),
a=6:10) #adding a var not used in myFun
# unpack the values into the function using do.call
do.call('myFun', myData)
#generates an error for the unused "a" column
#using with() function:
with(myData, myFun(x, y, z))
(I hope that this question hasn't been asked before).
For convenience I am using abbreviations for functions, like "cn" instead of "colnames". However, for colnames/rownames the abbreviated functions only work for reading. I am not able to set column names with that new "cn" function. Can anyone explain the black magic behind the colnames function? Here is the example:
cn <- match.fun(colnames)
x <- matrix(1:2)
colnames(x) <- "a" # OK, works.
cn(x) <- "b" # Error in cn(x) <- "b" : could not find function "cn<-"
Thank you, echasnovski, for the link to that great website.
It has helped me a lot to better understand R!
http://adv-r.had.co.nz/Functions.html#replacement-functions
In R, special "replacement functions" like foo<- can be defined. E.g. we can define a function
`setSecondElement<-` <- function(x, value){
x[2] <- value
return(x)
}
# Let's try it:
x <- 1:3
setSecondElement(x) <- 100
print(x)
# [1] 1 100 3
The colnames<- function works essentially the same. However, "behind the scenes" it will check if x is a data.frame or matrix and set either names(x) or dimnames(x)[[2]]. Just execute the following line in R and you'll see the underlying routine.
print( `colnames<-` )
For my specific problem the solution turns out to be very simple. Remember that I'd like to have a shorter version of colnames which shall be called cn. I can either do it like this:
cn <- match.fun(colnames);
`cn<-` <- function(x, value){
colnames(x) <- value
return(x)
}
More easily, as Stéphane Laurent points out, the definition of `cn<-` can be simplified to:
`cn<-` <- `colnames<-`
There is a minor difference between these approaches. The first approach defines a new function, which in turn calls the colnames<- function. The second approach copies the reference to the colnames<- function, so exactly the same function is called even when you use cn<-. This approach is slightly more efficient, since one additional function call is avoided.
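Either way, continuing the original example, the replacement form now works:
x <- matrix(1:2)
cn(x) <- "b"   # now dispatches to `cn<-`
cn(x)
# [1] "b"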
I want to calculate the log returns of my data. I define a function and load the data, but R keeps saying the second argument is missing; otherwise it just calculates the log of the row numbers.
#read data
data <- read.csv(file="E:/Lect-1-TradingTS.csv",header=TRUE)
mode(data)
p<-data["Price"]
#func1
func1 <- function(x1,x2)
{
result <- log(x2)-log(x1)
return(result)
}
#calculate log return
log_return<-vector(mode="numeric", length=(nrow(data)-1))
for(i in 2:nrow(p))
{
log_return[i-1] <- func1(p[(i-1):i])
}
Error in func1(p[(i - 1):i]) : argument "x2" is missing, with no default
Your function func1 was defined to accept two arguments, but you are passing it a single argument: the vector p[(i-1):i], which has two elements but is still considered a single object. To fix this you need to pass two separate arguments, p[i-1] and p[i]. Alternatively, modify the definition of func1 to accept a two-element vector:
func1 <- function(v)
{
x1 <- v[1]
x2 <- v[2]
result <- log(x2)-log(x1)
return(result)
}
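With this version, the original loop call works as written, assuming Price is first extracted as a plain numeric vector rather than a one-column data frame:
p <- data[["Price"]]               # extract Price as a plain numeric vector
log_return <- vector(mode = "numeric", length = length(p) - 1)
for(i in 2:length(p)){
log_return[i - 1] <- func1(p[(i - 1):i])   # one two-element vector argument
}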
Thank you guys, all your answers inspired me. I think I found a solution:
log_return[i-1] <- func1(p[(i-1),"Price"],p[(i),"Price"])
Basically, you do not need a function for those calculations in R.
R's vectorization comes in handy in these cases:
data <- read.csv(file="E:/Lect-1-TradingTS.csv",header=TRUE)
mode(data)
p <- data[["Price"]]
logrets <- log(p[2:length(p)]) - log(p[1:(length(p) - 1)])
This vectorized computation will usually also heavily outperform any function you define "by hand".
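Equivalently, base R's diff() computes the same log returns in one step:
logrets <- diff(log(p))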
I would like to monitor the progress of my mapply function. The data consists of 2 lists and there is a function with 2 arguments.
If I do something similar with a function that takes one argument, I can use ldply instead of lapply (I'd like to rbind.fill the output into a data.frame).
If I want to do the same with mdply it doesn't work, because the function in mdply expects values taken from the columns of a data frame or array, whereas mapply takes lists as input.
These plyr apply functions are handy, not just because I can get the output as a data.frame but also because I can use the progress bar.
I know there is the pbapply package, but it has no mapply version, and there is the txtProgressBar function, but I could not figure out how to use it with mapply.
I tried to create a reproducible example (takes around 30 s to run)
I guess it's a bad example: my real l1 is a list of scraped websites (rvest::read_html), which I cannot put into a data frame for mdply. The lists really need to be lists.
mdply <- plyr::mdply
l1 <- as.list(rep("a", 2*10^6+1))
l2 <- as.list(rnorm(-10^6:10^6))
my_func <- function(x, y) {
ab <- paste(x, "b", sep = "_")
ab2 <- paste0(ab, exp(y), sep = "__")
return(ab2)
}
mapply(my_func, x = l1, y = l2)
mdply doesn't work:
mdply(l1, l2, my_func, .progress='text')
Error in do.call(flat, c(args, list(...))) : 'what' must be a function or character string
From ?mdply I dare say you can't specify two data inputs. Your error message means mdply is trying to use l2 as a function, but a list cannot be coerced into a function...
The following works fine
mdply(
data.frame(x=unlist(l1), y=unlist(l2)), # create a data.frame from l1 and l2
my_func, # your function
.progress=plyr::progress_text(style = 3) # create a textual progress bar
)[, 3] # keep the output only
I think I've understood your purpose now:
mdply(
.data=data.frame(r=1:length(l1)), # "fake data" (I will use them as item index)
.fun=function(r) return(my_func(l1[[r]], l2[[r]])), # a wrapper function of your function
.progress=plyr::progress_text(style = 3) # create a textual progress bar
)[, 2] # keep the output only
Please note that I had to wrap your function in a new one that takes just a single argument (an index) and uses it to access the corresponding elements of l1 and l2.
Answering my own question. There is now a function called pbmapply in pbapply that adds progress bars to mapply.
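For example, mirroring the mapply call above (assuming a recent version of pbapply is installed):
library(pbapply)
res <- pbmapply(my_func, l1, l2)   # same interface as mapply, with a progress bar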
I want to create an S4 class in R that will allow me to access large datasets (in chunks) from the cloud (similar to the goals of the ff package). Right now I'm working with a toy example called "range.vec" (I don't want to deal with internet access yet), which stores a sequence of numbers like so:
setClass("range.vec",
representation(start = "numeric", #beginning num in sequence
end = "numeric", #last num in sequence
step = "numeric", #step size
chunk = "numeric", #cache a chunk here to save memory
chunkpos = "numeric"), #where does the chunk start in the overall vec
contains="numeric" #inherits methods from numeric
)
I want this class to inherit the methods from "numeric", but I want it to use these methods on the whole vector, not just the chunk that I'm storing. For example, I don't want to define my own method for 'mean', but I want 'mean' to get the mean of the whole vector by accessing it chunk by chunk, using length(), '[', '[[', and el() functions that I've defined. I've also defined a chunking function:
setGeneric("set.chunk", function(x,...) standardGeneric("set.chunk"))
setMethod("set.chunk", signature(x = "range.vec"),
function (x, chunksize=100, chunkpos=1) {
#This function extracts a chunk of data from the range.vec object.
begin <- x@start + (chunkpos - 1)*x@step
end <- x@start + (chunkpos + chunksize - 2)*x@step
data <- seq(begin, end, x@step) #calculate values in data chunk
#get rid of out-of-bounds values
data[data > x@end] <- NA
x@chunk <- data
x@chunkpos <- chunkpos
return(x)
})
When I try to call a method like 'mean', the function inherits correctly, and accesses my length function, but returns NA because I don't have any data stored in the .Data slot. Is there a way that I can use the .Data slot to point to my chunking function, or to tell the class to chunk numeric methods without defining every single method myself? I'm trying to avoid coding in C if I can. Any advice would be very helpful!
You could remove your chunk slot and replace it with numeric's .Data slot.
A little example:
## class definition
setClass("foo", representation(bar="numeric"), contains="numeric")
setGeneric("set.chunk", function(x, y, z) standardGeneric("set.chunk"))
setMethod("set.chunk",
signature(x="foo", y="numeric", z="numeric"),
function(x, y, z) {
## instead of x@chunk you could use numeric's .Data slot
x@.Data <- y
x@bar <- z
return(x)
})
a <- new("foo")
a <- set.chunk(a, 1:10, 4)
mean(a) # 5.5
Looks like there isn't a good way to do this within the class. The only solution I've found is to tell the user to loop through all of the chunks of data from the cloud and calculate as they go.