R packages often contain multiple functions that create an object of some class, specified by the package, with generic or non-generic methods that apply to all objects of that class. Although it is generally easy to find out about the functions in a package, I have not found any equally straightforward way to find a precise description of the class itself for S3 classes. I think this is at least partly intentional. Class definitions may be regarded as the sort of internal workings that, on one hand, the user should not have to think about, and on the other, may be changed by the package creator, who wants people not to rely on them.
However, I find that I sometimes want to create additional objects of the same class that work with the package functions that are methods for that class. And it is not always easy to deduce what features an object must have in order to be usable by package functions that do various things to objects of that class, especially as instances created by different functions may or may not all have exactly the same structure.
The example with which I am currently wrestling is the forecast object created by various functions of the forecast package. The forecast package provides a large number of functions that take forecast objects as inputs. This blog post by Rob Hyndman describes a function to do cross validation and requires an object of class forecast as an argument. The tsCV function documentation says it takes a "forecastFunction" as an argument, which must return an object of class forecast and have a univariate time series as its first object (of forecasts, one assumes) and have an argument h giving the horizon. Well, that sounds easy enough. But then in Hyndman’s associated textbook, section 3.6, we are told that forecast objects contain information about the forecasting method, the data, the point forecasts, prediction intervals, residuals, and fitted values. That’s a lot of things, and I am not sure if they are all mandatory or if some are optional, or required only if you intend to use certain methods. And I don’t know anything about the mandatory internal structure of the class.
Finally, I particularly want to know if the new fable package, intended as a forecast package replacement, uses the same forecast class mechanism and requires the same internal structure, or if not, how they differ. I have not been able to find, in fpp3 or elsewhere, anything that either describes a change or contains a comparable description of objects of class forecast.
I’m going to be embarrassed if there is some simple function,
you_should_know_this_dummy(package = "forecast", class = "forecast"),
that returns a detailed description of the class. But I have looked for such a function every way I could think of and not found it.
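Short of such a function, base R does offer a few tools for inspecting an instance of an S3 class and listing the methods registered for it. The sketch below uses an lm object from the stats package purely as a stand-in; the same calls should apply to an object of any S3 class, but what they reveal is the structure of that one instance, not a guaranteed class specification.

```r
# Fit a small model so that there is an S3 object to inspect
fit <- lm(dist ~ speed, data = cars)

# The S3 class vector of the object
class(fit)

# The names of the list elements that the object carries
names(unclass(fit))

# A one-level overview of the type of each element
str(fit, max.level = 1)

# Methods registered for the class in the currently loaded packages
methods(class = "lm")
```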
O.K., my bad. I was trying so hard to find a way of locating the help file for the class description (which I don't think exists) that I overlooked the existence of a pretty good description of the class forecast under the function forecast() in the manual for the package forecast. Here it is:
An object of class "forecast" is a list usually containing at least the following elements:
model: A list containing information about the fitted model
method: The name of the forecasting method as a character string
mean: Point forecasts as a time series
lower: Lower limits for prediction intervals
upper: Upper limits for prediction intervals
level: The confidence values associated with the prediction intervals
x: The original time series (either object itself or the time series used to create the model stored as object).
residuals: Residuals from the fitted model. For models with additive errors, the residuals will be x minus the fitted values.
fitted: Fitted values (one-step forecasts)
This still leaves some questions unanswered, such as the format of the model element, and of the x element for multivariate models. But I am hoping that these are similar to those handed to or returned by, e.g., lm(). I think this gives me enough to get started and to hope for informative errors.
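As a starting point, here is a minimal sketch of a hand-built object that carries the nine elements named in the help page and has class "forecast". The constructor new_naive_forecast() is hypothetical, the forecasting rule is a plain naive (last value) forecast with normal-theory intervals, and the contents of the model element are my own guess. I have not verified which elements the forecast package's methods actually require, so treat this as something to experiment with rather than a specification.

```r
# Hypothetical constructor: a naive forecast wrapped in a list with the
# elements named in the forecast() help page. Base R only.
new_naive_forecast <- function(x, h = 5, level = c(80, 95)) {
  x <- as.ts(x)
  n <- length(x)
  freq <- frequency(x)
  # One-step fitted values of the naive method: the previous observation
  fitted_vals <- ts(c(NA, x[-n]), start = start(x), frequency = freq)
  res <- x - fitted_vals
  sigma <- sd(res, na.rm = TRUE)
  # Point forecasts repeat the last observation h times
  fc_start <- tsp(x)[2] + 1 / freq
  point <- ts(rep(x[n], h), start = fc_start, frequency = freq)
  # Interval half-widths (rows: horizons, columns: levels)
  half <- outer(sigma * sqrt(seq_len(h)), qnorm(0.5 + level / 200))
  lower <- ts(as.numeric(point) - half, start = fc_start, frequency = freq)
  upper <- ts(as.numeric(point) + half, start = fc_start, frequency = freq)
  colnames(lower) <- paste0(level, "%")
  colnames(upper) <- paste0(level, "%")
  structure(
    list(
      model = list(sigma = sigma),  # assumed content, format undocumented
      method = "Hand-built naive",
      mean = point,
      lower = lower,
      upper = upper,
      level = level,
      x = x,
      residuals = res,
      fitted = fitted_vals
    ),
    class = "forecast"
  )
}

y <- ts(c(12, 15, 14, 18, 17, 21, 20), start = 2000)
fc <- new_naive_forecast(y, h = 3)
names(fc)
inherits(fc, "forecast")
```

With the forecast package loaded, one could then try handing fc to its plotting or accuracy functions and see which elements they complain about.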
I still don't know if the fable package also uses objects of class forecast. The forecast package documents the forecast() function as a generic. The fable package does not document the generic, though it has a very similar list of functions that look like methods, e.g., forecast.whatever. If I figure out the answer, I'll post it here.
I am also looking for a number of other packages that provide time series forecasts of particular types. I'm hoping that they provide output similar enough that I can use the forecast/fable functions for display, cross-validation, and so forth. We'll see.
Related
I am learning to use the createDataPartition() function in package caret and do not understand what the y parameter does.
As I understand it, the list returned from the function contains the indices of the sampled rows rather than the values. Why bother picking y in this case?
If you go to the data splitting section of the main help pages for caret, you'll see the following:
The function createDataPartition can be used to create balanced splits of the data. If the y argument to this function is a factor, the random sampling occurs within each class and should preserve the overall class distribution of the data.
The rationale for choosing y is to be able to preserve an overall class distribution in the outcome more easily. As discussed here, there can be many problems with imbalanced classes in your training data.
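To make the idea concrete, here is a base R sketch of sampling within each class. It is not caret's implementation; stratified_idx() is a hypothetical helper that only illustrates why the outcome y is needed even though row indices are what comes back.

```r
set.seed(42)

# Toy outcome with a 90/10 class imbalance
y <- factor(rep(c("a", "b"), times = c(90, 10)))

# Hypothetical helper: sample a fraction p of the row indices within each class
stratified_idx <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(i) {
    i[sample.int(length(i), size = round(p * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_idx(y, p = 0.8)

# Class proportions in the full data and in the sampled rows
prop.table(table(y))
prop.table(table(y[train_idx]))
```

The function returns row indices, but it has to look at y to decide how many rows to draw from each class.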
'msts' is a class from the R forecast package developed by Rob Hyndman. I don't see an explicit list of models that can utilize multi seasonal time series in the help file or through searching SO. Does anyone know where I can find a list?
I understand that you can feed msts objects to any of the other models in the forecast package, but I specifically want to know which models will consider all seasonal patterns (for instance, ets/auto.arima will accept it, but will only consider one seasonal specification and ignore others).
I'm working on building an R package, and I've encountered a structural problem that I'm not sure how I should solve. I have several different distributions that I'd like to implement in my package (normal, student's t, etc.) and for each distribution I'll have several functions related to it. I will then have an additional function that uses these functions to execute some process, and so I'm trying to avoid having to define all of these functions with different names.
To be more clear, let me give a simple example. Let's say I want to write a simple package to do maximum likelihood estimation for several distributions. Ideally, I'd like to call an MLE function like:
MLE(data, distribution = "normal")
and then have the MLE function load all the related normal distribution functions that it needs. So, it may load density and gradDensity specific to the normal distribution and operate with these functions. However, if I call
MLE(data, distribution = "studentT")
then density and gradDensity are defined as different functions, now specific to the Student's t distribution.
My question is this: how can I appropriately define the density and gradDensity functions for each different distribution I'm interested in and load them when I need them? I've considered defining a new class for this package and having this object contain all the distribution functions I'd need, but this seems problematic because I want one of the functions in this object to be able to call another one of the functions in the object (for example, gradDensity may call density). I also considered defining separate environments for each distribution, but I wasn't sure if that was good practice. Ideally, I'd also like users to be able to define their own distribution and then use this package as well, but I'm having a hard time understanding how to appropriately construct this structure in R.
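One possible structure, sketched below under my own assumptions, is to represent each distribution as a list of closures built by a constructor function. Because the functions are created in the same enclosing environment, gradDensity can call density directly. The names make_normal(), make_studentT(), and distributions are hypothetical, the Student's t version fixes the degrees of freedom for simplicity, and gradDensity here is the gradient with respect to the location parameter only.

```r
# Hypothetical constructors: each returns a list of closures that share an
# environment, so gradDensity can call the density defined next to it.
make_normal <- function() {
  density <- function(x, mu, sigma) dnorm(x, mean = mu, sd = sigma)
  # Gradient of the density with respect to mu
  gradDensity <- function(x, mu, sigma) {
    density(x, mu, sigma) * (x - mu) / sigma^2
  }
  list(density = density, gradDensity = gradDensity)
}

make_studentT <- function(df = 5) {
  density <- function(x, mu, sigma) dt((x - mu) / sigma, df = df) / sigma
  # Gradient of the density with respect to mu
  gradDensity <- function(x, mu, sigma) {
    z <- (x - mu) / sigma
    density(x, mu, sigma) * (df + 1) * z / ((df + z^2) * sigma)
  }
  list(density = density, gradDensity = gradDensity)
}

# Registry of built-in distributions, looked up by name
distributions <- list(normal = make_normal(), studentT = make_studentT())

MLE <- function(data, distribution = "normal") {
  # Accept a registered name or a user-supplied list of functions
  dist_obj <- if (is.character(distribution)) {
    distributions[[distribution]]
  } else {
    distribution
  }
  stopifnot(is.function(dist_obj$density))
  # Optimise over (mu, log(sigma)) so that sigma stays positive
  negloglik <- function(par) {
    -sum(log(dist_obj$density(data, par[1], exp(par[2]))))
  }
  fit <- optim(c(median(data), log(mad(data))), negloglik, method = "BFGS")
  c(mu = fit$par[1], sigma = exp(fit$par[2]))
}

set.seed(1)
x <- rnorm(500, mean = 3, sd = 2)
MLE(x, distribution = "normal")
MLE(x, distribution = "studentT")
```

A user-defined distribution would then be any list with the same function names, passed directly as the distribution argument.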
I need to see the code for the summary function which is used in mirt package to see the factor loading matrix.
I have tried edit(summary), which returns this:
function (object, ...)
standardGeneric("summary")
which is not very helpful for me. How can I see the code for summary?
summary is a generic function. So you need to see the class-specific method for whatever type of object you're trying to summary-ize. E.g., see summary.lm (for lm objects). You don't specify the object class you're working with, so it's impossible to say what specific summary function you need to look at.
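For the S3 case, a short base R sketch of how to get at the class-specific method. It uses summary.lm from the stats package only as an example.

```r
# Retrieve the S3 method that summary() dispatches to for lm objects
m <- getS3method("summary", "lm")

# The method is an ordinary function; printing it shows its source
class(m)
head(deparse(m), 5)

# List the summary methods known in the current session
methods("summary")
```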
UPDATE: These are S4 generics, which is a bit more complicated than summary.lm, which is S3. Some quick googling reveals that you can see the relevant S4 methods on the package's GitHub page. The contents of summary will depend on what specific class you're looking at (and there appear to be four classes in the package):
Exploratory
Mixed
Confirmatory
MultipleGroup
This question and answer will also be helpful for addressing these kinds of questions in general in the future.
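S4 methods can also be retrieved from within R. The sketch below defines a toy S4 class (ToyModel is hypothetical and has nothing to do with mirt's classes) so that the calls can be run without the package. With mirt loaded, the same functions should work when given the class of your fitted object.

```r
library(methods)

# Toy S4 class and summary method, standing in for a package's classes
setClass("ToyModel", representation(loadings = "matrix"))
setMethod("summary", "ToyModel", function(object, ...) {
  round(object@loadings, 2)
})

# List the S4 methods defined for the generic
showMethods("summary")

# Retrieve the method for one class; printing it shows the code
m <- getMethod("summary", "ToyModel")
m

obj <- new("ToyModel", loadings = matrix(c(0.812, 0.455), ncol = 1))
summary(obj)
```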
Hi, I am trying out classification for an imbalanced dataset in R using the kernlab package. As the class distribution is not 1:1, I am using the class.weights option in the ksvm() function call. However, I do not see any difference in the classification results whether I add or remove the weights. So the question is: what is the correct syntax for declaring the class weights?
I am using the following function calls:
# this is the function call with class weights
model = ksvm(dummy[1:466], lab_tr, type='C-svc', kernel=pre, cross=10, C=10, prob.model=FALSE, class.weights=c("Negative"=0.7, "Positive"=0.3))
# and this is the same call without class weights
model = ksvm(dummy[1:466], lab_tr, type='C-svc', kernel=pre, cross=10, C=10, prob.model=FALSE)
Can anyone please comment on this? Am I following the right syntax for adding weights? I also discovered that if we use the weights with prob.model=TRUE, the ksvm function returns an error!
Your syntax is OK, but class weighting that has no visible effect is a fairly common problem in machine learning. Removing some objects from the bigger class is, in a way, the only method guaranteed to work. It may still increase the error, so it must be done intelligently: in an SVM the potential support vectors should have priority (which of course raises the question of how to locate them).
You may also try boosting the weights beyond the simple class-size ratio, say ten-fold, and check whether that helps even a little or, with luck, overshoots the imbalance to the other side.
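A small base R sketch of that suggestion: derive weights from the class sizes, then boost the minority class ten-fold. The labels here are a made-up stand-in for lab_tr with a 70/30 split, which is an assumption; the resulting named vector is the kind of value one would pass as class.weights in the ksvm() call from the question.

```r
# Toy labels standing in for lab_tr (the 70/30 split is an assumption)
labels <- factor(rep(c("Negative", "Positive"), times = c(70, 30)))
counts <- table(labels)

# Inverse-frequency weights: the rarer class gets the larger weight
w <- as.numeric(sum(counts) / (length(counts) * counts))
names(w) <- names(counts)
w

# Boost the minority class ten-fold beyond the simple size ratio
boosted <- w
minority <- names(which.min(counts))
boosted[minority] <- 10 * boosted[minority]
boosted
```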