Creating a data frame in R - r

I am using some simple forecasting methods, such as Naive, for my project. I am using accuracy(naive_train, test) to measure the accuracy of these methods. Right now I have this as my output of the meanf method.
However, I want to create a data frame like this to compare the different methods.
How can I make a data frame like this and add an extra column on the right indicating which method each row belongs to? Thank you!
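One possible way to build such a comparison table is sketched below. It is not taken from the original post: the simulated series, the hand-rolled acc() helper (a stand-in for accuracy()), and the column names are all assumptions made so the example runs in base R without extra packages.

```r
# Simulated series, split into training and test parts (made-up data)
set.seed(1)
y <- cumsum(rnorm(60))
train <- y[1:48]
test <- y[49:60]

# Hypothetical stand-in for accuracy(): one row of error measures
acc <- function(pred, actual) {
  err <- actual - pred
  data.frame(ME = mean(err), RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}

# Point forecasts for the test period from two simple methods
forecasts <- list(
  Naive = rep(train[length(train)], length(test)),  # last observed value
  Mean = rep(mean(train), length(test))             # training mean
)

# One row per method, with a Method column appended on the right
comparison <- do.call(rbind, lapply(names(forecasts), function(m) {
  cbind(acc(forecasts[[m]], test), Method = m)
}))
print(comparison)
```

If accuracy() returns a matrix, as it appears to in the forecast package, then wrapping its result in as.data.frame() and adding the Method column in the same way should give a similar table.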

Related

How to extract variables from tab_model in R to create new data frame?

Example output of tab_model
I have created a table with tab_model that includes multiple models, and I wish to extract all 'p-values' and 'Estimates/Odds Ratios' to create a data frame that includes these. The output of tab_model is an HTML file. I am unable to find a function to pull this information out. Any ideas on how I could do this?
For example, I want to retrieve all p-values and Estimates for the variable 'age' in all of my models. There are only 3 in the example image, but I have hundreds.
You should get these values from the regression models themselves, instead of outputting them to an HTML table and then extracting them from there.
Without further knowledge of your process and data it is difficult to provide a more concrete answer.
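As a rough illustration of pulling the values from the models themselves, here is a minimal sketch. It assumes logistic regressions fitted with glm() and kept in a named list; the data, the model names m1 and m2, and the output column names are all made up for the example.

```r
# Simulated data with two binary outcomes (assumption: logistic models)
set.seed(7)
d <- data.frame(age = rnorm(200, 50, 10), sex = rbinom(200, 1, 0.5))
d$y1 <- rbinom(200, 1, plogis(-2 + 0.04 * d$age))
d$y2 <- rbinom(200, 1, plogis(-1 + 0.02 * d$age))

# Keep the fitted models in a named list
models <- list(
  m1 = glm(y1 ~ age + sex, data = d, family = binomial),
  m2 = glm(y2 ~ age + sex, data = d, family = binomial)
)

# For each model, read the 'age' row of the coefficient table
age_results <- do.call(rbind, lapply(names(models), function(nm) {
  co <- coef(summary(models[[nm]]))
  data.frame(
    model = nm,
    odds_ratio = exp(co["age", "Estimate"]),  # exponentiated log-odds
    p_value = co["age", "Pr(>|z|)"]
  )
}))
print(age_results)
```

The same loop should scale to hundreds of models as long as they all sit in one list and each contains a term named age.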

Partial correlation in R without correlating everything

I need help figuring out how to write the R code for a partial correlation, I'm still fairly new to R. I have a dataset with 22 columns of interval data. I'm trying to run a partial correlation of columns 19:22 with columns 2:18, controlling for/partialling out column 1. I've used the following code:
par.r=partial.r(dataset, c(2:22), c(1))
The problem is that this gives me everything correlated together, which isn't what I'm looking for. If this were a standard correlation I would use the code below:
normal.correlation = corr.test(dataset[,c(2:18)], dataset[,c(19:22)], method="spearman")
My question is how do I run a partial correlation with my variables without correlating everything? Thanks for any help you can provide.
Is it because of a timing issue that you don't want to correlate everything? If not then you could use the following to extract the relevant part of the (full) partial correlation matrix
par.r=partial.r(dataset, c(2:22), c(1))
par.r[1:17,18:21]
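If computing the full matrix is the concern, the block of interest can also be obtained directly. The sketch below is an alternative to partial.r, not the method from the answer: it uses only base R, relies on the fact that a partial correlation with one control variable equals the Pearson correlation of the residuals after regressing each variable on that control, and runs on a simulated 22-column dataset.

```r
# Simulated stand-in for the 22-column dataset (made-up data)
set.seed(1)
dataset <- as.data.frame(matrix(rnorm(50 * 22), ncol = 22))

# Remove the linear effect of column 1 from each of columns 2:22
res <- sapply(dataset[, 2:22], function(v) resid(lm(v ~ dataset[[1]])))

# Correlate only the two blocks of interest:
# residual columns 1:17 are original columns 2:18,
# residual columns 18:21 are original columns 19:22
par_block <- cor(res[, 1:17], res[, 18:21])
dim(par_block)
```

Note that this gives Pearson partial correlations; a Spearman version would need the data ranked first.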

How to interpret the values in auto arima plot and store it in a dataframe

I want to apply forecasting to my data. I have used the auto arima method and got a graph.
The following is my code:
fit <- auto.arima(a)
LH.pred <- forecast(fit,h=30)
plot(LH.pred)
I want to read the graph as values and store them in a data frame, so that I can make calculations based on the forecast.
Can anybody let me know how to take the values from the graph and store them in a data frame?
Also, when I used the auto arima method, the days got converted to a count of days since 1-1-1970. I want to convert them back to normal dates. Can anybody please help with that too?
Thanks
Observer
Taking the values from the graph is not really necessary. The graph consists of two parts. The first one is the time series 'a' used to build 'fit'. It is still stored in 'fit' as 'fit$x'. The second part is the forecast. You can take it from 'LH.pred' using 'as.data.frame(LH.pred)'.
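For the date part of the question, a minimal sketch follows. To stay runnable without the forecast package it uses base R's arima() and predict() as a stand-in for auto.arima() and forecast(); the series, the start date, and the column names are assumptions.

```r
# Hypothetical daily series with an assumed start date
start_date <- as.Date("2020-01-01")
set.seed(42)
a <- ts(cumsum(rnorm(100)))

# Base-R stand-in for auto.arima(a); the model order is fixed by hand
fit <- arima(a, order = c(1, 1, 0))
pred <- predict(fit, n.ahead = 30)

# Store the forecasts with proper dates for the 30 days after the series
out <- data.frame(
  date = start_date + length(a) + 0:29,
  forecast = as.numeric(pred$pred),
  se = as.numeric(pred$se)
)
head(out)

# A day count since 1970-01-01 can be turned back into a Date like this
as.Date(18262, origin = "1970-01-01")  # 2020-01-01
```

With the forecast package, the data frame from as.data.frame(LH.pred) could be given a date column in the same way.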

Trying to apply formula to each column in R, how to feed data to formula?

So I'm trying to apply an exponential smoothing model to each column in a data frame called 'cities'. I used apply to pass in the data frame and go column by column, and I expected it to run the model on each one. However, when I try to do so, it tells me that I need to specify data for the exponential smoothing model. I thought I already had by putting it in the apply call.
apply(x=cities,2,FUN=HoltWinters(x=x,gamma=FALSE))
Also, eventually I'd like to predict the next 4 periods using the HW model developed using forecast.predict. Do I need to use a different loop or can I combine it all in this one?
FUN takes a function, but you're trying to give it the output of a function.
Try this:
apply(cities, 2, FUN=function(x) HoltWinters(x=x,gamma=FALSE))
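On the follow-up about predicting the next 4 periods, one option is to keep the fitted models in a list and call predict() on each. This is only a sketch: the cities data frame here is simulated, and it uses lapply over the columns rather than apply.

```r
# Simulated stand-in for the 'cities' data frame (made-up data)
set.seed(1)
cities <- data.frame(
  A = cumsum(rnorm(40, mean = 1)),
  B = cumsum(rnorm(40, mean = 2))
)

# Fit one Holt-Winters model (no seasonal component) per column
fits <- lapply(cities, function(x) HoltWinters(x, gamma = FALSE))

# Predict the next 4 periods from each fitted model;
# the result is a matrix with one column per city
preds <- sapply(fits, function(fit) as.numeric(predict(fit, n.ahead = 4)))
print(preds)
```

Keeping fits as a list means the models stay available for inspection, and the prediction step does not need a second pass over the raw data.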

Generating variable names for dataframes based on the loop number in a loop in R

I am working on developing and optimizing a linear model using the lm() function and subsequently the step() function for optimization. I have added a variable to my dataframe by using a random generator of 0s and 1s (50% chance each). I use this variable to subset the dataframe into a training set and a validation set. If a record is not assigned to the training set, it is assigned to the validation set. By using these subsets I am able to estimate how good the fit of the model is (by using the predict function for the records in the validation set and comparing them to the original values). I am interested in the coefficients of the optimized model and in the results of the KS-test between the distributions of the predicted and actual results.
All of my code was working fine, but when I wanted to test whether my model is sensitive to the subset that I chose I ran into some problems. To do this I wanted to create a for (i in 1:10) loop, each time using a different random subset. This turned out to be quite a challenge for me (I have never used a for loop in R before).
Here's the problem (well actually there are many problems, but here is one of them):
I would like to have separate dataframes for each run in the loop with a unique name (for example: Run1, Run2, Run3). I have been able to create a variable with different strings using paste("Run",1:10,sep=""), but that just gives you a list of strings. How do I use these strings as names for my (subsetted) dataframes?
Another problem that I expect to encounter:
Subsequently I want to use the fitted coefficients for each run and export these to Excel. By using coef(function) I have been able to retrieve the coefficients, however the number of coefficients included in the model may change per simulation run because of the optimization algorithm. This will almost certainly give me some trouble with pasting them into the same dataframe, any thoughts on that?
Thanks for helping me out.
For your first question:
You can create the strings as before, using
df.names <- paste("Run",1:10,sep="")
Then, create your for loop and do the following to give the data frames the names you want:
for (i in 1:10){
  d.frame <- data.frame()  # placeholder: create your data frame here
  assign(df.names[i], d.frame)
}
Now you will end up with ten data frames with ten different names.
For your second question about the coefficients:
As far as I can tell, these don't naturally fit into your data frame structure. You should consider using lists, as they allow different classes - in other words, for each run, create a list containing a data frame and a numeric vector with your coefficients.
Don't create objects with numbers in their names, and then try and access them in a loop later, using get and paste and assign. The right way to do this is to store your elements in an R list object.
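A list-based version of the whole loop might look like the sketch below. The data, the model formula, and the names runs and coef_table are assumptions for illustration. Each run keeps its training set, validation set, and coefficients together, and the coefficient vectors are aligned by term name so that terms dropped by step() show up as NA.

```r
# Simulated data: y depends strongly on x1 only (made-up data)
set.seed(123)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
dat$y <- 2 * dat$x1 + rnorm(100)

# One list element per run, each with its own random 50/50 split
runs <- lapply(1:10, function(i) {
  in_train <- rbinom(nrow(dat), 1, 0.5) == 1
  train <- dat[in_train, ]
  valid <- dat[!in_train, ]
  fit <- step(lm(y ~ x1 + x2 + x3, data = train), trace = 0)
  list(train = train, valid = valid, coefs = coef(fit))
})
names(runs) <- paste0("Run", 1:10)

# Align coefficient vectors of different lengths by term name
all_terms <- unique(unlist(lapply(runs, function(r) names(r$coefs))))
coef_table <- t(sapply(runs, function(r) r$coefs[all_terms]))
colnames(coef_table) <- all_terms
print(coef_table)
```

Individual runs are then available as runs$Run1 or runs[["Run3"]], and coef_table can be written out with write.csv() and opened in Excel.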
