Closed 5 years ago as not about programming within the scope defined in the help center.
I have time-series data containing 16 values (numbers of vehicles) from 2001 to 2016. I wanted to predict, based on the underlying trend, the values up to 2050 (which is a long shot, I agree).
Upon doing some research, I found that this can be done with methods like HoltWinters or TBATS, which, however, did not fit my original plan of using some machine-learning algorithm.
I am using R for all my work. After using the HoltWinters() and then forecast() functions, I did get a curve extrapolated up to 2050, but it is a simple exponential curve from 2017 to 2050 which I think I could have obtained through simple calculations by hand.
My question is twofold:
1) What would be the best approach to obtain a meaningful extrapolation?
2) Can my current approach be modified to give me a more meaningful extrapolation?
By meaningful I mean a curve whose details are closer to reality.
Thanks a lot.
I guess you need more data to make predictions. HoltWinters or TBATS may work but there are many other ML models for time series data you can try.
http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html
This link has sample R code for HoltWinters and the plots.
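For reference, a minimal sketch of that workflow (not the asker's actual code; the vehicles series below is simulated as a stand-in, since the real 16 values are not given):
set.seed(1)
vehicles <- ts(round(1000 * 1.05^(0:15) + rnorm(16, 0, 30)), start = 2001)  # placeholder for the real counts
fit <- HoltWinters(vehicles, gamma = FALSE)  # annual data has no seasonal component, so this is Holt's trend method
library(forecast)
fc <- forecast(fit, h = 2050 - 2016)         # extrapolate 34 years ahead
plot(fc)                                     # trend forecast with prediction intervals
With only 16 annual points and no seasonality, these methods essentially extend the fitted level and trend, so a smooth extrapolation is to be expected.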
Closed 8 years ago as not about programming within the scope defined in the help center.
I have a vector of yearly population data from 1980 to 2020 with only two known values (for 2000 and 2010), and I need to predict the missing data.
My first thought was to use na.approx to fill in the missing data between 2000 and 2010 and then fit an ARIMA model. However, as the population is declining, its values would eventually become negative in the remote future, which is illogical.
My second thought was to take the difference of the logarithms of the two sample values, divide it by 10 (since there is a 10-year gap between them), and use that as an annual percentage change to predict the missing data.
However, I am new to R and statistics so I am not sure if this is the best way to get the predictions. Any ideas would be really appreciated.
Since the line that the two data points provide does not make intuitive sense, I would recommend just using the average of the two unless you can get additional data. If you are able to get either more yearly data, or even expected variation values, then you can do some additional analysis. But for now, you're kinda stuck.
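To make the two ideas in the question concrete, here is a rough sketch with made-up population figures for 2000 and 2010 (the real values are not given): na.approx only interpolates between the known points, while a constant-percentage-change rule can also extrapolate and never goes negative.
library(zoo)
years <- 1980:2020
pop <- rep(NA_real_, length(years))
pop[years == 2000] <- 5200  # placeholder value
pop[years == 2010] <- 4800  # placeholder value
z <- zoo(pop, order.by = years)
lin <- na.approx(z, na.rm = FALSE)   # linear fill between 2000 and 2010 only
r <- (4800 / 5200)^(1 / 10)          # implied constant annual growth factor
geo <- 5200 * r^(years - 2000)       # geometric interpolation and extrapolation
With only two data points, though, both are little more than assumptions dressed up as estimates.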
Closed 8 years ago: questions asking for code must demonstrate a minimal understanding of the problem being solved, including attempted solutions, why they didn't work, and the expected results.
This is more of a general question, but since I am using R, hence the tags.
My training data set has 15,000 entries, of which around 20 I would like to use as the positive set for building up the SVM. I wanted to use the remaining (resampled) data as my negative set, but I was wondering whether it might be better to take the same size (around 20) for the negative set, since otherwise the data are highly imbalanced. Is there an easy approach to then pool the classifiers (ensemble-based) in R after 1000 rounds of resampling, perhaps with the e1071 package?
Follow-up question: I would like to calculate a score for each prediction afterwards; is it fine just to take the probabilities times 100?
Thanks.
You can try "class weight" approach in which the smaller class gets more weight, thus taking more cost to mis-classify the positive labelled class.
Closed 9 years ago as not about programming within the scope defined in the help center.
http://i43.tinypic.com/8yz893.png
The figure in the link shows the relation between one of my predictors (vms) and the response (responses[i]).
We can distinguish many log-like trends within the same graph.
According to this, a single value of my predictor can be mapped to many values of the response.
Is this acceptable or should I be alarmed that there is a problem with my data?
What regression model would seem more suitable for this picture?
This isn't really an R question, but rather a general statistics question, so you may get downvoted, but I'll try to help you out.
There's nothing wrong with having individual values of the predictor map to multiple values of the response. This would be a problem if you were defining and evaluating a function, but you're not technically evaluating a function; you're evaluating the statistical relationship between two variables. You will then create a functional form to model this relationship.
It seems to me that a conventional OLS model would be very inappropriate here, as one of the assumptions of OLS is that the relationship between the predictor and the outcome variable is linear, which it clearly is not in this case. The relationship actually looks a lot like a 1/x curve, so you may want to try a 1/x transformation and see where that gets you.
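A quick sketch of that suggestion with simulated data (vms and resp are placeholder names for the real columns):
set.seed(1)
vms <- runif(200, 1, 50)
resp <- 3 + 40 / vms + rnorm(200, sd = 0.5)  # fake data with a 1/x-shaped relationship
fit <- lm(resp ~ I(1 / vms))                 # reciprocal transformation of the predictor
summary(fit)
plot(vms, resp)
ord <- order(vms)
lines(vms[ord], fitted(fit)[ord], col = "red", lwd = 2)
If the spread of the response also grows with its level, comparing this against a log transformation of the response is worth a try.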
Closed 9 years ago as not about programming within the scope defined in the help center.
Maybe my question will fail to be specific, but when fitting a generalized linear mixed-effects model (using the lme4 package in R) I get SE = 1000 for one of the parameters, with the estimate as high as 16. The variable is a dichotomous variable. My question is whether there might be an explanation for such a result, considering that the other parameters have estimates and SEs that seem fine.
That's a sign that you have complete separation. You should re-run the model without that covariate. Since it's a mixed-effects model, you may need to do a tabulation of outcome by covariate by grouping level to see what is happening. More details would allow greater specificity in our answers.
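A tiny illustration of that tabulation check (made-up data; outcome, x and group are placeholder names): a zero cell in the cross-table is the classic sign of separation.
dat <- data.frame(outcome = c(0, 0, 0, 0, 1, 1, 1, 1),
                  x       = c(0, 0, 0, 0, 1, 1, 1, 1),  # dichotomous covariate, completely separated
                  group   = rep(c("a", "b"), 4))
with(dat, table(outcome, x))         # zero off-diagonal cells -> complete separation
with(dat, table(outcome, x, group))  # the same check within each grouping level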
This is a link to a posting by Jarrod Hadfield, one of the guRus on the R mixed model mailing list. It demonstrates how complete separation leads to the Hauck-Donner effect, and it offers some further approaches to attempt dealing with it.
You may be seeing a case of the Hauck-Donner effect. Here is one post that discusses it; you can read the original paper or search the web for additional discussions.
Closed 10 years ago as off-topic for Stack Overflow.
I got the following time series of residuals from another regression.
One index step is a day. You can clearly see the yearly cycle.
The aim is to fit a harmonic function through it to explain a further part of the underlying time series.
I would really appreciate your ideas about which function to use for estimating the right parameters! From the ACF we learn that there is also a weekly cycle; however, I will address that issue later with SARIMA.
This seems to be the sort of thing a Fourier transform is designed for.
Try
fftobj <- fft(x)                           # discrete Fourier transform of the residual series
plot(Mod(fftobj)[1:floor(length(x) / 2)])  # magnitude spectrum up to the Nyquist frequency
The peaks in this plot correspond to frequencies with high coefficients in the fit. Arg(fftobj) will give you the phases.
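Once the dominant yearly frequency has been identified, one way to fit the harmonic itself is an ordinary regression on sine and cosine terms. A sketch with a simulated daily residual series standing in for x:
set.seed(1)
t <- 1:(3 * 365)
x <- 2 * sin(2 * pi * t / 365 + 1) + rnorm(length(t), sd = 0.5)  # stand-in residuals with a yearly cycle
har <- lm(x ~ sin(2 * pi * t / 365) + cos(2 * pi * t / 365))     # amplitude and phase come from the two coefficients
summary(har)
plot(t, x, type = "l", col = "grey")
lines(t, fitted(har), col = "red", lwd = 2)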
Well, I tried it, but it produces a forecast that looks like an exponential distribution. In the meantime I solved the problem another way: I added a factor component for each month and ran a regression. In the next step I smoothed the results from this regression and got an intra-year pattern that is more accurate than a harmonic function. For example, during June and July (around index 185) there is generally a low level but also a high number of peaks.
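For completeness, a sketch of the month-factor alternative described in that comment, continuing the simulated x from the previous sketch (the start date is an assumption):
dates <- seq(as.Date("2010-01-01"), by = "day", length.out = length(x))
mon <- factor(format(dates, "%m"))
mfit <- lm(x ~ mon)                                      # one level per calendar month
sm <- lowess(as.numeric(dates), fitted(mfit), f = 0.1)   # smooth the stepwise monthly means
plot(dates, x, type = "l", col = "grey")
lines(sm$x, sm$y, col = "blue", lwd = 2)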