I am using R. I am trying to use the "lines" command in ggplot2 to show the predicted values vs. the actual values for a statistical model (ARIMA, time series). Yet when I run the code, I can only see a line of one color.
I simulated some data in R and then tried to make plots that show actual vs predicted:
#set seed
set.seed(123)
#load libraries
library(xts)
library(stats)
#create data
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")
property_damages_in_dollars <- rnorm(731,100,10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)
#aggregate
y.mon<-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%W-%y"),data=final_data, FUN=sum)
y.mon$week = y.mon$`format(as.Date(date_decision_made), format = "%W-%y")`
ts = ts(y.mon$property_damages_in_dollars, start = c(2014,1), frequency = 12)
#statistical model
fit = arima(ts, order = c(4, 1, 1))
Here are my attempts at plotting the graphs:
#first attempt at plotting (no second line?)
plot(fit$residuals, col="red")
lines(fitted(fit),col="blue")
#second attempt at plotting (no second line?)
par(mfrow = c(2,1),
oma = c(0,0,0,0),
mar = c(2,4,1,1))
plot(ts, main="as-is") # plot original sim
lines(fitted(fit), col = "red") # plot fitted values
legend("topleft", legend = c("original","fitted"), col = c("black","red"),lty = 1)
#third attempt (plot actual, predicted and 5 future values - here, the actual and future values show up, but not the predicted)
pred = predict(fit, n.ahead = 5)
ts.plot(ts, pred$pred, lty = c(1,3), col=c(5,2))
However, none of these seem to be working correctly. Could someone please tell me what I am doing wrong? (note: the computer I am using for my work does not have an internet connection or a usb port - it only has R with some preloaded packages. I do not have access to the forecast package.)
Thanks
You seem to be confusing a couple of things:
fitted usually does not work on an object returned by stats::arima (class "Arima"), because that object stores no fitted values. Normally you would load the forecast package first and then use fitted. But since you do not have access to the forecast package, you cannot use fitted(fit): it returns NULL, so lines() has nothing to draw. I have had problems with fitted before.
You want to compare the actual series (x) to the fitted series (y), yet in your first attempt you work with the residuals (e = x - y).
You say you are using ggplot2, but actually you are not: plot and lines are base R graphics functions.
So here is a small example of how to plot the actual series and the fitted series without ggplot2.
set.seed(1)
x <- cumsum(rnorm(10))                      # simulated series
fit <- stats::arima(x, order = c(1, 0, 0))  # AR(1) model
plot(x, col = "red", type = "l")            # actual series
lines(x - fit$residuals, col = "blue")      # fitted = actual - residuals
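Applied to a regularly spaced series like the weekly totals in the question, the same subtraction gives a fitted line, and predict() adds the future values. The following is a minimal sketch using only the stats package; the simulated series `sim`, the AR(1) order, and `frequency = 52` are assumptions made for illustration, not the asker's data or model.

```r
set.seed(123)

# Hypothetical stand-in for the weekly totals in the question
sim <- ts(rnorm(105, mean = 700, sd = 25), start = c(2014, 1), frequency = 52)

# Fit a simple AR(1) model (assumed order, chosen only to keep the example stable)
fit <- stats::arima(sim, order = c(1, 0, 0))

# stats::arima() stores residuals but no fitted values,
# so recover them as actual minus residual
fitted_vals <- sim - residuals(fit)

# Forecast five steps beyond the end of the series
pred <- predict(fit, n.ahead = 5)

# Draw actual, fitted, and forecast values on one set of axes
ts.plot(sim, fitted_vals, pred$pred,
        col = c("black", "red", "blue"), lty = c(1, 1, 3))
legend("topleft", legend = c("actual", "fitted", "forecast"),
       col = c("black", "red", "blue"), lty = c(1, 1, 3))
```

Because all three series carry the same frequency, ts.plot() aligns them on a common time axis, and the forecast starts right after the last observation.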
I hope this answer helps you get back on track.
Related
Recently, I used the Savitzky-Golay filter from the signal package to smooth my data, but it did not work well. I have heard that the Perona-Malik model is a good smoothing method for this task, but I could not implement it. My question is: is it possible to smooth the data with the Perona-Malik (P&M) model in R?
Thanks
hees
Simple example.
library(signal)
bf <- butter(5,1/3)
x <- c(rep(0,15), rep(10, 10), rep(0, 15))
###
sg <- sgolayfilt(x) # replace at here
plot(sg, type="l")
lines(filtfilt(rep(1, 5)/5,1,x), col = "red") # averaging filter
lines(filtfilt(bf,x), col = "blue") # butterworth
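Perona-Malik smoothing can be written in a few lines of base R for a one-dimensional signal. The sketch below is one possible explicit finite-difference scheme, not a function from the signal package; the name `perona_malik_1d` and the parameters `n_iter`, `kappa`, and `lambda` are hypothetical choices, and `kappa` in particular would need tuning for real data.

```r
# One-dimensional Perona-Malik (anisotropic) diffusion, explicit scheme.
# All names and default values here are illustrative assumptions.
perona_malik_1d <- function(x, n_iter = 50, kappa = 2, lambda = 0.25) {
  # Exponential conductance: close to 1 for small differences (noise),
  # close to 0 for differences much larger than kappa (edges)
  conduct <- function(d) exp(-(d / kappa)^2)
  u <- as.numeric(x)
  for (k in seq_len(n_iter)) {
    d_east <- c(diff(u), 0)    # u[i + 1] - u[i], no flux across the right end
    d_west <- c(0, -diff(u))   # u[i - 1] - u[i], no flux across the left end
    u <- u + lambda * (conduct(d_east) * d_east + conduct(d_west) * d_west)
  }
  u
}

# Step signal from the example above, with added noise
set.seed(1)
x <- c(rep(0, 15), rep(10, 10), rep(0, 15))
noisy <- x + rnorm(length(x), sd = 0.5)

pm <- perona_malik_1d(noisy)

plot(noisy, type = "l", col = "grey")
lines(pm, col = "blue")   # Perona-Malik result
```

Small differences between neighbours diffuse almost freely, while jumps much larger than `kappa` get a conductance near zero and are largely preserved. Keeping `lambda` at or below 0.5 keeps this explicit scheme stable in one dimension.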
I am looking to combine all three "test information function" lines (one for each model) into one graph. I have a data set of Likert responses (categories 1-5) in 400 rows, in sets of 8 columns (one for each item). I have run three IRT models on these sets using the mirt package in R and produced test info plots. I would like to combine the test info plots from the three different (graded response) models as three lines in the same grid.
plot(PFgrmodel29, type = 'info', xlim = c(-4, 4), ylim=c(0,85))
plot(PFgrmodel43, type = 'info', xlim = c(-4, 4), ylim=c(0,85))
plot(PFgrmodel57, type = 'info', xlim = c(-4, 4), ylim=c(0,85))
Example of test info plot:
How can I achieve this with mirt, lattice, ggplot2 or similar?
The plots from the mirt package are lattice objects, so you can try using latticeExtra. Since you did not provide your dataset, the example code below uses an example dataset from the package:
library(mirt)
library(latticeExtra)
fulldata <- expand.table(LSAT7)
mod1 <- mirt(fulldata,1,SE=TRUE)
mod2 <- mirt(fulldata,1, itemtype = 'Rasch')
mod3 <- mirt(fulldata,1,itemtype='ideal')
key=list(columns=2,
text=list(lab=c("mod1","mod2","mod3")),
lines=list(lwd=4, col=c("blue","orange","red"))
)
p1 = plot(mod1,type="info",key=key)
p2 = update(plot(mod2,type="info"),col="orange")
p3 = update(plot(mod3,type="info"),col="red")
p1+p2+p3
That is just beautiful! Works like a charm, except that I needed to add ylim=c(0,100) to make the y axis tall enough to fit the data. I thought that placing the model with the highest info curve first (as mod1) would do it, but no. Thank you so much, Stupidwolf, for providing the code! No need for latticeExtra package.
Also, I had to retain the "model" part of the code for this to work:
model <- 'F = 1-5 PRIOR = (5, g, norm, -1.5, 3)'
My code looks like this now:
library(mirt)
library(latticeExtra)
model <- 'F = 1-5 PRIOR = (5, g, norm, -1.5, 3)'
mod1 <- mirt(PFdata57,1,itemtype="graded", SE=TRUE)
mod2 <- mirt(PFdata43,1,itemtype="graded", SE=TRUE)
mod3 <- mirt(PFdata29,1,itemtype="graded", SE=TRUE)
key=list(columns=1,
text=list(lab=c("P57/PF Short form 8a","P43/PF Short form 6a","P29/PF Short form 4a")),
lines=list(lwd=4, col=c("blue","orange","red")))
p1 = plot(mod1,type="info",key=key,xlim=c(-4,4),ylim=c(0,85))
p2 = update(plot(mod2,type="info"),col="orange")
p3 = update(plot(mod3,type="info"),col="red")
p1+p2+p3
I made a plot that predicts a time series. It was achieved with this code:
forecast1 <- HoltWinters(ts, beta = FALSE, gamma = TRUE)
forecast2 <- forecast(forecast1, h = 60)
autoplot(forecast2)
Where 'ts' is a time series object.
I would like to add another time series to compare the predicted values with the actual values, starting from my last actual observation. I achieved this with a classical plot by adding a line with the actual time series. These are the plots I have:
How can I add this new line to my first plot?
Here is the simplest way to do it:
library(ggplot2)
library(forecast)
smpl1 <- window(AirPassengers, end = c(1952, 12))
smpl2 <- window(AirPassengers, start = c(1953, 1), end = c(1953,12))
hw <- HoltWinters(smpl1, beta = FALSE, gamma = TRUE)
forecast <- forecast(hw, h = 12)
autoplot(forecast) +
autolayer(smpl2, series="Data") +
autolayer(forecast$mean, series="Forecasts")
The autolayer command from the forecast package allows you to add layers involving time series and forecasts to existing plots.
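Where the forecast package is not available, a similar comparison can be drawn with the stats package alone. This is a rough base-graphics sketch of the same idea, not a replacement for autoplot(); the names `train`, `actual`, and `fc` are made up for the example, and gamma is left to be estimated rather than fixed as in the code above.

```r
# Split AirPassengers into a training window and a hold-out year
train  <- window(AirPassengers, end = c(1952, 12))
actual <- window(AirPassengers, start = c(1953, 1), end = c(1953, 12))

# Holt-Winters without a trend component; alpha and gamma are estimated
hw <- HoltWinters(train, beta = FALSE)

# Point forecasts for the 12 hold-out months
fc <- predict(hw, n.ahead = 12)[, "fit"]

# Training data, actual hold-out values, and forecasts on one set of axes
ts.plot(train, actual, fc,
        col = c("black", "darkgreen", "blue"), lty = c(1, 1, 2))
legend("topleft", legend = c("training data", "actual", "forecast"),
       col = c("black", "darkgreen", "blue"), lty = c(1, 1, 2))
```

predict() on a HoltWinters fit returns a time series that starts right after the training window, so the forecast and the hold-out data line up without any manual date handling.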
I am trying to plot multiple survival curves in the same plot. Using plot I can easily do this by
plot(sr_fit_0, col = 'red' , conf.int=TRUE, xlim=c(0, max_m))
par(new=TRUE)
plot(sr_fit_1, col ='blue', conf.int=TRUE, xlim=c(0, max_m))
But now I want to use ggsurv to plot the survival curves, and I don't know how to have both of them in the same plot (not subplots). Any help is appreciated.
I generated some lifetime data below for hamsters and gerbils. You can use the survfit() function like other model-fitting functions and define a data frame column that splits the population. When you create the plot with ggsurv(), I think it will display what you are looking for.
## Make some data for varmint life
set.seed(1); l1 <- rnorm(120, 2.5, 1)
gerbils <- data.frame(life = l1[l1>0])
set.seed(3); l2 <- rnorm(120, 3, 1)
hamsters <- data.frame(life = l2[l2>0])
## Load required packages
require('survival'); require('GGally')
## Generate fits for survival curves
## (Note that Surv(x) creates a Survival Object)
sf.gerbils <- survfit(Surv(life) ~ 1, data = gerbils)
sf.hamsters <- survfit(Surv(life) ~ 1, data = hamsters)
ggsurv(sf.gerbils) #Survival plot for gerbils
ggsurv(sf.hamsters) #Survival plot for hamsters
## Combine gerbils and hamsters while adding column for identification
varmints <- rbind((cbind(gerbils, type = 'gerbil')),
(cbind(hamsters, type = 'hamster')))
## Generate survival for fit for all varmints as a function of type
sf.varmints <- survfit(Surv(life) ~ type, data = varmints)
## Plot the survival curves on one chart
ggsurv(sf.varmints)
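The same grouped fit also works with the base plot method from the survival package, which avoids the par(new=TRUE) overlay from the question. This is a small sketch under assumed data: the exponential lifetimes and the colours are invented for illustration.

```r
library(survival)

# Simulated lifetimes for two groups (assumed data, all events observed)
set.seed(1)
varmints <- data.frame(
  life = c(rexp(100, rate = 1 / 2.5), rexp(100, rate = 1 / 3)),
  type = rep(c("gerbil", "hamster"), each = 100)
)

# One fit with a grouping variable gives one curve per group
sf <- survfit(Surv(life) ~ type, data = varmints)

# Both curves, with confidence intervals, on a single set of axes
plot(sf, col = c("red", "blue"), conf.int = TRUE,
     xlab = "Time", ylab = "Survival probability")
legend("topright", legend = names(sf$strata),
       col = c("red", "blue"), lty = 1)
```

Because both curves come from one survfit object, they share the same axis limits automatically.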
I am trying to visualize some data and in order to do it I am using R's hist.
Below are my data:
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
Now, as the more cunning of you might have guessed, these are the absolute values of some model's coefficients.
What I need is a histogram with the following axes:
x will be the number (count or length) of coefficients which is 19 in total, along with their names.
y will show values of each column (as breaks?) having a ylim="" set, according to min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
(Intercept) LON LAT ME RAT
1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
DS DSA DSI DRNS DREW
-1.856908e-04 1.352052e-05 4.811291e-05 -1.055744e-02 -2.756525e-04
ASPNS ASPEW SI CUR W_180_270
-2.202706e-01 -4.199914e-02 4.684091e-02 -8.634340e-01 -2.479175e-02
W_0_360 W_90_180 W_0_180 NDVI
2.409628e-01 5.459076e-03 -9.892580e-03 -5.378456e-02
So far, after consulting ?hist, I have been playing with the code below without success, so I am starting from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful info about the set. I also find it hard to use the breaks argument efficiently.
Any suggestion will be appreciated. Thanks.
Rather than using hist, why not use a barplot or a standard plot? For example,
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the coefficients
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=c(-10, 10))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see, for example, "Add error bars to show standard deviation on a plot in R").
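With the coefficients from the question, the barplot idea might look like the sketch below. The values are copied from the printed Janmodelnorm$coef output; the log-scaled y axis is an assumption on my part, chosen because the absolute values span several orders of magnitude.

```r
# Coefficients as printed in the question
jan_coef <- c(`(Intercept)` = 1.165610e+00, LON = -1.277929e-01,
              LAT = -4.349831e-01, ME = -3.602961e-01, RAT = -7.189458e+00,
              DS = -1.856908e-04, DSA = 1.352052e-05, DSI = 4.811291e-05,
              DRNS = -1.055744e-02, DREW = -2.756525e-04,
              ASPNS = -2.202706e-01, ASPEW = -4.199914e-02,
              SI = 4.684091e-02, CUR = -8.634340e-01,
              W_180_270 = -2.479175e-02, W_0_360 = 2.409628e-01,
              W_90_180 = 5.459076e-03, W_0_180 = -9.892580e-03,
              NDVI = -5.378456e-02)

# Extra bottom margin so the rotated coefficient names fit
par(mar = c(7, 4, 1, 1))

# One bar per coefficient, absolute values on a log scale
bp <- barplot(abs(jan_coef), log = "y", las = 2, cex.names = 0.7,
              col = "lightblue", ylab = "|coefficient| (log scale)")
```

On a linear axis the RAT bar (about 7.19) would dwarf everything else; the log scale keeps the small coefficients such as DSA visible.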
Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
coef <- lm(mpg ~ ., data = mtcars)$coef
library(lattice)
barchart(coef, col = 'lightblue', horizontal = FALSE,
         ylim = range(coef), xlab = '',
         scales = list(y = list(labels = coef),
                       x = list(labels = names(coef))))
A base R dotchart might be good too,
dotchart(coef, pch = 19, xlab = 'value')
text(coef, seq(coef), labels = round(coef, 3), pos = 2)