Put result of forecast::ma() as a matrix and compute RMSE - r

I am really new to R. I am trying to calculate some MA[n] forecasts in R.
Here is my code,
# simple reproducible example
set.seed(0); factory <- round(rnorm(84), 1)
library(forecast)
factory.ts <- ts(factory, start = 1947, frequency = 12)
fit_EMA <- ma(factory.ts, order=5)
It works fine. Below is what fit_EMA looks like in the R console. But I don't like the format, as I couldn't find a way to take those fitted points for further use. For example, how can I extract a row or column?
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1947 NA NA 0.80 0.24 0.12 -0.20 -0.46 -0.06 0.40 0.42 0.26 0.20
1948 -0.34 -0.58 -0.36 -0.32 -0.18 -0.36 -0.32 -0.30 -0.10 -0.02 0.20 0.34
1949 0.48 0.32 -0.10 -0.08 -0.22 -0.54 -0.48 -0.34 -0.20 0.08 0.38 0.38
1950 0.74 0.54 0.66 0.58 0.56 0.16 -0.02 -0.60 -1.04 -0.70 -0.38 -0.18
1951 0.10 0.34 0.58 0.26 0.28 0.28 0.48 -0.04 -0.32 -0.56 -0.54 -0.66
1952 -0.80 -0.38 -0.28 -0.32 -0.60 -0.34 -0.28 -0.10 -0.14 0.20 0.00 -0.06
1953 0.06 0.28 0.24 0.34 0.18 -0.24 -0.62 -0.38 -0.20 -0.06 NA NA
Also, how can I calculate RMSE or other error measures? forecast::ma, TTR::SMA and TTR::EMA don't report any error measures in their summary. Or have I missed a library function?

The result of forecast::ma() is always a "ts" object. Although your fit_EMA prints as a matrix (because frequency = 12, so you get 12 columns), it is essentially a vector. You can use str(fit_EMA) to inspect it. You can do
mat <- matrix(fit_EMA, ncol = 12, byrow = TRUE)
to get a matrix. Then mat[1, ] gives the fitted values for the first year (year 1947).
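As a minimal sketch of the reshaping step, the example below labels the rows and columns so that years and months can be extracted by name. To keep it in base R, it assumes that a centered 5-term mean from stats::filter() is an acceptable stand-in for forecast::ma(factory.ts, order = 5); the names fit and mat are only illustrative.

```r
# Stand-in for forecast::ma(factory.ts, order = 5): a centered 5-term mean
# computed with stats::filter, so the sketch needs only base R (assumption).
set.seed(0)
factory.ts <- ts(round(rnorm(84), 1), start = 1947, frequency = 12)
fit <- stats::filter(factory.ts, rep(1 / 5, 5), sides = 2)

# Reshape to years x months and label both dimensions.
mat <- matrix(fit, ncol = 12, byrow = TRUE,
              dimnames = list(1947:1953, month.abb))

mat["1947", ]   # all fitted values for 1947
mat[, "Mar"]    # every March value

# Same idea while staying in "ts" form: one year of the series.
window(fit, start = c(1950, 1), end = c(1950, 12))
```

The matrix() route relies on the series starting in January and covering whole years (84 = 7 * 12 here); for a series that does not, window() is likely the safer choice.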
Getting RMSE is so straightforward that a function / library routine is not needed. Do:
MSE <- mean((fit_EMA - factory.ts) ^ 2, na.rm = TRUE)
# [1] 0.55876
RMSE <- sqrt(MSE)
# [1] 0.7475025
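If several error measures are needed, the one-liner above can be wrapped in small helpers. The sketch below is only an illustration: rmse() and mae() are hypothetical names, not functions from forecast or TTR, and the input vectors are made up.

```r
# Hypothetical helpers (not part of forecast or TTR). Positions where either
# series is NA are skipped via na.rm = TRUE.
rmse <- function(actual, fitted) {
  sqrt(mean((as.numeric(actual) - as.numeric(fitted))^2, na.rm = TRUE))
}
mae <- function(actual, fitted) {
  mean(abs(as.numeric(actual) - as.numeric(fitted)), na.rm = TRUE)
}

# Made-up data: the NA ends mimic what a centered moving average leaves behind.
actual <- c(1, 2, 3, 4, 5)
fitted <- c(NA, 2, 5, 4, NA)

rmse(actual, fitted)   # sqrt(4/3), about 1.155
mae(actual, fitted)    # 2/3
```

With the objects from the question, the call would be rmse(factory.ts, fit_EMA).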

Related

Getting a null score vector using princomp function in R

I'm trying to do PCA analysis on some data. I'm not given the raw data, just the correlation matrix in this way:
Tmax Tmin P H PT V Vmax
Tmax 1.00 0.70 -0.08 -0.41 -0.09 -0.23 -0.08
Tmin 0.70 1.00 -0.30 0.07 0.14 -0.03 -0.01
P -0.08 -0.30 1.00 -0.18 -0.13 -0.29 -0.25
H -0.41 0.07 -0.18 1.00 0.32 -0.15 -0.19
PT -0.09 0.14 -0.13 0.32 1.00 0.11 0.07
V -0.23 -0.03 -0.29 -0.15 0.11 1.00 0.83
Vmax -0.08 -0.01 -0.25 -0.19 0.07 0.83 1.00
For this I'm trying to use the princomp() function since it has the covmat option so I can introduce data as a correlation matrix. For the pca analysis I'm using the following code:
pca_prim <- princomp(covmat=Primavera, cor = T, scores = TRUE)
I need the scores in order to plot a biplot in following steps but the scores vector I get is null:
biplot(pca_prim)
Error in biplot.princomp(pca_prim) : object 'pca_prim' has no scores
pca_prim$scores
NULL
I can't seem to find what the problem is in order to get the scores. Any suggestions?
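A likely explanation, going by the princomp() documentation, is that scores are only computed when the raw data x is supplied; a correlation matrix alone does not contain the individual observations. The sketch below uses a small made-up correlation matrix (R is an illustrative name, not the Primavera data) to show that the loadings and component variances are still available even though the scores are NULL.

```r
# Small made-up correlation matrix, used only for illustration.
R <- matrix(c(1.0, 0.7, 0.2,
              0.7, 1.0, 0.1,
              0.2, 0.1, 1.0),
            nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

pca <- princomp(covmat = R)   # no x supplied, so there is nothing to score
is.null(pca$scores)           # TRUE
unclass(pca$loadings)         # eigenvectors (loadings) are still available
pca$sdev^2                    # component variances (eigenvalues)
```

A biplot of observations therefore needs the raw data; with only the correlation matrix, a plot of the loadings is probably the closest substitute.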

Create data frame from EFA output in R

I am working on EFA and would like to customize my tables. There is a function, print.psych, that can suppress factor loadings below a certain value to make the table easier to read. When I run this function, it prints the loadings and the summary statistics to the console (in an .Rmd document, it produces console text and a separate data frame of the factor loadings with small loadings suppressed). However, if I attempt to save the result as an object, it does not keep this data.
Here is an example:
library(psych)
bfi_data=bfi
bfi_data=bfi_data[complete.cases(bfi_data),]
bfi_cor <- cor(bfi_data)
factors_data <- fa(r = bfi_cor, nfactors = 6)
print.psych(factors_data, cut = .32, sort = TRUE)
In an R script, it produces this:
   item   MR2   MR3   MR1   MR5   MR4   MR6    h2   u2 com
N2   17  0.83                                0.654 0.35 1.0
N1   16  0.82                                0.666 0.33 1.1
N3   18  0.69                                0.549 0.45 1.1
N5   20  0.47                                0.376 0.62 2.2
N4   19  0.44        0.43                    0.506 0.49 2.4
C4    9       -0.67                          0.555 0.45 1.3
C2    7        0.66                          0.475 0.53 1.4
C5   10       -0.56                          0.433 0.57 1.4
C3    8        0.56                          0.317 0.68 1.1
C1    6        0.54                          0.344 0.66 1.3
In R Markdown, it produces this:
How can I save that data.frame as an object?
Looking at the str() of the object, what you want does not appear to be built in. An ugly way would be to use capture.output() and convert the resulting character vector to a data frame with string manipulation. However, since the data is displayed, it must be present somewhere in the object itself. I found vectors of the same length that can be combined to form the data frame:
loadings <- unclass(factors_data$loadings)
h2 <- factors_data$communalities
#There is also factors_data$communality which has same values
u2 <- factors_data$uniquenesses
com <- factors_data$complexity
data <- cbind(loadings, h2, u2, com)
data
This returns :
# MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
#A1 0.11 0.07 -0.07 -0.56 -0.01 0.35 0.38 0.62 1.85
#A2 0.03 0.09 -0.08 0.64 0.01 -0.06 0.47 0.53 1.09
#A3 -0.04 0.04 -0.10 0.60 0.07 0.16 0.51 0.49 1.26
#A4 -0.07 0.19 -0.07 0.41 -0.13 0.13 0.29 0.71 2.05
#A5 -0.17 0.01 -0.16 0.47 0.10 0.22 0.47 0.53 2.11
#C1 0.05 0.54 0.08 -0.02 0.19 0.05 0.34 0.66 1.32
#C2 0.09 0.66 0.17 0.06 0.08 0.16 0.47 0.53 1.36
#C3 0.00 0.56 0.07 0.07 -0.04 0.05 0.32 0.68 1.09
#C4 0.07 -0.67 0.10 -0.01 0.02 0.25 0.55 0.45 1.35
#C5 0.15 -0.56 0.17 0.02 0.10 0.01 0.43 0.57 1.41
#E1 -0.14 0.09 0.61 -0.14 -0.08 0.09 0.41 0.59 1.34
#E2 0.06 -0.03 0.68 -0.07 -0.08 -0.01 0.56 0.44 1.07
#E3 0.02 0.01 -0.32 0.17 0.38 0.28 0.51 0.49 3.28
#E4 -0.07 0.03 -0.49 0.25 0.00 0.31 0.56 0.44 2.26
#E5 0.16 0.27 -0.39 0.07 0.24 0.04 0.41 0.59 3.01
#N1 0.82 -0.01 -0.09 -0.09 -0.03 0.02 0.67 0.33 1.05
#N2 0.83 0.02 -0.07 -0.07 0.01 -0.07 0.65 0.35 1.04
#N3 0.69 -0.03 0.13 0.09 0.02 0.06 0.55 0.45 1.12
#N4 0.44 -0.14 0.43 0.09 0.10 0.01 0.51 0.49 2.41
#N5 0.47 -0.01 0.21 0.21 -0.17 0.09 0.38 0.62 2.23
#O1 -0.05 0.07 -0.01 -0.04 0.57 0.09 0.36 0.64 1.11
#O2 0.12 -0.09 0.01 0.12 -0.43 0.28 0.30 0.70 2.20
#O3 0.01 0.00 -0.10 0.05 0.65 0.04 0.48 0.52 1.06
#O4 0.10 -0.05 0.34 0.15 0.37 -0.04 0.24 0.76 2.55
#O5 0.04 -0.04 -0.02 -0.01 -0.50 0.30 0.33 0.67 1.67
#gender 0.20 0.09 -0.12 0.33 -0.21 -0.15 0.18 0.82 3.58
#education -0.03 0.01 0.05 0.11 0.12 -0.22 0.07 0.93 2.17
#age -0.06 0.07 -0.02 0.16 0.03 -0.26 0.10 0.90 2.05
Ronak Shaw answered my question above, and I used his answer to help create the following function, which nearly reproduces the print.psych data.frame of the fa.sort output:
library(psych)
library(dplyr)
library(tibble)

fa_table <- function(x, cut) {
  # get sorted loadings as a plain numeric matrix
  loadings <- unclass(fa.sort(x)$loadings) %>% round(3)
  # blank out loadings whose absolute value is below the cutoff
  # (comparing the raw value would also hide strong negative loadings)
  loadings[abs(loadings) < cut] <- ""
  # get additional info
  add_info <- cbind(x$communalities,
                    x$uniquenesses,
                    x$complexity) %>%
    as.data.frame() %>%
    rename("commonality" = V1,
           "uniqueness" = V2,
           "complexity" = V3) %>%
    rownames_to_column("item")
  # build table
  loadings %>%
    as.data.frame() %>%
    rownames_to_column("item") %>%
    left_join(add_info, by = "item") %>%
    mutate(across(where(is.numeric), ~ round(.x, 3)))
}
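The cutoff step can be tried in isolation without psych. The following is a rough base R sketch on a made-up loadings matrix (L, shown and tab are illustrative names); it compares on the absolute value so that strong negative loadings stay visible.

```r
# Made-up loadings for three items on two factors (illustration only).
L <- matrix(c(0.82, -0.05,
              0.10, -0.67,
              0.44,  0.43),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("N1", "C4", "N4"), c("MR2", "MR3")))

cut <- 0.32
shown <- formatC(L, format = "f", digits = 2)  # character matrix, 2 decimals
shown[abs(L) < cut] <- ""                      # hide loadings with |value| < cut

# Item names become an ordinary column; the blanked loadings follow.
tab <- data.frame(item = rownames(L), shown, row.names = NULL)
tab
```

Because the blanked cells are empty strings, the loading columns are character rather than numeric, which suits a display table but not further computation.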

Yearly average temperature function in R

I need to write a function to calculate the average annual temperature. I have data for each month's temperature from 1880 to 2017 and need a vector that displays each of the 138 average temperatures. This is what I have tried, I know it's not a lot and not very good but I am very new to this so bear with me:
average <- function(x) {
if(any(is.na(x)))
stop("x is missing values")
c(mean(x[,1:138]))
}
sapply(gistemp.new[2:13],average)
The gistemp.new is the name I gave for the data frame and the first column is just the year. It is like this:
Year Jan Feb Mar Apr May Jun
1 1880 -0.29 -0.18 -0.11 -0.19 -0.11 -0.23
2 1881 -0.15 -0.17 0.04 0.04 0.02 -0.20
3 1882 0.15 0.15 0.04 -0.18 -0.16 -0.26
4 1883 -0.31 -0.39 -0.13 -0.17 -0.20 -0.12
df1 <- read.table(text="Year Jan Feb Mar Apr May Jun
1 1880 -0.29 -0.18 -0.11 -0.19 -0.11 -0.23
2 1881 -0.15 -0.17 0.04 0.04 0.02 -0.20
3 1882 0.15 0.15 0.04 -0.18 -0.16 -0.26
4 1883 -0.31 -0.39 -0.13 -0.17 -0.20 -0.12", header = TRUE, stringsAsFactors = FALSE)
rowMeans(df1[-1])
# 1 2 3 4
# -0.18500000 -0.07000000 -0.04333333 -0.22000000
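The original attempt fails because sapply() over the month columns hands the function one month at a time, so x is a plain vector and x[, 1:138] cannot index it; a yearly average has to be taken across each row instead. The sketch below wraps rowMeans() in a hypothetical helper, yearly_average(), that keeps the missing-value check and names each result by its year. The temps data frame is a toy stand-in for gistemp.new.

```r
# Toy stand-in for gistemp.new: a Year column followed by monthly columns.
temps <- data.frame(Year = 1880:1882,
                    Jan = c(-0.29, -0.15, 0.15),
                    Feb = c(-0.18, -0.17, 0.15),
                    Mar = c(-0.11,  0.04, 0.04))

# Hypothetical helper: one mean per row (year), stopping on missing data
# in the same way the original attempt did.
yearly_average <- function(df) {
  months <- df[-1]                      # drop the Year column
  if (anyNA(months)) stop("x is missing values")
  setNames(rowMeans(months), df$Year)   # name each average by its year
}

yearly_average(temps)
```

On the full data this would return a named vector of 138 averages, one per year.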

How to subset a time series in R

In particular, I'd like to subset the temperature measurements from 1960 onwards in the time series gtemp in the package astsa:
require(astsa)
gtemp
Time Series:
Start = 1880
End = 2009
Frequency = 1
[1] -0.28 -0.21 -0.26 -0.27 -0.32 -0.32 -0.29 -0.36 -0.27 -0.17 -0.39 -0.27 -0.32
[14] -0.33 -0.33 -0.25 -0.14 -0.11 -0.25 -0.15 -0.07 -0.14 -0.24 -0.30 -0.34 -0.24
[27] -0.19 -0.39 -0.33 -0.35 -0.33 -0.34 -0.32 -0.30 -0.15 -0.10 -0.30 -0.39 -0.33
[40] -0.20 -0.19 -0.14 -0.26 -0.22 -0.22 -0.17 -0.02 -0.15 -0.12 -0.26 -0.08 -0.02
[53] -0.08 -0.19 -0.07 -0.12 -0.05 0.07 0.10 0.01 0.04 0.10 0.03 0.09 0.19
[66] 0.06 -0.05 0.00 -0.04 -0.07 -0.16 -0.04 0.03 0.11 -0.10 -0.10 -0.17 0.08
[79] 0.08 0.06 -0.01 0.07 0.04 0.08 -0.21 -0.11 -0.03 -0.01 -0.04 0.08 0.03
[92] -0.10 0.00 0.14 -0.08 -0.05 -0.16 0.12 0.01 0.08 0.18 0.26 0.04 0.26
[105] 0.09 0.05 0.12 0.26 0.31 0.19 0.37 0.35 0.12 0.13 0.23 0.37 0.29
[118] 0.39 0.56 0.32 0.33 0.48 0.56 0.55 0.48 0.62 0.54 0.57 0.43 0.57
The individual time points are not labeled in years, so although I can do gtemp[3] (which returns -0.26), I can't do gtemp[as.date(1960)], for instance, to get the value in 1960.
How can I bring out the correspondence between year and measurements, so as to later subset values?
We can make use of the window function
gtemp1 <- window(gtemp, start = 1960)
gtemp1
#Time Series:
#Start = 1960
#End = 2009
#Frequency = 1
#[1] -0.01 0.07 0.04 0.08 -0.21 -0.11 -0.03 -0.01 -0.04 0.08 0.03
#[12]-0.10 0.00 0.14 -0.08 -0.05 -0.16 0.12 0.01 0.08 0.18 0.26
#[23] 0.04 0.26 0.09 0.05 0.12 0.26 0.31 0.19 0.37 0.35 0.12
#[34] 0.13 0.23 0.37 0.29 0.39 0.56 0.32 0.33 0.48 0.56 0.55
#[45] 0.48 0.62 0.54 0.57 0.43 0.57
Function time can also help to answer your question
How can I bring out the correspondence between year and measurements, so as to later subset values?
head(time(gtemp))
[1] 1880 1881 1882 1883 1884 1885
If you want the value that corresponds to 1961, you can write
gtemp[time(gtemp) == 1961]
[1] 0.07
As mentioned in the first answer, you can also use the function window
window(gtemp, start = 1961, end = 1961)
Time Series:
Start = 1961
End = 1961
Frequency = 1
[1] 0.07
that returns the result as one point time series. You can convert it into a number by
as.numeric(window(gtemp, start = 1961, end = 1961))
[1] 0.07
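The two approaches above can be tried side by side without installing astsa. The sketch below uses a short made-up annual series (x is an illustrative stand-in for gtemp) and also shows a named-vector variant, which is an addition of this sketch rather than something from the answers.

```r
# Made-up annual series standing in for astsa::gtemp (values are illustrative).
x <- ts(c(0.10, 0.20, 0.30, 0.40, 0.50), start = 1958, frequency = 1)

from_1960 <- window(x, start = 1960)   # still a "ts", now starting in 1960
start(from_1960)

# time() returns the year attached to each observation, so logical
# conditions on it work like conditions on any numeric vector.
x[time(x) == 1961]
x[time(x) >= 1959 & time(x) <= 1960]

# A named vector makes year-based lookup explicit.
by_year <- setNames(as.numeric(x), as.vector(time(x)))
by_year["1961"]
```

For series with frequency greater than 1, time() returns fractional values, so exact comparisons with == may be fragile; window() with start = c(year, period) is probably the more robust option there.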

A better way to plot lots of lines (in ggplot perhaps)?

Using R 3.0.2, I have a dataframe that looks like
head(tmp)
0 5 10 15 30 60 120 180 240
YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
Where row.names() are variable names, names() are measurement times, and the values are measurements. It's several thousand rows deep. Let's call it tmp.
I want to do a sanity check of plotting every variable as time versus value as a line-plot on one plot. What's a better way to do it than naively plotting each line with plot() and lines():
timez <- names(tmp)
plot(x=timez, y=tmp[1,], type="l", ylim=c(-5,5))
for (i in 2:length(tmp[,1])) {
lines(x=timez,y=tmp[i,])
}
The above crude answer is good enough, but I'm looking for a way to do this right. I had a concussion recently, so sorry if I'm missing something obvious. I've been doing that a lot.
Could it be something with transposing the data.frame so it's each timepoint observed across several thousand variables? Or melt()-ing the data.frame in some meaningful way? Is there some way of handling it in ggplot using aggregate()s of data.frames or something? This isn't the right way to do this, is it?
At a loss.
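I personally prefer ggplot2 for all of my plotting needs. Assuming I've understood you correctly, you can put the data in long format with reshape2 and then use ggplot2 to plot all of your lines on the same plot. The code below assumes df holds the data with the row names in a column called var and the time columns named X0, X5, and so on (hence the substring() call that strips the leading X):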
library(reshape2)
df2<-melt(df,id.var="var")
names(df2)<-c("var","time","value")
df2$time<-as.numeric(substring(df2$time,2))
library(ggplot2)
ggplot(df2,aes(x=time,y=value,colour=var))+geom_line()
You can simply use matplot as follows
DF
## 0 5 10 15 30 60 120 180 240
## YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
## YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
## YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
## YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
## YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
## YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
matplot(t(DF), type = "l", xaxt = "n", ylab = "")
axis(side = 1, at = seq_along(DF), labels = names(DF))
xaxt = "n" suppresses plotting of the x-axis annotations. The axis() function lets you specify the details of any axis; here it is used to set the x-axis labels.
This should draw one line per row of DF, with the measurement times as x-axis labels.
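A possible variation, sketched below on a toy subset of the data, passes the measurement times to matplot() as numeric x values, so the spacing on the x-axis reflects real elapsed time rather than column position. The semi-transparent colour is a suggestion for the several-thousand-line case, and the pdf(NULL) / dev.off() pair is only there so the sketch runs without a display; both can be dropped in an interactive session.

```r
# Toy stand-in for tmp: rows are variables, column names are times.
m <- rbind(YKL134C = c(0.08, -0.03, -0.74, -0.56),
           YMR056C = c(-0.33, -0.26, -0.56, -1.47),
           YBR085W = c(0.55, 3.33, 4.11, 2.19))
colnames(m) <- c("0", "5", "10", "60")
DF <- as.data.frame(m)

times <- as.numeric(names(DF))   # column names converted to numeric times

pdf(NULL)                        # null device, so no file or window is needed
matplot(x = times, y = t(DF), type = "l", lty = 1,
        col = adjustcolor("black", alpha.f = 0.3),  # transparency for dense plots
        xlab = "time", ylab = "value")
invisible(dev.off())
```

Each column of t(DF) is one variable, which is why matplot() draws one line per original row.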
