Create data frame from EFA output in R - r

I am working on EFA and would like to customize my tables. The function print.psych can suppress factor loadings below a cutoff to make the table easier to read. When I run it, it prints the loadings and the summary stats to the console (in an .Rmd document, it produces console text and a separate data frame of the factor loadings with small loadings suppressed). However, if I try to save the result as an object, this data is not kept.
Here is an example:
library(psych)
bfi_data=bfi
bfi_data=bfi_data[complete.cases(bfi_data),]
bfi_cor <- cor(bfi_data)
factors_data <- fa(r = bfi_cor, nfactors = 6)
print.psych(factors_data, cut = .32, sort = TRUE)
In an R script, it produces this:
   item   MR2   MR3   MR1   MR5   MR4   MR6    h2   u2 com
N2   17  0.83                                0.654 0.35 1.0
N1   16  0.82                                0.666 0.33 1.1
N3   18  0.69                                0.549 0.45 1.1
N5   20  0.47                                0.376 0.62 2.2
N4   19  0.44        0.43                    0.506 0.49 2.4
C4    9       -0.67                          0.555 0.45 1.3
C2    7        0.66                          0.475 0.53 1.4
C5   10       -0.56                          0.433 0.57 1.4
C3    8        0.56                          0.317 0.68 1.1
C1    6        0.54                          0.344 0.66 1.3
In R Markdown, it produces this:
How can I save that data.frame as an object?

Looking at the str of the object, what you want does not appear to be built in. An ugly way would be to use capture.output and convert the resulting character vector to a data frame with string manipulation. Alternatively, since the data is being displayed, it must be present somewhere in the object itself. I found vectors of the same length that can be combined to form the data frame.
loadings <- unclass(factors_data$loadings)
h2 <- factors_data$communalities
#There is also factors_data$communality which has same values
u2 <- factors_data$uniquenesses
com <- factors_data$complexity
data <- cbind(loadings, h2, u2, com)
data
This returns :
# MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
#A1 0.11 0.07 -0.07 -0.56 -0.01 0.35 0.38 0.62 1.85
#A2 0.03 0.09 -0.08 0.64 0.01 -0.06 0.47 0.53 1.09
#A3 -0.04 0.04 -0.10 0.60 0.07 0.16 0.51 0.49 1.26
#A4 -0.07 0.19 -0.07 0.41 -0.13 0.13 0.29 0.71 2.05
#A5 -0.17 0.01 -0.16 0.47 0.10 0.22 0.47 0.53 2.11
#C1 0.05 0.54 0.08 -0.02 0.19 0.05 0.34 0.66 1.32
#C2 0.09 0.66 0.17 0.06 0.08 0.16 0.47 0.53 1.36
#C3 0.00 0.56 0.07 0.07 -0.04 0.05 0.32 0.68 1.09
#C4 0.07 -0.67 0.10 -0.01 0.02 0.25 0.55 0.45 1.35
#C5 0.15 -0.56 0.17 0.02 0.10 0.01 0.43 0.57 1.41
#E1 -0.14 0.09 0.61 -0.14 -0.08 0.09 0.41 0.59 1.34
#E2 0.06 -0.03 0.68 -0.07 -0.08 -0.01 0.56 0.44 1.07
#E3 0.02 0.01 -0.32 0.17 0.38 0.28 0.51 0.49 3.28
#E4 -0.07 0.03 -0.49 0.25 0.00 0.31 0.56 0.44 2.26
#E5 0.16 0.27 -0.39 0.07 0.24 0.04 0.41 0.59 3.01
#N1 0.82 -0.01 -0.09 -0.09 -0.03 0.02 0.67 0.33 1.05
#N2 0.83 0.02 -0.07 -0.07 0.01 -0.07 0.65 0.35 1.04
#N3 0.69 -0.03 0.13 0.09 0.02 0.06 0.55 0.45 1.12
#N4 0.44 -0.14 0.43 0.09 0.10 0.01 0.51 0.49 2.41
#N5 0.47 -0.01 0.21 0.21 -0.17 0.09 0.38 0.62 2.23
#O1 -0.05 0.07 -0.01 -0.04 0.57 0.09 0.36 0.64 1.11
#O2 0.12 -0.09 0.01 0.12 -0.43 0.28 0.30 0.70 2.20
#O3 0.01 0.00 -0.10 0.05 0.65 0.04 0.48 0.52 1.06
#O4 0.10 -0.05 0.34 0.15 0.37 -0.04 0.24 0.76 2.55
#O5 0.04 -0.04 -0.02 -0.01 -0.50 0.30 0.33 0.67 1.67
#gender 0.20 0.09 -0.12 0.33 -0.21 -0.15 0.18 0.82 3.58
#education -0.03 0.01 0.05 0.11 0.12 -0.22 0.07 0.93 2.17
#age -0.06 0.07 -0.02 0.16 0.03 -0.26 0.10 0.90 2.05
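To get closer to the print.psych look, the small loadings in the combined matrix can be masked before converting to a data frame. The following is only a minimal sketch: `combined` is a made-up stand-in for the result of cbind(loadings, h2, u2, com), and the names `cutoff` and `factor_cols` are assumptions for illustration, not anything psych provides.

```r
# Hypothetical stand-in for cbind(loadings, h2, u2, com); the values are made up.
combined <- cbind(
  MR1 = c(0.82, 0.05, -0.44),
  MR2 = c(-0.01, 0.66, 0.43),
  h2  = c(0.67, 0.47, 0.51)
)
rownames(combined) <- c("N1", "C2", "N4")

cutoff <- 0.32                 # assumed cutoff, same role as cut = .32
factor_cols <- c("MR1", "MR2") # columns that hold factor loadings

# Mask loadings whose absolute value is below the cutoff.
# NA (rather than "") keeps the columns numeric.
loadings <- combined[, factor_cols]
loadings[abs(loadings) < cutoff] <- NA

# Assemble an ordinary data frame that can be saved or passed on.
tab <- data.frame(
  item = rownames(combined),
  loadings,
  h2 = combined[, "h2"],
  row.names = NULL
)
tab
```

Comparing on abs() matters here, because a plain `< cutoff` test would also blank out strong negative loadings such as -0.44.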

Ronak Shaw answered my question above, and I used his answer to create the following function, which nearly reproduces the print.psych data frame of fa.sort output:
library(psych)
library(dplyr)
library(tibble)
fa_table <- function(x, cut) {
  # get sorted loadings as a plain numeric matrix
  loadings <- fa.sort(x)$loadings %>% unclass() %>% round(3)
  # blank out loadings whose absolute value is below the cutoff
  # (this turns the matrix into a character matrix)
  loadings[abs(loadings) < cut] <- ""
  # get additional info
  add_info <- cbind(x$communalities,
                    x$uniquenesses,
                    x$complexity) %>%
    as.data.frame() %>%
    rename("commonality" = V1,
           "uniqueness" = V2,
           "complexity" = V3) %>%
    rownames_to_column("item")
  # build table, matching the extra columns to the sorted items by name
  loadings %>%
    as.data.frame() %>%
    rownames_to_column("item") %>%
    left_join(add_info, by = "item") %>%
    mutate(across(where(is.numeric), ~ round(.x, 3)))
}

Related

Julia plot applies which function for colors?

When using the Plots.plot function in the case below, apparently the matrix m is taken as the colors, but also is not shown on every point. There seems to be a function applied to the values before it is displayed.
The matrix m has no value at 0 nor 1, but the image shows a lot of white areas... It seems to work out some kind of levels...
How can I find out which function is used?
For example I would like to be able to use the data after that transformation.
# I define a matrix of 21 x 21 pixels
m = Float32[0.22 0.24 0.24 0.26 0.3 0.33 0.33 0.36 0.42 0.4 0.38 0.39 0.42 0.44 0.49 0.53 0.54 0.55 0.56 0.56 0.56; 0.23 0.24 0.25 0.29 0.32 0.36 0.39 0.41 0.44 0.42 0.41 0.44 0.45 0.46 0.54 0.59 0.61 0.61 0.59 0.58 0.58; 0.26 0.26 0.27 0.33 0.36 0.4 0.41 0.44 0.48 0.49 0.46 0.48 0.46 0.48 0.54 0.56 0.58 0.62 0.6 0.6 0.59; 0.27 0.28 0.32 0.36 0.4 0.41 0.44 0.46 0.47 0.47 0.46 0.48 0.46 0.5 0.54 0.57 0.56 0.61 0.6 0.58 0.57; 0.19 0.2 0.25 0.32 0.39 0.42 0.47 0.47 0.47 0.46 0.45 0.47 0.46 0.5 0.56 0.58 0.57 0.62 0.6 0.59 0.58; 0.2 0.2 0.24 0.32 0.34 0.36 0.39 0.42 0.47 0.48 0.46 0.47 0.45 0.49 0.56 0.57 0.6 0.63 0.58 0.59 0.59; 0.21 0.2 0.27 0.34 0.35 0.35 0.36 0.37 0.39 0.45 0.46 0.47 0.45 0.48 0.56 0.62 0.62 0.61 0.58 0.58 0.58; 0.23 0.24 0.31 0.35 0.36 0.38 0.37 0.38 0.38 0.4 0.44 0.45 0.47 0.47 0.55 0.64 0.58 0.58 0.58 0.58 0.57; 0.22 0.28 0.35 0.36 0.37 0.38 0.39 0.39 0.4 0.42 0.43 0.43 0.45 0.47 0.53 0.57 0.56 0.56 0.57 0.57 0.57; 0.21 0.28 0.34 0.36 0.37 0.38 0.39 0.41 0.42 0.42 0.44 0.45 0.45 0.49 0.51 0.54 0.56 0.56 0.56 0.56 0.56; 0.22 0.27 0.31 0.32 0.34 0.37 0.39 0.39 0.39 0.43 0.46 0.46 0.49 0.51 0.51 0.54 0.55 0.56 0.56 0.55 0.55; 0.23 0.27 0.3 0.31 0.33 0.35 0.38 0.38 0.39 0.44 0.46 0.49 0.51 0.5 0.52 0.53 0.55 0.56 0.56 0.56 0.56; 0.23 0.27 0.31 0.34 0.36 0.36 0.38 0.4 0.42 0.45 0.45 0.49 0.51 0.52 0.55 0.55 0.56 0.58 0.58 0.57 0.58; 0.26 0.32 0.35 0.36 0.36 0.36 0.4 0.44 0.47 0.47 0.47 0.49 0.51 0.54 0.56 0.58 0.58 0.58 0.59 0.58 0.57; 0.3 0.33 0.35 0.35 0.36 0.37 0.4 0.47 0.5 0.47 0.46 0.47 0.5 0.53 0.56 0.58 0.6 0.6 0.61 0.6 0.59; 0.31 0.35 0.36 0.36 0.37 0.4 0.44 0.48 0.49 0.46 0.45 0.45 0.5 0.55 0.58 0.6 0.62 0.62 0.61 0.6 0.58; 0.33 0.39 0.41 0.39 0.38 0.44 0.47 0.49 0.5 0.47 0.46 0.45 0.47 0.52 0.54 0.56 0.57 0.58 0.56 0.56 0.55; 0.33 0.4 0.42 0.45 0.46 0.48 0.49 0.5 0.51 0.48 0.46 0.45 0.46 0.49 0.54 0.56 0.56 0.59 0.6 0.59 0.57; 0.37 0.41 0.43 0.47 0.5 0.52 0.49 0.48 0.51 0.49 0.47 0.47 0.47 0.49 0.58 0.61 0.62 0.62 0.62 0.61 0.6; 0.4 0.43 0.47 0.49 0.49 0.51 0.49 0.48 0.51 0.49 0.48 0.49 0.46 0.51 0.59 0.59 0.6 0.6 0.59 0.59 0.59; 0.42 0.47 0.5 0.5 0.49 0.48 0.49 0.49 0.51 0.49 0.5 0.49 0.49 0.55 0.58 0.56 0.57 0.61 0.6 0.58 0.58]
using Plots
plot(1:21, 1:21, m)
It looks like plot called with two vectors and a matrix produces a contour plot - so you'll get the same output from contour(1:21, 1:21, m).
If you want to get rid of the whitespace, use the fill = true keyword:
julia> plot(1:21, 1:21, m, fill = true)

R. Remove blocks of observations in df if they fulfill condition

I have a huge dataframe (>1,000,000 rows) like this.
term estimate st.error statistic p.value SNP
(Intercept) 7.68 0.17 44.64 0 rs1406947
GT 0.01 0.01 0.07 0.19 rs1406947
SEX 1.52 0.14 10.87 0.1 rs1406947
M 0.12 0.29 0.41 0.67 rs1406947
N -0.06 0.12 -0.48 0.63 rs1406947
GT:SEX -0.03 0.08 -0.44 0.65 rs1406947
GT:N -0.00 0.06 -0.08 0.93 rs1406947
(Intercept) 9.23 0.20 34.64 0 rs25904
GT 0.05 0.04 0.12 0.22 rs25904
SEX 1.67 0.76 10.34 0.1 rs25904
M 0.14 0.39 0.51 0.55 rs25904
N -0.08 0.05 -0.46 0.55 rs25904
GT:SEX -0.19 0.11 -0.34 0.44 rs25904
GT:N -0.22 0.33 -0.44 0.55 rs25904
(Intercept) 7.99 0.66 44.44 0 rs7133579
GT 0.01 0.3 0.04 0.33 rs7133579
SEX 1.22 0.22 10.44 0.15 rs7133579
M 0.88 0.22 0.33 0.44 rs7133579
N -0.5 0.5 -0.5 0.6 rs7133579
GT:N -0.00 0.03 -0.04 0.78 rs7133579
It is composed of blocks of 7 observations: (Intercept), GT, SEX, M, N, GT:SEX and GT:N. However, a few blocks lack one or more of the observations (e.g. the third block lacks GT:SEX). Using R, I want to remove these incomplete blocks. In this toy example I would get:
term estimate st.error statistic p.value SNP
(Intercept) 7.68 0.17 44.64 0 rs1406947
GT 0.01 0.01 0.07 0.19 rs1406947
SEX 1.52 0.14 10.87 0.1 rs1406947
M 0.12 0.29 0.41 0.67 rs1406947
N -0.06 0.12 -0.48 0.63 rs1406947
GT:SEX -0.03 0.08 -0.44 0.65 rs1406947
GT:N -0.00 0.06 -0.08 0.93 rs1406947
(Intercept) 9.23 0.20 34.64 0 rs25904
GT 0.05 0.04 0.12 0.22 rs25904
SEX 1.67 0.76 10.34 0.1 rs25904
M 0.14 0.39 0.51 0.55 rs25904
N -0.08 0.05 -0.46 0.55 rs25904
GT:SEX -0.19 0.11 -0.34 0.44 rs25904
GT:N -0.22 0.33 -0.44 0.55 rs25904
I think you'd want to group by SNP and check those blocks for whether they comply with your expectations:
library(dplyr)
expected_terms <- c("(Intercept)", "GT", "SEX", "M", "N", "GT:SEX", "GT:N")
df %>%
group_by(SNP) %>%
filter(
all(expected_terms %in% term)
)
A stricter version, if you need to make sure that each of your terms appears exactly once and that no other terms appear:
df %>%
group_by(SNP) %>%
filter(
# use `table` to count occurrence of terms, keep only if all are counted exactly once
all(table(term)[expected_terms] == 1),
# keep only if no terms remain after removing your expected set
length(setdiff(term, expected_terms)) == 0
)
Assuming that (Intercept) is present every time and marks the start of a block, you can test whether the length of each block is 7.
x[unlist(lapply(split(seq_len(nrow(x)), cumsum(x$term == "(Intercept)")),
function(y) {if(length(y) == 7) y else NULL})), ]
# term estimate st.error statistic p.value SNP
#1 (Intercept) 7.68 0.17 44.64 0.00 rs1406947
#2 GT 0.01 0.01 0.07 0.19 rs1406947
#3 SEX 1.52 0.14 10.87 0.10 rs1406947
#4 M 0.12 0.29 0.41 0.67 rs1406947
#5 N -0.06 0.12 -0.48 0.63 rs1406947
#6 GT:SEX -0.03 0.08 -0.44 0.65 rs1406947
#7 GT:N 0.00 0.06 -0.08 0.93 rs1406947
#8 (Intercept) 9.23 0.20 34.64 0.00 rs25904
#9 GT 0.05 0.04 0.12 0.22 rs25904
#10 SEX 1.67 0.76 10.34 0.10 rs25904
#11 M 0.14 0.39 0.51 0.55 rs25904
#12 N -0.08 0.05 -0.46 0.55 rs25904
#13 GT:SEX -0.19 0.11 -0.34 0.44 rs25904
#14 GT:N -0.22 0.33 -0.44 0.55 rs25904
Data:
x <- read.table(header=TRUE, text="term estimate st.error statistic p.value SNP
(Intercept) 7.68 0.17 44.64 0 rs1406947
GT 0.01 0.01 0.07 0.19 rs1406947
SEX 1.52 0.14 10.87 0.1 rs1406947
M 0.12 0.29 0.41 0.67 rs1406947
N -0.06 0.12 -0.48 0.63 rs1406947
GT:SEX -0.03 0.08 -0.44 0.65 rs1406947
GT:N -0.00 0.06 -0.08 0.93 rs1406947
(Intercept) 9.23 0.20 34.64 0 rs25904
GT 0.05 0.04 0.12 0.22 rs25904
SEX 1.67 0.76 10.34 0.1 rs25904
M 0.14 0.39 0.51 0.55 rs25904
N -0.08 0.05 -0.46 0.55 rs25904
GT:SEX -0.19 0.11 -0.34 0.44 rs25904
GT:N -0.22 0.33 -0.44 0.55 rs25904
(Intercept) 7.99 0.66 44.44 0 rs7133579
GT 0.01 0.3 0.04 0.33 rs7133579
SEX 1.22 0.22 10.44 0.15 rs7133579
M 0.88 0.22 0.33 0.44 rs7133579
N -0.5 0.5 -0.5 0.6 rs7133579
GT:N -0.00 0.03 -0.04 0.78 rs7133579")
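The block-length idea can also be written without split and lapply. This is a rough base R sketch on a smaller made-up data frame with blocks of 3 terms instead of 7; `block_size`, `block_id` and `block_len` are names chosen here for illustration. Like the answer above, it assumes every block starts with an (Intercept) row.

```r
# Toy data: block "b" lacks its third term (values are made up).
x <- data.frame(
  term = c("(Intercept)", "GT", "SEX",
           "(Intercept)", "GT",
           "(Intercept)", "GT", "SEX"),
  SNP  = c("a", "a", "a", "b", "b", "c", "c", "c"),
  stringsAsFactors = FALSE
)

block_size <- 3  # would be 7 for the real data

# Each "(Intercept)" row starts a new block, so the running count of
# intercept rows gives a block id for every row.
block_id <- cumsum(x$term == "(Intercept)")

# ave() repeats each block's length on every row of that block.
block_len <- ave(seq_along(block_id), block_id, FUN = length)

# Keep only rows that belong to blocks of the expected size.
complete <- x[block_len == block_size, ]
complete
```

Note that a length check alone would not catch a block that has the right number of rows but a wrong or duplicated term; the grouped filter on expected_terms covers that case.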

R. Add column to df where rows have names of element from list

I have a list of all files (dataframes) within a directory:
library("plyr")
library("dplyr")
library("broom")
library("tidyr")
snp_list <- list.files(pattern="*.txt", all.files = T,full.names = F)
I also have a dataframe A obtained through the following function:
pv1= lapply(snp_list, function(x) tidy(lm(PV ~ GT*SEX + M + GT*N,read.table(x,header=TRUE)))) %>%
bind_rows()
Dataframe A has 7 rows ((Intercept), GT, SEX, M, N, GT:SEX, GT:N) for each element in list snp_list. In this toy example the list has 3 elements (rs1406947.txt rs25904.txt rs7133579.txt), but in reality there are 1,200,000 elements
A:
term estimate st.error statistic p.value
(Intercept) 7.68 0.17 44.64 0
GT 0.01 0.01 0.07 0.19
SEX 1.52 0.14 10.87 0.1
M 0.12 0.29 0.41 0.67
N -0.06 0.12 -0.48 0.63
GT:SEX -0.03 0.08 -0.44 0.65
GT:N -0.00 0.06 -0.08 0.93
(Intercept) 9.23 0.20 34.64 0
GT 0.05 0.04 0.12 0.22
SEX 1.67 0.76 10.34 0.1
M 0.14 0.39 0.51 0.55
N -0.08 0.05 -0.46 0.55
GT:SEX -0.19 0.11 -0.34 0.44
GT:N -0.22 0.33 -0.44 0.55
(Intercept) 7.99 0.66 44.44 0
GT 0.01 0.3 0.04 0.33
SEX 1.22 0.22 10.44 0.15
M 0.88 0.22 0.33 0.44
N -0.5 0.5 -0.5 0.6
GT:SEX -0.06 0.09 -0.74 0.35
GT:N -0.00 0.03 -0.04 0.78
I want to add a new column "SNP" to A, where each row has the name of the element the rows belongs to (nrows = 7*1,200,000). I would get this:
term estimate st.error statistic p.value SNP
(Intercept) 7.68 0.17 44.64 0 rs1406947
GT 0.01 0.01 0.07 0.19 rs1406947
SEX 1.52 0.14 10.87 0.1 rs1406947
M 0.12 0.29 0.41 0.67 rs1406947
N -0.06 0.12 -0.48 0.63 rs1406947
GT:SEX -0.03 0.08 -0.44 0.65 rs1406947
GT:N -0.00 0.06 -0.08 0.93 rs1406947
(Intercept) 9.23 0.20 34.64 0 rs25904
GT 0.05 0.04 0.12 0.22 rs25904
SEX 1.67 0.76 10.34 0.1 rs25904
M 0.14 0.39 0.51 0.55 rs25904
N -0.08 0.05 -0.46 0.55 rs25904
GT:SEX -0.19 0.11 -0.34 0.44 rs25904
GT:N -0.22 0.33 -0.44 0.55 rs25904
(Intercept) 7.99 0.66 44.44 0 rs7133579
GT 0.01 0.3 0.04 0.33 rs7133579
SEX 1.22 0.22 10.44 0.15 rs7133579
M 0.88 0.22 0.33 0.44 rs7133579
N -0.5 0.5 -0.5 0.6 rs7133579
GT:SEX -0.06 0.09 -0.74 0.35 rs7133579
GT:N -0.00 0.03 -0.04 0.78 rs7133579
Here's how to do what you asked:
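A$SNP <- character(nrow(A))
for (i in seq_len(nrow(A))) {
  # rows 1-7 map to element 1, rows 8-14 to element 2, and so on
  A$SNP[i] <- snp_list[((i - 1) %/% 7) + 1]
}
Using integer division on the zero-based row index, (i - 1) %/% 7, every run of 7 consecutive rows maps to one element of snp_list.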
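With over a million files, a vectorized alternative to the loop is probably worth considering. Below is a minimal sketch using rep() with each = 7; it assumes that every file contributed exactly 7 rows, in the same order as snp_list. The toy `A` and the name `rows_per_file` are made up for illustration, and tools::file_path_sans_ext() is used to drop the ".txt" so the labels match the desired output.

```r
# File names as list.files() would return them (toy example from the question).
snp_list <- c("rs1406947.txt", "rs25904.txt", "rs7133579.txt")

# Assumption: every model contributes exactly this many rows, in file order.
rows_per_file <- 7

# Stand-in for the real A: only the term column is needed for the illustration.
terms <- c("(Intercept)", "GT", "SEX", "M", "N", "GT:SEX", "GT:N")
A <- data.frame(term = rep(terms, times = length(snp_list)),
                stringsAsFactors = FALSE)

# Repeat each file name (without its extension) once per row of its block.
A$SNP <- rep(tools::file_path_sans_ext(snp_list), each = rows_per_file)

A[c(1, 7, 8, 15, 21), ]
```

If some models return fewer than 7 rows (for example when a coefficient is dropped), this positional mapping goes wrong, and attaching the file name to each tidy() result inside the lapply call would be the safer route.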

Filter rows of dataframe based on combinations of conditions

Let's say we have df1 with p values:
Symbol p1 p2 p3 p4 p5
AABT 0.01 0.12 0.23 0.02 0.32
ABC1 0.13 0.01 0.01 0.12 0.02
ACDC 0.15 0.01 0.34 0.24 0.01
BAM1 0.01 0.02 0.04 0.01 0.02
BCR 0.01 0.36 0.02 0.07 0.04
BDSM 0.02 0.43 0.01 0.03 0.41
BGL 0.27 0.77 0.01 0.04 0.02
and df2 with Fold Changes:
Symbol FC1 FC2 FC3 FC4 FC5
AABT 1.21 -0.32 0.23 -0.72 0.45
ABC1 0.13 0.93 -1.61 0.12 1.03
ACDC 0.23 1.31 0.42 -0.39 1.50
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23
I would like to do the following in df2:
Keep rows that in df1, have values < 0.05 in 3/5 of columns or greater
Eliminate rows that show discordant signs of FC. FC should be taken into consideration only when the respective p from df1 is lower than 0.05 (i.e. significant)
Sort the resulting data in an intuitive order so as to discriminate rows having positive FC from rows having negative FC, and if possible, discriminate rows whose significances in FC arise sequentially (e.g. FC3 FC4 FC5) from others that don't (e.g. FC1 FC3 FC5)
For example, step 1 would result in:
Symbol FC1 FC2 FC3 FC4 FC5
ABC1 0.13 0.93 -1.61 0.12 1.03
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23
and step 2, in:
Symbol FC1 FC2 FC3 FC4 FC5
BCR 1.43 -0.25 1.29 0.54 0.97
BGL 0.33 0.12 -1.33 -1.14 -1.23
How can this be achieved? I imagine using a for loop and the count function would do the job for step 1, but steps 2 and 3 look somewhat complicated to me. Thank you in advance for your elegant solutions.
data
df1:
df1 <- read.table(h=T,strin=F,text="Symbol p1 p2 p3 p4 p5
AABT 0.01 0.12 0.23 0.02 0.32
ABC1 0.13 0.01 0.01 0.12 0.02
ACDC 0.15 0.01 0.34 0.24 0.01
BAM1 0.01 0.02 0.04 0.01 0.02
BCR 0.01 0.36 0.02 0.07 0.04
BDSM 0.02 0.43 0.01 0.03 0.41
BGL 0.27 0.77 0.01 0.04 0.02")
df2:
df2 <- read.table(h=T,strin=F,text="Symbol FC1 FC2 FC3 FC4 FC5
AABT 1.21 -0.32 0.23 -0.72 0.45
ABC1 0.13 0.93 -1.61 0.12 1.03
ACDC 0.23 1.31 0.42 -0.39 1.50
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23")
I'm not sure how elegant this is, but you can get the result you requested using apply and sapply with subsetting, like this:
# Create logical matrix telling us whether p values are significant
sig <- apply(df1[-1], 2, function(x) x < 0.05)
# Create numeric matrix of the sign of each FC (will be either -1 or 1)
sign <- apply(df2[-1], 2, function(x) sign(x))
# Create a vector telling us whether there were 3 or more p < 0.05 in each row
ss1 <- apply(sig, 1, function(x) length(which(x)) > 2)
# Create a vector telling us whether all FC signs match excluding p = ns
ss2 <- sapply(seq(nrow(df1)), function(i) length(table(sign[i,][sig[i,]])) == 1)
# Subset the data frames accordingly:
df1[ss1, ]
#> Symbol p1 p2 p3 p4 p5
#> 2 ABC1 0.13 0.01 0.01 0.12 0.02
#> 4 BAM1 0.01 0.02 0.04 0.01 0.02
#> 5 BCR 0.01 0.36 0.02 0.07 0.04
#> 6 BDSM 0.02 0.43 0.01 0.03 0.41
#> 7 BGL 0.27 0.77 0.01 0.04 0.02
df2[ss1 & ss2, ]
#> Symbol FC1 FC2 FC3 FC4 FC5
#> 5 BCR 1.43 -0.25 1.29 0.54 0.97
#> 7 BGL 0.33 0.12 -1.33 -1.14 -1.23
Created on 2020-07-10 by the reprex package (v0.3.0)
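The same two filters can also be expressed with rowSums on logical matrices, which avoids the row-wise apply calls. This is only a sketch: it assumes the rows of df1 and df2 are in the same Symbol order, and the `direction` column and the final ordering are a rough take on step 3 that separates positive from negative rows only (it does not look at whether the significant columns are consecutive).

```r
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Symbol p1 p2 p3 p4 p5
AABT 0.01 0.12 0.23 0.02 0.32
ABC1 0.13 0.01 0.01 0.12 0.02
ACDC 0.15 0.01 0.34 0.24 0.01
BAM1 0.01 0.02 0.04 0.01 0.02
BCR 0.01 0.36 0.02 0.07 0.04
BDSM 0.02 0.43 0.01 0.03 0.41
BGL 0.27 0.77 0.01 0.04 0.02")

df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Symbol FC1 FC2 FC3 FC4 FC5
AABT 1.21 -0.32 0.23 -0.72 0.45
ABC1 0.13 0.93 -1.61 0.12 1.03
ACDC 0.23 1.31 0.42 -0.39 1.50
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23")

# Assumption: df1 and df2 list the same symbols in the same row order.
stopifnot(identical(df1$Symbol, df2$Symbol))

sig <- as.matrix(df1[-1]) < 0.05   # TRUE where the p value is significant
fc  <- as.matrix(df2[-1])

n_sig <- rowSums(sig)              # number of significant columns per row
n_pos <- rowSums(sig & fc > 0)     # significant and positive FC
n_neg <- rowSums(sig & fc < 0)     # significant and negative FC

# Step 1: at least 3 significant columns.
# Step 2: significant FCs must not mix signs.
keep <- n_sig >= 3 & (n_pos == 0 | n_neg == 0)

result <- df2[keep, ]
result$direction <- ifelse(n_pos[keep] > 0, "up", "down")

# Rough step 3: rows with positive FC first, then rows with negative FC.
result <- result[order(result$direction, decreasing = TRUE), ]
result
```

On the example data this keeps BCR (positive) and BGL (negative), the same two rows as the apply-based version.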

How to subset a time series in R

In particular, I'd like to subset the temperature measurements from 1960 onwards in the time series gtemp in the package astsa:
require(astsa)
gtemp
Time Series:
Start = 1880
End = 2009
Frequency = 1
[1] -0.28 -0.21 -0.26 -0.27 -0.32 -0.32 -0.29 -0.36 -0.27 -0.17 -0.39 -0.27 -0.32
[14] -0.33 -0.33 -0.25 -0.14 -0.11 -0.25 -0.15 -0.07 -0.14 -0.24 -0.30 -0.34 -0.24
[27] -0.19 -0.39 -0.33 -0.35 -0.33 -0.34 -0.32 -0.30 -0.15 -0.10 -0.30 -0.39 -0.33
[40] -0.20 -0.19 -0.14 -0.26 -0.22 -0.22 -0.17 -0.02 -0.15 -0.12 -0.26 -0.08 -0.02
[53] -0.08 -0.19 -0.07 -0.12 -0.05 0.07 0.10 0.01 0.04 0.10 0.03 0.09 0.19
[66] 0.06 -0.05 0.00 -0.04 -0.07 -0.16 -0.04 0.03 0.11 -0.10 -0.10 -0.17 0.08
[79] 0.08 0.06 -0.01 0.07 0.04 0.08 -0.21 -0.11 -0.03 -0.01 -0.04 0.08 0.03
[92] -0.10 0.00 0.14 -0.08 -0.05 -0.16 0.12 0.01 0.08 0.18 0.26 0.04 0.26
[105] 0.09 0.05 0.12 0.26 0.31 0.19 0.37 0.35 0.12 0.13 0.23 0.37 0.29
[118] 0.39 0.56 0.32 0.33 0.48 0.56 0.55 0.48 0.62 0.54 0.57 0.43 0.57
The individual time points are not labeled in years, so although I can do gtemp[3] [1] -0.26, I can't do gtemp[as.date(1960)], for instance to get the value in 1960.
How can I bring out the correspondence between year and measurements, so as to later subset values?
We can make use of the window function
gtemp1 <- window(gtemp, start = 1960)
gtemp1
#Time Series:
#Start = 1960
#End = 2009
#Frequency = 1
#[1] -0.01 0.07 0.04 0.08 -0.21 -0.11 -0.03 -0.01 -0.04 0.08 0.03
#[12]-0.10 0.00 0.14 -0.08 -0.05 -0.16 0.12 0.01 0.08 0.18 0.26
#[23] 0.04 0.26 0.09 0.05 0.12 0.26 0.31 0.19 0.37 0.35 0.12
#[34] 0.13 0.23 0.37 0.29 0.39 0.56 0.32 0.33 0.48 0.56 0.55
#[45] 0.48 0.62 0.54 0.57 0.43 0.57
The function time can also help to answer your question, "How can I bring out the correspondence between year and measurements, so as to later subset values?":
head(time(gtemp))
[1] 1880 1881 1882 1883 1884 1885
If you want the value that corresponds to 1961, you can write
gtemp[time(gtemp) == 1961]
[1] 0.07
As mentioned in the first answer, you can also use the function window
window(gtemp, start = 1961, end = 1961)
Time Series:
Start = 1961
End = 1961
Frequency = 1
[1] 0.07
which returns the result as a one-point time series. You can convert it into a number with
as.numeric(window(gtemp, start = 1961, end = 1961))
[1] 0.07
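If you prefer to have the year next to each measurement explicitly, time() and the series can be put side by side in a data frame. A small sketch, using a made-up annual series `temps` in place of gtemp so that it runs without the astsa package; the column names `year` and `value` are arbitrary choices.

```r
# Made-up annual series standing in for gtemp (values are illustrative only).
temps <- ts(c(-0.10, 0.05, 0.20, 0.15, 0.30, 0.45), start = 1958, frequency = 1)

# time() returns the time stamp of every observation, so pairing it with
# the values makes the year/measurement correspondence explicit.
lookup <- data.frame(
  year  = as.numeric(time(temps)),
  value = as.numeric(temps)
)

# Ordinary data frame subsetting by year.
from_1960 <- lookup[lookup$year >= 1960, ]

# The same values via window(), which keeps the ts attributes instead.
same_values <- as.numeric(window(temps, start = 1960))

from_1960
```

The data frame route is handy for merging or plotting by year, while window() is the better choice when the result should remain a ts object.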
