I want to sum each month across all the years in a time series that looks like this:
Jan Feb Mar Apr Jun Jul Aug Sep Oct Nov Dec
2006 4 4 3 4 4 5 5 3 3
2007 3 3 2 2 4 3 3 2 2 5 5
2008 3 3 3 2 2 4 4 3
by using
window(the_time_series_object, start = c(2006, 3), end = c(2008, 3), frequency = 1)
This line gives you a new ts object with just the March values of 2006-2007. However, this does not work when a month has no value in it. Is there any way to replace the gaps with NA? I have seen questions like this before, but I don't think they answer it for a ts object.
Assuming that
the_time_series_object <- ts(1:31, frequency = 12, start = c(2006, 3))
then:
window(the_time_series_object, start = c(2006, 3), end = c(2008, 3), frequency = 12)
Your frequency should be 12 instead of 1. There's no NA problem; that one argument is simply wrong.
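For comparison, here is a small sketch of both frequency settings plus a per-month total across years. It assumes the gaps are stored as NA so the monthly spacing stays regular; the series x and its values (simply 1:36 with a few NAs) are made up for illustration. With frequency = 1, window() keeps one observation per year (the March values); with frequency = 12 it keeps every month in the range.

```r
# Made-up monthly series for 2006-2008 (values are just 1:36);
# months with no observation are stored as NA so the spacing stays regular
vals <- 1:36
vals[c(5, 11, 12, 17, 29, 34:36)] <- NA
x <- ts(vals, start = c(2006, 1), frequency = 12)

# frequency = 1 thins the monthly series to one value per year: each March
march <- window(x, start = c(2006, 3), end = c(2008, 3), frequency = 1)

# frequency = 12 (the series' own frequency) keeps every month in the range
mar06_to_mar08 <- window(x, start = c(2006, 3), end = c(2008, 3), frequency = 12)

# Total for each calendar month across all years, skipping the NA gaps
month_totals <- setNames(as.vector(tapply(x, cycle(x), sum, na.rm = TRUE)),
                         month.abb)
month_totals
```

tapply() over cycle(x) groups the observations by calendar month; na.rm = TRUE is passed through to sum(), so a month with no data at all gets a total of 0 rather than NA.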
I have a time series with semi-annual (half-yearly) data points.
It seems that the ts() function can't handle that as "frequency = 2" returns a very strange time series object that extends far beyond the actual time period.
Is there any way to do time series analysis of this kind of time series object in R?
EDIT: Here's an example:
> dat <- seq(1, 17, by = 1)
> semi <- ts(dat, start = c(2008,12), frequency = 2)
> semi
Time Series:
Start = c(2013, 2)
End = c(2021, 2)
Frequency = 2
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
I was expecting:
> semi
s1 s2
2008 1
2009 2 3
2010 4 5
2011 6 7
2012 8 9
2013 10 11
2014 12 13
2015 14 15
2016 16 17
First, let me explain why the first ts element is in 2013 instead of 2008. The start and end arguments are counted in periods of the series' frequency. You asked for the 12th period counting from the first period of 2008; with a frequency of 2 that is 2008 + 11/2 = 2013.5, i.e. the second period of 2013.
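The arithmetic can be checked directly. This sketch just rebuilds the series from the question and inspects its time attributes; the name wrong is only illustrative.

```r
dat <- seq(1, 17, by = 1)

# start = c(2008, 12) means "period 12, counting from period 1 of 2008";
# with 2 periods per year that is 2008 + (12 - 1) / 2 = 2013.5
wrong <- ts(dat, start = c(2008, 12), frequency = 2)

tsp(wrong)     # start time, end time, frequency: 2013.5 2021.5 2
start(wrong)   # 2013 2, i.e. the second half of 2013
```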
This should work for the period:
semi <- ts(dat, start = c(2008,2), frequency = 2)
semi is still the correct time series; print() just doesn't name the periods when the frequency is 2. If you plot the time series, the correct half-yearly graph is shown.
plot.ts(semi)
In a related question, someone explained the standard frequencies that ts() knows about.
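If a labelled layout is what you're after, print() for ts objects accepts a calendar argument. As far as I know, calendar = TRUE lays any frequency above 1 out as a year-by-period table, with generic column names p1, p2, ... (not the s1/s2 from the question) when the frequency is not 4 or 12. A minimal sketch:

```r
dat <- seq(1, 17, by = 1)
semi <- ts(dat, start = c(2008, 2), frequency = 2)

# Default print for frequency 2 is a plain vector with a Start/End header;
# calendar = TRUE lays the values out by year with period columns p1 and p2
print(semi, calendar = TRUE)

# The time attributes confirm the series runs from 2008 H2 to 2016 H2
start(semi)   # 2008 2
end(semi)     # 2016 2
cycle(semi)   # half-year of each observation: 2, 1, 2, 1, ...
```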
I have aggregated a table from my data file using this syntax:
sumtab <- as.data.frame(table(S$MONTH))
colnames(sumtab) <- c("Month", "Frq")
rownames(sumtab) <- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug",
"Sep","Oct","Dec")
Resulting in this table sumtab:
Month Frq
Jan 1 3
Feb 2 5
Mar 3 16
Apr 4 45
May 5 11
Jun 6 16
Jul 7 99
Aug 8 101
Sep 9 45
Oct 10 456
Dec 12 112
And this script produces a ggplot:
ggplot(sumtab, aes(x=Month,y=Frq),width=1.5) +
scale_y_continuous(limit=c(0,17),expand=c(0, 0)) +
geom_bar(stat='identity',fill="lightgreen",colour="black") +
xlab("Month") + ylab("No of bears killed") +
theme_bw(base_size = 11) +
theme(axis.text.x=element_text(angle=0,size=9))
The problem is that there are no values for November in my data, and I need to somehow enter a zero for November in the table. It's probably a simple thing for most of you; I have searched other questions, googled, and read the books, but have been unable to find the correct syntax. I need a little help.
Adding rbind into the script:
sumtab <- as.data.frame(table(S$MONTH))
sumtab <- rbind(sumtab, c(11, 0))
produced this warning message:
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 11) :
invalid factor level, NA generated
and this table:
Var1 Freq
1 1 3
2 2 5
3 3 6
4 4 14
5 5 7
6 6 2
7 7 13
8 8 12
9 9 3
10 10 1
11 12 4
12 <NA> 0
So thanks @PaulH for your help, but I've probably used it in the wrong way.
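You could use rbind to add the November row. Because as.data.frame(table(...)) makes Month a factor, convert it to a number first; otherwise 11 is not a valid factor level and becomes NA, which is the warning you saw:
sumtab$Month <- as.numeric(as.character(sumtab$Month))
sumtab <- rbind(sumtab, Nov = c(11, 0))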
Good luck!
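Another option is to tell table() about all twelve months up front, so November gets a zero count without adding a row by hand. A minimal sketch, assuming S$MONTH holds month numbers 1 to 12; the small S below is made-up data standing in for the real file.

```r
# Made-up stand-in for the real data: no rows fall in November (11)
S <- data.frame(MONTH = c(1, 2, 2, 3, 7, 7, 8, 10, 12))

# Declaring all 12 levels makes table() report 0 for months with no rows
sumtab <- as.data.frame(table(factor(S$MONTH, levels = 1:12)))
colnames(sumtab) <- c("Month", "Frq")

# Rows come out in level order 1..12, so month.abb lines up with them;
# keeping the levels in calendar order stops ggplot sorting bars alphabetically
sumtab$Month <- factor(month.abb, levels = month.abb)
sumtab
```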
I'm trying to learn R coming from Stata, but have run into the following two problems which I cannot seem to find elegant solutions for in R:
1) I have a panel dataset with gaps in my time variable. I would like to expand my time variable to include the gaps despite having no observed data for these rows.
In Stata I would usually go about this by setting my ID and time variables with xtset and then expanding the dataset based on this with tsfill. Is there an equivalently elegant way in R?
2) I would like to fill some of the new, blank cells with data for constant variables.
In Stata I would do this by copying data from previous (relative to my time variable) observations using the l.-prefix; for example using replace Con = l.Con.
In other words I'm asking how to go from something like this:
ID Time Num Con
1 Jan 10 A
1 Feb 15 A
1 May 20 A
2 Feb 12 B
2 Mar 14 B
2 Jun 15 B
To something like this:
ID Time Num Con
1 Jan 10 A
1 Feb 15 A
1 Mar A
1 Apr A
1 May 20 A
2 Feb 12 B
2 Mar 14 B
2 Apr B
2 May B
2 Jun 15 B
Hopefully that makes sense. Thanks in advance.
You can try merge from base R or the data.table join
library(data.table)
DT2 <- setDT(df1)[, {tmp <- match(Time, month.abb)
list(Time=month.abb[min(tmp):max(tmp)])}, .(ID,Con)]
setkey(df1[, c(1,4,2,3), with=FALSE], ID, Con, Time)[DT2]
# ID Con Time Num
# 1: 1 A Jan 10
# 2: 1 A Feb 15
# 3: 1 A Mar NA
# 4: 1 A Apr NA
# 5: 1 A May 20
# 6: 2 B Feb 12
# 7: 2 B Mar 14
# 8: 2 B Apr NA
# 9: 2 B May NA
#10: 2 B Jun 15
NOTE: It may be better to keep missing values as NA.
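The merge() route from base R can be sketched as follows. It assumes the example data from the question is typed in as df1 and that Time holds three-letter month abbreviations matching month.abb; grid and full are illustrative names. The last step fills Con from the previous row within each ID, roughly what replace Con = l.Con does in Stata.

```r
# Example data from the question (typed in by hand)
df1 <- data.frame(ID   = c(1, 1, 1, 2, 2, 2),
                  Time = c("Jan", "Feb", "May", "Feb", "Mar", "Jun"),
                  Num  = c(10, 15, 20, 12, 14, 15),
                  Con  = c("A", "A", "A", "B", "B", "B"),
                  stringsAsFactors = FALSE)

# Full grid: for each ID, every month from its first to its last observed month
grid <- do.call(rbind, lapply(unique(df1$ID), function(id) {
  p <- match(df1$Time[df1$ID == id], month.abb)
  data.frame(ID = id, Time = month.abb[min(p):max(p)], stringsAsFactors = FALSE)
}))

# Left join: months missing from df1 get NA in Num and Con
full <- merge(grid, df1, by = c("ID", "Time"), all.x = TRUE)

# merge() sorts Time alphabetically, so put the rows back in calendar order
full <- full[order(full$ID, match(full$Time, month.abb)), ]
rownames(full) <- NULL

# Carry Con forward from the previous row within each ID (like replace Con = l.Con)
full$Con <- ave(full$Con, full$ID, FUN = function(v) {
  for (i in seq_along(v)[-1]) if (is.na(v[i])) v[i] <- v[i - 1]
  v
})
full
```

Num is left as NA in the new rows, in line with the note above.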
I issue the following commands:
ops <- read.csv("ops.csv")
ops.ts <- ts(ops, frequency=12, start=c(2014,1))
ops.fc <- forecast(ops.ts)
forecast() then throws the following error:
Error in ...fourier(x, K, 1:length(x)) :
K must be not be greater than period/2
The data from the csv looks like this according to summary(ops):
1 10
2 3
3 7
4 4
5 2
6 20
7 13
8 9
9 8
10 7
11 6
12 11
13 7
R is up to date, Forecast is installed via CRAN.
I'd appreciate any advice, especially because I am quite new to R.
The error message is self-explanatory.
You have 13 elements in your dataset, so when you do:
ops.ts <- ts(ops, frequency = 12, start=c(2014, 1))
You get (notice the 2015 value here):
#> ops.ts
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#2014 10 3 7 4 2 20 13 9 8 7 6 11
#2015 7
I'm guessing you only want to use the first 12 months and then call forecast(). If so, you can do either:
ops.ts <- ts(ops, frequency = 12, start = c(2014, 1), end = c(2014, 12))
ops.fc <- forecast(ops.ts)
or
ops <- ops[1:12, ]
ops.ts <- ts(ops, frequency = 12, start = 2014)
ops.fc <- forecast(ops.ts)
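A third option is window(), which trims an existing ts object by date instead of by row index. A small sketch using the 13 values shown in the question; ops.2014 is just an illustrative name, and the result would then be passed to forecast() as above.

```r
# The 13 monthly values shown in the question, Jan 2014 to Jan 2015
vals <- c(10, 3, 7, 4, 2, 20, 13, 9, 8, 7, 6, 11, 7)
ops.ts <- ts(vals, frequency = 12, start = c(2014, 1))

# window() cuts the series at December 2014, dropping the lone 2015 value
ops.2014 <- window(ops.ts, end = c(2014, 12))
length(ops.2014)   # 12
end(ops.2014)      # 2014 12
```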
The text file looks like this:
#---*----1----*----2----*---
Name Time.Period Value
A Jan 2013 10
B Jan 2013 11
C Jan 2013 12
A Feb 2013 9
B Feb 2013 11
C Feb 2013 15
A Mar 2013 10
B Mar 2013 8
C Mar 2013 13
I tried to use read.table with readLines and count.fields as shown below:
> path <- list.files()
> data <- read.table(text=readLines(path)[count.fields(path, blank.lines.skip=FALSE) == 4])
Warning message:
In readLines(path) : incomplete final line found on 'data1.txt'
> data
V1 V2 V3 V4
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
The problem is that it gives four columns instead of three, so I manipulated my data as below. I'm looking for an alternative.
> library(zoo)
> data$Name <- as.character(data$V1)
> data$Time.Period <- as.yearmon(paste(data$V2, data$V3, sep=" "))
> data$Value <- as.numeric(data$V4)
> DATA <- data[, 5:7]
> DATA
Name Time.Period Value
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
You can use read.fwf to read fixed-width files. You need to specify the width of each column correctly, in characters.
data <- read.fwf(path, widths = c(-12, 8, -4, 2), header = TRUE)
The key is how you specify the widths: a negative width means skip that many characters, a positive one means read that many. I am assuming entries in the last column have only 2 digits; change the widths accordingly if this is not the case. You will probably also have to fix the column names, because read.fwf expects the names in the header line to be separated by sep (a tab by default).
You will have to change the widths if the file format changes, or come up with some clever regexp to work them out from the first few rows. A better solution would be to enclose your strings in quotes (") or, even better, to avoid this format altogether.
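One way to take the regexp route is utils::strcapture() (available in recent versions of R), which splits each line by capture groups into typed columns. A sketch assuming whitespace between fields and a "Mon YYYY" period, as in the sample shown; the lines vector is a small stand-in for readLines(path), and pattern, proto and DATA are illustrative names.

```r
# Stand-in for readLines(path)
lines <- c("Name Time.Period Value",
           "A Jan 2013 10",
           "B Jan 2013 11",
           "C Feb 2013 9")

# Three capture groups: a name, a "Mon YYYY" period, and the value
pattern <- "^(\\S+)\\s+([A-Za-z]{3}\\s+\\d{4})\\s+(\\S+)\\s*$"

# proto gives the column names and types the captures are converted to
proto <- data.frame(Name = character(), Time.Period = character(),
                    Value = numeric(), stringsAsFactors = FALSE)

# lines[-1] skips the header row; lines that don't match come back as NA
DATA <- strcapture(pattern, lines[-1], proto, perl = TRUE)
DATA
```

Time.Period stays a character column here; zoo::as.yearmon() can convert it, as in the question.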
?count.fields
As the R documentation states, count.fields counts the number of fields, as separated by sep, in each line of the file. The data rows have four fields because "Jan 2013" is split at the space, while the header row has only three, so filtering with count.fields(path, blank.lines.skip=FALSE) == 4 skips the header row.
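The field counts can be inspected directly. This sketch uses a few lines typed in from the sample and reads them through a textConnection(); with the real file you would pass its name instead. Reading the 4-field lines with explicit col.names and pasting month and year back together is one assumed way to end up with the three columns without renaming afterwards.

```r
lines <- c("Name Time.Period Value",
           "A Jan 2013 10",
           "B Feb 2013 9")

# Whitespace is the default separator, so "Jan 2013" counts as two fields
con <- textConnection(lines)
n <- count.fields(con)
close(con)
n   # 3 4 4: the header has three fields, each data row has four

# Read only the data rows, naming the four columns explicitly,
# then rebuild Time.Period from month and year
data <- read.table(text = lines[n == 4],
                   col.names = c("Name", "Month", "Year", "Value"))
data$Time.Period <- paste(data$Month, data$Year)
data <- data[, c("Name", "Time.Period", "Value")]
data
```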