Now I have some monthly data like :
1/1/90 620
2/1/90,591
3/1/90,574
4/1/90,542
5/1/90,534
6/1/90,545
#...etc
If I use ts() function, it's easy to make the data into time series structure like:
Jan Feb Mar ... Nov Dec
1990 620 591 574 ... 493 464
1991 100 200 300 ...........
Is there any possibilities to change it into quarterly repeating like this:
1st 2nd 3rd 4th
1990-Q1 620 591 574 464
1990-Q2 100 200 300 400
1990-Q3 ...
1990-Q4 ...
1991-Q1 ...
I tried to change
ts(mydata,start=c(1990,1),frequency=12)
to
ts(mydata,start=c(as.yearqrt("1990-1",1)),frequency=4)
but it seems not working.
Could anyone help me? Thank you very much.
monthly <- ts(mydata, start = c(1990, 1), frequency = 12)
quarterly <- aggregate(monthly, nfrequency = 4)
I don't agree with Hyndman on this one. Which is rare as Hyndman can usually do no wrong. However, I can show you his solution doesn't give the OP what he wants.
test<-c(1:100)
test_ts <- ts(test, start=c(2000,1), frequency=12)
test_ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000 1 2 3 4 5 6 7 8 9 10 11 12
2001 13 14 15 16 17 18 19 20 21 22 23 24
2002 25 26 27 28 29 30 31 32 33 34 35 36
2003 37 38 39 40 41 42 43 44 45 46 47 48
2004 49 50 51 52 53 54 55 56 57 58 59 60
2005 61 62 63 64 65 66 67 68 69 70 71 72
2006 73 74 75 76 77 78 79 80 81 82 83 84
2007 85 86 87 88 89 90 91 92 93 94 95 96
2008 97 98 99 100
test_agg <- aggregate(test_ts, nfrequency=4)
test_agg
2000 6 15 24 33
2001 42 51 60 69
2002 78 87 96 105
2003 114 123 132 141
2004 150 159 168 177
2005 186 195 204 213
2006 222 231 240 249
2007 258 267 276 285
2008 294
Well, wait, that first quarter isn't the average of the 3 months, its the sum. (1+2+3 =6 but you want it to show the mean=2). So you will need to modify that a tad.
test_agg <- aggregate(test_ts, nfrequency=4)/3
# divisor is (old freq)/(new freq) = 12/4 = 3
Qtr1 Qtr2 Qtr3 Qtr4
2000 2 5 8 11
2001 14 17 20 23
2002 26 29 32 35
2003 38 41 44 47
2004 50 53 56 59
2005 62 65 68 71
2006 74 77 80 83
2007 86 89 92 95
2008 98
Which now shows you the mean of the monthly data written as quarterly.
The divisor is the trick here. If you had weekly (freq=52) and wanted quarterly (freq=4) you'd divide by 52/4=13.
If you want the mean instead of the sum, just add "mean":
quarterly <- aggregate(monthly, nfrequency=4,mean)
Related
I have a table of the following format:
Initial Table Formatting
And I'm seeking an output resembling the following:
Date
Value
January 1659
Value 1
February 1659
Value 2
March 1659
Value 3
April 1659
Value 4
and so on (numerical representations of the Month and Year are perfectly fine also.
I've attempted using merge operations but I'm thinking there must be an easier way (possibly using packages). I've found somewhat similar questions asked but none obviously applicable yet.
You can use pivot_longer and unite, both from the tidyr package:
library(tidyr)
pivot_longer(df, -Year) |>
unite(date, name, Year, sep = " ")
#> # A tibble: 120 x 2
#> date value
#> <chr> <int>
#> 1 Jan 1659 68
#> 2 Feb 1659 97
#> 3 Mar 1659 89
#> 4 Apr 1659 74
#> 5 May 1659 44
#> 6 Jun 1659 2
#> 7 Jul 1659 81
#> 8 Aug 1659 22
#> 9 Sep 1659 87
#> 10 Oct 1659 1
#> # ... with 110 more rows
Data used
set.seed(1)
df <- cbind(1659:1668, replicate(12, sample(99, 10))) |>
as.data.frame() |>
setNames(c("Year", month.abb))
df
#> Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1 1659 68 97 89 74 44 2 81 22 87 1 76 43
#> 2 1660 39 85 37 42 25 45 13 93 83 43 39 1
#> 3 1661 1 21 34 38 70 18 40 28 90 59 24 29
#> 4 1662 34 54 99 20 39 22 89 48 48 26 53 78
#> 5 1663 87 74 44 28 51 78 48 33 64 15 92 22
#> 6 1664 43 7 79 96 42 65 96 45 94 58 86 70
#> 7 1665 14 73 33 44 6 70 23 21 60 29 40 28
#> 8 1666 82 79 84 87 24 87 84 31 51 24 83 37
#> 9 1667 59 98 35 70 32 93 29 17 34 42 90 61
#> 10 1668 51 37 70 40 14 75 98 73 10 48 35 46
Created on 2022-11-29 with reprex v2.0.2
I have a csv file like this:
Project Name jan feb mar apr may jun july aug sep oct nov dec
1 proj1 23 56 76 43 22 99 76 878 54 99 43 534
2 proj2 65 68 33 76 34 66 3 78 44 78 44 778
3 proj3 72 35 21 35 38 92 26 58 745 98 67 43
I need a for loop which will loop through this data and give me row wise data so that I will be able to retrieve the count of each month for each project.
Try using rowSums:
drops <- c("Project", "Name")
df$total <- rowSums(df[, !(names(df) %in% drops)])
Hi I have a time series data as follows:
Code Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1001 2009 183 175 151 173 169 169 158 132 91 91 146 114
1001 2010 76 103 130 103 78 72 64 96 89 91 61 62
1001 2011 73 50 99 90 74 112 113 111 112 112 97 137
1001 2012 105 140 160 129 162 161 150 167 151 161 114 120
1001 2013 140 137 153 128 137 137 135 148 116 134 121 95
1001 2014 135 145 144 110 109 130 110 58 100 109 67 66
1015 2009 21 19 19 21 17 29 56 35 46 33 42 45
1015 2010 46 29 55 62 49 48 44 37 39 46 33 39
1015 2011 59 36 52 41 36 38 42 43 37 37 37 35
1015 2012 46 53 55 41 69 41 38 42 37 50 46 48
1015 2013 64 43 58 43 50 39 29 48 45 26 51 55
1015 2014 40 54 64 58 76 59 69 66 57 60 58 55
1031 2009 2408 2370 2799 3460 3263 3102 2769 2749 3018 3283 3343 3193
1031 2010 3130 3069 3776 3348 3341 4129 3920 4131 4152 4044 4241 3522
1031 2011 3454 3768 5217 4242 4624 5105 4712 6064 5546 6049 5957 4670
1031 2012 4959 3554 2163 1274 1452 1248 1303 1278 916 906 522 324
1031 2013 537 442 417 389 469 423 328 246 291 387 201 122
1031 2014 249 203 42 30 29 36 39 16 36 23 11 19
I am trying to find the decomposition of the timeseries by Code column as different Codes have different trends and seasonality during various months.
I tried using data.table but it gives me error. Following is the code that I am using:
sa_data_ssn_cnt_ts <- data.table(sa_data_ssn_cnt_ts)
sa_data_ssn_cnt_si <- sa_data_ssn_cnt_ts[,list(SI = decompose(sa_data_ssn_cnt_ts, type = "multiplicative", filter = NULL)), by = sa_data_ssn_cnt_ts$site_id]
Error that I get:
Error in decompose(sa_data_ssn_cnt_ts, type = "multiplicative", filter = NULL) : time series has no or less than 2 periods
What is it that I am messing up here?
Is there any other way that I can get the decompositions by Code column?
Thanks a lot for the help.
I've a large dataframe which contains 12 columns each for two types of values, Rested and Active. I want to convert the columns of each month into rows, thus bring all the month columns (Jan, Feb, Mar... ) under 'Month'
My data is as follows:
ID L1 L2 Year JR FR MR AR MYR JR JLR AGR SR OR NR DR JA FA MA AA MYA JA JLA AGA SA OA NA DA
1234 89 65 2003 11 34 6 7 8 90 65 54 3 22 55 66 76 86 30 76 43 67 13 98 67 0 127 74
1234 45 76 2004 67 87 98 5 4 3 77 8 99 76 56 4 3 2 65 78 44 53 67 98 79 53 23 65
I'm trying to make it appear as below (column R represents Rested and column A represents Active. and monthly JR, FR, MR respectively means Jan Rested, Feb Rested, Mar Rested and JA, FA, MA respectively means Jan Active, Feb Active, Mar Active and etc):
So, here I'm trying to convert each of the monthly columns to rows and keeping them beside each other for R and A values by creating a new Month column.
ID L1 L2 Year Month R A
1234 89 65 2003 Jan 11 76
1234 89 65 2003 Feb 34 86
1234 89 65 2003 Mar 6 30
1234 89 65 2003 Apr 7 76
1234 89 65 2003 May 8 43
1234 89 65 2003 Jun 90 67
1234 89 65 2003 Jul 65 13
1234 89 65 2003 Aug 54 98
1234 89 65 2003 Sep 3 67
1234 89 65 2003 Oct 22 0
1234 89 65 2003 Nov 55 127
1234 89 65 2003 Dec 66 74
1234 45 76 2004 Jan 67 3
1234 45 76 2004 Feb 87 2
1234 45 76 2004 Mar 98 65
1234 45 76 2004 Apr 5 78
1234 45 76 2004 May 4 44
1234 45 76 2004 Jun 3 53
1234 45 76 2004 Jul 77 67
1234 45 76 2004 Aug 8 98
1234 45 76 2004 Sep 99 79
1234 45 76 2004 Oct 76 53
1234 45 76 2004 Nov 56 23
1234 45 76 2004 Dec 4 65
I've tried various things like stack,melt,unlist
data_reshape <- reshape(df,direction="long", varying=list(c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA")), v.names="Precipitation", timevar="Month")
data_stacked <- stack(data, select = c("JR", "FR", "MR", "AR", "MYR", "JR", "JLR", "AGR", "SR", "OR", "NR", "DR", "JA", "FA","MA", "AA", "MYA", "JA", "JLA","AGA", "SA", "OA","NA", "DA"))
but their result is not quite expected - they are giving Jan values of all years, and then Feb values of all years, and then March values of all years, and etc. But I want to structure them in an proper monthly manner for each Year for each ID existing in the entire dataset.
How to achieve this in R?
Here's a base reshape approach:
res <- reshape(mydf, direction="long", varying=list(5:16, 17:28), v.names=c("R", "A"), times = month.name, timevar = "Month")
res[with(res, order(ID, -L1, L2, Year)), -8]
Here's a possible solution using the devel version of data.table
library(data.table) ## v >= 1.9.5
res <- melt(setDT(df),
id = 1:4, ## id variables
measure = list(5:16, 17:ncol(df)), # a list of two groups of measure variables
variable = "Month", # The name of the additional variable
value = c("R", "A")) # The names of the grouped variables
setorder(res, ID, -L1, L2, Year) ## Reordering the data to match the desired output
res[, Month := month.abb[Month]] ## You don't really need this part as you already have the months numbers
# ID L1 L2 Year Month R A
# 1: 1234 89 65 2003 Jan 11 76
# 2: 1234 89 65 2003 Feb 34 86
# 3: 1234 89 65 2003 Mar 6 30
# 4: 1234 89 65 2003 Apr 7 76
# 5: 1234 89 65 2003 May 8 43
# 6: 1234 89 65 2003 Jun 90 67
# 7: 1234 89 65 2003 Jul 65 13
# 8: 1234 89 65 2003 Aug 54 98
# 9: 1234 89 65 2003 Sep 3 67
# 10: 1234 89 65 2003 Oct 22 0
# 11: 1234 89 65 2003 Nov 55 127
# 12: 1234 89 65 2003 Dec 66 74
# 13: 1234 45 76 2004 Jan 67 3
# 14: 1234 45 76 2004 Feb 87 2
# 15: 1234 45 76 2004 Mar 98 65
# 16: 1234 45 76 2004 Apr 5 78
# 17: 1234 45 76 2004 May 4 44
# 18: 1234 45 76 2004 Jun 3 53
# 19: 1234 45 76 2004 Jul 77 67
# 20: 1234 45 76 2004 Aug 8 98
# 21: 1234 45 76 2004 Sep 99 79
# 22: 1234 45 76 2004 Oct 76 53
# 23: 1234 45 76 2004 Nov 56 23
# 24: 1234 45 76 2004 Dec 4 65
Installation instructions:
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
This is an inelegant solution, but I'm going to post it just to show how problems can be solved with basic tools without relying on high level functions when the task doesn't necessarily require them. I think that the more tools you have, the more you can approach correctly to problems. Here we are:
#extract the data part
data<-t(as.matrix(df[,5:28]))
#build the data.frame cbinding the needed columns
res<-cbind(df[rep(1:nrow(df),each=12),1:4], #this repeats the first 4 columns 12 times each
Month=month.abb, #the month column
R=as.vector(data[1:12,]), # the R column, obtained from the first 12 rows of data
A=as.vector(data[13:24,])) #as above
rownames(res)<-NULL #just to remove the row names
I have a large data frame where some rows have repeated values in some of their columns. I want to keep the repeated values and sum those which are different. Below there is a sample of my data:
data<-data.frame(season=c(2008,2009,2010,2011,2011,2012,2000,2001),
lic=c(132228,140610,149215,158559,158559,944907,37667,45724),
client=c(174,174,174,174,174,174,175,175),
qtty=c(31,31,31,31,31,31,36,26),
held=c(60,65,58,68,68,70,29,23),
catch=c(7904,6761,9236,9323.2,801,NA,2330,3594.5),
potlift=c(2715,2218,3000,3887,750,NA,2314,3472))
.
season lic client qtty held catch potlift
2008 132228 174 31 60 7904 2715
2009 140610 174 31 65 6761 2218
2010 149215 174 31 58 9236 3000
2011 158559 174 31 68 9323.2 3887
2011 158559 174 31 68 801 750
2012 944907 174 31 70 NA NA
2000 37667 175 36 29 2330 2314
2001 45724 175 26 23 3594.5 3472
Note that the season 2011 is repeated, each variable (client... held), except catch and potlift. I need to keep the values of (client... held) and sum catch and potlift; therefore my new data frame should be like the example below:
season lic client qtty held catch potlift
2008 132228 174 31 60 7904 2715
2009 140610 174 31 65 6761 2218
2010 149215 174 31 58 9236 3000
2011 158559 174 31 68 10124.2 4637
2012 944907 174 31 70 NA NA
2000 37667 175 36 29 2330 2314
2001 45724 175 26 23 3594.5 3472
I have attempted to do so using aggregate, but this function sum everything. Any help will be appreciated.
data$catch <- with(data, ave(catch,list(lic,client,qtty,held),FUN=sum))
data$potlift <- with(data, ave(potlift,list(lic,client,qtty,held),FUN=sum))
unique(data)
season lic client qtty held catch potlift
1 2008 132228 174 31 60 7904.0 2715
2 2009 140610 174 31 65 6761.0 2218
3 2010 149215 174 31 58 9236.0 3000
4 2011 158559 174 31 68 10124.2 4637
6 2012 944907 174 31 70 NA NA
7 2000 37667 175 36 29 2330.0 2314
8 2001 45724 175 26 23 3594.5 3472
aggregate seems to work fine for me, but I'm not sure what you were trying:
> aggregate(cbind(catch, potlift) ~ ., data, sum, na.action = "na.pass")
season lic client qtty held catch potlift
1 2001 45724 175 26 23 3594.5 3472
2 2000 37667 175 36 29 2330.0 2314
3 2010 149215 174 31 58 9236.0 3000
4 2008 132228 174 31 60 7904.0 2715
5 2009 140610 174 31 65 6761.0 2218
6 2011 158559 174 31 68 10124.2 4637
7 2012 944907 174 31 70 NA NA
Here, use cbind to identify the columns that you want to aggregate by. You can then specify all the other columns, or just use . to indicate "use all other columns not mentioned in the cbind call.