Seasonality by day of month - r

I want to check for seasonality in a time series by the day of the month.
The problem is that the months are not of equal length (or frequency) - there are months with 31, 28 & 30 days.
When declaring the ts object I can only specify a single fixed frequency, so it won't be correct.
> x <- data.frame(d = as.Date("2013-01-01") + 1:365 , v = runif(365))
> tapply(as.numeric(format(x$d,"%d")) , format(x$d,"%m") , max)
01 02 03 04 05 06 07 08 09 10 11 12
31 28 31 30 31 30 31 31 30 31 30 31
How can I create a time series object in R that I can later decompose and check for seasonality?
Is it possible to create a pivot table and convert it into a ts?
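One way to read the pivot-table idea, as a minimal sketch rather than a definitive answer: cross-tabulate the values by month and day of month with tapply, leaving NA where a month has no such day, and then average each column to get a day-of-month profile. The names pivot and day_profile are illustrative only, and the dates here use + 0:364 so that every observation falls in 2013.

```r
set.seed(1)  # only to make the random example reproducible
x <- data.frame(d = as.Date("2013-01-01") + 0:364, v = runif(365))

month <- as.integer(format(x$d, "%m"))
day   <- as.integer(format(x$d, "%d"))

# 12 x 31 pivot table: rows are months, columns are days of the month.
# Cells for days that do not exist (e.g. 30 February) are NA.
pivot <- tapply(x$v, list(month = month, day = day), mean)

# Average over months for each day of the month, ignoring the NA cells.
day_profile <- colMeans(pivot, na.rm = TRUE)
```

Turning this matrix into a ts with a fixed frequency of 31 would still require a decision on how to treat the NA cells of the shorter months; that choice is left open here.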

Related

Get last day of month from YYYYMM (year/month) variable in R [duplicate]

I would like to get the last day of the month from a year/month variable that is formatted as an integer YYYYMM.
yearmonth <- c(seq(202001,202012),
seq(202101,202112))
The output I am looking for is below. For instance, the last day of Feb/2020 was 29 (2020 was a leap year) whereas the last day of Feb/2021 was 28.
last <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31,
31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
Ideally I would like to use the lubridate package.
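Base R (apart from day(), which comes from lubridate):
Try using:
day(as.Date(cut(as.Date(paste0(yearmonth, "01"), format = "%Y%m%d") + 32, 'month')) - 1)
For a version with no package dependency, replace day(...) with as.integer(format(..., "%d")).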
Output:
31 29 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31
Explanation:
This adds 32 days to the first day of each month, which always lands in the following month, and uses the cut function to cut by month (giving the first day of that following month). Subtracting 1 from those dates then gives the last day of the original month.
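To make the steps concrete, here is a small sketch that traces a single value, February 2020, through the same idea using base R only. The intermediate variable names are illustrative and not part of the original answer.

```r
# Trace one month (February 2020, a leap year) through the "+ 32, cut, - 1" trick.
first_of_month <- as.Date("20200201", format = "%Y%m%d")

# Adding 32 days always lands somewhere in the following month.
shifted <- first_of_month + 32                  # 2020-03-04

# cut(..., "month") labels the date with the first day of its month.
first_of_next <- as.Date(cut(shifted, "month")) # 2020-03-01

# One day before the first of the next month is the last day of this month.
last_of_month <- first_of_next - 1              # 2020-02-29

last_day <- as.integer(format(last_of_month, "%d"))
last_day
```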
Update:
Please note akrun's comment: we can use the truncated argument of the ymd() function to declare the number of formats that can be truncated:
days_in_month(ymd(yearmonth, truncated = 1))
First answer:
Here is a lubridate solution:
Construct the date elements: year, month and day.
Use make_date() to get an object of class Date.
Then use the days_in_month() function from lubridate.
library(lubridate)
my_year <- substr(yearmonth,1,4)
my_month <- as.integer(substr(yearmonth,5,6))
my_day <- rep(1, length(my_year))
days_in_month(make_date(my_year, my_month, my_day))
# you can wrap the call in `unname` to get a vector without names
Output:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul
31 29 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31
Aug Sep Oct Nov Dec
31 30 31 30 31
# without names:
unname(days_in_month(make_date(my_year, my_month, my_day)))
[1] 31 29 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31
Using lubridate, I add one month to the first day of each provided month, and then I subtract one day. This date should be the last day of your months.
yearmonth <- c(seq(202001,202012),
seq(202101,202112))
yearmonthday <- as.Date(paste0(yearmonth, "01"), format = "%Y%m%d")
library(lubridate)
last <- as.numeric(format(yearmonthday + months(1) - days(1), format = "%d"))
last
Using zoo's as.yearmon -
yearmonth |>
as.character() |>
zoo::as.yearmon('%Y%m') |>
as.Date(frac = 1) |>
format('%d')
#[1] "31" "29" "31" "30" "31" "30" "31" "31" "30" "31" "30" "31" "31" "28"
#[15] "31" "30" "31" "30" "31" "31" "30" "31" "30" "31"
Using lubridate's ceiling_date function.
library(lubridate)
format(ceiling_date(ymd(paste0(yearmonth, '01')), 'month') - 1, '%d')
Using lubridate:
library(lubridate)
day(ceiling_date(as.Date(paste0(yearmonth,'01'), format = '%Y%m%d'), unit = 'month') - 1)
[1] 31 29 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31

Converting Month character to date for time series without "0" before Month

How do I convert this data set into a time series format in R? Let's call the data set Bob. This is what it looks like:
1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24
Are you looking for something like this....?
> dat <- read.table(text = "1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24
", header=FALSE) # your data
> ts(dat$V2, start=c(2013, 1), frequency = 12) # time series object
Jan Feb Mar Apr May Jun
2013 25 865 26 33 74 24
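The start value above is hard-coded. As a hedged sketch, it could instead be derived from the first row, assuming the rows are already in chronological order, the data are monthly with no gaps, and the first column always has the form month/year. The names first and bob_ts are illustrative.

```r
dat <- read.table(text = "1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24
", header = FALSE, stringsAsFactors = FALSE)

# Split the first label "1/2013" into month and year (no leading zero needed).
first <- as.integer(strsplit(dat$V1[1], "/", fixed = TRUE)[[1]])  # c(month, year)

# ts() expects start = c(year, period), so the two parts are swapped here.
bob_ts <- ts(dat$V2, start = c(first[2], first[1]), frequency = 12)
bob_ts
```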
Assuming that your starting point is the data frame DF, defined reproducibly in the Note at the end, this converts it to a zoo series z as well as a ts series tt.
library(zoo)
z <- read.zoo(DF, FUN = as.yearmon, format = "%m/%Y")
tt <- as.ts(z)
z
## Jan 2013 Feb 2013 Mar 2013 Apr 2013 May 2013 Jun 2013
## 25 865 26 33 74 24
tt
## Jan Feb Mar Apr May Jun
## 2013 25 865 26 33 74 24
Note
Lines <- "1/2013 25
2/2013 865
3/2013 26
4/2013 33
5/2013 74
6/2013 24"
DF <- read.table(text = Lines)

How to filter a dataframe wrt to a particular value range of a column using R?

input <- read.csv("aggregate.csv")
The csv looks like:
TimeStamp Latency Threads
7:00.06 AM 20 19
7:00.09 AM 28 18
7:00.15 AM 26 19
7:04:51 AM 45 20
7:05.07 AM 05 23
7:00.25 AM 15 24
7:10.01 AM 24 25
7:20.01 AM 35 50
8:00:10 AM 05 51
8:00:52 AM 50 10
8:05:00 AM 12 09
8:10:00 AM 100 01
The problem I am facing is that I want to filter the input data frame by a time range that the user enters for the TimeStamp column. That is, the console should ask: Enter your time range
Suppose I enter 7:00:01 AM to 7:05:00 AM; then it should filter the data frame accordingly.
The output should look like this:
TimeStamp Latency Threads
7:00.06 AM 20 19
7:00.09 AM 28 18
7:00.15 AM 26 19
7:04:51 AM 45 20
Is it possible?
I am posting here because my filtering ended up hard-coded, but I want the range to come from user input.
Please help
You can try the strptime function. It converts a character string into a POSIXlt object, which is one of R's date-time classes. After the conversion, you can compare times with time1 < time2, etc. You can find more details in ?strptime.
One more complication is that your TimeStamp column mixes formats: 7:00.06 AM and 7:04:51 AM. You probably need to normalize the format first with gsub.
Suppose your input range runs from lower to upper; then
input <- read.csv("C:/Users/s0043102/Desktop/test.csv",stringsAsFactors = FALSE)
adj_TimeStamp <- gsub("\\.",":",input$TimeStamp)
# %p (AM/PM) must be paired with the 12-hour %I rather than %H, otherwise PM times are not shifted
adj_TimeStamp <- strptime(adj_TimeStamp ,format="%I:%M:%S %p")
lower <- strptime("7:00:01 AM",format="%I:%M:%S %p")
upper <- strptime("7:05:00 AM",format="%I:%M:%S %p")
output <- subset(input, adj_TimeStamp<=upper & adj_TimeStamp>=lower)
output
TimeStamp Latency Threads
1 7:00.06 AM 20 19
2 7:00.09 AM 28 18
3 7:00.15 AM 26 19
4 7:04:51 AM 45 20
6 7:00.25 AM 15 24
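The answer above fixes lower and upper in the code, while the question asks for them to come from the user. A minimal sketch of that step follows, assuming an interactive R session and a locale in which %p recognizes AM/PM. The helpers parse_time and filter_by_time are hypothetical names introduced for illustration, and the small data frame is a subset of the question's data.

```r
input <- data.frame(
  TimeStamp = c("7:00.06 AM", "7:00.09 AM", "7:04:51 AM",
                "7:05.07 AM", "7:00.25 AM", "8:00:10 AM"),
  Latency   = c(20, 28, 45, 5, 15, 5),
  stringsAsFactors = FALSE
)

# Normalize "7:00.06 AM" to "7:00:06 AM", then parse as a 12-hour clock time.
parse_time <- function(x) {
  as.POSIXct(strptime(gsub(".", ":", x, fixed = TRUE), format = "%I:%M:%S %p"))
}

# Keep the rows whose TimeStamp lies in [lower, upper]; unparseable rows are dropped.
filter_by_time <- function(df, lower, upper) {
  stamp <- parse_time(df$TimeStamp)
  keep <- !is.na(stamp) & stamp >= parse_time(lower) & stamp <= parse_time(upper)
  df[keep, , drop = FALSE]
}

if (interactive()) {
  # readline() prompts on the console and returns what the user typed.
  lower <- readline("Enter start time (e.g. 7:00:01 AM): ")
  upper <- readline("Enter end time (e.g. 7:05:00 AM): ")
} else {
  # Fallback values so the sketch also runs as a script.
  lower <- "7:00:01 AM"
  upper <- "7:05:00 AM"
}

output <- filter_by_time(input, lower, upper)
output
```

Since readline() returns an empty string when run non-interactively, the sketch falls back to fixed values in that case; a production script would probably validate the entered strings before filtering.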

I got different language result in r

If I write code like
everyday = seq(from=as.Date('2005-1-1'), to=as.Date('2005-12-31'), by='day')
cmonth = format(everyday, '%B')
table(cmonth)
cmonth
10월 11월 12월 1월 2월 3월 4월 5월 6월 7월 8월 9월
31 30 31 31 28 31 30 31 30 31 31 30
I get the result in Korean, but I want
October November December January February March ...
like this, in English. How can I change that?
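format(x, '%B') takes its month names from the current LC_TIME locale, which is why they come out in Korean here. One locale-independent sketch, assuming English names are wanted regardless of the system settings, is to index R's built-in month.name constant by the month number. Changing the locale with Sys.setlocale("LC_TIME", "C") is another commonly used option, although the locale names that are accepted vary by operating system.

```r
everyday <- seq(from = as.Date("2005-01-01"), to = as.Date("2005-12-31"), by = "day")

# Month number (1-12) does not depend on the locale, unlike format(x, "%B").
month_number <- as.integer(format(everyday, "%m"))

# month.name is a built-in constant holding the English month names.
# Using it as the factor levels also keeps the table in calendar order.
cmonth <- factor(month.name[month_number], levels = month.name)

counts <- table(cmonth)
counts
```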

R find and create an index "1" or "0"

I have a data frame in R that contains one column named H:
H Index
11
11
11
11
12
12
12
13
13
14
14
15
15
15
16
17
18
19
20
20
20
21
22
23
00
00
00
01
01
02
03
04
04
04
04
05
06
07
07
07
08
09
09
09
10
11
12
How can I create a new column filled with 1 for H in the range 10 to 18 (i.e., 10, 11, 12, 13, 14, 15, 16, 17 and 18) and filled with 0 for H from 19 to 09 (i.e., 19, 20, 21, 22, 23, 00, 01, 02, 03, 04, 05, 06, 07, 08 and 09)?
Thanks a lot.
We could also do
df$Index <- +(df$H<19 & df$H>9)
Or with ifelse
df$Index <- ifelse(df$H < 19 & df$H >9, 1, 0)
If the 'H' column is character, we convert it to numeric
df$H <- as.numeric(df$H)
Or if it is factor
df$H <- as.numeric(as.character(df$H))
and then perform the operations mentioned above
df$Index <- +(df$H < 19 & df$H >9)
This is easy as you need the value based on a range. If df is the dataframe,
df$H<19 & df$H>9
will give you a vector of TRUE/FALSE values testing whether each value is in the range from 10 to 18. Using the as.integer function, you can cast this to 1s and 0s. [1]
df$Index <- as.integer(df$H<19 & df$H>9)
If the column is a character vector, we can first cast to a numeric value before doing the test
df$Index <- as.integer( as.integer(df$H)<19 & as.integer(df$H)>9)
If the value is not an integer, we can use as.numeric instead to do the inner casts.
[1] This works because, according to help(logical), TRUE is coerced to 1 and FALSE is coerced to 0 when used in a numerical context, and as.integer follows those coercion rules. We could also have done this coercion manually with the ifelse function, as ifelse(df$H<19&df$H>9,1,0), which examines each element of the logical vector and returns 1 if it is TRUE or 0 if it is FALSE.
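As a quick check of the approach on zero-padded hours like those in the question, here is a small self-contained sketch. The sample values are made up for illustration, and H is assumed to be stored as character strings such as "09".

```r
# A few hours on both sides of the boundaries, stored as zero-padded strings.
df <- data.frame(H = c("11", "18", "19", "23", "00", "09", "10"),
                 stringsAsFactors = FALSE)

# "09" converts cleanly to 9, so the padding does not affect the comparison.
h <- as.integer(df$H)

# TRUE/FALSE becomes 1/0 when cast with as.integer().
df$Index <- as.integer(h >= 10 & h <= 18)
df
```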

Resources