How to split a Teradata SQL column after a number of values

I am trying to split a SQL column from weeks to months. The column reads the year, then the week number from that year.
For example:
week_ID
201742
This means 2017, and the 42nd week of that year.
I am trying to get the column to split after the year so that the year will be in one column and the week number in a separate column.
For example:
Year Week
2017 42
I will then be using the week column to set the week number equal to a month.

Assuming that week_ID is defined as INTEGER:
week_ID / 100 as yr, week_ID MOD 100 as wk
If it is defined as DECIMAL(6,0):
cast(week_ID as int) / 100 as yr, week_ID MOD 100 as wk
If it is defined as CHAR(6):
substring(week_ID from 1 for 4) as yr, substring(week_ID from 5) as wk

Related

Question for calculating the mean date only with month and day

I have the following dataset, and I would like to have the average date (Month and day) for each (phenology) pheno and station across years. It seems I can directly use the mean function to calculate the mean for the date format objects. However, if I convert the month day to date, with function as.Date, then the year is added, and the average date is not independent of years. How can I directly calculate the mean date only based on Month and day?
You cannot compute a "mean month + day" independent of the year, since not every year has the same number of days. So you need to choose a fixed year for your computations.
Then you can:
1) Create "dummy" date objects which have the correct month and day, but the previously selected year.
2) Compute the mean of those dummies.
3) Extract month and day from the result (remove the year).
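The three steps above can be sketched in base R as follows. This is only a minimal illustration: the data frame obs, its values, and the choice of 2003 as the fixed year are assumptions made for the example, not part of the original dataset.

```r
# Hypothetical month/day observations taken in different years
obs <- data.frame(Month = c(12, 12, 12), Day = c(10, 14, 18))

# Step 1: build dummy dates that all share one fixed (here non-leap) year
fixed_year <- 2003
dummy <- as.Date(sprintf("%d-%02d-%02d", fixed_year, obs$Month, obs$Day))

# Step 2: average the dummy dates (mean() works directly on Date objects)
mean_date <- mean(dummy)

# Step 3: drop the year and keep only month and day
mean_month_day <- format(mean_date, "%m-%d")
print(mean_month_day)
```

Note that this approach averages positions within a single calendar year, so observations that straddle New Year (for example late December and early January) would average to a mid-year date unless they are shifted first.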
You can use the yday function from the lubridate package to convert each date into the day of the year for that year, then average the day of the year for each Pheno. The conversion of the day of the year to a month and day depends upon whether you want the date in a leap year or a non-leap year. I report both dates.
The code looks like:
library(tidyverse)
library(lubridate)
#
# calculate average day of year
#
average_doy <- df %>%
  mutate(day_of_year = yday(as.Date(paste(Year, Month, Day, sep = "-")))) %>%
  group_by(Pheno) %>%
  summarize(avg_doy = round(mean(day_of_year), 0))
# set base years
non_leap_year <- 2003
leap_year <- 2004
#
# convert day of year to average day using base years
#
averages <- average_doy %>%
  mutate(avg_non_leap_year_mon_day = paste(avg_doy, non_leap_year, sep = "_") %>%
           as.Date(format = "%j_%Y") %>%
           str_remove(paste0(non_leap_year, "-")),
         avg_leap_year_mon_day = paste(avg_doy, leap_year, sep = "_") %>%
           as.Date(format = "%j_%Y") %>%
           str_remove(paste0(leap_year, "-")))
Using the first seven rows of your data, this gives
# A tibble: 3 x 4
  Pheno         avg_doy avg_non_leap_year_mon_day avg_leap_year_mon_day
  <chr>           <dbl> <chr>                     <chr>
1 Dormant           348 12-14                     12-13
2 Tillering         343 12-09                     12-08
3 Turning green      48 02-17                     02-17

Yearweek is parsed wrongly in R

Problem: I am facing the problem that R parses a date (30 December 2019) into yearweek wrongly (Output: 2019 W01). I do not know why this is happening. Any suggestions what to change/alternative way of coding?
format(lubridate::ymd("2019-12-30"), "%Y W%V")
# Output
# 2019 W01
# Desired Output:
# 2019 W52
From the strptime documentation:
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the
week (and typically with the first Sunday of the year as day 1 of week 1). The US
convention.
%V
Week of the year as decimal number (01–53) as defined in ISO 8601. If the week
(starting on Monday) containing 1 January has four or more days in the new year,
then it is considered week 1. Otherwise, it is the last week of the previous year,
and the next week is week 1. (Accepted but ignored on input.)
%W
Week of the year as decimal number (00–53) using Monday as the first day of week
(and typically with the first Monday of the year as day 1 of week 1). The UK
convention.
It sounds like you may want either %U or %W, depending on whether you want to treat Sunday or Monday as the start of the week.
Note however that these can result in values between 00 and 53, which is a consequence of fixing the start of the week to a particular weekday (either Sunday or Monday). Doing that means that there can actually be a partial week at the start and at the end of the year.
If you prefer to count based on week number 1 beginning on the first day of the year, you can use the function lubridate::week.
For example:
library(lubridate)
year_week <- function(date) paste0(year(date), ' W', week(date))
year_week(ymd("2019-01-01"))
# Result: "2019 W1"
year_week(ymd("2019-12-30"))
# Result: "2019 W52"
After some more research I found that this is the best solution:
format(lubridate::ymd("2019-12-30"), "%G W%V")
Use %G instead of %Y to reflect that the week-based year (%G and %g) may differ from the calendar year (%Y and %y).
See also: https://community.rstudio.com/t/converting-week-number-and-year-into-date/27202/2
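To compare the directives discussed above on the date from the question, here is a minimal base R sketch (lubridate is not required; the variable names are illustrative only):

```r
# 30 December 2019 is a Monday that ISO 8601 assigns to week 1 of 2020
d <- as.Date("2019-12-30")

calendar_year_iso_week <- format(d, "%Y W%V")  # calendar year mixed with ISO week
iso_year_iso_week      <- format(d, "%G W%V")  # ISO week-based year with ISO week
sunday_based_week      <- format(d, "%Y W%U")  # weeks starting on Sunday
monday_based_week      <- format(d, "%Y W%W")  # weeks starting on Monday

print(c(calendar_year_iso_week, iso_year_iso_week,
        sunday_based_week, monday_based_week))
```

For this date the first form reproduces the confusing "2019 W01" from the question, %G gives "2020 W01", and both %U and %W give "2019 W52".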

Convert day of year to date assuming all years are non-leap years

I have a df with year and day of year as columns:
dat <- data.frame(year = rep(1980:2015, each = 365), day = rep(1:365,times = 36))
Please note that I am assuming 365 days in a year even if it is a leap year. I need to generate two things:
1) month
2) date
I did this:
# this tells me how many days in each month
months <- list(1:31, 32:59, 60:90, 91:120, 121:151, 152:181, 182:212, 213:243, 244:273, 274:304, 305:334, 335:365)
library(dplyr)
# this assigns each day to a month
dat1 <- dat %>% mutate(month = sapply(day, function(x) which(sapply(months, function(y) x %in% y))))
I want to produce a third column which is a date in the format year,month,day.
However, since I am assuming all years are non-leap years, I need to ensure that my dates also reflect this i.e. there should be no date as 29th Feb.
The reason I need the date is that I want to assign each day to one of the 15-day periods of a year. A year will have 24 such periods:
1st Jan - 15th Jan: period 1
16th Jan - 31st Jan: period 2
1st Feb - 15th Feb: period 3
...
16th Dec - 31st Dec: period 24
I need dates to determine whether a day falls in the first half of its month (day <= 15) or the second half (day > 15). I use the following script to do this:
dat2 <- dat1 %>% mutate(twowk = month*2 - (as.numeric(format(date,"%d")) <= 15))
In order for me to run this above line, I need to generate date and hence my question.
A possible solution:
dat$dates <- as.Date(paste0(dat$year, '-',
                            format(strptime(paste0('1981-', dat$day), '%Y-%j'),
                                   '%m-%d')))
What this does:
With strptime(paste0('1981-',dat$day), '%Y-%j') you get the dates of a non-leap year.
By embedding that in format with '%m-%d' you extract the month and the day in the month.
paste that together with the year in the year-column and wrap that in as.Date to get a non-leap-year date.
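As a rough end-to-end sketch of the same idea, the following base R code derives month, date and the half-month period directly from the day of year. The small data frame and the column names mday and twowk are assumptions for illustration, and tz = "UTC" is added only to keep the reference dates free of daylight-saving effects.

```r
# Hypothetical input: day-of-year values in a 365-day calendar
dat <- data.frame(year = c(1980L, 1980L, 1980L, 2015L),
                  day  = c(1L, 59L, 60L, 365L))

# Map day of year to month and day through a fixed non-leap year (1981)
ref <- strptime(paste0("1981-", dat$day), "%Y-%j", tz = "UTC")
dat$month <- as.integer(format(ref, "%m"))
dat$mday  <- as.integer(format(ref, "%d"))

# Combine with the real year; 29 February can never appear
dat$date <- as.Date(sprintf("%d-%02d-%02d", dat$year, dat$month, dat$mday))

# Half-month period 1..24: odd for days 1-15, even for days 16 onward
dat$twowk <- dat$month * 2 - (dat$mday <= 15)
print(dat)
```

Here day 60 of the leap year 1980 maps to 1 March rather than 29 February, which is the behaviour the question asks for.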

R: Get workweek number, not seven day periods since Jan 1st

Hi I am looking at data to do with prices of commodities throughout a period of a few years. I want to summarize prices by work weeks, not weeks defined by seven day periods since Jan 1st. When I tried:
data <- mutate(data, week = week(strptime(Date, "%m/%d/%Y")))
The lubridate week() function counts "1/13/10" (mdy) as week 2 and "1/14/10" as week 3. I want those to be in the same week. Basically any run of mon-fri in the same week. If the year starts on a wednesday I want week1 to be wed-fri, week2 to start the next monday. I have no data on any weekends. Any thoughts? Thanks
This will give you week number assuming Date column is in Date format (you can use as.Date() to convert):
data <- mutate(data, week = format(Date, '%U'))
If you want week and year, you can use:
data <- mutate(data, week = format(Date, '%Y-%U'))
It will correctly number partial weeks.
Note: week number starts with 00 (but, that should be no problem).
You can also do it without dplyr and its mutate, like this:
data$week <- format(data$Date, '%U')
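A small base R check of this behaviour, using the two dates from the question plus the first days of 2010, might look like the following. The vector names are illustrative only.

```r
# Dates in the question's m/d/y format; 1 January 2010 was a Friday
dates <- as.Date(c("1/1/10", "1/4/10", "1/13/10", "1/14/10"),
                 format = "%m/%d/%y")

# %U numbers weeks from Sunday; days before the first Sunday are week 00
weeks <- format(dates, "%U")
print(data.frame(dates, weeks))
```

The partial first week (Friday 1 January) comes out as "00", Monday 4 January starts week "01", and 13 and 14 January both land in week "02".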

Calculating the number of weeks for each year based on dates using R

I have a dataset with dates of 2 different years (2009 and 2010) and would like to have the corresponding week number for each date.
My dataset is similar to this:
anim <- c(012,023,045,098,067)
dob <- c("01-09-2009","12-09-2009","22-09-2009","10-10-2010","28-10-2010")
mydf <- data.frame(anim,dob)
mydf
  anim        dob
1   12 01-09-2009
2   23 12-09-2009
3   45 22-09-2009
4   98 10-10-2010
5   67 28-10-2010
I would like to have variable "week" in the third column with the corresponding week numbers for each date.
EDIT:
Note: Week one begins on January 1st, week two begins on January 8th for each year
Any help would be highly appreciated.
Baz
Your definition of "week of year"
EDIT: Note: Week one begins on January 1st, week two begins on January 8th for each year
differs from the standard ones supported by strftime:
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1
of the week (and typically with the first Sunday of the year as day 1 of
week 1). The US convention.
%W
Week of the year as decimal number (00–53) using Monday as the first day
of week (and typically with the first Monday of the year as day 1 of week
1). The UK convention.
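So you need to compute it based on the day-of-year number. Subtract 1 before dividing so that days 1 to 7 all fall in week 1 and 8 January starts week 2:
mydf$week <- ((as.numeric(strftime(as.POSIXct(mydf$dob,
                                              format="%d-%m-%Y"),
                                   format="%j")) - 1) %/% 7) + 1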
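For a quick check of the day-of-year approach against the stated rule (week one begins on 1 January, week two on 8 January), here is a minimal base R sketch. The helper name week_of_year is hypothetical, and it subtracts 1 from the day of year so that 7 January still counts as week 1.

```r
# Hypothetical helper: 7-day blocks counted from 1 January of each year
week_of_year <- function(d) {
  doy <- as.numeric(format(d, "%j"))  # day of year, 1..366
  (doy - 1) %/% 7 + 1                 # days 1-7 -> 1, days 8-14 -> 2, ...
}

# Boundary check around the first week change
boundary <- week_of_year(as.Date(c("2009-01-01", "2009-01-07", "2009-01-08")))

# Sample data from the question (day-month-year strings)
dob <- as.Date(c("01-09-2009", "12-09-2009", "22-09-2009",
                 "10-10-2010", "28-10-2010"), format = "%d-%m-%Y")
week <- week_of_year(dob)
print(data.frame(dob, week))
```

With this definition the five sample dates fall in weeks 35, 37, 38, 41 and 43.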
Post 2011 Answer
library(lubridate)
mydf$week <- week(as.Date(mydf$dob, format = "%d-%m-%Y"))
The lubridate package is straightforward for day-to-day tasks like this.
If you want to know how many weeks (or 7-day periods) have passed between your date of interest and the first day of the year, regardless of what day of the week the first of the year fell on, the following is a solution (using floor_date from lubridate).
dob <- as.Date(mydf$dob, format = "%d-%m-%Y")
mydf$weeks <- difftime(dob, floor_date(dob, "year"), units = "weeks")
