Calculate months every 28 days - plsql

Is there any way in Oracle to make my months start every 28 days?
Example:
24-Dec-2015 to 20-Jan-2016 (we call this Dec 2015)
21-Jan-2016 to 17-Feb-2016 (we call this Jan 2016)

select rownum as month_number
      ,day1 + (rownum - 1) * 28 as gregorian_month_start
      ,day1 + rownum * 28 - 1 as gregorian_month_end
from (select date '2015-12-24' day1
      from dual connect by level <= 13);
MONTH_NUMBER  GREGORIAN_MONTH_START  GREGORIAN_MONTH_END
 1            24/DEC/2015            20/JAN/2016
 2            21/JAN/2016            17/FEB/2016
 3            18/FEB/2016            16/MAR/2016
 4            17/MAR/2016            13/APR/2016
 5            14/APR/2016            11/MAY/2016
 6            12/MAY/2016            08/JUN/2016
 7            09/JUN/2016            06/JUL/2016
 8            07/JUL/2016            03/AUG/2016
 9            04/AUG/2016            31/AUG/2016
10            01/SEP/2016            28/SEP/2016
11            29/SEP/2016            26/OCT/2016
12            27/OCT/2016            23/NOV/2016
13            24/NOV/2016            21/DEC/2016
Note: this doesn't handle the 365th day in normal years (or the 366th day in leap years), since 13 × 28 = 364. You would need to specify which period the leftover day(s) should be added to; one option is sketched below.
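For illustration, one arbitrary choice is to fold the leftover day(s) into the 13th period; a sketch, assuming the next cycle starts exactly one calendar year after day1:

select rownum as month_number
      ,day1 + (rownum - 1) * 28 as gregorian_month_start
      ,case when rownum = 13
            then add_months(day1, 12) - 1  -- absorb the leftover day(s) into period 13
            else day1 + rownum * 28 - 1
       end as gregorian_month_end
from (select date '2015-12-24' day1
      from dual connect by level <= 13);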

Related

Calculate Cumulative sum for previous 6 months

RECORD  ATTRIBUTE  DATE       MONTH  AMT  CML AMT
 1      A          1/1/2021    1     10    10
 2      A          2/1/2021    2     10    20
 3      A          3/1/2021    3     10    30
 4      A          4/1/2021    4     10    40
 5      A          5/1/2021    5     10    50
 6      A          6/1/2021    6     10    60
 7      B          1/1/2021    1     20    20
 8      B          3/1/2021    3     20    40
 9      B          5/1/2021    5     20    60
10      B          7/1/2021    7     20    80
11      B          9/1/2021    9     20    80
12      B          11/1/2021  11     20    80
13      C          1/1/2021    1     30    30
14      C          8/1/2021    8     30    30
15      C          9/1/2021    9     30    60
I am looking to calculate the cumulative sum (the CML AMT column) using the AMT column over the past 6 months.
The CML AMT column should only look at a window of 6 months.
If there is no other record for the same attribute within a 6-month time frame, then it should simply return the AMT column.
I tried the below, which clearly won't work as the dates/months are not consistent.
Any help will be appreciated.
SUM(AMT)
OVER (PARTITION BY ATTRIBUTE
      ORDER BY DATE
      ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
Unfortunately Teradata doesn't support RANGE, but if you only need to sum over a small number of values (six months = up to six rows), you can apply a brute-force approach:
AMT
+
CASE WHEN LAG(DATE,1) OVER (PARTITION BY ATTRIBUTE ORDER BY DATE) >= ADD_MONTHS(DATE,-6)
     THEN LAG(AMT,1) OVER (PARTITION BY ATTRIBUTE ORDER BY DATE)
     ELSE 0
END
+
CASE WHEN LAG(DATE,2) OVER (PARTITION BY ATTRIBUTE ORDER BY DATE) >= ADD_MONTHS(DATE,-6)
     THEN LAG(AMT,2) OVER (PARTITION BY ATTRIBUTE ORDER BY DATE)
     ELSE 0
END
+
...
It looks ugly, but it's mostly cut, paste and modify, and it is still a single step in EXPLAIN. Other possible solutions would be based on an additional EXPAND ON or time-series aggregation step.

How to query NOAA for historical daily temperature averages using rnoaa?

I'm trying to find the historical average temperature between a range of dates using NOAA data and compare it to the long-term average temperatures.
I'm using the rnoaa package and have hit a bit of a snag. For long term averages, I have been successful using the following syntax:
library('rnoaa')
start_date = "2010-01-15"
end_date = "2010-11-14"
station_id = "USW00093738"
weather_data <- ncdc(datasetid = 'NORMAL_DLY', stationid = paste0('GHCND:', station_id),
                     datatypeid = 'dly-tavg-normal',
                     startdate = start_date, enddate = end_date, limit = 365)
This lets me parse weather_data$data for the long term average temperatures for that given station between January 15th and November 14th.
However, I can't seem to find the right dataset or datatype for historical average temperatures. I'd like to get the same data as the code above except with the actual daily average temperatures for those days. Any idea how to query this? I've been at it for a few hours and have had no luck.
Something I tried was the following:
weather_data <- ncdc(datasetid = 'GHCND', stationid = paste0('GHCND:', station_id),
                     startdate = start_date, enddate = end_date, limit = 365)
uniq_d_types = unique(weather_data$data$datatype)
View(uniq_d_types)
This let me see the unique data types in the GHCND dataset but none of the data types seemed to be daily average temperatures. Any thoughts?
In order to obtain average daily actual temperatures from the NOAA data using the rnoaa package, one must use the hourly data and aggregate it by day. Hourly NOAA data is in the NORMAL_HLY data set, and the required data type is HLY-TEMP-NORMAL.
library('rnoaa')
library(lubridate)
options(noaakey = "obtain key from NOAA website")
start_date = "2010-01-15"
end_date = "2010-01-31"
station_id = "USW00093738"
weather_data <- ncdc(datasetid = 'NORMAL_HLY', stationid = paste0('GHCND:', station_id),
                     datatypeid = "HLY-TEMP-NORMAL",
                     startdate = start_date, enddate = end_date, limit = 500)
data <- weather_data$data
data$year <- year(data$date)
data$month <- month(data$date)
data$day <- day(data$date)
# summarize to average daily temps
aggregate(value ~ year + month + day, mean, data = data)
...and the output:
> aggregate(value ~ year + month + day,mean,data = data)
year month day value
1 2010 1 15 323.5417
2 2010 1 16 322.8750
3 2010 1 17 323.4167
4 2010 1 18 323.7500
5 2010 1 19 323.2083
6 2010 1 20 321.0833
7 2010 1 21 318.4167
8 2010 1 22 317.6667
9 2010 1 23 319.0000
10 2010 1 24 321.0833
11 2010 1 25 323.5417
12 2010 1 26 326.0833
13 2010 1 27 328.4167
14 2010 1 28 330.9583
15 2010 1 29 333.2917
16 2010 1 30 335.7917
17 2010 1 31 308.0000
>
Note that temperatures are stored in tenths of degrees in this data set, so for the period between January 15th and 31st, 2010, the average daily temperatures at the Dulles International Airport weather station were between 30.8 and 33.6 degrees.
Also note that to calculate averages across multiple weather stations, simply add station to the aggregate() function:
> # summarize to average daily temps by station
> aggregate(value ~ station + year + month + day,mean,data = data)
station year month day value
1 GHCND:USW00093738 2010 1 15 323.5417
2 GHCND:USW00093738 2010 1 16 322.8750
3 GHCND:USW00093738 2010 1 17 323.4167
4 GHCND:USW00093738 2010 1 18 323.7500
5 GHCND:USW00093738 2010 1 19 323.2083
6 GHCND:USW00093738 2010 1 20 321.0833
7 GHCND:USW00093738 2010 1 21 318.4167
8 GHCND:USW00093738 2010 1 22 317.6667
9 GHCND:USW00093738 2010 1 23 319.0000
10 GHCND:USW00093738 2010 1 24 321.0833
11 GHCND:USW00093738 2010 1 25 323.5417
12 GHCND:USW00093738 2010 1 26 326.0833
13 GHCND:USW00093738 2010 1 27 328.4167
14 GHCND:USW00093738 2010 1 28 330.9583
15 GHCND:USW00093738 2010 1 29 333.2917
16 GHCND:USW00093738 2010 1 30 335.7917
17 GHCND:USW00093738 2010 1 31 308.0000
>
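Since the values are tenths of degrees, converting the daily averages to whole degrees is a one-line division (a small sketch; the name daily is illustrative):

# average to daily values, then convert tenths of degrees to degrees
daily <- aggregate(value ~ year + month + day, mean, data = data)
daily$temp_deg <- daily$value / 10
head(daily)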
The answer is to grab historical (meaning actual, on the day specified, not long-term average) weather data from NOAA's ISD database. USAF and WBAN values can be found by looking through the isd-history.csv file found here:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa
Here's an example query.
out <- isd(usaf='724030', wban = '93738', year=2018)
This will grab a year's worth of roughly hourly weather data from the ISD database. You can then parse/process this data however you see fit (e.g. for daily average temperatures, as sketched below).
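A minimal sketch of that daily aggregation, assuming the isd() result contains character columns date (YYYYMMDD) and temperature (tenths of degrees Celsius, with "+9999" as the missing-value sentinel); check names(out) to confirm the column names in your version:

library(rnoaa)
library(dplyr)

out <- isd(usaf = '724030', wban = '93738', year = 2018)

daily_avg <- out %>%
  filter(temperature != "+9999") %>%                 # drop the missing-value sentinel
  mutate(temp_c = as.numeric(temperature) / 10) %>%  # tenths of deg C -> deg C
  group_by(date) %>%
  summarise(avg_temp_c = mean(temp_c, na.rm = TRUE))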

find number of customers added each month

customer_id  transaction_id  month  year
1             3              7      2014
1             4              7      2014
2             5              7      2014
2             6              8      2014
1             7              8      2014
3             8              9      2015
1             9              9      2015
4            10              9      2015
5            11              9      2015
2            12              9      2015
I am well familiar with R basics. Any help will be appreciated.
The expected output should look like the following:
month  year  number_unique_customers_added
7      2014  2
8      2014  0
9      2015  3
In month 7 of 2014, only customer_ids 1 and 2 are present, so the number of customers added is two. In month 8 of 2014, no new customer_ids are added, so zero customers are added in this period. Finally, in month 9 of 2015, customer_ids 3, 4 and 5 are new, so the number of customers added in this period is 3.
Using data.table:
require(data.table)
dt[, .SD[1,], by = customer_id][, uniqueN(customer_id), by = .(year, month)]
Explanation: We first keep only the first transaction of each customer (the one where she is a "new customer"), and then count unique customers for each combination of year and month. Note that a month in which no new customers appear (like 8/2014 here) is simply absent from this output rather than reported as 0; the gaps can be filled afterwards, as sketched below.
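A minimal sketch of that gap-filling step (the names firsts, res and all_months are illustrative):

library(data.table)

# first transaction per customer, then new-customer counts per year/month
firsts <- dt[, .SD[1], by = customer_id]
res <- firsts[, .(new_customers = uniqueN(customer_id)), by = .(year, month)]

# join against every year/month that has transactions, filling gaps with 0
all_months <- unique(dt[, .(year, month)])
res <- res[all_months, on = .(year, month)]
res[is.na(new_customers), new_customers := 0]
res[order(year, month)]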
Using dplyr, we can first create a column which indicates whether a customer is a duplicate or not, and then group_by month and year to count the new customers in each group.
library(dplyr)
df %>%
  mutate(unique_customers = !duplicated(customer_id)) %>%
  group_by(month, year) %>%
  summarise(unique_customers = sum(unique_customers))

# month  year unique_customers
# <int> <int>            <int>
#1     7  2014                2
#2     8  2014                0
#3     9  2015                3

last 6 months total in Teradata

I have to calculate the total quantity sold over the last 6 months, grouped by primary key. For example, in the case of January 2018, I have to calculate the total quantity sold from July to December 2017.
Thanks
Primary Key  Date   qty  last 6 months quantity sold
1            1-Oct    4    0
1            1-Nov   10    4
1            5-Dec   20   14
1            1-Jan    3   34
1            1-Sep   88    0
You can calculate the range using ADD_MONTHS plus TRUNC:
WHERE datecol BETWEEN Trunc(Add_Months(Current_Date, -6), 'mon')
                  AND Trunc(Current_Date, 'mon') - 1
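A minimal sketch of the full aggregate built on that range, with placeholder table/column names (sales_tab, primary_key, qty, datecol):

SELECT primary_key
     , SUM(qty) AS last_6_months_qty
FROM sales_tab
WHERE datecol BETWEEN Trunc(Add_Months(Current_Date, -6), 'mon')
                  AND Trunc(Current_Date, 'mon') - 1
GROUP BY primary_key;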

Conditional cumulative subtraction

This is what my data.table looks like:
library(data.table)
dt <- fread('
Year Total Shares Balance
2017 10 1 10
2016 12 2 9
2015 10 2 7
2014 10 3 6
2013 10 NA 3
')
Balance is my desired column. I am trying to find the cumulative subtraction by taking the first value of Total, which is 10 (it should also be the first value of the Balance field), and then cumulatively subtracting the values in Shares. So the second value is 10 - 1 = 9 and the third value is 9 - 2 = 7, and so on. There is one condition: if the Year is 2014, then subtract the Shares value after dividing it by 2, so the fourth value is 7 - (2/2) = 6 and the fifth value is 6 - 3 = 3. I want to end the calculation at the last row.
My attempt is:
dt[, Balance:= ifelse( Year == 2014, cumsum(Total[1]-Shares/2), cumsum(Total[1] - Shares))]
Here is one method.
dt[, Balance2 := Total[1] - cumsum(shift(Shares * (1 - (0.5 *(Year == 2015))), fill=0))]
shift is used to create a lag variable, with the first element filled with 0 via fill=0. The other elements are calculated as Shares * (1 - (0.5 * (Year == 2015))), which returns Shares except when Year == 2015, in which case Shares * 0.5 is returned (it is the 2015 row's Shares that gets halved, because shift moves it down into the 2014 row, where the halved subtraction applies).
This returns
dt
Year Total Shares Balance Balance2
1: 2017 10 1 10 10
2: 2016 12 2 9 9
3: 2015 10 2 7 7
4: 2014 10 3 6 6
5: 2013 10 NA 3 3
FWIW, I wanted to provide a functional alternative that allows for more flexible calculations in the cumulative differences, indexing, etc. I have also read in the data with read.table.
dt <- read.table(header=TRUE, text='
Year Total Shares Balance
2017 10 1 10
2016 12 2 9
2015 10 2 7
2014 10 3 6
2013 10 NA 3
')
makeNewBalance <- function(dt) {
  output <- NULL
  for (i in 1:nrow(dt)) {
    if (i == 1) {
      # the balance starts at the first Total value
      output[i] <- dt$Total[i]
    } else {
      # subtract the previous row's Shares, halved when the current Year is 2014
      output[i] <- output[i-1] - as.integer(ifelse(dt$Year[i] == 2014,
                                                   dt$Shares[i-1]/2,
                                                   dt$Shares[i-1]))
    }
  }
  return(output)
}
dt$NewBalance <- makeNewBalance(dt)
which also returns
> dt
Year Total Shares Balance NewBalance
1 2017 10 1 10 10
2 2016 12 2 9 9
3 2015 10 2 7 7
4 2014 10 3 6 6
5 2013 10 NA 3 3
