Group by Week with Columns for each day of week - asp.net

I'm trying to created a report for my asp.net application which will show the quantity of each item in combination with unit that was ordered for each day of the week. The days of the week are columns.
To be more specific:
I have two table, one is the Orders table with order id, customer name, date etc...
The second table is OrderItems, this table has order id as a foreign key, order Item id, item name, unit (exp: each, box , case), quantity, and price.
When a user picks a date range for the report, for example from 3/2/12 to 4/2/12, on my asp page, the report will group order items by week and will look as follows:
**week (1) starting from sunday of such date to saturday of such date**
item | unit | Sun | Mon | Tues | Wedn | Thur | Fri | Sat | Total Price for week
item1 | bag | 3 | 0 | 12 | 8 | 45 | 1 | 4 | $1234
item4 | box | 2 | 4 | 5 | 0 | 5 | 2 | 6 | $1234
**week (2) starting from sunday of such date to saturday of such date**
item | unit | Sun | Mon | Tues | Wedn | Thur | Fri | Sat | Total Price for week
item1 | bag | 3 | 0 | 12 | 8 | 45 | 1 | 4 | $1234
item4 | box | 2 | 4 | 5 | 0 | 5 | 2 | 6 | $2354
**week (2) starting from sunday of such date to saturday of such date**
item | unit | Sun | Mon | Tues | Wedn | Thur | Fri | Sat | Total Price for week
item1 | bag | 3 | 0 | 12 | 8 | 45 | 1 | 4 | $1234
item4 | box | 2 | 4 | 5 | 0 | 5 | 2 | 6 | $2354
I wish I could have something to show that I have already started, but crystal isn't my strong point and I dont even know where start tackling this one. I do know how to pass parameters and a datatable that I myself pre-filtered before passing it to the report. For example filtering items by date range and customer or order id.
any help would be much appreciated

Create a formula for each day of the week that totals the order.
ie Sunday quantity:
if dayOfWeek(dateField) = 'Sun'
then order.quantity
else 0
Add each day formula to the detail section of the report and then summarize it for each group level. To group it by week, just group by the date field, then set the grouping option to by week. Suppress the detail, and you'll have what you are looking for.
I don't remember the exact name of the dayOfWeek function, but it's something like that.

Related

how to query/add filter attribute in dynamodb query?

I am creating a composite key with artist and song in dynamo db. artist as a primary key and song as an sort key. i'm aware that query might be slow and expensive given how dynamo db works and how filter is applied, is it possible to apply filter to multiple attribute and how does it look like , for example - query all the records by an artist (joe) in a particular genre (country) with rating higher than 7
id | artist | song | albumTitle | Price | Genre | rating
1 | TI | hello | south | 90 | Rap | 7.0
2 | joe | good | free | 87 | pop | 8
3 | joe | bye | one | 99 | country| 7
4 | joe | beat | one | 99 | country| 5
5 | joe | sun | one | 99 | country| 9

Select values from one table, count common values from other table, show 0 if no common values

I have two tables that are something as follows:
WORKDAYS
DATE | WORKDAY_LENGHT |
-----------+----------------+
12-05-2018 | 8 |
13-05-2018 | 6.5 |
14-05-2018 | 7.5 |
15-05-2018 | 8 |
ACCIDENTS
TOD | SEVERITY |
-----------------+-----------+
12-05-2018 12:00 | minor |
12-05-2018 15:00 | minor |
13-05-2018 08:00 | severe |
13-05-2018 12:00 | severe |
14-05-2018 10:30 | severe |
And I need a result that is as follows:
WORKDAYS
DATE | WORKDAY_LENGHT | ACCIDENTS_COUNT|
-----------+----------------+----------------+
12-05-2018 | 8 | 2 |
13-05-2018 | 6.5 | 2 |
14-05-2018 | 7.5 | 1 |
15-05-2018 | 8 | 0 |
What I so far have tried is this:
SELECT DISTINCT
w.date,
(
SELECT
COUNT(*)
FROM
accidents a
WHERE
date(w.date) = date(a.tod)
)
AS accidents_count
FROM
workdays w
Which gives me an answer that is somewhat in the right direction. Something like this:
WORKDAYS
DATE | WORKDAY_LENGHT | ACCIDENTS_COUNT|
-----------+----------------+----------------+
12-05-2018 | 8 | 1 |
12-05-2018 | 8 | 1 |
13-05-2018 | 6.5 | 1 |
13-05-2018 | 6.5 | 1 |
14-05-2018 | 7.5 | 1 |
15-05-2018 | 8 | 0 |
This is sqlite, so the date values are stored as strings. The date function therefore should make them just dates, right? Or is that the one causing problems?
I was missing a group by and feel ashamed for opening a question before figuring this out.
adding GROUP BY date(w.date) is the solution here.

Data imputation for empty subsetted dataframes in R

I'm trying to build a function in R in which I can subset my raw dataframe according to some specifications, and thereafter convert this subsetted dataframe into a proportion table.
Unfortunately, some of these subsettings yields to an empty dataframe as for some particular specifications I do not have data; hence no proportion table can be calculated. So, what I would like to do is to take the closest time step from which I have a non-empty subsetted dataframe and use it as an input for the empty subsetted dataframe.
Here some insights to my dataframe and function:
My raw dataframe looks +/- as follows:
| year | quarter | area | time_comb | no_individuals | lenCls | age |
|------|---------|------|-----------|----------------|--------|-----|
| 2005 | 1 | 24 | 2005.1.24 | 8 | 380 | 3 |
| 2005 | 2 | 24 | 2005.2.24 | 4 | 490 | 2 |
| 2005 | 1 | 24 | 2005.1.24 | 3 | 460 | 6 |
| 2005 | 1 | 21 | 2005.1.21 | 25 | 400 | 2 |
| 2005 | 2 | 24 | 2005.2.24 | 1 | 680 | 6 |
| 2005 | 2 | 21 | 2005.2.21 | 2 | 620 | 5 |
| 2005 | 3 | 21 | 2005.3.21 | NA | NA | NA |
| 2005 | 1 | 21 | 2005.1.21 | 1 | 510 | 5 |
| 2005 | 1 | 24 | 2005.1.24 | 1 | 670 | 4 |
| 2006 | 1 | 22 | 2006.1.22 | 2 | 750 | 4 |
| 2006 | 4 | 24 | 2006.4.24 | 1 | 660 | 8 |
| 2006 | 2 | 24 | 2006.2.24 | 8 | 540 | 3 |
| 2006 | 2 | 24 | 2006.2.24 | 4 | 560 | 3 |
| 2006 | 1 | 22 | 2006.1.22 | 2 | 250 | 2 |
| 2006 | 3 | 22 | 2006.3.22 | 1 | 520 | 2 |
| 2006 | 2 | 24 | 2006.2.24 | 1 | 500 | 2 |
| 2006 | 2 | 22 | 2006.2.22 | NA | NA | NA |
| 2006 | 2 | 21 | 2006.2.21 | 3 | 480 | 2 |
| 2006 | 1 | 24 | 2006.1.24 | 1 | 640 | 5 |
| 2007 | 4 | 21 | 2007.4.21 | 2 | 620 | 3 |
| 2007 | 2 | 21 | 2007.2.21 | 1 | 430 | 3 |
| 2007 | 4 | 22 | 2007.4.22 | 14 | 410 | 2 |
| 2007 | 1 | 24 | 2007.1.24 | NA | NA | NA |
| 2007 | 2 | 24 | 2007.2.24 | NA | NA | NA |
| 2007 | 3 | 24 | 2007.3.22 | NA | NA | NA |
| 2007 | 4 | 24 | 2007.4.24 | NA | NA | NA |
| 2007 | 3 | 21 | 2007.3.21 | 1 | 560 | 4 |
| 2007 | 1 | 21 | 2007.1.21 | 7 | 300 | 3 |
| 2007 | 3 | 23 | 2007.3.23 | 1 | 640 | 5 |
Here year, quarter and area refers to a particular time (Year & Quarter) and area for which X no. of individuals were measured (no_individuals). For example, from the first row we get that in the first quarter of the year 2005 in area 24 I had 8 individuals belonging to a length class (lenCLs) of 380 mm and age=3. It is worth to mention that for a particular year, quarter and area combination I can have different length classes and ages (thus, multiple rows)!
So what I want to do is basically to subset the raw dataframe for a particular year, quarter and area combination, and from that combination calculate a proportion table based on the number of individuals in each length class.
So far my basic function looks as follows:
LAK <- function(df, Year="2005", Quarter="1", Area="22", alkplot=T){
require(FSA)
# subset alk by year, quarter and area
sALK <- subset(df, year==Year & quarter==Quarter & area==Area)
dfexp <- sALK[rep(seq(nrow(sALK)), sALK$no_individuals), 1:ncol(sALK)]
raw <- t(table(dfexp$lenCls, dfexp$age))
key <- round(prop.table(raw, margin=1), 3)
return(key)
if(alkplot==TRUE){
alkPlot(key,"area",xlab="Age")
}
}
From the dataset example above, one can notice that for year=2005 & quarter=3 & area=21, I do not have any measured individuals. Yet, for the same area AND year I have data for either quarter 1 or 2. The most reasonable assumption would be to take the subsetted dataframe from the closest time step (herby quarter 2 with the same area and year), and replace the NA from the columns "no_individuals", "lenCls" and "age" accordingly.
Note also that for some cases I do not have data for a particular year! In the example above, one can see this by looking into area 24 from year 2007. In this case I can not borrow the information from the nearest quarter, and would need to borrow from the previous year instead. This would mean that for year=2007 & area=24 & quarter=1 I would borrow the information from year=2006 & area=24 & quarter 1, and so on and so forth.
I have tried to include this in my function by specifying some extra rules, but due to my poor programming skills I didn't make any progress.
So, any help here will be very much appreciated.
Here my LAK function which I'm trying to update:
LAK <- function(df, Year="2005", Quarter="1", Area="22", alkplot=T){
require(FSA)
# subset alk by year, quarter and area
sALK <- subset(df, year==Year & quarter==Quarter & area==Area)
# In case of empty dataset
#if(is.data.frame(sALK) && nrow(sALK)==0){
if(sALK[rowSums(is.na(sALK)) > 0,]){
warning("Empty subset combination; data will be subsetted based on the
nearest timestep combination")
FIXME: INCLDUE IMPUTATION RULES HERE
}
dfexp <- sALK[rep(seq(nrow(sALK)), sALK$no_individuals), 1:ncol(sALK)]
raw <- t(table(dfexp$lenCls, dfexp$age))
key <- round(prop.table(raw, margin=1), 3)
return(key)
if(alkplot==TRUE){
alkPlot(key,"area",xlab="Age")
}
}
So, I finally came up with a partial solution to my problem and will include my function here in case it might be of someone's interest:
LAK <- function(df, Year="2005", Quarter="1", Area="22",alkplot=T){
require(FSA)
# subset alk by year, quarter, area and species
sALK <- subset(df, year==Year & quarter==Quarter & area==Area)
print(sALK)
if(nrow(sALK)==1){
warning("Empty subset combination; data has been subsetted to the nearest input combination")
syear <- unique(as.numeric(as.character(sALK$year)))
sarea <- unique(as.numeric(as.character(sALK$area)))
sALK2 <- subset(df, year==syear & area==sarea)
vals <- as.data.frame(table(sALK2$comb_index))
colnames(vals)[1] <- "comb_index"
idx <- which(vals$Freq>1)
quarterId <- as.numeric(as.character(vals[idx,"comb_index"]))
imput <- subset(df,year==syear & area==sarea & comb_index==quarterId)
dfexp2 <- imput[rep(seq(nrow(imput)), imput$no_at_length_age), 1:ncol(imput)]
raw2 <- t(table(dfexp2$lenCls, dfexp2$age))
key2 <- round(prop.table(raw2, margin=1), 3)
print(key2)
if(alkplot==TRUE){
alkPlot(key2,"area",xlab="Age")
}
} else {
dfexp <- sALK[rep(seq(nrow(sALK)), sALK$no_at_length_age), 1:ncol(sALK)]
raw <- t(table(dfexp$lenCls, dfexp$age))
key <- round(prop.table(raw, margin=1), 3)
print(key)
if(alkplot==TRUE){
alkPlot(key,"area",xlab="Age")
}
}
}
This solves my problem when I have data for at least one quarter of a particular Year & Area combination. Yet, I'm still struggling to figure out how to deal when I do not have data for a particular Year & Area combination. In this case I need to borrow data from the closest Year that contains data for all the quarters for the same area.
For the example exposed above, this would mean that for year=2007 & area=24 & quarter=1 I would borrow the information from year=2006 & area=24 & quarter 1, and so on and so forth.
I don't know if you have ever encountered MICE, but it is a pretty cool and comprehensive tool for variable imputation. It also allows you to see how the imputed data is distributed so that you can choose the method most suited for your problem. Check this brief explanation and the original package description

Finding the starting location of specific consecutive values in every row

I am kind of new to R and am still learning how to use all the functions. I'm stuck with this problem on my project, so would definitely appreciate any help!
I have the spending behavior data of customer 1, 2, 3, and 4 from Jan to Dec. Every row is a unique customer, and every column is the customer's spending activity of that month (1 is active, 0 is inactive).
+------------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| Name | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
+------------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| Customer 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| Customer 2 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| Customer 3 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| Customer 4 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
+------------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
I am trying to find the "last" occurrence of 3 consecutive zero's in every row, if any, and locate the starting point of them. This would help me identify when the customer goes into "idle" or "hibernation" status.
The expected results would be:
Jun (or 6 as column index) for Customer 1
Sep (or 9 as column index) for Customer 2
Oct (or 10 as column index) for Customer 3, since the last 3-month occurrence begins at Oct instead of Sep
NA for Customer 4, since there's no such an occurrence
After going through some similar questions on stackoverflow, I figured using rle and apply might be the right approach, but I have been struggling with how to write this into actual code. Sincerely appreciate any idea!
I would melt the dataframe so there are 3 columns:
Name, Month, Spending Behaviour.
I would then do a rollmean from the zoo package with a left align, subset for instances where the rollmean is 0 and pick the last observation for each customer.

how to reference a result in a subquery

I have the following table in an sqlite database
+----+-------------+-------+
| ID | Week Number | Count |
+----+-------------+-------+
| 1 | 1 | 31 |
| 2 | 2 | 16 |
| 3 | 3 | 73 |
| 4 | 4 | 59 |
| 5 | 5 | 44 |
| 6 | 6 | 73 |
+----+-------------+-------+
I want to get the following table out. Where I get this weeks sales as one column and then the next column will be last weeks sales.
+-------------+-----------+-----------+
| Week Number | This_Week | Last_Week |
+-------------+-----------+-----------+
| 1 | 31 | null |
| 2 | 16 | 31 |
| 3 | 73 | 16 |
| 4 | 59 | 73 |
| 5 | 44 | 59 |
| 6 | 73 | 44 |
+-------------+-----------+-----------+
This is the select statement i was going to use:
select
id, week_number, count,
(select count from tempTable
where week_number = (week_number-1))
from
tempTable;
You are comparing values in two different rows. When you are just writing week_number, the database does not know which one you mean.
To refer to a column in a specific table, you have to prefix it with the table name: tempTable.week_number.
And if both tables have the same name, you have to rename at least one of them:
SELECT id,
week_number,
count AS This_Week,
(SELECT count
FROM tempTable AS T2
WHERE T2.week_number = tempTable.week_number - 1
) AS Last_Week
FROM tempTable;
In case of you want to take a query upon a same table twice, you have to put aliases on the original one and its replicated one to differentiate them
select a.week_number,a.count this_week,
(select b.count from tempTable b
where b.week_number=(a.week_number-1)) last_week
from tempTable a;

Resources