Sum over all previous, until certain point - formula

I want to create a calculated column in Spotfire that sums the values up to a certain point and then starts summing again from there.
See the example below: whenever there is a value in the Stocks column, the sum of the Volumes needs to restart from that row, and so on.
Thanks!

Breaking it down, you can accomplish this in three steps:
1- Calculate a column that groups rows by the date of the most recent row with a non-null Stocks value [Group]
last(case when [Stocks] is not null then [Date] end) OVER (allPrevious([Date]))
2- Create a hierarchy containing the grouping and the date [Gp_Date_Hr]
CREATE NESTED HIERARCHY [Gp_Date_Hr]
[Group] AS [Group],
[Date] AS [Date]
3- Calculate your desired value
Sum([Volume]) OVER (Intersect(Parent([Hierarchy.Gp_Date_Hr]),AllPrevious([Hierarchy.Gp_Date_Hr])))
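To make the logic of the three steps easier to follow outside Spotfire, here is a minimal Python sketch of the same idea: a running sum that restarts on every row with a Stocks value. The function name and the sample rows are hypothetical, and the rows are assumed to be sorted by date already.

```python
def running_sum_with_reset(rows):
    """Return (date, running_total) pairs.

    rows: list of (date, volume, stocks) tuples sorted by date;
    stocks is None when the Stocks cell is empty (assumption).
    """
    total = 0
    result = []
    for date, volume, stocks in rows:
        if stocks is not None:
            # A non-null Stocks value starts a new group, like [Group] in step 1.
            total = 0
        # The current row belongs to the group it starts, so its volume is included.
        total += volume
        result.append((date, total))
    return result


# Hypothetical sample data: (Date, Volume, Stocks)
rows = [
    ("2020-01-01", 10, None),
    ("2020-01-02", 20, None),
    ("2020-01-03", 5, 100),
    ("2020-01-04", 7, None),
    ("2020-01-05", 3, 50),
]

for date, total in running_sum_with_reset(rows):
    print(date, total)
```

Rows that come before the first Stocks value are accumulated together, which mirrors how the Spotfire expression treats rows whose [Group] is empty.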

Related

r data.table - shift/lead - accessing value of multiple row

Is it possible to access the values of multiple previous rows? I would like to look up values from the preceding rows in a cumulative or relative way, e.g. get the values of a column from all previous rows as a list.
For reference, the code below calculates an unbiased mean by excluding the current group. I am looking for a way to exclude all previous rows relative to the current row (i.e. relative processing). My understanding is that shift gives access to the previous or next row, but not to the values of all previous or all following rows.
http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/#method-1-in-line
dt <- data.table(mtcars)[,.(cyl, gear, mpg)]
dt[, dt[!gear %in% unique(dt$gear)[.GRP], mean(mpg), by=cyl], by=gear] #unbiased mean

Adding date from 3 columns in table X to one column in table Y

Hi, I am new to SQLite and I am wondering if it is possible to combine the date parts from 3 columns in table X into one column in table Y. For example, in table X I have 3 columns called startDay, startMonth and startYear. I want to combine these into one column in table Y called Start_Date (possibly in the format DD/MM/YYYY). Ideally the format should also allow computations, e.g. subtracting 2 dates. Any ideas?
You can do something like:
CREATE TABLE newtable(start_date TEXT);
INSERT INTO newtable
SELECT printf('%d-%02d-%02d', startYear, startMonth, startDay)
FROM oldtable;
And to compute the number of days between two dates:
SELECT julianday('2019-06-30') - julianday('2019-06-29') AS diff;
diff
----------
1.0
(Using a format other than those supported by the SQLite date and time functions, like your DD/MM/YYYY, is a bad idea. It means you can't use the values with those functions and, in your case, you also can't meaningfully sort by date.)
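The approach above can be tried end to end from Python's built-in sqlite3 module. This is only a sketch: the table names follow the answer, the two sample rows are made up, and the year is padded with %04d as an extra precaution. It also shows that DD/MM/YYYY can still be produced for display with strftime while the stored value stays in ISO format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table with the date split over three columns (sample rows are hypothetical).
conn.execute(
    "CREATE TABLE oldtable(startDay INTEGER, startMonth INTEGER, startYear INTEGER)"
)
conn.executemany(
    "INSERT INTO oldtable VALUES (?, ?, ?)",
    [(29, 6, 2019), (5, 7, 2019)],
)

# Store the combined date as ISO text (YYYY-MM-DD), which SQLite's date functions understand.
conn.execute("CREATE TABLE newtable(start_date TEXT)")
conn.execute(
    """
    INSERT INTO newtable
    SELECT printf('%04d-%02d-%02d', startYear, startMonth, startDay)
    FROM oldtable
    """
)

dates = [
    row[0]
    for row in conn.execute("SELECT start_date FROM newtable ORDER BY start_date")
]

# Difference in days between the two stored dates.
diff = conn.execute(
    "SELECT julianday(?) - julianday(?)", (dates[1], dates[0])
).fetchone()[0]

# DD/MM/YYYY only for display; the stored value stays sortable.
display = conn.execute(
    "SELECT strftime('%d/%m/%Y', ?)", (dates[0],)
).fetchone()[0]

print(dates, diff, display)
```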

Tableau - Average of Ranking based on Average

For a certain data range, for a specific dimension, I need to calculate the average value of a daily rank based on the average value.
First of all this is the starting point:
This is quite simple: for each day and category I get the AVG(Value) and the rank based on that AVG(Value), computed using Category.
Now what I need is "just" a table with one row for each Category with the average value of that rank for the overall period.
Something like this:
Category     Global Rank   Calculation
A (blue)     1,6           (1+3+1+1+1+3)/6
B (orange)   2,3           (3+2+3+2+2+2)/6
C (red)      2,0           (2+1+2+3+3+1)/6
I tried using an LOD expression, but it's not possible to use the rank table calculation inside one, so I'm wondering if I'm missing anything or if this is even possible in Tableau.
Please find attached the twbx with the raw data here:
Any help would be appreciated.
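As a plain arithmetic check of the target numbers (not a Tableau solution), the per-category averages can be recomputed from the daily ranks listed in the question. The dictionary layout below is only an assumption for illustration.

```python
# Daily ranks per category, copied from the calculations in the question.
daily_ranks = {
    "A": [1, 3, 1, 1, 1, 3],
    "B": [3, 2, 3, 2, 2, 2],
    "C": [2, 1, 2, 3, 3, 1],
}

# Average rank over the whole period for each category.
global_rank = {
    category: sum(ranks) / len(ranks)
    for category, ranks in daily_ranks.items()
}

for category, value in global_rank.items():
    print(category, round(value, 2))
```

Note that 10/6 is about 1.67, so the 1,6 shown for category A appears to be truncated rather than rounded.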

adding repeating sequence numbers to a column in SQLite database based on conditions

I added a column to my SQLite database, and I need to fill it with repeating sequence numbers 1...n, grouped by other columns: the sequence needs to start over at 1 whenever a new group begins.
Here is my table:
CREATE TABLE "ProdRunResults" ("ID" INTEGER PRIMARY KEY NOT NULL UNIQUE , "SeqNumbr" INTEGER, "Shift" INTEGER, "ShiftSeqNumbr" INTEGER, "Date" DATETIME, "ProdRunID" INTEGER, "Result" VARCHAR)
ShiftSeqNumbr is the new column that I need to populate with sequence numbers, based on grouping of numbers in ProdRunID column then by numbers in the Shift column.
There could be up to 3 "shifts" (work shifts in a 24 hr period).
I scraped together some code to do this but it adds the sequence numbers to ShiftSeqNumbr column in reverse (descending) order:
UPDATE ProdRunResults
SET ShiftSeqNumbr = (SELECT COUNT (*)
FROM ProdRunResults AS N
WHERE N.ProdRunID = ProdRunResults.ProdRunID
AND N.Shift = ProdRunResults.Shift
AND N.ShiftSeqNumbr = ProdRunResults.ShiftSeqNumbr);
How can I change the Update statement so the sequence numbers start at 1 and go up? Or is there a better way to do this?
Your UPDATE statement counts how many rows there are that have the same values in the ProdRunID/Shift/ShiftSeqNumbr columns as the current row. The current row always has an empty value in ShiftSeqNumbr, so it is counting how many rows in the current group have not yet been updated.
You need to count how many rows come before the current row, i.e., how many rows have the same ProdRunID and Shift values, and the same or a smaller SeqNumbr value:
UPDATE ProdRunResults
SET ShiftSeqNumbr = (SELECT COUNT (*)
FROM ProdRunResults AS N
WHERE N.ProdRunID = ProdRunResults.ProdRunID
AND N.Shift = ProdRunResults.Shift
AND N.SeqNumbr <= ProdRunResults.SeqNumbr);
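A quick way to check the corrected statement is to run it against a small in-memory database with Python's sqlite3 module. The sketch below uses a trimmed-down version of the table and made-up rows, and it assumes that SeqNumbr is unique within each ProdRunID/Shift group (ties would receive the same number).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down version of the table from the question (Date and Result omitted).
conn.execute(
    """
    CREATE TABLE ProdRunResults (
        ID INTEGER PRIMARY KEY,
        SeqNumbr INTEGER,
        Shift INTEGER,
        ShiftSeqNumbr INTEGER,
        ProdRunID INTEGER
    )
    """
)

# Hypothetical rows: (SeqNumbr, Shift, ProdRunID)
sample = [(1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 2, 1), (5, 2, 1), (1, 1, 2)]
conn.executemany(
    "INSERT INTO ProdRunResults (SeqNumbr, Shift, ProdRunID) VALUES (?, ?, ?)",
    sample,
)

# Count the rows in the same ProdRunID/Shift group up to and including the current one.
conn.execute(
    """
    UPDATE ProdRunResults
    SET ShiftSeqNumbr = (SELECT COUNT(*)
                         FROM ProdRunResults AS N
                         WHERE N.ProdRunID = ProdRunResults.ProdRunID
                           AND N.Shift = ProdRunResults.Shift
                           AND N.SeqNumbr <= ProdRunResults.SeqNumbr)
    """
)

result = conn.execute(
    """
    SELECT ProdRunID, Shift, SeqNumbr, ShiftSeqNumbr
    FROM ProdRunResults
    ORDER BY ProdRunID, Shift, SeqNumbr
    """
).fetchall()

for row in result:
    print(row)
```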

Conditional operation on two data frames (R)

I'm having some difficulty executing a conditional operation on two dataframes. For problem illustration, I have three variables: Price, State, and Item, which are stored in a data frame (data1) with those column names. I use ddply to generate a dataframe (data2) that includes columns State and Item, and the average price(or some other function) for that State/Item combination.
What I then want to do is fill in a column in the originating data frame (i.e. a simple prediction vector), where the column's value is the mean value for a given observation's combination of State and Item in data1. (E.g., if an observation in data1 has State="Arizona" and Item="pen", I want to retrieve the average price stored in data2 that corresponds to that State/Item combination and insert it into the column.)
Thank you for any help.
The plyr package comes with a great little function called join. You can use it to complete your task.
join(data1, data2, by=c('State','Item'))
Review ?join to see the different types of joins available. I'm pretty sure you want a left join.
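For readers who want to see what a left join on State and Item does to the data, here is a minimal Python sketch of the same idea using only the standard library. It is not the plyr implementation; the sample rows and the MeanPrice column name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for data1: one row per observation.
data1 = [
    {"State": "Arizona", "Item": "pen", "Price": 1.0},
    {"State": "Arizona", "Item": "pen", "Price": 3.0},
    {"State": "Ohio", "Item": "pen", "Price": 5.0},
]

# Stand-in for data2: mean price per (State, Item) combination.
prices = defaultdict(list)
for row in data1:
    prices[(row["State"], row["Item"])].append(row["Price"])
data2 = {key: sum(values) / len(values) for key, values in prices.items()}

# Left join: every row of data1 is kept and receives the matching mean,
# or None when data2 has no entry for that combination.
joined = [
    dict(row, MeanPrice=data2.get((row["State"], row["Item"])))
    for row in data1
]

for row in joined:
    print(row)
```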
