PL/SQL Case with Group By and Pivot

I have data that I'm presenting in an APEX interactive report, using a pivot statement to display monthly data over a 15-year period. I am color coding some of the values with a CASE statement, based on whether the value contains a decimal point.
My problem is that the CASE statement is creating multiple rows from one row of data. My report shows two rows for each item: one holding the values without decimals, and one holding the values with decimals.
[Screenshot: report showing two rows per item]
How can I combine the rows into one? Should I use a GROUP BY, or is there a better way?
select buscat, prod_parent, year_month, volume, load_source, tstamp,
       case when instr(volume, '.') > 0 then 'color:#FF7755;'
            else 'color:#000000;' end as flag
from HISTORY
where id > 0
Here is the raw data from the SQL query:
[Screenshot: raw SQL query output]

According to the SQL return image, the data itself is not repeating; it looks like you are not filtering out zero volumes. Try changing id to volume in the WHERE clause:
select yearMonth, volume, load_source, tstamp,
case when instr(volume, '.') > 0 then 'color:#FF7755;' else 'color:#000000;' end flag
from HISTORY
where volume > 0
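As a sanity check, the CASE/INSTR flag logic produces exactly one flag per row, so the duplication has to come from elsewhere in the report. Here is a minimal sketch of that logic using Python's sqlite3 (SQLite also provides instr()), with made-up sample data standing in for the HISTORY table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HISTORY (year_month TEXT, volume TEXT)")
conn.executemany(
    "INSERT INTO HISTORY VALUES (?, ?)",
    [("2019-01", "120"), ("2019-02", "120.5"), ("2019-03", "0")],
)

# Same CASE/INSTR color flag; the explicit CAST keeps the zero filter
# unambiguous when volume is stored as text.
rows = conn.execute("""
    SELECT year_month, volume,
           CASE WHEN instr(volume, '.') > 0
                THEN 'color:#FF7755;'
                ELSE 'color:#000000;' END AS flag
    FROM HISTORY
    WHERE CAST(volume AS REAL) > 0
    ORDER BY year_month
""").fetchall()
```

Each surviving row carries exactly one flag value, so no extra rows are introduced by the CASE expression itself.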


How to select a specific data value by row or column index in SAS?

I usually use R, but I have just started studying SAS.
In R, we can make some data.frame like this :
df <- as.data.frame(matrix(c(1:6),nrow=2,ncol=3))
and then
df[1,2]
is 3.
Here is my question: how can I use row and column indexes in SAS? I couldn't find this anywhere.
I want to access values by row and column number inside a double loop.
If the row number and column number have meaning, then you probably do not want to store your "matrix" in that form. Instead, store it in a tall format, where the row and column indexes are stored in variables and the cell values are stored in another variable. Since you didn't provide any meaning for your example, let's just name these variables ROW, COL and VALUE.
data have;
  do col=1 to 3;
    do row=1 to 2;
      value+1;
      output;
    end;
  end;
run;
Now if you want to find the value when ROW=1 and COL=2 it is a simple WHERE condition.
proc print data=have;
where row=1 and col=2;
run;
Result:
Obs    col    row    value
  3      2      1        3
In a real dataset the ROW might be the individual case or person ID and the COL might be the YEAR or VISIT_NUMBER or SAMPLE_NUMBER of the value.
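The tall (ROW/COL/VALUE) layout is easy to reason about outside SAS as well. A minimal Python sketch of the same structure and the same WHERE-style lookup, using the example's column-major fill order:

```python
# Build the tall table the SAS data step produces: VALUE increments
# column-major (col outer, row inner), so (row=1, col=2) holds 3.
tall = []
value = 0
for col in range(1, 4):
    for row in range(1, 3):
        value += 1
        tall.append({"row": row, "col": col, "value": value})

# Equivalent of: where row=1 and col=2;
hit = [r["value"] for r in tall if r["row"] == 1 and r["col"] == 2]
```

The lookup is a plain filter on the index variables, which is exactly what the SAS WHERE clause does.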
You can access columns by name and rows via _n_ if you really need to, but there isn't a good use case for this type of logic.
For example, suppose you wanted the third row and second variable from the SASHELP.CLASS data set.
Note that you need to know the name of the variable; you cannot rely on its index/position.
This displays the information:
proc print data=sashelp.class(firstobs=3 obs=3);
  var Age;
run;
This puts in a data set, two different ways:
data want;
  set sashelp.class (firstobs=3 obs=3);
  keep age;
run;
data want;
  set sashelp.class;
  if _n_ = 3; * filters only the third row into the data set;
run;
R and Python import all data into memory and use that as their default processing model. SAS instead loads one row at a time and loops through each row within a data step, so you have to think about each step differently. Basically, break your processes into smaller steps and they work. SAS does have some really nice built-in functionality, like confidence intervals by default, or the ability to aggregate data at multiple levels within a single procedure.

Adding a date from 3 columns in table X to one column in table Y

Hi, I am new to SQLite and I am wondering if it is possible to combine dates from 3 columns in table X into one column in table Y. For example, in table X I have 3 columns called startDay, startMonth, startYear. I want to combine these into one column in table Y called Start_Date (possibly in the format DD/MM/YYYY). The format should also support computation, i.e. subtracting two dates. Any ideas?
You can do something like:
CREATE TABLE newtable(start_date TEXT);
INSERT INTO newtable
SELECT printf('%d-%02d-%02d', startYear, startMonth, startDay)
FROM oldtable;
And to compute the number of days between two dates:
SELECT julianday('2019-06-30') - julianday('2019-06-29') AS diff;
diff
----------
1.0
(Using a format other than those supported by SQLite's date and time functions, like your DD/MM/YYYY, is a bad idea: it means you can't use those functions, and in your case it also means you can't meaningfully sort by date.)
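The whole round trip can be checked with Python's built-in sqlite3, using made-up sample rows for the old table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oldtable (startDay INTEGER, startMonth INTEGER, startYear INTEGER);
    INSERT INTO oldtable VALUES (29, 6, 2019), (30, 6, 2019);

    CREATE TABLE newtable (start_date TEXT);
    -- Combine the three columns into one ISO-format date string.
    INSERT INTO newtable
        SELECT printf('%d-%02d-%02d', startYear, startMonth, startDay)
        FROM oldtable;
""")

dates = [r[0] for r in conn.execute(
    "SELECT start_date FROM newtable ORDER BY start_date")]

# Because the stored text is in ISO format, julianday() works directly.
diff = conn.execute(
    "SELECT julianday(max(start_date)) - julianday(min(start_date)) FROM newtable"
).fetchone()[0]
```

The ISO YYYY-MM-DD text both sorts correctly and feeds straight into julianday() for date arithmetic.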

Redshift join with metadata table and select columns

I have created a subset of the pg_table_def table with table_name, col_name and data_type. I have also added a column active with 'Y' as the value for some of the rows. Let us call this table config. Table config looks like below:
table_name           column_name
interaction_summary  name_id
tag_transaction      name_id
interaction_summary  direct_preference
bulk_sent            email_image_click
crm_dm               web_le_click
Now I want to be able to map the table names from this table to the actual table and fetch values for the corresponding column. name_id will be the key here which will be available in all tables. My output should look like below:
name_id direct_preference email_image_click web_le_click
1 Y 1 2
2 N 1 2
The solution needs to be dynamic, so that if the table list grows tomorrow the new tables are accommodated automatically. Since I am new to Redshift, any help is appreciated. I am also considering doing the same via R using the dplyr package.
I understood that dynamic queries don't work with Redshift.
My objective was to pull any new table that comes in and use their columns for regression analysis in R.
I made this work by using the listagg feature and a concat operation, and then wrote the output to a dataframe in R. This dataframe would have 'n' select queries as different rows.
Below is the format:
df <- as.data.frame(tbl(conn, sql("
  select 'select ' || col_names || ' from ' || table_name as q1
  from (select distinct table_name,
               listagg(col_name, ',') within group (order by col_name)
                 over (partition by table_name) as col_names
        from attribute_config
        where active = 'Y'
        order by table_name)
  group by 1")))
Once done, I assigned every row of this dataframe to a new dataframe and fetched the output using below:
df1 <- tbl(conn,sql(df[1,]))
I know this is a roundabout solution, but it works! It fetches about 17M records in under a second.
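The same generate-then-execute pattern can be sketched with Python's sqlite3, using SQLite's group_concat() as a stand-in for Redshift's listagg() and made-up data for attribute_config:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute_config (table_name TEXT, col_name TEXT, active TEXT);
    INSERT INTO attribute_config VALUES
        ('interaction_summary', 'name_id', 'Y'),
        ('interaction_summary', 'direct_preference', 'Y'),
        ('bulk_sent', 'email_image_click', 'N');

    CREATE TABLE interaction_summary (name_id INTEGER, direct_preference TEXT);
    INSERT INTO interaction_summary VALUES (1, 'Y'), (2, 'N');
""")

# Step 1: build one SELECT statement per active table
# (group_concat plays the role of listagg here).
queries = [q for (q,) in conn.execute("""
    SELECT 'select ' || group_concat(col_name, ',') || ' from ' || table_name
    FROM attribute_config
    WHERE active = 'Y'
    GROUP BY table_name
""")]

# Step 2: execute each generated query, as the R loop over df rows does.
results = {q: conn.execute(q).fetchall() for q in queries}
```

Only the active tables produce a query, and each generated string is then run as ordinary SQL, which is the trick that sidesteps Redshift's lack of dynamic SQL.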

Count observations arranged in multiple columns

I have a database with species IDs in the rows (very large) and the sites where they occur in the columns (several sites). I need a summary of how many species are at each site. My observations are categorical in some cases (present) or numerical (number of individuals), because they come from different database sources. There are also several NAs throughout the database.
In R, I have been using functions that count observations one site at a time only.
I would appreciate any help on how to count the observations from the different columns at the same time.
You could do just:
SELECT COUNT(*)
FROM tables
WHERE conditions
and specify the different column conditions in the WHERE clause:
WHERE t.COLUMN1 = 'THIS' AND t.COLUMN2 = 'THAT'
or with a SUM CASE (probably the best idea in general):
SELECT grfield,
       SUM(CASE WHEN a=1 THEN 1 ELSE 0 END) AS tcount1,
       SUM(CASE WHEN a=2 THEN 1 ELSE 0 END) AS tcount2
FROM T1
GROUP BY grfield;
Or, in a more complex way, you could do a subquery inside the count:
SELECT COUNT(*) FROM
(
  SELECT DISTINCT D
  FROM T1
  INNER JOIN T2
    ON T1.A = T2.B
) AS subquery;
You could do also several counts in subqueries... the possibilities are endless.
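For the asker's specific shape (sites as columns, NAs for absences), the SUM CASE form counts every column in one pass. A sketch with Python's sqlite3 and hypothetical column names (site1, site2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species_sites (species TEXT, site1, site2);
    -- Mixed categorical/numeric observations; NULL stands in for NA.
    INSERT INTO species_sites VALUES
        ('sp1', 'present', 3),
        ('sp2', NULL,      1),
        ('sp3', 'present', NULL);
""")

# One SUM(CASE ...) per site column counts non-NA observations
# for all sites in a single query.
counts = conn.execute("""
    SELECT SUM(CASE WHEN site1 IS NOT NULL THEN 1 ELSE 0 END) AS site1_n,
           SUM(CASE WHEN site2 IS NOT NULL THEN 1 ELSE 0 END) AS site2_n
    FROM species_sites
""").fetchone()
```

The IS NOT NULL test treats both 'present' and individual counts uniformly, so the mixed data types don't matter.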

adding repeating sequence numbers to a column in SQLite database based on conditions

I added a column in my SQLite database, and I need to insert repeating sequence numbers, starting at 1...n, BUT based on grouping by other columns. The sequence needs to restart at 1 whenever there is a new grouping.
Here is my table:
CREATE TABLE "ProdRunResults" (
  "ID" INTEGER PRIMARY KEY NOT NULL UNIQUE,
  "SeqNumbr" INTEGER,
  "Shift" INTEGER,
  "ShiftSeqNumbr" INTEGER,
  "Date" DATETIME,
  "ProdRunID" INTEGER,
  "Result" VARCHAR
)
ShiftSeqNumbr is the new column that I need to populate with sequence numbers, based on grouping of numbers in ProdRunID column then by numbers in the Shift column.
There could be up to 3 "shifts" (work shifts in a 24 hr period).
I scraped together some code to do this, but it adds the sequence numbers to the ShiftSeqNumbr column in reverse (descending) order:
UPDATE ProdRunResults
SET ShiftSeqNumbr = (SELECT COUNT (*)
FROM ProdRunResults AS N
WHERE N.ProdRunID = ProdRunResults.ProdRunID
AND N.Shift = ProdRunResults.Shift
AND N.ShiftSeqNumbr = ProdRunResults.ShiftSeqNumbr);
How can I change the Update statement so the sequence numbers start at 1 and go up? Or is there a better way to do this?
Your UPDATE statement counts how many rows have the same values in the ProdRunID/Shift/ShiftSeqNumbr columns as the current row. The current row always has an empty ShiftSeqNumbr at that point, so it is counting how many rows in the current group have not yet been updated.
You instead need to count how many rows come before the current row, i.e., how many rows have the same ProdRunID and Shift values, and the same or a smaller SeqNumbr value:
UPDATE ProdRunResults
SET ShiftSeqNumbr = (SELECT COUNT (*)
FROM ProdRunResults AS N
WHERE N.ProdRunID = ProdRunResults.ProdRunID
AND N.Shift = ProdRunResults.Shift
AND N.SeqNumbr <= ProdRunResults.SeqNumbr);
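The corrected UPDATE can be exercised end to end with Python's sqlite3, using a trimmed-down version of the table and made-up rows covering two runs and two shifts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProdRunResults (
        ID INTEGER PRIMARY KEY,
        SeqNumbr INTEGER,
        Shift INTEGER,
        ShiftSeqNumbr INTEGER,
        ProdRunID INTEGER);
    INSERT INTO ProdRunResults (SeqNumbr, Shift, ProdRunID) VALUES
        (1, 1, 100),   -- run 100, shift 1 -> 1
        (2, 1, 100),   -- run 100, shift 1 -> 2
        (3, 2, 100),   -- run 100, shift 2 -> restart at 1
        (4, 1, 200);   -- run 200, shift 1 -> restart at 1

    -- Count rows in the same group with the same or smaller SeqNumbr.
    UPDATE ProdRunResults
    SET ShiftSeqNumbr = (SELECT COUNT(*)
                         FROM ProdRunResults AS N
                         WHERE N.ProdRunID = ProdRunResults.ProdRunID
                           AND N.Shift = ProdRunResults.Shift
                           AND N.SeqNumbr <= ProdRunResults.SeqNumbr);
""")

seqs = [r[0] for r in conn.execute(
    "SELECT ShiftSeqNumbr FROM ProdRunResults ORDER BY ID")]
```

Each (ProdRunID, Shift) group now numbers its rows 1, 2, ... in ascending SeqNumbr order, restarting for every new group.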
