Creating a HAVING COUNT(column) > 2 clause in pyDAL - count

I have the following pyDAL table:
market = db.define_table(
'market',
Field('name'),
Field('ask', type='double'),
Field('timestamp', type='datetime', default=datetime.now)
)
I would like to use the expression language to execute the following SQL:
SELECT * FROM market
GROUP BY name
ORDER BY timestamp DESCENDING
HAVING COUNT(name) > 1
I know how to do the ORDER BY and the GROUP BY:
db().select(
db.market.ALL,
orderby=~db.market.timestamp,
groupby=db.market.name
)
but I do not know how to do a count within a having clause even after reading the section in the web2py book on the HAVING clause.

The count() function returns an expression which can be used both as a field in the select query, and to build an argument to the query's having parameter. The Grouping and counting section from the web2py manual has a few hints on this topic.
The following code will give the desired result. The row objects will hold both the market objects and their respective row counts.
count = db.market.name.count()
rows = db().select(
db.market.ALL,
count,
groupby=db.market.name,
orderby=~db.market.timestamp,
having=(count > 2)
)

Related

Count(case when) redshift sql - receiving groupby error

I'm trying to do a count(case when) in Amazon Redshift.
Using this reference, I wrote:
select
sfdc_account_key,
record_type_name,
vplus_stage,
vplus_stage_entered_date,
site_delivered_date,
case when vplus_stage = 'Lost' then -1 else 0 end as stage_lost_yn,
case when vplus_stage = 'Lost' then 2000 else 0 end as stage_lost_revenue,
case when vplus_stage = 'Lost' then datediff(month,vplus_stage_entered_date,CURRENT_DATE) else 0 end as stage_lost_months_since,
count(case when vplus_stage = 'Lost' then 1 else 0 end) as stage_lost_count
from shared.vplus_enrollment_dim
where record_type_name = 'APM Website';
But I'm getting this error:
[42803][500310] [Amazon](500310) Invalid operation: column "vplus_enrollment_dim.sfdc_account_key" must appear in the GROUP BY clause or be used in an aggregate function; java.lang.RuntimeException: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: column "vplus_enrollment_dim.sfdc_account_key" must appear in the GROUP BY clause or be used in an aggregate function;
Query was running fine before I added the count. I'm not sure what I'm doing wrong here -- thanks!
You can not have an aggregate function (sum, count etc) without group by
The syntax is like this
select a, count(*)
from table
group by a (or group by 1 in Redshift)
In your query you need to add
group by 1,2,3,4,5,6,7,8
because you have 8 columns other than count
Since I don't know your data and use case I can not tell you it will give you the right result, but SQL will be syntactically correct.
The basic rule is:
If you are using an aggregate function (eg COUNT(...)), then you must supply a GROUP BY clause to define the grouping
Exception: If all columns are aggregates (eg SELECT COUNT(*), AVG(sales) FROM table)
Any columns that are not aggregate functions must appear in the GROUP BY (eg SELECT year, month, AVG(sales) FROM table GROUP BY year, month)
Your query has a COUNT() aggregate function mixed-in with non-aggregate values, which is giving rise to the error.
In looking at your query, you probably don't want to group on all of the columns (eg stage_lost_revenue and stage_lost_months_since don't look like likely grouping columns). You might want to mock-up a query result to figure out what you actually want from such a query.

How to cast a column into decimal of varying significant digits in Oracle

I have a column that is stored in ###0.0000000000 format. In a report I'm generating I need it to only show a few significant digits. Problem is the number needed changes based on the product with a default of 2. There's a column in another table that provides the required digits per each product.
I've tried a few things so far but it seems to not like it and throws a syntax error.
Cast(A.Price as Numeric(10,coalesce(B.Sig_Digits,2)))
That threw an error so I tried making the coalesce part a column and aliasing it in case the coalesce broke it, and that didn't work either. Round will take a column as an argument but I don't want it to round. Other than an ugly
case when Sig_digits = 1 then to_char(price,'###0.0') when Sig_digits = 2...
etc. what other options are there? This is a very large report, with 100+ columns and a few million rows so I'd prefer to not do the case when.
Use TO_CHAR with RPAD to add 0s to the end of the format model to the correct number of decimal places:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value, sig ) AS
SELECT 123.456789, 2 FROM DUAL UNION ALL
SELECT 123456789.123456789, 7 FROM DUAL;
Query 1:
SELECT TO_CHAR( value, RPAD( 'FM999999999990.', sig + 15, '0' ) )
FROM table_name
Results:
| TO_CHAR(VALUE,RPAD('FM999999999990.',SIG+15,'0')) |
|---------------------------------------------------|
| 123.46 |
| 123456789.1234568 |

PL SQL SELECT Case Statements involving aggregate values

I'm trying to write a query that in Teradata but I'm not sure how to do it; my table looks like this:
col1: text (account_number)
col2: text (secondary account number)
col3: text (Primary_cust)
the business requirements are:
"Group records by account number.
If there is only one record for an account then keep that record.
If there are multiple records for an account number then:
(1) if only one record has Primary_CUST = 'Y' then keep.
(2) if multiple records have Primary_CUST = 'Y' then keep one with lowest SCDRY_ACCT_NBR
(3) If no records have Primary_CUST = 'Y' then keep one with lowest SCDRY_ACCT_NBR.
I know I need a CASE statement and I'm able to write the first requirement, but not sure on the second. Any help would be greatly appreciated.
You just have to think about how to order the rows to get the row you want on top, seems to be like this:
SELECT * FROM tab
QUALIFY
Row_Number()
Over (PARTITION BY account_number -- for each account
ORDER BY Primary_CUST DESC -- 'Y' before 'N' (assuming it's a Y/N column)
,SCDRY_ACCT_NBR -- lowest number
) = 1 -- return the top row
Of course QUALIFY is proprietary Teradata syntax, if you need to do this on Oracle you have to wrap it in a Derived Table:
SELECT *
FROM
(
SELECT t.*,
Row_Number()
Over (PARTITION BY account_number -- for each account
ORDER BY Primary_CUST DESC -- 'Y' before 'N' (assuming it's a Y/N column)
,SCDRY_ACCT_NBR) AS rn-- lowest number
FROM tab
) AS dt
WHERE rn = 1 -- return the top row

Aggregate Data Without Group By

I'm trying to create a single row of data, but the Group By clause is screwing me up.
Here's my table:
RegistrationPK : DateBirth : RegistrationDate
I'm trying to get the age of people at the time of Registration.
What I have is:
SELECT
CASE WHEN DATEDIFF(YEAR,DateBirth,RegistrationDate) < 20 THEN COUNT(registrationpk) END AS Under20
FROM dbo.Registration r
GROUP BY r.DOB, r.RegDate
Instead of getting one column "Under20" with one row of data, I get all the different DateBirth rows.
How can I do a DateDiff without a Group By?
Here it is:
SELECT
SUM (
CASE WHEN DATEDIFF(YEAR,DateBirth,regdate) < 20 THEN 1
ELSE 0 END
)
FROM dbo.Registration r
I got this to work, but I hate Selects within Selects. If anyone knows a simpler way I'd appreciate it. For some reason, when done as a select within a select, SQL doesn't require either statement to have the GroupBy clause.
SELECT COUNT(UNDER20) AS UNDER20
FROM (
SELECT
UNDER20 = CASE WHEN DATEDIFF(YEAR,DateBirth,regdate) < 20 THEN '1' END
FROM dbo.Registration r
) a

How to UPDATE multiple columns using a correlated subquery in SQLite?

I want to update multiple columns in a table using a correlated subquery. Updating a single column is straightforward:
UPDATE route
SET temperature = (SELECT amb_temp.temperature
FROM amb_temp.temperature
WHERE amb_temp.location = route.location)
However, I'd like to update several columns of the route table. As the subquery is much more complex in reality (JOIN with a nested subquery using SpatiaLite functions), I want to avoid repeating it like this:
UPDATE route
SET
temperature = (SELECT amb_temp.temperature
FROM amb_temp.temperature
WHERE amb_temp.location = route.location),
error = (SELECT amb_temp.error
FROM amb_temp.temperature
WHERE amb_temp.location = route.location),
Ideally, SQLite would let me do something like this:
UPDATE route
SET (temperature, error) = (SELECT amb_temp.temperature, amb_temp.error
FROM amb_temp.temperature
WHERE amb_temp.location = route.location)
Alas, that is not possible. Can this be solved in another way?
Here's what I've been considering so far:
use INSERT OR REPLACE as proposed in this answer. It seems it's not possible to refer to the route table in the subquery.
prepend the UPDATE query with a WITH clause, but I don't think that is useful in this case.
For completeness sake, here's the actual SQL query I'm working on:
UPDATE route SET (temperature, time_distance) = ( -- (C)
SELECT -- (B)
temperature.Temp,
MIN(ABS(julianday(temperature.Date_HrMn)
- julianday(route.date_time))) AS datetime_dist
FROM temperature
JOIN (
SELECT -- (A)
*, Distance(stations.geometry,route.geometry) AS distance
FROM stations
WHERE EXISTS (
SELECT 1
FROM temperature
WHERE stations.USAF = temperature.USAF
AND stations.WBAN_ID = temperature.NCDC
LIMIT 1
)
GROUP BY stations.geometry
ORDER BY distance
LIMIT 1
) tmp
ON tmp.USAF = temperature.USAF
AND tmp.WBAN_ID = temperature.NCDC
)
High-level description of this query:
using geometry (= longitude & latitude) and date_time from the route table,
(A) find the weather station (stations table, uniquely identified by the USAF and NCDC/WBAN_ID columns)
closest to the given longitude/latitude (geometry)
for which temperatures are present in the temperature table
(B) find the temperature table row
for the weather station found above
closest in time to the given timestamp
(C) store the temperature and "time_distance" in the route table

Resources