I have researched this problem and found the answer for a single query, where you can find the nth value of a single column by ordering it DESC and using OFFSET 2. What I am trying to do is find the nth value for each group. For example, I'm working with a database of bike share data. The database stores the duration of each trip and its date, and I'm trying to find the 3rd longest duration for each day. If I were finding the max duration I would use the following code.
SELECT DATE(start_date) trip_date, MAX(duration)
FROM trips
GROUP BY 1
I want the output to be something like this.
Date        3rd_duration
1/1/2017    334
1/2/2017    587
...
If the value of the third longest duration is the same for two or more different trips, I would like the trip with the lowest trip_id to be ranked 3rd.
I'm working in SQLite.
Any help would be appreciated.
Neither SQLite (before 3.25) nor MySQL (before 8.0) has a ROW_NUMBER function built in, so get ready for an ugly query. We can still work per date, but to find the third-longest duration we need a correlated subquery.
SELECT
    DATE(t1.start_date) AS trip_date,
    t1.duration
FROM trips t1
WHERE
    -- exactly 3 durations on the same date are as long or longer,
    -- so t1 holds the third-longest duration for that date
    (SELECT COUNT(*) FROM trips t2
     WHERE DATE(t2.start_date) = DATE(t1.start_date) AND
           t2.duration >= t1.duration) = 3;
Note that this approach breaks down if a given date can have more than one record with the same duration: the count can then skip 3 entirely, or match several rows, none of which is unambiguously the third highest. Since you want the lowest trip_id to rank first among ties, you can extend the subquery's comparison to break ties on trip_id, as sketched below.
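A minimal sketch of that tie-break, assuming the trips table has the trip_id column your question mentions: a row ranks third exactly when three rows on the same date sort at or above it by duration DESC, trip_id ASC (the row itself included).

SELECT
    DATE(t1.start_date) AS trip_date,
    t1.duration
FROM trips t1
WHERE
    -- rank = rows with a longer duration, plus rows with an equal
    -- duration and a trip_id no higher than this row's
    (SELECT COUNT(*) FROM trips t2
     WHERE DATE(t2.start_date) = DATE(t1.start_date) AND
           (t2.duration > t1.duration OR
            (t2.duration = t1.duration AND t2.trip_id <= t1.trip_id))) = 3;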
Demo here: Rextester
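As an aside, if your SQLite is 3.25 or newer, window functions (including ROW_NUMBER) are in fact available, and the query becomes much cleaner. A sketch under that assumption, again using trip_id to break ties:

SELECT trip_date, duration
FROM (SELECT DATE(start_date) AS trip_date,
             duration,
             ROW_NUMBER() OVER (PARTITION BY DATE(start_date)
                                ORDER BY duration DESC, trip_id ASC) AS rn
      FROM trips)
WHERE rn = 3;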
My task is to get total inbound leads for a group of customers, leads by month for the same group of customers, and the conversion rate of those leads.
The dataset I'm pulling from is 20 million records, so I can't query the whole thing. I have successfully done the first step (getting the total lead count for each org) with this:
inbound_leads <- domo_get_query('6d969e8b-fe3e-46ca-9ba2-21106452eee2',
                                auto_limit = TRUE,
                                query = "select org_id,
                                         COUNT(*)
                                         from table
                                         GROUP BY org_id
                                         ORDER BY org_id")
DOMO is the BI tool I'm pulling from, and domo_get_query is an internal function from a custom library my company built. It takes a query argument (a MySQL query) and various others which aren't important right now.
sample data looks like this:
org_id, inserted_at, lead_converted_at
1 10/17/2021 2021-01-27T03:39:03
2 10/18/2021 2021-01-28T03:39:03
1 10/17/2021 2021-01-28T03:39:03
3 10/19/2021 2021-01-29T03:39:03
2 10/18/2021 2021-01-29T03:39:03
I have looked through many online aggregation tutorials, but none of them cover how to get at data you need pre-aggregation (such as the number of leads per month per org) from a dataset that has to be aggregated just to be pulled at all. Once the aggregation has occurred that detail is gone; in the sample above, for example, aggregating removes the ability to see more than one row for org_id 1. Maybe I just don't understand this well enough to know the right questions to ask. Any direction appreciated.
If you're unable to fit your data in memory, you have a few options. You could process the data in batches (e.g. one year at a time) so that it fits in memory. You could use a package like chunked to help.
But in this case I would bet the easiest way to handle your problem is to solve it entirely in your SQL query. To get leads by month, you'll need to truncate your date column and group by org_id, month.
To get conversion rate for leads in those months, you could add a column (in addition to your count column) that is something like:
sum(case when lead_converted_at is not null then 1 else 0 end) as convert_count
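Putting both pieces together, a sketch of the whole query, assuming MySQL's DATE_FORMAT and the column names from your sample data (adjust the format string if inserted_at is stored as text rather than a date type):

select org_id,
       date_format(inserted_at, '%Y-%m') as lead_month,
       count(*) as lead_count,
       sum(case when lead_converted_at is not null then 1 else 0 end) as convert_count,
       sum(case when lead_converted_at is not null then 1 else 0 end) / count(*) as conversion_rate
from table
group by org_id, date_format(inserted_at, '%Y-%m')
order by org_id, lead_month;

Each org then comes back as one row per month, so the monthly detail is preserved while the 20 million raw rows never leave the database.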
I'm trying to find the average figure for the last 10 rows in a database table:
select avg(Reading)
from Readings
order by Rowid desc
limit 10;
This pulls the average of all entries in the table, not the last 10. I've tried all sorts of variations but can't get it to work.
Thanks for the super quick replies, I tried again and managed to type the correct syntax in the FROM clause this time. The trick is that ORDER BY and LIMIT have to run inside a subquery, so the ten most recent rows are picked first and only then averaged; in my original query the LIMIT was applied after AVG had already collapsed everything to one row.
Here is the correct answer:
select avg(Reading)
from (select Reading
      from Readings
      order by Rowid desc
      limit 10);
I have a lot of rows with dates in some SQLite database. I'd like to get some aggregated statistics for weeks or some other periods.
Is it possible to bin data in SQLite, i.e. to compute a number that identifies the period (e.g. the week) so rows can be grouped by it?
To create a column of unique week identifiers use strftime('%Y%W', date_column).
For your purposes I believe a "SELECT ... GROUP BY strftime('%Y%W', date_column)" should work.
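For example, a minimal sketch with a hypothetical table my_table(date_column, reading):

SELECT strftime('%Y%W', date_column) AS year_week,
       COUNT(*) AS row_count,
       AVG(reading) AS avg_reading
FROM my_table
GROUP BY strftime('%Y%W', date_column)
ORDER BY year_week;

strftime('%Y%W', ...) concatenates the year and the zero-padded week number (00-53), so weeks from different years stay distinct.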
I have a data set in BIRT Designer with two columns, one with day of week abbreviation names (Su, M, Tu, etc.) and the other with numerical representations of those days of the week starting at 0 and going to 6 (0, 1, 2, etc.). I want to determine what percentage of the total number of rows that each day of week represents. For example, if I have 100 total rows and 12 of those rows correspond to Su/0, 12% of the total rows are made up of Su.
I would like to perform this same calculation within BIRT and graph (bar graph) those percentages that each day consists of out of the total. I'm just learning how to use BIRT and assume that I need to do some scripting either when making my data set or when specifying the rows when making the chart. Any tips would be greatly appreciated.
Use computed columns.
Edit Data set > Computed Columns
The simplest way is to have one computed column per day of the week, where each column adds a count only when the row's day matches a specific value:
if (row["Day"] == "Su"){
1
}
I should add that you can use a 'Data' element in your table to compute the percentage. A 'Dynamic Text' item could also be used, but the data item gives you a bound value that you can make better use of later if needed.
Edit
To get a total row count, use a computed column (I name mine 'All').
For the Expression use the value "1"
With some inspiration from James Jenkins I think I found my answer. It was pretty simple in the end: all I needed to do was make a new computed column and, instead of adding an expression, set the Aggregation to "COUNT". That counts all of the rows in your table and puts that total on each row, so you can use the total in any calculation you need to do.
I'm in this situation.
I have a cube that has data aggregated by day.
This cube has a dimension with a hierarchy (year -> month -> day).
In this hierarchy, every day of the last 4 months is visible.
My problem is counting the selected days, because I need to do calculations based on the days observed (on a pivot).
Example:
There is data for the day 11/11/2013 (order amount), but in the hierarchy I select values from 10.11.2013 to 20.11.2013.
How do I calculate (order amount) / 11?
11 is the number of days from the 10th to the 20th.
See my answer at MDX average fact items count by time dimension.
Having the day count as a measure in the cube would automatically apply all filters etc. without you having to take care of it.
You can use the Count() function to calculate the number of members in a set. Your set can consist of a range of dates if you use a colon between start and end date. Something like this:
Count({[Date].[Day].&[2013]&[11]&[10] : [Date].[Day].&[2013]&[11]&[20]})