I have a SQLite database where entries are sorted like this:
ID | length | breadth | height | time
1 | 10 | 20 | 30 | 123
1 | 10 | 20 | 15 | 432
2 | 4 | 2 | 7 | 543
2 | 4 | 2 | 8 | 234
As you can see, the height column can vary over time. I want to get the entry with the largest height for every unique ID in my database. Is there some way to do this in a single query, instead of looping through all IDs with something like this:
for x in ids:
SELECT length, breadth, height FROM table WHERE id = x ORDER BY height DESC LIMIT 1
Use GROUP BY:
SELECT ID, MAX(height) FROM table GROUP BY ID
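Note that the query above returns only the ID and the maximum height, while the question also asks for length and breadth. The following is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical table name `boxes` (the placeholder name `table` is a reserved word in SQLite) and the sample rows from the question. It relies on SQLite-specific documented behaviour: when MAX() is the only aggregate in the query, the bare columns take their values from the row that holds the maximum. Other databases generally need a join or a window function for this.

```python
import sqlite3

# Hypothetical table name "boxes"; "table" itself is a reserved word in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE boxes (id INTEGER, length INTEGER, breadth INTEGER, "
    "height INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO boxes VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 20, 30, 123), (1, 10, 20, 15, 432), (2, 4, 2, 7, 543), (2, 4, 2, 8, 234)],
)

# With a single MAX() aggregate, SQLite fills the bare columns
# (length, breadth, time) from the row holding the maximum height per group.
tallest = conn.execute(
    "SELECT id, length, breadth, MAX(height), time FROM boxes GROUP BY id ORDER BY id"
).fetchall()
print(tallest)
```

Each tuple in `tallest` is the full row with the largest height for one ID, so no per-ID loop is needed.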
I have a df that represents users' browsing behavior over time. The df contains a unique UserId, and each row has a timestamp and represents a visit to a certain website. Each website has a unique website ID and a unique website category, say c("electronics", "clothes", ...).
Now I want to count per row how many unique websites per category the user has visited up to that row (including that row). I call this variable "breadth" since it represents how broad a user is browsing through the internet.
So far I have only managed to produce dumb code that computes the total number of unique websites visited per category: I filter on each category, take the length of the unique vector per user, and then do a left join.
As a result, I lose the information about the development over time.
Thanks so much in advance!
total_breadth <- df %>% filter(category=="electronics") %>%
group_by(user_id) %>%
mutate(breadth=length(unique(website_id)))
#Structure of the df I want to achieve:
user_id time website_id category breadth
1 1 70 "electronics" 1
1 2 93 "clothing" 1
1 3 34 "electronics" 2
1 4 93 "clothing" 1
1 5 26 "electronics" 3
1 6 70 "electronics" 3
#Structure of the df I produce:
user_id time website_id category breadth
1 1 70 "electronics" 3
1 2 93 "clothing" 1
1 3 34 "electronics" 3
1 4 93 "clothing" 1
1 5 26 "electronics" 3
1 6 70 "electronics" 3
This seems to be a case of a split, apply and combine.
Create a binary matrix of 1s and 0s whose dimensions are:
No. of rows = No. of rows in the original data
No. of columns = No. of unique website categories
Each row represents a timestamp and each column represents a website category. A cell is equal to 1 if and only if the user visited a website in that category at the respective timestamp; otherwise it is 0. To count unique websites rather than visits, set the cell to 1 only the first time the user visits a given website.
Take the cumulative sum of each column of this matrix, then create a final column that takes the value for the visited website category at the respective timestamp.
Though this is not an elegant solution, I hope it solves your problem for now.
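The same running count can be sketched without a matrix. The following is a minimal Python illustration, not the matrix approach described above: it keeps, for each (user, category) pair, the set of websites seen so far and records the size of that set at every row. The function name `running_breadth` and the tuple layout are assumptions made for this example; the data is the sample from the question.

```python
from collections import defaultdict

# Sample rows from the question: (user_id, time, website_id, category).
visits = [
    (1, 1, 70, "electronics"),
    (1, 2, 93, "clothing"),
    (1, 3, 34, "electronics"),
    (1, 4, 93, "clothing"),
    (1, 5, 26, "electronics"),
    (1, 6, 70, "electronics"),
]


def running_breadth(rows):
    """Append to each row the number of unique websites the user has
    visited in that row's category up to and including that row."""
    seen = defaultdict(set)  # (user_id, category) -> website_ids seen so far
    result = []
    # Process each user's visits in time order.
    for user_id, time, website_id, category in sorted(rows, key=lambda r: (r[0], r[1])):
        key = (user_id, category)
        seen[key].add(website_id)  # a repeat visit does not grow the set
        result.append((user_id, time, website_id, category, len(seen[key])))
    return result


breadth_rows = running_breadth(visits)
for row in breadth_rows:
    print(row)
```

Because a set ignores repeats, the revisit of website 70 at time 6 leaves the electronics breadth unchanged, which is the behaviour the desired output asks for.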
I'm using sqlite3 with my Node.js API.
I have a DB table structured as below:
id | colour
___|_______
1 | blue
1 | red
1 | green
2 | yellow
2 | green
5 | red
I want to return a count of the IDs in my table such that:
1 - 3 occurrences
2 - 2 occurrences
5 - 1 occurrence
Is there a SQL construct I can use to count like this, or will this need to be done within the JS itself?
Any help here would be awesome!
You can use COUNT with GROUP BY
select id, COUNT(id) from tbl GROUP BY id
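As a quick check of the query above, here is a minimal sketch that runs it against the sample rows in an in-memory SQLite database. It uses Python's built-in sqlite3 module purely for illustration (the question uses Node.js, where the same SQL string could be passed to the driver); the alias `occurrences` and the ORDER BY are additions for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, colour TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "blue"), (1, "red"), (1, "green"), (2, "yellow"), (2, "green"), (5, "red")],
)

# One output row per distinct id, with the number of rows sharing that id.
counts = conn.execute(
    "SELECT id, COUNT(id) AS occurrences FROM tbl GROUP BY id ORDER BY id"
).fetchall()
print(counts)
```

The counting happens entirely inside the database, so no iteration over the rows is needed in application code.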
I may be missing some elegant ways in Stata to get to this example, which has to do with electrical parts and observed monthly failures etc.
clear
input str3 (PartID Type FailType)
ABD A 4
BBB S 0
ABD A 3
ABD A 4
ABC A 2
BBB A 0
ABD B 1
ABC B 7
BBB C 1
BBB D 0
end
I would like to group by (bysort) each PartID and record the highest frequency for FailType within each PartID type. Ties can be broken arbitrarily, and preferably, the lower one can be picked.
I looked at groups etc., but do not know how to peel off certain elements from the result set. So that is a major question for me. If you execute a query, how do you select only the elements you want for the next computation? Something like n(0) is the count, n(1) is the mean etc. I was able to use contract, bysort etc. and create a separate data set, which I then merged back into the main set with an extra column. There must be something simple using gen or egen so that there is no need to create an extra data set.
The expected results here will be:
PartID Freq
ABD 4 #(4 occurs twice)
ABC 2 #(tie broken with minimum)
BBB 0 #(0 occurs 3 times)
Please let me know how I can pick off specific elements that I need from a result set (can be from duplicate reports, tab etc.)
Part II - Clarification: Perhaps I should have clarified and split the question into two parts. For example, suppose I issue this follow-up command after running your code: tabdisp Type, c(Freq). It may print out a nice table. Can I then use that (derived) table to perform more computations programmatically?
For example get the first row of the table.
----------------------
Type      |       Freq
----------+-----------
A         |         -1
B         |         -1
C         |         -1
D         |         -3
S         |         -3
----------------------
I found this difficult to follow (see comment on question), but some technique is demonstrated here. The numbers of observations in subsets of observations defined by by: are given by _N. The rest is sorting tricks. Negating the frequency is a way to select the highest frequency and the lowest Type which I think is what you are after when splitting ties. Negating back gets you the positive frequencies.
clear
input str3 (PartID Type FailType)
ABD A 4
BBB S 0
ABD A 3
ABD A 4
ABC A 2
BBB A 0
ABD B 1
ABC B 7
BBB C 1
BBB D 0
end
bysort PartID FailType: gen Freq = -_N
bysort PartID (Freq Type) : gen ToShow = _n == 1
replace Freq = -Freq
list PartID Type FailType Freq if ToShow
+---------------------------------+
| PartID Type FailType Freq |
|---------------------------------|
1. | ABC A 2 1 |
3. | ABD A 4 2 |
7. | BBB A 0 3 |
+---------------------------------+
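For comparison, the same "most frequent value per group" logic can be sketched in Python. This is only an illustration of the idea, not a Stata technique. It assumes FailType is treated as a number (the Stata input declares it as a string), and, unlike the Stata code above, which breaks ties on Type, it breaks ties on the smaller FailType, as the question preferred.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (PartID, Type, FailType).
rows = [
    ("ABD", "A", 4), ("BBB", "S", 0), ("ABD", "A", 3), ("ABD", "A", 4),
    ("ABC", "A", 2), ("BBB", "A", 0), ("ABD", "B", 1), ("ABC", "B", 7),
    ("BBB", "C", 1), ("BBB", "D", 0),
]

# Count how often each FailType occurs within each PartID.
fail_counts = defaultdict(Counter)
for part_id, _type, fail_type in rows:
    fail_counts[part_id][fail_type] += 1

# For each PartID keep (FailType, frequency): highest frequency first,
# smaller FailType on ties. Negating the count mirrors the sorting trick above.
modes = {
    part_id: min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    for part_id, counter in fail_counts.items()
}
print(modes)
```

The result is an ordinary dictionary, so individual elements can be picked off directly, for example `modes["ABD"][0]` for the most frequent FailType of part ABD.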
I have a set of data in the following format:
Items Shipped | Month
A 1
B 1
C 1
D 2
E 2
F 3
G 3
H 3
I would like to show the count of items shipped each month using a calculated field in Tableau.
Item_Count | Month
3 1
2 2
3 3
Any Suggestions?
You should probably have a look at the basic tutorials on the Tableau training page:
https://www.tableau.com/learn/training
Drag the [month] pill to Rows (if it's an actual date, change it to discrete month; otherwise leave it as it is).
Drag [item_count] to Columns, click on it, and change it to COUNT or COUNTD, depending on whether you want the total count or only the distinct elements.
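To make the COUNT versus COUNTD distinction concrete, here is a small Python sketch of what the two aggregations compute per month on the sample data. It is only an illustration of the arithmetic, not Tableau syntax; the variable names are made up for this example.

```python
from collections import defaultdict

# Sample rows from the question: (item, month).
shipments = [
    ("A", 1), ("B", 1), ("C", 1),
    ("D", 2), ("E", 2),
    ("F", 3), ("G", 3), ("H", 3),
]

# Group the items by month.
items_by_month = defaultdict(list)
for item, month in shipments:
    items_by_month[month].append(item)

# COUNT-like: every row counts. COUNTD-like: each distinct item counts once.
total_count = {month: len(items) for month, items in items_by_month.items()}
distinct_count = {month: len(set(items)) for month, items in items_by_month.items()}
print(total_count)
print(distinct_count)
```

With this data the two results are identical because no item repeats within a month; they would differ only if the same item were shipped more than once in a month.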
I want to query a table named Lines like this:
ID Part Count
--- --------- ------------
1 5 234
2 5 846
3 5 234
4 6 585
5 6 585
6 7 465
and return rows like the following:
ID Part Count
--- --------- ------------
1 5 1314
4 6 1170
6 7 465
What I want is to merge the Count column values where the Part column matches and return the other rows as is. I know little about databases and have tried many queries, but I am not able to achieve the result I want.
select min(ID) as ID, Part, sum(Count) as Count from Lines
group by Part
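To reproduce the expected output exactly, the aggregate has to sum the Count column (summing Part would add up the part numbers instead), and MIN(ID) keeps the first ID of each group. The following is a minimal sketch that checks this against the sample rows, using Python's built-in sqlite3 module as an assumed engine since the question does not name a database. The column name Count is double-quoted here as a precaution, because it is also the name of a SQL function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Lines (ID INTEGER, Part INTEGER, "Count" INTEGER)')
conn.executemany(
    "INSERT INTO Lines VALUES (?, ?, ?)",
    [(1, 5, 234), (2, 5, 846), (3, 5, 234), (4, 6, 585), (5, 6, 585), (6, 7, 465)],
)

# One row per Part: the smallest ID in the group and the summed Count.
merged = conn.execute(
    'SELECT MIN(ID) AS ID, Part, SUM("Count") AS "Count" '
    "FROM Lines GROUP BY Part ORDER BY Part"
).fetchall()
print(merged)
```

Parts that occur only once, such as Part 7, pass through unchanged because the sum and minimum of a single row are the row's own values.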