SQLite Ordering Whole Numbers - sqlite

I know that ORDER BY in SQLite sorts numbers in ascending order unless DESC is added at the end. But I noticed that it only seems to sort by the leading digits.
i.e
INT
14
78
357
2999
57
888
ORDER BY INT
Gives
14
2999
357
57
78
888
Is it possible to use ORDER BY so that the whole numbers come out in ascending numeric order?
Like this:
14
57
78
357
888
2999

select (INT * 1) as "int_number" from mytable order by 1
or as someone points out in the link:
select INT from mytable order by cast(INT as Integer)
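The ordering in the question is what you get when the values are stored as text and compared character by character. A minimal sketch with Python's built-in sqlite3 module, assuming the column was declared as TEXT (the question does not say so) and reusing the table and column names from the answer:

```python
import sqlite3

# Assumption: the column holds text, which is what makes "2999" sort before "357".
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable ("INT" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("14",), ("78",), ("357",), ("2999",), ("57",), ("888",)],
)


def column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Plain ORDER BY compares the stored text values.
text_order = column('SELECT "INT" FROM mytable ORDER BY "INT"')
# Casting in the ORDER BY clause sorts numerically but still returns the text.
cast_order = column('SELECT "INT" FROM mytable ORDER BY CAST("INT" AS INTEGER)')
# Multiplying by 1 converts to a number; ORDER BY 1 sorts by that first column.
times_one = column('SELECT ("INT" * 1) AS int_number FROM mytable ORDER BY 1')

print(text_order)
print(cast_order)
print(times_one)
```

Both variants sort numerically; the difference is that the CAST version returns the original text values while the multiplication version returns integers.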

Related

Percentage & GROUP BY

I'm currently working with a collisions dataset which provides all cases that occur in a given day. My first instinct was to get the totals for a given day, where the output looked something like:
collision_date  SUM(severe_injury_count)  SUM(injured_victims)
2001-02-20      19                        785
2001-02-20      12                        697
2001-02-20      28                        823
2001-02-20      29                        871
The above example is the output of the below query:
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims)
FROM collisions c
GROUP BY collision_date
LIMIT 50,100;
I wanted to calculate the percentage severe_injury_count/injured_victims and thought it would be straightforward. I ran the query below, which includes a few variations of the calculation that I added once I noticed it wasn't giving me what I intended:
SELECT
collision_date,
SUM(severe_injury_count/injured_victims) AS chance_being_sever_injured,
SUM(severe_injury_count),
SUM(injured_victims),
(severe_injury_count/injured_victims)*100,
(SUM(severe_injury_count)/SUM(injured_victims))*100
FROM collisions c
GROUP BY collision_date;
But the output does not do the calculation as I expected, giving me results like:
collision_date  chance_being_sever_injured  SUM(severe_injury_count)  SUM(injured_victims)  (severe_injury_count/injured_victims)*100  (SUM(severe_injury_count)/SUM(injured_victims))*100
2001-02-20      13                          19                        785                   NULL                                       0
2001-02-20      5                           12                        697                   NULL                                       0
2001-02-20      17                          28                        823                   0                                          0
2001-02-20      18                          29                        871                   NULL                                       0
I checked the column types and they are all integers, not strings, so I expected the actual percentages to be calculated.
Given the output, I believe I'm missing something fundamental about this type of operation.
I also tried using FORMAT(), but the output was all zeros as well...
FORMAT((SUM(severe_injury_count)/SUM(injured_victims))*100,2)
Any insight would be much appreciated.
Thank you for your time and feedback.
Implementing suggestions, hence extending initial post:
I tried the following as well:
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
FROM collisions c
GROUP BY collision_date
LIMIT 50,100;
I also tried to exclude possible NULLs with:
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
FROM collisions c WHERE severe_injury_count IS NOT NULL OR injured_victims IS NOT NULL
GROUP BY collision_date
LIMIT 50,100;
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
FROM collisions c WHERE severe_injury_count > 0 OR injured_victims > 0
GROUP BY collision_date
LIMIT 50,100;
All of the above alternatives give me 0 for the "percentage" column I'm trying to calculate.
I also attempted to coerce the column types, as suggested by @easleyfixed, like so:
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(CAST(severe_injury_count AS INT))/SUM(CAST(injured_victims AS INT)) AS DECIMAL)
FROM collisions c WHERE severe_injury_count > 0 OR injured_victims > 0
GROUP BY collision_date;
Expanding on @nnichols' and @easleyfixed's suggestions
To better illustrate the data, running:
SELECT collision_date,COUNT(*)
FROM collisions c
GROUP BY collision_date;
Gives me (represents the number of records for a given date):
collision_date  COUNT(*)
2001-01-01      1000
2001-01-02      1330
2001-01-03      1329
2001-01-04      1346
2001-01-05      1457
etc             etc
I therefore expanded the query to try and include what I'm trying to assess.
SELECT collision_date,COUNT(*),SUM(severe_injury_count),SUM(injured_victims),
SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
FROM collisions c
GROUP BY collision_date;
Outputs:
collision_date  COUNT(*)  SUM(severe_injury_count)  SUM(injured_victims)  SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
2001-01-01      1000      37                        676                   0
2001-01-02      1330      30                        797                   0
2001-01-03      1329      28                        793                   0
2001-01-04      1346      23                        758                   0
2001-01-05      1457      30                        836                   0
etc             etc       etc                       etc                   etc
I double-checked the column types: the count columns are INT, but collision_date is actually set as "TEXT".
For Sh*t and giggles I did:
SELECT CAST(collision_date AS DATE),COUNT(*),SUM(severe_injury_count),SUM(injured_victims),
SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
FROM collisions c
GROUP BY collision_date;
CAST(collision_date AS DATE)  COUNT(*)  SUM(severe_injury_count)  SUM(injured_victims)  SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
2,001                         1000      37                        676                   0
2,001                         1330      30                        797                   0
2,001                         1329      28                        793                   0
2,001                         1346      23                        758                   0
2,001                         1457      30                        836                   0
etc                           etc       etc                       etc                   etc
I also attempted to coerce NULLs into 0, as was also suggested:
SELECT collision_date ,COUNT(*),SUM(IFNULL(severe_injury_count,0)),SUM(IFNULL(injured_victims,0)),
SUM(IFNULL(severe_injury_count,0))/SUM(IFNULL(injured_victims,0)) AS chance_being_sever_injured
FROM collisions c
GROUP BY collision_date;
Outputs:
collision_date  COUNT(*)  SUM(severe_injury_count)  SUM(injured_victims)  SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
2001-01-01      1000      37                        676                   0
2001-01-02      1330      30                        797                   0
2001-01-03      1329      28                        793                   0
2001-01-04      1346      23                        758                   0
2001-01-05      1457      30                        836                   0
etc             etc       etc                       etc                   etc
I'm truly baffled...
MySQL and SQLite are definitely not the same thing! I have updated the tag on your question.
Integer division yields an integer result, truncated toward zero (see the docs).
You need to cast to REAL or FLOAT for the division to work on SQLite:
SELECT
collision_date,
SUM(severe_injury_count),
SUM(injured_victims),
ROUND(CAST(SUM(severe_injury_count) AS REAL) / CAST(SUM(injured_victims) AS REAL) * 100, 2)
FROM collisions
GROUP BY collision_date
The NULLs observed in one of your tests were the result of division by 0 (zero).
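To see the difference without touching the real data, here is a small sketch using Python's built-in sqlite3 module. The individual rows are made up for illustration (only the per-day sums 37/676 and 30/797 match the question), and the table is reduced to the three columns the query needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collisions ("
    "collision_date TEXT, severe_injury_count INTEGER, injured_victims INTEGER)"
)
# Hypothetical rows; they only reproduce the daily sums shown in the question.
conn.executemany(
    "INSERT INTO collisions VALUES (?, ?, ?)",
    [("2001-01-01", 30, 600), ("2001-01-01", 7, 76), ("2001-01-02", 30, 797)],
)

# Integer / integer: the quotient is truncated toward zero.
integer_sql = """
    SELECT collision_date, SUM(severe_injury_count) / SUM(injured_victims)
    FROM collisions GROUP BY collision_date ORDER BY collision_date
"""
# Casting to REAL before dividing gives a fractional result.
real_sql = """
    SELECT collision_date,
           ROUND(CAST(SUM(severe_injury_count) AS REAL)
                 / CAST(SUM(injured_victims) AS REAL) * 100, 2)
    FROM collisions GROUP BY collision_date ORDER BY collision_date
"""
integer_result = conn.execute(integer_sql).fetchall()
real_result = conn.execute(real_sql).fetchall()
# Division by zero does not raise an error in SQLite; it yields NULL.
division_by_zero = conn.execute("SELECT 1 / 0").fetchone()[0]

print(integer_result)
print(real_result)
print(division_by_zero)
```

Casting only one operand would be enough, since a REAL divided by an INTEGER is already a REAL; casting both simply mirrors the query above.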

How do I use loops for automating work in R?

I have a file with data on the delivery of products to a store. I need to calculate the total number of products in the store. I want to use a loop for this, but my loop only gives the total quantity of the last product. Why?
Here is the delivery data:
"Day" "Cott.cheese, pcs." "Kefir, pcs." "Sour cream, pcs."
1 104 117 119
2 94 114 114
3 105 107 117
4 99 112 120
5 86 104 111
6 88 110 126
7 95 106 129
I put this table in the in1 variable.
Here is the code:
s<-0
for (p in (2:ncol(in1))){
s<-sum(in1[,p]) }
s
I'm not sure I understand your question correctly, but if you only want to add up all values of your data.frame except for the first column (Day), you just need to do this:
sum(in1[,-1])
You are overwriting the s variable on each iteration, which is why it only shows the result for the last column. To collect one total per column, try
s<-c()
for (p in 2:ncol(in1)) {
s<-c(s,sum(in1[,p]))
}
alternatively
colSums(in1[,-1])
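The overwrite-versus-accumulate distinction is not specific to R. A minimal Python sketch of the same three variants, using the delivery numbers from the question (the dictionary layout and variable names are just for illustration):

```python
# Delivery data from the question, one list per product column (Day omitted).
deliveries = {
    "Cott.cheese, pcs.": [104, 94, 105, 99, 86, 88, 95],
    "Kefir, pcs.": [117, 114, 107, 112, 104, 110, 106],
    "Sour cream, pcs.": [119, 114, 117, 120, 111, 126, 129],
}

# The original loop: s is replaced on every pass, so only the last column survives.
last_only = 0
for column in deliveries.values():
    last_only = sum(column)

# Accumulating instead of overwriting gives the grand total.
total = 0
for column in deliveries.values():
    total += sum(column)

# Collecting one sum per column corresponds to colSums(in1[,-1]).
per_column = [sum(column) for column in deliveries.values()]

print(last_only, total, per_column)
```

In the R loop, the accumulating version would be s <- s + sum(in1[,p]).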

How to sum column based on value in another column in two dataframes?

I am trying to create a limit order book, and in one of the functions I want to return the sum of the 'size' column for both the ask dataframe and the bid dataframe in the book.
The output should be...
$ask
oid price size
8 a 105 100
7 o 104 292
6 r 102 194
5 k 99 71
4 q 98 166
3 m 98 88
2 j 97 132
1 n 96 375
$bid
oid price size
1 b 95 100
2 l 95 29
3 p 94 87
4 s 91 102
Total volume: 318 1418
Where the input is...
oid,side,price,size
a,S,105,100
b,B,95,100
I have a function book.total_volumes <- function(book, path) { ... } that should return total volumes.
I tried to use aggregate but struggled with the fact that the limit order book contains both an ask and a bid dataframe.
I appreciate any help, I am clearly a complete beginner. Only here to learn :)
If there is anything more I can add to make this question clearer, feel free to leave a comment!
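One way to think about it is to sum 'size' separately for each side. A rough Python sketch of that idea, not an R solution: the rows are taken from the expected output above, and it is an assumption that every ask has side S and every bid has side B, as in the two input rows shown. The function name total_volumes is hypothetical.

```python
import csv
import io

# Rows reconstructed from the expected $ask / $bid output in the question.
ORDERS_CSV = """oid,side,price,size
a,S,105,100
o,S,104,292
r,S,102,194
k,S,99,71
q,S,98,166
m,S,98,88
j,S,97,132
n,S,96,375
b,B,95,100
l,B,95,29
p,B,94,87
s,B,91,102
"""


def total_volumes(rows):
    """Sum the size column per side: B (bid) and S (ask)."""
    totals = {"B": 0, "S": 0}
    for row in rows:
        totals[row["side"]] += int(row["size"])
    return totals


rows = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
totals = total_volumes(rows)
print("Total volume:", totals["B"], totals["S"])
```

With the book already split into $ask and $bid dataframes in R, the same result should come from summing the size column of each dataframe on its own.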

Combining Two Rows with Different Levels according to Some Conditions into One in R

This is a part of my data: (The actual data contains about 10,000 observations with about 500 levels of SalesItem)
s1<-c('1008','1009','1012','1013','1016','1017','1018','1019','1054','1055')
s2<-c(155,153,154,150,176,165,159,143,179,150)
S<-data.frame(SalesItem=factor(s1), Sales=s2)
> str(S)
'data.frame': 10 obs. of 2 variables:
$ SalesItem: Factor w/ 10 levels "1008","1009",..: 1 2 3 4 5 6 7 8 9 10
$ Sales : num 155 153 154 150 176 165 159 143 179 150
What I want to do is merge SalesItem levels whenever diff(SalesItem) = 1. For example, the difference between SalesItem 1008 and 1009 is one, so I want to rename SalesItem 1009 to 1008. Later I can then compute the sum of Sales for these items as one. Because my actual data has about 10,000 observations, it is quite hard to do this one by one.
Is there a simple way to do that?
The fact that you have converted the first column to a factor suggests that you might need those factors somewhere. So I would suggest that, instead of changing any of the columns, you add a third column to your data frame that holds the SalesItem each row should be counted under. Here are the steps:
> s1<-c('1008','1009','1012','1013','1016','1017','1018','1019','1054','1055')
> s2<-c(155,153,154,150,176,165,159,143,179,150)
> s1 = as.integer(s1)
> s3 = ifelse((s1-1) %in% s1, s1-1, s1)
> S <- data.frame(SalesItem=s1, Sales=s2, ItemId=s3)
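Then you can just aggregate on the basis of the ItemId column. Note that this maps each item only to its immediate predecessor, so in a longer run such as 1016 to 1019 the items 1018 and 1019 get ItemId 1017 and 1018 rather than 1016.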
This is not a terribly efficient solution, but since your data only contains 10000 records, it is not going to be a big problem.
Set up provided example data, but convert the SalesItem field to an integer so that the diff() operation makes sense.
> s1<-c('1008','1009','1012','1013','1016','1017','1018','1019','1054','1055')
> s2<-c(155,153,154,150,176,165,159,143,179,150)
> s1 = as.integer(s1)
> S<-data.frame(SalesItem=s1, Sales=s2)
Reorder the data frame so that the SalesItem field is in ascending order (not necessary for the current data set, but required for the solution), then find the differences.
> S = S[order(S$SalesItem),]
> d = c(0, diff(S$SalesItem))
Duplicate the (sorted) SalesItem data, then overwrite each entry whose difference from the previous one is 1 with the previous label.
> labels = S$SalesItem
> for (n in 1:nrow(S)) {if (d[n] == 1) labels[n] = labels[n-1]}
> S$labels = labels
The (temporary) labels field now has the required new values for the SalesItem field. Once you are happy that this is doing the right thing, you can modify the last line in the code above to simply overwrite the existing SalesItem field.
> S
SalesItem Sales labels
1 1008 155 1008
2 1009 153 1008
3 1012 154 1012
4 1013 150 1012
5 1016 176 1016
6 1017 165 1016
7 1018 159 1016
8 1019 143 1016
9 1054 179 1054
10 1055 150 1054
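The labelling step boils down to "give every item the label of the first item in its run of consecutive numbers". A minimal Python sketch of that logic with the example data, including the per-group sums the question is ultimately after (the function name group_consecutive is made up for this illustration):

```python
def group_consecutive(items):
    """Map each item to the first item of its run of consecutive integers."""
    labels = {}
    previous = None
    for item in sorted(items):
        if previous is not None and item - previous == 1:
            # Same run: inherit the label of the preceding item.
            labels[item] = labels[previous]
        else:
            # Gap larger than 1: this item starts a new run.
            labels[item] = item
        previous = item
    return labels


sales_items = [1008, 1009, 1012, 1013, 1016, 1017, 1018, 1019, 1054, 1055]
sales = [155, 153, 154, 150, 176, 165, 159, 143, 179, 150]

labels = group_consecutive(sales_items)

# Sum Sales per merged label.
totals = {}
for item, amount in zip(sales_items, sales):
    totals[labels[item]] = totals.get(labels[item], 0) + amount

print(labels)
print(totals)
```

This makes a single pass over the sorted items, so it scales without trouble to the 10,000 observations mentioned in the question.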

Does Gray code exist for other bases than two?

Just as a matter of curiosity, is the Gray code defined for bases other than base two?
I tried to count in base 3, writing consecutive values while taking care to change only one trit at a time. I've been able to enumerate all the values up to 26 (3**3-1) and it seems to work.
000 122 200
001 121 201
002 120 202
012 110 212
011 111 211
010 112 210
020 102 220
021 101 221
022 100 222
The only issue I can see is that all three trits change when looping back to zero. But this is only true for odd bases. With even bases, looping back to zero would change only a single digit, as in binary.
I guess it can be extended to other bases, even decimal. This could lead to another ordering when counting in base ten ... :-)
0 1 2 3 4 5 6 7 8 9 19 18 17 16 15 14 13 12 11 10
20 21 22 23 24 25 26 27 28 29 39 38 37 36 35 34 33 32 31 30
Now the question: has anyone ever heard of it? Is there an application for it? Or is it just mathematical frenzy?
Yes. Have a look at the Gray code article on Wikipedia. It has a section on n-ary Gray code.
There are many specialized types of Gray codes other than the binary-reflected Gray code. One such type of Gray code is the n-ary Gray code, also known as a non-Boolean Gray code. As the name implies, this type of Gray code uses non-Boolean values in its encodings.
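The counting scheme from the question (run the lower digits forwards under an even leading digit and backwards under an odd one) can be written as a short recursive sketch. This is only an illustration of that reflected construction, not necessarily the algorithm given in the Wikipedia article, and the function names are made up:

```python
def reflected_gray(base, digits):
    """List all digit tuples so that neighbours differ in exactly one position."""
    if digits == 0:
        return [()]
    shorter = reflected_gray(base, digits - 1)
    codes = []
    for lead in range(base):
        # Walk the shorter sequence forwards for even leading digits and
        # backwards for odd ones, so only the leading digit changes at the seam.
        block = shorter if lead % 2 == 0 else shorter[::-1]
        codes.extend((lead,) + tail for tail in block)
    return codes


def digits_changed(a, b):
    """Count the positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


ternary = ["".join(map(str, code)) for code in reflected_gray(3, 3)]
decimal = ["".join(map(str, code)) for code in reflected_gray(10, 2)]

print(ternary[:9])
print(decimal[:20])
# Wrap-around: three digits change in base 3, only one in base 10.
print(digits_changed(ternary[-1], ternary[0]), digits_changed(decimal[-1], decimal[0]))
```

The first nine ternary codes and the first twenty decimal codes reproduce the sequences written out in the question, and the wrap-around check agrees with the observation about odd versus even bases.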
Just for completeness (as aioobe already gave the right answer), here's a C++ program that lists all the 168 2-digit Gray codes for base 3 that start with 00 and marks the 96 cyclic ones. Using the algorithm from Wikipedia, you can easily construct longer Gray codes for even bases. For odd bases, you can change the program to generate the corresponding Gray codes.
The first cyclic 2-digit gray code found with this program is this one:
00 01 02 12 10 11 21 22 20
After changing the program, the first cyclic 3-digit gray found is this:
000 001 002 012 010 011 021 020 022 122 102 100 101 111
110 112 212 202 222 220 120 121 221 201 211 210 200
Code:
#include <stdio.h>
#include <stdlib.h>

// Number of distinct two-trit codes (3 * 3)
#define MAXN 9

int gray_code_count, cyclic_count;

// True if exactly one of the two trits differs between the codes
bool changes_one_trit(int code1, int code2) {
    int trits_changed = 0;
    if ((code1 / 3) != (code2 / 3)) trits_changed++;
    if ((code1 % 3) != (code2 % 3)) trits_changed++;
    return (trits_changed == 1);
}

// Backtracking search: extend code[0..depth-1] with every unused value
// that differs from the previous one in exactly one trit
void generate_gray_code(int* code, int depth) {
    if (depth == MAXN) {
        // A complete sequence was found: print it
        for (int i = 0; i < MAXN; i++) {
            printf("%i%i ", code[i] / 3, code[i] % 3);
        }
        // check if cyclic
        if (changes_one_trit(code[MAXN - 1], 0)) {
            printf("cyclic");
            cyclic_count++;
        }
        printf("\n");
        gray_code_count++;
        return;
    }
    // Iterate through the codes that only change one trit
    for (int i = 0; i < MAXN; i++) {
        // Check if it was used already
        bool already_used = false;
        for (int j = 0; j < depth; j++) {
            if (code[j] == i) already_used = true;
        }
        if (already_used) continue;
        if (changes_one_trit(code[depth - 1], i)) {
            code[depth] = i;
            generate_gray_code(code, depth + 1);
        }
    }
}

int main() {
    int* code = (int*)malloc(MAXN * sizeof(int));
    code[0] = 0;
    gray_code_count = 0;
    cyclic_count = 0;
    generate_gray_code(code, 1);
    printf("%i gray codes found, %i of them are cyclic\n", gray_code_count, cyclic_count);
    free(code);
    return 0;
}
