Is there an sqlite function that can check if a field matches a certain value and return 0 or 1?

Consider the following sqlite3 table:
+------+------+
| col1 | col2 |
+------+------+
|    1 |  200 |
|    1 |  200 |
|    1 |  100 |
|    1 |  200 |
|    2 |  400 |
|    2 |  200 |
|    2 |  100 |
|    3 |  200 |
|    3 |  200 |
|    3 |  100 |
+------+------+
I'm trying to write a query that will select the entire table and return 1 if the value in col2 is 200, and 0 otherwise. For example:
+------+--------------------+
| col1 | SOMEFUNCTION(col2) |
+------+--------------------+
|    1 |                  1 |
|    1 |                  1 |
|    1 |                  0 |
|    1 |                  1 |
|    2 |                  0 |
|    2 |                  1 |
|    2 |                  0 |
|    3 |                  1 |
|    3 |                  1 |
|    3 |                  0 |
+------+--------------------+
What is SOMEFUNCTION()?
Thanks in advance...

In SQLite, boolean values are just integer values 0 and 1, so you can use the comparison directly:
SELECT col1, col2 = 200 AS SomeFunction FROM MyTable

As described in "Does sqlite support any kind of IF(condition) statement in a select", you can use the CASE keyword:
SELECT col1,CASE WHEN col2=200 THEN 1 ELSE 0 END AS col2 FROM table1
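
The two answers above can be tried side by side from Python's built-in sqlite3 module. This is only a minimal sketch: the in-memory database, the sample rows, and the aliases by_comparison and by_case are assumptions made for illustration. One difference worth seeing is the NULL case: the bare comparison yields NULL when col2 is NULL, while the CASE form with ELSE 0 yields 0.

```python
import sqlite3

# In-memory database with a few illustrative rows (the last one has a NULL col2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 200), (1, 100), (2, 400), (3, None)],
)

rows = conn.execute(
    """
    SELECT col1,
           col2 = 200 AS by_comparison,                       -- 1, 0, or NULL
           CASE WHEN col2 = 200 THEN 1 ELSE 0 END AS by_case  -- always 1 or 0
    FROM MyTable
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
```

If col2 can never be NULL, both forms give the same 0/1 column.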

Related

Sqlite count occurrence per year

So let's say I have a table in my Sqlite database with some information about some files, with the following structure:
| id | file format | creation date |
----------------------------------------------------------
| 1 | Word | 2010:02:12 13:31:33+01:00 |
| 2 | PSD | 2021:02:23 15:44:51+01:00 |
| 3 | Word | 2019:02:13 14:18:11+01:00 |
| 4 | Word | 2010:02:12 13:31:20+01:00 |
| 5 | Word | 2003:05:25 18:55:10+02:00 |
| 6 | PSD | 2014:07:20 20:55:58+02:00 |
| 7 | Word | 2014:07:20 21:09:24+02:00 |
| 8 | TIFF | 2011:03:30 11:56:56+02:00 |
| 9 | PSD | 2015:07:15 14:34:36+02:00 |
| 10 | PSD | 2009:08:29 11:25:57+02:00 |
| 11 | Word | 2003:05:25 20:06:18+02:00 |
I would like results that show me a chronology of how many of each file format were created in a given year – something along the lines of this:
|Format| 2003 | 2009 | 2010 | 2011 | 2014 | 2015 | 2019 | 2021 |
----------------------------------------------------------------
| Word | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 |
| PSD | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| TIFF | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
I've gotten kinda close (I think) with this, but am stuck:
SELECT
file_format,
COUNT(CASE file_format WHEN creation_date LIKE '%2010%' THEN 1 ELSE 0 END),
COUNT(CASE file_format WHEN creation_date LIKE '%2011%' THEN 1 ELSE 0 END),
COUNT(CASE file_format WHEN creation_date LIKE '%2012%' THEN 1 ELSE 0 END)
FROM
fileinfo
GROUP BY
file_format;
When I do this I am getting unique amounts for each file format, but the same count for every year…
|Format| 2010 | 2011 | 2012 |
-----------------------------
| Word | 4 | 4 | 4 |
| PSD | 1 | 1 | 1 |
| TIFF | 6 | 6 | 6 |
Why am I getting that incorrect tally, and moreover, is there a smarter way of querying that doesn't rely on the year being statically searched for as a string for every single year? If it helps, the column headers and row headers could be switched – doesn't matter to me. Please help a n00b :(

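COUNT() counts every non-NULL value, so the 0 from the ELSE branch is counted as well, which is why every year shows the same total per format. Use the SUM() aggregate function for conditional aggregation instead: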
SELECT file_format,
SUM(creation_date LIKE '2010%') AS `2010`,
SUM(creation_date LIKE '2011%') AS `2011`,
..........................................
FROM fileinfo
GROUP BY file_format;
See the demo.
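
A rough way to check this is to run it through Python's sqlite3 module with the rows from the question. The sketch below is an assumption-laden illustration, not the linked demo: the column names file_format and creation_date follow the query in the question, and the aliases y2010, counted, and yr are made up here. The second query is one possible answer to the "no hard-coded years" part: it groups by the first four characters of the date and returns one row per format and year instead of one column per year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fileinfo (id INTEGER, file_format TEXT, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO fileinfo VALUES (?, ?, ?)",
    [
        (1, "Word", "2010:02:12 13:31:33+01:00"),
        (2, "PSD", "2021:02:23 15:44:51+01:00"),
        (3, "Word", "2019:02:13 14:18:11+01:00"),
        (4, "Word", "2010:02:12 13:31:20+01:00"),
        (5, "Word", "2003:05:25 18:55:10+02:00"),
        (6, "PSD", "2014:07:20 20:55:58+02:00"),
        (7, "Word", "2014:07:20 21:09:24+02:00"),
        (8, "TIFF", "2011:03:30 11:56:56+02:00"),
        (9, "PSD", "2015:07:15 14:34:36+02:00"),
        (10, "PSD", "2009:08:29 11:25:57+02:00"),
        (11, "Word", "2003:05:25 20:06:18+02:00"),
    ],
)

# SUM adds up the 0/1 results of the comparison; COUNT counts every
# non-NULL value (0 included), so it just returns the group size.
wide = conn.execute(
    """
    SELECT file_format,
           SUM(creation_date LIKE '2010%') AS y2010,
           COUNT(CASE WHEN creation_date LIKE '2010%' THEN 1 ELSE 0 END) AS counted
    FROM fileinfo
    GROUP BY file_format
    ORDER BY file_format
    """
).fetchall()

# One row per (format, year), with no year written into the query.
per_year = conn.execute(
    """
    SELECT file_format, substr(creation_date, 1, 4) AS yr, COUNT(*) AS n
    FROM fileinfo
    GROUP BY file_format, substr(creation_date, 1, 4)
    ORDER BY file_format, yr
    """
).fetchall()

print(wide)
print(per_year)
```

The long format can then be pivoted into year columns by whatever reads the result, since plain SQL cannot create a variable number of columns.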

How do you assign groups to larger groups in dplyr

I would like to assign groups to larger groups (clusters) so that each cluster can be sent to a core for processing. I have 16 cores. This is what I have so far:
test<-data_extract%>%group_by(group_id)%>%sample_n(16,replace = TRUE)
This takes samples of 16 from each group.
This is an example of what I would like the final product to look like (with two clusters). All I really want is for every row with the same group id to belong to the same cluster, with a set number of clusters.
________________________________
balance | group_id | cluster|
454452 | a | 1 |
5450441 | a | 1 |
5444531 | b | 1 |
5404051 | b | 1 |
5404501 | b | 1 |
5404041 | b | 1 |
544251 | b | 1 |
254252 | b | 1 |
541254 | c | 2 |
54123254 | d | 1 |
542541 | d | 1 |
5442341 | e | 2 |
541 | f | 1 |
________________________________

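# sample() runs once per group, so every row of a group gets the same random cluster number (1 to 16)
test<-data%>%group_by(group_id)%>% mutate(group = sample(1:16,1))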
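
For readers who want to see the idea outside R, here is a small Python sketch of the same approach: draw one random cluster number per group id and reuse it for every row of that group. The row dictionaries, the fixed seed, and the names n_clusters and cluster_of are assumptions for illustration only. As in the R line, the draw is random, so cluster sizes are not balanced and some clusters may stay empty.

```python
import random

# A few rows shaped like the example table in the question.
rows = [
    {"balance": 454452, "group_id": "a"},
    {"balance": 5450441, "group_id": "a"},
    {"balance": 5444531, "group_id": "b"},
    {"balance": 5404051, "group_id": "b"},
    {"balance": 541254, "group_id": "c"},
    {"balance": 54123254, "group_id": "d"},
    {"balance": 5442341, "group_id": "e"},
    {"balance": 541, "group_id": "f"},
]

n_clusters = 16          # one cluster per core
rng = random.Random(0)   # fixed seed only so the sketch is repeatable

cluster_of = {}          # group_id -> cluster number, drawn once per group
for row in rows:
    gid = row["group_id"]
    if gid not in cluster_of:
        cluster_of[gid] = rng.randint(1, n_clusters)
    row["cluster"] = cluster_of[gid]

for row in rows:
    print(row)
```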

How to remove empty cells and reduce columns

I have a table, that looks roughly like this:
| variable | observer1 | observer2 | observer3 | final |
| -------- | --------- | --------- | --------- | ----- |
| case1 | | | | |
| var1 | 1 | 1 | | |
| var2 | 3 | 3 | | |
| var3 | 4 | 5 | | 5 |
| case2 | | | | |
| var1 | 2 | | 2 | |
| var2 | 5 | | 5 | |
| var3 | 1 | | 1 | |
| case3 | | | | |
| var1 | | 2 | 3 | 2 |
| var2 | | 2 | 2 | |
| var3 | | 1 | 1 | |
| case4 | | | | |
| var1 | 1 | | 1 | |
| var2 | 5 | | 5 | |
| var3 | 3 | | 3 | |
There are three columns for the observers, but only two are filled in any given row.
First I want to compute the IRR (inter-rater reliability), so I need a table that has two columns without the empty cells, like this:
| variable | observer1 | observer2 |
| -------- | --------- | --------- |
| case1 | | |
| var1 | 1 | 1 |
| var2 | 3 | 3 |
| var3 | 4 | 5 |
| case2 | | |
| var1 | 2 | 2 |
| var2 | 5 | 5 |
| var3 | 1 | 1 |
| case3 | | |
| var1 | 2 | 3 |
| var2 | 2 | 2 |
| var3 | 1 | 1 |
| case4 | | |
| var1 | 1 | 1 |
| var2 | 5 | 5 |
| var3 | 3 | 3 |
I try to use the tidyverse packages, but I'm not sure. Some 'ifelse()' magic may be easier.
Is there a clean and easy method to do something like this? Can anybody point me to the right function to use? Or just to a keyword to search for on stackoverflow? I found a lot of methods to remove whole empty columns or rows.
Edit: I removed the link to the original data. It was unnecessary. Thanks to Lamia for his working answer.

Out of your 3 columns observer1, observer2 and observer3, you sometimes have 2 non-NA values, 1 non-NA value, or 3 NA values.
If you want to merge your 3 columns, you could do:
res = data.frame(df$coding,t(apply(df[paste0("observer",1:3)],1,function(x) x[!is.na(x)][1:2])))
The apply function will return for each row the 2 non-NA values if there are 2, one non-NA value and one NA if there is only one value, and two NAs if there is no data in the row.
We then put this result in a dataframe with the first column (coding).
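
The row-wise logic described above (keep the first two non-missing values, pad with missing values if there are fewer) can be sketched in plain Python as well. This is an illustration only: None stands in for NA, and the sample rows and the helper name first_two_present are assumptions, not part of the original data.

```python
# Each row: (variable, observer1, observer2, observer3); None marks an empty cell.
rows = [
    ("case1", None, None, None),
    ("var1", 1, 1, None),
    ("var3", 4, 5, None),
    ("var1", 2, None, 2),
    ("var1", None, 2, 3),
]


def first_two_present(values):
    """Return the first two non-missing values, padded with None if needed."""
    present = [v for v in values if v is not None]
    padded = present + [None, None]
    return tuple(padded[:2])


compact = [(name, *first_two_present(obs)) for name, *obs in rows]

for row in compact:
    print(row)
```

Header rows such as case1 come out with two empty cells, which matches the layout requested in the question.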

How to subset a dataframe using a column from another dataframe in r?

I have 2 dataframes
Dataframe1:
| Cue | Ass_word | Condition | Freq | Cue_Ass_word |
1 | ACCENDERE | ACCENDINO | A | 1 | ACCENDERE_ACCENDINO
2 | ACCENDERE | ALLETTARE | A | 0 | ACCENDERE_ALLETTARE
3 | ACCENDERE | APRIRE | A | 1 | ACCENDERE_APRIRE
4 | ACCENDERE | ASCENDERE | A | 1 | ACCENDERE_ASCENDERE
5 | ACCENDERE | ATTIVARE | A | 0 | ACCENDERE_ATTIVARE
6 | ACCENDERE | AUTO | A | 0 | ACCENDERE_AUTO
7 | ACCENDERE | ACCENDINO | B | 2 | ACCENDERE_ACCENDINO
8 | ACCENDERE| ALLETTARE | B | 3 | ACCENDERE_ALLETTARE
9 | ACCENDERE| ACCENDINO | C | 2 | ACCENDERE_ACCENDINO
10 | ACCENDERE| ALLETTARE | C | 0 | ACCENDERE_ALLETTARE
Dataframe2:
| Group.1 | x
1 | ACCENDERE_ACCENDINO | 5
13 | ACCENDERE_FUOCO | 22
16 | ACCENDERE_LUCE | 10
24 | ACCENDERE_SIGARETTA | 6
....
I want to exclude from Dataframe1 all the rows that contain words (Cue_Ass_word) that are not reported in the column Group.1 in Dataframe2.
In other words, how can I subset Dataframe1 using the strings reported in Dataframe2$Group.1?

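It's not quite clear what you mean, but is this what you need? It keeps only the rows whose Cue_Ass_word appears in Dataframe2$Group.1 (put ! in front of the condition to get the opposite set):
Dataframe1[Dataframe1$Cue_Ass_word %in% Dataframe2$Group.1,]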
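
The underlying operation is a membership filter: a row is kept or dropped depending on whether its key appears in the other table's column. A plain-Python sketch, with a handful of rows copied from the question and the variable names allowed, kept, and dropped invented for illustration, might look like this:

```python
# A few rows from Dataframe1, reduced to the columns that matter here.
dataframe1 = [
    {"Condition": "A", "Freq": 1, "Cue_Ass_word": "ACCENDERE_ACCENDINO"},
    {"Condition": "A", "Freq": 0, "Cue_Ass_word": "ACCENDERE_ALLETTARE"},
    {"Condition": "A", "Freq": 1, "Cue_Ass_word": "ACCENDERE_APRIRE"},
    {"Condition": "B", "Freq": 2, "Cue_Ass_word": "ACCENDERE_ACCENDINO"},
    {"Condition": "C", "Freq": 2, "Cue_Ass_word": "ACCENDERE_ACCENDINO"},
]

# The Group.1 column of Dataframe2, as a set for fast lookups.
allowed = {
    "ACCENDERE_ACCENDINO",
    "ACCENDERE_FUOCO",
    "ACCENDERE_LUCE",
    "ACCENDERE_SIGARETTA",
}

# Rows whose key is listed in Dataframe2 ...
kept = [row for row in dataframe1 if row["Cue_Ass_word"] in allowed]
# ... and the complement, i.e. the rows that would be excluded.
dropped = [row for row in dataframe1 if row["Cue_Ass_word"] not in allowed]

print(len(kept), len(dropped))
```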

Query performance - 'Left join is null' vs 'Not exists select'

I have a question about a query that I want to execute, but I don't know which version is best in terms of performance. I need to get all the words, excluding the words that have a related row in the table wordfilter.
The output of both queries is correct, but maybe there is a better solution. I have almost no knowledge of query plans; I'm trying to understand them now.
SELECT CONCAT(SPACE(1), UCASE(stocknews.word.word), SPACE(1)) AS word, stocknews.word.language
FROM stocknews.word
WHERE NOT EXISTS (SELECT word_id FROM stocknews.wordfilter WHERE stocknews.word.id = word_id)
AND user_id = 1
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type  | table      | type  | possible_keys | key     | key_len | ref   | rows | extra       |
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY      | word       | ref   | user_id       | user_id | 4       | const |  843 | Using where |
|  2 | MATERIALIZED | wordfilter | index | PRIMARY       | PRIMARY | 756     |       |   16 | Using index |
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
Against
SELECT CONCAT(SPACE(1), UCASE(stocknews.word.word), SPACE(1)) AS word, stocknews.word.language
FROM stocknews.word
LEFT JOIN stocknews.wordfilter ON stocknews.word.id = stocknews.wordfilter.word_id
WHERE stocknews.wordfilter.word_id IS NULL AND user_id = 1
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
| id | select_type | table      | type | possible_keys | key     | key_len | ref     | rows | extra                                |
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
|  1 | SIMPLE      | word       | ref  | user_id       | user_id | 4       | const   |  843 |                                      |
|  1 | SIMPLE      | wordfilter | ref  | PRIMARY       | PRIMARY | 4       | word.id |    1 | Using where; Using index; Not exists |
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
Any help is welcome! An explanation would be nice.
Edit:
For query 1:
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 0 |
| Handler_icp_attempts | 0 |
| Handler_icp_match | 0 |
| Handler_mrr_init | 0 |
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 1044 |
| Handler_read_last | 0 |
| Handler_read_next | 859 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_tmp_update | 0 |
| Handler_tmp_write | 215 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
25 rows in set (0.00 sec)
For query 2:
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 0 |
| Handler_icp_attempts | 0 |
| Handler_icp_match | 0 |
| Handler_mrr_init | 0 |
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 844 |
| Handler_read_last | 0 |
| Handler_read_next | 843 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_tmp_update | 0 |
| Handler_tmp_write | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+

It seems to be a close race between the two formulations. (Some other example may show a clearer winner.)
From the Handler values: Query 1 did more Handler_read_key calls (1044 vs. 844) and some temp-table writing (Handler_tmp_write of 215, which goes along with MATERIALIZED). The other numbers were about the same. So I conclude that Query 1 is slower, although possibly not by enough to make much difference.
I vote for LEFT JOIN as the better query pattern (in this case).
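
Before comparing speed, it can help to confirm that the two patterns really return the same rows. The sketch below does that with Python's sqlite3 module, which is only a stand-in for the MySQL server in the question: the table layout, the sample words, and the simplified SELECT list are all assumptions, and nothing about timing or query plans carries over from SQLite to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT, language TEXT, user_id INTEGER);
    CREATE TABLE wordfilter (word_id INTEGER PRIMARY KEY);
    INSERT INTO word VALUES (1, 'apple', 'en', 1);
    INSERT INTO word VALUES (2, 'beurs', 'nl', 1);
    INSERT INTO word VALUES (3, 'stock', 'en', 1);
    INSERT INTO word VALUES (4, 'koers', 'nl', 2);
    INSERT INTO wordfilter VALUES (2);
    """
)

# Pattern 1: NOT EXISTS with a correlated subquery.
not_exists = conn.execute(
    """
    SELECT word.id, word.word
    FROM word
    WHERE NOT EXISTS (SELECT 1 FROM wordfilter WHERE wordfilter.word_id = word.id)
      AND word.user_id = 1
    ORDER BY word.id
    """
).fetchall()

# Pattern 2: LEFT JOIN and keep only the rows that found no match.
left_join = conn.execute(
    """
    SELECT word.id, word.word
    FROM word
    LEFT JOIN wordfilter ON wordfilter.word_id = word.id
    WHERE wordfilter.word_id IS NULL
      AND word.user_id = 1
    ORDER BY word.id
    """
).fetchall()

print(not_exists)
print(left_join)
```

To judge performance on the real server, the EXPLAIN output and Handler counters shown in the question remain the thing to look at.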
