Select values from one table, count common values from another table, show 0 if no common values - SQLite

I have two tables that look roughly like this:
WORKDAYS
DATE       | WORKDAY_LENGHT |
-----------+----------------+
12-05-2018 | 8              |
13-05-2018 | 6.5            |
14-05-2018 | 7.5            |
15-05-2018 | 8              |
ACCIDENTS
TOD              | SEVERITY |
-----------------+----------+
12-05-2018 12:00 | minor    |
12-05-2018 15:00 | minor    |
13-05-2018 08:00 | severe   |
13-05-2018 12:00 | severe   |
14-05-2018 10:30 | severe   |
And I need a result like this:
WORKDAYS
DATE       | WORKDAY_LENGHT | ACCIDENTS_COUNT |
-----------+----------------+-----------------+
12-05-2018 | 8              | 2               |
13-05-2018 | 6.5            | 2               |
14-05-2018 | 7.5            | 1               |
15-05-2018 | 8              | 0               |
What I have tried so far is this:
SELECT DISTINCT
    w.date,
    (
        SELECT COUNT(*)
        FROM accidents a
        WHERE date(w.date) = date(a.tod)
    ) AS accidents_count
FROM workdays w
This gives me an answer that is somewhat in the right direction, something like this:
WORKDAYS
DATE       | WORKDAY_LENGHT | ACCIDENTS_COUNT |
-----------+----------------+-----------------+
12-05-2018 | 8              | 1               |
12-05-2018 | 8              | 1               |
13-05-2018 | 6.5            | 1               |
13-05-2018 | 6.5            | 1               |
14-05-2018 | 7.5            | 1               |
15-05-2018 | 8              | 0               |
This is SQLite, so the date values are stored as strings. The date() function should therefore turn them into plain dates, right? Or is that what's causing the problem?

I was missing a GROUP BY and feel ashamed for opening a question before figuring this out.
Adding GROUP BY date(w.date) is the solution here.
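A minimal sketch of the same fix, run through Python's built-in sqlite3 module against the sample data above. It uses a LEFT JOIN plus GROUP BY rather than the correlated subquery, so workdays without accidents still appear with a count of 0. It assumes the strings really are stored as DD-MM-YYYY, as in the sample. SQLite's date() only understands time strings such as YYYY-MM-DD, so date('12-05-2018') returns NULL, and the sketch therefore compares the first 10 characters of TOD with DATE instead. The helper function names are hypothetical.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory copy of the WORKDAYS / ACCIDENTS sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE workdays (date TEXT, workday_lenght REAL)")
    conn.execute("CREATE TABLE accidents (tod TEXT, severity TEXT)")
    conn.executemany(
        "INSERT INTO workdays VALUES (?, ?)",
        [("12-05-2018", 8), ("13-05-2018", 6.5), ("14-05-2018", 7.5), ("15-05-2018", 8)],
    )
    conn.executemany(
        "INSERT INTO accidents VALUES (?, ?)",
        [
            ("12-05-2018 12:00", "minor"),
            ("12-05-2018 15:00", "minor"),
            ("13-05-2018 08:00", "severe"),
            ("13-05-2018 12:00", "severe"),
            ("14-05-2018 10:30", "severe"),
        ],
    )
    return conn


def accidents_per_workday(conn: sqlite3.Connection) -> list:
    # LEFT JOIN keeps workdays that have no matching accident.
    # COUNT(a.tod) skips the NULLs produced for those days, so they get 0.
    # substr(a.tod, 1, 10) extracts the 'DD-MM-YYYY' part of the timestamp.
    # ORDER BY rebuilds a sortable YYYYMMDD key from the DD-MM-YYYY string.
    sql = """
        SELECT w.date, w.workday_lenght, COUNT(a.tod) AS accidents_count
        FROM workdays AS w
        LEFT JOIN accidents AS a ON substr(a.tod, 1, 10) = w.date
        GROUP BY w.date, w.workday_lenght
        ORDER BY substr(w.date, 7, 4) || substr(w.date, 4, 2) || substr(w.date, 1, 2)
    """
    return conn.execute(sql).fetchall()


conn = build_sample_db()
# date() does not recognise DD-MM-YYYY, so this prints (None,).
print(conn.execute("SELECT date('12-05-2018')").fetchone())
for row in accidents_per_workday(conn):
    print(row)
```

If the real data uses ISO-8601 strings such as 2018-05-12 12:00, the original date(w.date) = date(a.tod) comparison can be used unchanged as the ON condition.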

Related

SQLite: count occurrences per year

Let's say I have a table in my SQLite database with some information about some files, with the following structure:
| id | file format | creation date             |
------------------------------------------------
| 1  | Word        | 2010:02:12 13:31:33+01:00 |
| 2  | PSD         | 2021:02:23 15:44:51+01:00 |
| 3  | Word        | 2019:02:13 14:18:11+01:00 |
| 4  | Word        | 2010:02:12 13:31:20+01:00 |
| 5  | Word        | 2003:05:25 18:55:10+02:00 |
| 6  | PSD         | 2014:07:20 20:55:58+02:00 |
| 7  | Word        | 2014:07:20 21:09:24+02:00 |
| 8  | TIFF        | 2011:03:30 11:56:56+02:00 |
| 9  | PSD         | 2015:07:15 14:34:36+02:00 |
| 10 | PSD         | 2009:08:29 11:25:57+02:00 |
| 11 | Word        | 2003:05:25 20:06:18+02:00 |
I would like results that show me a chronology of how many of each file format were created in a given year – something along the lines of this:
| Format | 2003 | 2009 | 2010 | 2011 | 2014 | 2015 | 2019 | 2021 |
------------------------------------------------------------------
| Word   | 2    | 0    | 0    | 2    | 0    | 0    | 2    | 0    |
| PSD    | 0    | 1    | 0    | 0    | 1    | 1    | 0    | 1    |
| TIFF   | 0    | 0    | 0    | 1    | 0    | 0    | 1    | 0    |
I've gotten kinda close (I think) with this, but am stuck:
SELECT
    file_format,
    COUNT(CASE file_format WHEN creation_date LIKE '%2010%' THEN 1 ELSE 0 END),
    COUNT(CASE file_format WHEN creation_date LIKE '%2011%' THEN 1 ELSE 0 END),
    COUNT(CASE file_format WHEN creation_date LIKE '%2012%' THEN 1 ELSE 0 END)
FROM
    fileinfo
GROUP BY
    file_format;
When I run this, I get a different count for each file format, but the same count for every year:
| Format | 2010 | 2011 | 2012 |
-------------------------------
| Word   | 4    | 4    | 4    |
| PSD    | 1    | 1    | 1    |
| TIFF   | 6    | 6    | 6    |
Why am I getting that incorrect tally? And is there a smarter way to query this that doesn't rely on hard-coding a string search for every single year? If it helps, the column headers and row headers could be swapped; that doesn't matter to me. Please help a n00b :(
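COUNT() counts every non-NULL value, and your CASE expressions return 0 rather than NULL when there is no match, so every row in the group is counted for every year. Use the SUM() aggregate function for conditional aggregation instead:
SELECT file_format,
       SUM(creation_date LIKE '2010%') AS `2010`,
       SUM(creation_date LIKE '2011%') AS `2011`,
       ..........................................
FROM fileinfo
GROUP BY file_format;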
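A small sketch, using Python's sqlite3 module and the sample rows above, that shows both points: why COUNT() returns the group size for every year while SUM() does not, and how grouping by the year prefix of creation_date avoids hard-coding any year. It assumes the columns are named file_format and creation_date, as in the query, and that creation_date always starts with a four-digit year. Because plain SQLite has no dynamic pivot, the one-column-per-year layout is built in Python here; the variable names are illustrative only.

```python
import sqlite3
from collections import defaultdict

# Sample rows from the question: (id, file_format, creation_date).
ROWS = [
    (1, "Word", "2010:02:12 13:31:33+01:00"),
    (2, "PSD", "2021:02:23 15:44:51+01:00"),
    (3, "Word", "2019:02:13 14:18:11+01:00"),
    (4, "Word", "2010:02:12 13:31:20+01:00"),
    (5, "Word", "2003:05:25 18:55:10+02:00"),
    (6, "PSD", "2014:07:20 20:55:58+02:00"),
    (7, "Word", "2014:07:20 21:09:24+02:00"),
    (8, "TIFF", "2011:03:30 11:56:56+02:00"),
    (9, "PSD", "2015:07:15 14:34:36+02:00"),
    (10, "PSD", "2009:08:29 11:25:57+02:00"),
    (11, "Word", "2003:05:25 20:06:18+02:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fileinfo (id INTEGER, file_format TEXT, creation_date TEXT)")
conn.executemany("INSERT INTO fileinfo VALUES (?, ?, ?)", ROWS)

# COUNT() counts every non-NULL value; this CASE yields 0 or 1, never NULL,
# so COUNT returns the group size. SUM() adds up only the 1s.
count_vs_sum = conn.execute(
    """
    SELECT file_format,
           COUNT(CASE WHEN creation_date LIKE '2010%' THEN 1 ELSE 0 END),
           SUM(CASE WHEN creation_date LIKE '2010%' THEN 1 ELSE 0 END)
    FROM fileinfo
    GROUP BY file_format
    ORDER BY file_format
    """
).fetchall()

# No hard-coded years: group by the first four characters of creation_date.
per_year = conn.execute(
    """
    SELECT file_format, substr(creation_date, 1, 4) AS year, COUNT(*)
    FROM fileinfo
    GROUP BY file_format, substr(creation_date, 1, 4)
    ORDER BY file_format, year
    """
).fetchall()

# Optional pivot in Python: one entry per year seen, 0 where a format has none.
years = sorted({year for _, year, _ in per_year})
pivot = defaultdict(lambda: dict.fromkeys(years, 0))
for fmt, year, n in per_year:
    pivot[fmt][year] = n

print(count_vs_sum)
for fmt in sorted(pivot):
    print(fmt, [pivot[fmt][y] for y in years])
```

The grouped per_year query already answers the question with years as rows; the pivot step is only needed if years must become columns.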

Master-Detail show data SQL

I'm working with SQL Server and I have these 3 tables:
STUDENTS
| id | student |
----------------
| 1  | Ronald  |
| 2  | Jenny   |
SCORES
| id | score | period | student |
---------------------------------
| 1  | 8     | 1      | 1       |
| 2  | 9     | 2      | 1       |
PERIODS
| id | period |
---------------
| 1  | 1      |
| 2  | 2      |
| 3  | 3      |
| 4  | 4      |
And I want a query that returns this result:
| student | score1 | score2 | score3 | score4 |
-----------------------------------------------
| Ronald  | 8      | 9      | null   | null   |
| Jenny   | null   | null   | null   | null   |
As you can see, the number of score columns depends on the periods, because sometimes there are 4 periods and sometimes 3.
I don't know whether I have the wrong idea or should do this in the application, but I would like some help.
You need to PIVOT your data, e.g.:
select Y.Student, [1] as score1, [2] as score2, [3] as score3, [4] as score4
from (
    select T.Student, P.[Period], S.Score
    from Students T
    cross join [Periods] P
    left join Scores S on S.[Period] = P.id and S.Student = T.id
) X
pivot
(
    sum(Score)
    for [Period] in ([1], [2], [3], [4])
) Y
Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-20
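PIVOT needs the list of periods spelled out, which is awkward when there may be 3 or 4 periods. One common workaround is to generate the column list from the PERIODS table before running the query. The sketch below shows that idea with conditional aggregation (SUM(CASE ...)) in place of PIVOT, run against SQLite through Python's sqlite3 module so it is self-contained; it is not T-SQL, and scores_by_period is a hypothetical helper. In SQL Server the generated string would typically be run as dynamic SQL, for example via sp_executesql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, student TEXT);
    CREATE TABLE scores (id INTEGER PRIMARY KEY, score INTEGER, period INTEGER, student INTEGER);
    CREATE TABLE periods (id INTEGER PRIMARY KEY, period INTEGER);
    INSERT INTO students VALUES (1, 'Ronald'), (2, 'Jenny');
    INSERT INTO scores VALUES (1, 8, 1, 1), (2, 9, 2, 1);
    INSERT INTO periods VALUES (1, 1), (2, 2), (3, 3), (4, 4);
    """
)


def scores_by_period(conn):
    """Hypothetical helper: one scoreN column per row currently in periods."""
    periods = [p for (p,) in conn.execute("SELECT period FROM periods ORDER BY period")]
    # int() ensures only numbers are spliced into the generated SQL text.
    columns = ", ".join(
        f"SUM(CASE WHEN p.period = {int(p)} THEN s.score END) AS score{int(p)}"
        for p in periods
    )
    # Same shape as the PIVOT source: every student x every period,
    # with the score (or NULL) left-joined in.
    sql = f"""
        SELECT t.student AS student, {columns}
        FROM students AS t
        CROSS JOIN periods AS p
        LEFT JOIN scores AS s ON s.period = p.id AND s.student = t.id
        GROUP BY t.id, t.student
        ORDER BY t.id
    """
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()


names, rows = scores_by_period(conn)
print(names)
for row in rows:
    print(row)
```

Because the column list is rebuilt on every call, deleting or adding a row in periods changes the number of scoreN columns without touching the query text.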

Monthly Correlation for 19 variables

I have the following dataset with 21 columns: 19 variables, plus Month and Date as date-type columns.
The aim is to analyze how the correlations change over time, by calculating the correlation between the variables from the daily values within each month. For example, see this "monthly correlation" over time (x-axis as month type).
+------------+---------+-----+-----+--------+---------+-------------+
| Date       | Month   | AOV | ASP | Clicks | Traffic | Impressions |
+------------+---------+-----+-----+--------+---------+-------------+
| 2017-01-01 | 2017-01 | 50  | 6   | 700    | 10000   | 4500        |
+------------+---------+-----+-----+--------+---------+-------------+
| 2017-01-02 | 2017-01 | 55  | 7   | 800    | 20000   | 4600        |
+------------+---------+-----+-----+--------+---------+-------------+
| 2017-02    | 2017-02 | 58  | 8   | 700    | 4599    | 2300        |
+------------+---------+-----+-----+--------+---------+-------------+
At the moment I have the following code, but it can only compare two variables at a time:
library(plyr)
ddply(corr, "Month", summarise, corr = cor(AOV, ASP))
This gives me the table below:
+---------+------------+
| Month   | corr       |
+---------+------------+
| 2017-1  | 0.4958738  |
+---------+------------+
| 2017-10 | 0.8527522  |
+---------+------------+
| 2017-11 | -0.2751771 |
+---------+------------+
| 2017-12 | NA         |
+---------+------------+
| 2017-2  | 0.6596346  |
+---------+------------+
| 2017-3  | 0.6399969  |
+---------+------------+
| 2017-4  | 0.7926245  |
+---------+------------+
| 2017-5  | 0.6429613  |
+---------+------------+
| 2017-6  | 0.3824414  |
+---------+------------+
| 2017-7  | 0.9154873  |
+---------+------------+
| 2017-8  | 0.7235767  |
+---------+------------+
| 2017-9  | 0.8264006  |
+---------+------------+
I have been using combn to create the set of combinations, but I'm not quite sure how to use it with ddply. I get 171 pairs.
combn(corr, 2, simplify = FALSE)
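You can just do the following (leaving out the non-numeric Date and Month columns, which cor() cannot handle):
cor(your_data_frame[sapply(your_data_frame, is.numeric)])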
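cor() on the whole data frame gives one matrix for the entire period. To get a value per month for every pair of variables (what the ddply/combn attempt is after), the rows can be split by Month and each pair correlated within each group; in R this is roughly lapply(split(df, df$Month), function(d) cor(d[vars])). The sketch below shows the same logic in plain Python with a hand-written Pearson coefficient, using the three sample rows from the question (only 5 of the 19 variables appear there). The function names are hypothetical, and None stands in for R's NA when a month has fewer than two rows or a constant column.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined (like R's NA)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def monthly_pair_correlations(rows, variables):
    """Map (month, var_a, var_b) -> correlation of that month's daily values."""
    by_month = {}
    for row in rows:
        by_month.setdefault(row["Month"], []).append(row)
    result = {}
    for month, month_rows in by_month.items():
        # combinations(..., 2) plays the role of combn(..., 2): every unordered pair.
        for a, b in combinations(variables, 2):
            result[(month, a, b)] = pearson(
                [r[a] for r in month_rows], [r[b] for r in month_rows]
            )
    return result


# The three sample rows from the question.
ROWS = [
    {"Date": "2017-01-01", "Month": "2017-01", "AOV": 50, "ASP": 6, "Clicks": 700, "Traffic": 10000, "Impressions": 4500},
    {"Date": "2017-01-02", "Month": "2017-01", "AOV": 55, "ASP": 7, "Clicks": 800, "Traffic": 20000, "Impressions": 4600},
    {"Date": "2017-02", "Month": "2017-02", "AOV": 58, "ASP": 8, "Clicks": 700, "Traffic": 4599, "Impressions": 2300},
]
VARIABLES = ["AOV", "ASP", "Clicks", "Traffic", "Impressions"]

corrs = monthly_pair_correlations(ROWS, VARIABLES)
for key, value in sorted(corrs.items()):
    print(key, value)
```

With 19 variables the same loop yields 171 pairs per month, matching the count from combn.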

How to get a query result into a key value form in HiveQL

I have tried different things, but none has worked. I have the following issue and would be very grateful if someone could help me.
I get the data from a view: several billion records, for different measures:
A)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      |
| 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      |
| 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to aggregate each measure. So far so good; I have this figured out.
B)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 3      | 6      | 9      | 12     | 15     | 18     | 21     | 24     |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to turn the data into the following key-value form:
C)
| measure | c  | p  |
|---------+----+----|
| m1      | 3  | 15 |
| m2      | 6  | 18 |
| m3      | 9  | 21 |
| m4      | 12 | 24 |
|---------+----+----|
The first 4 columns of B) would form the c column in C), and the second 4 columns would form the p column.
Is there an elegant, easily maintainable way to do this? Ideally, if another measure were introduced in A) and B), no modification would be required and the query would pick up the difference automatically.
I know how to do this in SQL Server and Postgres, but I lack the experience with Hive.
I think you should use a map for this.
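The map idea is to treat each s_&lt;type&gt;_&lt;measure&gt; column as a key/value pair and then turn those pairs into rows, which is what Hive's map() constructor and explode() function would be used for. Since Hive cannot be run here, the sketch below illustrates the same logic in Python on sample A): sum each column (A to B), then parse the column names so that a new measure such as m5 is picked up without code changes. It assumes every column name follows the s_&lt;type&gt;_&lt;measure&gt; pattern from the question, with types c and p; the function names are hypothetical.

```python
def aggregate(rows):
    """Step A) -> B): sum each column over all rows."""
    totals = {}
    for row in rows:
        for col, value in row.items():
            totals[col] = totals.get(col, 0) + value
    return totals


def to_key_value(totals):
    """Step B) -> C): one (measure, c, p) row per measure found in the column names."""
    table = {}
    for col, value in totals.items():
        _, kind, measure = col.split("_", 2)  # "s_c_m1" -> "s", "c", "m1"
        table.setdefault(measure, {})[kind] = value
    # dicts preserve insertion order, so measures appear in column order.
    return [(measure, vals.get("c"), vals.get("p")) for measure, vals in table.items()]


COLUMNS = ["s_c_m1", "s_c_m2", "s_c_m3", "s_c_m4", "s_p_m1", "s_p_m2", "s_p_m3", "s_p_m4"]
# The three rows of sample A): 0..7, 1..8 and 2..9.
A_ROWS = [dict(zip(COLUMNS, range(start, start + 8))) for start in range(3)]

B = aggregate(A_ROWS)
C = to_key_value(B)
for row in C:
    print(row)
```

Nothing in to_key_value names a specific measure, so adding s_c_m5 and s_p_m5 columns simply produces an extra m5 row.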

Calculation of Battery Consumption of each running mobile application

Is it possible to find out how much battery each mobile application consumes per day (using R), where I have collected data with the following fields:
record_id
date_time
application_name
battery_level
battery_status
battery_level: a number representing the available battery percentage
battery_status: the status of the battery (charging, discharging, full)
The calculation should be based on the collected data.
An example of such data:
+-----------+------------------+---------------------+---------------+----------------+
| record_id | application_name | date_time           | battery_level | battery_status |
+-----------+------------------+---------------------+---------------+----------------+
| 473849    | viber            | 2015-09-01 21:34:01 | 7             | Charging       |
| 473850    | watsup           | 2015-09-01 21:34:01 | 7             | Charging       |
| 473851    | AccuWeather      | 2015-09-01 21:34:01 | 7             | Charging       |
+-----------+------------------+---------------------+---------------+----------------+
As I understand it, it is not possible to calculate the battery consumption of
each running mobile application using the data collected in my first post.
So let us consider a different data collection.
Assume that we have the following data:
CPU usage per running application and
memory usage per running application,
as follows:
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
| record_id | application_name | date_time           | cpu_usage_per_app_in_percentage | memory_usage_per_app_in_percentage |
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
| 473849    | viber            | 2015-09-06 19:23:13 | 5                               | 2                                  |
| 473850    | watsup           | 2015-09-06 19:23:13 | 9                               | 2                                  |
| 473851    | AccuWeather      | 2015-09-06 19:23:13 | 8                               | 4                                  |
| 473980    | viber            | 2015-09-06 19:23:14 | 4                               | 1                                  |
| 474254    | watsup           | 2015-09-06 19:23:14 | 9                               | 1                                  |
| 474323    | AccuWeather      | 2015-09-06 19:23:14 | 9                               | 2                                  |
| 474533    | viber            | 2015-09-06 19:23:15 | 5                               | 2                                  |
| 474536    | watsup           | 2015-09-06 19:23:15 | 8                               | 3                                  |
| 474537    | AccuWeather      | 2015-09-06 19:23:15 | 5                               | 3                                  |
| 474538    | calendar         | 2015-09-06 19:23:15 | 7                               | 3                                  |
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
You can suggest any other way of collecting data. The key question is: is it possible to calculate the battery consumption of each running mobile application? If so, how, and what data needs to be collected?
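One crude heuristic, offered only as an assumption and not as a validated power model: if battery_level readings and per-app CPU usage readings are collected at the same timestamps, each drop in battery_level while the status is Discharging can be split among the apps running at the start of that interval in proportion to their CPU share, then summed per app and day. This ignores the screen, radios, GPS, and other consumers that CPU usage does not capture, so at best it gives a rough ranking. The combined record layout, the "Discharging" status string, and the sample values below are all hypothetical.

```python
from collections import defaultdict


def attribute_battery_drop(samples):
    """
    samples: list of (date_time, battery_level, battery_status, {app: cpu_percent}),
    sorted by date_time. Returns {(day, app): estimated percentage points used}.
    """
    usage = defaultdict(float)
    for prev, cur in zip(samples, samples[1:]):
        prev_time, prev_level, _, prev_cpu = prev
        _, cur_level, cur_status, _ = cur
        drop = prev_level - cur_level
        # Only count intervals in which the battery actually discharged.
        if cur_status != "Discharging" or drop <= 0:
            continue
        total_cpu = sum(prev_cpu.values())
        if total_cpu == 0:
            continue
        day = prev_time[:10]  # 'YYYY-MM-DD' prefix of the timestamp
        # Split the drop by each app's CPU share at the start of the interval.
        for app, cpu in prev_cpu.items():
            usage[(day, app)] += drop * cpu / total_cpu
    return dict(usage)


# Hypothetical readings: battery_level joined to per-app CPU usage by timestamp.
SAMPLES = [
    ("2015-09-06 19:00:00", 80, "Discharging", {"viber": 5, "watsup": 9, "AccuWeather": 6}),
    ("2015-09-06 20:00:00", 70, "Discharging", {"viber": 4, "watsup": 4}),
    ("2015-09-06 21:00:00", 66, "Discharging", {"viber": 1, "watsup": 1}),
]
print(attribute_battery_drop(SAMPLES))
```

By construction the per-app estimates add up to the total observed drop, which makes the heuristic easy to sanity-check but says nothing about its physical accuracy.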
