Is there a way to show partitions on Cloudera Impala?

Normally, I can run show partitions <table> in Hive. But when it is a Parquet table, Hive does not understand it. I can go to HDFS and check the directory structure, but that is not ideal. Is there a better way to do that?

I am using Impala 1.4.0 and I can see partitions.
From impala-shell, run the command:
show partitions <mytablename>
I get something that looks like this:
+-------+-------+-----+-------+--------+---------+--------------+---------+
| year | month | day | #Rows | #Files | Size | Bytes Cached | Format |
+-------+-------+-----+-------+--------+---------+--------------+---------+
| 2013 | 11 | 1 | -1 | 3 | 25.87MB | NOT CACHED | PARQUET |
| 2013 | 11 | 2 | -1 | 3 | 24.84MB | NOT CACHED | PARQUET |
| 2013 | 11 | 3 | -1 | 2 | 19.05MB | NOT CACHED | PARQUET |
| 2013 | 11 | 4 | -1 | 3 | 23.63MB | NOT CACHED | PARQUET |
| 2013 | 11 | 5 | -1 | 3 | 26.56MB | NOT CACHED | PARQUET |
Alternatively, you can go to your table's directory in HDFS. Tables are normally found under one of these paths:
/user/hivestore/warehouse/<mytablename> or
/user/hive/warehouse/<mytablename>
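If you do end up reading the directory structure, the partition values can be pulled out of the paths mechanically. Here is a minimal Python sketch, assuming the table uses the usual key=value partition directory names; the example path and the function name are made up for illustration:

```python
def parse_partition_path(path, table_root):
    """Return the partition key/value pairs encoded in a directory path."""
    if not path.startswith(table_root):
        raise ValueError("path is not below the table root")
    # Keep only the part of the path below the table's root directory.
    relative = path[len(table_root):].strip("/")
    partition = {}
    for segment in relative.split("/"):
        # Partition directories look like key=value; skip anything else,
        # for example the name of a data file at the end of the path.
        if "=" in segment:
            key, value = segment.split("=", 1)
            partition[key] = value
    return partition


# Hypothetical path, modelled on the year/month/day partitions shown above.
example = "/user/hive/warehouse/mytablename/year=2013/month=11/day=1"
print(parse_partition_path(example, "/user/hive/warehouse/mytablename"))
```

The values come back as strings, so convert them yourself if you need numbers.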

Unfortunately, no. There is an open issue for it, though, so checking manually seems to be the only option right now.

Related

SQLite count occurrence per year

So let's say I have a table in my SQLite database with some information about some files, with the following structure:
| id | file format | creation date |
----------------------------------------------------------
| 1 | Word | 2010:02:12 13:31:33+01:00 |
| 2 | PSD | 2021:02:23 15:44:51+01:00 |
| 3 | Word | 2019:02:13 14:18:11+01:00 |
| 4 | Word | 2010:02:12 13:31:20+01:00 |
| 5 | Word | 2003:05:25 18:55:10+02:00 |
| 6 | PSD | 2014:07:20 20:55:58+02:00 |
| 7 | Word | 2014:07:20 21:09:24+02:00 |
| 8 | TIFF | 2011:03:30 11:56:56+02:00 |
| 9 | PSD | 2015:07:15 14:34:36+02:00 |
| 10 | PSD | 2009:08:29 11:25:57+02:00 |
| 11 | Word | 2003:05:25 20:06:18+02:00 |
I would like results that show me a chronology of how many of each file format were created in a given year – something along the lines of this:
|Format| 2003 | 2009 | 2010 | 2011 | 2014 | 2015 | 2019 | 2021 |
----------------------------------------------------------------
| Word | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 |
| PSD | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| TIFF | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
I've gotten kinda close (I think) with this, but am stuck:
SELECT
file_format,
COUNT(CASE file_format WHEN creation_date LIKE '%2010%' THEN 1 ELSE 0 END),
COUNT(CASE file_format WHEN creation_date LIKE '%2011%' THEN 1 ELSE 0 END),
COUNT(CASE file_format WHEN creation_date LIKE '%2012%' THEN 1 ELSE 0 END)
FROM
fileinfo
GROUP BY
file_format;
When I do this I am getting unique amounts for each file format, but the same count for every year…
|Format| 2010 | 2011 | 2012 |
-----------------------------
| Word | 4 | 4 | 4 |
| PSD | 1 | 1 | 1 |
| TIFF | 6 | 6 | 6 |
Why am I getting that incorrect tally, and moreover, is there a smarter way of querying that doesn't rely on the year being statically searched for as a string for every single year? If it helps, the column headers and row headers could be switched – doesn't matter to me. Please help a n00b :(
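COUNT() counts every non-NULL value, so the 0 from your ELSE branch is counted just like the 1, and each column ends up with the same total. Use the SUM() aggregate function for conditional aggregation instead: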
SELECT file_format,
SUM(creation_date LIKE '2010%') AS `2010`,
SUM(creation_date LIKE '2011%') AS `2011`,
..........................................
FROM fileinfo
GROUP BY file_format;
See the demo.
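On the second part of the question (not hard-coding every year): one option is to group by the year prefix of the date and get one row per format and year, instead of one column per year. A rough sketch using Python's sqlite3 module with the sample rows from the question; the table and column names are taken from the queries above, while the function name and the aliases creation_year and file_count are my own:

```python
import sqlite3

# Sample rows from the question: (id, file_format, creation_date).
ROWS = [
    (1, "Word", "2010:02:12 13:31:33+01:00"),
    (2, "PSD", "2021:02:23 15:44:51+01:00"),
    (3, "Word", "2019:02:13 14:18:11+01:00"),
    (4, "Word", "2010:02:12 13:31:20+01:00"),
    (5, "Word", "2003:05:25 18:55:10+02:00"),
    (6, "PSD", "2014:07:20 20:55:58+02:00"),
    (7, "Word", "2014:07:20 21:09:24+02:00"),
    (8, "TIFF", "2011:03:30 11:56:56+02:00"),
    (9, "PSD", "2015:07:15 14:34:36+02:00"),
    (10, "PSD", "2009:08:29 11:25:57+02:00"),
    (11, "Word", "2003:05:25 20:06:18+02:00"),
]


def counts_per_format_and_year(rows):
    """Return (file_format, year, count) tuples, one per format/year pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fileinfo (id INTEGER, file_format TEXT, creation_date TEXT)"
    )
    conn.executemany("INSERT INTO fileinfo VALUES (?, ?, ?)", rows)
    # The first four characters of creation_date are the year.
    query = """
        SELECT file_format,
               substr(creation_date, 1, 4) AS creation_year,
               COUNT(*) AS file_count
        FROM fileinfo
        GROUP BY file_format, creation_year
        ORDER BY file_format, creation_year
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for file_format, year, count in counts_per_format_and_year(ROWS):
    print(file_format, year, count)
```

Years with no files for a format simply do not appear in this form, so turning it into a grid with zeros is left to the code that displays it.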

Get memory, CPU and disk usage for each tenant in OpenStack

I am looking for the CPU, memory, and disk consumption of each tenant in OpenStack, and how it relates to the users, instances, and flavors in use. Horizon only shows memory and CPU utilization globally. Is it possible to get this with OpenStack commands?
My OpenStack is based on Rocky.
Any ideas would be really appreciated.
The only thing I know of is
openstack limits show --absolute --project <Project_ID/Tenant_ID>
See also https://docs.openstack.org/python-openstackclient/pike/cli/command-objects/limits.html
The output includes fields such as totalCoresUsed, which is the number of cores used by the selected project.
Example:
root@openstack-controller:~# openstack limits show --absolute --project 416f937f505f4ff6b623c48a61228a86
+--------------------------+-------+
| Name | Value |
+--------------------------+-------+
| maxTotalInstances | 10 |
| maxTotalCores | 20 |
| maxTotalRAMSize | 51200 |
| maxSecurityGroups | 10 |
| maxTotalFloatingIps | 10 |
| maxServerMeta | 128 |
| maxImageMeta | 128 |
| maxPersonality | 5 |
| maxPersonalitySize | 10240 |
| maxSecurityGroupRules | 20 |
| maxTotalKeypairs | 100 |
| maxServerGroups | 10 |
| maxServerGroupMembers | 10 |
| totalRAMUsed | 2560 |
| totalCoresUsed | 7 |
| totalInstancesUsed | 7 |
| totalFloatingIpsUsed | 0 |
| totalSecurityGroupsUsed | 1 |
| totalServerGroupsUsed | 0 |
| maxTotalVolumes | 10 |
| maxTotalSnapshots | 10 |
| maxTotalVolumeGigabytes | 1000 |
| maxTotalBackups | 10 |
| maxTotalBackupGigabytes | 1000 |
| totalVolumesUsed | 5 |
| totalGigabytesUsed | 7 |
| totalSnapshotsUsed | 0 |
| totalBackupsUsed | 0 |
| totalBackupGigabytesUsed | 0 |
+--------------------------+-------+
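To turn a table like this into percentages, you can parse the text and divide the "used" values by the matching "max" values. A small Python sketch, assuming the two-column layout shown above; the function names are made up, and SAMPLE is just a shortened copy of the output:

```python
def parse_limits_table(text):
    """Turn the two-column ASCII table into a {name: int} dictionary."""
    limits = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Keep only data rows: exactly two cells, the second one an integer.
        # Border lines and the Name/Value header fail this check.
        if len(cells) == 2 and cells[1].lstrip("-").isdigit():
            limits[cells[0]] = int(cells[1])
    return limits


def usage_percent(limits, used_key, max_key):
    """Return the share of a quota that is in use, as a percentage."""
    return 100.0 * limits[used_key] / limits[max_key]


# Shortened copy of the example output above.
SAMPLE = """\
+--------------------------+-------+
| Name                     | Value |
+--------------------------+-------+
| maxTotalCores            | 20    |
| maxTotalRAMSize          | 51200 |
| totalRAMUsed             | 2560  |
| totalCoresUsed           | 7     |
+--------------------------+-------+
"""

limits = parse_limits_table(SAMPLE)
print(usage_percent(limits, "totalCoresUsed", "maxTotalCores"))   # 35.0
print(usage_percent(limits, "totalRAMUsed", "maxTotalRAMSize"))   # 5.0
```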
Quotas, and therefore the limits, are bound to projects and not to users, so I don't know if it is possible to get a breakdown by user. The only idea I have is a simple bash script that iterates over all instances and volumes of a project and collects the information for each resource by the user who created it.
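The aggregation step of such a script could look roughly like the sketch below. It is written in Python rather than bash, and it assumes you have already fetched one record per instance; the record layout (user_id, vcpus, ram_mb, disk_gb) and the sample values are hypothetical, not what any particular OpenStack command returns:

```python
from collections import defaultdict


def usage_by_user(instances):
    """Sum up vCPUs, RAM and disk per user from per-instance records."""
    totals = defaultdict(
        lambda: {"instances": 0, "vcpus": 0, "ram_mb": 0, "disk_gb": 0}
    )
    for inst in instances:
        entry = totals[inst["user_id"]]
        entry["instances"] += 1
        entry["vcpus"] += inst["vcpus"]
        entry["ram_mb"] += inst["ram_mb"]
        entry["disk_gb"] += inst["disk_gb"]
    return dict(totals)


# Hypothetical records: one per instance, with the flavor's resources
# already resolved for each of them.
INSTANCES = [
    {"user_id": "alice", "vcpus": 2, "ram_mb": 512, "disk_gb": 5},
    {"user_id": "bob", "vcpus": 1, "ram_mb": 256, "disk_gb": 4},
    {"user_id": "alice", "vcpus": 1, "ram_mb": 256, "disk_gb": 4},
]

for user, totals in sorted(usage_by_user(INSTANCES).items()):
    print(user, totals)
```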
Update 30.7.2020:
Found a better solution now, which also reports the resource usage per user of a project. It comes with the new placement component in the Stein release of OpenStack (tested on the Train release).
Installation of the OpenStack client extension: pip install osc-placement
Resource usage of a project:
openstack resource usage show --os-placement-api-version 1.9 <PROJECT_ID>
Resource usage of a specific user within a project:
openstack resource usage show --os-placement-api-version 1.9 --user-id <USER_ID> <PROJECT_ID>
Example:
openstack resource usage show --os-placement-api-version 1.9 --user-id 98378bd3cdd94218bf7b6ef4ec80e74a 7733616a513444c2a106243db318b0dd
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU | 3 |
| MEMORY_MB | 768 |
| DISK_GB | 9 |
+----------------+-------+

How to get a query result into a key value form in HiveQL

I have tried different things, but none succeeded. I have the following issue and would be very grateful if someone could help me.
I get the data from a view as several billion records, for different measures:
A)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to aggregate it by each measure. So far so good; I have this figured out.
B)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to get the data in the following form. I need to turn it into a key-value form.
C)
| measure | c | p |
|---------+----+----|
| m1 | 3 | 15 |
| m2 | 6 | 18 |
| m3 | 9 | 21 |
| m4 | 12 | 24 |
|---------+----+----|
The first four columns of B) would form one column in C), and the last four columns would form another.
Is there an elegant way to do this that is easy to maintain? Ideally, if another measure were introduced in A) and B), no modification would be required and the query would pick up the difference automatically.
I know how to do this in SQL Server and Postgres, but I lack the experience here.
I think you should use a map for this: build a map keyed by the measure name and turn it into rows with LATERAL VIEW explode().
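To make the target shape concrete, here is the same reshaping sketched in Python rather than HiveQL. It assumes the column names always follow the s_<kind>_<measure> pattern from the tables above; the function and variable names are made up. Because the measures are discovered from the column names, a new measure is picked up without changing the code:

```python
import re

# Matches names such as s_c_m1: kind is "c", measure is "m1".
COLUMN_PATTERN = re.compile(r"^s_(?P<kind>[a-z]+)_(?P<measure>.+)$")


def unpivot(aggregated_row):
    """Turn one wide row (table B) into one row per measure (table C)."""
    by_measure = {}
    for column, value in aggregated_row.items():
        match = COLUMN_PATTERN.match(column)
        if match is None:
            continue  # Ignore columns that do not follow the naming pattern.
        by_measure.setdefault(match["measure"], {})[match["kind"]] = value
    return [
        {"measure": measure, **kinds}
        for measure, kinds in sorted(by_measure.items())
    ]


# The aggregated row from table B.
ROW_B = {
    "s_c_m1": 3, "s_c_m2": 6, "s_c_m3": 9, "s_c_m4": 12,
    "s_p_m1": 15, "s_p_m2": 18, "s_p_m3": 21, "s_p_m4": 24,
}

for row in unpivot(ROW_B):
    print(row)
```

The rows are sorted by measure name as plain strings, so a measure called m10 would sort before m2.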

Calculation of Battery Consumption of each running mobile application

Is it possible to find out how much battery each mobile application consumes per day (using R), given a data collection with the following fields?
record_id,
date_time,
application_name,
battery_level,
battery_status
battery_level is a number representing the available percentage of the battery.
battery_status is the status of the battery: charging, discharging, or full.
The calculation would be based on the collected data.
Example of such data:
+-----------+------------------+---------------------+---------------+----------------+
| record_id | application_name | date_time | battery_level | battery_status |
+-----------+------------------+---------------------+---------------+----------------+
| 473849 | viber | 2015-09-01 21:34:01 | 7 | Charging |
| 473850 | watsup | 2015-09-01 21:34:01 | 7 | Charging |
| 473851 | AccuWeather | 2015-09-01 21:34:01 | 7 | Charging |
+-----------+------------------+---------------------+---------------+----------------+
As I understand it, it is not possible to calculate the battery consumption of each running mobile application using the data collected in my first post.
Let us consider another data collection.
Assume we have the following data:
CPU usage per running application and
memory usage per running application,
as follows:
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
| record_id | application_name | date_time | cpu_usage_per_app_in_percentage | memory_usage_per_app_in_percentage |
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
| 473849 | viber | 2015-09-06 19:23:13 | 5 | 2 |
| 473850 | watsup | 2015-09-06 19:23:13 | 9 | 2 |
| 473851 | AccuWeather | 2015-09-06 19:23:13 | 8 | 4 |
| 473980 | viber | 2015-09-06 19:23:14 | 4 | 1 |
| 474254 | watsup | 2015-09-06 19:23:14 | 9 | 1 |
| 474323 | AccuWeather | 2015-09-06 19:23:14 | 9 | 2 |
| 474533 | viber | 2015-09-06 19:23:15 | 5 | 2 |
| 474536 | watsup | 2015-09-06 19:23:15 | 8 | 3 |
| 474537 | AccuWeather | 2015-09-06 19:23:15 | 5 | 3 |
| 474538 | calendar | 2015-09-06 19:23:15 | 7 | 3 |
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
You can suggest any other way of collecting data. The key question is: is it possible to calculate the battery consumption of each running mobile application? If so, how, and what data should be collected?

Creating hierarchical menu in ASP.NET using SQL

I'm trying to create a hierarchical menu from a SQL data source in ASP.NET. I'm having trouble ordering my table so that I can easily create the menu. There may be a better way to do it, if anyone has any ideas...
I currently have a table that looks like this (made-up sample data). There are only folders at the root, and only folders can have folder IDs:
+-------------------+---------------------+-----------+-----------+
| Name_FolderorItem | Parent_Folder | Id_Folder | Menuplace |
+-------------------+---------------------+-----------+-----------+
| c FOLDER | ROOT | c_FOLDER | 1 |
| d FOLDER | j_FOLDER | d_FOLDER | 2 |
| a FOLDER | ROOT | a_FOLDER | 1 |
| j FOLDER | ROOT | j_FOLDER | 1 |
| f FOLDER | ROOT | f_FOLDER | 1 |
| r FOLDER | f_FOLDER | r_FOLDER | 2 |
| i FOLDER | d_FOLDER | i_FOLDER | 3 |
| a ITEM | j_FOLDER | | 2 |
| d ITEM | c_FOLDER | | 2 |
| z ITEM | f_FOLDER | | 2 |
| r ITEM | d_FOLDER | | 3 |
+-------------------+---------------------+-----------+-----------+
I'm thinking that if I order it like this, alphabetically on the first level and then alphabetically within each deeper level:
+-------------------+---------------------+-----------+-----------+
| Name_FolderorItem | Parent_Folder | Id_Folder | Menuplace |
+-------------------+---------------------+-----------+-----------+
| a FOLDER | ROOT | a_FOLDER | 1 |
| c FOLDER | ROOT | c_FOLDER | 1 |
| d ITEM | c_FOLDER | | 2 |
| f FOLDER | ROOT | f_FOLDER | 1 |
| r FOLDER | f_FOLDER | r_FOLDER | 2 |
| z ITEM | f_FOLDER | | 2 |
| j FOLDER | ROOT | j_FOLDER | 1 |
| a ITEM | j_FOLDER | | 2 |
| d FOLDER | j_FOLDER | d_FOLDER | 2 |
| i FOLDER | d_FOLDER | i_FOLDER | 3 |
| r ITEM | d_FOLDER | | 3 |
+-------------------+---------------------+-----------+-----------+
Then I can use a listview to get this menu structure:
a FOLDER
c FOLDER
- d ITEM
f FOLDER
- r FOLDER (r FOLDER is located in f folder)
- z ITEM
j FOLDER
- a ITEM
- d FOLDER
- - i FOLDER
- - r ITEM
I can't seem to figure out the SQL required to take a folder and then prioritise its contents over the other folders/items on that level.
If you have any ideas on SQL statements that would allow for this ordering, I would appreciate it. Thanks in advance.
EDIT:
Here is the query that I'm now using. Thanks for the help:
SELECT *
FROM table
START WITH Parent_Folder LIKE 'ROOT'
CONNECT BY PRIOR Id_Folder LIKE Parent_folder;
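The ordering described in the question is a depth-first walk of the tree with the children of each folder sorted by name. If sorting inside the query is not an option (in Oracle, ORDER SIBLINGS BY may be worth a look for that), the same order can be produced in application code. A minimal Python sketch over the sample rows, with the function name made up for illustration:

```python
from collections import defaultdict

# (Name_FolderorItem, Parent_Folder, Id_Folder) rows from the sample table.
# Items have no folder id, so None stands in for the empty cell.
ROWS = [
    ("c FOLDER", "ROOT", "c_FOLDER"),
    ("d FOLDER", "j_FOLDER", "d_FOLDER"),
    ("a FOLDER", "ROOT", "a_FOLDER"),
    ("j FOLDER", "ROOT", "j_FOLDER"),
    ("f FOLDER", "ROOT", "f_FOLDER"),
    ("r FOLDER", "f_FOLDER", "r_FOLDER"),
    ("i FOLDER", "d_FOLDER", "i_FOLDER"),
    ("a ITEM", "j_FOLDER", None),
    ("d ITEM", "c_FOLDER", None),
    ("z ITEM", "f_FOLDER", None),
    ("r ITEM", "d_FOLDER", None),
]


def menu_order(rows, root="ROOT"):
    """Return (depth, name) pairs in depth-first, alphabetical order."""
    children = defaultdict(list)
    for name, parent, folder_id in rows:
        children[parent].append((name, folder_id))

    ordered = []

    def visit(parent, depth):
        # Sort siblings by name, then descend into each folder right away
        # so that its contents come before the next sibling.
        for name, folder_id in sorted(children[parent], key=lambda c: c[0]):
            ordered.append((depth, name))
            if folder_id is not None:
                visit(folder_id, depth + 1)

    visit(root, 1)
    return ordered


for depth, name in menu_order(ROWS):
    print("- " * (depth - 1) + name)
```

The depth returned for each row matches the Menuplace column, so it can drive the indentation in the list view.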
This is basically the problem of storing an ordered tree in a database. I have tried a few approaches in the past and would recommend this one:
-----------------------------------
| NodeID | ParentID | Order |
-----------------------------------
| 1 | -1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
| 4 | 3 | 1 |
-----------------------------------
The gist is that each row corresponds to a node in the tree (or a menu item in your case) and has a ParentID that points to its parent. The root node has either NULL or -1 as its ParentID, since it does not have a parent. Now, to get all the nodes that are directly under the root, you'd use
SELECT * FROM table WHERE ParentID = -1 ORDER BY [Order]
[Order] tells you how to order the nodes directly under a specific parent node. (OK, it's not so good to name a column using a SQL keyword, but let's forget that for the moment.) So in my example you can see that both "2" and "3" are under node "1", but "2" comes first. To select the nodes directly under "3" (its children, in other words), you'd use
SELECT * FROM table WHERE ParentID = 3 ORDER BY [Order]
The advantage of this approach is that it's easy to query the menu. The disadvantage is that when you update, you have to make sure that nodes sharing the same ParentID (in other words, nodes under the same parent) do not share the same [Order].
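A quick way to check that rule after an update is to look for (ParentID, Order) pairs that occur more than once. A small Python sketch over the example rows; the function name is made up, and a unique constraint on those two columns in the database would be another way to enforce the same thing:

```python
from collections import Counter


def duplicate_sibling_orders(nodes):
    """Return the (ParentID, Order) pairs used by more than one node."""
    counts = Counter((parent_id, order) for _node_id, parent_id, order in nodes)
    return sorted(pair for pair, count in counts.items() if count > 1)


# (NodeID, ParentID, Order) rows from the example table above.
NODES = [(1, -1, 1), (2, 1, 1), (3, 1, 2), (4, 3, 1)]
print(duplicate_sibling_orders(NODES))  # []

# Adding a second child of node 1 with Order 2 creates a clash.
BROKEN = NODES + [(5, 1, 2)]
print(duplicate_sibling_orders(BROKEN))  # [(1, 2)]
```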
Alternative approach
An alternative approach is to store the tree as something like a linked list:
-----------------------------------
|NodeID | ParentID | PrevID | (optionally you can store NextID as well)
-----------------------------------
| 1 | -1 | -1 |
| 2 | 1 | -1 |
| 3 | 1 | 2 |
| 4 | 3 | -1 |
-----------------------------------
If your tree is very large and you need to quickly browse the sibling nodes, this approach would offer better performance.
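With this layout, the order of the siblings is recovered by starting at the node whose PrevID is -1 and following the chain. A minimal Python sketch over the example rows, with the function name made up for illustration:

```python
def ordered_children(nodes, parent_id, no_prev=-1):
    """Return the NodeIDs under parent_id in linked-list order."""
    # Map each sibling's PrevID to its NodeID, so the node that follows
    # a given node can be looked up directly.
    next_by_prev = {
        prev_id: node_id
        for node_id, node_parent, prev_id in nodes
        if node_parent == parent_id
    }
    ordered = []
    current = next_by_prev.get(no_prev)
    # The length check stops the walk if the data contains a cycle.
    while current is not None and len(ordered) < len(next_by_prev):
        ordered.append(current)
        current = next_by_prev.get(current)
    return ordered


# (NodeID, ParentID, PrevID) rows from the example table above.
NODES = [(1, -1, -1), (2, 1, -1), (3, 1, 2), (4, 3, -1)]
print(ordered_children(NODES, 1))  # [2, 3]
```

Inserting a node between two siblings then only touches the PrevID of the new node and of the node that follows it.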
