convert rows to columns ... how do you do this? [duplicate] - sqlite

This question already has an answer here:
How to convert column values into rows in Sqlite?
(1 answer)
Closed 8 years ago.
Suppose you have a three-column table named scoreTotals. It has the weekly points totals for three players.
If you ran this query on scoreTotals:
select *
from scoreTotals;
You would get this:
Jones Smith Mills
50 70 60
How do you reconfigure the output to the end user so it's this way:
player points
Jones 50
Smith 70
Mills 60
The trick is to get the column titles to appear on the left-hand side as actual data fields, rather than as the titles of the columns.
I saw some things on StackOverflow about turning columns into rows, but none addressed this exact question, and my attempts to adapt the other ideas to my circumstance did not work.
It needs to work in SQLite, which means the pivot and unpivot keywords won't work. I'm looking to do this without storing a table in the database and then deleting it afterward.
The following code will generate the table I am trying to operate on:
create table scoreTotals(Jones int, Smith int, Mills int);
insert into scoreTotals values (50, 70, 60);

I had a similar problem, and my solution depends on which programming language you use to run the SQLite commands. In my case I am using Python to connect to SQLite. After I do a select to return records, I store the result set in a "list of lists" (aka table), which I can then transpose (aka unpivot) with the following single line of Python:
result = [[row[idx] for row in table] for idx in range(len(table[0]))] # transpose using a list comprehension (use xrange on Python 2)
SQLite does not have an unpivot command, but this solution by bluefeet for MySQL may also work for SQLite:
MySQL - turn table into different table
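As a rough sketch of both ideas against the scoreTotals table from the question: the first option writes one SELECT per column and glues them together with UNION ALL, which SQLite supports; the second reads the column names from cursor.description and pairs them with the single data row. The variable names (unpivot_sql, sql_rows, py_rows) are made up for illustration, and the in-memory database is an assumption so that nothing is stored on disk.

```python
import sqlite3

# In-memory database, so no table is stored permanently.
conn = sqlite3.connect(":memory:")
conn.execute("create table scoreTotals(Jones int, Smith int, Mills int)")
conn.execute("insert into scoreTotals values (50, 70, 60)")

# Option 1: unpivot in SQL, one SELECT per column joined with UNION ALL.
unpivot_sql = """
    select 'Jones' as player, Jones as points from scoreTotals
    union all
    select 'Smith', Smith from scoreTotals
    union all
    select 'Mills', Mills from scoreTotals
"""
sql_rows = conn.execute(unpivot_sql).fetchall()

# Option 2: unpivot in Python by pairing column names with the row's values.
cursor = conn.execute("select * from scoreTotals")
names = [col[0] for col in cursor.description]  # column titles as data
values = cursor.fetchone()                      # the single row of totals
py_rows = list(zip(names, values))

for player, points in py_rows:
    print(player, points)
```

The SQL version has to list every column by hand, while the Python version keeps working if players are added as new columns.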

Related

How can i loop over tables in R with the format YYYYMM

I have 24 tables in a SQL work folder, named in the format tablenameYYYYMM and running from 201304 to 201503 (i.e. tablename201304, tablename201305, tablename201306, and so on up to tablename201503). I need to pull all these tables from SQL into R and combine them into one master table. The table names are identical apart from the date, which goes up by one month each time. I was wondering what the best way to do this is.
I know how to get the data from SQL using ODBC; I'm just struggling with the dates in R. How should I loop over the tables in R so that they can all be put into one single table?
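The date part of the loop comes down to stepping through YYYYMM values and rolling over from month 12 to month 1 of the next year. Here is a minimal sketch of that logic, written in Python rather than R; the helper name month_suffixes is hypothetical, and the same counting can be reproduced in R before fetching each table over ODBC.

```python
def month_suffixes(start, end):
    """Yield YYYYMM strings from start to end, inclusive."""
    year, month = divmod(start, 100)
    end_year, end_month = divmod(end, 100)
    while (year, month) <= (end_year, end_month):
        yield f"{year}{month:02d}"
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1

# The table names described in the question.
table_names = [f"tablename{s}" for s in month_suffixes(201304, 201503)]
print(len(table_names), table_names[0], table_names[-1])
```

Each generated name can then be used to query one table, with the results appended to the master table.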

Why is SQLite query on two indexed columns so slow?

I have a table with around 65 million rows that I'm trying to run a simple query on. The table and indexes looks like this:
CREATE TABLE E(
x INTEGER,
t INTEGER,
e TEXT,
A,B,C,D,E,F,G,H,I,
PRIMARY KEY(x,t,e,I)
);
CREATE INDEX ET ON E(t);
CREATE INDEX EE ON E(e);
The query I'm running looks like this:
SELECT MAX(t), B, C FROM E WHERE e='G' AND t <= 9878901234;
I need to run this query for thousands of different values of t and was expecting each query to run in a fraction of a second. However, the above query is taking nearly 10 seconds to run!
I tried running the query plan but only get this:
0|0|0|SEARCH TABLE E USING INDEX EE (e=?)
So this should be using the index. With a binary search I would expect at worst only 26 tests, which should be pretty quick.
Why is my query so slow?
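Each table in a query can generally use only one index. Since your WHERE clause looks at multiple columns, you can use a multi-column index. For these, all but the last column used from the index have to be tested for equality; the last one used can be tested with greater than/less than. Your query plan shows that only the index on e is used, so every row with e='G' still has to be checked against the condition on t.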
So:
CREATE INDEX e_idx_e_t ON E(e, t);
should give you a boost.
For further reading about how SQLite uses indexes, the Query Planner documentation is a good introduction.
You're also mixing an aggregate function (max(t)) and columns (B and C) that aren't part of a group. In SQLite's case, this means that it will pick the values for B and C from the row with the maximum t value; other databases usually throw an error.
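To check both points on a small scale, here is a sketch using Python's sqlite3 module. The table is a reduced, hypothetical version of E that keeps only the columns the query touches, and the sample rows are invented. It asks SQLite for the query plan after creating the two-column index, then runs the query to show which row B and C come from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the table: only the columns the query uses.
conn.execute("CREATE TABLE E(x INTEGER, t INTEGER, e TEXT, B, C)")
conn.executemany(
    "INSERT INTO E VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "G", "b1", "c1"),
        (2, 5, "G", "b5", "c5"),
        (3, 10, "G", "b10", "c10"),
        (4, 4, "H", "x", "y"),
    ],
)
# Equality column first, range column last.
conn.execute("CREATE INDEX e_idx_e_t ON E(e, t)")

query = "SELECT MAX(t), B, C FROM E WHERE e = ? AND t <= ?"

# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("G", 7)).fetchall()
plan_details = [row[-1] for row in plan]
print(plan_details)

# B and C are taken from the row that holds the maximum t.
result = conn.execute(query, ("G", 7)).fetchone()
print(result)
```

The plan detail should now name e_idx_e_t with both the e and t constraints, rather than only (e=?) as in the original plan.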

Efficient alphanumeric searching sparkR

I have a Spark data frame with 10 million rows, where each row holds an alphanumeric string that is the id of a user, for example:
602d38c9-7077-4ea1-bc8d-af5c965b4e85. My objective is to check whether another id, like aaad38c9-7087-4ef1-bc8d-af5c965b4e85, is present in the list of 10 million.
I want to do this efficiently and not search all 10 million records every single time a search happens. For example, can I sort my records alphabetically and ask SparkR to search only within the records that begin with "a", instead of the whole universe, to speed up the search and make it computationally efficient?
Any solution primarily using SparkR would be helpful; failing that, any Spark solution.
You can use rlike, which does a regex search within a dataframe column.
df.filter($"foo".rlike("regex"))
Or you can index the Spark dataframe into Solr, which should find your string within a few milliseconds.
https://github.com/lucidworks/spark-solr

SQLite sorting a column containing numbers or text

I have a column containing user-entered descriptions. These descriptions can be anything; however, I do need them sorted into a logical order.
The text can be anything like
16 to 26 months
40 to 60 months
Literacy
Mathematics
When I order these in an SQL statement, the text items come back fine. However, any beginning with numbers come back in an order that is not logical,
i.e.
16 to 26 months
will be before
8 to 20 months
I understand why, as it compares the first character and so on, but I don't know how to alter the SQL statement (using SQLite) to improve the ordering without messing up the entries beginning with text.
When I cast to numeric, the numbers are fine but the items beginning with text go wrong.
Thanks
What you need is to sort the values in "natural order". To achieve this you will need to implement your own collating sequence; SQLite doesn't provide one for this case.
There are some questions (and answers) on this topic here on SO, but they are for other RDBMSs. The best I could find in a quick search was this:
http://wiki.ozanh.com/doku.php?id=python:database:sqlite:how_to_natural_sort
You should think about improving your table schema, e.g. splitting the period into separate integer columns (monthsMin, monthsMax) instead of using text, which would make sorting much easier. You can always build a string from these values if necessary.
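A custom collating sequence of that kind might be sketched as follows, using Python's sqlite3 module, whose create_collation method registers a comparison function under a name. The collation name natsort, the table name entries, and the helper functions are all hypothetical choices for this example, and it is not taken from the linked page.

```python
import re
import sqlite3

def natural_key(text):
    """Split text into alternating text and digit runs; digit runs become ints."""
    parts = re.split(r"(\d+)", text.lower())
    # With a capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def natural_collation(a, b):
    """Return -1, 0 or 1, as SQLite expects from a collation callback."""
    ka, kb = natural_key(a), natural_key(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("natsort", natural_collation)
conn.execute("CREATE TABLE entries(description TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [("16 to 26 months",), ("Mathematics",), ("8 to 20 months",),
     ("Literacy",), ("40 to 60 months",)],
)

ordered = [
    row[0]
    for row in conn.execute(
        "SELECT description FROM entries ORDER BY description COLLATE natsort"
    )
]
print(ordered)
```

Entries that start with a number sort by numeric value and come before the plain-text entries, which keep their alphabetical order.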

HBase keyvalue (NOSQL) to Hive table (SQL)

I have some tables in Hive that I need to join together. I need to do some work on each of them (normalize the key, remove outliers, ...), and as I add more and more tables, this chaining process has turned into a big mess.
It is easy to lose track of where you are, and the query is getting out of control.
However, I have a pretty clear idea of what the final table should look like, and each column is fairly independent of the other tables.
For example:
table_class1
name id score
Alex 1 90
Chad 3 50
...
table_class2
name id score
Alexandar 1 50
Benjamin 2 100
...
In the end I really want something that looks like:
name id class1 class2 ...
alex 1 90 50
ben 2 100 NA
chad 3 50 NA
I know it could be done with a left outer join, but I am really having a hard time creating a separate table for each of them after the normalization and then left outer joining each of them against the union of the keys...
I am thinking about using NoSQL (HBase) and dumping the processed data in a key-value format, like:
(source, key, variable, value)
(table_class1, (alex, 1), class1, 90)
(table_class1, (chad, 3), class1, 50)
(table_class2, (alex, 1), class2, 50)
(table_class2, (benjamin, 2), class2, 100)
...
In the end, I want to use something like melt and cast from the R reshape package to turn that data back into a table.
This is a big data project, and there will be hundreds of millions of key-value pairs in HBase.
(1) I don't know if this is a legitimate approach.
(2) If so, is there any big data tool to pivot a long HBase table into a Hive table?
Honestly, I would love to help more, but I am not clear about what you're trying to achieve (maybe because I've never used R). Please elaborate and I'll try to improve my answer if necessary.
What do you need HBase for? You can store your processed data in new tables and work with them; you can even CREATE VIEW to simplify the query if it's too large. Maybe that's what you're looking for (HIVE manual). Unless you have a good reason for using HBase, I'd stick to just HIVE to avoid additional complexity. Don't get me wrong, there are a lot of valid reasons for using HBase.
About your second question: you can define and use HBase tables as HIVE tables, and you can even CREATE them and SELECT/INSERT into them, all inside HIVE. Is that what you're looking for? See the HBase/HIVE integration doc.
One last thing, in case you don't know: you can create custom functions in HIVE very easily to help you with the tedious normalization process. Take a look at this.
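For readers who, like me, have not used R's reshape package: the "cast" step the question asks about amounts to grouping the long records by key and spreading each variable into its own column. Below is a small in-memory Python sketch of that idea, using the sample records from the question. The function name cast_wide is hypothetical, and this is only an illustration of the reshaping logic, not a Hive or HBase tool.

```python
from collections import defaultdict

# Long-format records as described in the question: (source, key, variable, value).
records = [
    ("table_class1", ("alex", 1), "class1", 90),
    ("table_class1", ("chad", 3), "class1", 50),
    ("table_class2", ("alex", 1), "class2", 50),
    ("table_class2", ("benjamin", 2), "class2", 100),
]

def cast_wide(records, variables):
    """Pivot long records into one row per key, with one column per variable."""
    wide = defaultdict(dict)
    for _source, key, variable, value in records:
        wide[key][variable] = value
    # Sort by id (second element of the key); missing cells become None.
    return [
        key + tuple(cells.get(v) for v in variables)
        for key, cells in sorted(wide.items(), key=lambda kv: kv[0][1])
    ]

table = cast_wide(records, ["class1", "class2"])
for row in table:
    print(row)
```

At the scale described in the question, the same grouping would have to run inside the cluster rather than in a single Python process.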
