SQLite - store matrices in a table

I am quite new to SQLite and have a dilemma about database design. Suppose we have a number of matrices (of various sizes) that are going to be stored in a table. We can further assume that no matrix is sparse.
Let's say we have:
A = [[1, 4, 5],
     [8, 1, 4],
     [1, 1, 3]]
B = [['what', 'a', 'good', 'day'],
     ['for', 'a', 'walk', 'outside']]
C = [['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
     ['FFF', 'GGG', 'HHH', 'III', 'JJJ'],
     ['KKK', 'LLL', 'MMM', 'NNN', 'OOO']]
And D, which is N x M.
When we create the table we do not know all the sizes that the matrices will have, and I do not think it would be good to alter the table afterwards. What would be a recommended way to store the matrices so that they can be retrieved efficiently? I wish to query a matrix back row by row.
I am thinking of flattening each matrix into a list of cells that ends up in a table like this:
CREATE TABLE mat(id INT,
                 row INT,
                 col INT,
                 val TEXT)
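For example, matrix A's first row would then land in the table cell by cell; a sketch, assuming A is stored with id = 1:
INSERT INTO mat (id, row, col, val) VALUES (1, 1, 1, '1');
INSERT INTO mat (id, row, col, val) VALUES (1, 1, 2, '4');
INSERT INTO mat (id, row, col, val) VALUES (1, 1, 3, '5');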
How can I get them back line by line with a query in SQLite, so that matrix A comes out as:
[1, 4, 5]
[8, 1, 4]
[1, 1, 3]
Ideas? Or could someone kindly refer me to any similar problems?
---------------------- UPDATE ----------------------
Okay, my question was not clear enough. The above is probably the way I'm supposed to arrange the data in my database. I hope you can help me find a way to organize it.
Suppose we have some sets of data:
Compilation  User    BogoMips
1            Andrew  1.04
1            Klaus   1.78
1            James   1.99
1            David   2.09
...
1            Alex    4.71
Compilation  Time   Temperature  Colour
2            10:20  10           Blue
2            10:28  21           Green
2            10:42  25           Red
...
2            18:16  16           Green
Compilation  Colour  Distance
3            Blue    4
3            Green   9
...
3            Yellow  12
...And there will be many more sets of data with different numbers of columns and new headers. Some header names will recur in other sets. In advance, we have no idea what kinds of sets need to be stored. Every set has a common header, 'compilation', that binds them together.
How would you structure the data in a database?
I find it hard to believe that creating a new table for each set is a good solution, or is it?
My idea is to have two tables, headers and data.
CREATE TABLE headers (id INT,
                      header TEXT)
CREATE TABLE data (id INT,
                   compilation INT,
                   fk_header_id INT REFERENCES headers,
                   row INT,
                   col INT,
                   value TEXT)
So the populated tables look like this:
SELECT * FROM headers;
id  header
--  -----------
1   User
2   BogoMips
3   Time
4   Temperature
5   Colour
6   Distance
SELECT * FROM data;
id  compilation  fk_header_id  row  col  value
--  -----------  ------------  ---  ---  ------
1   1            1             1    1    Andrew
2   1            2             1    2    1.04
3   1            1             2    1    Klaus
4   1            2             2    2    1.78
...
.   2            3             1    1    10:20
.   2            4             1    2    10
.   2            5             1    3    Blue
.   2            3             2    1    10:28
.   2            4             2    2    21
.   2            5             2    3    Green
...
.   3            5             1    1    Blue
.   3            6             1    2    4
.   3            5             2    1    Green
.   3            6             2    2    9
...
and so on
The problem is that I don't know how to query the data sets back out of SQLite. Anyone (Tony) have an idea?

You'd need a pivot / crosstab query (or its join equivalent) to get the data out. E.g.:
SELECT c1.value AS col1, c2.value AS col2, c3.value AS col3
FROM data c1
INNER JOIN data c2 ON c2.col = 2 AND c2.compilation = c1.compilation AND c2.row = c1.row
INNER JOIN data c3 ON c3.col = 3 AND c3.compilation = c1.compilation AND c3.row = c1.row
WHERE c1.col = 1 AND c1.compilation = 1
ORDER BY c1.row;
As you can see, this is less than fun. In particular, with the above you'd have to know the number of columns in advance. Crosstab or pivot would relieve you of that in terms of the SQL, but you'd still have to mess about to read the data in from the query result.
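That said, for the simple mat table in the original question, group_concat() gets close to the desired line-by-line output; a sketch, assuming matrix A was stored with id = 1:
-- one output line per matrix row; note SQLite does not guarantee
-- group_concat() ordering, so the ordered subquery is best-effort
SELECT group_concat(val, ', ') AS line
FROM (SELECT row, col, val FROM mat WHERE id = 1 ORDER BY row, col)
GROUP BY row
ORDER BY row;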
I haven't seen anything in your question that indicates a need to extract a row or a column of a matrix, never mind a single cell, from the db.
My table would start as simply as
Compilation, Description, Matrix
Matrix would be some sort of serialisation of a matrix object: binary, XML, even some sort of string, e.g. 1,2,3|4,5,6|7,8,9.
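A sketch of that design (the table and column names, and the string format, are just illustrative):
CREATE TABLE matrices (compilation INT,
                       description TEXT,
                       matrix      TEXT);  -- rows split by '|', cells by ','
INSERT INTO matrices VALUES (1, 'matrix A', '1,4,5|8,1,4|1,1,3');
-- the application splits on '|' and ',' to rebuild the matrix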
If this was all I needed to store, I'd be looking at a NoSQL variant.

Related

Which tool should I use in Alteryx to find values and add a new column

I got stuck on this for a long time and couldn't find an answer elsewhere.
Below is my data:
Market  Start  Type (0 or 1)
A       1
A       2
A       4
A       6
A       10
A       2
B       2
B       4
B       6
B       8
B       4
B       9
C       1
C       4
C       7
C       3
C       9
C       11
C       12
And I want to complete the Type column based on following conditions:
If Market is A and Start is 1, 2, or 3, then Type is 1; otherwise 0
If Market is B and Start is 2, 4, or 5, then Type is 1; otherwise 0
If Market is C and Start is 4, 6, or 9, then Type is 1; otherwise 0
In Alteryx, I tried using the formula tool three times:
IIF([Market]="A" && [Start] in (1,2,3), "1", "0")
IIF([Market]="B" && [Start] in (2,4,5), "1", "0")
IIF([Market]="C" && [Start] in (4,6,9), "1", "0")
But the third IIF function overwrites the previous two. Are there any other tools in Alteryx that can do what I want to do? Or is there something wrong with my code?
Thanks in advance. Really appreciate it.
Your third expression evaluates to False and places a zero for any Market <> "C", overwriting the earlier results... try a single Formula tool with:
IF [Market]="A" THEN
    IIF([Start] in (1,2,3),"1","0")
ELSEIF [Market]="B" THEN
    IIF([Start] in (2,4,5),"1","0")
ELSEIF [Market]="C" THEN
    IIF([Start] in (4,6,9),"1","0")
ENDIF
This should eliminate overlap.

Writing Data into columns in a file (IDL)

I am trying to write some data into columns in IDL.
Let's say I am calculating "k" and "k**2"; then I would get:
1 1
2 4
3 9
. .
and so on.
If I write this into a file, it looks like this:
1 1 2 4 3 9 . .
My corresponding code looks like this:
pro programname
  openw, 1, "filename"
  ...
  ; calculating some values
  ...
  printf, 1, value1, value2, value3
  close, 1
end
best regards
You should probably read the IDL documentation on formatted output, to get all of the details.
I don't understand your "value1, value2, value3" in your printf. If I were going to do this, I would have two variables "k" and "k2". Then I would print using either a transpose or a for loop:
IDL> k = [1:10]
IDL> k2 = k^2
IDL> print, transpose([[k], [k2]])
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
IDL> for i=0,n_elements(k)-1 do print, k[i], k2[i]
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
By the way, if you are going to use Stack Overflow, you should start "accepting" the correct answers.
It sounds like you have two vectors: one with the single values and one with the squared values. Here's what you could do:
pro programname
  openw, f, "filename", /GET_LUN
  ...
  ; calculating some values for k
  ...
  k2 = k^2                                      ; square your original array

  ; Merge the vectors into a single array. Many ways to do this.
  k = strtrim(k, 2)                             ; convert to string format
  k2 = strtrim(k2, 2)                           ; convert to string format
  merge = strjoin(transpose([[k], [k2]]), ' ')  ; merge the arrays

  ; Now output your array to file the way you want to
  for i=0L, n_elements(k)-1 do printf, f, merge[i]

  ; Close the file and free the logical file unit
  free_lun, f
end

sqlite update a column with itself

I have a table like this:
a   b   c
--  --  --
1   1   10
2   1   0
3   1   0
4   4   20
5   4   0
6   4   0
The b column 'points' to a, a bit as if a were the parent.
c has been computed, and now I need to propagate each parent's c value to its children.
The result would be
a   b   c
--  --  --
1   1   10
2   1   10
3   1   10
4   4   20
5   4   20
6   4   20
I can't make an UPDATE/SELECT combo that works.
So far I have a SELECT that produces the c column I'd like to get:
select t1.c from t t1 join t t2 on t1.a=t2.b;
c
----------
10
10
10
20
20
20
But I don't know how to stuff that back into c.
Thanks in advance.
Cheers, phi
You have to look up the value with a correlated subquery:
UPDATE t
SET c = (SELECT c
         FROM t AS parent
         WHERE parent.a = t.b)
WHERE c = 0;
I finally found a way to copy my initial 'temp' SELECT JOIN back into table t. Something like this:
create temp table u as select t1.c from t t1 join t t2 on t1.a=t2.b;
update t set c=(select * from u where rowid=t.rowid);
I'd like to know how the two solutions compare performance-wise: yours is a single UPDATE with a correlated SELECT, while mine is two queries, each with one correlated lookup. Mine seems heavier and less aesthetic, yet performance-wise I wonder.
On the algorithmic side, yours takes care to copy only the child data, not the parent data; mine copies the parent onto itself, which is a no-op, yet still consumes some cycles :)
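A sketch of how this could actually be measured in the sqlite3 shell, using the .timer dot-command and EXPLAIN QUERY PLAN:
-- in the sqlite3 shell, print elapsed time for each statement:
.timer on
-- and ask the planner how the correlated UPDATE will run:
EXPLAIN QUERY PLAN
UPDATE t SET c = (SELECT c FROM t AS parent WHERE parent.a = t.b) WHERE c = 0;
-- then run each variant for real on a copy of the table and compare timings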
Cheers, Phi

R - data munging and scalable code [closed]

Hi,
in the last few days I have had a small/big problem.
I have a transaction dataset with 1 million rows and two columns (client id and product id), and I want to transform it into a binary matrix.
I used the reshape and spread functions, but in both cases I used 64mb of RAM and RStudio/R went down.
Because I only use 1 CPU, the process takes a lot of time.
My question is: what is the next step forward in this transition between small and big data? How can I use more CPUs?
I searched and found a couple of solutions, but I need an expert opinion:
1 - Using Spark R?
2 - H2O.ai solution? http://h2o.ai/product/enterprise-support/
3 - Revolution analytics? http://www.revolutionanalytics.com/big-data
4 - go to the cloud? like microsoft azure?
If needed, I can use a virtual machine with a lot of cores, but I need to know the smooth way to make this transition.
My specific problem
I have this data.frame (but with 1 million rows)
Sell<-data.frame(UserId = c(1,1,1,2,2,3,4), Code = c(111,12,333,12,111,2,3))
and I did:
Sell[,3] <-1
test<-spread(Sell, Code, V3)
This works with a little data set, but with 1 million rows it takes a long time (12 hours) and goes down because my maximum RAM is 64MB. Any suggestions?
You don't say what you want to do with the result, but the most efficient way to create such a matrix would be a sparse matrix.
What spread() produces is a dense matrix-like object that wastes a lot of RAM on all those NA values:
test
# UserId 2 3 12 111 333
#1 1 NA NA 1 1 1
#2 2 NA NA 1 1 NA
#3 3 1 NA NA NA NA
#4 4 NA 1 NA NA NA
You can avoid this with a sparse matrix, which internally is still basically a long-format structure, but has methods for matrix operations.
library(Matrix)
Sell[] <- lapply(Sell, factor)
test1 <- sparseMatrix(i = as.integer(Sell$UserId),
                      j = as.integer(Sell$Code),
                      x = rep(1, nrow(Sell)),
                      dimnames = list(levels(Sell$UserId),
                                      levels(Sell$Code)))
#4 x 5 sparse Matrix of class "dgCMatrix"
# 2 3 12 111 333
#1 . . 1 1 1
#2 . . 1 1 .
#3 1 . . . .
#4 . 1 . . .
You would need even less RAM with a logical sparse matrix:
test2 <- sparseMatrix(i = as.integer(Sell$UserId),
                      j = as.integer(Sell$Code),
                      x = rep(TRUE, nrow(Sell)),
                      dimnames = list(levels(Sell$UserId),
                                      levels(Sell$Code)))
#4 x 5 sparse Matrix of class "lgCMatrix"
# 2 3 12 111 333
#1 . . | | |
#2 . . | | .
#3 | . . . .
#4 . | . . .
I'm not sure this is a coding question... BUT...
The new Community Preview of SQL Server 2016 has R built in on the server, and you can download the preview to try here: https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016
Doing this will bring your R code to your data and run it on top of the SQL engine, allowing for the same sort of scalability you get built in with SQL.
Or you can stand up a VM in Azure by going to the new portal, selecting "New" > "Virtual Machine", and searching for "SQL".

filter sqlite query based on counts of pairwise interactions

I am trying to filter a somewhat involved sqlite3 query using a pairwise association table. Say I have these tables (where pet_id_x references an id in the pets table):
[pets]
id | name     | animal_types_id | <additional_info>
 1 | Spike    | 2               |
 2 | Fluffy   | 1               |
 3 | Whiskers | 1               |
 4 | Spot     | 2               |
 5 | Garth    | 2               |
 6 | Hamilton | 3               |
 7 | Dingus   | 1               |
 8 | Scales   | 3               |
...
[animal_types]
id | type
 1 | cat
 2 | dog
 3 | lizard
[successful_pairings]
pet_id_1 | pet_id_2
1        | 4
2        | 4
2        | 8
3        | 2
3        | 4
4        | 5
4        | 6
4        | 7
5        | 6
5        | 7
6        | 7
...
A toy example for my query would be to get the names of all dogs that meet certain constraints (from columns within the pets table) and have > 2 successful pairings with other dogs, resulting in:
name  | successful pairings
Spot  | 6
Garth | 3
As per the above, the total count for each id needs to combine appearances in both pet_id_1 and pet_id_2 of successful_pairings, as an id may sit in either column for a given pairing.
I am new to SQL syntax and am having trouble chaining queries together to filter on conditions distributed across multiple tables.
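A sketch of one shape such a query could take: normalise the two id columns with UNION ALL, then count per pet. Note this counts all of a dog's pairings, which is what matches the toy output above; restricting the partner to dogs as well would need a second join back through pets and animal_types.
-- combine appearances from both id columns, then count pairings per dog
WITH pairs AS (
    SELECT pet_id_1 AS pet_id FROM successful_pairings
    UNION ALL
    SELECT pet_id_2 FROM successful_pairings
)
SELECT p.name, COUNT(*) AS "successful pairings"
FROM pairs pr
JOIN pets p ON p.id = pr.pet_id
JOIN animal_types t ON t.id = p.animal_types_id
WHERE t.type = 'dog'              -- plus any other constraints on pets columns
GROUP BY p.id, p.name
HAVING COUNT(*) > 2;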
