I am using 32-bit Windows, so the most recent version of MariaDB I can use is 5.2. My question is: how do I populate a calculated column with the current row number of each record?
As an example, I can create a calculated column with this:
UPDATE myTable SET myCalculatedColumn = some_other_column/2;
But how do I get the row number?
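MariaDB 5.2 has no ROW_NUMBER() window function, so one common workaround is a session variable that is incremented as the UPDATE walks the rows. A minimal sketch, assuming an ordering column id (an illustrative name; adjust the ORDER BY to your own ordering):
-- Session-variable row numbering (no ROW_NUMBER() before MariaDB 10.2).
-- `id` is an assumed ordering column; replace it with your own.
SET @rownum := 0;
UPDATE myTable
SET myCalculatedColumn = (@rownum := @rownum + 1)
ORDER BY id;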
I am running a query which takes 2 seconds (much too long!) and returns 16 records:
SELECT * FROM cells WHERE cellbookguid='1' AND cellpageguid='1';
The table named "cells" holds around 20.000 records.
It has 42 columns such as "cellwidth, cellheight, cellcolor, cellbackground, cellpageguid, cellbookguid, etc.".
Because I don't understand why it takes so long, I run EXPLAIN QUERY.
It says:
order: 0
from: 0
detail: TABLE cells WITH INDEX idx_cells_cellbookguid
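(That plan comes from running EXPLAIN QUERY PLAN against the SELECT above, along the lines of:)
EXPLAIN QUERY PLAN
SELECT * FROM cells WHERE cellbookguid='1' AND cellpageguid='1';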
So obviously it does not use the index idx_cells_cellpageguid.
This must be why it takes so long.
However, this index is present:
SELECT name FROM sqlite_master WHERE type = 'index';
It does contain "idx_cells_cellpageguid".
I have also used an external SQLite viewer. This index is present on the column.
Why does the SQLite query not use this index?
This is the SQL command that defines idx_cells_cellpageguid:
CREATE INDEX idx_cells_cellpageguid
ON cells
(CellPageGUID COLLATE BINARY)
Thank you!
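A likely explanation: SQLite normally uses at most one index per table for a query like this, so with two separate single-column indexes it has to choose between cellbookguid and cellpageguid. If the chosen index matches many rows, a composite index covering both WHERE columns is the usual remedy; a minimal sketch (the index name is just illustrative):
-- Composite index so both equality filters are satisfied from one index.
CREATE INDEX idx_cells_book_page
ON cells (CellBookGUID, CellPageGUID);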
This question already has answers here:
Overwrite only some partitions in a partitioned spark Dataset
(3 answers)
Closed 4 years ago.
I'm using the spark_write_table function from sparklyr to write tables into HDFS, using the partition_by parameter to define how to store them:
R> my_table %>%
spark_write_table(.,
path="mytable",
mode="append",
partition_by=c("col1", "col2")
)
However, now I want to update the table by altering just one partition, instead of writing the whole table again.
In Hadoop-SQL I would do something like:
INSERT INTO TABLE mytable
PARTITION (col1 = 'my_partition')
VALUES (myvalues..)
Is there an equivalent way to do this in sparklyr? I cannot find it in the documentation.
Regarding the duplicate note: this question is specifically about how to do this in R with the sparklyr function, while the other question is about general Hive syntax.
Thanks all for the comments.
It seems there is no way to do this with sparklyr directly, but this is what I am going to do.
In short, I'll save the new partition in a temporary table, use a Hadoop SQL command to drop the partition, then another SQL command to insert the contents of the temporary table into it.
> dbGetQuery(con,
"ALTER TABLE mytable DROP IF EXISTS PARTITION (mycol='partition1');")
> spark_write_table(new_partition, "tmp_partition_table")
> dbGetQuery(con,
"INSERT VALUES INTO TABLE mytable
PARTITION (mycol='partition1')
SELECT *
FROM tmp_partition_table "
)
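A possible simplification, untested here: Hive's INSERT OVERWRITE replaces only the named partition, so the explicit drop plus insert above might collapse into a single statement, assuming tmp_partition_table holds just the non-partition columns:
-- Overwrite only the target partition in one step (assumption: the temporary
-- table contains the non-partition columns only).
INSERT OVERWRITE TABLE mytable
PARTITION (mycol = 'partition1')
SELECT *
FROM tmp_partition_table;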
I am using the RODBC package, which allows me to connect to SQL from R.
As an example of my problem, I have a table [Sales] in SQL with 3 columns (Alpha, Beta, BetaDistribution):
Alpha  Beta  BetaDistribution
1.50   77    x
2.99   53    x
4.50   122   x
Note that the 3rd column (BetaDistribution) is not populated, and this needs to be populated using a Statistical R Function.
I have assigned my table to the variable select:
select <- sqlQuery(dbhandle, 'select * from dbo.sales')
How do I run a loop to update my SQL table so that the BetaDistribution column is populated with the calculated beta distribution, pbeta(alpha, beta)?
Something like this. Basically you make a temp table and then update the existing table. There's a reasonable chance you need to tweak that update statement since I, obviously, can't test it.
select$BetaDistribution <- yourfunc(select$Alpha, select$Beta)   # e.g. your pbeta()-based calculation
sqlSave(dbhandle, select, tablename="dbo.salestemp", rownames=FALSE, varTypes=c(Alpha="decimal(10,4)", Beta="decimal(10,4)", BetaDistribution="decimal(10,4)"))
sqlQuery(dbhandle, "update dbo.sales
set sales.BetaDistribution=salestemp.BetaDistribution
from dbo.sales
inner join
salestemp
on
sales.Alpha=salestemp.Alpha and
sales.Beta=salestemp.Beta")
sqlQuery(dbhandle, "drop table salestemp")
I am using R in combination with SQLite via RSQLite to persist my data, since I did not have sufficient RAM to constantly keep all columns in memory and calculate with them. I have added an empty column to the SQLite database using:
dbGetQuery(db, "alter table test_table add column newcol real")
Now I want to fill this column using data I calculated in R and which is stored in my data.table column dtab$newcol. I have tried the following approach:
dbGetQuery(db, "update test_table set newcol = ? where id = ?", bind.data = data.frame(transactions$sum_year, transactions$id))
Unfortunately, R seems to be doing something, but it is not using any CPU time or allocating RAM. The database does not change size, and even after 24 hours nothing has changed. Therefore, I assume it has crashed, without any output.
Am I using the update statement wrong? Is there an alternative way of doing this?
UPDATE
I have also tried the RSQLite functions dbSendQuery and dbGetPreparedQuery, both with the same result. However, what does work is updating a single row without using bind.data. A loop to update the column therefore seems possible, but I will have to evaluate the performance since the dataset is huge.
As mentioned by @jangorecki, the problem had to do with SQLite performance. I disabled synchronous and set journal_mode to OFF (which has to be done for every session).
dbGetQuery(transDB, "PRAGMA synchronous = OFF")
dbGetQuery(transDB, "PRAGMA journal_mode = OFF")
Also, I changed my RSQLite code to use dbBegin(), dbSendPreparedQuery() and dbCommit(). It takes a while, but at least it works now and has acceptable performance.
I have an SQLite table that contains a BLOB I need to do a size/length check on. How do I do that?
According to the documentation, length(blob) only works on text and will stop counting at the first NULL. My tests confirmed this. I'm using SQLite 3.4.2.
I haven't had this problem, but you could try length(hex(blob))/2
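Spelled out as a full statement (myblob and mytable are illustrative names, matching the example further down):
-- hex() walks the whole blob and emits two hex digits per byte,
-- so dividing the resulting text length by 2 gives the blob size in bytes.
SELECT length(hex(myblob)) / 2 AS blob_bytes
FROM mytable;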
Update (Aug-2012):
For SQLite 3.7.6 (released April 12, 2011) and later, length(blob_column) works as expected with both text and binary data.
For me, length(blob) works just fine and gives the same result as the other approach.
As an additional answer: a common problem is that SQLite effectively ignores the declared column type of a table, so if you store a string in a blob column, it becomes a string value for that row. Since length works differently on strings, it will then only return the number of characters before the first 0 octet. It is easy to end up with strings in blob columns, because you normally have to cast explicitly to insert a blob:
insert into table values ('xxxx');                -- string insert
insert into table values (cast('xxxx' as blob));  -- blob insert
To get the correct length for values stored as strings, you can cast the argument of length to blob:
select length(string-value-from-blob-column);  -- treats the blob column as a string
select length(cast(blob-column as blob));      -- correctly returns the blob length
The reason why length(hex(blob-column))/2 works is that hex doesn't stop at internal 0 octets, and the generated hex string doesn't contain 0 octets anymore, so length returns the correct (full) length.
Example of a select query that does this, getting the length of the blob in column myblob, in table mytable, in row 3:
select length(myblob) from mytable where rowid=3;
The LENGTH() function in SQLite 3.7.13 on Debian 7 does not work for me, but LENGTH(HEX())/2 works fine.
# sqlite --version
3.7.13 2012-06-11 02:05:22 f5b5a13f7394dc143aa136f1d4faba6839eaa6dc
# sqlite xxx.db "SELECT docid, LENGTH(doccontent), LENGTH(HEX(doccontent))/2 AS b FROM cr_doc LIMIT 10;"
1|6|77824
2|5|176251
3|5|176251
4|6|39936
5|6|43520
6|494|101447
7|6|41472
8|6|61440
9|6|41984
10|6|41472