Counting words in a sqlite FTS4

I have a sqlite3 full text search table defined like this:
CREATE VIRTUAL TABLE entries USING fts4 ( entry TEXT )
Each entry row holds a line of text. How can I write a query to count the total number of words in the table? Thanks

I don't know of a built-in function to do that, but you could re-use the answer to
"Query to count words SQLite 3" to get the total count of words:
select sum(length(trim(entry))
- length(replace(trim(entry), ' ', '')) + 1) from entries;
(Modified the original answer by adding the trim.)
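As a quick sanity check of the formula (an illustrative one-off query, not from the original answer):
select length(trim(' hello world '))
- length(replace(trim(' hello world '), ' ', '')) + 1;
Here trim gives 'hello world' (length 11) and removing the space gives 'helloworld' (length 10), so the result is 11 - 10 + 1 = 2 words. Note that this approach assumes words are separated by single spaces; runs of multiple spaces between words would inflate the count.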
If you have sqlite3 version 3.7.6 or later, you can do something cleaner with an fts4aux table.
create virtual table terms using fts4aux(entries);
select count(distinct term) from terms;
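Note that count(distinct term) gives the number of unique words rather than the total number of words. The fts4aux table also exposes an occurrences column, and rows with col = '*' aggregate across all columns, so something like this should give the total word count (a sketch based on the documented fts4aux schema):
select sum(occurrences) from terms where col = '*';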

Related

SQLite count number of occurrences of a word in a string

I want to count the number of occurrences of a word in a string. For example, given:
[{"lastUpdatedDateTime":{"timestamp":1.54867752522E12},"messageStatus":"DELIVERED","phoneNumber":"+916000060000"},{"lastUpdatedDateTime":{"timestamp":1548677525220},"messageStatus":"DELIVERED","phoneNumber":"+916000060000"}]
In the above string I want to count the number of occurrences of the word 'DELIVERED'; here it is 2, so I want to get the result 2. Please help me with this. I have to use only a SQL query to achieve it.
Thanks in advance.
If your table's name is tablea and the column's name is col:
SELECT
  (LENGTH(col) - LENGTH(REPLACE(col, '"DELIVERED"', '')))
  / LENGTH('"DELIVERED"') AS counter
FROM tablea
Remove every occurrence of "DELIVERED", subtract the length of the resulting string from the length of the original string, and finally divide the result by the length of "DELIVERED".
Assuming your data is in a table something like:
CREATE TABLE example(json TEXT);
INSERT INTO example VALUES('[{"lastUpdatedDateTime":{"timestamp":1.54867752522E12},"messageStatus":"DELIVERED","phoneNumber":"+916000060000"},{"lastUpdatedDateTime":{"timestamp":1548677525220},"messageStatus":"DELIVERED","phoneNumber":"+916000060000"}]');
and your instance of sqlite has the JSON1 extension enabled:
SELECT count(*) AS "Number Delivered"
FROM example AS e
JOIN json_each(e.json) AS j
WHERE json_extract(j.value, '$.messageStatus') = 'DELIVERED';
gives you:
Number Delivered
----------------
2
This will return the total number of matching entries from all rows in the table as a single value. If you want one result per row instead, it's an easy change but the exact details depend on your table definition. Adding GROUP BY e.rowid to the end of the query will work in most cases, though.
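As a sketch against the example table above (the same query with the suggested GROUP BY appended):
SELECT e.rowid, count(*) AS "Number Delivered"
FROM example AS e
JOIN json_each(e.json) AS j
WHERE json_extract(j.value, '$.messageStatus') = 'DELIVERED'
GROUP BY e.rowid;
Rows with no matching entries will not appear in the result, since the WHERE clause filters them out before grouping.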
In the long run it's probably a better idea to store each object in the array as a single row in a table, broken up into the appropriate columns.

Adding value to existing database table in RSQLite

I am new to RSQLite.
I have an input document in text format in which values are separated by '|'.
I created a table with the required variables (dummy code as follows):
db <- dbConnect(SQLite(), dbname = "test.sqlite")
dbSendQuery(conn = db,
  "CREATE TABLE TABLE1(
    MARKS INTEGER,
    ROLLNUM INTEGER,
    NAME CHAR(25),
    DATED DATE)"
)
However, I am stuck at how to import values into the created table.
I cannot use the INSERT INTO ... VALUES command, as there are thousands of rows and more than 20 columns in the original data file, and it is impossible to manually type in each data point.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The whole point of scripting is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly loaded a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep='|') (modify the arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows, or just more than one. Either way, you can send in your data frame. dbSendQuery has two requirements: 1) each vector must have the same number of entries (not an issue when providing a data.frame), and 2) you may only submit the same number of vectors as you have placeholders.
I assume your data frame df contains the columns mark, roll, and name, corresponding to the columns of the table. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time, because after each insert, data is written to file and indices are updated. Instead, enclose it in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will save the data to a journal file, and only save the result when you execute dbCommit(db). Try both methods and compare the speed!
2: Ah, yes. The second way. This can be done in SQLite entirely.
With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply do an INSERT INTO ... SELECT ...; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ...;, as in the sketch below.
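For example (a sketch; the file and staging table names are hypothetical, and it assumes the columns in the text file appear in the same order as in TABLE1 — note that when the target of .import does not already exist, sqlite3 creates it and treats the file's first line as column names):
sqlite3 test.sqlite
.separator |
.import textfile.txt temp_import
INSERT INTO TABLE1 SELECT * FROM temp_import;
DROP TABLE temp_import;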
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbExecute(db,
  "CREATE TABLE TABLE1(
    MARKS INTEGER,
    ROLLNUM INTEGER,
    NAME TEXT
  )"
)
df <- data.frame(MARKS = sample(1:100, 10),
                 ROLLNUM = sample(1:100, 10),
                 NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe-delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.

SQLite GROUP BY

I am using DB Browser for SQLite to extract some interesting data from my database but I encountered one big problem with GROUP BY statement.
Even the most basic SELECT I can imagine is not working properly.
(Filename nvarchar(2147483647))
SELECT FileName FROM TableName WHERE FileName LIKE '%Nieminen%' GROUP BY FileName gives 5 rows even though I know that there are 9 distinct FileNames containing the phrase 'Nieminen' (I've browsed it).
Can it be that GROUP BY in sqlite compares only the first N (e.g. 10) characters? From my observations it might be true...
Any clues?
Why don't you try out the following:
SELECT DISTINCT FileName FROM TableName WHERE FileName LIKE '%Nieminen%'
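If the counts still disagree, a quick check against an explicit count may help (same table and filter as above):
SELECT COUNT(DISTINCT FileName) FROM TableName WHERE FileName LIKE '%Nieminen%';
If this also returns 5, the column probably has a case-insensitive collation (e.g. NOCASE), under which values that look different when browsing compare as equal in GROUP BY and DISTINCT.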

Finding out the size of a Netezza table using UNIX SAS

What syntax / tables can be used to determine the size (Gbs) of a Netezza table? I am accessing via UNIX SAS (either ODBC or libname engine). I assume there is a view which will give this info?
So you're interested in two system views _v_obj_relation_xdb and _v_sys_object_dslice_info. The first (_v_obj_relation_xdb) contains the table information (name, type, etc.) and the second (_v_sys_object_dslice_info) contains the size per disk information. You probably want to take a look at both of those tables to get a good idea of what you're really after, but the simple query would be:
select objname, sum(used_bytes) size_in_bytes
from _V_OBJ_RELATION_XDB
join _V_SYS_OBJECT_DSLICE_INFO on (objid = tblid)
where objname = 'UPPERCASE_TABLE_NAME'
group by objname
This returns the size of the table in bytes and I'll leave the conversion to GB as an exercise to the reader. There are some other interesting fields there so you might want to check out those views.
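For example, the conversion can be folded into the same query using pow(1024,3), as the next answer also does:
select objname, sum(used_bytes)/pow(1024,3) as size_in_gb
from _V_OBJ_RELATION_XDB
join _V_SYS_OBJECT_DSLICE_INFO on (objid = tblid)
where objname = 'UPPERCASE_TABLE_NAME'
group by objname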
You could also use _v_sys_object_storage_size:
select b.objid
,b.database as db
,lower(b.objname) as tbl_nm
,lower(b.owner) as owner
,b.objtype
,d.used_bytes/pow(1024,3) as used_gb
,d.skew
,cast(b.createdate as timestamp) as createdate_ts
,cast(b.objmodified as timestamp) as objmodified_ts
from _v_obj_relation_xdb b inner join
_v_sys_object_storage_size d
on b.objid=d.tblid
and lower(b.objname) = 'table name'
The size on disk (used_bytes) represents compressed data and includes storage for any deleted rows in the table.
The table rowcount statistic (reltuples) is generally very accurate, but it is just a statistic and not guaranteed to match the "select count(*)" table rowcount.
You can get this information via a catalog query:
select tablename, reltuples, used_bytes from _v_table_only_storage_stat where tablename = ^FOOBAR^;

What is the best technique to search text for a list of words

I have a table containing two columns, id and word.
The word column may contain one, two, or three words, e.g. (computer, computer software, computer software computer).
I want to search a text to see if it contains any word from that table.
Thank you.
If it is a small amount of text, you can use "like" with "%", e.g. "select * from tableX where word like '%computer%'"
Change the keyword list into a table and JOIN with LIKE.
Optionally:
- COUNT should match the number of search terms
- COUNT can also be used for ranking
Better still, for larger tables, use full text search.
Like this:
SELECT ID
FROM Mytable M
JOIN SearchTable S ON M.Word LIKE '%' + S.SearchWord + '%'
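Note that + string concatenation is SQL Server syntax; in SQLite the concatenation operator is ||, so the join condition would read:
JOIN SearchTable S ON M.Word LIKE '%' || S.SearchWord || '%'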
