How can I get a substring in SQLite?

I have a column with values like
'/abc/def/ghi/w1.xyz'
'/jkl/mno/r.stuv'
(It's path data, and the number of '/'s in each value is not fixed.)
How can I get a derived column with values like
'/abc/def/ghi/'
'/jkl/mno/'
(That is, keeping only the directory part and removing the file part.)
I read about substr(X,Y), substr(X,Y,Z) and instr(X,Y),
but they are not easy to apply here: the number of '/'s in each value is not fixed, and instr(X,Y) only finds the first occurrence from the left.

With a recursive CTE:
create table tablename(col text);
insert into tablename(col) values
('/abc/def/ghi/w1.xyz'),
('/jkl/mno/r.stuv');
with recursive cte(col, pos, rest) as (
  select col, instr(col, '/') pos, substr(col, instr(col, '/') + 1) rest
  from tablename
  union all
  select col, instr(rest, '/'), substr(rest, instr(rest, '/') + 1)
  from cte
  where instr(rest, '/') > 0
)
select col, substr(col, 1, sum(pos)) path
from cte
group by col;
See the demo.
Results:
| col                 | path          |
| ------------------- | ------------- |
| /abc/def/ghi/w1.xyz | /abc/def/ghi/ |
| /jkl/mno/r.stuv     | /jkl/mno/     |
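For comparison, here is a non-recursive alternative that is not part of the answer above. It relies on SQLite's two-argument rtrim(X, Y), which strips from the right end of X every character that appears in Y. Passing the value with its slashes removed as Y trims everything after the last '/'. This is only a minimal sketch using Python's built-in sqlite3 module with an in-memory database, assuming the same tablename/col schema as above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute("create table tablename(col text)")
conn.executemany(
    "insert into tablename(col) values (?)",
    [("/abc/def/ghi/w1.xyz",), ("/jkl/mno/r.stuv",)],
)

# replace(col, '/', '') yields every non-slash character of the value;
# rtrim() then removes trailing characters from that set and stops at
# the last '/', which is not in the set.
rows = conn.execute(
    "select col, rtrim(col, replace(col, '/', '')) as path "
    "from tablename order by col"
).fetchall()

for col, path in rows:
    print(col, "->", path)
```

With this approach, a value that contains no '/' at all comes back as an empty string, which may or may not be what you want.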

If you are in a Unix/Linux environment, you could post-process the query output on the command line. Let's take an example.
Let's say test.db was your SQLite3 database with a table like so:
create table test (dataset text);
insert into test (dataset) values ('/abc/def/ghi/w1.xyz');
insert into test (dataset) values ('/jkl/mno/r.stuv');
From command line, you can run:
sqlite3 test.db -batch -noheader "select dataset from test"
That'll give you this as the output (I know that's not the output you want, but just read on):
/abc/def/ghi/w1.xyz
/jkl/mno/r.stuv
Explanation
the -batch and -noheader switches suppress all output except for the data resulting from the SQL statement
sqlite3 test.db -batch -noheader "sql statement" runs the SQL statement you provided and the output is dumped on the screen (stdout)
Solution
Now, we'll use awk with it to get what you want like so:
sqlite3 test.db -batch -noheader "select dataset from test" | awk -F'/' 'BEGIN{OFS="/"} {$NF = ""; print $0}'
Result will be:
/abc/def/ghi/
/jkl/mno/
Crude explanation
awk processes every line output by the sqlite3 command
each line is split into fields on the / character by the -F'/' switch
since we want the output to keep the delimiters, we set the output field separator to / as well using OFS="/"
assigning an empty string to the last field ($NF = "") blanks out the file name and makes awk rebuild the line from its fields
when the rebuilt line is printed, OFS puts / between the fields, including before the now-empty last field, which is why the trailing / survives
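If you would rather keep the post-processing in a script than in a shell pipeline, the same idea (query first, then cut each value after its last '/') can be sketched in Python. This is an illustration only: the in-memory database stands in for test.db, and directory_part is a hypothetical helper name, not something from the answer above.

```python
import sqlite3


def directory_part(path: str) -> str:
    """Return everything up to and including the last '/' in path."""
    # rfind() returns -1 when there is no '/', which gives an empty string.
    return path[: path.rfind("/") + 1]


conn = sqlite3.connect(":memory:")  # stand-in for test.db
conn.execute("create table test (dataset text)")
conn.executemany(
    "insert into test (dataset) values (?)",
    [("/abc/def/ghi/w1.xyz",), ("/jkl/mno/r.stuv",)],
)

dirs = [
    directory_part(dataset)
    for (dataset,) in conn.execute("select dataset from test order by rowid")
]
for d in dirs:
    print(d)
```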

Related

Replacing variable values in defined string through awk / xargs

We are dynamically generating a string in bash to insert data in oracle database. The string is like
> echo $str1
insert into tbl select '$jobid','$1','$2','$3','$sdate' from dual ;
Here the variables $1, $2, ... are dynamic and can go up to 10.
We have data in a file with the same number of ':'-separated columns as there are numeric variables ($1, $2, ...) in the string above.
The challenge is to replace $1 with the first column of data, $2 with the second column, and so on. This needs to be done for every row of the dataset, and a separate file needs to be generated that uses the "insert" string as a base with the data from the file substituted in.
For example, the sample data is
cat test.dat
ONLINE:odr1_redo_06a.log:NO
ONLINE:odr1_redo_06b.log:NO
ONLINE:odr1_redo_05a.log:NO
and the string is
echo $str1
insert into tbl select '$jobid','$1','$2','$3','$sdate' from dual ;
Required output should be
insert into tbl select '$jobid','ONLINE','odr1_redo_06a.log','NO','$sdate' from dual ;
insert into tbl select '$jobid','ONLINE','odr1_redo_06b.log','NO','$sdate' from dual ;
insert into tbl select '$jobid','ONLINE','odr1_redo_05a.log','NO','$sdate' from dual ;
I tried passing the string to awk as an external variable, with no luck:
cat test.dat | awk -F: -v var="$str1" '{print var}'
insert into tbl select '$jobid','$1','$2','$3','$sdate' from dual ;
insert into tbl select '$jobid','$1','$2','$3','$sdate' from dual ;
insert into tbl select '$jobid','$1','$2','$3','$sdate' from dual ;
or with xargs:
sed 's/:/ /g' test.dat | xargs -n3 bash -c "echo $str1"
insert into tbl select $jobid,$1,$2,$3,$sdate from dual
insert into tbl select $jobid,$1,$2,$3,$sdate from dual
insert into tbl select $jobid,$1,$2,$3,$sdate from dual
Writing a small loop and processing the file line by line adds overhead, so I would prefer not to do that. Any ideas how this can be done efficiently?
With Awk, for each record, use the gsub function to replace every literal $n in your template with the value of the nth field, then print the result.
awk -F: -v tmpl="$str1" '{
    out = tmpl                      # start from a fresh copy of the template
    for (i = 1; i <= NF; i++)
        gsub(("\\$" i), $i, out)    # replace every literal $i with field i
    print out
}' file
Proof of concept:
$ str1="insert into tbl select '\$jobid','\$1','\$2','\$3','\$sdate' from dual ;"
$
$ awk -F: -v tmpl="$str1" '{
> out = tmpl
> for (i=1; i<=NF; i++)
> gsub(("\\$" i), $i, out)
> print out
> }' file
insert into tbl select '$jobid','ONLINE','odr1_redo_06a.log','NO','$sdate' from dual ;
insert into tbl select '$jobid','ONLINE','odr1_redo_06b.log','NO','$sdate' from dual ;
insert into tbl select '$jobid','ONLINE','odr1_redo_05a.log','NO','$sdate' from dual ;
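The same substitution can be sketched in Python for anyone who wants to see the logic outside awk. The function name fill_template and the sep parameter are hypothetical, chosen only for this illustration. One difference worth noting: a literal `$1` pattern also matches the first two characters of `$10`, so once the template goes up to ten placeholders it is safer to match the whole number, as the regular expression below does.

```python
import re


def fill_template(template: str, line: str, sep: str = ":") -> str:
    """Replace each literal $n in template with the n-th field of line."""
    fields = line.rstrip("\n").split(sep)

    def substitute(match):
        n = int(match.group(1))
        # Leave the placeholder untouched if the line has no such field.
        return fields[n - 1] if 1 <= n <= len(fields) else match.group(0)

    # \$(\d+) consumes all the digits, so $10 is never read as $1 plus "0".
    # Named placeholders such as $jobid do not match and are left alone.
    return re.sub(r"\$(\d+)", substitute, template)


template = "insert into tbl select '$jobid','$1','$2','$3','$sdate' from dual ;"
data = [
    "ONLINE:odr1_redo_06a.log:NO",
    "ONLINE:odr1_redo_06b.log:NO",
    "ONLINE:odr1_redo_05a.log:NO",
]
statements = [fill_template(template, line) for line in data]
for statement in statements:
    print(statement)
```

Using a replacement function rather than a replacement string also means that characters such as backslashes in the data are inserted literally.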

SQLite - Extract substring between delimiters for REPLACE function

I have a column field: location. I need to extract the string between the first and second delimiter ('/').
I already have a column, name, which I produce by trimming location on the left up to the first '/'. I've tried to build a similar query for my source column with a combination of rtrim, replace and substr, to no avail. Here is what my data looks like. I want to extract AML, for example. Right now there are only three possible values (value1, value2, value3) between the first and second delimiters, but there could be more later.
Attribute data
----------+--------------------------------------------------------------------------------------------------------------------
Field | First value
----------+--------------------------------------------------------------------------------------------------------------------
location | './AML/Counties/*****************kyaml_20190416_transparent_mosaic_group1.tif'
name | 'kyaml_20190416_transparent_mosaic_group1.tif'
----------+--------------------------------------------------------------------------------------------------------------------
What is the best way of creating my column source with the value from location?
Output should be like this:
Attribute data
----------+--------------------------------------------------------------------------------------------------------------------
Field | First value
----------+--------------------------------------------------------------------------------------------------------------------
location | './AML/Counties/****************kyaml_20190416_transparent_mosaic_group1.tif'
name | 'kyaml_20190416_transparent_mosaic_group1.tif'
source | 'AML'
----------+--------------------------------------------------------------------------------------------------------------------
With substr() and instr():
select *,
  substr(
    substr(location, instr(location, '/') + 1),
    1,
    instr(substr(location, instr(location, '/') + 1), '/') - 1
  ) as source
from data
See the demo.
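To see how the nested substr()/instr() calls behave, here is a small sketch that runs the same expression through Python's built-in sqlite3 module. The table layout and the sample location value are assumptions made for this illustration (the masked part of the real path is simply left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table data (location text, name text)")
conn.execute(
    "insert into data values (?, ?)",
    (
        "./AML/Counties/kyaml_20190416_transparent_mosaic_group1.tif",  # hypothetical path
        "kyaml_20190416_transparent_mosaic_group1.tif",
    ),
)

# The inner substr() drops everything up to and including the first '/';
# the outer substr() keeps what precedes the next '/'.
query = """
select substr(
         substr(location, instr(location, '/') + 1),
         1,
         instr(substr(location, instr(location, '/') + 1), '/') - 1
       ) as source
from data
"""
(source,) = conn.execute(query).fetchone()
print(source)
```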
I adapted forpas's query. Here is my final query:
ogrinfo box_tiles.shp -dialect SQLITE -sql \
"UPDATE box_tiles SET source = \
substr(\
substr(location, instr(location, '/') + 1), 1, \
instr(substr(location, instr(location, '/') + 1), '/') - 1)"

Select Case, when no data return

Is it possible to use SELECT CASE, decode, nvl or another query function to check whether a select query returns a value or returns nothing?
For example, I have this:
Record | type | id_customer
-------+--------+-------------
1 | T | cus1
2 | A | cus2
3 | T | cus3
4 | | cus4
If I do this:
select decode(type,'T','Main','A','Aditional','none') from table where record=1;
I get Main.
If I do this:
select decode(type,'T','Main','A','Aditional','none') from table where record=4;
I get none.
But if I do this:
select decode(type,'T','Main','A','Aditional','none') from table where record=5;
I get nothing, which is logical. So I need to get the decode value when the row exists and a fixed text when the row does not exist.
So I tried SELECT CASE, but it is not possible to get a value using COUNT. For example, like this:
SELECT
CASE
WHEN count(1)>0 THEN decode(type,'T','Main','A','Aditional','none')
ELSE '-'
END
FROM TABLE WHERE record=5;
I want this to return ' - ', or, if the record is 2, to return 'Aditional'.
Thanks a lot.
You can wrap the expression in an aggregate function such as min or max:
select max(decode(type,'T','Main','A','Aditional','none'))
from table
where record=5;
If the query matches one row, you get the value for that row. If it matches no rows, the aggregate still returns a single row containing NULL.
You can then replace the NULL using nvl:
select nvl(max(decode(type,'T','Main','A','Aditional','none')), ' - ')
from table
where record=5;
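The behaviour is easy to check outside Oracle. The sketch below is an assumption-laden analogue, not the answer's exact query: it uses SQLite through Python's sqlite3 module, where a CASE expression stands in for decode and coalesce stands in for nvl, and the table is named records (a hypothetical name) because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table records (record integer, type text, id_customer text)")
conn.executemany(
    "insert into records values (?, ?, ?)",
    [(1, "T", "cus1"), (2, "A", "cus2"), (3, "T", "cus3"), (4, None, "cus4")],
)

# CASE replaces Oracle's decode(); coalesce() replaces nvl().
QUERY = """
select coalesce(
         max(case type when 'T' then 'Main'
                       when 'A' then 'Aditional'
                       else 'none' end),
         ' - ')
from records
where record = ?
"""


def label_for(record):
    # An aggregate without GROUP BY always yields exactly one row,
    # so fetchone() never returns None here.
    return conn.execute(QUERY, (record,)).fetchone()[0]


labels = {r: label_for(r) for r in (1, 2, 4, 5)}
print(labels)
```

Record 4 exists with an empty type, so it falls through to 'none', while record 5 does not exist and gets the ' - ' placeholder.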
EDIT
Also, if you need to choose one string from several:
select decode(max(decode(type,'T', 2, 'A', 1, 0)), 0, 'none', 1, 'Additional', 2, 'Main', null, ' - ')
from table
where record=5;
This is an option:
select decode(type,'T','Main','A','Aditional','none')
from table
where record = 5
union all
select '-'
from dual
where not exists (select 1 from table where record = 5);
It selects the rows with record = 5 and unions them with '-' if no row exists with record = 5. Check out this Fiddle.

Recursively add to a data table in SAS

I am new to SAS. I need to run a number of iterations to populate my dataset called MYRS.
Each iteration needs to JOIN TABLE1 with (TABLE2 + MYRS), MINUS the records which are already in the MYRS table.
Then I need to update the MYRS table with the additional matches. The goal is to track a chain of emails.
MYRS is essentially a copy of TABLE1 that contains only the matching records. It's kind of tricky. (The schema is simplified.) TABLE1 can have duplicates.
For example
TABLE1:
ID | EMAIL1 | EMAIL2 | EMAIL3 | EMAIL4|
1 | A | s | d | F
2 | g | F | j | L
3 | z | x | L | v
4 | z | x | L | v
2 | g | F | j | L
TABLE2:
EMAIL
A
MYRS (starts as empty dataset)
EMAIL1 | EMAIL2 | EMAIL3 | EMAIL4
Logic: Record 1 in TABLE1 has an email that matches an email in TABLE2, so this record needs to show up. The other records don't match anything in TABLE2. But because Record 1 and Record 2 share the same alternative email F, Record 2 also needs to be shown. And because Record 2 and Record 3 share the same alternative email L, Record 3 also needs to be shown. And so forth.
proc sql;
    SELECT TABLE1.id,
           TABLE1.email1,
           TABLE1.email2,
           TABLE1.email3,
           TABLE1.email4
    FROM TABLE1
    INNER JOIN (
        SELECT EMAIL
        FROM TABLE2
        UNION
        SELECT EMAIL1 AS EMAIL
        FROM MYRS
        UNION
        SELECT EMAIL2 AS EMAIL
        FROM MYRS
        UNION
        SELECT EMAIL3 AS EMAIL
        FROM MYRS
        UNION
        SELECT EMAIL4 AS EMAIL
        FROM MYRS
    )
    ON EMAIL=EMAIL1 OR EMAIL=EMAIL2 OR EMAIL=EMAIL3 OR EMAIL=EMAIL4
    WHERE TABLE1.id NOT IN (
        SELECT DISTINCT ID
        FROM MYRS
    );
quit;
How can I create the following logic:
1. Wrap this into some sort of function
2. Before SQL execution, count the number of records in MYRS and save the count
3. Execute the SQL and update MYRS
4. Count the number of records in MYRS
5. If the MYRS count did not change, stop execution
6. Else, go to #3
I am very new to SAS (3 days to be exact) and trying to put everything together. (I would use the logic above if I was to do that in Java)
Here is a macro approach. It mostly follows your logic, but it transforms your data first, and the input/output is a list of IDs (you can easily get to and from emails with this).
This code will probably introduce quite a few SAS features that you are unfamiliar with, but the comments and explanations below should help. If any of it is still unclear, take a look at the links or add a comment.
It expects input data:
inData: Your TABLE1 with ID and EMAIL* variables
matched: An initial list of known wanted IDs
It returns:
matched: An updated list of wanted IDs
/* Wrap the processing in a macro so that we can use a %do loop */
%macro looper(maxIter = 5);
    /* Put all the emails in one column to make comparison simpler */
    proc transpose data = inData out = trans (rename = (col1 = email));
        by ID;
        var email:;
    run;
    /* Initialise the counts for the %while condition */
    %let _nMatched = 0;
    %let nMatched = 1;
    %let i = 0;
    /* Loop until no new IDs are added (or maximum number of iterations) */
    %do %while(&_nMatched. < &nMatched. and &i < &maxIter.);
        %let _nMatched = &nMatched.;
        %let i = %eval(&i. + 1);
        %put NOTE: Loop &i.: &nMatched. matched.;
        /* Move matches to a temporary table */
        proc datasets library = work nolist nowarn;
            delete _matched;
            change matched = _matched;
        quit;
        /* Get new matched IDs */
        proc sql noprint;
            create table matched as
            select distinct c.ID
            from _matched as a
            left join trans as b
                on a.ID = b.ID
            left join trans as c
                on b.email = c.email;
            /* Get new count */
            select count(*) into :nMatched from matched;
        quit;
    %end;
%mend looper;
%looper(maxIter = 10);
The interesting bits are:
proc transpose: Converts the input into a deep table so that all the email addresses are in one variable, this makes writing the email comparison logic simpler (less repetition needed) and puts the data in a format that will make it easier for you to clean the email addresses if necessary (think upcase(), strip(), etc.).
%macro %mend: The statements used to define a macro. This is necessary as you cannot use macro logic or loops in open code. I've also added an argument so you can see how that works.
%let and select into :: Two ways to create macro variables. Macro variables are referenced with the prefix & and are used to insert text into the SAS program before it is executed.
%do %while() %end: One of the ways to perform a loop within a macro. The code within will be run repeatedly until the condition evaluates to false.
proc datasets: A procedure for performing admin tasks on datasets and libraries. Used here to delete and rename temporary tables.
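Since the question mentions Java-style thinking, it may help to see the underlying loop-until-nothing-changes idea in a general-purpose language. The sketch below is in Python and is not a translation of the SAS macro: it starts from seed emails rather than seed IDs, and chain_ids is a hypothetical helper name. It uses the sample TABLE1 rows from the question, duplicates included.

```python
def chain_ids(rows, seed_emails):
    """Return the IDs reachable from seed_emails through shared emails.

    rows is a list of tuples (id, email1, email2, ...).
    """
    known_emails = set(seed_emails)
    matched = set()
    while True:
        # IDs not seen yet whose row shares at least one known email.
        new_ids = {
            row[0]
            for row in rows
            if row[0] not in matched and known_emails.intersection(row[1:])
        }
        if not new_ids:
            # Nothing was added in this pass, so the chain is complete.
            return matched
        matched |= new_ids
        # Every email of a newly matched row becomes a link for the next pass.
        for row in rows:
            if row[0] in new_ids:
                known_emails.update(row[1:])


table1 = [
    (1, "A", "s", "d", "F"),
    (2, "g", "F", "j", "L"),
    (3, "z", "x", "L", "v"),
    (4, "z", "x", "L", "v"),
    (2, "g", "F", "j", "L"),
]
result = chain_ids(table1, {"A"})
print(sorted(result))
```

The stopping rule is the same as in the macro: stop as soon as an iteration adds no new IDs.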

Dynamic query based on second table

I have a table of price quotes for multiple symbols
Table QUOTES
ID INT
SYMBOL NVARCHAR(6)
DT DATETIME
PRICE DECIMAL(18,5)
Table TempSymbol
SYMBOL NVARCHAR(6)
I want to extract only those rows from QUOTES whose symbols are also in a temp table, which can vary based on the user's request:
Create TABLE TempSymbol
(
SYMBOL NVARCHAR(6) NOT NULL
);
INSERT INTO TempSymbol(SYMBOL) VALUES ('MSFT');
INSERT INTO TempSymbol(SYMBOL) VALUES ('INTC');
INSERT INTO TempSymbol(SYMBOL) VALUES ('AAPL');
I want a query that will return from QUOTES the following data...
datetime symbol1 | price1 | symbol2 | price2 | symbol3 | price3
2012-11-12 12:10:00 MSFT | 12.10 | INTC | 5.68 | AAPL | 16.89
2012-11-12 12:15:00 MSFT | 12.22 | INTC | 5.97 | AAPL | 16.22
....
...
..
SELECT Q.DT, Q.SYMBOL, Q.PRICE FROM QUOTES AS Q INNER JOIN TempSymbol AS TS ON Q.SYMBOL = TS.SYMBOL
This returns the records that I need to pivot, but PIVOT is not available in SQLite. Is there another way I should be attempting this? Any help is appreciated.
Try this:
SELECT DT, SYMBOL, PRICE FROM QUOTES WHERE SYMBOL IN (SELECT SYMBOL FROM TempSymbol)
SQL is doing the part of your problem that it's designed to do: retrieve the data. You can add ORDER BY DT to make the records for the same date-time adjacent.
If you think about it for a minute, you'll see that a SELECT can't return what you want when the set of symbols varies. It returns table rows, and the number of columns in those rows is fixed by the statement itself, not by the data. So what you call a "pivot" is not something a single fixed SELECT can do here. You may be thinking of pivots in spreadsheets. Databases aren't spreadsheets.
After that, producing the report you want is best done with a little program in any of the languages with an SQLite interface (in Android for example that's Java; otherwise C or TCL). Make the query. Get the rows back as hashes, arrays, or ODM records. The rest is a couple of loops over this data. The algorithm is:
last_dt = null
for row in all rows
    if row.dt != last_dt
        start new output line
        print row.dt
        last_dt = row.dt
    end
    print ' | ', row.symbol, ' | ', row.price
end
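As a concrete version of the loop above, here is a minimal Python sketch. It assumes the rows arrive as (dt, symbol, price) tuples already sorted by dt, for example from the query with ORDER BY DT added. The function name pivot_lines and the two-decimal price format are choices made for this illustration only.

```python
from itertools import groupby


def pivot_lines(rows):
    """Build one text line per datetime from (dt, symbol, price) rows.

    rows must already be sorted by dt, because groupby() only merges
    adjacent rows that share the same key.
    """
    lines = []
    for dt, group in groupby(rows, key=lambda row: row[0]):
        cells = " | ".join(f"{symbol} | {price:.2f}" for _, symbol, price in group)
        lines.append(f"{dt} {cells}")
    return lines


rows = [
    ("2012-11-12 12:10:00", "MSFT", 12.10),
    ("2012-11-12 12:10:00", "INTC", 5.68),
    ("2012-11-12 12:10:00", "AAPL", 16.89),
    ("2012-11-12 12:15:00", "MSFT", 12.22),
    ("2012-11-12 12:15:00", "INTC", 5.97),
    ("2012-11-12 12:15:00", "AAPL", 16.22),
]
lines = pivot_lines(rows)
for line in lines:
    print(line)
```

Because the grouping happens in the program rather than in SQL, the number of symbols per line can change from one request to the next without touching the query.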
Another note: With advanced DB features like stored procedures and XML objects you could implement this in SQL. XML objects can have variable numbers of fields. Here the limit is SQLite, which doesn't provide these features.

Resources