SQLite: How to select only the filename without extension? - sqlite

I need to extract the filename without the extension from one of the tables I have.
Currently I use
SELECT filename FROM files
which returns the entire filename (e.g. Jessica.Timber.mp3). Is it possible to get only the filename without the extension using SQLite (e.g. Jessica.Timber)? The filenames may contain multiple "." characters, but only the last dot and the extension that follows it should be removed.
I tried the following query, which gives the right result if the extension is exactly 3 letters long (e.g. *.mp3) but fails if it is longer (e.g. *.flac):
SELECT substr(filename, -4, -100) from files;

The query below returns the file name without its extension, irrespective of the extension's length. The file name can contain any number of . characters (tested and verified).
select replace(filename,
               '.' || replace(filename,
                              rtrim(filename, replace(filename, '.', '')),
                              ''),
               '')
from files;
Test:
create table files (filename);
insert into files values ('ph.otoJpg.jpg'), ('ph.otoJpeg.jpeg');
Then the above code yields:
ph.otoJpg
ph.otoJpeg
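The query can also be checked outside the sqlite3 shell. The following is a minimal sketch using Python's standard sqlite3 module with an in-memory database; the helper name strip_extensions is hypothetical, and the sample file names are taken from the question.

```python
import sqlite3

# Query from the answer above: rtrim() strips trailing non-dot characters,
# leaving the prefix that ends at the last dot. Removing that prefix isolates
# the extension, which is then removed (with its leading dot) from the name.
STRIP_EXTENSION_SQL = """
SELECT replace(filename,
               '.' || replace(filename,
                              rtrim(filename, replace(filename, '.', '')),
                              ''),
               '')
FROM files
ORDER BY rowid
"""


def strip_extensions(filenames):
    """Load the names into an in-memory table and run the query on them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE files (filename TEXT)")
        conn.executemany("INSERT INTO files VALUES (?)",
                         [(name,) for name in filenames])
        return [row[0] for row in conn.execute(STRIP_EXTENSION_SQL)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(strip_extensions(["Jessica.Timber.mp3", "Jessica.Timber.flac"]))
```

Both sample names should come back as Jessica.Timber, regardless of the extension length.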

SQLite has no built-in string functions that would help with this.
It would be possible to create a recursive CTE, but the easiest way is to retrieve the entire file name and to remove the extension in your code.
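As an illustration of removing the extension in application code, here is a minimal sketch in Python. It assumes the files table from the question; the helper name filenames_without_extension is hypothetical.

```python
import os.path
import sqlite3


def filenames_without_extension(conn):
    """Fetch every file name and drop only the final extension in Python."""
    rows = conn.execute("SELECT filename FROM files ORDER BY rowid")
    # os.path.splitext splits at the last dot only,
    # e.g. "a.b.mp3" becomes ("a.b", ".mp3").
    return [os.path.splitext(row[0])[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (filename TEXT)")
    conn.executemany("INSERT INTO files VALUES (?)",
                     [("Jessica.Timber.mp3",), ("song.flac",)])
    print(filenames_without_extension(conn))
    conn.close()
```

Doing the split in code keeps the SQL trivial and leaves edge cases such as names without a dot to the language's own path handling.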

Related

U-SQL How can I get the current filename being processed to add to my extract output?

I need to add meta data about the Row being processed. I need the filename to be added as a column. I looked at the ambulance demos in the Git repo, but can't figure out how to implement this.
You use two U-SQL features called 'file sets' and 'virtual columns'. In my simple example, I have two files in my input directory. I use a file set and refer to the virtual columns in the EXTRACT statement, e.g.:
// Filesets, file set with virtual column
@q =
    EXTRACT rowId int,
            filename string,
            extension string
    FROM "/input/filesets example/{filename}.{extension}"
    USING Extractors.Tsv();
@output =
    SELECT filename,
           extension,
           COUNT( * ) AS records
    FROM @q
    GROUP BY filename,
             extension;
OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv();
Read more about both features here:
https://msdn.microsoft.com/en-us/library/azure/mt621320.aspx

Extract only required files in U-SQL

Is it possible to extract only the files for 3 days, without extracting all the files?
DROP VIEW IF EXISTS dbo.Read;
CREATE VIEW IF NOT EXISTS dbo.Read AS
    EXTRACT
        Statements
    FROM
        "adl://Test/{date:yyyy}/{date:M}/{date:d}/Testfile.csv"
    USING Extractors.Csv(silent:true,quoting : true, nullEscape : "/N");
@res =
    SELECT * FROM dbo.Read
    WHERE date BETWEEN DateTime.Parse("2015/07/01") AND DateTime.Parse("2015/07/03");
OUTPUT @res
TO "adl://test/Testing/loop.csv"
USING Outputters.Csv();
Partition elimination already ensures for your query that only files matching predicates will actually be read (you can confirm that in the job graph).
See also my previous answer for How to implement Loops in U-SQL
If you have remaining concerns about performance, the job graph can also help you nail down where they originate.
You can use the pattern identifiers in the fileset specification in parts of the path or even parts of the name (see https://msdn.microsoft.com/en-us/library/azure/mt771650.aspx). You can also list several files, so if you only have one file in each directory you can do:
EXTRACT ...
FROM "adl://Test/2015/07/1/Testfile.csv"
, "adl://Test/2015/07/2/Testfile.csv"
USING ...;
If there is more than one file in each directory you can do individual extracts for each day and then union the result. Something like:
@a = EXTRACT ....
     FROM "adl://Test/2015/07/1/{*}.csv"
     USING ...;
@b = EXTRACT ....
     FROM "adl://Test/2015/07/2/{*}.csv"
     USING ...;
@fullset = SELECT * FROM @a UNION SELECT * FROM @b;
Unfortunately, I believe a list of filesets is not supported at the moment, so the case above cannot be handled in a single EXTRACT statement.

Use string text from file in u-sql query where clause ( U-SQL )

I need to load some text from one file and select specific records from another file where one of the second file's columns is equal to the text loaded from the first file.
I'm trying something like the following, but it doesn't work.
@countryName =
    EXTRACT City string
    FROM "/TestCatalog/test.txt"
    USING Extractors.Text();
@result =
    SELECT CityName,
           Temperature,
           MeasurmentDate
    FROM @readEmployee
    WHERE CityName IN(@countryName);
What is the best way to pass parameters (read from another file in Azure Data Lake) to a WHERE expression?
Variables in U-SQL that are assigned with EXTRACT or SELECT are rowsets rather than scalar variables. Therefore use SEMIJOIN to do this, for example:
@output =
    SELECT re.CityName,
           re.Temperature,
           re.MeasurmentDate
    FROM @readEmployee AS re
         SEMIJOIN @countryName AS c ON re.CityName == c.City;
EXTRACT this other file into another rowset, and JOIN both rowsets together.
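To make the difference from a scalar IN test concrete, the general idea of a left semijoin can be sketched in plain Python. This only illustrates the semantics (keep each left row that has at least one match on the right, and return left columns only); it is not how U-SQL implements it, and the sample rows are made-up stand-ins for the two extracted files.

```python
def semijoin(left_rows, right_rows, left_key, right_key):
    """Keep each left row that has at least one matching right row.

    Only left columns are returned, and a left row appears once even when
    several right rows match it.
    """
    right_values = {row[right_key] for row in right_rows}
    return [row for row in left_rows if row[left_key] in right_values]


if __name__ == "__main__":
    # Hypothetical sample data; column names follow the question.
    read_employee = [
        {"CityName": "Oslo", "Temperature": 4.0, "MeasurmentDate": "2017-01-01"},
        {"CityName": "Rome", "Temperature": 12.5, "MeasurmentDate": "2017-01-01"},
    ]
    country_name = [{"City": "Oslo"}, {"City": "Oslo"}]
    print(semijoin(read_employee, country_name, "CityName", "City"))
```

An ordinary inner join over the same data would return the Oslo row twice, once per matching row on the right.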

How do I utilize the TPT FileName attribute to list multiple files?

The documentation of Teradata's tbuild bulk utility states that I can list multiple files if I set FileList = 'Y'. It does not, however, mention how to do so.
I have tried something similar to this:
VARCHAR FileList = 'YES',
VARCHAR FileName = '\\path\to\file\file1.csv',
'\\path\to\file\file2.csv',
'\\path\to\file\file3.csv',
'\\path\to\file\file4.csv'
This fails with the following error (one for each file). The same error occurs if I surround the entire list with parentheses:
TPT_INFRA: Syntax error at or near line 30 of Job Script File 'File_list_test.sql':
TPT_INFRA: At "\\path\to\file\file1.csv" missing { ARRAY_ BIGINT_ BYTEINT_ CHARACTER_ CHAR_ CHARACTERS_ CHARS_ INT_ INTEGER_ LONG_ SMALLINT_ VARCHAR_ VARDATE_ REGULAR_IDENTIFIER_ EXTENDED_IDENTIFIER_ EXTENDED_IDENTIFIER_NO_N_ } in Rule: Attribute Definition
I've tried to surround the entire list in double quotes. That fails with this error:
TPT_INFRA: At "'\\path\to\file\file1.csv','\\path\to\file\file2.csv','\\path\to\file\file3.csv','\\path\to\file\file4.csv'" missing { PLUS_ MINUS_ JOB_ATTRIBUTE_REFERENCE_ EXTENDED_LITERAL_ CHAR_STRING_LITERAL_ UNSIGNED_INTEGER_ EXACT_NUMERIC_VALUE_ APPROX_NUMERIC_VALUE_ } in Rule: Initial Value
I've tried to surround the entire list with square brackets which fails with this error:
TPT_INFRA: At "VARCHAR" missing RPAREN_ in Rule: Attribute List Definition
I've also tried to set VARCHAR FileName = to each file. Predictably, that fails with this error:
TPT_INFRA: TPT03044: Attribute 'FileName' is already on Operator 'My_DataConnector_Test' attribute list.
Duplicate definition is rejected.
How do I provide a file list so that I can load selected files via the tbuild utility?
I had the same problem a few years ago when I tried FileList the first time :-)
You just have to read the manuals carefully:
the file specified by FileName contains a list of files to be
processed.
When used with the FileList attribute, fileName is expected to
contain a list of names of the files to be processed, each with a full
path specification
The file specified by FileName must be a text file with each file name on a new line.
This is the content of the file "myfile.txt":
\\path\to\file\file1.csv
\\path\to\file\file2.csv
\\path\to\file\file3.csv
\\path\to\file\file4.csv
And now "myfile.txt" is the file used in TPT:
VARCHAR FileList = 'YES',
VARCHAR FileName = 'myfile.txt'
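If the set of files changes between runs, the list file can be generated rather than edited by hand. Below is a minimal sketch in Python; the helper name write_file_list is hypothetical, and the UTF-8 encoding is an assumption that may need adjusting for your TPT setup.

```python
import tempfile
from pathlib import Path


def write_file_list(list_path, data_files):
    """Write one full file path per line into the list file."""
    # Encoding is an assumption; adjust it to what your job expects.
    Path(list_path).write_text("\n".join(data_files) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Written to the temp directory here; point this at wherever the
    # FileName attribute of the job script refers to.
    target = Path(tempfile.gettempdir()) / "myfile.txt"
    write_file_list(target, [
        r"\\path\to\file\file1.csv",
        r"\\path\to\file\file2.csv",
        r"\\path\to\file\file3.csv",
        r"\\path\to\file\file4.csv",
    ])
    print(target.read_text(encoding="utf-8"))
```

Raw strings (the r prefix) keep the backslashes in the paths from being read as escape sequences.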

SQLite: How to select part of string?

There is a table column containing file names: image1.jpg, image12.png, script.php, .htaccess,...
I need to select the file extensions only. I would prefer to do it this way:
SELECT DISTINCT SUBSTR(column,INSTR('.',column)+1) FROM table
but INSTR isn't supported in my version of SQLite.
Is there a way to do this without using the INSTR function?
Below is a query (tested and verified) that selects the file extensions only. The filename can contain any number of . characters and it will still work:
select distinct replace(column_name,
                        rtrim(column_name, replace(column_name, '.', '')),
                        '')
from table_name;
column_name is the name of the column that holds the file names (filenames can contain multiple dots)
table_name is the name of your table
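A quick way to try this query against the file names from the question is Python's standard sqlite3 module. This is a minimal sketch; the table name files, the column name filename, and the helper distinct_extensions are placeholders chosen for the example.

```python
import sqlite3

# rtrim() strips trailing non-dot characters, leaving the prefix that ends at
# the last dot; removing that prefix from the name leaves only the extension.
EXTENSION_SQL = """
SELECT DISTINCT replace(filename,
                        rtrim(filename, replace(filename, '.', '')),
                        '')
FROM files
"""


def distinct_extensions(filenames):
    """Return the sorted distinct extensions found by the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE files (filename TEXT)")
        conn.executemany("INSERT INTO files VALUES (?)",
                         [(name,) for name in filenames])
        return sorted(row[0] for row in conn.execute(EXTENSION_SQL))
    finally:
        conn.close()


if __name__ == "__main__":
    print(distinct_extensions(
        ["image1.jpg", "image12.png", "script.php", ".htaccess", "photo.jpg"]))
```

Note that a name such as .htaccess is reported with the extension htaccess, because everything after the last dot counts as the extension.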
Try the ltrim(X, Y) function. This is what the documentation says:
The ltrim(X,Y) function returns a string formed by removing any and all characters that appear in Y from the left side of X.
List the whole alphabet as the second argument, something like
SELECT ltrim(column, "abcd...xyz1234567890") From T
That should remove all characters from the left up to the first dot. If you need the extension without the dot, use SUBSTR on the result. Of course, this only works if filenames contain no more than one dot.
But I think it is much easier and safer to extract the extension in the code that executes the query.
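For the in-code route, a minimal sketch in Python could look like this. The helper name extension_of is hypothetical, and it relies on the standard os.path.splitext, which splits at the last dot.

```python
import os.path


def extension_of(filename):
    """Return the extension without its dot, or '' when there is none."""
    # splitext returns (root, ext) where ext includes the leading dot.
    # A leading dot, as in ".htaccess", is treated as part of the root,
    # so such names yield an empty extension here.
    return os.path.splitext(filename)[1].lstrip(".")


if __name__ == "__main__":
    for name in ["image1.jpg", "image12.png", "script.php", ".htaccess"]:
        print(name, "->", repr(extension_of(name)))
```

Whether .htaccess should count as having an extension is a design choice; handling it in code makes that choice explicit and easy to change.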
