SQL*Loader: how to use SUBSTR in a WHEN clause correctly? - oracle11g

I'm having a problem that I thought to be rather common, but trying to look it up in the "Oracle Database 10g2 Utilities_b14215.pdf" didn't help. After that I've surfed through numerous threads but no luck so far.
I have a tab-delimited file (x'09') with e.g. name, userid, persnr. The values for userid begin with either P, R or T, e.g. P2198, P2199, R7288, T1229.
I want to load only the records with userids beginning with P.
Isolating a single record with a controlfile like this works splendidly:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN USERID = 'P2198'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
But every attempt at using SUBSTR in the WHEN clause fails.
This:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN SUBSTR(USERID, 1, 1) = 'P'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
ends in an SQL*Loader-350 syntax error.
This:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN "SUBSTR(:USERID, 1, 1)" = 'P'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
ends in an SQL*Loader-403: Referenced column USERID not present in table Z_USERLIST.
But IT IS PRESENT - as the first example proves. I've found that the column should be preceded by : but that obviously isn't the issue.
What am I doing wrong?

According to the SQL*Loader docs, the left-hand side of a WHEN condition can only be a full field name, e.g. USERID, or a position spec, e.g. (3:5).
The docs aren't very clear, though, on which operators are allowed - e.g. can LIKE be used?
USERID LIKE 'P%'
I strongly suspect it can't though.

I would load the entire file into a staging table that matches the file layout, then run a procedure that inserts the rows you want from there into the production table. That is a more common way to handle loads with criteria like this without having to edit source data.
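As a sketch of that approach (the staging table name Z_USERLIST_STG is just an example), the control file loads everything with no WHEN clause:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST_STG
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
and a follow-up step copies only the P records into the production table:
INSERT INTO Z_USERLIST (name, userid, persnr)
SELECT name, userid, persnr
FROM Z_USERLIST_STG
WHERE userid LIKE 'P%';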
If you can preprocess the source file, move the userid to the first field, or copy the first letter of the userid to its own field, and construct the WHEN clause so sqlldr looks at the first position (this will cause sqlldr to return a non-zero exit code, though, as not all rows meet the WHEN clause criteria):
WHEN (1) = 'P'
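For example, if the userid can be made the first field in the file, a control file along these lines (a sketch, untested against your data) would load only the P rows:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN (1) = 'P'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(userid, name, persnr)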

Related

In KQL how can I use bag_unpack to turn a serialized dictionary object in customDimensions into columns?

I'm trying to write a KQL query that will, among other things, display the contents of a serialized dictionary called Tags which has been added to the Application Insights traces table customDimensions column by application logging.
An example of the serialized Tags dictionary is:
{
"Source": "SAP",
"Destination": "TC",
"SAPDeliveryNo": "0012345678",
"PalletID": "(00)312340123456789012(02)21234987654(05)123456(06)1234567890"
}
I'd like to use evaluate bag_unpack(...) to evaluate the JSON and turn the keys into columns. We're likely to add more keys to the dictionary as the project develops and it would be handy not to have to explicitly list every column name in the query.
However, I'm already using project to reduce the number of other columns I display. How can I use both a project statement, to only display some of the other columns, and evaluate bag_unpack(...) to automatically unpack the Tags dictionary into columns?
Or is that not possible?
This is what I have so far, which doesn't work:
traces
| where datetime_part("dayOfYear", timestamp) == datetime_part("dayOfYear", now())
and message has "SendPalletData"
| extend TagsRaw = parse_json(customDimensions.["Tags"])
| evaluate bag_unpack(TagsRaw)
| project timestamp, message, ActionName = customDimensions.["ActionName"], TagsRaw
| order by timestamp desc
When it runs it displays only the columns listed in the project statement (including TagsRaw, so I know the Tags exist in customDimensions).
evaluate bag_unpack(TagsRaw) doesn't automatically add extra columns to the result set unpacked from the Tags in customDimensions.
EDIT: To clarify what I want to achieve, these are the columns I want to output:
timestamp
message
ActionName
TagsRaw
Source
Destination
SAPDeliveryNo
PalletID
EDIT 2: It turned out a major part of my problem was that double quotes within the Tags data were being escaped. While the Tags as viewed in the Azure portal looked like normal JSON, and copied out as normal JSON, when I copied out the whole of a customDimensions record the Tags looked like "Tags": "{\"Source\":\"SAP\",\"Destination\":\"TC\", ... with the double quotes escaped with backslashes.
The accepted answer from David Markovitz handles this situation in the line:
TagsRaw = todynamic(tostring(customDimensions["Tags"]))
A few comments:
When filtering on timestamp, it's better to use the timestamp column As Is, and do the manipulations on the other side of the equation.
When using the has[...] operators, prefer the case-sensitive one (if feasible).
Everything extracted from a dynamic value is also dynamic, and when given a dynamic value, parse_json() (or its equivalent, todynamic()) simply returns it, As Is.
Therefore, we need to treat customDimensions.["Tags"] in 2 steps:
1st, convert it to string. 2nd, convert the result to dynamic.
To reference a field within a dynamic type you can use X.Y, X["Y"], or X['Y'].
No need to combine them as you did with customDimensions.["Tags"].
As the bag_unpack plugin doc states:
"The specified input column (Column) is removed."
In other words, TagsRaw does not exist following the bag_unpack operation.
Please note that you can add a prefix to the columns generated by bag_unpack. It might make it easier to differentiate them from the rest of the columns.
While you can use project, using project-away is sometimes easier.
// Data sample generation. Not part of the solution.
let traces =
print c1 = "some columns"
,c2 = "we"
,c3 = "don't need"
,timestamp = ago(now()%1d * rand())
,message = "abc SendPalletData xyz"
,customDimensions = dynamic
(
{
"Tags":"{\"Source\":\"SAP\",\"Destination\":\"TC\",\"SAPDeliveryNo\":\"0012345678\",\"PalletID\":\"(00)312340123456789012(02)21234987654(05)123456(06)1234567890\"}"
,"ActionName":"Action1"
}
)
;
// Solution starts here
traces
| where timestamp >= startofday(now())
and message has_cs "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
,ActionName = customDimensions.["ActionName"]
| project-away c*
| evaluate bag_unpack(TagsRaw, "TR_")
| order by timestamp desc
The output (one row):
timestamp: 2022-08-27T04:15:07.9337681Z
message: abc SendPalletData xyz
ActionName: Action1
TR_Destination: TC
TR_PalletID: (00)312340123456789012(02)21234987654(05)123456(06)1234567890
TR_SAPDeliveryNo: 0012345678
TR_Source: SAP
If I understand correctly, you want to use project to limit the number of columns that are displayed, but you also want to include all of the unpacked columns from TagsRaw, without naming all of the tags explicitly.
The easiest way to achieve this is to switch the order of your steps, so that you first do the project (including the TagsRaw column) and then you unpack the tags. If desired, you can then use project-away to specifically remove the TagsRaw column after you've unpacked it.
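A minimal sketch of that ordering, reusing the names from the question and the todynamic(tostring(...)) conversion from the answer above:
traces
| where message has "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
| project timestamp, message, ActionName = tostring(customDimensions["ActionName"]), TagsRaw
| evaluate bag_unpack(TagsRaw)
| order by timestamp desc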

Error in loading a data to neo4j in case statement in cypher query

I'm trying to load CSV data with a CASE statement in the query, but the following error appears:
Cypher:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
MATCH(a:test_t{tid:line.pid})
CASE
WHEN line.key !='NA' THEN
WITH split(line.key,",") as name
UNWIND name as x
MERGE(k:test_key{key_term:toLower(x)})
MERGE(a)-[:contains]->(k)
END
Error
Neo.ClientError.Statement.SyntaxError: Invalid input 'S': expected 'l/L' (line 5, column 3 (offset: 137))
"CASE"
Can anyone help me?
A CASE expression cannot embed other Cypher clauses (though it can invoke functions). In fact, CASE is not actually needed for your use case.
This query should work (the :auto at the beginning is needed in neo4j 4.0+):
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line FIELDTERMINATOR ';'
WITH line
WHERE line.key <> 'NA'
MATCH (a:test_t {tid: line.pid})
UNWIND split(line.key, ',') as x
MERGE (k:test_key {key_term: TOLOWER(x)})
MERGE (a)-[:contains]->(k)
This query filters out all unwanted lines as soon as they are obtained from the file. Reducing the number of rows of data being worked on as early as possible is good practice.
Also, you have a second issue. Your data file cannot use the comma as both the (default) field terminator AND as the delimiter between your x values.
To resolve this ambiguity, the above query chose to use the FIELDTERMINATOR ';' option to specify that the ";" character will be used as the field terminator. A sample data file would look like this:
pid;key
123;NA
234;Foo,Bar
345;Bar,Baz
456;NA
567;Baz
You are using CASE incorrectly. You cannot have update clauses inside a CASE expression. Instead, you can use a WHERE clause to filter the rows of the file. For instance, adding WHERE line.key <> 'NA' while processing the file, before you move on to the update, will work. Something like this should fit the bill:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
MATCH (a:test_t {tid: line.pid})
WITH a, line
WHERE line.key <> 'NA'
WITH a, split(line.key, ",") as name
UNWIND name as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)
It looks like, from your logic, you could even move the test up above the MATCH. So this might be better (fewer unnecessary matches):
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
WITH line
WHERE line.key <> 'NA'
MATCH (a:test_t {tid: line.pid})
WITH a, split(line.key, ",") as name
UNWIND name as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)

Adding value to existing database table in RSQLite

I am new to RSQLite.
I have an input document in text format in which values are separated by '|'.
I created a table with the required variables (dummy code as follows)
library(RSQLite)
db <- dbConnect(SQLite(), dbname = "test.sqlite")
dbSendQuery(conn = db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME CHAR(25),
DATED DATE)"
)
However, I am stuck on how to import values into the created table.
I cannot use an INSERT INTO ... VALUES command, as there are thousands of rows and more than 20 columns in the original data file, and it is impossible to manually type in each data point.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The whole point of that is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly created a database connection and an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep='|') (modify arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows for both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows. Or just more than one. Either way, you can send in your data frame. dbSendQuery has some requirements: 1) each vector must have the same number of entries (not an issue when providing a data.frame), and 2) you may only submit the same number of vectors as you have placeholders.
I assume your data frame df contains the columns mark, roll, and name, corresponding to the table columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time, because after each insert the data is written to file and the indices are updated. Instead, enclose the inserts in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will save the data to a journal file, and only save the result when you execute dbCommit(db). Try both methods and compare the speed!
2: Ah, yes. The second way. This can be done in SQLite entirely.
With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply do an INSERT INTO ... SELECT ...; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ...;.
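A rough sketch of that temporary-table route, typed at the sqlite3 prompt (file and table names are placeholders, and it assumes the first line of textfile.txt holds column names matching the SELECT list):
.separator |
.import textfile.txt staging
INSERT INTO TABLE1 (MARKS, ROLLNUM, NAME, DATED)
SELECT MARKS, ROLLNUM, NAME, DATED FROM staging;
DROP TABLE staging;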
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbExecute(db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME TEXT
)"
)
df <- data.frame(MARKS = sample(1:100, 10),
ROLLNUM = sample(1:100, 10),
NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.

SQLite GROUP BY

I am using DB Browser for SQLite to extract some interesting data from my database but I encountered one big problem with GROUP BY statement.
Even the most basic SELECT I can imagine is not working properly.
(Filename nvarchar(2147483647))
SELECT FileName FROM TableName WHERE FileName LIKE '%Nieminen%' GROUP BY FileName gives 5 rows even though I know that there are 9 distinct FileNames containing the phrase 'Nieminen' (I've browsed it).
Is it possible that GROUP BY in SQLite compares only the first N (e.g. 10) characters? From my observation it might be true...
Any clues?
Why don't you try out the following:
SELECT DISTINCT FileName FROM TableName WHERE FileName LIKE '%Nieminen%'

SQLite: How to select part of string?

There is a table column containing file names: image1.jpg, image12.png, script.php, .htaccess, ...
I need to select the file extensions only. I would prefer to do it this way:
SELECT DISTINCT SUBSTR(column,INSTR('.',column)+1) FROM table
but INSTR isn't supported in my version of SQLite.
Is there way to realize it without using INSTR function?
Below is the query (tested and verified) for selecting the file extensions only. Your filename can contain any number of '.' characters - it will still work:
select distinct replace(column_name, rtrim(column_name,
replace(column_name, '.', '' ) ), '') from table_name;
column_name is the name of the column where you have the file names (filenames can have multiple '.'s).
table_name is the name of your table.
Try the ltrim(X, Y) function; this is what the doc says:
The ltrim(X,Y) function returns a string formed by removing any and all characters that appear in Y from the left side of X.
List the whole alphabet (and digits) as the second argument, something like:
SELECT ltrim(column, "abcd...xyz1234567890") From T
That should remove all the characters from the left up until the '.'. If you need the extension without the dot, then use SUBSTR on it. Of course this means that filenames may not contain more than one dot.
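For instance (a sketch only; the character list is abbreviated with '...' just as above and must be written out in full):
SELECT SUBSTR(ltrim(column, 'abcd...xyz1234567890'), 2) FROM T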
But I think it is way easier and safer to extract the extension in the code which executes the query.
