Oracle 11g: Comparing two *LOB columns of different types - plsql

I've got read-only access to a database containing two schemas with tables like this:
schema1.A.unique_id, schema1.A.content
schema2.B.unique_id, schema2.B.content
A.unique_id and B.unique_id will match, while A.content and B.content are *LOB columns whose contents should match (wasn't my idea lol). What I'd like to do is compare the content fields and see how many are equal. However, one is a CLOB and the other is a BLOB.
DBMS_LOB.COMPARE() is an obvious helper; however, it only compares two *LOBs of the same type (e.g. CLOB vs. CLOB).
In lieu of writing a script to get the content of the fields and compare them in memory, how can I perform this comparison in straight-up PL/SQL? Is there some way I can convert one of the fields on-the-fly so that the types match (again keep in mind I only have read-only access)?
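For reference, here's a rough sketch of the sort of thing I have in mind (assuming A.content is the CLOB, B.content is the BLOB, neither is ever NULL, and the BLOB really holds character data in the database character set); I'm not sure whether it's viable with only read access:
DECLARE
  l_tmp_clob     CLOB;
  l_dest_offset  INTEGER;
  l_src_offset   INTEGER;
  l_lang_context INTEGER;
  l_warning      INTEGER;
  l_matches      PLS_INTEGER := 0;
  l_total        PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT s1.content AS clob_content,  -- assuming this side is the CLOB
                   s2.content AS blob_content   -- assuming this side is the BLOB
              FROM schema1.a s1
              JOIN schema2.b s2 ON s2.unique_id = s1.unique_id)
  LOOP
    l_total := l_total + 1;
    -- the temporary CLOB lives in temp space, so no write access to the tables is needed
    DBMS_LOB.CREATETEMPORARY(l_tmp_clob, TRUE);
    l_dest_offset  := 1;
    l_src_offset   := 1;
    l_lang_context := DBMS_LOB.DEFAULT_LANG_CTX;
    DBMS_LOB.CONVERTTOCLOB(
      dest_lob     => l_tmp_clob,
      src_blob     => r.blob_content,
      amount       => DBMS_LOB.LOBMAXSIZE,
      dest_offset  => l_dest_offset,
      src_offset   => l_src_offset,
      blob_csid    => DBMS_LOB.DEFAULT_CSID,
      lang_context => l_lang_context,
      warning      => l_warning);
    IF DBMS_LOB.COMPARE(r.clob_content, l_tmp_clob) = 0 THEN
      l_matches := l_matches + 1;
    END IF;
    DBMS_LOB.FREETEMPORARY(l_tmp_clob);
  END LOOP;
  DBMS_OUTPUT.PUT_LINE(l_matches || ' of ' || l_total || ' rows match');
END;
/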
Thanks!

Related

Querying Global Secondary Indexes with the contains operator

I've been reading the DynamoDB docs and was unable to work out whether it makes sense to query a Global Secondary Index with the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all entity.codes present in the current DB document, separated by commas. So the example above would look like:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ],
  "entitiesGlobalSecondaryIndex": "entityCode1,entityCode2"
}
And then I would like to apply a filter expression on entitiesGlobalSecondaryIndex, something like: entitiesGlobalSecondaryIndex contains entityCode1.
Would this be efficient, or does using a global secondary index make no sense in this way, so that DynamoDB will simply check the condition against every document, which is effectively a scan?
Any help is much appreciated,
Thanks
The contains operator of a query cannot be run on a partition key. In order for a query to use any sort of operator (contains, begins_with, >, <, etc.) you must have a range attribute, aka your sort key.
You can very well set up a GSI with some value as your PK and this code as your SK. However, GSIs are a replication of the table: there is a slight potential for the data in a GSI to lag behind that of the master copy. If the query you're running against this GSI isn't very frequent, then you're probably safe from that.
However, if you are trying to do this against the entire table at once, then it's no better than a scan.
If what you need is for a specific code to return all of its documents at once, then you could do a GSI with that code as the PK. If you add a date field as the SK of this GSI, it would even be time sorted. If you query against that code in that index, you'll get every single one of them.
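For example (just a sketch with made-up names: a "Documents" table and a GSI called "CodeDateIndex" whose PK is entityCode and whose SK is createdAt), such a query in PartiQL, DynamoDB's SQL-compatible language, could look like:
SELECT *
FROM "Documents"."CodeDateIndex"
WHERE entityCode = 'entity1Code'
ORDER BY createdAt DESC
That single key condition returns every document item carrying that code, newest first.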
Since you may have multiple codes, if there aren't too many per document, you could maybe use a sparse index: if you have an entity with code "AAAA" then you also have an attribute named AAAA (or AAAAflag or something). It is always null/does not exist unless the entities list contains that code. If you create a GSI on this AAAAflag attribute, it will only contain documents that have that entity code, and ignore every document where the attribute does not exist. This may work for you if you can also provide a good PK on this GSI to keep the numbers well partitioned and if you don't have too many codes.
Filter expressions, by the way, are different from all of the above. Filter expressions are run on the data that would be returned, after it has already been read out of the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to get all the documents associated with a particular PK, in the interest of keeping the data your code is working with concise. The query with a filter expression still reads everything that matches the query, but only presents what makes it past the filter.
If you are only querying against a particular PK at any given time and you want to know whether it contains any entities of X, then a filter expression would work perfectly. Of course, this is only per PK and not for your entire table.
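To illustrate (again a sketch with hypothetical names: a "Documents" table whose partition key is docId), a PartiQL statement that queries one partition and filters on the comma-separated attribute from the question could be:
SELECT *
FROM "Documents"
WHERE docId = 'doc-123'
AND contains("entitiesGlobalSecondaryIndex", 'entityCode1')
The docId condition is the actual query; the contains() part behaves like a filter expression, so DynamoDB still reads every item in that partition and only then drops the ones that don't match.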
If all you need is counts, then you could keep a count attribute on the document, or a meta document on that partition that contains these values and can be queried directly.
Lastly, and I have no idea if this would work or not: if your entities attribute is a map type, you might very well be able to filter against the entity code, and maybe even use entities.code.contains(value) if it were an SK, but I do not know if that is possible.

Is there a way to display dynamic columns in Oracle APEX

Long story short, I can't use PIVOT for this task due to the long elements that I need to include in the columns, so I tried to create a Classic Report based on a function in Oracle APEX. The query is generated correctly, but it's not working in the Classic Report.
A general hint first: output your variable l_sql to the console using dbms_output.put_line, or use some kind of debugging table you can insert it into. Also be careful about the data type of that variable: if the SQL grows, you can reach a point where you need a CLOB variable instead of varchar2.
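For example, a quick sketch (assuming your function body assembles the statement in a variable named l_sql and that you test the body outside APEX, where dbms_output is visible):
l_sql := 'select 1 as col1 from dual';  -- however your code assembles it
dbms_output.put_line(l_sql);            -- visible when testing in SQL Developer / SQL*Plus
-- or, with a hypothetical debug table:
-- insert into debug_log (created_at, msg) values (sysdate, l_sql);
return l_sql;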
You will need to supply table structures and test data if you would like your problem analyzed completely, so for now I will give you some general explanations:
Use Generic Column Names is OK if you have a fixed, unchangeable number of columns. But if the order of your columns or even their number can change, then this is a bad idea, as your page will show an error if your query returns more columns than Generic Column Count.
Option 1: Use column aliases in your query
Enhance your PL/SQL Function Body returning SQL Query in a way that it outputs verbose display names, like this:
return 'select 1 as "Your verbose column name", 2 as "Column #2", 3 as "Column #3" from dual';
It has the disadvantage that the column names also appear this way in the designer, and APEX will only update these column names if you re-validate the function. You will have a hard time referencing a column with the internal name Your verbose column name in process code or a dynamic action.
However, it still works even if you change the column names without telling APEX, for example by externalizing the PL/SQL Function Body into a real function.
Option 2: Use custom column headings
A little bit hidden, but there is also the option of completely custom column headings. It is almost at the end of the attributes page of your report region.
Here you can also supply a function that returns your column names. Be careful: this function is not supposed to return an SQL query that itself returns column names; instead, it must return the column names separated by colons.
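For example (a minimal sketch, reusing the verbose names from option 1), the headings function body could simply be:
return 'Your verbose column name:Column #2:Column #3';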
With this method, it is easier to identify and reference your columns in the designer.
Option 3: Both of them
Turn off Generic Column Names, let your query return column names that can be easily identified and referenced, and use the custom column headings function to return verbose names for your users.
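As a sketch (with a hypothetical emp table and made-up aliases), the two function bodies could then look like this:
-- PL/SQL Function Body returning SQL Query: short, stable aliases you can reference in the designer
return 'select ename as col1, sal as col2, deptno as col3 from emp';
-- custom column headings function: the verbose names your users see
return 'Employee Name:Salary:Department';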
My personal opinion
I'm using the 3rd option in a production application where people can change the number and order of columns themselves using shuttle items on the report page. It took some time, but now it works like a charm, like some dynamic PIVOT without PIVOT.

How is an MVCCKey formed in CockroachDB?

I want to create an MVCCKey with a timestamp and pretty value I know. But I realize a roachpb.Key is not very straightforward; is there some prefix/suffix involved? Is the database name also encoded in roachpb.Key?
Can anyone please tell me how a MVCCKey is formed? What information does it have? In the documentation, it just says that it looks like /table/primary/key/column.
An engine.MVCCKey combines a regular key with a timestamp. MVCCKeys are encoded into byte strings for use as RocksDB keys (RocksDB is configured with a custom comparator so that MVCCKeys are sorted correctly even though the timestamp uses a variable-width encoding).
Regular keys are byte strings of type roachpb.Key. For ordinary data records, the keys are constructed from table, column, and index IDs, along with the values of indexed columns. (The database ID is not included here; the database to which a table belongs can be found in the system.descriptors table.)
The function keys.PrettyPrint can convert a roachpb.Key to a human-readable form.

Dynamically generating pzPVStream from a view

Is there some way to create a view that returns a pzPVStream that can be natively parsed by Pega when it executes an RDB?
For instance, maybe a query (in MS SQL Server) that resembled:
SELECT test_tbl_outer.ID, (
select *, 'My-Int-TestClass' as "pxObjClass"
from {class:My-Int-TestClass} as test_tbl_inner
where test_tbl_inner.ID=test_tbl_outer.ID
FOR XML RAW('pagedata'), TYPE, ELEMENTS
) as pzPVStream
from {class:My-Int-TestClass} as test_tbl_outer
This gets an invalid signature error (the SQL query does work directly however), and if I try to shove a signature string onto the column ('PR6d' or previous) I just get a different error regarding headers.
So at this point, I do realize that the pzPVStream is not stored as XML but as some sort of packed & compressed string. Is there a way for me to create a valid pzPVStream on the fly? Maybe something similar to what pr_read_from_stream does, but in reverse?
The use case is that we'd like to pull a whole mess of data from an existing data warehouse, and it would be nice if we could pull all the multi-value data (many, many joins deep) over in one trip. We are not too concerned with the size of this object, as we plan on pulling this data one way or another.
The pzPVStream is a compressed BLOB and it represents a work object. It is compressed and stored as a single column in a table.
When it is read using obj-browse or obj-open activities, the blob is decompressed and all the encompassed properties are mapped to the clipboard.
This value has a proprietary format; the values are obfuscated.

What exactly are hashtables?

What are they and how do they work?
Where are they used?
When should I (not) use them?
I've heard the word over and over again, yet I don't know its exact meaning.
What I heard is that they allow associative arrays by sending the array key through a hash function that converts it into an int and then uses a regular array. Am I right with that?
(Notice: This is not my homework; I go to school, but they only teach us the basics in informatics.)
Wikipedia seems to have a pretty nice answer to what they are.
You should use them when you want to look up values by some index.
As for when you shouldn't use them... when you don't want to look up values by some index (for example, if all you want to ever do is iterate over them.)
You've about got it. They're a very good way of mapping from arbitrary things (keys) to arbitrary things (values). The idea is that you apply a function (a hash function) that translates the key to an index into the array where you store the values; the hash function's speed is typically linear in the size of the key, which is great when key sizes are much smaller than the number of entries (i.e., the typical case).
The tricky bit is that hash functions are usually imperfect. (Perfect hash functions exist, but tend to be very specific to particular applications and particular datasets; they're hardly ever worthwhile.) There are two approaches to dealing with this, and each requires storing the key with the value: one (open addressing) is to use a pre-determined pattern to look onward from the location in the array with the hash for somewhere that is free, the other (chaining) is to store a linked list hanging off each entry in the array (so you do a linear lookup over what is hopefully a short list). The cases of production code where I've read the source code have all used chaining with dynamic rebuilding of the hash table when the load factor is excessive.
Good hash functions are one-way functions that allow you to create a distributed value from any given input. Therefore, you will get somewhat unique values for each input value. They are also repeatable, such that any input will always generate the same output.
An example of a good hash function is SHA1 or SHA256.
Let's say that you have a database table of users. The columns are id, last_name, first_name, telephone_number, and address.
While any of these columns could have duplicates, let's assume that no rows are exactly the same.
In this case, id is simply a unique primary key of our making (a surrogate key). The id field doesn't actually contain any user data because we couldn't find a natural key that was unique for users, but we use the id field for building foreign key relationships with other tables.
We could look up the user record like this from our database:
SELECT * FROM users
WHERE last_name = 'Adams'
AND first_name = 'Marcus'
AND address = '1234 Main St'
AND telephone_number = '555-1212';
We have to search through 4 different columns, using 4 different indexes, to find my record.
However, you could create a new "hash" column, and store the hash value of all four columns combined.
String myHash = myHashFunction("Marcus" + "Adams" + "1234 Main St" + "555-1212");
You might get a hash value like AE32ABC31234CAD984EA8.
You store this hash value as a column in the database and index on that. You now only have to search one index.
SELECT * FROM users
WHERE hash_value = 'AE32ABC31234CAD984EA8';
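For that single lookup to be fast, the hash_value column itself gets an index, something like this (a sketch; the index name is made up):
CREATE INDEX users_hash_value_idx ON users (hash_value);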
Once we have the id for the requested user, we can use that value to look up related data in other tables.
The idea is that the hash function offloads work from the database server.
Collisions are not likely. If two users have the same hash, it's most likely that they have duplicate data.
