Documentum Document Content Share Location from SQL

Documentum Document Content Share Location from SQL - oracle11g

I am trying to sort out how to find the physical location of a file on a mapped documentum share. There are several ways to do it using the API or DQL, but neither of those will scale to what we need to migrate data out of the system. Ultimately the plan is to migrate all data out and into a new system, but we need the file locations to plan this out.
The following resources have been helpful:
https://robineast.wordpress.com/2007/01/24/where-is-my-content-stored/
https://community.emc.com/thread/51958?start=0&tstart=0
Running this DQL will give us the location, but the SQL provided does not return any data relevant to what we're trying to accomplish (or anything at all).
execute GET_PATH for '<parent_id_goes_here>'
Result:
t:\documentum\data\schema\storage_volume_number\00000000\80\01\ef\63.xlsx
Additionally, using the API with getpath returns valid data, but when choosing to show the SQL is gives the same query (a little further down) which doesn't actually give the location of the file.
API>getpath,c,<r_object_id>
...
t:\documentum\data\schema\storage_volume_number\00000000\80\01\ef\63.xlsx
This is the query provided with both when you choose 'Show the SQL'.
select a.r_object_id, b.audit_attr_names, a.is_audittrail,
a.event, a.controlling_app, a.policy_id,
a.policy_state, a.user_name, a.message,
a.audit_subtypes, a.priority, a.oneshot,
a.sendmail, a.sign_audit
from dmi_registry_s a, dmi_registry_r b
where a.r_object_id = b.r_object_id and a.registered_id = :p0 and (a.event = 'all' or a.event = 'dm_all' or a.event = :p1)
order by a.is_audittrail desc, a.event desc,
a.r_object_id, b.i_position desc;
:p0 = < parent_id >;
:p1 = dm_getfile
The above query returns nothing in PL/SQL, and removing the :p0/:p1 variables just returns audit data.
Any guidance on how to get this using SQL, or a DQL script that could be written to give the path and r_object_id in a CSV to join? I'm also open to other ideas of pulling data out of this system.

After a lot of digging I found that the best way to go about this is to convert the data ticket into your path. To quote the articles linked in the question:
The trick to determining the path to the content is in decoding the data_ticket's 2’s complement decimal value. Convert the data_ticket to a 2’s compliment hexadecimal number by first adding 2^32 to the number and then converting it to hex. You can use a scientific calculator to do this or grab some Java code off the net.
-2147474649 + 2^32 = (-2147474649 + 4294967296) = 2147492647
converting 2147492647 to hex = 80002327
Now, split the hex value of the data_ticket at every two characters, append it to file_system_path and docbase_id (padded to 8 bits), and add the dos_extension. Viola! you have the complete path to the content file.
C:/Documentum/data/docbase/content_storage_01/0000001/80/ 00/23/27.txt
This PowerShell code will do the conversion for you -- just feed it the data ticket.
$Ticket = -2147474649
$FSTicketInt = $Ticket + [math]::Pow(2, 32)
$FSTicketHex = [Convert]::ToString($FSTicketInt, 16)
$FSTicketPath = ($FSTicketHex -split '(..)' | ? {$_}) -join '\'
Then all you need to do is join the path with the content storage location using [System.IO.Path]::Combine().

Related

How to SELECT a single record in table X with the largest value for X.a WHERE values for fields X.b & X.c are specified

I am using the following query to obtain the current component serial number (tr_sim_sn) installed on the host device (tr_host_sn) from the most recent record in a transaction history table (PUB.tr_hist)
SELECT tr_sim_sn FROM PUB.tr_hist
WHERE tr_trnsactn_nbr = (SELECT max(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_lot = '99524136'
AND tr_part = '6684112-001')
The actual table has ~190 million records. The excerpt below contains only a few sample records, and only fields relevant to the search to illustrate the query above:
tr_sim_sn |tr_host_sn* |tr_host_pn |tr_domain |tr_trnsactn_nbr |tr_qty_loc
_______________|____________|_______________|___________|________________|___________
... |
356136072015140|99524135 |6684112-000 |vattal_us |178415271 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178424418 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178628048 |1.0000000000
356136072015050|99524136 |6684112-001 |vattal_us |178628051 |-1.0000000000
356136072015836|99524137 |6684112-005 |vattal_us |178645337 |-1.0000000000
...
* = key field
The excerpt illustrates multiple occurrences of tr_trnsactn_nbr for a single value of tr_host_sn. The largest value for tr_trnsactn_nbr corresponds to the current tr_sim_sn installed within tr_host_sn.
This query works, but it is very slow, ~8minutes.
I would appreciate suggestions to improve or refactor this query to improve its speed.

Check with your admins to determine when they last updated the SQL statistics. If the answer is "we don't know" or "never" then you might want to ask them to run the following 4gl program which will create a SQL script to accomplish that:
/* genUpdateSQL.p
*
* mpro dbName -p util/genUpdateSQL.p -param "tmp/updSQLstats.sql"
*
* sqlexp -user userName -password passWord -db dnName -S servicePort -infile tmp/updSQLstats.sql -outfile tmp/updSQLtats.log
*
*/
output to value( ( if session:parameter <> "" then session:parameter else "updSQLstats.sql" )).
for each _file no-lock where _hidden = no:
put unformatted
"UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."
'"' _file._file-name '"' ";"
skip
.
put unformatted "commit work;" skip.
end.
output close.
return.
This will generate a script that updates statistics for all table and all indexes. You could edit the output to only update the tables and indexes that are part of this query if you want.
Also, if the admins are nervous they could, of course, try this on a test db or a restored backup before implementing in a production environment.

I am posting this as a response to my request for an improved query.
As it turns out, the following syntax features two distinct features that greatly improved the speed of the query. One is to include tr_domain search criteria in both main and nested portions of the query. Second is to narrow the search by increasing the number of search criteria, which in the following are all included in the nested section of the syntax:
SELECT tr_sim_sn,
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_trnsactn_nbr IN (
SELECT MAX(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_part = '6684112-001'
AND tr_lot = '99524136'
AND tr_type = 'ISS-WO'
AND tr_qty_loc < 0)
This syntax results in ~0.5s response time. (credit to my colleague, Daniel V.)
To be fair, this query uses criteria outside the originally stated parameters that were included in the original post, making it difficult to impossible for others to attempt a reasonable answer. This omission was not on purpose of course, rather due to being fairly new to fundamentals of good query design. This query in part is a result of learning that when too-few or non-indexed fields are used as search criteria in a large table, it is sometimes helpful to narrow the search by increasing the number of search criteria items. The original had 3, this one has 5.

How to pass a predefined variable value into the part of a predefined hard coded string in Python?

I have below hard coded string which i read from our configuration table in MySql.(And there are many hard coded {} in our database, I can't go to mysql to recognize all the variables and handle it manually in the script.)
string_from_sql_table = '/distribution/{partner}/{process_date}/'
I have below predefined variables in my python script.
from datetime import datetime
string_from_sql_table = '/distribution/{partner}/{process_date}/' #to keep simple, didn't include code to read the string from mysql table.
partner = 'startupco'
process_date = datetime.today().strftime('%Y%m%d_%H%M')
I got below output when I print(string_from_sql_table ):
/distribution/{partner}/{process_date}/
How to get below expected output without changing the input string_from_sql_table or changing the input string_from_sql_table:
/distribution/startupco/20191011_1317/

Simple
'/distribution/{partner}/{process_date}/'.format(
partner = 'startupco',
process_date = datetime.today().strftime('%Y%m%d_%H%M')
)
Read more on the topic here https://docs.python.org/3/library/stdtypes.html#str.format

U-SQL Ignore Empty Files

I receive a daily dump of files from a data provider. On occasion we receive empty files (20bytes). Is there any way to automatically avoid processing or skip these files?
I have tried:
USING Extractors.Csv(skipFirstNRows:1, silent:true);
But I seem to get a vertex failure related to what I believe is the empty files.

We recently added a FILE.LENGTH property as a computed virtual column that you can use to filter out files of a certain size.
For example the following should only operate on the files that are larger than 20 bytes:
#data =
EXTRACT
// ... columns to extract
, file_sz = FILE.LENGTH()
FROM "/mydata/{*}"
USING Extractors.Csv();
#res =
SELECT *
FROM #data
WHERE file_sz > 20;

How to get programatically the file path from a directory parameter in abap?

The transaction AL11 returns a mapping of "directory parameters" to file paths on the application server AFAIK.
The trouble with transaction AL11 is that its program only calls c modules, there's almost no trace of select statements or function calls to analize there.
I want the ability to do this dynamically, in my code, like for instance a function module that took "DATA_DIR" as input and "E:\usr\sap\IDS\DVEBMGS00\data" as output.
This thread is about a similar topic, but it doesn't help.
Some other guy has the same problem, and he explains it quite well here.

I strongly suspect that the only way to get these values is through the kernel directly. some of them can vary depending on the application server, so you probably won't be able to find them in the database. You could try this:
TYPE-POOLS abap.
TYPES: BEGIN OF t_directory,
log_name TYPE dirprofilenames,
phys_path TYPE dirname_al11,
END OF t_directory.
DATA: lt_int_list TYPE TABLE OF abaplist,
lt_string_list TYPE list_string_table,
lt_directories TYPE TABLE OF t_directory,
ls_directory TYPE t_directory.
FIELD-SYMBOLS: <l_line> TYPE string.
START-OF-SELECTION-OR-FORM-OR-METHOD-OR-WHATEVER.
* get the output of the program as string table
SUBMIT rswatch0 EXPORTING LIST TO MEMORY AND RETURN.
CALL FUNCTION 'LIST_FROM_MEMORY'
TABLES
listobject = lt_int_list.
CALL FUNCTION 'LIST_TO_ASCI'
EXPORTING
with_line_break = abap_true
IMPORTING
list_string_ascii = lt_string_list
TABLES
listobject = lt_int_list.
* remove the separators and the two header lines
DELETE lt_string_list WHERE table_line CO '-'.
DELETE lt_string_list INDEX 1.
DELETE lt_string_list INDEX 1.
* parse the individual lines
LOOP AT lt_string_list ASSIGNING <l_line>.
* If you're on a newer system, you can do this in a more elegant way using regular expressions
CONDENSE <l_line>.
SHIFT <l_line> LEFT DELETING LEADING '|'.
SHIFT <l_line> RIGHT DELETING TRAILING '|'.
SPLIT <l_line>+1 AT '|' INTO ls_directory-log_name ls_directory-phys_path.
APPEND ls_directory TO lt_directories.
ENDLOOP.

Try the following
data : dirname type DIRNAME_AL11.
CALL 'C_SAPGPARAM' ID 'NAME' FIELD 'DIR_DATA'
ID 'VALUE' FIELD dirname.
Alternatively if you wanted to use your own parameters(AL11->configure) then read these out of table user_dir.

Are there restricted characters in ADO varchars?

We have a simple file browser on our intranet, built using ASP/vbscript. The files are read by the script and added to an ADO Recordset (not connected to a database), so we can sort the contents easily:
Set oFolderContents = oFolder.Files
Set rsf = Server.CreateObject("ADODB.Recordset")
rsf.Fields.Append "name", adVarChar, 255
rsf.Fields.Append "size", adInteger
rsf.Fields.Append "date", adDate
rsf.Fields.Append "type", adVarChar, 255
rsf.Open
For Each oFile In oFolderContents
if not left(oFile.Name, 3) = "Dfs" then 'Filter DFS folders
rsf.AddNew
rsf.Fields("name").Value = oFile.Name
rsf.Fields("size").Value = oFile.Size
rsf.Fields("date").Value = oFile.DateCreated
rsf.Fields("type").Value = oFile.Type
end if
Next
In one particular folder we are getting an error:
Microsoft Cursor Engine error '80040e21'
Multiple-step operation generated errors. Check each status value.
This points to the line
rsf.Fields("name").Value = oFile.Name
in the code above.
My initial thought this was caused by a long file name, but I checked the length of all the files in the directory - although some are quite long, all are under the 255 character limit set above (largest is 198 characters long).
The folder in question has nearly 2000 PDFs in it, and I do not have permissions to alter the contents, just read (it's a technical library). The files have a naming convention of "ID# - Paper Title". Some have special characters such as ', &, and ( or ) - could some of these be causing the issue? I don't recall having such a problem before. I tried searching Google for special characters in ADO, but couldn't find anything which seemed relevant.
Thanks :-)

Have you tried using adVarWChar for the name column?

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex