Writing huge processed text into a file from MarkLogic - xquery

I have a very large amount of processed text content (the processing is done using XQuery in MarkLogic Server), which I need to write into a text/CSV file outside of MarkLogic Server. When I use a standard API function like xdmp:save(), it takes almost 4-5 minutes. What is the best and ideal way to reduce the writing time?

If timeouts are an issue, you can always extend the time limit up to the configured maximum using xdmp:request-set-time-limit.
Instead of writing the output directly to a file on the MarkLogic filesystem, why not expose that query as an endpoint and have the client retrieve the contents, either:
returned directly when the client executes that module, or
saved as a document in the database, returning the URI that the client can then fetch via GET /v1/documents.
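On the client side, the retrieval can be a streaming download, so the large result is never held in memory all at once. A minimal Python sketch using only the standard library, assuming the result is reachable at some URL (for example a hypothetical `http://localhost:8000/v1/documents?uri=/reports/out.csv`; host, port and URI are placeholders) and leaving authentication out:

```python
import shutil
import urllib.request


def download_to_file(url: str, dest_path: str, chunk_size: int = 1 << 20) -> int:
    """Stream the body at `url` into `dest_path` in fixed-size chunks.

    Returns the number of bytes written. Authentication is deliberately
    omitted in this sketch; a real MarkLogic endpoint will require it.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj moves at most chunk_size bytes per read/write,
        # so memory use stays bounded regardless of the response size.
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()
```

The chunked copy is the point of the design: the server produces the content once and the client writes it to its own disk as it arrives.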
Another option for generating giant CSV reports would be to leverage tools such as CoRB to execute a batch job and collect the results in a CSV file.
https://github.com/marklogic-community/corb2/wiki/Hello-World-from-CORB

Related

How to get filtered list of files from SFTP server using SSHJ [duplicate]

I am using the SSHJ SFTP library to get a file list from an SFTP server.
The connection to the server is very slow and there are tens of thousands of files in the directory. Getting the file list often ends in various timeout/socket errors.
Is it possible to tell the client to retrieve the file list only for, e.g., ".zip" files, so that it would have a positive impact on performance? Pseudo command: sftpClient.ls("*.zip")
I know there is a method List<RemoteResourceInfo> net.schmizz.sshj.sftp.SFTPClient.ls(String path, RemoteResourceFilter filter) which will filter the list, but from what I understand, the filtering happens only on the client side, i.e. the client would still receive the whole file list and only then filter it?
Is there any way to achieve this so that the server only returns the requested names? Does the SFTP protocol even support this?
Indeed, the SFTP protocol does not have a way to provide a list of files matching any criteria. It does not matter what SFTP library you are using.
You would have to use another interface/API if you need the filtered list. If you have shell access, you could use the shell command ls *.zip.
Or build your own (REST?) API.
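To make the client-side nature of such filters concrete: whatever filter a library accepts runs only after the server has sent every directory entry. A Python sketch (standard library only; `names` is a plain list standing in for a complete directory listing, not SSHJ's API):

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_listing(names: Iterable[str], pattern: str = "*.zip") -> List[str]:
    """Keep only the names that match a shell-style glob (case-sensitive).

    This runs on the client: `names` must already hold the complete
    directory listing, so the network cost of listing is unchanged.
    """
    return [name for name in names if fnmatchcase(name, pattern)]
```

For example, `filter_listing(["a.zip", "b.txt", "d.zip"])` keeps `a.zip` and `d.zip`, but all three names still had to cross the wire first.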

MarkLogic: I don't know how to get all the results

Hello, I am trying to read a module with this code:
(: Entry point - must be a read-only query. :)
xdmp:invoke(
'/path/mydocument.xqy',
(xs:QName('var1'), 'test',
xs:QName('var2'), "response"))
I am new to MarkLogic. I am using Groovy and the API to connect to it, but I also saw that I can invoke the module with this, and indeed I did, but it returns:
your query returned an empty sequence
I want to know whether I can query xs:QName('var1'), 'test' with test replaced by a wildcard, or how I can get all the information from the file called /path/mydocument.xqy.
I tried to use this:
xdmp:document-get("/path/mydocument.xqy")
but it says the file is not found. If I use invoke I can query it, but I don't know which values I have to pass. I was wondering whether there is something like SQL's % wildcard to give me all the data.
To answer the first question: "I am trying to read a module"
If the module is in the database, then you must query the Modules database in which the module resides.
If the module is on the filesystem, then you cannot access its source directly as a document, but you can read it by calling xdmp:filesystem-file().
Simplification:
With the default configuration of the server and REST client, user-placed modules are in the "Modules" database and user-placed documents are in the "Documents" database. This means that if you do a GET (read a "Document") with no additional parameters, it will return documents from the "Documents" database. Assuming you are using the default configuration for client and server, this would result in the behavior you are seeing: your module code is in the Modules database, so a GET for it by name searches the Documents database and, correctly, does not find it.
You don't mention, and I don't know, the Groovy library being used, but the REST API itself and all implementations of general-purpose MarkLogic REST client libraries I am familiar with have options for overriding the default database with another. If the Groovy library supports that, then specify the "Modules" database for your query and it should return the module document. Note: the content type will be application/text, not text/xml.
You can simplify things for testing by bypassing the libraries and simply using a browser to try a URL like this: http://yourserver.com:8000/v1/documents?uri=/your/module.xqy&database=Modules
Ref: https://docs.marklogic.com/REST/GET/v1/documents
Make the appropriate changes to the path and server for your setup.
If you are still confused, then you should start with the basic MarkLogic tutorials and work through them one by one. You will most likely succeed faster by doing this than by jumping straight into coding you don't understand yet.
DETAIL:
Note: The default behaviour is to EXECUTE documents when doing a GET call, using the Modules database. Thus doing a GET of http://yourserver:8000/your/module.xqy will EXECUTE it, not return its source.
You will notice the REST API has a uri query parameter. The request actually EXECUTES the REST API code at /v1/documents, which in turn reads the document specified by the uri and database parameters and returns it.
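The browser test can also be scripted. A minimal Python sketch using only the standard library, assuming the default REST instance on port 8000 with digest authentication; the host, user and password are placeholders, not values from the question:

```python
import urllib.parse
import urllib.request


def module_source_url(host: str, uri: str, port: int = 8000, database: str = "Modules") -> str:
    """Build a GET /v1/documents URL that reads `uri` from `database`."""
    query = urllib.parse.urlencode({"uri": uri, "database": database})
    return f"http://{host}:{port}/v1/documents?{query}"


def fetch_module_source(host: str, uri: str, user: str, password: str, port: int = 8000) -> str:
    """Fetch a module's source text (not its execution result) via digest auth."""
    url = module_source_url(host, uri, port)
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(password_mgr))
    with opener.open(url) as response:
        return response.read().decode("utf-8")
```

Usage (placeholders): `fetch_module_source("yourserver.com", "/path/mydocument.xqy", "admin", "secret")`. Because the request goes to /v1/documents with database=Modules, it returns the module's source instead of executing it.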
I guess I can use:
xdmp:invoke("/pview/get-pview-browse-profiles.xqy",
cts:and-query((
cts:element-value-query(
xs:QName("letter"),"*", "wildcarded"),
cts:element-value-query(
xs:QName("collection"),"*", "wildcarded"))))
although it doesn't return anything

Is it possible to access elasticsearch's internal stats via Kibana

I can see from querying our Elasticsearch nodes that they contain internal statistics, for example showing disk, memory and CPU usage (via the GET _nodes/stats API).
Is there any way to access these in Kibana 4?
Not directly, as Elasticsearch doesn't natively push its internal statistics to an index. However, you could easily set something like this up on a *nix box:
Poll your Elasticsearch box via REST periodically (say, once a minute). The _nodes/stats, /_status or /_cluster/health endpoints probably contain what you're after.
Pipe these to a log file in a simple CSV format along with a timestamp.
Point Logstash to these log files and forward the output to your Elasticsearch box.
Graph your data in Kibana.
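The polling and CSV steps could look like the Python sketch below (standard library only). The chosen fields are an assumption for illustration; /_cluster/health returns these keys among others, and the node-level disk/memory/CPU figures would come from _nodes/stats instead. The base URL and log path are placeholders:

```python
import csv
import io
import json
import time
import urllib.request
from datetime import datetime, timezone

# Illustrative field selection; adjust to whatever you want to graph.
FIELDS = ["status", "number_of_nodes", "active_shards", "unassigned_shards"]


def health_to_csv_row(health: dict, when: datetime) -> str:
    """Format selected health fields as one CSV line, prefixed by an ISO timestamp."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([when.isoformat()] + [health.get(f, "") for f in FIELDS])
    return buf.getvalue()


def poll_forever(base_url: str, log_path: str, interval_s: float = 60.0) -> None:
    """Poll {base_url}/_cluster/health and append one CSV line per poll."""
    while True:
        with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
            health = json.load(resp)
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(health_to_csv_row(health, datetime.now(timezone.utc)))
        time.sleep(interval_s)
```

Run `poll_forever("http://localhost:9200", "/var/log/es-health.csv")` under cron or a process supervisor, then point Logstash at the resulting file.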

What is the best way to upload a CSV file into an MS SQL table?

Several approaches:
Use a SQL bulk import stored proc and call the stored proc with the file path
Use SqlBulkCopy from the System.Data.SqlClient DLL
Read the file line by line and then insert into a table row by row
Any other ways?
Which one is best? I just want the user to select a file from an ASP.NET web page and then click an Upload button to store the file in the DB.
Secondly, do I need to move the file into the server's memory before it is copied into the DB table?
The DB won't know of the file, because everything should be decoupled and layered. Saving the file to some shared location adds overhead, tidy-ups, etc.
Yes
Surely you want this to be an atomic operation. 10k rows would be 10k round trips with a client-side transaction running. If it is not atomic, then you'd need staging tables and tidy-ups.
Parse in C# and send to the DB with a table-valued parameter. Otherwise, there is probably nothing sane and/or realistic...
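The "parse in memory, send everything in one atomic batch" idea can be sketched in Python. SQL Server table-valued parameters need a driver such as pyodbc, so this sketch swaps in the standard library's sqlite3 purely as a stand-in database; the table name `uploads` and its two columns are hypothetical. The uploaded bytes are parsed without saving a file, and all rows are inserted in one transaction, so either every row lands or none does:

```python
import csv
import io
import sqlite3


def import_csv_bytes(conn: sqlite3.Connection, payload: bytes) -> int:
    """Parse an uploaded CSV (header row + data rows) in memory and insert atomically."""
    reader = csv.reader(io.StringIO(payload.decode("utf-8")))
    next(reader)  # skip the header row
    rows = [(int(r[0]), r[1]) for r in reader if r]  # parse fully before touching the DB
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.executemany("INSERT INTO uploads (id, name) VALUES (?, ?)", rows)
    return len(rows)
```

With SQL Server, the `executemany` call would be replaced by a single call passing the parsed rows as a table-valued parameter, which keeps it to one round trip.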

Replacing SQLite database while accessing it

I am completely new to SQLite and I intend to use it in a M2M / client-server environment where a database is generated on the server, sent to the client as a file and used on the client for data lookup.
The question is: can I replace the whole database file while the client is using it at the same time?
The question may sound silly, but the client is a Linux thin client, and to replace the database file, a temporary file would be renamed to the final file name. In Linux, a program that still has the older version of the file open will keep accessing the older data, since the OS preserves the old file until all file handles to it have been closed. Only new open() calls will access the new version of the file.
So, in short:
client randomly accesses the SQLite database
a new version of the database is received from the server and written to a temporary file
the temporary file is renamed to the SQLite database file
I know it is a very specific question, but maybe someone can tell me if this would be a problem for SQLite or if there are similar methods to replace a database while the client is running. I do not want to send a bunch of SQL statements from the server to the client to update the database.
No, you cannot just replace an open SQLite3 DB file. SQLite will keep using the same file descriptor (or handle in Windows-speak), unless you close and re-open your database. More specifically:
Deleting and replacing an open file is either useless (Linux) or impossible (Windows). SQLite will never get to see the contents of the new file at all.
Overwriting an SQLite3 DB file is a recipe for data corruption. From the SQLite3 documentation:
Likewise, if a rogue process opens a database file or journal and writes malformed data into the middle of it, then the database will become corrupt.
Arbitrarily overwriting the contents of the DB file can cause a whole pile of issues:
If you are very lucky it will just cause DB errors, forcing you to reopen the database anyway.
Depending on how you use the data, your application might just crash and burn.
Your application may try to apply an existing journal on the new file. Sounds painful? It is!
If you are really unlucky, the user will just get back invalid results from any queries.
The best way to deal with this would be a proper client-server implementation where the client DB file is updated from data coming from the server. In the long run that would allow for far more flexibility, while also reducing the bandwidth requirements by sending updates, rather than the whole file.
If that is not possible, you should update the client DB file in three discrete steps:
Send a message to the client application to close the DB. This allows the application to commit any changes, remove any journal files and clean up its internal state.
Replace/Overwrite the file.
Send a message to the client application to re-open the DB. You would have to set up all prepared statements again, though.
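The three steps map onto a small Python sketch using the standard library's sqlite3 module. It assumes the new file is on the same filesystem as the database (so `os.replace` is an atomic rename) and that this connection is the only one open; the messaging between server and client is left out:

```python
import os
import sqlite3


def swap_database(conn: sqlite3.Connection, db_path: str, new_file: str) -> sqlite3.Connection:
    """Close `conn`, move `new_file` over `db_path`, and return a fresh connection.

    Assumes `new_file` lives on the same filesystem as `db_path` and that
    no other connection to `db_path` is open.
    """
    conn.commit()  # step 1: commit pending work ...
    conn.close()   # ... and release the file handle and any journal
    os.replace(new_file, db_path)  # step 2: replace the file (atomic rename)
    return sqlite3.connect(db_path)  # step 3: reopen; re-prepare statements here
```

Closing first matters because a journal left behind for the old file could otherwise be applied to the new one.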
If you do not want to close the DB file for some reason, then you should have your application - or even a separate process - update the original DB file using the new file as input. The SQLite3 backup API might be of interest to you in that case.
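Python's standard library exposes the SQLite online backup API as `Connection.backup` (Python 3.7+). A minimal sketch, assuming the received file is a complete SQLite database; the open connection is the backup *target*, so its contents are replaced through SQLite's own locking rather than by overwriting the file underneath it:

```python
import sqlite3


def refresh_from_file(conn: sqlite3.Connection, new_file: str) -> None:
    """Copy the database stored in `new_file` into the already-open `conn`."""
    # Python's backup() refuses a target that has an open transaction.
    conn.commit()
    src = sqlite3.connect(new_file)
    try:
        src.backup(conn)  # overwrite conn's main database with src's pages
    finally:
        src.close()
```

After the call, queries on `conn` see the new data without reopening, and the received file can then be deleted.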
