Apache Tika with Encrypted PDF

I wanted to extract PDF content using the Apache Tika library. All was good until I encountered a PDF protected with a username and password.
It fails with the errors below:
INFO Document is encrypted
org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
Caused by: org.apache.pdfbox.exceptions.CryptographyException: Cannot find an appropriate security handler for Adobe.APS
at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:952)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:139)
... 4 more
Does anyone know whether Apache Tika supports extraction from PDFs with this security feature?

You can try the approach below; it worked for me:
// Supply the document password to Tika through the ParseContext
PasswordProvider pp = (metadata) -> "password";
ParseContext context = new ParseContext();
context.set(PasswordProvider.class, pp);
// Then pass the context when parsing, e.g.:
// new AutoDetectParser().parse(stream, handler, new Metadata(), context);

Related

DocuSign .NET SDK - Esign DLL - Dealing with changes in Docusign Config Params regarding Error: 'Unexpected PEM type'

In Esign version 4.1.1, the VS2019 DocuSign project code generators produce this type of config file:
Note that the developer must copy and paste the private key generated on the DocuSign "Quick Start" page into the VS2019 DocuSign Project Wizard. The key is converted into a string, with each line of the original key file represented by a carriage return.
Using the private key value in this fashion, inline with all the other params, was very convenient.
This "RSAKey" param value does work with the 4.1.1 version. But does not with the 5.2 version.
In the Esign 5.2 version, we are now in the ASP.NET Core 3.1/.NET 5 style of code, so we now have this configuration file format:
This won't work with Esign 5.2. I surmise the change in 5.2 is this: the DocuSign server generates a hash of the key file, and if the hash of the key file submitted by an external client does not match, an "Unknown PEM File" error is sent back. I am trying to highlight the nuance that the first "gate" on the DocuSign server checks the file itself, not the RSA key inside the file.
The ramification, if true, is that we now have to treat the key file with kid gloves. If I wanted to store/retrieve this file from a remote source, I would need to take great care that not a single byte was changed/added/removed. This will require careful testing. As you can see from my sample appsettings.json above, I am forced to add "KeyFilePath" param in order to grab the physical file, which means I must always have it on hand in my project or be able to remotely load it (intact byte-wise) from a remote source. This increases the burden on the developer and maintenance staff considerably.
Ideally, what we need is a way to put the key-file-as-a-string back into the config params.
Any ideas appreciated.
One way to solve this is by using https://base64.guru/.
Using the "File-to-Base64" option allowed me to provide the key as a string parameter in a normal config file.
Then the C# code to use it looks like this:
// Load the config (including the Base64-encoded private key) into an object
var cred = LoadDocusingConfigIntoObject();
// Decode the Base64 string back into the original key file bytes
byte[] buffer = Convert.FromBase64String(cred.PrivateKey);
this.OAuthToken = docusignClient.RequestJWTUserToken(
    cred.IntegrationKey,
    cred.ImpersonatedUserId,
    cred.AuthServer,
    buffer,
    1,      // token lifetime in hours
    scopes);
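If you would rather not paste the key into a third-party website, the same Base64 string can be produced locally. A minimal sketch, assuming a hypothetical key file path; because the raw bytes are encoded, decoding recovers the file byte-for-byte, which matters given the hashing behavior described above:

using System;
using System.IO;

class KeyToBase64
{
    static void Main()
    {
        // Read the PEM key file exactly as-is, byte-for-byte
        byte[] keyBytes = File.ReadAllBytes("docusign_private.key"); // hypothetical path
        // Base64-encode; Convert.FromBase64String will recover an identical byte sequence
        Console.WriteLine(Convert.ToBase64String(keyBytes));
    }
}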

Access Denied for the signed URL .NET CloudFront

I'm trying to set up a CloudFront distribution for my S3 bucket that will only allow users to read or write with signed URLs (read the file, upload, and download).
The S3 bucket doesn't have public read/write permissions.
The CloudFront distribution:
Accepts HTTP and HTTPS.
Has Trusted Signers set to self.
Has Restrict Viewer Access enabled.
Has an origin domain name of origin-domain-name/public.
Lastly, it has an origin access identity of origin-access-identity/cloudfront/XXXXXXX.
I have my CloudFront .pem file and AWS private key ID.
My C# code to generate the signed URL is:
StreamReader sr = new StreamReader("../../keys/CloudFront-PrivateKey.pem");
var url = AmazonCloudFrontUrlSigner.GetCannedSignedURL(
    AmazonCloudFrontUrlSigner.Protocol.http,
    "http://xxxxxxxxxx.cloudfront.net",
    sr,
    "public/AddinJudgeIssue.png",
    "<AWS Private Key ID>",
    DateTime.Now.AddDays(2));
Each time I execute the code it generates a URL, but when I paste that URL into a browser, it says "access denied".
First of all, does anyone have any idea why this is happening?
Secondly, if this works, can I use the same technique to upload assets to the bucket?
Thank you, and apologies for my ignorance. I dug through the AWS whitepapers but failed to find straightforward guidance.
A look at the documentation suggests two problems:
"http://xxxxxxxxxx.cloudfront.net" should not include http://, because the parameter is distributionDomain and expects the domain name, not the base URL.
"public/AddinJudgeIssue.png" should have a leading /, because this parameter is resourcePath. Paths begin with a / even though object keys don't.
With both fixes applied, the call would look something like the sketch below.
After doing some experimenting, I got it working. Although I used root credentials and .pem keys to generate the signed URL, I still had to give public read/write access to my S3 bucket; that was why I was getting the access denied error. On the CloudFront side, the "restrict bucket access" option restricts access to my bucket anyway.

IIS 7.0+ HTTP PUT Completes, but No File Saved

I'm struggling to figure out what exactly is happening. I am using GdPicture to save a scanned document through JavaScript, using their COM+ code and source project as my starting ground. Long story short, their function issues an HTTP PUT command specifying the file name to be saved.
When I execute the command I see that the request is getting to my server, and it even has the appropriate content size to include the PDF document. I even get a 200 response back to my browser, no errors or anything... yet the PDF doesn't get saved. Is that because PUT isn't the right way to do this? I don't have the option to POST the file because the transfer is wrapped in GdPicture's API.
So, with that said, I have done the following:
Ensured that the IIS_IUSRS group has write permissions to the "Upload" virtual directory
Added a handler that specifically allows the PUT verb for "*.pdf"
Removed the StaticFileHandler for the "Upload" virtual directory
I apologize for the links, but I don't have 10 rep points yet.
PUT Request from FIDDLER
Response
** Edit **
More information about GdPicture: I have already contacted them, and their function is not the problem. The implementation is as simple as:
var status = oGdViewer.SaveDocumentToPDF_2("http://domain.com/Annotation/Upload/" + FileName, "user", "pass");
Thanks!

Encrypt and Decrypt documents through asp.net application

My ASP.NET application is on Web Server A; it displays, and lets users download, MS Word or PDF documents that are stored on Web Server B.
For security reasons, I was advised to encrypt and decrypt those documents when serving them up from Web Server A.
Could anyone give me some clue on how to do that?
I've never used such a utility before. My code just assigns a URL to a link control and lets the user click on it to open an MS Word or PDF document, like:
Dim RemoteFolder As String
Dim RemoteFileName As String
RemoteFolder = "http://192.168.32.98/Application/Documents/"
RemoteFileName = "MyWordDocument.doc"
lnkOpenDocument.NavigateUrl = RemoteFolder + RemoteFileName
Using SSL might help; that protects all requests/responses between the two servers. Otherwise .NET does have an encryption/decryption library under System.Security.Cryptography:
http://support.microsoft.com/kb/307010 (also see this previous post: What's the easiest way to encrypt a file in c#?)
You can always grab the file from the user, encrypt it using one of the above methods, and drop the encrypted file on Web Server B. When reading it, rather than linking directly to the .doc file, link to another ASP.NET page, pass the ID of the file into that new page, and have it pull the file from Web Server B, decrypt it, and display it to the user. A rough sketch of that idea follows.
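A minimal sketch of that decrypt-and-serve page, assuming a classic ASP.NET IHttpHandler, AES encryption, and hypothetical helpers for key retrieval and for mapping the file ID to its location on Web Server B; real code would also validate the ID and handle errors:

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography;
using System.Web;

public class SecureDocHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // Hypothetical: map the requested ID to the encrypted file on Web Server B
        string id = context.Request.QueryString["id"];
        string remoteUrl = "http://192.168.32.98/Application/Documents/" + id + ".enc";

        byte[] encrypted;
        using (var client = new WebClient())
        {
            encrypted = client.DownloadData(remoteUrl);
        }

        using (var aes = Aes.Create())
        {
            aes.Key = LoadKey();   // hypothetical: key from secure configuration
            aes.IV = LoadIV();     // hypothetical: IV stored alongside the file
            using (var decryptor = aes.CreateDecryptor())
            using (var ms = new MemoryStream(encrypted))
            using (var cs = new CryptoStream(ms, decryptor, CryptoStreamMode.Read))
            {
                context.Response.ContentType = "application/msword";
                cs.CopyTo(context.Response.OutputStream);
            }
        }
    }

    public bool IsReusable { get { return false; } }

    // Placeholders only; never hard-code key material
    private byte[] LoadKey() { throw new NotImplementedException(); }
    private byte[] LoadIV() { throw new NotImplementedException(); }
}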

Reading a remote URL in Domino LotusScript

I have a remote RSS feed which has to be transformed into Notes documents using LotusScript.
I've looked through the documentation, but I can't find how to open a remote URL in order to retrieve its contents. In other words, some sort of wget- or curl-like functionality. Can anyone shed some light on how to do this? Using Java is not an option.
Thanks.
Check out the NotesDOMParser class, available in LotusScript, which lets you (indirectly) pull XML from a remote URL and process it in an XML DOM object.
You can pull the XML into a string using the MSXMLHTTP COM object, then use a NotesStream to send the XML to the NotesDOMParser.
I have not tested, but the code would look something like this:
...
Set objXML = CreateObject("Microsoft.XMLHTTP")
objXML.open "GET", sURL, False, "", ""
objXML.send("")
sXMLAsText = Trim$(objXML.responseText)
Set inputStream = session.CreateStream
Call inputStream.WriteText(sXMLAsText)  ' write the XML into the stream
inputStream.Position = 0                ' rewind before parsing
Set outputStream = session.CreateStream
Set domParser = session.CreateDOMParser(inputStream, outputStream)
domParser.Process
...
Documentation: http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.designer.domino.main.doc/H_NOTESDOMPARSER_CLASS.html
You can't open a remote URL (whether it's HTTP or some other protocol) using native LotusScript: the object library simply doesn't support it. If you're running on a Windows server, you should be able to use the MS XMLHttp DLLs to get a handle on your remote file via a URL, as specified by the previous answer. (Alternatively, this link specifies how to parse and open a UNC path with LotusScript, again Windows only.)
All that said, if I understand you correctly, you're not using HTTP to access the remote file at all. If the RSS file is just on a simple path, why can't you open the file for parsing in the normal way with LotusScript?
