How do I specify the encryption type when using an S3 remote for DVC

I have just started to explore DVC, and I am trying S3 as my DVC remote. When I run the dvc push command, I get the generic error:
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
I know for a fact that I get this error when I don't specify the encryption.
It is similar to running aws s3 cp with the --sse flag, or specifying ServerSideEncryption when using the boto3 library. How can I specify the encryption type when using DVC? Since DVC uses boto3 underneath, there must be an easy way to do this.

Got the answer for this immediately in the DVC Discord channel! By default, no encryption is used; we have to specify which server-side encryption algorithm should be used.
Running dvc remote modify worked for me:
dvc remote modify my-s3-remote sse AES256
There are a bunch of things we can configure here. All this does is add an entry sse = AES256 under the ['remote "my-s3-remote"'] section of the .dvc/config file.
More on this here: https://dvc.org/doc/command-reference/remote/modify
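As a sketch of what that config edit looks like (the bucket URL below is a made-up placeholder, and the remote name is the one from the thread), here is roughly the entry the command adds, reproduced with Python's configparser so the effect is visible without DVC installed:

```python
# Rough sketch of what `dvc remote modify my-s3-remote sse AES256` does to
# .dvc/config, reproduced with configparser. The bucket URL is a placeholder.
import configparser
import io

config_text = """\
[core]
remote = my-s3-remote
['remote "my-s3-remote"']
url = s3://my-bucket/dvc-store
"""

cfg = configparser.ConfigParser()
cfg.read_string(config_text)

section = '\'remote "my-s3-remote"\''   # DVC quotes remote section names like this
cfg[section]["sse"] = "AES256"          # the entry `dvc remote modify` adds

out = io.StringIO()
cfg.write(out)
print("sse = AES256" in out.getvalue())  # True
```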

Related

AWS EMR writing to KMS Encrypted S3 Parquet Files

I am using AWS EMR 5.0, Spark 2.0, Scala 2.11, S3 encrypted with KMS (SSE with a custom key), and Parquet files. I can read the encrypted Parquet files with no problem. However, when I write, I get a warning. The simplified code looks like:
val headerHistory = spark.read.parquet("s3://<my bucket>/header_1473640645")
headerHistory.write.parquet("s3://<my bucket>/temp/")
but generates a warning:
16/09/15 13:11:11 WARN S3V4AuthErrorRetryStrategy: Attempting to re-send the request to my bucket.s3.amazonaws.com with AWS V4 authentication. To avoid this warning in the future, please use region-specific endpoint to access buckets located in regions that require V4 signing.
Do I need an option? Do I need to set some environment variable?
Thank you for providing additional details.
Yes, it is a known issue with KMS+SSE when using EMRFS (the library under the hood for S3 communication).
The problem is that when server-side encryption with KMS is enabled, the S3 client in EMRFS crafts the request without specifying the signer type.
To be conservative, S3 tries V2 first, then retries with V4 if the first attempt fails. This behavior slows down the overall process.
EMRFS will be patched to use V4 on the first attempt; this should be fixed in the next EMR release.
As mentioned, it doesn't break the job.
Please keep an eye out for the coming emr-5.x releases (no ETA):
https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-whatsnew.html

Encrypt/Decrypt - Cipher - JceSecurity Restriction

I am trying to encrypt/decrypt data using javax.crypto.Cipher, with the transformation AES/ECB/PKCS5Padding.
My problem is that when I run the code on my local machine, encryption/decryption works fine; however, when I run the same code on the server, the system throws an exception during Cipher.init().
On doing a detailed analysis and checking the code inside Cipher.java, I found the problem is inside the method Cipher.initCryptoPermission(), where the system checks JceSecurity.isRestricted().
On my local machine, JceSecurity.isRestricted() returns FALSE; however, when it runs on the server, the same method returns TRUE. Because of this, the system on the server does not assign the right permissions to the Cipher.
I am not sure where exactly the JceSecurity restriction is set. I appreciate your help.
On deeper investigation I found the real problem and the solution.
Under JAVA_HOME/jre/lib/security there are two jar files, local_policy.jar and US_export_policy.jar. Inside local_policy.jar there is a file called default_local.policy, which stores all the cryptography permissions.
On my local machine the file had AllPermission, so there was no restriction in JceSecurity for me and I was allowed to use the AES encryption algorithm; the server, however, has the limited version that ships with the Java bundle.
Replacing local_policy.jar with the unrestricted (unlimited-strength) version did the trick.
Reading more about this, I found that Java provides the restricted version in the download package because some countries have restrictions on which cryptography algorithms may be used, so you should check with your organisation before replacing the jar files.
Jar files with no restrictions can be found on the Oracle (Java) site at the following location: Download link
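To see which policy is installed, you can inspect default_local.policy inside the jar directly. A hedged sketch: a tiny stand-in jar is built here so the snippet runs anywhere; on a real JRE you would point zipfile.ZipFile at $JAVA_HOME/jre/lib/security/local_policy.jar instead.

```python
# Sketch: inspecting default_local.policy inside a policy jar. We build a
# stand-in jar in memory; on a real JRE, open local_policy.jar itself.
import io
import zipfile

# Stand-in for an unlimited-strength local_policy.jar, whose policy grants
# CryptoAllPermission (the restricted one grants only limited key lengths):
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "default_local.policy",
        "grant {\n    permission javax.crypto.CryptoAllPermission;\n};\n",
    )

with zipfile.ZipFile(buf) as zf:
    policy = zf.read("default_local.policy").decode()

print("CryptoAllPermission" in policy)  # True -> unrestricted policy
```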

Communicate with a client/server application in R

Is it possible to communicate with a client/server application by calling the system command in R?
I use BaseX for storing XML databases, and I want to call a BaseX client from R using the system command, after launching the BaseX server manually:
setwd("C:/Program Files/BaseX/bin/")
system("basexclient -U admin -P admin",wait = TRUE)
BaseX 8.1 [Client]
Try help to get more information.
The problem is that R can't communicate with the BaseX client, and as a consequence I get this error:
Child process not responding. R will terminate it.
I tried changing the wait parameter to wait = FALSE and then executing a BaseX command, but it seems that it can't communicate with the client either:
system("OPEN mydatabse",wait = FALSE)
object "mydatabse" not found
Any suggestions you can provide will be appreciated.
N.B.: The same problem occurs with Java.
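The underlying issue is that each system() call starts a brand-new OS process, so the second call never reaches the already-running BaseX client. What is needed is a persistent connection to one long-lived process (or, better, to the server's TCP port). A minimal Python sketch of the pipe-based pattern (in R, pipe() or the processx package plays the same role); `cat` stands in for the interactive client here, since it simply echoes its input back:

```python
# Sketch: keep ONE child process alive and write commands to its stdin over a
# pipe, instead of spawning a fresh process per command. `cat` stands in for
# the real interactive client.
import subprocess

proc = subprocess.Popen(
    ["cat"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
out, _ = proc.communicate("OPEN mydatabase\n")  # goes to the SAME process
print(out.strip())  # OPEN mydatabase
```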

Running AWS commands from commandline on a ShellCommandActivity

My original problem is that I want to increase my DynamoDB write throughput before I run the pipeline, and then decrease it when I'm done uploading (I do this at most once a day, so I'm fine with the limitations on decreasing).
The only way I found to do it is through a shell script that issues the API commands to alter the throughput. How does that work with my AWS access_key and secret_key when it's a resource that the pipeline creates for me? (I can't log in to set the ~/.aws/config file, and I don't really want to create an AMI just for this.)
Should I write the script in bash? Can I use the Ruby/Python AWS SDK packages, for example? (I prefer the latter.)
How do I pass my credentials to the script? Are there runtime variables (like #startedDate) that I can pass as arguments to the activity with my key and secret? Do I have any other way to authenticate with either the command-line tools or the SDK packages?
If there is another way to solve my original problem, please let me know. I only got to the ShellCommandActivity solution because I couldn't find anything else in the documentation/forums.
Thanks!
OK, found it: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-roles.html
The resourceRole in the default object of your pipeline is the one assigned to resources (Ec2Resource) that are created as part of the pipeline activation.
The default one is configured to have all your permissions, and the AWS command-line tools and SDK packages automatically look for those credentials, so there is no need to update ~/.aws/config or pass credentials manually.
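Since the question asks about the Python SDK: the relevant API is DynamoDB's UpdateTable. A hedged sketch that only builds the request parameters (the table name and capacity numbers are made up); on the pipeline-created EC2 resource, boto3 picks up the instance role's credentials automatically:

```python
# Sketch: build the parameters for DynamoDB's UpdateTable API, used to raise
# write throughput before a bulk load. Table name and numbers are made up.
def update_throughput_params(table_name, read_units, write_units):
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }

params = update_throughput_params("my-table", 5, 1000)
print(params["ProvisionedThroughput"]["WriteCapacityUnits"])  # 1000

# With boto3 installed and an instance role allowing dynamodb:UpdateTable:
#   import boto3
#   boto3.client("dynamodb").update_table(**params)
```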

Net.exe use 'Error: A command was used with conflicting switches.' while using /savecred

I am trying to use the following command to map a drive in persistent mode, without it asking for login credentials every time I reboot the machine:
net use P: \\server\folder Password123 /user:user123 /savecred /persistent:yes
But I am getting the following error:
A command was used with conflicting switches.
More help is available by typing NET HELPMSG 3510.
I followed this article: http://pcsupport.about.com/od/commandlinereference/p/net-use-command.htm
Please help with this issue.
When we use the /savecred switch, we should not give the credentials on the same line. The correct command is:
net use P: \\server\folder /savecred /persistent:yes
It will then prompt for the username and password.
You can add your credentials to the Windows Vault and then map your drive; this way you can avoid the limitation sabertooth1990 mentioned:
CMDKEY /add:%server% /user:%username% /pass:%password%
NET USE \\%server%\%folder% %localdrive% /SAVECRED /PERSISTENT:YES
