I have the following data frame, which I can encrypt using the gpg package and my key.
library(gpg)
df <- data.frame(A=c(1,2,3), B=c("A", "B", "C"), C=c(T,F,F))
df <- serialize(df, con=NULL, ascii=T)
enc <- gpg_encrypt(df, receiver="my#email.com")
writeBin(enc, "test.df.gpg")
Now, in order to restore the data frame, the logical course of things would be to decrypt the file
dec <- gpg_decrypt("test.df.gpg")
df <- unserialize(dec) #throws error !
(prompts for the password correctly) and then unserialize(dec). However, it seems that gpg_decrypt() delivers a sequence of plain characters to "dec" from which it is impossible to restore the original data frame.
I can decrypt the file on the Linux command line with the gpg2 command without problems and then read the decrypted file into R with readRDS(), which restores the original data frame.
However, I want to unserialize() "dec" and thus decrypt the file directly into R.
I know there are other solutions, such as Hadley's secure package, but it doesn't run without problems for me either (described here).
Support for decrypting raw data has been added to the gpg R package. See https://github.com/jeroen/gpg/issues/5
Encrypted data can now be read directly into R's working memory without storing the decrypted file on disk.
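A minimal sketch of what this could look like, assuming a version of the gpg package in which gpg_decrypt() accepts as_text = FALSE and then returns a raw vector instead of a character string. The helper name decrypt_to_object is hypothetical. The first part needs no key at all and only shows why a raw vector matters: unserialize() expects the same raw bytes that serialize() produced.

```r
# serialize() with con = NULL yields a raw vector; unserialize() needs raw input
df <- data.frame(A = c(1, 2, 3), B = c("A", "B", "C"), C = c(TRUE, FALSE, FALSE))
payload <- serialize(df, connection = NULL, ascii = TRUE)
restored <- unserialize(payload)

# Hypothetical helper: decrypt straight into memory and rebuild the object.
# Assumes gpg_decrypt() supports as_text = FALSE (raw output); calling it
# requires the gpg package, a matching private key, and the passphrase.
decrypt_to_object <- function(path) {
  raw_bytes <- gpg::gpg_decrypt(path, as_text = FALSE)
  unserialize(raw_bytes)
}

# Usage (not run here because it prompts for the passphrase):
# df <- decrypt_to_object("test.df.gpg")
```

With the default as_text = TRUE the decrypted bytes are converted to text, which would explain why unserialize(dec) fails in the question.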
Related
I have a shiny app where I want to store userdata on the server and want to encrypt it before storing it. I'd like to use the encryptr package for this but so far I can't make my solution work properly. What I've managed so far is to write the data as a rds file, then encrypt it and delete the unencrypted copy. Ideally however, I'd like to only store the encrypted file. However, when I try to decrypt it again, the file doesn't change at all.
#### Approach with storing file first (works)
# data
data <- mtcars
# saving file
saveRDS(data,"Example.rds")
# keys
genkeys()
# encrypting
encrypt_file("Example.rds")
# deleting unencrypted copy
file.remove("Example.rds")
# unencrypting file
data_decrypted <- decrypt_file("Example.rds.encryptr.bin")
What I would like to do instead is something like this
#### Approach with storing only encrypted file (can't be decrypted again)
# data
data <- mtcars
# keys
genkeys()
# encrypting data
data <- encrypt(colnames(data))
# saving encrypted data
saveRDS(data,"EncryptedData.rds")
# clearing wd
rm(data)
# loading encrypted data
EncryptedData <- readRDS("EncryptedData.rds.encryptr.bin")
# decrypting data
data_decrypted <- decrypt(colnames(EncryptedData))
You seem to be missing the data parameter in your encrypt/decrypt calls and you are opening the wrong file name. Try
data |>
encrypt(colnames(data)) |>
saveRDS("EncryptedData.rds")
rm(data)
EncryptedData <- readRDS("EncryptedData.rds")
data_decrypted <- EncryptedData |> decrypt(colnames(EncryptedData))
Note that we pass the data into encrypt. If you just run encrypt(colnames(data)) without piping data into the function, you should get an error about "no applicable method ...an object of class character". I used the pipe operator |> but you could use regular function calls as well. Then, since you are writing to "EncryptedData.rds", make sure to open that file. The encrypt() function changes your data. It does not have any effect on what the saved file name will be. If you aren't using encrypt_file, the file name will not change.
I'm trying to encrypt a dataframe in R with RSA using the encryptr package. It works fine until I try to decrypt it again which doesn't work for some reason. Yes, I've triple checked whether the password is correct, I've entered it with and without copy/paste and I have tried different passwords and datasets. I suspect I'm doing something wrong in general and I'm happy about any pointers. Here's a reproducible example that works fine for encrypting but doesn't let me decrypt the data again:
# loading test data
data <- mtcars
# generating keys with password
password = "THISISATEST"
genkeys()
# encrypting data
data_encrypt = data %>%
encrypt(colnames(data))
# checking encrypted data
View(data_encrypt)
#decrypting data
data_decrypt <- data_encrypt %>%
decrypt(colnames(data_encrypt))
When it asks for the password again, it isn't rejecting the password.
You need to enter your password once for every column. mtcars has 11 columns, so you need to enter your password 11 times in a row when you decrypt.
I want to read csv file from google cloud storage with a function similar to
read.csv.
I used the googleCloudStorageR library and can't find a function for that. I don't want to download the file; I just want to read it into my environment as a data frame.
If you download a .csv file, googleCloudStorageR will by default parse it into a data.frame for you. You can turn off this behaviour by specifying saveToDisk
# will make a data.frame
gcs_get_object("mtcars.csv")
# save to disk as a CSV
gcs_get_object("mtcars.csv", saveToDisk = "mtcars.csv")
You can specify your own parse function by supplying it via parseFunction
## default gives a warning about missing column name.
## custom parse function to suppress warning
f <- function(object){
suppressWarnings(httr::content(object, encoding = "UTF-8"))
}
## get mtcars csv with custom parse function.
gcs_get_object("mtcars.csv", parseFunction = f)
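As a further sketch of the parseFunction hook, a parser that returns a plain base-R data.frame could look like the following. The helper names parse_csv_text and parse_as_base_df are hypothetical, and the sketch assumes the parse function receives an httr response object, as the example above does.

```r
# Hypothetical helper: turn CSV text into a base-R data.frame
parse_csv_text <- function(txt) {
  read.csv(text = txt, stringsAsFactors = FALSE)
}

# Hypothetical parse function for gcs_get_object(): extract the response body
# as text with httr, then hand it to read.csv
parse_as_base_df <- function(object) {
  parse_csv_text(httr::content(object, as = "text", encoding = "UTF-8"))
}

# Usage (needs an authenticated session and a default bucket):
# df <- googleCloudStorageR::gcs_get_object("mtcars.csv",
#                                           parseFunction = parse_as_base_df)

# The text-parsing step can be checked locally without any cloud access
example_df <- parse_csv_text("name,mpg\nMazda RX4,21\nDatsun 710,22.8\n")
```

Splitting the text parsing from the HTTP step keeps the part that decides column types easy to try out on a literal string.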
I tried reading a sample csv file with the as.data.frame() function.
To run this code snippet, make sure you have installed the package (install.packages("data.table")) and loaded it with library("data.table").
Also be sure to wrap fread() in as.data.frame() so that the file is read from its location.
Here is the code snippet I ran and managed to display the data frame for my data set:
library("data.table")
MyData <- as.data.frame(fread(file="$FILE_PATH",header=TRUE, sep = ','))
print(MyData)
Reading Data with TensorFlow:
There is one other way to read a csv from your cloud storage, using the TensorFlow API. I would assume you are accessing this data from a bucket? First, you need to install the "readr" and "cloudml" packages for this to work. Then use gs_data_dir("gs://your-bucket-name") and build the file path with file.path(data_dir, "something.csv"). You can then read the data from that path with read_csv(file.path(data_dir, "something.csv")). If you want it formatted as a data frame, it should look something like this.
library("data.table")
library(cloudml)
library(readr)
data_dir <- gs_data_dir("gs://your-bucket-name")
MyData <- as.data.frame(read_csv(file.path(data_dir, "something.csv")))
print(MyData)
Make sure you have properly authenticated access to your storage
More information in this link
I am trying to create a simple csv table output in R that contains a listing of only files through a directory (recursively). The output should contain, at minimum 3 columns:
The Full Path (e.g. \path\to\file\somefile.txt)
File size
MD5 Hash of file
(Additional file.info properties, such as date created or modified, would be helpful but are not strictly necessary.)
I have the following script that I hacked together from various places on the internet. It works, but I think it is not the 'best' way to do it and/or might be brittle. I am seeking any comments/suggestions on how to clean this up and help improve my R skills. Thanks!
*I am particularly concerned about how cbind works: how does it "know" whether row arrangement/order is preserved?
library(digest)
library(tidyverse)
library(magrittr)
test_dir <- "C:\\Path\\To\\Folder"
outfile <- "out.csv"
file.names <- list.files(test_dir, recursive = TRUE, full.names = TRUE)
md5s <- sapply(file.names, digest, file = TRUE, algo = "md5")
q <- map(file.names, file.info)
file.sizes <- map_df(q, extract, c("size"))
output <- cbind(file.names, file.sizes, md5s)
write_csv(output, str_c("./R/", outfile))
The chosen answer did not give me the md5 of the actual file but of the file's names! I got the real md5 (which matched the md5 generated from other sources) using the following command. This seems to work with only one file at a time.
library(openssl)
md5 <- as.character(md5(file(file.name, open="rb")))
For multiple files the following command worked for me
library(tools)
md5 = as.vector(md5sum(file.names))
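Putting this together with the original question, a minimal sketch of the whole inventory using only base R and the tools package might look like this. The temporary directory, the two demo files, and the column names are made up for illustration. Both file.info() and md5sum() accept a vector of paths, so no sapply or map is needed.

```r
library(tools)  # ships with R; provides md5sum()

# Create two small files in a fresh temporary directory (illustration only)
demo_dir <- tempfile("md5_demo")
dir.create(demo_dir)
writeBin(charToRaw("hello"), file.path(demo_dir, "a.txt"))
writeBin(charToRaw("world"), file.path(demo_dir, "b.txt"))

file.names <- list.files(demo_dir, recursive = TRUE, full.names = TRUE)

# file.info() and md5sum() are vectorised over paths and keep their order
info <- file.info(file.names)
inventory <- data.frame(
  path  = file.names,
  size  = info$size,
  mtime = info$mtime,
  md5   = unname(md5sum(file.names)),  # hashes file contents, not the names
  stringsAsFactors = FALSE
)

# Write the table somewhere outside the scanned directory
out_file <- tempfile(fileext = ".csv")
write.csv(inventory, out_file, row.names = FALSE)
```

Because every column is derived from the same file.names vector, the rows line up by construction.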
One tip might be to use the openssl md5 function instead of digest.
library(openssl)
md5s <- md5(file.names)
It's already vectorised so you won't need to use sapply which may improve your processing speed (depending on how big a file you want to hash).
In terms of cbind, it binds columns purely by position; it does not match rows on any key. The rows line up only because file.sizes and md5s were both computed from file.names in the same order, so the output has the order that file.names has.
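A small sketch with made-up paths and placeholder hashes illustrates the row-order question: cbind simply pastes columns side by side, so reordering one input misaligns the rows without any warning, whereas indexing a named vector by path makes the alignment explicit.

```r
# Made-up inputs; md5s is a named vector, as sapply() over file names would give
paths <- c("b.txt", "a.txt", "c.txt")
md5s  <- c(b.txt = "hash-b", a.txt = "hash-a", c.txt = "hash-c")

# Same order in both inputs: rows line up
aligned <- cbind(data.frame(paths), md5 = unname(md5s))

# One input sorted by name: cbind ignores the names and misaligns the rows
sorted_md5s <- md5s[order(names(md5s))]
shuffled <- cbind(data.frame(paths), md5 = unname(sorted_md5s))

# Looking hashes up by path restores the alignment regardless of input order
by_name <- data.frame(paths, md5 = unname(sorted_md5s[paths]))
```

In the shuffled result the first row pairs b.txt with the hash of a.txt, which is the kind of silent error the question worries about.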
I want to protect the content of my RData files with a strong encryption algorithm
since they may contain sensitive personal data which must not be
disclosed due to (legal) EU-GDPR requirements.
How can I do this from within R?
I want to avoid a second manual step to encrypt the RData files after creating them to minimize the risk of forgetting it or overlooking any RData files.
I am working with Windows in this scenario...
library(openssl)
x <- serialize(list(1,2,3), NULL)
passphrase <- charToRaw("This is super secret")
key <- sha256(passphrase)
encrypted_x <- aes_cbc_encrypt(x, key = key)
saveRDS(encrypted_x, "secret-x.rds")
encrypted_y <- readRDS("secret-x.rds")
y <- unserialize(aes_cbc_decrypt(encrypted_y, key = key))
You need to deal with secrets management (i.e. the key) but this general idiom should work (with a tad more bulletproofing).
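To avoid a separate manual encryption step, the idiom above could be wrapped in a pair of helpers, sketched here. The function names key_from_passphrase, save_encrypted and read_encrypted are hypothetical, and the passphrase is hard-coded only for illustration; in practice it would come from a secrets store or a prompt.

```r
library(openssl)

# Derive a 32-byte AES key from a passphrase, as in the snippet above
key_from_passphrase <- function(passphrase) {
  sha256(charToRaw(passphrase))
}

# Serialize any R object, encrypt it with AES-CBC, and store the result.
# aes_cbc_encrypt() attaches the random IV as an attribute, which saveRDS keeps.
save_encrypted <- function(object, path, key) {
  payload <- serialize(object, NULL)
  saveRDS(aes_cbc_encrypt(payload, key = key), path)
  invisible(path)
}

# Read the encrypted payload back, decrypt it, and rebuild the object
read_encrypted <- function(path, key) {
  encrypted <- readRDS(path)
  unserialize(aes_cbc_decrypt(encrypted, key = key))
}

# Round trip with a built-in data set
key <- key_from_passphrase("This is super secret")
secret_path <- tempfile(fileext = ".rds")
save_encrypted(mtcars, secret_path, key)
roundtrip <- read_encrypted(secret_path, key)
```

Only the encrypted raw vector ever touches the disk; the plain object exists in memory alone.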
I know it's very late, but check out the endecrypt package.
Installation:
devtools::install_github("RevanthNemani/endecrypt")
Use the following functions for column encryption:
airquality <- EncryptDf(x = airquality, pub.key = pubkey, encryption.type = "aes256")
For column decryption:
airquality <- DecryptDf(x = airquality, prv.key = prvkey, encryption.type = "aes256")
Check out this GitHub page
Just remember to generate your keys and save them before first use. Load the keys when required and supply the key object to the functions.
E.g.
SaveGenKey(bits = 2048,
private.key.path = "Encription/private.pem",
public.key.path = "Encription/public.pem")
# Load keys already stored using this function
prvkey <- LoadKey(key.path = "Encription/private.pem", Private = T)
It is very easy to use and your dataframes can be stored in a database or Rdata file.