I'm trying to hash (SHA-256) lines from a CSV file with both digest and openssl:
library(digest)
library(openssl)
my_mails <- read.table("my_mails.csv", encoding="UTF-8")
my_mails$V1 = as.character(my_mails$V1)
my_mails$sha256_1 <- sapply(my_mails$V1, digest, algo="sha256", serialize=F)
my_mails$sha256_ssl <- sha256(my_mails$V1)
Specifically, as an example, running the above with the string
%&3*,19531006#$#)?¿
yields in R:
a02d0c3b070e79a78eb4b2fc87ba5e96137f9bb704095a85bc8ba8617cb5b57c
while the same string hashed elsewhere yields:
ffb7b8082b811876ea78c25fef6ac8503c53c28cc806e99ac3c8a47cea5debfe
What could I be missing, and which result should I trust?
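For reference, a minimal check (assuming the mismatch comes from how the non-ASCII characters are encoded) is to hash explicit byte representations of the string, so both tools are guaranteed to see the same bytes:
library(digest)
library(openssl)
x <- "%&3*,19531006#$#)?\u00bf" # the example string; \u00bf is the inverted question mark
utf8_bytes <- charToRaw(enc2utf8(x)) # the string's UTF-8 bytes
latin1_bytes <- charToRaw(iconv(x, to = "latin1")) # the same string as latin1 bytes
digest(utf8_bytes, algo = "sha256", serialize = FALSE)
digest(latin1_bytes, algo = "sha256", serialize = FALSE)
as.character(sha256(utf8_bytes)) # openssl over the same UTF-8 bytes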
I am trying to encode passwords for an LDAP server file (.LDIF), so I am trying to reverse-engineer a given password to find the correct algorithm. I am working with R.
So here's the password I'm trying to replicate:
userPassword:: e1NIQX1XNnBoNU1tNVB6OEdnaVVMYlBnekczN21qOWc9
I know this password is "password" so here's my next step:
rawToChar(base64decode('e1NIQX1XNnBoNU1tNVB6OEdnaVVMYlBnekczN21qOWc9'))
this yields:
{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=
(I know this step was not necessary but I wanted to get a grasp of what was base64-encoded)
From my understanding, this is a SHA-1 hash without a salt. So my next goal is to reproduce this in R by:
library(base64enc)
library(digest)
sha <- digest("password", "sha1", serialize = FALSE)
sha <- paste0("{SHA}", sha)
sha <- base64encode(charToRaw(sha))
sha
[1] "e1NIQX01YmFhNjFlNGM5YjkzZjNmMDY4MjI1MGI2Y2Y4MzMxYjdlZTY4ZmQ4"
However, this does not match the password in the LDIF file I was provided with. Any hint is greatly appreciated.
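For what it's worth, a sketch of the likely fix: the {SHA} scheme base64-encodes the raw 20-byte SHA-1 digest, while digest() returns its 40-character hex encoding by default. Requesting raw output should reproduce the value (if your digest version does not support raw = TRUE for SHA-1, openssl::sha1(charToRaw("password")) yields the same raw bytes):
library(base64enc)
library(digest)
# Raw 20-byte SHA-1 digest instead of the hex string
sha_raw <- digest("password", algo = "sha1", serialize = FALSE, raw = TRUE)
paste0("{SHA}", base64encode(sha_raw))
# Expected: "{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g="
# The double colon after userPassword marks the whole LDIF value as base64 once more
base64encode(charToRaw(paste0("{SHA}", base64encode(sha_raw))))
# Expected: "e1NIQX1XNnBoNU1tNVB6OEdnaVVMYlBnekczN21qOWc9"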
So, I've been struggling with this for a while now and can't seem to Google my way out of it. I'm trying to read a .sql file into R; I always do that to avoid putting 100+ lines of SQL in my R scripts. I usually do this:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- read_file("path/to/script.sql")
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
However, this time my sql script has some Spanish characters in it. Say something like this:
select tree_id, tree
from forest.trees
where species = 'árbol'
When I read this script into R and run the query, it just doesn't return anything, but if I copy and paste the SQL script into an R string, it works! So it seems the problem is in the line where I read the script into R.
I tried changing the string's encoding in a couple of ways:
# none of these work
# attempt 1: mark the read_file() result as latin1
query <- read_file("path/to/script.sql")
Encoding(query) <- "latin1"
# attempt 2: readLines() with a declared encoding, then collapse
query <- readLines("path/to/script.sql", encoding = "latin1")
query <- paste0(query, collapse = " ")
Unfortunately, I don't have a public database to offer to anyone reading this. I'm connecting to a PostgreSQL 11 database.
--- UPDATE ---
I'm on a windows 10 machine, with US locale.
When I use the read_file function, the contents of query look fine; the Spanish characters print as they should. But when I pass the string to dbGetQuery, it just doesn't fetch anything.
I tried forcing the encoding "latin1" because I found online that this tends to fix Spanish characters in R. When doing this the Spanish characters print out wrong, so I didn't expect it to work, and it didn't.
The character values in my database are encoded as UTF-8.
Just to be completely clear: all my attempts to read the .sql script have failed, but this does work:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- "select tree_id, tree from forest.trees where species = 'árbol'"
# df actually has results
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
The encoding argument of readLines() only declares an encoding for the strings it returns; it does not re-encode the file's contents. Open the file through a connection that performs the conversion instead:
filetext <- readLines(file("path/to/script.sql", encoding = "latin1"))
See this answer for more details: R: can't read unicode text files even when specifying the encoding
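Since readLines() returns one element per line, the result presumably needs to be collapsed back into a single query string before being sent; a small usage sketch, reusing con from the question:
filetext <- readLines(file("path/to/script.sql", encoding = "latin1"))
query <- paste(filetext, collapse = "\n")
df <- as_tibble(dbGetQuery(con, query))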
After some time thinking about it, I wondered why the solution proposed by MrFlick didn't work, so I checked the encoding of a file created by this chunk:
query <- "select tree_id, tree from forest.trees where species = 'árbol'"
write_lines(query, "test.sql")
After checking what encoding test.sql had, it turned out to be ANSI, which didn't look right to me. So I manually changed my original script.sql's encoding to ANSI, and after that it worked totally fine.
This solution, however, didn't work when I cloned my repo in an Ubuntu environment; on Ubuntu there was no problem with the original UTF-8 encoding.
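A scripted alternative to manually re-saving the file (a sketch; latin1 as the target is an assumption, since "ANSI" on a US-locale Windows machine is typically Windows-1252):
library(readr)
# Read the UTF-8 script, then convert the query text to the encoding
# the Windows connection appears to expect before sending it
query <- read_file("path/to/script.sql")
query <- iconv(query, from = "UTF-8", to = "latin1")
df <- as_tibble(dbGetQuery(con, query))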
Hope this helps anyone dealing with this on Windows.
I am trying to create a simple CSV output in R that lists only the files (not directories) in a directory tree, recursively. The output should contain, at minimum, 3 columns:
The Full Path (e.g. \path\to\file\somefile.txt)
File size
MD5 Hash of file
(additional file.info properties (date created, date modified, etc.) would be helpful, but are not strictly necessary)
I have the following script that I hacked together from various places on the internet. It works, but I think it is not the 'best' way to do it and/or might be brittle. I am seeking any comments/suggestions on how to clean this up and improve my R skills. Thanks!
I am particularly concerned about how cbind works: how does it "know" whether row arrangement/order is preserved?
library(digest)
library(tidyverse)
library(magrittr)
test_dir <- "C:\\Path\\To\\Folder"
outfile <- "out.csv"
# recursive = TRUE lists files only (no directories), with full paths
file.names <- list.files(test_dir, recursive = TRUE, full.names = TRUE)
# MD5 of each file's contents (file = TRUE hashes the file, not the path string)
md5s <- sapply(file.names, digest, file = TRUE, algo = "md5")
# file.info() for each file, then keep just the size column
q <- map(file.names, file.info)
file.sizes <- map_df(q, extract, c("size"))
# Bind path, size, and hash columns; all share file.names' row order
output <- cbind(file.names, file.sizes, md5s)
write_csv(output, str_c("./R/", outfile))
The chosen answer did not give me the MD5 of the actual files but of the file names! I got the real MD5 (which matched the MD5 generated by other sources) using the following command, though it seems to work with only one file at a time.
library(openssl)
md5 <- as.character(md5(file(file.name, open="rb")))
For multiple files, the following command worked for me:
library(tools)
md5 = as.vector(md5sum(file.names))
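Putting the pieces together: a sketch of the full listing that hashes file contents via tools::md5sum and sidesteps the cbind question by building a single tibble (column names are illustrative, not prescribed):
library(tools)
library(tidyverse)
test_dir <- "C:\\Path\\To\\Folder"
file.names <- list.files(test_dir, recursive = TRUE, full.names = TRUE)
info <- file.info(file.names) # one row per file, same order as file.names
output <- tibble(
  path  = file.names,
  size  = info$size,
  md5   = as.vector(md5sum(file.names)), # hashes file contents, not names
  mtime = info$mtime # last-modified time (optional extra)
)
write_csv(output, "out.csv")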
One tip might be to use the openssl md5 function instead of digest.
library(openssl)
md5s <- md5(file.names)
It's already vectorised, so you won't need sapply, which may improve your processing speed (depending on how large the files you want to hash are).
As for cbind, it binds its arguments positionally: md5s and file.sizes were each built by iterating over file.names in order, so the rows line up and the output keeps the order of file.names.
I have the following data frame, which I can encrypt using the gpg package and my key.
library(gpg)
df <- data.frame(A=c(1,2,3), B=c("A", "B", "C"), C=c(T,F,F))
df <- serialize(df, con=NULL, ascii=T)
enc <- gpg_encrypt(df, receiver="my#email.com")
writeBin(enc, "test.df.gpg")
Now, in order to restore the data frame, the logical course of action would be to decrypt the file:
dec <- gpg_decrypt("test.df.gpg")
df <- unserialize(dec) # throws an error!
(It prompts for the password correctly, and then I call unserialize(dec).) However, it seems that gpg_decrypt() delivers plain text (a character string) to dec, from which the original data frame cannot be restored.
I can decrypt the file on the Linux command line using the gpg2 command without problems and then read the decrypted file into R with readRDS(), which restores the original data frame fine.
However, I want to unserialize() "dec" and thus decrypt the file directly into R.
I know there are other solutions, such as Hadley's secure package, but that doesn't run without problems (described here) for me either.
Support for decrypting raw data has been added to the gpg R package; see https://github.com/jeroen/gpg/issues/5
Encrypted data can now be read directly into R's working memory without needing to store a decrypted file on disk.
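With that change, the round trip should work roughly as below (a sketch: the as_text argument is my reading of the interface added in that issue, so check ?gpg_decrypt in your installed version):
library(gpg)
# Request raw bytes instead of text so the result can be unserialized
dec <- gpg_decrypt("test.df.gpg", as_text = FALSE)
df <- unserialize(dec)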
I am connecting to an Oracle database from R using ROracle. The problem is that every special UTF-8 character comes back as a question mark, and some Chinese values come back as a solid string of question marks. I mention the package because I haven't found any other question on this site (or others) that answers this for ROracle.
The most promising questions I found include an answer for MySQL, Fetching UTF-8 text from MySQL in R returns "????", but I was unable to make that work for ROracle. This page also provided some useful information: https://docs.oracle.com/cd/E17952_01/mysql-5.5-en/charset-connection.html. Before this I was using RODBC and was easily able to configure the UTF-8 encoding.
Here is some sample code. I am sorry that, unless you have an Oracle database with UTF-8 characters, it may be impossible to reproduce. I also changed the host number and the SID for data privacy reasons.
library(ROracle)
drv <- dbDriver("Oracle")
# Create the connection string
host <- "10.00.000.86"
port <- 1521
sid <- "f110"
connect.string <- paste(
"(DESCRIPTION=",
"(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
"(CONNECT_DATA=(SID=", sid, ")))", sep = "")
con <- dbConnect(drv, username = "XXXXXXXXX",
password = "xxxxxxxxx",dbname=connect.string)
my.table <- dbReadTable(con, "DASH_D_PROJECT_INFO")
my.table[40, 1:3]
PROJECT_ID DATE_INPUT PROJECT_NAME
211625 2012-07-01 ??????, ?????????????????? ????? ??????, 1869?1917 [????? 3]
Any help is appreciated. I have read the entire documentation of the ROracle package, and it seemed to have a solution for writing UTF-8 characters, but not for reading them.
Okay, after several weeks I found my own answer. I hope it will be of value to someone else.
My question is largely answered by how Oracle stores the data: if you want UTF-8 characters preserved, the table column needs to be NVARCHAR2, not just VARCHAR2. With that in place, regular data pulling and encoding work in R as expected; I was looking for the error in the wrong place.
I also want to mention one hang-up in writing UTF-8 data from R to Oracle.
Some values would not convert to UTF-8 in the following manner, so I split the data into two parts and wrote them to the Oracle table in two steps. The result worked perfectly.
# Mark the strings as UTF-8; pure-ASCII values keep encoding "unknown"
Encoding(my.data1$Project.Name) <- "UTF-8"
# Split rows by whether the value now carries the UTF-8 mark
my.data1.1 <- my.data1[Encoding(my.data1$Project.Name) == "UTF-8", ]
my.data1.2 <- my.data1[Encoding(my.data1$Project.Name) != "UTF-8", ]
# Tell ROracle to write this column as UTF-8
attr(my.data1.1$Project.Name, "ora.encoding") <- "UTF-8"
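One related knob worth checking when reads come back as question marks (an aside on my part, not something the fix above depends on): ROracle takes its client character set from NLS_LANG, which must be set before the package is loaded:
# Must happen before library(ROracle); AL32UTF8 requests UTF-8
# from the Oracle client libraries
Sys.setenv(NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
library(ROracle)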
If you found this insightful, give it an upvote so more people can find it.