Reconstitute PNG file stored as RAW in SQL Database - r

I am working on a report generated from a SQL database (Microsoft SQL Server on Windows) that certain people must sign before we submit it to the client. We are hoping to have a system where these people can authorize their signature in the database; we can then take an image of their signature stored in the database and place it on the report generated by LaTeX.
The signature images are created as PNGs and stored in the database in a field of type varbinary. In order to use a signature in the report, I need to reconstitute the PNG into a file that I can use with \includegraphics in LaTeX.
Unfortunately, I can't seem to recreate the PNGs out of the database. Since I can't post a signature, we'll use the image below as an example.
With this image on my computer, I'm able to read the file as raw, write it to a different file, and get the same image when I open the new file.
#* It works to read the image from a file and rewrite it elsewhere
pal <- readBin("C:/[filepath]/ColorPalette.png",
               what = "raw", n = 1e8)
writeBin(pal,
         "C:/[filepath]/colors.png",
         useBytes = TRUE)
Now, I've saved that same image to the database, and using RODBC, I can extract it like so:
#*** Capture the raw from the database
con <- odbcConnect("DATABASE")
Users <- sqlQuery(con, "SELECT * FROM dbo.[User]")
db_pal <- Users$Signature[Users$LastName == "MyName"]
#*** Write db_pal to a file, but the image won't render:
#*** "Windows Photo Viewer can't open this picture because the file
#*** appears to be damaged, corrupted, or is too large" (12KB)
writeBin(db_pal[[1]],
         "C:/[filename]/db_colors.png",
         useBytes = TRUE)
The objects pal and db_pal are defined in this Gist (they are too long to fit in the allowable space here).
Note: db_pal is a list of one raw vector. It is also clearly different from the raw vector pal:
> length(pal)
[1] 2471
> length(db_pal[[1]])
[1] 9951
Any thoughts on what I may need to do to get this image out of the database?

Well, we've figured out a solution. The raw vector being returned through RODBC did not match what was in the SQL database. Somewhere in the pipeline, the varbinary object from SQL was getting distorted; I'm not sure why or how. But this answer to a different problem inspired us to recast the variables. As soon as we recast them, we could see the correct representation.
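In essence, the fix was to recast the varbinary value inside the query, before RODBC ever touches it. A minimal sketch (for illustration only, assuming a signature that fits in 8000 bytes):
library(RODBC)
con <- odbcConnect("DATABASE")
#* Recast in SQL so RODBC receives an undistorted varbinary value
sig <- sqlQuery(con,
                paste0("SELECT Signature = CAST(u.Signature AS VARBINARY(8000)) ",
                       "FROM dbo.[User] u WHERE u.LastName = 'MyName'"))
writeBin(sig$Signature[[1]], "db_colors.png", useBytes = TRUE)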
The next problem was that all of our images are more than 8000 bytes, and RODBC only allows 8000 characters at a time. So I had to fumble my way around that. The code below does the following:
1. Determine the largest number of bytes in an image file.
2. Create a set of variables (ImagePart1, ..., ImagePart[n]) breaking each image into as many parts as necessary, each with max length 8000.
3. Query the database for all of the images.
4. Combine the image parts into a single object.
5. Write the images to a local file.
The actual code
library(RODBC)
lims <- odbcConnect("DATABASE")
#* 1. Determine the largest number of bytes in an image file
ImageLength <- sqlQuery(lims,
                        paste0("SELECT MaxLength = MAX(LEN(u.Image)) ",
                               "FROM dbo.[User] u"))
#* 2. Create a query string to make a set of variables breaking
#*    the images into as many parts as necessary, each with
#*    max length 8000
n_img_vars <- ImageLength$MaxLength %/% 8000 + 1
start <- 1 + 8000 * (0:(n_img_vars - 1))
#*    note: SUBSTRING's third argument is a length, not an end
#*    position, so each part is the 8000 bytes from its start
img_parts <- paste0("ImagePart", 1:n_img_vars,
                    " = CAST(SUBSTRING(u.Image, ", start,
                    ", 8000) AS VARBINARY(8000))")
full_query <- paste0("SELECT u.OID, u.LastName, u.FirstName,\n",
                     paste0(img_parts, collapse = ",\n"), "\n",
                     "FROM dbo.[User] u\n",
                     "WHERE LEN(u.Image) > 0")
#* 3. Query the database for all the images
Images <- sqlQuery(lims, full_query)
#* 4. Combine the image parts into a single object
Images$full_image <-
  apply(Images[, grepl("ImagePart", names(Images))], 1,
        function(x) do.call("c", x))
#* 5. Write the images to a local file
DIR <- "[FILE_DIR]"
for(i in seq_len(nrow(Images))){
  FILENAME <- with(Images, paste0(OID[i], "-", LastName[i], ".png"))
  writeBin(unlist(Images$full_image[i]),
           file.path(DIR, FILENAME))
}
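With the PNGs reconstituted on disk, placing a signature in the report is a standard \includegraphics call; for example (file name hypothetical, following the OID-LastName pattern above):
% LaTeX side: pull the reconstituted signature into the report
\includegraphics[width=5cm]{123-MyName.png}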

I may be misinterpreting the question, but the raster package might be of help to you.
library(raster)
your_image <- raster(nrows = 587, ncols = 496, vals = db_pal[[1]])
plot(your_image)
But it doesn't make sense that the length of db_pal[[1]] isn't 291,152 (587*496), so something isn't adding up for me. Do you know where these 291,152 values would be stored?
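For what it's worth, a PNG file holds compressed bytes rather than one value per pixel, which is why length(db_pal[[1]]) will never be 291,152. To get actual pixel values, the bytes have to be decoded first; a sketch, assuming the png package is installed and db_pal[[1]] holds a valid PNG:
library(png)
img <- readPNG(db_pal[[1]])  # readPNG accepts a raw vector of PNG bytes
dim(img)                     # rows x cols x channels, e.g. 587 496 4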

Related

"%s" random concatenation in R

I am looking for a way to concatenate a random string or a number (3 digits at least) into a save file's name.
For instance, in Python I can use '%s' % formatting with a random generator and add the result to a CSV file name.
How do I generate a random value into a file-name format in R?
This is my save file, and I want to add a random string or number at the end of the file name:
file = paste(path, 'group1N[ADD FORMAT HERE].csv', sep = '')
so that
file = paste(path, 'group1N.csv', sep = '')
becomes
file = paste(path, 'group1N212.csv', sep = '') or file = paste(path, 'group1Nkut.csv', sep = '')
with a randomly generated string or number at the end of the file name each time it is saved.
You could use the built-in tempfile() function:
tempfile(pattern="group1N", tmpdir=".", fileext=".csv")
[1] "./group1N189d494eaaf2ea.csv"
(if you don't specify tmpdir the results go to a session-specific temporary directory).
This won't write over existing files; given that there are 14 hex digits in the random component, I think the "very likely to be unique" in the description is an understatement (at a rough guess, the probability of a collision might be something like 16^(-14)).
The names are very likely to be unique among calls to ‘tempfile’
in an R session and across simultaneous R sessions (unless
‘tmpdir’ is specified). The filenames are guaranteed not to be
currently in use.
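If you'd rather build the random suffix yourself (say, the 3-digit number from the question), a minimal sketch:
# append a random 3-digit number to the base name
suffix <- sprintf("%03d", sample(0:999, 1))
file <- paste0(path, "group1N", suffix, ".csv")
Unlike tempfile(), though, this does not guard against collisions with existing files.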

Loading CSV with fread stops because of a too-large string

This is the command I'm using :
dallData <- fread("data.csv", showProgress = TRUE, colClasses = c(rep("NULL", 2), "character", rep("NULL", 37)))
but I get this error when trying to load it: R character strings are limited to 2^31-1 bytes
Any way to skip those values?
Here's a strategy that may work or at least narrow down the possible sources of error. It assumes you have enough working memory to hold the data and that your separators are really commas. If you actually have tabs as separators then you will need to modify accordingly. The plan is to read using readLines which will basically ignore the quotes that are probably mismatched. Then figure out which line or lines are at fault using count.fields, table, and which.
input <- readLines("data.csv") # ignores quotes
counts.def <- count.fields(textConnection(input),
sep=",") # defaults quotes are both ' and "
table(counts.def) # might show a variety of line counts.
# Second try with just double-quotes
counts.dbl <- count.fields(textConnection(input),
sep=",", quote="\"") # just dbl-quotes
table(counts.dbl) # if all the same, then all you do is change the quotes argument
Depending on the results, you may need to edit certain lines; these can be identified using which(counts.def < 40), assuming most lines have the expected 40 fields your input efforts suggest.
(If the [ram] tag means you are memory-limited and getting warnings, or using virtual memory which slows things down horribly, then you should restart your OS and load only R before trying again. R needs a contiguous block of memory, and Windoze isn't very good at memory management.)
Here's a small test case to work with:
input <- readLines(textConnection(
"v1,v2,v3,v4,v5,v6
text, text, text, text, text, text
text, text, O'Malley, text,text,text
junk,junk, more junk, \"text\", tex\"t, nothing
3,4,5,6,7,8"))

Convert txt file to csv [only specific contents that match a string pattern]

I have a *.DAT file which can be opened in a text editor. I want to extract some contents from it and convert them to *.csv. The converted CSV file must have a header (colnames), the specification (lower and upper limits), and the data portion. I need to convert hundreds of these files to *.csv (as separate CSVs or all combined into one big CSV file).
Sample snippet of my *.DAT file will look like below
[FILEINFO]
VERSION=V4.0
FILENAME=TEST.DAT
CREATIONTIME=2015-07-09 22:05:26
[LOTINFO]
LotNo=A6022142
DUT=BCEK450049
PRODUCTNAME=EX061
Order=
ChipCode=
SACH_NO=B39000-
MAT_NO=B39000-P810
[SPEC1]
TXT=SEN1
Unit=
LSL=-411.400000
USL=-318.700000
[SPEC2]
TXT=SEN2
Unit=
LSL=-11.000000
USL=11.000000
[SPEC3]
TXT=SEN3
Unit=
LSL=-45.000000
USL=10.000000
[DATA]
2,29,-411.232,10.193,-11.530,
3,29,-411.257,10.205,-11.328,
I can extract the contents below [DATA] and save them in a CSV file, but I am not sure how to extract the contents above it to create the header, etc. I used the code below to extract the contents below [DATA]:
library(stringr)
library(readr)
my_txt <- read_file("EXAMPLE.DAT")
ExtData <- read.csv(text = sub(".*\\[DATA\\]\\s+", "", my_txt),
                    header = FALSE)
write.csv(ExtData, "dat_2_csv.csv", row.names = FALSE)
To extract the contents above [DATA], I tried the code below with no success:
con <- file("EXAMPLE.DAT", "r")
OneLine <- c()
while(TRUE) {
  line <- readLines(con, 1)
  if(length(line) == 0) break
  else if(grepl("^LSL=", line)) {
    RES <- str_split(line, "=", simplify = TRUE)
    OneLine <- c(OneLine, RES[1, 2])
  }
}
close(con)
The expected output CSV would have the spec names (SEN1, SEN2, SEN3) as the header, the LSL/USL specification rows, and then the data portion.
According to this link, .DAT files are very generic files with very specific information. Therefore, and especially after looking at your sample snippet, I doubt there is a straightforward way to do the conversion (unless there's a package designed specifically to process similar data).
I can only give you my five cents on a general strategy to tackle this:
For starters, instead of focusing on the .csv format, you should first focus on turning this text file into a table format.
To do so, you should save the parameters in separate vectors/columns (every column could be TXT, Unit, LSL, etc.).
In doing so, each row (SPEC1, SPEC2, SPEC3) would represent one datapoint with all its characteristics.
Even so, it looks like the file also contains metadata, so you might save the different chunks of data into different variables (e.g. file_info <- read_lines(x, n_max = 4)).
Hope it might help a bit.
Edit: As said by @qwe, the format resembles an .ini file. So a good way to start would be to read the file with '=' as the separator (read.table() calls this sep, and fill = TRUE tolerates the section-header lines that contain no '='):
data <- read.table('example.dat', sep = '=', fill = TRUE, stringsAsFactors = FALSE)
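To make that concrete, here is a rough sketch for one file with exactly the layout of the sample above (the [DATA] marker, the TXT/LSL/USL lines, and the two leading index columns in the data rows are all assumptions based on the snippet):
#* split the file at the [DATA] marker
lines <- readLines("EXAMPLE.DAT")
data_at <- which(lines == "[DATA]")
hdr_part <- lines[seq_len(data_at - 1)]
dat_part <- lines[(data_at + 1):length(lines)]
#* collect the spec names and limits in file order
col_names <- sub("^TXT=", "", grep("^TXT=", hdr_part, value = TRUE))
lsl <- as.numeric(sub("^LSL=", "", grep("^LSL=", hdr_part, value = TRUE)))
usl <- as.numeric(sub("^USL=", "", grep("^USL=", hdr_part, value = TRUE)))
#* parse the data block; the sample rows carry two leading index
#* columns and a trailing comma, hence the extra columns
dat <- read.csv(text = dat_part, header = FALSE)
measurements <- dat[, 3:(2 + length(col_names))]
names(measurements) <- col_names
#* assemble: specification rows first, then the data portion
spec <- data.frame(rbind(LSL = lsl, USL = usl))
names(spec) <- col_names
write.csv(rbind(spec, measurements), "EXAMPLE.csv")
Wrapping this in a function and lapply-ing it over list.files(pattern = "\\.DAT$") would handle the hundreds-of-files part.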

Web Crawler using R

I want to build a web crawler in R for the website "https://www.latlong.net/convert-address-to-lat-long.html" that can visit the site with an address parameter and then fetch the generated latitude and longitude. This would be repeated for the length of the dataset I have.
Since I am new to the web crawling domain, I would appreciate some guidance.
Thanks in advance.
In the past I have used an API called IP stack (ipstack.com).
Example: a data frame 'd' that contains a column of IP addresses called 'ipAddress'
for(i in 1:nrow(d)){
  # get data from API and save the text to variable 'str'
  lookupPath <- paste("http://api.ipstack.com/", d$ipAddress[i],
                      "?access_key=INSERT YOUR API KEY HERE&format=1", sep = "")
  str <- readLines(lookupPath)

  # save all the data to a file
  f <- file(paste(i, ".txt", sep = ""))
  writeLines(str, f)
  close(f)

  # save data to main data frame 'd' as well:
  d$ipCountry[i] <- str[7]
  print(paste("Successfully saved ip #:", i))
}
In this example, I was specifically after the Country location of each IP, which appears on line 7 of the data returned by the API (hence the str[7])
This API lets you lookup 10,000 addresses per month for free, which was enough for my purposes.
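Back to the original question (addresses rather than IPs): latlong.net doesn't document a public API, so scraping it is fragile. A similar loop against the OpenStreetMap Nominatim geocoding service is one alternative; a sketch, assuming a data frame d with an address column (and minding Nominatim's usage policy of at most one request per second):
library(jsonlite)
geocode <- function(address) {
  url <- paste0("https://nominatim.openstreetmap.org/search",
                "?format=json&limit=1&q=",
                URLencode(address, reserved = TRUE))
  res <- fromJSON(url)
  if (length(res) == 0) return(c(lat = NA, lon = NA))
  c(lat = as.numeric(res$lat[1]), lon = as.numeric(res$lon[1]))
}
for (i in 1:nrow(d)) {
  coords <- geocode(d$address[i])
  d$lat[i] <- coords["lat"]
  d$lon[i] <- coords["lon"]
  Sys.sleep(1)  # stay under the rate limit
}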

Exif data for camera trapping image management. Creating new names for images based on date-time and folder names

I am sorry for my stupid questions, but I am struggling to write the code I want. I am working with code found on the blog post Fish and Whistle by Dewey Dunnington (http://apps.fishandwhistle.net/archives/956).
I am trying to write a loop where I can rename all images in all folders (recursively) with the name of the folders and the "CreateDate" date-time from the EXIF data. Images are stored in camera station (e.g. Station "A") folders and then in camera folders (2 cameras at a station e.g. Camera "1").
So an individual image directory would be:
H:/GitHub/CT_Mara/images/raw_images/A/1/....jpg....jpg....jpg....etc.
So, ideally, I would want my images renamed to "A1_2017-05-03 15-45-13.jpg" and if there are two or more photos with the same name they should be called: "A1_2017-05-03 15-45-13(1).jpg" and "A1_2017-05-03 15-45-13(2).jpg"
What I am trying to accomplish:
rename all images according to the date and time in exifdata$CreateDate
attach (1), (2), etc. to images with the same name
attach the name of the station and camera folders to the image's name
then lastly, as a separate function, it would be nice to know how I could create a new column in the exifdata frame, for example a "species" column where animals can be identified
This is the code I am using:
library(lubridate)
define exif function
exifRip <- function(filename) {
  command <- paste("exiftool -n -csv",
                   paste(shQuote(filename), collapse = " "))
  read.csv(textConnection(system(command, intern = TRUE)),
           header = TRUE,
           sep = ",",
           quote = "",
           stringsAsFactors = FALSE)
}
load exif data from my directory
exifdata <- exifRip(list.files(path="H:/GitHub/CT_Mara/images/raw_images"))
View(exifdata)
set output directory
outdir <- dir.create("H:/GitHub/CT_Mara/images/raw_images/EXIFdata")
Everything runs perfectly except for this loop:
for(i in 1:nrow(exifdata)) {
  row <- exifdata[i, ]
  d <- ymd_hms(row$CreateDate)
  ext <- tools::file_ext(row$SourceFile)  # maintain file extension
  newname <- file.path(outdir,
                       sprintf("%04d-%02d-%02d %02d.%02d.%02d.%s",
                               year(d), month(d), day(d), hour(d), minute(d),
                               second(d), ext))
  file.copy(row$SourceFile, newname)
}
I get the following error message:
Error in sprintf("%04d-%02d-%02d %02d.%02d.%02d.%s", year(d), month(d), :
invalid format '%04d'; use format %f, %e, %g or %a for numeric objects
In addition: Warning message:
All formats failed to parse. No formats found.
Any advice on how to clean this up would be highly appreciated.. Thanks in advance.
Kind Regards,
Philip
The following exiftool command gives you almost exactly what you want without the need to write a script. The only difference is that duplicate files will be named like "NAME_1", "NAME_2" instead of "NAME(1)", "NAME(2)":
exiftool '-filename<%-1:1D%-1:D_${createdate}%+c.%e' -d "%Y-%m-%d %H-%M-%S" -r DIR
Where DIR is the name of the directory containing the images. If you are on Windows, use double quotes instead of single quotes around the first argument.
Replace "filename" with "testname" in this command for a dry-run test to see what the file names will be before actually doing the renaming.
