Get external image file dimensions with base R

Is it possible to get external image dimensions (for .png, .jpg, etc.) using base R? If not, what is the most lightweight R package that allows one to accomplish this?

Not base R, but the {magick} package does this easily. An example using a picture from the documentation:
library(magick)
frink <- image_read("https://jeroen.github.io/images/frink.png")
image_info(frink)
Output:
format width height colorspace matte filesize density
1 PNG 220 445 sRGB TRUE 73494 72x72

Here is a function inspired by r2evans's suggestion to use the system's file command (available on Linux and macOS):
png_url <- "https://static1.squarespace.com/static/5d2dfbe36a873f0001547453/t/5d2e5ab5607b8f00015901b6/1619287812905/?format=1500w"
jpg_url <- "https://guardian.ng/wp-content/uploads/2016/12/Rice-farm.jpg"
download.file(png_url, "test.png")
download.file(jpg_url, "test.jpg")
get_image_dimensions <- function(path) {
# Ensure file exists
if(!file.exists(path))
stop("No file found", call. = FALSE)
# Ensure file ends with .png, .jpg, or .jpeg
if (!grepl("\\.(png|jpg|jpeg)$", x = path, ignore.case = TRUE))
stop("File must end with .png, .jpg, or .jpeg", call. = FALSE)
# Get return of file system command
# shQuote() keeps paths containing spaces or shell metacharacters intact
s <- system(paste("file", shQuote(path)), intern = TRUE)
# Extract width and height from string
width <- regmatches(s, gregexpr("(?<=, )[0-9]+(?=(x| x )[0-9]+,)", s, perl = TRUE))[[1]]
height <- regmatches(s, gregexpr(", [0-9]+(x| x )\\K[0-9]+(?=,)", s, perl = TRUE))[[1]]
setNames(as.numeric(c(width, height)), c("Width", "Height"))
}
get_image_dimensions('test.png')
#> Width Height
#> 1409 1046
get_image_dimensions('test.jpg')
#> Width Height
#> 1280 724
Created on 2021-06-11 by the reprex package (v1.0.0)
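For PNG files specifically, the dimensions can also be read with base R alone, because the PNG format stores them at a fixed position in the file header. The following is a minimal sketch, not a general solution: the function name png_dimensions is made up for illustration, it handles PNG only (JPEG keeps its dimensions in a variable-position segment and needs more parsing), and it assumes a well-formed file.

```r
# Read width and height from a PNG header using only base R.
# Layout relied on (from the PNG specification): 8-byte signature,
# 4-byte chunk length, 4-byte chunk type "IHDR", then width and height
# as 4-byte big-endian integers (bytes 17-20 and 21-24).
png_dimensions <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 24)
  png_signature <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
  if (length(header) < 24 || !identical(header[1:8], png_signature))
    stop("Not a PNG file: ", path, call. = FALSE)
  # readBin() also accepts a raw vector as its input
  dims <- readBin(header[17:24], what = "integer", n = 2, size = 4,
                  endian = "big")
  setNames(dims, c("Width", "Height"))
}
```

It would be called as png_dimensions("test.png") on a file downloaded as in the answer above.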

Related

How to get total number of pages of pdf files using magick::image_read_pdf?

Let's say that under one folder, main_path, we have multiple PDF files with different numbers of pages. I use the function below to loop over all files and save each page as an image:
library(magick)
library(glue)
main_path <- './'
file_names <- list.files(path = main_path, pattern = '\\.pdf$') # pattern is a regex, so escape the dot and anchor the end
file_paths <- file.path(main_path, file_names)
file_names_no_ext <- tools::file_path_sans_ext(file_names)
max_page <- 10
pdf2plot <- function(file_path, file_names_no_ext){
pages <- magick::image_read_pdf(file_path)
print(pages)
num <- seq(1, max_page, 1)
# num <- seq(1, nrow(data.frame(pages)), 1)
for (i in num){
pages[i] %>% image_write(., path = paste0(glue(main_path, '/plot/', {file_names_no_ext},
sprintf('_%02d.', i)), format = "png"))
}
}
mapply(pdf2plot, file_paths, file_names_no_ext)
The problem is that if any file in the folder has fewer pages than max_page, the loop raises Error in magick_image_subset(x, i) : subscript out of bounds. For example, with a 2-page file and max_page = 10, I get this error.
The content of pages:
format width height colorspace matte filesize density
<chr> <int> <int> <chr> <lgl> <int> <chr>
1 PNG 2250 3000 sRGB TRUE 0 300x300
2 PNG 2250 3000 sRGB TRUE 0 300x300
3 PNG 2250 3000 sRGB TRUE 0 300x300
4 PNG 2250 3000 sRGB TRUE 0 300x300
5 PNG 2250 3000 sRGB TRUE 0 300x300
6 PNG 2250 3000 sRGB TRUE 0 300x300
7 PNG 2250 3000 sRGB TRUE 0 300x300
8 PNG 2250 3000 sRGB TRUE 0 300x300
9 PNG 2250 3000 sRGB TRUE 0 300x300
Error in magick_image_subset(x, i) : subscript out of bounds
Called from: magick_image_subset(x, i)
I think there could be two ways to solve this problem, but I don't know how to do either yet: 1. use try-catch, or 2. replace max_page with the total number of pages obtained from magick::image_read_pdf.
Thanks in advance for your help.
If you look at the documentation of ?image_read, we can see that:
All standard base vector methods such as [, [[, c(), as.list(), as.raster(), rev(), length(), and print() can be used to work with magick image objects. Use the standard img[i] syntax to extract a subset of the frames from an image.
So you can simply use length(pages) to get the number of pages for that document. Here's a simple version of your function using lapply(). I think you can simplify your pathing a lot, but won't get into that.
library(magick)
library(glue)
pdf2plot <- function(file_path, file_names_no_ext){
pages <- magick::image_read_pdf(file_path)
lapply(
seq_along(pages), # safer than 1:length(pages), which yields c(1, 0) for an empty object
\(i) image_write(pages[i], path = paste0(main_path, '/plot/', file_names_no_ext,
sprintf('_%02d.png', i)), format = "png")
)
}
Code produced using R 4.1.0
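If the max_page cap from the question should be kept (export at most 10 pages, but never more than the document has), the index vector can be clamped before looping. This is a small base-R sketch; page_indices is a hypothetical helper name, and n_pages stands for length(pages).

```r
# Indices of the pages to export: at most max_page, never more than exist.
# seq_len() returns an empty vector for 0, so an empty document is a no-op.
page_indices <- function(n_pages, max_page = 10) {
  seq_len(min(n_pages, max_page))
}

page_indices(2)          # a 2-page file with max_page = 10
#> [1] 1 2
length(page_indices(25)) # a 25-page file is capped at 10 pages
#> [1] 10
```

Inside pdf2plot, the loop would then run over page_indices(length(pages), max_page) instead of a fixed sequence.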

Compress output raster and parallelize gdalwarp from R

I would like to include -co options to compress output raster using gdalwarp from gdalUtilities in R.
I have tried some options (commented in the code), but I have not been successful in generating the compressed raster.
gdalUtilities::gdalwarp(srcfile = paste0(source_path,"/mask_30.tif"),
dstfile = paste0(writing_path,"/mask_30_gdalwarp.tif"),
cutline = paste0(source_path,"/amazon.shp"),
crop_to_cutline = TRUE,
multi = TRUE,
wo = "NUM_THREADS = 32",
co = "COMPRESS = DEFLATE")
# co = c("COMPRESS = DEFLATE","ZLEVEL = 9"))
# co COMPRESS = DEFLATE,
# co ZLEVEL = 9),
# co = "COMPRESS = DEFLATE",
# co = ZLEVEL = 9")
Additionally, I would like to use the multithreaded warping implementation. I am including the -multi and -wo "NUM_THREADS = 16" options (my computer has 32 cores), but I have not been able to decrease the runtime compared with the plain -multi option, which uses two cores by default.
Any suggestions for compression and parallelization?
Many thanks in advance.
1 - COMPRESSION
Here is the solution to the compression problem. I ran into the same issue myself and racked my brains over it before finding the fix, which is quite simple once you know it: the option must not contain any spaces (i.e. "COMPRESS=DEFLATE", not "COMPRESS = DEFLATE").
A small reprex follows.
Reprex
library(gdalUtilities)
library(stars) # Loaded just to have a '.tif' image for the reprex
# Import a '.tif' image from the 'stars' library
tif <- read_stars(system.file("tif/L7_ETMs.tif", package = "stars"))
# Write the image to disk (in your working directory)
write_stars(tif, "image.tif")
# Size of the image on disk (in bytes)
file.size("image.tif")
#> [1] 2950880
# Compress the image
gdalUtilities::gdalwarp(srcfile = "image.tif",
dstfile = "image_gdalwarp.tif",
co = "COMPRESS=DEFLATE")
# Size of the compressed image on disk (in bytes)
file.size("image_gdalwarp.tif")
#> [1] 937920 # The image has been successfully compressed.
As @MarkAdler said, there is not much difference between the default compression level (i.e. 6) and level 9. That said, here is how to write the code to apply a specific compression level (still without spaces, and passed as a list):
gdalUtilities::gdalwarp(srcfile = "image.tif",
dstfile = "image_gdalwarp_Z9.tif",
co = list("COMPRESS=DEFLATE", "ZLEVEL=9"))
file.size("image_gdalwarp_Z9.tif")
#> [1] 901542
Created on 2022-02-09 by the reprex package (v2.0.1)
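Since stray spaces around "=" were the cause, one defensive option is to normalize the option strings before handing them to gdalwarp. The helper below is a base-R sketch; normalize_gdal_options is a made-up name and not part of gdalUtilities.

```r
# Remove whitespace around "=" (and at both ends) in KEY=VALUE option strings.
normalize_gdal_options <- function(x) {
  gsub("\\s*=\\s*", "=", trimws(x))
}

normalize_gdal_options(c("COMPRESS = DEFLATE", "ZLEVEL = 9"))
#> [1] "COMPRESS=DEFLATE" "ZLEVEL=9"
```

The result could then be passed as the co argument, for example co = normalize_gdal_options(c("COMPRESS = DEFLATE", "ZLEVEL = 9")).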
2 - PARALLELIZATION
For parallelization across the processor's cores, you should not use multi = TRUE. The argument wo = "NUM_THREADS=4" alone (again, without spaces ;-)) is enough.
One clarification: I suspect you are confusing RAM with the number of cores. Computers are usually equipped with a 4- or 8-core processor, so the 32 in your code probably refers to the 32 GB of RAM in your computer.
Reprex
library(gdalUtilities)
library(stars)
tif <- read_stars(system.file("tif/L7_ETMs.tif", package = "stars"))
write_stars(tif, "image.tif")
file.size("image.tif")
#> [1] 2950880
gdalUtilities::gdalwarp(srcfile = "image.tif",
dstfile = "image_gdalwarp_Z9_parallel.tif",
co = list("COMPRESS=DEFLATE", "ZLEVEL=9"),
wo = "NUM_THREADS=4") # Replace '4' by '8' if your processor has 8 cores
file.size("image_gdalwarp_Z9_parallel.tif")
#> [1] 901542
Created on 2022-02-09 by the reprex package (v2.0.1)
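Rather than guessing the number of cores, it can be queried from R with the parallel package that ships with R. The snippet below is a sketch of building the warp option from that value; the variable names are illustrative, and the fallback to 1 is an assumption for platforms where the count cannot be determined.

```r
# Number of physical cores; detectCores() can return NA on some platforms.
n_cores <- parallel::detectCores(logical = FALSE)
if (is.na(n_cores)) n_cores <- 1L

# Build the option string without spaces, as gdalwarp expects.
wo_option <- paste0("NUM_THREADS=", n_cores)

# GDAL also documents the value ALL_CPUS for this warp option,
# i.e. wo = "NUM_THREADS=ALL_CPUS".
```

wo_option would then replace the hard-coded "NUM_THREADS=4" in the call above.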

How to output results of 'msa' package in R to fasta

I am using the R package msa, a core Bioconductor package, for multiple sequence alignment. Within msa, I am using the MUSCLE alignment algorithm to align protein sequences.
library(msa)
myalign <- msa("test.fa", method=c("Muscle"), type="protein",verbose=FALSE)
The test.fa file is a standard fasta as follows (truncated, for brevity):
>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWRFLL
When I run the code on the file, I get:
MUSCLE 3.8.31
Call:
msa("test.fa", method = c("Muscle"), type = "protein", verbose = FALSE)
MsaAAMultipleAlignment with 2 rows and 480 columns
aln
[1] MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
[2] MSVVAIVKEGWLHKRGEYIKTWR---FLL
Con MS?VAIVKEGWLHKRGEYIKTWR???FLL
As you can see, a very reasonable alignment.
I want to write the gapped alignment, preferably without the consensus sequence (e.g., Con row), to a fasta file. So, I want:
>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWR---FLL
I checked the msa help, and the package does not seem to have a built in method for writing out to any file type, fasta or otherwise.
The seqinr package looks somewhat promising, because maybe it could read this output as an msf format, albeit a weird one. However, seqinr seems to need a file read in as a starting point. I can't even save this using write(myalign, ...).
I wrote a function:
alignment2Fasta <- function(alignment, filename) {
sink(filename)
n <- length(rownames(alignment))
for(i in seq(1, n)) {
cat(paste0('>', rownames(alignment)[i]))
cat('\n')
the.sequence <- toString(unmasked(alignment)[[i]])
cat(the.sequence)
cat('\n')
}
sink(NULL)
}
Usage:
mySeqs <- readAAStringSet('test.fa')
myAlignment <- msa(mySeqs)
alignment2Fasta(myAlignment, 'out.fasta')
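The same idea can be written without sink(), which redirects all console output and is easy to leave open if an error occurs midway. Below is a base-R sketch that takes plain character vectors; write_fasta and its width argument are hypothetical names, and getting the names and gapped sequences out of the alignment (for instance with rownames() and as.character(unmasked(...))) is assumed to work as in the function above.

```r
# Write named sequences to a FASTA file through an explicit connection.
write_fasta <- function(seq_names, sequences, path, width = 60) {
  stopifnot(length(seq_names) == length(sequences))
  con <- file(path, open = "w")
  on.exit(close(con)) # the file is closed even if an error occurs
  for (i in seq_along(sequences)) {
    writeLines(paste0(">", seq_names[i]), con)
    # Wrap the sequence into lines of at most `width` characters
    starts <- seq(1, max(1, nchar(sequences[i])), by = width)
    writeLines(substring(sequences[i], starts, starts + width - 1), con)
  }
  invisible(path)
}
```

A call might look like write_fasta(rownames(myAlignment), as.character(unmasked(myAlignment)), "out.fasta"). Gap characters are written as they appear in the strings, and no consensus row is added.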
I think you ought to follow the examples in the help pages that show input with a specific read function first, then work with the alignment:
mySeqs <- readAAStringSet("test.fa")
myAlignment <- msa(mySeqs)
Then the rownames function will deliver the sequence names:
rownames(myAlignment)
[1] "sp|P31749|AKT1_HUMAN_RAC" "sp|P31799|AKT1_HUMAN_RAC"
(Not what you asked for, but possibly useful in the future.) Then, if you execute:
detail(myAlignment) # function actually defined in Biostrings
... you get a text file, displayed in interactive mode, that you can save:
2 29
sp|P31749|AKT1_HUMAN_RAC MSDVAIVKEG WLHKRGEYIK TWRPRYFLL
sp|P31799|AKT1_HUMAN_RAC MSVVAIVKEG WLHKRGEYIK TWR---FLL
If you want to try hacking together a function that writes a file from code, look at the code of the Biostrings detail function that is being used:
> showMethods( f= 'detail')
Function: detail (package Biostrings)
x="ANY"
x="MsaAAMultipleAlignment"
(inherited from: x="MultipleAlignment")
x="MultipleAlignment"
showMethods( f= 'detail', classes='MultipleAlignment', includeDefs=TRUE)
Function: detail (package Biostrings)
x="MultipleAlignment"
function (x, ...)
{
.local <- function (x, invertColMask = FALSE, hideMaskedCols = TRUE)
{
FH <- tempfile(pattern = "tmpFile", tmpdir = tempdir())
.write.MultAlign(x, FH, invertColMask = invertColMask,
showRowNames = TRUE, hideMaskedCols = hideMaskedCols)
file.show(FH)
}
.local(x, ...)
}
You can use the export.fasta function from the bios2mds package.
# reading of the multiple sequence alignment of human GPCRS in FASTA format:
aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
export.fasta(aln)
You can first convert your msa alignment (built from an "AAStringSet") into an "align" object, and then export it as FASTA as follows:
library(msa)
library(bios2mds)
mysequences <-readAAStringSet("test.fa")
alignCW <- msa(mysequences)
#https://rdrr.io/bioc/msa/man/msaConvert.html
alignCW_as_align <- msaConvert(alignCW, "bios2mds::align")
export.fasta(alignCW_as_align, outfile = "test_alignment.fa", ncol = 60, open = "w")

Error using R magick package

I am trying to save a jpeg picture in png format using the magick package in R and I'm facing an error.
Below is the error that I get using this code:
library(magick)
testPic <- "https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/President_Roosevelt_-_Pach_Bros.tif/lossy-page1-165px-President_Roosevelt_-_Pach_Bros.tif.jpg"
image <- image_read(testPic)
image_info(image)
image_convert(image, format = "png", depth = NULL)
Error in magick_image_write(image, format, quality) :
Magick: profile 'icc': 'RGB ': RGB color space not permitted on grayscale PNG `' # warning/png.c/MagickPNGWarningHandler/1656
This is a bug in ImageMagick. The workaround is to add strip = TRUE to image_read():
library(magick)
testPic <- "https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/President_Roosevelt_-_Pach_Bros.tif/lossy-page1-165px-President_Roosevelt_-_Pach_Bros.tif.jpg"
image <- image_read(testPic, strip = TRUE)
image_info(image)
image_convert(image, format = "png", depth = NULL)
I'll try to ping upstream again to fix this.

R gif with function

I am trying to make a GIF from an R script, using a function to generate the images.
I have a function that, given some information, creates a map with dots on it.
I apply this function to a vector to obtain a series of different images, and I would like to combine them into a GIF. It looks more or less like this:
createMap <- function(my_variable){
my_map <- a_map() + geom_point() # some variable missing
png(filename = paste(aDate, ".png", sep = ""), width = 3149, height = 2183, units = "px")
plot(my_map)
dev.off()
}
ImageMagick is installed on my pc and the conversion file "converter.exe" also. Later I try to generate the gif using
saveGIF({
lapply(my_vector, createMap)
}, movie.name = "MY_GIF.gif")
but I get an error message:
> convert: improper image header `Rplot1.png' #
> error/png.c/ReadPNGImage/4362. convert: no images defined `MY_GIF.gif'
> # error/convert.c/ConvertImageCommand/3254.
an error occurred in the conversion...
Does anybody know what I did wrong?
After creating the map PNG files, use the code below. You don't need ImageMagick to be installed separately on your PC.
library(magick)
png.files <- sprintf("Rplot%02d.png", 1:10) #Mention the number of files to be read
GIF.convert <- function(x, output = "animation.gif")#Create a function to read, animate and convert the files to gif
{
image_read(x) %>%
image_animate(fps = 1) %>%
image_write(output)
}
GIF.convert(png.files)
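If the number of frames is not known in advance, the file names can be collected from disk instead of being generated with sprintf(). The helper below is a base-R sketch; list_frames and its default pattern are assumptions for illustration. It sorts by the number embedded in the file name, so Rplot2.png comes before Rplot10.png even without zero padding.

```r
# Collect frame files from a directory, ordered by their embedded number.
list_frames <- function(dir = ".", pattern = "^Rplot[0-9]+\\.png$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # Keep only the digits of each base name and sort numerically
  frame_no <- as.integer(gsub("[^0-9]", "", basename(files)))
  files[order(frame_no)]
}
```

The result can be passed straight to the function above, for example GIF.convert(list_frames()).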
