I need to automate some image transformations to do the following:
- read in 16,000+ images that are short and wide; sizes vary from image to image.
- rescale each image to 90 pixels high
- crop 90-pixel-wide tiles across the width of the image, producing multiple 90x90 crops per image - then repeat for the next image
- each 90x90 image needs to be saved as file-name_1.png, file-name_2.png and so on in sequential order
I've completed a test on 8 images, and using the magick package I was able to rescale and create multiple crops from each image manually. The problem arises when I try to process many images at once: I can resize them easily, but saving them fails.
# capture images, file paths in a list
img_list <- list.files("./orig_images", pattern = "\\.png$", full.names = TRUE)
# get all images in a list
all_images <- lapply(img_list, image_read)
# scale each image height - THIS DOESN'T WORK, GET NULL VALUE
scale_images <-
  for (i in 1:length(all_images)) {
    scale_images(all_images[[i]], "x90")
  }
# all images added into one
all_images_joined <- image_join(all_images)
# scale images - THIS WORKS to scale, but problems later
all_images_scaled <-
image_scale(all_images_joined, "x90")
# Test whether a single file will be written or multiple files;
# only writes one file (even if I
for (i in 1:length(all_images_scaled)) {
  image_write(all_images_scaled[[i]], path = "filepath/new_cropimages/filename")
}
Ideally, I would scale the images with a for loop so that I can save each scaled image to a directory. That didn't work - I don't get an error, but when I check the contents of the variable it is NULL. The image_join function puts them all together and scales the height to 90 (width is scaled proportionally), but then I can't write the separate images to a directory. The next piece is to crop each image across its width and save the results as file-name_1.png and so on: for every image, crop 90x90, move over 90 pixels, crop 90x90 again, and so on. I chose magick because it made individual scaling and cropping easy, but I'm open to other ideas (or to learning how to make that package work). Thanks for any help.
Here are some images:
[Original Image, untransformed][1]
[Manual 90x90 crop][2]
[Another manual 90x90 crop, farther down the same image][3]
[1]: https://i.stack.imgur.com/8ptXv.png
[2]: https://i.stack.imgur.com/SF9pG.png
[3]: https://i.stack.imgur.com/NyKxS.png
I don't speak R, but I hope to be able to help with the ImageMagick aspects and getting 16,000 images processed.
As you are on a Mac, you can install 2 very useful packages very easily with homebrew, using:
brew install imagemagick
brew install parallel
So, your original sentence image is 1850x105 pixels; you can see that in Terminal like this:
magick identify sentence.png
sentence.png PNG 1850x105 1850x105+0+0 8-bit Gray 256c 51626B 0.000u 0:00.000
If you resize the height to 90px, leaving the width to follow proportionally, it will become 1586x90px:
magick sentence.png -resize x90 info:
sentence.png PNG 1586x90 1586x90+0+0 8-bit Gray 51626B 0.060u 0:00.006
So, if you resize and then crop into 90px wide chunks:
magick sentence.png -resize x90 -crop 90x chunk-%03d.png
you will get 18 chunks, each 90 px wide except the last, as follows:
-rw-r--r-- 1 mark staff 5648 6 Jun 08:07 chunk-000.png
-rw-r--r-- 1 mark staff 5319 6 Jun 08:07 chunk-001.png
-rw-r--r-- 1 mark staff 5870 6 Jun 08:07 chunk-002.png
-rw-r--r-- 1 mark staff 6164 6 Jun 08:07 chunk-003.png
-rw-r--r-- 1 mark staff 5001 6 Jun 08:07 chunk-004.png
-rw-r--r-- 1 mark staff 6420 6 Jun 08:07 chunk-005.png
-rw-r--r-- 1 mark staff 4726 6 Jun 08:07 chunk-006.png
-rw-r--r-- 1 mark staff 5559 6 Jun 08:07 chunk-007.png
-rw-r--r-- 1 mark staff 5053 6 Jun 08:07 chunk-008.png
-rw-r--r-- 1 mark staff 4413 6 Jun 08:07 chunk-009.png
-rw-r--r-- 1 mark staff 5960 6 Jun 08:07 chunk-010.png
-rw-r--r-- 1 mark staff 5392 6 Jun 08:07 chunk-011.png
-rw-r--r-- 1 mark staff 4280 6 Jun 08:07 chunk-012.png
-rw-r--r-- 1 mark staff 5681 6 Jun 08:07 chunk-013.png
-rw-r--r-- 1 mark staff 5395 6 Jun 08:07 chunk-014.png
-rw-r--r-- 1 mark staff 5065 6 Jun 08:07 chunk-015.png
-rw-r--r-- 1 mark staff 6322 6 Jun 08:07 chunk-016.png
-rw-r--r-- 1 mark staff 4848 6 Jun 08:07 chunk-017.png
Now, if you have 16,000 sentences to process, you can use GNU Parallel to get them all done in parallel and also get sensible names for all the files. Let's do a dry run first so it doesn't actually do anything, but just shows you what it would do:
parallel --dry-run magick {} -resize x90 -crop 90x {.}-%03d.png ::: sentence*
Sample Output
magick sentence1.png -resize x90 -crop 90x sentence1-%03d.png
magick sentence2.png -resize x90 -crop 90x sentence2-%03d.png
magick sentence3.png -resize x90 -crop 90x sentence3-%03d.png
That looks good, so remove the --dry-run and run it again, and you get the following output for the three identical copies of your sentence that I made:
-rw-r--r-- 1 mark staff 5648 6 Jun 08:13 sentence1-000.png
-rw-r--r-- 1 mark staff 5319 6 Jun 08:13 sentence1-001.png
-rw-r--r-- 1 mark staff 5870 6 Jun 08:13 sentence1-002.png
-rw-r--r-- 1 mark staff 6164 6 Jun 08:13 sentence1-003.png
-rw-r--r-- 1 mark staff 5001 6 Jun 08:13 sentence1-004.png
-rw-r--r-- 1 mark staff 6420 6 Jun 08:13 sentence1-005.png
-rw-r--r-- 1 mark staff 4726 6 Jun 08:13 sentence1-006.png
-rw-r--r-- 1 mark staff 5559 6 Jun 08:13 sentence1-007.png
-rw-r--r-- 1 mark staff 5053 6 Jun 08:13 sentence1-008.png
-rw-r--r-- 1 mark staff 4413 6 Jun 08:13 sentence1-009.png
-rw-r--r-- 1 mark staff 5960 6 Jun 08:13 sentence1-010.png
-rw-r--r-- 1 mark staff 5392 6 Jun 08:13 sentence1-011.png
-rw-r--r-- 1 mark staff 4280 6 Jun 08:13 sentence1-012.png
-rw-r--r-- 1 mark staff 5681 6 Jun 08:13 sentence1-013.png
-rw-r--r-- 1 mark staff 5395 6 Jun 08:13 sentence1-014.png
-rw-r--r-- 1 mark staff 5065 6 Jun 08:13 sentence1-015.png
-rw-r--r-- 1 mark staff 6322 6 Jun 08:13 sentence1-016.png
-rw-r--r-- 1 mark staff 4848 6 Jun 08:13 sentence1-017.png
(plus matching sets sentence2-000.png through sentence2-017.png and sentence3-000.png through sentence3-017.png, identical apart from the base name)
A word of explanation about the parameters to parallel:
{} refers to "the current file"
{.} refers to "the current file without its extension"
::: separates the parameters meant for parallel from those meant for your magick command
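If GNU Parallel is unavailable, the same placeholder logic can be mimicked with a plain shell loop. This is just a sketch; echo keeps it a dry run, printing each command instead of executing it:

```shell
# Serial equivalent of the parallel command above; echo prints each
# command rather than running it (remove echo to actually process).
for f in sentence*.png; do
  base="${f%.png}"   # same role as {.}: the filename minus its extension
  echo magick "$f" -resize x90 -crop 90x "${base}-%03d.png"
done
```

Parallel's advantage over this loop is simply that it keeps all your CPU cores busy, which matters at 16,000 images.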
One note of warning: PNG images can "remember" where they came from, which can be useful or very annoying. If you look at the last chunk from above, you will see it is 56x90, but also that it "remembers" it came from a canvas of 1586x90 at offset 1530,0:
identify sentence3-017.png
sentence3-017.png PNG 56x90 1586x90+1530+0 8-bit Gray 256c 4848B 0.000u 0:00.000
This can sometimes upset subsequent processing, which is annoying, or sometimes be very useful in re-assembling images that have been chopped up! If you want to remove it, you need to repage, so the command above becomes:
magick input.png -resize x90 -crop 90x +repage output.png
Updated - to make better use of the tools in EBImage
ImageMagick is a great approach. But should you want to perform some content analysis on the images, here is a solution in R, which provides some pretty handy tools for this. Images are "nothing" but matrices, which R handles really well. The EBImage package works with images in exactly this way and, for better or for worse, strips some of the metadata from each image. Again, though, Mark's solution may be better for really big production runs.
The solution is structured around a large "for" loop. It would be prudent to add error checking at several steps. The code takes advantage of EBImage to manage both color and grayscale images.
Here, the final image is centered in an extended image by adding pixels of the desired background color. The extended image is then cropped into tiles. The logic determining the value for pad can be adjusted to simply crop the image or left justify or right justify it, if desired.
It assumes you begin in the working directory with the source files in ./source and the destination in ./dest. It also creates a new directory for each "tiled" image; that could be changed so that a single directory receives all the images, and other protective coding could be added. Here, the images are assumed to be PNG files with an appropriate extension. The desired tile size (90), applied to both height and width, is stored in the variable size.
# EBImage needs to be available (it is a Bioconductor package)
if (!require(EBImage)) {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install("EBImage")
  library(EBImage)
}
# From the working directory, select image files
size <- 90
bg.col <- "transparent" # or any other color specification for R
ff <- list.files("source", full = TRUE,
                 pattern = "png$", ignore.case = TRUE)
# Walk through all files with a 'for' loop
for (f in ff) {
  # Extract the base name, even from names like "foo.bar.1.png"
  txt <- unlist(strsplit(basename(f), ".", fixed = TRUE))
  len <- length(txt)
  base <- ifelse(len == 1, txt[1], paste(txt[-len], collapse = "."))
  # Read one image and resize
  img <- readImage(f)
  img <- resize(img, h = size) # options allow for antialiasing
  # Determine the number of tiles and the padding needed
  nx <- ceiling(dim(img)[1]/size)
  newdm <- c(nx * size, size) # dimensions of the extended image
  pad <- newdm[1] - dim(img)[1] # pixels needed to extend
  # Translate the image with the given background fill
  img <- translate(img, c(pad%/%2, 0), output.dim = newdm, bg.col = bg.col)
  # Split the image into appropriately sized tiles with 'untile'
  img <- untile(img, c(nx, 1), lwd = 0) # see the help file
  # Create a new directory for each image
  dpath <- file.path("dest", trimws(base)) # Windows doesn't like " "
  if (!dir.create(dpath))
    stop("unable to create directory: ", dpath)
  # Create new image file names for each frame
  fn <- sprintf("%s_%03d.png", base, seq_len(nx))
  fpaths <- file.path(dpath, fn)
  # Save individual tiles (as PNG) and collect the names of saved files
  saved <- mapply(writeImage, x = getFrames(img, type = "render"),
                  files = fpaths)
  # Check on the results from 'mapply'
  print(saved)
}
As the data is rainfall, I want to replace the negative values in both the point forecasts and the intervals with 0. How can this be done in R? I am looking for the R code that can make the required changes.
The forecast values obtained in R using an ARIMA model are given below:
> Predictions
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 2021 -1.6625108 -165.62072 162.2957 -252.41495 249.0899
Feb 2021 0.8439712 -165.57869 167.2666 -253.67752 255.3655
Mar 2021 35.9618300 -130.53491 202.4586 -218.67297 290.5966
Apr 2021 53.4407679 -113.05822 219.9398 -201.19746 308.0790
May 2021 206.7464927 40.24744 373.2455 -47.89184 461.3848
Jun 2021 436.2547446 269.75569 602.7538 181.61641 690.8931
Jul 2021 408.2814434 241.78239 574.7805 153.64311 662.9198
Aug 2021 431.7649076 265.26585 598.2640 177.12657 686.4032
Sep 2021 243.5520546 77.05300 410.0511 -11.08628 498.1904
Oct 2021 117.4581047 -49.04095 283.9572 -137.18023 372.0964
Nov 2021 25.0773401 -141.42171 191.5764 -229.56098 279.7157
Dec 2021 28.9468415 -137.55188 195.4456 -225.69098 283.5847
Jan 2022 -0.4912674 -171.51955 170.5370 -262.05645 261.0739
Feb 2022 2.2963271 -168.86759 173.4602 -259.47630 264.0690
Mar 2022 43.3561613 -127.81187 214.5242 -218.42275 305.1351
Apr 2022 48.6538398 -122.51431 219.8220 -213.12526 310.4329
May 2022 228.4762035 57.30805 399.6444 -33.30290 490.2553
Jun 2022 445.3540781 274.18592 616.5222 183.57497 707.1332
Jul 2022 441.8287867 270.66063 612.9969 180.04968 703.6079
Aug 2022 592.5766086 421.40845 763.7448 330.79751 854.3557
Sep 2022 220.6996396 49.53148 391.8678 -41.07946 482.4787
Oct 2022 158.7952154 -12.37294 329.9634 -102.98389 420.5743
Nov 2022 29.9052184 -141.26288 201.0733 -231.87380 291.6842
Dec 2022 25.9432583 -145.22303 197.1095 -235.83298 287.7195
In this context, try using:
Predictions[Predictions < 0] <- 0
This will replace all values less than 0 with 0. For performance reasons, for loops are discouraged in situations where vectorization can be applied.
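A minimal sketch with made-up numbers (the matrix m here is just a stand-in for the forecast table):

```r
# Stand-in numbers; any matrix or data frame of forecasts works the same way
m <- matrix(c(-1.66, 0.84, 35.96, -130.53), nrow = 2)
m[m < 0] <- 0  # every negative entry is replaced in one vectorized step
m
```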
I have a script that deletes rows based on duplicate values in three columns. There are many more than three columns, but I want to deduplicate based on those specific ones.
DF2021 <- DF2021[!duplicated(DF2021[, c("column1", "column2", "column3")]), ]
The script above works, and it leaves me with one row each time there is a duplicate based on those three columns.
The next step is where I wonder how to make sure I'm left with the right row based on some criteria. For example, I want the row with the fewest NAs.
column1|column2|column3|column4|column5|column6|column 7
Jan Tue 2020 Blue Warm Hospital NA
Jan Tue 2020 Blue Warm NA NA
Jan Tue 2020 Blue NA NA NA
Feb Thu 2020 Red NA NA NA
Feb Thu 2020 Red Warm NA NA
Feb Thu 2020 Red Warm Garden Run
Mar Thu 2020 Red Cold Desk Bus
In the end I would expect the deduplication to leave me with three rows.
column1|column2|column3|column4|column5|column6|column 7
Jan Tue 2020 Blue Warm Hospital NA
Feb Thu 2020 Red Warm Garden Run
Mar Thu 2020 Red Cold Desk Bus
Note that if I were to do
DF2021 <- DF2021[complete.cases(DF2021),]
It would only give me the Feb and Mar rows but not the Jan one. I want the script to remove duplicates and keep the "most complete" (but not necessarily fully complete) row out of the duplicates, based on those three columns.
Try this. You can create a function to detect the complete rows and those with only one NA. With that you can use indexing to select those rows. Here is the code:
# Index for selection
myfun <- function(x) {
  y <- length(which(is.na(x)))
  y <- ifelse(y <= 1, 1, 0)
  return(y)
}
# Apply
index <- which(apply(df, 1, myfun) == 1)
# Output
out <- df[index, ]
Output:
column1 column2 column3 column4 column5 column6 column7
1 Jan Tue 2020 Blue Warm Hospital <NA>
6 Feb Thu 2020 Red Warm Garden Run
7 Mar Thu 2020 Red Cold Desk Bus
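A runnable check of this approach, rebuilding the question's data frame (the name df matches the code above; column3 is recycled across rows):

```r
# Rebuild the question's example data (NA marks the missing cells)
df <- data.frame(
  column1 = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar"),
  column2 = c("Tue", "Tue", "Tue", "Thu", "Thu", "Thu", "Thu"),
  column3 = 2020,
  column4 = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Red"),
  column5 = c("Warm", "Warm", NA, NA, "Warm", "Warm", "Cold"),
  column6 = c("Hospital", NA, NA, NA, NA, "Garden", "Desk"),
  column7 = c(NA, NA, NA, NA, NA, "Run", "Bus")
)
# Keep rows with at most one NA, as in the answer above
myfun <- function(x) ifelse(length(which(is.na(x))) <= 1, 1, 0)
df[which(apply(df, 1, myfun) == 1), ]
```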
I am importing some data into R and want the code to stop running if there is no file or there is no data in the file. I'm using base R and readxl. Can you please help with the syntax?
I've tried
if (dim(Llatest) == NULL) {stop('STOP NO DATA')}
if (dim(Llatest)[1] == 0) + stop('STOP NO DATA')}
if (isTRUE(dim(Llatest) == NULL)) {stop('STOP NO DATA')}
Some data imported from Sep19import.xlsx
ID Code Received Actioned Decision
1 123 Jul 01 2019 Sep 02 2019 Hold
2 456 Jul 11 2019 Sep 13 2019 No action
3 789 Nov 26 2018 Sep 25 2019 Investigate
4 321 Sep 12 2019 Sep 12 2019 Await decision
5 654 Aug 30 2019 Sep 26 2019 Hold
6 987 Feb 22 2019 Sep 02 2019 Investigate
Obtain list of files for import
LFiles <- list.files(path = "C:/Projects/Sep/code", pattern = "*import.xlsx", full.names = TRUE)
***I wish to stop here if LFiles is empty
Identify the latest file
Llatest <- subset(LFiles, LFiles == max(LFiles))
Extract data from file
LMonthly <- read_excel(Llatest)
***I wish to stop here if LMonthly is empty
Error Messages received - no non-missing arguments, returning NA
I expect the output to be 'STOP NO DATA'
I want to add an extra column to my already existing dataframe, which has coral type, location, year, and percent bleached (value) as columns. I want the average bleach percent of each type of coral at every site over the years. For example, soft corals at site01 have a bleach percent of 20 in 2010 and 10 in 2011, so the average column value will contain 15.
already existing df
type location year value
soft site01 2010 20
soft site01 2011 10
hard site01 2010 10
hard site01 2011 30
after adding column
type location year value avg
soft site01 2010 20 15
soft site01 2011 10 15
hard site01 2010 10 20
hard site01 2011 30 20
You can use ave:
transform(dat, avg = ave(value, type, location))
The result:
type location year value avg
1 soft site01 2010 20 15
2 soft site01 2011 10 15
3 hard site01 2010 10 20
4 hard site01 2011 30 20
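For completeness, a self-contained version that rebuilds the example data first (the data frame name dat is assumed; ave with no FUN argument defaults to mean):

```r
# The question's data; ave() fills in each row's group mean
dat <- data.frame(type = c("soft", "soft", "hard", "hard"),
                  location = "site01",
                  year = c(2010, 2011, 2010, 2011),
                  value = c(20, 10, 10, 30))
transform(dat, avg = ave(value, type, location))
```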
The txt file looks like this:
#---*----1----*----2----*---
Name Time.Period Value
A Jan 2013 10
B Jan 2013 11
C Jan 2013 12
A Feb 2013 9
B Feb 2013 11
C Feb 2013 15
A Mar 2013 10
B Mar 2013 8
C Mar 2013 13
I tried to use read.table with readLines and count.fields as shown below:
> path <- list.files()
> data <- read.table(text=readLines(path)[count.fields(path, blank.lines.skip=FALSE) == 4])
Warning message:
In readLines(path) : incomplete final line found on 'data1.txt'
> data
V1 V2 V3 V4
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
The problem is that it gives four columns instead of three. Therefore I manipulated my data as below, while still seeking an alternative.
> library(zoo)
> data$Name <- as.character(data$V1)
> data$Time.Period <- as.yearmon(paste(data$V2, data$V3, sep=" "))
> data$Value <- as.numeric(data$V4)
> DATA <- data[, 5:7]
> DATA
Name Time.Period Value
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
You can use read.fwf to read fixed-width files. You need to correctly specify the width of each column, in characters.
data <- read.fwf(path, widths=c(-12, 8, -4, 2), header=T)
The key there is how you specify the widths. Negative means skip that many places, positive means read that many. I am assuming entries in the last column have only 2 digits; change the widths accordingly if this is not the case. You will probably also have to fix the column names.
You will have to change the widths if the file format changes, or come up with some clever regex to read them from the first few rows. A better solution would be to enclose your strings in quotes or, even better, avoid the format altogether.
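To illustrate the sign convention with a toy line (made-up content, not the question's file):

```r
# 11 characters: skip 4 ("AAAA"), read 5 ("BBBBB"), read 2 ("CC")
tf <- tempfile()
writeLines("AAAABBBBBCC", tf)
read.fwf(tf, widths = c(-4, 5, 2))
```

Note that skipped (negative) widths produce no column at all in the result.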
?count.fields
As the R documentation states, count.fields counts the number of fields, as separated by sep, in each line of the file read. When you set count.fields(path, blank.lines.skip=FALSE) == 4, the header row is skipped: it has only three fields, whereas each data row has four (the Time.Period values split into two fields on whitespace).
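A small demonstration of that behaviour (a temporary file stands in for data1.txt):

```r
# Header has 3 whitespace-separated fields; the data rows have 4
tf <- tempfile()
writeLines(c("Name Time.Period Value",
             "A Jan 2013 10",
             "B Feb 2013 11"), tf)
count.fields(tf)  # 3 4 4
keep <- count.fields(tf, blank.lines.skip = FALSE) == 4
read.table(text = readLines(tf)[keep])  # only the 4-field data rows survive
```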