I am trying to draw multiple lollipop plots (lolliplot, as in the trackViewer vignette), generated with lapply, and save them into a single PDF file in a grid layout.
The following code makes the plots and saves each one to a separate file:
files <- list.files(path="/Users/myusername/lollyplot_data/", pattern="*.txt", full.names=TRUE, recursive=FALSE)
lapply(files, function(x) {
file_base_name = sub('\\..*$', '', basename(x))
myfile <- read.csv(x, header=FALSE, sep = "\t")
chrom_info = strsplit(myfile$V1, ':')[[1]][1]
sample.gr <- GRanges(chrom_info,IRanges(myfile$V2, myfile$V2, names=myfile$V1), color=myfile$V3, score=myfile$V4)
features <- GRanges(chrom_info, IRanges(myfile$V2, myfile$V2))
sample.gr.rot <- sample.gr
png(paste(file_base_name, "png", sep = "."), width = 600, height = 595)
lolliplot(sample.gr.rot, features, cex = 1.2, yaxis.gp = gpar(fontsize=18, lwd=2), ylab = FALSE, xaxis.gp = gpar(fontsize=10))
grid.text(strsplit(file_base_name, "_")[[1]][7], x=.5, y=.98, just="top",
gp=gpar(cex=1.5, fontface="bold"))
dev.off()
})
How do I change the above code to save all the plots in a single PDF file arranged in a grid? There are 20 plots in total, so the grid should have 5 rows and 4 columns. I tried using grid.arrange, but that did not work.
Really appreciate any input in advance.
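One possible approach, sketched with the base grid package only: open a single pdf() device, define a 5 x 4 grid.layout, and draw each plot inside its own viewport. The draw_panel() function, the labels and the output path below are hypothetical stand-ins for the per-file lolliplot code. With trackViewer you would presumably call lolliplot(..., newpage = FALSE) inside the viewport so that each call does not start a fresh page, but treat that argument as an assumption to check against the trackViewer documentation.

```r
library(grid)

# Hypothetical stand-in for the per-file drawing code
# (in the real script this is where lolliplot() and grid.text() would go)
draw_panel <- function(label) {
  grid.rect(gp = gpar(col = "grey50"))
  grid.text(label, x = 0.5, y = 0.95, just = "top",
            gp = gpar(fontface = "bold"))
}

labels  <- paste("plot", 1:20)                      # one label per input file
n_col   <- 4
n_row   <- 5
out_pdf <- file.path(tempdir(), "all_plots.pdf")    # hypothetical output path

pdf(out_pdf, width = 16, height = 20)
grid.newpage()
# Parent viewport that splits the page into a 5 x 4 grid
pushViewport(viewport(layout = grid.layout(nrow = n_row, ncol = n_col)))
for (k in seq_along(labels)) {
  row <- (k - 1) %/% n_col + 1   # fill the grid row by row
  col <- (k - 1) %% n_col + 1
  pushViewport(viewport(layout.pos.row = row, layout.pos.col = col))
  draw_panel(labels[k])
  popViewport()
}
popViewport()
invisible(dev.off())
```

The key point is that the device is opened once, before the loop, and closed once, after it; opening a device inside the loop is what produces one file per plot.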
I am new to coding in R and I work with a large dataset.
I am trying to write code that does the following:
Get the paths to all files in my folder
Extract the names of the files (as I want to name my plots after the input files)
Read in all files in my folder (these are all .csv files)
Plot a diagram for each .csv file by plotting groundwater level against the year
--> each plot should then get the title of its input file and also be stored under the same name.
For example, when my file is called 211210.csv, the title should be 211210 and the plot should be stored as 211210.png
This is the code I have so far. As I said, I am new to R; I have tried to solve many of the problems I had in the code, but I still run into new errors. Can someone explain where the problem is and how to solve it?
library(fs)
library(ggplot2)
library(tidyverse)
#Opening path to my data
filepath <- fs::dir_ls("D:/Desktop/Masterarbeit/Daten/Test/")
# Get name of files
name <- basename(filepath)
#Read every single files
file_content <- list()
for (i in seq_along(filepath)){
path <- filepath
file_content[[i]] <- read.csv(
file = filepath[[i]], header = TRUE
)
}
file_content <- set_names(file_content, filepath)
#Plot the diagram with gwl against year for each file, title = name of each file, and store it in a separate folder under the name of the input file
for (i in file_content){
mypath <- file.path("D:/Desktop/Masterarbeit/Daten/Results/", paste("Messstelle_", name[[i]], ".png", sep = ""))
png(file=mypath)
mytitle = paste("Messstelle", name[[i]])
plot(i$year, i$gwl,
pch = 19, #--> solid circle
cex = 1.5, #--> make 150% size
main = name[[i]],
xlab = "Year",
ylab = "Ground water level",
)
dev.off()
}
First, I would do everything in one loop for efficiency. Second, I would avoid unnecessary packages such as fs (base R's list.files function already lists all files in a folder). Third, I would iterate over the file names rather than over a numeric index. Note that in your second loop i is a whole data frame, so name[[i]] cannot work as an index. For example:
filepath <- "D:/Desktop/Masterarbeit/Daten/Test/"
files <- list.files(filepath, pattern = "\\.csv$") # pattern is a regular expression; match a literal ".csv" at the end
#Iterate through every single file
for (file in files){
name2store <- strsplit(file, "[.]")[[1]][1]
path2read <- file.path(filepath, file)
data <- read.csv(file =path2read, header = TRUE)
mypath <- file.path("D:/Desktop/Masterarbeit/Daten/Results/", paste("Messstelle_", name2store, ".png", sep = ""))
png(file=mypath)
mytitle = paste("Messstelle", name2store)
plot(data$year, data$gwl,
pch = 19, #--> solid circle
cex = 1.5, #--> make 150% size
main = name2store,
xlab = "Year",
ylab = "Ground water level"
)
dev.off()
}
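A small sketch of the file-name step, for the case where a file name contains more than one dot. The file names are hypothetical. The strsplit approach used in the loop keeps only the text before the first dot, while tools::file_path_sans_ext (the tools package ships with R) removes only the final extension.

```r
files <- c("211210.csv", "site.a.2021.csv")  # hypothetical file names

# Approach from the loop above: keeps only the text before the first dot
first_piece <- sapply(strsplit(files, "[.]"), `[`, 1)

# Alternative: strip only the final extension
sans_ext <- tools::file_path_sans_ext(files)

print(first_piece)
print(sans_ext)
```

For names like 211210.csv both give the same result, so the loop above is fine as long as the file names contain a single dot.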
Currently I use R to read in a table and plot some of the data, which I save as a png file. Now I have 100 files and would like to automate this process rather than manually change the path 100 times.
Additionally, I would like to join the 100 files into one table in R that I can subsequently analyse. The join would be in the style of dplyr's bind_rows, as all files have the same column headers. I have done this with two tables in R, but not when using a loop to read files in sequentially. What would be the best way to do this in R? Thanks in advance for any suggestions or help.
my_data <- read.table('/path/to/data/results/sample_1.txt', header = TRUE, sep = "\t")
ggplot(my_data, aes(x=alt_freq)) + geom_histogram(color="black", fill="white", bins = 20) + xlim(c(0,1))
ggsave("/path/to/plots/sample_1.png", plot = last_plot(),width = 16, height = 9)
#append table to one large table in the format of dplyr::bind_rows(y, z)
Input files are all named with the same naming convention:
sample_1.txt
sample_2.txt
sample_3.txt
The files look like:
sample_name position alt_freq ref_freq sample_1_counts
sample 1 10 0.5 0.5 2
sample 1 20 0.25 0.75 4
All txt files are in the same directory and all txt files are of interest.
First collect the complete path of the files of interest
library(ggplot2)
all_files <- list.files("/path/to/data/results", pattern = "sample_\\d+\\.txt$",
full.names = TRUE)
Then create a function to apply to each file
new_fun <- function(path_of_file) {
my_data <- read.table(path_of_file, header = TRUE)
ggplot(my_data, aes(x=alt_freq)) +
geom_histogram(color="black", fill="white", bins = 20) + xlim(c(0,1))
ggsave(paste0(dirname(path_of_file), "/", sub("txt$", "png",
basename(path_of_file))), plot = last_plot(),width = 16, height = 9)
}
We use paste0 to create path to save the plot dynamically by getting the directory name and replacing the ending txt with png.
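To make the path construction concrete, here is the same expression evaluated on the example path from the question. The variable names out_path and out_path2 are only for illustration; file.path() is shown as an equivalent way to join the directory and the file name.

```r
path_of_file <- "/path/to/data/results/sample_1.txt"

# Same expression as inside new_fun
out_path <- paste0(dirname(path_of_file), "/",
                   sub("txt$", "png", basename(path_of_file)))
print(out_path)

# Equivalent construction with file.path()
out_path2 <- file.path(dirname(path_of_file),
                       sub("txt$", "png", basename(path_of_file)))
print(out_path2)
```

Both give a path in the same directory as the input, with the extension changed from txt to png.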
Then use lapply/map/for loop to apply new_fun to each file
lapply(all_files, new_fun)
To combine all the files into one dataframe we can do
combined_data <- do.call(rbind, lapply(all_files, read.table, header = TRUE))
If the header is different for one column we can change the column name for that particular column and then rbind. So for example, if the header information for column 1 is different, we can do
combined_data <- do.call(rbind, lapply(all_files, function(x) {
df <- read.table(x, header = TRUE)
names(df)[1] <- "new_header"
df$filename <- basename(x)
df
}))
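A self-contained sketch of the combining step that can be run without the real data. It writes two small tab-separated files to a temporary folder (the folder name and the values are made up for the demo), then reads and stacks them with do.call(rbind, ...), recording the source file in a filename column.

```r
# Create two small demo files in a temporary folder (hypothetical data)
demo_dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:2) {
  df <- data.frame(position = c(10, 20), alt_freq = c(0.5, 0.25))
  write.table(df, file.path(demo_dir, paste0("sample_", k, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

all_files <- list.files(demo_dir, pattern = "sample_\\d+\\.txt$",
                        full.names = TRUE)

# Read each file, tag its rows with the file name, then stack everything
combined_data <- do.call(rbind, lapply(all_files, function(x) {
  df <- read.table(x, header = TRUE, sep = "\t")
  df$filename <- basename(x)
  df
}))
print(combined_data)
```

Keeping the file name as a column makes it possible to tell the samples apart after the tables have been stacked.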
I would do something like the following.
Change these to their real values.
in_dir <- '/path/to/data/results'
out_dir <- '/path/to/plots'
Now the plots and binding the tables.
library(ggplot2)
old_dir <- getwd()
setwd(in_dir)
flnames <- list.files(pattern = '^sample_[[:digit:]]+\\.txt$')
data_list <- lapply(flnames, read.table, header = TRUE, sep = '\t')
lapply(seq_along(data_list), function(i){
ggplot(data_list[[i]], aes(x = alt_freq)) +
geom_histogram(color = "black", fill = "white", bins = 20) +
xlim(c(0, 1))
f <- sub('txt$', 'png', flnames[i])
outfile <- paste(out_dir, f, sep = '/')
ggsave(outfile, plot = last_plot(),width = 16, height = 9)
})
data_all <- dplyr::bind_rows(data_list)
Final cleanup.
setwd(old_dir)
## NOT RUN
#rm(data_list)
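If an error occurs between setwd(in_dir) and the final cleanup, the session stays in in_dir. A possible variation, sketched here with a hypothetical helper called in_directory, uses on.exit() so that the working directory is restored even when the code fails part-way.

```r
# Hypothetical helper: run `code` with `dir` as the working directory,
# then restore the previous directory, even if `code` throws an error
in_directory <- function(dir, code) {
  old_dir <- setwd(dir)               # setwd() returns the previous directory
  on.exit(setwd(old_dir), add = TRUE)
  force(code)                         # `code` is evaluated lazily, i.e. after setwd()
}

before <- getwd()
listed <- in_directory(tempdir(), list.files())
after  <- getwd()
```

Another option is to skip setwd() altogether and call list.files(in_dir, full.names = TRUE), as in the other answer.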
I have 100 scanned PDF files and I need to convert them into text files.
I have first converted them into png files (see script below),
now I need help to convert these 100 png files to 100 text files.
library(pdftools)
library("tesseract")
#location
dest <- "P:\\TEST\\images to text"
#making loop for all files
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
#Convert files to png
sapply(myfiles, function(x)
pdf_convert(x, format = "png", pages = NULL,
filenames = NULL, dpi = 600, opw = "", upw = "", verbose = TRUE))
#read files
cat(text)
I expect to have a text file for each png file:
From: file1.png, file2.png, file3.png...
To: file1.txt, file2.txt, file3.txt...
But the actual result is one text file containing the text of all the png files.
I guess you left out the png -> text part, but I assume you used library(tesseract).
You could do the following in your code:
library(tesseract)
eng <- tesseract("eng")
sapply(myfiles, function(x) {
png_file <- gsub("\\.pdf", ".png", x)
txt_file <- gsub("\\.pdf", ".txt", x)
pdf_convert(x, format = "png", pages = 1,
filenames = png_file, dpi = 600, verbose = TRUE)
text <- ocr(png_file, engine = eng)
cat(text, file = txt_file)
## just return the text string for convenience
## we are anyways more interested in the side effects
text
})
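The part that gives one text file per input is the per-file output name. A minimal sketch of just that step, without pdftools or tesseract: the file names, the output folder and the fake_text vector (standing in for the strings returned by ocr()) are all hypothetical.

```r
myfiles   <- c("file1.pdf", "file2.pdf")           # hypothetical input names
fake_text <- c("text of file1", "text of file2")   # stand-in for ocr() results

out_dir <- file.path(tempdir(), "ocr_demo")        # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

# One output name per input: same base name, .txt instead of .pdf
txt_files <- file.path(out_dir, sub("\\.pdf$", ".txt", basename(myfiles)))

# Write each text to its own file
for (k in seq_along(myfiles)) {
  cat(fake_text[k], file = txt_files[k])
}
print(basename(txt_files))
```

Anchoring the pattern with $ replaces only a trailing .pdf, which avoids surprises if ".pdf" appears elsewhere in a path.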
I am trying to plot graphs by loop.
Input data: Tables, which have the same ending *depth.txt, there are 2 tab delimited columns in the table:
Baba"\t"58.38
Tata"\t"68.38
Mama"\t"30.80
jaja"\t"88.65
OUTPUT: I would like to get a jpeg file with plot() for each *depth.txt (their names will be the same as the tables' names) for all files (axis x will be the first column from the table and axis y will be second column)
I created a part of the script, but it doesn't work:
files <- list.files(path="/home/fil/Desktop/", pattern="*depth.txt", full.names=T,recursive=FALSE)
for (i in 1:length(files))
plot(read.table(files[i],header=F,sep="\t")$V1,read.table(files[i],header=F,sep="\t")$V2)
dev.copy(jpeg,filename=files[i])
dev.off
It doesn't work, could you help me please? I am a beginner with R.
Will the following do what you want?
for (i in 1:length(files)) {
dat <- read.table(files[i], header = FALSE, sep = '\t')
jpeg(file = paste(files[i], '.jpeg', sep = ''))
plot(dat$V1, dat$V2)
dev.off()
}
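A side note on the loop index: if list.files() finds no matching files, 1:length(files) still produces two iterations, whereas seq_along(files) produces none. A small sketch with an empty vector:

```r
files <- character(0)  # e.g. what list.files() returns when nothing matches

bad_index  <- 1:length(files)   # counts down from 1 to 0
good_index <- seq_along(files)  # empty, so a for loop body never runs

print(bad_index)
print(good_index)
```

Using for (i in seq_along(files)) in the loops above therefore avoids a confusing read.table error when the pattern matches nothing.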
Similar to the first two but changing the file name for the plots
files <- paste("fil",1:3,"depth.txt",sep="") # example file names
for( i in 1:length(files)) {
filename <- sub("\\.txt$", ".jpg", files[i]) # escape the dot and anchor at the end of the name
jpeg(file=filename)
plot(1:(10*i)) # example plots
dev.off()
}
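Renaming the file?
for (i in seq_along(files)) {
# Prefix the base name rather than the full path, so the result is still a valid path
file <- file.path(dirname(files[i]), paste("jpg", basename(files[i]), sep = "_"))
# Read the table once instead of twice
dat <- read.table(files[i], header = FALSE, sep = "\t")
jpeg(file)
plot(dat$V1, dat$V2)
dev.off()
}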
I have multiple CSVs as inputs, which contain lat/long information, and I am exporting .tiff images that have these lat/longs plotted on a map. I want to loop this process so that I can read multiple CSVs and generate the corresponding maps (.tiff). Any help will be appreciated!
Here is the code which I am using at present
rm(list=ls())
sclusters_1 <- readLines("C:\\Users\\D85_H.csv")
skip_second <- sclusters_1[-2]
sclusters <- read.csv(textConnection(skip_second), header = TRUE)
library(grDevices)
library(PBSmapping)
library(maptools)
library(sp)
myShapeFile<-importShapefile("C:\\Users\\st99_d00_shp\\st99_d00",readDBF=TRUE, projection = "LL")
ConvUS <- convUL(myShapeFile)
addressEvents<-as.PolyData(sclusters,projection="LL", zone = 15)
uaddressEvents <- convUL(addressEvents)
sclusters_cl <- unique(sclusters$PID)
len <- length(sclusters_cl)
palette(c("dodgerblue3","red3","olivedrab","purple4","turquoise2","orange3","lightskyblue4","mediumorchid3","saddlebrown","skyblue4"))
setwd("C:/Users/")
# file name, leave .tiff extension
tiff(filename = "Test.tiff",
width = 3750, height = 3142, units = "px", pointsize = 12,
compression = "lzw",
bg = "transparent")
plotMap(ConvUS , xlim=c(-8000,3500), ylim=c(2000,9500), plt=c(0.07,0.97,0.07,0.98), bg = "white", border = "darkgrey", axes=FALSE, xlab=" ",ylab=" ", lty = 1, lwd = 2)
addPoints(uaddressEvents,col=1:len,cex=2.5, pch = "O")
legend("topright",legend = sclusters_cl, cex=0.7, fill=palette())
#close output file stream
dev.off()
Mind you, this is untested:
Since both the import and export filenames are passed as length-1 character vectors, you can make a matrix like m <- matrix(c("infile1.csv", "outfile1.tiff", "infile2.csv", "outfile2.tiff"),nrow=2).
Then you can just wrap the entire thing in a for loop over the columns of that matrix, e.g. for(j in 1:ncol(m)). Then just replace "C:\\Users\\D85_H.csv" with m[1,j] and "Test.tiff" with m[2,j].
Even better practice might be to wrap the entire thing in a function and then write a separate loop that just calls the function. For example:
myTIFF <- function (infile, outfile) {
# stuff you want to do to each pair of input and output files
}
m <- matrix(c(
"infile1.csv", "outfile1.tiff",
"infile2.csv", "outfile2.tiff"
# and so on
), nrow = 2)
for(j in 1:ncol(m)) myTIFF(m[1, j], m[2, j])
Note also that R has a function called dir that returns the contents of a directory as a character vector, much as ls does for variables in the current environment. If you have a large number of CSV files, you can use it (possibly in conjunction with grep) to generate the matrix m programmatically rather than typing it by hand.
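Also untested against the real data, but here is a sketch of building m from a directory listing. The demo folder, the empty CSV files and the stub version of myTIFF (which only prints what it would do) are hypothetical; the pattern argument of dir does the filtering that grep would otherwise do.

```r
# Hypothetical demo folder with two empty CSV files
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
invisible(file.create(file.path(csv_dir, c("infile1.csv", "infile2.csv"))))

# Stub standing in for the real per-file plotting function
myTIFF <- function(infile, outfile) {
  message("would read ", basename(infile), " and write ", basename(outfile))
}

# List the CSV files and derive one .tiff name per input
infiles  <- dir(csv_dir, pattern = "\\.csv$", full.names = TRUE)
outfiles <- sub("\\.csv$", ".tiff", infiles)
m <- rbind(infiles, outfiles)   # row 1: inputs, row 2: outputs

for (j in seq_len(ncol(m))) myTIFF(m[1, j], m[2, j])
```

With the real function body in place of the stub, the same loop would produce one map per CSV without any file names typed by hand.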