I was wondering if anyone could help with the code below. I have a huge dataset (> 1000 subjects) which I'm trying to visualise individually.
I was fortunate to find some code written by Tony Cookson on R-bloggers, which I've modified for my use. The code runs OK, but the PDFs it produces are damaged: essentially, they refuse to open. I have a feeling there's a bug somewhere, but I haven't yet figured out where. Any assistance would be highly appreciated.
library(lattice)
names = LETTERS[1:3]
for(i in 1:3){
  mypath <- file.path("myFilepath", "folder containing 'Plots' subfolder ",
                      "Plots",paste("myplot_", names[i], ".pdf", sep = ""))
  pdf(file=mypath)
  mytitle = paste("Theoph Plots", names[i])
  xyplot(conc ~ Time | Subject, group = Subject, data = Theoph, type = "l",
         layout = c(2, 2), main = mytitle)
  dev.off()
}
For the code to be reproducible, you need to replace myFilepath, folder containing 'Plots' subfolder and "Plots" with names of actual folders that can be found on your computer. Please see the original on R-bloggers for more details. I would be very happy to clarify anything that seems ambiguous.
Thanks
Edit:
library(lattice)
names = LETTERS[1:3]
for(i in 1:3){
  mypath <- file.path("myFilepath", "folder containing 'Plots' subfolder ",
                      "Plots",paste("myplot_", names[i], ".pdf", sep = ""))
  pdf(file=mypath)
  mytitle = paste("Theoph Plots", names[i])
  print(xyplot(conc ~ Time | Subject, group = Subject, data = Theoph, type = "l",
               layout = c(2, 2), main = mytitle))
  dev.off()
}
I've managed to find a temporary solution (above) using the print function. However, I'm currently getting all 12 Subjects in the same PDF. What I really want is 4 subjects (a 2 by 2 matrix) per PDF, making 3 PDFs in total. Does anyone know how to do this?
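For context on why print() makes the difference: lattice functions such as xyplot() return a "trellis" object and only draw when that object is printed. At the top level R prints it automatically, but inside a for loop it does not, so without print() nothing is ever drawn on the PDF device. A minimal sketch, writing to a temporary file (out_file is an illustrative path, not one from the original code):

```r
library(lattice)

out_file <- file.path(tempdir(), "print_demo.pdf")  # illustrative path

pdf(file = out_file)
for (i in 1:1) {
  p <- xyplot(conc ~ Time | Subject, data = Theoph, type = "l")
  # p is only a "trellis" object at this point; nothing has been drawn yet
  print(p)  # the explicit print() is what draws onto the open device
}
invisible(dev.off())

class(p)             # includes "trellis"
file.size(out_file)  # size in bytes of the written PDF
```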
If you're looking to plot a subset of Subjects on each page, then you have to subset your data for each iteration and then plot.
To get 4 Subjects on each page, you can use the following index builder as a basis for subsetting:
(i - 1) * 4 + 1:4
The trick with the Theoph dataset is that the subject "numbers" are actually ordered factors. So you have to convert the above to a factor, or, as a shortcut, to a character vector.
for(i in 1:3){
  ## Changed mypath to make it reproducible
  mypath <- file.path(tempdir(), paste("myplot_", names[i], ".pdf", sep = ""))
  pdf(file=mypath)
  mytitle = paste("Theoph Plots", names[i])
  myIndex <- as.character((i - 1) * 4 + 1:4) # index builder from above
  print(xyplot(conc ~ Time | Subject,
               data = Theoph[Theoph$Subject %in% myIndex, ],
               type = "l", layout = c(2, 2), main = mytitle))
  dev.off()
}
The order of the subjects is a bit screwy, since that variable is an ordered factor, as mentioned. To keep the ordering, you could subset on the levels of that factor:
myIndex <- levels(Theoph$Subject)[(i - 1) * 4 + 1:4]
The best way to build your index will depend on your actual data.
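As a rough sketch of the levels-based variant, the factor levels can be split into groups of four up front, which also copes with a subject count that is not a multiple of four. The names lv, pages and page_data are made up for this illustration:

```r
lv <- levels(Theoph$Subject)                     # subject labels, in factor order
pages <- split(lv, ceiling(seq_along(lv) / 4))   # list of groups of (up to) 4 labels

for (i in seq_along(pages)) {
  # keep only the rows for this page's subjects and drop the unused levels
  page_data <- droplevels(Theoph[Theoph$Subject %in% pages[[i]], ])
  cat("page", i, ":", nlevels(page_data$Subject), "subjects,",
      nrow(page_data), "rows\n")
}
```

Inside the plotting loop, page_data would take the place of the subsetted Theoph in the xyplot() call.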
I am trying to extract data using the rgbif package for multiple species (once the code works I'll be running a list of about 200 species, so it is important for me to implement a list).
I have tried to adapt code written in the following link:
https://github.com/ropensci/rgbif/issues/377
This is what my input file looks like:
csv file
And my code looks as follows:
library("rgbif")
#input <- read.csv("C:/Users/omi30wk/Desktop/TESTsampledata_udi.csv", header = TRUE, fill = TRUE, sep = ",")
#since you guys don't have my csv file here are three samples species I'm using:
# Acanthorrhynchium papillatum, Acrolejeunea sandvicensis, Acromastigum cavifolium
#'taxon' as header, see image posted above of my csv file for clarity
allpts <- vector('list', length(input))
names(allpts) = input
for (taxon in input){
  cat(taxon, "\n")
  allpts[[taxon]] <- occ_data(scientificName = taxon, limit = 2) #error here
  df <- allpts[[taxon]]$data
  df$networkKeys = NULL
  if (!is.null(df)) {
    df <- df[, !apply(df, 2, function(z)
      is.null(unlist(z)))]
    write.csv(df, paste("/Users/user/Desktop/DATA Bats/allpts_30sept/", gsub(" ", "_", taxon), ".csv", sep = ""))
  }
}
However, I get the following error message at the moment:
Error in `[[<-`(`*tmp*`, taxon, value = list(`Acanthorrhynchium papillatum` = list( :
no such index at level 1
I'm even happy to try different code to extract data for multiple species. I've already tried many approaches (e.g. loops) that also kept giving me error messages I haven't been able to solve.
Any help is greatly appreciated!
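One plausible reading of the error (an assumption, since the CSV itself is not shown): input is a data frame, so for (taxon in input) iterates over its columns and taxon becomes the entire taxon column. Indexing allpts[[taxon]] with a vector of several names is then treated as recursive indexing, which fails with "no such index at level 1". The sketch below loops over the names themselves. fetch_occurrences is a hypothetical stand-in for occ_data(scientificName = taxon, limit = 2) so that the example runs offline:

```r
input <- data.frame(taxon = c("Acanthorrhynchium papillatum",
                              "Acrolejeunea sandvicensis",
                              "Acromastigum cavifolium"),
                    stringsAsFactors = FALSE)

# Hypothetical stand-in for the real rgbif call; returns a list with a $data element
fetch_occurrences <- function(name) {
  list(data = data.frame(scientificName = name, stringsAsFactors = FALSE))
}

taxa <- input$taxon                                      # character vector of names
allpts <- setNames(vector("list", length(taxa)), taxa)   # one named slot per taxon

for (taxon in taxa) {
  cat(taxon, "\n")
  allpts[[taxon]] <- fetch_occurrences(taxon)            # taxon is a single string here
}

names(allpts)
```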
I am new to coding in R and I work with a large dataset.
I am trying to write code that does the following things:
Get the paths to all files in my folder
Extract the names of the files (as I want to name my plots after the input file)
Read in all files in my folder (these are all .csv files)
Plot a diagram for each .csv file by plotting groundwater level against the year
--> these plots should then get the title of the input file and also be stored under the same name.
For example, when my file is called 211210.csv, the title should be 211210 and the plot should be stored as 211210.png
This is the code I have so far. As I said, I am new to R, and I have tried to solve many problems I had in the code, but I still run into new errors. Is there someone who can explain to me where the problem is and how to solve it?
library(fs)
library(ggplot2)
library(tidyverse)
#Opening path to my data
filepath <- fs::dir_ls("D:/Desktop/Masterarbeit/Daten/Test/")
# Get name of files
name <- basename(filepath)
#Read every single files
file_content <- list()
for (i in seq_along(filepath)){
  path <- filepath
  file_content[[i]] <- read.csv(
    file = filepath[[i]], header = TRUE
  )
}
file_content <- set_names(file_content, filepath)
#Plot the diagram with gwl against year for each file, title = name of each file and store it in a separate folder with the name of the input file
for (i in file_content){
  mypath <- file.path("D:/Desktop/Masterarbeit/Daten/Results/", paste("Messstelle_", name[[i]], ".png", sep = ""))
  png(file=mypath)
  mytitle = paste("Messstelle", name[[i]])
  plot(i$year, i$gwl,
       pch = 19, #--> solid circle
       cex = 1.5, #--> make 150% size
       main = name[[i]],
       xlab = "Year",
       ylab = "Ground water level",
  )
  dev.off()
}
First, I would prefer doing everything in one loop for efficiency. Second, I would avoid using unnecessary packages, e.g. fs (base R has a good list.files function to list all files in a folder). Third, I would iterate through the names of the files and not through a numeric vector, e.g.:
filepath <- "D:/Desktop/Masterarbeit/Daten/Test/"
# pattern is a regular expression: match file names ending in ".csv"
files <- list.files(filepath, pattern = "\\.csv$")
# Iterate through every single file
for (file in files){
  name2store <- strsplit(file, "[.]")[[1]][1]   # file name without the extension
  path2read <- file.path(filepath, file)
  data <- read.csv(file = path2read, header = TRUE)
  mypath <- file.path("D:/Desktop/Masterarbeit/Daten/Results/", paste("Messstelle_", name2store, ".png", sep = ""))
  png(file = mypath)
  plot(data$year, data$gwl,
       pch = 19,  #--> solid circle
       cex = 1.5, #--> make 150% size
       main = name2store,   # title = name of the input file
       xlab = "Year",
       ylab = "Ground water level")
  dev.off()
}
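Two details may be worth checking when adapting this loop: the pattern argument of list.files() is a regular expression, so a bare ".csv" matches more than files ending in .csv, and tools::file_path_sans_ext() is a base R alternative to splitting the name on dots. A small sketch with throwaway files in a temporary folder (demo_dir and the file names are made up for the illustration):

```r
demo_dir <- file.path(tempdir(), "gwl_demo")   # illustrative folder
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("211210.csv", "notes_csv.txt"))))

loose  <- list.files(demo_dir, pattern = ".csv")      # "." matches any character
strict <- list.files(demo_dir, pattern = "\\.csv$")   # literal ".csv" at the end only

stems <- tools::file_path_sans_ext(strict)            # file name without extension
out   <- file.path(demo_dir, paste0("Messstelle_", stems, ".png"))

loose    # both files, because "_csv" also matches ".csv"
strict   # only the real CSV file
out      # output path built from the input file name
```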
My data (TransDat70) contains 103 variables in total. The first 102 are named "V1" through "V102"; the last variable is named "Time.Min".
I need to generate 102 ggplots, one for each variable (V1 through V102) against the variable "Time.Min". I then need to save all these ggplots in a separate file (PDF), preferably all next to/below one another for comparison purposes.
I tried using code that I was able to find online, but none of it has worked for me so far.
Here is my code:
var_list = combn(names(TransDat70)[1: 102], 2, simplify = FALSE)
plot_list = list()
for (i in 1: 3) {
  p = ggplot(TransDat70, aes_string(x = var_list[[i]][1], y = var_list[[i]][2])) + geom_point()
  plot_list[[i]] = p
}
for (i in 1: 3) {
  plot70 = paste("iris_plot_", i, ".tiff", sep = "")
  tiff(plot70)
  print(plot_list[[i]])
  dev.off()
}
pdf("plots.pdf")
for (i in 1: 3) {
  print(plot_list[[i]])
}
dev.off()
Any suggestions?
If by separate you meant each plot in a separate file, how about this?
library(ggplot2)
# FAKE DATA AS EXAMPLE
TransDat70 <- data.frame(
1:10,
1:10,
1:10,
1:10,
1:10
)
colnames(TransDat70) <- c('V1', 'V2', 'V3', 'V4', 'Time.Min')
for (i in 1:(length(TransDat70) - 1)) {
  p <- ggplot(TransDat70, aes_string(x = paste('V', toString(i), sep=''), y='Time.Min')) + geom_point()
  ggsave(paste('~/Desktop/plot_', i, '.pdf', sep=''), p)
}
See the ggsave documentation for more options.
If you meant to have them all in one big file, take a look at Printing multiple ggplots into a single pdf, multiple plots per page.
However, that many plots would likely make a huge file, which could be problematic to open, especially if you have many points in your plots. In that case it might be better to compare them as separate files.
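For the single-file case, a minimal sketch is to open one pdf() device and print each plot inside the loop, which gives one page per variable rather than several plots per page. The fake data, the out_file path and the choice to put Time.Min on the x axis are assumptions of this sketch. It uses the .data pronoun instead of aes_string(), which recent ggplot2 versions mark as deprecated:

```r
library(ggplot2)

# Fake data with the same shape as the example (fewer columns)
TransDat70 <- data.frame(V1 = 1:10, V2 = (1:10)^2, V3 = 10:1, Time.Min = 1:10)
vars <- setdiff(names(TransDat70), "Time.Min")   # every column except the time axis

out_file <- file.path(tempdir(), "all_plots.pdf")   # illustrative path
pdf(out_file, onefile = TRUE)
for (v in vars) {
  p <- ggplot(TransDat70, aes(x = Time.Min, y = .data[[v]])) +
    geom_point() +
    labs(y = v)
  print(p)   # explicit print() is needed inside a loop; adds one page
}
invisible(dev.off())
```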
I am almost a beginner in R, so please forgive me if I sound stupid. Here is my situation:
I simulated 960 different reply patterns for a 10-item test. They are stored in my directory in .txt format as pairs, so there are 480 pairs of text files. They are named like x_a_b_c_d or y_a_b_c_d, where a, b, c and d are numbers: a is between 1 and 3, b and c are between 1 and 4, and d is between 1 and 10. I need to read each pair from the directory, convert the two files into frequency tables and equate them. I can do this one by one:
First I read a pair from the directory and turn both files into frequency tables with the freqtab() function (because equate() only works with them).
path1<-"directory//x_1_1_1_1.txt"
x1<-(read.table(path1, header=TRUE))
ftx1<-freqtab(x1, items = list(1:10, 9:10), scales = list(0:10, 0:2))
path2<-"directory//y_1_1_1_1.txt"
y1<-(read.table(path2, header=TRUE))
fty1<-freqtab(y1, items = list(1:10, 9:10), scales = list(0:10, 0:2))
Then I equate them as in:
eq1<- equate(ftx1, fty1, type="linear", method="levine", ws=1)$conc$yx
However, I need to do that for all of the pairs, one by one.
So is there any way that I can read the .txt files as pairs and equate them in one function?
I don't know equate, so I'll answer from a generic data-processing standpoint.
library(equate)  # provides freqtab() and equate()
xfiles <- list.files(path = "some_directory", pattern = "^x.*", full.names = TRUE)
# xfiles are full paths, so swap the leading "x" of the file name, not of the path
yfiles <- file.path(dirname(xfiles), sub("^x", "y", basename(xfiles)))
bothexist <- file.exists(xfiles) & file.exists(yfiles)
xdata <- lapply(xfiles[bothexist], function(fn) {
  freqtab(read.table(fn, header = TRUE), items = list(1:10, 9:10), scales = list(0:10, 0:2))
})
ydata <- lapply(yfiles[bothexist], function(fn) {
  freqtab(read.table(fn, header = TRUE), items = list(1:10, 9:10), scales = list(0:10, 0:2))
})
# Map() walks both lists in parallel, so each x table is equated with its y partner
eq <- Map(function(x,y) equate(x, y, type="linear", method="levine", ws=1)$conc$yx,
          xdata, ydata)
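The pairing logic can be tried without the equate package. In the sketch below, process_pair is a hypothetical stand-in for the freqtab() plus equate() step, and the files are empty placeholders in a temporary folder. Because full.names = TRUE returns full paths, the x to y swap is applied to basename() only:

```r
pair_dir <- file.path(tempdir(), "pairs_demo")   # illustrative folder
dir.create(pair_dir, showWarnings = FALSE)
invisible(file.create(file.path(pair_dir, c("x_1_1_1_1.txt", "y_1_1_1_1.txt",
                                            "x_1_1_1_2.txt"))))  # last x has no partner

xfiles <- list.files(pair_dir, pattern = "^x_.*\\.txt$", full.names = TRUE)
yfiles <- file.path(dirname(xfiles), sub("^x", "y", basename(xfiles)))
keep   <- file.exists(yfiles)                    # only pairs where the y file exists

# Hypothetical stand-in for reading both files and equating them
process_pair <- function(x, y) paste(basename(x), "<->", basename(y))

result <- Map(process_pair, xfiles[keep], yfiles[keep])
unname(unlist(result))
```

Map() returns one element per pair, so the result for a given x file can be looked up by its path.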