How to print pdf from a function - r

I have a series of tables and graphs that are produced from a list in R. I would like to create a pdf for each iteration of the list. I have tried simply using pdf() within the function, but I get the error that too many graphic devices are open. How can I do this (and name the files according to the list element name)?
So far I have tried:
ReportPDF<-function(x){
pdf(paste(name(x),"~\\Myfile.pdf")
tb<-table(x$acolumn)
print(fb)
}
lapply(mylist,ReportPDF)
I can't quite work out how to attach the name of the list element to the filename, and I'm not even sure this is the best way to create a pdf from lapply.

Can you clean some of this up?
Please give a more specific example of the object you're passing to ReportPDF(). I would expect a plot object, rather than what appears to be a data frame that you are selecting a column from.
The function example has some errors too, did you mean this?
ReportPDF<-function(x){
pdf(paste(names(x),"Myfile.pdf"))
tb<-table(x$acolumn)
print(tb)
dev.off()
}
lapply(mylist,ReportPDF)
I believe I've done something similar before and can update this answer when I get the other information.
Here's an update making some assumptions about your objects. It uses a for loop as lmo suggests, but I think a more elegant method must exist. I'm using the for loop because lapply passes only the object stored in each element of the list, with no reference to the name of that element in the list, and the name is what you need to name the files individually. Note the difference between calling mylist[i] and mylist[[i]], which is part of what's breaking the code in your example. In your code, names(x) will get the names of the columns within x, rather than the name of x as it is inside of mylist, which is what you want.
x <- data.frame(acolumn = rnorm(10))
q<- data.frame(acolumn = rnorm(10))
mylist <- list(a = x,b = q)
for(i in seq_along(mylist) ){
filename <- paste(names(mylist[i]),'-myFile.pdf', sep = "")
pdf(filename)
plot(mylist[[i]]$acolumn)
dev.off()
}
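A possible alternative to the for loop, sketched under the same assumptions about mylist: iterate over names(mylist) with an apply-style function, so each call can see both the element's name and its contents. The names report_pdf and out_dir are made up for illustration, and tempdir() is used only so the sketch runs anywhere.

```r
# Toy data in the same shape as the example above (assumed structure)
mylist <- list(a = data.frame(acolumn = rnorm(10)),
               b = data.frame(acolumn = rnorm(10)))

# Hypothetical output directory; tempdir() keeps the sketch self-contained
out_dir <- tempdir()

# Hypothetical helper: takes the NAME of a list element, not the element itself
report_pdf <- function(nm) {
  filename <- file.path(out_dir, paste0(nm, "-myFile.pdf"))
  pdf(filename)
  on.exit(dev.off())          # close the device even if plotting fails
  plot(mylist[[nm]]$acolumn)
  invisible(filename)
}

# One PDF per list element; 'written' holds the paths that were created
written <- vapply(names(mylist), report_pdf, character(1))
```

Closing the device with on.exit() inside the function is what avoids the "too many open devices" error from the question, since every pdf() call is paired with a dev.off().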

Related

Adding columns to data frame via user-defined function

I am trying to add columns to several dataframes. I am trying to create a function that will add the columns and then I would want to use that function with lapply over a list of object. The function is currently just adding empty columns to the data frame. But, if I solve the problem below, I would like to add to it to automatically populate the new columns (and still keeping the initial name of the object).
This is the code I have so far:
AAA_Metadata <- data.frame(AAA_Code=character(),AAA_REV4=character(),AAA_CONCEPT=character(),AAA_Unit=character(),AAA_Date=character(),AAA_Vintage=character())
add_empty_metadata <- function(x) {
temp_dataframe <- setNames(data.frame(matrix(ncol=length(AAA_Metadata),nrow=nrow(x))),as.list(colnames(AAA_Metadata)))
x <- cbind(temp_dataframe,x)
}
However when I run this
a <- data.frame(matrix(ncol=6,nrow=100))
add_empty_metadata(a)
and look at the Global Environment
object "a" still has 6 columns instead of 12.
I understand that I am actually working on a copy of "a" within the function (based on the other topics I checked, e.g. Update data frame via function doesn't work). So I tried:
x <<- cbind(temp_dataframe,x)
and
x <- cbind(temp_dataframe,x)
assign('x',x, envir=.GlobalEnv)
But none of those work. I want to have the new a in the Global Environment for future reference and keep the name 'a' unchanged. Any idea what I am doing wrong here?
Is this what you're looking for:
addCol <- function(x, newColNames){
for(i in newColNames){
x[,i] <- NA
}
return(x)
}
a <- data.frame(matrix(ncol=6,nrow=100));dim(a)
a <- addCol(a, newColNames = names(AAA_Metadata));dim(a)
An amazing source for this kind of stuff is Advanced R by Hadley Wickham, which is also available as a website.
From the caller's point of view, R objects like data frames are effectively immutable: they are not changed in place, but copied, modified and rebound to the same name. a is never changed by the function. It is only used as an input, and unless the resulting object is returned and reassigned, the version inside the function (which lives in a separate environment) is discarded when the function completes.
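To illustrate the return-and-reassign pattern, here is a minimal sketch that also covers the lapply use case from the question. The function add_cols, the list frames and the vector meta_names are hypothetical; only the column names are taken from the question.

```r
# Hypothetical helper: returns a modified COPY of x with extra NA columns
add_cols <- function(x, new_col_names) {
  x[new_col_names] <- NA   # adds all new columns in one assignment
  x                        # the caller must capture this return value
}

meta_names <- c("AAA_Code", "AAA_REV4", "AAA_CONCEPT")  # subset of the question's names

a <- data.frame(matrix(ncol = 6, nrow = 100))
add_cols(a, meta_names)        # result is discarded, so 'a' is unchanged
cols_before <- ncol(a)

a <- add_cols(a, meta_names)   # reassigning keeps the name 'a' and the new columns
cols_after <- ncol(a)

# The same function applied over a named list of data frames
frames <- list(a = data.frame(matrix(ncol = 6, nrow = 100)),
               b = data.frame(matrix(ncol = 6, nrow = 50)))
frames <- lapply(frames, add_cols, new_col_names = meta_names)
```

Keeping the data frames in a named list means the names survive the lapply call, so no assignment into the global environment is needed.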

Looping and adding to a dataframe in R with a pdf extractor

I want to read in a pdf via pdf tools, extract some data from it and write it to a csv. I have been able to do this successfully for one pdf, but I have many (440) to do. I'm trying to write a loop that goes through a list I have created that has all my file paths in it. The problem is it overwrites every time. So I think my program is doing what I've asked of it, but I am not asking the correct thing! My code is below:
temp <-as.list(list.files(pattern = "*.pdf"))
file_path <- file.path(getwd(),temp)%>%as.list()
data_anz<-as.character()
for (i in 1:length(file_path)){
data_anz<-pdf_text(file_path[[i]])[2]%>%str_split("\n")%>%.[[1]]%>%str_split_fixed("\\s{2,}", n=4)%>%as.data.frame(i:length(file_path))%>%rename(Commodity =V1, Level = V2, Change = V3, Description = V4)
}
What I would like to achieve is a data frame that grows with every iteration instead of being overwritten. So after the first run the df has 1 row and 4 cols, after the next run 2 rows, etc.
I'm lost! But I can get it to work for an individual member of the list and it seems to work through the whole list, but overwrites.
Any help would be super appreciated!
Each iteration of the loop is assigning your table to the same variable. You might want to try something like
data_anz<-list()
for (i in 1:length(file_path)){
data_anz[[i]] <- ...
}
data_anz_all <- do.call(rbind, data_anz)
which puts each table into its own position in a list, and then row-binds them all together at the end (assuming the columns of the individual frames are compatible).
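A runnable sketch of that pattern follows. Because the real PDFs are not available, parse_one() is a hypothetical stand-in for the pdf_text() pipeline in the question, and the file paths are made up.

```r
# Hypothetical stand-in for the pdf_text() pipeline: one 1-row data frame per file
parse_one <- function(path) {
  data.frame(Commodity   = basename(path),
             Level       = NA_character_,
             Change      = NA_character_,
             Description = NA_character_,
             stringsAsFactors = FALSE)
}

file_path <- c("reports/a.pdf", "reports/b.pdf", "reports/c.pdf")  # made-up paths

data_anz <- vector("list", length(file_path))   # one slot per file
for (i in seq_along(file_path)) {
  data_anz[[i]] <- parse_one(file_path[[i]])    # store, do not overwrite
}

# Row-bind all per-file tables into a single data frame
data_anz_all <- do.call(rbind, data_anz)
```

Collecting the pieces in a list and binding once at the end is usually cheaper than calling rbind inside the loop, which copies the growing data frame on every iteration.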

How to create an object by adding a variable to a fixed value?

I am trying to write a program to open a large amount of files and run them through a function I made called "sort". Every one of my file names starts with "sa1", however after that the characters vary based on the file. I was hoping to do something along the lines of this:
for(x in c("Put","Characters","which","Vary","by","File","here")){
sa1+x <- read.csv("filepath/sa1+x",header= FALSE)
sa1+x=sort(sa1+x)
return(sa1+x)
}
In this case, say that x was 88. It would open the file sa188, name that dataframe sa188, and then run it through the function sort. I don't think that writing sa1+x is the correct way to bind together two values, but I don't know how else to do it.
You need to use a list to contain the data in each csv file, and loop over the filenames using paste0.
file_suffixes <- c("put","characters","which","vary","by","file","here")
numfiles <- length(file_suffixes)
list_data <- list()
sorted_data <- list()
filename <- "filepath/sa1"
for (x in 1:numfiles) {
list_data[[x]] <- read.csv(paste0(filename, file_suffixes[x]), header=FALSE)
sorted_data[[x]] <- sort(list_data[[x]])
}
I am not sure why you use return in that loop. If you're writing a function, you should be returning the sorted_data list which contains all your post-sorting data.
Note: you shouldn't call your function sort because there is already a base R function called sort.
Additional note: you can use dir() and regex parsing to find all the files which start with "sa1" and loop over all of them, thus freeing you from having to specify the file_suffixes.
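The file-discovery idea from the last note might look like the sketch below. It writes two small files into a temporary folder so that it runs on its own; data_dir and sort_rows are hypothetical names, and sort_rows merely stands in for the asker's own sorting function.

```r
# Self-contained demo data: two small CSV files in a temporary folder
data_dir <- file.path(tempdir(), "sa1_demo")   # hypothetical folder
dir.create(data_dir, showWarnings = FALSE)
write.table(data.frame(3:1), file.path(data_dir, "sa188"), sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(6:4), file.path(data_dir, "sa1xy"), sep = ",",
            row.names = FALSE, col.names = FALSE)

# Find every file whose name starts with "sa1"
files <- list.files(data_dir, pattern = "^sa1", full.names = TRUE)

# Read each file into a list named after the file it came from
list_data <- lapply(files, read.csv, header = FALSE)
names(list_data) <- basename(files)

# Hypothetical replacement for the asker's "sort": order rows by the first column
sort_rows <- function(df) df[order(df[[1]]), , drop = FALSE]
sorted_data <- lapply(list_data, sort_rows)
```

Each data frame is then available by file name, for example sorted_data[["sa188"]], which replaces the need for separately named variables such as sa188.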

lapply and the save function in R

I have created a list of objects in my work environment
data <- c("variable1", "variable2", "variable3")
I would like to save the files to different directories, with the variable name as the directory. So I did this to give me a list of file names to pass to the save function via lapply:
paste0(data,"/",data,".rda")
lapply(data,FUN=save,file = paste0(data,"/",data,".rda"))
I get the error
Error in FUN(X[[i]], ...) : object ‘X[[i]]’ not found
I'm not sure what I'm doing wrong here.
Do you have a list of objects, or a list of names of objects? You say you have the former, but the code you give is for the latter.
Also, if you only have one object per file, then it's better to use the saveRDS function (and readRDS to load it).
lapply(data, function(x) saveRDS(get(x), paste0(x, "/", x, ".rds")))
If you have to use save:
lapply(data, function(x) save(list=x, file=paste0(x, "/", x, ".rda")))
Several things going on here.
First, you need not use lapply when you don't care about the return value of the function called at each iteration. It offers nothing in this case.
Second, and more importantly, what you are doing is writing objects to files with names derived from their variable names in R. That's an anti-pattern.
Instead, create a list of the objects, and use a for loop for the work. We need to use saveRDS for this (thanks Hong Ooi) as l[[n]] is also not the name of an object in the environment.
l <- list(variable1 = variable1, variable2 = variable2, variable3=variable3)
for (n in names(l)) {
fname <- paste0(n, '/', n, '.rds')
saveRDS(l[[n]], file=fname)
}
It would be better to just save the entire list, but then all the data would be in one file in one directory.
As for what's actually wrong with your code:
You pass the same value for file to all invocations of save, and you don't intend to do so. This value is a vector, but what you want is that each iteration gets one element from this vector.
The way lapply computes the value to pass to the function confuses save. In particular, save does this:
names <- as.character(substitute(list(...)))[-1L]
That results in something like the following, which is not the name of an object in the environment.
c("variable1", "variable2", "variable3")[[1]]
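One way to give each iteration its own object and its own file path is to walk over both in parallel with Map. This is only a sketch: the three variables hold placeholder values, base_dir is a hypothetical root folder (tempdir() here), and saveRDS is used instead of save, as suggested above.

```r
# Placeholder objects standing in for the real variables
variable1 <- 1:3
variable2 <- letters
variable3 <- data.frame(x = 1)
data <- c("variable1", "variable2", "variable3")

# Collect the objects into a named list once, instead of looking them up by name later
objs <- mget(data)

# One path per object: <base_dir>/<name>/<name>.rds
base_dir <- tempdir()   # hypothetical root folder
paths <- file.path(base_dir, data, paste0(data, ".rds"))

# Map pairs the i-th object with the i-th path
invisible(Map(function(obj, path) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(obj, file = path)
}, objs, paths))

# Reading one object back
restored <- readRDS(paths[1])
```

Unlike the original lapply call, no single call ever receives the whole vector of file names, and saveRDS never needs to know the variable's name.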

Pass variable name to a function and output inside of loop in R

There seem to be variations of this question, but none seem to address the situation of being in a loop AND naming an output file. This is how I thought it might work:
for(j in 1:3) {
for(k in 1:17){
extract_[j]km <- extract(RasterStack, SpatialPolygonsDataFrame_[j]km, layer=[k], nl=1, df=TRUE)
}
}
The extract function is from the raster package. I have already created a series of RasterStacks and SpatialPolygons and I want to pass these to a function ("extract") that has several parameters, some of which I wish to manipulate through the loop, and label the output accordingly. This is a breeze in BASH, but I can't figure this out in R.
Ultimately, I'd like to pass strings as well, but another post seems to show the way there.
EDIT: I originally posted the above function as being a single dataframe, when in fact, they are specified objects from the raster package (which are ultimately dataframes).
As Justin points out, working with a list is more in line with R's structure than messing up the workspace with lots of named variables. When you have a lot of objects in the workspace, it quickly becomes challenging to "know" what's next.
Your way:
for(j in 1:3) {
assign(
paste("extract",j,"km",sep=""), # or paste0 to avoid need for sep=""
your_function( # placeholder name; a bare "function(" here would be a syntax error
get(
paste("data",j,"km",sep="")
)
)
)
}
Personally, I prefer working with lists, so below, I convert your data objects to a list and show you how to run a function on all elements of that list. Working in this way usually removes the need to use strings in the "get" and "assign" fashion.
# just converting your variables to a list
data.list <- mget(grep("data",ls(),value=TRUE),envir=.GlobalEnv)
# then output results
result.list <- lapply(data.list,your_function)
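For the nested loop in the question (three inputs, 17 layers each), the list approach could be extended roughly as follows. Everything here is a stand-in: inputs holds plain integer vectors rather than spatial objects, and your_function is a toy replacement for extract(), so that the sketch runs without the raster package.

```r
# Stand-ins for the three "*km" inputs; in the question these would be spatial objects
inputs <- list(`1km` = 1:5, `2km` = 6:10, `3km` = 11:15)

# Toy replacement for extract(): takes an input and a layer index
your_function <- function(x, layer) sum(x) * layer

layers <- 1:17

# Outer lapply walks over the inputs, inner lapply over the layers;
# names are attached to the results instead of being built with assign()
results <- lapply(inputs, function(x) {
  out <- lapply(layers, function(k) your_function(x, layer = k))
  names(out) <- paste0("layer", layers)
  out
})

one_result <- results[["2km"]][["layer3"]]
```

The labels that the question tried to encode in variable names (extract_[j]km) become list names, so any single result can be retrieved with two indices.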
