rePost: Generating names iteratively in R for storing plots (2009) - r

A previous post provided a solution to iteratively store plots in R: see ... iteratively in R... . I had a similar problem and after reading and implementing the solutions provided by the post I am still unable to solve my problem.
The previous post provided the following code:
# Create a list to hold the plot objects.
pltList <- list()
for( i in 2:15 ){
# Get data, perform analysis, ect.
# Create plot name.
pltName <- paste( 'a', i, sep = '' )
# Store a plot in the list using the name as an index.
pltList[[ pltName ]] <- plot()
}
The following is my code implementation:
a <- list.files("F:.../4hrs", pattern='.csv')
pltList <- list()
i=1
for (x in a) {
myfiles <- read.csv(a, header=TRUE, as.is=TRUE, nrows=2500)
h <- hist(data, plot=F)
# perform analysis, ect.
pltName <- paste('a', formatC(i, width=2, flag='0'), sep='')
pltList[[ pltName ]] <- plot(h)
i <- i+1
}
pltName does produce a list of names but pltList is of length zero.
I am not sure why pltList is not being assigned the plots.
What I eventually want to do is create a pltList with multiple plots contained therein. Then plot those plots in par(mfrow=c(2,1)) style and export as a .pdf.
I should mention that the above works for
pltList[[ pltName ]] <- xyplot(h)
but then I am unable to plot multiple plots in the style of par(mfrow=c(2,1)).
Any suggestions are appreciated.

In the original question you referenced and my answer to it, plot() was used as an abstract placeholder for a plotting function that returned an object and not a literal call to the R function plot. Most functions that return graphics objects are based on the 'grid' package such as xyplot from the lattice package or qplot from ggplot2.
Sorry for the confusion, I should have made this point clear in my answer, but I did not as the asker of the question was already aware of it.
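Base graphics functions such as hist and plot render directly to the output device and do not return a graphics object that can be redrawn later the way a lattice or ggplot2 object can. In your loop, plot(h) returns NULL, and assigning NULL to a list element adds nothing, which is why you are ending up with a list of length zero.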
In order to use this approach you will have to trade usage of the hist function for something that returns an object. Using ggplot2, you could do something like the following in your loop:
# Don't call your data variable 'data' or ggplot will confuse it with the
# `data` function and suffer an error.
h <- qplot(x = plot_data)
pltName <- paste('a', formatC(i, width=2, flag='0'), sep='')
pltList[[ pltName ]] <- h
I have edited my answer to the previous question to make it clear that the use of plot() in my example is not an actual call to the R function of the same name.
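If you would rather stay with base graphics and par(mfrow=c(2,1)), one possible alternative (a sketch, not part of the original answer) is to store the objects returned by hist(plot = FALSE) and draw them later. The simulated `samples` list, the `histList` name, and the output file name below are assumptions made for illustration.

```r
# Hypothetical stand-in data: three numeric vectors instead of CSV files.
set.seed(1)
samples <- list(rnorm(100), runif(100), rexp(100))

histList <- list()
for (i in seq_along(samples)) {
  histName <- paste0("a", formatC(i, width = 2, flag = "0"))
  # hist(plot = FALSE) returns an object of class "histogram" without drawing.
  histList[[histName]] <- hist(samples[[i]], plot = FALSE)
}

# Draw the stored histograms two per page into a PDF in the temp directory.
out_file <- file.path(tempdir(), "histograms.pdf")
pdf(out_file)
par(mfrow = c(2, 1))
for (nm in names(histList)) {
  plot(histList[[nm]], main = nm)
}
dev.off()
```

The list holds the computed bins rather than rendered plots, so the drawing step can be repeated with a different layout or device.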

Your code uses files I don't have, so I can't replicate it. I am also not entirely sure what you are trying to accomplish, but I do see some problems in the code, and pointing them out might help fix it:
a <- list.files("F:.../4hrs", pattern='.csv')
I am not familiar with list.files. Is this correctly assigning a? '.csv' seems an odd pattern.
pltList <- list()
i=1
for (x in a) {
myfiles <- read.csv(a, header=TRUE, as.is=TRUE, nrows=2500)
Here I think a is a vector containing the filenames, right? You are looping x over every value of a, but I don't see x used anywhere in the code. You are also passing the whole vector of filenames to read.csv. Shouldn't this be read.csv(x, ...), or better yet, loop with for (i in 1:length(a)) and index a[i]?
h <- hist(data, plot=F)
I don't see the object data anywhere. Is h correctly assigned?
# perform analysis, ect.
pltName <- paste('a', formatC(i, width=2, flag='0'), sep='')
pltList[[ pltName ]] <- plot(h)
i <- i+1
}
What I like to do is simply run such a loop by hand and see what is going on. I think the problem lies in the assignment of myfiles or h.
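The points raised above (read one file per iteration, and pass actual data to hist) can be sketched as follows. This is only an illustration: the temporary directory, the generated CSV files, and the column name `value` are hypothetical stand-ins for the asker's files, which are not available.

```r
# Hypothetical setup: write two small CSV files to a temporary directory.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
set.seed(42)
for (k in 1:2) {
  write.csv(data.frame(value = rnorm(50)),
            file.path(csv_dir, paste0("file", k, ".csv")),
            row.names = FALSE)
}

# pattern is a regular expression: "\\.csv$" matches a literal ".csv" suffix.
# full.names = TRUE returns paths that read.csv can open directly.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

histList <- list()
for (i in seq_along(files)) {
  # Read one file per iteration, not the whole vector of names.
  plot_data <- read.csv(files[i], header = TRUE, as.is = TRUE, nrows = 2500)
  histName <- paste0("a", formatC(i, width = 2, flag = "0"))
  # Pass the data that was just read to hist().
  histList[[histName]] <- hist(plot_data$value, plot = FALSE)
}
names(histList)
```

Using seq_along(files) also gives the counter i for free, so no separate i <- i + 1 is needed.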

Related

R Lapply PointsOnLines thru a list of psp objects

Now that I have successfully, thanks to @Phil, coerced my SpatialLinesDataFrame into a list of lists of psp objects, I need to use the spatstat function pointsOnLines to create points along each list item (i.e. each line), and have the marks of the line transferred to each set of points.
I'm new to lapply, but seeing how it was used to convert the SpatialLinesDataFrame to a list of psp objects, I think it's appropriate to use it to apply the pointsOnLines function to each list? Alas, it isn't working for me. Help!
To continue the fylk example....
library("maptools")
library("rgdal")
library("spatstat")
base_dir <- system.file("shapes", package = "maptools")
fylk <- readOGR(base_dir, "fylk-val")
is(fylk)
out <- lapply(fylk@lines, function(i) { lapply(i@Lines, as.psp) })
out
dat <- fylk@data
for (i in seq_along(1:nrow(dat))) {
out[[i]] <- lapply(out[[i]], "marks<-", value = dat[i, , drop = FALSE])
}
for(i in seq_along(out)){
abc[[i]]<-(lapply(out[[i]],function(i){pointsOnLines(out[[i]],eps=10)}))
}
This doesn't work, and I can't troubleshoot why. I used [[]] because out is a list of lists?
Suggestions for the newbie?
Assuming the last lapply is your issue, consider a nested lapply to assign the result to abc:
abc <- lapply(out, function(o)
lapply(o, function(x) pointsOnLines(x, eps=10))
# EQUIVALENTLY:
# lapply(o, pointsOnLines, eps=10)
)
To plot the results as the docs describe, loop elementwise over the out and abc lists with Map (a wrapper around mapply).
One psp and ppp pair:
plot(out[[1]], main="")
plot(abc[[1]], add=TRUE, pch="+")
Multiple psp and ppp pairs:
proc_plot <- function(X, Y) {
plot(X, main="")
plot(Y, add=TRUE, pch="+")
}
result <- Map(proc_plot, out, abc)
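The nested lapply pattern from this answer can be tried without spatstat installed. The sketch below uses a toy list of lists of numeric vectors in place of `out`, and `sum` in place of pointsOnLines; both substitutions are assumptions made purely for illustration.

```r
# Toy stand-in for `out`: a list of lists of numeric vectors.
nested <- list(
  list(c(1, 2, 3), c(4, 5)),
  list(c(6, 7, 8, 9))
)

# The outer lapply walks the top-level list; the inner lapply walks each sub-list.
sums <- lapply(nested, function(inner) lapply(inner, sum))

# The structure of the result mirrors the structure of the input.
str(sums)
```

Because lapply returns its results, there is no need to create `abc` beforehand or to index back into the outer list from inside the inner function.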

Why doesn't this method of sorting raster objects work in R?

I'm working on a project that has large raster objects that are associated with variables and modified inside a function. I already sort the variables I need inside the function but I now want to return not just the sorted matrix of my variables but the raster associated with those values. I could run the function twice and return the second object I want in the second iteration but that seems terrible and a waste of time. I am very new to programming and R is my first language. This code below throws the same error as my more complicated function,
"Error in temp2[i, ] = t(as.matrix(temp)) :
incorrect number of subscripts on matrix "
Any advice would be very helpful, thank you.
require('raster')
r1 <- raster(nrows=108, ncols=21, xmn=0, xmx=10)
Test = function(x,y,z){
temp = matrix(NA,4,length(x))
temp2 = matrix(NA,4,length(x))
for(i in 1:length(x)){
temp=c(r1,x[i],y[i],z[i])
temp2[i,]=t(as.matrix(temp))
}
return(temp2)
}
x = c(1,2,3,4)
y = c(1,2,3,4)
z = c(1,2,3,4)
final answer = Test(x,y,z)
Your question is not clear, but here is how you can sort raster values:
library(raster)
s <- stack(system.file("external/rlogo.grd", package="raster"))
ss <- calc(s, sort, na.last=TRUE)
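On the part of the question about returning two things from one function call: a common R idiom is to return a named list. The sketch below is not taken from the answer and does not use the raster package; `sort_with_payload` is a hypothetical helper, and a plain list of strings stands in for the raster objects.

```r
# Hypothetical helper: sorts the rows of a matrix by its first column and
# returns the sorted matrix together with an associated object, reordered
# the same way, in a single named list.
sort_with_payload <- function(vals, payload) {
  ord <- order(vals[, 1])
  list(sorted = vals[ord, , drop = FALSE], payload = payload[ord])
}

vals <- cbind(key = c(3, 1, 2), other = c(30, 10, 20))
payload <- list("c", "a", "b")   # stands in for the raster objects

res <- sort_with_payload(vals, payload)
res$sorted[, "key"]
res$payload
```

The caller then picks out res$sorted and res$payload, so the function only has to run once.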

How to use extract function in a for loop?

I am using the extract function in a loop. See below.
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000<- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Where maj function is:
maj <- function(x){
y <- as.numeric(names(which.max(table(x))))
return(y)
}
I was expecting to get one output per iteration, but I get only one output once the loop is done. Does anybody know what I am doing wrong? Thanks.
One solution in this kind of situation is to create a list and then assign the result of each iteration to the corresponding element of the list:
LU_Mod2000 <- vector("list", length(list_shp_Tanzania))
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000[[i]] <- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Do not do
LU_Mod2000 <- c(LU_Mod2000, extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj))
inside the loop. This will create unnecessary copies and will take a long time to run. Use the list method and, after the loop, convert the list of results to the desired format (usually using do.call(<some function>, LU_Mod2000)).
Alternatively, you could replace the for loop with lapply, which is what many people seem to prefer:
LU_Mod2000 <- lapply(list_shp_Tanzania, function(z) extract(x=rc_Mod2000_LC, y=z, fun=maj))
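The preallocate-then-combine pattern described above can be tried with base R only. In this sketch `summarise_one` is a hypothetical stand-in for the extract call, and `inputs` stands in for list_shp_Tanzania.

```r
# Hypothetical stand-in for extract(): returns a small named vector per input.
summarise_one <- function(v) c(min = min(v), max = max(v))

inputs <- list(c(4, 8, 1), c(10, 3), c(7, 7, 2, 9))

# Preallocate the result list, then fill one element per iteration.
results <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  results[[i]] <- summarise_one(inputs[[i]])
}

# do.call takes the function first and the list of arguments second.
combined <- do.call(rbind, results)
combined
```

Here rbind stacks the per-iteration vectors into a matrix with one row per input; a different combining function may suit other result types better.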

Is this a valid implementation of a loop?

It seems like every question involving loops in R is met with "Loops are bad" and "You're doing it wrong" with advice to use list, or tapply or whatnot.
I'm learning R, and have implemented the following loop to create image files for each factor level, with the # of factor levels changing each time I run it:
for(i in unique(df$factor)) {
lnam <- paste("test_", i, sep="")
assign(lnam, subset(df, factor==i))
lfile <- paste(lnam, ".png", sep="")
png(file = lfile, bg="transparent")
with(get(lnam), hist(x, main = paste("Histogram of x for ", i, " factor", sep="")))
dev.off()
}
This works. I want to expand it to perhaps run various tests on those subgroups (also output to files), etc.
Is this a valid and legitimate use of loops? Or is there a preferred way to skin this cat?
There's nothing wrong with loops in general. Sometimes, particularly when you're working with files or calling functions for their side effects rather than their outputs, loops can be easier to follow than *apply calls. However, when you use a loop to simulate an operation that can be vectorised, it's often much slower, hence the recommendation to avoid them.
Re your specific example, though, I'd make the following comments:
If you want to do something for each level in a factor, it's more straightforward to use levels(factor) rather than unique(factor).
You don't need to create a new data frame specifically for each factor level.
With that in mind:
for(i in levels(df$factor))
{
lf <- paste("test_", i, ".png", sep="")
png(file=lf, bg="transparent")
with(subset(df, factor == i), hist(x, main = paste("Histogram of x for ", i, " factor", sep="")))
dev.off()
}
In this case, a reasonable option is to use split to convert your data frame into a list of data frames, each containing the subset of rows with a specific factor level.
split_df <- split(df, df$factor)
As Colin mentioned, paste can be vectorised, so you only need to call it once.
lfile <- paste("test_", names(split_df), ".png", sep = "")
Group all your plotting code into a function.
draw_and_save_histogram <- function(data, file)
{
png(file)
with(data, hist(x))
dev.off()
}
Now you can more easily compare the difference between a plain loop and an *apply function (in this case mapply, since we need two inputs).
for(i in seq_along(split_df))
{
draw_and_save_histogram(split_df[[i]], lfile[i])
}
mapply(
draw_and_save_histogram,
split_df,
lfile
)
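Put together, the split, paste and mapply steps above might look like the following sketch. The simulated `df`, the use of tempdir() for the output files, and the on.exit() call are assumptions added for illustration, not part of the answer.

```r
# Hypothetical data frame with a numeric column `x` and a factor column `factor`.
set.seed(123)
df <- data.frame(
  x = rnorm(60),
  factor = factor(rep(c("a", "b", "c"), each = 20))
)

split_df <- split(df, df$factor)
# Write into the session's temporary directory rather than the working directory.
lfile <- file.path(tempdir(), paste0("test_", names(split_df), ".png"))

draw_and_save_histogram <- function(data, file) {
  png(file)
  on.exit(dev.off())  # close the device even if hist() fails
  with(data, hist(x))
  invisible(file)
}

# mapply walks split_df and lfile in parallel, one pair per call.
invisible(mapply(draw_and_save_histogram, split_df, lfile))
file.exists(lfile)
```

Wrapping the call in invisible() only suppresses the printed return value; the function is called for its side effect of writing files.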
Rather than drawing lots of histograms and saving them in different files, it is usually preferable to draw one plot with several panels using lattice or ggplot2.
library(lattice)
histogram(~ x | factor, df)
library(ggplot2)
ggplot(df, aes(x)) + geom_histogram() + facet_wrap(~ factor)

plyr application, creating a list of matrices each of which corresponds to a subset of the data

With some help, I figured out how to transform an edgelist, aka, an adjacency list into an adjacency matrix. I want to learn how to automate this for a large number of edgelists and then put the resulting adjacency matrices in a list.
I'm guessing plyr is the best way to do this, but if you want to tell me how to do it with loops I'd be grateful for that as well. For the curious, the data represents social networks in different schools.
Here's what I've got so far:
# extract one school edgelist from the dataframe
aSchool <- myDF[which(myDF$school==1), c("school", "id", "x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")]
# figure out unique ids
edgeColumns <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")
ids <- unique(unlist(aSchool[edgeColumns]))
ids <- ids[!is.na(ids)]
# make an empty matrix
m <- matrix(0,nrow=length(ids),ncol=length(ids))
rownames(m) <- colnames(m) <- as.character(ids)
# fill in the matrix
for(col in edgeColumns){
theseEdges <- aSchool[c("id",col)]
theseEdges <- na.omit(theseEdges)
theseEdges <- apply(theseEdges,1,as.character)
theseEdges <- t(theseEdges)
m[theseEdges] <- m[theseEdges] + 1
}
for(i in 1:nrow(m)) m[i,i] <- 0
Check out the sna package and the as.edgelist.sna() and as.sociomatrix.sna() functions.
In particular, as.sociomatrix.sna() seems like the perfect solution here: it's designed to convert an edgelist to an adjacency matrix in a single step (without losing attributes such as vertex names, etc.). Wrap it all up in a call to lapply() and I think you've got yourself yet another (maybe less labor intensive?) solution.
If you'd like to see a more expressive answer, I think it would be helpful to either provide more complete sample data or a clearer description of exactly what is in myDF
Also, I don't have the reputation on SO to do so, but I would add some tags to this post to signal that it's about network analysis.
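Staying with base R, the lapply idea can also be applied to the question's own steps: wrap the single-school code in a function and run it once per school. The sketch below is an illustration under assumptions. The toy `myDF` has only two edge columns, `edgelist_to_adjacency` is a hypothetical name, and, unlike the question's code, the `id` column is included when collecting ids so that a student nobody names still gets a row.

```r
# Hypothetical toy data: two schools; each row is a student `id` naming friends.
myDF <- data.frame(
  school = c(1, 1, 1, 2, 2),
  id     = c(1, 2, 3, 10, 11),
  x1     = c(2, 1, 1, 11, 10),
  x2     = c(3, NA, NA, NA, NA)
)
edgeColumns <- c("x1", "x2")

# The question's single-school steps wrapped in a function.
edgelist_to_adjacency <- function(aSchool, edgeColumns) {
  ids <- unique(unlist(aSchool[c("id", edgeColumns)]))
  ids <- ids[!is.na(ids)]
  m <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(as.character(ids), as.character(ids)))
  for (col in edgeColumns) {
    theseEdges <- na.omit(aSchool[c("id", col)])
    if (nrow(theseEdges) == 0) next   # no edges recorded in this column
    # Two-column character matrix: indexes m by (row name, column name).
    idx <- cbind(as.character(theseEdges[[1]]), as.character(theseEdges[[2]]))
    m[idx] <- m[idx] + 1
  }
  diag(m) <- 0
  m
}

# One adjacency matrix per school, collected in a list named by school.
adjList <- lapply(split(myDF, myDF$school), edgelist_to_adjacency,
                  edgeColumns = edgeColumns)
adjList[["1"]]
```

split names the list elements after the school values, so each matrix can be looked up by school.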
It's hard to answer your question without a workable example. But if I understand your question correctly, here is a function that should work (it returns a list containing symmetric adjacency matrices):
makeADJs <- function(...)
{
require(plyr)
dfs <- list(...)
e2adj <- function(x)
{
IDs <- unique(c(as.matrix(x)))
df <- apply(x,2,match,IDs)
adj <- matrix(0,length(IDs),length(IDs))
colnames(adj) <- rownames(adj) <- IDs
a_ply(rbind(df,df[,2:1]),1,function(y){adj[y[1],y[2]] <<- 1})
return(adj)
}
llply(dfs,e2adj)
}
Example:
makeADJs(
cbind(letters[sample(1:26)],letters[sample(1:26)]),
cbind(letters[sample(1:26)],letters[sample(1:26)]),
cbind(letters[sample(1:26)],letters[sample(1:26)]),
cbind(letters[sample(1:26)],letters[sample(1:26)])
)
Edit:
Or without plyr:
makeADJs <- function(...)
{
dfs <- list(...)
e2adj <- function(x)
{
IDs <- unique(c(as.matrix(x)))
df <- apply(x,2,match,IDs)
adj <- matrix(0,length(IDs),length(IDs))
colnames(adj) <- rownames(adj) <- IDs
apply(rbind(df,df[,2:1]),1,function(y){adj[y[1],y[2]] <<- 1})
return(adj)
}
lapply(dfs,e2adj)
}
Edit2:
And to plot them all in a single pdf file:
library(plyr)
library(qgraph)
# adjs is assumed to be the list returned by makeADJs(...)
pdf("ADJplots.pdf")
l_ply(adjs,function(x)qgraph(x,labels=colnames(x)))
dev.off()
