How to transform a list into a plottable dataframe in R?

I have these two lists:
> FHM_CS
$X3
[1] 100
$X5
[1] 100
$X7
[1] 54.23706 63.48137 51.04026 60.14302 70.39396 56.59812 75.41480 88.26871 70.96976 54.20140 63.43252
[12] 51.00868 60.10348 70.33980 56.56310 75.36522 88.20079 70.92585
$X9
[1] 38.63259 27.74551 21.17788 100.00000 73.08030 55.78148 85.86148 38.56665 27.71148 21.15804
[11] 72.99067 55.72924 85.78107
$XAS
[1] 0
$XPW
[1] 49.07016 40.02288 23.87023 100.00000 89.30224 53.26115 69.98929 0.00000
and
> FHM_CD
$X3
[1] 14.8840750 17.7316138 6.1164435 0.0000000 1.1435141 14.8904265 17.7375474 6.1241709 1.1506441
[10] 14.6751282 17.5364297 5.8621689 0.9089743
$X5
[1] 74.41660 76.74417 80.95828 58.58119 62.34946 69.17199 57.25100 61.14029 68.18193 74.38872 76.72114
[12] 80.94284 58.53606 62.31217 69.14699 57.20442 61.10180 68.15613 74.34258 76.68302 80.91730 58.46136
[23] 62.25047 69.10565 57.12732 61.03811 68.11346
$X7
[1] 66.30768 60.56507 68.29355 49.37678 40.74842 52.36058 36.42026 25.58356 40.16773 66.32983 60.59541
[12] 68.31317 49.41006 40.79401 52.39005 36.46206 25.64082 40.20475
$X9
[1] 66.14771 75.68765 81.44262 55.01738 67.69397 75.34112 51.61251 65.24862 73.47461 66.20550 75.71747
[12] 81.46000 55.09417 67.73359 75.36421 51.69510 65.29125 73.49945
$XAS
[1] 25.62701 45.29201 44.51013 0.00000 17.99168 16.81963
$XPW
[1] 7.344758 24.428011 54.927770 0.000000 4.637824 43.124615 39.752560 100.000000
I would like to make a "clustered jittered plot" for every element of both lists, for example X3 from FHM_CS right next to X3 from FHM_CD, and so on for every element.
I was thinking of using qplot from ggplot2 with geom="jitter", but I would also like to add a horizontal bar in each cluster to show the mean of each list element.
It would be something like this, except I would like to add the mean of every element as a horizontal red bar (with its value if possible) and the clustering (e.g. FHM_CS in blue and FHM_CD in red).
So how do I convert those lists to a dataframe, and how do I plot this from there?

To create a data.frame, you can do it crudely like this:
df <- data.frame(values=unlist(FHM_CS, use.names=FALSE), tag=rep(names(FHM_CS), times=sapply(FHM_CS, length)))
But for use with ggplot2, we should merge everything into a single dataframe:
df.CS <- data.frame(values=unlist(FHM_CS, use.names=FALSE), tag=rep(names(FHM_CS), times=sapply(FHM_CS, length)), class='CS', stringsAsFactors=TRUE)
df.CD <- data.frame(values=unlist(FHM_CD, use.names=FALSE), tag=rep(names(FHM_CD), times=sapply(FHM_CD, length)), class='CD', stringsAsFactors=TRUE)
my.data <- rbind(df.CS, df.CD)
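For reference, the first few rows of my.data should then look roughly like this (taken from the values posted above):
head(my.data)
#      values tag class
# 1 100.00000  X3    CS
# 2 100.00000  X5    CS
# 3  54.23706  X7    CS
# 4  63.48137  X7    CS
# 5  51.04026  X7    CS
# 6  60.14302  X7    CS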
Edit: Alternatively, as Michele found, you can use melt:
library(reshape2)
df.CD <- data.frame(melt(FHM_CD), class='CD')
df.CS <- data.frame(melt(FHM_CS), class='CS')
## Except now, instead of `tag`, we have `L1`.
my.data <- rbind(df.CD, df.CS)
my.data$tag <- my.data$L1
End of edit
Then, to plot as you want (I was lazy and didn't enter much of the data):
library(ggplot2)
ggplot(my.data, aes(x=interaction(tag, class), y=values)) + geom_point(position=position_jitter())
Now let's try to add the horizontal bars. I would use facetting, which gives the following:
ggplot(my.data, aes(x=tag, y=values)) + geom_point(position=position_jitter()) + stat_summary(fun.y='mean', geom='errorbarh', aes(xmin=as.integer(tag)-0.3, xmax=as.integer(tag)+0.3), height=0) + facet_grid(.~class)
Edit 2
First manually create the interaction vector:
my.data$it <- with(my.data, interaction(tag, class, sep=' - ', lex.order=TRUE))
Then we plot as previously.
ggplot(my.data, aes(x=it, y=values)) + geom_point(position=position_jitter()) + stat_summary(fun.y='mean', geom='errorbarh', aes(xmin=as.integer(it)-0.3, xmax=as.integer(it)+0.3, colour=class), height=0)
Of course, you might want to adjust the arguments to position_jitter() to squeeze the points closer together.
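If you also want the mean value printed next to each red bar (as asked in the question), one way is to pre-compute the means and add a geom_text() layer; a minimal sketch, assuming my.data and it are constructed as above (the offsets and rounding are my own choices):
means <- aggregate(values ~ it, data = my.data, FUN = mean)   # one mean per tag/class combination
ggplot(my.data, aes(x = it, y = values)) +
  geom_point(position = position_jitter(width = 0.15)) +
  geom_errorbarh(data = means,
                 aes(y = values, xmin = as.integer(it) - 0.3, xmax = as.integer(it) + 0.3),
                 height = 0, colour = 'red') +
  geom_text(data = means, aes(y = values, label = round(values, 1)),
            vjust = -0.8, colour = 'red', size = 3)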

lst1 <- list("A", "B", "C")
lst2 <- list(rnorm(1000), rnorm(1000), rnorm(1000))
library(ggplot2)
library(reshape2)
df <- merge(melt(lst1, value.name="id"), melt(lst2), by="L1")
# L1 is a column added by the melt.list method; it holds each list item's index
> head(df)
L1 id value
1 1 A 2.0216986
2 1 A 1.4856589
3 1 A -0.2204599
4 1 A 0.6514056
5 1 A 0.3035737
6 1 A 0.8371660
qplot(id, value, data=df, geom="jitter")
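To also get a red mean bar per group (as the question asks), a stat_summary() layer can be bolted onto the same qplot call; a sketch using the older fun.y/fun.ymin/fun.ymax arguments that match the ggplot2 syntax used elsewhere on this page (newer versions spell them fun/fun.min/fun.max):
qplot(id, value, data = df, geom = "jitter") +
  stat_summary(fun.y = mean, fun.ymin = mean, fun.ymax = mean,
               geom = "crossbar", width = 0.5, colour = "red")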

Related

Find closest colors from grDevices::colors() to copy chart with ggplot

Say I have a chart such as this one, as an image:
I want to extract its colors and find the closest colors available in grDevices::colors(), which can be seen here:
head(grDevices::colors())
[1] "white" "aliceblue" "antiquewhite" "antiquewhite1" "antiquewhite2" "antiquewhite3"
The simplest output would be a vector of these colors.
A fancier output would be a data.frame with the real color codes, the "rounded" color (i.e. part of grDevices::colors()), the percentage of the image surface it covers, and the coordinates of the centers of gravity of the areas it covers.
A super fancy output would overlay these color names over the original chart, and/or build a new dot chart with dots placed at these center positions and the color names as text labels.
An ultra fancy output would propose the closest match among existing palettes.
tl;dr: get_named_colors("https://i.stack.imgur.com/zdyNO.png"), using the function defined at the bottom.
We will load the image into R and convert it to long RGB format, get the RGB values of the named colors and put them in the same format, then compute all pairwise distances and keep the minimum for each color in our image; from there we get our output.
library(ggplot2)
library(dplyr)
library(png)
Our candidates:
rgb_named_colors <- t(col2rgb(grDevices::colors())/255)
head(rgb_named_colors,3)
# red green blue
# [1,] 1.0000000 1.0000000 1.0000000
# [2,] 0.9411765 0.9725490 1.0000000
# [3,] 0.9803922 0.9215686 0.8431373
Our colors:
img <- readPNG("https://i.stack.imgur.com/zdyNO.png")
dim(img) # [1] 476 746 3
# it's a 3d matrix, let's convert it to long format
rgb_img <- apply(img,3,c)
colnames(rgb_img) <- c("red","green","blue")
head(rgb_img,3)
# red green blue
# [1,] 0.9803922 0.9803922 0.9803922
# [2,] 0.9803922 0.9803922 0.9803922
# [3,] 0.9803922 0.9803922 0.9803922
dim(unique(rgb_img)) # [1] 958 3
We have 958 colors, which is a bit much, so we filter out those with low occurrence counts;
we set the cutoff at 0.5% of the image's pixels.
rgb_img_agg <-
  rgb_img %>%
  as_tibble %>%
  group_by_all %>%
  count %>%
  filter(n > dim(img)[1] * dim(img)[2] * 0.5/100)
How did it work out?
dim(rgb_img_agg) # [1] 11 4
Much better.
head(rgb_img_agg,3)
# # A tibble: 3 x 4
# # Groups: red, green, blue [3]
# red green blue n
# <dbl> <dbl> <dbl> <int>
# 1 0.04705882 0.2627451 0.5137255 2381
# 2 0.27843137 0.5568627 0.7803922 29353
# 3 0.37254902 0.7450980 0.2549020 2170
For each of the image colors we compute the distance to every named color and keep the minimum:
output <- apply(rgb_img_agg[1:3], 1, function(row_img)
  grDevices::colors()[which.min(
    apply(rgb_named_colors, 1, function(row_named)
      dist(rbind(row_img, row_named))))])
output
# [1] "dodgerblue4" "steelblue3" "limegreen" "olivedrab" "gray80" "olivedrab1" "chocolate3" "chocolate1"
# [9] "ghostwhite" "gray98" "white"
It works! Now let's display all of our colors with a legend:
ggplot(tibble(named_color = output), aes(named_color, fill = factor(named_color, levels = output))) +
  geom_bar() +
  scale_fill_manual(values = output)
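As an aside (my own variation, not part of the original approach), the nested apply() calls can be replaced by a single distance matrix, which is usually faster on larger palettes:
d <- as.matrix(dist(rbind(as.matrix(rgb_img_agg[1:3]), rgb_named_colors)))
n <- nrow(rgb_img_agg)
# rows = image colors, columns = named colors; pick the nearest named color per row
output2 <- grDevices::colors()[apply(d[1:n, -(1:n), drop = FALSE], 1, which.min)]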
Now we put everything into a function:
get_named_colors <- function(path, cutoff = 0.5){
  library(dplyr)
  library(ggplot2)
  library(png)
  # named colors
  rgb_named_colors <- t(col2rgb(grDevices::colors())/255)
  # colors from path
  img <- readPNG(path)
  rgb_img <- apply(img, 3, c)
  colnames(rgb_img) <- c("red", "green", "blue")
  rgb_img_agg <-
    rgb_img %>%
    as_tibble %>%
    group_by_all %>%
    count %>%
    filter(n > dim(img)[1] * dim(img)[2] * cutoff/100)
  # distances
  output <- apply(rgb_img_agg[1:3], 1, function(row_img)
    grDevices::colors()[which.min(
      apply(rgb_named_colors, 1, function(row_named)
        dist(rbind(row_img, row_named))))])
  p <- ggplot(tibble(named_color = output),
              aes(named_color, fill = factor(named_color, levels = output))) +
    geom_bar() +
    scale_fill_manual(values = output)
  print(p)
  output
}
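One caveat: png::readPNG() reads a local file (or a raw vector), so if passing the URL directly does not work in your setup, download it first; a small usage sketch:
img_url <- "https://i.stack.imgur.com/zdyNO.png"
tmp <- tempfile(fileext = ".png")
download.file(img_url, tmp, mode = "wb")
get_named_colors(tmp)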
I might update this if I find out how to implement the fancier features.

Broken R code to select specific rows and cells in text file and put into data frame

This is an extension of this question, which needs to be altered to accommodate more Band rows in the text file. What I want is to select the "Basic Stats" rows from a text file that looks like the one below and then organize them in a data frame like the one at the bottom of the question. Here's a link to the file if you want to use it directly.
Filename: /blah/blah/blah.txt
ROI: red_2 [Red] 12 points
Basic Stats Min Max Mean Stdev
Band 1 0.032262 0.124425 0.078073 0.028031
Band 2 0.021072 0.064156 0.037923 0.012178
Band 3 0.013404 0.066043 0.036316 0.014787
Band 4 0.005162 0.055781 0.015526 0.013255
Histogram DN Npts Total Percent Acc Pct
Band 1 0.032262 1 1 8.3333 8.3333
Bin=0.00036 0.032624 0 1 0.0000 8.3333
0.032985 0 1 0.0000 8.3333
0.033346 0 1 0.0000 8.3333
This is the code I'm using:
dat <- readLines('/blah/blah/blah.txt')
# create an index for the lines that are needed: Basic stats and Bands
ti <- rep(which(grepl('ROI:', dat)), each = 8) + 1:8
# create a grouping vector of the same length
grp <- rep(1:203, each = 8)
# filter the text with the index 'ti'
# and split into a list with grouping variable 'grp'
lst <- split(dat[ti], grp)
# loop over the list and read the text parts in as dataframes
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', header = TRUE, blank.lines.skip = TRUE))
# bind the dataframes in the list together in one data.frame
DF <- do.call(rbind, lst)
# change the name of the first column
names(DF)[1] <- 'ROI'
# get the correct ROI's for the ROI-column
DF$ROI <- sub('.*: (\\w+).*$', '\\1', dat[grepl('ROI: ', dat)])
DF
The output looks something like this:
$ROI
[1] "red_2" "red_3" "red_4" "red_5" "red_6" "red_7" "red_8" "red_9" "red_10" "bcs_1" "bcs_2"
[12] "bcs_3" "bcs_4" "bcs_5" "bcs_6" "bcs_7" "bcs_8" "bcs_9" "bcs_10" "red_11" "red_12" "red_12"
[23] "red_13" "red_14" "red_15" "red_16" "red_17" "red_18" "red_19" "red_20" "red_21" "red_22" "red_23"
[34] "red_24" "red_25" "red_24" "red_25" "red_26" "red_27" "red_28" "red_29" "red_30" "red_31" "red_33"
$<NA>
[1] "Basic Stats\t Min\t Max\t Mean\t Stdev"
$<NA>
[1] "Basic Stats\t Min\t Max\t Mean\t Stdev"
etc...
When it should look like this:
ROI Band Min Max Mean Stdev
red_2 Band 1 0.032262 0.124425 0.078073 0.028031
red_2 Band 2 0.021072 0.064156 0.037923 0.012178
red_2 Band 3 0.013404 0.066043 0.036316 0.014787
red_2 Band 4 0.005162 0.055781 0.015526 0.013255
red_3 Band 1 values...
red_4 Band 2
red_4 Band 3
red_4 Band 4
I would like some help.
For this file you will have to adapt the approach I proposed here. For the linked text file (test2.txt) I propose the following:
dat <- readLines('test2.txt')
len <- sum(grepl('ROI:', dat))
ti <- rep(which(grepl('ROI:', dat)), each = 7) + 0:6
grp <- rep(1:len, each = 7)
lst <- split(dat[ti], grp)
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', skip = 1, header = TRUE, blank.lines.skip = TRUE))
names(lst) <- sub('.*: (\\w+).*$', '\\1', dat[grepl('ROI: ', dat)])
library(data.table)
DT <- rbindlist(lst, idcol = 'ROI')
setnames(DT, 2, 'Band')
which gives the desired result:
> DT
ROI Band Min Max Mean Stdev
1: red_1 Band 1 0.013282 0.133982 0.061581 0.034069
2: red_1 Band 2 0.009866 0.112935 0.042688 0.026618
3: red_1 Band 3 0.008304 0.037059 0.018434 0.007515
4: red_1 Band 4 0.004726 0.040089 0.018490 0.009605
5: red_2 Band 1 0.032262 0.124425 0.078073 0.028031
---
1220: bcs_49 Band 4 0.002578 0.010578 0.006191 0.002285
1221: bcs_50 Band 1 0.032775 0.072881 0.051152 0.012593
1222: bcs_50 Band 2 0.020029 0.085993 0.042864 0.018628
1223: bcs_50 Band 3 0.012770 0.034367 0.023056 0.006581
1224: bcs_50 Band 4 0.005804 0.024798 0.014049 0.005744
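If you prefer to stay in base R rather than data.table, roughly the same table can be built from the same lst (a sketch, not part of the original answer):
DF <- do.call(rbind, Map(cbind, ROI = names(lst), lst))  # prepend each ROI name to its block of bands
names(DF)[2] <- 'Band'
rownames(DF) <- NULL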

cbind 1:nrows of same ID variable value to original data.frame

I have a large dataframe where a variable id (first column) recurs with different values in the second column. My idea is to order the dataframe, split it into a list, and then lapply a function which cbinds the sequence 1:nrow to each id group. My code so far:
DF <- DF[order(DF[,1]),]
DF <- split(DF,DF[,1])
DF <- lapply(1:length(DF), function(i) cbind(DF[[i]], 1:length(DF[[i]])))
But this gives me an error: arguments imply different number of rows.
Can you elaborate?
> head(DF, n=50)
cell area
1 1 121.2130
2 2 81.3555
3 3 81.5862
4 4 83.6345
...
33 1 121.3270
34 2 80.7832
35 3 81.1816
36 4 83.3340
DF <- DF[order(DF$cell),]
What I want is:
> head(DF, n=50)
cell area counter
1 1 121.213 1
33 1 121.327 2
65 1 122.171 3
97 1 122.913 4
129 1 123.697 5
161 1 124.474 6
...and so on.
This is my code:
cell.areas.t <- function(file) {
  dat = paste(file)
  DF <- read.table(dat, col.names = c("cell", "area"))
  DF <- splitstackshape::getanID(DF, "cell")[]  # thanks to akrun's answer
  ggplot2::ggplot(data = DF, aes(x = .id, y = area, color = cell)) +
    geom_line(aes(group = cell)) + geom_point(size = 0.1)
}
And the plot looks like this:
Most cells increase in area, only some decrease. This is only a first try to visualize my data, so what you can't see very well is that the areas drop down periodically due to cell division.
Additional question:
There is a problem I didn't take into account beforehand: after a cell division a new cell is added to the data.frame and is handed the initial index 1 (you can see in the image that all cells start from .id=1, not later), which is not what I want; it needs to inherit the index of its creation time. The first thing that comes to mind is a parsing mechanism that does this job for a newly added cell variable:
DF$.id[DF$cell != temporary.cellindex] <- max(DF$.id[DF$cell != temporary.cellindex])
Do you have a better idea? Thanks.
There is a boundary condition which may ease the problem: fixed number of cells at the beginning (32). Another solution would be to cut away all data before the last daughter cell is created.
Update: Additional question solved, here's the code:
cell.areas.t <- function(file) {
  dat = paste(file)
  DF <- read.table(dat, col.names = c("cell", "area"))
  DF$.id <- c(0, cumsum(diff(DF$cell) < 0)) + 1L  # index increments whenever the cell number wraps back down, i.e. at each new time step
  title <- getwd()
  myplot <- ggplot2::ggplot(data = DF, aes(x = .id, y = area, color = factor(cell))) +
    geom_line(aes(group = cell)) + geom_line(size = 0.1) + theme(legend.position = "none") + ggtitle(title)
  # save the plot
  image = myplot
  ggsave(file = "cell_areas_time.svg", plot = image, width = 10, height = 8)
}
We can use getanID from splitstackshape
library(splitstackshape)
getanID(DF, "cell")[]
There's a much easier method to accomplish that goal. Use ave with seq.int
DF$group_seq <- ave(seq_along(DF[,1]), DF[,1], FUN = function(x) seq.int(length(x)))  # ave() needs a vector, so number the rows within each id
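A quick check on toy data shaped like the question's (the area values are arbitrary placeholders):
DF <- data.frame(cell = rep(1:4, times = 3), area = runif(12, 80, 125))
DF <- DF[order(DF$cell), ]
DF$group_seq <- ave(seq_along(DF$cell), DF$cell, FUN = function(x) seq.int(length(x)))
# group_seq now runs 1, 2, 3 within each cell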

cbind subsets into one column in r

I have created subsets of a dataframe, which I used for calculations. I am now left with numerous subsets which I want to combine into one column. The subsets look like this:
> E
$`1`
[1] "AAAaaa" "TTTaaa" "CCCaaa" "GGGaaa" "AAAttt" "TTTttt" "CCCttt" "GGGttt"
[9] "AAAccc" "TTTccc" "CCCccc" "GGGccc" "AAAggg" "TTTggg" "CCCggg" "GGGggg"
$`2`
[1] "ATAata" "TATata" "CGCata" "GCGata" "BBBata" "ATAtat" "TATtat" "CGCtat"
[9] "GCGtat" "BBBtat" "ATAcgc" "TATcgc" "CGCcgc" "GCGcgc" "BBBcgc" "ATAgcg"
[17] "TATgcg" "CGCgcg" "GCGgcg" "BBBgcg" "ATAbbb" "TATbbb" "CGCbbb" "GCGbbb"
[25] "BBBbbb"
I have tried:
A=vector()
cbind(A, E, deparse.level = 1)
A
But that leaves me with
E
1 Character,16
2 Character,25
I want the list of characters in one column. How do I do this?
You could also try the recursive argument of the c function, something like:
c(E, recursive = TRUE, use.names = FALSE)
# [1] "AAAaaa" "TTTaaa" "CCCaaa" "GGGaaa" "AAAttt" "TTTttt" "CCCttt" "GGGttt" "AAAccc" "TTTccc" "CCCccc" "GGGccc" "AAAggg" "TTTggg" "CCCggg" "GGGggg" "ATAata"
# [18] "TATata" "CGCata" "GCGata" "BBBata" "ATAtat" "TATtat" "CGCtat" "GCGtat" "BBBtat" "ATAcgc" "TATcgc" "CGCcgc" "GCGcgc" "BBBcgc" "ATAgcg" "TATgcg" "CGCgcg"
# [35] "GCGgcg" "BBBgcg" "ATAbbb" "TATbbb" "CGCbbb" "GCGbbb" "BBBbbb"
Or if you want it as a column within a data frame, you could try:
df <- data.frame(Res = c(E, recursive = TRUE))
You can unlist the list and create a single-column dataframe with data.frame:
dat <- data.frame(Col1=unlist(E, use.names=FALSE), stringsAsFactors=FALSE)
data
E <- structure(list(`1` = c("AAAaaa", "TTTaaa", "CCCaaa", "GGGaaa",
"AAAttt", "TTTttt", "CCCttt", "GGGttt", "AAAccc", "TTTccc", "CCCccc",
"GGGccc", "AAAggg", "TTTggg", "CCCggg", "GGGggg"), `2` = c("ATAata",
"TATata", "CGCata", "GCGata", "BBBata", "ATAtat", "TATtat", "CGCtat",
"GCGtat", "BBBtat", "ATAcgc", "TATcgc", "CGCcgc", "GCGcgc", "BBBcgc",
"ATAgcg", "TATgcg", "CGCgcg", "GCGgcg", "BBBgcg", "ATAbbb", "TATbbb",
"CGCbbb", "GCGbbb", "BBBbbb")), .Names = c("1", "2"))
You can also use stack, like this (provided you are dealing with a named list, as you are):
stack(E)
A nice feature is that the names become the "ind" column, so the process is easily reversible.
head(stack(E))
# values ind
# 1 AAAaaa 1
# 2 TTTaaa 1
# 3 CCCaaa 1
# 4 GGGaaa 1
# 5 AAAttt 1
# 6 TTTttt 1
tail(stack(E))
# values ind
# 36 BBBgcg 2
# 37 ATAbbb 2
# 38 TATbbb 2
# 39 CGCbbb 2
# 40 GCGbbb 2
# 41 BBBbbb 2
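As a sketch of what "reversible" means here (exact classes can differ by R version, e.g. factor vs. character values):
s  <- stack(E)
E2 <- unstack(s)   # splits `values` back out by `ind`
str(E2)            # a list with elements "1" and "2", like the original E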

R for loop and a series of plots: obtaining always the same plot

I'm trying to get a series of plots from the following dataset and for loop:
> head(all5new[c(6,70,22:23)]) # This is a snapshot of my dataset. There are more species, see below.
setID fishery blackdog smoothdog
11 1 TRAWL-PAND.BOR. 0 0
12 1 TRAWL-PAND.BOR. 0 0
13 1 TRAWL-REDFISH 0 0
14 1 TRAWL-PAND.BOR. 0 0
21 10 TRAWL-PAND.BOR. 0 0
22 10 TRAWL-PAND.BOR. 0 0
> elasmo #This is the list of the species for which I would like to have individual barplots
[1] "blackdog" "smoothdog" "spinydog" "mako" "porbeagle"
[6] "blue" "greenland" "portuguese" "greatwhite" "mackerelNS"
[11] "dogfish" "basking" "thresher" "deepseacat" "atlsharp"
[16] "oceanicwt" "roughsagre" "dusky" "sharkNS" "sand"
[21] "sandbar" "smoothhammer" "tiger" "wintersk" "abyssalsk"
[26] "arcticsk" "barndoorsk" "roundsk" "jensensk" "littlesk"
[31] "richardsk" "smoothsk" "softsk" "spinysk" "thorny"
[36] "whitesk" "stingrays" "skateNS" "manta" "briersk"
[41] "pelsting" "roughsting" "raysNS" "skateraysNS" "allSHARK"
[46] "allSKATE" "PELAGIC"
This is my for loop. The code works fine when I run it for one species; however, when I run it for all of them, I always get the same barplot. I know it must be just a quick fix, adding for example [[i]] somewhere in the code, but I have tried different things without any success.
for (i in elasmo) {
# CALCULATE THE CATCH PER UNIT OF EFFORT (KG/SET) FOR ALL SPECIES FOR EACH FISHERY
test<-ddply(all5new,.(fishery),summarize, sets=length(as.factor(setID)),LOGcpue=log((sum(i)/length(as.factor(setID)))))
#TAKE THE FIRST 10 FISHERY WITH THE HIGHEST LOGcpue
x<-test[order(-test$LOGcpue)[1:10],]
#REORDER THE FISHERY FACTOR ACCORDINGLY (FOR GGPLOT2, TO HAVE EACH LEVEL IN ORDER)
list<-x$fishery
x$fishery <- factor(x$fishery, levels =list)
#BAR PLOT
graph<-ggplot(x, aes(fishery,LOGcpue)) + geom_bar() + coord_flip() +
geom_text(aes(label=sets,hjust=0.5,vjust=-1),size=4,angle = 270)
#SAVE GRAPH IN NEW DIR
ggsave(graph,filename=paste("barplot",i,".png",sep=""))
}
Here's a subset of my dataset after melting: mydata.
> data.melt<-melt(all5new, id.vars=c("tripID","setID","fishery"), measure.vars = c(22:23))
> head(data.melt);dim(data.melt)
tripID setID fishery variable value
1 1 1 TRAWL-PAND.BOR. blackdog 0
2 1 1 TRAWL-PAND.BOR. blackdog 0
3 1 1 TRAWL-REDFISH blackdog 0
4 1 1 TRAWL-PAND.BOR. blackdog 0
5 1 10 TRAWL-PAND.BOR. blackdog 0
6 1 10 TRAWL-PAND.BOR. blackdog 0
[1] 350100 5
Here's a workflow I use for generating lots of graphs, adapted to your dataset (or my interpretation of it). This is a nice illustration of the power of plyr, I think. For your application, I don't think calculation times really matter; what is more important is generating easy-to-read code, and I think plyr is good for this.
#Load packages
require(plyr)
require(reshape)
require(ggplot2)
#Recreate your data set, with only two species
setID <- rep(1:5, each=4, times=1)
fishery <- gl(10, 2)
blackdog <- sample(1:5, size=20, replace=TRUE)
smoothdog <- sample(1:5, size=20, replace=TRUE)
df <- data.frame(setID, fishery, blackdog, smoothdog)
#Melt the data frame
dfm <- melt(df, id.vars = c("setID", "fishery"))
#Calculate LOGcpue for each fish at each fishery
cpueDF <- ddply(dfm, c("fishery", "variable"), summarise, LOGcpue = log(sum(value)/length(value)))
#Plot all the data in one (potentially huge) faceted plot.
#(I often use huge plots like this for onscreen analysis
# - obviously it can't be printed in practice, but you can get a visual overview of the data)
ggplot(cpueDF, aes(x=fishery, y=LOGcpue)) + geom_bar() + coord_flip() + facet_wrap(~variable)
ggsave("giant plot.pdf", height=30, width=30, units="in")
#Print each plot individually to screen, and save it, and put it in a list
printGraph <- function(df) {
  p <- ggplot(df, aes(x = fishery, y = LOGcpue)) +
    geom_bar() + coord_flip()
  print(p)
  fn <- paste0(df$variable[1], ".png")  # paste0 avoids a stray space in the file name
  ggsave(fn)
  p  # return the plot so dlply collects it in a list
}
plotList <- dlply(cpueDF, .(variable), printGraph)
#Now pick out the top n fisheries for each fish
cpueDFtopN <- ddply(cpueDF, .(variable), function(x) head(x[order(x$LOGcpue, decreasing=T),], n=5))
ggplot(cpueDFtopN, aes(x=fishery, y=LOGcpue)) + geom_bar() +
coord_flip() + facet_wrap(~variable, scales="free")
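For completeness, if you would rather keep the original per-species loop, the root problem is that inside summarize the loop variable i is only the species name as a string, so sum(i) cannot compute the intended per-species total; a sketch of the loop using d[[i]] instead (my adaptation of the question's code, assuming all5new and elasmo as shown above):
for (i in elasmo) {
  # catch per unit of effort (kg/set) for the current species, per fishery
  test <- ddply(all5new, .(fishery), function(d)
    data.frame(sets    = length(as.factor(d$setID)),
               LOGcpue = log(sum(d[[i]]) / length(as.factor(d$setID)))))
  # keep the 10 fisheries with the highest LOGcpue and fix the factor order for ggplot2
  x <- test[order(-test$LOGcpue)[1:10], ]
  x$fishery <- factor(x$fishery, levels = x$fishery)
  graph <- ggplot(x, aes(fishery, LOGcpue)) + geom_bar() + coord_flip() +
    geom_text(aes(label = sets, hjust = 0.5, vjust = -1), size = 4, angle = 270)
  ggsave(graph, filename = paste("barplot", i, ".png", sep = ""))
}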
