Parallel Processing in R using the "parallel" package

I have two data frames:
> head(k)
V1
1 1814338070
2 1199215279
3 1283239083
4 1201972527
5 404900682
6 3093614019
> head(g)
start end state value
1 16777216 16777471 queensland 15169
2 16777472 16778239 fujian 0
3 16778240 16779263 victoria 56203
4 16779264 16781311 guangdong 0
5 16781312 16781823 tokyo 0
6 16781824 16782335 aichi 0
> dim(k)
[1] 624979 1
> dim(g)
[1] 5510305 4
I want to compare each value in data.frame k and check whether it falls between the start and end values of data.frame g; if it does, return the corresponding state and value from g.
The problem is that, given the dimensions of the two data frames, doing the match and returning my desired values takes 5 hours on my computer. I've tried the following method, but I'm unable to make use of all the cores on my machine, and I can't even get it to work correctly:
return_first_match_position <- function(int, start, end) {
  match <- which(int >= start & int <= end)
  if (length(match) > 0) {
    return(match[1])
  } else {
    return(match)
  }
}
library(parallel)
cl = makeCluster(detectCores())
matches = Vectorize(return_first_match_position, 'int')(k$V1,g$start, g$end)
p = parSapply(cl, Vectorize(return_first_match_position, 'int')(k$V1,g$start, g$end), return_first_match_position)
stopCluster(cl)
The desired output is the percentage of times each state and value show up, across all matches of the numbers from data.frame k against data.frame g.
I was wondering whether there is an intelligent way of doing parallel processing in R?
And can anyone please suggest any sources for learning how to write better functions in R?
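For completeness, here is a sketch of how I imagine a chunked parallel version might be structured (untested, and assuming g$start/g$end describe sorted, non-overlapping intervals):
library(parallel)
n_cores <- detectCores()
cl <- makeCluster(n_cores)
# helper: index of the first interval containing int, or NA if none
first_match <- function(int) {
  m <- which(int >= g$start & int <= g$end)
  if (length(m) > 0) m[1] else NA_integer_
}
# give every worker the lookup table and the helper
clusterExport(cl, c("g", "first_match"))
# split k$V1 into one chunk per core and scan each chunk on a worker
chunks <- split(k$V1, cut(seq_along(k$V1), n_cores, labels = FALSE))
matches <- unlist(parLapply(cl, chunks, function(v) sapply(v, first_match)))
stopCluster(cl)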

I think you want to do a rolling join. This can be done very efficiently with data.table:
DF1 <- data.frame(V1=c(1.5, 2, 0.3, 1.7, 0.5))
DF2 <- data.frame(start=0:3, end=0.9:3.9,
                  state=c("queensland", "fujian", "victoria", "guangdong"),
                  value=1:4)
library(data.table)
DT1 <- data.table(DF1, key="V1")
DT1[, pos:=V1]
# V1 pos
#1: 0.3 0.3
#2: 0.5 0.5
#3: 1.5 1.5
#4: 1.7 1.7
#5: 2.0 2.0
DT2 <- data.table(DF2, key="start")
# start end state value
#1: 0 0.9 queensland 1
#2: 1 1.9 fujian 2
#3: 2 2.9 victoria 3
#4: 3 3.9 guangdong 4
DT2[DT1, roll=TRUE]
# start end state value pos
#1: 0 0.9 queensland 1 0.3
#2: 0 0.9 queensland 1 0.5
#3: 1 1.9 fujian 2 1.5
#4: 1 1.9 fujian 2 1.7
#5: 2 2.9 victoria 3 2.0
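The percentages asked about in the question can then be read off the join result; a minimal sketch using the toy tables above:
res <- DT2[DT1, roll=TRUE]
# share of rows in DT1 that landed on each state:value pair
prop.table(table(paste(res$state, res$value, sep=":")))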

Instead of editing my last answer heavily (which would pretty much make it a new one), is this what you want?
I noticed that your end is always 1 before the next row's start, so what you want (I think) is just to count how many values fall within each interval and give that interval the state/value for that range. So:
set.seed(123)
c1=seq(1,25,4)
c2=seq(4,30,4)
c3=letters[1:7]
c4=sample(seq(1,7),7)
c.all=cbind(c1,c2,c3,c4)
> c.all ### example data with a structure similar to yours
c1 c2 c3 c4
[1,] "1" "4" "a" "3"
[2,] "5" "8" "b" "7"
[3,] "9" "12" "c" "2"
[4,] "13" "16" "d" "1"
[5,] "17" "20" "e" "6"
[6,] "21" "24" "f" "5"
[7,] "25" "28" "g" "4"
k1 <- sample(seq(1,18),20,replace=T)
k1
[1] 2 1 15 14 4 15 3 17 18 1 4 3 16 15 2 4 8 11 7 16
fallsin <- cut(k1, c(as.numeric(c.all[,1]), max(c.all[,2])), labels=paste(c.all[,3], c.all[,4],sep=':'), right=F)
fallsin
[1] a:3 a:3 e:6 e:6 a:3 e:6 a:3 f:5 f:5 a:3 a:3 a:3 e:6 e:6 a:3 a:3 c:2 d:1 b:7 e:6
Levels: a:3 b:7 c:2 d:1 e:6 f:5 g:4
prop.table(table(fallsin))
a:3 b:7 c:2 d:1 e:6 f:5 g:4
0.45 0.05 0.05 0.05 0.30 0.10 0.00
where the names are the 'state:value' labels and the numbers are the proportion of k1 falling within the range for that label.
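Scaled up to the data frames in the question (assuming, as above, that g is sorted by start and its intervals are contiguous and non-overlapping), the same idea might look roughly like:
labs <- paste(g$state, g$value, sep=":")
idx  <- cut(k$V1, breaks=c(g$start, max(g$end)), labels=FALSE,
            right=FALSE, include.lowest=TRUE)
prop.table(table(labs[idx]))  # values falling outside every interval (NA) are dropped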

Related

Create data frame variables based on a function with two matching variable arguments where argument order matters

Here is a toy data frame
df <- data.frame(alpha = c(rep(.005, 5)),
                 a1 = c(1:5),
                 b1 = c(4:8),
                 c1 = c(10:14),
                 a2 = c(9:13),
                 b2 = c(3:7),
                 c2 = c(15:19))
Here is a nonsensical toy function that requires two variables, both of which must have the same letter prefix. The specific function calculation is not important. Rather, the issue is how to pass two or more separate named variables to the function from the data frame where the order of the arguments matters.
toy_function <- function(x, y) {
  z <- x + y
  w <- x / y
  v <- z + w
  return(v)
}
Manual calculation of new variables using the function would look like this. Not practical when you've got dozens or hundreds of variable pairs.
df2 <- df %>%
  mutate(va = toy_function(a1, a2),
         vb = toy_function(b1, b2),
         vc = toy_function(c1, c2))
How can I do this across all matching pairs of variables? This problem seems similar to How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs but that example was applying a simple mathematical function (e.g., +) in which variable order does not matter. I'm having trouble figuring out how to modify it for this case.
Here is one base R approach using split.default.
cbind(df, sapply(split.default(df[-1],
                               sub('\\d+', '', names(df)[-1])),
                 function(x) toy_function(x[[1]], x[[2]])))
# alpha a1 b1 c1 a2 b2 c2 a b c
#1 0.005 1 4 10 9 3 15 10.1 8.33 25.7
#2 0.005 2 5 11 10 4 16 12.2 10.25 27.7
#3 0.005 3 6 12 11 5 17 14.3 12.20 29.7
#4 0.005 4 7 13 12 6 18 16.3 14.17 31.7
#5 0.005 5 8 14 13 7 19 18.4 16.14 33.7
We ignore the first column ([-1]) since we don't want to include it in the calculation, group the similarly named columns, and split them into a list. Using sapply we apply toy_function to each element of the list.
sub is used to remove the numbers from the names, creating the groups to split on.
sub('\\d+', '', names(df)[-1])
#[1] "a" "b" "c" "a" "b" "c"
If you wish to use a tidyverse approach you could do:
library(dplyr)
library(purrr)
unique_names <- unique(sub('\\d+', '', names(df)[-1]))
map_dfc(unique_names, ~ df[-1] %>%
          select(matches(.x)) %>%
          mutate(!!paste0('v', .x) := toy_function(.[[1]], .[[2]])))
# a1 a2 va b1 b2 vb c1 c2 vc
#1 1 9 10.1 4 3 8.33 10 15 25.7
#2 2 10 12.2 5 4 10.25 11 16 27.7
#3 3 11 14.3 6 5 12.20 12 17 29.7
#4 4 12 16.3 7 6 14.17 13 18 31.7
#5 5 13 18.4 8 7 16.14 14 19 33.7
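If you would rather leave df untouched and only append the new v* columns, one hedged variant of the same idea (using transmute and bind_cols) could be:
new_cols <- map_dfc(unique_names, ~ df[-1] %>%
                      transmute(!!paste0('v', .x) :=
                                  toy_function(.data[[paste0(.x, 1)]],
                                               .data[[paste0(.x, 2)]])))
bind_cols(df, new_cols)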
You can do something like this.
First, create a data frame with the function arguments as columns and the values to be used for each function call as rows.
vars <- letters[1:3]
args <- tibble(
  arg1 = setNames(paste0(vars, 1), paste0("set_output_names_like_this_", vars)),
  arg2 = paste0(vars, 2)
)
> str(args)
tibble [3 x 2] (S3: tbl_df/tbl/data.frame)
$ arg1: Named chr [1:3] "a1" "b1" "c1"
..- attr(*, "names")= chr [1:3] "set_output_names_like_this_a" "set_output_names_like_this_b" "set_output_names_like_this_c"
$ arg2: chr [1:3] "a2" "b2" "c2"
Then, use pmap_dfc
df %>% mutate(pmap_dfc(args, function(arg1, arg2, d) toy_function(d[[arg1]], d[[arg2]]), .data))
Output
alpha a1 b1 c1 a2 b2 c2 set_output_names_like_this_a set_output_names_like_this_b set_output_names_like_this_c
1 0.005 1 4 10 9 3 15 10.11111 8.333333 25.66667
2 0.005 2 5 11 10 4 16 12.20000 10.250000 27.68750
3 0.005 3 6 12 11 5 17 14.27273 12.200000 29.70588
4 0.005 4 7 13 12 6 18 16.33333 14.166667 31.72222
5 0.005 5 8 14 13 7 19 18.38462 16.142857 33.73684

Recode factors to number of my choosing

I'd like to convert NG to 0, SG to 1.25, LG to 7.25, MG to 26 and HG to 40.
My actual data, which looks exactly like the t below, is here:
actual data causing problems
t<-rep(c("NG","SG","LG","MG","HG"),each=5)
colnames(t)<-c("X.1","X1","X2","X4","X8","X12","X24","X48")
Why doesn't this work?
t[t=="NG"] <- "0"
t[t=="SG"] <- "1.25"
t[t=="LG"] <- "7.25"
t[t=="MG"] <- "26"
or this:
factor(t, levels=c("NG","SG","LG","MG", "HG"), labels=c("0","1.25","7.25","26","40"))
or this:
t <- sapply(t,switch,"NG"=0,"SG"=1.25,"LG"=7.25,"MG"=26, "HG"=40)
You may want this:
t <- rep(c(NG = 0, SG = 1.25, LG = 7.25, MG = 26, HG = 40), each = 5)
t <- factor(t)
levels(t)
# [1] "0" "1.25" "7.25" "26" "40"
labels(t)
# [1] "NG" "NG" "NG" "NG" "NG" "SG" "SG" "SG" "SG" "SG" "LG" "LG" "LG" "LG" "LG"
# [16] "MG" "MG" "MG" "MG" "MG" "HG" "HG" "HG" "HG" "HG"
The internal codes for the factor will always be integers, so you can't create a factor with internal codes that are double precision floats.
unclass(t)
# NG NG NG NG NG SG SG SG SG SG LG LG LG LG LG MG MG MG MG MG HG HG HG HG HG
# 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5
# attr(,"levels")
# [1] "0" "1.25" "7.25" "26" "40"
You can still look up the value for a level using its label:
t["SG"]
# SG
# 1.25
# Levels: 0 1.25 7.25 26 40
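If the goal is simply a numeric vector of the recoded values (rather than a factor), a named lookup vector is one common idiom; a sketch on the same t:
t <- rep(c("NG", "SG", "LG", "MG", "HG"), each = 5)
lookup <- c(NG = 0, SG = 1.25, LG = 7.25, MG = 26, HG = 40)
unname(lookup[t])  # numeric vector, names dropped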

Creating vectors from regular expressions in a column name

I have a dataframe in which the columns represent species. The species affiliation is encoded in the column name's suffix:
Ac_1234_AnyString
The string after the second underscore (_) represents the species affiliation.
I want to plot some networks based on rank correlations, and I want to colour the species according to their affiliation later, when I create Fruchterman-Reingold graphs with library(qgraph).
I've done this previously by sorting the df by the name suffix and then creating vectors by manually counting them:
list.names <- c("SG01", "SG02")
list <- vector("list", length(list.names))
names(list) <- list.names
list$SG01 <- c(1:12)
list$SG02 <- c(13:25)
str(list)
List of 2
$ SG01 : int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
$ SG02 : int [1:13] 13 14 15 16 17 18 19 20 21 22 ...
This was very tedious for the big datasets I am working with.
The question is: how can I avoid the manual sorting and counting, and extract vectors (or a list) according to the suffix and the position in the dataframe? I know I can create a vector with the suffix information by
indx <- gsub(".*_", "", names(my_data))
str(indx)
chr [1:29]
"4" "6" "6" "6" "6" "6" "11" "6" "6" "6" "6" "6" "3" "18" "6" "6" "6" "5" "5"
"6" "3" "6" "3" "6" "NA" "6" "5" "4" "11"
Now I would need to create vectors with the positions of all "4"s, "6"s and so on:
List of 7
$ 4: int[1:2] 1 28
$ 6: int[1:17] 2 3 4 5 6 8 9 10 11 12 15 16 17 20 22 24 26
$ 11: int[1:2] 7 29
....
Thank you.
You can try:
sapply(unique(indx), function(x, vec) which(vec==x), vec=indx)
# $`4`
# [1] 1 28
# $`6`
# [1] 2 3 4 5 6 8 9 10 11 12 15 16 17 20 22 24 26
# $`11`
# [1] 7 29
# $`3`
# [1] 13 21 23
# $`18`
# [1] 14
# $`5`
# [1] 18 19 27
# $`NA`
# [1] 25
Another option is
setNames(split(seq_along(indx),match(indx, unique(indx))), unique(indx))
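As a hypothetical usage note, the resulting named list can be passed straight to qgraph's groups argument for colouring; this assumes a rank-correlation matrix cor_mat already built from my_data:
library(qgraph)
grp <- sapply(unique(indx), function(x, vec) which(vec == x), vec = indx)
qgraph(cor_mat, layout = "spring", groups = grp)  # "spring" = Fruchterman-Reingold layout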

Receiving NAs when using cut() to add decile column

New R user. I'm trying to split a dataset based on deciles, using cut according to the process in this question. I want to add the decile values as a new column in a dataframe, but when I do this the lowest value is listed as NA for some reason. This happens regardless of whether include.lowest=TRUE or FALSE. Anyone have any idea why?
Happens when I use this sample set, too, so it's not exclusive to my data.
data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
> decile <- cut(data, quantile(data, (0:10)/10, labels=TRUE, include.lowest=FALSE))
> df <- cbind(data, decile)
> df
data decile
[1,] 1 NA
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 3
[6,] 6 3
[7,] 7 4
[8,] 8 4
[9,] 9 5
[10,] 10 5
[11,] 11 6
[12,] 12 6
[13,] 13 7
[14,] 14 7
[15,] 15 8
[16,] 16 8
[17,] 17 9
[18,] 18 9
[19,] 19 10
[20,] 20 10
There are two problems. First, you have a couple of things wrong with your cut call. I think you meant
cut(data, quantile(data, (0:10)/10), include.lowest=FALSE)
## in the original, the closing parenthesis of quantile() was misplaced, so labels= and include.lowest= were passed to quantile() instead of cut()
Also, labels should be FALSE, NULL, or a vector of the required labels, one per interval (i.e. one fewer than the number of breaks).
Second, the main issue is that because you set include.lowest=FALSE, and data[1] is 1, which corresponds to the first break as defined by
> quantile(data, (0:10)/10)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1.0 2.9 4.8 6.7 8.6 10.5 12.4 14.3 16.2 18.1 20.0
the value 1 doesn't fall into any category; it is beyond the lower limit of the categories defined by your breaks.
I'm not sure what you want, but you could try one of these two alternatives, depending on which class you want 1 to be in:
> cut(data, quantile(data, (0:10)/10), include.lowest=TRUE)
[1] [1,2.9] [1,2.9] (2.9,4.8] (2.9,4.8] (4.8,6.7] (4.8,6.7]
[7] (6.7,8.6] (6.7,8.6] (8.6,10.5] (8.6,10.5] (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20] (18.1,20]
10 Levels: [1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] (8.6,10.5] ... (18.1,20]
> cut(data, c(0, quantile(data, (0:10)/10)), include.lowest=FALSE)
[1] (0,1] (1,2.9] (2.9,4.8] (2.9,4.8] (4.8,6.7] (4.8,6.7]
[7] (6.7,8.6] (6.7,8.6] (8.6,10.5] (8.6,10.5] (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20] (18.1,20]
11 Levels: (0,1] (1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] ... (18.1,20]
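If all that is wanted in the new column is the decile number (1-10) rather than the interval label, labels=FALSE returns the integer codes directly; a sketch on the same sample vector:
decile <- cut(data, quantile(data, (0:10)/10), include.lowest = TRUE, labels = FALSE)
df <- data.frame(data, decile)  # decile is now a plain integer column
head(df)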

Manipulating list objects in R - processing MCMC output

EDITED BELOW TO SHOW A REALLY NEAT SOLUTION -- THANKS TO HADLEY WICKHAM.
I have a very specific query, but it also relates to some general shortcomings in my R knowledge which I would like to rectify. I'd also like (if possible) not just to solve my problem but to do so in an elegant and efficient way---maybe I am setting my sights too high. Can anyone both answer my specific queries and recommend a good source to find out more? Any help greatly appreciated. It seems Hadley Wickham has wrestled with a similar problem here - http://www.slideshare.net/hadley/plyr-one-data-analytic-strategy - but these are slides from a presentation, and I struggle to understand them by themselves.
I am trying to manipulate MCMC output stored in a list in R. The data are grouped into five years, and for each year I have four groups. The goal is to plot these. To make the problem tractable, here is the output for just ten iterations:
, , 1
iterations [,1] [,2] [,3] [,4]
[1,] 49.184181 4.3515983 16.051958 -14.896019
[2,] 45.910362 2.1738066 17.161775 -29.880989
[3,] 14.575248 7.9476606 8.385455 -34.753004
[4,] 55.029604 2.3422748 16.366960 -66.182627
[5,] 25.338546 8.3039173 16.937638 -26.697235
[6,] 48.633115 0.4698014 16.130142 -65.659992
[7,] 1.356642 3.0249349 2.388576 -1.700559
[8,] 49.831352 -2.0644832 15.403726 -23.378055
[9,] 13.057886 -2.8856576 11.481152 -36.697754
[10,] 50.889166 2.6846852 15.763382 -23.049868
, , 2
iterations [,1] [,2] [,3] [,4]
[1,] 51.6134663 15.659392 17.218244 -47.864892
[2,] 46.0545981 17.067779 18.158151 -38.336587
[3,] 16.5690775 10.386358 10.991029 -30.225820
[4,] 55.5724832 14.840466 15.556193 -54.432882
[5,] 26.1064404 5.656579 15.063810 -5.085942
[6,] 57.3084200 12.551751 16.212203 -52.459935
[7,] 0.9825892 6.651478 1.999976 -5.350995
[8,] 56.1117252 3.204124 16.011812 -21.179722
[9,] 15.4204854 5.761157 12.594028 -43.691113
[10,] 50.1407397 16.404882 15.990908 -26.019990
, , 3
iterations [,1] [,2] [,3] [,4]
[1,] 53.521436 24.340327 16.073063 -20.939950
[2,] 46.040969 21.025351 16.535917 -47.611395
[3,] 19.276578 16.575285 14.824175 -18.432136
[4,] 58.050774 20.886686 15.944355 -37.646286
[5,] 26.008007 11.449253 13.027001 -56.572886
[6,] 61.474771 18.270354 15.879238 -31.316868
[7,] 1.515227 1.434234 3.568761 -1.328706
[8,] 61.725967 19.212081 16.717331 -18.993349
[9,] 15.303739 6.939953 11.940742 -54.261739
[10,] 47.968838 20.070758 17.168400 -48.598802
, , 4
iterations [,1] [,2] [,3] [,4]
[1,] 51.952695 24.267668 17.867717 -28.129743
[2,] 49.680524 22.914727 16.001512 -44.434294
[3,] 18.519755 17.961953 15.831455 -57.110802
[4,] 59.652211 21.655724 16.876315 -24.965724
[5,] 29.091609 20.831196 15.546565 -59.272164
[6,] 62.190041 21.112490 15.759867 -19.910655
[7,] 3.116584 1.187595 1.050807 -7.721749
[8,] 61.384355 27.331487 16.646250 -17.793893
[9,] 16.320224 14.321294 13.726538 -47.748184
[10,] 47.676867 27.325987 17.056364 -31.032911
, , 5
iterations [,1] [,2] [,3] [,4]
[1,] 55.326522 33.737691 19.698060 -46.34804
[2,] 51.122038 31.055026 19.668949 -64.52942
[3,] 22.036674 17.577561 13.546166 -85.24881
[4,] 60.481009 34.300432 16.903054 -25.19277
[5,] 29.168884 26.811356 16.066908 -37.56252
[6,] 54.221450 28.760434 16.480317 -36.42441
[7,] 3.672456 1.571084 2.397663 -10.91522
[8,] 56.223306 30.730421 18.185858 -28.30282
[9,] 16.955258 16.699139 18.101711 -36.85851
[10,] 48.220404 29.749342 17.557532 -38.22831
Some further information:
> str(a.type)
List of 1
$ a_type: num [1:10, 1:4, 1:5] 49.2 45.9 14.6 55 25.3 ...
..- attr(*, "dimnames")=List of 3
.. ..$ iterations: NULL
.. ..$ : NULL
.. ..$ : NULL
What I am looking for (for the immediate problem) is, first, a way of naming the dimensions of this object (i.e. the groups and the years) with the dimnames() command, and second, a way of taking some summary values from each column (group) in each of the five years -- something that will apply the following to each of the four columns for each of the five years:
myfunc <- function(x)c(mean(x),
quantile(x,c(.025,.975)))
Any help greatly appreciated. Also, as I said, if anyone can recommend a good source on problems like this, I might not have to ask questions like this so often in future.
Note added: Based on the helpful answer below, I have figured out part of my problem. I can name the dimensions as follows:
dimnames(a.type[[1]])=list(paste('iter',1:10,sep=''), ## 10 iterations
paste(c("Delivery", "Other", "Regulatory", "Transfer")), ## 4 groups
paste('Year',1:5,sep='')) ## 5 Years
This makes the following (just showing year 1):
> a.type
$a_type
, , Year1
Delivery Other Regulatory Transfer
iter1 49.184181 4.3515983 16.051958 -14.896019
iter2 45.910362 2.1738066 17.161775 -29.880989
iter3 14.575248 7.9476606 8.385455 -34.753004
iter4 55.029604 2.3422748 16.366960 -66.182627
iter5 25.338546 8.3039173 16.937638 -26.697235
iter6 48.633115 0.4698014 16.130142 -65.659992
iter7 1.356642 3.0249349 2.388576 -1.700559
iter8 49.831352 -2.0644832 15.403726 -23.378055
iter9 13.057886 -2.8856576 11.481152 -36.697754
iter10 50.889166 2.6846852 15.763382 -23.049868
So that works. A further question: how can I name just the groups and years? I have little interest in naming the iterations, and indeed I want to be able to output different numbers of iterations without changing my code. In other words, is there a logical way to skip naming the iterations? If I do...
dimnames(a.type[[1]])=list(, ##
paste(c("Delivery", "Other", "Regulatory", "Transfer")), ## 4 groups
paste('Year',1:5,sep='')) ## 5 Years
...then I get an error message...
> dimnames(a.type[[1]][2:3])=list(#paste('iter',1:10,sep=''), ## 10 years
+ paste(c("Delivery", "Other", "Regulatory", "Transfer")), ## 4 groups
+ paste('Year',1:5,sep='')) ## 5 Years
Error in dimnames(a.type[[1]][2:3]) = list(paste(c("Delivery", "Other", :
'dimnames' applied to non-array
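(Aside: the error above arises because a.type[[1]][2:3] is no longer an array; a hedged sketch of one way to leave the iteration dimension unnamed is to pass NULL as the first element of the dimnames list:)
dimnames(a.type[[1]]) <- list(NULL,  # iterations stay unnamed
                              c("Delivery", "Other", "Regulatory", "Transfer"),
                              paste('Year', 1:5, sep=''))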
On the other issue, applying a function: I can do the following, but I think that gives me the mean and quantiles across all years:
> myfunc <- function(x)c(mean(x),
+ quantile(x,c(.025,.975)))
> a.type.bar <- apply(a.type[[1]], 2, myfunc)
> a.type.bar
Delivery Other Regulatory Transfer
38.351706 14.892788 14.450314 -34.61954
2.5% 1.392323 -1.494269 2.087411 -66.06503
97.5% 61.669447 33.134091 19.335254 -2.46227
On the other hand, I can do the following, and apply my function to just one year at a time:
a.type.bar <- apply(a.type[[1]][,,1], 2, myfunc)
Now obviously that would solve my problem -- I would just have to type five lines of code. But to get at the deeper problem: is there a way of getting the means and quantiles a year at a time?
Thanks.
Note added 17 March 2013. Thanks to Hadley Wickham's marvellous plyr package, I seem to have a solution---and thanks Zach for turning me onto it.
library(plyr)
myfunc <- function(x)c(mean(x),
quantile(x,c(.025,.975)))
summaries <- adply(a.type[[1]], 2:3, myfunc)
This gives the following output.
> summaries
X1 X2 V1 2.5% 97.5%
1 Delivery 1996 78.6691388 39.912455 109.61078
2 Other 1996 4.3485461 -4.584758 16.61764
3 Regulatory 1996 19.6444938 14.135322 24.00373
4 Transfer 1996 -0.7922307 -195.263744 203.95175
5 Delivery 1997 79.6291215 29.853200 109.26860
6 Other 1997 14.3462871 5.607952 22.68043
7 Regulatory 1997 22.4131984 16.861994 30.09017
8 Transfer 1997 4392.7699174 991.168626 8426.64365
9 Delivery 1998 85.9237011 52.100181 115.78991
10 Other 1998 21.4735955 9.790307 37.40546
11 Regulatory 1998 25.5654754 19.558132 30.58021
12 Transfer 1998 6166.7374268 2456.330035 10249.00350
13 Delivery 1999 90.1843678 52.574874 128.28546
14 Other 1999 27.2028622 14.373959 38.54636
15 Regulatory 1999 28.8851480 20.913437 34.59272
16 Transfer 1999 8116.6049650 4186.782183 12030.65517
17 Delivery 2000 91.0299168 47.211931 125.35626
18 Other 2000 31.5885924 16.087480 46.28089
19 Regulatory 2000 31.7628775 21.082236 40.29969
20 Transfer 2000 9203.9975199 2349.851364 14382.00472
All that is left now is to plot this (well, and several other versions of the same model). I am having a play with ggplot.
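A first rough attempt at such a plot, using the column names exactly as printed above (X1 = group, X2 = year, V1 = mean), might be something like:
library(ggplot2)
ggplot(summaries, aes(x = X2, y = V1, group = X1, colour = X1)) +
  geom_pointrange(aes(ymin = `2.5%`, ymax = `97.5%`)) +
  facet_wrap(~ X1, scales = "free_y")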
You want to get your data into a data frame instead of a matrix, and then use the formula interface to aggregate.
Ideally you want to get your MCMC output into a form that you can read directly into a data frame, but if you are stuck with the array, then use melt or reshape + as.data.frame, or just do something like this (assuming you have an array called M with the three dimensions discussed above):
d <- data.frame(year=rep(1996:2000, each=40),
                agency=rep(c("D","O","R","T"), 50),
                iteration=rep(0:9, 5, each=4),
                # aperm so that agency varies fastest, matching the rep() patterns above
                spend=as.vector(aperm(M, c(2,1,3))))
in order to get a data frame that looks like this:
year agency iteration spend
1 1996 D 0 49.184181
2 1996 O 0 4.351598
3 1996 R 0 16.051958
4 1996 T 0 -14.896019
5 1996 D 1 45.910362
6 1996 O 1 2.173807
7 1996 R 1 17.161775
...
Now you can use aggregate to apply your function, like this:
aggregate(spend~agency+year,d,myfunc)
to get
agency year spend.V1 spend.2.5% spend.97.5%
1 D 1996 35.380610 3.989422 54.098005
2 O 1996 2.634854 -2.700893 8.223760
3 R 1996 13.607076 3.737874 17.111344
4 T 1996 -32.289610 -66.065034 -4.669537
5 D 1997 37.588003 4.231116 57.039164
6 O 1997 10.818397 3.755926 16.918627
...
and now you can slice and dice as you wish
aggregate(spend~year,d,myfunc)
aggregate(spend~agency,d,myfunc)
etc...
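One small follow-up: with a function returning several values, aggregate() stores spend as a matrix column here; if ordinary columns are preferred it can be flattened, e.g. (a sketch):
res <- aggregate(spend ~ agency + year, d, myfunc)
res <- do.call(data.frame, res)  # splits the matrix column into separate spend.* columns
head(res)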
I don't know the dimensions of your array, but here is an example:
dat <- array(sample(1:5,10*4*5,rep=TRUE),c(10,4,5))
Using dimnames here is a good idea since you have many dimensions; it will help you to understand the output of your aggregation function. You just need to supply a list of names with the right dimensions.
dimnames(dat)=list(paste('year',1:10,sep=''), ## 10 years
paste('group',letters[1:4],sep=''), ## 4 groups
paste('iter',1:5,sep='')) ## 5 iterations
Then use apply to get means by iteration:
apply(dat,3,rowMeans)
iter1 iter2 iter3 iter4 iter5
year1 2.25 3.00 3.75 3.00 3.00
year2 3.00 3.00 3.00 2.25 3.25
year3 3.75 3.50 3.50 3.50 3.50
year4 2.00 2.25 3.50 1.50 3.50
year5 2.50 2.50 3.50 2.75 3.50
year6 2.75 3.75 2.00 4.00 2.50
year7 3.50 2.50 3.50 2.50 2.75
year8 3.25 2.75 4.50 2.50 3.75
year9 4.50 3.25 3.25 3.00 2.25
year10 1.75 4.25 3.25 1.50 2.00
To get means by group over years
> apply(dat,3,colMeans)
iter1 iter2 iter3 iter4 iter5
groupa 3.1 3.0 3.3 2.8 2.9
groupb 2.7 3.6 3.0 2.8 2.7
groupc 3.6 3.3 3.4 2.1 3.3
groupd 2.3 2.4 3.8 2.9 3.1
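And to mirror the question's myfunc (mean plus the 2.5% and 97.5% quantiles) for every group within every slice at once, apply over the last two margins; a sketch on the same toy array:
myfunc <- function(x) c(mean(x), quantile(x, c(.025, .975)))
apply(dat, c(2, 3), myfunc)  # a 3 x 4 x 5 array: (mean, 2.5%, 97.5%) x group x slice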
