Midcentered bar graphs - r

I have a data set with percentages in four categories: the top two categories are "positive" and the bottom two are "negative", so I want to align the boundary between categories 2 and 3 at the zero point on all the bars. (I'm plotting pairs of bars: one set of bars for ext.obs=0 and one for ext.obs=1.) Here's a portion of the data:
structure(list(ext.obs = c(0, 0, 0, 1, 1, 1), comp = c(1, 2,
3, 1, 2, 3), `1` = c(0.00617283950617284, 0.00609756097560976,
0.0111111111111111, 0, 0, 0), `2` = c(0.154320987654321, 0.195121951219512,
0.161111111111111, 0.211180124223602, 0.392638036809816, 0.23030303030303
), `3` = c(0.709876543209877, 0.676829268292683, 0.666666666666667,
0.745341614906832, 0.521472392638037, 0.721212121212121), `4` = c(0.12962962962963,
0.121951219512195, 0.161111111111111, 0.0434782608695652, 0.0858895705521472,
0.0484848484848485)), .Names = c("ext.obs", "comp", "1", "2",
"3", "4"), row.names = c(1L, 2L, 3L, 11L, 12L, 13L), class = "data.frame")
I would like to build a matrix from these data so that I can just call barplot(datamatrix) and have it come out nicely. But I can't figure out any way other than plotting the top two categories and then adding the bottom two categories using barplot(..., add=T).
Here's the code I wrote (I actually plot 10 pairs of bars with par(mfrow=c(1, 10)), looping through for(i in 1:10)):
bar.loc <- barplot(t(as.matrix(tab3[c(i, i+10), c(5,6)])),
ylim=c(-0.5, 1.0),
col=my.pal[3:4],
xaxt="n",
yaxt="n",
ylab="",
xlab=components[i]
)
barplot(t(as.matrix(tab3[c(i, i+10), c(4, 3)]*(-1))),
add=T,
col=my.pal[2:1],
yaxt="n",
xaxt="n",
ylab="",
xlab="")
You can see part of the finished product here or the image is below:
Can anyone think of a more elegant way to do this?

Try this:
barplot( t(cbind(tab3[,5:6],-tab3[,6:5],-tab3[,4:3])),
col=c('lightblue','darkblue',NA,NA,'tan','brown') )
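Why this works: barplot() stacks the rows of the matrix cumulatively, so the two uncoloured (NA) segments walk each bar back down to zero before the negative categories are drawn. Below is a minimal self-contained sketch of that idea. It assumes a tab3 rebuilt from the six rows shown in the question (values rounded to four decimals); the names up, down and datamatrix, the bar labels and the colours are illustrative choices, not part of the original answer.

```r
# tab3 rebuilt from the question's dput output (rounded values; for illustration only)
tab3 <- data.frame(
  ext.obs = c(0, 0, 0, 1, 1, 1),
  comp    = c(1, 2, 3, 1, 2, 3),
  `1` = c(0.0062, 0.0061, 0.0111, 0, 0, 0),
  `2` = c(0.1543, 0.1951, 0.1611, 0.2112, 0.3926, 0.2303),
  `3` = c(0.7099, 0.6768, 0.6667, 0.7453, 0.5215, 0.7212),
  `4` = c(0.1296, 0.1220, 0.1611, 0.0435, 0.0859, 0.0485),
  check.names = FALSE
)

up   <- as.matrix(tab3[, c("3", "4")])  # categories drawn above zero
down <- as.matrix(tab3[, c("2", "1")])  # categories drawn below zero

# Rows of the stacked matrix: up, then the same heights negated in reverse
# order (this returns the stack to zero), then the negated "down" categories.
datamatrix <- t(cbind(up, -up[, 2:1], -down))

bar.loc <- barplot(datamatrix,
                   col = c("lightblue", "darkblue", NA, NA, "tan", "brown"),
                   ylim = c(-0.5, 1),
                   names.arg = paste0("comp ", tab3$comp, "\next.obs ", tab3$ext.obs))
```

Each column of datamatrix sums to minus the total of the two negative categories, which is a quick way to check that the invisible segments cancel the positive ones.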


How to plot many probability density functions (pdfs) without sharp edges?

I have an issue with plotting continuous distributions without sharp edges in ggplot2. I need to show two of them on one plot. Also, it does not have to be ggplot2 to achieve this result.
I am aware that the number of data points directly influences the smoothness of the line, but that was not the issue here. Below you can see some sample data (from dput):
sample.data<-list(beta.data = structure(list(cluster = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), beta.density = c(0, 3.42273368363934e-43, 8.42987148403021e-29,
2.04764468657484e-20, 1.69485562831516e-14, 6.07999638837842e-10, 2.88180370232676e-06, 0.00314064636750876, 0.954118897015866, 0, 0, 3.80101893822358e-36, 6.43342582657081e-22, 6.82956252277493e-14, 1.75291058931833e-08, 0.000131874335695378, 0.0750918340641428, 3.72532418759802, 5.05242078519544, 0), pr = c(0, 0.111111111111111, 0.222222222222222, 0.333333333333333, 0.444444444444444, 0.555555555555556, 0.666666666666667, 0.777777777777778, 0.888888888888889, 1, 0, 0.111111111111111, 0.222222222222222, 0.333333333333333, 0.444444444444444, 0.555555555555556, 0.666666666666667, 0.777777777777778, 0.888888888888889, 1)), row.names = c(NA, -20L), class = "data.frame"), beta.params = structure(list(cluster = 1:2, a = c(49, 50), b = c(2, 10), ni.beta = c(0.961,0.833), sd.beta = c(0.00072, 0.00228)), row.names = c(NA,-2L), class = "data.frame"))
Before I was using geom_col, but it discretizes values. I went with geom_area:
ggplot(sample.data$beta.data, aes(x = pr, y = beta.density)) +
geom_area(stat = "function",
fun = dbeta,
args = list(shape1 = sample.data$beta.params[1,"a"], shape2 = sample.data$beta.params[1,"b"]),
fill = "#F1C40F",
colour = "black",
alpha = 0.7) +
geom_area(stat = "function",
fun=dbeta,
args = list(shape1 = sample.data$beta.params[2,"a"], shape2 = sample.data$beta.params[2,"b"]),
fill = "#3498DB",
colour = "black",
alpha = 0.7)
I have shown the data with 10 points, but 1000 points look almost the same. That is not the case with the following code, where even 100 points look OK:
p = seq(0,1, length=100)
plot(p, dbeta(p, 50, 10), ylab="Density", type ="l", col=4, lwd = 2)
Here I am attaching code to simulate the data. Oh, and these troublesome beta parameters were a = 49 and b = 2.
len <- 100
p <- seq(0,1, length.out = len)
df <- data.frame(rbind(cbind("cl" = rep(1, times = length(p)), "beta" = dbeta(p, 50, 10),"p"= p),
cbind("cl" = rep(2, times = length(p)), "beta" = dbeta(p, 49, 2),"p"= p)))
Do you have any ideas?
EDIT: "pdfs" here stands for probability density functions. That is why I have not used "pdf" as a tag. My apologies for the confusion!
Anyway, when I tried to print the graphic to a PDF file, the result was poor as well (sharp edges). But in the end, it shouldn't matter. I want to see smooth lines whatever I do (within reason).
EDIT2: It is possible to achieve, because:
library(mosaic)
theme_set(theme_bw())
xpbeta(c(0.7, 0.90), shape1 = 49, shape2 = 2)
It produces a nice, smooth beta distribution with parameters (49, 2). But then again, I need to show two distributions in one chart.
I have found the answer. It still needs some editing (like transparency/alpha which I couldn't figure out), but in general, this is what I meant. Code:
library(mosaic)
plotDist('beta', params=list(49,2), kind='density', type = "h", col = "#3498DB", xlim = c(0,1))
plotDist('beta', params=list(50, 10), kind='density', type = "h", col = "#F1C40F", add = TRUE)
plotDist('beta', params=list(49,2), kind='density', add = TRUE, col = "black")
plotDist('beta', params=list(50, 10), kind='density', add = TRUE, col = "black")
Result:
We can add as many distributions as we want using the add parameter.
The parameter type = "h" is used to draw a filled distribution. Without it, only the line is visible. In my answer, I draw the two lines and the two filled distributions. I would be really happy if someone could show a better answer, though.
EDIT:
I think I found my perfect answer!
Here is the code:
library(ggformula)
theme_set(theme_bw())
gf_dist("beta", shape1 = 49, shape2 = 2, geom = "area", alpha = 0.5, fill = "#F1C40F") %>%
gf_dist("beta", shape1 = 49, shape2 = 2) %>%
gf_dist("beta", shape1 = 50, shape2 = 10, geom = "area", alpha = 0.5, fill = "#3498DB") %>%
gf_dist("beta", shape1 = 50, shape2 = 10)
It is much faster than the previous code, parameter alpha is obvious and it is relatively easy to combine many plots! Because of transparency, you can nicely see the overlap of both distributions.
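For comparison, here is a base-graphics sketch that needs no extra packages. It evaluates dbeta on a dense grid (the grid size of 2001 points is an arbitrary assumption) and fills each curve with polygon(), using adjustcolor() for transparency. One plausible cause of the jagged ggplot2 output is that stat = "function" evaluates the function at its own n points (101 by default) rather than at the x values in the data, which is coarse for a density as narrow as Beta(49, 2). That explanation is an assumption and has not been checked against the original plot.

```r
p  <- seq(0, 1, length.out = 2001)  # dense grid so the narrow peak is well resolved
d1 <- dbeta(p, 49, 2)
d2 <- dbeta(p, 50, 10)

# Empty plot with axes large enough for both curves
plot(p, d1, type = "n", ylim = c(0, max(d1, d2)), xlab = "pr", ylab = "Density")

# Both densities are 0 at p = 0 and p = 1, so polygon() closes each
# shape along the baseline without extra points.
polygon(p, d1, col = adjustcolor("#F1C40F", alpha.f = 0.7), border = "black")
polygon(p, d2, col = adjustcolor("#3498DB", alpha.f = 0.7), border = "black")
```

Further distributions can be added with more polygon() calls on the same grid.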

Putting x-axis labels directly under tick marks in barplots in R

I have a table (below) showing the percentage of tree species (categorical variable) present in a group experiment. My objective is to plot the percentage of tree species on the y-axis and 'Species' on the x-axis within a barplot.
Issue
I am having trouble formatting the x-axis correctly. I want the x-axis labels for 'Species' to be:
Positioned directly underneath their bar at the tick mark
Not overlapping the plotting area
If anyone can help solve this issue, I would be incredibly grateful.
R code
df <- leaf.percent[order(leaf.percent$Leaf.Percentge, decreasing = TRUE),]
Tree.labels<-c("Quercus robur", "Quercus petraea",
"Deciduous", "Oak",
"Plant", "Shrub")
par(mar=c(6, 6, 3, 3))
Tree<-barplot(df$Leaf.Percentge, names.arg = df$Species,
xaxt = "n",
ylab="Percentage %",
xlab="Tree Species",
col="lightblue",
ylim = c(0, 60))
axis(1, at=Tree, labels=FALSE)
text(seq(1, 6, by=1), par("usr")[3] - 0.2,
labels=unique(Tree.labels),
srt = 25, pos = 1,
xpd = TRUE, cex=0.7)
DATA
structure(list(Species = structure(1:6, .Label = c("Deciduous",
"Oak", "Plant", "Quercus_petraea", "Quercus_robur", "Shrub"), class = "factor"),
Frequency = c(48L, 29L, 6L, 70L, 206L, 4L), Leaf.Percentge = c(13.2231404958678,
7.98898071625344, 1.65289256198347, 19.2837465564738, 56.7493112947658,
1.10192837465565)), .Names = c("Species", "Frequency", "Leaf.Percentge"
), row.names = c(NA, -6L), class = "data.frame")
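One possible fix, sketched under the assumption that the labels should sit at the bar midpoints: barplot() returns those midpoints, so they can be passed to text() as the x positions in place of seq(1, 6, by = 1). Here leaf.percent is rebuilt from the DATA block with rounded values. The name mids, the vertical offset of 2 units and the margin sizes are illustrative guesses that may need tuning.

```r
# leaf.percent rebuilt from the DATA block (rounded percentages; for illustration only)
leaf.percent <- data.frame(
  Species = c("Deciduous", "Oak", "Plant", "Quercus_petraea", "Quercus_robur", "Shrub"),
  Leaf.Percentge = c(13.22, 7.99, 1.65, 19.28, 56.75, 1.10)
)
df <- leaf.percent[order(leaf.percent$Leaf.Percentge, decreasing = TRUE), ]

par(mar = c(7, 6, 3, 3))  # extra bottom margin for the rotated labels
mids <- barplot(df$Leaf.Percentge,
                xaxt = "n",
                ylab = "Percentage %",
                col = "lightblue",
                ylim = c(0, 60))

axis(1, at = mids, labels = FALSE)  # tick marks at the bar centres

# adj = 1 right-aligns each rotated label, so its end sits under the tick mark;
# xpd = TRUE allows drawing in the margin, below the plotting area.
text(x = mids, y = par("usr")[3] - 2,
     labels = gsub("_", " ", df$Species),
     srt = 25, adj = 1, xpd = TRUE, cex = 0.7)
title(xlab = "Tree Species", line = 5)
```

Taking the labels from df$Species after sorting keeps them in the same order as the bars, so a separate hand-typed label vector is not needed.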

How to properly index list items to return rows, not columns, inside a for loop

I'm trying to write a for loop within another for loop. The first loop grabs the ith vcov matrix from a list of variously sized matrices (vcmats below) and grabs a frame of 24 predictor models of appropriate dimension to multiply with the current vcov matrix from a list of frames (jacobians below) for the different models. The second loop should pull the jth record (row) from the selected predictor frame, correctly format it, then run the calculation with the vcov matrix and output an indicator variable and calculated result needed for post processing to the holding table (holdtab).
When I run the code below I get the following error: Error in jjacob[, 1:4] : incorrect number of dimensions because R is returning the column of 1s (i.e. the intercept column of jacobs), not the complete first record (i.e. jjacob = jacobs[1,]). I've substantially simplified the example but left enough complexity to demonstrate the problem. I would appreciate any help in resolving this issue.
vcmats <- list(structure(c(0.67553, -0.1932, -0.00878, -0.00295, -0.00262,
-0.00637, -0.1932, 0.19988, 0.00331, -0.00159, 0.00149, 2e-05,
-0.00878, 0.00331, 0.00047, -6e-05, 3e-05, 3e-05, -0.00295, -0.00159,
-6e-05, 0.00013, -2e-05, 6e-05, -0.00262, 0.00149, 3e-05, -2e-05,
2e-05, 0, -0.00637, 2e-05, 3e-05, 6e-05, 0, 0.00026), .Dim = c(6L,
6L)), structure(c(0.38399, -0.03572, -0.00543, -0.00453, -0.00634,
-0.03572, 0.10912, 0.00118, -0.00044, 0.00016, -0.00543, 0.00118,
0.00042, -3e-05, 4e-05, -0.00453, -0.00044, -3e-05, 0.00011,
5e-05, -0.00634, 0.00016, 4e-05, 5e-05, 0.00025), .Dim = c(5L,
5L)))
jacobians <- list(structure(list(intcpt = c(1, 1, 1, 1), species = c(1, 1,
0, 0), nage = c(6, 6, 6, 6), T = c(12, 50, 12, 50), hgt = c(90,
90, 90, 90), moon = c(7, 7, 7, 7), hXm = c(0, 0, 0, 0), covr = c(0,
0, 0, 0), het = c(0, 0, 0, 0)), .Names = c("intcpt", "species",
"nage", "T", "hgt", "moon", "hXm", "covr", "het"), row.names = c("1",
"1.4", "1.12", "1.16"), class = "data.frame"), structure(list(
intcpt = c(1, 1, 1, 1), species = c(1, 1, 0, 0), nage = c(6,
6, 6, 6), T = c(12, 50, 12, 50), hgt = c(0, 0, 0, 0), moon = c(7,
7, 7, 7), hXm = c(0, 0, 0, 0), covr = c(0, 0, 0, 0), het = c(0,
0, 0, 0)), .Names = c("intcpt", "species", "nage", "T", "hgt",
"moon", "hXm", "covr", "het"), row.names = c("2", "2.4", "2.12",
"2.16"), class = "data.frame"))
holdtab <- structure(list(model = structure(c(4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L), .Label = c("M.1.BaseCov", "M.2.Height", "M.5.Height.X.LastNewMoon",
"M.6.Height.plus.LastNew", "M.7.LastNewMoon", "M.G.Global"), class = "factor"),
aicc = c(341.317, 341.317, 341.317, 341.317, 342.1412, 342.1412,
342.1412, 342.1412), species = c(NA, NA, NA, NA, NA, NA,
NA, NA), condVar = c(NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("model",
"aicc", "species", "condVar"), row.names = c(1L, 2L, 3L, 4L,
25L, 26L, 27L, 28L), class = "data.frame")
jloop <- 1
for (imat in vcmats) { # Call the outside loop of vcov matrices
jacobs = jacobians[[jloop]] # Set tempvar jacobs as the jth member of the jacobians frame (n/24)
for (jjacob in jacobs) { # Call inside loop of lines in jacob (each individual set of predictor levels)
# I need to reduce the vector length to match my vcov matrix so
pt1 = jjacob[,1:4] # Separate Core columns from variable columns (because I don't want to drop species when ==0)
pt2 = jjacob[,5:9] # Pull out variable columns for next step
pt2 = pt2[,!apply(pt2 == 0, 2, all)] # Drop any variable columns that ==0
jjacob = cbind(pt1, pt2) # Reconstruct the record now of correct dimensions for the relevant vcov matrix
jjacob = as.matrix(jjacob) # Explicitly convert jjmod - I was having trouble with this previously
tj = (t(jjacob)) # Transpose the vector
condvar = jjacob %*% imat %*% tj # run the calculation
condVarTab[record,3] = jjacob[2] # Write species 0 or 1 to the output table
condVarTab[record,4] = condvar # Write the conditional variance to the table
record = record+1 # Iterate the record number for the next output run
}
jloop = jloop+1 # Once all 24 models in a frame are calculated iterate to the next frame of models which will be associated with a new vcv matrix
}
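A data frame is a list of columns, so for (jjacob in jacobs) hands the loop one column at a time, which would explain the "incorrect number of dimensions" error. A minimal sketch of one way around it is to loop over row indices and extract each row with drop = FALSE so it stays a 1 x p matrix. The data below are small hypothetical stand-ins (two core columns instead of four), and n_core, var_cols, keep_var and X are illustrative names. The all-zero variable columns are dropped once per frame instead of once per row.

```r
# Hypothetical, smaller stand-ins for vcmats and jacobians
vcmats <- list(diag(4) * 0.1, diag(3) * 0.2)
jacobians <- list(
  data.frame(intcpt = c(1, 1), species = c(1, 0), hgt = c(90, 90), moon = c(7, 7)),
  data.frame(intcpt = c(1, 1), species = c(1, 0), hgt = c(0, 0),   moon = c(7, 7))
)
n_core <- 2  # leading columns that are always kept

holdtab <- data.frame(species = rep(NA_real_, 4), condVar = rep(NA_real_, 4))
record <- 1

for (i in seq_along(vcmats)) {
  imat   <- vcmats[[i]]
  jacobs <- jacobians[[i]]

  # Drop variable columns that are zero in every row, once per frame
  var_cols <- seq(n_core + 1, ncol(jacobs))
  keep_var <- var_cols[colSums(jacobs[, var_cols, drop = FALSE] != 0) > 0]
  X <- as.matrix(jacobs[, c(seq_len(n_core), keep_var), drop = FALSE])

  for (j in seq_len(nrow(X))) {        # loop over rows, not columns
    jjacob  <- X[j, , drop = FALSE]    # drop = FALSE keeps a 1 x p matrix
    condvar <- jjacob %*% imat %*% t(jjacob)
    holdtab$species[record] <- jjacob[1, "species"]
    holdtab$condVar[record] <- condvar[1, 1]
    record <- record + 1
  }
}
```

Indexing with seq_along() also removes the need for a separate jloop counter, because the same i selects both the vcov matrix and its matching frame.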

How to change the color of dendrogram for each group in a cluster

Here is my data
df<- structure(list(name = structure(c(2L, 12L, 1L, 16L, 14L, 10L,
9L, 5L, 15L, 4L, 8L, 13L, 7L, 6L, 3L, 11L), .Label = c("All",
"Bab", "boro", "bra", "charli", "delta", "few", "hora", "Howe",
"ist", "kind", "Kiss", "myr", "No", "TT", "where"), class = "factor"),
value = c(1.251, -1.018, -1.074, -1.137, 1.018, 1.293, 1.022,
-1.008, 1.022, 1.252, -1.005, 1.694, -1.068, 1.396, 1.646,
1.016)), .Names = c("name", "value"), class = "data.frame", row.names = c(NA,
-16L))
Here is what I do:
d <- dist(as.matrix(df$value),method = "euclidean")
#compute cluster membership
hcn <- hclust(d,method = "ward.D2")
plot(hcn)
and it gives me the following plot.
Here all groups are drawn in black and the dendrogram is not very clear. What I want is to give each group its own color, to show the names vertically instead of the numbers, and finally to remove the hclust (*, "ward.D2") subtitle while setting the x and y labels as I want.
You could use the dendextend package, aimed for tasks such as this:
# install the package:
if (!require('dendextend')) install.packages('dendextend'); library('dendextend')
## Example:
dend <- as.dendrogram(hclust(dist(USArrests), "ave"))
d1=color_branches(dend,k=5, col = c(3,1,1,4,1))
plot(d1) # selective coloring of branches :)
d2=color_branches(d1,k=5) # auto-coloring 5 clusters of branches.
plot(d2)
# More examples are in ?color_branches
You can see many examples in the presentations and vignettes of the package, in the "usage" section in the following URL: https://github.com/talgalili/dendextend
Alternatively, you can use dendrapply.
For instance:
# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)
# Create dendrogram
d <- dist(data)
hc <- as.dendrogram(hclust(d))
# Function to color branches
colbranches <- function(n, col)
{
a <- attributes(n) # Find the attributes of current node
# Color edges with requested color
attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
n # Don't forget to return the node!
}
# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")
# Plot
plot(hc)
I got this information from: How to create a dendrogram with colored branches?
We could instead draw rectangles around the groups. Let's say there are 5 groups (k = 5):
# plot dendrogram
plot(hcn)
# then draw red borders around the 5 clusters
rect.hclust(hcn, k = 5, border = "red")
EDIT:
Remove x axis label, and add names instead of numbers:
plot(hcn, xlab = NA, sub = NA, labels = df$name)
rect.hclust(hcn, k = 5, border = "red")
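A base-R sketch that applies the dendrapply idea to the data from the question is given below. It colours each leaf label and its terminal edge by the group that cutree() assigns, and sets the title and axis labels directly in plot(). The number of groups (k = 3), the colours and the label texts are assumptions chosen for illustration, and df is retyped from the dput output.

```r
# df retyped from the question's dput output
df <- data.frame(
  name  = c("Bab", "Kiss", "All", "where", "No", "ist", "Howe", "charli",
            "TT", "bra", "hora", "myr", "few", "delta", "boro", "kind"),
  value = c(1.251, -1.018, -1.074, -1.137, 1.018, 1.293, 1.022, -1.008,
            1.022, 1.252, -1.005, 1.694, -1.068, 1.396, 1.646, 1.016)
)

hcn <- hclust(dist(as.matrix(df$value), method = "euclidean"), method = "ward.D2")
hcn$labels <- df$name          # show names instead of row numbers

k <- 3                         # number of groups: an assumption
groups <- cutree(hcn, k = k)   # named vector: label -> group id
group_cols <- c("red", "blue", "darkgreen")

# Colour each leaf's label and terminal edge by its group
colour_leaf <- function(n) {
  if (is.leaf(n)) {
    g <- groups[[attr(n, "label")]]
    attr(n, "nodePar") <- list(lab.col = group_cols[g], pch = NA)
    attr(n, "edgePar") <- list(col = group_cols[g])
  }
  n                            # always return the node
}
dend <- dendrapply(as.dendrogram(hcn), colour_leaf)

# plot() on a dendrogram draws leaf labels vertically and adds no hclust subtitle
plot(dend, main = "Clusters", xlab = "Name", ylab = "Distance")
```

This colours only the leaves. Colouring whole branches takes either the recursive approach shown above or a package such as dendextend.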

Adding and specifying a legend for multiple chart.RollingCorrelation plots

For simplicity, suppose I have the following zoo object:
x.ts<- structure(c(103.7, 103.2, 103.1, 105.4, 102.1, 103.5, 103.1,
102.6, 102.2, 104.6, -2.1, -1, -3, 2, -1, 1, -1, -1, -1, 0, -25,
-25, -25, -25, -25, -25, -21, -21, -20, -20), .Dim = c(10L, 3L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), c("a", "b", "c")), index = structure(c(1985, 1985.08333333333,
1985.16666666667, 1985.25, 1985.33333333333, 1985.41666666667,
1985.5, 1985.58333333333, 1985.66666666667, 1985.75), class = "yearmon"), class = "zoo")
I am interested in plotting the rolling correlation between each pair of time series in the object. I used chart.RollingCorrelation from the PerformanceAnalytics package and plotted the charts as follows:
par.corr<-par(mfrow=c(1,2), oma = c(1, 1, 1, 1), mar = c(2, 2, 2, 2))
chart.RollingCorrelation(x.ts[, 1, drop=FALSE],
x.ts[, 1:3, drop=FALSE],
colorset=rich8equal,legend.loc = "right",
width=3, main = "a to b and c")
chart.RollingCorrelation(x.ts[, 2, drop=FALSE],
x.ts[, c(1,3), drop=FALSE],
colorset=rich8equal,legend.loc = "right",
width=3, main = "b to a and c")
mtext("Rolling 3 Month Correlation", side=3, line=-0.3, outer=TRUE, cex=1.2)
par(par.corr)
I get the following plot:
I need to get a common legend for both charts with one colour describing each relationship and place it to the bottom of the plot. I have tried to remove the legend specification from the chart.RollingCorrelation's arguments and add another plot with my customised legend but there is an issue with that. Because every chart in the plot has been plotted separately, you will find that two different relationships are represented by the same colour. So it seems I need to change something in the way I apply the function.
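One possible approach, sketched with base graphics in place of chart.RollingCorrelation: fix one colour per unordered pair of series up front, look that colour up in every panel, and reserve a third layout() cell at the bottom for a single shared legend. The helpers roll_cor, pair_key and plot_panel are hypothetical names introduced here, the colours are placeholders, and the series are copied from x.ts into a plain matrix with the zoo index dropped.

```r
# Series a, b, c copied from the question's x.ts as a plain matrix
x <- cbind(
  a = c(103.7, 103.2, 103.1, 105.4, 102.1, 103.5, 103.1, 102.6, 102.2, 104.6),
  b = c(-2.1, -1, -3, 2, -1, 1, -1, -1, -1, 0),
  c = c(-25, -25, -25, -25, -25, -25, -21, -21, -20, -20)
)

# Rolling correlation over a trailing window; NA until the window is full
roll_cor <- function(u, v, width) {
  out <- rep(NA_real_, length(u))
  for (i in width:length(u)) {
    idx <- (i - width + 1):i
    # cor() warns and returns NA when a window is constant; keep the NA, drop the warning
    out[i] <- suppressWarnings(cor(u[idx], v[idx]))
  }
  out
}

# One fixed colour per unordered pair, shared by every panel
pair_cols <- c("a-b" = "red", "a-c" = "blue", "b-c" = "darkgreen")
pair_key  <- function(p, q) paste(sort(c(p, q)), collapse = "-")

plot_panel <- function(base, others, main) {
  plot(c(1, nrow(x)), c(-1, 1), type = "n", main = main, xlab = "", ylab = "")
  for (o in others) {
    lines(roll_cor(x[, base], x[, o], width = 3),
          col = pair_cols[[pair_key(base, o)]])
  }
}

# Two chart panels on top, one full-width legend panel underneath
layout(matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE), heights = c(4, 1))
par(mar = c(2, 2, 2, 2))
plot_panel("a", c("b", "c"), "a to b and c")
plot_panel("b", c("a", "c"), "b to a and c")

par(mar = c(0, 0, 0, 0))
plot.new()
legend("center", legend = names(pair_cols), col = pair_cols,
       lty = 1, horiz = TRUE, bty = "n")
layout(1)  # reset the layout
```

Because the colour is looked up by pair name, a to b and b to a get the same colour in both panels. The same named vector could be used to build a per-panel colour set for chart.RollingCorrelation, but that route is not tested here.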
