I need to add the result of an operation (a division) in the top right corner of every facet: the value of z divided by the value of x, for each of A, B and C.
For A the result is 0.17 (400/2300), for B it is 0.136 (30/220) and for C it is 0.105 (2/19).
I was going to use annotate(), but I read that geom_text() is the better choice when using facets.
library(ggplot2)

dt <- data.frame(va = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 vb = rep(letters[24:26], 3),
                 value = c(23*100, 13*100, 4*100, 22*10, 12*10, 3*10, 19, 8, 2))
ggplot(data = dt, aes(x = vb, y = value)) +
  geom_col(position = "dodge") +   # geom_col() already uses stat = "identity"
  facet_wrap(~va, scales = "free_y") +
  geom_text(aes(label = value))
You can make a new data frame in which you calculate the result you want to display, as well as its x and y position. Then pass that data frame to the geom_text() layer through its data argument.
library(dplyr)
library(tidyr)

# One row per facet (va); columns x, y and z hold the three bar heights
dt_calc <- dt %>%
  pivot_wider(id_cols = va, names_from = vb, values_from = value) %>%
  mutate(result = z / x, xpos = "z", ypos = pmax(x, y, z))
ggplot(data = dt, aes(x = vb, y = value)) +
  geom_col(position = "dodge") +
  facet_wrap(~va, scales = "free_y") +
  geom_text(aes(x = xpos, y = ypos, label = round(result, 2)), data = dt_calc)
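If you'd rather not depend on tidyr, the per-facet table can also be built in base R. This is a minimal sketch that assumes, as in the question, that each facet has exactly one row for each of x, y and z. The names `per_facet` and `calc_base` are illustrative only.

```r
# Same data as in the question
dt <- data.frame(va = rep(c("A", "B", "C"), each = 3),
                 vb = rep(c("x", "y", "z"), 3),
                 value = c(2300, 1300, 400, 220, 120, 30, 19, 8, 2))

# Split by facet and name each bar height by its bar (x, y, z)
per_facet <- lapply(split(dt, dt$va), function(d) setNames(d$value, d$vb))

# One row per facet: the z/x ratio, plus a label position on the rightmost
# bar (z) at the height of the facet's tallest bar
calc_base <- data.frame(
  va     = names(per_facet),
  result = vapply(per_facet, function(v) v[["z"]] / v[["x"]], numeric(1)),
  xpos   = "z",
  ypos   = vapply(per_facet, max, numeric(1))
)
calc_base
```

`calc_base` has the same va, result, xpos and ypos columns as dt_calc, so it could be passed to geom_text() through `data =` in its place.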
Output:
Suppose I have this data:
xy <- data.frame(x = c(1,2,3,4,5,2,3,4), y = c(rep('A',5), rep('B',3)))
So, when I type
ggplot(xy, aes(x = x, fill = y)) +
geom_histogram(aes(y=..count../sum(..count..)), position = "dodge")
I get this graphic:
But I want each group scaled independently, i.e., the red bars (A) at 0.2 and the blue bars (B) at 0.333. How can I achieve that?
Also, how can I make the y-axis show percentages instead of decimals?
Many thanks in advance.
This seems to do the job. It uses the density statistic rather than the count, a rather ugly way of counting the levels of the A/B factor column (length(unique(xy$y))), and the scales package to format the y-axis labels as percentages.
ggplot(xy, aes(x = x, fill = y)) +
  geom_histogram(aes(y = after_stat(density / sum(density) * length(unique(xy$y))), group = y),
                 position = "dodge") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1))
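To see why dividing the density by its total and multiplying by the number of groups gives per-group proportions, here is a base-R check using hist(). The unit-width bin edges are an assumption chosen to suit this integer data; ggplot2 picks its own bins, but the identity holds for any common bin width.

```r
xy <- data.frame(x = c(1, 2, 3, 4, 5, 2, 3, 4),
                 y = c(rep("A", 5), rep("B", 3)))

brks <- seq(0.5, 5.5, by = 1)   # assumed unit-width bins centred on 1..5

# Per-group densities and counts over the same bins (A first, then B)
h      <- lapply(split(xy$x, xy$y), hist, breaks = brks, plot = FALSE)
dens   <- unlist(lapply(h, `[[`, "density"))
counts <- unlist(lapply(h, `[[`, "counts"))
n      <- rep(lengths(split(xy$x, xy$y)), each = length(brks) - 1)

# Each group's densities sum to 1 / binwidth, so the grand total is k / binwidth.
# Dividing by that total and multiplying by k leaves density * binwidth = count / n.
k <- length(unique(xy$y))
rel_from_density <- dens / sum(dens) * k
rel_from_counts  <- counts / n

all.equal(unname(rel_from_density), unname(rel_from_counts))
```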
As an alternative to calculating everything inside ggplot, you can first compute the relative frequencies and then plot them with geom_col(). position_dodge2(preserve = "single") keeps all bars the same width:
library(ggplot2)
library(dplyr)
xy <- data.frame(x = c(1,2,3,4,5,2,3,4),
                 y = c(rep('A',5),rep('B',3)))
# Count each x value within each group, then divide by the group's total count
xy <- xy %>%
  group_by(y, x) %>%
  summarise(n_obs = n(), .groups = "drop_last") %>%   # result is still grouped by y
  mutate(rel_freq = n_obs / sum(n_obs))
ggplot(xy, aes(x = x, y = rel_freq, fill = y)) +
  geom_col(position = position_dodge2(preserve = "single")) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1))
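To check the numbers before plotting, or to avoid dplyr altogether, base R's table() and prop.table() give the same per-group relative frequencies. This is a sketch; `rel` and `rel_df` are illustrative names.

```r
xy <- data.frame(x = c(1, 2, 3, 4, 5, 2, 3, 4),
                 y = c(rep("A", 5), rep("B", 3)))

# Rows = groups (y), columns = x values; margin = 1 divides each row by its row total
rel <- prop.table(table(xy$y, xy$x), margin = 1)
rel

# Long format with one row per (y, x) pair, in the shape the geom_col() version uses
rel_df <- as.data.frame(rel, responseName = "rel_freq")
names(rel_df)[1:2] <- c("y", "x")
rel_df$x <- as.numeric(as.character(rel_df$x))
rel_df <- rel_df[rel_df$rel_freq > 0, ]   # drop (y, x) pairs that never occur
rel_df
```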
Here is some code which reproduces my issue:
x <- as.factor(1:20)
y <- 1:20
id <- as.factor(c(rep(0,19),1))
g1 <- ggplot() + geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) + ggtitle("g1")
g1 # First print
y <- 20:1
g2 <- ggplot() + geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) + ggtitle("g2")
g2
g1 # Second print
As you can see when running the code above, the first time you print g1 you get a barplot rising from (factor 1, y = 1) to (factor 20, y = 20).
After creating g2, if you print g1 again, it looks the same as g2, except for the title, which isn't modified.
I'm really puzzled; any help would be much appreciated!
ggplot works best when you pull data from a data.frame rather than the global environment. If you did
x <- as.factor(1:20)
y <- 1:20
id <- as.factor(c(rep(0,19),1))
g1 <- ggplot(data.frame(x, y, id)) +
geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) +
ggtitle("g1")
y <- 20:1
g2 <- ggplot(data.frame(x, y, id)) +
geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) +
ggtitle("g2")
everything would work fine.
The "problem" is that ggplot doesn't actually build the plot until you print it. When you refer to variables by name in aes(), it records the name, not the value, so it uses whatever value the variable has at the moment the plot is printed. Putting the data inside a data.frame captures the variables' current values, so those are what get used later.
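The same effect can be reproduced without ggplot2. In this base-R analogy, quote() plays the role of aes(): it records the name y and looks it up only when evaluated. data.frame(), by contrast, copies the current values. This is only an analogy; it is not ggplot2's actual internals.

```r
y <- 1:3

by_name  <- quote(sum(y))       # like aes(y = y): remembers the *name* y
by_value <- data.frame(y = y)   # like ggplot(data.frame(...)): copies the values now

y <- 10:12                      # reassign y, as the question does before building g2

eval(by_name)     # uses the current y: 10 + 11 + 12
sum(by_value$y)   # still the values captured earlier: 1 + 2 + 3
```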
I am trying to generate a ternary plot using ggtern.
My data ranges from 0 to 1000 for the x, y, and z variables. I wondered whether it is possible to extend the axis length above 100 to represent my data.
@Nevrome is on the right path: your points will still be plotted as 'compositions', i.e., the concentrations sum to unity, but you can change the axis labels to indicate a range from 0 to 1000.
library(ggtern)
set.seed(1)
df = data.frame(x = runif(10)*1000,
y = runif(10)*1000,
z = runif(10)*1000)
breaks = seq(0,1,by=0.2)
ggtern(data = df, aes(x, y, z)) +
geom_point() +
limit_tern(breaks=breaks,labels=1000*breaks)
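The reason only the labels can change is that a ternary plot displays each row as a composition, i.e. divided by its row total. The base-R sketch below uses a hypothetical helper, `to_composition`, which is not a ggtern function. It shows that rescaling the raw data, for example dividing every value by 10, leaves the plotted positions unchanged.

```r
# Hypothetical helper: close each row so it sums to 1 (what a ternary plot displays)
to_composition <- function(m) m / rowSums(m)

raw <- rbind(c(200, 300, 500),
             c(100, 100, 800))
colnames(raw) <- c("x", "y", "z")

comp <- to_composition(raw)
comp   # rows (0.2, 0.3, 0.5) and (0.1, 0.1, 0.8)

# Dividing by 10 (or any positive constant) yields exactly the same composition
all.equal(comp, to_composition(raw / 10))

# A break at 0.2 therefore means 20% of the row total; relabelling it "200"
# matches raw units only when every row totals 1000
```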
I don't think there is a direct way to do this with ggtern, but an easy workaround could look like this:
library(ggtern)
df = data.frame(x = runif(50)*1000,
y = runif(50)*1000,
z = runif(50)*1000,
Group = as.factor(round(runif(50,1,2))))
ggtern() +
geom_point(data = df, aes(x/10, y/10, z/10, color = Group)) +
labs(x="X", y="Y", z="Z", title="Title") +
scale_T_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_L_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_R_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2))
Oh wise ones: I've got a question about the use of geom_linerange(). Below is what I hope is a workable example illustrating my problem.
library(ggplot2)
library(reshape2)

b <- c(100,110,90,100,120,130,170,150,150,120,140,150,120,90,90,100,40,50,40,40,20,60,30)
test <- data.frame(a = c(2,2,2,4,4,4,4,6,6,6,6,6,6,8,8,8,10,10,10,10,10,10,10),
                   b = b, c = b - 15)
# Long format: one row per (a, variable, value), with variable = "b" or "c"
testMelt <- melt(test, id.vars = "a", measure.vars = c("b", "c"))
p <- ggplot(testMelt, aes(x = factor(a), y = value, fill = variable)) +
  geom_boxplot() +
  stat_smooth(aes(group = variable, x = factor(a), y = value, fill = factor(variable)))
My actual dataset is much larger, and the boxplots are a bit overwhelming. I think what I want is to use geom_linerange() somehow to show the range of the data, at "b" and "c", at each value of "a".
The best I've come up with is:
p<- p+ geom_linerange(aes(as.factor(a),ymin=min(value),ymax=value,color=variable))
I can assume the "c" values are always equal to or less than "b", but when one range is smaller, the other line "covers it up". Can I jitter the lines somehow? Is there a better solution?
In your geom_linerange call, add an additional argument position=position_dodge(width=0.3). You can adjust the absolute width to change the separation between the vertical lines.
My understanding of the question is that you want each line range to reflect the range of values for each combination of a and variable (b or c).
geom_linerange(aes(as.factor(a),ymin=min(value),ymax=value,color=variable)) sets the minimum to the minimum of the whole dataset (hence all the lines appear with the same minimum value).
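The difference is easy to see numerically. The base-R sketch below uses the question's test data, stacked by hand with rep() so it runs without reshape2; `long` and `ranges` are illustrative names. min(value) is a single number for the whole dataset, while aggregate() gives one minimum and one maximum per a/variable combination, which is what the line ranges should span.

```r
b <- c(100,110,90,100,120,130,170,150,150,120,140,150,120,90,90,100,40,50,40,40,20,60,30)
a <- c(2,2,2,4,4,4,4,6,6,6,6,6,6,8,8,8,10,10,10,10,10,10,10)

# Long format equivalent to melt(test, id.vars = "a"): one row per (a, variable, value)
long <- data.frame(a = rep(a, 2),
                   variable = rep(c("b", "c"), each = length(a)),
                   value = c(b, b - 15))

min(long$value)   # one global floor shared by every line range

# Per-group minimum and maximum, the same columns the ddply() call produces
ranges <- aggregate(value ~ a + variable, data = long, FUN = range)
ranges <- data.frame(ranges[c("a", "variable")],
                     val_min = ranges$value[, 1], val_max = ranges$value[, 2])
ranges
```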
A couple of solutions.
Calculate the minima and maxima yourself
library(plyr)
test_range <- ddply(testMelt, .(a, variable), summarize,
                    val_min = min(value), val_max = max(value))
then run
ggplot(data = testMelt) +
geom_boxplot(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable))) +
geom_linerange(data = test_range, aes(x = as.factor(a), ymin = val_min,
ymax = val_max, color = variable),
position = position_dodge(width = 0.3))
Or, as an alternative to boxplots and line ranges, use a violin plot:
ggplot(data = testMelt) +
geom_violin(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable)))
I'm using ggplot2 to show lines and points on a plot. What I am trying to do is to have the lines all the same color, and then to show the points colored by an attribute. My code is as follows:
# Data frame
dfDemo <- structure(list(Y = c(0.906231077471568, 0.569073561538186,
0.0783433165521566, 0.724580209473378, 0.359136092118470, 0.871301974471722,
0.400628333618918, 1.41778205350433, 0.932081770977729, 0.198188442350644
), X = c(0.208755495088456, 0.147750173706688, 0.0205864576474412,
0.162635017485883, 0.118877260137735, 0.186538613831806, 0.137831912094464,
0.293293029083812, 0.219247919537514, 0.0323148791663826), Z = c(11112951L,
11713300L, 14331476L, 11539301L, 12233602L, 15764099L, 10191778L,
12070774L, 11836422L, 15148685L)), .Names = c("Y", "X", "Z"
), row.names = c(NA, 10L), class = "data.frame")
# Variables
X = array(0.1,100)
Y = seq(length=100, from=0, by=0.01)
# make data frame
dfAll <- data.frame()
# make data frames using loop
for (x in c(1:10)){
# spacemate calc
Floors = array(x,100)
# include label
Label = paste(' ', toString(x), sep="")
df1 <- data.frame(X = X * x, Y = Y, Label)
# merge df1 to cumulative df, dfAll
dfAll <- rbind(dfAll, df1)
}
# plot
pl <- ggplot(dfAll, aes(x = X, y = Y, group = Label, colour = 'Measures')) + geom_line()
# add points to plot
pl + geom_point(data = dfDemo, aes(x = X, y = Y)) + theme(legend.position = "none")
This almost works, but I am unable to color the points by Z when I do this. I can plot the points separately, colored by Z using the following code:
ggplot(dfDemo, aes(x = X, y = Y, colour = Z)) + geom_point()
However, if I use the similar code after plotting the lines:
pl + geom_point(data = dfDemo, aes(x = X, y = Y, colour = Z)) + theme(legend.position = "none")
I get the following error:
Error: Continuous variable () supplied to discrete scale_hue.
I don't understand how to add the points to the chart so that I can colour them by a value. I'd appreciate any suggestions on how to solve this.
The issue is that two colour mappings collide. The ggplot() call maps colour to the constant 'Measures', which creates a discrete colour scale, and geom_point() then maps colour to the continuous Z; a plot can only have one colour scale. If you want the lines in one colour and the points in different colours, remove the colour mapping from the ggplot() call and set the colour in geom_line() outside aes(), so it is a fixed setting rather than a mapping:
pl <- ggplot(dfAll, aes(x = X, y = Y, group = Label)) +
  geom_line(colour = "red")   # set outside aes(): no colour scale is created for the lines
pl + geom_point(data = dfDemo, aes(x = X, y = Y, colour = Z)) +
  theme(legend.position = "none")
HTH