Increasing size of circles in ggplot2 graphs [duplicate] - r

I want to increase the scale of circles in ggplot2. I tried something like this aes(size=100*n) but it did not work for me.
df <-
structure(list(Logit = c(-2.9842723737754, 1.49511606166294,
-2.41756623714116, -2.96160412831003, -2.12996384688938, -1.61751836789074,
-0.454353048358851, 0.9284099250287, -0.144082412641708, -2.30422500981431,
-0.658367257547178, 0.082600042011989, -0.318343575566633, -0.717447827238429,
-1.0508122312565, -2.82559465551781, 0.361703788394458, -1.85086010050691,
-0.0916611209129359, -0.740116072703798, 0.0599317965466193,
-0.370764867295404, -0.703703748477917, -0.749040239408657, -2.7575899191217,
-2.51532401980067, 1.38177483433609, 1.47244781619757, -0.205002348239784,
0.135021333740761), PRes = c(-0.661648371860934, 1.63444424896772,
-0.30348016008728, -0.230651042355737, 1.07487559116003, -0.460143991337599,
-0.823052248365889, -0.999903730870253, -0.959022180953211, -0.321344960297977,
-1.40881799070885, -0.674754839222841, 0.239931843185434, -1.81660411888874,
0.830318780187542, -0.24702802619469, 0.692695708496924, -0.40412065378683,
-0.977640032689132, -0.715192962242284, -1.06270128658429, -0.856103053117159,
-0.731162073769824, 1.51334938767359, 4.02946801536109, 3.56902361409375,
0.505952430753934, 0.483660641952208, 1.13712619443209, 0.951889504154342
), n = c(7L, 38L, 1L, 1L, 11L, 1L, 1L, 4L, 1L, 1L, 3L, 9L, 2L,
8L, 2L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L)), .Names = c("Logit", "PRes", "n"), row.names = c(NA, -30L
), class = "data.frame")
library(ggplot2)
ggplot(data=df, mapping=aes(x=Logit, y=PRes, label=rownames(df))) +
geom_point(aes(size=n), shape=1, color="black") +
geom_text() +
theme_bw() +
theme(legend.position="none")

Simply add a scale for size:
+ scale_size_continuous(range = c(10, 15))
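To put that one-liner in context: multiplying inside aes() (size=100*n) has no visible effect because, as far as I understand it, the size scale rescales whatever values it receives onto its output range, so only the range controls how big the circles are drawn. Below is a minimal sketch on a small made-up data frame (toy is hypothetical; only the column names are taken from the question):

```r
library(ggplot2)

# Hypothetical toy data; column names mirror the question's df.
toy <- data.frame(
  Logit = c(-2.9, 1.5, -2.1, 0.9),
  PRes  = c(-0.7, 1.6, 1.1, -1.0),
  n     = c(7L, 38L, 11L, 4L)
)

p <- ggplot(toy, aes(x = Logit, y = PRes, label = rownames(toy))) +
  geom_point(aes(size = n), shape = 1, color = "black") +
  geom_text() +
  # range = c(size for the smallest n, size for the largest n)
  scale_size_continuous(range = c(10, 15)) +
  theme_bw() +
  theme(legend.position = "none")

# Inspect the sizes ggplot2 actually computed for the point layer.
sizes <- layer_data(p, 1)$size
```

Widening or narrowing the range (for example c(2, 20)) changes how strongly differences in n show up in the plot.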


Labels on wrong dodged columns in geom_col

I want to create a simple barplot of my data frame:
> dput(corr)
structure(list(`sample length` = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("3s", "10s"), class = "factor"),
feature = structure(c(1L, 1L, 5L, 5L, 2L, 5L, 6L, 5L, 5L,
4L, 1L, 1L, 1L, 1L, 1L, 2L, 5L, 5L, 3L, 4L, 1L, 1L, 1L, 1L
), .Label = c("f0", "f1", "f2", "f3", "f2 prime", "f2-f1"
), class = "factor"), measure = c("meanf0 longterm", "meanf0 longterm st",
"f2' Fant", "f2' Carlson", "F1meanERB", "F2meanERB", "f2-f1 ERB",
"f2' Fant", "f2' Carlson", "F3meanERB", "meanf0 3secs", "meanf0 3secs st",
"meanf0 10secs", "meanf0 longterm", "meanf0 longterm st",
"F1meanERB", "f2' Fant", "f2' Carlson", "F2meanERB", "F3meanERB",
"meanf0 longterm", "meanf0 longterm st", "meanf0 3secs",
"meanf0 3s st"), score = c(0.574361009949897, 0.592472685498182,
0.597453479834514, 0.529641256460457, 0.585994252821649,
0.618734735308094, 0.517715270144259, 0.523916918327387,
0.616237363007349, 0.732926257362305, 0.649505366093518,
0.626628120773466, 0.522527636952945, 0.53968850323167, 0.548664887822775,
0.648294358978928, 0.650806695307235, 0.696797693503567,
0.621298393945597, 0.57140950987443, 0.606634531002859, 0.597064217305556,
0.582534743353082, 0.572808145210493), dimension = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3",
"4"), class = "factor")), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))
I have tried the following code:
ggplot(data=corr, aes(x=factor(dimension), y=score)) +
geom_col(aes(fill=feature),position=position_dodge2(width=1,preserve='single')) +
facet_grid(~`sample length`, scales='free_x',space='free_x') +
labs(x="Dimension", y="Correlation Coefficient (Abs. value)") +
geom_text(aes(label=measure),position=position_dodge2(width=0.9, preserve='single'), angle=90,
size=4,hjust=2.5,color='white')
Giving the following barplot:
However, the labels for 'measure' are being incorrectly assigned to the columns. E.g. for 3s facet plot, under 'dimension 2', the two light blue bars should be labelled as 'f2' Carlson' and 'f2' Fant' but they have been swapped with the other two labels.
I think the levels must be wrong, but I don't understand how!
Any help much appreciated, ta
The problem of switching labels comes from geom_text() not knowing how the information should be split for the purposes of dodging. The solution is to supply a group= aesthetic to geom_text() that matches the fill= aesthetic specified for geom_col().
In the case of geom_col(), you specify aes(fill=feature). The height of the different columns is therefore grouped automatically by corr$feature. You can supply a group= aesthetic as well, but it's unnecessary and the dodging will happen as you expect.
In the case of geom_text(), there is no obvious way to group the data. When you do not specify a group= aesthetic, ggplot2 derives the grouping from the discrete variables mapped in that layer, and nothing in the geom_text() layer is mapped to feature, so the labels are not dodged in the same order as the columns. For dodging to work here, you need to specify how the label information is grouped. If you don't have a specific legend-associated aesthetic to choose here, you can use the group= aesthetic to specify group=feature. This lets ggplot2 know that the text labels should be sorted and dodged by grouping according to this column in the data:
ggplot(data=corr, aes(x=factor(dimension), y=score)) +
geom_col(aes(fill=feature),position=position_dodge2(width=1,preserve='single')) +
facet_grid(~`sample length`, scales='free_x',space='free_x') +
labs(x="Dimension", y="Correlation Coefficient (Abs. value)") +
geom_text(aes(label=measure, group=feature),position=position_dodge2(width=0.9, preserve='single'), angle=90,
size=4,hjust=2.5,color='white')
As a side note, you don't have to specify the group= aesthetic if you assign a color-based aesthetic (or one that would result in a legend). If we set color=feature with geom_text(), it works without group=. To see the labels, you need to set the alpha for the columns a bit lower, but this should illustrate the point well:
ggplot(data=corr, aes(x=factor(dimension), y=score)) +
geom_col(aes(fill=feature),position=position_dodge2(width=1,preserve='single'), alpha=0.2) +
facet_grid(~`sample length`, scales='free_x',space='free_x') +
labs(x="Dimension", y="Correlation Coefficient (Abs. value)") +
geom_text(aes(label=measure, color=feature),position=position_dodge2(width=0.9, preserve='single'), angle=90,
size=4,hjust=2.5)

Have two colour scales ggplot [duplicate]

I am trying to have separate colors for my lines and points. My data is split by Arm, so at each time-point there should be two dots, with two lines connecting them to the previous and next time-points.
I can get both the line and dot colors to change together, but I would like the lines to be a different colour, still based on Arm. That is, I want the lines to be light blue for Arm=1 and yellow for Arm=2, but the dots to stay the color shown below. Is this possible with ggplot?
Any help would be much appreciated.
What I have:
Code:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm))) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
Data:
TOT <- structure(list(Arm = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
VisitNo = structure(c(0L, 6L, 12L, 16L, 24L, 36L, 0L, 6L, 12L, 16L, 24L, 36L),
label = "VisitNo", class = c("labelled", "integer")),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("PWB", "SWB", "EWB", "FWB", "AC"), class = "factor"),
Mean = c(25.3025326086957, 25.4365119047619, 25.8333333333333, 21.3452380952381,
26, 26.8235294117647, 25.2272727272727, 25.6172839506173,
25.6805555555556, 21.625976744186, 26.24, 26)),
row.names = c(NA, 12L), class = "data.frame")
If you just want the lines to be a bit lighter than the points, you can use alpha to make the lines a bit transparent:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm)), alpha = 0.4) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
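If the lines really need their own palette rather than a lighter version of the same one, a commonly used approach is to give the points a fillable shape and map them to fill, leaving colour for the lines, so each geom gets its own scale and legend. The following is only a sketch on made-up data (toy is hypothetical and uses a plain integer VisitNo rather than the labelled column from the question):

```r
library(ggplot2)

# Hypothetical toy data shaped like the question's TOT.
toy <- data.frame(
  Arm     = rep(c(1L, 2L), each = 3),
  VisitNo = rep(c(0L, 6L, 12L), times = 2),
  Mean    = c(25.3, 25.4, 25.8, 25.2, 25.6, 25.7)
)

p <- ggplot(toy, aes(x = VisitNo, y = Mean)) +
  # Lines are driven by the colour scale.
  # (ggplot2 3.4.0 and later prefer linewidth= over size= for lines.)
  geom_line(aes(colour = factor(Arm)), size = 1.5) +
  # Points are driven by the fill scale; shape 21 is a fillable circle.
  geom_point(aes(fill = factor(Arm)), shape = 21, size = 3,
             colour = "transparent") +
  scale_colour_manual(name = "Arm (lines)", values = c("lightblue", "yellow")) +
  scale_fill_manual(name = "Arm (points)", values = c("blue", "orange")) +
  theme_bw()
```

Because colour and fill are separate aesthetics, the two manual scales do not interfere with each other.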

ggplot2 and CSV "inventing" data that isn't in my input

I'm attempting to produce an attractive graph of bandwidth data across a number of machines and tests. My attempts seem to work for small manually entered amounts of data, but when I feed the "full" 1773 entries, I get results in my graph that don't seem to exist in the input data.
I believe this is likely because the different tests are each of different duration, but I can't seem to prove this. If I use the following input data as csv (sorry, off-site because of size) I end up with a strange upwards-curve on my geom_smooth line, and additional data points that I can't actually see in my .csv input data. (I have much more data in real life, this is a subset that produces the strange behaviour)
I would expect the first four tries (try01-try04) to flat-line at zero, and try05 to carry on at around 1GBit/sec. Here's my code
library("ggplot2")
library("RColorBrewer")
speed = read.csv(file="data.csv")
svg("all_results.svg",width=24)
ggplot(speed,
aes(x = Second, y = Bandwidth, group=Test, colour=Test)) +
scale_fill_brewer(palette="Paired") +
geom_point() +
geom_smooth()
dev.off()
Here's the image produced
@Gregor seems to be exactly right in that the seconds are interpreted as text, when they should represent the number of seconds since the start of that test.
Here's some example input data - please note the times are not always on a .00 second boundary due to the output of iperf.
structure(list(Machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "valhalla", class = "factor"),
User = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "alice", class = "factor"),
Test = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "try01", class = "factor"),
Second = structure(c(1L, 2L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("0.00-1.00",
"1.00-2.00", "10.00-11.00", "11.00-12.00", "12.00-13.00",
"13.00-14.00", "14.00-15.00", "15.00-16.00", "16.00-17.00",
"17.00-18.00", "18.00-19.00", "19.00-20.00", "2.00-3.00",
"3.00-4.00", "4.00-5.00", "5.00-6.00", "6.00-7.00", "7.00-8.00",
"8.00-9.00", "9.00-10.00"), class = "factor"), Bandwidth = c(937,
943, 944, 943, 943, 943, 943, 944, 658, 943, 944, 943, 944,
644, 943, 943, 943, 944, 943, 943)), row.names = c(NA, 20L
), class = "data.frame")
I'll try casting (or whatever R calls it) those to a float now.
Points have a single x value, not a range of x-values, so we'll separate your Second column into the beginning and end of the interval and plot the points at the beginning. Calling your data dd:
library(tidyr)
library(dplyr)
dd = dd %>%
separate(Second, into = c("sec_start", "sec_end"), sep = "-", remove = FALSE) %>%
mutate(sec_start = as.numeric(sec_start),
sec_end = as.numeric(sec_end))
After that the plotting should go just fine if you put sec_start or sec_end on the x-axis. (Or calculate the middle, whatever you want...)
If you want to visualize the durations, you could use geom_segment and aes(x = sec_start, xend = sec_end, y = Bandwidth, yend = Bandwidth), but since everything is just about the same duration, it doesn't seem like this would add much value.
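Putting the steps above together, here is a minimal end-to-end sketch on a few made-up rows (the values in dd are hypothetical; only the column names and the "start-end" format come from the question). It assumes tidyr, dplyr and ggplot2 are installed:

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical toy data in the same shape as the question's iperf output.
dd <- data.frame(
  Test      = "try01",
  Second    = c("0.00-1.00", "1.00-2.00", "2.00-3.00", "10.00-11.00"),
  Bandwidth = c(937, 943, 944, 658)
)

# Split the text interval into two numeric columns.
dd <- dd %>%
  separate(Second, into = c("sec_start", "sec_end"), sep = "-", remove = FALSE) %>%
  mutate(sec_start = as.numeric(sec_start),
         sec_end   = as.numeric(sec_end))

# With a numeric x-axis, 10.00 is placed after 2.00 instead of being
# sorted alphabetically as text.
p <- ggplot(dd, aes(x = sec_start, y = Bandwidth, colour = Test)) +
  geom_point() +
  geom_segment(aes(xend = sec_end, yend = Bandwidth))
```

The geom_segment() layer is optional; it only draws the duration of each interval.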

Combine bar plot and stat_smooth() line from different data sets in ggplot2

I'm trying to overlay a stat_smooth() line from one dataset over a bar plot of another. Both csv files draw from the same dataset, but I had to make a new one for the bar plot because I had to add a few columns (including error bars) that wouldn't make sense in the big csv. So, I have code for the bar plot, and code for the line made using stat_smooth, but can't figure out how to combine them. I just want a graph with the line on top of the bars. Here's the code for the bar plot:
library(ggplot2)
library(scales)  # provides percent() for the axis labels
e <- read.csv("Retro Complex.csv", header=T, sep=",")
e <- subset(e, Accuracy != 0)
limits <- aes(ymax = Confidence + SE, ymin = Confidence - SE)
e$Complexity <- factor(e$Complexity)
# refer to columns by name inside aes() rather than via e$
p <- ggplot(e, aes(Complexity, Confidence))
p +
geom_bar(position = "dodge", stat = "identity") +
geom_errorbar(limits, position = "dodge", width = 0.25) +
coord_cartesian(ylim=c(0,1)) +
scale_y_continuous(labels = percent) +
ggtitle("Retro")
And here's for the line
ggplot(retroacc, aes(x=Complexity.Sample, y=risk)) +
stat_smooth(aes(x=Complexity.Sample, y=risk), data=retroacc,
method="glm", method.args=list(family="binomial"), se=FALSE) +
ylim(0,1)
Here's what they both look like:
Stat_smooth() line:
Barplot:
Sample Data
For the bar plot:
structure(list(Complexity = structure(1:5, .Label = c("1", "2",
"3", "4", "5"), class = "factor"), Accuracy = c(1L, 1L, 1L, 1L,
1L), Risk = c(0.69297164, 0.695793434, 0.695891571, 0.746606335,
0.748717949), SE = c(0.003621776, 0.004254081, 0.00669456, 0.008114764,
0.021963804), Proportion = c(0.823475656, 0.809299751, 0.863727821,
0.94724695, 0.882352941), SEAcc = c(0.002716612, 0.003267882,
0.004639995, 0.004059001, 0.015325003)), .Names = c("Complexity",
"Accuracy", "Confidence", "SE", "Proportion", "SEAcc"), row.names = c(1L,
3L, 5L, 7L, 9L), class = "data.frame")
For the line:
structure(list(risk = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), acc = c(0L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
Uniqueness = c(0.405166959, 0.407414244, 0.285123931, 0.248994487,
0.259019778, 0.334552913, 0.300580793, 0.354632526, 0.309841996,
0.331460876, 0.289981111, 0.362405881, 0.37389863, 0.253672193,
0.342903451, 0.294459829, 0.387447291, 0.519657612, 0.278964406
), Average.Similarity = c(0.406700667, 0.409547355, 0.275663862,
0.240909144, 0.251796956, 0.31827466, 0.240574971, 0.349093002,
0.34253811, 0.348084627, 0.290495997, 0.318312198, 0.404143605,
0.290789337, 0.293259599, 0.320214236, 0.382449298, 0.506295194,
0.335167223), Complexity.Sample = c(8521L, 11407L, 3963L,
2536L, 2327L, 3724L, 4005L, 5845L, 5770L, 5246L, 3629L, 3994L,
4285L, 1503L, 8222L, 3683L, 5639L, 10288L, 3076L)), .Names = c("risk",
"acc", "Uniqueness", "Average.Similarity", "Complexity.Sample"
), class = "data.frame", row.names = c(NA, -19L))
So yeah, if any of you guys know how to combine these into one plot please let me know!!

How to generate facetted ggplot graph where each facet has ordered data?

I want to sort my factors (Condition, Parameter and SubjectID) by MeanWeight and plot MeanWeight against SubjectID such that when faceted by Condition and Parameter, MeanWeight appears in descending order.
Here is my solution, which isn't giving me what I want:
dataSummary <- structure(list(SubjectID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("s001",
"s002", "s003", "s004"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("1", "2", "3"), class = "factor"), Parameter = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L), .Label = c("(Intercept)", "PrevCorr1", "PrevFail1"), class = "factor"),
MeanWeight = c(-0.389685536725783, 0.200987679398502, -0.808114314421089,
-0.10196105040707, 0.0274188815763494, 0.359978984195839,
-0.554583879312783, 0.643791202050396, -0.145042221940287,
-0.0144598460145723, -0.225804028997856, -0.928152539784374,
0.134025102103562, -0.267448309989731, -1.19980109795115,
0.0587152632631923, 0.0050656268880826, -0.156537446664213
)), .Names = c("SubjectID", "Condition", "Parameter", "MeanWeight"
), row.names = c(NA, 18L), class = "data.frame")
## Order by three variables
orderWeights <- order(dataSummary$Condition, dataSummary$Parameter, dataSummary$SubjectID, -dataSummary$MeanWeight)
## Set factors to the new order. I expect this to sort for each facet when plotting, but it doesn't seem to work.
conditionOrder <- dataSummary$Condition[orderWeights]
dataSummary$Condition <- factor(dataSummary$Condition, levels=conditionOrder)
paramOrder <- dataSummary$Parameter[orderWeights]
dataSummary$Parameter <- factor(dataSummary$Parameter, levels=paramOrder)
sbjOrder <- dataSummary$SubjectID[orderWeights]
dataSummary$SubjectID <- factor(dataSummary$SubjectID, levels=sbjOrder)
## Plot
ggplot(dataSummary, aes(x=MeanWeight, y=SubjectID)) +
scale_x_continuous(limits=c(-3, 3)) +
geom_vline(xintercept = 0.0, size = 0.1, colour = "#a9a9a9", linetype = "solid") +
geom_segment(aes(yend=SubjectID), xend=0, colour="grey50") +
geom_point(size=2) +
facet_grid(Parameter~Condition, scales="free_y")
I tried a few other approaches, but they didn't work either:
dataSummary <- dataSummary[order(dataSummary$Condition, dataSummary$Parameter, dataSummary$SubjectID, -dataSummary$MeanWeight),]
or this one
dataSummary <- transform(dataSummary, SubjectID=reorder(Condition, Parameter, SubjectID, MeanWeight))
You can order your data and plot it. However, the labels then no longer correspond to subject IDs, but to the reordered subjects. If that is not what you want, you cannot simply rely on faceting; instead, plot the parts separately and use e.g. grid.arrange to combine the different plots.
require(plyr)
## Ordered data
datOrder <- ddply(dataSummary, c("Condition", "Parameter"), function(x){
if (nrow(x)<=1) return(x)
x$MeanWeight <- x$MeanWeight[order(x$MeanWeight)]
x
})
## Plot
ggplot(datOrder, aes(x=MeanWeight, y=SubjectID)) +
scale_x_continuous(limits=c(-3, 3)) +
geom_vline(xintercept = 0.0, size = 0.1, colour = "#a9a9a9", linetype = "solid") +
geom_segment(aes(yend=SubjectID), xend=0, colour="grey50") +
geom_point(size=2) +
facet_grid(Parameter~Condition) +
scale_y_discrete(name="Ordered subjects")
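A workaround that is sometimes used to keep the real subject labels while still faceting is to build one factor level per subject-and-facet pair, order those levels by MeanWeight, and strip the facet suffix again when drawing the axis labels. This is only a sketch under assumptions: toy is a made-up data frame, and the key column and the "___" separator are hypothetical names introduced for illustration.

```r
library(ggplot2)

# Hypothetical toy data using the question's column names.
toy <- data.frame(
  SubjectID  = rep(c("s001", "s002", "s003"), times = 2),
  Parameter  = rep(c("PrevCorr1", "PrevFail1"), each = 3),
  MeanWeight = c(0.20, -0.23, -0.27, -0.81, -0.93, -1.20)
)

# One level per (subject, facet) pair, ordered by increasing MeanWeight.
toy$key <- paste(toy$SubjectID, toy$Parameter, sep = "___")
toy$key <- reorder(toy$key, toy$MeanWeight)

p <- ggplot(toy, aes(x = MeanWeight, y = key)) +
  geom_vline(xintercept = 0, colour = "#a9a9a9") +
  geom_segment(aes(yend = key), xend = 0, colour = "grey50") +
  geom_point(size = 2) +
  # free_y lets each panel show only its own keys, in their own order
  facet_grid(Parameter ~ ., scales = "free_y") +
  # Drop the facet suffix so the axis shows the real subject IDs.
  scale_y_discrete(name = "SubjectID",
                   labels = function(k) sub("___.*$", "", k))
```

Since a discrete y-axis is drawn from the first level at the bottom upwards, the largest MeanWeight ends up at the top of each panel.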
