I'm making a graphical analysis of the course evaluations.
I have the following data:
> str(dataJ2)
'data.frame': 16 obs. of 22 variables:
...
$ lk_nummer : Factor w/ 111 levels "051-0311-00S",..: 19 30 38 47 49 50 51 55 56 59 ...
$ le_titel : Factor w/ 111 levels "","Advanced Methods and Strategies in Synthesis",..: 6 99 75 82 84 8 40 39 38 68 ...
$ anzahl_stud : int 7 79 1 34 10 20 83 10 4 11 ...
$ durchschnitt : num 4.61 5.35 3.5 4.4 4.4 4.33 4.49 4.53 5.38 4.48 ...
$ standardabweich : num 0.4 0.54 0 1.02 1.21 0.62 1.17 0.9 0.28 0.68 ...
...
$ prozent_best : num 85.7 97.5 0 70.6 90 80 73.5 90 100 81.8 ...
...
Using ggplot2 I was able to make a plot looking like this:
plotJ2 <- ggplot(dataJ2, aes(y=durchschnitt,x=le_titel))
plotJ2 + geom_bar(position=position_dodge(), stat="identity", fill = I("chartreuse4")) +
scale_y_continuous(limits=c(0,6.6),breaks=seq(from=1, to=6, by=1)) +
geom_errorbar(aes(ymin=durchschnitt-standardabweich, ymax=durchschnitt+standardabweich), width=.1) +
ggtitle("2. Jahr Bsc Biologie") +
ylab("Durchschnitt") + xlab("Fächer") +
geom_text(aes(label = durchschnitt, y = 1.8), size = 4, colour="gray85") +
geom_text(aes(label = anzahl_stud, y = 0.2), size = 4, colour="grey85") +
geom_text(aes(label = prozent_best, y = 6.55), size = 4, colour="chartreuse4", adj=1) +
geom_text(aes(label = "%", y = 6.6), size = 4, colour="chartreuse4", adj=0) +
coord_flip()
This is what it looks like when plotted.
However, the "prozent_best" values drawn inside the plotting area do not look very nice.
I tried to use mtext, text and facet_wrap to add the data from "dataJ2$prozent_best" as a second y-axis label on the right side of the gray plot area, but couldn't make it work.
Any recommendations?
Useful translations/descriptions of the data annotation:
lk_nummer -> number of the lectures
le_titel -> name of the lectures
anzahl_stud -> number of students
durchschnitt -> average
prozent_best -> percentage of students who passed the exam
Fächer -> classes
Try:
geom_text(aes(label = paste0(prozent_best,'%'), y = 6.55),
size = 4, colour="chartreuse4", hjust='right')
That will combine the '%' symbol with the value into one string. Generally I would suggest generating your label vectors outside the ggplot call, but for this it does not add too much mess.
Also, you might want to look into adding expand=c(0,0) and limits=c(0,7) to your scale_y_continuous() call (the averages stay on the y scale even after coord_flip(), and the x scale is discrete here). That will get rid of the ugly grey bar on the left side.
Possibly also try adding theme_bw(): your plot is already so busy that the grey blocks in the background of ggplot2's default theme just make it look mushy.
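As a rough sketch of the suggestions above combined (label built outside the ggplot call, no axis expansion, theme_bw()). The data frame toy and its course names are hypothetical stand-ins for dataJ2, and geom_col() is used here as shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

# Hypothetical stand-in for dataJ2; the course names are invented,
# the numbers are the first three rows of the str() output above
toy <- data.frame(
  le_titel     = c("Course A", "Course B", "Course C"),
  durchschnitt = c(4.61, 5.35, 3.5),
  prozent_best = c(85.7, 97.5, 0)
)

# Build the label vector once, outside the ggplot call
toy$prozent_label <- paste0(toy$prozent_best, "%")

p <- ggplot(toy, aes(x = le_titel, y = durchschnitt)) +
  geom_col(fill = "chartreuse4") +
  geom_text(aes(label = prozent_label, y = 6.95),
            size = 4, colour = "chartreuse4", hjust = "right") +
  # expand = c(0, 0) removes the padding around the value axis
  scale_y_continuous(limits = c(0, 7), expand = c(0, 0), breaks = 1:6) +
  coord_flip() +
  theme_bw()
p
```

The y position of the label (6.95) is an arbitrary choice that keeps the right-aligned text inside the 0 to 7 limits; adjust it to taste.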
I have two data sets MASS and MASS2 to create a map in R. I got the first one with the help of library(ggmap).
counties<-map_data('county')
MASS<-map_data('county', 'massachusetts')
str(MASS)
'data.frame': 744 obs. of 6 variables:
$ long : num -70.7 -70.5 -70.5 -70.5 -70.5 ...
$ lat : num 41.7 41.8 41.8 41.8 41.8 ...
$ group : num 1 1 1 1 1 1 1 1 1 1 ...
$ state : chr "massachusetts" "massachusetts" "massachusetts" ...
$ county_name: chr "barnstable" "barnstable" "barnstable" "barnstable" ...
The second consists of 14 points, one per county, and holds the number of teachers in each county.
str(MASS2)
'data.frame': 14 obs. of 6 variables:
$ state : chr "massachusetts" "massachusetts" "massachusetts" ...
$ county_name : chr "barnstable" "berkshire" "bristol" "dukes" ...
$ long : num -70.7 -73.5 -71.2 -70.5 -71 ...
$ lat : num 41.7 42 41.7 41.4 42.4 ...
$ group : num 1 2 3 4 5 6 7 8 9 10 ...
$ teacher_count: int 62 40 47 ...
I need to create a map where each teacher_count point is represented by a circle sized according to the number of teachers. So far all my circles come out the same size.
My code is as follows:
ggplot(MASS, aes(long,lat, group = group)) +
geom_polygon(aes(fill = county_name),colour = "black") +
geom_point(data = MASS2, aes(x = long, y = lat), color = "red", size = 5)+
theme(legend.position="none") +
coord_quickmap()
This is the map I get
I found one solution online which suggests setting the size in geom_point as
+geom_point(......, size = MASS2$teacher_count*circle_scale_amt)+
scale_size_continuous(range=range(MA$teacher_count))
but R can't find circle_scale_amt.
I am new to R and trying to learn. I would appreciate ideas for any other ways to represent the teachers by their number. Thank you!
This works for me after setting a value for circle_scale_amt, which rescales the points; otherwise they would be too big.
library(ggmap)
counties <- map_data('county')
MASS <- map_data('county', 'massachusetts')
circle_scale_amt <- 0.05
ggplot(MASS, aes(long,lat, group = group)) +
geom_polygon(aes(fill = subregion),colour = "black") +
geom_point(data = MASS2, aes(x = long, y = lat),
size = MASS2$teacher_count * circle_scale_amt,
color = "red", alpha = 0.6)+
scale_size_continuous(range = range(MASS2$teacher_count)) +
theme(legend.position="none") +
coord_quickmap()
Created on 2018-03-16 by the reprex package (v0.2.0).
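An alternative worth trying, offered as a sketch rather than a drop-in fix: map teacher_count to size inside aes() and let a size scale do the rescaling, so no manual circle_scale_amt is needed. The data frame pts below is a stand-in that holds only the first three rows visible in the str(MASS2) output, and max_size = 10 is an arbitrary choice:

```r
library(ggplot2)

# Stand-in for MASS2: first three rows as shown in the str() output above
pts <- data.frame(
  long          = c(-70.7, -73.5, -71.2),
  lat           = c(41.7, 42.0, 41.7),
  teacher_count = c(62, 40, 47)
)

p <- ggplot(pts, aes(long, lat)) +
  # size is mapped inside aes(), so it varies with the data
  geom_point(aes(size = teacher_count), colour = "red", alpha = 0.6) +
  # circle area is proportional to the count; max_size caps the largest point
  scale_size_area(max_size = 10) +
  coord_quickmap()
p
```

With the size inside aes(), a legend is produced automatically; the same geom_point() layer can be added on top of the geom_polygon() layer from the code above by passing data = MASS2.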
I have 9 plots with 3 time series in each. One of them contains only one curve: it is the reference plot, and I would like to place it between the two rows that contain the other 8 plots. Is there an easy way to do so?
I use facet_wrap(~density,nrow=2), but I get one row with 5 plots and another with 4. I am sure other people have had this problem. Is there an easy way to control the position of this reference plot, or do I have to create two separate plots and overlay them? Otherwise I might have to repeat the reference curve in all the other plots, but that seems like redundant information.
This is my current result; as you can see, it's not very well laid out.
The graphic you are looking for can be generated with grid.arrange from the
gridExtra package. Here is
an example using the storms data set from the
dplyr package.
library(ggplot2)
library(gridExtra)
library(dplyr)
data(storms, package = 'dplyr')
str(storms)
## Classes 'tbl_df', 'tbl' and 'data.frame': 10010 obs. of 13 variables:
## $ name : chr "Amy" "Amy" "Amy" "Amy" ...
## $ year : num 1975 1975 1975 1975 1975 ...
## $ month : num 6 6 6 6 6 6 6 6 6 6 ...
## $ day : int 27 27 27 27 28 28 28 28 29 29 ...
## $ hour : num 0 6 12 18 0 6 12 18 0 6 ...
## $ lat : num 27.5 28.5 29.5 30.5 31.5 32.4 33.3 34 34.4 34 ...
## $ long : num -79 -79 -79 -79 -78.8 -78.7 -78 -77 -75.8 -74.8 ...
## $ status : chr "tropical depression" "tropical depression" "tropical depression" "tropical depression" ...
## $ category : Ord.factor w/ 7 levels "-1"<"0"<"1"<"2"<..: 1 1 1 1 1 1 1 1 2 2 ...
## $ wind : int 25 25 25 25 25 25 25 30 35 40 ...
## $ pressure : int 1013 1013 1013 1013 1012 1012 1011 1006 1004 1002 ...
## $ ts_diameter: num NA NA NA NA NA NA NA NA NA NA ...
## $ hu_diameter: num NA NA NA NA NA NA NA NA NA NA ...
Let's create two graphics. The first graphic will contain only the category == -1
storms (this would be the control group in your question). The second
graphic will be a faceted graphic for the category > -1 storms.
First, we'll build a generic ggplot object for the graphics.
graphic <-
ggplot() +
aes(x = long, y = lat, color = category) +
geom_point() +
facet_wrap( ~ category) +
scale_color_hue(breaks = levels(storms$category),
labels = levels(storms$category),
drop = FALSE)
Next we build the two graphics as needed.
g1 <- graphic %+% dplyr::filter(storms, category == -1) + theme(legend.position = "none")
g2 <- graphic %+% dplyr::filter(storms, category != -1)
gridExtra::grid.arrange can take a layout matrix where the numbers 1 and 2
denote the first and second graphics passed to the function. (This works for
a lot more than just two graphics, by the way.) By repeating the values of 1
and 2 in the matrix we can control the relative size of the two graphics in
the graphics device.
gridExtra::grid.arrange(g1, g2,
layout_matrix =
matrix(c(1, 1, 1, 2, 2, 2, 2, 2,
1, 1, 1, 2, 2, 2, 2, 2,
1, 1, 1, 2, 2, 2, 2, 2),
byrow = TRUE, nrow = 3)
)
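To make the role of the layout matrix concrete, here is a small base R sketch that only inspects matrices and draws nothing. The second matrix, lay_ref, is a hypothetical layout for the case in the question (a row of panels, a centred reference plot, another row of panels); it assumes that NA entries in layout_matrix leave a cell empty and that three graphics are passed instead of two:

```r
# Layout used above: each row has 3 cells for graphic 1 and 5 for graphic 2
lay <- matrix(c(1, 1, 1, 2, 2, 2, 2, 2,
                1, 1, 1, 2, 2, 2, 2, 2,
                1, 1, 1, 2, 2, 2, 2, 2),
              byrow = TRUE, nrow = 3)

# Share of the device width given to each graphic (3/8 and 5/8)
width_share <- prop.table(table(lay[1, ]))
print(width_share)

# Hypothetical layout: full-width row of panels (1), a narrower centred
# reference plot (2), and a second full-width row of panels (3).
# NA is assumed to leave the cell blank.
lay_ref <- rbind(c(1, 1, 1, 1),
                 c(NA, 2, 2, NA),
                 c(3, 3, 3, 3))
print(lay_ref)
```

A matrix like lay_ref would then be passed as layout_matrix together with three ggplot objects, in the order top row, reference plot, bottom row.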
If I understand the question correctly, you could reformat your data with appropriate faceting variables to introduce a new row of reference panels:
library(ggplot2)
d <- data.frame(x=rep(1:10, 8), y = rnorm(80),
f=gl(8,10, ordered = TRUE))
d$f1 <- factor(d$f <= 4, labels=c(1,3))
d$f2 <- as.numeric(d$f) %% 4
d2 <- data.frame(x=1:10, y=0, f1 = 2)
ggplot(d, aes(x,y)) +
geom_point(aes(colour=f)) +
geom_point(data=d2, colour="black") +
facet_grid(f1~f2)
I'm trying to make a wheel chart that has rings. In my result, the lines all seem to go back to zero before continuing to the next point. Is it a discrete/continuous issue? I've tried making Lap.Time and Lap both numeric, to no avail:
f1 <- read.csv("F1 2011 Turkey - Fuel Corrected Lap Times.csv", header = T)
str(f1)
# data.frame: 1263 obs. of 5 variables:
# $ Driver : Factor w/ 23 levels "1","2","3","4",..: 23 23 23 23 23 23 23 23 23 23 ...
# $ Lap : int 1 2 3 4 5 6 7 8 9 10 ...
# $ Lap.Time : num 107 99.3 98.4 97.5 97.4 ...
# $ Fuel.Adjusted.Laptime : num 102.3 94.7 93.9 93.1 93.1 ...
# $ Fuel.and.fastest.lap.adjusted.laptime: num 9.73 2.124 1.321 0.54 0.467 ...
library(ggplot2)
f1$Driver<-as.factor(f1$Driver)
p1 <- ggplot(data=subset(f1, Lap.Time <= 120), aes(x = Lap, y= Lap.Time, colour = Driver)) +
geom_point(aes(colour=Driver))
p2 <- ggplot(subset(f1, Lap.Time <= 120),
aes(x = Lap, y= Lap.Time, colour = Driver, group = 1)) +
geom_line(aes(colour=Driver))
pout <- p1 + coord_polar()
pout2 <- p2 + coord_polar()
pout
pout2
resulting chart image
All the data is in this csv:
https://docs.google.com/spreadsheets/d/1Ef2ewd1-0FM1mJL1o00C6c2gf7HFmanJh8an1EaAq2Q/edit?hl=en_GB&authkey=CMSemOQK#gid=0
Sample of csv:
Driver,Lap,Lap Time,Fuel Adjusted Laptime,Fuel and fastest lap adjusted laptime
25,1,106.951,102.334,9.73
25,2,99.264,94.728,2.124
25,3,98.38,93.925,1.321
25,4,97.518,93.144,0.54
25,5,97.364,93.071,0.467
25,6,97.853,93.641,1.037
25,7,98.381,94.25,1.646
25,8,98.142,94.092,1.488
25,9,97.585,93.616,1.012
25,10,97.567,93.679,1.075
25,11,97.566,93.759,1.155
25,12,97.771,94.045,1.441
25,13,98.532,94.887,2.283
25,14,99.146,95.582,2.978
25,15,98.529,95.046,2.442
25,16,99.419,96.017,3.413
25,17,114.593,111.272,18.668
I have a small data set, local (5 observations), with two types of principal: P and V.
Each observation has a Date field (p.start), a ratio, and a duration.
local
principal p.start duration allocated.days ratio
1 P 2015-03-18 1 162.0000 162.0000
2 V 2015-08-28 4 24.0000 6.0000
3 V 2015-09-03 1 89.0000 89.0000
4 V 2015-03-30 1 32.0000 32.0000
5 P 2015-01-29 1 150.1667 150.1667
str(local)
'data.frame': 5 obs. of 5 variables:
$ principal : chr "P" "V" "V" "V" ...
$ p.start : Date, format: "2015-03-18" "2015-08-28" "2015-09-03" "2015-03-30" ...
$ duration : Factor w/ 10 levels "1","2","3","4",..: 1 4 1 1 1
$ allocated.days: num 162 24 89 32 150
$ ratio : num 162 6 89 32 150
I have another data frame, stats, with text to be added to a faceted plot.
stats
principal xx yy zz
1 P 2015-02-28 145.8 Average = 156
2 V 2015-02-28 145.8 Average = 24
str(stats)
'data.frame': 2 obs. of 4 variables:
$ principal: chr "P" "V"
$ xx : Date, format: "2015-02-28" "2015-02-28"
$ yy : num 146 146
$ zz : chr "Average = 156" "Average = 24"
The following code fails:
p = ggplot (local, aes (x = p.start, y = ratio, size = duration))
p = p + geom_point (colour = "blue"); p
p = p + facet_wrap (~ principal, nrow = 2); p
p = p + geom_text(aes(x=xx, y=yy, label=zz), data= stats)
p
Error: Continuous value supplied to discrete scale
Any ideas? I'm missing something obvious.
The problem is that you are plotting from two data frames, but the aes() in your initial ggplot() call refers to columns that exist only in the local data frame.
So although your geom_text specifies data=stats, it still inherits size=duration and looks for it there.
The following works for me:
ggplot(local) +
geom_point(aes(x=p.start, y=ratio, size=duration), colour="blue") +
facet_wrap(~ principal, nrow=2) +
geom_text(data=stats, aes(x=xx, y=yy, label=zz))
Just remove size = duration from ggplot(local, aes(x = p.start, y = ratio, size = duration)) and move it into geom_point(colour = "blue") as aes(size = duration). Then it should work:
ggplot(local, aes(x=p.start, y=ratio))+
geom_point(colour="blue", aes(size=duration))+
facet_wrap(~principal, nrow=2)+
geom_text(aes(x=xx, y=yy, label=zz), data=stats)
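Another option, sketched here under the assumption that you want to keep size = duration in the top-level aes(): tell the text layer not to inherit the plot-level mappings with inherit.aes = FALSE. The data frames local_df and stats_df are rebuilt by hand from the printed output in the question (renamed to avoid masking base::local and the stats package):

```r
library(ggplot2)

# Rebuilt from the printed data in the question
local_df <- data.frame(
  principal = c("P", "V", "V", "V", "P"),
  p.start   = as.Date(c("2015-03-18", "2015-08-28", "2015-09-03",
                        "2015-03-30", "2015-01-29")),
  duration  = factor(c(1, 4, 1, 1, 1), levels = 1:10),
  ratio     = c(162, 6, 89, 32, 150.1667)
)
stats_df <- data.frame(
  principal = c("P", "V"),
  xx        = as.Date(c("2015-02-28", "2015-02-28")),
  yy        = c(145.8, 145.8),
  zz        = c("Average = 156", "Average = 24")
)

p <- ggplot(local_df, aes(x = p.start, y = ratio, size = duration)) +
  geom_point(colour = "blue") +
  facet_wrap(~ principal, nrow = 2) +
  # inherit.aes = FALSE: this layer uses only the mappings given here
  geom_text(data = stats_df, aes(x = xx, y = yy, label = zz),
            inherit.aes = FALSE)
p
```

ggplot2 may warn that using size for a discrete variable is not advised; that comes from duration being a factor and is unrelated to the text layer.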
I am fairly new to R and recently used it to make some boxplots. I also added the mean and standard deviation to my boxplot. I was wondering if I could add some kind of tick mark or circle at different percentiles as well. Say I want to mark the 85th and 90th percentiles in each HOUR boxplot: is there a way to do this? My data consist of a year's worth of loads in MW for each hour, and my output consists of 24 boxplots, one per hour, for each month. I am doing one month at a time because I am not sure if there is a way to run all 96 (each month, weekday/weekend, for 4 different zones) boxplots at once. Thanks in advance for your help.
JANWD <-read.csv("C:\\My Directory\\MWBox2.csv")
JANWD.df<-data.frame(JANWD)
JANWD.sub <-subset(JANWD.df, MONTH < 2 & weekend == "NO")
KeepCols <-c("Hour" , "Houston_Load")
HWD <- JANWD.sub[ ,KeepCols]
sd <-tapply(HWD$Houston_Load, HWD$Hour, sd)
means <-tapply(HWD$Houston_Load, HWD$Hour, mean)
boxplot(Houston_Load ~ Hour, data=HWD, xlab="WEEKDAY HOURS", ylab="MW Differnce", ylim= c(-10, 20), smooth=TRUE ,col ="bisque", range=0)
points(sd, pch = 22, col= "blue")
points(means, pch=23, col ="red")
#Output of the subset of data used to run boxplot for month january in Houston
str(HWD)
'data.frame': 504 obs. of 2 variables:
$ Hour : int 1 2 3 4 5 6 7 8 9 10 ...
$ Houston_Load: num 1.922 2.747 -2.389 0.515 1.922 ...
#OUTPUT of the original data
str(JANWD)
'data.frame': 8783 obs. of 9 variables:
$ Date : Factor w/ 366 levels "1/1/2012","1/10/2012",..: 306 306 306 306 306 306 306 306 306 306 ...
$ Hour : int 1 2 3 4 5 6 7 8 9 10 ...
$ MONTH : int 8 8 8 8 8 8 8 8 8 8 ...
$ weekend : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
$ TOTAL_LOAD : num 0.607 5.111 6.252 7.607 0.607 ...
$ Houston_Load: num -2.389 0.515 1.922 2.747 -2.389 ...
$ North_Load : num 2.95 4.14 3.55 3.91 2.95 ...
$ South_Load : num -0.108 0.267 0.54 0.638 -0.108 ...
$ West_Load : num 0.154 0.193 0.236 0.311 0.154 ...
Here is one way, using quantile() to compute the relevant percentiles for you. I add the marks using rug().
set.seed(1)
X <- rnorm(200)
boxplot(X, yaxt = "n")
## compute the required quantiles
qntl <- quantile(X, probs = c(0.85, 0.90))
## add them as a rug plot to the left hand side
rug(qntl, side = 2, col = "blue", lwd = 2)
## add the box and axes
axis(2)
box()
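If a circle on the box itself is preferred over a rug tick, points() can mark the same quantiles. This is a small variation on the example above, not a different method; the colours and the plotting symbol are arbitrary choices:

```r
set.seed(1)
X <- rnorm(200)

## compute the required quantiles, as before
qntl <- quantile(X, probs = c(0.85, 0.90))

boxplot(X)
## a single boxplot is drawn at x = 1, so both circles go there
points(x = rep(1, length(qntl)), y = qntl,
       pch = 21, bg = c("red", "blue"))
```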
Update: In response to the OP providing str() output, here is an example similar to the data that the OP has to hand:
set.seed(1) ## make reproducible
HWD <- data.frame(Hour = rep(0:23, 10),
Houston_Load = rnorm(24*10))
Now, I presume you want ticks at the 85th and 90th percentiles for each Hour? If so, we need to split the data by Hour and compute the quantiles via quantile(), as I showed earlier:
quants <- sapply(split(HWD$Houston_Load, list(HWD$Hour)),
quantile, probs = c(0.85, 0.9))
which gives:
R> quants <- sapply(split(HWD$Houston_Load, list(HWD$Hour)),
+ quantile, probs = c(0.85, 0.9))
R> quants
0 1 2 3 4 5 6
85% 0.3576510 0.8633506 1.581443 0.2264709 0.4164411 0.2864026 1.053742
90% 0.6116363 0.9273008 2.109248 0.4218297 0.5554147 0.4474140 1.366114
7 8 9 10 11 12 13 14
85% 0.5352211 0.5175485 1.790593 1.394988 0.7280584 0.8578999 1.437778 1.087101
90% 0.8625322 0.5969672 1.830352 1.519262 0.9399476 1.1401877 1.763725 1.102516
15 16 17 18 19 20 21
85% 0.6855288 0.4874499 0.5493679 0.9754414 1.095362 0.7936225 1.824002
90% 0.8737872 0.6121487 0.6078405 1.0990935 1.233637 0.9431199 2.175961
22 23
85% 1.058648 0.6950166
90% 1.145783 0.8436541
Now we can draw marks at the x locations of the boxplots
boxplot(Houston_Load ~ Hour, data = HWD, axes = FALSE)
xlocs <- 1:24 ## where to draw marks
tickl <- 0.15 ## length of marks used
for(i in seq_len(ncol(quants))) {
segments(x0 = rep(xlocs[i] - tickl, 2), y0 = quants[, i],
x1 = rep(xlocs[i] + tickl, 2), y1 = quants[, i],
col = c("red", "blue"), lwd = 2)
}
title(xlab = "Hour", ylab = "Houston Load")
axis(1, at = xlocs, labels = xlocs - 1)
axis(2)
box()
legend("bottomleft", legend = paste(c("0.85", "0.90"), "quantile"),
bty = "n", lty = "solid", lwd = 2, col = c("red", "blue"))
The resulting figure should look like this: