Coloring line segments in ggplot2 - r

Suppose I have the following data for a student's scores on a test.
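# The table below was produced with R < 3.6.0; from R 3.6.0 on, sample() draws differently,
# so run RNGkind(sample.kind = "Rounding") before set.seed(1) to reproduce it
set.seed(1)
df <- data.frame(question = 0:10,
                 resp = c(NA, sample(c("Correct", "Incorrect"), 10, replace = TRUE)),
                 score.after.resp = 50)
# Each correct answer adds 5 points, each incorrect answer subtracts 5
for (i in 1:10) {
  if (df$resp[i + 1] == "Correct") {
    df$score.after.resp[i + 1] <- df$score.after.resp[i] + 5
  } else {
    df$score.after.resp[i + 1] <- df$score.after.resp[i] - 5
  }
}
df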
question resp score.after.resp
1 0 <NA> 50
2 1 Correct 55
3 2 Correct 60
4 3 Incorrect 55
5 4 Incorrect 50
6 5 Correct 55
7 6 Incorrect 50
8 7 Incorrect 45
9 8 Incorrect 40
10 9 Incorrect 35
11 10 Correct 40
I want to get the following graph:
library(ggplot2)
ggplot(df,aes(x = question, y = score.after.resp)) + geom_line() + geom_point()
My problem is that I want to color the segments of this line according to the student's response: if the response was correct (the score increases), the segment should be green, and if it was incorrect (the score decreases), the segment should be red.
I tried the following code, but it did not work:
ggplot(df,aes(x = question, y = score.after.resp, color=factor(resp))) +
geom_line() + geom_point()
Any ideas?

I would probably approach this a little differently, and use geom_segment instead:
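library(ggplot2)
# embed(v, 2) returns a two-column matrix of (v[t], v[t-1]): the end and start of each segment
df1 <- as.data.frame(with(df, cbind(embed(score.after.resp, 2), embed(question, 2))))
colnames(df1) <- c('yend', 'y', 'xend', 'x')
# Label each segment by whether the score went down or up
df1$col <- ifelse(df1$y - df1$yend >= 0, 'Decrease', 'Increase')
ggplot(df1) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend, colour = col)) +
  geom_point(data = df, aes(x = question, y = score.after.resp)) +
  scale_colour_manual(values = c(Decrease = "red", Increase = "green"))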
A brief explanation:
I'm using embed to transform the x and y variables into the start and end points of each line segment, and then adding a variable that indicates whether each segment went up or down. Finally, I use the original data frame to add the points themselves.
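If embed() feels opaque, the same segment table can be built with head() and tail(). This is a minimal sketch that assumes the scores from the printed table above are typed in directly (the vectors scores and question are stand-ins, not part of the original answer); it also checks that both constructions agree:

```r
# Scores copied from the printed table above; questions numbered 0 to 10
scores <- c(50, 55, 60, 55, 50, 55, 50, 45, 40, 35, 40)
question <- 0:10

# embed(v, 2) puts v[t] in column 1 and v[t - 1] in column 2
e <- embed(scores, 2)

# Same segments via head()/tail(): segment i runs from point i to point i + 1
segs <- data.frame(x    = head(question, -1), y    = head(scores, -1),
                   xend = tail(question, -1), yend = tail(scores, -1))
segs$col <- ifelse(segs$yend > segs$y, "Increase", "Decrease")

# Both constructions give identical start and end values
stopifnot(all(e[, 1] == segs$yend), all(e[, 2] == segs$y))
segs
```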
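Alternatively, you could use geom_line() with a shifted copy of the response, so that each point carries the response to the next question; group = 1 keeps all points in a single line:
df$resp1 <- c(as.character(df$resp[-1]), NA)
ggplot(df, aes(x = question, y = score.after.resp, color = factor(resp1), group = 1)) +
  geom_line() + geom_point(color = "black")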
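The shift works because geom_line() draws each segment in the colour of the point it starts from, so moving the responses up one row makes the segment from question i to i + 1 take the response to question i + 1. Below is a sketch of the shift on its own plus a manual green/red mapping; the resp and scores vectors are copied from the table above, and the plotting part only runs if ggplot2 is installed:

```r
resp <- c(NA, "Correct", "Correct", "Incorrect", "Incorrect", "Correct",
          "Incorrect", "Incorrect", "Incorrect", "Incorrect", "Correct")
scores <- c(50, 55, 60, 55, 50, 55, 50, 45, 40, 35, 40)

# Row i now holds the response that produced the segment from question i to i + 1
resp1 <- c(resp[-1], NA)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(question = 0:10, score = scores, resp1 = resp1),
              aes(x = question, y = score, colour = resp1, group = 1)) +
    geom_line() +
    geom_point(colour = "black") +
    # Map responses to the requested colours; the last row has no outgoing segment
    scale_colour_manual(values = c(Correct = "green", Incorrect = "red"),
                        na.value = "grey50")
}
```

Print p to draw the plot.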

By default, ggplot2 groups the data by every discrete variable mapped to an aesthetic (here colour = factor(resp)), so geom_line() only connects points that share the same response. You can override this default by setting group explicitly:
last_plot() + aes(group=NA)

Related

ggplot2 missing data when plotting histogram with custom x axis limits

I am trying to plot six histograms (2 columns of data (calories, sodium) x 3 types (beef, meat, poultry)) and I want them all to share the same x and y scales. I'm limiting the x axis with lims(), which works like scale_x_continuous(limits = ...) and which, according to various sources, removes data that won't appear on the plot. Here is my code:
# src2_12.table is the data frame containing my data
histogram <- function(df, dataset, n_bins, label) {
  # .data[[dataset]] looks the column up by name inside df
  ggplot(df, aes(x = .data[[dataset]])) +
    geom_histogram(color = "darkblue", fill = "lightblue", bins = n_bins) + xlab(label)
}
src2_12.beef <- src2_12.table[src2_12.table$Type == "Beef",]
src2_12.meat <- src2_12.table[src2_12.table$Type == "Meat",]
src2_12.poultry <- src2_12.table[src2_12.table$Type == "Poultry",]
src2_12.calories_scale <- lims(x = c(min(src2_12.table$Calories), max(src2_12.table$Calories)), y = c(0, 6))
src2_12.sodium_scale <- lims(x = c(min(src2_12.table$Sodium), max(src2_12.table$Sodium)), y = c(0, 6))
#src2_12.calories_scale <- lims()
#src2_12.sodium_scale <- lims()
src2_12.plots <- list(
histogram(src2_12.beef, "Calories", 10, "Calories-Beef") + src2_12.calories_scale,
histogram(src2_12.meat, "Calories", 10, "Calories-Meat") + src2_12.calories_scale,
histogram(src2_12.poultry, "Calories", 10, "Calories-Poultry") + src2_12.calories_scale,
histogram(src2_12.beef, "Sodium", 10, "Sodium-Beef") + src2_12.sodium_scale,
histogram(src2_12.meat, "Sodium", 10, "Sodium-Meat") + src2_12.sodium_scale,
histogram(src2_12.poultry, "Sodium", 10, "Sodium-Poultry") + src2_12.sodium_scale
)
# multiplot() is not part of ggplot2; it is a helper function from the Cookbook for R
multiplot(plotlist = src2_12.plots, cols = 2, layout = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE))
Here is the output:
vs. what the data are supposed to look like:
I can't understand why some data points are missing, given that the limits I set are exactly the min and max of the data.
You probably want to use coord_cartesian instead of lims. Unexpected things can happen when you fiddle with the limits on histograms, because quite a few transformations have to happen to get from your raw data to the actual histogram.
Let's peer under the hood for one example:
p <- ggplot(src2_12.beef,aes(x = Calories)) +
geom_histogram(bins = 10)
p1 <- ggplot(src2_12.beef,aes(x = Calories)) +
geom_histogram(bins = 10) +
lims(x = c(86,195))
a <- ggplot_build(p)
b <- ggplot_build(p1)
> a$data[[1]][,1:5]
y count x xmin xmax
1 1 1 114.1111 109.7222 118.5000
2 0 0 122.8889 118.5000 127.2778
3 3 3 131.6667 127.2778 136.0556
4 2 2 140.4444 136.0556 144.8333
5 5 5 149.2222 144.8333 153.6111
6 2 2 158.0000 153.6111 162.3889
7 0 0 166.7778 162.3889 171.1667
8 2 2 175.5556 171.1667 179.9444
9 3 3 184.3333 179.9444 188.7222
10 2 2 193.1111 188.7222 197.5000
> b$data[[1]][,1:5]
y count x xmin xmax
1 0 0 NA NA 90.83333
2 0 0 96.88889 90.83333 102.94444
3 1 1 109.00000 102.94444 115.05556
4 0 0 121.11111 115.05556 127.16667
5 4 4 133.22222 127.16667 139.27778
6 4 4 145.33333 139.27778 151.38889
7 4 4 157.44444 151.38889 163.50000
8 1 1 169.55556 163.50000 175.61111
9 4 4 181.66667 175.61111 187.72222
10 2 2 193.77778 187.72222 NA
So now you're wondering, how the heck did that happen, right?
Well, when you tell ggplot that you want 10 bins and the x limits go from 86 to 195, the histogram algorithm tries to create ten bins that span that whole range. That's why it creates bins below 100 even though there are no data there.
On top of that, the outermost bars extend a little past the limits: the first bin starts at about 78.7 and the last one ends at about 199.8, beyond 86 and 195. Because lims() replaces any position outside the limits with NA, those bars get NA for x, xmin or xmax (rows 1 and 10 above) and are dropped, so the two observations in the last bar disappear from the plot.
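To see where those edges come from, here is a rough re-creation of the bin placement in base R. It is an assumption that this is how stat_bin() positions bins when you pass bins = 10 (width = range / (bins - 1), edges offset by half a width), but the edges it produces match the non-NA xmin/xmax values printed above:

```r
limits <- c(86, 195)
bins <- 10

# Bin width: the range is divided by (bins - 1), so the bins overshoot both ends
width <- diff(limits) / (bins - 1)
boundary <- width / 2

# First edge: the last boundary at or below the lower limit
origin <- boundary + floor((limits[1] - boundary) / width) * width
breaks <- seq(origin, limits[2] + (1 - 1e-8) * width, by = width)

# Edges that fall outside the limits, which lims() turns into NA
outside <- breaks < limits[1] | breaks > limits[2]
round(breaks, 2)
breaks[outside]
```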
coord_cartesian will adjust the x limits after all this processing has happened, so it bypasses all these little quirks.
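A sketch of the coord_cartesian() version. The data here are made up (the question's src2_12.table isn't shown), so tab and x_range are hypothetical; the check at the end confirms that no observations are dropped, because the limits are only applied when the plot is drawn:

```r
# Hypothetical stand-in for src2_12.table
set.seed(42)
tab <- data.frame(Type = rep(c("Beef", "Meat", "Poultry"), each = 20),
                  Calories = round(runif(60, 86, 195)))
# Shared x range across all three types
x_range <- range(tab$Calories)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  beef <- tab[tab$Type == "Beef", ]
  # coord_cartesian() zooms the finished plot instead of censoring data and re-binning
  p <- ggplot(beef, aes(x = Calories)) +
    geom_histogram(bins = 10) +
    coord_cartesian(xlim = x_range, ylim = c(0, 6))
  # Every beef observation is still counted in some bin
  stopifnot(sum(layer_data(p)$count) == nrow(beef))
}
```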

ggplot2 animation shows some empty plots

I'm trying to plot many scatterplots in an animation, but many of the frames show just an empty plot. Which frames are empty also changes every time I run the code or adjust the range.
Some plots do work and they are all supposed to look like this:
But most of the plots look like this:
This is my code:
library(ggplot2)
library(animation)
begintime <- min(dfL$time)
endtime <- max(dfL$time)
beginRange <- begintime
endRange <- begintime + 10
dateRangeBetween <- function(x,y){dfL[dfL$time >= x & dfL$time <= y,]}
saveHTML({
for (i in 1:20) {
dfSub <- dateRangeBetween(beginRange, endRange)
ggScatterplot <- ggplot(data = dfSub, aes(x = UTM_WGS84.Longitude, y = UTM_WGS84.Latitude)) +
  ggtitle("Coordinates") + xlab("Longitude") + ylab("Latitude") +
  theme(legend.position = "top") + geom_point()
beginRange <- beginRange + 10
endRange <- endRange + 10
print(ggScatterplot)
}
}, img.name = "coordinatesplots", imgdir = "coordinatesplots", htmlfile = "coordinatesplots.html",
outdir = getwd(), autobrowse = FALSE, ani.height = 400, ani.width = 600,
verbose = FALSE, autoplay = TRUE, title = "Coordinates")
This is an example of my dataframe:
track time UTM_WGS84.Longitude UTM_WGS84.Latitude
1 1 2015-10-14 23:59:55.711 5.481687 51.43635
2 1 2015-10-14 23:59:55.717 5.481689 51.43635
3 1 2015-10-14 23:59:55.723 5.481689 51.43635
4 1 2015-10-14 23:59:55.730 5.481690 51.43635
5 1 2015-10-14 23:59:55.763 5.481691 51.43635
Can someone please help me with this?
The most likely reason for your plots being empty is that the subset of your data.frame itself is empty.
I think (it's hard to say without seeing your full data) that you're not incrementing by the right amount of time. Adding a number to a date-time (POSIXct) value adds that many seconds; adding it to a Date would add days. I suspect the full range of your data is less than 10 seconds, in which case only the first plot will show any data; after that, the time window lies outside the range of your data.
If that is the case, just change the + 10 (in the initial endRange and in both increments inside the loop) to the amount of time you actually want to add: 1 for one second, 0.1 for a tenth of a second, and so on.
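A quick base-R check of both points. The timestamps below are synthetic (about 6 seconds of data starting at the first time in the sample), so times, offsets and the 20-frame layout are assumptions standing in for dfL. Counting rows per window before plotting shows which frames would be empty, and deriving the step from the data's actual time span fills every frame:

```r
t0 <- as.POSIXct("2015-10-14 23:59:55.711", tz = "UTC")

# Adding a number to a date-time adds seconds; adding it to a Date adds days
t0 + 10
as.Date("2015-10-14") + 10

# Synthetic timestamps: 1000 readings, 6 ms apart (roughly 6 seconds in total)
times <- t0 + (0:999) * 0.006

# Rows per 10-second window, as in the question's loop
offsets <- 10 * (0:19)
counts_10s <- sapply(offsets, function(o) sum(times >= t0 + o & times <= t0 + o + 10))

# Step derived from the real time span, so 20 windows cover all the data
step <- as.numeric(difftime(max(times), min(times), units = "secs")) / 20
counts_fit <- sapply(0:19, function(k) {
  start <- t0 + k * step
  sum(times >= start & times <= start + step)
})
counts_10s
counts_fit
```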

Plot lines with their standard deviation on different x axis

In the following example, I want zone on the y axis, and D1 with its standard deviation D1sd (as shading) on the x axis. Next, I want to add D1b and its standard deviation on a second x axis. My second question: is it possible to plot the second set of data, D2, in a panel next to the first one, the way spplot puts panels next to each other? Thanks!
zone D1 D1sd D1b D1bsd D2 D2sd D2b D2bsd
-10 6.018198819 1.353674355 0.820238734 0.299921523 6.149905542 1.559112995 0.71903318 0.281436916
-9 6.016694189 1.348320178 0.790463895 0.320471326 6.225247218 1.810133214 0.690944285 0.291123921
-8 6.075920068 1.268199241 0.792396958 0.295767298 6.452827975 1.890055573 0.698130383 0.285354803
-7 6.014926533 1.15754388 0.826652396 0.269340472 6.364786271 1.677836628 0.748784125 0.262342978
-6 5.934024155 1.097224151 0.876312952 0.287715603 6.167672962 1.558124318 0.755995918 0.265152681
-5 6.180879693 1.115373166 0.911045374 0.302416557 6.429580579 1.485044161 0.783518016 0.255475422
-4 6.215761357 1.287465467 0.930981232 0.302896699 6.579955644 1.388358072 0.810873074 0.234479504
-3 6.191414137 1.297136068 0.859521028 0.301839757 6.72533907 1.383269712 0.786424272 0.242793151
-2 6.249558839 1.484243431 0.870789671 0.315339266 6.738830636 1.39348093 0.822833797 0.28853238
-1 6.279693424 1.462642241 0.890051094 0.313090388 6.665698185 1.272444414 0.849884276 0.309606843
0 6.389352438 1.653046732 0.911295197 0.332748249 6.623842834 1.3384852 0.860175975 0.311888845
1 6.421109477 1.954238381 0.917046385 0.349039084 6.633736605 1.627187751 0.880706612 0.346350393
2 6.187522396 1.994178951 0.881417644 0.38571426 6.422238767 1.685610306 0.875399565 0.351651773
3 5.975654953 2.180870669 0.871365681 0.444535385 6.245207747 1.925609129 0.915266481 0.424662193
4 5.681784682 2.182018258 0.846469896 0.38550673 6.004553419 1.947533306 0.890484046 0.404342645
5 5.550390285 2.189799132 0.834608476 0.340348644 5.831848009 1.849502381 0.887486532 0.387460845
6 5.382758749 2.460409982 0.832118248 0.360057614 5.810419947 2.06423957 0.954814407 0.38078381
7 4.819027419 2.643911373 0.78895866 0.38043413 5.42194855 2.259929373 0.935858628 0.37891625
8 3.782918423 2.584426217 0.643611576 0.335647266 4.418220284 2.186679796 0.790979174 0.364691895
9 3.064023314 2.528951519 0.496242154 0.294101493 3.64670387 2.091471213 0.592464821 0.341064247
10 2.62392179 2.707531426 0.380282732 0.249942178 3.159422995 2.392110771 0.452474888 0.334645666
Load in data
dat <- read.table(text = "zone D1 D1sd D1b D1bsd D2 D2sd D2b D2bsd
-10 6.018198819 1.353674355 0.820238734 0.299921523 6.149905542 1.559112995 0.71903318 0.281436916
-9 6.016694189 1.348320178 0.790463895 0.320471326 6.225247218 1.810133214 0.690944285 0.291123921
-8 6.075920068 1.268199241 0.792396958 0.295767298 6.452827975 1.890055573 0.698130383 0.285354803
-7 6.014926533 1.15754388 0.826652396 0.269340472 6.364786271 1.677836628 0.748784125 0.262342978
-6 5.934024155 1.097224151 0.876312952 0.287715603 6.167672962 1.558124318 0.755995918 0.265152681
-5 6.180879693 1.115373166 0.911045374 0.302416557 6.429580579 1.485044161 0.783518016 0.255475422
-4 6.215761357 1.287465467 0.930981232 0.302896699 6.579955644 1.388358072 0.810873074 0.234479504
-3 6.191414137 1.297136068 0.859521028 0.301839757 6.72533907 1.383269712 0.786424272 0.242793151
-2 6.249558839 1.484243431 0.870789671 0.315339266 6.738830636 1.39348093 0.822833797 0.28853238
-1 6.279693424 1.462642241 0.890051094 0.313090388 6.665698185 1.272444414 0.849884276 0.309606843
0 6.389352438 1.653046732 0.911295197 0.332748249 6.623842834 1.3384852 0.860175975 0.311888845
1 6.421109477 1.954238381 0.917046385 0.349039084 6.633736605 1.627187751 0.880706612 0.346350393
2 6.187522396 1.994178951 0.881417644 0.38571426 6.422238767 1.685610306 0.875399565 0.351651773
3 5.975654953 2.180870669 0.871365681 0.444535385 6.245207747 1.925609129 0.915266481 0.424662193
4 5.681784682 2.182018258 0.846469896 0.38550673 6.004553419 1.947533306 0.890484046 0.404342645
5 5.550390285 2.189799132 0.834608476 0.340348644 5.831848009 1.849502381 0.887486532 0.387460845
6 5.382758749 2.460409982 0.832118248 0.360057614 5.810419947 2.06423957 0.954814407 0.38078381
7 4.819027419 2.643911373 0.78895866 0.38043413 5.42194855 2.259929373 0.935858628 0.37891625
8 3.782918423 2.584426217 0.643611576 0.335647266 4.418220284 2.186679796 0.790979174 0.364691895
9 3.064023314 2.528951519 0.496242154 0.294101493 3.64670387 2.091471213 0.592464821 0.341064247
10 2.62392179 2.707531426 0.380282732 0.249942178 3.159422995 2.392110771 0.452474888 0.334645666", header = TRUE)
First simple solution
A first attempt. This first approach is the 'normal' way of doing it. Ordinarily we could then swap x and y with coord_flip(), but unfortunately that doesn't work with facets that use free scales.
library(ggplot2)
dat2 <- data.frame(D = rep(c("D1", "D1b", "D2", "D2b"), each = nrow(dat)),
group = rep(c('1', '2'), each = nrow(dat) * 2),
zone = dat$zone,
value = unlist(dat[c(2, 4, 6, 8)]),
SD = unlist(dat[c(3, 5, 7, 9)]))
ggplot(dat2, aes(zone, value, ymin = value - SD, ymax = value + SD, fill = group)) +
geom_point() + geom_line() + geom_ribbon(alpha = 0.2) +
facet_wrap(~D, scales = 'free') +
theme_bw()
A solution with flipped axes
You can actually get flipped axes when you manually draw the polygons. This code is hardly pretty, but you should get the idea.
polydat <- data.frame(D = rep(c("D1", "D1b", "D2", "D2b"), each = nrow(dat) * 2),
value = c(dat$D1 - dat$D1sd, rev(dat$D1 + dat$D1sd),
dat$D1b - dat$D1bsd, rev(dat$D1b + dat$D1bsd),
dat$D2 - dat$D2sd, rev(dat$D2 + dat$D2sd),
dat$D2b - dat$D2bsd, rev(dat$D2b + dat$D2bsd)),
zone = c(dat$zone, rev(dat$zone)),
group = rep(c('1', '2'), each = nrow(dat) * 4))
ggplot(dat2, aes(value, zone, fill = group)) +
geom_point() + geom_path() +
geom_polygon(data = polydat, alpha = 0.2) +
facet_wrap(~D, scales = 'free') +
theme_bw()
One way to get both series (e.g. D1 and D1b) into the same plot is to first normalize them onto a common x axis (using scale(), for example).
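A minimal sketch of that idea for the "second x axis" part, using a few rows of D1 and D1b from the table: D1b is mapped linearly onto D1's range, and sec_axis() undoes the same map so the top axis is labelled in D1b units. Using a min-max linear map rather than scale()'s z-scores is an assumption made so the secondary axis can be labelled exactly; the standard deviation would be multiplied by the same factor b. The plot is only built if ggplot2 is available:

```r
zone <- -2:2
D1  <- c(6.249558839, 6.279693424, 6.389352438, 6.421109477, 6.187522396)
D1b <- c(0.870789671, 0.890051094, 0.911295197, 0.917046385, 0.881417644)

# Linear map a + b * x that sends range(D1b) onto range(D1)
b <- diff(range(D1)) / diff(range(D1b))
a <- min(D1) - b * min(D1b)
D1b_scaled <- a + b * D1b

# The z-score alternative: mean 0, standard deviation 1
D1_z <- as.numeric(scale(D1))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(zone, D1, D1b_scaled)) +
    geom_path(aes(x = D1, y = zone, colour = "D1")) +
    geom_path(aes(x = D1b_scaled, y = zone, colour = "D1b")) +
    # The secondary axis inverts the map, so its labels are in D1b units
    scale_x_continuous(name = "D1", sec.axis = sec_axis(~ (. - a) / b, name = "D1b"))
}
```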

How to dodge points in ggplot2 in R

df = data.frame(subj=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10), block=factor(rep(c(1,2),10)), acc=c(0.75,0.83,0.58,0.75,0.58,0.83,0.92,0.83,0.83,0.67,0.75,0.5,0.67,0.83,0.92,0.58,0.75,0.5,0.67,0.67))
ggplot(df,aes(block,acc,group=subj)) + geom_point(position=position_dodge(width=0.3)) + ylim(0,1) + labs(x='Block',y='Accuracy')
How do I get points to dodge each other uniformly in the horizontal direction? (I grouped by subj in order to get it to dodge at all, which might not be the correct thing to do...)
I think this might be what you were looking for, although no doubt you have solved it by now.
Hopefully it will help someone else with the same issue.
A simple way is to use geom_dotplot like this:
ggplot(df,aes(x=block,y=acc)) +
geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.03) + ylim(0,1) + labs(x='Block',y='Accuracy')
This looks like this:
Note that x (block in this case) has to be a factor for this to work.
If they don't have to be perfectly aligned horizontally, here's one quick way of doing it, using geom_jitter. You don't need to group by subj.
Method 1 [Simpler]: Using geom_jitter()
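ggplot(df, aes(x = block, y = acc)) + geom_jitter(position = position_jitter(width = 0.05, height = 0)) + ylim(0, 1) + labs(x = 'Block', y = 'Accuracy')
Setting height = 0 keeps the accuracy values exact (by default, position_jitter() also jitters vertically). Increase the width for more horizontal spread.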
which produces:
Method 2: Deterministically calculating the jitter value for each row
We first use aggregate() to count how many times each (block, acc) pair occurs. Then, in a new data frame, we shift each repeated point to the left by a multiple of a small distance epsilon.
# Keep only block and acc (dropping subj) so that aggregate() counts duplicated (block, acc) pairs
df2 <- df[c("block", "acc")]
# a new data frame that shows duplicated values
agg.df <- aggregate(list(numdup = seq_len(nrow(df2))), df2, length)
agg.df$block <- as.numeric(agg.df$block) # convert the block factor to 1, 2 so it can be shifted
#   block  acc numdup
# 1     2 0.50      2
# 2     1 0.58      2
# 3     2 0.58      1
# 4     1 0.67      2
# ...
epsilon <- 0.02 # jitter distance
# Expand agg.df back to one row per point; the j-th copy of a value moves (j - 1) * epsilon to the left
new.df <- agg.df[rep(seq_len(nrow(agg.df)), agg.df$numdup), c("block", "acc")]
new.df$jit.value <- new.df$block - (sequence(agg.df$numdup) - 1) * epsilon
ggplot(new.df, aes(x = jit.value, y = acc)) + geom_point(size = 2) + ylim(0, 1) +
  labs(x = 'Block', y = 'Accuracy') + xlim(0, 3)
which produces:
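Method 2 always pushes repeated points to the left of the block position. Here is a variant that centres each cluster of tied points on its block, sketched in base R with ave(); the symmetric offset rule and epsilon = 0.05 are illustrative choices, not part of the answer above:

```r
acc <- c(0.75, 0.83, 0.58, 0.75, 0.58, 0.83, 0.92, 0.83, 0.83, 0.67,
         0.75, 0.5, 0.67, 0.83, 0.92, 0.58, 0.75, 0.5, 0.67, 0.67)
d <- data.frame(block = rep(1:2, 10), acc = acc)

# One key per (block, acc) pair; tied points share a key
key <- interaction(d$block, d$acc, drop = TRUE)
n <- ave(d$acc, key, FUN = length)     # size of each tie
k <- ave(d$acc, key, FUN = seq_along)  # position within the tie: 1, 2, ...

# Offsets are symmetric around 0, so each cluster is centred on its block
epsilon <- 0.05
d$x <- d$block + (k - (n + 1) / 2) * epsilon

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = x, y = acc)) + geom_point(size = 2) +
    scale_x_continuous(breaks = 1:2) + ylim(0, 1) +
    labs(x = "Block", y = "Accuracy")
}
```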

Adding vertical line in plot ggplot

I am plotting a graph using the following piece of code:
library(ggplot2)
png(filename = "graph.png")
stats <- read.table("processed-r.dat", header = TRUE, sep = ",")
stats <- stats[order(stats$best), ]  # refer to the column directly instead of using attach()
sp <- stats$A / stats$B
index <- seq_len(sum(sp >= 1.0))
stats <- data.frame(x = index, y = sp[sp >= 1.0])
# print() is needed for the plot to reach the PNG device when run with source() or Rscript
print(ggplot(data = stats, aes(x = x, y = y, group = 1)) + geom_line())
dev.off()
1 - How can I add a vertical line to the plot that crosses the curve at a particular value of y (for example 2)?
2 - How can I make the y axis start at 0.5 instead of 1?
You can add a vertical line with geom_vline(). In your case (this draws the line at x = 2):
+ geom_vline(xintercept=2)
If you also want 0.5 to appear on your y axis, add scale_y_continuous() and set limits= and breaks=:
+ scale_y_continuous(breaks=c(0.5,1,2,3,4,5),limits=c(0.5,6))
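If you only want to fix the lower end at 0.5 and let ggplot2 choose the upper end from the data, you can pass NA for the upper limit, or use expand_limits(). This avoids hard-coding an upper limit of 6, which would drop any values above it. A sketch with synthetic data standing in for the question's stats:

```r
# Synthetic data standing in for the question's stats (all y >= 1, as after filtering)
stats <- data.frame(x = 1:50, y = 1 + 7 * exp(-(1:50) / 10))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  base <- ggplot(stats, aes(x = x, y = y, group = 1)) + geom_line()
  # Lower limit fixed at 0.5; NA means "use the data maximum"
  p1 <- base + scale_y_continuous(limits = c(0.5, NA))
  # Same idea without touching the scale: make sure 0.5 is inside the plotted range
  p2 <- base + expand_limits(y = 0.5)
}
```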
Regarding the first question:
This answer assumes that the y value you want actually occurs in your data set. First, let's create a reproducible example, since I can't access your data:
set.seed(9999)
stats <- data.frame(y = sort(rbeta(250, 1, 10)*10 ,decreasing = TRUE), x = 1:250)
ggplot(data=stats, aes (x=x, y=y, group=1)) + geom_line()
What you need to do is use the y column of your data frame to look up the x position of that specific value. Essentially, you need:
ggplot(data=stats, aes (x=x, y=y, group=1)) + geom_line() +
geom_vline(xintercept = stats[stats$y == 2, "x"])
Using the data I generated above, here's an example. Since my data frame is unlikely to contain exactly 2, I'll use trunc() to find the values between 2 and 3:
stats[trunc(stats$y) == 2, ]
# y x
# 9 2.972736 9
# 10 2.941141 10
# 11 2.865942 11
# 12 2.746600 12
# 13 2.741729 13
# 14 2.693501 14
# 15 2.680031 15
# 16 2.648504 16
# 17 2.417008 17
# 18 2.404882 18
# 19 2.370218 19
# 20 2.336434 20
# 21 2.303528 21
# 22 2.301500 22
# 23 2.272696 23
# 24 2.191114 24
# 25 2.136638 25
# 26 2.067315 26
Now we know where all the values that truncate to 2 are. Since this curve is decreasing, if we reverse their x positions, the one whose y is closest to 2 comes first:
rev(stats[trunc(stats$y) == 2, "x"])[1]
# [1] 26
And we can use that value to specify where the x intercept should be:
ggplot(data=stats, aes (x=x, y=y, group=1)) + geom_line() +
geom_vline(xintercept = rev(stats[trunc(stats$y) == 2, "x"])[1])
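The trunc() trick only looks at values between 2 and 3, so it can miss a point that lies just below 2. A more general sketch takes the row whose y is nearest to the target with which.min(), or interpolates the crossing with approx(). The curve here is a made-up smooth decreasing function (y = 10 * exp(-x / 20), which equals 2 at x = 20 * log(5), about 32.19), not the rbeta data above:

```r
stats <- data.frame(x = 1:250, y = 10 * exp(-(1:250) / 20))
target <- 2

# Row whose y is closest to the target, from above or below
nearest_x <- stats$x[which.min(abs(stats$y - target))]

# Linear interpolation of the x where the curve crosses the target
# (this inverse lookup only makes sense because y is monotone)
crossing_x <- approx(x = stats$y, y = stats$x, xout = target)$y

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stats, aes(x = x, y = y, group = 1)) + geom_line() +
    geom_vline(xintercept = crossing_x, linetype = "dashed")
}
```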
Hope that helps!
