Colouring a specific point in a scatterplot in R

I'm somewhat new to R and ggplot2. I've been trying to create a scatterplot that has one specific point coloured. For example, here is my basic data frame:
manager          Confirmed  Overturned  keeping  Stands  total
A.J. Hinch              11          24        0      14     49
Angel Hernandez          0           1        0       0      1
Bill Miller              3           1        0       4      8
Bob Melvin               6          16        0       6     28
Brad Ausmus              3          11        0      13     27
With this I can create a simple scatterplot using this code:
library(ggplot2)
p <- ggplot(data = Outcome, aes(x = Overturned, y = total))
p + geom_point()
I know how to add general colour, and add a colour scale, but I don't know how to colour just one point. For example, let's say I wanted to colour A.J. Hinch blue, and make every other point a different colour (probably grey or black), how would I do that?
Here is a link to the graph I want to create in Tableau.
https://public.tableau.com/profile/julien1554#!/vizhome/ManagerChallenges2014-2015/Sheet1
All help is appreciated, thanks.

You can just add a second point layer to your plot that contains only the row you want to highlight. Here is the code that I used. Hope it helps!
df <- data.frame(Overturned = c(24, 1, 1, 16, 11), total = c(49, 1, 8, 28, 27))
library(ggplot2)
p <- ggplot(data = df, aes(x = Overturned, y = total)) # creates the graph
p + geom_point(color = "gray") + # creates the main scatter plot with gray points
  geom_point(data = df[1, ], color = "blue") # draws A.J. Hinch's point (row 1) in blue, on top
Here is the resulting graph:

Note that I'm just using the last name because when I read your data from the clipboard, it treated the first names as row labels.
library(ggplot2)
Outcome$color_me <- ifelse(Outcome$manager == "Hinch", "color_me", "normal")
textdf <- Outcome[Outcome$manager == "Hinch", ] # the highlighted row (not used in the plots below)
mycolors <- c("color_me" = "blue", "normal" = "grey50")
ggplot(data = Outcome, aes(x = Overturned, y = total)) +
  geom_point(size = 3, aes(colour = color_me))
Or, with the colours defined manually:
ggplot(data = Outcome, aes(x = Overturned, y = total)) +
  geom_point(size = 3, aes(colour = color_me)) +
  scale_color_manual("Status", values = mycolors)
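A minimal sketch of the same idea that keeps the full manager names, assuming the data are typed in directly (rather than read from the clipboard) so that "A.J. Hinch" stays in the manager column. The column name highlight is hypothetical. A logical flag is mapped to colour, and scale_colour_manual() assigns a fixed colour to each of TRUE and FALSE:

```r
library(ggplot2)

# Data from the question, entered by hand so the full names stay in `manager`
Outcome <- data.frame(
  manager    = c("A.J. Hinch", "Angel Hernandez", "Bill Miller",
                 "Bob Melvin", "Brad Ausmus"),
  Confirmed  = c(11, 0, 3, 6, 3),
  Overturned = c(24, 1, 1, 16, 11),
  keeping    = c(0, 0, 0, 0, 0),
  Stands     = c(14, 0, 4, 6, 13),
  total      = c(49, 1, 8, 28, 27)
)

# Hypothetical helper column: TRUE only for the manager to highlight
Outcome$highlight <- Outcome$manager == "A.J. Hinch"

p <- ggplot(Outcome, aes(x = Overturned, y = total, colour = highlight)) +
  geom_point(size = 3) +
  # Names in `values` are matched to the flag values; guide = "none" hides the legend
  scale_colour_manual(values = c(`TRUE` = "blue", `FALSE` = "grey50"),
                      guide = "none")
p
```

With a single mapped layer, points are drawn in row order, so a highlighted point can end up underneath others; putting it in its own layer, as in the first answer, keeps it on top.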

Related

Trouble graphing two columns on one graph in R

I just started learning R. I melted my data frame and used ggplot to get this graph. There are supposed to be two lines on the same graph, but the lines connecting the points seem random.
Correct points plotted, but wrong lines.
library(reshape2) # provides melt(); data.table also has a melt() method
library(ggplot2)
# Melted my data to create new dataframe
AvgSleep2_DF <- melt(AvgSleep_DF, id.vars = 'SleepDay_Date',
                     variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
  geom_point(aes(colour = series)) +
  geom_line(aes(colour = series))
Including or omitting aes(colour = series) in geom_line() gives the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I deliberately assign a colour column that differs from the series column!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
In ggplot2, each layer inherits the aesthetics defined in the top-level ggplot() call.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) + # setting the size to stress point layer call
geom_line() # geom_line will "inherit" a "grouping" from the colour set above
This gives you
We can instead control the grouping of each line (segment) explicitly, as follows:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +
  geom_line(aes(group = series)) # defining specific grouping
Note: because I assigned the 3rd point to a series of its own, it forms a single-point group. geom_line() has nothing to connect it to, so in this case it appears only as a point.
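One way to check which grouping geom_line() actually uses is to build both plots and inspect the line layer's data with ggplot2's layer_data(). A sketch, with df redefined so the block runs on its own; the object names p_inherit and p_series are hypothetical:

```r
library(ggplot2)

df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 2, 3, 4, 2),
  colour = factor(c(rep(1, 3), rep(2, 2))),
  series = c(1, 1, 2, 3, 3)
)

# Grouping inherited from the discrete colour mapping
p_inherit <- ggplot(df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +
  geom_line()

# Grouping set explicitly from the series column
p_series <- ggplot(df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +
  geom_line(aes(group = series))

# Layer 2 is geom_line(); its `group` column shows which points get connected
layer_data(p_inherit, 2)$group  # points 1 to 3 share one group, points 4 and 5 another
layer_data(p_series, 2)$group   # point 3 sits alone in its own group
```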

How to customise the colors in stacked bar charts

This may be a question someone has already asked.
I have a data frame (dat) that looks like this:
Sample perc cl
a 30 0
b 22 0
s 2 0
z 19 0
a 12 1
b 45 1
s 70 1
z 1 1
a 60 2
b 67 2
s 50 2
z 18 2
I would like to generate a stacked barplot. To do this I used the following:
library(ggplot2)
g <- ggplot(dat, aes(x = cl, y = perc, fill = Sample))
g + geom_bar(stat = "identity", position = "fill", show.legend = FALSE) +
  scale_fill_manual(name = "Samples", values = c("a" = "blue", "b" = "blue", "s" = "gray", "z" = "red"))
Fortunately the colors are assigned correctly. My problem is that the samples are stacked from a to z, top to bottom, but I would like the gray segment on top while the bar still runs continuously from the blue segments to the red one. Maybe there's another way to color the bars and set the desired order.
The groups are stacked in the bars in the order of the factor levels. You can change the plotting order by reordering the levels in your call to aes() with factor(var, levels(var)[order]), like this:
library(ggplot2)
# assumes dat$Sample is a factor with levels a, b, s, z,
# so levels(Sample)[c(3, 1, 2, 4)] is c("s", "a", "b", "z")
ggplot(dat, aes(x = cl, y = perc,
                fill = factor(Sample, levels(Sample)[c(3, 1, 2, 4)]))) +
  geom_bar(stat = "identity", position = "fill", show.legend = FALSE) +
  scale_fill_manual(name = "Samples",
                    values = c("a" = "blue", "b" = "blue", "s" = "gray", "z" = "red"))
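If Sample is a character column (the default for data.frame() since R 4.0.0), levels(Sample) returns NULL and every value of the reordered factor becomes NA. A sketch that spells the levels out instead, with dat typed in from the question; the column name Sample_ord is hypothetical, and geom_col() is used as the shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

# Data from the question
dat <- data.frame(
  Sample = rep(c("a", "b", "s", "z"), times = 3),
  perc   = c(30, 22, 2, 19, 12, 45, 70, 1, 60, 67, 50, 18),
  cl     = rep(c(0, 1, 2), each = 4)
)

# Explicit level order; the first level is stacked at the top of each bar,
# so s (gray) is on top, then a and b (blue), then z (red) at the bottom
dat$Sample_ord <- factor(dat$Sample, levels = c("s", "a", "b", "z"))

p <- ggplot(dat, aes(x = cl, y = perc, fill = Sample_ord)) +
  geom_col(position = "fill", show.legend = FALSE) +
  scale_fill_manual(values = c(a = "blue", b = "blue", s = "gray", z = "red"))
p
```

This works the same whether Sample starts out as a character vector or a factor, because the levels are given by name rather than by position.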

Plotting variable means for each level of the independent variable in R

Given the following code and data frame:
require(data.table)
require(ggplot2)
dat1 <- fread('J S1 S2 S3 S4 Z
1 4 5 3 2 0
1 6 5 6 5 1
2 3 5 8 9 0
2 12 11 34 44 1
3 11 23 23 22 0
3 12 15 22 21 1')
temp <- melt(dat1, id.vars = c("J", "Z"))
ggplot(temp, aes(x = J, y = value, color = variable, shape = as.factor(Z))) +
geom_point()
I'd like to plot, in the same graph, the mean of the values (S1, S2, S3, S4) for each level of J. That is, for S1, get 3 points in my graph: 5, 7.5 and 11.5. For S2, another 3 points, and so on...
I'm trying this:
ggplot(temp, aes(x = J, y = mean(value), color = variable, shape = as.factor(Z))) +
geom_point()
I get only one point for each full set of data. But I'd like to get in the same graph the mean of S1 for each level of J (1,2,3), the mean of S2 for each level of J, the mean of S3 for each level of J, and the mean of S4 for each level of J.
You need to add rows for the means to your data.
Please let me know if this makes sense or if you wish to have something different.
You can do:
library(data.table)
temp1 <- setDT(temp)[,.(value = mean(value)),by=.(J,variable)]
ggplot(temp1, aes(x = J, y = value, color=factor(variable))) +
geom_point()
Or you can do:
ggplot(temp1, aes(x = variable, y = value, color=factor(J))) +
geom_point()
EDIT, after OP's request:
To take the Z variable into account, you need to summarise by Z as well, as below, and then plot:
temp1 <- setDT(temp)[,.(value = mean(value)),by=.(J,variable,Z)]
ggplot(temp1, aes(x = variable, y = value, color=factor(J),shape=factor(Z))) +
geom_point()
Now the plot involves three categorical variables: "variable", "J" and "Z". You can swap them between the aesthetics to see what fits your needs; wrap them in factor() if you map them to shape or colour. If you want separate graphs for the 0s and 1s of Z, use facet_wrap(), like below:
ggplot(temp1, aes(x = variable, y = value, color=factor(J),shape=factor(Z))) +
geom_point() + facet_wrap(~Z)
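An alternative that skips the separate summary table is to let ggplot2 compute the means with stat_summary(). A sketch, assuming ggplot2 3.3.0 or later (older versions call the argument fun.y instead of fun); the long data are rebuilt in base R here so the block runs without data.table, and the name vars is hypothetical:

```r
library(ggplot2)

# Wide data from the question
dat1 <- data.frame(
  J  = c(1, 1, 2, 2, 3, 3),
  S1 = c(4, 6, 3, 12, 11, 12),
  S2 = c(5, 5, 5, 11, 23, 15),
  S3 = c(3, 6, 8, 34, 23, 22),
  S4 = c(2, 5, 9, 44, 22, 21),
  Z  = c(0, 1, 0, 1, 0, 1)
)

# Long format: one row per original row and S column
vars <- c("S1", "S2", "S3", "S4")
temp <- data.frame(
  J        = rep(dat1$J, times = length(vars)),
  Z        = rep(dat1$Z, times = length(vars)),
  variable = rep(vars, each = nrow(dat1)),
  value    = unlist(dat1[vars], use.names = FALSE)
)

# Within each colour group, stat_summary() replaces the values at each J by their mean
p <- ggplot(temp, aes(x = J, y = value, colour = variable)) +
  stat_summary(fun = mean, geom = "point", size = 3)
p
```

For S1 this gives the three means 5, 7.5 and 11.5, one per level of J.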

Modifying the Legend in ggplot2

I've got a problem interacting with the labels in ggplot2.
I have two data sets (Temperature vs. Time) from two experiments, recorded at different time steps. I've managed to merge the data frames and put them into long format to plot them in the same graph, using the melt function from the reshape2 package.
So, the initial data frames look something like this:
> d1
step Temp
1 512.5 301.16
2 525.0 299.89
3 537.5 299.39
4 550.0 300.58
5 562.5 300.20
6 575.0 300.17
7 587.5 300.62
8 600.0 300.51
9 612.5 300.96
10 625.0 300.21
> d2
step Temp
1 520 299.19
2 540 300.39
3 560 299.67
4 580 299.43
5 600 299.78
6 620 300.74
7 640 301.03
8 660 300.39
9 680 300.54
10 700 300.25
I combine it like this:
> mrgd <- merge(d1, d2, by = "step", all = T)
step Temp.x Temp.y
1 512.5 301.16 NA
2 520.0 NA 299.19
...
And put it into long format for ggplot2 with this:
> melt1 <- melt(mrgd, id = "step")
> melt1
step variable value
1 512.5 Temp.x 301.16
2 520.0 Temp.x NA
...
Now I want to, for example, plot a histogram of the distribution of values. I do it like this:
p <- ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) + geom_histogram(alpha = 0.4)
My problem is that I don't know how to modify the legend of this graph. I've followed what the R Graphics Cookbook suggests, but I've had no luck.
For example, I've tried this to change the labels of the legend:
> p + scale_fill_discrete(labels = c("d1", "d2"))
But this just creates a second ("new") legend box, like so:
Or, trying to remove the legend completely:
> p + scale_fill_discrete(guide = F)
I just get this
Finally, doing this also doesn't help
> p + scale_fill_discrete("")
Again, it just adds a new legend box.
Does anyone know what's happening here? It looks as if I'm actually modifying another legend object, if that makes any sense. I've looked into other related questions on this site, but I haven't found anyone with the same problem.
Get rid of the aes(color = variable...) to remove the scale that belongs to aes(color = ...).
ggplot(data = melt1, aes(x = value, fill = variable)) +
  geom_histogram(alpha = 0.4) +
  scale_fill_discrete(labels = c("d1", "d2")) # Change the labels for `fill` scale
This second plot keeps aes(color = variable...). Color in this case draws colored outlines around the histogram bins. You can turn off the color scale's legend so that you only get one legend, the one created from fill:
ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) +
  geom_histogram(alpha = 0.4) +
  scale_fill_discrete(labels = c("d1", "d2")) +
  scale_color_discrete(guide = "none") # Turn off the color (outline) legend
The most straightforward thing to do would be to not use reshape2 or merge at all, but instead to rbind your data frames:
dfNew <- rbind(data.frame(d1, Group = "d1"),
data.frame(d2, Group = "d2"))
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group)) +
geom_histogram(alpha = 0.4) +
labs(fill = "", color = "")
If you wanted to vary alpha by group:
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group, alpha = Group)) +
geom_histogram() +
labs(fill = "", color = "") +
scale_alpha_manual("", values = c(d1 = 0.4, d2 = 0.8))
Note also that the default position for geom_histogram() is "stack", so the bars won't overlap unless you use geom_histogram(position = "identity").
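When colour and fill map the same variable, ggplot2 draws a single merged legend only if both scales end up with the same title and labels. A sketch building on the rbind() approach above, with d1 and d2 re-entered from the question's printouts; the legend title "Experiment", the labels and the name legend_labels are hypothetical, and position = "identity" makes the two histograms overlap instead of stacking:

```r
library(ggplot2)

d1 <- data.frame(step = seq(512.5, 625, by = 12.5),
                 Temp = c(301.16, 299.89, 299.39, 300.58, 300.20,
                          300.17, 300.62, 300.51, 300.96, 300.21))
d2 <- data.frame(step = seq(520, 700, by = 20),
                 Temp = c(299.19, 300.39, 299.67, 299.43, 299.78,
                          300.74, 301.03, 300.39, 300.54, 300.25))

dfNew <- rbind(data.frame(d1, Group = "d1"),
               data.frame(d2, Group = "d2"))

# Hypothetical labels, given in level order (d1, then d2)
legend_labels <- c("Experiment 1", "Experiment 2")

p <- ggplot(dfNew, aes(x = Temp, colour = Group, fill = Group)) +
  geom_histogram(alpha = 0.4, position = "identity", bins = 10) +
  # Same title and labels on both scales, so they merge into one legend
  scale_fill_discrete("Experiment", labels = legend_labels) +
  scale_colour_discrete("Experiment", labels = legend_labels)
p
```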

Control dot colour in ggplot

How can I control the colour of the dots in a ggplot2 scatter plot? I need the first 20 points to have one colour, then the next 20 to have a different colour. At the moment I am using base R plot output. The matrix looks like this:
1 4
1 3
2 9
-1 8
9 9
and I have a colour vector which looks like
cols<-c("#B8DBD3","#FFB933","#FF6600","#0000FF")
then
plot(mat[,1],mat[,2],col=cols)
works.
How could I do this in ggplot?
Regarding the colours:
my cols vector looks like this
100->n
colours<-c(rep("#B8DBD3",n),rep("#FFB933",n),rep("#FF6600",n),rep("#0000FF",n),rep("#00008B",n),rep("#ADCD00",n),rep("#008B00",n),rep("#9400D3",n))
when I then do
d<-ggplot(new,aes(x=PC1,y=PC2,col=rr))
d+theme_bw() +
scale_color_identity(breaks = rep(colours, each = 1)) +
geom_point(pch=21,size=7)
the colours look completely different from
plot(new[,1],new[,2],col=colours)
this looks like
http://fs2.directupload.net/images/150417/2wwvq9u2.jpg
while ggplot with the same colours looks like
http://fs1.directupload.net/images/150417/bwc5wn7b.jpg
I would recommend creating a column that records which group each point belongs to.
library(ggplot2)
xy <- data.frame(x = rnorm(80), y = rnorm(80), col = as.factor(rep(1:4, each = 20)))
cols<-c("#B8DBD3","#FFB933","#FF6600","#0000FF")
ggplot(xy, aes(x = x, y = y, col = col)) +
theme_bw() +
scale_colour_manual(values = cols) +
geom_point()
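If the colours already sit in a per-point vector (like the colours vector in the question), they can be stored in the data and passed through unchanged with scale_colour_identity(), rather than being supplied as breaks. A sketch with random data standing in for the PC1/PC2 scores; the column name point_col is hypothetical:

```r
library(ggplot2)

set.seed(42)
n <- 20  # points per group
cols <- c("#B8DBD3", "#FFB933", "#FF6600", "#0000FF")

xy <- data.frame(x = rnorm(4 * n), y = rnorm(4 * n))
xy$point_col <- rep(cols, each = n)  # first 20 rows get cols[1], the next 20 cols[2], ...

p <- ggplot(xy, aes(x = x, y = y, colour = point_col)) +
  theme_bw() +
  geom_point(size = 3) +
  scale_colour_identity()  # use the hex strings in point_col as-is
p
```

Note that shape 21 (pch = 21) takes its outline from colour and its interior from fill, so with only colour mapped those points are drawn as coloured rings; the default geom_point() shape is a solid dot.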
