ggplot() color each point manually - r

How do I create a scatter plot in ggplot() with each point coloured manually? The necessary colours are given in my dataframe.
> head(df)
x y col
1 0.72 2757 #2AAE89
2 0.72 2757 #2DFE83
3 0.72 2757 #40FE89
4 0.70 2757 #28FE97
5 0.86 2757 #007C7D
6 0.75 2757 #24FEA1
The colour of the points must be exactly as given in the dataframe

Luckily there is a relatively easy solution: use scale_colour_identity(). See the following example:
library(ggplot2)
z <- " x y z col
1 0.72 2757 86 #2AAE89
2 0.72 2757 86 #2DFE83
3 0.72 2757 86 #40FE89
4 0.70 2757 82 #28FE97
5 0.86 2757 26 #007C7D
6 0.75 2757 79 #24FEA1"
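# comment.char = "" keeps the "#" of the hex colours; by default read.table()
# treats "#" as the start of a comment and drops the rest of the line
df <- read.table(text = z, header = TRUE, comment.char = "", stringsAsFactors = FALSE)
ggplot(df, aes(x, y, colour = col)) +
geom_point() +
scale_colour_identity()
Note that read.table() treats # as a comment character by default, so comment.char = "" is required to read the hex colours; the plotting syntax is the same either way.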
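As a rough sketch of the same idea without any text parsing, the data frame can be built directly. The object names p_identity and p_fixed are only illustrative. The first plot maps the column and lets the identity scale use the values verbatim; the second passes the colours as a fixed, unmapped aesthetic, which should give the same points.

```r
library(ggplot2)

# Build the data frame directly so no text parsing is involved
df <- data.frame(
  x = c(0.72, 0.72, 0.72, 0.70, 0.86, 0.75),
  y = c(2757, 2757, 2757, 2757, 2757, 2757),
  col = c("#2AAE89", "#2DFE83", "#40FE89", "#28FE97", "#007C7D", "#24FEA1"),
  stringsAsFactors = FALSE
)

# Option 1: map the column and let the identity scale use the values as they are
p_identity <- ggplot(df, aes(x, y, colour = col)) +
  geom_point(size = 3) +
  scale_colour_identity()

# Option 2: pass the colour vector as a fixed aesthetic, outside aes()
p_fixed <- ggplot(df, aes(x, y)) +
  geom_point(colour = df$col, size = 3)
```

Neither variant draws a legend by default, which is usually what is wanted when the colours carry no further meaning.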

Related

How to add "N = " labels to bar plot in R?

I'm looking to add "n = #" under each of the variables on the x-axis but I'm not sure how. The counts don't necessarily have to be under the names, just as long as the counts are there. I'm also working with two categorical variables, so that may be the issue too. Let me know if you have any suggestions, I'm new to R.
Here's some information on the dataset and the variables I'm comparing. The overall data set (scorpions) consists of scorpion species and what vegetation they're found in. Those are the two things I'm comparing. "species" is the vector for the species and "veg" is the vector for the vegetation type. These are both character vectors. I really just want to know how to add more labels onto my graph to give more clarification. This is what my graph currently looks like:
[image: graph]
I just want to be able to add number labels anywhere. If you want to recreate it, you can really use any dataset that consists of two character vectors. The other posts don't help because they consist of numerical vectors as well. If it's not possible to do this, then just let me know.
Thank you everyone for the help so far!
ggplot(data=scorpions, aes(x=species,y=veg,fill=veg)) +
geom_bar(stat="identity",color="black",position=position_dodge()) +
theme_stata() +
scale_fill_economist() +
theme(
axis.text.y = element_text(angle = 0),
axis.title = element_text(face="bold"),
axis.text.x = element_text(face = "italic")
) +
labs(title="Relationship Between Species and Vegetation Type")
I've tried changing the names in the Excel spreadsheet, but it looks really messy. I've also tried googling answers but nothing works since it's two categorical variables.
This question is in contrast to the most common dupe-links for grouped bar plots in ggplot2 in that other links (How to put labels over geom_bar for each bar in R with ggplot2 and How to put labels over geom_bar in R with ggplot2) tend to talk about one categorical variable only; this question asks about two categorical variables.
But it's not that hard: we just need to come up with a number for all combinations of each of the two categoricals. I'll use xtabs for that.
Using the ggplot2::diamonds dataset, plotting cut against color (both ordered factors, i.e. categorical):
library(ggplot2)
head(diamonds)
# # A tibble: 6 x 10
# carat cut color clarity depth table price x y z
# <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
# 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
# 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
# 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
# 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
# 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
Starting with a simple (non-themed) bar plot:
gg <- ggplot(data=diamonds, aes(x=cut,y=color,fill=color)) +
geom_bar(stat="identity",color="black",position=position_dodge())
gg
Calculate the frequency table:
xtabs(~ cut + color, data = diamonds)
# color
# cut D E F G H I J
# Fair 163 224 312 314 303 175 119
# Good 662 933 909 871 702 522 307
# Very Good 1513 2400 2164 2299 1824 1204 678
# Premium 1603 2337 2331 2924 2360 1428 808
# Ideal 2834 3903 3826 4884 3115 2093 896
### convert to a frame
tab <- data.frame(xtabs(~ cut + color, data = diamonds))
head(tab)
# cut color Freq
# 1 Fair D 163
# 2 Good D 662
# 3 Very Good D 1513
# 4 Premium D 1603
# 5 Ideal D 2834
# 6 Fair E 224
New plot, adding geom_text:
gg +
geom_text(data = tab, aes(label = Freq),
position = position_dodge(width = 0.9), vjust = -0.25)
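A variation on the above, offered only as a sketch: if the bar heights should themselves be the counts, the frequency table can be plotted directly with geom_col(), so each label sits above a bar of matching height. The name p_counts and the label size are arbitrary choices, not part of the original answer.

```r
library(ggplot2)

# Count every cut/color combination (35 rows: 5 cuts x 7 colors)
tab <- as.data.frame(xtabs(~ cut + color, data = diamonds))

# Bars whose heights are the counts, with the count printed above each bar
p_counts <- ggplot(tab, aes(x = cut, y = Freq, fill = color)) +
  geom_col(colour = "black", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = Freq),
            position = position_dodge(width = 0.9),  # same width as the bars
            vjust = -0.25, size = 2.5)
```

The dodge width must be the same for the bars and the text, otherwise the labels drift away from their bars.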

How can the facet function in ggplot be used to create a histogram in order to visualize distributions for all variables in a dataset?

This is an example of the variables that I would like to visualize
id post.test.score pre.test.score messages forum.posts av.assignment.score
1 0.37 0.48 68 7 0.19
2 0.52 0.37 83 22 0.28
3 0.42 0.37 81 7 0.25
4 0.56 0.34 94 14 0.27
5 0.25 0.39 42 11 0.07
I've copied the data from your post above, so you can skip the variable assignment.
library("tidyverse")
# read the table that was copied to the clipboard
df <- read.table(file = "clipboard", header = TRUE) %>%
as_tibble()
You need to modify your data structure slightly before you pass it to ggplot. Get each of your test names into a single variable with tidyr::gather. Then pipe to ggplot:
df %>%
gather(test, value, -id) %>%
ggplot(aes(x = value)) +
geom_histogram() +
facet_grid(~test)
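For reference, a sketch of the same reshaping with tidyr::pivot_longer(), which newer tidyr versions recommend over gather(). The data frame is typed in by hand from the question (the name scores is made up here), and facet_wrap() with free x scales is an assumption about what reads best when the variables have very different ranges.

```r
library(ggplot2)
library(tidyr)

# The five rows shown in the question, entered by hand
scores <- data.frame(
  id = 1:5,
  post.test.score = c(0.37, 0.52, 0.42, 0.56, 0.25),
  pre.test.score = c(0.48, 0.37, 0.37, 0.34, 0.39),
  messages = c(68, 83, 81, 94, 42),
  forum.posts = c(7, 22, 7, 14, 11),
  av.assignment.score = c(0.19, 0.28, 0.25, 0.27, 0.07)
)

# One row per id/variable pair: variable names go to "test", values to "value"
long <- pivot_longer(scores, cols = -id, names_to = "test", values_to = "value")

# One histogram per variable, each panel with its own x-axis range
p_hist <- ggplot(long, aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ test, scales = "free_x")
```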

Can't add legend panel to a certain scatter plot with multiple data sets

I simply can't find a way to show the legend panel in this specific plot made with ggplot2 in R. I just want it to appear.
For context, I'm plotting chemical abundances of sample versus the atomic number of the elements.
For background, I tried many things that are described here:
Reasons that ggplot2 legend does not appear
including links therein, however could not find a solution for my specific data set.
I know the problem could be within the structure of the data set, since I've been able to do that with other data, but I can't solve it. I also know that the problem should have to do with the theme() described in the code below, because when I use the default ggplot configuration the legends actually appear. I use this personalized theme for consistency throughout my work.
This is what I have so far removing cosmetics:
ggplot(atomic, aes(x=atomic$Z, y = atomic$avg, group=1), fill = atomic$Z) +
# plot dots for average of values
geom_point(data=atomic, aes(x=atomic$Z, y=atomic$avg, group=1, color="black"), size=0.5, alpha=1, shape=16 ) +
# connect dots for average of values
geom_line(data=atomic, aes(x=atomic$Z, y=atomic$avg, group=1), color="black", linetype= "dashed") +
# plot dots for actual values from the samples
geom_point(data=atomic, aes(x=atomic$Z, y=atomic$SDSS, group=1, color="#00ba38"), size=5, alpha=1, shape=16, color="#00ba38") +
geom_point(data=atomic, aes(x=atomic$Z, y=atomic$HE22, group=1, color="#619cff"), size=5, alpha=1, shape=16, color="#619cff") +
geom_point(data=atomic, aes(x=atomic$Z, y=atomic$HE12, group=1, color="#F8766D"), size=5, alpha=1, shape=16, color="#F8766D") +
EDIT: the Definition of base_breaks (used below)
base_breaks_x <- function(x){
b <- pretty(x)
d <- data.frame(y=-Inf, yend=-Inf, x=min(b), xend=max(b))
list(geom_segment(data=d, aes(x=x, y=y, xend=xend, yend=yend), inherit.aes=FALSE),
scale_x_continuous(breaks=b))
}
base_breaks_y <- function(x){
b <- pretty(x)
d <- data.frame(x=-Inf, xend=-Inf, y=min(b), yend=max(b))
list(geom_segment(data=d, aes(x=x, y=y, xend=xend, yend=yend), inherit.aes=FALSE),
scale_y_continuous(breaks=b))
}
# the problem might be here
theme_bw() +
theme(plot.title = element_text(hjust = 0.5),
text = element_text(size=20),
legend.position="bottom",
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
base_breaks_x(atomic$Z) +
base_breaks_y(atomic$HE22)
The data set is the following
Z Name HE22 SDSS HE12 avg
1 3 Li NA 1.00 NA 1.00
2 6 C 6.16 5.50 6.06 5.91
3 7 N NA NA 6.49 6.49
4 11 Na NA NA 3.53 3.53
5 12 Mg 5.32 4.43 4.99 4.91
6 13 Al 2.90 NA 3.08 2.99
7 14 Si NA 4.90 4.89 4.90
8 20 Ca 4.07 3.37 3.56 3.67
9 21 Sc 0.72 -0.07 0.24 0.30
10 22 Ti 2.74 1.79 2.47 2.33
11 23 V NA NA 1.18 1.18
12 24 Cr 2.88 2.14 2.67 2.56
13 25 Mn 2.34 1.59 2.44 2.12
14 26 Fe 4.92 4.14 4.59 4.55
15 27 Co 2.57 1.72 2.36 2.22
16 28 Ni 3.63 2.96 3.51 3.37
17 29 Cu NA NA 0.31 0.31
18 30 Zn 2.29 NA 2.44 2.37
19 38 Sr 0.62 0.29 0.41 0.44
20 39 Y -0.22 -0.44 -0.33 -0.33
21 40 Zr 0.60 NA 0.30 0.45
22 56 Ba 0.13 -0.10 0.12 0.05
23 57 La -0.77 -0.49 -0.77 -0.68
24 58 Ce NA NA -0.39 -0.39
25 59 Pr NA NA -0.78 -0.78
26 60 Nd -0.47 NA -0.37 -0.42
27 62 Sm NA NA -0.57 -0.57
28 63 Eu -1.02 -0.92 -0.85 -0.93
29 64 Gd NA NA -0.39 -0.39
30 66 Dy NA NA -0.16 -0.16
31 68 Er NA -0.40 NA -0.40
32 70 Yb NA -0.60 NA -0.60
33 90 Th NA -0.60 NA -0.60
where Z = atomic number, Name = element, HE12/HE22/SDSS = samples, avg = average of the samples.
I would like to know how I can add a legend panel consistent with the colours of my scatter plots.
Thank you so much! Hope I could describe the problem properly.
This is personally what I would do.
I converted the data from wide format to long format since it's easier to manipulate colors that way (Sorry I just used generic "key" and "value" since I'm not sure what you would want your columns to be named). Hopefully this will get you at least part of the way to where you want to go. Let me know if you have questions!
library(ggplot2)
library(tidyr)
p <- atomic %>%
gather(key = "key", value = "value", SDSS, HE22, HE12) %>%
ggplot(aes(Z, value, color = key))+
geom_point() +
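geom_text(aes(x = Z, y = avg, label = Name), # EDITED
color = "black") +
# name the values so each sample keeps the colour used in the question
scale_color_manual(values = c(SDSS = "#00ba38", HE22 = "#619cff", HE12 = "#F8766D"))
p +
# use bare column names inside aes() rather than atomic$...
geom_line(data = atomic, aes(x = Z, y = avg, group = 1), color = "black",
linetype = "dashed") +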
theme_bw() +
theme(plot.title = element_text(hjust = 0.5),
text = element_text(size=20),
legend.position="bottom",
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
base_breaks_x(atomic$Z) +
base_breaks_y(atomic$HE22)
EDITED
I added the geom_text() command so labels show up. You can adjust the arguments so the labels look better. I've also heard geom_text_repel() in the ggrepel package is helpful for creating nice labels: https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html#examples
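If the data should stay in wide format, one alternative is sketched below. In ggplot2 a fixed colour argument outside aes() overrides a colour mapped inside aes(), and fixed aesthetics produce no legend entries. Mapping a constant label inside aes() and translating the labels to colours with scale_colour_manual() should therefore yield a legend. Only five rows of the question's data are used, and the legend title "Sample" and axis title "Abundance" are placeholders of my own.

```r
library(ggplot2)

# A few rows of the question's data, enough to show the idea
atomic <- data.frame(
  Z = c(6, 12, 20, 21, 22),
  HE22 = c(6.16, 5.32, 4.07, 0.72, 2.74),
  SDSS = c(5.50, 4.43, 3.37, -0.07, 1.79),
  HE12 = c(6.06, 4.99, 3.56, 0.24, 2.47)
)

# Named vector: legend label -> colour
sample_colours <- c(SDSS = "#00ba38", HE22 = "#619cff", HE12 = "#F8766D")

p_legend <- ggplot(atomic, aes(x = Z)) +
  # colour is a label inside aes(), with no fixed colour argument outside it
  geom_point(aes(y = SDSS, colour = "SDSS"), size = 5) +
  geom_point(aes(y = HE22, colour = "HE22"), size = 5) +
  geom_point(aes(y = HE12, colour = "HE12"), size = 5) +
  scale_colour_manual(name = "Sample", values = sample_colours) +
  labs(y = "Abundance") +
  theme_bw() +
  theme(legend.position = "bottom")
```

The long-format approach above scales better when more samples are added, since it needs one geom_point() call instead of one per column.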

How to graduate (scale) the axes of a plot

I have two data tables:
vah_p_1
x y
0 4
0.25 5
0.27 6
0.29 7
0.31 8
0.33 10
0.34 13
0.36 16
0.37 20
0.38 23
0.39 28
0.4 37
0.41 43
0.42 55
0.43 67
0.44 81
0.45 94
0.46 118
0.47 143
0.48 187
0.49 225
vah_o_1
x y
-17.2 -9
-14.2 -8
-9.27 -7
-6.9 -6
-4.09 -5
0 -4
I need to plot the data from both tables in one graph (code below).
vah_p <- read.table(file='vah_p_1',header =TRUE)
y <- log2(vah_p$y)
x <- vah_p$x
mat_p <- data.frame(x,y)
error_p <- lm(y ~ x, mat_p)
error_p <- broom::tidy(error_p)
vah_o <- read.table(file='vah_o_1',header =TRUE)
y <- log2((vah_o$y)*(-1))
x <- vah_o$x
mat_o <- data.frame(x,y)
error_o <- lm(y ~ x, mat_o)
error_o <- broom::tidy(error_o)
library(ggplot2)
p <- ggplot(vah_p, aes(x = x, y = y)) +
geom_point() + geom_point(data = vah_o, aes(x = x, y = y))
p
After running it I get this graph:
(source: savepice.ru)
This graph looks very bad. I tried to rescale the axes so that the plot would look better, but I did not succeed. Please help.
If, as I understand the problem, you would like to change the range of an axis, use ylim() (or xlim() for the x-axis), replacing min and max with the limits you want:
ggplot() + ylim(min, max)
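To make that concrete, here is a small sketch using a subset of the two tables from the question. The limits and tick spacing are arbitrary values picked for illustration, and the object names are made up. It also shows coord_cartesian(), which zooms without discarding points, and scale_x_continuous(breaks = ...), which sets the tick marks explicitly.

```r
library(ggplot2)

# A subset of the question's two tables
vah_p <- data.frame(
  x = c(0.25, 0.31, 0.36, 0.40, 0.44, 0.49),
  y = c(5, 8, 16, 37, 81, 225)
)
vah_o <- data.frame(
  x = c(-17.2, -9.27, -4.09, 0),
  y = c(-9, -7, -5, -4)
)

p <- ggplot(vah_p, aes(x = x, y = y)) +
  geom_point() +
  geom_point(data = vah_o)

# ylim() sets the axis range; points outside it are dropped when plotting
p_limited <- p + ylim(-10, 50)

# coord_cartesian() zooms to the same range but keeps all the data
p_zoomed <- p + coord_cartesian(ylim = c(-10, 50))

# explicit tick positions on the x-axis
p_ticks <- p + scale_x_continuous(breaks = seq(-18, 2, by = 2))
```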

How to add shaded confidence intervals to line plot with specified values

I have a small table of summary data with the odds ratio, upper and lower confidence limits for four categories, with six levels within each category. I'd like to produce a chart using ggplot2 that looks similar to the usual one created when you specify a lm and its se, but I'd like R just to use the pre-specified values I have in my table. I've managed to create the line graph with error bars, but these overlap and make it unclear. The data look like this:
interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82
14 1.9 c 0.9 4.27
30 2.91 c 1.47 6.29
60 2.57 c 1.52 4.55
90 2.05 c 1.31 3.27
180 2.422 c 1.596 3.769
365 2.83 c 1.93 4.26
14 0.29 d 0.04 1.18
30 0.09 d 0.01 0.29
60 0.39 d 0.17 0.82
90 0.39 d 0.2 0.7
180 0.37 d 0.22 0.59
365 0.34 d 0.21 0.53
I have tried this:
limits <- aes(ymax=upper, ymin=lower)
dodge <- position_dodge(width=0.9)
ggplot(data, aes(y=OR, x=days, colour=Drug)) +
geom_line(stat="identity") +
geom_errorbar(limits, position=dodge)
and searched for a suitable answer to create a pretty plot, but I'm flummoxed!
Any help greatly appreciated!
You need the following lines:
p <- ggplot(data = data, aes(x = interval, y = OR, colour = Drug)) + geom_point() + geom_line()
# refer to columns by name inside aes() rather than with data$
p <- p + geom_ribbon(aes(ymin = lower, ymax = upper), linetype = 2, alpha = 0.1)
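A self-contained sketch of the ribbon approach, using only drugs "a" and "b" from the question's table. Mapping fill to Drug and removing the ribbon outline are styling assumptions of mine, intended to make each band match its line; or_data and p_ribbon are illustrative names.

```r
library(ggplot2)

# Drugs "a" and "b" from the question's table
or_data <- data.frame(
  interval = rep(c(14, 30, 60, 90, 180, 365), times = 2),
  OR = c(0.004, 0.022, 0.13, 0.22, 0.25, 0.31,
         0.84, 0.85, 0.94, 0.83, 1.28, 1.58),
  Drug = rep(c("a", "b"), each = 6),
  lower = c(0.002, 0.001, 0.061, 0.14, 0.17, 0.23,
            0.59, 0.66, 0.75, 0.68, 1.09, 1.38),
  upper = c(0.205, 0.101, 0.23, 0.34, 0.35, 0.41,
            1.19, 1.084, 1.17, 1.01, 1.51, 1.82)
)

# Shaded band between the pre-computed limits, drawn under the line and points
p_ribbon <- ggplot(or_data, aes(x = interval, y = OR, colour = Drug, fill = Drug)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
  geom_line() +
  geom_point()
```

Drawing the ribbon first keeps the lines and points visible on top of the shading.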
Here is a base R approach using polygon() since @jmb requested a solution in the comments. Note that I have to define two sets of x-values and associated y values for the polygon to plot. It works by plotting the outer perimeter of the polygon. I define plot type = 'n' and use points() separately to get the points on top of the polygon. My personal preference is the ggplot solutions above when possible since polygon() is pretty clunky.
library(tidyverse)
data('mtcars') #built in dataset
mean.mpg = mtcars %>%
group_by(cyl) %>%
summarise(N = n(),
avg.mpg = mean(mpg),
SE.low = avg.mpg - (sd(mpg)/sqrt(N)),
SE.high =avg.mpg + (sd(mpg)/sqrt(N)))
plot(avg.mpg ~ cyl, data = mean.mpg, ylim = c(10,30), type = 'n')
#note I have defined c(x1, x2) and c(y1, y2)
polygon(c(mean.mpg$cyl, rev(mean.mpg$cyl)),
c(mean.mpg$SE.low,rev(mean.mpg$SE.high)), density = 200, col ='grey90')
points(avg.mpg ~ cyl, data = mean.mpg, pch = 19, col = 'firebrick')
