Manually draw boxplot using ggplot - r

I think my question is very similar to this one, the only difference being that I'd love to use ggplot (and the answer with ggplot was missing a tiny bit of detail). I have data like this:
show<-structure(list(Median = c(20, 39, 21, 52, 45.5, 24, 36, 20, 134,
27, 44, 43), IQR = c(4, 74, 28, 51.5, 73.5, 18, 47.5, 26.5, 189.5,
46, 54, 61), FirstQuartile = c(`25%` = 19, `25%` = 24, `25%` = 12,
`25%` = 30.5, `25%` = 36.5, `25%` = 18, `25%` = 16.5, `25%` = 13,
`25%` = 53.5, `25%` = 15, `25%` = 24.5, `25%` = 27), ThirdQuartile = c(`75%` = 23,
`75%` = 98, `75%` = 40, `75%` = 82, `75%` = 110, `75%` = 36,
`75%` = 64, `75%` = 39.5, `75%` = 243, `75%` = 61, `75%` = 78.5,
`75%` = 88), Group = c("Program Director", "Editor", "Everyone",
"Board Director", "Board Director", "Program Director", "Editor",
"Everyone", "Board Director", "Everyone", "Editor", "Program Director"
), Decade = c("1980's", "1980's", "1980's", "1980's", "1990's",
"1990's", "1990's", "1990's", "2000's", "2000's", "2000's", "2000's"
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))
And I would like to draw a graph like this:
With "group" as the color, instead of "fellowship". The problem is, that graph was drawn from "complete" data (with 800ish rows), and I clearly only have summary data above. I realize it won't be able to draw outliers but that is ok. Any help would be appreciated! I'm specifically struggling with how I would draw the ymin/max and the edges of the notch. Thank you

You can use geom_boxplot() with stat = "identity" and fill in the five boxplot numbers as aesthetics.
library(ggplot2)
# show <- structure(...) # omitted for brevity
ggplot(show, aes(Decade, fill = Group)) +
geom_boxplot(
stat = "identity",
aes(lower = FirstQuartile,
upper = ThirdQuartile,
middle = Median,
ymin = FirstQuartile - 1.5 * IQR, # optional
ymax = ThirdQuartile + 1.5 * IQR) # optional
)
As pointed out by jpsmith in the comments below, the 1.5 * IQR rule becomes hairy if you don't have the range of the data. However, if you have information about the data extrema or the data domain, you can limit the whiskers as follows:
# Dummy values assuming data is >= 0 up to infinity
show$min <- 0
show$max <- Inf
ggplot(show, aes(Decade, fill = Group)) +
geom_boxplot(
stat = "identity",
aes(lower = FirstQuartile,
upper = ThirdQuartile,
middle = Median,
ymin = pmax(FirstQuartile - 1.5 * IQR, min),
ymax = pmin(ThirdQuartile + 1.5 * IQR, max))
)
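To see why the 1.5 * IQR rule only bounds the whiskers rather than fixing them, here is a minimal base R sketch. The vector x is made-up raw data for illustration only and is not the asker's data. With complete data, a whisker stops at the most extreme observation that lies inside the fence, so the fence computed from summary data alone can overshoot.

```r
# Hypothetical raw data, for illustration only
x <- c(12, 15, 18, 19, 20, 21, 23, 24, 30, 95)

q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Fences from the 1.5 * IQR rule (all you can compute from summary data)
fence_low <- q[1] - 1.5 * iqr
fence_high <- q[3] + 1.5 * iqr

# Actual whisker ends: the most extreme observations inside the fences
whisker_low <- min(x[x >= fence_low])
whisker_high <- max(x[x <= fence_high])

c(fence_low = fence_low, whisker_low = whisker_low,
  whisker_high = whisker_high, fence_high = fence_high)
```

The value 95 falls outside the upper fence and would be drawn as an outlier, while both whiskers end short of their fences.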

Related

How to get rid of annotations on faceted graph?

Problem
I am trying to label the left facet side of my graph while leaving out the annotations on the right side.
Data
Here are my libraries and data:
#### Libraries ####
library(tidyverse)
library(ggpubr)
library(plotly)
#### Dput ####
emlit <- structure(list(X = 1:20, Ethnicity = c("Asian (other than Chinese)",
"Filipino", "Indonesian", "Thai", "Japanese", "Korean", "South Asian",
"Indian", "Nepalese", "Pakistani", "Other South Asian", "Other Asian",
"White", "Mixed", "With Chinese parent", "Other mixed", "Others",
"All ethnic minorities", "All ethnic minorities, excluding\n foreign domestic helpers",
"Whole population"), Age_5.14 = c(65.8, 72.2, 69.4, 83.1, 26.6,
52.4, 67.4, 60.4, 69.5, 71.5, 92.5, 92, 34.8, 76.6, 84.2, 45.3,
51.3, 64.3, 64.3, 94.8), Age_15.24 = c(28.1, 29.2, 4.4, 72.9,
34.8, 50.3, 38.7, 41.4, 22.2, 54.3, 41.9, 64.7, 24.4, 82.9, 90.7,
37.4, 53.2, 40.6, 52.9, 96.9), Age_25.34 = c(4.5, 1.8, 4.6, 20,
17.2, 26.8, 6.6, 4.2, 6.4, 11.9, 12, 33.9, 15, 60.5, 82, 6.7,
11.2, 7.8, 21.8, 84.9), Age_35.44 = c(6.3, 2, 6.1, 35.7, 36.5,
25.5, 9.4, 6.2, 10.5, 10.1, 22.4, 35.7, 8.6, 63, 83.2, 4.5, 12.2,
9.5, 23.4, 84.6), Age_45.54 = c(8.1, 2.3, 8, 23.2, 43.4, 59.6,
7.5, 6.3, 3.9, 13.5, 28.3, 47.5, 13.1, 72.1, 84, 4.4, 22.4, 14.2,
27.7, 92.5), Age_55.64 = c(15.9, 4.4, 44, 27, 41.7, 52.8, 11.8,
7.4, 9.5, 2, 54.2, 39.6, 12.7, 75.3, 80.1, 2.6, 20.6, 25, 32.4,
94.8), Age_65. = c(31.1, 11.9, 82.6, 39, 46.4, 57, 9.5, 3.9,
NA, 11.4, 66.5, 74.5, 14.5, 80.5, 81, 57.5, 13.6, 42.7, 44, 82.3
), Age_Overall = c(10.1, 3.5, 6.4, 31.4, 35.1, 39.8, 20.4, 15.3,
16.4, 33.8, 30.4, 46.3, 15.4, 72.7, 83.9, 19.4, 19.8, 16.9, 35.2,
89.4)), class = "data.frame", row.names = c(NA, -20L))
I have also pivoted the data for my graph:
#### Pivot Data ####
emlitpivot <- emlit %>%
pivot_longer(cols = contains("Age"),
names_to = "Age_Range",
values_to = "Percent")
Plot
Here is my plot so far, a faceted graph that breaks down literacy by age with some notes on some important points on the left:
#### EM vs all ####
# Order
order <- c("5-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65+", "Overall",
"5-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65+", "Overall")
# Plot
plot <- emlitpivot %>%
filter(Ethnicity %in% c("All ethnic minorities",
"Whole population")) %>%
ggbarplot(x="Age_Range",
y="Percent",
fill = "Ethnicity",
label = T,
palette = "jco",
facet.by = "Ethnicity",
title = "EM x Native Chinese Literacy by Age",
xlab = "Age Range",
ylab = "Literacy in Chinese (By Percent)",
caption = "*Data obtained from Census and Statistics Department Hong Kong SAR, 2016.")+
theme_cleveland()+
theme(axis.text.x = element_text(angle = 45,
hjust = .5,
vjust = .5),
legend.position = "none",
plot.caption = element_text(face = "italic"))+
scale_x_discrete(labels=order)+
geom_segment(aes(x = 3, y = 15, xend = 3, yend = 48))+
geom_segment(aes(x = 1, y = 71, xend = 1, yend = 80))+
geom_segment(aes(x = 7, y = 50, xend = 7, yend = 65))+
annotate("text",
x=4,
y=53,
label = "Post-college workers can't read.")+
annotate("text",
x=3.5,
y=85,
label = "School age supports seem to boost initial literacy.")+
annotate("text",
x=6,
y=70,
label = "Increase due to generational literacy?")
# Print plot:
plot
However, you can probably guess what the problem is:
How do I get rid of the annotations on the right? I'm not sure if there is a simple way of getting rid of them, but it would be helpful to only have text on the left side.
In this case, I'd use geom_text instead of annotate, since it lets you pass a subset of your data, so the layer is drawn only in the matching facet.
library(tidyverse)
library(ggpubr)
emlitpivot %>%
filter(Ethnicity %in% c(
"All ethnic minorities",
"Whole population"
)) %>%
ggbarplot(
x = "Age_Range",
y = "Percent",
fill = "Ethnicity",
label = T,
palette = "jco",
facet.by = "Ethnicity",
title = "EM x Native Chinese Literacy by Age",
xlab = "Age Range",
ylab = "Literacy in Chinese (By Percent)",
caption = "*Data obtained from Census and Statistics Department Hong Kong SAR, 2016."
) +
theme_cleveland() +
theme(
axis.text.x = element_text(
angle = 45,
hjust = .5,
vjust = .5
),
legend.position = "none",
plot.caption = element_text(face = "italic")
) +
scale_x_discrete(labels = order) +
geom_segment(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(x = 3, y = 15, xend = 3, yend = 48)) +
geom_segment(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(x = 1, y = 71, xend = 1, yend = 80)) +
geom_segment(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(x = 7, y = 50, xend = 7, yend = 65)) +
geom_text(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(4, 53), label = "Post-college workers can't read.", check_overlap = T) +
geom_text(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(3.5, 85), label = "School age supports seem to boost initial literacy.", check_overlap = T) +
geom_text(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(6, 70), label = "Increase due to generational literacy?", check_overlap = T)
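Update: removing the lines in the second facet as well.
Create a data frame holding your text labels and their positions, including the facet variable (Ethnicity), and add it to the plot.
To restrict the line segments to one facet, follow the same procedure:
ann_text: data frame for the text
segm: data frame for the lines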
ann_text <- data.frame(x = c(4, 3.5, 6),
y = c(53, 85, 70),
lab = c("Post-college workers can't read.", "School age supports seem to boost initial literacy.",
"Increase due to generational literacy?"),
Ethnicity = rep("All ethnic minorities", 3))
segm <- data.frame(x = c(3,1,7),
y = c(15, 71, 50),
xend = c(3,1,7),
yend = c(48,80,65),
Ethnicity = rep("All ethnic minorities", 3))
plot1 <- plot +
geom_text(
data = ann_text,
mapping = aes(x = x, y = y, label = lab)
)
plot1 + geom_segment(
data = segm,
mapping = aes(x = x, y = y, xend = xend, yend = yend)
)
Remove the annotate() calls below (and, likewise, the three geom_segment() calls) from your original plot code:
annotate("text",
x=4,
y=53,
label = "Post-college workers can't read.")+
annotate("text",
x=3.5,
y=85,
label = "School age supports seem to boost initial literacy.")+
annotate("text",
x=6,
y=70,
label = "Increase due to generational literacy?")
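Both answers rely on the same mechanism: a layer whose data contains the faceting variable is drawn only in the panels whose value matches. A minimal sketch with made-up toy data (df, ann and the grp column are hypothetical names, not from the question) shows this, using layer_data() to inspect which panel the text layer ends up in.

```r
library(ggplot2)

# Hypothetical toy data with two facets, "a" and "b"
df <- data.frame(x = c(1, 2, 1, 2),
                 y = c(3, 4, 5, 6),
                 grp = c("a", "a", "b", "b"))

# The annotation data carries the facet variable, so it only appears in panel "a"
ann <- data.frame(x = 1.5, y = 5, lab = "only in a", grp = "a")

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  facet_wrap(~grp) +
  geom_text(data = ann, aes(label = lab))

# Inspect the computed data of the second layer (the text layer)
text_layer <- layer_data(p, 2)
text_layer[, c("x", "y", "label", "PANEL")]
```

By contrast, annotate() has no data argument and therefore no facet variable, which is why it repeats in every panel.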

Is it possible to show a x-axis variable on multiple lines in ggplot2 [duplicate]

I have some data for which the variable names are too long. When I don't set them at an angle, they overlap. When I set them at an angle, they look like the example below.
What I would like to do is simply have the possibility to write the problematic variable as:
This is a very long
name specifically
for the example
But I cannot figure out how to do this in ggplot2.
library(ggplot2)
library(dplyr) # provides %>%, which the plotting code below uses
counts <- structure(list(ECOST = c("0.52", "0.52", "0.39", "0.39", "0.26",
"0.26", "0.13", "0.13", "0.00", "This is a very long name specifically for the example"), group = c("control",
"treatment", "control", "treatment", "control", "treatment",
"control", "treatment", "control", "treatment"), count = c(18,
31, 30, 35, 47, 46, 66, 68, 86, 86), percentage = c(16.3636363636364,
31.9587628865979, 27.2727272727273, 36.0824742268041, 42.7272727272727,
47.4226804123711, 60, 70.1030927835051, 78.1818181818182, 88.659793814433
), total = c(110, 97, 110, 97, 110, 97, 110, 97, 110, 97), negative_count = c(92,
66, 80, 62, 63, 51, 44, 29, 24, 11), p_value = c(0.00843644912924255,
0.00843644912924255, 0.172947686684261, 0.172947686684261, 0.497952719783453,
0.497952719783453, 0.128982570547408, 0.128982570547408, 0.0447500820026408,
0.0447500820026408)), row.names = c(NA, -10L), class = c("data.table",
"data.frame"))
ECOST group count percentage total negative_count p_value
1: 0.52 control 18 16 110 92 0.0084
2: 0.52 treatment 31 32 97 66 0.0084
3: 0.39 control 30 27 110 80 0.1729
4: 0.39 treatment 35 36 97 62 0.1729
5: 0.26 control 47 43 110 63 0.4980
6: 0.26 treatment 46 47 97 51 0.4980
7: 0.13 control 66 60 110 44 0.1290
8: 0.13 treatment 68 70 97 29 0.1290
9: 0.00 control 86 78 110 24 0.0448
10: This is a very long name specifically for the example treatment 86 89 97 11 0.0448
counts %>%
ggplot(aes(x = ECOST, y = percentage, fill = group, label=sprintf("%.02f %%", round(percentage, digits = 1)))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = .9), # move to center of bars
vjust = -0.5, # nudge above top of bar
size = 4) +
scale_fill_grey(start = 0.8, end = 0.5) +
theme_bw(base_size = 15) +
theme(axis.text.x=element_text(angle=45,hjust=1))
The simplest solution is to use str_wrap from the stringr package, which inserts the line breaks automatically and keeps your plot code reusable in other scenarios. The scales package also provides label_wrap and wrap_format, which can be convenient in some cases (for example, here you can also use scale_x_discrete(labels = scales::wrap_format(20))).
library(tidyverse)
library(ggplot2)
counts <- structure(list(ECOST = c("0.52", "0.52", "0.39", "0.39", "0.26",
"0.26", "0.13", "0.13", "0.00", "This is a very long name specifically for the example"), group = c("control",
"treatment", "control", "treatment", "control", "treatment",
"control", "treatment", "control", "treatment"), count = c(18,
31, 30, 35, 47, 46, 66, 68, 86, 86), percentage = c(16.3636363636364, 31.9587628865979, 27.2727272727273, 36.0824742268041, 42.7272727272727,
47.4226804123711, 60, 70.1030927835051, 78.1818181818182, 88.659793814433
), total = c(110, 97, 110, 97, 110, 97, 110, 97, 110, 97), negative_count = c(92,
66, 80, 62, 63, 51, 44, 29, 24, 11), p_value = c(0.00843644912924255,
0.00843644912924255, 0.172947686684261, 0.172947686684261, 0.497952719783453,
0.497952719783453, 0.128982570547408, 0.128982570547408, 0.0447500820026408,
0.0447500820026408)), row.names = c(NA, -10L), class = c("data.table",
"data.frame"))
counts %>%
ggplot(aes(x = ECOST, y = percentage, fill = group, label=sprintf("%.02f %%", round(percentage, digits = 1)))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = .9), # move to center of bars
vjust = -0.5, # nudge above top of bar
size = 4) +
scale_fill_grey(start = 0.8, end = 0.5) +
theme_bw(base_size = 15) +
theme(axis.text.x=element_text(angle=45,hjust=1)) +
scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 20))
Created on 2021-02-22 by the reprex package (v0.3.0)
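To see what such a labelling function does to the labels before they reach the axis, here is a rough base R sketch. It uses strwrap() rather than stringr::str_wrap(), so the exact break points may differ from the answer above; wrap_label is a hypothetical helper name.

```r
# Hypothetical helper: wrap each label and join the pieces with newlines
wrap_label <- function(x, width = 20) {
  vapply(
    x,
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

labels <- c("0.52", "This is a very long name specifically for the example")
wrapped <- wrap_label(labels)

# Short labels pass through unchanged; long ones gain line breaks
cat(wrapped[2], "\n")
```

A function like this can be passed as scale_x_discrete(labels = wrap_label), in the same way as the anonymous function in the answer.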
You can break lines using \n.
Code:
library(ggplot2)
counts <- structure(list(ECOST = c("0.52", "0.52", "0.39", "0.39", "0.26",
"0.26", "0.13", "0.13", "0.00", "This is a \nvery long name \nspecifically for the \nexample"), group = c("control",
"treatment", "control", "treatment", "control", "treatment",
"control", "treatment", "control", "treatment"), count = c(18,
31, 30, 35, 47, 46, 66, 68, 86, 86), percentage = c(16.3636363636364,
31.9587628865979, 27.2727272727273, 36.0824742268041, 42.7272727272727,
47.4226804123711, 60, 70.1030927835051, 78.1818181818182, 88.659793814433
), total = c(110, 97, 110, 97, 110, 97, 110, 97, 110, 97), negative_count = c(92,
66, 80, 62, 63, 51, 44, 29, 24, 11), p_value = c(0.00843644912924255,
0.00843644912924255, 0.172947686684261, 0.172947686684261, 0.497952719783453,
0.497952719783453, 0.128982570547408, 0.128982570547408, 0.0447500820026408,
0.0447500820026408)), row.names = c(NA, -10L), class = c("data.table",
"data.frame"))
library(dplyr)
counts %>%
ggplot(aes(x = ECOST, y = percentage, fill = group, label=sprintf("%.02f %%", round(percentage, digits = 1)))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = .9), # move to center of bars
vjust = -0.5, # nudge above top of bar
size = 4) +
scale_fill_grey(start = 0.8, end = 0.5) +
theme_bw(base_size = 15) +
theme(axis.text.x=element_text(angle=45,hjust=1))

R - tidyverse/ggplot bar chart with custom discrete data labels and sorted by one variable?

I have a data frame with which I am learning tidyverse methods in R that looks like this:
> glimpse(data)
Observations: 16
Variables: 6
$ True.species <fct> Badger, Blackbird, Brown hare, Domestic cat, Domestic d...
$ misidentified <dbl> 17, 16, 59, 20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, ...
$ missed <dbl> 61, 106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259,...
$ Total <dbl> 78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391, 77, 369...
$ PrMissed <dbl> 0.7820513, 0.8688525, 0.1060606, 0.5454545, 0.5714286, ...
$ PrMisID <dbl> 0.21794872, 0.13114754, 0.89393939, 0.45454545, 0.42857...
Here is the dput():
data <- structure(list(True.species = structure(c(1L, 2L, 3L, 5L, 6L,
7L, 8L, 9L, 13L, 16L, 17L, 18L, 20L, 21L, 22L, 23L), .Label = c("Badger",
"Blackbird", "Brown hare", "Crow", "Domestic cat", "Domestic dog",
"Grey squirrel", "Hedgehog", "Horse", "Human", "Jackdaw", "Livestock",
"Magpie", "Muntjac", "Nothing", "Pheasant", "Rabbit", "Red fox",
"Red squirrel", "Roe Deer", "Small rodent", "Stoat or Weasel",
"Woodpigeon"), class = "factor"), misidentified = c(17, 16, 59,
20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, 5, 13), missed = c(61,
106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259, 473, 9, 17
), Total = c(78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391,
77, 369, 494, 14, 30), PrMissed = c(0.782051282051282, 0.868852459016393,
0.106060606060606, 0.545454545454545, 0.571428571428571, 0.869565217391304,
0.797101449275362, 0.666666666666667, 0.833333333333333, 0.840909090909091,
0.51150895140665, 0.753246753246753, 0.70189701897019, 0.95748987854251,
0.642857142857143, 0.566666666666667), PrMisID = c(0.217948717948718,
0.131147540983607, 0.893939393939394, 0.454545454545455, 0.428571428571429,
0.130434782608696, 0.202898550724638, 0.333333333333333, 0.166666666666667,
0.159090909090909, 0.48849104859335, 0.246753246753247, 0.29810298102981,
0.0425101214574899, 0.357142857142857, 0.433333333333333)), row.names = c(NA,
-16L), class = "data.frame")
I managed to make a rudimentary plot of what I want with ggplot() as follows:
ggplot(data = data, aes(x = True.species, y = PrMissed)) + geom_bar(stat = "identity")
But there are three things I can't figure out how to do:
1) I want a stacked bar chart where the variables PrMissed and PrMisID are on top of each other. Note that PrMissed + PrMisID == 1 for each row in the data frame, so the final plot would have equally high stacks but each containing two colors (how do I specify them?), one for PrMissed and another for PrMisID.
2) I want the order of the bars to be in ascending order of the PrMissed variable so that Brown hare would be on one end and Small rodent on the other.
3) I prefer this plot to be "flipped" on its side so that the labels (the animal names like "Brown hare") are on the left side and easier to read. An added complexity is that rather than the labels simply saying the animal name, I want them to also show the corresponding Total value, so for example Brown hare would get a corresponding axis label like "Brown hare (total = 66)".
I have been trying for a long time and for the life of me couldn't figure out an idiomatic way to do this with ggplot(). I know the answer might be simple, so please excuse my ignorance. Can anyone help? Thanks in advance.
Here's my answer, which does not require data.table and uses ggplot2, dplyr, magrittr, and reshape2:
library(ggplot2)
library(reshape2)
library(magrittr)
library(dplyr)
# order Species by PrMissed value
data$True.species <- factor(data$True.species,
levels = data[order(data$PrMissed, decreasing = F),"True.species"])
# reshape to have the stackable values and plot
melt(data,
id.vars = c("True.species", "misidentified", "missed", "Total"),
measure.vars = c("PrMissed", "PrMisID")) %>%
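mutate(x_axis_text = paste0(True.species, " (total = ", Total, ")"),
# keep the bars in the PrMissed order set above (plain strings would sort alphabetically)
x_axis_text = reorder(x_axis_text, as.integer(True.species))) %>%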
ggplot(aes(x = x_axis_text, y = value, fill = variable) ) +
geom_bar(stat = "identity") +
coord_flip()
Which would result in a plot like this
Breakdown of the code:
Your three points are addressed as follows.
1) To have stackable values, they need to be all in one column, so using melt from the reshape2 package we tidy the data and create 2 new columns in the data. One is value containing the values from 0 to 1 and the other is variable indicating if that number is associated with PrMissed or PrMisID
2) Before melting the data we convert the True.species values into factor based on PrMissed values. Use decreasing = T to invert the order if you wish.
3) coord_flip() flips the x and y axes so that the species are on the y axis instead of the x axis and you can easily read them on the left side.
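The reshaping and ordering steps can be sketched in base R on the first three species from the question (proportions rounded to three decimals for brevity). The names wide, long and label are chosen here for illustration; the long data frame has the same shape as what melt produces.

```r
# First three species from the question, proportions rounded
wide <- data.frame(
  True.species = c("Badger", "Blackbird", "Brown hare"),
  Total = c(78, 122, 66),
  PrMissed = c(0.782, 0.869, 0.106),
  PrMisID = c(0.218, 0.131, 0.894)
)

# Axis label per species, stored as a factor ordered by ascending PrMissed
wide$label <- paste0(wide$True.species, " (total = ", wide$Total, ")")
wide$label <- factor(wide$label, levels = wide$label[order(wide$PrMissed)])

# Long format: one row per species and proportion type
long <- data.frame(
  label = rep(wide$label, times = 2),
  variable = rep(c("PrMissed", "PrMisID"), each = nrow(wide)),
  value = c(wide$PrMissed, wide$PrMisID)
)

levels(long$label)
```

Because label is a factor, ggplot keeps the level order on the axis instead of sorting the strings alphabetically.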
I can help with a data.table and ggplot2 solution:
First, you'll need to make your wide table a long one with melt. Then, you're looking for position = "stack" argument to geom_bar:
Also, please note that naming a table data is a bad idea, as there's a function called data().
library(data.table)
df <- as.data.table(data) # assumed: df is a data.table copy of the question's data
ggplot(melt(df[, .(True.species, PrMissed, PrMisID)],
id.vars="True.species"),
aes(x = True.species, y = value, fill = variable))+
geom_bar(position = "stack", stat = "identity")
I forgot about the sorting... (and rotation of texts, so they are readable):
ggplot(melt(df[, .(True.species, PrMissed, PrMisID)],
id.vars="True.species"),
aes(x = True.species, y = value,
fill = variable))+
geom_bar(position = "stack", stat = "identity")+
theme(axis.text.x = element_text(angle = 90))+
scale_x_discrete(limits = as.character(df$True.species[order(df$PrMissed)])) # ascending PrMissed, as asked

How to edit the score ranges in highcharter-histogram tooltip

I am using the highcharter package to make a histogram. I want to get rid of the default score range display -(x, y]- and replace it with something that reads: score ranges: x to y
library(highcharter)
apple <- c(0, 22, 5, 32, 34, 35, 56, 67, 42, 67, 12, 99, 46, 78, 43, 67, 33, 11)
hchart(apple, color = "#a40c19", breaks = 20) %>%
hc_yAxis(title = list(text = "Number of Apples")) %>%
hc_xAxis(title = list(text = "Score (0-100)")) %>%
hc_tooltip(borderWidth = 1, sort = TRUE, crosshairs = TRUE,
pointFormat = "Score Range: {point.x} to {point.x} <br> Number of Apples: {point.y}") %>%
hc_legend(enabled = FALSE)
For example, in the picture below, I want to get rid of the heading (30, 35] and replace it with Score Range: 30 to 35.
First of all, you need to know the interval length (bin width) of your histogram:
library(highcharter)
apple <- c(0, 22, 5, 32, 34, 35, 56, 67, 42, 67, 12, 99, 46, 78, 43, 67, 33, 11)
h <- hist(apple, breaks = 20)
d <- diff(h$breaks)[1]
d
# [1] 5
Now use pointFormatter instead of pointFormat, because it gives you more control over the tooltip output: pointFormat takes a string template, whereas pointFormatter takes a JavaScript function.
Put the interval length (5 here) into that function to get the right limits for every interval. This could be done more elegantly, but that's the idea.
hchart(h, color = "#a40c19", breaks = 20) %>%
hc_yAxis(title = list(text = "Number of Apples")) %>%
hc_xAxis(title = list(text = "Score (0-100)")) %>%
hc_tooltip(borderWidth = 1, sort = TRUE, crosshairs = TRUE,
headerFormat = "",
pointFormatter = JS("function() {
return 'Score Range:' + (this.x - 5/2) + ' to ' + (this.x + 5/2) + '<br> Number of Apples:' + this.y;
}")) %>%
hc_legend(enabled = FALSE)
Finally you use headerFormat = "" to remove the header.
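One way to avoid hard-coding the 5 is to build the JavaScript string from d in R. This is only a sketch of the string construction; formatter_js is a name chosen here, and the function body follows the answer's convention of offsetting this.x by half the interval.

```r
apple <- c(0, 22, 5, 32, 34, 35, 56, 67, 42, 67, 12, 99, 46, 78, 43, 67, 33, 11)

# Compute the histogram without drawing it and take the bin width
h <- hist(apple, breaks = 20, plot = FALSE)
d <- diff(h$breaks)[1]

# Inject the bin width into the JavaScript formatter source
formatter_js <- sprintf(
  paste0(
    "function() {\n",
    "  return 'Score Range: ' + (this.x - %1$s / 2) + ' to ' + (this.x + %1$s / 2) +\n",
    "    '<br> Number of Apples: ' + this.y;\n",
    "}"
  ),
  format(d)
)

cat(formatter_js, "\n")
```

The resulting string can then be passed as hc_tooltip(pointFormatter = JS(formatter_js)) in place of the literal in the code above.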

R: How Plot an Excel Table(Matrix) with R

I have a problem that I still haven't figured out how to solve. I want to plot all the values MW1, MW2 and MW3 as a function of "DHT + Procymidone". How can I plot all these values in one graphic so that I get 3 different curves (in different colors and with different numbers, like curve 1, 2, ...)? I also want the labels of the x-values ("DHT + Procymidone") to be like -10, -9, ... , -4 instead of 1,00E-10, ...
DHT + Procymidone   MW 1               MW 2               MW 3
1,00E-10            114,259526780335   111,022461066274   213,212408408682
1,00E-09            115,024187788314   111,083316791613   114,529425136628
1,00E-08            110,517449986348   107,867941606743   125,10230718665
1,00E-07            100,961311263444   98,4219995773135   116,045168653416
1,00E-06            71,2383604211297   73,539659636842    50,3213799775309
1,00E-05            20,3553333652104   36,1345771905088   15,42260866106
1,00E-04            4,06189509055904   18,1246447874679   10,1988107887318
I have shortened your data frame for convenience reasons, so here's an example:
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
library(tidyr)
library(ggplot2)
mydf <- gather(mydat, "grp", "MW", 2:4)
ggplot(mydf, aes(x = DHT_Procymidone, y = MW, colour = grp)) + geom_line()
which gives following plot:
To use ggplot, your data needs to be in long-format. gather does this for you, appending columns MW1-MW3 into one column, while the column names are added as new column values in the grp-column. This group-column allows to identify different groups, i.e. different colored lines in the plot.
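For readers on a recent tidyr, the same reshaping can be written with pivot_longer(), which supersedes gather(). This is a sketch of an equivalent call, not the answerer's code; it reuses the shortened data from above.

```r
library(tidyr)

mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
                    MW1 = c(114, 115, 110, 100, 72, 20, 4),
                    MW2 = c(111, 111, 107, 98, 73, 36, 18),
                    MW3 = c(213, 114, 123, 116, 50, 15, 10))

# Stack MW1..MW3 into one value column; the old column names go into grp
mydf <- pivot_longer(mydat, cols = MW1:MW3,
                     names_to = "grp", values_to = "MW")

head(mydf)
```

Each of the 7 concentrations now appears three times, once per group, which is what lets ggplot map grp to the line colour.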
Depending on the type of DHT + Procymidone: if it is numeric, format(..., scientific = FALSE) prints it in fixed notation as a character string; note that this gives 0.0000000001 (and not -10).
However, if this data column is a character vector (you can coerce with as.character), this may work:
a <- "1,00E-10"
sub("1,00E", "", a, fixed = TRUE)
> [1] "-10"
An alternative to @Daniel's answer that doesn't rely on ggplot (thanks, Daniel, for providing the reproducible data):
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
plot(mydat[,2] ~ mydat[,1], type = "l", ylim = c(0,220), xlim = c(-10,-2), xlab = "DHT Procymidone", ylab = "MW")
lines(mydat[,3] ~ mydat[,1], col = "blue")
lines(mydat[,4] ~ mydat[,1], col = "red")
legend(x = -4, y = 200, legend = c("MW1","MW2","MW3"), lty = 1, bty = "n", col = c("black","blue","red"))
To change axis labels see the text in xlab and ylab. To change axis limits see xlim and ylim.
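As a variation on the same base graphics approach, matplot() can draw all three columns in one call instead of one plot() plus two lines(). This is a sketch using the same shortened data; cols and y are names chosen here.

```r
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
                    MW1 = c(114, 115, 110, 100, 72, 20, 4),
                    MW2 = c(111, 111, 107, 98, 73, 36, 18),
                    MW3 = c(213, 114, 123, 116, 50, 15, 10))

cols <- c("black", "blue", "red")
y <- as.matrix(mydat[, c("MW1", "MW2", "MW3")])

# One line per column of y, all against the same x values
matplot(mydat$DHT_Procymidone, y, type = "l", lty = 1, col = cols,
        xlab = "DHT Procymidone", ylab = "MW")
legend("topright", legend = colnames(y), lty = 1, col = cols, bty = "n")
```

The y axis limits are taken from the full matrix, so ylim does not need to be set by hand.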
