I used ggplot2 to create the following barplot. However, I would like to add stars to show the significance of the difference between dark and light within the same treatment, for instance for the SW.5 treatment.
calen_per<- read.table("sadek calen percent.csv", sep=";",header = TRUE)
calen_per
DW SW.1 SW.2 SW.3 SW.4 SW.5 LVD
70 85 80 70 75 84 Light (79.5a)
75 90 85 77 72 86 Light (79.5a)
78 85 80 75 75 90 Light (79.5a)
72 70 74 65 70 70 Dark (70.8b)
75 72 70 70 72 75 Dark (70.8b)
70 75 70 70 65 70 Dark (70.8b)
cal_per <- melt(calen_per,id="LVD")
g8<-ggplot(cal_per, aes(x=variable, y=value, fill=as.factor(LVD))) +
stat_summary(fun.y=mean,geom="bar",position=position_dodge(),
colour="black",width=.7,size=.5)+
stat_summary(fun.ymin=min,fun.ymax=max,geom="errorbar",
color="black",position=position_dodge(.7), width=.2) +
stat_summary(geom = 'text', fun.y = max, position = position_dodge(.7),
label = c("B","a","A","a","AB","a", "B","a","B","a","A","a"), vjust = -0.5)+
scale_fill_manual("marigold",
values = c("Light (79.5a)" = "white", "Dark (70.8b)" = "grey")) +
xlab("treatments")+ylab("percentage") +
theme(panel.background = element_rect(fill = 'white', colour = 'black'))+
ylim(0,120)
How can I do this?
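One possible approach, as a rough sketch rather than a tested answer for this exact plot: compute a p-value per treatment for Light vs. Dark, turn it into a star label, and draw those labels with an extra geom_text layer. The Welch t-test, the star thresholds, the shortened LVD labels, the `annot` data frame and the vertical offset of 8 are all my assumptions; the question does not say which test was used, and with only three values per group any test is weak.

```r
# Data re-typed from the question; LVD is shortened to "Light"/"Dark" here.
calen_per <- data.frame(
  DW   = c(70, 75, 78, 72, 75, 70),
  SW.1 = c(85, 90, 85, 70, 72, 75),
  SW.2 = c(80, 85, 80, 74, 70, 70),
  SW.3 = c(70, 77, 75, 65, 70, 70),
  SW.4 = c(75, 72, 75, 70, 72, 65),
  SW.5 = c(84, 86, 90, 70, 75, 70),
  LVD  = rep(c("Light", "Dark"), each = 3)
)

treatments <- setdiff(names(calen_per), "LVD")
is_light   <- calen_per$LVD == "Light"

# Welch two-sample t-test per treatment (assumed test, n = 3 per group)
p_values <- sapply(treatments, function(tr) {
  t.test(calen_per[[tr]][is_light], calen_per[[tr]][!is_light])$p.value
})

# Assumed thresholds: *** p <= 0.001, ** p <= 0.01, * p <= 0.05, else no star
stars <- as.character(cut(p_values,
  breaks = c(0, 0.001, 0.01, 0.05, 1),
  labels = c("***", "**", "*", ""),
  include.lowest = TRUE))

# One label per treatment, placed a bit above the tallest value
annot <- data.frame(
  variable = treatments,
  value    = sapply(treatments, function(tr) max(calen_per[[tr]])) + 8,
  label    = stars
)
print(annot)

# Then, on the plot object from the question, something like:
# g8 + geom_text(data = annot, aes(x = variable, y = value, label = label),
#                inherit.aes = FALSE, size = 6)
```

The offset probably needs adjusting so the stars do not collide with the letter labels already drawn above the bars.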
Hi, I would like to create a barplot like this
but the bars should be filled according to the values of this plot, leaving the rest in a color like gray or black:
To produce the barplots I used:
> table
Var1 Freq
1 H3K27ac 72
2 H3K27me3 72
3 H3K36me3 72
4 H3K4me1 72
5 H3K4me2 66
6 H3K4me3 72
7 H3K9ac 66
8 H3K9me3 71
> table_filt
Var1 Freq
1 H3K27ac 31
2 H3K27me3 72
3 H3K36me3 0
4 H3K4me1 71
5 H3K4me2 66
6 H3K4me3 72
7 H3K9ac 60
8 H3K9me3 1
and the code is:
table%>%
ggplot(aes(Var1, Freq, fill = Var1)) +
geom_col() +
scale_fill_manual(values = colours)
table_filt%>%
ggplot(aes(Var1, Freq, fill = Var1)) +
geom_col() +
scale_fill_manual(values = colours)
The colour vector is:
colours
H3K27ac H3K27me3 H3K36me3 H3K4me1 H3K4me2 H3K4me3 H3K9ac H3K9me3
"mediumvioletred" "#E69F00" "#56B4E9" "#009E73" "#F0E442" "#0072B2" "firebrick4" "aquamarine"
I appreciate any suggestion.
Are you looking for something like this?
library(ggplot2)
ggplot(table, aes(Var1, Freq)) +
geom_col(fill = "gray75") +
geom_col(data = table_filt, aes(fill = Var1)) +
scale_fill_brewer(palette = "Set1")
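If you would rather keep your own colour vector than the Brewer palette, the same layering should work with scale_fill_manual. This is a self-contained sketch with the two tables re-typed from the question; the names `all_counts` and `filt_counts` are mine (chosen to avoid masking base::table).

```r
library(ggplot2)

marks <- c("H3K27ac", "H3K27me3", "H3K36me3", "H3K4me1",
           "H3K4me2", "H3K4me3", "H3K9ac", "H3K9me3")

# Full counts (grey background bars) and filtered counts (coloured bars)
all_counts  <- data.frame(Var1 = marks, Freq = c(72, 72, 72, 72, 66, 72, 66, 71))
filt_counts <- data.frame(Var1 = marks, Freq = c(31, 72, 0, 71, 66, 72, 60, 1))

# Named colour vector as posted in the question
colours <- c(H3K27ac = "mediumvioletred", H3K27me3 = "#E69F00",
             H3K36me3 = "#56B4E9", H3K4me1 = "#009E73",
             H3K4me2 = "#F0E442", H3K4me3 = "#0072B2",
             H3K9ac = "firebrick4", H3K9me3 = "aquamarine")

p <- ggplot(all_counts, aes(Var1, Freq)) +
  geom_col(fill = "gray75") +                      # drawn first, underneath
  geom_col(data = filt_counts, aes(fill = Var1)) + # drawn on top
  scale_fill_manual(values = colours)
p
```

Because the coloured layer is drawn second, each filtered bar covers the lower part of its grey bar, so the remaining grey shows how much was filtered out.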
My intention is to plot several locations for which I have the longitude and the latitude onto a map (as simple dots). The locations are distributed across Uganda.
print(locations)
Latitude Longitude
1 0.482980 30.212160
2 0.647717 30.315984
3 0.44735 30.18063
4 0.58416316 30.2066327
5 0.60012 30.19998
6 0.433483 30.20179
7 0.625317 30.224837
8 0.654277 30.251667
9 0.387517 30.197475
10 0.607402 30.292068
11 0.770128 30.403456
12 0.767266 30.414246
13 0.777873 30.389111
14 0.631774 30.290356
15 0.734015 30.279161
16 0.722133 30.277941
17 0.66322994 30.22795225
18 0.66900827 30.21357739
19 0.450372 30.197764
20 0.493699 30.250891
21 0.479716 30.180958
22 0.483242 30.284576
23 0.645044 30.321270
24 0.602389 30.275637
25 0.868827 30.465939
26 0.631194 30.263565
27 0.631576 30.263855
28 0.413701 30.247934
29 0.67135 30.2675
30 0.492360 30.223620
31 0.81481 30.39311
32 0.396665 30.26309
33 0.666170 30.308960
34 0.610067 30.306058
35 0.677144 30.196810
36 0.677144 30.196810
37 0.555555 30.231681
38 0.63874 30.231691
39 0.512953 30.207603
40 0.442291 30.279173
41 0.575658 30.310231
42 0.423129 30.211289
43 0.623838 30.256925
44 0.639643 30.341620
45 0.653550 30.170428
46 0.752630 30.401040
47 0.478544 30.191938
48 0.48114 30.198471
49 0.679820 30.259800
50 0.581293 30.158619
51 0.730410 30.376620
52 0.504059 30.178556
53 0.587441 30.310364
54 0.588072 30.277877
55 0.70893233 30.19008103
56 0.81699 30.41799
57 0.609300 30.271613
58 0.595226 30.315580
59 0.459029 30.277659
60 0.727873 30.216385
61 0.647722 30.217760
62 0.690064 30.193881
63 0.512339 30.140107
64 0.649181 30.302570
65 0.649881 30.303974
66 0.649736 30.302481
67 0.722082 30.226063
68 0.463480 30.203050
69 0.692930 30.281880
70 0.652864 30.229106
71 0.491520 30.233780
72 0.778370 30.415920
73 0.682090 30.276460
74 0.564670 30.148920
75 0.655588 30.243047
76 0.647717 30.315984
77 0.518769 30.159384
78 0.683070 30.339650
79 0.662980 30.253890
80 0.591899 30.145857
81 0.699690 30.344650
82 0.441030 30.177240
83 0.612202 30.213022
84 0.472530 30.236980
85 0.473722 30.165020
86 0.499181 30.159485
87 0.6598021 30.29158
88 0.6601362 30.29119
89 0.48386 30.23142
90 0.679470 30.282190
91 0.685860 30.271070
92 0.528797 30.171251
93 0.514863 30.243976
94 0.603612 30.258705
95 0.484708 30.142588
96 0.523857 30.233239
97 0.395356 30.215351
98 0.612247 30.269341
99 0.55878815 30.17702095
100 0.747630 30.384240
101 0.538778 30.326353
102 0.554198 30.299815
103 0.504410 30.298260
104 0.418705 30.259747
105 0.669850 30.324100
106 0.654277 30.251667
107 0.460830 30.214070
108 0.378725 30.216429
Here is what I managed to do so far:
locations$Latitude=as.numeric(levels(locations$Latitude))[locations$Latitude]
locations$Longitude=as.numeric(levels(locations$Longitude))[locations$Longitude]
uganda <- raster::getData('GADM', country='UGA', level=1)
ggplot() +
geom_polygon(data = uganda,
aes(x = long, y = lat, group = group),
colour = "grey10", fill = "#fff7bc") +
geom_point(data = locations,
aes(x = Longitude, y = Latitude)) +
coord_map() +
theme_bw() +
xlab("Longitude") + ylab("Latitude")
As you can see by executing the code above, the map of Uganda is loaded from the GADM database and displayed correctly. However, I get the following warning message:
Warning:
Removed 108 rows containing missing values (geom_point).
I read in another post (Explain ggplot2 warning: "Removed k rows containing missing values") that this warning might be caused by erroneous axis ranges. I'm not familiar with the plotting of geographic data and GADM maps, though. This is why I wasn't able to adjust the ranges (I guess this would be done in the geom_polygon part). Can somebody help me, please?
I am not sure why you are running your first part of the code:
locations$Latitude=as.numeric(levels(locations$Latitude))[locations$Latitude]
locations$Longitude=as.numeric(levels(locations$Longitude))[locations$Longitude]
If you don't run that part, there won't be any NA anymore. So if you run the following code, it should work:
library(tidyverse)
library(raster)
uganda <- raster::getData('GADM', country='UGA', level=1)
ggplot() +
geom_polygon(data = uganda,
aes(x = long, y = lat, group = group),
colour = "grey10", fill = "#fff7bc") +
geom_point(data = locations,
aes(x = Longitude, y = Latitude)) +
coord_map() +
theme_bw() +
xlab("Longitude") + ylab("Latitude")
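As for why those two lines can produce NA: the as.numeric(levels(x))[x] idiom is only meant for factors. The post does not show how locations was read in, so the column types below are an assumption, but this small base-R sketch shows what happens when the same idiom is applied to a character column. The helper `to_numeric` is a hypothetical name of mine.

```r
# Two example coordinates, once as a factor and once as plain character
as_factor_col <- factor(c("0.482980", "30.212160"))
as_char_col   <- c("0.482980", "30.212160")

# Intended use: factor -> numeric via its levels
ok <- as.numeric(levels(as_factor_col))[as_factor_col]
print(ok)

# Same idiom on a character vector: levels() is NULL, so the lookup
# indexes an empty vector by name and every element comes back as NA
bad <- as.numeric(levels(as_char_col))[as_char_col]
print(bad)

# A conversion that works for factor, character and numeric input alike
to_numeric <- function(x) as.numeric(as.character(x))
print(to_numeric(as_factor_col))
print(to_numeric(as_char_col))
```

If the columns are already numeric, no conversion is needed at all, which matches the advice above.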
I have a huge dataset containing minute-by-minute recordings of some parameters in 20 patients.
To visualize each patient's monitoring records (the IP parameter), I was trying to construct colored barplots.
So I used the scale_fill_gradient() function in R.
The problem is that I'd like to assign a special color (let's say white) to one specific value (for example IP = 20).
Is it possible to do that with scale_fill_gradient?
My data looks like this:
Patient min IP
1a 75 19
1a 76 21
1a 77 20
1a 78 18.5
1a 79 17
1a 80 25
1a 81 29.3
1a 82 32.1
1a 83 30.9
2c 1 2
2c 2 5
2c 3 8
2c 4 9
2c 5 12
2c 6 16
2c 7 18
3v 72 38
3v 73 35
3v 74 30.3
3v 75 28.7
3v 76 27
3v 77 25.2
3v 78 22
3v 79 19.1
3v 80 18
3v 81 15
My code is:
i<- ggplot(data, aes(Patient, IP, fill=IP))
IPcol <- c("green", "greenyellow", "#fed976","#feb24c", "#fd8d3c", "#fc4e2a", "#e31a1c", "#bd0026", "#800026", "black")
i+geom_bar(stat = "identity")+ scale_fill_gradientn(colours = IPcol)+ coord_flip()+scale_y_time()
From this code I obtain the following image:
So the only thing I want to change is that IP = 20 should be white.
So the easy option would be to recode the IPs that equal 20 to NA and set an na.value in the scale:
data <- read.table(text = your_posted_data, header = T)
IPcol <- c("green", "greenyellow", "#fed976","#feb24c", "#fd8d3c",
"#fc4e2a", "#e31a1c", "#bd0026", "#800026", "black")
ggplot(data, aes(Patient, IP, fill= ifelse(IP != 20, IP, NA))) +
geom_col() +
scale_fill_gradientn(colours = IPcol, na.value = "white")+
coord_flip()+
scale_y_time()
If you want the difficult option, you'd have to write a palette function that works on values ranged 0-1 and set the rescaled 20 to white:
# Define an area to set to white
target <- (c(19.5, 20.5) - min(data$IP)) / (max(data$IP) - min(data$IP))
# Build a palette function
my_palette <- function(colours, values = NULL) {
ramp <- scales::colour_ramp(colours)
force(values)
function(x) {
# Decide what values to replace
replace <- x > target[1] & x < target[2]
if (length(x) == 0)
return(character())
if (!is.null(values)) {
xs <- seq(0, 1, length.out = length(values))
f <- stats::approxfun(values, xs)
x <- f(x)
}
out <- ramp(x)
# Actually replace values
out[replace] <- "white"
out
}
}
And now you could plot it like this:
ggplot(data, aes(Patient, IP, fill= IP)) +
geom_col() +
continuous_scale("fill", "my_pal", my_palette(IPcol),
guide = guide_colourbar(nbin = 100))+
coord_flip()+
scale_y_time()
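To make the rescaling step concrete, here is the arithmetic for the posted data in base R. The helper `rescale01` is a hypothetical name of mine that mirrors the formula used for target above; it assumes the fill scale spans exactly the range of IP.

```r
# IP values re-typed from the question
ip <- c(19, 21, 20, 18.5, 17, 25, 29.3, 32.1, 30.9,
        2, 5, 8, 9, 12, 16, 18,
        38, 35, 30.3, 28.7, 27, 25.2, 22, 19.1, 18, 15)

# Map values onto 0-1 using the data range (here 2 to 38)
rescale01 <- function(x, from = range(ip)) {
  (x - from[1]) / (from[2] - from[1])
}

rescale01(20)                        # 0.5, since (20 - 2) / (38 - 2) = 18 / 36
target <- rescale01(c(19.5, 20.5))   # the band that gets painted white
print(target)

# Which of the posted values fall strictly inside the white band?
rescaled <- rescale01(ip)
white <- rescaled > target[1] & rescaled < target[2]
print(ip[white])                     # 20
```

Values such as 19.1 or 21 stay outside the band, so only the bar segments with IP = 20 would turn white. Widen or narrow the 19.5 to 20.5 window if your real data has values close to 20.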
I extracted some longitudinal temperature data from a .nc weather dataset (ncdf4 package) and would like to label the local extrema with their respective dates from x-axis using ggplot2 and its extension ggpmisc that includes stat_peaks/stat_valleys. Oddly, all the labels read the same: "Dec 1969".
I figured the most likely culprit was that my data used for the x-axis was not formatted correctly as Date, but the x-axis displays correctly and I have checked the class of the input data to confirm. I also tried applying group=1 which resulted in no change -- I admit I am new to R and ggplot2 (more familiar with Python/Pandas) and do not completely understand what group=1 does, though it was necessary to get the line to display correctly. Perhaps this is the result of a bug?
ggplot(df_denver, aes(x=Date, y=Temp..C., group=1)) +
geom_line() +
scale_x_date(date_labels="%b %Y", date_breaks = "10 years", expand=c(0,0)) +
stat_peaks(span=24, ignore_threshold = 0.80, color="red") +
stat_peaks(geom="text", span=24, ignore_threshold = 0.80, x.label.fmt = "%b %Y", color="red", angle=90, hjust=-0.1) +
stat_valleys(span=24, ignore_threshold = 0.55, color="blue") +
stat_valleys(geom="text", span=24, ignore_threshold = 0.55, x.label.fmt = "%b %Y", color="blue", angle=90, hjust=1.1) +
labs(x="Date", y="Temp (C)", title="Monthly Air Surface Temp for Denver from 1880 on")
Here are the first 100 rows of my dataset that produce 3 peaks and 3 valleys to illustrate:
Date Temp..C.
1 1880-01-01 2.91287017
2 1880-02-01 -2.73586297
3 1880-03-01 -2.04185677
4 1880-04-01 0.37948364
5 1880-05-01 0.78548384
6 1880-06-01 0.44176754
7 1880-07-01 -1.06966007
8 1880-08-01 -0.53162575
9 1880-09-01 -0.29665694
10 1880-10-01 -2.08401608
11 1880-11-01 -9.46955109
12 1880-12-01 -1.52052176
13 1881-01-01 -2.53366208
14 1881-02-01 -1.88263988
15 1881-03-01 -0.06864686
16 1881-04-01 3.32321167
17 1881-05-01 1.75613177
18 1881-06-01 2.82765651
19 1881-07-01 1.76543093
20 1881-08-01 1.39409852
21 1881-09-01 -0.98141575
22 1881-10-01 -0.63346595
23 1881-11-01 -1.95676208
24 1881-12-01 3.28983855
25 1882-01-01 -0.64792717
26 1882-02-01 2.15854502
27 1882-03-01 2.91465187
28 1882-04-01 0.56616443
29 1882-05-01 -1.89441001
30 1882-06-01 -0.63149375
31 1882-07-01 -0.64883423
32 1882-08-01 0.82802373
33 1882-09-01 0.66150969
34 1882-10-01 -0.54113626
35 1882-11-01 -1.21310496
36 1882-12-01 1.30559540
37 1883-01-01 -1.41802752
38 1883-02-01 -6.39232874
39 1883-03-01 2.96320987
40 1883-04-01 -0.48122203
41 1883-05-01 -0.99614143
42 1883-06-01 -0.67229420
43 1883-07-01 -0.56595141
44 1883-08-01 0.52161294
45 1883-09-01 0.09190032
46 1883-10-01 -2.65115738
47 1883-11-01 1.88332438
48 1883-12-01 -0.19942272
49 1884-01-01 -0.34669495
50 1884-02-01 -2.21085262
51 1884-03-01 0.55254096
52 1884-04-01 -1.21859336
53 1884-05-01 -0.40969065
54 1884-06-01 0.44454563
55 1884-07-01 1.28881764
56 1884-08-01 -1.09331822
57 1884-09-01 1.52377772
58 1884-10-01 1.76569140
59 1884-11-01 0.72411090
60 1884-12-01 -4.64927006
61 1885-01-01 -1.03242493
62 1885-02-01 -0.79325873
63 1885-03-01 0.65910935
64 1885-04-01 -0.10181000
65 1885-05-01 -1.50702798
66 1885-06-01 -1.25801849
67 1885-07-01 -0.88433135
68 1885-08-01 -1.18410277
69 1885-09-01 0.15284735
70 1885-10-01 -0.91721576
71 1885-11-01 1.82403481
72 1885-12-01 1.68553519
73 1886-01-01 -4.21202993
74 1886-02-01 2.43953681
75 1886-03-01 -2.24947429
76 1886-04-01 -1.22557247
77 1886-05-01 2.66594267
78 1886-06-01 -0.21662886
79 1886-07-01 1.09909940
80 1886-08-01 0.63720244
81 1886-09-01 -0.11845125
82 1886-10-01 0.49225059
83 1886-11-01 -3.16969180
84 1886-12-01 2.18220520
85 1887-01-01 0.51427501
86 1887-02-01 -0.69656581
87 1887-03-01 3.96693182
88 1887-04-01 0.92614591
89 1887-05-01 1.66550291
90 1887-06-01 1.88668025
91 1887-07-01 -1.48990893
92 1887-08-01 -0.98355341
93 1887-09-01 0.93172997
94 1887-10-01 -1.12551820
95 1887-11-01 1.07798636
96 1887-12-01 -2.15758419
97 1888-01-01 -1.69266903
98 1888-02-01 2.55955243
99 1888-03-01 -1.83599913
100 1888-04-01 3.63450384
As you can see, the labels produced by stat_peaks and stat_valleys are identical and not even within the range of the abbreviated data, rather than the correct dates corresponding to the x-axis.
[Plot: Monthly Air Surface Temp for Denver from 1880 on]
stat_peaks and stat_valleys labels will work with dates in POSIXct format:
df_denver$Date <- as.POSIXct(df_denver$Date, format = "%Y-%m-%d")
ggplot(df_denver, aes(x=Date, y=Temp..C.)) +
geom_line() +
scale_x_datetime(date_labels="%b %Y", date_breaks = "1 year", expand=c(0,0)) +
stat_peaks(span=24, ignore_threshold = 0.80, color="red") +
stat_peaks(geom="text", span=24, ignore_threshold = 0.80, x.label.fmt = "%b %Y", color="red", angle=90, hjust=-0.1) +
stat_valleys(span=24, ignore_threshold = 0.55, color="blue") +
stat_valleys(geom="text", span=24, ignore_threshold = 0.55, x.label.fmt = "%b %Y", color="blue", angle=90, hjust=1.1) +
labs(x="Date", y="Temp (C)", title="Monthly Air Surface Temp for Denver from 1880 on") +
expand_limits(y = 6)
Note: scale_x_date was changed to scale_x_datetime. In addition, changed date_breaks to 1 year to demonstrate x-axis labels for example data, and expand_limits to ensure peak labels are readable. group=1 should not be needed.
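A plausible explanation for the "Dec 1969" labels, which is my assumption and not something confirmed in the post: a Date is stored as days since 1970-01-01, and if that number is interpreted as seconds since 1970-01-01 (the POSIXct unit), every date in this range lands a few hours before the epoch. The base-R sketch below reproduces the effect without ggpmisc; the month is formatted numerically so the result does not depend on locale.

```r
d <- as.Date("1881-04-01")

# Internal representation: days since 1970-01-01 (negative for 1881)
days <- as.numeric(d)
print(days)

# Misreading that number as *seconds* since the epoch
misread <- as.POSIXct(days, origin = "1970-01-01", tz = "UTC")
format(misread, "%Y-%m")   # "1969-12"

# Converting the date itself to POSIXct keeps the real month
proper <- as.POSIXct(format(d), tz = "UTC")
format(proper, "%Y-%m")    # "1881-04"
```

This would also explain why converting the column to POSIXct, as done above, gives correct labels.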
I am trying to make it so that there is a line for each team, with the color of that line matching the color in the legend. I wrote the program as if it were a bar chart, since I know how to do that, so I think there are only a few changes that need to be made in order to make it into lines. Note: I don't want a line of best fit, but rather, one that connects from dot to dot.
This next part may be very time consuming, so I don't expect anyone to help with this, but I would also really like to have the team logos in the legend, maybe replacing the team names. Then in the legend, I would like to have the color associated with the team as a line rather than a box.
Any help with either or both of these would be very much appreciated.
EDIT: I would like to keep all the features that the program below has, such as the gray background, white grids, etc.
df <- read.table(textConnection(
'Year Orioles RedSox Yankees Rays BlueJays
1998 79 92 114 63 88
1999 78 94 98 69 84
2000 74 85 87 69 83
2001 63 82 95 62 80
2002 67 93 103 55 78
2003 71 95 101 63 86
2004 78 98 101 70 67
2005 74 95 95 67 80
2006 70 86 97 61 87
2007 69 96 94 66 83
2008 68 95 89 97 86
2009 64 95 103 84 75
2010 66 89 95 96 85
2011 69 90 97 91 81
2012 93 69 95 90 73
2013 85 97 85 92 74
2014 96 71 84 77 83
2015 81 78 87 80 93
2016 89 93 84 68 89'), header = TRUE)
df %>%
gather(Team, Wins, -Year) %>%
mutate(Team = factor(Team, c("Orioles", "RedSox", "Yankees","Rays","BlueJays"))) %>%
ggplot(aes(x=Year, y=Wins)) +
ggtitle("AL East Wins") +
ylab("Wins") +
geom_col(aes(fill = Team), position = "dodge") +
scale_fill_manual(values = c("orange", "red", "blue", "black","purple"))+
theme(
plot.title = element_text(hjust=0.5),
axis.title.y = element_text(angle = 0, vjust = 0.5),
panel.background = element_rect(fill = "gray"),
panel.grid = element_line(colour = "white")
)
You can use geom_path(aes(color = Team)) instead of geom_col(aes(fill = Team)) and a named color palette to achieve your basic goals like this:
# break this off the pipeline
df <- gather(df, Team, Wins, -Year) %>%
mutate(Team = factor(Team, c("Orioles", "RedSox", "Yankees","Rays","BlueJays")))
# if you want to reuse the same theme a bunch, this is nice
# theme_grey() is the default theme
theme_set(theme_grey() +
theme(plot.title = element_text(hjust=0.5),
axis.title.y = element_text(angle = 0, vjust = 0.5),
panel.background = element_rect(fill = "gray")))
# named palettes are easy
# for specific colors i like hex codes the best
# i just grabbed these off this nice website, TeamColorCodes, could be fun!
cust <- c("#FC4C00", "#C60C30", "#1C2841", "#79BDEE","#003DA5")
names(cust) <- levels(df$Team)
# use geom_path in place of geom_col
ggplot(df, aes(x=Year, y=Wins, color = Team)) +
geom_path(aes(color = Team)) +
scale_color_manual(values = cust) +
labs(title = "AL East Wins",
subtitle = "Ahhh",
y = "Wins",
x = "Year")
Link to teamcolorcodes.com
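Since the question also asked for dot-to-dot lines and the white grid, here is a self-contained variant as a sketch. It uses only the last three seasons from the posted table to stay short, reshapes with base R instead of gather, and adds geom_point on top of geom_line; the object names `wins` and `long` are mine. Team logos in the legend are not covered.

```r
library(ggplot2)

# Last three seasons from the posted table
wins <- data.frame(
  Year     = 2014:2016,
  Orioles  = c(96, 81, 89),
  RedSox   = c(71, 78, 93),
  Yankees  = c(84, 87, 84),
  Rays     = c(77, 80, 68),
  BlueJays = c(83, 93, 89)
)
teams <- setdiff(names(wins), "Year")

# Base-R reshape to long format: one row per team and year
long <- data.frame(
  Year = rep(wins$Year, times = length(teams)),
  Team = factor(rep(teams, each = nrow(wins)), levels = teams),
  Wins = unlist(wins[teams], use.names = FALSE)
)

# Named palette, same hex codes as in the answer above
cust <- c(Orioles = "#FC4C00", RedSox = "#C60C30", Yankees = "#1C2841",
          Rays = "#79BDEE", BlueJays = "#003DA5")

p <- ggplot(long, aes(x = Year, y = Wins, colour = Team)) +
  geom_line() +                         # connects the points per team
  geom_point(size = 2) +                # marks each season
  scale_colour_manual(values = cust) +
  scale_x_continuous(breaks = wins$Year) +
  labs(title = "AL East Wins") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_text(angle = 0, vjust = 0.5),
        panel.background = element_rect(fill = "gray"),
        panel.grid = element_line(colour = "white"))
p
```

With a colour aesthetic on line and point layers, the legend keys are drawn as coloured lines with a point rather than filled boxes.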