Issue with Density Plot using GGPLOT2 [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I want to plot a density plot for 2 groups and below is my code.
library(ggplot2)
#Sample data
dat <- data.frame(Score = c(myfiles2Best$V2, myfilesL2Best$V2)
, Group = rep(c("T", "L")))
ggplot(dat, aes(x = Score)) +
geom_density(aes(color = Group)) + xlim(0,16)
Below is the image of the output.
When I swap the order of the two vectors in the data frame, as shown below, this is what my plot looks like.
dat <- data.frame(Score = c(myfilesL2Best$V2, myfiles2Best$V2)
, Group = rep(c("L", "T")))
Individually, this is what they look like.
dat <- data.frame(Score = c(myfiles2Best$V2)
, Group = rep(c("T")))
ggplot(dat, aes(x = Score)) +
geom_density(aes(color = Group)) + xlim(0,16)
dat <- data.frame(Score = c(myfilesL2Best$V2)
, Group = rep(c("L")))
ggplot(dat, aes(x = Score)) +
geom_density(aes(color = Group)) + xlim(0,16)
This looks totally wrong. Is anything wrong with my setup?
rownumber score group
1 8 T
2 8 L
3 7 T
4 7 L
5 9 T
6 8 L
7 8 T
8 7 L
9 8 T
10 8 L
11 8 T
12 9 L
13 8 T
14 8 L
15 8 T
16 8 L
17 9 T
18 7 L
19 9 T
20 7 L
21 8 T
22 10 L
23 8 T
24 8 L
25 9 T
26 8 L
27 8 T
28 8 L
29 9 T
30 8 L
31 7 T
32 10 L
33 8 T
34 10 L
35 8 T
36 7 L
37 8 T
38 7 L
39 11 T
40 9 L
41 8 T
42 9 L
43 8 T
44 10 L
45 8 T
46 9 L
47 8 T
48 8 L
49 8 T
50 7 L
51 9 T
52 8 L
53 8 T
54 9 L
55 8 T
56 7 L
57 7 T
58 9 L
59 10 T
60 8 L

ggplot2::geom_density uses the base R density function to compute density. (see ?geom_density.) This requires a parameter for smoothing, which by default uses a rule named "nrd0", which was picked for "historical and compatibility reasons." (see ?density.) You will get density plots with different appearances depending on this parameter.
From ?bandwidth:
bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34 times the sample size to the negative one-fifth power (= Silverman's ‘rule of thumb’, Silverman (1986, page 48, eqn (3.31))) unless the quartiles coincide when a positive result will be guaranteed.
In your example, the two subgroups look like they have different standard deviations and IQRs, so it makes sense to me that they would look different depending on whether that smoothing parameter is calculated for them collectively (as in the case with the combined plot) or individually.
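The rule quoted above can be reproduced by hand to see how the default bandwidth depends on the spread and size of whatever sample it is given. This is a minimal sketch: `x` is made-up illustration data and `manual_nrd0` is a hypothetical helper, not part of any package. It skips the fallbacks that `bw.nrd0` applies when the quartiles coincide.

```r
# Hand-computed version of the default bandwidth rule (bw.nrd0).
# `x` is made-up illustration data, not the asker's scores.
x <- c(7, 8, 8, 9, 10, 11, 7, 9)

# Hypothetical helper; ignores the special case where the IQR is zero.
manual_nrd0 <- function(x) {
  spread <- min(sd(x), IQR(x) / 1.34)  # smaller of sd and IQR / 1.34
  0.9 * spread * length(x)^(-1 / 5)    # scaled by n^(-1/5)
}

manual_bw  <- manual_nrd0(x)
builtin_bw <- bw.nrd0(x)
print(c(manual = manual_bw, builtin = builtin_bw))
```

Calling `bw.nrd0()` on each subgroup and on the pooled scores shows how far the default bandwidths differ.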
If you want your density plots to correspond between a grouped and individual basis, specify the bandwidth manually:
ggplot(df, aes(x = score)) +
geom_density(aes(color = group), bw = 0.3) +
xlim(0,16)
ggplot(subset(df, group == "L"), aes(x = score)) +
geom_density(aes(color = group), bw = 0.3) +
xlim(0,16)
ggplot(subset(df, group == "T"), aes(x = score)) +
geom_density(aes(color = group), bw = 0.3) +
xlim(0,16)
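Separately from the bandwidth, it may be worth checking how the Group column in the question is built. `rep(c("T", "L"))` returns only two values, and `data.frame()` recycles them, so the labels alternate row by row (the pattern visible in the posted sample rows) rather than following the two concatenated vectors. The sketch below uses short made-up vectors `a` and `b` as stand-ins for `myfiles2Best$V2` and `myfilesL2Best$V2`.

```r
# Made-up stand-ins for the two score vectors in the question.
a <- c(8, 7, 9)   # plays the role of myfiles2Best$V2
b <- c(8, 7, 8)   # plays the role of myfilesL2Best$V2

# Labels recycled by data.frame(): T, L, T, L, ...
recycled <- data.frame(Score = c(a, b), Group = rep(c("T", "L")))

# Labels that follow the concatenation order: all of a, then all of b.
explicit <- data.frame(Score = c(a, b),
                       Group = rep(c("T", "L"), times = c(length(a), length(b))))

print(as.character(recycled$Group))
print(as.character(explicit$Group))
```

If the intent is one label per source vector, the `times =` form keeps each score attached to the file it came from.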

Related

Adding text in one of the four facets [duplicate]

I want to add a few text labels to one facet out of four in my ggplot.
I am using the annotate function to add text, but it places the text at the given location (x,y) in every facet. Because the variables have different y ranges in each facet, the text does not end up at the desired location.
Please let me know what should be done. Thanks.
library(dplyr)
library(tidyr)
library(ggplot2)
df%>%
select(Date, Ca, Na, K, Mg)%>%
gather(var,value,-Date)%>%
ggplot(aes(as.Date(Date), value))+
geom_point()+
theme_bw()+
facet_wrap(~var,scales = 'free_y',ncol = 1)+
ylab(" (ppm) (ppm)
(ppm) (ppm)")+
facet_wrap(~var,scales = 'free_y',ncol = 1, strip.position = "right")+
geom_vline(aes(xintercept = as.Date("2021-04-28")), col = "red")+
geom_vline(aes(xintercept = as.Date("2021-04-28")), col = "red")+
geom_vline(aes(xintercept = as.Date("2021-04-29")), col = "red")+
theme(axis.title = element_text(face="bold"))+
theme(axis.text = element_text(face="bold"))+
xlab('Date')+
theme(axis.title.x = element_text(margin = margin(t = 10)))+
theme(axis.title.y = element_text(margin = margin(r = 10)))+
annotate("text", label = "E1", x = as.Date("2021-04-28"), y = 2.8)
This is the code I am using for the desired output. I want to label the xintercept lines E1, E2, E3 (from left to right) at the top of the plot, i.e. in the first facet, for the variable Ca. Any suggestions?
Here is a part of my data:
df <- read.table(text = "
Date Ca K Mg Na
2/18/2021 1 25 21 19
2/22/2021 2 26 22 20
2/26/2021 3 27 23 21
3/4/2021 4 28 5 22
3/6/2021 5 29 6 8
3/10/2021 6 30 7 9
3/13/2021 7 31 8 10
3/17/2021 8 32 9 11
3/20/2021 9 33 10 12
3/23/2021 10 34 11 13
3/27/2021 11 35 12 14
3/31/2021 12 36 13 15
4/3/2021 13 37 14 16
4/7/2021 14 38 15 17
4/10/2021 15 39 16 18
4/13/2021 16 40 17 19
4/16/2021 17 41 18 20
4/19/2021 8 42 19 21
4/22/2021 9 43 20 22
4/26/2021 0 44 21 23
4/28/2021 1 45 22 24
4/28/2021 2 46 23 25
4/28/2021 3 47 24 26
4/28/2021 5 48 25 27
4/29/2021 6 49 26 28
5/4/2021 7 50 27 29
5/7/2021 8 51 28 30
5/8/2021 9 1 29 31
5/10/2021 1 2 30 32
5/29/2021 3 17 43 45
5/31/2021 6 18 44 46
6/1/2021 4 19 45 47
6/2/2021 8 20 46 48
6/3/2021 2 21 47 49
6/7/2021 3 22 48 50
6/10/2021 5 23 49 51
6/14/2021 3 5 50 1
6/18/2021 1 6 51 2
", header = TRUE)
Prepare the data before plotting, and make a separate data frame for the text annotation:
dfplot <- df %>%
select(Date, Ca, Na, K, Mg) %>%
#convert to date class before plotting
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
#using pivot instead of gather. gather is superseded.
#gather(var, value, -Date)
pivot_longer(cols = 2:5, names_to = "grp", values_to = "ppm")
dftext <- data.frame(grp = "Ca", # we want text to show up only on "Ca" facet.
ppm = max(dfplot[ dfplot$grp == "Ca", "ppm" ]),
Date = as.Date(c("2021-04-27", "2021-04-28", "2021-04-29")),
label = c("E1", "E2", "E3"))
After cleaning up your code, we can use geom_text with dftext:
ggplot(dfplot, aes(Date, ppm)) +
geom_point() +
facet_wrap(~grp, scales = 'free_y',ncol = 1, strip.position = "right") +
geom_vline(xintercept = dftext$Date, col = "red") +
geom_text(aes(x = Date, y = ppm, label = label), data = dftext, nudge_y = -2)
To avoid label overlap, try the ggrepel package and replace geom_text with one of these:
#geom_text_repel(aes(x = Date, y = ppm, label = label), data = dftext)
#geom_label_repel(aes(x = Date, y = ppm, label = label), data = dftext)
After cleaning up the code and seeing the plot, I think this post is a duplicate of "Annotation on only the first facet of ggplot in R?".

Function for generating multiple line charts for all variables in a dataframe for different groups

I have 106 weeks of data for 5 different LOBs (Lines of Business). The variables are Traffic, Spend, Clicks, etc. In total there will be 106*5 = 530 rows.
Dataframe looks like:
LOB Week Traffic Spend Clicks
A 1 34 12 5
A 2 37 32 6
A 3 41 57 7
A 4 52 42 12
A 5 27 37 8
... 106 weeks
B...106 weeks
C...106 weeks
D...106 weeks
E 1 43 22 12
E 2 65 16 14
E 3 76 18 9
E 4 25 14 11
E 5 53 15 15
... 106 weeks
I want to generate a line chart of Traffic for all 5 LOBs on the same chart, and similarly for the other metrics. I have written a loop for this, but it is not doing what I want.
Code:
for ( i in seq(1,length( data),1) ) plot(data[,i],ylab=names(data[i]),type="l", col = "red", xlab = "Week", main = "")
Please suggest how this can be done.
You can use ggplot2 :
ggplot(data, aes(x = Week, y = Traffic, color = LOB)) +
geom_line()
Please try to include a toy example of your data so we can reproduce the code.
Edit: as suggested by @Axeman, you may want to plot all metrics together. Here is his solution for visibility:
d <- gather(data, metric, value, -Week, -LOB)
ggplot(d, aes(Week, value, color = LOB)) +
geom_line() +
facet_wrap(~metric, scales = 'free_y')

R hist vs geom_hist break points

I am using both geom_histogram and hist in R with the same break points, but I get different graphs. I did a quick search without success. Does anyone know how the breaks are defined in each and why there would be a difference?
These produce two different plots.
set.seed(25)
data <- data.frame(Mos=rnorm(500, mean = 25, sd = 8))
data$Mos<-round(data$Mos)
pAge <- ggplot(data, aes(x=Mos))
pAge + geom_histogram(breaks=seq(0, 50, by = 2))
hist(data$Mos,breaks=seq(0, 50, by = 2))
Thanks
To get the same histogram in ggplot2 you specify the breaks inside scale_x_continuous and binwidth inside geom_histogram.
Additionally, hist and histograms in ggplot2 use different defaults to create the intervals:
hist: right-closed (left open) intervals. Default: right = TRUE
stat_bin (ggplot2): left-closed (right open) intervals. Default: right = FALSE
row hist_interval Freq ggplot2_interval Freq
1 (0,2] 0 [0,2) 0
2 (2,4] 2 [2,4) 2
3 (4,6] 2 [4,6) 1
4 (6,8] 1 [6,8) 2
5 (8,10] 6 [8,10) 2
6 (10,12] 9 [10,12) 7
7 (12,14] 24 [12,14) 17
8 (14,16] 27 [14,16) 26
9 (16,18] 39 [16,18) 31
10 (18,20] 48 [18,20) 46
11 (20,22] 52 [20,22) 43
12 (22,24] 38 [22,24) 57
13 (24,26] 44 [24,26) 36
14 (26,28] 46 [26,28) 52
15 (28,30] 39 [28,30) 39
16 (30,32] 31 [30,32) 33
17 (32,34] 30 [32,34) 26
18 (34,36] 24 [34,36) 29
19 (36,38] 18 [36,38) 27
20 (38,40] 9 [38,40) 12
21 (40,42] 5 [40,42) 6
22 (42,44] 4 [42,44) 0
23 (44,46] 1 [44,46) 5
24 (46,48] 1 [46,48) 0
25 (48,50] 0 [48,50) 1
I included the argument right = FALSE in hist so the histogram intervals are left-closed (right open), as they are in ggplot2. I added the count labels to both plots, so it is easier to check that the intervals are the same.
ggplot(data, aes(x = Mos))+
geom_histogram(binwidth = 2, colour = "black", fill = "white")+
scale_x_continuous(breaks = seq(0, 50, by = 2))+
stat_bin(binwidth = 2, aes(label=..count..), vjust=-0.5, geom = "text")
hist(data$Mos,breaks=seq(0, 50, by = 2), labels =TRUE, right =FALSE)
To check the frequencies in each bin:
freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))
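The effect of the interval closure can also be seen with base R alone on a few values that sit exactly on bin edges, which is what happens with the rounded data above. This is a small sketch with made-up values `x` and breaks `brks`.

```r
# Made-up values that fall on bin edges.
x <- c(2, 2, 4, 5)
brks <- seq(0, 6, by = 2)

# hist() default: right-closed bins (0,2] (2,4] (4,6]
right_closed <- hist(x, breaks = brks, plot = FALSE)$counts

# right = FALSE: left-closed bins [0,2) [2,4) [4,6]
left_closed <- hist(x, breaks = brks, right = FALSE, plot = FALSE)$counts

print(rbind(right_closed, left_closed))
```

The two value 2s move from the first bin to the second when the closure changes, which is the same shift seen between the hist and ggplot2 columns in the table.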

ggplot2 merge color and fill legends

I want to merge two legends in ggplot2. I use the following code:
ggplot(dat_ribbon, aes(x = x)) +
geom_ribbon(aes(ymin = ymin, ymax = ymax,
group = group, fill = "test4 test5"), alpha = 0.2) +
geom_line(aes(y = y, color = "Test2"), data = dat_m) +
scale_colour_manual(values=c("Test2" = "white", "test"="black", "Test3"="red")) +
scale_fill_manual(values = c("test4 test5"= "dodgerblue4")) +
theme(legend.title=element_blank(),
legend.position = c(0.8, 0.85),
legend.background = element_rect(fill="transparent"),
legend.key = element_rect(colour = 'purple', size = 0.5))
The output is shown below. There are two problems:
When I use two or more words in the fill legend, the alignment becomes wrong
I want to merge the two legends into one, such that the fill legend is just part of a block of 4.
Does anyone know how I can achieve this?
Edit: reproducible data:
dat_m <- read.table(text="x quantile y group
1 1 50 0.4967335 0
2 2 50 0.4978249 0
3 3 50 0.5113562 0
4 4 50 0.4977866 0
5 5 50 0.5013287 0
6 6 50 0.4997994 0
7 7 50 0.4961121 0
8 8 50 0.4991302 0
9 9 50 0.4976087 0
10 10 50 0.5011666 0")
dat_ribbon <- read.table(text="
x ymin group ymax
1 1 0.09779713 40 0.8992385
2 2 0.09979283 40 0.8996875
3 3 0.10309222 40 0.9004759
4 4 0.10058433 40 0.8985366
5 5 0.10259125 40 0.9043807
6 6 0.09643109 40 0.9031940
7 7 0.10199870 40 0.9022920
8 8 0.10018253 40 0.8965690
9 9 0.10292754 40 0.9010934
10 10 0.09399359 40 0.9053067
11 1 0.20164694 30 0.7974174
12 2 0.20082056 30 0.7980642
13 3 0.20837821 30 0.8056074
14 4 0.19903399 30 0.7973723
15 5 0.19903322 30 0.8050146
16 6 0.19965049 30 0.8051922
17 7 0.20592719 30 0.8042850
18 8 0.19810139 30 0.7956606
19 9 0.20537392 30 0.8007527
20 10 0.19325158 30 0.8023044
21 1 0.30016463 20 0.6953927
22 2 0.29803646 20 0.6976961
23 3 0.30803808 20 0.7048137
24 4 0.30045448 20 0.6991248
25 5 0.29562249 20 0.7031225
26 6 0.29647060 20 0.7043499
27 7 0.30159103 20 0.6991356
28 8 0.30369025 20 0.6949053
29 9 0.30196483 20 0.6998127
30 10 0.29578036 20 0.7015861
31 1 0.40045725 10 0.5981147
32 2 0.39796299 10 0.5974115
33 3 0.41056038 10 0.6057062
34 4 0.40046287 10 0.5943157
35 5 0.39708008 10 0.6014512
36 6 0.39594129 10 0.6011162
37 7 0.40052411 10 0.5996186
38 8 0.40128517 10 0.5959748
39 9 0.39917658 10 0.6004600
40 10 0.39791453 10 0.5999168")
You are not using ggplot2 according to its philosophy. That makes things difficult.
ggplot(dat_ribbon, aes(x = x)) +
geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group, fill = "test4 test5"),
alpha = 0.2) +
geom_line(aes(y = y, color = "Test2"), data = dat_m) +
geom_blank(data = data.frame(x = rep(5, 4), y = 0.5,
group = c("test4 test5", "Test2", "test", "Test3")),
aes(y = y, color = group, fill = group)) +
scale_color_manual(name = "combined legend",
values=c("test4 test5"= NA, "Test2" = "white",
"test"="black", "Test3"="red")) +
scale_fill_manual(name = "combined legend",
values = c("test4 test5"= "dodgerblue4",
"Test2" = NA, "test"=NA, "Test3"=NA))

plot plate layout heatmap in r

I am trying to plot a plate layout heatmap in R. The plate layout is simply 8 (rows) x 12 (columns) circles (wells). Rows are labeled with letters and columns with numbers. Each well needs to be filled with a color whose intensity depends on a qualitative or quantitative variable. The plate layout looks like this:
Here is small dataset:
set.seed (123)
platelay <- data.frame (rown = rep (letters[1:8], 12), coln = rep (1:12, each = 8),
colorvar = rnorm (96, 0.3, 0.2))
rown coln colorvar
1 a 1 0.187904871
2 b 1 0.253964502
3 c 1 0.611741663
4 d 1 0.314101678
5 e 1 0.325857547
6 f 1 0.643012997
7 g 1 0.392183241
8 h 1 0.046987753
9 a 2 0.162629430
10 b 2 0.210867606
11 c 2 0.544816359
12 d 2 0.371962765
13 e 2 0.380154290
14 f 2 0.322136543
15 g 2 0.188831773
16 h 2 0.657382627
17 a 3 0.399570096
18 b 3 -0.093323431
19 c 3 0.440271180
20 d 3 0.205441718
21 e 3 0.086435259
22 f 3 0.256405017
23 g 3 0.094799110
24 h 3 0.154221754
25 a 4 0.174992146
26 b 4 -0.037338662
27 c 4 0.467557409
28 d 4 0.330674624
29 e 4 0.072372613
30 f 4 0.550762984
31 g 4 0.385292844
32 h 4 0.240985703
33 a 5 0.479025132
34 b 5 0.475626698
35 c 5 0.464316216
36 d 5 0.437728051
37 e 5 0.410783531
38 f 5 0.287617658
39 g 5 0.238807467
40 h 5 0.223905800
41 a 6 0.161058604
42 b 6 0.258416544
43 c 6 0.046920730
44 d 6 0.733791193
45 e 6 0.541592400
46 f 6 0.075378283
47 g 6 0.219423033
48 h 6 0.206668929
49 a 7 0.455993024
50 b 7 0.283326187
51 c 7 0.350663703
52 d 7 0.294290649
53 e 7 0.291425909
54 f 7 0.573720457
55 g 7 0.254845803
56 h 7 0.603294121
57 a 8 -0.009750561
58 b 8 0.416922750
59 c 8 0.324770849
60 d 8 0.343188314
61 e 8 0.375927897
62 f 8 0.199535309
63 g 8 0.233358523
64 h 8 0.096284923
65 a 9 0.085641755
66 b 9 0.360705728
67 c 9 0.389641956
68 d 9 0.310600845
69 e 9 0.484453494
70 f 9 0.710016937
71 g 9 0.201793767
72 h 9 -0.161833775
73 a 10 0.501147705
74 b 10 0.158159847
75 c 10 0.162398277
76 d 10 0.505114274
77 e 10 0.243045399
78 f 10 0.055856458
79 g 10 0.336260696
80 h 10 0.272221728
81 a 11 0.301152837
82 b 11 0.377056080
83 c 11 0.225867994
84 d 11 0.428875310
85 e 11 0.255902688
86 f 11 0.366356393
87 g 11 0.519367803
88 h 11 0.387036298
89 a 12 0.234813683
90 b 12 0.529761524
91 c 12 0.498700771
92 d 12 0.409679392
93 e 12 0.347746347
94 f 12 0.174418785
95 g 12 0.572130490
96 h 12 0.179948083
Is there a package that can readily do this? Is it possible to write a function in base R, ggplot2, or another package that achieves this?
Changing the colour of points of sufficient size, with ggplot2. Note I've implemented @TylerRinkler's suggestion, but within the call to ggplot. I've also removed the axis labels.
ggplot(platelay, aes(y = factor(rown, rev(levels(factor(rown)))), x = factor(coln))) +
geom_point(aes(colour = colorvar), size =18) +theme_bw() +
labs(x=NULL, y = NULL)
And a base graphics approach, which will let you have the x axis above the plot:
# plot with grey colour dictated by rank, no axes or labels
with(platelay, plot( x=as.numeric(coln), y= rev(as.numeric(factor(rown))), pch= 19, cex = 2,
col = grey(rank(platelay[['colorvar']] ) / nrow(platelay)), axes = FALSE, xlab= '', ylab = ''))
# add circular outline
with(platelay, points( x=as.numeric(coln), y= rev(as.numeric(factor(rown))), pch= 21, cex = 2))
# add the axes
axis(3, at =1:12, labels = 1:12)
axis(2, at = 1:8, labels = LETTERS[8:1])
# the background grid
grid()
# and a box around the outside
box()
And for giggles and Christmas cheer, here is a version using base R plotting functions.
Though there is very possibly a better solution.
dev.new(width=6,height=4)
rown <- unique(platelay$rown)
coln <- unique(platelay$coln)
plot(NA,ylim=c(0.5,length(rown)+0.5),xlim=c(0.5,length(coln)+0.5),ann=FALSE,axes=FALSE)
box()
axis(2,at=seq_along(rown),labels=rev(rown),las=2)
axis(3,at=seq_along(coln),labels=coln)
colgrp <- findInterval(platelay$colorvar,seq(min(platelay$colorvar),max(platelay$colorvar),length.out=10))
colfunc <- colorRampPalette(c("green", "blue"))
collist <- colfunc(length(unique(colgrp)))
symbols(platelay$coln,
factor(platelay$rown, rev(levels(factor(platelay$rown)))),
circles=rep(0.2,nrow(platelay)),
add=TRUE,
inches=FALSE,
bg=collist[colgrp])
And the resulting image:
Here is a solution that combines @mnel's ggplot2 solution with grid.
Here is the code of the given solution:
d <- ggplot(platelay, aes(y=rown,x=factor(coln))) +
geom_point(aes(colour = colorvar), size =18) + theme_bw()
I use the data generated by ggplot:
data <- ggplot_build(d)$data[[1]]
x <- data$x
y <- data$y
grid.newpage()
pushViewport(plotViewport(c(4, 4, 2, 2)),
dataViewport(x, y))
grid has an ellipse geom:
grid.ellipse(x, y,size=20, ar = 2,angle=0,gp =gpar(fill=data$colour))
grid.xaxis(at=c(labels=1:12,ticks=NA),gp=gpar(cex=2))
grid.yaxis(at = 1:8,label=rev(LETTERS[1:8]),gp=gpar(cex=2))
grid.roundrect(gp=gpar(fill=NA))
I add a grid:
gpgrid <- gpar(col='grey',lty=2,col='white')
grid.segments(unit(1:12, "native") ,unit(0, "npc"), unit(1:12, "native"),unit(1, "npc"),gp=gpgrid)
grid.segments(unit(0, "npc"), unit(1:8, "native"), unit(1, "npc"),unit(1:8, "native"),gp=gpgrid)
upViewport()
This answer is an add-on to @thelatemail's answer, which covers the platemap for the (8,12) = 96 format.
To construct the (32,48) = 1536 format, the single letters A-Z are insufficient. The row labels therefore need to be extended with AA, AB, AC, AD ... ZZ, and can be extended to three or more letters, by concatenating LETTERS combinations onto the levels variable as below.
levels = c(LETTERS, c(t(outer(LETTERS, LETTERS, paste, sep = ""))))
@thelatemail's answer can be adapted to double-letter row labels for the 1536 plate format as below:
rown = rep (c(LETTERS, c(t(outer(LETTERS[1], LETTERS[1:6], paste, sep = "")))),
symbols(platelay$coln,
factor(platelay$rown,
levels = rev(c(LETTERS, c(t(outer(LETTERS[1], LETTERS[1:6], paste, sep = "")))))),
circles=rep(0.45,nrow(platelay)),
add=TRUE,
inches=FALSE,
bg=collist[colgrp])
The levels passed inside the symbols call should list the single-letter labels in alphabetical order first, then the double-letter labels, then the triple-letter labels, and so on.
For example, if the levels inside the symbols call are in the incorrect order below, the wells will be plotted with the wrong colors.
Incorrect order:
A, AA, AB, AC, AD, AE, AF, B, C,D, ...Z
Correct order:
A, B, C, D, E, .....Z, AA, AB, AC, AD, AE, AF
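As a quick check of the ordering above, the 32 row labels can be generated in base R and compared with what a plain alphabetical sort would give. This is a sketch. The names `double_letters`, `row_labels` and `alphabetical` are illustrative and do not come from the answers above.

```r
# All two-letter combinations in row-major order: AA, AB, ..., AZ, BA, ..., ZZ
double_letters <- c(t(outer(LETTERS, LETTERS, paste, sep = "")))

# First 32 labels in plate order: A..Z followed by AA..AF
row_labels <- c(LETTERS, double_letters)[1:32]

# A plain alphabetical sort interleaves the double letters after "A"
alphabetical <- sort(row_labels, method = "radix")

print(row_labels)
print(head(alphabetical, 8))
```

Passing `row_labels` (or its reverse, as in the symbols call above) as the factor levels keeps the plate order, whereas relying on the default sorted levels would give the incorrect order.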
