boxplot in ggplot gives unexpected output

I would like to plot a grouped boxplot using ggplot. Something like the picture below:
Below please see a sample (10 rows) from my data:
alpha colsample_bytree best_F1
35 0.00 0.5 0.5825656
78 0.10 0.3 0.4716612
68 0.00 0.3 0.4714286
27 0.40 1.0 0.4786216
49 0.15 0.5 0.4943968
62 0.00 0.3 0.4938805
70 0.00 0.3 0.4849785
73 0.10 0.3 0.4997061
59 0.30 0.5 0.4856369
88 0.20 0.3 0.4552402
sort(unique(data$alpha))
0 0.1 0.15 0.2 0.3 0.4
sort(unique(data$colsample_bytree))
0.3 0.5 1
My code is the following:
library(ggplot2)
library(ggthemes)
ggplot(data, aes(x= colsample_bytree, y = best_F1, fill = as.factor(alpha))) +
geom_boxplot(alpha = 0.5, position=position_dodge(1)) + theme_economist() +
ggtitle("F1 for alpha and colsample_bytree")
This produces the following plot:
and the following Warning:
Warning message:
"position_dodge requires non-overlapping x intervals"
Since colsample_bytree takes 3 discrete values and alpha takes 6, I would expect to see 3 groups of boxplots, each group made up of 6 boxplots (one per alpha value) and positioned at its colsample_bytree value, i.e. 0.3, 0.5 and 1.
I would also expect the boxplots not to overlap, just like in the example I cited.

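colsample_bytree is numeric, so ggplot treats the x axis as continuous and groups the data only by alpha, which is why the boxes overlap and position_dodge warns about overlapping x intervals. You just have to include data$colsample_bytree <- as.factor(data$colsample_bytree) before you plot your data with the ggplot command.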
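A minimal sketch of that fix follows. The data frame here is a randomly generated stand-in (an assumption, not the asker's real data) that only mirrors the 3 colsample_bytree values and 6 alpha values from the question, and theme_economist() is left out so that only ggplot2 is needed.

```r
library(ggplot2)

# Hypothetical stand-in data: 3 colsample_bytree values x 6 alpha values, 5 runs each
set.seed(1)
data <- expand.grid(
  alpha = c(0, 0.1, 0.15, 0.2, 0.3, 0.4),
  colsample_bytree = c(0.3, 0.5, 1),
  run = 1:5
)
data$best_F1 <- runif(nrow(data), min = 0.45, max = 0.60)

# Treat the x variable as categorical so each value gets its own slot on the axis
data$colsample_bytree <- as.factor(data$colsample_bytree)

p <- ggplot(data, aes(x = colsample_bytree, y = best_F1, fill = as.factor(alpha))) +
  geom_boxplot(alpha = 0.5, position = position_dodge(1)) +
  ggtitle("F1 for alpha and colsample_bytree")

# print(p) draws the plot: three groups of six dodged boxes
```

Wrapping the variable in factor() inside aes() should have the same effect without modifying the data frame.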

Related

How to change the order in which the data appears in ggplot

I am trying to create a plot that shows the mean and standard deviation of my data. I have the code that creates the plot, and it works. However, my points are out of order. Below is the data that is being plotted (called Sumcircle):
reduction.area.size antlerless.harvest.rate.sink mean sd
1 0.3 23.14362 5.1980318
5 0.3 24.82013 2.9770937
10 0.3 25.30464 1.9167845
15 0.3 25.27654 1.6662350
20 0.3 24.86209 1.5823747
25 0.3 25.17401 1.3082544
1 0.35 20.65101 4.9711989
5 0.35 21.47942 2.6411690
10 0.35 21.72935 1.8211059
15 0.35 21.30290 1.6275956
20 0.35 21.49806 1.3869719
25 0.35 21.10958 1.1720223
1 0.4 18.09449 4.8401543
5 0.4 17.56596 2.2518005
10 0.4 18.22319 1.7100441
15 0.4 17.89776 1.2087007
20 0.4 17.84899 1.2016877
25 0.4 18.05289 1.0047864
1 0.45 14.35913 4.0633069
5 0.45 14.78276 2.1630511
10 0.45 15.18299 1.6615803
15 0.45 14.83019 1.2601986
20 0.45 14.90748 1.1107997
25 0.45 14.69429 0.8485477
1 0.5 11.75290 3.5159347
5 0.5 12.10627 2.2036029
10 0.5 12.47646 1.4440110
15 0.5 12.31346 0.9431687
20 0.5 12.20568 0.9091177
25 0.5 12.14800 0.8264364
Here is the code I use to create the plot:
library(ggplot2)
pd = position_dodge(.8)
circlelineplot<-ggplot(Sumcircle,
aes(x = antlerless.harvest.rate.sink,
y = mean,
color = reduction.area.size)) +
geom_point(shape = 15,
size = 4,
position = pd) +
geom_errorbar(aes(ymin = mean - sd,
ymax = mean + sd),
width = 0.2,
size = 0.7,
position = pd) +
theme_bw() +
theme(axis.title = element_text(face = "bold"))
# Change the y axis name
circlelineplot + ggtitle("Population Density in Circular Reduction
Area with 30 deer/mi2 Ambient Density") +
theme(plot.title = element_text(hjust=0.5)) +
scale_y_continuous(name = "End Population Density",
breaks = seq(0, 30, 5)) +
scale_x_discrete(name ="Antlerless Harvest Rate",
breaks=c("0.3","0.35","0.4","0.45","0.5"),
labels=c("30%","35%","40%","45%","50%")) +
scale_color_manual(values=c("brown","brown1","brown2","brown3",
"brown4","darkred"), name="Size of Reduction Area",
limits=c("1","5","10","15","20","25"))
However, this code gives me the following plot:
How do I get the data for the size "5" reduction area to go between the data for sizes "1" and "10"? I thought the limits argument would do that, but it does not. Thanks for any help!
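One possible fix, sketched under the assumption that reduction.area.size is stored as text (or as a factor with alphabetically sorted levels): the dodging order follows the order of the factor levels, so setting the levels explicitly should place size 5 between 1 and 10. The data below is just the harvest-rate 0.3 subset of the table above.

```r
library(ggplot2)

# Subset of the question's data (harvest rate 0.3), with the size column stored as text
Sumcircle <- data.frame(
  reduction.area.size = c("1", "5", "10", "15", "20", "25"),
  antlerless.harvest.rate.sink = "0.3",
  mean = c(23.14362, 24.82013, 25.30464, 25.27654, 24.86209, 25.17401),
  sd = c(5.1980318, 2.9770937, 1.9167845, 1.6662350, 1.5823747, 1.3082544)
)

# Set the level order explicitly; text would otherwise sort as "1", "10", "15", "20", "25", "5"
Sumcircle$reduction.area.size <- factor(
  Sumcircle$reduction.area.size,
  levels = c("1", "5", "10", "15", "20", "25")
)

pd <- position_dodge(0.8)
p <- ggplot(Sumcircle, aes(x = antlerless.harvest.rate.sink, y = mean,
                           color = reduction.area.size)) +
  geom_point(shape = 15, size = 4, position = pd) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2, position = pd)

# print(p) draws the points in the order 1, 5, 10, 15, 20, 25
```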

Coloring nodes of a graph according to the different scales

I want to plot different data sets as igraph objects. For example:
library(igraph)
m<-matrix(data = c("a1_ghj", "a1_phj",
"b2_ghj", "c1_pht",
"c1_ght", "a1_ghi",
"g5_pht", "d2_phj",
"r5_phj", "u6_pht"), ncol = 2)
g<-graph_from_edgelist(m)
g
The color of the nodes should be determined by different scales, for example:
aa qwr asd rty fgh vbn iop ert
ghj 1.8 -0.5 0.2 0.62 0.74 0.3 1.6
ght 2.5 -1 4.1 0.29 0.91 0.9 2
pht -3.5 3 -3.1 -0.9 0.62 -0.6 -9.2
phj -3.5 3 -1.8 -0.74 0.62 -0.7 -8.2
ghi 2.8 -2.5 4.4 1.19 0.88 0.5 3.7
In each node name, the part after _ is the name of the group the node belongs to. In the scale table, the columns are the scale types and the rows are the groups.
To plot these graphs I need a function that normalizes these scales to between -1 and 1 and then assigns colors to the nodes according to the values of a chosen scale type in the table. Can anybody help me with this?

First of all, as in the earlier question,
you can use sub on the vertex names to get the suffixes.
Suffixes = factor(sub(".*_", "", names(V(g))))
So the question becomes how to use the different scales to choose the colors
for the nodes. You asked to scale from -1 to 1, but actually I have scaled
0 to 1, because that is the type of argument expected by the function produced
by colorRamp.
Your scaling data
RawScales = read.table(text="aa qwr asd rty fgh vbn iop ert
ghj 1.8 -0.5 0.2 0.62 0.74 0.3 1.6
ght 2.5 -1 4.1 0.29 0.91 0.9 2
pht -3.5 3 -3.1 -0.9 0.62 -0.6 -9.2
phj -3.5 3 -1.8 -0.74 0.62 -0.7 -8.2
ghi 2.8 -2.5 4.4 1.19 0.88 0.5 3.7",
header=TRUE)
I will use both of the qwr and rty scales as examples.
Scale between 0 and 1.
qwr_Scaled = (RawScales$qwr - min(RawScales$qwr)) /
(max(RawScales$qwr) - min(RawScales$qwr))
rty_Scaled = (RawScales$rty - min(RawScales$rty)) /
(max(RawScales$rty) - min(RawScales$rty))
Set up a function to create color scales. Note: orange is the minimum value, red is the maximum value.
Color = colorRamp(c("orange", "yellow", "white", "pink", "red"))
Use the function to create a vector of colors for the nodes.
ColVals_qwr = rgb(Color(qwr_Scaled), maxColorValue=255)
names(ColVals_qwr) = RawScales$aa
ColVals_rty = rgb(Color(rty_Scaled), maxColorValue=255)
names(ColVals_rty) = RawScales$aa
Now plot using the color scales. I have added an explicit layout of the nodes so that the two plots would be comparable.
par(mfrow=c(1,2), mar=c(5, 1,3,1))
LO = layout_with_fr(g)
plot(g, vertex.color=ColVals_qwr[Suffixes], layout=LO, frame=TRUE)
plot(g, vertex.color=ColVals_rty[Suffixes], layout=LO, frame=TRUE)
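Since the question asked for a function, the scaling and coloring steps above could be wrapped up roughly as follows. This is only a sketch: scale_to_colors is a hypothetical helper name, it scales to 0..1 (not -1..1) for the same colorRamp reason given above, and it assumes the values are not all identical.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1] and map it onto a color ramp.
# Assumes max(values) > min(values); a constant vector would divide by zero.
scale_to_colors <- function(values, group_names,
                            palette = c("orange", "yellow", "white", "pink", "red")) {
  rng <- range(values)
  scaled <- (values - rng[1]) / (rng[2] - rng[1])
  ramp <- colorRamp(palette)                      # function mapping [0, 1] to RGB rows
  cols <- rgb(ramp(scaled), maxColorValue = 255)  # hex color strings
  names(cols) <- group_names
  cols
}

# The qwr column and group names from the scale table above
groups <- c("ghj", "ght", "pht", "phj", "ghi")
qwr <- c(1.8, 2.5, -3.5, -3.5, 2.8)
ColVals_qwr <- scale_to_colors(qwr, groups)
```

The result can be indexed by suffix exactly like the vectors above, e.g. vertex.color=ColVals_qwr[as.character(Suffixes)].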

Calculate average within a specified range

I am using the 'diamonds' dataset from ggplot2 and want to find the average of the 'carat' column. However, I want the average for every 0.1-wide interval:
Between
0.2 and 0.29
0.3 and 0.39
0.4 and 0.49
etc.

You can use aggregate() to compute the mean by group, where the group is calculated with carat %/% 0.1:
library(ggplot2)
averageBy <- 0.1
aggregate(diamonds$carat, list(diamonds$carat %/% averageBy * averageBy), mean)
This gives the mean for each 0.1-wide group:
Group.1 x
1 0.2 0.2830764
2 0.3 0.3355529
3 0.4 0.4181711
4 0.5 0.5341423
5 0.6 0.6821408
6 0.7 0.7327491
...
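One caveat: floor division by a decimal such as 0.1 is subject to floating-point representation error, so a value sitting exactly on a bin edge (0.3, for example) may land in the bin below. A hedged alternative, assuming carat is recorded to two decimals as it is in diamonds, is to do the binning on whole hundredths. The small vector below is made up purely for illustration.

```r
# Illustrative values only, not taken from diamonds
carat <- c(0.20, 0.23, 0.29, 0.30, 0.31, 0.39, 0.40)

# Work in whole hundredths so the floor division is done on exact integers
bin <- (round(carat * 100) %/% 10) / 10

# Mean carat per 0.1-wide bin; the names are the lower edges of the bins
means <- tapply(carat, bin, mean)
means
```

On the real data the same idea would be tapply(diamonds$carat, (round(diamonds$carat * 100) %/% 10) / 10, mean).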

ggplot2 - Display specific values on x-axis

I'm trying to display specific values on the x-axis of a line plot made with ggplot2. In my table, the x values are quite distant from each other, which is why I want to plot them as discrete values.
line <- ggplot(lineplot, aes(value,num, colour=attribute))
line + geom_line()
I hope I've been clear. I'm a complete beginner,
so apologies in advance for the question.
example table:
num value attribute
a 0 0.003 main
b 1 0.003 low
c 0 0.003 high
d 0 0.6 main
e 9 0.6 low
f 3 0.6 high
g 2 0.9 main
h 2 0.9 low
I 2 0.9 high
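x-axis:
what I get: ticks at 0.003, 0.6 and 0.9, spaced according to their numeric values
what I want: the same three labels, 0.003, 0.6 and 0.9, evenly spaced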

If you want the x axis to be treated like a discrete factor then you have to add the group aesthetic to tell ggplot2 which points to connect with a line.
df <- read.table(text = "num value attribute
0 0.003 main
1 0.003 low
0 0.003 high
0 0.6 main
9 0.6 low
3 0.6 high
2 0.9 main
2 0.9 low
2 0.9 high", header = TRUE)
ggplot(df, aes(x = factor(value), y = num, group = attribute, color = attribute)) +
geom_line()

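Try forcing the x-axis to be a factor rather than numeric. With a discrete x you also need group = attribute, otherwise each group holds a single point and no lines are drawn:
line <- ggplot(lineplot, aes(factor(value), num, colour = attribute, group = attribute))
line + geom_line()
Is that what you want?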

R: A facet_wrap() without a factor

I need to create a matrix of histograms with ggplot and facet_wrap(). More or less the code I have is the following:
df_3<-data.frame(rnorm(1000),...,rnorm(1000))
#The data frame has 1000 observations and 16 variables.
colnames(df_3) <- letters[1:16]
library(ggplot2)
gr12 <- ggplot(df_3, aes(x=observations)) + geom_histogram()
My question is: how can I plot the matrix of histograms with facet_wrap() without a factor variable?

Melt the data frame into long format (see reshape2), so that it goes from being
a b c ... p
1 0.1 0.2 0.3 ... 0.16
2 0.1 0.1 0.2 ... 0.00
(My internal randomizer is really bad.)
to
variable value
a 0.1
b 0.2
c 0.3
... ...
p 0.16
a 0.1
b 0.1
c 0.2
... ...
p 0.00
Then variable is the factor that you facet by. If the long-format data frame is df_4, you could do
ggplot(df_4, aes(x = value)) + geom_histogram() + facet_wrap(~ variable)
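Put together, the reshaping and plotting might look like the sketch below. It swaps in base R's stack() for reshape2::melt() to avoid an extra dependency; the data is random, as in the question, and bins = 30 is an arbitrary choice.

```r
library(ggplot2)

set.seed(42)
# 1000 observations of 16 variables named a..p, as in the question
df_3 <- as.data.frame(matrix(rnorm(1000 * 16), ncol = 16))
colnames(df_3) <- letters[1:16]

# Base R stand-in for reshape2::melt(): stack() returns the columns `values` and `ind`
df_4 <- stack(df_3)
names(df_4) <- c("value", "variable")

# One histogram panel per original column
p <- ggplot(df_4, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ variable)

# print(p) draws a 4 x 4 grid of histograms
```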
