R: A facet_wrap() without a factor

I need to create a matrix of histograms with ggplot and facet_wrap(). More or less the code I have is the following:
library(ggplot2)
# The data frame has 1000 observations and 16 variables
df_3 <- as.data.frame(matrix(rnorm(1000 * 16), ncol = 16))
colnames(df_3) <- letters[1:16]
gr12 <- ggplot(df_3, aes(x = observations)) + geom_histogram()
My question is: how can I plot the matrix of histograms with facet_wrap() when there is no factor variable?

Melt the data frame into long format (see the reshape2 package), so that it goes from
a b c ... p
1 0.1 0.2 0.3 ... 0.16
2 0.1 0.1 0.2 ... 0.00
(My internal randomizer is really bad.)
to
variable value
a 0.1
b 0.2
c 0.3
... ...
p 0.16
a 0.1
b 0.1
c 0.2
... ...
p 0.00
Then variable is the factor you want to facet by. If the long-format data frame is df_4, I imagine you could do
ggplot(df_4, aes(x = value)) + geom_histogram() + facet_wrap(~ variable)
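Putting the steps together, here is a minimal runnable sketch. It assumes the reshape2 package is installed; the random data, the seed and bins = 30 are illustrative choices, not taken from the question.

```r
library(ggplot2)
library(reshape2)

set.seed(42)
# Illustrative stand-in for the question's df_3: 1000 observations of 16 variables a..p
df_3 <- as.data.frame(matrix(rnorm(1000 * 16), ncol = 16))
colnames(df_3) <- letters[1:16]

# melt() stacks every column into two columns: 'variable' (a factor naming the
# original column) and 'value' (the number), giving 16 * 1000 rows in total
df_4 <- melt(df_3, measure.vars = colnames(df_3))

# 'variable' is the factor to facet by: one histogram panel per original column
p <- ggplot(df_4, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ variable)
print(p)
```

With tidyr, pivot_longer(df_3, everything(), names_to = "variable", values_to = "value") produces the same long layout, with variable as a character column, which facet_wrap() also accepts.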

Related

How to remove outliers along with factor variables?

I am trying to remove outliers from a data set s consisting of 3 variables:
id consumption period
a 0.1 summer
a 0.2 summer
b 0.3 summer
a 0.4 winter
b 10 winter
c 12 winter
I used outliers <- s$consumption[!s$consumption %in% boxplot.stats(s$consumption)$out] to remove the outliers from s and got something like this:
consumption
0.1
0.2
0.3
0.4
However, I want to get something like this below:
id consumption period
a 0.1 summer
a 0.2 summer
b 0.3 summer
a 0.4 winter
But indexing s$consumption with the $out values of boxplot.stats() only gives me the numeric column back, not the whole rows with their factor columns.
I found a workaround. Store the result of s$consumption[!s$consumption %in% boxplot.stats(s$consumption)$out] as l, which in this case is:
consumption
0.1
0.2
0.3
0.4
Since l holds only the values that survived the filter, max(l) is the largest non-outlier, so I can take the subset of s where consumption is at most max(l) (this assumes all outliers lie above the retained values):
l <- s$consumption[!s$consumption %in% boxplot.stats(s$consumption)$out]
new <- subset(s, consumption <= max(l))
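A more direct route (a sketch, not from the original thread) is to use the same logical test as a row index on s, which keeps id and period and also handles outliers below the bulk of the data. Note that with only the six sample rows, boxplot.stats() flags nothing (the upper hinge is 10 itself, so the upper fence is far above 12), so every sample row is kept; on a larger data set the flagged rows are dropped.

```r
# Example data from the question
s <- data.frame(
  id = c("a", "a", "b", "a", "b", "c"),
  consumption = c(0.1, 0.2, 0.3, 0.4, 10, 12),
  period = c("summer", "summer", "summer", "winter", "winter", "winter"),
  stringsAsFactors = TRUE
)

# Values that boxplot.stats() flags as lying more than 1.5 * IQR beyond the hinges
out_vals <- boxplot.stats(s$consumption)$out

# Use the logical test as a row index so that id and period are kept
new <- s[!s$consumption %in% out_vals, ]
print(new)
```

subset(s, !consumption %in% out_vals) is an equivalent way to write the last step.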

How to PLOT a grouped AND stacked barplot with 3 FACTORS in R? [duplicate]

My data looks like this:
system operation_type prep_time operation_time
A x 0.7 1.4
A y 0.11 2.3
A z 1.22 6.7
B x 0.44 5.2
B y 0.19 2.3
B z 3.97 9.5
C x 1.24 2.4
C y 0.23 2.88
C z 0.66 9.7
I would like a stacked bar chart of prep_time and operation_time, so that each bar shows total_time, grouped by system and faceted by operation_type.
My code looks like this for now.
library(ggplot2)
df <- read.csv("test.csv", strip.white=T)
plot <- ggplot(df, aes(x=system,y=(prep_time+operation_time))) + geom_bar(stat="identity") + facet_grid(.~operation_type)
The output I get has a single bar per system in each facet, with nothing showing how the total splits into the two times.
What I need is a split within each bar showing how much of total_time is prep_time and how much is operation_time. I thought of adding a legend with different colors for prep_time and operation_time, but I cannot figure out how to do that.
This should give you a start. You need to convert your data frame from wide to long format, stacking prep_time and operation_time into a single column because both measure time; here I called the new columns Type and Time. With system on the x-axis, fill = Type assigns a different color to each kind of time. geom_col() draws bars whose heights are the y values and stacks bars that share an x position, and facet_grid() creates the facets.
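library(tidyr)    # gather(); tidyr also re-exports the %>% pipe
library(ggplot2)
# Reshape to long format: one row per (system, operation_type, Type) with its Time
df2 <- df %>% gather(Type, Time, ends_with("time"))
# geom_col() stacks prep_time and operation_time within each system's bar
ggplot(df2, aes(x = system, y = Time, fill = Type)) +
  geom_col() +
  facet_grid(. ~ operation_type)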
DATA
df <- read.table(text = "system operation_type prep_time operation_time
A x 0.7 1.4
A y 0.11 2.3
A z 1.22 6.7
B x 0.44 5.2
B y 0.19 2.3
B z 3.97 9.5
C x 1.24 2.4
C y 0.23 2.88
C z 0.66 9.7",
header = TRUE, stringsAsFactors = FALSE)
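gather() still works but is superseded in current tidyr in favour of pivot_longer(). An equivalent sketch, assuming tidyr 1.0.0 or later, with the data included so it runs on its own:

```r
library(tidyr)
library(ggplot2)

df <- read.table(text = "system operation_type prep_time operation_time
A x 0.7 1.4
A y 0.11 2.3
A z 1.22 6.7
B x 0.44 5.2
B y 0.19 2.3
B z 3.97 9.5
C x 1.24 2.4
C y 0.23 2.88
C z 0.66 9.7",
header = TRUE, stringsAsFactors = FALSE)

# One row per (system, operation_type, Type), where Type is "prep_time" or "operation_time"
df2 <- pivot_longer(df, cols = ends_with("time"), names_to = "Type", values_to = "Time")

# Bars sharing an x position are stacked, so each bar's total height is prep + operation time
p <- ggplot(df2, aes(x = system, y = Time, fill = Type)) +
  geom_col() +
  facet_grid(. ~ operation_type)
print(p)
```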

Calculate average within a specified range

I am using the 'diamonds' data set from ggplot2 and want to find the average of the 'carat' column. However, I want the average for every 0.1-carat interval:
Between
0.2 and 0.29
0.3 and 0.39
0.4 and 0.49
etc.
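You can use aggregate() to compute the mean by group, where the group is calculated with carat %/% 0.1 (integer division) and multiplied back by 0.1 so that each bin is labelled by its lower bound: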
library(ggplot2)
averageBy <- 0.1
aggregate(diamonds$carat, list(diamonds$carat %/% averageBy * averageBy), mean)
which gives the mean for each 0.1 bin:
Group.1 x
1 0.2 0.2830764
2 0.3 0.3355529
3 0.4 0.4181711
4 0.5 0.5341423
5 0.6 0.6821408
6 0.7 0.7327491
...
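One caveat worth checking: 0.1 has no exact binary representation, and in R 0.3 %/% 0.1 evaluates to 2, so a diamond of exactly 0.30 carat can land in the 0.2 bin instead of the 0.3 bin. Because carat is recorded to two decimal places, one workaround (a sketch, not part of the original answer) is to bin on whole hundredths of a carat:

```r
library(ggplot2)  # provides the diamonds data set

# round(carat * 100) is an exact whole number of hundredths; integer division by 10
# then gives the 0.1-wide bin without floating-point misbinning at the boundaries
bin <- (round(diamonds$carat * 100) %/% 10) / 10

# Mean carat within each bin, e.g. bin 0.3 covers 0.30 to 0.39
avg <- aggregate(list(mean_carat = diamonds$carat), list(bin = bin), mean)
head(avg)
```

With this binning every bin's mean lies inside its own interval, which is a quick sanity check for the result.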

ggplot2 - Display specific values on x-axis

I'm trying to display specific values on the x-axis of a line plot made with ggplot2. In my table, the numeric x values are quite distant from each other, which is why I want to plot them as discrete values.
line <- ggplot(lineplot, aes(value,num, colour=attribute))
line + geom_line()
I hope this is clear; I'm a complete beginner, so apologies in advance for the question.
example table:
num value attribute
a 0 0.003 main
b 1 0.003 low
c 0 0.003 high
d 0 0.6 main
e 9 0.6 low
f 3 0.6 high
g 2 0.9 main
h 2 0.9 low
i 2 0.9 high
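x-axis:
what I get: the ticks 0.003, 0.6 and 0.9 placed at their numeric positions, so 0.6 and 0.9 sit close together, far from 0.003
what I want: 0.003, 0.6 and 0.9 evenly spaced, like categories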
If you want the x-axis treated as a discrete factor, you also have to add the group aesthetic so that ggplot2 knows which points to connect with a line.
library(ggplot2)
df <- read.table(text = "num value attribute
0 0.003 main
1 0.003 low
0 0.003 high
0 0.6 main
9 0.6 low
3 0.6 high
2 0.9 main
2 0.9 low
2 0.9 high", header = TRUE)
ggplot(df, aes(x = factor(value), y = num, group = attribute, color = attribute)) +
geom_line()
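To see why factor(value) gives even spacing, the sketch below builds the same plot and inspects it with layer_data(), which returns the data as ggplot2 computed it for the layer. The check itself is an illustration, not part of the original answer.

```r
library(ggplot2)

df <- read.table(text = "num value attribute
0 0.003 main
1 0.003 low
0 0.003 high
0 0.6 main
9 0.6 low
3 0.6 high
2 0.9 main
2 0.9 low
2 0.9 high", header = TRUE)

# factor(value) makes the x scale discrete: the three levels are drawn at
# positions 1, 2, 3 regardless of how far apart 0.003, 0.6 and 0.9 are numerically
p <- ggplot(df, aes(x = factor(value), y = num, group = attribute, color = attribute)) +
  geom_line()

# x holds the mapped positions; group holds one id per attribute (one line each)
built <- layer_data(p)
print(sort(unique(as.numeric(built$x))))
print(p)
```

Without group = attribute, the default grouping would combine the discrete x with the colour, leaving one point per group and nothing to connect.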
Try forcing the x-axis to be a factor rather than numeric, with group = attribute so that geom_line() knows which points to join:
line <- ggplot(lineplot, aes(factor(value), num, group = attribute, colour = attribute))
line + geom_line()
Is that what you want?

boxplot in ggplot gives unexpected output

I would like to plot a grouped boxplot using ggplot. Something like the picture below:
Below is a sample (10 rows) of my data:
alpha colsample_bytree best_F1
35 0.00 0.5 0.5825656
78 0.10 0.3 0.4716612
68 0.00 0.3 0.4714286
27 0.40 1.0 0.4786216
49 0.15 0.5 0.4943968
62 0.00 0.3 0.4938805
70 0.00 0.3 0.4849785
73 0.10 0.3 0.4997061
59 0.30 0.5 0.4856369
88 0.20 0.3 0.4552402
sort(unique(data$alpha))
0 0.1 0.15 0.2 0.3 0.4
sort(unique(data$colsample_bytree))
0.3 0.5 1
My code is the following:
library(ggplot2)
library(ggthemes)
ggplot(data, aes(x= colsample_bytree, y = best_F1, fill = as.factor(alpha))) +
geom_boxplot(alpha = 0.5, position=position_dodge(1)) + theme_economist() +
ggtitle("F1 for alpha and colsample_bytree")
This produces a plot in which the boxplots overlap, along with the following warning:
Warning message:
"position_dodge requires non-overlapping x intervals"
Since colsample_bytree takes 3 distinct values and alpha takes 6, I would expect to see 3 groups of boxplots, each made up of the boxplots for the different alpha values and positioned at a different value of colsample_bytree, i.e. 0.3, 0.5 and 1.
I would also expect the boxplots not to overlap, just like in the example I cited.
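You just have to convert colsample_bytree to a factor before you plot your data with ggplot():
data$colsample_bytree <- as.factor(data$colsample_bytree)
As a numeric variable, colsample_bytree is mapped to a continuous x scale and is not used to form groups, so the boxes are split only by the fill variable (alpha) and dodged around overlapping positions. As a factor, each of 0.3, 0.5 and 1 becomes its own discrete position with one box per alpha value.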
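A minimal sketch of the fix using the 10-row sample from the question (row names dropped; theme_economist() from ggthemes is left out so the example needs only ggplot2). layer_data() is used here only to confirm that one box is computed per (colsample_bytree, alpha) combination present in the sample.

```r
library(ggplot2)

# The 10-row sample from the question
data <- data.frame(
  alpha = c(0.00, 0.10, 0.00, 0.40, 0.15, 0.00, 0.00, 0.10, 0.30, 0.20),
  colsample_bytree = c(0.5, 0.3, 0.3, 1.0, 0.5, 0.3, 0.3, 0.3, 0.5, 0.3),
  best_F1 = c(0.5825656, 0.4716612, 0.4714286, 0.4786216, 0.4943968,
              0.4938805, 0.4849785, 0.4997061, 0.4856369, 0.4552402)
)

# A factor x makes each colsample_bytree value a discrete position, so boxes are
# grouped by every (colsample_bytree, alpha) combination and dodged within it
data$colsample_bytree <- as.factor(data$colsample_bytree)

p <- ggplot(data, aes(x = colsample_bytree, y = best_F1, fill = as.factor(alpha))) +
  geom_boxplot(alpha = 0.5, position = position_dodge(1)) +
  ggtitle("F1 for alpha and colsample_bytree")
print(p)

# One computed box per combination present in the sample (7 here)
print(nrow(layer_data(p)))
```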
