How to plot frequency as step curve in R

I have a dataframe as shown below:
  ID AC         AF Type
1 60  1 0.00352113    1
2 48  1 0.00352113    2
3 25  1 0.00352113    1
4 98  1 0.00352113    2
5 24  1 0.00352113    1
6 64  2 0.00704225    1
I need to plot a step curve with AF on the x-axis and its frequency on the y-axis, coloured by Type. I managed to get a histogram using the code below:
ggplot(data, aes(x = AF, fill = factor(Type))) +
  geom_histogram(aes(y = after_stat(count)), bins = 40)
However, I need a step curve instead of a histogram. Any suggestions on how to achieve this?

We can use geom_line with stat = 'count'.
First, generate some dummy data:
library(ggplot2)
set.seed(123)
df1 <- data.frame(Type = sample(1:3, 100, replace = TRUE),
                  AF = sample(1:10, 100, replace = TRUE,
                              prob = seq(.8, .2, length.out = 10)))
Then make the plot:
ggplot(df1, aes(x = AF)) +
  geom_line(stat = 'count', aes(group = Type, colour = factor(Type)))
Here's an alternative that draws an actual step curve by binning a continuous AF (HT to @eipi):
set.seed(123)
df1 <- data.frame(Type = sample(1:3, 1000, replace = TRUE),
                  AF = round(rnorm(1000), 3))
ggplot(df1, aes(x = AF)) +
  geom_step(stat = 'bin', aes(group = Type, colour = factor(Type)),
            bins = 35)
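Applied to data shaped like the question's (columns AF and Type), the same idea might be sketched as follows. The data frame `dat` is simulated purely for illustration and is an assumption, not the asker's data; `bins = 40` is taken from the question's histogram call.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (an assumption, for illustration only)
set.seed(42)
dat <- data.frame(
  AF   = round(rbeta(500, 1, 8), 4),
  Type = sample(1:2, 500, replace = TRUE)
)

# One step curve per Type: stat = "bin" counts the AF values in each bin,
# and geom_step draws those counts as a staircase instead of bars
p <- ggplot(dat, aes(x = AF, colour = factor(Type))) +
  geom_step(stat = "bin", bins = 40) +
  labs(y = "Frequency", colour = "Type")

# The binned counts behind the plot can be inspected with ggplot_build()
binned <- ggplot_build(p)$data[[1]]
p
```

Inspecting `binned` is a quick way to check that every observation ended up in some bin before fine-tuning `bins`.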

In base graphics you can do this:
set.seed(1)
AF <- sample(1:20, 1000, replace = TRUE)
set.seed(2)
TYPE <- sample(1:2, 1000, replace = TRUE)
# Relative frequency of each AF value within each TYPE
plot(table(AF[TYPE == 1]) / sum(TYPE == 1), type = "l", col = "blue",
     ylab = "Frequency of AF", xlab = "AF")
points(table(AF[TYPE == 2]) / sum(TYPE == 2), type = "l")
legend("bottomright", c("Type1", "Type2"), lty = 1, lwd = 3, col = c("blue", "black"))
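The base-graphics version above draws straight line segments. If a staircase shape is wanted, one possible variation is `type = "s"`, the stair-step line type in base graphics. The sketch below uses simulated data similar to the answer's; the names `AF_f`, `freq` and `x` are introduced here for illustration only.

```r
# Simulated data, similar to the answer above
set.seed(1)
AF <- sample(1:20, 1000, replace = TRUE)
TYPE <- sample(1:2, 1000, replace = TRUE)

# Relative frequency of each AF value within each TYPE.
# factor(..., levels = 1:20) keeps AF values with zero count on the x-axis.
AF_f <- factor(AF, levels = 1:20)
freq <- prop.table(table(AF_f, TYPE), margin = 2)

# type = "s" draws a stair-step curve instead of joined line segments
x <- as.integer(rownames(freq))
plot(x, freq[, "1"], type = "s", col = "blue",
     ylim = range(freq), xlab = "AF", ylab = "Frequency of AF")
lines(x, freq[, "2"], type = "s", col = "black")
legend("bottomright", c("Type1", "Type2"), lty = 1, col = c("blue", "black"))
```

Setting `ylim = range(freq)` up front keeps the second curve from being clipped by the axis limits of the first.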

Related

stacked bar chart without using fill in geom_bar?

I have some dummy data and am able to create a bar chart and a stacked bar chart:
library(ggplot2)
library(tidyr)   # pivot_longer() and the %>% pipe

# some data
egdf <- data.frame(
  ch = c('a', 'b', 'c'),
  N = c(100, 110, 120),
  M = c(10, 15, 20)
)
Looks like this:
egdf
  ch   N  M
1  a 100 10
2  b 110 15
3  c 120 20
Now some charts:
# bar chart
ggplot(egdf, aes(x = ch, y = N)) +
  geom_bar(stat = 'identity')
# stacked bar chart
egdf %>%
  pivot_longer(cols = c(N, M), names_to = 'metric') %>%
  ggplot(aes(x = ch, y = value, fill = metric)) +
  geom_bar(stat = 'identity')
My question is, is there a way to create the stacked bar chart from egdf directly without having to first transform with pivot_longer()?
[EDIT]
Why am I asking for this? My actual dataframe has some additional fields which are based on calculations off the current structure, e.g. it looks more like this:
library(dplyr)   # mutate(), lag()

egdf <- data.frame(
  ch = c('a', 'b', 'c'),
  N = c(120, 110, 100),
  M = c(10, 15, 20)
) %>%
  mutate(drop = N - lag(N),
         drop_pct = scales::percent(drop / N),
         Rate = scales::percent(M / N))
egdf
  ch   N  M drop drop_pct  Rate
1  a 120 10   NA     <NA>  8.3%
2  b 110 15  -10   -9.09% 13.6%
3  c 100 20  -10  -10.00% 20.0%
In my plot, I'm adding some additional geoms. If I were to use pivot_longer, the relationships between these columns would be broken. If I could somehow tell ggplot to make a stacked bar just from feature1 and feature2 (N and M in the example), it would be much easier for this particular use case.

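Update, following a helpful comment from stefan: to get bars that look stacked, draw the total N + M first and then draw M over it, so that the visible part of the first bar has height N:
ggplot(egdf, aes(x = ch, y = N + M)) +
  geom_col(aes(fill = "N")) +
  geom_col(aes(x = ch, y = M, fill = "M")) +
  ylab("N + M") +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))
First answer (this overlays the M bar on the N bar, so the total height is N rather than N + M):
Are you looking for a solution like this?
ggplot(egdf, aes(x = ch, y = N)) +
  geom_col(aes(fill = "N")) +
  geom_col(aes(x = ch, y = M, fill = "M"))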
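Because the data stay in wide form with this layering approach, other per-row columns remain available to further geoms. The sketch below assumes the question's second version of `egdf` and rebuilds the `Rate` label in base R (instead of `scales::percent`) to stay self-contained; the `geom_text` layer is an illustrative addition, not part of the original answer.

```r
library(ggplot2)

egdf <- data.frame(
  ch = c("a", "b", "c"),
  N  = c(120, 110, 100),
  M  = c(10, 15, 20)
)
# Rate label computed from the wide columns, matching the question's Rate column
egdf$Rate <- sprintf("%.1f%%", 100 * egdf$M / egdf$N)

p <- ggplot(egdf, aes(x = ch)) +
  geom_col(aes(y = N + M, fill = "N")) +   # full height: N + M
  geom_col(aes(y = M, fill = "M")) +       # drawn on top, covers the bottom M units
  geom_text(aes(y = N + M, label = Rate), vjust = -0.5) +  # label above each bar
  labs(y = "N + M", fill = "metric")

# Heights of the first (total) layer, for checking
heights <- ggplot_build(p)$data[[1]]$y
p
```

The order of the two `geom_col` layers matters: the taller bar has to be drawn first, otherwise it hides the shorter one.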

Adding title and another line to line graph using ggplot2

I got help from another user on how to create these plots (thank you!):
library(data.table)
library(ggplot2)

test <- data.frame("Site_No" = c("01370", "01332", "01442"),
                   "0.99" = c(12, 15, 18), "0.98" = c(14, 15, 18), "0.90" = c(7, 22, 30),
                   ".80" = c(3, 2, 1), ".75" = c(1, 6, 8), ".70" = c(5, 6, 9),
                   ".60" = c(15, 6, 19), ".50" = c(5, 6, 9), ".40" = c(9, 16, 20),
                   ".30" = c(1, 15, 3), ".25" = c(5, 16, 19), ".20" = c(5, 1, 20),
                   ".10" = c(11, 12, 13), ".05" = c(15, 16, 28), "0.02" = c(22, 20, 12),
                   ".01" = c(3, 26, 29))
dt <- as.data.table(test)
# data.frame() prefixes the numeric column names with "X" (e.g. "0.99" becomes "X0.99")
melted <- data.table::melt(dt, measure.vars = c("X0.99", "X0.98", "X0.90"))
for (i in unique(melted$Site_No)) {
  dev.new()
  print(ggplot2::ggplot(data = melted[Site_No == i, ],
                        mapping = aes(x = variable, y = value, group = Site_No)) +
          ggplot2::geom_line())
}
I have a few questions about some additions:
1) I would like to add a title with the Site_No to each of these graphs. I tried adding title = Site_No to the code, but it didn't work.
2) I would like to add another line, in a different colour, to each graph using this data:
test2 <- data.frame("Site_No" = c("01370", "01332", "01442"),
                    "0.99" = c(19, 36, 22), "0.98" = c(19, 10, 28), "0.90" = c(2, 6, 8))
I tried copying the same code to add the other line, but it didn't work.
3) I would like each of these plots to be saved to my local directory automatically, so that I don't have to save each one individually (in reality I am making 100 plots, not 3).
Thank you so much for your help :)

For question 1), you can add a title with ggtitle inside your loop.
For question 2), a possible solution is to combine the two data frames:
library(data.table)
library(dplyr)
melted2 <- melt(setDT(test2), measure.vars = c("X0.99", "X0.98", "X0.90"))
DF <- left_join(melted, melted2, by = c("Site_No", "variable"))
DF <- melt(setDT(DF), measure.vars = c("value.x", "value.y"), variable.name = "Test", value.name = "Value")
    Site_No variable    Test Value
 1:   01370    X0.99 value.x    12
 2:   01332    X0.99 value.x    15
 3:   01442    X0.99 value.x    18
 4:   01370    X0.98 value.x    14
 5:   01332    X0.98 value.x    15
 6:   01442    X0.98 value.x    18
 7:   01370    X0.90 value.x     7
 8:   01332    X0.90 value.x    22
 9:   01442    X0.90 value.x    30
10:   01370    X0.99 value.y    19
11:   01332    X0.99 value.y    36
12:   01442    X0.99 value.y    22
13:   01370    X0.98 value.y    19
14:   01332    X0.98 value.y    10
15:   01442    X0.98 value.y    28
16:   01370    X0.90 value.y     2
17:   01332    X0.90 value.y     6
18:   01442    X0.90 value.y     8
Then, to add a second line to your graph, change group in aes to Test and map color to Test as well.
Your loop should then look like this:
for (i in unique(DF$Site_No)) {
  dev.new()
  print(ggplot2::ggplot(data = DF[Site_No == i, ],
                        mapping = aes(x = variable, y = Value, group = Test)) +
          ggplot2::geom_line(aes(color = Test)) +
          ggplot2::scale_color_discrete(labels = c("test1", "test2")) +
          ggplot2::ggtitle(paste("Title:", i)))
}
For question 3), you can use ggsave to save each graph directly into your current working directory:
library(ggplot2)
for (i in unique(DF$Site_No)) {
  graph <- ggplot(data = DF[Site_No == i, ],
                  mapping = aes(x = variable, y = Value, group = Test)) +
    geom_line(aes(color = Test)) +
    scale_color_discrete(labels = c("test1", "test2")) +
    ggtitle(paste("Title:", i))
  ggsave(filename = paste0("Site_", i, ".png"), plot = graph, device = "png", width = 5, height = 5, units = "in")
}
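With around 100 sites, it may be convenient to build all plots first and save them in a second step. The sketch below is one possible arrangement, not the answer's method: it rebuilds the long-format data from the printed table so that it is self-contained, and `save_site_plots` and its `out_dir` argument are hypothetical names introduced here.

```r
library(ggplot2)

# Long-format data with the same columns as DF (values copied from the printed table)
DF <- data.frame(
  Site_No  = rep(c("01370", "01332", "01442"), times = 6),
  variable = rep(rep(c("X0.99", "X0.98", "X0.90"), each = 3), times = 2),
  Test     = rep(c("value.x", "value.y"), each = 9),
  Value    = c(12, 15, 18, 14, 15, 18, 7, 22, 30,
               19, 36, 22, 19, 10, 28, 2, 6, 8)
)

# One plot per site, stored in a list named by Site_No
plots <- lapply(split(DF, DF$Site_No), function(d) {
  ggplot(d, aes(x = variable, y = Value, group = Test, colour = Test)) +
    geom_line() +
    ggtitle(paste("Site", d$Site_No[1]))
})

# Hypothetical helper: write every plot in the list to out_dir as a PNG
save_site_plots <- function(plots, out_dir) {
  for (site in names(plots)) {
    ggsave(file.path(out_dir, paste0("Site_", site, ".png")),
           plot = plots[[site]], width = 5, height = 5, units = "in")
  }
}
```

Calling `save_site_plots(plots, getwd())` would then write one file per site, and keeping the plots in a named list also makes it easy to inspect a single one, for example `plots[["01370"]]`.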
Here is an example of a saved graph:
EDIT: With more x values: continuous vs discrete plot
You mentioned that you have 18 x values representing percentiles and that you would like them displayed clearly on the graph (right now the labels overlap).
One way is to keep those values discrete and simply reduce the size of the x-axis text in theme.
First, prepare the data.table from your new example:
library(data.table)
library(dplyr)
# Melt every column whose name starts with "X"
melted <- melt(setDT(test), measure.vars = grep("^X", colnames(test)))
melted2 <- melt(setDT(test2), measure.vars = grep("^X", colnames(test2)))
DF <- left_join(melted, melted2, by = c("Site_No", "variable"))
DF <- melt(setDT(DF), measure.vars = c("value.x", "value.y"), variable.name = "Test", value.name = "Value")
# Add the missing leading zero so that, e.g., "X.80" becomes "X0.80"
DF$variable <- gsub("X\\.", "X0.", DF$variable)
For the plot, you can use:
for (i in unique(DF$Site_No)) {
  graph <- ggplot(data = DF[Site_No == i, ],
                  mapping = aes(x = variable, y = Value, group = Test)) +
    geom_line(aes(color = Test)) +
    scale_color_discrete(labels = c("test1", "test2")) +
    ggtitle(paste("Title:", i)) +
    theme(axis.text.x = element_text(angle = 90, size = 10, vjust = 0.5))
  ggsave(filename = paste0("Site_", i, ".png"), plot = graph, device = "png", width = 5, height = 5, units = "in")
}
Which gives you the following graph:
Another possibility is to represent your data on a continuous scale and set the axis breaks so that fewer labels are shown:
DF2 <- DF %>% mutate(variable = as.numeric(gsub("X", "", variable)))
setDT(DF2)
for (i in unique(DF2$Site_No)) {
  graph <- ggplot(data = DF2[Site_No == i, ],
                  mapping = aes(x = variable, y = Value, group = Test)) +
    geom_line(aes(color = Test)) +
    scale_color_discrete(labels = c("test1", "test2")) +
    scale_x_continuous(breaks = seq(0, 1, by = 0.1)) +
    ggtitle(paste("Title:", i))
  ggsave(filename = paste0("Site_", i, "_conti_.png"), plot = graph, device = "png", width = 5, height = 5, units = "in")
}
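The conversion from column names to numbers in the continuous version can be checked on its own. The snippet below is a small illustration using a subset of the column names from the question; `vars` and `probs` are names introduced here.

```r
# Column names as produced by data.frame() from "0.99", ".80", ".05", ...
vars <- c("X0.99", "X0.98", "X0.90", "X.80", "X.75", "X.05", "X0.02", "X.01")

# Strip the leading "X" and convert; as.numeric() reads ".80" as 0.8,
# so the leading zero does not need to be added first
probs <- as.numeric(sub("^X", "", vars))
probs
```

Anchoring the pattern with `^` removes only the prefix that `data.frame()` added, which is slightly safer than replacing every "X" in the name.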
Which gives this kind of graph:
Finally, a third possibility is to pass a scale argument to ggsave, which multiplies the size of the saved plot so that the text takes up relatively less room:
for (i in unique(DF$Site_No)) {
  graph <- ggplot(data = DF[Site_No == i, ],
                  mapping = aes(x = variable, y = Value, group = Test)) +
    geom_line(aes(color = Test)) +
    scale_color_discrete(labels = c("test1", "test2")) +
    ggtitle(paste("Title:", i))
  ggsave(filename = paste0("Site_", i, ".png"), plot = graph, device = "png", width = 5, height = 5, units = "in", scale = 2)
}
You can also mix these solutions, for example a continuous scale with rotated labels. It's up to you.
Does this answer your question?

Free colour scales in facet_grid

Say I have the following data frame:
library(dplyr)
library(ggplot2)

# Set seed for RNG
set.seed(33550336)
# Create toy data frame
loc_x <- c(a = 1, b = 2, c = 3)
loc_y <- c(a = 3, b = 2, c = 1)
scaling <- c(temp = 100, sal = 10, chl = 1)
df <- expand.grid(loc_name = letters[1:3],
                  variables = c("temp", "sal", "chl"),
                  season = c("spring", "autumn")) %>%
  mutate(loc_x = loc_x[loc_name],
         loc_y = loc_y[loc_name],
         value = runif(nrow(.)),
         value = value * scaling[variables])
which looks like,
# > head(df)
#   loc_name variables season loc_x loc_y     value
# 1        a      temp spring     1     3 86.364697
# 2        b      temp spring     2     2 35.222573
# 3        c      temp spring     3     1 52.574082
# 4        a       sal spring     1     3  0.667227
# 5        b       sal spring     2     2  3.751383
# 6        c       sal spring     3     1  9.197086
I want to plot these data in a facet grid using variables and season to define panels, like this:
g <- ggplot(df) + geom_point(aes(x = loc_name, y = value), size = 5)
g <- g + facet_grid(variables ~ season)
g
As you can see, different variables have very different scales. So, I use scales = "free" to account for this.
g <- ggplot(df) + geom_point(aes(x = loc_name, y = value), size = 5)
g <- g + facet_grid(variables ~ season, scales = "free")
g
Mucho convenient. Now, say I want to do this, but plot the points by loc_x and loc_y and have value represented by colour instead of y position:
g <- ggplot(df) + geom_point(aes(x = loc_x, y = loc_y, colour = value),
                             size = 5)
g <- g + facet_grid(variables ~ season, scales = "free")
g <- g + scale_colour_gradient2(low = "#3366CC",
                                mid = "white",
                                high = "#FF3300",
                                midpoint = 50)
g
Notice that the colour scales are not free and, like the first figure, values for sal and chl cannot be read easily.
My question: is it possible to do an equivalent of scales = "free" but for colour, so that each row (in this case) has a separate colour bar? Or, do I have to plot each variable (i.e., row in the figure) and patch them together using something like cowplot?

Using group_split() from dplyr (it was only in the development version when this answer was written, and has been part of released dplyr since 0.8.0), you can build one plot per panel, each with its own colour scale, and combine them with cowplot:
library(dplyr)
library(purrr)
library(ggplot2)
library(cowplot)
df %>%
  group_split(variables, season) %>%
  map(
    ~ggplot(., aes(loc_x, loc_y, color = value)) +
      geom_point(size = 5) +
      scale_colour_gradient2(
        low = "#3366CC",
        mid = "white",
        high = "#FF3300",
        midpoint = median(.$value)
      ) +
      facet_grid(~ variables + season, labeller = function(x) label_value(x, multi_line = FALSE))
  ) %>%
  plot_grid(plotlist = ., align = 'hv', ncol = 2)
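If a single shared colour bar is acceptable, a different technique from the answer above is to rescale `value` within each variable before plotting. This is only a sketch under that assumption: the colour bar then shows relative position within a variable (0 to 1), not the original units, and `rescale01` and `value_scaled` are names introduced here. The toy data are regenerated in base R with a different seed.

```r
library(ggplot2)

# Toy data with the same structure as the question's df (different random values)
set.seed(1)
df <- expand.grid(loc_name = letters[1:3],
                  variables = c("temp", "sal", "chl"),
                  season = c("spring", "autumn"),
                  stringsAsFactors = FALSE)
df$loc_x <- unname(c(a = 1, b = 2, c = 3)[df$loc_name])
df$loc_y <- unname(c(a = 3, b = 2, c = 1)[df$loc_name])
df$value <- runif(nrow(df)) * unname(c(temp = 100, sal = 10, chl = 1)[df$variables])

# Rescale value to [0, 1] within each variable so one colour bar fits all rows
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
df$value_scaled <- ave(df$value, df$variables, FUN = rescale01)

p <- ggplot(df, aes(loc_x, loc_y, colour = value_scaled)) +
  geom_point(size = 5) +
  facet_grid(variables ~ season) +
  scale_colour_gradient2(low = "#3366CC", mid = "white", high = "#FF3300",
                         midpoint = 0.5)
p
```

The trade-off is that absolute values can no longer be read off the legend, so this suits comparisons of spatial pattern more than comparisons of magnitude.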

color by factor and continuous variable in ggplot

I am trying to use color to highlight differences between and within factor levels. For example, with these reproducible data:
set.seed(123)
dat <- data.frame(
  Factor = sample(c("AAA", "BBB", "CCC"), 50, replace = TRUE),
  ColorValue = sample(1:4, 50, replace = TRUE),
  x = sample(1:50, 50, replace = TRUE),
  y = sample(1:50, 50, replace = TRUE))
head(dat)
  Factor ColorValue  x  y
1    AAA          1 30 43
2    CCC          2 17 25
3    BBB          4 25 20
4    CCC          1 48 13
5    CCC          3 25  6
6    AAA          1 45 20
I want to have a different color for each Factor. Then, within each factor I am trying to use ColorValue as a continuous coloring variable to show intensity.
In the plot below, each facet would have different shades of red, green, and blue that reflect the ColorValue, ideally with a single intensity (i.e. ColorValue) legend for all three factor levels.
ggplot(dat, aes(x = x, y = y, color = Factor)) +
  geom_point(size = 3) +
  facet_wrap(~Factor) +
  theme_bw()

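One option is to map ColorValue to alpha, so that each factor keeps its own colour and the transparency shows the intensity:
ggplot(dat, aes(x = x, y = y, color = Factor, alpha = ColorValue)) +
  geom_point(size = 3) +
  facet_wrap(~Factor) +
  theme_bw()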
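The alpha approach might be refined a little so that the lowest intensity stays visible and only one legend remains. The sketch below regenerates the question's data; the alpha range of 0.25 to 1 is an arbitrary choice made here, not something from the original answer.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(
  Factor = sample(c("AAA", "BBB", "CCC"), 50, replace = TRUE),
  ColorValue = sample(1:4, 50, replace = TRUE),
  x = sample(1:50, 50, replace = TRUE),
  y = sample(1:50, 50, replace = TRUE)
)

p <- ggplot(dat, aes(x = x, y = y, colour = Factor, alpha = ColorValue)) +
  geom_point(size = 3) +
  facet_wrap(~Factor) +
  scale_alpha_continuous(range = c(0.25, 1)) +  # keep the faintest points visible
  guides(colour = "none") +                     # facet strips already name the factor
  theme_bw()

# Alpha values actually assigned to the points, for checking
alphas <- ggplot_build(p)$data[[1]]$alpha
p
```

Dropping the colour legend leaves the alpha legend as the single intensity key for all three facets.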

Combine Grouped and Stacked Bar Graph in R

My data frame looks like this:
CatA CatB CatC
   1    Y    A
   1    N    B
   1    Y    C
   2    Y    A
   3    N    B
   2    N    C
   3    Y    A
   4    Y    B
   4    N    C
   5    N    A
   5    Y    B
I want CatA on the x-axis and its count on the y-axis. That graph comes out fine. However, I also want to group by CatB and stack by CatC, keeping the count on the y-axis. Below is what I have tried; the linked images show how it currently looks and how I want it to look.
My code:
ggplot(data, aes(factor(CatA), CatB, fill = CatC)) +
  geom_bar(stat = "identity", position = "stack") +
  theme_bw() + facet_grid(~ CatC)
PS: I am sorry for providing links to the images rather than uploading them; imgur gives me an error every time I try to upload.

You could use facets:
library(ggplot2)
df <- data.frame(A = sample(1:5, 30, TRUE),
                 B = sample(c('Y', 'N'), 30, TRUE),
                 C = rep(LETTERS[1:3], 10))
# One facet per A, one bar per B, stacked by C
ggplot(df) +
  geom_bar(aes(B, fill = C), position = 'stack', width = 0.9) +
  facet_wrap(~A, nrow = 1) +
  theme(panel.spacing = unit(0, "lines"))
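The same facet approach, applied to the data frame from the question rather than to random data, could look like the sketch below. The data are typed in by hand from the table in the question, and the name `dat` is introduced here.

```r
library(ggplot2)

# The data frame from the question, typed in by hand
dat <- data.frame(
  CatA = c(1, 1, 1, 2, 3, 2, 3, 4, 4, 5, 5),
  CatB = c("Y", "N", "Y", "Y", "N", "N", "Y", "Y", "N", "N", "Y"),
  CatC = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B")
)

# One facet per CatA value, one bar per CatB value, stacked by CatC;
# geom_bar() counts the rows itself, so no y aesthetic is needed
p <- ggplot(dat, aes(x = CatB, fill = CatC)) +
  geom_bar(position = "stack", width = 0.9) +
  facet_wrap(~CatA, nrow = 1) +
  theme(panel.spacing = unit(0, "lines"))

# Counts behind the stacked segments, for checking
counts <- ggplot_build(p)$data[[1]]$count
p
```

Compared with the attempt in the question, the main difference is that the default count statistic is used instead of `stat = "identity"`, since the data hold one row per observation rather than precomputed counts.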
