Histogram of one column by the relative frequency of another column (R)

Let's say we have the table:
x y
1 43
1 54
2 54
3 22
2 22
1 43
I want a bar chart with only the unique values 1, 2, 3 on the x-axis. In addition, for each x value it should show, in percent, how often each y value occurs: the share of 43 within x = 1, then the share of 54, and so on. Should both columns be converted to factors?

Here's my solution:
library("ggplot2")
library("dplyr")
library("magrittr")
library("tidyr")
df <- data.frame(x = c(1,1,2,3,2,1), y = c(43,54,54,22,22,43))
# Add a counter column so we can count how many times
# each y value occurs within each x category
df$n <- 1
df %<>% # magrittr's assignment pipe: overwrites 'df' with the result
  group_by(x, y) %>% # regular forward pipe
  tally(n) %>% # sums n per (x, y); the result is still grouped by x
  mutate(n = round(n / sum(n), 2)) # share of each y within its x, as a proportion
#Plotting
df %>%
ggplot(aes(fill = as.factor(y), y = n, x = x)) +
geom_bar(position = "fill", stat = "identity") +
scale_y_continuous(labels = scales::percent) +
labs(y = "Percentage contribution from each y category") +
#Adding the percentage values as labels
geom_text(aes(label = paste0(n*100,"%")), position = position_stack(vjust = 0.5), size = 2)
Note: position = "fill" in geom_bar() scales every bar to a total height of 1, and scale_y_continuous(labels = scales::percent) formats the y-axis ticks as percentages.
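As a cross-check on the numbers the plot should display, the per-x shares can also be computed without any packages. This is only a minimal base R sketch on the toy data from the question; the names counts and shares are made up for illustration.

```r
# Toy data from the question
df <- data.frame(x = c(1, 1, 2, 3, 2, 1), y = c(43, 54, 54, 22, 22, 43))

# Count every (x, y) combination, then normalise each row (one row per x) to sum to 1
counts <- table(x = df$x, y = df$y)
shares <- prop.table(counts, margin = 1)

# Shares as percentages, rounded to whole percent
print(round(100 * shares))
```

For x = 1 this gives roughly 67% for 43 and 33% for 54, which is what the stacked bar for x = 1 should show. table() treats both columns as categories, so no explicit factor conversion is needed for this check.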

Related

Problem with naming x-axis with ggplot2 in Rstudio

I'm trying to create a variation of a Pareto chart.
Working through the code, I ran into a problem I have not been able to solve on my own for several hours. It concerns (1) the order of the data in ggplot2 and (2) renaming the labels accordingly.
(1) Since I want an ordered bar plot with a saturation curve, I created a dummy variable from X to X-1, so my bars are sorted from high to low, as you can see in the output (1).
By working around this problem I created a second problem that I can't fix.
(2) I have a column in my df containing all the species I want to see on the x-axis. However, ggplot won't print them there. In fact, since I added the command I get no labels on the x-axis at all, and no error either.
So my question is:
Is there a way to use my species list as the x-axis? (Remember that my data has to be sorted from high to low.)
Or does someone easily spot a way to solve the labeling problem?
Cheers
dfb
     Beech id     proc     kommu    Order
1   Va fla  1 8.749851  8.749851  Psocopt
2       Er  2 7.793812 16.543663    Acari
3 Faga dou  3 7.659406 24.203069     Dipt
4      Tro  4 6.675941 30.879010    Acari
5  Hal ann  5 6.289307 37.168317     Dipt
6    Stigm  6 3.724406 40.892723    Acari
7   Di fag  7 3.642574 44.535297 Lepidopt
8    Phyfa  8 3.390545 47.925842 Neoptera
9   Phylma  9 2.766040 50.691881 Lepidopt
data example:
structure(list(Beech = c("Va fla", "Er", "Faga dou", "Tro", "Hal ann",
"Stigm", "Di fag", "Phyfa", "Phylma"), id = c(1, 2, 3, 4, 5,
6, 7, 8, 9), proc = c(8.749851, 7.793812, 7.659406, 6.675941,
6.289307, 3.724406, 3.642574, 3.390545, 2.76604), kommu = c(8.749851,
16.543663, 24.203069, 30.87901, 37.168317, 40.892723, 44.535297,
47.925842, 50.691881), Order = c("Psocopt", "Acari", "Dipt",
"Acari", "Dipt", "Acari", "Lepidopt", "Neoptera", "Lepidopt")), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
library(openxlsx)
library(ggplot2)
dfb <- data.xlsx ###(df containing different % values per species)
labelb <- dfb$Beech ###(list of 22 items; same number as x-values)
p <-ggplot(dfb, aes(x=id))
p <- p + geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen")
p <- p + geom_line(aes(y = kommu/10), color = "orange", size = 2) + geom_point(aes(y = kommu/10),size = 2)
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*10, name ="Total biocoenosis[%]"))
p <- p + labs(y = "Species [%]",
x = "Species")
p <- p + scale_x_discrete(labels = labelb)
p <- p + theme(legend.position = c(0.8, 0.9))
Reply to the comments:
So basically my problem is that the bars are not labeled with a species name.
I know this is caused by my dummy variable, which is basically 1 to 22.
So I tried to force ggplot to label the x-axis with the values I want.
But this input doesn't work:
p <- p + scale_x_discrete(labels = labelb)
But back to your suggestions:
Yeah, I tried tidyverse right after creating this post and couldn't handle it well enough. Your idea doesn't change anything for me; the result is the same as with my plain ggplot command.
arrange(Beech) %>%
mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>%
ggplot(aes(Beech, proc)) +
geom_col()
I can't quite tell from the picture what's going wrong, but one way to make sure your bar plots are in ascending/descending order is to arrange the column and then convert it to a factor using the existing order of the categories:
So, without ordering:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(price = mean(price)) %>%
ggplot(aes(cut, price)) +
geom_bar(stat = "identity")
And with ordering:
diamonds %>%
group_by(cut) %>%
summarize(price = mean(price)) %>%
arrange(price) %>%
mutate(cut = factor(cut, levels = unique(.$cut))) %>%
ggplot(aes(cut, price)) +
geom_bar(stat = "identity")
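What makes this work is that ggplot2 draws a discrete axis in the order of the factor levels, not in the order of the rows. A minimal base R sketch of the same arrange-then-factor idea, using a small made-up data frame (toy, species and score are hypothetical names):

```r
# Hypothetical toy data, deliberately not sorted
toy <- data.frame(
  species = c("B", "C", "A"),
  score = c(2, 9, 5),
  stringsAsFactors = FALSE
)

# Default factor levels are sorted alphabetically, regardless of row order
levels(factor(toy$species))

# Sort the rows by score (high to low), then freeze that order as the levels
toy <- toy[order(-toy$score), ]
toy$species <- factor(toy$species, levels = unique(toy$species))
levels(toy$species)
```

The second levels() call returns C, A, B, so a bar chart of toy would show the bars from the highest to the lowest score.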
I edited your code using the data sample you provided, and I think this does what you want.
Basically, I sorted the rows by descending proc and then converted Beech to a factor. Here is the modified code and the result:
library(dplyr)
library(ggplot2)
p <-
  dfb %>%
  arrange(desc(proc)) %>% # sort rows from high to low
  mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>% # lock that order in as factor levels
  ggplot(aes(Beech)) +
  geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen") +
  geom_line(aes(y = kommu/10, x = as.integer(Beech)), color = "orange", size = 2) +
  geom_point(aes(y = kommu/10), size = 2) +
  labs(y = "Species [%]", x = "Species") +
  scale_x_discrete("Species") +
  scale_y_continuous(sec.axis = sec_axis(~.*10, name = "Total biocoenosis[%]")) +
  theme(legend.position = c(0.8, 0.9))
p
Note: I had to tweak geom_line a bit by adding x = as.integer(Beech). With a factor on the x-axis, each level is treated as its own group, so the line needs a numeric x to connect the points.
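Two details of this answer can be checked outside ggplot2: as.integer() on a factor returns the position of each value in the level order, and the kommu column appears to be the running total of proc (an observation from the sample data, not something the question states). A small base R sketch using the first five rows of the sample:

```r
# First five rows of the sample data from the question
beech <- c("Va fla", "Er", "Faga dou", "Tro", "Hal ann")
proc  <- c(8.749851, 7.793812, 7.659406, 6.675941, 6.289307)
kommu <- c(8.749851, 16.543663, 24.203069, 30.879010, 37.168317)

# Levels ordered by descending proc, as in the answer
f <- factor(beech, levels = beech[order(-proc)])

# Integer codes are the positions on the discrete axis, here 1 to 5
as.integer(f)

# The running total of proc reproduces kommu up to rounding
max(abs(cumsum(proc) - kommu))
```

Because the integer codes line up with the bar positions, a line mapped to x = as.integer(Beech) passes over the bars in their sorted order.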

How to create two barplots with different x and y axis in the same plot in R?

I need to plot two grouped barplots from two data frames that have different numbers of rows: 6 and 5.
I tried many code variants in R, but I don't know how to fix it.
Here are my data frames. The Freq column must be on the y-axis, and the inter and intra columns must be on the x-axis.
> freqinter
inter Freq
1 0.293040975264367 17
2 0.296736775990729 2
3 0.297619926364764 4
4 0.587377012109561 1
5 0.595245125315916 4
6 0.597022018595893 2
> freqintra
intra Freq
1 0 3
2 0.293040975264367 15
3 0.597022018595893 4
4 0.598809552335782 2
5 0.898227748764939 6
I expect to plot both barplots in the same plot and to distinguish the inter and intra values by colour.
I want a picture like this one:
You probably want a histogram. Use the raw data if possible. For example:
library(tidyverse)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id") %>%
uncount(Freq)
ggplot(df, aes(x, fill = id)) +
geom_histogram(binwidth = 0.1, position = 'dodge', col = 1) +
scale_fill_grey() +
theme_minimal()
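The uncount(Freq) step turns the frequency table back into one row per observation, which is the form geom_histogram() expects. The same expansion can be sketched in base R with rep(); the variable names below are illustrative only, and the bin boundaries chosen here may differ from ggplot2's defaults.

```r
# inter values and their frequencies from the question
inter <- c(0.293040975264367, 0.296736775990729, 0.297619926364764,
           0.587377012109561, 0.595245125315916, 0.597022018595893)
freq <- c(17, 2, 4, 1, 4, 2)

# Repeat each value as many times as it was observed
raw <- rep(inter, times = freq)
length(raw)

# Bin the expanded values with a bin width of 0.1 (no plot is drawn)
h <- hist(raw, breaks = seq(0, 1, by = 0.1), plot = FALSE)
h$counts
```

With these breaks, the 30 expanded observations fall into just two bins: 23 between 0.2 and 0.3, and 7 between 0.5 and 0.6.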
With the data you posted, I don't think you can make this graph look good. You can't have bars thin enough to differentiate 0.293 and 0.296 when your data ranges from 0 to 0.9.
Maybe you could try to treat it as a factor just to illustrate what you want to do:
library(dplyr)
library(ggplot2)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id")
ggplot(df, aes(x = as.factor(x), y = Freq, fill = id)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
You can also check the problem by not treating your x variable as a factor:
ggplot(df, aes(x = x, y = Freq, fill = id)) +
geom_bar(stat = "identity", width = 0.05, position = "dodge") +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
Either the bars must be very thin (small width), or you'll get overlapping x intervals breaking the plot.
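The overlap can be quantified: bars drawn at their true x positions only stay separate if the bar width is smaller than the smallest gap between neighbouring x values. A rough base R check on the values from the question (x_inter, x_intra and min_gap are illustrative names):

```r
x_inter <- c(0.293040975264367, 0.296736775990729, 0.297619926364764,
             0.587377012109561, 0.595245125315916, 0.597022018595893)
x_intra <- c(0, 0.293040975264367, 0.597022018595893,
             0.598809552335782, 0.898227748764939)

# Smallest distance between two neighbouring distinct x values
min_gap <- min(diff(sort(unique(c(x_inter, x_intra)))))
min_gap

# How many times wider than that gap a bar of width 0.05 is
0.05 / min_gap
```

The smallest gap is below 0.001, so a width of 0.05 makes neighbouring bars overlap, while a width small enough to avoid that would make the bars almost invisible on a 0 to 0.9 axis.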

R, ggplot stacked bar-chart with position = "fill" and labels

I'm working with ggplot2 on a barplot stacked to 100% with relative values, using the position = "fill" option in geom_bar().
Here is my code:
test <- data.frame (x = c('a','a','a','b','b','b','b')
,k = c('k','j','j','j','j','k','k')
,y = c(1,3,4,2,5,9,7))
plot <- ggplot(test, aes(x =x, y = y, fill = k))
plot <- plot + geom_bar(position = "fill",stat = "identity")
plot <- plot + scale_fill_manual(values = c("#99ccff", "#ff6666"))
plot <- plot + geom_hline(yintercept = 0.50)+ggtitle("test")
plot
Here is the result:
However, I need to add labels to the bars, including the "sub-bars". To do this, I used geom_text():
plot + geom_text(aes(label=y, size=4))
But the result is not good. I tried the hjust and vjust parameters without luck, and also something like:
plot + geom_text(aes(label=y/sum(y), size=4))
But I did not get the result I need (I'm not adding all the attempts, to avoid overloading the question with useless images; if needed, please ask!).
Any idea how to get nicely centered labels?
label specifies what to show, and y specifies where to show it. Since you are using proportions on the y-axis with position = "fill", you need to calculate the label positions (geom_text(aes(y = ...))) as proportions for each x, using cumulative sums. Additionally, to display only the total proportion of a given color, you need to keep just the last row of each x, k combination. Here, I am building a separate test_labels dataset for use in geom_text to display the custom labels:
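library(dplyr)
library(ggplot2)
test <- data.frame(x = c('a','a','a','b','b','b','b'),
                   k = c('k','j','j','j','j','k','k'),
                   y = c(1,3,4,2,5,9,7))
test_labels <- test %>%
  arrange(x, desc(k)) %>%
  group_by(x) %>%
  # cumulative top edge of each piece and its own share, within each bar
  mutate(ylabel_pos = cumsum(y)/sum(y),
         ylabel = y/sum(y)) %>%
  # .add = TRUE in dplyr >= 1.0.0 (older versions used add = TRUE)
  group_by(k, .add = TRUE) %>%
  mutate(ylabel = sum(ylabel)) %>% # total share per colour within each bar
  slice(n()) # keep the last row of each x, k group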
ggplot(test, aes(x =x, y = y, fill = k)) +
geom_bar(position = "fill", stat = "identity") +
scale_fill_manual(values = c("#99ccff", "#ff6666")) +
geom_hline(yintercept = 0.50) +
geom_text(data = test_labels,
aes(y = ylabel_pos, label=paste(round(ylabel*100,1),"%")),
vjust=1.6, color="white", size=3.5) +
ggtitle("test")
Result:
> test_labels
# A tibble: 4 x 5
# Groups: x, k [4]
x k y ylabel_pos ylabel
<fctr> <fctr> <dbl> <dbl> <dbl>
1 a j 4 1.0000000 0.8750000
2 a k 1 0.1250000 0.1250000
3 b j 5 1.0000000 0.3043478
4 b k 7 0.6956522 0.6956522
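The shares in the ylabel column can be cross-checked with a few lines of base R. This is only a sketch of the arithmetic, not a replacement for the dplyr pipeline; seg and share are illustrative names.

```r
test <- data.frame(x = c("a", "a", "a", "b", "b", "b", "b"),
                   k = c("k", "j", "j", "j", "j", "k", "k"),
                   y = c(1, 3, 4, 2, 5, 9, 7),
                   stringsAsFactors = FALSE)

# Total y per (x, k) segment, i.e. per coloured piece of each bar
seg <- aggregate(y ~ x + k, data = test, FUN = sum)

# Share of each segment within its bar: segment total divided by bar total
seg$share <- seg$y / ave(seg$y, seg$x, FUN = sum)
seg
```

For bar a this gives 7/8 = 0.875 for j and 1/8 = 0.125 for k, matching the ylabel values in the result table.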

Plot divergent stacked bar chart with ggplot2

Is there a way to use ggplot2 to create divergent stacked bar charts like the one on the right-hand side of the image below?
Data for reproducible example
library(ggplot2)
library(scales)
library(reshape)
dat <- read.table(text = " ONE TWO THREE
1 23 234 324
2 34 534 12
3 56 324 124
4 34 234 124
5 123 534 654",sep = "",header = TRUE)
# reshape data
datm <- melt(cbind(dat, ind = rownames(dat)), id.vars = c('ind'))
# plot
ggplot(datm,aes(x = variable, y = value,fill = ind)) +
geom_bar(position = "fill",stat = "identity") +
coord_flip()
Sure, positive values stack positively, negative values stack negatively. Don't use position fill. Just define what you want as negative values, and actually make them negative. Your example only has positive scores. E.g.
ggplot(datm, aes(x = variable, y = ifelse(ind %in% 1:2, -value, value), fill = ind)) +
geom_col() +
coord_flip()
If you want to also scale to 1, you need some preprocessing:
library(dplyr)
datm %>%
group_by(variable) %>%
mutate(value = value / sum(value)) %>%
ggplot(aes(x = variable, y = ifelse(ind %in% 1:2, -value, value), fill = ind)) +
geom_col() +
coord_flip()
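The preprocessing amounts to two steps: divide each value by its column total, then flip the sign of the categories that should point left. A base R sketch of that arithmetic on the example data, assuming, as in the answer, that ind 1 and 2 are the negative categories:

```r
dat <- data.frame(
  ONE   = c(23, 34, 56, 34, 123),
  TWO   = c(234, 534, 324, 234, 534),
  THREE = c(324, 12, 124, 124, 654)
)

# Share of each row (ind) within each column (variable)
share <- sweep(as.matrix(dat), 2, colSums(dat), "/")

# Flip the sign for ind 1 and 2 so they stack to the left of zero
signed <- share
signed[1:2, ] <- -signed[1:2, ]

# Total extent to the left and to the right of zero, per variable
left_extent  <- colSums(pmin(signed, 0))
right_extent <- colSums(pmax(signed, 0))
round(rbind(left_extent, right_extent), 3)
```

Each bar still has a total length of 1; only the zero point moves, depending on how large the negative categories are for that variable.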
An extreme approach might be to calculate the boxes yourself. Here's one method
dd <- datm %>% group_by(variable) %>%
arrange(desc(ind)) %>%
mutate(pct = value/sum(value), right = cumsum(pct), left=lag(right, default=0))
then you can plot with
ggplot(dd) +
geom_rect(aes(xmin=right, xmax=left, ymin=as.numeric(variable)-.4, ymax=as.numeric(variable)+.4, fill=ind)) +
scale_y_continuous(labels=levels(dd$variable), breaks=1:nlevels(dd$variable))
to get the left plot. To get the right one, you just shift the boxes a bit. This will line up all the right edges of the ind 3 boxes.
ggplot(dd %>% group_by(variable) %>% mutate(left=left-right[ind==3], right=right-right[ind==3])) +
geom_rect(aes(xmin=right, xmax=left, ymin=as.numeric(variable)-.4, ymax=as.numeric(variable)+.4, fill=ind)) +
scale_y_continuous(labels=levels(dd$variable), breaks=1:nlevels(dd$variable))
So maybe overkill here, but you have a lot of control this way.
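The box edges used by geom_rect() are just cumulative sums, which can be traced by hand for a single bar. A base R sketch for the variable ONE from the example data (shift is an illustrative name for the offset that puts the ind 3 edge at zero):

```r
# Values of variable ONE for ind 1..5, from the example data
value <- c(23, 34, 56, 34, 123)
ind <- 1:5

# Stack in descending ind order, as arrange(desc(ind)) does
o <- order(-ind)
value <- value[o]
ind <- ind[o]

pct <- value / sum(value)
right <- cumsum(pct)
left <- c(0, head(right, -1))  # same idea as lag(right, default = 0)

# Shift so that the right edge of the ind 3 box sits at zero
shift <- right[ind == 3]
cbind(ind, left = left - shift, right = right - shift)
```

After the shift, the boxes for ind 5, 4 and 3 lie at or below zero and the boxes for ind 2 and 1 lie above it, which is the divergent layout.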

ordering y axis with facet

I have this data frame
library(dplyr)
dat =data.frame(parent = c("J","J","F","F"),group= c("A(4)","C(3)","A(4)","D(5)"),value=c(1,2,3,4),
count = c(4,3,4,5))
dat %>% arrange(parent,-count )
parent group value count
1 F D(5) 4 5
2 F A(4) 3 4
3 J A(4) 1 4
4 J C(3) 2 3
You can see above that the data is ordered by parent and then by descending count. I would like the chart to keep this ordering, BUT when I plot it, in the "J" chart C(3) comes on top of A(4), and that should be reversed. How can that be done?
ggplot(dat, aes(x = group, y= value))+
geom_bar(stat ="identity",position = "dodge")+
coord_flip()+
facet_wrap(~parent, scale = "free_y")
You can set your factor on the group column using the currently ordered values as the levels (reversed to get the ordering you want). This will lock in the ordering.
library(dplyr)
library(ggplot2)
dat <- data.frame(parent = c("J","J","F","F"), group = c("A(4)","C(3)","A(4)","D(5)"), value = c(1,2,3,4),
                  count = c(4,3,4,5))
dat <- dat %>% arrange(parent, -count)
# levels in reverse order of first appearance, because coord_flip() puts the first level at the bottom
dat$group <- factor(dat$group, levels = rev(unique(dat$group)))
ggplot(dat, aes(x = group, y = value)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  facet_wrap(~parent, scales = "free_y")
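To see which level order this produces, the same steps can be traced in base R. A minimal sketch on the data from the question (lev is an illustrative name):

```r
dat <- data.frame(
  parent = c("J", "J", "F", "F"),
  group = c("A(4)", "C(3)", "A(4)", "D(5)"),
  value = c(1, 2, 3, 4),
  count = c(4, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Same ordering as arrange(parent, -count)
dat <- dat[order(dat$parent, -dat$count), ]

# Order of first appearance, reversed because coord_flip() draws the first level at the bottom
lev <- rev(unique(dat$group))
lev
```

The result is C(3), A(4), D(5). Note that A(4) occurs under both parents but has only one position in the level order, so this approach works only as long as a single global order is consistent with every facet.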
You can try:
dat %>% arrange(parent,-count ) %>%
mutate(group2=factor(c(2,1,4,3))) %>%
ggplot(aes(x = group2, y= value))+
geom_bar(stat ="identity",position = "dodge")+
coord_flip()+
facet_wrap(~parent, scale = "free_y")+
scale_x_discrete(breaks=1:4,labels = c("A(4)","D(5)","C(3)","A(4)"))
