Making a stacked bar plot based on ranges in R and plotly

I want to create a stacked bar chart in R and plotly using the iris dataset. On the x-axis I want ranges like iris_limits in the code below, and the y-axis should show all the Sepal.Length values that fall into each range. I want to pass the values as a single vector. If the limits can also be derived dynamically from the range of Sepal.Length instead of being hard-coded, please show how. I have written a basic script with made-up values to give you an idea. Thanks.
library(plotly)
iris_limits <- c("1-4", "4-6", "6-8")
sepal <- c(2.4,5.4,7.1)
data <- data.frame(iris_limits, sepal)
p <- plot_ly(data, x = ~iris_limits, y = ~sepal, type = 'bar', name =
'Sepal') %>%
layout(yaxis = list(title = 'Count'), barmode = 'group')
p

I tried my best to understand. First, divide the sepal lengths into the desired categories iris_limits: "1-3", "3-6", "6-9":
iris$iris_limits <- cut(iris$Sepal.Length, c(1,3,6,9))
Note: no sepal length falls between 1 and 3, so you only get 2 groups.
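For the dynamic limits asked about in the question, one possible sketch is to derive the breaks from the observed range rather than typing them in. The bin width of 2 and the names bin_width, breaks, labels, bins and counts are assumptions for illustration; they are not part of the original answer.

```r
# Sketch: derive the bin limits from the data instead of hard-coding them.
# bin_width = 2 is an assumed value; pick whatever suits your data.
x <- iris$Sepal.Length
bin_width <- 2

# Round the observed range outwards to whole multiples of bin_width.
lower <- floor(min(x) / bin_width) * bin_width
upper <- ceiling(max(x) / bin_width) * bin_width
breaks <- seq(lower, upper, by = bin_width)

# Build "4-6"-style labels to match the iris_limits format in the question.
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# include.lowest = TRUE keeps a value equal to the lowest break in the first bin.
bins <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)

# Number of observations per range, usable as bar heights.
counts <- table(bins)
print(counts)
```

If counts rather than raw lengths are what the y-axis should show, names(counts) and as.vector(counts) can be passed as x and y to plot_ly.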
Then you want each sepal length range as a separate bar on the x-axis, with the individual sepal lengths that fall into that range stacked on top of each other? You linked to a stacked bar chart with varying colors for the stacked bars; is this what you want?
Create an ID for each sepal length:
iris$ID <- factor(1:nrow(iris))
Plot, set color=~ID if you want different colors for the stacked bars:
library(plotly)
p <- plot_ly(iris, x = ~iris_limits, y = ~Sepal.Length, type = 'bar', color=~ID) %>%
layout(yaxis = list(title = 'Count'), barmode = 'stack')
EDITED: For a version that is not stacked but grouped by iris_limits, I switched to ggplot2 to use facet_wrap to separate the panels by iris_limits, then converted the result with ggplotly.
library(ggplot2)
gg <- ggplot(iris, aes(x=ID, y=Sepal.Length, fill=iris_limits)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~iris_limits, scales="free_x", labeller=label_both) +
theme_minimal() + xlab("") + ylab("Sepal Length") +
theme(axis.text.x=element_blank())
ggplotly(gg)
EDITED: Re: Changing legend title and tooltip display
To change the legend title, use labs. Here it was also necessary to reduce the legend.title font size under theme so that the title fits within the ggplotly margins.
To change the tooltip text, add a text aesthetic to aes that builds the desired character string, then list the aesthetics to be displayed in the tooltip argument of ggplotly.
gg <- ggplot(iris, aes(x=ID, y=Sepal.Length, fill=iris_limits,
text=paste("Sepal Length:", Sepal.Length, "cm"))) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~iris_limits, scales="free_x") +
theme_minimal() + xlab("") + ylab("Sepal Length (cm)") +
theme(axis.text.x=element_blank(), legend.title=element_text(size=10)) +
labs(fill="Sepal \nLength (cm)")
ggplotly(gg, tooltip=c("x", "text"))

Try using cut:
library(plotly)
iris$iris_limits <- as.numeric(cut(iris$Sepal.Length,3))
p <- plot_ly(iris, x = ~iris_limits, y = ~Sepal.Length, type = 'bar', name =
'Sepal') %>%
layout(yaxis = list(title = 'Count'), barmode = 'group')
p
The grouping details:
> iris$Sepal.Length[iris$iris_limits==1]
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.4 5.1 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8
[29] 5.4 5.2 5.5 4.9 5.0 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 5.5 4.9 5.2 5.0 5.5 5.5 5.4 5.5 5.5
[57] 5.0 5.1 4.9
> iris$Sepal.Length[iris$iris_limits==2]
[1] 5.8 5.7 5.7 6.4 6.5 5.7 6.3 6.6 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.7 6.0 5.7 5.8 6.0
[29] 6.0 6.7 6.3 5.6 6.1 5.8 5.6 5.7 5.7 6.2 5.7 6.3 5.8 6.3 6.5 6.7 6.5 6.4 5.7 5.8 6.4 6.5 6.0 5.6 6.3 6.7 6.2 6.1
[57] 6.4 6.4 6.3 6.1 6.3 6.4 6.0 6.7 5.8 6.7 6.7 6.3 6.5 6.2 5.9
> iris$Sepal.Length[iris$iris_limits==3]
[1] 7.0 6.9 6.8 7.1 7.6 7.3 7.2 6.8 7.7 7.7 6.9 7.7 7.2 7.2 7.4 7.9 7.7 6.9 6.9 6.8
>
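Note that as.numeric() drops the interval labels that cut() generates. As a small sketch (the names bins and counts are assumptions), keeping the factor shows which ranges the three groups correspond to.

```r
# Sketch: keep the factor returned by cut() so the interval labels survive.
bins <- cut(iris$Sepal.Length, 3)

# The three automatically chosen, equal-width intervals.
print(levels(bins))

# Group sizes; these match the lengths of the three vectors printed above.
counts <- table(bins)
print(counts)
```

Storing bins as a column of iris and using it as the x variable in plot_ly should then label the bars with the ranges instead of 1, 2 and 3.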

Related

Problems with scatterplot error bars in ggplot2

I have a question about how to make a scatterplot with error bars. I'm working with stable isotopes, so I have D13C and D15N data for faunal samples. I want to obtain a plot like the attached target (target.png), but without the convex hulls.
Instead, I get a plot like this one (CNPlot.png).
I'm using this script:
a<-read.table("Means.txt", header = TRUE)
theme_set(theme_classic(base_size = 16))
ggplot(a, aes(x=D13C, y=D15N)) +
geom_errorbar(aes(ymax=D13C+D13C.ds, ymin=D13C-D13C.ds), width=0.15,alpha=.8)+
geom_errorbarh(aes(xmax=D15N+D15N.ds, xmin=D15N-D15N.ds), height=0.15,alpha=.8)+
geom_point(aes(shape=Species),fill="white",size=4) +
geom_point(aes(color=Species,fill=Species,shape=Species),size=4, alpha = .5) +
scale_color_manual(values=c("black","dodgerblue1","coral4","darkorchid"))+
scale_fill_manual(values=c("black","dodgerblue1","coral4","darkorchid"))+
scale_shape_manual(values=c(21,23,22,24))+
labs(title=NULL,
subtitle=NULL,
caption=NULL,
x=expression(paste(delta^{13}, "Ccol(‰)")),
y=expression(paste(delta^{15}, "N(‰)")))
I have two datasets. I'm using the one named Means, but I also have another one named CN_fauna that contains the raw data.
Means:
Species D13C D13C.ds D15N D15N.ds
Bird -16.4 7.1 7.6 1.5
SH -18.5 1.7 5.5 2.7
CH -14.8 2.9 8.8 0.6
Deer -19.2 0.7 4.8 1.04
CN_fauna:
taxa D13C D15N
Bird -24.1 7.9
Bird -9.9 9
Bird -15.2 5.9
SH -17.0 9.6
SH -16.6 7.3
SH -20.3 4.6
SH -20.3 2.6
SH -20.3 2.7
SH -18.6 6.6
CH -16.9 9.4
CH -11.5 8.2
CH -16.1 8.8
Deer -18.6 3.0
Deer -19.1 6.0
Deer -18.3 5.4
Deer -17.9 5.4
Deer -19.2 5.6
Deer -20.4 5.6
Deer -19.5 6.1
Deer -20.3 5.9
Deer -18.7 5.4
Deer -19.7 3.8
Deer -19.2 3.4
Deer -19.9 4.1
Deer -18.4 4.3
Deer -20.1 4.1
I do not understand why the scales of the error bars are different in my plot. Any help is more than welcome.
Not to take away from the sage advice that @stefan provided (reproducible questions get better responses faster)...
I could be wrong, but I think your errorbar data is on the wrong axis. Is that what you were trying to create?
If so, you need to change your xmin to ymin and so on for the two errorbar layers. It would look something like this:
ggplot(Means, aes(x = D13C, y = D15N)) +
geom_errorbar(aes(xmax = D13C + D13C.ds,
xmin = D13C - D13C.ds), width=0.15,alpha=.8)+
geom_errorbar(aes(ymax = D15N + D15N.ds,
ymin = D15N - D15N.ds), width=0.15,alpha=.8)+ # geom_errorbar() takes width, not height
geom_point(aes(shape=Species),fill="white",size=4) +
geom_point(aes(color=Species,fill=Species,shape=Species),size=4, alpha = .5) +
scale_color_manual(values=c("black","dodgerblue1","coral4","darkorchid"))+
scale_fill_manual(values=c("black","dodgerblue1","coral4","darkorchid"))+
scale_shape_manual(values=c(21,23,22,24))+
labs(title=NULL,
subtitle=NULL,
caption=NULL,
x=expression(paste(delta^{13}, "Ccol(‰)")),
y=expression(paste(delta^{15}, "N(‰)")))
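As a side note, the Means table appears to hold per-taxon means and standard deviations of the raw CN_fauna values. A minimal sketch of how such a summary could be rebuilt in base R is below. It uses only the Bird and CH rows from the question, and the names cn, means, sds and summary_df are assumptions.

```r
# Raw values for two taxa, copied from the CN_fauna table in the question.
cn <- data.frame(
  taxa = c("Bird", "Bird", "Bird", "CH", "CH", "CH"),
  D13C = c(-24.1, -9.9, -15.2, -16.9, -11.5, -16.1),
  D15N = c(7.9, 9.0, 5.9, 9.4, 8.2, 8.8)
)

# Per-taxon mean and standard deviation of both isotope columns.
means <- aggregate(cbind(D13C, D15N) ~ taxa, data = cn, FUN = mean)
sds <- aggregate(cbind(D13C, D15N) ~ taxa, data = cn, FUN = sd)

# Use the same ".ds" suffix as the Means table for the spread columns.
names(sds)[-1] <- c("D13C.ds", "D15N.ds")

# One row per taxon: taxa, D13C, D15N, D13C.ds, D15N.ds.
summary_df <- merge(means, sds, by = "taxa")
print(summary_df)
```

Each error bar then spans mean ± sd of the variable plotted on its own axis, which is the pairing the corrected plot relies on.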

Draw only 2 ellipses in PCA plot (instead of 20)

I have a PCA plot created with ggplot/ggfortify
and the function autoplot(), such as in this question: Change point colors and color of frame/ellipse around points
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
df <- iris[c(1, 2, 3, 4)]
autoplot(prcomp(df))
autoplot(prcomp(df), data = iris, colour = 'Species')
autoplot(prcomp(df), data = iris, colour = 'Species', shape='Species', frame=T)
Is there a way to draw only 1 or 2 frames/ellipses, instead of all of them, in the PCA plot?
The problem with using autoplot is that, although it is great for producing nice visualizations of common data structures and models with little effort, it doesn't give you the full freedom to customize the plot. However, it is pretty straightforward to do the whole thing within ggplot. The following is a full reprex:
library(ggplot2)
pc <- prcomp(iris[1:4])
df <- cbind(pc$x[,1:2], iris)
ggplot(df, aes(PC1, PC2, color = Species)) +
geom_point() +
stat_ellipse(geom = "polygon", aes(fill = after_scale(alpha(colour, 0.3))),
data = df[df$Species != "versicolor",])
Created on 2022-02-21 by the reprex package (v2.0.1)
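The same idea can be written so that the groups receiving an ellipse are listed by name, which may be easier to adjust when only one or two frames are wanted. This is a sketch along the lines of the answer above; the names keep, scores and ellipse_data are assumptions.

```r
library(ggplot2)

# Species that should get an ellipse; edit this vector to draw only one.
keep <- c("setosa", "virginica")

pc <- prcomp(iris[1:4])
scores <- cbind(pc$x[, 1:2], iris)

# Only rows from the selected species are passed to stat_ellipse().
ellipse_data <- scores[scores$Species %in% keep, ]

p <- ggplot(scores, aes(PC1, PC2, colour = Species)) +
  geom_point() +
  stat_ellipse(data = ellipse_data)
print(p)
```

All points are still drawn because geom_point() uses the full scores data; only the ellipse layer sees the subset.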

Plotting sales over time in R

I am trying to show the top 100 sales on a scatterplot by year. I used the below code to take top 100 games according to sales and then set it as a data frame.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
as.data.frame(top100)
I then tried to plot this with the below code:
ggplot(top100)+
aes(x=Year, y = Global_Sales) +
geom_point()
I get the error below when using the subset top100:
Error: data must be a data frame, or other object coercible by fortify(), not a numeric vector
If I use the actual games dataset, I get the attached plot.
Any ideas?
As pointed out in the comments by @CMichael, there are several issues in your code.
In the absence of a reproducible example, I used the iris dataset to explain what is wrong with your code.
top100 <- head(sort(games$NA_Sales,decreasing=TRUE), n = 100)
By doing that you are only extracting a single column.
The same command with the iris dataset:
> head(sort(iris$Sepal.Length, decreasing = TRUE), n = 20)
[1] 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 7.2 7.1 7.0 6.9 6.9 6.9 6.9 6.8 6.8 6.8
So, first, you no longer have two dimensions to plot in ggplot2. Second, the column names are not kept during the extraction, so you cannot then ask ggplot2 to plot Year and Global_Sales.
So, to solve your issue, you can do (here the example with the iris dataset):
top100 = as.data.frame(head(iris[order(iris$Sepal.Length, decreasing = TRUE), 1:2], n = 100))
And you get a data.frame of this type:
> str(top100)
'data.frame': 100 obs. of 2 variables:
$ Sepal.Length: num 7.9 7.7 7.7 7.7 7.7 7.6 7.4 7.3 7.2 7.2 ...
$ Sepal.Width : num 3.8 3.8 2.6 2.8 3 3 2.8 2.9 3.6 3.2 ...
> head(top100)
Sepal.Length Sepal.Width
132 7.9 3.8
118 7.7 3.8
119 7.7 2.6
123 7.7 2.8
136 7.7 3.0
106 7.6 3.0
And then if you are plotting:
library(ggplot2)
ggplot(top100, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
Warning: based on what you provided in your example, I suggest you do:
top100 <- as.data.frame(head(games[order(games$NA_Sales,decreasing=TRUE),c("Year","Global_Sales")], 100))
However, if this does not solve your problem, you should consider providing a reproducible example of your dataset (see: How to make a great R reproducible example).
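To make the difference between sort() and order() concrete, here is a small sketch on a toy stand-in for the games data. The data frame below is randomly generated and purely illustrative; only the column names are taken from the question.

```r
# Toy stand-in for the games data; the values are random, not real sales.
set.seed(1)
games <- data.frame(
  Year = sample(1990:2015, 300, replace = TRUE),
  NA_Sales = round(runif(300, 0, 40), 2),
  Global_Sales = round(runif(300, 0, 80), 2)
)

# sort() on one column returns a bare numeric vector: the other columns are gone.
top_vec <- head(sort(games$NA_Sales, decreasing = TRUE), n = 100)
is.data.frame(top_vec)   # FALSE

# order() returns row positions, so indexing with it keeps whole rows.
top100 <- head(games[order(games$NA_Sales, decreasing = TRUE), ], n = 100)
is.data.frame(top100)    # TRUE
```

Because top100 still has the Year and Global_Sales columns, it can be handed to ggplot() directly.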

How to add labels at the top of vlines in ggplot2 and add these in separate legends

I have created a dummy dataframe representative of my data-
SQ AgeGroup Prop LCI UCI
2010-1 0 to 18 4.3 4.2 4.4
2010-1 19 to 25 5.6 5.3 5.6
2010-1 26 and over 7.8 7.6 7.9
2010-2 0 to 18 4.1 3.9 4.2
2010-2 19 to 25 5.8 5.6 5.9
2010-2 26 and over 8.1 7.9 8.3
2010-3 0 to 18 4.2 4 4.4
2010-3 19 to 25 5.5 5.2 5.6
2010-3 26 and over 7.6 7.4 7.7
2010-4 0 to 18 3.9 3.6 4.1
2010-4 19 to 25 5.2 5 5.4
2010-4 26 and over 7.4 7.2 7.6
2011-1 0 to 18 4.3 4.1 4.5
2011-1 19 to 25 5.7 5.5 5.8
2011-1 26 and over 8.2 8 8.3
2011-2 0 to 18 4.1 4 4.5
2011-2 19 to 25 5.7 5.5 5.9
2011-2 26 and over 8.2 8 8.4
2011-3 0 to 18 4.4 4.2 4.6
2011-3 19 to 25 5.7 5.5 7.9
2011-3 26 and over 8.2 8 8.4
which creates an image that looks like this-
I have used the following code-
library(readxl)
library(dplyr)
library(epitools)
library(gtools)
library(reshape2)
library(binom)
library(pivottabler)
library(readxl)
library(phecharts)
library(ggplot2)
library(RODBC)
rm(list=ls())
df<-read_xlsx("Dummydata.xlsx")
pd<-position_dodge(width=0.3)
limits <- aes(ymax =df$UCI , ymin = df$LCI)
p<-ggplot(df, aes(x = SQ, y =Prop, group=AgeGroup, colour= AgeGroup)) +
geom_line(position=pd)+
geom_point(size=2.0, position=pd)+
geom_errorbar(limits, width = 0.55, size=0.4, position= pd)+
labs(
y = "Percentage",
x = "Study Quarter")
p<-p +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_y_continuous(name="Percentage",breaks=c(0,2,4,6,8,10),limits=c(0,10))+#limits need to change with every pot
scale_fill_manual(values = pal)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1,size=16))+
theme(axis.text.y=element_text(size=16))+
theme(legend.text = element_text(size=18))+
theme(legend.title=element_text(size=16))+
theme(legend.title=element_blank())+
theme(legend.position="bottom")+
theme(axis.title = element_text(size=22))
p + geom_vline(xintercept = c(2,4,6), linetype="dotted",
color = "black", size=1.0, show.legend = TRUE)
However, what I want is for each of the three vertical lines to have a label (L1, L2 and L3) at its top, plus a separate legend at the bottom where I can describe what these lines stand for. Something like this:
L1: Launch of x
L2: Launch of y
L3: Launch of z
Can someone please help with this?
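One possible approach, offered as a sketch rather than a definitive answer, is to keep the line positions, short labels and descriptions in a small data frame. Mapping linetype inside aes() in geom_vline then produces its own legend, and geom_text with y = Inf places the labels at the top of the panel. The events table, its column names and the reduced df below are assumptions for illustration.

```r
library(ggplot2)

# A few rows in the shape of the dummy data above (one age group only).
df <- data.frame(
  SQ = c("2010-1", "2010-2", "2010-3", "2010-4", "2011-1", "2011-2", "2011-3"),
  Prop = c(4.3, 4.1, 4.2, 3.9, 4.3, 4.1, 4.4)
)

# Hypothetical event table: position on the x-axis, short label, legend text.
events <- data.frame(
  pos = c(2, 4, 6),
  id = c("L1", "L2", "L3"),
  desc = c("L1: Launch of x", "L2: Launch of y", "L3: Launch of z")
)

p <- ggplot(df, aes(x = SQ, y = Prop, group = 1)) +
  geom_line() +
  geom_point() +
  # Mapping linetype inside aes() is what creates the separate legend.
  geom_vline(data = events, aes(xintercept = pos, linetype = desc)) +
  # y = Inf pins the label to the top of the panel; vjust nudges it inside.
  geom_text(data = events, aes(x = pos, y = Inf, label = id),
            vjust = 1.5, hjust = -0.2, inherit.aes = FALSE) +
  # Same dotted style for every line; the legend entries differ only by text.
  scale_linetype_manual(name = "Events", values = rep("dotted", nrow(events))) +
  theme(legend.position = "bottom")
print(p)
```

In the full plot the colour legend for AgeGroup would remain, with the linetype legend for the events shown next to it.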

create a dummy variable (using mutate) based on a pattern in a character string

I'm trying to figure out how to create a dummy variable based on a pattern in a character string. The point is to end up with a simple way to make certain aspects of my ggplot (color, linetype, etc.) the same for samples that have something in common (such as different types of mutations of the same gene -- each sample name contains the name of the gene, plus some other characters).
As an example with the iris dataset, let's say I want to add a column (my dummy variable) that will have one value for species whose names contain the letter "v", and another value for species that don't. (In the real dataset, I have many more possible categories.)
I've been trying to use mutate and recode, str_detect, or if_else, but can't seem to get the syntax right. For instance,
mutate(iris,
anyV = ifelse(str_detect('Species', "v"), "withV", "noV"))
doesn't throw any errors, but it doesn't detect that any of the species names contain a v, either. Which I think has to do with my inability to figure out how to get str_detect to work:
iris %>%
select(Species) %>%
str_detect("setosa")
just returns [1] FALSE.
iris %>%
filter(str_detect('Species', "setosa"))
doesn't work, either.
(I've also tried things like a mutate/recode solution, based on an example in 7 Most Practically Useful Operations When Wrangling Text Data in R , but can't get that to work, either.)
What am I doing wrong? And how do I fix it?
This works:
library(dplyr)
library(stringr)
iris%>% mutate(
anyV = ifelse(str_detect(Species, "v"), "withV", "noV"))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species anyV
1 5.1 3.5 1.4 0.2 setosa noV
2 4.9 3.0 1.4 0.2 setosa noV
3 4.7 3.2 1.3 0.2 setosa noV
4 4.6 3.1 1.5 0.2 setosa noV
5 5.0 3.6 1.4 0.2 setosa noV
...
52 6.4 3.2 4.5 1.5 versicolor withV
53 6.9 3.1 4.9 1.5 versicolor withV
54 5.5 2.3 4.0 1.3 versicolor withV
55 6.5 2.8 4.6 1.5 versicolor withV
56 5.7 2.8 4.5 1.3 versicolor withV
57 6.3 3.3 4.7 1.6 versicolor withV
58 4.9 2.4 3.3 1.0 versicolor withV
59 6.6 2.9 4.6 1.3 versicolor withV
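The reason the quoted version in the question finds nothing is that 'Species' in quotes is just the literal string "Species", not the column. The sketch below shows the difference using base R's grepl, which plays the same role as str_detect here (note that its argument order is pattern first, then the strings). The names has_v and anyV are assumptions.

```r
# A quoted name is just the literal string "Species", which has no "v" in it.
grepl("v", "Species")                  # FALSE

# The column itself is a vector of 150 species names, tested one by one.
has_v <- grepl("v", as.character(iris$Species))

# Same recoding as in the answer, done outside of mutate().
anyV <- ifelse(has_v, "withV", "noV")
print(table(iris$Species, anyV))
```

Inside mutate() or filter(), writing Species without quotes refers to the column, which is why the answer above works.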
An alternative to nested ifelse statements:
iris %>% mutate(newVar = case_when(
str_detect(Species, "se") ~ "group1",
str_detect(Species, "ve") ~ "group2",
str_detect(Species, "vi") ~ "group3",
TRUE ~ as.character(Species)))
