Creating a scatter plot using ggplot2 in R

class_day <- c(1:10)
control_group <- c(67,72,69,81,73,66,71,72,77,71)
A_treatment_group <- c(NA,72,77,81,73,85,69,73,74,77)
B_treatment_group <- c(NA,66,68,69,67,72,73,75,79,77)
class.df<-data.frame(class_day, control_group, A_treatment_group, B_treatment_group)
I tried to convert the vectors to a table, but I am not sure how to include three categories in one plot.
How can I get a scatter plot with three different colors?
I would like the x-axis to be class_day from above and the y-axis to be the scores.

First, here is a cleaner way to make the data frame without the intermediate variables.
You can then make this type of chart by pivoting the data into "long" form:
class.df <- data.frame(class_day = c(1:10),
                       control_group = c(67,72,69,81,73,66,71,72,77,71),
                       A_treatment_group = c(NA,72,77,81,73,85,69,73,74,77),
                       B_treatment_group = c(NA,66,68,69,67,72,73,75,79,77))
library(tidyverse)
class.df %>%
  pivot_longer(!class_day) %>%
  ggplot(aes(x = class_day, y = value, color = name)) +
  geom_point()
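To see what the pivot produces before plotting, the sketch below reshapes the same data with explicit column names. The names group and score, and the use of values_drop_na = TRUE, are choices made here for illustration rather than part of the answer above. Dropping the NA rows up front only avoids the ggplot2 warning about removed rows for day 1.

```r
library(tidyr)

class.df <- data.frame(
  class_day = 1:10,
  control_group = c(67, 72, 69, 81, 73, 66, 71, 72, 77, 71),
  A_treatment_group = c(NA, 72, 77, 81, 73, 85, 69, 73, 74, 77),
  B_treatment_group = c(NA, 66, 68, 69, 67, 72, 73, 75, 79, 77)
)

# One row per (class_day, group) pair; rows whose score is NA are dropped.
# "group" and "score" are illustrative column names.
long.df <- pivot_longer(
  class.df,
  cols = -class_day,
  names_to = "group",
  values_to = "score",
  values_drop_na = TRUE
)

head(long.df)          # first few rows of the long table
table(long.df$group)   # number of observations per group
```

Each original column becomes a level of group, which is what lets a single color aesthetic separate the three series.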

Here is a version with ggscatter from ggpubr:
library(ggpubr)
library(tidyverse)
class.df %>%
  pivot_longer(-class_day,
               names_to = "group",
               values_to = "score") %>%
  ggscatter(x = "class_day", y = "score", color = "group",
            palette = c("#00AFBB", "#E7B800", "#FC4E07"))

Related

Proportional Bar Chart from Dataframe Created from Melt of Another Data Set

I need to create a proportional bar chart from a data set created with the melt function from reshape2. By proportional, I mean a chart in which the height of each bar differs according to its proportion of the total. I would like the proportion to be for the value X generated by the following sample code, with fill based on "group". I have tried many online solutions, and I keep getting not an error, but solid bars with no difference in proportions.
See sample code:
library(ggplot2)
library(tidyr)
set.seed(1)
example_matrix <-matrix(rpois(90,7), nrow=6,ncol=15)
example_df <- data.frame(example_matrix)
rownames(example_df) <-c('group1','group2','group3','group4','group5','group6')
df <- reshape2::melt(as.matrix(example_df))
library(data.table)
library(ggplot2)
set.seed(1)
example_matrix <-matrix(rpois(90,7), nrow=6,ncol=15)
example_df <- data.frame(example_matrix)
rownames(example_df) <-c('group1','group2','group3','group4','group5','group6')
df <- reshape2::melt(as.matrix(example_df))
df$fraction <- df$value/sum(df$value)
setDT(df)
p <-
ggplot(data = df, aes(x = Var2, y = fraction, fill = Var1)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("blue", "red", "black", "orange", "pink", "yellow"))
print(p)
I don't think this can be done with the plotting commands directly; you'll need to transform your data first. For example:
library(dplyr)
df <- df %>% group_by(Var2) %>% mutate(fraction = value/sum(value))
and then plot either with the ggplot solution from the other answer or here's a plotly version:
library(plotly)
plot_ly(data = df, x = ~Var2, y = ~fraction, color = ~Var1, type = 'bar')
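The difference between the two answers is the denominator: dividing by sum(df$value) gives each cell's share of the grand total, while grouping by Var2 first gives its share within that column. The sketch below checks the grouped version. It assumes that as.data.frame(as.table(...)) is an acceptable base R stand-in for reshape2::melt here (it yields the same Var1/Var2/value layout), and the object name check is hypothetical.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
example_matrix <- matrix(rpois(90, 7), nrow = 6, ncol = 15)
rownames(example_matrix) <- paste0("group", 1:6)
colnames(example_matrix) <- paste0("X", 1:15)

# Base R stand-in for reshape2::melt (an assumption of this sketch):
# produces columns Var1, Var2 and value
df <- as.data.frame(as.table(example_matrix), responseName = "value")

# Share of each group within its own column (Var2)
df <- df %>%
  group_by(Var2) %>%
  mutate(fraction = value / sum(value)) %>%
  ungroup()

# Within every Var2 the fractions should add up to 1
check <- tapply(df$fraction, df$Var2, sum)

p <- ggplot(df, aes(x = Var2, y = fraction, fill = Var1)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

If stacked rather than dodged bars are acceptable, the same fractions stack to a full-height bar for each Var2.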

ggplot2 version of Shepard plot, i.e. vegan::stressplot()?

There seems to be quite a bit of information on plotting NMDS outputs (i.e. NMDS1 vs NMDS2) using ggplot2; however, I cannot find a way to plot the vegan::stressplot() (Shepard plot) using ggplot2.
Is there a way to produce a ggplot2 version of a metaMDS output?
Reproducible code
library(vegan)
set.seed(2)
community_matrix = matrix(
  sample(1:100, 300, replace = TRUE), nrow = 10,
  dimnames = list(paste("community", 1:10, sep = ""), paste("sp", 1:30, sep = "")))
example_NMDS=metaMDS(community_matrix, k=2)
stressplot(example_NMDS)
Created on 2021-09-17 by the reprex package (v2.0.1)
Here's a workaround that draws a very similar plot using ggplot2. The trick is to inspect the structure of the object returned by stressplot(example_NMDS) and extract the data stored in it. I used the tidyverse, which includes ggplot2 as well as tidyr, the package that provides the pivot_longer function.
library(vegan)
library(tidyverse)
# Draw the base stressplot once and keep the data it returns invisibly
sp <- stressplot(example_NMDS)
# Analyze the structure of the returned object
# Notice the list has x, y and yf elements
str(sp)
# Create a tibble that contains the data from stressplot
df <- tibble(x = sp$x,
             y = sp$y,
             yf = sp$yf) %>%
  # Change data to long format
  pivot_longer(cols = c(y, yf),
               names_to = "var")
# Create plot
df %>%
  ggplot(aes(x = x,
             y = value)) +
  # Add points just for y values
  geom_point(data = df %>%
               filter(var == "y")) +
  # Add line just for yf values
  geom_step(data = df %>%
              filter(var == "yf"),
            col = "red",
            direction = "vh") +
  # Change axis labels
  labs(x = "Observed Dissimilarity", y = "Ordination Distance") +
  # Add bw theme
  theme_bw()

plotting line graph with two lines in R

I'm trying to create a simple double line graph with a dataset I made. Here's the data:
date <- c("2021-04-06","2021-04-10", "2021-04-14", "2021-04-18")
as.Date(date)
graded <- c(3408, 3572, 3647, 3864)
psa10 <- c(2099, 2130, 2147, 2193)
graded_marvel <- data.frame(date, graded, psa10)
graded_marvel
And here's what I did to try and graph this
library("ggplot2")
graph <- ggplot(graded_marvel, aes(date)) +
geom_line(aes(y = graded), color = "darkred") +
geom_line(aes(y = psa10), color = "blue")
print(graph)
All I get is an empty graph that has the correct values on the axes, but the graph just comes up empty. Not sure what to do. Any help is appreciated!
This happens because your date variable is not a date, so ggplot2 interprets it as a character and assigns a discrete x scale. This auto-groups your data based on the x-axis value, so every 'group' has only one observation, and you cannot draw a line through a single point. The way to fix this is to convert your date to a proper Date class.
library(ggplot2)
date <- c("2021-04-06","2021-04-10", "2021-04-14", "2021-04-18")
graded <- c(3408, 3572, 3647, 3864)
psa10 <- c(2099, 2130, 2147, 2193)
graded_marvel <- data.frame(date, graded, psa10)
ggplot(graded_marvel, aes(as.Date(date))) +
geom_line(aes(y = graded), color = "darkred") +
geom_line(aes(y = psa10), color = "blue")
Created on 2021-04-19 by the reprex package (v1.0.0)
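A likely source of the confusion in the question is that as.Date(date) was called without storing its result, so the column that reached ggplot2 was still a character vector. The sketch below makes that visible with class() and then converts the column once in the data frame instead of inside aes(). Doing the conversion in the data frame is a choice made for this sketch, not something the answer above requires.

```r
library(ggplot2)

date <- c("2021-04-06", "2021-04-10", "2021-04-14", "2021-04-18")
graded <- c(3408, 3572, 3647, 3864)
psa10 <- c(2099, 2130, 2147, 2193)

as.Date(date)   # returns a Date vector, but the result is not stored
class(date)     # the original vector is still a character vector

# Store the converted dates in the data frame itself
graded_marvel <- data.frame(date = as.Date(date), graded, psa10)
class(graded_marvel$date)

# With a Date column the x scale is continuous, so the lines are drawn
p <- ggplot(graded_marvel, aes(x = date)) +
  geom_line(aes(y = graded), color = "darkred") +
  geom_line(aes(y = psa10), color = "blue")
print(p)
```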
First get long format with pivot_longer.
library(dplyr)
library(tidyr)
df <- graded_marvel %>%
  pivot_longer(
    cols = -date,
    names_to = "names",
    values_to = "values"
  )
Then plot with ggplot2.
library(ggplot2)
ggplot(df, aes(x = factor(date), y = values, group = names)) +
  geom_point(aes(color = names)) +
  geom_line(aes(linetype = names, color = names)) +
  scale_colour_manual(values = c("darkred", "blue"))

plotly and ggplot legend order interaction

I have multiple graphs that I am plotting with ggplot and then sending to plotly. I set the legend order based on the most recent date, so that one can easily interpret the graphs. Everything works great in generating the ggplot, but once I send it through ggplotly() the legend order reverts to the original factor levels. I tried resetting the factors, but this creates a new problem: the colors are different in each graph.
Here's the code:
Data:
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend Ordering Vector - This uses 2020 as the year to determine order.
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Then I create my plot and use Legend_Order as breaks:
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2) +
scale_color_discrete(name = 'Country', breaks = Legend_Order)
Graph
But then when I pass this on to:
ggplotly(Graph)
For some reason plotly ignores the breaks argument and uses the original factor levels.
If I set the factor levels beforehand, the color scheme changes (since the factors are in a different order).
How can I keep the color scheme from graph to graph, but change the legend order when using plotly?
Simply recode your Country variable as a factor with the levels set according to Legend_Order. Try this:
library(plotly)
library(dplyr)
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Data$Country <- factor(Data$Country, levels = Legend_Order)
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2)
ggplotly(Graph)
To "lock in" the color assignment you can make use of a named color vector like so (for brevity I only show the ggplots):
# Fix the color assignments using a named color vector which can be assigned via scale_color_manual
cols <- scales::hue_pal()(5) # Default ggplot2 colors
cols <- setNames(cols, Legend_Order) # Set names according to legend order
# Plot with unordered Countries but "ordered" color assignment
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)
# Plot with ordered factor
Data$Country <- factor(Data$Country, levels = Legend_Order)
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)
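The reason the named vector keeps colors stable is that scale_color_manual matches values to factor levels by name rather than by position. The sketch below checks that lookup outside of any plot. Using dplyr::pull in place of select/unlist/as.vector is a simplification made here, and the names reordered and same_colors are hypothetical.

```r
library(dplyr)

Country <- c("CHN", "IND", "INS", "PAK", "USA")
Data <- bind_rows(
  data.frame(Country = Country, Pop = c(1400, 1300, 267, 233, 330), Year = 2020),
  data.frame(Country = Country, Pop = c(1270, 1000, 215, 152, 280), Year = 2000),
  data.frame(Country = Country, Pop = c(1100, 815, 175, 107, 250), Year = 1990)
)

# Countries ordered by population in the most recent year
Legend_Order <- Data %>%
  filter(Year == max(Year)) %>%
  arrange(desc(Pop)) %>%
  pull(Country) %>%
  as.character()

# Default ggplot2 hues, named by country
cols <- setNames(scales::hue_pal()(length(Legend_Order)), Legend_Order)

# Changing the factor level order does not change which color a country gets,
# because the lookup is by name
reordered <- factor(Data$Country, levels = rev(Legend_Order))
same_colors <- identical(
  unname(cols[as.character(Data$Country)]),
  unname(cols[as.character(reordered)])
)
same_colors
```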

Can we plot percentage with Plot_ly

Is there a way to plot percentages using plot_ly? For example, the code below plots the count of each cut in the diamonds dataset:
plot_ly(diamonds, x = ~cut)
But I am trying to plot the percentage for each cut. For example, I need the percentage of "Good" relative to the total count. Is there a way to get it?
It could be done like this.
First, create percentage for each cut category
diamonds %>% group_by(cut) %>% summarize(perc = n()/53940*100)
Second, pipe the resultant data set to plot_ly()
diamonds %>% group_by(cut) %>% summarize(perc = n()/53940*100) %>% plot_ly(x = ~cut, y = ~perc)
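The constant 53940 is the number of rows in diamonds, so the same percentages can be computed without hard-coding it. The sketch below is one way to do that with dplyr::count. The name cut_pct is hypothetical, type = "bar" is stated explicitly as a choice of this sketch, and the plotly call is guarded so the percentage step still runs where plotly is not installed.

```r
library(dplyr)
library(ggplot2)   # provides the diamonds dataset

# Count rows per cut, then divide by the total of those counts
cut_pct <- diamonds %>%
  count(cut) %>%
  mutate(perc = n / sum(n) * 100)

cut_pct

# Plot the percentages as bars if plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(cut_pct, x = ~cut, y = ~perc, type = "bar")
  print(fig)
}
```

Because the denominator is derived from the data, the same pipeline works unchanged on a filtered subset of diamonds.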
You can use data.table and ggplot2:
library(data.table)
library(ggplot2)
dt <- data.table(diamonds)
Calculate the number of records by each cut, and then calculate the prop.table of those counts:
result <- dt[, .N, by = cut][, .(cut, N, percentCut = prop.table(N))]
Now you can plot it with ggplot and use the scales package to get a percent-formatted y-axis:
p <- ggplot(result, aes(x = cut, y = percentCut))+
geom_col()+
scale_y_continuous(labels = scales::percent)
Now you can pass p to plotly, if you want:
plotly::ggplotly(p)
