Sort data in graph in R - r

I have the following code :
library(ggplot2)
ggplot(data = diamonds, aes(x = cut)) +
geom_bar()
This produces a bar chart of the counts per cut.
I would like to sort the bars by count, in descending order.

There are several ways to do this, and it can probably be done with options inside ggplot alone. One approach is to summarize the data with dplyr first and then draw the bar chart with ggplot:
# load the ggplot library
library(ggplot2)
# load the dplyr library
library(dplyr)
# load the diamonds dataset
data(diamonds)
# using dplyr:
# take the diamonds dataset
newData <- diamonds %>%
# group it by cut column
group_by(cut) %>%
# count number of observations of each type
summarise(count = n())
# change levels of the cut variable
# you tell R to order the cut variable according to number of observations (i.e. count variable)
newData$cut <- factor(newData$cut, levels = newData$cut[order(newData$count, decreasing = TRUE)])
# plot the ggplot
ggplot(data = newData, aes(x = cut, y = count)) +
geom_bar(stat = "identity")
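As a variation on the same idea, the level order can be derived from the raw data without building a summary table first. This is only a minimal sketch using base R's table() and sort(); the names plot_data and cut_by_count are made up for illustration and are not part of the original answer.

```r
library(ggplot2)

# Count the observations per cut and sort the counts in descending order.
cut_counts <- sort(table(diamonds$cut), decreasing = TRUE)

# Work on a copy so the original diamonds dataset is left unchanged.
plot_data <- as.data.frame(diamonds)

# Re-level cut so that the most frequent category comes first.
# (cut_by_count is a hypothetical column name used only in this sketch.)
plot_data$cut_by_count <- factor(as.character(plot_data$cut),
                                 levels = names(cut_counts))

# geom_bar() counts the rows itself, so no pre-summarised data is needed.
p <- ggplot(plot_data, aes(x = cut_by_count)) +
  geom_bar()
p
```

The bars follow the factor levels, so changing the levels is all that is needed to change the order on the x-axis.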

Related

Can we plot percentage with Plot_ly

Is there a way to plot percentages using plot_ly? For example, the code below plots the count of each cut in the diamonds dataset:
plot_ly(diamonds, x = ~cut)
But I want to plot the percentage for each cut instead. For example, I need the count of "Good" as a percentage of the total count. Is there a way to get it?
It could be done like this.
First, compute the percentage for each cut category (dividing by nrow(diamonds) avoids hard-coding the total of 53940 rows):
diamonds %>% group_by(cut) %>% summarize(perc = n() / nrow(diamonds) * 100)
Second, pipe the resulting data set to plot_ly():
diamonds %>% group_by(cut) %>% summarize(perc = n() / nrow(diamonds) * 100) %>% plot_ly(x = ~cut, y = ~perc)
You can use data.table and ggplot2:
library(data.table)
library(ggplot2)
dt <- data.table(diamonds)
Calculate the number of records by each cut, and then calculate the prop.table of those counts:
result <- dt[, .N, by = cut][, .(cut, N, percentCut = prop.table(N))]
Now you can plot it with ggplot and use the library scales to have a beautiful percent-formatted y-axis:
p <- ggplot(result, aes(x = cut, y = percentCut))+
geom_col()+
scale_y_continuous(labels = scales::percent)
Now you can pass p to plotly, if you want:
plotly::ggplotly(p)
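Both answers boil down to the same computation: count the rows per cut, then divide by the total. Here is a minimal dplyr sketch of that step which derives the total from the counts themselves; the object name shares is made up for illustration, and the plot_ly() call is guarded because plotly may not be installed.

```r
library(dplyr)

# Count rows per cut, then express each count as a percentage of the total.
shares <- ggplot2::diamonds %>%
  count(cut) %>%
  mutate(perc = n / sum(n) * 100)

print(shares)

# Optional: draw the percentages as bars if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(shares, x = ~cut, y = ~perc, type = "bar")
}
```

Because the denominator is sum(n), the same code keeps working if the data is filtered beforehand.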

R: using ggplot2 with a group_by data set

I can't quite figure this out. A CSV of 200+ rows assigned to data like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
Trying to group by p1_id and plot the mean values for p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that isn't showing unique plot points per distinct p1_id values.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot:
data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y)) %>%
ggplot(aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
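To see why the original attempt drew a single point, it may help to compare the two computations on the four sample rows from the question. This is just an illustrative sketch; overall_mean_x and grp_means are made-up names.

```r
library(dplyr)

# The four sample rows from the question.
data <- data.frame(
  gid   = c(90467, 90467, 90467, 90467),
  bh    = c("R", "L", "R", "L"),
  p1_id = c(543333, 408045, 543333, 408045),
  p1_x  = c(80.184, 74.086, 57.629, 58.589),
  p1_y  = c(98.824, 90.923, 103.797, 95.937)
)

grp <- data %>% group_by(p1_id)

# `$` extracts the whole column, so the grouping is ignored here:
# the result is one overall mean, not one mean per p1_id.
overall_mean_x <- mean(grp$p1_x)

# summarise() evaluates mean() once per group: one row per p1_id.
grp_means <- grp %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y))

print(overall_mean_x)
print(grp_means)
```

group_by() on its own only records the grouping; it takes a grouped verb such as summarise() to actually use it.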

Using gganimate and ggplot for a boxplot: Cumulative not working

I'm trying to produce an animation for a simulation model, and I want to show how the distribution of results changes as the simulation runs.
I've seen gganimate used for scatter plots but not for boxplots (or ideally violin plots). Here I've provided a reprex.
When I use sim_category (which is a bucket for a certain number of simulation runs) I want the result to be cumulative of all previous runs to show the total distribution.
In this example (and my actual code), cumulative = TRUE does not do this. Why is this?
library(gganimate)
library(animation)
library(ggplot2)
df = as.data.frame(structure(list(ID = c(1,1,2,2,1,1,2,2,1,1,2,2),
value = c(10,15,5,10,7,17,4,12,9,20,6,17),
sim_category = c(1,1,1,1,2,2,2,2,3,3,3,3))))
df$ID <- factor(df$ID, levels = (unique(df$ID)))
df$sim_category <- factor(df$sim_category, levels = (unique(df$sim_category)))
ani.options(convert = shQuote('C:/Program Files/ImageMagick-7.0.5-Q16/magick.exe'))
p <- ggplot(df, aes(ID, value, frame= sim_category, cumulative = TRUE)) + geom_boxplot(position = "identity")
gganimate(p)
gganimate's cumulative doesn't accumulate the data; it just keeps what was drawn in earlier frames visible in subsequent frames. To achieve what you want, you have to do the accumulation yourself before building the plot, along the following lines:
library(tidyverse)
library(gganimate)
df <- tibble(  # tibble() replaces the deprecated data_frame()
ID = factor(c(1,1,2,2,1,1,2,2,1,1,2,2), levels = 1:2),
value = c(10,15,5,10,7,17,4,12,9,20,6,17),
sim_category = factor(c(1,1,1,1,2,2,2,2,3,3,3,3), levels = 1:3)
)
p <- df %>%
pull(sim_category) %>%
levels() %>%
as.integer() %>%
map_df(~ df %>% filter(sim_category %in% 1:.x) %>% mutate(sim_category = .x)) %>%
ggplot(aes(ID, value, frame = factor(sim_category))) +
geom_boxplot(position = "identity")
gganimate(p)
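The accumulation step is the core of this answer, so here it is on its own in base R, without any plotting. This is a sketch of the idea only; the column name frame and the object cumulative_df are made up for illustration.

```r
# Same toy data as in the answer, as a plain data frame.
df <- data.frame(
  ID = factor(c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2), levels = 1:2),
  value = c(10, 15, 5, 10, 7, 17, 4, 12, 9, 20, 6, 17),
  sim_category = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
)

# For each frame k, keep every row from categories 1..k and label it with k.
frames <- lapply(sort(unique(df$sim_category)), function(k) {
  part <- df[df$sim_category <= k, ]
  part$frame <- k
  part
})
cumulative_df <- do.call(rbind, frames)

# The number of rows per frame grows as earlier runs are carried forward.
print(table(cumulative_df$frame))
```

Each frame then contains all runs up to that point, so a boxplot computed per frame summarises the cumulative distribution.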

Rank Stacked Bar Chart by Sum of Subset of Fill Variable

Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot, how can I reorder x=Age, by the sum of Ranks "Extremely" and "Very" only?
I tried using the below, without success.
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
A couple of notes:
The way you are simulating your data does not guarantee that every category is represented for every age (which is fine), and it also allows some categories to be duplicated for some ages. I am assuming that this is not true of your real data, so I have left it as is. Note also that your simulation does not produce percentages that add up, although the category names suggest that they should.
The way I would do this is to compute the order of the ages from your desired logic first, and then pass that order to the factor call as its levels. This decouples the ordering logic from the plot and allows arbitrary ordering logic.
Here is then what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
filter(Rank %in% c("Extremely", "Very")) %>%
group_by(Age) %>%
summarize(SumRank = sum(Percent)) %>%
arrange(desc(SumRank)) %>%
`[[`("Age")
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
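Since the simulated data is random, the ordering logic is easier to check on a small hand-made example. The toy data below is hypothetical (it is not from the question), and pull(Age) is used in place of `[[`("Age") purely as a matter of taste.

```r
library(dplyr)

# Hypothetical toy data: two ranks per age.
toy <- data.frame(
  Age = c(1, 1, 2, 2, 3, 3),
  Rank = c("Extremely", "Slightly", "Very", "Not At All",
           "Slightly", "Not At All"),
  Percent = c(0.002, 0.009, 0.005, 0.001, 0.004, 0.003)
)

# Sum Percent over the ranks of interest and sort the ages by that sum.
age_order <- toy %>%
  filter(Rank %in% c("Extremely", "Very")) %>%
  group_by(Age) %>%
  summarise(SumRank = sum(Percent)) %>%
  arrange(desc(SumRank)) %>%
  pull(Age)

# Age 3 has neither rank, so the filter drops it; append it at the end.
age_order <- c(age_order, setdiff(unique(toy$Age), age_order))
print(age_order)
```

Age 2 comes first because its "Very" share (0.005) is larger than the "Extremely" share of age 1 (0.002); age 3 is appended last.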

Plotting a bar graph in R

Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Edit:
To be extra fancy, e.g. to make the x-axis labels go from "Before" to "After", you can add scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot call.
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
# Note: the function is ggplot(), not ggplot2(), and facet is not an aesthetic.
ggplot(rc2, aes(x=category, y=value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.vars = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)
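All three answers reshape the data from wide to long before plotting. The tidyr documentation marks gather() as superseded by pivot_longer(), so here is a sketch of the same reshape with that function. The "Before"/"After" labels are the same assumption the first answer makes, and scales = "free_y" is an optional extra that is not in the original answers.

```r
library(tidyr)
library(ggplot2)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS   = c(155.48, 139.27)
)

# Label the two rows (assumed meaning, as in the first answer).
rest_change$change <- c("Before", "After")

# Wide to long: one row per (change, key) pair.
long <- pivot_longer(rest_change, cols = -change,
                     names_to = "key", values_to = "value")
print(long)

# One panel per original column; free y scales keep the small values readable.
p <- ggplot(long, aes(x = change, y = value)) +
  geom_col() +
  facet_wrap(~ key, scales = "free_y")
p
```

The long table has six rows, one per bar, which is the shape ggplot needs to draw two bars in each of the three panels.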
