R ggplot - Boxplot with fill displaying different values than bargraph - r

I have a boxplot generated using the following code, and after checking the dataset all the values are correct here.
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
  ggplot(aes(x=ACTARMCD,y=AVAL,fill=ACTARMCD))+
  geom_boxplot()+
  stat_summary(fun.y=mean,na.rm=TRUE,shape=25,col='black',geom='point')
I want to generate a second boxplot where I split the x variable into different groups by applying a different variable as a fill. I use the following code, but the values present in the graph are incorrect.
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
  group_by(ACTARMCD, RESPFL) %>%
  ggplot(aes(x=ACTARMCD,y=AVAL))+
  geom_boxplot(aes(fill=RESPFL))
However when I generate a bargraph using this code, the numbers are correct.
myplot <- inDATA %>%
  filter(PARAMCD=="param1") %>%
  group_by(ACTARMCD,RESPFL) %>%
  dplyr::mutate(AVAL = mean(AVAL, na.rm=TRUE)) %>%
  ggplot(aes(x=ACTARMCD,y=AVAL,fill=RESPFL))+
  geom_bar(stat="identity",position="dodge")
Can anyone please help me understand what I am doing incorrectly with the second boxplot?

I ended up solving the issue by using plotly instead of ggplot. The code that worked is:
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
  plot_ly(x = ~ACTARMCD, y = ~AVAL, color = ~RESPFL, type = "box", boxmean=TRUE) %>%
  layout(boxmode = "group")
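For readers who would rather stay in ggplot2, a grouped boxplot with per-group mean markers can be sketched roughly as follows. This is only an illustration under assumptions: `demo` is a made-up stand-in for the filtered inDATA (same column names, invented values), and the dodge width of 0.75 is picked so the boxes and the mean markers line up. Keep in mind that the centre line of a boxplot is the median, while the bar chart in the question plots the mean, so the two are not expected to show the same number.

```r
library(ggplot2)

# Hypothetical stand-in for inDATA after filtering on PARAMCD (invented values)
set.seed(1)
demo <- data.frame(
  ACTARMCD = rep(c("ARM1", "ARM2"), each = 20),
  RESPFL   = rep(c("Y", "N"), times = 20),
  AVAL     = rnorm(40, mean = 10)
)

# Use the same dodge for both layers so the mean markers sit on their boxes
dodge <- position_dodge(width = 0.75)

p <- ggplot(demo, aes(x = ACTARMCD, y = AVAL, fill = RESPFL)) +
  geom_boxplot(position = dodge) +
  stat_summary(fun = mean, geom = "point", shape = 25,
               colour = "black", position = dodge)

# Values actually drawn: one box per ACTARMCD x RESPFL combination
box_stats   <- layer_data(p, 1)   # the 'middle' column holds the medians
group_means <- aggregate(AVAL ~ ACTARMCD + RESPFL, data = demo, FUN = mean)
```

Comparing `box_stats$middle` with `group_means$AVAL` shows how far the medians and the means differ for each group.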

Related

Subset data using rangeslider in plotly for R

I am trying to make a scatterplot with plotly for R where the dots are connected in order using geom_path.
Now I would like to add a rangeslider where the user can select a date range.
Here is something similar using Python: Youtube Code
...but I don't need any recalculation of means or something like that, I just want to filter based on column i.
Unfortunately, I am having trouble doing that in R plotly.
This is my attempt, but I don't know how to tell plotly to subset the data using the i column:
library(tidyverse)
library(plotly)
p <- mtcars %>%
  mutate(i = 1:nrow(mtcars)) %>%
  ggplot(aes(x = mpg, y = wt))+
  geom_path(size = 0.5) +
  geom_point(aes(color = i), size = 3)
ggplotly(p) %>%
  layout(
    xaxis = list(rangeslider = list())
  )

Can we plot percentage with Plot_ly

Is there a way to plot percentages using plot_ly? For example, the code below plots the count of each cut from the diamonds dataset:
plot_ly(diamonds, x = ~cut)
But I want to plot the percentage for each cut instead. For example, I need the percentage of "Good" relative to the total count. Is there a way to get it?
It could be done like this.
First, create percentage for each cut category
diamonds %>% group_by(cut) %>% summarize(perc = n()/53940*100)
Second, pipe the resultant data set to plot_ly()
diamonds %>% group_by(cut) %>% summarize(perc = n()/53940*100) %>% plot_ly(x = ~cut, y = ~perc)
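The hard-coded 53940 is the row count of diamonds. A variant that derives the total from the data itself might look like the sketch below; it assumes dplyr and ggplot2 (for the diamonds dataset) are installed, and the name `cut_perc` is made up for illustration.

```r
library(dplyr)
library(ggplot2)  # only needed here for the diamonds dataset

cut_perc <- diamonds %>%
  count(cut) %>%                      # rows per cut, stored in column n
  mutate(perc = n / sum(n) * 100)     # share of the total, in percent

# cut_perc can then be piped into plot_ly(x = ~cut, y = ~perc, type = "bar")
```

Because the denominator is `sum(n)`, the same code keeps working if the data are filtered first.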
You can use data.table and ggplot2:
library(data.table)
library(ggplot2)
dt <- data.table(diamonds)
Calculate the number of records by each cut, and then calculate the prop.table of those counts:
result <- dt[, .N, by = cut][, .(cut, N, percentCut = prop.table(N))]
Now you can plot it with ggplot and use the library scales to have a beautiful percent-formatted y-axis:
p <- ggplot(result, aes(x = cut, y = percentCut))+
  geom_col()+
  scale_y_continuous(labels = scales::percent)
Now you can pass p to plotly, if you want:
plotly::ggplotly(p)

How to create a function for a bar chart with percentage labels

I would like to create a function for a bar chart that contains labels with the respective percentages.
The following code creates the bar chart that I would like to see:
percentData <- df %>%
group_by(col1) %>%
count(col2) %>%
mutate(ratio=scales::percent(n/sum(n)))
diagram <- ggplot(df, aes(x=col1, fill=col2)) +
  geom_bar(position = "fill") +
  geom_text(data=percentData, aes(y=n,label=ratio),
            position=position_fill(vjust=0.5))
I would like to create a function for the bar chart above to be able to change df, col1 and col2.
I tried the following:
newdiagram <- function(data, col1, col2){
  percentData <- data %>%
    group_by(col1) %>%
    count(col2) %>%
    mutate(ratio=scales::percent(n/sum(n)))
  diagram <- ggplot(data, aes(x=col1, fill=col2)) +
    geom_bar(position = "fill") +
    geom_text(data=percentData, aes(y=n,label=ratio),
              position=position_fill(vjust=0.5))
  return(diagram)
}
newdiagram(df, column1, column2)
Unfortunately I receive the error message that the columns are unknown. I tried to solve it by specifying the columns with data$col1, but this does not work either.
Since you didn't provide any sample data I can't verify with 100% certainty that this will solve the issue, but generally speaking, if you want to address a column of a data frame with a string, this would be the right syntax:
data[[col1]]
which would mean that this should work for you:
newdiagram <- function(data, col1, col2){
  # col1 and col2 are column names passed as strings;
  # inside dplyr/ggplot2 calls, .data[[col1]] is the tidy-eval form of data[[col1]]
  percentData <- data %>%
    group_by(.data[[col1]]) %>%
    count(.data[[col2]]) %>%
    mutate(ratio=scales::percent(n/sum(n)))
  diagram <- ggplot(data, aes(x=.data[[col1]], fill=.data[[col2]])) +
    geom_bar(position = "fill") +
    geom_text(data=percentData, aes(y=n,label=ratio),
              position=position_fill(vjust=0.5))
  return(diagram)
}
newdiagram(df, "column1", "column2")
You can also provide a minimal data set for reproducibility with:
head(data)
I would recommend that you separate calculating the percentages and building the visual into two functions. That makes it easier to spot errors.
The syntax known as curly-curly (embracing) will help with this:
{{ }}
Calculating percentages
library(tidyverse)
library(magrittr)
percentData_calculation <- function(df, col1, col2){
  df %>%
    group_by({{col1}}) %>%
    count({{col2}}) %>%
    mutate(ratio = scales::percent(n/sum(n)))}
Building visual
newdiagram <- function(df, col1, col2){
  # the calculated counts are piped straight into ggplot, so geom_col is used with y = n
  percentData_calculation(df, {{col1}}, {{col2}}) %>%
    ggplot(aes(x = {{col1}}, y = n, fill = {{col2}})) +
    geom_col(position = "fill") +
    geom_text(aes(label = ratio),
              position = position_fill(vjust = 0.5))}
In geom_text you don't need to name the data source since it's coming from the calculation. Also, if you know some aspect of the ggplot won't change, like the df, then you don't need to pass it to the function.
Hope this helps!
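To make the embracing approach concrete, here is a small self-contained sketch run against ggplot2's built-in mpg data. The function name `fill_bar_with_labels` and the choice of the `class` and `drv` columns are illustrative assumptions, not part of the original answers.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: filled bar chart with within-bar percentage labels
fill_bar_with_labels <- function(data, col1, col2) {
  percent_data <- data %>%
    count({{ col1 }}, {{ col2 }}) %>%        # n per col1 x col2 combination
    group_by({{ col1 }}) %>%
    mutate(ratio = scales::percent(n / sum(n))) %>%  # share within each col1 value
    ungroup()

  ggplot(percent_data, aes(x = {{ col1 }}, y = n, fill = {{ col2 }})) +
    geom_col(position = "fill") +
    geom_text(aes(label = ratio), position = position_fill(vjust = 0.5))
}

# Column names are passed unquoted, as in dplyr verbs
p <- fill_bar_with_labels(mpg, class, drv)
```

Passing bare column names keeps the call site close to ordinary dplyr code; if the names arrive as strings instead, the `.data[[name]]` form is the usual alternative.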

R: using ggplot2 with a group_by data set

I can't quite figure this out. A CSV of 200+ rows assigned to data like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
Trying to group by p1_id and plot the mean values for p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that isn't showing unique plot points per distinct p1_id values.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
  group_by(p1_id) %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
  geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot:
data %>%
  group_by(p1_id) %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y)) %>%
  ggplot(aes(x = mean_p1x, y = mean_p1y)) +
  geom_point(aes(color = as.factor(p1_id)))

R - ggplot2 geom_bar() doesn't plot correctly column's values

I am new to R.
I would like to plot the following using ggplot2's geom_bar():
top_r_cuisine <- r_cuisine %>%
  group_by(Rcuisine) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  top_n(10)
But when I try to plot this result with:
ggplot(top_r_cuisine, aes(x = Rcuisine)) +
  geom_bar()
I get this:
which doesn't represent the values in top_r_cuisine. Why?
EDIT:
I have tried:
c_count=c(23,45,67,43,54)
country=c("america","india","germany","france","italy")
# sample Data frame #
finaldata = data.frame(country,c_count)
ggplot(finaldata, aes(x=country)) +
  geom_bar(aes(weight = c_count))
You need to assign the weights in geom_bar().
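The reason is that geom_bar() counts rows by default, so a table that already holds one row per category yields bars of height 1 unless the precomputed value is supplied as a weight or as y. Below is a hedged sketch of both options, reusing the finaldata frame from above. geom_col() is shown as an alternative and is not what the answer used; the object names `p_weight`, `p_col` and `p_default` are made up for illustration.

```r
library(ggplot2)

finaldata <- data.frame(
  country = c("america", "india", "germany", "france", "italy"),
  c_count = c(23, 45, 67, 43, 54)
)

# Option 1: keep geom_bar() and weight each row by its precomputed count
p_weight <- ggplot(finaldata, aes(x = country)) +
  geom_bar(aes(weight = c_count))

# Option 2: geom_col() uses the y value as the bar height directly
p_col <- ggplot(finaldata, aes(x = country, y = c_count)) +
  geom_col()

# Default geom_bar() for comparison: one row per country, so every bar has height 1
p_default <- ggplot(finaldata, aes(x = country)) +
  geom_bar()
```

For the summarised table in the question, the same idea would be `geom_bar(aes(weight = count))` or `geom_col()` with `y = count`.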
