Boxplot (ggplot2) not working as expected - r

I'm trying to draw a simple boxplot of respondents' age for each of 15 "Cadernos" (15 surveys: A, B, C ... to O), so 15 boxplots in total. But it is not working as expected.
I have already tried swapping the "Caderno" (survey) and "Idade" (age) variables.
Any ideas? What I expected was 15 boxplots on the vertical axis.
The code I'm using is the following:
library(ggplot2)
select_base %>%
ggplot(aes(Idade,Caderno)) +
geom_boxplot()
The plot I get is the following:

I can't be sure because you did not provide your data, but you could try
select_base %>%
ggplot(aes(x = Caderno, y = Idade, group = Caderno)) +
geom_boxplot()
For example, with dummy data it looks like the plot below
library(dplyr)   # provides %>%
library(ggplot2)
dummy <- data.frame(
  x = rnorm(50),
  y = rep(c("a","b","c","d","e"), 10)
)
dummy %>%
  ggplot(aes(x = y, y = x, group = y)) +
  geom_boxplot()
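The same idea can be sketched with the variable names from the question. This is only an illustration: `survey_data` and its values are invented here as a stand-in for `select_base`, which was not shared. It converts the survey label to a factor explicitly, so each of the 15 levels gets its own box even without the `group` aesthetic.

```r
library(ggplot2)

# Hypothetical stand-in for select_base: 15 surveys (A to O), 20 respondents each
set.seed(1)
survey_data <- data.frame(
  Caderno = rep(LETTERS[1:15], each = 20),
  Idade   = round(runif(15 * 20, min = 18, max = 80))
)

# Treat the survey label as a factor so each level gets its own box
survey_data$Caderno <- factor(survey_data$Caderno)

# Survey on the x axis, age on the y axis
p <- ggplot(survey_data, aes(x = Caderno, y = Idade)) +
  geom_boxplot()
p
```

If the real `Caderno` column is numeric, wrapping it in `factor()` like this is one likely fix, but that is an assumption about data that was not posted.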

Related

How to plot a barplot using ggplot

                  average
Young          0.01921875
Cohoused Young 0.07111951
Old            0.06057224
Cohoused Old   0.12102273
I am using the above data frame to create a histogram or bar chart, and my code is as follows:
C <-ggplot(data=c,aes(x=average))
C + geom_bar()
The resulting plot is attached here.
I would like the bar heights to reflect my data on the y axis, instead of the data determining where each bar is placed on the x axis, but I don't know what is wrong with my code.
We can create a column with rownames_to_column
library(dplyr)
library(tibble)
library(ggplot2)
c %>%
rownames_to_column('rn') %>%
ggplot(aes(x = rn, y = average)) +
geom_col()
Or create a column directly in base R
c$rn <- row.names(c)
ggplot(c, aes(x = rn, y = average)) +
geom_col()
Or, as @user20650 suggested
ggplot(data=c,aes(x=rownames(c) , y=average)) +
geom_col()
NOTE: It is better not to name objects with function names (c is a function)
In base R, barplot can draw the bars directly from the values
barplot(c$average, names.arg = row.names(c))

geom_bar overlapping labels

For simplicity, let's suppose we have a data frame like
# A
1 1
2 2
3 2
4 2
5 3
We have a categorical variable "A" with 3 possible values (1, 2, 3), and I'm trying this code:
ggplot(df, aes(x="", y=df$A, fill=A))+
geom_bar(width = 1, stat = "identity")
The problem is that the labels are overlapping. I also want to change the labels for 1, 2, 3 to x, y, z.
Here is a picture of what is happening.
And here is a link to the actual data that I'm using:
https://a.uguu.se/anKhhyEv5b7W_Data.csv
Your graph does not correspond to the sample of data you are showing, so it is hard to be sure that the structure of your real data is actually the same.
Using a random example, I get the following plot:
df <- data.frame(A = sample(1:3,20, replace = TRUE))
library(ggplot2)
ggplot(df, aes(x="A", y=A, fill=as.factor(A)))+
geom_bar(width = 1, stat = "identity") +
scale_fill_discrete(labels = c("x","y","z"))
EDIT: Using data provided by the OP
Here using your data, you should get the following plot:
ggplot(df, aes(x = "A",y = A, fill = as.factor(A)))+
geom_col()
Or, if you want the count of each individual value of A, you can do:
library(dplyr)
library(ggplot2)
df %>% group_by(A) %>% count() %>%
ggplot(aes(x = "A", y = n, fill = as.factor(A)))+
geom_col()
Is this what you are looking for?

ggplot2 - reordering aes fill based on y-numeric, but calculated for each instance of factor x. Is this possible?

I've searched everywhere but cannot seem to find even a messy / hacked way of creating this plot.
I would like to plot a column chart with:
x = categorical factor, sorted in descending y order
y = numeric variable, summed
fill = categorical factor, sorted in descending y order - BUT having this calculated separately for each occurrence of x.
For example, the below code (using data from datasets) will nearly sort everything as I want, but I cannot for the life of me figure out how to tell ggplot to reorder the fill for each x.
library(tidyverse)
UCBAdmissions <- as.data.frame(UCBAdmissions)
UCBAdmissions$Dept <- as.factor(UCBAdmissions$Dept)
UCBAdmissions$Gender <- as.factor(UCBAdmissions$Gender)
plot <- UCBAdmissions %>%
ggplot(aes(
x = fct_reorder(Dept, Freq, .fun = sum),
y = Freq,
fill = fct_reorder(Gender, Freq, .fun = sum)
)) +
geom_col() + coord_flip() + labs(fill = "gender")
plot
I would like to keep Dept A showing Male closest to the axis, then Female,
but change Dept E to show Female closest (or any Dept where Female > Male).
Any ideas? Open to a messy solution at this point :)
Thanks in advance for your help.
From the position_stack help here:
position_fill() and position_stack() automatically stack values in
reverse order of the group aesthetic
So we can get what you want by mapping the group aesthetic to the frequency. Since the data includes two Admit categories, I did some pre-processing here to combine them.
Now for each Dept, the stacking order is determined by which Gender has the higher number.
plot <- UCBAdmissions %>%
count(Dept, Gender, wt = Freq) %>% # outputs n = total Freq per Dept/Gender
ggplot(aes(
x = fct_reorder(Dept, n, .fun = sum),
y = n,
group = n,
fill = fct_reorder(Gender, n, .fun = sum)
)) +
geom_col() + coord_flip() + labs(fill = "gender")
plot
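To check which departments should end up with a flipped stacking order, the per-department totals can be compared directly. A minimal base R sketch, using only the built-in `UCBAdmissions` data; the names `ucb`, `totals` and `female_larger` are made up for illustration:

```r
# Total applicants per department and gender, summed over both Admit categories
ucb <- as.data.frame(UCBAdmissions)
totals <- xtabs(Freq ~ Dept + Gender, data = ucb)

# Departments where Female outnumbers Male; in these the stacking order
# of the two genders should differ from the other departments
female_larger <- rownames(totals)[totals[, "Female"] > totals[, "Male"]]
female_larger
```

Comparing this list with the plot is a quick way to confirm that the bars were reordered where expected.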

Change the boxplot background based in x-variable (ggplot2)

I want to change the background of a boxplot based on the x-variable. My code is very simple:
ggplot(data = df, aes(x = variable, y = value)) +
geom_boxplot()
So, I have 17 x-variables and I generate 17 boxplots in the same figure. I want to make the background grey behind boxplots 1 to 4 and 11 to 14. I don't know how to do that.
Thanks.
You need a grouping factor to help with this. In the example I add a new column (tales) to df and map it to fill.
library(tidyverse)
df <- data.frame(variable = rep(base::LETTERS[1:17], 5),
value = runif(17*5, 0, 100))
df <- df %>%
dplyr::mutate(tales = rep(c(rep("x", 4), rep("y", 11-4), rep("w", 17-11)), 5))
ggplot(data = df, aes(x = variable, y = value)) +
geom_boxplot(aes(fill = tales))
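If the goal is to shade the panel background behind the boxes rather than fill the boxes themselves, one possible alternative is to draw grey rectangles underneath them. This is a sketch under assumptions, not the approach from the answer above: the `shade` data frame is invented here, and it relies on the i-th level of a discrete x axis sitting at position i.

```r
library(ggplot2)

# Hypothetical data mirroring the question: 17 x-variables, 5 values each
set.seed(42)
df <- data.frame(
  variable = factor(rep(LETTERS[1:17], 5), levels = LETTERS[1:17]),
  value    = runif(17 * 5, 0, 100)
)

# On a discrete axis the i-th level sits at position i,
# so boxes 1-4 span 0.5 to 4.5 and boxes 11-14 span 10.5 to 14.5
shade <- data.frame(xmin = c(0.5, 10.5), xmax = c(4.5, 14.5))

p <- ggplot(df, aes(x = variable, y = value)) +
  geom_blank() +  # sets up the discrete x scale before the numeric rectangles
  geom_rect(
    data = shade,
    aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
    inherit.aes = FALSE, fill = "grey80"
  ) +
  geom_boxplot()  # drawn last, so the boxes sit on top of the shading
p
```

The rectangles are added before `geom_boxplot()` because layers are drawn in the order they are added.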

R ggplot How to Show Probability of Two Variables

I have a distribution of data that is shown below in image 1. My goal is to show the likelihood that a variable is below a particular value for both X and for Y. For instance, I'd like to have a good way to show that ~95% of values are below 8000 on X-axis and below 6500 on the Y-axis. I am confident that there is a simple answer to this. I apologize if this has been asked many times before.
plot1 <- df %>% ggplot(mapping = aes(x = FLUID_TOT)) + stat_ecdf() + theme_bw()
plot2 <- df %>% ggplot(mapping = aes(x = FLUID_TOT, y = y)) + geom_point() + theme_bw()
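One way to put numbers on this, sketched in base R: the share of rows at or below a threshold is the empirical CDF at that point, and the joint share counts rows where both conditions hold. The data below is simulated purely for illustration (the real `df` was not shared), and the thresholds 8000 and 6500 are taken from the question.

```r
# Hypothetical data standing in for the asker's df
set.seed(123)
df <- data.frame(
  FLUID_TOT = rlnorm(1000, meanlog = 8, sdlog = 0.5),
  y         = rlnorm(1000, meanlog = 7.8, sdlog = 0.5)
)

x_cut <- 8000
y_cut <- 6500

# Marginal shares: the value of the empirical CDF at each threshold
p_x <- mean(df$FLUID_TOT <= x_cut)
p_y <- mean(df$y <= y_cut)

# Joint share: both conditions hold for the same row
p_xy <- mean(df$FLUID_TOT <= x_cut & df$y <= y_cut)

# Going the other way: the thresholds that 95% of values fall at or below
q_x <- quantile(df$FLUID_TOT, 0.95)
q_y <- quantile(df$y, 0.95)
```

The quantiles could then be drawn on the scatter plot with `geom_vline()` and `geom_hline()` to mark the region that holds most of the data.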
