ggplot - how to make errorbar with respect to gender - r

A few words at the beginning: I have just started my journey with R, and after my initial experience I am really keen on learning more! But I've run into a huge problem, and searching on Google doesn't seem to help. Maybe some good soul here could guide me with your wisdom :)
So, I've been trying to make an error bar plot in R using ggplot. The problem is that on the y axis I have a continuous variable (marital satisfaction), the x axis is a factor (with three levels), and I also want to add gender to the plot (what's more, all of it has to be black and white, which makes it doubly hard).
What I want to do is to show the means and standard deviation of marital satisfaction in three different religions (Christian, Muslim, atheistic) with respect to gender (male, female). Do you have any idea how to do it?
Thanks in advance! <3
I've already tried making a boxplot with my data, but such a plot doesn't provide any useful information, and after googling I think an error bar plot would fit the data better.
Here is what it looks like:
library(ggplot2)
# factor() returns a new vector, so the result has to be assigned back to the column
ds$Religion <- factor(ds$Religion, levels = c(2, 4, 6), labels = c("Christian", "Muslim", "atheistic"))
ds$Sex <- factor(ds$Sex, levels = c(0, 1), labels = c("Male", "Female"))
# the x-axis and legend labels now come from the factor levels, so no extra scale_*_discrete() labels are needed
obj <- ggplot(data = ds, aes(y = Marital, x = Religion, fill = Sex)) +
  geom_boxplot() +
  labs(x = "Religious affiliation", y = "Marital satisfaction", fill = "Sex") +
  scale_y_continuous(expand = c(0, 1))
I have pasted my data here:
https://textsaver.flap.tv/lists/2l4k

The simple answer to this question is that you should create the summary statistics beforehand. After that, picking the visualization becomes trivial.
Assuming you are interested in a tidyverse solution, I would proceed as follows:
library(tidyverse)
ds %>%
  group_by(Religion, Sex) %>%
  summarize(meanVal = mean(Marital),
            sdVal = sd(Marital)) -> ds.summarized
ds.summarized will have one row for each Religion-Sex combination and provides the mean and the SD of marital satisfaction in that group. You can then plot it using geom_errorbar with the aesthetics y = meanVal, ymin = meanVal - sdVal and ymax = meanVal + sdVal. One final remark: you may want to use standard errors instead of standard deviations.
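A minimal sketch of the full plot, using simulated data as a stand-in for the asker's `ds` (the pasted data is not reproduced here). Points mark the means, error bars span mean +/- SD, `position_dodge()` separates the sexes, and gender is shown with filled versus open point shapes plus `theme_bw()`, so everything stays black and white. The shape codes and dodge width are arbitrary choices; swap `sdVal` for `seVal` to plot standard errors instead.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data: 3 religions x 2 sexes, ratings 1-7
set.seed(1)
ds <- data.frame(
  Religion = factor(rep(c("Christian", "Muslim", "atheistic"), each = 40),
                    levels = c("Christian", "Muslim", "atheistic")),
  Sex      = factor(rep(c("Male", "Female"), times = 60), levels = c("Male", "Female")),
  Marital  = round(runif(120, min = 1, max = 7))
)

# One row per Religion x Sex cell with mean, SD and standard error
ds.summarized <- ds %>%
  group_by(Religion, Sex) %>%
  summarize(meanVal = mean(Marital),
            sdVal   = sd(Marital),
            seVal   = sdVal / sqrt(n()),
            .groups = "drop")

# Shift the two sexes sideways so their error bars do not overlap
dodge <- position_dodge(width = 0.5)

p <- ggplot(ds.summarized, aes(x = Religion, y = meanVal, shape = Sex, group = Sex)) +
  geom_point(position = dodge, size = 3) +
  geom_errorbar(aes(ymin = meanVal - sdVal, ymax = meanVal + sdVal),
                position = dodge, width = 0.2) +
  scale_shape_manual(values = c(Male = 16, Female = 1)) +  # filled vs open circles
  labs(x = "Religious affiliation", y = "Marital satisfaction (mean +/- SD)", shape = "Sex") +
  theme_bw()
print(p)
```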

Related

Best way to visually compare an individual's value to multiple subgroup means in R

For an individual feedback sheet generated by a Shiny app in R, I would like to visually compare an individual's value in variable X to the mean of the whole group, the mean of people of the same age, and the mean of people playing the same sport. I was considering making a bar plot with one bar for each of the four values, and since I keep reading that ggplot2 is neat for making plots, I tried to figure out how to do it in ggplot2. However, when trying to implement this idea, the factor on the x axis would conceptually be the subsets of the dataset, and since the subsets are built from different variables and one individual can be in more than one subset, I absolutely can't wrap my head around how to feed that into any bar plot syntax I found. I wondered if you could just make a list along the lines of c(your_value, mean(group), mean(age_subset), mean(sports_subset)), but I couldn't find out whether that was possible. Also, first making a list or even a second data frame seems kind of messy to me. Isn't there an easier and more elegant way to do something like that?
Below I start with arbitrary numbers (equivalent to the list you considered starting with). The code might give you an idea of how to make a general function of the kind you're seeking.
library(ggplot2)
library(dplyr)
own_result <- 5.4
mean_age <- 5.6
mean_sport <- 4.5
data.frame(group = c("age", "sport"),
           means = c(mean_age, mean_sport)) %>%
  ggplot(aes(x = group, y = means)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  geom_hline(yintercept = own_result, linetype = 2, colour = "red")
Created on 2021-07-20 by the reprex package (v2.0.0)
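Going one step further toward the four bars described in the question, here is a sketch of a small helper that computes the comparison values directly from the data, so no list has to be assembled by hand. The data frame `people`, its columns (`id`, `age`, `sport`, `X`) and the function `compare_individual()` are all hypothetical; adapt them to the real dataset.

```r
library(ggplot2)

# Hypothetical data: one row per person with the measured variable X
set.seed(42)
people <- data.frame(
  id    = 1:100,
  age   = sample(18:25, 100, replace = TRUE),
  sport = sample(c("football", "tennis", "swimming"), 100, replace = TRUE),
  X     = rnorm(100, mean = 5, sd = 1)
)

# Hypothetical helper: a 4-row comparison table for one individual
compare_individual <- function(data, person_id) {
  me <- data[data$id == person_id, ]
  labels <- c("You", "Everyone", "Same age", "Same sport")
  data.frame(
    comparison = factor(labels, levels = labels),  # keep this bar order
    value = c(me$X,
              mean(data$X),
              mean(data$X[data$age == me$age]),
              mean(data$X[data$sport == me$sport]))
  )
}

comp <- compare_individual(people, person_id = 7)

p <- ggplot(comp, aes(x = comparison, y = value)) +
  geom_col() +
  labs(x = NULL, y = "X")
print(p)
```

Because the subsets overlap, each mean is computed separately from the full data rather than by grouping, which sidesteps the "one person in several subsets" problem.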

Sorting histogram plots within facet_wrap by skew

I have about 1K observations for each country, and I have used facet_wrap to display each country's geom_bar, but the output is in alphabetical order. I would like to cluster or order them by skew (so the most positively skewed countries are together, moving towards the normally distributed countries, then the negatively skewed countries, ending with the most negatively skewed one) without eyeballing which countries are more similar to each other. I was thinking psych::describe() might be useful since it calculates skew, but I am having a hard time figuring out how to add that information.
Any suggestions would be helpful.
I can't go into too much detail without a reproducible example, but this would be my general approach. Use psych::describe() to create a vector of countries that are sorted from most positive skew to least positive skew: country_order. Next, factor the country column in your dataset with country = factor(country, levels = country_order). When you use facet_wrap, the plots will be displayed in the same order as country_order.
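The approach above could look roughly like this. It is a sketch on simulated data (the `df`, `Country` and `DV` names are assumptions), and it uses a simple moment-based skewness helper as a stand-in for `psych::describe()`'s `skew` column so the example runs without psych. psych's estimate may include a small-sample correction, which should not change the ordering when countries have similar sample sizes.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: a 1-7 rating for three countries with different shapes
set.seed(3)
df <- data.frame(
  Country = rep(c("A", "B", "C"), each = 1000),
  DV = c(pmin(1 + rpois(1000, 1), 7),       # piled up at the low end: positive skew
         sample(1:7, 1000, replace = TRUE), # roughly symmetric
         pmax(7 - rpois(1000, 1), 1))       # piled up at the high end: negative skew
)

# Moment-based skewness (stand-in for psych::describe()$skew)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Countries sorted from most positive to most negative skew
country_order <- df %>%
  group_by(Country) %>%
  summarize(skew = skewness(DV), .groups = "drop") %>%
  arrange(desc(skew)) %>%
  pull(Country)

# facet_wrap() draws panels in factor-level order
df$Country <- factor(df$Country, levels = country_order)

p <- ggplot(df, aes(x = DV)) +
  geom_bar() +
  facet_wrap(~ Country)
print(p)
```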
After some troubleshooting, I found what I think is an efficient way of doing it:
library(dplyr)
library(magrittr) # for %<>%
library(forcats)  # for fct_reorder()
library(ggplot2)
# describeBy() with mat = TRUE returns one row per group, which is easy to merge into df
skews <- psych::describeBy(df$DV, df$Country, mat = TRUE)
# keep the country means as well, and turn group1 into a factor
skews %<>% select(group1, mean, skew) %>% sjlabelled::as_factor(., group1)
# I got an error that my levels were inconsistent even though they were the same (group1 came from df$Country),
# probably because the reference category Germany threw off the alphabetical sort of group1, so I used dfrankow's answer
combined <- sort(union(levels(df$Country), levels(skews$group1)))
df <- left_join(mutate(df, Country = factor(Country, levels = combined)),
                mutate(skews, Country = factor(group1, levels = combined)),
                by = "Country") %>%
  rename(`Country skew` = skew, `Country mean` = mean) %>%
  select(-group1)
df$`Country skew` <- round(df$`Country skew`, 2)
ggplot(df) +
  geom_bar(aes(x = DV, y = after_stat(prop))) +
  xlab("Scale axis text") + ylab("Proportion") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  ggtitle("DV distribution by country mean") +
  # reordering only inside facet_wrap() keeps the factor order that matters for my lm intact
  facet_wrap(~ fct_reorder(Country, `Country skew`, .desc = TRUE), nrow = 2)

ggplot2 in R: Calculate percentage and make a graph that might be a geom_area plot

I'm a beginner in R, so please be patient with me if there are very obvious mistakes in my code or my question! For a homework problem, I am struggling to make what I think is a geom_area plot look like this:
As background, we are using the diamonds data frame from the ggplot2 library. We were given the plot and asked to reproduce it. My biggest problem is with the y axis. The given graph indicates that the y axis represents density, which, given the title, I think is the percentage/proportion of each clarity grade. Originally, I thought I might need to create a new data frame with "Price" and "Clarity Proportion (or density)", but I wasn't sure how to do that. The professor hinted that we should not need to create a new variable for this problem.
Here's what I have so far. It produces the error message: "In Ops.ordered(left, right): '/' is not meaningful for ordered factors":
set.seed(123)
d <- ggplot(diamonds[sample(nrow(diamonds),5000),]) #these were given in the homework
d + geom_area(aes(x = price, y = lapply(count(diamonds$clarity), FUN = count(diamonds$clarity)/53940), colour = clarity), position = "fill") +
labs(title = "Clarity Proportion by Price")
I know my y-axis is wrong, but I'm just not sure how to transform it. Your explanation and insight are greatly appreciated!
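One possible approach, assuming the target is a conditional density plot (stacked density areas rescaled to 1 at every price), is `geom_density()` with `y = after_stat(count)` and `position = "fill"`, a pattern shown in the ggplot2 `geom_density()` documentation. It needs no new variable; this is a sketch, not necessarily the exact plot the homework expects, and the "Proportion" axis label is an illustrative choice.

```r
library(ggplot2)

set.seed(123)
d <- ggplot(diamonds[sample(nrow(diamonds), 5000), ])  # as given in the homework

# after_stat(count) weights each clarity's density by its size; position = "fill"
# then rescales the stack to 1 at every price, so the y axis shows each grade's share
p <- d +
  geom_density(aes(x = price, y = after_stat(count), fill = clarity),
               position = "fill") +
  labs(title = "Clarity Proportion by Price", y = "Proportion")
print(p)
```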

barplot 2 bars one stacked the other not

Despite some similar questions and my own research, I cannot seem to solve my little problem. Please forgive me if the answer is very easy and I am being silly. I have a data frame:
df<-data.frame(X = c("Germany", "Chile","Netherlands","Papua New Guinea","Cameroon"), R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000))
I would like to produce a bar plot with 2 bars (country names on the x axis, amounts on the y axis).
The left bar should show Germany; the right one should be stacked with the remaining 4 countries.
Please help, and thank you very much in advance!
One way to solve this is by using ggplot2 and a little bit of manipulating your data frame.
First, add a column to your data frame that indicates which bar a country should be plotted in (Germany or Not-Germany):
df$bar <- ifelse(df$X == "Germany", 1, 0)
Now, create the plot:
ggplot(data = df, aes(x = factor(bar), fill = factor(X), y = R_bar_Ger)) +
geom_bar(stat = "identity") +
scale_y_sqrt() +
labs(x = "Country Group", title = "Square Root Scale", fill = "Country") +
scale_x_discrete(labels = c("Not Germany", "Germany"))
Note that if you're not familiar with ggplot2, only the first two lines are necessary for creating the plot; the others are there to make it look nice. Since Germany is orders of magnitude larger than your other countries, this isn't going to look very good without some sort of scaling. ggplot2 has a number of built-in scale functions that might be worth exploring; here, I've added the square root scale so you can see that the non-Germany countries actually do get stacked as desired.
The ggplot2 documentation for bar charts is definitely worth a read if you're looking for a powerful plotting tool.
There are a number of ways to skin a cat, and your exact question will often change as you learn new tools. I probably wouldn't have set the problem specification up this way, but sticking as close to your data and barplot as possible, one way to achieve what I think you want is:
with(aggregate(R_bar_Ger ~ X=="Germany", data=df, sum), barplot(R_bar_Ger, names.arg=c("Other", "Germany")))
So what we're doing here is aggregating Germany and non-Germany figures by addition, and then passing those values to the barplot function along with sensible x-axis labels.
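To get exactly what the question asks for in base R (Germany on its own, the other four countries stacked), one option is to pass `barplot()` a matrix with one column per bar: matrix columns are stacked by default, and putting zeros where a country does not belong leaves the Germany bar unstacked. The matrix name `m` is illustrative. As with the ggplot2 version, Germany's value dwarfs the others, so the stacked bar will look almost flat on a linear axis.

```r
df <- data.frame(X = c("Germany", "Chile", "Netherlands", "Papua New Guinea", "Cameroon"),
                 R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000))

# One row per country, one column per bar: Germany only fills the first column,
# the other countries only fill the second, so only the second bar is stacked
is_ger <- df$X == "Germany"
m <- cbind(Germany = ifelse(is_ger, df$R_bar_Ger, 0),
           Other   = ifelse(is_ger, 0, df$R_bar_Ger))
rownames(m) <- df$X

barplot(m, legend.text = rownames(m))
```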
You'll need to add an additional column to your data first:
df$group <- ifelse(df$X=="Germany","Germany","Other")
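Then we can use the following ggplot2 approach:
library(ggplot2)
# qplot() is deprecated and its stat argument no longer has an effect, so geom_col() draws the bars with y as given
ggplot(df, aes(x = factor(group), y = R_bar_Ger, fill = factor(X))) +
  geom_col()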

box-plot not working with factor data

I'm trying to create a simple boxplot of some survey data.
Data
The data is survey data, and each row has a response recorded 1-5.
**Example Data**
Race= 2,2,3,2,5
Rating = 1,1,3,5,5
Converting to factors
df$Race = factor(df$Race)
df$Rating = factor(df$Rating)
Assigning levels to each factor variable
levels(df$Race) = c("Asian/Pacific Islander", "White" , "American Indian/Eskimo", "Black/African American", "Other","NA")
levels(df$Rating) = c("Poor","Below Avg.","Neutral","Good","Excellent", "NA")
ggplot(df, aes(x=Race, y=Rating)) + geom_boxplot()
Using the full data I get a result like this.
Please let me know why this turns out funky. Also, how can I remove NAs? I'm brand new to R, so if you see something else that I am doing wrong or poorly, please let me know! Thanks!
UPDATE
Using @jlhoward's code provided in the comments, I can generate the following, but it's plotting them all the same and not plotting "White".
ggplot(df, aes(x=Race, y=as.numeric(Rating))) + geom_boxplot() +scale_y_continuous(labels=df$Rating,breaks=as.integer(df$Rating))
If I understand correctly, you want the factor levels ("Poor", "Below Avg" etc.) to appear on the Y axis, but you actually want the "rating" boxplot to be computed with numerical values. Is that correct?
If that is the case, you should not convert your "rating" variable into a factor before feeding it to ggplot (leave it numeric), and then simply label the y axis according to your factor levels.
(A reproducible example would help answer the question more fully).
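A minimal sketch of that suggestion on made-up data, since the full dataset isn't available. `Rating` stays numeric so the box statistics are computed on 1 to 5, the text labels are attached with `scale_y_continuous(breaks, labels)`, and rows with a missing value are dropped with `complete.cases()`. The mapping of codes 1 to 5 onto the race and rating labels is assumed to follow the order listed in the question.

```r
library(ggplot2)

# Hypothetical survey data: both columns coded 1-5, with some missing values
df <- data.frame(
  Race   = c(2, 2, 3, 2, 5, 3, 2, NA, 5, 3),
  Rating = c(1, 1, 3, 5, 5, 4, 2, 3, NA, 4)
)

# Race is categorical, so a labelled factor suits the x axis
# (assumes codes 1-5 map to these labels in this order)
df$Race <- factor(df$Race, levels = 1:5,
                  labels = c("Asian/Pacific Islander", "White", "American Indian/Eskimo",
                             "Black/African American", "Other"))

# Drop rows with a missing Race or Rating instead of plotting an "NA" group
df_complete <- df[complete.cases(df), ]

rating_labels <- c("Poor", "Below Avg.", "Neutral", "Good", "Excellent")

# Rating stays numeric for the boxplot; only the axis shows the text labels
p <- ggplot(df_complete, aes(x = Race, y = Rating)) +
  geom_boxplot() +
  scale_y_continuous(breaks = 1:5, labels = rating_labels, limits = c(1, 5))
print(p)
```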
