Gathering the Averages and Combining multiple Line Graphs - r

I am new to R and I would love some assistance on this. I am using this dataset: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv
I am trying to first find the Average of the following Cheeses: Cheddar, American, Mozzarella, Italian, Swiss, Muenster, and Blue. Then I would like to place them into a line graph but show them all at once. I would like to show the average consumption of these cheeses.
The following is my code and what I have so far. I am new at this, so it might look horrible to some.
line_3 <- clean_cheese %>%
select(c(Year, Cheddar, Mozzarella, `American Other`, `Italian other`, Swiss, Muenster, Blue)) %>%
group_by(Year) %>%
summarise(avg_cheddar_cheese = mean(Cheddar), avg_mozz_cheese = mean(Mozzarella), avg_american_other = mean(`American Other`), avg_italin_other = mean(`Italian other`), avg_swiss_cheese = mean(Swiss), avg_muenster = mean(Muenster), avg_blue = mean(Blue)) %>%
pivot_longer(-c(Year)) +
ggplot(aes(x = Year, y = value, color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y')
ggplotly(line_3)

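You can try the following. The key change is piping the result of pivot_longer() into ggplot() with %>% instead of +; across() also shortens the summarise() call: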
library(tidyverse)
clean_cheese <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv')
line_3 <- clean_cheese %>%
group_by(Year) %>%
summarise(across(Cheddar:Blue, mean)) %>%
pivot_longer(cols = -Year) %>%
ggplot(aes(x = Year, y = value, color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y')
line_3
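To see what the reshaping step does, here is a minimal sketch on a made-up two-cheese data frame. The name toy_cheese and its values are assumptions for illustration, not taken from the real dataset. pivot_longer() stacks the cheese columns into a name/value pair, which is the shape ggplot needs to draw one line per cheese.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; the values are invented for illustration
toy_cheese <- data.frame(
  Year    = c(2000, 2001),
  Cheddar = c(9.0, 9.5),
  Swiss   = c(1.0, 1.2)
)

# One row per Year/cheese combination, with default columns `name` and `value`
toy_long <- toy_cheese %>%
  pivot_longer(cols = -Year)

toy_long
```

Because every cheese ends up in the same value column, a single aes(y = value, color = name) covers all of them.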

Related

displaying data as a line in charts

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$countryName = as.character(df$countryName)
I processed the dataset.
Can the patient and population figures for each continent be shown as separate lines on the same chart?
The output should look like this:
date region confirmed
2020/01/03 europa 850
The values in this sample output are made up for illustration; they are not real data.
Here's an approach with dplyr, tidyr and ggplot:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
group_by(region, day) %>%
dplyr::summarize(confirmed = sum(confirmed),
recovered = sum(recovered),
death = sum(death)) %>%
pivot_longer(cols = c("confirmed","recovered","death"), names_to = "condition") %>%
ggplot(aes(x= as.Date(day), y = value, group = region, color = region)) +
geom_line() +
facet_grid(rows = vars(condition), scales = "free_y") +
labs(x = "Date", y = "Number of Individuals")

Working with tidyverse, ggplot, and broom to add confidence interval to a proportion test (prop.test) in R

Let's say I'm working with proportions, I have two main variables (sex and pain_level). It's not difficult to plot them:
With tidyverse and broom (and thanks for this link here: Calling prop.test function in R with dplyr) I can compare if the proportions are statistically different.
Now comes the question!
I want to add error bars to the plot. I know it's probably not as difficult as I think, but I could not find a way to do it. I've tried to replicate this link here (http://www.andrew.cmu.edu/user/achoulde/94842/labs/lab07_solution.html) but I'm trying to stay within the tidyverse.
The desired output should be something like this:
Please feel free to use the script below, which simulates the original dataset.
library(tidyverse)
ds <- data.frame(sex = rep(c("M","F"), 18),
pain_level = c("High","Moderate","low"))
#plot
ds %>%
group_by(pain_level, sex) %>%
summarise(n=n()) %>%
mutate(prop = n/sum(n)*100) %>%
ggplot(., aes(x = sex, fill = pain_level, y = prop)) +
geom_bar(stat = "summary") +
facet_wrap( ~ pain_level) +
theme(legend.position = "none")
#p values of proportion test
ds %>%
rowwise %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>% #compute totals
distinct(., pain_level, .keep_all= TRUE) %>% #keep only one value of the row
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst)
I think the following might roughly resemble your desired output:
ds %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>%
rowwise() %>%
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst) %>%
ggplot(aes(sex, estimate, group = pain_level)) +
geom_col(aes(fill = pain_level)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
facet_wrap(~ pain_level)
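For reference, the interval that geom_errorbar() draws comes straight from prop.test(). The sketch below uses made-up counts (6 cases out of 12 is an assumption for illustration) to show where the numbers live in the base R result; broom::tidy() exposes the same values as the estimate, conf.low and conf.high columns.

```r
# Hypothetical counts: 6 cases out of a population of 12
pt <- prop.test(x = 6, n = 12, conf.level = 0.95)

prop_estimate <- unname(pt$estimate)  # sample proportion, here 6/12
ci <- pt$conf.int                     # lower and upper confidence limits

prop_estimate
ci
```

These values are on a 0 to 1 scale. If the bars are drawn as percentages, as in the question's prop = n/sum(n)*100, multiply estimate, conf.low and conf.high by 100 so that bars and error bars share one scale.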

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies of the data so that it counts across the row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I'm only able to figure out how to graph the species of one taxa but not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made in Google Sheets, so they don't look like ggplot, but it would be easier to make them in R with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
## A second approach: reshape the data with reshape2::melt() first.
library(reshape2)
library(ggplot2)
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
## Apply the complete theme first; adding theme_gray() last would reset the axis text settings.
theme_gray() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3)
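As an alternative sketch (not what either answer above does), geom_col(position = "fill") lets ggplot2 rescale each bar to proportions, so prop does not have to be computed by hand. It assumes the dataf from the question and uses tidyr::pivot_longer() in place of gather(); the names dataf_long and p_index are only illustrative.

```r
library(tidyr)
library(ggplot2)

dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

# Long format: one row per index/taxa pair
dataf_long <- pivot_longer(dataf, cols = taxa1:taxa3,
                           names_to = "taxa", values_to = "value")

# position = "fill" stacks the bars and rescales each one to sum to 1
p_index <- ggplot(dataf_long, aes(x = index, y = value, fill = taxa)) +
  geom_col(position = "fill") +
  labs(y = "Proportion")

p_index
```

Replacing x = index with x = group should give the per-group version, since the stacked counts are normalised within each x position.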

dplyr() and ggplot2()::geom_tile, filtering a group of summary statistics

I've got a data frame (df) with three categorical variables called site, purchase, and happycustomer.
I'd like to use ggplot2's geom_tile function to create a heat-map of customer experience. I'd like site on the x-axis, purchase on the y-axis, and happycustomer as the fill. I'd like the heat map to feature the percentages for the happy customers grouped by site and purchase (i.e. the ones for which the value of happycustomer is y).
My problem is that at the moment the plot features both the happy and the unhappy customers.
Any help would be much appreciated.
Starting point (df):
df <- data.frame(site=c("GA","NY","BO","NY","BO","NY","BO","NY","BO","GA","NY","GA","NY","NY","NY"),purchase=c("a1","a2","a1","a1","a3","a1","a1","a3","a1","a2","a1","a2","a1","a2","a1"),happycustomer=c("n","y","n","y","y","y","n","y","n","y","y","y","n","y","n"))
Current code:
library(ggplot2)
library(dplyr)
df %>%
group_by(site, purchase,happycustomer) %>%
summarize(bin = sum(happycustomer==happycustomer)) %>%
group_by(site,happycustomer) %>%
mutate(bin_per = (bin/sum(bin)*100)) %>%
ggplot(aes(site,purchase)) + geom_tile(aes(fill = bin_per),colour = "white") + geom_text(aes(label = round(bin_per, 1))) +
scale_fill_gradient(low = "blue", high = "red")
Here is the solution with two data frames.
happyDF <- df %>%
filter(happycustomer == "y") %>%
group_by(site, purchase) %>%
summarise( n = n() )
totalDF <- df %>%
group_by(site, purchase) %>%
summarise( n = n() )
And the ggplot code:
merge(happyDF, totalDF, by=c("site", "purchase") ) %>%
mutate(prop = 100 * (n.x / n.y) ) %>%
ggplot(., aes(site, purchase)) +
geom_tile(aes(fill = prop),colour = "white") +
geom_text(aes(label = round(prop, 1))) +
scale_fill_gradient(low = "blue", high = "red")
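A single-pipeline variant is also possible. This is a sketch rather than the answer's method: since mean() of a logical vector is the share of TRUE values, 100 * mean(happycustomer == "y") gives the percentage of happy customers per site/purchase cell directly. The name happy_share is illustrative, and .groups = "drop" assumes dplyr 1.0.0 or later. Unlike the merge() version, cells with no happy customers are kept and shown as 0.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  site = c("GA","NY","BO","NY","BO","NY","BO","NY","BO","GA","NY","GA","NY","NY","NY"),
  purchase = c("a1","a2","a1","a1","a3","a1","a1","a3","a1","a2","a1","a2","a1","a2","a1"),
  happycustomer = c("n","y","n","y","y","y","n","y","n","y","y","y","n","y","n")
)

# Percentage of happy customers within each site/purchase combination
happy_share <- df %>%
  group_by(site, purchase) %>%
  summarise(prop = 100 * mean(happycustomer == "y"), .groups = "drop")

ggplot(happy_share, aes(site, purchase)) +
  geom_tile(aes(fill = prop), colour = "white") +
  geom_text(aes(label = round(prop, 1))) +
  scale_fill_gradient(low = "blue", high = "red")
```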

R dplyr group, ungroup, top_n and ggplot

I have an object with several values including cities, states, year and number of murders. I use dplyr to group it by city and calculate the total murders over all years for the top 10 cities like this:
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_histogram(stat = "identity") +
facet_wrap(~city)
I would like to plot this for only the top ten cities, but 'x = year' is not found because it has been grouped by city. Can anyone explain how I can accomplish this?
EDIT: this is the original source data: https://interactive.guim.co.uk/2017/feb/09/gva-data/UCR-1985-2015.csv
And here is my code:
Deaths <- read.csv("UCR-1985-2015.csv", stringsAsFactors = F)
MurderRate <- Deaths[, -c(5:35)]
MurderNb <- Deaths[, -c(36:66)]
colnames(MurderNb) <- gsub("X", "", colnames(MurderNb))
colnames(MurderNb) <- gsub("_raw_murder_num", "", colnames(MurderNb))
MurderNb_reshaped <- melt(MurderNb, id = c("city", "Agency", "state", "state_short"))
colnames(MurderNb_reshaped) <- c("city", "Agency", "state", "state_short", "Year", "Murders")
MurderNb_reshaped2 <- MurderNb_reshaped
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_bar(stat = "identity") +
facet_wrap(~city)
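OK, there were a couple of minor issues. After summarise(), only city and total remain, so Year and Murders are no longer available to ggplot. This should do the trick: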
#this gives you the top cities
topCities <- MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total)
#you then need to filter your original data to be only the data for the top cities
MurderNb_reshaped2 <- filter(MurderNb_reshaped2, city %in% topCities$city)
#fill is set outside aes() so the bars are actually drawn in red
ggplot(data = MurderNb_reshaped2, aes(x = Year, y = Murders)) +
geom_bar(stat = "identity", fill = "red") +
facet_wrap(~city)
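The same two-step idea (rank the groups, then go back to the original rows) can be sketched on invented data. Everything in toy_murders is hypothetical; slice_max() is the newer dplyr counterpart of top_n(), and semi_join() is used here as an alternative to filter(city %in% ...).

```r
library(dplyr)

# Hypothetical toy data; city names and counts are invented
toy_murders <- data.frame(
  city    = rep(c("A", "B", "C"), each = 2),
  Year    = rep(c(2014, 2015), times = 3),
  Murders = c(10, 12, 3, 4, 7, 9)
)

# Step 1: rank cities by their total over all years and keep the top 2
top_cities <- toy_murders %>%
  group_by(city) %>%
  summarise(total = sum(Murders)) %>%
  slice_max(total, n = 2)

# Step 2: keep the original yearly rows for those cities only
toy_top <- toy_murders %>%
  semi_join(top_cities, by = "city")

toy_top
```

toy_top still has the Year and Murders columns, so it can be passed to ggplot() and faceted by city.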
