displaying data as a line in charts - r

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$countryName = as.character(df$countryName)
I processed the dataset.
Can the patient and population figures for each continent be shown as separate lines on the same chart?
The desired output looks like this:
date region confirmed
2020/01/03 europa 850
The values in this example output are made up; they are not real data.

Here's an approach with dplyr, tidyr and ggplot:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
group_by(region, day) %>%
dplyr::summarize(confirmed = sum(confirmed),
recovered = sum(recovered),
death = sum(death)) %>%
pivot_longer(cols = c("confirmed","recovered","death"), names_to = "condition") %>%
ggplot(aes(x= as.Date(day), y = value, group = region, color = region)) +
geom_line() +
facet_grid(rows = vars(condition), scales = "free_y") +
labs(x = "Date", y = "Number of Individuals")
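If only the confirmed counts are needed, as in the sample output from the question, the reshaping step can be skipped. Here is a minimal sketch, assuming a data frame with `day`, `region` and `confirmed` columns like the one above. The `toy` data frame and its numbers are invented for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the downloaded data; the values are made up.
toy <- data.frame(
  day         = rep(c("2020/01/03", "2020/01/04"), each = 4),
  countryName = rep(c("A", "B", "C", "D"), times = 2),
  region      = rep(c("Europe", "Europe", "Asia", "Asia"), times = 2),
  confirmed   = c(500, 350, 900, 100, 620, 400, 950, 130)
)

# One row per region and day, summing over the countries in each region.
by_region <- toy %>%
  group_by(region, day) %>%
  summarise(confirmed = sum(confirmed), .groups = "drop")

# One line per region on a single panel.
p <- ggplot(by_region, aes(x = as.Date(day), y = confirmed, color = region)) +
  geom_line() +
  labs(x = "Date", y = "Confirmed cases")
p
```

Because there is only one measure, no `pivot_longer()` or `facet_grid()` is required; the `color = region` aesthetic is enough to get one line per continent.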

Related

Mixing stacked and unstacked bars in single graph

I'm trying to mix stacked and unstacked bars in one graph.
Here is sample code:
Indicators <- c("Z","X","C","NO C","NA C","LOW C","HIGH C","D","E","F","G","H","LOW H","MED H","HIGH H",
"I","LOW I","MED I","HIGH I","J","LOW J","MED J","HIGH J")
Class_1 <- c(0.44,0.01,"",0.6,0.13,0.11,0.16,0.08,0.01,0.14,0.21,"",0.55,
0.23,0.22,"",0.4,0.31,0.29,"",0.41,0.23,0.36)
data <- data.frame(Indicators,Class_1)
How can I make a graph with both unstacked and stacked bars? I really appreciate all the help there is! Thanks!
Basically you want a stacked barchart even if for some categories there is nothing to stack. To this end you have to do some data wrangling and cleaning like getting rid of the empty value rows and splitting the Indicators column in two:
library(dplyr)
library(ggplot2)
library(tidyr)
data1 <- data |>
filter(Class_1 != "") |>
mutate(Class_1 = as.numeric(Class_1)) |>
separate(Indicators, into = c("level", "which"), fill = "left") |>
replace_na(list(level = "Total"))
ggplot(data1, aes(which, Class_1, fill = level)) +
geom_col()
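To make the splitting step easier to follow, here is a small sketch of what `separate()` with `fill = "left"` does to a few of the labels. The `labels` data frame is a hypothetical subset of the Indicators column.

```r
library(tidyr)

# A few of the labels from the question: some have a level prefix, some do not.
labels <- data.frame(Indicators = c("Z", "NO C", "LOW H", "H"))

# Split on the space. Labels with a single piece get NA in the left-hand
# column ("level") because of fill = "left".
parts <- separate(labels, Indicators, into = c("level", "which"), fill = "left")

# Rows without a level are the plain totals.
parts <- replace_na(parts, list(level = "Total"))
parts
```

After this step `which` holds the bar position (Z, C, H, ...) and `level` holds the stacking group, which is why `aes(which, Class_1, fill = level)` gives single bars for some categories and stacked bars for others.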
Update II:
library(tidyverse)
my_pattern1 <- c("No|Non|Low-risk|High-risk|Low|Medium|High")
data %>%
filter(Class_1 != "") %>%
mutate(x = case_when(str_detect(Indicators, 'CSA|childhood sexual abuse') ~ "Childhood sexual abuse",
str_detect(Indicators, 'discrimination') ~ "discrimination"),
y = str_extract(Indicators, my_pattern1)) %>%
mutate(z = str_remove_all(Indicators, my_pattern1)) %>%
mutate(y= ifelse(is.na(y), "Total", y)) %>%
ggplot(aes(x = z, y = as.numeric(Class_1), fill= y))+
geom_bar(stat = "identity", position = "stack")+
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1))
Update: (@camille is right. I now adapted my answer and removed the first one)
library(tidyverse)
data %>%
separate(Indicators, c("Indicators_1", "Indicators_2")) %>%
mutate(Indicators_3 = ifelse(is.na(Indicators_2), "Total", Indicators_1),
Indicators_2 = coalesce(Indicators_2, Indicators_1)) %>%
filter(Class_1 != "") %>%
ggplot(aes(x = Indicators_2, y = as.numeric(Class_1), fill= Indicators_3))+
geom_bar(stat = "identity", position = "stack")

Remove some of the X axis labels in ggplot bar chart

I have the following code for a stacked bar graph for the period 1970-2020, which is the X-axis label.
The graph is generated from a dataframe with 3 columns and 51 rows representing various years.
Year Active New
1970 1 1
......
2020 268 60
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
As my X-axis is not readable, I would like to remove most of the years from the X-axis and keep only some, e.g. 1970, 1980, 1990, 2000, 2020, at their corresponding positions. I'm not sure whether scale_x_discrete can do this.
The obvious answer in this specific case is to convert Year to a numeric variable, which will make the breaks pretty by default.
This sample data allows us to run your code and reproduce your issue:
set.seed(1)
df <- data.frame(Year = factor(1970:2020),
Active = cumsum(rnorm(51, 4, 2)),
New = cumsum(rnorm(51, 1, 1)))
Using your exact plotting code produces a similar plot with unreadable axis labels:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
But if we simply convert Year to numeric values, we get the same plot with pretty breaks:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = as.numeric(as.character(Year)), y = Count,
fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
xlab("Year") +
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
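The `as.character()` step in the conversion matters. A short base R sketch of the difference, using a made-up three-level factor:

```r
# A factor of years, as in the sample data (only three levels here).
year <- factor(c(1970, 1980, 2020))

# Converting the factor directly returns the internal level codes.
codes <- as.numeric(year)

# Going through character first returns the years themselves.
years <- as.numeric(as.character(year))

codes
years
```

Calling `as.numeric()` directly on the factor would plot the bars at positions 1, 2, 3, ... instead of at the actual years.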
If for whatever reason it needs to be a factor, you can leave specific years blank using a labeling function.
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New")) +
scale_x_discrete(labels = function(x) ifelse(as.numeric(x) %% 10, "", x)) +
theme(axis.ticks.length.x = unit(0, "mm"))
Created on 2022-08-19 with reprex v2.0.2
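As for whether scale_x_discrete can do this on its own: its `breaks` argument should also work, since it keeps ticks and labels only at the listed levels. A minimal sketch, assuming Year stays a factor; the `toy` data frame and its `Count` values are made up for illustration.

```r
library(ggplot2)

# Simplified stand-in data: Year stored as a factor, Count is made up.
set.seed(1)
toy <- data.frame(Year = factor(1970:2020), Count = cumsum(runif(51)))

# Levels to keep on the axis; discrete breaks are matched as character strings.
decades <- as.character(seq(1970, 2020, by = 10))

p <- ggplot(toy, aes(x = Year, y = Count)) +
  geom_col() +
  scale_x_discrete(breaks = decades)
p
```

Unlike the labeling function, this drops the tick marks for the other years as well, so no `theme()` tweak is needed to hide them.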

Sorting Y-axis of barplot based on the decreasing value of last facet grid in ggplot2

Question:
I am trying to sort the Y-axis of the barplot by the decreasing value of the last facet group, "Step4", while keeping a common Y-axis label across facets. There are suggestions for ordering each facet group within itself, but how can this be done with a common Y-axis and the values of a single facet group? I have attached sample data and the code for the initial plot to illustrate the question.
Thanks in advance.
Data:
Download the sample data here
Code:
library(ggplot2)
library(reshape2)
library(ggpubr)  # provides theme_pubr()
# reading data
data <- read.csv(file = "./sample_data.csv", stringsAsFactors = TRUE)
# reshaping data into long format using reshape2::melt
data.melt <- melt(data)
#plotting the data in multi-panel barplot
ggplot(data.melt, aes(x= value, y=reorder(variable, value))) +
geom_col(aes(fill = Days), width = 0.7) +
facet_grid(.~step, scales = "free")+
theme_pubr() +
labs(x = "Number of Days", y = "X")
Graph: Barplot Graph for the sample data
Summarise the values for last 'step' and extract the levels from the data.
library(dplyr)
library(ggplot2)
library(ggpubr)  # provides theme_pubr()
lvls <- data.melt %>%
arrange(step) %>%
filter(step == last(step)) %>%
#Or
#filter(step == 'Step4') %>%
group_by(variable) %>%
summarise(sum = sum(value)) %>%
arrange(sum) %>%
pull(variable)
data.melt$variable <- factor(data.melt$variable, lvls)
ggplot(data.melt, aes(x= value, y= variable)) +
geom_col(aes(fill = days), width = 0.7) +
facet_grid(.~step, scales = "free")+
theme_pubr() +
labs(x = "Number of Days", y = "X")
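The reason this works is that ggplot2 draws a discrete axis in factor-level order, so levels computed from the last step fix the order in every facet. A small sketch of the level computation with made-up values; the `toy` data frame is hypothetical and only mimics the shape of `data.melt`.

```r
library(dplyr)

# Hypothetical long-format data: three variables measured in two steps.
toy <- data.frame(
  variable = rep(c("A", "B", "C"), times = 2),
  step     = rep(c("Step1", "Step2"), each = 3),
  value    = c(5, 1, 3, 2, 9, 4)
)

# Totals in the last step only, sorted from smallest to largest.
lvls <- toy %>%
  filter(step == "Step2") %>%
  group_by(variable) %>%
  summarise(total = sum(value)) %>%
  arrange(total) %>%
  pull(variable)

# The first level is drawn at the bottom of the y-axis, so the largest
# Step2 value ends up at the top.
toy$variable <- factor(toy$variable, levels = lvls)
levels(toy$variable)
```

The Step1 values play no part in the ordering, even though they share the same axis.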

Gathering the Averages and Combining multiple Line Graphs

I am new to R and I would love some assistance on this. I am using this dataset: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv
I am trying to first find the Average of the following Cheeses: Cheddar, American, Mozzarella, Italian, Swiss, Muenster, and Blue. Then I would like to place them into a line graph but show them all at once. I would like to show the average consumption of these cheeses.
The following is my code and what I have so far. I am new at this, so it might look horrible to some.
line_3 <- clean_cheese %>%
select(c(Year, Cheddar, Mozzarella, `American Other`, `Italian other`, Swiss, Muenster, Blue)) %>%
group_by(Year) %>%
summarise(avg_cheddar_cheese = mean(Cheddar), avg_mozz_cheese = mean(Mozzarella), avg_american_other = mean(`American Other`), avg_italin_other = mean(`Italian other`), avg_swiss_cheese = mean(Swiss), avg_muenster = mean(Muenster), avg_blue = mean(Blue)) %>%
pivot_longer(-c(Year)) +
ggplot(aes(x = Year, y = value, color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y')
ggplotly(line_3)
You can try:
library(tidyverse)
clean_cheese <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv')
line_3 <- clean_cheese %>%
group_by(Year) %>%
summarise(across(Cheddar:Blue, mean)) %>%
pivot_longer(cols = -Year) %>%
ggplot(aes(x = Year, y = value, color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y')
line_3
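If the goal is to see every cheese in one panel rather than one facet each, dropping `facet_wrap()` should be enough. A minimal sketch on a hypothetical miniature of the data; the `toy_cheese` values are made up, and the column names `cheese` and `consumption` are my own choice.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical miniature of the cheese data; the numbers are made up.
toy_cheese <- data.frame(
  Year       = c(1970, 1971, 1972),
  Cheddar    = c(5.8, 5.9, 6.0),
  Mozzarella = c(1.2, 1.4, 1.5),
  Swiss      = c(0.9, 0.9, 1.0)
)

# Long format: one row per year and cheese.
long <- toy_cheese %>%
  pivot_longer(cols = -Year, names_to = "cheese", values_to = "consumption")

# No facet_wrap(): all cheeses share one panel and one y-axis.
p <- ggplot(long, aes(x = Year, y = consumption, color = cheese)) +
  geom_line()
p
```

The trade-off is the shared y-axis: cheeses with low consumption appear flatter than in the faceted version with `scales = 'free_y'`.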

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies in the data, so that it counts across each row (not down a column) for each index (s1, s2, s3, s4) and then for each group (g1, g2), with the taxa stacked. I've only been able to figure out how to graph one taxa, not all three stacked on top of each other.
Here are some examples of what I'm trying to make:
These were made in Google Sheets, so they don't look like ggplot output, but it would be easier to make them in R with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
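The same row-wise proportions can be computed with `pivot_longer()`, the newer tidyr replacement for `gather()`. This is a sketch of that swap, not a change to the approach above; it uses the `dataf` from the question, and `props` is just a name I picked.

```r
library(dplyr)
library(tidyr)

dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

# One row per index and taxa, then each count divided by its row total.
props <- dataf %>%
  pivot_longer(taxa1:taxa3, names_to = "taxa", values_to = "value") %>%
  group_by(index) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()
props
```

Within each index the proportions add up to 1, which is what makes the stacked bars the same height in the first plot.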
## You need to reshape the data before plotting.
library(reshape2)
library(ggplot2)
dfm <- melt(dataf, id.vars = c("index", "group"),
measure.vars = c("taxa1", "taxa2", "taxa3"),
variable.name = "variable", value.name = "values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill = variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
# theme_gray() has to come before theme(), otherwise it resets the axis text settings
theme_gray() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))
