Graph plotting first, second, and third values of variables - R

For example I have this dataset:
c1 c2
A 1
A 3
A 10
B 5
B 4
C 3
C 4
C 6
A 5
C 7
Is there a short way to plot, in one graph, the first third of the values of A, B, and C, the second third of the values of A, B, and C, and the last third of the values of A, B, and C? Each variable would get 3 lines, so there would be 9 lines in total.

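You could use group_split and lapply:
library(dplyr)
library(ggplot2)
df <- data.frame(c1 = rep(LETTERS[1:3], 3), c2 = sample(1:10, size = 9, replace = TRUE))
df %>%
group_by(c1) %>%
mutate(num = row_number()) %>%
# ungroup first: group_split() ignores its arguments on a grouped data frame
ungroup() %>%
group_split(num) -> plot_list
# one plot per position (first, second, third value of each letter)
lapply(plot_list, function(x) {
ggplot(x, aes(x = c1, y = c2, group = 1)) + geom_line()
})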
Or you can use facets:
df %>%
group_by(c1) %>%
mutate(num = 1:n()) %>%
ggplot() +
facet_grid(scales = "free", cols = vars(num)) +
geom_line(aes(x = c1, y = c2, group = num))
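If all lines should share a single panel, one possible sketch is to compute the same within-group position index and map it to group and colour instead of splitting or faceting. This uses the question's data and assumes the answers' reading of the question (one line per position: first values, second values, and so on). The names indexed and p are illustrative only.

```r
library(dplyr)
library(ggplot2)

# Data from the question
df <- data.frame(
  c1 = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "C"),
  c2 = c(1, 3, 10, 5, 4, 3, 4, 6, 5, 7)
)

# Position of each row within its c1 group (1 = first value of that letter, ...)
indexed <- df %>%
  group_by(c1) %>%
  mutate(num = row_number()) %>%
  ungroup()

# One line per position, all drawn in the same panel
p <- ggplot(indexed, aes(x = c1, y = c2, group = num, colour = factor(num))) +
  geom_line() +
  geom_point()
p
```

Positions that occur for only some letters (B has just two values here) produce shorter lines, so geom_point() is added to keep isolated values visible.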

Related

Can I make a bar plot, where each bar represents a column in a data frame?

I would like to make a bar plot, where each bar is represented by one of the three columns in this data frame. The 'size' of each bar depends on the sum created by adorn_totals.
Reproducible example:
library(janitor)
test_df <- data.frame(
a = c(1:5),
b = c(1:5),
c = c(1:5)
) %>%
adorn_totals(where = 'row', tabyl = c(a, b, c))
I tried a solution that has previously been posted, but that didn't work:
Link to the post: Bar plot for each column in a data frame in R
library(janitor)
library(ggplot2)
df <- data.frame(
a = c(1:5),
b = c(1:5),
c = c(1:5)
) %>%
adorn_totals(where = 'row', tabyl = c(a, b, c))
lapply(names(df), function(col) {
ggplot(df, aes(.data[[col]], ..count..)) +
geom_bar(aes(fill = .data[[col]]), position = "dodge")
}) -> list_plots
This is one way:
library(janitor)
library(ggplot2)
test_df <- data.frame(
a = c(1:5),
b = c(1:5),
c = c(1:5)
) %>%
adorn_totals(where = 'row', tabyl = c(a, b, c))
tail(test_df,1) %>% stack() %>%
ggplot(aes(ind, values)) + geom_col()
Created on 2022-11-07 with reprex v2.0.2
Of course, you don't need to total the data frame before plotting it: geom_col() stacks the values within each x category, so each bar's height is already the column sum. Here is another example with an explanation of stack, some color, and no totals.
library(ggplot2)
test_df <- data.frame(
a = c(1:5),
b = c(1:5),
c = c(1:5))
test_df |> stack()
#> values ind
#> 1 1 a
#> 2 2 a
#> 3 3 a
#> 4 4 a
#> 5 5 a
#> 6 1 b
#> 7 2 b
#> 8 3 b
#> 9 4 b
#> 10 5 b
#> 11 1 c
#> 12 2 c
#> 13 3 c
#> 14 4 c
#> 15 5 c
test_df |> stack() |>
ggplot(aes(ind, values, fill=ind)) + geom_col()
If you want to use ggplot, it is best to slice the totals row off the bottom, pivot it into long format, and plot the result:
library(janitor)
library(tidyverse)
data.frame(
a = c(1:5),
b = c(1:5),
c = c(1:5)
) %>%
adorn_totals(where = 'row', tabyl = c(a, b, c)) %>%
slice_tail(n = 1) %>%
pivot_longer(everything()) %>%
ggplot(aes(name, value, fill = name)) +
geom_col(color = "gray") +
scale_fill_brewer() +
theme_minimal(base_size = 16)
Two pivot_longer alternatives without janitor::adorn_totals():
library(tidyverse)
# geom_bar version: the weight aesthetic makes stat_count sum the values
# geom_bar only takes one position aesthetic (x OR y)
data.frame(a = c(1:5), b = c(1:5), c = c(1:5)) %>%
pivot_longer(everything()) %>%
ggplot(aes(name, weight=value))+
geom_bar()
#geom_col version
#Lots of flexibility in summarise:
data.frame(a = c(1:5), b = c(1:5), c = c(1:5)) %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarise(total=sum(value)) %>%
ggplot(aes(name, total))+
geom_col()
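To confirm that the weight approach really yields the column sums, a small check along these lines may help. It is only a sketch: the names long, totals, and heights are illustrative, and it uses base stack() rather than pivot_longer to stay short.

```r
library(ggplot2)

test_df <- data.frame(a = 1:5, b = 1:5, c = 1:5)
long <- stack(test_df)   # long format with columns: values, ind

# Totals computed by hand, one per original column
totals <- tapply(long$values, long$ind, sum)

# Bar heights that stat_count computes when weight = values
p <- ggplot(long, aes(ind, weight = values)) + geom_bar()
heights <- layer_data(p)$count

totals
heights
```

layer_data() returns the data frame that ggplot2 computes for a layer, which makes it easy to inspect what a stat has done without reading values off the plot.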

Summarise and create a stacked bar chart in R

For a population of individuals I have a regular time series of what category they fall into. I would like to summarise the composition of this population over time, by the categories, as a stacked bar chart in R. For example:
set.seed(1)
id <- seq(1:25)
t1 <- sample(LETTERS[1:5], 25, replace=TRUE)
t2 <- sample(LETTERS[1:5], 25, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.6))
t3 <- sample(LETTERS[1:5], 25, replace=TRUE, prob=c(0.2,0.1,0.2,0.1,0.4))
df <- data.frame(cbind(id, t1, t2, t3))
with frequencies:
> table(df$t1)
A B C D E
7 6 3 2 7
> table(df$t2)
B C D E
3 4 5 13
> table(df$t3)
A B C D E
4 2 5 4 10
So, at time period 1, 7 of the 25 are category A, 6 category B, whilst at time period 2, none are category A, 3 category B, etc. The chart will look like this (from EXCEL):
Can this be made in ggplot? Thanks.
Here is an option with data.table
library(dplyr)
library(data.table)
library(ggplot2)
melt(setDT(df), id.var = "id")[, .N, .(variable, value)][, perc := N / sum(N), variable] %>%
ggplot(aes(x = variable, y = perc, fill = value)) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = scales::percent)
This can be done by first reshaping into 'long' format with pivot_longer and then getting the frequency counts with count. In ggplot, map x to 'name', y to the summarised 'n', and fill to the 'value' column created by pivot_longer. The id column is excluded from the reshape so that it does not become a bar of its own.
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(cols = -id) %>%
count(name, value) %>%
ggplot(aes(x = name, y = n, fill = value)) +
geom_col()
If we need proportions instead of counts:
df %>%
pivot_longer(cols = -id) %>%
count(name, value) %>%
group_by(name) %>%
mutate(prop = n / sum(n)) %>%
ggplot(aes(x = name, y = prop, fill = value)) +
geom_col() +
scale_y_continuous(labels = scales::percent)
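As a possible shortcut, geom_bar(position = "fill") can do the counting and rescaling in one step. The sketch below rebuilds the question's data with data.frame() directly (an assumption made here so that id stays numeric) and also computes the proportions explicitly for comparison. The names long, p, and props are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
df <- data.frame(
  id = 1:25,
  t1 = sample(LETTERS[1:5], 25, replace = TRUE),
  t2 = sample(LETTERS[1:5], 25, replace = TRUE, prob = c(0.1, 0.1, 0.1, 0.1, 0.6)),
  t3 = sample(LETTERS[1:5], 25, replace = TRUE, prob = c(0.2, 0.1, 0.2, 0.1, 0.4))
)

# One row per individual and time period
long <- df %>% pivot_longer(cols = -id)

# geom_bar() counts the rows; position = "fill" rescales each bar to height 1
p <- ggplot(long, aes(x = name, fill = value)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
p

# The same proportions computed explicitly, for checking
props <- long %>%
  count(name, value) %>%
  group_by(name) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
```

The explicit version is still useful when the proportions are needed as numbers, for example to label the segments.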

Mean across each element of a tibble list-column by group with purrr and dplyr

I'm trying to get used to using tidyverse. I don't know if my data is well suited for using functions like map(). I like the organization of list-columns so I am wondering how to use a combination of group_by(), summarize(), map(), and other functions to get this to work. I know how to use these functions with vector-columns but do not know how to approach this in the case of list-columns.
Sample data:
library(tidyverse)
set.seed(3949)
myList <- replicate(12, sample(1:20, size = 10), simplify = FALSE)
tibble(
group = rep(c("A", "B"), each = 6),
data = myList
)
Each vector in the list-column has ten elements which are values for a given trial. What I would like to do is group the tibble by group and then find the "column" mean and se of the expanded lists. In other words, it's like I'm treating the list columns as a matrix with each row of the tibble bound together. The output will have columns for the group and trials as well so it is in the correct format for ggplot2.
        mean        se group trial
1   6.000000 1.6329932     A     1
2  12.666667 2.3333333     A     2
3  12.333333 2.8007935     A     3
4  13.833333 1.8150605     A     4
5   8.166667 3.1028661     A     5
6  11.500000 2.9410882     A     6
7  13.666667 2.3758040     A     7
8   6.833333 1.7779514     A     8
9  11.833333 2.3009660     A     9
10  8.666667 1.7061979     A    10
11  8.333333 1.6865481     B     1
12 12.166667 2.6002137     B     2
13 10.000000 2.7080128     B     3
14 11.833333 3.1242777     B     4
15  4.666667 1.2823589     B     5
16 12.500000 3.0413813     B     6
17  6.000000 1.5055453     B     7
18  8.166667 1.6616591     B     8
19 11.000000 2.6708301     B     9
20 13.166667 0.9457507     B    10
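Here is how I would normally do something like this:
library(tidyverse)
# standard error of the mean (helper assumed here; the post uses se() without defining it)
se <- function(x) sd(x) / sqrt(length(x))
set.seed(3949)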
data.frame(group = rep(c("A", "B"), each = 6)) %>%
cbind(replicate(12, sample(1:20, size = 10)) %>% t()) %>%
split(.$group) %>%
lapply(function(x) data.frame(mean = colMeans(x[ ,2:11]),
se = apply(x[ ,2:11], 2, se))) %>%
do.call(rbind,.) %>%
mutate(group = substr(row.names(.), 1,1),
trial = rep(1:10, 2)) %>%
ggplot(aes(x = trial, y = mean)) +
geom_point() +
geom_line() +
facet_grid(~ group) +
scale_x_continuous(limits = c(1,10), breaks = seq(1, 10, 1)) +
geom_errorbar(aes(ymin = mean-se, ymax = mean+se), color = "black") +
theme_bw()
Is there a cleaner way to do this with the tidyverse functions?
Another way is to use nest() and map():
library(tidyverse)
library(plotrix) # for std.error()
# Your second sample dataset
set.seed(3949)
df <- data.frame(group = rep(c("A", "B"), each = 6)) %>%
cbind(replicate(12, sample(1:20, size = 10)) %>% t())
df %>%
nest(data = -group) %>%
# each nested data frame has one column per trial, so summarise column-wise
mutate(mean = map(data, ~ colMeans(.)),
se = map(data, ~ plotrix::std.error(.)),
trial = map(data, ~ seq_len(ncol(.)))) %>%
unnest(c(mean, se, trial)) %>%
ggplot(aes(x = trial, y = mean)) +
geom_point() +
geom_line() +
facet_grid(~ group) +
geom_errorbar(aes(ymin = mean-se, ymax = mean+se), color = "black") +
theme_bw()
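Starting from the list-column tibble in the question itself, a sketch that avoids the wide intermediate could unnest the vectors together with a trial index and then summarise by group and trial. The names dat and summary_df are illustrative, and the standard error is taken to be sd / sqrt(n), which is an assumption about what the question's se means.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(3949)
myList <- replicate(12, sample(1:20, size = 10), simplify = FALSE)
dat <- tibble(group = rep(c("A", "B"), each = 6), data = myList)

summary_df <- dat %>%
  mutate(trial = map(data, seq_along)) %>%   # trial index 1..10 for each vector
  unnest(c(data, trial)) %>%                 # one row per group/replicate/trial
  group_by(group, trial) %>%
  summarise(mean = mean(data),
            se = sd(data) / sqrt(n()),       # standard error of the mean
            .groups = "drop")

summary_df
```

The result has one row per group and trial, so it can be passed to the same ggplot call used above (x = trial, y = mean, with geom_errorbar for mean plus or minus se).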

facet_grid with multiple line colours

I have the following data frame resulting from simulations of ODEs with different parameter sets, e.g.
df <- data.frame(t = rep(seq(0,4), 4),
x1 = c(1.2*seq(1,5), 1.3*seq(1,5), 1.4*seq(1,5), 1.5*seq(1,5)),
x2 = c(0.2*seq(1,5), 0.3*seq(1,5), 0.4*seq(1,5), 0.5*seq(1,5)),
a = rep(c(rep(1, 5), rep(2,5)), 2),
b = c(rep(1, 10), rep(2,10))
)
I now would like to have a facet_grid with x1 and x2 on top and a and b on the right where the values of a and b determine the line colour.
I tried
df.1 <- df %>%
gather(x, xval, -t, -a, -b) %>%
gather(p, pval, -t, -x, -xval) %>%
distinct()
df.1$pval <- as.factor(df.1$pval)
ggplot(df.1, aes(t, xval)) +
geom_line(aes(colour = pval)) +
facet_grid(p~x)
and
dm.1 <- melt(df[, c("t", "x1", "x2")], id = 't')
colnames(dm.1) <- c("t", "x", "xval")
dm.2 <- melt(df[, c("t", "a", "b")], id = 't')
colnames(dm.2) <- c("t", "p", "pval")
dm <- merge(dm.1, dm.2)
dm$pval <- as.factor(dm$pval)
ggplot(dm, aes(t, xval)) +
geom_line(aes(colour = pval)) +
facet_grid(p~x)
But both do not give the desired result. Any hint would be greatly appreciated.
Edit: The desired result would be to have two lines in each facet similar to my first solution but the correct ones, i.e. straight lines and not the zig-zag lines that result.
The problem that's causing the zigzag plots you're getting is that there are multiple repeats of the same combinations of p and x, but there isn't a way to demarcate one from the other. So you get the first plot below:
library(tidyverse)
df <- data.frame(t = rep(seq(0,4), 4),
x1 = c(1.2*seq(1,5), 1.3*seq(1,5), 1.4*seq(1,5), 1.5*seq(1,5)),
x2 = c(0.2*seq(1,5), 0.3*seq(1,5), 0.4*seq(1,5), 0.5*seq(1,5)),
a = rep(c(rep(1, 5), rep(2,5)), 2),
b = c(rep(1, 10), rep(2,10))
)
df_long <- df %>%
gather(key = x, value = xval, x1, x2) %>%
gather(key = p, value = pval, a, b) %>%
mutate(pval = as.factor(pval))
df_long %>%
ggplot(aes(x = t, y = xval)) +
geom_line(aes(color = pval)) +
facet_grid(p ~ x)
You can see the problem when I filter for a specific pair of x and p values: the same values of t repeat, and nothing tells ggplot which rows belong to which line, so it connects them all into one zigzag.
df_long %>%
filter(x == "x1", p == "a") %>%
head()
#> t x xval p pval
#> 1 0 x1 1.2 a 1
#> 2 1 x1 2.4 a 1
#> 3 2 x1 3.6 a 1
#> 4 3 x1 4.8 a 1
#> 5 4 x1 6.0 a 1
#> 6 0 x1 1.3 a 2
Instead, before gathering, you can make an ID for each combination of a and b, and use that as your grouping variable in aes. There are probably other ways to do this, but a simple one is just interaction(a, b), which will give IDs that look like 1.1, 1.2, 2.1, 2.2, etc. Then add group = id inside your aes to make separate lines.
df_long_id <- df %>%
mutate(id = interaction(a, b)) %>%
gather(key = x, value = xval, x1, x2) %>%
gather(key = p, value = pval, a, b) %>%
mutate(pval = as.factor(pval))
df_long_id %>%
ggplot(aes(x = t, y = xval, group = id)) +
geom_line(aes(color = pval)) +
facet_grid(p ~ x)
Created on 2018-05-09 by the reprex package (v0.2.0).
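Since gather() has been superseded in tidyr, the same reshaping can be sketched with pivot_longer(). The check at the end is an illustrative addition: it confirms that, once id is used, every value of t occurs only once per line within a facet, which is what removes the zigzag. The name dup_check is hypothetical.

```r
library(dplyr)
library(tidyr)

df <- data.frame(t = rep(seq(0, 4), 4),
                 x1 = c(1.2 * seq(1, 5), 1.3 * seq(1, 5), 1.4 * seq(1, 5), 1.5 * seq(1, 5)),
                 x2 = c(0.2 * seq(1, 5), 0.3 * seq(1, 5), 0.4 * seq(1, 5), 0.5 * seq(1, 5)),
                 a = rep(c(rep(1, 5), rep(2, 5)), 2),
                 b = c(rep(1, 10), rep(2, 10)))

df_long_id <- df %>%
  mutate(id = interaction(a, b)) %>%   # one id per parameter set
  pivot_longer(c(x1, x2), names_to = "x", values_to = "xval") %>%
  pivot_longer(c(a, b), names_to = "p", values_to = "pval") %>%
  mutate(pval = factor(pval))

# Within each facet (x, p) and each id, every t should occur exactly once
dup_check <- df_long_id %>% count(id, x, p, t)
max(dup_check$n)
```

The resulting df_long_id can be plotted with the same ggplot call as above, using group = id inside aes.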

How to show only the most frequent categories in a bar chart and group the rest into one category in R?

Sorry I am new in R!
I used table function to make a table using two different columns of a data frame like
C1 C2
A 2
B 1
A 2
C 1
A 1
C 3
D 2
C2 values are categories and I want to show the frequency of each letter in C1 for each C2 value (here 1, 2 and 3). I used the table function as below to build the frequency table.
t <- table(data$C1, data$C2)
Now I want to make a bar chart using ggplot where the x-axis shows the C2 values (1, 2, 3, ...) and the y-axis shows the frequency of each letter of column C1. I also want to show only a few letters (say 5) with the highest frequency in each bar, not all of them, and put the rest together in one part named, for example, "Other". For instance, if for value 1 the letters A, B, D, F, G have the highest frequencies, show them in the bar and put the frequencies of all other letters in one part labelled "Other"; the next bar does the same for value 2, and so on.
This may be a stretch, but it gets the job done.
library(dplyr)
library(ggplot2)
set.seed(1)
df <- data.frame(C1 = sample(LETTERS, 200, replace = TRUE),
C2 = sample(1:5, 200, replace = TRUE))
df$C1 <- as.character(df$C1)
df1 <- df %>% group_by(C2, C1) %>% summarise(Count = n()) %>% arrange(desc(Count))
df2 <- df1 %>% group_by(C2) %>% mutate(Index = 1:n())
df2$C1[df2$Index > 5] <- "Other"
df3 <- df2 %>% group_by(C2, C1) %>%
summarise(updatedCount = sum(Count), .groups = "drop")
ggplot(df3, aes(x = C2, y = updatedCount, fill = C1, label = C1)) +
geom_bar(stat = "identity") +
theme(legend.position = "none") +
# position_stack(vjust = 0.5) centres each label in its own bar segment
geom_text(position = position_stack(vjust = 0.5))

Resources