Creating single row pie chart for budget in R with ggplot - r

To start off, this is my dput:
structure(list(Income = 18000, Rent = 7300, Wifi = 477, Gas = 900,
MTR_Bus = 600, Food = 3000, Total_Expenses = 12277, Remaining_Income = 5723), class = "data.frame", row.names = c(NA, -1L))
That comes up with a data frame like this:
I'm trying to create a simple pie chart in R using this budget, though I just need the expenses and income (in other words, I don't need the variables "Total Expenses" or "Remaining Income").
My issue is that the best I can come up with is something like this:
bar <- ggplot(data = Budget) + geom_bar(mapping = aes(x = Total_Expenses, fill = row))+coord_polar()
I guess my question is two-fold: 1) is this the correct structure for my code and 2) what should I be using for x or fill? I didn't really have much of a good answer from my book I was using.
Thanks for any help you can give!

Reshape your data using e.g. tidyr::pivot_longer():
library(tidyr)
library(dplyr)
library(ggplot2)
budget_pie <- Budget %>%
pivot_longer(everything()) %>%
filter(!grepl("^(Total|Remaining)", name))
ggplot(data = budget_pie, aes(x = "", y = value, fill = name)) +
geom_col() +
coord_polar("y", start = 0)
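If you also want each slice labelled with its share, here is a minimal sketch building on the code above. The pct column and the theme_void() call are my additions for illustration, not part of the original answer, and Budget is rebuilt from the dput in the question.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Data frame from the question's dput
Budget <- structure(list(Income = 18000, Rent = 7300, Wifi = 477, Gas = 900,
                         MTR_Bus = 600, Food = 3000, Total_Expenses = 12277,
                         Remaining_Income = 5723),
                    class = "data.frame", row.names = c(NA, -1L))

budget_pie <- Budget %>%
  pivot_longer(everything()) %>%                     # one row per budget item
  filter(!grepl("^(Total|Remaining)", name)) %>%     # drop the derived columns
  mutate(pct = round(100 * value / sum(value), 1))   # share of each slice (assumed helper column)

p <- ggplot(budget_pie, aes(x = "", y = value, fill = name)) +
  geom_col(width = 1) +
  geom_text(aes(label = paste0(pct, "%")),
            position = position_stack(vjust = 0.5)) + # centre labels within each slice
  coord_polar("y", start = 0) +
  theme_void()

print(p)
```

Note that Income is kept as a slice here, exactly as in the code above; filter it out as well if you only want the expenses.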

Related

Which R package can I use to plot kernel density plots per year?

I measured the number of occurrences of exclamation marks in the abstract and title of papers per year. Now, I want to show the distribution of this number for each individual year using a kernel density estimation. I want to plot my data in a way that I found in another publication (Plavén-Sigray et al. eLife 2017, https://elifesciences.org/articles/2772):
Do you have any idea how I could achieve this using R? I would be glad if you could provide a package.
I added some toy data along with what I tested so far.
library(ggplot2)
set.seed(176)
df = data.frame(
id = seq(1:2000),
amount = sample(0:3, 2000, replace = TRUE),
year = sample(1990:2010, 2000, replace = T)
)
ggplot(df, aes(x = year, y = amount) ) +
geom_density_2d() +
geom_density_2d_filled() +
geom_density_2d(colour = "black")
I get the following result which is not really what I want:
Any help would be appreciated. Thank you in advance!
You can get a plot like this in ggplot directly without additional packages. Here's a full reprex:
set.seed(1)
df <- data.frame(year = rep(1920:2000, each = 100),
amount = rnorm(8100, rep(120:200, each = 100), 20))
library(tidyverse)
# reframe() (dplyr >= 1.1.0) allows several rows per group; on older versions use summarize()
df %>%
group_by(year) %>%
reframe(Amount = density(amount, from = min(df$amount),
to = max(df$amount))$x,
Density = density(amount, from = min(df$amount),
to = max(df$amount))$y) %>%
ggplot(aes(year, Amount, fill = Density)) +
geom_raster(interpolate = TRUE) +
scale_fill_viridis_c(option = "magma") +
theme_minimal(base_size = 20) +
coord_cartesian(expand = 0) +
theme(legend.position = "top",
legend.key.width = unit(3, "cm"),
legend.title = element_text(vjust = 1))
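The same idea can be sketched with base R for the density step, which calls density() only once per year. This is a rough alternative under my own assumptions: the names rng, by_year and dens are illustrative, and n = 512 is simply the default grid size of density() written out explicitly.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(year = rep(1920:2000, each = 100),
                 amount = rnorm(8100, rep(120:200, each = 100), 20))

# Common evaluation range so every year is estimated on the same grid
rng <- range(df$amount)
by_year <- split(df$amount, df$year)

# One kernel density estimate per year, stacked into a long data frame
dens <- do.call(rbind, lapply(names(by_year), function(yr) {
  d <- density(by_year[[yr]], from = rng[1], to = rng[2], n = 512)
  data.frame(year = as.integer(yr), Amount = d$x, Density = d$y)
}))

p <- ggplot(dens, aes(year, Amount, fill = Density)) +
  geom_raster(interpolate = TRUE) +
  scale_fill_viridis_c(option = "magma")

print(p)
```

Using a shared from/to range matters: geom_raster() expects a regular grid, so each year needs to be evaluated at the same Amount values.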

How can I change dates on an X axis into 'day 1', 'day 2' etc for a line graph plot?

I am trying to modify a line graph I have already made. On the x axis, it has the date on which a participant completed a task. However, I am trying to make it so the x axis simply shows each completed session of the task as day 1, day 2, etc. Is there a way to do this?
My code for the line graph is as follows:
ggplot(data = p07_points_scored, aes(x = day, y = total_score, group = 1)) +
geom_line() +
geom_point() +
theme(axis.text.x = element_text(angle = 60, vjust = 0.5)) +
labs(title=" P07s Total score on the training tool",
x = "Date of training completion",
y = "Total Score",
color = "lightblue") +
geom_smooth()
To add to this: I have 4 separate line graphs from individual participants showing their total scores within the task. Is there a way to combine the separate graphs into one?
Many thanks :)
Here is an example with fake data. The key point is to create a new column days with mutate() and map it to the x axis with fct_inorder(), so the labels keep their row order instead of being sorted alphabetically:
library(tidyverse)
library(lubridate)
# Create some fake data:
date <- dmy("6-8-2022"):dmy("5-9-2022")
y = rnorm(31, mean = 2300, sd = 100)
df <- tibble(date, y)
df %>%
mutate(days = paste0("day",row_number())) %>%
ggplot(aes(x = fct_inorder(days), y = y, group= 1)) +
geom_point()+
geom_line()
data:
df <- structure(list(date = 19210:19240, y = c(2379.71407792736, 2349.90296535465,
2388.14396999868, 2266.84629740315, 2261.95099255488, 2270.90461436351,
2438.19569234793, 2132.6468717962, 2379.46892613664, 2406.13636097426,
2176.9392984643, 2219.0521150482, 2221.22674399102, 2399.82972150781,
2396.76276645913, 2233.62763324748, 2468.98833991591, 2397.47855248058,
2486.96828322353, 2330.04116860874, 2280.66624489061, 2411.09933781266,
2281.06682518505, 2281.63162850277, 2235.66952459084, 2271.2152525563,
2481.86164459452, 2544.25592495568, 2411.90218614317, 2275.60378793237,
2297.98843827031)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-31L))
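For the second part of the question (combining several participants in one graph), one possible approach is to stack the data with a participant column and number the sessions within each participant. The sketch below uses made-up data; the scores data frame, the participant IDs other than P07, and the column names are assumptions, not taken from the question.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Hypothetical data: 4 participants with 10 sessions each, on different dates
scores <- data.frame(
  participant = rep(c("P01", "P02", "P03", "P07"), each = 10),
  date = as.Date("2022-08-06") + c(0:9, 2:11, 5:14, 1:10),
  total_score = round(rnorm(40, mean = 2300, sd = 100))
)

combined <- scores %>%
  arrange(participant, date) %>%
  group_by(participant) %>%
  mutate(day = row_number()) %>%   # session number within each participant
  ungroup()

p <- ggplot(combined, aes(x = day, y = total_score, colour = participant)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:10, labels = paste("day", 1:10))

print(p)
```

Keeping day numeric and relabelling the axis avoids the factor ordering issue altogether, and the colour aesthetic draws one line per participant.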

Creating a mosaic plot with percentages

Following is the sample dataset that I have:
df <- structure(list(Class = c("A", "B", "C", "D"),
`Attempted` = c(374, 820, 31, 108),
`Missed` = c(291, 311, 5, 15),
`Cancelled` = c(330, 206, 6, 5),
`Unknown` = c(950, 341, 6, 13)),
class = "data.frame", row.names = c(NA, -4L))
I want to create a mosaic plot with 'percentages' instead of absolute numbers. To be precise, I want to see what percentage of 'class A' people out of the total 'class A' population 'missed' their test? And, similarly for other class population.
I have not tried any code yet as I have absolutely no clue how to start. Can anyone please help me with this?
Using only one package (vcd), you can do the following. Note that I am labeling the cells with the proportions within each class (i.e. each row sums to 1):
library(vcd)
M = as.table(as.matrix(df[,-1]))
names(dimnames(M)) = c("Class","result")
labs <- round(prop.table(M,margin=1), 2)
mosaic(M, pop = FALSE)
labeling_cells(text = labs, margin = 0)(M)
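As a quick check on those labels, here is a small base R sketch (no vcd needed) that computes the same row proportions as percentages and confirms that each class sums to 1. The names props and pct are just illustrative.

```r
df <- data.frame(
  Class = c("A", "B", "C", "D"),
  Attempted = c(374, 820, 31, 108),
  Missed = c(291, 311, 5, 15),
  Cancelled = c(330, 206, 6, 5),
  Unknown = c(950, 341, 6, 13)
)

M <- as.matrix(df[, -1])
rownames(M) <- df$Class              # label rows by class explicitly

props <- prop.table(M, margin = 1)   # margin = 1: proportions within each row (class)
pct <- round(100 * props, 1)

rowSums(props)                       # every class should sum to 1
pct[, "Missed"]                      # percentage of each class that missed the test
```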
You can also just visualize it with a simple stacked bar plot:
library(RColorBrewer)
barplot(t(labs),col=brewer.pal(4,"Set2"))
legend("bottomright",legend = colnames(labs),inset=c(0,1.1), xpd=TRUE,
fill =brewer.pal(4,"Set2"),horiz=TRUE,cex=0.7)
If you use ggplot2 or other gg packages, you need to pivot your data to long format:
library(tidyr)
library(dplyr)
library(ggplot2)
df_long = df %>%
pivot_longer(-Class) %>%
group_by(Class) %>%
mutate(total = sum(value),
p = round(100*value/total,digits=1)) %>%
ungroup()
ggplot(df_long, aes(x = Class, y = p, fill = name)) +
geom_col() +
geom_text(aes(label = p), position = position_stack(vjust = 0.2))
If you want a mosaic-style plot in ggplot2, you can adapt this answer by z.lin. Note that I use the square root of the class totals as the bar width so that the smaller classes are more visible:
ggplot(df_long,
aes(x = Class, y = p, width = sqrt(total), fill = name)) +
geom_col(colour = "black") +
geom_text(aes(label = p), position = position_stack(vjust = 0.5)) +
facet_grid(~Class, scales = "free_x", space = "free_x") +
theme_void()

Heatmap in plotly with defined colors per category in r

I am trying to plot a heatmap with specified colors (by category) in plotly. I asked a similar question here: "Split" up by category in plotly.
However, I ran into a new problem while trying a similar thing with a heatmap. My code looks like:
# Test DataFrame
test_df <- data.frame(
"weekday" = c("Fr", "Sa", "Su"),
"time" = c("06:00:00", "12:00:00", "18:00:00"),
"channel" = c("NBC", "CBS Drama", "ABC"),
"colors" = c("#FCB711", "#162B48", "#AA8002"),
"views" = c(1200, 1000, 1250)
)
plot_ly(colors = unique(as.character(test_df$colors)), type = "heatmap") %>%
add_trace(test_df,
x = test_df$weekday,
y = test_df$time,
z = test_df$views,
type = "heatmap")
What I get is the following picture:
The problems I have here are:
1. The colors are not the colors which I told R to use
2. I do not want a continuous colorscale; rather, I want the tiles split up by channel category.
I know there is a workaround in ggplot, and I am working on it, but I want to have it in plotly.
Here is what it looks like in ggplot and what I want to have in plotly (I am aware of ggplotly, but that still isn't pure plotly):
Here is the code for the above picture:
channel_colors <- test_df %>% distinct(colors) %>% pull(colors)
names(channel_colors) <- test_df %>% distinct(channel) %>% pull(channel)
p <- ggplot(data = test_df,
aes(
x = weekday,
y = time,
fill = channel)) +
geom_tile(aes(alpha = views)) +
scale_alpha(range = c(0.5, 1)) +
theme_minimal() +
scale_fill_manual(values = channel_colors)
ggplotly(p)
I would appreciate any help.

Bull's-eye charts

A colleague of mine needs to plot 101 bull's-eye charts. This is not her idea. Rather than have her slave away in Excel or God knows what making these things, I offered to do them in R; mapping a bar plot to polar coordinates to make a bull's-eye is a breeze in ggplot2.
I'm running into a problem, however: the data is already aggregated, so Hadley's example here isn't working for me. I could expand the counts out into a factor to do this, but I feel like there's a better way - some way to tell the geom_bar how to read the data.
The data looks like this:
    Zoo Animals Bears Polar Bears
1 Omaha      50    10           3
I'll be making a plot for each zoo - but that part I can manage.
and here's its dput:
structure(list(Zoo = "Omaha", Animals = "50", Bears = "10", `Polar Bears` = "3"), .Names = c("Zoo",
"Animals", "Bears", "Polar Bears"), row.names = c(NA, -1L), class = "data.frame")
Note: it is significant that Animals >= Bears >= Polar Bears. Also, she's out of town, so I can't just get the raw data from her (if there was ever a big file, anyway).
While we're waiting for a better answer, I figured I should post the (suboptimal) solution you mentioned. dat is the structure included in your question.
d <- data.frame(animal=factor(sapply(list(dat[2:length(dat)]),
function(x) rep(names(x),x))))
cxc <- ggplot(d, aes(x = animal)) + geom_bar(width = 1, colour = "black")
cxc + coord_polar()
You can use inverse.rle to recreate the data,
dd = list(lengths = unlist(dat[-1]), values = names(dat)[-1])
class(dd) = "rle"
inverse.rle(dd)
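To see what this produces, here is a small self-contained sketch. It assumes the counts are stored as numbers rather than the character strings in the question's dput; the name animals is illustrative.

```r
# Assumed numeric version of the question's data
dat <- data.frame(Zoo = "Omaha", Animals = 50, Bears = 10, `Polar Bears` = 3,
                  check.names = FALSE)

# Treat the counts as run lengths and the column names as the run values
dd <- structure(list(lengths = unlist(dat[-1]), values = names(dat)[-1]),
                class = "rle")

animals <- inverse.rle(dd)   # one element per counted animal
table(animals)
```

The result is a plain character vector that can go straight into a data frame for geom_bar() to count.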
If you have multiple Zoos (rows), you can try
l = plyr::dlply(dat, "Zoo", function(z)
structure(list(lengths = unlist(z[-1]), values = names(z)[-1]), class = "rle"))
reshape2::melt(plyr::llply(l, inverse.rle))
The way to do this without disaggregating is to use stat="identity" in geom_bar.
It helps to have the data frame containing numeric values rather than character strings to start:
dat <- data.frame(Zoo = "Omaha",
Animals = 50, Bears = 10, `Polar Bears` = 3,
check.names = FALSE) # keep the space in "Polar Bears"
We do need reshape2::melt to get the data organized properly:
library(reshape2)
d3 <- melt(dat, id.vars = 1)
Now create the plot (identical to the other answer):
library(ggplot2)
ggplot(d3, aes(x = variable, y = value)) +
geom_bar(width = 1, colour = "black",stat="identity") +
coord_polar()
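For what it's worth, the same plot can be sketched with tidyr::pivot_longer() and geom_col(), which is equivalent to geom_bar(stat = "identity"). This is only an alternative under my own assumptions: numeric counts as above, and the factor() step is there to keep the columns in their original order.

```r
library(tidyr)
library(ggplot2)

# Assumed numeric version of the question's data
dat <- data.frame(Zoo = "Omaha", Animals = 50, Bears = 10, `Polar Bears` = 3,
                  check.names = FALSE)

# Long format: one row per animal category
d3 <- pivot_longer(dat, cols = -Zoo, names_to = "variable", values_to = "value")
d3$variable <- factor(d3$variable, levels = unique(d3$variable))  # keep column order

p <- ggplot(d3, aes(x = variable, y = value)) +
  geom_col(width = 1, colour = "black") +   # bar heights taken from the data as-is
  coord_polar()

print(p)
```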
