Don't understand simple pie chart (coord_polar()) - r

Please consider the following simple data frame, where each observation/individual has multiple variables whose values add up to 1. For example, A through E could be different body parts, and each person's body parts all add up to 100% of the person's total weight, but the proportions might differ between individuals:
library(ggplot2)
library(dplyr)
test.df <- data.frame("Variable" = LETTERS[1:5], "Obs1" = c(0.1, 0, 0.5, 0.2, 0.2),
"Obs2" = c(0.3, 0.7, 0, 0, 0))
This should produce the following data frame
> test.df
Variable Obs1 Obs2
1 A 0.1 0.3
2 B 0.0 0.7
3 C 0.5 0.0
4 D 0.2 0.0
5 E 0.2 0.0
I'd like a pie chart for each observation. So, a total of two pie charts, where all the variables are shown in a legend for each chart, and the color codes match between charts, and values of '0' are represented in the legend but not on the chart itself.
I'm positive that what I'd like to do is simple, but there's some stumbling block that I'm not seeing. I've done this before successfully, but it seems I did not truly understand because now I'm having trouble. I have tried:
ggplot(test.df, aes(x='', y='Obs1', fill = Variable)) +
geom_bar(width = 1, stat = 'identity') + coord_polar("y", start = 0)
What I end up with is a pie chart where each of the five variables takes up an equal amount of space, even though I have specified that the values of 'y' should come from Obs1 in the data frame:
Can anyone please help me? This is driving me crazy!
Best,
A

You could convert your data to a longer format using pivot_longer from tidyr to make sure you can plot two graphs for Obs1 and Obs2 using facet_wrap like this:
library(ggplot2)
library(dplyr)
library(tidyr)
test.df %>%
pivot_longer(cols = Obs1:Obs2) %>%
ggplot(aes(x='', y= value, fill = Variable)) +
geom_bar(width = 1, stat = 'identity') +
coord_polar("y", start = 0) +
facet_wrap(~name)
Created on 2023-02-18 with reprex v2.0.2
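For a single chart, the equal slices in the question most likely come from the quotes around 'Obs1': inside aes(), a quoted name is a constant string, not a column. The sketch below rebuilds test.df from the question and contrasts the two mappings. The names p_wrong and p_right are only illustrative.

```r
library(ggplot2)

test.df <- data.frame(
  Variable = LETTERS[1:5],
  Obs1 = c(0.1, 0, 0.5, 0.2, 0.2),
  Obs2 = c(0.3, 0.7, 0, 0, 0)
)

# Quoted: y is the constant string "Obs1" for every row,
# so each variable gets a slice of the same size.
p_wrong <- ggplot(test.df, aes(x = "", y = "Obs1", fill = Variable)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0)

# Unquoted: y is the numeric column Obs1, so slice size follows the data.
# Variables with a value of 0 still appear in the legend.
p_right <- ggplot(test.df, aes(x = "", y = Obs1, fill = Variable)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0)
```

Printing p_right should give one pie for Obs1; swapping in Obs2 gives the second chart with the same color assignment.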

Related

Heatmap with one column in R

I have a dataframe with scores associated with every cell and I have the result of clustering (not related to the score) in one column of my dataframe:
>head(clust.labs)
type value cell
1 1 0.3 1
2 1 0.5 2
3 1 -0.3 3
4 1 0.5 4
5 1 0.3 5
6 1 0.3 6
I want to make a heatmap with one column representing the cells, with samples in order and colors representing the scores (value). Currently I have made a heatmap that looks like the one below. I want the colored parts squished into one column, and I want a rectangle on the left representing the samples. How can I do that?
ggplot(data = clust.labs, mapping = aes(x = type,
y = cell,
fill = value)) +
geom_tile() +
xlab(label = "Sample")
I am not completely sure how the output should look, but I decided to give it a try. Since you wanted to make a single column plot, you should add a variable that has the same value for all samples, which in this case I named dummy. Then, you can do the heatmap and add the rectangle using geom_rect. Finally, you can adjust the x-axis breaks to avoid showing the -0.5 and 0.5 labels.
library(ggplot2)
library(dplyr)
df |>
mutate(dummy = 1) |>
ggplot(aes(x = factor(dummy),
y = cell,
fill = value)) +
#Add rectangle
geom_rect(aes(xmin=factor(-0.5),
xmax=factor(0.5),
ymin=0.5,
ymax=1.5),
colour = "black",
fill = "transparent") +
geom_tile() +
# Change breaks for x axis
scale_x_discrete(breaks = c(0,1)) +
xlab(label = "Sample")
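If the goal is only a single column of tiles, a shorter variant is to map x to a constant string instead of adding a dummy column. This is a minimal sketch, not the answer's method: it assumes the data look like the six rows printed by head(clust.labs) in the question, and it leaves out the sample rectangle.

```r
library(ggplot2)

# The six rows shown by head(clust.labs) in the question.
clust.labs <- data.frame(
  type  = rep(1, 6),
  value = c(0.3, 0.5, -0.3, 0.5, 0.3, 0.3),
  cell  = 1:6
)

# A constant x puts every tile in the same column;
# factor(cell) keeps one row per cell in the order given.
p <- ggplot(clust.labs, aes(x = "Sample", y = factor(cell), fill = value)) +
  geom_tile(colour = "black") +
  labs(x = NULL, y = "Cell")
```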
This can be done using plot_ly from the plotly package. We have to convert the data frame to a matrix first and then run:
library(plotly)
# One-column matrix of scores, one row per cell
my.mat <- as.matrix(as.numeric(clust.labs$value))
rownames(my.mat) <- as.character(seq_len(nrow(my.mat)))
# Add the cluster labels as a second column
my.mat <- cbind(my.mat, as.numeric(clust.labs$type))
colnames(my.mat) <- c("KS.score", "Cluster")
plot_ly(z = my.mat, type = "heatmap")

How to add lines to connect certain data points across multiple boxplots in R?

I am trying to create a graph similar to this, where one specific data point for each boxplot is connected to the next through a red line.
My current code is:
p <- ggplot(melt_opt_base, aes(factor(variable), value))
p + geom_boxplot() + labs(x = "Variable", y = "Value")
And the current graph looks like this. Assuming the data points to connect are:
points = c(0, 0.1, 0.2, 0.3, 0, 0.2, 0.2, 0.1, 0.3)
Does anyone know how I could add a line connecting these points across the nine adjacent boxplots, so that it would look like this instead?
This could be achieved like so:
Make a dataframe with your categories and the point values
Pass this dataframe as data to geom_line and/or geom_point
Making use of ggplot2::diamonds as example data:
library(ggplot2)
points <- aggregate(price ~ cut, FUN = mean, data = diamonds)
points
#> cut price
#> 1 Fair 4358.758
#> 2 Good 3928.864
#> 3 Very Good 3981.760
#> 4 Premium 4584.258
#> 5 Ideal 3457.542
ggplot(diamonds, aes(cut, price)) +
geom_boxplot() +
geom_point(data = points, color = "red") +
geom_line(data = points, aes(group = 1), color = "red")
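Applied to the nine boxplots in the question, the same two steps might look like the sketch below. melt_opt_base was not posted, so the version here is a simulated stand-in with hypothetical category names V1 to V9; only the nine values to connect come from the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for melt_opt_base: nine variables, 20 values each.
melt_opt_base <- data.frame(
  variable = rep(paste0("V", 1:9), each = 20),
  value    = runif(180, min = 0, max = 0.4)
)

# Step 1: one row per boxplot, pairing each category with its point.
line_df <- data.frame(
  variable = paste0("V", 1:9),
  value    = c(0, 0.1, 0.2, 0.3, 0, 0.2, 0.2, 0.1, 0.3)
)

# Step 2: pass that data frame to geom_point and geom_line.
# group = 1 makes geom_line join points across the discrete x axis.
p <- ggplot(melt_opt_base, aes(factor(variable), value)) +
  geom_boxplot() +
  geom_point(data = line_df, colour = "red") +
  geom_line(data = line_df, aes(group = 1), colour = "red") +
  labs(x = "Variable", y = "Value")
```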

How to succinctly graph all variables of data frame with ggplot2 [duplicate]

This question already has answers here:
Plotting two variables as lines using ggplot2 on the same graph
(5 answers)
Closed 3 years ago.
I have a dataframe, df, similar to the following:
Time Sample_A Sample_B Sample_C
0 0.12 0.14 0.15
1 0.13 0.20 0.21
2 0.31 0.34 0.36
I am reading in this data from a text file, in which the number of columns will always be changing. I would like to use ggplot in order to quickly and easily graph the x value (always Time) by all of the y values (Sample A, B, C, ....) onto a single graph. The names of the Y-variables are always changing as well.
In essence, I'd like to avoid doing the following on repeat:
ggplot(df, aes(x = Time, y = Sample_A)) + geom_line()
ggplot(df, aes(x = Time, y = Sample_B)) + geom_line()
I have tried to create a vector that contains all names of the columns and apply that as the Y-values to the aes function, however it returns the number of variables, rather than the values within the variables.
What is the most efficient way to go about this?
This is pretty simple:
library(tidyverse)
df <- tibble(
time = c(0, 1, 2),
Sample_A = c(0.12, 0.13, 0.31),
Sample_B = c(0.14, 0.20, 0.34),
Sample_C = c(0.15, 0.21, 0.36)
)
df %>%
gather(key = sample, value = value, -time) %>%
ggplot(aes(x = time, y = value, color = sample)) +
geom_line()
Basically, you can gather all of the columns except the first into a "long" data frame instead of a "wide" one. Then a couple lines of ggplot code will plot the result, colored by sample.
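In current tidyr, gather() is marked as superseded in favor of pivot_longer(), so the same reshaping can be sketched as follows. The column names sample and value are kept from the answer; cols = -time means "every column except time", which is what lets the number and names of sample columns vary.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(
  time     = c(0, 1, 2),
  Sample_A = c(0.12, 0.13, 0.31),
  Sample_B = c(0.14, 0.20, 0.34),
  Sample_C = c(0.15, 0.21, 0.36)
)

# Wide to long: one row per (time, sample) pair.
long <- pivot_longer(df, cols = -time, names_to = "sample", values_to = "value")

# One line per sample, colored by sample name.
p <- ggplot(long, aes(x = time, y = value, colour = sample)) +
  geom_line()
```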
Use lapply to add one geom_line layer per column, like this (data is your data frame, with time as its first column):
ggplot(data) +
lapply(names(data)[-1], FUN = function(i) geom_line(aes(x = time, y = .data[[i]])))

Readjusting the horizontal axis in ggplot

I have a simple dataset, containing values from 0 to 1. When I plot it, naturally, the horizontal axis is zero. I would like this reference to be 0.5 and the bars falling below 0.5 to be reversed and colored differently than those falling above this threshold.
my.df <- data.frame(group=state.name[1:20],col1 = runif(20))
p <- ggplot(my.df, aes(x=group,y=col1)) +
geom_bar(stat="identity")+ylim(0,0.5)
I am thinking of splitting the data into two subsets, one with values greater than 0.5 and the other with values smaller than 0.5, then somehow combining these two subsets in the same ggplot. Is there a cleaner way to do that? Thanks!
To build on @jas_hughes's answer, you can subtract 0.5 from your col1 variable, then relabel the y-axis.
library(ggplot2)
library(dplyr)
df <- data.frame(group=state.name[1:20],value=runif(20))
df %>% ggplot(aes(reorder(group,value),value-0.5)) + geom_bar(stat='identity') +
# value is continuous, so use a continuous scale and relabel its breaks
scale_y_continuous(name='Value',
breaks=c(-0.5,0,0.5),
labels=c('0','0.5','1'),
limits=c(-0.5,0.5)) +
xlab('State') +
theme(axis.text.x = element_text(angle=45,hjust=1))
The y-variable you are trying to communicate is distance from 0.5, so you need to change the values in col1 to reflect this.
library(dplyr)
library(ggplot2)
my.df %>%
mutate(col2 = col1-0.5) %>%
ggplot() +
aes(x = group, y = col2, fill = col2 >=0) +
geom_bar(stat = 'identity') +
theme(legend.position = 'none',
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
ylab('Col1 above 0.5 (AU)')
Note, you can also use the aes(fill = col1 >= 0.5) option to color code the bars without shifting the axis (which is what I would recommend if col1 contains percentages).
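The option in the note above, coloring by the threshold without shifting the axis, could be sketched like this. The dashed reference line at 0.5 and the fixed seed are additions for illustration, not part of the original answer.

```r
library(ggplot2)

set.seed(42)  # fixed seed only so the example is repeatable
my.df <- data.frame(group = state.name[1:20], col1 = runif(20))

# Bars keep their original height; the logical col1 >= 0.5 picks the color.
p <- ggplot(my.df, aes(x = group, y = col1, fill = col1 >= 0.5)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```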

ggplot: How does geom_tile calculate the fill? [duplicate]

I used geom_tile() to plot 3 variables on the same graph with:
tile_ruined_coop<-ggplot(data=df.1[sel1,])+
geom_tile(aes(x=bonus, y=malus, fill=rf/300))+
scale_fill_gradient(name="vr")+
facet_grid(Seuil_out_coop_i ~ nb_coop_init)
tile_ruined_coop
and I am pleased with the result!
But what kind of statistical treatment is applied to fill? Is it a mean?
To plot the mean of the fill values, you should aggregate your values before plotting. scale_fill_gradient(...) does not operate on the data; it only controls how values are mapped to colors.
Let's start with a toy Dataframe to build a reproducible example to work with.
mydata = expand.grid(bonus = seq(0, 1, 0.25), malus = seq(0, 1, 0.25), type = c("Risquophile","Moyen","Risquophobe"))
mydata = do.call("rbind",replicate(40, mydata, simplify = FALSE))
mydata$value= runif(nrow(mydata), min=0, max=50)
mydata$coop = "cooperative"
Now, before plotting, I suggest calculating the mean over each group of 40 values. For this operation I like to use the dplyr package:
library(dplyr)
data = mydata %>% group_by(bonus, malus, type, coop) %>% summarise(vr=mean(value))
Now you have your dataset ready to plot with ggplot2:
library(ggplot2)
g = ggplot(data, aes(x=bonus,y=malus,fill=vr))
g = g + geom_tile()
g = g + facet_grid(type~coop)
In the resulting plot, the fill value of each tile is exactly the mean of your values.
Is this what you expected?
It uses stat_identity as can be seen in the documentation. You can test that easily:
DF <- data.frame(x=c(rep(1:2, 2), 1),
y=c(rep(1:2, each=2), 1),
fill=1:5)
# x y fill
#1 1 1 1
#2 2 1 2
#3 1 2 3
#4 2 2 4
#5 1 1 5
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
As you can see, the fill value for the 1/1 combination is 5: both rows are drawn, and the later one (fill = 5) covers the earlier one (fill = 1). If you use factors, it is even clearer what happens:
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=factor(fill)))
print(p)
If you want to depict means, I'd suggest calculating them outside of ggplot2:
library(plyr)
DF1 <- ddply(DF, .(x, y), summarize, fill=mean(fill))
p <- ggplot(data=DF1) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
That's easier than trying to find out if stat_summary can play with geom_tile somehow (I doubt it).
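If you would rather not load plyr, the same per-tile means can be computed with base R's aggregate(). This is a sketch of an equivalent step, reusing the DF defined above; it is not what the answer itself used.

```r
library(ggplot2)

DF <- data.frame(x = c(rep(1:2, 2), 1),
                 y = c(rep(1:2, each = 2), 1),
                 fill = 1:5)

# One row per (x, y) combination; duplicated tiles are averaged.
# The 1/1 tile has fill values 1 and 5, so its mean is 3.
DF1 <- aggregate(fill ~ x + y, data = DF, FUN = mean)

p <- ggplot(data = DF1) +
  geom_tile(aes(x = x, y = y, fill = fill))
```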
scale_fill() and geom_tile() apply no statistics (or, more precisely, they apply stat_identity()) to your fill value rf/300. The scale just computes how many colors you use and then generates the colors with the munsell function 'mnsl()'. If you want to apply a transformation only to the colors displayed, you should use:
scale_fill_gradient(trans = "log")
or
scale_fill_gradient(trans = "sqrt")
Changing the color mapping between tiles might not be the best idea, since the plots have to be comparable and you compare the values by their colors. Hope this helps.
