Why are pies flat in geom_scatterpie in R?

Why are the pies flat?
library(ggplot2)
library(scatterpie)
df <- data.frame(
  Day = 1:6,
  Var1 = c(172, 186, 191, 201, 205, 208),
  Var2 = c(109, 483, 64010, 161992, 801775, 2505264),
  A = c(10, 2, 3, 4.5, 16.5, 39.6), B = c(10, 3, 0, 1.4, 4.8, 11.9),
  C = c(2, 5, 2, 0.1, 0.5, 1.2), D = c(0, 0, 0, 0, 0.1, 0.2))
ggplot() +
  geom_scatterpie(data = df, aes(x = Var1, y = Var2, group = Var1), cols = c("A", "B", "C", "D"))
I have tried adding coord_fixed(), but that does not work either.

The problem seems to be the very different scales of the x- and y-axes. If you rescale both variables to have zero mean and unit variance, the plot works. So, one thing you could do is plot the rescaled values, but transform the axis labels back into the original scale. To do this, you would have to do the following:
Make the data:
df<- data.frame(
Day=(1:6),
Var1=c(172,186,191,201,205,208),
Var2= c(109,483,64010,161992,801775,2505264), A=c(10,2,3,4.5,16.5,39.6), B=c(10,3,0,1.4,4.8,11.9), C=c(2,5,2,0.1,0.5,1.2), D=c(0,0,0,0,0.1,0.2))
Rescale the variables:
library(dplyr)  # provides %>% and mutate()
df <- df %>%
  mutate(x = c(scale(Var1)),
         y = c(scale(Var2)))
Find the linear map that transforms the rescaled values back into their original values. Then, you can use the coefficients from the model to make a function that will transform the rescaled values back into the original ones.
m1 <- lm(Var1 ~ x, data=df)
m2 <- lm(Var2 ~ y, data=df)
trans_x <- function(x)round(coef(m1)[1] + coef(m1)[2]*x)
trans_y <- function(x)round(coef(m2)[1] + coef(m2)[2]*x)
Make the plot, passing the transformation functions as the labels argument of the scale_[xy]_continuous() functions:
ggplot() +
geom_scatterpie(data=df, aes(x = x, y=y), cols = c("A", "B", "C", "D")) +
scale_x_continuous(labels = trans_x) +
scale_y_continuous(labels = trans_y) +
coord_fixed()
There may be an easier way than this, but it wasn't apparent to me.
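One candidate for an easier way, offered as a sketch rather than a tested replacement: scale() stores the centering and scaling constants as the attributes "scaled:center" and "scaled:scale", so the back-transformation can be written directly without fitting lm(). The helper name back_x below is hypothetical.

```r
# Var1 from the question
Var1 <- c(172, 186, 191, 201, 205, 208)

# scale() returns a one-column matrix that carries the constants it used
sx <- scale(Var1)
center_x <- attr(sx, "scaled:center")  # mean of Var1
scale_x  <- attr(sx, "scaled:scale")   # standard deviation of Var1

# Hypothetical helper: map standardized values back to the original units
back_x <- function(z) z * scale_x + center_x

back_x(c(sx))  # recovers Var1 up to floating-point error
```

A function like back_x (wrapped in round() if integer labels are wanted) could then be passed as labels in scale_x_continuous(), in the same way as trans_x above.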

The range on the y-axis is so large it's compressing the disks to lines. Change the y-axis to a log scale, and you can see the shapes. Adding coord_fixed() to keep the pies circular:
ggplot() +
geom_scatterpie(data = df, aes(x = Var1 , y = Var2, group = Var1), cols = c("A", "B", "C", "D")) +
scale_y_log10() +
coord_fixed()
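To make the range argument concrete, here is a small base R sketch that compares the spans of the two axes before and after the log transform. It only uses the Var1 and Var2 values from the question; the variable names x_span, y_span and y_span_log are made up for illustration.

```r
Var1 <- c(172, 186, 191, 201, 205, 208)
Var2 <- c(109, 483, 64010, 161992, 801775, 2505264)

x_span     <- diff(range(Var1))         # 36 units on the x-axis
y_span     <- diff(range(Var2))         # 2505155 units on the y-axis
y_span_log <- diff(range(log10(Var2)))  # span of y after the log10 transform

# On the raw scale, the y-axis covers tens of thousands of times more units than x
y_span / x_span

# On the log10 scale, the two spans are within about one order of magnitude
y_span_log / x_span
```

Since each pie is drawn with one radius expressed in data units, a mismatch of this size on the raw scale is what squashes the pies into lines.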

Related

Changing the x-axis label in a for loop in ggplot

I would like to create multiple histograms (ggplot) using a for loop. The problem is that the x-axis label of every plot stays the same, namely "value". Do you know how to change the x-axis label on each iteration of the loop?
My dataframe for example:
df <- data.frame(variable = c("A", "A", "B", "B", "C", "C"), value = c(1, 2, 4, 5, 2, 3))
So that means I want three plots with the x-axis labels "A", "B" and "C".
My code:
for (i in unique(df$variable)){
d <- subset(df, df$variable == i)
print(ggplot(d, aes(x = value)) + geom_histogram())
}
You can use purrr::imap() to give each plot its own x-axis label after splitting the data by variable.
library(ggplot2)
library(magrittr)  # provides the %>% pipe
list_plot <- df %>%
  split(.$variable) %>%
  purrr::imap(~ggplot(.x, aes(x = value)) +
    geom_histogram() + xlab(.y))
Also, have you considered using facets? The x-axis stays the same and you get A, B, C as the facet names.
ggplot(df, aes(x = value)) + geom_histogram() + facet_wrap(~variable)
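If you would rather keep the original for loop, a minimal sketch is to add xlab(i) inside the loop so each plot is labelled with the current group. The list name plots and the choice of bins = 5 are assumptions made for this example, not part of the question.

```r
library(ggplot2)

df <- data.frame(variable = c("A", "A", "B", "B", "C", "C"),
                 value = c(1, 2, 4, 5, 2, 3))

plots <- list()  # hypothetical container so the plots can be reused later
for (i in unique(df$variable)) {
  d <- subset(df, variable == i)
  # xlab(i) replaces the default "value" label with the current group name
  plots[[i]] <- ggplot(d, aes(x = value)) +
    geom_histogram(bins = 5) +
    xlab(i)
  print(plots[[i]])
}
```

Storing the plots in a named list is optional; calling print() inside the loop is what actually draws them.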

geom_scatterpie with non-numeric axes

I want to have a species x sample (both strings/factors) scatterplot with piecharts instead of points. The size of the points shall be correlated to the abundance of the species in each sample.
This can be easily done with just points as this:
d <- data.frame(Tax=c("A", "B", "C"), Sample=c("01", "02", "03"))
d$A <- abs(rnorm(3, sd=1))
d$B <- abs(rnorm(3, sd=2))
d$size=c(0.1,0.2,0.3)
library(ggplot2)
ggplot(d,aes(x=Tax, y=Sample, size=size)) + geom_point()
Replacing the points with pie charts can be achieved with geom_scatterpie from the scatterpie package (available on CRAN).
However, this does not work with factors in the x/y aesthetics:
library(scatterpie)
ggplot() + geom_scatterpie(aes(x=Tax, y=Sample, r=size), data=d, cols=c("A", "B"))
Warning:
Removed 6 rows containing non-finite values (stat_pie).
The panel is drawn, but stays empty. Note that scatterpie works well with numeric x/y aesthetics:
d <- data.frame(x=c(1,2,3), y=c(1,2,3))
d$A <- abs(rnorm(3, sd=1))
d$B <- abs(rnorm(3, sd=2))
d$size=c(0.1,0.2, 0.3)
ggplot() + geom_scatterpie(aes(x=x, y=y, r=size), data=d, cols=c("A", "B")) + coord_fixed()
How can I get geom_scatterpie to accept non-numeric axes?
This should work:
library(dplyr)  # provides %>% and mutate()
d2 <- d %>%
  mutate(tax_num = as.numeric(as.factor(Tax)),
         sample_num = as.numeric(as.factor(Sample)))
ggplot() + geom_scatterpie(data=d2, aes(x=tax_num, y=sample_num, r=size), cols=c("A", "B")) +
scale_x_continuous(breaks=c(1,2,3), labels=c("A", "B", "C")) +
scale_y_continuous(breaks=c(1,2,3), labels=c("01", "02", "03")) +
labs(x="Tax", y="Sample") +
coord_fixed()
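The breaks and labels above are typed in by hand. As a sketch for larger data, they can be derived from the factor levels instead; the names tax_f, sample_f, x_breaks and so on are hypothetical and only serve this example.

```r
d <- data.frame(Tax = c("A", "B", "C"), Sample = c("01", "02", "03"))

# Convert to factors once, so the integer codes and the labels come from the same levels
tax_f    <- factor(d$Tax)
sample_f <- factor(d$Sample)

d$tax_num    <- as.integer(tax_f)     # numeric position used on the x-axis
d$sample_num <- as.integer(sample_f)  # numeric position used on the y-axis

# One break per level, labelled with the level name
x_breaks <- seq_along(levels(tax_f))
x_labels <- levels(tax_f)
y_breaks <- seq_along(levels(sample_f))
y_labels <- levels(sample_f)
```

These vectors can then be passed as scale_x_continuous(breaks = x_breaks, labels = x_labels) and scale_y_continuous(breaks = y_breaks, labels = y_labels) in the plot above.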

Individual legends for separate geom_line aesthetics in the same ggplot

I'm new to R and I'm trying to create a single plot with data from 2 melted dataframes.
Ideally I would have a legend for each of the dataframes with their respective titles; however, I get only a single legend with the title of the first aesthetic.
My starting point is:
aerobic_melt <- melt(aerobic, id.vars = 'Distance', variable.name = 'Aerobic')
anaerobic_melt <- melt(anaerobic, id.vars = 'Distance', variable.name = 'Anaerobic')
plot <- ggplot() +
geom_line(data = aerobic_melt, aes(Distance, value, col=Aerobic)) +
geom_line(data = anaerobic_melt, aes(Distance, value, col= Anaerobic)) +
xlim(0, 125) +
ylab('Energy (J/kg )') +
xlab('Distance (m)')
Which results in a plot with a single legend, titled after the first aesthetic, covering both sets of lines.
I've searched, but with my limited ability I haven't been able to find a way to do it.
My question is:
How do I create separate legends with titles 'Aerobic' and 'Anaerobic' which should respectively refer to A,B,C,F,G,L and E,H,I,J,K?
Any help is appreciated.
Obviously we don't have your data, but I have created some sample data that should have the same names and structure as your own data frames, since it works with your own plot code. See the end of the answer for the data used here.
You can use the package ggnewscale if you want two color scales on the same plot. Just add in a new_scale_color() call between your geom_line calls. I have left the rest of your code as-is.
library(ggplot2)
library(ggnewscale)
plot <- ggplot() +
geom_line(data = aerobic_melt, aes(Distance, value, col=Aerobic)) +
new_scale_color() +
geom_line(data = anaerobic_melt, aes(Distance, value, col= Anaerobic)) +
xlim(0, 125) +
ylab('Energy (J/kg )') +
xlab('Distance (m)')
plot
Data
set.seed(1)
aerobic_melt <- data.frame(
Aerobic = rep(c("A", "B", "C", "F", "G", "L"), each = 120),
value = as.numeric(replicate(6, cumsum(rnorm(120)))),
Distance = rep(1:120, 6))
anaerobic_melt <- data.frame(
Anaerobic = rep(c("E", "H", "I", "J", "K"), each = 120),
value = as.numeric(replicate(5, cumsum(rnorm(120)))),
Distance = rep(1:120, 5))

How to draw a barplot from counts data in R?

I have a data-frame 'x'
I want a barplot like this
I tried
barplot(x$Value, names.arg = x$'Categorical variable')
ggplot(as.data.frame(x$Value), aes(x$'Categorical variable')
Nothing seems to work properly. In barplot, all axis labels (freq values) are different. ggplot is filling all bars to 100%.
You can try plotting with geom_bar(). The following code generates what you are looking for.
df = data.frame(X = c("A","B C","D"),Y = c(23,12,43))
ggplot(df,aes(x=X,y=Y)) + geom_bar(stat='identity') + coord_flip()
It helps to read the ggplot documentation. ggplot requires a few things, including data and aes(). You've got both of those statements there but you're not using them correctly.
library(ggplot2)
library(magrittr)  # provides the %>% pipe used below
set.seed(256)
dat <-
data.frame(variable = c("a", "b", "c"),
value = rnorm(3, 10))
dat %>%
ggplot(aes(x = variable, y = value)) +
geom_bar(stat = "identity", fill = "blue") +
coord_flip()
Here, I pipe dat into ggplot() as the data argument and refer to the x and y variables by name rather than passing data$... values. Next, I add geom_bar() with stat = "identity", which tells ggplot to use the actual values in the value column as bar heights rather than counting the rows in each category.
You have to use stat = "identity" in geom_bar().
dat <- data.frame("cat" = c("A", "BC", "D"),
"val" = c(23, 12, 43))
ggplot(dat, aes(as.factor(cat), val)) +
geom_bar(stat = "identity") +
coord_flip()
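As a side note, ggplot2 also provides geom_col(), which takes the bar heights directly from the y aesthetic, so stat = "identity" does not need to be spelled out. A minimal sketch with the same example data (the object name p is just for illustration):

```r
library(ggplot2)

dat <- data.frame(cat = c("A", "BC", "D"),
                  val = c(23, 12, 43))

# geom_col() uses val as the bar height without any counting
p <- ggplot(dat, aes(x = cat, y = val)) +
  geom_col() +
  coord_flip()
p
```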

confusion with overriding ggplot data in a plot layer

The following short code generates a barplot with an added line-plot layer. I have added comments indicating what works and what doesn't. While my problem is solved, I can't understand why I had a problem or how it got solved. If you can explain or suggest the right way to do it, that would be nice.
library(ggplot2)
library(reshape2)  # provides melt()
factors <- c("A", "B", "C", "D", "B", "A", "C", "B", "D", "D")
data <- data.frame(n=1:10, a= runif(10, 1, 5), b=runif(10, 1, 5),c=runif(10, 1, 5))
gg_data <- melt(data, id.vars="n", variable.name="var")
gg_data$alp <- rep(factors, 3)
gg_data1 <- melt(data.frame(n=1:10, a= runif(10, 2, 3), b=runif(10, 4, 5),c=runif(10, 3, 4)), id.vars="n", variable.name="var")
#this does not work
ggplot(data= gg_data, aes(x=n, y=value, fill=alp))+geom_bar(stat="identity")+ facet_grid( var ~ ., scale="free_y")+geom_line(data= gg_data1, aes(x= n, y=value))
#this gives a weird output
gg_data1$alp <- rep(factors, 3)
ggplot(data= gg_data, aes(x=n, y=value, fill=alp))+geom_bar(stat="identity")+ facet_grid( var ~ ., scale="free_y")+geom_line(data= gg_data1, aes(x= n, y=value))
#this works the way I want it to, don't know why.
gg_data1$alp <- "A"
ggplot(data= gg_data, aes(x=n, y=value, fill=alp))+geom_bar(stat="identity")+ facet_grid( var ~ ., scale="free_y")+geom_line(data= gg_data1, aes(x= n, y=value))
Your plots combine information from the two datasets. Aesthetics set in ggplot() are inherited by every layer, including layers that supply their own data, so with fill = alp declared there, ggplot tries to apply alp to the line layer as well.
The easiest way to see this is consider this new data.frame:
gg1 <- gg_data1
names(gg1) <- c("n1", "var1", "value1")
gg_combine <- cbind(gg_data, gg1)
To reproduce your 2nd graph it is equivalent to:
ggplot(data=gg_combine, aes(x=n, y=value, fill=alp))+
geom_bar(stat="identity")+
geom_line(aes(x=n1, y=value1, colour=alp)) +
facet_grid( var ~ ., scale="free_y")
In effect, this says "group everything by alp and plot each group separately", which is why you get those lines; adding colour = alp makes it clear why the lines look that way.
In your last plot, only the bars are grouped by alp, while the lines ignore this grouping. This is equivalent to:
ggplot(data=gg_combine, aes(x=n, y=value))+
geom_bar(aes(fill=alp), stat="identity")+
geom_line(aes(x=n1, y=value1)) +
facet_grid( var ~ ., scale="free_y")
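The same idea can be sketched with two separate data frames, which is closer to the code in the question: keep only x and y in ggplot() and put fill = alp in the bar layer, so the line data does not need an alp column at all. The data frames bar_df and line_df are made-up stand-ins for gg_data and gg_data1, built with base R so the example does not depend on melt().

```r
library(ggplot2)
set.seed(1)

factors <- c("A", "B", "C", "D", "B", "A", "C", "B", "D", "D")

# Stand-in for gg_data: long format with a fill variable
bar_df <- data.frame(n = rep(1:10, 3),
                     var = rep(c("a", "b", "c"), each = 10),
                     value = runif(30, 1, 5),
                     alp = rep(factors, 3))

# Stand-in for gg_data1: same layout, deliberately without an alp column
line_df <- data.frame(n = rep(1:10, 3),
                      var = rep(c("a", "b", "c"), each = 10),
                      value = runif(30, 2, 5))

p <- ggplot(bar_df, aes(x = n, y = value)) +
  geom_col(aes(fill = alp)) +              # fill is mapped for the bars only
  geom_line(data = line_df) +              # inherits x and y, needs no alp
  facet_grid(var ~ ., scales = "free_y")
p
```

Because fill is no longer inherited, the line layer is drawn as one line per facet without any dummy alp column.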
Hope this helps.
