A colleague of mine needs to plot 101 bull's-eye charts. This is not her idea. Rather than have her slave away in Excel or God knows what making these things, I offered to do them in R; mapping a bar plot to polar coordinates to make a bull's-eye is a breeze in ggplot2.
I'm running into a problem, however: the data is already aggregated, so Hadley's example here isn't working for me. I could expand the counts out into a factor to do this, but I feel like there's a better way - some way to tell the geom_bar how to read the data.
The data looks like this:
    Zoo Animals Bears Polar Bears
1 Omaha      50    10           3
I'll be making a plot for each zoo, but that part I can manage.
And here's its dput:
structure(list(Zoo = "Omaha", Animals = "50", Bears = "10", `Polar Bears` = "3"), .Names = c("Zoo",
"Animals", "Bears", "Polar Bears"), row.names = c(NA, -1L), class = "data.frame")
Note: it matters that Animals >= Bears >= Polar Bears. Also, she's out of town, so I can't just get the raw data from her (if there ever was a big file, anyway).
While we're waiting for a better answer, I figured I should post the (suboptimal) solution you mentioned. dat is the structure included in your question.
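library(ggplot2)
# Expand the counts: repeat each column name once per animal (counts are stored as strings in dat)
counts <- as.numeric(unlist(dat[-1]))
d <- data.frame(animal = factor(rep(names(dat)[-1], times = counts)))
cxc <- ggplot(d, aes(x = animal)) + geom_bar(width = 1, colour = "black")
cxc + coord_polar()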
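To check that this kind of expansion is lossless, here is a small base-R sketch that rebuilds the one-row data from the question's dput (counts stored as strings), expands it with rep(), and collapses it back with table(). Passing explicit factor levels to keep the original column order is my own choice, not part of the answer above.

```r
# Aggregated data as in the question's dput (counts stored as character strings)
dat <- data.frame(Zoo = "Omaha", Animals = "50", Bears = "10",
                  `Polar Bears` = "3", check.names = FALSE)

# Expand: one row per animal, repeating each column name by its count
counts <- as.numeric(unlist(dat[-1]))
d <- data.frame(animal = factor(rep(names(dat)[-1], times = counts),
                                levels = names(dat)[-1]))

# table() collapses the expanded factor back to the original counts
recovered <- table(d$animal)
print(recovered)
```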
You can use inverse.rle to recreate the disaggregated data:
dd = list(lengths = as.numeric(unlist(dat[-1])), values = names(dat)[-1])
class(dd) = "rle"
inverse.rle(dd)
If you have multiple zoos (rows), you can try:
l = plyr::dlply(dat, "Zoo", function(z)
  structure(list(lengths = as.numeric(unlist(z[-1])), values = names(z)[-1]), class = "rle"))
reshape2::melt(plyr::llply(l, inverse.rle))
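If you'd rather not depend on plyr and reshape2 for the multi-zoo case, the same idea can be sketched in base R with split() and lapply(). The second zoo ("Lincoln") and its counts below are made up purely so there is more than one row; substitute your real data.

```r
# Hypothetical two-zoo table: the "Lincoln" row is invented for illustration
zoos <- data.frame(Zoo = c("Omaha", "Lincoln"),
                   Animals = c(50, 20), Bears = c(10, 4),
                   `Polar Bears` = c(3, 1), check.names = FALSE)

# One "rle" object per zoo (base R in place of plyr::dlply)
rles <- lapply(split(zoos, zoos$Zoo), function(z)
  structure(list(lengths = as.numeric(unlist(z[-1])),
                 values  = names(z)[-1]), class = "rle"))

# Expand each zoo and stack into one long data frame (in place of melt)
expanded <- lapply(rles, inverse.rle)
long <- data.frame(Zoo    = rep(names(expanded), lengths(expanded)),
                   animal = unlist(expanded, use.names = FALSE))
table(long$Zoo, long$animal)
```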
The way to do this without disaggregating is to use stat="identity" in geom_bar.
It helps to have the data frame contain numeric values rather than character strings to start:
dat <- data.frame(Zoo = "Omaha",
  Animals = 50, Bears = 10, `Polar Bears` = 3,
  check.names = FALSE)  # keep the space in "Polar Bears" instead of "Polar.Bears"
We do need reshape2::melt to get the data organized properly:
library(reshape2)
d3 <- melt(dat, id.vars = "Zoo")
Now create the plot (identical to the other answer):
library(ggplot2)
ggplot(d3, aes(x = variable, y = value)) +
geom_bar(width = 1, colour = "black",stat="identity") +
coord_polar()
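Newer versions of ggplot2 (2.2.0 and later) also provide geom_col(), which is shorthand for geom_bar(stat = "identity"). A sketch that also builds the long table in base R instead of reshape2::melt; dat is redefined here so the block runs on its own.

```r
library(ggplot2)

dat <- data.frame(Zoo = "Omaha", Animals = 50, Bears = 10,
                  `Polar Bears` = 3, check.names = FALSE)

# Long format without reshape2: one row per count column, original column order kept
d3 <- data.frame(variable = factor(names(dat)[-1], levels = names(dat)[-1]),
                 value    = unlist(dat[-1], use.names = FALSE))

# geom_col() uses the y values as bar heights, like geom_bar(stat = "identity")
p <- ggplot(d3, aes(x = variable, y = value)) +
  geom_col(width = 1, colour = "black") +
  coord_polar()
p
```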
Related
I am attempting to complete a principal component analysis on a set of data containing columns of numeric data.
Assuming a dataset like this (in reality I have a preconfigured data frame; this one is for reproducibility):
v1 <- c(1,2,3,4,5,6,7)
v2 <- c(3,6,2,5,2,4,9)
v3 <- c(6,1,4,2,3,7,5)
dataset <-data.frame(v1,v2,v3)
row.names(dataset) <-c('New York', 'Seattle', 'Washington DC', 'Dallas', 'Chicago','Los Angeles','Minneapolis')
I have run my principal component analysis and successfully plotted it:
pca=prcomp(dataset,scale=TRUE)
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
What I want to do, however, is colour-code my data points by city, i.e. by the row names of my dataset. I also want to use these cities (the row names) as labels.
I've tried the following, but neither attempt has worked:
## attempt 1 - I get row labels, but no chart
plot(pca$x[,1], pca$x[,2],col=rownames(dataset),pch=rownames(dataset),
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),cex=0.7,pos=3,col="darkgrey")
## attempt 2
datasetwithcity = rownames_to_column(dataset, var = "city")
head(datasetwithcity)
OnlyCities=datasetwithcity[,1]
OnlyCities
# this didn't work:
City_Labels=as.numeric(OnlyCities)
head(City_Labels)
# gets city labels, but loses points and no colour
plot(pca$x[,1], pca$x[,2],col=City_Labels,pch=City_Labels,
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),
cex=0.7,pos=3,col="darkgrey")
There are many different ways to do this.
In base R, you could do:
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC", col = seq(nrow(pca$x)),
xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
text(x = pca$x[,1], y = pca$x[,2], labels = rownames(pca$x), pos = 1)
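As for why the attempts in the question fail: col expects colour indices or valid colour names, and city names are neither. as.numeric() on character strings returns NA, and points with an NA colour or pch are not drawn, which matches the "loses points" symptom. Converting the names to a factor first gives one integer code per city. A base-R sketch (the legend() call is my addition):

```r
# Toy data from the question
v1 <- c(1, 2, 3, 4, 5, 6, 7)
v2 <- c(3, 6, 2, 5, 2, 4, 9)
v3 <- c(6, 1, 4, 2, 3, 7, 5)
dataset <- data.frame(v1, v2, v3)
row.names(dataset) <- c("New York", "Seattle", "Washington DC", "Dallas",
                        "Chicago", "Los Angeles", "Minneapolis")
pca <- prcomp(dataset, scale. = TRUE)

cities <- rownames(dataset)
# Character strings cannot be coerced to numbers: every element becomes NA
bad_codes <- suppressWarnings(as.numeric(cities))
# A factor stores one integer code per level, usable as a palette index in col =
city_codes <- as.integer(factor(cities, levels = cities))

plot(pca$x[, 1], pca$x[, 2], col = city_codes, pch = 19,
     xlab = "First PC", ylab = "Second PC")
text(pca$x[, 1], pca$x[, 2], labels = cities, pos = 3, cex = 0.7)
legend("topright", legend = cities, col = city_codes, pch = 19, cex = 0.7)
```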
Personally, I think the resulting aesthetics are nicer (and easier to change to suit your needs) with ggplot. The code is also a bit easier to read once you get used to the syntax.
library(ggplot2)
df <- as.data.frame(pca$x)
df$city <- rownames(df)
ggplot(df, aes(PC1, PC2, color = city)) +
geom_point(size = 3) +
geom_text(aes(label = city) , vjust = 2) +
lims(x = c(-2.5, 2.5), y = c(-2, 2)) +
theme_bw() +
theme(legend.position = "none")
Created on 2021-10-28 by the reprex package (v2.0.0)
To start off, this is my dput:
structure(list(Income = 18000, Rent = 7300, Wifi = 477, Gas = 900,
MTR_Bus = 600, Food = 3000, Total_Expenses = 12277, Remaining_Income = 5723), class = "data.frame", row.names = c(NA, -1L))
That produces a one-row data frame with one column per budget item.
I'm trying to create a simple pie chart in R using this budget, though I just need the expenses and income (in other words, I don't need the variables Total_Expenses or Remaining_Income).
My issue is that the best I can come up with is something like this:
bar <- ggplot(data = Budget) + geom_bar(mapping = aes(x = Total_Expenses, fill = row))+coord_polar()
I guess my question is twofold: 1) is this the correct structure for my code, and 2) what should I be using for x or fill? The book I was using didn't really have a good answer.
Thanks for any help you can give!
Reshape your data, e.g. with tidyr::pivot_longer():
library(tidyr)
library(dplyr)
library(ggplot2)
budget_pie <- Budget %>%
pivot_longer(everything()) %>%
filter(!grepl("^(Total|Remaining)", name))
ggplot(data = budget_pie, aes(x = "", y = value, fill = name)) +
geom_col() +
coord_polar("y", start = 0)
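For comparison, the same reshaping can be done in base R without tidyr or dplyr. This sketch rebuilds Budget from the question's dput and adds a sanity check (my addition) that the remaining expense rows add up to Total_Expenses.

```r
Budget <- structure(list(Income = 18000, Rent = 7300, Wifi = 477, Gas = 900,
  MTR_Bus = 600, Food = 3000, Total_Expenses = 12277, Remaining_Income = 5723),
  class = "data.frame", row.names = c(NA, -1L))

# Wide (one row) to long (one row per item), like pivot_longer(everything())
long <- data.frame(name = names(Budget), value = unlist(Budget, use.names = FALSE))
# Drop the summary columns, like filter(!grepl("^(Total|Remaining)", name))
budget_pie <- long[!grepl("^(Total|Remaining)", long$name), ]
# Sanity check: the individual expenses should add up to Total_Expenses
stopifnot(sum(budget_pie$value[budget_pie$name != "Income"]) == Budget$Total_Expenses)
budget_pie
```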
My problem seems simple: I am using ggplot2 with geom_jitter() to plot a variable (take my picture as an example).
Jitter adds some random noise to the variable (the variable is just called "1" in this example) to prevent overplotting. So I now have random noise in the y-direction, and points that would otherwise be completely overplotted are now more visible.
But here is my question:
As you can see, there are still some points that overplot each other. In my example, this could easily be prevented if the offsets in the y-direction were placed strategically instead of being random noise.
Can I somehow alter the geom_jitter() behavior or is there a similar function in ggplot2 that does exactly this?
Not really a minimal example, but also not too long:
library("imputeTS")
library("ggplot2")
data <- tsAirgap
# 2.1 Create required data
# Get all indices of the data that comes directly before and after an NA
na_indx_after <- which(is.na(data[1:(length(data) - 1)])) + 1
# starting from index 2 moves all indexes one in front, so no -1 needed for before
na_indx_before <- which(is.na(data[2:length(data)]))
# Get the actual values to the indices and put them in a data frame with a label
before <- data.frame(id = "1", type = "before", input = na_remove(data[na_indx_before]))
after <- data.frame(id = "1", type = "after", input = na_remove(data[na_indx_after]))
all <- data.frame(id = "1", type = "source", input = na_remove(data))
# Get n values for the plot labels
n_before <- length(before$input)
n_all <- length(all$input)
n_after <- length(after$input)
# 2.4 Create dataframe for ggplot2
# join the data together in one dataframe
df <- rbind(before, after, all)
# Create the plot
gg <- ggplot(data = df) +
geom_jitter(mapping = aes(x = id, y = input, color = type, alpha = type), width = 0.5 , height = 0.5)
gg <- gg + ggplot2::scale_color_manual(
  values = c("before" = "skyblue1", "after" = "yellowgreen", "source" = "gray66")
)
gg <- gg + ggplot2::scale_alpha_manual(
  values = c("before" = 1, "after" = 1, "source" = 0.3)
)
gg + ggplot2::theme_linedraw() + theme(aspect.ratio = 0.5) + ggplot2::coord_flip()
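The before/after index arithmetic at the top of that script is easy to get wrong, so here it is on a made-up toy vector. A run of consecutive NAs produces neighbour indices that point at NA values, which is why the NAs have to be removed afterwards. Base-R subsetting stands in for imputeTS::na_remove here.

```r
# Made-up toy series with an isolated NA and a run of two NAs
x <- c(5, NA, 7, 8, NA, NA, 3)

# Same index arithmetic as in the question
na_indx_after  <- which(is.na(x[1:(length(x) - 1)])) + 1
na_indx_before <- which(is.na(x[2:length(x)]))

# Inside a run of NAs the "neighbour" is itself NA, so drop those values
# (plain subsetting here instead of imputeTS::na_remove)
after_vals  <- x[na_indx_after]
after_vals  <- after_vals[!is.na(after_vals)]
before_vals <- x[na_indx_before]
before_vals <- before_vals[!is.na(before_vals)]
```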
So many good suggestions! Here is what Ben's suggestion would look like for my example:
I changed parts of my code to:
gg <- ggplot(data = df, aes(x = input, color = type, fill = type, alpha = type)) +
geom_dotplot(binwidth = 15)
This would basically also work as intended for me. ggbeeswarm, as suggested by Jon, also worked great for my purpose.
I thought of a hack I really like, using ggrepel. It's normally used for labels, but nothing prevents you from making the label a point.
library(ggplot2)
df <- data.frame(x = rnorm(200),
col = sample(LETTERS[1:3], 200, replace = TRUE),
y = 1)
ggplot(df, aes(x, y, label = "●", color = col)) + # using unicode black circle
ggrepel::geom_text_repel(segment.color = NA,
box.padding = 0.01, key_glyph = "point")
A downside of this method is that ggrepel can take a lot of time for a large number of points, and it will recalculate the layout differently each time you change the plot size. A faster alternative is ggbeeswarm::geom_quasirandom, which uses a deterministic process to produce jitter that looks random.
ggplot(df, aes(x,y, color = col)) +
ggbeeswarm::geom_quasirandom(groupOnX = FALSE)
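To make the "strategically placed offsets" idea concrete, here is a rough base-R sketch of dot stacking in the spirit of geom_dotplot(stackdir = "center"): points are binned along x, and points sharing a bin are spread symmetrically around the centre line instead of randomly. strategic_offsets() is a hypothetical helper written for this illustration, not a ggplot2 or ggbeeswarm function, and points in neighbouring bins can still touch if the bins are narrower than the dots.

```r
# Hypothetical helper (not from any package): deterministic vertical offsets
# that stack points sharing a bin around 0 as 0, -step, +step, -2*step, ...
strategic_offsets <- function(x, binwidth, step = 0.1) {
  bins <- floor(x / binwidth)
  # 0-based position of each point within its bin, in input order
  rank_in_bin <- ave(seq_along(x), bins, FUN = seq_along) - 1
  # alternate below/above the centre line, moving one step further out every two points
  ifelse(rank_in_bin %% 2 == 0, 1, -1) * ceiling(rank_in_bin / 2) * step
}

set.seed(1)
x <- rnorm(200)
off <- strategic_offsets(x, binwidth = 0.1, step = 0.02)
plot(x, 1 + off, pch = 19, ylab = "", yaxt = "n")
```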
I am attempting to overlay two plots using ggplot2. I can graph them individually, but I want to overlay them to show a comparison. They share the same y-axis, a score from 0 to 100; the x-axis is a specific date in the month (within a range of 3 weeks).
Here is what I have tried:
data <- read.table(text = Level5avg, header = TRUE)
data2 <- read.table(text = Level6avg, header = TRUE)
colnames(data) = c("x","y")
colnames(data2) = c("x","y")
ggplot(rbind(data.frame(data2, group="a"), data.frame(data, group="b")), aes(x=x,y=y)) +
stat_density2d(geom="tile", aes(fill = group, alpha=..density..), contour=FALSE) + scale_fill_manual(values=c("b"="#FF0000", "a"="#00FF00")) + geom_point() + theme_minimal()
When I do this, I get a strange graph with several dots, and I'm not sure my code is right, since I can't tell the datasets apart. I'd also like to add 3 more (small) datasets to the plot, if possible. If it is possible, how do I turn this into a line graph so the datasets can be distinguished?
Note: I was under the impression ggplot would work for my purposes because of this post (and several other posts on this site advised using ggplot as opposed to Lattice). I'm not sure if what I want is possible, so I came here.
Data sets:
dput(data)
structure(list(x = structure(1:6, .Label = c("10/27/2015", "10/28/2015",
"10/29/2015", "10/30/2015", "10/31/2015", "11/1/2015"), class = "factor"),
    y = c(0, 12.5, 0, 0, 11, 43)), .Names = c("x", "y"), class = "data.frame",
    row.names = c(NA, -6L))
dput(data2)
structure(list(x = structure(1:3, .Label = c("10/28/2015", "10/31/2015",
"11/1/2015"), class = "factor"), y = c(0, 0, 41.5)), .Names = c("x", "y"),
    class = "data.frame", row.names = c(NA, -3L))
I've now managed to get my overlay, but is there a way to order the horizontal axis? The dates are not in chronological order.
It seems to me that the answer that you are basing your plots on uses density plots that are not useful for your data. If you are just looking for some line plots with points, you could do the following (note I created a dataframe outside of the ggplot() call to make it look a little cleaner):
data$group <- "b"
data2$group <- "a"
df <- rbind(data2,data)
df$x <- as.Date(df$x,"%m/%d/%Y")
ggplot(df,aes(x=x,y=y,group=group,color=group)) + geom_line() +
geom_point() + theme_minimal()
Note that by converting the date, the dates end up in the right order all on their own.
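The reason the dates were out of order in the first place: rbind() on two factor columns keeps the first data frame's levels and appends any new levels after them, and a discrete axis follows that level order. A sketch using the question's data:

```r
# Data from the question's dput
data  <- data.frame(x = factor(c("10/27/2015", "10/28/2015", "10/29/2015",
                                 "10/30/2015", "10/31/2015", "11/1/2015")),
                    y = c(0, 12.5, 0, 0, 11, 43))
data2 <- data.frame(x = factor(c("10/28/2015", "10/31/2015", "11/1/2015")),
                    y = c(0, 0, 41.5))
data$group  <- "b"
data2$group <- "a"

df <- rbind(data2, data)
# rbind() keeps data2's levels first and appends the new ones afterwards,
# so a discrete axis built from these levels is not in date order
factor_levels <- levels(df$x)
print(factor_levels)

# As Date values the axis becomes continuous and sorts chronologically
df$x <- as.Date(df$x, format = "%m/%d/%Y")
```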
I've been able to create a dotplot in ggplot of percentages by gender. But I want to highlight the significant differences. I thought I could do this with a combination of subsetting and last_plot().
Here’s my data:
require(ggplot2)
require(reshape2)
prog <- c("Honors", "Academic", "Social", "Media")
m <- c(30,35,40,23)
f <- c(25,40,45,15)
s <- c(0.7, 0.4, 0.1, 0.03)
temp <- as.data.frame(cbind(prog, m, f, s), stringsAsFactors=FALSE)
first <- temp[,1:3]
first.melt <- melt(first, id.vars = 'prog', variable.name = 'Gender', value.name = 'Percent')
first.melt <- as.data.frame(cbind(first.melt,temp[,4]), , stringsAsFactors=FALSE)
names(first.melt) <- c("program", "Gender", "Percent", "sig")
first.melt$program <- as.factor(first.melt$program)
Here’s where I reverse the order of my program variable so that, when graphed, it will be alphabetical from top to bottom.
first.melt[,1] = with(first.melt, factor(first.melt[,1], levels = rev(levels(first.melt[,1]))))
first.melt$sig <- as.numeric(as.character(first.melt$sig))
first.melt$Percent <- as.numeric(as.character(first.melt$Percent))
Now, I subset...
first.melt.ns <- subset(first.melt,sig > 0.05)
first.melt.sig <- subset(first.melt,sig <= 0.05)
ggplot(first.melt.ns, aes(program, y=Percent, shape=Gender)) +
geom_point(size=3) +
coord_flip() +
scale_shape_manual(values=c("m"=1, "f"=5))
The first run at ggplot gets me my non-significant program pairs, and it’s in the right order. So I add the two new points for male and female (making them solid to draw attention to them as a significant pair):
last_plot() +
geom_point(data=first.melt.sig, aes(program[Gender=="m"], y=Percent[Gender=="m"]), size=3, shape=19) +
geom_point(data=first.melt.sig, aes(program[Gender=="f"], y=Percent[Gender=="f"]),size=4, shape=18)
The points get added just fine, so ggplot works. But notice my program axis: it’s correct, but now reversed.
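First, you really should avoid as.data.frame(cbind(...)). cbind() on a mix of character and numeric vectors builds a character matrix, so every number becomes a string that you then have to convert back, which dramatically increases the amount of work needed to prepare your data. The function for creating data frames is (naturally) data.frame. Use it!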
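A quick illustration of the problem with as.data.frame(cbind(...)), using two of the question's vectors:

```r
prog <- c("Honors", "Academic", "Social", "Media")
m <- c(30, 35, 40, 23)

# cbind() builds a matrix, and a matrix holds a single type: everything becomes character
via_cbind <- as.data.frame(cbind(prog, m), stringsAsFactors = FALSE)
str(via_cbind)

# data.frame() keeps each column's own type
via_df <- data.frame(prog, m)
str(via_df)
```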
What you're doing here is basically trying to get around the limitation of only having one shape scale. It's probably easiest to just do this:
temp <- data.frame(prog, m, f, s, stringsAsFactors = TRUE)  # prog must be a factor for levels() below (R >= 4.0 defaults to FALSE)
first <- temp[,1:3]
first.melt <- melt(first, id.vars = 'prog', variable.name = 'Gender', value.name = 'Percent')
first.melt$sig <- rep(temp$s,times = 2)
first.melt$prog <- factor(first.melt$prog, levels = rev(levels(first.melt$prog)))
first.melt.sig <- subset(first.melt,sig < 0.05)
first.melt$Percent[first.melt$sig < 0.05] <- NA
ggplot() +
geom_point(data = first.melt,aes(x = prog,y = Percent,shape = Gender),size = 3) +
geom_point(data = first.melt.sig[1,],aes(x = prog,y = Percent),shape = 19) +
geom_point(data = first.melt.sig[2,],aes(x = prog,y = Percent),shape = 18) +
coord_flip() +
scale_shape_manual(values=c("m"=1, "f"=5))
In general, structure your ggplot code so that you subset data frames, not variables inside aes(). Subsetting inside aes() gets both tricky and dangerous: ggplot evaluates each aes() expression against the layer's data and expects it to return either a single value or one value per row, so a subsetted vector can end up recycled, misaligned with the other aesthetics, or rejected with an error.
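Another way around the one-shape-scale limitation, different from what the answer above does, is to combine gender and significance into a single column and give each combination its own shape. A sketch using the question's vectors; the key column and its labels (m.ns, m.sig, ...) are names I made up for illustration:

```r
library(ggplot2)

prog <- c("Honors", "Academic", "Social", "Media")
m <- c(30, 35, 40, 23)
f <- c(25, 40, 45, 15)
s <- c(0.7, 0.4, 0.1, 0.03)

# One row per program and gender; reversed levels read alphabetically top to bottom after coord_flip()
long <- data.frame(prog    = factor(rep(prog, 2), levels = rev(sort(prog))),
                   Gender  = rep(c("m", "f"), each = 4),
                   Percent = c(m, f),
                   sig     = rep(s, 2))
# Hypothetical combined key: gender plus significance, so a single shape scale covers both
long$key <- paste(long$Gender, ifelse(long$sig < 0.05, "sig", "ns"), sep = ".")

p <- ggplot(long, aes(x = prog, y = Percent, shape = key)) +
  geom_point(size = 3) +
  coord_flip() +
  scale_shape_manual(values = c(m.ns = 1, f.ns = 5, m.sig = 19, f.sig = 18))
p
```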