So I want to create a stacked bar chart with frequency counts printed for each
fill factor.
Showing data values on stacked bar chart in ggplot2
This question places the counts in the center of each segment, but the user specifies the values. In this example we don't input the specific values, and I am looking for an R function that calculates the counts automatically.
Take the following data for example.
set.seed(123)
a <- sample(1:4, 50, replace = TRUE)
b <- sample(1:10, 50, replace = TRUE)
data <- data.frame(a,b)
data$c <- cut(data$b, breaks = c(0,3,6,10), right = TRUE,
labels = c ("M", "N", "O"))
head(data)
ggplot(data, aes(x = a, fill = c)) + geom_bar(position="fill")
So I want to print an "n = ..." label for each of M, N and O within each of the bars 1, 2, 3 and 4.
So the end result looks like
Similar to this question, however we do not have fr
Try the following:
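library(ggplot2)
# p is the plot from the question, stored in a variable
p <- ggplot(data, aes(x = a, fill = c)) + geom_bar(position = "fill")
obj <- ggplot_build(p)$data[[1]]
# some tricks for centering the y-positions: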
library(dplyr)
y_cen <- obj[["y"]]
y_cen[is.na(y_cen)] <- 0
y_cen <- (y_cen - lag(y_cen))/2 + lag(y_cen)
y_cen[y_cen == 0 | y_cen == .5] <- NA
p + annotate("text", x = obj[["x"]], y = y_cen, label = paste("N = ", obj[["count"]]))
Which gives:
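As an alternative to computing the label positions by hand, the counts can probably be drawn by ggplot2 itself. The following is a minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat()); the names p_counts and built are only illustrative. It reuses the question's data, lets stat = "count" compute the frequencies, and uses position_fill(vjust = 0.5) to center each label in its segment.

```r
library(ggplot2)

# Same example data as in the question
set.seed(123)
a <- sample(1:4, 50, replace = TRUE)
b <- sample(1:10, 50, replace = TRUE)
data <- data.frame(a, b)
data$c <- cut(data$b, breaks = c(0, 3, 6, 10), right = TRUE,
              labels = c("M", "N", "O"))

# stat = "count" computes the frequencies per bar and fill level;
# position_fill(vjust = 0.5) places each label in the middle of its segment
p_counts <- ggplot(data, aes(x = a, fill = c)) +
  geom_bar(position = "fill") +
  geom_text(stat = "count",
            aes(label = paste("n =", after_stat(count))),
            position = position_fill(vjust = 0.5))

# The computed label data can be inspected like this
built <- ggplot_build(p_counts)$data[[2]]
p_counts
```

Because the counts come from the same statistic that draws the bars, the labels stay in sync if the data change.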
I am working with a large network and wish to highlight certain nodes. I would like these nodes to plot on top of a dense network. They are currently identified by a certain color. Here is some simple example code.
library(network)
library(GGally)
# make a random network
x <- c(0,1,0,1,1,1,0,1,0,1,0,1)
seed <- c(10,25,40,34,1,35,6,3,14,5,23,3)
net <- data.frame(matrix(nrow = 12, ncol = 12))
for (i in 1:12) {
set.seed(seed[i])
net[i] <- sample(x)
}
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 25,color = color,label = TRUE)
I have really exaggerated the dot size here to make them overlap. Is there a way I can get the "yes" points to always plot on top of the "no" points?
EDIT: Added "labels" for clarity.
Yes, there is! Your color vector lists the "yes" values first and then the "no" values, which seems to determine the plotting order. Assuming you have more values than just "yes" and "no", you could try converting the color vector to a factor and setting its levels. Then you can sort the order of your "yes"s and "no"s:
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
factor_color <- sort(factor(color, levels = c("no", "yes")))
ggnet2(net, size = 100, color = factor_color)
EDIT 1
As per your comment, I cannot think of a (more) elegant solution, but this works for me:
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 100, color = color, label = TRUE)
final_build <- ggplot2::ggplot_build(final)
# Extract the geom_point data and find which elements have 'yes'
yes_index <- which(color == "yes")
label_data <- final_build$data[[2]]
yes_coordinates_label <- cbind(label_data[yes_index,], label = names(net)[yes_index])
final +
geom_point(data = yes_coordinates_label, aes(x = x, y = y),
size = 100, color = yes_coordinates_label$colour[1]) +
geom_text(data = yes_coordinates_label, aes(x = x, y = y, label = label))
The idea is to plot the dots with geom_point() again but only for the dots which are "yes".
EDIT 2
I couldn't help but think of another solution without plotting the points again. It is possible to retrieve the plot information using ggplot_build() and then to reorder the hierarchy of the points drawn; the datapoints which come first are drawn first. Hence doing the following will work:
library(tidyverse)
# Find the index of the GeomPoint layer
geom_types <- final$layers %>% map("geom") %>% map(class)
GeomPoint_ind <- which(sapply(geom_types, function(x) "GeomPoint" %in% x))
# Retrieve plot information
final_build <- ggplot2::ggplot_build(final)
df <- final_build$data[[GeomPoint_ind]]
# Set the indices which you would like to have on top and modify the ggplot_build object.
yes_index <- which(color == "yes")
final_build$data[[GeomPoint_ind]] <- rbind(df[-yes_index,], df[yes_index,])
# Convert to plot object and plot
new_final <- ggplot_gtable(final_build)
plot(new_final)
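The principle behind this (rows that come later in a layer's data are drawn later, and therefore on top) can be sketched with plain ggplot2, independent of ggnet2. The data frame pts and the column highlight below are made up for illustration only.

```r
library(ggplot2)

# Hypothetical overlapping points; "yes" marks the ones to keep on top
pts <- data.frame(
  x = c(1.00, 1.02, 1.04, 1.06),
  y = c(1.00, 1.02, 1.04, 1.06),
  highlight = c("yes", "no", "no", "yes")
)

# order() on a logical puts FALSE first and TRUE last,
# so the "yes" rows end up at the bottom of the data frame
pts_ordered <- pts[order(pts$highlight == "yes"), ]

# Points are drawn in row order, so the "yes" points are drawn last
p_top <- ggplot(pts_ordered, aes(x, y, colour = highlight)) +
  geom_point(size = 25)
p_top
```

Reordering the data before plotting avoids editing the built plot object, but it only helps when you control the data frame that is passed to the layer.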
I'm not able to plot 7 time series one on top of the other using ggplot. Why does this reproducible code not work? signal is a factor variable with 7 levels spanning 700 rows (100 values each), yet somehow the values will only plot if I change the x in aes() to be 1:700. I'd like each signal to plot from 1 to 100. Why isn't that happening?
signal_to_noise_ratio = 10
t=seq(0.1,10,0.1)
df <- data.frame(truesignal = sin(t))
df2 <- df
for (i in seq(5)) {
noise = rnorm(t)
k <- sqrt(var(t)/(signal_to_noise_ratio*var(noise)))
data_wNoise = t + k*noise
df2[,i] = sin(data_wNoise)
}
df[,2:6] = df2
df[,7] = rowSums(df2)
colnames(df) <- c("truesignal", "noisy1", "noisy2", "noisy3", "noisy4", "noisy5",
"stacked")
library(reshape2) # for melt()
melt_df <- melt(df,measure.vars = 1:7, variable.name=c("signal"))
ggplot(data=melt_df,
aes(x=t,y=value,colour=factor(signal))) +
geom_path() +
facet_grid(signal~.)
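You probably want something like an id variable. aes(x = t) refers to the global vector t, which has only 100 values, while melt_df has 700 rows; a column that runs from 1 to 100 within each signal gives every row its own x value.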
melt_df$t.2 <- rep(1:100, 7)
library(ggplot2)
ggplot(data=melt_df,
aes(x=t.2, y=value, colour=factor(signal))) +
geom_path() +
facet_grid(signal~.)
Yields:
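Another option, sketched here under the assumption that reshape2 is being used as in the question, is to keep the time vector as a column of the wide data frame and declare it as the id variable when melting. Each signal then carries its own copy of t. The reduced data frame wide (with only two signals) is made up to keep the example short.

```r
library(reshape2)
library(ggplot2)

t <- seq(0.1, 10, 0.1)
set.seed(1)

# Reduced, illustrative version of the question's data: time plus two signals
wide <- data.frame(
  t = t,
  truesignal = sin(t),
  noisy1 = sin(t + rnorm(length(t), sd = 0.3))
)

# id.vars = "t" repeats the time column once per signal
melt_df <- melt(wide, id.vars = "t", variable.name = "signal")

p_signals <- ggplot(melt_df, aes(x = t, y = value, colour = signal)) +
  geom_path() +
  facet_grid(signal ~ .)
p_signals
```

This keeps the real time values on the x axis instead of the row index 1 to 100.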
I am trying to create a stacked bar graph in R. I need the bar graph to display three things:
Y axis = Count
X axis bars = Passed Driving Test (Yes or No)
Colours within X axis bars = Owns a car (Yes or No)
So my desired output is:
However, my actual output is:
My code so far is:
carData <- read.csv(file="~/Desktop/carData.csv",header=TRUE,sep=";")
ggplot(carData, aes(x = passed.test, fill = owns.car)) + geom_bar()
The passed.test values in the CSV file are either 1 or 0. (1 = passed ,0 = not passed)
The owns.car values in the CSV file are either 1 or 0. (1 = owns a car, 0 = doesn't own a car)
How do I:
A. Add colours to the bar graph to show the second variable (Owns a car = Yes or No)
B. Change the X axis to be 'Yes' and 'No', rather than -0.5 -1.5
You want to convert both of those columns to factors. Otherwise, ggplot2 treats numeric values as continuous, which does not make sense for a fill variable: geom_bar counts the observations at each value of passed.test, but it cannot split the bars by a continuous owns.car.
library(tidyverse)
set.seed(1234)
carData <- tibble(
passed.test = sample(c(0, 1), 100, replace = TRUE),
owns.car = sample(c(0, 1), 100, replace = TRUE)
)
cars_factors <- mutate_all(carData, as.factor)
ggplot(cars_factors, aes(x = passed.test, fill = owns.car)) +
geom_bar() +
scale_x_discrete(labels = c("No", "Yes")) +
scale_fill_discrete(labels = c("No", "Yes"))
Created on 2018-04-28 by the reprex package (v0.2.0).
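A variation on the same idea, as a sketch only: the "No"/"Yes" labels can be attached to the data with factor(levels = , labels = ) instead of being set on the scales. That way the mapping 0 = No, 1 = Yes is stated once and applies to both the axis and the legend. The name cars_labelled is illustrative.

```r
library(ggplot2)

set.seed(1234)
carData <- data.frame(
  passed.test = sample(c(0, 1), 100, replace = TRUE),
  owns.car = sample(c(0, 1), 100, replace = TRUE)
)

# Recode 0/1 as labelled factors; the levels/labels pairing makes the mapping explicit
cars_labelled <- carData
cars_labelled$passed.test <- factor(cars_labelled$passed.test,
                                    levels = c(0, 1), labels = c("No", "Yes"))
cars_labelled$owns.car <- factor(cars_labelled$owns.car,
                                 levels = c(0, 1), labels = c("No", "Yes"))

p_cars <- ggplot(cars_labelled, aes(x = passed.test, fill = owns.car)) +
  geom_bar() +
  labs(x = "Passed driving test", y = "Count", fill = "Owns a car")
p_cars
```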
I have this code.
a = c("a", 1)
b = c("b",2)
c = c('c',3)
d = c('d',4)
e = c('e',5)
z = data.frame(a,b,c,d,e)
hist = hist(as.numeric(z[2,]))
I am trying to have a histogram such that the bins would be a,b,c,d,e
and the freq values would be 1,2,3,4,5.
However, it gives me an empty screen (no bins at all in the histogram).
You are plotting the factor levels of each column for row 2, which is in this case always 1.
Add stringsAsFactors=FALSE when creating the data frame to avoid converting the strings to factors. (From R 4.0.0 onward this is already the default.) This should work:
z = data.frame(a,b,c,d,e,stringsAsFactors=FALSE)
hist(as.numeric(z[2,]))
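The difference between a factor's integer codes and its labels can be seen in a small base R example. The vector f is made up for illustration.

```r
# A factor built from number-like strings; levels are sorted as text: "10", "3", "5"
f <- factor(c("3", "10", "5"))

# as.numeric() on a factor returns the integer codes of the levels
codes <- as.numeric(f)                 # 2 1 3

# Converting to character first recovers the original numbers
values <- as.numeric(as.character(f))  # 3 10 5
```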
Perhaps this would work for you: it creates a data frame with the x elements being the letters 'a' through 'e', and the y elements being the numbers 1 through 5. It then renders a histogram and tells ggplot not to perform any binning.
library(ggplot2)
tmp <- data.frame(x = letters[1:5], y = 1:5)
ggplot(tmp, aes(x = x, y = y)) + geom_histogram(stat = "identity")
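Since the heights are already known, this is really a bar chart of precomputed values. As a sketch, assuming ggplot2 2.2.0 or later, geom_col() expresses that directly and uses the y values as bar heights without any counting or binning. The name p_col is illustrative.

```r
library(ggplot2)

tmp <- data.frame(x = letters[1:5], y = 1:5)

# geom_col() draws one bar per x value with height taken from y
p_col <- ggplot(tmp, aes(x = x, y = y)) +
  geom_col()
p_col
```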
I'm trying to write a custom scatterplot matrix function in ggplot2 using facet_grid. My data have two categorical variables and one numeric variable.
I'd like to facet (make the scatterplot rows/cols) according to one of the categorical variables and change the plotting symbol according to the other categorical.
I do so by first constructing a larger dataset that includes all combinations (combs) of the categorical variable from which I'm creating the scatterplot panels.
My questions are:
How to use geom_rect to white-out the diagonal and upper panels in facet_grid (I can only make the middle ones black so far)?
How can you move the titles of the facets to the bottom and left hand sides respectively?
How does one remove tick axes and labels for the top left and bottom right facets?
Thanks in advance.
require(ggplot2)
# Data
nC <- 5
nM <- 4
dat <- data.frame(
Control = rep(LETTERS[1:nC], nM),
measure = rep(letters[1:nM], each = nC),
value = runif(nC*nM))
# Change factors to characters
dat <- within(dat, {
Control <- as.character(Control)
measure <- as.character(measure)
})
# Check, lapply(dat, class)
# Define scatterplotmatrix() function
scatterplotmatrix <- function(data,...){
controls <- with(data, unique(Control))
measures <- with(data, unique(measure))
combs <- expand.grid(1:length(controls), 1:length(measures), 1:length(measures))
# Add columns for values
combs$value1 = 1
combs$value2 = 0
for ( i in 1:NROW(combs)){
combs[i, "value1"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,2]], select = value)
combs[i, "value2"] <- subset(data, subset = Control==controls[combs[i,1]] & measure == measures[combs[i,3]], select = value)
}
for ( i in 1:NROW(combs)){
combs[i,"Control"] <- controls[combs[i,1]]
combs[i,"Measure1"] <- measures[combs[i,2]]
combs[i,"Measure2"] <- measures[combs[i,3]]
}
# Final pairs plot
plt <- ggplot(combs, aes(x = value1, y = value2, shape = Control)) +
geom_point(size = 8, colour = "#F8766D") +
facet_grid(Measure2 ~ Measure1) +
ylab("") +
xlab("") +
scale_x_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) +
scale_y_continuous(breaks = c(0,0.5,1), labels = c("0", "0.5", "1"), limits = c(-0.05, 1.05)) +
geom_rect(data = subset(combs, subset = Measure1 == Measure2), colour='white', xmin = -Inf, xmax = Inf,ymin = -Inf,ymax = Inf)
return(plt)
}
# Call
plt1 <- scatterplotmatrix(dat)
plt1
I'm not aware of a way to move the panel strips (the labels) to the bottom or left. Also, it's not possible to format the individual panels separately (e.g., turn off the tick marks for just one facet). So if you really need these features, you will probably have to use something other than, or in addition to ggplot. You should really look into GGally, although I've never had much success with it.
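For what it's worth, more recent ggplot2 releases (2.0.0 and later, as far as I know) added a switch argument to facet_grid() that moves the strips: switch = "both" puts the column strips at the bottom and the row strips on the left. A minimal sketch using the built-in mtcars data, not the question's data:

```r
library(ggplot2)

# switch = "both" moves the top strips to the bottom and the right strips to the left;
# strip.placement = "outside" puts the strips outside the axis labels
p_switch <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(cyl ~ gear, switch = "both") +
  theme(strip.placement = "outside")
p_switch
```

Formatting individual panels separately (for example, removing ticks from a single facet) is still not something facet_grid() offers directly.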
As far as leaving some of the panels blank, here is a way.
nC <- 5; nM <- 4
set.seed(1) # for reproducible example
dat <- data.frame(Control = rep(LETTERS[1:nC], nM),
measure = rep(letters[1:nM], each = nC),
value = runif(nC*nM),
stringsAsFactors = TRUE) # H and V must be factors for the comparison below
scatterplotmatrix <- function(data,...){
require(ggplot2)
require(data.table)
require(plyr) # for .(...)
DT <- data.table(data,key="Control")
gg <- DT[DT,allow.cartesian=T]
setnames(gg,c("Control","H","x","V","y"))
fmt <- function(x) format(x,nsmall=1)
plt <- ggplot(gg, aes(x,y,shape = Control)) +
geom_point(subset=.(as.numeric(H)<as.numeric(V)),size=5, colour="#F8766D") +
facet_grid(V ~ H) +
ylab("") + xlab("") +
scale_x_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05)) +
scale_y_continuous(breaks=c(0,0.5,1), labels=fmt, limits=c(-0.05, 1.05))
return(plt)
}
scatterplotmatrix(dat)
The main feature of this is the use of subset=.(as.numeric(H)<as.numeric(V)) in the call to geom_point(...). This subsets the dataset so you only get a point layer when the condition is met, i.e. in facets where as.numeric(H)<as.numeric(V). This works because I've left the H and V columns as factors, and as.numeric(...) operating on a factor returns the integer level codes, not the level names.
The rest is just a more compact (and much faster) way of creating what you called combs.
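As far as I know, the subset= argument of geom_point() is no longer available in more recent ggplot2 releases, so the same idea can be sketched by filtering the data before plotting. This version is an assumption-laden variant, not the answer's original code: it uses base R merge() instead of data.table, and the names pairs_df and lower are illustrative. drop = FALSE in facet_grid() keeps the panels for unused factor levels, so the diagonal and upper triangle stay empty.

```r
library(ggplot2)

nC <- 5; nM <- 4
set.seed(1)
dat <- data.frame(Control = rep(LETTERS[1:nC], nM),
                  measure = factor(rep(letters[1:nM], each = nC)),
                  value = runif(nC * nM))

# Self-join on Control: every pair of measures for each control
pairs_df <- merge(dat, dat, by = "Control", suffixes = c(".x", ".y"))

# Keep only the lower triangle, comparing the integer codes of the factors
lower <- pairs_df[as.integer(pairs_df$measure.x) < as.integer(pairs_df$measure.y), ]

p_lower <- ggplot(lower, aes(value.x, value.y, shape = Control)) +
  geom_point(size = 5, colour = "#F8766D") +
  facet_grid(measure.y ~ measure.x, drop = FALSE) +
  xlab("") + ylab("") +
  scale_x_continuous(breaks = c(0, 0.5, 1), limits = c(-0.05, 1.05)) +
  scale_y_continuous(breaks = c(0, 0.5, 1), limits = c(-0.05, 1.05))
p_lower
```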