How do I get both my columns into a Bar Graph? - r

I printed a matrix and I have two columns that I want to show in a bar graph, but I don't know how. When I tried it, the whole matrix was used as the X value. Here is my code:
smoke <- matrix(c(53,42,40,40,39,34,34,30,28,24,22,21,20,16,'A','A','A','B','A','A','A','A','B','A','C','B','B', 'B'),nrow=14, ncol = 2)
colnames(smoke) <- c("NumberofBooks","Grade")
smoke <- as.table(smoke)
smoke
Any help with plotting the numbers against the letters would be greatly appreciated!

library(ggplot2)
smoke <-
matrix(c(53,42,40,40,39,34,34,30,28,24,22,21,20,16,'A','A','A','B','A','A','A','A','B','A','C','B','B', 'B'),nrow=14, ncol = 2)
colnames(smoke) <- c("NumberofBooks","Grade")
smoke <- as.data.frame(smoke)
# the matrix stored every value as text, so convert the counts back to numbers
smoke$NumberofBooks <- as.numeric(as.character(smoke$NumberofBooks))
p <- ggplot(data = smoke, aes(y = NumberofBooks, x = Grade))
p <- (p
+ geom_bar(stat = 'identity', position = 'dodge', color = 'white')
)
print(p)
Not exactly sure what you're looking for, but this can get you started. You'll need to convert the matrix to a data.frame to use ggplot.
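Why the data frame matters can be checked in base R: a matrix holds a single type, so mixing numbers and letters turns every cell into text. A minimal sketch (the names `as_matrix` and `as_df` are only for illustration and do not come from the thread):

```r
books <- c(53, 42, 40, 40, 39, 34, 34, 30, 28, 24, 22, 21, 20, 16)
grade <- c("A", "A", "A", "B", "A", "A", "A", "A", "B", "A", "C", "B", "B", "B")

# A matrix can hold only one type, so the numbers are coerced to character
as_matrix <- cbind(NumberofBooks = books, Grade = grade)
is.character(as_matrix)

# A data frame keeps one type per column, so the counts stay numeric
as_df <- data.frame(NumberofBooks = books, Grade = grade)
is.numeric(as_df$NumberofBooks)
```

With a numeric column, the y axis of a bar chart is continuous rather than a set of text labels.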

In my opinion, the easiest way to put those values into a bar graph is to use a data frame instead of a matrix, like this:
library(ggplot2)
number_of_books <- c(53,42,40,40,39,34,34,30,28,24,22,21,20,16)
grade <- c('A','A','A','B','A','A','A','A','B','A','C','B','B', 'B')
smoke <- data.frame(number_of_books,grade)
# grade on the x axis, number of books on the y axis
bar_graph <- ggplot(smoke, aes(grade,number_of_books)) + geom_bar(stat="identity")
bar_graph
Hope it helps.

Does this do what you are looking for?
# install.packages("dplyr"); install.packages("ggplot2")
library(dplyr)
library(ggplot2)
smoke <- matrix(c(53,42,40,40,39,34,34,30,28,24,22,21,20,16,'A','A','A','B','A','A','A','A','B','A','C','B','B', 'B'),nrow=14, ncol = 2)
colnames(smoke) <- c("NumberofBooks","Grade")
smoke %>%
as.data.frame() %>%
group_by(Grade) %>%
summarise(NumberofBooks = mean(as.numeric(NumberofBooks))) %>%
ggplot(aes(x = Grade, y = NumberofBooks)) +
geom_bar(stat = "identity") +
xlab("Grade") +
ylab("Average Number of Books") +
ggtitle("Average number of books by Grade")

Related

Is there a way to add legend and count to each level for geom_point?

Is there a way to add a legend with the count to give density of each row?
Or an easier way to show it?
Thanks very much!
Couldn't even get a legend added :)
Code I used:
data %>%
ggplot(aes(x = subscribed, y = campaign)) +
geom_point () +
geom_jitter()
For each group (subscribed), you could build a label beforehand that contains the number of observations, n(), and store it as a string column. Mapping that column in aes() makes the counts appear in the legend. Here is a reproducible example:
library(dplyr)
library(ggplot2)
# example data
df <- data.frame(campaign = runif(100),
subscribed = rep(c("no", "yes"), 50))
df %>%
group_by(subscribed) %>%
mutate(count = paste0(subscribed, ' (n = ', n(), ')')) %>%
ggplot(aes(subscribed, campaign, colour = factor(count))) +
geom_jitter()
Created on 2023-01-12 with reprex v2.0.2
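To see what the label column in this answer ends up containing, the same strings can be built in base R. This is only a sketch for inspection, not the answerer's code; `n_per_group` is a name introduced here for illustration.

```r
set.seed(1)
df <- data.frame(campaign = runif(100),
                 subscribed = rep(c("no", "yes"), 50))

# Count the rows in each group, then look the count up for every row
n_per_group <- table(df$subscribed)
key <- as.character(df$subscribed)
df$count <- paste0(key, " (n = ", n_per_group[key], ")")

# One distinct label per group; these are the strings the legend would show
unique(df$count)
```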
I found another way to show similar data to this, in a more clear manner.
However, I couldn't figure out the legend lol
The code I used was :
p <- ggplot(data = data, aes(x = subscribed, y = pdays)) +
geom_count() + scale_size_continuous(range = c(7, 30))
p + geom_text(data = ggplot_build(p)$data[[1]],
aes(x, y, label = n), color = "#ffffff") +
scale_y_continuous(breaks = seq(0, 30, by = 4))

Add macron to letter in faceting label

As the title says, I want to add a macron to a faceting label. An example:
library(tidyverse)
# subset data
df2 <- diamonds %>%
sample_n(500)
# plot
ggplot(df2,aes(x = carat, y = price)) +
geom_point() +
facet_wrap(~cut)
Now I want to add a macron over the a in Fair
# attempt to recode Fair to Fāir
df2 <- df2 %>%
mutate(cut2 = fct_recode(cut, "F\u0101ir" = "Fair"))
# doesn't work - produces exactly the same plot as above.
ggplot(df2,aes(x = carat, y = price)) +
geom_point() +
facet_wrap(~cut2)
Any tips would be greatly appreciated.
Looks like it's a problem with fct_recode rather than ggplot2. This seems to work just fine
df2 <- diamonds %>%
sample_n(500)
df2$cut2 <- df2$cut
levels(df2$cut2)[1] <- "F\u0101ir"
ggplot(df2,aes(x = carat, y = price)) +
geom_point() +
facet_wrap(~cut2)
Actually I guess it has to do with all parameter names in R. It doesn't look like you can use unicode names (at least not in 4.0.5 which I tested with)
foo <- function(...) {
print(match.call())
}
foo("F\u0101ir" = 1)
# foo(Fair = 1)
foo(Fāir = 1)
# foo(Fair = 1)
Seems the values are just converted to ASCII
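If the recode is blocked by how argument names are handled (which may depend on the platform and R version; that is an assumption, not something tested here), one way to sidestep it is to leave the factor untouched and relabel only the facet strips. A minimal sketch using `as_labeller()` from ggplot2; the lookup vector `facet_labels` is a name made up for this example:

```r
library(ggplot2)

set.seed(1)
df2 <- diamonds[sample(nrow(diamonds), 500), ]

# Named lookup: names are the existing levels, values are the strip labels.
# Every level needs an entry, so start from an identity mapping.
lv <- levels(diamonds$cut)
facet_labels <- setNames(lv, lv)
facet_labels["Fair"] <- "F\u0101ir"   # the macron lives in a value, not an argument name

p <- ggplot(df2, aes(x = carat, y = price)) +
  geom_point() +
  facet_wrap(~cut, labeller = as_labeller(facet_labels))
print(p)
```

Because the Unicode string is a vector value rather than a parameter name, it does not pass through argument matching at all.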

Assigning many line colors based on group in ggplot

Suppose I have some code like the following, generating a lineplot with a considerable number of lines (example taken from here)
library(ggplot2)
library(reshape2)
n = 1000
set.seed(123)
mat = matrix(rnorm(n^2), ncol=n)
cmat = apply(mat, 2, cumsum)
cmat = t(cmat)
rownames(cmat) = paste("trial", seq(n), sep="")
colnames(cmat) = paste("time", seq(n), sep="")
dat = as.data.frame(cmat)
dat$trial = rownames(dat)
mdat = melt(dat, id.vars="trial")
mdat$time = as.numeric(gsub("time", "", mdat$variable))
p = ggplot(mdat, aes(x=time, y=value, group=trial)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.1)
So here, "trial number" is my group producing all of these lines, and there are 1000 trials.
Suppose I want to "group my grouping variable" now - that is, I want to see the exact same lines in this plot, but I want the first 500 trial lines to be one color and the next 500 trial lines to be another. How can I do this with ggplot? I've been poking around for some time and I can't figure out how to manually set the colors per group.
Add a variable splitting the data into two groups, then use it to color the lines in ggplot:
dat = as.data.frame(cmat)
dat$trial = rownames(dat)
dat$group = rep(c("a","b"), each = n/2)
mdat = melt(dat, id.vars=c("trial", "group"))
mdat$time = as.numeric(gsub("time", "", mdat$variable))
p = ggplot(mdat, aes(x=time, y=value, group=trial, color = group)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.1)
One possible solution is to extract the trial number into a new column, use ifelse() to assign a group based on that number, and pass the grouping variable as color in aes():
library(dplyr)
mdat %>% mutate(Trial = as.numeric(sub("trial","",trial))) %>%
mutate(Group = ifelse(Trial < 51,"A","B")) %>%
ggplot(aes(x=time, y=value, group=trial, color = Group)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.8)
Is this what you are looking for?
NB: I used n = 100 to keep the data frame small, which is why the cutoff is Trial < 51. With n = 1000 it would be Trial < 501.
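Both answers leave the actual colors to ggplot's defaults, while the question asks how to set them manually. A small sketch of that last step with `scale_colour_manual()`, assuming a reduced data set (10 trials, 50 time points) built directly in long format; the group names and the two colors are arbitrary choices for illustration:

```r
library(ggplot2)

set.seed(123)
n_trials <- 10
n_times <- 50

# Long-format random walks: one row per trial and time point
long <- data.frame(
  trial = rep(seq_len(n_trials), each = n_times),
  time = rep(seq_len(n_times), times = n_trials)
)
long$value <- ave(rnorm(nrow(long)), long$trial, FUN = cumsum)

# First half of the trials in one group, second half in the other
long$group <- ifelse(long$trial <= n_trials / 2, "first half", "second half")

p <- ggplot(long, aes(x = time, y = value, group = trial, colour = group)) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  geom_line(alpha = 0.6) +
  # Named vector: each group level is tied to an explicit color
  scale_colour_manual(values = c("first half" = "steelblue",
                                 "second half" = "firebrick"))
print(p)
```

Keeping `group = trial` draws one line per trial, while `colour = group` only decides which of the two colors each line gets.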

Color-coded PMF with legend in ggplot2

My goal is to produce two overlapping PMFs of binomial distributions using ggplot2, color-coded according to colors that I specify, with a legend at the bottom.
So far, I think I have set up the data frame right.
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(cbind(successes,freq,class))
However, this gives the wrong result.
library(ggplot2)
g <- ggplot(df1, aes(successes),y=freq)
g + geom_bar(aes(fill = class))
I feel like I'm following an example yet getting a totally different result. This (almost) does what I want: it would be exact if it gave relative frequencies.
g <- ggplot(mpg, aes(class))
g + geom_bar(aes(fill = drv))
A couple of questions:
1) Where am I going wrong in my block of code?
2) Is there a better way to show two PMFs in one graph? I'm not determined to use a histogram or bar chart.
3) How can I set this up to give me the ability to choose the colors?
4) How do I order the values on the x-axis? They aren't categories. They are the numbers 0-10 and have a natural order that I want to preserve.
Thanks!
UPDATE
The following two blocks worked.
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
ggplot(df1, aes(successes ,y=freq, fill = class)) +
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(values = c("blue", "green")) + theme_bw()
AND
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
ggplot(df1, aes(x=successes,y=freq)) +
geom_col(aes(fill = class)) +
scale_x_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(values = c("blue", "green")) + theme_bw()
I think your issue is that successes and freq are being changed to factors when you create df1
Maybe this is what you're thinking of?
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes = as.numeric(successes), freq = as.numeric(freq), class)
ggplot(df1, aes(x = successes, y = freq)) +
geom_bar(stat = "identity", aes(fill = class))
If not, happy to answer any further questions!
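The type change this answer points to can be verified in base R. `cbind()` on a mix of numeric and character vectors builds a character matrix, so the columns are no longer numeric once they reach `data.frame()`. A minimal check, with `via_cbind` and `direct` as illustrative names:

```r
successes <- rep(0:10, times = 2)
freq <- c(dbinom(0:10, 10, 0.2), dbinom(0:10, 10, 0.8))
class <- rep(c("A", "B"), each = 11)

# cbind() first: everything passes through a character matrix
via_cbind <- data.frame(cbind(successes, freq, class))
sapply(via_cbind, is.numeric)

# data.frame() directly: each column keeps its own type
direct <- data.frame(successes, freq, class)
sapply(direct, is.numeric)
```

Whether the non-numeric columns show up as character or factor depends on the R version, but in neither case are they numbers.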
Is this what you're looking for?
library(ggplot2)
g <- ggplot(df1, aes(successes ,y=freq, fill = class))
g + geom_bar(stat = "identity") +
scale_fill_manual(values = c("blue", "green"))
Of course, keeping in mind you'd indeed change your dataframe creation to:
successes <- c(seq(0,10,1),seq(0,10,1))
freq <- c(dbinom(seq(0,10,1),10,0.2),dbinom(seq(0,10,1),10,0.8))
class <- c(rep(' A ',11),rep(' B ',11))
df1 <- data.frame(successes,freq,class)
as suggested in the comments.
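The blocks above stack the two distributions and leave the legend on the right. For the stated goal of overlapping PMFs with a legend at the bottom, one possible variation is to draw the bars at their own heights with some transparency. This is a sketch under those assumptions, not something confirmed in the thread; the alpha value is arbitrary.

```r
library(ggplot2)

successes <- rep(0:10, times = 2)
freq <- c(dbinom(0:10, 10, 0.2), dbinom(0:10, 10, 0.8))
class <- rep(c("A", "B"), each = 11)
df1 <- data.frame(successes, freq, class)

p <- ggplot(df1, aes(x = successes, y = freq, fill = class)) +
  # position = "identity" overlays the bars instead of stacking them
  geom_col(position = "identity", alpha = 0.5) +
  scale_x_continuous(breaks = 0:10) +
  scale_fill_manual(values = c(A = "blue", B = "green")) +
  theme_bw() +
  theme(legend.position = "bottom")
print(p)
```

Using `position = "dodge"` instead would place the two bars side by side at each value, which can be easier to read where the distributions overlap heavily.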

Standard evaluation inside a function with dplyr

I have data with lots of factor variables that I am visualising to get a feel for each of the variables. I am reproducing a lot of the code with minor tweaks for variable names etc., so I decided to write a function to simplify things. I just can't get it to work...
Dummy Data
ID <- sample(1:32, 128, replace = TRUE)
AgeGrp <- sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE)
ID <- factor(ID)
AgeGrp <- factor(AgeGrp)
data <- data_frame(ID, AgeGrp)
data
Basically what I am trying to do with each factor variable is produce a bar chart with labels of percentages inside the bars. For example with the dummy data.
plotstats <- #Create a table with pre-summarised percentages
data %>%
group_by(AgeGrp) %>%
summarise(count = n()) %>%
mutate(pct = count/sum(count)*100)
age_plot <- #Plot the data
ggplot(data,aes(x = AgeGrp)) +
geom_bar() + #Add the percentage labels using pre-summarised table
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),y=pct),
size=3.5, vjust = -1, colour = "sky blue") +
ggtitle("Count of Age Group")
age_plot
This works fine with the dummy data - but when I try to create a function...
basic_plot <-
function(df, x){
plotstats <-
df %>%
group_by_(x) %>%
summarise_(
count = ~n(),
pct = ~count/sum(count)*100)
plot <-
ggplot(df,aes(x = x)) +
geom_bar() +
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),
y=pct), size=3.5, vjust = -1, colour = "sky blue")
plot
}
basic_plot(data, AgeGrp)
I get the error:
Error in UseMethod("as.lazy") : no applicable method for 'as.lazy' applied to an object of class "factor"
I have looked at questions here, here, and here and also looked at the NSE Vignette but can't find my fault.
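One likely cause, offered as a reading of the error rather than a confirmed diagnosis: `basic_plot(data, AgeGrp)` passes the bare name, so inside the function `x` evaluates to the global factor `AgeGrp`, and `group_by_()` receives a factor instead of a column name. A sketch of one way around it with current dplyr and ggplot2, passing the column name as a string; it assumes dplyr 1.0 or later, and `pct_table` is a hypothetical helper introduced here. The labels are placed at the bar height (`count`) rather than at `pct`, which is a deliberate change from the original.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: counts and percentages for one column given by name
pct_table <- function(df, x) {
  df %>%
    group_by(across(all_of(x))) %>%
    summarise(count = n(), .groups = "drop") %>%
    mutate(pct = count / sum(count) * 100)
}

basic_plot <- function(df, x) {
  plotstats <- pct_table(df, x)
  # .data[[x]] looks the column up by its string name inside aes()
  ggplot(df, aes(x = .data[[x]])) +
    geom_bar() +
    geom_text(data = plotstats,
              aes(y = count, label = paste0(round(pct, 1), "%")),
              size = 3.5, vjust = -0.5, colour = "skyblue") +
    labs(x = x)
}

set.seed(42)
dummy <- data.frame(
  AgeGrp = factor(sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE))
)
age_plot <- basic_plot(dummy, "AgeGrp")   # note the quoted column name
print(age_plot)
```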

Resources