I'm new to R and am just trying to draw some plots with the ggplot2 package, which I just learned, using the palmerpenguins dataset. I want to create a simple bar graph with species on the x-axis and average body mass on the y-axis. I begin by grouping the penguins by species and calculating the average as follows:
avg_body_mass <- penguins %>%
  drop_na(body_mass_g) %>%
  group_by(species) %>%
  summarize(mean(body_mass_g))
This results in a new data frame that looks like this:
[1]: https://i.stack.imgur.com/Xz24r.png
The first problem is: How do I change the name of the average body mass column so it is not called "mean(body_mass_g)"?
The following is my code for the graph:
ggplot(data = penguins) + geom_bar(mapping = aes(x = species, y = avg_body_mass))
Alternatively, I also tried this:
ggplot(data = avg_body_mass) + geom_bar(mapping = aes(x = species, y = mean(body_mass_g)))
I don't think I should put a mean function in the geom_bar, but that's the name of the column and I don't know how to change it.
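A minimal sketch of one way to handle both points, assuming the palmerpenguins, dplyr, tidyr and ggplot2 packages are installed. Naming the summary inside summarize() gives the column a usable name (avg_mass_g is just a name chosen here for illustration), and geom_col() takes bar heights from a y column, whereas geom_bar() counts rows by default:

```r
library(palmerpenguins)  # provides the penguins data frame
library(dplyr)
library(tidyr)           # provides drop_na()
library(ggplot2)

# "avg_mass_g = ..." names the summary column directly
# (avg_mass_g is an illustrative name, not part of the dataset)
avg_body_mass <- penguins %>%
  drop_na(body_mass_g) %>%
  group_by(species) %>%
  summarize(avg_mass_g = mean(body_mass_g))

# Plot the summarised data frame; geom_col() uses y as the bar height
p <- ggplot(data = avg_body_mass) +
  geom_col(mapping = aes(x = species, y = avg_mass_g))
print(p)
```

An existing column could also be renamed afterwards with dplyr's rename(new_name = old_name).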
Related
I have the following bar plot created using ggplot2 in R. How do I dynamically set the distances between the bars on the plot using the 'distance' column of the same data frame?
library(tidyverse)
data.frame(name = c("A","B","C","D","E"),
           value = c(34,45,23,45,75),
           distance = c(3,4,1,2,5)) %>%
  ggplot(aes(x = name, y = value)) +
  geom_col()
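One possible approach, sketched under an assumption the question does not confirm: that each distance value is the centre-to-centre gap between a bar and the previous one (with the first bar placed at its own distance from zero). The bars are then positioned on a continuous x axis at the running total of distance and labelled with name. The column x_pos and the name df are introduced here for illustration only:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(name = c("A","B","C","D","E"),
                 value = c(34,45,23,45,75),
                 distance = c(3,4,1,2,5))

# Assumption: distance is the gap to the previous bar's centre,
# so the running total gives each bar's x position
df <- df %>% mutate(x_pos = cumsum(distance))

# Numeric x positions, relabelled with the original names;
# width is kept below the smallest gap so bars do not overlap
p <- ggplot(df, aes(x = x_pos, y = value)) +
  geom_col(width = 0.8) +
  scale_x_continuous(breaks = df$x_pos, labels = df$name)
print(p)
```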
I have a grouped boxplot that shows for each category two boxes side by side (see code). Now I am interested in adding the mean for each category and box separately. I can calculate and visualize the mean for each category but not conditioned on the grouped variable "year". I tried to calculate the means for each year individually and add them separately, but that did not work.
data(mpg, package = "ggplot2")
library(latticeExtra)
tmp <- tapply(mpg$hwy, mpg$class, FUN = mean)
bwplot(class ~ hwy, data = mpg, groups = year,
       box.width = 1/3,
       panel = panel.superpose,
       panel.groups = function(x, y, ..., group.number) {
         panel.bwplot(x, y + (group.number - 1.5) / 3, ...)
         panel.points(tmp, seq(tmp), ...)
       }
)
Which produces the following plot:
The example is based on: Grouped horizontal boxplot with bwplot
Can someone show how to do this, if possible using lattice graphics? All the plots in my master's thesis are based on it.
As a last option, you could try ggplot2. Here is the code; the red points mark the means:
library(ggplot2)
library(dplyr)
#Data
data(mpg, package = "ggplot2")
#Compute summary for points
Avg <- mpg %>%
  group_by(class, year) %>%
  summarise(Avg = mean(hwy))
#Plot
ggplot(data = mpg, aes(x = class, y = hwy, fill = factor(year))) +
  geom_boxplot(alpha = .25) +
  geom_point(data = Avg, aes(x = class, y = Avg, color = factor(year)),
             position = position_dodge(width = 0.9), show.legend = FALSE) +
  scale_color_manual(values = c('red', 'red')) +
  coord_flip() +
  labs(fill = 'Year') +
  theme_bw()
Output:
                  average
Young          0.01921875
Cohoused Young 0.07111951
Old            0.06057224
Cohoused Old   0.12102273
I am using the above data frame to create a histogram or bar chart, and my code is as follows:
C <-ggplot(data=c,aes(x=average))
C + geom_bar()
The resulting plot is attached here.
I would like the bar heights to reflect my data on the y axis instead of where the bar is placed on the x axis, but I don't know what my problem is in the code.
We can create a column with rownames_to_column
library(dplyr)
library(tibble)
library(ggplot2)
c %>%
  rownames_to_column('rn') %>%
  ggplot(aes(x = rn, y = average)) +
  geom_col()
Or create a column directly in base R
c$rn <- row.names(c)
ggplot(c, aes(x = rn, y = average)) +
geom_col()
Or, as @user20650 suggested, use the row names directly in aes
ggplot(data = c, aes(x = rownames(c), y = average)) +
  geom_col()
NOTE: It is better not to name objects with function names (c is a function)
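In base R, barplot can draw the bars directly from the column, with the row names as labels (passing the one-column as.matrix(c) instead would stack all values into a single bar)
barplot(c$average, names.arg = row.names(c))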
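For a self-contained version, here is a sketch that rebuilds the data frame from the values shown in the question. The name avg_df is hypothetical, chosen to avoid masking the base function c(), and the factor() step is an addition that keeps the bars in the original row order instead of alphabetical order. The underlying reason the original attempt failed is that geom_bar() counts rows per x value by default, while geom_col() uses the y values as given:

```r
library(ggplot2)

# Hypothetical name avg_df (instead of c); values copied from the question
avg_df <- data.frame(
  average = c(0.01921875, 0.07111951, 0.06057224, 0.12102273),
  row.names = c("Young", "Cohoused Young", "Old", "Cohoused Old")
)

# Turn the row names into a column; fixing the factor levels keeps row order
avg_df$rn <- factor(row.names(avg_df), levels = row.names(avg_df))

# geom_col() draws each bar with the height given by y
p <- ggplot(avg_df, aes(x = rn, y = average)) +
  geom_col()
print(p)
```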
I can't quite figure this out. A CSV of 200+ rows assigned to data like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
Trying to group by p1_id and plot the mean values for p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that isn't showing unique plot points per distinct p1_id values.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
  group_by(p1_id) %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
  geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot
data %>%
  group_by(p1_id) %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y)) %>%
  ggplot(aes(x = mean_p1x, y = mean_p1y)) +
  geom_point(aes(color = as.factor(p1_id)))
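To illustrate the missing step, here is a small sketch using only the four sample rows from the question (the name pts is hypothetical). group_by() on its own only marks the groups and leaves the columns unchanged, so mean(grp$p1_x) still returns one overall value; summarise() is what collapses each group to a single row:

```r
library(dplyr)

# Hypothetical name pts; holds the four sample rows shown in the question
pts <- read.csv(text = "gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937")

# Grouping alone does not aggregate: this is a single mean over all rows
overall_x <- mean(group_by(pts, p1_id)$p1_x)

# summarise() returns one row per p1_id with the per-group means
grp <- pts %>%
  group_by(p1_id) %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y))
print(grp)
```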
I have the following code:
library(ggplot2)
ggplot(data = diamonds, aes(x = cut)) +
  geom_bar()
with this result.
I would like to sort the bars by count in descending order.
There are multiple ways to do this (it is probably possible using only options within ggplot). One way is to use the dplyr library to summarize the data first and then plot the bar chart with ggplot:
# load the ggplot library
library(ggplot2)
# load the dplyr library
library(dplyr)
# load the diamonds dataset
data(diamonds)
# using dplyr:
# take the diamonds dataset
newData <- diamonds %>%
  # group it by cut column
  group_by(cut) %>%
  # count number of observations of each type
  summarise(count = n())
# change levels of the cut variable
# you tell R to order the cut variable according to number of observations (i.e. count variable)
newData$cut <- factor(newData$cut, levels = newData$cut[order(newData$count, decreasing = TRUE)])
# plot the ggplot
ggplot(data = newData, aes(x = cut, y = count)) +
  geom_bar(stat = "identity")
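As a shorter alternative, and not the method used in the answer above, the reordering can be done inside the aes() call. This sketch assumes the forcats package is installed (it is part of the tidyverse); its fct_infreq() function reorders factor levels by frequency, most frequent first, so geom_bar() can keep counting the raw rows:

```r
library(ggplot2)
library(forcats)

# fct_infreq() reorders the levels of cut by how often each occurs (descending)
p <- ggplot(data = diamonds, aes(x = fct_infreq(cut))) +
  geom_bar() +
  labs(x = "cut")  # restore a plain axis label
print(p)
```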