Trouble with levels on continuous bar chart with ggplot2 - r

I have a dataset that looks something like this:
library(data.table)
testSet <- data.table(date = as.Date(c("2013-07-02", "2013-08-03", "2013-09-04",
                                       "2013-10-05", "2013-11-06")),
                      yr = c(2013, 2013, 2013, 2013, 2013),
                      mo = c(7, 8, 9, 10, 11),
                      da = c(2, 3, 4, 5, 6),
                      plant = LETTERS[1:5],
                      product = letters[26:22],
                      rating = runif(5))
What I would like to do is plot 2 graphs using ggplot2.
The first would give me a dodged, continuous bar chart for all months that would have the product on the x, the ratings on the y, and the dates grouped and plotted on their respective products.
x = product
y = rating
Dodge = date
The second that I'm trying to create is a dodged, continuous bar chart for one month that would have the plant on the x, the ratings on the y, and the product grouped and plotted on their respective plants.
x = plant
y = rating
Dodge = product
I'm looking for an output that is very similar to this: http://docs.ggplot2.org/0.9.3/geom_bar-28.png but continuous.
I've had trouble figuring out how factor levels work here, and I haven't seen an example of a dodged, continuous bar chart.
Here is the code I have created so far:
testMean <- tapply(testSet$rating, list(testSet$mo), mean)
testLevels <- factor(levels(testSet$product,testSet$mo),
levels = levels(testSet$product,testSet$mo))
qplot(testLevels, aes(testMean, fill=cut)) +
geom_bar(position="dodge", stat="identity")
This is what the ggplot2 site says about creating a continuous bar chart, but it doesn't say anything about how to do it with multiple series overlaid on top of each other and then dodged, like in the one I linked to earlier. Here is their code:
meanprice <- tapply(diamonds$price, diamonds$cut, mean)
cut <- factor(levels(diamonds$cut), levels = levels(diamonds$cut))
qplot(cut, meanprice)
I appreciate the help, guys!

I ended up using the built-in diamonds data set to get my question answered. All the thanks in the world to @carloscinelli for his assistance.
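library(data.table)
library(ggplot2)  # provides the diamonds data set
# mean carat for every cut/color combination
data <- data.table(diamonds)[, list(mean_carat = mean(carat)), by = c('cut', 'color')]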
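The summarised table has one row per cut/color pair, which is the shape a dodged bar chart needs. The following is only a sketch of how it could be plotted; the choice of color on the x-axis and cut as the dodged group is an assumption, not something stated in the answer.

```r
library(data.table)
library(ggplot2)  # provides the diamonds data set

# One row per cut/color combination, holding the mean carat
data <- data.table(diamonds)[, list(mean_carat = mean(carat)), by = c('cut', 'color')]

# Assumed mapping: color on x, mean carat as bar height, cut as the dodged group
p <- ggplot(data, aes(x = color, y = mean_carat, fill = cut)) +
  geom_col(position = "dodge")  # geom_col plots the values as they are
print(p)
```

For the testSet in the question, the same pattern would presumably map product to x, rating to y, and factor(date) to fill.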
Thanks!

Related

Pie Chart using variables with character names

I'm trying to create some pie charts showing the distribution of companies amongst regions and countries.
I'm getting an error saying 'x' values must be positive, which I think is because I'm trying to plot country names and it needs to be a number?
Any guidance on this would be really helpful
Summary: trying to make a pie chart of investor countries/regions to show their distribution (i.e. how many are in the UK, France, Germany etc)
Data: data
Main variables: investor, country/region
Any help with this code would be great!
Rory
Try something along these lines:
#demo data
investors <- paste0('investor', 1:100)
countries <- paste0('country', 1:5)
set.seed(1)
df <- data.frame(investors, countries = sample(countries, 100, replace = TRUE))
# pie chart code
library(tidyverse)
# after_stat(count) replaces the deprecated ..count.. notation
df %>% ggplot(aes(x = '', y = after_stat(count), fill = countries)) +
  geom_bar() +
  coord_polar('y', start = 0)
Created on 2021-07-31 by the reprex package (v2.0.0)
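The "'x' values must be positive" message looks like the one raised by base R's pie(), which needs numeric counts rather than the country names themselves. A minimal sketch, assuming the same kind of simulated data as above: tabulate the names first, then pass the counts to pie().

```r
# Simulated stand-in data (an assumption; replace with the real country column)
set.seed(1)
countries <- sample(paste0('country', 1:5), 100, replace = TRUE)

# table() turns the character vector into counts per country
counts <- table(countries)

# pie() accepts the counts and uses their names as slice labels
pie(counts, main = "Investors by country")
```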

How to plot bar charts with 'n' number of columns and group by another column?

I am currently learning R and I have a data frame containing data I have scraped from a football website.
There are 58 columns (variables, attributes) for each row. Out of these variables, I wish to plot 3 in a single bar chart. The 3 important variables are 'Name', 'Goals.with.right.foot' and 'Goals.with.left.foot'.
What I want to build is a bar chart with each 'Name' appearing on the x-axis and 2 independent bars representing the other 2 variables.
Sample row entry:
{......., RONALDO, 10(left), 5(right),............}
I have tried playing around a lot with ggplot2 geom_bar with no success.
I have also searched for similar questions, but I cannot understand the answers. Is anyone able to explain simply how to solve this problem?
My data frame is called 'Forwards' and holds the strikers in a game of football. They have the attributes Name, Goals.with.left.foot and Goals.with.right.foot.
barplot(counts, main="Goals",
xlab="Goals", col=c("darkblue","red"),
legend = rownames(counts))
You could try it this way:
I simulated a frame as a stand in for yours, just replace it with a frame containing the columns you're interested in:
df <- data.frame(names = letters[1:5], r.foot = runif(5,1,10), l.foot = runif(5,1,10))
# transform your df to long format, keeping 'names' as the identifier column
library(reshape2)
plotDf <- melt(df, id.vars = 'names', variable.name = 'footing', value.name = 'goals')
# plot it
library(ggplot2)
# geom_col does the same as geom_bar, but uses stat_identity instead of stat_count
ggplot(plotDf, aes(x = names, y = goals, group = footing, fill = footing)) +
  geom_col(position = position_dodge())
This results in a grouped bar chart with one pair of bars per name.
This works because ggplot expects one variable containing the values for the y-axis and one or more variables containing the grouping factor(s).
With the melt function, your data.frame is reshaped into the so-called 'long format', which is exactly the orientation of data that is needed.
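Applied to the column names from the question, the reshape could look like the sketch below. The RONALDO row uses the sample values from the question; the other two players and their numbers are made up for illustration.

```r
library(reshape2)
library(ggplot2)

# Hypothetical stand-in for the 'Forwards' data frame (only the 3 relevant columns)
Forwards <- data.frame(
  Name = c("RONALDO", "PLAYER_B", "PLAYER_C"),
  Goals.with.left.foot = c(10, 3, 7),
  Goals.with.right.foot = c(5, 12, 4)
)

# Wide to long: one row per player and foot
long <- melt(Forwards,
             id.vars = "Name",
             measure.vars = c("Goals.with.left.foot", "Goals.with.right.foot"),
             variable.name = "foot",
             value.name = "goals")

# Two dodged bars per player, one for each foot
p <- ggplot(long, aes(x = Name, y = goals, fill = foot)) +
  geom_col(position = "dodge")
print(p)
```

Naming measure.vars explicitly means the other 55 columns of the real data frame would simply be dropped from the long table.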

plotting two categorical vectors in ggridges

I have a dataset with a few organisms, which I would like to plot on my y-axis, against date, which I would like to plot on the x-axis. However, I want the fluctuation of the curve to represent the abundance of the organisms. I.e I would like to plot a time series with the relative abundance separated by the organism to show similar patterns with time.
However, of course, plotting just date against an organism does not yield any information on the abundance. So, my question is, is there a way to make the curve represent abundance using ggridges?
Here is my code for an example dataset:
set.seed(1)
Data <- data.frame(
Abundance = sample(1:100),
Organism = sample(c("organism1", "organism2"), 100, replace = TRUE)
)
Date = rep(seq(from = as.Date("2016-01-01"), to = as.Date("2016-10-01"), by =
'month'),times=10)
Data <- cbind(Date, Data)
ggplot(Data, aes(x = Abundance, y = Organism)) +
geom_density_ridges(scale=1.15, alpha=0.6, color="grey90")
This produces a plot with the two organisms, but with abundance on the x-axis, whereas I want the date there. Simply swapping in the date doesn't work. I have read that you need to specify group=Date or convert the date into Julian day, but that doesn't change the fact that I do not get to incorporate abundance into the plot.
Does anyone have an example of a plot with date vs. a categorical variable (i.e. organism) plotted against a continuous variable in ggridges?
I really like the output from ggridges and would like to be able to use it for these visualizations. Thank you in advance for your help!
Cheers,
Anni
To use geom_density_ridges, it'll help to reshape the data to show observations in separate rows, vs. as summarized by Abundance.
library(ggplot2); library(ggridges); library(dplyr)
# Uncount copies the row "Abundance" number of times
Data_sum <- Data %>%
tidyr::uncount(Abundance)
ggplot(Data_sum, aes(x = Date, y = Organism)) +
ggridges::geom_density_ridges(scale=1, alpha=0.6, color="grey90")
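To see what the reshaping step does, here is a small sketch on a made-up two-row table (the values are assumptions chosen for illustration). Each row is repeated as many times as its Abundance, so the density over Date ends up weighted by abundance.

```r
library(tidyr)

# Toy summary table: one row per date/organism with a count
toy <- data.frame(
  Date = as.Date(c("2016-01-01", "2016-02-01")),
  Organism = c("organism1", "organism2"),
  Abundance = c(3, 1)
)

# uncount() repeats each row 'Abundance' times and drops the Abundance column
expanded <- uncount(toy, Abundance)
print(expanded)
```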

How to Plot Bar Charts for a Categorical Variable Against an Analytical Variable in R

I'm struggling with how to do something with R that comes very easily to me in Excel: so I'm sure this is something quite basic but I'm just not aware of the equivalent method in R.
In essence, I have two variables in my dataset: a categorical variable which has a list of names, and an analytical variable that has the frequency corresponding to that particular observation.
Something like this:
Name  Freq
====  ====
X     100
Y     200
and so on.
I would like to plot a bar chart with the names listed on the X-Axis (X, Y and so on) and bars of height corresponding to the relevant value of the Freq. variable for that observation.
This is something very trivial with Excel; I can just select the relevant cells and create a bar chart.
However, in R I just can't seem to figure out how to do this! The bar charts in R seem to be univariate only and don't behave the way I want them to. Trying to plot the two variables results in a scatter plot, which is not what I'm going for.
Is there something very basic I'm missing here, or is R just not capable of performing this task?
Any pointers would be very helpful.
Edited to Add:
I was primarily trying to use base R's plot function to get the job done.
Using plot(dataset1$Name, dataset1$Freq) does not produce a bar graph but a scatter plot instead.
First the data.
dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))
With base R.
barplot(dat$Freq, names.arg = dat$Name)
If you want to display a long list of names.arg, maybe the best way is to customize your horizontal axis with function staxlab from package plotrix. Here are two example plots.
One, with the axis labels rotated 45 degrees.
set.seed(3)
Name <- paste0("Name_", LETTERS[1:10])
dat2 <- data.frame(Name = Name, Freq = sample(100:200, 10))
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, srt = 45)
Another, with the labels spread over 3 lines.
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, nlines = 3)
Add colors with argument col. See help("par").
With ggplot2.
library(ggplot2)
ggplot(dat, aes(Name, Freq)) +
geom_bar(stat = "identity")
To add colors you have the aesthetics colour (for the contour of the bars) and fill (for the interior of the bars).
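As a brief illustration of the colour options mentioned for both approaches, here is a sketch using the same two-row data; the specific colour names are arbitrary choices.

```r
library(ggplot2)

dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))

# Base R: 'col' sets the bar colours, one per bar here
bp <- barplot(dat$Freq, names.arg = dat$Name, col = c("steelblue", "tomato"))

# ggplot2: 'fill' colours the interior, 'colour' the outline
# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(dat, aes(Name, Freq)) +
  geom_col(fill = "steelblue", colour = "black")
print(p)
```

To colour the bars by a variable instead of a fixed value, fill would go inside aes(), for example aes(Name, Freq, fill = Name).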

R - Building a histogram with data in intervals (from survey)

I'm currently analysing some data I've retrieved from a survey and I want to create a histogram with it.
The problem is that the data comes as pairs of range and absolute frequency, and the ranges have different widths.
Since the intervals are not the same, how can I generate the histogram in R?
Thank you in advance.
I think you want a bar chart instead of a histogram. Here's an article that explains the difference nicely.
For a bar chart with the data in the format you've indicated, you could do something like this:
my_data <- data.frame(range = c('[0-2]','[2-5]','[5-9]'),
abs_frequency = c(2,10,5))
library(ggplot2)
plot <- ggplot(data = my_data, aes(x = range, y = abs_frequency))
plot +
geom_bar(stat="identity")
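If a histogram with unequal bin widths is what is wanted after all, one common approach is to draw each bar over its actual interval and use frequency divided by width as the height, so that bar area reflects frequency. The sketch below assumes the bin edges 0, 2, 5 and 9 implied by the labels above; the column names lower, upper, width and density are introduced here for illustration.

```r
library(ggplot2)

# Bin edges assumed from the labels '[0-2]', '[2-5]', '[5-9]'
bins <- data.frame(
  lower = c(0, 2, 5),
  upper = c(2, 5, 9),
  abs_frequency = c(2, 10, 5)
)

# Height = frequency per unit of width, so area equals frequency
bins$width <- bins$upper - bins$lower
bins$density <- bins$abs_frequency / bins$width

# geom_rect draws each bar across its true interval on a continuous x-axis
p <- ggplot(bins) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density),
            fill = "grey70", colour = "black") +
  labs(x = "range", y = "frequency per unit width")
print(p)
```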

Resources