Pie Chart using variables with character names - r

I'm trying to create some pie charts showing the distribution of companies amongst regions and countries.
I'm getting an error saying "'x' values must be positive", which I think is because I'm trying to plot country names and the function needs numbers.
Any guidance on this would be really helpful.
Summary: trying to make a pie chart of investor countries/regions to show their distribution (i.e. how many are in the UK, France, Germany etc)
Data: data
Main variables: investor, country/region
Any help with this code would be great!
Rory

Try something along these lines:
#demo data
investors <- paste0('investor', 1:100)
countries <- paste0('country', 1:5)
set.seed(1)
df <- data.frame(investors, countries = sample(countries, 100, T))
# pie chart code
library(tidyverse)
df %>%
  ggplot(aes(x = '', y = after_stat(count), fill = countries)) +
  geom_bar() +
  coord_polar('y', start = 0)
Created on 2021-07-31 by the reprex package (v2.0.0)
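As background on the error itself: the message "'x' values must be positive" is raised by base R's pie() when it is given anything other than non-negative numbers, so passing a column of country names would trigger it. The question doesn't show the original call, so this is an assumption about the cause. A minimal base R sketch, reusing the demo data from the answer above, counts the names first and plots the counts:

```r
# Demo data mirroring the answer above: one row per investor
set.seed(1)
df <- data.frame(
  investors = paste0("investor", 1:100),
  countries = sample(paste0("country", 1:5), 100, replace = TRUE)
)

# table() turns the character column into numeric counts per country
counts <- table(df$countries)

# pie() needs non-negative numbers; the names of `counts` become the slice labels
pie(counts, main = "Investors by country")
```

The same idea applies to regions: count first, then plot the counts rather than the names.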

Related

Creating multiple variables to view the frequency of them in time

I'm currently working with a dataframe which has this structure:
Date        Term      Frequency
2022-10-28  politics  42
2022-10-26  biology   69
It was generated to summarize the frequency of a certain word by date, from a larger database of social media posts.
Here's example data:
examp.data <- data.frame(
date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
term = c("engineering","biology","physics","mathematics","computer"),
freq = c(732,917,241,601,692),
stringsAsFactors = FALSE
)
The goal is to produce a plot that looks like this
from one that currently looks like this:
I was assuming I could achieve this by creating new variables (columns) based on each word and then plotting them using the same x axis (dates). But I can't figure a way to transform the data to do it.
I don't think you need to transform the data. You can just use ggplot aesthetics.
library(dplyr)    # provides %>%
library(ggplot2)
examp.data %>%
  ggplot() +
  aes(date, freq, color = term) +
  geom_line()
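To see the effect of mapping color to the term, it helps to have several dates per term, which the five-row example data does not. The sketch below uses made-up data (the `demo` data frame and its values are purely illustrative, not from the question) in the same long format of one row per date and term:

```r
library(ggplot2)

# Made-up long-format data: one row per (date, term) pair
set.seed(42)
dates <- seq(as.Date("2022-10-01"), by = "week", length.out = 5)
terms <- c("politics", "biology", "physics")
demo <- data.frame(
  date = rep(dates, times = length(terms)),
  term = rep(terms, each = length(dates)),
  freq = sample(10:100, length(dates) * length(terms), replace = TRUE)
)

# color = term gives each term its own line on the shared date axis
p <- ggplot(demo, aes(date, freq, color = term)) +
  geom_line()
print(p)
```

Because the data is already long, no extra columns per word are needed; ggplot2 splits the rows into one line per term.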

Stacked Bar Chart for Gene Expression in R

I am new to data visualization. I am trying to use R to visualize DESeq2 data from Galaxy to show the range of gene expression on different chromosomes between males and females of a beetle species. I need the x-axis to be chromosome and the y-axis to be expression level (log2FC).
I am trying to have this produce a color gradient going from male to female, like this example.
I have tried to do this using a standard barchart, creating a joined group for gender-log2fc:
library(ggplot2)
sexlinked <- read.table("clipboard", header = TRUE)
df <- sexlinked
# paste0() has no sep argument, so the pieces are joined directly
df$group <- paste0(df$sex, "-", df$log2fc)
ggplot(df, aes(chromosomes)) +
  geom_bar(aes(fill = group), colour = "grey")
Here is an image of my data (screencap of spreadsheet), in case it helps you to assist me.

Putting Values on a County Map in R

I am using an Excel sheet for data. One column has FIPS numbers for GA counties and the other is labeled Count with numbers 1 - 5. I have made a map with these values using the following code:
library(usmap)
library(ggplot2)
library(rio)
carrierdata <- import("GA Info.xlsx")
plot_usmap( data = carrierdata, values = "Count", "counties", include = c("GA"), color="black") +
labs(title="Georgia")+
scale_fill_continuous(low = "#56B1F7", high = "#132B43", name="Count", label=scales::comma)+
theme(plot.background=element_rect(), legend.position="right")
I've included the picture of the map I get and a sample of the data I am using. Can anyone help me put the actual Count numbers on each county?
Thanks!
The usmap package is a good source for county maps, but the data it contains is in the format of data frames of x, y co-ordinates of county outlines, whereas you need the numbers plotted in the center of the counties. The package doesn't seem to contain the center co-ordinates for each county.
Although it's a bit of a pain, it is worth converting the map into a formal sf data frame format to give better plotting options, including the calculation of the centroid for each county. First, we'll load the necessary packages, get the Georgia data and convert it to sf format:
library(usmap)
library(sf)
library(ggplot2)
d <- us_map("counties")
d <- d[d$abbr == "GA",]
GAc <- lapply(split(d, d$county), function(x) st_polygon(list(cbind(x$x, x$y))))
GA <- st_sfc(GAc, crs = usmap_crs()@projargs)
GA <- st_sf(data.frame(fips = unique(d$fips), county = names(GAc), geometry = GA))
Now, obviously I don't have your numeric data, so I'll have to make some up, equivalent to the data you are importing from Excel. I'll assume your own carrierdata has a column named "fips" and another called "values":
set.seed(69)
carrierdata <- data.frame(fips = GA$fips, values = sample(5, nrow(GA), TRUE))
So now we left_join our imported data to the GA county data:
GA <- dplyr::left_join(GA, carrierdata, by = "fips")
And we can calculate the center point for each county:
GA$centroids <- st_centroid(GA$geometry)
All that's left now is to plot the result:
ggplot(GA) +
geom_sf(aes(fill = values)) +
geom_sf_text(aes(label = values, geometry = centroids), colour = "white")
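For readers new to sf, the centroid step can be tried in isolation. This is a minimal sketch on a hypothetical 2 x 2 square standing in for a single county outline; the coordinates are invented for illustration only:

```r
library(sf)

# A hypothetical 2 x 2 square standing in for one county outline;
# the ring must be closed (first and last vertex identical)
outline <- rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))
square <- st_polygon(list(outline))

# st_centroid() returns the geometric center of the polygon as a POINT
center <- st_centroid(square)
st_coordinates(center)
```

For the square the center is at (1, 1); for the county polygons the same call yields the label positions used by geom_sf_text().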

plotting two categorical vectors in ggridges

I have a dataset with a few organisms, which I would like to plot on the y-axis against date on the x-axis. However, I want the height of each curve to represent the abundance of the organism, i.e., I would like to plot a time series of relative abundance, separated by organism, to show similar patterns over time.
However, of course, plotting just date against an organism does not yield any information on the abundance. So, my question is, is there a way to make the curve represent abundance using ggridges?
Here is my code for an example dataset:
set.seed(1)
Data <- data.frame(
Abundance = sample(1:100),
Organism = sample(c("organism1", "organism2"), 100, replace = TRUE)
)
Date = rep(seq(from = as.Date("2016-01-01"), to = as.Date("2016-10-01"), by =
'month'),times=10)
Data <- cbind(Date, Data)
ggplot(Data, aes(x = Abundance, y = Organism)) +
geom_density_ridges(scale=1.15, alpha=0.6, color="grey90")
This produces a plot with the two organisms, but I want the date on the x-axis instead of abundance, and that doesn't work. I have read that you need to specify group=Date or convert the date into Julian day, but neither lets me incorporate abundance into the plot.
Does anyone have an example of a plot with date vs. a categorical variable (i.e. organism) plotted against a continuous variable in ggridges?
I really like the output from ggridges and would like to be able to use it for these visualizations. Thank you in advance for your help!
Cheers,
Anni
To use geom_density_ridges, it'll help to reshape the data to show observations in separate rows, vs. as summarized by Abundance.
library(ggplot2); library(ggridges); library(dplyr)
# Uncount copies the row "Abundance" number of times
Data_sum <- Data %>%
tidyr::uncount(Abundance)
ggplot(Data_sum, aes(x = Date, y = Organism)) +
ggridges::geom_density_ridges(scale=1, alpha=0.6, color="grey90")
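The reshaping step is easier to follow on a tiny input. A minimal sketch, assuming a hypothetical two-row summary (the `summarized` data frame is made up for illustration):

```r
library(tidyr)

# Hypothetical summarized data: one row per organism with a count
summarized <- data.frame(
  Organism = c("organism1", "organism2"),
  Abundance = c(2, 3)
)

# uncount() repeats each row `Abundance` times and drops that column
expanded <- uncount(summarized, Abundance)
nrow(expanded)
```

Each repeated row then acts as one observation, which is what a density estimate over Date needs in order to reflect abundance.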

Trouble with levels on continuous bar chart with ggplot2

I have a dataset that looks something like this:
library(data.table)
testSet <- data.table(date=as.Date(c("2013-07-02","2013-08-03","2013-09-04",
"2013-10-05","2013-11-06")),
yr = c(2013,2013,2013,2013,2013),
mo = c(07,08,09,10,11),
da = c(02,03,04,05,06),
plant = LETTERS[1:5],
product = letters[26:22],
rating = runif(5))
What I would like to do is plot 2 graphs using ggplot2.
The first would give me a dodged, continuous bar chart for all months that would have the product on the x, the ratings on the y, and the dates grouped and plotted on their respective products.
x = product
y = rating
Dodge = date
The second that I'm trying to create is a dodged, continuous bar chart for one month that would have the plant on the x, the ratings on the y, and the product grouped and plotted on their respective plants.
x = plant
y = rating
Dodge = product
I'm looking for an output that is very similar to this: http://docs.ggplot2.org/0.9.3/geom_bar-28.png but continuous.
I've had issues trying to figure out how levels work and haven't seen an example of a dodged, continuous chart.
Here is the code I have created so far:
testMean <- tapply(testSet$rating, list(testSet$mo), mean)
testLevels <- factor(levels(testSet$product,testSet$mo),
levels = levels(testSet$product,testSet$mo))
qplot(testLevels, aes(testMean, fill=cut)) +
geom_bar(position="dodge", stat="identity")
This is what the ggplot2 site says about creating a continuous bar chart, but it doesn't say anything about how to do it with multiple graphs overlaid on top of each other and then dodged, like in the one I linked to earlier. Here is their code:
meanprice <- tapply(diamonds$price, diamonds$cut, mean)
cut <- factor(levels(diamonds$cut), levels = levels(diamonds$cut))
qplot(cut, meanprice)
I appreciate the help, guys!
I ended up using the built-in diamonds data set to get my question answered. All the thanks in the world to @carloscinelli for his assistance.
library(ggplot2)  # provides the diamonds data set
library(data.table)
data <- data.table(diamonds)[, list(mean_carat = mean(carat)), by = c('cut', 'color')]
Thanks!
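The answer stops at the aggregation step and does not show the plotting call. One plausible way to finish it, offered as a sketch and not as the code the poster actually used, is to draw the precomputed means with stat = "identity" and dodge the bars by color (the name `avg` is chosen here for illustration):

```r
library(ggplot2)
library(data.table)

# Mean carat for every cut/color combination, as in the answer above
avg <- data.table(diamonds)[, list(mean_carat = mean(carat)), by = c("cut", "color")]

# stat = "identity" plots the precomputed means as bar heights;
# position = "dodge" places the bars for each color side by side within a cut
p <- ggplot(avg, aes(x = cut, y = mean_carat, fill = color)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

The same pattern would apply to the original testSet: aggregate rating by the two grouping columns, then map one to x and the other to fill.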
