First time asking a question on Stack Overflow
I am having trouble creating a bar plot which I hope to be able to filter based on certain fields. I am using some sensitive data, but I will try to be as clear as possible with the type of data I am using. The data is set up in a hierarchy of sorts, for example the data is as such:
Project -> Subproject -> Sub-SubProject, and a value for each month (jan, feb, march, etc), for a total of 15 columns.
Each row in the csv has a value for each element, so the Project column has a lot of repeated values since it's the top of the hierarchy, with subproject having a fair bit of repeated values as well since it's one level lower.
My goal is to create a bar chart that groups each unique value in the hierarchy, while having the months on the x axis, and values for the months on the y.
So all the values with the same Project, Subproject, will be grouped together, showing the months on the x axis.
I have tried using the ggplot2 library to group the values based on the hierarchy, but it doesn't look good and it's aggregating the values rather than showing the unique value for each entry.
plot <- ggplot(data=data, aes(x= `Sub-Project`, y = January, fill = `Sub-SubProject`)) +
geom_bar(stat="identity", position = "dodge") +
facet_grid(~Project, scales = "free_x", space = "free") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_text(aes(label=Capacity.1),hjust=0, vjust=0)
I want to avoid using fill, since I would like to conditionally set the color on my own, but that is a problem for another time. I was able to somewhat replicate what I am looking for on Tableau, but now the deliverable must be in R.
In general, I would like no aggregation, but a unique bar for each entry, grouped by the hierarchy I said above.
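One possible way to get there, sketched under assumptions because the real data is not shown: reshape the month columns into long format so that month can sit on the x axis, then set `group` to the lowest level of the hierarchy so every entry keeps its own bar. The column names (Project, SubProject, SubSubProject, Jan, Feb, Mar) and all values below are hypothetical, and only three months are used to keep it short.

```r
library(ggplot2)

# Hypothetical stand-in for the sensitive data: one row per entry,
# hierarchy columns first, then one column per month.
wide <- data.frame(
  Project       = c("A", "A", "A", "B"),
  SubProject    = c("A1", "A1", "A2", "B1"),
  SubSubProject = c("A1x", "A1y", "A2x", "B1x"),
  Jan = c(10, 12, 7, 5),
  Feb = c(11, 9, 8, 6),
  Mar = c(13, 10, 6, 4)
)

month_cols <- c("Jan", "Feb", "Mar")

# Base-R reshape to long format: one row per entry per month.
long <- reshape(
  wide,
  direction = "long",
  varying   = month_cols,
  v.names   = "value",
  timevar   = "month",
  times     = month_cols,
  idvar     = c("Project", "SubProject", "SubSubProject")
)
long$month <- factor(long$month, levels = month_cols)  # keep calendar order

# `group` identifies the entry, so each entry gets its own dodged bar
# and nothing is summed. No `fill` mapping is used.
p <- ggplot(long, aes(x = month, y = value, group = SubSubProject)) +
  geom_col(position = "dodge", colour = "white") +
  facet_grid(~ Project + SubProject)
p
```

Because no `fill` is mapped, bar colours can later be set manually. This assumes each Sub-SubProject value appears in only one row; if it does not, a row id would be a safer choice for `group`.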
I have a data frame containing 5 probes which are my variables in a dataframe, cg02823866, cg13474877, cg14305799, cg15837913 and cg19724470. I want to create a boxplot that will group cg02823866 and cg14305799 into a group called 'GeneBody' and then cg13474877, cg14305799 and cg19724470 into a group called 'Promoter'. I then want to colour code the boxplots to represent the probe names. I can't figure out how to group those variables into groups to plot the graph.
I created an ungrouped boxplot of the five probes and it looked like this.
I want there to be the titles 'Promoter' and 'GeneBody' on the x axis. Above the 'GeneBody' title there are the 2 boxplots for the cg02823866 and cg14305799 probes. Then a 'Promoter' label with the boxplots for cg13474877, cg14305799 and cg19724470. I then want each boxplots colour coded to represent each different probe.
My data frame that I imported into RStudio looks like this: https://i.stack.imgur.com/r4gEC.png
Assuming you have some data with variable names Beta (your y axis), Probe (your current x axis), and group (either "GeneBody" or "Promoter"), you can do something like the following:
library(ggplot2)
ggplot(data, aes(x = group, y = Beta, fill = Probe)) +
geom_boxplot()
If you provide a reproducible set of data, I can probably do better.
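As a rough sketch of how the `group` column could be built when the probes are stored as separate columns: stack the data into long format and look up each probe's region in a named vector. The beta values here are randomly generated, and the probe-to-region mapping is an assumption. The question lists cg14305799 under both groups and never assigns cg15837913, so cg15837913 is placed under Promoter purely for illustration.

```r
library(ggplot2)

set.seed(1)  # reproducible made-up beta values
probes <- c("cg02823866", "cg13474877", "cg14305799", "cg15837913", "cg19724470")

# Hypothetical wide data: one column per probe, one row per sample.
wide <- as.data.frame(matrix(runif(10 * 5), ncol = 5, dimnames = list(NULL, probes)))

# Long format: one row per (sample, probe) measurement.
long <- stack(wide)                 # columns: values, ind
names(long) <- c("Beta", "Probe")

# Assumed probe-to-region lookup; adjust to the real annotation.
region <- c(
  cg02823866 = "GeneBody",
  cg14305799 = "GeneBody",
  cg13474877 = "Promoter",
  cg15837913 = "Promoter",  # assumption: not assigned in the question
  cg19724470 = "Promoter"
)
long$group <- unname(region[as.character(long$Probe)])

# Regions on the x axis, one coloured box per probe within each region.
p <- ggplot(long, aes(x = group, y = Beta, fill = Probe)) +
  geom_boxplot()
p
```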
Adding to Ben's answer, here is the traditional iris data.frame example, which you can easily load with data(iris):
ggplot(iris) +
aes(x = "", y = Sepal.Length, group = Species) +
geom_boxplot(shape = "circle", fill = "#112446") +
theme_minimal()
So you just need a column that indicates which group each row belongs to.
Of course, it gets more difficult with unclean data, where you might need to transpose or reshape the data first. But those are follow-up questions, I guess.
Also, if you want to make your life easier, use the esquisse RStudio add-in.
I would like to make a graph in R, which I managed to make in excel. It is a bargraph with species on the x-axis and the log number of observations on the y-axis. My current data structure in R is not suitable (I think) to make this graph, but I do not know how to change this (in a smart way).
I have (amongst others) a column 'camera_site' (site 1, site2..), 'species' (agouti, paca..), 'count'(1, 2..), with about 50.000 observations.
I tried making a dataframe with a column 'species' (with 18 species) and a column with 'log(total observation)' for each species (see dataframe), but then I can only make a point graph.
This is how I would like the graph to look:
[image: desired graph made in Excel]
Your data seems to be in the correct format from what I can tell from your screenshot.
The minimum amount of code you would need to get a plot like that would be the following, assuming your data.frame is called df:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
geom_col()
Many people intuitively try geom_bar(), but geom_col() is equivalent to geom_bar(stat = "identity"), which you would use if you've pre-computed observations and don't need ggplot to do the counting for you.
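To make the difference concrete, here is a small sketch with made-up sightings (the `raw` and `counts` data frames are hypothetical, not the asker's data). geom_bar() counts the rows itself, while geom_col() uses totals that were computed beforehand, and both end up with the same bar heights.

```r
library(ggplot2)

# Hypothetical raw observations: one row per sighting.
raw <- data.frame(species = c("agouti", "agouti", "paca", "agouti", "paca", "ocelot"))

# Pre-computed summary: one row per species with its total in column n.
counts <- as.data.frame(table(species = raw$species), responseName = "n")

p_raw     <- ggplot(raw, aes(species)) + geom_bar()        # ggplot counts rows
p_summary <- ggplot(counts, aes(species, n)) + geom_col()  # heights taken as given

# Bar heights computed by each layer; the two vectors match.
layer_data(p_raw)$y
layer_data(p_summary)$y
```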
But you could probably decorate the plot a bit better with some additions:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
geom_col() +
scale_x_discrete(name = "Species") +
scale_y_continuous(name = expression("Log"[10]*" Observations"),
expand = c(0,0,0.1,0)) +
theme(axis.text.x = element_text(angle = 90))
Of course, you could customize the theme any way you like.
Regards
I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had successfully reorganised the data, the y-values would increase as the plot moves along the x-axis. So maybe I'm focusing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this?
library(tidyverse);
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
Rather than trying to order the data before creating the plot, I can reorder it in the plotting call itself:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- This line controls the order in which points are drawn, to make a (slightly) more readable plot
ggplot(cor.data.sorted,aes(x=reorder(pic,r.val),y=r.val,size=df.val,color=exp)) + geom_point()
to create
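For anyone who cannot reach the CSV, the same idea can be tried on a small made-up data frame. `toy` below is a hypothetical stand-in that only borrows the column names from cor.data. It shows why sorting the rows has no effect on the axis, whereas reorder() does: ggplot2 follows the factor levels, not the row order.

```r
library(ggplot2)

# Hypothetical stand-in for cor.data, same column names, invented values.
toy <- data.frame(
  pic    = c("a", "b", "c", "d"),
  r.val  = c(0.4, -0.2, 0.9, 0.1),
  df.val = c(10, 20, 15, 12),
  exp    = c("x", "y", "x", "y")
)

# Sorting rows does not change the axis: a character column is
# converted to a factor whose levels are alphabetical.
sorted <- toy[order(toy$r.val), ]
levels(factor(sorted$pic))

# reorder() sets the factor levels by r.val, which is what the axis follows.
levels(reorder(toy$pic, toy$r.val))

p <- ggplot(toy, aes(x = reorder(pic, r.val), y = r.val, size = df.val, color = exp)) +
  geom_point()
p
```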
I am trying to create a custom label for an ordered stacked bar graph in ggplot2.
I have six different animals in my garden - a beaver, an elephant, a kangaroo, a mouse, a dragon and a chihuahua.
I asked them each to sing to me on two occasions, once when they were happy, and once when they were sad. I recorded how long they sang for on each occasion.
I want to plot the animals' total singing time in a stacked bar graph, with one stacked bar corresponding to one animal, and each component of the stacked bar corresponding to the animal’s mood, but I want to order the stacked bars by the animal’s size, with the animal’s name displayed underneath the bars.
In an attempt to do this, I created a column in my data frame that combines the size order information with the animal factor (e.g. "1.mouse", etc.). This allows the bars to be displayed in the size order. I then tried to use 'substring' to extract the letters corresponding to the name for the x label (so that it reads e.g. "mouse", etc.). That didn't work.
If I just use 'animal' to label the axis then ggplot labels the bars with the animal names listed in alphabetical order. I did try using the function 'order' too.
I have looked on Stack Overflow and other sites and can't find the exact problem elsewhere.
Many thanks from me and my menagerie!
animal<-rep(c("beaver","elephant","kangaroo","mouse","dragon","chihuahua"),2)
size_order<-rep(c(3,5,4,1,6,2),2)
mood<-c((rep("happy",6)),rep("sad",6))
singing_time<-as.numeric(rnorm(12, 5, 2))
ordered_animal<-paste(size_order,animal,sep = ".")
singing_data<-as.data.frame(cbind(mood,singing_time,ordered_animal))
ggplot(singing_data, aes(x = ordered_animal, y = singing_time, fill = mood, label = singing_time)) +
geom_bar(stat = "identity") +
scale_x_discrete(labels = levels(substring(as.factor(ordered_animal),3,10)))
Part of the problem is that your use of cbind is coercing different data types (character, numeric) into a matrix of a single data type (character), so singing_time is no longer numeric by the time it reaches ggplot. Try the data.frame constructor with vector arguments.
You don't need to put numbers into the factor levels, and you don't need an "ordered factor" (which is useful for regression and other modeling but not needed here). Just use a regular factor with levels= which will take care of the display order.
Your other problem is that your animal size order isn't right, so the example results don't look right.
animal_items <- c("beaver","elephant","kangaroo","mouse","dragon","chihuahua")
corrected_size_order<-c(4,6,1,3,2,5) # applies to animal_items
animal<-rep(animal_items,2)
ordered_animal <- factor(animal, levels=animal[corrected_size_order])
mood<-c((rep("happy",6)),rep("sad",6))
singing_time<-as.numeric(rnorm(12, 5, 2))
singing_data<-data.frame(mood,singing_time,ordered_animal)
ggplot(singing_data, aes(x = ordered_animal, y = singing_time, fill = mood, label = singing_time)) +
geom_bar(stat = "identity")
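The index vector above does not have to be worked out by hand. As a small sketch, assuming the size_order values from the question are ranks (1 = smallest), order() converts them into the positions used for the factor levels. The names `size_rank` and `size_positions` are introduced here for illustration only.

```r
animal_items <- c("beaver", "elephant", "kangaroo", "mouse", "dragon", "chihuahua")
size_rank    <- c(3, 5, 4, 1, 6, 2)   # rank of each animal, from the question

# order() turns ranks into positions: which element is smallest, next smallest, ...
size_positions <- order(size_rank)
animal_items[size_positions]

# Use that sequence as the factor levels; ggplot2 draws the x axis in level order.
ordered_animal <- factor(rep(animal_items, 2), levels = animal_items[size_positions])
levels(ordered_animal)
```

This gives the same positions as the hand-written corrected_size_order vector, so the rest of the answer's code can stay as it is.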
I am a new R user.
I am having a difficult time figuring out how to combine different barplots into one graph.
For example,
Suppose the top five professions in China are government employees, CEOs, doctors, athletes, and artists, with incomes (in dollars) of 20,000, 17,000, 15,000, 14,000, and 13,000 respectively, while the top five professions in the US are doctors, athletes, artists, lawyers, and teachers, with incomes (in dollars) of 40,000, 35,000, 30,000, 25,000, and 20,000 respectively.
I want to show the differences in one graph.
How am I supposed to do that? Note that the two countries do not share all of the same profession names.
The answer to the question is fairly straightforward. As a new R user, I recommend you make liberal use of the 'ggplot2' package. For many R users, this one package is enough.
To get the "combined" barchart described in the original post, the answer is to put all of the data into one dataset and then add grouping variables, like so:
Step 1: Make the dataset.
data <- read.table(text="
Country,Profession,Income
China,Government employee,20000
China,CEO,17000
China,Doctor,15000
China,Athlete,14000
China,Artist,13000
USA,Doctor,40000
USA,Athlete,35000
USA,Artist,30000
USA,Lawyer,25000
USA,Teacher,20000", header=TRUE, sep=",")
You'll notice I'm using the 'read.table' function here. This is not required and is purely for readability in this example. The important part is that we have our values (Income) and our grouping variables (Country, Profession).
Step 2: Create a barchart with Income as the height of the bars, Profession as the x-axis, and color the bars by Country.
library(ggplot2)
ggplot(data, aes(x=Profession, y=Income, fill=Country)) +
geom_bar(stat="identity", position="dodge") +
theme(axis.text.x = element_text(angle = 90))
Here we are first loading the 'ggplot2' package. You may need to install this.
Then, we specify what data we want to use and how to separate it.
ggplot(data, aes(x=Profession, y=Income, fill=Country))
This tells 'ggplot' to use our dataset in the 'data' data frame. The aes() command specifies how 'ggplot' should read the data. We map the grouping variable Profession onto the x-axis, map the Income onto the y-axis, and change the color (fill) of each bar according to the grouping variable Country.
Next, we specify what kind of barchart we want.
geom_bar(stat="identity", position="dodge")
This tells 'ggplot' to make a barchart (geom_bar()). By default, the 'geom_bar' function counts the number of rows in each x-axis category (stat="count"), much like a histogram, but we already have the totals we want to use. We tell it to use the values in Income as the bar heights by specifying stat="identity". Finally, I made a judgement call about how to display the data and decided to place the bars side by side when a single profession has multiple income values (position="dodge").
Finally, we need to rotate the x-axis labels, since some of them are quite long. We do this with a simple 'theme' command that changes the rotation of the x-axis text elements.
theme(axis.text.x = element_text(angle = 90))
We chain all of these commands together with the +, and it's done!