I'm trying to make a barplot.
The data are in data frames. Each data frame has several columns, one named ID and another named count.
First I want to group these counts. The barplot should show count=0, count=1, count=2 and count>=3.
Some example data:
data1 <- data.frame(ID="ID_1", count=(rep(seq(0,10,by=1),each=4)))
data2 <- data.frame(ID="ID_2", count=(rep(seq(0,10,by=1),each=4)))
data3 <- data.frame(ID="ID_3", count=(rep(seq(0,10,by=1),each=4)))
Obviously the barplots of these example data frames will all look the same.
I tried to do this in ggplot (it's not nice at all):
ggplot(data1)+
geom_bar(aes(x = ID, fill = count),position = "fill")+
geom_bar(data=data2,aes(x = ID, fill = count),position = "fill")+
geom_bar(data=data3,aes(x = ID, fill = count),position = "fill")
I got something like this:
What I'm trying to do is show different groups within each bar: the proportion of counts equal to 0, the proportion equal to 1, the proportion equal to 2, and the proportion greater than or equal to 3.
I expect something like this:
But of course in my example the bars will all look the same.
Also, any suggestion for changing the Y axis from 1.00 to 100% is welcome.
Another issue is that my real data frames do not all have the same length, but that should not matter because I want the percentage of each count group.
You need to put all the data in one data frame, in long format. Then convert your counts to a factor, and it works.
library(ggplot2)
library(dplyr) # for bind_rows()
ggplot(bind_rows(data1, data2, data3)) +
geom_bar(aes(x = ID, fill = as.factor(count)), position = "fill") +
scale_y_continuous(labels = scales::percent) # show the Y axis as percentages
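The grouping asked for in the question (0, 1, 2, >=3) can be sketched by binning the counts before plotting. This is only one possible approach: it assumes the counts are non-negative integers, and the names all_data and count_group are made up for illustration.

```r
library(ggplot2)
library(dplyr)

# Example data from the question
data1 <- data.frame(ID = "ID_1", count = rep(0:10, each = 4))
data2 <- data.frame(ID = "ID_2", count = rep(0:10, each = 4))
data3 <- data.frame(ID = "ID_3", count = rep(0:10, each = 4))

# Bin the counts: (-Inf,0] -> "0", (0,1] -> "1", (1,2] -> "2", (2,Inf] -> ">=3"
all_data <- bind_rows(data1, data2, data3) %>%
  mutate(count_group = cut(count,
                           breaks = c(-Inf, 0, 1, 2, Inf),
                           labels = c("0", "1", "2", ">=3")))

# One bar per ID, filled by the binned count, scaled to 100%
p <- ggplot(all_data) +
  geom_bar(aes(x = ID, fill = count_group), position = "fill") +
  scale_y_continuous(labels = scales::percent)
p
```

Because position = "fill" rescales each bar to 100%, data frames of different lengths are not a problem.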
So I tried the following to create my barplot:
data1$var="first"
data2$var="second"
data3$var="third"
data4$var="fourth"
data5$var="fifth"
full_data=rbind(data1,data2,data3,data4,data5)
ggplot(full_data) +
geom_bar(aes(x = var, fill = as.factor(count)), position = "fill")+
scale_y_continuous(labels=scales::percent)
So I got something like this:
Does anyone have a solution for making the different count groups: count=0, count=1, count=2, count>=3?
Related
I'm trying to make a barplot with ggplot.
I have several data frames (example data below).
Each data frame has a column "count", and a lot of the values are count==0.
I want to make a barplot of my data that excludes the 0 values from the visualization but keeps the original percentages.
For example, if 80% of my data are 0, I want to plot only the values != 0 but have the Y axis top out at 20% (that way I can easily visualize my data and still keep the information about the 0 values).
If you have a better suggestion for representing my data, I'm open to it.
My other problem is that I want to merge some groups of "count": in my plot I want count=1, count=2 and count>=3, and I don't know how to get that. I was thinking of maybe making a count matrix?
Here is some example data:
#Stackoverflow example
data1=data.frame(count=c(rep(0,70),rep(1,15),rep(2,10),rep(3,3),5,7))
data2=data.frame(count=c(rep(0,140),rep(1,30),rep(2,20),rep(3,6),5,5,7,7))
data3=data.frame(count=c(rep(0,120),rep(1,20),rep(2,7),5,7,9))
data1$var="first"
data2$var="second"
data3$var="third"
all_df=rbind(data1,data2,data3)
#Plot all values : Plot 1
ggplot(all_df) +
geom_bar(aes(x = var, fill = as.factor(count)), position = "fill")+
scale_y_continuous(labels=scales::percent)
#Plot value greater than 0 : Plot 2
ggplot(all_df[which(all_df$count>0),]) +
geom_bar(aes(x = var, fill = as.factor(count)), position = "fill")+
scale_y_continuous(labels=scales::percent)
This is what I got with all the data:
And this is what I tried in order to exclude the 0 values, but I don't know how to keep the information about the missing 0 values (80% of the data). Instead of having 100% at the top of the Y axis, I'm trying to get (1-(% count==0)).
I also want to group count>=3, so that instead of 1,2,3,5,7,9 the legend shows 1,2,>=3.
To do that I was thinking of building a count table in a new data frame: for each data frame, count the rows with count=0, count=1, count=2 and count>=3. But after that I don't know how to continue. Here is what I tried:
count_df=function(a,b,c){
data.frame(first=c(sum(a$count==0),sum(a$count==1),sum(a$count==2),sum(a$count>=3)),
second=c(sum(b$count==0),sum(b$count==1),sum(b$count==2),sum(b$count>=3)),
third=c(sum(c$count==0),sum(c$count==1),sum(c$count==2),sum(c$count>=3)))
}
count_table=count_df(data1,data2,data3)
rownames(count_table)=c("0","1","2",">=3")
You could set the color of the zero count to transparent. That way you do not need to change your data frame at all.
Using the handy gg_color_hue function (a helper that reproduces ggplot2's default hues), you can then do this:
gg_color_hue <- function(n) {
hues = seq(15, 375, length = n + 1)
hcl(h = hues, l = 65, c = 100)[1:n]
}
counts <- sort(unique(all_df$count)) # sorted so the colors line up with the factor levels
counts <- counts[counts != 0] # drop 0 so it is left out of the legend
colors <- c('transparent', gg_color_hue(length(counts)))
#Plot all values : Plot 1
ggplot(all_df) +
geom_bar(aes(x = var, fill = as.factor(count)), position = "fill")+
scale_y_continuous(labels=scales::percent) +
scale_fill_manual(values=colors, breaks=counts)
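An alternative that also merges the counts into 1, 2 and >=3 is to compute the proportions yourself, relative to all rows including the zeros, and only then drop the zero group. The following is a sketch of that idea, not a tested solution for the real data; prop_df, count_group and prop are names invented here.

```r
library(ggplot2)
library(dplyr)

# Example data from the question
data1 <- data.frame(count = c(rep(0, 70), rep(1, 15), rep(2, 10), rep(3, 3), 5, 7))
data2 <- data.frame(count = c(rep(0, 140), rep(1, 30), rep(2, 20), rep(3, 6), 5, 5, 7, 7))
data3 <- data.frame(count = c(rep(0, 120), rep(1, 20), rep(2, 7), 5, 7, 9))
data1$var <- "first"
data2$var <- "second"
data3$var <- "third"
all_df <- rbind(data1, data2, data3)

prop_df <- all_df %>%
  # Merge every count of 3 or more into a single ">=3" group
  mutate(count_group = factor(ifelse(count >= 3, ">=3", as.character(count)),
                              levels = c("0", "1", "2", ">=3"))) %>%
  count(var, count_group) %>%        # rows per var and group, stored in n
  group_by(var) %>%
  mutate(prop = n / sum(n)) %>%      # share of ALL rows, zeros included
  ungroup() %>%
  filter(count_group != "0")         # hide the zeros, keep the original shares

# geom_col() plots the precomputed proportions as they are
ggplot(prop_df) +
  geom_col(aes(x = var, y = prop, fill = count_group)) +
  scale_y_continuous(labels = scales::percent)
```

Each bar then tops out at 1 minus the share of zeros (30% for "first" in the example data) rather than at 100%.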
I have a data frame with 16 observations of 5 columns. One of the columns is called "Names". Within that column I have car1-6, truck1-5 and train1-5, which make up my 16 observations. I have:
ggplot(dftest, aes(x = Names, y= AVGMostLikely, ymin= BestCaseHi, ymax=WorstCaseLow)) +
geom_bar(stat = "identity") +
geom_errorbar() +
ggtitle("Bar chart with Error Bars")
I want to have the fill/color of the bars to be based on the name where car1-6 will be one color, truck1-5 another, and train1-5 are a third color. Is this possible within ggplot?
Thanks for any help
Here is code to remove the last character (the number) from the values in your Names column:
char_array = c("car1","car2","truck1","truck5")
data = data.frame("names"=char_array)
data$names = as.character(data$names)
data$groups = substr(data$names,1,nchar(data$names)-1)
Then you have a new column named groups, which you can use as the fill aesthetic in ggplot.
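As a sketch of that last step: the toy dftest below is a stand-in with invented values (only the column names Names and AVGMostLikely come from the question). It strips all trailing digits with sub() instead of dropping exactly one character, which is a variation on the code above that also copes with suffixes such as car10.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data; the values are made up
dftest <- data.frame(
  Names = c(paste0("car", 1:6), paste0("truck", 1:5), paste0("train", 1:5)),
  AVGMostLikely = seq(10, 25, length.out = 16)
)

# Remove every trailing digit, so "car1" and "car10" both become "car"
dftest$groups <- sub("[0-9]+$", "", dftest$Names)

# Map the new column to fill: one color per vehicle type
p <- ggplot(dftest, aes(x = Names, y = AVGMostLikely, fill = groups)) +
  geom_bar(stat = "identity")
p
```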
I am learning geom_bar from section 3.7 of r4ds.had.co.nz. I ran code like this:
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
Then I have this plot:
The point is, if I leave out the "group = 1" part:
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop..))
the plot is wrong.
But if I replace group = 1 by group = 2 or group = "x", the plot still looks correct. So I don't quite understand the meaning of group = 1 here and how to use it.
group="whatever" is a "dummy" grouping to override the default behavior, which (here) is to group by cut and in general is to group by the x variable. The default for geom_bar is to group by the x variable in order to separately count the number of rows in each level of the x variable. For example, here, the default would be for geom_bar to return the number of rows with cut equal to "Fair", "Good", etc.
However, if we want proportions, then we need to consider all levels of cut together. In the second plot, the data are first grouped by cut, so each level of cut is considered separately. The proportion of Fair in Fair is 100%, as is the proportion of Good in Good, etc. group=1 (or group="x", etc.) prevents this, so that the proportions of each level of cut will be relative to all levels of cut.
In other words, with group = 1 all rows are treated as a single group, so each bar shows the proportion of that cut in the whole dataset, for example the proportion of Ideal cuts among all diamonds.
If group is not set, the proportion is calculated only within the rows that have that cut, so it always comes out as 100%. For instance, the proportion of Ideal cuts among the Ideal-cut rows is 1.
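The two calculations can be reproduced outside ggplot to see the difference. This is a rough illustration using base R on the diamonds data, not what ggplot2 runs internally; overall_prop and per_group_prop are names chosen here.

```r
library(ggplot2) # only needed for the diamonds dataset and the final plot

cuts <- diamonds$cut

# One group containing every row: each cut's share of the whole dataset
overall_prop <- prop.table(table(cuts))

# One group per cut: the share of that cut among rows that all have that cut
per_group_prop <- sapply(levels(cuts), function(lv) mean(cuts[cuts == lv] == lv))

overall_prop    # shares that add up to 1
per_group_prop  # every value is 1, which is the "wrong" plot

# after_stat(prop) is the newer spelling of ..prop.. (ggplot2 3.3.0 and later)
ggplot(diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))
```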
This is my first time submitting a question, so apologies in advance if my formatting is not optimal.
I have a dataframe with roughly 6,000 rows of data in 2 columns, and I want to be able to pull out individual rows (and multiple rows together) to barplot.
I read my file in as a dataframe, here is a very small subset:
gene log2
1 SMa0002 0.457418
2 SMa0005 1.116950
3 SMa0007 0.686749
4 SMa0009 0.169450
5 SMa0011 0.393365
6 SMa0013 0.601940
So what I would want to be able to do is have a barplot where the x axis is a number of genes (SMaXXX, SMaXXX, SMaXXX, etc.), and the y-axis is the log2 column. It only has (+) values displayed, but there are (-) values as well. I have no real preference about whether I use barplot or geom_bar in ggplot2, or another plotter.
I know how to just plot the dataframe;
ggplot(df, aes(x = gene, y = log2)) + geom_bar(stat = "identity")
I've tried playing around with using 'match' but I haven't been able to figure out how to make that work. Ideally the code is versatile so I can just punch in different SMaXXXX codes to generate many different plots.
Thanks for reading!
It seems that you just need a way to subset your data.frame when plotting, right?
Let's assume you've got a vector subset.genes of the genes you need to plot:
df=data.frame(gene=c("SMa0002","SMa0005","SMa0006","SMa0007","SMa0011","SMa0013"),
"log2"=runif(6), stringsAsFactors=FALSE)
subset.genes=sample(unique(df$gene), 4, replace=FALSE)
A couple of ways:
1°) Inside ggplot2
ggplot(df, aes(x = gene, y = log2)) + geom_bar(stat = "identity") +
scale_x_discrete(limits=subset.genes)
2°) before:
df2 <- subset(df, gene %in% subset.genes)
ggplot(df2, aes(x = gene, y = log2)) + geom_bar(stat = "identity")
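To just punch in different gene codes, the second approach can be wrapped in a small function. plot_genes() below is a hypothetical helper written for this answer, not part of ggplot2; it assumes the columns are called gene and log2 as in the question, and it keeps the genes in the order you type them.

```r
library(ggplot2)

# Hypothetical helper: bar plot of log2 for the requested genes only
plot_genes <- function(df, genes) {
  not_found <- setdiff(genes, df$gene)
  if (length(not_found) > 0) {
    warning("Genes not found: ", paste(not_found, collapse = ", "))
  }
  sub_df <- df[df$gene %in% genes, ]
  # Fix the factor levels so the bars follow the order of `genes`
  sub_df$gene <- factor(sub_df$gene, levels = intersect(genes, sub_df$gene))
  ggplot(sub_df, aes(x = gene, y = log2)) +
    geom_col()
}

# The small subset shown in the question
df <- data.frame(
  gene = c("SMa0002", "SMa0005", "SMa0007", "SMa0009", "SMa0011", "SMa0013"),
  log2 = c(0.457418, 1.116950, 0.686749, 0.169450, 0.393365, 0.601940),
  stringsAsFactors = FALSE
)

p <- plot_genes(df, c("SMa0011", "SMa0002", "SMa0007"))
p
```

Negative log2 values need no special handling; geom_col() draws those bars below zero.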
I have a data frame that contains 4 variables: an ID number (chr), a degree type (factor w/ 2 levels of Grad and Undergrad), a degree year (chr with year), and Employment Record Type (factor w/ 6 levels).
I would like to display this data as a count of the unique ID numbers by year as a stacked area plot of the 6 Employment Record Types. So, count of # of ID numbers on the y-axis, degree year on the x-axis, the value of x being number of IDs for that year, and the fill will handle the Record Type. I am using ggplot2 in RStudio.
I used the following code, but the y axis does not count distinct IDs:
ggplot(AlumJobStatusCopy, aes(x=Degree.Year, y=Entity.ID,
fill=Employment.Data.Type)) + geom_freqpoly() +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
I also tried setting y = Entity.ID to y = ..count.. and that did not work either. I have searched for solutions as it seems to be a problem with how I am writing the aes code.
I also tried the following code based on examples of similar plots:
ggplot(AlumJobStatusCopy, aes(interval)) +
geom_area(aes(x=Degree.Year, y = Entity.ID,
fill = Employment.Data.Type)) +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
This does not even seem to work. I've read the documentation and am at my wit's end.
EDIT:
After figuring out the answer to the problem, I realized that I was not actually using the correct values for my Year variable. A count tells me nothing as I am trying to display the rise in a lack of records and the decline in current records.
My Dataset:
Year, int, 1960-2015
Current Record, num: % of total records that are current
No Record, num: % of total records that are not current
Ergo each Year value has two corresponding percent values. I am now using 2 lines instead of an area plot since the Y axis has distinct values instead of a count function, but I would still like the area under the curves filled. I tried using Melt to convert the data from wide to long, but was still unable to fill both lines. Filling is just for aesthetic purposes as I would like to use a gradient for each with 1 fill being slightly lighter than the other.
Here is my current code:
ggplot(Alum, aes(Year)) +
geom_line(aes(y = Percent.Records, colour = "Percent.Records")) +
geom_line(aes(y = Percent.No.Records, colour = "Percent.No.Records")) +
scale_y_continuous(labels = percent) + ylab('Percent of Total Records') +
ggtitle("Active, Living Alumni Employment Record") +
scale_x_continuous(breaks=seq(1960, 2014, by=5))
I cannot post an image yet.
I think you're missing a step where you summarize the data to get the quantities to plot on the y-axis. Here's an example with some toy data similar to how you describe yours:
# Make toy data with three levels of employment type
set.seed(1)
df <- data.frame(Entity.ID = rep(LETTERS[1:10], 3), Degree.Year = rep(seq(1990, 1992), each=10),
Degree.Type = sample(c("grad", "undergrad"), 30, replace=TRUE),
Employment.Data.Type = sample(as.character(1:3), 30, replace=TRUE))
# Here's the part you're missing, where you summarize for plotting
library(dplyr)
dfsum <- df %>%
group_by(Degree.Year, Employment.Data.Type) %>%
tally()
# Now plot that, using the sums as your y values
library(ggplot2)
ggplot(dfsum, aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
The result could use some fine-tuning, but I think it's what you mean. Here, the bars are of equal height because each year in the toy data includes an equal number of IDs; if the count of IDs varied, so would the total bar height.
If you don't want to add objects to your workspace, just do the summing in the call to ggplot():
ggplot(tally(group_by(df, Degree.Year, Employment.Data.Type)),
aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
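One caveat: tally() counts rows, so if the same ID can appear more than once within a year and employment type, the bars would overcount. A possible variation, assuming that is the situation in the real data, is to count distinct IDs with n_distinct(). The duplicated rows in df_dup are added artificially here just to show the difference.

```r
library(dplyr)
library(ggplot2)

# Toy data as above, plus five artificially duplicated rows
set.seed(1)
df <- data.frame(Entity.ID = rep(LETTERS[1:10], 3),
                 Degree.Year = rep(seq(1990, 1992), each = 10),
                 Employment.Data.Type = sample(as.character(1:3), 30, replace = TRUE))
df_dup <- rbind(df, df[1:5, ])

# Count each ID once per year and employment type
dfsum <- df_dup %>%
  group_by(Degree.Year, Employment.Data.Type) %>%
  summarise(n_ids = n_distinct(Entity.ID), .groups = "drop")

ggplot(dfsum, aes(x = Degree.Year, y = n_ids, fill = Employment.Data.Type)) +
  geom_col() + labs(fill = "Employment")
```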