I have a plot that has 16 observations on 5 columns. One of the columns is called "Names". Within the Names column, I have car1-6, truck1-5, and train1-5, which make up my 16 observations. I have:
ggplot(dftest, aes(x = Names, y= AVGMostLikely, ymin= BestCaseHi, ymax=WorstCaseLow)) +
geom_bar(stat = "identity") +
geom_errorbar() +
ggtitle("Bar chart with Error Bars")
I want the fill/color of the bars to be based on the name, so that car1-6 are one color, truck1-5 another, and train1-5 a third. Is this possible within ggplot?
Thanks for any help
Here is the code to remove the last character of each value in your Names column (which is the number):
char_array = c("car1","car2","truck1","truck5")
data = data.frame("names"=char_array)
data$names = as.character(data$names)
data$groups = substr(data$names,1,nchar(data$names)-1)
Then you have a new column named groups, which you can use as the fill aesthetic in ggplot.
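For completeness, here is a minimal sketch of how the derived column could be plugged into the plot from the question. The dftest values below are made up purely for illustration (the real data was not posted), and sub() with a regular expression is swapped in for substr() as an assumption, so that names with more than one trailing digit would also work.

```r
library(ggplot2)

# Hypothetical stand-in for dftest; all numbers are invented for illustration
dftest <- data.frame(
  Names = c(paste0("car", 1:6), paste0("truck", 1:5), paste0("train", 1:5)),
  AVGMostLikely = seq(10, 25, length.out = 16)
)
dftest$BestCaseHi <- dftest$AVGMostLikely - 2    # used as ymin, as in the question
dftest$WorstCaseLow <- dftest$AVGMostLikely + 2  # used as ymax, as in the question

# Strip all trailing digits, so "car1" and a name like "car10" both become "car"
dftest$groups <- sub("[0-9]+$", "", dftest$Names)

# Map the new column to fill so each vehicle type gets its own color
p <- ggplot(dftest, aes(x = Names, y = AVGMostLikely,
                        ymin = BestCaseHi, ymax = WorstCaseLow,
                        fill = groups)) +
  geom_bar(stat = "identity") +
  geom_errorbar(width = 0.3) +
  ggtitle("Bar chart with Error Bars")
print(p)
```

If you want specific colors rather than the defaults, scale_fill_manual() with a named vector (one entry per group) can be added to the plot.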
I produced a geom_col() that has 13 separate columns on it. I would like to assign a specific color to the columns: for example, I have "Teams" on the x axis, and "AVG Attendance" of the teams on the y axis.
Of the 13 teams, I would like my specific team's column to be colored red so it stands out, four of the other columns (teams newly added to league) to be in green, and the other 9 existing teams to be in blue.
I can only get ALL of the columns to be one color if I use - geom_col(fill = "blue").
How do I differentiate the columns to have a separate color grouped by the way I described above? I have spent many days googling this and I can't find a way to do it.
Notes in the code...
# need dplyr and ggplot
library(tidyverse)
# make our random numbers the same
set.seed(123)
# fake minimum reproducible example
game_df <- tibble(team = LETTERS[1:13],
attendance = sample(5000:10000, 13),
# this is hand coded but you should find a way to do this automatically using existing data
# or use mutate() to create a new column with a calculation
color_group = c('old','old','new', 'new', 'old', 'highlight','old','new','old','new','old','old','old') )
# plot
game_df %>%
ggplot (., aes(x = team, y = attendance, fill = color_group)) +
geom_col() +
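# fill (not color) is the mapped aesthetic, so use scale_fill_manual(); named values pin each group to its color
scale_fill_manual(values = c(highlight = 'red', new = 'green', old = 'blue'))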
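The comments in the code above suggest deriving color_group instead of typing it by hand. One possible sketch uses mutate() with case_when(). The names my_team and new_teams are hypothetical placeholders for however you identify your own team and the newly added teams.

```r
library(dplyr)
library(ggplot2)

# Hypothetical inputs: the team to highlight and the teams new to the league
my_team <- "F"
new_teams <- c("C", "D", "H", "J")

set.seed(123)
game_df <- tibble(team = LETTERS[1:13],
                  attendance = sample(5000:10000, 13)) %>%
  # case_when() checks the conditions in order; TRUE is the fallback
  mutate(color_group = case_when(
    team == my_team     ~ "highlight",
    team %in% new_teams ~ "new",
    TRUE                ~ "old"
  ))

# Named values tie each group to a color regardless of level order
p <- ggplot(game_df, aes(x = team, y = attendance, fill = color_group)) +
  geom_col() +
  scale_fill_manual(values = c(highlight = "red", new = "green", old = "blue"))
print(p)
```

With these placeholder values the derived column matches the hand-coded vector from the example.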
I would really appreciate some insight on the zigzagging I get when using the following code in R:
tbi_military %>%
ggplot(aes(x = year, y = diagnosed, color = service)) +
geom_line() +
facet_wrap(vars(severity))
The dataset is comprised of 5 variables (3 character, 2 numerical). Any insight would be so appreciated.
This is just an illustration with a standard dataset. Let's say we're interested in plotting the weight of chicks over time depending on a diet. We would attempt to plot this like so:
library(ggplot2)
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line()
You can see the zigzag pattern appear because there are multiple observations per diet at each time point. Because geom_line connects the observations in order of the x variable, this shows up as a vertical line spanning the range of data points at that time for each diet.
The data has an additional variable called 'Chick' that separates out individual chicks. Including that in the grouping resolves the zigzag pattern and every line is the weight over time per individual chick.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(aes(group = interaction(Chick, Diet)))
If you don't have an extra variable that separates out individual trends, you could instead choose to summarise the data per timepoint by, for example, taking the mean at every timepoint.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(stat = "summary", fun = mean)
Created on 2021-08-30 by the reprex package (v1.0.0)
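If you are not sure whether your own data has this problem, a quick check is to count the observations for each combination of the colour variable and the x variable. This is only a sketch using the same built-in ChickWeight data; for the data in the question you would substitute service (plus severity) and year.

```r
# Count rows for every Diet/Time combination in the built-in ChickWeight data
obs_per_point <- with(ChickWeight, table(Diet, Time))
print(obs_per_point)

# TRUE means at least one line has several y values at the same x,
# which is what produces the vertical zigzag segments
has_duplicates <- max(obs_per_point) > 1
print(has_duplicates)
```

Any cell larger than 1 means you need either an extra grouping variable or a summary per x value, as described above.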
I have a CSV file in which the variable parameter has 3 levels (C, C+Fe, Fe). I want to group the boxplots by parameter using geom_boxplot(fill = parameter), but for only two of the levels, not all 3.
The current code is: geom_boxplot(aes(x = gene, y = RA, fill = parameter)), which yields a plot with boxes for all three levels.
However, I want to eliminate the blue boxplot, which corresponds to one of the parameter levels.
You may just filter the observations before plotting. Assuming your data frame is called df:
df2 <- subset(df, parameter != "Fe")
ggplot(df2, aes(x = gene, y=RA, fill=parameter)) +
geom_boxplot()
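A variation on the same idea, as a sketch: list the levels you want to keep rather than the one you want to drop, and call droplevels() so the unused level also disappears from the factor. The data frame below is invented for illustration, since the CSV was not posted.

```r
library(ggplot2)

set.seed(1)
# Made-up data standing in for the CSV from the question
df <- data.frame(
  gene = rep(c("geneA", "geneB"), each = 30),
  parameter = factor(rep(c("C", "C+Fe", "Fe"), times = 20)),
  RA = runif(60)
)

# Keep only the wanted levels, then drop the now-empty factor level
keep <- c("C", "C+Fe")
df2 <- droplevels(subset(df, parameter %in% keep))

p <- ggplot(df2, aes(x = gene, y = RA, fill = parameter)) +
  geom_boxplot()
print(p)
```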
I'm trying to make a barplot.
The data are in data frames. Each data frame has several columns, one named ID and another named count.
First, I'm trying to group the counts. The barplot should show count=0, count=1, count=2, and count>=3.
Some example data:
data1 <- data.frame(ID="ID_1", count=(rep(seq(0,10,by=1),each=4)))
data2 <- data.frame(ID="ID_2", count=(rep(seq(0,10,by=1),each=4)))
data3 <- data.frame(ID="ID_3", count=(rep(seq(0,10,by=1),each=4)))
Obviously, the barplots of these data frames will look the same.
I tried to make this in ggplot (it's not nice at all):
ggplot(data1)+
geom_bar(aes(x = ID, fill = count),position = "fill")+
geom_bar(data=data2,aes(x = ID, fill = count),position = "fill")+
geom_bar(data=data3,aes(x = ID, fill = count),position = "fill")
What I'm trying to do is show different groups within each bar: the proportion of counts equal to 0, equal to 1, equal to 2, and greater than or equal to 3.
Of course, in my example the bars will all look the same.
Also, if you have a suggestion for changing the Y axis from 1.00 to 100%, I would appreciate it.
One more thing: my real data frames do not all have the same length, but that should not matter because I am plotting the percentage of each count group.
You need to put all the data in 1 dataframe, in long format. Then cast your counts to factors, and it works.
library(dplyr) # for bind_rows()
library(ggplot2)
ggplot(bind_rows(data1, data2, data3)) +
geom_bar(aes(x = ID, fill = as.factor(count)), position = "fill") +
scale_y_continuous(labels = scales::percent) # show the Y axis as percentages
So I did something to try to create my barplot
data1$var="first"
data2$var="second"
data3$var="third"
data4$var="fourth"
data5$var="fifth"
full_data=rbind(data1,data2,data3,data4,data5)
ggplot(full_data) +
geom_bar(aes(x = var, fill = as.factor(count)), position = "fill")+
scale_y_continuous(labels=scales::percent)
Does anyone have a solution for making different groups of counts: count=0, count=1, count=2, count>=3?
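One way this grouping could be done, as a sketch rather than a definitive answer, is to bin the counts with cut() before plotting. The column name count_group and the labels are my own choices, and the example reuses the three example data frames from the question.

```r
library(ggplot2)

# Example data from the question
data1 <- data.frame(ID = "ID_1", count = rep(seq(0, 10, by = 1), each = 4))
data2 <- data.frame(ID = "ID_2", count = rep(seq(0, 10, by = 1), each = 4))
data3 <- data.frame(ID = "ID_3", count = rep(seq(0, 10, by = 1), each = 4))
all_data <- rbind(data1, data2, data3)

# Intervals are right-closed: (-Inf,0], (0,1], (1,2], (2,Inf],
# so integer counts fall into 0, 1, 2, and 3 or more
all_data$count_group <- cut(all_data$count,
                            breaks = c(-Inf, 0, 1, 2, Inf),
                            labels = c("0", "1", "2", ">=3"))

# position = "fill" converts each bar to proportions, so unequal
# data frame lengths do not matter
p <- ggplot(all_data, aes(x = ID, fill = count_group)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
print(p)
```

This assumes the counts are non-negative whole numbers; with fractional values the interval boundaries would need rethinking.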
This is my first time submitting a question, so apologies in advance if my formatting is not optimal.
I have a dataframe with roughly 6,000 rows of data in 2 columns, and I want to be able to pull out individual rows (and multiple rows together) to barplot.
I read my file in as a dataframe, here is a very small subset:
gene log2
1 SMa0002 0.457418
2 SMa0005 1.116950
3 SMa0007 0.686749
4 SMa0009 0.169450
5 SMa0011 0.393365
6 SMa0013 0.601940
So what I want is a barplot where the x axis shows a chosen set of genes (SMaXXX, SMaXXX, SMaXXX, etc.) and the y axis shows the log2 column. The subset shown here has only positive values, but there are negative values as well. I have no real preference about whether I use barplot or geom_bar in ggplot2, or another plotter.
I know how to just plot the whole dataframe:
ggplot(df, aes(x = gene, y = log2)) + geom_bar(stat = "identity")
I've tried playing around with using 'match' but I haven't been able to figure out how to make that work. Ideally the code is versatile so I can just punch in different SMaXXXX codes to generate many different plots.
Thanks for reading!
It seems that you just need a way to subset your data.frame when plotting, right?
Let's assume you've got a vector subset.genes of the genes you need to plot:
df <- data.frame(gene = c("SMa0002","SMa0005","SMa0006","SMa0007","SMa0011","SMa0013"),
log2 = runif(6), stringsAsFactors = FALSE)
subset.genes <- sample(unique(df$gene), 4, replace = FALSE)
A couple of ways:
1) Inside ggplot2:
ggplot(df, aes(x = gene, y = log2)) + geom_bar(stat = "identity") +
scale_x_discrete(limits=subset.genes)
2) Before plotting:
df2 <- subset(df, gene %in% subset.genes)
ggplot(df2, aes(x = gene, y = log2)) + geom_bar(stat = "identity")
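Since the question asks for something versatile, the second approach could be wrapped in a small helper. This is only a sketch: plot_genes is a hypothetical name, and the data frame is rebuilt from the six rows shown in the question.

```r
library(ggplot2)

# Hypothetical helper: subset the data to the requested genes and plot them
plot_genes <- function(data, genes) {
  missing_genes <- setdiff(genes, data$gene)
  if (length(missing_genes) > 0) {
    warning("Not found in data: ", paste(missing_genes, collapse = ", "))
  }
  sub_df <- data[data$gene %in% genes, , drop = FALSE]
  # Keep the bars in the order the genes were requested
  sub_df$gene <- factor(sub_df$gene, levels = intersect(genes, sub_df$gene))
  ggplot(sub_df, aes(x = gene, y = log2)) +
    geom_bar(stat = "identity")
}

# The six rows shown in the question
df <- data.frame(
  gene = c("SMa0002", "SMa0005", "SMa0007", "SMa0009", "SMa0011", "SMa0013"),
  log2 = c(0.457418, 1.116950, 0.686749, 0.169450, 0.393365, 0.601940)
)

p <- plot_genes(df, c("SMa0011", "SMa0002"))
print(p)
```

Calling plot_genes() again with a different vector of SMaXXXX codes gives a new plot without touching the rest of the code.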