Saturation curve with geom_step in R

I have two problems. I would like to create a graph with multiple lines by adding up the values in each column stepwise (it should end up looking like multiple saturation curves). I think geom_step in the ggplot2 package should work. However, I don't know how to accumulate the values in the columns as I go, and I don't know how to add multiple lines. I will have over 100 lines, so both steps should be automated in some way.
This sample contains only the first 3 columns and the first 13 rows of my data.
a<-c(0,1,1,1,1,1,1,0,1,0,1,0,1)
b<-c(0,1,0,0,1,0,1,0,1,0,1,0,1)
c<-c(0,1,1,0,1,0,1,1,1,1,1,1,1)
df<-data.frame(a,b,c)
Can anyone help me? I have no idea where to start.

If you're looking for cumulative sums of the data, the cumsum() function will do it for you.
It isn't completely clear to me what you're looking for, but this might take care of it:
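library(dplyr)
library(tidyr)
library(ggplot2)
a<-c(0,1,1,1,1,1,1,0,1,0,1,0,1)
b<-c(0,1,0,0,1,0,1,0,1,0,1,0,1)
c<-c(0,1,1,0,1,0,1,1,1,1,1,1,1)
df2<-data.frame(a,b,c)
df3 <- df2 %>%
  mutate_all(cumsum) %>%          # cumulative sum of each column
  rename_all(paste0, 'x') %>%     # ax, bx, cx hold the cumulative sums
  cbind(df2) %>%                  # keep the original 0/1 columns alongside
  mutate(row = row_number()) %>%  # row index for the x-axis
  pivot_longer(ax:c)
ggplot(df3) +
  geom_step(aes(x = row, y = value, color = name))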
The data was reshaped to longer data for ease of plotting. Original data was left in as well, those are the lines that stay near the bottom of the graph.
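If only the cumulative curves are wanted, without the original 0/1 columns, a shorter variant might look like the sketch below. It assumes dplyr 1.0 or later (for across()); the names cum_long and step are illustrative and not part of the answer above. Because across(everything(), cumsum) applies to every column, the same code should carry over to 100+ columns unchanged.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  a = c(0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1),
  b = c(0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  c = c(0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1)
)

cum_long <- df %>%
  mutate(across(everything(), cumsum)) %>%  # running total of every column
  mutate(step = row_number()) %>%           # x-axis position (illustrative name)
  pivot_longer(-step, names_to = "name", values_to = "value")

# One step line per original column
p <- ggplot(cum_long, aes(x = step, y = value, color = name)) +
  geom_step()
```

With 100+ lines the legend becomes unwieldy; adding theme(legend.position = "none") is one way to hide it.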
The output:

Related

Show the difference between two averages with ggplot in R

In my dataset I have two columns, named part_1 and part_2, that contain several numerical values.
I am required to create a graph that shows how the average varies in the two parts.
I think that the best way is to create a barplot with a bar for each part, but I'm not sure about it.
First, I created two new columns that contain the mean values for the two parts in each row:
averages <- my_data %>% mutate(avg_part1=mean(part_1,na.rm=T)) %>% mutate(avg_part2=mean(part_2,na.rm=T))
Then, I inserted the values in two new variables:
avg_part1 <- averages %>% slice(1) %>% pull(avg_part1)
avg_part2 <- averages %>% slice(1) %>% pull(avg_part2)
To create the plot I did:
to_graph<-c("First part"=avg_part1,"Second part"=avg_part2)
barplot(to_graph)
And I obtained the graph I wanted, but it's not very nice to see.
I feel like this process is too complex and I may be able to do everything in a couple lines and without creating so many new variables, do you have any suggestions?
Also, I would prefer to create the graph with ggplot because it's better to improve the design, but I don't really know how to do it.
Thanks!
Using ggplot:
library(ggplot2)
library(dplyr)
my_data %>%
  stack(select = c(part_1, part_2)) %>%
  ggplot(aes(x = ind, y = values)) +
  geom_bar(stat = "summary", fun = mean)
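An alternative is to compute the two means explicitly before plotting, which keeps the summary values available for inspection. This is only a sketch: the my_data below is a hypothetical stand-in with made-up values (only the column names part_1 and part_2 come from the question), and avg_long is an illustrative name. It assumes dplyr 1.0 or later for across().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical example data; replace with the real my_data
my_data <- data.frame(
  part_1 = c(2, 4, 6, NA),
  part_2 = c(1, 3, 5, 11)
)

avg_long <- my_data %>%
  summarise(across(c(part_1, part_2), ~ mean(.x, na.rm = TRUE))) %>%  # one row of means
  pivot_longer(everything(), names_to = "part", values_to = "average")

# One bar per part, height equal to its mean
p <- ggplot(avg_long, aes(x = part, y = average)) +
  geom_col()
```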

Counting in ggplot2

I'm new to R and looking to get some help/explanation on why my code is doing what it is doing. I've started doing the Tidy Tuesday projects to better learn R, so that is where the data is from. Tidy Tuesday information
Goal:
The end result I am looking for is a bar graph sorted by which country's runners had the most first-place finishes, displaying only the top 10.
Thought process
My idea was to have R count each occurrence of a country and save the result in a variable.
So my first attempt is returning this:
The top_n call is something I found googling around; if I take it out, the plot looks right, just not limited to the top ten.
Questions:
Am I using reorder correctly to control the order of nationalities?
What is the best way to limit which results are shown?
Where exactly in the code is it counting each nationality? I'm thinking it is in the sum, but I'm not 100% sure. Most examples I've found of this used it for numerical values, not strings, and that has me a bit confused.
library(tidyverse)
library(ggplot2)
library(readr)
library(dplyr)
ultra_rankings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-26/ultra_rankings.csv')
race <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-26/race.csv')
ultra_rankings %>%
filter(rank == '1') %>% #Only looks at rows that have a first place finish
top_n(10, nationality) %>% #I think this is what is throwing me off
ggplot(aes(x = reorder(nationality, -rank, sum), y = rank))+geom_bar(stat = "identity")+
labs(title = "First Place Rankings by Country", caption = "Data from runrepeat.com")+
theme(plot.title = element_text(hjust = .5))+ylab("Total First Place Finishes")+xlab("Runner Nationalities")
Try this:
gt <- ultra_rankings %>%
  filter(rank == 1) %>%
  group_by(nationality) %>%
  count(nationality) %>%
  arrange(-n) %>%
  head(10)
Then convert nationality to a factor whose levels follow the sorted order, so that ggplot preserves it:
gt$nationality <- factor(gt$nationality, levels = unique(gt$nationality))
Now it can be plotted:
ggplot(data=gt,aes(x=nationality,y=n))+geom_bar(stat="identity")
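As for where the counting happens: count(nationality) is the step that tallies rows per country. It is shorthand for group_by() followed by summarise(n = n()), so it works on strings as well as numbers. The sketch below shows the equivalence on made-up data (the finishers data frame and its values are hypothetical) and uses slice_max() from dplyr 1.0 or later as one possible way to keep only the top entries.

```r
library(dplyr)

# Made-up data standing in for ultra_rankings
finishers <- data.frame(
  nationality = c("USA", "FRA", "USA", "GBR", "FRA", "USA", "FRA"),
  rank = c(1, 1, 1, 1, 2, 1, 1)
)

winners <- finishers %>% filter(rank == 1)

# count() tallies the rows for each nationality
by_count <- winners %>% count(nationality, sort = TRUE)

# The same result written out with group_by() and summarise()
by_summarise <- winners %>%
  group_by(nationality) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

# Keep the two countries with the most first-place finishes
top2 <- by_count %>% slice_max(n, n = 2, with_ties = FALSE)
```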

Making a Bar Chart in R with filter

I'm still new to R and Stack Overflow and am looking for help. My assignment is about the World Cup. I want to make a bar chart that shows the abbreviations of country names on the x-axis and the attendance of their stadiums on the y-axis. I used the code below and got the graph attached as an image. The problem is that the x-axis shows all the countries in the data frame and I only want about 10 selected countries. Is there anything I'm missing, and what can I do? Thank you.
CODE:
WorldCupMatches %>%
ggplot(aes(x = Home.Team.Initials, y = Attendance)) +
geom_col()
OP, you can filter the dataset before you pipe into ggplot(...). There are a few ways to do this, but I find using dplyr::filter() function to be one of the simplest. You can specify to only include rows in your dataframe that satisfy a particular condition:
WorldCupMatches %>%
dplyr::filter(Home.Team.Initials %in% c(...) ) %>%
ggplot(aes(x = Home.Team.Initials, y = Attendance)) +
geom_col()
Just specify c(...) to be a vector of the home team initials you want to see shown in the plot.
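As a concrete sketch of the pattern, here it is on a small made-up data frame. The matches data, the attendance figures, and the keep vector are all hypothetical; only the column names come from the question.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for WorldCupMatches
matches <- data.frame(
  Home.Team.Initials = c("BRA", "GER", "ITA", "BRA", "ARG"),
  Attendance = c(50000, 40000, 30000, 60000, 45000)
)

keep <- c("BRA", "GER")  # hypothetical selection of teams to show

# Only rows whose initials appear in keep survive the filter
selected <- matches %>%
  filter(Home.Team.Initials %in% keep)

p <- ggplot(selected, aes(x = Home.Team.Initials, y = Attendance)) +
  geom_col()
```

Note that geom_col() stacks the rows for each team, so each bar's height is the total attendance over all of that team's matches in the data.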

error filtering data: Faceting variables must have at least one value

I am trying to write code using dplyr and a yeast dataset.
I read in the data with this code:
gdat <- read_csv(file = "brauer2007_tidy1.csv")
I omitted NAs using this:
gdat <- na.omit(gdat)
library(ggplot2)
Then I tried to filter some genes according to their column name "symbol" and used ggplot to make a plot
filter(gdat, symbol=="QRI7", symbol== "CFT2", symbol== "RIB2",
symbol=="EDC3", symbol=="VPS5", symbol=="AMN1" & rate=.05) %>%
ggplot(aes(x=rate,
y=expression,
group=1,
colour=nutrient)) +
geom_line(lwd=1.5) +
facet_wrap(~nutrient)
facet_wrap(~nutrient) is used to separate each gene's rate vs. expression graphs according to the depleted nutrient, but this error keeps appearing:
error: Faceting variables must have at least one value
I checked each of these genes individually with the filter function and they all displayed in R, but when I combine multiple genes and pipe into ggplot I get this error.
Also, when I use "&rate=.05" I don't get only the values at rate .05.
Does anyone know how I can fix this problem? I have a deadline till tomorrow 17.30 and if somebody could help me I would be very glad, thanks.
I downloaded what I assume is the same dataset like this:
library(readr)
library(dplyr)
library(ggplot2)
gdat <- read_csv("https://raw.githubusercontent.com/bioconnector/workshops/master/data/brauer2007_tidy.csv")
So the first problem is your filter. If you are looking for any of those gene symbols, you need to use %in%. And rate requires a double equals ==:
gdat %>%
filter(symbol %in% c("QRI7", "CFT2", "RIB2", "EDC3", "VPS5", "AMN1"),
rate == 0.05)
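The reason the original filter fails: conditions separated by commas in filter() are combined with AND, and no row can have symbol equal to two different values at once. The result is therefore empty, which is why facet_wrap() complains that it has no values to facet on. A small sketch with made-up data (the toy data frame and its values are hypothetical):

```r
library(dplyr)

toy <- data.frame(
  symbol = c("QRI7", "CFT2", "RIB2", "XYZ1"),
  rate = c(0.05, 0.05, 0.10, 0.05)
)

# Comma-separated conditions are ANDed: no row satisfies both, so zero rows
both <- toy %>% filter(symbol == "QRI7", symbol == "CFT2")

# %in% keeps rows matching any of the listed symbols; == compares rate
either <- toy %>% filter(symbol %in% c("QRI7", "CFT2"), rate == 0.05)
```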
I don't think you want to filter for one rate and then use geom_line, because you will just get one vertical line at one value of x (rate).
Neither do I think you want to use geom_line for multiple values of rate, because there are several values for expression at each rate and a line will generate a nasty-looking zigzag.
And as you are faceting on nutrient, there's no need to color by nutrient. Perhaps you want to color by gene?
So you need to think about what makes a good visualisation of this data. Here's a simple example to get you started.
gdat %>%
filter(symbol %in% c("QRI7", "CFT2", "RIB2", "EDC3", "VPS5", "AMN1")) %>%
ggplot(aes(x=rate,
y=expression,
color = symbol)) +
geom_line() +
facet_wrap(~nutrient)
Result:

Is there a modification or another function to plot two numerical variables against one string variable?

I have a data set like this one: Names of mutations and two numerical variables representing values in two conditions (CIP and TIG):
I was able to plot one variable (e.g. CIP) in these mutation using the following code:
The data frame is named Dotchart2:
dotchart(Dotchart2$`CIP resistance`,
labels = rownames((Dotchart2)), pch = 16, cex = 1, pt.cex = 2)
This appeared as follows:
Since I am comparing CIP vs TIG, I would like to have the same figure but with a second dot for TIG on the same mutation (i.e. on each horizontal mutation line, there will be two dots of different colors, one for the CIP value and the other for the TIG value). It should look like this figure, for instance.
Could anyone provide simple code for this?
I think you'll find your answer here.
In the link provided, @JoshO'Brien creates a dotchart plot using a lattice configuration:
autos_data <- read.table("~/Documents/R/test.txt", header=F)
library(lattice)
dotplot(V1~V2, data=autos_data)
This documentation does a thorough job of explaining and detailing graph styles (graph_type), data graphing (formula), and the data source (data=), resulting in the following:
library(lattice)
graph_type(formula, data=)
To do this easily in lattice or ggplot2 you first need to convert your data to long format. I don't have a data set handy in the right format, so I took the famous iris data set and converted it to a wide-format data set called iris_wide (see code at the bottom). I'm using tidyverse here: all of this can also be done in base R.
(To understand what's going on here you should definitely examine the iris_wide and iris_long objects.)
convert from wide to long format
library(tidyverse)
iris_long <- iris_wide %>%
pivot_longer(cols=-id,names_to="species",values_to="value")
lattice version
lattice::dotplot(id~value, data=iris_long, group=species,pch=16,
auto.key=TRUE)
ggplot version
ggplot(iris_long, aes(value,id,colour=species))+geom_point()
convert iris data from long to wide
To match your example, I'm selecting only two categories (species) and one variable (sepal length)
iris_wide <- (iris
%>% filter(Species %in% c("setosa","virginica"))
%>% select(Sepal.Length, Species)
%>% group_by(Species)
%>% mutate(id=seq(n()))
%>% pivot_wider(names_from=Species, values_from=Sepal.Length)
%>% head(10)
%>% mutate(id=LETTERS[seq(n())])
)
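Applied to the shape of data described in the question, the same wide-to-long approach might look like the sketch below. The mutation names and resistance values are invented for illustration, and the column names CIP and TIG are assumptions (the question only shows a column called `CIP resistance`).

```r
library(tidyr)
library(ggplot2)

# Invented data: one row per mutation, one column per condition
resist <- data.frame(
  mutation = c("mutA", "mutB", "mutC"),
  CIP = c(1.0, 2.5, 0.5),
  TIG = c(0.8, 1.2, 3.0)
)

# Long format: one row per mutation/condition pair
resist_long <- pivot_longer(resist, cols = c(CIP, TIG),
                            names_to = "condition", values_to = "value")

# Two dots per mutation line, coloured by condition
p <- ggplot(resist_long, aes(x = value, y = mutation, colour = condition)) +
  geom_point(size = 3)
```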
