I have a dataset (test_df) that looks like:
Species  TreatmentA  TreatmentB  X0  L   K
Apple    Hot         Cloudy      1   2   3
Apple    Cold        Cloudy      4   5   6
Orange   Hot         Sunny       7   8   9
Orange   Cold        Sunny       10  11  12
I would like to display the effect of the treatments by using the X0, L, and K values as coefficients in a standard logistic function and plotting the same species across the various treatments on the same plot. I would like a grid of plots with the logistic curves for each species on its own plot, with the treatments distinguished by color within each plot. In the above example, Plot1.Grid1 would have 2 logistic curves corresponding to Apple Hot and Apple Cold, and Plot1.Grid2 would have 2 logistic curves corresponding to Orange Hot and Orange Cold.
The below code will create a single logistic function curve which can then be layered, but manually adding the layers for multiple treatments is tedious.
testx0 <- 1
testL <- 2
testk <- 3
days <- seq(from = -5, to = 5, by = 1)
functionmultitest <- function(x,testL,testK,testX0) {
(testL)/(1+exp((-1)*(testK) *(x - testX0)))
}
ggplot()+aes(x = days, y = functionmultitest(days,testL,testk,testx0))+geom_line()
The method described in (https://statisticsglobe.com/draw-multiple-function-curves-to-same-plot-in-r) works for dataframes with few species or treatments, but it becomes very tedious to define the curves individually if you have many treatments/species. Is there a way to programmatically pass the list of coefficients and have ggplot handle the grouping?
Thank you!
Your current code shows how to compute the curve for a single row in your data frame. What you can do is pre-compute the curve for each row and then feed the result to ggplot.
Setup:
# Packages
library(ggplot2)
# Your days vector
days <- seq(from = -5, to = 5, by = 1)
# Your sample data frame above
df = structure(list(Species = c("Apple", "Apple", "Orange", "Orange"
), TreatmentA = c("Hot", "Cold", "Hot", "Cold"), TreatmentB = c("Cloudy",
"Cloudy", "Sunny", "Sunny"), X0 = c(1L, 4L, 7L, 10L), L = c(2L,
5L, 8L, 11L), K = c(3L, 6L, 9L, 12L)), class = "data.frame", row.names = c(NA,
-4L))
# Your function
functionmultitest <- function(x,testL,testK,testX0) {
(testL)/(1+exp((-1)*(testK) *(x - testX0)))
}
We'll "expand" each row of your data frame with the days vector:
# Define first a data frame of days:
days_df = data.frame(days = days)
# Perform a cross join
df_all = merge(days_df, df, by = NULL)
At this point, you will have a data frame where each original row is repeated once for every value in days.
Now, just as you did for one row, we'll compute the value of the function for each row and store it in df_all as result:
df_all$result = mapply(functionmultitest, df_all$days, df_all$L, df_all$K, df_all$X0)
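As a side note, the function is built only from vectorized arithmetic and exp(), so as far as I can tell the mapply call isn't strictly needed and the columns can be passed in directly. A minimal self-contained sketch (the data frame is retyped from the question, and `logistic` is just a hypothetical shorter name for functionmultitest):

```r
# Sample data retyped from the question (TreatmentB left out for brevity)
df <- data.frame(
  Species    = c("Apple", "Apple", "Orange", "Orange"),
  TreatmentA = c("Hot", "Cold", "Hot", "Cold"),
  X0 = c(1, 4, 7, 10),
  L  = c(2, 5, 8, 11),
  K  = c(3, 6, 9, 12)
)
days_df <- data.frame(days = seq(from = -5, to = 5, by = 1))

# Same formula as functionmultitest, under a hypothetical shorter name
logistic <- function(x, L, K, X0) L / (1 + exp(-K * (x - X0)))

# Cross join: every day paired with every row of df
df_all <- merge(days_df, df, by = NULL)

# Vectorized call: one result per row, no mapply needed
df_all$result <- logistic(df_all$days, df_all$L, df_all$K, df_all$X0)

# Should agree with the mapply version
all.equal(df_all$result,
          mapply(logistic, df_all$days, df_all$L, df_all$K, df_all$X0))
```

Either form should give the same column; the vectorized call is mostly a matter of taste and speed on larger data.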
I'm not sure how you intended to handle TreatmentA and TreatmentB, so I'll just combine them for illustration purposes:
df_all$combined_treatment = paste0(df_all$TreatmentA, "-", df_all$TreatmentB)
We can now feed this data frame to ggplot, set the color to be combined_treatment, and use the facet_grid function to split by species:
ggplot(data = df_all, aes(x = days, y = result, color = combined_treatment))+
geom_line() +
facet_grid(Species ~ ., scales = "free")
The result is as follows:
In R it is quite trivial to "collapse" an n-dimensional array into a one-dimensional column vector and sample from that using e.g. sample() function in base R.
However, I would like to sample dimnames-groups (i.e. rowname-colname pairs in case of a two-dimensional array) based on the frequencies.
Let's have an example, and assume we have a following crosstab (the data (n=70) is randomly generated):
             Man  Woman
Smoking      10   20
Non-smoking  15   25
How do I sample from this so that I get:
"Smoking Man" with probability: 10 / 70
"Non-smoking Man" with probability: 15 / 70
"Smoking Woman" with probability: 20 / 70
"Non-smoking Woman" with probability: 25 / 70
The easiest way would probably be grouping the dimnames (somehow), and use this as the first argument of sample function i.e.:
sample(x = vectorOfGroupedDimnames, size = 1, prob = c(crosstabAsMatrix))
Yes, I know that the variable vectorOfGroupedDimnames can be formed using nested for loops, but there has to be a more elegant way of doing this.
So what is the easiest way to do this? Thanks.
Maybe this will help you
library(dplyr)
data <-
structure(c(25L, 20L, 15L, 10L), .Dim = c(2L, 2L), .Dimnames = list(
smoke = c("Non-smoking", "Smoking"), sex = c("Female", "Male"
)), class = "table")
data %>%
as_tibble() %>%
sample_n(size = 1,weight = n,replace = TRUE)
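If you'd rather stay in base R, something along these lines should also work. This is only a sketch, assuming the crosstab is stored as a table with named dimensions; the names `tab`, `cells` and `labels` are mine. as.data.frame() on a table gives one row per cell, in the same column-major order as c(tab), so the labels and the frequencies line up:

```r
# The crosstab from the question, rebuilt by hand as a table
tab <- as.table(matrix(
  c(10, 15, 20, 25), nrow = 2,
  dimnames = list(smoke = c("Smoking", "Non-smoking"),
                  sex   = c("Man", "Woman"))
))

# One row per cell: columns smoke, sex and Freq
cells <- as.data.frame(tab, stringsAsFactors = FALSE)

# Build the "rowname colname" labels without any loops
labels <- paste(cells$smoke, cells$sex)

# sample() rescales prob itself, so the raw counts can be used as weights
draw <- sample(labels, size = 1, prob = cells$Freq)
draw
```

Increase size (with replace = TRUE) to draw more than one group.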
I am trying to draw a scatter dot plot for this data
head(data)
Subject Length Verdict
1 2 4575 Partial
2 2 5060 Partial
3 2 8978 5'DEFECT
4 2 7224 Partial
5 2 7224 Partial
6 7 8978 5'DEFECT
I get a scatter dot plot as such:
I have patients 1,2,6,7,10 for example. R is taking the names of my subjects and using them as an x-value. I want to change that so the data points appear above each patient (not treated as a value, but rather as a category).
Appreciate your help!
Here's the code I wrote to get this scatter dot plot:
ggplot(final,
aes(x=Subject,y=Length,colour=Verdict,shape=Verdict), group=Subject) +
geom_point(position=position_jitter(width=0.1,height=0)) +
scale_shape_manual(values=c(5,0,1,4,6)) +
scale_colour_manual(values=c("blue","red","green","black","violet")) +
scale_y_continuous(breaks=c(2000,4000,6000,8000,10000)) +
labs(y="Amplicon Size in bps")
Using the sample data you posted (the 6 observations), you can do this easily with as.factor or as.character. I recommend converting your Subject variable to a character first (since it sounds like you don't want it to be treated as numeric anyway).
This should work:
data$Subject <- as.character(data$Subject)
ggplot(data,
aes(x=Subject,y=Length,colour=Verdict,shape=Verdict)) +
geom_point(position=position_jitter(width=0.1,height=0)) +
scale_shape_manual(values=c(5,0,1,4,6)) +
scale_colour_manual(values=c("blue","red","green","black","violet")) +
scale_y_continuous(breaks=c(2000,4000,6000,8000,10000)) +
labs(y="Amplicon Size in bps")
Data:
data <- structure(list(Subject = c(2L, 2L, 2L, 2L, 2L, 7L), Length =
c(4575L, 5060L, 8978L, 7224L, 7224L, 8978L), Verdict = c("Partial", "Partial",
"5'DEFECT", "Partial", "Partial", "5'DEFECT")), .Names = c("Subject",
"Length", "Verdict"), row.names = c(NA, -6L), class = "data.frame")
You could use as.factor(data$Subject) and scale_x_discrete("Subject"). Here is a dummy example:
data <- read.table(textConnection('i Subject Length Verdict
1 2 4575 Partial
2 2 5060 Partial
3 2 8978 5DEFECT
4 2 7224 Partial
5 2 7224 Partial
6 7 8978 5DEFECT'), header = TRUE, stringsAsFactors = FALSE)
data$Subject <- as.factor(data$Subject)
p = ggplot(data) + geom_point(aes(x=Subject,y=Length, colour=Verdict))+ scale_shape_manual(values= c(5,0,1,4,6))
p = p + scale_colour_manual(values=c("blue","red","green","black","violet"))
p = p+ scale_x_discrete("Subject")
p = p + scale_y_continuous(breaks=c(2000,4000,6000,8000,10000))+labs(y="Amplicon Size in bps")
p
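One thing to be aware of when choosing between as.character and as.factor: they can order the subjects differently once the IDs reach two digits. A small base R sketch, using the subject numbers mentioned in the question (the vector `subject` is hypothetical, just for illustration):

```r
# Subject IDs as mentioned in the question
subject <- c(1, 2, 6, 7, 10)

# factor() on a numeric vector sorts the levels numerically
levels(factor(subject))
# "1" "2" "6" "7" "10"

# Going through character first sorts the labels as text
levels(factor(as.character(subject)))
# "1" "10" "2" "6" "7"
```

As far as I know, a discrete axis built from character data follows the second (alphabetical) order, so as.factor on the numeric column is probably the safer choice if you want patient 10 to appear last.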
I have the following data frame, where variable holds 10 different movie genre categories, e.g. drama, comedy etc.
> head(grossGenreMonthLong)
Gross ReleasedMonth variable value
5 33508485 2 drama 1
6 67192859 2 drama 1
8 37865 4 drama 1
9 76665507 1 drama 1
10 221594911 2 drama 1
12 446438 2 drama 1
Reproducible dataframe:
dput(head(grossGenreMonthLong))
structure(list(Gross = c(33508485, 67192859, 37865, 76665507,
221594911, 446438), ReleasedMonth = c(2, 2, 4, 1, 2, 2), variable = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("drama", "comedy", "short", "romance",
"action", "crime", "thriller", "documentary", "adventure", "animation"
), class = "factor"), value = c(1, 1, 1, 1, 1, 1)), .Names = c("Gross",
"ReleasedMonth", "variable", "value"), row.names = c(5L, 6L,
8L, 9L, 10L, 12L), class = "data.frame")
I would like to calculate the mean gross vs. month for each of the 10 genres and plot them in separate bar charts using facets (varying by genre).
In other words, what's a quick way to plot 10 bar charts of mean gross vs. month for each of the 10 genres?
You should provide a reproducible example to make it easier for us to help you. dput(my.dataframe) is one way to do it, or you can generate an example dataframe like below. Since you haven't given us a reproducible example, I'm going to put on my telepathy hat and assume the "variable" column in your screenshot is the genre.
n = 100
movies <- data.frame(
genre=sample(letters[1:10], n, replace=T),
gross=runif(n, min=1, max=1e7),
month=sample(12, n, replace=T)
)
head(movies)
# genre gross month
# 1 e 5545765.4 1
# 2 f 3240897.3 3
# 3 f 1438741.9 5
# 4 h 9101261.0 6
# 5 h 926170.8 7
# 6 f 2750921.9 1
(My genres are 'a', 'b', etc).
To plot the average gross per month, you first need to calculate the average gross per month. One way to do so is with the plyr package (there is also data.table, dplyr, ...)
library(plyr)
monthly.avg.gross <- ddply(movies, # the input dataframe
.(genre, month), # group by these
summarize, avgGross=mean(gross)) # do this.
The dataframe monthly.avg.gross now has one row per (month, genre) with a column avgGross that has the average gross in that (month, genre).
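If you'd prefer not to load an extra package for this step, base R's aggregate() can do the same grouping. A rough sketch, using the six rows from the question's dput output rather than my generated data (the names `gross_df` and `mean_gross` are made up for this example):

```r
# First six rows from the question, retyped (genre kept as plain character)
gross_df <- data.frame(
  Gross = c(33508485, 67192859, 37865, 76665507, 221594911, 446438),
  ReleasedMonth = c(2, 2, 4, 1, 2, 2),
  variable = "drama"
)

# One row per (month, genre) combination, holding the mean gross
mean_gross <- aggregate(Gross ~ ReleasedMonth + variable,
                        data = gross_df, FUN = mean)
mean_gross
```

The resulting data frame can be passed to ggplot in the same way as monthly.avg.gross, with the column names adjusted.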
Now it's a matter of plotting. You have hinted at "facet" so I assume you're using ggplot.
library(ggplot2)
ggplot(monthly.avg.gross, aes(x=month, y=avgGross)) +
geom_point() +
facet_wrap(~ genre)
You can do stuff like add month labels and treat month as a factor instead of a number like here, but that's peripheral to your question.
Thank you very much @mathematical.coffee. I was able to adapt your answer to produce the appropriate bar charts.
meanGrossGenreMonth = ddply(grossGenreMonthLong,
.(ReleasedMonth, variable),
summarise,
mean.Gross = mean(Gross, na.rm = TRUE))
# plot bar plots with facets
ggplot(meanGrossGenreMonth, aes(x = factor(ReleasedMonth), y=mean.Gross)) +
geom_bar(stat = "identity") + facet_wrap(~ variable) + ylab("mean Gross ($)") +
xlab("Month") + ggtitle("Mean gross revenue vs. month released by Genre")
I have 2 dataframes sharing the same row IDs but with different columns.
Here is an example
chrom coord sID CM0016 CM0017 CM0018
7 10 3178881 SP_SA036,SP_SA040 0.000000000 0.000000000 0.0009923
8 10 38894616 SP_SA036,SP_SA040 0.000434783 0.000467464 0.0000970
9 11 104972190 SP_SA036,SP_SA040 0.497802888 0.529319536 0.5479003
and
chrom coord sID CM0001 CM0002 CM0003
4 10 3178881 SP_SA036,SA040 0.526806527 0.544927536 0.565610860
5 10 38894616 SP_SA036,SA040 0.009049774 0.002849003 0.002857143
6 11 104972190 SP_SA036,SA040 0.451612903 0.401617251 0.435318275
I am trying to create a composite boxplot figure where the x axis shows chrom and coord combined (so 3 points here), and each x value has 2 boxplots side by side, one for each of the two dataframes.
What is the best way of doing this? Should I merge the two dataframes together somehow in order to get only one, and loop over the boxplot rendering by 3 columns?
Any idea on how this can be done?
The problem is that the two dataframes have the same number of rows but can differ in number of columns
> dim(A)
[1] 99 20
> dim(B)
[1] 99 28
I was thinking about transposing the dataframes in order to get the same number of columns, but got lost on how to do this properly.
Thanks in advance
UPDATE
This is what I tried to do
I merged the chrom and coord columns together to create a single ID
I used reshape to melt the dataframes
I merged the 2 melted dataframes into a single one
the head looks like this
I have two variables, A2 and A4, corresponding to the 2 dataframes
then I created a boxplot using this
ggplot(A2A4, aes(factor(combine), value)) +geom_boxplot(aes(fill = factor(variable)))
I think it solved my problem but the boxplot looks very busy with 99 x values with 2 boxplots each
So if these are your input tables
d1<-structure(list(chrom = c(10L, 10L, 11L),
coord = c(3178881L, 38894616L, 104972190L),
sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SP_SA040", class = "factor"),
CM0016 = c(0, 0.000434783, 0.497802888), CM0017 = c(0, 0.000467464,
0.529319536), CM0018 = c(0.0009923, 9.7e-05, 0.5479003)), .Names = c("chrom",
"coord", "sID", "CM0016", "CM0017", "CM0018"), class = "data.frame", row.names = c("7",
"8", "9"))
d2<-structure(list(chrom = c(10L, 10L, 11L), coord = c(3178881L,
38894616L, 104972190L), sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SA040", class = "factor"),
CM0001 = c(0.526806527, 0.009049774, 0.451612903), CM0002 = c(0.544927536,
0.002849003, 0.401617251), CM0003 = c(0.56561086, 0.002857143,
0.435318275)), .Names = c("chrom", "coord", "sID", "CM0001",
"CM0002", "CM0003"), class = "data.frame", row.names = c("4",
"5", "6"))
Then I would combine and reshape the data to make it easier to plot. Here's what I'd do:
library(reshape2)  # provides melt()
m1<-melt(d1, id.vars=c("chrom", "coord", "sID"))
m2<-melt(d2, id.vars=c("chrom", "coord", "sID"))
mm<-rbind(cbind(m1, s="T1"), cbind(m2, s="T2"))
mm$pos<-factor(paste(mm$chrom,mm$coord,sep=":"),
levels=do.call(paste, c(unique(mm[order(mm[[1]],mm[[2]]),1:2]), sep=":")))
I first melt the two input tables to turn columns into rows. Then I add a column to each table so I know where the data came from and rbind them together. And finally I do a bit of messy work to make a factor out of the chr/coord pairs sorted in the correct order.
With all that done, I'll make the plot like
ggplot(mm, aes(x=pos, y=value, color=s)) +
geom_boxplot(position="dodge")
and it looks like
I am trying to produce a series of box plots in R that is grouped by 2 factors. I've managed to make the plot, but I cannot get the boxes to order in the correct direction.
The data frame I am using looks like this:
Nitrogen Species Treatment
2 G L
3 R M
4 G H
4 B L
2 B M
1 G H
I tried:
boxplot(mydata$Nitrogen~mydata$Species*mydata$Treatment)
This ordered the boxes alphabetically (the first three were the "High" treatments, then within those three they were ordered alphabetically by species name).
I want the box plot ordered Low > Medium > High, then within each of those groups G > R > B for the species.
So I tried using a factor in the formula:
f = ordered(interaction(mydata$Treatment, mydata$Species),
levels = c("L.G","L.R","L.B","M.G","M.R","M.B","H.G","H.R","H.B"))
then:
boxplot(mydata$Nitrogen~f)
However, the boxes are still showing up in the same order. The labels are now different, but the boxes have not moved.
I have pulled out each set of data and plotted them all together individually:
lg = mydata[mydata$Treatment=="L" & mydata$Species=="G", "Nitrogen"]
mg = mydata[mydata$Treatment=="M" & mydata$Species=="G", "Nitrogen"]
hg = mydata[mydata$Treatment=="H" & mydata$Species=="G", "Nitrogen"]
etc ..
boxplot(lg, lr, lb, mg, mr, mb, hg, hr, hb)
This gives what I want, but I would prefer to do this in a more elegant way, so I don't have to pull each one out individually for larger data sets.
Loadable data:
mydata <-
structure(list(Nitrogen = c(2L, 3L, 4L, 4L, 2L, 1L), Species = structure(c(2L,
3L, 2L, 1L, 1L, 2L), .Label = c("B", "G", "R"), class = "factor"),
Treatment = structure(c(2L, 3L, 1L, 2L, 3L, 1L), .Label = c("H",
"L", "M"), class = "factor")), .Names = c("Nitrogen", "Species",
"Treatment"), class = "data.frame", row.names = c(NA, -6L))
The following commands will create the ordering you need by rebuilding the Treatment and Species factors, with explicit manual ordering of the levels:
mydata$Treatment = factor(mydata$Treatment,c("L","M","H"))
mydata$Species = factor(mydata$Species,c("G","R","B"))
edit 1 : oops I had set it to HML instead of LMH. fixing.
edit 2 : what factor(X,Y) does:
If you run factor(X,Y) on an existing factor, it uses the ordering of the values in Y to enumerate the values present in the factor X. Here are some examples with your data.
> mydata$Treatment
[1] L M H L M H
Levels: H L M
> as.integer(mydata$Treatment)
[1] 2 3 1 2 3 1
> factor(mydata$Treatment,c("L","M","H"))
[1] L M H L M H <-- not changed
Levels: L M H <-- changed
> as.integer(factor(mydata$Treatment,c("L","M","H")))
[1] 1 2 3 1 2 3 <-- changed
It does NOT change what the factor looks like at first glance, but it does change how the data is stored.
What's important here is that many plot functions will plot the lowest enumeration leftmost, followed by the next, etc.
If you create factors simply using factor(X) then usually the enumeration is based upon the alphabetical order of the factor levels (e.g. "H","L","M"). If your labels have a conventional ordering different from alphabetical (e.g. "H","M","L"), this can make your graphs seem strange.
At first glance, it may seem like the problem is due to the ordering of data in the data frame - i.e. if only we could place all "H" at the top and "L" at the bottom, then it would work. It doesn't. But if you want your labels to appear in the same order as the first occurrence in the data, you can use this form:
mydata$Treatment = factor(mydata$Treatment, unique(mydata$Treatment))
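To check the resulting box order without drawing anything, you can look at the levels of the combined grouping directly. The sketch below retypes the six rows from the question as plain character columns (an assumption on my part, the posted data already stores them as factors) and then applies the same two factor() calls:

```r
# Six rows from the question, retyped with character columns
mydata <- data.frame(
  Nitrogen  = c(2, 3, 4, 4, 2, 1),
  Species   = c("G", "R", "G", "B", "B", "G"),
  Treatment = c("L", "M", "H", "L", "M", "H")
)

# Rebuild both factors with the desired level order
mydata$Treatment <- factor(mydata$Treatment, levels = c("L", "M", "H"))
mydata$Species   <- factor(mydata$Species,   levels = c("G", "R", "B"))

# Level order of the Species * Treatment grouping (first factor varies fastest)
levels(interaction(mydata$Species, mydata$Treatment))
# "G.L" "R.L" "B.L" "G.M" "R.M" "B.M" "G.H" "R.H" "B.H"
```

Calling boxplot(Nitrogen ~ Species * Treatment, data = mydata) afterwards should lay the boxes out in that same left-to-right order.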
This earlier StackOverflow question shows how to reorder a boxplot based on a numerical value; what you need here is probably just a switch from factor to the related type ordered. But it is hard to say, as we do not have your data and you didn't provide a reproducible example.
Edit Using the dataset you posted in variable md and relying on the solution I pointed to earlier, we get
R> md$Species <- ordered(md$Species, levels=c("G", "R", "B"))
R> md$Treatment <- ordered(md$Treatment, levels=c("L", "M", "H"))
R> with(md, boxplot(Nitrogen ~ Species * Treatment))
which creates the chart you were looking to create.
This is also equivalent to the other solution presented here.