Finding the x-value at a certain y-value on a ggplot - r

I am currently having some difficulties trying to find the Effective Concentration of 50% for one of my datasets. To shortly summarize what it is, it is data on how levels of glutathione in cells depleted from 100% when exposed to a substance known as HEMA.
GSH50 <- read.table("Master list for all GSH data T9 TVN.csv", header = TRUE, sep = ";", dec = ",")
After some further subsetting, I end up with a plot like this
GSH plot
I have several more plots in addition to this, so I need to find the EC50 value for everyone so I can then compare them with each other (the problem is consistent on several plots, so if it can be fixed here it should be fixed on the others as well).
From an earlier dataset with almost the same setup (the only difference being x-axis values) I managed to get fairly correct EC50 using a setup like this:
HG <- approxfun(x, y)
optimize(function(t0) abs(HG(t0) - 50), interval = range(x))
Where I then got my EC50 value from the optimize function. However, it does not work on this data for some reason, as if I input the value from optimize, I end up getting this GSH plot instead.
If somebody has any idea how I can fix this issue, it would be most appreciated.
Edit
If you want a reproducible dataset I gathered the averages of the data, and as such the plot should still be similar to the GSH plots I have shown:
Concentration <- seq(from = 0, to = 9, by=1)
GSH <- c(100, 67.405, 47.78, 39.2325, 33.97, 28.435, 26.97, 24.5125, 23.5275, 21.565)
df <- data.frame(Concentration, GSH)
ggplot(df, aes(Concentration, GSH)) + geom_smooth()
I am quite certain that the dose is high enough to reach the lower level, but I have not stored the model somewhere. I hope the example data provided is enough.
Edit2
I should mention that the approx and optimize code does work for the example when we use geom_lines(), but for some reason, it is not as accurate on geom_smooth().

Related

Is there a way to add species to an ISOMAP plot in R?

I am using the isomap-function from vegan package in R to analyse community data of epiphytic mosses and lichens. I started analysing the data using NMDS but due to the structure of the data ran into problems which is why I switched to ISOMAP which works perfectly well and returns very nice results. So far so good... However, the output of the function does not support plotting of species within the ISOMAP plot as species scores are not available. Anyway, I would really like to add species information to enhance the interpretability of the output.
Does anyone of you has a solution or hint to this problem? Is there a way to add species kind of post hoc to the plot as it can be done with environmental data?
I would greatly appreciate any help on this topic!
Thank you and best regards,
Inga
No, there is no function to add species scores to isomap. It would look like this:
`sppscores<-.isomap` <-
function(object, value)
{
value <- scale(value, center = TRUE, scale = FALSE)
v <- crossprod(value, object$points)
attr(v, "data") <- deparse(substitute(value))
object$species <- v
object
}
Or alternatively:
`sppscores<-.isomap` <-
function(object, value)
{
wa <- vegan::wascores(object$points, value, expand = TRUE)
attr(wa, "data") <- deparse(substitute(value))
object$species <- wa
object
}
If ord is your isomap result and comm are your community data, you can use these as:
sppscores(ord) <- comm # either alternative
I have no idea (yet) which of these alternatives is more correct. The first adds species scores as vectors of their linear increase, the second as their weighted averages in ordination space, but expanded so that we allow some species be more extreme than the site units where they occur.
These will add new element species to the result object ord. However, using these in vegan would need more coding, but you can extract the species scores with vegan::scores, but their scaling is based on the original scale of community data, and may be badly scaled with respect to points of site units, and working on this would require more work. However, you can plot them separately, or then multiply with a constant giving similar scaling as site unit scores.
sp <- scores(ord, display="species", choices=1:2)
plot(sp, type = "n", asp = 1) # does not allow plotting text
text(sp, labels = rownames(sp)) # so we must add text

R: How does stat_density_ridges modify and plot data?

<Disclaimer(s) - (1) This is my first post, so please be gentle, specifically regarding formatting and (2) I did try to dig as much as I could on this topic before posting the question here>
I have a simple data vector containing returns of 40 portfolios on the same day:
Year Return
Now -17.39862061
Now -12.98954582
Now -12.98954582
Now -12.86928749
Now -12.37044334
Now -11.07007504
Now -10.68971539
Now -10.07578182
Now -9.984867096
Now -8.764036179
Now -8.698093414
Now -8.594026566
Now -8.193638802
Now -7.818599701
Now -7.622627735
Now -7.535216808
Now -7.391239166
Now -7.331315517
Now -5.58059597
Now -5.579797268
Now -4.525201797
Now -3.735909224
Now -2.687532902
Now -2.65363884
Now -2.177522898
Now -1.977644682
Now -1.353205681
Now -0.042584345
Now 0.096564181
Now 0.275416046
Now 0.638839543
Now 1.959529042
Now 3.715519428
Now 4.842819691
Now 5.475946426
Now 6.380955219
Now 6.535937309
Now 8.421762466
Now 8.556800842
Now 10.39185524
I am trying to plot these returns to compare versus other days (so the rest of my history e.g.). I tried to use stat_density_ridges as per the code block below
ggplot(data = data.plot, aes(x = Return, y = Year, fill = factor(..quantile..))) +
stat_density_ridges(geom = "density_ridges_gradient",calc_ecdf = TRUE,
quantiles = c(0.025, 0.5, 0.975),
quantile_lines = TRUE)
As you can see - the "year" in this case is the same i.e. there is no height parameter, yet I get a nice ridg(y) chart. While the chart is beautiful to behold, and very very awesome, I am at a loss to determine how the plotting function is computing the density in this case, specially the height.
This is the output chart I get (I have omitted the formatting code here since it doesn't make a difference to my question):
Portfolio Return Distribution Plots - US versus Europe
I tried digging into the code of the function itself, but came up with a total blank. The documentation didn't help (except perhaps give me a hint that the function plots continous distributions).
Any help, or guidance, or even a nudge in the right direction would be extremely helpful.

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
#for same result each time
set.seed(1234)
#make data
set1<-data.frame("date1" = seq(1,10),
"temp1" = rnorm(10))
set2<-data.frame("date2" = seq(8,17),
"temp2" = rnorm(10, 1, 1))
#first attempt fails
#plot one
plot(set1$date1, set1$temp1, type = "b")
#add points - oops only three showed up bc the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")
#second attempt
#adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
xlim = c(min(set1$date1,set2$date2),max(set1$date1,set2$date2)),
ylim = c(min(set1$temp1,set2$temp2),max(set1$temp1,set2$temp2)),
type = "b")
#now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))

Advise a Chemist: Automate/Streamline his Voltammetry Data Graphing Code

I am a chemist dealing with a significant amount of voltammetry data recently. Let me be very clear and give some research information. I run scans from a starting voltage to an ending voltage on solid state conductive films. These scans are saved as .txt files (name scheme: run#.txt) in a single folder. I am looking at how conductance changes as temperature changes. The LINEST line plotting current v. voltage at a given temperature gives me a line with slope = conductance. Once I have the conductances (slopes) for each scan, I plot conductance v. temperature to see the temperature dependent conductance characteristics. I had been doing this in Excel, but have found quicker ways to get the job done using R. I am brand new to R (Rstudio) and recognize that my coding is not the best. Without doubt, this process can be streamlined and sped up which would help immensely. This is how I am performing the process currently:
# Set working directory with folder containing all .txt files for inspection
# Add all .txt files to the global environment
allruns<-list.files(pattern=".txt")
for(i in 1:length(allruns))assign(allruns[i],read.table(allruns[i]))
Since the voltage column (a 1x1000 matrix) is the same for all runs and is in column V1 of each .txt file, I assign a x to be the voltage column from the first folder
x<-run1.txt$V1
All currents (these change as voltage changes) are found in the V2 column of all the .txt files, so I assign y# to each. These are entered one at a time..
y1<-run1.txt$V2
y2<-run2.txt$V2
y3<-run3.txt$V2
# ...
yn<-runn.txt$V2
So that I can get the eqn for each LINEST (one LINEST for each scan and plotted with abline later). Again entered one at a time:
run1<-lm(y1~x)
run2<-lm(y2~x)
run3<-lm(y3~x)
# ...
runn<-lm(yn~x)
To obtain a single graph with all LINEST (one for each scan ) on the same plot, without the data points showing up, I have been using this pattern of coding to first get all data points on a single plot in separate series:
plot(x,y1,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y2,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y3,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
# ...
par(new=TRUE)
plot(x,yn,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
#To obtain all LINEST lines (one for each scan, on the single graph):
abline(run1,col=””, lwd=1)
abline(run2,col=””,lwd=1)
abline(run3,col=””,lwd=1)
# ...
abline(runn,col=””,lwd=1)
# Then to get each LINEST equation:
summary(run1)
summary(run2)
summary(run3)
# ...
summary(runn)
Each time I use summary(), I copy the slope and paste it into an Excel sheet- along with corresponding scan temp which I have recorded separately. I then graph the conductance v temp points for the film as X-Y scatter with smooth lines to give the temperature dependent conductance curve. Giving me a single LINEST lines plot in R and the conductance v temp in Excel.
This technique is actually MUCH quicker than doing it all in Excel, but it can be done much quicker and efficiently!!! Also, if I need to change something, this entire process needs to be reexecuted with whatever change is necessary. This process takes me maybe 5 hours in Excel and 1.5 hours in R (maybe I am too slow). Nonetheless, any tips to help automate/streamline this further are greatly appreciated.
There are plenty of questions about operating on data in lists; storing a list of matrix or a list of data.frame is fast, and code that operates cleanly on one can be applied to the remaining n-1 very easily.
(Note: the way I'm showing it here is one technique: maintaining everything in well-compartmentalized lists. Other will suggest -- very justifiably -- that combing things into a single data.frame and adding a group variable (to identify from which file/experiment the data originated) will help with more advanced multi-experiment regression or combined plotting, such as with ggplot2. I'm not going to go into this latter technique here, not yet.)
It is long decried not to do for(...) assign(..., read.csv(...)); you have the important part done, so this is relatively easy:
allruns <- sapply(list.files(pattern = "*.txt"), read.table, simplify = FALSE)
(The use of sapply(..., simplify=FALSE) is similar to lapply(...), but it has a nice side-effect of naming the individual list-ified elements with, in this case, each filename. It may not be critical here but is quite handy elsewhere.)
Extracting your invariant and variable data is simple enough:
allLMs <- lapply(allruns, function(mdl) lm(V2 ~ V1, data = mdl))
I'm using each table's V1 here instead of a once-extracted x ... though you might wonder why, I argue keeping it like for two reasons: (1) JUST IN CASE the V1 variable is ever even one-row-different, this will save you; (2) it is very easy to construct the model like this.
At this point, each object within allLMs is an lm object, meaning we might do:
summary(allLMs[[1]])
Plotting: I think I understand why you are using par=NEW, and I have to laugh ... I had been deep in R for a while before I started using that technique. What I think you need is actually much simpler:
xlim <- rev(range(allruns[[1]]$V1))
ylim <- range(sapply(allruns, `[`, "V2"))
# this next plot just sets the box and axes, no points
plot(NA, type = "na", xlim = xlim, ylim = ylim)
# no need to plot points with "transparent" ...
ign <- sapply(allLMs, abline, col = "") # and other abline options ...
Copying all models into Excel, again, using lists:
out <- do.call(rbind, sapply(allLMs, function(m) summary(m)$coefficients[,1]))
This will now be a single data.frame with all coefficients in two columns. (Feel free to use similar techniques to extract the other model summary attributes, including std err, t.value, or Pr(>|t|) (in the $coefficients); or $r.squared, $adj.r.squared, etc.)
write.csv(out, file="clipboard", sep="\t")
and paste into Excel. (Or, better yet, save it to a CSV file and import that, since you might want to keep it around.)
One of the tricks to using lists for this is to persevere: keep things in lists as long as you can, so that you don't have deal with models individually. One mantra is that if you do it once, you shouldn't have to type it again, just loop/apply/map/whatever. Don't extract too much from the lists before you have to.
Note: r2evans' answer provides good general advice and doesn't require heavy package dependencies. But it probably doesn't hurt to see alternative strategies.
The tidyverse can be quite handy for this sort of thing, here's a dummy example for illustration,
library(tidyverse)
# creating dummy data files
dummy <- function(T) {
V <- seq(-5, 5, length=20)
I <- jitter(T*V + T, factor = 1)
write.table(data.frame(V=V, I = I),
file = paste0(T,".txt"),
row.names = FALSE)
}
purrr::walk(300:320, dummy)
# reading
lf <- list.files(pattern = "\\.txt")
read_one <- function(f, ...) {cbind(T = as.numeric(gsub("\\.txt", "", f)), read.table(f, ...))}
m <- purrr::map_df(lf, read_one, header = TRUE, .id="id")
head(m)
ggplot(m, aes(V, I, group = T)) +
facet_wrap( ~ T) +
geom_point() +
geom_smooth(se = FALSE)
models <- m %>%
split(.$T) %>%
map(~lm(I ~ V, data = .))
coefs <- models %>% map_df(broom::tidy, .id = "T")
ggplot(coefs, aes(as.numeric(T), estimate)) +
geom_line() +
facet_wrap(~term, scales = "free")

Long vector-plot/Coverage plot in R

I really need your R skills here. Been working with this plot for several days now. I'm a R newbie, so that might explain it.
I have sequence coverage data for chromosomes (basically a value for each position along the length of every chromosome, making the length of the vectors many millions). I want to make a nice coverage plot of my reads. This is what I got so far:
Looks alright, but I'm missing y-labels so I can tell which chromosome it is, and also I've been having trouble modifying the x-axis, so it ends where the coverage ends. Additionally, my own data is much much bigger, making this plot in particular take extremely long time. Which is why I tried this HilbertVis plotLongVector. It works but I can't figure out how to modify it, the x-axis, the labels, how to make the y-axis logged, and the vectors all get the same length on the plot even though they are not equally long.
source("http://bioconductor.org/biocLite.R")
biocLite("HilbertVis")
library(HilbertVis)
chr1 <- abs(makeRandomTestData(len=1.3e+07))
chr2 <- abs(makeRandomTestData(len=1e+07))
par(mfcol=c(8, 1), mar=c(1, 1, 1, 1), ylog=T)
# 1st way of trying with some code I found on stackoverflow
# Chr1
plotCoverage <- function(chr1, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr1[start:end]), type="l")
}
plotCoverage(chr1, start=1, end=length(chr1)) # Plots coverage result.
# Chr2
plotCoverage <- function(chr2, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr2[start:end]), type="l")
}
plotCoverage(chr2, start=1, end=length(chr2)) # Plots coverage result.
# 2nd way of trying with plotLongVector
plotLongVector(chr1, bty="n", ylab="Chr1") # ylab doesn't work
plotLongVector(chr2, bty="n")
Then I have another vector called genes that are of special interest. They are about the same length as the chromosome-vectors but in my data they contain more zeroes than values.
genes_chr1 <- abs(makeRandomTestData(len=1.3e+07))
genes_chr2 <- abs(makeRandomTestData(len=1e+07))
These gene vectors I would like plotted as a red dot under the chromosomes! Basically, if the vector has a value there (>0), it is presented as a dot (or line) under the long vector plot. This I have not idea how to add! But it seems fairly straightforward.
Please help me! Thank you so much.
DISCLAIMER: Please do not simply copy and paste this code to run off the entire positions of your chromosome. Please sample positions (for example, as #Gx1sptDTDa shows) and plot those. Otherwise you'd probably get a huge black filled rectangle after many many hours, if your computer survives the drain.
Using ggplot2, this is really easily achieved using geom_area. Here, I've generated some random data for three chromosomes with 300 positions, just to show an example. You can build up on this, I hope.
# construct a test data with 3 chromosomes and 100 positions
# and random coverage between 0 and 500
set.seed(45)
chr <- rep(paste0("chr", 1:3), each=100)
pos <- rep(1:100, 3)
cov <- sample(0:500, 300)
df <- data.frame(chr, pos, cov)
require(ggplot2)
p <- ggplot(data = df, aes(x=pos, y=cov)) + geom_area(aes(fill=chr))
p + facet_wrap(~ chr, ncol=1)
You could use the ggplot2 package.
I'm not sure what exactly you want, but here's what I did:
This has 7000 random data points (about double the amount of genes on Chromosome 1 in reality). I used alpha to show dense areas (not many here, as it's random data).
library(ggplot2)
Chr1_cov <- sample(1.3e+07,7000)
Chr1 <- data.frame(Cov=Chr1_cov,fil=1)
pl <- qplot(Cov,fil,data=Chr1,geom="pointrange",ymin=0,ymax=1.1,xlab="Chromosome 1",ylab="-",alpha=I(1/50))
print(pl)
And that's it. This ran in less than a second. ggplot2 has a humongous amount of settings, so just try some out. Use facets to create multiple graphs.
The code beneath is for a sort of moving average, and then plotting the output of that. It is not a real moving average, as a real moving average would have (almost) the same amount of data points as the original - it will only make the data smoother. This code, however, takes an average for every n points. It will of course run quite a bit faster, but you will loose a lot of detailed information.
VeryLongVector <- sample(500,1e+07,replace=TRUE)
movAv <- function(vector,n){
chops <- as.integer(length(vector)/n)
count <- 0
pos <- 0
Cov <-0
pos[1:chops] <- 0
Cov[1:chops] <- 0
for(c in 1:chops){
tmpcount <- count + n
tmppos <- median(count:tmpcount)
tmpCov <- mean(vector[count:tmpcount])
pos[c] <- tmppos
Cov[c] <- tmpCov
count <- count + n
}
result <- data.frame(pos=pos,cov=Cov)
return(result)
}
Chr1 <- movAv(VeryLongVector,10000)
qplot(pos,cov,data=Chr1,geom="line")

Resources