I'm trying to plot a 3D plane from three variables. I've read many questions on the topic but haven't been able to find what I'm looking for.
I have two sets of variables:
prob <- seq(0,1,by=0.01)
n <- seq(999,9999, by = 1000)
n <- c(9,99,n)
combis <- expand.grid(prob,n)
which I then use to produce my results:
res <- apply(combis,1,calc,pos=pos)
where the values of res can be between 0 and 2/3.
So I'm trying to make a 3D plot where n and prob are x and z, and y is res. However, most packages I've found require matrices etc., and I cannot get this to work.
Any help would be appreciated, and sorry if I've missed an existing answer.
Assuming that res is just a vector, you can combine your data, reshape it into a wide-format matrix, and then plot it with something like the lattice package:
prob <- seq(0,1,by=0.01)
n <- seq(999,9999, by = 1000)
n <- c(9,99,n)
combis <- expand.grid(prob,n)
res <- runif(n=nrow(combis), 0, 0.67) #generate sample data for res
dat <- cbind(combis, res)
library(reshape2)
datm <- acast(data = dat, Var1~Var2, value.var = "res") #cast it into wide format
library(lattice)
library(latticeExtra)
# rows of datm (prob) map to x, columns (n) to y, and cell values (res) to z
cloud(datm, panel.3d.cloud = panel.3dbars, xlab = "prob", ylab = "n", zlab = "res")
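If you would rather see a continuous surface than bars, lattice's wireframe accepts the same matrix; a minimal sketch using the datm built above (drape is just an optional colouring choice):
# draw the same wide-format matrix as a surface instead of 3D bars
wireframe(datm, xlab = "prob", ylab = "n", zlab = "res", drape = TRUE)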
I have managed to generate pseudotime vs gene expression plots in Monocle for individual markers using the following code:
library("monocle")
lung <- load_lung()
diff_test_res <- differentialGeneTest(
lung,
fullModelFormulaStr = "~genotype"
)
ordering_genes <- diff_test_res[diff_test_res$qval < 0.01, "gene_id"]
lung <- setOrderingFilter(lung, ordering_genes)
plot_ordering_genes(lung)
#> Warning: Transformation introduced infinite values in continuous y-axis
lung <- reduceDimension(
lung,
max_components = 2,
method = 'DDRTree'
)
lung <- orderCells(lung)
lung_expressed_genes <- fData(lung)[fData(lung)$num_cells_expressed >= 5, "gene_id"]
lung_filtered <- lung[lung_expressed_genes, ]
my_genes <- rownames(lung_filtered)[1:3]
lung_subset <- lung_filtered[my_genes, ]
plot_genes_in_pseudotime(lung_subset, color_by = "genotype")
The "plot_genes_in_pseudotime" function on the final line generates a best fit curve of the plotted data. I was wondering if the y coordinates of this curve can somehow be obtained for say, every 0.01 units along the pseudotime axis? You can find the code and example plots here: http://cole-trapnell-lab.github.io/monocle-release/docs/#trajectory-step-3-order-cells-along-the-trajectory
You can access the Pseudotime and "expectation" values that comprise the curve in plot$data (monocle just plots Pseudotime against spline-smoothed mean expression for the specified genes).
You can then use approxfun to interpolate linearly between those points and evaluate a grid of points along the range of pseudotime.
NOTE: I am not sure this is a sensible thing to do. Pseudotime is a fairly loose and woolly thing, and reading deeply into minute changes in pseudotime is likely to lead to pretty shaky conclusions.
In any case, if you're interested in using this type of approach, I would just read the code on GitHub, as it should be fairly easy to reproduce the output.
options(stringsAsFactors = FALSE)
library("monocle")
lung <- load_lung()
#> Removing 4 outliers
diff_test_res <- differentialGeneTest(
lung,
fullModelFormulaStr = "~genotype"
)
ordering_genes <- diff_test_res[diff_test_res$qval < 0.01, "gene_id"]
lung <- setOrderingFilter(lung, ordering_genes)
lung <- reduceDimension(
lung,
max_components = 2,
method = 'DDRTree'
)
lung <- orderCells(lung)
lung_expressed_genes <- fData(lung)[fData(lung)$num_cells_expressed >= 5, "gene_id"]
lung_filtered <- lung[as.character(lung_expressed_genes), ]
my_genes <- rownames(lung_filtered)[1:3]
## Use only 1 gene here. Otherwise the plot data will include multiple genes
lung_subset <- lung_filtered["ENSMUSG00000000031.9", ]
p <- plot_genes_in_pseudotime(lung_subset, color_by = "genotype")
df <- p$data
fun <- approxfun(df$Pseudotime, df$expectation)
s <- seq(min(df$Pseudotime), max(df$Pseudotime), by = 0.01)
plot(s, fun(s))
I need to interpolate annual data from 5-year intervals, and so far I have found how to do it for one observation using approx(). But I have a large data set, and when I try to use ddply() to apply the interpolation to each row, I keep receiving error messages no matter what I try in the last line of code.
e.g:
town <- data.frame(name = c("a","b","c"), X1990 = c(100,300,500), X1995=c(200,400,700))
d1990 <-c(1990)
d1995 <-c(1995)
town_all <- cbind(town,d1990,d1995)
library(plyr)
Input <- data.frame(town_all)
x <- c(town_all$X1990, town_all$X1995)
y <- c(town_all$d1990, town_all$d1995)
approx_frame <- function(df) (approx(x=x, y=y, method="linear", n=6, ties="mean"))
ddply(Input, town_all$X1990, approx_frame)
Also, if you know of a function that does geometric interpolation, that would be great. (I was only able to find examples of spline or constant methods.)
I would first put the data in long format (each column corresponds to a variable, so one column for 'year' and one for 'value'). Then I use data.table, but the same approach could be followed with dplyr or another split-apply-combine method. The interp function below does geometric interpolation with a constant rate calculated for each interval.
## Sample data (added one more year)
towns <- data.frame(name=c('a', 'b', 'c'),
x1990=c(100, 300, 500),
x1995=c(200, 400, 700),
x2000=c(555, 777, 999))
## First, transform data from wide -> long format, clean year column
library(data.table) # or use reshape2::melt
towns <- melt(as.data.table(towns), id.vars='name', variable.name='year') # wide -> long
towns[, year := as.integer(sub('[[:alpha:]]', '', year))] # convert years to integers
## Function to interpolate at constant rate for each interval
interp <- function(yrs, values) {
tt <- diff(yrs) # interval lengths
N <- head(values, -1L)
P <- tail(values, -1L)
r <- (log(P) - log(N)) / tt # rate for interval
const_rate <- function(N, r, time) N*exp(r*(0:(time-1L)))
list(year=seq.int(min(yrs), max(yrs), by=1L),
value=c(unlist(Map(const_rate, N, r, tt)), tail(P, 1L)))
}
## geometric interpolation for each town
res <- towns[, interp(year, value), by=name]
## Plot
library(ggplot2)
ggplot(res, aes(year, value, color=name)) +
geom_line(lwd=1.3) + theme_bw() +
geom_point(data=towns, cex=2, color='black') + # overlay the original observations
scale_color_brewer(palette='Pastel1')
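As a quick sanity check of interp on a single interval (the numbers are just illustrative):
## one 5-year interval from 100 to 200: constant rate r = log(2)/5 per year
interp(c(1990, 1995), c(100, 200))
#> year:  1990  1991  1992  1993  1994  1995
#> value: 100.0 114.9 132.0 151.6 174.1 200.0 (approximately; output condensed)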
I am trying to bin 3D coordinates.
I have coordinates of a molecule moving through a protein, from over 800 simulations... What I want is to bin these data to get means, variances and how many points I have in a bin.
I imagine it like this:
the space containing my 3D coordinates is split up into smaller 3D cubes (3D bins) defined by break points.
What I need is all my x,y,z coordinates in these smaller 3D bins to calculate the mean and variance of these data.
Does this make sense?
Any help is greatly appreciated.
My input looks like this:
x<-c(1.1,1.2,4.3)
y<-c(3.4,5.2,3.2)
z<-c(10.1,10.3,12)
dat <- data.frame(x=x,y=y,z=z)
and the output should be organised by bins with dat having additional info on which bin the coordinates belong to:
x y z bin_x bin_y bin_z
Here you go. I might be completely wrong here, but your question is hard to answer without some expected output. I went with your intention of calculating the mean and variance for each small cube, so I created a grouping variable.
# generate some sample data with more points
set.seed(32587)
n=500
dat <- data.frame(x=runif(n,min=0,max=10),
y=runif(n,min=0,max=10),
z=runif(n,min=0,max=10))
#create bins (using 'cut', no need to do this manually or in a loop)
#I have removed the labels, so each bin is just a number.
#breaks have been changed to allow for actual binning
breaks<-seq(0,10,1)
dat$bin_x <- cut(dat$x, breaks=breaks, labels=F)
dat$bin_y <- cut(dat$y, breaks=breaks, labels=F)
dat$bin_z <- cut(dat$z, breaks=breaks, labels=F)
#create grouping variable with some string formatting for readability
dat$bin_all <- with(dat, sprintf("%02d.%02d.%02d",bin_x,bin_y,bin_z))
head(dat)
library(data.table)
m_dat <- melt(setDT(dat),measure.vars=c("x","y","z"))
res <- m_dat[,.(mean_value=mean(value),variance_value=var(value),
n_value=.N),by=list(bin_all,variable)]
res
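If you also need the spatial location of each bin rather than just its index (for plotting, say), the integer bin indices map straight back to coordinates; a small sketch assuming the unit-width breaks above:
# with unit-width breaks, bin index i covers (i-1, i], so its centre is i - 0.5
dat[, c("cx", "cy", "cz") := .(bin_x - 0.5, bin_y - 0.5, bin_z - 0.5)]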
#Matrix of bins
mat <- cbind(rep(1:10, each = 100), rep(rep(1:10, each = 10), 10), rep(1:10, 100))
# Data frame of coordinates
df1 <- data.frame(x = c(1,3), y = c(2,6), z = c(8,10))
# Output the rows of mat that match the three values of each row of df1
apply(apply(df1, 1, function(x)
apply(mat,1, function(y)
sum(x[1] == y[1], x[2] == y[2], x[3] == y[3])) ), 2,
function(z) which(z ==3))
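The nested apply calls get expensive for large inputs; a hypothetical vectorised alternative is to give the rows of mat names and an index, then let merge() do the matching (the column names x, y, z are my assumption):
# label mat's columns to match df1, add a row index, and merge on the coordinates
mat_df <- data.frame(x = mat[, 1], y = mat[, 2], z = mat[, 3], row = seq_len(nrow(mat)))
merge(df1, mat_df)$row # row numbers of mat matching rows of df1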
I have two datasets (df1 and df2) that are plotted.
df1 = data.frame(x=c(1:10), y=c(1:10))
df2 = data.frame(x=c(0:13), y=c(0:13)^1.2)
# plot
plot(df1)
# add lines of another dataset
lines(df2)
Some values of df2 are outside the plot range and thus not visible. (In this example I could simply plot df2 first.) I usually work out the ranges of my data manually, as shown below.
# manual solution
minX = min(df1$x, df2$x)
minY = min(df1$y, df2$y)
maxX = max(df1$x, df2$x)
maxY = max(df1$y, df2$y)
plot(df1, xlim=c(minX, maxX), ylim=c(minY, maxY))
lines(df2)
With many datasets this becomes tedious, so I was wondering if there is an easier way of adjusting the axis ranges.
In the first plot call, R finds the axis ranges itself. Is there also a way to have R adjust the axis ranges when new datasets are added?
You could use range to calculate the limits.
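For example, a minimal sketch with the data from the question:
# range() accepts several vectors and returns their combined min and max
plot(df1, xlim = range(df1$x, df2$x), ylim = range(df1$y, df2$y))
lines(df2)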
Imho, a better solution:
df1 <- data.frame(x=c(1:10), y=c(1:10))
df2 <- data.frame(x=c(0:13), y=c(0:13)^1.2)
ll <- list(df1,df2)
ll <- lapply(seq_along(ll), function(i) {res <- ll[[i]]; res$grp <- i; res}) # tag each data frame with a group id
df <- do.call("rbind",ll)
df$grp <- factor(df$grp)
library(ggplot2)
p1 <- ggplot(df,aes(x=x,y=y,group=grp,col=grp)) + geom_line()
p1
I like #Roland's solution, but here is an extension of #Glen_b's solution that works for an arbitrary number of data sets, if you have them all in a list.
(warning: untested!)
dflist <- list(df1,df2,df3,...) ## dots are not literal!
plotline <- function(L,...) { ## here the dots are literal
## use them to specify (e.g.) xlab, ylab, other arguments to plot()
allX <- unlist(lapply(L,"[[","x"))
allY <- unlist(lapply(L,"[[","y"))
plot(L[[1]], xlim=range(allX), ylim=range(allY), type="n", ...) # set up the axes only; data drawn below
invisible(lapply(L,lines))
}
This assumes that you want all the data sets drawn as lines.
If you want to start specifying separate colours, point types, etc., you could extend this function, but at that point you would be starting to re-invent the lattice and ggplot2 packages.
(If all your data sets are the same size, you should consider matplot.)
You could always write a function:
plotline <- function(df1,df2) {
minX = min(df1$x, df2$x)
minY = min(df1$y, df2$y)
maxX = max(df1$x, df2$x)
maxY = max(df1$y, df2$y)
plot(df1, xlim=c(minX, maxX), ylim=c(minY, maxY))
lines(df2)
}
Then you just do this:
plotline(firstdf,seconddf)
If you want to get fancy, you can even include a ... argument and pass it on to the plot call, as sketched below.
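A minimal sketch of that extension (hypothetical, but the idiom is standard):
# forward any extra arguments (main, xlab, pch, ...) on to plot()
plotline <- function(df1, df2, ...) {
  plot(df1, xlim = range(df1$x, df2$x), ylim = range(df1$y, df2$y), ...)
  lines(df2)
}
plotline(firstdf, seconddf, main = "Both data sets")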
Look at the matplot function: it will accept a matrix as x, y, or both, and do all the automatic range calculations for you. If you have the data in multiple data frames, you can use sapply to extract the key pieces and form the matrices.
This approach is often even simpler than using the lines function multiple times:
df1 <- data.frame(x=1:10, y=1:10)
df2 <- data.frame(x=0:13, y=(0:13)^1.2)
df3 <- data.frame(x= -3:5, y= 5:(-3))
mylist <- list( df1, df2, df3 )
max.n <- max(sapply(mylist,nrow))
tmpfun <- function(df, which.col, n) {
tmp <- df[[which.col]]
c(tmp, rep(NA, n-length(tmp)))
}
matplot( sapply(mylist, tmpfun, which.col='x', n=max.n),
sapply(mylist, tmpfun, which.col='y', n=max.n), type='b' )
The above is even simpler if all the data frames have the same number of rows, as sketched below.
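A minimal sketch under that assumption (no NA padding needed, since sapply then returns matrices directly):
matplot(sapply(mylist, `[[`, 'x'),
        sapply(mylist, `[[`, 'y'), type = 'b')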
The other approach, as mentioned in the comments, is to combine the datasets into a single dataset and use tools like lattice graphics or ggplot2:
lengths <- sapply(mylist, nrow)
df.all <- do.call(rbind, mylist)
df.all$group <- rep( seq_along(lengths), lengths )
library(lattice)
xyplot( y~x, data=df.all, groups=group, type='b' )
library(ggplot2)
qplot(x,y, colour=factor(group), data=df.all, geom=c('point','path') )
If all else fails you can use the zoomplot function from the TeachingDemos package to change the limits of base graphics after the fact, but the above methods are much better.
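For what it's worth, a hypothetical sketch of that last resort (the exact limits are my own choice):
library(TeachingDemos)
plot(df1)
lines(df2) # df2 partly clipped by the original limits
zoomplot(xlim = range(df1$x, df2$x), ylim = range(df1$y, df2$y)) # redraw with wider limits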
I have the following script that emulates the type of data structure I have and the analysis that I want to do on it:
library(ggplot2)
library(reshape2)
library(splines) # needed for ns() in the stat_smooth formula
n <- 10
df <- data.frame(t=seq(n)*0.1, a =sort(rnorm(n)), b =sort(rnorm(n)),
a.1=sort(rnorm(n)), b.1=sort(rnorm(n)),
a.2=sort(rnorm(n)), b.2=sort(rnorm(n)))
head(df)
mdf <- melt(df, id=c('t'))
## head(mdf)
levels(mdf$variable) <- rep(c('a','b'),3)
g <- ggplot(mdf,aes(t,value,group=variable,colour=variable))
g +
stat_smooth(method='lm', formula = y ~ ns(x,3)) +
geom_point() +
facet_wrap(~variable)
What I would like to do in addition to this is plot the first derivative of the smoothing function against t, and against the factors c('a','b'), as well. Any suggestions on how to go about this would be greatly appreciated.
You'll have to construct the derivative yourself, and there are two possible ways to do that. Let me illustrate by using only one group:
require(splines) #thx #Chase for the notice
lmdf <- mdf[mdf$variable=="b",]
model <- lm(value~ns(t,3),data=lmdf)
You then simply define your derivative as diff(Y)/diff(X) based on your predicted values, as you would do for differentiation of a discrete function. It's a very good approximation if you take enough X points.
X <- data.frame(t=seq(0.1,1.0,length=100) ) # make an ordered sequence
Y <- predict(model,newdata=X) # calculate predictions for that sequence
plot(X$t,Y,type="l",main="Original fit") #check
dY <- diff(Y)/diff(X$t) # the derivative of your function
dX <- rowMeans(embed(X$t,2)) # centers the X values for plotting
plot(dX,dY,type="l",main="Derivative") #check
As you can see, this way you obtain the points for plotting the derivative. From here you can apply the same approach to both levels and combine those points into whatever plot you like; a sketch follows below.
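For instance, here is a sketch of repeating the same calculation for each factor level and stacking the results (using mdf from the question; the object names are my own):
## loop the fit and numerical derivative over both levels of 'variable'
derivs <- do.call(rbind, lapply(levels(mdf$variable), function(v) {
  sub <- mdf[mdf$variable == v, ]
  fit <- lm(value ~ ns(t, 3), data = sub)
  X <- seq(min(sub$t), max(sub$t), length = 100)
  Y <- predict(fit, newdata = data.frame(t = X))
  data.frame(variable = v,
             t = rowMeans(embed(X, 2)), # centred t values
             dY = diff(Y) / diff(X)) # discrete first derivative
}))
ggplot(derivs, aes(t, dY, colour = variable)) + geom_line() + facet_wrap(~variable)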
Here's one approach to plotting this with ggplot. There may be a more efficient way to do it, but this uses the manual calculations done by #Joris. We'll simply construct a long data.frame with all of the X and Y values while also supplying a variable to "facet" the plots:
require(ggplot2)
originalData <- data.frame(X = X$t, Y, type = "Original")
derivativeData <- data.frame(X = dX, Y = dY, type = "Derivative")
plotData <- rbind(originalData, derivativeData)
ggplot(plotData, aes(X,Y)) +
geom_line() +
facet_wrap(~type, scales = "free_y")
If the data is smoothed using smooth.spline, the derivative of the predicted values can be requested via the deriv argument of predict. Following on from #Joris's solution:
lmdf <- mdf[mdf$variable == "b",]
model <- smooth.spline(x = lmdf$t, y = lmdf$value)
Y <- predict(model, x = seq(0.1,1.0,length=100), deriv = 1) # first derivative
plot(Y$x, Y$y, type = 'l') # predict.smooth.spline returns plain vectors here
Any dissimilarity in the output is most likely due to differences in the smoothing.