How to make a plot to compare RPKM values - R

I have a fair amount of experience analyzing RNA-Seq data, but I am looking for new ways to visualize the data. I typically use heat maps and volcano plots, but I'd like to make this plot, which is from this paper. I can make this type of plot with rlog-transformed data before doing DEG analysis, but I want to color the dots based on statistically significant expression differences.
I've searched online and have not been able to find a good way to create this plot. Thanks in advance for any advice.

This question is more about bioinformatics, so it may be better to post it on Biostar.
In any case, you could draw a scatter plot with the ggscatter() function from the ggpubr package, or with ggplot2, and colour the statistically significant genes with an ifelse() statement.
Please provide a sample of your data.
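A minimal base-R sketch of the ifelse() idea, assuming a hypothetical results table with one RPKM column per condition plus an adjusted p-value (the names `rpkm_a`, `rpkm_b`, `padj` and the 0.05 cutoff are illustrative assumptions; the data below are simulated, so in practice these columns would come from your DEG results):

```r
# Simulated stand-in for a DEG results table (hypothetical column names)
set.seed(1)
n  <- 200
de <- data.frame(
  gene   = paste0("gene", seq_len(n)),
  rpkm_a = rexp(n, rate = 0.05),   # RPKM, condition A
  rpkm_b = rexp(n, rate = 0.05),   # RPKM, condition B
  padj   = runif(n)                # adjusted p-value from the DE test
)

# Flag genes with ifelse(): significant if padj < 0.05 (assumed cutoff)
de$status <- ifelse(de$padj < 0.05, "significant", "not significant")

# Scatter plot of log2(RPKM + 1), A vs B, coloured by significance
pt_col <- ifelse(de$status == "significant", "red", "grey60")
plot(log2(de$rpkm_a + 1), log2(de$rpkm_b + 1),
     col = pt_col, pch = 16, cex = 0.6,
     xlab = "log2(RPKM + 1), condition A",
     ylab = "log2(RPKM + 1), condition B")
abline(a = 0, b = 1, lty = 2)  # y = x: equal expression in both conditions
legend("topleft", legend = c("significant", "not significant"),
       col = c("red", "grey60"), pch = 16)
```

With ggplot2, the same `status` column can be mapped to colour instead, e.g. `ggplot(de, aes(log2(rpkm_a + 1), log2(rpkm_b + 1), colour = status)) + geom_point()`.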

Related

Need Help Creating a specific kind of Isotope Plot in ggplot2

I hope that you are doing well. I am currently trying to replicate a type of isotope plot that's common in my field. Essentially, it's the result of a compound-specific stable isotope analysis.
The x and y axes represent delta values, which are plotted against isotopic reference data from animals (the ellipses) to identify animals by their isotopic signature. The ellipses represent a 95% CI.
I'm a beginner in R. I've managed to get the scatter plot to work, but I don't understand how to draw 95% CI ellipses from the reference data. Would anyone here know how to do this?
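In ggplot2, per-group ellipses can be layered onto the scatter plot with `stat_ellipse(aes(group = group), level = 0.95)`; note that its default is `type = "t"` (multivariate t), while `type = "norm"` assumes bivariate normality. The base-R sketch below shows the underlying calculation, assuming hypothetical reference data (groups `deer`/`elk`, columns `d13C`/`d15N`, all simulated) and a bivariate-normal data ellipse scaled by sqrt(2 * F(0.95; 2, n - 1)). This draws a region expected to contain about 95% of the reference individuals, not a confidence region for the group mean, so it is worth checking which of the two your field's plots show.

```r
# Hypothetical reference data: d13C / d15N values for two animal groups
set.seed(42)
ref <- data.frame(
  group = rep(c("deer", "elk"), each = 30),
  d13C  = c(rnorm(30, -26, 0.8), rnorm(30, -22, 0.6)),
  d15N  = c(rnorm(30, 4, 0.7),  rnorm(30, 7, 0.9))
)

# Coordinates of a normal-theory data ellipse for one group of (x, y) points
ellipse_points <- function(x, y, level = 0.95, segments = 100) {
  xy     <- cbind(x, y)
  centre <- colMeans(xy)
  shape  <- cov(xy)
  n      <- nrow(xy)
  radius <- sqrt(2 * qf(level, 2, n - 1))   # F-based scaling for 2 dimensions
  angles <- seq(0, 2 * pi, length.out = segments + 1)
  circle <- cbind(cos(angles), sin(angles))
  # Map the unit circle onto the ellipse via the Cholesky factor of the covariance
  pts <- sweep(radius * circle %*% chol(shape), 2, centre, "+")
  colnames(pts) <- c("x", "y")
  pts
}

grp_col <- c(deer = "darkgreen", elk = "purple")
plot(ref$d13C, ref$d15N, pch = 16, col = grp_col[ref$group],
     xlab = expression(delta^13 * C), ylab = expression(delta^15 * N))
for (g in unique(ref$group)) {
  sub <- ref[ref$group == g, ]
  e   <- ellipse_points(sub$d13C, sub$d15N)
  lines(e[, "x"], e[, "y"], col = grp_col[[g]])
}
```

Your own measurements can then be added on top with `points()`, so each sample is read off against the reference ellipses.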

Heatmap with R, ggmap and ggplot

I want to plot incidents on a map of San Francisco. Because there are so many incidents (800k points), I end up with an overplotting problem. To avoid this, I want to make a two-dimensional density plot to get the desired insight. The problem is that, although the incidents are spread all over the map, geom_density2d only covers a small area of the city. I expected a density that covers nearly the whole city. Any ideas why this happens?
CODE
library(ggmap)  # also attaches ggplot2
a <- get_map("San Francisco", zoom = 12, source = 'osm')
ggmap(a, extent = 'device') + geom_density2d(data = train, aes(x = X, y = Y)) +
  stat_density2d(data = train, aes(x = X, y = Y, fill = ..level.., alpha = ..level..),
                 geom = 'polygon')
--------------------------------------------------------------
First, @ajrwhite, thanks for your answer and attitude, dude. You are also right that when dealing with datasets this big you have to subset in order to experiment. As far as the number of bins is concerned, I thought that, as with geom_density, the optimal kernel bandwidth / number of bins was calculated internally. It seems that in the 2-dimensional case you have to adjust it yourself.
Now, as you mentioned, my problem was that I never expected crime in the city to be so concentrated. The pattern was so stark that I assumed my output was wrong, but as it turns out, that really is how crime is distributed in the city. There is also a more detailed treatment of various visualizations of this dataset by this guy:
https://www.kaggle.com/mircat/sf-crime/violent-crime-mapping
Finally, thank you for the redirection. There is indeed extensive coverage of the subject.
So I grabbed the San Francisco Crime data from Kaggle, which I suspect is the dataset you are using.
First, a suggestion - given that there are 878,049 rows in this dataset, take a sample of 5,000 and use that to experiment with plots. It will save you a lot of time:
train_reduced = train[sample(1:nrow(train), 5000),]
You can then easily plot individual cases to get a better feeling for what's happening:
ggmap(a,extent='device') + geom_point(aes(x=X, y=Y), data=train_reduced)
And now we can see that the coordinates and the data are correctly aligned:
So your problem is simply that crime is concentrated in the north-east of the city.
Returning to your density contours, we can use the bins argument to increase the precision of our contour intervals:
ggmap(a,extent='device') +
geom_density2d(data=train_reduced,aes(x=X,y=Y), bins=30) +
stat_density2d(data=train_reduced,aes(x=X,y=Y,fill=..level.., alpha=..level..), geom='polygon')
Which gives us a more informative plot spreading out more into the low-crime areas of the city:
There are countless ways of improving the aesthetics and consistency of these plots, but these have already been covered elsewhere on StackOverflow, for example:
How to make a ggplot2 contour plot analogue to lattice:filled.contour()?
Filled contour plot with R/ggplot/ggmap
If you use a smaller sample of your dataset, you should be able to experiment with these ideas very quickly and find the parameters that best suit your requirements. The ggplot2 documentation is excellent, by the way.
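To separate the two knobs discussed above: to my understanding, stat_density_2d estimates the surface with MASS::kde2d on a 100 x 100 grid, with the bandwidth taken from MASS::bandwidth.nrd unless `h` is supplied, and `bins` only changes how many contour levels are drawn over that surface. The sketch below reproduces this outside ggplot2 on simulated points (a hypothetical stand-in for the crime coordinates) so the effect of the number of levels can be checked without a map.

```r
# Hypothetical stand-in for the crime coordinates: a dense cluster plus sparse background
set.seed(7)
xy <- rbind(
  cbind(rnorm(4000, -122.41, 0.01), rnorm(4000, 37.78, 0.01)),
  cbind(runif(1000, -122.51, -122.36), runif(1000, 37.70, 37.81))
)

# 2-D kernel density on a 100 x 100 grid; bandwidth from the normal reference rule
dens <- MASS::kde2d(xy[, 1], xy[, 2],
                    h = c(MASS::bandwidth.nrd(xy[, 1]), MASS::bandwidth.nrd(xy[, 2])),
                    n = 100)

# Same density surface, cut into few vs many contour levels
few  <- contourLines(dens$x, dens$y, dens$z, nlevels = 5)
many <- contourLines(dens$x, dens$y, dens$z, nlevels = 30)

# Count the distinct levels that actually produced contour lines
n_levels <- function(cl) length(unique(vapply(cl, function(line) line$level, numeric(1))))
c(few = n_levels(few), many = n_levels(many))
```

With only a few levels, the low-density background falls below the lowest contour and disappears; more levels (or a larger `h`) make the sparse areas visible, which matches what the larger `bins` value did in the answer's plot.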

R: How to automatically set the color of different groups in survival plot

I am plotting the survival probability for my dataframe with 8 different groups with this command:
library(survival)
fit2 <- survfit(Surv(time = t2$uptimeDay, event = t2$solved, type = 'right') ~ t2$cluster)
plot(fit2, conf.int = FALSE, xlim = c(0, 250), mark.time = c(1, 50, 100, 200),
     mark = c(1, 3, 4, 2, 5, 7, 6, 8, 9, 10), lwd = 1, cex = 0.7, lty = 1:11,
     xlab = 'Time(days)', ylab = 'Survival Probability')
Here, cluster is a number between 1 and 10.
I would like to know how to automatically set the colors of the curves together with an automatic legend using key of the curves.
Can somebody help me out with this?
I have a function that I use for Kaplan-Meier curves that is based on ggplot2, which will take care of the colors and legends for you. Regrettably, I've not gotten around to packaging it up in any sensible way. But you can download the source code from
https://gist.github.com/nutterb/004ade595ec6932a0c29
And some examples on how to use it from
https://gist.github.com/nutterb/fb19644cc18c4e64d12a
It's not clear what you mean by making this "automatic" and the desire to "use the key of the curves", but perhaps you are asking that the colors of the curves match the legend.
png()
mycols <- c("red", "blue")
# prio.fit is assumed to be a survfit object with two strata
plot(prio.fit, col = mycols)
legend(x = "bottomleft", col = mycols, lty = 1, legend = mycols)
dev.off()
If you want this mated to a dataset and wanted to specify particular colors for your groups, then you will need to provide a dataset so there is something meaningful to use as labels, and be more specific about the coloring schema needed.
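A minimal sketch of generating one colour per group automatically and building the legend from the fit itself, assuming hypothetical simulated data shaped like the question's `t2` (columns `uptimeDay`, `solved`, `cluster`). `plot.survfit()` uses the `col` vector in stratum order, and `names(fit2$strata)` supplies matching labels:

```r
library(survival)  # survfit(), Surv(); recommended package shipped with R

# Hypothetical data shaped like the question's t2
set.seed(3)
t2 <- data.frame(
  uptimeDay = rexp(400, rate = 1 / 80),
  solved    = rbinom(400, 1, 0.7),
  cluster   = sample(1:8, 400, replace = TRUE)
)

fit2 <- survfit(Surv(uptimeDay, solved) ~ cluster, data = t2)

# One colour per stratum, generated from however many groups the fit contains
n_groups <- length(fit2$strata)
cols     <- rainbow(n_groups)

plot(fit2, col = cols, conf.int = FALSE, xlim = c(0, 250),
     xlab = "Time (days)", ylab = "Survival Probability")
# Legend labels come from the strata names, e.g. "cluster=1"
legend("topright", legend = names(fit2$strata), col = cols, lty = 1, cex = 0.7)
```

`rainbow()` is just one simple choice of palette; because the number of colours is taken from `fit2$strata`, the plot adapts if the number of clusters changes.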

R: getting data (instead of plot) back from sm.density.compare

I'm doing a density comparison in R using the sm package (sm.density.compare). Is there any way I can get a mathematical description of the graph, or at least a table of points, back instead of a plot? I would like to plot the resulting graphs in a different application, but I need the data to do so.
Thanks a lot for the help,
culicidae
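One possible workaround, which swaps in base R's `stats::density()` rather than the sm package's own output: estimate each group's density on a common grid and export the evaluation points as a table. The data, group labels and output file name below are hypothetical, and the curves may not match sm.density.compare exactly because the bandwidth selection can differ.

```r
# Hypothetical data: a numeric variable and a grouping factor
set.seed(11)
x     <- c(rnorm(100, 0, 1), rnorm(100, 1.5, 0.7))
group <- factor(rep(c("A", "B"), each = 100))

# Evaluate each group's kernel density on a common grid of 512 points
rng <- range(x)
dens_table <- do.call(rbind, lapply(levels(group), function(g) {
  d <- density(x[group == g], from = rng[1], to = rng[2], n = 512)
  data.frame(group = g, x = d$x, density = d$y)
}))

head(dens_table)
# Write the table out for use in another application (hypothetical file name)
write.csv(dens_table, file.path(tempdir(), "density_compare.csv"), row.names = FALSE)
```

Using a shared `from`/`to` range keeps the x values identical across groups, which makes the table easy to re-plot elsewhere.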

Applying functions from histograms - in R

I have a very basic grasp of stats, and a very basic grasp of R so please bear with me.
I have survey data which shows the weekly expenditure of a number of respondents. I have put this into a histogram, and have plotted a density function as well. So far so good.
How do I then apply this curve to a larger population? Say that I know that the population of my town is 25000. How can I apply that to the density curve to arrive at a new histogram and the data table behind it?
I hope this is an appropriate question, thank you.
It is not exactly clear what you want to do.
If you only have data on the sample then the best estimate that you have of the histogram/density for the population is the histogram/density of the sample, the only difference would be the scale on the y-axis. Personally I think the tick marks on the y axis should be ignored (and my preference would be that the tick labels were never plotted) since it is really the shape of the histogram/density that is important and the tick labels can change based on things that don't change the meaning. If you really feel the need to have the tick labels represent population values then see the axis function.
If you want something more than this then give us a better description of what you are trying to accomplish.
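A minimal sketch of the answer's point, assuming the respondents are a representative random sample (an assumption) and using simulated expenditure data: the population histogram has the same shape, so each bin's sample proportion is scaled by the population size, and only the y-axis labels change (here via `axis()`).

```r
# Hypothetical survey: weekly expenditure of 200 respondents
set.seed(5)
spend <- rgamma(200, shape = 2, rate = 0.02)

pop_size <- 25000  # town population from the question

h <- hist(spend, plot = FALSE)
# Scale each bin's sample proportion up to the population size
est_pop <- h$counts / length(spend) * pop_size

pop_table <- data.frame(
  lower          = head(h$breaks, -1),
  upper          = tail(h$breaks, -1),
  sample_count   = h$counts,
  est_population = round(est_pop)
)
print(pop_table)

# Same histogram shape; y axis relabelled in population units
plot(h, axes = FALSE, main = "", xlab = "Weekly expenditure",
     ylab = "Estimated number of residents")
axis(1)
ticks <- pretty(c(0, max(h$counts)))
axis(2, at = ticks, labels = round(ticks / length(spend) * pop_size))
```

The estimated counts are only as good as the sample; with a small or biased survey, the bin estimates will be noisy even though they add up to 25,000.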
