Convert a Graph to a Data Frame in R

So a while back (6 months+) I saw a blog post where the author took a line graph someone had posted on the internet, fed the image into R, and used a function to convert the image into a data frame.
I've looked everywhere, and I can't seem to find this blog post (even though I'm sure I bookmarked it). So I was wondering if any of you had also read said blog post, or if someone knew of a quick and easy way to convert a line graph to a data frame in R?

Was this it? I searched for "R digitize plot". The package used is "ReadImages". For completeness, the steps listed were (see link):
library(ReadImages) #Load package
mygraph <- read.jpeg('plot.jpg') #Import image
plot(mygraph) # Plot the image
calpoints <- locator(n=4,type='p',pch=4,col='blue',lwd=2) # Calibrate the plot by selecting known coordinates
data <- locator(type='p',pch=1,col='red',lwd=1.2,cex=1.2) # Collect the data points (locator returns a list with x and y)
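
The clicked points are still in the plot's own coordinates, so they have to be rescaled with the calibration points before they match the original axes. A minimal base R sketch of that step, assuming linear axes, that the first two calibration clicks lie on the x axis and the last two on the y axis, and that you know the true axis values at those clicks. The function name calibrate and all numbers below are made up for illustration; this is not the code from the linked post.

```r
# Map clicked coordinates to data coordinates by linear interpolation.
# pts: list with x and y, as returned by locator()
# cal: list with x and y from locator(n = 4); points 1-2 are assumed to be
#      on the x axis, points 3-4 on the y axis
# x1, x2, y1, y2: the true axis values at those four calibration points
calibrate <- function(pts, cal, x1, x2, y1, y2) {
  px <- cal$x[1:2]
  py <- cal$y[3:4]
  data.frame(
    x = x1 + (pts$x - px[1]) * (x2 - x1) / (px[2] - px[1]),
    y = y1 + (pts$y - py[1]) * (y2 - y1) / (py[2] - py[1])
  )
}

# Non-interactive example with hypothetical click positions
cal <- list(x = c(100, 500, 50, 50), y = c(20, 20, 80, 380))
pts <- list(x = c(100, 300, 500), y = c(80, 230, 380))
digitized <- calibrate(pts, cal, x1 = 0, x2 = 10, y1 = 0, y2 = 100)
```

In an interactive session you would pass calpoints and data from the steps above instead of the hypothetical lists. Log-scaled axes would need the true values transformed first.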

When you say 'the image as a data frame', do you mean you want to get back to the original data that made the line?
It's not R, but I've used Engauge Digitizer for this sort of thing:
http://digitizer.sourceforge.net/

Also look at the updateusr function in the TeachingDemos package. Once you have the image displayed as in Benjamin's post, you can use the updateusr function with the known points to change the user coordinates so that then the results from the locator function do not need any additional transformation.

As I write this, the digitize package and the ReadImages package are no longer available for R 3.0.2. Engauge Digitizer is a good option, but if you still want to do this sort of thing in R, take a look at http://rscriptsandtips.blogspot.no/

You can also use im2graph to convert graphs to data. It's free and available for Windows and Linux (http://www.im2graph.co.il).

Related

How to make a nice looking table in base r (not markdown)

I’ve been looking for an hour, but everything I can find about how to make a nice looking table out of a data frame mentions that it’s for rmarkdown, html, or latex.
Is it not possible to make a nice looking table in base r?
plot(x, y) makes a graph.
Is there no function like: printTable(df)?
Broadly speaking, base R offers little beyond what a normal print (base::print) already gives you. You could try to twist the plot function into drawing the values from selected cells of a data frame, but that would be onerous to develop and impractical given the maintained solutions already available. A number of packages let you achieve what you need. For instance, you can try formattable by renkun-ken.
Example
For a simple example you can try formattable::formattable(mtcars[1:10,])
Creating Images
For a solution creating images from tables, have a look at this discussion. As discussed in the linked answer, if you insist on generating a static image you can use the grid.table function from gridExtra, which draws the table on the current graphics device: gridExtra::grid.table(mtcars[1:5,]).
You may be interested in the flextable package, which is very easy to use and has many options for creating nice tables.
It can also produce Word, PDF, or HTML output.
I invite you to check the manual: https://cran.r-project.org/web/packages/flextable/vignettes/overview.html

Plotting a subset of data from a prcomp matrix without re-running prcomp

My question follows up on a similar post from 2 years ago that never received a full answer (subset of prcomp object in R). P.S. Sorry for commenting on it to ask for an answer.
Basically, my question is the same. I have generated a PCA table using prcomp that has 10000+ genes, and 1700+ cells, made up of 7 timepoints. Plotting all of them in a single file makes it difficult to see.
I would like to plot each timepoint separately, using the same PCA results table (ie without re-running prcomp).
Thanks Dean for giving me tips on posting. To think of a way to describe my dataset without actually loading it here, will take me a week I believe. I also tried the
dput(droplevels(head(object,2)))
option, but it was just too much info since I have such a large dataset. In short, it is a large single-cell expression matrix of the kind commonly seen in packages such as Seurat (https://satijalab.org/seurat/pbmc3k_tutorial_1_4.html). EDIT: I have posted a screenshot of a subset of my matrix.
Sorry I don't know how to re-create this or even export a text format.. But this is what I can provide:
My TPM matrix has 16541 rows (defining genes), and 1798 columns (defining cells).
In it, I have "re-labelled" my columns based on timepoints, using codes such as:
D0<-c(colnames(TPM[,grep("20180419-24837-1-*", colnames(TPM))])) #D0: 286 cells
D7<-c(colnames(TPM[,grep("20180419-24837-2-*", colnames(TPM))])) #D7: 237 cells
D10<-c(colnames(TPM[,grep("20180419-24947-5-*", colnames(TPM))])) #D10: 304 cells
...... and I continued to label each timepoint.
Each timepoint was also given a specific colour.
rc<-rep("white", ncol(TPM))
rc[grep("20180419-24837-1-*", colnames(TPM))] <- "magenta"
...... and I continued to give colour to each timepoint.
I performed a PCA using this code:
pcaRes<-prcomp(t(log(TPM+1)), center= TRUE, scale. = TRUE)
Then I proceeded to plot a PCA plot using:
plot(pcaRes$x[,1], pcaRes$x[,2], xlab="PC1", ylab="PC2",
cex=1.0, col= rc, pch=16, main="")
Then I wanted to plot a PCA plot with only D0, using the same PCA output (pcaRes). This is where I am stuck.
P.S. If anyone else has an easier way of advising how to input an example data here from my large matrix, I welcome any help. Thanks so much! Sorry I am very new in bioinformatics.
Stack Exchange for Bioinformatics is where you will need to go to ask questions or learn about the packages and functions you need for your area of specialty. It is linked with Stack Overflow, so you will just need to join; you'll have the same login.
Classes S3, S4 and Base.
This is a very basic overview of classes in R. Think of a class as the parent you inherit all of your skills or abilities from; as a result you are able to achieve certain tasks better than others, and in some cases you will not be able to do the task at all.
In R, as in programming generally, parent classes are created to save re-inventing the wheel, so that the average person does not have to repeatedly write a function to do something simple like plot() a graph. This stuff is hidden; to access it, you inherit from the parent. The child reads the traits off the parent(s), and then it either performs the task or gives you a cryptic error message.
Base and S3 classes work well together; they are like the working class people of the R world. S4 is a specialized class made for specific fields of study, to provide the functionality needed in their industry. This means you can only use certain Base and S3 functions with S4 objects; most are just not compatible. So it's nothing you've done wrong: plot() and ggplot() just have the wrong parent(s) to work with your dataset.
Typical Base and S3 Class dataframe: Box like structure. Along the left hand side is all the column names, nice and neatly stacked on top of each other.
Seurat S4 Class dataframe: Tree like structure, formatted to be read by a specific function(s).
Well hope that helps and I wish you well in your career. Cheers Conrad
Ps if this helps, then click the arrow up. :)
Thanks @ConradThiele for your suggestion, I will check out that site.
I had a chat with other bioinformaticians around the institute. My query has little to do with the object being an S4 class, since I am performing prcomp outside of the package. I extracted my matrix out of the object and then ran prcomp on it.
The solution is simple: run prcomp on the full dataset, convert the prcomp output into a data frame, add columns for additional details such as "timepoint", then create sub-data-frames containing only the timepoint (or other variable) of interest and plot each of them with "plot" or whatever function you use.
This was not my solution but came from a bioinformatician at my institute whom I asked for help. Hope this helps others! Thanks again for your time.
P.S. If I have the time, I will post a copy of the code I suggested soon.
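
Until the original code is posted, the steps described above might be sketched like this. The small random matrix and its column names are hypothetical stand-ins for the real TPM matrix, and the way the timepoint label is derived from the column names is an assumption; with the real data you would build that column from your own naming scheme.

```r
set.seed(1)

# Hypothetical stand-in for the TPM matrix: 50 genes x 30 cells, 3 timepoints
TPM <- matrix(rexp(50 * 30, rate = 0.1), nrow = 50,
              dimnames = list(paste0("gene", 1:50),
                              paste0(rep(c("D0", "D7", "D10"), each = 10),
                                     "_cell", 1:10)))

# Run the PCA once, on the full data set
pcaRes <- prcomp(t(log(TPM + 1)), center = TRUE, scale. = TRUE)

# Turn the scores into a data frame and add a timepoint label per cell
scores <- as.data.frame(pcaRes$x)
scores$timepoint <- sub("_cell.*", "", rownames(scores))

# Subset one timepoint and plot it, keeping the axis limits of the full PCA
d0 <- scores[scores$timepoint == "D0", ]
plot(d0$PC1, d0$PC2, xlab = "PC1", ylab = "PC2", pch = 16, col = "magenta",
     xlim = range(scores$PC1), ylim = range(scores$PC2), main = "D0")
```

Fixing xlim and ylim to the full range keeps the per-timepoint plots comparable with each other.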

How to output a chart from Nielsen?

I saw an interesting chart on engadget today made by Nielsen:
http://www.engadget.com/2011/07/28/nielsen-android-leads-us-smartphone-market-with-39-percent-shar/
original source: http://blog.nielsen.com/nielsenwire/online_mobile/in-u-s-smartphone-market-android-is-top-operating-system-apple-is-top-manufacturer/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+NielsenWire+%28Nielsen+Wire%29
I'd love for someone to replicate it if possible and show the R code. Basic packages or ggplot2 would be great.
I like that the boxes are proportional, that's a key feature :)
Thanks!
You can find several implementations in R under the name of 'mosaic chart'. E.g.:
require("vcd")
data(HairEyeColor)
mosaic(HairEyeColor, shade = TRUE)
See some examples on e.g. quickR, but searching the R graph gallery is also a good option.
In ggplot2, you can find a sample on learnr's blog.
I have also done some tweaks in ggplot2, please find the attached plot below. It is in Hungarian, but if you are interested, I could clean up the code and post it somewhere.
UPDATE: I have searched for my old script based on comment and uploaded it to pastebin. Sorry, no code clean up and it is quite messy, as I had to make it up for mass reporting from SPSS data files, but I hope you could use it. The usage is simple: load all functions (e.g.: run all lines in R with the source(...) function), and you could generate a mosaic chart of any data frame by specifying two variable names in the parameters of ggMosaicChart(). The plot will be saved to a png file in the working directory (no easy resize in R of the plot as lots of manual tweaks are done to arrange text nicely).
I have translated the strings to English, a basic example (included in the above code) of the mtcars data set:
Count, row- and column percent and also Pearson residuals are shown for each cell.
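
If installing vcd is not an option, the same kind of proportional-box chart can be sketched with mosaicplot from base graphics. This is only an illustration on the built-in HairEyeColor data, not a reproduction of the Nielsen chart or of the pastebin script.

```r
# Collapse the built-in HairEyeColor table to two dimensions (Hair x Eye)
tab <- margin.table(HairEyeColor, c(1, 2))

# Tile areas are proportional to the cell counts;
# shade = TRUE colours the tiles by residuals from an independence model
mosaicplot(tab, shade = TRUE, main = "Hair and eye colour")
```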
It's called a treemap. R has packages named "treemap" and "portfolio" for it. Here is how to do it: http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/

How to show gaps in R/quantmod's chartSeries/candleChart plots

I am trying to show "gaps" in financial data using the plotting functions in the excellent quantmod package for R.
Normally R allows you to show gaps in plots using NA values, as with:
x<-1:10
y<-2*x
y[4:7]<-NA
plot(x,y,type="l")
I would like to do something similar with R/quantmod's candleChart plots. However, rows of data containing NA's are removed before plotting (there is a na.omit command in the chartSeries code that does this) so I cannot see how to do this.
An example is:
require(quantmod)
#Make some pretend data
x<-0:30
y<-100+20*sin(x)
y.open<-y[-length(y)]
y.close<-y[-1]
val<-as.xts(cbind(y.open,y.open+5,y.close-5,y.close,1000),order.by=as.POSIXct(paste("2011-01-",x[-1],sep='')))
colnames(val)<-c("Open","High","Low","Close","Volume")
#Plot this pretend data
candleChart(val,theme="white")
#Now try and make a "gap" in the middle of the data and plot it
val2<-val
val2[5:20,]<-NA
candleChart(val2,theme="white")
What is the "correct" way to do this? I guess I could overwrite chartSeries with my own version of this function (identical but without the na.omit() call), but that seems quite drastic.
Is there perhaps an option to do this kind of thing available? I have been unable to google anything useful...
Thanks,
fttb
The answer is not to use chartSeries, but rather the newer variant (still in development technically) chart_Series. Note the underscore.
chart_Series(val2)
If you're looking for more details on quantmod and using R in finance, we are hosting a large conference in Chicago at the end of this month. More info can be found here: R/Finance 2011
Hope that helps, and hope to see you in Chicago!!

Binning NMR data in R

I've imported NMR spectra into R as a .csv file (the first column holds the ppm values, the others the signal intensities for various spectra) and I would like to bin the data, say, merge every 5 points into one. Any suggestions?
Cheers,
Marcelo
Marcelo, you can look at ChemoSpec on GitHub here: https://github.com/bryanhanson/ChemoSpec
The function binBuck will do what you ask. There is a fairly complete vignette available once you have the package installed.
To use ChemoSpec, you may have to import your data set differently than you apparently currently have it, or if you have the skills you can modify what you have now. Again, the vignette explains how ChemoSpec stores the data.
Let me know if you need further assistance. Bryan
I know it's an old question, but it can be useful for other users.
You can use the "binning" function from the "prospectr" package in R. You can set "bins" to the number of bins you want in the final spectrum, or "bin.size" to the number of points per bin.
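
For a quick version without extra packages, averaging every 5 points can be sketched in base R. This assumes a data frame laid out as in the question (ppm in the first column, one intensity column per spectrum); bin_spectra and the toy data are hypothetical, and this is not how ChemoSpec or prospectr implement binning.

```r
# Average every `bin_size` consecutive rows of a spectra table.
# Column 1 is assumed to hold ppm values, the rest signal intensities.
bin_spectra <- function(spectra, bin_size = 5) {
  # Bin index for each row: 1,1,1,1,1,2,2,...
  bin_id <- ceiling(seq_len(nrow(spectra)) / bin_size)
  # Mean of every column within each bin (a shorter last bin is averaged as-is)
  binned <- aggregate(spectra, by = list(bin = bin_id), FUN = mean)
  binned$bin <- NULL
  binned
}

# Toy data: 20 points, two spectra
spectra <- data.frame(ppm = seq(10, 0.5, by = -0.5),
                      s1 = 1:20,
                      s2 = (1:20)^2)
binned <- bin_spectra(spectra, bin_size = 5)
```

Replace FUN = mean with FUN = sum if you want integrated bucket intensities rather than averages; in that case the ppm column should still be averaged separately.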
