Updating existing plot instead of creating new in a for loop - r

I am attempting a homework problem where I am tasked with plotting the histogram that results from a Galton board experiment, essentially creating a normal distribution by adding one value at a time and updating the histogram after each trial (ball). I would like to find a way to update the histogram after each new value is added to the distribution; instead, my code currently produces a huge number of plots.
So far I've set up a vector with length=1000 (though theoretically I should be able to apply my final code to a vector of any length?) and created a loop to add values to it using rbinom() with 200 "pegs" with a probability of 50% (falling left or right).
x<-numeric(1000) #create vector length of 1000 values of 0
for (i in 1:1000) {
x[i]<-sum(rbinom(200,1,0.5))
hist(x,freq=FALSE)
}
I have the hist() call within the for loop (this may be a cardinal sin in R...), which as you can imagine produces 1000 graphs! Definitely not the right way to go about this. Is there any way to just update on top of the previous plot? I'm thinking of things like abline(), lines(), etc., which (as far as I can tell) just add lines on top of an already existing plot in R without creating a new one. This is probably because the data associated with those functions isn't the same as the data in a vector? Anyway, I haven't been able to figure this out with Google. I haven't tried using ggplot2 or the animate packages yet, though I'm only vaguely familiar with the former and I imagine there's a learning curve.
A final note: I'm fairly new to R, so I'd appreciate unrelated advice on the above code, but I also think it's very productive to work things out on your own, so I would prefer hints and/or general advice instead of pasting working code.
Thank you very much in advance for your help!
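As a hint in code form, here is one possible sketch, not a definitive answer. It assumes that redrawing the histogram only every few balls is acceptable; the names simulate_galton, draw_every and pause are hypothetical and do not come from the question. It also plots only the entries filled in so far, because hist(x) on the preallocated vector would count the remaining zeros as data.

```r
# Sketch only: simulate_galton(), draw_every and pause are hypothetical names.
simulate_galton <- function(n_balls = 1000, n_pegs = 200,
                            draw_every = 50, pause = 0) {
  x <- numeric(n_balls)                    # preallocate the result vector
  for (i in seq_len(n_balls)) {
    x[i] <- sum(rbinom(n_pegs, 1, 0.5))    # one ball: count the rightward bounces
    if (i %% draw_every == 0 || i == n_balls) {
      # Plot only the values simulated so far; later entries are still 0.
      hist(x[seq_len(i)], freq = FALSE, xlim = c(0, n_pegs),
           main = paste("Galton board after", i, "balls"),
           xlab = "Rightward bounces")
      if (pause > 0) Sys.sleep(pause)      # optional delay between redraws
    }
  }
  invisible(x)                             # return the simulated values quietly
}
```

Calling x <- simulate_galton(1000, 200, draw_every = 50, pause = 0.1) redraws the histogram on the current device 20 times instead of 1000. Each hist() call still redraws the whole figure; fixing xlim simply keeps the axis from jumping between frames.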

Related

How to create a loop with R function specgram(signal)

I am working with many signals, each one a time series. There are too many to handle one by one (I need to make more than 1000 spectrograms), and I am not sure how to implement this, because I need not only the plots but also the output values of each spectrogram stored in a file or an R object. I am sorry I don't have an approach. Can anyone help out, please?

Plotting a subset of data from a prcomp matrix without re-running prcomp

My question is similar to a post from 2 years ago that has no full answer (subset of prcomp object in R). P.S. Sorry for commenting on it to ask for an answer.
Basically, my question is the same. I have generated a PCA table using prcomp that has 10000+ genes, and 1700+ cells, made up of 7 timepoints. Plotting all of them in a single file makes it difficult to see.
I would like to plot each timepoint separately, using the same PCA results table (ie without re-running prcomp).
Thanks Dean for giving me tips on posting. To think of a way to describe my dataset without actually loading it here, will take me a week I believe. I also tried the
dput(droplevels(head(object,2)))
option, but it was just too much info since I have such a large dataset. In short, it is a large single-cell matrix of the kind commonly used with packages such as Seurat (https://satijalab.org/seurat/pbmc3k_tutorial_1_4.html). EDIT: I have posted a screenshot of a subset of my matrix here ().
Sorry I don't know how to re-create this or even export a text format.. But this is what I can provide:
My TPM matrix has 16541 rows (defining genes), and 1798 columns (defining cells).
In it, I have "re-labelled" my columns based on timepoints, using codes such as:
D0<-c(colnames(TPM[,grep("20180419-24837-1-*", colnames(TPM))])) #D0: 286 cells
D7<-c(colnames(TPM[,grep("20180419-24837-2-*", colnames(TPM))])) #D7: 237 cells
D10<-c(colnames(TPM[,grep("20180419-24947-5-*", colnames(TPM))])) #D10: 304 cells
...... and I continued to label each timepoint.
Each timepoint was also given a specific colour.
rc<-rep("white", ncol(TPM))
rc[grep("20180419-24837-1-*", colnames(TPM))] <- "magenta"
...... and I continued to give colour to each timepoint.
I performed a PCA using this code:
pcaRes<-prcomp(t(log(TPM+1)), center= TRUE, scale. = TRUE)
Then I proceeded to plot a PCA plot using:
plot(pcaRes$x[,1], pcaRes$x[,2], xlab="PC1", ylab="PC2",
cex=1.0, col= rc, pch=16, main="")
Then I wanted to plot a PCA plot with only D0, using the same PCA output (pcaRes). This is where I am stuck.
P.S. If anyone else has an easier way of advising how to input an example data here from my large matrix, I welcome any help. Thanks so much! Sorry I am very new in bioinformatics.
Stack Exchange for Bioinformatics is where you will need to go to ask questions or learn about the packages and functions you need to deal with your area of specialty. Stack Exchange for Bioinformatics is linked with Stack Overflow, so you will just need to join; you'll have the same login.
Classes S3, S4 and Base.
This is a very basic overview of classes in R. Think of a class as the parent you inherit all of your skills or abilities from; as a result, you are able to achieve certain tasks better than others, and in some cases you will not be able to do the task at all.
In R and in programming generally, to save re-inventing the wheel, parent classes are created so that the average person does not have to repeatedly write a function to do something simple like plot() a graph. This stuff is hidden; to access it, you inherit from the parent. The child reads the traits off the parent(s), and then it either performs the task or gives you a cryptic error message.
Base and S3 classes work well together; they are like the working-class people of the R world. S4 is a specialized class made for specific fields of study, to provide the specific functionality needed in their industry. This means you can only use certain Base and S3 functions with S4 class objects; most are just not compatible. So it's nothing you've done wrong; plot() and ggplot() just have the wrong parent(s) to work with your dataset.
Typical Base and S3 Class dataframe: Box like structure. Along the left hand side is all the column names, nice and neatly stacked on top of each other.
Seurat S4 Class dataframe: Tree like structure, formatted to be read by a specific function(s).
Well hope that helps and I wish you well in your career. Cheers Conrad
Ps if this helps, then click the arrow up. :)
Thanks @ConradThiele for your suggestion, I will check out that site.
I had a chat with other bioinformaticians around the institute. My query has little to do with the object being an S4 class, since I am performing prcomp outside of the package. I extracted my matrix from the object and then ran prcomp on it.
The solution is simple: run prcomp on the full dataset, convert the prcomp output into a data frame, add columns for additional details such as "timepoint", create new data frame(s) containing only the "timepoint"/"variable" of interest from the prcomp result, and then plot these sub-data-frames using "plot" or whatever function you use.
This was not my solution; it came from a bioinformatician at my institute whom I went to for help. Hope this helps others! Thanks again for your time.
P.S. If I have the time, I will post a copy of the code I suggested soon.
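Until that code is posted, the approach described in this answer might look roughly like the following. This is a sketch under assumptions, not the poster's actual code: the small random TPM matrix is a stand-in for the real data, the column-name prefixes are copied from the question, and the names prefixes, pca_df and plot_timepoint are hypothetical.

```r
set.seed(1)
# Hypothetical stand-in for the real TPM matrix: 50 genes x 30 cells,
# with column-name prefixes copied from the question.
cells <- c(paste0("20180419-24837-1-", 1:10),
           paste0("20180419-24837-2-", 1:10),
           paste0("20180419-24947-5-", 1:10))
TPM <- matrix(rexp(50 * 30), nrow = 50,
              dimnames = list(paste0("gene", 1:50), cells))

# Run the PCA once, on the full dataset.
pcaRes <- prcomp(t(log(TPM + 1)), center = TRUE, scale. = TRUE)

# Turn the scores into a data frame and label each cell with its timepoint.
pca_df <- as.data.frame(pcaRes$x[, 1:2])
prefixes <- c("20180419-24837-1-" = "D0",
              "20180419-24837-2-" = "D7",
              "20180419-24947-5-" = "D10")
pca_df$timepoint <- NA_character_
for (p in names(prefixes)) {
  pca_df$timepoint[startsWith(rownames(pca_df), p)] <- prefixes[[p]]
}

# Plot one timepoint, keeping the axis limits of the full PCA so that
# plots of different timepoints stay comparable.
plot_timepoint <- function(df, tp, col = "magenta") {
  sub <- df[df$timepoint == tp, ]
  plot(sub$PC1, sub$PC2, xlab = "PC1", ylab = "PC2",
       xlim = range(df$PC1), ylim = range(df$PC2),
       col = col, pch = 16, main = tp)
  invisible(sub)
}
```

Calling plot_timepoint(pca_df, "D0") then draws only the D0 cells, without re-running prcomp.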

How do I display variable number of ggplots in R shiny, depending on input?

This is my first question on StackOverflow. I’ve tried to make it as clear as possible, but I am also very open to feedback!
I am creating an app with R shiny to analyze two dimensional data (Time and Value) for multiple samples.
I would like the application to:
Import the sample files.
Recognize the number of samples in the uploaded files.
Create a selectInput bar for each sample.
Create a ggplot object for each sample.
Huge thank you to Pork Chop for pointing out the similarities to this question - that solved my multiple selectInput bar issue. Also thank you to camille for suggesting purrr's map function, which helps me create a list of ggplot objects without fuss.
However, I am still struggling to get all of the ggplot objects to display in Shiny. I have used this approach for inspiration but the author uses a for loop with static length. I tried their approach, just to see if it works, but it also only gives me the first plot of my list of plots.
Here is a very basic example of my current approach. Maybe something with map/lapply with renderPlot? i.e. map(plot_list, renderPlot({})) ?
Sincerest thanks again for your help and patience.
EDIT: finally managed to solve my issue with a lot of help from this post! Instead of using max_plots I created a reactive value for number of samples, and was able to get the plots to display properly once I added observe({}).
As described in the edit I made, this post was extremely helpful. Wrapping the actual creation of the plots in observe({}) and creating a reactive element for number of imported samples was crucial.
Pork Chop's reference to this post solved my issue with multiple dynamic inputs.

R plot data.frame to get more effective overview of data

At work when I want to understand a dataset (I work with portfolio data in life insurance), I would normally use pivot tables in Excel to look at e.g. the development of variables over time or dependencies between variables.
I remembered from university the nice R-function where you can plot every column of a dataframe against every other column like in:
For the dependency between issue.age and duration this plot is actually interesting because you can clearly see that high issue ages come with shorter policy durations (because there is a maximum age for each policy). However, the plots involving the issue year iss.year are much less "visual". In fact, you can't see anything from them. I would like to see at one glance whether the distribution of issue ages has changed over the different issue years, something like
where you could see immediately that the average age of newly issued policies has been increasing from 2014 to 2016.
I don't want to write code that needs to be customized for every dataset that I put in because then I can also do it faster manually in Excel.
So my question is, is there an easy way to plot each column of a matrix against every other column with more flexible chart types than with the standard plot(data.frame)?
Try the ggpairs() function from the GGally package. It has a lot of capability for visualizing columns of all different types, and provides a lot of control over what to visualize.
For example, here is a snippet from the vignette linked to above:
library(GGally) # provides ggpairs()
data(tips, package = "reshape")
ggpairs(tips)

Multiple regression lines to define a set of data

I am trying to use a regression model to establish a relationship between two parameters, A and B (more specifically, runtime and workload, so that I can recommend what an optimal workload could be, or how strongly one affects the other, etc.). I am using 'rlm' (robust linear model) for this purpose since it saves me the trouble of dealing with outliers beforehand.
However, rather than output one single regression model, I would like to determine a band that can confidently explain most of the points. Here is an image I took from the web. Those additional red lines are what I want to determine.
This is what I had in mind :
1. I found the mean of the residuals of all the points lying above the line. Then we probably shift the original regression line by some multiple of mean + k*sigma. The same can be done for the points below the line.
2. In SVM, in order to find the support vectors, we draw parallel lines (essentially shifting the middle line until we find support vectors on either side). I had something like that in mind. Play around with the intercepts a little and find the number of points which can be explained by the band. Keep a threshold so you can stop somewhere.
The problem is, I am unable to implement this in R. For that matter, I am not sure if these approaches even work either. I would like to know what you would suggest. Also, is there a classic way to do this using one of the many R packages?
Thanks a lot for helping. Appreciate it.
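Both ideas above can be sketched in a few lines. This is only one possible reading of them, on synthetic data: the workload and runtime values are made up, the multiplier k = 2 and the 5%/95% quantiles are arbitrary choices, and MASS::rlm is assumed to be the 'rlm' meant in the question. The first band shifts the fitted line by k times the robust scale estimate stored in fit$s; the second uses residual quantiles so that a chosen share of points falls inside.

```r
library(MASS)   # provides rlm()

set.seed(7)
# Synthetic stand-in data: runtime grows with workload, plus a few outliers.
workload <- runif(200, 0, 100)
runtime  <- 5 + 0.8 * workload + rnorm(200, sd = 4)
runtime[1:5] <- runtime[1:5] + 60

fit <- rlm(runtime ~ workload)   # robust linear fit
res <- residuals(fit)

# Idea 1: parallel lines at +/- k robust standard deviations.
k <- 2
half_width <- k * fit$s          # fit$s is rlm's robust scale estimate
coverage_sigma <- mean(abs(res) <= half_width)

# Idea 2: choose the offsets from residual quantiles (here a 90% band).
q <- quantile(res, c(0.05, 0.95))
coverage_quantile <- mean(res >= q[1] & res <= q[2])

# Draw the fit and the first band when running interactively.
if (interactive()) {
  plot(workload, runtime, pch = 16, col = "grey40")
  abline(fit, col = "blue")
  abline(a = coef(fit)[1] + half_width, b = coef(fit)[2], col = "red")
  abline(a = coef(fit)[1] - half_width, b = coef(fit)[2], col = "red")
}
```

coverage_sigma and coverage_quantile give the fraction of points each band explains, which is the stopping criterion described in the second idea. If the spread of runtime changes with workload, parallel lines will fit poorly and quantile regression may be worth looking into instead.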