Documentation on internal variables in ggplot, esp. PANEL

Documentation on internal variables in ggplot, esp. PANEL - r

The answer to this question uses a PANEL variable which seems to be internal to ggplot. But searching the ggplot documentation and also Hadley Wickham's book, I can find no reference to it at all. Is this documented anywhere?
Also, looking at the code for stat_bin(...), there is evidently a vector count created (which contains the count of y for each unique x??). This is also accessible in aes(...) but, again, I can find no documentation.
So my question is: is there a place where all of these internal variables are documented, or must one just go to the code?

There are some surprising gaps in the help pages for ggplot2 (and I would point also to the help page for ?layer to which many other pages refer users as a particularly egregious gap.) These "variables" have been around for years and like you I cannot find much in the online help or the package NEWS. SO's search facility is not much help because it strips off the leading and trailing dots and shows everything with "count". Only examples of their use can be found in cran.r-project.org/web/packages/ggplot2/ggplot2.pdf. Google is somewhat more helpful and the search string of: ggplot2 ..counts.. delivers many informative hits. From context one forms that sense that these are not so much special variables as much as they are combined functions and program controls. These arguments implicitly transform the named arguments. They do seem to be implicitly mentioned in ?stat_bin {ggplot2} albeit without the dots, and it appears that all four of these stat-variable-functions are calculated at the same time.
When I did a search in the pdf you linked to I found on pages 57-58 tables (#4.3,4.4) of "statistics" and "aesthetics" that you were asking for, but to my surprise it did not include count. Those tables are in section 4.7 that describes "stats".
(I have noticed improvement or the last couple of years in some of the pages to which these complaints were directed.)

I think PANEL is a column in the component data of a plot. You get list of columns names:
names(ggplot_build(x)$data)
For the count and frequency variables, you can refer to Hadley book , page 69:
Both the histogram and frequency polygon geom use stat_bin. This
statistic produces two output variables count and density. The count
is the default as it is most interpretable. The density is basically
the count divided by the total count, and is useful when you want to
compare the shape of the distributions, not the overall size. You will
often prefer this when comparing the distribution of subsets that have
different sizes.

Related

How show points of each question when producing exams' solution? [duplicate]

Consider creating exams using the exams package in R.
When using exams2nops there is a parameter showpoints that, when set to TRUE will show the points of each exercise. However, for exams2pdf this parameter is not available.
How to display the points per exercise when using exams2pdf?

(The answer below is adapted from the R/exams forum at https://R-Forge.R-project.org/forum/forum.php?thread_id=33884&forum_id=4377&group_id=1337.)
There is currently no built-in solution to automatically display the number of points in exams2pdf(). The points= argument only stores the number of points in the R object that exams2pdf() creates (as in other exams2xyz() interfaces) but not in the individual PDF files.
Thus, if you want the points to be displayed you need to do it in some way yourself. A simple solution would be to include it in the individual exercises already, possibly depending on the kind of interface used, e.g., something like this for an .Rmd exercise:
pts <- 17
pts_string <- if(match_exams_call() == "exams2pdf") {
sprintf("_(%s points)_", pts)
} else {
""
}
And then at the beginning of the "Question":
Question
========
`r pts_string` And here starts the question text...
Finally in the meta-information
expoints: `r pts`
This always includes the desired points in the meta-information but only displays them in the question when using exams2pdf(...). This is very flexible and can be easily customized further. The only downside is that it doesn't react to the exams2pdf(..., points = ...) argument.
In .Rnw exercises one would have to use \Sexpr{...} instead of r .... Also the pts_string should be something like sprintf("\\emph{(%s points)}", pts).
Finally, a more elaborate solution would be to create a suitable \newcommand in the .tex template you use. If all exercises have the same number of points, this is not hard to do. But if all the different exercises could have different numbers of points, it would need to be more involved.
The main reason for supporting this in exams2nops() but not exams2pdf() is that the former has a rather restrictive format and vocabulary. In the latter case, however, the point is to give users all freedom regarding layout, language, etc. Hence, I didn't see a solution that is simple enough but also flexible enough to cover all use-cases of exams2pdf().

Plotting a subset of data from a prcomp matrix without re-running prcomp

I am asking a question to a similar post posted up 2 years ago, with no full answer to it (subset of prcomp object in R). P.S. sorry for commenting on it for an answer..
Basically, my question is the same. I have generated a PCA table using prcomp that has 10000+ genes, and 1700+ cells, made up of 7 timepoints. Plotting all of them in a single file makes it difficult to see.
I would like to plot each timepoint separately, using the same PCA results table (ie without re-running prcomp).
Thanks Dean for giving me tips on posting. To think of a way to describe my dataset without actually loading it here, will take me a week I believe. I also tried the
dput(droplevels(head(object,2)))
option, but it was just too much info since I have such a large dataset. In short, it is a large matrix of single-cell dataset where people can commonly see on packages such as Seurat (https://satijalab.org/seurat/pbmc3k_tutorial_1_4.html). EDIT: I have posted a screenshot of a subset of my matrix here ().
Sorry I don't know how to re-create this or even export a text format.. But this is what I can provide:
My TPM matrix has 16541 rows (defining genes), and 1798 columns (defining cells).
In it, I have "re-labelled" my columns based on timepoints, using codes such as:
D0<-c(colnames(TPM[,grep("20180419-24837-1-*", colnames(TPM))])) #D0: 286 cells
D7<-c(colnames(TPM[,grep("20180419-24837-2-*", colnames(TPM))])) #D7: 237 cells
D10<-c(colnames(TPM[,grep("20180419-24947-5-*", colnames(TPM))])) #D10: 304 cells
...... and I continued to label each timepoint.
Each timepoint was also given a specific colour.
rc<-rep("white", ncol(TPM))
rc<-[,grep("20180419-24837-1-*", colnames(TPM))]= "magenta"
...... and I continued to give colour to each timepoint.
I performed a PCA using this code:
pcaRes<-prcomp(t(log(TPM+1)), center= TRUE, scale. = TRUE)
Then I proceeded to plot a PCA plot using:
plot(pcaRes$x[,1], pcaRes$x[,2], xlab="PC1", ylab="PC2",
cex=1.0, col= rc, pch=16, main="")
Then I when I wanted to plot a PCA plot only with D0, using the same PCA output (pcaRes).. This is where I am stuck.
P.S. If anyone else has an easier way of advising how to input an example data here from my large matrix, I welcome any help. Thanks so much! Sorry I am very new in bioinformatics.

Stack Exchange for
Bioinformatics is where you you will need to go to ask question(s) or learn about the package(s) and function(s) you need to deal with you area of specialty. Stack Exchange for Bioinformatics is linked with Stackoverflow so you will just need to join, you'll have the same login.
Classes S3, S4 and Base.
This Very basic over view of Classes in R. Think of a Class as the parent you inherit all of their skills or abilities from and as a result you are able to achieve certain tasks better than others and some cases, you will not be able to do the task at all.
In R and all programming, to save re-inventing the wheel, parent classes are created so that the average person does not have to repeatedly write a function to do something simple like plot() a graph. This stuff is hidden, to access it, you inherit from the parent. The child reads the traits off the parent(s), and then it either performs the task or gives you a cryptic error message.
Base and S3 classes work well together, they are like the working class people of the R world. S4 is a specialized class made for specific fields of study to be able to provide specific functionality needed in their industry. This mean you can only use certain Base and S3 functions with Class S4 functions, most are just not compatible. So it's nothing you've done wrong, plot() and ggplot() just have the wrong parent(s) to work with your dataset.
Typical Base and S3 Class dataframe: Box like structure. Along the left hand side is all the column names, nice and neatly stacked on top of each other.
Seurat S4 Class dataframe: Tree like structure, formatted to be read by a specific function(s).
Well hope that helps and I wish you well in your career. Cheers Conrad
Ps if this helps, then click the arrow up. :)

thanks #ConradThiele for your suggestion, I will check out that site.
I had a chat with other bioinformatics around the institute. My query has little to do with the object being an S4 class, since I am performing prcomp outside of the package. I have extracted my matrix out of the object and then ran prcomp on it.
Solution is simple: run prcomp with full dataset, transform the prcomp output into a dataframe, input additional columns to input additional details like "timepoint", create new dataframe(s) only with the "timepoint"/ "variable" of interest from the prcomp result, make multiple sub-dataframe and then plotting these using "plot" or whatever function you use.
This was not my solution but from a bioinformatition I went for help to in my institute. Hope this helps others! Thanks again for your time.
P.S. If I have the time, I will post a copy of the code I suggested soon.

R plot data.frame to get more effective overview of data

At work when I want to understand a dataset (I work with portfolio data in life insurance), I would normally use pivot tables in Excel to look at e.g. the development of variables over time or dependencies between variables.
I remembered from university the nice R-function where you can plot every column of a dataframe against every other column like in:
For the dependency between issue.age and duration this plot is actually interesting because you can clearly see that high issue ages come with shorter policy durations (because there is a maximum age for each policy). However the plots involving the issue year iss.year are much less "visual". In fact you cant see anything from them. I would like to see with once glance if the distribution of issue ages has changed over the different issue.years, something like
where you could see immediately that the average age of newly issue policies has been increasing from 2014 to 2016.
I don't want to write code that needs to be customized for every dataset that I put in because then I can also do it faster manually in Excel.
So my question is, is there an easy way to plot each column of a matrix against every other column with more flexible chart types than with the standard plot(data.frame)?

The ggpairs() function from the GGally library. It has a lot of capability for visualizing columns of all different types, and provides a lot of control over what to visualize.
For example, here is a snippet from the vignette linked to above:
data(tips, package = "reshape")
ggpairs(tips)

Text Categorization using R

I am relatively new at using R. I have a dataset of around 5000 datapoints.
My goal is to predict a category using the comments entered.
I have a training dataset of 4500 records and a testing data set of 500 records.
I am looking for 2-3 packages which might help me in doing this.I have to evaluate these packages and prepare a report on that. Can anyone suggest me some good packages which might easier to use and also more efficient.
Again, I have 2 columns
1st one is comments and based on this I have to predict the category.
Right now I have defined around 10 independent categories.
Most of the comments have specific keywords which I have defined as categories
One such example
Comment 1
The website is pretty good --->> category would be WebsiteContent
comment 2 might be like
Excellent article ,very detailed--->> same category as above(WebsiteContent)
But the keywords such as article, website are very limited and can be linked to the category
all of comments are different but the underlying keywords are mostly the same
Thanks,
Ankan

Although all you need is a very long and well written set of if-else statements, try using a Decision tree from the package from the rpart and prp package. I'm saying this only cause you're trying to learn and I'm guessing this is for some assignment which you're supposed to be doing on your own.
tree<-rpart(train$decision~train$comment, method"class")
prp(tree)
The first line builds the model and the second one plots it. This might be a bit overboard actually but since you're learning R this is a fun thing to work with and can be used for a wide variety of things. Although, Decision trees work better with more predictor variables.
Use predict(test,tree) to test out the model on your test dataset.

Two-way Summary Table in Julia (with weight by third variable)

Has anyone implemented a two-way frequency(summary) table in Julia? I'm looking to create summary tables (means, sums, etc.) in Julia, with the option to weight continuous variables by a third variable (very important). Essentially, I want the abilities of proc tabulate/means, for those familiar with SAS.
I searched around and found this was discussed at one point but does not appear to have moved anywhere since. I'm at the point where I will look to code this up myself, but I want to see what the communities thoughts are before proceeding.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Documentation on internal variables in ggplot, esp. PANEL - r

Related

How show points of each question when producing exams' solution? [duplicate]

Plotting a subset of data from a prcomp matrix without re-running prcomp

R plot data.frame to get more effective overview of data

Text Categorization using R

Two-way Summary Table in Julia (with weight by third variable)

Categories

Resources