I have read quite a few threads on creating Venn diagrams in R. "Is it possible to create a proportional triple Venn Diagram" discusses using the eulerr package. "Venn diagram proportional and color shading with semi-transparency" is very comprehensive and helped me with many of the other graphs I needed.
While the threads above are fantastic, I believe there is one problem they still don't solve. It happens when the intersection of all three sets makes up a huge portion of the overall area. In my case, R&S&W is 92% of the total area, so the other regions are imperceptible and the graph looks ugly. Is there any way to fix this?
Here's my data and code:
dput(Venn_data)
structure(c(94905288780.4383, 3910207511.54001, 2615620176.44757,
1125606833.85568, 187542691.618916, 104457994.331746, 96049675.0823557
), .Names = c("R&S&W", "R&S", "S&W", "S", "R", "W", "R&W"))
VennDiag2 <- eulerr::euler(Venn_data, shape = "ellipse")
windows()   # new plot window; Windows only (dev.new() is the portable equivalent)
plot(VennDiag2)
Here's the output:
I cannot make out the R&S, S&W, R, S, W, etc. regions.
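To see why the small regions vanish, it helps to compute each region's share of the total area. Here is a minimal base R sketch that re-creates `Venn_data` from the dput output above. It confirms that R&S&W takes about 92% of the area, while R, W and R&W each get less than 0.2%.

```r
# Re-create the named vector from the dput() output above
Venn_data <- c("R&S&W" = 94905288780.4383, "R&S" = 3910207511.54001,
               "S&W"   = 2615620176.44757,  "S"   = 1125606833.85568,
               "R"     = 187542691.618916,  "W"   = 104457994.331746,
               "R&W"   = 96049675.0823557)

# Share of the total area each region would occupy in a proportional diagram
share <- Venn_data / sum(Venn_data)
round(100 * sort(share, decreasing = TRUE), 2)
```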
I also tried the venneuler package.
Here's my code:
library(venneuler)
windows()
v <- venneuler(Venn_data)
plot(v)
Unfortunately, this didn't help either. Here's the output.
Is there any way we can fix this? I am not an expert so I thought of asking here. I'd sincerely appreciate any help. I have spent quite a few hours on this and am still not able to get this to work.
You could always retrieve the plot parameters yourself and position the labels with arrows or similar, but another option is to use a legend instead of labels.
plot(VennDiag2, legend = TRUE)
It is somewhat questionable whether there is much use for an Euler diagram here at all, though.
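If the Euler diagram is dropped, one alternative is a dot chart of the region sizes on a log10 scale, where even the smallest regions remain readable. This is my suggestion, not something from the answer.

```r
Venn_data <- c("R&S&W" = 94905288780.4383, "R&S" = 3910207511.54001,
               "S&W"   = 2615620176.44757,  "S"   = 1125606833.85568,
               "R"     = 187542691.618916,  "W"   = 104457994.331746,
               "R&W"   = 96049675.0823557)

ord <- sort(Venn_data)   # smallest region first, largest last
# dotchart() takes its labels from names(); log10 keeps the small regions visible
dotchart(log10(ord), xlab = "log10(size)", pch = 19)
```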
There is a different visualization strategy in the nVennR package I posted some months ago:
library(nVennR)
v <- createVennObj(nSets = 3, sNames = c('R', 'S', 'W'), sSizes = c(0, 104457994.331746, 1125606833.85568, 2615620176.44757, 187542691.618916, 96049675.0823557, 3910207511.54001, 94905288780.4383))
v <- plotVenn(nVennObj = v)
I had not anticipated the need for such large numbers, and I see they get cropped. However, the result is a vector image (svg), and the picture can be edited afterwards. You can find more details, including why the numbers are in that order, in the vignette. The package can also handle larger numbers of sets.
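On the order of the numbers in sSizes: judging from the answer's own vector, each position appears to encode set membership in binary, with R as the most significant bit. Position 1 is "in no set", 2 is W only, 3 is S only, 4 is S&W, and so on up to 8 for R&S&W. The vignette remains the authority on this. Under that assumption, the hypothetical helper below (not part of nVennR) builds the vector from the named `Venn_data`:

```r
Venn_data <- c("R&S&W" = 94905288780.4383, "R&S" = 3910207511.54001,
               "S&W"   = 2615620176.44757,  "S"   = 1125606833.85568,
               "R"     = 187542691.618916,  "W"   = 104457994.331746,
               "R&W"   = 96049675.0823557)

# Hypothetical helper (not part of nVennR): position code + 1 holds the region
# whose membership bits spell 'code' in binary, first set name = highest bit.
to_region_order <- function(sizes, set_names) {
  n <- length(set_names)
  weights <- 2^(rev(seq_len(n)) - 1)   # 4, 2, 1 for three sets
  out <- numeric(2^n)                  # out[1] is the empty region
  for (code in seq_len(2^n - 1)) {
    in_set <- (code %/% weights) %% 2 == 1
    key <- paste(set_names[in_set], collapse = "&")
    if (key %in% names(sizes)) out[code + 1] <- sizes[[key]]
  }
  out
}

sSizes <- to_region_order(Venn_data, c("R", "S", "W"))
sSizes
```

You could then pass the result to createVennObj() as `sSizes = sSizes`.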
I have performed PCA using the prcomp function (from base R's stats package, not FactoMineR) on a fairly large dataset (3000 x 500).
I have tried plotting the principal components that together cover up to 100% of the cumulative proportion of variance with fviz_eig (from the factoextra package). However, the plot is very large because of the dimensions of the dataset. Is there any way in R to split a plot into multiple plots, using a for loop or otherwise?
Here is my plot, which only covers 80% of the variance because the full version is so large. Could I split this plot into two plots?
(Image: Large Dataset Visualisation)
I have tried splitting the plot up using a for loop...
for(i in data[1:20]) {
fviz_eig(data, addlabels = TRUE, ylim = c(0, 30))
}
But this doesn't work.
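Two things stop the loop from producing anything. First, `i` is never used in the body. Second, fviz_eig() returns a ggplot object, which is only drawn when it is printed, and nothing is auto-printed inside a for loop, so an explicit print() would be needed. The sketch below does the split in base graphics instead, drawing one bar chart per block of PCs. The built-in `mtcars` data stands in for the real dataset, and the chunk size of 4 is arbitrary.

```r
# Stand-in data: mtcars has 11 numeric columns, so prcomp gives 11 PCs
pca  <- prcomp(mtcars, scale. = TRUE)
poev <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance per PC

chunk_size <- 4                        # PCs per plot (arbitrary)
chunks <- split(seq_along(poev), ceiling(seq_along(poev) / chunk_size))

for (idx in chunks) {
  # Base graphics draw immediately; a ggplot (e.g. from fviz_eig) would need print()
  bp <- barplot(100 * poev[idx], names.arg = paste0("PC", idx),
                ylim = c(0, 100 * max(poev) * 1.15), ylab = "% of variance",
                main = sprintf("PCs %d-%d", min(idx), max(idx)))
  text(bp, 100 * poev[idx], labels = round(100 * poev[idx], 1), pos = 3)
}
```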
Edit: reproducible example
This is only a small reproducible example using a dataset already available in R, but I used a similar method for my large dataset. It shows how the plot works.
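# Data that already ships with R (boot is a recommended package, so
# install.packages("boot") is usually unnecessary)
library(boot)
library(factoextra)   # provides fviz_eig()
data(frets)
frets
dataset_pca <- prcomp(frets)
dataset_pca$x
fviz_eig(dataset_pca, addlabels = TRUE, ylim = c(0, 100))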
However, my large dataset has many more PCs than this one (possibly 100 or more to reach 100% of the cumulative proportion of variance), which is why I would like to split the single plot into multiple plots for better visualisation.
Update:
I have tried what @G5W suggested below:
data <- prcomp(data, scale = TRUE, center = TRUE)
POEV = data$sdev^2 / sum(data$sdev^2)
barplot(POEV, ylim=c(0,0.22))
lines(0.7+(0:10)*1.2, POEV, type="b", pch=20)
text(0.7+(0:10)*1.2, POEV, labels = round(100*POEV, 1), pos=3)
barplot(POEV[1:40], ylim=c(0,0.22), main="PCs 1 - 40")
text(0.7+(0:6)*1.2, POEV[1:40], labels = round(100*POEV[1:40], 1),
pos=3)
and I now get the following graph:
(Image: graph)
But I am finding it difficult to get the labels to appear above each bar. Can someone help or suggest something, please?
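The hand-typed positions are the likely problem. 0.7 + (0:10) * 1.2 and 0.7 + (0:6) * 1.2 only describe 11 and 7 bars, and lines() stops with an error when its x and y lengths differ. A minimal fix is to use the bar midpoints that barplot() returns invisibly. The sketch below uses simulated stand-in data:

```r
set.seed(1)
# Stand-in for the real data: 200 observations of 50 random variables
X    <- matrix(rnorm(200 * 50), nrow = 200)
pca  <- prcomp(X, scale. = TRUE, center = TRUE)
POEV <- pca$sdev^2 / sum(pca$sdev^2)

vals <- POEV[1:40]
# barplot() invisibly returns the x positions of the bar midpoints; reuse them
# instead of a hand-typed sequence that only fits a fixed number of bars
bp <- barplot(vals, ylim = c(0, max(vals) * 1.2), main = "PCs 1 - 40")
text(bp, vals, labels = round(100 * vals, 1), pos = 3, cex = 0.6, xpd = TRUE)
```

Because `bp` always has one entry per bar, the same text() call works for any subset, such as POEV[41:80].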
I am not 100% sure what you want as your result,
but I am 100% sure that you need to take more control over
what is being plotted, i.e. do more of it yourself.
So let me show an example of doing that. The frets data
that you used has only 4 dimensions, which makes it hard to illustrate
what to do with more dimensions, so I will instead use the
nuclear data, also available in the boot package. I am going
to start by reproducing the type of graph that you displayed
and then altering it.
library(boot)
data(nuclear)
N_PCA = prcomp(nuclear)
plot(N_PCA)
The basic plot of a prcomp object is similar to the fviz_eig
plot that you displayed but has three main differences. First,
it is showing the actual variances - not the percent of variance
explained. Second, it does not contain the line that connects
the tops of the bars. Third, it does not have the text labels
that show the heights of the bars.
Percent of Variance Explained. The return from prcomp contains
the raw information. str(N_PCA) shows that it has the standard
deviations, not the variances - and we want the proportion of total
variation. So we just create that and plot it.
POEV = N_PCA$sdev^2 / sum(N_PCA$sdev^2)
barplot(POEV, ylim=c(0,0.8))
This addresses the first difference from the fviz_eig plot.
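As a cross-check on the POEV calculation, summary() on a prcomp object reports the same proportions (rounded to 5 decimals) in its importance matrix, along with the cumulative proportion. This sketch uses the built-in USArrests data as a stand-in for nuclear:

```r
# Stand-in data: USArrests (4 numeric columns) instead of boot's nuclear data
pca  <- prcomp(USArrests, scale. = TRUE)
POEV <- pca$sdev^2 / sum(pca$sdev^2)

# summary() reports the same proportions, rounded to 5 decimal places
imp <- summary(pca)$importance
imp["Proportion of Variance", ]

# Number of PCs needed to reach, say, 80% of the total variance
which(cumsum(POEV) >= 0.8)[1]
```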
Regarding the line, you can easily add that if you feel you need it,
but I recommend against it. What does that line tell you that you
can't already see from the barplot? If you are concerned about too
much clutter obscuring the information, get rid of the line. But
just in case you really want it, you can add the line with
lines(0.7+(0:10)*1.2, POEV, type="b", pch=20)
However, I will leave it out as I just view it as clutter.
Finally, you can add the text with
text(0.7+(0:10)*1.2, POEV, labels = round(100*POEV, 1), pos=3)
This is also somewhat redundant, but particularly if you change
scales (as I am about to do), it could be helpful for making comparisons.
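The constants 0.7 and 1.2 in the lines() and text() calls come from barplot()'s defaults of width = 1 and space = 0.2, where each bar is preceded by a gap of 0.2 bar widths. With bar width w and relative gap s, the midpoint of bar k is:

```latex
x_k = s\,w + \frac{w}{2} + (k-1)\,(1+s)\,w,
\qquad w = 1,\; s = 0.2
\;\Longrightarrow\; x_k = 0.7 + 1.2\,(k-1), \quad k = 1, 2, \dots
```

So 0:10 matches the 11 PCs of nuclear exactly. For a different number of bars the sequence has to change accordingly; barplot() also returns these midpoints invisibly.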
OK, now that we have the substance of your original graph, it is easy
to separate it into several parts. For this data, the first two bars are
big, so the rest are hard to see. In fact, PCs 5-11 show up as zero.
Let's separate out the first 4 and then the rest.
barplot(POEV[1:4], ylim=c(0,0.8), main="PC 1-4")
text(0.7+(0:3)*1.2, POEV[1:4], labels = round(100*POEV[1:4], 1),
pos=3)
barplot(POEV[5:11], ylim=c(0,0.0001), main="PC 5-11")
text(0.7+(0:6)*1.2, POEV[5:11], labels = round(100*POEV[5:11], 4),
pos=3, cex=0.8)
Now we can see that even though PC 5 is much smaller than any of 1-4,
it is a good bit bigger than 6-11.
I don't know what you want to show with your data, but if you
can find an appropriate way to group your components, you can
zoom in on whichever PCs you want.
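One way to choose the groups, offered as an assumption rather than a rule, is to put the PCs needed to reach a cumulative-variance threshold in one panel and the remaining PCs in another, each with its own y scale. The 90% threshold here is arbitrary, and the data are simulated stand-ins with a few dominant directions:

```r
set.seed(2)
# Stand-in data with three dominant directions plus noise (30 variables)
latent <- matrix(rnorm(300 * 3), 300) %*% matrix(rnorm(3 * 30, sd = 3), 3)
X      <- latent + matrix(rnorm(300 * 30), 300)
pca    <- prcomp(X)
POEV   <- pca$sdev^2 / sum(pca$sdev^2)

# Group 1: PCs needed to reach 90% cumulative variance; group 2: the rest
k      <- which(cumsum(POEV) >= 0.9)[1]
rest   <- setdiff(seq_along(POEV), seq_len(k))
groups <- Filter(length, list(seq_len(k), rest))   # drop an empty group

op <- par(mfrow = c(1, length(groups)))   # panels side by side
for (g in groups) {
  vals <- POEV[g]
  bp <- barplot(vals, names.arg = g, ylim = c(0, max(vals) * 1.25),  # own y scale
                main = sprintf("PC %d-%d", min(g), max(g)))
  text(bp, vals, labels = signif(100 * vals, 2), pos = 3, cex = 0.6, xpd = TRUE)
}
par(op)
```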
I am very new to R and would like to draw a line graph. I have managed to import my data into R but don't really know where to go next! I've searched the internet for examples of how to plot a line graph, but can't find anything that explains why the various commands are used (which I think I need in order to understand what is going on). Can anyone recommend tutorials or instructions aimed at beginners?
Probably complicating the matter further, the line graph I'd like to draw doesn't have evenly spaced data points on the x-axis (0.19, 0.31 and 0.36). I'd like to reflect this in the plot, but have no idea how to program this.
Thanks in advance for everyone's help!
There are many ways to plot in R. One is with base R commands, like this:
x <- c(0.19, 0.31, 0.36)
y <- c(1,2,3)
plot(x,y,type = "l")
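On the uneven spacing: plot() puts each point at its actual x value, so no extra programming is needed. Here is a small sketch with the three x values from the question (the y values are made up, as in the answer) that marks the points and draws ticks only at the data positions:

```r
x <- c(0.19, 0.31, 0.36)
y <- c(1, 2, 3)

# plot() places each point at its actual x value, so the uneven gaps
# (0.12 vs 0.05) are preserved automatically
plot(x, y, type = "b", pch = 19, xaxt = "n",   # "b" = both points and lines
     xlab = "x", ylab = "y")
axis(1, at = x)                                # tick marks exactly at the data x values
```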
Look online for plotting examples with ggplot2 and lattice graphics.
I suggest posting some data and an example of the code you are working with. It helps the community examine your problem.
Anyway, if I understood your problem correctly, R handles NAs automatically: they are not drawn in the plot. You can get a line graph with type = "l" ("l" stands for "line") in the plot() function.
x <- rnorm(100)
plot(x, type = "l")
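Here is a quick way to see what "not drawn" means for type = "l": missing values leave a gap in the line. This sketch uses made-up data:

```r
set.seed(3)
x <- cumsum(rnorm(100))   # a random walk, so the line is easy to follow
x[40:45] <- NA            # pretend some values are missing

# type = "l" simply breaks the line wherever a value is NA
plot(x, type = "l")
```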
I am trying to scale the plots that appear in the terminal nodes of a ctree. I have tried using the yscale parameter, but this just results in plots that extend beyond the plotting window.
For example, here is a ctree for two exponential distributions:
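library(partykit)   # provides ctree() and node_boxplot()
set.seed(1)
classA <- data.frame(class = "a", val = round(rexp(500, rate = 0.2), 0))
classB <- data.frame(class = "b", val = round(rexp(500, rate = 0.05), 0))
df <- rbind(classA, classB)
# make 'class' a factor explicitly (data.frame() no longer converts strings by default since R 4.0.0)
df$class <- factor(df$class)
ct <- ctree(val ~ ., data = df)
plot(ct)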
Now if I try to scale the y axis of the plots from 0 to 70, to zoom in on the box plots and cut off the outliers, I can use:
plot(ct,terminal_panel = node_boxplot(ct,yscale =c(0,70)))
This works to scale the y axis, but now the plot extends beyond the plotting box.
Sorry, I would show images, but I don't have enough privileges on Stack Overflow yet.
Thanks for any suggestions
First of all: In an example like this it would be better to log-transform the response because then the association tests employed in ctree() will have more power to detect differences for splitting in the tree. Possibly some small continuity correction might help if there are exact zeros.
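Here is a minimal sketch of the log transformation suggested above, re-using the question's simulated data. The offset 0.5 for the continuity correction is my illustrative choice; log1p(), i.e. log(1 + x), is another common option. The ctree() call only runs if partykit is installed.

```r
set.seed(1)
classA <- data.frame(class = "a", val = round(rexp(500, rate = 0.2), 0))
classB <- data.frame(class = "b", val = round(rexp(500, rate = 0.05), 0))
df <- rbind(classA, classB)
df$class <- factor(df$class)

# Rounding produces exact zeros, for which log() would return -Inf
sum(df$val == 0)

# Log response with a small continuity correction (offset 0.5 is an assumption)
df$logval <- log(df$val + 0.5)

if (requireNamespace("partykit", quietly = TRUE)) {
  ct_log <- partykit::ctree(logval ~ class, data = df)
  plot(ct_log)
}
```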
But, of course, the problem of the proper scaling in the terminal nodes is separate from this. The reason was that the viewports for the terminal nodes were not set to clip = TRUE and hence didn't clip graphical elements outside the viewport region.
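The sketch below illustrates the clipping mechanism the answer refers to using plain grid; it is not partykit's actual code. A viewport with a 0-70 y scale plays the role of a terminal-node panel with yscale = c(0, 70), and inside it is a bar that extends to 120. With clip = "on", the overshooting part is cut off at the viewport boundary. With the default clip = "inherit", it would spill beyond the panel.

```r
library(grid)

pdf(NULL)        # draw to a null device so the sketch also runs non-interactively
grid.newpage()

# A viewport standing in for one terminal-node panel, with a 0-70 y scale.
# clip = "on" discards anything drawn outside the viewport's region.
vp <- viewport(width = 0.5, height = 0.5, yscale = c(0, 70), clip = "on")
pushViewport(vp)
grid.rect(gp = gpar(lty = "dashed"))          # panel border
# A bar reaching y = 120 (beyond the 0-70 scale); only the part up to 70 is drawn
grid.rect(x = 0.5, y = unit(0, "native"), width = 0.3,
          height = unit(120, "native"), just = "bottom",
          gp = gpar(fill = "grey"))
popViewport()
invisible(dev.off())
```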
I've just fixed this problem in the partykit package on R-Forge. A new CRAN release is not scheduled yet, but you can either check out the partykit SVN from R-Forge or just download the current partykit/R/plot.R source code.
I am trying to plot a graph with weighted and colored edges using the tm package in R. The issue is that the correlations are pretty strong, so the edges are so wide that they cover most of my plot. I can control whether they are weighted, but I have had no success coloring them light grey so that they don't cover the rest of the graph.
My code is:
plot(dtm,terms = findFreqTerms(dtm, lowfreq=400),corThreshold = 0.3,weighting = TRUE,attrs = list(graph = list(rankdir ="BT"),node = list(shape = "rectangle",fixedsize=FALSE,fontsize=120)))
which is working fine except the very wide and dark edges.
An alternative solution I thought about is converting the object into a graph object, but I had no success with this either.
Any assistance will be highly appreciated. Many many thanks.
I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output; however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table by another programme. While the heatmap from that programme shows clear differences in marker expression between the objects, my heatmap doesn't show much difference and I cannot recognize any clustering (i.e., colour) pattern; it just looks like a randomly jumbled set of similar colours with no big contrast. Here is an example of the code I am using; maybe someone has an idea of what I might be doing wrong.
mydata <- read.csv("mydata.csv")   # read.table() would need sep = "," for a CSV file
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too.
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original colours but use the log-clusters because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that even remotely looks like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have to scale or normalize the data by column somehow). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale in heatmap so that outliers fall into a last bin covering "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed, and I also didn't manage to add the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it allows you to add breaks that assign colors to ranges of values in your heatmap.
For example if you had 3 colors blue, white, and red with the values going from low to high you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv = TRUE, scale = 'none', dendrogram = "row", symm = TRUE, col = bluered(16), breaks = my.breaks)
In this case you have 3 sets of values that correspond to the 3 colors, the values will differ of course depending on what values you have with your data.
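Here is a sketch of the "everything above a threshold goes into the last bin" idea. It uses base R's heatmap() instead of heatmap.2; heatmap() passes col and breaks on to image(), and it also accepts RowSideColors. Values beyond an illustrative cap of +/-3 are clamped so that they land in the end bins. The data are random stand-ins. The sketch also checks the rule the answer's breaks follow: one more break than colours, i.e. 17 breaks for bluered(16).

```r
# The answer's breaks: 6 + 4 + 7 = 17 values for 16 colours
my.breaks <- c(seq(-5, -.6, length.out = 6), seq(-.5999999, .1, length.out = 4),
               seq(.100009, 5, length.out = 7))
length(my.breaks)

set.seed(42)
# Toy data standing in for the real table: 30 objects x 6 markers
mat <- matrix(rnorm(30 * 6), nrow = 30,
              dimnames = list(paste0("obj", 1:30), paste0("m", 1:6)))
mat[1, 1] <- 15    # artificial outliers
mat[2, 3] <- -12

cap <- 3           # illustrative threshold; pick one that suits your data
capped <- pmax(pmin(mat, cap), -cap)   # values beyond +/-cap land in the end bins

n_col <- 16
cols <- colorRampPalette(c("blue", "white", "red"))(n_col)
brks <- seq(-cap, cap, length.out = n_col + 1)   # one more break than colours

# Row side colours from a 3-cluster cut, as in the question
hr   <- hclust(dist(capped))
grp  <- cutree(hr, k = 3)
side <- c("orange", "purple", "darkgreen")[grp]

heatmap(capped, Rowv = as.dendrogram(hr), Colv = NA, scale = "none",
        col = cols, breaks = brks, RowSideColors = side)
```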
One thing you are doing in your program is calling hclust on your data and then calling heatmap on it. However, the heatmap manual page says of the hclustfun argument (the function used for clustering when Rowv or Colv are not dendrograms) that it "Defaults to hclust."
So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
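To illustrate the point that heatmap() clusters by default, here is a sketch (with random stand-in data) in which heatmap() builds the row dendrogram itself. Through its distfun and hclustfun arguments, it uses the same 1 - Pearson distance and complete linkage as the question. Note that this clusters the same matrix it displays. Clustering on datalog while displaying datamat, as in the question, still calls for a precomputed dendrogram passed as Rowv.

```r
set.seed(5)
# Stand-in data: 20 objects (rows) x 8 positive-valued features
datamat <- matrix(rlnorm(20 * 8), nrow = 20,
                  dimnames = list(paste0("obj", 1:20), paste0("f", 1:8)))
datalog <- log(datamat)

# Let heatmap() build the row dendrogram itself, using the same distance
# (1 - Pearson correlation between rows) and linkage as in the question
hm <- heatmap(datalog, Colv = NA, scale = "column",
              distfun   = function(m) as.dist(1 - cor(t(m), method = "pearson")),
              hclustfun = function(d) hclust(d, method = "complete"))

# The returned row order is a permutation of the rows
hm$rowInd
```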