I have read quite a few threads on creating Venn diagrams in R. "Is it possible to create a proportional triple Venn Diagram" talks about using the eulerr package. "Venn diagram proportional and color shading with semi-transparency" is very comprehensive and helped me with a lot of the other graphs I needed.
While these threads are fantastic, I believe there is one problem they do not solve. It occurs when the intersection of all three sets makes up a huge portion of the overall area. In my case, R&S&W is 92% of the total area, so the remaining regions are too small to see or label and the graph is unreadable. Is there any way we can fix this?
Here's my data and code:
dput(Venn_data)
structure(c(94905288780.4383, 3910207511.54001, 2615620176.44757,
1125606833.85568, 187542691.618916, 104457994.331746, 96049675.0823557
), .Names = c("R&S&W", "R&S", "S&W", "S", "R", "W", "R&W"))
VennDiag2 <- eulerr::euler(Venn_data,shape="ellipse")
windows()
plot(VennDiag2)
Here's the output:
I cannot see what's R&S, S&W, R, S, W etc.
I also tried venneuler package.
Here's my code:
library(venneuler)
windows()
v <- venneuler(Venn_data)
plot(v)
Unfortunately, this didn't help either. Here's the output.
Is there any way we can fix this? I am not an expert so I thought of asking here. I'd sincerely appreciate any help. I have spent quite a few hours on this and am still not able to get this to work.
You could always retrieve the plot parameters yourself and position the labels using arrows or something, but another option would be to use a legend instead of labels.
plot(VennDiag2, legend = TRUE)
It is somewhat questionable whether an Euler diagram is of much use here at all, though.
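To put numbers on that doubt, here is a minimal sketch that computes each region's share of the total from the data in the question. The horizontal bar chart on a log scale is a swapped-in alternative, not something proposed in the thread, and the name shares is made up for illustration.

```r
# Region sizes copied from the dput() output in the question
Venn_data <- c("R&S&W" = 94905288780.4383, "R&S" = 3910207511.54001,
               "S&W" = 2615620176.44757, "S" = 1125606833.85568,
               "R" = 187542691.618916, "W" = 104457994.331746,
               "R&W" = 96049675.0823557)

# Each region as a percentage of the total area
shares <- 100 * Venn_data / sum(Venn_data)
print(round(sort(shares, decreasing = TRUE), 2))

# A log-scaled axis keeps the small regions visible next to R&S&W
barplot(sort(shares), horiz = TRUE, las = 1, log = "x",
        xlab = "share of total (%), log scale")
```

On a linear scale the six smaller regions would all but vanish next to R&S&W, which is the same problem the area-proportional diagram has.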
There is a different visualization strategy in the nVennR package I posted some months ago:
library(nVennR)
v <- createVennObj(nSets = 3, sNames = c('R', 'S', 'W'),
                   sSizes = c(0, 104457994.331746, 1125606833.85568, 2615620176.44757,
                              187542691.618916, 96049675.0823557, 3910207511.54001,
                              94905288780.4383))
v <- plotVenn(nVennObj = v)
I had not anticipated the need for such large numbers, and I see they get cropped. However, the result is a vector image (svg), and the picture can be edited afterwards. You can find more details, including why the numbers are in that order, in the vignette. The package can also handle larger numbers of sets.
I often notice that when plotting in R, not all relevant tick marks are drawn. "Relevant" here means that data is present at that position.
See this example
> set.seed(NULL)
> d <- data.frame(a=sample(1:10, replace=TRUE), b=sample(11:30))
> plot(d)
In the resulting plot you can see values on the X-axis at 3, 5, 7 and 9, but the tick marks for them are missing.
The focus of my question is to understand why R acts like that. What is the algorithm and logic behind it?
btw: I know how to solve it. I can draw the X-axis myself. But that is not part of the question.
You can find a brief description of the algorithm for placing the tick marks in ?axis.
plot() is a generic function for plotting many kinds of data. In your example the x values are discrete, but plot() treats them as continuous. For continuous data, it does not make much sense to have a tick mark for every single value, which would make the axes unreadable.
However, you can easily adjust the ticks in your plot using axis().
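As a rough sketch of both points: pretty() produces "round" breakpoints much like the ones the default axis uses, and axis() with an explicit at= places a tick at every value that occurs in the data. The seed and the names ticks and default_ticks are made up for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
d <- data.frame(a = sample(1:10, replace = TRUE), b = sample(11:30))

# Round breakpoints covering the data range; the default axis chooses
# its tick locations in a similar way, independent of where data lies.
default_ticks <- pretty(range(d$a))
print(default_ticks)

# Suppress the default x-axis, then draw one tick per observed value.
ticks <- sort(unique(d$a))
plot(d, xaxt = "n")
axis(1, at = ticks)
```

The default ticks depend only on the axis range, which is why values such as 3, 5, 7 and 9 can be plotted without a tick of their own.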
I'm new to R. Previously, I've been able to overlay 2 separate plots that were of the same kind, p1 and p2, using plot (p1); plot (p2, add=T).
I'm struggling with the definition of factors when overlaying a barplot with a point plot showing all individual points.
I can plot the barplot on its own as I want it. The point plot also looks the way I want, but I realize I'm incorrectly defining phase as numeric to force plot to display each value rather than default to a boxplot (as happens when I use plot(my.df$cond, my.df$val)).
Any tips on defining my variable types correctly or whether I'm using the correct barplot and plot functions, would be greatly appreciated. Thank you so much.
shpad <- c(1,2,5,6,1,2,5,6,1,2,5,6,1,2,5,6)
my.df <- data.frame(val=c(0.0738,0.0518,0.002,0.0397,0.1452,0.1152,0.1774,0.0658,0.0218,0.0497,-0.0296,0.0653,0.0848,0.1296,0.1416,0.0923),
phase=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
sub=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
cond=c("NsNm", "NsNm", "NsNm", "NsNm", "NsLm", "NsLm", "NsLm", "NsLm", "LsNm", "LsNm", "LsNm", "LsNm", "LsLm", "LsLm", "LsLm", "LsLm"))
avg <-tapply(my.df$val, my.df$phase, mean)
barplot(avg, border=NA, names.arg=c("NsNm", "NsLm", "LsNm", "LsLm"),col=c("blue","darkblue","red", "darkred"),ylab = "score",ylim=c(-0.03,0.25))
plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad)
tl;dr: problem is that if instead of the last line, I have plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad, add=T), the formats are incongruent.
Alright, so, I've tried for a bit to accomplish what you wanted, but the best I could do with the base plotting system is this:
Which is accomplished purely by your lines of code above except for the last line, which I replaced with
points(my.df$phase,my.df$val,type="p",pch=shpad)
However, I think you can do much better, if you want to keep the same kind of plot, using the ggplot2 library. Using this code:
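library('ggplot2')
# one row per phase: the bar height plus a discrete x value
new.df <- data.frame(avg = as.vector(avg), phase = factor(names(avg)))
ggplot(new.df) +
  geom_bar(stat = "identity", aes(x = phase, y = avg, fill = c("NsNm","NsLm","LsNm","LsLm"))) +
  # the points come from my.df, so pass it explicitly and make phase discrete as well
  geom_point(data = my.df, aes(x = factor(phase), y = val, shape = factor(shpad))) +
  scale_x_discrete(name = "Type", labels = c("NsNm","NsLm","LsNm","LsLm")) +
  ylab("Score")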
you can make this chart:
I didn't adjust the coloring and the point types and the legend titles (not sure how important they are, but those can be fiddled with). However, you can see this probably produces the result you were aiming for.
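For the base-graphics route above, one possible refinement: barplot() returns the x positions of the bar midpoints, and passing those to points() centres each group of points on its bar instead of placing them at x = 1, 2, 3, 4. This is a sketch using the data from the question; the name mids is made up, and the symbols are indexed by subject here.

```r
shpad <- c(1, 2, 5, 6)  # one plotting symbol per subject
my.df <- data.frame(
  val = c(0.0738, 0.0518, 0.002, 0.0397, 0.1452, 0.1152, 0.1774, 0.0658,
          0.0218, 0.0497, -0.0296, 0.0653, 0.0848, 0.1296, 0.1416, 0.0923),
  phase = rep(1:4, each = 4),
  sub = rep(1:4, times = 4)
)
avg <- tapply(my.df$val, my.df$phase, mean)

# barplot() invisibly returns the midpoint of each bar on the x-axis
mids <- barplot(avg, border = NA,
                names.arg = c("NsNm", "NsLm", "LsNm", "LsLm"),
                col = c("blue", "darkblue", "red", "darkred"),
                ylab = "score", ylim = c(-0.03, 0.25))

# Map each observation's phase to its bar midpoint before plotting
points(mids[my.df$phase], my.df$val, pch = shpad[my.df$sub])
```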
I have a data set where column [,1] is time and the next 14 columns are magnitudes. I would like to make a scatter plot of all the magnitudes vs. time in one figure, where each column gets its own panel in a grid (stacked on top of one another).
I want to use the raw data to make these graphs. I can make them separately, but I would like to only have to do this process once.
The data set is called A; the only independent variable is time (the first column).
df<-data.frame(time=A[,1],V11=A[,2],V08=A[,3],
V21=A[,4],V04=A[,5],V22=A[,6],V23=A[,7],
V24=A[,8],V25=A[,9],V07=A[,10],xxx=A[,11],
V26=A[,12],PV2=A[,13],V27=A[,14],V28=A[,15],
NV1=A[,16])
I tried the code suggested by @VlooO, but it squashed the graphs, making them too hard to decipher, and each had its own axes. All my graphs can share the same axes, separated only by their headings.
Looking at ggplot2, I think it would be perfect for what I want.
ggplot(data=df.melt,aes(x=time,y=???))
I am confused about what my y should be, since I want to reference each different column.
Thanks R community
I hope I understand you correctly:
df<-data.frame(time=rnorm(10),A=rnorm(10),B=rnorm(10),C=rnorm(10))
par(mfrow=c(length(df)-1,1))
sapply(2:length(df), function(x){
plot(df[,c(1,x)])
})
The result would be
Here are some hints, since you neither provide a reproducible example nor show what you have tried:
Use list.files to go through all your documents
Use lapply to loop over the result of the previous step and read your data
Put your data in the long format using melt from reshape2 and the variable time as id.
Use ggplot2 to plot using the variable as aes color/group.
library(ggplot2)
library(reshape2)
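invisible(lapply(list.files(pattern=...), function(x) {
  # assumes each file has a header row that includes a 'time' column
  dt <- read.table(x, header = TRUE)
  dt.l <- melt(dt, id.vars = 'time')
  print(ggplot(dt.l) + geom_line(aes(x = time, y = value, color = variable)))
}))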
If you don't need ggplot2, then the matplot function for base graphics can be used to do what you want in one command.
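A minimal sketch of the matplot route, using made-up data in the same shape as the question (time in the first column, magnitudes in the rest); the column names A, B and C are placeholders.

```r
set.seed(1)  # arbitrary seed for reproducible toy data
df <- data.frame(time = 1:10, A = rnorm(10), B = rnorm(10), C = rnorm(10))

n_series <- ncol(df) - 1

# matplot() plots every column of the second argument against the first,
# all on the same axes, with one symbol and colour per column
matplot(df$time, df[, -1], type = "p", pch = seq_len(n_series),
        col = seq_len(n_series), xlab = "time", ylab = "magnitude")
legend("topright", legend = names(df)[-1], pch = seq_len(n_series),
       col = seq_len(n_series))
```

Note that this draws all series in a single panel; it does not give each column its own heading the way facets do.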
SOLUTION:
After looking through a bunch more questions and playing around a bit more with ggplot2, I found code that works pretty well. After I made my data frame (stated above), here is what I did:
df.m <- melt(df, "time")
ggplot(df.m, aes(time, value, colour = variable)) + geom_line() +
  facet_wrap(~ variable, ncol = 2)
I would post the image but I don't have enough reputation points yet.
I still don't really understand why "value" is placed in the y position in aes(time, value, ...). If anyone could provide an explanation, that would be greatly appreciated. My last question is whether anyone knows how to make the subgraph titles smaller.
Can I use cex.lab=, cex.main= in ggplot2?
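On the first point: melt() reshapes the wide data frame into long format, stacking all magnitude columns into a single column that reshape2 names value by default, with the original column names stored in variable. That is why y is value. On the second: cex.lab and cex.main are base-graphics parameters; in ggplot2, text sizes are set through theme(). A small sketch with made-up data (the column names reuse three from the question; the numbers and the size 8 are arbitrary):

```r
library(ggplot2)
library(reshape2)

set.seed(1)  # arbitrary seed for reproducible toy data
df <- data.frame(time = 1:10, V11 = rnorm(10), V08 = rnorm(10), V21 = rnorm(10))

# Long format: one row per (time, column) pair
df.m <- melt(df, "time")
print(head(df.m))

p <- ggplot(df.m, aes(time, value, colour = variable)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 2) +
  # strip.text controls the facet (subgraph) titles
  theme(strip.text = element_text(size = 8))
print(p)
```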
I've been trying to create a 3D bar plot based on categorical data, but have not found a way.
It is simple to explain. Consider the following example data (the real example is more complex, but it reduces to this), showing the relative risk of incurring something broken down by income and age, both categorical data.
I want to display this in a 3D bar plot (similar in idea to http://demos.devexpress.com/aspxperiencedemos/NavBar/Images/Charts/ManhattanBar.jpg). I looked at the scatterplot3d package, but it's only for scatter plots and doesn't handle categorical data well. I was able to make a 3d chart, but it shows dots instead of 3d bars. There is no chart type for what I need. I've also tried the rgl package, but no luck either. I've been googling for more than an hour now and haven't found a solution. I have a copy of the ggplot2 - Elegant Graphics for Data Analysis book as well, but ggplot2 doesn't have this kind of chart.
Is there another freeware app I could use? OpenOffice 3.2 doesn't have this chart either.
Thank you for any hints.
Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
I'm not sure how to make a 3d chart in R, but there are other, better ways to represent this data than with a 3d bar chart. 3d charts make interpretation difficult, because the heights of the bars are skewed by the 3d perspective. In that example chart, it's hard to tell if Wisconsin in 2004 is really higher than Wisconsin in 2001, or if that's an effect of the perspective. And if it is higher, by how much?
Since both Age and Income have meaningful orders, it wouldn't be awful to make a line graph. ggplot2 code:
ggplot(data, aes(Age, Risk, color = Income))+
geom_line(aes(group = Income))
Or, you could make a heatmap.
ggplot(data, aes(Age, Income, fill = Risk)) +
geom_tile()
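Both snippets assume a data frame called data that holds the table from the question. One way to build it, as a sketch: the factor levels are set explicitly (an assumption about the intended order) so the axes run young to old and low to high rather than alphabetically. The names line_plot and heat_map are made up for illustration.

```r
library(ggplot2)

data <- read.csv(text = "Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11")

# Explicit level order; the default would sort the labels alphabetically
data$Age <- factor(data$Age, levels = c("young", "adult", "old"))
data$Income <- factor(data$Income, levels = c("low", "medium", "high"))

line_plot <- ggplot(data, aes(Age, Risk, color = Income)) +
  geom_line(aes(group = Income))
heat_map <- ggplot(data, aes(Age, Income, fill = Risk)) +
  geom_tile()

print(line_plot)
print(heat_map)
```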
As the others suggested, there are better ways to present this, but this should get you started if you want something similar to what you had.
df <- read.csv(textConnection("Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
"))
df$Age <- ordered(df$Age, levels=c('young', 'adult', 'old'))
df$Income <- ordered(df$Income, levels=c('low', 'medium', 'high'))
library(rgl)
plot3d(Risk ~ Age|Income, type='h', lwd=10, col=rainbow(3))
This will just produce flat rectangles. For an example to create nice looking bars, see demo(hist3d).
You can find a starting point here but you need to add in more lines and some rectangles to get a plot like you posted.