I've been trying to create a 3D bar plot based on categorical data, but have not found a way.
It is simple to explain. Consider the following example data (the real example is more complex, but it reduces to this), showing the relative risk of incurring something broken down by income and age, both categorical data.
I want to display this in a 3D bar plot (similar in idea to http://demos.devexpress.com/aspxperiencedemos/NavBar/Images/Charts/ManhattanBar.jpg). I looked at the scatterplot3d package, but it's only for scatter plots and doesn't handle categorical data well. I was able to make a 3d chart, but it shows dots instead of 3d bars. There is no chart type for what I need. I've also tried the rgl package, but no luck either. I've been googling for more than an hour now and haven't found a solution. I have a copy of the ggplot2 - Elegant Graphics for Data Analysis book as well, but ggplot2 doesn't have this kind of chart.
Is there another freeware app I could use? OpenOffice 3.2 doesn't have this chart either.
Thank you for any hints.
Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
I'm not sure how to make a 3d chart in R, but there are other, better ways to represent this data than with a 3d bar chart. 3d charts make interpretation difficult, because the heights of the bars are skewed by the 3d perspective. In that example chart, it's hard to tell if Wisconsin in 2004 is really higher than Wisconsin 2001, or if that's an effect of the perspective. And if it is higher, by how much?
Since both Age and Income have meaningful orders, it wouldn't be awful to make a line graph. ggplot2 code:
ggplot(data, aes(Age, Risk, color = Income))+
geom_line(aes(group = Income))
Or, you could make a heatmap.
ggplot(data, aes(Age, Income, fill = Risk)) +
geom_tile()
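Both snippets above assume a data frame called data that holds the nine rows from the question. Here is a minimal self-contained sketch of the same idea. The name risk and the grouped bar chart at the end are assumptions added for illustration, not part of the code above. The factor levels are set explicitly because R would otherwise sort them alphabetically (adult, old, young), which hides the meaningful order.

```r
library(ggplot2)

# Rebuild the example data from the question (the name `risk` is arbitrary).
risk <- data.frame(
  Age    = rep(c("young", "adult", "old"), each = 3),
  Income = rep(c("high", "medium", "low"), times = 3),
  Risk   = c(1, 1.2, 1.36, 1, 1.12, 1.23, 1, 1.03, 1.11)
)

# Set the level order explicitly; the default would be alphabetical.
risk$Age    <- factor(risk$Age,    levels = c("young", "adult", "old"))
risk$Income <- factor(risk$Income, levels = c("low", "medium", "high"))

# Line graph: one line per income group.
p_line <- ggplot(risk, aes(Age, Risk, color = Income, group = Income)) +
  geom_line() +
  geom_point()

# Heatmap: the fill color encodes the relative risk.
p_tile <- ggplot(risk, aes(Age, Income, fill = Risk)) +
  geom_tile()

# Grouped 2D bars as a flat substitute for the 3D bar chart.
p_bar <- ggplot(risk, aes(Age, Risk, fill = Income)) +
  geom_col(position = "dodge")
```

Type p_line, p_tile or p_bar at the console (or wrap them in print() inside a script) to draw each plot.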
Like the others suggested there are better ways to present this, but this should get you started if you want something similar to what you had.
df <- read.csv(textConnection("Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
"))
df$Age <- ordered(df$Age, levels=c('young', 'adult', 'old'))
df$Income <- ordered(df$Income, levels=c('low', 'medium', 'high'))
library(rgl)
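# pass the columns explicitly; the ordered factors are converted to their integer codes
plot3d(as.integer(df$Age), as.integer(df$Income), df$Risk, type='h', lwd=10, col=rainbow(3),
       xlab='Age', ylab='Income', zlab='Risk')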
This will just produce flat rectangles. For an example of how to create nice-looking bars, see demo(hist3d).
You can find a starting point here but you need to add in more lines and some rectangles to get a plot like you posted.
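If solid bars are really wanted, one option not mentioned above is the panel.3dbars panel function from the latticeExtra package, used together with cloud() from lattice. The sketch below is only a starting point under that assumption: the data frame name bars, the viewing angles and the bar footprint are arbitrary choices.

```r
library(lattice)
library(latticeExtra)  # provides panel.3dbars

# Example data from the question, with explicitly ordered factor levels.
bars <- data.frame(
  Age    = factor(rep(c("young", "adult", "old"), each = 3),
                  levels = c("young", "adult", "old")),
  Income = factor(rep(c("high", "medium", "low"), times = 3),
                  levels = c("low", "medium", "high")),
  Risk   = c(1, 1.2, 1.36, 1, 1.12, 1.23, 1, 1.03, 1.11)
)

# panel.3dbars draws one solid bar per (Age, Income) cell;
# xbase and ybase set the footprint of each bar.
# zlim starts at 0 so that bars with Risk = 1 are not drawn with zero height.
p3d <- cloud(Risk ~ Age + Income, data = bars,
             panel.3d.cloud = panel.3dbars,
             xbase = 0.4, ybase = 0.4,
             col.facet = "grey",
             zlim = c(0, 1.4),
             scales = list(arrows = FALSE),
             screen = list(z = 40, x = -60))
print(p3d)
```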
Hello everyone. I am trying to plot a heatmap and would like to cluster it. The plot does not look good, and I would also like to change the colors. I am a newbie; can anyone tell me how to plot a heatmap with clustering, so that values showing a similar pattern are clustered together?
My data: data_link
What I tried: I simply log-normalized the data and plotted the graph.
library(ggplot2)
library(reshape2)
mydata=read.table("Test_data", sep="\t", header=TRUE)
melted_cormat <- melt(mydata)
head(melted_cormat)
melted_cormat$new=log2(1+melted_cormat$value)
ggplot(data = melted_cormat, aes(x=variable, y=ID, fill=new)) +
geom_tile()
Is it possible to increase the size of each cell, as in the image below?
image
Any suggestions are welcome.
Thank you
You can make a heatmap from this data, but I don't think it will be a very good way to visualize this much data. You have 287 rows in mydata, which means you will have 287 rows in your plot. This will make the individual rows difficult to make out, and it will make labelling of the y axis impossible.
The other issue is that approximately 99% of your values are under 1000, yet your highest value is almost 6000. That means that the scaling of your fill is going to be extremely uneven. It will be difficult to see much detail in the lower ranges.
If you want to see clustering you could use pheatmap instead of ggplot2, and I would probably do a log transform on the fill scale to reveal the details better. However, the problem with simply having too much data on a single plot persists.
mymatrix <- log(as.matrix(mydata[,-1]))
mymatrix[mymatrix < 0] <- 0
pheatmap::pheatmap(mymatrix)
EDIT
If you plot only the first 10 rows of data, the result looks much more clearly like a heatmap:
pheatmap::pheatmap(as.matrix(mydata[1:10,-1]))
Or the first 30 rows:
pheatmap::pheatmap(as.matrix(mydata[1:30,-1]))
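The question also asked about changing the colors and the cell size. pheatmap has color, cellwidth and cellheight arguments for this. The sketch below uses simulated data (the toy data frame and its column names are made up, since the original file is not available) and the same log2(1 + x) transform as in the question.

```r
library(pheatmap)

# Simulated stand-in for mydata: an ID column plus six columns of skewed values.
set.seed(1)
toy <- data.frame(ID = paste0("gene", 1:20),
                  matrix(rexp(20 * 6, rate = 1 / 200), nrow = 20,
                         dimnames = list(NULL, paste0("S", 1:6))))

# Same transform as in the question; row names become the y axis labels.
mat <- log2(1 + as.matrix(toy[, -1]))
rownames(mat) <- toy$ID

hm <- pheatmap(mat,
               # custom palette with 50 steps from navy through white to red
               color = colorRampPalette(c("navy", "white", "firebrick3"))(50),
               # cluster both rows and columns (this is also the default)
               cluster_rows = TRUE, cluster_cols = TRUE,
               # fixed cell size in points
               cellwidth = 25, cellheight = 12)
```

With a fixed cellheight the plot grows with the number of rows, so for all 287 rows it may be necessary to write to a file with a large enough height (pheatmap has a filename argument for that).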
I am plotting two histograms in R by using the following code.
x1<-rnorm(100)
x2<-rnorm(50)
h1<-hist(x1)
h2<-hist(x2)
plot(h1, col=rgb(0,0,1,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE)
plot(h2, col=rgb(1,0,0,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE,add=TRUE)
legend("topright", c("H1", "H2"), fill=c(rgb(0,0,1,.25),rgb(1,0,0,.25)))
The code produces the following output.
I need a better-looking, more polished version of the above plot, and I want to use ggplot2. I am looking for something like this (see Change fill colors section). However, I think ggplot2 only works with data frames, and I do not have data frames in this case. How can I create a good-looking histogram plot in ggplot2? Thanks in advance.
You can (and should) put your data into a data.frame if you want to use ggplot. Ideally for ggplot, the data.frame should be in long format. Here's a simple example:
df1 = rbind(data.frame(grp='x1', x=x1), data.frame(grp='x2', x=x2))
ggplot(df1, aes(x, fill=grp)) +
geom_histogram(color='black', alpha=0.5)
There are lots of options to change the appearance however you like. If you want to have the histograms stacked or grouped, or shown as percent versus count, or as densities etc., you will find many resources in previous questions showing how to implement each of those options.
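As one illustration of those options, here is a sketch that overlays the two histograms on a density scale, which is close to what the base R code in the question does with freq = FALSE. The seed, the bin count and the colors are arbitrary choices; after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(50)

# Long format: one row per observation, with a column naming its group.
df1 <- rbind(data.frame(grp = "x1", x = x1),
             data.frame(grp = "x2", x = x2))

# position = "identity" overlays the groups instead of stacking them;
# after_stat(density) rescales each group so that its bars integrate to 1.
p_hist <- ggplot(df1, aes(x, fill = grp)) +
  geom_histogram(aes(y = after_stat(density)),
                 position = "identity", bins = 20,
                 color = "black", alpha = 0.5) +
  scale_fill_manual(values = c(x1 = "blue", x2 = "red"), name = NULL) +
  labs(x = "Index", y = "Density")
```

Because the groups have different sizes (100 and 50), the density scale makes them comparable, whereas raw counts would make x2 look smaller overall.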
I am drawing a PCA plot using ggplot2.
I know this question has been answered in some previous posts, but I still could not solve my problem.
I have a data set called tab, which is the output of a PCA:
sample.id pop EV1 EV2
HT185_MK8-2.sort.bam HA_27 -0.03796869 0.046369552
HT48_SD1A-37.sort.bam HA_14 0.04208393 0.032961404
HT53_IA1A-10.sort.bam HA_1 -0.02580365 0.005262476
HT260_MK1-4.sort.bam HA_20 -0.06090545 0.005578504
HT170_SD2W-14.sort.bam HA_17 0.01288395 0.012117833
Q093_MK7-13.sort.bam HA_26 0.06310162 0.188558067
I want to add a label to each dot in the plot. These dots are individuals from several populations, so I want to label them with their population ID (the pop column in the data set).
I am using something like this:
ggplot(data=tab,aes(EV1,EV2, label=tab[,2])) + geom_point(aes(color=as.factor(pop))) + ylab("Principal component 2") + xlab("Principal component 1")
But I do not get my desired output.
This is my PC plot!
So could anyone help me add a population label to each dot in the plot?
Thanks
Try geom_text:
geom_text(aes(label=as.character(pop)),hjust=0,vjust=0)
Also consider looking into plotly, or setting a threshold on the labels, because labeling every point will lead to a very crowded plot, and probably very little additional useful information.
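Putting this together with the six rows shown in the question, a sketch could look like the following. The cutoffs used to decide which points get a label are hypothetical values picked for this small sample; adjust them to your data.

```r
library(ggplot2)

# The six rows shown in the question.
tab <- data.frame(
  sample.id = c("HT185_MK8-2.sort.bam", "HT48_SD1A-37.sort.bam",
                "HT53_IA1A-10.sort.bam", "HT260_MK1-4.sort.bam",
                "HT170_SD2W-14.sort.bam", "Q093_MK7-13.sort.bam"),
  pop = c("HA_27", "HA_14", "HA_1", "HA_20", "HA_17", "HA_26"),
  EV1 = c(-0.03796869, 0.04208393, -0.02580365,
          -0.06090545, 0.01288395, 0.06310162),
  EV2 = c(0.046369552, 0.032961404, 0.005262476,
          0.005578504, 0.012117833, 0.188558067)
)

# Hypothetical threshold: label only samples far from the origin on either axis.
far <- subset(tab, abs(EV1) > 0.05 | abs(EV2) > 0.04)

p_pca <- ggplot(tab, aes(EV1, EV2)) +
  geom_point(aes(color = pop)) +
  # labels are drawn only for the rows kept in `far`
  geom_text(data = far, aes(label = pop), hjust = 0, vjust = 0, size = 3) +
  labs(x = "Principal component 1", y = "Principal component 2",
       color = "Population")
```

To label every point instead, drop the data = far argument so that geom_text uses the full tab data frame.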
I am attempting to create a pie chart with ggplot2 but cannot get it to work using other references online. The chart I create is missing most of its fill.
ggplot(sae,aes(x=1,fill=factor(State), width=1))+
geom_bar()+
ggtitle("House by State")+
coord_polar(theta='y')
This code gives:
How do I fill the center?
Any other improvements appreciated.
With sample data
sae <- data.frame(State=sample(LETTERS[1:6],60,T))
ggplot(sae,aes(x=factor(1),fill=factor(State)))+
geom_bar(width=1)+
ggtitle("House by State")+
coord_polar(theta="y")
EDIT: Other options (because piecharts are bad)
# following Jaap's example: some better ways to visualize this
#grouped barchart
p1 <- ggplot(sae, aes(x=State, fill=State)) +
geom_bar() + labs(title="grouped barchart")
#stacked barchart; especially practical if you want to compare groups
sae$group <- rbinom(60,1,0.5)
p2 <- ggplot(sae, aes(x=factor(group),fill=State))+
geom_bar(width=0.5) + labs(title="grouped stacked barchart")
library(gridExtra)  # provides grid.arrange
grid.arrange(p1, p2, ncol=2)
As @Heroka already mentioned in the comments, pie charts are a bad way of visualizing information. They are so bad that this is even mentioned in the help files of R.
From ?pie:
Pie charts are a very bad way of displaying information. The eye is
good at judging linear measures and bad at judging relative areas. A
bar chart or dot chart is a preferable way of displaying this type of
data.
Cleveland (1985), page 264: “Data that can be shown by pie charts
always can be shown by a dot chart. This means that judgements of
position along a common scale can be made instead of the less accurate
angle judgements.” This statement is based on the empirical
investigations of Cleveland and McGill as well as investigations by
perceptual psychologists.
Some further reading on the pie-chart debate.
With the example data of @Heroka:
ggplot(sae,aes(x = factor(1), fill = factor(State)))+
geom_bar(width = 1, position = "dodge")+
ggtitle("House by State")
you get:
A clear demonstration that it's easier to see the differences between the categories when you use a bar chart instead of a pie chart.
When you want to show information about proportions, there is another choice, the waffle package which gets back more to what you probably intend to show with a pie chart (i.e., proportions). In most instances, the bar plots above would likely be best, but for the sake of showing another way of plotting...
Using the sae data from above:
library(waffle) # install the package if you don't have it
w <- table(sae$State)  # number of observations per state
w.waf <- waffle(w)
w.waf + ggtitle("Contextless Waffle Graph") + theme(plot.title=element_text(face="bold", size=24))
which yields this:
I would like to create a mosaic plot (R package vcd, see e.g. http://cran.r-project.org/web/packages/vcd/vignettes/residual-shadings.pdf ) with labels inside the plot. The labels should show either a combination of the various factors or some custom label and the percentage of total observations in this combination of categories (see e.g. http://i.usatoday.net/communitymanager/_photos/technology-live/2011/07/28/nielsen0728x-large.jpg , despite this not quite being a mosaic plot).
I suspect something like the labeling_values function might play a role here, but I cannot quite get it to work.
library(vcd)
library(MASS)
data("Titanic")
mosaic(Titanic, labeling = labeling_values)
Alternative ways to represent two variables with categorical data in a friendly way for non-statisticians are also welcome and are acceptable solutions.
Here is an example of adding proportions as labels. As usual, the degree of customization of a plot is a matter of taste, but this shows at least the principles. See ?labeling_cells for further possibilities.
labs <- round(prop.table(Titanic), 2)
mosaic(Titanic, pop = FALSE)
labeling_cells(text = labs, margin = 0)(Titanic)
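To show percentages of the total as custom text, the text argument can also be given a character array with the same shape and dimnames as the table. The sketch below collapses Titanic to two variables (Class and Survived) to keep the labels readable; that reduction and the label format are choices made for this example only.

```r
library(vcd)

# Collapse Titanic to Class x Survived (dimensions 1 and 4).
tab2 <- margin.table(Titanic, c(1, 4))

# Character array with the same shape and dimnames as tab2,
# holding each cell's share of all observations as a percentage.
labs <- array(sprintf("%.1f%%", 100 * as.vector(prop.table(tab2))),
              dim = dim(tab2), dimnames = dimnames(tab2))

# pop = FALSE keeps the viewports open so the labels can be added afterwards.
mosaic(tab2, pop = FALSE)
labeling_cells(text = labs, margin = 0)(tab2)
```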