I have the following dataframe:
Catergory Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 41
8 Improved Unclear 117
I'm trying to make a grouped bar chart, species as height and then 2 colours for catergory.
here is my code:
Reasonstats<-read.csv("bothstats.csv")
Reasonstats2<-as.matrix(Reasonstats[,3])
barplot((Reasonstats2),beside=T,col=c("darkblue","red"),ylab="number of
species",names.arg=Reasonstats$Reason, cex.names=0.8,las=2,space=c(0,100)
,ylim=c(0,120))
box(bty="l")
Now what I want, is to not have to label the two bars twice and to group them apart, I've tried changing the space value to all sorts of things and it doesn't seem to move the bars apart. Can anyone tell me what I'm doing wrong?
with ggplot2:
library(ggplot2)
Animals <- read.table(
header=TRUE, text='Category Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 41
8 Improved Unclear 117')
ggplot(Animals, aes(factor(Reason), Species, fill = Category)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
Not a barplot solution but using lattice and barchart:
library(lattice)
barchart(Species~Reason,data=Reasonstats,groups=Catergory,
scales=list(x=list(rot=90,cex=0.8)))
There are several ways to do plots in R; lattice is one of them, and always a reasonable solution, +1 to #agstudy. If you want to do this in base graphics, you could try the following:
Reasonstats <- read.table(text="Category Reason Species
Decline Genuine 24
Improved Genuine 16
Improved Misclassified 85
Decline Misclassified 41
Decline Taxonomic 2
Improved Taxonomic 7
Decline Unclear 41
Improved Unclear 117", header=T)
ReasonstatsDec <- Reasonstats[which(Reasonstats$Category=="Decline"),]
ReasonstatsImp <- Reasonstats[which(Reasonstats$Category=="Improved"),]
Reasonstats3 <- cbind(ReasonstatsImp[,3], ReasonstatsDec[,3])
colnames(Reasonstats3) <- c("Improved", "Decline")
rownames(Reasonstats3) <- ReasonstatsImp$Reason
windows()
barplot(t(Reasonstats3), beside=TRUE, ylab="number of species",
cex.names=0.8, las=2, ylim=c(0,120), col=c("darkblue","red"))
box(bty="l")
Here's what I did: I created a matrix with two columns (because your data were in columns) where the columns were the species counts for Decline and for Improved. Then I made those categories the column names. I also made the Reasons the row names. The barplot() function can operate over this matrix, but wants the data in rows rather than columns, so I fed it a transposed version of the matrix. Lastly, I deleted some of your arguments to your barplot() function call that were no longer needed. In other words, the problem was that your data weren't set up the way barplot() wants for your intended output.
I wrote a function wrapper called bar() for barplot() to do what you are trying to do here, since I need to do similar things frequently. The Github link to the function is here. After copying and pasting it into R, you do
bar(dv = Species,
factors = c(Category, Reason),
dataframe = Reasonstats,
errbar = FALSE,
ylim=c(0, 140)) #I increased the upper y-limit to accommodate the legend.
The one convenience is that it will put a legend on the plot using the names of the levels in your categorical variable (e.g., "Decline" and "Improved"). If each of your levels has multiple observations, it can also plot the error bars (which does not apply here, hence errbar=FALSE
Related
I want to create an organized stacked barplot where bars with similar proportions appear together. I have a data frame of 10,000 individuals and each individual comes from three populations. Here is my data.
library(MCMCpack)
library(ggplot2)
n = 10000
alpha = c(0.1, 0.1, 0.1)
q <- as.data.frame(rdirichlet(n,alpha))
head(q)
individuals <- c(1:nrow(q))
q <- cbind(q, individuals)
head(q)
V1 V2 V3 individuals
1 0.0032720232 3.381345e-08 0.996727943 1
2 0.3354060035 4.433923e-01 0.221201688 2
3 0.0004121665 9.661220e-01 0.033465842 3
4 0.9966997182 3.234048e-03 0.000066234 4
5 0.7789280208 2.090134e-01 0.012058562 5
6 0.0005048727 9.408364e-02 0.905411485 6
# long format for ggplot2 plotting
qm <- gather(q, key, value, -individuals)
colnames(qm) <- c("individuals", "ancestry", "proportions")
head(qm)
individuals ancestry proportions
1 1 V1 0.0032720232
2 2 V1 0.3354060035
3 3 V1 0.0004121665
4 4 V1 0.9966997182
5 5 V1 0.7789280208
6 6 V1 0.0005048727
Without any kind of ordering of data, I plotted the stacked barplot as:
ggplot(qm) + geom_bar(aes(x = individuals, y = proportions, fill= ancestry), stat="identity")
I have two questions:
(1) I don't know how to make these individuals with similar proportions cluster together, and I have tried many solutions on stack exchange already but can't get them to work on my dataset!
(2) For some reason, it seems like when I implement the code to order individuals by decreasing/increasing proportions in one ancestry, the code sometimes works on toy datasets of lower dimensions I create, but when I try to plot 10,000 individuals, the code doesn't work anymore! Is this a problem in ggplot2 or am I doing something wrong? I would appreciate any answer to this thread to also plot n = 10,000 stacked barplots.
(3) Not sure if I'm imagining this, but in my stacked barplot, it seems like R is clustering the stacked bar plots in some order unknown to me -- because I can see regular gaps between the stacked plots. In reality, there should be no gaps and I'm not sure why this is happening.
I would appreciate any help since I have already worked on this code for an embarrassingly long amount of time!!
Since, the variance of proportions within the ancestry is very high, the bars look like clustered with other ancestry. It is plotted in the right way. However, we couldn't distinguish the difference because the number of individuals is high.
If you think that the proportions on your data set would not lose it's meaning and could be interpreted in the same way if they're transformed intro exponential or log values, you can try it.
The stacked bar with exponential of the proportions:
ggplot(qm) + geom_bar(aes(x = individuals, y = exp(proportions), fill= ancestry),
stat="identity")
If you don't want have gaps between the bars, set widht to 1.
ggplot(qm) + geom_bar(aes(x = individuals, y = exp(proportions), fill= ancestry),
stat="identity",
width=1)
I would like to recreate the following chart in R using ggplot. My data is as per a similar table where for each code (A, B, C etc.). I have a current value, a value 12M ago and the respective range (max, min) over the period.
My chart needs to show the current value in red, the value 12M ago in blue and then a line show the max and min range.
I can produce this painstaking in Excel using error bars, but I would like to reproduce it in R.
Any ideas on how I can do this using ggplot? Thanks.
Here's what I came up with, but just a note: please if you post your dataset, don't post an image, but instead post the result of dput(your.data.frame). The result of that is easily copy-pasted into the console in order to replicate your dataset, whereas I recreated your data frame manually. :/
A few points first regarding your data as is and the intended plot:
The red and blue hash marks used to indicate 12 months ago and today are not a geom I know of off the top of my head, so I'm using geom_point here to show them (easiest way). You can pick another geom of you wish to show them differently.
The ranges for high and low are already specified by those column names. I'll use those values for the required aesthetics in geom_errorbar.
You can use your data as is to plot and use two separate geom_point calls (one for "today" and one for "12M ago"), but that's going to make creating the legend more difficult than it needs to be, so the better option is to adjust the dataset to support having the legend created automatically. For that, we'll use the gather function from tidyr, being sure to just "gather together" the information in "today" and "12M ago" (my column name for that was different b/c you need to start with a letter in the data frame), but leave alone the columns for "high", "low", and the letters (called "category" in my dataframe).
Where df is the original data frame:
df1 <- df %>% gather(time, value, -category, -high, -low)
The new dataframe (df1) looks like this (18 observations total):
category high low time value
1 A 82 28 M12.ago 81
2 B 82 54 M12.ago 80
3 C 80 65 M12.ago 75
4 D 76 34 M12.ago 70
5 E 94 51 M12.ago 93
6 F 72 61 M12.ago 65
where "time" has "M12.ago" or "today".
For the plot, you apply category to x and value to y, and specify ymax and ymin with high and low, respectively for the geom_errorbar:
ggplot(df1, aes(x=category, y=value)) +
geom_errorbar(aes(ymin=low, ymax=high), width=0.2) + ylim(0,100) +
geom_point(aes(color=time), size=2) +
scale_color_manual(values=list('M12.ago'='blue', 'today'='red')) +
theme_bw() + labs(color="") + theme(legend.position='bottom')
Giving you this:
I want to present 7 different signals for male against female, that means 14 signals in a graph with 7 rows and 2 columns, also make them all share the same xlab and ylab.
i used grid.arrange() and when i plotted for female alone (7 signals) it was so messy only the titles appeared and the signals are not showing.
each f here is a ggplot() of a time series, i have 7 others for males, like m20,...,m80 .
grid.arrange(f20,f30,f40,f50,f60,f70,f80, nrow = 7)
So, how would i make the graph suitable with 14 signals (7 for females to the left and 7 for males to the right)?? and make them all share the same one xlab and one ylab?!
I recommend using ggplot to plot all your data in one plot object and then use facet_grid to make individual panels for sex & signal..if you share the code & sample dataframe you are currently using I can offer more help
Facet_grid(signal~sex)
I try to make a cumulative plot for a particular (for instance the first) column of my data (example):
1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40
Than I want to overlap this plot with another cumulative plot of another data (for example the second column) and save it as a png or jpg image.
Without using the vectors implementation "by hand" as in Cumulative Plot with Given X-Axis because if I have a very large dataset i can't be able to do that.
I try the follow simple commands:
A <- read.table("cumul.dat", header=TRUE)
Read the file, but now I want that the cumulative plot is down with a particular column of this file.
The command is:
cdat1<-cumsum(dat1)
but this is for a particular vector dat1 that I need to take from the data array (cumul.dat).
Thanks
I couldn't follow your question so this is a shot in the dark answer based on key words I did get:
m <- read.table(text=" 1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40")
library(ggplot2)
m2 <- stack(m)
qplot(rep(1:nrow(m), 2), values, colour=ind, data=m2, geom="step")
EDIT I decided I like this approach bettwe:
library(ggplot2)
library(reshape2)
m$x <- seq_len(nrow(m))
m2 <- melt(m, id='x')
qplot(x, value, colour=variable, data=m2, geom="step")
I wasn't quite sure when the events were happening and what the observations were. I'm assuming the events are just at 1,2,3,4 and the columns represent sounds of the different groups. If that's the case, using Lattice I would do
require(lattice)
A<-data.frame(dat1=c(1,2,4,8,12,14,16,20), dat2=c(3,5,9,11,17,20,34,40))
dd<-do.call(make.groups, lapply(A, function(x) {data.frame(x=seq_along(x), y=cumsum(x))}))
xyplot(y~x,dd, groups=which, type="s", auto.key=T)
Which produces
With base graphics, this can be done by specifying type='s' in the plot call:
matplot(apply(A, 2, cumsum), type='s', xlab='x', ylab='y', las=1)
Note I've used matplot here, but you could also plot the series one at a time, the first with plot and the second with points or lines.
We could also add a legend with, for example:
legend('topleft', c('Series 1', 'Series 2'), bty='n', lty=c(1, 3), col=1:2)
I want to plot 2 graphs in 1 frame. Basically I want to compare the results.
Anyways, the code I tried is:
plot(male,pch=16,col="red")
lines(male,pch=16,col="red")
par(new=TRUE)
plot(female,pch=16,col="green")
lines(female,pch=16,col="green")
When I run it, I DO get 2 plots in a frame BUT it changes my y-axis. Added my plot below. Anyways, y-axis values are -4,-4,-3,-3,...
It's like both of the plots display their own axis.
Please help.
Thanks
You don't need the second plot. Just use
> plot(male,pch=16,col="red")
> lines(male, pch=16, col = "red")
> lines(female, pch=16, col = "green")
> points(female, pch=16, col = "green")
Note: that will set the frame boundaries based on the first data set, so some data from the second plot could be outside the boundaries of the plot. You can fix it by e.g. setting the limits of the first plot yourself.
For this kind of plot I usually like the plotting with ggplot2 much better. The main reason: It generalizes nicely to more than two lines without a lot of code.
The drawback for your sample data is that it is not available as a data.frame, which is required for ggplot2. Furthermore, in every case you need a x-variable to plot against. Thus, first let us create a data.frame out of your data.
dat <- data.frame(index=rep(1:10, 2), vals=c(male, female), group=rep(c('male', 'female'), each=10))
Which leaves us with
> dat
index vals group
1 1 -0.4334269341 male
2 2 0.8829902521 male
3 3 -0.6052638138 male
4 4 0.2270191965 male
5 5 3.5123679143 male
6 6 0.0615821014 male
7 7 3.6280155376 male
8 8 2.3508890457 male
9 9 2.9824432680 male
10 10 1.1938052833 male
11 1 1.3151289227 female
12 2 1.9956491556 female
13 3 0.8229389822 female
14 4 1.2062726250 female
15 5 0.6633392820 female
16 6 1.1331669670 female
17 7 -0.9002109636 female
18 8 3.2137052284 female
19 9 0.3113656610 female
20 10 1.4664434215 female
Note that my command assumes you have 10 data values each. That command would have to be adjusted according to your actual data.
Now we may use the mighty power of ggplot2:
library(ggplot2)
ggplot(dat, aes(x=index, y=vals, color=group)) + geom_point() + geom_line()
The call above has three elements: ggplot initializes the plot, tells R to use dat as datasource and defines the plot aesthetics, or better: Which aesthetic properties of the plot (such as color, position, size, etc.) are influenced by your data. We use the x and y-values as expected and furthermore set the color aesthetic to the grouping variable - that makes ggplot automatically plot two groups with different colors. Finally, we add two geometries, that pretty much do what is written above: Draw lines and draw points.
The result:
If you have your data saved in the standard way in R (in a data.frame), you end with one line of code. And if after some thousands years of evolution you want to add another gender, it is still one line of code.