How to facet multiple data sets with different numbers of rows - r

I am trying to plot three data sets that share an x axis. Some of the data sets, however, have missing data and are thus of different lengths. I can plot them fine individually, but when I try to facet them all together I get an error saying that the data sets contain different numbers of rows. This error only occurs when I facet the plot (which is necessary).
Any suggestions for how I could get the facet plot to accept data sets with different numbers of rows?
The code I've been using is:
ggplot()+
geom_line(data=x,aes(x=x$BIN_START,y=x$TajimaD),size=0.6,alpha=0.65,colour="skyblue1")+
geom_line(data=y,aes(x=y$BIN_START,y=y$TajimaD),size=0.3,alpha=0.85,colour="greenyellow")+
geom_line(data=z,aes(x=z$BIN_START,y=z$TajimaD),size=0.25,alpha=0.95,colour="black")+
scale_x_continuous()+
facet_grid(rows=vars(x$CHROM))+
theme_classic()+
ylab("TajimaD") +
xlab("Location (bp)")
As was suggested in a comment I have now moved all the data into a single file and added a column to indicate the population the data is from. I am still getting a similar error message: "replacement has 22588 rows, data has 7537"
ggplot()+
geom_line(data=x,aes(x=a$BIN_START,y=a$TajimaD,color=a$Population),size=0.6,alpha=0.65)+
scale_x_continuous()+
facet_grid(rows=vars(a$CHROM))+
theme_classic()+
ylab("TajimaD") +
xlab("Location (bp)")

In your second attempt you pass x as the data but then refer to a$BIN_START, etc. It's very likely that x and a have different numbers of rows, hence the error. I suggest removing the <dataset_name>$ prefix altogether from all your aes() calls when you use ggplot2. When you say data = x, you only need to write aes(x=BIN_START,y=TajimaD,color=Population) (i.e. no need for x$). The same applies to facet_grid(): write vars(CHROM), not vars(a$CHROM).
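As a minimal sketch of that advice, here is a combined data frame in which each population has a different number of rows. The make_pop helper and the random values are hypothetical stand-ins; only the column names (CHROM, BIN_START, TajimaD, Population) come from the question.

```r
library(ggplot2)

# Hypothetical stand-in data: one population with n bins on each of two chromosomes
set.seed(1)
make_pop <- function(pop, n) {
  data.frame(
    CHROM = rep(c("chr1", "chr2"), each = n),
    BIN_START = rep(seq_len(n) * 1000, times = 2),
    TajimaD = rnorm(2 * n),
    Population = pop
  )
}

# Populations of different lengths stacked into a single data frame
a <- rbind(make_pop("x", 10), make_pop("y", 7), make_pop("z", 5))

# Bare column names everywhere: ggplot2 looks them up in `a`
p <- ggplot(a, aes(x = BIN_START, y = TajimaD, colour = Population)) +
  geom_line(alpha = 0.8) +
  facet_grid(rows = vars(CHROM)) +
  theme_classic() +
  ylab("TajimaD") +
  xlab("Location (bp)")
```

Because every layer and the facet read from the same data frame, the differing row counts per population no longer matter.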

Related

How to change the color of one factor in every dataset in a list using a loop?

I have a set of 6 datasets, each of which has one factor named "unique" that I would like to set as the same color within all my datasets, while leaving all my other factors intact with the same colors
As an example, I currently have something along these lines:
allPlots<-list()
for(i in seq(length(allData))){
allPlots[[i]]<-ggplot(allData[[i]], aes(y=count, x="", fill=Category)) +
geom_bar(stat="identity", show.legend=FALSE,fill = c(rev(hue_pal()(1)),"black"))+
ggtitle(names(allData)[i])
}
Within each of my datasets, I ordered it so that my factor level of note, "unique", is the first row in all my datasets, hence the rev(hue_pal()(1)) call. However, upon attempting to plot all of these plots, I get the error code "Error: Aesthetics must be either length 1 or the same as the data (35): fill"
Thanks !

Avoid overlapping x-axis labels with ggplot? [duplicate]

I'm having some trouble with qplot in R. I am trying to plot data from a data frame. When I execute the command below the plot gets bunched up on the left side (see the image below). The data frame only has 963 rows so I don't think size is the issue, but I can use the same command on a smaller data frame and it looks fine. Any ideas?
library(ggplot2)
qplot(x=variable,
y=value,
data=data,
color=Classification,
main="Average MapQ Scores")
Or similarly:
ggplot(data = data, aes(x = variable, y = value, color = Classification)) +
geom_point()
Your column value is likely a factor, when it should be a numeric. This causes each categorical value of value to be given its own entry on the y-axis, thus producing the effect you've noticed.
You should coerce it to numeric:
data$value <- as.numeric(as.character(data$value))
Note that there is probably a good reason it has been interpreted as a factor and not a numeric, possibly because it has some entries that are not pure numeric values (maybe 1,000 or 1000 m or some other character entry among the numbers). The consequence of the coercion may be a loss of information, so be warned or cleanse the data thoroughly.
Also, you appear to have the same problem on the x-axis.
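To illustrate why the conversion goes through as.character(), here is a small sketch using a hypothetical factor (the values are made up, including one entry with a thousands separator):

```r
# Hypothetical values that were read in as text and became a factor
value <- factor(c("10", "2", "33", "1,000"))

# as.numeric() directly on a factor returns the internal level codes, not the values
codes <- as.numeric(value)

# Going through character recovers the numbers; entries that cannot be parsed become NA
nums <- suppressWarnings(as.numeric(as.character(value)))

# Cleaning the text first (here: dropping the comma) avoids losing that entry
clean <- as.numeric(gsub(",", "", as.character(value)))
```

Checking sum(is.na(nums)) after the coercion is a quick way to see how many entries would be lost without further cleaning.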

R ggplot geom_text Aesthetic Length

I'm working with a really big data set containing one dummy variable and a factor variable with 14 levels, a sample of which I have posted here. I'm trying to make a stacked proportional bar graph using the following code:
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar(position="fill")+
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
It works great and it's almost the plot I need. I just want to add small text labels reporting the number of observations of each factor level. My intuition tells me that something like this should work:
Labels<-c("n=1853" , "n=392", "n=181" , "n=80", "n=69", "n=32" , "n=10", "n=6", "n=4", "n=5", "n=3", "n=3", "n=2", "n=1" )
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar(position="fill")+
geom_text(aes(label=Labels,y=.5))+
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
But it spits out a blank graph and the error
Aesthetics must either be length one, or the same length as the dataProblems:Labels
This really doesn't make sense to me because I know for a fact that the number of factor levels is the same as the number of labels I muscled in. I've been trying to figure out how I can get it to just print what I need without creating a vector of values for the number of observations like this example, but no matter what I try I always get the same Aesthetics error.
How about this:
library(dplyr)
# Create a separate data frame of counts for the count labels
counts = data %>% group_by(factor) %>%
summarise(n=n()) %>%
mutate(dummy=NA)
counts$factor = factor(counts$factor, levels=0:10)
ggplot(data, aes(factor(factor), fill=factor(dummy))) +
geom_bar(position="fill") +
geom_text(data=counts, aes(label=n, x=factor, y=-0.03), size=4) +
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
Your method is the right idea, but Labels needs to be a data frame, rather than a vector. geom_text needs to be given the name of the data frame using the data argument. Then, the label argument inside aes tells geom_text which column to use for the labels. Also, even though geom_text doesn't use the dummy column, it has to be in the data frame or you'll get an error.
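A self-contained variant of the same idea, as a sketch on made-up data (the values below are hypothetical, and base R's table() stands in for the dplyr pipeline). Here fill is mapped only inside geom_bar(), which is an assumption that differs from the code above: since geom_text() then inherits no fill mapping, the counts data frame does not need a dummy column.

```r
library(ggplot2)

# Hypothetical data: a 4-level factor with known group sizes and a 0/1 dummy
data <- data.frame(
  factor = rep(0:3, times = c(120, 50, 20, 10)),
  dummy  = rep(c(0, 1), length.out = 200)
)

# One row per factor level, with the label text in its own column
counts <- as.data.frame(table(factor = data$factor), responseName = "n")
counts$label <- paste0("n=", counts$n)

p <- ggplot(data, aes(x = factor(factor))) +
  geom_bar(aes(fill = factor(dummy)), position = "fill") +
  # The label layer uses its own data frame, one row per bar
  geom_text(data = counts, aes(x = factor, y = -0.03, label = label), size = 4) +
  ylab("Proportion") +
  theme(axis.title.y = element_text(angle = 0))
```

The key point is unchanged: the label layer's data has exactly one row per bar, so its aesthetics have the length ggplot2 expects.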

R - converting a table to data frame

I'm working on the Titanic dataset from R. I want to analyse the dataset using a ggplot (stacked and group bar plots). So I wanted to convert the table into a data-frame so I could plot the graphs. I used the following code to convert :
df<-as.data.frame(Titanic)
View(df)
However, even on viewing I see my df to be more like a data-table.
Furthermore, when I tried to use it for plotting with the code:
ggplot(data=df) + geom_bar(aes(x=Class,y=Sex))
All it shows is an empty plot, with just the labels on x and y axis, along with the categorical values of Sex as Male & Female and Class as 1st,2nd,3rd and crew.
What confuses me even more is that it's picking up the categorical values from the dataset but not the observations.
Please let me know how I can convert to dataframe correctly. Thanks :)
If I reproduce your code it gives me this error:
Error : Mapping a variable to y and also using stat="bin".
This is because you also included the y=Sex in your script. The main question therefore is, what would you like to plot?
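If you want a bar chart with the number of people in each class, note that as.data.frame(Titanic) stores the counts in the Freq column, so the bars need to be weighted by it:
ggplot(data=df) + geom_bar(aes(x=Class, weight=Freq))
For the total number of females and males it will be:
ggplot(data=df) + geom_bar(aes(x=Sex, weight=Freq))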
Do not try to plot them at the same time.
To get back to the question. There is nothing wrong with your data frame. It is your ggplot code that is faulty.
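For reference, a short sketch of what the converted data frame looks like and one way to draw it. Using geom_col() with Freq on the y-axis is a suggestion of this sketch, not something stated in the original answer.

```r
library(ggplot2)

df <- as.data.frame(Titanic)

# 32 rows: one per Class x Sex x Age x Survived combination, with the count in Freq
n_rows <- nrow(df)

# Passengers per class, summed over the other three variables
class_totals <- aggregate(Freq ~ Class, data = df, FUN = sum)

# geom_col() plots the supplied y values, so the bar heights are the Freq totals;
# mapping fill to Sex stacks the two sexes within each class
p <- ggplot(df, aes(x = Class, y = Freq, fill = Sex)) +
  geom_col()
```

The data frame is already in the long format ggplot2 expects; the counts simply live in Freq rather than in the number of rows.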

