I'm having some trouble with qplot in R. I am trying to plot data from a data frame. When I execute the command below, the plot gets bunched up on the left side (see the image below). The data frame only has 963 rows, so I don't think size is the issue, yet the same command on a smaller data frame looks fine. Any ideas?
library(ggplot2)
qplot(x=variable,
y=value,
data=data,
color=Classification,
main="Average MapQ Scores")
Or similarly:
ggplot(data = data, aes(x = variable, y = value, color = Classification)) +
  geom_point()
Your column value is likely a factor when it should be numeric. Each distinct level of value then gets its own position on the y-axis, which produces the effect you've noticed.
You should coerce it to numeric, going through as.character() first (calling as.numeric() directly on a factor returns its internal level codes, not the values):
data$value <- as.numeric(as.character(data$value))
Note that there is probably a good reason it was read as a factor rather than as numeric: possibly some entries are not pure numeric values (maybe 1,000 or 1000 m, or some other character entry among the numbers). The coercion may lose information, so be warned, or clean the data thoroughly first.
Also, you appear to have the same problem on the x-axis.
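A small sketch of why the as.character() step matters, on invented values. The gsub() clean-up at the end is only one possible approach and assumes the only meaningful characters are digits and a decimal point; it would mangle negative numbers or scientific notation.

```r
# A numeric column that was read in as a factor (invented values)
value <- factor(c("35", "2", "10"))

# Wrong: as.numeric() on a factor returns the internal level codes.
# The levels sort as "10" < "2" < "35", so this gives 3 2 1.
codes <- as.numeric(value)

# Right: convert the labels, not the codes
numbers <- as.numeric(as.character(value))  # 35 2 10

# Entries such as "1,000" or "1000 m" silently become NA
raw <- c("1,000", "1000 m", "42")
naive <- suppressWarnings(as.numeric(raw))  # NA NA 42

# One possible clean-up, assuming only digits and "." carry meaning:
# strip every other character before converting
cleaned <- as.numeric(gsub("[^0-9.]", "", raw))  # 1000 1000 42
```

Comparing naive and cleaned before overwriting the column is a cheap way to see which rows the coercion would lose.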
I'm trying to run a quick analysis of some variables. I run this code:
ggplot(data, aes(var1)) +
geom_bar()
The result looks like a histogram, but even though var1 has only 6 possible values, the x-axis only shows 2, 4 and 6. Is it possible to easily include all 6 values as labels?
You want a frequency bar plot of six individual numbers. Since you want to see all of them on the x-axis, it seems you are treating them as categorical rather than numeric data, so what you actually want is a categorical x-axis that shows every value. Turning x into a factor should do the trick:
data <- data.frame(var1=floor(6*runif(200) + 1))
ggplot(data, aes(factor(var1))) + geom_bar()
[Figure: left, bar plot without factor(); right, bar plot with factor().]
What does your data look like?
Assuming you have a numeric x, adding scale_x_continuous(breaks = seq(1, 6, by = 1)) should work.
Of course, this only works if the x values run from 1 to 6; otherwise, replace the seq() call with a vector containing the values you want.
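A sketch of the "vector of values you want" variant, assuming var1 is numeric: derive the breaks from the data so they follow whatever values are actually present. The data frame here is made up for illustration.

```r
library(ggplot2)

# Made-up data in the same shape as the question: integer codes 1 to 6
data <- data.frame(var1 = rep(1:6, times = c(30, 45, 20, 50, 35, 20)))

# Use the distinct observed values as breaks instead of a hard-coded seq()
brks <- sort(unique(data$var1))

p <- ggplot(data, aes(var1)) +
  geom_bar() +
  scale_x_continuous(breaks = brks)
```

This keeps a continuous axis (so gaps between values stay visible), unlike the factor() approach, which spaces the categories evenly.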
I have an R dataframe that contains a string variable and a numerical variable, and I would like to plot the top 10 strings, based on the value of the numerical variable.
I can of course get the top 10 entries pretty simply:
top10_rank <- head(rank[order(rank$numerical_var_name), ], 10)
My first approach to visualizing this was simply to plot it like this:
ggplot(data=top10_rank, aes(x = top10_rank$numerical_var_name, y = top10_rank$string_name)) + geom_point(size=3)
And to a first approximation this "works" - the problem is that the strings on the y axis are sorted alphabetically rather than by the numerical value.
My preference would be to find a way to plot the top 10 strings without having to bother showing the numerical variable at all - just basically as a list (even better would be if I could enumerate the list). I am attempting to plot this so it looks more pleasing than simply dumping the text to the screen.
Any ideas greatly appreciated!
The y-axis tick marks are sorted alphabetically, but the points follow the row order of the top10_rank data frame. What you need to do is change the order of the y-axis: add + scale_y_discrete(limits = top10_rank$String) to your ggplot call and it should work (here the columns are called String and Number).
ggplot(data = top10_rank, aes(x = Number, y = String)) +
  geom_point(size = 3) +
  scale_y_discrete(limits = top10_rank$String)
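An alternative sketch on made-up data using the question's column names: store the order in the factor levels, so any discrete scale respects it without a limits argument. It assumes larger values rank higher (drop decreasing = TRUE if smaller is better). The second plot is one possible take on the "just a list" request; the position column is a hypothetical helper, not something from the question.

```r
library(ggplot2)

# Made-up data using the question's column names
rank <- data.frame(
  string_name = c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
                  "golf", "hotel", "india", "juliet", "kilo", "lima"),
  numerical_var_name = c(5, 12, 3, 40, 22, 8, 17, 30, 1, 9, 27, 14)
)

# Keep the 10 rows with the largest values, largest first
top10_rank <- head(rank[order(rank$numerical_var_name, decreasing = TRUE), ], 10)

# A discrete y axis draws its first level at the bottom,
# so reverse the order to put the largest value at the top
top10_rank$string_name <- factor(top10_rank$string_name,
                                 levels = rev(top10_rank$string_name))

# Dot plot ordered by value
p_dots <- ggplot(top10_rank, aes(x = numerical_var_name, y = string_name)) +
  geom_point(size = 3)

# Enumerated text list with no numeric axis (hypothetical helper column)
top10_rank$position <- seq_len(nrow(top10_rank))
p_list <- ggplot(top10_rank, aes(x = 0, y = string_name,
                                 label = paste0(position, ". ", string_name))) +
  geom_text(hjust = 0) +
  xlim(0, 1) +
  theme_void()
```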
Here is a link to a great resource on R graphics: R Graphics Cookbook
I'm working with a really big data set (a sample of which I have posted here) containing one dummy variable and a factor variable with 14 levels. I'm trying to make a stacked proportional bar graph using the following code:
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar(position="fill")+
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
It works great and it's almost the plot I need. I just want to add small text labels reporting the number of observations for each factor level. My intuition tells me that something like this should work:
Labels<-c("n=1853" , "n=392", "n=181" , "n=80", "n=69", "n=32" , "n=10", "n=6", "n=4", "n=5", "n=3", "n=3", "n=2", "n=1" )
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar(position="fill")+
geom_text(aes(label=Labels,y=.5))+
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
But it produces a blank graph and the error:
Aesthetics must either be length one, or the same length as the data
Problems:Labels
This doesn't make sense to me, because I know for a fact that the number of factor levels equals the number of labels I muscled in. I've been trying to figure out how to get it to print what I need without creating a vector of observation counts by hand like this example, but no matter what I try I always get the same aesthetics error.
How about this:
library(ggplot2)
library(dplyr)
# Create a separate data frame of counts for the count labels
counts <- data %>%
  group_by(factor) %>%
  summarise(n = n()) %>%
  mutate(dummy = NA)
# Match the levels produced by factor(factor) in the main plot
counts$factor <- factor(counts$factor)
ggplot(data, aes(factor(factor), fill=factor(dummy))) +
geom_bar(position="fill") +
geom_text(data=counts, aes(label=n, x=factor, y=-0.03), size=4) +
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
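Your method is the right idea, but Labels needs to be a data frame rather than a vector. ggplot2 checks each aesthetic against the number of rows in the layer's data, not the number of factor levels, so a 14-element vector fails against the full data set. Pass the data frame to geom_text through its data argument; the label argument inside aes then tells geom_text which column to use for the labels. Also, even though geom_text doesn't use the dummy column, it has to be in the data frame or you'll get an error, because geom_text inherits the fill = factor(dummy) mapping from the ggplot() call.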
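A sketch of the same idea without dplyr, on invented data with the question's column names: counts come from base table(), and inherit.aes = FALSE stops geom_text from inheriting the fill mapping, so the counts data frame does not need a placeholder dummy column.

```r
library(ggplot2)

# Invented stand-in for the question's data: a grouping column and a 0/1 dummy
set.seed(1)
data <- data.frame(
  factor = sample(c("a", "b", "c"), 300, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  dummy  = sample(0:1, 300, replace = TRUE)
)

# One row per level: column "factor" holds the level, "n" the count
counts <- as.data.frame(table(factor = data$factor), responseName = "n")

p <- ggplot(data, aes(x = factor(factor), fill = factor(dummy))) +
  geom_bar(position = "fill") +
  # inherit.aes = FALSE: geom_text uses only the mappings given here,
  # so it never looks for a dummy column in counts
  geom_text(data = counts, aes(x = factor, y = -0.03, label = paste0("n=", n)),
            inherit.aes = FALSE, size = 4) +
  ylab("Proportion") +
  theme(axis.title.y = element_text(angle = 0))
```

The design choice is mostly taste: a placeholder dummy = NA column works too, but inherit.aes = FALSE makes the label layer independent of whatever the main plot maps.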
I have the following data.frame:
sample <- data.frame(day=c(1,2,5,10,12,12,14))
sample.table <- as.data.frame(table(sample$day))
Now what I'd like to do is graph the day against the count of days, so something like:
require(ggplot2)
qplot(Var1, Freq, data=sample.table)
I realized, though, that Var1 really wants to be a factor. This works fine for a small number of days, but is terrible when the number of days becomes much larger, because the graph becomes unreadable. If I change it to a numeric or integer, then instead of plotting the actual days on the x-axis, it plots 1, 2, 3, and so on.
What can I do so that the plot is still readable if I have, say, 5000 days?
This is because table() returns counts labelled with names (which are character strings), and when you convert the result with as.data.frame(), these names become a factor under the default settings.
You could avoid this by using your original data and getting ggplot2 to count the data:
ggplot(sample, aes(x = day, y = after_stat(count))) + geom_point(stat = "bin", binwidth = 1)
or just use a histogram,
qplot(day, data=sample, geom="histogram", binwidth=1)
Note that you can adjust the binwidth argument to count in larger groups.
Figured out a hack for this: convert Var1 back to numbers through its labels.
sample.table$Var1 <- as.integer(as.character(sample.table$Var1))
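A short sketch tying the explanation and the hack together on the question's own sample data: as.data.frame() on a table yields a factor Var1, and converting through as.character() recovers the actual days, after which ggplot2 uses a continuous x-axis with automatic breaks.

```r
library(ggplot2)

sample <- data.frame(day = c(1, 2, 5, 10, 12, 12, 14))
sample.table <- as.data.frame(table(sample$day))

# The table's names arrive as a factor
was_factor <- is.factor(sample.table$Var1)  # TRUE

# Convert via the labels; as.integer() on the factor alone would give codes 1..6
sample.table$Var1 <- as.integer(as.character(sample.table$Var1))

# Numeric x gives a continuous axis whose breaks stay readable as days grow
p <- ggplot(sample.table, aes(x = Var1, y = Freq)) +
  geom_point()
```

Passing stringsAsFactors = FALSE to as.data.frame() gives a character Var1 instead, which as.integer() can convert directly.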