I am working with a dataset that includes people's ages. I am trying to create a histogram of the ages with ggplot, where the colour of each bar depends on predefined age intervals.
For example, imagine a dataset like this:
> X
Age              Age2
 10          Under 14
 11          Under 14
 10          Under 14
 13          Under 14
 20 Between 15 and 25
 21 Between 15 and 25
 35          Above 25
I have tried to do something like this:
ggplot(X, aes(x = Age)) + geom_histogram(aes(fill = Age2))
But it displays the following error message:
Error: StatBin requires a continuous x variable the x variable is discrete. Perhaps you want stat="count"?
What am I doing wrong?
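library(ggplot2)
age  <- c(10, 11, 10, 13, 20, 21, 35)
age2 <- c(rep("Under 14", times = 4), rep("Between 15 and 25", times = 2), "Above 25")
# Build the data frame directly: cbind() would first turn both vectors into
# a character matrix, so age would no longer be numeric
X <- data.frame(age = age, age2 = age2)
str(X)  # age should be num, not chr or Factor
# With a numeric x, geom_histogram() can bin it; fill colours the bars by age2
p <- ggplot(X, aes(x = age)) +
  geom_histogram(aes(fill = age2))
p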
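Rather than typing the interval labels by hand, they can be derived from the numeric ages with base R's cut(). A minimal sketch, assuming right-closed intervals with breaks at 14 and 25 (so an age of exactly 14 counts as "Under 14" and 25 as "Between 15 and 25"); the column name age_group is hypothetical, and binwidth = 1 is an assumption chosen so each bar covers a single year and therefore falls entirely inside one interval.

```r
library(ggplot2)

age <- c(10, 11, 10, 13, 20, 21, 35)
X <- data.frame(age = age)

# cut() assigns each age to an interval; right = TRUE (the default) makes
# the intervals (-Inf, 14], (14, 25] and (25, Inf]
X$age_group <- cut(X$age,
                   breaks = c(-Inf, 14, 25, Inf),
                   labels = c("Under 14", "Between 15 and 25", "Above 25"))

# age stays numeric, so geom_histogram() can bin it; fill uses the intervals
p <- ggplot(X, aes(x = age, fill = age_group)) +
  geom_histogram(binwidth = 1)
p
```

A side benefit is that cut() returns a factor whose levels follow the interval order, so the legend lists the groups youngest first instead of alphabetically.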
I need help with this, please.
I have searched here but haven't found a way to get the right output.
I am trying to plot this in R with ggplot2 so that I can show 3 files side by side in a single plot.
The output I want (plotted with Excel) is this:
What I am getting with ggplot2 is this:
The R code I am using is this:
library(ggplot2)
A1 <- read.table("A1.txt", header = TRUE, sep = "\t")
ggplot(A1, aes(x = count)) + geom_bar()
The data is a tab-delimited file like this
length count
26 344776
27 289439
18 673395
28 338146
19 710702
20 928326
21 3491352
22 2724981
23 699007
24 726121
25 472509
The length values should only serve as labels on the x-axis, with the counts plotted on the y-axis.
Is this what you wanted? With stat = "identity", the bar heights are taken from count instead of from counting rows:
ggplot(A1, aes(x = as.character(length), y=count)) + geom_bar(stat="identity")
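The answer above handles a single file; the question also asks for 3 files side by side. A minimal sketch, assuming "side by side" means dodged bars (one bar per file at each length) and that every file has the same length and count columns as A1.txt; the helper names read_length_counts() and plot_length_counts() and the file names A2.txt and A3.txt are hypothetical.

```r
library(ggplot2)

# Hypothetical helper: read several tab-delimited files that share the
# columns `length` and `count`, tagging each row with its source file
read_length_counts <- function(paths) {
  parts <- lapply(paths, function(p) {
    d <- read.table(p, header = TRUE, sep = "\t")
    d$file <- basename(p)
    d
  })
  do.call(rbind, parts)
}

# Hypothetical helper: one group of bars per length, one bar per file
plot_length_counts <- function(df) {
  ggplot(df, aes(x = factor(length), y = count, fill = file)) +
    geom_col(position = "dodge") +
    labs(x = "length", y = "count")
}

# Usage (file names other than A1.txt are assumptions):
# all_counts <- read_length_counts(c("A1.txt", "A2.txt", "A3.txt"))
# plot_length_counts(all_counts)
```

Replacing geom_col(position = "dodge") with geom_col() plus facet_wrap(~ file) would instead give one panel per file.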
I have a data frame with three variables that I want to plot in one boxplot:
livingsetting factor outcome
1 1 CKD 2
2 1 CKD 13
3 1 CKD 23
4 13 CKD 12
5 7 CKD -14
The livingsetting variable contains factors "1", "7", and "13".
The factor variable contains factors "CKD", "HD", and "Transplant".
The outcome variable is a continuous outcome variable.
This is my code for the boxplot:
ggplot(df, aes(x = interaction(livingsetting, factor), y = outcome)) +
  geom_boxplot(aes(fill = livingsetting)) +
  xlab("Factors") + ylab("Y")
And my plot looks like this:
The x-axis labels show 1.CKD, 13.CKD, 7.CKD, 1.HD, 13.HD, etc. Is it possible to change the x-axis so that the boxplot shows only "CKD", "HD", and "Transplant" as labels, with the individual boxes grouped in threes?
For example, the first red, green, and blue plots will be labeled as "CKD" (as the group), the second red, green, and blue plots will be labeled as "HD", etc.
Here is an example. You don't need interaction(), since mapping livingsetting to fill already draws a separate box for each of its levels within each x group:
df <- read.table(text = " livingsetting factor outcome
1 7 BLA 2
2 1 BLA 13
3 1 CKD 23
4 13 CKD 12
5 7 CKD -14", header = TRUE, row.names = 1)
df$livingsetting <- as.factor(df$livingsetting)
library(ggplot2)
ggplot(data = df, aes(x = factor, y = outcome, fill = livingsetting)) +
geom_boxplot()
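The 1.CKD, 13.CKD, 7.CKD order in the question suggests livingsetting is stored as text, whose levels sort alphabetically. A minimal sketch, assuming that is the case, which sets the level order explicitly with factor(levels = ...) so the boxes inside each group appear as 1, 7, 13; the data frame holds the question's own sample rows.

```r
library(ggplot2)

# The question's sample rows, with livingsetting stored as text
df <- data.frame(
  livingsetting = c("1", "1", "1", "13", "7"),
  factor        = c("CKD", "CKD", "CKD", "CKD", "CKD"),
  outcome       = c(2, 13, 23, 12, -14)
)

# as.factor() on this text would give levels "1", "13", "7" (alphabetical);
# listing the levels explicitly gives the intended order
df$livingsetting <- factor(df$livingsetting, levels = c("1", "7", "13"))
df$factor <- factor(df$factor, levels = c("CKD", "HD", "Transplant"))

p <- ggplot(df, aes(x = factor, y = outcome, fill = livingsetting)) +
  geom_boxplot()
p
```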
Is there a reason not to use facet_wrap or facet_grid? Unless I'm misunderstanding what you're looking for, this is a perfect use-case for faceting, and then you don't need interaction.
You could change your code to this:
ggplot(df, aes(x = livingsetting, y = outcome)) +
geom_boxplot(aes(fill = livingsetting)) +
facet_wrap(~ factor)
This uses the data frame as is, rather than computing the interaction, and puts the labels for the factor variable on top of the facets rather than on the tick labels (though you could do that if you want).
Hope that helps!
I'm trying to create a bar graph in ggplot that shows proportions rather than counts. I have c+geom_bar(aes(y=(..count..)/sum(..count..)*100)), but I'm not sure what either of the counts refers to. I tried putting in my data, but it didn't seem to work. What should I put here?
This is the data I'm using
> describe(topprob1)
topprob1
      n missing  unique    Info    Mean
    500       0       9    0.93   3.908

            1   2 3  4  5   6  7  8 9
Frequency 128 105 9 15 13 172 39 12 7
%          26  21 2  3  3  34  8  2 1
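You haven't provided a reproducible example, so here's an illustration with the built-in mtcars data frame. Compare the following two plots. The first gives counts. The second gives proportions, which are displayed in this case as percentages. ..count.. is an internal variable that ggplot creates to store the count for each bar. In ..count../sum(..count..), the numerator is each bar's count and the denominator is the total count over all bars, so each bar's height becomes its share of the total; you don't need to put your own data in there.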
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar()
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..count../sum(..count..))) +
scale_y_continuous(labels=percent_format())
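You can also use the ..prop.. computed variable together with the group aesthetic (group = 1 puts all bars in one group, so the proportions are computed over the whole data):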
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..prop.., group = 1)) +
scale_y_continuous(labels=percent_format())
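Since ggplot2 3.4.0, the ..count.. and ..prop.. notation is deprecated in favour of after_stat(). A sketch of the same percentage plot in that style, plus an alternative that computes the proportions up front with base R's prop.table() and plots them with geom_col(); the name am_props is illustrative only.

```r
library(ggplot2)
library(scales)

# Same percentage plot as above, written with after_stat() instead of ..count..
p1 <- ggplot(mtcars, aes(x = factor(am))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = percent_format())

# Alternative: compute the proportions first, then plot them directly.
# am_props has columns am (factor) and Freq (proportion of rows)
am_props <- as.data.frame(prop.table(table(am = mtcars$am)))
p2 <- ggplot(am_props, aes(x = am, y = Freq)) +
  geom_col() +
  scale_y_continuous(labels = percent_format())
```

Precomputing makes the plotted numbers easy to inspect (print(am_props)), at the cost of one extra step.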
I've binned some data and currently have a data frame with two columns, one that specifies a bin range and another that specifies the frequency, like this:
> head(data)
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16
I want to plot a histogram and a density plot from this, but I can't find a way of doing so without generating new bins. Based on another solution, I tried the following:
p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")
but it crashes. Does anyone know how to deal with this?
Thank you
The problem is that ggplot doesn't understand the data in the form you provide it: binRange is a text label, not a number. You need to reshape it like so (I am not a regex master, so there are surely better ways to do this):
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the numbers, e.g. "(0,0.025]" becomes "0,0.025"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")
# split on the comma into two numeric columns:
# binRange_1 (start-point) and binRange_2 (end-point)
df <- cSplit(df, "binRange")
# plot it; you actually don't need the second column. Each bar is centred
# half a bin width (0.0125) past its start-point so it spans its own bin
ggplot(df, aes(x = binRange_1 + 0.0125, y = Frequency)) +
  geom_bar(stat = "identity", width = 0.025) +
  labs(x = "binRange")
Or, if you don't need the data to be interpreted numerically, you can simply do the following:
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")
You won't be able to draw a density plot from this data, since it is not continuous but categorical (already binned). That's why I actually prefer the second way of showing it.
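If the numeric interpretation is wanted without stringr and splitstackshape, the bin edges can also be parsed with base R's sub() and drawn with geom_rect(), so each bar covers exactly its own bin. A minimal sketch, assuming every label has the form "(a,b]"; the density column rescales the histogram so the bar areas sum to 1, which is as close to a density plot as pre-binned counts allow (it is not a kernel density estimate).

```r
library(ggplot2)

df <- data.frame(
  binRange  = c("(0,0.025]", "(0.025,0.05]", "(0.05,0.075]",
                "(0.075,0.1]", "(0.1,0.125]", "(0.125,0.15]"),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

# Pull the lower and upper edge out of labels of the form "(a,b]"
df$lower <- as.numeric(sub("^\\((.*),.*\\]$", "\\1", df$binRange))
df$upper <- as.numeric(sub("^\\(.*,(.*)\\]$", "\\1", df$binRange))

# Density scale: frequency divided by (total count * bin width),
# so that the bar areas sum to 1
df$density <- df$Frequency / (sum(df$Frequency) * (df$upper - df$lower))

# geom_rect() draws each bar exactly between its own bin edges
p <- ggplot(df) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density),
            colour = "white") +
  labs(x = "value", y = "density")
```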
You can try geom_col(), which is equivalent to geom_bar(stat = "identity"):
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()
I have a data frame, df, that looks like this:
group ID y1 y2
A 1 21 14
A 2 11 21
A 3 21 17
...
B 1 71 12
B 2 41 14
B 3 31 15
...
I would like to use ggplot() to plot variables in one group against variables in another, for example df$y1[df$group=="A"] against df$y2[df$group=="B"]. I naively thought the plotting code might be something like this, but it's obviously not correct:
ggplot(df, aes(x = df$y1[df$group=="A"], y = df$y2[df$group=="B"])) + geom_point()
I know that if I wanted to subset the overall data, for example to plot only group A, I could do something like:
ggplot(subset(df, group=="A"), aes(x = y1, y = y2)) + geom_point()
I think I could solve this by reshaping my data so as to create variables y1.A, y1.B, y2.A, y2.B and so forth, but I have many variables and this seems like a long-winded approach.
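One way to avoid building y1.A, y1.B, y2.A, y2.B by hand is to split the two groups and merge() them on ID, letting the suffixes argument name the columns. A minimal sketch, assuming ID identifies matching rows across the groups; it uses the first three rows of each group shown above, and the name wide is illustrative.

```r
library(ggplot2)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  ID    = c(1, 2, 3, 1, 2, 3),
  y1    = c(21, 11, 21, 71, 41, 31),
  y2    = c(14, 21, 17, 12, 14, 15)
)

# One row per ID, with every variable from group A and group B side by side;
# suffixes turn y1 into y1.A / y1.B, y2 into y2.A / y2.B, and so on
wide <- merge(subset(df, group == "A", select = -group),
              subset(df, group == "B", select = -group),
              by = "ID", suffixes = c(".A", ".B"))

p <- ggplot(wide, aes(x = y1.A, y = y2.B)) +
  geom_point()
```

Because merge() applies the suffixes to every shared column, this needs no changes when more y variables are added.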