plot count histogram in R ggplot

I need help with this, please.
I have searched here but have not found a way to get the right output.
I am trying to plot this in R so that I can show 3 files side by side in a single plot using ggplot2.
The output I want (plotted with Excel) is this
What I am getting with ggplot2 is this
The R code I am using is this:
A1 <- read.table("A1.txt", header = TRUE, sep = "\t")
library(ggplot2)
ggplot(A1, aes(x = count)) + geom_bar()
The data is a tab-delimited file like this:
length count
26 344776
27 289439
18 673395
28 338146
19 710702
20 928326
21 3491352
22 2724981
23 699007
24 726121
25 472509
The length values should only serve as labels on the x-axis, with the counts plotted on the y-axis.

Is this what you wanted?
ggplot(A1, aes(x = as.character(length), y=count)) + geom_bar(stat="identity")
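The question also asks for three files side by side in one plot. A minimal sketch of one way to do that, assuming the other two files share the same length/count layout: stack the data frames with a column that records the source file, then dodge the bars. The objects A2 and A3 and their counts are invented for illustration; in practice each would come from read.table() as in the question.

```r
library(ggplot2)

# A1 holds a few rows from the question; A2 and A3 are hypothetical
# stand-ins for the other two files (their counts are invented).
A1 <- data.frame(length = c(18, 19, 20, 21), count = c(673395, 710702, 928326, 3491352))
A2 <- data.frame(length = c(18, 19, 20, 21), count = c(500000, 650000, 900000, 3000000))
A3 <- data.frame(length = c(18, 19, 20, 21), count = c(700000, 720000, 850000, 3200000))

# Stack the three data frames and record which file each row came from.
all_files <- rbind(
  cbind(A1, file = "A1"),
  cbind(A2, file = "A2"),
  cbind(A3, file = "A3")
)

# factor(length) keeps the lengths as axis labels in numeric order;
# position = "dodge" places the bars of the three files next to each other.
p <- ggplot(all_files, aes(x = factor(length), y = count, fill = file)) +
  geom_col(position = "dodge") +
  labs(x = "length", y = "count")
p
```

Using factor(length) rather than as.character(length) also keeps the axis in numeric order if the lengths ever have a different number of digits.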

Related

box plots for two columns side by side using ggplot

I have a dataset in the following format
value1 value2 group
10 20 A
20 30 A
67 45 B
98 76 C
102 11 A
11 22 B
10 10 B
19 20 C
I am trying to make box plots for the three groups (A, B and C), with the box plots for the first and second columns side by side. I can make two separate plots as follows, but I cannot figure out how to combine them side by side.
p1 <- ggplot(x, aes(x=group, y=value1)) + geom_boxplot()
p2 <- ggplot(x, aes(x=group, y=value2)) + geom_boxplot()
I would appreciate any help. I am a newbie in R and ggplot.

Here's an option using pivot_longer from tidyr
x_new <- tidyr::pivot_longer(x, c(value1, value2))
ggplot(x_new, aes(x = group, y = value, col = name, fill = name)) + geom_boxplot(alpha = .5)

The gridExtra package can do this too. Assign your plots to variables then just use grid.arrange(plot1,plot2). Look up the documentation with ?grid.arrange for extra options.
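A minimal sketch of the grid.arrange approach, using the example data from the question. The ncol = 2 argument is what places the two plots in a single row.

```r
library(ggplot2)
library(gridExtra)

# The example data from the question.
x <- data.frame(
  value1 = c(10, 20, 67, 98, 102, 11, 10, 19),
  value2 = c(20, 30, 45, 76, 11, 22, 10, 20),
  group  = c("A", "A", "B", "C", "A", "B", "B", "C")
)

p1 <- ggplot(x, aes(x = group, y = value1)) + geom_boxplot()
p2 <- ggplot(x, aes(x = group, y = value2)) + geom_boxplot()

# ncol = 2 puts the two plots in one row, side by side.
grid.arrange(p1, p2, ncol = 2)
```

Each panel keeps its own y-axis here, whereas the pivot_longer approach draws both columns on a shared scale, so the choice depends on whether the two columns should be directly comparable.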

Error: Don't know how to add e2 to a plot

Hello, I am working on a data set that looks like this:
raw_data =
week v1 v3 v4 v5 v6
1 17 20.983819 7.799831 16.0600278 113.018687
2 34 22.651678 8.090671 16.4898951 120.824817
3 15 24.197048 6.892516 16.9805836 128.105372
4 14 26.016688 5.272781 17.471264 140.15794
5 26 27.572317 10.767018 17.8686156 154.886518
6 37 29.018684 21.280104 19.8096452 165.244061
7 27 30.395094 32.140543 22.937902 176.453934
8 24 31.832068 44.008145 28.714597 184.7598
9 16 33.383742 45.704626 39.2958153 193.461108
10 28 34.877819 39.355206 45.9069661 201.305558
What I am trying to achieve is to plot variables v3 to v6 as a stacked area plot and variable v1 as a line plot, both in the same graph, across the weeks.
I have tried the following code, which draws the stacked area plot but not the line plot.
mdf <- melt(raw_data, id="Week") # convert to long format
p <- ggplot(mdf, aes(x=Week, y=value)) + geom_area(aes(fill= mdf$variable), position = 'stack') + theme_classic()
p + ggplot(raw_data, aes(x=Week, y=v1)) +geom_line()
and I get the following error
Error: Don't know how to add e2 to a plot
I tried the method suggested in the question "How to overlay geom_bar and geom_line plots with different number of elements using ggplot2?" and used the code below:
mdf <- melt(raw_data, id="Week") # convert to long format
p <- ggplot(mdf, aes(x=Week, y=value)) + geom_area(aes(colour =
mdf$variable, fill= mdf$variable), position = 'stack') + theme_classic()
p + geom_line(aes(x=Week, y=mdf$variable=="v1"))
but then I got the below error
Error: Discrete value supplied to continuous scale
I tried to convert the v1 variable with the code below, following the question "How do I get discrete factor levels to be treated as continuous?", but it did not resolve the error.
raw_data$v1 <- as.numeric(as.character(raw_data$v1))
How can I resolve this? Also, how do I draw a black border line around each area in the stacked plot so that the areas are easy to tell apart?
Thanks a lot for the help in advance!

Using your melt command does not work for me, so I'm using gather instead.
All you need to do is add geom_line and specify the data and mapping:
mdf <- tidyr::gather(raw_data, variable, value, -week, -v1)
ggplot(mdf, aes(week, value)) +
geom_area(aes(fill = variable), position = 'stack', color = 'black') +
geom_line(aes(y = v1), raw_data, lty = 2)
Note: don't use $ inside aes, ever!
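Some background on the first error: `+` can add layers, scales and themes to an existing plot, but not a second ggplot() object, so the line has to be added as a geom_line() layer with its own data. Since tidyr now recommends pivot_longer() over gather() for new code, here is a hedged sketch of the same idea with pivot_longer(). It uses only the first four rows of the question's data, rounded, and the lowercase column name week as shown in the printed table.

```r
library(ggplot2)
library(tidyr)

# First rows of the data from the question (values rounded).
raw_data <- data.frame(
  week = 1:4,
  v1 = c(17, 34, 15, 14),
  v3 = c(20.98, 22.65, 24.20, 26.02),
  v4 = c(7.80, 8.09, 6.89, 5.27),
  v5 = c(16.06, 16.49, 16.98, 17.47),
  v6 = c(113.02, 120.82, 128.11, 140.16)
)

# Long format for the stacked areas only; week and v1 stay as columns.
mdf <- pivot_longer(raw_data, cols = v3:v6,
                    names_to = "variable", values_to = "value")

p <- ggplot(mdf, aes(x = week, y = value)) +
  # colour = "black" draws the border around each stacked area.
  geom_area(aes(fill = variable), position = "stack", colour = "black") +
  # The line layer gets its own data and y mapping; x = week is inherited.
  geom_line(data = raw_data, aes(y = v1), linetype = 2)
p
```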

Create a histogram filled using another variable in ggplot

I am working with a dataset that includes the age of some people. I am trying to create a histogram for the ages of the people with ggplot in which the colours of the bars of the histogram should depend on some predefined age intervals.
So for example imagine a dataset like this:
>X
Age Age2
10 Under 14
11 Under 14
10 Under 14
13 Under 14
20 Between 15 and 25
21 Between 15 and 25
35 Above 25
I have tried to do something like this:
ggplot(X, aes(x = Age)) + geom_histogram(aes(fill = Age2))
But it displays the following error message:
Error: StatBin requires a continuous x variable the x variable is discrete. Perhaps you want stat="count"?
What am I doing wrong?

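The error means that Age is not stored as a number (it is probably a factor or character column), so make sure it is numeric before plotting. Plotted with ggplot2, with the column names in lower case:
library(ggplot2)
age <- c(10, 11, 10, 13, 20, 21, 35)
age2 <- c(rep("Under 14", times = 4), rep("Between 15 and 25", times = 2), "Above 25")
# data.frame() keeps age numeric; cbind() would coerce both columns to character
X <- data.frame(age, age2)
X
names(X)
summary(X)
p <- ggplot(X, aes(x = age)) +
  geom_histogram(aes(fill = age2))
p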
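If the interval labels are not already in the data, they can be derived from the numeric ages with cut(). This is only a sketch: the break points 14 and 25 are an assumption read off the labels in the question, and binwidth = 1 is an arbitrary choice for such a small example.

```r
library(ggplot2)

X <- data.frame(age = c(10, 11, 10, 13, 20, 21, 35))

# Intervals are (-Inf, 14], (14, 25], (25, Inf]; the break points are an
# assumption based on the labels in the question.
X$age2 <- cut(X$age,
              breaks = c(-Inf, 14, 25, Inf),
              labels = c("Under 14", "Between 15 and 25", "Above 25"))

# age stays numeric for the x-axis; age2 only controls the fill colour.
p <- ggplot(X, aes(x = age, fill = age2)) +
  geom_histogram(binwidth = 1)
p
```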

Generating a histogram and density plot from binned data

I've binned some data and currently have a dataframe that consists of two columns, one that specifies a bin range and another that specifies the frequency like this:-
> head(data)
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16
I want to plot a histogram and a density plot from this, but I can't find a way of doing so without generating new bins. Based on a solution I found here, I tried the following:
p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")
but it crashes. Does anyone know how to deal with this?
Thank you

The problem is that ggplot doesn't understand the data the way you supply it, so you need to reshape it like so (I am not a regex master, so there are surely better ways to do this):
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the numbers, dropping the surrounding "(" and "]"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")
# split the data at the comma into two columns:
# one for the start-point and one for the end-point
df <- cSplit(df, "binRange")
# plot it; you don't actually need the second column.
# width is the bin width; each bar is centered on the start-point of its bin
ggplot(df, aes(x = binRange_1, y = Frequency)) +
  geom_bar(stat = "identity", width = 0.025)
Or, if you don't want the data to be interpreted numerically, you can simply do the following:
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")
You won't be able to draw a density plot from your data, given that it is not continuous but categorical. That is why I actually prefer the second way of showing it.

You can try
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()

Plots in R (ggplot2) for time series with multiple values per time?

Let's say I have data consisting of the time I leave the house and the number of minutes it takes me to get to work. I'll have some repeated values:
08:00, 20
08:04, 25
08:30, 40
08:20, 23
08:04, 22
Some times will repeat (like 08:04). What I want to do is run a scatter plot that is correctly scaled on the x-axis but allows these multiple values per entry, so that I can view the trend.
Is a time-series even what I want to be using? I've been able to plot a time series graph that has one value per time, and I've gotten multiple values plotted but without the time-series scaling. Can anyone suggest a good approach? Preference for ggplot2 but I'll take standard R plotting if it's easier.

First, let's prepare some more data:
set.seed(123)
df <- data.frame(Time = paste0("08:", sample(35:55, 40, replace = TRUE)),
Length = sample(20:50, 40, replace = TRUE),
stringsAsFactors = FALSE)
df <- df[order(df$Time), ]
df$Attempt <- unlist(sapply(rle(df$Time)$lengths, function(i) 1:i))
df$Time <- as.POSIXct(df$Time, format = "%H:%M") # parse as date-time so the time axis is scaled correctly
head(df)
Time Length Attempt
6 08:35 24 1
18 08:35 43 2
35 08:35 34 3
15 08:37 37 1
30 08:38 33 1
38 08:39 38 1
As I understand it, you want to preserve the order of the observations that share the same leaving time. At first I ignored that and got a scatter plot like this:
ggplot(data = df, aes(x = Length, y = Time)) +
geom_point(aes(size = Length, colour = Length)) +
geom_path(aes(group = Time, colour = Length), alpha = I(1/3)) +
scale_size(range = c(2, 7)) + theme(legend.position = 'none')
But with three dimensions (Time, Length and Attempt), a scatter plot can no longer show all the information. I hope I understood you correctly and this is what you are looking for:
ggplot(data = df, aes(y = Time, x = Attempt)) + geom_tile(aes(fill = Length))
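For the plain scatter plot described in the question, a minimal sketch with the five observations given there and the leaving time on a properly scaled x-axis. The column names departure and minutes are made up for this example, and the straight-line trend from geom_smooth() is just one assumed way to show the trend.

```r
library(ggplot2)

# The five observations from the question.
commute <- data.frame(
  departure = c("08:00", "08:04", "08:30", "08:20", "08:04"),
  minutes   = c(20, 25, 40, 23, 22)
)

# Parse the clock times; as.POSIXct fills in today's date, which is
# harmless here because only the time of day is shown on the axis.
commute$departure <- as.POSIXct(commute$departure, format = "%H:%M")

p <- ggplot(commute, aes(x = departure, y = minutes)) +
  geom_point() +                              # repeated times simply stack vertically
  geom_smooth(method = "lm", se = FALSE) +    # straight-line trend (assumed choice)
  scale_x_datetime(date_labels = "%H:%M") +
  labs(x = "time leaving the house", y = "minutes to work")
p
```

Because the x values are date-times rather than labels, the gap between 08:04 and 08:20 is drawn four times as wide as the gap between 08:00 and 08:04, and the two 08:04 observations appear as separate points at the same x position.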
