Plot categorical data as histogram/bar in R?

I am new to R and have been trying for a few days to plot a histogram / bar chart to view the trend. I have a categorical variable, countryx, which I coded as 1, 2, 3.
I have tried the 2 scripts below and got the following results:
Output 1: blank chart with x and y axis, no stack/bar trend
qplot(DI$countryx,geom = "histogram",ylab = "count",
xlab = "countryx",binwidth=5,colour=I("blue"),fill=I("wheat"))
Output 2: error message- ggplot2 doesn't know how to deal with data of class integer
ggplot(DI$countryX, aes(x=countryx))
+ geom_bar(aes(y=count), stat = "count",position ="stack",...,
width =5,aes=true)
I would appreciate any advice.
Thank you very much for your help!

There are multiple problems with your code. ggplot() takes a data frame as its first argument, not a vector, but you're supplying a vector (DI$countryX). Try this, assuming DI also has a count column:
ggplot(DI, aes(x=countryx, y = count)) + geom_col(width = 5)

As @yeedle mentioned, you need a data.frame (maybe use as.data.frame()).
How about:
library(ggplot2)
df <- data.frame(countryx = rep(1:3), count = rbinom(3,10,0.3))
p <- ggplot2::ggplot(df, aes(x = countryx, y = count)) + ylab("count")
p + geom_col(aes(x = countryx, fill = factor(countryx)))
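Both answers above assume a count column already exists. If DI only holds one row per observation, a minimal sketch (the DI values below are a made-up stand-in, not the asker's data) is to convert the codes to a factor and let geom_bar() do the counting:

```r
library(ggplot2)

# Hypothetical stand-in for the asker's DI: one row per observation,
# countryx coded as 1, 2, 3 (values invented for illustration)
DI <- data.frame(countryx = c(1, 2, 2, 3, 3, 3))

# Treat the codes as categories rather than numbers
DI$countryx <- factor(DI$countryx)

# geom_bar() uses stat = "count" by default, so no y aesthetic is needed
p <- ggplot(DI, aes(x = countryx)) +
  geom_bar(colour = "blue", fill = "wheat") +
  labs(x = "countryx", y = "count")

# The same counts, computed directly for checking
counts <- table(DI$countryx)
print(counts)
```

Printing p draws one bar per category. Converting to a factor also avoids the binwidth question entirely, since a discrete axis has no bins.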

Related

Bar plot X axis not in numerical order

I'm trying to plot a bar graph using ggplot2. The x values are different ranges (so x1 = 0-10, x2 = 11-20, x3 = 21-30, ... up to 91-100, and the last range is ">100"). When I plot my graph with the corresponding y values as follows:
ggplot(data=figure1_data, aes(x=Average.Coverage.of.Study, y=Number.of.Studies)) + geom_bar(stat = "Identity")
The ">100" x value comes first in the plot and not at the end, which is where I want it. How would I get it to come after the 91-100 range? Can someone please help? I am very new to R. Much appreciated! :)
Edited:
You need to order levels by your preference.
df <- data.frame(avg = c("0-10","11-20","21-30","91-100",">100"),
studies = c(1:5))
ggplot(df)+
geom_bar(aes(x = ordered(avg, levels = c(avg)), y = studies), stat = "identity")
or
ggplot(df)+
geom_bar(aes(x = ordered(avg, levels = c("0-10","11-20","21-30","91-100",">100")), y = studies), stat = "identity")
Both will give you the same result.
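The first variant above only works because the rows of df already happen to be in the desired order. A sketch of the more general case, with made-up rows stored in alphabetical order and the level order set once on the column itself:

```r
library(ggplot2)

# Illustrative data, deliberately stored with ">100" first
df <- data.frame(
  avg     = c(">100", "0-10", "11-20", "21-30", "91-100"),
  studies = c(5, 1, 2, 3, 4)
)

# The desired left-to-right order of the x axis
range_order <- c("0-10", "11-20", "21-30", "91-100", ">100")

# Fix the order once in the data; every later plot picks it up
df$avg <- factor(df$avg, levels = range_order)

p <- ggplot(df, aes(x = avg, y = studies)) +
  geom_col()
```

Setting the levels in the data frame rather than inside aes() keeps the ordering in one place if several plots use the same column.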

How can I manually add labels to multiple ggplot2 mappings created through a for-loop?

I have been working on plotting several lines according to different probability levels and am stuck adding labels to each line to represent the probability level.
Since each curve plotted has varying x and y coordinates, I cannot simply have a large data-frame on which to perform usual ggplot2 functions.
The end goal is to have each line with a label next to it according to the p-level.
What I have tried:
To access the data comfortably, I have created a list df with, for example, 5 elements, each element containing an n x 2 data frame with the x-coordinates in column 1 and the y-coordinates in column 2. To plot each curve, I use a for loop where at each iteration (i in 1:5) I extract the x and y coordinates from the list and add the p-level line to the plot by:
plot = plot +
geom_line(data=df[[i]],aes(x=x.coor, y=y.coor),color = vector_of_colors[i])
where vector_of_colors contains varying colors.
I have looked at using ggrepel and its geom_label_repel() or geom_text_repel() functions, but being unfamiliar with ggplot2 I could not get it to work. Below is a simplification of my code so that it may be reproducible. I could not include an image of the actual curves I am trying to add labels to since I do not have 10 reputation.
# CREATION OF DATA
plevel0.5 = cbind(c(0,1),c(0,1))
colnames(plevel0.5) = c("x","y")
plevel0.8 = cbind(c(0.5,3),c(0.5,1.5))
colnames(plevel0.8) = c("x","y")
data = list(data1 = as.data.frame(plevel0.5), data2 = as.data.frame(plevel0.8))
# CREATION OF PLOT
plot = ggplot()
for (i in 1:2) {
plot = plot + geom_line(data=data[[i]],mapping=aes(x=x,y=y))
}
Thank you in advance and let me know what needs to be clarified.
EDIT :
I have now attempted the following :
Using bind_rows(), I have created a single dataframe with columns x.coor and y.coor as well as a column called "groups" detailing the p-level of each coordinate.
This is what I have tried:
plot = ggplot(data) +
geom_line(aes(coors.x,coors.y,group=groups,color=groups)) +
geom_text_repel(aes(label=groups))
But it gives me the following error:
geom_text_repel requires the following missing aesthetics: x and y
I do not know how to specify x and y in the correct way since I thought it did this automatically. Any tips?
Your approach is probably a bit too complicated. As far as I understand it, you could work with one dataset and use the group aesthetic to get the same result you are trying to achieve with your for loop and multiple geom_line calls. To this end I use dplyr::bind_rows to bind your datasets together. Whether ggrepel is needed depends on your real dataset. In my code below I simply use geom_text to add a label at the rightmost point of each line:
plevel0.5 <- data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 <- data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
library(dplyr)
library(ggplot2)
data <- list(data1 = plevel0.5, data2 = plevel0.8) |>
bind_rows(.id = "id")
ggplot(data, aes(x = x, y = y, group = id)) +
geom_line(aes(color = id)) +
geom_text(data = ~ group_by(.x, id) |> filter(x %in% max(x)), aes(label = id), vjust = -.5, hjust = .5)
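The "missing aesthetics: x and y" error in the question's edit arises because aesthetics given inside geom_line() apply to that layer only; other layers inherit only what is mapped in ggplot(). A sketch of that fix without dplyr, using a made-up long-format data frame with the question's column names (x.coor, y.coor, groups):

```r
library(ggplot2)

# Hypothetical long-format data standing in for the bind_rows() result
curves <- data.frame(
  x.coor = c(0, 1, 0.5, 3),
  y.coor = c(0, 1, 0.5, 1.5),
  groups = c("p = 0.5", "p = 0.5", "p = 0.8", "p = 0.8")
)

# One row per curve: the point with the largest x, used as the label anchor
is_last <- curves$x.coor == ave(curves$x.coor, curves$groups, FUN = max)
label_data <- curves[is_last, ]

# Aesthetics mapped in ggplot() are inherited by every layer
p <- ggplot(curves, aes(x = x.coor, y = y.coor, group = groups, color = groups)) +
  geom_line() +
  geom_text(data = label_data, aes(label = groups),
            vjust = -0.5, show.legend = FALSE)
```

If labels overlap in the real data, ggrepel::geom_text_repel() uses the same x, y and label aesthetics and should be a drop-in replacement for the geom_text() layer here.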

Histogram with discontinuous x-axis

I need to create a histogram in R. I have added a picture to show the desired result. I tried both ggplot2 and the base function hist. I used this code (ggplot) to get the basic histogram, but I would like to set the x-axis as shown in the figure (exactly the same values). Can someone tell me how to do that?
My input file DataLig2 contains a list of objects, each associated with a value (N..of.similar..Glob.Sum...0.83..ligandable.pockets). I need to plot the frequencies of all the reported values. The lowest value is 1 and the highest is 28. There are no values from 16 to 27, so I would like to skip this range in my plot.
Example of input file:
Object;N..of.similar..Glob.Sum...0.83..ligandable.pockets
1b47_A_001;3
4re2_B_003;1
657w_H_004_13
1gtr_A_003;28
...
my script:
ggplot(dataLig2, aes(dataLig2$N..of.similar..Glob.Sum...0.83..ligandable.pockets, fill = group)) + geom_histogram(color="black") +
scale_fill_manual(values = c("1-5" = "olivedrab1",
"6-10" = "limegreen",
"11-28" = "green4"))
Can you also suggest a script with the hist base function to get the same graph (with spaced bars as in the figure shown)? Thank you!
Using ggplot, convert x to a factor, label the missing numbers as "...", and tell the scale to keep unused levels. See this example:
library(ggplot2)
# reproducible example data
# where 8 and 9 are missing
set.seed(1); d <- data.frame(x = sample(c(1:7, 10), 100, replace = TRUE))
# add missing 8 and 9 as labels
d$x1 <- factor(d$x, levels = 1:10, labels = c(1:7, "...", "...", 10))
#compare
cowplot::plot_grid(
ggplot(d, aes(x)) +
geom_bar() +
ggtitle("before") +
scale_x_continuous(breaks = 1:10),
ggplot(d, aes(x = x1)) +
geom_bar() +
scale_x_discrete(drop = FALSE) +
ggtitle("after"))
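The question also asks for a base R version. hist() works on continuous bins, so one possible sketch is to tabulate a factor and use barplot() instead. It reuses the toy data above (8 and 9 missing) and collapses the gap into a single "..." placeholder level:

```r
# Same toy data as above: values 1-7 and 10, with 8 and 9 absent
set.seed(1)
x <- sample(c(1:7, 10), 100, replace = TRUE)

# Factor whose levels include one "..." placeholder for the skipped range
x_f <- factor(x, levels = c(1:7, "...", 10))

# table() keeps empty levels, so "..." appears with a count of zero
counts <- table(x_f)

# space controls the gap between bars (as a fraction of bar width)
barplot(counts, space = 0.2, xlab = "value", ylab = "frequency")
```

For the real data the levels would presumably be c(1:15, "...", 28); that is an assumption based on the range described in the question.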

ggplot with three variables x~axis in r

I am currently trying to create a ggplot with three variables in R that compares H-1B support (y-axis, variable h1bvis.supp) and implicit bias (x-axis, variable impl.prejud) by gender. I have tried to create the plot with the following code:
ggplot(data = immigrant) + geom_histogram(mapping = aes(x = impl.prejud, y = h1bvis.supp))
It is not working and I don't know why.
The dataset is this one:
immigrant <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/immig.csv")
Is this what you need?
immigrant <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/immig.csv")
ggplot(immigrant, aes(x = impl.prejud, y = h1bvis.supp)) + geom_col()
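The original call fails because geom_histogram() bins x and computes the bar heights itself, so it does not accept a separate y variable. Note also that geom_col() stacks every row sharing an x value, so the bar heights are sums. If an average per group is wanted, one sketch is to summarise first. The data below is simulated; only the column names come from the question, and the value ranges are assumptions:

```r
library(ggplot2)

# Simulated stand-in data; values are made up for illustration
set.seed(42)
toy <- data.frame(
  impl.prejud = runif(200),
  h1bvis.supp = sample(c(0, 0.25, 0.5, 0.75, 1), 200, replace = TRUE)
)

# One variable: the histogram computes the counts on its own
p_hist <- ggplot(toy, aes(x = impl.prejud)) +
  geom_histogram(bins = 20)

# Two variables: average implicit bias within each level of support,
# then draw one bar per level
means <- aggregate(impl.prejud ~ h1bvis.supp, data = toy, FUN = mean)
p_bar <- ggplot(means, aes(x = factor(h1bvis.supp), y = impl.prejud)) +
  geom_col()
```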

How to make a histogram from a matrix in R

I'm having trouble constructing a histogram from a matrix in R.
The matrix contains 3 treatments (lambda0.001, lambda0.002, lambda0.005) for 4 populations (rec1, rec2, rec3, con1). The matrix is:
     lambda0.001 lambda0.002 lambda.003
rec1   1.0881688   1.1890554  1.3653264
rec2   1.0119031   1.0687678  1.1751051
rec3   0.9540271   0.9540271  0.9540271
con1   0.8053506   0.8086985  0.8272758
My goal is to plot a histogram with lambda on the Y axis and four groups of three treatments on the X axis. The four groups should be separated from each other by a small gap.
I need help; it doesn't matter whether it's in ggplot2 or just a regular plot (base R).
Thanks a lot!
I agree with docendo discimus that a barplot may be what you're looking for. Based on what you're asking, though, I would first reshape your data to make it a little easier to work with; you can then still get it done with stat = "identity":
library(tidyr)    # gather() comes from tidyr, not dplyr
library(ggplot2)
# b is assumed to be the matrix from the question
# convert from matrix to data frame and preserve row names as a column
b <- data.frame(population = row.names(b), as.data.frame(b), row.names = NULL)
# gather into a tidy (long) format for ease of use in ggplot2
b <- gather(b, lambda, value, -1)
# plot 1 as described in the question: treatments side by side within each population
ggplot(b, aes(x = population, y = value)) + geom_bar(aes(fill = lambda), stat = "identity", position = "dodge")
# plot 2 using facets to separate as an alternative
ggplot(b, aes(x = population, y = value)) + geom_bar(stat = "identity") + facet_grid(. ~ lambda)
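Since the question says base R is also acceptable, here is a sketch with barplot(), which accepts a matrix directly. The matrix is rebuilt from the values printed in the question (column names copied as shown there); with beside = TRUE each column of the input becomes one group of bars, so the matrix is transposed to group by population:

```r
# Matrix rebuilt from the values shown in the question
m <- matrix(
  c(1.0881688, 1.1890554, 1.3653264,
    1.0119031, 1.0687678, 1.1751051,
    0.9540271, 0.9540271, 0.9540271,
    0.8053506, 0.8086985, 0.8272758),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("rec1", "rec2", "rec3", "con1"),
                  c("lambda0.001", "lambda0.002", "lambda.003"))
)

# t(m) has treatments in rows and populations in columns, so each
# population becomes a group of three bars with a gap between groups
bar_pos <- barplot(t(m), beside = TRUE, legend.text = TRUE,
                   ylab = "lambda", args.legend = list(x = "topright"))
```

barplot() returns the bar midpoints (here a 3 x 4 matrix), which can be handy for adding labels or error bars afterwards.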
