How to change bar position without affecting the results?


I'm new to R and I'm trying to change the position of a bar on a bar chart, but my results have changed too. Here is the chart:
[image: Chart of age]
When I use this code:
positions <- c("Moins de 18 ans","18 a 22 ans", "23 a 27 ans", "33 a 37 ans","38 ans et plus")
p + theme_classic() + scale_x_discrete(limits = positions)
This is the result I get:
[image: Chart of age 2]
and these warnings:
Warning messages:
1: Removed 86 rows containing non-finite values (stat_count).
2: Removed 86 rows containing non-finite values (stat_count).
I don't know what to do with this. Can someone please help me?

Since you haven't provided the data, I'll show how to rearrange the bars using dummy data. To reorder a bar chart, you essentially need to set the order of the variable you use on the x-axis, which in ggplot2 means the order of its factor levels.
library(dplyr) # for %>%
library(ggplot2)
vec = c(rep("a", 30), rep("b", 20), rep("c", 10))
df = as.data.frame(table(vec)) # Create dummy data frame
The data frame df looks like this:
vec Freq
1 a 30
2 b 20
3 c 10
The default plot is:
df %>%
ggplot(aes(x = vec, y = Freq)) +
geom_bar(stat = "identity") # default plot
Now suppose I want the bars in the order b, a, c. All I need to do is set the factor levels in that order (sorting the rows as well is optional):
df$vec = factor(df$vec, levels = c("b", "a", "c")) # assign levels in the order you want to see in the bar plot
df = df[order(df$vec),] # optional: ggplot2 orders a discrete axis by factor levels, not by row order
df %>%
ggplot(aes(x = vec, y = Freq)) +
geom_bar(stat = "identity") # barplot will be sorted on levels of factor now
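The output of the above code is:
I haven't done the rest of the formatting, but judging from your graphs you have that covered. Reordering factor levels doesn't drop any values, unlike scale_x_discrete(limits = ...), which treats every category not listed in limits as missing. So by following these steps, your counts shouldn't change when you reorder the bars. If you can share your data, I can check whether this solves your problem.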
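A minimal sketch of the likely cause of the "Removed 86 rows" warning, using hypothetical age labels since the real data weren't posted. The "28 a 32 ans" category is an assumption, inferred from the gap between "23 a 27 ans" and "33 a 37 ans"; a label that is spelled differently from the one in positions would have the same effect. setdiff() lists the categories that limits would throw away, and the fix is to put every category into the factor levels:

```r
library(ggplot2)

# Hypothetical age labels: the question's data were not posted.
# "28 a 32 ans" is assumed to occur in the data but not in positions.
ages <- data.frame(age = c("Moins de 18 ans", "18 a 22 ans", "23 a 27 ans",
                           "28 a 32 ans", "28 a 32 ans", "33 a 37 ans",
                           "38 ans et plus"))
positions <- c("Moins de 18 ans", "18 a 22 ans", "23 a 27 ans",
               "33 a 37 ans", "38 ans et plus")

# Categories present in the data but absent from positions: these are
# the rows that scale_x_discrete(limits = positions) would drop
missing_levels <- setdiff(unique(ages$age), positions)
print(missing_levels)

# Fix: list every category in the desired order and set it as the
# factor levels, instead of restricting the scale with limits
all_levels <- c("Moins de 18 ans", "18 a 22 ans", "23 a 27 ans",
                "28 a 32 ans", "33 a 37 ans", "38 ans et plus")
ages$age <- factor(ages$age, levels = all_levels)

p <- ggplot(ages, aes(x = age)) +
  geom_bar() +
  theme_classic()
```

Running setdiff(unique(your_data$age), positions) on the real data should show which labels are being dropped.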

Related

R: how to filter within aes()

As an R beginner, there's one hurdle that I just can't get past. I have a table showing the number of responses to a question by gender.
Response Gender  n
1        1      84
1        2      79
2        1      42
2        2      74
3        1      84
3        2      79
etc.
I want to plot these in a column chart: on the y-axis I want n (or its proportion), and on the x-axis I want two separate bars: one for gender 1 and one for gender 2. It should look like the following example that I was given:
[image: The example that I want to emulate]
However, when I try to filter the columns according to gender inside aes(), it returns an error! Could anyone tell me why my approach is not working? And is there another practical way to filter the columns of the table that I have?
ggplot(table) +
geom_col(aes(x = select(filter(table, gender == 1), Q),
y = select(filter(table, gender == 1), n),
fill = select(filter(table, gender == 2), n), position = "dodge")

Maybe something like this:
library(dplyr) # for %>%
library(RColorBrewer)
library(ggplot2)
# df is the table from the question
df %>%
ggplot(aes(x=factor(Response), y=n, fill=factor(Gender)))+
geom_col(position=position_dodge())+
scale_fill_brewer(palette = "Set1") +
theme_light()
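The question also mentions plotting proportions instead of raw counts. A possible sketch, using only the rows shown in the question's table (the elided "etc." rows are not included), computes each n as a share of its gender's total with base R's ave() before plotting:

```r
library(ggplot2)

# The rows shown in the question's table (further rows were elided there)
tab <- data.frame(Response = c(1, 1, 2, 2, 3, 3),
                  Gender   = c(1, 2, 1, 2, 1, 2),
                  n        = c(84, 79, 42, 74, 84, 79))

# Share of each response within its gender: n divided by the gender total
tab$prop <- tab$n / ave(tab$n, tab$Gender, FUN = sum)

p <- ggplot(tab, aes(x = factor(Response), y = prop, fill = factor(Gender))) +
  geom_col(position = position_dodge()) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Response", y = "Share of gender", fill = "Gender") +
  theme_light()
```

Dividing by the gender total makes the two genders comparable even though they have different numbers of respondents.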

Your approach does not work because you are assigning the x and y variables as if they came from two different datasets (one for x and one for y). In line with TarJae's solution, think in terms of the axes of the chart: map the categorical variable you are comparing to the x-axis, and the numeric variable that determines the height of the bars to the y-axis. Finally, to compare groups by colour, map your grouping variable to an aesthetic such as fill (used here), so that each group gets a different colour.
library(dplyr) ## For piping
library(ggplot2) ## For plotting
df %>%
ggplot(aes(x = Response, y = n, fill = as.character(Gender))) +
geom_bar(stat = "identity", position = "dodge")
I use stat = "identity" because by default geom_bar() counts the occurrences in your data (i.e. it assumes your data have not been aggregated yet); geom_col() is a shorthand for geom_bar(stat = "identity"). I use position = "dodge" so that the bars are placed side by side instead of being stacked. For more information, I recommend this resource: https://r4ds.had.co.nz/index.html
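On the question's second point (filtering): aes() expects expressions that evaluate to one value per row of the layer's data, so passing whole data frames (which is what select() returns) fails. A sketch of the alternative is to filter the rows first and hand the subset to ggplot as data. Base R's subset() is used here; dplyr::filter() would work the same way. The data are the rows shown in the question's table:

```r
library(ggplot2)

tab <- data.frame(Response = c(1, 1, 2, 2, 3, 3),
                  Gender   = c(1, 2, 1, 2, 1, 2),
                  n        = c(84, 79, 42, 74, 84, 79))

# Filter the rows first; aes() then only refers to column names
gender1 <- subset(tab, Gender == 1)

p <- ggplot(gender1, aes(x = factor(Response), y = n)) +
  geom_col() +
  labs(x = "Response", title = "Gender 1 only")
```

Each layer also accepts its own data argument (for example geom_col(data = subset(tab, Gender == 2), ...)), but for two genders side by side the single dodged layer shown above is simpler.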

ggplot2 - How to plot length of time using geom_bar?

I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph that looks like this:
which was taken from an answer to this question. Note that the dates are in Julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header = TRUE))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the x-axis are summed and end up too high, outside the 1-365 range that represents the day of the year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such graph?

I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr) # provides the %>% pipe
plotdata <- mydat %>%
dplyr::mutate(Crop = factor(Crop)) %>%
tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>%
tidyr::separate(period, c("Type","Event")) %>%
tidyr::pivot_wider(names_from=Event, values_from=value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that there is one row per rectangle we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The slightly messy part is that we want a discrete scale for y, but geom_rect needs numeric positions. Since Crop is now a factor, we use its integer codes to create the ymin and ymax positions, and then replace the y-axis labels with the names of the factor levels.
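The factor-to-number mapping can be checked on its own. Factor levels default to alphabetical order, so Corn gets code 1 and Soybean gets code 2; those codes are the y centres the rectangles are drawn around, and the same levels supply the breaks and labels for scale_y_continuous():

```r
crop <- factor(c("Soybean", "Corn", "Soybean", "Corn"))

# Integer codes of the factor: these are the y centres of the rectangles
codes <- as.numeric(crop)

# Breaks and labels for scale_y_continuous(), built from the levels
breaks <- seq_along(levels(crop))
labels <- levels(crop)

# ymin/ymax as in the answer: each bar is 0.9 units tall
print(data.frame(crop, codes, ymin = codes - 0.45, ymax = codes + 0.45))
```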
If you also want month names on the x-axis, you could do something like:
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"), by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
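One thing the plot above does not handle is that some periods cross the new year: for example, South/Soybean planting runs from day 245 to day 1, and geom_rect() would simply draw that rectangle from 1 to 245. A minimal sketch, assuming that Begin > End always means "wraps past 31 December" and using 365 as the last day (both assumptions), splits such rows into two rectangles before plotting. The helper split_wraparound() is hypothetical and works on data in the shape of plotdata:

```r
# Hypothetical helper: split intervals that cross the end of the year.
# Assumption: Begin > End means the period wraps past day `year_end`.
split_wraparound <- function(d, year_end = 365) {
  wraps <- d$Begin > d$End
  # Part 1: from Begin to the end of the year
  tail_part <- d[wraps, ]
  tail_part$End <- year_end
  # Part 2: from day 1 to the original End
  head_part <- d[wraps, ]
  head_part$Begin <- 1
  rbind(d[!wraps, ], tail_part, head_part)
}

# A few rows in the shape of plotdata (values taken from the question)
plotdata <- data.frame(
  Region = c("South", "South", "Center-West"),
  Crop   = factor(c("Soybean", "Soybean", "Soybean")),
  Type   = c("Planting", "Harvest", "Planting"),
  Begin  = c(245, 1, 245),
  End    = c(1, 122, 275)
)

fixed <- split_wraparound(plotdata)
print(fixed)
```

The result can be passed to the same ggplot(...) + geom_rect(...) call in place of plotdata.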

plotting the count of x-value in one col in df in r

I'm struggling a bit and can't find a solution (it should be very simple).
My data frame looks like this:
df
Ch1 V1 V2 ID
A a1 a2 1
B b1 b2 2
C a1 b2 1
D d1 d2 3
...
In total, ID takes values from 1 to 9.
I simply want to plot how often each value (1, 2, 3, ..., 9) occurs in this data frame. My code is:
ggplot(df, aes(ID)) + # I read that leaving out y makes ggplot count the occurrences, which is true
geom_bar()
This works, but unfortunately this is the result I get:
I want all values to be displayed, though.
I tried to modify this with scale_x_continuous, but it didn't work (the whole x-axis disappeared and only 1 was displayed).
I know I can also create a table with table(df),
but I want a general solution, because later I want to apply this to plots with several bars per x-axis value, depending on V1 or V2.
Thank you very much for your help!

According to the OP, the intention is to create
several bars per x-axis value with dependency on V1 or V2
This can be solved either by using fill = V1 and position = "dodge", as already suggested by H 1, or by faceting. Both approaches have their merits, depending on which aspect the OP wants to focus on.
Note that in all variants ID is turned into a discrete variable with factor(), which solves the issue with labeling the x-axis; the default axis title is then overridden with xlab("ID").
Dodged position
library(ggplot2)
ggplot(df) +
aes(x = factor(ID), fill = V1) +
geom_bar(position = "dodge") +
xlab("ID")
This is good if the focus is on comparing the differences between V1 within each ID value.
Facets
library(ggplot2)
ggplot(df) +
aes(x = factor(ID), fill = V1) +
geom_bar() +
xlab("ID") +
facet_wrap(~ V1, nrow = 1L)
Here, the focus is on comparing the distribution of ID counts within each V1.
Colouring the bars in addition to faceting is redundant (but I find it aesthetically more pleasing as compared to all-black bars).
Data
As no reproducible data were supplied in the question, I simulated the data as follows:
nr <- 1000L
set.seed(123L) # required to reproduce the data
df <- data.frame(Ch1 = sample(LETTERS[1:4], nr, TRUE),
V1 = paste0(sample(letters[1:4], nr, TRUE), "1"),
V2 = paste0(sample(letters[1:4], nr, TRUE), "2"),
ID = pmin(1L + rgeom(nr, 0.3), 9L)
)
"Raw" plot for comparison with OP's chart
library(ggplot2)
ggplot(df) +
aes(x = ID) +
geom_bar()
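If some ID value never occurs at all (say no row has ID 7), factor(ID) alone leaves it off the axis. A sketch, assuming the full range is 1 to 9 as the question states and using hypothetical IDs in which 7 is absent: declare all nine levels explicitly and tell the scale not to drop unused ones with scale_x_discrete(drop = FALSE):

```r
library(ggplot2)

# Hypothetical IDs in which 7 never occurs
df <- data.frame(ID = c(1, 1, 2, 3, 3, 3, 4, 5, 6, 8, 9, 9))

# Declare all nine possible values as levels, even the unused ones
df$ID <- factor(df$ID, levels = 1:9)

# table() on the factor reports zero counts for unused levels
counts <- table(df$ID)
print(counts)

p <- ggplot(df, aes(x = ID)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)  # keep the empty level 7 on the axis
```

The same factor(..., levels = 1:9) step carries over unchanged to the dodged and faceted variants above.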

grouped barplot: order x-axis & keep constant bar width, in case of missing levels

Here is my script (the example is inspired by here, and the reorder option comes from here):
library(ggplot2)
Animals <- read.table(
header=TRUE, text='Category Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 10
8 Improved Unclear 25
9 Improved Bla 10
10 Decline Hello 30')
fig <- ggplot(Animals, aes(x=reorder(Animals$Reason, -Animals$Species), y=Species, fill = Category)) +
geom_bar(stat="identity", position = "dodge")
This gives the following output plot:
What I would like is to order my bar plot on the 'Decline' condition only, so that the 'Improved' bars are not inserted in the middle. Here is what I would like to get (after some SVG editing):
So now the whole 'Decline' condition is sorted, and the 'Improved' condition comes after it. Also, ideally all bars would have the same width, even when a condition is not represented for a value (e.g. "Bla" has no "Decline" value).
Any idea on how I could do that without having to play with SVG editors? Many thanks!

First, let's fill in the missing combinations in your data.frame:
library(dplyr)
Animals2 <- expand.grid(Category=unique(Animals$Category), Reason=unique(Animals$Reason)) %>% data.frame %>% left_join(Animals)
Then you can create an ordering variable for the x-scale:
myorder <- Animals2 %>% filter(Category == "Decline") %>% arrange(desc(Species)) %>% pull(Reason) %>% as.character()
And then plot:
ggplot(Animals2, aes(x=Reason, y=Species, fill = Category)) +
geom_bar(stat="identity", position = "dodge") + scale_x_discrete(limits=myorder)

Define a new data frame with all combinations of "Category" and "Reason", merge in the "Species" values from the data frame "Animals", and adapt the ggplot call with the correct scale_x_discrete:
Animals3 <- expand.grid(Category=unique(Animals$Category),Reason=unique(Animals$Reason))
Animals3 <- merge(Animals3,Animals,by=c("Category","Reason"),all.x=TRUE)
Animals3[is.na(Animals3)] <- 0
Animals3 <- Animals3[order(Animals3$Category,-Animals3$Species),]
ggplot(Animals3, aes(x = Reason, y = Species, fill = Category)) +
geom_bar(stat = "identity", position = "dodge") +
scale_x_discrete(limits = as.character(Animals3[Animals3$Category == "Decline", "Reason"]))

As is usual with ggplot, I would achieve this by adjusting the data frame: add the missing categories with a value of zero.
Animals <- rbind(Animals,
data.frame(Category = c("Improved", "Decline"),
Reason = c("Hello", "Bla"),
Species = c(0,0)
)
)

Along the same lines as the answer from user Alex, a less manual way of adding the missing categories might be:
d <- with(Animals, expand.grid(unique(Category), unique(Reason)))
names(d) <- names(Animals)[1:2]
Animals <- merge(d, Animals, all.x=TRUE)
Animals$Species[is.na(Animals$Species)] <- 0
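An alternative sketch that avoids adding filler rows, assuming ggplot2 3.0.0 or later for the preserve argument. The factor levels are set to the Decline order (descending Species) followed by the reasons that only have Improved values, and position_dodge(preserve = "single") keeps every bar the same width even where a reason has only one category:

```r
library(ggplot2)

Animals <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Category Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 10
8 Improved Unclear 25
9 Improved Bla 10
10 Decline Hello 30')

# Reasons ordered by their Decline value, largest first
decline <- Animals[Animals$Category == "Decline", ]
decline_order <- decline$Reason[order(-decline$Species)]

# Reasons with no Decline row (here only "Bla") go at the end
improved_only <- setdiff(unique(Animals$Reason), decline_order)

Animals$Reason <- factor(Animals$Reason,
                         levels = c(decline_order, improved_only))

# preserve = "single" gives every bar the width of one dodged bar
p <- ggplot(Animals, aes(x = Reason, y = Species, fill = Category)) +
  geom_col(position = position_dodge(preserve = "single"))
```

Compared with padding the data with zeros, this leaves an empty slot rather than a zero-height bar where a category is missing.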

R Setting Y Axis to Count Distinct in ggplot2

I have a data frame that contains 4 variables: an ID number (chr), a degree type (factor w/ 2 levels of Grad and Undergrad), a degree year (chr with year), and Employment Record Type (factor w/ 6 levels).
I would like to display this data as a stacked area plot of the number of unique IDs per year, split by the 6 Employment Record Types: the count of IDs on the y-axis, degree year on the x-axis, and the fill showing the Record Type. I am using ggplot2 in RStudio.
I used the following code, but the y axis does not count distinct IDs:
ggplot(AlumJobStatusCopy, aes(x=Degree.Year, y=Entity.ID,
fill=Employment.Data.Type)) + geom_freqpoly() +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
I also tried changing y = Entity.ID to y = ..count.., and that did not work either. I have searched for solutions, as it seems to be a problem with how I am writing the aes() code.
I also tried the following code based on examples of similar plots:
ggplot(AlumJobStatusCopy, aes(interval)) +
geom_area(aes(x=Degree.Year, y = Entity.ID,
fill = Employment.Data.Type)) +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
This does not even seem to work. I've read the documentation and am at my wit's end.
EDIT:
After figuring out the answer to the problem, I realized that I was not actually using the correct values for my Year variable. A count tells me nothing, as I am trying to show the rise in missing records and the decline in current records.
My Dataset:
Year, int, 1960-2015
Current Record, num: % of total records that are current
No Record, num: % of total records that are not current
So each Year value has two corresponding percentages. I am now using two lines instead of an area plot, since the y-axis shows distinct values rather than a count, but I would still like the area under the curves filled. I tried using melt() to convert the data from wide to long, but I still couldn't fill under both lines. The fill is purely aesthetic: I'd like a gradient for each, with one fill slightly lighter than the other.
Here is my current code:
library(ggplot2)
library(scales) # provides percent()
ggplot(Alum, aes(Year)) +
geom_line(aes(y = Percent.Records, colour = "Percent.Records")) +
geom_line(aes(y = Percent.No.Records, colour = "Percent.No.Records")) +
scale_y_continuous(labels = percent) + ylab('Percent of Total Records') +
ggtitle("Active, Living Alumni Employment Record") +
scale_x_continuous(breaks=seq(1960, 2014, by=5))
I cannot post an image yet.
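A possible sketch for filling under both lines, using hypothetical values in the shape described in the edit (Year plus two percentage columns stored as fractions). Once the data are reshaped to long format, geom_area(position = "identity") overlays the two areas instead of stacking them, and alpha keeps the lower one visible. The fills here are flat colours, one darker and one lighter shade, rather than true gradients:

```r
library(ggplot2)
library(scales)  # provides percent()

# Hypothetical wide data in the shape described in the question
Alum <- data.frame(Year = 1960:1964,
                   Percent.Records = c(0.9, 0.85, 0.8, 0.7, 0.6))
Alum$Percent.No.Records <- 1 - Alum$Percent.Records

# Reshape wide -> long with base R: one row per (Year, Series) pair
long <- data.frame(
  Year    = rep(Alum$Year, 2),
  Series  = rep(c("Percent.Records", "Percent.No.Records"), each = nrow(Alum)),
  Percent = c(Alum$Percent.Records, Alum$Percent.No.Records)
)

shades <- c(Percent.Records = "steelblue4", Percent.No.Records = "steelblue1")

# position = "identity" overlays the areas; the default would stack them
p <- ggplot(long, aes(x = Year, y = Percent, fill = Series, colour = Series)) +
  geom_area(position = "identity", alpha = 0.4) +
  scale_fill_manual(values = shades) +
  scale_colour_manual(values = shades) +
  scale_y_continuous(labels = percent) +
  ylab("Percent of Total Records")
```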

I think you're missing a step where you summarize the data to get the quantities to plot on the y-axis. Here's an example with some toy data similar to how you describe yours:
# Make toy data with three levels of employment type
set.seed(1)
df <- data.frame(Entity.ID = rep(LETTERS[1:10], 3), Degree.Year = rep(seq(1990, 1992), each=10),
Degree.Type = sample(c("grad", "undergrad"), 30, replace=TRUE),
Employment.Data.Type = sample(as.character(1:3), 30, replace=TRUE))
# Here's the part you're missing, where you summarize for plotting
library(dplyr)
dfsum <- df %>%
group_by(Degree.Year, Employment.Data.Type) %>%
tally()
# Now plot that, using the sums as your y values
library(ggplot2)
ggplot(dfsum, aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
The result could use some fine-tuning, but I think it's what you mean. Here the bars are of equal height because each year in the toy data includes the same number of IDs; if the count of IDs varied, so would the total bar height.
If you don't want to add objects to your workspace, just do the summing in the call to ggplot():
ggplot(tally(group_by(df, Degree.Year, Employment.Data.Type)),
aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
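tally() counts rows, which equals the number of distinct IDs only when each ID appears once per group, as in the toy data above. If an ID can have several records per year and type, dplyr's n_distinct() counts it once. A sketch with hypothetical toy data in which some IDs repeat:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: ID "A" and ID "C" have repeated records
df <- data.frame(Entity.ID            = c("A", "A", "B", "C", "C", "C"),
                 Degree.Year          = c(1990, 1990, 1990, 1991, 1991, 1991),
                 Employment.Data.Type = c("1", "1", "2", "1", "1", "2"))

# rows = plain row count (what tally() gives);
# ids  = distinct IDs within each year and record type
rows_vs_ids <- df %>%
  group_by(Degree.Year, Employment.Data.Type) %>%
  summarise(rows = n(), ids = n_distinct(Entity.ID), .groups = "drop")
print(rows_vs_ids)

p <- ggplot(rows_vs_ids, aes(x = Degree.Year, y = ids, fill = Employment.Data.Type)) +
  geom_col() +
  labs(fill = "Employment")
```

An ID with records of two different types in the same year is still counted once in each type, so the stacked total can exceed the number of distinct IDs for that year.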
