R ggplot ordering bars within groups - r

I'm attempting to format a grouped bar plot in R with ggplot such that bars are in decreasing order per group. This is my current plot:
based on this data frame:
> top_categories
Category Count Community
1 Singer-Songwriters 151 1
2 Adult Alternative 147 1
3 Dance Pop 95 1
4 Folk 89 1
5 Adult Contemporary 88 1
6 Pop Rap 473 2
7 Gangsta & Hardcore 413 2
8 Soul 175 2
9 East Coast 170 2
10 West Coast 135 2
11 Album-Oriented Rock (AOR) 253 3
12 Singer-Songwriters 217 3
13 Soft Rock 196 3
14 Folk 145 3
15 Adult Contemporary 106 3
16 Soul 278 4
17 Blues 137 4
18 Funk 119 4
19 Quiet Storm 76 4
20 Dance Pop 74 4
21 Indie & Lo-Fi 235 5
22 Indie Rock 234 5
23 Adult Alternative 114 5
24 Alternative Rock 49 5
25 Singer-Songwriters 47 5
created with this code:
ggplot(
top_categories,
aes(
x=Community,
y=Count,
group=Category,
label=Category
)
) +
geom_bar(
stat="identity",
color="black",
fill="#9C27B0",
position="dodge"
) +
geom_text(
angle=90,
position=position_dodge(width=0.9),
hjust=-0.05
) +
ggtitle("Number of Products in each Category in Each Community") +
guides(fill=FALSE)
Based on suggestions from related posts, I've attempted to use the reorder function and turn the Count into a factor, both with results that seem to break the ordering of the bars vs. the text or rescale the plot in a nonsensical way such as this (with factors):
Any tips on how I might accomplish this in-group ordering? Thanks!

When you group by Category, the bars are ordered according to the order of appearance of the Categories in the dataframe. This works fine for Communities 1 and 2, as your rows are already ordered by decreasing Count. But in Community 3, "Singer-Songwriters" is the first-occurring Category in the dataframe, so it is put first.
Grouping instead by an Id variable resolves the problem:
top_categories$Id=rep(c(1:5),5)
ggplot(
top_categories,
aes(
x=Community,
y=Count,
group=Id,
label=Category
)
) +
geom_bar(
stat="identity",
color="black",
fill="#9C27B0",
position="dodge"
) +
geom_text(
angle=90,
position=position_dodge(width=0.9),
hjust=-0.05
) +
ggtitle("Number of Products in each Category in Each Community") +
guides(fill="none")
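Note that rep(c(1:5),5) only works because each Community has exactly five rows that are already sorted by decreasing Count. As a minimal sketch for data that is not pre-sorted, the Id can be computed as the rank of Count within each Community. The shortened, shuffled data frame below is an illustration built from a few rows of the question, not the full data.

```r
# Shortened copy of the question's data, deliberately shuffled so that
# row order no longer matches decreasing Count (illustration only).
top_categories <- data.frame(
  Category  = c("Dance Pop", "Singer-Songwriters", "Adult Alternative",
                "Gangsta & Hardcore", "West Coast", "Pop Rap"),
  Count     = c(95, 151, 147, 413, 135, 473),
  Community = c(1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Rank within each Community: 1 = largest Count, 2 = next largest, ...
# Negating Count makes rank() assign 1 to the biggest value.
top_categories$Id <- ave(
  -top_categories$Count,
  top_categories$Community,
  FUN = function(x) rank(x, ties.method = "first")
)

print(top_categories)
```

The resulting Id column can then be used as group=Id in the ggplot call, regardless of how the rows are ordered.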

Related

time series aesthetics with ggplot2

Hello, I have tried to graph the following time series:
fecha importaciones
1 Ene\n1994 171.0
2 Feb\n1994 170.7
3 Mar\n1994 183.7
4 Abr\n1994 214.6
5 May\n1994 227.2
6 Jun\n1994 221.1
7 Jul\n1994 216.4
8 Ago\n1994 235.3
9 Sep\n1994 227.0
10 Oct\n1994 216.0
11 Nov\n1994 221.5
12 Dic\n1994 270.9
13 Ene\n1995 250.4
14 Feb\n1995 259.6
15 Mar\n1995 258.2
16 Abr\n1995 232.9
17 May\n1995 335.0
18 Jun\n1995 295.2
19 Jul\n1995 302.5
20 Ago\n1995 283.3
21 Sep\n1995 264.4
22 Oct\n1995 277.6
23 Nov\n1995 289.1
24 Dic\n1995 280.5
25 Ene\n1996 252.4
26 Feb\n1996 250.1
.
.
.
320 Ago\n2020 794.6
321 Sep\n2020 938.2
322 Oct\n2020 966.3
323 Nov\n2020 958.9
324 Dic\n2020 1059.2
325 Ene\n2021 1056.2
326 Feb\n2021 982.5
I can graph it with Office Calc, but I am having trouble plotting it in R with ggplot:
ggplot(datos, aes(x = fecha, y = importaciones)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
I have tried every approach I could think of, but the plot does not come out correctly. Could someone guide me?
Change the x-axis to date class.
library(ggplot2)
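# fecha holds Spanish month abbreviations, so parsing may depend on a Spanish LC_TIME locale (see the locale argument of dmy)
datos$fecha <- lubridate::dmy(paste0(1, datos$fecha))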
ggplot(datos, aes(x = fecha, y = importaciones, group = 1)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
You can use scale_x_date to change the breaks and display format of dates on x-axis.
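As a rough sketch of both steps, the example below converts the "Ene\n1994" style labels to Date without relying on the system locale, then sets yearly breaks with scale_x_date. The lookup vector meses and the helper names partes, mes and anio are assumptions made for this illustration, and only four rows from the question are used.

```r
# Spanish month abbreviations as they appear in the question's data
meses <- c("Ene", "Feb", "Mar", "Abr", "May", "Jun",
           "Jul", "Ago", "Sep", "Oct", "Nov", "Dic")

# Four rows taken from the question (illustration only)
datos <- data.frame(
  fecha = c("Ene\n1994", "Feb\n1994", "Mar\n1994", "Dic\n2020"),
  importaciones = c(171.0, 170.7, 183.7, 1059.2),
  stringsAsFactors = FALSE
)

# Split "Ene\n1994" into month and year, then build first-of-month dates
partes <- strsplit(datos$fecha, "\n", fixed = TRUE)
mes    <- match(vapply(partes, function(p) p[1], character(1)), meses)
anio   <- as.integer(vapply(partes, function(p) p[2], character(1)))
datos$fecha <- as.Date(sprintf("%d-%02d-01", anio, mes))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(datos, ggplot2::aes(x = fecha, y = importaciones)) +
    ggplot2::geom_line() +
    ggplot2::scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
    ggplot2::theme_minimal()
  print(p)
}
```

date_breaks controls the spacing of the axis ticks and date_labels takes a strftime-style format, so "%b %Y" would show the month as well.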

Making multi-line plots in R using ggplot2

I would like to compile some data into a ggplot() line plot of different colors.
It's rainfall in various places over 100 days, and the data is quite different between locations which is giving me fits.
I've tried using different suggestions from this forum and they don't seem to be working well for this data. Sample data:
Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8
My code thus far is
ggplot(Rain) +
geom_line(aes(x=Time,y=Location1,col="red")) +
geom_line(aes(x=Time,y=Location2,col="blue")) +
geom_line(aes(x=Time,y=Location3,col="green")) +
scale_color_manual(labels = c("Location 1","Location 2","Location 3"),
values = c("red","blue","green")) +
xlab("Time (Days)") + ylab("Rainfall (Inches)") + labs(color="Locations") +
ggtitle("Rainfall Over 100 Days In Three Locations")
So far it gives me everything that I want but for some reason the colors are wrong when I plot it, i.e. it plots location 1 in green while I told it red in my first geom_line.
library(tidyr)
library(ggplot2)
df_long <- gather(data = df1, Place, Rain, -Time)
ggplot(df_long) +
geom_line(aes(x=Time, y=Rain, color=Place))
Data:
df1 <- read.table(text="Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8",
header=T, stringsAsFactors=F)
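On why the original colors came out wrong: writing col="red" inside aes() maps the string "red" as a category label rather than as a color, and scale_color_manual then assigns its unnamed values to the labels in alphabetical order ("blue", "green", "red"). With long data, a named vector of colors avoids that mismatch. The sketch below uses the first three rows of the sample data and base R's stack() in place of gather(); the name cols is made up for this example.

```r
# First three rows of the sample data (illustration only)
df1 <- data.frame(
  Time      = 0:2,
  Location1 = c(48, 51, 58),
  Location2 = c(99.2966479761526, 98.7287820735946, 98.4803262236528),
  Location3 = c(2, 4, 4.82842712474619)
)

# Wide to long: one row per Time/Place pair
stacked <- stack(df1[c("Location1", "Location2", "Location3")])
df_long <- data.frame(
  Time  = rep(df1$Time, times = 3),
  Rain  = stacked$values,
  Place = stacked$ind
)

# Named colors: each name must match a value of Place
cols <- c(Location1 = "red", Location2 = "blue", Location3 = "green")

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df_long, ggplot2::aes(x = Time, y = Rain, color = Place)) +
    ggplot2::geom_line() +
    ggplot2::scale_color_manual(values = cols, name = "Locations")
  print(p)
}
```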

Re-order group chart same as the input

I have some input data and I would like to create a grouped bar chart from it. The problem is that the order in the finished chart differs from the input: it is arranged alphabetically. I would also like to change the font style to italic, for the species names only.
> data <- read.table(
+ text = "Superfamily Drom Bactria Feru Paos
+ ERV 294 224 206 202
+ ERVL-MaLR 103 108 184 231
+ Gypsy 274 187 413 215
+ Pao 6 2 7 4
+ DIRS/Ngaro 15 14 45 25
+ Unknown 26 23 23 37
+ Undefined 76 77 80 95",
+ header = TRUE
+ )
> data
Superfamily Drom Bactria Feru Paos
1 ERV 294 224 206 202
2 ERVL-MaLR 103 108 184 231
3 Gypsy 274 187 413 215
4 Pao 6 2 7 4
5 DIRS/Ngaro 15 14 45 25
6 Unknown 26 23 23 37
7 Undefined 76 77 80 95
> data_long <- gather(data,
+ key = "Species",
+ value = "Distrubution",
+ -Superfamily)
> ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity")
I would like the chart to follow the input order, with an italic font style for the species names only (e.g. Drom, Bactria, ...).
I think this is what you're asking for
data_long$Species <- factor(data_long$Species, levels = unique(data_long$Species))
ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity") + theme(axis.text.x = element_text(face = "italic"))
If ggplot receives a factor, it will use the level order as the axis order.
When it comes to the fonts, you change that with theme().
--edit--
To get the superfamily in the same order as input, you would have to create a factor as we did with the species-name.
data_long$Superfamily<- factor(data_long$Superfamily, levels = data$Superfamily)
Forgoing the use of the readxl-package to read the excel sheet into R, this should work to change the species name:
colnames(data)[2:5] <- c("Alpha Drom", "Beta Bactria", "Gamma Feru", "Delta Paos")
Add this line before you create data_long.
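To see what the factor() call changes, here is a small base R sketch using the species names from the question. The variable names alphabetical and input_order are made up for this illustration.

```r
# Species in the order they appear in the input
species <- c("Drom", "Bactria", "Feru", "Paos", "Drom", "Bactria")

# Default: levels are sorted alphabetically
alphabetical <- factor(species)

# levels = unique(...) keeps the order of first appearance
input_order <- factor(species, levels = unique(species))

print(levels(alphabetical))
print(levels(input_order))
```

ggplot draws discrete axes and legend entries in level order, which is why setting the levels this way restores the input order.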

Function for generating multiple line charts for all variables in a dataframe for different groups

I have 106 weeks of data for 5 different LOBs (Lines of Business). The variables are Traffic, Spend, Clicks, etc. In total there are 106*5 = 530 rows.
Dataframe looks like:
LOB Week Traffic Spend Clicks
A 1 34 12 5
A 2 37 32 6
A 3 41 57 7
A 4 52 42 12
A 5 27 37 8
... 106 weeks
B...106 weeks
C...106 weeks
D...106 weeks
E 1 43 22 12
E 2 65 16 14
E 3 76 18 9
E 4 25 14 11
E 5 53 15 15
... 106 weeks
I want to generate line chart for Traffic for all the 5 different LOB on the same chart, similarly for other metrics also. For this I have written a function but it is not doing what I want.
Code:
for ( i in seq(1,length( data),1) ) plot(data[,i],ylab=names(data[i]),type="l", col = "red", xlab = "Week", main = "")
Please suggest how this can be done.
You can use ggplot2 :
ggplot(data, aes(x = Week, y = Traffic, color = LOB)) +
geom_line()
Please try to submit a toy example of your data so we can reproduce the code.
Edit: as suggested by @Axeman, you may want to plot all metrics together. Here is his solution for visibility:
d <- gather(data, metric, value, -Week, -LOB)
ggplot(d, aes(Week, value, color = LOB)) +
geom_line() +
facet_wrap(~metric, scales = 'free_y')

R: ggplot2 - plots doesn't look right - vertical instead of diagonal lines

I'd like to create a plot for 2 categorical variables. Therefore I created two dummy sets but - although they contain the same items - they look totally different. Same happens with my real data as well.
I also tried to perform it with ordered columns, same result.
Please see my code below, the three plots (plot 2 is how it should look) and a plot of my real data (anonymized) that shows the problem. I don't understand why there are those vertical lines.
Thank you in advance
library(ggplot2)
library(dplyr)
dat1 <- data.frame(
sex = factor(c("Male","Female","Male","Female")), levels=c("Female","Male"),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(17.42, 16.81, 16.24, 13.53)
)
dat1
#plot1: shows horizontal lines although it should look like the plot 2
ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 1")
#different approach for plot1
arrange(dat1 , sex, time)
dat1
#has ordered columns like I wanted it to be
#still looks like plot1
ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 1 ordered")
dat2 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(13.53, 16.81, 16.24, 17.42)
)
dat2
#plot2: look like I'd like to have it this way
ggplot(data=dat2, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 2")
Also an outline of the data plot which has those vertical lines in it
library(ggplot2)
library(dplyr)
mydata2
# ####Output (fictive data but same problem as with my real data, still vertical lines in it but should only have 2 lines like in plot2)
# group NM score
# 1 KG S 2537.94514
# 2 EG S 1766.39019
# 3 KG S 3970.91802
# 4 KG S 4089.14939
# 5 KG S 2795.42964
# 6 EG S 2286.60411
# 7 KG S 4027.22993
# 8 KG S 1030.18328
# 9 EG S 719.73679
# 10 EG S 724.93663
# 11 EG S 2929.03717
# 12 EG S 521.55736
# 13 KG S 1435.85625
# 14 EG S 1496.39471
# 15 EG S 3521.25827
# 16 KG S 2138.17928
# 17 EG S 1233.86267
# 18 KG S 591.33086
# 19 EG S 2171.97341
# 20 EG S 3871.92536
# 21 EG S 468.10133
# 22 KG S 2419.67419
# 23 KG S 1338.29305
# 24 KG S 1629.33862
# 25 EG S 560.39680
# 26 EG S 546.22468
# 27 KG S 3398.94647
# 28 KG S 1117.72716
# 29 EG S 2794.90527
# 30 EG S 3606.77693
# 31 KG S 3558.67156
# 32 KG S 196.64992
# 33 EG S 2174.69930
# 34 EG S 3444.10732
# 35 KG S 670.60907
# 36 EG S 3719.20997
# 37 KG S 65.76227
# 38 EG S 3420.12225
# 39 KG S 1405.83738
# 40 KG S 2859.33873
# 41 EG T 1296.75111
# 42 EG T 436.53580
# 43 KG T 213.09334
# 44 EG T 2073.70465
# 45 KG T 1679.98816
# 46 EG T 1599.26738
# 47 EG T 777.65179
# 48 EG T 1738.45395
# 49 KG T 3269.54120
# 50 EG T 3506.07302
# 51 EG T 1764.61915
# 52 EG T 493.47846
# 53 KG T 1729.02949
# 54 EG T 1454.57702
# 55 EG T 2577.32018
# 56 EG T 295.08653
# 57 EG T 3811.24064
# 58 KG T 2320.35879
# 59 EG T 1285.65291
# 60 KG T 3600.26095
# 61 EG T 3738.89452
# 62 KG T 3472.53512
# 63 KG T 1203.33462
# 64 EG T 1809.41229
# 65 EG T 3536.17972
# 66 EG T 2637.59869
# 67 KG T 1279.44567
# 68 KG T 1141.81247
# 69 KG T 3951.54206
# 70 KG T 1940.11505
# 71 KG T 192.74602
# 72 KG T 1235.81839
# 73 EG T 1907.09384
# 74 KG T 1772.86806
# 75 KG T 997.92437
# 76 KG T 217.81433
# 77 KG T 3595.69359
# 78 EG T 910.07955
# ####End of output
ggplot(data=mydata2, aes(x=group, y=score, group=NM, shape=NM, colour=NM)) +
geom_line(aes(linetype=NM), size=1) + # Set linetype by sex
geom_point(size=3, fill="white") + # Use larger points, fill with white
expand_limits(y=0) + # Set y range to include 0
scale_colour_hue(name="Sex of participant", # Set legend title
l=30) + # Use darker colors (lightness=30)
scale_shape_manual(name="Sex of participant",
values=c(22,21)) + # Use points with a fill color
scale_linetype_discrete(name="Sex of participant") +
xlab("Group") + ylab("Score") + # Set axis labels
ggtitle("Data") + # Set title
theme_bw() +
theme(legend.position=c(.7, .4)) # Position legend inside
# This must go after theme_bw
[Figures: Plot 1, Plot 1 ordered, Plot 2, and the data plot that looks wrong]
The dat2 that makes the plot you want looks like this:
> dat2
sex time total_bill
1 Female Lunch 13.53 # female has lunch and dinner
2 Female Dinner 16.81
3 Male Lunch 16.24 # male has lunch and dinner
4 Male Dinner 17.42
However, your dat1 before and after the arrange looks like this:
before
sex levels time total_bill
1 Male Female Lunch 17.42
2 Female Male Dinner 16.81 # female only has dinner
3 Male Female Lunch 16.24 # male only has lunch
4 Female Male Dinner 13.53
after
sex levels time total_bill
1 Female Male Dinner 16.81 # female only has dinner
2 Female Male Dinner 13.53
3 Male Female Lunch 17.42 # male only has lunch
4 Male Female Lunch 16.24
In both versions of dat1, females only have dinner and males only have lunch. The levels column has the same problem: the misplaced closing parenthesis after factor(...) turned levels into a separate column instead of an argument to factor().
So the lines in your plots are drawn between the two female points at dinner, rather than across the female points at lunch and dinner.
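A quick way to check this before plotting is to cross-tabulate the two categorical columns. In the sketch below, dat1 is rebuilt with levels moved inside factor(), which is an assumption about what was intended; dat2 is copied from the question. The names tab1 and tab2 are made up for this example.

```r
# dat1 from the question, with levels= moved inside factor() (assumed intent)
dat1 <- data.frame(
  sex  = factor(c("Male", "Female", "Male", "Female"), levels = c("Female", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(17.42, 16.81, 16.24, 13.53)
)

# dat2 as given in the question
dat2 <- data.frame(
  sex  = factor(c("Female", "Female", "Male", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Count the rows for every sex/time combination
tab1 <- table(sex = dat1$sex, time = dat1$time)
tab2 <- table(sex = dat2$sex, time = dat2$time)

print(tab1)  # empty cells: a time level never spans both sexes
print(tab2)  # one row per cell: each line has a point at both sexes
```

A zero in the table means a line for that time level has no point at that sex, so it cannot run from Female to Male.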
For your updated question, I ran your code with the column names like this:
# group NM sex score
# 1 KG S 2537.945
# 2 EG S 1766.390
# 3 KG S 3970.918
and got this. What is the issue?
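One possible reading, assuming the vertical lines come from having many scores for each group/NM pair: geom_line connects every point that shares an x value, so several rows at the same group are joined by a vertical segment. If a single line per NM is wanted, the scores could be summarised first. The sketch below uses six rows from the question's output and takes the mean, which is only one choice of summary.

```r
# Six rows copied from the question's output (illustration only)
mydata2 <- data.frame(
  group = c("KG", "EG", "KG", "EG", "EG", "KG"),
  NM    = c("S", "S", "S", "T", "T", "T"),
  score = c(2537.94514, 1766.39019, 3970.91802,
            1296.75111, 436.53580, 213.09334),
  stringsAsFactors = FALSE
)

# One mean score per group/NM combination
means <- aggregate(score ~ group + NM, data = mydata2, FUN = mean)
print(means)
```

Passing means instead of mydata2 to the ggplot call leaves exactly one point per group on each NM line.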

Resources