Reorder a grouped chart to match the input order - R

I have some input data and would like to create a grouped bar chart. The problem is that the order in the finished chart differs from the input: ggplot arranges it alphabetically. I would also like to change the font style to italic, for the species names only.
> data <- read.table(
+ text = "Superfamily Drom Bactria Feru Paos
+ ERV 294 224 206 202
+ ERVL-MaLR 103 108 184 231
+ Gypsy 274 187 413 215
+ Pao 6 2 7 4
+ DIRS/Ngaro 15 14 45 25
+ Unknown 26 23 23 37
+ Undefined 76 77 80 95",
+ header = TRUE
+ )
> data
Superfamily Drom Bactria Feru Paos
1 ERV 294 224 206 202
2 ERVL-MaLR 103 108 184 231
3 Gypsy 274 187 413 215
4 Pao 6 2 7 4
5 DIRS/Ngaro 15 14 45 25
6 Unknown 26 23 23 37
7 Undefined 76 77 80 95
> data_long <- gather(data,
+ key = "Species",
+ value = "Distrubution",
+ -Superfamily)
> ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity")
I would like the chart to follow the input order, with italic font for the species names only (e.g. Drom, Bactria, ...).

I think this is what you're asking for
data_long$Species <- factor(data_long$Species, levels = unique(data_long$Species))
ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity") + theme(axis.text.x = element_text(face = "italic"))
If ggplot receives a factor, it uses the order of the levels as the axis order.
Fonts are changed through theme(); here axis.text.x is set to italic.
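As a minimal base-R sketch of why the order changes (the vector below only reuses the species column names from the question): factor() sorts its levels alphabetically by default, and a plain character column ends up in the same alphabetical order on the axis, while levels = unique(x) keeps the order of first appearance.

```r
# Species names in the order they appear in the input
species <- c("Drom", "Drom", "Bactria", "Feru", "Paos")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(species))

# Explicit levels: keep the order of first appearance
input_levels <- levels(factor(species, levels = unique(species)))

print(default_levels)  # "Bactria" "Drom" "Feru" "Paos"
print(input_levels)    # "Drom" "Bactria" "Feru" "Paos"
```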
--edit--
To get the superfamilies in the same order as the input, create a factor just as we did for the species names:
data_long$Superfamily<- factor(data_long$Superfamily, levels = data$Superfamily)
Without using the readxl package to read the Excel sheet into R, this should work to change the species names:
colnames(data)[2:5] <- c("Alpha Drom", "Beta Bactria", "Gamma Feru", "Delta Paos")
Add this line before you create data_long.
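Putting the pieces of this answer in order, a sketch of the whole script might look like the following. It assumes tidyr and ggplot2 are installed and keeps the question's column name Distrubution as spelled there.

```r
library(tidyr)
library(ggplot2)

data <- read.table(
  text = "Superfamily Drom Bactria Feru Paos
ERV 294 224 206 202
ERVL-MaLR 103 108 184 231
Gypsy 274 187 413 215
Pao 6 2 7 4
DIRS/Ngaro 15 14 45 25
Unknown 26 23 23 37
Undefined 76 77 80 95",
  header = TRUE
)

# Rename the species columns before reshaping
colnames(data)[2:5] <- c("Alpha Drom", "Beta Bactria", "Gamma Feru", "Delta Paos")

data_long <- gather(data, key = "Species", value = "Distrubution", -Superfamily)

# Pin both orders to the input order instead of the alphabetical default
data_long$Species <- factor(data_long$Species, levels = unique(data_long$Species))
data_long$Superfamily <- factor(data_long$Superfamily, levels = data$Superfamily)

p <- ggplot(data_long, aes(fill = Superfamily, y = Distrubution, x = Species)) +
  geom_bar(position = "dodge2", stat = "identity") +
  theme(axis.text.x = element_text(face = "italic"))
p
```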

Related

time series aesthetics with ggplot2

Hello, I have tried to graph the following time series:
fecha importaciones
1 Ene\n1994 171.0
2 Feb\n1994 170.7
3 Mar\n1994 183.7
4 Abr\n1994 214.6
5 May\n1994 227.2
6 Jun\n1994 221.1
7 Jul\n1994 216.4
8 Ago\n1994 235.3
9 Sep\n1994 227.0
10 Oct\n1994 216.0
11 Nov\n1994 221.5
12 Dic\n1994 270.9
13 Ene\n1995 250.4
14 Feb\n1995 259.6
15 Mar\n1995 258.2
16 Abr\n1995 232.9
17 May\n1995 335.0
18 Jun\n1995 295.2
19 Jul\n1995 302.5
20 Ago\n1995 283.3
21 Sep\n1995 264.4
22 Oct\n1995 277.6
23 Nov\n1995 289.1
24 Dic\n1995 280.5
25 Ene\n1996 252.4
26 Feb\n1996 250.1
.
.
.
320 Ago\n2020 794.6
321 Sep\n2020 938.2
322 Oct\n2020 966.3
323 Nov\n2020 958.9
324 Dic\n2020 1059.2
325 Ene\n2021 1056.2
326 Feb\n2021 982.5
I can graph it with office calc, but I am having trouble plotting it in R with ggplot:
ggplot(datos, aes(x = fecha, y = importaciones)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
I have tried every step I could think of, but the plot does not come out correctly. Could someone guide me?
Convert the x-axis column to Date class.
library(ggplot2)
datos$fecha <- lubridate::dmy(paste0(1, datos$fecha))
ggplot(datos, aes(x = fecha, y = importaciones, group = 1)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
You can use scale_x_date to change the breaks and display format of dates on x-axis.
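A sketch of both steps on the first six rows of the question's data. One caveat, stated as an assumption: lubridate's parsers take month names from the session locale, so Spanish abbreviations such as Ene or Abr may not parse on a non-Spanish system. The lookup vector meses below is an illustrative workaround, not part of the original answer, and it assumes that fecha holds a real newline between month and year.

```r
library(ggplot2)

# First six rows of the question's data; "\n" is a newline inside each label
datos <- data.frame(
  fecha = c("Ene\n1994", "Feb\n1994", "Mar\n1994",
            "Abr\n1994", "May\n1994", "Jun\n1994"),
  importaciones = c(171.0, 170.7, 183.7, 214.6, 227.2, 221.1),
  stringsAsFactors = FALSE
)

# Locale-independent parsing: look up each Spanish abbreviation by position
meses <- c("Ene", "Feb", "Mar", "Abr", "May", "Jun",
           "Jul", "Ago", "Sep", "Oct", "Nov", "Dic")
partes <- strsplit(datos$fecha, "\n", fixed = TRUE)
mes <- match(sapply(partes, `[`, 1), meses)
anio <- as.integer(sapply(partes, `[`, 2))
datos$fecha <- as.Date(sprintf("%d-%02d-01", anio, mes))

# scale_x_date controls where the breaks fall and how they are printed
p <- ggplot(datos, aes(x = fecha, y = importaciones)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%m/%Y") +
  theme_minimal()
p
```

For the full 1994 to 2021 series, a wider spacing such as date_breaks = "2 years" with date_labels = "%Y" would probably be easier to read.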

Plotting each value of columns for a specific row

I am struggling to plot a specific row from a data frame. Below is the graph I am trying to plot. I have tried using ggplot and the normal plot function, but I cannot figure it out.
Wt2 Wt3 Wt4 Wt5 Lngth2 Lngth3 Lngth4 Lngth5
1 48 59 95 82 141 157 168 183
2 59 68 102 102 140 168 174 170
3 61 77 93 107 145 162 172 177
4 54 43 104 104 146 159 176 171
5 100 145 185 247 150 158 168 175
6 68 82 95 118 142 140 178 189
7 68 95 109 111 139 171 176 175
Above is the data frame I am trying to plot. Each row holds one bear's measurements, so row 1 is bear 1. How would I plot only the Wt columns for bear 1 against an x-axis that goes from year 2 to year 5?
You can pivot your data frame into a longer format:
First add a column with the row number (bear number):
df = cbind("Bear"=as.factor(1:nrow(df)), df)
It needs to be factor so we can pass it as a group variable to ggplot. Now pivot:
df2 = tidyr::pivot_longer(df[,1:5], cols=2:5,
names_to="Year", values_to="Weight", names_prefix="Wt")
df2$Year = as.numeric(df2$Year)
We drop the Length columns with df[,1:5]; select only the weight columns for pivoting with cols=2:5; name the columns to create with names_to and values_to; and lastly names_prefix="Wt" strips the "Wt" from the column names, leaving only the year number. That leaves a character column, so we convert it with as.numeric().
Then plot:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) + geom_line()
Output (PS: I created my own data, so the actual numbers are off):
Just an addition: if you don't want to specify the columns of your dataset explicitly, you can do:
df2 = df[, grep("Wt|Bear", colnames(df))]
df2 = tidyr::pivot_longer(df2, cols=grep("Wt", colnames(df2)),
names_to="Year", values_to="Weight", names_prefix="Wt")
Edit: one plot for each group
You can use facet_wrap:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) +
facet_wrap(~Bear, nrow=2, ncol=4) +
geom_line()
Output:
You can change nrow and ncol as you wish, and you can remove linetype from aes() since the facets already differentiate the bears, but that is not mandatory.
You can also change the levels of the categorical variable to get better labels on each panel, for example levels(df2$Bear) = paste("Bear", 1:7) (or do that when creating the column).
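Since the question asks for bear 1 only, here is a sketch that applies the same pivot to the weight columns copied from the question and then keeps a single bear. The subsetting step and the name bear1 are additions for illustration.

```r
library(tidyr)
library(ggplot2)

# Weight columns copied from the question (one row per bear)
df <- data.frame(
  Wt2 = c(48, 59, 61, 54, 100, 68, 68),
  Wt3 = c(59, 68, 77, 43, 145, 82, 95),
  Wt4 = c(95, 102, 93, 104, 185, 95, 109),
  Wt5 = c(82, 102, 107, 104, 247, 118, 111)
)
df <- cbind(Bear = as.factor(1:nrow(df)), df)

# Long format: one row per (Bear, Year) pair
df2 <- pivot_longer(df, cols = 2:5,
                    names_to = "Year", values_to = "Weight", names_prefix = "Wt")
df2$Year <- as.numeric(df2$Year)

# Keep only bear 1 before plotting
bear1 <- df2[df2$Bear == "1", ]

p <- ggplot(bear1, aes(x = Year, y = Weight)) +
  geom_line() +
  geom_point()
p
```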
Try
ggplot(mapping = aes(x = seq.int(2, 5), y = c(48, 59, 95, 82))) +
geom_point(color = "blue") +
geom_line(color = "blue") +
xlab("Year") +
ylab("Weight")

ggplot showing a trend with more than 1 variables across y axis

I have a data frame df where I need to compare the trend across weekdays
df
Col Mon Tue Wed
1 47 164 163
2 110 168 5
3 31 146 109
4 72 140 170
5 129 185 37
6 41 77 96
7 85 26 41
8 123 15 188
9 14 23 163
10 152 116 82
11 118 101 5
Right now I can only plot two variables, as below, but I need to see Tuesday and Wednesday as well.
ggplot(data=df,aes(x=Col,y=Mon))+geom_line()
You can either add a
geom_line(aes(x = Col, y = Mon), col = 1)
for each day, or restructure your data frame with a function like gather so that the new columns are Col, Day and Val. Without reformatting the data, your result would be
ggplot(data=df)+geom_line(aes(x=Col,y=Mon), col = 1) + geom_line(aes(x=Col,y=Tue), col = 2) + geom_line(aes(x=Col,y=Wed), col = 3)
with a restructure it would be
ggplot(data=df)+geom_line(aes(x=Col,y=Val, col = Day))
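The restructure step itself is not shown above. A minimal sketch with tidyr's gather, using the first four rows of the question's data, could be as follows. The name df_long is an illustrative choice; Day and Val match the column names in the plot call.

```r
library(tidyr)
library(ggplot2)

# First four rows of the question's data
df <- data.frame(
  Col = 1:4,
  Mon = c(47, 110, 31, 72),
  Tue = c(164, 168, 146, 140),
  Wed = c(163, 5, 109, 170)
)

# One row per (Col, Day) pair; the former column names end up in Day
df_long <- gather(df, key = "Day", value = "Val", -Col)

# The long data frame is what gets passed to ggplot
p <- ggplot(data = df_long) + geom_line(aes(x = Col, y = Val, col = Day))
p
```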
The standard way would be to get the data in long format and then plot
library(tidyverse)
df %>%
gather(key, value, -Col) %>%
ggplot() + aes(factor(Col), value, col = key, group = key) + geom_line()

R find number of rows in a group and plot

I have a table of Tennis matches. I want to group by winner_ids and plot them against height, basically to check if the taller players have won more matches.
The data looks like this.
m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164
Here m_id is the match ID. I want to plot the number of matches a person has won against their height.
Example: 21 has won 3 matches and her height is 166 cm.
How can I achieve this in ggplot?
My code below doesn't seem to work:
matches %>% group_by(winner_id) %>%
ggplot(., aes(x = winner_ht, y = nrow((winner_id)))) + geom_point()
Can anyone help?
Do you mean something like this?
library(tidyverse)
df %>%
group_by(winner_id, winner_height) %>%
summarise(n = n()) %>%
ggplot(aes(winner_height, n, label = winner_id)) +
geom_point() +
geom_text(position = position_nudge(y = -0.1))
Explanation: We count the number of games n per winner_id and winner_height and pass the summarised data to ggplot where we plot winner_height vs. n. We can also add labels to indicate the winner_id.
Sample data
df <- read.table(text =
"m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164", header = T)
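To inspect the summarised table before plotting, one option is dplyr's count(), which should give the same counts as the group_by() plus summarise(n = n()) step. This is a sketch on the sample data; the name wins is illustrative.

```r
library(dplyr)

df <- read.table(text =
"m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164", header = TRUE)

# One row per winner, with the number of matches won in column n
wins <- count(df, winner_id, winner_height)
print(wins)
```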

ggplot facets: show annotated text in selected facets

I want to create a 2 by 2 faceted plot with a vertical line shared by the four facets. However, because the facets on top have the same date information as the facets at the bottom, I only want to have the vline annotated twice: in this case in the two facets at the bottom.
I looked, among other places, here, which does not work for me. (I also doubt whether that code is still valid today.) I also looked here. I also looked up how to control the font size in geom_text: according to the help pages, this is size. In the case below it doesn't work out well.
This is my code:
library(ggplot2)
library(tidyr)
my_df <- read.table(header = TRUE, text =
"Date AM_PM First_Second Systolic Diastolic Pulse
01/12/2017 AM 1 134 83 68
01/12/2017 PM 1 129 84 76
02/12/2017 AM 1 144 88 56
02/12/2017 AM 2 148 93 65
02/12/2017 PM 1 131 85 59
02/12/2017 PM 2 129 83 58
03/12/2017 AM 1 153 90 62
03/12/2017 AM 2 143 92 59
03/12/2017 PM 1 139 89 56
03/12/2017 PM 2 141 86 56
04/12/2017 AM 1 140 87 58
04/12/2017 AM 2 135 85 55
04/12/2017 PM 1 140 89 67
04/12/2017 PM 2 128 88 69
05/12/2017 AM 1 134 99 67
05/12/2017 AM 2 128 90 63
05/12/2017 PM 1 136 88 63
05/12/2017 PM 2 123 83 61
")
# setting the classes right
my_df$Date <- as.Date(as.character(my_df$Date), format = "%d/%m/%Y")
my_df$First_Second <- as.factor(my_df$First_Second)
# to tidy format
my_df2 <- gather(data = my_df, key = Measure, value = Value,
-c(Date, AM_PM, First_Second), factor_key = TRUE)
# Measures in 1 facet, facets split over AM_PM and First_Second
## add annotations column for geom_text
my_df2$Annotations <- rep("", 54)
my_df2$Annotations[c(4,6)] <- "Start"
p2 <- ggplot(data = my_df2) +
ggtitle("Blood Pressure and Pulse as a function of AM/PM,\n Repetition, and date") +
geom_line(aes(x = Date, y = Value, col= Measure, group = Measure), size = 1.) +
geom_point(aes(x = Date, y = Value, col= Measure, group = Measure), size= 1.5) +
facet_grid(First_Second ~ AM_PM) +
geom_vline(aes(xintercept = as.Date("2017/12/02")), linetype = "dashed",
colour = "darkgray") +
theme(axis.text.x=element_text(angle = -90))
p2
yields this graph:
This is the basic plot from which I start. Now we try to annotate it.
p2 + annotate(geom="text", x = as.Date("2017/12/02"), y= 110, label="start", size= 3)
yielding this plot:
This plot has the problem that the annotation occurs 4 times, while we only want it in the bottom parts of the graph.
Now we use geom_text, which will use the "Annotations" column in our data frame, in line with this SO question. Be careful: the column added to the data frame must already be present when you first create "p2" (that is why we added the column above).
p2 + geom_text(aes(x=as.Date("2017/12/02"), y=100, label = Annotations, size = .6))
yielding this plot:
Yes, we succeeded in getting the annotation only in the bottom two parts of the graph. But the font is too big ( ... and ugly) and when we try to correct it with size, two things are interesting: (1) the font size is not changed (although you would expect that from the help pages) and (2) a legend is added.
I have been clicking around a lot and have been unable to solve this after hours and hours. Any help would be appreciated.
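One possible way around both problems, offered as a sketch rather than a definitive answer: a size placed inside aes() is treated as a data mapping, so it is rescaled and gets a legend, whereas a size set outside aes() is a constant. Giving geom_text its own small data frame that only contains the facets to annotate also avoids the extra Annotations column. The data below is a few Systolic rows copied from the question; label_df and the label height of 125 are illustrative choices.

```r
library(ggplot2)

# A few Systolic rows copied from the question, enough to fill the 2 x 2 grid
bp <- data.frame(
  Date = as.Date(rep(c("2017-12-02", "2017-12-03"), times = 4)),
  AM_PM = rep(c("AM", "PM"), each = 4),
  First_Second = factor(rep(c(1, 1, 2, 2), times = 2)),
  Value = c(144, 153, 148, 143, 131, 139, 129, 141)
)

# Hypothetical helper data frame: one row per facet that should get a label
label_df <- data.frame(
  Date = as.Date("2017-12-02"),
  Value = 125,
  AM_PM = c("AM", "PM"),
  First_Second = factor(2, levels = c(1, 2)),
  label = "Start"
)

p <- ggplot(bp, aes(x = Date, y = Value)) +
  geom_line() +
  geom_point() +
  facet_grid(First_Second ~ AM_PM) +
  geom_vline(xintercept = as.numeric(as.Date("2017-12-02")),
             linetype = "dashed", colour = "darkgray") +
  # size sits outside aes(), so it is a constant and adds no legend
  geom_text(data = label_df, aes(label = label), size = 3)
p
```

Because label_df only has rows with First_Second equal to 2, the text should appear in the bottom row of facets only.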
