R find number of rows in a group and plot - r

I have a table of Tennis matches. I want to group by winner_ids and plot them against height, basically to check if the taller players have won more matches.
The data looks like this.
m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164
Here m_id is the match ID. I want to plot the number of matches each player has won against that player's height.
Example: player 21 has won 3 matches and her height is 166 cm.
How can I achieve this in ggplot?
My code below doesn't work:
matches %>% group_by(winner_id) %>%
ggplot(., aes(x = winner_ht, y = nrow((winner_id)))) + geom_point()
Can anyone help?

Do you mean something like this?
library(tidyverse)
df %>%
group_by(winner_id, winner_height) %>%
summarise(n = n()) %>%
ggplot(aes(winner_height, n, label = winner_id)) +
geom_point() +
geom_text(position = position_nudge(y = -0.1))
Explanation: We count the number of games n per winner_id and winner_height and pass the summarised data to ggplot where we plot winner_height vs. n. We can also add labels to indicate the winner_id.
Sample data
df <- read.table(text =
"m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164", header = T)
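The counting step can also be written with dplyr's count(), which groups and tallies in one call. This is a minimal sketch, assuming the sample data above is stored under the question's name matches; the column name n_wins is chosen here purely for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question, stored under the asker's name `matches`
matches <- read.table(text = "m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164", header = TRUE)

# One row per player: count() groups by the given columns and tallies the rows
# (`n_wins` is an illustrative name, not from the original answer)
wins <- matches %>%
  count(winner_id, winner_height, name = "n_wins")

# Height on the x axis, number of wins on the y axis
p <- ggplot(wins, aes(x = winner_height, y = n_wins)) +
  geom_point()
```

Printing p draws the plot; printing wins shows the per-player table that ggplot receives.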

Related

Circular line graph with groups

I have four dataframes that look like below:
X score.i score.ii score.iii mm
1: 1 -0.3958555 -0.3750726 -0.3378881 10
2: 2 -0.3954955 -0.3799290 -0.3400876 15
3: 3 -0.3962514 -0.3776692 -0.3401180 20
4: 4 -0.4033265 -0.3764099 -0.3436115 25
5: 5 -0.4035860 -0.3753792 -0.3426287 30
---
186: 186 -0.4041035 -0.3767158 -0.3419871 80
187: 187 -0.4040643 -0.3767881 -0.3417620 85
188: 188 -0.4052228 -0.3766468 -0.3436883 90
189: 189 -0.4047009 -0.3767359 -0.3431591 95
190: 190 -0.4061497 -0.3766785 -0.3433624 100
How can I plot a circular line graph with aes(x=mm, y=score.i) for these four such that there is a gap between the lines for each dataframe?
library(ggplot2)
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(-c(X, mm), names_to = "Variable", values_to = "Score") %>%
ggplot(., aes(x = mm, y = Score, color = Variable)) +
geom_line() +
coord_polar()
Data:
read.table(text =
"X score.i score.ii score.iii mm
1 -0.3958555 -0.3750726 -0.3378881 10
2 -0.3954955 -0.3799290 -0.3400876 15
3 -0.3962514 -0.3776692 -0.3401180 20
4 -0.4033265 -0.3764099 -0.3436115 25
5 -0.4035860 -0.3753792 -0.3426287 30
186 -0.4041035 -0.3767158 -0.3419871 80
187 -0.4040643 -0.3767881 -0.3417620 85
188 -0.4052228 -0.3766468 -0.3436883 90
189 -0.4047009 -0.3767359 -0.3431591 95
190 -0.4061497 -0.3766785 -0.3433624 100",
header = T, stringsAsFactors = F) -> df1
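The answer above plots the three score columns of a single data frame. If the goal is one line per data frame for score.i, one possible approach is to stack the data frames with an identifier column and map that column to colour and group. The sketch below uses made-up stand-in data (the helper make_df, the labels A to D and the shift values are all hypothetical), since the question shows only one of the four tables; whether the lines end up visibly separated depends on the real score values.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-ins for the four data frames; the values are invented
make_df <- function(shift) {
  data.frame(mm = seq(10, 100, by = 5),
             score.i = -0.40 + shift)
}
dfs <- list(A = make_df(0), B = make_df(0.01),
            C = make_df(0.02), D = make_df(0.03))

# Stack the data frames; `.id` stores each list name in a `source` column
combined <- bind_rows(dfs, .id = "source")

# One line per source data frame, drawn in polar coordinates
p <- ggplot(combined, aes(x = mm, y = score.i,
                          colour = source, group = source)) +
  geom_line() +
  coord_polar()
```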

Re-order group chart same as the input

I have some input data and I would like to create a grouped bar chart. When I build the chart, the order differs from the input because the groups are arranged alphabetically. I would also like to set the species names, and only those, in italics.
> data <- read.table(
+ text = "Superfamily Drom Bactria Feru Paos
+ ERV 294 224 206 202
+ ERVL-MaLR 103 108 184 231
+ Gypsy 274 187 413 215
+ Pao 6 2 7 4
+ DIRS/Ngaro 15 14 45 25
+ Unknown 26 23 23 37
+ Undefined 76 77 80 95",
+ header = TRUE
+ )
> data
Superfamily Drom Bactria Feru Paos
1 ERV 294 224 206 202
2 ERVL-MaLR 103 108 184 231
3 Gypsy 274 187 413 215
4 Pao 6 2 7 4
5 DIRS/Ngaro 15 14 45 25
6 Unknown 26 23 23 37
7 Undefined 76 77 80 95
> data_long <- gather(data,
+ key = "Species",
+ value = "Distrubution",
+ -Superfamily)
> ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity")
I would like the chart to follow the input order, with italic font for the species names only (e.g. Drom, Bactria, ...).
I think this is what you're asking for
data_long$Species <- factor(data_long$Species, levels = unique(data_long$Species))
ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity") + theme(axis.text.x = element_text(face = "italic"))
If ggplot receives a factor, it uses the order of the factor levels as the axis order.
Font settings such as italics are changed through theme().
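The effect of the levels argument can be checked without plotting anything. A small sketch in base R, using a short vector of the species names from the question (the variable name species is just for illustration):

```r
species <- c("Drom", "Bactria", "Feru", "Paos", "Drom")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(species))
# "Bactria" "Drom" "Feru" "Paos"

# With levels = unique(...): levels follow the order of first appearance
input_levels <- levels(factor(species, levels = unique(species)))
# "Drom" "Bactria" "Feru" "Paos"
```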
--edit--
To get the superfamily in the same order as input, you would have to create a factor as we did with the species-name.
data_long$Superfamily<- factor(data_long$Superfamily, levels = data$Superfamily)
Forgoing the use of the readxl-package to read the excel sheet into R, this should work to change the species name:
colnames(data)[2:5] <- c("Alpha Drom", "Beta Bactria", "Gamma Feru", "Delta Paos")
Add this line before you create data_long.

ggplot showing a trend with more than 1 variables across y axis

I have a dataframe df where I need to see the comparison of the trend between weeks
df
Col Mon Tue Wed
1 47 164 163
2 110 168 5
3 31 146 109
4 72 140 170
5 129 185 37
6 41 77 96
7 85 26 41
8 123 15 188
9 14 23 163
10 152 116 82
11 118 101 5
Right now I can only plot 2 variables like below. But I need to see for Tuesday and Wednesday as well
ggplot(data=df,aes(x=Col,y=Mon))+geom_line()
You can either add a
geom_line(aes(x = Col, y = Mon), col = 1)
for each day, or restructure your data frame with a function like gather so that the new columns are Col, Day and Val. Without reshaping the data, the code would be
ggplot(data=df)+geom_line(aes(x=Col,y=Mon), col = 1) + geom_line(aes(x=Col,y=Tue), col = 2) + geom_line(aes(x=Col,y=Wed), col = 3)
After restructuring (with df now holding the long-format data), it would be
ggplot(data=df)+geom_line(aes(x=Col,y=Val, col = Day))
The standard way would be to get the data in long format and then plot
library(tidyverse)
df %>%
gather(key, value, -Col) %>%
ggplot() + aes(factor(Col), value, col = key, group = key) + geom_line()
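The same reshaping can also be done with tidyr's pivot_longer(), the newer counterpart of gather(). A minimal sketch on the first three rows of the data, assuming the column names Day and Val used in the first answer:

```r
library(tidyr)
library(ggplot2)

# First three rows of the question's data
df <- read.table(text = "Col Mon Tue Wed
1 47 164 163
2 110 168 5
3 31 146 109", header = TRUE)

# Wide to long: one row per (Col, Day) pair, with the measurement in Val
long_df <- pivot_longer(df, cols = -Col, names_to = "Day", values_to = "Val")

# One line per day; mapping colour to Day also groups the lines by day
p <- ggplot(long_df, aes(x = Col, y = Val, colour = Day)) +
  geom_line()
```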

Trying to plot stacked barplot from percentages

I'm trying to plot a stacked barplot of the rate of computer use in different departments, with each bar broken down by PC type (so that for each department Type1 + Type2 + Type3 = Tot_rate). I've got a dataframe that looks like this:
dat=read.table(text = "Tot_rate Type1 Type2 Type3
DPT1 72 50 12 10
DPT2 80 30 20 30
DPT3 92 54 14 24", header = TRUE)
I was able to plot the barplot with the raw data, but I now need the version with percentages and I can't work out how to do that.
This is how I thought I could do it, but it doesn't work:
p<-ggplot(dat, aes(x=row.names(dat), y=dat$Tot_rate, fill=data[,2:ncol(dat)])) + geom_bar(stat="identity")+theme_minimal()+xlab("") + ylab("PC rate")+geom_abline(slope=0, intercept=90, col = "red",lty=2) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
When I try the code above I get:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (9): fill
Can you please help ?
Thank you,
Liana
Here is one way to do it, using a ggplot2 extension package called ggstatsplot:
set.seed(123)
library(tidyverse)
# creating dataframe in long format
(dat <- read.table(
text = "Tot_rate Type1 Type2 Type3
DPT1 72 50 12 10
DPT2 80 30 20 30
DPT3 92 54 14 24",
header = TRUE
) %>%
tibble::rownames_to_column(var = "id") %>%
tidyr::gather(., "key", "counts", Type1:Type3))
#> id Tot_rate key counts
#> 1 DPT1 72 Type1 50
#> 2 DPT2 80 Type1 30
#> 3 DPT3 92 Type1 54
#> 4 DPT1 72 Type2 12
#> 5 DPT2 80 Type2 20
#> 6 DPT3 92 Type2 14
#> 7 DPT1 72 Type3 10
#> 8 DPT2 80 Type3 30
#> 9 DPT3 92 Type3 24
# bar plot
ggstatsplot::ggbarstats(dat,
main = id,
condition = key,
counts = counts,
messages = FALSE)
Created on 2019-05-27 by the reprex package (v0.3.0)
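library(tidyverse)
# Department is read as a regular column rather than as row names
dat <- read.table(text = "Department Tot_rate Type1 Type2 Type3
DPT1 72 50 12 10
DPT2 80 30 20 30
DPT3 92 54 14 24", header = TRUE)
# drop Tot_rate (column 2), then reshape the Type columns from wide to long
long_dat <- dat[-2] %>% gather(type, number, Type1:Type3)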
First I reshaped your data: I put the department in its own column and converted the data from wide to long format, dropping Tot_rate, which isn't needed here.
p <- ggplot(data=long_dat,aes(x=Department,y=number,fill=type)) +
geom_bar(position = "fill",stat = "identity")
p
To scale the barplot to percentages, we set the position argument of geom_bar to position = "fill".
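With position = "fill", each bar is rescaled so that its segments sum to 1. The same shares can be computed explicitly, which is handy if you also want to print or label them. A sketch assuming the data layout with a Department column; the column name share is chosen here for illustration:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dat <- read.table(text = "Department Tot_rate Type1 Type2 Type3
DPT1 72 50 12 10
DPT2 80 30 20 30
DPT3 92 54 14 24", header = TRUE)

# Long format plus each type's share within its department
shares <- dat %>%
  select(-Tot_rate) %>%
  pivot_longer(Type1:Type3, names_to = "type", values_to = "number") %>%
  group_by(Department) %>%
  mutate(share = number / sum(number)) %>%
  ungroup()

# Stacked bars of the precomputed shares, with the y axis shown as percentages
p <- ggplot(shares, aes(x = Department, y = share, fill = type)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent)
```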

ggplot each group consists of only one observation

I'm trying to make a plot similar to this answer: https://stackoverflow.com/a/4877936/651779
My data frame looks like this:
df2 <- read.table(text='measurements samples value
1 4hours sham1 6
2 1day sham1 175
3 3days sham1 417
4 7days sham1 163
5 14days sham1 37
6 90days sham1 134
7 4hours sham2 8
8 1day sham2 402
9 3days sham2 482
10 7days sham2 67
11 14days sham2 16
12 90days sham2 31
13 4hours sham3 185
14 1day sham3 402
15 3days sham3 482
16 7days sham3 85
17 14days sham3 29
18 90days sham3 10',header=T)
And plot it with
ggplot(df2, aes(measurements, value)) + geom_line(aes(colour = samples))
No lines show in the plot, and I get the message
geom_path: Each group consist of only one observation.
Do you need to adjust the group aesthetic?
I don't see where what I'm doing is different from the answer I linked above. What should I change to make this work?
Add group = samples to the aes of geom_line. This is necessary because measurements is a discrete variable, so ggplot treats every data point as its own group by default, whereas you want one line per sample.
ggplot(df2, aes(measurements, value)) +
geom_line(aes(colour = samples, group = samples))
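One further point that may matter here: because measurements is text, the x axis is ordered alphabetically ("14days" before "1day", and so on) rather than chronologically. A possible fix, sketched on a subset of the data with the first three time points of sham1 and sham2, is to turn the column into a factor whose levels follow the order of appearance:

```r
library(ggplot2)

# Subset of the question's data: first three time points of sham1 and sham2
df2 <- data.frame(
  measurements = rep(c("4hours", "1day", "3days"), times = 2),
  samples = rep(c("sham1", "sham2"), each = 3),
  value = c(6, 175, 417, 8, 402, 482)
)

# Keep the time points in the order they appear instead of alphabetical order
df2$measurements <- factor(df2$measurements,
                           levels = unique(df2$measurements))

p <- ggplot(df2, aes(measurements, value)) +
  geom_line(aes(colour = samples, group = samples))
```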
