geom_bar: extra gaps appear on the x-axis of my bar plot - r

My data follows this sequence:
deptime .count
1 4.5 6285
2 14.5 5901
3 24.5 6002
4 34.5 5401
5 44.5 5080
6 54.5 4567
7 104.5 3162
8 114.5 2784
9 124.5 1950
10 134.5 1800
11 144.5 1630
12 154.5 1076
13 204.5 738
14 214.5 556
15 224.5 544
16 234.5 650
17 244.5 392
18 254.5 309
19 304.5 356
20 314.5 364
My ggplot code:
ggplot(pplot, aes(x=deptime, y=.count)) + geom_bar(stat="identity",fill='#FF9966',width = 5) + labs(x="time", y="count")
output figure
There is a gap on the x-axis after every 100. Does anyone know how to fix it?
Thank you.
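One possible explanation, offered as an assumption rather than something the post confirms: the deptime values look like clock times written as HHMM (54.5 is followed by 104.5), so a numeric axis leaves 50 empty units between 0:54 and 1:04. A minimal sketch that converts the values to minutes before plotting; the `minutes` column is a name introduced here for illustration, and only the first eight rows of the data are used:

```r
library(ggplot2)

# First rows of the data from the question
pplot <- data.frame(
  deptime = c(4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 104.5, 114.5),
  .count  = c(6285, 5901, 6002, 5401, 5080, 4567, 3162, 2784)
)

# Assumption: deptime is HHMM, so 104.5 means 1 hour and 4.5 minutes
pplot$minutes <- (pplot$deptime %/% 100) * 60 + pplot$deptime %% 100

# Bars are now evenly spaced, 10 minutes apart
p <- ggplot(pplot, aes(x = minutes, y = .count)) +
  geom_col(fill = "#FF9966", width = 5) +
  labs(x = "time (minutes)", y = "count")
```

If the values are not clock times, treating deptime as a factor (`x = factor(deptime)`) would also remove the gaps, at the cost of a categorical axis.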

Combining grouping and filtering on a dataframe to plot in ggplot and shiny

I am creating a shiny app that tracks various stats of 6 teams in a competition over 6 years. The df is as follows:
Year Pos Team P W L D GF GA GD G. BP Pts
1 2017 1 Southern Steel 15 15 0 0 1062 812 250 130.8 0 30
2 2017 2 Central Pulse 15 9 6 0 783 756 27 103.6 2 20
3 2017 3 Northern Mystics 15 8 7 0 878 851 27 111.3 3 19
4 2017 4 Waikato Bay of Plenty Magic 15 7 8 0 873 848 25 103.0 5 19
5 2017 5 Northern Stars 15 4 11 0 738 868 -130 85.0 1 9
6 2017 6 Mainland Tactix 15 2 13 0 676 875 -199 77.3 2 6
7 2018 1 Central Pulse 15 12 3 0 850 679 171 125.2 3 27
8 2018 2 Southern Steel 15 10 5 0 874 866 8 100.9 2 22
9 2018 3 Mainland Tactix 15 7 8 0 746 761 -15 98.0 5 19
10 2018 4 Northern Mystics 15 7 8 0 783 796 -13 98.4 3 17
11 2018 5 Waikato Bay of Plenty Magic 15 5 10 0 804 878 -74 91.6 3 13
12 2018 6 Northern Stars 15 4 11 0 832 909 -77 91.5 5 13
13 2019 1 Central Pulse 15 13 2 0 856 676 180 126.6 0 39
14 2019 2 Southern Steel 15 12 3 0 946 809 137 116.9 2 38
15 2019 3 Northern Stars 15 6 9 0 785 840 -55 93.5 3 21
16 2019 4 Waikato Bay of Plenty Magic 15 5 10 0 713 793 -80 89.9 0 15
17 2019 5 Mainland Tactix 15 5 10 0 740 849 -109 87.2 0 15
18 2019 6 Northern Mystics 15 4 11 0 786 859 -73 91.5 2 14
19 2020 1 Central Pulse 15 11 2 2 594 474 120 125.3 1 49
20 2020 2 Mainland Tactix 15 9 4 2 606 566 40 107.1 2 42
21 2020 3 Northern Mystics 15 7 6 2 582 475 7 101.2 3 35
22 2020 4 Northern Stars 15 5 7 3 590 626 -36 94.2 3 29
23 2020 5 Southern Steel 15 4 10 1 578 637 -59 90.7 3 21
24 2020 6 Waikato Bay of Plenty Magic 15 2 9 4 520 592 -72 87.8 3 19
25 2021 1 Northern Mystics 15 11 4 0 924 878 46 105.2 4 37
26 2021 2 Southern Steel 15 11 4 0 813 801 12 101.5 2 35
27 2021 3 Mainland Tactix 15 9 6 0 801 775 26 103.4 4 31
28 2021 4 Northern Stars 15 9 6 0 825 791 34 104.3 2 29
29 2021 5 Central Pulse 15 4 11 0 789 810 -21 97.4 8 20
30 2021 6 Waikato Bay of Plenty Magic 15 1 15 0 807 904 -97 89.3 6 9
31 2022 1 Central Pulse 15 10 5 0 828 732 96 113.1 4 34
32 2022 2 Northern Stars 15 11 4 0 836 783 53 106.8 1 34
33 2022 3 Northern Mystics 15 9 6 0 858 807 51 106.3 4 31
34 2022 4 Southern Steel 15 6 9 0 853 898 -45 95.0 2 20
35 2022 5 Waikato Bay of Plenty Magic 15 4 11 0 733 803 -70 91.3 4 16
36 2022 6 Mainland Tactix 15 5 0 0 788 873 -85 90.3 1 16
I need 3 graphs:
A stacked bar chart showing wins/draws/losses for each team across the 6 years.
A line chart showing the position of each team at the end of each of the 6 years.
A bubble chart showing total goals for/ goals against for each team across all 6 years, with total wins dictating size of the plots.
I also need to be able to filter the data for these graphs with a checkbox for choosing teams and a slider to select the year range.
I have got a stacked bar chart which can not be filtered - I can't figure out how to group the original df by team AND have it connected to the reactive filter I have. Currently the graph is connected to a melted df which is no good as I need the reactive filtered one defined in the function. The graph is also a bit ugly - how can I flip the chart so that wins are on bottom and draws are on top?
The second chart is all good.
For the third chart, I again need to group the data so that I have total stats across the 6 years. Currently there are 36 bubbles, but I only want 6.
Screenshots of shiny app output: https://imgur.com/a/qzqlUob
Code:
library(ggplot2)
library(shiny)
library(dplyr)
library(reshape2)
library(scales)
df <- read.csv("ANZ_Premiership_2017_2022.csv")
teams <- c("Central Pulse", "Northern Stars", "Northern Mystics",
           "Southern Steel", "Waikato Bay of Plenty Magic", "Mainland Tactix")
mdf <- melt(df %>%
              group_by(Team) %>% summarise(Wins = sum(W),
                                           Losses = sum(L),
                                           Draws = sum(D)),
            id.vars = "Team")
ui <- fluidPage(
  titlePanel("ANZ Premiership Analysis"),
  sidebarLayout(
    sidebarPanel(
      checkboxGroupInput("teams",
                         "Choose teams",
                         choices = teams,
                         selected = teams),
      sliderInput("years",
                  "Choose years",
                  sep="",
                  min=2017, max=2022, value=c(2017,2022))
    ),
    mainPanel(
      h2("Chart Tabs"),
      tabsetPanel(
        tabPanel("Wins/ Losses/ Draws", plotOutput("winLoss")),
        tabPanel("Standings", plotOutput("standings")),
        tabPanel("Goals", plotOutput ("goals"))
      )
    )
  )
)
server <- function(input, output){
  filterTeams <- reactive({
    df.selection <- filter(df, Team %in% input$teams, Year %in% (input$years[1]:input$years[2]))
  })
  output$winLoss <- renderPlot({
    ggplot(mdf, mapping=aes(Team, value, fill=variable))+
      geom_bar(stat = "identity", position = "stack")+
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
      ylab("Wins")+
      xlab("Team")
  })
  output$standings <- renderPlot({
    filterTeams() %>%
      ggplot(aes(x=Year, y=Pos, group=Team, color=Team)) +
      geom_line(size=1.25) +
      geom_point(size=2.5)+
      ggtitle("Premiership Positions") +
      ylab("Position")
  })
  output$goals <- renderPlot({
    filterTeams()%>%
      ggplot(aes(GF, GA, size=W, color=Team))+
      geom_point(alpha=0.7)+
      scale_size(range=c(5,15),name = "Wins")+
      xlab("Goals for")+
      ylab("Goals against")
  })
}
shinyApp(ui = ui, server = server)
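One way the grouping could be tied to the reactive filter is to do the summarising inside each renderPlot, on the result of filterTeams(), instead of once on the full df. The following is only a sketch under assumptions: `summarise_results` is a hypothetical helper, `filtered` is a small stand-in for what filterTeams() returns, and tidyr's pivot_longer is used in place of melt.

```r
library(dplyr)
library(tidyr)

# Small stand-in for the data frame returned by filterTeams()
filtered <- data.frame(
  Year = c(2017, 2017, 2018, 2018),
  Team = c("Southern Steel", "Central Pulse", "Central Pulse", "Southern Steel"),
  W  = c(15, 9, 12, 10),
  L  = c(0, 6, 3, 5),
  D  = c(0, 0, 0, 0),
  GF = c(1062, 783, 850, 874),
  GA = c(812, 756, 679, 866)
)

# Hypothetical helper: one row per team, totals over the selected years
summarise_results <- function(data) {
  data %>%
    group_by(Team) %>%
    summarise(Wins = sum(W), Losses = sum(L), Draws = sum(D),
              GF = sum(GF), GA = sum(GA), .groups = "drop")
}

# Totals per team: the input for the bubble chart (one bubble per team)
totals <- summarise_results(filtered)

# Long format for the stacked bar chart; the level order controls stacking
long <- totals %>%
  pivot_longer(c(Wins, Losses, Draws),
               names_to = "variable", values_to = "value") %>%
  mutate(variable = factor(variable, levels = c("Draws", "Losses", "Wins")))
```

In the app, output$winLoss could build `long` from filterTeams() and plot it with aes(Team, value, fill = variable), and output$goals could plot `totals` with aes(GF, GA, size = Wins, color = Team). As far as I know, ggplot2 stacks the first factor level on top by default, so putting Wins last should place it at the bottom; position_stack(reverse = TRUE) is another way to flip the order.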

How to transpose cells with multiple values?

I'm loading in data from Excel and there are some cells with multiple values. I would like to transpose these cells such that each value gets a row.
For instance, in my data below, I'd have 10 rows for the numbers in id and time that are currently bunched in the first row.
The other values would need to be duplicated. So, as above, I'd repeat run fish, and boat_speed ten times for the first row.
structure(list(run = c(1, 2, 3, 4, 5, 6), id = c("20 4 4 4 4 4 4 11 11 11",
"18 18 18 18 18 15 15 15 15 21 18 17 17 4 4 4 19", "8 8 8 7 7 7 7 4 4 4 4 4 4 15 15 4 4 4 4 18 18 18 18",
"7 7 7 5 16 12 12 12 4", "21 21 21 21 21 21 8 6 6 6 6 6 6 9 9 9 4 4 4 4",
"5 13 13 13 13 8"), time = c("550 1574 1575 1638 1639 1640 1641 2116 2117 2118",
"632 633 637 638 639 880 881 882 883 1365 1413 1567 1569 2204 2205 2206 2214",
"82 83 84 961 962 963 964 1527 1528 1529 1544 1545 1585 1596 1597 1649 1650 1651 1652 2001 2002 2003 2033",
"734 735 736 1119 1376 1674 1675 1676 1869", "420 421 422 423 424 425 469 926 927 936 937 938 939 1353 1354 1355 2035 2036 2037 2038",
"14 587 588 589 590 4455"), fish = c(20, 20, 20, 20, 20, 20),
boat_speed = c(0.05, 0.05, 0.05, 0.05, 0.05, 0.05)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
The tidyr::separate_rows function does exactly this. Assuming your data are stored in a data frame called df:
library(tidyverse)
df %>%
separate_rows(c(id, time))
run id time fish boat_speed
<dbl> <chr> <chr> <dbl> <dbl>
1 1 20 550 20 0.05
2 1 4 1574 20 0.05
3 1 4 1575 20 0.05
4 1 4 1638 20 0.05
5 1 4 1639 20 0.05
6 1 4 1640 20 0.05
7 1 4 1641 20 0.05
8 1 11 2116 20 0.05
9 1 11 2117 20 0.05
10 1 11 2118 20 0.05
# … with 75 more rows
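Note that id and time come back as character columns in the output above. A small sketch on made-up data shaped like the question's, assuming you want numbers instead: the `convert = TRUE` argument of separate_rows asks tidyr to convert the split columns to a suitable type.

```r
library(tidyr)

# Toy data with the same shape as the question's (values shortened)
df <- data.frame(
  run  = c(1, 2),
  id   = c("20 4 4", "18 15"),
  time = c("550 1574 1575", "632 880"),
  fish = c(20, 20)
)

# Split id and time on spaces, one row per value; run and fish are repeated
long <- separate_rows(df, id, time, sep = " ", convert = TRUE)
```

Both columns must contain the same number of values in each row, which is the case in the question's data.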

How can I reorder a variable with categorical values in dplyr [duplicate]

This question already has answers here:
Reorder rows using custom order
(2 answers)
Closed 6 years ago.
I have done some manipulations as below to arrive at the following dataframe:
df
cluster.kmeans variable max mean median min sd
1 1 MonthlySMS 191 90.32258 71.0 8 56.83801
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
4 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
5 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
6 2 MonthlySMS 155 53.18815 57.0 1 31.64533
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
9 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
10 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
11 3 MonthlySMS 215 135.60202 137.0 49 34.09794
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
14 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
15 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
16 4 MonthlySMS 136 49.93980 54.5 1 31.47778
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
19 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
20 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
I would like to order the variable column based on these strings:
top.vars_kmeans
[1] "ThreeMonthlySMS" "SixMonthlyData" "ThreeMonthlyData"
[4] "MonthlySMS" "SixMonthlySMS"
I could do it using sqldf as below:
library(sqldf)
a <- c(1,2,3,4,5)
a <- data.frame(top.vars_kmeans,a)
a <- sqldf('select a1.* ,b1.a from "MS.DATA.STATS.KMEANS" a1 inner join a b1
on a1.variable=b1."top.vars_kmeans"')
a <- sqldf('select * from a order by "cluster.kmeans",a')
a$a <- NULL
a
cluster.kmeans variable max mean median min sd
1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
I am curious whether this could be achieved using dplyr; it would improve my understanding of this wonderful package.
Any help is appreciated!
We can use arrange with match:
library(dplyr)
a %>%
arrange(cluster.kmeans, match(variable, top.vars_kmeans))
# cluster.kmeans variable max mean median min sd
#1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
#2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
#3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
#4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
#5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
#6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
#7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
#8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
#9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
#10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
#11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
#12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
#13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
#14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
#15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
#16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
#17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
#18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
#19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
#20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
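To see why this works: match returns, for each value of variable, its position in the custom order vector, and sorting by those positions gives the custom order. A minimal base R sketch on a cut-down version of the data (the names `dat`, `top_vars` and `pos` are made up for illustration):

```r
# Custom order (shortened to three values)
top_vars <- c("ThreeMonthlySMS", "SixMonthlyData", "MonthlySMS")

dat <- data.frame(
  cluster  = c(1, 1, 1, 2, 2, 2),
  variable = rep(c("MonthlySMS", "SixMonthlyData", "ThreeMonthlySMS"), 2)
)

# Position of each variable in the custom order: 3 2 1 3 2 1
pos <- match(dat$variable, top_vars)

# Sort by cluster first, then by that position
sorted <- dat[order(dat$cluster, pos), ]
```

Values that do not appear in the order vector get NA from match, which order places last by default.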
You can redefine a factor (or ordered factor) with the levels in the desired order (e.g. as stored in top.vars_kmeans):
a$variable <- factor(a$variable, levels = top.vars_kmeans)
See also the help page online, or via ?factor.
Note that this only changes the level order, not the row order. If you want to reorder the rows of the whole data.frame, follow akrun's answer.
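A small base R illustration of what the level order does, using shortened made-up vectors: once the levels are set, sorting follows the levels instead of the alphabet.

```r
top_vars <- c("ThreeMonthlySMS", "SixMonthlyData", "MonthlySMS")
variable <- c("MonthlySMS", "SixMonthlyData", "ThreeMonthlySMS")

# Levels are stored in the custom order
f <- factor(variable, levels = top_vars)

# Integer codes reflect the position in top_vars: 3 2 1
codes <- as.integer(f)

# Sorting a factor goes by level order, not alphabetically
sorted <- as.character(sort(f))
```

The same applies to dplyr::arrange, which sorts a factor column by its levels.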
You can try group_by and slice:
df %>% group_by(cluster.kmeans) %>% slice(match(top.vars_kmeans, variable))
# cluster.kmeans variable max mean median min sd
# (int) (fctr) (int) (dbl) (dbl) (int) (dbl)
#1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
#2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
#3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
#4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
#5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
#6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
#7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
#8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
#9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
#10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
#11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
#12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
#13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
#14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
#15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
#16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
#17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
#18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
#19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
#20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157

calculating Net Squared Displacement and repeating at 0 when target is reached

Apologies if a similar query has been posted - couldn't find it.
I have GPS locations (UTM) for multiple individuals.
X Y AnimalID DATE
1 550466 4789843 10 1/25/2008
2 550820 4790544 10 1/26/2008
3 551071 4791230 10 1/26/2008
4 550462 4789292 10 1/26/2008
5 550390 4789934 10 1/27/2008
6 550543 4790085 10 1/27/2008
I am attempting to calculate Net Squared Displacement and once NSD has reached at least 800m, I'd like to repeat the formula starting at 0 at the next row.
Desired output is this:
XLOC YLOC ANIMALID DATETIME Xdist Ydist NSD GROUP
1 550466 4789843 10 1/25/2008 17:00 354 701 785 1
2 550820 4790544 10 1/26/2008 1:00 605 1387 1513 1
3 551071 4791230 10 1/26/2008 9:00 609 1938 2031 2
4 550462 4789292 10 1/26/2008 17:00 72 642 646 3
5 550390 4789934 10 1/27/2008 1:00 81 793 797 3
6 550543 4790085 10 1/27/2008 9:00 82 149 170 3
7 550380 4789441 10 1/27/2008 17:00 178 192 262 3
8 550284 4789484 10 1/28/2008 1:00 559 426 703 3
9 549903 4789718 10 1/28/2008 9:00 0 35 35 3
10 550462 4789327 10 1/28/2008 17:00 574 275 636 3
11 549888 4789567 10 1/29/2008 1:00 532 263 593 3
12 549930 4789555 10 1/29/2008 9:00 65 4 65 3
13 550397 4789288 10 1/29/2008 17:00 124 140 187 3
14 550338 4789432 10 1/30/2008 1:00 554 339 649 3
15 549908 4789631 10 1/30/2008 9:00 84 75 113 3
16 550378 4789367 10 1/30/2008 17:00 657 1876 1988 3
17 550414 4789354 10 1/31/2008 1:00 531 91 539 4
18 549883 4789445 10 1/31/2008 9:00 188 136 232 4
19 550226 4789490 10 1/31/2008 17:00 126 141 189 4
20 550288 4789495 10 2/1/2008 1:00 176 187 257 4
I added the 'Group' column to indicate when 800 NSD was attained.
I'm really struggling with how exactly to code for this particular approach mainly because the first UTM has to be identical until 800m has been reached.
In other words, I can't do this:
xdist<-abs(diff(X))
ydist<-abs(diff(Y))
nsd<-sqrt(xdist^2+ydist^2)
I need to do this until the target of 800m was reached:
xdist <- abs(X in row 2 - 550466)
ydist <- abs(Y in row 2 - 4789843)
Then the unique UTMs will need to be from rows 3, 4, 17 and so on.
I hope this makes sense and I'd appreciate any help!
I think this is what you are looking for:
data$GROUP[1] <- 1
data$Xdist[1] <- data$XLOC[2] - data$XLOC[1]
data$Ydist[1] <- data$YLOC[2] - data$YLOC[1]
data$NSD[1] <- as.integer(sqrt(data$Xdist[1]^2+data$Ydist[1]^2))
for ( i in 2:(nrow(data)-1)) {
  if ( data$NSD[i-1] > 800) {
    data$Xdist[i] <- data$XLOC[i+1] - data$XLOC[i]
    data$Ydist[i] <- data$YLOC[i+1] - data$YLOC[i]
    data$NSD[i] <- as.integer(sqrt(data$Xdist[i]^2+data$Ydist[i]^2))
    data$GROUP[i] <- (data$GROUP[i-1] + 1)
  } else {
    data$Xdist[i] <- data$XLOC[i+1] - data$XLOC[i] + data$Xdist[i-1]
    data$Ydist[i] <- data$YLOC[i+1] - data$YLOC[i] + data$Ydist[i-1]
    data$NSD[i] <- as.integer(sqrt(data$Xdist[i]^2+data$Ydist[i]^2))
    data$GROUP[i] <- (data$GROUP[i-1])
  }
}
output:
> data
XLOC YLOC ANIMALID DATE TIME Xdist Ydist NSD GROUP
1 550466 4789843 10 1/25/2008 17:00 354 701 785 1
2 550820 4790544 10 1/26/2008 1:00 605 1387 1513 1
3 551071 4791230 10 1/26/2008 9:00 -609 -1938 2031 2
4 550462 4789292 10 1/26/2008 17:00 -72 642 646 3
5 550390 4789934 10 1/27/2008 1:00 81 793 797 3
6 550543 4790085 10 1/27/2008 9:00 -82 149 170 3
7 550380 4789441 10 1/27/2008 17:00 -178 192 261 3
8 550284 4789484 10 1/28/2008 1:00 -559 426 702 3
9 549903 4789718 10 1/28/2008 9:00 0 35 35 3
10 550462 4789327 10 1/28/2008 17:00 -574 275 636 3
11 549888 4789567 10 1/29/2008 1:00 -532 263 593 3
12 549930 4789555 10 1/29/2008 9:00 -65 -4 65 3
13 550397 4789288 10 1/29/2008 17:00 -124 140 187 3
14 550338 4789432 10 1/30/2008 1:00 -554 339 649 3
15 549908 4789631 10 1/30/2008 9:00 -84 75 112 3
16 550378 4789367 10 1/30/2008 17:00 -48 62 78 3
17 550414 4789354 10 1/31/2008 1:00 -579 153 598 3
18 549883 4789445 10 1/31/2008 9:00 -236 198 308 3
19 550226 4789490 10 1/31/2008 17:00 -174 203 267 3
20 550288 4789495 10 2/1/2008 1:00 NA NA NA NA
Also, I think you made a mistake in your desired output at Xdist in row 16: XLOC[17] - XLOC[16] + Xdist[15] = 550414 - 550378 + (-84) = -48, not 657 as you specified. Unless I missed something in your formula.
Hope this helps!
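The same logic can also be written with an explicit reference point, which may be closer to how the question describes it: measure every location against the current anchor row, and move the anchor forward once the distance exceeds the threshold. This is only a sketch; the function name `nsd_groups` and its arguments are made up, and like the loop above it stores the distance to the next row and uses a strict "greater than" comparison.

```r
# Hypothetical helper: distance from a moving anchor, reset past a threshold
nsd_groups <- function(x, y, threshold = 800) {
  n <- length(x)
  xdist <- ydist <- nsd <- group <- rep(NA_real_, n)
  anchor <- 1  # row that distances are measured from
  g <- 1       # current group number
  for (i in seq_len(n - 1)) {
    xdist[i] <- x[i + 1] - x[anchor]
    ydist[i] <- y[i + 1] - y[anchor]
    nsd[i]   <- sqrt(xdist[i]^2 + ydist[i]^2)
    group[i] <- g
    if (nsd[i] > threshold) {
      anchor <- i + 1  # restart from the row that crossed the threshold
      g <- g + 1
    }
  }
  data.frame(Xdist = xdist, Ydist = ydist, NSD = nsd, GROUP = group)
}

# First six locations from the question
x <- c(550466, 550820, 551071, 550462, 550390, 550543)
y <- c(4789843, 4790544, 4791230, 4789292, 4789934, 4790085)
res <- nsd_groups(x, y)
```

For several animals, the function could be applied separately to each AnimalID, for example with split and lapply. The last row stays NA because there is no following location.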

Barplot using three columns

The data in the table is given below:
Year NSW Vic. Qld SA WA Tas. NT ACT Aust.
1 1917 1904 1409 683 440 306 193 5 3 4941
2 1927 2402 1727 873 565 392 211 4 8 6182
3 1937 2693 1853 993 589 457 233 6 11 6836
4 1947 2985 2055 1106 646 502 257 11 17 7579
5 1957 3625 2656 1413 873 688 326 21 38 9640
6 1967 4295 3274 1700 1110 879 375 62 103 11799
7 1977 5002 3837 2130 1286 1204 415 104 214 14192
8 1987 5617 4210 2675 1393 1496 449 158 265 16264
9 1997 6274 4605 3401 1480 1798 474 187 310 18532
I want to plot a graph with (Year) on my x-axis and (total value) on my y-axis. The barplot should show the ACT and NT values for the respective (Years).
I tried the following command:
barplot(as.matrix(r_data$ACT, r_data$NT), main="r_data", ylab="Total", beside=TRUE)
The above command showed the barplot of ACT column per year but didn't show the Bar plot of NT column.
You have to create the matrix in a different way:
barplot(as.matrix(r_data[c("ACT", "NT")]),
main="r_data", ylab="Total", beside=TRUE)
You can also use cbind instead of as.matrix and keep the rest of your original approach:
barplot(cbind(r_data$ACT, r_data$NT),
main="r_data", ylab="Total", beside=TRUE)
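The original call fails because as.matrix only converts its first argument; the second vector is passed along as an extra argument and ignored. A short sketch on the first three rows of the table shows the difference, and also one way to get Year on the x-axis by transposing so that each year becomes a group of bars (the names `one_col`, `two_col` and `heights` are made up for illustration):

```r
# First three rows of the table (NT and ACT columns only)
r_data <- data.frame(
  Year = c(1917, 1927, 1937),
  NT   = c(5, 4, 6),
  ACT  = c(3, 8, 11)
)

# Only the first argument is converted: a 3 x 1 matrix holding ACT
one_col <- as.matrix(r_data$ACT, r_data$NT)

# cbind keeps both columns: a 3 x 2 matrix
two_col <- cbind(ACT = r_data$ACT, NT = r_data$NT)

# Transpose so that columns are years and rows are ACT and NT
heights <- t(two_col)
colnames(heights) <- as.character(r_data$Year)

# One group of bars per year, with ACT and NT side by side
barplot(heights, beside = TRUE, legend.text = TRUE,
        main = "r_data", xlab = "Year", ylab = "Total")
```

With beside = TRUE, barplot draws one group per column of the matrix, which is why the transpose puts the years along the x-axis.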
