Need help plotting line chart with five lines using ggplot - r

This input data is from dput:
structure(list(Player = c("deGrom", "deGrom", "deGrom", "deGrom",
"deGrom", "deGrom", "deGrom", "Wheeler", "Wheeler", "Wheeler",
"Wheeler", "Wheeler", "Wheeler", "Syndergaard", "Syndergaard",
"Syndergaard", "Syndergaard", "Matz", "Matz", "Matz", "Matz",
"Matz", "Stroman", "Stroman"), GSc = c(66, 70, 77, 77, 79, 78,
79, 76, 70, 64, 70, 62, 70, 69, 73, 81, 62, 68, 62, 69, 68, 70,
63, 75)), row.names = c(NA, -24L), class = c("tbl_df", "tbl",
"data.frame"))
I have a data frame MetsGS3 with the data above.
I want to use ggplot to create a line chart with a different colored line for each of the five players. The x-axis will contain the numbers 2, 4, 6, 8, 10, 12. The y-axis will contain the game scores (GSc). I want the x-axis label to be Player and the y-axis label to be Game Score.
This code does not work, and I need help getting it to work. I know it is missing elements.
ggplot(MetsGS, aes(x=MetsGS$Player, y=GSc, colour = MetsGS$Player) + geom_line(size=1.2) + ggtitle("Mets Game Score Game Scores")
The last time I ran the above ggplot code in RStudio I got this error:
"Error: Incomplete expression: ggplot(MetsGS, aes(x=MetsGS$Player, y=GSc, colour = MetsGS$Player) + geom_line(size=1.2) + ggtitle("Mets Game Score Game Scores")"
Thanks in advance,
Howard

I think some data is missing from your dataset: I can't see how you are defining x as a number between 2 and 12.
So I assumed that, for each player, each row containing that player's name corresponds to a different game, and I created a new column with dplyr like this (I called your data frame d):
library(dplyr)
d %>% group_by(Player) %>% mutate(Number = seq_along(Player)*2)
# A tibble: 24 x 3
# Groups: Player [5]
Player GSc Number
<chr> <dbl> <dbl>
1 deGrom 66 2
2 deGrom 70 4
3 deGrom 77 6
4 deGrom 77 8
5 deGrom 79 10
6 deGrom 78 12
7 deGrom 79 14
8 Wheeler 76 2
9 Wheeler 70 4
10 Wheeler 64 6
# … with 14 more rows
and plot it like this:
library(ggplot2)
library(dplyr)
d %>% group_by(Player) %>% mutate(Number = seq_along(Player)*2) %>%
ggplot(., aes(x=Number, y=GSc, colour = Player)) +
geom_line(size=1.2) +
ggtitle("Mets Game Score Game Scores")+
scale_x_continuous(breaks = seq(2,14, by = 2))
Does this look like what you are looking for? If not, can you clarify your question?
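A self-contained sketch of the same idea, with axis labels added through labs(). It assumes, as the answer does, that each player's rows are in game order and that games are numbered 2, 4, 6, and so on. The names d and Number are carried over from the answer, d_numbered is a hypothetical name, and the axis label text is only a suggestion. linewidth needs ggplot2 3.4.0 or later; on older versions use size instead.

```r
library(dplyr)
library(ggplot2)

# Same values as the question's dput output, rebuilt as a plain data frame
d <- data.frame(
  Player = rep(c("deGrom", "Wheeler", "Syndergaard", "Matz", "Stroman"),
               times = c(7, 6, 4, 5, 2)),
  GSc = c(66, 70, 77, 77, 79, 78, 79, 76, 70, 64, 70, 62, 70,
          69, 73, 81, 62, 68, 62, 69, 68, 70, 63, 75)
)

# Assumption: the n-th row of a player is that player's game number 2 * n
d_numbered <- d %>%
  group_by(Player) %>%
  mutate(Number = seq_along(Player) * 2) %>%
  ungroup()

# One coloured line per player, with explicit title and axis labels
p <- ggplot(d_numbered, aes(x = Number, y = GSc, colour = Player)) +
  geom_line(linewidth = 1.2) +
  scale_x_continuous(breaks = seq(2, 14, by = 2)) +
  labs(title = "Mets Game Scores", x = "Game number", y = "Game Score")
print(p)
```

The breaks run to 14 because deGrom has seven rows; restricting the axis to 12 would drop his last game.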

Related

create directed arrow plots of two variables using ggplot2 in R

I have two variables (V1, V2) measured on the same subject (id) at two time points (timepoint). I want a scatterplot with arrow paths showing how the values moved from T1 to T2 for each subject.
In my example, some subjects have no change in V1 or V2; ideally these would be shown as a single dot (sub 1, for example), but I am OK with two dots for the two visits, since they will overlap. There are also subjects with a decrease in either V1 or V2 (sub 2, for example); those are shown with red arrows in my plot. The third group of subjects shows an increase in either V1 or V2 (sub 6 and 7); these are in green.
However, what I really need is for all arrows to point from T1 to T2. That is, I want the green arrows to change direction.
The dataset can be generated by:
datatest <- data.frame(timepoint =rep(seq(2,1),8),
id = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
V1= c( 30.29, 30.29, 21.60, 31.43, 20.75,20.75, 21.60, 30.03, 21.60, 31.30, 31.60, 21.72, 31.6, 20.02, 11.60, 20.16),
V2=c(40, 40, 30.78, 41.63, 40.41, 40.41,30.78, 40.97, 20.78, 40.84, 41.85, 41.85, 40.78, 31.79,20.78, 30.23))
which looks like this:
timepoint id V1 V2
1 2 1 30.29 40.00
2 1 1 30.29 40.00
3 2 2 21.60 30.78
4 1 2 31.43 41.63
5 2 3 20.75 40.41
6 1 3 20.75 40.41
7 2 4 21.60 30.78
8 1 4 30.03 40.97
9 2 5 21.60 20.78
10 1 5 31.30 40.84
11 2 6 31.60 41.85
12 1 6 21.72 41.85
13 2 7 31.60 40.78
14 1 7 20.02 31.79
15 2 8 11.60 20.78
16 1 8 20.16 30.23
To generate the (wrong) plot I currently have, run the code below:
library(ggplot2)
library(lemon)
ggplot(datatest, aes(V1,V2,color=as.factor(timepoint),group=id)) +ggtitle("V2 vs V1 from T1 to T2")+
geom_pointline(linesize=1, size=2, distance=4, arrow = arrow(angle = 30, length = unit(0.1, "inches"), ends = "first", type = "open") )+
scale_x_continuous(limits = c(0,33), breaks=seq(0,30,10), expand = c(0, 0)) +
scale_y_continuous(limits = c(0,43), breaks=seq(0,44,10),expand = c(0, 0))+
scale_color_manual(values=c("green","red"))+labs(color = "Timepoint")
The plot currently looks like this:
Thank you!
Would this get you closer?
library(dplyr)
library(tidyr)
library(ggplot2)
data <- data.frame(timepoint =rep(seq(2,1),8),
id = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
V1= c( 30.29, 30.29, 21.60, 31.43, 20.75,20.75, 21.60, 30.03, 21.60, 31.30, 31.60, 21.72, 31.6, 20.02, 11.60, 20.16),
V2=c(40, 40, 30.78, 41.63, 40.41, 40.41,30.78, 40.97, 20.78, 40.84, 41.85, 41.85, 40.78, 31.79,20.78, 30.23))
data <- data %>%
mutate(row_id = paste0("T", timepoint)) %>%
pivot_wider(id_cols = id,
names_from = row_id,
values_from = c(V1, V2)) %>%
mutate(colour = ifelse((V1_T1 > V1_T2) | (V2_T1 > V2_T2), "red", "green"))
ggplot(data = data) +
geom_point(aes(x = V1_T1, y = V2_T1)) +
geom_point(aes(x = V1_T2, y = V2_T2)) +
geom_segment(aes(x = V1_T1, xend = V1_T2, y = V2_T1 , yend = V2_T2, colour = colour),
arrow = arrow(length = unit(0.3,"cm"))) +
scale_x_continuous(
limits = c(0, 33),
breaks = seq(0, 30, 10),
expand = c(0, 0)
) +
scale_y_continuous(
limits = c(0, 43),
breaks = seq(0, 44, 10),
expand = c(0, 0)
) +
# use the "red"/"green" strings in the colour column as literal colours
scale_colour_identity()
You can filter the object data to remove the rows where V1 and V2 do not change, so that zero-length segments are not drawn.
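The filtering step could look like the sketch below. It rebuilds the wide data frame from the answer and assumes that exact equality is an adequate test for "no change", which is reasonable here because the values are typed in, not computed. The names wide, moved and still are hypothetical. scale_colour_identity() is used so that the strings stored in colour are drawn as literal colours.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

datatest <- data.frame(
  timepoint = rep(seq(2, 1), 8),
  id = rep(1:8, each = 2),
  V1 = c(30.29, 30.29, 21.60, 31.43, 20.75, 20.75, 21.60, 30.03,
         21.60, 31.30, 31.60, 21.72, 31.60, 20.02, 11.60, 20.16),
  V2 = c(40, 40, 30.78, 41.63, 40.41, 40.41, 30.78, 40.97,
         20.78, 40.84, 41.85, 41.85, 40.78, 31.79, 20.78, 30.23)
)

# One row per subject with T1 and T2 values side by side
wide <- datatest %>%
  mutate(row_id = paste0("T", timepoint)) %>%
  pivot_wider(id_cols = id, names_from = row_id, values_from = c(V1, V2)) %>%
  mutate(colour = ifelse((V1_T1 > V1_T2) | (V2_T1 > V2_T2), "red", "green"))

# Subjects whose position changed get an arrow; the rest get a single dot
moved <- wide %>% filter(V1_T1 != V1_T2 | V2_T1 != V2_T2)
still <- wide %>% filter(V1_T1 == V1_T2 & V2_T1 == V2_T2)

p <- ggplot() +
  geom_point(data = still, aes(x = V1_T1, y = V2_T1)) +
  geom_segment(data = moved,
               aes(x = V1_T1, xend = V1_T2, y = V2_T1, yend = V2_T2,
                   colour = colour),
               arrow = arrow(length = unit(0.3, "cm"))) +
  scale_colour_identity()
print(p)
```

With this data, subjects 1 and 3 end up in still and the other six in moved.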

Abundance map of species with ggmap -> How to adjust size of points adequately

I want to create a map with ggmap that shows which species were found where, and how many times. The "how many" should be represented by the size of the point/bubble. I could also imagine representing it by colour (like a heatmap), and then separating the species by the shape of the points.
What I've got so far:
For the background map:
B<-get_map(location = c(lon = -56.731405, lat =-61.4831206),
zoom = 6,
scale = "auto",
maptype = c("terrain"),
source = c("google"))
For the scatter plot:
D<-ggmap(B) +
geom_point(data=Anatomy,
aes(x=Anatomy$Longitude,y=Anatomy$Latitude,
color=Species,
size=??) +
scale_size_continuous(range=c(1,12))
My data=Anatomy looks like this:
Species Lat Long Station
1 A 50 60 I
2 A 50 60 I
3 A 40 30 II
4 B 50 60 I
5 B 40 30 II
6 C 50 60 I
7 C 10 10 III
8 C 10 10 III
9 C 40 30 II
My idea was to use dplyr and filter by rows and sum the categories somehow. Or what do you think? And do you think this is the best way to present this data?
Welcome to StackOverflow! To help people help you, you should strive to provide a reproducible example. Here, for example, is a small code excerpt that recreates your data:
library(tidyverse)
Anatomy <- tribble(~Species, ~Lat, ~Long, ~Station,
"A", 50, 60, "I",
"A", 50, 60, "I",
"A", 40, 30, "II",
"B", 50, 60, "I",
"B", 40, 30, "II")
There are several problems with your data/code:
projection: you will most likely need to reproject the data. Just looking at the coordinates, your points are at (50, 60), while the map shows (-50, -60). Find out the projection and use, for example, st_transform from the sf package.
quoting of variables: you do not need to name the data frame again, as in Anatomy$Latitude; just use the column name. Also, the columns in your data are actually called Lat and Long, not Latitude and Longitude.
aggregation: I would suggest just using the count() function to get the number of observations per station.
Here is a piece of code. Note that I simply flipped the signs, turning (60, 50) into (-60, -50), which is obviously wrong!
library(ggmap)
#> Loading required package: ggplot2
#> Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
#> Please cite ggmap if you use it! See citation("ggmap") for details.
library(tidyverse)
B<-get_map(location = c(lon = -56.731405, lat =-61.4831206),
zoom = 6,
scale = "auto",
maptype = c("terrain"),
source = c("google"))
library(tidyverse)
Anatomy <- tribble(~Species, ~Lat, ~Long, ~Station,
"A", 50, 60, "I",
"A", 50, 60, "I",
"A", 40, 30, "II",
"B", 50, 60, "I",
"B", 40, 30, "II")
Anatomy_clean <- Anatomy %>%
# flip the signs (a placeholder for a real reprojection); across() replaces the deprecated funs()
mutate(across(c(Lat, Long), ~ -1 * .x)) %>%
count(Species, Lat, Long, Station)
ggmap(B) +
geom_point(data=Anatomy_clean,
aes(x= Lat,y=Long, color=Species, size= n)) +
scale_size_continuous(range=c(1,12))
#> Warning: Removed 2 rows containing missing values (geom_point).
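To see what the aggregation step produces without needing a Google Maps API key, the counting can be run on its own. This is a sketch using the nine rows from the question and dplyr's count(); the name abundance is hypothetical. The resulting n column is what gets mapped to size.

```r
library(dplyr)

# The nine rows shown in the question
Anatomy <- data.frame(
  Species = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Lat     = c(50, 50, 40, 50, 40, 50, 10, 10, 40),
  Long    = c(60, 60, 30, 60, 30, 60, 10, 10, 30),
  Station = c("I", "I", "II", "I", "II", "I", "III", "III", "II")
)

# One row per species and station, with n = number of records found there
abundance <- Anatomy %>% count(Species, Lat, Long, Station)
print(abundance)
```

Species A at station I and species C at station III each get n = 2; every other combination gets n = 1.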

R: plot multiple curves vs one var but for 4 factors

I have a DF that looks like:
id app vac dac
1: 1 1000802 579 455
2: 1 1000803 1284 918
3: 1 1000807 68 66
4: 1 1000809 1470 903
5: 2 1000802 407 188
6: 2 1000803 365 364
7: 2 1000807 938 116
8: 2 1000809 699 570
I need to plot vac and dac for each app on the same canvas as a function of id. I know how to do it for only one app by using melt and bulk-plotting with ggplot, but I'm stuck on how to do it for an arbitrary number of factors/levels.
In this example there will be a total of 8 curves for 4 apps. Any thoughts?
Here's the data frame for tests. Thank you!!
df = structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), app = c(1000802,
1000803, 1000807, 1000809, 1000802, 1000803, 1000807, 1000809
), vac = c(579, 1284, 68, 1470, 407, 365, 938, 699), dac = c(455,
918, 66, 903, 188, 364, 116, 570)), .Names = c("id", "app", "vac",
"dac"), class = c("data.table", "data.frame"), row.names = c(NA,
-8L))
Edit: some clarification on axes,
x axis = id, y axis = values of vac and dac for each of 4 app factors
It is a bit unclear what you are looking for, but if you are looking for a line connecting the values of vac and dac, here is a solution using dplyr and tidyr.
First, gather the vac and dac columns (this is similar to reshape2::melt but with a syntax I find easier to follow). Then, set the variable (which has "vac" and "dac") as your x-locations, the value (from the old vac and dac columns) as your y and then map app and id to aesthetics (here, color and linetype). Set the group to ensure that it connects the right pairs of points, and add geom_line:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(variable, value, vac, dac) %>%
ggplot(aes(x = variable
, y = value
, color = factor(app)
, linetype = factor(id)
, group = paste(app, id))) +
geom_line()
This gives one line per app/id pair, each connecting its dac value to its vac value.
Given the question edit, you can change axes like so:
df %>%
gather(variable, value, vac, dac) %>%
ggplot(aes(x = id
, y = value
, color = factor(app)
, linetype = variable
, group = paste(app, variable))) +
geom_line()
This gives eight lines, one per combination of app and variable, with id on the x-axis.
I'm not sure I understood your question, but I would do something like this:
ggplot(df,aes(vac,app,group=app)) + geom_point(aes(color=factor(app)))
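gather() still works, but it is superseded in current tidyr. The reshaping step from the first answer can also be written with pivot_longer(). This is a sketch, assuming tidyr 1.0.0 or later and a plain data.frame copy of the question's data; the name long is hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same values as the question's data.table, as a plain data frame
df <- data.frame(
  id  = rep(1:2, each = 4),
  app = rep(c(1000802, 1000803, 1000807, 1000809), times = 2),
  vac = c(579, 1284, 68, 1470, 407, 365, 938, 699),
  dac = c(455, 918, 66, 903, 188, 364, 116, 570)
)

# Stack vac and dac into one value column, labelled by a variable column
long <- df %>%
  pivot_longer(c(vac, dac), names_to = "variable", values_to = "value")

# One line per app/variable combination, with id on the x-axis
p <- ggplot(long, aes(x = id, y = value,
                      color = factor(app), linetype = variable,
                      group = interaction(app, variable))) +
  geom_line()
print(p)
```

Because the grouping is built from the data, the same code works for any number of app levels.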

Draw a graph in R with header elaborate on two columns

I have a table in which each top-level header (the year) spans two columns (Reported and Detected). How can I draw a 3D graph from this table, or more generally, how can I plot tables with nested headers like this? Please suggest alternative ways to achieve this, if any.
Crime Table:
                                           year
                         2014                2015                2016
                  Reported Detected   Reported Detected   Reported Detected
Murder                 221      208        178      172         26       20
Murder(Gain)            20       16         11        9          1        1
Dacoity                 51       45         44       36          5        1
Robbery                538      316        351      201         23       10
Chain Snatching        528      394        342      229         23        0
Code:
library(tables)
#CLASS 1 CRIMES 2014
c14 <- structure(list(`Reported` = c(221, 20, 51,
538, 528), `Detected` = c(208, 16, 45, 316, 394)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity", "Robbery", "Chain Snatching"), class = "data.frame")
c14
#CLASS 1 CRIMES 2015
c15 <- structure(list(`Reported` = c(178, 11, 44,
351, 342), `Detected` = c(172, 9,
36, 201, 229)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c15
#CLASS 1 CRIMES 31-01-2016
c16 <- structure(list(`Reported` = c(26, 1, 5,
23, 23), `Detected` = c(20, 1,
1, 10, 0)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c16
# rbind with rownames as a column
st <- rbind(
data.frame(c14, year = '2014', what = factor(rownames(c14), levels = rownames(c14)),
row.names= NULL, check.names = FALSE),
data.frame(c15,year = '2015',what = factor(rownames(c15), levels = rownames(c15)),
row.names = NULL,check.names = FALSE),
data.frame(c16,year = '2016',what = factor(rownames(c16), levels = rownames(c16)),
row.names = NULL,check.names = FALSE)
)
crimetable <- tabular(Heading()*what ~ year*(`Reported` +`Detected`)*Heading()*(identity),data=st)
crimetable
As I hate 3D plots for 3-way tables and I like ggplot2, I suggest this:
Gather your data into "long" format:
library(tidyr)
st_long = gather(st, type, count, -c(year, what))
head(st_long, 3)
# year what type count
# 1 2014 Murder Reported 221
# 2 2014 Murder(Gain) Reported 20
# 3 2014 Dacoity Reported 51
As you can see, the Detected and Reported values are now stacked in a single count column, with a new column called type recording which one each row is. This is useful for ggplot2, as it can easily create facets. Facets are separate panels within the plot that share the same aesthetic components but work on different groups of data:
library(ggplot2)
ggplot(st_long, aes(year, count, group = what, color = what)) +
geom_line() +
facet_wrap(~ type)
(I am not saying that line plot is the only/best plot here, but it is often used when comparing frequencies across different time-points.)

Creating a colored scatter plot

I'm taking this data vis class in which the professor has us basically copying and pasting code instead of teaching us anything. I'm trying to figure out how to create a scatter plot which illustrates the strike rate and civilian casualties of drone warfare.
The problem I'm having is how to use a variable from the data to dictate the color of a data point. At a minimum, I want to use the "status" (dead/2, alive/1) to color the points.
It'd be ideal if I could figure out how to color the points based upon the drone target's nationality, too, since I have data for that. Anyway, this is what I have so far. It creates the points, but not the colors. I'd like to know how to create the colors.
symbols(killVStarget$name, killVStarget$strikes,
circles=sqrt(killVStarget$casualties),
col=ifelse(killVStarget$status==2, "red", "black"), cex=0.15)
I imported the data from a .csv file. Here are the first 10 entries copied from excel:
name                   nationality  status  strikes  casualties
baitullah mehsud       pakistani    2       7        164
qari hussain           pakistani    2       6        128
abu ubaidah al masri   pakistani    2       3        120
mullah sangeen zadran  pakistani    2       3        108
ayman al-zawahiri      pakistani    1       2        105
sirajudin haqqani      pakistani    1       5        82
hakimullah mehsud      pakistani    2       5        68
sadiq noor             pakistani    2       4        57
said al-shihri         yemeni       2       4        57
With ggplot2 you can map status to the color aesthetic inside aes():
library(ggplot2)
# The nine rows shown in the question, with nationality kept as its own column
df <- data.frame(
  name = c("baitullah mehsud", "qari hussain", "abu ubaidah al masri", "mullah sangeen zadran",
           "ayman al-zawahiri", "sirajudin haqqani", "hakimullah mehsud", "sadiq noor",
           "said al-shihri"),
  nationality = c(rep("pakistani", 8), "yemeni"),
  status = c(2, 2, 2, 2, 1, 1, 2, 2, 2),
  strikes = c(7, 6, 3, 3, 2, 5, 5, 4, 4),
  casualties = c(164, 128, 120, 108, 105, 82, 68, 57, 57)
)
# Names on the x-axis, strikes on the y-axis, point size by casualties
ggplot(df, aes(x = name, y = strikes, size = casualties, color = factor(status))) + geom_point()
# Same plot with the axes swapped, which keeps the names readable
ggplot(df, aes(x = strikes, y = name, size = casualties, color = factor(status))) + geom_point()
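The question also asks about coloring by nationality. A sketch of that, assuming nationality is stored in its own column and using only the nine rows shown in the question (the object name kills is hypothetical): map nationality to colour and status to shape, so both variables stay visible.

```r
library(ggplot2)

# The nine rows shown in the question
kills <- data.frame(
  name = c("baitullah mehsud", "qari hussain", "abu ubaidah al masri",
           "mullah sangeen zadran", "ayman al-zawahiri", "sirajudin haqqani",
           "hakimullah mehsud", "sadiq noor", "said al-shihri"),
  nationality = c(rep("pakistani", 8), "yemeni"),
  status = c(2, 2, 2, 2, 1, 1, 2, 2, 2),
  strikes = c(7, 6, 3, 3, 2, 5, 5, 4, 4),
  casualties = c(164, 128, 120, 108, 105, 82, 68, 57, 57)
)

# Colour by nationality, shape by status (1 = alive, 2 = dead, as in the question)
p <- ggplot(kills, aes(x = strikes, y = casualties,
                       colour = nationality,
                       shape = factor(status, labels = c("alive", "dead")))) +
  geom_point(size = 3) +
  labs(shape = "status")
print(p)
```

Wrapping status in factor() is what makes ggplot2 treat it as a category and not as a continuous number.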
