How can I create a list of three vectors with lengths 20, 30 and 40, so that the first holds the last 20 elements of data, the second the last 30, and the third the last 40?
That is, turn
data <- seq(1,100,1)
length.y <- c(20,30,40)
into
y[[1]]=seq(81,100,1)
y[[2]]=seq(71,100,1)
y[[3]]=seq(61,100,1)
I can use a for loop or write a function like this:
y <- rep(list(0),3)
for(i in 1:3){
y[[i]] <- data[(length(data)-length.y[i]+1):length(data)]
}
My real data is much more complicated than this, so
is there an easier way to get the same result (using lapply, for example)?
Using tail, as already suggested in the comments, is an easy way. You can also turn your for loop into an lapply call:
n <- length(data)
lapply(length.y, function(x) data[(n-x + 1):n])
#[[1]]
# [1] 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
#[19] 99 100
#[[2]]
# [1] 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
#[19] 89 90 91 92 93 94 95 96 97 98 99 100
#[[3]]
# [1] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
#[19] 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
#[37] 97 98 99 100
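The tail suggestion can be sketched as follows. This is a minimal illustration on the toy data from the question; tail(x, n) returns the last n elements of x.

```r
data <- seq(1, 100, 1)
length.y <- c(20, 30, 40)

# For each requested length n, keep the last n elements of data
y <- lapply(length.y, function(n) tail(data, n))

lengths(y)
# [1] 20 30 40
```

Unlike explicit index arithmetic, tail also copes with a requested length equal to or larger than length(data): it simply returns the whole vector.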
Using purrr::map:
library(purrr)
map(length.y, ~ data[-c(1:(length(data) - .x))])
[[1]]
[1] 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
[[2]]
[1] 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
[[3]]
[1] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
[35] 95 96 97 98 99 100
A representation of my sample is :
dat<-read.table(text=" AN1 AN2 AN3 ANM1 ANM2 ANM3
82 78 77 98 86 93
79 73 99 85 86 77
82 74 84 79 73 76
89 73 96 83 72 80
70 71 72 84 76 99
78 76 95 87 76 98
72 87 74 76 79 88
95 85 85 96 94 81
72 86 99 76 93 72
80 97 90 95 77 91
94 95 79 90 78 95
94 83 84 91 73 100
77 92 95 82 83 95
82 82 84 78 96 90
81 83 85 71 76 95
89 79 87 72 99 98
93 96 84 74 82 86
77 98 89 84 87 86
86 98 92 95 72 89
98 92 99 87 93 99",header=TRUE)
I want to plot the correlation between AN1 and ANM1, AN2 and ANM2, and AN3 and ANM3 using a loop. I want to get the "basic plot" which is available here, so I would end up with three separate scatter plots.
I used the following code, but it does not work:
AN<- dat[1:3]; ANM<- dat[4:6];
lapply(1:3, function(x) ggscatter(AN=[,x],ANM[,x]))
I think with a for loop your code would look better. So, to purely reproduce your example, I would do something like this:
library(ggpubr)
dat<-read.table(text=" AN1 AN2 AN3 ANM1 ANM2 ANM3
82 78 77 98 86 93
79 73 99 85 86 77
82 74 84 79 73 76
89 73 96 83 72 80
70 71 72 84 76 99
78 76 95 87 76 98
72 87 74 76 79 88
95 85 85 96 94 81
72 86 99 76 93 72
80 97 90 95 77 91
94 95 79 90 78 95
94 83 84 91 73 100
77 92 95 82 83 95
82 82 84 78 96 90
81 83 85 71 76 95
89 79 87 72 99 98
93 96 84 74 82 86
77 98 89 84 87 86
86 98 92 95 72 89
98 92 99 87 93 99",header=TRUE)
for(i in 1:3){
AN <- paste0("AN", i)
ANM <- paste0("ANM", i)
print(
ggscatter(dat, x = AN, y = ANM)
)
}
To try to create something similar with the basic plots from the link provided, I would change the for loop to something like:
for(i in 1:3){
AN <- paste0("AN", i)
ANM <- paste0("ANM", i)
print(
ggscatter(dat, x = AN, y = ANM,
add = "reg.line",
conf.int = TRUE,
add.params = list(color = "blue", fill = "lightgray")) +
stat_cor(method = "pearson", label.x = 3, label.y = 30) # Here label.x and label.y deform the plot, seems to be a case to tune them to your needs.
)
}
Now, if you must use lapply I would try to create some abstraction by creating a function:
create_plot <- function(data, prefix_x, prefix_y, index) {
x_col <- paste0(prefix_x, index)
y_col <- paste0(prefix_y, index)
g <- ggscatter(data, x = x_col, y = y_col,
add = "reg.line",
conf.int = TRUE,
add.params = list(color = "blue", fill = "lightgray")) +
stat_cor(method = "pearson")
return(g)
}
lapply(1:3, create_plot, data = dat, prefix_x = "AN", prefix_y = "ANM")
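If only the correlation coefficients are needed (the number that stat_cor prints on each plot), the same column pairing can be sketched in base R without any plotting. The block below uses just the first six rows of the sample data to stay short; Map pairs the i-th AN column with the i-th ANM column.

```r
# First six rows of the sample data from the question
dat <- data.frame(
  AN1  = c(82, 79, 82, 89, 70, 78),
  AN2  = c(78, 73, 74, 73, 71, 76),
  AN3  = c(77, 99, 84, 96, 72, 95),
  ANM1 = c(98, 85, 79, 83, 84, 87),
  ANM2 = c(86, 86, 73, 72, 76, 76),
  ANM3 = c(93, 77, 76, 80, 99, 98)
)

AN  <- dat[1:3]
ANM <- dat[4:6]

# Pearson correlation for each pair (AN1, ANM1), (AN2, ANM2), (AN3, ANM3)
cors <- unlist(Map(function(x, y) cor(x, y, method = "pearson"), AN, ANM))
cors
```

The result is a named numeric vector with one coefficient per pair, named after the AN columns.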
Here is a snippet of my code:
m <- as.data.frame.matrix(matrix(c(20, 32, 52, 84, 98, 101), ncol = 2, nrow = 3))
ages <- as.numeric()
for(i in 1:nrow(m)){
ages <- c(ages, c(m$V1[i]:m$V2[i]))
}
Essentially, the first column is the starting age, and the second column is the ending age. I'm trying to append every single age from start to end for every individual into a list. Unfortunately, this is very slow since I have around a million observations, and I'm looking for a way to optimize.
We could use mapply to create a sequence from the start age to the end age of each row:
unlist(mapply(`:`, m$V1, m$V2))
#[1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37..
#[29] 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65..
#[57] 76 77 78 79 80 81 82 83 84 32 33 34 35 36 37 38 39 40..
#[85] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68..
#[113] 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96..
#[141] 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 ..
#[169] 88 89 90 91 92 93 94 95 96 97 98 99 100 101
Here is an option using pmap:
library(purrr)
library(dplyr)
set_names(m, c('from', 'to')) %>%
pmap(., seq) %>%
unlist
Or using Map from base R:
unlist(do.call(Map, c(f = `:`, m)))
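A fully vectorised alternative, not taken from the answers above, avoids calling a function once per row. The sketch below repeats each start age once per year in its range and adds a running offset from sequence(); it may be faster on around a million rows, though that is not benchmarked here.

```r
m <- as.data.frame(matrix(c(20, 32, 52, 84, 98, 101), ncol = 2, nrow = 3))

# Number of ages contributed by each row (end - start + 1)
n <- m$V2 - m$V1 + 1

# Repeat each start age n times, then add offsets 0, 1, ..., n - 1 per row
ages <- rep(m$V1, n) + sequence(n) - 1

length(ages)
# [1] 182
```

This assumes every end age is greater than or equal to its start age; a row with V2 < V1 would need to be handled separately.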
I am very new to R but have been asked by my professor to use it to analyze our data. We are trying to run a changepoint analysis on a large data set, which I know how to do, but first we want to place our data into 30-second time bins. Our trials are 20 minutes long, so we should end up with 40 bins. We have columns for time, Flow, and MAP, and would like to average the Flow and MAP values within each 30-second bin. This will condense 1120-2000 data points into a much cleaner 40. We are having trouble binning the data and don't even know where to start. Once the data is binned, we would like to generate a table of the 40 new values (40 for MAP and 40 for Flow) so that we can use the changepoint package to find the changepoint in our set. We believe clip( could possibly be what we need.
Sorry if this is too confusing or too vague; we have no programming experience whatsoever.
Edit: I believe this is different from the bacteria question because I want direct output into a table rather than interpolating from a graph and then into a table.
Here is a sample from our data:
RawMin Flow MAP
2.9982 51 77
3.0113 110 80
3.0240 84 77
3.0393 119 75
3.0551 93 75
3.0692 136 73
3.0839 81 73
3.0988 58 72
3.1138 125 71
3.1285 89 72
3.1432 160 73
3.1576 87 74
3.1714 128 74
3.1860 90 74
3.2015 63 76
3.2154 120 76
3.2293 65 76
3.2443 156 78
3.2585 66 78
3.2723 130 78
3.2876 89 77
3.3029 111 77
3.3171 90 75
3.3329 100 76
3.3482 127 76
3.3618 69 78
3.3751 155 78
3.3898 90 79
3.4041 127 80
3.4176 103 80
3.4325 87 79
3.4484 134 78
3.4637 57 77
3.4784 147 78
3.4937 75 78
3.5080 137 78
3.5203 123 78
3.5337 99 80
3.5476 170 80
3.5620 90 79
3.5756 164 78
3.5909 85 78
3.6061 164 77
3.6203 103 77
3.6348 140 79
3.6484 152 79
3.6611 79 80
3.6742 184 82
3.6872 128 81
3.7017 123 82
3.7152 176 81
3.7295 74 81
3.7436 153 80
3.7572 85 80
3.7708 115 79
3.7847 187 78
3.7980 105 78
3.8108 175 78
3.8252 124 79
3.8392 171 79
3.8528 127 78
3.8669 138 79
3.8811 198 79
3.8944 109 80
3.9080 171 80
3.9214 137 79
3.9341 109 81
3.9455 193 83
3.9575 108 85
3.9707 163 84
3.9853 136 82
4.0005 121 81
4.0164 164 79
4.0311 73 79
4.0450 171 78
4.0591 105 79
4.0716 117 79
4.0833 210 81
4.0940 103 85
4.1041 193 88
4.1152 163 84
4.1310 145 82
4.1486 126 79
4.1654 118 77
4.1811 130 75
4.1975 83 74
4.2127 176 73
4.2277 72 74
4.2424 177 74
4.2569 90 75
4.2705 148 76
4.2841 148 77
4.2986 123 77
4.3130 150 76
4.3280 71 77
4.3433 176 76
4.3583 90 76
4.3727 138 77
4.3874 136 79
4.4007 106 80
4.4133 167 83
4.4247 119 87
4.4360 123 88
4.4496 141 85
4.4673 117 84
4.4841 133 80
4.5005 83 79
4.5166 156 77
4.5324 97 77
4.5463 182 77
4.5605 110 79
4.5744 187 80
4.5882 121 81
4.6024 142 81
4.6171 178 81
4.6313 96 80
4.6452 180 80
4.6599 107 80
4.6741 151 79
4.6876 137 80
4.7009 132 82
4.7141 199 80
4.7279 91 81
4.7402 172 83
4.7531 172 80
4.7660 128 84
4.7785 197 83
4.7909 122 84
4.8046 129 84
4.8187 176 82
4.8328 102 81
4.8448 184 81
4.8556 145 83
4.8657 123 84
4.8768 138 86
4.8885 143 82
4.9040 135 81
4.9198 112 78
4.9362 134 77
4.9515 152 76
4.9651 83 76
4.9785 177 78
4.9912 114 79
5.0037 127 80
5.0167 200 81
5.0297 104 81
5.0429 175 81
5.0559 123 81
5.0685 106 81
5.0809 176 81
5.0937 113 82
5.1064 191 81
5.1181 178 79
5.1297 121 79
5.1404 176 80
5.1506 214 83
5.1606 132 85
5.1709 149 83
5.1829 175 80
5.1981 103 79
5.2128 169 76
5.2283 97 75
5.2431 149 74
5.2575 109 74
5.2709 97 74
5.2842 195 75
5.2975 104 75
5.3106 143 77
5.3231 185 76
5.3361 140 77
5.3487 132 78
5.3614 162 79
5.3750 98 78
5.3900 137 78
5.4047 108 76
5.4202 94 76
5.4341 186 75
5.4475 82 77
5.4608 157 80
5.4739 176 81
5.4867 90 83
5.4989 123 86
Assuming RawMin is time in minutes, you could do something like this...
df2 <- aggregate(df, #the data frame
by=list(cut(df$RawMin,seq(0,10,0.5))), #the bins (see below)
mean) #the aggregating function
df2
Group.1 RawMin Flow MAP
1 (2.5,3] 2.998200 51.0000 77.00000
2 (3,3.5] 3.251682 103.5588 76.20588
3 (3.5,4] 3.748994 135.9722 79.75000
4 (4,4.5] 4.240434 132.0857 79.25714
5 (4.5,5] 4.749781 140.1892 80.43243
6 (5,5.5] 5.246556 140.9231 78.89744
Binning is done with the cut function - here by 0.5 minute intervals between 0 and 10, which you might want to change. The bin names are the intervals - e.g. (2.5,3] means greater than 2.5, less than or equal to 3.
If you don't want RawMin included in the output, just use df[,-1] in the input to aggregate.
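For a 20-minute trial, the same idea can be sketched end to end as below. This is a minimal illustration using eight rows copied from the sample data; the breaks run from 0 to 20 minutes in 0.5-minute steps, which defines the 40 bins described in the question. The names breaks and binned are chosen here for illustration.

```r
# Eight rows taken from the sample data in the question
df <- data.frame(
  RawMin = c(2.9982, 3.0113, 3.0240, 3.4937, 3.5080, 3.5203, 4.0005, 4.0164),
  Flow   = c(51, 110, 84, 75, 137, 123, 121, 164),
  MAP    = c(77, 80, 77, 78, 78, 78, 81, 79)
)

# 41 break points give 40 half-minute (30-second) bins over 20 minutes
breaks <- seq(0, 20, by = 0.5)
df$bin <- cut(df$RawMin, breaks)

# Mean Flow and MAP per bin; bins without any observations are dropped
binned <- aggregate(cbind(Flow, MAP) ~ bin, data = df, FUN = mean)
binned
```

The resulting binned$Flow and binned$MAP columns are plain numeric vectors, so they can be passed on to whichever changepoint function is being used.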
I am trying to use facets to create 6 graphs, laid out in a 2x3, with a graph for each different MLB division. I would like there to be a title for the graph as a whole, as well as a title for each graph, indicating which division it is for. I included the colors for the AL East teams, but those colors don't translate to all other divisions, so how could I go about listing the colors for all teams? Would I keep listing them in
cust <- c("#FC4C00", "#C60C30", "#1C2841", "#79BDEE","#003DA5",...,)
or would there be 6 separate lists of 5 colors?
I have included the data for each team as well as which division each team is in, followed by my attempted program.
Any help would be appreciated.
df <- read.table(textConnection(
'Year ARI ATL BAL BOS CHC CHW CIN CLE COL DET HOU KCR LAA LAD MIA MIL MIN NYM NYY OAK PHI PIT SDP SFG SEA STL TBR TEX TOR WSN
2016 69 68 89 93 103 78 68 94 75 86 84 81 74 91 79 73 59 87 84 69 71 78 68 87 86 86 68 95 89 95
2015 79 67 81 78 97 76 64 81 68 74 86 95 85 92 71 68 83 90 87 68 63 98 74 84 76 100 80 88 93 83
2014 64 79 96 71 73 73 76 85 66 90 70 89 98 94 77 82 70 79 84 88 73 88 77 88 87 90 77 67 83 96
2013 81 96 85 97 66 63 90 92 74 93 51 86 78 92 62 74 66 74 85 96 73 94 76 76 71 97 92 91 74 86
2012 81 94 93 69 61 85 97 68 64 88 55 72 89 86 69 83 66 74 95 94 81 79 76 94 75 88 90 93 73 98
2011 94 89 69 90 71 79 79 80 73 95 56 71 86 82 72 96 63 77 97 74 102 72 71 86 67 90 91 96 81 80
2010 65 91 66 89 75 88 91 69 83 81 76 67 80 80 80 77 94 79 95 81 97 57 90 92 61 86 96 90 85 69
2009 70 86 64 95 83 79 78 65 92 86 74 65 97 95 87 80 87 70 103 75 93 62 75 88 85 91 84 87 75 59
2008 82 72 68 95 97 89 74 81 74 74 86 75 100 84 84 90 88 89 89 75 92 67 63 72 61 86 97 79 86 59
2007 90 84 69 96 85 72 72 96 90 88 73 69 94 82 71 83 79 88 94 76 89 68 89 71 88 78 66 75 83 73
2006 76 79 70 86 66 90 80 78 76 95 82 62 89 88 78 75 96 97 97 93 85 67 88 76 78 83 61 80 87 71
2005 77 90 74 95 79 99 73 93 67 71 89 56 95 71 83 81 83 83 95 88 88 67 82 75 69 100 67 79 80 81
2004 51 96 78 98 89 83 76 80 68 72 92 58 92 93 83 67 92 71 101 91 86 72 87 91 63 105 70 89 67 67
2003 84 101 71 95 88 86 69 68 74 43 87 83 77 85 91 68 90 66 101 96 86 75 64 100 93 85 63 71 86 83
2002 98 101 67 93 67 81 78 74 73 55 84 62 99 92 79 56 94 75 103 103 80 72 66 95 93 97 55 72 78 83
2001 92 88 63 82 88 83 66 91 73 66 93 65 75 86 76 68 85 82 95 102 86 62 79 90 116 93 62 73 80 68
2000 85 95 74 85 65 95 85 90 82 79 72 77 82 86 79 73 69 94 87 91 65 69 76 97 91 95 69 71 83 67
1999 100 103 78 94 67 75 96 97 72 69 97 64 70 77 64 74 63 97 98 87 77 78 74 86 79 75 69 95 84 68
1998 65 106 79 92 90 80 77 89 77 65 102 72 85 83 54 74 70 88 114 74 75 69 98 89 76 83 63 88 88 65'), header = TRUE)
df2 <- read.table(textConnection(
'Division Team
ALEast BAL
ALEast BOS
ALEast NYY
ALEast TBR
ALEast TOR
ALCentral CHW
ALCentral CLE
ALCentral DET
ALCentral KCR
ALCentral MIN
ALWest HOU
ALWest LAA
ALWest OAK
ALWest SEA
ALWest TEX
NLEast ATL
NLEast MIA
NLEast NYM
NLEast PHI
NLEast WSN
NLCentral CHC
NLCentral CIN
NLCentral MIL
NLCentral PIT
NLCentral STL
NLWest ARI
NLWest COL
NLWest LAD
NLWest SDP
NLWest SFG'), header = TRUE)
df <- gather(df, Team, Wins, -Year) %>%
mutate(Team = factor(Team, c("BAL", "BOS", "NYY","TBR","TOR")))
theme_set(theme_grey() +
theme(plot.title = element_text(hjust=0.5),
axis.title.y = element_text(angle = 0, vjust = 0.5),
panel.background = element_rect(fill = "gray"),
axis.ticks=element_blank()))
cust <- c("#FC4C00", "#C60C30", "#1C2841", "#79BDEE","#003DA5")
names(cust) <- levels(df$Team)
ggplot(df, aes(x=Year, y=Wins, color = Team)) +
geom_path(aes(color = Team)) + #Change size= here to change size of lines in graph
scale_color_manual(values = cust) +
labs(title = "AL East Wins",
y = "Wins",
x = "Year")+
facet_wrap(~division) +
guides(color=guide_legend("Team",override.aes=list(size=3)))
Consider the following data handling adjustments:
Extend the factor levels in your mutate() call to all teams by subsetting the column names.
Replicate the colors vector with rep for all 6 divisions.
Reorder your Division factor levels so the AL and NL divisions are presented side by side.
Merge the two data frames so that Division is available as a column to pass to facet_wrap.
R Code
library(tidyr)
library(ggplot2)
library(dplyr)
...
df <- gather(df, Team, Wins, -Year) %>%
mutate(Team = factor(Team, names(df)[2:ncol(df)]))
df3 <- merge(df, df2, by="Team")
df3$Division <- factor(df3$Division,
levels = c("ALEast", "NLEast", "ALCentral", "NLCentral", "ALWest", "NLWest"))
cust <- rep(c("#FC4C00", "#C60C30", "#1C2841", "#79BDEE","#003DA5"), 6)
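One caveat with an unnamed, repeated colour vector is that colours are matched to teams by factor-level order, so two teams in the same division can end up with the same colour. A possible variation, sketched below, names the colour vector by team so that the five colours are cycled within each division. The names palette5 and pos are illustrative, and reusing the AL East palette for every division is an assumption made only to keep the example short.

```r
divisions <- c("ALEast", "ALCentral", "ALWest", "NLEast", "NLCentral", "NLWest")
df2 <- data.frame(
  Division = rep(divisions, each = 5),
  Team = c("BAL", "BOS", "NYY", "TBR", "TOR",
           "CHW", "CLE", "DET", "KCR", "MIN",
           "HOU", "LAA", "OAK", "SEA", "TEX",
           "ATL", "MIA", "NYM", "PHI", "WSN",
           "CHC", "CIN", "MIL", "PIT", "STL",
           "ARI", "COL", "LAD", "SDP", "SFG"),
  stringsAsFactors = FALSE
)

palette5 <- c("#FC4C00", "#C60C30", "#1C2841", "#79BDEE", "#003DA5")

# Position (1..5) of each team within its own division
pos <- ave(seq_along(df2$Team), df2$Division, FUN = seq_along)

# Named vector: scale_color_manual(values = cust) matches colours by team name
cust <- setNames(palette5[pos], df2$Team)
head(cust, 5)
```

Because the vector is named, the order of the Team factor levels no longer affects which colour a team receives.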
Facet_Wrap Graph
After the data handling, build the faceted plot, passing the ncol and nrow arguments to facet_wrap to arrange the panels in a grid of 2 columns by 3 rows.
ggplot(df3, aes(x=Year, y=Wins, color = Team)) +
geom_path(aes(color = Team)) + #Change size= here to change size of lines in graph
scale_color_manual(values = cust) +
labs(title = "AL East Wins",
y = "Wins",
x = "Year")+
facet_wrap(~Division, ncol=2, nrow=3) +
guides(color=guide_legend("Team",override.aes=list(size=3)))
However, you will notice that all teams share one legend and one title, because ggplot uses a single legend and title for the entire plot.
Grid.Arrange Graphs
To resolve the above, consider laying out multiple plots with gridExtra's grid.arrange, where you can filter the data frame dynamically and pass a unique title to each plot:
library(gridExtra)
mlb_plots <- lapply(c("ALEast", "NLEast", "ALCentral", "NLCentral", "ALWest", "NLWest"), function(d) {
ggplot(df3[df3$Division==d,], aes(x=Year, y=Wins, color = Team)) +
geom_path(aes(color = Team)) + #Change size= here to change size of lines in graph
scale_color_manual(values = cust) +
labs(title = paste(substr(d, 1, 2), substr(d, 3, nchar(as.character(d)))), # FOR SPACE BETWEEN AL/NL AND REST
y = "Wins",
x = "Year") +
guides(color=guide_legend("Team",override.aes=list(size=3)))
})
do.call(grid.arrange, mlb_plots)
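To also get one title above the whole arrangement, grid.arrange accepts a top argument, and ncol fixes the number of columns. The sketch below uses a tiny toy subset of the wins data (BAL and ATL, 2014 to 2016) rather than the full df3, and the overall title text is only an example.

```r
library(ggplot2)
library(gridExtra)

# Toy subset: one team from each of two divisions, three seasons
toy <- data.frame(
  Year     = rep(c(2014, 2015, 2016), times = 2),
  Team     = rep(c("BAL", "ATL"), each = 3),
  Wins     = c(96, 81, 89, 79, 67, 68),
  Division = rep(c("ALEast", "NLEast"), each = 3)
)

# One plot per division, each with its own title
plots <- lapply(unique(toy$Division), function(d) {
  ggplot(toy[toy$Division == d, ], aes(x = Year, y = Wins, color = Team)) +
    geom_path() +
    labs(title = d)
})

# top = adds a single title above all plots; ncol = sets the layout
g <- grid.arrange(grobs = plots, ncol = 2, top = "MLB Wins by Division")
```

With the six division plots from the answer, the equivalent call would pass grobs = mlb_plots and ncol = 2 to get three rows of two plots.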
I am looking to make some changes to this graph. I would like:
(1) The lines in the legend to be a lot thicker
(2) The lines on the graph to be slightly thicker
(3) This one might be tricky. I was thinking of having a graph for all 6 MLB divisions, maybe as a 2x3. Right now, it is just the AL East data. Any ideas on how I could do that? I would like to have one title for the whole graph, and a sub title for each graph indicating which division it's for.
Any help with any of these would be greatly appreciated.
df <- read.table(textConnection(
'Year ARI ATL BAL BOS CHC CHW CIN CLE COL DET HOU KCR LAA LAD MIA MIL MIN NYM NYY OAK PHI PIT SDP SFG SEA STL TBR TEX TOR WSN
2016 69 68 89 93 103 78 68 94 75 86 84 81 74 91 79 73 59 87 84 69 71 78 68 87 86 86 68 95 89 95
2015 79 67 81 78 97 76 64 81 68 74 86 95 85 92 71 68 83 90 87 68 63 98 74 84 76 100 80 88 93 83
2014 64 79 96 71 73 73 76 85 66 90 70 89 98 94 77 82 70 79 84 88 73 88 77 88 87 90 77 67 83 96
2013 81 96 85 97 66 63 90 92 74 93 51 86 78 92 62 74 66 74 85 96 73 94 76 76 71 97 92 91 74 86
2012 81 94 93 69 61 85 97 68 64 88 55 72 89 86 69 83 66 74 95 94 81 79 76 94 75 88 90 93 73 98
2011 94 89 69 90 71 79 79 80 73 95 56 71 86 82 72 96 63 77 97 74 102 72 71 86 67 90 91 96 81 80
2010 65 91 66 89 75 88 91 69 83 81 76 67 80 80 80 77 94 79 95 81 97 57 90 92 61 86 96 90 85 69
2009 70 86 64 95 83 79 78 65 92 86 74 65 97 95 87 80 87 70 103 75 93 62 75 88 85 91 84 87 75 59
2008 82 72 68 95 97 89 74 81 74 74 86 75 100 84 84 90 88 89 89 75 92 67 63 72 61 86 97 79 86 59
2007 90 84 69 96 85 72 72 96 90 88 73 69 94 82 71 83 79 88 94 76 89 68 89 71 88 78 66 75 83 73
2006 76 79 70 86 66 90 80 78 76 95 82 62 89 88 78 75 96 97 97 93 85 67 88 76 78 83 61 80 87 71
2005 77 90 74 95 79 99 73 93 67 71 89 56 95 71 83 81 83 83 95 88 88 67 82 75 69 100 67 79 80 81
2004 51 96 78 98 89 83 76 80 68 72 92 58 92 93 83 67 92 71 101 91 86 72 87 91 63 105 70 89 67 67
2003 84 101 71 95 88 86 69 68 74 43 87 83 77 85 91 68 90 66 101 96 86 75 64 100 93 85 63 71 86 83
2002 98 101 67 93 67 81 78 74 73 55 84 62 99 92 79 56 94 75 103 103 80 72 66 95 93 97 55 72 78 83
2001 92 88 63 82 88 83 66 91 73 66 93 65 75 86 76 68 85 82 95 102 86 62 79 90 116 93 62 73 80 68
2000 85 95 74 85 65 95 85 90 82 79 72 77 82 86 79 73 69 94 87 91 65 69 76 97 91 95 69 71 83 67
1999 100 103 78 94 67 75 96 97 72 69 97 64 70 77 64 74 63 97 98 87 77 78 74 86 79 75 69 95 84 68
1998 65 106 79 92 90 80 77 89 77 65 102 72 85 83 54 74 70 88 114 74 75 69 98 89 76 83 63 88 88 65'), header = TRUE)
df <- gather(df, Team, Wins, -Year) %>%
mutate(Team = factor(Team, c("BAL", "BOS", "NYY","TBR","TOR")))
theme_set(theme_grey() +
theme(plot.title = element_text(hjust=0.5),
axis.title.y = element_text(angle = 0, vjust = 0.5),
panel.background = element_rect(fill = "gray"),
axis.ticks=element_blank()))
cust <- c("#FC4C00", "#C60C30", "#1C2841", "#79BDEE","#003DA5")
names(cust) <- levels(df$Team)
ggplot(df, aes(x=Year, y=Wins, color = Team)) +
geom_path(aes(color = Team)) + #Change size= here to change size of lines in graph
scale_color_manual(values = cust) +
labs(title = "AL East Wins",
y = "Wins",
x = "Year")+
guides(color=guide_legend("Team",override.aes=list(size=3)))
To get the answers for 1 & 2 you could do:
ggplot(df, aes(x=Year, y=Wins, color = Team)) +
geom_path(aes(color = Team),size=1) + #Change size= here to change size of lines in graph
scale_color_manual(values = cust) +
labs(title = "AL East Wins",
y = "Wins",
x = "Year")+
guides(color=guide_legend("Team",override.aes=list(size=3))) #Change size= here to change size of lines in legend
Note that in the geom_path call we add size= outside of aes() to change the size of all lines in the plot. To change the size of the lines in the legend, we use override.aes=list(size=3).
To answer part 3 of your question, I would suggest using facets. You should add another variable to your dataset which represents each division; adding + facet_wrap(~division) should then get you what you want.
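In ggplot2 3.4.0 and later, the thickness of lines is controlled by linewidth, and using size for lines triggers a deprecation warning. Assuming such a version is installed, the two settings can be sketched as below on a toy subset of the data (BAL and BOS, 2014 to 2016).

```r
library(ggplot2)

# Toy subset of the wins data: two AL East teams, three seasons
toy <- data.frame(
  Year = rep(c(2014, 2015, 2016), times = 2),
  Team = rep(c("BAL", "BOS"), each = 3),
  Wins = c(96, 81, 89, 71, 78, 93)
)

p <- ggplot(toy, aes(x = Year, y = Wins, color = Team)) +
  geom_path(linewidth = 1) +  # thickness of the lines in the plot
  guides(color = guide_legend(
    "Team",
    override.aes = list(linewidth = 3)  # thickness of the lines in the legend
  ))
p
```

On older ggplot2 versions the size-based code from the answer remains the form to use.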
A very rudimentary example creating a fake grouping variable would be:
df$division<-base::sample(LETTERS[1:6],nrow(df),replace=T)
ggplot(df, aes(x=Year, y=Wins, color = Team)) +
geom_path(aes(color = Team),size=1) + #Change size= here to change size of lines in graph
scale_color_manual(values = cust) +
labs(title = "AL East Wins",
y = "Wins",
x = "Year")+
facet_wrap(~division) +
guides(color=guide_legend("Team",override.aes=list(size=3))) #Change size= here to change size of lines in legend