R grep with OR statement

I've been working on an R function to filter a large data frame of baseball team batting stats by game id (e.g. "2016/10/11/chnmlb-sfnmlb-1") to create a list of past team matchups by season.
With some combinations of teams the output is correct, but with others it is not (the output contains a variety of game ids).
I'm not really familiar with grep, and assume that is the problem. I patched my grep line and list output together by searching Stack Overflow and thought I had it till testing proved otherwise.
matchup.func <- function (home, away, df) {
  matchups <- grep(paste('[0-9]{4}/[0-9]{2}/[0-9]{2}/[', home, '|', away, 'mlb]{6}-[', away, '|', home, 'mlb]{6}-[0-9]{1}', sep = ''), df$game.id, value = TRUE)
  df <- df[df$game.id %in% matchups, c(1, 3:ncol(df))]
  out <- list()
  for (n in 1:length(unique(df$season))) {
    for (s in unique(df$season)[n]) {
      out[[s]] <- subset(df, season == s)
    }
  }
  return(out)
}
sample of data frame:
bat.stats[sample(nrow(bat.stats), 3), ]
date game.id team wins losses flag ab r h d t hr rbi bb po da so lob avg obp slg ops roi season
1192 2016-04-11 2016/04/11/texmlb-seamlb-1 sea 2 5 away 38 7 14 3 0 0 7 2 27 8 11 15 0.226 0.303 0.336 0.639 0.286 R
764 2016-03-26 2016/03/26/wasmlb-slnmlb-1 sln 8 12 away 38 7 9 2 1 1 5 2 27 8 11 19 0.289 0.354 0.474 0.828 0.400 S
5705 2016-09-26 2016/09/26/oakmlb-anamlb-1 oak 67 89 home 29 2 6 1 0 1 2 2 27 13 4 12 0.260 0.322 0.404 0.726 0.429 R
sample of errant output:
matchup.func('tex', 'sea', bat.stats)
$S
date team wins losses flag ab r h d t hr rbi bb po da so lob avg obp slg ops roi season
21 2016-03-02 atl 1 0 home 32 4 7 0 0 2 3 2 27 19 2 11 0.203 0.222 0.406 0.628 1.000 S
22 2016-03-02 bal 0 1 away 40 11 14 3 2 2 11 10 27 13 4 28 0.316 0.415 0.532 0.947 0.000 S
47 2016-03-03 bal 0 2 home 41 10 17 7 0 2 10 0 27 9 3 13 0.329 0.354 0.519 0.873 0.000 S
48 2016-03-03 tba 1 1 away 33 3 5 0 1 0 3 2 24 10 8 13 0.186 0.213 0.343 0.556 0.500 S
141 2016-03-05 tba 2 2 home 35 6 6 2 0 0 5 3 27 11 5 15 0.199 0.266 0.318 0.584 0.500 S
142 2016-03-05 bal 0 5 away 41 10 17 5 1 0 10 4 27 9 10 13 0.331 0.371 0.497 0.868 0.000 S
sample of good:
matchup.func('bos', 'bal', bat.stats)
$S
date team wins losses flag ab r h d t hr rbi bb po da so lob avg obp slg ops roi season
143 2016-03-06 bal 0 6 home 34 8 14 4 0 0 8 5 27 5 8 22 0.284 0.330 0.420 0.750 0.000 S
144 2016-03-06 bos 3 2 away 38 7 10 3 0 0 7 7 24 7 13 25 0.209 0.285 0.322 0.607 0.600 S
209 2016-03-08 bos 4 3 home 37 1 12 1 1 0 1 4 27 15 8 26 0.222 0.292 0.320 0.612 0.571 S
210 2016-03-08 bal 0 8 away 36 5 12 5 0 1 4 4 27 9 4 27 0.283 0.345 0.429 0.774 0.000 S
On the good calls it gives a list of matchups split by season as it should (i.e. S, R, F, D); on the bad calls it still splits by season, but seems to return games matched only by date, not by team. Not sure what to think.

I think the issue is that regex inside [] behaves differently than you might expect: a character class matches any single one of the listed characters, in any order, so something like [texmlb|sea]{6} matches any six characters drawn from that set. Instead, you might try
matchups <- grep(paste0("(", home, "|", away, ")mlb-(", home, "|", away, ")mlb"),
                 df$game.id, value = TRUE)
That should give you either the home or the away team, followed by either the home or away team. Without more sample data though, I am not sure if this will catch edge cases.
You should also note that you don't have to match the entire string, so the date-finding regex at the beginning is likely superfluous.
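As a quick sanity check of that pattern, here it is run against a few made-up game ids in the question's format (the ids and team codes below are invented for illustration):

```r
# Made-up game ids; only the first two are a tex/sea matchup
# in either home/away order.
ids <- c("2016/04/11/texmlb-seamlb-1",
         "2016/04/12/seamlb-texmlb-1",
         "2016/04/11/texmlb-bosmlb-1")
home <- "tex"; away <- "sea"
pattern <- paste0("(", home, "|", away, ")mlb-(", home, "|", away, ")mlb")
grep(pattern, ids, value = TRUE)  # returns the first two ids only
```

Note that this pattern would also match a team paired with itself (e.g. texmlb-texmlb); that shouldn't occur in a real schedule, but it's worth knowing the pattern doesn't rule it out.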

Related

For Loop Alternative on large data frame that runs a different filter with each iteration

I'm running a loop that takes the ranking from R1[i] and filters a data frame of all rankings in the specified range, while at the same time filtering a different column against R2[i] to find the ranking of an opponent. I end up with a new data frame that only includes matches involving players in those specific ranking ranges, so that I can find the mean of a column for only those matches.
For example: Player 1 is ranked 10th and Player 2 is ranked 34th. The following code takes every match including players ranked between 5-15 +/- 20% of 10 and players ranked between 29-39 +/- 20% of 34.
It then finds the mean of Data_Dif, writes it back to row [i] of the initial data frame, and does so for every row.
This code works fine, but it's a bit messy and it takes 4 hours to run 57,000 matches. Does anyone have a faster solution, please? I have to run this every day.
for (i in 1:nrow(Data)) {
  Rank <- Data %>%
    filter(between(R1, Data$R1[i]-5-(Data$R1[i]*0.2), Data$R1[i]+5+(Data$R1[i]*0.2)) |
           between(R1, Data$R2[i]-5-(Data$R2[i]*0.2), Data$R2[i]+5+(Data$R2[i]*0.2))) %>%
    filter(between(R2, Data$R1[i]-5-(Data$R1[i]*0.2), Data$R1[i]+5+(Data$R1[i]*0.2)) |
           between(R2, Data$R2[i]-5-(Data$R2[i]*0.2), Data$R2[i]+5+(Data$R2[i]*0.2)))
  Rank_Difference <- Data$Rank_Dif[i]
  Rank <- Rank %>% filter(Rank_Dif >= Rank_Difference-5)
  Data$Rank_Adv[i] <- mean(Rank$Data_Dif)
}
Data
R1 R2 Rank_Dif Data_Dif Rank_Adv
1 2 1 1 -0.272 0.037696970
2 10 34 24 0.377 0.146838617
3 10 29 19 0.373 0.130336232
4 2 5 3 0.134 0.076242424
5 34 17 17 -0.196 0.094226519
6 1 18 17 0.144 0.186158879
7 17 25 8 0.264 0.036212219
8 42 18 24 0.041 0.102343915
9 5 13 8 -0.010 0.091952381
10 34 21 13 -0.226 0.060790576
11 2 14 12 0.022 0.122350649
12 10 158 148 0.330 0.184901961
13 11 1 10 -0.042 0.109918367
14 29 52 23 0.463 0.054469108
15 10 1000 990 0.628 0.437600000
16 17 329 312 0.445 0.307750000
17 11 20 9 0.216 0.072621875
18 417 200 217 -0.466 0.106737401
19 5 53 48 0.273 0.243890710
20 14 7 7 -0.462 0.075739414
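No answer appears in this thread, but as a minimal sketch of the same windowed filter in base R, the bound arithmetic can be factored into two helpers so the loop body stays readable. The two-row data frame below is a stand-in for the real data, and any actual speed-up over the dplyr version would need profiling:

```r
# Hypothetical two-row stand-in for Data; column names follow the question.
Data <- data.frame(R1 = c(10, 2), R2 = c(34, 5),
                   Rank_Dif = c(24, 3), Data_Dif = c(0.377, 0.134))

lo <- function(r) r - 5 - 0.2 * r  # lower edge of the +/-5, +/-20% window
hi <- function(r) r + 5 + 0.2 * r  # upper edge

Data$Rank_Adv <- NA_real_
for (i in seq_len(nrow(Data))) {
  # a row is kept if both its R1 and R2 fall inside either player's window
  in1 <- (Data$R1 >= lo(Data$R1[i]) & Data$R1 <= hi(Data$R1[i])) |
         (Data$R1 >= lo(Data$R2[i]) & Data$R1 <= hi(Data$R2[i]))
  in2 <- (Data$R2 >= lo(Data$R1[i]) & Data$R2 <= hi(Data$R1[i])) |
         (Data$R2 >= lo(Data$R2[i]) & Data$R2 <= hi(Data$R2[i]))
  keep <- in1 & in2 & Data$Rank_Dif >= Data$Rank_Dif[i] - 5
  Data$Rank_Adv[i] <- mean(Data$Data_Dif[keep])
}
```

With this toy input each row only matches itself, so Rank_Adv comes back as 0.377 and 0.134.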

Need help subsetting data based on a columns value

OD Graph Example
Let me preface by saying that I am new to R and have limited coding experience. I have a data.frame with 4 different variables, three of which are factors (replicate, dilution, and hours). My final variable is the optical density which I need as a number value.
I'm looking to graph the differences of optical density across dilution based on hours (think bar graph with positive and negative values based on dilution). The real problem for me is that I don't know how to separate my data based on hours so I can find the difference in the density between them. I feel like this is a simple task, but everywhere I've looked has led me down the wrong path.
Replicate pf_dilution hours OD
1 1 0 0 0.050
2 2 0 0 0.045
3 3 0 0 0.061
4 1 10 0 0.155
5 2 10 0 0.138
6 3 10 0 0.135
Further down the list hours go to 24 and later 48.
Graphing the data set
df %>%
  group_by(hours) %>%
  ggplot(aes(x = pf_dilution, y = OD)) +
  geom_col(aes(fill = hours), position = position_dodge()) +
  labs(title = "Optical Density of C. elegans against P. fluorescens",
       x = "PF Concentration [uL]",
       y = "OD") +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_fill_discrete(name = "Hours")
Data
df <- read.table(text = "
Replicate pf_dilution hours OD
1 0 0 0.05
2 0 0 0.045
3 0 0 0.061
1 10 0 0.155
2 10 0 0.138
3 10 0 0.135
1 20 0 0.234
2 20 0 0.212
3 20 0 0.23
1 30 0 0.31
2 30 0 0.278
3 30 0 0.279
1 40 0 0.372
2 40 0 0.392
3 40 0 0.367
1 50 0 0.426
2 50 0 0.464
3 50 0 0.443
1 60 0 0.524
2 60 0 0.546
3 60 0 0.544
1 70 0 0.624
2 70 0 0.587
3 70 0 0.55
1 80 0 0.638
2 80 0 0.658
3 80 0 0.658
1 90 0 0.721
2 90 0 0.711
3 90 0 0.711
1 100 0 0.791
2 100 0 0.791
3 100 0 0.784
1 0 24 0.059
2 0 24 0.065
3 0 24 0.063
1 10 24 0.132
2 10 24 0.106
3 10 24 0.108
1 20 24 0.186
2 20 24 0.158
3 20 24 0.184
1 30 24 0.235
2 30 24 0.206
3 30 24 0.191
1 40 24 0.263
2 40 24 0.296
3 40 24 0.255
1 50 24 0.304
2 50 24 0.333
3 50 24 0.329
1 60 24 0.358
2 60 24 0.414
3 60 24 0.414
1 70 24 0.512
2 70 24 0.438
3 70 24 0.438
1 80 24 0.509
2 80 24 0.487
3 80 24 0.481
1 90 24 0.573
2 90 24 0.528
3 90 24 0.525
1 100 24 0.633
2 100 24 0.602
3 100 24 0.607
1 0 48 0.473
2 0 48 0.392
3 0 48 0.486
1 10 48 0.473
2 10 48 0.473
3 10 48 0.491
1 20 48 0.466
2 20 48 0.437
3 20 48 0.487
1 30 48 0.469
2 30 48 0.435
3 30 48 0.424
1 40 48 0.431
2 40 48 0.439
3 40 48 0.414
1 50 48 0.42
2 50 48 0.423
3 50 48 0.402
1 60 48 0.42
2 60 48 0.523
3 60 48 0.53
1 70 48 0.531
2 70 48 0.464
3 70 48 0.45
1 80 48 0.502
2 80 48 0.511
3 80 48 0.482
1 90 48 0.549
2 90 48 0.516
3 90 48 0.488
1 100 48 0.627
2 100 48 0.562
3 100 48 0.583
",
header = TRUE,
colClasses = c("factor", "integer", "factor", "double")
)
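No answer appears in this thread, but one possible base-R approach to "separate the data by hours so differences can be taken" is to average the replicates and then spread each hour into its own column. The sketch below uses a cut-down copy of the data above; the diff_24_0 column name is invented:

```r
# Cut-down copy of the posted data: one dilution, hours 0 and 24.
df <- read.table(text = "
Replicate pf_dilution hours OD
1 0 0 0.050
2 0 0 0.045
3 0 0 0.061
1 0 24 0.059
2 0 24 0.065
3 0 24 0.063
", header = TRUE)

# Average the replicates, then put each hour in its own column.
agg  <- aggregate(OD ~ pf_dilution + hours, df, mean)
wide <- reshape(agg, idvar = "pf_dilution", timevar = "hours",
                direction = "wide")
wide$diff_24_0 <- wide$OD.24 - wide$OD.0  # OD change from hour 0 to hour 24
```

With all three hours present, reshape() would produce OD.0, OD.24, and OD.48 columns, and any pairwise difference can be computed the same way.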

Calculating the average of multiple entries with identical names

Not sure if the title makes sense, but I am new to R and, to say the least, I am confused. As you can see in the data below, I have multiple entries with the same name; for example, time ON with sample 1 appears 3 times. I want to figure out how to calculate the average of the OD at time ON for sample 1, and do the same for all repeats in the data frame. How do I go about doing this?
Thanks in advance! Hope my question makes sense.
> freednaod
time sample OD
1 ON 1 0.248
2 ON 1 0.245
3 ON 1 0.224
4 ON 2 0.262
5 ON 2 0.260
6 ON 2 0.255
7 ON 3 0.245
8 ON 3 0.249
9 ON 3 0.244
10 0 1 0.010
11 0 1 0.013
12 0 1 0.012
13 0 2 0.014
14 0 2 0.013
15 0 2 0.015
16 0 3 0.013
17 0 3 0.013
18 0 3 0.014
19 30 1 0.018
20 30 1 0.020
21 30 1 0.019
22 30 2 0.017
23 30 2 0.019
24 30 2 0.021
25 30 3 0.021
26 30 3 0.020
27 30 3 0.024
28 60 1 0.023
29 60 1 0.024
30 60 1 0.023
31 60 2 0.031
32 60 2 0.031
33 60 2 0.033
34 60 3 0.025
35 60 3 0.028
36 60 3 0.024
37 90 1 0.052
38 90 1 0.048
39 90 1 0.049
40 90 2 0.076
41 90 2 0.078
42 90 2 0.081
43 90 3 0.073
44 90 3 0.068
45 90 3 0.067
46 120 1 0.124
47 120 1 0.128
48 120 1 0.134
49 120 2 0.202
50 120 2 0.202
51 120 2 0.186
52 120 3 0.192
53 120 3 0.182
54 120 3 0.183
55 150 1 0.229
56 150 1 0.215
57 150 1 0.220
58 150 2 0.197
59 150 2 0.216
60 150 2 0.200
61 150 3 0.207
62 150 3 0.211
63 150 3 0.209
By converting the 'time' column to a factor with levels specified by its unique values, the output will be ordered the same way as the initial dataset:
aggregate(OD ~ sample + time,
          transform(freednaod, time = factor(time, levels = unique(time))),
          mean)[c(2, 1, 3)]
Or using dplyr
library(dplyr)
freednaod %>%
  group_by(time = factor(time, levels = unique(time)), sample) %>%
  summarise(OD = mean(OD))
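As a quick check, running the aggregate() call above on just the first nine posted rows (the three replicates each of samples 1-3 at time ON) gives one mean per sample:

```r
# First nine rows of the posted data, rebuilt by hand.
freednaod <- data.frame(time = "ON",
                        sample = rep(1:3, each = 3),
                        OD = c(0.248, 0.245, 0.224,
                               0.262, 0.260, 0.255,
                               0.245, 0.249, 0.244))
aggregate(OD ~ sample + time,
          transform(freednaod, time = factor(time, levels = unique(time))),
          mean)[c(2, 1, 3)]
# mean OD per sample: 0.239, 0.259, 0.246
```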

Merging uneven Panel Data frames in R

I have two sets of panel data that I would like to merge. The problem is that, for each respective time interval, the variable which links the two data sets appears more frequently in the first data frame than the second. My objective is to add each row from the second data set to its corresponding row in the first data set, even if that necessitates copying said row multiple times in the same time interval. Specifically, I am working with basketball data from the NBA. The first data set is a panel of Player and Date while the second is one of Team (Tm) and Date. Thus, each Team entry should be copied multiple times per date, once for each player on that team who played that day. I could do this easily in excel, but the data frames are too large.
The result is 0 observations of 52 variables. I've experimented with bind, match, different versions of merge, and I've searched for everything I can think of; but, nothing seems to address this issue specifically. Disclaimer, I am very new to R.
Here is my code up until my road block:
HGwd = "~/Documents/Fantasy/Basketball"
library(plm)
library(mice)
library(VIM)
library(nnet)
library(tseries)
library(foreign)
library(ggplot2)
library(truncreg)
library(boot)
Pdata = read.csv("2015-16PlayerData.csv", header = T)
attach(Pdata)
Pdata$Age = as.numeric(as.character(Pdata$Age))
Pdata$Date = as.Date(Pdata$Date, '%m/%e/%Y')
names(Pdata)[8] = "OppTm"
Pdata$GS = as.factor(as.character(Pdata$GS))
Pdata$MP = as.numeric(as.character(Pdata$MP))
Pdata$FG = as.numeric(as.character(Pdata$FG))
Pdata$FGA = as.numeric(as.character(Pdata$FGA))
Pdata$X2P = as.numeric(as.character(Pdata$X2P))
Pdata$X2PA = as.numeric(as.character(Pdata$X2PA))
Pdata$X3P = as.numeric(as.character(Pdata$X3P))
Pdata$X3PA = as.numeric(as.character(Pdata$X3PA))
Pdata$FT = as.numeric(as.character(Pdata$FT))
Pdata$FTA = as.numeric(as.character(Pdata$FTA))
Pdata$ORB = as.numeric(as.character(Pdata$ORB))
Pdata$DRB = as.numeric(as.character(Pdata$DRB))
Pdata$TRB = as.numeric(as.character(Pdata$TRB))
Pdata$AST = as.numeric(as.character(Pdata$AST))
Pdata$STL = as.numeric(as.character(Pdata$STL))
Pdata$BLK = as.numeric(as.character(Pdata$BLK))
Pdata$TOV = as.numeric(as.character(Pdata$TOV))
Pdata$PF = as.numeric(as.character(Pdata$PF))
Pdata$PTS = as.numeric(as.character(Pdata$PTS))
PdataPD = plm.data(Pdata, index = c("Player", "Date"))
attach(PdataPD)
Tdata = read.csv("2015-16TeamData.csv", header = T)
attach(Tdata)
Tdata$Date = as.Date(Tdata$Date, '%m/%e/%Y')
names(Tdata)[3] = "OppTm"
Tdata$MP = as.numeric(as.character(Tdata$MP))
Tdata$FG = as.numeric(as.character(Tdata$FG))
Tdata$FGA = as.numeric(as.character(Tdata$FGA))
Tdata$X2P = as.numeric(as.character(Tdata$X2P))
Tdata$X2PA = as.numeric(as.character(Tdata$X2PA))
Tdata$X3P = as.numeric(as.character(Tdata$X3P))
Tdata$X3PA = as.numeric(as.character(Tdata$X3PA))
Tdata$FT = as.numeric(as.character(Tdata$FT))
Tdata$FTA = as.numeric(as.character(Tdata$FTA))
Tdata$PTS = as.numeric(as.character(Tdata$PTS))
Tdata$Opp.FG = as.numeric(as.character(Tdata$Opp.FG))
Tdata$Opp.FGA = as.numeric(as.character(Tdata$Opp.FGA))
Tdata$Opp.2P = as.numeric(as.character(Tdata$Opp.2P))
Tdata$Opp.2PA = as.numeric(as.character(Tdata$Opp.2PA))
Tdata$Opp.3P = as.numeric(as.character(Tdata$Opp.3P))
Tdata$Opp.3PA = as.numeric(as.character(Tdata$Opp.3PA))
Tdata$Opp.FT = as.numeric(as.character(Tdata$Opp.FT))
Tdata$Opp.FTA = as.numeric(as.character(Tdata$Opp.FTA))
Tdata$Opp.PTS = as.numeric(as.character(Tdata$Opp.PTS))
TdataPD = plm.data(Tdata, index = c("OppTm", "Date"))
attach(TdataPD)
PD = merge(PdataPD, TdataPD, by = "OppTm", all.x = TRUE)
attach(PD)
Any help on how to do this would be greatly appreciated!
EDIT
I've tweaked it a little from last night, but still nothing seems to do the trick. See the above, updated code for what I am currently using.
Here is the output for head(PdataPD):
Player Date Rk Pos Tm X..H OppTm W.L GS MP FG FGA FG. X2P
22408 Aaron Brooks 2015-10-27 817 G CHI CLE W 0 16 3 9 0.333 3
22144 Aaron Brooks 2015-10-28 553 G CHI # BRK W 0 16 5 9 0.556 3
21987 Aaron Brooks 2015-10-30 396 G CHI # DET L 0 18 2 6 0.333 1
21456 Aaron Brooks 2015-11-01 4687 G CHI ORL W 0 16 3 11 0.273 3
21152 Aaron Brooks 2015-11-03 4383 G CHI # CHO L 0 17 5 8 0.625 1
20805 Aaron Brooks 2015-11-05 4036 G CHI OKC W 0 13 4 8 0.500 3
X2PA X2P. X3P X3PA X3P. FT FTA FT. ORB DRB TRB AST STL BLK TOV PF PTS GmSc
22408 8 0.375 0 1 0.000 0 0 NA 0 2 2 0 0 0 2 1 6 -0.9
22144 3 1.000 2 6 0.333 0 0 NA 0 1 1 3 1 0 1 4 12 8.5
21987 2 0.500 1 4 0.250 0 0 NA 0 4 4 4 0 0 0 1 5 5.2
21456 6 0.500 0 5 0.000 0 0 NA 2 1 3 1 1 1 1 4 6 1.0
21152 3 0.333 4 5 0.800 0 0 NA 0 0 0 4 1 0 0 4 14 12.6
20805 5 0.600 1 3 0.333 0 0 NA 1 1 2 0 0 0 0 1 9 5.6
FPTS H.A
22408 7.50 H
22144 20.25 A
21987 16.50 A
21456 14.75 H
21152 24.00 A
20805 12.00 H
And for head(TdataPD):
OppTm Date Rk X Opp Result MP FG FGA FG. X2P X2PA X2P. X3P X3PA
2105 ATL 2015-10-27 71 DET L 94-106 240 37 82 0.451 29 55 0.527 8 27
2075 ATL 2015-10-29 41 # NYK W 112-101 240 42 83 0.506 32 59 0.542 10 24
2047 ATL 2015-10-30 13 CHO W 97-94 240 36 83 0.434 28 60 0.467 8 23
2025 ATL 2015-11-01 437 # CHO W 94-92 240 37 88 0.420 30 59 0.508 7 29
2001 ATL 2015-11-03 413 # MIA W 98-92 240 37 90 0.411 30 69 0.435 7 21
1973 ATL 2015-11-04 385 BRK W 101-87 240 37 76 0.487 29 54 0.537 8 22
X3P. FT FTA FT. PTS Opp.FG Opp.FGA Opp.FG. Opp.2P Opp.2PA Opp.2P. Opp.3P
2105 0.296 12 15 0.800 94 37 96 0.385 25 67 0.373 12
2075 0.417 18 26 0.692 112 38 93 0.409 32 64 0.500 6
2047 0.348 17 22 0.773 97 36 88 0.409 24 58 0.414 12
2025 0.241 13 14 0.929 94 32 86 0.372 18 49 0.367 14
2001 0.333 17 22 0.773 98 38 86 0.442 33 58 0.569 5
1973 0.364 19 24 0.792 101 36 83 0.434 31 62 0.500 5
Opp.3PA Opp.3P. Opp.FT Opp.FTA Opp.FT. Opp.PTS
2105 29 0.414 20 26 0.769 106
2075 29 0.207 19 21 0.905 101
2047 30 0.400 10 13 0.769 94
2025 37 0.378 14 15 0.933 92
2001 28 0.179 11 16 0.688 92
1973 21 0.238 10 13 0.769 87
If there is a way to truncate the output from dput(head(___)), I am not familiar with it. It appears that simply erasing the excess characters would remove entire variables from the dataset.
It would help if you posted your data (or a working subset of it) and a little more detail on how you are trying to merge. If I understand what you are trying to do, you want each final data record to have individual stats for each player on a particular date, followed by the player's team's stats for that date. In this case, you should have a team column in the Player table that identifies the player's team, and then join the two tables on the composite key of Date and Team by setting the by= argument in merge:
merge(PData, TData, by=c("Date", "Team"))
The fact that the data frames are of different lengths doesn't matter--this is exactly what join/merge operations are for.
For an alternative to merge(), you might check out the dplyr package join functions at https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
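A toy illustration of that composite-key merge (the column names here are invented for the sketch, not taken from the asker's files): the single ATL team row for 2015-10-27 is reused for both players who played that date, which is exactly the duplication the question asks for.

```r
# Two players on 2015-10-27, one on 2015-10-28, all on the same team.
players <- data.frame(Player = c("A", "B", "C"),
                      Date = as.Date(c("2015-10-27", "2015-10-27", "2015-10-28")),
                      Team = "ATL",
                      PTS = c(10, 12, 8))
# One team row per date.
teams <- data.frame(Team = "ATL",
                    Date = as.Date(c("2015-10-27", "2015-10-28")),
                    TeamPTS = c(94, 112))
merged <- merge(players, teams, by = c("Date", "Team"))
# 3 rows: the 2015-10-27 team stats appear twice, once per player
```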

How to change plotting characters in Lattice

I am trying to change the kind of characters used by lattice in an xyplot using the following data
> rate
Temp Rep Ind Week Weight Rate
1 9 1 B 1 2.6713 0.254
2 9 1 B 2 2.6713 0.076
3 9 1 B 6 2.6713 0.000
4 9 1 B 8 2.6713 0.000
5 9 1 MST 1 1.0154 0.711
6 9 1 MST 2 1.0154 0.137
7 9 1 MST 6 1.0154 0.000
8 9 1 MST 8 1.0154 0.000
9 9 1 MSCT 1 1.2829 0.447
10 9 1 MSCT 2 1.2829 0.345
11 9 1 MSCT 6 1.2829 0.000
12 9 1 MSCT 8 1.2829 0.000
13 9 1 MBT 1 1.8709 0.211
14 9 1 MBT 2 1.8709 0.255
15 9 1 MBT 6 1.8709 0.000
16 9 1 MBT 8 1.8709 0.000
17 9 1 MBCT 1 2.1388 0.230
18 9 1 MBCT 2 2.1388 0.281
19 9 1 MBCT 6 2.1388 0.000
20 9 1 MBCT 8 2.1388 0.000
21 9 2 S 1 0.8779 0.287
22 9 2 S 2 0.8779 0.065
23 9 2 S 6 0.8779 0.000
24 9 2 S 8 0.8779 0.000
25 9 2 MST 1 0.7196 0.197
26 9 2 MST 2 0.7196 0.193
27 9 2 MST 6 0.7196 0.000
28 9 2 MST 8 0.7196 0.000
29 9 2 MSCT 1 1.4773 0.198
30 9 2 MSCT 2 1.4773 0.233
31 9 2 MSCT 6 1.4773 0.000
32 9 2 MSCT 8 1.4773 0.000
33 9 2 MBT 1 3.4376 0.244
34 9 2 MBT 2 3.4376 0.123
35 9 2 MBT 6 3.4376 0.000
36 9 2 MBT 8 3.4376 0.000
37 9 2 MBCT 1 1.2977 0.514
38 9 2 MBCT 2 1.2977 0.118
39 9 2 MBCT 6 1.2977 0.000
40 9 2 MBCT 8 1.2977 0.000
41 12 1 B 1 3.8078 0.262
42 12 1 B 2 3.8078 0.328
43 12 1 B 6 3.8078 0.000
44 12 1 B 8 3.8078 0.000
45 12 1 MST 1 1.6222 0.294
46 12 1 MST 2 1.6222 0.213
47 12 1 MST 6 1.6222 0.000
48 12 1 MST 8 1.6222 0.000
49 12 1 MSCT 1 1.0231 0.358
50 12 1 MSCT 2 1.0231 0.281
51 12 1 MSCT 6 1.0231 0.000
52 12 1 MSCT 8 1.0231 0.000
53 12 1 MBT 1 1.2747 0.353
54 12 1 MBT 2 1.2747 0.254
55 12 1 MBT 6 1.2747 0.000
56 12 1 MBT 8 1.2747 0.000
57 12 1 MBCT 1 1.0602 0.390
58 12 1 MBCT 2 1.0602 0.321
59 12 1 MBCT 6 1.0602 0.000
60 12 1 MBCT 8 1.0602 0.000
61 12 2 S 1 0.2584 0.733
62 12 2 S 2 0.2584 0.444
63 12 2 S 6 0.2584 0.000
64 12 2 S 8 0.2584 0.000
65 12 2 MST 1 0.6781 0.314
66 12 2 MST 2 0.6781 0.421
67 12 2 MST 6 0.6781 0.000
68 12 2 MST 8 0.6781 0.000
69 12 2 MSCT 1 0.7488 0.845
70 12 2 MSCT 2 0.7488 0.661
71 12 2 MSCT 6 0.7488 0.000
72 12 2 MSCT 8 0.7488 0.000
73 12 2 MBT 1 1.1220 0.184
74 12 2 MBT 2 1.1220 0.305
75 12 2 MBT 6 1.1220 0.000
76 12 2 MBT 8 1.1220 0.000
77 12 2 MBCT 1 1.4029 0.338
78 12 2 MBCT 2 1.4029 0.410
79 12 2 MBCT 6 1.4029 0.000
80 12 2 MBCT 8 1.4029 0.000
81 15 1 B 1 3.7202 0.340
82 15 1 B 2 3.7202 0.566
83 15 1 B 6 3.7202 0.000
84 15 1 B 8 3.7202 0.000
85 15 1 MST 1 0.7914 0.668
86 15 1 MST 2 0.7914 0.903
87 15 1 MST 6 0.7914 0.000
88 15 1 MST 8 0.7914 0.000
89 15 1 MSCT 1 1.2503 0.266
90 15 1 MSCT 2 1.2503 0.402
91 15 1 MSCT 6 1.2503 0.000
92 15 1 MSCT 8 1.2503 0.000
93 15 1 MBT 1 0.7691 0.362
94 15 1 MBT 2 0.7691 0.850
95 15 1 MBT 6 0.7691 0.000
96 15 1 MBT 8 0.7691 0.000
97 15 1 MBCT 1 1.7025 0.232
98 15 1 MBCT 2 1.7025 0.462
99 15 1 MBCT 6 1.7025 0.000
100 15 1 MBCT 8 1.7025 0.000
101 15 2 S 1 0.6142 0.084
102 15 2 S 2 0.6142 0.060
103 15 2 S 6 0.6142 0.000
104 15 2 S 8 0.6142 0.000
105 15 2 MST 1 1.0184 0.318
106 15 2 MST 2 1.0184 0.638
107 15 2 MST 6 1.0184 0.000
108 15 2 MST 8 1.0184 0.000
109 15 2 MSCT 1 1.0176 0.177
110 15 2 MSCT 2 1.0176 0.343
111 15 2 MSCT 6 1.0176 0.000
112 15 2 MSCT 8 1.0176 0.000
113 15 2 MBT 1 1.6684 0.311
114 15 2 MBT 2 1.6684 0.461
115 15 2 MBT 6 1.6684 0.000
116 15 2 MBT 8 1.6684 0.000
117 15 2 MBCT 1 2.1278 0.201
118 15 2 MBCT 2 2.1278 0.489
119 15 2 MBCT 6 2.1278 0.000
120 15 2 MBCT 8 2.1278 0.000
121 18 1 B 1 3.0669 0.233
122 18 1 B 2 3.0669 0.482
123 18 1 B 6 3.0669 0.000
124 18 1 B 8 3.0669 0.000
125 18 1 MST 1 1.1641 0.208
126 18 1 MST 2 1.1641 0.201
127 18 1 MST 6 1.1641 0.000
128 18 1 MST 8 1.1641 0.000
129 18 1 MSCT 1 1.0183 0.108
130 18 1 MSCT 2 1.0183 0.303
131 18 1 MSCT 6 1.0183 0.000
132 18 1 MSCT 8 1.0183 0.000
133 18 1 MBT 1 1.2028 -0.041
134 18 1 MBT 2 1.2028 -0.004
135 18 1 MBT 6 1.2028 0.000
136 18 1 MBT 8 1.2028 0.000
137 18 1 MBCT 1 1.6395 0.072
138 18 1 MBCT 2 1.6395 0.234
139 18 1 MBCT 6 1.6395 0.000
140 18 1 MBCT 8 1.6395 0.000
141 18 2 S 1 0.5858 0.466
142 18 2 S 2 0.5858 0.336
143 18 2 S 6 0.5858 0.000
144 18 2 S 8 0.5858 0.000
145 18 2 MST 1 1.5694 0.272
146 18 2 MST 2 1.5694 0.257
147 18 2 MST 6 1.5694 0.000
148 18 2 MST 8 1.5694 0.000
149 18 2 MSCT 1 1.1295 0.523
150 18 2 MSCT 2 1.1295 0.521
151 18 2 MSCT 6 1.1295 0.000
152 18 2 MSCT 8 1.1295 0.000
153 18 2 MBT 1 1.7526 0.105
154 18 2 MBT 2 1.7526 0.118
155 18 2 MBT 6 1.7526 0.000
156 18 2 MBT 8 1.7526 0.000
157 18 2 MBCT 1 1.6924 0.320
158 18 2 MBCT 2 1.6924 0.387
159 18 2 MBCT 6 1.6924 0.000
160 18 2 MBCT 8 1.6924 0.000
the code for plotting is
rate$Temp <- as.character(rate$Temp)
rate$Week <- as.character(rate$Week)
rate$Rep <- as.character(rate$Rep)
xyplot(Rate ~ Weight | Rep + Temp, groups = Week, rate,
       auto.key = list(columns = 2), as.table = TRUE,
       xlab = "Weight (gr)", ylab = "Rate (umol/L*gr)",
       main = "All individuals and Treatments at all times")
But this gives me all the symbols as an 'o', and I need each group plotted with a different symbol.
I like to use the theme mechanism to do this. The black and white theme will do different symbols by default; you get it like this:
bwtheme <- standard.theme("pdf", color=FALSE)
Or you can start with the color theme and modify the points as you like, as follows.
mytheme <- standard.theme("pdf")
mytheme$superpose.symbol$pch <- c(15,16,17,3)
mytheme$superpose.symbol$col <- c("blue","red","green","purple")
p4 <- xyplot(Rate ~ Weight | Rep + Temp, groups = Week, data = rate,
             as.table = TRUE,
             xlab = "Weight (gr)", ylab = "Rate (umol/L*gr)",
             main = "All individuals and Treatments at all times",
             strip = strip.custom(strip.names = 1),
             par.settings = mytheme,
             auto.key = list(title = "Week", cex.title = 1, space = "right"))
Or, if you'd rather have it all one line, just pass what you want to change to par.settings.
xyplot(Rate ~ Weight | Rep + Temp, groups = Week, data = rate,
       as.table = TRUE,
       xlab = "Weight (gr)", ylab = "Rate (umol/L*gr)",
       main = "All individuals and Treatments at all times",
       strip = strip.custom(strip.names = 1),
       par.settings = list(superpose.symbol = list(
         pch = c(15, 16, 17, 3),
         col = c("blue", "red", "green", "purple"))),
       auto.key = list(title = "Week", cex.title = 1, space = "right"))
These solutions are recommended over changing col and pch directly because then they must also be changed when building the key.
Two other notes that you may find instructive: First, try using factor instead of as.character; this will sort your weeks in the proper order. You can do this with less typing using within.
rate <- within(rate, {
  Temp <- factor(Temp)
  Week <- factor(Week)
  Rep <- factor(Rep)
})
Second, check out the useOuterStrips function in the latticeExtra package. In particular, if your original plot is saved as p, try
useOuterStrips(p, strip = strip.custom(strip.names = 1),
               strip.left = strip.custom(strip.names = 1))
I found a way of changing the characters without changing the theme, just by adding a bit more code to the plot, as follows
xyplot(Rate ~ Weight | Rep + Temp, groups = Week, rate,
       pch = c(15, 16, 17, 3),  # this defines the different plot symbols used
       col = c("blue", "red", "green", "purple"),  # this defines the colors used in the plot
       as.table = TRUE,
       xlab = "Weight (gr)", ylab = "Rate (umol/L*gr)",
       main = "All individuals and Treatments at all times",
       strip = strip.custom(strip.names = 1),  # this changes what is displayed in the strip
       key = list(text = list(c("Week", "1", "2", "6", "8")),
                  points = list(pch = c(NA, 15, 16, 17, 3),
                                col = c(NA, "blue", "red", "green", "purple")),
                  space = "right"))  # this adds a complete key