R: Manipulating dataframes - r

Df_01a
Name re1 re2 re3 parameter
a 144 39.7 0.012 fed
b 223 31.2 5 fed
c 304 6.53 100 fed
d 187 51.3 25 fed
e 110 2.94 100 fed
f 151 4.23 75 fed
g 127 36.7 0.012 fed
Df_01b
Name re1 re2 re3 parameter
a 142 39.3 0.042 feh
b 221 31.0 4 feh
c 301 6.13 90 feh
d 185 41.3 15 feh
e 107 2.44 940 feh
f 143 2.23 75 feh
g 121 31.7 0.012 feh
Df_02
parameter c1 c2 c3
1 fed 5 4 3
2 feh 3 4 2
3 fea 5 4 3
4 few 2 4 3
Desired result:
c-value re-value name
5 142 a_fed
4 39.3 a_fed
3 0.042 a_fed
5 221 b_fed
4 31.0 b_fed
3 4 b_fed
5 304 c_fed
4 6.53 c_fed
3 100 c_fed
....
3 0.012 g_fed
3 142 a_feh
4 39.3 a_feh
2 0.042 a_feh
3 221 b_feh
4 31.0 b_feh
2 4 b_feh
....
I have Df_01a, Df_01b, Df_01c, Df_01d. These have a parameter in
column 5: fed, feh, fea, few, respectively (See Df_02).
Each parameter has 3 values, given by c1, c2 and c3 in Df_02.
How can I get the desired data.frame shown above?

code
library(dplyr)
library(tidyr)
rbind(Df_01a,Df_01b) %>% gather("re-col","re-value",c("re1","re2","re3")) %>%
inner_join(Df_02 %>% rename(re1=c1,re2=c2,re3=c3) %>% gather("re-col","c-value",c("re1","re2","re3"))) %>%
arrange(parameter,Name) %>%
unite(name,Name,parameter) %>%
select(`c-value`,`re-value`,`name`)
result
# c-value re-value name
# 1 5 144.000 a_fed
# 2 4 39.700 a_fed
# 3 3 0.012 a_fed
# 4 5 223.000 b_fed
# 5 4 31.200 b_fed
# 6 3 5.000 b_fed
# 7 5 304.000 c_fed
# 8 4 6.530 c_fed
# 9 3 100.000 c_fed
# 10 5 187.000 d_fed
# 11 4 51.300 d_fed
# 12 3 25.000 d_fed
# 13 5 110.000 e_fed
# 14 4 2.940 e_fed
# 15 3 100.000 e_fed
# 16 5 151.000 f_fed
# 17 4 4.230 f_fed
# 18 3 75.000 f_fed
# 19 5 127.000 g_fed
# 20 4 36.700 g_fed
# 21 3 0.012 g_fed
# 22 3 142.000 a_feh
# 23 4 39.300 a_feh
# 24 2 0.042 a_feh
# 25 3 221.000 b_feh
# 26 4 31.000 b_feh
# 27 2 4.000 b_feh
# 28 3 301.000 c_feh
# 29 4 6.130 c_feh
# 30 2 90.000 c_feh
# 31 3 185.000 d_feh
# 32 4 41.300 d_feh
# 33 2 15.000 d_feh
# 34 3 107.000 e_feh
# 35 4 2.440 e_feh
# 36 2 940.000 e_feh
# 37 3 143.000 f_feh
# 38 4 2.230 f_feh
# 39 2 75.000 f_feh
# 40 3 121.000 g_feh
# 41 4 31.700 g_feh
# 42 2 0.012 g_feh
data
Df_01a <- read.table(text="Name re1 re2 re3 parameter
a 144 39.7 0.012 fed
b 223 31.2 5 fed
c 304 6.53 100 fed
d 187 51.3 25 fed
e 110 2.94 100 fed
f 151 4.23 75 fed
g 127 36.7 0.012 fed",header=T,stringsAsFactors=F)
Df_01b <- read.table(text="Name re1 re2 re3 parameter
a 142 39.3 0.042 feh
b 221 31.0 4 feh
c 301 6.13 90 feh
d 185 41.3 15 feh
e 107 2.44 940 feh
f 143 2.23 75 feh
g 121 31.7 0.012 feh",header=T,stringsAsFactors=F)
Df_02 <- read.table(text="parameter c1 c2 c3
1 fed 5 4 3
2 feh 3 4 2
3 fea 5 4 3
4 few 2 4 3",header=T,stringsAsFactors=F)
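The answer above relies on gather(), which tidyr has since superseded by pivot_longer(). A minimal sketch of the same reshape-and-join idea with pivot_longer() follows. It uses a two-row excerpt of Df_01a and Df_02, and the helper column name idx and the output names c_value and re_value are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Two-row excerpt of the question's data (assumption: shortened for brevity)
Df_01a <- data.frame(Name = c("a", "b"), re1 = c(144, 223),
                     re2 = c(39.7, 31.2), re3 = c(0.012, 5),
                     parameter = "fed", stringsAsFactors = FALSE)
Df_02 <- data.frame(parameter = c("fed", "feh"), c1 = c(5, 3),
                    c2 = c(4, 4), c3 = c(3, 2), stringsAsFactors = FALSE)

# Long form of the re columns; idx holds "1", "2", "3"
re_long <- Df_01a %>%
  pivot_longer(re1:re3, names_to = "idx", names_prefix = "re",
               values_to = "re_value")

# Long form of the c columns, keyed by the same idx
c_long <- Df_02 %>%
  pivot_longer(c1:c3, names_to = "idx", names_prefix = "c",
               values_to = "c_value")

# Match each re value with the c value of the same parameter and position
res <- re_long %>%
  inner_join(c_long, by = c("parameter", "idx")) %>%
  unite(name, Name, parameter) %>%
  select(c_value, re_value, name)
res
```

Joining on an explicit index avoids renaming c1..c3 to re1..re3 first, and naming the join keys in by= silences the "Joining, by = ..." message.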

Related

Classification using custom ordination rule

In my example, I want to use the following code:
# Classification dataset
library(dplyr)
nest <- c(1,3,4,7,12,13,21,25,26,28)
finder_max <- c(9,50,25,50,25,50,9,9,9,3)
max_TA <- c(7.4,29.4,17.0,33.1,16.2,34.4,4.3,3.52,7.47,1.4)
ds.class <- data.frame(nest,finder_max,max_TA)
ds.class$ClassType <- ifelse(ds.class$finder_max==3,"Class_1_3",
ifelse(ds.class$finder_max==9,"Class_3_9",
ifelse(ds.class$finder_max==25,"Class_9_25",
ifelse(ds.class$finder_max==50,"Class_25_50","Class_50_51"))))
ds.class
# nest finder_max max_TA ClassType
# 1 1 9 7.40 Class_3_9
# 2 3 50 29.40 Class_25_50
# 3 4 25 17.00 Class_9_25
# 4 7 50 33.10 Class_25_50
# 5 12 25 16.20 Class_9_25
# 6 13 50 34.40 Class_25_50
# 7 21 9 4.30 Class_3_9
# 8 25 9 3.52 Class_3_9
# 9 26 9 7.47 Class_3_9
# 10 28 3 1.40 Class_1_3
# Custom ordination vector
custom.vec <- c("Class_0_1","Class_1_3","Class_3_9",
"Class_9_25","Class_25_50","Class_50")
# Original dataset
my.ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/test_ants.csv")
my.ds$ClassType <- cut(my.ds$AT,breaks=c(-Inf,1,2.9,8.9,24.9,49.9,Inf),
right=FALSE,labels=c("Class_0_1","Class_1_3","Class_3_9",
"Class_9_25","Class_25_50","Class_50"))
str(my.ds)
# 'data.frame': 55 obs. of 4 variables:
# $ days : int 0 47 76 0 47 76 118 160 193 227 ...
# $ nest : int 2 2 2 3 3 3 3 3 3 3 ...
# $ AT : num 10.92 22.86 23.24 0.14 0.48 ...
# $ ClassType: Factor w/ 6 levels "Class_0_1","Class_1_3",..: 4 4 4 1 1 1 1 1 1 1 ...
I'd like to remove the rows in my.ds whose ClassType matches the ClassType recorded in ds.class for the same nest. I also need to remove the classes that rank higher than that ClassType in my custom ordination (custom.vec).
Example: if nest 3 has ClassType Class_25_50 in ds.class, I need to remove the rows with this ClassType and with any higher class ("Class_50"), if they exist, for nest 3 in my.ds.
My desired output, new.my.ds, should be:
new.my.ds
# days nest AT ClassType
# 1 0 2 10.9200 Class_9_25
# 2 47 2 22.8600 Class_9_25
# 3 76 2 23.2400 Class_9_25
# 4 0 3 0.1400 Class_0_1
# 5 47 3 0.4800 Class_0_1
# 6 76 3 0.8300 Class_0_1
# 7 118 3 0.8300 Class_0_1
# 8 160 3 0.9400 Class_0_1
# 9 193 3 0.9400 Class_0_1
# 10 227 3 0.9400 Class_0_1
# 11 262 3 0.9400 Class_0_1
# 12 306 3 0.9400 Class_0_1
# 13 355 3 11.9300 Class_9_25
# 14 396 3 12.8100 Class_9_25
# 16 0 4 1.0000 Class_1_3
# 17 76 4 1.5600 Class_1_3
# 18 160 4 2.8800 Class_1_3
# 19 193 4 2.8800 Class_1_3
# 20 227 4 2.8800 Class_1_3
# 21 262 4 2.8800 Class_1_3
# 22 306 4 2.8800 Class_1_3
# 24 0 7 11.7100 Class_9_25
# 25 47 7 24.7900 Class_9_25
#...
# 55 349 1067 0.9600 Class_0_1
Please, any help with it?
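The rule described above can be sketched in base R by turning each class into its position in custom.vec and comparing positions per nest. This is only a sketch on a small made-up excerpt, not the CSV from the question. It assumes ds.class holds at most one row per nest and that nests absent from ds.class keep all their rows.

```r
custom.vec <- c("Class_0_1", "Class_1_3", "Class_3_9",
                "Class_9_25", "Class_25_50", "Class_50")

# Made-up excerpts (assumption: not the real files)
ds.class <- data.frame(nest = c(3, 4),
                       ClassType = c("Class_25_50", "Class_9_25"),
                       stringsAsFactors = FALSE)
my.ds <- data.frame(
  nest = c(2, 3, 3, 3, 4, 4),
  AT = c(10.92, 0.14, 11.93, 30.5, 1.00, 17.0),
  ClassType = c("Class_9_25", "Class_0_1", "Class_9_25",
                "Class_25_50", "Class_1_3", "Class_9_25"),
  stringsAsFactors = FALSE
)

# Position of each row's class in the custom order
row_rank <- match(my.ds$ClassType, custom.vec)

# Position of the cut-off class for the row's nest (NA if the nest is not in ds.class)
cut_rank <- match(ds.class$ClassType, custom.vec)[match(my.ds$nest, ds.class$nest)]

# Keep rows with no cut-off, or with a class strictly below the cut-off
keep <- is.na(cut_rank) | row_rank < cut_rank
new.my.ds <- my.ds[keep, ]
new.my.ds
```

Note that a label missing from custom.vec (such as "Class_50_51" in the ifelse() chain above) gives an NA position, so under this sketch the rows of that nest would all be kept.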

Looping over Dataframes and Columns in R

I have a dataframe with this structure:
df <- read.table(text="
site date v1 v2 v3 v4
a 2019-08-01 0 17 94 150
b 2019-08-01 5 25 83 148
c 2019-08-01 6 39 43 148
d 2019-08-01 10 39 144 165
a 2019-03-31 4 15 106 154
b 2019-03-31 4 21 70 151
c 2019-03-31 8 30 44 148
d 2019-03-31 9 41 144 160
a 2019-01-04 3 10 104 153
b 2019-01-04 2 16 90 150
c 2019-01-04 8 40 62 151
d 2019-01-04 9 43 142 162
a 2019-07-07 3 14 93 152
b 2019-07-07 2 23 74 147
c 2019-07-07 9 31 58 147
d 2019-07-07 9 36 123 170
a 2019-06-17 0 12 91 153
b 2019-06-17 3 25 73 147
c 2019-06-17 7 35 45 146
d 2019-06-17 8 40 134 168
a 2019-01-11 4 14 104 153
b 2019-01-11 5 18 73 151
c 2019-01-11 7 35 65 147
d 2019-01-11 11 44 134 168
a 2019-11-11 4 20 103 152
b 2019-11-11 6 22 79 152
c 2019-11-11 5 38 52 147
d 2019-11-11 10 38 144 163
a 2019-09-06 3 13 102 155
b 2019-09-06 6 17 74 149
c 2019-09-06 9 32 45 146
d 2019-09-06 11 42 138 165
", header=TRUE, stringsAsFactors=FALSE)
Now, I would like to calculate the statistics (min, max, mean, median, sd) of the variables (v1 - v4) for each of the sites, once for the full year, once for summer only and once for winter only.
First I subsetted the data for the summer and winter using the following code:
df_summer <- selectByDate(df, month = c(4:9))
df_winter <- selectByDate(df, month = c(1,2,3,10,11,12))
Then I tried to build a loop over the seasons and then over the variables. For this I created a list of data frames and a vector of column names:
df_list <- list(df, df_summer, df_winter)
col_names <- c("v1", "v2", "v3", "v4")
which I then tried to implement in the loop:
for (i in seq_along(df_list)){
for (j in col_names[,i]){
[j]_[i] <- describeBy([i]$[,j], [i]$site)
[j]_[i] <- data.frame(matrix(unlist([j]_[i]), nrow=length([j]_[i]), byrow=T))
[j]_[i]$site <- c("Frau2", "MW", "Sys1", "Sys4")
[j]_[i]$season <- c([i], [i], [i], [i])
[j]_[i]$type <- c([j], [j], [j], [j])
}
}
But this did not work - I get the messages:
Error: unexpected '[' in:
"for (j in col_names[,i]){
["
Error: unexpected '[' in " ["
Error: unexpected '}' in " }"
I already used this loop "workflow" to generate the data I wanted, but that was done with copy and paste to get the data quick and dirty. Now I would like to tidy up the code.
Do you have an idea how I could make this work, or what I am doing wrong?
Thank you!
Matthias
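The syntax errors come from expressions such as [j]_[i]: R cannot build a variable name that way, and col_names[,i] indexes a plain vector as if it were a matrix. A working nested loop can instead store each result in a named list. The sketch below uses base R only (no selectByDate or describeBy) on a two-site, two-variable excerpt of the data; the names results, month_num and all_stats are made up for illustration.

```r
# Excerpt of the question's data: sites a and b, variables v1 and v2
df <- data.frame(
  site = rep(c("a", "b"), times = 4),
  date = as.Date(rep(c("2019-08-01", "2019-03-31",
                       "2019-01-04", "2019-07-07"), each = 2)),
  v1 = c(0, 5, 4, 4, 3, 2, 3, 2),
  v2 = c(17, 25, 15, 21, 10, 16, 14, 23),
  stringsAsFactors = FALSE
)

# Split into seasons by month number (April to September counts as summer)
month_num <- as.integer(format(df$date, "%m"))
is_summer <- month_num %in% 4:9
df_list <- list(full_year = df, summer = df[is_summer, ], winter = df[!is_summer, ])
col_names <- c("v1", "v2")

results <- list()
for (season in names(df_list)) {
  for (v in col_names) {
    d <- df_list[[season]]
    # One row of statistics per site
    stats <- do.call(rbind, lapply(split(d[[v]], d$site), function(x) {
      data.frame(min = min(x), max = max(x), mean = mean(x),
                 median = median(x), sd = sd(x))
    }))
    stats$site <- rownames(stats)
    stats$season <- season
    stats$type <- v
    # Store under a name such as "v1_summer" instead of creating a variable
    results[[paste(v, season, sep = "_")]] <- stats
  }
}

# Optionally stack everything into one data frame
all_stats <- do.call(rbind, results)
```

A single element can then be retrieved with, for example, results[["v1_summer"]].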
UPDATE
So I tried what ekoam suggested (thank you for that!) and the following problems occurred.
Contrary to the comments I wrote below ekoam's answer, the error occurs with both datasets (the example provided here and the actual one I'm using; I'm not sure whether I'm allowed to publish that dataset).
This is the code I used and the error I got:
df <- read_excel("C:/###/###/###/Example_data.xlsx")
df <- subset(data_watersamples, site %in% c("a","b","c", "d"))
my_summary <-
. %>%
group_by(site) %>%
summarise_at(vars(
c(v1, v2, v3, v4),
list(min = min, max = max, mean = mean, median = median, sd = sd)
)) %>%
pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
pivot_wider(names_from = "stat")
summer <- as.integer(format.Date(df$date, "%m")) %in% 4:9
df_list <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
lapply(df_list, my_summary)
and get this error:
Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type `list`.
i It must be numeric or character.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
Error in `*tmp*`[[id - n]] :
attempt to select more than one element in integerOneIndex
Thanks for your help!
Matthias
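One plausible reading of this error: in the UPDATE code the closing parenthesis of vars() comes after the list of functions, so the list is passed to vars() as if it were a column selection. If summarise_at() is to be kept instead of summarise(across(...)), the list has to be a separate argument. A minimal sketch on made-up data:

```r
library(dplyr)

# Made-up data with the same column layout as the question
df <- data.frame(site = c("a", "a", "b", "b"),
                 v1 = c(0, 4, 5, 3),
                 v2 = c(17, 15, 25, 21),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(site) %>%
  # vars() closes before the list of functions
  summarise_at(vars(v1, v2), list(min = min, max = max))
res
```

With a named list of functions the result columns are named like v1_min and v2_max, which is the pattern the later pivot_longer(names_sep = "_") step depends on.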
As you want things to be tidy, how about this tidyverse approach to your problem?
library(dplyr)
library(tidyr)
my_summary <-
. %>%
group_by(site) %>%
summarise(across(
c(v1, v2, v3, v4),
list(min = min, max = max, mean = mean, median = median, sd = sd)
)) %>%
pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
pivot_wider(names_from = "stat")
summer <- as.integer(format.Date(df$date, "%m")) %in% 4:9
df_list <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
lapply(df_list, my_summary)
Output
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
$full_year
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 0 4 2.62 3 1.69
2 a v2 10 20 14.4 14 3.07
3 a v3 91 106 99.6 102. 5.93
4 a v4 150 155 153. 153 1.49
5 b v1 2 6 4.12 4.5 1.64
6 b v2 16 25 20.9 21.5 3.52
7 b v3 70 90 77 74 6.63
8 b v4 147 152 149. 150. 1.92
9 c v1 5 9 7.38 7.5 1.41
10 c v2 30 40 35 35 3.78
11 c v3 43 65 51.8 48.5 8.84
12 c v4 146 151 148. 147 1.60
13 d v1 8 11 9.62 9.5 1.06
14 d v2 36 44 40.4 40.5 2.67
15 d v3 123 144 138. 140 7.38
16 d v4 160 170 165. 165 3.40
$summer
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 0 3 1.5 1.5 1.73
2 a v2 12 17 14 13.5 2.16
3 a v3 91 102 95 93.5 4.83
4 a v4 150 155 152. 152. 2.08
5 b v1 2 6 4 4 1.83
6 b v2 17 25 22.5 24 3.79
7 b v3 73 83 76 74 4.69
8 b v4 147 149 148. 148. 0.957
9 c v1 6 9 7.75 8 1.5
10 c v2 31 39 34.2 33.5 3.59
11 c v3 43 58 47.8 45 6.90
12 c v4 146 148 147. 146. 0.957
13 d v1 8 11 9.5 9.5 1.29
14 d v2 36 42 39.2 39.5 2.5
15 d v3 123 144 135. 136 8.85
16 d v4 165 170 167 166. 2.45
$winter
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 3 4 3.75 4 0.5
2 a v2 10 20 14.8 14.5 4.11
3 a v3 103 106 104. 104 1.26
4 a v4 152 154 153 153 0.816
5 b v1 2 6 4.25 4.5 1.71
6 b v2 16 22 19.2 19.5 2.75
7 b v3 70 90 78 76 8.83
8 b v4 150 152 151 151 0.816
9 c v1 5 8 7 7.5 1.41
10 c v2 30 40 35.8 36.5 4.35
11 c v3 44 65 55.8 57 9.60
12 c v4 147 151 148. 148. 1.89
13 d v1 9 11 9.75 9.5 0.957
14 d v2 38 44 41.5 42 2.65
15 d v3 134 144 141 143 4.76
16 d v4 160 168 163. 162. 3.40
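A note on the my_summary definition used above: a pipeline that starts with the dot placeholder is not evaluated immediately. Instead, magrittr turns it into a function of one argument, which is why lapply() can call it on each data frame. A tiny sketch with a made-up function name:

```r
library(dplyr)  # re-exports the %>% pipe from magrittr

# A pipeline starting with `.` becomes a reusable function
sum_then_sqrt <- . %>% sum() %>% sqrt()

is.function(sum_then_sqrt)
sum_then_sqrt(c(9, 16))  # sqrt(9 + 16)
```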

How can i reorder a variable having categorical values in dplyr [duplicate]

I have done some manipulations as below to arrive at the following dataframe:
df
cluster.kmeans variable max mean median min sd
1 1 MonthlySMS 191 90.32258 71.0 8 56.83801
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
4 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
5 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
6 2 MonthlySMS 155 53.18815 57.0 1 31.64533
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
9 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
10 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
11 3 MonthlySMS 215 135.60202 137.0 49 34.09794
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
14 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
15 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
16 4 MonthlySMS 136 49.93980 54.5 1 31.47778
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
19 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
20 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
I would like to order the variable column based on these strings:
top.vars_kmeans
[1] "ThreeMonthlySMS" "SixMonthlyData" "ThreeMonthlyData"
[4] "MonthlySMS" "SixMonthlySMS"
I could do it using sqldf as below:
library(sqldf)
a <- c(1,2,3,4,5)
a <- data.frame(top.vars_kmeans,a)
a <- sqldf('select a1.* ,b1.a from "MS.DATA.STATS.KMEANS" a1 inner join a b1
on a1.variable=b1."top.vars_kmeans"')
a <- sqldf('select * from a order by "cluster.kmeans",a')
a$a <- NULL
a
cluster.kmeans variable max mean median min sd
1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
I am just curious to know whether this could be achieved using dplyr; it would improve my understanding of this wonderful package.
Any help is appreciated!
We can use arrange with match
library(dplyr)
a %>%
arrange(cluster.kmeans, match(variable, top.vars_kmeans))
# cluster.kmeans variable max mean median min sd
#1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
#2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
#3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
#4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
#5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
#6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
#7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
#8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
#9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
#10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
#11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
#12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
#13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
#14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
#15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
#16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
#17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
#18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
#19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
#20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
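To see why this works: match() returns, for every value of variable, its position in the reference vector, and those positions are what arrange() sorts by. The same idea in base R with order(), on made-up data with three variables:

```r
top.vars <- c("ThreeMonthlySMS", "SixMonthlyData", "MonthlySMS")

# Made-up data, alphabetically ordered within each cluster
d <- data.frame(cluster = c(1, 1, 1, 2, 2, 2),
                variable = rep(c("MonthlySMS", "SixMonthlyData",
                                 "ThreeMonthlySMS"), 2),
                stringsAsFactors = FALSE)

# Position of each variable in the desired order: 3 2 1 3 2 1
pos <- match(d$variable, top.vars)

# Sort by cluster first, then by that position
sorted <- d[order(d$cluster, pos), ]
sorted
```

Values that do not occur in the reference vector get NA from match(), and both order() and arrange() place them last.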
You can redefine a factor (or ordered factor) with the levels in the desired order (e.g. as stored in top.vars_kmeans):
a$variable <- factor(a$variable, levels = top.vars_kmeans)
See also the help page online, or via ?factor.
If you desire to order the whole data.frame, go by the answer of akrun.
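Once the levels are set this way, sorting functions follow the level order instead of alphabetical order. A short sketch on made-up data:

```r
top.vars <- c("ThreeMonthlySMS", "SixMonthlyData", "MonthlySMS")

# Made-up data frame standing in for `a`
a <- data.frame(cluster = c(1, 1, 1),
                variable = c("MonthlySMS", "SixMonthlyData", "ThreeMonthlySMS"),
                stringsAsFactors = FALSE)

# Levels in the desired order
a$variable <- factor(a$variable, levels = top.vars)

# order() now sorts by level position, not alphabetically
a_sorted <- a[order(a$cluster, a$variable), ]
a_sorted
```

The same holds for dplyr::arrange(), and plotting functions generally use the level order for axes and legends.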
You can try group_by and slice:
df %>% group_by(cluster.kmeans) %>% slice(match(top.vars_kmeans, variable))
# cluster.kmeans variable max mean median min sd
# (int) (fctr) (int) (dbl) (dbl) (int) (dbl)
#1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
#2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
#3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
#4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
#5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
#6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
#7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
#8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
#9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
#10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
#11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
#12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
#13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
#14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
#15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
#16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
#17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
#18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
#19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
#20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157

Get boxplot stats of a column separated by values in another column in dataframe in R

I have a data frame like this:
distance exclude
1.1 F
1.5 F
3 F
2 F
1 F
5 T
3 F
63 F
32 F
21 F
15 F
1 T
I want to get the four boxplot stats for each segment of the distance column, where the rows with "T" in the exclude column serve as separators.
Can anyone help? Thanks so much!
First, let's create some fake data:
library(dplyr)
# Fake data
set.seed(49349)
dat = data.frame(distance=rnorm(500, 50, 10),
exclude=sample(c("T","F"), 500, replace=TRUE, prob=c(0.03,0.95)))
Now create a new group each time exclude == "T". Then, for each group, calculate whatever statistics you wish and return the results in a data frame:
box.stats = dat %>%
mutate(group = cumsum(exclude=="T")) %>%
group_by(group) %>%
do(data.frame(n=length(.$distance),
out_90 = sum(.$distance > quantile(.$distance, 0.9)),
out_10 = sum(.$distance < quantile(.$distance, 0.1)),
MEAN = round(mean(.$distance),2),
SD = round(sd(.$distance),2),
out_2SD_high = sum(.$distance > mean(.$distance) + 2*sd(.$distance)),
round(t(quantile(.$distance, probs=c(0,0.1,0.25,0.5,0.75,0.9,1))),2)))
names(box.stats) = gsub("X(.*)\\.$", "p\\1", names(box.stats))
box.stats
group n out_90 out_10 MEAN SD out_2SD_high p0 p10 p25 p50 p75 p90 p100
1 0 15 2 2 46.21 8.78 0 28.66 36.03 41.88 46.04 52.33 56.30 61.98
2 1 36 4 4 50.03 10.01 0 21.71 38.78 44.63 51.13 56.66 61.58 67.84
3 2 80 8 8 50.36 9.00 1 20.30 38.10 45.95 51.28 56.51 61.74 70.44
4 3 9 1 1 55.62 8.58 0 42.11 47.10 49.19 54.54 63.63 65.84 67.88
5 4 16 2 2 47.70 7.79 0 29.03 39.89 43.60 49.26 52.92 56.97 58.02
6 5 66 7 7 49.86 9.93 2 24.84 36.00 45.05 50.51 55.65 61.41 75.27
7 6 44 5 5 50.35 10.39 1 31.72 36.36 43.49 50.95 55.78 64.88 73.64
8 7 80 8 8 49.18 9.24 1 27.62 37.86 42.06 50.34 56.60 59.66 72.13
9 8 31 3 3 52.56 11.18 0 25.78 39.94 44.10 51.32 62.02 66.35 70.40
10 9 60 6 6 50.31 9.82 1 25.43 37.44 44.53 50.31 56.78 62.36 71.77
11 10 33 4 4 49.99 9.78 2 32.74 38.72 42.56 49.60 55.75 62.86 72.20
12 11 30 3 3 48.26 11.47 1 30.03 37.68 40.24 45.65 55.42 60.18 79.36
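The grouping step is the core of the answer above: cumsum(exclude == "T") increases by one at every "T" row, so all rows between two separators share one group number. Applied to the small data frame from the question, in base R, it looks like this. Dropping the separator rows before computing the statistics is an assumption made here (the answer's code keeps each "T" row in the group it opens), and fivenum() is used as one possible choice of boxplot statistics.

```r
# Data from the question
dat <- data.frame(
  distance = c(1.1, 1.5, 3, 2, 1, 5, 3, 63, 32, 21, 15, 1),
  exclude = c("F", "F", "F", "F", "F", "T", "F", "F", "F", "F", "F", "T"),
  stringsAsFactors = FALSE
)

# Group counter that increases at every separator row
dat$group <- cumsum(dat$exclude == "T")

# Assumption: separator rows themselves are not part of any segment
segs <- dat[dat$exclude != "T", ]

# Tukey five-number summary (min, lower hinge, median, upper hinge, max) per segment
stats <- lapply(split(segs$distance, segs$group), fivenum)
stats
```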

How to change plotting characters in Lattice

I am trying to change the kind of characters used by lattice in an xyplot using the following data
> rate
Temp Rep Ind Week Weight Rate
1 9 1 B 1 2.6713 0.254
2 9 1 B 2 2.6713 0.076
3 9 1 B 6 2.6713 0.000
4 9 1 B 8 2.6713 0.000
5 9 1 MST 1 1.0154 0.711
6 9 1 MST 2 1.0154 0.137
7 9 1 MST 6 1.0154 0.000
8 9 1 MST 8 1.0154 0.000
9 9 1 MSCT 1 1.2829 0.447
10 9 1 MSCT 2 1.2829 0.345
11 9 1 MSCT 6 1.2829 0.000
12 9 1 MSCT 8 1.2829 0.000
13 9 1 MBT 1 1.8709 0.211
14 9 1 MBT 2 1.8709 0.255
15 9 1 MBT 6 1.8709 0.000
16 9 1 MBT 8 1.8709 0.000
17 9 1 MBCT 1 2.1388 0.230
18 9 1 MBCT 2 2.1388 0.281
19 9 1 MBCT 6 2.1388 0.000
20 9 1 MBCT 8 2.1388 0.000
21 9 2 S 1 0.8779 0.287
22 9 2 S 2 0.8779 0.065
23 9 2 S 6 0.8779 0.000
24 9 2 S 8 0.8779 0.000
25 9 2 MST 1 0.7196 0.197
26 9 2 MST 2 0.7196 0.193
27 9 2 MST 6 0.7196 0.000
28 9 2 MST 8 0.7196 0.000
29 9 2 MSCT 1 1.4773 0.198
30 9 2 MSCT 2 1.4773 0.233
31 9 2 MSCT 6 1.4773 0.000
32 9 2 MSCT 8 1.4773 0.000
33 9 2 MBT 1 3.4376 0.244
34 9 2 MBT 2 3.4376 0.123
35 9 2 MBT 6 3.4376 0.000
36 9 2 MBT 8 3.4376 0.000
37 9 2 MBCT 1 1.2977 0.514
38 9 2 MBCT 2 1.2977 0.118
39 9 2 MBCT 6 1.2977 0.000
40 9 2 MBCT 8 1.2977 0.000
41 12 1 B 1 3.8078 0.262
42 12 1 B 2 3.8078 0.328
43 12 1 B 6 3.8078 0.000
44 12 1 B 8 3.8078 0.000
45 12 1 MST 1 1.6222 0.294
46 12 1 MST 2 1.6222 0.213
47 12 1 MST 6 1.6222 0.000
48 12 1 MST 8 1.6222 0.000
49 12 1 MSCT 1 1.0231 0.358
50 12 1 MSCT 2 1.0231 0.281
51 12 1 MSCT 6 1.0231 0.000
52 12 1 MSCT 8 1.0231 0.000
53 12 1 MBT 1 1.2747 0.353
54 12 1 MBT 2 1.2747 0.254
55 12 1 MBT 6 1.2747 0.000
56 12 1 MBT 8 1.2747 0.000
57 12 1 MBCT 1 1.0602 0.390
58 12 1 MBCT 2 1.0602 0.321
59 12 1 MBCT 6 1.0602 0.000
60 12 1 MBCT 8 1.0602 0.000
61 12 2 S 1 0.2584 0.733
62 12 2 S 2 0.2584 0.444
63 12 2 S 6 0.2584 0.000
64 12 2 S 8 0.2584 0.000
65 12 2 MST 1 0.6781 0.314
66 12 2 MST 2 0.6781 0.421
67 12 2 MST 6 0.6781 0.000
68 12 2 MST 8 0.6781 0.000
69 12 2 MSCT 1 0.7488 0.845
70 12 2 MSCT 2 0.7488 0.661
71 12 2 MSCT 6 0.7488 0.000
72 12 2 MSCT 8 0.7488 0.000
73 12 2 MBT 1 1.1220 0.184
74 12 2 MBT 2 1.1220 0.305
75 12 2 MBT 6 1.1220 0.000
76 12 2 MBT 8 1.1220 0.000
77 12 2 MBCT 1 1.4029 0.338
78 12 2 MBCT 2 1.4029 0.410
79 12 2 MBCT 6 1.4029 0.000
80 12 2 MBCT 8 1.4029 0.000
81 15 1 B 1 3.7202 0.340
82 15 1 B 2 3.7202 0.566
83 15 1 B 6 3.7202 0.000
84 15 1 B 8 3.7202 0.000
85 15 1 MST 1 0.7914 0.668
86 15 1 MST 2 0.7914 0.903
87 15 1 MST 6 0.7914 0.000
88 15 1 MST 8 0.7914 0.000
89 15 1 MSCT 1 1.2503 0.266
90 15 1 MSCT 2 1.2503 0.402
91 15 1 MSCT 6 1.2503 0.000
92 15 1 MSCT 8 1.2503 0.000
93 15 1 MBT 1 0.7691 0.362
94 15 1 MBT 2 0.7691 0.850
95 15 1 MBT 6 0.7691 0.000
96 15 1 MBT 8 0.7691 0.000
97 15 1 MBCT 1 1.7025 0.232
98 15 1 MBCT 2 1.7025 0.462
99 15 1 MBCT 6 1.7025 0.000
100 15 1 MBCT 8 1.7025 0.000
101 15 2 S 1 0.6142 0.084
102 15 2 S 2 0.6142 0.060
103 15 2 S 6 0.6142 0.000
104 15 2 S 8 0.6142 0.000
105 15 2 MST 1 1.0184 0.318
106 15 2 MST 2 1.0184 0.638
107 15 2 MST 6 1.0184 0.000
108 15 2 MST 8 1.0184 0.000
109 15 2 MSCT 1 1.0176 0.177
110 15 2 MSCT 2 1.0176 0.343
111 15 2 MSCT 6 1.0176 0.000
112 15 2 MSCT 8 1.0176 0.000
113 15 2 MBT 1 1.6684 0.311
114 15 2 MBT 2 1.6684 0.461
115 15 2 MBT 6 1.6684 0.000
116 15 2 MBT 8 1.6684 0.000
117 15 2 MBCT 1 2.1278 0.201
118 15 2 MBCT 2 2.1278 0.489
119 15 2 MBCT 6 2.1278 0.000
120 15 2 MBCT 8 2.1278 0.000
121 18 1 B 1 3.0669 0.233
122 18 1 B 2 3.0669 0.482
123 18 1 B 6 3.0669 0.000
124 18 1 B 8 3.0669 0.000
125 18 1 MST 1 1.1641 0.208
126 18 1 MST 2 1.1641 0.201
127 18 1 MST 6 1.1641 0.000
128 18 1 MST 8 1.1641 0.000
129 18 1 MSCT 1 1.0183 0.108
130 18 1 MSCT 2 1.0183 0.303
131 18 1 MSCT 6 1.0183 0.000
132 18 1 MSCT 8 1.0183 0.000
133 18 1 MBT 1 1.2028 -0.041
134 18 1 MBT 2 1.2028 -0.004
135 18 1 MBT 6 1.2028 0.000
136 18 1 MBT 8 1.2028 0.000
137 18 1 MBCT 1 1.6395 0.072
138 18 1 MBCT 2 1.6395 0.234
139 18 1 MBCT 6 1.6395 0.000
140 18 1 MBCT 8 1.6395 0.000
141 18 2 S 1 0.5858 0.466
142 18 2 S 2 0.5858 0.336
143 18 2 S 6 0.5858 0.000
144 18 2 S 8 0.5858 0.000
145 18 2 MST 1 1.5694 0.272
146 18 2 MST 2 1.5694 0.257
147 18 2 MST 6 1.5694 0.000
148 18 2 MST 8 1.5694 0.000
149 18 2 MSCT 1 1.1295 0.523
150 18 2 MSCT 2 1.1295 0.521
151 18 2 MSCT 6 1.1295 0.000
152 18 2 MSCT 8 1.1295 0.000
153 18 2 MBT 1 1.7526 0.105
154 18 2 MBT 2 1.7526 0.118
155 18 2 MBT 6 1.7526 0.000
156 18 2 MBT 8 1.7526 0.000
157 18 2 MBCT 1 1.6924 0.320
158 18 2 MBCT 2 1.6924 0.387
159 18 2 MBCT 6 1.6924 0.000
160 18 2 MBCT 8 1.6924 0.000
the code for plotting is
rate$Temp <- as.character(rate$Temp)
rate$Week <- as.character(rate$Week)
rate$Rep <- as.character(rate$Rep)
xyplot(Rate~Weight|Rep+Temp, groups=Week, rate,auto.key=list(columns=2), as.table=TRUE, xlab="Weight (gr)", ylab="Rate (umol/L*gr)", main="All individuals and Treatments at all times")
But this draws every symbol as an O, and I need each group to be plotted with a different symbol.
I like to use the theme mechanism to do this. The black and white theme uses different symbols by default; you get it like this:
bwtheme <- standard.theme("pdf", color=FALSE)
Or you can start with the color theme and modify the points as you like, as follows.
mytheme <- standard.theme("pdf")
mytheme$superpose.symbol$pch <- c(15,16,17,3)
mytheme$superpose.symbol$col <- c("blue","red","green","purple")
p4 <- xyplot(Rate~Weight|Rep+Temp, groups=Week, data=rate,
as.table=TRUE,
xlab="Weight (gr)", ylab="Rate (umol/L*gr)",
main="All individuals and Treatments at all times",
strip=strip.custom(strip.names=1),
par.settings=mytheme,
auto.key=list(title="Week", cex.title=1, space="right")
)
Or, if you'd rather have it all one line, just pass what you want to change to par.settings.
xyplot(Rate~Weight|Rep+Temp, groups=Week, data=rate,
as.table=TRUE,
xlab="Weight (gr)", ylab="Rate (umol/L*gr)",
main="All individuals and Treatments at all times",
strip=strip.custom(strip.names=1),
par.settings=list(superpose.symbol=list(
pch=c(15,16,17,3),
col=c("blue","red","green","purple"))),
auto.key=list(title="Week", cex.title=1, space="right")
)
These solutions are recommended over changing col and pch directly because then they must also be changed when building the key.
Two other notes that you may find instructive: First, try using factor instead of as.character; this will sort your weeks in the proper order. You can do this with less typing using within.
rate <- within(rate, {
Temp <- factor(Temp)
Week <- factor(Week)
Rep <- factor(Rep)
})
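The ordering difference is easiest to see on the Temp values. Character strings sort alphabetically, so "9" ends up after "18", whereas a factor built directly from the numbers keeps the numeric order. A small sketch (the variable name temp is made up):

```r
temp <- c(9, 12, 15, 18)

# Levels after converting to character first: alphabetical order
chr_levels <- levels(factor(as.character(temp)))
chr_levels

# Levels when the factor is built from the numbers: numeric order
num_levels <- levels(factor(temp))
num_levels
```

Since lattice lays out panels and key entries in level order, the second form puts the Temp = 9 panels first.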
Second, check out the useOuterStrips function in the latticeExtra package. In particular, if your original plot is saved as p, try
useOuterStrips(p, strip=strip.custom(strip.names=1),
strip.left=strip.custom(strip.names=1) )
I found a way of changing the characters without changing the theme, just by adding a bit more code to the plot as follows
xyplot(Rate~Weight|Rep+Temp, groups=Week, rate,
pch=c(15,16,17,3), #this defines the different plot symbols used
col=c("blue","red","green","purple"), # this defines the colors used in the plot
as.table=TRUE,
xlab="Weight (gr)", ylab="Rate (umol/L*gr)",
main="All individuals and Treatments at all times",
strip=strip.custom(strip.names=1), #this changes what is displayed in the strip
key= list(text=list(c("Week","1","2","6","8")),
points=list(pch=c(NA,15,16,17,3),col=c(NA,"blue","red","green","purple")),
space="right")#this adds a complete key
)
