Unify boxplot factor group colours - r

I'm somewhat of an R and ggplot novice, so I'm struggling to plot this data as a box plot with Flux on the y-axis and Week on the x-axis, with the boxplots grouped by species (and, within each species group, by treatment).
Treatment Species Week Flux
1 L- Heisteria 1 0.19
2 L- Heisteria 1 0.03
3 L- Heisteria 1 NA
4 L- Heisteria 1 0.12
5 L- Simarouba 1 0.22
6 L- Simarouba 1 0.19
7 L- Simarouba 1 NA
8 L- Simarouba 1 -0.65
9 C Heisteria 1 -0.99
10 C Heisteria 1 0.10
11 C Heisteria 1 0.26
12 C Heisteria 1 NA
13 C Simarouba 1 -1.41
14 C Simarouba 1 0.17
15 C Simarouba 1 NA
16 C Simarouba 1 0.35
17 L+ Heisteria 1 0.71
18 L+ Heisteria 1 0.25
19 L+ Heisteria 1 0.08
20 L+ Heisteria 1 4.14
21 L+ Simarouba 1 -1.36
22 L+ Simarouba 1 0.06
23 L+ Simarouba 1 -0.65
24 L+ Simarouba 1 -0.25
25 L- Heisteria 2 0.31
26 L- Heisteria 2 0.15
27 L- Heisteria 2 -0.09
28 L- Heisteria 2 -0.08
29 L- Simarouba 2 0.04
30 L- Simarouba 2 0.06
31 L- Simarouba 2 0.05
32 L- Simarouba 2 -0.07
33 C Heisteria 2 0.20
34 C Heisteria 2 0.15
35 C Heisteria 2 -0.03
36 C Heisteria 2 0.18
37 C Simarouba 2 0.10
38 C Simarouba 2 0.08
39 C Simarouba 2 0.09
40 C Simarouba 2 0.05
41 L+ Heisteria 2 0.24
42 L+ Heisteria 2 0.09
43 L+ Heisteria 2 0.16
44 L+ Heisteria 2 0.11
45 L+ Simarouba 2 NA
46 L+ Simarouba 2 0.21
47 L+ Simarouba 2 -0.07
48 L+ Simarouba 2 1.51
49 L- Heisteria 3 0.15
50 L- Heisteria 3 0.07
51 L- Heisteria 3 NA
52 L- Heisteria 3 -1.02
53 L- Simarouba 3 -0.02
54 L- Simarouba 3 0.08
55 L- Simarouba 3 -0.06
56 L- Simarouba 3 -0.08
57 C Heisteria 3 0.23
58 C Heisteria 3 0.19
59 C Heisteria 3 0.09
60 C Heisteria 3 -0.10
61 C Simarouba 3 0.77
62 C Simarouba 3 0.07
63 C Simarouba 3 0.20
64 C Simarouba 3 0.62
65 L+ Heisteria 3 0.19
66 L+ Heisteria 3 -0.09
67 L+ Heisteria 3 NA
68 L+ Heisteria 3 0.06
69 L+ Simarouba 3 NA
70 L+ Simarouba 3 -0.17
71 L+ Simarouba 3 0.13
72 L+ Simarouba 3 0.64
73 L- Heisteria 4 0.13
74 L- Heisteria 4 0.54
75 L- Heisteria 4 0.18
76 L- Heisteria 4 3.59
77 L- Simarouba 4 0.00
78 L- Simarouba 4 0.10
79 L- Simarouba 4 0.20
80 L- Simarouba 4 NA
81 C Heisteria 4 -0.14
82 C Heisteria 4 -0.32
83 C Heisteria 4 0.21
84 C Heisteria 4 0.12
85 C Simarouba 4 0.10
86 C Simarouba 4 NA
87 C Simarouba 4 0.11
88 C Simarouba 4 0.42
89 L+ Heisteria 4 0.14
90 L+ Heisteria 4 0.05
91 L+ Heisteria 4 0.25
92 L+ Heisteria 4 0.74
93 L+ Simarouba 4 NA
94 L+ Simarouba 4 0.05
95 L+ Simarouba 4 -0.06
96 L+ Simarouba 4 -0.13
I can plot the data using this code:
ggplot(treeflux, aes(Week, Flux, fill=interaction(Week, Species, Treatment), dodge=Species, Treatment)) +
stat_boxplot(geom ='errorbar') +
geom_boxplot()
It gives me a plot in the order I want but with way too many colours and items in the legend section. I want the treatments for each species to be variants of a single colour and the legend to read like this "L- Heisteria".

How about this for a start? (The legend for alpha needs a little tweaking ...) This is much easier than setting up an entire custom palette of fill colours and getting the legend right. Note that dodge is not a ggplot aesthetic, so I dropped it; mapping fill and alpha is enough to dodge the boxes.
theme_set(theme_bw()) ## my aesthetic preference, also easier for
## distinguishing light vs dark colours
ggplot(treeflux, aes(factor(Week), Flux, fill=Species, alpha=Treatment)) +
stat_boxplot(geom ='errorbar') +
geom_boxplot()
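The alpha legend could perhaps be tidied along the following lines. This is only a sketch: it retypes the Week 1 rows of the question's data so that it runs on its own, and the three opacity values and the grey legend fill are arbitrary choices, not something from the original answer.

```r
library(ggplot2)

# Week 1 rows of the question's data, retyped so the block is self-contained
treeflux <- data.frame(
  Treatment = factor(rep(c("L-", "C", "L+"), each = 8),
                     levels = c("L-", "C", "L+")),
  Species = rep(rep(c("Heisteria", "Simarouba"), each = 4), times = 3),
  Week = 1,
  Flux = c(0.19, 0.03, NA, 0.12,   0.22, 0.19, NA, -0.65,
           -0.99, 0.10, 0.26, NA,  -1.41, 0.17, NA, 0.35,
           0.71, 0.25, 0.08, 4.14, -1.36, 0.06, -0.65, -0.25)
)

p <- ggplot(treeflux,
            aes(factor(Week), Flux, fill = Species, alpha = Treatment)) +
  geom_boxplot(na.rm = TRUE) +
  # one explicit opacity per treatment (assumed values, adjust to taste)
  scale_alpha_manual(values = c("L-" = 0.3, "C" = 0.65, "L+" = 1)) +
  # draw the alpha legend keys in a neutral grey so they read as shades
  guides(alpha = guide_legend(override.aes = list(fill = "grey30"))) +
  labs(x = "Week")

p
```

Each species keeps a single hue and the treatments appear as lighter or darker variants of it, at the cost of two legends rather than one combined "L- Heisteria" style legend.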


How to change loadings.label in PCA plot using ggplot2?

I am plotting a PCA in ggplot2 and the loading labels overlap with the arrows. I want to move the labels a little to make the plot easier to read, but I can't find a way to do it. I am attaching the plot below.
Here is part of the data:
Linfoprolif CORT Testo FDL Ac.GRO ifn.g il.4 Profile
1 23.76 0.27 0.96 2.41 6 307 69 1
2 NA 2.59 0.07 0.39 4 117 58
3 25.53 0.16 0.71 2.17 5 273 54 1
4 31.67 0.88 0.07 0.55 5 211 48 1
5 6.15 0.24 0.23 1.07 5 224 48 1
6 26.19 0.74 0.04 0.60 4 308 59 1
7 10.31 0.34 0.75 2.29 7 295 49 1
8 22.30 0.42 0.07 0.63 5 271 52 1
9 24.74 0.29 1.18 2.91 4 236 56 1
10 9.51 2.19 0.07 0.40 5 54 62 2
11 22.59 0.19 0.40 3.28 4 272 58 1
12 22.01 0.28 0.04 0.54 4 67 64 1
13 39.21 0.21 0.82 1.91 4 235 56 1
14 42.07 0.32 0.16 0.70 5 362 54 3
15 13.45 0.30 0.24 2.21 6 146 68 1
16 15.08 2.19 0.08 0.34 5 58 63 2
17 20.48 0.38 1.27 2.40 4 278 52 1
18 12.10 0.83 0.11 0.53 2 146 41 1
19 61.56 0.07 0.09 1.09 9 305 52 3
20 35.06 0.59 0.05 0.67 4 220 54 1
21 33.48 0.68 0.99 1.24 3 102 58 1
22 20.56 0.94 0.06 1.71 3 58 45 2
23 26.46 0.12 0.29 1.60 3 210 55 1
24 24.91 0.56 0.11 0.55 5 108 56 1
25 29.22 0.42 2.60 1.55 3 84 69 1
26 19.30 1.63 0.02 0.78 3 62 69 2
27 14.45 0.22 0.79 1.89 4 245 59 1
28 20.89 0.72 0.04 0.57 4 85 53 1
29 26.70 0.36 1.02 2.05 3 309 45 1
30 27.83 2.66 0.04 0.54 3 52 65 2
31 34.70 0.46 0.83 1.39 5 120 65 1
And here is the code:
library(ggfortify)
p_pca<-d_e_b[c(1,2,3,4,5,6,7)]
p_pca<-na.omit(p_pca)
pca_res <- prcomp(p_pca, scale. = TRUE)
pca_b<-autoplot(pca_res, data = d_e_b, colour = "Profile",
loadings = TRUE, loadings.colour = 'gray30',loadings.size = 5,
loadings.label = TRUE, loadings.label.color='black',
loadings.label.size = 4) + theme_classic()+
scale_colour_discrete("Profile")+
theme(text = element_text(size = 20 ),
axis.line.x = element_line(color="black", size = 1),
axis.line.y = element_line(color="black", size = 1),
axis.text.x=element_text(colour="black",angle = 360,vjust = 0.6),
axis.text.y=element_text(colour="black"))
pca_b
Any ideas on how to solve it?
You can add loadings.label.repel = TRUE inside autoplot() to offset the labels a bit.
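A minimal sketch of that option, using the built-in iris data as a stand-in because the d_e_b data frame is not fully available here. It assumes the ggfortify and ggrepel packages are installed; the object names pca_demo and p_demo are made up for illustration.

```r
library(ggplot2)
library(ggfortify)  # provides autoplot() for prcomp objects

# iris is only a stand-in for the question's d_e_b data
pca_demo <- prcomp(iris[, 1:4], scale. = TRUE)

p_demo <- autoplot(pca_demo, data = iris, colour = "Species",
                   loadings = TRUE, loadings.colour = "gray30",
                   loadings.label = TRUE, loadings.label.colour = "black",
                   loadings.label.size = 4,
                   # let ggrepel push the labels away from the arrow tips
                   loadings.label.repel = TRUE) +
  theme_classic()

p_demo
```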

Problems with partimat plot in R

I am trying to plot an LDA analysis using the partimat function from the klaR package in R, and I am getting this error message: Error in partimat.default(x, grouping, ...) : at least two classes required. I am pasting part of the data here to make a reproducible example:
abrev Linfoprolif CORT Testo FDL Ac.GRO ifn.g il.4
1 A 2.00 0.53 1.54 1.65 8 192 68
2 A 13.91 0.65 1.34 2.27 6 195 58
3 A 15.65 0.50 0.07 0.97 5 280 67
4 A 4.96 1.51 1.45 2.54 3 30 48
5 A 0.00 3.18 0.01 0.95 3 60 71
6 A 36.23 0.28 0.88 3.63 7 320 50
7 A 9.15 1.20 0.16 1.32 1 52 74
8 A 17.63 1.68 1.29 1.86 1 47 53
9 A 6.52 2.36 0.03 0.92 4 51 75
113 B 20.48 0.38 1.27 2.40 4 278 52
114 B 12.10 0.83 0.11 0.53 2 146 41
115 B 61.56 0.07 0.09 1.09 9 305 52
116 B 35.06 0.59 0.05 0.67 4 220 54
117 B 33.48 0.68 0.99 1.24 3 102 58
118 B 20.56 0.94 0.06 1.71 3 58 45
119 B 26.46 0.12 0.29 1.60 3 210 55
120 B 24.91 0.56 0.11 0.55 5 108 56
121 B 29.22 0.42 2.60 1.55 3 84 69
122 B 19.30 1.63 0.02 0.78 3 62 69
123 B 14.45 0.22 0.79 1.89 4 245 59
373 D 27.13 0.23 1.03 4.23 6 261 100
374 D 0.00 0.43 0.08 15.34 1 58 69
375 D 17.42 0.27 2.07 7.09 5 184 80
376 D 37.34 0.91 0.08 6.18 6 210 81
377 D 28.19 0.20 3.34 6.82 6 269 105
378 D 8.53 0.61 0.05 5.31 4 98 115
I followed the code posted here like this:
partimat(abrev ~ Linfoprolif + CORT + Testo + FDL+Ac.GRO,+ ifn.g + ifn.g, data=d_e_disc, method="lda")
I can't find my error. Any help is welcome.
Your response variable abrev must be a factor, so you have to convert it to class factor:
d_e_disc$abrev <- as.factor(d_e_disc$abrev)
# then apply your code above
@Mohamed Desouky found your problem: abrev should be a factor. There is also a small typo in your formula (a stray ","). Here is a corrected call you can use to check that the error goes away:
library(klaR)
partimat(factor(abrev) ~ Linfoprolif + CORT + Testo + FDL+Ac.GRO + ifn.g + ifn.g, data=d_e_disc, method="lda")
Created on 2022-07-11 by the reprex package (v2.0.1)
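The following base R sketch shows why the conversion matters. It assumes that abrev was read in as a character column, which is presumably what makes partimat see fewer than two classes; the small d_demo data frame is a hypothetical miniature built from a few CORT values in the question.

```r
# Hypothetical miniature of d_e_disc, with abrev stored as character
d_demo <- data.frame(
  abrev = c("A", "A", "B", "B", "D", "D"),
  CORT  = c(0.53, 0.65, 0.38, 0.83, 0.23, 0.43),
  stringsAsFactors = FALSE
)

# A character vector carries no levels, so it counts as zero classes
levels_before <- nlevels(d_demo$abrev)

# Convert the grouping variable to a factor before modelling
d_demo$abrev <- as.factor(d_demo$abrev)
levels_after <- nlevels(d_demo$abrev)

levels_before
levels_after
levels(d_demo$abrev)
```

Checking is.factor() and nlevels() on the grouping column before calling partimat is a cheap way to catch this kind of problem early.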

Object not found in for loop

I'm trying to figure out why this doesn't work:
data=read.csv("data_risk.csv")
pa1 = c(data$pa1)
pa2 = c(data$pa2)
pb1 = c(data$pb1)
pb2 = c(data$pb2)
a1 = c(data$a1)
a2= c(data$a2)
b1 = c(data$b1)
b2 = c(data$b2)
yy=c(data$choice)
crra=function(x,r){
u=x^(1-r)/(1-r)
return(u)
}
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
LL_all = c()
R<-seq(0,1,0.01)
for (r in R){
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
probA = eua/(eua+eub)
total = ifelse(yy==1, probA, 1-probA)
LL=log(prod(total))
LL_all=c(LL_all,LL)
}
Right now, every time I try to run it, I get "object 'r' not found" or "object 'R' not found". It works just fine without the for loop, but as soon as I add the for loop it all breaks down.
I'm trying to find the value of r that maximises someone's utility given two choices. A decision maker chooses option A over B with probability EUA/(EUA+EUB). In this example r is the risk aversion coefficient and x is the outcome of the lottery.
pa1 = probability of event a1 happening
pa2 = probability of event a2 happening
pb1 = probability of event b1 happening
pb2 = probability of event b2 happening
a1,a2,b1,b2 = outcomes of events
yy= indicator function that takes the value of 1 if lottery a is chosen and 0 otherwise
dataset:
task pa1 a1 pa2 a2 pb1 b1 pb2 b2 choice
1 0.34 24 0.66 59 0.42 47 0.58 64 0
2 0.88 79 0.12 82 0.20 57 0.80 94 0
3 0.74 62 0.26 0 0.44 23 0.56 31 1
4 0.05 56 0.95 72 0.95 68 0.05 95 1
5 0.25 84 0.75 43 0.43 7 0.57 97 0
6 0.28 7 0.72 74 0.71 55 0.29 63 0
7 0.09 56 0.91 19 0.76 13 0.24 90 0
8 0.63 41 0.37 18 0.98 56 0.02 8 0
9 0.88 72 0.12 29 0.39 67 0.61 63 1
10 0.61 37 0.39 50 0.60 6 0.40 45 1
11 0.08 54 0.92 31 0.15 44 0.85 29 1
12 0.92 63 0.08 5 0.63 43 0.37 53 1
13 0.78 32 0.22 99 0.32 39 0.68 56 0
14 0.16 66 0.84 23 0.79 15 0.21 29 1
15 0.12 52 0.88 73 0.98 92 0.02 19 0
16 0.29 88 0.71 78 0.29 53 0.71 91 1
17 0.31 39 0.69 51 0.84 16 0.16 91 1
18 0.17 70 0.83 65 0.35 100 0.65 50 0
19 0.91 80 0.09 19 0.64 37 0.36 65 1
20 0.09 83 0.91 67 0.48 77 0.52 6 1
21 0.44 14 0.56 72 0.21 9 0.79 31 1
22 0.68 41 0.32 65 0.85 100 0.15 2 0
23 0.38 40 0.62 55 0.14 26 0.86 96 0
24 0.62 1 0.38 83 0.41 37 0.59 24 1
25 0.49 15 0.51 50 0.94 64 0.06 14 0
26 0.10 40 0.90 32 0.10 77 0.90 2 1
27 0.20 40 0.80 32 0.20 77 0.80 2 1
28 0.30 40 0.70 32 0.30 77 0.70 2 1
29 0.40 40 0.60 32 0.40 77 0.60 2 1
30 0.50 40 0.50 32 0.50 77 0.50 2 0
31 0.60 40 0.40 32 0.60 77 0.40 2 0
32 0.70 40 0.30 32 0.70 77 0.30 2 0
33 0.80 40 0.20 32 0.80 77 0.20 2 0
34 0.90 40 0.10 32 0.90 77 0.10 2 0
35 1.00 40 0.00 32 1.00 77 0.00 2 0
The problem is in the piece of code below, right after your definition of the crra function:
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
Basically, you are trying to use the variable r before it is defined; moreover, these two lines duplicate the code inside the for loop. If you comment them out, everything works. Please see the code below:
data=read.table(text = " task pa1 a1 pa2 a2 pb1 b1 pb2 b2 choice
1 0.34 24 0.66 59 0.42 47 0.58 64 0
2 0.88 79 0.12 82 0.20 57 0.80 94 0
3 0.74 62 0.26 0 0.44 23 0.56 31 1
4 0.05 56 0.95 72 0.95 68 0.05 95 1
5 0.25 84 0.75 43 0.43 7 0.57 97 0
6 0.28 7 0.72 74 0.71 55 0.29 63 0
7 0.09 56 0.91 19 0.76 13 0.24 90 0
8 0.63 41 0.37 18 0.98 56 0.02 8 0
9 0.88 72 0.12 29 0.39 67 0.61 63 1
10 0.61 37 0.39 50 0.60 6 0.40 45 1
11 0.08 54 0.92 31 0.15 44 0.85 29 1
12 0.92 63 0.08 5 0.63 43 0.37 53 1
13 0.78 32 0.22 99 0.32 39 0.68 56 0
14 0.16 66 0.84 23 0.79 15 0.21 29 1
15 0.12 52 0.88 73 0.98 92 0.02 19 0
16 0.29 88 0.71 78 0.29 53 0.71 91 1
17 0.31 39 0.69 51 0.84 16 0.16 91 1
18 0.17 70 0.83 65 0.35 100 0.65 50 0
19 0.91 80 0.09 19 0.64 37 0.36 65 1
20 0.09 83 0.91 67 0.48 77 0.52 6 1
21 0.44 14 0.56 72 0.21 9 0.79 31 1
22 0.68 41 0.32 65 0.85 100 0.15 2 0
23 0.38 40 0.62 55 0.14 26 0.86 96 0
24 0.62 1 0.38 83 0.41 37 0.59 24 1
25 0.49 15 0.51 50 0.94 64 0.06 14 0
26 0.10 40 0.90 32 0.10 77 0.90 2 1
27 0.20 40 0.80 32 0.20 77 0.80 2 1
28 0.30 40 0.70 32 0.30 77 0.70 2 1
29 0.40 40 0.60 32 0.40 77 0.60 2 1
30 0.50 40 0.50 32 0.50 77 0.50 2 0
31 0.60 40 0.40 32 0.60 77 0.40 2 0
32 0.70 40 0.30 32 0.70 77 0.30 2 0
33 0.80 40 0.20 32 0.80 77 0.20 2 0
34 0.90 40 0.10 32 0.90 77 0.10 2 0
35 1.00 40 0.00 32 1.00 77 0.00 2 0", header = TRUE)
pa1 = c(data$pa1)
pa2 = c(data$pa2)
pb1 = c(data$pb1)
pb2 = c(data$pb2)
a1 = c(data$a1)
a2= c(data$a2)
b1 = c(data$b1)
b2 = c(data$b2)
yy=c(data$choice)
crra=function(x,r){
u=x^(1-r)/(1-r)
return(u)
}
# eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
# eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
LL_all = c()
R<-seq(0,1,0.01)
for (r in R){
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
probA = eua/(eua+eub)
total = ifelse(yy==1, probA, 1-probA)
LL=log(prod(total))
LL_all=c(LL_all,LL)
}
head(LL_all)
Output:
[1] -18.93759 -18.97863 -19.02000 -19.06170 -19.10374 -19.14611
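The same grid search can also be written without growing LL_all inside a loop. The sketch below is one possible variant, not the original answer's method: it wraps the likelihood in a helper (log_lik, a made-up name), sums the logs instead of taking the log of a product, and stops the grid at 0.99 because crra divides by 1 - r. Only the first five rows of the data set are retyped to keep it short.

```r
# First five rows of the question's data, retyped for a self-contained example
dat <- data.frame(
  pa1 = c(0.34, 0.88, 0.74, 0.05, 0.25), a1 = c(24, 79, 62, 56, 84),
  pa2 = c(0.66, 0.12, 0.26, 0.95, 0.75), a2 = c(59, 82, 0, 72, 43),
  pb1 = c(0.42, 0.20, 0.44, 0.95, 0.43), b1 = c(47, 57, 23, 68, 7),
  pb2 = c(0.58, 0.80, 0.56, 0.05, 0.57), b2 = c(64, 94, 31, 95, 97),
  choice = c(0, 0, 1, 1, 0)
)

crra <- function(x, r) x^(1 - r) / (1 - r)

# Log-likelihood of the observed choices for a single value of r
log_lik <- function(r, d) {
  eua <- d$pa1 * crra(d$a1, r) + d$pa2 * crra(d$a2, r)
  eub <- d$pb1 * crra(d$b1, r) + d$pb2 * crra(d$b2, r)
  prob_a <- eua / (eua + eub)
  sum(log(ifelse(d$choice == 1, prob_a, 1 - prob_a)))
}

R <- seq(0, 0.99, 0.01)          # r = 1 is excluded: crra() would divide by zero
LL_all <- sapply(R, log_lik, d = dat)
best_r <- R[which.max(LL_all)]   # grid value with the highest log-likelihood
best_r
```

Because r only exists as a function argument here, there is no way to reference it before it is defined, which is the mistake that caused the original error.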

How to merge three tables by inserting to each other in R?

I have a data frame as follows. I want to see how values move from RIK_T1 to RIK_T2 by looking at their frequency, row % and column %. How can I show all of these at once?
ID<-c('1','2','3','4','5','6','7','8','9','10')
RIK_T1<-c('20','15','20','20','97','20','20','20','15','15')
RIK_T2<-c('20','15','15','20','97','97','20','20','20','20')
df<-data.frame(ID,RIK_T1,RIK_T2)
df
TAB=table(df$RIK_T1,df$RIK_T2)
t1<-addmargins(TAB) #TABLE-01
TAB_row=prop.table(TAB,1)#row
t2<-round(addmargins(TAB_row),digits=2)#TABLE-01-1
TAB_col=prop.table(TAB,2)#column
t3<-round(addmargins(TAB_col),digits=2)#TABLE-01-2
I get three tables, as follows: the counts (t1), the row proportions (t2) and the column proportions (t3).
t1:
      15 20 97 Sum
  15   1  2  0   3
  20   1  4  1   6
  97   0  0  1   1
  Sum  2  6  2  10
t2:
        15   20   97  Sum
  15  0.33 0.67 0.00 1.00
  20  0.17 0.67 0.17 1.00
  97  0.00 0.00 1.00 1.00
  Sum 0.50 1.33 1.17 3.00
t3:
        15   20   97  Sum
  15  0.50 0.33 0.00 0.83
  20  0.50 0.67 0.50 1.67
  97  0.00 0.00 0.50 0.50
  Sum 1.00 1.00 1.00 3.00
Is it possible to merge them into one table, as follows?
     15         20         97         Sum
     R%/C%      R%/C%      R%/C%      R%/C%
15   1          2          0          3
     0.33/0.50  0.67/0.33  0.00/0.00  1.00/0.83
20   1          4          1          6
     0.17/0.50  0.67/0.67  0.17/0.50  1.00/1.67
97   0          0          1          1
     0.00/0.00  0.00/0.00  1.00/0.50  1.00/0.50
Sum  2          6          2          10
     0.50/1.00  1.33/1.00  1.17/1.00  3.00/3.00
Thanks in advance.
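One way this might be approached, offered as a sketch rather than a definitive answer: since the three tables share the same shape, their cells can be pasted together into a single character matrix. The layout below puts the percentages next to each count as "count (R%/C%)" instead of on a separate line, which is a simplification of the layout requested; the name combined is made up for illustration.

```r
RIK_T1 <- c('20','15','20','20','97','20','20','20','15','15')
RIK_T2 <- c('20','15','15','20','97','97','20','20','20','20')

TAB <- table(RIK_T1, RIK_T2)
t1 <- addmargins(TAB)                  # counts
t2 <- addmargins(prop.table(TAB, 1))   # row proportions
t3 <- addmargins(prop.table(TAB, 2))   # column proportions

# Paste the three tables cell by cell; all three are stored column-major,
# so the cells line up when the result is reshaped to the same dimensions
combined <- matrix(
  paste0(as.vector(t1), " (",
         sprintf("%.2f", as.vector(t2)), "/",
         sprintf("%.2f", as.vector(t3)), ")"),
  nrow = nrow(t1),
  dimnames = dimnames(t1)
)

noquote(combined)
```

The result is a character matrix, so it is suitable for display or export but not for further arithmetic; keep t1, t2 and t3 around if the numbers are needed later.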

Referring to other cells in R without using a for loop

I am new to R and one thing I have been told again and again is that there really is no need for for loops. I have had some success with apply but could not figure out how to use it in this instance.
Here is the data I am working with:
Bid Ask Exp Strike Price V6
51 4.95 5.15 NOV1 13 335 5.050 3.08
52 3.40 3.50 NOV1 13 340 3.450 NA
53 2.28 2.42 NOV1 13 345 2.350 NA
54 1.51 1.57 NOV1 13 350 1.540 NA
55 0.99 1.07 NOV1 13 355 1.030 NA
56 0.66 0.71 NOV1 13 360 0.685 NA
57 0.46 0.51 NOV1 13 365 0.485 NA
58 0.33 0.37 NOV1 13 370 0.350 NA
59 0.25 0.28 NOV1 13 375 0.265 NA
60 0.18 0.24 NOV1 13 380 0.210 NA
61 0.11 0.20 NOV1 13 385 0.155 NA
62 0.05 0.17 NOV1 13 390 0.110 NA
63 0.05 0.16 NOV1 13 395 0.105 NA
64 0.07 0.13 NOV1 13 400 0.100 NA
In column 6 (called V6), I want each value to be twice the value in the Price column in the cell that is 3 rows below the current row. For example, row 1 of column 6 is 3.08, which is 2*1.54, the value in column 5, row 4. I would like to do this for every cell in column 6 until the data run out at row 12. NA is fine in column 6 from that row onwards.
Here is how I accomplished this:
for (i in 1:11){
data[i,6] <- 2*data[i+3,5]}
Is there a faster/easier/ more appropriate way to do this?
Here is the final data as I want it.
Bid Ask Exp Strike Price V6
51 4.95 5.15 NOV1 13 335 5.050 3.08
52 3.40 3.50 NOV1 13 340 3.450 2.06
53 2.28 2.42 NOV1 13 345 2.350 1.37
54 1.51 1.57 NOV1 13 350 1.540 0.97
55 0.99 1.07 NOV1 13 355 1.030 0.70
56 0.66 0.71 NOV1 13 360 0.685 0.53
57 0.46 0.51 NOV1 13 365 0.485 0.42
58 0.33 0.37 NOV1 13 370 0.350 0.31
59 0.25 0.28 NOV1 13 375 0.265 0.22
60 0.18 0.24 NOV1 13 380 0.210 0.21
61 0.11 0.20 NOV1 13 385 0.155 0.20
62 0.05 0.17 NOV1 13 390 0.110 NA
63 0.05 0.16 NOV1 13 395 0.105 NA
64 0.07 0.13 NOV1 13 400 0.100 NA
Thank you.
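Use mydata$V6 <- 2 * c(mydata$Price[-(1:3)], rep(NA, 3)), which drops the first three prices and pads the end with NA.
Here df1 is your data. I used sapply, which avoids writing the for loop explicitly (it still evaluates row by row, so do not expect a large speed-up over the loop):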
df1$V6<-sapply(1:nrow(df1),function(x) 2*df1[x+3,5])
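To check that the shifted-vector approach gives the same result as the loop, here is a small sketch using only the Price column from the question. The data frame name mydata follows the first answer; the offset variable k and the comparison column V6_loop are added for illustration.

```r
# Price column from the question, retyped so the block runs on its own
mydata <- data.frame(
  Price = c(5.050, 3.450, 2.350, 1.540, 1.030, 0.685, 0.485,
            0.350, 0.265, 0.210, 0.155, 0.110, 0.105, 0.100)
)

k <- 3  # how many rows below the current row to look

# Vectorised: drop the first k prices, pad the tail with NA, then double
mydata$V6 <- 2 * c(mydata$Price[-(1:k)], rep(NA, k))

# The original loop, kept only for comparison
mydata$V6_loop <- NA_real_
for (i in 1:(nrow(mydata) - k)) {
  mydata$V6_loop[i] <- 2 * mydata$Price[i + k]
}

all.equal(mydata$V6, mydata$V6_loop)
```

The vectorised line works on the whole column at once, so there is no per-row indexing into the data frame, which is usually where a loop like this spends its time.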
