Why is t() returning a 'vector'? - r

Just trying to do some basic matrix algebra in R, and I'm getting some weird results that I don't completely understand.
So, my data looks like this:
Wt LvrWt Dose Y
1 176 6.5 0.88 0.42
2 176 9.5 0.88 0.25
3 190 9.0 1.00 0.56
4 176 8.9 0.88 0.23
5 200 7.2 1.00 0.23
6 167 8.9 0.83 0.32
7 188 8.0 0.94 0.37
8 195 10.0 0.98 0.41
9 176 8.0 0.88 0.33
10 165 7.9 0.84 0.38
11 158 6.9 0.80 0.27
12 148 7.3 0.74 0.36
13 149 5.2 0.75 0.21
14 163 8.4 0.81 0.28
15 170 7.2 0.85 0.34
16 186 6.8 0.94 0.28
17 146 7.3 0.73 0.30
18 181 9.0 0.90 0.37
19 149 6.4 0.75 0.46
And here is the code I'm using:
# Creating the X matrix
Xmatrix <- subset(questionOneA, select = -c(Y))
Xmatrix <- matrix(Xmatrix)
Xmatrix <- sapply(Xmatrix, as.numeric)
is.numeric(Xmatrix)
# Transposing the x matrix
Xtranspose <- t(Xmatrix)
Xtranspose <- matrix(Xtranspose)
is.numeric(Xtranspose)
The output of Xmatrix seems correct:
V1 V2 V3
1 176 6.5 0.88
2 176 9.5 0.88
3 190 9.0 1.00
4 176 8.9 0.88
5 200 7.2 1.00
6 167 8.9 0.83
7 188 8.0 0.94
8 195 10.0 0.98
9 176 8.0 0.88
10 165 7.9 0.84
11 158 6.9 0.80
12 148 7.3 0.74
13 149 5.2 0.75
14 163 8.4 0.81
15 170 7.2 0.85
16 186 6.8 0.94
17 146 7.3 0.73
18 181 9.0 0.90
19 149 6.4 0.75
However, the output of Xtranspose seems strange to me:
V1
1 176.00
2 6.50
3 0.88
4 176.00
5 9.50
6 0.88
7 190.00
8 9.00
9 1.00
10 176.00
11 8.90
12 0.88
13 200.00
14 7.20
15 1.00
16 167.00
17 8.90
18 0.83
19 188.00
20 8.00
21 0.94
22 195.00
23 10.00
24 0.98
25 176.00
26 8.00
27 0.88
28 165.00
29 7.90
30 0.84
31 158.00
32 6.90
33 0.80
34 148.00
35 7.30
36 0.74
37 149.00
38 5.20
39 0.75
40 163.00
41 8.40
42 0.81
43 170.00
44 7.20
45 0.85
46 186.00
47 6.80
48 0.94
49 146.00
50 7.30
51 0.73
52 181.00
53 9.00
54 0.90
55 149.00
56 6.40
57 0.75
I was expecting an output with 3 rows and 19 columns. What's happened here that I'm not understanding?
Any help would be appreciated!

You should use as.matrix() instead of matrix() to convert the data frame to a matrix; this can also be done in fewer steps:
Xmatrix <- subset(questionOneA, select = -Y)
Xmatrix <- as.matrix(Xmatrix)
Xtranspose <- t(Xmatrix)
Xmatrix
# Wt LvrWt Dose
#1 176 6.5 0.88
#2 176 9.5 0.88
#3 190 9.0 1.00
#4 176 8.9 0.88
#5 200 7.2 1.00
#6 167 8.9 0.83
#7 188 8.0 0.94
#8 195 10.0 0.98
#9 176 8.0 0.88
#10 165 7.9 0.84
#11 158 6.9 0.80
#12 148 7.3 0.74
#13 149 5.2 0.75
#14 163 8.4 0.81
#15 170 7.2 0.85
#16 186 6.8 0.94
#17 146 7.3 0.73
#18 181 9.0 0.90
#19 149 6.4 0.75
Xtranspose
# 1 2 3 4 5 6 7 8
#Wt 176.00 176.00 190 176.00 200.0 167.00 188.00 195.00
#LvrWt 6.50 9.50 9 8.90 7.2 8.90 8.00 10.00
#Dose 0.88 0.88 1 0.88 1.0 0.83 0.94 0.98
# 9 10 11 12 13 14 15 16
#Wt 176.00 165.00 158.0 148.00 149.00 163.00 170.00 186.00
#LvrWt 8.00 7.90 6.9 7.30 5.20 8.40 7.20 6.80
#Dose 0.88 0.84 0.8 0.74 0.75 0.81 0.85 0.94
# 17 18 19
#Wt 146.00 181.0 149.00
#LvrWt 7.30 9.0 6.40
#Dose 0.73 0.9 0.75
Compare that with what matrix(Xmatrix) returns: a 3 x 1 matrix of type list, where each cell holds a whole column of the data frame:
matrix(Xmatrix)
# [,1]
#[1,] Integer,19
#[2,] Numeric,19
#[3,] Numeric,19
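A minimal sketch of the difference, using the first three rows of the question's predictors as a stand-in for questionOneA (the data frame df below is illustrative, not the full data): matrix() on a data frame gives a one-column list matrix, while as.matrix() gives the numeric matrix you actually want.

```r
# Stand-in for questionOneA without the Y column (first three rows only)
df <- data.frame(Wt    = c(176L, 176L, 190L),
                 LvrWt = c(6.5, 9.5, 9.0),
                 Dose  = c(0.88, 0.88, 1.00))

m_list <- matrix(df)     # one cell per column of df
dim(m_list)              # 3 1
typeof(m_list)           # "list"

m_num <- as.matrix(df)   # proper rows x columns numeric matrix
dim(m_num)               # 3 3
typeof(m_num)            # "double"
```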

Just check the output of each of your steps, and you will see that the matrix becomes a one-column matrix after this step:
Xtranspose <- matrix(Xtranspose)
matrix() builds a new matrix from the elements of its data argument and ignores any dimensions that argument already has. Its usage line shows nrow = 1 and ncol = 1 as defaults, but when you supply neither, it returns a one-column matrix with one row per element, filled column by column. That is why the 3 x 19 result of t() turns into the 57 x 1 output you see.
Creating a new matrix isn't really what you want at this point; you only need to make sure that the 2-dimensional structure you have is a matrix, for which as.matrix() is better. (But it is unnecessary here: t() already returns a matrix.)
Though I will say, the manual is easy to misread on this point: the usage line suggests a 1 x 1 result, and only the Details section of ?matrix mentions that a one-column matrix is returned when neither nrow nor ncol is given.
It also says this about coercion, which matches your case:
When coercing a vector, it produces a one-column matrix, and promotes the names (if any) of the vector to the rownames of the matrix.
This is also what you see.
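A minimal sketch of the behaviour described above, using the first four rows of the question's predictors as stand-in data: t() already returns a 3 x 4 matrix, matrix() with no nrow/ncol flattens it into a single column in column-major order (the same 176, 6.5, 0.88, ... pattern as in the question), and as.matrix() leaves it unchanged.

```r
# First four rows of Wt, LvrWt and Dose from the question (stand-in data)
X <- cbind(Wt    = c(176, 176, 190, 176),
           LvrWt = c(6.5, 9.5, 9.0, 8.9),
           Dose  = c(0.88, 0.88, 1.00, 0.88))

Xt <- t(X)          # already a matrix: 3 rows (variables) x 4 columns (observations)
dim(Xt)             # 3 4

flat <- matrix(Xt)  # no nrow/ncol given: one column, filled column by column
dim(flat)           # 12 1
flat[1:6, 1]        # 176.00 6.50 0.88 176.00 9.50 0.88

identical(as.matrix(Xt), Xt)  # TRUE: as.matrix() returns an existing matrix as is
```

If you ever do need to rebuild the shape from a flat vector, pass the dimensions explicitly, e.g. matrix(flat, nrow = 3).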

Related

How to change loadings.label in PCA plot using ggplot2?

I am plotting a PCA in ggplot2, and the loadings labels (loadings.label) overlap with the arrows. I want to move the labels a little to make the plot easier to read, but I can't find a way to do it.
Here is part of the data:
Linfoprolif CORT Testo FDL Ac.GRO ifn.g il.4 Profile
1 23.76 0.27 0.96 2.41 6 307 69 1
2 NA 2.59 0.07 0.39 4 117 58
3 25.53 0.16 0.71 2.17 5 273 54 1
4 31.67 0.88 0.07 0.55 5 211 48 1
5 6.15 0.24 0.23 1.07 5 224 48 1
6 26.19 0.74 0.04 0.60 4 308 59 1
7 10.31 0.34 0.75 2.29 7 295 49 1
8 22.30 0.42 0.07 0.63 5 271 52 1
9 24.74 0.29 1.18 2.91 4 236 56 1
10 9.51 2.19 0.07 0.40 5 54 62 2
11 22.59 0.19 0.40 3.28 4 272 58 1
12 22.01 0.28 0.04 0.54 4 67 64 1
13 39.21 0.21 0.82 1.91 4 235 56 1
14 42.07 0.32 0.16 0.70 5 362 54 3
15 13.45 0.30 0.24 2.21 6 146 68 1
16 15.08 2.19 0.08 0.34 5 58 63 2
17 20.48 0.38 1.27 2.40 4 278 52 1
18 12.10 0.83 0.11 0.53 2 146 41 1
19 61.56 0.07 0.09 1.09 9 305 52 3
20 35.06 0.59 0.05 0.67 4 220 54 1
21 33.48 0.68 0.99 1.24 3 102 58 1
22 20.56 0.94 0.06 1.71 3 58 45 2
23 26.46 0.12 0.29 1.60 3 210 55 1
24 24.91 0.56 0.11 0.55 5 108 56 1
25 29.22 0.42 2.60 1.55 3 84 69 1
26 19.30 1.63 0.02 0.78 3 62 69 2
27 14.45 0.22 0.79 1.89 4 245 59 1
28 20.89 0.72 0.04 0.57 4 85 53 1
29 26.70 0.36 1.02 2.05 3 309 45 1
30 27.83 2.66 0.04 0.54 3 52 65 2
31 34.70 0.46 0.83 1.39 5 120 65 1
And the code:
library(ggfortify)
p_pca <- d_e_b[c(1, 2, 3, 4, 5, 6, 7)]
p_pca <- na.omit(p_pca)
pca_res <- prcomp(p_pca, scale. = TRUE)
pca_b <- autoplot(pca_res, data = d_e_b, colour = "Profile",
                  loadings = TRUE, loadings.colour = 'gray30', loadings.size = 5,
                  loadings.label = TRUE, loadings.label.color = 'black',
                  loadings.label.size = 4) +
  theme_classic() +
  scale_colour_discrete("Profile") +
  theme(text = element_text(size = 20),
        axis.line.x = element_line(color = "black", size = 1),
        axis.line.y = element_line(color = "black", size = 1),
        axis.text.x = element_text(colour = "black", angle = 360, vjust = 0.6),
        axis.text.y = element_text(colour = "black"))
pca_b
Any ideas on how to solve it?
You can add loadings.label.repel = TRUE inside autoplot() to offset the labels a bit.

Problems with partimat plot in R

I am trying to plot an LDA analysis using the partimat function from the klaR package in R, and I am getting this error: Error in partimat.default(x, grouping, ...) : at least two classes required. I am pasting part of the data here to make a reproducible example:
abrev Linfoprolif CORT Testo FDL Ac.GRO ifn.g il.4
1 A 2.00 0.53 1.54 1.65 8 192 68
2 A 13.91 0.65 1.34 2.27 6 195 58
3 A 15.65 0.50 0.07 0.97 5 280 67
4 A 4.96 1.51 1.45 2.54 3 30 48
5 A 0.00 3.18 0.01 0.95 3 60 71
6 A 36.23 0.28 0.88 3.63 7 320 50
7 A 9.15 1.20 0.16 1.32 1 52 74
8 A 17.63 1.68 1.29 1.86 1 47 53
9 A 6.52 2.36 0.03 0.92 4 51 75
113 B 20.48 0.38 1.27 2.40 4 278 52
114 B 12.10 0.83 0.11 0.53 2 146 41
115 B 61.56 0.07 0.09 1.09 9 305 52
116 B 35.06 0.59 0.05 0.67 4 220 54
117 B 33.48 0.68 0.99 1.24 3 102 58
118 B 20.56 0.94 0.06 1.71 3 58 45
119 B 26.46 0.12 0.29 1.60 3 210 55
120 B 24.91 0.56 0.11 0.55 5 108 56
121 B 29.22 0.42 2.60 1.55 3 84 69
122 B 19.30 1.63 0.02 0.78 3 62 69
123 B 14.45 0.22 0.79 1.89 4 245 59
373 D 27.13 0.23 1.03 4.23 6 261 100
374 D 0.00 0.43 0.08 15.34 1 58 69
375 D 17.42 0.27 2.07 7.09 5 184 80
376 D 37.34 0.91 0.08 6.18 6 210 81
377 D 28.19 0.20 3.34 6.82 6 269 105
378 D 8.53 0.61 0.05 5.31 4 98 115
I followed the code posted here like this:
partimat(abrev ~ Linfoprolif + CORT + Testo + FDL+Ac.GRO,+ ifn.g + ifn.g, data=d_e_disc, method="lda")
I can't find my error. Any help is welcome.
Your response variable abrev must be a factor, so convert it to class factor first:
d_e_disc$abrev <- as.factor(d_e_disc$abrev)
# then apply your code above
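A base-R sketch of why the conversion matters, assuming abrev was read in as character (the vector below is an illustrative stand-in, not the full column). A character vector has no levels, so anything that counts classes from the levels sees zero. That klaR counts classes this way is an assumption about its internals, but it is consistent with the "at least two classes required" error.

```r
# Stand-in for d_e_disc$abrev: group labels stored as character
abrev_chr <- c("A", "A", "B", "B", "D", "D")
class(abrev_chr)    # "character"
nlevels(abrev_chr)  # 0: a character vector has no levels

# Converting to a factor gives one level per distinct group
abrev_fac <- as.factor(abrev_chr)
levels(abrev_fac)   # "A" "B" "D"
nlevels(abrev_fac)  # 3
```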
Mohamed Desouky found your problem: abrev should be a factor! There is also a small typo in your formula (the stray "," after Ac.GRO). So here is a reproducible example with the corrected call:
library(klaR)
partimat(factor(abrev) ~ Linfoprolif + CORT + Testo + FDL+Ac.GRO + ifn.g + ifn.g, data=d_e_disc, method="lda")
Created on 2022-07-11 by the reprex package (v2.0.1)

Can't add points to a plot: Error: A continuous variable can not be mapped to shape

I'm trying to add points to an already generated plot.
The data frame used in my original plot looks like this:
> head(df)
fa va ca rs ch fsd tsd d p s a cluster q
1 7.4 0.29 0.50 1.8 0.042 35 127 0.99370 3.45 0.50 10.2 1 7
2 10.0 0.41 0.45 6.2 0.071 6 14 0.99702 3.21 0.49 11.8 2 7
3 7.8 0.26 0.27 1.9 0.051 52 195 0.99280 3.23 0.50 10.9 3 6
4 6.9 0.32 0.30 1.8 0.036 28 117 0.99269 3.24 0.48 11.0 1 6
5 6.8 0.37 0.28 1.9 0.024 64 106 0.98993 3.45 0.60 12.6 1 8
6 6.2 0.25 0.44 15.8 0.057 39 167 0.99804 3.14 0.51 9.2 3 5
I generate the plot like this:
ggplot(data=df, aes(x=p, y=tsd, color=cluster)) +
  geom_point() +
  geom_point(data=centers, aes(x=p, y=tsd, color='Center')) +
  geom_point(data=centers, aes(x=p, y=tsd, color='Center'), size=52, alpha=.3, show_guide=FALSE)
Now I need to add more points to this plot. The dataframe I want to add looks like this:
> head(df2)
fa va ca rs ch fsd tsd d p s a cluster q
1 6.2 0.23 0.35 0.7 0.051 24 111 0.99160 3.37 0.43 11.0 1 3
2 11.8 0.23 0.38 11.1 0.034 15 123 0.99970 2.93 0.55 9.7 1 3
3 6.6 0.36 0.29 1.6 0.021 24 85 0.98965 3.41 0.61 12.4 1 9
4 9.1 0.27 0.45 10.6 0.035 28 124 0.99700 3.20 0.46 10.4 1 9
5 8.3 0.33 0.42 1.2 0.033 18 96 0.99110 3.20 0.32 12.4 1 3
6 7.4 0.24 0.36 2.0 0.031 27 139 0.99055 3.28 0.48 12.5 1 9
This is the error:
> last_plot() + geom_point(data=df2, aes(x=tsd, y=p, shape=5, alpha=.7, size=4.5) , show_guide=FALSE)
Error: A continuous variable can not be mapped to shape

Referring to other cells in R without using a for loop

I am new to R and one thing I have been told again and again is that there really is no need for for loops. I have had some success with apply but could not figure out how to use it in this instance.
Here is the data I am working with:
Bid Ask Exp Strike Price V6
51 4.95 5.15 NOV1 13 335 5.050 3.08
52 3.40 3.50 NOV1 13 340 3.450 NA
53 2.28 2.42 NOV1 13 345 2.350 NA
54 1.51 1.57 NOV1 13 350 1.540 NA
55 0.99 1.07 NOV1 13 355 1.030 NA
56 0.66 0.71 NOV1 13 360 0.685 NA
57 0.46 0.51 NOV1 13 365 0.485 NA
58 0.33 0.37 NOV1 13 370 0.350 NA
59 0.25 0.28 NOV1 13 375 0.265 NA
60 0.18 0.24 NOV1 13 380 0.210 NA
61 0.11 0.20 NOV1 13 385 0.155 NA
62 0.05 0.17 NOV1 13 390 0.110 NA
63 0.05 0.16 NOV1 13 395 0.105 NA
64 0.07 0.13 NOV1 13 400 0.100 NA
In column 6 (called V6), I want each value to be twice the value in the Price column 3 rows below the current row. For example, row 1 of column 6 is 3.08, which is 2*1.54, the value in column 5, row 4. I would like to do this for every cell in column 6 until the data runs out at row 12 (there is no row 3 below it); NA is fine in column 6 from that row on.
Here is how I accomplished this:
for (i in 1:11){
data[i,6] <- 2*data[i+3,5]}
Is there a faster/easier/more appropriate way to do this?
Here is the final data as I want it.
Bid Ask Exp Strike Price V6
51 4.95 5.15 NOV1 13 335 5.050 3.08
52 3.40 3.50 NOV1 13 340 3.450 2.06
53 2.28 2.42 NOV1 13 345 2.350 1.37
54 1.51 1.57 NOV1 13 350 1.540 0.97
55 0.99 1.07 NOV1 13 355 1.030 0.70
56 0.66 0.71 NOV1 13 360 0.685 0.53
57 0.46 0.51 NOV1 13 365 0.485 0.42
58 0.33 0.37 NOV1 13 370 0.350 0.31
59 0.25 0.28 NOV1 13 375 0.265 0.22
60 0.18 0.24 NOV1 13 380 0.210 0.21
61 0.11 0.20 NOV1 13 385 0.155 0.20
62 0.05 0.17 NOV1 13 390 0.110 NA
63 0.05 0.16 NOV1 13 395 0.105 NA
64 0.07 0.13 NOV1 13 400 0.100 NA
Thank you.
Use a vectorized shift: drop the first three prices and pad the end with NA:
mydata$V6 <- 2 * c(mydata$Price[-(1:3)], rep(NA, 3))
df1 is your data. I used sapply() here; it is more concise than a for loop, though it still calls the function once per row, so it is not necessarily faster:
df1$V6<-sapply(1:nrow(df1),function(x) 2*df1[x+3,5])
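Both answers compute the same shifted column; here is a minimal sketch that checks this against the question's loop on the Price column from the question. The helper lead_by() is hypothetical (it is not from either answer); it relies on the fact that indexing an atomic vector past its end returns NA, so no explicit padding is needed.

```r
# Price column from the question
price <- c(5.050, 3.450, 2.350, 1.540, 1.030, 0.685, 0.485,
           0.350, 0.265, 0.210, 0.155, 0.110, 0.105, 0.100)
df1 <- data.frame(Price = price)

# Hypothetical helper: element i of the result is x[i + n] (assumes n >= 0);
# positions past the end of x come back as NA
lead_by <- function(x, n) x[seq_along(x) + n]

df1$V6 <- 2 * lead_by(df1$Price, 3)

# Same result as the for loop in the question
v6_loop <- rep(NA_real_, nrow(df1))
for (i in seq_len(nrow(df1) - 3)) v6_loop[i] <- 2 * df1$Price[i + 3]
all.equal(df1$V6, v6_loop)  # TRUE
df1$V6[c(1, 11, 12)]        # 3.08 0.20 NA
```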

Curve fitting in R using nls

I'm trying to fit a curve over (the tail of) the following data:
[1] 1 1 1 1 1 1 2 1 2 2 3 2 1 1 4 3 2 11 6 2 16 7 17 36
[25] 27 39 41 33 42 66 92 138 189 249 665 224 309 247 641 777 671 532 749 506 315 292 281 130
[49] 137 91 40 27 34 19 1
I'm using the following function in R to accomplish this:
nls(y~a*x*exp(-b*x^2),start=list(a=1,b=1),trace=TRUE)
However, I'm getting the following error:
3650202 : 1 1
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
When using the following artificial values for x and y, everything works just fine:
y=x*exp(-.5*x^2)+rnorm(length(x),0,0.1)
x
[1] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90
[20] 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85
[39] 1.90 1.95 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80
[58] 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75
[77] 3.80 3.85 3.90 3.95 4.00 4.05 4.10 4.15 4.20 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70
[96] 4.75 4.80 4.85 4.90 4.95 5.00
y
[1] -0.080214106 0.075247488 0.076355116 -0.020087646 0.181314038 0.075832658 0.248303254
[8] 0.364244010 0.453655908 0.347854869 0.514373164 0.384051249 0.618584696 0.515684390
[15] 0.534737770 0.609279111 0.618936091 0.534443863 0.739118585 0.677679546 0.526011452
[22] 0.645645150 0.578274968 0.589619834 0.476186241 0.621638333 0.601663144 0.535981735
[29] 0.518434367 0.581735107 0.423872948 0.445335110 0.340884242 0.317121065 0.342683141
[36] 0.278351104 0.402947372 0.429483276 0.276655872 0.108164828 0.389994138 0.372300257
[43] -0.057320612 0.131271986 0.226212869 0.131171973 0.245970674 0.009926555 0.173465207
[50] 0.141220590 0.280616078 0.108515613 0.117697407 0.130700771 0.058540888 0.251613512
[57] 0.168094899 -0.058382571 0.123306762 -0.048605186 -0.010131767 0.076701962 -0.051982924
[64] 0.058427540 0.144665070 0.063998841 -0.010495697 0.119868854 0.114447318 0.006759691
[71] 0.025041761 -0.178145771 0.041547126 0.122084819 0.034283141 0.209140060 0.197024853
[78] -0.005491966 -0.033260219 -0.028123314 -0.005775553 -0.040781462 0.090024896 0.116390743
[85] -0.017811031 0.094039200 -0.147064060 -0.057249278 0.211587898 -0.066153592 0.032100332
[92] -0.092756136 -0.125906598 0.136937364 0.046453010 0.002000336 -0.134047101 0.089748847
[99] -0.019355567 -0.042158950 0.149594368
Can anyone point out what I'm doing wrong? Thanks for your help.
Well, I found the answer to my problem. The starting values for the real data are completely different from those for the dummy data: a = 500 and b = 0.1 give a nice fit. Just thought it might be useful to mention that here.
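One way to get sensible starting values without trial and error: for y = a*x*exp(-b*x^2), setting the derivative to zero puts the peak at x* = 1/sqrt(2*b), with height y* = a*x*exp(-1/2). Reading x* and y* off the highest observed point then gives b0 = 1/(2*x*^2) and a0 = y*·exp(1/2)/x*. The real x values are not shown in the question, so this sketch uses simulated data with a = 500 and b = 0.1 (the values reported above) as an assumption.

```r
set.seed(1)
# Simulated data (assumption): true a = 500, b = 0.1, Gaussian noise with sd 5
x <- seq(0.1, 10, by = 0.1)
y <- 500 * x * exp(-0.1 * x^2) + rnorm(length(x), sd = 5)

# Peak of a*x*exp(-b*x^2) is at x* = 1/sqrt(2*b), with height a*x* * exp(-1/2)
x_peak <- x[which.max(y)]
y_peak <- max(y)
b0 <- 1 / (2 * x_peak^2)
a0 <- y_peak * exp(0.5) / x_peak

fit <- nls(y ~ a * x * exp(-b * x^2), start = list(a = a0, b = b0))
coef(fit)
```

Because the starting point is already close to the optimum, nls() only has to refine it rather than search from a = 1, b = 1.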
