How to fix the error "Subscript out of bounds" - r

I have a question about fixing the error:
"subscript out of bounds".
I am analyzing data of an eye-tracking experiment. You may find example data below:
Stimulus Timebin Language Percentage on AOI
1 11 L1 0.80
1 11 L2 0.60
1 12 L1 0.80
1 12 L2 0.50
1 13 L1 0.83
1 13 L2 0.50
...
10 37 L1 0.00
10 37 L2 0.50
10 38 L1 0.70
10 38 L2 0.50
10 39 L1 0.60
10 39 L2 0.70
10 40 L1 0.75
10 40 L2 0.89
...
I would like to run a growth curve analysis with Language and Timebin as independent variables and the percentage on the area of interest (AOI) as the dependent variable, with Stimulus as a random factor. I have 40 time bins for each stimulus and condition. To avoid potential collinearity, I want to create orthogonal polynomials. The code below was used to create independent (orthogonal) polynomial time terms (linear, quadratic, and cubic).
Gaze_1_Poly <- poly((unique(Gaze_1$timebin)), 3)
Gaze_1[,paste("ot", 1:3, sep="")] <- Gaze_1_Poly[Gaze_1$timebin, 1:3]
I always get an error telling me that a subscript is out of bounds:
Error in Gaze_1_Poly[Gaze_1$timebin, :
subscript out of bounds
So I checked the classes of the variables, and I don't think they are the problem:
Stimulus Timebin Language percentage on AOI
"character" "integer" "factor" "numeric"
I can not figure out the reason. Can someone give me a hand?

See comment above. Let me know if this is what you had in mind.
library(dplyr)
Gaze_1 %>%
left_join(data.frame(Timebin = unique(.$Timebin), poly(unique(.$Timebin), degree = 3)),
by = 'Timebin') %>%
setNames(c("Stimulus", "Timebin", "Language", "Percentage on AOI", "ot1", "ot2", "ot3"))
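A likely cause of the original error, judging only from the sample data (an assumption, since the full data are not shown): the time bins shown run from 11 to 40, so poly() on the unique values returns a matrix with fewer rows than the largest time bin value, and using the raw values as row indices runs past the end. Below is a minimal base R sketch of an alternative to the join above, using match() to translate each time bin into its row position. The toy data frame is made up for illustration.

```r
# Toy stand-in for Gaze_1 (hypothetical): time bins 11..40, two languages
Gaze_1 <- data.frame(
  Timebin  = rep(11:40, each = 2),
  Language = rep(c("L1", "L2"), times = 30)
)

bins <- sort(unique(Gaze_1$Timebin))   # 30 distinct values
Gaze_1_Poly <- poly(bins, 3)           # 30 x 3 matrix, one row per distinct bin

# Indexing by the raw values fails: values 31..40 exceed the 30 rows
raw_index_error <- tryCatch(
  Gaze_1_Poly[Gaze_1$Timebin, 1:3],
  error = function(e) conditionMessage(e)
)

# match() gives, for each observation, the row position of its time bin
idx <- match(Gaze_1$Timebin, bins)
Gaze_1[, paste0("ot", 1:3)] <- Gaze_1_Poly[idx, 1:3]
```

If the time bins really did run from 1 to 40 without gaps, the raw values and the row positions would coincide and the original indexing would work, which is why the match() step only matters when they differ.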

want to expand a large bipartite network plot avoid vertices overlapped

I am plotting a bipartite graph using the igraph package in R. There are about 10,000 edges, and I want to expand the width of the whole plot so that the state vertices do not overlap.
my data looks like this:
> test2
user_id state meanlat meanlon countUS countS degState
<chr> <chr> <dbl> <dbl> <int> <int> <int>
1 -_1ctLaz3jhPYc12hKXsEQ NC 35.19401 -80.83235 909 3 18487
2 -_1ctLaz3jhPYc12hKXsEQ NV 36.11559 -115.18042 29 3 37884
3 -_1ctLaz3jhPYc12hKXsEQ SC 35.05108 -80.96166 4 3 665
4 -0wUMy3vgInUD4S6KJInnw IL 40.11227 -88.22955 2 3 1478
5 -0wUMy3vgInUD4S6KJInnw NV 36.11559 -115.18042 23 3 37884
6 -0wUMy3vgInUD4S6KJInnw WI 43.08051 -89.39835 20 3 3963
Below is my code for creating and styling the graph.
g2 <- graph_from_data_frame(test2,directed = F)
V(g2)$type <- ifelse(names(V(g2)) %in% UserStateR$user_id, 'user', 'state')
V(g2)$label <- ifelse(V(g2)$type == 'user', " ", paste(names(V(g2)),"\n",as.character(test2$degState),sep=""))
V(g2)$size <- ifelse(V(g2)$type == 'user', 3, 20)
V(g2)$color <- ifelse(V(g2)$type == 'user', 'wheat', 'salmon')
V(g2)$type <- ifelse(names(V(g2)) %in% UserStateR$user_id, T, F )
E(g2)$color <- heat.colors(8)[test2$countS]
plot(g2,layout=layout.bipartite(g2, types = names(V(g2)) %in% UserStateR$state, hgap = 50, vgap = 50))
As you can see, I have tried changing the hgap and vgap arguments, but that apparently has no effect. I have also tried the asp argument, but that is not what I want.
I know this might be too late for @floatsd, but I was struggling with this today and had a really hard time finding an answer, so this might help others out.
First, in general, there is an argument to plot.igraph called asp that very simply controls how rectangular your plot is. Simply do
l=layout.bipartite(CCM_net)
plot(CCM_net, layout=l, asp=0.65)
for a wide plot. asp smaller than 1 gives you a wide plot, asp larger than 1 a tall plot.
However, this might still not give you the layout you want. The bipartite command basically generates a matrix with coordinates for your vertices, and I actually don't understand yet how it comes up with the x-coordinates, so I ended up changing them myself.
Below is the example (I am assuming you know how to turn your data into data frames with the edge list and edge/vertex attributes for making graphs, so I am skipping that).
My data is CCM_data_sign and is
from to value
2 EVI MAXT 0.67
4 EVI MINT 0.81
5 EVI P 0.70
7 EVI SM 0.79
8 EVI AMO 0.86
11 MAXT EVI 0.81
18 MAXT AMO 0.84
21 MEANT EVI 0.88
28 MEANT AMO 0.83
29 MEANT PDO 0.71
31 MINT EVI 0.96
39 MINT PDO 0.78
40 MINT MEI 0.66
41 P EVI 0.91
49 P PDO 0.77
50 P MEI 0.71
51 PET EVI 0.90
58 PET AMO 0.89
59 PET PDO 0.70
61 SM EVI 0.94
68 SM AMO 0.90
69 SM PDO 0.81
70 SM MEI 0.73
74 AMO MINT 0.93
76 AMO PET 0.66
79 AMO PDO 0.71
80 AMO MEI 0.83
90 PDO MEI 0.82
The graph I generated from it is called CCM_net.
First a bipartite plot without any layout adjustments
V(CCM_net)$size<-30
l=layout.bipartite(CCM_net)
plot(CCM_net,
layout=l,
edge.arrow.size=1,
edge.arrow.width=2,
vertex.label.family="Helvetica",
vertex.label.color="black",
vertex.label.cex=2,
vertex.label.dist=c(3,3,3,3,3,3,3,3,3,3,3),
vertex.label.degree=c(pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,pi/2,pi/2,pi/2), #0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
edge.lty=1)
This gives you the following
If I use asp I get the following
plot(CCM_net,
layout=l,
edge.arrow.size=1,
vertex.label.family="Helvetica",
vertex.label.color="black",
vertex.label.cex=2,
vertex.label.dist=c(3,3,3,3,3,3,3,3,3,3,3),
vertex.label.degree=c(pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,pi/2,pi/2,pi/2), #0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
edge.arrow.width=2,
edge.lty=1,
asp=0.6) # controls how rectangular the plot is. <1 = wide, >1 = tall
dev.off()
This is looking better, but still not really what I want - see how some vertices are closer to each other than others?
So eventually I took the following approach. Setting the coordinates as bipartite looks like this
coords <- layout_as_bipartite(CCM_net)
coords
[,1] [,2]
[1,] 3.0 0
[2,] 0.0 1
[3,] 2.0 1
[4,] 3.5 1
[5,] 6.0 1
[6,] 1.0 1
[7,] 5.0 1
[8,] 7.0 1
[9,] 1.0 0
[10,] 4.5 0
[11,] 5.5 0
This matrix shows the x coordinates of your vertices in the first column and the y coordinates in the second column, ordered according to your list of vertex names. My list of names is
id name
1 EVI EVI
2 MAXT MAXT
3 MEANT MEANT
4 MINT MINT
5 P P
6 PET PET
7 SM SM
8 SR SR
9 AMO AMO
10 PDO PDO
11 MEI MEI
In my graph, EVI, AMO, PDO and MEI are on the bottom, but note their x coordinates: 3.0, 1.0, 4.5 and 5.5. I haven't figured out yet how the code comes up with that, but I don't like it so I simply changed it.
coords[,1]=c(2,0,4,8,12,16,20,24,9,16,24)
Now the plotting code (also with asp) and the output becomes
plot(CCM_net,
layout=coords,
edge.arrow.size=1,
vertex.label.family="Helvetica",
vertex.label.color="black",
vertex.label.cex=1,
vertex.label.dist=c(4,4,4,4,4,4,4,4,4,4,4),
vertex.label.degree=c(pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,-pi/2,pi/2,pi/2,pi/2), #0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
edge.arrow.width=2,
edge.lty=1,
asp=0.6) # controls how rectangular the plot is. <1 = wide, >1 = tall
Now the vertices are nicely spaced in a rectangular plot!
Note - I also decreased the size of the vertices, the size of the labels and their positioning, for better readability.
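Rather than typing the x coordinates by hand, the same idea can be sketched with a small helper that spaces the vertices of each row evenly while keeping their left-to-right order. This is only an illustration: spread_rows() is a hypothetical helper, not part of igraph, and the starting matrix is copied from the layout output printed above.

```r
# Layout matrix as printed above: column 1 = x, column 2 = y
coords <- cbind(
  c(3, 0, 2, 3.5, 6, 1, 5, 7, 1, 4.5, 5.5),
  c(0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
)

# Hypothetical helper: within each row (same y), replace the x values
# by evenly spaced positions on [0, width], preserving the x order
spread_rows <- function(coords, width = 24) {
  for (y in unique(coords[, 2])) {
    in_row <- coords[, 2] == y
    slots  <- seq(0, width, length.out = sum(in_row))
    coords[in_row, 1] <- slots[rank(coords[in_row, 1], ties.method = "first")]
  }
  coords
}

coords_even <- spread_rows(coords)
```

The resulting matrix can be passed as layout=coords_even in the plot() call above, in place of the manually edited coords.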
I think you can output to PDF and then zoom in.
Or use the rgexf package to write a gexf file and then visualize it in Gephi.
I think Gephi is a good tool for network visualization.

R inference from one matrix to a data frame

I think this may be a very simple and easy question, but since I'm new to R, I hope someone can give me some outlines of how to solve it step by step. Thanks!
So the question is: I have an (n * 2) matrix (say m) where the first column holds the index of a row in another data frame (say d) and the second column holds some value (a p value).
What I want to do is: if the p value of some row r in m is less than 0.05, plot the data in d at the index given in the first column of row r of m.
..............
The data is somewhat like what I draw below:
m:
ind p_value
2 0.02
23 0.03
56 0.12
64 0.54
105 0.04
d:
gene_id s1 s2 s3 s4 ... sn
IDH1 0.23 3.01 0 0.54 ... 4.02
IDH2 0.67 0 8.02 10.54 ... 0.72
...
So IDH2 corresponds to the first row of m, whose index column is 2.
toplot <- d[ m[ m[,'p_value'] < .05,'ind'], ] works!
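To see what that one-liner does, here is a small sketch on made-up data shaped like the example in the question (the gene names beyond IDH1/IDH2 and all the numbers are invented for illustration). The inner expression builds a logical vector of rows of m with p_value below 0.05, and the matching ind values are then used as row numbers of d.

```r
# Invented example data in the same shape as m and d above
m <- cbind(ind = c(2, 4, 5, 1), p_value = c(0.02, 0.03, 0.12, 0.04))
d <- data.frame(
  gene_id = c("IDH1", "IDH2", "GENE3", "GENE4", "GENE5"),
  s1 = c(0.23, 0.67, 1.10, 0.00, 2.50),
  s2 = c(3.01, 0.00, 0.40, 5.20, 0.90),
  s3 = c(0.00, 8.02, 2.20, 1.30, 0.10)
)

keep   <- m[, "p_value"] < 0.05               # TRUE for significant rows of m
toplot <- d[m[keep, "ind"], , drop = FALSE]   # rows of d picked by index
```

Each row of toplot can then be plotted, for instance with plot(unlist(toplot[1, -1]), type = "b") for the first selected gene.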

ANOVA of subsetted data

I am manipulating a data set comprising several factors with several variables. The idea is that I want to do ANOVA analysis between factor levels nested within one level of another factor.
Here is an example similar to my data set:
treatment category trial individual response
1 A big 1 F1 0.10
2 A big 2 F1 0.20
3 A big 1 F2 0.30
4 A big 2 F2 0.11
5 A small 1 F3 0.12
6 A small 2 F3 0.13
7 A small 1 F4 0.20
8 A small 2 F4 0.30
9 B big 1 F5 0.40
10 B big 2 F5 0.21
11 B big 1 F6 0.22
12 B big 2 F6 0.23
13 B small 1 F7 0.31
14 B small 2 F7 0.32
15 B small 1 F8 0.34
16 B small 2 F8 0.25
So basically, I'd like to do an ANOVA between big and small when treatment is A, then B, then same idea with ANOVA between big and small when treatment is A and trial 1... you get the logic.
It seems I have to use:
anova(lm(Y~x,data=dataset))
and add a subset argument, but I can't work the logic out of it and I can't find any example similar to mine. Any hint for it? Thank you in advance!
From your description, you want to run separate ANOVAs on different subsets of your data.
Try this:
df1 <- df[df$treatment=="A",]
df2 <- df[df$treatment=="B",]
aov(response ~ category, data=df1)
aov(response ~ category, data=df2)
If you are interested in the effect of factor treatment, maybe you should keep it in a more complex model and use a posthoc to test differences within treatment A and B. But it's just a suggestion.
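The same subsetting can be sketched without creating one data frame per level, either by splitting on the factor or by using the subset argument of lm() that the question mentions. The data frame below re-types the example table from the question. Like the code above, this sketch ignores the repeated measurements on each individual.

```r
# Example data re-typed from the question
df <- data.frame(
  treatment  = rep(c("A", "B"), each = 8),
  category   = rep(rep(c("big", "small"), each = 4), times = 2),
  trial      = rep(1:2, times = 8),
  individual = rep(paste0("F", 1:8), each = 2),
  response   = c(0.10, 0.20, 0.30, 0.11, 0.12, 0.13, 0.20, 0.30,
                 0.40, 0.21, 0.22, 0.23, 0.31, 0.32, 0.34, 0.25)
)

# One ANOVA table per treatment level: big vs small within A, then within B
by_treatment <- lapply(
  split(df, df$treatment),
  function(sub) anova(lm(response ~ category, data = sub))
)

# A narrower subset via lm()'s subset argument: treatment A, trial 1 only
fit_A_trial1 <- anova(lm(response ~ category, data = df,
                         subset = treatment == "A" & trial == 1))
```

by_treatment$A and by_treatment$B hold the two tables, and further combinations only need a different condition in subset.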

Subtracting Values in Previous Rows: Ecological Lifetable Construction

I was hoping I could get some help. I am constructing a life table, not for insurance but for ecology (a cross-section of the population of any kind of wild fauna), so I am essentially leaving out variables like smoker/non-smoker, pregnant, gender, health status, etc.:
AgeClass=c(1,2,3,4,5,6)
Sample=c(100,99,87,46,32,19)
for(i in 1:6){
PropSurv=c(Sample/100)
}
LifeTab1=data.frame(cbind(AgeClass,Sample,PropSurv))
Which gave me this:
ID AgeClas Sample PropSurv
1 1 100 1.00
2 2 99 0.99
3 3 87 0.87
4 4 46 0.46
5 5 32 0.32
6 6 19 0.19
I'm now trying to calculate the number that died in each interval (DeathInt) by taking the number that survived in each row and subtracting the number in the row below it (i.e. 100-99, then 99-87, then 87-46, and so on). The result should look like this:
ID AgeClas Sample PropSurv DeathInt
1 1 100 1.00 1
2 2 99 0.99 12
3 3 87 0.87 41
4 4 46 0.46 14
5 5 32 0.32 13
6 6 19 0.19 NA
I found this and this, and I wasn't sure if they answered my question as these guys subtracted values based on groups. I just wanted to subtract values by row.
Also, just as a side note: I did a for() to get the proportion that survived in each age group. I was wondering if there was another way to do it or if that's the proper, easiest way to do it.
Second note: If any R-users out there know of an easier way to do a life-table for ecology, do let me know!
Thanks!
If you have a vector x, that contains numbers, you can calculate the difference by using the diff function.
In your case it would be
LifeTab1$DeathInt <- c(-diff(Sample), NA)
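A self-contained sketch of the whole table, using the numbers from the question. It also addresses the side note about the for() loop: division in R is already vectorized, so the loop is not needed. Dividing by the first sample size instead of the literal 100 is my own choice here, not something from the question.

```r
AgeClass <- 1:6
Sample   <- c(100, 99, 87, 46, 32, 19)

LifeTab1 <- data.frame(
  AgeClass = AgeClass,
  Sample   = Sample,
  PropSurv = Sample / Sample[1]   # vectorized, no loop required
)

# diff() returns x[i + 1] - x[i]; negate it to get deaths per interval,
# and pad with NA because the last age class has no following row
LifeTab1$DeathInt <- c(-diff(LifeTab1$Sample), NA)
```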

Backward selection in LME, singularity in backsolve occurred

I have data where "speed of flight" is the response variable and the predictors are group (experimental/control), test (first/second), FL (fuel load, % of lean body mass: from 0 to ~25%), and wing (wing length in mm). Since we tested the same birds twice (first and second test; the experimental group was infected), I want to fit a mixed model (adding a random term ~1|ring). I also added a weights argument for the test variable because of heteroscedasticity.
mod<-lme(speed~test* group * FL * wing,weight=~1|test,random=~1|ring,data=data,method="ML")
This is what the full model looks like (I use the nlme package). After that I start the backward selection. I do it manually (according to the lowest AIC) and then check the result with the function stepAIC (MASS package). In this case the first two steps of selection go well, but when I fit the model:
mod3<-lme(speed~test+group + FL + wing+ test:group + group:FL + FL:wing + test:group:wing, weight=~1|test,random=~1|ring,data=data,method="ML")
I got an error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
As far as I understand, it means that not all interactions of factors exist. But then I should have got the same error already with the full model. And with other response variables it works well. If any of you have an idea, I would be glad!
original data
ring group wing speed_aver FL test
1 XZ13125 E 75 0.62 16.2950000 first
2 XZ13125 E 75 0.22 12.5470149 second
3 XZ13126 E 68 0.39 7.7214876 first
4 XZ13127 C 75 0.52 9.1573643 first
5 XZ13127 C 75 0.17 -1.9017391 second
6 XZ13129 C 73 0.46 10.3821705 first
7 XZ13129 C 73 0.33 -0.5278261 second
8 XZ13140 C 73 0.48 13.0774436 first
9 XZ13140 C 73 0.27 18.0092199 second
10 XZ13144 C 73 0.36 7.5144000 first
11 XZ13144 C 73 0.36 9.6820312 second
12 XZ13146 E 73 0.32 14.3651852 first
13 XZ13146 E 73 0.28 20.8171233 second
14 XZ13159 C 74 0.55 20.2760274 first
15 XZ13159 C 74 0.37 19.1687500 second
16 XZ13209 E 72 0.35 8.1464000 first
17 XZ13209 E 72 0.43 10.9945736 second
18 XZ13213 E 74 0.57 5.3682927 first
19 XZ13213 E 74 0.26 1.3584746 second
20 XZ13220 C 73 0.30 6.0105691 first
21 XZ13220 C 73 0.36 -8.0439252 second
22 XZ13230 E 74 0.44 5.3682927 first
23 XZ13230 E 74 0.31 3.0025000 second
24 XZ13231 C 75 0.28 6.2504000 first
25 XZ13231 C 75 0.37 7.7267717 second
26 XZ13232 C 74 0.34 16.8592857 first
27 XZ13232 C 74 0.33 13.7800000 second
28 XZ13271 C 73 0.32 16.2268116 first
29 XZ13271 C 73 0.28 14.3651852 second
30 XZ13278 E 72 0.45 15.5757353 first
31 XZ13278 E 72 0.37 14.9503704 second
32 XZ13280 C 74 0.33 15.0386861 first
33 XZ13280 C 74 0.36 7.6214286 second
34 XZ13340 E 73 0.62 16.8294964 first
35 XZ13340 E 73 0.26 13.7261194 second
36 XZ13367 E 75 0.42 23.4071895 first
37 XZ13370 E 71 0.25 13.6159091 first
This is pretty tricky as it turns out. I think the problem is that due to the way you're constructing your second formula, R is not automatically removing collinear variables from the model matrix.
tl;dr this is a bit stream-of-consciousness, but I think the basic take-home points are:
- lme doesn't necessarily check/handle aliasing in a model specification for you (unlike lm, or to a lesser extent lmer)
- you can get in trouble with R's formulas if you violate marginality, which you've done here by including the test:group:wing interaction without including the group:wing and test:wing interactions. R lets you do this, but the model doesn't necessarily make sense ... I'm a little bit surprised you ended up with this model specification -- usually stepAIC, and drop1, and R's other built-in model simplification tools, try to respect marginality and thus wouldn't let you end up here ...
- if you really want to fit these kinds of models, use lmer (although dealing with heteroscedasticity is harder), or construct your own numeric dummy variables with model.matrix() ...
- checking out these kinds of aliasing problems can best be done with model.matrix(), outside the scope of the model-fitting (lm/lme/lmer) function itself ...
For simplicity I'm going to leave out the variance model (weights=varIdent(form=~1|test)) as it doesn't seem to be relevant to this specific problem (I didn't know that a priori, but tests with and without it didn't differ).
library("nlme")
form1 <- speed_aver~test* group * FL * wing
form2 <- speed_aver~test+group + FL + wing+
test:group + group:FL + FL:wing +
test:group:wing
mod <- lme(form1,random=~1|ring,data=dd,method="ML") ## OK
update(mod,form2)
## fails with "Singularity in backsolve" error
What if we try it with lme4?
## ugh, I wish I knew a better way to append to a formula
form1L <- formula(paste(deparse(form1),"(1|ring)",sep="+"))
form2L <- formula(paste(deparse(form2,width=100),"(1|ring)",sep="+"))
library("lme4")
mod2 <- lmer(form1L, data=dd)
mod3 <- lmer(form2L, data=dd)
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
Aha! lmer automatically detects that the model matrix is rank-deficient. lm does this automatically and substitutes NA values for the aliased terms. At present lmer just drops them, although with reasonably recent versions of lme4 the (documented but unadvertised) option add.dropped=TRUE to fixef() will put the NA values back in the appropriate places.
So let's investigate the model matrices:
X0 <- model.matrix(form1,data=dd)
c(rankMatrix(X0)==ncol(X0)) ## TRUE; both are 16
X <- model.matrix(form2,data=dd)
c(rankMatrix(X))==ncol(X) ## FALSE; 11<12
Try to identify aliased columns: 12th element of svd(X)$d is tiny (1e-15)
ss <- svd(X)
(zz <- zapsmall(ss$v[,12])) ## elements of collinear grouping
## [1] 0.0000000 0.0000000 0.0000000 0.0000000 -0.4472136 0.0000000
## [7] 0.0000000 0.0000000 0.4472136 0.4472136 0.4472136 0.4472136
So the sum of columns 9-12 is exactly the same as column 5 (same values, opposite signs). What's going on here?
colnames(X)[zz!=0]
## [1] "wing" "testfirst:groupC:wing" "testsecond:groupC:wing"
## [4] "testfirst:groupE:wing" "testsecond:groupE:wing"
It looks like we somehow got all of the levels of the test-by-group interaction interacting with wing, along with the wing variable itself ...
mm <- X[,zz!=0]
colnames(mm) <- gsub("(test|group|:wing)","",colnames(mm))
head(mm)
## wing first:C second:C first:E second:E
## 1 75 0 0 75 0
## 2 75 0 0 0 75
## 3 68 0 0 68 0
## 4 75 75 0 0 0
## 5 75 0 75 0 0
## 6 73 73 0 0 0
I'm still not 100% sure why this happens, but you can see that R expands the three-way interaction to include all four levels of the two-way interaction (which in turn interact with the continuous wing variable), but it has also got wing itself:
colnames(X)
## [1] "(Intercept)" "testsecond" "groupE"
## [4] "FL" "wing" "testsecond:groupE"
## [7] "groupE:FL" "FL:wing" "testfirst:groupC:wing"
## [10] "testsecond:groupC:wing" "testfirst:groupE:wing"
## [12] "testsecond:groupE:wing"
colnames(X0)
## [1] "(Intercept)" "testsecond"
## [3] "groupE" "FL"
## [5] "wing" "testsecond:groupE"
## [7] "testsecond:FL" "groupE:FL"
## [9] "testsecond:wing" "groupE:wing"
## [11] "FL:wing" "testsecond:groupE:FL"
## [13] "testsecond:groupE:wing" "testsecond:FL:wing"
## [15] "groupE:FL:wing" "testsecond:groupE:FL:wing"
If we define a model that respects marginality, then we're OK again ...
form3 <- speed_aver~test*group*wing+FL*(group+wing)
X1 <- model.matrix(form3,dd)
c(rankMatrix(X1)== ncol(X1)) ## TRUE
And we can replicate the problem more simply this way:
form4 <- speed_aver~wing+test:group:wing
X2 <- model.matrix(form4,dd)
c(rankMatrix(X2)== ncol(X2)) ## FALSE
This model has the three-way interaction (explicitly), but is missing the two-way interactions. If we used ~wing*test*group, or even ~wing+wing*test*group, we would be OK ...
form5 <- speed_aver~wing+test*group*wing
X3 <- model.matrix(form5,dd)
c(rankMatrix(X3)== ncol(X3)) ## TRUE
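For readers without the original data, the marginality problem can be reproduced on simulated data with base R only. This is a sketch under stated assumptions: the data frame dd_toy is invented, and qr()$rank is used here in place of rankMatrix() from the Matrix package so that no extra packages are needed.

```r
set.seed(1)

# Invented balanced design: 2 tests x 2 groups x 5 birds, random wing lengths
dd_toy <- expand.grid(
  test  = factor(c("first", "second")),
  group = factor(c("C", "E")),
  bird  = 1:5
)
dd_toy$wing <- rnorm(nrow(dd_toy), mean = 73, sd = 2)

# Violates marginality: three-way term without test:wing and group:wing
X_bad <- model.matrix(~ wing + test:group:wing, data = dd_toy)

# Respects marginality: all lower-order terms are present
X_ok <- model.matrix(~ wing + test * group * wing, data = dd_toy)

rank_bad <- qr(X_bad)$rank   # compare with ncol(X_bad)
rank_ok  <- qr(X_ok)$rank    # compare with ncol(X_ok)
```

Comparing each rank with the corresponding ncol() shows the same pattern as above: the four test-by-group wing columns in X_bad add up to the wing column, so that matrix loses one rank, while X_ok keeps full column rank.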
