Refer to variable by part of the variable name - r

In R, it seems I can refer to a variable using only part of its name, but I am confused about why this works.
Take the following code as an example:
library(car)
scatterplot(housing ~ total)
house.lm <- lm(housing ~ total)
summary(house.lm)
str(summary(house.lm))
summary(house.lm)$coefficients[2,2]
summary(house.lm)$coe[2,2]
When I print the structure of summary(house.lm), I get the following output:
> str(summary(house.lm))
List of 11
$ call : language lm(formula = housing ~ total)
$ terms :Classes 'terms', 'formula' language housing ~ total
.. ..- attr(*, "variables")= language list(housing, total)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "housing" "total"
.. .. .. ..$ : chr "total"
.. ..- attr(*, "term.labels")= chr "total"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(housing, total)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "housing" "total"
$ residuals : Named num [1:162] -8.96 -11.43 3.08 8.45 2.2 ...
..- attr(*, "names")= chr [1:162] "1" "2" "3" "4" ...
$ coefficients : num [1:2, 1:4] 28.4523 0.0488 10.2117 0.0103 2.7862 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
$ aliased : Named logi [1:2] FALSE FALSE
..- attr(*, "names")= chr [1:2] "(Intercept)" "total"
$ sigma : num 53.8
$ df : int [1:3] 2 160 2
$ r.squared : num 0.123
$ adj.r.squared: num 0.118
$ fstatistic : Named num [1:3] 22.5 1 160
..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
$ cov.unscaled : num [1:2, 1:2] 3.61e-02 -3.31e-05 -3.31e-05 3.67e-08
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:2] "(Intercept)" "total"
- attr(*, "class")= chr "summary.lm"
However, it seems that I can refer to the variable coefficients with all of the following commands:
summary(house.lm)$coe[2,2]
summary(house.lm)$coef[2,2]
summary(house.lm)$coeff[2,2]
summary(house.lm)$coeffi[2,2]
summary(house.lm)$coeffic[2,2]
summary(house.lm)$coeffici[2,2]
summary(house.lm)$coefficie[2,2]
summary(house.lm)$coefficien[2,2]
summary(house.lm)$coefficient[2,2]
summary(house.lm)$coefficients[2,2]
They all give the same result: 0.01029709
So my question is: when can I refer to a variable by only part of its name in R?

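The $ operator does partial matching on the names of lists and data frames, so you can do this whenever the part you type matches exactly one name. For example: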
df <- data.frame(abcd = c(1,2,3), xyz = c(4,5,6), abc = c(5,6,7))
> df$xy
[1] 4 5 6
> df$ab
NULL
> df$x
[1] 4 5 6
df$xy and even df$x return the right data, but df$ab returns NULL because it is ambiguous: it could refer to either df$abc or df$abcd. It is similar to typing df$xy in RStudio and pressing Ctrl + Space: the right variable name is completed for you, so a unique prefix is enough to identify the variable.
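The difference between $ and [[ can be checked directly. A minimal sketch, assuming the same names as above stored in a plain list (the name lst is made up for illustration): $ matches partially, while [[ matches exactly unless exact = FALSE is passed.

```r
# Same names as the data frame above, stored as a plain list
lst <- list(abcd = c(1, 2, 3), xyz = c(4, 5, 6), abc = c(5, 6, 7))

partial_dollar  <- lst$xy                      # `$` partially matches "xyz"
exact_brackets  <- lst[["xy"]]                 # `[[` is exact by default, so NULL
partial_bracket <- lst[["xy", exact = FALSE]]  # opt in to partial matching
ambiguous       <- lst$ab                      # "abc" and "abcd" both match, so NULL

# Ask R to warn whenever `$` falls back on a partial match
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({ lst$xy; FALSE }, warning = function(w) TRUE)
options(old)                                   # restore the previous setting
```

Using [[ with the full name is the safer choice in scripts, because a later change to the list (for example a new element with a similar name) cannot silently change which element is returned.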

http://adv-r.had.co.nz/Functions.html#lexical-scoping
When calling a function you can specify arguments by position, by
complete name, or by partial name. Arguments are matched first by
exact name (perfect matching), then by prefix matching, and finally by
position.
For quick interactive analysis, using partial names is not a problem, but I tend to agree that it is bad practice in code you intend to keep. In a package you can't get away with it: R CMD check will find every occurrence.
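The matching order quoted above can be seen with a small function. This is only a sketch: describe() and its arguments are hypothetical names made up for the illustration. The warnPartialMatchArgs option is a base R setting that makes partial argument matches visible.

```r
# Hypothetical function used only to illustrate argument matching
describe <- function(value, verbose = FALSE) {
  if (verbose) paste("value is", value) else value
}

by_exact   <- describe(value = 10)             # matched by complete name
by_partial <- describe(val = 10, verb = TRUE)  # prefixes match `value` and `verbose`
by_pos     <- describe(10)                     # matched by position

# Turn the silent partial match into a warning and capture its message
old <- options(warnPartialMatchArgs = TRUE)
msg <- tryCatch(describe(val = 10), warning = function(w) conditionMessage(w))
options(old)                                   # restore the previous setting
```

Setting this option in a development session is one way to spot partial matches before a package check does.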

Related

R package RSiena, goodness of fit for multiple networks at the same time

I'm relatively new to R and I'm running network and behavior coevolution models using the R Package RSiena.
My data set consists of around 100 networks and for each of these networks, I run one RSiena model.
ans.1 <- siena07(myalgorithm, data=mydata.1, effects=myeff.1, batch=TRUE)
...
ans.100 <- siena07(myalgorithm, data=mydata.100, effects=myeff.100, batch=TRUE)
Now I want to test the goodness of fit for each of the multiple network models. I actually know how to check the goodness of fit for a single model.
gof <- sienaGOF(ans.1, verbose=TRUE, varName="Friend", IndegreeDistribution)
plot(gof)
But I don't know how to combine the GOF results of all 100 models to get an overall impression. How can I get a table with the model number and the p-values? Can I plot the results for all models in one plot? Or is there a better way?
So far I tried to put the GOF results in a list:
goftest <-list()
goftest[[1]] <- sienaGOF(ans.1, verbose=TRUE, varName="Friend", IndegreeDistribution)
...
goftest[[100]] <- sienaGOF(ans.100, verbose=TRUE, varName="Friend", IndegreeDistribution)
plot(goftest)
goftest[[1]] #Output:
"Siena Goodness of Fit ( IndegreeDistribution ), all periods
=====
Monte Carlo Mahalanobis distance test p-value: 0.941
-----
One tailed test used (i.e. estimated probability of greater distance than observation).
-----
Calculated joint MHD = ( 14.4 ) for current model."
str(goftest[[1]])#Output:
"List of 1
$ Joint:List of 8
..$ p : num 0.941
..$ SimulatedTestStat: Named num [1:2000] 9.97 16.02 6.83 10.14 8.65 ...
.. ..- attr(*, "names")= chr [1:2000] "1" "2" "3" "4" ...
..$ ObservedTestStat : num 2.09
..$ TwoTailed : logi FALSE
..$ Simulations : int [1:2000, 1:9] 21 22 22 21 19 26 30 23 25 26 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2000] "1" "2" "3" "4" ...
.. .. ..$ : NULL
..$ Observations : int [1, 1:9] 26 48 63 73 76 78 78 78 78
..$ InvCovSimStats : num [1:9, 1:9] 13.2509 4.9587 1.2948 0.231 0.0895 ...
..$ Rank : int 9
.. ..- attr(*, "method")= chr "tolNorm2"
.. ..- attr(*, "useGrad")= logi FALSE
.. ..- attr(*, "tol")= num 2e-15
..- attr(*, "class")= chr "sienaGofTest"
..- attr(*, "sienaFitName")= chr "sienaFitObject"
..- attr(*, "auxiliaryStatisticName")= chr "IndegreeDistribution"
..- attr(*, "key")= chr [1:9] "0" "1" "2" "3" ...
- attr(*, "class")= chr "sienaGOF"
- attr(*, "scoreTest")= logi FALSE
- attr(*, "originalMahalanobisDistances")= num [1:3] 2.15 3.51 8.74
- attr(*, "oneStepMahalanobisDistances")=List of 3
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
- attr(*, "joinedOneStepMahalanobisDistances")= Named num(0)
..- attr(*, "names")= chr(0)
- attr(*, "oneStepMahalanobisDistances_old")=List of 3
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
- attr(*, "joinedOneStepMahalanobisDistances_old")= Named num(0)
..- attr(*, "names")= chr(0)
- attr(*, "oneStepSpecs")= num[1:20, 0 ]
- attr(*, "auxiliaryStatisticName")= chr "IndegreeDistribution"
- attr(*, "simTime")= 'proc_time' Named num [1:5] 39.61 0.28 40.21 NA NA
..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
- attr(*, "twoTailed")= logi FALSE
- attr(*, "joined")= logi TRUE"
But I don't know how to extract the p-values into a table that contains just the network number and the associated p-value.
Furthermore, the plot command just produces error messages and no output so far.
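Since the str() output above shows the p-value stored at $Joint$p, one possible way to build such a table is to loop over the list. A minimal sketch under that assumption, using mock objects in place of real sienaGOF results (make_mock_gof() is a made-up helper, and the values 0.120 and 0.535 are invented); it has not been run against RSiena itself:

```r
# Mock objects that mimic only the $Joint$p part of the structure shown above
make_mock_gof <- function(p) list(Joint = list(p = p))
goftest <- list(make_mock_gof(0.941), make_mock_gof(0.120), make_mock_gof(0.535))

# One row per network: its position in the list and the joint p-value
pvals <- data.frame(
  network = seq_along(goftest),
  p.value = vapply(goftest, function(g) g$Joint$p, numeric(1))
)
print(pvals)
```

As for the plot error, plot() has no method for a plain list, so plot(goftest) fails; calling plot() on each element in turn, for example with a for loop over goftest, is likely what is needed.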

replace values in a nested list with values from another matrix

I have a list of 28 elements like the following which are combinations of 8 elements two by two (SG1, SG2, ... SG8):
str(combinations)
List of 28
$ : chr [1:2] "SG1" "SG2"
$ : chr [1:2] "SG1" "SG3"
$ : chr [1:2] "SG1" "SG4"
...
I have stored the results of a t.test in an object:
results <- lapply(seq_along(combinations), function (n) {
mydatatemp <- mydata[with(mydata, Subgroup %in% unlist(combinations2[n]) & Group %in% c("G1", "G3")),]
result <- t.test(mydatatemp[
mydatatemp$Subgroup == sapply(combinations[n], "[",1),4],
mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], "[", 2),4],
alternative="two.sided", var.equal=TRUE)
return(result)})
The output of str(results) looks like this, with 28 list elements one after another, each like this one:
List of 28
$ :List of 9
..$ statistic : Named num -6.9
.. ..- attr(*, "names")= chr "t"
..$ parameter : Named num 38
.. ..- attr(*, "names")= chr "df"
..$ p.value : num 3.33e-08
..$ conf.int : atomic [1:2] -0.301 -0.164
.. ..- attr(*, "conf.level")= num 0.95
..$ estimate : Named num [1:2] 0.196 0.429
.. ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
..$ null.value : Named num 0
.. ..- attr(*, "names")= chr "difference in means"
..$ alternative: chr "two.sided"
..$ method : chr " Two Sample t-test"
..$ data.name : chr [1:2] "mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], \"[\", 1), and mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], \"[\""| __truncated__ " 4] and 4]"
..- attr(*, "class")= chr "htest"
How can I replace the chr [1:2] value of every $data.name element with the output of the following paste code?
paste(matrix(unlist(combinations), ncol = 2, byrow = TRUE)[,1], matrix(unlist(combinations), ncol = 2, byrow = TRUE)[,2], sep = " vs. ")
The reason I am doing this is to replace the ugly-looking strings in the $data.name element.
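One possible approach is to build the labels first and then assign them element by element with Map(). A minimal sketch, assuming a smaller stand-in for combinations (three subgroups instead of eight) and t-tests on random numbers in place of the real data:

```r
# Stand-in data: all pairs of three subgroup labels, and one t-test per pair
combinations <- combn(c("SG1", "SG2", "SG3"), 2, simplify = FALSE)
set.seed(1)
results <- lapply(combinations, function(pair) {
  t.test(rnorm(10), rnorm(10), alternative = "two.sided", var.equal = TRUE)
})

# One label per pair, e.g. "SG1 vs. SG2"
labels <- vapply(combinations, paste, character(1), collapse = " vs. ")

# Overwrite data.name in each htest object with the matching label
results <- Map(function(res, lab) {
  res$data.name <- lab
  res
}, results, labels)
```

Because each element keeps its htest class, printing results[[1]] afterwards shows the short label in the "data:" line.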

R-package(baseline) application to sample dataset

I am trying to use the R baseline package on a sample dataset, to test and evaluate the baseline algorithm that I currently have.
I want to apply the fillPeaks algorithm as a trend line for comparison.
bc.fillPeaks <- baseline(milk$spectra[1, , drop=FALSE], lambda=6,
hwi=50, it=10, int=2000, method="fillPeaks")
plot(bc.fillPeaks)
My problem is that my sample data does not fit the matrix structure used in the example. When I look at the data.frame used for the example, I don't understand it:
'data.frame': 45 obs. of 2 variables
$ cow : num 0 0.25 0.375 0.875 0.5 0.75 0.5 0.125 0 0.125 ...
$ spectra: num [1:45, 1:21451] 1029 371 606 368 554 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "4999.94078628963" "5001.55954267662" "5003.17856106153" "5004.79784144435" ...
- attr(*, "terms")=Classes 'terms', 'formula' length 3 cow ~ spectra
.. ..- attr(*, "variables")= language list(cow, spectra)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "cow" "spectra"
.. .. .. ..$ : chr "spectra"
.. ..- attr(*, "term.labels")= chr "spectra"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(cow, spectra)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "nmatrix.21451"
.. .. ..- attr(*, "names")= chr [1:2] "cow" "spectra"
My question is therefore whether any of you have experience with the baseline package and its example dataset (milk), and ideas on how I can convert my dataset, which is structured as Date, Visits, Old_baseline_visits, so that I can fit and test the baseline algorithm from the package.
I have used baseline, and found it slightly confusing at first, particularly the example data. As it says in the help file, baseline expects a matrix with the spectra in rows. Even if you only have one "spectrum", it needs to be in the form of a single row matrix. Try this:
foo <- data.frame(Date=seq.Date(as.Date("1957-01-01"), by = "day",
length.out = ncol(milk$spectra)),
Visits=milk$spectra[1,],
Old_baseline_visits=milk$spectra[1,], row.names = NULL)
foo.t <- t(foo$Visits) # Visits in a single row matrix
bc.fillPeaks <- baseline(foo.t, lambda=6,
hwi=50, it=10, int=2000, method='fillPeaks')
plot(bc.fillPeaks)
If you want the baseline and corrected spectra back in your original data frame, try this:
foo$New_baseline <- c(getBaseline(bc.fillPeaks))
foo$New_corrected <- c(getCorrected(bc.fillPeaks))
plot(foo$Date, foo$New_corrected, "l")
Alternatively, if you don't need the baseline object, you can use baseline.fillPeaks(), which returns a list.
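The single-row matrix requirement can be checked without the baseline package. A small sketch with made-up visit counts, showing that t() on a plain vector produces a 1 x n matrix and that c() flattens it back:

```r
visits <- c(120, 135, 128, 160, 142)       # made-up numbers for illustration

as_row_t      <- t(visits)                 # transposing a vector gives one row
as_row_matrix <- matrix(visits, nrow = 1)  # equivalent explicit construction

row_dims <- dim(as_row_t)                  # number of rows, number of columns
back     <- c(as_row_t)                    # drop the dim attribute again
```

This is the same round trip used above: t(foo$Visits) going in, and c(getBaseline(...)) coming back out.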

Error using stargazer and coxph - Survival Data

I get the following error when trying to use stargazer::stargazer with a coxph model (survival package):
> summary(firm.survcox)
> stargazer(firm.survcox)
> firm.survcox #only partial results are included for brevity
Call: coxph(formula = Surv(Duration, Event, type = "right") ~ Innovation + Avg_Wage)
coef exp(coef) se(coef) z p
Innovation -0.87680 0.4161 0.008040 -109.051 0.0e+00
Prior_Experience:Age -0.01297 0.9871 0.004174 -3.107 1.9e-03
Likelihood ratio test=131885 on 23 df, p=0 n= 535416,
number of events= 203037 (1060020 observations deleted due to missingness)
Error in .get.standard.errors.1(object.name, user.given) :
subscript out of bounds
Variables in the model are firm-, industry-, and region-level, and there is one interaction term. I tried running the model on only one variable (e.g. Innovation), and I get the same error message. Rownames are NULL.
Updated Question and Response:
Sorry about the confusion. I did not realize I could edit the original question. Below is the model run on one variable (Innovation), which produces the same error.
Call: coxph(formula = Surv(Duration, Event, type = "right") ~ Innovation)
coef exp(coef) se(coef) z p
Innovation -0.87680 0.4161 0.008040 -109.051 0.0e+00
Likelihood ratio test=131885 on 23 df, p=0 n= 535416,
number of events= 203037 (1060020 observations deleted due to missingness)
stargazer(firm.survcox)
Error in .get.standard.errors.1(object.name, user.given) :
subscript out of bounds
Structure of the Cox Model
str(firm.survcox)
List of 18
$ coefficients : Named num -0.772
..- attr(*, "names")= chr "Innovation"
$ var : num [1, 1] 3.91e-05
$ loglik : num [1:2] -5583174 -5573640
$ score : num 16015
$ iter : int 4
$ linear.predictors: num [1:1595436] 0.0807 0.0807 0.0807 -0.6915 0.0807 ...
$ residuals : Named num [1:1595436] 0.925 0.976 -0.516 0.888 0.976 ...
..- attr(*, "names")= chr [1:1595436] "1" "2" "3" "4" ...
$ means : Named num 0.104
..- attr(*, "names")= chr "Innovation"
$ concordance : Named num [1:5] 5.20e+10 1.97e+10 3.67e+11 1.14e+10 2.49e+08
..- attr(*, "names")= chr [1:5] "concordant" "discordant" "tied.risk" "tied.time"..
$ method : chr "efron"
$ n : int 1595436
$ nevent : num 404033
$ terms :Classes 'terms', 'formula' length 3 Surv(Duration, Event, type = "right") ~ Innovation
.. ..- attr(*, "variables")= language list(Surv(Duration, Event, type = "right"), Innovation)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "Surv(Duration, Event, type = \"right\")" "Innovation"
.. .. .. ..$ : chr "Innovation"
.. ..- attr(*, "term.labels")= chr "Innovation"
.. ..- attr(*, "specials")=Dotted pair list of 3
.. .. ..$ strata : NULL
.. .. ..$ cluster: NULL
.. .. ..$ tt : NULL
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(Surv(Duration, Event, type = "right"), Innovation)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "nmatrix.2" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "Surv(Duration, Event, type = \"right\")" "Innovation"
$ assign :List of 1
..$ Innovation: num 1
$ wald.test : Named num 15249
..- attr(*, "names")= chr "Innovation"
$ y : Surv [1:1595436, 1:2] 2 1 10+ 5 1 10+ 10+ 6 8+ 8+ ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:1595436] "1" "2" "3" "4"
.. ..$ : chr [1:2] "time" "status"
..- attr(*, "type")= chr "right"
$ formula :Class 'formula' length 3 Surv(Duration, Event, type = "right") ~ Innovation
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
$ call : language coxph(formula = Surv(Duration, Event, type = "right") ~ Innovation)
- attr(*, "class")= chr "coxph"
The material posted in your comment (which I added to the question, applying the code-block formatting as you should have done) doesn't really make sense. The coefficients listed, "Innovation" and "Prior_Experience:Age", do not match the formula in the call, which is Surv(Duration, Event, type = "right") ~ Innovation + Avg_Wage. There seems to have been some mangling of the object along the way.
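A mismatch of this kind can be checked programmatically by comparing the coefficient names with the term labels stored in the model. A rough sketch, using lm() on the built-in mtcars data as a stand-in because the original data is not available; coef() and terms() also work on coxph objects, but note that factor variables and interactions can legitimately produce coefficient names that differ from the term labels:

```r
# Stand-in model; the same two accessors apply to a coxph fit
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef_names  <- names(coef(fit))                 # names of estimated coefficients
term_labels <- attr(terms(fit), "term.labels")  # terms on the formula's right-hand side

# Coefficients that correspond to no term in the formula (ignoring the intercept)
unexpected <- setdiff(coef_names, c("(Intercept)", term_labels))
```

For an intact model with only numeric predictors, unexpected is empty; a non-empty result suggests the printed output and the stored call come from different fits.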

extracting values from an aovp object in lmPerm

I have the following object M, from which I need to extract the fstatistic. It is the result of calling summaryC on a model fitted with aovp; both functions are from the package lmPerm. I have tried the usual hints for extracting values from ordinary linear models, as well as the functions attr, extract and getElement, but without success.
Could anybody give me a hint?
> str(M)
List of 2
$ Error: vegetation: NULL
$ Error: Within :List of 11
..$ NA : NULL
..$ terms :Classes 'terms', 'formula' length 3 Temp ~ depth
.. .. ..- attr(*, "variables")= language list(Temp, depth)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "Temp" "depth"
.. .. .. .. ..$ : chr "depth"
.. .. ..- attr(*, "term.labels")= chr "depth"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ residuals : Named num [1:498] -46.9 -43.9 -46.9 -38.9 -41.9 ...
.. ..- attr(*, "names")= chr [1:498] "3" "4" "5" "6" ...
..$ coefficients : num [1:4, 1:4] -2.00 -1.00 -1.35e-14 1.00 2.59 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
..$ aliased : Named logi [1:4] FALSE FALSE FALSE FALSE
.. ..- attr(*, "names")= chr [1:4] "depth1" "depth2" "depth3" "depth4"
..$ sigma : num 29
..$ df : int [1:3] 4 494 4
..$ r.squared : num 0.00239
..$ adj.r.squared: num -0.00367
..$ fstatistic : Named num [1:3] 0.395 3 494
.. ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
..$ cov.unscaled : num [1:4, 1:4] 0.008 -0.002 -0.002 -0.002 -0.002 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
..- attr(*, "class")= chr "summary.lmp"
- attr(*, "class")= chr "listof"
Here is a reproducible example to play with:
Temp=1:100
depth<- rep( c("1","2","3","4","5"), 100)
vegetation=rep( c("1","2"), 50)
df=data.frame(Temp,depth,vegetation)
M=summaryC(aovp(Temp~depth+Error(vegetation),df, perm=""))
As the str output from your example shows, M is a list of two elements, and the second one contains what you want. Hence list extraction via [[ does the trick:
> M[[2]][["fstatistic"]]
value numdf dendf
0.3946 3.0000 494.0000
If this is not what you want, please comment.
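The same extraction also works by name, which can be easier to read than a position. A minimal sketch on a mock list that copies only the element names and the fstatistic values shown above (it does not use lmPerm):

```r
# Mock of the two-element structure reported by str(M)
M <- list(
  "Error: vegetation" = NULL,
  "Error: Within" = list(
    fstatistic = c(value = 0.3946, numdf = 3, dendf = 494)
  )
)

by_position <- M[[2]][["fstatistic"]]                # as in the answer above
by_name     <- M[["Error: Within"]][["fstatistic"]]  # name contains a space, so quote it

f_value <- by_name[["value"]]                        # just the F value, without its name
```

Names with spaces or colons cannot be used bare after $, which is why [[ with a quoted string (or backticks after $) is needed here.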
