I have data that looks on the surface as follows:
The data is structured as follows:
str(minstest)
List of 948
$ : Named int [1:5] 4 6 11 16 75
..- attr(*, "names")= chr [1:5] "Mass " "Bite " "Burn " "Cyst " ...
$ : Named int 37
..- attr(*, "names")= chr "Impaired skin integrity "
$ : Named int 2
..- attr(*, "names")= chr "Abrasion "
$ : Named int 33
..- attr(*, "names")= chr "Infection of nail "
$ : Named int 34
..- attr(*, "names")= chr "Lesion "
What I want to do is replace the numbers in the V1 vector with the names. Further, I would like to separate each name into distinct columns.
I tried multiple permutations of a recommendation that I found at:
How to print names of values in R
Unfortunately, no option I have tried thus far has worked.
Related
I'm relatively new to R and I'm running network and behavior coevolution models using the R Package RSiena.
My data set consists of around 100 networks and for each of these networks, I run one RSiena model.
ans.1 <- siena07(myalgorithm, data=mydata.1, effects=myeff.1, batch=TRUE)
...
ans.100 <- siena07(myalgorithm, data=mydata.100, effects=myeff.100, batch=TRUE)
Now I want to test the goodness of fit for each of the multiple network models. I actually know how to check the goodness of fit for a single model.
gof <- sienaGOF(ans.1, verbose=TRUE, varName="Friend", IndegreeDistribution)
plot(gof)
But I don't know how to combine the GOF results of all 100 models to get an overall impression. How can I get a table with the model number and the p-values. Or can I plot the results for all models within one plot? Or is there a better way?
So far I tried to put the GOF results in a list:
goftest <-list()
goftest[[1]] <- sienaGOF(ans.1, verbose=TRUE, varName="Friend", IndegreeDistribution)
...
goftest[[100]] <- sienaGOF(ans.100, verbose=TRUE, varName="Friend", IndegreeDistribution)
plot(goftest)
goftest[[1]] #Output:
"Siena Goodness of Fit ( IndegreeDistribution ), all periods
=====
Monte Carlo Mahalanobis distance test p-value: 0.941
-----
One tailed test used (i.e. estimated probability of greater distance than observation).
-----
Calculated joint MHD = ( 14.4 ) for current model."
str(goftest[[1]])#Output:
"List of 1
$ Joint:List of 8
..$ p : num 0.941
..$ SimulatedTestStat: Named num [1:2000] 9.97 16.02 6.83 10.14 8.65 ...
.. ..- attr(*, "names")= chr [1:2000] "1" "2" "3" "4" ...
..$ ObservedTestStat : num 2.09
..$ TwoTailed : logi FALSE
..$ Simulations : int [1:2000, 1:9] 21 22 22 21 19 26 30 23 25 26 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2000] "1" "2" "3" "4" ...
.. .. ..$ : NULL
..$ Observations : int [1, 1:9] 26 48 63 73 76 78 78 78 78
..$ InvCovSimStats : num [1:9, 1:9] 13.2509 4.9587 1.2948 0.231 0.0895 ...
..$ Rank : int 9
.. ..- attr(*, "method")= chr "tolNorm2"
.. ..- attr(*, "useGrad")= logi FALSE
.. ..- attr(*, "tol")= num 2e-15
..- attr(*, "class")= chr "sienaGofTest"
..- attr(*, "sienaFitName")= chr "sienaFitObject"
..- attr(*, "auxiliaryStatisticName")= chr "IndegreeDistribution"
..- attr(*, "key")= chr [1:9] "0" "1" "2" "3" ...
- attr(*, "class")= chr "sienaGOF"
- attr(*, "scoreTest")= logi FALSE
- attr(*, "originalMahalanobisDistances")= num [1:3] 2.15 3.51 8.74
- attr(*, "oneStepMahalanobisDistances")=List of 3
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
- attr(*, "joinedOneStepMahalanobisDistances")= Named num(0)
..- attr(*, "names")= chr(0)
- attr(*, "oneStepMahalanobisDistances_old")=List of 3
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
..$ : Named num(0)
.. ..- attr(*, "names")= chr(0)
- attr(*, "joinedOneStepMahalanobisDistances_old")= Named num(0)
..- attr(*, "names")= chr(0)
- attr(*, "oneStepSpecs")= num[1:20, 0 ]
- attr(*, "auxiliaryStatisticName")= chr "IndegreeDistribution"
- attr(*, "simTime")= 'proc_time' Named num [1:5] 39.61 0.28 40.21 NA NA
..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
- attr(*, "twoTailed")= logi FALSE
- attr(*, "joined")= logi TRUE"
But I don't know how to extract the p-Values and get a table, which just contains the network number and the associated p-value.
Furthermore, the plot command just produces error messages and no output so far.
It seems that, in R, I can refer to a variable with part of a variable name. But I am confused about why I can do that.
Use the following code as an example:
library(car)
scatterplot(housing ~ total)
house.lm <- lm(housing ~ total)
summary(house.lm)
str(summary(house.lm))
summary(house.lm)$coefficients[2,2]
summary(house.lm)$coe[2,2]
When I print the structure of summary(house.lm), I got the following output:
> str(summary(house.lm))
List of 11
$ call : language lm(formula = housing ~ total)
$ terms :Classes 'terms', 'formula' language housing ~ total
.. ..- attr(*, "variables")= language list(housing, total)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "housing" "total"
.. .. .. ..$ : chr "total"
.. ..- attr(*, "term.labels")= chr "total"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(housing, total)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "housing" "total"
$ residuals : Named num [1:162] -8.96 -11.43 3.08 8.45 2.2 ...
..- attr(*, "names")= chr [1:162] "1" "2" "3" "4" ...
$ coefficients : num [1:2, 1:4] 28.4523 0.0488 10.2117 0.0103 2.7862 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
$ aliased : Named logi [1:2] FALSE FALSE
..- attr(*, "names")= chr [1:2] "(Intercept)" "total"
$ sigma : num 53.8
$ df : int [1:3] 2 160 2
$ r.squared : num 0.123
$ adj.r.squared: num 0.118
$ fstatistic : Named num [1:3] 22.5 1 160
..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
$ cov.unscaled : num [1:2, 1:2] 3.61e-02 -3.31e-05 -3.31e-05 3.67e-08
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:2] "(Intercept)" "total"
- attr(*, "class")= chr "summary.lm"
However, it seems that I can refer to the variable coefficients with all of the following commands:
summary(house.lm)$coe[2,2]
summary(house.lm)$coef[2,2]
summary(house.lm)$coeff[2,2]
summary(house.lm)$coeffi[2,2]
summary(house.lm)$coeffic[2,2]
summary(house.lm)$coeffici[2,2]
summary(house.lm)$coefficie[2,2]
summary(house.lm)$coefficien[2,2]
summary(house.lm)$coefficient[2,2]
summary(house.lm)$coefficients[2,2]
They all give the same results: 0.01029709
Therefore, I was wondering when I can refer to a variable with only part of its name in R?
You can do it when rest of name is unambiguous. For example
df <- data.frame(abcd = c(1,2,3), xyz = c(4,5,6), abc = c(5,6,7))
> df$xy
[1] 4 5 6
> df$ab
NULL
> df$x
[1] 4 5 6
df$xy and even df$x gives right data, but df$ab results in NULL because it can refer to both df$abc and df$abcd. It's like when you type df$xy in RStudio and press Ctrl + Space you will get rigtht variable name, so you could refer to part of variable name.
http://adv-r.had.co.nz/Functions.html#lexical-scoping
When calling a function you can specify arguments by position, by
complete name, or by partial name. Arguments are matched first by
exact name (perfect matching), then by prefix matching, and finally by
position.
When you are doing quick coding to analyse some data, using partial names is not a problem, but I tend to agree, it's not good when writing code. In a package you can't do that, R-CMD check will find every occurence.
I'm using the haven package for R to read an spss file with user_na=TRUE. The file has many string variables with value labels. In R only the first of the string variables (SizeofH1) has the correct value labels assigned to it as attribute.
Unfortunately I cannot not even provide a snippet of this data to make this fully reproducible but here is a screenshot of what I can see in PSPP
and what str() in R returns...
$ SizeofH1:Class 'labelled' atomic [1:280109] 3 3 3 3 ...
..- attr(*, "label")= chr "Size of Household ab 2002"
..- attr(*, "format.spss")= chr "A30"
..- attr(*, "labels")= Named chr [1:9] "1" "2" "3" "4" ...
..- attr(*, "names")= chr [1:9] "4 Persons" "2 Persons" "1 Person 50 years plus" "3 Persons" ...
$ PROMOTIO: atomic 40 1 40 40 ...
..- attr(*, "label")= chr "PROMOTION"
..- attr(*, "format.spss")= chr "A30"
$ inFMCGfr: atomic 1 1 1 1 ...
..- attr(*, "label")= chr "in FMCG from2011"
..- attr(*, "format.spss")= chr "A30"
$ TRADESEG: atomic 1 1 1 1 ...
..- attr(*, "label")= chr "TRADE SEGMENT"
..- attr(*, "format.spss")= chr "A30"
$ ORGANISA: atomic 111 111 111 111 ...
..- attr(*, "label")= chr "ORGANISATION"
..- attr(*, "format.spss")= chr "A30"
$ NAME : atomic 9 9 9 9 ...
..- attr(*, "label")= chr "NAME"
..- attr(*, "format.spss")= chr "A30"
I hope someone can point me to any possible reason that causes this behavior.
The "semantics" vignette has some useful information on this topic.
library(haven)
vignette('semantics')
There are a couple of options to get value labels. I think a good one is the example demonstrated below, using the map function from the purrr package (but could be done with lapply instead, too)
# Get data from spss file
df <- read_sav(path_to_file)
# get value labels
df <- map_df(.x = df, .f = function(x) {
if (class(x) == 'labelled') as_factor(x)
else x})
# get column names
colnames(df) <- map(.x = spss_file, .f = function(x) {attr(x, 'label')})
The best is to save your spss file as CSV and then read it in R. I've faced this before and some strings didn't read correctly- Generally SPSS is not very smart when it comes to string variables that this could contribute to the problem.
I have a list of 28 elements like the following which are combinations of 8 elements two by two (SG1, SG2, ... SG8):
str(combinations)
List of 28
$ : chr [1:2] "SG1" "SG2"
$ : chr [1:2] "SG1" "SG3"
$ : chr [1:2] "SG1" "SG4"
...
I have stored the results of a t.test in an object:
results <- lapply(seq_along(combinations), function (n) {
mydatatemp <- mydata[with(mydata, Subgroup %in% unlist(combinations2[n]) & Group %in% c("G1", "G3")),]
result <- t.test(mydatatemp[
mydatatemp$Subgroup == sapply(combinations[n], "[",1),4],
mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], "[", 2),4],
alternative="two.sided", var.equal=TRUE)
return(result)})
and my str(result) is like this, of course 28 list elements one after another like this one:
List of 28
$ :List of 9
..$ statistic : Named num -6.9
.. ..- attr(*, "names")= chr "t"
..$ parameter : Named num 38
.. ..- attr(*, "names")= chr "df"
..$ p.value : num 3.33e-08
..$ conf.int : atomic [1:2] -0.301 -0.164
.. ..- attr(*, "conf.level")= num 0.95
..$ estimate : Named num [1:2] 0.196 0.429
.. ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
..$ null.value : Named num 0
.. ..- attr(*, "names")= chr "difference in means"
..$ alternative: chr "two.sided"
..$ method : chr " Two Sample t-test"
..$ data.name : chr [1:2] "mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], \"[\", 1), and mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], \"[\""| __truncated__ " 4] and 4]"
..- attr(*, "class")= chr "htest"
how can I rename all of the $data.name elements chr [1:2] with the following paste code?
paste(matrix(unlist(combinations), ncol = 2, byrow = TRUE)[,1], matrix(unlist(combinations), ncol = 2, byrow = TRUE)[,2], sep = " vs. ")
the reason I am doing this is to replace those ugly looking characters in the $data.name element.
I am trying to use the R baseline-package on a sample dataset that I have for, to test and evaluate the current baseline algorithm that I have.
I wanted to apply the fillpeaks algorithm as a trend line to compare.
bc.fillPeaks <- baseline(milk$spectra[1, drop=FALSE], lambda=6,
hwi=50, it=10, int=2000, method="fillPeaks")
plot(bc.fillPeaks)
But my problem is that the sample data that I have does not fit the matrix structure which is used in the example. When I look at the data.frame used for the example I don't understand it
'data.frame': 45 obs. of 2 variables
$ cow : num 0 0.25 0.375 0.875 0.5 0.75 0.5 0.125 0 0.125 ...
$ spectra: num [1:45, 1:21451] 1029 371 606 368 554 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "4999.94078628963" "5001.55954267662" "5003.17856106153" "5004.79784144435" ...
- attr(*, "terms")=Classes 'terms', 'formula' length 3 cow ~ spectra
.. ..- attr(*, "variables")= language list(cow, spectra)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "cow" "spectra"
.. .. .. ..$ : chr "spectra"
.. ..- attr(*, "term.labels")= chr "spectra"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(cow, spectra)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "nmatrix.21451"
.. .. ..- attr(*, "names")= chr [1:2] "cow" "spectra"
My question is therefore if any of you have experience with the baseline-package and the dataset (milk) used and ideas to how I can convert my data set which is structed: Date, Visits, Old_baseline_visits
To fit and test the baseline algorithm from the R-package
I have used baseline, and found it slightly confusing at first, particularly the example data. As it says in the help file, baseline expects a matrix with the spectra in rows. Even if you only have one "spectrum", it needs to be in the form of a single row matrix. Try this:
foo <- data.frame(Date=seq.Date(as.Date("1957-01-01"), by = "day",
length.out = ncol(milk$spectra)),
Visits=milk$spectra[1,],
Old_baseline_visits=milk$spectra[1,], row.names = NULL)
foo.t <- t(foo$Visits) # Visits in a single row matrix
bc.fillPeaks <- baseline(foo.t, lambda=6,
hwi=50, it=10, int=2000, method='fillPeaks')
plot(bc.fillPeaks)
If you want the baseline and corrected spectra back in your original data frame, try this:
foo$New_baseline <- c(getBaseline(bc.fillPeaks))
foo$New_corrected <- c(getCorrected(bc.fillPeaks))
plot(foo$Date, foo$New_corrected, "l")
Alternatively, if you don't need the baseline object, you can use baseline.fillPeaks(), which returns a list.