I've got the optim function in R returning a list of results like this:
[[354]]
r k sigma
389.4 354.0 354.0
but when I try accessing, say, list$sigma, it doesn't exist and returns NULL.
I've tried attach, I've tried names, and I've tried assigning it to a matrix, but none of these worked.
Does anyone have any idea how I can access the lowest or highest value of sigma, r, or k in my list?
Many thanks!
str gives me this output:
List of 354
$ : Named num [1:3] -55.25 2.99 119.37
..- attr(*, "names")= chr [1:3] "r" "k" "sigma"
$ : Named num [1:3] -53.91 4.21 119.71
..- attr(*, "names")= chr [1:3] "r" "k" "sigma"
$ : Named num [1:3] -41.7 14.6 119.2
So I've got a double within a list within a list (?). I'm still mystified as to how I can cycle through the list and pick out the element meeting my conditions without writing a function from scratch.
The key issue is that you have a list of lists (or a list of data.frames, which in fact is also a list).
To confirm this, take a look at is(list[[354]]).
The solution is simply to add an additional level of indexing. Below you have multiple alternatives of how to accomplish this.
You can use a vector as an index to [[, so for example, to access the third element of the 354th element, you can use
myList[[ c(354, 3) ]]
You can also use character indices; however, all nested levels must have named indices.
names(myList) <- as.character(1:length(myList))
myList[[ c("5", "sigma") ]]
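A minimal sketch of both indexing styles on a toy two-element list (the values are made up to mirror the str output above):

```r
# Toy stand-in for the optim results: a list of named numeric vectors
myList <- list(
  c(r = -55.25, k = 2.99, sigma = 119.37),
  c(r = -53.91, k = 4.21, sigma = 119.71)
)

# Numeric recursive index: element 3 ("sigma") of the 2nd list element
myList[[c(2, 3)]]          # 119.71

# Character recursive indexing needs names at every level
names(myList) <- as.character(seq_along(myList))
myList[[c("2", "sigma")]]  # 119.71
```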
Lastly, please try to avoid names like list, data, df, etc. Masking base functions this way leads to crashes and errors which seem unexplainable and mysterious until one realizes that they've tried to subset a function.
Edit:
In response to your question in the comments above: if you want to see the structure of an object (i.e. the "makeup" of the object), use str:
> str(myList)
List of 5
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num 0.654
..$ b : num -0.0823
..$ sigma: num -31
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num -0.656
..$ b : num -0.167
..$ sigma: num -49
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num 0.154
..$ b : num 0.522
..$ sigma: num -89
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num 0.676
..$ b : num 0.595
..$ sigma: num 145
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num -0.75
..$ b : num 0.772
..$ sigma: num 6
If you want, for example, all the sigmas, you can use sapply:
sapply(myList, function(x) x["sigma"])
You can use that to find the minimum and maximum:
range(sapply(myList, function(x) x["sigma"]))
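Going one step further, which.min on the extracted vector locates the element with the smallest sigma; a sketch on toy data shaped like the str output above (values invented):

```r
myList <- list(
  c(r = -55.25, k = 2.99, sigma = 119.37),
  c(r = -53.91, k = 4.21, sigma = 119.71),
  c(r = -41.7,  k = 14.6, sigma = 119.2)
)

# One sigma per list element
sigmas <- sapply(myList, function(x) x["sigma"])

range(sigmas)                        # smallest and largest sigma

# The full element (r, k, sigma) with the lowest sigma
best <- myList[[which.min(sigmas)]]
```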
Using do.call, you can do this:
do.call("[[", list(mylist, 354))["sigma"]
Related
I'm trying to compute a phenotypic covariance matrix between a fatty acid dataset and a phylogenetic tree using the Rphylopars package.
I'm able to load the data set and phylogeny; however, when I attempt to run the test I get the error message
Error in class(tree) <- "phylo" : attempt to set an attribute on NULL
This is the code for the test
phy <- read.tree("combined_trees.txt")
plot(phy)
phy$tip.label
FA_data <- read.csv("fatty_acid_example_data.csv", header = TRUE, na.strings = ".")
head(FA_data)
str(FA_data)
PPE <- phylopars(trait_data = FA_data$fatty1_continuous, tree = FA_data$phy)
Not sure what other info will help figure out the issue. The data set and phylogeny loaded without an error.
In the tutorial, the tree and trait data are jointly simulated by the simtraits() function, so both end up as elements of a single list. In your case (which will be typical of real-data cases), the tree and the trait data come from different sources, so most likely you want
PPE <- phylopars(trait_data = FA_data, tree = phy)
provided that FA_data contains a first column species matching the tip names in phy, and otherwise only the numeric data you want to use (potentially only the single fatty_acid1 column).
For comparison, the data structure returned by simtraits() looks like this (using str()):
List of 4
$ trait_data:'data.frame': 45 obs. of 5 variables:
..$ species: chr [1:45] "t7" "t8" "t2" "t3" ...
..$ V1 : num [1:45] 1.338 0.308 1.739 2.009 2.903 ...
..$ V2 : num [1:45] -2.002 -0.115 -0.349 -4.452 NA ...
..$ V3 : num [1:45] -1.74 NA 1.09 -2.54 -1.19 ...
..$ V4 : num [1:45] 2.496 2.712 1.198 1.675 -0.117 ...
$ tree :List of 4
..$ edge : int [1:28, 1:2] 29 29 28 28 27 27 26 26 25 25 ...
..$ edge.length: num [1:28] 0.0941 0.0941 0.6233 0.7174 0.0527 ...
..$ Nnode : int 14
..$ tip.label : chr [1:15] "t7" "t8" "t2" "t3" ...
..- attr(*, "class")= chr "phylo"
..- attr(*, "order")= chr "postorder"
...
you can see that simtraits() returns a list containing (among other things) (1) a data frame with species as the first column and the other columns numeric and (2) a phylogenetic tree.
I'm definitely a noob, though I have used R for various small tasks for several years.
For the life of me, I cannot figure out how to get the results from the Desc function into something I can work with. When I save x <- Desc(mydata), class(x) shows up as "Desc". In RStudio it appears under Values and says "List of 1"; when I click on x, the first line says ":List of 25". There is plenty of data in this object, but I cannot for the life of me figure out how to grab any of it.
Clearly I have a severe misunderstanding of the R data structures, but I have been searching for the past 90 minutes to no avail so figured I would reach out.
In short, I just want to pull certain aspects (N, mean, UB, LB, median) of the descriptive statistics provided from the Desc results for multiple datasets and build a little table that I can then work with.
Thanks for the help.
Say you have a dataframe, x, where:
x <- data.frame(i=c(1,2,3),j=c(4,5,6))
You could set:
desc.x <- Desc(x)
And access the info on any given column like:
desc.x$i
desc.x$i$mean
desc.x$j$sd
And any other stats Desc comes up with. The $ is the key here, it's how you access the named fields of the list that Desc returns.
Edit: In case you pass a single column (as the asker does), or simply a vector, to Desc, you get back a one-item list. The same principle applies, but the syntax is slightly different. Now you would use:
desc.x <- Desc(df$my.col)
desc.x[[1]]$mean
In the future, the way to attack this is to look in the environment window in RStudio and play around trying to figure out how to access the fields, check the source code on GitHub or elsewhere, or (best first choice) use str(desc.x), which gives us:
> str(desc.x)
List of 1
$ :List of 25
..$ xname : chr "data.frame(i = c(1, 2, 3), j = c(4, 5, 6))$i"
..$ label : NULL
..$ class : chr "numeric"
..$ classlabel: chr "numeric"
..$ length : int 3
..$ n : int 3
..$ NAs : int 0
..$ main : chr "data.frame(i = c(1, 2, 3), j = c(4, 5, 6))$i (numeric)"
..$ unique : int 3
..$ 0s : int 0
..$ mean : num 2
..$ meanSE : num 0.577
..$ quant : Named num [1:9] 1 1.1 1.2 1.5 2 2.5 2.8 2.9 3
.. ..- attr(*, "names")= chr [1:9] "min" ".05" ".10" ".25" ...
..$ range : num 2
..$ sd : num 1
..$ vcoef : num 0.5
..$ mad : num 1.48
..$ IQR : num 1
..$ skew : num 0
..$ kurt : num -2.33
..$ small :'data.frame': 3 obs. of 2 variables:
.. ..$ val : num [1:3] 1 2 3
.. ..$ freq: num [1:3] 1 1 1
..$ large :'data.frame': 3 obs. of 2 variables:
.. ..$ val : num [1:3] 3 2 1
.. ..$ freq: num [1:3] 1 1 1
..$ freq :Classes ‘Freq’ and 'data.frame': 3 obs. of 5 variables:
.. ..$ level : Factor w/ 3 levels "1","2","3": 1 2 3
.. ..$ freq : int [1:3] 1 1 1
.. ..$ perc : num [1:3] 0.333 0.333 0.333
.. ..$ cumfreq: int [1:3] 1 2 3
.. ..$ cumperc: num [1:3] 0.333 0.667 1
..$ maxrows : num 12
..$ x : num [1:3] 1 2 3
- attr(*, "class")= chr "Desc"
"List of 1" means you access it with desc.x[[1]], and below that you follow the $s. When you see something like num [1:3], that means it's an atomic vector, so you access its first member like var$field$numbers[1].
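A package-free sketch of that access pattern; the structure below merely mimics the Desc output (all names and values are invented):

```r
# Toy stand-in for the Desc result: a 1-element list whose element is a list of stats
desc.x <- list(list(mean = 2, sd = 1,
                    quant = c(min = 1, ".25" = 1.5, med = 2)))
class(desc.x) <- "Desc"   # the class attribute does not block ordinary list subsetting

desc.x[[1]]$mean       # 2
desc.x[[1]]$quant[1]   # first member of the named atomic vector (named "min")
```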
I have a list of data frames called ls.df.val.dcas. Each dataframe has various columns with some missing values, which are NA. I would like to use lapply() on the list so that I can remove those columns in which more than X% (e.g. 40%) of the values are NA. To give you an idea of what the dataframes within the list look like, here is an example:
$ SK_VALUES_IMV_EU28_INTRA :'data.frame': 74 obs. of 65 variables:
..$ PERIOD : Date[1:74], format: "2010-01-01" "2010-02-01" "2010-03-01" "2010-04-01" ...
..$ 2207 : num [1:74] 1078759 1850083 1872924 1038070 626471 ...
..$ 2208 : num [1:74] 3329179 7061890 1351550 1371469 1557605 ...
..$ 220710 : num [1:74] 1030704 1804495 1831958 972263 574855 ...
..$ 220720 : num [1:74] 48055 45588 40966 65807 51616 ...
..$ 220820 : num [1:74] 380843 1014933 71804 126348 138138 ...
..$ 220830 : num [1:74] 380007 459653 155033 205879 297446 ...
..$ 220840 : num [1:74] 41561 88449 31549 60768 117534 ...
..$ 220850 : num [1:74] 94483 340439 44949 32949 37550 ...
..$ 220860 : num [1:74] 371217 728521 143974 179311 254546 ...
..$ 220870 : num [1:74] 731231 1374532 228087 227772 230129 ...
..$ 22082014: num [1:74] NA 2531 1776 NA NA ...
$ RO_VALUES_IMV_EU28_EXTRA :'data.frame': 74 obs. of 44 variables:
..$ PERIOD : Date[1:74], format: "2010-01-01" "2010-02-01" "2010-03-01" "2010-04-01" ...
..$ 2207 : num [1:74] NA NA NA NA NA 5 NA NA NA NA ...
..$ 2208 : num [1:74] 312035 840540 315008 884357 100836 ...
..$ 220710 : num [1:74] NA NA NA NA NA 5 NA NA NA NA ...
..$ 220720 : num [1:74] NA NA NA NA NA NA NA NA NA NA ...
..$ 220820 : num [1:74] 3570 698 483 1087 1802 ...
My incomplete solution is based on counting the number of NAs in each column of each dataframe and calculating the percentage of NAs, then removing those columns where the percentage is more than X%.
# Counting the number of non-NA values in each column
ls.Nan <- lapply(ls.df.val.dcas, function(x) colSums(!is.na(x)))
# Calculating the lengths of all column
ls.size <- lapply(ls.df.val.dcas, function(x) dim(x))
# we want the first element of size which shows the number of rows.
ls.percen <- mapply(function(x,y) x/y[1] , x=ls.Nan, y=ls.size)
# keeping those columns that have more than half of the data on that category
mis.list <- sapply(ls.df.val.dcas, "]]" sapply(ls.percen, function(x) x >= NPI))
I get the following error from running the last line.
Error: unexpected symbol in "mis.list <- sapply(ls.df.val.dcas, "]]" sapply"
Ultimately I would also like to merge all of these steps into a single function and then use lapply once. But right now, I am struggling to understand the indexing system of lapply applied to a list of dataframes. If anyone can demonstrate with an example how to use lapply at different granularities of a list, that would be great. For instance, how should functions be written when you want to change an element of a list, a dataframe within a list, or a column within a dataframe of a list?
EDIT
Given the comment below about forgetting to put a comma after "]]", I corrected the code but am still getting an error:
> mis.list <- sapply(ls.df.val.dcas, "]]", sapply(ls.percen, function(x) x >= NPI))
Error in get(as.character(FUN), mode = "function", envir = envir) :
object ']]' of mode 'function' was not found
By the way, NPI is just a percentage threshold of NAs in the column. For instance, I have set it to NPI = 0.35.
Since I suspect the error is related to the structure of my data, I have added more info on the structure of ls.percen.
> str(ls.percen)
List of 69
$ AT_VALUES_IMV_EU28_EXTRA : Named num [1:59] 1 0.635 1 0.378 0.338 ...
..- attr(*, "names")= chr [1:59] "PERIOD" "2207" "2208" "220710" ...
$ AT_VALUES_IMV_EU28_INTRA : Named num [1:67] 1 0.986 0.986 0.986 0.986 ...
..- attr(*, "names")= chr [1:67] "PERIOD" "2207" "2208" "220710" ...
$ BE_VALUES_IMV_EU28_EXTRA : Named num [1:57] 1 1 1 1 0.365 ...
..- attr(*, "names")= chr [1:57] "PERIOD" "2207" "2208" "220710" ...
$ BE_VALUES_IMV_EU28_INTRA : Named num [1:69] 1 0.986 0.986 0.986 0.986 ...
..- attr(*, "names")= chr [1:69] "PERIOD" "2207" "2208" "220710" ...
Might be a simple typo (and not a problem with indexing): that message says you are missing a comma, and it should perhaps be:
mis.list <- sapply( ls.df.val.dcas, "]]", sapply(ls.percen, function(x) x >= NPI))
We don't see a definition of 'NPI'. It might be simpler to merge the first two 'lapply' calls (and return the desired list of shortened data frames) with:
mis.lst <- lapply( ls.df.val.dcas,
function(x) x[ , colSums(!is.na(x))/nrow(x) > .40 ] )
You can use logical indexing in the "j" position of the two-argument version of "[".
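A self-contained sketch of that one-liner on toy data (the data frames and the 40% threshold are invented for illustration):

```r
ls.df <- list(
  a = data.frame(x = c(1, NA, NA, NA), y = 1:4),         # x is 75% NA
  b = data.frame(x = c(1, 2, NA, 4),  y = c(NA, 2, 3, 4)) # both 25% NA
)

NPI <- 0.40  # keep columns whose non-NA fraction exceeds this

# Logical indexing in the "j" position drops the mostly-NA columns;
# drop = FALSE keeps the result a data.frame even with one column left
mis.lst <- lapply(ls.df, function(x)
  x[, colSums(!is.na(x)) / nrow(x) > NPI, drop = FALSE])

names(mis.lst$a)  # only "y" survives; the mostly-NA column x was dropped
```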
This question already has an answer here:
mgcv: How to set number and / or locations of knots for splines
(1 answer)
Closed 5 years ago.
I am running a GAM across many samples and am extracting coefficients/t-values/r-squared from the results in the way shown below. For background, I am using a natural spline, so the regular lm() works fine here and perhaps that is why this method works fine.
tvalsm93exf=ldply(fitsm93exf, function(x) as.data.frame(t(coef(summary(x))[,'t value', drop=FALSE])))
r2m93exf=ldply(fitsm93exf, function(x) as.data.frame(t(summary(x))[,'r.squared', drop=FALSE]))
I would also like to extract the knot locations for each sample set (df = 4 and no intercept, so three internal knots plus the boundaries). I have tried several variations of the commands above but haven't been able to index into this. The usual way to do this is shown below, so I was attempting to put it into the form above. But I am not certain whether the summary function contains these values, or whether there is another result I should be using instead.
attr(terms(fits),"predvars")
http://www.inside-r.org/r-doc/splines/ns
Note: This question is related to the question below, if that helps, though its solution did not help me solve my problem:
Extract estimates of GAM
The knots are fixed at the time the ns function is called in the examples on the help page you linked to, so you could have extracted the knots without going into the model object. But ... you have not provided the code for the GAM model creation, so we can only speculate about what you might have done. Just because the word "spline" is used in both the ?ns help page and in the documentation does not mean they are the same thing. The model on the other page you linked to had two "smooth" terms constructed with the s function.
.... + s(time,bs="cr",k=200) + s(tmpd,bs="cr")
The result of that gam call had a list node named "smooth" and the first one looked like this when viewed with str():
str(ap1$smooth)
List of 2
$ :List of 22
..$ term : chr "time"
..$ bs.dim : num 200
..$ fixed : logi FALSE
..$ dim : int 1
..$ p.order : logi NA
..$ by : chr "NA"
..$ label : chr "s(time)"
..$ xt : NULL
..$ id : NULL
..$ sp : Named num -1
.. ..- attr(*, "names")= chr "s(time)"
..$ S :List of 1
.. ..$ : num [1:199, 1:199] 5.6 -5.475 2.609 -0.577 0.275 ...
..$ rank : num 198
..$ null.space.dim: num 1
..$ df : num 199
..$ xp : Named num [1:200] -2556 -2527 -2502 -2476 -2451 ...
.. ..- attr(*, "names")= chr [1:200] "0.0000000%" "0.5025126%" "1.0050251%" "1.5075377%" ...
..$ F : num [1:40000] 0 0 0 0 0 0 0 0 0 0 ...
..$ plot.me : logi TRUE
..$ side.constrain: logi TRUE
..$ S.scale : num 9.56e-05
..$ vn : chr "time"
..$ first.para : num 5
..$ last.para : num 203
..- attr(*, "class")= chr [1:2] "cr.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..$ qr : num [1:200, 1] -0.0709 0.0817 0.0709 0.0688 0.0724 ...
.. ..$ rank : int 1
.. ..$ qraux: num 1.03
.. ..$ pivot: int 1
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
So the smooth was evaluated at each of 200 points and a polynomial function was fit to the data on that grid. If you forced three interior knots, they would simply sit at the extremes plus evenly spaced locations between the extremes.
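For the natural-spline case the question actually asks about, the knots can be read either from predvars or directly off an evaluated basis; a sketch on simulated data (the data, and the df = 4 choice, are assumptions to match the question):

```r
library(splines)

set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

fit <- lm(y ~ ns(x, df = 4))

# predvars records the ns() call with its knot locations filled in
attr(terms(fit), "predvars")

# or evaluate the basis once and read the attributes off it
basis <- ns(x, df = 4)
attr(basis, "knots")           # 3 interior knots (quantiles of x)
attr(basis, "Boundary.knots")  # the 2 boundary knots (range of x)
```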
I am trying to use the penalizedLDA package to run a penalized linear discriminant analysis in order to select the "most meaningful" variables. I have searched here and on other sites for help in accessing the output from the penalized model, to no avail.
My data comprise 400 variables and 44 groups. The code I used and the results I got thus far:
yy.m<-as.matrix(yy) #Factors/groups
xx.m<-as.matrix(xx) #Variables
cv.out<-PenalizedLDA.cv(xx.m,yy.m,type="standard")
## apply the penalty
out <- PenalizedLDA(xx.m,yy.m,lambda=cv.out$bestlambda,K=cv.out$bestK)
To get the structure of the output from the analysis:
> str(out)
List of 10
$ discrim: num [1:401, 1:4] -0.0234 -0.0219 -0.0189 -0.0143 -0.0102 ...
$ xproj : num [1:100, 1:4] -8.31 -14.68 -11.07 -13.46 -26.2 ...
$ K : int 4
$ crits :List of 4
..$ : num [1:4] 2827 2827 2827 2827
..$ : num [1:4] 914 914 914 914
..$ : num [1:4] 162 162 162 162
..$ : num [1:4] 48.6 48.6 48.6 48.6
$ type : chr "standard"
$ lambda : num 0
$ lambda2: NULL
$ wcsd.x : Named num [1:401] 0.0379 0.0335 0.0292 0.0261 0.0217 ...
..- attr(*, "names")= chr [1:401] "R400" "R405" "R410" "R415" ...
$ x : num [1:100, 1:401] 0.147 0.144 0.145 0.141 0.129 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:401] "R400" "R405" "R410" "R415" ...
$ y : num [1:100, 1] 2 2 2 2 2 1 1 1 1 1 ...
- attr(*, "class")= chr "penlda"
I am interested in obtaining a list or matrix of the top 20 variables for feature selection, most likely based on the coefficients of the linear discriminants.
I realize I would have to sort the coefficients in descending order and match the variable names to them. So the output I would expect is something like this imaginary example:
V1 V2
R400 0.34
R1535 0.22...
Can anyone provide any pointers (not necessarily the R code)? Thanks in advance.
Your out$K is 4, and that means you have 4 discriminant vectors. If you want the top 20 variables according to, say, the 2nd vector, try this:
# get the data frame of variable names and coefficients
var.coef = data.frame(colnames(xx.m), out$discrim[,2])
# sort the 2nd column (the coefficients) in decreasing order, and only keep the top 20
var.coef.top = var.coef[order(var.coef[,2], decreasing = TRUE)[1:20], ]
var.coef.top is what you want.
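A runnable toy version of the same recipe; if "top" should mean largest in magnitude regardless of sign, sorting on abs() is safer (the variable names and coefficients below are invented stand-ins for colnames(xx.m) and out$discrim[, 2]):

```r
set.seed(42)
vars  <- paste0("R", seq(400, by = 5, length.out = 30))  # fake wavelength names
coefs <- rnorm(30)                                       # fake discriminant loadings

var.coef <- data.frame(V1 = vars, V2 = coefs)

# top 5 by absolute coefficient, so large negative loadings count too
var.coef.top <- var.coef[order(abs(var.coef$V2), decreasing = TRUE)[1:5], ]
```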