I have a data frame 'QARef' with 25 variables. There are only 5 unique jobs (3rd column) but lots of rows per job:
str(QARef)
'data.frame': 648 obs. of 25 variables:
I'm using tapply to generate mean values across all 5 jobs for certain rows:
RefMean <- tapply(QARef$MTN,
list(QARef$Target_CD, QARef$Feature_Type, QARef$Orientation, QARef$Contrast, QARef$Prox),
FUN=mean, trim=0, na.rm=TRUE)
and I get something that I believe is called a multidimensional array:
str(RefMean)
num [1:17, 1:2, 1:2, 1:2, 1:2] 34.1 34.2 25.2 28.9 29.2 ...
- attr(*, "dimnames")=List of 5
..$ : chr [1:17] "55" "60" "70" "80" ...
..$ : chr [1:2] "LINE" "SQUARE"
..$ : chr [1:2] "X" "Y"
..$ : chr [1:2] "CLEAR" "DARK"
..$ : chr [1:2] "1:1" "Iso"
What I want to do is add a column to QARef which contains the correct RefMean value for each row depending on a match between values in columns of QARef and dimnames of RefMean. E.g. QARef column Feature_Type=="LINE" should match the dimname "LINE" etc.
Any hint how to do this or where to find the answer would be highly appreciated.
I think I found a solution. It's probably not elegant, but it works:
# Group means keyed by a space-separated label built from the six grouping columns
RefMean <- data.frame(tapply(QARef$MTN,
paste(QARef$Target_CD, QARef$Feature_Type, QARef$Orientation, QARef$Contrast, QARef$Prox, QARef$Measurement_Type),
FUN=mean, trim=0, na.rm=TRUE))
colnames(RefMean) <- c("MTN_Ref")
Ident <- do.call(rbind, strsplit(rownames(RefMean), " "))
RefMean["Target_CD"] <- Ident[,1]
RefMean["Feature_Type"] <- Ident[,2]
RefMean["Orientation"] <- Ident[,3]
RefMean["Contrast"] <- Ident[,4]
RefMean["Prox"] <- Ident[,5]
RefMean["Measurement_Type"] <- Ident[,6]
QA4 <- merge(QARef,RefMean,by=c("Target_CD","Feature_Type","Orientation","Contrast","Prox","Measurement_Type"),all.x=TRUE,sort=FALSE)
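For what it's worth, here is a minimal sketch of two shorter routes in base R. It assumes a small made-up stand-in for QARef with only two grouping columns (the toy values are hypothetical, not taken from the real data). The first option looks up each row in the tapply result by matching against its dimnames, and the second uses ave() to compute the group mean per row directly:

```r
# Hypothetical toy data standing in for QARef (two grouping columns only)
QARef <- data.frame(
  Target_CD    = c(55, 55, 60, 60, 55, 60),
  Feature_Type = c("LINE", "LINE", "LINE", "SQUARE", "SQUARE", "SQUARE"),
  MTN          = c(34, 36, 25, 29, 30, NA)
)

# Same kind of array of group means as in the question
RefMean <- tapply(QARef$MTN,
                  list(QARef$Target_CD, QARef$Feature_Type),
                  FUN = mean, na.rm = TRUE)

# Option 1: index the array with a character matrix, one column per dimension;
# each row of idx is matched against the dimnames of RefMean
idx <- cbind(as.character(QARef$Target_CD), as.character(QARef$Feature_Type))
QARef$MTN_Ref <- RefMean[idx]

# Option 2: skip the array and let ave() return the group mean for every row
QARef$MTN_Ref2 <- ave(QARef$MTN, QARef$Target_CD, QARef$Feature_Type,
                      FUN = function(x) mean(x, na.rm = TRUE))
```

With the real data, idx would need one column per grouping variable, in the same order as the list passed to tapply.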
I'm having trouble removing data.table column labels/attributes from imported data (SAS)
My data.table DT is an import from a SAS file. Not all columns have labels, and some have two attributes. I can't share my data as it's imported (so I can't replicate it), but here is a partial structure of DT:
> str(DT)
Classes ‘data.table’ and 'data.frame': 96293709 obs. of 150 variables:
$ Col1 : chr "Y" "N" "N" "N" ...
..- attr(*, "label")= chr "some label, description goes on and on"
$ Col2 : chr "N" "N" "N" "Y" ...
..- attr(*, "label")= chr "some label 2, description goes on and on"
$ Col3 : Date, format: "1994-08-07" "1994-08-07" "1994-08-07" "1994-08-07" ...
$ Col4 : chr "M" "M" "M" "M" ...
..- attr(*, "label")= chr "some label 3, description goes on and on"
..- attr(*, "format.sas")= chr "$"
$ Col5 : num 1e+07 1e+07 1e+07 1e+07 1e+07 ...
..- attr(*, "label")= chr "some label 4, description goes on and on"
$ Col6 : Date, format: "2000-01-01" "2005-03-10" "2013-06-01" "2015-06-01" ...
I'm trying to remove all attributes, because when I use certain columns to create new ones, these attributes are inherited by the new column, which is annoying and undesired (it prevents me from merging with another data.table that lacks the labels). I thought the only way to prevent that is to remove the attributes (labels) from the original data DT.
I tried
> setattr(DT, "label", NULL)
> setattr(DT, "format.sas", NULL)
and I get no error, but nothing happens.
When I check the structure afterwards, I get the same thing as before: the labels/attributes have not been removed.
What am I doing wrong here?
I know I have to use setattr somehow, as I don't want DT to be copied (it's rather large).
I think the attributes are stored on each column, not on the data.table as a whole, so setattr(DT, ...) has nothing to remove. Compare attributes(DT) with lapply(DT, attributes) to see if this is the case. Here's an example which I think replicates what you're trying to do:
DT <- data.table(a=1:3,b=2:4)
attr(DT$a, "label") <- "a label"
attr(DT$b, "label") <- "a label"
attr(DT$b, "sas format") <- "ddmmyy10."
str(DT)
#Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
# $ a: atomic 1 2 3
# ..- attr(*, "label")= chr "a label"
# $ b: atomic 2 3 4
# ..- attr(*, "label")= chr "a label"
# ..- attr(*, "sas format")= chr "ddmmyy10."
# - attr(*, ".internal.selfref")=<externalptr>
DT[, names(DT) := lapply(.SD, setattr, "label", NULL)]
DT[, names(DT) := lapply(.SD, setattr, "sas format", NULL)]
str(DT)
#Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
# $ a: int 1 2 3
# $ b: int 2 3 4
# - attr(*, ".internal.selfref")=<externalptr>
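As a variation, here is a sketch that loops over the columns and calls setattr on each one, which should modify the column vectors in place without going through `:=`. It assumes the data.table package is installed and uses the attribute names from the question ("label" and "format.sas"); the toy table is made up for illustration:

```r
library(data.table)

# Toy table with per-column attributes, mimicking a SAS import
DT <- data.table(a = 1:3, b = 2:4)
setattr(DT$a, "label", "a label")
setattr(DT$b, "label", "another label")
setattr(DT$b, "format.sas", "$")

# Remove the unwanted attributes column by column, by reference
for (col in names(DT)) {
  for (att in c("label", "format.sas")) {
    setattr(DT[[col]], att, NULL)
  }
}

# Each column should now report no attributes
lapply(DT, attributes)
```

Columns that carry a class (Date, factor) keep it, because only the two named attributes are touched.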
I am reducing the dimensionality of a test data frame (30 rows and 750 columns) with PCA (using the FactoMineR library) as follows:
pca_base <- PCA(test, ncp=5, graph=T)
I used the function dimdesc() [in FactoMineR], for dimension description, to identify the variables most significantly associated with a given principal component, as follows:
pca_dim<-dimdesc(pca_base)
pca_dim is a list of length 3.
My question is: how can I extract the row names from the first and second elements of pca_dim?
I tried this code:
#to select dim 1,2 use axes
pca_dim<-dimdesc(pca_base,axes = c(1,2))
rownames(pca_dim[[1]])
But the result was NULL.
For instance, I'll use the demo data set decathlon2 from the factoextra package.
It contains 27 individuals (athletes) described by 13 variables.
library(FactoMineR)  # provides PCA() and dimdesc()
library(factoextra)  # provides the decathlon2 data set
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- PCA(decathlon2.active, scale.unit = TRUE, graph = FALSE)
res.desc <- dimdesc(res.pca, axes = c(1,2))
Thanks!
When you have this kind of issue accessing information in an R object, the best way to solve it is to start by examining the output of the function str.
str(pca_dim)
#List of 2
# $ Dim.1:List of 1
# ..$ quanti: num [1:8, 1:2] 0.794 0.743 0.734 0.61 0.428 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:8] "Long.jump" "Discus" "Shot.put" "High.jump" ...
# .. .. ..$ : chr [1:2] "correlation" "p.value"
# $ Dim.2:List of 1
# ..$ quanti: num [1:3, 1:2] 8.07e-01 7.84e-01 -4.65e-01 3.21e-06 9.38e-06 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:3] "Pole.vault" "X1500m" "High.jump"
# .. .. ..$ : chr [1:2] "correlation" "p.value"
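So the structure of the object is simple: it is a list of two lists. Each of these sublists has just one member, a matrix with the dimnames attribute set.
That means rownames(pca_dim[[1]]) returns NULL because pca_dim[[1]] is a list, not the matrix itself. Use standard accessor functions on the quanti matrix to get those attributes.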
rownames(pca_dim$Dim.1$quanti)
#[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline"
#[6] "X400m" "X110m.hurdle" "X100m"
rownames(pca_dim$Dim.2$quanti)
#[1] "Pole.vault" "X1500m" "High.jump"
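If you want the row names for every dimension at once, one option is to lapply over the list. The sketch below does not call FactoMineR; it builds a stand-in list with the same shape as the str() output above (the values are left as NA because only the dimnames matter here, and make_quanti is a hypothetical helper), so treat it as an illustration of the access pattern rather than real dimdesc output:

```r
# Hypothetical helper: an NA-filled matrix with the dimnames dimdesc uses
make_quanti <- function(vars) {
  matrix(NA_real_, nrow = length(vars), ncol = 2,
         dimnames = list(vars, c("correlation", "p.value")))
}

# Stand-in for the dimdesc() result shown above
pca_dim <- list(
  Dim.1 = list(quanti = make_quanti(c("Long.jump", "Discus", "Shot.put",
                                      "High.jump", "Javeline", "X400m",
                                      "X110m.hurdle", "X100m"))),
  Dim.2 = list(quanti = make_quanti(c("Pole.vault", "X1500m", "High.jump")))
)

# Row names of the quanti matrix, one character vector per dimension
vars_by_dim <- lapply(pca_dim, function(d) rownames(d$quanti))
vars_by_dim$Dim.2
```

If the real result contains elements without a quanti matrix, rownames(d$quanti) simply gives NULL for them.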
Alternatively, you can convert each element of the dimdesc result to a data.frame, like this:
rownames(data.frame(res.desc[1]))
[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline" "X400m" "X110m.hurdle"
[8] "X100m"
rownames(data.frame(res.desc[2]))
[1] "Pole.vault" "X1500m" "High.jump"
I have a problem:
I work with a data frame containing the development time (dependent variable) of five species as a function of temperature (independent variable).
With the "by" function I fitted an lm for each of the five species
by(dados, dados$Especie, function(dados) lm(dados$Tempo ~ dados$Temp, data = dados))
and as a result I got lists nested in other lists, as you can see here
List of 5
$ C.albiceps :List of 12
..$ coefficients : Named num [1:2] 262.78 -1.76
.. ..- attr(*, "names")= chr [1:2] "(Intercept)" "dados$Temp"
..$ residuals : Named num [1:41] -4.157 -2.394 -0.631 1.131 2.894 ...
.. ..- attr(*, "names")= chr [1:41] "1" "2" "3" "4" ...
..$ effects : Named num [1:41] -1344.031 -133.548 0.235 1.977 3.72 ...
.. ..- attr(*, "names")= chr [1:41] "(Intercept)" "dados$Temp" "" "" ...
..$ rank : int 2
It's a list of 5 elements (one for each species), and each species is a list of 12 elements (from the lm function). So, 5 lists of 12, within a list of 5.
Now, my question:
I want to extract the values from my coefficients and sum them up. So far I have
List$speciesName$coefficients[2], and I want to extract that value (the second item of $coefficients) for each species and save the results in a vector (in order to calculate indices with it).
Any helpful hints on that?
You can loop over the models with an apply function and extract the coefficients with coef.
## Example
mods <- by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp, data=x))
## Sum up all the second coefficients
sum(sapply(mods, function(x) coef(x)[[2]]))
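Since the question also asks for the values as a vector, here is a small sketch along the same lines that keeps the slopes before summing them. It reuses the mtcars stand-in from the example above; the names slopes and total are just illustrative:

```r
# Fit one model per group, as in the example above
mods <- by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp, data = x))

# One slope (second coefficient) per model, collected in a numeric vector
slopes <- sapply(mods, function(m) coef(m)[[2]])

# The vector can be reused for further calculations, e.g. the sum
total <- sum(slopes)
```

With your data, replacing mtcars, cyl, mpg and disp by dados, Especie, Tempo and Temp should give one slope per species.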
I am working with the (I think) very cool Titanic data that is publicly available.
There are two principal ways to import it into R:
(1) You can either use the built-in dataset Titanic (library(datasets)) or
(2) you can download it as a .csv file, e.g. here.
Now, the data is aggregated frequency data. I would like to convert the multi-dimensional contingency table into an individual-level data frame.
PROBLEM: If I use the built-in dataset, this is no problem; if I use the imported .csv file, however, it doesn't work. This is the error message I get:
Error in rep(1:nrow(tablevars), counts) : invalid 'times' argument
In addition: Warning message:
In expand.table(Titanic.table) : NAs introduced by coercion
Why? And what am I doing wrong? Many thanks.
R CODE
#required packages
library(datasets)
library(epitools)
#(1) Expansion of built-in data set
data(Titanic)
Titanic.raw <- Titanic
class(Titanic.raw) # data is stored as "table"
Titanic.expand <- expand.table(Titanic.raw)
#(2) Expansion of imported data set
Titanic.raw <- read.table("Titanic.csv", header=TRUE, sep=",", row.names=1)
class(Titanic.raw) #data is stored as "data.frame"
Titanic.table <- as.table(as.matrix(Titanic.raw))
class(Titanic.table) #data is stored as "table"
Titanic.expand <- expand.table(Titanic.table)
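I think you probably want xtabs. The CSV is presumably in long format (one row per combination of Class, Sex, Age and Survived, plus a Freq column), so as.table(as.matrix(...)) gives a character matrix rather than a table of counts, which would explain the coercion warning. Also watch out that the factor coding differs between the Titanic and the Titanic.new objects. By default factor levels are in lexicographic order, while two of the Titanic factors (Sex and Age) are not: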
str(Titanic)
table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Male" "Female"
..$ Age : chr [1:2] "Child" "Adult"
..$ Survived: chr [1:2] "No" "Yes"
Titanic.raw <- read.table("~/Downloads/Titanic.csv", header=TRUE, sep=",", row.names=1)
str( Titanic.new <-
xtabs( Freq ~ Class + Sex + Age +Survived, data=Titanic.raw))
xtabs [1:4, 1:2, 1:2, 1:2] 4 13 89 3 118 154 387 670 0 0 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Female" "Male"
..$ Age : chr [1:2] "Adult" "Child"
..$ Survived: chr [1:2] "No" "Yes"
- attr(*, "class")= chr [1:2] "xtabs" "table"
- attr(*, "call")= language xtabs(formula = Freq ~ Class + Sex + Age + Survived, data = Titanic.raw)
An 'xtabs' object inherits from the 'table' class, so you can pass it to that expand.table function.
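Here is a self-contained sketch of the same idea that runs without the CSV. It assumes the file has the same long layout as as.data.frame(Titanic) (columns Class, Sex, Age, Survived, Freq), which I cannot check, and it expands the rows with plain rep() instead of epitools::expand.table:

```r
# Long-format stand-in for the imported CSV: one row per cell, counts in Freq
long <- as.data.frame(Titanic)

# Rebuild the 4-way contingency table from the long data
tab <- xtabs(Freq ~ Class + Sex + Age + Survived, data = long)

# Individual-level data: repeat each row Freq times, then drop Freq
individual <- long[rep(seq_len(nrow(long)), long$Freq),
                   c("Class", "Sex", "Age", "Survived")]
rownames(individual) <- NULL

nrow(individual)  # one row per person
```

Because as.data.frame() keeps the factor levels in the table's own order, tab has the same cell order as Titanic here; with a CSV read from disk the levels default to alphabetical order, as noted above.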
I'm using the bystats function from the Hmisc package in R. How can I extract the values and place them into variables? For example, I want to calculate the mean and SD for variable aaf and put them in a data frame or matrix.
t <- with(d.aaf,bystats(y=aaf,plot_bid,fun=function(x) {
c(Mean = round(mean(x),digits=2),SD = round(sd(x),digits=2))
}))
> str(t)
bystats [1:121, 1:3] 5 5 5 5 5 4 5 5 3 4 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:121] "P00000000006001288020278" "P00000000006001288085814"
"P00000000006001288151350" "P00000000006001288216886" ...
..$ : chr [1:3] "N" "Mean" "SD"
- attr(*, "heading")= chr "function(x) { c(Mean = round(mean(x),digits=2),
SD = round(sd(x),digits=2)) }
of aaf by plot_bid"
- attr(*, "byvarnames")= chr "plot_bid"
The way I'm doing it is by first converting "t" into a dataframe, which I do not think is very efficient.
Thanks for your suggestions.
You could use ddply from the plyr package which outputs directly to a data frame.
library(plyr)
# Avoid naming results `t` or `mean`, which would mask base R functions
stats <- ddply(d.aaf, "plot_bid", summarise, Mean = round(mean(aaf), 2), SD = round(sd(aaf), 2))
SD <- stats$SD
Mean <- stats$Mean
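If you would rather stay in base R, a rough sketch of the same summary with split() and sapply() follows. The d.aaf data frame here is a tiny made-up stand-in (three hypothetical plot ids), since the real data isn't shown:

```r
# Hypothetical stand-in for d.aaf
d.aaf <- data.frame(
  plot_bid = rep(c("P1", "P2", "P3"), each = 4),
  aaf      = c(1, 2, 3, 4, 2, 2, 2, 2, 10, 20, 30, 40)
)

# One row per plot_bid, columns N, Mean and SD
stats <- t(sapply(split(d.aaf$aaf, d.aaf$plot_bid), function(x) {
  c(N = length(x), Mean = round(mean(x), 2), SD = round(sd(x), 2))
}))

# Columns can be pulled out by name, as from any matrix with dimnames
means <- stats[, "Mean"]
sds   <- stats[, "SD"]
```

The bystats result in the question is also a matrix with "N", "Mean" and "SD" column names according to its str() output, so the same column-name indexing may work on it directly.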