I would like to use attributes to store variable labels, as Stata does: instead of printing the variable name (e.g. in output tables), I'd rather print an attribute of it (which I call "names" here). But how can I access it in a loop?
dummies <- c("a", "b", "c")
attr(dummies, "names") <- c("First letter", "Second letter", "Third letter")
for (dummy in dummies) {
# do something with dummy
# e.g. accessing a variable in a dataframe
# and printing something to a table
print attr(dummies$dummy, "names") # doesn't work
print attr(dummies, "names")$dummy # doesn't work
}
As an alternative approach one can use a matrix:
dummies <- c("a", "b", "c")
names <- c("First letter", "Second letter", "Third letter")
dummies.matrix <- matrix(c(dummies, names), nrow=3)
Then I loop over dummies.matrix:
for (i in 1:nrow(dummies.matrix)) {
print(dummies.matrix[i,1]) # value
print(dummies.matrix[i,2]) # name or label
}
But that's neither convenient nor intuitive.
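It looks like you have an indexing problem. The $ operator does not work on atomic vectors; index by position instead, and the names attribute is carried along with the subset: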
dummies <- c("a", "b", "c")
attr(dummies, "names") <- c("First letter", "Second letter", "Third letter")
for (i in seq_along(dummies)) {
print(dummies[i])
print(attr(dummies[i], "names"))
}
As a style point, be cautious about using indexes like dummy on vectors named dummies. At some point the vector and the index start to blend together, which makes it harder to interpret what the code should be doing.
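If the goal is to loop over the values themselves rather than over positions, one option is to keep the labels in a separate vector whose names are the values. A minimal sketch, assuming a hypothetical lookup vector called var_labels (not part of the original code):

```r
dummies <- c("a", "b", "c")

# Hypothetical lookup vector: element names are the values, elements are the labels.
var_labels <- c(a = "First letter", b = "Second letter", c = "Third letter")

shown <- character(0)
for (dummy in dummies) {
  # [[ ]] looks the label up by name and returns it without the name attached.
  label <- var_labels[[dummy]]
  print(label)
  shown <- c(shown, label)
}
```

This keeps the loop variable usable as a column name while the label is only one lookup away.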
Very basic question - but something I haven't seen before. My variable names have 'subnames' beneath them (see image link below).
variable names and subnames
When I call colnames, I just get the main name:
colnames(df)
"Age" "Gender" "AgeGenderQuota" "AgeGenStateQuota"
"Q1_1" "Q2_1" "Q3_1"
Any ideas on how I can retrieve the subnames shown in the picture above?
These are stored as a column attribute called label. You can access them with the function attr():
An example:
df <- data.frame(
x = structure(10, label = 'This is x'),
y = structure(3, label = 'and this is our y')
)
attr(df$x, 'label')
# [1] "This is x"
And modify:
attr(df$x, 'label') <- 'This is x which is our first column'
And to get all at once:
sapply(df, attr, 'label')
# x y
# "This is x which is our first column" "and this is our y"
To see all the attributes you can use the function attributes():
attributes(df$x)
# $label
# [1] "This is x"
The subnames that you're referencing are the column labels.
To retrieve all the labels, you can use:
library(tidyverse)
colnames(df) %>%
map(~attr(df[[.x]], "label")) %>%
flatten()
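where attr() returns the label of each column (or NULL if the column has none).
This code applies attr() to every column and, after flatten() removes one level of nesting, returns a list of column labels. Note that the list is only named if the input to map() is named, for example by piping colnames(df) through set_names() first.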
Alternate Solution
If you want an easy one-liner to retrieve the column labels as a vector, check out sjlabelled, a tidyverse-approved package:
library(sjlabelled)
labels <- get_label(df)
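For comparison, the same lookup can be sketched in base R without extra packages. This is only an illustration under assumptions: the data frame and the helper name get_labels are made up for the example, and unlabelled columns are reported as NA so the result is always a character vector.

```r
# Hypothetical example data: two labelled columns and one unlabelled column.
df <- data.frame(x = 10, y = 3, z = 7)
attr(df$x, "label") <- "This is x"
attr(df$y, "label") <- "and this is our y"

# Hypothetical helper: returns one label per column, NA where no label is set.
get_labels <- function(data) {
  vapply(data, function(col) {
    lab <- attr(col, "label")
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
}

labs <- get_labels(df)
print(labs)
```

Using vapply() instead of sapply() guarantees a named character vector even when some columns have no label, where sapply() would fall back to a list.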
I create named lists manually such as:
FD_OesophagitisIntro<-list(x="LA Grade A",
x="LA Grade B",
x="LA Grade C",
x="LA Grade D")
but this is repetitive, so a neater version is
FD_OesophagitisIntro<-list(unique(append(FD_OesophagitisIntro,replicate(4,paste("LA Grade ",sample(c("A","B","C","D"),replace=F))))))
However, this creates an unnamed list. How can I create the list above with the neater code?
If your question is how do you use replicate to create a named list, you can only do that if the expression has a name.
replicate is a wrapper to sapply with the expression evaluated as an anonymous function like this:
sapply(integer(4), function(...) {
  paste("LA Grade ", sample(c("A", "B", "C", "D"), replace = FALSE))
})
There is no ... argument for replicate, but thankfully, sapply's USE.NAMES argument is set to TRUE by default. So to get names from this, you need to either have X be "character" (it isn't, it's "integer"), or have the return value of expr carry names. It doesn't: it's the return value of a call to paste(), which calls as.character() on all its arguments and so drops attributes, including names. You can see this in the following example:
paste(c(a = "x", b = "x"), c(a = "y", b = "y"))
[1] "x y" "x y"
This means your solution will involve separating the call to replicate out, THEN assigning names to the object it returns. Sadly, it then becomes a fake one-liner with curly braces, or not a one liner at all.
You're also going to have to pass the product of replicate to append as a list, so that its names are retained, and not use unique either (since it strips names).
Here's an example:
repd <- replicate(4, paste("LA Grade ", sample(c("A", "B", "C", "D"), replace = FALSE)))
names(repd) <- rep("x", length(repd))
long <- append(FD_OesophagitisIntro, as.list(repd))
FD_OesophagitisIntro <- long[!duplicated(long)]
names(FD_OesophagitisIntro)
# [1] "x" "x" "x" "x" "x" "x" "x" "x"
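If the random sampling is not actually needed and the aim is just the four-element named list from the question, a shorter route is to build the values first and attach the names afterwards. A minimal sketch, assuming the grades are always A to D (the intermediate variable grades is introduced here for illustration):

```r
# Build the four values once; paste() recycles "LA Grade" over the letters.
grades <- paste("LA Grade", c("A", "B", "C", "D"))

# Turn the vector into a list and give every element the name "x".
FD_OesophagitisIntro <- setNames(as.list(grades), rep("x", length(grades)))

str(FD_OesophagitisIntro)
```

Names are assigned after paste() has run, which sidesteps the attribute stripping described above.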
I manage to create the attribute label for determined variables in a dataset, but I used a loop. I would like to avoid using a loop, can you help me?
Here is a toy example with the iris dataset.
Let's suppose I want to add an attribute label to the "Sepal.Length", "Petal.Width", and "Species" variables. What I did was the following:
1) created a vector with the name of the variables I want to add the attribute to.
varNames <- c("Sepal.Length", "Petal.Width", "Species")
2) created a character vector with the labels I want to add
newLabels <- c("a", "b", "c")
3) Then created a for loop to assign the attribute labels to the selected variables.
for (i in 1:length(varNames)) {
attributes(iris[[which(names(iris) %in% varNames[i])]])$label <-
newLabels[i]
}
How can I do this without a for loop?
You could do it by using %in% to find the columns to which you want to append "a", "b" and "c", and then appending the appropriate tag to their names.
# Your vector
varNames <- c("Sepal.Length", "Petal.Width", "Species")
# Use names() to append
names(newLabels) <- c("a", "b", "c")
Code to append appropriate tag
names(iris)[names(iris) %in% varNames] <- paste(names(iris)[names(iris) %in% varNames], names(newLabels), sep = ".")
# And output
> names(iris)
[1] "Sepal.Length.a" "Sepal.Width" "Petal.Length" "Petal.Width.b" "Species.c"
UPDATED POST
If you want to change the attribute label of the iris variables, then you can achieve this by using lapply and label (assumed here to be label() from the Hmisc package, which must be loaded first) like this:
varNames = c(Sepal.Length="a", Petal.Width="b",Species="c")
# Apply to each value of varNames
label(iris[c("Sepal.Length", "Petal.Width", "Species")]) = lapply(names(varNames),
function(x) label(iris[,x]) = varNames[x])
And the output
> attributes(iris$Sepal.Length)$label
Sepal.Length
"a"
> attributes(iris$Petal.Width)$label
Petal.Width
"b"
> attributes(iris$Species)$label
Species
"c"
The following code will not work on built in datasets like iris and you will have to modify the data-frame name in the function code for every data-frame you're using this on...
That being said, on a normal data-frame like for example this one:
dta=data.frame(SL=c(1,2,3,4,5),SW=c(6,7,8,9,10),PL=c(11,12,13,14,15),PW=c(16,17,18,19,20),Spe=c("f","g","h","i","j"))
with similar additional information:
varNames <- c("SL", "PW", "Spe")
newLabels <- c("a", "b", "c")
this is a way to do it without loop:
fu=function(i){
attributes(dta[[which(names(dta) %in% varNames[i])]])$label <<- newLabels[i]
}
mapply(fu,1:length(varNames))
verify first label:
> attributes(dta[[1]])$label
[1] "a"
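A loop-free variant that avoids both <<- and hard-coding the data frame name can be sketched with Map(), which walks over the selected columns and the labels in parallel. This works on a copy of iris; the name iris_labelled is an assumption made for the example.

```r
varNames  <- c("Sepal.Length", "Petal.Width", "Species")
newLabels <- c("a", "b", "c")

# Work on a copy so the built-in dataset itself is left untouched.
iris_labelled <- iris

# For each selected column, set its "label" attribute and return the column;
# assigning the resulting list back replaces those columns in one step.
iris_labelled[varNames] <- Map(function(col, lab) {
  attr(col, "label") <- lab
  col
}, iris_labelled[varNames], newLabels)

attr(iris_labelled$Petal.Width, "label")
```

Columns that are not listed in varNames are left without a label attribute.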
I am attempting to create my first custom function in R (yay!). I've got something that sort of works now but I think it could be improved.
Basically, I want to create my own custom table within R that can be run through xtable for a final report. I want the table to follow this format for each column:
group1mean, group1sd, group2mean, group2sd, t-value, p-value.
Currently, my function does this. However, it produces column names (e.g., V3 and V4) that I would like to leave blank, and I would like to have it loop through multiple dependent variables and append the results as new rows in the matrix automatically. Right now, I have to write a line of code for each dependent variable manually (in the example below the DVs are PWB, SWB, and EWB).
Here is my code so far:
data <- read.delim("~/c4044sol.txt", header=T)
library(psych)
proc.ttest <- function(dv,group,decimals) {
x1 <- describeBy((dv), (group), mat=TRUE)
stat1 <- t.test((dv) ~ (group))
output1 <- c(paste (round(x1$mean[1], digits=(decimals)),"(", round(x1$sd[1], digits= (decimals)), ")", sep =" "),
paste (round(x1$mean[2], digits=(decimals)), "(", round(x1$sd[2], digits=(decimals)), ")", sep =" "),
round(stat1$statistic, digits=2), round(stat1$p.value, digits=3))
return(output1)
}
toprow <- c("M (SD)", "M (SD)", "t", "p")
outtable <- rbind(toprow,
proc.ttest(data$PWB, data$college, 2),
proc.ttest(data$SWB, data$college, 2),
proc.ttest(data$EWB, data$college, 2))
colnames(outtable) <- c("College graduate", "Less than college graduate", "", "")
row.names(outtable) <- c("", "PWB", "SWB", "EWB")
library(xtable)
xtable(outtable)
So to repeat, I would like to suppress the column names "V3" and "V4" (leave them blank) and make the code run automatically on a list of variables. Are either of these things possible? Thanks for your time.
Try keeping outtable as you have it, but without toprow.
Instead, use toprow as the names:
toprow <- c("M (SD)", "M (SD)", "t", "p")
outtable <- rbind( # toprow,
proc.ttest(data$PWB, data$college, 2),
proc.ttest(data$SWB, data$college, 2),
proc.ttest(data$EWB, data$college, 2))
colnames(outtable) <- toprow  # outtable is a matrix, so set colnames(), not names()
## note that the parens and spaces are
## not best practices, but this should still
## get you your desired results
I fixed the extra column labels printing issue by putting all the labels I actually wanted in the final table in the first two rows of the matrix...
toptoprow <- c("College graduate", "Less than college graduate", "", "")
toprow <- c("M (SD)", "M (SD)", "t", "p")
outtable <- rbind(toptoprow,toprow, proc.ttest(PWB, college, 2),
proc.ttest(SWB, college, 2),
proc.ttest(EWB, college, 2))
And then suppressing the colnames using the print function (as suggested by Ricardo)...
print(xtable(outtable), hline.after=c(-1,1,nrow(outtable)),include.colnames=FALSE)
I would still like to automate the function itself so that, ideally, I can give it a list of variable names, have it run the function on each variable, and populate the results in the final matrix. But one baby step at a time...
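The remaining automation step could be sketched by looping over a character vector of variable names with lapply() and stacking the results with do.call(rbind, ...). The block below is an illustration under assumptions: the data are simulated because the original file is not available, and proc.ttest() is rewritten with base R's tapply() in place of psych::describeBy() so that it runs without extra packages.

```r
# Simulated stand-in for the original data file (hypothetical values).
set.seed(1)
data <- data.frame(
  college = rep(c("grad", "nongrad"), each = 20),
  PWB = rnorm(40), SWB = rnorm(40), EWB = rnorm(40)
)

# Base-R variant of proc.ttest(): group means/SDs plus t and p values.
proc.ttest <- function(dv, group, decimals) {
  m <- tapply(dv, group, mean)
  s <- tapply(dv, group, sd)
  stat <- t.test(dv ~ group)
  c(paste0(round(m[1], decimals), " (", round(s[1], decimals), ")"),
    paste0(round(m[2], decimals), " (", round(s[2], decimals), ")"),
    round(stat$statistic, 2),
    round(stat$p.value, 3))
}

# Run the function once per dependent variable and bind the rows together.
dvs <- c("PWB", "SWB", "EWB")
outtable <- do.call(rbind, lapply(dvs, function(v) {
  proc.ttest(data[[v]], data$college, 2)
}))
rownames(outtable) <- dvs
colnames(outtable) <- c("M (SD)", "M (SD)", "t", "p")
print(outtable)
```

Adding another dependent variable then only requires extending dvs; the resulting matrix can be passed to xtable() as before.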
I am parsing the left-hand side of an R formula. In my specific case, this can be a variable or object with an index (something like myvariable[[3]]). I would like to access the third sub-object of this object and store it in another object. The following example starts at the point where I have the string of the indexed object, but I need the reference.
mychars <- c("a", "b", "c")
mystring <- "mychars[2]"
get(mystring) # does not work
eval(as.name(mystring)) # does not work either
I could of course parse the number using regular expressions and use as.numeric to convert it to a real index. But in some cases, there may be named indices, like mychars["second"]. So how can I extract the sub-object?
You can parse and then eval this expression.
mychars <- c("a", "b", "c")
mystring <- "mychars[2]"
eval(parse(text = mystring))
[1] "b"
It works for named indices too
names(mychars) <- c("first", "second", "third")
eval(parse(text = 'mychars["second"]'))
second
"b"
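Because parse() returns an unevaluated call, its pieces can also be inspected before anything is evaluated, which is useful when the string comes from a formula and the object name and the index are needed separately. A minimal sketch; the variable names expr, obj_name, index and result are introduced here for illustration:

```r
mychars <- c("a", "b", "c")
mystring <- "mychars[2]"

# parse() returns an expression vector; its first element is the call mychars[2].
expr <- parse(text = mystring)[[1]]

# A call behaves like a list: function first, then its arguments.
fun_name <- as.character(expr[[1]])  # the subsetting function, "["
obj_name <- as.character(expr[[2]])  # the object being indexed, "mychars"
index    <- expr[[3]]                # the index: a number here, or a string for named indices

# Look the object up by name and apply the index, without evaluating the whole string.
result <- get(obj_name)[index]
print(result)
```

Checking fun_name before evaluating gives a simple way to reject strings that are not plain indexing expressions.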