Relabelling multiple columns label attribute without a for loop - r

I managed to create the label attribute for selected variables in a dataset, but I used a loop. I would like to avoid the loop; can you help me?
Here is a toy example with the iris dataset.
Let's suppose I want to add a label attribute to the "Sepal.Length", "Petal.Width", and "Species" variables. What I did was the following:
1) created a vector with the name of the variables I want to add the attribute to.
varNames <- c("Sepal.Length", "Petal.Width", "Species")
2) created a character vector with the labels I want to add
newLabels <- c("a", "b", "c")
3) Then, created a for loop to assign the attribute labels to the selected variables.
for (i in 1:length(varNames)) {
attributes(iris[[which(names(iris) %in% varNames[i])]])$label <-
newLabels[i]
}
How can I do this without a for loop?

You could do it by using %in% to find the columns you want to tag with "a", "b" and "c", and then appending the appropriate tag to their names.
# Your vector
varNames <- c("Sepal.Length", "Petal.Width", "Species")
# Labels to append
newLabels <- c("a", "b", "c")
# Append the appropriate tag to the matching column names
names(iris)[names(iris) %in% varNames] <- paste(names(iris)[names(iris) %in% varNames], newLabels, sep = ".")
# And output
> names(iris)
[1] "Sepal.Length.a" "Sepal.Width" "Petal.Length" "Petal.Width.b" "Species.c"
UPDATED POST
If you want to change the label attribute of the iris variables, then you can achieve this by using lapply and label (from the Hmisc package) like this
library(Hmisc)  # provides label() and its replacement form
varNames = c(Sepal.Length="a", Petal.Width="b",Species="c")
# Apply to each value of varNames
label(iris[c("Sepal.Length", "Petal.Width", "Species")]) = lapply(names(varNames),
function(x) label(iris[,x]) = varNames[x])
And the output
> attributes(iris$Sepal.Length)$label
Sepal.Length
"a"
> attributes(iris$Petal.Width)$label
Petal.Width
"b"
> attributes(iris$Species)$label
Species
"c"

The following code will not work on built-in datasets such as iris, and you will have to change the data frame name inside the function for every data frame you use it on.
That being said, on an ordinary data frame such as this one:
dta=data.frame(SL=c(1,2,3,4,5),SW=c(6,7,8,9,10),PL=c(11,12,13,14,15),PW=c(16,17,18,19,20),Spe=c("f","g","h","i","j"))
with similar additional information:
varNames <- c("SL", "PW", "Spe")
newLabels <- c("a", "b", "c")
this is a way to do it without a loop:
fu=function(i){
attributes(dta[[which(names(dta) %in% varNames[i])]])$label <<- newLabels[i]
}
mapply(fu,1:length(varNames))
Verify the first label:
> attributes(dta[[1]])$label
[1] "a"
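A base R alternative that avoids both the explicit loop and the <<- assignment is to rebuild the selected columns with Map() and assign them back in one step. This is only a minimal sketch: dat is a hypothetical working copy (so the built-in iris is left untouched) and set_label is a helper made up for illustration.

```r
# Work on a copy so the built-in dataset is not masked (dat is a hypothetical name)
dat <- iris
varNames  <- c("Sepal.Length", "Petal.Width", "Species")
newLabels <- c("a", "b", "c")

# Hypothetical helper: return the column with its "label" attribute set
set_label <- function(column, label) {
  attr(column, "label") <- label
  column
}

# Map() pairs each selected column with its label; the results are assigned back
dat[varNames] <- Map(set_label, dat[varNames], newLabels)

attr(dat$Sepal.Length, "label")  # "a"
attr(dat$Species, "label")       # "c"
```

Because the labels are paired with varNames by position, the two vectors must be in the same order.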

Related

create new dataframes from a master database in R

I have a database of different notifiable diseases.
I want to extract a dataframe for each disease in that database so that I can make an automated report from a template in Rmarkdown.
I created a function for creating the dataframe
NMC is the master database
The database lists all conditions reported
I created a list of those conditions
conditions <- list(unique(NMC$Condition))
I then created a function to create a new dataframe based on the condition
newdf <- function(data, var){
var <- data %>% filter(data$Condition %in% paste0(var))
var
}
Now I want to run my function to create a number of new dataframes from the master database. I thought of doing a for loop:
for (df in conditions){
df <- newdf(NMC, "df")
}
Which runs but doesn't give me anything.
So I found split(), but this hasn't perfectly solved my problem as I still need to type out all the conditions to get each df to apply to the r template.
NMC <- split(NMC, factor(NMC$Condition), drop= FALSE)
# then to get a specific df (which is laborious)
rubella <- NMC$congenitalrubellasyndrome
# How can I get the dataframes per condition into my environment, or access them easily, maybe with a %>% function?
My end goal is to then apply an R template to each data frame so that I have a standard epicurve/descriptive stats for each disease.
Thanks
> df <- data.frame(a = rep(letters[1:10], each = 3), x = 1:30)
> for (i in df$a) {
+ assign(i, df[df$a == i, ])
+ }
> ls()
[1] "a" "b" "c" "d" "df" "e" "f" "g" "h" "i" "j"
> a
a x
1 a 1
2 a 2
3 a 3
But see my comment above.
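Rather than creating one object per condition with assign(), the list returned by split() can be used directly: its names are the conditions, and lapply() runs the same code on every element. A small sketch, assuming a toy stand-in for NMC with a hypothetical Cases column:

```r
# Toy stand-in for the master database (the Cases column is hypothetical)
NMC <- data.frame(
  Condition = c("measles", "rubella", "measles", "cholera"),
  Cases     = c(3, 1, 5, 2)
)

# One data frame per condition, named after the condition
by_condition <- split(NMC, NMC$Condition)
names(by_condition)

# Apply the same summary to every data frame without typing the names out
total_cases <- lapply(by_condition, function(d) sum(d$Cases))

# A single condition can still be pulled out by name when needed
by_condition[["rubella"]]
```

The same pattern (looping over names(by_condition)) could hand each data frame to the report template in turn.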

R function that selects certain columns from a dataframe

I am trying to figure out how to write a function in R that can select specific columns from a dataframe (df) for subsetting:
Essentially I have df with columns or colnames : count_A.x, count_B.x, count_C.x, count_A.y, count_B.y, count_C.y.
I would ideally like a function where I can select both "count_A.x" and "count_A.y" columns by simply specifying count_A in function argument.
I tried the following:
e.g. pull_columns2 <- function(df,count_char){
df_subset<- df%>% select(,c(count_char.x, count_char.y))
}
Unfortunately, when I run the above code [i.e., pull_columns2(df, count_A)], R rightly complains that the column count_char.x does not exist; it does not "convert" count_char.x to count_A.
pull_columns2(df, count_A)
We can use
pull_columns2 <- function(df,count_char){
df_subset<- df %>% select(contains(count_char))
df_subset
}
#> then use it as follows
df %>% pull_columns2("count_A")
Try
select_func = function(df, pattern){
return(df[colnames(df)[which(grepl(pattern, colnames(df)))]])
}
df = data.frame("aaa" = 1:10, "aab" = 1:10, "bb" = 1:10, "ca" = 1:10)
select_func(df,"b")
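Both answers above match by substring, so a pattern such as "count_A" would also pick up a column like count_AB.x. If only the exact .x/.y pair is wanted, the column names can be built with paste0() instead. A minimal sketch; pull_pair and the count_AB.x column are made up for illustration:

```r
# Hypothetical helper: select <prefix>.x and <prefix>.y by exact name
pull_pair <- function(df, prefix, suffixes = c(".x", ".y")) {
  df[paste0(prefix, suffixes)]
}

df <- data.frame(
  count_A.x  = 1:3,
  count_B.x  = 4:6,
  count_A.y  = 7:9,
  count_AB.x = 10:12
)

names(pull_pair(df, "count_A"))  # "count_A.x" "count_A.y"
```

Indexing by exact name also fails loudly if one of the expected columns is missing, which a pattern match would not.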

Unable to create a for loop to convert multiple variables with as.factor

I am trying to write a for loop in r to convert multiple variables with similar character pattern in r to as.factor.
Below is the function I wrote, R runs the code, does not show any error, but does not give the desired output. There is a logical error, can someone help me correct this error?
for (i in grep(pattern = "hml35_", x=tanre))
{
tanre$i<-as.factor(tanre$i)
}
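grep needs to search the column names, not the data frame itself: grep(pattern = "hml35_", x = tanre) most likely finds no matches (unless the cell values happen to contain the pattern), so the loop body never runs, which would explain why there is no error. Once grep returns the indices of the matching columns, you also need to change the syntax inside the loop:
for (i in grep(pattern = "hml35_", x = names(tanre)))
{
tanre[[i]]<-as.factor(tanre[[i]])
}
The $ operator takes the name after it literally, so tanre$i refers to a column named "i" rather than to the column whose index is stored in i.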
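The difference between $ and [[ ]] can be checked on a small example (the data and the column name other are made up for illustration):

```r
tanre <- data.frame(
  hml35_a = c("a", "b", "c"),
  other   = c(1, 2, 3)
)

i <- "hml35_a"

# $ uses the symbol after it literally, so this looks for a column named "i"
is.null(tanre$i)                        # TRUE

# [[ ]] evaluates i first, so this returns the hml35_a column
identical(tanre[[i]], tanre$hml35_a)    # TRUE
```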
Edit: You could use lapply here instead of a loop. I would also have a look at using mutate across.
Edit 2:
As requested, here is how you could do it with lapply:
# Create some data
tanre <- data.frame(
id = c(1:12),
hml35_a = rep(c("a", "b", "c"), 4),
hml35_b = rep(c("a", "b", "c"), 4)
)
sapply(tanre, class)
# id hml35_a hml35_b
# "integer" "character" "character"
# Make factor
tanre[grep("hml35_", names(tanre))] <- lapply(tanre[grep("hml35_", names(tanre))], as.factor)
sapply(tanre, class)
# id hml35_a hml35_b
# "integer" "factor" "factor"
A solution with tidyverse
library(tidyverse)
tanre %>%
mutate_at(vars(contains("hml35_")), as.factor)

How to convert a dataframe in long format into a list of an appropriate format?

I have a dataframe in the following long format:
I need to convert it into a list which should look something like this:
Wherein, each of the main element of the list would be the "Instance No." and its sub-elements should contain all its corresponding Parameter & Value pairs - in the format of "Parameter X" = "abc" as you can see in the second picture, listed one after the other.
Is there any existing function which can do this? I wasn't really able to find any. Any help would be really appreciated.
Thank you.
A dplyr solution
require(dplyr)
df_original <- data.frame("Instance No." = c(3,3,3,3,5,5,5,2,2,2,2),
"Parameter" = c("age", "workclass", "education", "occupation",
"age", "workclass", "education",
"age", "workclass", "education", "income"),
"Value" = c("Senior", "Private", "HS-grad", "Sales",
"Middle-aged", "Gov", "Hs-grad",
"Middle-aged", "Private", "Masters", "Large"),
check.names = FALSE)
# the split function requires a factor to use as the grouping variable.
# Param_Value will be the properly formatted vector
df_modified <- mutate(df_original,
Param_Value = paste0(Parameter, "=", Value))
# drop the parameter and value columns now that the data is contained in Param_Value
df_modified <- select(df_modified,
`Instance No.`,
Param_Value)
# there is now a list containing dataframes with rows grouped by Instance No.
list_format <- split(df_modified,
df_modified$`Instance No.`)
# The Instance No. is still in each dataframe. Loop through each and strip the column.
list_simplified <- lapply(list_format,
select, -`Instance No.`)
# unlist the remaining Param_Value column and drop the names.
list_out <- lapply(list_simplified ,
unlist, use.names = F)
There should now be a list of vectors formatted as requested.
$`2`
[1] "age=Middle-aged" "workclass=Private" "education=Masters" "income=Large"
$`3`
[1] "age=Senior" "workclass=Private" "education=HS-grad" "occupation=Sales"
$`5`
[1] "age=Middle-aged" "workclass=Gov" "education=Hs-grad"
The posted data.table solution is faster, but I think this is a bit more understandable.
require(data.table)
your_dt <- data.table(your_df)
dt_long <- melt.data.table(your_dt, id.vars='Instance No.')
class(dt_long) # for debugging
dt_long[, strVal:=paste(variable,value, sep = '=')]
result_list <- list()
for (i in unique(dt_long[['Instance No.']])){
result_list[[as.character(i)]] <- dt_long[`Instance No.`==i, strVal]
}
Just for reference, here is a base R one-liner to do this. df is your dataframe.
l <- lapply(split(df, df[["Instance No."]]),
function(x) paste0(x$Parameter, "=", x$Value))
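To check the base R approach end to end, here is a self-contained sketch that rebuilds a few rows of the example data from the dplyr answer and splits on the Instance No. column directly:

```r
df <- data.frame(
  "Instance No." = c(3, 3, 5, 5, 2),
  "Parameter"    = c("age", "workclass", "age", "workclass", "age"),
  "Value"        = c("Senior", "Private", "Middle-aged", "Gov", "Middle-aged"),
  check.names = FALSE
)

# Split the rows by instance, then build "Parameter=Value" strings per group
l <- lapply(split(df, df[["Instance No."]]),
            function(x) paste0(x$Parameter, "=", x$Value))

names(l)  # "2" "3" "5"
l[["3"]]  # "age=Senior" "workclass=Private"
```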

Same Column Names in flextable in R

I am trying to create a 'flextable' object with the R package "flextable". I need to use the same column name for more than one column. When I put them in the "col_keys" argument of the function "flextable", I get the error "duplicated col_keys". Is there a way to solve this problem?
a<-c(1:8)
b<-c(1:8)
c<-c(2:9)
d<-c(2:9)
A<-flextable(A,col_keys=c("a","b","a","b"))
This is the example code for which I am getting the error.
As it stands now, flextable does not allow duplicate column keys. However, you can achieve the same result by adding a row of "headers," or a row of column labels, to the top of your table. These headers can contain duplicate values.
You do this with the "add_header_row" function.
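Here is a basic example using the iris data set.
library(flextable)
# Assumed setup (not in the original answer): Species comes first so the empty header value lines up with it
ft <- flextable(head(iris), col_keys = c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
ft <- add_header_row(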
ft,
values = c("", "length", "width", "length", "width"),
top = FALSE
)
ft <- theme_box(ft)
https://davidgohel.github.io/flextable/articles/layout.html
I found a workaround by adding the character \r to the column names to create unique column names:
library(flextable)
A <- matrix(rnorm(8), nrow = 2, ncol = 4)
A <- as.data.frame(A)
col_Names <- c("a","b","a","b")
nb_Col_Names <- length(col_Names)
for(i in 1 : nb_Col_Names)
{
col_Names[i] <- paste0(col_Names[i], paste0(rep("\r", i), collapse = ""), collapse = "")
}
colnames(A) <- col_Names
tbl_A <- flextable(A)
Currently, using set_header_labels:
library(flextable)
a<-c(1:8)
b<-c(1:8)
c<-c(2:9)
d<-c(2:9)
A <- data.frame(a,b,c,d)
flextable(A) |> set_header_labels(`a` = "a", `b` = "b", `c` = "a", `d` = "b")
