Dataframe variable names in R

Very basic question, but something I haven't seen before. My variable names have 'subnames' beneath them (see image link below).
[image: variable names and subnames]
When I call colnames, I just get the main names:
colnames(df)
"Age" "Gender" "AgeGenderQuota" "AgeGenStateQuota"
"Q1_1" "Q2_1" "Q3_1"
Any ideas on how I can call the subnames in the pic above?

These are stored as a column attribute called label. You can access them with the function attr():
An example:
df <- data.frame(
x = structure(10, label = 'This is x'),
y = structure(3, label = 'and this is our y')
)
attr(df$x, 'label')
# [1] "This is x"
And modify:
attr(df$x, 'label') <- 'This is x which is our first column'
And to get all at once:
sapply(df, attr, 'label')
# x y
# "This is x which is our first column" "and this is our y"
To see all the attributes you can use the function attributes():
attributes(df$x)
# $label
# [1] "This is x"
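If some columns carry no label, sapply(df, attr, 'label') returns a list containing NULL entries rather than a character vector. The following is a minimal sketch of one way to handle that case. The helper get_labels() and the extra unlabelled column z are hypothetical and only serve the illustration:

```r
# Example data: x and y carry a label attribute, z does not.
df <- data.frame(
  x = structure(10, label = "This is x"),
  y = structure(3, label = "and this is our y"),
  z = 5
)

# Hypothetical helper (not part of base R): returns one label per column,
# falling back to NA when a column has no label attribute.
get_labels <- function(data) {
  vapply(data, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
}

labels <- get_labels(df)
labels[["x"]]
# [1] "This is x"
```

Because vapply() is told to expect character(1) per column, the result is always a named character vector, which is easier to use for plot or table headers than a list.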

The subnames that you're referencing are the column labels.
To retrieve all the labels, you can use:
library(tidyverse)
colnames(df) %>%
map(~attr(df[[.x]], "label")) %>%
flatten()
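where attr() returns the value of each column's label attribute (a character string, or NULL if the column has no label).
This code loops attr() over all the columns and returns a list of column labels. Because colnames(df) is an unnamed character vector, the list itself is unnamed; mapping over df directly keeps the column names.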
Alternate Solution
If you want an easy one-liner to retrieve the column labels as a vector, check out the tidyverse-friendly package sjlabelled:
library(sjlabelled)
labels <- get_label(df)
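As a variation on the purrr approach, here is a small sketch that maps over the data frame itself, so the column names are carried over to the result. It assumes every column has a label, since map_chr() raises an error if attr() returns NULL for any column. The example data is the same illustrative df used earlier in this thread:

```r
library(purrr)

# Illustrative data with a label attribute on every column.
df <- data.frame(
  x = structure(10, label = "This is x"),
  y = structure(3, label = "and this is our y")
)

# map_chr() iterates over the columns and keeps their names on the result.
labels <- map_chr(df, ~ attr(.x, "label"))
names(labels)
# [1] "x" "y"
```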

Related

How to use variable as element name when creating list in R

Suppose I have
label <- 'My val'
and I try to create the list
Output <- list(
label = pi
)
I get that the name of the first (only) object in the list is "label" but I want "My val".
I understand I can do
names(Output) <- label
But the list is quite long and I'd rather name it in the list function.

No, you can't reference variables for names when using the list() function to create a list. It will just interpret any variable name as name for the entry. But after constructing the list, you can change the names:
label <- 'My val'
Output <- list(pi)
names(Output) <- label
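Since the list in the question is described as long, it may be convenient to name all elements in the same expression that builds the list. A minimal base R sketch using setNames(); the second label and value are made up for illustration:

```r
label <- "My val"

# Build and name the list in one expression.
Output <- setNames(list(pi), label)

# The same idea with several names held in a character vector
# (the second name and value are illustrative only).
labels <- c("My val", "Other val")
Output2 <- setNames(list(pi, exp(1)), labels)

names(Output2)
# [1] "My val"    "Other val"
```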

This may help, though a more detailed sample would have been useful, since more variable names may be involved:
label <- 'My val'
Output <- list(
label = pi
)
Output |>
setNames(label)
# $`My val`
# [1] 3.141593

Another option is lst from dplyr
library(dplyr)
lst(!!label := pi)
#$`My val`
#[1] 3.141593

Beginner using pipes

I am a beginner and I'm trying to find the most efficient way to change the name of the first column for many CSV files that I will be creating. Once I have created the CSV files, I am loading them into R as follows:
data <- read.csv('filename.csv')
I have used the names() function to do the name change of a single file:
names(data)[1] <- 'Y'
However, I would like to find the most efficient way of combining/piping this name change to read.csv so the same name change is applied to every file when they are opened. I tried to write a 'simple' function to do this:
addName <- function(data) {
names(data)[1] <- 'Y'
data
}
However, I do not yet fully understand the syntax for writing a function and I can't get this to work.

Note
If you were expecting your original addName function to "mutate" an existing object like so
x <- data.frame(Column_1 = c(1, 2, 3), Column_2 = c("a", "b", "c"))
# Try (unsuccessfully) to change title of "Column_1" to "Y" in x.
addName(x)
# Print x.
x
please be aware that R passes by value rather than by reference, so x itself would remain unchanged:
Column_1 Column_2
1 1 a
2 2 b
3 3 c
Any "mutation" would be achieved by overwriting x with the return value of the function
x <- addName(x)
# Print x.
x
in which case x itself would obviously be changed:
Y Column_2
1 1 a
2 2 b
3 3 c
Answer
Anyway, here's a solution that compactly incorporates pipes (%>% from the magrittr package) and a custom function. Please note that without the linebreaks and comments, which I have added for clarity, this could be condensed to only a few lines of code.
# The dplyr package helps with easy renaming, and it includes the magrittr pipe.
library(dplyr)
# ...
filenames <- c("filename1.csv", "filename2.csv", "filename3.csv")
# A function to take a CSV filename and give back a renamed dataset taken from that file.
addName <- function(filename) {
  # Read in the named file as a data.frame.
  read.csv(file = filename) %>%
    # Take the resulting data.frame, and rename its first column as "Y";
    # quotes are optional, unless the name contains spaces: "My Column"
    # or `My Column` are needed then.
    dplyr::rename(Y = 1)
}
# Get a list of all the renamed datasets, as taken by addName() from each of the filenames.
all_files <- sapply(filenames, FUN = addName,
# Keep the list structure, in which each element is a
# data.frame.
simplify = FALSE,
# Name each list element by its filename, to help keep track.
USE.NAMES = TRUE)
In fact, you could easily rename any columns you desire, all in one fell swoop:
dplyr::rename(Y = 1, 'X' = 2, "Z" = 3, "Column 4" = 4, `Column 5` = 5)

This will read a vector of filenames, change the name of the first column of each one to "Y" and store all of the files in a list.
filenames <- c("filename1.csv","filename2.csv")
addName <- function(filename) {
data <- read.csv(filename)
names(data)[1] <- 'Y'
data
}
files <- list()
for (i in seq_along(filenames)) {
files[[i]] <- addName(filenames[i])
}
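The same idea can be written without an explicit loop. The sketch below uses lapply() and, so that it runs on its own, first writes two small CSV files to a temporary directory; those files and their contents are made up for the demonstration:

```r
# Create two throwaway CSV files so the example is self-contained.
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
filenames <- file.path(tmp_dir, c("filename1.csv", "filename2.csv"))
for (f in filenames) {
  write.csv(data.frame(a = 1:3, b = letters[1:3]), f, row.names = FALSE)
}

# Read one file and rename its first column to "Y".
addName <- function(filename) {
  data <- read.csv(filename)
  names(data)[1] <- "Y"
  data
}

# Apply the function to every file; the result is a list of data frames.
files <- lapply(filenames, addName)

# Name each list element after its file to help keep track.
names(files) <- basename(filenames)

names(files[["filename1.csv"]])
# [1] "Y" "b"
```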

How to use a List Item in fct_recode

I want to rename factor levels with fct_recode by using items I created beforehand.
I first create some labels and save them into a list:
#Creating the Labels:
LabelsWithN <- c(
sprintf("Man(%s)", FreqGender["Man","Freq"]),
sprintf("Woman(%s)", FreqGender["Woman","Freq"]),
sprintf("Non-Binary(%s)", FreqGender["Non-Binary","Freq"]),
sprintf("Other(%s)", FreqGender["Other","Freq"]),
sprintf("Prefer Not To Disclose(%s)", FreqGender["Prefer not to disclose","Freq"])
)
This creates a chr list with items like "Man(105)", "Woman(51)" etc.
Now I want to relabel the factors in the original DataSet (i.e. "Man" --> "Man(105)") in order to label a graph. I want to use either the list item (i.e., LabelsWithN[1]) or directly the function creating the string (i.e., sprintf("Man(%s)", FreqGender["Man","Freq"]).
I then try to enter either the list item or the function into fct_recode:
#Using the Labels:
DataSet %>%
mutate(`Gender. What_is_your_ge.._` = fct_recode(`Gender. What_is_your_ge.._`, LabelsWithN[1] = "Man", sprintf("Woman(%s)", FreqGender["Woman","Freq"]) = "Woman")) %>%
#THis is just the code for the graph:
ggplot(aes(x = `Gender. What_is_your_ge.._` , y = `Age. How_old_are_you?`, main = "Age Distribution By Gender")) +
geom_boxplot() +
xlab("Gender (n)") +
ylab("Age")
However, this yields:
"Unexpected '=' in:
"DataSet %>%
mutate(`Gender. What_is_your_ge.._` = fct_recode(`Gender. What_is_your_ge.._`, LabelsWithN[1] ="
It doesn't matter if I use the function or the list item.
The vector is a factor and the list is filled with characters. If I manipulate the code to rename the factor "man" to "cat" ("cat" = "Man") the code works fine.
How can I address the list item/enter the function into fct_recode so that it works?
Also, can somebody explain to me what the problem here is? If I print out LabelsWithN[1] I get the correct string printed out.
Thank you and Bw,
Jan

Perhaps this might help:
library(forcats)
x <- factor(c("apple", "bear", "banana", "dear")) # what you want to recode
levels <- c(fruit = "apple", fruit = "banana") # how you want to recode it
x <- fct_recode(x, !!!levels) # recoding it
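As for why the original call fails: in a function call, the left-hand side of = must be a literal argument name, so an expression such as LabelsWithN[1] or a sprintf() call cannot appear there, and the parser stops with "unexpected '='". Splicing a named character vector with !!! avoids this, because the new labels become the names of the vector. A minimal sketch applied to labels with counts, using a small made-up gender factor rather than the original DataSet:

```r
library(forcats)

# Made-up data standing in for the gender column of the original data set.
gender <- factor(c("Man", "Woman", "Man", "Other"))

# Count each level and build labels such as "Man(2)".
counts <- table(gender)
new_labels <- sprintf("%s(%s)", names(counts), as.integer(counts))

# Named character vector: names are the new levels, values the old levels.
recode_map <- setNames(names(counts), new_labels)

# Splice the vector into fct_recode() with !!!.
gender_n <- fct_recode(gender, !!!recode_map)
levels(gender_n)
# [1] "Man(2)"   "Other(1)" "Woman(1)"
```

The recoded factor can then be used inside mutate() before plotting, in place of the hand-written name = value pairs.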

How to name Dataframes inside a List

I have created a basic list called lista (not very imaginative, I know), and inside it there are 10 small dataframes.
Each one of these dataframes is called "numberone", "numbertwo", ..., "numberten".
When I access this list I can't see their names,
although I can see the dataframes in the workspace (RStudio).
Below is the code and my attempts:
#creating multiple dataframes and a list and then give a title to this dataframes inside the list.
lista = list()
names = c("numberone","numbertwo","numberthree","numberfour","numberfive","numbersix","numberseven","numbereight","numbernine","numberten")
for (i in 1:10) {
x = rnorm(10)
df = data.frame(x)
assign(names[i],df)
lista[[i]] = df
}
#trying to change manually the names of the dataframes inside the "lista" list
names(lista[1]) = "number one"
print(names(lista[1])) #this gives no results
#trying using dput
output = dput(lista[1])
##trying put manually the name in front of the dput output to rename the first dataframe inside lista..
list('numberone'= structure(list(x = c(0.750704535096297, 1.16925878942967,
0.806475114411396, 1.00973486249489, -0.301553383694518, 0.546485320708262,
1.03645444095639, 0.247820396853631, -1.64294545886444, -0.216784798035195
)), class = "data.frame", row.names = c(NA, -10L)))
#this seems to have renamed the first dataframe but, it's not working anyway
lista$numberone
print(names(lista[1])) #still no results
I've tried almost everything I could, but I can't give these individual dataframes their names inside the list.
How can I name these dataframes?
Thank you

Try assigning to names() on the list itself.
Here is an example using empty lists:
list_test = vector("list",4)
names(list_test) = c("A","B","C","D")
list_test
$A
NULL
$B
NULL
$C
NULL
$D
NULL
With your example, I did:
names(lista) <- names
and I get:
names(lista)
[1] "numberone" "numbertwo" "numberthree" "numberfour" "numberfive" "numbersix" "numberseven"
[8] "numbereight" "numbernine" "numberten"

I think you might be looking to use double brackets (e.g. [[1]]) to reference elements in your list. Single brackets (lista[1]) return a one-element list, while double brackets return the dataframe itself, so with your example code this renames the column inside the first dataframe:
names(lista[[1]]) = "number one"
print(names(lista[[1]])) # the column of the first dataframe is now called "number one"
You can also use setNames() within Map() to rename the column of each dataframe in your list:
lista <- Map(setNames, lista, names)
lista # the column of each dataframe is now named from your vector called names
To keep your code as clean as possible, it is best to avoid naming objects with the same names as functions. (Your example code uses a vector called "names" but also uses the names() function.)
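To tie the answers together: names(lista[1]) = "..." silently does nothing to the list's names, because the replacement is applied to a one-element sublist and only its values are written back into lista. The sketch below names the list elements themselves. It uses three dataframes instead of ten, and df_names as a variable name to avoid shadowing names(); both choices are for illustration only:

```r
df_names <- c("numberone", "numbertwo", "numberthree")

# Build the list of small dataframes without assign().
lista <- lapply(df_names, function(nm) data.frame(x = rnorm(10)))

# Name all list elements at once.
names(lista) <- df_names

# Elements can now be reached by name.
nrow(lista$numbertwo)
# [1] 10

# Rename a single element: index the names vector, not the list.
names(lista)[1] <- "number one"
names(lista)
# [1] "number one"  "numbertwo"   "numberthree"
```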

R - How to replace a string from multiple matches (in a data frame)

I need to replace a subset of a string with some matches that are stored within a dataframe.
For example -
input_string = "Whats your name and Where're you from"
I need to replace part of this string from a data frame. Say the data frame is
matching <- data.frame(from_word=c("Whats your name", "name", "fro"),
to_word=c("what is your name","names","froth"))
Output expected is what is your name and Where're you from
Note -
It is to match the maximum string. In this example, name is not matched to names, because name was a part of a bigger match
It has to match whole string and not partial strings. fro of "from" should not match as "froth"
I referred to the link below but could not get this to work as described above:
Match and replace multiple strings in a vector of text without looping in R
This is my first post here. If I haven't given enough details, kindly let me know.

Edit
Based on the input from Sri's comment I would suggest using:
library(gsubfn)
# words to be replaced
a <-c("Whats your","Whats your name", "name", "fro")
# their replacements
b <- c("What is yours","what is your name","names","froth")
# named list as an input for gsubfn
replacements <- setNames(as.list(b), a)
# the test string
input_string = "fro Whats your name and Where're name you from to and fro I Whats your"
# match entire words
gsubfn(paste(paste0("\\w*", names(replacements), "\\w*"), collapse = "|"), replacements, input_string)
Original
I would not say this is easier to read than your simple loop, but it might take better care of the overlapping replacements:
# define the sample dataset
input_string = "Whats your name and Where're you from"
matching <- data.frame(from_word=c("Whats your name", "name", "fro", "Where're", "Whats"),
to_word=c("what is your name","names","froth", "where are", "Whatsup"))
# load used libraries (str_split() below comes from stringr)
library(gsubfn)
library(stringr)
# make sure data is of class character
matching$from_word <- as.character(matching$from_word)
matching$to_word <- as.character(matching$to_word)
# extract the words in the sentence
test <- unlist(str_split(input_string, " "))
# find where individual words from sentence match with the list of replaceable words
test2 <- sapply(paste0("\\b", test, "\\b"), grepl, matching$from_word)
# change rownames to see what is the format of output from the above sapply
rownames(test2) <- matching$from_word
# reorder the data so that largest replacement blocks are at the top
test3 <- test2[order(rowSums(test2), decreasing = TRUE),]
# where the word is already being replaced by larger chunk, do not replace again
test3[apply(test3, 2, cumsum) > 1] <- FALSE
# define the actual pairs of replacement
replacements <- setNames(as.list(as.character(matching[,2])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1]),
as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1])
# perform the replacement
gsubfn(paste(as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1], collapse = "|"),
replacements,input_string)

toreplace =list("x1" = "y1","x2" = "y2", ..., "xn" = "yn")
Each entry of the list has two parts, xi and yi:
xi is the pattern (find what), yi is the replacement (replace with).
input_string = "Whats your name and Where're you from"
toreplace<-list("Whats your name" = "what is your name", "names" = "name", "fro" = "froth")
gsubfn(paste(names(toreplace),collapse="|"),toreplace,input_string)

I was trying out different things, and the code below seems to work.
a <-c("Whats your name", "name", "fro")
b <- c("what is your name","names","froth")
c <- c("Whats your name and Where're you from")
for(i in seq_along(a)) c <- gsub(paste0('\\<',a[i],'\\>'), gsub(" ","_",b[i]), c)
c <- gsub("_"," ",c)
c
I took help from the linked question "Making gsub only replace entire words?".
However, I would like to avoid the loop if possible. Can someone please improve this answer so that it works without the loop?
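One possible loop-free variant in base R, offered as a sketch rather than a drop-in answer: sort the patterns longest first, join them into a single alternation wrapped in word boundaries, and replace all matches in one pass with regmatches<-. It swaps gsubfn for base R and assumes the patterns contain no regex metacharacters (they would need escaping first). The names lookup and result are introduced here for illustration:

```r
from_word <- c("Whats your name", "name", "fro")
to_word <- c("what is your name", "names", "froth")
input_string <- "Whats your name and Where're you from"

# Longest patterns first, so a longer match wins over a shorter one it contains.
ord <- order(nchar(from_word), decreasing = TRUE)
lookup <- setNames(to_word[ord], from_word[ord])

# One alternation wrapped in word boundaries, so "fro" cannot match inside "from".
pattern <- paste0("\\b(", paste(names(lookup), collapse = "|"), ")\\b")

# Locate all matches, then swap each one for its replacement in a single pass.
result <- input_string
m <- gregexpr(pattern, result, perl = TRUE)
regmatches(result, m) <- lapply(regmatches(result, m),
                                function(hit) unname(lookup[hit]))
result
# [1] "what is your name and Where're you from"
```

With perl = TRUE the alternation is tried left to right, which is why the longest-first ordering matters. Once "Whats your name" has been consumed, the shorter pattern "name" no longer gets a chance to match inside it.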
