Variable as a column name in data frame - r

Is there any way to use a string stored in a variable as a column name in a new data frame? The expected result should be:
col.name <- 'col1'
df <- data.frame(col.name=1:4)
print(df)
# Actual output
col.name
1 1
2 2
3 3
4 4
# Expected output
col1
1 1
2 2
3 3
4 4
I'm aware that I can create the data frame and then use names() to rename the column, or use df[, col.name] on an existing object, but I'd like to know if there is another solution that works while creating the data frame.

You cannot pass a variable into the name of an argument like that: data.frame() takes the argument name col.name literally.
Instead what you can do is:
df <- data.frame(placeholder_name = 1:4)
names(df)[names(df) == "placeholder_name"] <- col.name
or use the default name that data.frame() generates from the expression, which for 1:4 is "X1.4":
df <- data.frame(1:4)
names(df)[names(df) == "X1.4"] <- col.name
or assign by position:
df <- data.frame(1:4)
names(df)[1] <- col.name
or if you only have one column just replace the entire names attribute:
df <- data.frame(1:4)
names(df) <- col.name
There's also the set_names function in the magrittr package that you can use to do this last solution in one step:
library(magrittr)
df <- set_names(data.frame(1:4), col.name)
But set_names is just an alias for:
df <- `names<-`(data.frame(1:4), col.name)
which is part of base R. Figuring out why this expression works and makes sense will be a good exercise.
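A sketch of that exercise, using only base R: R evaluates names(x) <- v as x <- `names<-`(x, value = v), so calling the replacement function directly returns a renamed copy. stats::setNames(), which is attached by default, wraps the same step, so magrittr is not strictly needed for this:

```r
col.name <- "col1"

# names(df) <- value is evaluated as df <- `names<-`(df, value = value),
# so calling the replacement function directly returns a renamed copy
df1 <- `names<-`(data.frame(1:4), col.name)

# stats::setNames() does names(object) <- nm and returns the object
df2 <- setNames(data.frame(1:4), col.name)

print(df1)
```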

In addition to ssdecontrol's answer, there is a second option.
You're looking for mget(). First assign the column name to a variable, then assign the values to a variable with that name. mget() then looks up the object named by the string and returns a named list, and data.frame() uses those names for the columns.
assign("col.name", "col1")
assign(col.name, 1:4)
df <- data.frame(mget(col.name))
print(df)
col1
1 1
2 2
3 3
4 4
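A variation on the same idea, assuming you would rather not create col1 in your global workspace: assign the values into a throwaway environment (e is just an illustrative name) and point mget() at it.

```r
col.name <- "col1"

# A separate environment keeps the helper variable out of the workspace
e <- new.env()
assign(col.name, 1:4, envir = e)

# mget() returns a named list (here list(col1 = 1:4)); data.frame() keeps the names
df <- data.frame(mget(col.name, envir = e))
print(df)
```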

I don't recommend you do this, but:
col.name <- 'col1'
eval(parse(text=paste0('data.frame(', col.name, '=1:4)')))
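If you do want to build the call from a string, a sketch that avoids parsing text is do.call() with a named argument list; it should be equivalent to writing data.frame(col1 = 1:4) by hand:

```r
col.name <- "col1"

# setNames() gives the argument list the desired name; do.call() then
# calls data.frame() with that list as its (named) arguments
df <- do.call(data.frame, setNames(list(1:4), col.name))
print(df)
```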

Related

Extract and append data to new datasets in a for loop

I have what I think is a really simple question, but I can't figure out how to do it. I'm fairly new to lists, loops, etc.
I have a small dataset:
df <- c("one","two","three","four")
df <- as.data.frame(df)
df
I need to loop through this dataset and create a list of datasets, such that this is the outcome:
[[1]]
one
[[2]]
one
two
[[3]]
one
two
three
This is more or less as far as I've gotten:
blah <- list()
for(i in 1:3){
blah[[i]]<- i
}
The length will be variable when I use this in the future, so I need to automate it in a loop. Otherwise, I would just do
one <- df[1,]
two <- df[2,]
list(one, rbind(one, two))
Any ideas?
You can try using lapply:
result <- lapply(seq(nrow(df)), function(x) df[seq_len(x), , drop = FALSE])
result
#[[1]]
# df
#1 one
#[[2]]
# df
#1 one
#2 two
#[[3]]
# df
#1 one
#2 two
#3 three
#[[4]]
# df
#1 one
#2 two
#3 three
#4 four
seq(nrow(df)) creates a sequence from 1 to the number of rows in your data (4 in this case). The function(x) part is an anonymous function to which each value from 1 to 4 is passed, one at a time. seq_len(x) creates a sequence from 1 to x, i.e. 1 to 1 in the first iteration, 1 to 2 in the second, and so on. We use this sequence to subset the rows of the data frame (df[seq_len(x), ]). Because the data frame has only one column, subsetting it this way would simplify the result to a vector; adding drop = FALSE prevents that.
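A small sketch of the drop = FALSE point, assuming stringsAsFactors = FALSE so the column is character in any R version. It also shows why seq_len() is the safer generator: seq(nrow(df)) on a zero-row data frame gives c(1, 0) rather than an empty sequence.

```r
df <- data.frame(df = c("one", "two", "three", "four"), stringsAsFactors = FALSE)

as_vector <- df[1:2, ]                 # one column, default drop = TRUE: a vector
as_frame  <- df[1:2, , drop = FALSE]   # stays a data.frame

result <- lapply(seq_len(nrow(df)), function(x) df[seq_len(x), , drop = FALSE])

empty <- df[0, , drop = FALSE]
bad_seq  <- seq(nrow(empty))      # 1:0, i.e. c(1, 0)
good_seq <- seq_len(nrow(empty))  # integer(0), so lapply() would return list()
```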
Base R solution:
# Coerce the df column to character (it is a factor in R < 4.0): str_df => data.frame
str_df <- transform(df, df = as.character(df))
# Split the character version of the data.frame into a list as required:
# element i holds the first i values of the column
# df_list => list of character vectors
df_list <- lapply(seq_len(nrow(str_df)), function(i){
str_df[seq_len(i), "df"]
})
Data:
df <- c("one","two","three","four")
df <- as.data.frame(df)
df
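Another base R option, swapping in Reduce() with accumulate = TRUE instead of indexing, builds the same list of growing character vectors (a sketch, not what the answer above uses):

```r
df <- data.frame(df = c("one", "two", "three", "four"), stringsAsFactors = FALSE)

# accumulate = TRUE keeps every intermediate result:
# "one", c("one", "two"), c("one", "two", "three"), ...
df_list <- Reduce(function(prev, x) c(prev, x), df$df, accumulate = TRUE)

print(df_list)
```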

Change dataframe column names while referencing the dataframe using a string holding its name

I want to change the column names of a dataframe in R, while using a string holding the dataframe name to reference it. However, my attempt fails:
> dataframe <- data.frame(c(1,2), c(3,4))
> dfname <- "dataframe"
> colnames(get(dfname))
[1] "c.1..2." "c.3..4."
> colnames(get(dfname)) <- c("col1", "col2")
Error in colnames(get(dfname)) <- c("col1", "col2"):
could not find function "get<-"
How can I get this example to work and change the column names of dataframe while using only dfname?
Try this:
eval(substitute(x <- setNames(x,c("col1", "col2")),list(x=as.name(dfname))))
dataframe
# col1 col2
# 1 1 3
# 2 2 4
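A possibly more readable alternative, assuming the data frame lives in the environment you are calling from: fetch a copy with get(), rename it, and write it back with assign(). Inside a function you would need to pass the target environment through assign()'s envir argument.

```r
dataframe <- data.frame(c(1, 2), c(3, 4))
dfname <- "dataframe"

# get() returns a copy of the object named by the string
tmp <- get(dfname)
colnames(tmp) <- c("col1", "col2")
# assign() stores the modified copy back under the same name
assign(dfname, tmp)

print(dataframe)
```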

dplyr rename_ produces error [duplicate]

dplyr's rename functions require the new column name to be passed in as unquoted variable names. However I have a function where the column name is constructed by pasting a string onto an argument passed in and so is a character string.
For example say I had this function
myFunc <- function(df, col){
new <- paste0(col, '_1')
out <- dplyr::rename(df, new = old)
return(out)
}
If I run this
df <- data.frame(a = 1:3, old = 4:6)
myFunc(df, 'x')
I get
a new
1 1 4
2 2 5
3 3 6
Whereas I want the 'new' column to be the name of the string I constructed ('x_1'), i.e.
a x_1
1 1 4
2 2 5
3 3 6
Is there any way of doing this?
I think this is what you were looking for. It uses rename_(), as @Henrik suggested, but the argument has a, let's say, interesting name (.dots). Note that rename_() and the other underscore verbs have been deprecated since dplyr 0.7.0.
> myFunc <- function(df, col){
+ new <- paste0(col, '_1')
+ out <- dplyr::rename_(df, .dots=setNames(list(col), new))
+ return(out)
+ }
> myFunc(data.frame(x=c(1,2,3)), "x")
x_1
1 1
2 2
3 3
>
Note the use of setNames to use the value of new as name in the list.
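Since rename_() is deprecated, here is a sketch of the same function with current tidy evaluation, assuming dplyr >= 0.7.0: !! injects the string held in new as the name, and := allows that injected name on the left-hand side.

```r
library(dplyr)

myFunc <- function(df, col) {
  new <- paste0(col, "_1")
  # !!new uses the value of `new` ("x_1") as the new column name
  rename(df, !!new := old)
}

out <- myFunc(data.frame(a = 1:3, old = 4:6), "x")
print(out)
```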
dplyr 1.0.0 added the rename_with() function, which applies a function to the names of the selected columns.
Say you have a data frame:
library(tidyverse)
df <- tibble(V0 = runif(10), V1 = runif(10), V2 = runif(10), key=letters[1:10])
And you want to change all of the "V" columns. Usually my mapping for columns like this comes from a JSON file, which in R becomes a named list or vector, e.g.:
colmapping <- c("newcol1", "newcol2", "newcol3")
names(colmapping) <- paste0("V",0:2)
You can then use the following to change the names of df to the strings in colmapping:
df <- rename_with(.data = df, .cols = starts_with("V"), .fn = function(x){colmapping[x]})
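For comparison, a base R sketch of the same renaming with a named vector, assuming every name in colmapping exists in df: match() locates the old names and the new ones are written into those positions. A small data.frame stands in for the tibble here.

```r
# A small stand-in for the tibble above (values are random)
df <- data.frame(V0 = runif(3), V1 = runif(3), V2 = runif(3), key = letters[1:3])
colmapping <- c(V0 = "newcol1", V1 = "newcol2", V2 = "newcol3")

# Positions of the old names in df (assumed all present, so no NA)
idx <- match(names(colmapping), names(df))
# Overwrite those names; unname() keeps names(df) a plain character vector
names(df)[idx] <- unname(colmapping)

print(names(df))
```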

Reorder data.frame and rewrite rownames?

I have a data.frame like this:
id<-c("001-020", "001-010", "001-051")
name<-c("Fred", "Sue", "Liam")
df<-data.frame(id, name)
I tried:
df[with(df, order(id)), ]
# id name
# 2 001-010 Sue
# 1 001-020 Fred
# 3 001-051 Liam
which orders the data.frame correctly, but doesn't touch the rownames.
How may I reorder the data.frame using the ascending order of the id field and rewrite the rownames in one go?
You could try
newdf <- df[with(df, order(id)), ]
row.names(newdf) <- NULL
Or it can be done in a single step
newdf <- `row.names<-`(df[with(df,order(id)),], NULL)
Setting row.names to NULL will also work when you have an empty data.frame.
d1 <- data.frame()
row.names(d1) <- NULL
d1
#data frame with 0 columns and 0 rows
If we do the same with 1:nrow
row.names(d1) <-1:nrow(d1)
#Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 0L)) :
#invalid 'row.names' length
Or another option is data.table (note that setDT() converts df to a data.table by reference, modifying it in place):
library(data.table)#v1.9.4+
setorder(setDT(df), id)[]
Or
setDT(df)[order(id)]
Or using sqldf
library(sqldf)
sqldf('select * from df
order by id')
You can simply assign new rownames:
df2 <- df[with(df, order(id)), ]
rownames(df2) <- 1:nrow(df2)
And a cleaner solution with magrittr:
library(magrittr)
df %>% extract(order(df$id), ) %>% set_rownames(1:nrow(df))
I am surprised it's not in the previous answers.
What you are looking for is arrange from plyr:
library(plyr)
arrange(df, id)
# id name
#1 001-010 Sue
#2 001-020 Fred
#3 001-051 Liam
Since row names are stored as an attribute on the object, perhaps structure() would be appropriate here:
structure(df[order(df$id),], row.names = rownames(df))
## id name
## 1 001-010 Sue
## 2 001-020 Fred
## 3 001-051 Liam

Unique values in each of the columns of a data frame

I want to get the number of unique values in each of the columns of a data frame.
Let's say I have the following data frame:
DF <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))
then it should return that there are 3 distinct values for v1, and 2 for v2.
I tried unique(DF), but it does not work: on a data frame, unique() returns the distinct rows, and every row here is different.
Using unique() on each column:
rapply(DF,function(x)length(unique(x)))
v1 v2
3 2
sapply(DF, function(x) length(unique(x)))
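If you want the result type guaranteed, vapply() is a stricter variant of the sapply() call above; a sketch:

```r
DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# FUN.VALUE = integer(1): each column must yield exactly one integer,
# otherwise vapply() stops with an error instead of returning a list
counts <- vapply(DF, function(x) length(unique(x)), integer(1))
print(counts)
```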
In dplyr (note that funs() has been deprecated since dplyr 0.8.0):
DF %>% summarise_all(funs(n_distinct(.)))
Here's one approach:
> lapply(DF, function(x) length(table(x)))
$v1
[1] 3
$v2
[1] 2
This basically tabulates the unique values per column; taking length() of that tells you the number. Removing length() shows the actual table of unique values.
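One difference to be aware of with this approach: table() drops NA by default, while unique() keeps it, so the two counts can disagree. A quick sketch on a hypothetical vector x:

```r
x <- c(1, 2, NA, 2)

n_unique   <- length(unique(x))                   # 1, 2, NA -> 3
n_table    <- length(table(x))                    # NA dropped -> 2
n_table_na <- length(table(x, useNA = "ifany"))   # NA kept -> 3
```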
For the sake of completeness: Since CRAN version 1.9.6 of 19 Sep 2015, the data.table package includes the helper function uniqueN() which saves us from writing
function(x) length(unique(x))
when calling one of the siblings of apply():
sapply(DF, data.table::uniqueN)
v1 v2
3 2
Note that uniqueN() can be used here without loading the data.table package and without coercing DF to a data.table.
In dplyr (>= 1.0.0, June 2020):
DF %>% summarize_all(n_distinct)
v1 v2
1 3 2
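In dplyr >= 1.0.0 the _all() variants are superseded by across(); a sketch of the equivalent call:

```r
library(dplyr)

DF <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# across() applies n_distinct() to every selected column
counts <- DF %>% summarise(across(everything(), n_distinct))
print(counts)
```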
I think a function like this would give you what you are looking for. It shows the number of unique values as well as how many NAs there are in each column of the data frame. Simply plug in your data frame and you are good to go.
totaluniquevals <- function(df) {
# Number of distinct values (NA counts as one) and number of NAs per column
result <- sapply(df, function(col) length(unique(col)))
isnatotals <- sapply(df, function(col) sum(is.na(col)))
# Return a new data frame; nothing is written to the global environment
data.frame(Row.Name = names(result), TotalUnique = result,
IsNA = isnatotals, row.names = NULL)
}
Test:
DF <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))
totaluniquevals(DF)
Row.Name TotalUnique IsNA
1 v1 3 0
2 v2 2 0
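You can then use unique() on any column to see the specific unique values. (The Levels line below is what R < 4.0 prints, because data.frame() converted strings to factors by default; R >= 4.0 prints [1] "a" "b".)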
unique(DF$v2)
[1] a b
Levels: a b
This gives the number of unique values for a single variable:
length(unique(datasetname$variablename))
This gives the unique values in column 1 of the data frame DF:
unique(DF[,1])
