How to assign the column name to the variable dynamically - r

I am currently developing an application and I need to loop through the columns of the data frame. For instance, if the data frame has the columns
char_set <- data.frame(character(),character(),character(),character(),stringsAsFactors = FALSE)
names(char_set) <- c("a","b","c","d")
If the input is given as "a", then the column name "b" should be assigned to the variable, say promote.
It throws the error: Error in `[.data.frame`(char_set, i + 1) : undefined columns selected. Is there any solution?
char_name <- "a"
char_set <- data.frame(character(),character(),character(),character(),stringsAsFactors = FALSE)
names(char_set) <- c("a","b","c","d")
for (i in 1:ncol(char_set)) {
promote <- ifelse(names(char_set) == char_name,char_set[i+1], "-")
print(promote)
}
Thanks in advance!!!

This is actually quite interesting. I would suggest doing something along these lines:
char_name <- "a"
char_set <- data.frame(
a = 1:2,
b = 3:4,
c = 5:6,
d = 8:9,
stringsAsFactors = FALSE
)
res_dta <- data.frame(matrix(nrow = 2, ncol = 3))
for (i in wrapr::seqi(1, NCOL(char_set) - 1)) {
print(i)
if (names(char_set)[i] == char_name) {
res_dta[i] <- char_set[i + 1]
} else {
res_dta[i] <- char_set[i]
}
}
Results
char_set
a b c d
1 1 3 5 8
2 2 4 6 9
res_dta
X1 X2 X3
1 3 3 5
2 4 4 6
A few general points:
When you loop through columns, be careful not to fall outside the data frame's dimensions; evaluating i + 1 at i = 4 asks for column 5, which returns an error for a data frame with four columns. You can either stop the loop one column early or break at a specific value of i.
I am not sure I understood your request correctly: for the column named a you want to take the values of column b, and column b then stays as it was?
Broadly speaking, I think the names(char_set)[i] == char_name comparison needs more thought, but this answer gives you a start. Updating your post with the desired results would help in designing a solution.

The problem in your code is that you loop from 1 to the number of columns of the char_set data frame and index char_set[i+1] on every iteration.
Thus, when i reaches its maximum value, char_set[i+1] returns an error because there is no column with that index.
You can try this solution:
char_name <- "a"
promote <- ifelse((which(names(char_set) == char_name) + 1) <= ncol(char_set), names(char_set)[which(names(char_set) == char_name) + 1], "-")
promote
> [1] "b"
char_name <- "d"
promote <- ifelse((which(names(char_set) == char_name) + 1) <= ncol(char_set), names(char_set)[which(names(char_set) == char_name) + 1], "-")
promote
> [1] "-"
Here, when char_name takes the value a, promote takes the name of the column that comes right after a in char_set.
I suggest you think about the case in which char_name takes the value d, where there is no column in char_set after d; the code above falls back to "-" in that case.
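The same lookup can also be wrapped in a small helper. This is only a sketch: the function name next_col_name and its default argument are made up for illustration, and it assumes the column names are unique.

```r
# Hypothetical helper: return the name of the column that follows `col`,
# or `default` when `col` is the last column or is not present at all.
next_col_name <- function(df, col, default = "-") {
  pos <- match(col, names(df))            # position of the first match, NA if absent
  if (is.na(pos) || pos == ncol(df)) {
    return(default)
  }
  names(df)[pos + 1]
}

char_set <- data.frame(a = character(), b = character(),
                       c = character(), d = character(),
                       stringsAsFactors = FALSE)

promote <- next_col_name(char_set, "a")
print(promote)
```

Because the bounds check happens before indexing, the "undefined columns selected" error cannot occur here, and an unknown name is treated the same way as the last column.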

Related

Reference vector from data frame using custom function

I'm trying to call a vector "a" from a data frame "df" using a function. I know I could do this just fine with the following:
> df$a
[1] 1 2 3
But I'd like to use a function where both the data frame and vector names are input separately as arguments. This is the best that I've come up with:
show_vector <- function(data.set, column) {
data.set$column
}
But here's how it goes when I try it out:
> show_vector(df, a)
NULL
How could I change this function in order to successfully reference vector df$a where the names of both are input to a function as arguments?
It's actually possible to do this without passing the column name as a string (in other words, you can pass in the unquoted column name):
show_vector <- function(data.set, column) {
eval(substitute(column), envir = data.set)
}
Usage example:
df <- data.frame(a = 1:3, b = 4:6)
show_vector(df, b)
# 4 5 6
I've wondered about this kind of thing a lot in the past and haven't found an easy fix. The best I've come up with is this:
df <- data.frame(c(1, 2, 3), c(4, 5, 6))
colnames(df) <- c("A", "B")
test <- function(dataframe, columnName) {
return(dataframe[, match(columnName, colnames(dataframe))])
}
test(df, "A")
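Your code would work if you used [[ ]] instead of $ and put the column name in quotes, i.e. data.set[[column]] called as show_vector(df, "a"). With data.set$column, R looks for a column literally named column, so quoting alone is not enough.
Other ways to do this:
Using base functionality
func <- function(df, cname){
# grep() treats cname as a pattern, so "a" would also match a column named "ab"
return(df[, grep(cname, colnames(df))])
}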
Or even
func <- function(df, cname){
return(df[, cname])
}
You can use substitute to capture the input vector name as it is, then use as.character to turn it into a character string.
show_vector <- function(data.set, column) {
data.set[,as.character(substitute(column))]
}
Now let's take a look:
(dat=data.frame(a=1:3,b=4:6,c=10:12))
a b c
1 1 4 10
2 2 5 11
3 3 6 12
show_vector(dat,a)
[1] 1 2 3
show_vector(dat,"a")
[1] 1 2 3
It works.
We can also write a simpler one where we just input a character string:
show_vector1 <- function(data.set, column) {
data.set[,column]
}
show_vector1(dat,"a")
[1] 1 2 3
However, this will not work if the column name is not passed as a character string:
show_vector1(dat,a)
Error in `[.data.frame`(data.set, , column) : undefined columns selected
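For completeness, here is a minimal sketch of the [[ ]] variant. The name show_vector2 is made up for illustration. The original function returns NULL because $ does not evaluate its right-hand side, whereas [[ ]] does.

```r
# Hypothetical variant: [[ ]] evaluates `column`, so a string (or a variable
# holding a string) selects the column of that name.
show_vector2 <- function(data.set, column) {
  data.set[[column]]
}

df <- data.frame(a = 1:3, b = 4:6)

print(show_vector2(df, "a"))

# The column name can also come from a variable
wanted <- "b"
print(show_vector2(df, wanted))
```

Unlike df[, column], [[ ]] returns NULL rather than an error when the name does not exist, so you may want to check for that in real code.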

R : Track Changes Across two columns

I have a data frame which records the changes in the name of companies. A simple representation would be :
df <- data.frame(key = c("A", "B","C", "E","F","G"), Change = c("B", "C","D" ,"F","G","H"))
print(df)
key Change
1 A B
2 B C
3 C D
4 E F
5 F G
6 G H
I want to track all the changes a value is going through. Here is an output that can help me do so:
Key 1st 2nd 3rd 4th
1 A B C D
2 E F G H
How can I do it in R? I am new to R and Programming. It would be great to get help.
The question was marked as a duplicate of How to reshape data from long to wide format?
However, it is not an exact duplicate, for these reasons:
1. The example used here contains data changing across columns. That is not the case in the reshaping question. Here, the two columns depend on each other.
2. Before reshaping, I reckon there is another step: maybe assigning an id to each chain of changes. I am not sure how to do it.
Could you help me?
Can we assume that the same name never reappears (that is, chains like A->B->C and D->E->A never occur)? If so, you can do the following.
df <- data.frame(key = c("A","B","C", "E","F","G"),
Change = c("B","C","D" ,"F","G","H"))
print(df)
# mapping from old to new name
next_name <- as.character(df$Change)
names(next_name) <- df$key
all_names <- unique(c(as.character(df$key), as.character(df$Change)))
get_id <- function(x) {
# for each name, repeatedly traverse until the final name
ss <- x %in% names(next_name)
if (any(ss)) {
x[ss] <- get_id(next_name[x[ss]])
}
x
}
ids <- get_id(all_names)
lapply(unique(ids), function(i) c(all_names[ids==i]))
# outcome is a list of company names,
# each entry represents a history of a firm
##[[1]]
##[1] "A" "B" "C" "D"
##[[2]]
##[1] "E" "F" "G" "H"
The outcome is a list, not a data frame, since the name sequences may not all have the same length (firms may have different numbers of names).
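If a wide data frame like the one in the question is still wanted, the list can be padded with NA to a common length. This is only a sketch: the third history and the change_ column names are made up to show the padding.

```r
# Histories as produced by the lapply() call above, plus one shorter,
# hypothetical history to show the padding
histories <- list(c("A", "B", "C", "D"),
                  c("E", "F", "G", "H"),
                  c("X", "Y"))

# Pad every history with NA so they all have the same length
max_len <- max(lengths(histories))
padded <- lapply(histories, function(h) {
  c(h, rep(NA_character_, max_len - length(h)))
})

# One row per firm, one column per name in its history
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- c("Key", paste0("change_", seq_len(max_len - 1)))
print(wide)
```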

Dynamic merge in R

I have an example filter table as below and a big source data table. I need to do the merge using these two tables. If no column in the filter table contains ALL, use three columns to do the merge (using Tran=1001, Acct=1 & Co=a to do the inner join with the data table). If one of them, i.e. Tran, has ALL, use the remaining two columns to do the merge (using Acct=3 & Co=c to do the join). If two of them, i.e. Tran and Acct, have ALL, use the remaining one column to do the merge (using Co=b to do the join).
The real question is the number of columns is uncertain.
Can anyone help me with this?
Tran Acct Co
1001 1 a
1002 ALL ALL
ALL ALL b
ALL 4 ALL
1003 2 ALL
ALL 3 c
1004 ALL d
You're going to have to write a series of conditional statements using if, else if and else. I'll use the %in% operator to check for this. The %in% operator returns a vector of boolean values. The easiest way to explain it is through an example:
> x <- c(1, 2, 3, 4, 5)
> y <- c(2, 3, 4, 5, 6)
> x %in% y
[1] FALSE TRUE TRUE TRUE TRUE
Notice that it returns FALSE for the first value as the value of 1 in x is not in y. You can do the same for the "ALL" value in your data set. I assume you are going row by row as you seemed to imply in your question. Let me know if you need to check the whole column first (you can use the any function for that case). Here is an example of your first condition:
# Assume that df is your data.frame of data.
for (i in seq_len(nrow(df))) {
if (!("ALL" %in% df$Tran[i]) && !("ALL" %in% df$Acct[i]) && !("ALL" %in% df$Co[i])) {
# Do your merge here
}
if ( [Put your next condition here] ) {
# Do the appropriate merge for that condition
}
...
}
Note that I used the "!" operator to get the inverse of whatever %in% returns because you want the case where ALL is NOT in the row. I realize now that you could have just done "ALL" != df$Tran[i] since you are going row by row, but %in% might be more useful if you end up checking the whole column.
Hope this helps!
Editing in a new method now that it's more clear what the need is. So we have to find the number of "ALL" values in each row and then merge a certain way depending on the number of them. There are a lot of methods, but here's one I like:
> test <- data.frame(a = "ALL", b = 2, c = "ALL", d = 3, e = "ALL")
> test
a b c d e
1 ALL 2 ALL 3 ALL
> table(test[1, ] == "ALL")["TRUE"]
TRUE
3
Basically, I'm looking at the first row and counting the values that return TRUE when compared with the string "ALL". From here you can set conditionals on this number. To automate this over the entire data frame, put it in a for loop and replace the 1 with i or whatever your loop variable is.
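As a loop-free alternative, the count can be computed for every row at once with rowSums. This is a sketch on a made-up three-row filter table, assuming all columns are character. Note that table(...)["TRUE"] gives NA for a row with no "ALL" at all, while a sum gives 0.

```r
# Hypothetical filter table with character columns
filter_tbl <- data.frame(
  Tran = c("1001", "1002", "ALL"),
  Acct = c("1", "ALL", "ALL"),
  Co   = c("a", "ALL", "b"),
  stringsAsFactors = FALSE
)

# filter_tbl == "ALL" is a logical matrix; rowSums counts the TRUEs per row
n_all <- rowSums(filter_tbl == "ALL")
print(n_all)
```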
To find which columns contain "ALL" in a given row (and, conversely, which columns do not), you can use grep on each row. Here's a short example:
> # Initializing a sample data frame.
> df <- data.frame(a = "1", b = "ALL", c = "ALL", d = "5", e = "ALL")
> print(df)
a b c d e
1 1 ALL ALL 5 ALL
>
> # Finding the column numbers that have "ALL" in it using grep.
> places <- grep("ALL", df[1, ])
> print(places)
[1] 2 3 5
>
> # Each number corresponds to the order of the columns in the data frame and can be returned as such.
> nameCols <- names(df)[places]
> print(nameCols)
[1] "b" "c" "e"
>
> # Likewise, you can find what columns did not have "ALL" in it by doing the opposite.
> nameColsNOT <- names(df)[-places]
> print(nameColsNOT)
[1] "a" "d"
Iterate this method through a loop for each row in your data frame and use the conditional method I outlined above. Please note that this requires your columns to all be of "character" class, which I assume is the case already.
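Putting the pieces together, one possible way to do the row-by-row merge is sketched below. Everything here is an assumption for illustration: the table names filter_tbl and source_tbl, the value column, and the sample data are made up, all key columns are character, and every filter row has at least one non-"ALL" column. It works for any number of filter columns because the join keys are computed per row.

```r
# Hypothetical filter table and source data
filter_tbl <- data.frame(
  Tran = c("1001", "ALL", "ALL"),
  Acct = c("1", "3", "ALL"),
  Co   = c("a", "c", "b"),
  stringsAsFactors = FALSE
)
source_tbl <- data.frame(
  Tran  = c("1001", "1002", "1005", "1006"),
  Acct  = c("1", "3", "2", "3"),
  Co    = c("a", "c", "b", "c"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Inner join of the source data with one filter row,
# using only the columns of that row that are not "ALL"
merge_one_row <- function(i) {
  row  <- filter_tbl[i, , drop = FALSE]
  keys <- names(row)[unlist(row) != "ALL"]
  hit  <- merge(source_tbl, row[, keys, drop = FALSE], by = keys)
  hit[names(source_tbl)]   # restore the original column order
}

# Stack the matches from all filter rows
matched <- do.call(rbind, lapply(seq_len(nrow(filter_tbl)), merge_one_row))
print(matched)
```

A source row that satisfies several filter rows appears once per matching filter row; apply unique() to the result if that is not wanted.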

concatenating strings to make variable name

I want to change the name of the output of my R function to reflect different strings that are inputted. Here is what I have tried:
kd = c("a","b","d","e","b")
test = function(kd){
return(list(assign(paste(kd,"burst",sep="_"),1:6)))
}
This is just a simple test function. I get the warning (which is just as bad as an error for me):
Warning message:
In assign(paste(kd, "burst", sep = "_"), 1:6) :
only the first element is used as variable name
Ideally I would get output like a_burst = 1, b_burst = 2 and so on, but I am not getting close.
I would like to split up a data frame by the contents of a vector and be able to name everything according to the names from that vector, similar to
How to split a data frame by rows, and then process the blocks?
but not quite. The naming is imperative.
Something like this, maybe?
kd = c("a","b","d","e","b")
test <- function(x){
l <- as.list(1:5)
names(l) <- paste(x,"burst",sep = "_")
l
}
test(kd)
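For the data frame case mentioned in the question, split() already returns a named list, so the names only need the suffix added. A small sketch, where the data frame dat and its columns are made up for illustration:

```r
kd <- c("a", "b", "d", "e", "b")

# Hypothetical data frame with one row per element of kd
dat <- data.frame(id = 1:5, grp = kd, stringsAsFactors = FALSE)

# split() returns a list with one data frame per distinct value of grp
pieces <- split(dat, dat$grp)

# Rename the list elements instead of creating variables with assign()
names(pieces) <- paste(names(pieces), "burst", sep = "_")

print(names(pieces))
print(pieces$b_burst)
```

Keeping the blocks in one named list avoids assign() entirely, and each block can still be reached by name, for example pieces[["a_burst"]].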
You could use a vector instead of a list by way of setNames:
t1_6 <- setNames( 1:6, kd)
t1_6
a b d e b <NA>
1 2 3 4 5 6
> t1_6["a"]
a
1
Looking at the question again, I wondered if you wanted to assign sequential names to a character vector:
> a1_5 <- setNames(kd, paste0("alpha", 1:5))
> a1_5
alpha1 alpha2 alpha3 alpha4 alpha5
"a" "b" "d" "e" "b"

Retain Vector Names as Dataframe Column Names

In my code, I am filling the columns of a dataframe with vectors, as so:
df1[columnNum] <- barWidth
This works fine, except for one thing: I want the name of the vector variable (barWidth above) to be retained as the column header, one column at a time. Furthermore, I do not wish to use cbind. This slows the execution of my code down considerably. Consequently, I am using a pre-allocated dataframe.
Can this be done in the vector-to-column assignment? If not, then how do I change it after the fact? I can't find the right syntax to do this with colnames().
TIA
It's being done by the [<-.data.frame function. It could conceivably be replaced by one that looked at the name of the argument but it's such a fundamental function I would be hesitant. Furthermore there appears to be an aversion to that practice signaled by this code at the top of the function definition:
> `[<-.data.frame`
function (x, i, j, value)
{
if (!all(names(sys.call()) %in% c("", "value")))
warning("named arguments are discouraged")
nA <- nargs()
if (nA == 4L) {
<snipped rest of rather long definition>
I don't know why that is there, but it is. Maybe you should either be thinking about using names<- after the column assignment, or using this method:
> dfrm["barWidth"] <- barWidth
> dfrm
a V2 barWidth
1 a 1 1
2 b 2 2
3 c 3 3
4 d 4 4
This can be generalized to a list of new columns:
dfrm <- data.frame(a=letters[1:4])
barWidth <- 1:4
newcols <- list(barWidth=barWidth, bw2 =barWidth)
dfrm[names(newcols)] <- newcols
dfrm
#
a barWidth bw2
1 a 1 1
2 b 2 2
3 c 3 3
4 d 4 4
If you have the list of names of vectors you want to apply you could do:
namevec <- c(...,"barWidth"...,)
columnNums <- c(...,10,...)
df1[columnNums[i]] <- get(namevec[i])
names(df1)[columnNums[i]] <- namevec[i]
or even
columnNums <- c(barWidth=4,...)
for (i in seq_along(columnNums)) {
df1[columnNums[i]] <- get(names(columnNums)[i])
}
names(df1)[columnNums] <- names(columnNums)
but the deeper question would be where this set of vectors is coming from in the first place: could you have them in a list all along?
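A runnable version of the last pattern might look like the sketch below. The second vector barHeight, the three-column pre-allocated df1 and the chosen column positions are made up for illustration.

```r
# Vectors whose variable names should become the column names
barWidth  <- 1:4
barHeight <- c(2.5, 3, 1, 4)   # hypothetical second vector

# Pre-allocated data frame, as in the question
df1 <- data.frame(matrix(NA_real_, nrow = 4, ncol = 3))

# Named vector: names are the variable names, values are the target columns
columnNums <- c(barWidth = 2, barHeight = 3)

for (i in seq_along(columnNums)) {
  # get() looks up the vector by its name
  df1[[columnNums[i]]] <- get(names(columnNums)[i])
}

# Rename the filled columns in one step
names(df1)[columnNums] <- names(columnNums)
print(df1)
```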
I'd simply use cbind():
df1 <- cbind( df1, barWidth )
which retains the name. It will, however, end up as the last column in df1

Resources