Affecting variable through eval or similar - r

I know that if I have the name of a variable stored as a string, like a = "var.name", I can retrieve var.name with eval(as.symbol(a)) or get(a). However, I want not only to read the variable but also to modify it. Example:
names = c("X1","X2")
for(i in names){
assign(i, cbind(replicate(2,rnorm(3)))) #Just creating a 3x2 matrix with dummy data
###
}
At ### I'd like to change the variable, specifically to set its column names to "a" and "b".
I tried colnames(get(i)) = c("a","b") and colnames(eval(as.symbol(i))) = c("a","b"), but they return errors like could not find function "eval<-".

One option could be to create the matrix in the first step, and name and assign to a new name in the second step.
names = c("X1","X2")
for(i in names){
x <- cbind(replicate(2,rnorm(3)))
assign(i, provideDimnames(x))
}
#--------------
> X1
            A          B
A -0.59174062  1.8527780
B -0.53088643 -3.2713544
C -0.09330006 -0.5977568
Another option would be to assign the dimnames at the time of creation of the matrix.
for (i in names) {
x <- matrix(replicate(2, rnorm(3)),
ncol = 2,
dimnames = list(a = c(LETTERS[1:3]), b = c(LETTERS[1:2])))
assign(i, x)
}
#-------------------
> X1
   b
a            A           B
  A -0.2313692 -0.93161762
  B -0.9666849  0.06164904
  C  1.5614446 -0.09391062
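Neither option above modifies an existing variable through its name. A minimal sketch of one way to do that, assuming the variables live in the current environment: fetch the value with get(), change the copy, and write it back with assign(). The names var_names, nm and tmp are only illustrative.

```r
var_names <- c("X1", "X2")
for (nm in var_names) {
  # create a 3x2 matrix of dummy data under the name stored in nm
  assign(nm, replicate(2, rnorm(3)))
  # colnames(get(nm)) <- ... fails, so modify a copy and assign it back
  tmp <- get(nm)
  colnames(tmp) <- c("a", "b")
  assign(nm, tmp)
}
colnames(X1)
colnames(X2)
```

Keeping the matrices in a named list instead of separate variables would avoid get() and assign() altogether.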

Related

Is it possible to keep the datatype while using assign inside a function in r?

My first post here so please tell me if I'm missing any important information.
I am handling a lot of data in form of time(1:30=rowID) vs value all stored in a number of dataframes and I need to keep it as a data.frame.
I wrote a function that gets dataframes from my global environment and sorts the columns in each set into new data frames depending on their values.
So I start with a list of names of my data frames as input for my function and then end with assigning the created new dataframes to my global environment while using the assign function.
The data frames I get are all 30 rows long, but have a different number of columns depending on how often a case appears in a dataset. The name of each data frame represents one data set and the column names inside represent one timeline. I use data frames so that I don't lose the column names.
This works when a result has 0 columns or more than 1 column.
But if a data.frame ends up with only one column and I use the assign function, it appears as a vector in my global environment instead of a data frame. I therefore lose the name of the column, and my other functions, which only accept data frames, stop at such a case and throw errors.
Here is a basic example of my problem:
#create two datasets with different cases
data1 <- data.frame(matrix(nrow=30, ncol=5))
data1[1] <- c(rep(1,each=30))
data1[2] <- c(rep(5, each=30))
data1[3] <- c(rep(5, each=30))
data1[4] <- c(rep(10, each=30))
data1[5] <- c(rep(10, each=30))
data2 <- data.frame(matrix(nrow=30, ncol=6))
data2[1] <- c(rep(5,each=30))
data2[2] <- c(rep(1, each=30))
data2[3] <- c(rep(1, each=30))
data2[4] <- c(rep(0, each=30))
data2[5] <- c(rep(0, each=30))
data2[6] <- c(rep(10, each=30))
#create list with names of datasets
names <- c('data1','data2')
#function for sorting
examplefunction <- function(VarNames) {
for (i in 1:length(VarNames)) {
#get current dataset
name <- VarNames[i]
data <- get(VarNames[i])
#create new empty data.frames for sorting
data.0 <- data.frame(matrix(nrow=30))
name.data.0 <- paste(name,"0", sep=".")
c.0 = 2 #start at second column, since first doesn't like the colname later
data.1 <- data.frame(matrix(nrow=30))
name.data.1 <- paste(name,"1", sep=".")
c.1 = 2
data.5 <- data.frame(matrix(nrow=30))
name.data.5 <- paste(name,"5", sep=".")
c.5 = 2
data.10 <- data.frame(matrix(nrow=30))
name.data.10 <- paste(name,"10", sep=".")
c.10 = 2
#sort data into new different data.frames
for (c in 1:ncol(data)) {
if(data[1,c]==0) {
data.0[c.0] = data[c]
c.0 = c.0 +1
}
else if(data[1,c]==1) {
data.1[c.1] = data[c]
c.1 = c.1 +1
}
else if(data[1,c]==5) {
data.5[c.5] = data[c]
c.5 = c.5 +1
}
else if(data[1,c]==10) {
data.10[c.10] = data[c]
c.10 = c.10 +1
}
else stop("new values")
}
#remove first column with weird name
data.0 <- data.0[,-1]
data.1 <- data.1[,-1]
data.5 <- data.5[,-1]
data.10 <- data.10[,-1]
#assign data frames to global environment
assign(name.data.0, data.0, envir = .GlobalEnv)
assign(name.data.1, data.1, envir = .GlobalEnv)
assign(name.data.5, data.5, envir = .GlobalEnv)
assign(name.data.10, data.10, envir = .GlobalEnv)
}
}
#function call
examplefunction(names)
As explained above, if you run this you will end up with data frames for the cases with 0 columns and with more than 1 column, and three vectors where the data frame had only one column.
So my questions are:
1. Is there any way to keep the data type and force R to assign a data frame instead of a vector?
2. Or is there an alternative function I could use instead of assign()? If I use <<- how can I do the name assigning as above?
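You can use drop = FALSE when subsetting; by default, selecting columns with [ , j] simplifies the result to a vector when only one column is left: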
examplefunction <- function(VarNames) {
for (i in 1:length(VarNames)) {
#get current dataset
name <- VarNames[i]
data <- get(VarNames[i])
#create new empty data.frames for sorting
data.0 <- data.frame(matrix(nrow=30))
name.data.0 <- paste(name,"0", sep=".")
c.0 = 2 #start at second column, since first doesn't like the colname later
data.1 <- data.frame(matrix(nrow=30))
name.data.1 <- paste(name,"1", sep=".")
c.1 = 2
data.5 <- data.frame(matrix(nrow=30))
name.data.5 <- paste(name,"5", sep=".")
c.5 = 2
data.10 <- data.frame(matrix(nrow=30))
name.data.10 <- paste(name,"10", sep=".")
c.10 = 2
#sort data into new different data.frames
for (c in 1:ncol(data)) {
if(data[1,c]==0) {
data.0[c.0] = data[c]
c.0 = c.0 +1
}
else if(data[1,c]==1) {
data.1[c.1] = data[c]
c.1 = c.1 +1
}
else if(data[1,c]==5) {
data.5[c.5] = data[c]
c.5 = c.5 +1
}
else if(data[1,c]==10) {
data.10[c.10] = data[c]
c.10 = c.10 +1
}
else stop("new values")
}
#remove first column with weird name
data.0 <- data.0[ , -1, drop = FALSE]
data.1 <- data.1[ , -1, drop = FALSE]
data.5 <- data.5[ , -1, drop = FALSE]
data.10 <- data.10[ , -1, drop = FALSE]
#assign data frames to global environment
assign(name.data.0, data.0, envir = .GlobalEnv)
assign(name.data.1, data.1, envir = .GlobalEnv)
assign(name.data.5, data.5, envir = .GlobalEnv)
assign(name.data.10, data.10, envir = .GlobalEnv)
}
}
#function call
examplefunction(names)
Let's take a look at the one-column dataframes:
str(data1.1)
'data.frame': 30 obs. of 1 variable:
$ X1: num 1 1 1 1 1 1 1 1 1 1 ...
str(data2.10)
'data.frame': 30 obs. of 1 variable:
$ X6: num 10 10 10 10 10 10 10 10 10 10 ...
Now, all that said, I agree with Roland's comment -- you almost never want to take this approach of assigning to the global environment in a complicated way, and instead should return a list; that's best practice. However, you'd still need drop = FALSE to keep the column names.
Really, I suspect there is an entirely different and much better approach to whatever data wrangling you want to do. I just don't have a good enough grasp of your task to make a suggestion.
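As a rough sketch of the list-returning approach mentioned above, assuming the grouping rule is "value in the first row of each column" as in the question: the helper split_columns is hypothetical, not part of the original code, and the example data is a shortened stand-in for data1.

```r
# Hypothetical helper: group the columns of a data frame by their first-row value
split_columns <- function(df) {
  groups <- unlist(df[1, ], use.names = FALSE)    # one value per column
  idx <- split(seq_along(df), groups)             # column indices per value
  lapply(idx, function(j) df[, j, drop = FALSE])  # drop = FALSE keeps data frames
}

data1 <- data.frame(X1 = rep(1, 30), X2 = rep(5, 30), X3 = rep(5, 30),
                    X4 = rep(10, 30), X5 = rep(10, 30))

res <- split_columns(data1)
str(res[["1"]])            # still a data frame, with its column name
is.data.frame(data1[, 1])  # without drop = FALSE this is a plain vector
```

The result is one named list per input data frame, so nothing has to be written to the global environment.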

How to find specific strings in dataframe using for loop?

I'm using a for loop to find all specific strings (df2$x2) in another data frame (df1$x1). My aim is to create a new column, df1$test, and write the matching df2$x2 value into it.
For example:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
Y = c(2017,2017,2018,2018,2017),
Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))
df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
Y = c(2018,2017,2018,2017,2018,2018),
P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
for(i in 1:nrow(df2)){
f <- df2[i,1]
df1$test <- ifelse(grepl(f, df1$x1),f,"not found")
}
What should I do after the end of the loop? I know the problem is that df1$test is overwritten on every iteration. I tried an "if" statement to create a new data frame and save the outputs, but it didn't work. It only writes one specific string.
Thank you in advance.
Expected output:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
output = c("not found","TE-D31L-2","not found","TE-D31L-2","EC20"))
Do you want to have one new column for each string? If that is what you need, your code should be:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
Y = c(2017,2017,2018,2018,2017),
Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))
df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
Y = c(2018,2017,2018,2017,2018,2018),
P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
for(i in 1:nrow(df2)){
f <- df2[i,1]
df1$test <- grepl(f, df1$x1) # TRUE/FALSE per row, stored in a temporary column
colnames(df1)[ncol(df1)] <- f # rename the temporary (last) column to the search string
}
It creates a new column with a temporary name and then renames it to the string being evaluated. I also changed "not found" to FALSE, but you can use whatever you want.
[EDIT:]
If you want that expected output, you can use this code:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
Y = c(2017,2017,2018,2018,2017),
Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))
df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
Y = c(2018,2017,2018,2017,2018,2018),
P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
df1$output <- "not found"
for(i in 1:nrow(df2)){
f <- df2[i,1]
df1$output[grepl(f, df1$x1)]<-f
}
This is very similar to what you have done, but you need to index which rows to write to.
This only works when each row can have at most one match; it is a little more complicated if a row can have more than one match. But I think that's not your problem.
You simply need to split the df1$x1 strings on the space and merge (or match, since you are only interested in one variable) on df2$x2, i.e.
v1 <- sub('\\s+.*', '', df1$x1)
df2$x2[match(v1, df2$x2)]
#[1] NA "TE-D31L-2" NA "TE-D31L-2" "EC20"
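To get from the match() result to the expected output in the question, the NA entries can be replaced with "not found". A minimal sketch, assuming the key is always the part of x1 before the first whitespace (the names key and hit are illustrative):

```r
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
                         "TE-D31L-2 QWE12X", "EC20 QWX12X"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x2 = c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2"),
                  stringsAsFactors = FALSE)

key <- sub("\\s+.*", "", df1$x1)    # text before the first whitespace
hit <- df2$x2[match(key, df2$x2)]   # matching df2 value, or NA if there is none
df1$output <- ifelse(is.na(hit), "not found", hit)
df1
```

Unlike grepl(), this compares whole keys, so "EC20" cannot match inside a longer code by accident.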

Looping correlation tests within nested lists on same variables across more than two dataframes

Consider these three dataframes in a nested list:
df1 <- data.frame(a = runif(10,1,10), b = runif(10,1,10), c = runif(10,1,10))
df2 <- data.frame(a = runif(10,1,10), b = runif(10,1,10), c = runif(10,1,10))
df3 <- data.frame(a = runif(10,1,10), b = runif(10,1,10), c = runif(10,1,10))
dflist1 <- list(df1,df2,df3)
dflist2 <- list(df1,df2,df3)
nest_list <- list(dflist1, dflist2)
I want to do a 'cor.test' of column 'a' against column 'a', 'b' against 'b' and 'c' against 'c' across all 'dfs' in each dflist. I can do it individually if I assign each one to the global environment with the code below, thanks to this post:
for (i in seq_along(nest_list)) { # extract data frames from the list into individual dfs
for (j in seq_along(nest_list[[i]])) {
temp_df <- nest_list[[i]][[j]]
ds <- paste0("list", i, "_df", j) # the lists are unnamed, so the name is built from the indices
assign(ds, temp_df)
}
}
combn(paste0("df", 1:3), 2, FUN = function(x) { # actual cor.test
x1 <- mget(x, envir = .GlobalEnv)
Map(function(x,y) cor.test(x,y, method = "spearman")$p.value, x1[[1]], x1[[2]])})
I am not sure that I understand exactly what you want to do, but could something like this help you?
#vector of your columns name
columns <- c("a","b","c")
n <- length(columns)
# correlation calculation function
correl <- function(i,j,data) {cor.test(unlist(data[i]),unlist(data[j]), method = "spearman")$p.value}
correlfun <- Vectorize(correl, vectorize.args=list("i","j"))
# Make a "loop" on columns vector (u will then be each value in columns vector, "a" then "b" then "c")
res <- sapply(columns,function(u){
# Create another loop on frames that respect the condition names(x)==u (only the data stored in columns "a", "b" or "c")
lapply(lapply(nest_list,function(x){sapply(x,function(x){x[which(names(x)==u)]})}),function(z)
# on those data, use the function outer to apply correlfun function on each pair of vectors
{outer(1:n,1:n,correlfun,data=z)})},simplify = FALSE,USE.NAMES = TRUE)
Does this help? I'm not sure my explanation is really clear :)
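The same idea can be sketched without assigning anything to the global environment, assuming the goal is one Spearman p-value per column for every pair of data frames inside each inner list. The helper pairwise_pvalues and the result names are made up for illustration.

```r
set.seed(1)
df1 <- data.frame(a = runif(10, 1, 10), b = runif(10, 1, 10), c = runif(10, 1, 10))
df2 <- data.frame(a = runif(10, 1, 10), b = runif(10, 1, 10), c = runif(10, 1, 10))
df3 <- data.frame(a = runif(10, 1, 10), b = runif(10, 1, 10), c = runif(10, 1, 10))
nest_list <- list(list(df1, df2, df3), list(df1, df2, df3))

# Hypothetical helper: column-wise cor.test for every pair of data frames in a list
pairwise_pvalues <- function(dfs) {
  pairs <- combn(seq_along(dfs), 2, simplify = FALSE)
  out <- lapply(pairs, function(p) {
    # mapply walks the columns of both data frames in parallel (a with a, b with b, ...)
    mapply(function(x, y) cor.test(x, y, method = "spearman")$p.value,
           dfs[[p[1]]], dfs[[p[2]]])
  })
  names(out) <- vapply(pairs, function(p) paste0("df", p[1], "_vs_df", p[2]),
                       character(1))
  out
}

results <- lapply(nest_list, pairwise_pvalues)
results[[1]]$df1_vs_df2
```

Each element of results corresponds to one inner list and holds a named vector of p-values per pair of data frames.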

Dynamic variable names in function in R

I am looking to make a function that takes a vector as input, does some simple arithmetic with it, and gives the new vector a name consisting of a fixed string (say, "log.") plus the original vector name.
d = c(1, 2, 3)
my.function <- function(x) {
x2 <- log(x)
...
I would like the function to return a vector called log.d (that is, not log.x or something set, but something dependent on the name of the vector input as x).
You can try the following:
d = c(1, 2, 3)
my.function <- function(x){
x2 <- log(x)
arg_name <- deparse(substitute(x)) # Get argument name
var_name <- paste("log", arg_name, sep="_") # Construct the name
assign(var_name, x2, envir=.GlobalEnv) # Assign values to variable
# variable will be created in .GlobalEnv
}
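A variant of the same deparse(substitute()) idea that avoids writing to the global environment is to return the result under the constructed name. This is only a sketch; it uses the "log." prefix from the question, and the function name log_named is hypothetical.

```r
# Hypothetical variant: return a one-element named list instead of calling assign()
log_named <- function(x) {
  arg_name <- deparse(substitute(x))            # name of the object passed in
  var_name <- paste("log", arg_name, sep = ".") # e.g. "log.d"
  setNames(list(log(x)), var_name)
}

d <- c(1, 2, 3)
res <- log_named(d)
names(res)
res[["log.d"]]
```

The caller can then decide where the result goes, for example by collecting several such results in one list.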
One way to do this is to store the names of all your input vectors separately and then pass them to assign(). Just as assign() takes a text string for the name of the output object, get() looks up an object from a string.
I will assume your vectors all follow a common pattern and start with "d", to make it all as dynamic as possible.
d1 <- c(1,2,3)
d2 <- c(2,3,4)
vec_names <- ls(pattern = "^d")
log_vec <- function(x){
log(x)
}
sapply(vec_names, function(x) assign(paste0("log.", x), log_vec(get(x)), envir = globalenv()))
This should create two new objects "log.d1" and "log.d2".

How to write a function() with arguments x,y that only returns values from column x that are == y

I have a data frame:
df <- data.frame( a = 1:5, b = 1:5, c = 1:5, d = as.factor(1:5))
I want to write a function that takes as its argument one of the columns a,b or c, and one of the factors of column d, and returns only the values of column a, b, or c, that have said factor value for column d.
I tried the following code:
fun1 <- function(x,y) {
u <- x[data$d == "y"]
return(u)
}
and I keep getting back numeric(0) as the output of the function. When I try similar code outside of the function() environment, it appears to work fine. Any help would be appreciated.
Probably a duplicate, but I don't know how I would find it in the haystack of items with tags: data.frame, indexing, columns, values. In your function, "y" in quotes is the literal string "y" rather than the argument y, so no row matches. Best practice is to pass the "data" as well as the search terms. (Calling the object df1 rather than df.)
fun1 <- function(dfrm, col,val) {
u <- dfrm[dfrm$d == val , col]
return(u)
}
fun1(df1, 'b', 3)
#[1] 3
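For completeness, a self-contained version of the above that can be pasted into a fresh session. The data frame is the one from the question, stored as df1; the last line illustrates why the quoted "y" in the original attempt matched nothing.

```r
df1 <- data.frame(a = 1:5, b = 1:5, c = 1:5, d = as.factor(1:5))

# Return the values of column `col` in the rows where d equals `val`
fun1 <- function(dfrm, col, val) {
  dfrm[dfrm$d == val, col]
}

fun1(df1, "b", 3)
fun1(df1, "a", "5")

# The literal string "y" is not a level of d, so no row is selected
sum(df1$d == "y")
```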

Resources