I have a subset function that takes an object of a user-defined class and a condition, and adds that condition as an attribute of the object.
subset.survey.data.frame <- function(x, condition, drop=FALSE, inside=FALSE) {
if(inside) {
condition_call <- deparse(substitute(condition, env=parent.frame(n=1)))
}
else {
condition_call <- substitute(condition)
}
x[["user_conditions"]] <- unique(c(x[["user_conditions"]],list(condition_call)))
cat("Subset Conditions have been added to SDF")
x
}
I can call this function as:
sdf <- subset.survey.data.frame(sdf,dsex =="Male")
This adds dsex == "Male" to the user_conditions attribute.
However, if I call it from within another function, inside a loop, it passes v1 and v2 instead of the actual variable names.
for(i in 1:length(lvls)) {
v1 <- rhs_vars[1]
v2 <- lvls[i]
print(v1) #"dsex"
print(v2) #"Male"
dsdf <- subset.survey.data.frame(sdf, v1 == v2, inside=TRUE)
}
How can I modify the subset function so that I can get the names of v1 and v2 and then add the condition to the object?
Here is what sdf, lvls, and rhs_vars look like:
sdf <- list(user_conditions = list(),default_conditions = list(default_conditions) ,data = data_Laf, weights=weights, pvvars=pvs, fileDescription = f)
Here, data_Laf is an LaF object (http://cran.r-project.org/web/packages/LaF/index.html); weights, pvs, and f are all lists.
rhs_vars <- rhs.vars(y ~ dsex + b017451) # from formula.tools package
> rhs_vars
[1] "dsex" "b017451"
lvls holds the levels of a column in a data frame:
lvls <- levels(data[,rhs_vars[1]])
[1] "Male" "Female"
Here is a working example:
default_conditions= quote(rptsamp=="Reporting sample")
sdf <- list(user_conditions = list(),default_conditions = list(default_conditions))
class(sdf) <- "Userdefined"
subset.survey.data.frame <- function(x, condition, drop=FALSE, inside=FALSE) {
if(inside) {
condition_call <- deparse(substitute(condition, env=parent.frame(n=1)))
}
else {
condition_call <- substitute(condition)
}
x[["user_conditions"]] <- unique(c(x[["user_conditions"]],list(condition_call)))
cat("Subset Conditions have been added to X")
x
}
sdf <- subset.survey.data.frame(sdf,dsex =="Male")
print(sdf)
#This gives the correct answer and adds dsex == "Male" to user conditions
#Creating some sample data
dsex =c('1','2','1','1','2','1','1','2','1')
b017451 <- sample(c(1:100), 9)
y <- rep(10, 9)
data <- data.frame(dsex, y, b017451)
data[,'dsex'] <- factor(data[,'dsex'], levels=c("1", "2"), labels=c('Male','Female'))
require(formula.tools)
rhs_vars <- rhs.vars(y ~ dsex + b017451)
lvls <- levels(data[,rhs_vars[1]])
for(i in 1:length(lvls)) {
v1 <- rhs_vars[1]
v2 <- lvls[i]
print(v1) #"dsex"
print(v2) #"Male"
dsdf <- subset.survey.data.frame(sdf, v1 == v2, inside=F)
print(dsdf)
#this doesn't give the correct answer; it adds v1 == v2 to user conditions
break
}
As @nrussell was alluding to, substitute should help you build your expressions. Then you just need to evaluate them. Here's a simple example:
v1 <- quote(cyl)
v2 <- 6
eval(substitute(subset(mtcars, v1==v2), list(v1=v1, v2=v2)))
If your v1 is a character string, you can convert it to a symbol via as.name(), because the expression needs a symbol rather than a character string to work.
v1 <- "cyl"
v2 <- 6
eval(substitute(subset(mtcars, v1==v2), list(v1=as.name(v1), v2=v2)))
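Applied to the question, the same substitute() pattern can build the condition inside the loop before it is stored. This is only a sketch under assumptions: add_condition() is a hypothetical stand-in for the storing step of subset.survey.data.frame, sdf is a bare list rather than the real object, and lhs and rhs are placeholder names chosen for illustration.

```r
# Hypothetical helper: stores an already-built condition on a list-like object.
add_condition <- function(x, condition_call) {
  x[["user_conditions"]] <- unique(c(x[["user_conditions"]], list(condition_call)))
  x
}

sdf <- list(user_conditions = list())   # simplified stand-in for the real object
rhs_vars <- c("dsex", "b017451")
lvls <- c("Male", "Female")

results <- list()
for (lvl in lvls) {
  # lhs becomes the symbol dsex, rhs becomes the literal level.
  cond <- substitute(lhs == rhs, list(lhs = as.name(rhs_vars[1]), rhs = lvl))
  results[[lvl]] <- add_condition(sdf, cond)
}

# Deparse the stored calls to see what was recorded for each level.
conds <- vapply(results, function(r) deparse(r$user_conditions[[1]]), character(1))
print(conds)
```

Because the condition is built by the caller, the storing function does not need an inside flag at all in this variant.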
If you're controlling the "inside" parameter, then isn't it as simple as:
cond <- substitute(condition) # the unevaluated call, e.g. v1 == v2
if(inside) condition_call <- as.call(list(cond[[1]], as.name(eval(cond[[2]], parent.frame())), eval(cond[[3]], parent.frame())))
This of course assumes people are only using binary conditions, but you can extend the above logic.
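For reference, here is one way the whole function might look with that idea filled in. It is a sketch, not a drop-in replacement: subset_sdf and run_levels are hypothetical names, sdf is reduced to a plain list, and only binary conditions of the form variable-name operator value are handled.

```r
# Hypothetical simplified version of the subset function from the question.
subset_sdf <- function(x, condition, inside = FALSE) {
  cond <- substitute(condition)          # unevaluated call, e.g. v1 == v2
  if (inside) {
    caller <- parent.frame()             # frame where v1 and v2 are defined
    cond <- as.call(list(
      cond[[1]],                         # the operator, e.g. `==`
      as.name(eval(cond[[2]], caller)),  # value of v1, turned into a symbol
      eval(cond[[3]], caller)            # value of v2, kept as a literal
    ))
  }
  x[["user_conditions"]] <- unique(c(x[["user_conditions"]], list(cond)))
  x
}

# Hypothetical wrapper that mimics calling the function from inside a loop.
run_levels <- function(sdf, var, lvls) {
  out <- list()
  for (i in seq_along(lvls)) {
    v1 <- var
    v2 <- lvls[i]
    out[[v2]] <- subset_sdf(sdf, v1 == v2, inside = TRUE)
  }
  out
}

sdf <- list(user_conditions = list())
res <- run_levels(sdf, "dsex", c("Male", "Female"))
direct <- subset_sdf(sdf, dsex == "Male")   # the interactive form still works
print(res$Female$user_conditions[[1]])
```

The direct call and the loop call for "Male" should then store the same condition, which is what the question asks for.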
I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a work around,
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
##   x y pasted
## 1 1 a    1 a
## 2 2 b    2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
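As a further illustration of the character-string approach, the sketch below adds a check that every captured name really is a column, so a typo fails with a clear message instead of a confusing lookup error. paste_cols and its sep argument are hypothetical names introduced here, not part of the answers above.

```r
# Hypothetical variant: capture bare column names from ... and validate them.
paste_cols <- function(data, ..., sep = " ") {
  # substitute(list(...)) keeps the arguments unevaluated; drop the leading `list`.
  vnames <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  unknown <- setdiff(vnames, names(data))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  # Pass the selected columns to paste() as unnamed arguments, plus sep.
  cols <- unname(as.list(data[vnames]))
  data$pasted <- do.call(paste, c(cols, list(sep = sep)))
  data
}

df <- data.frame(x = c(1, 2), y = c("a", "b"))
res <- paste_cols(df, x, y)
res_dash <- paste_cols(df, y, x, sep = "-")
print(res)
```

Since the columns are looked up by name in data, an object called x in the global environment cannot interfere, which was the concern with attach().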
I have a function to select a value from a dataframe. I want to select that value, save it, remove it from the dataset, and select a value using the same function from the remaining values in the dataframe. What is the best way to do this?
Here is a simple example:
V1 <- c(5,6,7,8,9,10)
df <- data.frame(V1)
V2 <- as.data.frame(matrix(nrow=3,ncol=1))
maximum <- function(x){
max(x)
}
V2[i,]<- maximum(df)
df <- anti_join(df,V2,by='V1')
How can I set this up such that I reapply the maximum function to the remaining values in df and save these values in V2?
I'm using a different and more complex set of functions and if/else statements than max - this is just an example. I do have to reapply the function to the remaining values, because I will be using the function on a new dataframe if df is empty.
Is this what you're looking for?
V1 <- data.frame(origin = c(5,6,7,8,9,10))
V2 <- as.data.frame(matrix(nrow=3,ncol=1))
df1 <- V1
df2 <- V2
recursive_function <- function(df1,df2,depth = 3,count = 1){
if (count == depth){
# Find index
indx <- which.max(df1[,1])
curVal <- df1[indx,1]
df2[count,1] <- curVal
df1 <- df1[-indx, ,drop = FALSE]
return(list(df1,
df2))
} else {
# Find index
indx <- which.max(df1[,1])
# Find Value
curVal <- df1[indx,1]
# Add value to new data frame
df2[count,1] <- curVal
# Subtract value from old dataframe
df1 <- df1[-indx, ,drop = FALSE]
recursive_function(df1,df2,depth,count + 1)
}
}
recursive_function(df1,df2)
Here is another solution that I stumbled across:
V1 <- c(5,6,7,8,9,10)
df <- data.frame(V1)
minFun <- function(df, maxRun){
V2 <- as.data.frame(matrix(nrow=maxRun,ncol=1))
for(i in 1:maxRun){
V2[i,]<- min(df)
df <- dplyr::anti_join(df,V2,by='V1')
}
return(V2)
}
test <- minFun(df = df, maxRun = 3)
test
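Since the real selection logic is more complex than max or min, it may help to pass the selection rule in as a function. The sketch below assumes the rule can be expressed as a function returning a single index into a vector; pick_repeatedly and pick_index are hypothetical names.

```r
# Hypothetical helper: repeatedly pick one value, store it, and drop it.
pick_repeatedly <- function(values, pick_index, n) {
  picked <- values[0]                       # empty vector of the same type
  while (length(picked) < n && length(values) > 0) {
    idx <- pick_index(values)               # index chosen by the user-supplied rule
    picked <- c(picked, values[idx])        # save the selected value
    values <- values[-idx]                  # remove it before the next round
  }
  list(picked = picked, remaining = values)
}

V1 <- c(5, 6, 7, 8, 9, 10)
out <- pick_repeatedly(V1, which.max, 3)
print(out$picked)
print(out$remaining)

# Asking for more picks than there are values simply stops when empty.
short <- pick_repeatedly(c(1, 2), which.min, 5)
```

The remaining element tells the caller when the input has run out, so the same function can then be applied to a new data frame.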
I'm using a for loop to find every string in df2$x2 within the strings of another data frame (df1$x1). My goal is to create a new column, df1$test, and write the matching df2$x2 value into it.
For example:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
Y = c(2017,2017,2018,2018,2017),
Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))
df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
Y = c(2018,2017,2018,2017,2018,2018),
P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
for(i in 1:nrow(df2)){
f <- df2[i,1]
df1$test <- ifelse(grepl(f, df1$x1),f,"not found")
}
What should I do after the end of the loop? I know the problem is that the result is overwritten on every iteration. I tried an "if" statement that creates a new data frame and saves the outputs, but it didn't work: it only writes one specific string.
Thank you in advance.
Expected output:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
output = c("not found","TE-D31L-2","not found","TE-D31L-2","EC20"))
Do you want one new column for each string? If that is what you need, your code should be:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
Y = c(2017,2017,2018,2018,2017),
Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))
df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
Y = c(2018,2017,2018,2017,2018,2018),
P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
for(i in 1:nrow(df2)){
f <- df2[i,1]
df1$test <- grepl(f, df1$x1) # temporary column: TRUE where the string occurs in x1
names(df1)[ncol(df1)] <- f # rename the temporary column to the string itself
}
It creates a new column with a temporary name and then renames it to the string being evaluated. I also changed "not found" to FALSE, but you can use whatever you want.
[EDIT:]
If you want that expected output, you can use this code:
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
Y = c(2017,2017,2018,2018,2017),
Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))
df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
Y = c(2018,2017,2018,2017,2018,2018),
P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
df1$output <- "not found"
for(i in 1:nrow(df2)){
f <- df2[i,1]
df1$output[grepl(f, df1$x1)]<-f
}
This is very similar to what you did, but you need to index the rows you want to write to.
This only works when each row can have at most one match; it is a little more complicated if a row can have more than one match. But I think that's not your problem.
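If a row could match more than one string, one option is to collect all hits per row and collapse them. This is a sketch under assumptions: the strings are matched literally (fixed = TRUE) and multiple hits are joined with ";", which is an arbitrary choice made here.

```r
x1 <- c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
        "TE-D31L-2 QWE12X", "EC20 QWX12X")
x2 <- c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2")

patterns <- unique(x2)   # duplicates would only repeat the same hit

output <- vapply(x1, function(s) {
  # Test every pattern against this one string, as a literal substring.
  hits <- patterns[vapply(patterns, grepl, logical(1), x = s, fixed = TRUE)]
  if (length(hits) == 0) "not found" else paste(hits, collapse = ";")
}, character(1), USE.NAMES = FALSE)

print(output)
```

With the sample data each row has at most one hit, so the result has the same values as the expected output in the question.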
You simply need to split the df1$x1 strings on the space and merge (or match, since you are only interested in one variable) on df2$x2, i.e.
v1 <- sub('\\s+.*', '', df1$x1)
df2$x2[match(v1, df2$x2)]
#[1] NA "TE-D31L-2" NA "TE-D31L-2" "EC20"
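To get the exact column from the question, the unmatched entries still need to become "not found". A minimal sketch, assuming the part before the first space is always the key to look up; key is a name introduced here for illustration:

```r
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
                         "TE-D31L-2 QWE12X", "EC20 QWX12X"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x2 = c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2"),
                  stringsAsFactors = FALSE)

# Keep only the text before the first whitespace.
key <- sub("\\s+.*", "", df1$x1)

# Use the key where it appears in df2$x2, otherwise the placeholder text.
df1$output <- ifelse(key %in% df2$x2, key, "not found")
print(df1)
```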
I have a function where I want it to add its result to a dataframe. However, this doesn't seem to work. If I try the following:
testdf <- data.frame(matrix(ncol = 1, nrow = 0))
testfunction <- function() {
testdf[nrow(testdf) + 1,] <- list("a")
}
testfunction()
testdf
[1] matrix.ncol...1..nrow...0.
<0 rows> (or 0-length row.names)
It doesn't add a row. But if I do what's in the testfunction() directly, it works:
testdf[nrow(testdf) + 1,] <- list("a")
testdf
matrix.ncol...1..nrow...0.
1 a
Why is this the case and how can I add a row of data to a dataframe from within a function?
Perhaps the best way to do this would be to pass your data frame to the function as a parameter, and then to return the modified data frame to the caller.
testfunction <- function(df) {
df[nrow(df) + 1,] <- list("a")
return(df)
}
testdf <- data.frame(matrix(ncol = 1, nrow = 0))
testdf <- testfunction(testdf)
testdf
You could also keep everything the same but use the global assignment operator <<-:
testfunction <- function() {
testdf[nrow(testdf) + 1,] <<- list("a") # generally bad
}
But there are potential caveats with doing this, and the first option I gave is preferable.
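The reason the original attempt appears to do nothing is that the assignment inside the function modifies a local copy of testdf, which is discarded when the function returns. The sketch below demonstrates this with the pass-and-return pattern; add_row and the value column are hypothetical names used only for illustration.

```r
# Hypothetical helper following the pass-in, return-out pattern.
add_row <- function(df, value) {
  df[nrow(df) + 1, ] <- list(value)   # modifies the function's local copy only
  df                                  # hand the modified copy back to the caller
}

testdf <- data.frame(value = character(0), stringsAsFactors = FALSE)

add_row(testdf, "a")                  # result not assigned, so testdf is untouched
rows_after_discard <- nrow(testdf)

testdf <- add_row(testdf, "a")        # assigning the result keeps the new row
testdf <- add_row(testdf, "b")
print(testdf)
```

Calling the function in a loop and reassigning each time builds the data frame up row by row without touching global state from inside the function.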
There is a table which has two columns with each column having the type character. It is:
"FTGS" "JKLP"
"CVVA" "CVVA"
"HGFF" "CVVD"
"CVVD" "HGFF"
"OPSF" "WQSR"
...
Can somebody tell me how to write a function that returns the index (row number) of a specific combination of strings in columns 1 and 2? If I call the function with (HGFF, CVVD), it should return 3 and 4 (it does not matter whether HGFF or CVVD is in column 1 or 2). If I call it with (CVVA, CVVA), it should return 2. The problem is that it has to check across both columns. Is there a solution in R? Otherwise bash would also be fine.
A function like the following should work for you:
myFun <- function(v1, v2, indf) {
x <- sort(c(v1, v2))
which(apply(indf, 1, function(z) all(sort(z) == x)))
}
The usage would be like this (assuming your data are in a data.frame called "mydf"):
myFun("CVVA", "CVVD", indf = mydf)
myFun("HGFF", "CVVD", indf = mydf)
In R, the function that it sounds like you are looking for is which, but it won't do what you are looking for directly.
This also seems to work
fun1 <- function(v1, v2, mat) {
ind <- c(0, -nrow(mat))
indx1 <- which(mat == v1) + ind
indx2 <- which(mat == v2) + ind
if (all(sort(indx1) == sort(indx2))) {
indx1
} else NULL
}
fun1("HGFF","CVVD", mat) #mat is the matrix
#[1] 3 4
fun1("CVVA","CVVD", mat)
#NULL
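For the two-column case specifically, the check can also be written as a direct vectorized comparison, without apply or index arithmetic. This is a sketch, assuming the data sit in a data frame with exactly two character columns; find_pair_rows, col1 and col2 are hypothetical names, and the sample rows are the five shown in the question.

```r
# Hypothetical helper: rows where the two columns hold v1 and v2 in either order.
find_pair_rows <- function(df, v1, v2) {
  a <- as.character(df[[1]])
  b <- as.character(df[[2]])
  which((a == v1 & b == v2) | (a == v2 & b == v1))
}

mydf <- data.frame(col1 = c("FTGS", "CVVA", "HGFF", "CVVD", "OPSF"),
                   col2 = c("JKLP", "CVVA", "CVVD", "HGFF", "WQSR"),
                   stringsAsFactors = FALSE)

both_orders <- find_pair_rows(mydf, "HGFF", "CVVD")
same_value  <- find_pair_rows(mydf, "CVVA", "CVVA")
no_match    <- find_pair_rows(mydf, "CVVA", "CVVD")
print(both_orders)
```

When nothing matches, the function returns an empty integer vector rather than NULL, which callers can test with length().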