Converting a data frame into a list and passing it to a function - r

I have a dataset with different receipts, items and their values:
library(arules)
library(VennDiagram)
Data <- data.frame(
  Receipt_ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,5,5,6,6,6),
  item = c('a','b','c','k','a','b','d','k','a','k','c','q','k','a','b','a','a','b','c'),
  value = c(2,3,2,4,2,5,6,7,8,1,2,3,4,5,6,7,8,3,4)
)
And I created a market basket, with some output like below:
rules <- data.frame(
  Target = c("a","a","b"),
  item1 = c("b","b","k"),
  item2 = c("k","",""),
  lift = c(1,2,3),
  confidence = c(0.08,0.45,0.1)
)
I created a function that takes a vector of items as input (the combination of Target, item1 and item2 in vector form):
findvalue <- function (vectormb) {
keyvalue <- Data[Data$item %in% vectormb, ]
keyvaluetran <- sapply(vectormb,function(x){
ret <- unique (keyvalue$Receipt_ID[keyvalue$item==x])
})
#reducetran <- calculate.overlap(keyvaluetran)
reducetran1 <- Reduce(intersect,keyvaluetran)
totalsales <- sum (keyvalue$value[keyvalue$Receipt_ID %in% reducetran1])
return(totalsales)
}
And I converted the data frame to a list:
createlist <- function(data){
subset <- data.frame(data$Target,data$item1,data$item2)
listdata <- apply(subset,1,
function(x){
ret<-list( x[1:ncol(subset)])
names(ret)<-as.character(x[1])
return(ret)
})
listdata <- createlist(rules)
listdata stores each row of the data frame as a vector.
My question is: how can I pass the listdata output to the function findvalue?
I tried a couple of things:
findvalue(c("a","b","k")) gives the correct value, 23.
findvalue(listdata[1]) does not give any value.
What mistake did I make? listdata should be passed to findvalue in a loop.

As posted, createlist() never closes its body and ends in an assignment rather than returning the result of apply(). Here is what I would suggest:
createlist <- function(data) {
  subset <- data.frame(data$Target,data$item1,data$item2)
  apply(subset,1,
    function(x){
      ret<-list( x[1:ncol(subset)])
      names(ret)<-as.character(x[1])
      return(ret)
    })
}
listdata <- createlist(rules)
This will create your listdata variable properly.
Then, to run findvalue() on it as the function is currently written, you need to pass a vector, not a list. listdata[1] is still a list, so flatten it with unlist() first:
> class(listdata[1])
[1] "list"
> findvalue(unlist(listdata[1]))
[1] 23
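To run findvalue() over every rule, as the question asks, one option is to loop over the rows of rules directly. The following is a minimal sketch, not part of the answer above. It rebuilds Data and rules from the question, drops the empty strings that pad shorter rules (an assumption about how blank item2 entries should be treated), and uses lapply() inside findvalue so the per-item receipt IDs always stay a list. The names rule_items and rule_sales are hypothetical.

```r
Data <- data.frame(
  Receipt_ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,5,5,6,6,6),
  item  = c('a','b','c','k','a','b','d','k','a','k','c','q','k','a','b','a','a','b','c'),
  value = c(2,3,2,4,2,5,6,7,8,1,2,3,4,5,6,7,8,3,4),
  stringsAsFactors = FALSE
)
rules <- data.frame(
  Target = c("a","a","b"),
  item1  = c("b","b","k"),
  item2  = c("k","",""),
  stringsAsFactors = FALSE
)

findvalue <- function(vectormb) {
  keyvalue <- Data[Data$item %in% vectormb, ]
  # lapply() always returns a list, whereas sapply() may simplify to a matrix
  receipts_per_item <- lapply(vectormb, function(x) {
    unique(keyvalue$Receipt_ID[keyvalue$item == x])
  })
  # receipts that contain every item of the rule
  common <- Reduce(intersect, receipts_per_item)
  sum(keyvalue$value[keyvalue$Receipt_ID %in% common])
}

# hypothetical helper: one character vector per rule, blank entries removed
rule_items <- lapply(seq_len(nrow(rules)), function(i) {
  items <- unlist(rules[i, c("Target", "item1", "item2")], use.names = FALSE)
  items[items != ""]
})

# one total per rule
rule_sales <- vapply(rule_items, findvalue, numeric(1))
print(rule_sales)
```

Without the blank filter, a rule such as c("a", "b", "") would intersect with an empty set of receipts and always total 0.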


R retrieving values from data.tables while within custom functions

I'm trying to retrieve a value from a data.table as part of a larger custom function. I can generate the row number, but I can't retrieve the values from that row. The formula works outside of the function environment but not inside.
example_outlier_table <- data.table(dataset = c("doe", "ray", "me", "fa", "so"),
upper_limit = c(2,6,9,11,7))
example_function <- function(dt,otable){
return(match(deparse(substitute(dt)), otable$dataset))
}
example_function(ray, example_outlier_table)
result = 2
This is correct, 'ray' is the second entry in the 'dataset' column
In this example, 'ray' is both the character string in 'example_outlier_table$dataset' and the name of another data table object, hence the 'deparse(substitute(dt))' step.
The issue is this: I want to use the value that 'ray' indicates in the example_outlier_table, number 6, in another place within my custom function.
example_function <- function(dt,otable){
return(otable[dataset == as.character(deparse(substitute(dt))),
upper_limit])
}
example_function(ray, example_outlier_table)
result = numeric(0)
incorrect
example_function <- function(dt,otable){
return(otable[match(deparse(substitute(dt)), otable$dataset),
upper_limit])
}
example_function(ray, example_outlier_table)
result = [1] NA
We could directly extract the column with [[
example_function <- function(dt,otable){
dt <- deparse(substitute(dt))
otable[["upper_limit"]][otable[["dataset"]] == dt]
}
-testing
example_function(ray, example_outlier_table)
[1] 6
Or using the data.table methods
example_function <- function(dt,otable){
dt <- deparse(substitute(dt))
otable[dataset == dt, upper_limit][[1]]
}
example_function(ray, example_outlier_table)
[1] 6
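Why the original attempts fail is not spelled out above. My reading (an assumption about data.table's behaviour, not something stated in this thread) is that the expression inside otable[...] is evaluated in an environment built from the table's columns, where dt is no longer the function's argument, so substitute(dt) just yields the symbol dt. The base R sketch below reproduces that effect with eval() on a plain list standing in for the table. The function names inside_eval and captured_first are hypothetical.

```r
# substitute() called inside an eval() on a column list: dt is not bound
# in that environment, so the symbol is left unchanged
inside_eval <- function(dt) {
  cols <- list(dataset = c("doe", "ray", "me"))
  eval(quote(deparse(substitute(dt))), cols, environment())
}

# capturing the name in the function's own frame first works as intended
captured_first <- function(dt) {
  nm <- deparse(substitute(dt))
  cols <- list(dataset = c("doe", "ray", "me"), upper_limit = c(2, 6, 9))
  eval(quote(upper_limit[dataset == nm]), cols, environment())
}

print(inside_eval(ray))     # "dt"
print(captured_first(ray))  # 6
```

This matches both working versions above, which call deparse(substitute(dt)) before touching the table.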

How to get the original name of an object when sending a list of objects to a function

Let's say I have the following function:
return_name <- function(data){
for(datasets in data)
print(deparse(substitute(datasets)))
}
my_data_1 <- data.frame(a = "a", b = "b")
my_data_2 <- data.frame(a = "a", b = "b")
return_name(list(my_data_1, my_data_2))
I'd like this function to print my_data_1 followed by my_data_2 (the names of the objects in memory).
Instead it prints the structure of the entire object.
Edit for @Ronak
In my actual code I am iterating over a list of dataframes (not a named list). I need to be able to grep on the name of the current object in the for loop.
It looks something like this:
data_list = list(my_data_1, my_data_2)
random_function <- function(data_list){
for(datasets in data_list)
value = ifelse(grepl("my_data_1", return_name(datasets)), 1, 0)
}
The problem is that the return_name function described in your answer will return "datasets" and not the actual, original object name.
This is kind of a hack:
return_name <- function(data){
  strsplit(gsub('list|[()]', '', deparse(substitute(data))), ',\\s*')[[1]]
}
return_name(list(my_data_1, my_data_2))
#[1] "my_data_1" "my_data_2"
return_name(my_data_1)
#[1] "my_data_1"
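The hack above only sees names that are written out in the call itself. Once the list has been built and stored in a variable, as in the question's edit, the original object names are gone. An alternative (not the approach of the answer above) is to name the elements when the list is created and iterate over the names. This is a minimal sketch, and the rewritten random_function() is hypothetical.

```r
my_data_1 <- data.frame(a = "a", b = "b")
my_data_2 <- data.frame(a = "a", b = "b")

# name the elements when the list is built; the names travel with the list
data_list <- list(my_data_1 = my_data_1, my_data_2 = my_data_2)

# hypothetical rewrite: loop over the names rather than the bare elements
random_function <- function(data_list) {
  vapply(names(data_list), function(nm) {
    datasets <- data_list[[nm]]  # the data frame itself, if it is needed
    if (grepl("my_data_1", nm, fixed = TRUE)) 1 else 0
  }, numeric(1))
}

flags <- random_function(data_list)
print(flags)
```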

Is there an R function to add content to an already existing list?

I am new to R. I am trying to build a list using two functions. The first is the extraction function: it takes the data and creates a list from it. The second is the processing function: it calculates some values, and I need those to sit together with the first list. How do I do that?
myfun <- function(data,number_meta) { #extraction function
OR <- data$`Odds Ratio`[data$`Identification number`==number_meta]
SE <- ((log(data$`Upper limit`) - (log(data$`Lower limit`))) / 3.92)[data$`Identification number`==number_meta]
res <- metagen(TE=log(OR),seTE=SE,sm="OR")
tableau = cbind(OR, SE)
LIST = list(tableau, res)
return(LIST)
}
myfun(data,number_meta)
number_meta = c(1:33)
i = c(1:33)
number_meta = i
LIST = list()
for (i in 1:33) {
LIST[[i]] = myfun(data, number_meta[i])
}
myfun2 <- function(LIST) { # processing function
dup_OR <- duplicated(LIST[[i]][[1]][,1])
dup_SE <- duplicated(LIST[[i]][[1]][,2])
options(scipen = 999)
Egger <- metabias(LIST[[i]][[2]], method.bias = "linreg", k.min = 1)
Begg <- metabias(LIST[[i]][[2]], method.bias = "rank", k.min = 1)
Result <- c(dup_OR,dup_SE,Egger,Begg)
return(Result)
}
myfun2(LIST)
for (i in 1:33) {
LIST[[i]] = c(LIST, list(myfun2(LIST))) ## This one is not working !
}
I would like to obtain a final list of 33 items, each containing the values of res, dup_OR, dup_SE, Egger and Begg. These values vary with the values of res. Thanks for your help.
Here is my original script :
setwd("U:/Stage M2 Phame")
library(readxl)
library(meta)
data <- read_excel("Tableau_OR.xlsx")
OR <- ((data$`Odds Ratio`[data$`Identification number`==number_meta[i]]))
SE <- (((log(data$`Upper limit`) - (log(data$`Lower limit`)))/3.92)[data$`Identification number`==number_meta[i]])
dup_OR <- duplicated(OR)
dup_SE <- duplicated(SE)
options(scipen = 999)
res <- metagen(TE=log(OR),seTE=SE,sm="OR")
Egger <- metabias(res, method.bias = "linreg", k.min = 5)
Begg <- metabias(res, method.bias = "rank", k.min = 5)
Trim <- trimfill(res)
LIST=list(dup_OR, dup_SE, Egger, Begg, Trim)
Sorry for my whole block of text.
How about this (I have taken the liberty of generating a minimal working example):
## define a function that appends something to an existing list
appendtolist = function(oldlist, add_element){
  # inherits() also works for objects that carry more than one class
  if(inherits(add_element, "list")){
    oldlist = c(oldlist, add_element)
  } else {
    oldlist[[length(oldlist) + 1]] = add_element
  }
  return(oldlist)
}
## define a test list
firstlist = list("a", c(1:10), "test")
## add content to the first list
newlist = appendtolist(firstlist, c(1:1000))
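Applied to the question's loop, the same idea would be to extend each element with its own results. The failing line combines the whole LIST into element i, and myfun2() reads the global i instead of receiving one element. Below is a minimal sketch with toy stand-ins for the extraction and processing functions. The names extract_item and process_item are hypothetical and the meta package is not used.

```r
# hypothetical stand-ins for the extraction and processing functions
extract_item <- function(x) list(values = x, total = sum(x))
process_item <- function(item) {
  list(mean = mean(item$values), dup = duplicated(item$values))
}

inputs <- list(c(1, 2, 2), c(4, 5, 6))
LIST <- lapply(inputs, extract_item)

# extend element i with the values computed from element i only
for (i in seq_along(LIST)) {
  LIST[[i]] <- c(LIST[[i]], process_item(LIST[[i]]))
}
str(LIST[[1]])
```

Each element now holds both the extracted and the processed values, and the list keeps its original length.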

name columns of list element with same name

My code looks like this:
unemp <- c(1:10)
bsp_li <- list(c(1:10),c(11:20),c(21:30))
var_data_rep <- lapply(bsp_li, function(x) {cbind(as.numeric(x), as.numeric(unemp))} )
var_data_rep2 <- lapply(var_data_rep, function(x) {colnames(x) = c("rGDP", "U")} )
But it does not do what I wanted. I would like the two columns of every element of var_data_rep to be named c("rGDP", "U"). Instead, the values are overwritten by c("rGDP", "U"), which becomes the sole content of each list element. Can anyone help? I need the same names because I want to estimate the same model each time later.
Easy fix: put the names in as the matrices are created:
var_data_rep <- lapply(bsp_li, function(x) {
cbind(rGDP = as.numeric(x), U = as.numeric(unemp))
} )
More roundabout fix (why your attempt didn't work): a function returns the value of its last expression, and the value of the assignment colnames(x) = ... is the right-hand side. You want to return x, not the names:
var_data_rep2 <- lapply(var_data_rep, function(x) {
colnames(x) = c("rGDP", "U")
return(x)
} )
Fancy fix: use the colnames<- function directly
var_data_rep3 = lapply(var_data_rep, `colnames<-`, c("rGDP", "U"))
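To see the difference in isolation, here is a small sketch with a toy matrix. Both function names are hypothetical.

```r
m <- cbind(1:3, 4:6)

# last expression is the assignment, so the result is its right-hand side
rename_last_is_assignment <- function(x) {
  colnames(x) <- c("rGDP", "U")
}

# last expression is x, so the renamed matrix comes back
rename_returns_x <- function(x) {
  colnames(x) <- c("rGDP", "U")
  x
}

out1 <- rename_last_is_assignment(m)
out2 <- rename_returns_x(m)
print(out1)            # "rGDP" "U"
print(colnames(out2))  # "rGDP" "U"
print(dim(out2))       # 3 2
```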

Simplify ave() or aggregate() with several inputs

How can I write this all in one line?
mydata is a "zoo" series, limit is a numeric vector of the same size
tmp <- ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cummax(x)-x)
tmp <- (tmp < limit)
final <- ave(tmp, as.Date(index(mydata)),
FUN = function(x) cumprod(x))
I've tried to use two vectors as argument to ave(...) but it seems to accept just one even if I join them into a matrix.
This is just an example, but any other function could be used.
Here I need to compare the value of cummax(mydata)-mydata with a numeric vector and
once it surpasses it I'll keep zeros till the end of the day. The cummax is calculated from the beginning of each day.
If limit were a single number instead of a vector (with different possible numbers) I could write it:
ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cumprod((cummax(x) - x) < limit))
But I can't introduce there a vector longer than x (it should have the same length as each day) and I don't know how to introduce it as another argument in ave().
It seems this routine imposes an intraday stop-loss based on maximum drawdown. So I assume you want to pass the variable limit as a second argument to your aggregation function, which currently takes only one vector because of the way ave works.
If putting all this in one line is not an absolute must, I can share a function I've written that generalizes aggregation via "cut variables". Here's the code:
mtapplylist2 <- function(t, IDX, DEF, MoreArgs=NULL, ...)
{
  if(mode(DEF) != "list")
  {
    cat("Definition must be list type\n");
    return(NULL);
  }
  a <- c();
  colnames <- names(DEF);
  for ( i in 1:length(DEF) )
  {
    def <- DEF[[i]];
    func <- def[1];
    if(mode(func) == "character") { func <- get(func); }
    cols <- def[-1];
    # build the argument list passed to mapply
    arglist <- list();
    arglist[[1]] <- func;
    for( j in 1:length(cols) )
    {
      col <- cols[j];
      grp <- split(t[,col], IDX);
      arglist[[1+j]] <- grp;
    }
    arglist[["MoreArgs"]] <- MoreArgs;
    v <- do.call("mapply", arglist);
    # print(class(v)); print(v);
    # is.matrix() rather than class(v) == "matrix": since R 4.0 a matrix has class c("matrix", "array")
    if(is.matrix(v))
    {
      a <- cbind(a, as.vector(v));
    } else {
      a <- cbind(a, v);
    }
  }
  colnames(a) <- colnames;
  return(a);
}
And you can use it like this:
# assuming you have the data in the data.frame
df <- data.frame(date=rep(1:10,10), ret=rnorm(100), limit=rep(c(0.25,0.50),50))
dfunc <- function(x, ...) { return(cummax(x)-x ) }
pfunc <- function(x,y, ...) { return((cummax(x)-x) < y) }
# assumes you have the function declared in the same namespace
def <- list(
"drawdown" = c("dfunc", "ret"),
"hasdrawdown" = c("pfunc", "ret", "limit")
);
# from R console
> def <- list("drawdown" = c("dfunc", "ret"),"happened" = c("pfunc","ret","limit"))
> dim( mtapplylist2(df, df$date, def) )
[1] 100 2
Notice that "def" is a named list. Each entry provides:
- its name: the computed column name
- first element: the name of the vectorized function, as a string
- remaining elements: the names of the columns in the input data.frame that are passed to the function
If you look at the guts of "mtapplylist2" function, the key components would be "split" and "mapply". These functions are sufficiently fast (I think split is implemented in C).
This works with functions requiring multiple arguments, and also for functions returning vector of the same size or aggregated value.
Try it out and let me know if this solves your problem.
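A shorter alternative to the function above (not the answerer's method) is to call ave() on the row indices instead of the values. FUN then receives the indices of one group and can use them to index any number of parallel vectors. This is a minimal sketch in which plain numeric vectors stand in for coredata(mydata), as.Date(index(mydata)) and limit.

```r
# toy stand-ins for the series values, the day of each observation and the limits
x     <- c(1, 3, 2, 5, 1, 1, 4, 2)
day   <- c(1, 1, 1, 1, 2, 2, 2, 2)
limit <- c(2, 2, 0.5, 2, 1, 1, 1, 1)

# i holds the positions belonging to one day, so x[i] and limit[i] stay aligned
final <- ave(seq_along(x), day, FUN = function(i) {
  cumprod((cummax(x[i]) - x[i]) < limit[i])
})
print(final)
```

On day 1 the drawdown first reaches its limit at the third observation, so the result is 1 1 0 0. On day 2 that happens only at the last observation, giving 1 1 1 0.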
