Binded feature is not recognized - r

I am using Titanic dataset, to extract some data out of it.
and then I tried to extract the rules based on Aprior :
library(arules)
rules<- apriori(df)
Then I asked for two metrics lift, and oddsRatio
metrics <- interestMeasure(rules, c("oddsRatio","lift"),transactions = df)
rules<-sort(rules, decreasing = TRUE, by = "lift")
inspect(head(rules ))
head(metrics)
But, I need to sort the results based on oddsRatio, so I did
dataFramedRules <-quality(rules)
rules<-cbind(dataFramedRules,metrics)
Everything was well before the last line
rules<-sort(rules, decreasing = TRUE, by = "oddsRatio")
but in the last line it complains with:
Error in [.data.frame(x, order(x, na.last = na.last, decreasing = decreasing)) :
undefined columns selected
it seems that it cannot recognized the binded column oddsRatio.
How can I fix it?

The problem get solved by the following code:
quality(rules) <- cbind(quality(rules), oddsRatio = interestMeasure(rules, measure ="oddsRatio", df))
inspect(head(sort(rules, by = "oddsRatio")))

Related

Consistent error message while running grouping analysis in 'plspm' package

I am looking for some help in resolving an error using the partial least squares path modeling package ('plspm').
I can get results running a basic PLS-PM analysis but run into issues when using the grouping function, receiving the error message:
Error in if (w_dif < specs$tol || iter == specs$maxiter) break : missing value where TRUE/FALSE needed
I have no missing values and all variables have the proper classification. Elsewhere I read that there is a problem with processing observations with the exact same values across all variables, I have deleted those and still face this issue. I seem to be facing the issue only when I run the groups using the "bootstrap" method as well.
farmwood = read.csv("farmwood_groups(distance).csv", header = TRUE) %>%
slice(-c(119:123))
Control = c(0,0,0,0,0,0)
Normative = c(0,0,0,0,0,0)
B_beliefs = c(0,0,0,0,0,0)
P_control = c(1,0,0,0,0,0)
S_norm = c(0,1,0,0,0,0)
Behavior = c(0,0,1,1,1,0)
farmwood_path = rbind(Control, Normative, B_beliefs, P_control, S_norm, Behavior)
colnames(farmwood_path) = rownames(farmwood_path)
farmwood_blocks = list(14:18,20:23,8:13,24:27,19,4:7)
farmwood_modes = rep("A", 6)
farmwood_pls = plspm(farmwood, farmwood_path, farmwood_blocks, modes = farmwood_modes)
ames(farmwood)[names(farmwood) == "QB3"] <- "Distance"
farmwood$Distance <- as.factor(farmwood$Distance)
distance_boot = plspm.groups(farmwood_pls, farmwood$Distance, method = "bootstrap")
distance_perm = plspm.groups(farmwood_pls, farmwood$Distance, method = "permutation")
The data is contained here:
https://www.dropbox.com/s/8vewuupywpi1jkt/farmwood_groups%28distance%29.csv?dl=0
Any help would be appreciated. Thank you in advance

Making a function that builds a dataframe

I'm trying to make a function that basically builds a dataframe and returns it. This new dataframe is made of columns taken from another dataframe that I have, called metadata.. in addetion to some additional data that I want to control, by passing the TRUE or FALSE values when calling the function.
Here is what I did:
make_data = function(metric, use_additions = FALSE){
data = data.frame(my_metric = metadata[['metric']], gender = metadata$Gender ,
age = as.numeric(metadata$Age) , use_additions = t(additional_data))
data = data %>% dplyr::select(my_metric, everything())
return(data)
}
data = make_data(CR, FALSE)
I want to pass different metric values each time, and all other features stay the same. So here for example I called the function with metric as CR which is the name of the column I want in the metadata. The argument I want to control is use_additions, sometines I want to add it and sometimes I don't.
metadata and additional_data have the exact same row names and the same rows number. It's just adding the data or not.
I get this error(s):
Error in data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
In addition: Warning message:
In data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
Error in data.frame(my_metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
I've tried several ways to do this, with '' and without, using the $, but non of these worked. So for example when I type metric = metadata[[metric]] I get this:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'CR' not found
make_data = function(colname, use_additions = FALSE){
data = data.frame(my_metric = metadata[colname], gender = metadata$Gender ,
age = as.numeric(metadata$Age))
if (use_additions) data$use_additions=additional_data
return(data)
}
data = make_data(“CR”, FALSE)

Clustering by M3C package : Error in `[.data.frame`(df, neworder2) : undefined columns selected

I had a similar problem to what posted here. To resolve the issue, followed the answer by #Jack Gisby there. Now a new error showed up:
Working on TCGA data , I am getting the same error (first error):
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
running duplicated() on each relevant field returned FALSE.
Her is the second error (just after trimming identifiers to not start with a common string like "TCGA-"):
Error in `[.data.frame`(df, neworder2) : undefined columns selected
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(df, neworder2)
3: df[neworder2]
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem,
pFeature = 1, clusterAlg = clusteralg, distance = distance,
title = "/home/christopher/Desktop/", des = des, lthick = lthick,
dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots,
silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C(pro.vst, des = clin, removeplots = FALSE, iters = 25, objective = "PAC",
fsize = 8, lthick = 1, dotsize = 1.25)
I've added to an opened issue on the M3C GitHub.
I got the same error as Hamid Ghaedi while running M3C. I managed to track it down to the following line of code (line 476 on the M3C.R file):
df <- data.frame(m_matrix)
Many of my sample names (column names) started with a number and the data.frame() function added an "X" to the beginning of each name that started with a number ("1" becomes "X1"). This caused a mismatch with the names listed in neworder2.
To get around this problem, I changed all of my sample names to start with a letter and M3C is now running correctly.
Edit: This workaround can be easily applied by using the data.frame() function on your input dataset before running M3C.

arulesViz subscript out of bounds paracoord

I want to perform basket analysis and draw a paracoord plot however I receive an error.
Content of this error is: :
Error in m[j, i] : subscript out of bounds.In addition: Warning message:
In cbind(pl, pr) :
number of rows of result is not a multiple of vector length (arg 2)
I am using data from: Link.
First I am transforming this to fit basket analysis, name of the original excel files is Online_Retail:
library(arules)
library(arulesViz)
library(plyr)
items <- ddply(Online_Retail, c("CustomerID", "InvoiceDate"), function(df1)paste(df1$Description, collapse = ","))
items1 <- items["V1"]
write.csv(items1, "groceries1.csv", quote=FALSE, row.names = FALSE, col.names = FALSE)
trans1 <- read.transactions("groceries1.csv", format = "basket", sep=",",skip=1)
And to draw paracoord I have created such a code:
rules.trans2<-apriori(data=trans1, parameter=list(supp=0.001,conf = 0.05),
appearance=list(default="rhs", lhs="ROSES REGENCY TEACUP AND SAUCER"), control=list(verbose=F))
sorted.plot <- sort(rules.trans2, by="support", decreasing = TRUE)
plot(sorted.plot, method="paracoord", control=list(reorder=TRUE, verbose = TRUE))
Why my code for paracoord is not working? how can I fix it? What should I change?
This is, unfortunately, a bug in arulesViz. This will be fixed in the next release (arulesViz 1.3-3). The fix is already available in the development version on GitHub: https://github.com/mhahsler/arulesViz

Error in as(x, class(k)) : no method or default for coercing “NULL” to “data.frame”

I am currently facing an error mentioned below which is related to NULL values being coerced to a data frame. The data set does contain nulls, however I have tried both is.na() and is.null() functions to replace the null values with something else. The data is stored on hdfs and is stored in a pig.hive format. I have also attached the code below. The code works fine if I remove v[,25] from the key.
Code:
AM = c("AN");
UK = c("PP");
sample.map <- function(k,v){
key <- data.frame(acc = v[!which(is.na(v[,1],1],
year = substr(v[!which(is.na(v[,1]),2],1,4),
month = substr(v[!which(is.na(v[,1]),2],5,6))
value <- data.frame(v[,3],count=1)
keyval(key,value)
}
sample.reduce <- function(key,v){
AT <- sum(v[which(v[,1] %in% AM=="TRUE"),2])
UnknownT <- sum(v[which(v[,1] %in% UK=="TRUE"),2])
Total <- AT + UnknownT
d <- data.frame(AT,UnknownT,Total)
keyval(key,d)
}
out <- mapreduce(input ="/user/hduser/input",
output = "/user/hduser/output",
input.format = make.input.format("pig.hive", sep = "\u0001")
output.format = make.output.format("csv", sep = ","),
map= sample.map)
reduce = sample.reduce)
Error:
Warning in asMethod(object) : NAs introduced by coercion
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(x, x, FALSE, keep.rownames = FALSE) : number of items to replace is not a multiple of replacement length Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) :
data length is not a multiple of split variable
Warning in rmr.split(v, ind, lossy = lossy, keep.rownames = TRUE) : number of items to replace is not a multiple of replacement length
Error in as(x, class(k)) :
no method or default for coercing “NULL” to “data.frame”
Calls: <Anonymous> ... apply.reduce -> c.keyval -> reduce.keyval -> lapply -> FUN -> as No traceback available
UPDATE
I have added the sample data and edited the code above. Hope this helps!
Sample Data:
NULL,"2014-03-14","PP"
345689202,"2014-03-14","AN"
234539390,"2014-03-14","PP"
123125444,"2014-03-14","AN"
NULL,"2014-03-14","AN"
901828393,"2014-03-14","AN"
There are some issues with as which have been identified recently. I don't see why as can't handle this by default, but you can modify coerce which handles the conversion with an S4 method to call as.data.frame.
setMethod("coerce",c("NULL","data.frame"), function(from, to, strict=TRUE) as.data.frame(from))
[1] "coerce"
as(NULL,"data.frame")
data frame with 0 columns and 0 rows

Resources