dim(X) must have a positive length - r

I am trying to call an Azure Machine Learning web service from Microsoft Power BI (a visualization tool) through R. The process requires the input to be given as a list, so I am converting my input to a list in R. Below is my code.
dataset <- data.frame(sqlQuery(conn, "SELECT * FROM dbo.Automobile"))
close(conn)
if(nrow(dataset)>0)
{
dataset <- dataset[,c(-1, -14)]
dataset <- na.omit(dataset)
createList <- function(dataset)
{
temp <- apply(dataset, 1, function(x) as.vector(paste(x, sep = "")))
colnames(temp) <- NULL
temp <- apply(temp, 2, function(x) as.list(x))
return(temp)
}
...
I am very new to R, so the above code is taken directly from Power BI's documentation. But it gives the following error:
dim(X) must have a positive length
I tried googling this error and applied some of the workarounds like
1. using lapply function
2. adding drop=F
but kept on returning errors.
Can anyone help me with this ?
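For reference, here is a minimal sketch of one common way to hit this message: apply() is handed a plain vector instead of a data frame or matrix, so dim(X) is NULL. Whether this is the cause in the code above depends on what dataset actually contains at that point, which the question does not show. The data frame df and its columns are made up for illustration.

```r
# Hypothetical data, not the Automobile table from the question.
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

one_col_vec <- df[, 1]                # single column drops to a vector; dim() is NULL
one_col_df  <- df[, 1, drop = FALSE]  # stays a 3 x 1 data frame

# apply() needs something with dimensions, so the vector triggers the error.
msg <- tryCatch(
  apply(one_col_vec, 1, paste),
  error = function(e) conditionMessage(e)
)
print(msg)

# With the dimensions kept, the same call works.
ok <- apply(one_col_df, 1, paste)
print(ok)
```

Checking dim(dataset) and class(dataset) just before the first apply() call is a quick way to see whether this applies.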

Related

Returning objects with lapply

A small question. I'm trying to get the values for the network indices (fl_wp_mod, fl_wp_den) stored in separate variables after running the function. I tried this but I'm not able to get it to work.
Any idea why?
Sorry, I'm new to lapply and R in general.
cluster_modularity = function(graph_object){
fl_wp_ig <- graph_from_incidence_matrix(graph_object)
fl_wp_cw <- cluster_walktrap(fl_wp_ig)
fl_wp_mod <- modularity(fl_wp_cw)
fl_den <- edge_density(fl_wp_ig, loops = FALSE)
return(c(fl_wp_mod, fl_den))
}
Mod = lapply(fl_wp_n, cluster_modularity[1]) #fl_wp_n is the raw data
Den = lapply(fl_wp_n, cluster_modularity[2]) #Both these lines are giving me errors
Using sapply is more convenient here.
ans <- sapply(fl_wp_n, cluster_modularity)
Mod <- ans[1, ]
Den <- ans[2, ]
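The original lines fail because cluster_modularity[1] tries to subset the function itself rather than its result. As a small sketch of the sapply pattern that runs without igraph, the stand-in function two_stats below (a made-up name) also returns two numbers per input; sapply then binds the results into a matrix with one column per input and one row per returned value.

```r
# Stand-in for cluster_modularity: returns two values per input.
two_stats <- function(x) c(mean(x), sd(x))

# Hypothetical inputs in place of fl_wp_n.
inputs <- list(a = c(1, 2, 3), b = c(10, 20, 30))

ans <- sapply(inputs, two_stats)  # 2 x 2 matrix: rows = statistics, columns = inputs
first_stat  <- ans[1, ]           # the first returned value for every input
second_stat <- ans[2, ]           # the second returned value for every input

print(ans)
```

If the function returns a named vector, such as c(mod = ..., den = ...), the rows can be selected by name instead of by position.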

Problems with changing a function variable inside another lower function in R?

I need to open files (matrices) from a directory and apply a function pca to each one. It uses another function, count_pc, which is meant to zero out diagonals in the matrix step by step and add the recalculated PC1 to the table pcs from the calling function. At first I didn't think about environments, so count_pc was crashing with the error "unknown variable". Then I tried to do it this way:
files <- list.files()
count_pc <- function(x, env = parent.frame()) {
diag(file[x:nrow(file),]) <- 0
diag(file[,x:nrow(file)]) <- 0
pcn <- prcomp(file, scale = FALSE)
pcn <- data.frame(pcn$rotation)
pcs <- cbind(pcs, pcn$PC1)
}
pca <- function(filename) {
file <- as.matrix(read.table(filename))
pc <- prcomp(file, scale = FALSE)
pc <- data.frame(pc$rotation)
pc1 <- pc$PC1
pcs <- data.frame(pc1)
for (k in 1:40) {
count_pc(k)
}
new_filename <- strsplit(filename, "_")[[1]][3]
print(pcs)
colnames(pcs) <- paste0(0:40, rep("_bins_deleted", 40))
write.table(pcs, file=paste(new_filename, "eigenvectors", sep="_"))
return(apply(pcs, 2, cor, y = pc1))
}
ldply(files, pca)
And indeed, count_pc no longer crashes with the above error but, unfortunately, it crashes with a new one:
"colnames<-`(`*tmp*`, value = c("0_bins_deleted", "1_bins_deleted", :
'names' [41] attribute must be the same length as the vector [1]"
which means that count_pc does not change the variables it needs to. At first I thought the problem might be connected with using sapply(1:40, count_pc), so I replaced it with a for loop, but that didn't help. I've also tried using environment(count_pc) <- environment() inside pca, but that didn't help either (nor did changing variable names in count_pc to env$'name'). I don't know what to do, and googling doesn't seem to help.
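A minimal sketch of one way around this, under stated assumptions: assignments inside a function only change that function's local copies, so the helper can return the new column and the caller can do the cbind. The helper name pc1_after_zeroing, the random 6 x 6 matrix, and the loop over 1:3 are all made up for illustration. The diag() lines are copied from the question as-is, so whether they zero the intended cells is not checked here.

```r
set.seed(1)
m <- matrix(rnorm(36), 6, 6)  # hypothetical stand-in for one input file

# Returns the recalculated PC1 instead of trying to modify the caller's `pcs`.
pc1_after_zeroing <- function(m, k) {
  n <- nrow(m)
  diag(m[k:n, ]) <- 0  # changes only the local copy of m
  diag(m[, k:n]) <- 0
  prcomp(m, scale. = FALSE)$rotation[, "PC1"]
}

pcs <- data.frame(pc1 = prcomp(m, scale. = FALSE)$rotation[, "PC1"])
for (k in 1:3) {
  # The caller owns pcs, so the caller performs the update.
  pcs <- cbind(pcs, pc1_after_zeroing(m, k))
}

# One name per column: the original PC1 plus one column per k.
colnames(pcs) <- paste0(0:3, "_bins_deleted")
print(dim(pcs))
```

With this layout the number of names always matches the number of columns, because both are driven by the same loop range.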

Running 'xlsx' processes in parallel, using the 'parallel' R package

I have a project where I need to process some data from an Excel file with R. I must use the 'xlsx' package because of some specific functions.
First, I wrote a script, which works as expected without errors.
options(java.parameters = "-Xmx4096m") #for extra memory
library(xlsx)
wb <- loadWorkbook(file = "my_excel.xlsx")
sheet1 <- getSheets(wb)[[1]]
rows <- getRows(sheet1)
make_df <- function (x) {
cells <- getCells(rows[x])
styles <- sapply(cells, getCellStyle)
cellColor <- function(style) {
fg <- style$getFillForegroundXSSFColor()
rgb <- tryCatch(fg$getRgb(), error = function(e) NULL)
rgb <- paste(rgb, collapse = "")
return(rgb)
}
colors <- sapply(styles, cellColor)
if (!any(colors == "ff0000")) {
df[nrow(df) + 1, ] <- sapply(cells, getCellValue) #I define this 'df' somewhere in the code; this part could be improved
}
}
df <- sapply(1 : length(rows), make_df)
In short, I am looking for the rows in the Excel file that contain no red-colored cells, as described here. The problem is that the Excel file is very big, and it takes a long time to process.
What I'd like to do is to run the row checking in parallel, to be more efficient, so I added:
cl = makeCluster(detectCores() - 1)
clusterEvalQ(cl=cl, c(library(xlsx))) #sharing the package with the workers
clusterExport(cl = cl, c('rows')) #sharing the 'rows' variable with the workers
df <- parSapply(cl, 1 : length(rows), make_df)
And after running this, I get the following error:
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: RcallMethod: attempt to call a method of a NULL object.
I tried the parallelization with another example, without using 'xlsx' functions, and it worked.
After some digging, I found this post which offered somewhat of an answer (more like a workaround), but I can't seem to be able to implement it.
Is there a clean way to do what I'm trying to do here?
If not, then what would be the best solution in this case?

R - For Cycle and Apply function (Quantmod)

Hi,
I have this data frame, and I want to download the data for each symbol from Yahoo and calculate the percent change (the Delt function in quantmod).
View(Equity)
Symbol
1 A
2 AA
3 AAC
I wrote a loop:
m<-nrow(Equity)
for (i in 1:m) {
EquityDF <- Equity[i,]
Data<-getSymbols(EquityDF,src="yahoo")
Delt[i]<-apply(EquityDF[,1:5], 2, function(x) Delt(x, k=1)*100)
}
But I received this error:
Error in EquityDF[, 1:5] : incorrect number of dimensions
I know why this error appears: if I print
EquityDF
the output is
"A"
How can I fix this?
Thanks
This happens because EquityDF is still a character string (the symbol name), not the downloaded data. To retrieve the corresponding data you must use get: get(EquityDF)[, 1:5]
Additionally, I'd suggest calling getSymbols only once, so that you retrieve all the data you need in a single call. Your code can then be simplified to:
Equity <- data.frame(Symbol = c("A","AA","AAC"), stringsAsFactors = FALSE)
getSymbols(Equity[, 1], src="yahoo")
Delt <- lapply(mget(Equity[, 1]), function(y){
apply(y[, 1:5], 2, function(x) Delt(x, k=1)*100)})
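The get/mget lookup can be tried offline with a small sketch. The objects AAA and BBB below are made-up stand-ins for what getSymbols would create, and the percent change is computed with base R rather than quantmod's Delt, so this only illustrates the name-to-object step.

```r
# Hypothetical stand-ins for downloaded price data.
AAA <- data.frame(Open = c(10, 11, 12), Close = c(11, 12, 15))
BBB <- data.frame(Open = c(20, 22, 21), Close = c(22, 21, 23))

symbols <- c("AAA", "BBB")

# symbols[1] is just the text "AAA"; get() fetches the object with that name.
one_series <- get(symbols[1])

# mget() does the same for several names and returns a named list.
all_series <- mget(symbols)

# Percent change per column, using base R in place of Delt().
pct_change <- lapply(all_series, function(d) {
  apply(d, 2, function(x) diff(x) / head(x, -1) * 100)
})
print(pct_change$AAA)
```

The result is a list with one matrix per symbol, which mirrors the shape of the lapply(mget(...)) call above.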

How to debug "invalid subscript type 'list'" error in R (genalg package)

I am new to genetic algorithms and am trying a simple variable selection code based on the example on genalg package's documentation:
data(iris)
library(MASS)
X <- cbind(scale(iris[,1:4]), matrix(rnorm(36*150), 150, 36))
Y <- iris[,5]
iris.evaluate <- function(indices) {
result = 1
if (sum(indices) > 2) {
huhn <- lda(X[,indices==1], Y, CV=TRUE)$posterior
result = sum(Y != dimnames(huhn)[[2]][apply(huhn, 1,
function(x)
which(x == max(x)))]) / length(Y)
}
result
}
monitor <- function(obj) {
minEval = min(obj$evaluations);
plot(obj, type="hist");
}
woppa <- rbga.bin(size=40, mutationChance=0.05, zeroToOneRatio=10,
evalFunc=iris.evaluate, verbose=TRUE, monitorFunc=monitor)
The code works just fine on its own, but when I try to apply my dataset (here), I get the following error:
X <- reducedScaledTrain[,-c(541,542)]
Y <- reducedScaledTrain[,542]
ga <- rbga.bin(size=540, mutationChance=0.05, zeroToOneRatio=10,
evalFunc=iris.evaluate, verbose=TRUE, monitorFunc=monitor)
Testing the sanity of parameters...
Not showing GA settings...
Starting with random values in the given domains...
Starting iteration 1
Calucating evaluation values... Error in dimnames(huhn)[[2]][apply(huhn, 1, function(x) which(x == max(x)))] :
invalid subscript type 'list'
I am trying to perform feature selection on 540 variables (I've eliminated the variables with 100% correlation) using LDA. I've tried transforming my data into numeric or list, but to no avail. I have also tried entering the line piece by piece, and the 'huhn' line works just fine with my data. Please help, I might be missing something...
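One possible cause, offered as a guess since the data set is not shown: apply() returns a list instead of a vector when the function returns results of different lengths. That happens with which(x == max(x)) if a row has a tie for the maximum posterior, and a list cannot be used as a subscript. The small matrix post below is made up to show the effect, with which.max as one alternative that returns a single index even for a tied row.

```r
# Hypothetical posterior matrix: row 1 is a tie, row 2 is not.
post <- rbind(c(a = 0.5, b = 0.5),
              c(a = 0.9, b = 0.1))

# Row 1 yields two indices and row 2 yields one, so apply() returns a list.
idx_list <- apply(post, 1, function(x) which(x == max(x)))
print(is.list(idx_list))

# which.max() returns the first maximum only, so the result stays a vector.
idx_vec <- apply(post, 1, which.max)
print(colnames(post)[idx_vec])
```

Rows whose posteriors are all NA or NaN would be a separate case worth checking, for example with anyNA(huhn), since which.max returns an empty result for such a row.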
