I tried running this code and I am getting this error message:
"Must subset columns with a valid subscript vector. Can't convert from double to integer due to loss of precision." Could someone help me either fix the function or convert the input so that it recognizes the data frame columns appropriately?
data1 <- wins.df(data1, data1$q, wins.limits = c(.01, .99), append.wins.label = FALSE, verbose = TRUE)
Here is the function:
wins.df <- function(X,
                    var,
                    wins.limits = c(.01, .99),
                    append.wins.label = TRUE,
                    verbose = TRUE) {
  Y <- X
  x <- X[, var]
  x.w <- wins(x, wins.limits)
  var.w <- var
  if (append.wins.label)
    var.w <- paste(var, ".w", sep = "")
  Y[, var.w] <- x.w
  if (verbose) {
    print(summary(Y[, var.w]))
    print(summary(X[, var]))
  }
  return(Y)
}
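A minimal sketch of the fix, assuming the call should target the column named q: wins.df() subsets with X[, var], so var needs to be the column name "q", not the values data1$q. The error text looks like it comes from tibble column subsetting, which rejects fractional column positions. The question does not show wins(), so the version below is a hypothetical stand-in, and the toy data are made up.

```r
# Hypothetical wins(): the question does not show it, so this stand-in simply
# clamps x to its own quantiles at wins.limits (a basic winsorization).
wins <- function(x, wins.limits = c(.01, .99)) {
  lims <- quantile(x, probs = wins.limits, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, lims[1]), lims[2])
}

# Toy data (made up for illustration); q has one extreme value
data1 <- data.frame(id = 1:5, q = c(2.5, 3.1, 2.8, 3.0, 250))

# wins.df() does X[, var], so var must be a column *name*.
# data1$q would instead pass the values 2.5, 3.1, ..., which are then
# treated as column positions; a tibble refuses the fractional ones.
var <- "q"
x <- data1[[var]]                     # [[ ]] gives a plain vector for data frames and tibbles
data1[[var]] <- wins(x, c(.01, .99))  # extreme values pulled in to the 1% / 99% quantiles
```

With the question's wins.df() and a wins() like this in scope, the call would become data1 <- wins.df(data1, "q", wins.limits = c(.01, .99), append.wins.label = FALSE, verbose = TRUE). If data1 is a tibble, X[, var] inside wins.df() returns a one-column tibble rather than a vector, so X[[var]] may be the safer choice there.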
I tried to run this code (with no success):
for (i in chanel_code) {
assign(paste("prospect_",i,sep="", collapse = NULL,recycle0 = FALSE),(aggregate(na_adss_score ~ month_year + na_appl_status, paste("new_account_",i, sep="",collapse = NULL, recycle0 = FALSE),mean) %>%
mutate(aggregate(na_pcn_no ~ month_year + na_appl_status, paste("new_account_",i, sep="",collapse = NULL, recycle0 = FALSE), length))))
}
Error in eval(predvars, data, env) :
invalid 'envir' argument of type 'character'
Would you mind helping me, please?
Thanks in advance.
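The error comes from passing a character string (for example "new_account_BA") to aggregate() where it expects a data frame. You don't need assign() or all the arguments of paste(): subset your data with i, and use paste0() to build the names of the results.
Assuming your data set is called new_account_BA and its chanel_code column contains the channel codes: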
chanel_code <- c("BA", "CS", "DM", "DS", "EN", "IA", "MG", "PS", "TM")
results <- list()
for (i in chanel_code) {
  # rows of the combined data set that belong to channel i
  x <- new_account_BA[new_account_BA$chanel_code == i, ]
  results[[paste0("prospect_", i)]] <- cbind(
    aggregate(na_adss_score ~ month_year + na_appl_status, data = x, FUN = mean),
    aggregate(na_pcn_no ~ month_year + na_appl_status, data = x, FUN = length))
}
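If the data instead live in separate data frames named new_account_BA, new_account_CS, and so on (as paste("new_account_", i) in the question suggests), get() can fetch each one by name. A sketch with made-up toy data; make_toy() is a hypothetical helper, and merge() is used here instead of cbind() so the two summaries are joined on their grouping columns rather than on row order.

```r
# Toy stand-ins for the question's per-channel data frames (made-up data)
make_toy <- function(n = 20) {
  data.frame(
    month_year     = rep(c("2021-01", "2021-02"), length.out = n),
    na_appl_status = rep(c("A", "D"), each = n / 2),
    na_adss_score  = seq(500, by = 10, length.out = n),
    na_pcn_no      = seq_len(n)
  )
}
new_account_BA <- make_toy()
new_account_CS <- make_toy()

chanel_code <- c("BA", "CS")
results <- list()
for (i in chanel_code) {
  # get() returns the object whose *name* is the string, e.g. new_account_BA
  dat <- get(paste0("new_account_", i))
  scores <- aggregate(na_adss_score ~ month_year + na_appl_status, data = dat, FUN = mean)
  counts <- aggregate(na_pcn_no ~ month_year + na_appl_status, data = dat, FUN = length)
  # join on the grouping columns instead of relying on row order
  results[[paste0("prospect_", i)]] <- merge(scores, counts,
                                             by = c("month_year", "na_appl_status"))
}
```

Each element of results (results$prospect_BA, results$prospect_CS, ...) then holds the mean score and the row count per month and application status.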
In the EntropyRegularizedLogisticRegression function from the RSSL package, the example in the documentation passes the formula Class ~ . as the input:
library(RSSL)
library(magrittr)  # provides %>%
set.seed(1)
df <- generateSlicedCookie(1000, expected = FALSE) %>%
  add_missinglabels_mar(Class ~ ., 0.98)
class_erlr <- EntropyRegularizedLogisticRegression(Class ~ ., df, lambda = 0.01, lambda_entropy = 100)
I was looking at the source code, and these are the parameters of the function:
function (X, y, X_u = NULL, lambda = 0, lambda_entropy = 1, intercept = TRUE,
init = NA, scale = FALSE, x_center = FALSE)
I tried manually defining what X, y, X_u are based on the df I generated. But running the following gives me an error with the optimization:
y <- df$Class
X <- df[, -1]
ids <- which(is.na(y))
X_u <- X[ids, ]
class_erlr_manual <- EntropyRegularizedLogisticRegression(X = X, y = y, X_u = X_u, lambda=0.01,lambda_entropy = 100)
The error reads:
Error in optim(w, fn = loss_erlr, gr = grad_erlr, X, y, X_u, lambda = lambda, :
initial value in 'vmmin' is not finite
Why does changing the formula input Class ~. into X=X, y =y, X_u = X_u result in an error? Can anyone point me to where in the source code the formula input is being used?
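A likely cause, not confirmed against the RSSL source: in the manual call, X still contains the unlabeled rows and y still contains their NA labels, so the loss handed to optim() is not finite from the start. As far as we can tell, RSSL documents X as the design matrix for the labeled data and X_u as the design matrix for the unlabeled data, which is presumably the split the formula interface sets up for you. A sketch of doing that split by hand; selecting features by name avoids depending on where Class sits in df.

```r
library(RSSL)
set.seed(1)

# Same data as the documentation example, built without the pipe
df <- add_missinglabels_mar(generateSlicedCookie(1000, expected = FALSE), Class ~ ., 0.98)

labeled  <- !is.na(df$Class)              # rows that still have a label
features <- setdiff(names(df), "Class")   # predictors selected by name, not position

X   <- as.matrix(df[labeled, features])   # design matrix, labeled rows only
y   <- df$Class[labeled]                  # labels without NA
X_u <- as.matrix(df[!labeled, features])  # design matrix, unlabeled rows only

class_erlr_manual <- EntropyRegularizedLogisticRegression(
  X = X, y = y, X_u = X_u, lambda = 0.01, lambda_entropy = 100
)
```

If this still fails, comparing X, y and X_u with what the formula path builds internally is the next thing to check.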
I have defined a method for printing a vector with the class test:
print.test <- function(x, ...) {
  x <- formatC(
    as.numeric(x),
    format = "f",
    big.mark = ".",
    decimal.mark = ",",
    digits = 1
  )
  x[x == "NA"] <- "-"
  x[x == "NaN"] <- "-"
  print.default(x)
}
This works fine for the following:
a <- c(1000.11, 2000.22, 3000.33)
class(a) <- c("test", class(a))
print(a)
[1] "1.000,1" "2.000,2" "3.000,3"
This also works:
round(a)
[1] "1.000,0" "2.000,0" "3.000,0"
This does not:
median(a)
[1] 2000.22
class(median(a))
[1] "numeric"
Now my question is: do I need to write a custom method for this class to use median(), for example, and if so, what would it look like? Or is there another way, given that I simply want this class to print the data in a certain format?
The problem is that median.default returns an object of class numeric, so autoprinting the result does not call your custom print method.
The following method fixes that.
median.test <- function(x, na.rm = FALSE, ...) {
  y <- NextMethod()                 # calls median.default with the same arguments
  class(y) <- c("test", class(y))   # restore the class so print.test is used
  y
}
median(a)
#[1] "2.000,2"
As for the handling of NA values, I will first define another method for a base R function, c(). It is not strictly needed, but it saves some lines of code if objects of class test are used frequently.
c.test <- function(x, ...) {
  y <- NextMethod()                 # default c() drops the class
  class(y) <- c("test", class(y))
  y
}
b <- c(a, NA)
class(b)
#[1] "test" "numeric"
median(b)
#[1] "-"
median(b, na.rm = TRUE)
#[1] "2.000,2"
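If several summary functions should keep the class, one alternative to writing each method by hand is a small method factory. This is a sketch, not part of the answer above: retag_test() is a hypothetical helper that calls the given function on the unclassed vector and re-attaches "test", so print.test is used for the result.

```r
# Hypothetical helper: wrap a function so its result gets the "test" class back
retag_test <- function(f) {
  force(f)  # fix f now, not when the method is first called
  function(x, ...) {
    y <- f(unclass(x), ...)          # compute on the plain numeric vector
    class(y) <- c("test", class(y))  # restore the class so print.test is used
    y
  }
}

# Register S3 methods for median() and mean() in one line each
median.test <- retag_test(stats::median)
mean.test   <- retag_test(base::mean)

a <- c(1000.11, 2000.22, 3000.33)
class(a) <- c("test", class(a))
median(a)
mean(a)
```

Extra arguments such as na.rm = TRUE pass through ... to the wrapped function. Functions that return named vectors, such as quantile(), would lose their names in print.test, because as.numeric() drops them.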
EDIT.
The following defines a generic function wMedian, a default method and a method for objects of class "currency", as requested by the OP in a comment.
Note that there must be a method print.currency, which I don't redefine since it's exactly the same as print.test above. As for the other methods, I have made them simpler with the help of a new function, as.currency.
median.currency <- function(x, na.rm = FALSE, ...) {
  y <- NextMethod()
  as.currency(y)
}
c.currency <- function(x, ...) {
  y <- NextMethod()
  as.currency(y)
}
as.currency <- function(x) {
  class(x) <- c("currency", class(x))
  x
}
wMedian <- function(x, ...) UseMethod("wMedian")
wMedian.default <- function(x, ...) {
  matrixStats::weightedMedian(x, ...)
}
wMedian.currency <- function(x, w = NULL, idxs = NULL, na.rm = FALSE,
                             interpolate = is.null(ties), ties = NULL, ...) {
  y <- NextMethod()  # forwards the supplied arguments to wMedian.default
  as.currency(y)
}
set.seed(1)
x <- rnorm(10)
wMedian(x, w = (1:10)/10)
#[1] 0.4084684
wMedian(as.currency(x), w = (1:10)/10)
#[1] "0,4"
I need to write a function which draws a plot for the variables. The problem is that it doesn't print the names of the variables.
library(GGally)  # ggpairs()
visual <- function(x, y) {
  df <- cbind(x, y)
  df <- scale(df, center = TRUE, scale = TRUE)
  df <- as.data.frame(df)
  ggpairs(df, columns = 1:2, xlab = colnames(df)[1], ylab = colnames(df)[2])
}
If we have these two vectors:
a <- c(128.095014, 71.430997, 88.704595, 48.180638)
b <- c(10.584888, 10.246740, 4.422322, 9.621246)
visual(a,b)
What is wrong with that?
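Inside the function, cbind(x, y) names the columns after the parameters x and y, which is why those names appear. You can use substitute() to capture the expressions passed into your function and deparse() to turn them into character names.
visual <- function(x, y) {
  # deparse(substitute()) turns the caller's expressions into strings
  xname <- deparse(substitute(x))
  yname <- deparse(substitute(y))
  df <- cbind(x, y)
  df <- scale(df, center = TRUE, scale = TRUE)
  df <- as.data.frame(df)
  names(df) <- c(xname, yname)
  GGally::ggpairs(df, columns = 1:2, xlab = colnames(df)[1], ylab = colnames(df)[2])
}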
b<-c(128.095014, 71.430997, 88.704595, 48.180638)
a<-c(10.584888, 10.246740, 4.422322, 9.621246)
visual(a,b)
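The naming behaviour can be checked without plotting. A small sketch with two hypothetical helper functions: names_inside() shows what cbind() sees inside a function, and names_from_call() shows what deparse(substitute()) recovers from the call.

```r
a <- c(128.095014, 71.430997, 88.704595, 48.180638)
b <- c(10.584888, 10.246740, 4.422322, 9.621246)

# Inside a function, cbind() names the columns after the *parameters*
names_inside <- function(x, y) colnames(cbind(x, y))

# deparse(substitute()) recovers the expressions the caller actually typed
names_from_call <- function(x, y) c(deparse(substitute(x)), deparse(substitute(y)))

names_inside(a, b)
names_from_call(a, b)
names_from_call(a * 2, b)
```

The first call returns "x" "y", the second "a" "b". Because deparse() works on any expression, a call such as visual(a * 2, b) would label the first variable "a * 2".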
I am working on the Kaggle Digit Recognizer problem. When I tried the given code, I got this error:
Error in eval(expr, envir, enclos) : could not find function "eval"
library(ggplot2)
library(proto)
library(readr)
train <- data.frame(read_csv("../input/train.csv"))
labels <- train[,1]
features <- train[,-1]
rowsToPlot <- sample(1:nrow(train), 49)
rowToMatrix <- function(row) {
  intensity <- as.numeric(row) / max(as.numeric(row))
  return(t(matrix(rgb(intensity, intensity, intensity), 28, 28)))
}
geom_digit <- function(digits, labels) {
  GeomRasterDigit$new(geom_params = list(digits = digits), stat = "identity",
                      position = "identity", data = NULL, inherit.aes = TRUE)
}
I am getting the error when I run the following segment.
GeomRasterDigit <- proto(ggplot2:::GeomRaster, expr = {
  draw_groups <- function(., data, scales, coordinates, digits, ...) {
    bounds <- coord_transform(coordinates, data.frame(x = c(-Inf, Inf), y = c(-Inf, Inf)), scales)
    x_rng <- range(bounds$x, na.rm = TRUE)
    y_rng <- range(bounds$y, na.rm = TRUE)
    rasterGrob(as.raster(rowToMatrix(digits[data$rows, ])), x_rng[1], y_rng[1],
               diff(x_rng), diff(y_rng), default.units = "native",
               just = c("left", "bottom"), interpolate = FALSE)
  }
})
Link to the complete code:
https://www.kaggle.com/benhamner/digit-recognizer/example-handwritten-digits/code
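Take a look at the latest ggplot2 code on GitHub: since ggplot2 2.0.0, ggproto has replaced proto, among other changes. A ggproto object is created with ggproto("ClassName", ParentObject, member = value, ...) rather than proto(parent, expr = {...}), geoms draw through draw_panel() instead of draw_groups(), and layers are built with layer() instead of $new().
An updated version along these lines should work; the drawing logic is the same, only the ggproto, coord and layer APIs differ:
library(grid)  # rasterGrob()
GeomRasterDigit <- ggproto("GeomRasterDigit", ggplot2::GeomRaster,
  # digits is taken from the layer's params because it is an argument here
  draw_panel = function(data, panel_params, coord, digits) {
    # corners of the panel in native units
    bounds <- coord$transform(data.frame(x = c(-Inf, Inf), y = c(-Inf, Inf)), panel_params)
    x_rng <- range(bounds$x, na.rm = TRUE)
    y_rng <- range(bounds$y, na.rm = TRUE)
    rasterGrob(as.raster(rowToMatrix(digits[data$rows, ])), x_rng[1], y_rng[1],
               diff(x_rng), diff(y_rng), default.units = "native",
               just = c("left", "bottom"), interpolate = FALSE)
  }
)
geom_digit <- function(digits, labels) {
  layer(geom = GeomRasterDigit, stat = "identity", position = "identity",
        data = NULL, mapping = NULL, inherit.aes = TRUE,
        params = list(digits = digits))
}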
The "Extending ggplot2" vignette (vignette("extending-ggplot2", package = "ggplot2")) explains ggproto and is a good read.
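The main difference from proto is the calling convention of ggproto(): the class name comes first as a string, then the parent object, then named members, and methods that take a self argument receive the object automatically. A toy sketch (Animal and Dog are made-up names) showing inheritance and overriding a field:

```r
library(ggplot2)

# Parent object: no parent of its own, so _inherit is NULL
Animal <- ggproto("Animal", NULL,
  sound = "...",
  # self is supplied automatically when the method is called via $
  speak = function(self) paste("I say", self$sound)
)

# Child: inherits speak() from Animal, overrides only the sound field
Dog <- ggproto("Dog", Animal,
  sound = "woof"
)

Dog$speak()
class(Dog)
```

Dog$speak() finds speak() on the parent but looks up self$sound on Dog, so it returns "I say woof". GeomRasterDigit works the same way: it inherits everything from GeomRaster and overrides only draw_panel().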