I'm a Python guy and very new to R (so far, all I've done is copy-paste code and screenshot the resulting graph).
I would now like to actually learn the language so that I can draw useful plots (right now, I am trying to plot this).
In attempting my first plot, I came across this function call:
sets_options("universe", seq(from = 0, to = 25, by = 0.1))
Now, I would like to know if I can achieve the same result by calling
sets_options("universe", seq(0, 25, 0.1))
The help page for seq doesn't speak to this specifically (or I'm not reading it correctly), so I was hoping someone could shed some light on how R handles positional arguments.
I tried calling the function that way in R and it worked (no syntax errors, etc.), but I don't know how to test the output of that function, so I'm forced to ask here.
Calling sets_options() will display the current settings. From the following log, it seems that the positional arguments are treated as expected:
> sets_options("universe", seq(0,5,0.25))
> sets_options()
$quote
[1] TRUE
$hash
[1] TRUE
$openbounds
[1] "()"
$universe
[1] 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 3.75 4.00 4.25 4.50 4.75 5.00
> sets_options("universe", seq(from=0,to=5,by=0.25))
> sets_options()
$quote
[1] TRUE
$hash
[1] TRUE
$openbounds
[1] "()"
$universe
[1] 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 3.75 4.00 4.25 4.50 4.75 5.00
The question is what seq does with positional versus named arguments. The way to address this is to look at the ?seq page, which lays out the named arguments and their order:
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
length.out = NULL, along.with = NULL, ...)
So seq(0, 25, 0.1) will be interpreted the same way as seq(from = 0, to = 25, by = 0.1), since the positional values are matched to from, to, and by in the order given in the Usage listing.
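If you want to check this yourself rather than eyeball the printed output, identical() compares the two results directly. This is only a small sketch; the variable names are made up for illustration.

```r
# Positional call: values are matched to from, to, by in that order.
by_position <- seq(0, 5, 0.25)

# Named call: same arguments, spelled out.
by_name <- seq(from = 0, to = 5, by = 0.25)

# Named arguments can also be given in any order.
shuffled <- seq(by = 0.25, to = 5, from = 0)

print(identical(by_position, by_name))
print(identical(by_position, shuffled))
```

Both comparisons should come back TRUE, which is a stricter check than comparing the printed vectors.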
I am trying to make a ROC curve using pROC with the 2 columns below (the list goes on to over 300 entries):
Actual_Findings_%   Predicted_Finding_Prob
0.23                0.6
0.48                0.3
0.26                0.62
0.23                0.6
0.48                0.3
0.47                0.3
0.23                0.6
0.6868              0.25
0.77                0.15
0.31                0.55
The code I tried to use is:
roccurve<- plot(roc(response = data$Actual_Findings_% <0.4, predictor = data$Predicted_Finding_Prob >0.5),
legacy.axes = TRUE, print.auc=TRUE, main = "ROC Curve", col = colors)
Where the threshold for positive findings is
Actual_Findings_% <0.4
AND
Predicted_Finding_Prob >0.5
(i.e., to be a TRUE POSITIVE, Actual_Findings_% would be LESS than 0.4, AND Predicted_Finding_Prob would be GREATER than 0.5)
but when I try to plot this ROC curve, I get the error:
"Setting levels: control = FALSE, case = TRUE
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'plot': Predictor must be numeric or ordered."
Any help would be much appreciated!
This should work:
data <- read.table( text=
"Actual_Findings_% Predicted_Finding_Prob
0.23 0.6
0.48 0.3
0.26 0.62
0.23 0.6
0.48 0.3
0.47 0.3
0.23 0.6
0.6868 0.25
0.77 0.15
0.31 0.55
", header=TRUE, check.names=FALSE )
library(pROC)
roccurve <- plot(
  roc(
    response = data$"Actual_Findings_%" < 0.4,
    predictor = data$"Predicted_Finding_Prob"
  ),
  legacy.axes = TRUE, print.auc = TRUE, main = "ROC Curve"
)
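Now, importantly: the ROC curve is there to show you what happens when you vary your classification threshold. So the one thing you do wrong is to enforce a threshold yourself by passing predictor > 0.5. That comparison turns the predictor into a logical vector, which is also why pROC complains that the predictor must be numeric or ordered. Pass the raw predicted probabilities instead.
This does, however, give a perfect separation, which is nice I guess (though bad for educational purposes).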
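To make the "vary the threshold" point concrete, here is a rough base-R sketch that sweeps every observed predictor value as a cutoff and computes the true and false positive rates by hand. It is not how pROC is implemented, just an illustration; the names actual, pred, is_case and roc_points are made up, and the data are the ten rows from the question.

```r
# The ten rows from the question.
actual <- c(0.23, 0.48, 0.26, 0.23, 0.48, 0.47, 0.23, 0.6868, 0.77, 0.31)
pred   <- c(0.60, 0.30, 0.62, 0.60, 0.30, 0.30, 0.60, 0.25, 0.15, 0.55)

# A "case" is an actual finding below 0.4, as in the question.
is_case <- actual < 0.4

# Try every observed predictor value as the classification cutoff.
thresholds <- sort(unique(pred))
roc_points <- t(sapply(thresholds, function(th) {
  called <- pred >= th  # predicted positive at this cutoff
  c(threshold = th,
    tpr = sum(called & is_case) / sum(is_case),
    fpr = sum(called & !is_case) / sum(!is_case))
}))

print(roc_points)
```

Each row is one point on the ROC curve. With this sample, some cutoff reaches tpr = 1 with fpr = 0, which is the perfect separation mentioned above.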
I have recently been trying to solve a problem that involves a formula.
I already have one worked case as an example.
So
MIN 0.58 equals X = 0.50
MAX 1.23 equals X = 4.50
You can get any in-between value with the following formula: (0.1621 * X) + 0.4990, which works great.
Example:
(0.1621 * 1.20) + 0.4990 = 0.69, so 0.69 equals X = 1.20
However, the scenario has changed, and so has the formula, and I can't work out the new one.
MIN 0.59 equals X = 0.20
MAX 1.10 equals X = 4.80
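Assuming the relationship is still a straight line through the MIN and MAX points (that is an assumption, the question does not say so), the slope and intercept can be recomputed from the two endpoints. The helper names line_through and predict_y below are made up for this sketch.

```r
# Slope and intercept of the straight line through (x1, y1) and (x2, y2).
line_through <- function(x1, y1, x2, y2) {
  slope <- (y2 - y1) / (x2 - x1)
  intercept <- y1 - slope * x1
  c(slope = slope, intercept = intercept)
}

# Evaluate the line at x.
predict_y <- function(coef, x) unname(coef["slope"] * x + coef["intercept"])

# Old scenario: reproduces the known formula up to rounding.
old_coef <- line_through(0.50, 0.58, 4.50, 1.23)

# New scenario: X = 0.20 gives 0.59 and X = 4.80 gives 1.10.
new_coef <- line_through(0.20, 0.59, 4.80, 1.10)

print(old_coef)
print(new_coef)
print(predict_y(new_coef, 1.20))
```

For the old endpoints this gives a slope of 0.1625 and an intercept of 0.49875, close to the 0.1621 and 0.4990 quoted above. For the new endpoints it gives roughly (0.1109 * X) + 0.5678.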
I have a data frame and I need to calculate log for all numbers greater than 0 and log1p for numbers equal to 0. My data frame is called tcPainelLog and it looks like this (str output for columns 6:8):
$ IDD: num 0.04 0.06 0.07 0.72 0.52 ...
$ Soil: num 0.25 0.22 0.16 0.00 0.00 ...
$ QAI: num 0.00 0.50 0.00 0.71 0.26 ...
Therefore, I guess I need to combine an ifelse statement with the log and log1p functions. I have tried several different ways to do it, but none has succeeded. For instance:
tcPainelLog <- tcPainel
cols <- names(tcPainelLog[,6:17]) # These are the columns I need to calculate
tcPainelLog$IDD <- ifelse(X = tcPainelLog$IDD>0, log(X), log1p(X))
tcPainelLog[cols] <- lapply(tcPainelLog[cols], function(x) ifelse((x > 0), log(x), log1p(x)))
tcPainelLog[cols] <- if(tcPainelLog[,cols] > 0) log(.) else log1p(.)
I haven't been able to get it to work and would appreciate any help. I am sorry if there is already an explanation for this; I searched with many keywords but didn't find one.
Best regards.
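For reference, here is a minimal sketch of the vectorised ifelse() approach applied column by column. The data frame toy is made up from the values shown in the str() output, and the sketch assumes the target columns are plain numeric vectors.

```r
# Toy data shaped like the str() output in the question (assumed values).
toy <- data.frame(
  IDD  = c(0.04, 0.06, 0.07, 0.72, 0.52),
  Soil = c(0.25, 0.22, 0.16, 0.00, 0.00),
  QAI  = c(0.00, 0.50, 0.00, 0.71, 0.26)
)

cols <- c("IDD", "Soil", "QAI")  # columns to transform

# ifelse() works element-wise: log() where x > 0, log1p() otherwise.
toy_log <- toy
toy_log[cols] <- lapply(toy[cols], function(x) ifelse(x > 0, log(x), log1p(x)))

print(toy_log)
```

Note that log1p(0) is 0, so the zeros simply stay 0 while every positive value is replaced by its natural log.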
This one is really stumping me. I cannot find what is wrong with this line of code; I'm using it in a function and it makes the entire function fail.
Here is a vector of class 'matrix', result.1:
SPY TLT VGK EEM BIV RWX IYR EWJ DBC GLD PCY LQD TIP
-0.08 0.09 -0.07 -0.07 -0.09 -0.07 -0.07 -0.07 -0.066 -0.04 -0.08 -0.08 -0.08
I calculate the mean of this vector by running this line of code:
mean.momentum.1 <- mean(result.1)
The result is a numeric with a value of -0.062309. I want to get the names of the columns that are greater than the mean, so I run this code:
names(result.1[,(result.1 > mean.momentum.1)])
The output is as I expect, returning a character vector "TLT", "GLD".
Here is the issue: when I do this with a second, nearly identical matrix result.2 I get a NULL result every time, when I should be getting the result "TLT".
Here is the matrix result.2:
SPY TLT VGK EEM BIV RWX IYR EWJ DBC GLD TIP PCY LQD
-0.08 0.15 -0.07 -0.06 -0.054 -0.07 -0.07 -0.07 -0.06 -0.06 -0.07 -0.07 -0.06
I calculate the mean using the same method as above (-0.05298) and name it mean.momentum.2
Then, I run this line:
names(result.2[,(result.2 > mean.momentum.2)])
I get NULL back every time, when I expect "TLT". What is going wrong with the way I am subsetting? If I run the line:
result.2 > mean.momentum.2
I get a logical vector where TLT is TRUE, but it does not work when I try to subset with this method. It works perfectly in the first instance, but never works in the second instance, and since I get NULL back, my entire function fails.
Thank you...
Use colnames() and rownames() rather than names() when referring to a matrix:
mean.momentum.1 <- mean(result.1)
colnames(result.1)[which(result.1 > mean.momentum.1)]
## [1] "TLT" "GLD"
Also note the syntax of colnames() above. You are returning a subset of the vector colnames(result.1), using which() to select the elements for which result.1 is greater than mean.momentum.1.
This works fine for result.2 also:
mean.momentum.2 <- mean(result.2)
colnames(result.2)[which(result.2 > mean.momentum.2)]
## [1] "TLT"
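A likely reason the original line returns NULL only in the second case: when a subset collapses to a single element, R drops the names if the matrix has both row names and column names, because it cannot tell which one to keep. The sketch below assumes the matrix carries a row name (the question does not show one); the matrix m and the row label "momentum" are made up.

```r
# Hypothetical one-row matrix with BOTH a row name and column names.
m <- matrix(c(-0.08, 0.15, -0.07), nrow = 1,
            dimnames = list("momentum", c("SPY", "TLT", "VGK")))

above <- m > mean(m)  # only TLT is above the mean

# The 1x1 result is dropped to a bare number, and its names are lost.
lost <- names(m[, above])

# Two ways that keep working when a single column is selected.
kept_a <- colnames(m)[above]
kept_b <- colnames(m[, above, drop = FALSE])

print(lost)
print(kept_a)
print(kept_b)
```

With two or more columns selected, the result keeps the column names, which would explain why the same line works for result.1.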
I’ve been having some trouble with the plotCalibration() function. I have managed to get it to work before, but recently, whilst working with another dataset (here is a link to the .Rda data file), I have been unable to shake off an error message which keeps cropping up:
> plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort)
Error in plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort) : The specified outcome is not a binary variable.
When I’ve tried to set the cOutcome column to factors or to logical, it still doesn’t work.
I’ve looked at the source of the function and the only time the error message comes up is in the first if()else{} statement:
if (length(unique(y))!=2) {stop(" The specified outcome is not a binary variable.\n")}
else{
But I have checked that the length(unique(y)) is indeed ==2, and so don’t understand why the error message still crops up!
Be sure you're passing a data frame to plotCalibration. Passing a dplyr tibble can cause this error. Converting with the usual as.data.frame() worked for me.
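A rough illustration of why a tibble could trip that check, assuming the function extracts the outcome with something like data[, cOutcome] (I have not confirmed this against the source). A tibble keeps a single-column selection as a one-column table, and the length of a table is its number of columns. The sketch imitates that with drop = FALSE so it runs in base R; df, y_vector and y_table are made-up names.

```r
# Small stand-in data set with a binary outcome in column 2.
df <- data.frame(id = 1:4, outcome = c(0, 1, 0, 1))

# Plain data frame: selecting one column gives a numeric vector.
y_vector <- df[, 2]

# Stand-in for tibble behaviour: the selection stays a one-column table.
y_table <- df[, 2, drop = FALSE]

print(length(unique(y_vector)))  # number of distinct outcome values
print(length(unique(y_table)))   # number of columns, not distinct values
```

The first count is 2 and the second is 1, so a check of the form length(unique(y)) != 2 would fail on the table even though the outcome is binary.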
Using the data you sent earlier, I do not see any error, though.
The following output was produced along with a calibration plot:
> library(PredictABEL)
> plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort)
$Table_HLtest
total meanpred meanobs predicted observed
[0.000632,0.00129) 340 0.001 0.000 0.31 0
0.001287 198 0.001 0.000 0.25 0
[0.001374,0.00201) 283 0.002 0.004 0.53 1
0.002009 310 0.002 0.000 0.62 0
[0.002505,0.00409) 154 0.003 0.000 0.52 0
[0.004086,0.00793) 251 0.006 0.000 1.42 0
[0.007931,0.00998) 116 0.008 0.009 0.96 1
[0.009981,0.19545] 181 0.024 0.011 4.40 2
$Chi_square
[1] 4.906
$df
[1] 8
$p_value
[1] 0.7676
Please try using table(data[,2],useNA = "ifany") to see the number of levels of the outcome variable of your dataset.
The function plotCalibration will execute when the outcome is a binary variable (two levels).