Apply log and log1p to several columns with an if condition - r

I have a dataframe and I need to calculate log for all numbers greater than 0 and log1p for numbers equal to 0. My dataframe is called tcPainelLog and it looks like this (str output for columns 6:8):
$ IDD: num 0.04 0.06 0.07 0.72 0.52 ...
$ Soil: num 0.25 0.22 0.16 0.00 0.00 ...
$ QAI: num 0.00 0.50 0.00 0.71 0.26 ...
Therefore, I guess I need to combine an ifelse statement with the log and log1p functions. However, I have tried several different ways to do it, and none has succeeded. For instance:
tcPainelLog <- tcPainel
cols <- names(tcPainelLog[,6:17]) # These are the columns I need to calculate
tcPainelLog$IDD <- ifelse(X = tcPainelLog$IDD>0, log(X), log1p(X))
tcPainelLog[cols] <- lapply(tcPainelLog[cols], function(x) ifelse((x > 0), log(x), log1p(x)))
tcPainelLog[cols] <- if(tcPainelLog[,cols] > 0) log(.) else log1p(.)
I haven't been able to get it to work and would appreciate any help. I am really sorry if there is already an explanation for this; I searched with many keywords but didn't find one.
Best regards.
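A minimal sketch of one way to do this, assuming a small toy data frame (the name df and its values are hypothetical stand-ins for tcPainelLog). Of the attempts above, the first fails because ifelse() has no argument named X, and the third fails because `.` is only defined inside a pipe and if() expects a single TRUE/FALSE. The lapply() form is the one that applies the condition element by element:

```r
# Toy data standing in for tcPainelLog (hypothetical values)
df <- data.frame(
  IDD  = c(0.04, 0.06, 0.07),
  Soil = c(0.25, 0.00, 0.00),
  QAI  = c(0.00, 0.50, 0.00)
)

# Columns to transform (in the question this would be columns 6:17)
cols <- c("IDD", "Soil", "QAI")

# ifelse() is vectorised: log(x) where x > 0, log1p(x) otherwise
df[cols] <- lapply(df[cols], function(x) ifelse(x > 0, log(x), log1p(x)))

print(df)
```

Note that log1p(0) is 0, so zeros stay at 0 while positive values are mapped to their natural log.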

Related

ROC Curve Plot using R (Error code: Predictor must be numeric or ordered)

I am trying to make a ROC curve using pROC with the 2 columns below (the list goes on for more than 300 entries):
Actual_Findings_%    Predicted_Finding_Prob
0.23                 0.6
0.48                 0.3
0.26                 0.62
0.23                 0.6
0.48                 0.3
0.47                 0.3
0.23                 0.6
0.6868               0.25
0.77                 0.15
0.31                 0.55
The code I tried to use is:
roccurve<- plot(roc(response = data$Actual_Findings_% <0.4, predictor = data$Predicted_Finding_Prob >0.5),
legacy.axes = TRUE, print.auc=TRUE, main = "ROC Curve", col = colors)
Where the threshold for positive findings is
Actual_Findings_% <0.4
AND
Predicted_Finding_Prob >0.5
(i.e., to be a TRUE POSITIVE, Actual_Findings_% would be LESS than 0.4 AND Predicted_Finding_Prob would be GREATER than 0.5)
but when I try to plot this roc curve, I get the error:
"Setting levels: control = FALSE, case = TRUE
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'plot': Predictor must be numeric or ordered."
Any help would be much appreciated!
This should work:
data <- read.table( text=
"Actual_Findings_% Predicted_Finding_Prob
0.23 0.6
0.48 0.3
0.26 0.62
0.23 0.6
0.48 0.3
0.47 0.3
0.23 0.6
0.6868 0.25
0.77 0.15
0.31 0.55
", header=TRUE, check.names=FALSE )
library(pROC)
roccurve <- plot(
roc(
response = data$"Actual_Findings_%" <0.4,
predictor = data$"Predicted_Finding_Prob"
),
legacy.axes = TRUE, print.auc=TRUE, main = "ROC Curve"
)
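Importantly, the ROC curve is there to show you what happens when you vary your classification threshold. So one thing you are doing wrong is enforcing a threshold yourself by setting predictions > 0.5. That comparison also turns the predictor into a logical vector, which is why roc() complains that the predictor must be numeric or ordered.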
This does however give a perfect separation, which is nice I guess. (Though bad for educational purposes.)
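To illustrate what "varying the threshold" means, here is a rough base-R sketch that does not use pROC. It computes the true and false positive rates by hand for a few cutoffs on the ten rows shown above; the cutoff values and the variable names (actual, pred, rates) are arbitrary choices for illustration:

```r
# The ten example rows from the question
actual <- c(0.23, 0.48, 0.26, 0.23, 0.48, 0.47, 0.23, 0.6868, 0.77, 0.31)
pred   <- c(0.60, 0.30, 0.62, 0.60, 0.30, 0.30, 0.60, 0.25,   0.15, 0.55)

# The response is fixed once: a case is positive when Actual_Findings_% < 0.4
response <- actual < 0.4

# Arbitrary example cutoffs on the numeric predictor
thresholds <- c(0.2, 0.4, 0.5, 0.7)

# One row per cutoff: true positive rate and false positive rate
rates <- t(sapply(thresholds, function(th) {
  called <- pred > th
  c(threshold = th,
    tpr = sum(called & response) / sum(response),
    fpr = sum(called & !response) / sum(!response))
}))

print(rates)
```

Each row is one point on the ROC curve, which is why the predictor has to stay numeric rather than being cut at 0.5 beforehand.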

How to substitute some characters with a space between them in R

I'm trying to substitute some characters by some strings, but when I try this happens:
Group <- "ABC"
A <- "0.25 0.65 0.48"
B <- "0.054 0.41 0.09"
C <- "0.8 0.047 0.34"
Group <- gsub("A", A, Group)
Group <- gsub("B", B, Group)
Group <- gsub("C", C, Group)
Group
When I group them there is no space between A, B and C. The above code results in:
0.25 0.65 0.480.054 0.41 0.090.8 0.047 0.34
I want the output to look like this:
0.25 0.65 0.48 0.054 0.41 0.09 0.8 0.047 0.34
I would appreciate it if you could help me with this.
There are several syntactical errors, but let me show what I think you are trying to accomplish:
Group <- 'ABC'
A <- paste(0.25, 0.65, 0.48)
Group = gsub('A', A, Group)
[1] "0.25 0.65 0.48BC"
EDIT: Seeing your reformatted question, I would say the only change is to put a space between your Group letters:
Group <- 'A B C'
Or paste an empty string at the end of each group of numbers, which leaves a trailing space:
A <- paste(0.25, 0.65, 0.48, "")
You can transform Group a bit, i.e., trimws(gsub("", " ", Group)), so that a " " is inserted between the characters in Group.
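A small sketch of that idea, assuming the replacement strings are kept in a named vector (the name values is hypothetical, introduced only so the substitutions can be done in a loop):

```r
Group <- "ABC"

# Replacement strings keyed by the letter they replace (hypothetical helper)
values <- c(
  A = "0.25 0.65 0.48",
  B = "0.054 0.41 0.09",
  C = "0.8 0.047 0.34"
)

# Insert a space between the letters first, then trim the edges
spaced <- trimws(gsub("", " ", Group))

# Substitute each letter with its string; fixed = TRUE avoids regex surprises
for (key in names(values)) {
  spaced <- gsub(key, values[[key]], spaced, fixed = TRUE)
}

print(spaced)
```

Because the spaces are added before the letters are replaced, each group of numbers ends up separated from the next by exactly one space.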
Just use paste; its default separator is a single space, so no collapse argument is needed:
A <- "0.25 0.65 0.48"
B <- "0.054 0.41 0.09"
C <- "0.8 0.047 0.34"
paste(A, B, C)
"0.25 0.65 0.48 0.054 0.41 0.09 0.8 0.047 0.34"

Change factor labels in psych::fa or psych::fa.diagram

I'm using the psych package for factor analysis. I want to specify the labels of the latent factors, either in the fa() object, or when graphing with fa.diagram().
For example, with toy data:
require(psych)
n <- 100
choices <- 1:5
df <- data.frame(a=sample(choices, replace=TRUE, size=n),
b=sample(choices, replace=TRUE, size=n),
c=sample(choices, replace=TRUE, size=n),
d=sample(choices, replace=TRUE, size=n))
model <- fa(df, nfactors=2, fm="pa", rotate="promax")
model
Factor Analysis using method = pa
Call: fa(r = df, nfactors = 2, rotate = "promax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA2 h2 u2 com
a 0.45 -0.49 0.47 0.53 2.0
b 0.22 0.36 0.17 0.83 1.6
c -0.02 0.20 0.04 0.96 1.0
d 0.66 0.07 0.43 0.57 1.0
I want to change PA1 and PA2 to FactorA and FactorB, either by changing the model object itself, or by adjusting the labels in the output of fa.diagram().
The docs for fa.diagram have a labels argument, but no examples, and the experimentation I've done so far hasn't been fruitful. Any help much appreciated!
With str(model) I found the $loadings attribute, which fa.diagram() uses to render the diagram. Modifying colnames() of model$loadings did the trick.
colnames(model$loadings) <- c("FactorA", "FactorB")
fa.diagram(model)

Subset named numeric in R error

This one is really stumping me. I cannot find what is wrong with this line of code and I'm using it in a function and it makes the entire function fail.
Here is a vector of class 'matrix', result.1:
SPY TLT VGK EEM BIV RWX IYR EWJ DBC GLD PCY LQD TIP
-0.08 0.09 -0.07 -0.07 -0.09 -0.07 -0.07 -0.07 -0.066 -0.04 -0.08 -0.08 -0.08
I calculate the mean of this vector by running this line of code:
mean.momentum.1 <- mean(result.1)
The result is a numeric with a value of -0.062309. I want to get the names of the columns that are greater than the mean, so I run this code:
names(result.1[,(result.1 > mean.momentum.1)])
The output is as I expect, returning a character vector "TLT", "GLD".
Here is the issue: when I do this with a second, nearly identical matrix result.2 I get a NULL result every time, when I should be getting the result "TLT".
Here is the matrix result.2:
SPY TLT VGK EEM BIV RWX IYR EWJ DBC GLD TIP PCY LQD
-0.08 0.15 -0.07 -0.06 -0.054 -0.07 -0.07 -0.07 -0.06 -0.06 -0.07 -0.07 -0.06
I calculate the mean using the same method as above (-0.05298) and name it mean.momentum.2
Then, I run this line:
names(result.2[,(result.2 > mean.momentum.2)])
I get NULL back every time, when I expect "TLT". What is going wrong with the way I am subsetting? If I run the line:
result.2 > mean.momentum.2
I get a logical vector where TLT is TRUE, but it does not work when I try to subset with this method. It works perfectly in the first instance, but never works in the second instance, and since I get NULL back, my entire function fails.
Thank you...
Use colnames() and rownames() rather than names() when referring to a matrix:
mean.momentum.1 <- mean(result.1)
colnames(result.1)[which(result.1 > mean.momentum.1)]
## [1] "TLT" "GLD"
Also note the syntax of colnames() above. You are returning a subset of the vector colnames(result.1), using which() to select the elements for which result.1 is greater than mean.momentum.1.
This works fine for result.2 also:
mean.momentum.2 <- mean(result.2)
colnames(result.2)[which(result.2 > mean.momentum.2)]
## [1] "TLT"
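For a self-contained check, here is a sketch that rebuilds result.2 from the values printed in the question. The row name "momentum" is an assumption, since the question does not show the matrix's row names. The NULL reported in the question is consistent with R dropping the matrix dimensions when only one column is selected, so a drop = FALSE variant is shown alongside the which() approach:

```r
# result.2 rebuilt from the printed values; the row name is hypothetical
result.2 <- matrix(
  c(-0.08, 0.15, -0.07, -0.06, -0.054, -0.07, -0.07,
    -0.07, -0.06, -0.06, -0.07, -0.07, -0.06),
  nrow = 1,
  dimnames = list("momentum",
                  c("SPY", "TLT", "VGK", "EEM", "BIV", "RWX", "IYR",
                    "EWJ", "DBC", "GLD", "TIP", "PCY", "LQD"))
)

mean.momentum.2 <- mean(result.2)

# Index the column names directly instead of subsetting the matrix
above <- colnames(result.2)[which(result.2 > mean.momentum.2)]

# Alternative: keep the matrix structure so the column names survive
above_keep <- colnames(result.2[, result.2 > mean.momentum.2, drop = FALSE])

print(above)
print(above_keep)
```

Both forms return the same column names regardless of how many columns pass the test, which makes them safer inside a function.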

Problems with using plotCalibration() from the PredictABEL package in R

I’ve been having some trouble with the plotCalibration() function. I have managed to get it to work before, but recently, while working with another dataset (here is a link to the .Rda data file), I have been unable to shake off an error message that keeps cropping up:
> plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort)
Error in plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort) : The specified outcome is not a binary variable.
When I’ve tried to set the cOutcome column to factors or to logical, it still doesn’t work.
I’ve looked at the source of the function and the only time the error message comes up is in the first if()else{} statement:
if (length(unique(y))!=2) {stop(" The specified outcome is not a binary variable.\n")}
else{
But I have checked that the length(unique(y)) is indeed ==2, and so don’t understand why the error message still crops up!
Be sure you're passing a data frame to plotCalibration(). Passing a dplyr tibble can cause this error. Converting it with the usual as.data.frame() worked for me.
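A possible explanation, sketched in base R. It assumes the function extracts the outcome with something like data[, cOutcome], which the snippet quoted in the question does not confirm. A tibble keeps a single selected column as a one-column table instead of dropping it to a vector, and drop = FALSE on a plain data frame mimics that here (the toy data frame df is hypothetical):

```r
# Toy data with a binary outcome in column 2 (hypothetical values)
df <- data.frame(id = 1:6, outcome = c(0, 1, 0, 0, 1, 0))

# Plain data frame: single-column selection drops to a numeric vector
y_vec <- df[, 2]

# Tibble-like behaviour: the selection stays a one-column data frame
y_tbl_like <- df[, 2, drop = FALSE]

# unique() on a vector returns the distinct values, so length is 2
print(length(unique(y_vec)))

# unique() on a data frame returns distinct rows, and length() counts columns
print(length(unique(y_tbl_like)))
```

Under that assumption, the length(unique(y)) != 2 check would fail for a tibble even though the outcome is binary, which would explain why as.data.frame() helps.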
Using the data you sent earlier, I do not see any error, though.
The following output was produced along with a calibration plot:
> library(PredictABEL)
> plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort)
$Table_HLtest
total meanpred meanobs predicted observed
[0.000632,0.00129) 340 0.001 0.000 0.31 0
0.001287 198 0.001 0.000 0.25 0
[0.001374,0.00201) 283 0.002 0.004 0.53 1
0.002009 310 0.002 0.000 0.62 0
[0.002505,0.00409) 154 0.003 0.000 0.52 0
[0.004086,0.00793) 251 0.006 0.000 1.42 0
[0.007931,0.00998) 116 0.008 0.009 0.96 1
[0.009981,0.19545] 181 0.024 0.011 4.40 2
$Chi_square
[1] 4.906
$df
[1] 8
$p_value
[1] 0.7676
Please try using table(data[,2],useNA = "ifany") to see the number of levels of the outcome variable of your dataset.
The function plotCalibration will execute when the outcome is a binary variable (two levels).
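As a small illustration of what that check can reveal, here is a sketch with a made-up outcome vector (the values are hypothetical). A missing value counts as a third distinct value, so a column that looks binary can still fail a length(unique(y)) != 2 test:

```r
# Hypothetical outcome column with one missing value
outcome <- c(0, 1, 0, NA, 1)

# Tabulate the levels, including NA if present
counts <- table(outcome, useNA = "ifany")
print(counts)

# NA is counted as a distinct value by unique()
n_distinct <- length(unique(outcome))
print(n_distinct)

# Dropping missing values leaves the two real levels
n_distinct_complete <- length(unique(outcome[!is.na(outcome)]))
print(n_distinct_complete)
```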
