The if else statement comparing to 0 - R

I am trying to take the square root of negative numbers. I take the absolute value of the data and, for positive numbers, I use the square root of the absolute value directly; otherwise I add a negative sign to the result. However, all the numbers I get are negative...
My code
Results shown
I expect to get both negative and positive results, but I only got negative numbers.
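(For reference, since the screenshots aren't reproduced here: the intended logic, written out in base R, would be something like the sketch below, where data stands in for the asker's vector.)
# Signed square root: take the root of the magnitude, restore the sign
data <- c(4, -9, 16, -25)
ifelse(data >= 0, sqrt(abs(data)), -sqrt(abs(data)))
#> [1]  2 -3  4 -5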

Library and Data
Not sure exactly what you are doing, because your original data frame isn't included in the question, but I have simulated a dataset that should emulate it. First, I load the tidyverse package for data wrangling (creating/manipulating variables), then set a random seed so you can reproduce the simulated data.
#### Load Library ####
library(tidyverse)
#### Set Random Seed ####
set.seed(123)
Now I create a normally distributed x variable that takes both positive and negative values.
#### Create Randomly Distributed X w/Neg Values ####
tib <- tibble(
  x = rnorm(n = 100)
)
Creating Variables
Now we can make absolute values, followed by square roots, which are made negative if the original raw value was negative.
#### Create Absolute and Sqrt Values ####
new.tib <- tib %>%
  mutate(
    abs.x = abs(x),
    sq.x = sqrt(abs.x),
    final.x = ifelse(x < 0,
                     sq.x * -1,
                     sq.x)
  )
new.tib
If you print new.tib, the end result will look like this:
# A tibble: 100 × 4
         x  abs.x  sq.x final.x
     <dbl>  <dbl> <dbl>   <dbl>
 1  2.20   2.20   1.48    1.48
 2  1.31   1.31   1.15    1.15
 3 -0.265  0.265  0.515  -0.515
 4  0.543  0.543  0.737   0.737
 5 -0.414  0.414  0.644  -0.644
 6 -0.476  0.476  0.690  -0.690
 7 -0.789  0.789  0.888  -0.888
 8 -0.595  0.595  0.771  -0.771
 9  1.65   1.65   1.28    1.28
10 -0.0540 0.0540 0.232  -0.232
# … with 90 more rows
If you just want to select the final x values, you can simply select them, like so:
new.tib %>%
  select(final.x)
Giving you just this column:
# A tibble: 100 × 1
   final.x
     <dbl>
 1   1.48
 2   1.15
 3  -0.515
 4   0.737
 5  -0.644
 6  -0.690
 7  -0.888
 8  -0.771
 9   1.28
10  -0.232
# … with 90 more rows

Using the first example in ?ifelse:
x <- c(6:-4)
x
[1]  6  5  4  3  2  1  0 -1 -2 -3 -4
sqrt(ifelse(x >= 0, x, -x))
[1] 2.449490 2.236068 2.000000 1.732051 1.414214 1.000000
[7] 0.000000 1.000000 1.414214 1.732051 2.000000
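For completeness, base R's sign() collapses this into a one-liner, since multiplying the root of the magnitude by the sign restores the negative results:
sign(x) * sqrt(abs(x))
[1]  2.449490  2.236068  2.000000  1.732051  1.414214  1.000000
[7]  0.000000 -1.000000 -1.414214 -1.732051 -2.000000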

Related

ROC curves using pROC in R: calculating the lab value a threshold equates to

I am using pROC to provide the ROC analysis of blood tests. I have calculated the ROC curve and AUC, and am using the ci.coords function to provide the spec, sens, PPV and NPV at a provided specificity (with 95% CI).
I would like to be able to say at what value of the blood test this occurs, for instance at 1.2 the sens is x, spec is y, NPV is c, PPV is d. Ideally I would have the data for a table like:
Lab value | Sens | Spec | NPV | PPV
I don't seem to be able to get this from the methodology I am currently using.
Does anyone have any suggestions?
Many thanks
Currently
spred1 = predict(smodel1)
sroc1 = roc(EditedDF1$any_abnormality, spred1)
ci.coords(sroc1, x = 0.95, input = "sensitivity", transpose = FALSE, ret = c("sensitivity", "specificity", "ppv", "npv"))
As you gave no reproducible example, let's use the one that comes with the package:
library(pROC)
data(aSAH)
roc1 <- roc(aSAH$outcome, aSAH$s100b)
The package comes with the function coords, which lists specificity and sensitivity at different thresholds:
> coords(roc1)
   threshold specificity sensitivity
1       -Inf  0.00000000  1.00000000
2      0.035  0.00000000  0.97560976
3      0.045  0.06944444  0.97560976
4      0.055  0.11111111  0.97560976
5      0.065  0.13888889  0.97560976
6      0.075  0.22222222  0.90243902
7      0.085  0.30555556  0.87804878
8      0.095  0.38888889  0.82926829
9      0.105  0.48611111  0.78048780
10     0.115  0.54166667  0.75609756
...
From there you can use the ci.coords function, which you have already used, to complete the table with whatever data you desire.
library(tidyverse)
library(pROC)
#> Type 'citation("pROC")' for a citation.
#>
#> Attaching package: 'pROC'
#> The following objects are masked from 'package:stats':
#>
#> cov, smooth, var
data(aSAH)
roc <- roc(aSAH$outcome, aSAH$s100b,
  levels = c("Good", "Poor")
)
#> Setting direction: controls < cases

tibble(threshold = seq(0, 1, by = 0.1)) %>%
  mutate(
    data = threshold %>% map(~ {
      res <- roc %>% ci.coords(x = .x, ret = c("sensitivity", "specificity", "ppv", "npv"))
      # [[3]] takes the upper (97.5%) bound of each CI
      list(
        sens = res$sensitivity[[3]],
        spec = res$specificity[[3]],
        ppv = res$ppv[[3]],
        npv = res$npv[[3]]
      )
    })
  ) %>%
  unnest_wider(data)
#> # A tibble: 11 x 5
#>    threshold   sens  spec   ppv    npv
#>        <dbl>  <dbl> <dbl> <dbl>  <dbl>
#>  1       0   1      0     0.363 NA
#>  2       0.1 0.927  0.5   0.5    0.917
#>  3       0.2 0.780  0.903 0.784  0.867
#>  4       0.3 0.634  0.917 0.769  0.8
#>  5       0.4 0.561  0.958 0.85   0.782
#>  6       0.5 0.439  1     1      0.755
#>  7       0.6 0.366  1     1      0.735
#>  8       0.7 0.317  1     1      0.72
#>  9       0.8 0.195  1     1      0.686
#> 10       0.9 0.122  1     1      0.667
#> 11       1   0.0732 1     1      0.655
Created on 2021-09-10 by the reprex package (v2.0.1)
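As an aside, if you only need the point estimates (no confidence intervals), coords() can return ppv and npv at every threshold in a single call, e.g. with the roc object from above:
coords(roc, x = "all", ret = c("threshold", "sensitivity", "specificity", "ppv", "npv"))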

How could I use R to pull a few select lines out of a large text file?

I am fairly new to Stack Overflow and did not find this in the search. Please let me know if this question should not be asked here.
I have a very large text file. It has 16 entries and each entry looks like this:
AI_File 10
Version
Date 20200708 08:18:41
Prompt1 LOC
Resp1 H****
Prompt2 QUAD
Resp2 1012
TransComp c-p-s
Model Horizontal
### Computed Results
LAI 4.36
SEL 0.47
ACF 0.879
DIFN 0.031
MTA 40.
SEM 1.
SMP 5
### Ring Summary
MASK 1 1 1 1 1
ANGLES 7.000 23.00 38.00 53.00 68.00
AVGTRANS 0.038 0.044 0.055 0.054 0.030
ACFS 0.916 0.959 0.856 0.844 0.872
CNTCT# 3.539 2.992 2.666 2.076 1.499
STDDEV 0.826 0.523 0.816 0.730 0.354
DISTS 1.008 1.087 1.270 1.662 2.670
GAPS 0.028 0.039 0.034 0.032 0.018
### Contributing Sensors
### Observations
A 1 20200708 08:19:12 x 31.42 38.30 40.61 48.69 60.28
L 2 20200708 08:19:12 1 5.0e-006
B 3 20200708 08:19:21 x 2.279 2.103 1.408 5.027 1.084
B 4 20200708 08:19:31 x 1.054 0.528 0.344 0.400 0.379
B 5 20200708 08:19:39 x 0.446 1.255 2.948 3.828 1.202
B 6 20200708 08:19:47 x 1.937 2.613 5.909 3.665 5.964
B 7 20200708 08:19:55 x 0.265 1.957 0.580 0.311 0.551
Almost all of this is junk information, and I am looking to run some code for the whole file that will only give me the lines for "Resp2" and "LAI" for all 16 of the entries. Is a task like this doable in R? If so, how would I do it?
Thanks very much for any help and please let me know if there's any more information I can give to clear anything up.
I've saved your file as a text file and read in the lines. Then you can use regex to extract the desired rows. However, I feel that my approach is rather clumsy; I bet there are more elegant ways (maybe also with (unix) command-line tools).
data <- readLines("testfile.txt")
library(stringr)
resp2 <- as.numeric(str_trim(str_extract(data, "(?m)(?<=^Resp2).*$")))
lai <- as.numeric(str_trim(str_extract(data, "(?m)(?<=^LAI).*$")))
data_extract <- data.frame(
  resp2 = resp2[!is.na(resp2)],
  lai = lai[!is.na(lai)]
)
data_extract
  resp2  lai
1  1012 4.36
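A plain base R alternative (a sketch, assuming the same "testfile.txt") keeps only the lines that start with the two keywords and lets read.table() split them:
# Keep lines beginning with Resp2 or LAI, then parse name/value pairs
data <- readLines("testfile.txt")
keep <- grepl("^(Resp2|LAI)\\b", data)
read.table(text = data[keep], col.names = c("name", "value"))
#>    name   value
#> 1 Resp2 1012.00
#> 2   LAI    4.36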
A solution based on the tidyverse could look as follows.
library(dplyr)
library(vroom)
library(stringr)
library(tibble)
library(tidyr)
vroom_lines('data') %>%
  enframe() %>%
  filter(str_detect(value, 'Resp2|LAI')) %>%
  transmute(value = str_squish(value)) %>%
  separate(value, into = c('name', 'value'), sep = ' ')
#   name  value
#   <chr> <chr>
# 1 Resp2 1012
# 2 LAI   4.36
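With all 16 entries, the same pipeline returns 32 name/value rows. One way to pair each Resp2 with its LAI (a sketch; extracted stands for the result of the pipeline above, and it assumes Resp2 precedes LAI within each entry, as in the sample) is to number the entries and widen:
extracted %>%
  mutate(entry = cumsum(name == 'Resp2')) %>%
  pivot_wider(names_from = name, values_from = value)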

Conditional sorting / reordering of column values in R

I have a data set similar to the following with 1 column and 60 rows:
value
1 0.0423
2 0.0388
3 0.0386
4 0.0342
5 0.0296
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
40 0.1424
.
60 -0.0312
I want to reorder the rows so that certain conditions are met. For example, one condition could be: sum(df$value[4:7]) > 0.1000 & sum(df$value[4:7]) < 0.1100
The data set would then look like this, for example:
value
1 0.0423
2 0.0388
3 0.0386
4 0.1312
5 -0.0312
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
.
.
60 0.0342
What I tried was using repeat and sample as in the following:
repeat {
  df1 <- as_tibble(sample(df$value, replace = TRUE))
  if (sum(df1$value[4:7]) > 0.1000 & sum(df1$value[4:7]) < 0.1100) break
}
Unfortunately, this method takes quite some time, and I was wondering if there is a faster way to reorder rows based on mathematical conditions such as a sum or product.
Here's a quick implementation of the hill-climbing method I outlined in my comment. I've had to slightly reframe the desired condition as "distance of sum(x[4:7]) from 0.105" to make it continuous, although you can still use the exact condition when doing the check that all requirements are satisfied. The benefit is that you can add extra conditions to the distance function easily.
# Using same example data as Jon Spring
set.seed(42)
vs = rnorm(60, 0.05, 0.08)

get_distance = function(x) {
  distance = abs(sum(x[4:7]) - 0.105)
  # Add to the distance with further conditions if needed
  distance
}

max_attempts = 10000
best_distance = Inf
swaps_made = 0
for (step in 1:max_attempts) {
  # Copy the vector and swap two random values
  new_vs = vs
  swap_inds = sample.int(length(vs), 2, replace = FALSE)
  new_vs[swap_inds] = rev(new_vs[swap_inds])
  # Keep the new vector if the distance has improved
  new_distance = get_distance(new_vs)
  if (new_distance < best_distance) {
    vs = new_vs
    best_distance = new_distance
    swaps_made = swaps_made + 1
  }
  complete = (sum(vs[4:7]) < 0.11) & (sum(vs[4:7]) > 0.1)
  if (complete) {
    print(paste0("Solution found in ", step, " steps"))
    break
  }
}
sum(vs[4:7])
There's no real guarantee that this method will reach a solution, but I often try this kind of basic hill-climbing when I'm not sure if there's a "smart" way to approach a problem.
Here's an approach using combn from base R, and then filtering using dplyr. (I'm sure there's a way w/o it but my base-fu isn't there yet.)
With only 4 numbers from a pool of 60, there are "only" 488k different combinations (ignoring order; 60*59*58*57/(4*3*2*1) = 487,635), so it's quick to brute-force in about a second.
# Make a vector of 60 numbers like your example
set.seed(42)
my_nums <- rnorm(60, 0.05, 0.08)

all_combos <- combn(my_nums, 4) # Get all unique combos of 4 numbers

library(tidyverse)
combos_table <- all_combos %>%
  t() %>%
  as_tibble() %>%
  mutate(sum = V1 + V2 + V3 + V4) %>%
  filter(sum > 0.1, sum < 0.11)
> combos_table
# A tibble: 8,989 x 5
      V1      V2      V3       V4   sum
   <dbl>   <dbl>   <dbl>    <dbl> <dbl>
 1 0.160 0.00482  0.0791 -0.143   0.100
 2 0.160 0.00482  0.101  -0.163   0.103
 3 0.160 0.00482  0.0823 -0.145   0.102
 4 0.160 0.00482  0.0823 -0.143   0.104
 5 0.160 0.00482 -0.0611 -0.00120 0.102
 6 0.160 0.00482 -0.0611  0.00129 0.105
 7 0.160 0.00482  0.0277 -0.0911  0.101
 8 0.160 0.00482  0.0277 -0.0874  0.105
 9 0.160 0.00482  0.101  -0.163   0.103
10 0.160 0.00482  0.0273 -0.0911  0.101
# … with 8,979 more rows
This says that in this example, there are about 9000 different sets of 4 numbers from my sequence which meet the criteria. We could pick any of these and put them in positions 4-7 to meet your requirement.
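To actually build the reordered vector, a minimal sketch (relying on the values being unique, which holds for these rnorm draws) is to drop the chosen four from the pool and splice them into positions 4-7:
chosen <- unlist(combos_table[1, 1:4])   # first qualifying combo
rest <- my_nums[-match(chosen, my_nums)] # the remaining 56 values
reordered <- c(rest[1:3], chosen, rest[-(1:3)])
sum(reordered[4:7])                      # in (0.1, 0.11) by construction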

R: find the top n columns of a dataframe by a per-column aggregate operation

Say I have a dataframe called RaM that holds cumulative return values. In this case it is literally a single row of cumulative return values with column headers, but I would like the logic to apply to more than just single-row dataframes.
Say I want to sort by the max cumulative return value of each column, or even the average, or the sum of each column.
Each column would be reordered so that the max cumulative returns of the columns are compared, the highest return becoming the 1st column and the lowest the last column.
Then say I want to take either the top 10 (the first 10 columns after rearranging) or even the top 10%.
I know how to derive the column averages, but I don't know how to do the remaining operations effectively. There is an order function, but when I used it, it stripped my column names, which I need. I could then easily cut the first, say, 10 columns, but is there a way that preserves the names? I don't think I can easily extract the names from the unordered original dataframe and apply them to my sorted-by-aggregate dataframe. My goal is to extract the column names of the top n columns (in dataframe RaM) in terms of a column aggregate function over the entire dataframe.
Something like:
top10 <- getTop10ColumnNames(colSums(RaM))
that would then output a dataframe of the top 10 columns in terms of their sum from RaM
Here's the output of RaM:
> head(RaM, 2)
                 ABMD        ACAD       ALGN        ALNY        ANIP      ASCMA        AVGO        CALD        CLVS       CORT
2013-01-31 0.03794643 0.296774194 0.13009009  0.32219178  0.13008130 0.02857604  0.13014640 -0.07929515  0.23375000  0.5174825
2013-02-28 0.14982079 0.006633499 0.00255102 -0.01823456 -0.05755396 0.07659708 -0.04333138  0.04066986 -0.04457953 -0.2465438
                  CPST         EA         EGY        EXEL FCSC       FOLD        GNC         GTT      HEAR          HK         HZNP
2013-01-31 -0.05269663 0.08333333 -0.01849711  0.01969365    0  0.4179104 0.07992677 0.250000000 0.2017417  0.10404624 -0.085836910
2013-02-28  0.15051595 0.11443102 -0.04475854 -0.02145923    0 -0.2947368 0.14079036 0.002857143 0.4239130 -0.07068063 -0.009389671
                  ICON        IMI        IMMU        INFI        INSY        KEG        LGND        LQDT         MCF        MU
2013-01-31  0.07750896 0.05393258 -0.01027397 -0.01571429 -0.05806459 0.16978417 -0.03085824 -0.22001958  0.01345609 0.1924290
2013-02-28 -0.01746362 0.03091684 -0.20415225  0.19854862  0.36849503 0.05535055  0.02189055  0.06840289 -0.09713487 0.1078042
                NBIX      NFLX       NVDA       OREX       PFPT          PQ       PRTA         PTX        RAS        REXX        RTRX
2013-01-31 0.2112299 0.7846467 0.00000000 0.08950306 0.06823721  0.03838384 -0.1800819  0.04387097 0.23852335 0.008448541  0.34328358
2013-02-28 0.1677704 0.1382251 0.03888981 0.04020979 0.06311787 -0.25291829  0.0266223 -0.26328801 0.05079882 0.026656512 -0.02222222
                  SDRL      SHOS         SSI        STMP         TAL         TREE        TSLA      TTWO        UVE       VICL
2013-01-31  0.07826093 0.2023956 -0.07788381  0.07103175 -0.14166875 -0.030504714  0.10746974 0.1053588  0.0365299  0.2302405
2013-02-28 -0.07585546 0.1384419  0.08052150 -0.09633197  0.08009728 -0.002860412 -0.07144761 0.2029581 -0.0330408 -0.1061453
                   VSI       VVUS        WLB
2013-01-31  0.06485356 -0.0976155 0.07494647
2013-02-28 -0.13965291 -0.1156069 0.04581673
Here's one way using the first section of your sample data to illustrate. You can gather up all the columns so that we can do summary calculations more easily, calculate all the summaries by group that you want, and then sort with arrange. Here I ordered with the highest sums first, but you could do whatever order you wanted.
library(tidyverse)

ram <- read_table2(
  "ABMD ACAD ALGN ALNY ANIP ASCMA AVGO CALD CLVS CORT
  0.03794643 0.296774194 0.13009009 0.32219178 0.13008130 0.02857604 0.13014640 -0.07929515 0.23375000 0.5174825
  0.14982079 0.006633499 0.00255102 -0.01823456 -0.05755396 0.07659708 -0.04333138 0.04066986 -0.04457953 -0.2465438"
)

summary <- ram %>%
  gather(colname, value) %>%
  group_by(colname) %>%
  summarise_at(.vars = vars(value), .funs = funs(mean = mean, sum = sum, max = max)) %>%
  arrange(desc(sum))

summary
#> # A tibble: 10 x 4
#>    colname    mean     sum    max
#>    <chr>     <dbl>   <dbl>  <dbl>
#>  1 ALNY     0.152   0.304  0.322
#>  2 ACAD     0.152   0.303  0.297
#>  3 CORT     0.135   0.271  0.517
#>  4 CLVS     0.0946  0.189  0.234
#>  5 ABMD     0.0939  0.188  0.150
#>  6 ALGN     0.0663  0.133  0.130
#>  7 ASCMA    0.0526  0.105  0.0766
#>  8 AVGO     0.0434  0.0868 0.130
#>  9 ANIP     0.0363  0.0725 0.130
#> 10 CALD    -0.0193 -0.0386 0.0407
If you then want to reorder your original data frame, you can get the order from this summary output and index with it:
ram[summary$colname]
#> # A tibble: 2 x 10
#>      ALNY    ACAD   CORT    CLVS   ABMD    ALGN  ASCMA    AVGO    ANIP
#>     <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
#> 1  0.322  0.297    0.517  0.234  0.0379 0.130   0.0286  0.130   0.130
#> 2 -0.0182 0.00663 -0.247 -0.0446 0.150  0.00255 0.0766 -0.0433 -0.0576
#> # ... with 1 more variable: CALD <dbl>
Created on 2018-08-01 by the reprex package (v0.2.0).
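If all you need are the top-n column names by an aggregate, a base R sketch that preserves the names (assuming RaM is entirely numeric) is:
top10 <- names(sort(colSums(RaM), decreasing = TRUE))[1:10]
RaM[top10] # the data frame restricted to (and ordered by) those columns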

Data pattern visualization using R

I have a table like
0.5625 0.037037037 0.009923785
0.7734375 0.0781893 0.009923785
0.9609375 0.127572016 0.009923785
0.26953125 0.008230453 0.009923785
0.85546875 0.144032922 0.009923785
0.873046875 0.187928669 0.009923785
0.969726563 0.138545953 0.009923785
0.711914063 0.031550069 0.009923785
0.588867188 0.066300869 0.009923785
0.670898438 0.038866027 0.009923785
0.331054688 0.004572474 0.009328358
0.670898438 0.038866027 0.009923785
0.8203125 0.1015625 0.009923785
0.794921875 0.115234375 0.009923785
0.947265625 0.228515625 0.009923785
0.284179688 0.032226563 0.009923785
0.987304688 0.079101563 0.009923785
0.485351563 0.081054688 0.009923785
0.584960938 0.012695313 0.009288663
0.485351563 0.081054688 0.009923785
0.862048458 0.112664883 0.00996348
0.844804516 0.126747993 0.00996348
0.859089866 0.072807892 0.00996348
0.069334708 0.013713014 0.00996348
0.515944115 0.001011122 0.009288663
0.787155502 0.089283342 0.009923785
I want to visualize the data in such a way that the centre point is the result value, connected to all the points that produced it (for example, 0.009288663 is generated by (0.515944115, 0.001011122) and (0.485351563, 0.081054688), so 0.009288663 should be connected to (a1, b1) and (a2, b2)).
The sketch below illustrates this:
(a2,b2)<-----.------------>(a1,b1)
I have tried the following code:
library(scatterplot3d)
scatterplot3d(x = test$A,      # x axis
              y = test$B,      # y axis
              z = test$Result, # z axis
              x.ticklabs = levels(test$A),
              y.ticklabs = levels(test$B))
1st Approach:
What I realized is that the above method just plots the two sets of points in the 3D plane, rather than connecting them the way I need.
2nd Approach:
I tried plotting all the points and connecting them based on a condition. That could work as a workaround, but I still couldn't figure out where to place the result value.
Any help with the query will be much appreciated.
Thanks
Is this what you mean? Note that I only read in the first two columns with [,1:2] here, but it should work even if you read in the full dataset:
> library(dplyr)
> library(tibble)
> test <- as_tibble(read.table("yourdata.txt", header = TRUE))[, 1:2]
> test
# A tibble: 26 x 2
       A       B
   <dbl>   <dbl>
 1 0.562 0.0370
 2 0.773 0.0782
 3 0.961 0.128
 4 0.270 0.00823
 5 0.855 0.144
 6 0.873 0.188
 7 0.970 0.139
 8 0.712 0.0316
 9 0.589 0.0663
10 0.671 0.0389
# ... with 16 more rows
Create columns containing the midpoints of the x's and midpoints of the y's:
> test %>% mutate(xdiff = (A + lag(A)) / 2, ydiff = (B + lag(B)) / 2)
# A tibble: 26 x 4
       A       B xdiff  ydiff
   <dbl>   <dbl> <dbl>  <dbl>
 1 0.562 0.0370  NA    NA
 2 0.773 0.0782  0.668 0.0576
 3 0.961 0.128   0.867 0.103
 4 0.270 0.00823 0.615 0.0679
... the rest are truncated
And then feed all the center points to a plot:
> test %>%
    mutate(xdiff = (A + lag(A)) / 2, ydiff = (B + lag(B)) / 2) %>%
    ggplot() +
    geom_point(aes(x = xdiff, y = ydiff))
You can even draw the segments that created the points by adding geom_segment, but you're going to have to spend some time coming up with a creative color strategy, because it kind of makes the plot look messy:
> test %>%
    mutate(xdiff = (A + lag(A)) / 2, ydiff = (B + lag(B)) / 2) %>%
    ggplot() +
    geom_point(aes(x = xdiff, y = ydiff)) +
    geom_segment(aes(x = A, y = B, xend = lead(A), yend = lead(B)))
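For example, one simple strategy (a sketch) is to give each segment its own colour, keyed to the row index, and suppress the resulting 26-level legend:
> test %>%
    mutate(xdiff = (A + lag(A)) / 2, ydiff = (B + lag(B)) / 2, pair = factor(row_number())) %>%
    ggplot() +
    geom_segment(aes(x = A, y = B, xend = lead(A), yend = lead(B), colour = pair),
                 show.legend = FALSE) +
    geom_point(aes(x = xdiff, y = ydiff))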
