How to normalize a rather long decimal number in R?

I have a list of data.frames in which I need to transform the .score column, and I wrote a helper function for this transformation. After calling .helperFunc on my input list, the p.value column is printed in a strange format in the first and third data.frames. How can I normalize such long decimals to simple scientific notation? Can anyone tell me how to do this easily?
Toy data:
savedDF <- list(
bar = data.frame(.start=c(12,21,37), .stop=c(14,29,45), .score=c(5,69,14)),
cat = data.frame(.start=c(18,42,18,42,81), .stop=c(27,46,27,46,114), .score=c(15,5,15,5,134)),
foo = data.frame(.start=c(3,3,33,3,33,91), .stop=c(26,26,42,26,42,107), .score=c(22,22,6,22,6,7))
)
I got this weird output:
> .savedDF
$bar
.start .stop .score p.value
1 12 14 5 0.000010000000000000000817488438054070343241619411855936050415039062500
2 21 29 69 0.000000000000000000000000000000000000000000000000000000000000000000001
3 37 45 14 0.000000000000009999999999999999990459020882127560980734415352344512939
$cat
.start .stop .score p.value
1 18 27 15 1e-15
2 42 46 5 1e-05
3 18 27 15 1e-15
4 42 46 5 1e-05
5 81 114 134 1e-134
$foo
.start .stop .score p.value
1 3 26 22 0.0000000000000000000001
2 3 26 22 0.0000000000000000000001
3 33 42 6 0.0000010000000000000000
4 3 26 22 0.0000000000000000000001
5 33 42 6 0.0000010000000000000000
6 91 107 7 0.0000001000000000000000
I don't know why this happens; only the format of the second data.frame is what I want. How can I normalize the p.value column as simply as possible?
The last column of cat shows the desired format; a more precise but still simple scientific notation would also be fine.
How can I normalize these unexpectedly long decimal numbers and get my desired output? Any ideas? Thanks a lot.

The default value of the scipen option is 0. (See ?options for more details.) You have apparently changed it to 100, which tells R to prefer fixed (decimal) notation unless it is more than 100 characters wider than scientific notation. To get back to the default, run the line
options(scipen = 0)
As to "So in my function, I could add this option as well?" - you shouldn't do that. Doing it in your script is fine, but not in a function. Functions really shouldn't set user options. That's likely how you got into this mess - some function you used probably rudely ran options(scipen = 100) and changed your options without you being aware.
Related: the opposite question How to disable scientific notation in R?
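A minimal sketch of the advice above, assuming the p-values are computed as 10^-.score (which is what the printed output in the question suggests). format() with scientific = TRUE returns scientific notation whatever scipen is set to, and a function that really must change an option can restore the previous value with on.exit(). The helper names to_sci and with_default_scipen are made up for illustration.

```r
# Hypothetical helper: format values in scientific notation without
# touching the caller's options.
to_sci <- function(x, digits = 3) {
  format(x, scientific = TRUE, digits = digits)
}

# Hypothetical helper: temporarily use scipen = 0 and restore the
# previous setting when the function exits, even on error.
with_default_scipen <- function(expr) {
  old <- options(scipen = 0)
  on.exit(options(old), add = TRUE)
  expr  # lazily evaluated here, while scipen = 0 is in effect
}

p <- 10^-c(5, 69, 14)  # same shape as the p.value column of "bar"
to_sci(p)
# [1] "1e-05" "1e-69" "1e-14"
```

Note that format() returns character strings, so this is for display only; the stored numeric values are unaffected by scipen either way.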

Related

Evaluating a function pointed to by a string in R

Suppose I have the following:
x <- 1:10
squared <- function(x) {x^2}
y <- "squared"
I want to be able to evaluate the function using the string defined by y. Something like eval(y), which I know is wrong, but will return
[1] 1 4 9 16 25 36 49 64 81 100
Any help is appreciated.
Either use match.fun
match.fun(y)(x)
#[1] 1 4 9 16 25 36 49 64 81 100
or with get
get(y)(x)
#[1] 1 4 9 16 25 36 49 64 81 100
To tell R to treat a string as code rather than as plain text, use eval(parse(text=...)).
Therefore, you could do
eval(parse(text=y))(x)
where eval(parse(text=y)) returns the function named by the string in y and x is the function's argument.
Alternatively, you could simply use match.fun, which looks for a function with the given name in the environment and returns it. Then apply it to the argument x:
match.fun(y)(x)
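Two further base R variants, offered as a sketch rather than a recommendation: do.call() accepts the function name as a string, and get() can be restricted to functions with mode = "function". The variable names res_call and res_get are only for illustration.

```r
x <- 1:10
squared <- function(x) x^2
y <- "squared"

# do.call() takes the function name as a string plus a list of arguments
res_call <- do.call(y, list(x))

# mode = "function" makes get() ignore non-function objects of the same name
res_get <- get(y, mode = "function")(x)

stopifnot(identical(res_call, res_get))
res_call
```

Both avoid parsing arbitrary text, which eval(parse(text = ...)) would happily execute.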

R triangular numbers function

While working on a small program for calculating the right triangular number that fulfils an equation, I stumbled over a page that holds documentation on the function Triangular()
Triangular function
When I tried to use this, RStudio said it couldn't find the function, and I can't seem to find any other information about which library it could be in.
Does this function even exist and/or are there other ways to fill a vector with triangular numbers?
Here is a base R solution to define your custom triangular number generator, i.e.,
myTriangular <- function(n) choose(seq(n),2)
or
myTriangular <- function(n) cumsum(seq(n)-1)
such that
> myTriangular(10)
[1] 0 1 3 6 10 15 21 28 36 45
If you would like to use Triangular() from package Zseq, then please try
Zseq::Triangular(10)
such that
> Zseq::Triangular(10)
Big Integer ('bigz') object of length 10:
[1] 0 1 3 6 10 15 21 28 36 45
It's pretty easy to do it yourself:
triangular <- function(n) sapply(1:n, function(x) sum(1:x))
So you can do:
triangular(10)
# [1] 1 3 6 10 15 21 28 36 45 55
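The answers above differ in where the sequence starts (0 versus 1). As a sketch, the closed form T_k = k(k + 1)/2 gives the same numbers without any summation; triangular_closed is a hypothetical name.

```r
# Hypothetical helper: first n triangular numbers via T_k = k * (k + 1) / 2
triangular_closed <- function(n) {
  k <- seq_len(n)   # seq_len(0) is empty, so n = 0 returns numeric(0)
  k * (k + 1) / 2
}

triangular_closed(10)
# [1]  1  3  6 10 15 21 28 36 45 55

# Prepend 0 to start at T_0, as in the Zseq::Triangular() output above
c(0, triangular_closed(9))
```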

Reassigning one column according to another using data.table

I am interested in replacing the value -11 in one column, "contra_end", with the corresponding values from another column, "current_age". -11 is a code indicating current activity, and I want to replace it with the actual age of each individual stored in "current_age". The age column has ~500,000 values, and only ~4,000 values in the first column are -11. When I run the following code to assign my age column values to the -11 values in "contra_end", I get the error below. Can I make this work without creating a new age variable?
biobank[contra_end == -11, contra_end := biobank[,"current_age", with=FALSE]]
Error in `[.data.table`(biobank, contra_end == -11, `:=`(contra_end, biobank[, :
Supplied 500000 items to be assigned to 4919 items of column 'contra_end'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
I used a short dataset which I made using this code
biobank <- data.frame(contra_end = c(0,13,15,109,-11,23,45),
current_age = c(34,35,36,46,43,56,23))
which gives
contra_end current_age
1 0 34
2 13 35
3 15 36
4 109 46
5 -11 43
6 23 56
7 45 23
Using dplyr::mutate (part of the tidyverse)
library(dplyr)
biobank_2 <- biobank %>%
mutate(contra_end = ifelse(contra_end == -11, current_age, contra_end))
Or using base
biobank$contra_end[biobank$contra_end==-11] <- biobank$current_age[biobank$contra_end==-11]
Both options give:
contra_end current_age
1 0 34
2 13 35
3 15 36
4 109 46
5 43 43
6 23 56
7 45 23
EDIT: I didn't even notice that you were looking for a solution in data.table until after I posted. It doesn't sound like you have too many records for either of the solutions I posted to not be efficient enough, though.
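For completeness, a sketch of how the same replacement might look in data.table itself, using the toy data from above. Inside [ ], current_age on the right-hand side is evaluated only for the rows selected by i, so its length matches the rows being assigned and the error from the question does not arise.

```r
library(data.table)

biobank <- data.table(contra_end  = c(0, 13, 15, 109, -11, 23, 45),
                      current_age = c(34, 35, 36, 46, 43, 56, 23))

# i selects the rows with -11; j assigns by reference, no copy or new column
biobank[contra_end == -11, contra_end := current_age]
biobank
```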

For loop to iterate through columns in data.table [duplicate]

This question already has answers here:
Convert *some* column classes in data.table
(2 answers)
Closed 4 years ago.
I am trying to write a "for" loop that iterates through each column in a data.table and returns a frequency table. However, the code below keeps failing with the error shown after it:
library(data.table)
library(datasets)
data(cars)
cars <- as.data.table(cars)
for (i in names(cars)){
print(table(cars[,i]))
}
Error in `[.data.table`(cars, , i) :
j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found. Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.
When I use each column individually like below, I do not have any problem:
> table(cars[,dist])
2 4 10 14 16 17 18 20 22 24 26 28 32 34 36 40 42 46 48 50 52 54 56 60 64 66
1 1 2 1 1 1 1 2 1 1 4 2 3 3 2 2 1 2 1 1 1 2 2 1 1 1
68 70 76 80 84 85 92 93 120
1 1 1 1 1 1 1 1 1
My data is quite large (8921483x52), that is why I want to use the "for" loop and run everything at once then look at the result.
I included the cars dataset (which is easier to run) to demonstrate my code.
If I convert the dataset to a data.frame, the "for" loop runs without problems. But I would like to know why this does not work with data.table, because I am learning it and believe it works better with large datasets.
If by chance, someone saw a post with an answer already, please let me know because I have been trying for several hours to look for one.
Some solution found here
My personal preference is the apply function though
library(data.table)
library(datasets)
data(cars)
cars <- as.data.table(cars)
apply(cars,2,table)
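To make your loop work, select each column by the name stored in i with [[i]] (or cars[, ..i], as the error message suggests) instead of cars[, i]:
library(data.table)
library(datasets)
data(cars)
cars <- as.data.table(cars)
for (i in names(cars)){
print(table(cars[[i]]))  # cars[[i]] extracts column i as a plain vector
}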
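The error arises because data.table evaluates j as an expression in which column names act as variables, so cars[, i] looks for a column literally called i. A minimal sketch of two ways around this, using cars_dt, col and freq as illustrative names:

```r
library(data.table)

cars_dt <- as.data.table(datasets::cars)

# [[ ]] extracts a column by the name held in a variable, as for any list
col <- "dist"
dist_freq <- table(cars_dt[[col]])

# A data.table is also a list of columns, so lapply() visits each column
# without the matrix coercion that apply() performs
freq <- lapply(cars_dt, table)
names(freq)
# [1] "speed" "dist"
```

freq is a named list with one frequency table per column, which may be easier to inspect afterwards than printed output from a loop.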

How to reorder a column in a data frame to be the last column

I have a data frame where columns are constantly being added to it. I also have a total column that I would like to stay at the end. I think I must have skipped over some really basic command somewhere but cannot seem to find the answer anywhere. Anyway, here is some sample data:
x=1:10
y=21:30
z=data.frame(x,y)
z$total=z$x+z$y
z$w=11:20
z$total=z$x+z$y+z$w
When I type z I get this:
x y total w
1 1 21 33 11
2 2 22 36 12
3 3 23 39 13
4 4 24 42 14
5 5 25 45 15
6 6 26 48 16
7 7 27 51 17
8 8 28 54 18
9 9 29 57 19
10 10 30 60 20
Note how the total column comes before the w, and obviously any subsequent columns. Is there a way I can force it to be the last column? I am guessing that I would have to use ncol(z) somehow. Or maybe not.
You can reorder your columns as follows:
z <- z[,c('x','y','w','total')]
To do this programmatically, after you're done adding your columns, you can retrieve their names like so:
nms <- colnames(z)
Then you can grab the ones that aren't 'total' like so:
nms[nms!='total']
Combined with the above:
z <- z[, c(nms[nms!='total'],'total')]
You have a logic issue here. Whenever you add to a data.frame, it grows to the right.
Easiest fix: keep total a vector until you are done, and only then append it. It will then be the rightmost column.
(For critical applications, you would of course determine your width k beforehand, allocate k+1 columns and just index the last one for totals.)
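Both suggestions can be sketched in base R as follows. move_last is a hypothetical helper name; setdiff() keeps the remaining columns in their existing order.

```r
z <- data.frame(x = 1:10, y = 21:30)
z$total <- z$x + z$y
z$w <- 11:20
z$total <- z$x + z$y + z$w

# Hypothetical helper: move one named column to the end
move_last <- function(df, col) {
  df[c(setdiff(names(df), col), col)]
}

z <- move_last(z, "total")
names(z)
# [1] "x"     "y"     "w"     "total"

# Alternative: add the total only once all other columns exist
z2 <- data.frame(x = 1:10, y = 21:30, w = 11:20)
z2$total <- rowSums(z2)
```

The helper can be called again after each batch of new columns, whereas the second approach avoids reordering altogether.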
