In a function, round values based on different conditions

I want to round these values, but they are diverse, so I can't set a general rule like round(pvalue, 2). How do I accomplish this?
id <- LETTERS[1:10]
pvalue <- c(0.3,0.0432,0.0032,0.67,0.00000003,0.0069,0.782, 0.0004, 0.00076,0.341)
df <- data.frame(id,pvalue)
df
id pvalue
1 A 0.30000000
2 B 0.04320000
3 C 0.00320000
4 D 0.67000000
5 E 0.00000003
6 F 0.00690000
7 G 0.78200000
8 H 0.00040000
9 I 0.00076000
10 J 0.34100000
It should look like:
id pvalue
1 A 0.3
2 B 0.04
3 C 0.003
4 D 0.67
5 E <0.0001
6 F 0.007
7 G 0.78
8 H 0.0004
9 I 0.0007
10 J 0.34

I think you're using the wrong tool. If you want to prepare p-values for scientific display, you can use the pvalString() function from the lazyWeave package to convert your numeric values into correctly formatted strings.
library(lazyWeave)
pvalue <- c(0.3,0.0432,0.0032,0.67,0.00000003,0.0069,0.782, 0.0004, 0.00076,0.341)
pvalString(pvalue)
You can adjust the parameters to get exactly what you want, but the default settings give you the standard convention:
[1] "0.30" "0.043" "0.003" "0.67" "< 0.001" "0.007" "0.78" "< 0.001" "< 0.001" "0.34"

Related

maximum function in R removes decimals

I have a question about the max() function in R. I have a column in a data frame whose values have two decimal places, and whenever I apply max() to the column I get the highest value, but with only one decimal place.
max(df$e)
How can I get two decimal places? I tried options() and round(), but neither works.
Reproducible example:
a = c(228, 239)
b = c(50,83)
d = c(0.27,0.24)
e = c(2.12,1.69)
df = data.frame(a,b,d,e)
max(df$e)
#[1] 2.1
df
# a b d e
# 1 228 50 0.27 2.1
# 2 239 83 0.24 1.7
Now I would like to make more calculations:
df$f = (sqrt(df[,1]/(df[,2]+0.5))/max(df$e))*100
In the end, the data frame should have columns a and b without decimals, and d, e, and f with two decimal places.
Thank you!
tl;dr you've probably got options(digits = 2).
df
a b d e
1 228 50 0.27 2.12
2 239 83 0.24 1.69
Setting options(digits = 2) tells R to use two significant digits when printing output, globally; it does not change the stored values. Note that this means two significant digits in total, not two digits after the decimal point.
options(digits = 2)
df
a b d e f
1 228 50 0.27 2.1 100
2 239 83 0.24 1.7 80
To restore the value to the default, use options(digits = 7).
After restoring the default digits setting and computing df$f = (sqrt(df[,1]/(df[,2]+0.5))/max(df$e))*100 I get
a b d e f
1 228 50 0.27 2.12 100.2273
2 239 83 0.24 1.69 79.8031
You can round the last column to two decimal places:
df$f <- round(df$f, 2)
> df
a b d e f
1 228 50 0.27 2.12 100.23
2 239 83 0.24 1.69 79.80

Cell by cell merge

I have a table of counts in R:
Deg_Maj.tbl <- table(dat1$Maj,dat1$DegCat)
Deg_Maj.tbl
B D
A 66 5
C 2 9
and a table of percentages:
Deg_Maj.ptbl <- round(prop.table(Deg_Maj.tbl)*100,3)
Deg_Maj.ptbl
B D
A 80.48 6.10
C 2.44 10.98
I need to build a table that looks like this:
B D
A 66(80.48%) 5(6.1%)
C 2(2.44%) 9(10.98%)
I have several tables to do in this manner, so I am hoping to find a nice easy way to accomplish this.
We can use paste0:
Deg_Maj_out <- Deg_Maj.tbl
Deg_Maj_out[] <- paste0(Deg_Maj.tbl, "(", Deg_Maj.ptbl, "%)")
Deg_Maj_out
# B D
#A 66(80.48%) 5(6.1%)
#C 2(2.44%) 9(10.98%)
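Since several tables need the same treatment, one option is to wrap the idea in a small helper (count_pct is a hypothetical name and the digits default is an assumption):
count_pct <- function(tbl, digits = 2) {
  # percentages of the whole table, then glue count and percentage cell by cell
  ptbl <- round(prop.table(tbl) * 100, digits)
  out <- tbl
  out[] <- paste0(tbl, "(", ptbl, "%)")
  out
}
count_pct(Deg_Maj.tbl)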

R is not ordering data correctly - skips E values

I am trying to order data by the column weightFisher. However, it is almost as if R does not treat the values in e (scientific) notation as small numbers, because all of them get skipped when I try to order from smallest to greatest.
Code:
resultTable_bon <- GenTable(GOdata_bon,
weightFisher = resultFisher_bon,
weightKS = resultKS_bon,
topNodes = 15136,
ranksOf = 'weightFisher'
)
head(resultTable_bon)
#create Fisher ordered df
indF <- order(resultTable_bon$weightFisher)
resultTable_bonF <- resultTable_bon[indF, ]
what resultTable_bon looks like:
GO.ID Term Annotated Significant Expected Rank in weightFisher
1 GO:0019373 epoxygenase P450 pathway 19 13 1.12 1
2 GO:0097267 omega-hydroxylase P450 pathway 9 7 0.53 2
3 GO:0042738 exogenous drug catabolic process 10 7 0.59 3
weightFisher weightKS
1 1.9e-12 0.79744
2 7.9e-08 0.96752
3 2.5e-07 0.96336
what "ordered" resultTable_bonF looks like:
GO.ID Term Annotated Significant Expected Rank in weightFisher
17 GO:0014075 response to amine 33 7 1.95 17
18 GO:0034372 very-low-density lipoprotein particle re... 11 5 0.65 18
19 GO:0060710 chorio-allantoic fusion 6 4 0.35 19
weightFisher weightKS
17 0.00014 0.96387
18 0.00016 0.83624
19 0.00016 0.92286
As @bhas says, it appears to be working precisely as you want it to. Maybe it's the use of head() that's confusing you?
To put your mind at ease, try it with something simpler:
dtf <- data.frame(a=c(1, 8, 6, 2)^-10, b=c(7, 2, 1, 6))
dtf
# a b
# 1 1.000000e+00 7
# 2 9.313226e-10 2
# 3 1.653817e-08 1
# 4 9.765625e-04 6
dtf[order(dtf$a), ]
# a b
# 2 9.313226e-10 2
# 3 1.653817e-08 1
# 4 9.765625e-04 6
# 1 1.000000e+00 7
Try the following:
resultTable_bon$weightFisher <- as.numeric(resultTable_bon$weightFisher)
Then:
resultTable_bonF <- resultTable_bon[order(resultTable_bon$weightFisher), ]
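The underlying problem is that weightFisher is stored as character (or factor), and character sorting is lexicographic, so values in scientific notation end up in the wrong place. A quick illustration with made-up values:
x <- c("0.00016", "1.9e-12", "0.00014")
sort(x)              # character sort: "0.00014" "0.00016" "1.9e-12"
sort(as.numeric(x))  # numeric sort:    1.9e-12  1.4e-04  1.6e-04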

Merging dataframes with all.equal on numeric(float) keys?

I have two data frames I want to merge based on a numeric value; however, I am having trouble with floating point accuracy. Example:
> df1 <- data.frame(number = 0.1 + seq(0.01,0.1,0.01), letters = letters[1:10])
> df2 <- data.frame(number = seq(0.11,0.2,0.01), LETTERS = LETTERS[1:10])
> (merged <- merge(df1, df2, by = "number", all = TRUE))
number letters LETTERS
1 0.11 a A
2 0.12 <NA> B
3 0.12 b <NA>
4 0.13 c C
5 0.14 d D
6 0.15 <NA> E
7 0.15 e <NA>
8 0.16 f F
9 0.17 g G
10 0.18 h H
11 0.19 i I
12 0.20 j J
Some of the values (0.12 and 0.15) don't match up due to floating point accuracy issues, as discussed in this post. The solution for finding equality there was to use the all.equal function to ignore floating point artifacts, but I don't believe there is a way to do this within the merge function.
Currently I get around it by forcing one of the number columns to character and then back to numeric after the merge, but this is a little clunky; does anyone have a better solution for this problem?
> df1c <- df1
> df1c[["number"]] <- as.character(df1c[["number"]])
> merged2 <- merge(df1c, df2, by = "number", all = TRUE)
> merged2[["number"]] <- as.numeric(merged2[["number"]])
> merged2
number letters LETTERS
1 0.11 a A
2 0.12 b B
3 0.13 c C
4 0.14 d D
5 0.15 e E
6 0.16 f F
7 0.17 g G
8 0.18 h H
9 0.19 i I
10 0.20 j J
EDIT: A little more about the data
I wanted to keep my question general to make it more applicable to other people's problems, but it seems I may need to be more specific to get an answer.
It is likely that all of the issues with merging will be due to floating point inaccuracy, but it may be a little hard to be sure. The data comes in as a series of time series values, a start time, and a frequency. These are then turned into a time series (ts) object, and a number of functions are called to extract features from the time series (one of which is the time value), which is returned as a data frame. Meanwhile another set of functions is called to get other features from the time series as targets. There are also potentially other series getting features generated to complement the original series. These values then have to be reunited using the time value.
Can't store as POSIXct: Each of these processes (feature extraction, target computation, merging) has to be able to occur independently and be stored in a CSV type format so it can be passed to other platforms. Storing as a POSIXct value would be difficult since the series aren't necessarily stored in calendar times.
Round to the level of precision that will allow the numbers to be equal.
> df1$number=round(df1$number,2)
> df2$number=round(df2$number,2)
>
> (merged <- merge(df1, df2, by = "number", all = TRUE))
number letters LETTERS
1 0.11 a A
2 0.12 b B
3 0.13 c C
4 0.14 d D
5 0.15 e E
6 0.16 f F
7 0.17 g G
8 0.18 h H
9 0.19 i I
10 0.20 j J
If you need to choose the level of precision programmatically, then you should tell us more about the data and whether we can assume that any mismatch will always be due to floating point inaccuracy. If so, then rounding to 10 decimal places should be fine. The all.equal function uses a tolerance of sqrt(.Machine$double.eps), which is about 1.5e-8, so in practice that corresponds to rounding to roughly 8 decimal places.
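If the same rounding has to be applied before several merges, it can be packaged as a helper (merge_rounded is a hypothetical function, not part of any package):
merge_rounded <- function(x, y, by, digits = 10, ...) {
  # round the key column in both data frames so that differences below
  # the chosen precision are ignored by merge()
  x[[by]] <- round(x[[by]], digits)
  y[[by]] <- round(y[[by]], digits)
  merge(x, y, by = by, ...)
}
merged <- merge_rounded(df1, df2, by = "number", all = TRUE)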

Survdiff() output fields in R

My question is about the output structure of the survdiff() function from the 'survival' package in R. Namely, I have a data frame containing survival data:
> dat
ID Time Treatment Gender Censored
1 E002 2.7597536 IND F 0
2 E003 4.2710472 Control M 0
3 E005 1.4784394 IND F 0
4 E006 6.8993840 Control F 1
5 E008 9.5934292 IND M 0
6 E009 2.9897331 Control F 0
7 E014 1.3470226 IND F 1
8 E016 2.1683778 Control F 1
9 E018 2.7597536 IND F 1
10 E022 1.3798768 IND F 0
11 E023 0.7227926 IND M 1
12 E024 5.5195072 IND F 0
13 E025 2.4640657 Control F 0
14 E028 7.4579055 Control M 1
15 E029 5.5195072 Control F 1
16 E030 2.7926078 IND M 0
17 E031 4.9938398 Control F 0
18 E032 2.7268994 IND M 0
19 E033 0.1642710 IND M 1
20 E034 4.1396304 Control F 0
and a model
> diff = survdiff(Surv(Time, Censored) ~ Treatment+Gender, data = dat)
> diff
Call:
survdiff(formula = Surv(Time, Censored) ~ Treatment + Gender,
data = dat)
N Observed Expected (O-E)^2/E (O-E)^2/V
Treatment=Control, Gender=M 2 1 1.65 0.255876 0.360905
Treatment=Control, Gender=F 7 3 2.72 0.027970 0.046119
Treatment=IND, Gender=M 5 2 2.03 0.000365 0.000519
Treatment=IND, Gender=F 6 2 1.60 0.100494 0.139041
Chisq= 0.5 on 3 degrees of freedom, p= 0.924
I'm wondering which field of the output object contains the values from the rightmost column, (O-E)^2/V? I'd like to use them further, but I can't obtain them from diff$obs, diff$exp, diff$var, or from combinations of these.
Your help's gonna be much appreciated.
For (O-E)^2/V try something like
(diff$obs - diff$exp)^2 / diag(diff$var)
while for (O-E)^2/E try something like
(diff$obs - diff$exp)^2 / diff$exp
(Here there is no strata() term, so diff$obs and diff$exp are plain vectors. With strata they become matrices with one column per stratum, and you would sum over strata first, e.g. rowSums(diff$obs - diff$exp)^2 / diag(diff$var).)
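Putting it together, here is a sketch that rebuilds the printed table from the survdiff object (assuming no strata, as in this example, so diff$obs and diff$exp are plain vectors):
O <- diff$obs
E <- diff$exp
data.frame(N = as.vector(diff$n),
           Observed = O,
           Expected = E,
           `(O-E)^2/E` = (O - E)^2 / E,
           `(O-E)^2/V` = (O - E)^2 / diag(diff$var),
           check.names = FALSE)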
