I'm trying to use the percent() formatter for a valuebox() in a shiny app I'm designing... and came across some interesting behavior.
Obviously this works:
library(formattable)
a <- 0.2
percent(a)
But I need to catch a potential NA, so tried this:
ifelse(!is.na(a),percent(a),NA)
which returns the unpercented a (ie 0.2 rather than 20%)! What's going on? Some additional testing:
> ifelse(!is.na(a),percent(a),2)
[1] 0.2
> percent(a)
[1] 20.00%
> if(1==1) percent(a)
[1] 20.00%
> ifelse(1==1,percent(a),0)
[1] 0.2
> ifelse(1==1,eval(percent(a)),0)
[1] 0.2
> ifelse(1==1,parse(text = percent(a)),0)
expression(0.2)
> ifelse(1==1,eval(parse(text = percent(a))),0)
[1] 0.2
So what's going on?
Full transparency: percent(NA) does return NA, so I'm not stuck, just curious.
Take it outside the ifelse:
percent(ifelse(!is.na(a),a,NA))
[1] 20.00%
since NA is returned as NA, you can do it for NA:
percent(ifelse(!is.na(NA),a,NA))
[1] NA
It also works with a vector with NA and numbers:
percent(ifelse(!is.na(c(0.2,NA)),a,NA))
[1] 20.00% NA
This also happens in ifelse with time formats and a lot more always try to put it outside
Related
I recently reverted to R version 3.1.3 for compatibility reasons and am now encountering an unexplained error with the subset function.
I want to extract all rows for the gene "Migut.A00003" from the data frame transcr_effects using the gene name as listed in the data frame expr_mim_genes. (this will later become a loop). This action always returns all rows instead of specific rows I am looking for, no matter the formatting of the subset lookup:
> class(expr_mim_genes)
[1] "data.frame"
> sapply(expr_mim_genes, class)
gene longest.tr pair.length
"character" "logical" "numeric"
> head(expr_mim_genes)
gene longest.tr pair.length
1 Migut.A00003 NA 0
2 Migut.A00006 NA 0
3 Migut.A00007 NA 0
4 Migut.A00012 NA 0
5 Migut.A00014 NA 0
6 Migut.A00015 NA 0
> class(transcr_effects)
[1] "data.frame"
> sapply(transcr_effects, class)
pair gene
"character" "character"
> head(transcr_effects)
pair gene
1 pair1 Migut.N01020
2 pair10 Migut.A00351
3 pair1000 Migut.F00857
4 pair10007 Migut.D01637
5 pair10008 Migut.A00401
6 pair10009 Migut.G00442
. . .
7168 pair3430 Migut.A00003
. . .
The gene I am interested in:
> expr_mim_genes[1,"gene"]
[1] "Migut.A00003"
R sees these two terms as equivalent:
> expr_mim_genes[1,"gene"] == "Migut.A00003"
[1] TRUE
If I type in the name of the gene manually, the correct number of rows are returned:
> nrow(subset(transcr_effects, transcr_effects$gene=="Migut.A00003"))
[1] 1
> subset(transcr_effects, transcr_effects$gene=="Migut.A00003")
pair gene
7168 pair3430 Migut.A00003
However, this should return one row from the data.frame but it returns all rows:
> nrow(subset(transcr_effects, transcr_effects$gene == (expr_mim_genes[1,"gene"]))
[1] 10122
I have a feeling this has something to do with text formatting, but I've tried everything and haven't been able to figure it out. I've seen this issue with quoted v.s. unquoted entries, but it does not appear to be the issue here (see equality above).
I didn't have this problem before switching to R v.3.1.3, so maybe it is a version convention I am unaware of?
EDIT:
This is driving me crazy, but at least I think I have found a patch. There was quite a bit of data and file processing to get to this point in the code, involving loading at least 4 files. I've tried taking snippets of each file to post a reproducible example here, but sometimes when I analyze the snippets the error recurs, sometimes it does not (!!). After going through the process though, I discover that:
i = 1
gene = expr_mim_genes[i,"gene"]
> nrow(subset(transcr_effects, gene == gene))
[1] 10122
> nrow(subset(transcr_effects, gene == (expr_mim_genes[i,"gene"])))
[1] 1
I still can't explain this behavior of the code, but at least I know how to work around it.
Thanks all.
From a dendrogram which i created with
hc<-hclust(kk)
hcd<-as.dendrogram(hc)
i picked a subbranch
k=hcd[[2]][[2]][[2]][[2]][[2]][[2]][[2]][1]
When i simply have k displayed, this gives:
> k
[[1]]
[[1]][[1]]
[1] 243
attr(,"label")
[1] "NAfrica_002"
attr(,"members")
[1] 1
attr(,"height")
[1] 0
attr(,"leaf")
[1] TRUE
[[1]][[2]]
[1] 257
attr(,"label")
[1] "NAfrica_016"
attr(,"members")
[1] 1
attr(,"height")
[1] 0
attr(,"leaf")
[1] TRUE
attr(,"members")
[1] 2
attr(,"midpoint")
[1] 0.5
attr(,"height")
[1] 37
How can i access, for example, the "midpoint" attribute, or the second of the "label" attributes?
(I hope i use the correct terminology here)
I have tried things like
k$midpoint
attr(k,"midpoint")
but both returned 'NULL'.
Sorry for question number 2: how could i add a "label" attribute after the attribute "midpoint"?
Your k is still buried one layer too deep. The attributes have been set on the first element of the list k.
attributes(k[[1]]) # Display attributes
attributes(k[[1]])$label # Access attributes
attributes(k[[1]])$label <- 'new' # Change attribute
Alternatively, you can use attr:
attr(k[[1]],'label') # Display attribute
You can change parameters manually as in the previous answer. The problem with this is that it is not efficient to do manually when you want to do it many times. Also, while it is easy to change parameters - that change may not be reflected in any other function, since they won't implement any action based on that change (it must be programmed).
For your specific question - it generally depends on which attribute we want to view. For "midpoint", use the get_nodes_attr function, with the "midpoint" parameter - from the dendextend package.
# install.packages("dendextend")
library(dendextend)
dend <- as.dendrogram(hclust(dist(USArrests[1:5,])))
# Like:
# dend <- USArrests[1:5,] %>% dist %>% hclust %>% as.dendrogram
# midpoint for all nodes
get_nodes_attr(dend, "midpoint")
And you get this:
[1] 1.25 NA 1.50 0.50 NA NA 0.50 NA NA
To also change an attribute, you can use the various assign functions from the package: assign_values_to_leaves_nodePar, assign_values_to_leaves_edgePar, assign_values_to_nodes_nodePar, assign_values_to_branches_edgePar, remove_branches_edgePar, remove_nodes_nodePar
If all you want is to change the labels, the following ability from the package would solve your question:
> labels(dend)
[1] "Arkansas" "Arizona" "California" "Alabama" "Alaska"
> labels(dend) <- 1:5
> labels(dend)
[1] 1 2 3 4 5
For more details on the package, you can have a look at its vignette.
Can anyone help me with the format .. what command should I be using or let me know what should I learn to solve this myslef...)
> d
[1] 5.5
> cat("##",d[])
## 5.5
[1] 5.5
> print("##",d)
[1] "##"
> print("##",d[])
[1] "##"
> print(d)
[1] 5.5
I am looking to match the sample out:
## [1] 5.5
cat(paste("##", capture.output(print(5.5))))
## [1] 5.5
There are probably better alternatives to your actual problem (which you don't define). Package knitr comes to mind.
This is likely a duplicate question, but maybe I'm just not using the proper keywords to find what I'm looking for.
Currently, if I have a vector x, if I want a unit equivalent I would just do
x_unit <- x/sum(x)
This is an extremely basic task and may not be made any more efficiently than this, but I'm wondering if there is a function which obtains this task but has an na.rm feature or a handles the case when sum(x) == 0.
James
Here's a little function that does what you ask:
unit_vec <- function(vec){
return(vec/(sqrt(sum(vec**2,na.rm=TRUE))))
}
let's test it!
!> vec
[1] 87.45438 83.96820 111.47986 106.00922 110.13914 107.26847 86.53061
[8] 103.61227 85.79385 88.16059 NA
!> unit_vec(vec)
[1] 0.2832068 0.2719174 0.3610094 0.3432936 0.3566677 0.3473715 0.2802153
[8] 0.3355315 0.2778294 0.2854937 NA
!> sqrt(sum(unit_vec(vec)**2,na.rm=TRUE))
[1] 1
It works when the sum is 0 too!
> vec2
[1] 10 -10
> unit_vec(vec2)
[1] 0.7071068 -0.7071068
!> sqrt(sum(unit_vec(vec2)**2,na.rm=TRUE))
[1] 1
I hope this helps.
I have two seemingly identical zoo objects created by the same commands from csv files for different time periods. I try to combine them into one long zoo but I'm failing with "indexes overlap" error. ('merge' 'c' or 'rbind' all produce variants of the same error text.) As far as I can see there are no duplicates and the time periods do not overlap. What am I doing wrong? Am using R version 3.0.1 on Windows 7 64bit if that makes a difference.
> colnames(z2)
[1] "Amb" "HWS" "Diff"
> colnames(t.tmp)
[1] "Amb" "HWS" "Diff"
> max(index(z2))
[1] "2012-12-06 02:17:45 GMT"
> min(index(t.tmp))
[1] "2012-12-06 03:43:45 GMT"
> anyDuplicated(c(index(z2),index(t.tmp)))
[1] 0
> c(z2,t.tmp)
Error in rbind.zoo(...) : indexes overlap
>
UPDATE: In trying to make a reproducible case I've concluded this is an implementation error due to the large number of rows I'm dealing with: it fails if the final result is more than 311434 rows long.
> nrow(c(z2,head(t.tmp,n=101958)))
Error in rbind.zoo(...) : indexes overlap
> nrow(c(z2,head(t.tmp,n=101957)))
[1] 311434
# but row 101958 inserts fine on its own so its not a data problem.
> nrow(c(z2,tail(head(t.tmp,n=101958),n=2)))
[1] 209479
I'm sorry but I dont have the R scripting skills to produce a zoo of the critical length, hopefully someone might be able to help me out..
UPDATE 2- Responding to Jason's suggestion.. : The problem is in the MATCH but my R skills arent sufficient to know how to interpret it- does it mean MATCH finds a duplicate value in x.t whereas anyDuplicated does not?
> x.t <- c(index(z2),index(t.tmp));
> length(x.t)
[1] 520713
> ix <- ORDER (x.t)
> length(ix)
[1] 520713
> x.t <- x.t[ix]
> length(ix)
[1] 520713
> length(x.t)
[1] 520713
> tx <- table(MATCH(x.t,x.t))
> max(tx)
[1] 2
> tx[which(tx==2)]
311371 311373 311378 311383 311384 311386 311389 311392 311400 311401
2 2 2 2 2 2 2 2 2 2
> anyDuplicated(x.t)
[1] 0
After all the testing and head scratching it seems that the problem I'm having is timezone related. Setting the environment to the same time zone as the original data makes it work just fine.
Sys.setenv(TZ="GMT")
> z3<-rbind(z2,t.tmp)
> nrow(z3)
[1] 520713
Thanks to how to guard against accidental time zone conversion for the inspiration to look in that direction.