I'm trying to find a format function that will suppress NaN output in R. I want to pass in a vector of doubles and have the NaN values returned as empty strings rather than as "NaN". I'm trying to format output for a LaTeX table. This should be simple, right? Is there such a function?
Here is what I get now:
> x <- c(seq(1,2,0.2), NaN)
> as.character(x)
> [1] "1" "1.2" "1.4" "1.6" "1.8" "2" "NaN"
This is what I want to get:
> x <- c(seq(1,2,0.2), NaN)
> formatting.function(x)
> [1] "1" "1.2" "1.4" "1.6" "1.8" "2" ""
Here you go:
R> x <- c(seq(1,2,0.2), NaN)
R> zx <- as.character(x)
R> zx
[1] "1" "1.2" "1.4" "1.6" "1.8" "2" "NaN"
So now we define a new function mattFun():
R> mattFun <- function(x) gsub("NaN", "", as.character(x))
and use it:
R> zy <- mattFun(x)
R> zy
[1] "1" "1.2" "1.4" "1.6" "1.8" "2" ""
R>
In all seriousness, you are simply looking for a simple pattern replacement, which is what regular expressions do. gsub() is one of several functions offering that. Try reading up on regular expressions.
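Since "NaN" is a literal string rather than a regular-expression pattern here, passing fixed = TRUE to gsub() is a safe (and usually slightly faster) variant. A minimal sketch, where mattFunFixed is just a hypothetical name for illustration:
R> ## hypothetical variant of mattFun() using a fixed, literal match
R> mattFunFixed <- function(x) gsub("NaN", "", as.character(x), fixed = TRUE)
R> mattFunFixed(c(seq(1,2,0.2), NaN))
[1] "1" "1.2" "1.4" "1.6" "1.8" "2" ""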
Just replace "NaN" with "":
x <- c(seq(1,2,0.2), NaN)
fofu <- function(x){
  cx <- as.character(x)
  cx[cx=="NaN"] <- ""
  cx
}
fofu(x)
#[1] "1" "1.2" "1.4" "1.6" "1.8" "2" ""
Edit:
Or using implicit conversion to make it shorter:
fofu2 <- function(x) "[<-"(x, is.nan(x), "")
# or ... replace(x, is.nan(x), "")
This is about twice as fast as the gsub-based solution (measured with microbenchmark on x as defined above), although of course what matters most of the time is convenience rather than computing time.
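For reference, a minimal benchmark sketch along those lines (assuming the microbenchmark package is installed, and using mattFun() from the answer above) could look like this:
library(microbenchmark)
x <- c(seq(1, 2, 0.2), NaN)
microbenchmark(
  gsub    = mattFun(x),  # regex-based replacement
  replace = fofu2(x)     # replacement via implicit coercion
)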
Related
> foo <- as.character(c(0, 2))
> foo
[1] "0" "2"
> foo[1]
[1] "0"
> foo[2]
[1] "2"
> as.character("0-2")
[1] "0-2" #this is the output I want from the command below:
> as.character("foo[1]-foo[2]")
[1] "foo[1]-foo[2]" # ... was hoping to get "0-2"
I tried some variations of eval(parse()), but had the same problem. I also tried these simple examples:
> as.character("as.name(foo[1])")
[1] "as.name(foo[1])"
> as.character(as.name("foo[1]"))
[1] "foo[1]"
Any chance of getting something simple like as.character("foo[1]-foo[2]") to display "0-2"?
UPDATE
Similar example (with a much longer string):
> lol <- as.character(seq(0, 20, 2))
> lol
[1] "0" "2" "4" "6" "8" "10" "12" "14" "16" "18" "20"
> c(as.character("0-2"), as.character("2-4"), as.character("4-6"), as.character("6-8"), as.character("8-10"), as.character("10-12"), as.character("12-14"),as.character("14-16"),as.character("16-18"),as.character("18-20"))
[1] "0-2" "2-4" "4-6" "6-8" "8-10" "10-12" "12-14" "14-16" "16-18" "18-20"
I would like to be able to actually call the object lol from within my character string.
We can use paste with the collapse argument:
paste(foo, collapse='-')
#[1] "0-2"
If we need to paste adjacent elements together, paste the vector with its last element dropped against the vector with its first element dropped, using the sep argument.
paste(lol[-length(lol)], lol[-1], sep='-')
#[1] "0-2" "2-4" "4-6" "6-8" "8-10" "10-12" "12-14" "14-16" "16-18"
#[10] "18-20"
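An equivalent formulation, shown purely as an alternative sketch, uses head() and tail() to drop the last and first elements:
paste(head(lol, -1), tail(lol, -1), sep='-')
#[1] "0-2" "2-4" "4-6" "6-8" "8-10" "10-12" "12-14" "14-16" "16-18"
#[10] "18-20"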
I am using apply to generate strings from a data frame.
For example:
df2 <- data.frame(a=c(1:3), b=c(9:11))
apply(df2, 1, function(row) paste0("hello", row['a']))
apply(df2, 1, function(row) paste0("hello", row['b']))
works as I would expect and generates
[1] "hello1" "hello2" "hello3"
[1] "hello9" "hello10" "hello11"
However, if I have
df <- data.frame(a=c(1:3), b=c(9:11), c=c("1", "2", "3"))
apply(df, 1, function(row) paste0("hello", row['a']))
apply(df, 1, function(row) paste0("hello", row['b']))
the output is
[1] "hello1" "hello2" "hello3"
[1] "hello 9" "hello10" "hello11"
Can anyone please explain why I get a padding space that makes all the strings the same length in the second case? I can work around the problem using gsub, but I would like a better understanding of why this happens.
You don't need the apply function here:
paste0("hello", df[["a"]])
[1] "hello1" "hello2" "hello3"
paste0("hello", df[["b"]])
[1] "hello9" "hello10" "hello11"
This is happening because apply transforms your data.frame into a matrix. See what happens when you coerce df to a matrix:
as.matrix(df)
a b c
[1,] "1" " 9" "1"
[2,] "2" "10" "2"
[3,] "3" "11" "3"
Notice that the data frame was coerced to a character matrix, with the numeric column padded to a common width; that is where the extra space in " 9" comes from.
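If you do need to keep apply() for some other reason, one possible workaround (a sketch using base R's trimws(), available since R 3.2.0) is to strip the padding inside the function:
apply(df, 1, function(row) paste0("hello", trimws(row['b'])))
# [1] "hello9"  "hello10" "hello11"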
I'm wondering if there is any way to remove blanks from a list.
From what I've searched, there are many Q&As about removing
a whole element from a list, but I couldn't find one about removing
a specific component within an element.
To be specific, the list I'm working with now looks like this:
[[1]]
[1] "1" "" "" "2" "" "" "3"
[[2]]
[1] "weak"
[[3]]
[1] "22" "33"
[[4]]
[1] "44" "34p" "45"
As you can see above, there are empty strings (""), which should be removed.
I've tried different commands like
text.words.bl <- text.words.ll[-which(text.words.ll==" ")]
text.words.bl <- text.words.ll[!sapply(text.words.ll, is.null)]
etc., but it seems like the ""s in [[1]] of the list still remain.
Is it impossible to apply commands to small pieces in each element of the list?
(e.g. 1, 2, weak, 22, 33... respectively)
I've used the lapply function to run specific commands on each element,
and those lapply commands all seemed to work...
Use %in%, but negate it with !:
## Sample data:
L <- list(c(1, 2, "", "", 4), c(1, "", "", 2), c("", "", 3))
L
# [[1]]
# [1] "1" "2" "" "" "4"
#
# [[2]]
# [1] "1" "" "" "2"
#
# [[3]]
# [1] "" "" "3"
The replacement:
lapply(L, function(x) x[!x %in% ""])
# [[1]]
# [1] "1" "2" "4"
#
# [[2]]
# [1] "1" "2"
#
# [[3]]
# [1] "3"
Obviously, assign the output to "L" if you want to overwrite the original dataset:
L[] <- lapply(L, function(x) x[!x %in% ""])
Another way would be to use nchar(). I borrowed L from @Ananda Mahto.
lapply(L, function(x) x[nchar(x) >= 1])
#[[1]]
#[1] "1" "2" "4"
#
#[[2]]
#[1] "1" "2"
#
#[[3]]
#[1] "3"
I need to process some data that are mostly CSV. The problem is that R ignores the comma if it comes at the end of a line (e.g., the one that comes after the 3 in the example below).
> strsplit("1,2,3,", ",")
[[1]]
[1] "1" "2" "3"
I'd like it to be read in as [1] "1" "2" "3" NA instead. How can I do this? Thanks.
Here are a couple ideas
scan(text="1,2,3,", sep=",", quiet=TRUE)
#[1] 1 2 3 NA
unlist(read.csv(text="1,2,3,", header=FALSE), use.names=FALSE)
#[1] 1 2 3 NA
Those both return integer vectors. You can wrap as.character around either of them to get the exact output you show in the Question:
as.character(scan(text="1,2,3,", sep=",", quiet=TRUE))
#[1] "1" "2" "3" NA
Or, you could specify what="character" in scan, or colClasses="character" in read.csv, for slightly different output:
scan(text="1,2,3,", sep=",", quiet=TRUE, what="character")
#[1] "1" "2" "3" ""
unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character"), use.names=FALSE)
#[1] "1" "2" "3" ""
You could also specify na.strings="" along with colClasses="character"
unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character", na.strings=""),
use.names=FALSE)
#[1] "1" "2" "3" NA
The stringr (by Hadley Wickham) and stringi (by Marek Gagolewski) libraries are a huge improvement on the base string functions (fully vectorized, with a consistent function interface):
require(stringr)
str_split("1,2,3,", ",")
[1] "1" "2" "3" ""
as.integer(unlist(str_split("1,2,3,", ",")))
[1] 1 2 3 NA
Using the stringi package:
require(stringi)
> stri_split_fixed("1,2,3,", ",")
[[1]]
[1] "1" "2" "3" ""
## you can directly specify that you want to omit these empty elements
> stri_split_fixed("1,2,3,", ",", omit_empty = TRUE)
[[1]]
[1] "1" "2" "3"
I need to export some data bidirectionally between R and Matlab, and the latter prefers arrays. I am trying to convert my R nested-list data structures into a multidimensional array before conversion to matlab, such that the slicing remains the same. This is (analogous to) what I am currently doing:
nestlist <- lapply(1:2, function(x)
  lapply(1:3, function(y)
    lapply(1:4, function(z) paste(x, y, z, sep = ""))))
unlist(nestlist)
[1] "111" "112" "113" "114" "121" "122" "123" "124" "131" "132" "133" "134"
[13] "211" "212" "213" "214" "221" "222" "223" "224" "231" "232" "233" "234"
> length(nestlist)
[1] 2
> length(nestlist[[1]])
[1] 3
> length(nestlist[[1]][[1]])
[1] 4
As you can see, dimensions are 2x3x4 as expected. Now:
> ar <- array(unlist(nestlist), c(2, 3, 4))
> nestlist[[1]][[1]][[1]]
[1] "111"
> ar[1,1,1]
[1] "111"
So far so good, but...
> nestlist[[2]][[2]][[3]]
[1] "223"
> ar[2,2,3]
[1] "214"
So somehow the array creation is not happening in the same order as the list is traversed by unlist. How can I do this efficiently, preserving the indexing order and dimensions? I'd like to avoid nested sapplys etc. ("manual" parsing) if possible.
Here is a generalization of the transpose (t) function for multi-dimensional arrays:
tarray <- function(x) aperm(x, rev(seq_along(dim(x))))
Then you can define ar as follows (note the reversed dimensions: unlist() returns values with the innermost list index varying fastest, while array() fills its first dimension fastest):
ar <- tarray(array(unlist(nestlist), c(4, 3, 2)))
ar[2,2,3]
# [1] "223"