I tried to replace NaN values with zeros using the following script:
rapply( data123, f=function(x) ifelse(is.nan(x),0,x), how="replace" )
# [31] 0.00000000 -0.67994832 0.50287454 0.63979527 1.48410571 -2.90402836
The NaN value showed as zero in the printed output, but when I typed the name of the data frame to review it, the value was still NaN.
data123$contri_us
# [31] NaN -0.67994832 0.50287454 0.63979527 1.48410571 -2.90402836
I am not sure whether the rapply command actually applied the adjustment to the data frame or only replaced the value in the printed output.
Any idea how to actually change the NaN value to zero?
It would seem that is.nan doesn't actually have a method for data frames, unlike is.na. So, let's fix that!
is.nan.data.frame <- function(x)
  do.call(cbind, lapply(x, is.nan))
data123[is.nan(data123)] <- 0
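The question's data123 isn't shown, so here is a minimal sketch with a made-up data frame (dd is hypothetical) showing the method in action:
# Hypothetical example data (the question's data123 isn't shown)
dd <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))
dd[is.nan(dd)] <- 0   # dispatches to the is.nan.data.frame method defined above
dd
#   a b
# 1 1 0
# 2 0 5
# 3 3 6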
In fact, in R, this operation is very easy:
If the matrix 'a' contains some NaN, you just need to use the following code to replace it by 0:
a <- matrix(c(1, NaN, 2, NaN), ncol=2, nrow=2)
a[is.nan(a)] <- 0
a
If the data frame 'b' contains some NaN, you just need to use the following code to replace it by 0:
#for a data.frame:
b <- data.frame(c1=c(1, NaN, 2), c2=c(NaN, 2, 7))
b[is.na(b)] <- 0
b
Note the difference: is.nan when it's a matrix vs. is.na when it's a data frame.
Doing
#...
b[is.nan(b)] <- 0
#...
yields: Error in is.nan(b) : default method not implemented for type 'list' because b is a data frame.
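The reason is.na works here is that NaN counts as missing, while NA does not count as NaN:
is.na(NaN)   # TRUE  -- so b[is.na(b)] <- 0 also catches NaN
is.nan(NA)   # FALSE -- NA is missing, but not "not a number"
Keep in mind that b[is.na(b)] <- 0 will therefore also overwrite genuine NA values, not just NaN.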
The following should do what you want:
x <- data.frame(X1=sample(c(1:3,NaN), 200, replace=TRUE), X2=sample(c(4:6,NaN), 200, replace=TRUE))
head(x)
x <- replace(x, is.na(x), 0)
head(x)
Here is a tidyverse solution. I've generated sample data with both NaN and NA. The first column is fully complete.
df <- tibble(x = LETTERS[1:5],
             y = c(1:3, NaN, 4),
             z = c(rep(NaN, 3), NA, 5))
df
# A tibble: 5 x 3
x y z
<chr> <dbl> <dbl>
1 A 1 NaN
2 B 2 NaN
3 C 3 NaN
4 D NaN NA
5 E 4 5
Then we can apply mutate_all with replace to the data frame:
df %>%
mutate_all(~replace(., is.nan(.), 0))
# A tibble: 5 x 3
x y z
<chr> <dbl> <dbl>
1 A 1 0
2 B 2 0
3 C 3 0
4 D 0 NA
5 E 4 5
We've replaced NaN values with zero and touched neither NA values nor the x column.
UPDATE for dplyr 1.0.0
Since mutate_all is superseded, we can rewrite the expression using across() as follows:
df %>%
mutate(across(everything(), ~replace(.x, is.nan(.x), 0)))
# A tibble: 5 × 3
x y z
<chr> <dbl> <dbl>
1 A 1 0
2 B 2 0
3 C 3 0
4 D 0 NA
5 E 4 5
I have a string:
> all_scn[1]
[1] "Cars_20160601_01.hdf5"
I want to use it to repeat some numbers based on a variable last_step:
> last_step
[1] 439
My ifelse statement:
> ifelse(substring(all_scn[1], 1, 1)=="C",
rep(seq(0, last_step-1, 1), 13),
rep(seq(0, last_step-1, 1), 12))
[1] 0
But as you can see, instead of repeating the numeric vector 0:438 13 times, it just produces a single zero. Outside of ifelse I get the following:
> rep(seq(0, last_step-1, 1), 13)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[30] 29 30 31 32 . . . (I truncated the output due to space limitation)
What am I doing wrong?
From help("ifelse"):
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
This means that if the shape of test is a vector with just one element, the output will be a vector with just one element. That is the case with your test.
substring(all_scn[1], 1, 1) == "C"
#[1] TRUE
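You can see the truncation directly with a one-element test (a quick illustration, not from the original post):
ifelse(TRUE, 1:10, 0)
# [1] 1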
In cases like this, you don't need to vectorize because there is nothing to vectorize. All you need is a simple if/else.
if (substring(all_scn[1], 1, 1) == "C") {
  rep(seq(0, last_step-1, 1), 13)
} else {
  rep(seq(0, last_step-1, 1), 12)
}
You are using ifelse the wrong way; it does not work the way you have guessed. You imagined ifelse(condition, result_if_true, result_if_false), just like in Excel, but it works differently in R.
Take the following example from R documentation:
> x <- c(6:-4)
> x
[1] 6 5 4 3 2 1 0 -1 -2 -3 -4
> sqrt(x)
[1] 2.449490 2.236068 2.000000 1.732051 1.414214 1.000000 0.000000 NaN NaN NaN NaN
Warning message:
In sqrt(x) : NaNs produced
You can see NaNs (Not a Number) produced, because the sqrt() of a negative number is not a real number, hence NaN. It produced a warning. Now let's see the same computation wrapped in ifelse.
> sqrt(ifelse(x >= 0, x, NA))
[1] 2.449490 2.236068 2.000000 1.732051 1.414214 1.000000 0.000000 NA NA NA NA
See, no warnings.
The solution for you is to use a simple if...else condition:
if (condition) {
  # statement if true
} else {
  # statement if false
}
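As a concrete sketch using all_scn and last_step from the question (this mirrors the earlier answer's code):
if (substring(all_scn[1], 1, 1) == "C") {
  reps <- rep(seq(0, last_step - 1, 1), 13)
} else {
  reps <- rep(seq(0, last_step - 1, 1), 12)
}
length(reps)
# [1] 5707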
I have a vector in R,
a = c(2,3,4,9,10,2,4,19)
let us say I want to efficiently insert the following vectors, b and d,
b = c(2,1)
d = c(0,1)
right after the 3rd and 7th positions (the "4" entries), resulting in,
e = c(2,3,4,2,1,9,10,2,4,0,1,19)
How would I do this efficiently in R, without repeatedly using cbind or the like?
I found a package, R.basic, but it's not on CRAN, so I would prefer a supported approach.
Try this:
result <- vector("list",5)
# odd slots: the pieces of 'a', split after positions 3 and 7
result[c(TRUE,FALSE)] <- split(a, cumsum(seq_along(a) %in% (c(3,7)+1)))
# even slots: the vectors to be inserted
result[c(FALSE,TRUE)] <- list(b,d)
f <- unlist(result)
identical(f, e)
#[1] TRUE
EDIT: generalization to arbitrary number of insertions is straightforward:
insert.at <- function(a, pos, ...){
  dots <- list(...)
  stopifnot(length(dots)==length(pos))
  result <- vector("list",2*length(pos)+1)
  result[c(TRUE,FALSE)] <- split(a, cumsum(seq_along(a) %in% (pos+1)))
  result[c(FALSE,TRUE)] <- dots
  unlist(result)
}
> insert.at(a, c(3,7), b, d)
[1] 2 3 4 2 1 9 10 2 4 0 1 19
> insert.at(1:10, c(4,7,9), 11, 12, 13)
[1] 1 2 3 4 11 5 6 7 12 8 9 13 10
> insert.at(1:10, c(4,7,9), 11, 12)
Error: length(dots) == length(pos) is not TRUE
Note the bonus error checking if the number of positions and insertions do not match.
You can use the following function,
ins(a, list(b, d), pos=c(3, 7))
# [1] 2 3 4 2 1 9 10 2 4 0 1 19
where:
ins <- function(a, to.insert=list(), pos=c()) {
  c(a[seq(pos[1])],
    to.insert[[1]],
    a[seq(pos[1]+1, pos[2])],
    to.insert[[2]],
    a[seq(pos[2]+1, length(a))]
  )
}
Here's another function, using Ricardo's syntax, Ferdinand's split and @Arun's interleaving trick from another question:
ins2 <- function(a, bs, pos){
  as <- split(a, cumsum(seq(a) %in% (pos+1)))
  idx <- order(c(seq_along(as), seq_along(bs)))
  unlist(c(as, bs)[idx])
}
The advantage is that this should extend to more insertions. However, it may produce weird output when passed invalid arguments, e.g., with any(pos > length(a)) or length(bs)!=length(pos).
You can change the last line to unname(unlist(... if you don't want a's items named.
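A quick sanity check of ins2 against the example above (my own check, not part of the original answer):
identical(unname(ins2(a, list(b, d), c(3, 7))), e)
# [1] TRUE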
The straightforward approach:
b.pos <- 3
d.pos <- 7
c(a[1:b.pos],b,a[(b.pos+1):d.pos],d,a[(d.pos+1):length(a)])
[1] 2 3 4 2 1 9 10 2 4 0 1 19
Note the importance of parentheses for the boundaries of the : operator.
After using Ferdinand's function, I tried to write my own and, surprisingly, it is far more efficient.
Here's mine :
insertElems = function(vect, pos, elems) {
  l = length(vect)
  j = 0
  for (i in 1:length(pos)){
    if (pos[i]==1)
      vect = c(elems[j+1], vect)
    else if (pos[i] == length(vect)+1)
      vect = c(vect, elems[j+1])
    else
      vect = c(vect[1:(pos[i]-1+j)], elems[j+1], vect[(pos[i]+j):(l+j)])
    j = j+1
  }
  return(vect)
}
tmp = c(seq(1:5))
insertElems(tmp, c(2,4,5), c(NA,NA,NA))
# [1] 1 NA 2 3 NA 4 NA 5
insert.at(tmp, c(2,4,5), c(NA,NA,NA))
# [1] 1 NA 2 3 NA 4 NA 5
And here are the benchmark results:
> microbenchmark(insertElems(tmp, c(2,4,5), c(NA,NA,NA)), insert.at(tmp, c(2,4,5), c(NA,NA,NA)), times = 10000)
Unit: microseconds
expr min lq mean median uq max neval
insertElems(tmp, c(2, 4, 5), c(NA, NA, NA)) 9.660 11.472 13.44247 12.68 13.585 1630.421 10000
insert.at(tmp, c(2, 4, 5), c(NA, NA, NA)) 58.866 62.791 70.36281 64.30 67.923 2475.366 10000
My code even behaves better in some cases:
> insert.at(tmp, c(1,4,5), c(NA,NA,NA))
# [1] 1 2 3 NA 4 NA 5 NA 1 2 3
# Warning message:
# In result[c(TRUE, FALSE)] <- split(a, cumsum(seq_along(a) %in% (pos))) :
# number of items to replace is not a multiple of replacement length
> insertElems(tmp, c(1,4,5), c(NA,NA,NA))
# [1] NA 1 2 3 NA 4 NA 5
Here's an alternative that uses append. It's fine for small vectors, but I can't imagine it being efficient for large vectors since a new vector is created upon each iteration of the loop (which is, obviously, bad). The trick is to reverse the vector of things that need to be inserted to get append to insert them in the correct place relative to the original vector.
a = c(2,3,4,9,10,2,4,19)
b = c(2,1)
d = c(0,1)
pos <- c(3, 7)
z <- setNames(list(b, d), pos)
z <- z[order(names(z), decreasing=TRUE)]
for (i in seq_along(z)) {
a <- append(a, z[[i]], after = as.numeric(names(z)[[i]]))
}
a
# [1] 2 3 4 2 1 9 10 2 4 0 1 19
I have a matrix in R as follows:
YITEMREVENUE XCARTADD XCARTUNIQADD XCARTADDTOTALRS
YITEMREVENUE 1.0000000000 -0.02630016 -0.01811156 0.0008988723
XCARTADD -0.0263001551 1.00000000 0.02955307 -0.0438881639
XCARTUNIQADD -0.0181115638 0.02955307 1.00000000 0.0917359285
XCARTADDTOTALRS 0.0008988723 -0.04388816 0.09173593 1.0000000000
I want to list, for each column, the names of the rows that have negative values. My output should look like:
YITEMREVENUE - XCARTADD XCARTUNIQADD
XCARTADD - YITEMREVENUE XCARTADDTOTALRS
XCARTUNIQADD - YITEMREVENUE
XCARTADDTOTALRS - XCARTADD
Is this possible in R?
I would first cast the matrix to a data.frame; in code this would be:
# Some example data
dat = matrix(runif(9) - 0.5, 3, 3)
dimnames(dat) = list(LETTERS[1:3], LETTERS[1:3])
> dat
A B C
A 0.1216529 0.3501861 0.47473598
B -0.4720577 0.4887181 -0.41118597
C 0.4406510 -0.2516563 0.02344829
# Cast to data.frame
library(reshape)
df = melt(dat)
df
X1 X2 value
1 A A 0.12165293
2 B A -0.47205771
3 C A 0.44065104
4 A B 0.35018605
5 B B 0.48871810
6 C B -0.25165634
7 A C 0.47473598
8 B C -0.41118597
9 C C 0.02344829
# And find the combinations of row-columns which have < 0
df[df$value < 0, c("X1","X2")]
X1 X2
2 B A
6 C B
8 B C
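If you want the output grouped per column, as in the question, one possible follow-up (an extra step beyond the original answer) is to split those row names by column:
# Group the negative entries by column (X2), listing the offending rows (X1)
neg <- df[df$value < 0, c("X1", "X2")]
split(as.character(neg$X1), neg$X2)
$A
[1] "B"
$B
[1] "C"
$C
[1] "B"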
If your data are in a data frame called m, you can use the following:
lapply(m, function(v) {rownames(m)[v<0]})
If your data are in a matrix called m, you can use:
apply(m, 2, function(v) {rownames(m)[v<0]})
In both cases, you will get a list like this:
$YITEMREVENUE
[1] "XCARTADD" "XCARTUNIQADD"
$XCARTADD
[1] "YITEMREVENUE" "XCARTADDTOTALRS"
$XCARTUNIQADD
[1] "YITEMREVENUE"
$XCARTADDTOTALRS
[1] "XCARTADD"