Issue in replace function in R

I have a vector:
> a <- c(0,1,2,3,4)
I am trying to replace every value with that value incremented by 1, so that the result is:
a <- c(1,2,3,4,5)
> replace(a,a==4,5)
[1] 0 1 2 3 5
But when I try to replace 3 with 4, there is a problem:
> replace(a,a==3,4)
[1] 0 1 2 4 4
Both 3 and 5 are getting converted to 4.
The same happens again when I try to replace 2 with 3:
> replace(a,a==2,3)
[1] 0 1 3 3 4
Can someone point out what I am doing wrong here?

replace doesn't change its argument.
> a = c(0,1,2,3,4)
> replace(a,a==2,99)
[1] 0 1 99 3 4
But a is still the same:
> a
[1] 0 1 2 3 4
So when you thought you had converted the 4 to a 5 in a, you hadn't. Use the return value if you want to change a:
> a
[1] 0 1 2 3 4
> a = replace(a,a==2,99)
> a
[1] 0 1 99 3 4
[As pointed out in comments, there are better ways to add 1 to all values of a vector, a=a+1 being the best]
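To make the difference concrete, here is a small sketch contrasting a discarded return value with reassignment, followed by the vectorized increment mentioned above. The names b and d are illustrative only and do not come from the question.

```r
a <- c(0, 1, 2, 3, 4)

# replace() returns a modified copy; a itself is untouched
replace(a, a == 4, 5)
a                           # still 0 1 2 3 4

# Chaining replace() with reassignment compounds the edits:
# once 3 becomes 4 there are two 4s, and both then become 5
b <- replace(a, a == 3, 4)
b <- replace(b, b == 4, 5)  # 0 1 2 5 5

# Vectorized arithmetic increments every element in one step
d <- a + 1                  # 1 2 3 4 5
```

The chained version also shows why replacing values one at a time is a fragile way to shift a whole vector: the order of the replacements changes the result.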

Related

na.pad not working in diff() function

For some reason the na.pad parameter of diff() is not working properly. Is anyone else having this problem, or does anyone have a workaround?
yo <- c(5,3,3,4,5,6,5,8,9)
diff(yo, na.pad = TRUE)
[1] -2 0 1 1 1 -1 3 1
The resulting vector should be:
[1] NA -2 0 1 1 1 -1 3 1
The na.pad argument certainly comes from the diff method for time series in the xts/zoo packages; it has no effect on base R vectors, where it is silently ignored. You also need to convert your vector to a time series:
library(xts)
library(zoo)
yy = zoo(yo)
diff(yy, na.pad=TRUE)
# 1 2 3 4 5 6 7 8 9
#NA -2 0 1 1 1 -1 3 1
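If loading a time series package is not wanted, a plain base R sketch of the same padding is to prepend the NA by hand. The name padded is illustrative, not from the question.

```r
yo <- c(5, 3, 3, 4, 5, 6, 5, 8, 9)

# diff() on a plain vector returns length(yo) - 1 values,
# so prepend one NA to keep the result aligned with yo
padded <- c(NA, diff(yo))
padded
# [1] NA -2  0  1  1  1 -1  3  1
```

This only covers the default lag of 1; for a larger lag, the number of leading NA values would have to match the lag.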

Cumulative sum for positive numbers only [duplicate]

I have this vector :
x = c(1,1,1,1,1,0,1,0,0,0,1,1)
And I want to do a cumulative sum for the positive numbers only, restarting at each zero. I should get the following vector in return:
xc = c(1,2,3,4,5,0,1,0,0,0,1,2)
How could I do it?
I've tried cumsum(x), but that does the cumulative sum over all values and gives:
cumsum(x)
[1] 1 2 3 4 5 5 6 6 6 6 7 8
One option is
x1 <- inverse.rle(within.list(rle(x), values[!!values] <-
(cumsum(values))[!!values]))
x[x1!=0] <- ave(x[x1!=0], x1[x1!=0], FUN=seq_along)
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2
Or a one-line code would be
x[x>0] <- with(rle(x), sequence(lengths[!!values]))
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2
Here's a possible solution using data.table v >= 1.9.5 and its new rleid function:
library(data.table)
as.data.table(x)[, cumsum(x), rleid(x)]$V1
## [1] 1 2 3 4 5 0 1 0 0 0 1 2
Base R, one line solution with Map Reduce :
> Reduce('c', Map(function(u,v) if(v==0) rep(0,u) else 1:u, rle(x)$lengths, rle(x)$values))
[1] 1 2 3 4 5 0 1 0 0 0 1 2
Or:
unlist(Map(function(u,v) if(v==0) rep(0,u) else 1:u, rle(x)$lengths, rle(x)$values))
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
cumsum_ <- function(x) {
  r <- rle(x)
  # label each run, then take the cumulative sum within each run
  s <- split(x, rep(seq_along(r$values), r$lengths))
  unlist(lapply(s, cumsum), use.names = FALSE)
}
(xc <- cumsum_(x))
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
I don't know much R, but I have written a small piece of code in Python. The logic is the same in any language. I hope this helps:
x = [1,1,1,1,1,0,1,0,0,0,1,1]
tot = 0
for i in range(len(x)):
    if x[i] != 0:
        # extend the running total within a run of non-zero values
        tot = tot + x[i]
        x[i] = tot
    else:
        # a zero resets the running total
        tot = 0
print(x)
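x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
skumulowana <- function(x) {
  dl <- length(x)
  xx <- numeric(dl + 1)  # xx[i + 1] holds the running total after x[i]
  for (i in seq_len(dl)) {
    # a zero resets the total; anything else extends it
    xx[i + 1] <- if (x[i] == 0) 0 else xx[i] + x[i]
  }
  xx[-1]  # drop the leading placeholder
}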
skumulowana(x)
## [1] 1 2 3 4 5 0 1 0 0 0 1 2
Try this one-liner...
Reduce(function(x,y) (x+y)*(y!=0), x, accumulate=T)
split and lapply version:
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
unlist(lapply(split(x, cumsum(x==0)), cumsum))
step by step:
a <- split(x, cumsum(x==0)) # divides x into pieces where each 0 starts a new piece
b <- lapply(a, cumsum) # calculates cumsum in each piece
unlist(b) # rejoins the pieces
Result has useless names but is otherwise what you wanted:
# 01 02 03 04 05 11 12 2 3 41 42 43
# 1 2 3 4 5 0 1 0 0 0 1 2
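The same grouping idea can be written with ave(), which returns an unnamed vector in the original order, so the names never appear. This is a sketch under the same assumption as the split version: every 0 starts a new group and the other values are positive. The name xc is borrowed from the question.

```r
x <- c(1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1)

# cumsum(x == 0) increases by one at every zero, giving a group label;
# ave() then applies cumsum() within each group
xc <- ave(x, cumsum(x == 0), FUN = cumsum)
xc
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
```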
Here is another base R solution using aggregate. The idea is to make a data frame with x and a new column named x.1 by which we can apply aggregate functions (cumsum in this case):
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
r <- rle(x)
df <- data.frame(x,
x.1=unlist(sapply(1:length(r$lengths), function(i) rep(i, r$lengths[i]))))
# df
# x x.1
# 1 1 1
# 2 1 1
# 3 1 1
# 4 1 1
# 5 1 1
# 6 0 2
# 7 1 3
# 8 0 4
# 9 0 4
# 10 0 4
# 11 1 5
# 12 1 5
agg <- aggregate(df$x~df$x.1, df, cumsum)
as.vector(unlist(agg$`df$x`))
# [1] 1 2 3 4 5 0 1 0 0 0 1 2

rewriting variable using for loop R

I've got a column in my dataset that contains a collection of 0,1 and 2. The 2's are a weird leftover from some previous transformation, and I need to convert them to 1. I've written a simple loop to do this
for (i in my.cl.accept$enroll){
if (i==2){
i=1
}
}
However, this doesn't change the actual contents of the data frame. ifelse() doesn't work, because I don't need to change the other digits at all; just the number 2.
I've been using R a little more after coming from Python. What simple thing am I misunderstanding here?
Let's generate a sample data set:
set.seed(10)
DF <- data.frame(
  a = 1:10,
  b = sample(0:2, 10, replace = TRUE))
DF
Now, replace every entry corresponding to 2 with 1:
DF$b[DF$b==2] <- 1
DF
Note: This is a vectorized method, and it will generally be faster than iterating with a loop.
Dunno whether this is what you want?
> A<- 1:10
> B<- c(rep(0,5), rep(1,3), rep(2,2))
> data <- data.frame(A,B)
> data
A B
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 1
7 7 1
8 8 1
9 9 2
10 10 2
> data[data$B==2,]$B <- 1
> data
A B
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 1
7 7 1
8 8 1
9 9 1
10 10 1
Are you sure you're using ifelse correctly? It actually does allow you to only change one value to another. Here's an example:
> x <- sample(c(0, 1, 2), 10, TRUE)
> x
## [1] 2 1 1 0 2 2 0 0 2 1
> ifelse(x == 2, 1, x)
## [1] 1 1 1 0 1 1 0 0 1 1
For future reference, your good old-fashioned for loop should go something like this:
for (i in seq_along(my.cl.accept$enroll)) {
  # index into the column so the assignment changes the data frame itself
  if (my.cl.accept$enroll[i] == 2) {
    my.cl.accept$enroll[i] <- 1
  }
}
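The reason the original loop had no effect is that for (i in v) hands the loop a copy of each value, so assigning to i never touches the column. A minimal sketch of both behaviours follows; the small data frame is a hypothetical stand-in for the asker's data, and unchanged and k are illustrative names.

```r
# Hypothetical stand-in for the asker's data frame
my.cl.accept <- data.frame(enroll = c(0, 1, 2, 2, 0, 1))

# Looping over values: i is a copy, so the column never changes
for (i in my.cl.accept$enroll) {
  if (i == 2) i <- 1
}
unchanged <- my.cl.accept$enroll   # still contains the 2s

# Looping over positions and assigning back does modify the column
for (k in seq_along(my.cl.accept$enroll)) {
  if (my.cl.accept$enroll[k] == 2) my.cl.accept$enroll[k] <- 1
}
my.cl.accept$enroll
# [1] 0 1 1 1 0 1
```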

Replace NaN values in a list with zero (0)

Hi, I have a problem with NaN. I am working with a large dataset with many variables, and they contain NaN values. The data looks like this:
z=list(a=c(1,2,3,NaN,5,8,0,NaN),b=c(NaN,2,3,NaN,5,8,NaN,NaN))
I used these commands to coerce the list to a data frame, but I got this:
z=as.data.frame(z)
> is.list(z)
[1] TRUE
> is.data.frame(z)
[1] TRUE
> replace(z,is.nan(z),0)
Error in is.nan(z) : default method not implemented for type 'list'
I coerced z to a data frame but it wasn't enough; maybe there is a way to replace NaN in a list. Thanks for your help. This data is only an example; my original data has 36000 observations and 40 variables.
This is a perfect use case for rapply.
> rapply( z, f=function(x) ifelse(is.nan(x),0,x), how="replace" )
$a
[1] 1 2 3 0 5 8 0 0
$b
[1] 0 2 3 0 5 8 0 0
lapply would work too, but rapply deals properly with nested lists in this situation.
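For the flat, data frame case in the question, the lapply variant might look like the sketch below. It applies is.nan() column by column, which avoids the error above, and assigns into zdf[] so the result stays a data frame. The name zdf is illustrative, not from the question.

```r
z <- list(a = c(1, 2, 3, NaN, 5, 8, 0, NaN),
          b = c(NaN, 2, 3, NaN, 5, 8, NaN, NaN))
zdf <- as.data.frame(z)

# is.nan() works on each numeric column even though it fails on the
# data frame as a whole; zdf[] <- keeps the data.frame class
zdf[] <- lapply(zdf, function(col) replace(col, is.nan(col), 0))
zdf
```

Note that is.nan() leaves ordinary NA values alone, so any NA in the data would survive this step.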
As you don't seem to mind having your data in a data frame, you can do something highly vectorised too. However, this will only work if each list element is of equal length. With 36000 observations of 40 variables, I am guessing that this is the case in your data:
z <- as.data.frame(z)
dims <- dim(z)
y <- unlist(z)
y[ is.nan(y) ] <- 0
x <- matrix( y , nrow = dims[1] , ncol = dims[2] )
# [,1] [,2]
# [1,] 1 0
# [2,] 2 2
# [3,] 3 3
# [4,] 0 0
# [5,] 5 5
# [6,] 8 8
# [7,] 0 0
# [8,] 0 0
Following OP's edit to the title, this should do it:
unstack(within(stack(z), values[is.nan(values)] <- 0))
# a b
# 1 1 0
# 2 2 2
# 3 3 3
# 4 0 0
# 5 5 5
# 6 8 8
# 7 0 0
# 8 0 0
unstack automatically gives you a data.frame if the resulting output is of equal length (unlike the first example, shown below).
Old solution (for continuity).
Try this:
unstack(na.omit(stack(z)))
# $a
# [1] 1 2 3 5 8 0
# $b
# [1] 2 3 5 8
Note 1: It seems from your post that you want to replace NaN with 0. You can save the output of stack(z) to a variable, replace the NaN values with 0, and then unstack.
Note 2: Also, since na.omit removes NA as well as NaN, I assume that your data contains no NA (as in your data above).
z = do.call(data.table, rapply(z, function(x) ifelse(is.nan(x),0,x), how="replace"))
Use this if you start with a data.table and want to do the replacement in one line.
But keep in mind that keys need to be redefined afterwards:
> key(x1)
[1] "date"
> x1 = do.call(data.table, rapply(x1, function(x) ifelse(is.na(x), 0, x), how="replace"))
> key(x1)
NULL

How to get matching pairs with R's RecordLinkage package

Can anyone tell me what I'm doing wrong here? I am trying to test the compare function of the R package RecordLinkage on a toy dataset:
> test<-cbind(
+ a = c(1, 1, 1),
+ b = c(2, 0, 2),
+ c = c(1, 2, 1))
>
> test
a b c
[1,] 1 2 1
[2,] 1 0 2
[3,] 1 2 1
>
> results <- compare.dedup(test)
>
> results$pairs
id1 id2 a b c is_match
1 1 2 1 0 0 NA
2 1 3 1 1 1 NA
3 2 3 1 0 0 NA
>
Records 1 and 3 clearly match but is_match is NA for all three pairs.
Because you forgot to supply an identity vector:
> compare.dedup(cbind(a=c(1,1,1), b=c(2,0,2), c=c(1,2,1)), identity=c(1,2,3))$pairs
id1 id2 a b c is_match
1 1 2 1 0 0 0
2 1 3 1 1 1 0
3 2 3 1 0 0 0
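For intuition: the identity vector labels which records refer to the same entity, and a pair counts as a match when both records carry the same label. With identity=c(1,2,3) every record is its own entity, which is why is_match is 0 everywhere above. The base R sketch below mimics that rule without calling the package; it is an illustration of the idea, not RecordLinkage's implementation, and the labels c(1, 2, 1) are a hypothetical choice stating that records 1 and 3 are the same entity.

```r
# Hypothetical labels: records 1 and 3 are the same entity
identity <- c(1, 2, 1)

# All unordered record pairs: (1,2), (1,3), (2,3)
pairs <- t(combn(length(identity), 2))

# A pair is a match when both records share an identity label
is_match <- as.integer(identity[pairs[, 1]] == identity[pairs[, 2]])
cbind(id1 = pairs[, 1], id2 = pairs[, 2], is_match)
```

Under this labelling only the pair (1, 3) is flagged as a match.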
For anyone who stumbles across this question like me: type
help(RLdata500)
in R. It explains that identity.RLdata500 is a separately defined vector that holds the unique IDs.
I think it is defined separately because otherwise the data would be used automatically by some of the functions unless they were explicitly told not to.
To see which rows are duplicates, type the following in R:
i=cbind(RLdata500,identity.RLdata500)
L = i[8] == 33
i[L,]
I faced the same issue, and I have a possible solution: this is due to the identity parameter.
From the sample data in the RecordLinkage package, I found that the vector identity.RLdata500 carries information about the duplicate records of RLdata500: out of 500 records, 50 are duplicates.
length(unique(identity.RLdata500))
[1] 450
I found a similar column in my dataset, stored it as a separate vector, and passed that vector to the identity parameter:
New_data_seq
118
118
New_data_seq <- R_New_data_zero$SEQ_NO
abc <- compare.dedup (R_New_data_zero,identity = New_data_seq)
BICODE ALCODE IS_T OID conc
I A 1 99 IA1
I A 1 99 IA1
abc$pairs[1:1, ]
id1 id2 BICODE ALCODE IS_T OID conc is_match
1 2 1 1 1 1 1 1
