Updating 0 vector values based on preceding and successive values

I have a data frame with a cumulative count for each event (an event being a run of 1s in the bin column). The rows separating events have the value 0, and each event has an ID, like so:
bin cumul ID
0 0 0
1 1 3
1 1 3
1 1 3
1 1 3
0 0 0
0 0 0
0 0 0
0 0 0
1 2 2
1 2 2
1 2 2
1 2 2
1 2 2
0 0 0
0 0 0
0 0 0
0 0 0
1 3 1
1 3 1
1 3 1
I want to update the ID column so that each non-event (0 in the bin column) is assigned an ID value based on the previous and subsequent ID.
Therefore, if a non-event is preceded and succeeded by events with equal ID values (e.g. both 3), the non-event also carries this ID value (3). However, if the non-event is preceded by an event with one ID value but succeeded by an event with a different one, then the first half of the non-event gets the ID value of the preceding event and the second half gets the ID value of the succeeding event. This gives the final data frame:
bin cumul ID
0 0 3
1 1 3
1 1 3
1 1 3
1 1 3
0 0 3
0 0 3
0 0 2
0 0 2
1 2 2
1 2 2
1 2 2
1 2 2
1 2 2
0 0 2
0 0 2
0 0 1
0 0 1
1 3 1
1 3 1
1 3 1

If the question were how to fill in the zeros with the ID of the preceding values, or of the succeeding values, you could use na.locf from the zoo package and it would be a one-liner. For this task I think you might reach for the rle function:
rle(dat$ID)
#Run Length Encoding
# lengths: int [1:6] 1 4 4 5 4 3
# values : int [1:6] 0 3 0 2 0 1
Then, thinking about how to use that result, I came up with an algorithm like:
for each '0' in values: assign the first [`length`/2 + .9] values as $values[ idx-1 ]
assign the next [`length`/2] values as $values[ idx+1 ]
( `rep` truncates a fractional `times` argument, and adding a number
slightly less than 1.0 takes care of the edge cases where there is an
odd number of zeros in a row.)
( `sum` on the lengths can recover the correct positions.)
and for the beginning and ending 0-cases:
replace with the succeeding and preceding values respectively
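The rounding trick in the notes above can be checked in isolation. This is only a small illustration of how `rep` treats a fractional `times` argument; the run lengths 4 and 5 are arbitrary examples, not values taken from any particular data set.

```r
# rep() truncates a non-integer `times` towards zero.
even_run <- 4
odd_run  <- 5

# First part of the gap: length/2 + 0.9, truncated
first_even <- length(rep(3, even_run / 2 + 0.9))  # 2.9 is truncated to 2
first_odd  <- length(rep(3, odd_run / 2 + 0.9))   # 3.4 is truncated to 3

# Second part of the gap: length/2, truncated
second_even <- length(rep(2, even_run / 2))       # 2
second_odd  <- length(rep(2, odd_run / 2))        # 2.5 is truncated to 2

# The two parts always add up to the full run length
c(first_even + second_even, first_odd + second_odd)
```

For an odd run the extra element goes to the first part, so the preceding ID covers the middle position.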
After considerable debugging effort (and commenting out the debugging cat-calls):
rldat <- rle(dat$ID)
for (nth in seq_along(rldat$lengths)) {  # cat("nth=", nth, "\n")
  if (rldat$values[nth] == 0) {
    if (nth == 1) {  # leading zeros: take the ID of the next event
      # cat("first value=", rldat$values[nth+1], "\n")
      dat$ID[1:rldat$lengths[nth]] <- rldat$values[nth + 1]
    } else if (nth == length(rldat$lengths)) {  # trailing zeros: take the previous ID
      dat$ID[(length(dat$ID) - rldat$lengths[nth] + 1):length(dat$ID)] <-
        rldat$values[nth - 1]
    } else {  # interior zeros: first half from the previous event, rest from the next
      # cat("seq=", (sum(rldat$lengths[1:(nth-1)])+1):sum(rldat$lengths[1:nth]), "\n")
      dat$ID[(sum(rldat$lengths[1:(nth - 1)]) + 1):sum(rldat$lengths[1:nth])] <-
        c(rep(rldat$values[nth - 1], rldat$lengths[nth] / 2 + .9),
          rep(rldat$values[nth + 1], rldat$lengths[nth] / 2))
    }
  }
}
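The same idea can be packaged as a function that works on a plain vector. This is a sketch, not the answer's original code: the name `fill_zero_ids` is made up for illustration, it uses `ceiling` instead of the `+ .9` trick (the two agree for whole-number run lengths), and it assumes that 0 never appears as a real event ID.

```r
# Hypothetical helper: fill runs of 0 in an ID vector from neighbouring runs.
fill_zero_ids <- function(id) {
  r <- rle(id)
  n <- length(r$values)
  ends <- cumsum(r$lengths)          # last position of each run
  starts <- ends - r$lengths + 1     # first position of each run
  out <- id
  for (k in which(r$values == 0)) {
    len <- r$lengths[k]
    if (n == 1) next                 # all zeros: nothing to borrow from
    if (k == 1) {
      fill <- rep(r$values[2], len)          # leading gap: next event's ID
    } else if (k == n) {
      fill <- rep(r$values[n - 1], len)      # trailing gap: previous event's ID
    } else {
      first_half <- ceiling(len / 2)         # odd gaps favour the previous ID
      fill <- c(rep(r$values[k - 1], first_half),
                rep(r$values[k + 1], len - first_half))
    }
    out[starts[k]:ends[k]] <- fill
  }
  out
}

# The ID column from the question
id <- c(0, 3, 3, 3, 3, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1)
filled <- fill_zero_ids(id)
filled
```

With a data frame like the one in the question, this would be used as `dat$ID <- fill_zero_ids(dat$ID)`.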

Related

Compute combination of a pair variables for a given operation in R

From a given dataframe:
# Create dataframe with 4 variables and 10 obs
set.seed(1)
df<-data.frame(replicate(4,sample(0:1,10,rep=TRUE)))
I would like to compute the difference for every pair of columns, keeping only one subtraction per pair, i.e. column A - column B but not column B - column A, and so on.
What I have is very manual, and it does not scale well when there are lots of variables.
# Result
df_result <- as.data.frame(list(df$X1-df$X2,
df$X1-df$X3,
df$X1-df$X4,
df$X2-df$X3,
df$X2-df$X4,
df$X3-df$X4))
Also, the column name of each result should describe the operation, e.g. x1_x2 for x1 - x2.

You can use combn:
COMBI = combn(colnames(df),2)
res = data.frame(apply(COMBI,2,function(i)df[,i[1]]-df[,i[2]]))
colnames(res) = apply(COMBI,2,paste0,collapse="minus")
head(res)
X1minusX2 X1minusX3 X1minusX4 X2minusX3 X2minusX4 X3minusX4
1 0 0 -1 0 -1 -1
2 1 1 0 0 -1 -1
3 0 0 0 0 0 0
4 0 0 -1 0 -1 -1
5 1 1 1 0 0 0
6 -1 0 0 1 1 0
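If the column names should use the `x1_x2` style mentioned in the question rather than `minus`, a variant along these lines may work. It is a sketch that keeps the pairs as a list (`simplify = FALSE`) and looks columns up by name; the intermediate names `pairs` and `diffs` are only illustrative.

```r
set.seed(1)
df <- data.frame(replicate(4, sample(0:1, 10, replace = TRUE)))

# All unordered pairs of column names, as a list of length-2 character vectors
pairs <- combn(names(df), 2, simplify = FALSE)

# One difference per pair: first column minus second column
diffs <- lapply(pairs, function(p) df[[p[1]]] - df[[p[2]]])

# Name each result after the pair it came from, e.g. "X1_X2"
names(diffs) <- vapply(pairs, paste, character(1), collapse = "_")

res <- as.data.frame(diffs)
head(res)
```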

making 1000 contingency tables in R

I have a vector called "combined" with 1's and 0's
combined
1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I sampled twice from this vector, each with a sample size of 3 and put it into a contingency table of counts as follows.
2 1
1 2
I want to repeat this sampling 1000 times so that I end up with 1000 contingency tables, each with the counts of 1s and 0s from that round of sampling.
This is what I tried:
sample1 = as.vector(replicate(10000, sample(combined, 3)))
sample2 = as.vector(replicate(10000, sample(combined, 3)))
con_table = table(sample1,sample2)
but I ended up with only one table instead of 10000:
8109 7573
7306 7012
Hoping to get some help.

You need to wrap the entire expression, both sample and table, inside replicate. Add a conversion to a factor to ensure you always get a 2x2 table. E.g. a simple version with 2 replications:
combined <- rep(0:1,each=10)
combined <- as.factor(combined)
replicate(2, table(sample(combined,3), sample(combined,3)), simplify=FALSE)
#[[1]]
#
# 0 1
# 0 0 1
# 1 1 1
#
#[[2]]
#
# 0 1
# 0 1 1
# 1 0 1

Vector analysis in R

As input, your function should take a vector of 0s and 1s;
Every time you see a sequence of 1s in the data you need to increase the number of children by 1;
Be careful with two consecutive sequences of 1s where the gap between them is less than 5 (i.e. when there are fewer than 5 0s between them, it is the same child and not a new child);
To help you, the social planner provides some examples of what your function should return:
#Input: c(1,1,1,1,0,0,0,0)
#Output: 1 1 1 1 1 1 1 1
#Input: c(0,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1)
#Output: 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2
#Input: c(0,0,0,0,1,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,1)
#Output: 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
Functions, which might be helpful:
diff()
cumsum()
which()
rle()
I don't quite understand how to approach the question. My thought is to use the diff function after cumsum, as it will help me sustain a row of 1s, but then I lose the length of the vector (it obviously becomes shorter). Also, rle()$lengths seems to help me detect gaps of length 5 or more to turn 1s into 2s. Sorry for this question, I am only a beginner.

I make use of the which function in R (https://www.r-bloggers.com/which-function-in-r/) and run length encoding (http://www.cookbook-r.com/Manipulating_data/Finding_sequences_of_identical_values/). Here's my attempt:
vector_analyse <- function(sample_vector){
  # ----------------------------------------------------------------------------
  # Signature: vector --> vector
  # Author: kon_u
  # Description: Given a vector of 0s and 1s, label each position with the
  # running number of children. Each run of 1s starts a new child unless fewer
  # than 5 0s separate it from the previous run (then it is the same child).
  # ----------------------------------------------------------------------------
  # Run Length Encoding gives a list of lengths and values
  rle_object <- rle(sample_vector)
  x <- rle_object$lengths # run lengths
  y <- rle_object$values  # run values
  z <- which(y == 1)      # indices of the runs of 1s
  # Guard with length(z) > 1: 2:length(z) would count down if z had 0 or 1 entries
  if (length(z) > 1) {
    for (i in 2:length(z)) {
      # A gap shorter than 5 means the same child, so this run adds nothing
      if (x[z[i] - 1] < 5) {
        y[z[i]] <- 0
      }
    }
  }
  rle_object$values <- cumsum(y)
  inverse.rle(rle_object)
}
vector_analyse(c(1,1,1,1,0,0,0,0)) # 1 1 1 1 1 1 1 1
vector_analyse(c(0,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1)) # 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2
vector_analyse(c(0,0,0,0,1,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,1)) # 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
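The loop can also be avoided. The following is a sketch of a vectorized variant of the same run-length idea, not a replacement proposed in the original answer; the function name `count_children` and the `min_gap` parameter are made up for illustration.

```r
# Hypothetical vectorized variant: label positions with the running child count.
count_children <- function(x, min_gap = 5) {
  r <- rle(x)
  is_one <- r$values == 1
  # Length of the run immediately before each run (Inf before the first run)
  prev_len <- c(Inf, head(r$lengths, -1))
  # A run of 1s starts a new child if the preceding gap is long enough ...
  new_child <- is_one & (prev_len >= min_gap)
  # ... and the very first run of 1s always starts a child
  if (any(is_one)) new_child[which(is_one)[1]] <- TRUE
  r$values <- cumsum(new_child)
  inverse.rle(r)
}

count_children(c(1, 1, 1, 1, 0, 0, 0, 0))
count_children(c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1))
```

Since runs alternate between 0 and 1, the run before a run of 1s is always the gap of 0s, which is what `prev_len` relies on.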

How to add multiple values to data.frame without loop?

Suppose I have a matrix D which holds death counts per year for specific ages.
I want to fill this matrix with the appropriate death counts, which are stored in
the vector Age, but the following code gives me the wrong answer. How should I write the code without a loop?
# Year and age grid for tables
Years=c(2007:2017)
Ages=c(60:70)
#Data.frame of deaths
D=data.frame(matrix(ncol=length(Years),nrow=length(Ages))); D[is.na(D)]=0
colnames(D)=Years
rownames(D)=Ages
Age=c(60,61,62,65,65,65,68,69,60)
year=2010
D[as.character(Age),as.character(year)]<-
D[as.character(Age),as.character(year)]+1
D[,'2010'] # 1 1 1 0 0 1 0 0 1 1 0
# Should be 2 1 1 0 0 3 0 0 1 1 0

You need to use table:
AgeTable = table(Age)
D[names(AgeTable), as.character(year)] = AgeTable
D[,'2010']
[1] 2 1 1 0 0 3 0 0 1 1 0
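The assignment above overwrites whatever the column already held. If the counts should be added to existing values instead, as the `+ 1` in the question suggests, a sketch along these lines may help. It rebuilds the question's D so that it runs on its own; `counts`, `rows` and `col` are illustrative names.

```r
Years <- 2007:2017
Ages <- 60:70
D <- data.frame(matrix(0, nrow = length(Ages), ncol = length(Years)))
colnames(D) <- Years
rownames(D) <- Ages

Age <- c(60, 61, 62, 65, 65, 65, 68, 69, 60)
year <- 2010

# Tabulate first, so repeated ages are counted instead of being assigned once
counts <- table(Age)
rows <- names(counts)
col <- as.character(year)

# Add the counts to what is already stored
D[rows, col] <- D[rows, col] + as.vector(counts)
D[, "2010"]
```

The original attempt fails because indexed assignment with repeated row names writes each row once, so duplicates in Age do not accumulate.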

In R: Sample from a "totals" column, then subtract 1 from sampled column, store value, and resample

I am definitely not an R coder but am trying to stumble my way through this code. I have a dataframe that looks like this--with 200 rows (just 8 shown here).
Ind.ID V1 V2 V3 V4 V5 V6 V7 Captures
1 1 0 0 1 1 0 0 0 2
2 2 0 0 1 0 0 0 1 2
3 3 1 1 0 1 1 0 1 5
4 4 0 0 1 1 0 0 0 2
5 5 1 0 0 0 0 1 0 2
6 6 0 1 1 0 0 0 0 2
7 7 0 0 1 1 1 0 0 3
8 8 1 0 0 0 1 0 0 2
I am trying to sample from the Captures column (which is the sum of the row) and output the Ind.ID value. If there is a 0 in the Captures column, I want it to subtract 1 from i (i=i-1) and resample--to ensure that I get the correct number of samples. I also want to then subtract 1 from the sampled column (i.e., decrease the Captures value by 1 if it was sampled), and then resample. I am trying to get 400 samples (I think the current code will get me only 200, but I can't figure out how to get 400).
I want my output to be:
23
45
197
64
.....
Here's my code:
sess1<-(numeric(200)) #create a place for output
for(i in 1:length(dep.pop$Captures)){
if(dep.pop[i,'Captures']!=0){ #if the value of Captures is not 0, sample and
sample(dep.pop$Captures, size=1, replace=TRUE) #want to resample the row if Captures >1
#code here to decrease the value of the sampled Captures column by 1. create new vector for resampling?
}
else {
if(dep.pop[i,'Captures']==0){ #if the value of Captures = 0
i<-i-1 #decrease the value of i by 1 to ensure 200 samples
sample(dep.pop$Captures, size=1, replace=TRUE) #and resample
}
#sess1<- #store the value from a different column (ID column) that represents the sampled row
}}
Thanks!

Assuming sum(dep.pop$Captures) is at least 400, the following code may meet your needs; it samples each individual ID at most as many times as it was captured:
sample(rep(dep.pop$Ind.ID, times=dep.pop$Captures), size=400)
If you wish to sample with replacement (so you do not need to worry about the total number of captures) but still want to use the number of captures per individual id as sampling weights, then perhaps
sample(dep.pop$Ind.ID, size=400, replace=TRUE, prob=dep.pop$Captures)
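To see why the first approach respects the capture counts, here is a small sketch on a toy data frame. The eight rows reuse the Ind.ID and Captures values shown in the question, and the sample size of 10 is an arbitrary stand-in for 400.

```r
# Toy stand-in for dep.pop, using the eight rows shown in the question
dep.pop <- data.frame(Ind.ID = 1:8, Captures = c(2, 2, 5, 2, 2, 2, 3, 2))

# Each ID appears in the pool once per capture
pool <- rep(dep.pop$Ind.ID, times = dep.pop$Captures)

set.seed(1)
draws <- sample(pool, size = 10)   # without replacement

# How often each ID was drawn, including IDs that were never drawn
drawn_counts <- table(factor(draws, levels = dep.pop$Ind.ID))
rbind(drawn = as.vector(drawn_counts), captures = dep.pop$Captures)
```

Sampling without replacement from the expanded pool plays the role of "subtract 1 and resample": an ID can never be drawn more often than its Captures value, and IDs with 0 captures never enter the pool.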
