From a given dataframe:
# Create dataframe with 4 variables and 10 obs
set.seed(1)
df<-data.frame(replicate(4,sample(0:1,10,rep=TRUE)))
I would like to compute a substract operation between in all columns combinations by pairs, but only keeping one substact, i.e column A- column B but not column B-column A and so on.
What I got is very manual, and this tend to be not so easy when there are lots of variables.
# Result
df_result <- as.data.frame(list(df$X1-df$X2,
df$X1-df$X3,
df$X1-df$X4,
df$X2-df$X3,
df$X2-df$X4,
df$X3-df$X4))
Also the colname of the feature name should describe the operation i.e.(x1_x2) being x1-x2.
You can use combn:
COMBI = combn(colnames(df),2)
res = data.frame(apply(COMBI,2,function(i)df[,i[1]]-df[,i[2]]))
colnames(res) = apply(COMBI,2,paste0,collapse="minus")
head(res)
X1minusX2 X1minusX3 X1minusX4 X2minusX3 X2minusX4 X3minusX4
1 0 0 -1 0 -1 -1
2 1 1 0 0 -1 -1
3 0 0 0 0 0 0
4 0 0 -1 0 -1 -1
5 1 1 1 0 0 0
6 -1 0 0 1 1 0
I have a vector called "combined" with 1's and 0's
combined
1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I sampled twice from this vector, each with a sample size of 3 and put it into a contingency table of counts as follows.
2 1
1 2
I want to reiterate this sampling 1000 times such that I end with 1000 contingency tables each with counts of 1s and 0s from the sampling.
This is what I tried:
sample1 = as.vector(replicate(10000, sample(combined, 3)))
sample2 = as.vector(replicate(10000, sample(combined, 3)))
con_table = table(sample1,sample2)
but I ended up only getting 1 table instead of 10000. Hoping to get some help.
8109 7573
7306 7012
You need to wrap the entire expression, sample and table inside replicate. Add a conversion to a factor to ensure you always get a 2x2 table. E.g. a simple version with 2 replications:
combined <- rep(0:1,each=10)
combined <- as.factor(combined)
replicate(2, table(sample(combined,3), sample(combined,3)), simplify=FALSE)
#[[1]]
#
# 0 1
# 0 0 1
# 1 1 1
#
#[[2]]
#
# 0 1
# 0 1 1
# 1 0 1
As inputs your function should take a vector of 0s and 1s;
Every time you see a sequence of 1s in the data you need to increase the number of children by 1;
Be careful with the two subsequent sequences of 1s, where the difference between them is less than 5 (i.e. when there are less than 5 0s in between them, then it is the same child and not a new child);
To help you social planner provides some examples of what your function should return:
#Input: c(1,1,1,1,0,0,0,0)
#Output: 1 1 1 1 1 1 1 1
#Input: c(0,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1)
#Output: 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2
#Input: c(0,0,0,0,1,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,1)
#Output: 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
Functions, which might be helpful:
diff()
cumsum()
which()
rle()
I dont quite understand how to approach the question, my thoughts on this are using diff function after the cumsum as it will help me to sustain a row of 1s but in this scenario i am loosing the length of vector (it obviously becomes shorter) also #rle$lenght seems to help me to detect gaps of length 5 or more to turn 1s into 2s. Sorry for this question I am only a beginner
I make use of which function in r (https://www.r-bloggers.com/which-function-in-r/) and run length encoding (http://www.cookbook-r.com/Manipulating_data/Finding_sequences_of_identical_values/). Here's my attempt:
vector_analyse <- function(sample_vector){
# ----------------------------------------------------------------------------
# Signature: vector --> vector
# Author: kon_u
# Description: Given a sample vector of 0s and 1s, return a sequence of 1s in
# the data you need to increase the number of children by 1 (when there are less
# 5 0s in between them, then it is the same child and not a new child)
# ----------------------------------------------------------------------------
# ----------------------------------------------------------------------------
# Run Length Encoding gives a list of length and values
# ----------------------------------------------------------------------------
rle_object <- rle(sample_vector)
x <- rle_object$lengths # original length
y <- rle_object$values # original values
z <- which(y == 1) # index of 1 in vector y
if (length(z) == 1){
invisible()
} else{
for (i in 2:length(z)){
if (x[z[i]-1] >= 5){
y[z[i]] = y[z[i]]
} else {
y[z[i]] = y[z[i]] - 1
}
}
}
y_cumsum = cumsum(y)
rle_object$values <- y_cumsum
new_vector = inverse.rle(rle_object)
return(new_vector)
}
vector_analyse(c(1,1,1,1,0,0,0,0)) # 1 1 1 1 1 1 1 1
vector_analyse(c(0,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1)) # 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2
vector_analyse(c(0,0,0,0,1,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,1)) # 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
Suppose I have matrix D which consists of death counts per year by specific ages.
I want to fill this matrix with appropriate death counts that is stored in
vector Age, but the following code gives me wrong answer. How should I write the code without making a loop?
# Year and age grid for tables
Years=c(2007:2017)
Ages=c(60:70)
#Data.frame of deaths
D=data.frame(matrix(ncol=length(Years),nrow=length(Ages))); D[is.na(D)]=0
colnames(D)=Years
rownames(D)=Ages
Age=c(60,61,62,65,65,65,68,69,60)
year=2010
D[as.character(Age),as.character(year)]<-
D[as.character(Age),as.character(year)]+1
D[,'2010'] # 1 1 1 0 0 1 0 0 1 1 0
# Should be 2 1 1 0 0 3 0 0 1 1 0
You need to use table
AgeTable = table(Age)
D[names(AgeTable), as.character(year)] = AgeTable
D[,'2010']
[1] 2 1 1 0 0 3 0 0 1 1 0
I am definitely not an R coder but am trying to stumble my way through this code. I have a dataframe that looks like this--with 200 rows (just 8 shown here).
Ind.ID V1 V2 V3 V4 V5 V6 V7 Captures
1 1 0 0 1 1 0 0 0 2
2 2 0 0 1 0 0 0 1 2
3 3 1 1 0 1 1 0 1 5
4 4 0 0 1 1 0 0 0 2
5 5 1 0 0 0 0 1 0 2
6 6 0 1 1 0 0 0 0 2
7 7 0 0 1 1 1 0 0 3
8 8 1 0 0 0 1 0 0 2
I am trying to sample from the Captures column (which is the sum of the row) and output the Ind.ID value. If there is a 0 in the Captures column, I want it to subtract 1 from i (i=i-1) and resample--to ensure that I get the correct number of samples. I also want to then subtract 1 from the sampled column (i.e., decrease the Captures value by 1 if it was sampled), and then resample. I am trying to get 400 samples (I think the current code will get me only 200, but I can't figure out how to get 400).
i want my output to be
23
45
197
64
.....
Here's my code:
sess1<-(numeric(200)) #create a place for output
for(i in 1:length(dep.pop$Captures)){
if(dep.pop[i,'Captures']!=0){ #if the value of Captures is not 0, sample and
sample(dep.pop$Captures, size=1, replace=TRUE) #want to resample the row if Captures >1
#code here to decrease the value of the sampled Captures column by 1. create new vector for resampling?
}
else {
if(dep.pop[i,'Captures']==0){ #if the value of Captures = 0
i<-i-1 #decrease the value of i by 1 to ensure 200 samples
sample(dep.pop$Captures, size=1, replace=TRUE) #and resample
}
#sess1<- #store the value from a different column (ID column) that represents the sampled row
}}
Thanks!
Assuming sum(dep.pop$Captures) is at least 400 then the following code may meet your needs to sample up to the number of captures for each individual id:
sample(rep(dep.pop$Ind.ID, times=dep.pop$Captures), size=400)
If you wish to sample with replacement (so you do not need to worry about the total number of captures) but still want to use the number of captures per individual id as sampling weights, then perhaps
sample(dep.pop$Ind.ID, size=400, replace=TRUE, prob=dep.pop$Captures)