Suppose I have a matrix D which contains death counts per year by specific ages.
I want to fill this matrix with the death counts stored in the
vector Age, but the following code gives me the wrong answer. How should I write the code without using a loop?
# Year and age grid for tables
Years=c(2007:2017)
Ages=c(60:70)
#Data.frame of deaths
D=data.frame(matrix(ncol=length(Years),nrow=length(Ages))); D[is.na(D)]=0
colnames(D)=Years
rownames(D)=Ages
Age=c(60,61,62,65,65,65,68,69,60)
year=2010
D[as.character(Age),as.character(year)]<-
D[as.character(Age),as.character(year)]+1
D[,'2010'] # 1 1 1 0 0 1 0 0 1 1 0
# Should be 2 1 1 0 0 3 0 0 1 1 0
You need to use table
AgeTable = table(Age)
D[names(AgeTable), as.character(year)] = AgeTable
D[,'2010']
[1] 2 1 1 0 0 3 0 0 1 1 0
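Note that this overwrites the selected cells. If D might already hold non-zero counts (the original code was adding 1 per death), a small variation on the same idea adds the tabulated counts to whatever is already there; just a sketch:
AgeTable = table(Age)
D[names(AgeTable), as.character(year)] =
  D[names(AgeTable), as.character(year)] + AgeTable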
From a given dataframe:
# Create dataframe with 4 variables and 10 obs
set.seed(1)
df<-data.frame(replicate(4,sample(0:1,10,rep=TRUE)))
I would like to compute the subtraction for all pairwise combinations of columns, keeping only one direction per pair, i.e. column A - column B but not column B - column A, and so on.
What I have is very manual, and it does not scale well when there are many variables.
# Result
df_result <- as.data.frame(list(df$X1-df$X2,
df$X1-df$X3,
df$X1-df$X4,
df$X2-df$X3,
df$X2-df$X4,
df$X3-df$X4))
Also, the column name should describe the operation, i.e. x1_x2 meaning x1 - x2.
You can use combn:
COMBI = combn(colnames(df),2)
res = data.frame(apply(COMBI,2,function(i)df[,i[1]]-df[,i[2]]))
colnames(res) = apply(COMBI,2,paste0,collapse="minus")
head(res)
X1minusX2 X1minusX3 X1minusX4 X2minusX3 X2minusX4 X3minusX4
1 0 0 -1 0 -1 -1
2 1 1 0 0 -1 -1
3 0 0 0 0 0 0
4 0 0 -1 0 -1 -1
5 1 1 1 0 0 0
6 -1 0 0 1 1 0
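An alternative sketch uses combn's FUN argument directly, which avoids the separate apply call (same result, assuming the same df; the inline function and the "minus" separator are just illustrative choices):
res2 = data.frame(combn(colnames(df), 2, FUN = function(i) df[, i[1]] - df[, i[2]]))
colnames(res2) = combn(colnames(df), 2, FUN = paste0, collapse = "minus")
head(res2)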
I have a vector called "combined" with 1's and 0's
combined
1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I sampled twice from this vector, each time with a sample size of 3, and put the results into a contingency table of counts as follows.
2 1
1 2
I want to repeat this sampling 1000 times so that I end up with 1000 contingency tables, each with counts of 1s and 0s from the sampling.
This is what I tried:
sample1 = as.vector(replicate(10000, sample(combined, 3)))
sample2 = as.vector(replicate(10000, sample(combined, 3)))
con_table = table(sample1,sample2)
but I ended up only getting 1 table instead of 10000. Hoping to get some help.
8109 7573
7306 7012
You need to wrap the entire expression, both sample and table, inside replicate. Also convert combined to a factor so that you always get a 2x2 table, even when a sample of size 3 happens to contain only one of the two values. E.g. a simple version with 2 replications:
combined <- rep(0:1,each=10)
combined <- as.factor(combined)
replicate(2, table(sample(combined,3), sample(combined,3)), simplify=FALSE)
#[[1]]
#
# 0 1
# 0 0 1
# 1 1 1
#
#[[2]]
#
# 0 1
# 0 1 1
# 1 0 1
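Scaled up to the asker's case, the same pattern gives a list of 1000 tables; a sketch assuming the 27-element combined vector shown above (13 ones, 14 zeros) and a sample size of 3:
combined <- as.factor(c(rep(1, 13), rep(0, 14)))
tables <- replicate(1000,
                    table(sample(combined, 3), sample(combined, 3)),
                    simplify = FALSE)
length(tables) # 1000
tables[[1]]    # first 2x2 contingency table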
Reference:
Transpose and create categorical values in R
Follow-up to this question. While both model.matrix and data.table work very well with values that are already present, how can we use them to simulate a column?
Meaning, from data in the same data frame,
data <- read.table(header=T, text='
subject weight sex test
1 2 M control
2 3 F cond1
3 2 F cond2
4 4 M control
5 3 F control
6 2 F control
')
If I wanted to simulate SQL's CASE statement with an OR condition in R, how would I go about it? In SQL I would write:
case when (sex = 'F' OR sex = 'M') AND CONTROL IS NOT NULL THEN 1 ELSE 0 END AS F_M_CONTROL
case when (sex = 'F' OR sex = 'M') AND COND1 IS NOT NULL THEN 1 ELSE 0 END AS F_M_COND1
bringing the output to:
subject weight control_F_M control_M control_F cond1_F_M cond1_F cond1_M
1 2 0 1 0 0 0 0
2 3 0 0 1 0 0 0
3 2 0 0 0 0 1 0
4 4 0 1 0 0 0 0
5 3 1 0 0 0 0 0
6 2 1 0 0 0 0 0
Any idea how I can generate the "Control_F_M" and Cond1_F_M columns in R?
Thanks in advance,
Bee
Edit:
To generate the aforementioned output, I'm using data.table and dcast as suggested before.
I could use if-else if I knew all the values in the test column; I apologize for not clarifying this earlier. The challenge, of course, is that the column is dynamic, so I'm hoping to generate that many columns dynamically as an extension of the code below or with a similar approach.
dcast(data, subject+weight~test+sex, fun=length, drop=c(TRUE,FALSE))
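One possible sketch (not a confirmed solution) for getting the pooled F_M columns dynamically with the same dcast approach: cast once by test and sex as above, cast a second time by a helper label that ignores sex, and merge the two results. The helper column name test_fm is purely illustrative:
library(data.table)
dt <- as.data.table(data)
# columns split by sex, e.g. control_M, cond1_F, ...
wide_sex <- dcast(dt, subject + weight ~ test + sex,
                  fun.aggregate = length, value.var = "subject")
# columns pooled over sex, e.g. control_F_M, cond1_F_M, ...
dt[, test_fm := paste0(test, "_F_M")]
wide_fm <- dcast(dt, subject + weight ~ test_fm,
                 fun.aggregate = length, value.var = "subject")
merge(wide_sex, wide_fm, by = c("subject", "weight"))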
I am definitely not an R coder but am trying to stumble my way through this code. I have a dataframe that looks like this--with 200 rows (just 8 shown here).
Ind.ID V1 V2 V3 V4 V5 V6 V7 Captures
1 1 0 0 1 1 0 0 0 2
2 2 0 0 1 0 0 0 1 2
3 3 1 1 0 1 1 0 1 5
4 4 0 0 1 1 0 0 0 2
5 5 1 0 0 0 0 1 0 2
6 6 0 1 1 0 0 0 0 2
7 7 0 0 1 1 1 0 0 3
8 8 1 0 0 0 1 0 0 2
I am trying to sample rows using the Captures column (which is the sum of the row) and output the Ind.ID value. If there is a 0 in the Captures column, I want to subtract 1 from i (i = i - 1) and resample, to ensure that I get the correct number of samples. I also want to then subtract 1 from the sampled row's Captures value (i.e., decrease it by 1 if it was sampled) before resampling. I am trying to get 400 samples (I think the current code will only get me 200, but I can't figure out how to get 400).
I want my output to be:
23
45
197
64
.....
Here's my code:
sess1 <- numeric(200) # create a place for output
for(i in 1:length(dep.pop$Captures)){
  if(dep.pop[i,'Captures'] != 0){ # if the value of Captures is not 0, sample and
    sample(dep.pop$Captures, size=1, replace=TRUE) # want to resample the row if Captures > 1
    # code here to decrease the value of the sampled Captures column by 1. create new vector for resampling?
  }
  else {
    if(dep.pop[i,'Captures'] == 0){ # if the value of Captures = 0
      i <- i-1 # decrease the value of i by 1 to ensure 200 samples
      sample(dep.pop$Captures, size=1, replace=TRUE) # and resample
    }
    # sess1 <- # store the value from a different column (ID column) that represents the sampled row
  }
}
Thanks!
Assuming sum(dep.pop$Captures) is at least 400, the following code may meet your needs; it samples each individual ID up to its number of captures:
sample(rep(dep.pop$Ind.ID, times=dep.pop$Captures), size=400)
If you wish to sample with replacement (so you do not need to worry about the total number of captures) but still want to use the number of captures per individual id as sampling weights, then perhaps
sample(dep.pop$Ind.ID, size=400, replace=TRUE, prob=dep.pop$Captures)
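A quick usage check of both approaches on the 8 rows shown above (those rows only contain 20 captures in total, so the sample size is reduced to 10 here; the data frame is rebuilt purely for illustration):
dep.pop <- data.frame(Ind.ID = 1:8, Captures = c(2, 2, 5, 2, 2, 2, 3, 2))
sample(rep(dep.pop$Ind.ID, times = dep.pop$Captures), size = 10)
sample(dep.pop$Ind.ID, size = 10, replace = TRUE, prob = dep.pop$Captures)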
I am just starting to get beyond the basics in R and have come to a point where I need some help. I want to restructure some data. Here is what a sample dataframe may look like:
ID Sex Res Contact
1 M MA ABR
1 M MA CON
1 M MA WWF
2 F FL WIT
2 F FL CON
3 X GA XYZ
I want the data to look like:
ID SEX Res ABR CON WWF WIT XYZ
1 M MA 1 1 1 0 0
2 F FL 0 1 0 1 0
3 X GA 0 0 0 0 1
What are my options? How would I do this in R?
In short, I am looking to take the values of the Contact column and use them as column names in the restructured data frame. I want to hold a variable set of columns constant (in the example above, I held ID, Sex, and Res constant).
Also, is it possible to control the values in the restructured data? I may want to keep the data binary, or I may want the value to be the count of times each Contact value occurs for each ID.
The reshape package is what you want. Documentation here: http://had.co.nz/reshape/. Not to toot my own horn, but I've also written up some notes on reshape's use here: http://www.ling.upenn.edu/~joseff/rstudy/summer2010_reshape.html
For your purpose, this code should work
library(reshape)
data$value <- 1
cast(data, ID + Sex + Res ~ Contact, fun = "length")
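If strictly 0/1 indicators are wanted rather than counts, an aggregate function that caps at 1 can be swapped in; a sketch using the same cast call:
cast(data, ID + Sex + Res ~ Contact,
     fun.aggregate = function(x) as.numeric(length(x) > 0))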
model.matrix works great (this was asked recently, and gappy had this good answer):
> model.matrix(~ factor(d$Contact) -1)
factor(d$Contact)ABR factor(d$Contact)CON factor(d$Contact)WIT factor(d$Contact)WWF factor(d$Contact)XYZ
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 0 1 0
4 0 0 1 0 0
5 0 1 0 0 0
6 0 0 0 0 1
attr(,"assign")
[1] 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`factor(d$Contact)`
[1] "contr.treatment"