Cumulative sum based on a factor in R

I have the following dataset. I need to accumulate the sum of Variable.1
while the factor is 0, and then write the accumulated sum on the first row
where the factor != 0.
I've tried the loop below, but it didn't work at all.
for(i in dataset$Variable.1) {
ifelse(dataset$Factor == 0,
dataset$teste <- dataset$Variable.1 + i,
dataset$teste <- dataset$Variable.1)
i<- dataset$Variable.1
print(i)
}
Any ideas?
Below is an example of the dataset. I wish to get the "Result" column.
On the real one, I also have a negative factor (-1).
Date Factor Variable.1 Result
1 03/02/2018 0 0.75 0.75
2 04/02/2018 0 0.75 1.50
3 05/02/2018 1 0.96 2.46
4 06/02/2018 1 0.76 0.76
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.70 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0.00 0.00
9 11/02/2018 0 0.00 0.00
10 12/02/2018 0 0.20 0.20
11 13/02/2018 0 0.13 0.33
12 14/02/2018 0 1.64 1.97
13 15/02/2018 0 0.03 2.00
14 16/02/2018 1 0.51 2.51
15 17/02/2018 1 0.00 0.00
16 18/02/2018 0 0.00 0.00
17 19/02/2018 0 0.83 0.83
18 20/02/2018 1 0.42 1.25
19 21/02/2018 1 0.17 0.17
20 22/02/2018 1 0.97 0.97
21 23/02/2018 0 0.92 0.92
22 24/02/2018 0 0.00 0.92
23 25/02/2018 0 0.00 0.92
24 26/02/2018 1 0.19 1.11
25 27/02/2018 1 0.87 0.87
26 28/02/2018 1 0.85 0.85
27 01/03/2018 1 1.95 1.95
28 02/03/2018 1 0.54 0.54
29 03/03/2018 1 0.00 0.00
30 04/03/2018 0 0.00 0.00
31 05/03/2018 0 1.17 1.17
32 06/03/2018 1 0.25 1.42
33 07/03/2018 1 1.45 1.45
Thanks in advance.

If you want to stick with the for-loop, you can try this code:
DF$Result <- NA
prev <- 0
for(i in seq_len(nrow(DF))){
DF$Result[i] <- DF$Variable.1[i] + prev
if(DF$Factor[i] == 1)
prev <- 0
else
prev <- DF$Result[i]
}
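The same reset logic can also be written without an explicit loop. The following is a minimal sketch in base R, not a tested replacement for the loop above. It uses only the first seven rows of the question's example and assumes that any non-zero Factor (1 or -1) closes the running sum; `starts_new_run` and `grp` are illustrative names.

```r
# First seven rows of the example data from the question
DF <- data.frame(
  Factor     = c(0, 0, 1, 1, 0, 1, 1),
  Variable.1 = c(0.75, 0.75, 0.96, 0.76, 1.35, 0.70, 2.02)
)

# A new run starts on the row right after any non-zero Factor
# (assumption: -1 resets the sum just like 1 does)
starts_new_run <- c(FALSE, head(DF$Factor != 0, -1))
grp <- cumsum(starts_new_run)

# Cumulative sum of Variable.1 within each run
DF$Result <- ave(DF$Variable.1, grp, FUN = cumsum)
DF
```

Building the group id with cumsum() keeps the whole computation vectorized, which tends to matter more as the number of rows grows.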

Iteratively, try something like:
a=as.data.frame(cbind(Factor=c(0,0,1,1,0,1,1,
rep(0,3),1),Variable.1=c(0.75,0.75,0.96,0.71,1.35,0.7,
0.75,0.96,0.71,1.35,0.7)))
Result=0
aux=NULL
for (i in 1:nrow(a)){
if (a$Factor[i]==0){
Result=Result+a$Variable.1[i]
aux=c(aux,Result)
} else{
Result=Result+a$Variable.1[i]
aux=c(aux,Result)
Result=0
}
}
a$Results=aux
a
Factor Variable.1 Results
1 0 0.75 0.75
2 0 0.75 1.50
3 1 0.96 2.46
4 1 0.71 0.71
5 0 1.35 1.35
6 1 0.70 2.05
7 1 0.75 0.75
8 0 0.96 0.96
9 0 0.71 1.67
10 0 1.35 3.02
11 1 0.70 3.72

A possibility using tidyverse and data.table:
library(data.table) # for rleid()
library(dplyr)
df %>%
mutate(temp = ifelse(Factor == 1 & lag(Factor) == 1, NA, 1), #Marking the rows after the first 1 in "Factor" as NA
temp = ifelse(!is.na(temp), rleid(temp), NA)) %>% #Run length along non-NA values
group_by(temp) %>% #Grouping by run length
mutate(Result = ifelse(!is.na(temp), cumsum(Variable.1), Variable.1)) %>% #Cumulative sum of desired rows
ungroup() %>%
select(-temp) #Removing the redundant variable
Date Factor Variable.1 Result
<chr> <int> <dbl> <dbl>
1 03/02/2018 0 0.750 0.750
2 04/02/2018 0 0.750 1.50
3 05/02/2018 1 0.960 2.46
4 06/02/2018 1 0.760 0.760
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.700 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0. 0.
9 11/02/2018 0 0. 0.
10 12/02/2018 0 0.200 0.200

Related

Method in R to find difference between rows with varying row spacing

I want to add an extra column in a dataframe which displays the difference between certain rows, where the distance between the rows also depends on values in the table.
I found out that:
mutate(Col_new = Col_1 - lead(Col_1, n = x))
can find the difference for a fixed n, but only an integer can be used as input. How would you find the difference between rows when the distance between them varies?
I am trying to get the output in Col_new, which is the difference between rows i and i+n, where n takes the value in column Count. (The data is rounded, so there might be 0.01 discrepancies in Col_new.)
col_1 count Col_new
1 0.90 1 -0.68
2 1.58 1 -0.31
3 1.89 1 0.05
4 1.84 1 0.27
5 1.57 1 0.27
6 1.30 2 -0.26
7 1.25 2 -0.99
8 1.56 2 -1.58
9 2.24 2 -1.80
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.01
13 5.04 3 0.60
14 4.99 3 0.60
15 4.71 3 0.01
16 4.44 4 -1.84
17 4.39 4 NA
18 4.70 4 NA
19 5.38 4 NA
20 6.28 4 NA
Data:
df <- data.frame(Col_1 = c(0.90, 1.58, 1.89, 1.84, 1.57, 1.30, 1.35,
1.56, 2.24, 3.14, 4.04, 4.72, 5.04, 4.99,
4.71, 4.44, 4.39, 4.70, 5.38, 6.28),
Count = sort(rep(1:4, 5)))
Some code that generates the intended output, but can undoubtedly be made more efficient.
library(dplyr)
df %>%
mutate(col_2 = sapply(1:4, function(s){lead(Col_1, n = s)})) %>%
rowwise() %>%
mutate(Col_new = Col_1 - col_2[Count]) %>%
select(-col_2)
Output:
# A tibble: 20 × 3
# Rowwise:
Col_1 Count Col_new
<dbl> <int> <dbl>
1 0.9 1 -0.68
2 1.58 1 -0.310
3 1.89 1 0.0500
4 1.84 1 0.27
5 1.57 1 0.27
6 1.3 2 -0.26
7 1.35 2 -0.89
8 1.56 2 -1.58
9 2.24 2 -1.8
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.0100
13 5.04 3 0.600
14 4.99 3 0.600
15 4.71 3 0.0100
16 4.44 4 -1.84
17 4.39 4 NA
18 4.7 4 NA
19 5.38 4 NA
20 6.28 4 NA
library(dplyr)
# Inside mutate() the columns can be referenced directly, without df$
df %>% mutate(Col_new = case_when(
count == 1 ~ col_1 - lead(col_1, n = 1),
count == 2 ~ col_1 - lead(col_1, n = 2),
count == 3 ~ col_1 - lead(col_1, n = 3),
count == 4 ~ col_1 - lead(col_1, n = 4),
count == 5 ~ col_1 - lead(col_1, n = 5)
))
col_1 count Col_new
1 0.90 1 -0.68
2 1.58 1 -0.31
3 1.89 1 0.05
4 1.84 1 0.27
5 1.57 1 0.27
6 1.30 2 -0.26
7 1.25 2 -0.99
8 1.56 2 -1.58
9 2.24 2 -1.80
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.01
13 5.04 3 0.60
14 4.99 3 0.60
15 4.71 3 0.01
16 4.44 4 -1.84
17 4.39 4 NA
18 4.70 4 NA
19 5.38 4 NA
20 6.28 4 NA
This gives the desired result, but it does not scale well: every distinct value of count needs its own case_when branch, so with 10 or more different counts another solution is required.
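When there are many distinct counts, one option is to look up the partner row by position instead of enumerating offsets. The following is a minimal sketch in base R, assuming the Col_1 and Count column names from the question's data; `target` is an illustrative name. Indexing a numeric vector past its end returns NA, which produces the trailing NA values.

```r
# Data as given in the question
df <- data.frame(
  Col_1 = c(0.90, 1.58, 1.89, 1.84, 1.57, 1.30, 1.35,
            1.56, 2.24, 3.14, 4.04, 4.72, 5.04, 4.99,
            4.71, 4.44, 4.39, 4.70, 5.38, 6.28),
  Count = rep(1:4, each = 5)
)

# Row number of the partner row: current row plus its own Count
target <- seq_len(nrow(df)) + df$Count

# Positions beyond the last row yield NA automatically
df$Col_new <- df$Col_1 - df$Col_1[target]
df
```

This works for any number of distinct counts because the offset is read per row rather than hard-coded.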

Creating new variable in wide data format, R

I have transformed my data into a wide format using the mlogit.data function in order to be able to perform an mlogit multinomial logit regression in R. The data has three different "choices" and looks like this (in its wide format):
Observation Choice Variable A Variable B Variable C
1 1 1.27 0.2 0.81
1 0 1.27 0.2 0.81
1 -1 1.27 0.2 0.81
2 1 0.20 0.45 0.70
2 0 0.20 0.45 0.70
2 -1 0.20 0.45 0.70
However, as the variables A, B and C are linked to the different outcomes I would now like to create a new variable that looks like this:
Observation Choice Variable A Variable B Variable C Variable D
1 1 1.27 0.2 0.81 1.27
1 0 1.27 0.2 0.81 0.2
1 -1 1.27 0.2 0.81 0.81
2 1 0.20 0.45 0.70 0.20
2 0 0.20 0.45 0.70 0.45
2 -1 0.20 0.45 0.70 0.70
I have tried the following code:
Variable D <- ifelse(Choice == "1", Variable A, ifelse(Choice == "-1", Variable B, Variable C))
However, the ifelse function only considers one choice from each observation, creating this:
Observation Choice Variable A Variable B Variable C Variable D
1 1 1.27 0.2 0.81 1.27
1 0 1.27 0.2 0.81 -
1 -1 1.27 0.2 0.81 -
2 1 0.20 0.45 0.70 -
2 0 0.20 0.45 0.70 0.2
2 -1 0.20 0.45 0.70 -
Anyone know how to solve this?
Thanks!
You can create a table mapping choices to variables and then use match:
choice_map <-
data.frame(choice = c(1, 0, -1), var = grep('Variable[A-C]', names(df)))
# choice var
# 1 1 3
# 2 0 4
# 3 -1 5
df$VariableD <-
df[cbind(seq_len(nrow(df)), with(choice_map, var[match(df$Choice, choice)]))]
df
# Observation Choice VariableA VariableB VariableC VariableD
# 1 1 1 1.27 0.20 0.81 1.27
# 2 1 0 1.27 0.20 0.81 0.20
# 3 1 -1 1.27 0.20 0.81 0.81
# 4 2 1 0.20 0.45 0.70 0.20
# 5 2 0 0.20 0.45 0.70 0.45
# 6 2 -1 0.20 0.45 0.70 0.70
Data used (removed spaces in colnames)
df <- data.table::fread('
Observation Choice VariableA VariableB VariableC
1 1 1.27 0.2 0.81
1 0 1.27 0.2 0.81
1 -1 1.27 0.2 0.81
2 1 0.20 0.45 0.70
2 0 0.20 0.45 0.70
2 -1 0.20 0.45 0.70
', data.table = FALSE)
df$`Variable D`= sapply(1:nrow(df),function(x){
df[x,4-df$Choice[x]]
})
> df
Observation Choice Variable A Variable B Variable C Variable D
1 1 1 1.27 0.20 0.81 1.27
2 1 0 1.27 0.20 0.81 0.20
3 1 -1 1.27 0.20 0.81 0.81
4 2 1 0.20 0.45 0.70 0.20
5 2 0 0.20 0.45 0.70 0.45
6 2 -1 0.20 0.45 0.70 0.70
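A nested ifelse can also produce the column, provided the conditions follow the mapping shown in the desired output (1 to A, 0 to B, -1 to C). This is a minimal sketch, assuming column names without spaces and a numeric Choice column; the data frame is rebuilt here only to keep the example self-contained.

```r
# Example data from the question, with spaces removed from the column names
df <- data.frame(
  Observation = rep(1:2, each = 3),
  Choice      = rep(c(1, 0, -1), times = 2),
  VariableA   = rep(c(1.27, 0.20), each = 3),
  VariableB   = rep(c(0.20, 0.45), each = 3),
  VariableC   = rep(c(0.81, 0.70), each = 3)
)

# ifelse() is vectorized, so every row is evaluated:
# Choice 1 -> VariableA, Choice 0 -> VariableB, otherwise (-1) -> VariableC
df$VariableD <- with(df, ifelse(Choice == 1, VariableA,
                         ifelse(Choice == 0, VariableB, VariableC)))
df
```

With only three choices the nested form stays readable; for more choices a lookup table like the match approach is likely easier to maintain.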

How to merge three tables by inserting to each other in R?

I have a data frame as follows. I want to see the evolution from RIK_T1 to RIK_T2 through the frequency, row % and column % of each combination. How can I show them all at once?
ID<-c('1','2','3','4','5','6','7','8','9','10')
RIK_T1<-c('20','15','20','20','97','20','20','20','15','15')
RIK_T2<-c('20','15','15','20','97','97','20','20','20','20')
df<-data.frame(ID,RIK_T1,RIK_T2)
df
TAB=table(df$RIK_T1,df$RIK_T2)
t1<-addmargins(TAB) #TABLE-01
TAB_row=prop.table(TAB,1)#row
t2<-round(addmargins(TAB_row),digits=2)#TABLE-01-1
TAB_col=prop.table(TAB,2)#column
t3<-round(addmargins(TAB_col),digits=2)#TABLE-01-2
I get three tables as follows: counts, row % and column %
15 20 97 Sum
15 1 2 0 3
20 1 4 1 6
97 0 0 1 1
Sum 2 6 2 10
15 20 97 Sum
15 0.33 0.67 0.00 1.00
20 0.17 0.67 0.17 1.00
97 0.00 0.00 1.00 1.00
Sum 0.50 1.33 1.17 3.00
15 20 97 Sum
15 0.50 0.33 0.00 0.83
20 0.50 0.67 0.50 1.67
97 0.00 0.00 0.50 0.50
Sum 1.00 1.00 1.00 3.00
Is it possible to merge them into one table as follows?
15 20 97 Sum
R%/C% R%/C% R%/C% R%/C%
15 1 2 0 3
0.33/0.50 0.67/0.33 0.00/0.00 1.00/0.83
20 1 4 1 6
0.17/0.50 0.67/0.67 0.17/0.50 1.00/1.67
97 0 0 1 1
0.00/0.00 0.00/0.00 1.00/0.50 1.00/0.50
Sum 2 6 2 10
0.50/1.00 1.33/1.00 1.17/1.00 3.00/3.00
Thanks in advance.
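One possible way to combine the three tables is to paste them cell by cell into a character matrix. This is only a sketch under the assumption that a single text cell of the form "count (row%/col%)" is an acceptable layout; `pct` and `merged` are illustrative names, and the two-line-per-row layout shown above would need extra formatting.

```r
ID     <- c('1','2','3','4','5','6','7','8','9','10')
RIK_T1 <- c('20','15','20','20','97','20','20','20','15','15')
RIK_T2 <- c('20','15','15','20','97','97','20','20','20','20')
df <- data.frame(ID, RIK_T1, RIK_T2)

TAB <- table(df$RIK_T1, df$RIK_T2)
t1 <- addmargins(TAB)                              # counts
t2 <- round(addmargins(prop.table(TAB, 1)), 2)     # row proportions
t3 <- round(addmargins(prop.table(TAB, 2)), 2)     # column proportions

# Combine the two proportions as "row/col", keeping two decimals
pct <- paste0(sprintf("%.2f", as.vector(t2)), "/",
              sprintf("%.2f", as.vector(t3)))

# One character matrix with the same row and column labels as t1
merged <- matrix(paste0(as.vector(t1), " (", pct, ")"),
                 nrow = nrow(t1), dimnames = dimnames(t1))
print(merged, quote = FALSE)
```

All three tables share the same dimensions and ordering after addmargins(), which is what makes the element-wise paste line up.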

How to count these transitions - in R

Given a table of values, where A = state of system, B = length of state, and C = cumulative length of states:
A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46
How would I use R to calculate the number of transitions (where A gives the state) per constant interval length - where the intervals are consecutive and could be any arbitrary number (I chose a value of 2 in my image example)? For example, using the table values or the image included we count 2 transitions from 0-2, 3 transitions from greater than 2-4, 3 transitions from >4-6, etc.
This is straightforward in R. All you need is column C and ?cut. Consider:
d <- read.table(text="A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46", header=TRUE)
fi <- cut(d$C, breaks=seq(from=0, to=14, by=2))
table(fi)
# fi
# (0,2] (2,4] (4,6] (6,8] (8,10] (10,12] (12,14]
# 2 3 3 5 3 3 1
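Since the interval length is meant to be arbitrary, the same idea can be wrapped in a small helper. This is a minimal sketch; `count_per_interval` and its `width` argument are hypothetical names, and it assumes the intervals start at 0 and that `width` divides cleanly enough for seq() to build the breaks.

```r
# Cumulative lengths (column C) from the question
C <- c(1.16, 1.67, 2.84, 3.10, 3.69, 4.08, 4.85, 5.75, 6.53, 6.79,
       6.91, 7.42, 7.69, 8.20, 8.59, 9.10, 10.26, 11.36, 11.95, 12.46)

# Count how many values fall into consecutive intervals of a given width
count_per_interval <- function(x, width) {
  # Extend the last break to the first multiple of width at or above max(x)
  upper  <- ceiling(max(x) / width) * width
  breaks <- seq(from = 0, to = upper, by = width)
  table(cut(x, breaks = breaks))
}

count_per_interval(C, 2)
count_per_interval(C, 5)
```

Deriving the upper break from the data avoids hard-coding a value such as 14 when the series gets longer.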

get the paired sample in R language [closed]

X<-scan()
1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1
1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
Z<-scan()
-0.05 0.11 -0.01 1.08 0.68 -1.79 -0.12 -0.06 0.17 -1.35 1.55 0.60
-1.42 -1.21 0.97 0.23 0.20 0.89 0.28 0.56 1.02 -0.32 0.20 -1.35
0.53 -0.52 -0.07 -1.07 0.10 0.53 0.97 0.32 -0.07 0.98 -1.23 0.72
-0.09 0.31 1.25 0.60 1.16 -0.98 1.63 0.72 0.24 -0.02 -1.13 0.56
0.78 1.75 -0.01 -0.44 0.47 -0.21 2.06 2.19 -0.94 -0.36 1.35 -1.35
1.50 0.13 -0.20 -0.57 -0.14 -1.34 -1.17 2.04 0.21 1.47 -1.20 -0.60
0.15 -0.64 -0.71 0.24 -0.86 -1.39 -0.63 -1.25 0.40 -0.76 0.73 -0.15
0.09 0.35 -0.19 0.29 0.56 0.82 -0.28 0.63 1.35 -0.04 1.99 1.12
-1.91 0.26 -1.18 -0.10
In the vector X, 0 is the control group and 1 is the case group.
I want to match the cases and controls based on the Z vector. That is, I want to match the elements of X based on Z and get the samples from the matched data.
What should I do?
The other answers seem to think that you're looking for subsetting, but I'm assuming (based on your use of the language "case" and "controls") that you're talking about matching in a statistical sense. If so, it sounds like you want something like the functionality provided by the Matching package, like the following:
library(Matching)
out <- Match(Tr=X,X=Z)
out$mdata # list of `Y` outcome vector (if applicable),
# `Tr` treatment vector, and
# `X` matrix of covariates for the matched sample
If you also have an outcome measure, you can specify that in Match and it will give you treatment effect estimates.
There are also other packages to do matching, like MatchIt, cem, and nonrandom (the last of which has apparently been removed from CRAN), depending on what particular matching procedure you're going for.
I suppose you are looking for
Z[as.logical(X)] # case
and
Z[!X] # control
I suppose your question is about subsetting. Here are some examples:
# Data
X<-c(1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,0,0,1,1,0,0,1,1,1,1,1,1,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1)
Z<-c(-0.05,0.11,-0.01,1.08,0.68,-1.79,-0.12,-0.06,0.17,-1.35,1.55,0.60,-1.42,-1.21,0.97,0.23,0.20,0.89,0.28,0.56,1.02,-0.32,0.20,-1.35,0.53,-0.52,-0.07,-1.07,0.10,0.53,0.97,0.32,-0.07,0.98,-1.23,0.72,-0.09,0.31,1.25,0.60,1.16,-0.98,1.63,0.72,0.24,-0.02,-1.13,0.56,0.78,1.75,-0.01,-0.44,0.47,-0.21,2.06,2.19,-0.94,-0.36,1.35,-1.35,1.50,0.13,-0.20,-0.57,-0.14,-1.34,-1.17,2.04,0.21,1.47,-1.20,-0.60,0.15,-0.64,-0.71,0.24,-0.86,-1.39,-0.63,-1.25,0.40,-0.76,0.73,-0.15,0.09,0.35,-0.19,0.29,0.56,0.82,-0.28,0.63,1.35,-0.04,1.99,1.12,-1.91,0.26,-1.18,-0.10)
myMatrix <- cbind(X,Z)
# Subsetting
myMatrixControls <- myMatrix[ myMatrix[,1]==0,]
myMatrixCases <- myMatrix[ myMatrix[,1]==1,]
# Example: get sum per group
sumZ_Controls <- sum(myMatrix[ myMatrix[,1]==0, 2])
sumZ_Cases <- sum(myMatrix[ myMatrix[,1]==1, 2])

Resources