I need to implement some logic in my R script for the sample data frame df shown below.
ID A B
1 2.471264262 0
2 2.53024575 0
3 2.559114933 1
4 2.502350493 1
5 2.529496526 0
6 2.480199137 0
7 2.521066835 0
8 2.481272625 0
9 2.505953959 0
10 2.481272625 0
11 2.499424723 0
12 2.492515087 0
13 2.502385996 0
14 2.487579633 0
15 2.479438021 -1
16 2.044195946 1
17 2.054051421 0
18 2.108811073 1
19 2.249767599 0
20 2.627294516 -1
21 2.624337386 0
22 2.157110862 0
23 2.142325212 -1
24 2.124582433 -1
25 2.114725333 0
26 2.113739623 0
27 1.92054047 0
28 2.00037188 0
29 2.183995509 0
30 2.629451192 0
31 2.772756046 0
32 2.603141474 0
33 2.502385996 0
Column B marks the data points where the state changes. I need to add or subtract a "correction factor" to the values in column A for the next 15 data points, starting from the point where B == 1 or B == -1.
The formula for the corrected value is as follows:
If B == 1, the corrected value is [A - 0.19*(15/15)*A]. The fraction (15/15) keeps decrementing over the next 15 values: (14/15), (13/15), ..., (0/15).
Similarly, if B == -1, the corrected value is [A + 0.53*(15/15)*A]. Again the fraction (15/15) keeps decrementing over the next 15 values: (14/15), (13/15), ..., (0/15).
One more condition: once a state change has been detected in B, any further state change within the next 15 values should be ignored. For example, the first state change is detected at B3, so the state changes at B4, B15 and B16 are not considered.
For better understanding, I have attached my expected output along with the formulas as executed manually in Excel.
Expected Output
A B A With Correction Factor Formula Executed
2.471264262 0 2.471264262 Same Value of A retained since no transition
2.53024575 0 2.53024575 Same Value of A retained since no transition
2.559114933 1 2.072883096 A4-0.19* (15/15)*A4
2.502350493 1 2.058600339 A5-0.19* (14/15)*A5
2.529496526 0 2.112972765 A6-0.19* (13/15)*A6
2.480199137 0 2.103208868 A7-0.19* (12/15)*A7
2.521066835 0 2.169798189 A8-0.19* (11/15)*A8
2.481272625 0 2.166978093 A9-0.19* (10/15)*A9
2.505953959 0 2.220275208 A10-0.19* (9/15)*A10
2.481272625 0 2.229836999 A11-0.19* (8/15)*A11
2.499424723 0 2.277809064 A12-0.19* (7/15)*A12
2.492515087 0 2.30308394 A13-0.19* (6/15)*A13
2.502385996 0 2.34390155 A14-0.19* (5/15)*A14
2.487579633 0 2.361542265 A15-0.19* (4/15)*A15
2.479438021 -1 2.385219376 A16-0.19* (3/15)*A16
2.044195946 1 1.992409649 A17-0.19* (2/15)*A17
2.054051421 0 2.028033436 A18-0.19* (1/15)*A18
2.108811073 1 2.108811073 A19-0.19* (0/15)*A19
2.249767599 0 2.249767599 Same Value of A retained since no transition
2.627294516 -1 4.019760609 A21+0.53*(15/15)*A21
2.624337386 0 3.922509613 A22+0.53*(14/15)*A22
2.157110862 0 3.147943785 A23+0.53*(13/15)*A23
2.142325212 -1 3.050671102 A24+0.53*(12/15)*A24
2.124582433 -1 2.950336805 A25+0.53*(11/15)*A25
2.114725333 0 2.861928284 A26+0.53*(10/15)*A26
2.113739623 0 2.785908823 A27+0.53*(9/15)*A27
1.92054047 0 2.463413243 A28+0.53*(8/15)*A28
2.00037188 0 2.495130525 A29+0.53*(7/15)*A29
2.183995509 0 2.647002557 A30+0.53*(6/15)*A30
2.629451192 0 3.093987569 A31+0.53*(5/15)*A31
2.772756046 0 3.164638901 A32+0.53*(4/15)*A32
2.603141474 0 2.87907447 A33+0.53*(3/15)*A33
2.502385996 0 2.679221273 A34+0.53*(2/15)*A34
Edit
The code suggested below works exactly as required for the data frame above, i.e. the one with 33 rows, but with the following data frame of 32 rows the code doesn't work. Any suggestions?
ID A B
1 2.471264262 0
2 2.53024575 0
3 2.559114933 1
4 2.502350493 1
5 2.529496526 0
6 2.480199137 0
7 2.521066835 0
8 2.481272625 0
9 2.505953959 0
10 2.481272625 0
11 2.499424723 0
12 2.492515087 0
13 2.502385996 0
14 2.487579633 0
15 2.479438021 -1
16 2.044195946 1
17 2.054051421 0
18 2.108811073 1
19 2.249767599 0
20 2.627294516 -1
21 2.624337386 0
22 2.157110862 0
23 2.142325212 -1
24 2.124582433 -1
25 2.114725333 0
26 2.113739623 0
27 1.92054047 0
28 2.00037188 0
29 2.183995509 0
30 2.629451192 0
31 2.772756046 0
32 2.603141474 0
I was not able to post another question referencing this post, so I have updated this one instead.
Thanks.
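This should work. Counting down from 15 is a little tricky, so we use a for loop to track the correct counter and state; the actual formula is then relatively simple. The state and counter columns are created before the loop: assigning df$state[i] to a column that does not exist yet builds a vector of length i, which a data frame only accepts if the number of rows is a multiple of i (33 is a multiple of 3, 32 is not).
# create the helper columns up front so the code works for any number of rows
df$state <- NA
df$counter <- NA
counter <- 0
current_state <- NA
for (i in seq_along(df$B)) {
  if (counter == 0) {
    # idle: wait for the next state change
    if (df$B[i] == 0) next
    counter <- 15
    current_state <- df$B[i]
    df$state[i] <- df$B[i]
    df$counter[i] <- counter
  } else {
    # inside a 15-step window: ignore B and keep counting down
    counter <- counter - 1
    df$state[i] <- current_state
    df$counter[i] <- counter
  }
}
# state is NA outside a window, so A_corr is NA there and is filled with A below
df$A_corr <- ifelse(df$state == 1,
                    df$A - 0.19 * (df$counter / 15) * df$A,
                    df$A + 0.53 * (df$counter / 15) * df$A)
df$A_corr <- ifelse(is.na(df$A_corr), df$A, df$A_corr)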
Gives:
> df
ID A B state counter A_corr
1 1 2.471264 0 NA NA 2.471264
2 2 2.530246 0 NA NA 2.530246
3 3 2.559115 1 1 15 2.072883
4 4 2.502350 1 1 14 2.058600
5 5 2.529497 0 1 13 2.112973
6 6 2.480199 0 1 12 2.103209
7 7 2.521067 0 1 11 2.169798
8 8 2.481273 0 1 10 2.166978
9 9 2.505954 0 1 9 2.220275
10 10 2.481273 0 1 8 2.229837
11 11 2.499425 0 1 7 2.277809
12 12 2.492515 0 1 6 2.303084
13 13 2.502386 0 1 5 2.343902
14 14 2.487580 0 1 4 2.361542
15 15 2.479438 -1 1 3 2.385219
16 16 2.044196 1 1 2 1.992410
17 17 2.054051 0 1 1 2.028033
18 18 2.108811 1 1 0 2.108811
19 19 2.249768 0 NA NA 2.249768
20 20 2.627295 -1 -1 15 4.019761
21 21 2.624337 0 -1 14 3.922510
22 22 2.157111 0 -1 13 3.147944
23 23 2.142325 -1 -1 12 3.050671
24 24 2.124582 -1 -1 11 2.950337
25 25 2.114725 0 -1 10 2.861928
26 26 2.113740 0 -1 9 2.785909
27 27 1.920540 0 -1 8 2.463413
28 28 2.000372 0 -1 7 2.495131
29 29 2.183996 0 -1 6 2.647003
30 30 2.629451 0 -1 5 3.093988
31 31 2.772756 0 -1 4 3.164639
32 32 2.603141 0 -1 3 2.879074
33 33 2.502386 0 -1 2 2.679221
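If the same correction is needed for several data frames, the loop can be wrapped in a function that works on plain vectors. The following is only a sketch: the name apply_correction and its arguments window, down and up are made up here for illustration, with defaults taken from the question. Because it fills pre-allocated vectors, it does not depend on the number of rows.

```r
# Hypothetical helper (not from the original answer): returns the corrected A.
# window, down and up are illustrative argument names; defaults follow the question.
apply_correction <- function(A, B, window = 15, down = 0.19, up = 0.53) {
  n <- length(A)
  state   <- rep(NA_real_, n)  # pre-allocated, so any length works
  counter <- rep(NA_real_, n)
  active <- FALSE              # TRUE while a countdown is running
  remaining <- 0               # steps left in the running countdown
  current <- NA_real_          # B value that started the countdown
  for (i in seq_len(n)) {
    if (!active) {
      if (B[i] == 0) next      # idle and no state change: nothing to do
      active <- TRUE
      remaining <- window
      current <- B[i]
    } else {
      remaining <- remaining - 1  # changes in B are ignored while active
    }
    state[i] <- current
    counter[i] <- remaining
    if (remaining == 0) active <- FALSE
  }
  rate <- ifelse(state == 1, -down, up)
  ifelse(is.na(state), A, A + rate * (counter / window) * A)
}

# Small check with five rows and a window of 2
A <- c(10, 10, 10, 10, 10)
B <- c(0, 1, -1, 0, -1)
apply_correction(A, B, window = 2)
# expected: 10, 8.1, 9.05, 10, 15.3
```

The second state change (row 3) falls inside the first window and is ignored, while the one in row 5 starts a new window, which mirrors the rule described in the question.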
I'm trying to perform calculations on different elements in a matrix in R. My Matrix is 18x18 and I would like to get e.g. the mean of each 6x6 array (which makes 9 arrays in total). My desired arrays would be:
A1 <- df[1:6,1:6]
A2 <- df[1:6,7:12]
A3 <- df[1:6,13:18]
B1 <- df[7:12,1:6]
B2 <- df[7:12,7:12]
B3 <- df[7:12,13:18]
C1 <- df[13:18,1:6]
C2 <- df[13:18,7:12]
C3 <- df[13:18,13:18]
The matrix looks like this:
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
5 14 17 9 10 8 4 10 12 18 9 13 14 NA NA 19 15 10 10
10 30 32 23 27 17 28 25 12 28 29 28 26 19 25 34 24 11 17
15 16 16 16 9 17 27 17 16 30 13 18 13 15 13 19 8 7 9
20 15 12 18 18 18 6 4 6 9 11 10 10 13 11 8 10 15 15
25 7 13 21 7 3 5 2 5 5 4 3 2 3 5 2 1 5 6
30 5 9 1 7 7 4 4 12 8 9 2 0 5 2 1 0 2 6
35 3 0 2 0 0 4 4 7 4 4 5 2 0 0 1 0 0 0
40 0 4 0 0 0 1 3 9 10 10 1 0 0 0 1 0 1 0
45 0 0 0 0 0 3 10 9 17 9 1 0 0 0 0 0 0 0
50 0 0 2 0 0 0 2 8 20 0 0 0 0 0 1 0 0 0
55 0 0 0 0 0 0 7 3 21 0 0 0 0 0 0 0 0 0
60 0 0 0 0 3 4 10 2 2 0 0 1 0 0 0 0 0 0
65 0 0 0 0 0 4 8 4 8 11 0 0 0 0 0 0 0 0
70 0 0 0 0 0 6 2 5 14 0 0 0 0 0 0 0 0 0
75 0 0 0 0 0 4 0 5 9 0 0 0 0 0 0 0 0 0
80 0 0 0 0 0 4 4 0 4 2 0 0 0 0 0 0 0 0
85 0 0 0 0 0 0 0 4 1 1 0 0 0 0 0 0 0 0
90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Is there a clean way to solve this issue with a loop?
Thanks a lot in advance,
Paul
Given your matrix, e.g.
x <- matrix(1:(18*18), ncol=18)
try, for example, the following for sub-matrices of size 6:
step <- 6
nx <- nrow(x)
if ((nx %% step) != 0) stop("nx %% step should be 0")
indI <- seq(1, nx, by = step)
nbStep <- length(indI)
for (rowBlock in 1:nbStep) {
  for (colBlock in 1:nbStep) {
    # letter = row block, digit = column block, as in the question
    name <- paste0(LETTERS[rowBlock], colBlock)
    theRows <- indI[rowBlock]:(indI[rowBlock] + step - 1)
    theCols <- indI[colBlock]:(indI[colBlock] + step - 1)
    # sum() is used as the example; use mean(..., na.rm = TRUE) for block means
    assign(name, sum(x[theRows, theCols]))
  }
}
You'll get your results in A1, A2, A3, ...
This is the idea. Adapt the code for non-square matrices, different sizes of sub-matrices, and so on.
Here's one way:
# generate fake data
set.seed(47)
n = 18
m = matrix(rpois(n * n, lambda = 5), nrow = n)
# generate starting indices
n_array = 6
start_i = seq(1, n, by = n_array)
arr_starts = expand.grid(row = start_i, col = start_i)
# calculate sums
with(arr_starts, mapply(function(x, y) sum(m[(x + 1:n_array) - 1, (y + 1:n_array) - 1]), row, col))
# [1] 158 188 176 201 188 201 197 206 204
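Since the question asks for means and the posted matrix contains NA values, here is a loop-free sketch that labels every cell with its block and lets tapply compute one mean per block. The example matrix and the names block, row_block, col_block and block_means are assumptions made for illustration; the A/B/C row labels follow the naming in the question.

```r
# Example 18 x 18 matrix with one missing value (illustrative data)
m <- matrix(1:(18 * 18), nrow = 18)
m[1, 13] <- NA

block <- 6  # side length of each sub-matrix

# Block index of every cell: 0, 1, 2 along the rows and along the columns
row_block <- as.vector((row(m) - 1) %/% block)
col_block <- as.vector((col(m) - 1) %/% block)

# One mean per (row block, column block) pair, ignoring NA values
block_means <- tapply(as.vector(m), list(row_block, col_block),
                      mean, na.rm = TRUE)

# Label the result so that block_means["A", "1"] corresponds to A1, and so on
dimnames(block_means) <- list(LETTERS[seq_len(nrow(block_means))],
                              as.character(seq_len(ncol(block_means))))
block_means
```

The result is a 3 x 3 matrix rather than nine separate variables, which tends to be easier to pass on to further calculations.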
I want to create a new column that is 1 if the value of a particular column is an outlier. Otherwise, the value should be 0.
An example would be the following:
outlier <- c(rnorm(10,0,5),40,-60,rnorm(10,0,5))
V1
1 -6.273411
2 -6.576979
3 9.256693
4 -2.448468
5 -7.386433
6 -8.922403
7 -1.339524
8 -2.136594
9 -2.271990
10 -6.066499
11 40.000000
12 -60.000000
13 6.697281
14 -3.212984
15 6.950176
16 -7.054237
17 11.820208
18 -1.836457
19 -1.341675
20 -3.271044
21 -10.260103
22 8.239565
So observations 11 and 12 should clearly be outliers:
boxplot.stats(outlier)$out
[1] 40 -60
What I want to achieve is the following:
V1 V2
1 -6.273411 0
2 -6.576979 0
3 9.256693 0
4 -2.448468 0
5 -7.386433 0
6 -8.922403 0
7 -1.339524 0
8 -2.136594 0
9 -2.271990 0
10 -6.066499 0
11 40.000000 1
12 -60.000000 1
13 6.697281 0
14 -3.212984 0
15 6.950176 0
16 -7.054237 0
17 11.820208 0
18 -1.836457 0
19 -1.341675 0
20 -3.271044 0
21 -10.260103 0
22 8.239565 0
Is there any elegant way to do this?
Thanks!
Keep in mind there is no universal, agreed definition of what an "outlier" is. By default, boxplot treats a value as an outlier when it lies more than 1.5 times the inter-quartile range below the lower quartile or above the upper quartile (strictly, boxplot.stats uses the hinges, which can differ slightly from the quantile() quartiles). You can write your own function, which gives you complete control over the definition. For example:
is_outlier <- function(x) {
  iqr <- IQR(x)
  q <- quantile(x, c(.25, .75))
  # TRUE for values beyond 1.5 * IQR from either quartile
  x < q[1]-1.5*iqr | x > q[2]+1.5*iqr
}
You can use it with your data like this:
is_outlier(outlier)
which returns TRUE/FALSE values. You can convert these to 1/0 with as.numeric(is_outlier(outlier)) or is_outlier(outlier)+0 if that's really needed.
We can use %in% to get a logical vector and coerce it to binary with as.integer or a unary +:
+(outlier %in% boxplot.stats(outlier)$out)
#[1] 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
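To get the two-column layout shown in the question, the flag can be stored directly as a new column. The sketch below uses the 22 values printed in the question; the data frame name dat is an assumption for illustration.

```r
# Values copied from the question
V1 <- c(-6.273411, -6.576979, 9.256693, -2.448468, -7.386433, -8.922403,
        -1.339524, -2.136594, -2.271990, -6.066499, 40, -60,
        6.697281, -3.212984, 6.950176, -7.054237, 11.820208, -1.836457,
        -1.341675, -3.271044, -10.260103, 8.239565)
dat <- data.frame(V1 = V1)

# 1 if the value is among the boxplot outliers, 0 otherwise
dat$V2 <- as.integer(dat$V1 %in% boxplot.stats(dat$V1)$out)

which(dat$V2 == 1)
# [1] 11 12
```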
I need to implement a counter dec_cnt that decrements by 1 based on certain conditions.
Below is my dataframe df.
ID A
1 0
2 0
3 0
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 -1
17 1
18 0
19 1
20 0
21 -1
22 0
23 0
24 -1
25 0
26 0
27 0
28 0
29 0
30 0
31 0
32 0
33 0
34 0
The conditions are
a. The counter should start at the data point where A == 1 or -1 and decrement over the next 16 values. For example, A == 1 at ID 4, so from ID == 4 to ID == 19 the counter should run from 15 down to 0. Note that any A == 1/-1 within this range should be ignored.
b. I also need a retain_A column which retains the triggering value of A throughout the countdown.
Below is my expected output.
ID A retain_A dec_cnt
1 0 NA NA
2 0 NA NA
3 0 NA NA
4 1 1 15
5 1 1 14
6 0 1 13
7 0 1 12
8 0 1 11
9 0 1 10
10 0 1 9
11 0 1 8
12 0 1 7
13 0 1 6
14 0 1 5
15 0 1 4
16 -1 1 3
17 1 1 2
18 0 1 1
19 1 1 0
20 0 NA NA
21 -1 -1 15
22 0 -1 14
23 0 -1 13
24 -1 -1 12
25 0 -1 11
26 0 -1 10
27 0 -1 9
28 0 -1 8
29 0 -1 7
30 0 -1 6
31 0 -1 5
32 0 -1 4
33 0 -1 3
34 0 -1 2
A similar question was posted a couple of days ago, where the solution uses a for loop, but that loop fails to execute if there are more than 35 data points. I also want to avoid a for loop because its execution time will be long when dealing with a huge amount of data.
The data frame is taken from the question posted here
Below is the script that I tried, based on the post referenced above.
dec_cnt <- 0
Retain_A <- NA
for (i in seq_along(df$A)) {
  if (dec_cnt == 0) {
    if (df$A[i] == 0) next
    dec_cnt <- 15
    Retain_A <- df$A[i]
    df$Retain_A[i] <- df$A[i]
    df$dec_cnt[i] <- dec_cnt
  } else {
    dec_cnt <- dec_cnt - 1
    df$Retain_A[i] <- Retain_A
    df$dec_cnt[i] <- dec_cnt
  }
}
I don't think it's realistic to avoid any kind of loop, for or otherwise. Perhaps a more realistic goal would be to avoid loops that iterate over every single value, regardless of whether it is relevant.
Starting from your 2-column input, let's pre-set the empty columns:
dat$retain_A <- NA
dat$dec_cnt <- NA
Here's where we can gain some efficiency: instead of repeatedly making comparisons inside the loop, we can find every position that matches -1/1 up front:
ind <- which(dat$A %in% c(-1,1))
last_match <- 0
ind
# [1] 4 5 16 17 19 21 24
The trick is to keep track of the last_match and discard any indices between it and the next 15 entries.
ind <- ind[ind > last_match]
while (length(ind) > 0) {
  i <- seq(ind[1], min(ind[1] + 15, nrow(dat)))
  dat$dec_cnt[i] <- head(15:0, n = length(i))
  dat$retain_A[i] <- dat$A[ ind[1] ]
  last_match <- ind[1] + 15
  ind <- ind[ind > last_match]
}
dat
# ID A retain_A dec_cnt
# 1 1 0 NA NA
# 2 2 0 NA NA
# 3 3 0 NA NA
# 4 4 1 1 15
# 5 5 1 1 14
# 6 6 0 1 13
# 7 7 0 1 12
# 8 8 0 1 11
# 9 9 0 1 10
# 10 10 0 1 9
# 11 11 0 1 8
# 12 12 0 1 7
# 13 13 0 1 6
# 14 14 0 1 5
# 15 15 0 1 4
# 16 16 -1 1 3
# 17 17 1 1 2
# 18 18 0 1 1
# 19 19 1 1 0
# 20 20 0 NA NA
# 21 21 -1 -1 15
# 22 22 0 -1 14
# 23 23 0 -1 13
# 24 24 -1 -1 12
# 25 25 0 -1 11
# 26 26 0 -1 10
# 27 27 0 -1 9
# 28 28 0 -1 8
# 29 29 0 -1 7
# 30 30 0 -1 6
# 31 31 0 -1 5
# 32 32 0 -1 4
# 33 33 0 -1 3
# 34 34 0 -1 2
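You'll find that your initial loop iterates once per row, whereas this solution iterates at most once per non-zero value (only twice for this data, since matches that fall inside an active countdown are discarded).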
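For reuse, the same steps can be collected in a function with the window length as a parameter. This is a sketch of the approach above rather than a different algorithm; the function name add_countdown and the argument window are made up for illustration.

```r
# Hypothetical wrapper around the approach above; add_countdown and window
# are illustrative names, not part of the original answer.
add_countdown <- function(dat, window = 15) {
  n <- nrow(dat)
  dat$retain_A <- NA
  dat$dec_cnt <- NA
  ind <- which(dat$A %in% c(-1, 1))     # all candidate triggers
  while (length(ind) > 0) {
    start <- ind[1]
    i <- seq(start, min(start + window, n))
    dat$dec_cnt[i] <- head(window:0, n = length(i))
    dat$retain_A[i] <- dat$A[start]
    ind <- ind[ind > start + window]    # drop triggers inside this countdown
  }
  dat
}

# The 34-row input from the question
A <- c(0, 0, 0, 1, 1, rep(0, 10), -1, 1, 0, 1, 0, -1, 0, 0, -1, rep(0, 10))
res <- add_countdown(data.frame(ID = seq_along(A), A = A))
tail(res, 3)
```

The last countdown is cut short by the end of the data, so the final rows stop at 2 instead of reaching 0, as in the expected output.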
I have a data.frame with a factor identifying events:
year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need a counter-type variable identifying a given window around each event. The result should look like this (for a window of, for example, 3 periods around the event):
year event window
1 0
2 0
3 0
4 0
5 0
6 0 -3
7 0 -2
8 0 -1
9 1 0
10 0 1
11 0 2
12 0 3
13 0
14 0 -3
15 0 -2
16 0 -1
17 1 0
18 0 1
19 0 2
20 0 3
Any guidance on how to implement this within a function would be appreciated. You can copy the data.frame by pasting the block above in place of "..." here:
dt <- read.table(text = "...", header = TRUE)
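Assuming the windows do not overlap, you can use one of my favourite base functions, filter (this is stats::filter; if dplyr is loaded, call it as stats::filter because dplyr masks the name):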
DF <- read.table(text="year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0", header=TRUE)
DF$window <- head(filter(c(rep(0, 3), DF$event, rep(0, 3)),
filter=-3:3)[-(1:3)], -3)
DF$window[DF$window == 0 & DF$event==0] <- NA
# year event window
# 1 1 0 NA
# 2 2 0 NA
# 3 3 0 NA
# 4 4 0 NA
# 5 5 0 NA
# 6 6 0 -3
# 7 7 0 -2
# 8 8 0 -1
# 9 9 1 0
# 10 10 0 1
# 11 11 0 2
# 12 12 0 3
# 13 13 0 NA
# 14 14 0 -3
# 15 15 0 -2
# 16 16 0 -1
# 17 17 1 0
# 18 18 0 1
# 19 19 0 2
# 20 20 0 3
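Since the question asks for a function, here is an alternative sketch that fills the window around each event position directly. The name event_window and the argument k are assumptions for illustration. Unlike the filter approach it does not sum overlapping windows: if two windows overlap, the later event simply overwrites the earlier one.

```r
# Hypothetical helper: relative position within k periods of each event
event_window <- function(event, k = 3) {
  n <- length(event)
  out <- rep(NA_integer_, n)
  for (p in which(event == 1)) {
    idx <- max(1, p - k):min(n, p + k)  # clip the window at the data edges
    out[idx] <- idx - p                 # -k ... 0 ... k around the event
  }
  out
}

# Same layout as the question: events in years 9 and 17 out of 20
event <- rep(0, 20)
event[c(9, 17)] <- 1
event_window(event, k = 3)
```

This loops once per event rather than once per row, so it should stay cheap when events are sparse.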
I don't want to perform the operation in a loop. My data looks like this:
dfU[4:7]
vNeg neg pos vPos
1 0 35 28 0
2 0 42 26 0
3 0 77 59 0
4 0 14 24 0
5 0 35 45 0
6 0 17 12 0
7 0 31 23 0
8 0 64 52 1
9 0 15 17 0
10 0 21 29 0
When I perform an operation like the one below I get a wrong result, possibly because of the type conversion. I also tried with() and transform(), but got the error "not meaningful for factors".
b<-as.numeric(((as.numeric(dfU[,4])*-5)+(as.numeric(dfU[,5])*-2)+(as.numeric(dfU[,6])*2)+(as.numeric(dfU[,7])*5)))
b
[1] -14 -32 -16 18 8 -8 -18 -7 6 14 24 -9 0
The error may be caused by the conversion from integer to numeric:
typeof(dfU[,4])
[1] "integer"
as.numeric(dfU[,4])
[1] 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1
k<-transform(dfU, (vNeg*(-5))+(neg*(-2))+(pos*2)+(vPos*5))
not meaningful for factors
I want the 8th column of the data frame to be the score, and I want to avoid a loop. Is there a better way to perform operations on columns? Any help in this direction is appreciated, thanks.
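The best option would be to avoid storing the 4th column as a factor if that is not what you want. Note that typeof() reports "integer" for a factor because a factor is stored as integer codes; class() or is.factor() shows what the column really is.
Still, a workaround is to use as.numeric(as.character( )). Assuming "a" is your 4th column, your situation is this: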
> a <- as.factor(c(rep(0,7),1,rep(0,2)))
> a
[1] 0 0 0 0 0 0 0 1 0 0
Levels: 0 1
> as.numeric(a)
[1] 1 1 1 1 1 1 1 2 1 1
And the workaround does:
> as.numeric(as.character(a))
[1] 0 0 0 0 0 0 0 1 0 0
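Applied to the score from the question, the conversion can be done once for the four columns and the weighted sum then computed for all rows at once, without a loop. The sketch below rebuilds three rows of the posted data with vNeg and vPos deliberately stored as factors; the helper name to_number and the column name score are assumptions for illustration.

```r
# Three rows from the question; vNeg and vPos stored as factors on purpose
dfU <- data.frame(
  vNeg = factor(c(0, 0, 0)),
  neg  = c(35L, 42L, 64L),
  pos  = c(28L, 26L, 52L),
  vPos = factor(c(0, 0, 1))
)

# Convert factors via their labels (not their integer codes)
to_number <- function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
}
cols <- c("vNeg", "neg", "pos", "vPos")
dfU[cols] <- lapply(dfU[cols], to_number)

# Weighted score, computed for every row in one vectorised expression
dfU$score <- with(dfU, -5 * vNeg - 2 * neg + 2 * pos + 5 * vPos)
dfU$score
# [1] -14 -32 -19
```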