I was playing with R a little bit and came across this behavior that I don't understand:
num <- seq(1,20,1)
num[num %% c(1,2) == 0]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
So it seems to be equivalent to
num[num %% 1 == 0 | num %% 2 == 0]
But when I do the following, it gets weird:
num[num %% c(1,3) == 0]
[1] 1 3 5 6 7 9 11 12 13 15 17 18 19
num[num %% c(1,4) == 0]
[1] 1 3 4 5 7 8 9 11 12 13 15 16 17 19 20
I have been thinking about it, but I can't come up with an explanation for this. I'm just curious, but if someone knows the reason, it would be very interesting to hear!
Thanks!
As jogo says, it's R's recycling rule: the shorter vector c(1,3) is repeated until it matches the length of num.
The result of num %% 1 is
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
whilst the result of num %% 3 is
[1] 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
Looking at the result of num %% c(1,3)
[1] 0 2 0 1 0 0 0 2 0 1 0 0 0 2 0 1 0 0 0 2
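The first number in the result is taken from the first number of the num %% 1 result, the second from the second number of the num %% 3 result, the third from the third number of the num %% 1 result, and so on. This is also why c(1,2) seems to behave like the | expression: odd positions are computed with %% 1, which is always 0, and even positions hold even numbers, so %% 2 is 0 there too.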
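A quick way to see the recycling is to build the repeated divisor vector explicitly with rep_len() and compare the two. This is only a small illustration using the num vector from the question; the name divisors is just for demonstration:

```r
num <- seq(1, 20, 1)

# R recycles the length-2 divisor to the length of num: 1, 3, 1, 3, ...
divisors <- rep_len(c(1, 3), length(num))

# Element-wise, num %% c(1, 3) is the same as num %% divisors
identical(num %% c(1, 3), num %% divisors)

# Odd positions use %% 1 (always 0, so always kept);
# even positions use %% 3 (kept only for 6, 12 and 18)
num[num %% c(1, 3) == 0]
```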
I have a data frame where some rows have the value 0. I want to write code that sets the next few rows to 0 as well.
> head(df$n,n=20)
df$n
1 0
2 9009
3 0
4 0
5 0
6 0
7 0
8 5410
9 0
10 0
11 0
12 0
13 0
14 0
15 32
16 0
17 0
18 1054
19 0
20 0
I want to write code that sets the next five rows after a 0 to 0 as well.
Basically, a row with 0 stays 0, and the next five rows also become 0.
I tried
for (j in 1:nrow(indx)) {
  for (i in 1:4) {
    df$n[j + i] <- 0
  }
}
where indx is a data frame containing the row numbers of all the 0 values.
This runs, but it gives the wrong result.
How do I get my desired output?
> head(df$n,n=20)
df$n
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 5410
9 0
10 0
11 0
12 0
13 0
14 0
15 32
16 0
17 0
18 0
19 0
20 0
Edit: Sorry for the unclear wording. My aim is to set the 5 values after a 0 to 0, since that data is incorrect.
Edit 2: I think this code works for me, although it's a little primitive:
for (i in 1:nrow(indx)) {
  u <- indx[i, ]
  df[u, ] <- 0
  df[u + 1, ] <- 0
  df[u + 2, ] <- 0
  df[u + 3, ] <- 0
  df[u + 4, ] <- 0
  df[u + 5, ] <- 0
}
However, it introduces extra rows at the end, but it works.
If I understand correctly, you want to make sure any run of zeros is at least five rows long, unless it's at the end of the data. Here's a dplyr-based solution:
library(dplyr)
df %>%
  group_by(zero_run = cumsum(n == 0 & lag(n, default = 1) != 0)) %>%
  mutate(
    zeros_consecutive = row_number(),
    n_new = ifelse(zero_run == 0 | zeros_consecutive > 5, n, 0)
  ) %>%
  ungroup()
# # A tibble: 20 × 4
# n zero_run zeros_consecutive n_new
# <dbl> <int> <int> <dbl>
# 1 0 1 1 0
# 2 9009 1 2 0
# 3 0 2 1 0
# 4 0 2 2 0
# 5 0 2 3 0
# 6 0 2 4 0
# 7 0 2 5 0
# 8 5410 2 6 5410
# 9 0 3 1 0
# 10 0 3 2 0
# 11 0 3 3 0
# 12 0 3 4 0
# 13 0 3 5 0
# 14 0 3 6 0
# 15 32 3 7 32
# 16 0 4 1 0
# 17 0 4 2 0
# 18 1054 4 3 0
# 19 0 5 1 0
# 20 0 5 2 0
I left in the helper columns to better demonstrate the approach, but you could drop them by using n = ifelse(...) instead of n_new = ifelse(...) and adding select(!zero_run:zeros_consecutive) after ungroup().
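If you would rather avoid dplyr, the same idea (number the rows within each zero run and keep only the rows past the fifth) can be sketched in base R with cumsum() and ave(). This assumes, as the answer does, that the goal is to make every run of zeros at least five rows long; the vector below is the first 20 values of df$n from the question, and prev, zero_run and pos_in_run are illustrative names:

```r
# First 20 values of df$n from the question
n <- c(0, 9009, 0, 0, 0, 0, 0, 5410, 0, 0, 0, 0, 0, 0, 32, 0, 0, 1054, 0, 0)

# A new run starts at every 0 whose previous value is non-zero;
# prepending 1 plays the role of lag(n, default = 1)
prev <- c(1, head(n, -1))
zero_run <- cumsum(n == 0 & prev != 0)

# Position of each row inside its run (1 = the 0 that opened the run)
pos_in_run <- ave(seq_along(n), zero_run, FUN = seq_along)

# Keep rows before the first zero and rows from the 6th position of a run onward
n_new <- ifelse(zero_run == 0 | pos_in_run > 5, n, 0)
n_new
```

The same three lines work on df$n directly, e.g. by replacing n with df$n and assigning the result back.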
I am trying to recode a data frame with four columns. Across all of the columns, I want to recode all the numeric values into these ordinal numeric values:
0 stays as is
1:3 <- 1
4:10 <- 2
11:22 <- 3
23:max <- 4
This is the data frame:
> df
T4.1 T4.2 T4.3 T4.4
1 0 54 0 5
2 0 5 0 0
3 0 3 0 0
4 0 2 0 0
5 0 3 0 0
6 0 2 0 0
7 0 4 0 0
8 1 20 0 0
9 1 7 0 2
10 0 14 0 0
11 0 3 0 0
12 0 202 0 41
13 2 12 0 0
14 3 6 0 0
15 3 21 0 3
16 0 143 0 0
17 0 0 0 0
18 4 9 0 0
19 3 15 0 0
20 0 58 0 6
21 2 0 0 0
22 0 52 0 0
23 0 3 0 0
24 0 1 0 0
25 4 6 0 1
26 1 4 0 0
27 0 38 0 1
28 0 6 0 0
29 0 8 0 0
30 0 29 0 4
31 1 14 0 0
32 0 12 0 10
33 4 1 0 3
I'm trying to use the recode function, but I can't seem to figure out how to input a range of numeric values into it. I get the following errors with my attempts:
> recode(df, 11:22=3)
Error: unexpected '=' in "recode(df, 11:22="
> recode(df, c(11:22)=3)
Error: unexpected '=' in "recode(df, c(11:22)="
I would greatly appreciate any advice. Thanks for your time!
Edit: Thanks all for the help!!
You can use cut with a vector of breaks that separates the ranges:
df_res <- as.data.frame(sapply(df, function(x) cut(x,
                                breaks = c(-0.5, 0.5, 3.5, 10.5, 22.5, Inf),
                                labels = c(0, 1, 2, 3, 4))),
                        # sapply() returns a character matrix; keep factor columns on R >= 4.0
                        stringsAsFactors = TRUE)
str(df_res)
#'data.frame': 33 obs. of 4 variables:
# $ T4.1: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 2 2 1 ...
# $ T4.2: Factor w/ 5 levels "0","1","2","3",..: 5 3 2 2 2 2 3 4 3 4 ...
# $ T4.3: Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
# $ T4.4: Factor w/ 4 levels "0","1","2","4": 3 1 1 1 1 1 1 1 2 1 ...
df_res
# T4.1 T4.2 T4.3 T4.4
# 1 0 4 0 2
# 2 0 2 0 0
# 3 0 1 0 0
# 4 0 1 0 0
# 5 0 1 0 0
# 6 0 1 0 0
# 7 0 2 0 0
# 8 1 3 0 0
# 9 1 2 0 1
# 10 0 3 0 0
# 11 0 1 0 0
# 12 0 4 0 4
# 13 1 3 0 0
# 14 1 2 0 0
# 15 1 3 0 1
# 16 0 4 0 0
# 17 0 0 0 0
# 18 2 2 0 0
# 19 1 3 0 0
# 20 0 4 0 2
# 21 1 0 0 0
# 22 0 4 0 0
# 23 0 1 0 0
# 24 0 1 0 0
# 25 2 2 0 1
# 26 1 2 0 0
# 27 0 4 0 1
# 28 0 2 0 0
# 29 0 2 0 0
# 30 0 4 0 2
# 31 1 3 0 0
# 32 0 3 0 2
# 33 2 1 0 1
I find named vectors are a nice pattern for recoding variables, especially when the mapping is irregular. You could use one like this here:
decoder <- c(0, rep(1,3), rep(2,7), rep(3, 12))
names(decoder) <- 0:22
sapply(df, function(x) ifelse(x <= 22, decoder[as.character(x)], 4))
If the recoding followed a more regular pattern, cut would be a useful function.
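Another option, if you want plain integer codes rather than the factors that cut() returns, is findInterval(), which counts how many break points are less than or equal to each value. A sketch, assuming whole-number counts and reading the bands as 0, 1-3, 4-10, 11-22 and 23+; recode_counts and the toy data frame are illustrative, with values chosen to hit every band:

```r
# Number of cut points <= x: 0 -> 0, 1-3 -> 1, 4-10 -> 2, 11-22 -> 3, 23+ -> 4
recode_counts <- function(x) findInterval(x, c(1, 4, 11, 23))

# Small toy data frame in the same shape as the question's
toy <- data.frame(T4.1 = c(0, 1, 3, 4), T4.2 = c(54, 5, 20, 12))

# Apply the recoding to every column and keep a data frame
df_res <- as.data.frame(lapply(toy, recode_counts))
df_res
```

On the question's data this would be as.data.frame(lapply(df, recode_counts)); the result columns are integers, so they can be used directly in arithmetic or converted with factor() if needed.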
I need to implement some logic in my R script for the sample data frame df shown below:
ID A B
1 2.471264262 0
2 2.53024575 0
3 2.559114933 1
4 2.502350493 1
5 2.529496526 0
6 2.480199137 0
7 2.521066835 0
8 2.481272625 0
9 2.505953959 0
10 2.481272625 0
11 2.499424723 0
12 2.492515087 0
13 2.502385996 0
14 2.487579633 0
15 2.479438021 -1
16 2.044195946 1
17 2.054051421 0
18 2.108811073 1
19 2.249767599 0
20 2.627294516 -1
21 2.624337386 0
22 2.157110862 0
23 2.142325212 -1
24 2.124582433 -1
25 2.114725333 0
26 2.113739623 0
27 1.92054047 0
28 2.00037188 0
29 2.183995509 0
30 2.629451192 0
31 2.772756046 0
32 2.603141474 0
33 2.502385996 0
Column B marks the data points where the state changes. I need to add or subtract a "correction factor" to the values in column A for the next 15 data points, starting from the point where B == 1 or B == -1.
The formula for the correction factor is as follows:
If B == 1, the corrected value is A - 0.19*(15/15)*A, where the fraction (15/15) decreases over the next 15 values: (14/15), (13/15), ..., (0/15).
Similarly, if B == -1, the corrected value is A + 0.53*(15/15)*A, with the fraction (15/15) decreasing in the same way: (14/15), (13/15), ..., (0/15).
Another condition: once a state change has been detected in B, any further state changes within the next 15 values should be ignored. For example, the first state change is detected at B3, so the changes at B4, B15 and B16 should not be considered.
For a better understanding, here is my expected output along with the formulas I applied manually in Excel. (The Excel cell references are one row ahead of ID because of the header row, so A4 is the value for ID 3.)
Expected Output
A B A_with_correction_factor Formula_executed
2.471264262 0 2.471264262 Same Value of A retained since no transition
2.53024575 0 2.53024575 Same Value of A retained since no transition
2.559114933 1 2.072883096 A4-0.19* (15/15)*A4
2.502350493 1 2.058600339 A5-0.19* (14/15)*A5
2.529496526 0 2.112972765 A6-0.19* (13/15)*A6
2.480199137 0 2.103208868 A7-0.19* (12/15)*A7
2.521066835 0 2.169798189 A8-0.19* (11/15)*A8
2.481272625 0 2.166978093 A9-0.19* (10/15)*A9
2.505953959 0 2.220275208 A10-0.19* (9/15)*A10
2.481272625 0 2.229836999 A11-0.19* (8/15)*A11
2.499424723 0 2.277809064 A12-0.19* (7/15)*A12
2.492515087 0 2.30308394 A13-0.19* (6/15)*A13
2.502385996 0 2.34390155 A14-0.19* (5/15)*A14
2.487579633 0 2.361542265 A15-0.19* (4/15)*A15
2.479438021 -1 2.385219376 A16-0.19* (3/15)*A16
2.044195946 1 1.992409649 A17-0.19* (2/15)*A17
2.054051421 0 2.028033436 A18-0.19* (1/15)*A18
2.108811073 1 2.108811073 A19-0.19* (0/15)*A19
2.249767599 0 2.249767599 Same Value of A retained since no transition
2.627294516 -1 4.019760609 A21+0.53*(15/15)*A21
2.624337386 0 3.922509613 A22+0.53*(14/15)*A22
2.157110862 0 3.147943785 A23+0.53*(13/15)*A23
2.142325212 -1 3.050671102 A24+0.53*(12/15)*A24
2.124582433 -1 2.950336805 A25+0.53*(11/15)*A25
2.114725333 0 2.861928284 A26+0.53*(10/15)*A26
2.113739623 0 2.785908823 A27+0.53*(9/15)*A27
1.92054047 0 2.463413243 A28+0.53*(8/15)*A28
2.00037188 0 2.495130525 A29+0.53*(7/15)*A29
2.183995509 0 2.647002557 A30+0.53*(6/15)*A30
2.629451192 0 3.093987569 A31+0.53*(5/15)*A31
2.772756046 0 3.164638901 A32+0.53*(4/15)*A32
2.603141474 0 2.87907447 A33+0.53*(3/15)*A33
2.502385996 0 2.679221273 A34+0.53*(2/15)*A34
Edit
The code suggested below works exactly as required for the data frame above (the one with 33 rows), but with the data frame below, which has 32 rows, the code doesn't work. Any suggestions?
ID A B
1 2.471264262 0
2 2.53024575 0
3 2.559114933 1
4 2.502350493 1
5 2.529496526 0
6 2.480199137 0
7 2.521066835 0
8 2.481272625 0
9 2.505953959 0
10 2.481272625 0
11 2.499424723 0
12 2.492515087 0
13 2.502385996 0
14 2.487579633 0
15 2.479438021 -1
16 2.044195946 1
17 2.054051421 0
18 2.108811073 1
19 2.249767599 0
20 2.627294516 -1
21 2.624337386 0
22 2.157110862 0
23 2.142325212 -1
24 2.124582433 -1
25 2.114725333 0
26 2.113739623 0
27 1.92054047 0
28 2.00037188 0
29 2.183995509 0
30 2.629451192 0
31 2.772756046 0
32 2.603141474 0
I wasn't able to post a new question referencing this post, so I have updated this one instead.
Thanks.
This should work. The counting to 15 is a little tricky, so we use a for loop to calculate the correct counter and state; the actual formula is then relatively simple:
counter <- 0
current_state <- NA
# Create the helper columns first so the loop works for any number of rows
# (filling them element by element is what broke with 32 rows)
df$state <- NA_real_
df$counter <- NA_real_
for (i in seq_along(df$B)) {
  if (counter == 0) {
    if (df$B[i] == 0) next
    counter <- 15
    current_state <- df$B[i]
    df$state[i] <- df$B[i]
    df$counter[i] <- counter
  } else {
    counter <- counter - 1
    df$state[i] <- current_state
    df$counter[i] <- counter
  }
}
df$A_corr <- ifelse(df$state == 1,
                    df$A - 0.19 * (df$counter / 15) * df$A,
                    df$A + 0.53 * (df$counter / 15) * df$A)
df$A_corr <- ifelse(is.na(df$A_corr), df$A, df$A_corr)
Gives:
> df
ID A B state counter A_corr
1 1 2.471264 0 NA NA 2.471264
2 2 2.530246 0 NA NA 2.530246
3 3 2.559115 1 1 15 2.072883
4 4 2.502350 1 1 14 2.058600
5 5 2.529497 0 1 13 2.112973
6 6 2.480199 0 1 12 2.103209
7 7 2.521067 0 1 11 2.169798
8 8 2.481273 0 1 10 2.166978
9 9 2.505954 0 1 9 2.220275
10 10 2.481273 0 1 8 2.229837
11 11 2.499425 0 1 7 2.277809
12 12 2.492515 0 1 6 2.303084
13 13 2.502386 0 1 5 2.343902
14 14 2.487580 0 1 4 2.361542
15 15 2.479438 -1 1 3 2.385219
16 16 2.044196 1 1 2 1.992410
17 17 2.054051 0 1 1 2.028033
18 18 2.108811 1 1 0 2.108811
19 19 2.249768 0 NA NA 2.249768
20 20 2.627295 -1 -1 15 4.019761
21 21 2.624337 0 -1 14 3.922510
22 22 2.157111 0 -1 13 3.147944
23 23 2.142325 -1 -1 12 3.050671
24 24 2.124582 -1 -1 11 2.950337
25 25 2.114725 0 -1 10 2.861928
26 26 2.113740 0 -1 9 2.785909
27 27 1.920540 0 -1 8 2.463413
28 28 2.000372 0 -1 7 2.495131
29 29 2.183996 0 -1 6 2.647003
30 30 2.629451 0 -1 5 3.093988
31 31 2.772756 0 -1 4 3.164639
32 32 2.603141 0 -1 3 2.879074
33 33 2.502386 0 -1 2 2.679221
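On the 32-row problem from the edit: assigning df$state[i] when no state column exists yet builds a short vector (at i = 3 that is c(NA, NA, 1)), and assigning a column into a data frame only recycles a shorter value when the number of rows is a multiple of its length. 33 is a multiple of 3, so it silently worked; 32 is not, so it errors. The first lines below demonstrate this; the rest is a sketch of the same loop as a function that creates the helper columns up front. add_correction, window, down and up are illustrative names, with the 15, 0.19 and 0.53 defaults taken from the question:

```r
# Why the row count mattered: assigning into a column that doesn't exist yet
d33 <- data.frame(x = 1:33)
d33$y[3] <- 1        # c(NA, NA, 1) is recycled to fill 33 rows
d32 <- data.frame(x = 1:32)
try(d32$y[3] <- 1)   # fails: replacement has 3 rows, data has 32

add_correction <- function(df, window = 15, down = 0.19, up = 0.53) {
  df$state <- NA_real_     # NA = row is not inside a correction window
  df$counter <- NA_real_
  counter <- 0
  current_state <- NA
  for (i in seq_along(df$B)) {
    if (counter == 0) {
      if (df$B[i] == 0) next   # no open window and no state change
      counter <- window        # a state change opens a new window
      current_state <- df$B[i]
    } else {
      counter <- counter - 1   # changes inside an open window are ignored
    }
    df$state[i] <- current_state
    df$counter[i] <- counter
  }
  frac <- df$counter / window
  df$A_corr <- ifelse(is.na(df$state), df$A,
                      ifelse(df$state == 1, df$A - down * frac * df$A,
                                            df$A + up * frac * df$A))
  df
}
```

Calling add_correction(df) should give the same A_corr column as the output above for the 33-row data, and it also runs on the 32-row version.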
I have a data.frame with a factor identifying events
year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need a counter that identifies a given window around each event. For a window of, for example, 3 periods on either side of the event, the result should look like this:
year event window
1 0
2 0
3 0
4 0
5 0
6 0 -3
7 0 -2
8 0 -1
9 1 0
10 0 1
11 0 2
12 0 3
13 0
14 0 -3
15 0 -2
16 0 -1
17 1 0
18 0 1
19 0 2
20 0 3
Any guidance on how to implement this within a function would be appreciated. You can copy the data frame by pasting the block above in place of "..." here:
dt <- read.table(text = "...", header = TRUE)
Assuming the windows don't overlap, you can use one of my favourite base functions, filter:
DF <- read.table(text="year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0", header=TRUE)
# stats::filter is called explicitly in case dplyr::filter is masking it
DF$window <- head(stats::filter(c(rep(0, 3), DF$event, rep(0, 3)),
                                filter = -3:3)[-(1:3)], -3)
DF$window[DF$window == 0 & DF$event==0] <- NA
# year event window
# 1 1 0 NA
# 2 2 0 NA
# 3 3 0 NA
# 4 4 0 NA
# 5 5 0 NA
# 6 6 0 -3
# 7 7 0 -2
# 8 8 0 -1
# 9 9 1 0
# 10 10 0 1
# 11 11 0 2
# 12 12 0 3
# 13 13 0 NA
# 14 14 0 -3
# 15 15 0 -2
# 16 16 0 -1
# 17 17 1 0
# 18 18 0 1
# 19 19 0 2
# 20 20 0 3
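Since the question asks for this inside a function, here is a sketch that generalises the answer to a window of k periods. event_window is an illustrative name; like the answer, it assumes the windows of neighbouring events don't overlap (overlapping offsets would be summed), and it pads with k zeros on each side so the centred filter is defined at the ends:

```r
event_window <- function(event, k = 3) {
  padded <- c(rep(0, k), event, rep(0, k))
  # Centred convolution: position i gets -k*x[i+k] + ... + k*x[i-k]
  w <- stats::filter(padded, filter = -k:k)
  w <- as.numeric(w)[(k + 1):(k + length(event))]  # drop the padding
  w[w == 0 & event == 0] <- NA                     # outside every window
  w
}

event <- c(rep(0, 8), 1, rep(0, 7), 1, rep(0, 3))
DF_window <- data.frame(year = 1:20, event = event,
                        window = event_window(event, k = 3))
DF_window
```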
I have a dataframe with many rows, but the structure looks like this:
year factor
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need to add a counter as a third column. It should count the consecutive cells that contain zero, resetting to zero whenever the value 1 is encountered. The result should look like this:
year factor count
1 0 0
2 0 1
3 0 2
4 0 3
5 0 4
6 0 5
7 0 6
8 0 7
9 1 0
10 0 1
11 0 2
12 0 3
13 0 4
14 0 5
15 0 6
16 0 7
17 1 0
18 0 1
19 0 2
20 0 3
I would like to do this quickly and without loops, since I have to repeat the operation for hundreds of files.
You can copy my data frame by pasting it in place of "..." here:
dt <- read.table(text = "...", header = TRUE)
Perhaps a solution like this with ave would work for you:
A <- cumsum(dt$factor)
ave(A, A, FUN = seq_along) - 1
# [1] 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
Original answer:
(Missed that the first value was supposed to be "0". Oops.)
x <- rle(dt$factor == 1)
y <- sequence(x$lengths)
y[dt$factor == 1] <- 0
y
# [1] 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 0 1 2 3
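Because the ave() version is fully vectorised, applying it to many files is mostly a matter of wrapping it in a function and looping over the file list. A sketch, assuming the column is named factor and holds only 0/1 values; add_count is an illustrative name, and the folder "data" and the .txt pattern are placeholders for wherever your files actually live:

```r
# Each 1 bumps cumsum(), so it starts a new group; numbering rows within
# each group and subtracting 1 gives 0 at every 1 and counts up afterwards
add_count <- function(dt) {
  grp <- cumsum(dt$factor)
  dt$count <- ave(grp, grp, FUN = seq_along) - 1
  dt
}

# Placeholder folder and pattern: adjust to your own files
files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
results <- lapply(files, function(f) add_count(read.table(f, header = TRUE)))
```

results is then a list with one data frame per file, in the same order as files.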