I'm struggling to figure out how to do the following. I want to count how many consecutive days of high sales I had immediately before a price change. For example, I have a price change on day 10, and the high sales indicator flags any day with sales greater than or equal to 10. I need my algorithm to count the number of consecutive high-sales days leading up to the price change.
In this case it should return 5 (days 5 to 9).
For example purposes, the dataframe is called df. Code:
#trying to create a while loop that will check if lag(high_sales) is 1, if yes it will count until
#there's a lag(high_sales) ==0
#loop is just my dummy variable that will take me out of the while loop
count_sales<-0
loop<-0
df<- df %>% mutate(consec_high_days= ifelse(price_change > 0, while(loop==0){
if(lag(High_sales_ind)==1){
count_sales<-count_sales +1}
else{loop<-0}
count_sales},0))
day  price  price_change  sales  High_sales_ind
1    5      0             12     1
2    5      0             6      0
3    5      0             5      0
4    5      0             4      0
5    5      0             10     1
6    5      0             10     1
7    5      0             10     1
8    5      0             12     1
9    5      0             14     1
10   7      2             3      0
11   7      0             2      0
This is my error message:
Warning: Problem with mutate() column consec_high_days.
i consec_high_days = ifelse(...).
i the condition has length > 1 and only the first element will be used
Warning: Problem with mutate() column consec_high_days.
i consec_high_days = ifelse(...).
i 'x' is NULL so the result will be NULL
Error: Problem with mutate() column consec_high_days.
i consec_high_days = ifelse(...).
x replacement has length zero
Any help would be greatly appreciated.
This is a very inelegant brute-force answer, and hopefully someone can provide a more elegant one, but to get the desired dataset you can try:
df <- read.table(text = "day price price_change sales High_sales_ind
1 5 0 12 1
2 5 0 6 0
3 5 0 5 0
4 5 0 4 0
5 5 0 10 1
6 5 0 10 1
7 5 0 10 1
8 5 0 12 1
9 5 0 14 1
10 7 2 3 0
11 7 0 2 0", header = TRUE)
library(dplyr)
# Running length of each consecutive run of high-sales / non-high-sales days
df$seq <- sequence(rle(as.character(df$sales >= 10))$lengths)
# Run length reached on the day before each row
df <- df %>% mutate(lseq = lag(seq))
# Run length just before the price change, and the last row of that run
keepz <- df[df$price_change != 0, "lseq"]
end <- as.numeric(rownames(df[df$price_change != 0,]))-1
# The run starts keepz - 1 rows before its last row; drop the helper columns
df_want <- df[(end - keepz + 1):end, -c(6:7)]
Output:
# day price price_change sales High_sales_ind
# 5 5 5 0 10 1
# 6 6 5 0 10 1
# 7 7 5 0 10 1
# 8 8 5 0 12 1
# 9 9 5 0 14 1
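If what you need is the count itself rather than the rows, a minimal sketch along these lines may work. It assumes the column names from the question; run_before is a hypothetical helper name, not a dplyr function. It walks backwards from each price-change row until it meets a day that is not a high-sales day.

```r
df <- read.table(text = "day price price_change sales High_sales_ind
1 5 0 12 1
2 5 0 6 0
3 5 0 5 0
4 5 0 4 0
5 5 0 10 1
6 5 0 10 1
7 5 0 10 1
8 5 0 12 1
9 5 0 14 1
10 7 2 3 0
11 7 0 2 0", header = TRUE)

# Hypothetical helper: length of the run of 1s that ends on the row just before row i
run_before <- function(ind, i) {
  if (i == 1) return(0L)
  prev <- rev(ind[seq_len(i - 1)])   # indicators before row i, most recent first
  first_zero <- match(0, prev)       # first non-high day when walking backwards
  if (is.na(first_zero)) length(prev) else first_zero - 1L
}

df$consec_high_days <- 0L
changes <- which(df$price_change > 0)   # rows where the price went up
df$consec_high_days[changes] <- vapply(
  changes,
  function(i) run_before(df$High_sales_ind, i),
  integer(1)
)
```

With the sample data, the only non-zero entry is on day 10. Rows without a price change stay at 0, which mirrors the ifelse(..., 0) in the question.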
My use case involves filtering a dataframe on some condition. Once I have the subset, I want to traverse it one row at a time, check a certain condition, and update a value in that particular row.
Here is my implementation:
> sales_data[sales_data$month == 1 & sales_data$dept_name == 1,]
   emp_name month dept_name revenue    status n_points x_partition y_partition x y
1       Sam     1         1     100       Low        9           3           3 0 0
7   Kenneth     1         1     500 Very High        9           3           3 0 0
11 Jonathan     1         1     500       Low        9           3           3 0 0
12      Sam     1         1     100       Low        9           3           3 0 0
18  Kenneth     1         1     500 Very High        9           3           3 0 0
22 Jonathan     1         1     500       Low        9           3           3 0 0
23      Sam     1         1     100       Low        9           3           3 0 0
29  Kenneth     1         1     500 Very High        9           3           3 0 0
33 Jonathan     1         1     500       Low        9           3           3 0 0
Now, my subset dataframe has 9 rows. So, a for loop:
for(i in 1:nrow(sales_data[sales_data$month == 1 & sales_data$dept_name == 1, ] )) {
#Here I want to update the value of column named x with i
sales_data[sales_data$month == month_item & sales_data$dept_name == dept_item, ][i]$x <- x_vector_data[i] ##NOT CORRECT APPROACH
}
Why loop? Maybe:
sales_data[sales_data$month == 1 & sales_data$dept_name == 1, "x"] <- x_vector_data
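A small self-contained sketch of that subset assignment, using made-up rows and a made-up x_vector_data rather than the asker's real data. The length check is an added safeguard, since the replacement vector should have one value per matching row:

```r
# Illustrative data only: a cut-down stand-in for sales_data
sales_data <- data.frame(
  emp_name  = c("Sam", "Kenneth", "Sam", "Jonathan"),
  month     = c(1, 1, 2, 1),
  dept_name = c(1, 1, 1, 2),
  x         = 0
)
x_vector_data <- c(10, 20)   # one replacement value per matching row (assumed)

# Logical mask for the subset, computed once and reused
rows <- sales_data$month == 1 & sales_data$dept_name == 1
stopifnot(length(x_vector_data) == sum(rows))

# Assign straight into the original data frame, not into a copy of the subset
sales_data[rows, "x"] <- x_vector_data
```

The loop in the question fails because each sales_data[cond, ][i] builds a temporary copy, so the update never reaches sales_data itself.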
I’m in the process of cleaning some data for a survival analysis and I am trying to make it so that an individual only has a single, sustained, transition from symptom present (ss=1) to symptom remitted (ss=0). An individual must have a complete sustained remission in order for it to count as a remission. Statistical problems/issues aside, I’m wondering how I can go about addressing the issues detailed below.
I’ve been trying to break the problem apart into smaller, more manageable operations and objects. However, the solutions I keep coming to force me to use conditional formatting based on the rows immediately above and below a missing value and, quite frankly, I’m at a bit of a loss as to how to do this. I would love a little guidance if you know of a good technique I can use or experiment with, or any good search terms I can use when looking up a solution.
The details are below:
#Fake dataset creation
id <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4)
time <-c(0,1,2,3,4,5,6,0,1,2,3,4,5,6,0,1,2,3,4,5,6,0,1,2,3,4,5,6)
ss <- c(1,1,1,1,NA,0,0,1,1,0,NA,0,0,0,1,1,1,1,1,1,NA,1,1,0,NA,NA,0,0)
mydat <- data.frame(id, time, ss)
*Bold and underlined characters represent changes from the dataset above
The goal here is to find a way to get the NA values for ID #1 (variable ss) to look like this: 1,1,1,1,1,0,0
ID# 2 (variable ss) to look like this: 1,1,0,0,0,0,0
ID #3 (variable ss) to look like this: 1,1,1,1,1,1,NA (no change because the row with NA will be deleted eventually)
ID #4 (variable ss) to look like this: 1,1,1,1,1,0,0 (this one requires multiple changes and I expect it is the most challenging to tackle).
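I don't really think you have considered all the edge cases: what to do with two NAs in a row at the end of a period, or with 4 or 5 NAs in a row. Using the na.locf function, this will give you the requested result for IDs 1 to 3 in your tiny test case. For ID 4 it carries the earlier 0 forward, giving 1,1,0,0,0,0,0 rather than the 1,1,1,1,1,0,0 you listed:
library(zoo)
# Leave a group untouched if its last value is NA; otherwise carry the last observation forward
fillNA <- function(vec) {
  if (is.na(tail(vec, 1))) vec else na.locf(vec, na.rm = FALSE)
}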
> mydat$locf <- with(mydat, ave(ss, id, FUN=fillNA))
> mydat
id time ss locf
1 1 0 1 1
2 1 1 1 1
3 1 2 1 1
4 1 3 1 1
5 1 4 NA 1
6 1 5 0 0
7 1 6 0 0
8 2 0 1 1
9 2 1 1 1
10 2 2 0 0
11 2 3 NA 0
12 2 4 0 0
13 2 5 0 0
14 2 6 0 0
15 3 0 1 1
16 3 1 1 1
17 3 2 1 1
18 3 3 1 1
19 3 4 1 1
20 3 5 1 1
21 3 6 NA NA
22 4 0 1 1
23 4 1 1 1
24 4 2 0 0
25 4 3 NA 0
26 4 4 NA 0
27 4 5 0 0
28 4 6 0 0
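To make the mechanics of the per-group fill visible without zoo, here is a rough base R sketch of the same idea. locf_base is a hypothetical helper name, and the rule "leave the group alone when its last value is NA" is carried over from the function above, not a general recommendation:

```r
id   <- rep(1:4, each = 7)
time <- rep(0:6, times = 4)
ss   <- c(1,1,1,1,NA,0,0, 1,1,0,NA,0,0,0, 1,1,1,1,1,1,NA, 1,1,0,NA,NA,0,0)
mydat <- data.frame(id, time, ss)

# Hypothetical helper: last observation carried forward, base R only
locf_base <- function(vec) {
  if (is.na(vec[length(vec)])) return(vec)   # trailing NA: leave the group unchanged
  idx <- cumsum(!is.na(vec))                 # position of the most recent non-NA value
  c(NA, vec[!is.na(vec)])[idx + 1]           # leading NAs (idx == 0) stay NA
}

# ave() applies the helper within each id and returns a vector in the original order
mydat$locf <- ave(mydat$ss, mydat$id, FUN = locf_base)
```

Runs of several NAs are filled with whatever value preceded them, so for ID 4 both NAs become 0. If a different rule is wanted there, the helper is the place to change it.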
I have a data frame which has 2 columns - A & B. I want to replace the values of column B in such a way that, when the VALUE>=5 replace with 1, else replace with 0.
Note - There are 2 conditions to be checked.
X=read.csv("Y:/impdat.csv")
A B
3 16
12 3
1 2
12 9
4 4
5 6
21 1
4 14
3 10
12 1
So after replacing, the data should be
A B
3 1
12 0
1 0
12 1
4 0
5 1
21 0
4 1
3 1
12 0
Sounds simple. But I am unable to implement it.
I tried
ifelse(X$B>=5,1,0)
This only prints the new values, but the original data remains the same.
X$B <- as.integer(X$B >= 5)
will do the trick.
X <- transform(X, B=ifelse(B>=5,1,0))
Got it.
Just had to assign the object.
X$B=ifelse(X$B>=5,1,0)
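For a version that runs without the CSV file, here is a short sketch that types the sample values in by hand (an assumption in place of reading Y:/impdat.csv) and shows that the result has to be assigned back:

```r
# Sample values copied from the question, standing in for the CSV
X <- data.frame(
  A = c(3, 12, 1, 12, 4, 5, 21, 4, 3, 12),
  B = c(16, 3, 2, 9, 4, 6, 1, 14, 10, 1)
)

# Without assignment this only returns a new vector; X is unchanged
flags <- ifelse(X$B >= 5, 1, 0)

# Assigning back into the column is what actually modifies X
X$B <- as.integer(X$B >= 5)
```

as.integer() on a logical vector gives 1 for TRUE and 0 for FALSE, so it produces the same values as the ifelse() call here.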
I want to write code that checks two columns in a dataframe and compares them. One is supposed to hold lower limits and the other upper limits. If a value in the upper limit column is less than the one in the lower limit column, then the two values should be swapped. If both the lower and upper limits are zero, the upper limit should be replaced with some value, say 2. Sample data is below:
lower_limit upper_limit
0 3
0 4
5 2
0 15
0 0
0 0
7 4
8 2
after running the code, it should produce something like
lower_limit upper_limit
0 3
0 4
2 5
0 15
0 2
0 2
4 7
2 8
dfrm <- read.table(text="lower_limit upper_limit
0 3
0 4
5 2
0 15
0 0
0 0
7 4
8 2", header=TRUE)
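dfrm2 <- dfrm
# Row-wise maximum becomes the upper limit, row-wise minimum the lower limit
dfrm2[,2] <- pmax(dfrm[,1], dfrm[,2])
dfrm2[,1] <- pmin(dfrm[,1], dfrm[,2])
# Where both limits are zero, set the upper limit to 2
dfrm2[pmax(abs(dfrm[,1]), abs(dfrm[,2])) == 0, 2] <- 2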
> dfrm2
lower_limit upper_limit
1 0 3
2 0 4
3 2 5
4 0 15
5 0 2
6 0 2
7 4 7
8 2 8
Assuming dat is the name of your data frame/matrix:
setNames(as.data.frame(t(apply(dat, 1, function(x) {
tmp <- sort(x);
tmp[2] <- tmp[2] + (!any(x)) * 2;
return(tmp) }))), colnames(dat))
lower_limit upper_limit
1 0 3
2 0 4
3 2 5
4 0 15
5 0 2
6 0 2
7 4 7
8 2 8
How does it work?
The function apply is used to apply a function to each row (second argument 1). In this function, x represents one row of dat. First, the values are ordered (with sort) and stored in the object tmp. Then, the second value of tmp is replaced with 2 if both values are 0. Finally, tmp is returned. Because each call returns a vector of length 2, apply returns the results as a matrix with one column per original row, which needs to be transposed (with t). This matrix is then converted to a data frame (as.data.frame) with the same column names as the original object dat (with setNames).
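The steps described above can be unpacked into intermediate objects, which may make the transpose easier to follow. This is only a sketch: fix_row, step1 and step2 are names invented here, and the both-zero rule is written with an explicit all(x == 0) test instead of the !any(x) trick.

```r
dat <- data.frame(
  lower_limit = c(0L, 0L, 5L, 0L, 0L, 0L, 7L, 8L),
  upper_limit = c(3L, 4L, 2L, 15L, 0L, 0L, 4L, 2L)
)

# Hypothetical per-row function: order the pair, then apply the both-zero rule
fix_row <- function(x) {
  tmp <- sort(x)                    # smaller value first
  if (all(x == 0)) tmp[2] <- 2      # both limits zero: upper limit becomes 2
  tmp
}

step1 <- apply(dat, 1, fix_row)     # 2 x 8 matrix: one COLUMN per original row
step2 <- t(step1)                   # 8 x 2 matrix: back to one row per original row
result <- setNames(as.data.frame(step2), colnames(dat))
```

Printing dim(step1) and dim(step2) shows why the t() call is needed before converting back to a data frame.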