interchanging values after comparing two columns in R - r

i want to write a code that checks two columns in a dataframe and compares them. one is supposed to have lower limit and the other upper limits. if values on the upper limit columns are less than on the lower limit, them it should interchange the values. if both lower and upper limits are zero, it should replace the upper limit column with a value say 2. a sample data is as below:
lower_limit upper_limit
0 3
0 4
5 2
0 15
0 0
0 0
7 4
8 2
after running the code, it should produce something like
lower_limit upper_limit
0 3
0 4
2 5
0 15
0 2
0 2
4 7
2 8

dfrm <- read.table(text="lower_limit upper_limit
0 3
0 4
5 2
0 15
0 0
0 0
7 4
8 2", header=TRUE)
dfrm2 <- dfrm
dfrm2[,2] <- pmax(dfrm[,1], dfrm[,2] )
dfrm2[,1] <- pmin(dfrm[,1], dfrm[,2] );
dfrm2[abs(pmax(dfrm[,1],dfrm[,2]))==0 , 2] <- 2
> dfrm2
lower_limit upper_limit
1 0 3
2 0 4
3 2 5
4 0 15
5 0 2
6 0 2
7 4 7
8 2 8

Assuming dat is the name of your data frame/matrix:
setNames(as.data.frame(t(apply(dat, 1, function(x) {
tmp <- sort(x);
tmp[2] <- tmp[2] + (!any(x)) * 2;
return(tmp) }))), colnames(dat))
lower_limit upper_limit
1 0 3
2 0 4
3 2 5
4 0 15
5 0 2
6 0 2
7 4 7
8 2 8
How it works?
The function apply is used to apply a function to each line (argument 1). In this function, x represents a line of dat. Firstly, the values are ordered (with sort) and stored in the object tmp. Then, the second value of tmp is replaced with 2 if both values are 0. Finally, tmp is returned. The function apply returns the results as matrix, which needs to be transposed (with t). This matrix is transformed to a data frame (as.data.frame) with the same column names as the original object dat (with setNames).

Related

Data Frame- Add number of occurrences with a condition in R

I'm having a bit of a struggle trying to figure out how to do the following. I want to map how many days of high sales I have previously a change of price. For example, I have a price change on day 10 and the high sales indicator will tell me any sale greater than or equal to 10. Need my algorithm to count the number of consecutive high sales.
In this case it should return 5 (day 5 to 9)
For example purposes, the dataframe is called df. Code:
#trying to create a while loop that will check if lag(high_sales) is 1, if yes it will count until
#there's a lag(high_sales) ==0
#loop is just my dummy variable that will take me out of the while loop
count_sales<-0
loop<-0
df<- df %>% mutate(consec_high_days= ifelse(price_change > 0, while(loop==0){
if(lag(High_sales_ind)==1){
count_sales<-count_sales +1}
else{loop<-0}
count_sales},0))
day
price
price_change
sales
High_sales_ind
1
5
0
12
1
2
5
0
6
0
3
5
0
5
0
4
5
0
4
0
5
5
0
10
1
6
5
0
10
1
7
5
0
10
1
8
5
0
12
1
9
5
0
14
1
10
7
2
3
0
11
7
0
2
0
This is my error message:
Warning: Problem with mutate() column consec_high_days.
i consec_high_days = ifelse(...).
i the condition has length > 1 and only the first element will be used
Warning: Problem with mutate() column consec_high_days.
i consec_high_days = ifelse(...).
i 'x' is NULL so the result will be NULL
Error: Problem with mutate() column consec_high_days.
i consec_high_days = ifelse(...).
x replacement has length zero
Any help would be greatly appreciated.
This is a very inelegant brute-force answer, though hopefully someone better than me can provide a more elegant answer - but to get the desired dataset, you can try:
df <- read.table(text = "day price price_change sales High_sales_ind
1 5 0 12 1
2 5 0 6 0
3 5 0 5 0
4 5 0 4 0
5 5 0 10 1
6 5 0 10 1
7 5 0 10 1
8 5 0 12 1
9 5 0 14 1
10 7 2 3 0
11 7 0 2 0", header = TRUE)
# assign consecutive instances of value
df$seq <- sequence(rle(as.character(df$sales >= 10))$lengths)
# Find how many instance of consecutive days occurred before price change
df <- df %>% mutate(lseq = lag(seq))
# define rows you want to keep and when to end
keepz <- df[df$price_change != 0, "lseq"]
end <- as.numeric(rownames(df[df$price_change != 0,]))-1
df_want <- df[keepz:end,-c(6:7)]
Output:
# day price price_change sales High_sales_ind
# 5 5 5 0 10 1
# 6 6 5 0 10 1
# 7 7 5 0 10 1
# 8 8 5 0 12 1
# 9 9 5 0 14 1

How to create a complex running calculation on an R data table

I want to create a running calculation that includes logic to restart the running sum when the value is negative. Initially I have a data table or frame like below :
df <- data.frame(value1 = c(0,0,10,0,1,0,2,0)
, value2 = c(5,1,2,6,8,3,7,2))
value1 value2
0 5
0 1
10 2
0 6
1 8
0 3
2 7
0 2
I would like to take the cumulative sum of value2 subtracted by value1. However, if the new value is less than 0, then start the running calculation over.
i.e. end up with
value1 value2 newvalue
0 5 5
0 1 6
10 2 2
0 6 8
1 8 15
0 3 18
2 7 23
0 2 25
I tried multiple attempts with data.table and dplyr packages with no luck.
EDIT: Updated df to match the actual table shown.
I am sure there are other simpler ways to do this by tweaking cumsum or other such functions, but I came up with this basic loop to produce the desired output. Hope it helps !!
> df
GroupID value1 value2
1 1 0 5
2 1 0 1
3 1 10 2
4 2 0 6
5 2 1 8
6 3 0 3
7 3 2 7
8 3 0 2
for(i in 1:nrow(df)) {
if(i == 1) {
df$newvalue[i] <- df$value2[i]
} else {
df$newvalue[i] <- (df$newvalue[i-1] + df$value2[i]) - df$value1[i]
if(df$newvalue[i] < 0 | df$GroupID[i] != df$GroupID[i-1]) {
df$newvalue[i] <- df$value2[i]
}
}
}
> df
GroupID value1 value2 newvalue
1 1 0 5 5
2 1 0 1 6
3 1 10 2 2
4 2 0 6 6
5 2 1 8 13
6 3 0 3 3
7 3 2 7 8
8 3 0 2 10
I believe that explicitly looping through the data frame is the only solution for calculating this type of conditional cumulative sum. Sagar's solution was very helpful to me (I up-voted but do not have enough reputation points for it to count).
In my experience, new value needs to be initialized prior to starting the loop in order to work properly. Below is how I would approach this:
df$newvalue <- df$value2
for(i in 2:nrow(df)) {
if(df$GroupID[i] == df$GroupID[i-1]) {
df$newvalue[i] <- max(df$newvalue[i-1] + df$value2[i]) - df$value1[i], df$value2[i])
}
}

How to find and remove columns containing more than k consecutive zeros in R data.frame?

I have a huge data.frame with around 200 variables, each represented by a column. Unfortunately, the data is sourced from a poorly formatted data dump (and hence can't be modified) which represents both missing values and zeroes as 0.
The data has been observed every 5 minutes for a month, and a day-long period of only 0s can be reasonably thought of as a day where the counter was not functioning, thereby leading to the conclusion that those 0s are actually NAs.
I want to find (and remove) columns that have at least 288 consecutive 0s at any point. Or, more generally, how can we remove columns from a data.frame containing >=k consecutive 0s?
I'm relatively new to R, and any help would be greatly appreciated. Thanks!
EDIT: Here is a reproducible example. Considering k=4, I would like to remove columns A and B (but not C, since the 0s are not consecutive).
df<-data.frame(A=c(4,5,8,2,0,0,0,0,6,3), B=c(3,0,0,0,0,6,8,2,1,0), C=c(4,5,6,0,3,0,2,1,0,0), D=c(1:10))
df
A B C D
1 4 3 4 1
2 5 0 5 2
3 8 0 6 3
4 2 0 0 4
5 0 0 3 5
6 0 6 0 6
7 0 8 2 7
8 0 2 1 8
9 6 1 0 9
10 3 0 0 10
You can use this function on your data:
cons.Zeros <- function (x, n)
{
x <- x[!is.na(x)] == 0
r <- rle(x)
any(r$lengths[r$values] >= n)
}
This function returns TRUE for the columns that need to be dropped. n is the number of consecutive zeros that you want the column to be dropped for.
For your sample dataset let's use n = 3;
df.dropped <- df[, !sapply(df, cons.Zeros, n=3)]
#output:
# > df.dropped
# C D
# 1 4 1
# 2 5 2
# 3 6 3
# 4 0 4
# 5 3 5
# 6 0 6
# 7 2 7
# 8 1 8
# 9 0 9
# 10 0 10

R - Conditional replacement of column values in a data frame

I have a data frame which has 2 columns - A & B. I want to replace the values of column B in such a way that, when the VALUE>=5 replace with 1, else replace with 0.
Note - There are 2 conditions to be checked.
X=read.csv("Y:/impdat.csv")
A B
3 16
12 3
1 2
12 9
4 4
5 6
21 1
4 14
3 10
12 1
So after replacing, the data should be
A B
3 1
12 0
1 0
12 1
4 0
5 1
21 0
4 1
3 1
12 0
Sounds simple. But I am unable to implement it.
I tried
ifelse(X$B>=5,1,0)
This only prints the new values, but the original data remains the same.
X$B <- as.integer(X$B >= 5)
will do the trick.
transform(X, B=ifelse(B>=5,1,0))
Got it.
Just had to assign the object.
X$B=ifelse(X$B>=5,1,0)

How can I read a matrix with missing end elements in R?

I would like to pass to R a txt file with a matrix, where tailing zeros are omitted from rows (except the first tailing zero, if any). Those missing values considered as zeros.
for example:
8 7 0
5 4 3 2 1
4 8 9
should be read as:
8 7 0 0 0
5 4 3 2 1
4 8 9 0 0
The max row size (i.e. the number of matrix columns) is unknown prior to reading the matrix.
d <- as.matrix(read.table(filename, fill=T))
d[is.na(d)] <- 0

Resources