I am struggling to recreate the following Excel logic in DAX:
IF(OR(IFERROR(--OFFSET(F2,,,-4)>0,)),"",D3)
Given an index column and a Val column: index starts at 0, and Val is an integer that increases continuously except where it is 0. I need to calculate the preserveval column.
Logic:
1. From top to bottom, when Val = 0, preserveval = 0.
2. When Val > 0, preserveval = Val, and the following four consecutive values are set to blank.
3. Return to step 1.
Examples I created in excel:
Any tips & solutions would be much appreciated. Thank you for your time.
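This is not DAX, but the row-by-row logic described above can be sketched in R to pin down the expected output before translating it. It is only a minimal sketch: the function name preserve_val and the sample values are made up for illustration, and it assumes that rows inside the four-row gap are blanked even when Val is 0 there, which is how the Excel formula behaves.

```r
# Hypothetical helper: keep a positive value, then blank the next `gap` rows.
preserve_val <- function(val, gap = 4) {
  out <- rep(NA_real_, length(val))  # NA stands in for a blank cell
  skip <- 0                          # rows still to be blanked
  for (i in seq_along(val)) {
    if (skip > 0) {
      skip <- skip - 1               # inside the gap: leave the row blank
      next
    }
    out[i] <- val[i]                 # outside the gap: copy Val (0 stays 0)
    if (val[i] > 0) skip <- gap      # a kept positive value starts a new gap
  }
  out
}

val <- c(0, 0, 1, 2, 3, 4, 5, 6, 0, 7)  # made-up sample data
res <- preserve_val(val)
print(res)
```

A running "rows left to skip" counter is one simple way to express the rule; in a column-based language the same idea usually has to be rewritten without row-by-row state.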
I'm working with a data frame "mydata" containing a variable "therapy" which is a factor (0/1). Another variable is "died" (1 if died, 0 if survived). There are no missing values, so every observation has a value for therapy and died.
Now I would like to alter the value of "therapy" based on the value of "died": if died == 1, therapy should be set to 0 (so I want to replace the existing value); otherwise the value should stay unchanged.
mydata$therapy <- ifelse(mydata$died == 1,
0,
mydata$therapy)
As a result I get values that not only contain 0 and 1, but also 2 (therapy never contained any "2"). I assume that the increment by one is due to the factor type of "therapy". Also the following code with case_when leads to the same results:
mydata <- mydata %>%
mutate(therapy = case_when(
died == 1 ~ 0,
TRUE ~ therapy))
Does anybody have an idea what I am doing wrong? Or does anybody have a solution for just changing "therapy" to zero if died == 1 and keeping all values as they are if died == 0?
Thank you all for your answers!
Especially the comment by Ray was helpful - my problem was solved this way:
mydata$therapy <- ifelse(mydata$died == 1,
0,
as.numeric(levels(mydata$therapy))[mydata$therapy])
In the end I got the 0/1 values instead of the 1/2 values that the factor's internal codes had produced.
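The effect described here can be reproduced on a small made-up data frame. The sketch below assumes therapy is a factor with levels "0" and "1"; the sample values are invented, and converting through as.character() is just one common way to recover the labels instead of the internal codes.

```r
# Made-up sample data; therapy is a factor with levels "0" and "1".
mydata <- data.frame(
  therapy = factor(c(0, 1, 1, 0, 1)),
  died    = c(0, 1, 0, 0, 1)
)

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(mydata$therapy)

# Going through the labels recovers the original 0/1 values.
therapy_num <- as.numeric(as.character(mydata$therapy))

# Set therapy to 0 where died == 1, keep it otherwise, and restore the factor.
mydata$therapy <- factor(ifelse(mydata$died == 1, 0, therapy_num),
                         levels = c(0, 1))
print(mydata)
```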
I have two identical vectors, S and T.
S <- seq(from = 0, to = 80, by = 2)
T is exactly the same. I am trying to create a data frame in which column one holds the S values (0 through 80) and column two holds the T values (0 through 80), covering every combination. Row 1 would be 0, 0. Row 2 would be 0, 2. Row 3 would be 0, 4, and so on, and then row 42 would be 2, 0. I believe it would be possible using a for loop, but I am struggling with how to accomplish this. Any advice would greatly help. I understand that there would be close to, if not over, 1000 rows, but I feel like there is a simple way to accomplish this.
Don't label variables T or t in R. T is a popular abbreviation of TRUE, and t is a function (transpose).
expand.grid() is probably what you're looking for.
S <- seq(from = 0, to=80, by=2)
TT <- S
expand.grid(S,TT)
Yes, it's big.
dim(expand.grid(S,TT))
[1] 1681 2
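One detail worth checking: expand.grid() varies its first argument fastest, so expand.grid(S, TT) starts 0,0 then 2,0 rather than 0,0 then 0,2. A minimal sketch of one way to get the row order described in the question is to pass TT first and then reorder the columns; the name pairs_df is made up for illustration.

```r
S  <- seq(from = 0, to = 80, by = 2)
TT <- S

# TT is passed first so that it varies fastest; the columns are then reordered.
pairs_df <- expand.grid(TT = TT, S = S)[, c("S", "TT")]

head(pairs_df, 3)   # first rows: S stays at 0 while TT steps through 0, 2, 4
pairs_df[42, ]      # first row where S has moved on to 2
```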
So let's say I have a list of data frames. Each data frame has a column from which I want to derive a new dummy column. This is how it works. For simplicity, let's just use vectors instead of a data frame in the example.
vect<-c(0, 0, 100, 100, 0, 0)
In this case, the dummy column created would be as follows:
dummy_vect<- c(0, 0, 0, 0, 1, 1)
The dummy is 1 only at the indexes after the last nonzero value in vect. I have the code written to do this and it works without any issues. The big issue I'm running into occurs in the rare instance when all of vect is 0s:
vect<-c(0,0,0,0,0,0)
For the context of the problem, when this case occurs, I need the dummy columns to be 1 at every instance.
How would I translate this into code? So if every value in vect is 0, return all 1s in the dummy column, else just run the code I've written that works for the other cases. Any help is greatly appreciated! It might be something simple and I'm just really overthinking it, but I don't know how to set up the if condition properly at all.
Take absolute values, reverse the input and take the cumulative sum. Finally change the 0 values to TRUE, reverse and convert to numeric.
vect <- c(0, 0, 100, 100, 0, 0)
+rev(cumsum(rev(abs(vect))) == 0)
## [1] 0 0 0 0 1 1
+rev(cumsum(rev(abs(0*vect))) == 0) # 0*vect is all 0 input
## [1] 1 1 1 1 1 1
Just found a condition for an if statement that looks as though it is working.
if(all(df$x == 0)){
df$dummy_col = 1
}else{
# The code that does the process for all other cases...
}
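Since the question mentions a list of data frames, the cumulative-sum approach can be applied to each one with lapply(). This is a minimal sketch under assumptions: the helper make_dummy, the list df_list and the column names x and dummy_col are placeholders, not names from the original data.

```r
# Hypothetical helper wrapping the reverse cumulative-sum idea:
# 1 for every position after the last nonzero value, 0 elsewhere.
make_dummy <- function(x) as.numeric(rev(cumsum(rev(abs(x))) == 0))

# Made-up list of data frames, including the all-zero case.
df_list <- list(
  a = data.frame(x = c(0, 0, 100, 100, 0, 0)),
  b = data.frame(x = c(0, 0, 0, 0, 0, 0))
)

# Add the dummy column to every data frame in the list.
df_list <- lapply(df_list, function(df) {
  df$dummy_col <- make_dummy(df$x)
  df
})

print(df_list)
```

Because an all-zero vector has a cumulative sum of 0 everywhere, the all-1s case falls out of the same expression and no separate if branch is needed.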
I am trying to clean up some data in a huge dataset.
One column holds values for the sales amount. Example values could look like these:
Clean Data:
Sales Potential
230
120
300
However, in some places something like this appears:
Dirty Data
0, 0, 0, 0, 0
4, 0, 0, 0
0, 0, 480
0, 200, 0
In the first case of the dirty data, the cell should contain only a zero: 0
In all other cases, if there is any non-zero number, I would like to extract this number and either replace the cell with this value or add a new cleaned column.
So the dirty data cleaned up:
Cleaned Data:
0
4
480
200
My approach was to use regular expressions in R, as I am loading the data into Power BI using Power Query.
I tried to find a pattern that extracts the value I am looking for and places it in a new column. However, my results are nowhere near what I want.
Is there maybe a much simpler approach to achieve this in R?
Code so far:
library(stringr)
OutputRegEx <- data.frame(MyDataset)
Splitter = function(x) substr(str_extract(x,'[1-9]'),1,7)
OutputRegEx[["RegExAuswertung"]] <- apply(OutputRegEx[43],1, function(x) Splitter(x) )
In Power Query, insert a custom column with the formula below:
=List.Max(List.Transform(Text.Split(Text.From([Sales Potential]),","), each Number.FromText(_)))
The formula splits everything on commas, puts into a list, converts the list from text into numbers, then takes the maximum number from the list.
This R solution seems to do what you want:
SalesPotential <- c("0, 0, 0, 0, 0", "4, 0, 0, 0","0, 0, 480","0, 200, 0")
library(stringr)
str_extract(gsub(",", "", SalesPotential), "(?=(0\\s){4})\\d+|[1-9]+(0{1,})?")
[1] "0" "4" "480" "200"
Using gsub, this solution first removes the commas with gsub(",", "", SalesPotential) and passes the edited vector to str_extract. The regex then defines two patterns: one for values that contain no numbers other than 0, and another for values that start with non-0 digits and may have one or more 0s at the end.
If you want to have clean numbers, convert to numeric:
as.numeric(str_extract(gsub(",", "", SalesPotential), "(?=(0\\s){4})\\d+|[1-9]+(0{1,})?"))
[1] 0 4 480 200
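The pattern above would cut a value such as 105 down to 10, because it only allows zeros at the end of the number. A regex-free alternative that mirrors the split-and-take-the-maximum idea is sketched below in base R. It assumes each cell holds at most one non-zero number and that all values are non-negative; the name clean_sales and the two extra sample cells ("230" and "0, 105") are made up for illustration.

```r
SalesPotential <- c("0, 0, 0, 0, 0", "4, 0, 0, 0", "0, 0, 480", "0, 200, 0",
                    "230", "0, 105")  # last two cells added for illustration

# Split each cell on commas, convert the pieces to numbers, keep the largest.
clean_sales <- vapply(
  strsplit(SalesPotential, ","),
  function(parts) max(as.numeric(trimws(parts))),
  numeric(1)
)

print(clean_sales)
```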
You can achieve the desired result in Power Query itself, either with the M formula language or through the GUI.
Here is the simplest approach.
If I understand correctly, the column has some clean numbers and some comma-delimited numbers.
So what you do is:
Split the column by comma at each occurrence.
You will get n+1 columns if the maximum number of commas in any cell is n.
Now create a conditional column that checks these columns for a number greater than zero and outputs it.
By doing so, you will get the non-zero number in that conditional column for the dirty data and the same number for the clean data.
After that, you can delete all the comma-delimited columns and keep only the conditional column.
The formula should be as follows:
if delcol1 <> 0 then delcol1 else if delcol2 <> 0 then delcol2 else if .......
delcoln <> 0 then delcoln else 0
This is the easiest way out of the problem that I can think of.
However, there are other alternatives for getting the same answer.
I'm quite new to R, unfortunately I wasn't able to find help in other related questions so far.
I have this dataframe called selection, including column 'RUN' and column 'TRNO'.
It originally had 9 columns. I added the column 'RUN' which contains a count that increases by 1 whenever the value in the column 'DAP' is 0, using this code:
# Insert column RUN in "selection" dataframe
library(dplyr)
selection$RUN <- cumsum(selection$DAP == 0)
That worked perfectly. Now I would like to do a similar operation for the column 'TRNO'. It also needs to contain a count that this time only increases when the column 'RUN' arrives at multiples of 80 (i.e. from RUN == 1-80 --> count =1; RUN == 81-160 --> count =2,...)
I tried several codes, amongst others this one:
# Insert column TRNO in "selection" dataframe
i = 0
repeat{
i = i+80
selection$TRNO <- cumsum(selection$RUN == i)
break
}
Instead of increasing the count at every multiple of 80, it returns "0" when RUN values are between 1-80, increases to 92 when RUN values are at 80, and then stagnates at 92 for all the higher values in RUN.
Try this:
selection$TRNO <- ceiling(selection$RUN/80)
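A quick check on made-up values shows how both counters behave. The vectors DAP, RUN_demo and TRNO_demo below are invented for illustration and stand in for the columns of selection.

```r
# RUN: the count goes up by one each time DAP is 0.
DAP <- c(0, 1, 2, 0, 1, 0, 1, 2, 3)
RUN <- cumsum(DAP == 0)

# TRNO: RUN values 1-80 map to 1, 81-160 map to 2, and so on.
RUN_demo  <- c(1, 80, 81, 160, 161)
TRNO_demo <- ceiling(RUN_demo / 80)

print(RUN)
print(TRNO_demo)
```

Note that a RUN value of 0 (rows before the first DAP == 0) would give a TRNO of 0 with this formula.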