How many 1's in last 4 days? - count

I would like to count or sum how many 1's in last four days (1/1 to 1/4) for each name (A, B, C, D) in Informatica Developer. Advise please!!

Simply create an Expression transformation and add a new output port. The formula should look like this:
o_how_many_1s = day1 + day2 + day3 + day4
Since, as you said, every day* value is either 0 or 1, adding them gives you the count of 1s.
If any of these fields can be null, wrap each one as IIF(ISNULL(day1), 0, day1) before adding.
If the fields can hold numbers other than 0 and 1, count the 1s explicitly instead:
o_how_many_1s = IIF(day1 = 1, 1, 0) + IIF(day2 = 1, 1, 0) + IIF(day3 = 1, 1, 0) + IIF(day4 = 1, 1, 0)
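The three variants can also be checked outside Informatica. Here is a minimal sketch in R (used only for illustration, since it is the language of the related questions on this page). The data frame `days` and all of its values are made up; only the column names day1 to day4 come from the answer.

```r
# Hypothetical sample: one row per name, one flag per day
days <- data.frame(
  name = c("A", "B", "C", "D"),
  day1 = c(1, 0, 1, NA),
  day2 = c(1, 0, 0, 1),
  day3 = c(0, 0, 1, 1),
  day4 = c(1, 0, 2, 1)   # the 2 stands in for an unexpected value
)
flags <- days[, c("day1", "day2", "day3", "day4")]

# Plain sum: only correct when every value is 0 or 1 and nothing is missing
plain_sum <- rowSums(flags)

# Null-safe sum: treat missing values as 0, like IIF(ISNULL(day1), 0, day1)
null_safe_sum <- rowSums(flags, na.rm = TRUE)

# Strict count: count only cells equal to 1, like IIF(day1 = 1, 1, 0)
count_of_ones <- rowSums(flags == 1, na.rm = TRUE)
```

On this sample only `count_of_ones` is right for every row: the plain sum is NA for D, and both sums overcount C because of the 2.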

Finding the percentage of a specific value in the column of a data set

I have a dataset called college, and one of the columns is 'accepted'. There are two values for this column: 1 (which means the student was accepted) and 0 (which means the student was not accepted). I want to find the percentage of accepted students.
I did this...
table(college$accepted)
which gave me the frequency of 1 and 0 (1 = 44,224 and 0 = 75,166). I then manually added those two values together (119,390) and divided 44,224 by 119,390. This is fine and gets me the value I was looking for. But I would really like to know how I could do this with R code, since I'm sure there is a way to do it that I just haven't thought of.
Thanks!
Perhaps you can use prop.table like below
prop.table(table(college$accepted))["1"]
If it's a simple 0/1 column, you only need to take the column mean.
mean_accepted <- mean(college$accepted)
You could also sum the column and then divide by the number of values in it:
sum(college$accepted)/length(college$accepted)
To make the code more explicit and describe your intent better, I suggest using a condition to identify the cases that meet your criteria for inclusion. For example:
college$accepted == 1
Then take the average of the logical vector to compute the proportion (between 0 and 1), and multiply by 100 to make it a percentage.
100 * mean(college$accepted == 1, na.rm = TRUE)
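To see how these answers relate, here is a small sketch on a made-up vector. The values are invented for illustration (the real college data is not shown in the question), and one missing value is included on purpose.

```r
# Hypothetical stand-in for college$accepted: 3 accepted, 5 rejected, 1 missing
accepted <- c(1, 0, 0, 1, 0, NA, 0, 1, 0)

# Share of 1s via a frequency table (table() drops NA by default)
pct_table <- 100 * as.numeric(prop.table(table(accepted))["1"])

# The mean of a 0/1 column is the proportion of 1s; na.rm drops the missing row
pct_mean <- 100 * mean(accepted, na.rm = TRUE)

# Explicit condition, as in the last answer
pct_cond <- 100 * mean(accepted == 1, na.rm = TRUE)

print(c(pct_table, pct_mean, pct_cond))  # all three are 37.5
```

Note that sum(accepted) / length(accepted) would return NA on this vector, because sum() propagates the missing value unless na.rm = TRUE is set.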

How to create a new column with repeated values based on another column?

Here is what I currently have. I have a column named "test1M", which has values of either 0 or 1. If the value is 1, I want to set the next 20 values in column "test1Mxx" to 1.
If I run this code, I get an error (Error in if (data$test1M[x] == 1) { : argument is of length zero).
What's a better way for me to do this? The code is pretty repetitive, so I would like to minimize that if possible. A function would be preferable, so that I could change the number of values (for instance, the following 25 values, or 40 values, etc.).
for(x in data$test1){
if(data$test1[x]==1){
data$test2[x+1]=1
data$test2[x+2]=1
data$test2[x+3]=1
data$test2[x+4]=1
data$test2[x+5]=1
data$test2[x+6]=1
data$test2[x+7]=1
data$test2[x+8]=1
data$test2[x+9]=1
data$test2[x+10]=1
data$test2[x+11]=1
data$test2[x+12]=1
data$test2[x+13]=1
data$test2[x+14]=1
data$test2[x+15]=1
data$test2[x+16]=1
data$test2[x+17]=1
data$test2[x+18]=1
data$test2[x+19]=1
data$test2[x+20]=1}
}
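Your loop doesn't work because x is a value of data$test1, not an index into it: each x is 0 or 1, and data$test1[0] has length zero, hence the error. Loop over the positions of the 1s instead, and drop any index past the last row so the column is not lengthened:
data$test2 <- data$test1
for (x in which(data$test1 == 1)) {
  idx <- x + 1:20
  data$test2[idx[idx <= nrow(data)]] <- 1  # stay within the data frame
}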
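Since the question asks for a function with an adjustable window, here is one possible sketch. The name `flag_following` and its arguments are hypothetical, and it assumes that only the rows after each 1 should be flagged (the trigger row itself stays 0 unless it falls inside an earlier window).

```r
# Flag the n rows that follow every 1 in `trigger`.
# `flag_following` is a made-up helper name for this sketch.
flag_following <- function(trigger, n = 20) {
  out <- integer(length(trigger))            # start with all zeros
  starts <- which(trigger == 1)              # positions of the 1s (NAs are skipped)
  for (s in starts) {
    idx <- s + seq_len(n)                    # the n positions after the trigger
    out[idx[idx <= length(trigger)]] <- 1L   # drop positions past the end
  }
  out
}

test1 <- c(0, 1, 0, 0, 0, 0, 1, 0)
flag_following(test1, n = 3)
# [1] 0 0 1 1 1 0 0 1
```

With a data frame this would be used as data$test2 <- flag_following(data$test1, n = 25).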

Is it possible to create a countif like function in R using ranges?

I've already read this question with an approach to counting entries in R:
how to realize countifs function (excel) in R
I'm looking for a similar approach, except that I want to count data that is within a given range.
For example, let's say I have this dataset:
data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))
Following the approach on the linked question, we would develop something like this:
count <- data$values == 1.5
sum(count)
Problem is, I want to be able to include in the count anything that varies 0.2 from 1.5 - that is, all possible number from 1.3 to 1.7.
Is there a way to do so?
sum(data$values>=1.3 & data$values<=1.7)
As the explanation in the question you linked to points out, when you write out a boolean condition, it generates a vector of TRUEs and FALSEs the same length as your original data. TRUE counts as 1 and FALSE as 0, so summing the vector gives you a count. It then becomes a matter of expressing your condition as a boolean expression. With more than one condition, you connect them with & (and) or | (or), much as you would in Excel (only in Excel you have to use AND() or OR()).
(For a more general solution, you can use dplyr::between, which is also supposed to be faster since it's implemented in C++. In this case, it would be sum(between(data$values, 1.3, 1.7)).)
As @doviod writes, you can use a compound logical condition.
My approach is different: I wrote a function that takes the vector and describes the range by its center value and a distance delta.
Following a suggestion by @doviod in the comments, I set a default of delta = 0, so that if only value is passed, the function returns
a count of the cases equal to that value.
countif <- function(x, value, delta = 0)
sum(value - delta <= x & x <= value + delta)
data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))
countif(data$values, 1.5, 0.2)
#[1] 3
Alternatively, which() identifies the positions of all values in your vector that satisfy your criterion, and length() then counts the hits.
length( which(data$values>=1.3 & data$values<=1.7) )
[1] 3
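One caveat with the sum()-based answers: if the vector contains missing values, the sum is NA. A minimal sketch of a range count that handles this is shown here; `count_in_range` is a hypothetical helper name, not an existing R function, and the NA in the sample vector is added for illustration.

```r
# Count values in the closed interval [lower, upper].
# `count_in_range` is a made-up helper name for this sketch.
count_in_range <- function(x, lower, upper, na.rm = TRUE) {
  sum(x >= lower & x <= upper, na.rm = na.rm)
}

values <- c(1, 1.2, 1.5, 1.7, 1.7, 2, NA)
count_in_range(values, 1.3, 1.7)                 # 3
count_in_range(values, 1.3, 1.7, na.rm = FALSE)  # NA
```

The which()-based version does not need this, because which() ignores NA positions.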

Confusing LOD expressions in Tableau

I have the following data structure:
Scope,Metric ID,Item ID,System,Color
TRUE,A1,123,A,Red
FALSE,A1,123,B,Red
FALSE,B1,234,C,Red
TRUE,B1,234,A,Red
FALSE,B1,415,A,Red
I'd like to group by Scope, filter on TRUE and get the unique list of Items, then count these Items and subtract from a total unique count for the Color = Red.
So, in the example above, I have 3 unique items for Color = Red and I have 2 unique items with Scope = TRUE, so the result should say 3 - 2 = 1.
Because of the data structure, simple filtering won't help. I realize I need to use a complex LOD syntax, but after having tried them for a few hours, I find them rather confusing.
Does anyone have an idea how to write an LOD expression to give me the desired count? Thanks!
Did you try using three calculated fields like this, doing a count distinct on the first two?
1:
if [Color]='Red' then [Item ID] end
2:
if [Scope]='TRUE' then [Item ID] end
3:
Subtract the distinct count of field 2 from the distinct count of field 1, e.g. COUNTD([1]) - COUNTD([2]).
For the sample data this gives 3 - 2 = 1.
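To check the expected arithmetic outside Tableau, the sample rows can be reproduced in a few lines of R (used here only for illustration; the column names are lower-cased versions of those in the question).

```r
# The sample rows from the question (System and Metric ID are not needed here)
items <- data.frame(
  scope   = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  item_id = c(123, 123, 234, 234, 415),
  color   = c("Red", "Red", "Red", "Red", "Red")
)

red_items      <- unique(items$item_id[items$color == "Red"])  # 123, 234, 415
in_scope_items <- unique(items$item_id[items$scope])           # 123, 234

length(red_items) - length(in_scope_items)   # 1
setdiff(red_items, in_scope_items)           # 415, the item never in scope
```

As in calculated field 2 above, the in-scope set is not restricted to Red items; that makes no difference here because every sample row is Red.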

Calculated measure that combines two values to form a fractional value [Analysis Services]

In my cube I have two measure: A that stores values of the form aaaaa and B that stores values like bbbbb. I want to define measure C that will give the values aaaaa.bbbbb. How can I achieve this?
If I understand your question correctly, you could do something like this:
with member [Measures].[A] as 123
member [Measures].[B] as 456
member [Measures].[C] as val(cstr([Measures].[A]) + "." + cstr([Measures].[B]))
select {[Measures].[A], [Measures].[B], [Measures].[C]} on 0
from [YourCube]
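The string concatenation can be sanity-checked outside the cube. The sketch here uses R for illustration and assumes that A and B are small non-negative integers, which the question does not confirm; the variable names are made up.

```r
a <- 123
b <- 456

# String route, mirroring val(cstr(A) + "." + cstr(B))
c_string <- as.numeric(paste0(a, ".", b))

# Arithmetic route: shift B to the right by its number of digits
c_arith <- a + b / 10^nchar(as.character(b))

print(c(c_string, c_arith))
```

Both routes place B directly after the decimal point, so leading zeros in the fractional part cannot be expressed: B = 45 yields 123.45, never 123.045.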
