How many 1's are there in selected column/row in Informatica? - count

I am very new to Informatica. I would like to count how many 1's there are in a selected row in Informatica, which is easy to do in Excel with the COUNTIF formula.
For example, column1 - a,b,c,d ; column2 - 1,2,1,4 ; column3 - 1,1,3,5
How do I write an expression in Informatica to calculate how many 1's there are in each row (a, b, c, d)?

The number of occurrences of any character in a string can be calculated like this:
length(column1) - length(replacechr(0, column1, '1', ''))
Take the total length of the string and subtract the length of the string with every '1' removed; the difference is the number of 1's.
Use an Expression transformation to calculate how many 1's exist in column1, column2, column3...
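To make the arithmetic concrete, here is a small Python sketch (not Informatica code) of the length-difference trick, plus a COUNTIF-style per-row count over the question's sample data. The names count_char and count_ones_per_row are hypothetical, and the sketch assumes column1 is the row key while column2 and column3 hold the values.

```python
def count_char(value: str, ch: str = "1") -> int:
    """Occurrences of ch: total length minus the length with every ch removed."""
    return len(value) - len(value.replace(ch, ""))


def count_ones_per_row(rows: dict[str, list[int]]) -> dict[str, int]:
    """COUNTIF-style count of cells that are exactly 1 in each row."""
    return {key: sum(1 for cell in cells if cell == 1) for key, cells in rows.items()}


# Sample data from the question: key -> [column2, column3]
data = {"a": [1, 1], "b": [2, 1], "c": [1, 3], "d": [4, 5]}
print(count_ones_per_row(data))  # {'a': 2, 'b': 1, 'c': 1, 'd': 0}
print(count_char("1011"))  # 3
```

Note the design difference: the length trick counts characters, so a value such as 11 would contribute two 1's, whereas comparing each cell to 1 counts whole values, which is closer to what COUNTIF does.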

Related

R programming- adding column in dataset error

cv.uk.df$new.d[2:nrow(cv.uk.df)] <- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1) # this line of code works
I want to know why we use -1 in tail() and -1 in head() to create this new column.
I tried removing the -1 to understand it, and R (I'm running the code in RStudio) throws an error.
Could anyone shed some light on this? I can't explain how much I would appreciate it.
Look at what is being done. On the left-hand side of the assignment operator, we have:
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
Let's pick this apart.
cv.uk.df # This is the data.frame
$new.d # a new column to assign or a column to reassign
[2:nrow(cv.uk.df)] # the rows which we are going to assign
Specifically, this line of code will assign a new value to all rows of this column except the first. Why would we want to do that? We don't have your data, but from your example, it looks like you want to calculate the change from one row to the next. That calculation is invalid for the first row (there is no previous row).
Now let's look at the right-hand side.
<- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)
The cv.uk.df$deaths column has the same number of rows as the data.frame. R gets grouchy when the numbers of elements don't follow some rules. For data.frames, the right-hand side needs to have the same number of elements, or a number that can be recycled a whole number of times. For example, if you have 10 rows, you need a replacement of 10 values, or you can supply 5 values that R will recycle.
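The length rule described above can be modeled in a few lines. This is a simplified Python sketch, not R's implementation: the hypothetical recycle() treats any length that does not divide the target a whole number of times as an error, which is stricter than R in some contexts (plain vector subassignment only warns).

```python
def recycle(values: list, n: int) -> list:
    """Fill n slots from values, repeating them a whole number of times."""
    if not values or n % len(values) != 0:
        raise ValueError(f"cannot recycle {len(values)} values into {n} slots")
    return values * (n // len(values))


print(recycle([1, 2, 3, 4, 5], 10))  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```

Feeding 100 values into 99 slots fails under this rule, which is exactly the mismatch that the -1 in tail() and head() avoids.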
If your data.frame has 100 rows, only 99 are being replaced in this operation. You cannot feed 100 values into an operation that expects 99, so we need to trim the data. Let's look at what is happening. The tail() function has the usage tail(x, n), where it returns the last n values of x. If n is a negative integer, tail() returns all values except the first abs(n). The head() function works similarly: with a negative n, it returns all values except the last abs(n).
tail(cv.uk.df$deaths, -1) # This returns all values but the first
head(cv.uk.df$deaths, -1) # This returns all values but the last
This makes sense for your calculation. You cannot subtract the number of deaths in the row before the first row from the number in the first row, nor can you subtract the number of deaths in the last row from the number in the row after the last row. There are more intuitive ways to do this thing using functions from other packages, but this gets the job done.
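The tail/head pairing can be mirrored with Python slices. This sketch uses hypothetical helper names and made-up cumulative counts; None stands in for the NA that R leaves in the first row, since only rows 2 onward are assigned.

```python
def tail_drop(x: list, k: int = 1) -> list:
    """Like R's tail(x, -k): all values except the first k."""
    return x[k:]


def head_drop(x: list, k: int = 1) -> list:
    """Like R's head(x, -k): all values except the last k."""
    return x[:len(x) - k]


def daily_change(deaths: list[int]) -> list:
    """Change from the previous row; the first row has no predecessor."""
    diffs = [cur - prev for cur, prev in zip(tail_drop(deaths), head_drop(deaths))]
    return [None] + diffs  # None plays the role of NA in row 1


deaths = [0, 2, 5, 5, 9]  # hypothetical cumulative counts
print(daily_change(deaths))  # [None, 2, 3, 0, 4]
```

Both trimmed vectors have n - 1 elements, so they line up pairwise and fill exactly rows 2 to n. In base R, c(NA, diff(x)) computes the same lagged difference.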

R: how to conditionally replace rows in data frame with randomly sampled rows from another data frame?

I need to conditionally replace rows in a data frame (x) with rows selected at random from another data frame (y). Some of the rows in the two data frames are the same, so data frame x will contain rows with repeated information. What sort of base R code would I need to achieve this?
I am writing an agent-based model in R where rows can be thought of as vectors of attributes pertaining to an agent and columns are attribute types. For agents to transmit their attributes, they need to send rows from one data frame (population) to another, but according to conditional learning rules. The rule is: replace the values in row n of data frame x if the attribute in column 10 for that row is 1 or more and if probability s is greater than a randomly drawn number between 0 and 1. Probability s is itself an adjustable parameter that can take any value from 0 to 1.
I have tried an if statement in the code below, but I am new to R and have made a mistake somewhere, as I get this warning:
"missing value where TRUE/FALSE needed"
I reckon that I have not specified what should happen to a row if the conditions are not satisfied.
I cannot think of an alternative method of achieving my aim.
Note: agent.dat is data frame x and top_ten_percent is data frame y.
s = 0.7
N = nrow(agent.dat)
copy <- runif(N) #to generate a random probability for each row in agent.dat
for (i in 1:nrow(agent.dat)){
if(agent.dat[,10] >= 1 & copy < s){
agent.dat <- top_ten_percent[sample(nrow(top_ten_percent), 1), ]
}
}
The agent.dat data frame should have rows that are replaced with values from rows in the top_ten_percent data frame if the randomly selected value of copy between 0 and 1 for that row is less than the value of parameter s and if the value for that row in column 10 is 1 or more. For each row I need to replace the first 10 columns of agent.dat with the first 10 columns of top_ten_percent (excluding column 11 i.e. copy value).
Assistance with this problem is greatly appreciated.
So you just need to change a few things.
You need to get a particular value for copy for each iteration of the for loop (use: copy[i]).
You also need to change the & in the if statement to && (&& and || are R's scalar Boolean operators).
Then you need to replace a particular row (columns 1 through 10) in agent.dat instead of the whole thing (agent.dat[i,1:10]).
So, the final code should look like:
copy <- runif(N)
for (i in 1:nrow(agent.dat)){
if(agent.dat[i,10] >= 1 && copy[i] < s){ # test row i only; && needs single values
agent.dat[i,1:10] <- top_ten_percent[sample(nrow(top_ten_percent), 1), 1:10] # copy the first 10 columns of one random row
}
}
This should fix your errors, assuming your data structure fits your code:
copy <- runif(nrow(agent.dat))
s <- 0.7
for (i in 1:nrow(agent.dat)){
if(agent.dat[i,10] >= 1 & copy[i] < s){
agent.dat[i,] <- top_ten_percent[sample(1:nrow(top_ten_percent), 1), ]
}
}
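The per-row logic both answers converge on (one uniform draw per row, a test on that row's column 10, then copying the first 10 columns of one random donor row) can be sketched in Python. The function name, the list-of-lists layout, and the rng parameter are assumptions for illustration; flag_col=9 is the 0-based equivalent of R's column 10.

```python
import random


def copy_from_top(agents, top, s, rng, n_cols=10, flag_col=9):
    """Overwrite the first n_cols values of agent row i with a random row of
    `top` when agents[i][flag_col] >= 1 and that row's uniform draw is < s."""
    draws = [rng.random() for _ in agents]  # one draw per row, like runif(N)
    for i, row in enumerate(agents):
        if row[flag_col] >= 1 and draws[i] < s:  # scalar test for row i only
            donor = top[rng.randrange(len(top))]  # like sample(nrow(top), 1)
            row[:n_cols] = donor[:n_cols]  # replace only the first n_cols
    return agents


# With s = 1.0 every eligible row is replaced, so the result is deterministic.
print(copy_from_top([[0, 0, 1], [0, 0, 0]], [[7, 8, 9]], 1.0,
                    random.Random(0), n_cols=3, flag_col=2))
# [[7, 8, 9], [0, 0, 0]]
```

Drawing all the random numbers up front mirrors copy <- runif(N); indexing draws[i] is the Python counterpart of copy[i].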

In R, how can you check if a record in one column matches a value, and if so, alter a record in another column?

I have the following data in R:
name coltype x y
ADDL_AUTH_AMT DECIMAL 11 2
BILL_NAME CHAR 30 0
BIRTH_DATE DATE 4 0
What I want to do is check if the second column has "DECIMAL", and if so, change the value for x to x+1. Here's what I have tried:
db2$coltype2 <- ifelse(db2$COLTYPE %in% c('DECIMAL'), db2$LENGTH+1, db2$LENGTH)
Basically, if COLTYPE is DECIMAL, take length and add 1 to it. If not, just use the value of length. It created a new column, but with the exact same values and nothing changed at all.
How can I check if a row in a column is equal to a value/string, and then alter a row in another column?
We need to change the column names to the ones in the data, e.g. use x instead of LENGTH:
ifelse(db2$COLTYPE %in% c('DECIMAL'), db2$x+1, db2$x)
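The same "if this column matches, change that column" update, sketched in Python with the question's three rows stored as dicts. The function name is hypothetical, and this is an analogy to ifelse(), not R code.

```python
rows = [
    {"name": "ADDL_AUTH_AMT", "coltype": "DECIMAL", "x": 11, "y": 2},
    {"name": "BILL_NAME", "coltype": "CHAR", "x": 30, "y": 0},
    {"name": "BIRTH_DATE", "coltype": "DATE", "x": 4, "y": 0},
]


def bump_decimal_lengths(rows):
    """Element-wise like ifelse(): x + 1 where coltype is DECIMAL, else x."""
    return [r["x"] + 1 if r["coltype"] == "DECIMAL" else r["x"] for r in rows]


for row, new_x in zip(rows, bump_decimal_lengths(rows)):
    row["x"] = new_x  # write the result back into the x column
print([r["x"] for r in rows])  # [12, 30, 4]
```

One difference is worth noting: a misspelled key raises KeyError in Python, while `$` on a data.frame column that does not exist returns NULL in R, so a wrong column name can fail quietly there.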

PHPExcel: Setting column width based on column number

I am using PHPExcel and have searched a lot for how to set the column width based on the column number. I found results based on column IDs, but couldn't find any for setting the width based on the column number. What I tried before is:
$length = strlen($tempval);
$objPHPExcel->getActiveSheet()->getColumnDimensionByColumn($dataColumn)->setWidth($length+10);
But it is showing me a fatal error. What is the right way to do this?
You can get the column ID from the column number using the static method PHPExcel_Cell::stringFromColumnIndex(): pass the column index (e.g. 32 or 7) and it will return the column ID (like AG or H).
There is also a corresponding static method, PHPExcel_Cell::columnIndexFromString(): pass the column ID (like "AB") as an argument, and it will return the column number (e.g. 28).
Note that (for historic reasons) PHPExcel_Cell::stringFromColumnIndex() is 0-based (0 will return A, 1 will return B, etc); whereas PHPExcel_Cell::columnIndexFromString() is 1-based (A will return 1, B will return 2, etc).
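To illustrate the two numbering conventions, here is a Python sketch of the conversions. It reproduces the documented behavior (0-based in, letters out; letters in, 1-based out) and is not a port of PHPExcel's source; the function names simply echo the PHP methods.

```python
def string_from_column_index(index: int) -> str:
    """0-based index -> column letters (0 -> 'A', 25 -> 'Z', 26 -> 'AA')."""
    letters = ""
    n = index + 1  # shift to 1-based for bijective base-26
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def column_index_from_string(letters: str) -> int:
    """Column letters -> 1-based number ('A' -> 1, 'AB' -> 28)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


print(string_from_column_index(32), string_from_column_index(7))  # AG H
print(column_index_from_string("AB"))  # 28
```

Because of the off-by-one between the two, a round trip gives index + 1, which is an easy source of "one column too far" bugs. In PHPExcel, the letters returned by stringFromColumnIndex() are what a column-ID-based call such as getColumnDimension() expects.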

Grouping price ranges

I am trying to group some price ranges from an .ods file, but have no idea how to do that.
e.g. I have a column with different prices like this:
11,61
6,15
13,68
7,69
6,00
What I want is to tell Calc to group everything from 0,00~10,99 and output text 0-10 and everything from 11,00~20,00 and output text 11-20, so the final output would be:
col1 col2
11,61 11-20
6,15 0-10
13,68 11-20
7,69 0-10
6,00 0-10
You can use the functions ROUNDDOWN() and ROUNDUP() with a negative count to round to a multiple of 10 (-1), 100 (-2) or 1000 (-3); a negative count reduces the precision of the value by powers of 10. So, rounding to the previous or next multiple of 10 is done using:
=ROUNDDOWN(<yourvalue>; -1)
and
=ROUNDUP(<yourvalue>; -1)
respectively (take care to change the formula argument separators to commas (,) if the locale you're using requires it).
So, =ROUNDDOWN(11,61; -1) will result in 10, and =ROUNDUP(11,61; -1) will give you 20. This way, you can "calculate" the appropriate group for each value (example for value in A1):
=CONCATENATE(ROUNDDOWN($A1; -1)+1;"-";ROUNDUP($A1;-1))
To split it up on multiple lines:
=CONCATENATE( # Result will be a concatenated string
ROUNDDOWN($A1;-1)+1; # first value: previous multiple of 10, +1;
"-"; # second value: literal "-"
ROUNDUP($A1;-1) # third value: next multiple of 10
)
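The formula's arithmetic can be checked outside Calc with a Python sketch that mimics ROUNDDOWN (toward zero) and ROUNDUP (away from zero) for non-positive counts only. The helper names are hypothetical; this mirrors the formula and is not Calc's implementation.

```python
import math


def round_down(value: float, digits: int) -> int:
    """Like ROUNDDOWN for digits <= 0: drop precision toward zero."""
    factor = 10 ** -digits  # digits = -1 -> 10, -2 -> 100
    return int(math.trunc(value / factor)) * factor


def round_up(value: float, digits: int) -> int:
    """Like ROUNDUP for digits <= 0: round away from zero."""
    factor = 10 ** -digits
    magnitude = math.ceil(abs(value) / factor) * factor
    return magnitude if value >= 0 else -magnitude


def price_group(value: float) -> str:
    """Same arithmetic as =CONCATENATE(ROUNDDOWN(A1;-1)+1;"-";ROUNDUP(A1;-1))."""
    return f"{round_down(value, -1) + 1}-{round_up(value, -1)}"


for price in [11.61, 6.15, 13.68, 7.69, 6.00]:
    print(price, price_group(price))
```

Running it on the sample prices shows two properties of this formula: values below 10 are labelled "1-10" rather than "0-10", and an exact multiple of 10 (for example 10) comes out as "11-10", because both roundings return the value itself.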
With your example data, this results in:
EDIT:
For a grouping 0-9, 9-19 and so on, the following formula should work:
=CONCATENATE(ABS(ROUNDDOWN($A2+1; -1)-1);"-";ROUNDUP($A2+1,01;-1)-1)
EDIT2:
For a solution using the IF() function, you could use:
=IF(A2 < 9;"0-9";IF(A2 < 19; "9-19";IF(A2 < 29; "19-29";"more than 29")))
For values greater than 29, you will have to add corresponding IF clauses, replacing the string "more than 29" with additional checks. Every grouping range requires its own IF clause.
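As a swapped-in technique (not a Calc feature), the nested IF() chain can be expressed as a sorted list of upper bounds searched with Python's bisect module. The sketch reproduces the strict "<" comparisons of the formula above, so adding a range means appending one bound and one label rather than nesting another IF.

```python
from bisect import bisect_right

# Exclusive upper bounds and labels matching the nested IF() formula.
BOUNDS = [9, 19, 29]
LABELS = ["0-9", "9-19", "19-29", "more than 29"]


def group_label(value: float) -> str:
    """Label of the first range whose bound is strictly greater than value."""
    return LABELS[bisect_right(BOUNDS, value)]


for price in [11.61, 6.15, 13.68, 7.69, 6.00]:
    print(price, group_label(price))
```

bisect_right returns the number of bounds that are less than or equal to the value, which is the same as counting how many of the IF tests (A2 < bound) fail before one succeeds.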
