How can I solve this error when using case_when? - r

I'm using this code:
ovabonnement <- ovabonnement %>%
mutate(c12_ovabonnement_type_con_voor = case_when(s2_ovabonnement_type_voor_anders == 1 ~ NA,
s2_ovabonnement_type_voor_1 == 1 |
s2_ovabonnement_type_voor_13 == 1 ~ "Basis",
s2_ovabonnement_type_voor_2 == 1 |
s2_ovabonnement_type_voor_3 == 1 |
s2_ovabonnement_type_voor_4 == 1 |
s2_ovabonnement_type_voor_9 == 1 |
s2_ovabonnement_type_voor_11 == 1 ~ "Voordeel",
s2_ovabonnement_type_voor_5 == 1 |
s2_ovabonnement_type_voor_6 == 1 |
s2_ovabonnement_type_voor_7 == 1 |
s2_ovabonnement_type_voor_8 == 1 |
s2_ovabonnement_type_voor_10 == 1 |
s2_ovabonnement_type_voor_12 == 1 |
s2_ovabonnement_type_voor_14 == 1 ~ "Vrij"))
So I have these 15 variables that represent whether a person has that subscription added onto their public transport membership. Because it was a multiple choice questionnaire people could select multiple choices, which is why they are different variables.
I want to make these into one variable that takes NA if people answered "other", "Basis" if people answered 1 or 13, "Voordeel" if people answered 2,3,4,9 or 11 and "Vrij" if people answered 5,6,7,8,10,12 or 14.
If people answered 2, there will be a 1 in s2_ovabonnement_type_voor_2. People can have answered multiple of these, which makes it a bit tricky. However, I want it to go through these chronologically. For example, if a person answered 2 AND 10, it should choose the 10, because the code is later, but I'm not sure if that is how case_when works.
I get this error:
Error in `mutate()`:
! Problem while computing `c12_ovabonnement_type_con_voor = case_when(...)`.
Caused by error in `names(message) <- `*vtmp*``:
! 'names' attribute [1] must be the same length as the vector [0]
Run `rlang::last_error()` to see where the error occurred.

case_when and if_else are type-sensitive: every right-hand-side expression must return the same type. In the OP's code, the first expression returns NA, which is logical by default, while all the others return character. Use NA_character_ so the types match:
ovabonnement <- ovabonnement %>%
mutate(c12_ovabonnement_type_con_voor = case_when(s2_ovabonnement_type_voor_anders == 1 ~ NA_character_,
s2_ovabonnement_type_voor_1 == 1 |
s2_ovabonnement_type_voor_13 == 1 ~ "Basis",
s2_ovabonnement_type_voor_2 == 1 |
s2_ovabonnement_type_voor_3 == 1 |
s2_ovabonnement_type_voor_4 == 1 |
s2_ovabonnement_type_voor_9 == 1 |
s2_ovabonnement_type_voor_11 == 1 ~ "Voordeel",
s2_ovabonnement_type_voor_5 == 1 |
s2_ovabonnement_type_voor_6 == 1 |
s2_ovabonnement_type_voor_7 == 1 |
s2_ovabonnement_type_voor_8 == 1 |
s2_ovabonnement_type_voor_10 == 1 |
s2_ovabonnement_type_voor_12 == 1 |
s2_ovabonnement_type_voor_14 == 1 ~ "Vrij"))
Also note that case_when evaluates its conditions in order and, for each row, returns the value of the first one that is TRUE. With the ordering above, someone who answered both 2 and 10 gets "Voordeel", not "Vrij"; to let the later codes win, list the "Vrij" conditions before the "Voordeel" and "Basis" ones.

Related

Filter for appearance of 2 values that must at least exist 1 times

Title may be bad, couldn't think of a better one.
My comment data, each comment is assigned to an account by usernameChannelId:
usernameChannelId | hasTopic | sentiment_sum | commentId
a | 1 | 4 | xyxe24
a | 0 | 2 | h5hssd
a | 1 | 3 | k785hg
a | 0 | 2 | j7kgbf
b | 1 | -2 | 76hjf2
c | 0 | -1 | 3gqash
c | 1 | 2 | ptkfja
c | 0 | -2 | gbe5gs
c | 1 | 1 | hghggd
My code:
SELECT u.usernameChannelId, avg(sentiment_sum) sentiment_sum, u.hasTopic
FROM total_comments u
WHERE u.hasTopic is True
GROUP BY u.usernameChannelId
HAVING count(u.usernameChannelId) > 0
UNION
SELECT u.usernameChannelId, avg(sentiment_sum) sentiment_sum, u.hasTopic
FROM total_comments u
WHERE u.hasTopic is False
GROUP BY u.usernameChannelId
I want to get all usernameChannelIds that have at least 1 comment with hasTopic == 0 and at least 1 comment with hasTopic == 1 (to compare both groups statistically and remove users that only commented on on-topic or only on off-topic videos).
How can I filter like that?
Here's a little trick that may help. First, you need to get familiar with the CASE expression; here's an excerpt from the docs.
The CASE expression
A CASE expression serves a role similar to IF-THEN-ELSE in other
programming languages.
The optional expression that occurs in between the CASE keyword and
the first WHEN keyword is called the "base" expression. There are two
basic forms of the CASE expression: those with a base expression and
those without.
An expression like CASE when hasTopic is False then 1 else 0 END will evaluate to 1 if hasTopic is 0. An expression for hasTopic is True would be similar.
Now, those CASEs can be summed, which will tell you if user has any rows with hasTopic True and hasTopic False.
Something like this in the HAVING clause might do the trick, with one condition for each value:
HAVING SUM(CASE WHEN hasTopic IS FALSE THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN hasTopic IS TRUE THEN 1 ELSE 0 END) > 0
(it would be necessary to remove the WHERE clause, and the UNION query would be unnecessary).
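The HAVING filter described above can be tried end to end. Here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The in-memory database, the variable names, and the use of hasTopic = 1 / hasTopic = 0 instead of IS TRUE / IS FALSE (so that it also runs on older SQLite builds) are assumptions of this example, not part of the original answer.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE total_comments ("
    "usernameChannelId TEXT, hasTopic INTEGER, "
    "sentiment_sum INTEGER, commentId TEXT)"
)
conn.executemany(
    "INSERT INTO total_comments VALUES (?, ?, ?, ?)",
    [
        ("a", 1, 4, "xyxe24"), ("a", 0, 2, "h5hssd"), ("a", 1, 3, "k785hg"),
        ("a", 0, 2, "j7kgbf"), ("b", 1, -2, "76hjf2"), ("c", 0, -1, "3gqash"),
        ("c", 1, 2, "ptkfja"), ("c", 0, -2, "gbe5gs"), ("c", 1, 1, "hghggd"),
    ],
)

# Keep only users with at least one on-topic AND one off-topic comment.
# No WHERE clause and no UNION: each SUM(CASE ...) counts one kind of row.
query = """
SELECT usernameChannelId
FROM total_comments
GROUP BY usernameChannelId
HAVING SUM(CASE WHEN hasTopic = 1 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN hasTopic = 0 THEN 1 ELSE 0 END) > 0
ORDER BY usernameChannelId
"""
users = [row[0] for row in conn.execute(query)]
print(users)
```

User b has only an on-topic comment, so it is filtered out; the resulting ids can then be used to restrict the per-group averages.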

if else multiple conditions comparing rows

I am struggling with this loop. I want to get "6" in the second row of column "newcolumn". I get the following error:
Error in if (mydata$type_name[i] == "a" && mydata$type_name[i - :
missing value where TRUE/FALSE needed.
The code that I created:
id type_name name score newcolumn
1 a Car 2 2
1 a van 2 6
1 b Car 2 2
1 b Car 2 2
mydata$newcolumn <-c(0)
for (i in 1:length(mydata$id)){
if ((mydata$type_name [i] == "a") && (mydata$type_name[i-1] == "a") && ((mydata$name[i]) != (mydata$name[i-1]))){
mydata$newcolumn[i]=mydata$score[i]*3 }
else {
mydata$newcolumn[i]=mydata$score[i]*1
}
}
Thank you very much in advance
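Indexing in R starts at 1, but your loop starts at i = 1 and uses i-1, so the first iteration reads index 0. mydata$type_name[0] is an empty vector, so the comparison produces no TRUE or FALSE value and if() fails. Start the loop at 2, or handle the first row separately.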

Why is SQLite conditional between generated variables not comparing correctly?

I'm a little confused about why I'm getting the results in the following example. Let's pretend that we want to rank how much someone likes animals. They like tigers of any color best, red animals second, and don't like any others.
Consider the following table:
----------------------
| Color | Animal |
----------------------
| Yellow | Butterfly |
| Red | Lion |
| Red | Tiger |
| Green | Lion |
| Green | Donkey |
| Yellow | Tiger |
----------------------
Now when I run this code:
CREATE VIEW animal_colors as select "farm"."color" as "color",
CASE WHEN "farm"."color" = 'Red' THEN 1 ELSE 0 END AS "color_red",
CASE WHEN "farm"."animal" = 'Tiger' THEN 1 ELSE 0 END AS "animal_tiger",
CASE
WHEN "animal_tiger" = 1 THEN 1
WHEN "color_red" = 1 THEN 0
ELSE -1
END
AS "rank"
FROM "farm"
I get the following output:
----------------------------------------------------------
| Color | Animal | color_red | animal_tiger | rank |
---------------------------------------------------------
| Yellow | Butterfly | 0 | 0 | -1 |
| Red | Lion | 1 | 0 | -1 |
| Red | Tiger | 1 | 1 | -1 |
| Green | Lion | 0 | 0 | -1 |
| Green | Donkey | 0 | 0 | -1 |
| Yellow | Tiger | 0 | 1 | -1 |
---------------------------------------------------------
I thought I was comparing the variables color_red and animal_tiger to see if they are 1, but it doesn't look like that is happening. Could anyone shed some light on why the expression is always evaluating as -1?
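Within the same SELECT list, the aliases animal_tiger and color_red are not visible as columns, so SQLite falls back to treating the double-quoted "animal_tiger" as a string literal. You are therefore comparing the string 'animal_tiger' to 1, which will never be true.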
To access the derived columns animal_tiger and color_red you need to make the select a subquery, or use the actual columns.
So you could use either (not using derived columns) :-
SELECT farm.color AS color,
CASE
WHEN farm.color = 'Red' THEN 1 ELSE 0 END AS color_red,
CASE
WHEN farm.animal = 'Tiger' THEN 1 ELSE 0 END AS animal_tiger,
CASE
WHEN farm.animal = 'Tiger' THEN 1
WHEN farm.color = 'Red' THEN 0
ELSE -1
END AS rank
FROM farm;
or (using derived columns via a subquery) :-
SELECT *,
CASE
WHEN animal_tiger = 1 THEN 1
WHEN color_red = 1 THEN 0
ELSE -1
END AS rank
FROM (
SELECT "farm"."color" as "color",
CASE WHEN "farm"."color" = 'Red' THEN 1 ELSE 0 END AS "color_red",
CASE WHEN "farm"."animal" = 'Tiger' THEN 1 ELSE 0 END AS "animal_tiger"
FROM "farm"
);
Note: these are written as SELECTs rather than as views, for my convenience.
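The subquery variant can be checked against the sample table. This is a minimal sketch using Python's built-in sqlite3 module; the in-memory database, the extra animal column in the output, and the ranks dictionary are assumptions added for illustration.

```python
import sqlite3

# In-memory copy of the farm table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE farm (color TEXT, animal TEXT)")
conn.executemany(
    "INSERT INTO farm VALUES (?, ?)",
    [
        ("Yellow", "Butterfly"), ("Red", "Lion"), ("Red", "Tiger"),
        ("Green", "Lion"), ("Green", "Donkey"), ("Yellow", "Tiger"),
    ],
)

# The inner SELECT defines color_red and animal_tiger; the outer SELECT
# can then reference them as real columns in its CASE expression.
query = """
SELECT color, animal,
       CASE
           WHEN animal_tiger = 1 THEN 1
           WHEN color_red = 1 THEN 0
           ELSE -1
       END AS rank
FROM (
    SELECT color, animal,
           CASE WHEN color = 'Red' THEN 1 ELSE 0 END AS color_red,
           CASE WHEN animal = 'Tiger' THEN 1 ELSE 0 END AS animal_tiger
    FROM farm
)
"""
ranks = {(color, animal): rank for color, animal, rank in conn.execute(query)}
for key, rank in ranks.items():
    print(key, rank)
```

Both tigers come out with rank 1, the red lion with 0, and every other row with -1, which is the ranking the question asked for.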
Additional
Perhaps you could simplify using the following as the basis :-
SELECT farm.color AS color,
-1 + (farm.color = 'Red') + (farm.animal = 'Tiger') AS rank
FROM farm
Or perhaps even
SELECT farm.color AS color,
0 + (farm.color = 'Red') + ((farm.animal = 'Tiger') * 2) AS rank
FROM farm
(This gives a ranking base of zero and gives tiger a higher rank than red, so you could easily discriminate, say, between a red and a yellow tiger, the former having the higher rank.)
Try this. I originally had the syntax incorrect for the multi-WHEN CASE expression.
CREATE VIEW "animal_colors" as select "farm"."color" as "color",
CASE WHEN "farm"."color" = 'Red' THEN 1 ELSE 0 END AS "color_red",
CASE WHEN "farm"."animal" = 'Tiger' THEN 1 ELSE 0 END AS "animal_tiger",
CASE WHEN "farm"."animal" = 'Tiger' THEN 1
WHEN "farm"."color" = 'Red' THEN 0
ELSE -1 END as "rank"
FROM "farm"
which yields
color color_red animal_tiger rank
"Yellow" "0" "0" "-1"
"Red" "1" "0" "0"
"Red" "1" "1" "1"
"Green" "0" "0" "-1"
"Green" "0" "0" "-1"
"Yellow" "0" "1" "1"

Making new variable through mutate

I want to make a new variable "churned" by taking into account five variables :
Include in churn
A-Churn
B-Churn
C-Churn
D-Churn
My condition is: if the variable "Include in churn" is 1 and at least one of the other four variables is 1, then my new variable "Churned" should be 1, else 0. I am new to the mutate function.
Please help me create this new variable with mutate.
If I understand your formulation logically, you want
mutate(data, Churned = Include.in.Churn == 1 & (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1))
This will make Churned a logical. If you really need an integer, as.integer will produce 1 for TRUE and 0 for FALSE.
If all mentioned Variables are either 1 or 0 you can also use the possibly faster
mutate(data, Churned = Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1)
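The claim that the two formulations agree for 0/1 inputs can be checked exhaustively. The following is a small language-neutral sketch in Python rather than R; the function names are hypothetical and simply mirror the two mutate expressions above.

```python
from itertools import product

def churned_logical(include, a, b, c, d):
    # Mirrors: Include.in.Churn == 1 & (A.Churn == 1 | ... | D.Churn == 1)
    return include == 1 and (a == 1 or b == 1 or c == 1 or d == 1)

def churned_arithmetic(include, a, b, c, d):
    # Mirrors: Include.in.Churn * (A.Churn + ... + D.Churn) >= 1
    return include * (a + b + c + d) >= 1

# Try all 32 combinations of 0/1 values and collect any disagreement.
mismatches = [
    flags
    for flags in product((0, 1), repeat=5)
    if churned_logical(*flags) != churned_arithmetic(*flags)
]
print(mismatches)
```

An empty list means the arithmetic shortcut is safe as long as every column really contains only 0 or 1; with other values (or missing data) the two versions can differ.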

Get the Max() value of calculated columns (totals) in a report

I am trying to get the value for the column Max, which is the max value of columns A, B, C. The rows T and G are Total and Grand total (because of row groups), I only need the max value for them :
-----------------------------
A B C | Max
-----------------------------
| 1 1 2 |
-----------------------------
| 2 1 3 |
------+---------------+------
T | 3 2 5 | 5
------+---------------+------
| 2 5 1 |
-----------------------------
| 1 2 1 |
------+---------------+------
T | 3 7 2 | 7
------+---------------+------
G | 6 9 7 | 9
-----------------------------
Whenever I try something with the Max() function, I get an error like The expression of [...] uses an aggregate function on a report item. Aggregate functions can be used only on report items contained in the headers and footers..
In MS Excel, I would simply do MAX(A1:C1) in column Max. Is there any solution to achieve this in rdlc ?
I have searched for the above error and found this answer, but the first option is not possible, and the second option I did not really understand; I do not think it is applicable to Max. If it is, could you explain where I should place the workaround?
I'm working with Visual Studio 2015 and Microsoft.ReportViewer.WebForms v10.0.0.0.
You need to put this code in the "Max" field of the "T" and "G" rows. It should work, though I haven't tried it ;)
If Sum(Fields!A.Value) >= Sum(Fields!B.Value) And Sum(Fields!A.Value) >= Sum(Fields!C.Value) Then
Sum(Fields!A.Value)
Else if Sum(Fields!B.Value) >= Sum(Fields!A.Value) And Sum(Fields!B.Value) >= Sum(Fields!C.Value) Then
Sum(Fields!B.Value)
Else
Sum(Fields!C.Value)
End If
Update after KevinM's comment, using nested IIf calls:
IIf (
Sum(Fields!A.Value) >= Sum(Fields!B.Value) And Sum(Fields!A.Value) >= Sum(Fields!C.Value)
, Sum(Fields!A.Value)
, (
IIf (Sum(Fields!B.Value) >= Sum(Fields!A.Value) And Sum(Fields!B.Value) >= Sum(Fields!C.Value)
, Sum(Fields!B.Value)
, Sum(Fields!C.Value)
)
)
)

Resources