How to calculate the count of the results of a formula - count

Good day; I am in need of help creating a formula that will only count the results of a formula that = "No". I have written a formula that returns either a Yes or No value, and now I want to count how many values = No. I hope that the wording of the question makes sense.

Related

3-way Contingency Table R: How to get marginal sum, percentages per group

I have been trying to create a contingency table in R with percentage distribution of education for husbands (6 category) and wives (6 category) BY marriage cohort (total 4 cohorts). My ideal is something like this: IdealTable.
However, what I have been able to get at max is: CurrentTable.
I am not able to figure out how to convert my row and column sums to percentages (similar to the ideal). The current code that I am using is:
three.table = addmargins(xtabs(~MarriageCohort + HerEdu + HisEdu, data = mydata))
ftable(three.table)
Is there a way I can turn the row and column sums into percentages for each marriage cohort?
How can I add labels to this and export the ftable?
I am relatively new to R and tried to find solutions to my questions above on Google, but haven't been successful. I am posting my query on this platform for the first time, and any help with this will be greatly appreciated! Thank you!
One approach would be to create separate xtab runs for each MarriageCohort:
# Split the data by cohort, then cross-tabulate within each piece
Cohorts <- lapply( split(mydata, mydata$MarriageCohort),
function(z) xtabs( ~HerEdu + HisEdu, data = z) )
Then get totals in each Cohorts item before dividing the cohort addmargins(.) result by those totals and multiplying by 100 to get percent values:
divCohorts <- lapply(Cohorts, function(tbl) 100*addmargins(tbl)/sum(tbl) )
Then you will need to clean those items up as you see fit. You have not included data, so the cleanup remains your responsibility. (I did not use sapply because that could give you a big matrix that might be difficult to manage, but you could try it in the second step and see whether you are satisfied with that approach.)
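A minimal sketch of the two steps on toy data, since the question includes none. The data values here are hypothetical; only the column names MarriageCohort, HerEdu and HisEdu come from the question, and the split() call is one way to get one data frame per cohort.

```r
# Hypothetical toy data; only the column names come from the question
mydata <- data.frame(
  MarriageCohort = rep(c("A", "B"), each = 6),
  HerEdu = rep(c("Low", "High", "High"), times = 4),
  HisEdu = rep(c("Low", "Low", "High", "High"), times = 3)
)

# Step 1: one HerEdu x HisEdu table per marriage cohort
Cohorts <- lapply(split(mydata, mydata$MarriageCohort),
                  function(z) xtabs(~ HerEdu + HisEdu, data = z))

# Step 2: add row/column sums, then express every cell as a percentage
# of that cohort's total (sum(tbl) is taken before the margins are added)
divCohorts <- lapply(Cohorts, function(tbl) 100 * addmargins(tbl) / sum(tbl))

# Print each cohort's percentage table, rounded for readability
lapply(divCohorts, round, digits = 1)
```

In each resulting table the bottom-right "Sum" cell is 100, and the "Sum" row and column hold the marginal percentages for that cohort.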

how can I use aggregate() in R to work out this mean?

Use the aggregate() command to calculate the proportion of trips that occurred on the weekend among subscribers vs. non-subscribers. Provide a clear interpretation of the numbers you see, and answer whether there appears to be a difference in bike usage on weekdays vs. weekends among subscribers vs. non-subscribers?
My code is like this:
aggregate(is_weekend ~ is_subscriber , data = citibike, FUN=mean)
It's unclear what your data looks like, but it seems you're missing your y (response) variable. As I read it, is_weekend and is_subscriber are your categorical variables indicating the group they are in (subscriber vs. non-subscriber, etc.). You need to include your response variable (bike usage). This is what you should do to answer your question:
means <- aggregate(citibike$Y_VAR , by=list(citibike$is_subscriber, citibike$is_weekend), mean)
means
Hope this helps.
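For comparison, here is a small sketch of the formula call from the question on toy data. It assumes is_weekend is stored as TRUE/FALSE (or 0/1), which the question does not confirm; under that assumption the mean within each group is simply the proportion of weekend trips. The data values are hypothetical.

```r
# Hypothetical toy data: one row per trip
citibike <- data.frame(
  is_subscriber = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  is_weekend    = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# The mean of a logical column is the share of TRUE values, so this
# gives the proportion of weekend trips for each subscriber group
prop_weekend <- aggregate(is_weekend ~ is_subscriber, data = citibike, FUN = mean)
prop_weekend
```

With these toy values, one of the four subscriber trips and both non-subscriber trips fall on a weekend.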

How to code in R: 'for any value of a, given a value of b from the sliderInput, take the values from the dataframe for those specific values'?

Hi, I'm currently coding an insurance premium tool, which needs to link a formula to a specific contract, i.e. pure endowment (survival benefit).
To do so, it needs a population vector based on the age input from the sliderInput, which I have already prepared in a dataframe.
But I'm not sure how to link the formula to R: how do I tell R that, for any age input and any benefit term, it should take the data from the dataframe?
A little background: kPx is the probability that someone aged x lives another k years (formula: population at x+k / population at x). I've got the population vector in a dataframe; I just need to tell R to look up the values for a given age and benefit term in that dataframe.
Hopefully the question makes sense.
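One possible sketch of the lookup, assuming the dataframe has one row per age. The names life_table, age, population and kpx are hypothetical, as are the numbers; only the formula (population at x+k divided by population at x) comes from the question.

```r
# Hypothetical life table: number of people still alive at each age
life_table <- data.frame(
  age        = 0:5,
  population = c(1000, 990, 975, 950, 900, 800)
)

# kPx: probability that someone aged x lives another k years,
# computed as population at age x + k divided by population at age x
kpx <- function(x, k, tbl = life_table) {
  pop_x  <- tbl$population[match(x, tbl$age)]
  pop_xk <- tbl$population[match(x + k, tbl$age)]
  pop_xk / pop_x   # NA if either age is missing from the table
}

kpx(x = 1, k = 3)  # 900 / 990
```

In a Shiny app the same function could presumably be called inside a reactive expression with the slider values, for example kpx(input$age, input$term), where age and term are assumed input ids.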

How to compute questionnaire total score and subscores by summing all and a selection of columns in R?

I'm new in R and I'm having a little issue. I hope some of you can help me!
I have a data.frame containing the answers to a single questionnaire.
The rows indicate the participants.
The first column indicates the participant ID.
The following columns include the answers to each item of the questionnaire (item.1 up to item.20).
I need to create two new vectors:
total.score <- sum of all 20 values for each participant
subscore <- sum of some of the items
I would like to use a function, like a sum(A:T) in Excel.
Just to recap, I'm using R and not other software.
I already did it by summing each vector just with the symbol +
(data$item.1 + data$item.2 + data$item.3 etc...)
but it is a slow way to do it.
Answers range from 0 to 3 for each item, so I expect a total score ranging from 0 to 60.
Thank you in advance!!
Let's use as an example this data from a national survey with a questionnaire.
If you download the .csv file to your working directory, you can read it with:
data <- read.csv("2016-SpanishSurveyBreastfeedingKnowledge-AELAMA.csv", sep = "\t")
Item names are p01, p02, p03...
Imagine you want a subtotal of the first five questions (from p01 to p05)
You can give a name to the group:
FirstFive <- c("p01", "p02", "p03", "p04", "p05")
I think this is worthwhile because you will probably want to perform more tasks with this group (analysis, adding or deleting a question from the group...), and because it lets you give it a meaningful name (for instance "knowledge", "attitudes"...)
And then create the subtotal variable:
data$subtotal1 <- rowSums(data[ , FirstFive])
You can check that the new variable is the sum:
head(data[ , c(FirstFive, "subtotal1")])
(notice that FirstFive is not quoted, because it is an object outside data, but subtotal1 is quoted, because it is the name of a variable in data)
You can compute more subtotals and use them to compute a global score.
You could maybe save some keystrokes if you know that these variables are the columns 20 to 24:
names(data)[20:24]
And then sum them as
rowSums(data[ , c(20:24)])
I think this is what you asked for, but I would avoid doing it this way, as it is easier to make mistakes, which can be hard to detect.
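Applied to the layout described in the question (an ID column followed by item.1 to item.20), the same idea might look like the sketch below. The data are randomly generated and the choice of subscale items is hypothetical.

```r
# Hypothetical data: 5 participants, 20 items scored 0 to 3
set.seed(42)
data <- data.frame(id = 1:5,
                   matrix(sample(0:3, 100, replace = TRUE), nrow = 5))
names(data)[-1] <- paste0("item.", 1:20)

# Build the column names once instead of typing item.1 + item.2 + ...
all_items <- paste0("item.", 1:20)
sub_items <- c("item.1", "item.4", "item.7")  # hypothetical subscale

# rowSums adds across the selected columns for each participant
data$total.score <- rowSums(data[, all_items])
data$subscore    <- rowSums(data[, sub_items])

head(data[, c("id", "total.score", "subscore")])
```

Selecting columns by name keeps the new score columns out of later sums, which selecting by position would not guarantee.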

How to calculate and create an index to represent the values of other columns? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Please, could anyone help me implement the calculation outlined below.
I'm using R in RStudio.
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,0,11,12,13,14,15,16,17,18,19,20),
total_fatal_injuries = c(1,0,5,4,0,27,10,15,6,2,10,4,0,0,1,0,3,0,1,0),
total_serious_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,2),
total_minor_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,3),
total_uninjuried = c(1,0,1,0,0,10,2,5,0,4,0,0,31,0,2,3,0,1,0,0),
injured_index = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
In the data set above, each line represents an observation of the occurrence of accidents with vehicles.
Column 'x' is just an ID.
The same occurrence may have individuals with various levels of injury: fatal injuries, serious injuries, minor injuries and uninjured. The sum of the values of each column is equal to the number of individuals involved in the occurrence.
The goal is to populate the 'injured_index' column with a value that represents the severity of the occurrence, according to the values recorded in the other columns.
A numerical index that represents the severity of the occurrence, by which the data set can be ordered.
What would be the best formula for calculating the 'injured_index' column?
I would like someone to make a suggestion on how to calculate a value for an index that represents the level of how bad the occurrence is. Based on the total number of victims at each level, per occurrence.
The importance is simple to understand.
1) Fatal is bad
2) Serious is a bit less bad
3) Minor is not good
4) Uninjured is ideal.
How to put everything together mathematically and get an index that represents which occurrence is more or less serious than the other?
I know how to create the column and assign a value.
I just want the hint of how to calculate the value that will be stored.
I know this has more to do with math, but mathematicians in the Mathematics Stack Exchange refuse to answer because they think it does not have mathematics but programming. :/
Thank you all for trying!
Here's an approach.
# This counts how many people in each row, for columns 2 through 5
df$count <- rowSums(df[,2:5])
# This assigns a weighting to each severity of injury and divides by how
# many people in that row. Adjust the weights based on your judgment.
df$injured_index = (1000 * df$total_fatal_injuries + 200 *
df$total_serious_injuries + 20 * df$total_minor_injuries) / df$count
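A short sketch of the same weighting on a few rows, with two additions that are assumptions rather than part of the answer above: occurrences with nobody recorded get an index of 0 instead of the NaN that 0/0 would produce, and the data set is then ordered by the index. The four rows are a hypothetical subset in the shape of the question's data.

```r
# Hypothetical subset with the same columns as the question's df
df <- data.frame(
  x = 1:4,
  total_fatal_injuries   = c(1, 0, 5, 0),
  total_serious_injuries = c(10, 0, 9, 0),
  total_minor_injuries   = c(10, 0, 9, 0),
  total_uninjuried       = c(1, 0, 1, 31)
)

# People involved in each occurrence
df$count <- rowSums(df[, 2:5])

# Weighted severity, using the weights from the answer (adjust to taste)
weighted <- 1000 * df$total_fatal_injuries +
  200 * df$total_serious_injuries +
  20 * df$total_minor_injuries

# Assumption: an occurrence with zero people gets index 0 rather than NaN
df$injured_index <- ifelse(df$count > 0, weighted / df$count, 0)

# Most severe occurrences first
df_sorted <- df[order(df$injured_index, decreasing = TRUE), ]
df_sorted
```

Because the index is divided by the head count, it measures average severity per person; dropping the division would rank occurrences by total harm instead.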
