I am trying to use the function assocstats() to get Cramer's V for two variables. This is not a problem as long as I use the entirety of both variables:
assocstats(table(democrat, sex))
Problems arise when I try to target only one specific value of the dichotomous variable sex, which takes the values 1 and 2.
I thought that dplyr's filter() might help, but
assocstats(table(democrat, filter(sex==1)))
does not yield any results.
Does anybody know how I can target only 1 value of the variable sex in this case?
Many thanks
Suppose we are using the Arthritis data from library(vcd). We need to filter the rows that match 'Male' (1 in your dataset), select the columns of interest ('Treatment' and 'Sex'), get the frequencies with table(), and pass the result to assocstats():
library(vcd)
assocstats(table(Arthritis[Arthritis$Sex=='Male', c('Treatment', 'Sex')]))
Assuming that the OP has two vectors, 'democrat' and 'sex':
i1 <- sex ==1
assocstats(table(democrat[i1], sex[i1]))
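The original attempt fails because dplyr's filter() takes a data frame as its first argument, followed by the row condition; it cannot be called on a bare condition such as sex==1. Below is a minimal sketch of the dplyr route on the Arthritis data. It assumes the goal is the association between two other variables within one sex: once the rows are filtered, Sex is constant, so this sketch cross-tabulates Treatment against Improved instead. Substitute your own data frame and columns (e.g. democrat and a second variable) as needed.

```r
library(vcd)    # assocstats() and the Arthritis data
library(dplyr)  # filter()

# filter() takes the data frame first, then the row condition
males <- filter(Arthritis, Sex == "Male")

# Cross-tabulate two other variables within that subgroup
tab <- table(males$Treatment, males$Improved)
res <- assocstats(tab)
res$cramer  # Cramer's V for Treatment x Improved among males
```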
I have a multiple-response variable with seven possible observations: "Inhalt", "Arbeit", "Verhindern Koalition", "Ermöglichen Koalition", "Verhindern Kanzlerschaft", "Ermöglichen Kanzlerschaft", "Spitzenpolitiker".
If a respondent chose more than one observation, however, the answers are not separated in the data (Data).
My goal is to create a matrix with all possible observations as variables, marked with 1 (yes) and 0 (no). Currently I am using this command:
einzeln_strategisch_2021 <- data.frame(strategisch_2021[, ! colnames (strategisch_2021) %in% "Q12"], model.matrix(~ Q12 - 1, strategisch_2021)) %>%
This gives me the matrix I want, but it does not separate the observations, so now I have a matrix with 20 variables instead of seven.
I also tried separate() like this:
separate(Q12, into = c("Inhalt", "Arbeit", "Verhindern Koalition", "Ermöglichen Koalition", "Verhindern Kanzlerschaft", "Ermöglichen Kanzlerschaft", "Spitzenpolitiker"), ";") %>%
This does separate the observations, but not in the right order and without the matrix.
How do I separate my observations and create a matrix with the possible observations as variables akin to the third picture (Matrix)?
Thank you very much in advance ;)
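One possible approach, sketched in base R: split each response on ";" (the separator used in the separate() call above) and then, for every possible answer in a fixed order, record 1 if the respondent chose it and 0 otherwise. The toy data frame below is hypothetical, and the sketch assumes the real Q12 values use exactly these labels separated by ";".

```r
# Hypothetical toy data: Q12 holds one or more answers separated by ";"
strategisch_2021 <- data.frame(
  id  = 1:3,
  Q12 = c("Inhalt", "Arbeit;Spitzenpolitiker", "Inhalt;Verhindern Koalition"),
  stringsAsFactors = FALSE
)

# All possible answers, in the column order wanted in the result
answers <- c("Inhalt", "Arbeit", "Verhindern Koalition", "Ermöglichen Koalition",
             "Verhindern Kanzlerschaft", "Ermöglichen Kanzlerschaft", "Spitzenpolitiker")

# Split each response into its individual answers
split_q12 <- strsplit(strategisch_2021$Q12, ";", fixed = TRUE)

# One row per respondent, one 0/1 column per possible answer
indicators <- t(sapply(split_q12, function(x) as.integer(answers %in% trimws(x))))
colnames(indicators) <- answers

# Replace Q12 with the seven indicator columns
einzeln_strategisch_2021 <- cbind(
  strategisch_2021[, names(strategisch_2021) != "Q12", drop = FALSE],
  indicators
)
```

Because the columns are built from the answers vector rather than from the position of each piece in the string, every answer always lands in its own column regardless of the order in which respondents ticked them.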
I am trying to run a regression in R based on two conditions. My data has binary variables for both year and another classification. I can get the regression to run properly while only using 1 condition:
# now time for the millions of OLS
# format: OLSABCD where ABCD are binary for the values of MSA/UA and years
# A = 1 if MSA, 0 if UA
# B = 1 if 2010
# C = 1 if 2000
# D = 1 if 1990
OLS1000<-summary(lm(lnrank ~ lnpop, data = subset(df, msa==1)))
OLS1000
However, I cannot figure out how to combine the MSA/UA condition with the year variables. I have tried:
OLS1100<-summary(lm(lnrank ~ lnpop, data = subset(df, msa==1, df$2010==1)))
OLS1100
But it returns the error:
Error: unexpected numeric constant in "OLS1100<-summary(lm(lnrank ~ lnpop, data = subset(df, msa==1, df$2010"
How can I get the program to run utilizing both conditions?
Thank you again!
The problem is:
df$2010
If your data really has a column named 2010, then you need backticks around it:
df$`2010`
And in your subset, don't specify df twice:
subset(df, msa == 1, `2010` == 1)
In general it's better if column names don't start with digits. It's also best not to name data frames df, since that's a function name.
@neilfws pointed out the "numeric column names" issue, but there is actually another issue in your code.
The third argument of subset() is select, which lets you choose which columns to include (or exclude). So the correct syntax is:
subset(df, msa == 1 & `2010` == 1)
instead of
subset(df, msa == 1, `2010` == 1)
subset() would not raise an error for the second version, but `2010` == 1 would be interpreted as a column selection rather than as a row condition, so the rows would not be filtered on the year.
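A small self-contained check of both calls, on hypothetical toy data (named dat rather than df, following the advice above). The last call shows what happens when the year condition is passed as the third argument: subset() evaluates select with each column name bound to its position, so `2010` becomes 4 and the select expression becomes 4 == 1, i.e. FALSE.

```r
set.seed(1)
# Hypothetical toy data; check.names = FALSE keeps the column name "2010"
dat <- data.frame(lnrank = rnorm(20), lnpop = rnorm(20),
                  msa = rep(c(0, 1), 10), `2010` = rep(c(0, 0, 1, 1), 5),
                  check.names = FALSE)

# Both conditions joined with & in the row-subset argument
both <- subset(dat, msa == 1 & `2010` == 1)
OLS1100 <- summary(lm(lnrank ~ lnpop, data = both))

# Year condition passed as the third argument ends up in select:
# rows are filtered on msa only, and no columns are kept
wrong <- subset(dat, msa == 1, `2010` == 1)
dim(wrong)
```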
Is there a command to see how a categorical variable is coded?
For example, I have a variable called HbA1c, and the categories I see are <5.7 and >=5.7. I want to know which value <5.7 and >=5.7 take (0, 1, or 2). I need it for a regression analysis.
I am sorry if this question has been addressed already but I was not able to find the post.
Thank you in advance.
If f is a factor (R's term for a categorical variable), then levels(f) gives you the levels in order, so something like
setNames(seq_along(levels(f)), levels(f))
## a b c
## 1 2 3
will give you a correspondence table.
Your question in the comments isn't entirely clear, but if you wanted to run a regression with numeric values starting at zero, I would try something like:
mydata$n <- as.numeric(mydata$f)-1
(the numeric codes associated with factors always run from 1 to N; this gives you a numeric variable running from 0 to N-1). Then you can run a regression something like this:
lm(y~n,data=mydata)
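Applied to a factor like the HbA1c variable from the question (the values below are hypothetical), the internal codes and the dummy coding that lm() uses by default can be inspected directly. With the default treatment contrasts for an unordered factor, the first level is the reference category (0) and the dummy variable is 1 for the second level.

```r
# Hypothetical HbA1c values; the level order is set explicitly here
HbA1c <- factor(c("<5.7", ">=5.7", "<5.7"), levels = c("<5.7", ">=5.7"))

levels(HbA1c)      # "<5.7" ">=5.7"
as.integer(HbA1c)  # internal codes: 1 2 1
setNames(seq_along(levels(HbA1c)), levels(HbA1c))

# Dummy coding used by lm() by default (treatment contrasts)
contrasts(HbA1c)
```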
How can I transform the values of three separate variables in R into a single, combined variable? I have experimental data with three conditions: 'negative', 'control', and 'pro'. In the raw data, each participant (row) is in exactly one condition, recorded by a '1' in the variable named for that condition; the value is missing for the conditions the participant was not in. I would like to create a single variable called "Manip" with the values -1 (for those with a 1 in the negative condition), 0 (for those with a 1 in the control condition), and 1 (for those in the pro condition). Thank you!
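Supposing that your data frame is named df:
df$Manip <- NA  # start with a missing value for every row
# which() skips the NA entries in the condition columns
df$Manip[which(df$negative == 1)] <- -1
df$Manip[which(df$control == 1)] <- 0
df$Manip[which(df$pro == 1)] <- 1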
Alternatively, you could turn the numeric Manip into a labelled factor, like so:
df$Manip <- factor(df$Manip,
                   levels = c(-1, 0, 1),
                   labels = c('negative', 'control', 'pro'))
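A more compact variant, sketched with nested ifelse() calls on a hypothetical toy data frame. It relies on x %in% 1 returning FALSE (rather than NA) for missing cells, so the NA entries in the condition columns simply count as "not in this condition"; rows with no 1 anywhere end up as NA.

```r
# Hypothetical toy data: a 1 marks the participant's condition, NA otherwise
df <- data.frame(
  negative = c(1, NA, NA, 1),
  control  = c(NA, 1, NA, NA),
  pro      = c(NA, NA, 1, NA)
)

# %in% treats NA as "no match", so no NA handling is needed in the tests
df$Manip <- ifelse(df$negative %in% 1, -1,
            ifelse(df$control  %in% 1,  0,
            ifelse(df$pro      %in% 1,  1, NA)))

# Same coding as a labelled factor
df$ManipF <- factor(df$Manip, levels = c(-1, 0, 1),
                    labels = c("negative", "control", "pro"))
```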
I need to apply the following to a data set that contains a number of aggregated scores:
Dataset: P = Participant, TYPE = Trial type (factor), rt=score
TYPE P rt
1 A 1 607.500
2 A 2 481.000
3 A 3 298.125
4 A 5 568.250
I need to calculate the following normalized score: NewScore = OldScore - Grandmean (mean of RT column) + Participant Mean (mean of RT column for the given subject, P)
I've been experimenting with ddply and have come up with the following:
grandmean<-mean(data$rt)
ddply(data, .(P, TYPE), mutate, mean=mean(rt), grandmean=grandmean, subjectmean=mean(rt[P]), newscore=rt-grandmean-subjectmean)
The key question here is: how do I get subjectmean to subset the data based on the current row's subject?
Is ddply even appropriate here? I am trying to avoid using loops...
Thanks!
You didn't describe splitting on the TYPE column, so I'll leave it out here. But you're on the right track. I'd use transform though instead of mutate:
library(plyr)
data$grandmean <- mean(data$rt)
# mean(rt) is computed within each participant's rows
ddply(data, .(P), transform, newscore = rt - grandmean + mean(rt))
It is usually easiest to have plyr operate on a single data frame rather than relying on it to look outside its scope for the global grandmean, so make grandmean a column instead.
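For comparison, here is a base-R sketch using ave(), which returns each row's group mean and so avoids both loops and plyr. The toy data (two trial types per participant) is hypothetical, and the score follows the question's definition: OldScore - GrandMean + ParticipantMean.

```r
# Hypothetical toy data: two trial types per participant
data <- data.frame(
  TYPE = factor(rep(c("A", "B"), times = 3)),
  P    = rep(c(1, 2, 3), each = 2),
  rt   = c(600, 500, 480, 420, 300, 260)
)

grandmean   <- mean(data$rt)                     # mean of the whole rt column
subjectmean <- ave(data$rt, data$P, FUN = mean)  # each row gets its participant's mean

# Score as defined in the question
data$newscore <- data$rt - grandmean + subjectmean
```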