Select from a matrix based on the values of 2 distinct variables - r

Suppose I have a matrix with the values of a response variable in one column and two characteristics, such as gender and location, in the other two columns.
How do I select the values of the response for specific values of both gender and location?
For example, I know that
dataset$response[dataset$gender=="Male"]
will select the responses of all the males. But say I want to select only the responses from males who are also from location=='SE'. I don't know how to do this.
Thanks a lot!
P.S. I tried searching for this online, but it is difficult to find help for the [] operator.

Logical 'and':
dataset$response[dataset$gender=="Male" & dataset$location=="SE"]
More information on logical operators in R can be found by using help("&").
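A small self-contained illustration of the logical 'and' approach. The data frame is made up for the example; only the column names `response`, `gender` and `location` come from the question. Note that `$` works on data frames and lists but not on matrices; for a true matrix you would index columns by name instead, e.g. `m[, "gender"]`. Help for the indexing operator itself is available with help("[").

```r
# Made-up example data using the column names from the question
dataset <- data.frame(
  response = c(4.2, 3.1, 5.0, 2.7, 3.9),
  gender   = c("Male", "Female", "Male", "Male", "Female"),
  location = c("SE", "SE", "NW", "SE", "NW")
)

# Element-wise AND: TRUE only where both conditions hold
sel <- dataset$gender == "Male" & dataset$location == "SE"
dataset$response[sel]      # 4.2 2.7

# Use | for "or"; avoid && here, since it only works on single values
either <- dataset$gender == "Male" | dataset$location == "SE"
dataset$response[either]   # 4.2 3.1 5.0 2.7
```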

If dataset is a data frame, you can simply use subset:
subset( dataset, gender == 'Male' & location == 'SE' )$response
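One practical difference between the two answers shows up when a condition column contains NA: plain `[` indexing returns NA for those rows, while subset() drops them (as does wrapping the condition in which()). The data below is a made-up example with one missing gender. The documentation of subset() also describes it as a convenience function intended for interactive use, so inside your own functions `[` is usually the safer choice.

```r
# Made-up example: the third row has a missing gender
dataset <- data.frame(
  response = c(4.2, 3.1, 5.0, 2.7),
  gender   = c("Male", "Female", NA, "Male"),
  location = c("SE", "SE", "SE", "SE")
)

sel <- dataset$gender == "Male" & dataset$location == "SE"     # TRUE FALSE NA TRUE

dataset$response[sel]                                          # 4.2  NA 2.7 (NA kept)
subset(dataset, gender == "Male" & location == "SE")$response  # 4.2 2.7 (NA row dropped)
dataset$response[which(sel)]                                   # 4.2 2.7 (which() skips NA)
```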

Related

Replace values of one column based on range of values in another column in R

I have data in which the state abbreviations don't match the zip codes.
For example, my data assigns CA to the Illinois zip code 61820.
Hence, I want to match states and zip codes properly.
I was thinking about this approach:
df$state[60001 <= df$zipcode <= 62999] <-"IL"
But this is obviously the wrong approach.
Does anyone know how to replace the values?
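No answer is included for this one, but the logical 'and' from the first answer applies here too. R's comparison operators cannot be chained, so `60001 <= df$zipcode <= 62999` is a syntax error; each bound has to be tested separately and the two tests combined with `&`. A sketch, assuming `zipcode` is stored as a number (if it is text, convert it with as.numeric() first); the rows are made up, and the 60001 to 62999 range is taken from the question.

```r
# Made-up example data; zip codes stored as numbers
df <- data.frame(
  state   = c("CA", "IL", "CA", "NY"),
  zipcode = c(61820, 60601, 90210, 10001)
)

# Test each bound separately, then combine with element-wise AND
in_il_range <- df$zipcode >= 60001 & df$zipcode <= 62999

# Overwrite the state only for rows inside the range
df$state[in_il_range] <- "IL"
df$state   # "IL" "IL" "CA" "NY"
```

The same pattern can be repeated for other states, with one assignment per zip code range.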

Subset based on one value in multiple columns

I have a dataset with the weekly number of lucky days. Some of those weekly values are greater than 7, which must be a mistake.
Therefore, I want to delete the rows that have a value greater than 7 in any of several columns, namely columns 21 to 68. What I have tried so far is this:
new_df <- subset(df, 21:68 <= 7)
This leaves me with a completely empty new_df.
I know there is an option like this:
new_df <- subset(df, b != 7 & d != 7)
But I feel like there must be a more elegant way than naming every single column I want to refer to. Do I need to use square brackets or something like that?
There is no error message when running the command above.
The values in those columns are numeric.
Can someone help?
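The attempt `subset(df, 21:68 <= 7)` comes back empty because it compares the column numbers 21 to 68 themselves with 7, which is FALSE every time; it never looks at the values inside those columns. One way to test all the columns at once is to compare the whole block of columns with the limit, which gives a logical matrix, and then count the offending values per row with rowSums(). A minimal sketch; the helper name drop_rows_over() is made up, and treating NA as "not above the limit" (via na.rm = TRUE) is an assumption you may want to change.

```r
# Hypothetical helper: keep only rows where none of `cols` exceeds `limit`
drop_rows_over <- function(df, cols, limit = 7) {
  # Logical matrix: one row per data row, one column per checked column
  too_big <- df[, cols, drop = FALSE] > limit
  # Keep rows with zero values above the limit; NAs are ignored here
  df[rowSums(too_big, na.rm = TRUE) == 0, , drop = FALSE]
}

# Small made-up example, checking columns 2 and 3
toy <- data.frame(id = 1:4, w1 = c(3, 8, 5, NA), w2 = c(7, 2, 9, 1))
drop_rows_over(toy, 2:3)   # keeps ids 1 and 4

# For the question's data this would be:
# new_df <- drop_rows_over(df, 21:68)
```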

Subsetting everything but a given line in 1 column

I have a dataset where I have 4 different treatments. One of these treatments is the control group. I want to subset the data between control and other treatments.
I wrote this in RStudio:
ControlQ2<-subset(Q2, Treatment == "No_Suite")
Now, how do I select all the treatments except "No_Suite"?
Thanks
I'm not sure I understood you correctly, but what about this?
ExceptControlQ2<-subset(Q2, Treatment != "No_Suite")
If this is not what you were looking for, please provide an example with an expected output.
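A small made-up version of Q2 to check that the two subsets split the data cleanly, and to show how to exclude more than one treatment at once with !( ... %in% ...). Only the Treatment column and the No_Suite level come from the question; the other treatment names and the Score column are hypothetical.

```r
# Made-up stand-in for Q2
Q2 <- data.frame(
  Treatment = c("No_Suite", "Suite_A", "Suite_B", "No_Suite", "Suite_C"),
  Score     = c(10, 12, 15, 9, 11)
)

ControlQ2       <- subset(Q2, Treatment == "No_Suite")  # control group only
ExceptControlQ2 <- subset(Q2, Treatment != "No_Suite")  # every other treatment

# With no NA in Treatment, the two subsets together cover every row once
nrow(ControlQ2) + nrow(ExceptControlQ2) == nrow(Q2)     # TRUE

# Excluding several treatments at once: negate %in%
OthersQ2 <- subset(Q2, !(Treatment %in% c("No_Suite", "Suite_C")))
OthersQ2$Treatment   # "Suite_A" "Suite_B"
```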

How to write for loop to count frequency in specific row

I want to count the frequencies of specific rows by age group. The steps are:
1. In the data frame "pud", find the rows where the column "icd3" meets the following conditions.
2. Select the matching rows and count the frequencies.
The code is as follows:
u2<-which(pud$icd3>="A00"&pud$icd3<="A99"|
pud$icd3>="B00"&pud$icd3<="B94"|
pud$icd3=="B99")
u3<-which(pud$icd3>="A00"&pud$icd3<="A99"|
pud$icd3>="B00"&pud$icd3<="B49"|
pud$icd3>="B90"&pud$icd3<="B94"|
pud$icd3=="B99")
for (i in 2:3){co[i]=addmargins(table(pud[u[i],]$agegroups))}
but the console shows:
for (i in 2:3){co[i]=addmargins(table(pud[u[i],]$agegroups))}
Error in `[.data.frame`(pud, u[i], ) : object 'u' not found
How can I fix the code?
If you want the frequencies, why not do it like this?
sum(pud$icd3>="A00"&pud$icd3<="A99"|
pud$icd3>="B00"&pud$icd3<="B94"|
pud$icd3=="B99")
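Note that sum() gives one overall total rather than counts per age group. The error in the question arises because u2 and u3 are separate objects: writing u[i] does not refer to them, and no object called u exists. One way to keep the loop idea is to store the index vectors in a named list and loop over it with lapply(). A sketch; the rows of pud are made up, and only the column names icd3 and agegroups and the code ranges come from the question. Comparing codes such as "A00" with >= and <= is a string comparison, which works for these letter-plus-two-digit codes, although in principle it follows the locale's collation order.

```r
# Made-up stand-in for pud; agegroups is a factor so every table has the same levels
pud <- data.frame(
  icd3      = c("A01", "A50", "B10", "B60", "B99", "C10"),
  agegroups = factor(c("0-14", "15-64", "15-64", "65+", "0-14", "65+"))
)

# Keep the row indices in one named list instead of separate objects u2, u3
u <- list(
  u2 = which(pud$icd3 >= "A00" & pud$icd3 <= "A99" |
             pud$icd3 >= "B00" & pud$icd3 <= "B94" |
             pud$icd3 == "B99"),
  u3 = which(pud$icd3 >= "A00" & pud$icd3 <= "A99" |
             pud$icd3 >= "B00" & pud$icd3 <= "B49" |
             pud$icd3 >= "B90" & pud$icd3 <= "B94" |
             pud$icd3 == "B99")
)

# One frequency table per index vector, with a "Sum" total added by addmargins()
co <- lapply(u, function(idx) addmargins(table(pud$agegroups[idx])))
co$u2
co$u3
```

If you prefer an explicit for loop, initialise co <- list() first and assign with double brackets, co[[nm]] <- ..., because co[i] = table(...) tries to squeeze a whole table into a single element.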

Subset dataframe by unique id variables with certain number of rows

I have not found a clear answer to this question, so hopefully someone can put me in the right direction!
I have a nested data frame (panel data), with multiple observations within multiple individuals. I want to subset my data frame to those individuals (id) that have at least 20 rows of data.
I have tried the following:
subset1 = subset(df, table(df$id)[df$id] >= 20)
However, I still find individuals with fewer than 20 rows of data.
Can anyone supply a solution?
Thanks in advance
subset1 = subset(df, as.logical(table(df$id)[df$id] >= 20))
Now, it should work.
The subset function takes a series of TRUE/FALSE values from the condition part, which indicates whether each row meets the condition and should be kept. Hence, the condition part should produce a logical vector.
However, if you run table(df$id)[df$id] >= 20 in the console, you will see that it returns a table (an array) rather than a plain logical vector. In this case, the fix is straightforward: convert it to logical with as.logical(). Then it works.
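One more thing that can go wrong with table(df$id)[df$id]: if id is numeric, the ids are used as positions in the table rather than as names, so an id of 101 picks the 101st entry (NA if the table is shorter) instead of the count for id 101. A sketch that looks the counts up by name instead, using %in% so that no NA rows can appear; the helper name and the example data are made up.

```r
# Hypothetical helper: keep individuals with at least `min_rows` rows
keep_ids_with_min_rows <- function(df, min_rows = 20) {
  counts  <- table(df$id)                       # rows per id; names are the ids as text
  big_ids <- names(counts)[counts >= min_rows]  # ids that meet the threshold
  df[df$id %in% big_ids, , drop = FALSE]        # %in% gives only TRUE/FALSE, never NA
}

# Made-up panel data with numeric ids
df <- data.frame(id = c(101, 101, 101, 205, 205, 7), y = 1:6)

table(df$id)[df$id]                        # all NA: 101, 205 and 7 are read as positions
keep_ids_with_min_rows(df, min_rows = 2)   # rows for ids 101 and 205

# For the question: subset1 <- keep_ids_with_min_rows(df, 20)
```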
