Creating a table 1 in R

I have one dataset, called unchf_data, containing participants, some of whom answered a questionnaire and some of whom did not. I want to make a descriptive table (table 1) with the columns Total, No answer, and Answer.
Making a new column, subgroup, for table 1: if the PREG column is answered "Yes" (1), the participant is in the final cohort.
`unchf_data2 <- unchf_data2[!(unchf_data2$PREG == "0"),]`# removing all where pregnancy is 0. If no pregnancy then no chance of having an APO.
unchf_data2<- unchf_data2[!c(is.na(unchf_data2$HYPTPREG) == "TRUE" &
is.na(unchf_data2$DIABPREG) == "TRUE" &
is.na(unchf_data2$BABYLT5LBS) == "TRUE" &
is.na(unchf_data2$BORN3WKSERLY) == "TRUE" &
is.na(unchf_data2$PREECLMP) == "TRUE" &
is.na(unchf_data2$BABYGT9LBS) == "TRUE"),] # removing all that have missing in all APO variables.
`unchf_data2$subgroup <- 1` # creating new variable - subgroup which has 1 in all
unchf_data2$subgroup <- factor(unchf_data2$subgroup, levels = c(1))
`subgroup2 <- unchf_data2$subgroup` # making the new vector from subgroup variable
library(qpcR)
unchf_data <- qpcR:::cbind.na(subgroup2, unchf_data) # Binds our new vector with length of 10375 to our data set with length of 44174, filling in NA values for everything longer than the vector.
unchf_data$subgroup2[is.na(unchf_data$subgroup2) == "TRUE"] <- 0 # recoding all NA values to 0
unchf_data$subgroup2 <- factor(unchf_data$subgroup2, levels = c(0,1), labels = c("No answer", "Answer")) # converting variable to factor.
summary(unchf_data$subgroup2) # No answer 33799, Answer 10375
My problem now is that apparently the new subgroup2 variable does not line up with the original dataset.
Is there another way I can do this?
The reason I want a new variable in the dataset is that I'm using the univariateTable function from the Publish package.
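As the comment in the code above notes, binding the shorter vector onto the full dataset pads it with NA, so the values are attached by position rather than by participant. One possible way around this, sketched here on invented toy data, is to compute the indicator directly on the full data frame so that it stays aligned row by row. The name unchf_toy, its values, and the two-column apo_vars vector are assumptions for illustration only; the real data would list all six APO columns.

```r
# Toy stand-in for unchf_data (values invented for illustration only)
unchf_toy <- data.frame(
  PREG     = c(1, 0, 1, NA, 1),
  HYPTPREG = c(1, NA, NA, 0, NA),
  DIABPREG = c(NA, 1, NA, 0, 0)
)
apo_vars <- c("HYPTPREG", "DIABPREG")  # extend with the remaining APO columns

# TRUE when a pregnancy is reported and at least one APO variable is non-missing
answered <- !is.na(unchf_toy$PREG) & unchf_toy$PREG != 0 &
  rowSums(!is.na(unchf_toy[apo_vars])) > 0

# The factor is created in place, so every row keeps its own value
unchf_toy$subgroup2 <- factor(answered, levels = c(FALSE, TRUE),
                              labels = c("No answer", "Answer"))
table(unchf_toy$subgroup2)
```

Because no rows are removed and nothing is bound back, the new column can be used as a grouping variable without any risk of misalignment.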

Related

How to replace zeros and ones instead of words into a data frame that has quantitative and qualitative data

I have 6 columns in my data frame. The column names are exam 1, exam 2, exam 3, result exam 1, result exam 2, and result exam 3. The first three columns contain numbers and NAs, and the last three contain Pass, Fail, and NAs. I want to replace every Pass with 1, and every Fail and every NA with 0.
I have used multiple approaches in R but I can't make it work.
df[df == 'NA'] <- 0 , df[df == NA] <- 0
df[df$"result exam 1" == "Pass",]$"result exam 1" = 1
df[df$"result exam 1" == "Fail",]$"result exam 1" = 0
None of these codes are working.
Would someone be able to please help with this problem?
Thank you
You really need to get a better grasp of basic R syntax:
You are putting the subset operator [ in the wrong place (you need to subset your vector, not the data frame)
You are then using the $ operator on the result of the previous operation, and that throws an error (that you should have posted) because $ cannot be used on vectors.
You are testing a value for missingness with ==: x == NA makes no sense, because comparing anything with a non-available value yields NA. You must use the is.na() function.
Here is what you should have done (with just a bit of help from basic R tutorials):
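df$`result exam 1`[df$`result exam 1` == "Pass"] <- 1 # subset the vector, not the data frame
df$`result exam 1`[df$`result exam 1` == "Fail"] <- 0
df$`result exam 1`[is.na(df$`result exam 1`)] <- 0 # is.na(), not == NA
df$`exam 1`[is.na(df$`exam 1`)] <- 0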
To do this for multiple columns in one go you can use the following -
cols <- grep('result', names(df))
df[cols][is.na(df[cols])] <- 0
df[cols][df[cols] == 'Fail'] <- 0
df[cols][df[cols] == 'Pass'] <- 1
df
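A small sketch of the multi-column approach on invented data may help show what it leaves behind. The toy values and the final conversion step are assumptions added for illustration: after the replacements the result columns are still character ("0" and "1"), so a conversion is needed if numeric values are wanted.

```r
# Toy data frame with column names containing spaces (values invented)
df <- data.frame(
  `exam 1`        = c(80, NA, 55),
  `result exam 1` = c("Pass", NA, "Fail"),
  `result exam 2` = c("Fail", "Pass", NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

cols <- grep("result", names(df))     # positions of the result columns
df[cols][is.na(df[cols])] <- 0        # NA first, so later comparisons see no NA
df[cols][df[cols] == "Fail"] <- 0
df[cols][df[cols] == "Pass"] <- 1

sapply(df[cols], class)               # the columns are still character here
df[cols] <- lapply(df[cols], as.numeric)
df
```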
Assuming the name of the data frame is dt.
Make a vector for the names of result columns
result <- c("result exam 1", "result exam 2", "result exam 3")
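library(dplyr)
dt <- dt %>% mutate_at(result, ~ifelse(.x %in% "Pass", 1, 0) )
This will replace every "Pass" with 1 and everything else (Fail and NA) with 0. %in% is used rather than == because NA == "Pass" is NA, which ifelse() would pass through as NA instead of 0.
For NAs in the other columns: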
dt[c("exam 1", "exam 2", "exam 3")][is.na(dt[c("exam 1", "exam 2", "exam 3")])] <- 0
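The way missing values behave in comparisons is worth seeing in isolation. This is a minimal base R sketch on an invented vector x (the names via_eq and via_in are made up here), contrasting == with %in%:

```r
x <- c("Pass", "Fail", NA)

# == propagates the missing value, so ifelse() returns NA for that element
via_eq <- ifelse(x == "Pass", 1, 0)

# %in% never returns NA: a missing value simply does not match "Pass"
via_in <- as.integer(x %in% "Pass")

via_eq
via_in
```

If NA should become 0, the %in% form does it in one step; with ==, the NA has to be replaced separately using is.na().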

Looping over a data set, using ifelse to check the value of a column in order to set a new column (factor)

I'm new to R, I've had this issue for a week now and have attempted many searches for a solution and just can't figure this out.
I am using the House Regression data set from Kaggle and attempting to do some feature engineering. See link below for more information about the data set
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
In short, there are two columns: Condition1 and Condition2. Each is a factor with the same 9 levels, and 4 of the 9 levels indicate that the house is near a railroad. Instead of using 18 factor levels to specify whether the house is near a railroad, I'm attempting to create a new column: ByRR. This is a binary column: 1 if the value in either of these two columns is in RR_List, 0 if it is not.
I've tried a series of different methods to accomplish this. The latest is running an lapply function over the data set. I get the following warning messages and all of the values are set to 0, although I know that some values should be 1:
RR_List = c("RRNn","RRAn","RRNe", "RRAe")
data <-data[c("Condition1","Condition2")]
data$ByRR = factor(x="No",levels= c("Yes","No"), labels=c(1,0))
lapply(data,function(x) {
x$ByRR <- ifelse(data$Condition1 %in% RR_List ||
data$Condition2 %in% RR_List, 1, 0)
})
Warning messages:
1: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in%
Coercing LHS to a list
2: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in% :
Coercing LHS to a list
3: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in% :
Coercing LHS to a list
Any help with this is much appreciated!
No need to use lapply, try using |
data$ByRR <- +(data$Condition1 %in% RR_List | data$Condition2 %in% RR_List)
This will create a new column ByRR in data that is 1 if either Condition1 or Condition2 has a value from RR_List, and 0 otherwise. The unary + converts the logical values (TRUE, FALSE) to the integers (1, 0) respectively.
If you need values to be "Yes"/"No" instead of 1/0 use
data$ByRR <- c("No", "Yes")[(data$Condition1 %in% RR_List |
data$Condition2 %in% RR_List) + 1]
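To make the difference between | and || concrete, here is a short sketch on invented rows (the toy Condition values are assumptions). | compares element by element and returns one result per row, whereas || is meant for single TRUE/FALSE values such as an if() condition: older R versions silently used only the first element, and recent versions raise an error for longer inputs. The "Coercing LHS to a list" warning arises because lapply() hands each column to the function as a plain vector, and $<- is then applied to that vector.

```r
RR_List <- c("RRNn", "RRAn", "RRNe", "RRAe")

# Toy rows standing in for the Kaggle data (values invented)
data <- data.frame(
  Condition1 = c("Norm", "RRAn", "Feedr", "Norm"),
  Condition2 = c("Norm", "Norm", "RRNe", "Norm")
)

# Vectorised OR: one TRUE/FALSE per row, no loop needed
near_rr <- data$Condition1 %in% RR_List | data$Condition2 %in% RR_List
data$ByRR <- as.integer(near_rr)
data
```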

Creating variables in R- two issues

I have two basic questions about new variable creation in R. I will show some code and hopefully someone can help answer these!
df0$new <- ifelse(df0$old=="yes",1,0)
In this code I am creating a new variable called "new" that is equal to 1 if the variable "old" is equal to yes or is otherwise equal to 0. But in the variable "old" I have missing data (represented as -99, -98, NAN). So how can I account for there being missing values?
The second question is about using an "OR" statement.
df0$z <- ifelse(df0$x1=="yes",1,0 | )
I want to create a new variable z that is equal to 1 if the participant responds "yes" to any of 5 questions (q1-q5). So I want to code it so it looks like: z = 1 if q1 == 1 OR q2 == 1 OR q3 == 1 OR q4 == 1 OR q5 == 1. If none of q1-q5 equal 1 then I want to set z equal to 0. However, this also brings up the issue with the missing values described above. Thanks so much!
You could do something like the following.
First, get rid of the -99, -98 and NaN. I am assuming that when, in the question, you write NAN you are meaning NaN.
Encode the missing-value codes as NA.
is.na(df0$old) <- (df0$old %in% c(-99, -98)) | is.nan(df0$old)
Now, note that FALSE/TRUE are encoded as 0/1 and coerce the logical results to class integer.
df0$new <- as.integer(df0$old == "yes")
df0$z <- with(df0, as.integer(q1 == "yes" | q2 == "yes" | q3 == "yes" | q4 == "yes" | q5 == "yes")) # with() looks up q1-q5 inside df0
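With several question columns, the same OR logic can be written without spelling out each comparison. The following is a sketch on invented data with only two of the five questions; the helper names q_cols, n_yes and n_missing are made up for illustration. It gives 1 if any answer is "yes", 0 if all answers are present and none is "yes", and NA when no "yes" was seen but some answers are missing, which mirrors how | treats NA.

```r
# Toy data: two of the five questions, with missing answers (values invented)
df0 <- data.frame(
  q1 = c("yes", "no", NA, "no"),
  q2 = c("no", "no", "yes", NA),
  stringsAsFactors = FALSE
)
q_cols <- c("q1", "q2")  # extend to q1..q5 on the real data

n_yes     <- rowSums(df0[q_cols] == "yes", na.rm = TRUE)  # "yes" count per row
n_missing <- rowSums(is.na(df0[q_cols]))                  # missing count per row

# 1 if any "yes"; otherwise NA when something is missing, else 0
df0$z <- ifelse(n_yes > 0, 1L, ifelse(n_missing > 0, NA_integer_, 0L))
df0
```

If missing answers should instead count as "not yes", replacing the last line with as.integer(n_yes > 0) would give 0 rather than NA for those rows.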
Another solution for the first part.
library(dplyr) #because of left_join function
df1 <- data.frame(old = c("yes", "no"), new = c(1, 0))
df0 <- left_join(df0, df1)

Subsetting data in R returns 0 columns, 619 rows

Currently I am working with a dataset in R that I imported from an SPSS file converted to a csv. The data includes multiple factors such as gender, ethnicity, and test group, along with a set of weights I want to sum. I want to sum these weights based on multiple conditions (i.e. female + white + group1) so I tried subsetting the data.
small.set<-subset(df, df[,"gender"]==1 & df[,"ethnicity"] ==1 &
df[,"group"==1])
However, I get the following error:
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,
: 'data' must be of a vector type, was 'NULL'
I found that when trying to select group 1 in any case, R returned strange results:
df["group"==1]
> data frame with 0 columns and 619 rows
The structure of "group" is as follows:
str(df[,"group"])
>Factor w/ 3 levels "1", "2", "3": 1 3 1 1 2...
Does anyone know what is causing this to happen?
Why not avoid subset() and index the data frame directly:
small.set<-df[df$gender == 1 & df$ethnicity == 1 & df$group == 1,]
Another good way is using data.table package:
library(data.table)
df<-data.table(df)
small.set<-df[gender == 1 & ethnicity == 1 & group == 1]
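The cause of the original error can be reproduced in a few lines. In the sketch below the data values and the column name weight are invented, since the question does not name the weight column. The expression "group" == 1 compares the string "group" with the number 1, which is FALSE, so df["group" == 1] is really df[FALSE] and selects no columns. Putting the comparison outside the brackets, with the vectorised &, gives the intended row filter.

```r
# Toy data mirroring the structure described in the question (values invented)
df <- data.frame(
  gender    = factor(c(1, 1, 2, 1)),
  ethnicity = factor(c(1, 2, 1, 1)),
  group     = factor(c(1, 1, 1, 3)),
  weight    = c(0.5, 1.2, 0.8, 2.0)   # hypothetical name for the weights
)

misplaced <- "group" == 1   # a single FALSE, not a per-row comparison

# One logical value per row; & is element-wise, unlike &&
keep <- df$gender == 1 & df$ethnicity == 1 & df$group == 1
small.set <- df[keep, ]
sum(small.set$weight)       # sum of weights for the selected rows
```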

Using the apply function over each column for adjusting of data.frame

So my hope is to change columns 14:18 into 1 column "Type". I wanted to give each of the entries in this new column (for matching observations in the previous) the value of which of the 5 is a 1 (because only 1 of them can be true). This is my best attempt at doing this in R (and beyond frustrated).
library(caret)
data("cars")
carSubset <- subset(cars)
head(carSubset)
# I want to convert the columns from of carSubset with following vector names
types <- c("convertible","coupe", "hatchback", "sedan", "wagon")
# into 1 column named Type, with the corresponding column name
carSubset$Type <- NULL
carSubset <- apply(carSubset[,types],
2,
function(each_obs){
hit_index <- which(each_obs == 1)
carSubset$Type <- types[hit_index]
})
head(carSubset) # output:
1 2 3 4 5
"sedan" "coupe" "convertible" "convertible" "convertible"
Which is what I wanted ... however, I also wanted the rest of my data.frame to come along with it, like I just wanted the new column of "Type" but I cannot even access it with the following line of code...
head(carSubset$Type) # output: Error in carSubset$Type : $ operator is invalid for atomic vectors
Any help on how to add a new column dynamically while keeping the rest of each observation attached to it?
I actually figured it out! Probably not the best way to do it, but hey, it works.
library(caret)
data("cars")
carSubset <- subset(cars)
head(carSubset)
# I want to convert the columns from of carSubset with following vector names
types <- c("convertible","coupe", "hatchback", "sedan", "wagon")
head(carSubset[,types])
carSubset[,types]
# into 1 column named Type, with the corresponding column name
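carSubset$Type <- NULL
# MARGIN = 1 applies the function to each row (each observation)
newSubset <- apply(carSubset[,types],
1,
function(obs){
hit_index <- which(obs == 1)
types[hit_index] # return the name of the column that holds the 1
})
newSubset
carSubset$Type <- newSubset # assign the vector directly; cbind() would store a one-column matrix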
head(carSubset[, !(names(carSubset) %in% types)])
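An alternative that avoids apply() altogether is base R's max.col(), which returns, for each row, the position of the largest value. With indicator columns where exactly one entry per row is 1, that position is the column holding the 1. The sketch below uses a small invented data frame (toy, with made-up prices) rather than the caret data, so it runs without extra packages.

```r
types <- c("convertible", "coupe", "hatchback", "sedan", "wagon")

# Toy stand-in for the cars data: one indicator column per type (values invented)
toy <- data.frame(
  Price       = c(17000, 21000, 15000),
  convertible = c(0, 0, 1),
  coupe       = c(0, 1, 0),
  hatchback   = c(0, 0, 0),
  sedan       = c(1, 0, 0),
  wagon       = c(0, 0, 0)
)

# Position of the 1 in each row, mapped back to the type name
toy$Type <- types[max.col(toy[types], ties.method = "first")]

# Drop the indicator columns, keeping the rest of the data plus Type
toy[, !(names(toy) %in% types)]
```

This assumes every row has exactly one indicator set; a row of all zeros would still be assigned the first type, so such rows would need checking separately.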
