I'm stuck on a simple problem with the ifelse function in R. I'm new to R and I'm trying to fill blanks in one column ("first column") depending on the values in another column ("second column").
When I try the ifelse function, I get this error: "argument "no" is missing, with no default"
All I need is: if the test (condition) is FALSE, keep the values (from a factor variable) in the "first column" as they are.
This is an example of my data frame which has ~6000 obs
#
# first second third
# 1 Cluster 1 Chest Pain 1
# 2 Coronary Artery Diseases 1
# 3 Cluster 6 Anemia 5
# 4 Cluster 7 Hypertension and Cerebrovascular Disease 4
# 5 Chronic Obstructive Pulmonary Disease 2
# 6 Cluster 5 Diabetes 10
My try is
sample$first= ifelse(sample$second=="Coronary Artery Diseases","Cluster 10",sample$first)
This fills "Cluster 10" in the first column wherever the second column contains "Coronary Artery Diseases", BUT for all the remaining obs in the first column I get a number instead of the original value. The problem is that "first" is a factor variable and I need it to stay a factor.
Any suggestions?
data
sample <- structure(list(first = c("Cluster 1", "", "Cluster 6", "Cluster 7",
"", "Cluster 5"), second = c("Chest Pain", "Coronary Artery Diseases",
"Anemia", "Hypertension and Cerebrovascular Disease",
"Chronic Obstructive Pulmonary Disease", "Diabetes"),
third = c("1", "1", "5", "4", "2", "10")),
.Names = c("first", "second","third"),
class = "data.frame", row.names = c(NA, -6L))
As the first column was a factor (not shown in the data above), ifelse returned the underlying integer codes of the untouched values instead of their labels. This can be worked around by converting the column with as.character() inside ifelse and turning the result back into a factor:
sample$first <- as.factor(ifelse(sample$second=="Coronary Artery Diseases",
"Cluster 10",as.character(sample$first)))
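The ifelse workaround above can be sketched on a small factor column. This is a minimal illustration, not the asker's real data: the three-row data frame `clusters` and the names `naive` and `fixed` are made up for the example.

```r
# Hypothetical three-row version of the data, with `first` stored as a factor
clusters <- data.frame(
  first  = factor(c("Cluster 1", "", "Cluster 6")),
  second = c("Chest Pain", "Coronary Artery Diseases", "Anemia"),
  stringsAsFactors = FALSE
)
is_cad <- clusters$second == "Coronary Artery Diseases"

# Passing the factor straight to ifelse() loses its labels:
# the rows that were meant to stay unchanged come back as codes
naive <- ifelse(is_cad, "Cluster 10", clusters$first)

# Converting to character first keeps the labels; factor() restores the type
fixed <- factor(ifelse(is_cad, "Cluster 10", as.character(clusters$first)))
```

Comparing `naive` with `as.character(fixed)` shows the difference between the two calls.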
Currently trying to use sample R code and implement it into my own
Sample code goes like this:
syn_data <- syn_data %>%
dplyr::mutate(gender = factor(gender,
labels = c("female", "male")))
My code goes:
data <- data %>%
dplyr::mutate(condition = factor(condition,
labels = c("Fixed Ratio 6", "Variable Ratio 6", "Fixed Interval 8", "Variable Interval 8")))
Getting this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "character"
Edit:
categorical. Reinforcement schedule the rat has been assigned to: 0 = 'Fixed Ratio 6'; 1 = 'Variable Ratio 6'; 2 = 'Fixed Interval 8'; 3 = 'Variable Interval 8'.
Data (sample right, mine left)
The cause of your problem is that data is not a data.frame, which is the required class for the first argument of mutate. If you change it to a data.frame, your code works.
For example:
library(dplyr)  # provides %>% and mutate()
tap_data <- data.frame(rat_id = 1:4, condition = c(0,1,2,3))
tap_data <- tap_data %>% mutate(condition = factor(condition,
labels = c("Fixed Ratio 6", "Variable Ratio 6",
"Fixed Interval 8", "Variable Interval 8")))
tap_data
# rat_id condition
# 1 1 Fixed Ratio 6
# 2 2 Variable Ratio 6
# 3 3 Fixed Interval 8
# 4 4 Variable Interval 8
To check if an object is a data.frame, you can use is.data.frame(). You can check for some other classes with similar syntax, such as is.factor().
is.data.frame(tap_data)
#[1] TRUE
is.data.frame(tap_data$condition)
# [1] FALSE
is.factor(tap_data$condition)
#[1] TRUE
In addition to the answer above, you can convert the matrix or array to a data frame as follows:
data <- data %>%
as.data.frame(.) %>%
dplyr::mutate(condition = factor(condition,
labels = c("Fixed Ratio 6", "Variable Ratio 6", "Fixed Interval 8", "Variable Interval 8")))
These dplyr functions only handle data frames, so you need to check that the object you are working on has class data.frame.
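The check-then-convert step can also be sketched in base R. The character matrix `raw` below is an assumption about what a non-data-frame input might look like; the asker's actual object is not shown. Passing `levels` explicitly is a design choice that makes the code-to-label mapping visible rather than relying on sort order.

```r
# Hypothetical input: a character matrix, e.g. the result of cbind() or as.matrix()
raw <- cbind(rat_id    = c("1", "2", "3", "4"),
             condition = c("0", "1", "2", "3"))

is.data.frame(raw)  # FALSE, so mutate() has no method for this object

# Convert first, then recode the condition codes into labelled factor levels
tap <- as.data.frame(raw, stringsAsFactors = FALSE)
tap$condition <- factor(tap$condition,
                        levels = c("0", "1", "2", "3"),
                        labels = c("Fixed Ratio 6", "Variable Ratio 6",
                                   "Fixed Interval 8", "Variable Interval 8"))
```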
I am trying to recode multiple columns of data from string variables (e.g. "None of the time", "Some of the time", "Often"...) to numeric values (e.g. "None of the time" = 0). I have seen a number of different responses to similar questions but when I have tried these they seem to remove all of the data and replace it with NA.
For_Analysis <- data.frame(Q11_1=c("None of the time", "Often", "Sometimes"),
Q11_2=c("Sometimes", "Often", "Never"), Q11_3=c("Never", "Never", "Often"))
For_Analysis <- For_Analysis%>%
mutate_at(c("Q11_1", "Q11_2", "Q11_3"),
funs(recode(., "None of the time"=1, "Rarely"=2,
"Some of the time"=3, "Often"=4, "All of the time"=5)))
When I run this second bit of code I get the following output
## There were 14 warnings (use warnings() to see them)
And all of the data within the dataframe is recoded to NA instead of the numeric values I want.
You are getting warnings and NA values because some values in your data ("Sometimes", "Never") do not match any of the names passed to recode. Also, you can replace mutate_at with across.
library(dplyr)
For_Analysis <- For_Analysis%>%
mutate(across(starts_with('Q11'), ~recode(., "None of the time"=1, "Rarely"=2,
"Sometimes"=3, "Often"=4, "All of the time"=5, "Never" = 1)))
For_Analysis
# Q11_1 Q11_2 Q11_3
#1 1 3 1
#2 4 4 1
#3 3 1 4
I have taken the liberty to assume "Never" is same as "None of the time" and coded as 1.
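One way to spot unmatched labels before recoding is a named lookup vector in base R. This is a sketch rather than a replacement for the recode call above; it reuses the same mapping (including the assumption that "Never" scores 1), and the names `scores`, `observed`, `unmatched` and `recoded` are illustrative.

```r
For_Analysis <- data.frame(
  Q11_1 = c("None of the time", "Often", "Sometimes"),
  Q11_2 = c("Sometimes", "Often", "Never"),
  Q11_3 = c("Never", "Never", "Often"),
  stringsAsFactors = FALSE
)

# Label-to-score lookup; "Never" = 1 follows the assumption made in the answer
scores <- c("None of the time" = 1, "Never" = 1, "Rarely" = 2,
            "Sometimes" = 3, "Often" = 4, "All of the time" = 5)

# Any label present in the data but missing from the lookup would turn into NA
observed  <- unique(unlist(lapply(For_Analysis, as.character)))
unmatched <- setdiff(observed, names(scores))

# Index the lookup by label; as.character() guards against factor columns
recoded <- For_Analysis
recoded[] <- lapply(For_Analysis, function(column) {
  unname(scores[as.character(column)])
})
```

If `unmatched` is not empty, it lists exactly the labels that still need an entry in `scores`.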
The following method seems to have worked for my issue (recoding string variables to numeric in multiple columns):
For_Analysis <- data.frame(Q11_1=c("Never", "Often", "Sometimes"),
Q11_2=c("Sometimes", "Often", "Never"), Q11_3=c("Never", "Never", "Often"))
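library(plyr)  # provides mapvalues()
Old_Values <- unique(For_Analysis$Q11_1)
# mapvalues() requires `from` and `to` to have the same length
New_Values <- seq_along(Old_Values)
For_Analysis[1:3] <- as.data.frame(sapply(For_Analysis[1:3],
mapvalues, from = Old_Values, to = New_Values))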
Thanks for the help!
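The easiest way is to convert each column to a factor and then to numeric. Note that fct_inorder() numbers the levels in order of first appearance within each column, so the same label can get a different number in different columns (compare "Sometimes" in Q11_1 and Q11_2 in the output).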
library(tidyverse)
For_Analysis <- data.frame(Q11_1=c("None of the time", "Often", "Sometimes"),
Q11_2=c("Sometimes", "Often", "Never"), Q11_3=c("Never", "Never", "Often"))
fRecode = function(x) x %>% fct_inorder() %>% as.numeric()
For_Analysis %>% mutate_all(fRecode)
output
Q11_1 Q11_2 Q11_3
1 1 1 1
2 2 2 1
3 3 3 2
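If the same label should get the same number in every column, one option is to pass a single shared set of levels to factor(). The sketch below uses base R only; the ordering in `shared_levels` is an assumption made for illustration, since the original question does not state how "Never" and "None of the time" should rank.

```r
answers <- data.frame(
  Q11_1 = c("None of the time", "Often", "Sometimes"),
  Q11_2 = c("Sometimes", "Often", "Never"),
  Q11_3 = c("Never", "Never", "Often"),
  stringsAsFactors = FALSE
)

# Assumed ordering of the response labels (illustrative only)
shared_levels <- c("Never", "None of the time", "Sometimes", "Often")

# The integer code of each value is its position in shared_levels
coded <- answers
coded[] <- lapply(answers, function(column) {
  as.integer(factor(column, levels = shared_levels))
})
```

A label that is missing from `shared_levels` becomes NA, which makes typos in the data easy to find.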
I have a vector of terms:
terms <- c("white blood cell", "acp5 mutation", "acquired immunodeficiency syndrome",
"activated pi3k delta syndrome", "acute disseminated encephalomyelitis"
)
> terms
[1] "white blood cell"
[2] "acp5 mutation"
[3] "acquired immunodeficiency syndrome"
[4] "activated pi3k delta syndrome"
[5] "acute disseminated encephalomyelitis"
And a dataframe:
df <- structure(list(ID = c(22952603L, 20639394L, 27989323L,
29221444L, 30595370L, 30595370L), TRAIT = c("acp5 mutation syndrome",
"Bilirubin levels", "Macrophage colony stimulating factor levels",
"Coronary artery calcified atherosclerotic plaque score in type 2 diabetes",
"White blood cell count", "Red cell distribution width")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
> df
# A tibble: 6 x 2
ID TRAIT
<int> <chr>
1 22952603 acp5 mutation syndrome
2 20639394 Bilirubin levels
3 27989323 Macrophage colony stimulating factor levels
4 29221444 Coronary artery calcified atherosclerotic plaque score in type 2 dia…
5 30595370 White blood cell count
6 30595370 Red cell distribution width
I want to be able to run through each row of my data frame, and if a term from the terms vector appears within the TRAIT column for that row, I want to keep that row.
eg. the resulting data frame would look like this:
> df
# A tibble: 2 x 2
ID TRAIT
<int> <chr>
1 22952603 acp5 mutation syndrome
2 30595370 White blood cell count
Since both "acp5 mutation" and "white blood cell" appear within the terms list.
What is the best way to go about creating this dataframe?
Here is one way:
subset(df, grepl(paste(terms, collapse = '|'), tolower(TRAIT)))
# ID TRAIT
# 1 22952603 acp5 mutation syndrome
# 5 30595370 White blood cell count
Similarly but using [ instead of subset:
df[grep(paste(terms, collapse = '|'), tolower(df$TRAIT)), ]
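Both calls above treat the collapsed terms as one regular expression, so a term containing characters such as "(" or "+" would be interpreted as regex syntax. A possible variant, sketched here with a plain data frame instead of the tibble, matches each term literally with fixed = TRUE and combines the results; the names `hits` and `keep` are illustrative.

```r
terms <- c("white blood cell", "acp5 mutation", "acquired immunodeficiency syndrome",
           "activated pi3k delta syndrome", "acute disseminated encephalomyelitis")

df <- data.frame(
  ID = c(22952603L, 20639394L, 27989323L, 29221444L, 30595370L, 30595370L),
  TRAIT = c("acp5 mutation syndrome", "Bilirubin levels",
            "Macrophage colony stimulating factor levels",
            "Coronary artery calcified atherosclerotic plaque score in type 2 diabetes",
            "White blood cell count", "Red cell distribution width"),
  stringsAsFactors = FALSE
)

# One logical vector per term: does the lower-cased trait contain it literally?
hits <- lapply(terms, function(term) grepl(term, tolower(df$TRAIT), fixed = TRUE))

# A row is kept if at least one term matched
keep <- Reduce(`|`, hits)
result <- df[keep, ]
```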
Your terms don't match the TRAIT values exactly, but assuming they do in your actual dataset, you could just do:
df %>%
mutate(new = ifelse(tolower(TRAIT) %in% terms, 1, NA))
and then just filter out the NAs.
I have a data column that contains a bunch of ranges as strings (e.g. "2 to 4", "5 to 6", "7 to 8" etc.). I'm trying to create a new column that converts each of these values to a random number within the given range. How can I leverage conditional logic within my function to solve this problem?
I think the function should be something along the lines of:
df<-mutate(df, c2=ifelse(df$c=="2 to 4", sample(2:4, 1, replace=TRUE), "NA"))
Which should produce a new column in my dataset that replaces all the values of "2 to 4" with a random integer between 2 and 4, however, this is not working and replacing every value with "NA".
Ideally, I am trying to do something where the dataset:
df<-c("2 to 4","2 to 4","5 to 6")
Would add a new column:
df<-c2("3","2","5")
Does anyone have any idea how to do this?
We can split the string on "to", convert the two numbers to integers, build the range between them, and then use sample to pick one number from that range.
df$c2 <- sapply(strsplit(df$c1, "\\s+to\\s+"), function(x) {
vals <- as.integer(x)
sample(vals[1]:vals[2], 1)
})
df
# c1 c2
#1 2 to 4 2
#2 2 to 4 3
#3 5 to 6 5
data
df<- data.frame(c1 = c("2 to 4","2 to 4","5 to 6"), stringsAsFactors = FALSE)
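The split-and-sample idea can be wrapped in a small helper. This is a sketch under two assumptions: every string has the form "a to b" with integer bounds, and the helper name `draw_from_range` is made up here. It draws once per row (a single sample() call inside ifelse would be recycled for all rows) and indexes with sample.int() so that a one-number range such as "7 to 7" is not expanded to 1:7.

```r
# Hypothetical helper: draw one integer from a range written as "a to b"
draw_from_range <- function(range_text) {
  bounds <- as.integer(strsplit(range_text, "\\s+to\\s+")[[1]])
  candidates <- seq(bounds[1], bounds[2])
  # Sampling a position avoids sample(7, 1) being read as "sample from 1:7"
  candidates[sample.int(length(candidates), 1)]
}

set.seed(42)  # only to make the draws repeatable
df <- data.frame(c1 = c("2 to 4", "2 to 4", "5 to 6", "7 to 7"),
                 stringsAsFactors = FALSE)

# One independent draw per row
df$c2 <- vapply(df$c1, draw_from_range, integer(1), USE.NAMES = FALSE)
```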
We can do this easily with sub. Replace the to with : and evaluate to get the sequence, then get the sample of 1 from it
df$c2 <- sapply(sub(" to ", ":", df$c1), function(x)
sample(eval(parse(text = x)), 1))
df
# c1 c2
#1 2 to 4 4
#2 2 to 4 3
#3 5 to 6 5
Or with gsubfn
library(gsubfn)
as.numeric(gsubfn("(\\d+) to (\\d+)", ~ sample(seq(as.numeric(x),
as.numeric(y), by = 1), 1), df$c1))
Or with read.csv/Map from base R
sapply(do.call(Map, c(f = `:`, read.csv(text = sub(" to ", ",", df$c1),
header = FALSE))), sample, 1)
data
df <- structure(list(c1 = c("2 to 4", "2 to 4", "5 to 6")),
class = "data.frame", row.names = c(NA, -3L))
I have a dataset of responses to the PHQ-9 questionnaire. There are 9 columns, each a factor with the levels "Not at all", "Sometimes", "Several Days", "More than half the days", "Nearly everyday". The scores for these are 0, 1, 1, 2, 3 respectively.
The response to all the 9 questions finally gives a PHQ score out of 27.
In my dataset, I however have the responses to these questions stored as :
$ Interest : Factor w/ 5 levels "More than half the days",..: 1 4 2 2 4 5 4 4 4 5 ...
What I want is another column next to each feature like the one above, containing the corresponding score. Finally, I want to sum these scores to give the depression score.
This is the output I am looking at:
Interest I_Factor Pleasure P_factor Score
Not at all 0 Nearly Everyday 2 2
Creating a simulated dataframe for you:
df <- data.frame(id = c("001", "002", "003", "004", "005"),
PHQ_1 = c("Not at all", "Not at all", "Sometimes", "Sometimes", "Several Days"),
PHQ_2 = c("Sometimes", "Sometimes", "Several Days", "More than half the days", "Nearly everyday"))
Use mutate_at to select the questionnaire items, then apply recode from the dplyr package to convert the Likert responses from labels to numbers. Naming the function inside funs() (here "numeric_columns") creates new columns with that suffix, so the original columns are not replaced.
Once this is done, use mutate again to compute the row sums and store them in a new column.
library(dplyr)  # provides mutate_at(), funs(), recode() and select(); funs() is deprecated since dplyr 0.8.0
test <- df %>%
mutate_at(vars(PHQ_1:PHQ_2), funs(numeric_columns = recode(.,
"Not at all" = 0,
"Sometimes" = 1,
"Several Days" = 1,
"More than half the days" = 2,
"Nearly everyday" = 3))) %>%
mutate(total = rowSums(select(., contains("numeric_columns"))))
The sample output is as follows. The original columns are retained and you have the new columns in numeric format as well as the total score of the questionnaire.
id PHQ_1 PHQ_2 PHQ_1_numeric_columns PHQ_2_numeric_columns total
1 001 Not at all Sometimes 0 1 1
2 002 Not at all Sometimes 0 1 1
3 003 Sometimes Several Days 1 1 2
4 004 Sometimes More than half the days 1 2 3
5 005 Several Days Nearly everyday 1 3 4
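The same scoring can be sketched in base R with a lookup vector, which avoids the deprecated funs(). The column suffix "_score" and the names `phq_scores` and `items` are choices made for this illustration; the scores themselves are the ones given in the question.

```r
df <- data.frame(
  id = c("001", "002", "003", "004", "005"),
  PHQ_1 = c("Not at all", "Not at all", "Sometimes", "Sometimes", "Several Days"),
  PHQ_2 = c("Sometimes", "Sometimes", "Several Days",
            "More than half the days", "Nearly everyday"),
  stringsAsFactors = FALSE
)

# Response label -> score, as stated in the question
phq_scores <- c("Not at all" = 0, "Sometimes" = 1, "Several Days" = 1,
                "More than half the days" = 2, "Nearly everyday" = 3)

items <- c("PHQ_1", "PHQ_2")
for (item in items) {
  # as.character() matters for factor columns: indexing by a factor would use its codes
  df[[paste0(item, "_score")]] <- unname(phq_scores[as.character(df[[item]])])
}

# Total questionnaire score per respondent
df$total <- unname(rowSums(df[paste0(items, "_score")]))
```

With all nine items present, `items` would list PHQ_1 to PHQ_9 and the rest stays the same.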