I'm currently trying to adapt some sample R code to my own data.
The sample code is:
syn_data <- syn_data %>%
dplyr::mutate(gender = factor(gender,
labels = c("female", "male")))
My version is:
data <- data %>%
dplyr::mutate(condition = factor(condition,
labels = c("Fixed Ratio 6", "Variable Ratio 6", "Fixed Interval 8", "Variable Interval 8")))
I get this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "character"
Edit: condition is categorical. It is the reinforcement schedule the rat has been assigned to: 0 = 'Fixed Ratio 6'; 1 = 'Variable Ratio 6'; 2 = 'Fixed Interval 8'; 3 = 'Variable Interval 8'.
The cause of your problem is that data is not a data.frame. The error says it is a character vector, while mutate() requires a data frame as its first argument. If you convert it to a data.frame, your code works.
For example:
tap_data <- data.frame(rat_id = 1:4, condition = c(0,1,2,3))
tap_data <- tap_data %>% mutate(condition = factor(condition,
labels = c("Fixed Ratio 6", "Variable Ratio 6",
"Fixed Interval 8", "Variable Interval 8")))
tap_data
# rat_id condition
# 1 1 Fixed Ratio 6
# 2 2 Variable Ratio 6
# 3 3 Fixed Interval 8
# 4 4 Variable Interval 8
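One caveat: without a levels argument, factor() builds the levels only from the codes actually present. If no rat in your data has, say, code 1, there are only three levels for four labels and factor() stops with an error about invalid 'labels'. A minimal sketch that passes the codes 0 to 3 from the variable description explicitly; the three-rat data frame is hypothetical, and plain assignment is used so it runs without dplyr, but the same factor() call works inside mutate():

```r
# Hypothetical data: no rat was assigned code 1 (Variable Ratio 6)
tap_data <- data.frame(rat_id = 1:3, condition = c(0, 2, 3))

# Listing all four codes in `levels` keeps each label tied to its code,
# including codes that do not occur in the data
tap_data$condition <- factor(tap_data$condition,
                             levels = 0:3,
                             labels = c("Fixed Ratio 6", "Variable Ratio 6",
                                        "Fixed Interval 8", "Variable Interval 8"))

tap_data$condition
```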
To check if an object is a data.frame, you can use is.data.frame(). You can check for some other classes with similar syntax, such as is.factor().
is.data.frame(tap_data)
#[1] TRUE
is.data.frame(tap_data$condition)
# [1] FALSE
is.factor(tap_data$condition)
#[1] TRUE
In addition to the answer above: if data is a matrix or array, you can convert it to a data frame first:
data <- data %>%
as.data.frame(.) %>%
dplyr::mutate(condition = factor(condition,
labels = c("Fixed Ratio 6", "Variable Ratio 6", "Fixed Interval 8", "Variable Interval 8")))
dplyr verbs such as mutate() are designed to work on data frames (including tibbles), so check that the object you are working with is a data frame.
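A minimal sketch of such a check, assuming you want the script to stop early with a readable message instead of the UseMethod error. The helper name check_df is hypothetical; is.data.frame() is also TRUE for tibbles, so they pass:

```r
# Hypothetical helper: fail early with a clear message instead of
# "no applicable method for 'mutate'"
check_df <- function(x) {
  if (!is.data.frame(x)) {
    stop("expected a data frame, got an object of class ",
         paste(class(x), collapse = "/"), call. = FALSE)
  }
  invisible(x)
}

check_df(data.frame(condition = 0:3))  # passes silently
# check_df(c("0", "1"))                # would stop, reporting class character
```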
I have a variable with 75 levels that I would like to relabel. However, I find it difficult to do so without mislabeling a level.
As you know, creating a factor with labelled levels is done like this:
df$f <- factor(df$f, levels=c('a','b','c'),
labels=c('Treatment A','Treatment B','Treatment C'))
Is there a way to write this differently so that each label sits next to its level? I'm looking for code with this structure:
'a' = 'Treatment A'
'b' = 'Treatment B'
'c' = 'Treatment C'
Thanks in advance!
You could use a named vector for your level-label pairs and convert to a factor like so:
foo <- c("a", "c", "b")
rec <- c(
"a" = "Treatment A",
"b" = "Treatment B",
"c" = "Treatment C"
)
factor(foo, levels = names(rec), labels = rec)
#> [1] Treatment A Treatment C Treatment B
#> Levels: Treatment A Treatment B Treatment C
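With 75 levels, a value that has no entry in the named vector (for example because of a typo in a name) silently becomes NA. A small sketch that lists such values before building the factor; the extra value "d" is made up to show the failure case:

```r
foo <- c("a", "c", "b", "d")
rec <- c(
  "a" = "Treatment A",
  "b" = "Treatment B",
  "c" = "Treatment C"
)

# Values present in the data that have no label in `rec`
unmapped <- setdiff(unique(foo), names(rec))
unmapped  # "d"

f <- factor(foo, levels = names(rec), labels = rec)
f  # the "d" element becomes NA
```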
If you have a long list of equivalences, it's generally a good workflow to keep it in a separate file, e.g. icdcodes.csv containing:
code,descr
C00.0,Upper lip cancer
C00.1,Lower lip cancer
...
Then you could do:
codeinfo <- read.csv("icdcodes.csv")
factor(foo, levels = codeinfo$code, labels = codeinfo$descr)
Ideally, you could even get the ICD-10 descriptions straight from the CDC, although in practice this probably won't work for you because the CDC descriptions are longer than yours (e.g. C000 is "Malignant neoplasm of external upper lip", not "Upper lip cancer"). Also note that the CDC file has no dot separator in the codes (C000 rather than C00.0):
icd_url <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/ICD10CM/2022/icd10cm_codes_2022.txt"
codeinfo <- read.fwf(icd_url, widths = c(8,100))
names(codeinfo) <- c("code", "descr")
codeinfo$code <- trimws(codeinfo$code)
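If your own data uses dotted codes (C00.0) while the lookup comes from the CDC file (C000), stripping the dot first lets the two match. A sketch with a one-row stand-in for codeinfo (in practice you would use the data frame read above); my_codes is hypothetical:

```r
# One-row stand-in for the table read with read.fwf() above
codeinfo <- data.frame(code  = "C000",
                       descr = "Malignant neoplasm of external upper lip",
                       stringsAsFactors = FALSE)

# Hypothetical codes from your own data, written with the dot
my_codes <- c("C00.0", "C00.0")

# fixed = TRUE treats "." literally instead of as a regex wildcard
undotted <- gsub(".", "", my_codes, fixed = TRUE)

lab <- factor(undotted, levels = codeinfo$code, labels = codeinfo$descr)
lab
```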
I have a data frame that contains long strings. How can I separate them into different variables?
Sample data:
df <- structure(list(tx = c(" [1] Timepoint EGTMPT Categorical select one (nominal) 51 Screening",
" [2] N/A : O ff-Study EGTNA Categorical yes/no (dichotomous) 3",
" [3] Check if Not Done EGTMPTND Categorical yes/no (dichotomous) 3",
" [4] Date Performed ECGDT Date 11",
" [5] Time (24-hour format) ECGTM Time 5",
" [6] O verall ECG Interpretation ECGRES Categorical select one (nominal) 37 Normal"
)), row.names = c(NA, 6L), class = "data.frame")
It seems that the variables occupy fixed-width fields, so to find the field boundaries we do the following:
Manually separate one line:
vars = c(" [1] ", "Timepoint ", "EGTMPT ",
"Categorical select one (nominal) ", "51 ", "Screening")
Count the number of characters in each variable:
sizes = nchar(vars)  # nchar() is vectorised, so no loop is needed
Cumulatively sum those values and add a 1 (the starting point) at the beginning:
sizes = c(1, cumsum(sizes))
The result is:
> sizes
[1] 1 14 62 74 107 118 127
So the first variable goes from the 1st to the 14th position, etc. Now we just need to cut each line in those places:
df2 = character()
for(i in 2:length(sizes)){
df2 = cbind(df2, apply(df, 1, function(x){substr(x, sizes[i-1], sizes[i])}))}
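And lastly remove the padding around each field. trimws() strips only leading and trailing spaces, whereas gsub(" ", "", df2) would also delete the spaces inside fields such as "Categorical select one (nominal)":
df2 = trimws(df2)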
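The same cut can be done without the explicit loop, because substring() is vectorised over its start and end positions, and computing the start of each field as its end minus its width plus one avoids the one-character overlap between neighbouring fields. A minimal sketch on two made-up lines with hypothetical field widths 4, 6 and 3 (in practice the widths come from nchar() on a manually split line, as above):

```r
# Two made-up fixed-width lines: fields of width 4, 6 and 3
lines  <- c("[1] Date  11 ",
            "[2] Time  5  ")
widths <- c(4, 6, 3)

# Field i runs from starts[i] to ends[i], with no overlap between fields
ends   <- cumsum(widths)
starts <- ends - widths + 1

# One row per line, one column per field, padding trimmed
fields <- t(vapply(lines,
                   function(l) trimws(substring(l, starts, ends)),
                   character(length(widths)),
                   USE.NAMES = FALSE))
colnames(fields) <- c("id", "label", "size")
fields
```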
I would like to show a statistic (for this example, it would be very helpful to know how to do means and numbers of observations) broken down by two other variables, with one variable along the rows and the other along the columns.
I don't know whether R can rearrange the value labels (strings in this example) for "optimal" viewing in the resulting table, but that would be ideal. The string value labels might be really long, so that everything gets squished in the resulting table. If there really isn't a smart way and the answer is just judicious use of \n, that would be fine too.
Below is an example where means could be computed by group and type.
# Example data frame
df <- data.frame(
num = c(0.225802, 0.384, 0.583, 0.868, 0.3859, 0.58582, 0.9485802, 0.085802),
type = c("This is a description of type 1", "This is a description of type 2", "This is a description of type 3", "This is a description of type 4", "This is a description of type 1", "This is a description of type 2", "This is a description of type 3", "This is a description of type 4"),
group = c("This is a really long description for group A", "This is a really long description for group A", "This is a really long description for group A", "This is a really long description for group A", "This is a really long description for group B", "This is a really long description for group B", "This is a really long description for group B", "This is a really long description for group B")
)
Thanks!
You can do the following using data.table. Here I create a summary table containing the mean and the number of observations of num for each type/group pair.
Code
require(data.table)
setDT(df)
untypes = df[, unique(as.character(type))] # Unique type descr
ungroups = df[, unique(as.character(group))] # Unique group descr
types = c(1,2,3,4) # Short types in the order they appear in `untypes` (1 to 4)
groups = c('A', 'B') # Short groups in the order they appear in `ungroups` (A to B)
df[, stype := sapply(type, function(x) types[which(untypes == x)])] # Assign short notation type ID
df[, sgroup := sapply(group, function(x) groups[which(ungroups == x)])] # Assign short notation group ID
dcast(df[, .(Mean = mean(num), No = length(num)), .(stype, sgroup)], stype ~ sgroup, value.var = c('Mean', 'No')) # Create summary matrix
Result
stype Mean_A Mean_B No_A No_B
1: 1 0.225802 0.3859000 1 1
2: 2 0.384000 0.5858200 1 1
3: 3 0.583000 0.9485802 1 1
4: 4 0.868000 0.0858020 1 1
It is important that types and groups are declared such that their orders coincide with the corresponding orders of untypes and ungroups, respectively. For instance, if the long description of type 2 enters as the second observation in untypes, then types[2] must equal 2.
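One way to avoid keeping types and groups in sync by hand is to derive the short IDs with match() against unique(), which numbers each description in order of first appearance. A base-R sketch of the same summary using tapply() on the example data (rebuilt with rep() so it runs on its own); mapping groups to LETTERS assumes at most 26 groups. The legend wraps the long descriptions with strwrap(), which is one way to generate the \n breaks the question mentions; whether they display as line breaks depends on where the table is rendered.

```r
df <- data.frame(
  num   = c(0.225802, 0.384, 0.583, 0.868, 0.3859, 0.58582, 0.9485802, 0.085802),
  type  = rep(paste("This is a description of type", 1:4), times = 2),
  group = rep(paste("This is a really long description for group", c("A", "B")),
              each = 4)
)

# Short IDs numbered in order of first appearance, so they cannot drift
df$stype  <- match(df$type, unique(df$type))
df$sgroup <- LETTERS[match(df$group, unique(df$group))]

# Rows = short type ID, columns = short group ID
means  <- tapply(df$num, list(df$stype, df$sgroup), mean)
counts <- tapply(df$num, list(df$stype, df$sgroup), length)

# Legend mapping short type IDs back to the long descriptions,
# each description wrapped onto several lines with "\n"
wrap <- function(x, width = 20) {
  vapply(x, function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}
type_key <- unique(df[c("stype", "type")])
type_key$type <- wrap(type_key$type)

means
counts
```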
I have a data column that contains a bunch of ranges as strings (e.g. "2 to 4", "5 to 6", "7 to 8" etc.). I'm trying to create a new column that converts each of these values to a random number within the given range. How can I leverage conditional logic within my function to solve this problem?
I think the function should be something along the lines of:
df<-mutate(df, c2=ifelse(df$c=="2 to 4", sample(2:4, 1, replace=TRUE), "NA"))
This should produce a new column in my dataset that replaces every "2 to 4" with a random integer between 2 and 4. However, it isn't working: every value is replaced with "NA".
Ideally, given the column:
df$c <- c("2 to 4","2 to 4","5 to 6")
I would get a new column such as:
df$c2 <- c("3","2","5")
Does anyone have any idea how to do this?
We can split each string on "to", convert the two parts to integers, and then use sample() to pick one number from the range between them.
df$c2 <- sapply(strsplit(df$c1, "\\s+to\\s+"), function(x) {
vals <- as.integer(x)
sample(vals[1]:vals[2], 1)
})
df
# c1 c2
#1 2 to 4 2
#2 2 to 4 3
#3 5 to 6 5
data
df<- data.frame(c1 = c("2 to 4","2 to 4","5 to 6"), stringsAsFactors = FALSE)
We can also do this with sub(): replace " to " with ":", evaluate the result to get the sequence, and then sample one value from it.
df$c2 <- sapply(sub(" to ", ":", df$c1), function(x)
sample(eval(parse(text = x)), 1))
df
# c1 c2
#1 2 to 4 4
#2 2 to 4 3
#3 5 to 6 5
Or with gsubfn
library(gsubfn)
as.numeric(gsubfn("(\\d+) to (\\d+)", ~ sample(seq(as.numeric(x),
as.numeric(y), by = 1), 1), df$c1))
Or with read.csv/Map from base R
sapply(do.call(Map, c(f = `:`, read.csv(text = sub(" to ", ",", df$c1),
header = FALSE))), sample, 1)
data
df <- structure(list(c1 = c("2 to 4", "2 to 4", "5 to 6")),
class = "data.frame", row.names = c(NA, -3L))
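One caveat with sample(vals[1]:vals[2], 1) and the eval() variant: if a range has equal ends, e.g. "7 to 7", the sequence is the single number 7, and sample() then draws from 1:7 instead (the behaviour ?sample warns about). Relatedly, in the question's ifelse() attempt, sample(2:4, 1) is evaluated only once, so every matching row would get the same number. A sketch that avoids the single-value trap by drawing an offset with sample.int(); the "7 to 7" row is added for illustration:

```r
df <- data.frame(c1 = c("2 to 4", "2 to 4", "5 to 6", "7 to 7"),
                 stringsAsFactors = FALSE)

# Split "lo to hi" into its two ends
bounds <- strsplit(df$c1, "\\s+to\\s+")
lo <- as.integer(vapply(bounds, `[`, character(1), 1))
hi <- as.integer(vapply(bounds, `[`, character(1), 2))

# sample.int(n, 1) returns one of 1..n, so lo + draw - 1 lies in lo..hi;
# with n = 1 this gives lo rather than a draw from 1:lo
draws <- vapply(hi - lo + 1L, sample.int, integer(1), size = 1L)
df$c2 <- lo + draws - 1L
df
```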
I'm stuck on a simple problem with the ifelse function in R. I'm a new R user and I'm trying to fill blanks in one column ("first") depending on the values in another column ("second").
When I try ifelse, I get the error "argument "no" is missing, with no default".
All I need is: if the test (condition) is FALSE, keep the values (from a factor variable) in the "first column" as they are.
This is an example of my data frame, which has ~6000 obs:
#
# first second third
# 1 Cluster 1 Chest Pain 1
# 2 Coronary Artery Diseases 1
# 3 Cluster 6 Anemia 5
# 4 Cluster 7 Hypertension and Cerebrovascular Disease 4
# 5 Chronic Obstructive Pulmonary Disease 2
# 6 Cluster 5 Diabetes 10
My attempt:
sample$first= ifelse(sample$second=="Coronary Artery Diseases","Cluster 10",sample$first)
This fills in "Cluster 10" in the first column wherever the second column is "Coronary Artery Diseases", BUT all the remaining values in the first column turn into numbers. The problem is that "first" is a factor variable and I need it to stay a factor.
Any suggestions?
data
sample <- structure(list(first = c("Cluster 1", "", "Cluster 6", "Cluster 7",
"", "Cluster 5"), second = c("Chest Pain", "Coronary Artery Diseases",
"Anemia", "Hypertension and Cerebrovascular Disease",
"Chronic Obstructive Pulmonary Disease", "Diabetes"),
third = c("1", "1", "5", "4", "2", "10")),
.Names = c("first", "second","third"),
class = "data.frame", row.names = c(NA, -6L))
Because the first column was a factor (not shown in the data above), ifelse() returned its underlying integer codes for the non-matching rows, dropping the labels. This can be worked around with as.character():
sample$first <- as.factor(ifelse(sample$second == "Coronary Artery Diseases",
"Cluster 10", as.character(sample$first)))