I have a data frame my_df as follows:
my_df <- data.frame(H = c("0600", "0602", "0603"))
Now I need to write an ifelse statement that calculates more variables and appends them to a new data frame.
I cannot work out how to put multiple executable statements inside ifelse and append the calculated variables to a new data frame.
Below is my code for the ifelse statement.
with(my_df, ifelse(my_df$H == "0600",{d$D <- 1+1 & d$c <- "0600"},
ifelse(my_df$H == "0602",{d$D <- 2+1 & d$c <- "0602"},
{ d$D <- 3+1 & d$c <- "0603"}
)))
I can append values to the new data frame when there is only one statement inside the ifelse, i.e. with only {d$D <- 1+1} it works perfectly, but it fails when I have multiple statements to execute.
My output data frame should look like this:
D C
2 0600
3 0602
4 0603
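Your syntax for ifelse is off: ifelse returns a vector of values chosen element by element, so it is not the place to run assignment statements. I would recommend using case_when from the dplyr library here and assigning each column separately:
library(dplyr)
# Start from an empty data frame with one row per row of my_df
d <- data.frame(row.names = seq_len(nrow(my_df)))
d$D <- case_when(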
my_df$H == "0600" ~ 1+1,
my_df$H == "0602" ~ 2+1,
TRUE ~ 3+1
)
d$c <- case_when(
my_df$H == "0600" ~ "0600",
my_df$H == "0602" ~ "0602",
TRUE ~ "0603"
)
You could also use ifelse, but you would need nested calls to ifelse and it probably would not look good, or be very maintainable.
Using a list:
# My data frame
my_df <- data.frame(H = c("0600", "0602", "0603"))
# My list to be used as a lookup
my_list <- list("0600" = c(D = 2, C = "0600"),
"0602" = c(D = 3, C = "0602"),
"0603" = c(D = 4, C = "0603"))
# Find corresponding values for 'H'
# Then bind into a data frame (bind_rows comes from dplyr)
library(dplyr)
do.call(bind_rows, my_list[my_df$H])
Result:
# A tibble: 3 x 2
# D C
# <chr> <chr>
# 1 2 0600
# 2 3 0602
# 3 4 0603
Using Base R
my_df <- data.frame("C" = c("0600", "0602", "0603"))
my_df$D <- ifelse(my_df$C=="0600",2,ifelse(my_df$C=="0602",3,ifelse(my_df$C=="0603",4,NA)))
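If the mapping grows beyond a few cases, a small lookup table combined with match() may be easier to maintain than nested ifelse calls. The following is only a minimal base R sketch; the names lookup and idx are illustrative and do not come from the original question.

```r
my_df <- data.frame(H = c("0600", "0602", "0603"), stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per code, with the value to assign
lookup <- data.frame(H = c("0600", "0602", "0603"),
                     D = c(1 + 1, 2 + 1, 3 + 1),
                     stringsAsFactors = FALSE)

# Position of each my_df$H value in the lookup table (NA if the code is absent)
idx <- match(my_df$H, lookup$H)

# Build the new data frame from the matched rows
d <- data.frame(D = lookup$D[idx], C = lookup$H[idx], stringsAsFactors = FALSE)
d
```

Unlike the list-based version, this keeps D numeric, and adding a new code only means adding a row to lookup.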
I have a data frame which I would like to group according to the value in a given row and column of the data frame
my_data <- data.frame(matrix(ncol = 3, nrow = 4))
colnames(my_data) <- c('Position', 'Group', 'Data')
my_data[,1] <- c('A1','B1','C1','D1')
my_data[,3] <- c(1,2,3,4)
grps <- list(c('A1','B1'),
c('C1','D1'))
grp.names = c("Control", "Exp1", "EMPTY")
my_data$Group <- case_when(
my_data$Position %in% grps[[1]] ~ grp.names[1],
my_data$Position %in% grps[[2]] ~ grp.names[2]
)
OR
my_data$Group <- with(my_data, ifelse(Position %in% grps[[1]], grp.names[1],
ifelse(Position %in% grps[[2]], grp.names[2],
grp.names[3])))
These examples work and produce a Group column with appropriate labels; however, I need the length of the grps list to be flexible, from 1 to approximately 25.
I see no way to iterate through case_when or ifelse in a for loop, e.g.
my_data$Group <- for (i in 1:length(grps)){
case_when(
my_data$Well %in% grps[[i]] ~ grp.names[i])
}
This example simply deletes the Group column.
What is the most appropriate way to handle a grps list of variable length?
I believe your question implies that grps is a list in which every element is a vector holding all the positions that belong to that group.
Specifically, with the grps variable below, if the Position is "A1" or "B1" it belongs to whatever your first entry in grp.names is. Similarly, if the Position is "C1" or "D1" it belongs to whatever your second entry in grp.names is.
> grps
[[1]]
[1] "A1" "B1"
[[2]]
[1] "C1" "D1"
Assuming that to be the case you can do the following:
matching_group_df <- sapply(grps, function(x){ my_data$Position %in% x})
selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})
my_data$Group <- grp.names[selected_group]
Position Group Data
1 A1 Control 1
2 B1 Control 2
3 C1 Exp1 3
4 D1 Exp1 4
The way it works is as follows:
matching_group_df is a matrix of True/False (created via the sapply function) that specifies what group index the position belongs to:
> matching_group_df
[,1] [,2]
[1,] TRUE FALSE
[2,] TRUE FALSE
[3,] FALSE TRUE
[4,] FALSE TRUE
You then select the column that has the TRUE value row by row using an apply command:
selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})
> selected_group
[1] 1 1 2 2
Finally, you pass those indices to your grp.names vector to select the appropriate names and assign them to a column of your original data frame.
grp.names[selected_group]
[1] "Control" "Control" "Exp1" "Exp1"
This also has the small side benefit of just using base R functions if that is important to you.
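One caveat: if a position belongs to no group, which() returns a zero-length result for that row, so apply may return a list instead of an integer vector and the final indexing step would fail. A possible base R variant that tolerates unmatched positions is sketched below. The extra "E1" row and the choice of "EMPTY" as the fallback label are assumptions made for illustration.

```r
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")

# Hypothetical data with an extra position ("E1") that is in no group
my_data <- data.frame(Position = c("A1", "B1", "C1", "D1", "E1"),
                      Data = 1:5, stringsAsFactors = FALSE)

# Named vector: names are positions, values are the matching group names
lookup <- setNames(rep(grp.names[seq_along(grps)], lengths(grps)),
                   unlist(grps))

# Index by position; positions missing from the lookup come back as NA
my_data$Group <- unname(lookup[my_data$Position])

# Assumed fallback label for positions that matched no group
my_data$Group[is.na(my_data$Group)] <- "EMPTY"
my_data
```

Because the lookup is built from grps and grp.names directly, the same code works whether grps has 1 element or 25.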
Approach 1: Hash table
Because group makeup might change during analysis, I would opt for a different approach here: a lookup table of key-value pairs plus a small accessor function.
library(tidyverse)
# First, a small adjustment to `grps` to reflect an empty group.
grps <- list(c('A1','B1'),
c('C1','D1'),
NULL)
names <- unlist(grps, use.names = FALSE)
values <- rep(grp.names, map_dbl(grps, length))
h = as.list(values) %>%
set_names(names) %>%
list2env()
# find x in h
f <- Vectorize(function(x) h[[x]], c("x")) # scoping here
This takes some time to set up, but usage is quite convenient:
my_data %>%
mutate(Group = f(Position))
Position Group Data
1 A1 Control 1
2 B1 Control 2
3 C1 Exp1 3
4 D1 Exp1 4
This avoids having to change your code in multiple places, and can handle an arbitrary number of groups.
Approach 2: Dynamic switch
Alternatively, we can make an arbitrary length switch expression, building it from the group names and their unique values.
constructor <- function(ids, names){
purrr::imap_chr(as.character(ids), ~paste(paste0("\"", .x ,"\""),
paste0("\"", names[.y], "\""),
sep = "=")) %>%
paste0(collapse = ", ") %>%
paste0("Vectorize(function(x) switch(as.character(x), ", ., ", NA))", collapse = "") %>%
str2expression()
}
my_data %>%
mutate(Group = eval(constructor(names, values))(Position)) # eval() returns a function, so call it on Position
In this case, it would evaluate the expression
expression(Vectorize(function(x) switch(as.character(x), A1 = "Control",
B1 = "Control", C1 = "Exp1", D1 = "Exp1",
NA)))
For each item in my_data$Position, go through each element of grps, look for a match, and if one is found assign the corresponding entry of grp.names. If no match is found in any group, assign grp.names[3] (the pipe at the end requires magrittr or dplyr to be loaded):
my_data$Group <- lapply(my_data$Position, function(position){ # Goes through each my_data$Position
for(i in 1:length(grps)){
if(position %in% grps[[i]]){
return(grp.names[i]) # Give matching index of grp.names to grps
} else if (i == length(grps)){ # if no matches assign grp.names[3]
return(grp.names[3])
}
}
}) %>% unlist() # Put the list into a vector
This is my first attempt at writing a function. It is supposed to split a string into several pieces and return each piece in its own tibble row.
For example, let's say I have this kind of data:
nasty_entry <- tibble(ID = 1:3, Var = c("ABC", "AB", "A"))
I would like to get this:
nice_entry <- tibble(ID = c(1, 1, 1, 2, 2, 3), var = c("A", "B", "C", "A", "B", "A"))
So I tried to write a function using different kinds of loops (for practice), because my original data has about 300 entries.
nice_entry <- function(data, var, pattern)
#--------------------DECLARATION--------------------#
# data : The tibble containing the data to split.
# var : The variable containing the data to split.
# pattern : The pattern to use for the spliting.
if(!require(tidyverse)){install.packages("tidyverse")}
library(tidyverse)
if(!require(magrittr)){install.packages("tidyverse")}
library(magrittr)
c1 <- 0 # Reset the counter #1
c2 <- 0 # Reset the counter #2
unchanged_rows <- 0 # The number of rows that has been unchanged.
changed_rows <- 0 # The number of rows that has been changed.
new_data <- tibble() # The tibble where the data will be stored.
repeat{
c1 <- c1 +1 # Increase the counter #1 by one at each loop.
c2 <- 0 # Reset the counter #2 at each loop.
# Split the string into several strings.
splited_str <- str_split(string = data %>% select({{ var }}) %>% slice(c1), pattern = pattern) %>%
unlist()
# Add the row into the "new_data" variable if the original string hasn't been splited.
if(length(splited_str) <= 1) {
unchanged_rows <- unchanged_rows +1
new_data <- new_data %>%
bind_rows(slice(data, c1))
next
}
# Duplicate the row of the original string. It duplicates it several times according to the
# number of times the original string has been splited.
if(length(splited_str) > 1){
changed_rows <- changed_rows +1
duplicated_rows <- data %>%
slice(rep(c1, each = length(splited_str)))
# Replace each original string with the new splited strings.
while (c2 < length(splited_str)) {
c2 <- c2 +1
duplicated_rows <- duplicated_rows %>%
mutate({{ var }} = replace(x = {{ var }}, list = c2, values = splited_str[c2]))
new_data <- new_data %>%
bind_rows(slice(duplicated_rows, c2))
}
}
# Break the loop if the entire tibble has been analyse and return the "new_data" variable.
if(c1 == length(nrow(data))) {
break
return(new_data)
}
}
}
I tried the same code using "real variables" inside the loops and it seems to work. The problem comes when I wrap it in the function. I get this error:
Error: object 'c1' not found
}
Error: unexpected '}' in " }"
}
Error: unexpected '}' in "}"
What am I doing wrong? Maybe it's an indexing problem?
I would also appreciate some advice on writing functions, and on whether there are alternative ways to do the same thing.
Thank you very much!
Mathieu
Here is another approach you may want to try:
library(tidyverse)
nasty_entry2 <- nasty_entry %>%
mutate(Var = strsplit(as.character(Var), "")) %>%
tidyr::unnest(Var)
# A tibble: 6 x 2
# ID Var
# <int> <chr>
# 1 1 A
# 2 1 B
# 3 1 C
# 4 2 A
# 5 2 B
# 6 3 A
We can use separate_rows. Specify a regex lookaround that matches the position between two characters. The . in a regex matches any character, so this splits between every pair of adjacent characters.
library(dplyr)
library(tidyr)
nasty_entry %>%
separate_rows(Var, sep="(?<=.)(?=.)")
# A tibble: 6 x 2
# ID Var
# <int> <chr>
#1 1 A
#2 1 B
#3 1 C
#4 2 A
#5 2 B
#6 3 A
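Since the question asked for a reusable function, the same idea can also be wrapped up in base R without explicit loops. This is only a sketch: the name split_rows and its arguments are hypothetical, and it uses a plain data frame rather than a tibble to stay dependency-free.

```r
# Hypothetical helper: split the strings in column `var` on `pattern`
# and return one row per piece, repeating the other columns as needed.
split_rows <- function(data, var, pattern = "") {
  # List with one character vector of pieces per original row
  pieces <- strsplit(as.character(data[[var]]), pattern)
  # Repeat each row index once per piece found in that row
  out <- data[rep(seq_len(nrow(data)), lengths(pieces)), , drop = FALSE]
  # Overwrite the column with the flattened pieces
  out[[var]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

nasty_entry <- data.frame(ID = 1:3, Var = c("ABC", "AB", "A"),
                          stringsAsFactors = FALSE)
nice_entry <- split_rows(nasty_entry, "Var")
nice_entry
```

An empty pattern splits into single characters. Any other separator, such as ",", can be passed as pattern.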
I am trying to perform an operation similar to the Excel formula found below:
IF(COUNTIF(RANGE, CRITERIA), "FOUND", "MISSING")
I want to add a new column to my data frame containing "found" or "missing". I understand that in R I can use %in%, for example:
A$C %in% C$B
to find whether the values in column C of the A data frame exist in column B of the C data frame. However, I do not know how to turn those results into a conditional that writes found or missing to a new column in the correct row.
Here is an example of the dataframes:
A <- data.frame("C" = c(3,5,9,21,25), "D" = 1:5)
C <- data.frame("B" = c(3,6,21,22,8) , "F" = 10:14)
A$C %in% C$B
A[A$C %in% C$B,]
Based on the limited information:
library(dplyr)
lookup_list <- c(1:3)
x <- c('a','b','c')
y <- c(10, 3, 5)
df <- data.frame(x,y)
x y
1 a 10
2 b 3
3 c 5
df <- df %>%
mutate(status = case_when(
y %in% lookup_list ~ 'FOUND',
!y %in% lookup_list ~ 'MISSING'
))
x y status
1 a 10 MISSING
2 b 3 FOUND
3 c 5 MISSING
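Applied to the data frames from the question, the same test can be written in base R with a single ifelse call. This is a sketch; the column name status is an illustrative choice.

```r
A <- data.frame(C = c(3, 5, 9, 21, 25), D = 1:5)
C <- data.frame(B = c(3, 6, 21, 22, 8), F = 10:14)

# %in% gives one TRUE/FALSE per row of A; ifelse maps it to a label
A$status <- ifelse(A$C %in% C$B, "FOUND", "MISSING")
A
```

Here ifelse is used the way it is designed to work: it returns a vector of the same length as the test, which is then assigned to the new column.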
I need to create a function that calculates means following a specific rule, without using the apply or aggregate functions. I have 3 variables, and I would like to calculate, in the same function, first the mean of var3 for each change in var2 and second the mean of var3 for each change in var1. Is this possible? My code is:
Variable 1
var1 <- sort(rep(LETTERS[1:3],10))
Variable 2
var2 <- rep(1:5,6)
Variable 3
var3 <- rnorm(30)
Create data frame
DB<-NULL
DB<-cbind(var1,var2,as.numeric(var3))
head(DB)
Function to calculate the mean following a rule
mymean <- function(x, db=DB){
for (1:length(db[,1])){
if (db[,[i]] != db[,[i]]) {
mean(db[,[i]])
}
else (db[,[i]] == db[,[i]]) {
stop("invalid rule")
}}
This is where the problems start; it doesn't work.
Thanks
Alexandre
It appears that you want to obtain means by groups.
To do this I would use the dplyr package
library(dplyr)
db <- data.frame(var1 = sort(rep(LETTERS[1:3],10)), var2=rep(1:5,6), var3=rnorm(30))
db %>%
group_by(var1) %>%
summarise(mean_over_va1 = mean(var3))
var1 mean_over_va1
1 A 0.07314416
2 B -0.05983557
3 C -0.03592565
db %>%
group_by(var2) %>%
summarise(mean_over_va2 = mean(var3))
var2 mean_over_va2
1 1 -0.4512942044
2 2 -0.1331316802
3 3 0.0821958902
4 4 -0.0001081054
5 5 0.4646429921
From your comments, however, it appears that you don't want to use any base R commands like apply and aggregate, so I assume you may not like the above solution.
If I had to do this with brute force, I would do something like this:
db <- data.frame(var1 = sort(rep(LETTERS[1:3],10)), var2=rep(1:5,6), var3=rnorm(30), stringsAsFactors = FALSE)
#Obtaining Groups
group1 <- unique(db$var1)
group2 <- unique(db$var2)
#Obtaining Number of Different types of groups so I dont have to keep calling length
N1 <- length(group1)
N2 <- length(group2)
#Preallocating, not necessary but a good habit
res1 <- data.frame(group = group1, mean = rep(NA, N1))
res2 <- data.frame(group = group2, mean = rep(NA, N2))
#Looping over the group members rather than each row of data. I like this approach because it relies more heavily on sub-setting than it does on iteration, which is always a good idea in R.
for (i in seq(1, N1)){
res1[i,"mean"] <- mean(db[db$var1%in%group1[i], "var3"])
}
for (i in seq(1, N2)){
res2[i,"mean"] <- mean(db[db$var2%in%group2[i], "var3"])
}
res <- list(res1, res2)
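The two loops above differ only in the grouping column, so they could be folded into one function, which is closer to what the question asked for. A minimal sketch follows; the function name group_means and its arguments are hypothetical.

```r
# Hypothetical helper: mean of `value_col` for each distinct value of `group_col`,
# using only subsetting and a for loop (no apply or aggregate).
group_means <- function(db, group_col, value_col = "var3") {
  groups <- unique(db[[group_col]])
  # Preallocate the result, one row per group
  res <- data.frame(group = groups, mean = NA_real_)
  for (i in seq_along(groups)) {
    in_group <- db[[group_col]] == groups[i]
    res$mean[i] <- mean(db[[value_col]][in_group])
  }
  res
}

set.seed(1) # only to make the example reproducible
db <- data.frame(var1 = sort(rep(LETTERS[1:3], 10)),
                 var2 = rep(1:5, 6),
                 var3 = rnorm(30),
                 stringsAsFactors = FALSE)

# Means by var2 first, then by var1, as in the question
res <- list(by_var2 = group_means(db, "var2"),
            by_var1 = group_means(db, "var1"))
```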
I'm trying to identify the values in a data frame that do not match, but can't figure out how to do this.
# make data frame
a <- data.frame( x = c(1,2,3,4))
b <- data.frame( y = c(1,2,3,4,5,6))
# select only values from b that are not in 'a'
# attempt 1:
results1 <- b$y[ !a$x ]
# attempt 2:
results2 <- b[b$y != a$x,]
If a = c(1,2,3) this works, because the length of b is then a multiple of the length of a. However, I'm trying to select all the values from data frame b that are not in a, and don't understand what function to use.
If I understand correctly, you need the negation of the %in% operator. Something like this should work:
subset(b, !(y %in% a$x))
> subset(b, !(y %in% a$x))
y
5 5
6 6
Try the set difference function setdiff. So you would have
results1 = setdiff(a$x, b$y) # elements in a$x NOT in b$y
results2 = setdiff(b$y, a$x) # elements in b$y NOT in a$x
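The argument order matters, and setdiff also removes duplicates, which plain %in% indexing does not. The sketch below illustrates both points with the data from the question. The vector y_dup is an added example, not part of the original.

```r
a <- data.frame(x = c(1, 2, 3, 4))
b <- data.frame(y = c(1, 2, 3, 4, 5, 6))

in_a_not_b <- setdiff(a$x, b$y)    # empty: every value of a$x is in b$y
in_b_not_a <- setdiff(b$y, a$x)    # 5 and 6

# Equivalent logical indexing
kept <- b$y[!(b$y %in% a$x)]

# Hypothetical vector with a repeated value, to show the difference
y_dup <- c(1, 5, 5, 6)
setdiff(y_dup, a$x)                # duplicates dropped
y_dup[!(y_dup %in% a$x)]           # duplicates kept
```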
You could also use dplyr for this task. To find what is in b but not a:
library(dplyr)
anti_join(b, a, by = c("y" = "x"))
# y
# 1 5
# 2 6