R: Get index names while looping through df elements

Say I have a data frame and I need to do something with its cells and remember which cells I have changed. One way is to loop through the indices with two for-loops. But is there a way to do this with one loop?
Ideally, I need something like this:
changes = data.frame(Row = character(), Col = character())
for (cell in df){
  if (!(is.na(cell))){
    cell = do.smt(cell)
    temp = list(Row = get.row(cell), Col = get.col(cell))
    changes = rbind(changes,temp)
  }
}
Example of what I need:
df = data.frame(A = c(1,2,3), B = c(4,5,6), C = c(7,8,9))
rownames(df) = c('a','b','c')
changes = data.frame(Row = NA, Col = NA)
for (i in rownames(df)){
  for (j in colnames(df)) {
    if (df[i,j] > 5) {
      df[i,j] = 0
      temp = list(Row = i, Col = j)
      changes = rbind(changes, temp)
    }
  }
}

This gets rid of both loops
df = data.frame(A = c(1,2,3), B = c(4,5,6), C = c(7,8,9))
rownames(df) = c('a','b','c')
changes <- which(df > 5, arr.ind=TRUE)
df[changes] <- 0
If you want the format exactly as specified you can sort that out with
changes <- data.frame(changes,row.names=NULL)
changes$row <- rownames(df)[changes$row]
changes$col <- colnames(df)[changes$col]
It's a simple matter of sorting if you need the order of the rows to match your example output.
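Putting the pieces of this answer together, a minimal sketch could look like the following. The names mask and idx and the final order() call are assumptions added for illustration; sorting by row name and then by column name reproduces the order in which the nested loops from the question visit the cells.

```r
df <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), C = c(7, 8, 9))
rownames(df) <- c("a", "b", "c")

# Logical matrix marking the cells to change, and their (row, col) positions
mask <- df > 5
idx  <- which(mask, arr.ind = TRUE)

# Translate the numeric positions into row and column names
changes <- data.frame(Row = rownames(df)[idx[, "row"]],
                      Col = colnames(df)[idx[, "col"]],
                      stringsAsFactors = FALSE)

# Apply the change to all matching cells at once
df[mask] <- 0

# Optional: sort so the log matches the row-by-row order of the nested loops
changes <- changes[order(changes$Row, changes$Col), ]
rownames(changes) <- NULL
```

Note that which() walks the cells column by column, which is why the explicit sort is needed to match the loop output.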

Related

R - for loop only gives the last iteration result

I am trying to save the results of all iterations, but the for loop only gives me the result of the last iteration. Just like this:
l <- list(a = c(1, 3, 5), b = c(4, 8), c = 2)
df <- data.frame()
for (i in 1:length(l)) {
  s <- data.frame(name = names(l[i]),
                  value = mean(l[[i]]))
  out <- rbind(df, s)
}
This code returns this:
I need something like this:
How can I solve this?
Thanks in advance!
Your out variable only contains the result of the last iteration because out is overwritten in every iteration of the loop, while df stays empty the whole time.
Replace out with df, like so, and your expected result will be in the df variable:
l <- list(a = c(1, 3, 5), b = c(4, 8), c = 2)
df <- data.frame()
for (i in 1:length(l)) {
  s <- data.frame(name = names(l[i]),
                  value = mean(l[[i]]))
  df <- rbind(df, s)
}
df
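For what it's worth, the same table can probably be built without growing a data frame inside a loop. The sketch below is an alternative to the loop above, not part of the original answer; it assumes every element of l is a numeric vector, and the variable name means is made up for illustration.

```r
l <- list(a = c(1, 3, 5), b = c(4, 8), c = 2)

# One mean per list element; vapply checks that each result is a single number
means <- vapply(l, mean, numeric(1))

# Build the data frame in one step from the names and the means
df <- data.frame(name = names(means),
                 value = unname(means),
                 stringsAsFactors = FALSE)
df
```

This avoids calling rbind once per iteration, which tends to matter as the list gets longer.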

Subsetting everything but a given index in a list (R)

Suppose I have a list of data frames. I am iterating through the list and removing one item (aka one data frame) of the list, and then rbinding the remaining items (aka data frames) of the list to create one final dataframe.
How can I remove a given index from a list and keep the rest?
Thanks! Example code below:
testDF1 = data.frame(a = c(1,2,3,4,5), b = c(10,20,30,40,50))
testDF2 = data.frame(a = c(11,12,13,14,15), b = c(110,120,130,140,150))
testDF3 = data.frame(a = c(21,22,23,24,25), b = c(210,220,230,240,250))
testDF4 = data.frame(a = c(31,32,33,34,35), b = c(310,320,330,340,350))
testDF5 = data.frame(a = c(41,42,43,44,45), b = c(410,420,430,440,450))
myList = list(DF1 = testDF1, DF2 = testDF2, DF3 = testDF3, DF4 = testDF4, DF5 = testDF5)
for (i in 1:length(myList)) {
  chosenItem = myList[[i]]
  removedItemList = myList - chosenItem ## HELP HERE!!!!
  updatedList = do.call("rbind", removedItemList)
}
I just figured it out...
for (i in 1:length(myList)) {
  chosenItem = myList[[i]]
  removedItemList = myList[-i]  # a negative index drops element i and keeps the rest
  updatedList = do.call("rbind", removedItemList)
}
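The loop above overwrites updatedList on every pass, so only the last combination survives. If all of the leave-one-out data frames are needed, one possible sketch is to collect them in a list. The name leaveOneOut and the smaller three-element myList are assumptions made to keep the example short.

```r
myList <- list(DF1 = data.frame(a = 1:2, b = c(10, 20)),
               DF2 = data.frame(a = 3:4, b = c(30, 40)),
               DF3 = data.frame(a = 5:6, b = c(50, 60)))

# For each index i, drop element i with a negative index and rbind the rest
leaveOneOut <- lapply(seq_along(myList), function(i) {
  do.call(rbind, myList[-i])
})

# Label each result by the data frame that was left out
names(leaveOneOut) <- names(myList)

leaveOneOut$DF2  # rows of DF1 and DF3 only
```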

Populating a Data Frame with Characters in a For Loop R

Currently I have a loop that is adding rows from one data frame into another master data frame. Unfortunately, it converts the characters into numbers, but I don't want that. How can I get the following for loop to add the rows from one data frame into the master data frame while keeping the characters?
AnnotationsD <- data.frame(x = vector(mode = "numeric",
length = length(x)), type = 0, label = 0, lesion = 0)
x = c(1,2)
for(i in length(x)){
D = data.frame(x = i, type = c("Distance"),
label = c("*"), lesion = c("Wild"))
AnnotationsD[[i,]] <- D[[i]]
}
So what I would like to come out of this is:
x type label lesion
1 1 Distance * Wild
2 2 Distance * Wild
This should work:
x = c(1,2)
AnnotationsD <- data.frame(x = as.character(NA), type = as.character(NA),
                           label = as.character(NA), lesion = as.character(NA),
                           stringsAsFactors = FALSE)
for(i in 1:length(x)){
  D = c(x = as.character(i), type = as.character("Distance"),
        label = as.character("*"), lesion = as.character("Wild"))
  AnnotationsD[i,] <- D
}
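If the values are the same for every row, as in the question, the loop may not be needed at all. The sketch below is an alternative under that assumption: data.frame() recycles the single strings to the length of x, and stringsAsFactors = FALSE keeps them as characters. Unlike the loop version, x stays numeric here.

```r
x <- c(1, 2)

# Length-one strings are recycled to match the length of x
AnnotationsD <- data.frame(x = x,
                           type = "Distance",
                           label = "*",
                           lesion = "Wild",
                           stringsAsFactors = FALSE)
AnnotationsD
```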

randomize observations by groups (blocks) without replacement

This is a follow-up question. The answers to the previous question do the random sampling with replacement. How can I change the code so that I assign each observation to one of J "urns" without putting the observation back in the 'lottery'?
This is the code I have right now:
set.seed(9782)
I <- 500
g <- 10
library(dplyr)
anon_id <- function(n = 1, lenght = 12) {
randomString <- c(1:n)
for (i in 1:n)
{
randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
lenght, replace = TRUE),
collapse = "")
}
return(randomString)
}
df <- data.frame(id = anon_id(n = I, lenght = 16),
group = sample(1:g, I, T))
J <- 3
p <- c(0.25, 0.5, 0.25)
randomize <- function(data, urns=2, block_id = NULL, p=NULL, seed=9782) {
if(is.null(p)) p <- rep(1/urns, urns)
if(is.null(block_id)){
df1 <- data %>%
mutate(Treatment = sample(x = c(1:urns),
size = n(),
replace = T,
prob = p))
return(df1)
}else{
df1 <- data %>% group_by_(block_id) %>%
mutate(Treatment = sample(x = c(1:urns),
size = n(),
replace = T,
prob = p))
}
}
df1 <- randomize(data = df, urns = J, block_id = "group", p = p, seed = 9782)
If I change replace = T to replace = F I get the following error:
Error: cannot take a sample larger than the population when 'replace = FALSE'
Clarification of my objective:
Suppose that I have 10 classrooms (or villages, or something like that). To keep it simple, suppose each classroom has 20 students (in reality they will have N_j). Classroom by classroom, I want to assign each student to one of J groups, for example J=3. The vector p gives the fraction that will be assigned to each group, for example 25% to group 1, 40% to group 2 and 35% to group 3.
This solution is based on @Frank's comment. I created one function that does the randomization for block j and another that calls that function for every block.
randomize_block <- function(data, block=NULL, block_name=NULL, urns, p, seed=9782) {
set.seed(seed)
if(!is.null(block)) {
condition <- paste0(block_name,"==",block)
df <- data %>% filter_(condition)
} else df <- data
if(is.null(p)) p <- rep(1/urns, urns)
N <- nrow(df)
Np <- round(N*p,0)
if(sum(Np)!=N) Np[1] <- N - sum(Np[2:length(Np)])
Urns = rep(seq_along(p), Np)
Urns = sample(Urns)
df$urn <- Urns
return(df)
}
randomize <- function(data, block_name=NULL, urns, p, seed=9782) {
if(is.null(p)) p <- rep(1/urns, urns)
if(!is.null(block_name)){
blocks <- unique(data[,block_name])
df <- lapply(blocks, randomize_block,
data = data,
block_name=block_name,
urns = urns,
p = p,
seed=seed)
return(data.table::rbindlist(df))
}else {
df <- randomize_block(data = data,
urns = urns, p = p,
seed=seed)
}
}
test <- randomize(data = df, block_name = "group",
urns = 3, p = c(0.25, 0.5, 0.25),
seed=4222016)
I'm trying to figure out if it is possible to use dplyr to do this, alternative solutions implementing that are more than welcome!
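As one possible alternative, the per-block assignment can also be sketched in base R with ave(). This is not the dplyr version asked for, and the helper assign_urns is a hypothetical name introduced here. It builds the exact number of labels for each urn in a block (rounding by largest remainder, a different rule from the Np[1] adjustment above) and then shuffles them, so the assignment is without replacement. The toy data frame with three groups of 20 is also an assumption.

```r
# Hypothetical helper: urn labels for one block of n observations,
# with counts as close to n * p as possible and summing exactly to n.
assign_urns <- function(n, p) {
  stopifnot(abs(sum(p) - 1) < 1e-8)
  exact    <- n * p
  counts   <- floor(exact + 1e-9)      # small tolerance for floating point
  leftover <- n - sum(counts)
  if (leftover > 0) {
    # give the remaining slots to the urns with the largest remainders
    top <- order(exact - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[top] <- counts[top] + 1
  }
  labels <- rep(seq_along(p), times = counts)
  labels[sample.int(n)]                # shuffle: sampling without replacement
}

set.seed(9782)
df <- data.frame(id = seq_len(60), group = rep(1:3, each = 20))
p  <- c(0.25, 0.4, 0.35)

# ave() applies the helper separately within each group
df$urn <- ave(df$id, df$group,
              FUN = function(ids) assign_urns(length(ids), p))

table(df$group, df$urn)
```

With 20 observations per group and these fractions, each group should get 5, 8 and 7 observations in urns 1, 2 and 3.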
My answer to your other question is without replacement, as can be seen below:
block_rand <- as.tibble(randomizr::block_ra(blocks = df$group, conditions = c("urn_1","urn_2","urn_3")))
df2 <- as.tibble(bind_cols(df, block_rand))
df2 %>% janitor::tabyl(group, value)
df2 %>%
group_by(id) %>%
filter(n()>1) %>%
str()

R function to combine lists but prioritize the values in one of them

I'm trying to make a function to combine multiple lists, usually between 2 and 4, that will weed out duplicates and hopefully (if possible) prioritize the values of one of the lists. Is this possible? It's better explained with code:
PassOpts <- function(in1 = list(), in2 = list(), in3 = list(), in4 = list()){
c(in1, in2, in3, in4)
}
opts1 <- list(a = 1, b = 2, c = 4)
opts2 <- list(a = 1, b = 2, c = 4)
opts3 <- list(a = 5, b = 10)
combinedOpts <- PassOpts(opts1, opts2, opts3)
Ideally what I want is for it to be possible to 'prioritize' the list that is the most different from the rest, so in this case I would want for combinedOpts to be a list of a = 5, b = 10, c = 4. I'm using it as a way to set and combine default and also user input options.
Thanks
Solved: I realized that the latest input (in3 when there are three inputs) is the one I want to take priority, so I ended up doing the following:
PassOpts <- function(in1 = list(), in2 = list(), in3 = list(), in4 = list()){
if(length(in4) != 0){
in4Names <- names(in4)
rList <- in4
temp <- c(in1,in2,in3)
tempNames <- names(temp)
for(i in 1:length(tempNames)){
nam <- tempNames[i]
if(!(nam %in% in4Names)){
in4Names <- c(in4Names,nam)
rList[nam] <- temp[nam]
}
}
}else if(length(in3) != 0){
in3Names <- names(in3)
rList <- in3
temp <- c(in1,in2)
tempNames <- names(temp)
for(i in 1:length(tempNames)){
nam <- tempNames[i]
if(!(nam %in% in3Names)){
in3Names <- c(in3Names, nam)
rList[nam] <- temp[nam]
}
}
}else if(length(in2) != 0){
in2Names <- names(in2)
rList <- in2
temp <- in1
tempNames <- names(temp)
for(i in 1:length(tempNames)){
nam <- tempNames[i]
if(!(nam %in% in2Names)){
in2Names <- c(in2Names, nam)
rList[nam] <- temp[nam]
}
}
}else{
return(in1)
}
return(rList)
}
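Since the latest input takes priority here, a shorter way to get a similar effect might be utils::modifyList folded over the inputs with Reduce. This is only a sketch, and mergeOpts is a hypothetical name. One difference from the function above is worth stating: here every later list overrides every earlier one, whereas the function above lets the earliest list win among the lower-priority inputs. modifyList also merges nested lists recursively.

```r
# Hypothetical helper: later arguments override earlier ones, name by name
mergeOpts <- function(...) {
  Reduce(utils::modifyList, list(...), list())
}

opts1 <- list(a = 1, b = 2, c = 4)
opts2 <- list(a = 1, b = 2, c = 4)
opts3 <- list(a = 5, b = 10)

combinedOpts <- mergeOpts(opts1, opts2, opts3)
str(combinedOpts)
```

For the example inputs this gives a = 5, b = 10, c = 4, which is the result asked for in the question.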
It looks like you are looking for the least frequent value for each key.
Here is how I would do it:
1. Aggregate the input lists by key
2. Find the least frequent value for each key
PassOpts <- function(listOfList){
  resList = list()
  # reduce lists by key
  for (l in listOfList){
    for (i in 1:length(l)){
      key = names(l[i])
      value = l[[i]]
      resList[[key]] = c(resList[[key]], value)
    }
  }
  # find the least frequent value(s) for each key
  findDiff <- function(elements){
    countTable = table(elements)
    minCount = min(countTable)
    return(names(countTable)[countTable == minCount])
  }
  return(lapply(resList, FUN=findDiff))
}
opts1 <- list(a = 1, b = 2, c = 4)
opts2 <- list(a = 1, b = 2, c = 4)
opts3 <- list(a = 5, b = 10)
combinedOpts <- PassOpts(list(opts1, opts2, opts3))
