For loop to extract data - R

I have a data set with these variables (Branch, Item, Sales, Stock). I need to make a for loop that extracts the data meeting the following conditions:
The same item which has
1- different branches
2- sales higher than the stock
and save the result in a data frame.
The code I used is
trials <- sample_n(Data_with_stock,1000)
for (i in 1:nrow(trials))
{
if(trials$sales[i] > trials$stock[i] & trials$item[i] == trials$item[i+1] & trials$branch[i] != trials$branch[i+1])
{s <-data.frame( (trials$NAME[i])
,(trials$branch[i]))
}
}

I suggest you use the dplyr library. After installing it, and assuming "df" is your data set, use the commands below for questions 1 and 2.
Question 1
question_one = df %>%
  group_by(Item) %>%
  summarise(No_of_branches = n_distinct(Branch))
items_with_more_than_one_branch = question_one[which(question_one$No_of_branches > 1), "Item"]
Question 2: Similarly,
question_two = df %>%
  group_by(Item) %>%
  summarise(Stock_Val = sum(Stock), Sales_Val = sum(Sales))
item_with_sales_greater_than_stock = question_two[which(question_two$Sales_Val > question_two$Stock_Val), "Item"]
I couldn't help but solve it this way, even though it can be done without dplyr; if you haven't used it yet, I suggest dplyr, as it will always be useful for data crunching.
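For the combined requirement in the question (the same item appearing in more than one branch, with sales higher than stock), a minimal sketch along the same lines - using the column names from the question - could be:
library(dplyr)
result <- df %>%
  group_by(Item) %>%
  summarise(No_of_branches = n_distinct(Branch),
            Sales_Val = sum(Sales),
            Stock_Val = sum(Stock)) %>%
  filter(No_of_branches > 1, Sales_Val > Stock_Val)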

As you just want to fix your code:
You never define s before the loop, the assignment inside the loop overwrites s on every iteration (so only the last match survives), and with i + 1 the last iteration reads past the end of the data. Use:
trials <- sample_n(Data_with_stock, 1000)
# first define s, which is filled inside the loop
s <- matrix(NA, nrow = nrow(trials), ncol = 2) # as you only save 2 things in s per iteration
for (i in 1:(nrow(trials) - 1)) { # stop one row early, because the loop looks at row i + 1
  # I don't get why you compare row i with row i + 1, though - this only catches matches in consecutive rows
  if (trials$sales[i] > trials$stock[i] & trials$item[i] == trials$item[i + 1] & trials$branch[i] != trials$branch[i + 1]) {
    s[i, ] <- cbind(trials$NAME[i], trials$branch[i])
  } else {
    s[i, ] <- NA # just to keep the index i aligned; you can drop the NA rows afterwards with na.omit()
  }
}
s <- na.omit(s)

Related

Removing/Filtering rows based on condition in R

I would like to remove participants who scored 2 in column EXP_MAN and 1 in Ethnicity_Rescuer.
I used the following code and it worked.
mydata <- mydata %>%
  mutate(to_exclude = case_when(EXP_MAN == 2 & Ethnicity_Rescuer == 1 ~ 1)) %>%
  mutate(to_exclude = replace(to_exclude, is.na(to_exclude), 0))
mydata <- mydata %>% filter(to_exclude == 0)
However, this code seems very complicated and I am sure there should be a simpler solution.
I tried to filter out participants with the below code but it did not work. Just wondering what is the simplest code for removing participants in this case.
mydata <- mydata %>% filter(EXP_MAN != 2 & Ethnicity_Rescuer !=1)
You can use the subset function to select a data set and one/several conditions. Note that to drop only the rows where both conditions hold at the same time, you have to negate the combined condition; your filter attempt with != and & removes every row that matches either condition, which is why it did not work:
subset(mydata, !(EXP_MAN == 2 & Ethnicity_Rescuer == 1))
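The same condition also works with dplyr's filter(), which is what the question attempted; a minimal sketch using the column names from the question:
library(dplyr)
mydata <- mydata %>% filter(!(EXP_MAN == 2 & Ethnicity_Rescuer == 1))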

Add an incremental counter to a dataframe in R

I have a data frame composed of 3 columns, and the third column ('V3') contains the names of the markers that were presented during the experiment.
I'd like to add another column that tells me, for each marker, how many times that specific marker was presented before that instance.
I have done it in R without using the tidyverse, but it is quite time consuming, and I was wondering if you could help me do it using the tidyverse.
At the moment my script goes like this:
data$counter <- NA
x<-1
for (i in 1:nrow(data)) {
  if (str_detect(data$V3[i], 'NEGATIVE') == TRUE) {
    data$counter[i] <- x
    x <- x + 1
  }
}
I think cumsum will do an excellent job here:
testdata <- data.frame(V3=sample(c("NEGATIVEsomething","others"),50,replace=TRUE), stringsAsFactors = FALSE)
negatives <- grepl("NEGATIVE",testdata$V3)
negatives <- as.numeric(negatives)
negatives <- cumsum(negatives)
negatives[negatives == 0] <- NA
testdata$counter <- negatives
Edit: Since you want to increment the counter after finding a "NEGATIVE" and place the old counter at the position, you should use
negatives <- cumsum(negatives)-1
and then remove the 0 and -1 counts at the beginning:
negatives[negatives %in% c(0,-1)] <- NA
You can do this with group_by() and then generate the counters with seq_along().
library(dplyr)
data %>%
group_by(V3) %>%
mutate(counter = seq_along(V3)) %>%
ungroup()
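If you want the count of how many times the marker appeared before the current row (i.e. excluding the current one), a small variation of the same idea - just a sketch, not part of the original answer - would be:
library(dplyr)
data %>%
  group_by(V3) %>%
  mutate(counter = row_number() - 1) %>%
  ungroup()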
Try
data$counter <- cumsum((str_detect(data$V3, 'NEGATIVE')))

Dynamic sum/count condition while assignment

I have two data frames (table1 and randomdata) with the following schema:
#randomdata
randomdata$cube = {1,5,3,3,4,5,5,2,2,6,1,2,....} (1000 rows)
#table1
table1$side = {1,2,3,4,5,6} (6 rows)
table1$frequency = NULL
I want to count the occurrences of the different sides of the cube (in the first 10 rows of randomdata$cube) and assign the result to table1$frequency in the corresponding row (based on table1$side).
I can do this successfully this way:
table1$frequency[1] <- sum(randomdata$cube[1:10] == 1)
table1$frequency[2] <- sum(randomdata$cube[1:10] == 2)
table1$frequency[3] <- sum(randomdata$cube[1:10] == 3)
...
table1$frequency[6] <- sum(randomdata$cube[1:10] == 6)
This works very well, but there must be a better way.
Instead of 6 statements, I imagine something like this:
table1$frequency <- sum(randomdata$cube[1:10] == table1$side)
Can someone show me a more dynamic way to do this?
Thank you.
We can do this by converting the 'cube' column to a factor with levels specified as 1:6 and then calling table. If we do it without that, missing elements get dropped from the table output; with the explicit levels, a missing side shows up as 0.
table1$frequency <- table(factor(randomdata$cube[1:10], levels = 1:6))
Or using tidyverse
library(tidyverse)
randomdata %>%
slice(1:10) %>%
count(cube = factor(cube, levels = 1:6), .drop = FALSE) %>%
pull(n) %>%
mutate(table1, frequency = .)
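Another base R option - assuming, as in the question, that the sides are exactly the integers 1 to 6 - is tabulate(), which counts occurrences of the positive integers up to nbins:
table1$frequency <- tabulate(randomdata$cube[1:10], nbins = 6)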

Loop through multiple columns in R

I have the following code I'd like to run for multiple columns in a data frame called ccc.
ccc %>%
group_by(LA) %>%
summarise(Def = sum(DefaultOct05 == 'Def'),
NDef = sum(DefaultOct05 != 'Def'),
DRate = mean(DefaultOct05 == 'Def'))
LA is the name of one of the columns. How would I set up a loop to run through a number of different columns?
I've tried the following.
for (i in 26:ncol(ccc)) {
ccc %>%
group_by(i) %>%
summarise(Def = sum(DefaultOct05 == 'Def'),
NDef = sum(DefaultOct05 != 'Def'),
DRate = mean(DefaultOct05 == 'Def'))
}
But I get the following error message.
Error in resolve_vars(new_groups, tbl_vars(.data)) :
unknown variable to group by : i
What most people will miss in your question is a reproducible data set. Without one, it is often very hard to reproduce your problem and solve it.
If I got you right, your data set looks like the one below:
set.seed(1)
ccc = data.frame(Default = sample(c(0, 1), 100, replace = TRUE),
                 LA = sample(c("X", "Y", "Z"), 100, replace = TRUE),
                 DC = sample(c("A", "B", "C"), 100, replace = TRUE))
do.call(rbind, ...) binds the list returned by lapply() back into one data frame; lapply(dat, function(x)) applies the function to every element of dat - in our case, to every column.
library(dplyr)
do.call(rbind, lapply(ccc, function(Var) {
  dat = data.frame(Var, Default = ccc$Default) %>%
    group_by(Var) %>%
    summarise(Def = sum(Default), NDef = n() - sum(Default), DRate = mean(Default))
  return(as.data.frame(dat))
}))
"LA is the name of one of the columns"
Actually, group by dplyr construction works on variables inside the columns. I guess you want to do other things.
If you want to apply the same function to different columns you could use summarize_at.
df <- data.frame(id = c(1:20),
                 a1 = runif(20),
                 b1 = runif(20),
                 c1 = runif(20))
library(dplyr)
df %>% summarise_at(c("a1", "b1", "c1"), list(med = median, avr = mean))
# result:
# a1_med b1_med c1_med a1_avr b1_avr c1_avr
# 1 0.6444056 0.5266252 0.6420554 0.5605837 0.4983654 0.5546381
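If the goal really is to run the same group_by()/summarise() for several different grouping columns, a hedged sketch looping over column names (requires dplyr >= 1.0 for across()/all_of(); the column positions are taken from the question) could look like this:
library(dplyr)
group_cols <- names(ccc)[26:ncol(ccc)] # the columns you want to group by, as names
results <- lapply(group_cols, function(col) {
  ccc %>%
    group_by(across(all_of(col))) %>%
    summarise(Def = sum(DefaultOct05 == 'Def'),
              NDef = sum(DefaultOct05 != 'Def'),
              DRate = mean(DefaultOct05 == 'Def'),
              .groups = "drop")
})
names(results) <- group_cols # one summary data frame per grouping column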

Simple mutate with dplyr gives "wrong result size" error

My data table df has a subject column (e.g. "SubjectA", "SubjectB", ...). Each subject answers many questions, and the table is in long format, so there are many rows for each subject. The subject column is a factor. I want to create a new column - call it subject.id - that is simply a numeric version of subject. So for all rows with "SubjectA", it would be 1; for all rows with "SubjectB", it would be 2; etc.
I know that an easy way to do this with dplyr would be to call df %>% mutate(subject.id = as.numeric(subject)). But I was trying to do it this way:
subj.list <- unique(as.character(df$subject))
df %>% mutate(subject.id = which(as.character(subject) == subj.list))
And I get this error:
Error: wrong result size (12), expected 72 or 1
Why does this happen? I'm not interested in other ways to solve this particular problem. Rather, I worry that my inability to understand this error reflects a deep misunderstanding of dplyr or mutate. My understanding is that this call should be conceptually equivalent to:
df$subject.id <- NULL
for (i in 1:nrow(df)) {
df$subject.id[i] <- which(as.character(df$subject[i]) == subj.list)
}
But the latter works and the former doesn't. Why?
Reproducible example:
df <- InsectSprays %>% rename(subject = spray)
subj.list <- unique(as.character(df$subject))
# this works
df$subject.id <- NULL
for (i in 1:nrow(df)) {
df$subject.id[i] <- which(as.character(df$subject[i]) == subj.list)
}
# but this doesn't
df %>% mutate(subject.id = which(as.character(subject) == subj.list))
The issue is that mutate evaluates its expressions on whole columns at once. Thus, which is applied to the vector produced by as.character(df$subject) == subj.list, not to each row (as in your loop).
Using rowwise as described here would solve the issue: https://stackoverflow.com/a/24728107/3772587
So, this will work:
df %>%
rowwise() %>%
mutate(subject.id = which(as.character(subject) == subj.list))
Since your df$subject is a factor, you could simply do:
df %>% mutate(subj.id=as.numeric(subject))
Or use a left join approach (this needs the tibble package for as_tibble() and rownames_to_column()):
library(tibble)
subj.df <- df$subject %>%
  unique() %>%
  as_tibble() %>%
  rownames_to_column(var = 'subj.id')
df %>% left_join(subj.df, by = c("subject" = "value"))
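For reference, a fully vectorised equivalent of the original loop - a sketch, not part of the answers above - would be match(), which returns, for each element, its position in subj.list:
df %>% mutate(subject.id = match(as.character(subject), subj.list))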
