Assigning values according to set limits in R

I have a list of stores with some quantities of different articles in them and a warehouse with these articles - these are two separate data frames.
Article <- c('a','b','a','b','c','d')
forecast <- c( 1,5,80,10,100,1000)
StoreID <- c(1,1,2,2,3,4)
StoreData <- data.frame(StoreID, Article, forecast)
Something like this:
StoreData
StoreID Article forecast
1 a 1
1 b 5
2 a 80
2 b 10
3 c 100
4 d 1000
And the warehouse data:
Article <- c('a','b','c','d')
Stock <- c(10,11,12,100)
WarehouseData <- data.frame(Article, Stock)
WarehouseData
Article Stock
a 10
b 11
c 12
d 100
My goal is to add a purchase order (PO) column. The logic is as follows: for every row in the StoreData table, I look up the Stock of the Article in the warehouse; if it is enough, I approve the full forecast, and if not, I approve only the available quantity. My problem is that the available stock shrinks as quantities are approved, and I cannot figure out how to take that into account.
The expected result looks like this:
StoreData
StoreID Article forecast PO
1 a 1 1
1 b 5 5
2 a 80 9
2 b 10 6
3 c 100 12
4 d 1000 100
Can anyone please tell me how to get this right?

Here's another approach using dplyr:
library(dplyr)
left_join(StoreData, WarehouseData, by = "Article") %>%
  group_by(Article) %>%
  mutate(PO = ifelse(cumsum(forecast) <= Stock, forecast,
                     Stock - cumsum(forecast) + forecast)) %>%
  ungroup()
#Source: local data frame [6 x 5]
#
# StoreID Article forecast Stock PO
# (int) (fctr) (int) (dbl) (dbl)
#1 1 a 1 10 1
#2 1 b 5 11 5
#3 2 a 80 10 9
#4 2 b 10 11 6
#5 3 c 100 12 12
#6 4 d 1000 100 100
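One caveat with the ifelse() formula: if three or more stores request an article whose stock is already exhausted, `Stock - cumsum(forecast) + forecast` can go negative. Below is a minimal base R sketch of a clamped variant, assuming forecasts are non-negative and rows should be served in their existing order. The helper names `stock` and `requested_before` are illustrative and not part of the original answers.

```r
StoreData <- data.frame(
  StoreID  = c(1, 1, 2, 2, 3, 4),
  Article  = c("a", "b", "a", "b", "c", "d"),
  forecast = c(1, 5, 80, 10, 100, 1000)
)
WarehouseData <- data.frame(
  Article = c("a", "b", "c", "d"),
  Stock   = c(10, 11, 12, 100)
)

# Look up each row's starting stock without reordering StoreData.
stock <- WarehouseData$Stock[match(StoreData$Article, WarehouseData$Article)]

# Quantity already requested by earlier rows for the same article.
requested_before <- ave(StoreData$forecast, StoreData$Article,
                        FUN = function(x) cumsum(x) - x)

# Approve the forecast, capped by the stock that is left (never below zero).
StoreData$PO <- pmin(StoreData$forecast, pmax(stock - requested_before, 0))
StoreData
```

On the example data this reproduces the PO column from the question, and it should match the row-by-row loop as long as every forecast is non-negative.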

See the loop below for example:
StoreData$PO <- NA
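for (i in seq_len(nrow(StoreData))) {
  # Row of the warehouse table that holds this article
  query <- WarehouseData$Article == StoreData[i, "Article"]
  # Approve the forecast, or whatever stock is left if that is less
  po <- ifelse(StoreData[i, "forecast"] > WarehouseData[query, 2],
               WarehouseData[query, 2],
               StoreData[i, "forecast"])
  # Reduce the remaining stock by the approved quantity
  WarehouseData[query, 2] <- WarehouseData[query, 2] - po
  StoreData[i, "PO"] <- po
}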
print(StoreData)
# StoreID Article forecast PO
# 1 1 a 1 1
# 2 1 b 5 5
# 3 2 a 80 9
# 4 2 b 10 6
# 5 3 c 100 12
# 6 4 d 1000 100
This is another alternative, based on the dplyr solution but using base R:
# merge() sorts the result by Article, so the row order differs from the original StoreData
StoreData <- merge(StoreData, WarehouseData)
StoreData$PO <- do.call(c, lapply(split(StoreData, StoreData$Article), function(z) {
  ifelse(cumsum(z$forecast) <= z$Stock, z$forecast,
         z$Stock - cumsum(z$forecast) + z$forecast)
}))
And here is what I used to recreate your dataset, which might help with other answers:
StoreData <- read.table(text = "StoreID Article forecast
1 a 1
1 b 5
2 a 80
2 b 10
3 c 100
4 d 1000", header = TRUE)
Article <- c('a','b','c','d')
Stock <- c(10,11,12,100)
WarehouseData <- data.frame(Article, Stock)

Related

Finding the maximum value of a variable

I would like to find the maximum value of a variable (column) and then retain this value (the maximum value) and all values below it. Along with these values, I would like to retain the corresponding values from all other variables (columns) within the data frame. I want to exclude all values above this point from the data frame, for all variables within it. Included is the script for an example data frame (df), and an expected data frame (df2) i.e. what I am trying to achieve. I would be so grateful for some script to do this.
Ba <- c(1,1,1,2,2)
Sr <- c(1,1,1,2,2)
Mn <- c(1,1,2,1,1)
df <- data.frame(Ba, Sr, Mn)
df
# Ba Sr Mn
# 1 1 1 1
# 2 1 1 1
# 3 1 1 2
# 4 2 2 1
# 5 2 2 1
This is what I want to achieve in R:
Ba2 <- c(1,2,2)
Sr2 <- c(1,2,2)
Mn2 <- c(2,1,1)
df2 <- data.frame(Ba2, Sr2, Mn2)
df2
# Ba2 Sr2 Mn2
# 1 1 1 2
# 2 2 2 1
# 3 2 2 1
You can take which.max for each column, find the smallest of those row positions, and subset df from that row through nrow(df):
df[min(sapply(df, which.max)):nrow(df),]
# Ba Sr Mn
#3 1 1 2
#4 2 2 1
#5 2 2 1
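Does this work? Note that which.max has to be applied per column (MARGIN = 2) so that it returns row positions; with MARGIN = 1 it returns column positions, which only match by coincidence on this data:
df[min(apply(df, 2, which.max)):nrow(df),]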
Ba Sr Mn
3 1 1 2
4 2 2 1
5 2 2 1
Using cummax
library(dplyr)
library(purrr)
df %>%
filter(cummax(invoke(pmax, cur_data())) == max(cur_data()))
Ba Sr Mn
1 1 1 2
2 2 2 1
3 2 2 1
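The same cummax idea can be sketched in base R without dplyr or purrr. This is only an illustration on the example data; `row_max`, `keep` and `df2` are names chosen here.

```r
df <- data.frame(Ba = c(1, 1, 1, 2, 2),
                 Sr = c(1, 1, 1, 2, 2),
                 Mn = c(1, 1, 2, 1, 1))

# Largest value in each row, taken across all columns.
row_max <- do.call(pmax, df)

# TRUE from the first row that reaches the overall maximum onwards.
keep <- cummax(row_max) == max(df)

df2 <- df[keep, ]
df2
```

The running maximum stays below max(df) until the first row containing the overall maximum, so the filter keeps that row and everything after it.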

Count number of shared observations between samples using dplyr

I have a list of observations grouped by samples. I want to find the samples that share the most identical observations. An identical observation is where the start and end number are both matching between two samples. I'd like to use R and preferably dplyr to do this if possible.
I've been getting used to using dplyr for simpler data handling but this task is beyond what I am currently able to do. I've been thinking the solution would involve grouping the start and end into a single variable: group_by(start,end) but I also need to keep the information about which sample each observation belongs to and compare between samples.
example:
sample start end
a 2 4
a 3 6
a 4 8
b 2 4
b 3 6
b 10 12
c 10 12
c 0 4
c 2 4
Here samples a, b and c share 1 observation (2, 4)
sample a and b share 2 observations (2 4, 3 6)
sample b and c share 2 observations (2 4, 10 12)
sample a and c share 1 observation (2 4)
I'd like an output like:
abc 1
ab 2
bc 2
ac 1
and also to see what the shared observations are if possible:
abc 2 4
ab 2 4
ab 3 6
etc
Thanks in advance
Here's something that should get you going:
df %>%
group_by(start, end) %>%
summarise(
samples = paste(unique(sample), collapse = ""),
n = length(unique(sample)))
# Source: local data frame [5 x 4]
# Groups: start [?]
#
# start end samples n
# <int> <int> <chr> <int>
# 1 0 4 c 1
# 2 2 4 abc 3
# 3 3 6 ab 2
# 4 4 8 a 1
# 5 10 12 bc 2
Here is an idea via base R,
final_d <- data.frame(count1 = sapply(Filter(nrow, split(df, list(df$start, df$end))), nrow),
pairs1 = sapply(Filter(nrow, split(df, list(df$start, df$end))), function(i) paste(i[[1]], collapse = '')))
# count1 pairs1
#0.4 1 c
#2.4 3 abc
#3.6 2 ab
#4.8 1 a
#10.12 2 bc
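Both answers list which samples contain each observation, but they do not directly give the per-pair counts (ab 2, bc 2, ac 1) asked for. One possible way to get those, sketched in base R under the assumption that each (start, end) pair appears at most once per sample, is a self-merge on start and end. The names `pairs` and `counts` are illustrative.

```r
df <- data.frame(
  sample = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  start  = c(2, 3, 4, 2, 3, 10, 10, 0, 2),
  end    = c(4, 6, 8, 4, 6, 12, 12, 4, 4),
  stringsAsFactors = FALSE
)

# Join the table to itself on the observation, giving every pairing of
# samples that contain the same (start, end).
pairs <- merge(df, df, by = c("start", "end"), suffixes = c("_1", "_2"))

# Keep each unordered pair once and drop self-pairings.
pairs <- pairs[pairs$sample_1 < pairs$sample_2, ]
pairs$pair <- paste0(pairs$sample_1, pairs$sample_2)

# Number of shared observations per pair of samples.
counts <- table(pairs$pair)
counts

# The shared observations themselves.
pairs[order(pairs$pair), c("pair", "start", "end")]
```

This covers pairs of samples only; the three-way "abc" row can be read from the samples column in the answers above.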

creating row index based on time difference in R

I have data that looks like:
player event diff
A x NA
A y 2
A z 240
A w 3
A x 9
B x NA
B y 3
B z 120
C x NA
C x 8
What I did to get this was to group by the player column and take the difference between time events, hence the NA's for diff column whenever a new player has an event.
What I want to do is to partition the data into player specific interactions that are within a few minutes of each other (say a cutoff of diff = 20). What I want in the end is to have:
player event diff interaction
A x NA 1
A y 2 1
A z 240 2
A w 3 2
A x 9 2
B x NA 1
B y 3 1
B z 120 2
C x NA 1
C x 8 1
So basically the interactions are grouped based on having the same player and the difference being less than 20, otherwise a new interaction is started. A new interaction is also started if an NA is present. I'm not really sure how to do this in a fast/efficient way as I've got a large data set with many players. My preference is for a dplyr solution
You can replace NA in the diff column with 0 (or any other number below your threshold) using coalesce, then take a cumsum of the diff >= 20 condition, which gives a distinct id each time diff reaches the threshold:
library(dplyr)
df %>% group_by(player) %>%
mutate(interaction = cumsum(coalesce(diff, 0L) >= 20) + 1)
# Source: local data frame [10 x 4]
# Groups: player [3]
# player event diff interaction
# <fctr> <fctr> <int> <dbl>
# 1 A x NA 1
# 2 A y 2 1
# 3 A z 240 2
# 4 A w 3 2
# 5 A x 9 2
# 6 B x NA 1
# 7 B y 3 1
# 8 B z 120 2
# 9 C x NA 1
# 10 C x 8 1
We can also use base R to get the expected output (here the data frame is named df1):
df1$interaction <- with(df1, ave(diff, player, FUN = function(x)
  cumsum(x >= 20 & !is.na(x)) + 1))
df1$interaction
#[1] 1 1 2 2 2 1 1 2 1 1
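To see why cumsum produces the ids, here is a small self-contained sketch on the example data. It assumes the cutoff of 20 from the question; `starts_new` is a name chosen for illustration.

```r
df <- data.frame(
  player = c("A", "A", "A", "A", "A", "B", "B", "B", "C", "C"),
  diff   = c(NA, 2, 240, 3, 9, NA, 3, 120, NA, 8)
)

# TRUE on rows whose gap reaches the cutoff; NA rows count as FALSE.
starts_new <- !is.na(df$diff) & df$diff >= 20

# Within each player, count how many such rows have been seen so far.
# Adding 1 makes the first interaction of every player number 1.
df$interaction <- ave(as.integer(starts_new), df$player, FUN = cumsum) + 1L
df
```

Each TRUE bumps the running count by one, so all rows up to the next large gap share the same id.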

convert individual response data-frame into team-level data-frame in r

I have the following data-frames:
> team_1_A
MemberA Q1 Q2
1 C 2 3
2 B 3 4
> team_1_B
MemberB Q1 Q2
1 A 5 4
2 C 5 2
> team_1_C
MemberC Q1 Q2
1 A 2 5
2 B 5 5
These data-frames correspond to individual responses regarding their perceptions of team members. For example, the first data-frame is how team member A (in team 1) perceives team members B and C for questions 1 and 2 (Q1 and Q2). My goal is to automate a process that converts these data-frames from individual responses to team-level data-frames for each question, so that what would be obtained for these data-frames would be:
> T1Q1
X A B C
1 A 0 3 2
2 B 5 0 5
3 C 2 5 0
> T1Q2
X A B C
1 A 0 4 3
2 B 4 0 2
3 C 5 5 0
Thus, 0s appear along the diagonal, and row by row is what each member said about the other for a given question. For example, in T1Q1 in row1 we see A's perceptions of B and C.
The reshape2 package will make your life easy for this task:
library(reshape2)
team_1_A <- data.frame(MemberA=c('C', 'B'), Q1=c(2,3), Q2=c(3,4))
team_1_B <- data.frame(MemberB=c('A', 'C'), Q1=c(5,5), Q2=c(4,2))
team_1_C <- data.frame(MemberC=c('A', 'B'), Q1=c(2,5), Q2=c(5,5))
# store data frames in a list
members <- list(team_1_A, team_1_B, team_1_C)
# format rows/columns
formatted <- lapply(members, function(m){
m$Respondent <- gsub('Member', '', names(m)[1])
names(m)[1] <- 'TeamMate'
return(m)
})
formatted <- do.call('rbind', formatted)
# separate questions into different data frames
questionList <- lapply(c(1,2), function(q) formatted[,c('Respondent', 'TeamMate', paste0('Q',q))])
# reshape, ensure order is correct
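questionList <- lapply(questionList, function(q) {
  # the third column holds the answers for this question; name it explicitly
  # so dcast does not have to guess value.var
  q <- dcast(q, Respondent ~ TeamMate, value.var = names(q)[3])
  q <- q[,c('Respondent', 'A', 'B', 'C')]
  return(q)
})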
names(questionList) <- c('T1Q1', 'T1Q2')
# replace NA with 0
# etc...
questionList
$T1Q1
Respondent A B C
1 A NA 3 2
2 B 5 NA 5
3 C 2 5 NA
$T1Q2
Respondent A B C
1 A NA 4 3
2 B 4 NA 2
3 C 5 5 NA
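The answer leaves the "replace NA with 0" step as a comment. A minimal sketch of one way to do it, assuming the wide tables look like the output above; the T1Q1 and T1Q2 data frames below are retyped by hand from that output so the snippet runs on its own.

```r
questionList <- list(
  T1Q1 = data.frame(Respondent = c("A", "B", "C"),
                    A = c(NA, 5, 2), B = c(3, NA, 5), C = c(2, 5, NA)),
  T1Q2 = data.frame(Respondent = c("A", "B", "C"),
                    A = c(NA, 4, 5), B = c(4, NA, 5), C = c(3, 2, NA))
)

# is.na() on a data frame returns a logical matrix, which can be used
# to overwrite every missing cell in one assignment.
questionList <- lapply(questionList, function(q) {
  q[is.na(q)] <- 0
  q
})
questionList
```

After this step the diagonal of each table holds 0, as in the expected output in the question.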

how to group my string vector in two data frame?

Hi: I have a simple question, but it confuses me a lot. Below is my code:
a <- data.frame(url = c("1","2","3","4","5"),
id = c("a","b","c","d","e")
)
b <- data.frame(url = c("1","1","2","2","2","3","3","3","3","4","4","5","5"),
price = c(10,10,20,20,20,30,30,30,30,40,40,50,50),
recipt=c("n","n","n","n","n","n","n","n","n","y","y","n","n")
)
I want a new data frame, newdata, which merges b$recipt into a and looks like this:
>newdata
url id recipt
1 a n
2 b n
3 c n
4 d y
5 e n
Please give me a hint, thanks.
You could try this:
a$recipt <- sapply(1:nrow(a),function(x) b$recipt[b$url==a$url[x]][1])
#> a
# url id recipt
#1 1 a n
#2 2 b n
#3 3 c n
#4 4 d y
#5 5 e n
Here it is assumed that the recipt entries are the same for any given value of url in b. If this is not the case, things become more complicated.
If you want to keep a unchanged and generate a new data frame newdata with the new column, the above code needs only a small change:
newdata <- a
newdata$recipt <- sapply(1:nrow(a),function(x) b$recipt[b$url==a$url[x]][1])
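The assumption that recipt is the same for every row of a given url can be checked before relying on the first match. A small sketch, where `n_values` and `conflicting` are names chosen for illustration:

```r
b <- data.frame(
  url    = c("1", "1", "2", "2", "2", "3", "3", "3", "3", "4", "4", "5", "5"),
  price  = c(10, 10, 20, 20, 20, 30, 30, 30, 30, 40, 40, 50, 50),
  recipt = c("n", "n", "n", "n", "n", "n", "n", "n", "n", "y", "y", "n", "n")
)

# Number of distinct recipt values seen for each url.
n_values <- tapply(b$recipt, b$url, function(x) length(unique(x)))

# urls with more than one distinct value would break the assumption.
conflicting <- names(n_values)[n_values > 1]
conflicting
```

On the example data `conflicting` is empty, so taking the first recipt per url is safe here.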
You could use match:
transform(a, recipt= b$recipt[match(url, b$url)])
# url id recipt
#1 1 a n
#2 2 b n
#3 3 c n
#4 4 d y
#5 5 e n
Or using data.table. The on argument requires v1.9.5+, which was the devel version when this was written:
library(data.table)#v1.9.5+
setDT(a)[unique(b[c(1,3)], by='url'), on='url']
# url id recipt
#1: 1 a n
#2: 2 b n
#3: 3 c n
#4: 4 d y
#5: 5 e n
I think what you want is to merge a onto b, as there are multiple prices with the same url. In that case b is your base data frame and you append an id value to it, so some of the id values are repeated.
One easy way is to do this with dplyr.
library(dplyr)
a <- data.frame(url = c("1","2","3","4","5"),
id = c("a","b","c","d","e")
)
b <- data.frame(url = c("1","1","2","2","2","3","3","3","3","4","4","5","5"),
price = c(10,10,20,20,20,30,30,30,30,40,40,50,50),
recipt=c("n","n","n","n","n","n","n","n","n","y","y","n","n")
)
left_join(b, a, by = "url")
url price recipt id
1 1 10 n a
2 1 10 n a
3 2 20 n b
4 2 20 n b
5 2 20 n b
6 3 30 n c
7 3 30 n c
8 3 30 n c
9 3 30 n c
10 4 40 y d
11 4 40 y d
12 5 50 n e
13 5 50 n e
