Syntax errors (I believe) in my if statements - R

I am trying to complete a problem and I believe I'm running into some sort of formatting error in my if statements. My code is partially working, in the sense that it assigns a shipping surcharge, but not the correct one for the market I am testing for.
The question at hand asks me to perform this:
In the imported data frame, create another column named “shipping_surcharge” whose value is computed based on the Market and Sales as follows.
a. If the Market is US, Canada or LATAM and Sales is less than $200, the shipping surcharge is 10% of Sales. For these markets, if Sales is $200 or more, the shipping surcharge is 15% of Sales.
b. If the Market is EMEA, EU or Africa and Sales is less than $250, the shipping surcharge is 15% of Sales. For these markets, if Sales is $250 or more, the shipping surcharge is 25% of Sales.
c. For the APAC market, if Sales is less than $150, the shipping surcharge is 20% of Sales. Otherwise, it is 30% of Sales.
The code I have written thus far is this:
orders$shipping_surcharge <- ""
for(i in (1:n))
{
  if(orders$Market[i] = "US" | orders$Market[i] = "Canada" | orders$Market[i] = LATAM & orders$Sales[i] < 200)
  {
    orders$shipping_surcharge[i] <- (0.10 * orders$Sales)
  }
  else if(orders$Sales[i] >= 200)
  {
    orders$shipping_surcharge[i] <- (0.15 * orders$Sales)
  }
  else if(orders$Market[i] = "EMEA" | orders$Market[i] = "EU" | orders$Market[i] = "Africa" & orders$Sales < 250)
  {
    orders$shipping_surcharge[i] <- (0.15 * orders$Sales)
  }
  else if(orders$Sales[i] >= 250)
  {
    orders$shipping_surcharge[i] <- (0.25 * orders$Sales)
  }
  else if(orders$Market[i] = "APAC" & orders$Sales[i] < 150)
  {
    orders$shipping_surcharge[i] <- (0.20 * orders$Sales)
  }
  else orders$shipping_surcharge[i] <- (0.30 * orders$Sales)
}
Could you explain to me what is wrong with my syntax, so that I can understand it in the future if I'm ever tested on it? Thank you in advance.
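For reference, here is a minimal sketch of how the conditions could be structured, assuming the data frame is called orders with Market and Sales columns as above. The main fixes are: use == rather than = for comparison, quote "LATAM", group the market tests (here with %in%) because & binds more tightly than |, nest the Sales threshold inside each market branch, and use orders$Sales[i] rather than the whole column on the right-hand side.
orders$shipping_surcharge <- NA_real_            # numeric column, not a character ""

for (i in seq_len(nrow(orders))) {
  if (orders$Market[i] %in% c("US", "Canada", "LATAM")) {
    rate <- if (orders$Sales[i] < 200) 0.10 else 0.15   # rule a
  } else if (orders$Market[i] %in% c("EMEA", "EU", "Africa")) {
    rate <- if (orders$Sales[i] < 250) 0.15 else 0.25   # rule b
  } else if (orders$Market[i] == "APAC") {
    rate <- if (orders$Sales[i] < 150) 0.20 else 0.30   # rule c
  } else {
    rate <- NA_real_                                    # market not covered by the rules
  }
  orders$shipping_surcharge[i] <- rate * orders$Sales[i]
}
A vectorized alternative (for example ifelse() or dplyr::case_when()) would avoid the loop entirely, but the version above stays close to the original structure.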

Related

Set multiple thresholds on a log-based Kusto query

I have set up a log-based alert in Microsoft Azure. The alerts are deployed via an ARM template, where you can input your query and set a threshold like below.
"triggerThresholdOperator": {
"value": "GreaterThan"
},
"triggerThreshold": {
"value": 0
},
"frequencyInMinutes": {
"value":15
},
"timeWindowInMinutes": {
"value": 15
},
"severityLevel": {
"value": "0"
},
"appInsightsQuery": {
"value": "exceptions\r\n| where A_ != '2000' \r\n| where A_ != '4000' \r\n| where A_ != '3000' "
}
As far as I understand, we can only set a threshold once, on the entire query.
Question: I have multiple statements in my query that I am excluding since they are just noise. But now I want to set a threshold of 5 on the value 3000 and also set a time window of 30 minutes in the same query, meaning only exclude 3000 when it occurs 5 times in the last 30 minutes (when the query gets run).
exceptions
| where A_ != '2000'
| where A_ != '4000'
| where A_ != '3000'
I am pretty sure that I can't set a threshold like this in the query, and the only workaround is to create a new alert just for the value 3000 and set a threshold for it in the ARM template. I haven't found any such threshold/time filters in Azure. Is there any way I can set multiple thresholds and time filters in a single query, which then gets checked against different thresholds and time filters in the ARM template?
Thanks.
I don't fully understand your question.
But for your time-window question you could do something like:
exceptions
| summarize count() by A_, bin(TimeGenerated, 30m)
That way you will get a count of A_ in blocks of 30 minutes.
Another way would be to do:
let Materialized = materialize(
exceptions
| summarize Count=count(A_) by bin(TimeGenerated, 30m)
); 
Materialized | where Count == 10
But then again, it all depends on what you would like to achieve.
You can easily set that in the query and fire based on the aggregate result.
exceptions
| where timestamp > ago(30m)
| summarize count2000 = countif(A_ == '2000'), count3000 = countif(A_ == '3000'), count4000 = countif(A_ == '4000')
| where count2000 > 5 or count3000 > 3 or count4000 > 4
If the number of results is greater than one, then the aggregate condition applies.

Conditional mutating of an R data frame based on strings

I am using R and trying to create a new column based on the string information from the existing columns.
My data is like:
risk_code | area
-----------------------------------
DEEP DIGGING ALL | --
CONSTRUCTION PRO | Construction
CLAIMS ONSHORE | --
OFFSHORE CLAIMS | --
And the result I need is:
risk_code | area | area_new
-------------------------------------------------
DEEP DIGGING ALL | -- | Digging
CONSTRUCTION PRO | Construction | Construction
CLAIMS ONSHORE | -- | Onshore
OFFSHORE CLAIMS | -- | Offshore
I understand that I am making several mistakes in the code, but after a whole week of staring at it and searching the internet, I cannot get the result I need.
I appreciate your help.
Thanks in advance.
Occupancy <- read_excel("Occupancy.xlsx")
OccupancyMutated <- mutate(Occupancy, area_new = area)
OccupancyMutated <- as.data.frame(OccupancyMutated)
OccupancyMutated$area_new[Occupancy$area == "--"] <-
{
if (OccupancyMutated$risk_code == %Digging%) {"Digging"}
else if (OccupancyMutated$risk_code == %ONSHORE%) {"Onshore"}
else if (OccupancyMutated$risk_code == %OFFSHORE%) {"Offshore"}
else {"empty"}
}
View(OccupancyMutated)
We can use stringr for this operation. The function word() will extract the first word of each string in risk_code, and the function str_to_title() will convert it to your required format. Both functions are vectorized, so simply:
library(stringr)
str_to_title(word(df$risk_code, 1, 1))
#[1] "Digging" "Construction" "Onshore" "Offshore"
If it is not always the first word and you need to do it for specific words only, you can do:
str_to_title(str_extract(tolower(df$risk_code), 'digging|offshore|onshore'))
#[1] "Digging" NA "Onshore" "Offshore"
So, this is the answer (thanks to Sotos):
Occupancy <- read_excel("Occupancy.xlsx")
OccupancyMutated <- mutate(Occupancy, area_new = area)
OccupancyMutated <- as.data.frame(OccupancyMutated)
OccupancyMutated$area_new[Occupancy$area == "--"] <-
  str_to_title(str_extract(tolower(Occupancy$risk_code[Occupancy$area == "--"]), 'extraction|offshore|onshore'))
View(OccupancyMutated)
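For completeness, a small self-contained sketch of the same idea on the sample data from the question (the data frame name df and the dplyr call are just for illustration):
library(dplyr)
library(stringr)

# sample data as shown in the question
df <- data.frame(
  risk_code = c("DEEP DIGGING ALL", "CONSTRUCTION PRO", "CLAIMS ONSHORE", "OFFSHORE CLAIMS"),
  area = c("--", "Construction", "--", "--"),
  stringsAsFactors = FALSE
)

# keep the existing area where it is filled in, otherwise derive it from risk_code
df <- mutate(df, area_new = if_else(
  area == "--",
  str_to_title(str_extract(tolower(risk_code), "digging|offshore|onshore")),
  area
))

df$area_new
#[1] "Digging" "Construction" "Onshore" "Offshore"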

NameError: name 'total' is not defined

What should I do differently? The result is an error at line 12, print(total): NameError: name 'total' is not defined
def gross_pay(hours, rate):
    info = ()
    info = getUserInfo()
    rate = float(input('How much do you make an hour?:'))
    hours = int(input('How many hours did you work?:'))
    total = rate * hours
    taxes = total * 0.05
    total = total - taxes
print(total)
total is a local variable. It doesn't exist outside the function. Also, you need to call the function, and you can return total from it. getUserInfo() is not present and info is unused. Asking for the input parameters inside the function is incorrect as well. Technically, pay after taxes is net pay, not gross:
def net_pay(hours, rate):
    total = rate * hours
    taxes = total * 0.05
    return total - taxes

rate = float(input('How much do you make an hour? '))
hours = int(input('How many hours did you work? '))
print(net_pay(hours, rate))
Output:
How much do you make an hour? 10.50
How many hours did you work? 40
399.0
def gross_pay(hours, rate):
    info = ()
    # getUserInfo() should also be defined in your code:
    info = getUserInfo()
    rate = float(input('How much do you make an hour?:'))
    hours = int(input('How many hours did you work?:'))
    total = rate * hours
    taxes = total * 0.05
    total = total - taxes
    print(total)

# calling the declared (defined) function:
hours = 0
rate = 0
gross_pay(hours, rate)
I'm assuming you're passing in the parameters hours and rate because you're going to need the values later; otherwise they're not necessary, since you're asking for input inside the gross_pay function.

Back testing for (HK) Stock Market with R

I completed my first back-testing script with the help of great people on Stack Overflow. However, when I try to run it using data from my local stock market (Hong Kong), it gives an error. I cannot find where the problem is. Please give me a hand and take a look at my code. Thanks.
library(quantmod)
library(lubridate)
library(xlsx)
stock0<-getSymbols("^HSI",src="yahoo",from="1988-01-01",auto.assign=F)
stock0 <- to.weekly(stock0)
stock1<-na.locf(stock0)
stock1$SMA1<-SMA(Cl(stock1),n=1)
stock1$SMA30<-SMA(Cl(stock1),n=30)
stock1$SMACheck<-ifelse(stock1$SMA1>stock1$SMA30,1,0)
stock1$SMA_CrossOverUp<-ifelse(diff(stock1$SMACheck)==1,1,0)
stock1$SMA_CrossOverDown<-ifelse(diff(stock1$SMACheck)==-1,-1,0)
stock1<-stock1[index(stock1)>="1998-01-01",]
stock1_df<-data.frame(index(stock1),coredata(stock1))
colnames(stock1_df)<-c("Date","Open","High","Low","Close","Volume","Adj","SMA1","SMA30","EMACheck","EMACheck_up","EMACheck_down")
#To calculate the number of crossover-up transactions during the period from 2010-01-01
sum(stock1_df$SMACheck_up==1 & index(stock1)>="2010-01-01",na.rm=T)
stock1_df$Date[stock1_df$SMACheck_up==1 & index(stock1)>="2010-01-01"]
sum(stock1_df$SMACheck_down==-1 & index(stock1)>="2010-01-01",na.rm=T)
stock1_df$Date[stock1_df$SMACheck_down==-1 & index(stock1)>="2010-01-01"]
stock1_df
#To generate the transactions according to the strategy
transaction_dates<-function(stock2,Buy,Sell)
{
Date_buy<-c()
Date_sell<-c()
hold<-F
stock2[["Hold"]]<-hold
for(i in 1:nrow(stock2)) {
if(hold == T) {
stock2[["Hold"]][i]<-T
if(stock2[[Sell]][i] == -1) {
#stock2[["Hold"]][i]<-T
hold<-F
}
} else {
if(stock2[[Buy]][i] == 1) {
hold<-T
stock2[["Hold"]][i]<-T
}
}
}
stock2[["Enter"]]<-c(0,ifelse(diff(stock2[["Hold"]])==1,1,0))
stock2[["Exit"]]<-c(ifelse(diff(stock2[["Hold"]])==-1,-1,0),0)
Buy_date <- stock2[["Date"]][stock2[["Enter"]] == 1]
Sell_date <- stock2[["Date"]][stock2[["Exit"]] == -1]
if (length(Sell_date)<length(Buy_date)){
#Sell_date[length(Sell_date)+1]<-tail(stock2[["Date"]],n=2)[1]
Buy_date<-Buy_date[1:length(Buy_date)-1]
}
return(list(DatesBuy=Buy_date,DatesSell=Sell_date))
}
#transaction dates generate:
stock1_df <- na.locf(stock1_df)
transactionDates<-transaction_dates(stock1_df,"SMACheck_up","SMACheck_down")
transactionDates
num_transaction1<-length(transactionDates[[1]])
Open_price<-function(df,x) {
df[which(df[["Date"]]==x)+1,][["Open"]]
}
transactions_date<-function(df,x) {
df[which(df[["Date"]]==x)+1,][["Date"]]
}
transactions_generate<-function(df,num_transaction)
{
price_buy<-sapply(1:num_transaction,function(x) {Open_price(df,transactionDates[[1]][x])})
price_sell<-sapply(1:num_transaction,function(x) {Open_price(df,transactionDates[[2]][x])})
Dates_buy<-as.Date(sapply(1:num_transaction,function(x) {transactions_date(df,transactionDates[[1]][x])}))
Dates_sell<-as.Date(sapply(1:num_transaction,function(x) {transactions_date(df,transactionDates[[2]][x])}))
transactions_df<-data.frame(DatesBuy=Dates_buy,DatesSell=Dates_sell,pricesBuy=price_buy,pricesSell=price_sell)
#transactions_df$return<-100*(transactions_df$pricesSell-transactions_df$pricesBuy)/transactions_df$pricesBuy
transactions_df$Stop_loss<-NA
return(transactions_df)
}
transaction_summary<-transactions_generate(stock1_df,num_transaction1)
transaction_summary$Return<-100*(transaction_summary$pricesSell-transaction_summary$pricesBuy)/transaction_summary$pricesBuy
transaction_summary
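One likely source of the error, assuming the renamed columns are meant to carry the SMA signals: the colnames() call above names the last three columns EMACheck, EMACheck_up and EMACheck_down, while everything after it filters on SMACheck_up and SMACheck_down, so those columns do not exist in stock1_df. A minimal sketch of a rename that keeps the later references working:
# rename so the later SMACheck_up / SMACheck_down references match the actual columns
colnames(stock1_df) <- c("Date", "Open", "High", "Low", "Close", "Volume", "Adj",
                         "SMA1", "SMA30", "SMACheck", "SMACheck_up", "SMACheck_down")
With that change, expressions such as stock1_df$SMACheck_up == 1 and the transaction_dates(stock1_df, "SMACheck_up", "SMACheck_down") call further down refer to columns that actually exist.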

Efficient loan repayment calculation

I have a table of loan issuances and repayments by customers that I have preprocessed like this:
customerID | balanceChange | trxDate | TYPE
242105 | 500 | 20170605 | loan
242105 | 1500 | 20170605 | loan
242105 | -1000 | 20170607 | payment
242111 | 500 | 20170605 | loan
242111 | -500 | 20170606 | payment
242111 | 500 | 20170607 | loan
242111 | -500 | 20170609 | payment
242151 | 500 | 20170605 | loan
What I would like to do is (1) count, for each of the loans issued every day, how many of them have been paid back in full, and (2) how many days it took the customer to pay them back.
The rule of the repayment is of course FIFO (First In First Out), so the oldest loan gets paid back first.
In the example above, the solution would be
trxDate | nRepayments | timeGap(days)
20170605 | 2 | 1.5
20170606 | 0 | 0
20170607 | 1 | 2
So, the explanation of why the solution looks like that: on 20170605 there are 4 loans issued (2 to customerID 242105, and the other two to 242111 and 242151), but only 2 of those loans were paid back (the 500 given to 242105 and the 500 given to 242111). The timeGap is the average number of days it took each customer to pay them back (242105 paid back on 20170607, 2 days, and 242111 paid back on 20170606, 1 day), so (2+1)/2 = 1.5.
I have tried to calculate the nRepayments (I figured if I did this the timeGap should be a piece of cake) with the following R script.
#Recoveries
data_loans_rec <- data_loans %>% arrange(customerID, trxDate) %>% as.data.table()
data_loans_rec[is.na(data_loans_rec)] <- 0
data_loans_rec <- data_loans_rec[, index := seq_len(.N), by = customerID][!(index == 1 & TYPE == "payment")][, index := seq_len(.N), by = customerID]
n_loans_given <- data_loans[TYPE == "loan", ][, .(nloans = .N), .(payment)][order(payment)]
n_loans_rec <- copy(n_loans_given)
n_loans_rec[, nloans:=0]
unique_cust <- unique(data_loans_rec$customerID)
#Check repayment for every customer================
for (i in 1:length(unique_cust)) {
  cur_cust <- unique_cust[i]
  list_loan <- as.vector(data_loans_rec[customerID == cur_cust & TYPE == "loan", .(balanceChange)])
  list_loan_time <- as.vector(data_loans_rec[customerID == cur_cust & TYPE == "loan", .(trxDate)])
  list_pay <- as.vector(data_loans_rec[customerID == cur_cust & TYPE == "payment", .(balanceChange)])
  if (dim(list_pay)[1] == 0) { #if there are no payments
    list_pay <- c(0)
  }
  sum_paid <- sum(abs(list_pay))
  i_paid_until <- 0
  for (i_loantime in 1:(dim(list_loan_time)[1])) {
    #if there is only one loan
    if (i_loantime == 0) {
      i_loantime <- 1
    }
    loan_curr <- list_loan[i_loantime]
    loan_left <- loan_curr - sum_paid
    if (loan_left <= 0) {
      n_loans_rec[trxDate == list_loan_time[i_loantime], nloans := nloans + 1]
      sum_paid <- sum_paid - loan_curr
      print(paste(i_loantime, list_loan_time[i_loantime], n_loans_rec[trxDate == list_loan_time[i_loantime], .(nloans)]))
      # break
    } else {
      break
    }
  }
  print(i)
}
The idea is that for every customer, I make a list of loans, loan dates, and payments. The best case is when the customer's total amount of loans is equal to or less than (due to dirty data) the total amount of payments, i.e. full repayment. Then the number of repayments equals the number of loans issued to that customer. The average case is when a customer makes a partial payment. In that case, I sum the total amount of payments and iterate through each loan the customer made while summing the total amount of loans as I go. If the amount of loans finally exceeds the amount of payments, I count how many loans have actually been covered by the customer's payments.
The problem is that I have millions of customers, and each of them has made loans and payments at least 5 times. So, since I am using a nested loop, it would take hours to complete.
So, I am asking here if anyone has ever come across this problem and/or has a better, more efficient solution.
Thanks in advance!
Your logic is quite complicated and with this answer I don't attempt to replicate it fully; my intention is just to give you some ideas on how to optimise.
Also, as mentioned in comments, you could try to parallelise, or maybe use another programming language.
Anyway, as your setup is already with data.table, you can try to use global operations on the full set as much as you can, which will usually be faster than your big loop. Something like this, for example.
I first calculate, per customer id, the balance and the sum of payments done:
data_loans_rec <- data_loans_rec[, balance := sum(balanceChange), by = customerID]
data_loans_rec <- data_loans_rec[, sumPayments := sum(balanceChange[TYPE == "payment"]), by = customerID]
With this, you already know that every customer with balance 0 has repaid everything:
data_loans_rec <- data_loans_rec[TYPE == "loan" & balance == 0, repaid := TRUE, by = list(customerID, index)]
These operations of course read a lot of data if you have millions of customers, but I'd say that data.table should handle them pretty quickly.
For the rest of the customers, but only for the rows that are a loan and for which you don't yet know whether they have been repaid, you can apply a helper function by group within data.table.
setRepaid <- function(balanceChange, sumPayments) {
  # note that here you get a vector for all the loans of a customer
  sumPay <- (-1) * sumPayments[1]
  if (sumPay == 0)
    return(rep(FALSE, length(balanceChange)))
  number_of_loans_paid <- 0
  for (i in 1:length(balanceChange)) {
    if (sum(balanceChange[1:i]) > sumPay)
      break
    number_of_loans_paid <- number_of_loans_paid + 1
  }
  return(c(rep(TRUE, number_of_loans_paid), rep(FALSE, length(balanceChange) - number_of_loans_paid)))
}
data_loans_rec <- data_loans_rec[TYPE == "loan" & is.na(repaid), repaid := setRepaid(balanceChange, sumPayments), by = list(customerID) ]
With that you get the desired result, at least for your example.
customerID balanceChange trxDate TYPE index balance sumPayments repaid
1: 242105 500 20170605 loan 1 1000 -1000 TRUE
2: 242105 1500 20170605 loan 2 1000 -1000 FALSE
3: 242105 -1000 20170607 payment 3 1000 -1000 NA
4: 242111 500 20170605 loan 1 0 -1000 TRUE
5: 242111 -500 20170606 payment 2 0 -1000 NA
6: 242111 500 20170607 loan 3 0 -1000 TRUE
7: 242111 -500 20170609 payment 4 0 -1000 NA
8: 242151 500 20170605 loan 1 500 0 FALSE
The advantages are: the final loop works over far fewer customers, you have some things precalculated, and you rely on data.table to actually replace your loop. Hopefully this approach will give you an improvement. I think it is worth a try.
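As a follow-up, to get from the repaid flags back to the per-day counts asked for in the question, something like the sketch below could work. It assumes data_loans_rec now carries the repaid column shown above; dates on which no loans were issued simply will not appear, and the timeGap part would still need the payment dates.
# count, per issue date, how many of that day's loans were repaid in full
data_loans_rec[TYPE == "loan",
               .(nRepayments = sum(repaid, na.rm = TRUE)),
               by = trxDate][order(trxDate)]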
