Suppose we have this data set, where avg_1, avg_2 and avg_3 repeat themselves:
avg_1 avg_2 avg_3 party_gender
424 242 213 RM
424 242 213 RF
424 242 213 DM
How can I edit this using R so that the data set looks like this (where the avg values aren't repeated, and avg_1, avg_2 and avg_3 correspond to RM, RF and DM respectively):
avg party_gender
424 RM
242 RF
213 DM
Admittedly, this is a bit hacky and doesn't work nicely if you have more than just a few conditions for the avg. value:
library(tidyverse)
dat %>%
pivot_longer(-party_gender) %>%
filter(party_gender == "RM" & value == 424 |
party_gender == "RF" & value == 242 |
party_gender == "DM" & value == 213) %>%
mutate(name = "avg") %>%
pivot_wider()
which gives:
# A tibble: 3 x 2
party_gender avg
<chr> <dbl>
1 RM 424
2 RF 242
3 DM 213
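If the k-th row is always meant to take the k-th avg column (an assumption based on the example, where RM, RF and DM line up with avg_1, avg_2 and avg_3), the values can also be picked by position, which avoids spelling out each condition. A minimal base R sketch, with dat rebuilt from the question and res as an illustrative name:

```r
# Example data as shown in the question
dat <- data.frame(
  avg_1 = c(424, 424, 424),
  avg_2 = c(242, 242, 242),
  avg_3 = c(213, 213, 213),
  party_gender = c("RM", "RF", "DM")
)

# Assumption: row k takes its value from the k-th avg column,
# so there must be exactly as many rows as avg columns
avg_cols <- c("avg_1", "avg_2", "avg_3")
idx <- cbind(seq_len(nrow(dat)), seq_along(avg_cols))  # (row, column) pairs

res <- data.frame(
  avg = as.matrix(dat[avg_cols])[idx],  # matrix indexing picks one cell per pair
  party_gender = dat$party_gender
)
res
```

This only makes sense when the row order matches the column order; if it does not, the pairing has to be stated explicitly, as in the filter above.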
I want to categorize the rivers dataset into “tiny” (<500), “short” (<1500), “medium” (<3000) and “long” (>=3000), and then plot a pie chart that visualizes the frequency of these four categories.
I tried:
rivers[rivers >= 3000] = 'long'
rivers[rivers >= 1500 & rivers < 3000] = 'meidum'
rivers[rivers >= 500 & rivers < 1500]='short'
rivers[rivers < 500] = 'tiny'
It seems the third command has no effect on data and they are the same as before!
table(rivers)
rivers
500 505 524 525 529 538 540 545 560 570 600 605
2 1 1 2 1 1 1 1 1 1 3 1
610 618 620 625 630 652 671 680 696 710 720 730
1 1 1 1 1 1 1 1 1 1 2 1
735 760 780 800 840 850 870 890 900 906 981 long
2 1 1 1 1 1 1 1 2 1 1 1
meidum tiny
36 62
What is wrong with my commands, and is it the right way to draw a pie chart for them?
The cut function can easily perform this task:
#random data
rivers<-runif(20, 0, 5000)
#break into desired groups and label
answer<-cut(rivers, breaks=c(0, 500, 1500, 3000, Inf),
labels=c("tiny", "short", "medium", "long"), right=FALSE)
table(answer)
# tiny short medium long
# 1 10 7 2
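You are running into this problem because the first assignment puts a character value into a numeric vector, which coerces the whole vector to character; the later comparisons are then string comparisons rather than numeric ones. If you keep the numeric vector for the comparisons and write the labels into a separate character vector, it works: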
> rivers_size <- as.character(rivers)
> rivers_size[rivers >= 3000] = 'long'
> rivers_size[rivers >= 1500 & rivers < 3000] = 'medium'
> rivers_size[rivers >= 500 & rivers < 1500]='short'
> rivers_size[rivers < 500] = 'tiny'
> table(rivers_size)
rivers_size
long medium short tiny
1 5 53 82
> pie(table(rivers_size))
Alternatively, the same thing can be accomplished using cut (as @Dave2e shows):
rivers <- cut(datasets::rivers,
breaks = c(0, 500, 1500, 3000, Inf),
labels = c("tiny", "short", "medium", "long"),
right = FALSE)
pie(table(rivers))
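The coercion behind the original problem can be seen on a small made-up vector (the values are illustrative and not taken from rivers):

```r
x <- c(300, 500, 1500, 3500)   # small stand-in for the rivers vector
x[x >= 3000] <- "long"         # assigning a string coerces the whole vector
class(x)
# [1] "character"
x
# [1] "300"  "500"  "1500" "long"

# Later comparisons are string comparisons, evaluated character by character
"500" < "1500"
# [1] FALSE
```

Because "5" sorts after "1", a value such as "500" never satisfies the "less than 1500" test once the vector holds strings, which is why the third command appeared to do nothing.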
Here is another alternative using dplyr::case_when. It is more verbose than using cut, but it is also easier to generalize.
library("tidyverse")
set.seed(1234) # for reproducibility
# `case_when` vectorizes multiple `if-else` statements.
rivers <- sample.int(5000, size = 1000, replace = TRUE)
rivers <- case_when(
rivers >= 3000 ~ "long",
rivers >= 1500 ~ "medium",
rivers >= 500 ~ "short",
TRUE ~ "tiny"
)
table(rivers)
#> rivers
#> long medium short tiny
#> 406 303 199 92
Created on 2019-04-10 by the reprex package (v0.2.1)
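One detail applies to any of these approaches: table() sorts plain character labels alphabetically, so the categories come out as long, medium, short, tiny. A sketch of one way to keep them in size order is to store the labels as a factor with explicit levels. The labels here are looked up with base R's findInterval, which is a swapped-in alternative to cut and case_when rather than something from the answers above; sizes, breaks, river_size and counts are illustrative names.

```r
sizes  <- c("tiny", "short", "medium", "long")
breaks <- c(500, 1500, 3000)   # lower bounds of short, medium and long

# findInterval() counts how many breaks are <= each value (0 to 3),
# so adding 1 turns that count into an index into sizes
river_size <- factor(sizes[findInterval(datasets::rivers, breaks) + 1],
                     levels = sizes)

counts <- table(river_size)    # counts follow the factor level order
counts
```

Passing counts to pie() then draws the slices in that same order.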
R gurus,
I would like to write a function to apply dynamic prices based on quantities purchased.
Here is the dataset.
prices <- data.frame(from = c(0,101,201,301,401,501,601,701,801,901,1001,1101,2001),
to = c(100,200,300,400,500,600,700,800,900,1000,1100,2000,10000),
price = c(50,45,40,35,30,25,20,15,10,8,7,6,5))
purchases <- data.frame(customer = LETTERS[1:20],
quantity = c(305,100,70,576,687,1200,5007,491,870,320,80,295,680,1100,1305,1024,1800,7400,3500,730),
bill = NA)
The purchases dataset holds the quantities, and the prices dataset holds sliding-scale prices for different quantity ranges.
For example, customer A purchased 305 units. To calculate billing for this quantity, first hundred units will be billed at $50, second hundred units at $45, third hundred units at $40 and remaining 5 units at $35. Mathematically:
purchases$bill[1] = 100*50 + 100*45 + 100*40 + 5*35
OR
purchases$bill[1] = 100*prices$price[1] + 100*prices$price[2] + 100*prices$price[3] + 5*prices$price[4]
What is the best way to write an R function that calculates the bill for each purchase?
Any help is much appreciated.
Using base R we can do something like the following:
c(prices$price%*%diff(replace(A<-outer(c(0,prices$to),purchases$quantity,"-"),A>0,0)))
[1] 13675 5000 3500 21900 24240 29100 48935 19730 26700 14200 4000 13300 24100 28500 29730 27968
[17] 32700 60900 41400 24950
Elaboration:
price=prices$price
lowr=c(0,prices$to)
qnty=purchases$quantity
x=outer(lowr,qnty,"-")
M=diff(replace(x,x>0,0))
colSums(price*M)##similar to c(price%*%M)
transform(purchases,bill=colSums(price*M))
customer quantity bill
1 A 305 13675
2 B 100 5000
3 C 70 3500
4 D 576 21900
5 E 687 24240
6 F 1200 29100
7 G 5007 48935
8 H 491 19730
9 I 870 26700
10 J 320 14200
11 K 80 4000
12 L 295 13300
13 M 680 24100
14 N 1100 28500
15 O 1305 29730
16 P 1024 27968
17 Q 1800 32700
18 R 7400 60900
19 S 3500 41400
20 T 730 24950
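The same tiered calculation can also be spelled out step by step, which may be easier to read than the outer/diff one-liner. This is a minimal sketch, not the method used above: tiered_bill is a hypothetical helper name, and it assumes the tiers are contiguous and sorted, with each tier ending at prices$to.

```r
prices <- data.frame(
  to    = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 2000, 10000),
  price = c(50, 45, 40, 35, 30, 25, 20, 15, 10, 8, 7, 6, 5)
)

# tiered_bill is a hypothetical helper, not part of the answer above
tiered_bill <- function(qty, to, price) {
  lower <- c(0, head(to, -1))   # units already covered before each tier starts
  width <- to - lower           # number of units each tier can hold
  vapply(qty, function(q) {
    units <- pmin(pmax(q - lower, 0), width)  # units of q that fall in each tier
    sum(units * price)
  }, numeric(1))
}

tiered_bill(c(305, 100, 730), prices$to, prices$price)
# [1] 13675  5000 24950
```

It could be applied to the whole table with purchases$bill <- tiered_bill(purchases$quantity, prices$to, prices$price). Quantities above the last upper bound (10000 here) would only be billed up to that bound, so the final tier needs to be wide enough for the data.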
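Here is an example of a bad solution: it hard-codes every tier, and it is not 100% accurate either. Because the remainder is computed as qty %% 100, quantities that are exact multiples of 100 (for example 200 or 1100) are billed as if the last tier were empty.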
library(dplyr)
prices <- data.frame(from = c(0,101,201,301,401,501,601,701,801,901,1001,1101,2001),
to = c(100,200,300,400,500,600,700,800,900,1000,1100,2000,10000),
price = c(50,45,40,35,30,25,20,15,10,8,7,6,5))
purchases <- data.frame(customer = LETTERS[1:20],
quantity = c(305,100,70,576,687,1200,5007,491,870,320,80,295,680,1100,1305,1024,1800,7400,3500,800),
bill = NA)
prices$qty = prices$to - prices$from + 1
prices$qty[1] = prices$to[1]
prices$c_qty = cumsum(prices$qty)
prices$bill = prices$qty * prices$price
prices$c_bill = cumsum(prices$bill)
prices$id = 1:nrow(prices)
calculate_billing <- function(qty) {
if(qty <= 100){ price_case = 1}
if(qty >= 101 & qty <= 200) { price_case = 2}
if(qty >= 201 & qty <= 300) { price_case = 3}
if(qty >= 301 & qty <= 400) { price_case = 4}
if(qty >= 401 & qty <= 500) { price_case = 5}
if(qty >= 501 & qty <= 600) { price_case = 6}
if(qty >= 601 & qty <= 700) { price_case = 7}
if(qty >= 701 & qty <= 800) { price_case = 8}
if(qty >= 801 & qty <= 900) { price_case = 9}
if(qty >= 901 & qty <= 1000) { price_case = 10}
if(qty >= 1001 & qty <= 1100) { price_case = 11}
if(qty >= 1101 & qty <= 2000) { price_case = 12}
if(qty >= 2001 & qty <= 10000){ price_case = 13}
if(price_case==1) {
billing = prices$price[price_case]*qty
}
if(price_case>1 & price_case<=11 ) {
remainder <- qty%%100
billing = prices$c_bill[price_case-1] + prices$price[price_case]*remainder
}
if(price_case==12) {
remainder <- qty - 1100
billing = prices$c_bill[price_case-1] + prices$price[price_case]*remainder
}
if(price_case==13) {
remainder <- qty - 2000
billing = prices$c_bill[price_case-1] + prices$price[price_case]*remainder
}
return(billing)
}
purchases %>%
rowwise() %>%
mutate(bill = calculate_billing(quantity))
I have a data frame that I'm trying to do some scenario analysis with. It looks like this:
Revenue Item_1 Item_2 Item_3
552 200 220 45
1500 400 300 200
2300 600 400 300
I'd like to generate something where one item at a time is increased or decreased by some fixed amount (i.e., 1 unit), like this:
Revenue Item_1 Item_2 Item_3
552 201 220 45
1500 401 300 200
2300 601 400 300
552 200 221 45
1500 400 301 200
2300 600 401 300
552 200 220 46
1500 400 300 201
2300 600 400 301
I'm currently doing it in a loop like this, but am wondering if there's a faster way:
l1 <- list()
increment_amt <- 1
for(i in c('Item_1','Item_2','Item_3')){
newDf <- df1
newDf[,i] <- newDf[,i] + increment_amt
l1[[i]] <- newDf
}
df2 <- do.call(rbind, l1)
Any suggestions?
With lapply,
do.call(rbind, lapply(names(dat)[2:4], function(x) {dat[,x] <- dat[,x] + 1; dat}))
Revenue Item_1 Item_2 Item_3
1 552 201 220 45
2 1500 401 300 200
3 2300 601 400 300
4 552 200 221 45
5 1500 400 301 200
6 2300 600 401 300
7 552 200 220 46
8 1500 400 300 201
9 2300 600 400 301
Of course, do.call / rbind can be replaced with data.table's speedier rbindlist, which returns a data.table.
library(data.table)
rbindlist(lapply(names(dat)[2:4], function(x) {dat[,x] <- dat[,x] + 1; dat}))
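When the stacked result is used for scenario analysis, it can help to record which column was changed in each block. A minimal base R sketch under that assumption follows; dat is rebuilt from the question, and items, scenarios and the scenario column are names introduced here for illustration.

```r
dat <- data.frame(
  Revenue = c(552, 1500, 2300),
  Item_1  = c(200, 400, 600),
  Item_2  = c(220, 300, 400),
  Item_3  = c(45, 200, 300)
)

increment_amt <- 1
items <- grep("^Item_", names(dat), value = TRUE)  # columns to perturb

# One perturbed copy per item, tagged with the column that was changed
scenarios <- lapply(items, function(col) {
  out <- dat
  out[[col]] <- out[[col]] + increment_amt
  out$scenario <- col   # added label column, not in the original data
  out
})
df2 <- do.call(rbind, scenarios)
df2
```

Selecting the columns by name pattern instead of by position (2:4) keeps the code working if more Item_ columns are added later.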
# Data frame
df <- data.frame(Item_1= c(200, 400, 600),
Item_2= c(220, 300, 400),
Item_3= c(45, 200, 300))
# Perturbation
p <- 1
# Add to all columns
df.new <- apply(diag(ncol(df)) * p, MARGIN = 1, function(x) data.frame(t(t(df) + x)))
df.new
[[1]]
Item_1 Item_2 Item_3
1 201 220 45
2 401 300 200
3 601 400 300
[[2]]
Item_1 Item_2 Item_3
1 200 221 45
2 400 301 200
3 600 401 300
[[3]]
Item_1 Item_2 Item_3
1 200 220 46
2 400 300 201
3 600 400 301
We can write a function and use lapply to achieve this task. df is your original data frame. df_list is a list with all final outputs. You can later use df2 <- do.call(rbind, df_list), or bind_rows from dplyr.
# A function to add 1 to all numbers in a column
add_one <- function(Col, dt){
dt[, Col] <- dt[, Col] + 1
return(dt)
}
# Get the column names
Col_vec <- colnames(df)[2:ncol(df)]
# Apply the add_one function
df_list <- lapply(Col_vec, add_one, dt = df)
# Combine all results
df2 <- dplyr::bind_rows(df_list)
You can use the perturb function from the perturb package. The code is as follows:
# using the most important features, we create a model
m1 <- lm(revenue ~ item1 + item2 + item3)
#summary(m1)
#anova(m1)
#install.packages("perturb")
library(perturb)
set.seed(1234)
p1_new <- perturb(m1, pvars=c("item1","item2") , prange = c(1,1),niter=20)
p1_new
summary(p1_new)
EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1
My data look like the data above. I want to create a new column emp using the mutate function such that:
emp = average*FIRMTOT if EMPLTOT_N/FIRMTOT < min
and emp = EMPLTOT_N if EMPLTOT_N/FIRMTOT > min
In your sample data EMPLTOT_N / FIRMTOT is less than min only in rows 9 and 10 (23379 / 52 and 13698 / 42), and this should work:
df <- read.table(text = "EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1", header = TRUE)
library('dplyr')
mutate(df, emp = ifelse(EMPLTOT_N / FIRMTOT < min, average * FIRMTOT, EMPLTOT_N))
In the above, if EMPLTOT_N / FIRMTOT == min, emp is given the value of EMPLTOT_N, since you didn't specify what should happen in that case.
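The same rule can also be written in base R. Keeping the ratio as its own column (ratio is a name introduced here for illustration) makes it easy to check which rows fall into each branch. This is a sketch on the sample data, not a replacement for the mutate call.

```r
df <- data.frame(
  EMPLTOT_N = c(12289593, 26841282, 15867437, 6060684, 52366969, 137003,
                226987, 81011, 23379, 13698, 67014),
  FIRMTOT   = c(4511051, 1074459, 81243, 8761, 8910, 47573,
                10372, 507, 52, 42, 20397),
  average   = c(5, 55, 300, 750, 1000, 5, 55, 300, 750, 1000, 5),
  min       = c(1, 10, 100, 500, 1000, 1, 10, 100, 500, 1000, 1)
)

# Same rule as the mutate() call, written with base R
df$ratio <- df$EMPLTOT_N / df$FIRMTOT
df$emp   <- ifelse(df$ratio < df$min, df$average * df$FIRMTOT, df$EMPLTOT_N)

# Rows where average * FIRMTOT replaced EMPLTOT_N
df[df$ratio < df$min, ]
```

If ties should go to the other branch, changing < to <= in the ifelse condition is enough.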