Sequential calculations fail in R

I tried to do some calculations with a constant and several variables in a dataframe.
For example we can use the following dummy data
constant <- 100
df <- as.data.frame(cbind(c(1,2,3,4,5),
c(4,3,6,1,4),
c(2,5,6,6,2),
c(5,5,5,1,2),
c(3,6,4,3,1)))
colnames(df) <- c("aa", "bb", "cc", "dd", "ee")
Now say that for every row in my dataframe I want to multiply my constant with variable bb, then cc, and then dd sequentially. I tried
answers <- sapply(df, function(x) constant * (1 + x[,2:4])
and similar attempts with lapply.
How would I go about this so that I get constant * bb * cc * dd? The values are percentages, which is why I have the (1 + ...) there.

Try this approach with apply():
#Data
constant <- 100
df <- as.data.frame(cbind(c(1,2,3,4,5),
c(4,3,6,1,4),
c(2,5,6,6,2),
c(5,5,5,1,2),
c(3,6,4,3,1)))
colnames(df) <- c("aa", "bb", "cc", "dd", "ee")
#Apply
answers <- as.data.frame(t(apply(df,1, function(x) constant * (1 + x))))
Output:
answers
aa bb cc dd ee
1 200 500 300 600 400
2 300 400 600 600 700
3 400 700 700 600 500
4 500 200 700 200 400
5 600 500 300 300 200
Or using dplyr with across():
library(dplyr)
#Code
answer <- df %>% mutate(across(everything(),~constant * (1 + .)))
Output:
aa bb cc dd ee
1 200 500 300 600 400
2 300 400 600 600 700
3 400 700 700 600 500
4 500 200 700 200 400
5 600 500 300 300 200
Or with the same sapply():
#Code 3
answers <- sapply(df,function(x) constant * (1 + x))
answers <- as.data.frame(answers)
Output:
aa bb cc dd ee
1 200 500 300 600 400
2 300 400 600 600 700
3 400 700 700 600 500
4 500 200 700 200 400
5 600 500 300 300 200
Either of these options produces the same output:
#Code 4
answers <- as.data.frame(do.call(cbind,lapply(df,function(x) constant * (1 + x))))
#Code 5
answers <- as.data.frame(mapply(function(x) constant * (1 + x),x=df))
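The options above apply constant * (1 + x) to each column separately. For the chained product the question asks about (constant * bb * cc * dd, with each variable entering as 1 + x), a minimal sketch could look like the following. It assumes the values are used as-is inside (1 + x), as in the question's own attempt; the names cols and result are only illustrative.

```r
# Same dummy data as in the question, built directly as a data frame
constant <- 100
df <- data.frame(aa = c(1, 2, 3, 4, 5),
                 bb = c(4, 3, 6, 1, 4),
                 cc = c(2, 5, 6, 6, 2),
                 dd = c(5, 5, 5, 1, 2),
                 ee = c(3, 6, 4, 3, 1))

# Columns to chain (illustrative name)
cols <- c("bb", "cc", "dd")

# Reduce() multiplies the (1 + x) vectors element-wise, one column after another,
# so each row ends up with constant * (1 + bb) * (1 + cc) * (1 + dd)
df$result <- constant * Reduce(`*`, lapply(df[cols], function(x) 1 + x))

df$result
# [1]  9000 14400 29400  2800  4500
```

If the columns hold percentages such as 4 for 4%, dividing by 100 inside the function (1 + x / 100) would be the corresponding change.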

R fast way to perturb through data frame

I have a data frame that I'm trying to do some scenario analysis with. It looks like this:
Revenue Item_1 Item_2 Item_3
552 200 220 45
1500 400 300 200
2300 600 400 300
I'd like to generate something where one item at a time is increased or decreased by some fixed amount (i.e. 1 unit), like this:
Revenue Item_1 Item_2 Item_3
552 201 220 45
1500 401 300 200
2300 601 400 300
552 200 221 45
1500 400 301 200
2300 600 401 300
552 200 220 46
1500 400 300 201
2300 600 400 301
I'm currently doing it in a loop like this, but am wondering if there's a faster way:
l1 <- list()
increment_amt <- 1
for(i in c('Item_1','Item_2','Item_3')){
newDf <- df1
newDf[,i] <- newDf[,i] + increment_amt
l1[[i]] <- newDf
}
df2 <- do.call(rbind, l1)
Any suggestions?
With lapply,
do.call(rbind, lapply(names(dat)[2:4], function(x) {dat[,x] <- dat[,x] + 1; dat}))
Revenue Item_1 Item_2 Item_3
1 552 201 220 45
2 1500 401 300 200
3 2300 601 400 300
4 552 200 221 45
5 1500 400 301 200
6 2300 600 401 300
7 552 200 220 46
8 1500 400 300 201
9 2300 600 400 301
Of course, do.call / rbind can be replaced with data.table's speedier rbindlist, which returns a data.table.
library(data.table)
rbindlist(lapply(names(dat)[2:4], function(x) {dat[,x] <- dat[,x] + 1; dat}))
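The lapply approach above refers to a data frame called dat that is not defined in the snippet. A self-contained sketch, assuming dat holds the question's sample table, is shown below. The perturbed column is a hypothetical addition that records which item was changed in each block of rows.

```r
# Sample data from the question (assumed to be what `dat` refers to)
dat <- data.frame(Revenue = c(552, 1500, 2300),
                  Item_1  = c(200, 400, 600),
                  Item_2  = c(220, 300, 400),
                  Item_3  = c(45, 200, 300))

increment_amt <- 1
items <- names(dat)[2:4]

# One perturbed copy of the data per item column
scenarios <- lapply(items, function(col) {
  out <- dat
  out[[col]] <- out[[col]] + increment_amt
  out$perturbed <- col  # hypothetical label column, not in the original
  out
})

# Stack the copies into a single data frame (9 rows for 3 items x 3 rows)
df2 <- do.call(rbind, scenarios)
```

Keeping a label column makes it possible to tell the scenarios apart after stacking, which the row order alone does not guarantee once the data are sorted or filtered.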
# Data frame
df <- data.frame(Item_1= c(200, 400, 600),
Item_2= c(220, 300, 400),
Item_3= c(45, 200, 300))
# Perturbation
p <- 1
# Add p to one column at a time (row i of the scaled identity matrix perturbs column i)
df.new <- apply(diag(ncol(df)) * p, MARGIN = 1, function(x) data.frame(t(t(df) + x)))
[[1]]
Item_1 Item_2 Item_3
1 201 220 45
2 401 300 200
3 601 400 300
[[2]]
Item_1 Item_2 Item_3
1 200 221 45
2 400 301 200
3 600 401 300
[[3]]
Item_1 Item_2 Item_3
1 200 220 46
2 400 300 201
3 600 400 301
We can write a function and use lapply to achieve this task. df is your original data frame. df_list is a list with all final outputs. You can later use df2 <- do.call(rbind, df_list), or bind_rows from dplyr.
# A function to add 1 to all numbers in a column
add_one <- function(Col, dt){
dt[, Col] <- dt[, Col] + 1
return(dt)
}
# Get the column names
Col_vec <- colnames(df)[2:ncol(df)]
# Apply the add_one function
df_list <- lapply(Col_vec, add_one, dt = df)
# Combine all results
df2 <- dplyr::bind_rows(df_list)
You can also use the perturb() function from the perturb package. The code is as follows:
# using the most important features, we create a model
m1 <- lm(revenue ~ item1 + item2 + item3)
#summary(m1)
#anova(m1)
#install.packages("perturb")
library(perturb)
set.seed(1234)
p1_new <- perturb(m1, pvars=c("item1","item2") , prange = c(1,1),niter=20)
p1_new
summary(p1_new)

Descriptive Statistics By Group - R

I'm looking for a way to produce descriptive statistics by group number in R. There is another answer on here I found, which uses dplyr, but I'm having too many problems with it and would like to see what alternatives others might recommend.
I'm looking to obtain descriptive statistics on revenue grouped by group_id. Let's say I have a data frame called company:
group_id company revenue
1 Company A 200
1 Company B 150
1 Company C 300
2 Company D 600
2 Company E 800
2 Company F 1000
3 Company G 50
3 Company H 80
3 Company H 60
and I'd like to produce a new data frame called new_company:
group_id company revenue average min max SD
1 Company A 200 217 150 300 62
1 Company B 150 217 150 300 62
1 Company C 300 217 150 300 62
2 Company D 600 800 600 1000 163
2 Company E 800 800 600 1000 163
2 Company F 1000 800 600 1000 163
3 Company G 50 63 50 80 12
3 Company H 80 63 50 80 12
3 Company H 60 63 50 80 12
Again, I'm looking for alternatives to dplyr. Thank you
Using the sample data frame
dd<-read.csv(text="group_id,company,revenue
1,Company A,200
1,Company B,150
1,Company C,300
2,Company D,600
2,Company E,800
2,Company F,1000
3,Company G,50
3,Company H,80
3,Company H,60", header=T)
You could do something fancy like use ave() to create all the values per row for your different functions and then just combine that with the original data.frame.
ext <- with(dd, Map(function(x) ave(revenue, group_id, FUN=x),
list(avg=mean, min=min, max=max, SD=sd)))
cbind(dd, ext)
# group_id company revenue avg min max SD
# 1 1 Company A 200 216.66667 150 300 76.37626
# 2 1 Company B 150 216.66667 150 300 76.37626
# 3 1 Company C 300 216.66667 150 300 76.37626
# 4 2 Company D 600 800.00000 600 1000 200.00000
# 5 2 Company E 800 800.00000 600 1000 200.00000
# 6 2 Company F 1000 800.00000 600 1000 200.00000
# 7 3 Company G 50 63.33333 50 80 15.27525
# 8 3 Company H 80 63.33333 50 80 15.27525
# 9 3 Company H 60 63.33333 50 80 15.27525
but really a simple dplyr command would be easier.
library(dplyr)
dd %>% group_by(group_id) %>%
mutate(
avg=mean(revenue),
min=min(revenue),
max=max(revenue),
SD=sd(revenue))
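The SD values in the question's expected table (62, 163, 12) do not match the sd() results above (76.4, 200, 15.3). A likely reason, offered here as an assumption, is that the expected values use the population formula (divide by n), while sd() divides by n - 1. A minimal base R sketch comparing the two follows; pop_sd is a hypothetical helper, not a base R function.

```r
# Sample data from the question
dd <- data.frame(
  group_id = rep(1:3, each = 3),
  company  = paste("Company", c("A", "B", "C", "D", "E", "F", "G", "H", "H")),
  revenue  = c(200, 150, 300, 600, 800, 1000, 50, 80, 60)
)

# Hypothetical helper: population standard deviation (divides by n, not n - 1)
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# ave() returns one value per row, repeated within each group
dd$SD_sample <- ave(dd$revenue, dd$group_id, FUN = sd)
dd$SD_pop    <- ave(dd$revenue, dd$group_id, FUN = pop_sd)

# First row of each group, rounded
round(dd$SD_pop[c(1, 4, 7)])
# [1]  62 163  12
```

Which of the two is appropriate depends on whether each group is treated as a sample or as the whole population.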
Another function I like to use is: describeBy from package "psych".
library(psych)
description <- describeBy(df$variable_to_be_described, df$group_variable)

How to make new variable across conditions

I need to calculate a new variable (a new Pheno) from my data using conditions.
The data set is huge.
It has the columns Animal, Record, Days, Pheno:
A R D P
1 1 240 300
1 2 230 290
2 1 305 350
2 2 260 290
3 1 350 450
The conditions are:
The constant pheno per day is 2.
If a record's days are more than 305, the old pheno should be kept.
If a record has fewer than 305 days but has a next record, the old pheno should be kept.
If a record has fewer than 305 days and no next record, the new pheno is calculated as (305 - days) * constant + pheno, e.g. (305 - 260) * 2 + 290 = 380.
Example: animal 1 has fewer than 305 days for both records. The first record keeps its pheno, but the second record is the last one and has fewer than 305 days, so it must be recalculated: (305 - 230) * 2 + 290 = 440.
The final data should look like this:
A R D P N_P
1 1 240 300 300
1 2 230 290 440
2 1 305 350 350
2 2 260 290 380
3 1 350 450 450
How can I do this in R or on Linux?
Here is a solution with base R
df <- read.table(header=TRUE, text=
"A R D P
1 1 240 300
1 2 230 290
2 1 305 350
2 2 260 290
3 1 350 450")
newP <- function(d) {
np <- numeric(nrow(d))
for (i in 1:nrow(d)) {
if (d$D[i] > 305) { np[i] <- d$P[i]; next }
if (d$D[i] <= 305 && i<nrow(d)) { np[i] <- d$P[i]; next }
np[i] <- (305-d$D[i])*2 + d$P[i]
}
d$N_P <- np
return(d)
}
D <- split(df, df$A)
D2 <- lapply(D, newP)
do.call(rbind, D2)
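Since the question mentions that the data set is huge, a vectorised variant of the loop above may be worth trying. This is a sketch under the assumption that the last record of each animal is the one with the largest R; the names constant and is_last are illustrative.

```r
# Sample data from the question
df <- data.frame(A = c(1, 1, 2, 2, 3),
                 R = c(1, 2, 1, 2, 1),
                 D = c(240, 230, 305, 260, 350),
                 P = c(300, 290, 350, 290, 450))

constant <- 2

# TRUE where the record is the last one for its animal
# (assumption: the last record has the largest R within each A)
is_last <- df$R == ave(df$R, df$A, FUN = max)

# Recalculate only for last records with fewer than 305 days; keep P otherwise
df$N_P <- ifelse(is_last & df$D < 305,
                 (305 - df$D) * constant + df$P,
                 df$P)

df$N_P
# [1] 300 440 350 380 450
```

Because ave() and ifelse() work on whole columns, no per-animal split or explicit loop is needed.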
Check this out (I assume R is the number of records sorted, so if you have 10 records the last will have R=10)
library(dplyr)
df <- data.frame(A=c(1,1,2,2,3),
R=c(1,2,1,2,1),
D=c(240,230,305,260,350),
P=c(300,290,350,290,450))
df %>% group_by(A) %>%
mutate(N_P=ifelse(( D<305 & R==n()), # check if D<305 & Record is last record
((305-D)*2)+P # calculate new P
,P)) # Else : use old P
Source: local data frame [5 x 5]
Groups: A [3]
A R D P N_P
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 240 300 300
2 1 2 230 290 440
3 2 1 305 350 350
4 2 2 260 290 380
5 3 1 350 450 450
If you have predefined constants that depend on R value in the df, for example :
const <- c(1,2,1.5,2.5,3)
You can replace R in the code by const[R]
df %>% group_by(A) %>%
mutate(N_P=ifelse(( D<305 & R==n()), # check if D<305 & Record is last record
((305-D)*const[R])+P # calculate new P
,P)) # Else : use old P

if statement and mutate

EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1
My data look like the data above. I want to create a new column emp using the mutate function such that:
emp= average*FIRMTOT if EMPLTOT_N/FIRMTOT<min
and emp=EMPLTOT_N if EMPLTOT_N/FIRMTOT>min
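In your sample data EMPLTOT_N / FIRMTOT is less than min only in rows 9 and 10 (23379 / 52 is about 450 against a min of 500, and 13698 / 42 is about 326 against a min of 1000). This should work: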
df <- read.table(text = "EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1", header = TRUE)
library('dplyr')
mutate(df, emp = ifelse(EMPLTOT_N / FIRMTOT < min, average * FIRMTOT, EMPLTOT_N))
In the above if EMPLTOT_N / FIRMTOT == min, emp will be given the value of EMPLTOT_N since you didn't specify what you want to happen in this case.
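The same rule can be checked in base R without dplyr. The sketch below recomputes the ratio on the sample data and shows which rows fall below min; ratio is an illustrative name.

```r
df <- read.table(text = "EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1", header = TRUE)

# Employees per firm
ratio <- df$EMPLTOT_N / df$FIRMTOT

# average * FIRMTOT where the ratio is below min, EMPLTOT_N otherwise
# (a ratio exactly equal to min keeps EMPLTOT_N, as in the mutate() call)
df$emp <- ifelse(ratio < df$min, df$average * df$FIRMTOT, df$EMPLTOT_N)

which(ratio < df$min)
# [1]  9 10
df$emp[9:10]
# [1] 39000 42000
```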

adding as integers instead of list elements in R

I am getting
> total = 0
> for (qty in a[5]){
+ total = total + as.numeric(unlist(qty))
+ print(total)
+ }
[1] 400 400 400 400 400 400 400 400 400 400
What I really want is:
> total = 0
> for (qty in a[5]){
+ total = total + as.numeric(unlist(qty))
+ print(total)
+ }
[1] 400 800 1200 1600 2000 2400 2800 3200 3600 4000
To refine this with a more specific scenario:
price buy_sell qty
100 B 100
100 B 200
90 S 300
100 S 400
I want to make a fourth column:
price buy_sell qty net
100 B 100 10000
100 B 200 30000
90 S 300 3000
100 S 400 -37000
Note that if a is a list, you want to use double brackets. Otherwise you are getting back a list of size one, where the first element has the values you are looking for
Try:
total <- cumsum(a[[5]])
a <- list()
a[[5]] <- rep(400, 10)
cumsum(a[[5]])
# [1] 400 800 1200 1600 2000 2400 2800 3200 3600 4000
Compare:
a[5]
a[[5]]
a[5][[1]]
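For the refined scenario in the question (a running net column), cumsum can be applied to signed trade values. This is a sketch that assumes buys ("B") add price * qty and sells ("S") subtract it, which reproduces the net column shown in the question; trades and direction are illustrative names.

```r
# Sample trades from the question
trades <- data.frame(price    = c(100, 100, 90, 100),
                     buy_sell = c("B", "B", "S", "S"),
                     qty      = c(100, 200, 300, 400))

# +1 for buys, -1 for sells (assumed sign convention)
direction <- ifelse(trades$buy_sell == "B", 1, -1)

# Running total of signed trade values
trades$net <- cumsum(direction * trades$price * trades$qty)

trades$net
# [1]  10000  30000   3000 -37000
```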
