Adding as integers instead of list elements in R

I am getting
> total = 0
> for (qty in a[5]){
+ total = total + as.numeric(unlist(qty))
+ print(total)
+ }
[1] 400 400 400 400 400 400 400 400 400 400
What I really want is:
> total = 0
> for (qty in a[5]){
+ total = total + as.numeric(unlist(qty))
+ print(total)
+ }
[1] 400 800 1200 1600 2000 2400 2800 3200 3600 4000
To refine this into a more specific scenario, given:
price buy_sell qty
100 B 100
100 B 200
90 S 300
100 S 400
I want to make a fourth column:
price buy_sell qty net
100 B 100 10000
100 B 200 30000
90 S 300 3000
100 S 400 -37000

Note that if a is a list, you want to use double brackets. Otherwise you get back a list of length one whose first element holds the values you are looking for, so the loop runs only once and adds the whole vector to total.
Try:
total <- cumsum(a[[5]])
a <- list()
a[[5]] <- rep(400, 10)
cumsum(a[[5]])
# [1] 400 800 1200 1600 2000 2400 2800 3200 3600 4000
Compare:
a[5]
a[[5]]
a[5][[1]]
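For the refined scenario in the question, the net column looks like a running total of price * qty in which buys (B) are added and sells (S) are subtracted. That reading is inferred from the sample numbers rather than stated by the asker, and the names trades and direction are made up for illustration. Under that assumption, a minimal sketch:

```r
# Sample data copied from the question
trades <- data.frame(
  price    = c(100, 100, 90, 100),
  buy_sell = c("B", "B", "S", "S"),
  qty      = c(100, 200, 300, 400)
)

# Assumption: buys count as positive, sells as negative
direction <- ifelse(trades$buy_sell == "B", 1, -1)

# Running total of the signed trade values
trades$net <- cumsum(direction * trades$price * trades$qty)
trades$net
# [1]  10000  30000   3000 -37000
```

As with the first example, cumsum replaces the explicit loop.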

Related

How to add column names after loading xlsx files in a for loop

I am using a loop to load multiple xlsx files, and that part works. But I have not managed to add the column names of the documents (the same names for all files).
library(dplyr)
library(readr)
library(openxlsx)
library(readxl)
setwd("C:/Users/MiguelAngel/Documents/R Miguelo/Guillermo Ahumada")
ldf <- list()
listxlsx <- dir(pattern = "*.xlsx")
for (k in 1:length(listxlsx)){
ldf[[k]] <-as.data.frame(read.xlsx(listxlsx[k]))
}
The result:
355 1500 1100 43831
1 190 850 600 43832
2 93 4000 3000 43833
3 114 4000 3000 43834
4 431 1000 700 43835
5 182 1000 700 43836
6 496 500 300 43837
7 254 500 300 43838
8 174 600 300 43839
9 397 1500 945 43840
10 198 1500 900 43841
11 271 1500 900 43842
12 94 3000 2000 43843
13 206 400 230 43844
14 305 1500 1100 43845
15 184 850 600 43846
16 90 4000 3000 43847
17 70 4000 3000 43848
18 492 1000 700 43849
19 168 1000 700 43850
20 530 500 300 43851
All the files load fine, but without the column names.
I need to add the column names:
list_file <- dir(pattern = "*.xlsx") %>%
lapply(read.xlsx) %>% # I tried stringsAsFactors, but it raised an error.
bind_rows
but this is what I get:
list_file
Layout of the original columns in all files
I need to add these column names after running the for loop.
Thanks for your help.
I cannot check this since I don't have Excel files to load, but I think this should work:
listxlsx <- list.files(path = "C:/Users/MiguelAngel/Documents/R Miguelo/Guillermo Ahumada", pattern = "*.xlsx", full.names = TRUE)
names(listxlsx) <- listxlsx
purrr::map_dfr(listxlsx, readxl::read_excel, .id = "Filename")
(The first line is a better practice to get the filenames than relying on setwd.)
When listxlsx is a named vector, map_dfr adds a column named Filename whose values are taken from the names of listxlsx.
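If the files really contain no header row, another option is to assign the names after loading. The sketch below is base R only and uses small stand-in data frames instead of real Excel files. The column names in col_names are hypothetical placeholders, since the original headers are not shown in the question.

```r
# Hypothetical column names; replace with the real headers of the files
col_names <- c("id", "price", "cost", "date")

# Stand-ins for the data frames produced by the loading loop
ldf <- list(
  data.frame(V1 = c(355, 190), V2 = c(1500, 850),
             V3 = c(1100, 600), V4 = c(43831, 43832)),
  data.frame(V1 = c(93, 114), V2 = c(4000, 4000),
             V3 = c(3000, 3000), V4 = c(43833, 43834))
)

# Give every data frame in the list the same column names
ldf <- lapply(ldf, setNames, col_names)

# Stack them into one data frame
combined <- do.call(rbind, ldf)
names(combined)
```

When reading the real files, the first data row must not be consumed as a header. With openxlsx::read.xlsx that is controlled by the colNames argument.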

Sequential calculations fail in R

I tried to do some calculations with a constant and several variables in a dataframe.
For example we can use the following dummy data
constant <- 100
df <- as.data.frame(cbind(c(1,2,3,4,5),
c(4,3,6,1,4),
c(2,5,6,6,2),
c(5,5,5,1,2),
c(3,6,4,3,1)))
colnames(df) <- c("aa", "bb", "cc", "dd", "ee")
Now say that for every row in my dataframe I want to multiply my constant with variable bb, then cc, and then dd sequentially. I tried
answers <- sapply(df, function(x) constant * (1 + x[,2:4])
and similar attempts with lapply.
How would I go about getting constant * bb * cc * dd? The variables are percentages, which is why I have the (1 + ...) there.
Try this approach with apply():
#Data
constant <- 100
df <- as.data.frame(cbind(c(1,2,3,4,5),
c(4,3,6,1,4),
c(2,5,6,6,2),
c(5,5,5,1,2),
c(3,6,4,3,1)))
colnames(df) <- c("aa", "bb", "cc", "dd", "ee")
#Apply
answers <- as.data.frame(t(apply(df,1, function(x) constant * (1 + x))))
Output:
answers
aa bb cc dd ee
1 200 500 300 600 400
2 300 400 600 600 700
3 400 700 700 600 500
4 500 200 700 200 400
5 600 500 300 300 200
Or using dplyr with across():
library(dplyr)
#Code
answers <- df %>% mutate(across(everything(), ~ constant * (1 + .)))
Output:
aa bb cc dd ee
1 200 500 300 600 400
2 300 400 600 600 700
3 400 700 700 600 500
4 500 200 700 200 400
5 600 500 300 300 200
Or with the same sapply():
#Code 3
answers <- sapply(df,function(x) constant * (1 + x))
answers <- as.data.frame(answers)
Output:
aa bb cc dd ee
1 200 500 300 600 400
2 300 400 600 600 700
3 400 700 700 600 500
4 500 200 700 200 400
5 600 500 300 300 200
Any of these options will also produce the same output:
#Code 4
answers <- as.data.frame(do.call(cbind,lapply(df,function(x) constant * (1 + x))))
#Code 5
answers <- as.data.frame(mapply(function(x) constant * (1 + x),x=df))
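The options above apply constant * (1 + x) to each column separately. If the goal is instead the chained product described in the question, constant * (1 + bb) * (1 + cc) * (1 + dd) per row, a minimal sketch could look like this. It assumes the values are used exactly as given, as in the asker's own (1 + x) attempt. The names growth and chained are made up for illustration.

```r
constant <- 100
df <- data.frame(
  aa = c(1, 2, 3, 4, 5),
  bb = c(4, 3, 6, 1, 4),
  cc = c(2, 5, 6, 6, 2),
  dd = c(5, 5, 5, 1, 2),
  ee = c(3, 6, 4, 3, 1)
)

# Growth factors (1 + x) for the three columns of interest
growth <- 1 + df[, c("bb", "cc", "dd")]

# Multiply the factors across each row, then scale by the constant
df$chained <- constant * apply(growth, 1, prod)
df$chained
# [1]  9000 14400 29400  2800  4500
```

For only three columns, writing constant * (1 + df$bb) * (1 + df$cc) * (1 + df$dd) directly gives the same result.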

How to extract the number beside a specific string in a cell?

I would like to extract the number located beside a specific string in each cell. My data looks like this:
item stock
PRE 24GUSSETX4SX15G 200
PLS 12KLRX10SX15G 200
ADU 24SBX200ML 200
NIS 18BNDX40SX11G 200
REF 500GX12BTL 200
I want to extract the numbers located directly before the strings 'GUSSET', 'KLR', 'SB', 'BND' and 'BTL', and then multiply each number by the stock. For example:
item stock pcs total
PRE 24GUSSETX4SX15G 200 24 4800
PLS 12KLRX10SX15G 200 12 2400
ADU 24SBX200ML 200 24 4800
NIS 18BNDX40SX11G 200 18 3600
REF 500GX12BTL 200 12 2400
Does anyone know how to extract the numbers? Thanks very much in advance.
One way in base R is to use sub to extract the number next to those groups and multiply it by stock to get total.
df$pcs <- as.numeric(sub(".*?(\\d+)(GUSSET|KLR|SB|BND|BTL).*", "\\1", df$item))
df$total <- df$stock * df$pcs
df
# item stock pcs total
#PRE 24GUSSETX4SX15G 200 24 4800
#PLS 12KLRX10SX15G 200 12 2400
#ADU 24SBX200ML 200 24 4800
#NIS 18BNDX40SX11G 200 18 3600
#REF 500GX12BTL 200 12 2400
Or everything in one pipe:
library(dplyr)
df %>%
mutate(pcs = as.numeric(sub(".*?(\\d+)(GUSSET|KLR|SB|BND|BTL).*", "\\1", item)),
total = stock * pcs)
We can do this in tidyverse
library(tidyverse)
df %>%
mutate(pcs = as.numeric(str_extract(item, "(\\d+)(?=(GUSSET|KLR|SB|BND|BTL))")),
total = pcs * stock)
# item stock pcs total
#1 PRE 24GUSSETX4SX15G 200 24 4800
#2 PLS 12KLRX10SX15G 200 12 2400
#3 ADU 24SBX200ML 200 24 4800
#4 NIS 18BNDX40SX11G 200 18 3600
#5 REF 500GX12BTL 200 12 2400
data
df <- structure(list(item = c("PRE 24GUSSETX4SX15G", "PLS 12KLRX10SX15G",
"ADU 24SBX200ML", "NIS 18BNDX40SX11G", "REF 500GX12BTL"), stock = c(200L,
200L, 200L, 200L, 200L)), class = "data.frame", row.names = c(NA,
-5L))
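The str_extract pattern above relies on a lookahead: (?=...) requires one of the keywords to follow the digits without including it in the match. A base R sketch of the same idea, assuming every item contains one of the keywords, uses regexpr with perl = TRUE:

```r
df <- data.frame(
  item  = c("PRE 24GUSSETX4SX15G", "PLS 12KLRX10SX15G", "ADU 24SBX200ML",
            "NIS 18BNDX40SX11G", "REF 500GX12BTL"),
  stock = c(200L, 200L, 200L, 200L, 200L)
)

# Digits that are immediately followed by one of the keywords
pattern <- "\\d+(?=GUSSET|KLR|SB|BND|BTL)"

# regexpr finds the first match per string; regmatches extracts it
m <- regexpr(pattern, df$item, perl = TRUE)
df$pcs <- as.numeric(regmatches(df$item, m))
df$total <- df$stock * df$pcs
df$pcs
# [1] 24 12 24 18 12
```

Note that regmatches silently drops strings without a match, so this sketch would need an extra check if some items might lack a keyword.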

R fast way to perturb through data frame

I have a data frame that I'm trying to do some scenario analysis with. It looks like this:
Revenue Item_1 Item_2 Item_3
552 200 220 45
1500 400 300 200
2300 600 400 300
I'd like to generate something where one item at a time is increased or decreased by some fixed amount (i.e. 1 unit), like this:
Revenue Item_1 Item_2 Item_3
552 201 220 45
1500 401 300 200
2300 601 400 300
552 200 221 45
1500 400 301 200
2300 600 401 300
552 200 220 46
1500 400 300 201
2300 600 400 301
I'm currently doing it in loop like this but am wondering if there's a faster way:
l1 <- list()
increment_amt <- 1
for(i in c('Item_1','Item_2','Item_3')){
newDf <- df1
newDf[,i] <- newDf[,i] + increment_amt
l1[[i]] <- newDf
}
df2 <- do.call(rbind, l1)
Any suggestions?
With lapply,
do.call(rbind, lapply(names(dat)[2:4], function(x) {dat[,x] <- dat[,x] + 1; dat}))
Revenue Item_1 Item_2 Item_3
1 552 201 220 45
2 1500 401 300 200
3 2300 601 400 300
4 552 200 221 45
5 1500 400 301 200
6 2300 600 401 300
7 552 200 220 46
8 1500 400 300 201
9 2300 600 400 301
Of course, do.call / rbind can be replaced with the data.table's speedier rbindlist, which returns a data.table.
library(data.table)
rbindlist(lapply(names(dat)[2:4], function(x) {dat[,x] <- dat[,x] + 1; dat}))
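The snippets above assume a data frame named dat that already exists. A self-contained sketch of the same lapply approach, using the sample data from the question, is shown below. The extra perturbed_col column is an addition for illustration and is not part of the original answer.

```r
dat <- data.frame(
  Revenue = c(552, 1500, 2300),
  Item_1  = c(200, 400, 600),
  Item_2  = c(220, 300, 400),
  Item_3  = c(45, 200, 300)
)
increment_amt <- 1

# Every column except Revenue gets perturbed in turn
item_cols <- setdiff(names(dat), "Revenue")

perturbed <- lapply(item_cols, function(col) {
  out <- dat                             # work on a copy of the original
  out[[col]] <- out[[col]] + increment_amt
  out$perturbed_col <- col               # record which column was changed
  out
})

# Stack the three scenarios into one data frame
result <- do.call(rbind, perturbed)
result
```

Recording the perturbed column makes it easier to tell the scenarios apart once they are stacked.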
# Data frame
df <- data.frame(Item_1= c(200, 400, 600),
Item_2= c(220, 300, 400),
Item_3= c(45, 200, 300))
# Perturbation
p <- 1
# Add to all columns
df.new <- apply(diag(ncol(df)) * p, MARGIN = 1, function(x) data.frame(t(t(df) + x)))
[[1]]
Item_1 Item_2 Item_3
1 201 220 45
2 401 300 200
3 601 400 300
[[2]]
Item_1 Item_2 Item_3
1 200 221 45
2 400 301 200
3 600 401 300
[[3]]
Item_1 Item_2 Item_3
1 200 220 46
2 400 300 201
3 600 400 301
We can write a function and use lapply to achieve this task. df is your original data frame. df_list is a list with all final outputs. You can later use df2 <- do.call(rbind, df_list), or bind_rows from dplyr.
# A function to add 1 to all numbers in a column
add_one <- function(Col, dt){
dt[, Col] <- dt[, Col] + 1
return(dt)
}
# Get the column names
Col_vec <- colnames(df)[2:ncol(df)]
# Apply the add_one function
df_list <- lapply(Col_vec, add_one, dt = df)
# Combine all results
df2 <- dplyr::bind_rows(df_list)
You can use the perturb function from the perturb package. The code is as follows:
# using the most important features, we create a model
m1 <- lm(Revenue ~ Item_1 + Item_2 + Item_3, data = df1)
#summary(m1)
#anova(m1)
#install.packages("perturb")
library(perturb)
set.seed(1234)
p1_new <- perturb(m1, pvars = c("Item_1", "Item_2"), prange = c(1, 1), niter = 20)
p1_new
summary(p1_new)

if statement and mutate

EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1
My data look like the data above. I want to create a new column emp using the mutate function such that:
emp = average * FIRMTOT if EMPLTOT_N / FIRMTOT < min
and emp = EMPLTOT_N if EMPLTOT_N / FIRMTOT > min
In your sample data, EMPLTOT_N / FIRMTOT is less than min only in rows 9 and 10 (about 449.6 against 500, and about 326.1 against 1000). This should work:
df <- read.table(text = "EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1", header = TRUE)
library('dplyr')
mutate(df, emp = ifelse(EMPLTOT_N / FIRMTOT < min, average * FIRMTOT, EMPLTOT_N))
In the above, if EMPLTOT_N / FIRMTOT == min, emp will be given the value of EMPLTOT_N, since you didn't specify what you want to happen in this case.
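To see both branches of the condition in action without dplyr, here is a base R sketch on rows 8 to 10 of the sample data. The name small is made up for illustration.

```r
# Rows 8 to 10 of the sample data
small <- data.frame(
  EMPLTOT_N = c(81011, 23379, 13698),
  FIRMTOT   = c(507, 52, 42),
  average   = c(300, 750, 1000),
  min       = c(100, 500, 1000)
)

# Same rule as the mutate() call: use average * FIRMTOT when the
# employees-per-firm ratio falls below min, otherwise keep EMPLTOT_N
small$emp <- ifelse(small$EMPLTOT_N / small$FIRMTOT < small$min,
                    small$average * small$FIRMTOT,
                    small$EMPLTOT_N)
small$emp
# [1] 81011 39000 42000
```

The first row keeps EMPLTOT_N because its ratio (about 159.8) is above min, while the other two fall below min and are replaced by average * FIRMTOT.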
