Distributed computing using R

I am trying to build a system where I have the monthly sales data of all employees in my department for the past year:
sales<-read.table(text="MONTH Emp1 Emp2 Emp3
1 1000 1500 1100
2 1200 1400 1600
3 1500 1400 1600
4 1300 1500 1400
5 1500 1200 1200", header=T)
and so on, up to month 10.
Through an algorithm, I have forecasted their future values and found the maximum percentage increase they can achieve.
threshold<-read.table(text="Employee 'Max Increment Possible'
Emp1 200
Emp2 220
Emp3 300",header=T)
Now I am setting a target of a 400 increase for my department, but I want to distribute it among my employees in the best possible way. I am wondering if there is an existing package that can do this in R.
The output should be:
Employee Incremental Value
Emp1 120
Emp2 140
Emp3 140
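I do not know of a package that encodes "best possible way" directly, since that depends on the objective you pick. As a minimal sketch (an assumption on my part, not a known package, and it will not necessarily reproduce the exact figures above), one simple rule is to split the 400 in proportion to each employee's maximum possible increment and cap each share at that maximum:
target <- 400
caps   <- threshold[[2]]                          # the "Max Increment Possible" column
alloc  <- pmin(target * caps / sum(caps), caps)   # proportional split, capped at each maximum
data.frame(Employee = threshold$Employee, Incremental.Value = round(alloc))
If "best" means optimizing an explicit objective under these caps, a small linear program (for example with the lpSolve package) is the more general route.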

Related

How to calculate average of adjacent rows?

I have a data frame with people's salary data and their job; one row is one person. I need to calculate the average salary of 3 people in the same job and make a new data frame out of it. The 3 people need to be in the same job, and their wages need to be adjacent when the data frame is sorted from highest to lowest salary: the average of the person themselves and the ones directly above and below them, if they have the same job. The people with the highest and lowest salary in a job are excluded, as they have nobody above or below them.
This is a sample of the data I have:
Job salary
IT 5000
IT 4500
IT 4000
IT 4000
Sales 4500
Sales 4500
Sales 4000
Sales 3000
Sales 2500
HR 3000
HR 2500
HR 2300
This is what I would like to get (where the average had decimal places I rounded it, but in the R data frame decimals are fine):
Job salary
IT 4500
IT 4167
Sales 4333
Sales 3833
Sales 3167
HR 2600
I'm stuck, as I can't figure out how to calculate the average of the 3 people in the same job and exclude the top and bottom. Hope you can help.
Thank you
You want a rolling average by group. This can be done with zoo::rollmean coupled with dplyr::group_by. (In recent dplyr versions, summarise() warns when a group returns more than one row; reframe() is the replacement in that case, but the idea is the same.)
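Assuming dat is built from the sample data in the question (the base R options further down refer to the same data as df):
dat <- read.table(text = "Job salary
IT 5000
IT 4500
IT 4000
IT 4000
Sales 4500
Sales 4500
Sales 4000
Sales 3000
Sales 2500
HR 3000
HR 2500
HR 2300", header = TRUE)
dat$Job <- factor(dat$Job, levels = unique(dat$Job))  # keep jobs in their original order, as in the output below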
library(dplyr)
library(zoo)

dat %>%
  group_by(Job) %>%
  summarise(mean = rollmean(salary, 3, align = "right"))  # rolling mean of 3 adjacent salaries within each job
Job mean
<fct> <dbl>
1 IT 4500
2 IT 4167.
3 Sales 4333.
4 Sales 3833.
5 Sales 3167.
6 HR 2600
Here are some base R options
> with(df,stack(tapply(salary, Job, function(x) rowMeans(embed(x, 3)))))
values ind
1 2600.000 HR
2 4500.000 IT
3 4166.667 IT
4 4333.333 Sales
5 3833.333 Sales
6 3166.667 Sales
> aggregate(salary ~ ., df, function(x) rowMeans(embed(x, 3)))
Job salary
1 HR 2600
2 IT 4500.000, 4166.667
3 Sales 4333.333, 3833.333, 3166.667

How do I use hts() to aggregate time series data?

I am new to R and have some very basic doubts.
Company Customer Product Q1 Q2 Q3 Q4
xyz Customer1 ProductA 500 600 400 800
xyz Customer1 ProductB 100 255 520 642
xyz Customer1 ProductC 846 566 320 54
xyz Customer1 ProductD 510 53 100 210
xyz Customer2 ProductX 500 50 466 260
xyz Customer2 ProductY 100 120 150 620
xyz Customer2 ProductZ 500 460 240 543
The above is an example of my data set. I need to create a hierarchical time series using hts() with 3 levels. The bottom level (level 0) should contain the products (column Product), which will be aggregated to an upper level (level 1) based on customers (column Customer), which in turn will be aggregated to the top level based on company.
My questions are:
How do I write the hts() call for this data set?
My data set is a data frame; should I convert it to a matrix before using hts()?
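A minimal sketch of one way to set this up (my own assumption, using a hypothetical data frame name demand for the table above, with rows ordered so Customer1's four products come before Customer2's three):
library(hts)

# Bottom-level series: one column per product, one row per quarter.
y <- ts(t(as.matrix(demand[, c("Q1", "Q2", "Q3", "Q4")])), frequency = 4)
colnames(y) <- paste(demand$Customer, demand$Product, sep = "_")

# Hierarchy: company -> 2 customers -> (4, 3) products.
demand_hts <- hts(y, nodes = list(2, c(4, 3)))

aggts(demand_hts)      # aggregated series at every level of the hierarchy
So yes, it is easiest to convert: hts() expects the bottom-level series as a time-series matrix (one column per product), with the hierarchy described either via nodes, as above, or via structured column names plus the characters argument.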

trying to calculate month-over-month percentage growth rate

I have a large data set of 5000 rows and am trying to find the month-over-month growth rate for each city for the period June to August. My data is:
id | host_id | Host_since | area
1 121 2017-08-31 LA
2 243 2017-08-15 SF
3 243 2017-06-12 SF
4 100 2017-07-13 NYC
5 300 2017-05-19 CHI
6 250 2017-07-20 MIN
7 135 2017-08-25 LA
.
.
.
I don't care about duplicate host_id; the only thing I want is the total number of ids created in each month. I fixed up my query, but it still says there is an error near "OVER" and I am not able to figure out what the issue is. The query looks fine to me. Any help would be great.
Select strftime('%Y-%m', host_since) as month, area,
       count(id) as count,
       100 * (count(id) - lag(count(id), 1) over
                (partition by area order by strftime('%Y-%m', host_since)))
           / lag(count(id), 1) over
                (partition by area order by strftime('%Y-%m', host_since))
       as growth
from listings
where host_since between '2017-06-01' and '2017-08-31'
group by 1, 2
order by 1;
SQLITE_ERROR: near "over": syntax error

traversing through a text file, comparing it with another, and outputting the combined file in unix

I have a scenario in bash where I have two files, emp_details.txt and emp_file.txt. emp_details.txt has the details of employees as below:
1 emp1 sales
2 emp2 marketng
3 emp3 testing
emp_file.txt has
1 emp1 30 2500
2 emp2 25 1200
3 emp3 33 4000
How do I traverse through these files and create a third file that displays the complete details of emp1, emp2 and emp3, like this:
1 emp1 sales 30 2500
2 emp2 marketing 25 1200
3 emp3 testing 33 4000
Here is one way using awk:
awk '
NR==FNR { emp[$1,$2] = $0; next }              # 1st file: store each line keyed by id and name
(($1,$2) in emp) { print emp[$1,$2], $3, $4 }  # 2nd file: append age and salary to the stored line
' emp_details.txt emp_file.txt
1 emp1 sales 30 2500
2 emp2 marketng 25 1200
3 emp3 testing 33 4000
If you are on Solaris (a variant of Unix), do not use the default awk; use nawk or /usr/xpg4/bin/awk instead.
If this mocked-up data is considerably different from your real data, you might need to tweak the code yourself, or update your question with data that represents your real data more accurately.

conditional sorting of columns based on contents of one column in R

I have an R data.frame of daily stock volumes for 5 stocks over many days:
date stock1 stock2 stock3 stock4 stock5
1 350 600 1900 3000 250
2 800 800 1200 4200 400
3 500 600 1500 3500 550
4 600 900 1800 3200 1000
...
...
What I am looking for is a way to get, at the end of each day, the list of stocks sorted by volume in descending order. I am thinking I could run a for loop over nrow(df) and, at each iteration, sort the row's volumes and save the sorted column headers (stock names) as the list for that day. How can I do this? Is it possible?
I am a novice in R and programming; I hope my question was clear. Grateful for any help, thanks!
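A minimal sketch of one vectorised way to do it (my own assumption, taking the data frame to be called df with the date column first):
vols   <- df[, -1]                              # drop the date column
ranked <- t(apply(vols, 1, function(x)
  names(x)[order(x, decreasing = TRUE)]))       # stock names, highest volume first
ranked <- data.frame(date = df$date, ranked)
ranked                                          # one row per day, stocks in descending volume order
apply() already loops over the rows, so no explicit for loop is needed.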
