Each month a certain number of resources is made available.
There is a defined usage rate from month 0 to month n.
For example, in the month of release 10% of the resources are used, an additional 12% in the second month, an additional 15% in the third month, and so on until all available resources are used.
Required:
How many resources are used each month.
For example:
in month 1, 10% of the resources released in month 1 are used;
in month 2, 12% of the resources released in month 1 + 10% of the resources released in month 2;
in month 3, 15% of the resources released in month 1 + 12% of the resources released in month 2 + 10% of the resources released in month 3;
and so on.
The logic is implemented in Excel here: http://www.mrexcel.com/forum/excel-questions/752098-array-formula-allocate-revenues-across-periods.html
How can I implement this in R?
Thank you for your help!
Have a look at cumsum()
> used<-c(0.1,0.12,0.15)
> cumsum(used)
[1] 0.10 0.22 0.37
Hope this is what you were looking for.
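If what you actually need is the total used each month across all release cohorts (each month's release consumes 10%, 12%, 15% of itself in its release month and the two months after), here is a minimal sketch; the release amounts of 100 per month are made-up figures for illustration:

```r
rates <- c(0.10, 0.12, 0.15)   # usage schedule: share used in release month, month +1, month +2
released <- c(100, 100, 100)   # resources released in months 1..3 (made-up figures)
n <- length(released)

# usage in month m = sum over release months r of released[r] * rates[m - r + 1]
usage <- sapply(seq_len(n), function(m) {
  r <- seq_len(m)              # release months that contribute to month m
  sum(released[r] * rates[m - r + 1])
})
usage
# 10 22 37
```

Note that 10, 22, 37 are exactly 100 times the cumsum() output above only because the releases are equal; with unequal releases the convolution above is needed.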
Consider the following two datasets. The first dataset contains an id variable that identifies a person and the date when his or her unemployment benefits start.
The second dataset shows the number of service years, which makes it possible to calculate the maximum entitlement period. More precisely, each year column is a dummy variable equal to one if the person built up unemployment-benefit rights in that year (i.e. if they worked), and zero otherwise.
df1<-data.frame( c("R005", "R006", "R007"), c(20120610, 20130115, 20141221))
colnames(df1)<-c("id", "start_UI")
df1$start_UI<-as.character(df1$start_UI)
df1$start_UI<-as.Date(df1$start_UI, "%Y%m%d")
df2<-data.frame( c("R005", "R006", "R007"), c(1,1,1), c(1,1,1), c(0,1,1), c(1,0,1), c(1,0,1) )
colnames(df2)<-c("id", "worked2010", "worked2011", "worked2012", "worked2013", "worked2014")
To summarize the information in the two datasets: person R005 worked in 2010 and 2011, filed for unemployment insurance in 2012, and worked again in 2013 and 2014 (we see this in df2). When the unemployment spell started in 2012, entitlement was based on the work history before becoming unemployed, so the work history equals 2. Likewise, the employment history for R006 and R007 equals 3 and 5, respectively (for R007 we assume he worked in 2014, since he only filed for unemployment benefits in December of that year; hence 5 rather than 4).
Now my question is how I can merge these two datasets effectively so that I get the following table:
df_final<- data.frame(c("R005", "R006", "R007"), c(20120610, 20130115, 20141221), c(2,3,5))
colnames(df_final)<-c("id", "start_UI", "employment_history")
id start_UI employment_history
1 R005 20120610 2
2 R006 20130115 3
3 R007 20141221 5
I tried using aggregate, but that also includes work history after the year someone filed for unemployment benefits, which is something I do not want. Does anyone have an efficient way to combine the information from the two datasets and calculate the employment history?
I appreciate any help.
base R
You can use Reduce with accumulate = TRUE:
df2$employment_history <- apply(df2[,-1], 1, function(x) sum(!Reduce(any, x==0, accumulate = TRUE)))
merge(df1, df2[c("id","employment_history")])
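To see what the Reduce step does, here is a trace for a single row (R005's work dummies):

```r
x <- c(1, 1, 0, 1, 1)  # R005's worked2010..worked2014 dummies

# cumulative any(): TRUE from the first zero onward
Reduce(any, x == 0, accumulate = TRUE)
# FALSE FALSE TRUE TRUE TRUE

# negate and sum: counts the years worked before the first gap
sum(!Reduce(any, x == 0, accumulate = TRUE))
# 2
```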
dplyr
Or use dplyr::cumany together with tidyr::pivot_longer:
library(dplyr)
library(tidyr)

df2 %>%
  pivot_longer(-id) %>%
  group_by(id) %>%
  summarise(employment_history = sum(value[!cumany(value == 0)])) %>%
  left_join(df1, .)
Output
id start_UI employment_history
1 R005 2012-06-10 2
2 R006 2013-01-15 3
3 R007 2014-12-21 5
I have some text data and I want to extract from it the first number after the word "expects earnings of". What I currently have is the following:
x <- d %>%
mutate(
expectsEarningsOf = str_match_all(newCol, "expects earnings of (.*?) cents")
)
This extracts the number after "expects earnings of" and before "cents", together with the surrounding text. I now want to extract just the first number after "expects earnings of". I thought of something like:
x <- d %>%
mutate(
expectsEarningsOf = str_match_all(newCol, "expects earnings of (.*?) anyStringCharacter")
)
Where anyStringCharacter is any non-numeric character.
Data:
d <- structure(list(grp = c(2635L, 1276L, 10799L, 10882L, 6307L, 7622L,
2448L, 6467L, 3224L, 2064L, 9232L, 5039L, 2888L, 5977L, 3565L
), newCol = c("For 2008, True Religion expects earnings of $1.48 to $1.52 a share and net sales of $210 million to $215 million. The company expects to incur additional marketing expenses of about $1.7 million. ",
"But Hospira also said it now expects net sales on a GAAP basis to grow at a rate of 1% to 2% this year, reduced from earlier expectations by lower-than-expected international sales and purchasing delays in the medication-management business. After the second quarter, the company had projected growth in a range of 3% to 5%. ",
"14 Nov 2013 16:04 EDT *Thermogenesis Sees Net Savings About $1.5 Million From Reorganization",
" The Company announced that net sales for this nine week period increased by 25.4% to $185.3 million while comparable store sales for this period decreased by 0.5%. Based on this quarter-to-date performance, the Company now expects net sales for the fourth quarter of fiscal 2013 to be in the range of $208 million to $210 million, comparable store sales to be in the range of -1.5% to -0.5% and GAAP net income to be in the range of $23.3 million to $24.3 million, with a GAAP diluted income per common share range of $0.43 to $0.45 on approximately 54.0 million estimated weighted average shares outstanding. Excluding $0.9 million, or $0.02 per adjusted diluted share in tax-effected expenses related to the founders' transaction(1) , adjusted net income is expected to be approximately $24.2 million to $25.2 million, or $0.44 to $0.46 per diluted share based on estimated adjusted diluted weighted average shares outstanding of approximately 54.6 million., 9 Jan 2014 16:45 EDT *Five Below, Inc. Updates 4Q Fiscal 2013 Guidance Based On Qtr-To-Date Results",
"", "1323 GMT Raiffeisen Centrobank calls Verbund's (VER.VI) recent guidance increase for 2014 a \"mixed bag,\" raising its target price to EUR15.60 from EUR14.30. The bank retains its hold rating as positive effects are mostly due to one-offs, although the utility's sustainable cost savings were a positive surprise. \"The power price environment is still bleak following a weakish outlook for Central European economies, coal prices falling further and only lacklustre hopes for a quick fix of the European energy and climate policy,\" Raiffeisen adds. Verbund's shares trade up 0.6% at EUR15.34. (Nicole.lundeen#wsj.com; #nicole_lundeen) ",
"As a result of its third quarter results and current fourth quarter outlook, the Company has updated its guidance for fiscal 2007. The Company now expects net sales to range from $2.68 billion to $2.7 billion, which compares to prior expectations of $2.7 billion to $2.75 billion. Same-store sales for the year are expected to increase approximately 2.5% to 3% compared to previous expectations of an increase of approximately 3.0% to 4.5%. The Company now expects full year net income to range from $2.37 to $2.43 per diluted share, which compares to its prior guidance of $2.49 to $2.56 per diluted share. ",
" Sempra Energy (SRE) sees earnings next year growing 15% from this year's estimate, putting 2010 expectations above Wall Street's, as the parent of San Diego Gas & Electric anticipates much lower capital spending for the next five years.",
"Outlook for 2008: Midpoint for EPS guidance increased, For the full year 2008, the company now expects results from continuing operations as follows: earnings per diluted share of between $3.10 and $3.20, compared to the previous range of $3.00 to $3.20; revenue growth of approximately 9%, and operating income to approach 17% of revenues. Over the same period, the company expects cash from operations to approximate $900 million and capital expenditures of between $240 million and $260 million. These estimates exclude potential special charges.",
"California Pizza Kitchen expects second-quarter earnings of 34 cents to 36 cents a share. Wall Street expects earnings of 36 cents a share. ",
" -- Q1 2013 gross margin within guidance, sales ahead of guidance , \"We achieved first quarter sales ahead of and gross margin in line with our guidance, and reiterate our expectation for a sales acceleration during the year, with a second quarter markedly stronger than the first quarter and a large second half, leading to expected 2013 full year net sales at a similar level to that of 2012. The underlying assumptions are unchanged, with foundry and logic preparing for very lithography-intensive 14-20 nm technology nodes to be used for next generation mobile end-products; while lithography investments in memory are still muted, memory chip price recovery and discussions on scanner shipment capability are signs of potential upside for second half deliveries. EUV technology industrialization continues to make steady progress on the trajectory set with the introduction of the improved source concept last year: firstly, the EUV light sources have now been demonstrated at 55 Watts with adequate dose control; secondly, the scanners themselves have demonstrated production-worthy, 10 nm node compatible imaging and overlay specifications. We therefore confirm our expectation of the ramp of EUV-enabled semiconductor production in 2015, supported by our NXE:3300B scanners, two of which are being prepared for shipment and installation in Q2 and Q3,\" said Eric Meurice, President and Chief Executive Officer of ASML., -- For the second quarter of 2013, ASML expects net sales of about EUR 1.1 ",
"In the first quarter, Covanceexpects earnings of 60 cents a share on a modest sequential increase in net revenues. Analysts predicted income of 66 cents share on $534 million in revenue, which is nearly flat with the latest quarter's revenue.",
"The company said Monday it expects to report revenue of about $875 million for 2007, up sharply from $196 million in 2006, mostly because of new military contracts. However, it expects net income to remain nearly the same at $16.6 million. ",
"For the fourth quarter, the company sees earnings of $1.13 to $1.16 a share. ",
"Chip maker now expects earnings from continuing operations of 15c-17c a share, excluding restructuring charges, and a revenue decline of 25% to 30% sequentially, because of weak demand. Shares fall 6% late., Chip maker now expects earnings from continuing operations of 15c-17c a share, excluding restructuring charges, and a revenue decline of 25% to 30% sequentially, because of weak demand. Shares fall 6% late."
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-15L))
The first number after "expects earnings of":
library(stringr)
str_extract_all(d$newCol, "(?<=expects earnings of )\\d+")
This solution uses a positive lookbehind, (?<=expects earnings of ), which matches \\d+ only when it is immediately preceded by expects earnings of (with a trailing space).
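Note that \\d+ only captures an integer, so dollar amounts such as "$1.48" in the data would be missed or truncated. A base-R sketch that also allows an optional "$" and an optional decimal part (the snippets below are shortened from the data for illustration):

```r
txt <- c("Wall Street expects earnings of 36 cents a share.",
         "True Religion expects earnings of $1.48 to $1.52 a share")

# perl = TRUE enables the lookbehind; the pattern then allows an optional "$"
# and an optional decimal part after the digits
m <- regexpr("(?<=expects earnings of )\\$?\\d+(\\.\\d+)?", txt, perl = TRUE)
regmatches(txt, m)
# "36"    "$1.48"
```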
I have summary level data that tells me how often a group of patients actually went to the doctor until a certain cut-off date. I do not have individual data, I only know that some e.g. went 5 times, and some only once.
I also know that some were already patients at the beginning of the observation interval and would be expected to come more often, whereas others were new patients who entered later. If they only joined a month before the cutoff date, they would be expected to come less often than someone who was in the group from the beginning.
Of course, the patients are not well behaved, so they sometimes miss a visit, or they come more often than expected. I am setting some boundary conditions to define the expectation about minimum and maximum number of doctor visits relative to the month they started appearing at the doctor.
Now, I want to distribute the actual summary level data to individuals, i.e. create a data frame that tells me during which month each individual started appearing at the doctor, and how many times they came for check-up until the cut-off date.
I am assuming this can be done with some type of random sampling, but the result needs to fit both the summary level information I have about the actual subjects as well as the boundary conditions telling how often a subject would be expected to come to the doctor relative to their joining time.
Here is some code that generates the target data frame that contains the month when the observation period starts, the respective number of doctor's visits that is expected (including boundary for minimum and maximum visits), and the associated percentage of subjects who start coming to the doctor during this month:
library(tidyverse)
months <- c("Nov", "Dec", "Jan", "Feb", "Mar", "Apr")
target.visits <- c(6,5,4,3,2,1)
percent <- c(0.8, 0.1, 0.05, 0.03, 0.01, 0.01)
df.target <- data.frame(month = months, target.visits = target.visits,
percent = percent) %>%
mutate(max.visits = c(7,6,5,4,3,2),
min.visits = c(5,4,3,2,1,1))
This is the data frame:
month target.visits percent max.visits min.visits
Nov 6 0.80 7 5
Dec 5 0.10 6 4
Jan 4 0.05 5 3
Feb 3 0.03 4 2
Mar 2 0.01 3 1
Apr 1 0.01 2 1
In addition, I can create the data frame that shows the actual number of subjects per observed number of visits:
subj.n <- 1000
actual.visits <- c(7,6,5,4,3,2,1)
actual.subject.perc <- c(0.05, 0.6, 0.2, 0.06, 0.035, 0.035, 0.02)
df.observed <- data.frame(actual.visits = actual.visits,
                          actual.subj.perc = actual.subject.perc,
                          actual.subj.n = subj.n * actual.subject.perc)
Here is the data frame with the actual observations:
actual.visits actual.subj.perc actual.subj.n
7 0.050 50
6 0.600 600
5 0.200 200
4 0.060 60
3 0.035 35
2 0.035 35
1 0.020 20
Unfortunately I have no idea how to bring these together. I just know that if, for example, 60 subjects come to the doctor 4 times during their observation period, I would like to randomly assign a starting month to each of them. However, based on the boundary conditions min.visits and max.visits, I know that it can only be a month from Dec to Feb.
Any thoughts are much appreciated.
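One possible approach, sketched below: for each observed visit count, restrict to the months whose [min.visits, max.visits] interval contains that count, then sample a starting month for each subject among those months, weighted by the percent column. This is only a sketch; it respects the boundary conditions but does not guarantee that the sampled month shares reproduce percent exactly. The figures are taken from the question.

```r
set.seed(1)

months     <- c("Nov", "Dec", "Jan", "Feb", "Mar", "Apr")
percent    <- c(0.80, 0.10, 0.05, 0.03, 0.01, 0.01)
max.visits <- c(7, 6, 5, 4, 3, 2)
min.visits <- c(5, 4, 3, 2, 1, 1)

actual.visits <- c(7, 6, 5, 4, 3, 2, 1)
actual.subj.n <- c(50, 600, 200, 60, 35, 35, 20)

# one subject row per observed visit count; the starting month is sampled
# among the months whose [min.visits, max.visits] interval contains that
# count, weighted by each month's share of new patients
subjects <- do.call(rbind, lapply(seq_along(actual.visits), function(i) {
  v  <- actual.visits[i]
  ok <- which(min.visits <= v & v <= max.visits)   # eligible starting months
  data.frame(visits = v,
             start.month = sample(months[ok], actual.subj.n[i],
                                  replace = TRUE, prob = percent[ok]))
}))
table(subjects$start.month, subjects$visits)
```

For the 60 subjects with 4 visits, the eligible months are exactly Dec, Jan, and Feb, matching the constraint stated above.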
I need help understanding why I am getting the wrong answer for Problem 19 of Project Euler.
The problem is:
You are given the following information, but you may prefer to do some research for yourself.
1 Jan 1900 was a Monday.
Thirty days has September,
April, June and November.
All the rest have thirty-one,
Saving February alone,
Which has twenty-eight, rain or shine.
And on leap years, twenty-nine.
A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.
How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?
days=seq(from=as.Date("1900/1/1"), to=as.Date("2000/12/31"), by="month")
firstSundays=days[weekdays(as.Date(days))=="Sunday"&months(as.Date(days))=="January"]
length(firstSundays)
The answer it gives me is 14 and when I look at firstSundays it gives me:
[1] "1905-01-01" "1911-01-01" "1922-01-01" "1928-01-01" "1933-01-01"
[6] "1939-01-01" "1950-01-01" "1956-01-01" "1961-01-01" "1967-01-01"
[11] "1978-01-01" "1984-01-01" "1989-01-01" "1995-01-01"
I don't understand what is going on here. Could someone please explain? I am fairly new to R and I'm not sure what I am doing wrong.
There are two issues: your sequence starts at 1900-01-01 instead of 1901-01-01, and the months(...) == "January" filter keeps only January firsts, so you are counting years whose 1 January is a Sunday rather than Sundays on the first of any month. To compute it in R you could do as follows:
firsts_of_months <- seq(as.Date("1901-01-01"), as.Date("2000-12-01"), by = "1 month")
sum(weekdays(firsts_of_months) == "Sunday") # weekdays() is locale-dependent; compare against "Sonntag", etc., in non-English locales
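As a locale-independent alternative, format(x, "%u") gives the ISO 8601 weekday number (Monday = 1, Sunday = 7), so the comparison does not depend on the session language:

```r
firsts <- seq(as.Date("1901-01-01"), as.Date("2000-12-01"), by = "1 month")
# "%u" is the ISO 8601 weekday number, so this works in any locale
sum(format(firsts, "%u") == "7")
# 171
```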
I get data into graphite with a granularity of an hour. For example
2013-12-06-12:00 15
2013-12-08-09:00 14
2013-12-09-12:00 3
2013-12-13-00:00 10
2013-12-14-08:00 20
2013-12-14-09:00 1
2013-12-15-00:00 5
2013-12-16-00:00 11
2013-12-16-02:00 12
... and so on
Now, I'd like to graph the "evolution of the value for each day of the week", where the value displayed is the sum (or average) of the values for that particular day of the week over some number of weeks (say, 2 weeks).
Looking only at the last week, my graph would look like this:
^ 21
20| |
| |
| 12.5| 13
10| | | 9.5 |
| | | | |
| | | | |
0+--------------------------------->
Mon Tue Wed Thu Fri Sat Sun
12 13 14 15 16
So, for example, the Monday point takes today's values (11 + 12) and the value of the previous Monday (3), and averages them: ((11 + 12) + 3) / 2.
Is this possible, and how?
summarize(your.metric.goes.here, "1week", "sum") will summarize the data in one-week intervals by summing them. You can also use avg, max, or min there.
As far as semantics go: timers usually need to be averaged, and counters need to be summed, when summarized.
Example: if you measure lap counts and lap times when you run every day and want a weekly summary, you average the lap times of the seven days into one weekly lap time. With the lap count, the total is more meaningful, so you sum it.
On a different note: timeStack and timeShift are used when you want to compare, say, last month's data with this month's on the same timeline. You can also timeShift the summarized data.
I think specifically what you are looking for is a combination of both timeStack and averageSeries. For example:
averageSeries(timeStack(your.metric.here,"1week", 0, 2))
Where the last two arguments give the range of "1week" series to incorporate (so it fetches a series for this week and for each of the previous two weeks).