I need to transform a data table into triangle form.
So I installed the ChainLadder package. Then I imported the data from the table. You can see it below:
> head(triangle.csv)
année dev montant
1 2009 1 2147
2 2009 2 3365
3 2009 3 2129
4 2009 4 1070
5 2009 5 0
6 2009 6 300
I need to convert this table into triangle form, like this:
this table constructed in Excel
So I wrote this code:
data<-as.triangle(triangle.csv)
But an error is shown:
Error in as.triangle(triangle.csv) : could not find function "as.triangle"
How do I resolve this problem please?
The "could not find function" error means the ChainLadder package is not attached in your session, so load it first with library(ChainLadder). After that, you just need to specify the column arguments in the call. You should write:
as.triangle(triangle.csv, origin="année", dev="dev", value="montant")
Remember you can also qualify the call with the package name, as pointed out by @Paul in the comments: ChainLadder::as.triangle().
EDIT: for future issues, remember to practice with the examples that you can find in the man pages (e.g. ?as.triangle).
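As a rough illustration of the call above, here is a minimal sketch. The 2009 rows come from the question; the 2010 and 2011 rows are made-up values added only so that the triangle has more than one origin year. The base-R tapply step is not part of ChainLadder; it is just a quick way to check the expected shape. The ChainLadder call is guarded so the snippet still runs if the package is not installed.

```r
# Long-format data: the 2009 rows are from the question,
# the 2010/2011 rows are hypothetical values for illustration only.
triangle.csv <- data.frame(
  c(2009, 2009, 2009, 2010, 2010, 2011),
  c(1, 2, 3, 1, 2, 1),
  c(2147, 3365, 2129, 1800, 2500, 1900)
)
names(triangle.csv) <- c("année", "dev", "montant")

# Base-R view of the intended shape: origin years as rows,
# development periods as columns, NA where nothing is observed yet.
wide <- tapply(triangle.csv[["montant"]],
               list(triangle.csv[["année"]], triangle.csv[["dev"]]),
               sum)
print(wide)

# The ChainLadder call from the answer, run only if the package is installed.
if (requireNamespace("ChainLadder", quietly = TRUE)) {
  tri <- ChainLadder::as.triangle(triangle.csv,
                                  origin = "année", dev = "dev", value = "montant")
  print(tri)
}
```

If the package is attached with library(ChainLadder), the ChainLadder:: prefix can be dropped.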
Whenever I want to lag in a data frame I realize that something that should be simple is not. While the problem has been asked & answered many times (see p.s.), I did not find a simple solution which I can remember until the next time I lag. In general, lagging does not seem to be a simple thing in R as the multiple workarounds testify. I run into this problem often and it would be very helpful to have some basic R solutions which do not need extra packages. Could you provide your simple solution for lagging?
If that is not possible, could you at least provide your workaround here so we can choose amongst second best alternatives? One collection already exists here
Also, in all blog posts on this subject I see people complain about how unexpectedly difficult lagging is so how can we get a simple lag function for data frames into R Core? This must be extremely disappointing for anyone coming from Stata or EViews. Or am I missing something and there is a simple built in solution?
Say we want to lag "value" by 3 "year"s for each "country" here:
Data <- data.frame(year=c(rep(2010:2015,2)),country=c(rep("AT",6),rep("DE",6)),value=rnorm(12))
to create L3 like:
year country value L3
2010 AT 0.3407 NA
2011 AT -1.7981 NA
2012 AT -0.8390 NA
2013 AT -0.6888 0.3407
2014 AT -1.1019 -1.7981
2015 AT -0.8953 -0.8390
2010 DE 0.5877 NA
2011 DE -1.0204 NA
2012 DE -0.6576 NA
2013 DE 0.6620 0.5877
2014 DE 0.9579 -1.0204
2015 DE -0.7774 -0.6576
And we neither want to change the nature of our data (to ts or data table) nor do we want to immerse ourselves in three new packages when the deadline is tonight and our supervisor uses Stata and thinks lagging is easy ;-) (it's not, I just want to be prepared...)
p.s.:
without groups
with data.table: Lag in dataframe or How to create a lag variable within each group?
time series are straightforward
If the question is how to provide a column with the prior third year's value not using packages then try this:
prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))
transform(Data, prior_year_value = ave(value, country, FUN = prior_year3))
giving:
year country value prior_year_value
1 2010 AT -1.66562121 NA
2 2011 AT -0.04950063 NA
3 2012 AT 1.55930293 NA
4 2013 AT -0.40462394 -1.66562121
5 2014 AT 0.78602610 -0.04950063
6 2015 AT 0.73912916 1.55930293
7 2010 DE 1.03710539 NA
8 2011 DE -1.13370942 NA
9 2012 DE -1.20530981 NA
10 2013 DE 1.66870572 1.03710539
11 2014 DE 1.53615793 -1.13370942
12 2015 DE -0.09693335 -1.20530981
That said, to use R effectively you do need to learn how to use the key packages.
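To make the base-R approach above reproducible, here is a small sketch that uses the fixed values from the question's example table instead of rnorm. The helper name lag_k is hypothetical. The approach assumes one row per year with no gaps, so the data is sorted by country and year first; with missing years a positional shift like this would pick up the wrong rows.

```r
# Values copied from the example table in the question.
Data <- data.frame(
  year    = rep(2010:2015, 2),
  country = rep(c("AT", "DE"), each = 6),
  value   = c(0.3407, -1.7981, -0.8390, -0.6888, -1.1019, -0.8953,
              0.5877, -1.0204, -0.6576, 0.6620, 0.9579, -0.7774)
)

# Positional lag: prepend k NAs, then keep the first length(x) elements.
lag_k <- function(x, k = 3) head(c(rep(NA, k), x), length(x))

# Sort so that "k rows back" really means "k years back" within each country.
Data <- Data[order(Data$country, Data$year), ]

# ave() applies the helper separately within each country.
Data$L3 <- ave(Data$value, Data$country, FUN = lag_k)
print(Data)
```

The first three rows of each country get NA, and the 2013 row of each country receives that country's 2010 value.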
Try slide from the DataCombine package; it's simple:
slide(Data,Var='value',GroupVar = 'country',slideBy=-3)
I have code like this:
today<-as.Date(Sys.Date())
spec<-as.Date(today-c(1:1000))
df<-data.frame(spec)
stage.dates<-as.Date(c('2015-05-31','2015-06-07','2015-07-01','2015-08-23','2015-09-15','2015-10-15','2015-11-03'))
stage.vals<-c(1:8)
stagedf<-data.frame(stage.dates,stage.vals)
df['IsMonthInStage']<-ifelse(format(df$spec,'%m')==(format(stagedf$stage.dates,'%m')),stagedf$stage.vals,0)
This is producing the incorrect output, i.e.
df.spec, df.IsMonthInStage
2013-05-01, 0
2013-05-02, 1
2013-05-03, 0
....
2013-05-10, 1
It seems to be looping around, so stage.dates is 8 long, and it is repeating the 'TRUE' match every 8th. How do I fix this so that it would flag 1 for the whole month that it is in stage vals?
Or for bonus reputation - how do I set it up so that between different stage.dates, it will populate 1, 2, 3, etc of the most recent stage?
For example:
31st of May to 7th of June would be populated 1, 7th of June to 1st of July would be populated 2, etc, 3rd of November to 30th of May would be populated 8?
Thanks
Edit:
I appreciate the latter is functionally different to the former question. I am ultimately trying to arrive at both (for different reasons), so all answers appreciated
See if this works. The mutate, ldply and arrange functions used below come from the plyr package, so load it first with library(plyr).
Cut and split your data based on stage.dates, treating them as your buckets. You don't need stage.vals here, by the way.
Cut And Split
data<-split(df, cut(df$spec, stagedf$stage.dates, include.lowest=TRUE))
This should give you a list of data frames split according to stage.dates.
Now mutate your data with the index; this is what your stage.vals were going to be.
Mutate
data<-lapply(seq_along(data), function(index) {mutate(data[[index]],
IsMonthInStage=index)})
Now join the data frames in the list using ldply.
Join
data<-ldply(data)
This will, however, give out-of-order dates, which you can sort with arrange.
Sort
data<-arrange(data,spec)
Final Output
data[1:10,]
spec IsMonthInStage
1 2015-05-31 1
2 2015-06-01 1
3 2015-06-02 1
4 2015-06-03 1
5 2015-06-04 1
6 2015-06-05 1
7 2015-06-06 1
8 2015-06-07 2
9 2015-06-08 2
10 2015-06-09 2
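For comparison, both variants from the question can also be sketched in base R without splitting the data. This is an alternative to the plyr steps above, not the answerer's method. The handful of test dates is hypothetical, and the column name Stage is made up for the example. The recycling seen in the question comes from comparing two vectors of different lengths with ==; %in% tests membership instead. findInterval returns the position of the most recent stage date, with 0 meaning "before the first stage date".

```r
stage.dates <- as.Date(c("2015-05-31", "2015-06-07", "2015-07-01", "2015-08-23",
                         "2015-09-15", "2015-10-15", "2015-11-03"))

# Hypothetical dates chosen to hit the interesting boundaries.
df <- data.frame(spec = as.Date(c("2015-04-30", "2015-05-30", "2015-05-31",
                                  "2015-06-06", "2015-06-07", "2015-11-03",
                                  "2015-12-25")))

# Variant 1: flag every date whose month matches the month of any stage date.
# Like the original code, this compares months only and ignores the year.
df$IsMonthInStage <- as.integer(format(df$spec, "%m") %in% format(stage.dates, "%m"))

# Variant 2: index of the most recent stage date on or before each date.
# stage.dates must be sorted in increasing order for findInterval.
df$Stage <- findInterval(as.numeric(df$spec), as.numeric(stage.dates))

print(df)
```

Dates after the last stage date keep the last index (7 here), so wrapping around to a further stage would need an extra rule of your own.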
I am having a tough time understanding how to formulate code for a cutting stock problem. I have searched the web extensively and I see a lot of theory but no actual examples.
The majority of query results point to the wikipedia page: http://en.wikipedia.org/wiki/Cutting_stock_problem
13 patterns to be produced, with required amounts indicated alongside.
By default the machine produces a piece of width 5600, which is to be cut into the widths below. The goal is to minimize waste.
Widths/Required amount
1380 22
1520 25
1560 12
1710 14
1820 18
1880 18
1930 20
2000 10
2050 12
2100 14
2140 16
2150 18
2200 20
Would someone show me how to formulate this solution in R with lpSolve/lpSolveAPI?
stock=5600
widths=c(1380,1520,1560,1710,1820,1880,1930,2000,2050,2100,2140,2150,2200)
required=c(22,25,12,14,18,18,20,10,12,14,16,18,20)
library(lpSolveAPI)
...
solve(lprec)
get.variables(lprec)
You could model it as a mixed integer program (MIP) and solve it using various techniques. Of course, to generate the variables (i.e. the valid patterns of widths) you need to use a suitable column generation method.
Have a look at this C++ project: https://code.google.com/p/cspsol
cspsol is based on GLPK API library, uses column generation and branch & bound to solve the MIP. It may give you some hints about how to do it in R.
Good luck !
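For an instance this small, one possible sketch in R avoids column generation altogether and simply enumerates the cutting patterns up front. This is a swapped-in simplification, not what cspsol does. The helper name enumerate_patterns is hypothetical. Only patterns whose leftover is smaller than the narrowest width are kept, which is enough when demand is modelled as "at least the required amount". The model minimizes the number of stock pieces used, the usual stand-in for minimizing waste. The lpSolveAPI part is guarded so the snippet runs even if the package is missing; lp_solve's branch and bound may take a while on this model.

```r
stock    <- 5600
widths   <- c(1380, 1520, 1560, 1710, 1820, 1880, 1930, 2000, 2050, 2100, 2140, 2150, 2200)
required <- c(22, 25, 12, 14, 18, 18, 20, 10, 12, 14, 16, 18, 20)

# Enumerate all patterns (counts per width) that fit into one stock piece
# and leave less than the narrowest width unused.
enumerate_patterns <- function(widths, stock) {
  patterns <- list()
  recurse <- function(i, counts, remaining) {
    if (i > length(widths)) {
      if (remaining < min(widths)) patterns[[length(patterns) + 1]] <<- counts
      return(invisible(NULL))
    }
    for (k in 0:(remaining %/% widths[i])) {
      counts[i] <- k
      recurse(i + 1, counts, remaining - k * widths[i])
    }
  }
  recurse(1, integer(length(widths)), stock)
  do.call(cbind, patterns)  # one column per pattern, one row per width
}

A    <- enumerate_patterns(widths, stock)
used <- as.vector(widths %*% A)  # width consumed by each pattern

if (requireNamespace("lpSolveAPI", quietly = TRUE)) {
  library(lpSolveAPI)
  lprec <- make.lp(nrow = length(widths), ncol = ncol(A))
  for (j in seq_len(ncol(A))) set.column(lprec, j, A[, j])
  set.objfn(lprec, rep(1, ncol(A)))                  # minimize pieces of stock used
  set.constr.type(lprec, rep(">=", length(widths)))  # meet or exceed each demand
  set.rhs(lprec, required)
  set.type(lprec, seq_len(ncol(A)), "integer")
  status <- solve(lprec)                             # 0 means an optimal solution was found
  if (status == 0) {
    x <- get.variables(lprec)
    cat("stock pieces used:", get.objective(lprec), "\n")
    print(A[, x > 0, drop = FALSE])                  # patterns that are actually cut
    print(x[x > 0])                                  # how often each one is cut
  }
}
```

For larger instances the number of patterns grows quickly, which is where the column generation approach mentioned above becomes necessary.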
I'm running into a problem when trying to use the readHTMLTable function in the R package XML. When running
library(XML)
baseurl <- "http://www.pro-football-reference.com/teams/"
team <- "nwe"
year <- 2011
theurl <- paste(baseurl,team,"/",year,".htm",sep="")
readurl <- getURL(theurl)
readtable <- readHTMLTable(readurl)
I get the error message:
Error in names(ans) = header :
'names' attribute [27] must be the same length as the vector [21]
I'm running 64 bit R 2.15.1 through R Studio 0.96.330. It seems there are several other questions that have been asked about the readHTMLTable() function, but none addressed this specific question. Does anyone know what's going on?
When readHTMLTable() complains about the 'names' attribute, it's a good bet that it's having trouble matching the data with what it's parsed for header values. The simplest way around this is to simply turn off header parsing entirely:
table.list <- readHTMLTable(theurl, header=FALSE)
Note that I changed the name of the return value from "readtable" to "table.list". (I also skipped the getURL() call since 1. it didn't work for me and 2. readHTMLTable() knows how to handle URLs). The reason for the change is that, without further direction, readHTMLTable() will hunt down and parse every HTML table it can find on the given page, returning a list containing a data.frame for each.
The page you have sent it after is fairly rich, with 8 separate tables:
> length(table.list)
[1] 8
If you are only interested in a single table on the page, you can use the which argument to specify it and receive its contents as a data.frame directly.
This could also cure your original problem if it had choked on a table you're not interested in. Many pages still use tables for navigation, search boxes, etc., so it's worth taking a look at the page first.
But this is unlikely to be the case in your example since it actually choked on all but one of them. In the unlikely event that the stars aligned and you were only interested in the successfully-parsed third table on the page (passing statistics) you could grab it like this, keeping header parsing on:
> passing.df = readHTMLTable(theurl, which=3)
> print(passing.df)
No. Age Pos G GS QBrec Cmp Att Cmp% Yds TD TD% Int Int% Lng Y/A AY/A Y/C Y/G Rate Sk Yds NY/A ANY/A Sk% 4QC GWD
1 12 Tom Brady* 34 QB 16 16 13-3-0 401 611 65.6 5235 39 6.4 12 2.0 99 8.6 9.0 13.1 327.2 105.6 32 173 7.9 8.2 5.0 2 3
2 8 Brian Hoyer 26 3 0 1 1 100.0 22 0 0.0 0 0.0 22 22.0 22.0 22.0 7.3 118.7 0 0 22.0 22.0 0.0
I'm looking for a mathematical ranking formula.
Sample is
2008 2009 2010
A 5 6 4
B 6 7 5
C 7 8 2
I want to add a rank column for each period field
rank
2008 2009 2010 2008 2009 2010
B 6 7 5 2 1 1
A 5 6 4 3 2 2
C 7 2 2 1 3 3
Please do not reply with methods that loop through the rows and columns, incrementing the rank value as they go; that's easy. I'm looking for a formula much like finding the percent of total (item / total). I know I've seen this before but am having a tough time locating it.
Thanks in advance!
sort ((letters_col, number_col) descending by number_col)
As efficient as your sort alg.
Then number the rows, of course
Edit
I really got upset by your comment "please don't up vote this answer, sorting and loop is not what I'm asking for. i specifically stated this in my original question. " , and the negative votes, because, as you may have noted by the various answers received, it's basically correct.
However, I kept pondering where and how you may "have seen this before".
Well, I think I got the answer: You saw this in Excel.
Look at this:
This is the result after entering the formulas and sorting by column H.
It's exactly what you want ...
What are you using? If you're using Excel, you're looking for RANK(num, ref).
=RANK(B2,B$2:B$9)
I don't know of any programming language that has that built in; it would always require a loop of some form.
If you want the rank of a single element, you can do it in O(n) by looping through the elements, counting how many have value above the given element, and adding 1.
If you want the rank of all the elements, the best (and really only) way is to sort the elements. Anything else you do will be equivalent to sorting (there is no "formula")
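Both ideas above can be sketched in R on the sample from the question (A, B, C with the 2008 to 2010 values as first posted). The helper name rank_by_count is hypothetical. It applies the counting rule literally: rank = 1 + number of values that are larger. The second version uses base R's rank, which sorts internally; negating the values makes the largest value rank 1. Neither avoids iteration; it is just hidden inside vectorized calls.

```r
scores <- data.frame(c(5, 6, 7), c(6, 7, 8), c(4, 5, 2),
                     row.names = c("A", "B", "C"))
names(scores) <- c("2008", "2009", "2010")

# Counting rule: an element's rank is one more than the number of larger values.
rank_by_count <- function(x) sapply(x, function(v) 1 + sum(x > v))

# Apply per period (column); ties would share the same rank.
ranks_count <- sapply(scores, rank_by_count)
rownames(ranks_count) <- rownames(scores)

# Sort-based equivalent using base R's rank().
ranks_sort <- sapply(scores, function(x) rank(-x, ties.method = "min"))
rownames(ranks_sort) <- rownames(scores)

print(ranks_count)
```

The counting version is O(n^2) per column, while the sort-based one is O(n log n), which matches the point made above about sorting being the practical way to rank everything at once.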
Are you using T-SQL? T-SQL RANK() may pull what you want.