I think I am mostly there, but I can't figure out the remaining piece of how to code this properly.
I start with a single column of 15 values. I want to create two new columns: 'previous', containing the average of the previous two values, and 'future', containing the average of the next two values.
My code is failing because it is INCLUSIVE of the current row's values.
For example, row 3 ('30') should have a 'previous' value of 15 ((10+20)/2) and a 'future' value of 45 ((40+50)/2). Instead, it returns 25 and 35, because it includes the 30 together with the 20 or the 40 when computing the averages.
I am also stuck on how to simply get the previous value.
Would anyone mind telling me how to avoid this problem?
I am using filter, but I don't know if there is a better way to do this.
values=data.frame(col=c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150))
values$previous2 = filter(values$col, rep(1/2,2),sides=1)
values$future2 = filter(values$col, rep(1/2,2),sides=2)
# values$last = ???  # should be the previous value, e.g. the 2nd row should be 10
values
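A minimal sketch of the intended result, written in Python with plain lists rather than in R; the function name neighbour_means is hypothetical. It averages the k values strictly before and strictly after each position, so the current row is never included, and leaves the ends empty (None, the counterpart of R's NA). In R, one possible route is to shift the existing filter() output by one row: lagging the sides = 1 result gives the mean of the two previous values, and leading the sides = 2 result gives the mean of the two next values.

```python
def neighbour_means(values, k=2):
    """Mean of the k values before and of the k values after each position.

    The current value is never included. Positions without k full
    neighbours on a side get None (the counterpart of R's NA).
    k must be at least 1.
    """
    n = len(values)
    previous, future = [], []
    for i in range(n):
        before = values[i - k:i] if i >= k else None
        after = values[i + 1:i + 1 + k] if i + k < n else None
        previous.append(sum(before) / k if before is not None else None)
        future.append(sum(after) / k if after is not None else None)
    return previous, future


col = [10 * i for i in range(1, 16)]  # 10, 20, ..., 150
previous2, future2 = neighbour_means(col)
print(previous2[2], future2[2])  # row 3 (the 30): 15.0 45.0
```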
To get the previous (last) value, try:
values$last = c(NA,values[-nrow(values),1])
Alternatively, I believe the lag() function (for example dplyr::lag) could be used as well.
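The same one-row shift as a small Python sketch (the helper name lag is hypothetical, chosen to echo dplyr::lag): drop the last element and pad the front with a missing value, which is what c(NA, values[-nrow(values), 1]) does above.

```python
def lag(values, n=1, fill=None):
    """Shift values down by n positions; the first n slots get fill.

    n <= 0 returns an unchanged copy.
    """
    if n <= 0:
        return list(values)
    return [fill] * min(n, len(values)) + list(values[:-n])


col = [10 * i for i in range(1, 16)]  # 10, 20, ..., 150
last = lag(col)
print(last[:3])  # [None, 10, 20]
```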
cv.uk.df$new.d[2:nrow(cv.uk.df)] <- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1) # this line of code works
I wanted to know why we use -1 in both tail() and head() to create this new column.
I tried to understand it by removing the -1, and R (I am running the code in RStudio) throws this error.
Could anyone shed some light on this? I would really appreciate it.
Look at what is being done. On the left-hand side of the assignment operator, we have:
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
Let's pick this apart.
cv.uk.df # This is the data.frame
$new.d # a new column to assign or a column to reassign
[2:nrow(cv.uk.df)] # the rows which we are going to assign
Specifically, this line of code will assign a new value to all rows of this column except the first. Why would we want to do that? We don't have your data, but from your example, it looks like you want to calculate the change from one row to the next. That calculation is invalid for the first row (there is no previous row).
Now let's look at the right-hand side.
<- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)
The cv.uk.df$deaths column has the same number of rows as the data.frame. R gets grouchy when the numbers of elements don't follow certain rules. For data.frames, the right-hand side needs to have the same number of elements, or a number that can be recycled a whole number of times. For example, if you have 10 rows, you need a replacement of 10 values. Or you can have 5 values that R will recycle.
If your data.frame has 100 rows, only 99 are being replaced in this operation. You cannot feed 100 values into an operation that expects 99, so we need to trim the data. Let's look at what is happening. The tail() function has the usage tail(x, n), where it returns the last n values of x. If n is a negative integer, tail() returns all values except the first |n|. The head() function works similarly, from the other end.
tail(cv.uk.df$deaths, -1) # This returns all values but the first
head(cv.uk.df$deaths, -1) # This returns all values but the last
This makes sense for your calculation. You cannot subtract the number of deaths in the row before the first row from the number in the first row, nor can you subtract the number of deaths in the last row from the number in the row after the last row. There are more intuitive ways to do this using functions from other packages, but this gets the job done.
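To make the head()/tail() arithmetic concrete, here is a small Python sketch that mimics R's behaviour for negative n and builds the one-element-shorter vector of differences. The names r_head and r_tail and the deaths numbers are made up for illustration; a newly created R column would hold NA in the first row, which the sketch represents as None.

```python
def r_head(x, n=6):
    # n >= 0: first n items; n < 0: all but the last |n| items.
    return x[:n]


def r_tail(x, n=6):
    # n >= 0: last n items; n < 0: all but the first |n| items.
    return x[max(len(x) - n, 0):] if n >= 0 else x[-n:]


deaths = [0, 2, 5, 9, 9, 14]  # made-up cumulative counts, 6 rows
# One change per consecutive pair: 5 values for 6 rows.
changes = [a - b for a, b in zip(r_tail(deaths, -1), r_head(deaths, -1))]
# The first row has no previous row, so it stays empty.
new_d = [None] + changes
print(new_d)  # [None, 2, 3, 4, 0, 5]
```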
I want to create a continuous futures series, that is, to eliminate the gap between two consecutive contract series.
First, I want to download all the individual contracts from the beginning until now. The syntax is always the same:
Quandl("CME/INSTRUMENT_MONTHCODE_YEAR")
1. INSTRUMENT is GC (gold) in this case
2. MONTHCODE is G, J, M, Q, V or Z
3. YEAR is from 1975 to 2017 (the current contract)
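Since the codes follow a fixed pattern, the full list can be generated rather than typed. A minimal Python sketch, assuming every year from 1975 to 2017 has all six month codes (some of these contracts may not actually exist in the database, so failed downloads would still need handling); contract_codes is a hypothetical name. In R, each resulting string would then be passed to Quandl().

```python
MONTH_CODES = "GJMQVZ"  # the gold month codes listed above


def contract_codes(instrument="GC", first_year=1975, last_year=2017):
    """Build 'CME/<INSTRUMENT><MONTHCODE><YEAR>' strings in chronological order."""
    return [
        f"CME/{instrument}{month}{year}"
        for year in range(first_year, last_year + 1)
        for month in MONTH_CODES
    ]


codes = contract_codes()
print(len(codes), codes[0], codes[1], codes[-1])
# 258 CME/GCG1975 CME/GCJ1975 CME/GCZ2017
```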
With the data, I start from the oldest contract, in this case "CME/GCG1975", together with the next contract, "CME/GCJ1975". Then I look at the last 6 values of the first contract, GCG1975 (with order="asc", these are the most recent dates):
require(Quandl)
GCG1975 = Quandl("CME/GCG1975",order="asc", type="raw")
tail(GCG1975,6)
order can be "asc" or "desc" (ascending or descending), and type can be "raw" (data frame), "ts", "xts" or "zoo".
And it outputs:
Image: quandl-1.png = Last values of GCG1975
Then I only want the 6th row counting from the end, and I want to drop the columns "Last" and "Change" (this could be done before processing each individual contract):
Image: quandl-2.png = Last 6th value GCG1975
Then I want to find the row dated 1975-02-18 (the 6th-to-last row of GCG1975) in the next contract (GCJ1975):
Image: quandl-3.png = 1975-02-18 on GCJ1975
Then I compute the difference between the "Settle" of the G contract and the "Settle" of the J contract.
Difference_contract = 183.6 - 185.4
Difference_contract = -1.8
So the next (J) contract is 1.8 points above the previous contract, which means we have to add -1.8 to all the following values of the J contract (Open, High, Low, Settle), including the row 1975-02-18. This:
Image: quandl-4.png = Differences between contracts
And then we have a continuous series like this:
Image: quandl-5.png = Continuous series
All these differences and additions are repeated, contract by contract, from the oldest contract up to the current one, to make the series continuous.
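A minimal sketch of the adjustment described above, in Python with plain dicts instead of data frames. Assumptions not taken from the question: each contract is a list of rows with ISO date strings sorted ascending, the roll date always exists in the next contract, and the continuous series keeps the earlier rows strictly before the roll date plus the shifted later contract from the roll date onward. The names stitch, continuous_series and bar are hypothetical, and the example uses offset=2 only to keep the toy data short.

```python
PRICE_COLS = ("Open", "High", "Low", "Settle")  # columns that get shifted


def stitch(series, next_contract, offset=6):
    """Append next_contract to series, shifted so both agree on the roll date.

    The roll date is the offset-th row from the end of series; the next
    contract is assumed to contain a row with that same date.
    """
    roll = series[-offset]
    match = next(r for r in next_contract if r["Date"] == roll["Date"])
    diff = roll["Settle"] - match["Settle"]  # e.g. 183.6 - 185.4 = -1.8
    kept = [r for r in series if r["Date"] < roll["Date"]]
    shifted = []
    for r in next_contract:
        if r["Date"] >= roll["Date"]:
            adjusted = dict(r)
            for col in PRICE_COLS:
                adjusted[col] = r[col] + diff
            shifted.append(adjusted)
    return kept + shifted


def continuous_series(contracts, offset=6):
    """Chain contracts (oldest first), each shifted onto the running series."""
    series = list(contracts[0])
    for nxt in contracts[1:]:
        series = stitch(series, nxt, offset)
    return series


def bar(date, price):
    # Hypothetical helper: one row with identical O/H/L/Settle, for brevity.
    return {"Date": date, "Open": price, "High": price, "Low": price, "Settle": price}


g = [bar("1975-02-14", 182.0), bar("1975-02-18", 183.6), bar("1975-02-19", 184.0)]
j = [bar("1975-02-14", 184.0), bar("1975-02-18", 185.4),
     bar("1975-02-19", 186.0), bar("1975-02-20", 187.0)]
out = continuous_series([g, j], offset=2)
print([(r["Date"], round(r["Settle"], 1)) for r in out])
# [('1975-02-14', 182.0), ('1975-02-18', 183.6), ('1975-02-19', 184.2), ('1975-02-20', 185.2)]
```

Because each new contract is compared with the already adjusted series, the differences accumulate along the chain, which is what keeps the whole series continuous.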
I don't think I can post the images here, because I don't have 10 reputation points and can only post 2 image links.
Any guidance would help; if you have any questions, just ask.
Thanks, and I hope everything is well.
RTA
Edit: I have uploaded the images to my Dropbox, so please look there; Stack Overflow doesn't allow more than 2 links without 10 reputation points.
Dropbox file
I'm a total beginner with Choco Solver. I want to make a simple shift scheduler.
I have set up integer variables like this:
IntVar day1 = model.intVar("day1", new int[] {0,1,2,3,4,5});
where 0, 1, ..., 5 are reference IDs of employees.
I have a total of 30 variables (one for every day of the month), since this is a monthly shift schedule.
I have set up constraints that, for example, do not allow an employee to be on shift two days in a row.
My question is:
how can I set up a constraint such that each employee has a minimum of 5 shifts, i.e. each value in the domain appears at least 5 times across all 30 variables?
Thank you!
There are several ways of doing this. Have a look at model.globalCardinality and model.count; these constraints count the number of times a value is used by a set of variables.
http://choco-solver.org/apidocs/org/chocosolver/solver/constraints/IConstraintFactory.html
For instance, model.count(3, vars, model.intVar(5,10)).post(); means that between 5 and 10 variables in vars should be equal to 3, so employee 3 should do between 5 and 10 shifts.
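Note that with 6 employees and 30 days, a minimum of 5 shifts each already adds up to 30, so this requirement actually forces exactly 5 shifts per employee; in Choco the count constraint would be posted once for each employee ID, 0 to 5. The Python sketch below does not use Choco at all: it only checks, on a given assignment, the property those count constraints enforce, plus the no-two-days-in-a-row rule from the question. The function names and the round-robin schedule are hypothetical.

```python
from collections import Counter


def meets_min_shifts(schedule, employees, min_shifts):
    """True if every employee ID appears at least min_shifts times.

    This is the property that a per-employee count constraint with lower
    bound min_shifts enforces; here it is only checked, not searched for.
    """
    counts = Counter(schedule)
    return all(counts[e] >= min_shifts for e in employees)


def no_back_to_back(schedule):
    """True if nobody is on shift two consecutive days."""
    return all(a != b for a, b in zip(schedule, schedule[1:]))


employees = range(6)                         # IDs 0..5, as in the question
schedule = [day % 6 for day in range(30)]    # round-robin, one ID per day
print(meets_min_shifts(schedule, employees, 5), no_back_to_back(schedule))  # True True
```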
I would like to create a function that looks at a column of values, takes each value individually, and assesses which of the other values is closest to it.
I'm guessing it could be done by checking the length of the data frame and making a list of the same length in steps of 1, then using that list to reference which cell is being compared against the rest of the column. However, I don't know how to implement that.
For example:
data:
20
17
29
33
1) is closest to 2)
2) is closest to 1)
3) is closest to 4)
4) is closest to 3)
I found this example, which tests for similarity, but I'd like to know which value it is matched to.
x=c(1:100)
your.number=5.43
which(abs(x-your.number)==min(abs(x-your.number)))
Also, if you know how I could do this, could you explain the parts of the code and what they mean?
I wrote a quick function that does the same thing as the code you provided.
The code you provided takes the absolute value of the difference between your number and each value in the vector, and compares it to the minimum of those differences. That is essentially what the which.min function I use below does, except that which.min returns only the first index when there are ties, whereas your which() version returns all of them. I go through my steps below. Hope this helps.
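A small illustration of that tie difference, as a Python sketch (indices are 0-based here, whereas R's are 1-based; the function names are hypothetical):

```python
def all_closest(x, num):
    """All indices with minimal distance to num, like which(abs(x-num) == min(abs(x-num)))."""
    dists = [abs(v - num) for v in x]
    best = min(dists)
    return [i for i, d in enumerate(dists) if d == best]


def first_closest(x, num):
    """Only the first such index, like which.min(abs(x - num))."""
    dists = [abs(v - num) for v in x]
    return dists.index(min(dists))


x = [1, 2, 3, 4]
print(all_closest(x, 2.5), first_closest(x, 2.5))  # [1, 2] 1
```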
Make up some data
a = 1:100
yourNumber = 6
Where Num is your number, and x is a vector
getClosest=function(x, Num){
return(which.min(abs(x-Num)))
}
Then, if you run this command, it returns the index of the element of the vector that is closest to your specified number.
getClosest(x=a, Num=yourNumber)
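For the original problem, where every value is compared with the other values in the same column, the value itself has to be excluded; otherwise it is always its own closest match (distance 0). A minimal Python sketch (closest_other is a hypothetical name, ties go to the earliest position); in R the same idea could be written as which.min(replace(abs(x - x[i]), i, Inf)) for each i.

```python
def closest_other(values):
    """For each position i, the position j != i whose value is nearest values[i].

    Ties go to the earliest position. Requires at least two values.
    """
    result = []
    for i, v in enumerate(values):
        candidates = [j for j in range(len(values)) if j != i]
        result.append(min(candidates, key=lambda j: abs(values[j] - v)))
    return result


data = [20, 17, 29, 33]
# Add 1 to get R-style row numbers, matching the example in the question.
print([j + 1 for j in closest_other(data)])  # [2, 1, 4, 3]
```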
I am wanting to sort my data but the standard Excel "A to Z" sort function isn't cutting it. I was hoping someone knew how to make a custom sort that could suit my needs. Here is a sample:
chrPos count
chr1_10000598 10
chr1_10000647 10
chr1_10001370 30
chr1_10001390 30
chr1_10001392 30
chr1_10001414 30
chr1_10001418 30
chr1_10001473 10
chr1_10001505 10
chr1_10001516 20
chr1_1000156 30
As you can see, the last row is out of place when using the built-in sort function; it should be the first row, not the last. I think adding a second level of sorting would do the trick, but that level would have to sort in ascending order by the number following the underscore.
Any ideas? Would this possibly be easier with R instead?
Edit to add details from comments:
Sorting is to be ascending on the numeric part after the underscore, within ascending on the chr numeric part (running from 1 to 22 both inclusive) and then chrM_, chrX_ and chrY_ in that order (also with their numeric parts sorted ascending).
The numeric part after the underscore may be up to 8 digits.
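Outside Excel, the requested order can be expressed as a two-part sort key: a rank for the chromosome (1 to 22, then M, X, Y) and the position as a number. A minimal Python sketch; the regular expression assumes every label looks like chr<1-22|M|X|Y>_<digits>, and the names CHROM_ORDER and chr_pos_key are hypothetical.

```python
import re

CHROM_ORDER = {str(n): n for n in range(1, 23)}
CHROM_ORDER.update({"M": 23, "X": 24, "Y": 25})  # after chr22, in the requested order


def chr_pos_key(label):
    """Sort key for labels like 'chr1_10000598': (chromosome rank, position)."""
    m = re.fullmatch(r"chr([0-9]+|[MXY])_([0-9]+)", label)
    if m is None or m.group(1) not in CHROM_ORDER:
        raise ValueError(f"unexpected label: {label!r}")
    return CHROM_ORDER[m.group(1)], int(m.group(2))


labels = ["chr1_10000598", "chrX_5", "chr2_7", "chr1_1000156", "chrM_3", "chr10_1"]
print(sorted(labels, key=chr_pos_key))
# ['chr1_1000156', 'chr1_10000598', 'chr2_7', 'chr10_1', 'chrM_3', 'chrX_5']
```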
Assuming chrPos is in column A, please try this in a helper column:
=IF(FIND("_",A1)=5,CHAR(64+MID(A1,4,1)),CHAR(64+MID(A1,4,2)))&REPT("0",8-LEN(A1)+FIND("_",A1))&MID(A1,FIND("_",A1)+1,8)
OR, for the additional requirements mentioned in the comments:
=IF(MID(A1,4,1)="M","W",IF(MID(A1,4,1)="X","X",IF(MID(A1,4,1)="Y","Y",IF(FIND("_",A1)=5,CHAR(64+MID(A1,4,1)),CHAR(64+MID(A1,4,2))))))&REPT("0",9-LEN(A1)+FIND("_",A1))&MID(A1,FIND("_",A1)+1,9)
Then select the helper column, Copy, Paste Special, Values over the top, and use that column for sorting.
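To see why the second formula works, here is the same key built in Python (helper_key is a hypothetical name): the chromosome becomes one letter whose alphabetical order matches the required order (1 to 22 become A to V via CHAR(64 + n), then M, X, Y become W, X, Y), and zero-padding the position to a fixed width makes text order agree with numeric order. Like the formula, it assumes positions of at most 9 digits.

```python
def helper_key(label):
    """Python mirror of the second helper-column formula."""
    chrom, pos = label[3:].split("_", 1)
    if chrom == "M":
        letter = "W"
    elif chrom in ("X", "Y"):
        letter = chrom
    else:
        letter = chr(64 + int(chrom))  # same as CHAR(64 + n) in Excel
    return letter + pos.zfill(9)       # same as the REPT("0", ...) padding


labels = ["chr1_10000598", "chrX_5", "chr2_7", "chr1_1000156", "chrM_3", "chr10_1"]
print(helper_key("chr1_1000156"))  # A001000156
print(sorted(labels, key=helper_key))
# ['chr1_1000156', 'chr1_10000598', 'chr2_7', 'chr10_1', 'chrM_3', 'chrX_5']
```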