I am trying to carry out the following operation in R.
I have several series of data, for example:
series 1: 75, 56, 100, 23, 38, 40
series 2: 60, 18, 86, 100, 44
I would like to join these series. To do so, I have to multiply series 1 by 1.5 so that the last value of series 1 (40) matches the first value of series 2 (60), since 40 * 1.5 = 60.
In the same way, I would like to match many different series, but the other series will need different multipliers. For example, with Series 1: ...20 and Series 2: 80..., I would have to multiply by 4.
How can I carry out such an operation to many series in many data frames?
Thanks in advance,
Given two vectors x and y, the function f(x,y) below will convert x the way you desire.
f <- function(x,y) x*(y[1]/x[length(x)])
Usage:
x = c(75,56,100,23,38,40)
y = c(60,18,86,100,44)
f(x,y)
Output:
[1] 112.5 84.0 150.0 34.5 57.0 60.0
However, how this approach gets applied to "many series in many data frames" depends on the actual structure you have, and what type of output you want.
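As one possible reading of "many series", here is a minimal sketch that assumes the series are held in an ordered list of numeric vectors. The helper name link_series and the third series s3 are hypothetical and only serve as illustration. It works backwards from the last series, so each series is rescaled to meet the already rescaled series that follows it.

```r
# Scale x so that its last value equals the first value of y (as in the answer above).
f <- function(x, y) x * (y[1] / x[length(x)])

# Hypothetical helper: link an ordered list of series end to start.
# The last series is left unchanged; every earlier one is rescaled.
link_series <- function(series) {
  n <- length(series)
  if (n < 2) return(series)
  for (i in seq(n - 1, 1)) {
    series[[i]] <- f(series[[i]], series[[i + 1]])
  }
  series
}

series <- list(
  s1 = c(75, 56, 100, 23, 38, 40),
  s2 = c(60, 18, 86, 100, 44),
  s3 = c(11, 22, 33)  # made-up third series, for illustration only
)
linked <- link_series(series)
```

If the series live in data frame columns instead, the same helper could be applied to as.list(df) or to a list built from the relevant columns, depending on the layout.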
Say I have three individuals and I know the payment they require to enter different amounts of land into a scheme. I want to know how much land each participant would enter into the scheme for a given payment rate. I want them to enter the max amount they are willing for that payment rate. Previously I did this with a long ifelse statement, but that will not run inside a loop, so I'm looking for an alternative.
In this example, I've excluded a load of areas so it just presents as if participants can enter 50, 49 or 1 unit(s) of area.
paym_sh1a=200
paym_area_50 <- c(250, 150, 210)
paym_area_49 <- c(240, 130, 190)
paym_area_1 <- c(100, 20, 90)
area_enrolled<-
ifelse(paym_area_50<paym_sh1a,50,ifelse(paym_area_49<paym_sh1a,49,
ifelse(paym_area_1<paym_sh1a,1,0)))
You could create a table of your values:
paym_area = rbind(paym_area_50, paym_area_49, paym_area_1)
And then use vectorised operations more effectively. In particular, since your thresholds are decreasing, you could compare the whole table to the sh1a value, and count how many rows are below it:
(sums = colSums(paym_area < paym_sh1a))
# [1] 1 3 2
This vector can be used as an index into your results:
# sums counts the thresholds met: 0 -> no area, 1 -> 1 unit, 2 -> 49 units, 3 -> 50 units
values = c(0, 1, 49, 50)
(result = values[sums + 1L])
# [1] 1 50 49
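Since the question mentions running this inside a loop, here is a minimal sketch that wraps the lookup in a function and evaluates it for several payment rates. The function name enrolled_area, the areas vector, and the list of rates are assumptions made for illustration. Unlike the counting approach, it takes the largest qualifying area directly, so it does not rely on the thresholds being ordered.

```r
paym_area_50 <- c(250, 150, 210)
paym_area_49 <- c(240, 130, 190)
paym_area_1  <- c(100, 20, 90)

# One row per area, one column per participant.
paym_area <- rbind(paym_area_50, paym_area_49, paym_area_1)
areas <- c(50, 49, 1)  # area represented by each row, in row order

# Hypothetical helper: largest area each participant accepts at a given rate,
# or 0 if no required payment is below the rate.
enrolled_area <- function(rate) {
  ok <- paym_area < rate
  apply(ok, 2, function(col) if (any(col)) max(areas[col]) else 0)
}

rates <- c(50, 150, 200, 260)          # example payment rates
by_rate <- sapply(rates, enrolled_area)  # one column per rate
```

For a rate of 200 this gives the same answer as the nested ifelse in the question.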
Right now I am trying to create a new dummy variable in a dataset from a variable that has more than two values. More specifically, my dataset has a "State" variable, and I want to make a dummy where 1 = states in the North, and 0 = all other states. Here's a portion of the dataset (it's an extremely large set, so I'll only include the essential data):
Year StateICP
1 1940 71
2 1940 21
3 1940 22
4 1940 32
5 1940 18
6 1940 22
7 1940 45
8 1940 40
9 1940 33
So what I want to do is create a new column (called "North") where, if StateICP is 21, 22, 40, or 45, the new variable equals 1, and otherwise it equals 0. Like I said, this is a very large dataset (over 1000000 observations), so I can't enter it row by row manually. I tried an ifelse function, but that only gave me errors.
I'm sure this isn't that complicated, but I am fairly new to R. I know how to create a dummy variable normally, but I am getting stuck here. Any help would be greatly appreciated! Thank you!
First, create a simple dataset to replicate what you have above:
df <- data.frame(Year = rep(1940,500), StateICP = sample(1:100, 500, TRUE))
This will create a data.frame with columns like you describe and 500 records. The StateICP values are randomly generated integers between 1 and 100. If we want to code a boolean we could simply add a new column:
df$boolean <- df$StateICP %in% c(21, 22, 40, 45)
If we want to code them specifically as 0,1 as you describe then you can use ifelse:
df$dummy <- ifelse(df$StateICP %in% c(21, 22, 40, 45), 1, 0)
Make sure you pass a vector such as df$StateICP to ifelse, since it does not accept a data argument.
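As a small variation, the logical result of %in% can be converted straight to 0/1 with as.integer, which avoids ifelse altogether. The sketch below uses the nine rows shown in the question; the name north_codes is an assumption for illustration.

```r
# The nine rows from the question.
df <- data.frame(Year = 1940,
                 StateICP = c(71, 21, 22, 32, 18, 22, 45, 40, 33))

north_codes <- c(21, 22, 40, 45)  # codes treated as "North" in the question

# TRUE/FALSE becomes 1/0.
df$North <- as.integer(df$StateICP %in% north_codes)
df$North
# [1] 0 1 1 0 0 1 1 1 0
```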
I have a program that writes to a file an unevenly spaced time series of vectors (one vector per interval) that vary in size. I'm wondering what would be the best way to format the output so that the file can be read into a list of vectors in R (assuming that is the correct data structure), and what R code I would use to read it.
For example, I imagine the output could look something like this:
1, 24, 5, 211
3, 5
59, 465, 3, 333, 9, 98
or
(1 24 5 211)
(3 5)
(59 465 3 333 9 98)
But what I'm saying is that I want to change the formatting to suit the R read function.
Use read.table with fill = TRUE, which pads the shorter rows with NA:
data = read.table(file.choose(),sep=",",fill=TRUE)
data[is.na(data)] <- "" # Replace NA values with empty strings (the affected columns become character)
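If a list of numeric vectors is what is wanted, as the question suggests, a possible alternative is to read the file line by line and parse each line separately. This is only a sketch: it assumes the comma-separated format from the question, and it writes a temporary file so that it can run on its own.

```r
# Write the example from the question to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("1, 24, 5, 211",
             "3, 5",
             "59, 465, 3, 333, 9, 98"), path)

# Read each line, then parse it into a numeric vector.
lines <- readLines(path)
vectors <- lapply(lines, function(ln) scan(text = ln, sep = ",", quiet = TRUE))

lengths(vectors)
# [1] 4 2 6
```

Each list element keeps its own length, so no padding with NA or empty strings is needed.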
I am trying to calculate a rolling sum for a time series of returns r ranging over T dates. However at each date t when I calculate the rolling sum, I want to factor in a weight w for each number in the rolling sum.
The formula would be for every date t:
[Sum from i=1 to m](w(i)*r(t-i-1))
Let's look at a very simple example. I have a return series of T=6 returns r. For each date t I want to calculate the rolling sum over the last two dates (m=2). I also want to weight the first observation twice as much as the second.
r <- c(100,110,100,110,100,110)
w <- c(1,0.5)
I know that I can easily do the rolling sum using the filter function:
filter(r, rep(1, 2))
However I am not able to include the weight factor into the rolling sum. The following line gives the wrong result of c(155, 155, 155, 155, 155, NA)
filter(r*w, rep(1, 2))
where I would really like to have the result c(155, 160, 155, 160, 155, NA)
Any help is appreciated.
Here's one way to do it:
filter(r, rev(w))
# [1] 155 160 155 160 155 NA
An important piece of information about the filter argument, from the help page ?filter:
filter
a vector of filter coefficients in reverse time order (as for AR or MA coefficients).
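To see why the weights need reversing, the sketch below compares the filter result with a hand-written weighted sum. The names manual, filtered and trailing are introduced here for illustration, and stats::filter is called explicitly in case another package masks filter. The last line shows sides = 1, which weights the current and the previous observation instead, in case a backward-looking window is what is needed.

```r
r <- c(100, 110, 100, 110, 100, 110)
w <- c(1, 0.5)

# Hand-written version: w[1] * r[t] + w[2] * r[t + 1], NA where the window runs off the end.
manual <- sapply(seq_along(r), function(t) {
  idx <- t + seq_along(w) - 1
  if (max(idx) > length(r)) NA else sum(w * r[idx])
})

# Same result with the coefficients given in reverse time order.
filtered <- as.numeric(stats::filter(r, rev(w)))

# Backward-looking variant: w[1] * r[t] + w[2] * r[t - 1].
trailing <- as.numeric(stats::filter(r, w, sides = 1))
```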
rollapply in the zoo package can do that:
library(zoo)
rollapply(r, 2, crossprod, w, fill = NA)
# [1] 155 160 155 160 155 NA
I'm trying to give my bottom graphs widths proportional to the length of each month.
However, I end up with a layout in which January is much too narrow.
I have 2 levels of graphs: one larger plot that takes up the entire first row, and 12 plots that together take up the entire second row.
For the second-row plots, I want the widths to be proportional to the length of each month, so I do:
layout(matrix(c(rep(1,12),2:13),nrow=2,byrow=T),widths=c(1,months))
months
[1] 31 28 31 30 31 30 31 31 30 31 30 31
The odd thing is that when I manually adjust the numbers in the widths vector (using 1s and 2s), the sizes do change accordingly. With the month lengths, however, that does not appear to be the case.
January is much too short. Am I missing something in my logic?
The bottom row has only 12 plots, but you provided 13 values to the widths= argument (the number 1 plus all the month values). Just use widths=months to get the result you need. The number of values passed to widths= should equal the number of columns in the layout matrix (the number of plots in the longest row), not the total number of plots.
months<-c( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
layout(matrix(c(rep(1,12),2:13),nrow=2,byrow=T),widths=months)
layout.show(n = 13)
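The month lengths can also be computed rather than typed in, and the length of widths can be checked against the number of columns in the layout matrix before calling layout. This is a base R sketch; the year 2011 and the names first_days, month_lengths and m are assumptions for illustration.

```r
# First day of each month in 2011, plus the first day of the following January.
first_days <- seq(as.Date("2011-01-01"), by = "month", length.out = 13)

# Differences between consecutive first days give the month lengths in days.
month_lengths <- as.integer(diff(first_days))

m <- matrix(c(rep(1, 12), 2:13), nrow = 2, byrow = TRUE)

# widths needs one value per column of the layout matrix.
stopifnot(length(month_lengths) == ncol(m))

# Then, with a graphics device open:
# layout(m, widths = month_lengths)
```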
The answer by Didzis Elferts is a good one. I just want to add that when you deal with time series you can use the xts package, especially since its plots are drawn with base graphics.
Here is an example:
I use apply.monthly to split my object.
I use plot.xts to plot my time series.
First, I generate some random data:
library(xts)
days <- seq.Date( as.Date("2011-01-01"), as.Date("2011-12-31") ,1)
dat <- xts(rnorm(365),days)
## Use apply.monthly to compute the month widths;
## no need to enter them by hand.
widths <- coredata(apply.monthly(dat,length))
par(bg="lightyellow", mar=c(2,2,2,0))
layout(matrix(c(rep(1,12),2:13),nrow=2,byrow=T),widths=widths*2)
mon <- months(days,abbreviate=T)
plot(dat,main = 'my year time series')
apply.monthly(dat,function(x) {
  if(unique(format((index(x)),'%m')) =='01') {# JAN
    par(mar=c(2,2,2,0)) ## special case for JAN because it contains the y axis
    plot(x,main='')
  }
  else{
    par( mar=c(2,0,2,0))
    plot(x,main='',ylab='')
  }
})
Note that the JAN panel is not smaller than the FEB one; it also contains the y axis.