R data frame partial matching returns NULL

I have the following data frame:
> dx
estyear age us_population
1 1980 0 3559857
2 1980 1 3315535
3 1981 0 3607440
4 1981 1 3436005
I split it into groups by year:
years <- c(dx[,'estyear'])
> years
[1] 1980 1980 1981 1981
dx.split <- split(dx, years)
> dx.split
$`1980`
estyear age us_population
1 1980 0 3559857
2 1980 1 3315535
$`1981`
estyear age us_population
3 1981 0 3607440
4 1981 1 3436005
Then I return the subset for a given year. When passing a string literal, the match works just fine.
dx.split$'1980'
> dx.split$'1980'
estyear age us_population
1 1980 0 3559857
2 1980 1 3315535
This is where the problem occurs: when passing a variable set to the same value, it returns NULL.
selyear <- '1980'
> selyear
[1] "1980"
dx.split$selyear
> dx.split$selyear
NULL
Hopefully this is something simple.
The reason I want to use a variable is that I intend to iterate over the data frame by year, driven by external inputs. I am open to alternate paths that get to the same result and/or are easier.
Thanks
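A minimal sketch of one way around this, assuming the data shown above: `$` always treats what follows it as a literal name, so `dx.split$selyear` looks for an element literally called "selyear". The `[[` operator evaluates its argument instead, so it accepts a variable. The data frame below is rebuilt by hand from the printed output, and `dx.1980` is just an illustrative name.

```r
# Rebuild the example data from the printed output in the question
dx <- data.frame(
  estyear       = c(1980, 1980, 1981, 1981),
  age           = c(0, 1, 0, 1),
  us_population = c(3559857, 3315535, 3607440, 3436005)
)
dx.split <- split(dx, dx$estyear)

selyear <- "1980"

# `$` looks for an element literally named "selyear", which does not exist
is.null(dx.split$selyear)

# `[[` evaluates the variable, so this returns the 1980 subset
dx.1980 <- dx.split[[selyear]]

# Iterating over every year works the same way
for (yr in names(dx.split)) {
  cat(yr, "has", nrow(dx.split[[yr]]), "rows\n")
}
```

If the list is not needed for anything else, subsetting the original data frame directly, for example `dx[dx$estyear == 1980, ]`, gets to the same result without the split.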

Related

Loop through and subtract two columns in R

I have a df that looks like this:
Year Subscribers Forecast AbsError
1 2006 23188171 0 0
2 2007 28745769 0 0
3 2008 34880964 0 0
4 2009 46373266 0 0
I have a loop that fills in the Forecast column. It should then subtract the Subscribers value from the Forecast value and put the absolute difference into the AbsError column, like so:
Year Subscribers Forecast AbsError
1 2006 23188171 9680000 13508171
2 2007 28745769 27960000 785769
3 2008 34880964 46240000 11359036
My loop looks like this:
for (i in 1:nrow(new.phone)) {
new.phone$Forecast[i] <- ((1.828e+07 )*new.phone$Year[i]) + (-3.666e+10)
new.phone$AbsError <- abs((new.phone$Subscribers[i] - new.phone$Forecast[i]))
}
Although this loop gives the correct forecast values, it gives incorrect AbsError values, and I cannot figure out why. All the AbsError values are 10464033, which is wrong. Any ideas why this is?
Thanks for the help!
You don't need a for loop to do that. This does what you need:
new.phone$Forecast <- ((1.828e+07) * new.phone$Year) + (-3.666e+10)
new.phone$AbsError <- abs(new.phone$Subscribers - new.phone$Forecast)
You were just missing the index in the second line of the loop. It should be new.phone$AbsError[i] <- [...], not new.phone$AbsError <- [...].
Anyway, you could skip the loop if you want:
new.phone$Forecast <- (1.828e+07) * new.phone$Year + (-3.666e+10)
new.phone$AbsError <- abs(new.phone$Subscribers - new.phone$Forecast)
new.phone
Year Subscribers Forecast AbsError
1 2006 23188171 9680000 13508171
2 2007 28745769 27960000 785769
3 2008 34880964 46240000 11359036
4 2009 46373266 64520000 18146734
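To see why the missing index matters, here is a small sketch that runs both versions of the loop side by side. Assigning a single value to `new.phone$AbsError` without `[i]` recycles that value across the whole column, so each iteration overwrites every row and only the last iteration survives. The copies `buggy` and `fixed` are names introduced here for illustration.

```r
new.phone <- data.frame(
  Year        = 2006:2009,
  Subscribers = c(23188171, 28745769, 34880964, 46373266),
  Forecast    = 0,
  AbsError    = 0
)

# Original loop: the second assignment has no [i], so one value
# is recycled over the entire AbsError column on every pass
buggy <- new.phone
for (i in seq_len(nrow(buggy))) {
  buggy$Forecast[i] <- 1.828e+07 * buggy$Year[i] - 3.666e+10
  buggy$AbsError    <- abs(buggy$Subscribers[i] - buggy$Forecast[i])
}

# Corrected loop: index the left-hand side as well
fixed <- new.phone
for (i in seq_len(nrow(fixed))) {
  fixed$Forecast[i] <- 1.828e+07 * fixed$Year[i] - 3.666e+10
  fixed$AbsError[i] <- abs(fixed$Subscribers[i] - fixed$Forecast[i])
}

# How many distinct AbsError values each version produced
length(unique(buggy$AbsError))
length(unique(fixed$AbsError))
```

The corrected loop and the vectorized version give the same numbers; the vectorized form is simply shorter and leaves no index to forget.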
Try this in dplyr:
library(dplyr)
k <- read.table(text = "Year Subscribers Forecast AbsError
1 2006 23188171 0 0
2 2007 28745769 0 0
3 2008 34880964 0 0
4 2009 46373266 0 0")
k%>%mutate(Forecast = ((1.828e+07 )*Year) + (-3.666e+10) )%>%
mutate(AbsError = abs(Subscribers-Forecast))
Results:
Year Subscribers Forecast AbsError
1 2006 23188171 9680000 13508171
2 2007 28745769 27960000 785769
3 2008 34880964 46240000 11359036
4 2009 46373266 64520000 18146734

Convert dataframe to list for Farrington algorithm algo.farrington

I have the following example data frame:
data.frame(WEEK=c(1:10),YEAR=2000,
NUMBER=c(0,1,4,25,9,7,4,2,9,12))
WEEK YEAR NUMBER
1 1 2000 0
2 2 2000 1
3 3 2000 4
4 4 2000 25
5 5 2000 9
6 6 2000 7
7 7 2000 4
8 8 2000 2
9 9 2000 9
10 10 2000 12
I want to use the Farrington algorithm algo.farrington from the surveillance package in R. However, in order to do so my data have to be an object of class disProgObj. Based on the example I found in the PDF of the surveillance package the result should be a list.
Does anyone know how to convert my data so I can get the algorithm to work?
To handle such data, the R package surveillance provides the S4 class "sts" (surveillance time series), which supersedes the "disProg" class. To convert your data to an "sts" object:
library(surveillance)
x <- data.frame(WEEK=c(1:10), YEAR=2000, NUMBER=c(0,1,4,25,9,7,4,2,9,12))
xsts <- sts(observed = x$NUMBER, start = c(2000, 1), frequency = 52)
xsts
which yields:
-- An object of class sts --
freq: 52
start: 2000 1
dim(observed): 10 1
Head of observed:
observed1
[1,] 0
This "sts" object could be converted to the obsolete "disProg" class via sts2disProg() as illustrated in Roman's answer. However, this conversion is not necessary since the function farrington() can be used directly with an "sts" object (it internally calls algo.farrington()).
The package authors encourage the use of the newer "sts" class to encapsulate count time series. See the package vignette("monitoringCounts") published at http://doi.org/10.18637/jss.v070.i10 for a description of the outbreak detection tools.
Something like this?
library(surveillance)
x <- data.frame(WEEK=c(1:10),YEAR=2000,
NUMBER=c(0,1,4,25,9,7,4,2,9,12))
xsts <- sts(observed = x$NUMBER, start = c(2000, 1), frequency = 52)
sts2disProg(sts = xsts)
The above conversion results in a "disProg" object, which prints as follows:
-- An object of class disProg --
freq: 52
start: 2000 1
dim(observed): 10 1
Head of observed:
observed1
[1,] 0

How to transform (calculate) an ordinal variable into a dichotomous one in R?

I want to transform an ordinal variable (0-2), where 0 is no rights, 1 is some rights, and 2 is full rights, into a dichotomous variable.
The original ordinal variable is coded for each country and year (country-year unit).
I want to create a dichotomous variable (let's call it Improvement) capturing all annual positive changes for each country-year. So when it goes from 0 to 1 (or from 0 to 2, or from 1 to 2), I want it to be 1 for that year and country, and zero otherwise.
Below I give an example of what my data look like. "RIGHTS" is the original ordinal variable. "MY DICHOTOMOUS" is the variable I want to calculate in R. How can I do it?
COUNTRY YEAR RIGHTS MY DICHOTOMOUS
A 1990 0 0
A 1991 0 0
A 1992 0 0
A 1993 1 1
A 1994 0 0
B 1990 1 1
B 1991 1 0
B 1992 1 0
B 1993 1 0
B 1994 1 0
Please note that the original data can go the other way as well, i.e. the change can be negative. I do not want to code negative changes in this dichotomous variable.
We can use diff
df1$dichotomous <- +c(FALSE,diff(df1$RIGHTS)==1)
df1$dichotomous
#[1] 0 0 0 1 0 1 0 0 0 0
This assumes you don't consider starting with a 1 in rights as a 1 in dichotomous:
x <- rights
n <- length(x)
dichotomous <- c(0, as.numeric(x[-1] - x[-n] == 1))
You might have to use a series of ifelse() statements. But then again, I might be misreading your question. An example is posted below.
MY.DATA$MY.DICHOTOMOUS <- with(MY.DATA, ifelse(COUNTRY=="A", RIGHTS, ifelse(COUNTRY=="B" & YEAR==1990, 1, factor(RIGHTS))))
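The diff-based answers test for a change of exactly 1 and run across the whole column, so a jump from 0 to 2 is missed and the first row of each country is compared with the last row of the previous one. A possible sketch that handles both is below. It rests on two assumptions that are not stated in the question: the rows are sorted by country and year, and the first year of each country is compared against an implicit 0 (which is what reproduces the 1 for country B in 1990 in the example table). `df1` follows the naming of the first answer.

```r
# Example data rebuilt from the table in the question
df1 <- data.frame(
  COUNTRY = rep(c("A", "B"), each = 5),
  YEAR    = rep(1990:1994, times = 2),
  RIGHTS  = c(0, 0, 0, 1, 0, 1, 1, 1, 1, 1)
)

# Within each country, flag any positive year-on-year change.
# Assumption: the year before the first observation counts as 0.
improved <- function(x) as.numeric(diff(c(0, x)) > 0)
df1$dichotomous <- ave(df1$RIGHTS, df1$COUNTRY, FUN = improved)

df1
```

If the first year of each country should always be 0 instead, replacing `diff(c(0, x)) > 0` with `c(FALSE, diff(x) > 0)` would be one way to express that.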

Ifelse statements for a dataframe in R

I am hoping that someone can help me figure out how to write an if-else statement to work on my dataset. I have data on tree growth rates by year. I need to calculate whether growth rates decreased by >50% from one year to the next. I am having trouble applying an ifelse statement to calculate my final field. I am relatively new to R, so my code is probably not very efficient, but here is an example of what I have so far:
For an example dataset,
test<-data.frame(year=c("1990","1991","1992","1993"),value=c(50,25,20,5))
year value
1 1990 50
2 1991 25
3 1992 20
4 1993 5
I then calculate the difference between the current year and previous year's growth ("value"):
test[-1,"diff"]<-test[-1,"value"]-test[-nrow(test),"value"]
year value diff
1 1990 50 NA
2 1991 25 -25
3 1992 20 -5
4 1993 5 -15
and then calculate what 50% of each years' growth would be:
test$chg<-test$value * 0.5
year value diff chg
1 1990 50 NA 25.0
2 1991 25 -25 12.5
3 1992 20 -5 10.0
4 1993 5 -15 2.5
I am then trying to use an ifelse statement to calculate a field "abrupt" that would be "1" when the decline from one year to the next is greater than 50%. This is the code I am trying to use, but I'm not sure how to properly reference the "chg" field from the previous year, because I am getting an error (copied below):
test$abrupt<-ifelse(test$diff<0 && abs(test$diff)>=test[-nrow(test),"chg"],1,0)
Warning message:
In abs(test$diff) >= test[-nrow(test), "chg"] :
longer object length is not a multiple of shorter object length
> test
year value diff chg abrupt
1 1990 50 NA 25.0 NA
2 1991 25 -25 12.5 NA
3 1992 20 -5 10.0 NA
4 1993 5 -15 2.5 NA
A test of a similar ifelse statement worked when I just assigned a few numbers, but I'm not sure how to get this to work in the context of a data frame. Here is an example of it working on just a few values:
prevyear<-50
curryear<-25
chg<-prevyear*0.5
> chg
[1] 25
> diff<-curryear-prevyear
> diff
[1] -25
> abrupt<-ifelse(diff<0 && abs(diff)>= chg,1,0)
> abrupt
[1] 1
If anyone could help me figure out how to apply a similar ifelse statement to my dataframe I would greatly appreciate it! Thank you for any help you can provide.
thank you,
Katie
It's throwing a warning because the two vectors compared abs(test$diff) >= test[-nrow(test),"chg"] have different lengths. Also, for logical and, you are using && (which gives only one TRUE or FALSE) when you should be using & (which is vectorized: it operates elementwise over two vectors and returns a vector of the same length). Try this:
test$abrupt<-ifelse(test$diff<0 & abs(test$diff)>=test$chg,1,0)
I would change where you're putting chg so that it lines up with the diff you want to compare it to:
test$chg[2:nrow(test)] <- test$value[1:(nrow(test)-1)] * 0.5
Then, correct your logical operator like Blue Magister said:
test$abrupt<-ifelse(test$diff<0 & abs(test$diff)>=test$chg,1,0)
and you have your results:
year value diff chg abrupt
1 1990 50 NA NA NA
2 1991 25 -25 25.0 1
3 1992 20 -5 12.5 0
4 1993 5 -15 10.0 1
Also, you may find the function diff helpful: rather than doing this:
test[-1,"value"]-test[-nrow(test),"value"]
you can just do
diff(test$value)
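Putting the suggestions above together, a compact sketch might look like the following. It uses diff() for the year-on-year change, shifts the 50% threshold down by one row so that it refers to the previous year's value, and uses the vectorized & inside ifelse(). The first row has no previous year, so its result is left as NA.

```r
test <- data.frame(
  year  = c("1990", "1991", "1992", "1993"),
  value = c(50, 25, 20, 5)
)

# Change from the previous year (NA for the first row)
test$diff <- c(NA, diff(test$value))

# 50% of the PREVIOUS year's value, aligned with the current row
test$chg <- c(NA, head(test$value, -1) * 0.5)

# Elementwise test: a decline that is at least half of last year's value
test$abrupt <- ifelse(test$diff < 0 & abs(test$diff) >= test$chg, 1, 0)

test
```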

Row minus row within different lists in R

How can I calculate the difference between consecutive rows within each element of a list, when the elements have different dimensions?
I use the following code:
names(ri1)
[1] "Sedol" "code" "ri" "date"
ri1<-ri1[order(ri1$Sedol,ri1$date),]
sri<-split(ri1,ri1$Sedol)
ri1$r<-as.vector(sapply(seq_along(sri), function(x) diff(c(0, sri[[x]][,3]))))
However, it produces this error:
"Error in `$<-.data.frame`(`*tmp*`, "r", value = list(c(100, 0.00790000000000646, :
replacement has 1485 rows, data has 4687655"
For example, I have three lists:
date ri
1990 1
1991 2
1992 3
date ri
1990 1
1991 2
1992 3
1993 4
date ri
1990 1
1991 2
I want results like these:
date ri r
1990 1 0%
1991 2 100%
1992 3 100%
date ri r
1990 1 0%
1991 2 100%
1992 3 100%
1993 4 100%
date ri r
1990 1 0%
1991 2 100%
Note: r = ri(t+1)/ri(t) - 1
Using diff and lapply you can get something like
# I generate some data
dat1 <- data.frame(date = seq(1990,1999,length.out=5),ri = seq(1,10,length.out=5))
dat2 <- data.frame(date = seq(1990,1999,length.out=5),ri=seq(1,5,length.out=5))
# I put the data.frame in a list
ll <- list(dat1,dat2)
# I use lapply:
ll <- lapply(ll,function(dat){
# I apply the formula you give in a vector version
# maybe you need only diff in percent?
dat$r <- round(c(0,diff(dat$ri))/dat$ri*100)
dat
})
ll
[[1]]
date ri r
1 1990.00 1.00 0
2 1992.25 3.25 69
3 1994.50 5.50 41
4 1996.75 7.75 29
5 1999.00 10.00 22
[[2]]
date ri r
1 1990.00 1 0
2 1992.25 2 50
3 1994.50 3 33
4 1996.75 4 25
5 1999.00 5 20
You should use a combination of head and tail as follows:
r.fun <- function(ri) c(0, tail(ri, -1) / head(ri, -1) - 1)
lapply(sri, transform, r = r.fun(ri))
If your goal is to recombine (rbind) your data afterwards, then know that you can split/apply/combine everything within a single call to ave from the base package, or ddply from the plyr package:
transform(ri1, r = ave(ri, Sedol, FUN = r.fun))
or
library(plyr)
ddply(ri1, "Sedol", transform, r = r.fun(ri))
Edit: If you want the output to be in XX% as in your example, replace r.fun with:
r.fun <- function(ri) paste0(round(100 * c(0, tail(ri, -1) / head(ri, -1) - 1)), "%")
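As a small self-contained check of the ave() approach, here is a sketch on made-up data. The toy `ri1` below only mimics the column names from the question (Sedol, date, ri) and its values are invented for illustration. Because ave() returns one value per input row, in the original row order, the result has the same length as the data frame. That avoids the "replacement has 1485 rows" error, which most likely arises because sapply() returned one list element per group rather than one value per row.

```r
# Growth rate relative to the previous row; the first row of a group gets 0
r.fun <- function(ri) c(0, tail(ri, -1) / head(ri, -1) - 1)

# Toy data with groups of different sizes (values are made up)
ri1 <- data.frame(
  Sedol = c("A", "A", "A", "B", "B"),
  date  = c(1990, 1991, 1992, 1990, 1991),
  ri    = c(1, 2, 3, 1, 2)
)
ri1 <- ri1[order(ri1$Sedol, ri1$date), ]

# One value per row, computed separately within each Sedol
ri1$r <- ave(ri1$ri, ri1$Sedol, FUN = r.fun)

ri1
```

Note that with this formula the step from 2 to 3 gives 0.5 (50%), not 100%.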
