R - Object not found error when using ddply

I'm applying ddply to the following data frame. The goal is to apply the ecdf function to the yearly_test_count values within each group of rows that share the same country.
> head(test)
country yearly_test_count download_speed
1 AU 1 2.736704
2 AU 6 3.249486
3 AU 6 2.287267
4 AU 6 2.677241
5 AU 6 1.138213
6 AU 6 3.205364
This is the script I used:
house_total_year_ecdf <- ddply(test, c("country"), mutate,
ecdf_val = ecdf(yearly_test_count)(yearly_test_count)*length(yearly_test_count))
But I received the following error:
Error in eval(substitute(expr), envir, enclos) :
object 'yearly_test_count' not found
==================================================================
I tried using the function ecdf alone with yearly_test_count column and it works:
ecdf(test$yearly_test_count)(test$yearly_test_count)*length(test$yearly_test_count)
Does anyone have any idea why this doesn't work when using ddply?
This is odd because the script worked before; now when I run it again I get the error above. I'm not sure whether the issue is related to differences in R versions or package versions.
Any help is much appreciated! :)

One option would be using ave from base R
test$ecdf_val <- with(test, ave(yearly_test_count, country,
FUN = function(x) ecdf(x)(x)*length(x)))
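As a quick check of the ave approach, here is a minimal, self-contained sketch. The data frame is rebuilt from the six AU rows shown in the question plus two made-up NZ rows (an assumption, added only so that there is more than one group).

```r
# Sample data: the six AU rows from the question, plus two hypothetical NZ rows
test <- data.frame(
  country = c(rep("AU", 6), rep("NZ", 2)),
  yearly_test_count = c(1, 6, 6, 6, 6, 6, 2, 5),
  download_speed = c(2.736704, 3.249486, 2.287267, 2.677241,
                     1.138213, 3.205364, 1.5, 2.5)
)

# Within each country, ecdf(x)(x) * length(x) is the number of values <= x
test$ecdf_val <- with(test, ave(yearly_test_count, country,
                                FUN = function(x) ecdf(x)(x) * length(x)))
print(test)
```

For each row, ecdf_val is the count of rows in the same country whose yearly_test_count is less than or equal to that row's value.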

Related

R - Making a ggplot while using survey package

I am stuck with a real problem.
My dataset comes from a survey, and to make it usable for computing statistics about the whole French population, I must apply weights to it.
For this purpose, I used the survey package, but the syntax is not really easy to use with R.
Is there a way to use ggplot while having weights?
To explain it a bit better, here is my dataset:
head(df)
Id Weight Var1
1 30 0
2 12.4 0
3 68.2 1
So my individual 1 accounts for 30 people in the French population.
I create a df_weighted dataset using the survey package.
How can I use ggplot now? df_weighted is a list!
I tried something like this to get around the list problem, but it did not work at all:
df_weighted_ggplot$var1 <- svytable(~var1, df_weighted)
df_weighted_ggplot$var_fill <- svytable(~var_fill, df_weighted)
ggplot(df_weighted_ggplot, aes(fill = var_fill , x =var1)) + geom_bar(position = "fill")
I received this predictable error:
Error: `data` must be a data frame, or other object coercible by `fortify()`, not a list
Do you know of any other package that could help me? I have read many forums, and survey seems to be the most helpful one.
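One possible direction, sketched here with base R only and not confirmed anywhere in this thread: ggplot needs a plain data frame, and a table of weighted counts can be turned into one with as.data.frame(). The sketch uses xtabs() on the three rows shown above as a stand-in for svytable(); the names weighted_tab and weighted_counts are made up for illustration. Assuming the object returned by svytable() behaves like an ordinary table, the same as.data.frame() step would apply to it.

```r
# The three rows shown in the question
df <- data.frame(Id = 1:3, Weight = c(30, 12.4, 68.2), Var1 = c(0, 0, 1))

# Sum the weights within each level of Var1 (a weighted count)
weighted_tab <- xtabs(Weight ~ Var1, data = df)

# Convert the table to a data frame with columns Var1 and Freq
weighted_counts <- as.data.frame(weighted_tab)
print(weighted_counts)

# With ggplot2 loaded, this data frame could then be plotted, for example:
# ggplot(weighted_counts, aes(x = Var1, y = Freq)) + geom_col()
```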

R mutate ifelse update conditional row with calculated function value

I am using dplyr's mutate to update one specific row, selected by a condition, by adding the result of a function call, namely nrow(), to its current value. I cannot use apply() because only the one row matching the condition should be updated.
For example, for the row where Year == 2007 and Month == 06, set Incoming.Exam to Incoming.Exam + nrow(df3), so that row becomes 698 plus the nrow value.
I get the following error from mutate_impl:
Error in mutate_impl(.data, dots) :
Column abberville_LA must be length 96 (the number of rows) or one, not 4
abberville_LA %>%
mutate(abberville_LA, Incoming.Exam = ifelse(abberville_LA$Year == 2007 & abberville_LA$Month == 06, abberville_LA, Incoming.Exam + nrow(abberville_df3), abberville_LA$Incoming.Exam))
head(abberville_LA, 3)
Incoming.Exam Year Month ts_date
1 698 2007 6 2007-06-01
2 NaN 2010 6 2010-06-01
1. Your question is not entirely clear, so I am answering based on my best understanding of what you want.
2. You are using $ inside mutate, which is not required. Running the code below should solve the issue.
abberville_LA %>%
# Compare against numbers: a Month value of 6 never equals the string '06'
mutate(Incoming.Exam = ifelse(Year == 2007 & Month == 6, Incoming.Exam + nrow(abberville_df3), Incoming.Exam))
The issue was the dplyr library. I discovered that I had a slightly older version and needed to update it to resolve the error message "Error in mutate_impl(.data, dots) : Evaluation error: as_dictionary() is defunct as of rlang 0.3.0. Please use as_data_pronoun() instead", which indicated that a different version of dplyr was required. After updating, the code provided in the answers on this forum worked.
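For reference, the same conditional update can be sketched in base R, which avoids any dependence on the dplyr version. This is only an illustration: the two data rows are copied from the head() output above, and abberville_df3 is a hypothetical four-row stand-in, since the real one is not shown.

```r
# Two rows copied from the question's head() output
abberville_LA <- data.frame(
  Incoming.Exam = c(698, NaN),
  Year = c(2007, 2010),
  Month = c(6, 6)
)

# Hypothetical stand-in for the second data frame (4 rows)
abberville_df3 <- data.frame(x = 1:4)

# Logical flag for the single row to update
is_target <- abberville_LA$Year == 2007 & abberville_LA$Month == 6

# Add nrow(abberville_df3) only where the condition holds
abberville_LA$Incoming.Exam <- ifelse(
  is_target,
  abberville_LA$Incoming.Exam + nrow(abberville_df3),
  abberville_LA$Incoming.Exam
)
print(abberville_LA)
```

Note that the comparison uses the number 6 rather than the string '06': R coerces 6 to "6" when comparing with a string, and "6" is not equal to "06".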

subscript out of bounds error in R programming

I get the following error when using the prophet library:
Error in `[<-`(`*tmp*`, m$history$t >= m$changepoints.t[i], i, value = 1) : subscript out of bounds
Code: m <- prophet(data), where data was loaded from a CSV file.
My dataset looks like this :
ds y
1 2017-05-23 08:07:00 21.16641
2 2017-05-23 08:07:10 16.79345
3 2017-05-23 08:07:20 16.40846
4 2017-05-23 08:07:30 16.24653
5 2017-05-23 08:07:40 16.14694
6 2017-05-23 08:07:50 15.89552
The ds column has class "POSIXct" "POSIXt".
The y column has class "numeric" (these are log values of some count values).
Being new to R, I don't have any clue how to resolve this. Please help.
Your data does not have any changepoints (points in the series where the direction of the local trend changes). This error looks like a bug in the Prophet package, which does not handle this situation gracefully. However, you can work around it by setting the changepoint tuning parameters.
Quick fix: set the number of changepoints to 0 by passing this argument:
n.changepoints = 0
in your prophet call.

R data table usage as parameter in package

I have a problem using a data.table as a parameter to a function.
If I define the function in the script I'm working in it works - see fn_good.
If I define the function (identically) as part of a package I've made it won't work fully. It seems that the column names are not recognized. Commands within the function such as 'tables()' or x[1:5,1:2] work fine. It is just the column names can't be used as they were in fn_good.
The other functions in my package work ok.
Any ideas?
Many thanks
R.version 3.0.0
cd<-data.table(PY=1992:2001,DV=1:10,IN=2000)
fn_good<-function(x) {x[1:5, list(PY, DV)]}
fn_good(x=cd)
PY DV
1: 1992 1
2: 1993 2
3: 1994 3
4: 1995 4
5: 1996 5
fn_in_Package_Bad
function (x)
{
x[1:5, list(PY, DV)] #identical to above
}
<environment: namespace:RBasicChainLadder>
fn_in_Package_Bad(x=cd)
Error in `[.data.frame`(x, i, j) : object 'PY' not found
To make the package data.table-aware, I had to add
Depends: data.table
to the package DESCRIPTION file.

fast join data.table (potential bug, checking before reporting)

This might be a bug. If so, I will delete this question and report it as a bug. I would like someone to take a look to make sure I'm not doing something incorrectly, so that I don't waste the developers' time.
test = data.table(mo=1:100, b=100:1, key=c("mo", "b"))
mo = 1
test[J(mo)]
That returns the entire test data.table instead of the correct result returned by
test[J(1)]
I believe the error might be coming from test having the same column name as the table which is being joined by, mo. Does anyone else get the same problem?
This is a scoping issue, similar to the one discussed in data.table-faq 2.13 (warning, pdf). Because test contains a column named mo, when J(mo) is evaluated, it returns that entire column, rather than value of the mo found in the global environment, which it masks. (This scoping behavior is, of course, quite nice when you want to do something like test[mo<4]!)
Try this to see what's going on:
test <- data.table(mo=1:5, b=5:1, key=c("mo", "b"))
mo <- 1
test[browser()]
Browse[1]> J(mo)
# mo
# 1: 1
# 2: 2
# 3: 3
# 4: 4
# 5: 5
# Browse[1]>
As suggested in the linked FAQ, a simple solution is to rename the indexing variable:
MO <- 1
test[J(MO)]
# mo b
# 1: 1 5
(This will also work, for reasons discussed in the documentation of i in ?data.table):
mo <- data.table(1)
test[mo]
# mo b
# 1: 1 5
This is not a bug, but documented behaviour afaik. It's a scoping issue:
test[J(globalenv()$mo)]
mo b
1: 1 100
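The masking described above is not specific to data.table. As a rough base R analogy (not data.table's actual implementation), with() also looks a name up among the columns first and only then in the calling environment. The names test_df, masked and unmasked are made up for this sketch.

```r
# A plain data frame with a column named mo
test_df <- data.frame(mo = 1:5, b = 5:1)

# Two variables in the calling environment
mo <- 1
MO <- 1

# The column mo masks the variable mo
masked <- with(test_df, mo)

# There is no column MO, so the lookup falls back to the variable
unmasked <- with(test_df, MO)

print(masked)
print(unmasked)
```

This is why renaming the indexing variable to something that is not a column name sidesteps the problem.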
