Random errors from Google Distance Matrix API using gmapsdistance - r

I have a table of origin/destination pairs (lat-long coordinates). I'm using the R library gmapsdistance and mapply to loop over the table calling the API for each row. (The primary reason for this approach is that each row is a unique combination of origin, destination, and departure time, but gmapsdistance does not accept a vector of departure times.)
The problem is that I'm getting random, non-reproducible errors. I'll run the first 2,000 rows and something will crash; then I'll back up, run the first 1,000 and the second 1,000 separately, and get no error.
As a result, I'm unable to provide a reproducible example. (If I could, I'd like to think I would have solved this by now.) Here is my mapply call:
result <- mapply(
  gmapsdistance,
  origin = to_skim$coords_orig,
  destination = to_skim$coords_dest,
  combinations = "pairwise",
  key = api_key,
  mode = mode,
  departure = to_skim$departure_secs
)
The error messages themselves are not constant. I've seen ERROR : replacement has length zero but also:
AttValue: " or ' expected
attributes construct error
Couldn't find end of Start Tag html line 2
Extra content at the end of the document
The key is that I can re-run the exact same call and get a successful result. Thanks for any and all advice!
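Since re-running the exact same call succeeds, one common mitigation (not from the original post; `with_retries` is a hypothetical helper, and the commented usage just mirrors the question's `mapply` call) is to wrap each API call in `tryCatch` and retry transient failures with a short pause:

```r
# Hypothetical retry wrapper: calls f(...) up to max_tries times,
# pausing between attempts, and re-raises the last error if every
# attempt fails.
with_retries <- function(f, ..., max_tries = 3, pause_secs = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(...), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_tries) Sys.sleep(pause_secs)
  }
  stop(result)  # all attempts failed; re-raise the last error
}

# Usage sketch with the question's arguments:
# result <- mapply(
#   function(o, d, dep) with_retries(
#     gmapsdistance,
#     origin = o, destination = d, departure = dep,
#     combinations = "pairwise", key = api_key, mode = mode
#   ),
#   to_skim$coords_orig, to_skim$coords_dest, to_skim$departure_secs
# )
```

The XML-parsing flavor of the errors (AttValue, "Couldn't find end of Start Tag html") suggests the API occasionally returns an HTML error page instead of XML, which is exactly the kind of transient failure a retry loop absorbs.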

Related

"nTrials must not be greater..." issue on conjoint design

I'm trying to create a list of conjoint cards using R.
I have followed the professor's introduction, with my own dataset, but I'm stuck on this issue, which I have no idea how to resolve.
library(conjoint)
experiment <- expand.grid(
  ServiceRange = c("RA", "Active", "Passive", "Basic"),
  IdentProce = c("high", "mid", "low"),
  Fee = c(1000, 500, 100),
  Firm = c("KorFin", "KorComp", "KorStrt", "ForComp")
)
print(experiment)
design=caFactorialDesign(data=experiment, type="orthogonal")
print(design)
at the "design" line, I'm keep getting the following error message:
Error in optFederov(~., data, nTrials = i, approximate = FALSE, nRepeats = 50) :
nTrials must not be greater than the number of rows in data
How do I address this issue?
You're getting this error because you have 144 rows in experiment, but the nTrials mentioned in the error gets bigger than 144. This causes an error for optFederov(), which is called inside caFactorialDesign(). The problem stems from the fact that your Fee column has relatively large values.
I'm not familiar with how the conjoint package is set up, but I can show you how to troubleshoot this error. You can read the conjoint documentation for more on how to select appropriate experimental data.
(Note that the example data in the documentation always has very low numeric values, usually values between 1-10. Compare that with your Fee vector, which has values up to 1000.)
You can see the source code for a function loaded into your RStudio namespace by highlighting the function name (e.g. caFactorialDesign) and hitting Command-Return (on a Mac - probably something similar on PC). You can also just look at the source code on GitHub.
The caFactorialDesign is implemented here. That link highlights the line (26) that is throwing the error for you:
temp.design<-optFederov(~., data, nTrials=i, approximate=FALSE, nRepeats=50)
Recall the error message:
nTrials must not be greater than the number of rows in data
You've passed in experiment as the data parameter, so nrow(experiment) will tell us what the upper limit on nTrials is:
nrow(experiment) # 144
We can actually just think of the error for this dataset as:
nTrials must not be greater than 144
Ok, so how is the value for nTrials determined? We can see nTrials is actually an argument to optFederov(), and its value is set as i - often a sign that there's a for-loop wrapping an operation. And in fact, that's what we see:
for (i in ca.number:profiles.number)
{
  temp.design <- optFederov(~., data, nTrials = i, approximate = FALSE, nRepeats = 50)
  ...
}
This tells us that optFederov() is going to get called for each value of i in the loop, which will start at ca.number and will go up to profiles.number (inclusive).
How are these two variables assigned? If we look a little higher up in the caFactorialDesign() definition, ca.number is defined on lines 5-9:
num <- data.frame(data.matrix(data))
vars.number<-length(num)
levels.number<-0
for (i in 1:length(num)) levels.number<-levels.number+max(num[i])
ca.number<-levels.number-vars.number+1
You can run these calculations outside of the function - just remember that data == experiment. So just change that first line to num <- data.frame(data.matrix(experiment)), and then run that chunk of code. You can see that ca.number == 1008!!
In other words, the very first value of i in the for-loop which calls optFederov() is already way bigger than the max limit: 1008 >> 144.
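You can check this yourself by replaying lines 5-9 of the package source against the question's `experiment` (everything here is copied from the question and the quoted source, nothing new):

```r
# Rebuild the question's design grid.
experiment <- expand.grid(
  ServiceRange = c("RA", "Active", "Passive", "Basic"),
  IdentProce   = c("high", "mid", "low"),
  Fee          = c(1000, 500, 100),
  Firm         = c("KorFin", "KorComp", "KorStrt", "ForComp")
)

# Replay the ca.number calculation from caFactorialDesign(),
# with data replaced by experiment.
num <- data.frame(data.matrix(experiment))  # factors -> integer codes
vars.number <- length(num)                  # 4 attributes
levels.number <- 0
for (i in 1:length(num)) levels.number <- levels.number + max(num[i])
ca.number <- levels.number - vars.number + 1

ca.number        # 1008: (4 + 3 + 1000 + 4) - 4 + 1
nrow(experiment) # 144
```

The Fee column dominates the sum: `data.matrix()` leaves its numeric values (up to 1000) as-is, while the factor columns contribute only their level counts.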
It's possible you can include these numeric values as factors or strings in your definition of experiment - I'm not sure if that is an appropriate way to do this analysis. But I hope it's clear that you won't be able to use such large values in caFactorialDesign(), unless you have a much larger number of total observations in your data.
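To illustrate that last point (a sketch of the recoding only; whether it is statistically appropriate for a conjoint analysis is a separate question): if `Fee` is stored as a factor, `data.matrix()` sees its internal codes 1-3 rather than the values 100-1000, and the starting `nTrials` drops back below the row count.

```r
# Same grid as the question, but Fee as a factor.
experiment2 <- expand.grid(
  ServiceRange = c("RA", "Active", "Passive", "Basic"),
  IdentProce   = c("high", "mid", "low"),
  Fee          = factor(c(1000, 500, 100)),  # internal codes 1-3
  Firm         = c("KorFin", "KorComp", "KorStrt", "ForComp")
)

# Same calculation as in caFactorialDesign(), condensed.
num <- data.frame(data.matrix(experiment2))
ca.number <- sum(sapply(num, max)) - length(num) + 1

ca.number          # 11: (4 + 3 + 3 + 4) - 4 + 1, well under 144
nrow(experiment2)  # 144
```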

In mclapply : scheduled cores 9 encountered errors in user code, all values of the jobs will be affected

I went through the existing Stack Overflow links regarding this error, but no solution given there is working (and some questions don't have solutions there either).
Here is the problem I am facing:
I run Arima models in parallel using mclapply from the parallel package. The sample data is split by key onto different cores, and the results are combined using do.call + rbind. (The server the script runs on has 20 CPU cores, which is passed to the mc.cores argument.)
Below is my mclapply code:
print('Before lapply')
data_sub <- do.call(rbind, mclapply(ds,predict_function,mc.cores=num_cores))
print('After lapply')
I get multiple sets of values like the one below as the output of predict_function.
So basically, I get the output shown above from multiple cores to be sent to rbind. The code works perfectly for some of the data. Then I get another set of data, just like the one above with the same data type in each column, but different values in column 2.
The data type of each column is given in the column names above.
For the second case, I get below error:
simpleError in charToDate(x): character string is not in a standard unambiguous format
Warning message:
In mclapply(ds, predict, mc.cores = num_cores) :
scheduled cores 9 encountered errors in user code, all values of the jobs will be affected
I don't see the print('After lapply') output in the second case, but it is visible in the first case.
I checked the date column in the dataframe above; it's in Date format. When I tried unique(df$DATE), it returned only valid values in the format shown above.
What is the cause of the error here? Is the first error the reason mclapply isn't able to rbind the values? Is the warning something we need to understand better?
Any advice would be greatly appreciated.
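Not an answer, but a debugging sketch that fits this situation: with mclapply, an error on one core poisons every result from that core, and the real error surfaces only as the "scheduled cores ... encountered errors" warning. Wrapping the worker in tryCatch makes each element return either a result or the error it hit, so you can see exactly which key broke. (predict_function, ds, and num_cores here are toy stand-ins for the question's objects, which fit an Arima model per key.)

```r
library(parallel)

# Toy stand-ins for the question's objects: the real predict_function
# fits an Arima model per key; this one fails for one key so the
# failure path is visible.
predict_function <- function(key) {
  if (key == "bad") stop("character string is not in a standard unambiguous format")
  data.frame(key = key, n = nchar(key))
}
ds <- list("a", "bad", "ccc")
num_cores <- 1  # the question's server uses 20

# Wrap the worker in tryCatch so each element returns either a result
# or the error it hit, instead of one failure poisoning the whole job.
safe_predict <- function(chunk) {
  tryCatch(predict_function(chunk),
           error = function(e) simpleError(conditionMessage(e)))
}

results <- mclapply(ds, safe_predict, mc.cores = num_cores)
failed  <- vapply(results, inherits, logical(1), what = "error")
sapply(results[failed], conditionMessage)  # shows exactly what broke
data_sub <- do.call(rbind, results[!failed])
```

The charToDate message itself usually means as.Date() (or a comparison coercing to Date) met a string it could not parse; logging the offending chunk this way lets you inspect that row directly.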

R - skmeans with zeros

I'm a total R beginner and try to cluster user data using the function skmeans.
I always get the error message:
"Error in if (!all(row_norms(x) > 0)) stop("Zero rows are not allowed.") :
missing value where TRUE/FALSE needed".
There already is a topic about this error message explaining that zeros are not allowed in rows.
However, my blueprint for what I'm trying to do is an example based on a data set which is also full of zeros. Working with this example, the error message does not appear and the function works fine. The error message only occurs when I apply the same procedure to my data set which doesn't seem different from the blueprint's data set.
Here's the function used for the kmeans:
weindaten.clusters <- skmeans(wendaten.tr, 5, method="genetic")
And here's the data set:
For my own data set, I used this function
kunden.cluster<- skmeans(test4, 5, method="genetic")
for this data set:
Could somebody please help me understand what the difference between the two data sets is (vector vs. something else maybe) and how I can change my data to be able to use skmeans?
You cannot use spherical k-means on this data.
Spherical k-means uses angles for similarity. But the all-zero row cannot be used in angular computations.
Choose a different algorithm, unless you can treat the all-zero row specially (for example on text data, this would correspond to an empty document).
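If simply dropping the empty rows is acceptable for your use case, a minimal sketch (test4 here is a toy stand-in, not the asker's data):

```r
# Toy stand-in for the question's test4: one user row with no activity.
test4 <- rbind(
  c(2, 0, 1),
  c(0, 0, 0),  # all-zero row: no direction, so no angle can be computed
  c(0, 3, 1)
)

# The error text "missing value where TRUE/FALSE needed" can also point
# to NAs in the matrix, so check for those first.
stopifnot(!anyNA(test4))

# Drop all-zero rows before clustering.
nonzero <- rowSums(test4 != 0) > 0
test4.clean <- test4[nonzero, , drop = FALSE]

nrow(test4.clean)  # 2
# kunden.cluster <- skmeans(test4.clean, 5, method = "genetic")
```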

Error in "If" statement in R coding

This is my code:
#Start of my Code#
test1 <- function(c, x){
  high = 0
  low = 0
  samp = sample(c, x)
  for (i in 1:x){
    if (samp[i] > 1) {high = high + 1}
    else if (samp[i] < 0) {low = low + 1}
  }
  c(high, low, mean(samp), var(samp), samp)
}

sim1 <- function(c, x){
  replicate(nsim, {test1(c, x)})
}

size = 10
a <- sim1(overall, size)

listnormwor = NULL
countnormwor = 0
meannormwor = NULL
for (i in 0:nsim-1){
  if (a[1+(size+4)*i] + a[2+(size+4)*i] == 0){   # <- the "if" statement in question
    countnormwor = countnormwor + 1
    for (z in 5:(size+4)){
      listnormwor = c(listnormwor, a[z+(size+4)*i])
    }
    meannormwor = c(meannormwor, a[3+(size+4)*i])
  }
}

countnormwor
mean(meannormwor)
var(listnormwor)
Simply, I want to say if there are no outliers (indicated as '0' in first and second value of every 14 data points), count it into a normal bucket and keep its values to calculate variance and mean later.
But the problem is that it collects values from all of a, and only at the very end does it produce the actual values I want.
For example, it should satisfy length(listnormwor) == 10 * countnormwor.
But it gives me a ridiculous amount of data, and when I play around with the if statement, it says "missing value where TRUE/FALSE needed."
I'd suggest stepping through the code (sending each line to the interpreter) one line at a time. Inspect the value of variables by calling them in the interpreter. I bet this will lead you to the source of your problem. To start, create the values x and c inside the function then work from there. Instead of running the for loop, create your own index variable i. Again, the point is to work line by line and carefully check your expectations against the values that variables take at each point.
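As a concrete example of what that line-by-line inspection turns up here (my observation, not something stated in the thread): the loop header `for(i in 0:nsim-1)` does not iterate over 0 to nsim-1, because `:` binds tighter than `-`.

```r
# ":" has higher precedence than binary "-", so 0:nsim-1 is (0:nsim) - 1
# and the sequence starts at -1.
nsim <- 5
0:nsim - 1    # -1 0 1 2 3 4  (parsed as (0:nsim) - 1)
0:(nsim - 1)  #  0 1 2 3 4    (probably what was intended)

# With i = -1, subscripts like a[1 + (size+4)*i] become negative;
# negative indexing in R drops elements instead of selecting one, so
# the if() condition is no longer a single TRUE/FALSE value.
```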

how to do a sum in graphite but exclude cases where not all data is present

We have 4 data series, and once in a while one of the 4 has a null because we missed reading the data point. This makes the graph look like we have awful spikes of lost incoming volume, which is not true; we were just missing the data point.
I am doing a basic sumSeries(server*.InboundCount) right now, where the * matches servers 1, 2, 3, and 4.
Is there a way for Graphite to NOT sum those points, and instead make the sum null at those points in time, so the line simply connects from one point with data to the next point with data?
NOTE: We also display the graphs server*.InboundCount individually to watch for spikes on individual servers.
Or perhaps there is a function that looks at all the series and, if any value at a point is null, returns null for every series at that point (i.e. it takes X series and returns X series), so the sum function receives null+null+null+null, which hopefully doesn't produce a spike and instead shows null.
thanks,
Dean
This is an old question but still deserves an answer as a point of reference. What you're after, I believe, is the function keepLastValue:
Takes one metric or a wildcard seriesList, and optionally a limit to the number of ‘None’ values to skip over. Continues the line with the last received value when gaps (‘None’ values) appear in your data, rather than breaking your line.
This would make your function
sumSeries(keepLastValue(server*.InboundCount))
This will work OK if you have a single null datapoint here and there. If you have multiple consecutive null data points, you can specify how many missing values to carry the last value across before the line breaks. For example, the following will look back up to 10 values before the sumSeries breaks:
sumSeries(keepLastValue(server*.InboundCount, 10))
I'm sure you've since solved your problems, but I hope this helps someone.