Kibana Timelion error issues - kibana

I have used Timelion to generate time series data. However, I have run into errors when using the split function.
This is my query without split:
.es(index='cpu-*',metric='avg:CPU(%)').mvavg(1,left)
It runs fine and generates a chart.
However, when I add a split:
.es(index='cpu-*', split=Hostname:5,metric='avg:CPU(%)').mvavg(1,left)
it gives me the error: Timelion: Error: in cell #1: Request Timeout after 30000ms
What could be causing this? I cannot find anything wrong with my query.
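For reference, the 30000 ms in the message matches Kibana's default Elasticsearch request timeout. One thing I am considering (a hedged guess only; I have not confirmed it lets the split query complete) is raising that limit in kibana.yml:
elasticsearch.requestTimeout: 120000
The split presumably makes Elasticsearch run a terms aggregation per host on top of the date histogram, which would explain why the split version is slower than the plain query.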

Related

R code halts in for loop with no error message

I'm working on a segment of R code: a for loop that reads in columns from a collection of .csv files whose paths are compiled in a file-path directory I've made. As it reads in each file, it stores four different columns into four different grand tables in my working environment.
I'm trying to make part of the code calculate the monthly average for each month and store it in a new column, so that I can use it to replace missing data points; this involves a couple more for loops for mapping the subset of the aggregate table.
All this ends up being four nested for loops, which handle a great deal of data at once before overwriting it with the next large file.
After incorporating the three nested loops that build the monthly average vector, the code started halting as soon as it tries to read in the data file, with no error message; it just looks like this:
(screenshot: halted code, no error)
If I add show_col_types = FALSE to read_csv it looks like this instead, still halting the code:
(screenshot: halted code with show_col_types = FALSE)
I can't include the data or much of the code because my company won't allow it, but I would appreciate any input, since there isn't any error message I can google. Thanks!
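To give a sense of the structure without the proprietary details, it is roughly shaped like the sketch below (file paths, column names, and the grand table are all placeholders, not the real code, and the real code builds the monthly averages with nested for loops rather than tapply):
library(readr)

# Placeholder reconstruction of the loop described above
file_paths <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
grand_table <- data.frame()

for (path in file_paths) {
  df <- read_csv(path, show_col_types = FALSE)

  # Build a monthly average for a hypothetical "value" column...
  df$month <- format(as.Date(df$date), "%Y-%m")
  monthly_avg <- tapply(df$value, df$month, mean, na.rm = TRUE)

  # ...and use it to replace missing points within each month
  missing <- is.na(df$value)
  df$value[missing] <- monthly_avg[df$month[missing]]

  grand_table <- rbind(grand_table, df[, c("date", "month", "value")])
}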

importxml could not fetch url after repeated attempts

I am trying to import the weather data for a number of dates, and one zip code, in Google Sheets. I am using importxml for this in the following base formula:
=importxml("https://www.almanac.com/weather/history/zipcode/89118/2020-01-21","//*")
When using this formula with certain zip codes and at certain times, it returns the full text of the page, which I then query for the mean temperature and mean dew point. However, with the above example and in many other cases, it returns "Could not fetch URL" and #N/A in the cells.
So the issue is that it works a number of times, but by the fifth date or so it throws the "Could not fetch URL" error. It also fails as I change zip codes. My only guess, based on reading many threads, is that because I'm requesting the URL so often from Sheets, it is eventually being blocked. Is there any other cause anyone can see? I have to use the formula a few times to calculate relative humidity and other things, so I need it to work multiple times. Would there be a better way to get this working using a script? Or is there anything else that could cause this?
Here is the spreadsheet in question (just a work in progress, but the weather part is my issue): https://docs.google.com/spreadsheets/d/1WPyyMZjmMykQ5RH3FCRVqBHPSom9Vo0eaLlff-1z58w/edit?usp=sharing
The formulas that are throwing errors start at column N.
This Sheet contains many formulas using the above base formula, in case you want to see more examples of the problem.
Thanks!
After a great deal of trial and error, I found a solution to my own problem. I'm answering this in detail for anyone who needs to find weather info by zip code and date.
I switched to using importdata, transposed the result to speed up the query, and used a helper cell to hold the result for each date. I then have the other formulas search within the result in the helper cell, instead of calling import*** many times throughout. It is slow at times, but it works. This is the updated helper formula (where O3 contains the date in "YYYY-MM-DD" form, O5 contains the URL "https://www.almanac.com/weather/history/", and O4 contains the zip code):
=if(O3="",,query(transpose(IMPORTdata($O$5&$O$4&"/"&O3)),"select Col487 where Col487 contains 'Mean'"))
And then to get the temperature (where O3 contains the date and O8 contains the above formula):
=if(O3="",,iferror(text(mid(O$8,find("Mean Temperature",O$8)+53,4),"0.0° F"),"Loading..."))
And finally, to calculate the relative humidity:
=if(O3="",,iferror(if(now()=0,,exp(((17.625*243.04)*((mid(O$8,find("Mean Dew Point",O$8)+51,4)-32)/1.8-(mid(O$8,find("Mean Temperature",O$8)+53,4)-32)/1.8))/((243.04+(mid(O$8,find("Mean Temperature",O$8)+53,4)-32)/1.8)*(243.04+(mid(O$8,find("Mean Dew Point",O$8)+51,4)-32)/1.8)))),"Loading..."))
Most importantly, importdata has not once thrown the "Could not fetch URL" error, so it appears to be a more reliable fetch method for this particular site.
Hopefully this can help others who need to pull in historical weather data :)

In mclapply : scheduled cores 9 encountered errors in user code, all values of the jobs will be affected

I went through the existing Stack Overflow links regarding this error, but no solution given there works (and some of those questions don't have solutions either).
Here is the problem I am facing:
I run ARIMA models in parallel using mclapply from the parallel package. The sample data is split by key across cores, and the results are combined using do.call + rbind (the server I run the script on has 20 CPU cores, which is passed to the mc.cores argument).
Below is my mclapply code:
print('Before lapply')
data_sub <- do.call(rbind, mclapply(ds,predict_function,mc.cores=num_cores))
print('After lapply')
I get multiple sets of values like the ones below as the output of predict_function.
So basically, I get output like the above from multiple cores, to be sent to rbind. The code works perfectly for part of the data. Now I have another set of data, same as above with the same data type for each column, but different values in column 2.
The data type of each column is given in the column name above.
For the second case, I get the error below:
simpleError in charToDate(x): character string is not in a standard unambiguous format
Warning message:
In mclapply(ds, predict, mc.cores = num_cores) :
scheduled cores 9 encountered errors in user code, all values of the jobs will be affected
I don't see the print('After lapply') output for the second case, but it is visible for the first case.
I checked the date column in the above data frame; it is in Date format. When I ran unique(df$DATE), it returned all valid values in the format given above.
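For context, charToDate() is the helper that as.Date() calls when it is handed character input, so my hedged guess is that on some worker the date column arrives as character in a format as.Date() cannot parse without an explicit format, e.g. (purely illustrative values):
as.Date("21-01-2020")                       # Error: character string is not in a standard unambiguous format
as.Date("21-01-2020", format = "%d-%m-%Y")  # parses once the format is given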
What is the cause of the error here? Is it the simpleError above that prevents mclapply from returning values rbind can combine? Is the warning something I need to understand better?
Any advice would be greatly appreciated.
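A minimal debugging sketch, assuming ds, predict_function, and num_cores as above (not tested on the real data): have each worker return either its result or the error object, then inspect the failures before the rbind:
safe_predict <- function(x) {
  tryCatch(predict_function(x), error = function(e) e)
}

res  <- mclapply(ds, safe_predict, mc.cores = num_cores)
errs <- Filter(function(r) inherits(r, "error"), res)
if (length(errs) > 0) print(errs[[1]])  # should surface the charToDate() message and the offending call

data_sub <- do.call(rbind, Filter(function(r) !inherits(r, "error"), res))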

Some failures using RStudio + sparklyr in Watson Studio for data manipulation on a large data set

I got "Error in curl::curl_fetch_memory(url, handle = handle) : Empty reply from server" for some operations in RStudio (Watson Studio) when I tried to do data manipulation on Spark data frames.
Background:
The data is stored on IBM Cloud Object Storage (COS). The full data will be several 10 GB files, but currently I'm testing only on the first 10 GB subset.
The intended workflow is: in RStudio (Watson Studio), connect to Spark (free plan) using sparklyr, read the file as a Spark data frame through sparklyr::spark_read_csv(), then apply feature transformations to it (e.g., split one column into two, compute the difference between two columns, remove unwanted columns, filter out unwanted rows, etc.). After the preprocessing, save the cleaned data back to COS through sparklyr::spark_write_csv().
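Roughly, the code looks like the sketch below (a minimal illustration only: the connection, file paths, and column names are placeholders, not my actual setup):
library(sparklyr)
library(dplyr)

# placeholder connection; the real one points at the Watson Studio Spark service
sc <- spark_connect(master = "local")

raw <- spark_read_csv(sc, name = "raw", path = "path/to/subset1.csv")

clean <- raw %>%
  mutate(diff = col_a - col_b) %>%   # compute the difference between two columns
  filter(!is.na(diff)) %>%           # filter out unwanted rows
  select(-col_c)                     # remove unwanted columns

spark_write_csv(clean, path = "path/to/subset1_clean.csv")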
To work with Spark I added two Spark services to the project (it seems any Spark service under the account can be used by RStudio, so RStudio is not limited to the project?). I may need to use R notebooks for data exploration (to show the plots in a nice way), which is why I created the project. In previous testing I found that R notebooks and RStudio cannot use the same Spark service at the same time, so I created two Spark services: the first for R notebooks (call it spark-1) and the second for RStudio (call it spark-2).
As I personally prefer sparklyr (pre-installed in RStudio only) over SparkR (pre-installed in R notebooks only), for almost the whole week I was developing and testing code in RStudio using spark-2.
I'm not very familiar with Spark, and it currently behaves in ways I don't really understand. It would be very helpful if anyone could give suggestions on any of these issues:
1) failure to load data (occasionally)
It worked quite stably until yesterday, when I started to encounter issues loading data with exactly the same code. The error does not say anything useful; R just fails to fetch data (Error in curl::curl_fetch_memory(url, handle = handle) : Empty reply from server). What I have observed several times is that after I get this error, if I run the data-import code again (just one line of code), the data loads successfully.
(screenshot: issue 1)
2) failure to apply a (possibly) large number of transformations (always, regardless of data size)
To check whether the data is transformed correctly, I print out the first several rows of the variables of interest after each transformation step (most steps are order-independent, i.e., the order doesn't matter). I read a little about how sparklyr translates operations: basically, sparklyr doesn't actually apply the transformations to the data until you preview or print some of the transformed data. After one set of transformations, if I run some more and then print out the first several rows, I get an error (the same unhelpful error as in issue 1). I'm sure the code is right, because if I run those additional steps right after loading the data, I'm able to print and preview the first several rows.
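For reference, a minimal sketch of forcing evaluation after a batch of transformations, using the hypothetical clean table from the sketch above (I have not confirmed this changes anything on Watson Studio):
clean_cached <- compute(clean, name = "clean_cached")  # materialise the result as a temporary table in Spark
head(clean_cached)                                      # later previews read from that table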
3) failure to collect data (always for the first subset)
By collecting data, I mean pulling the data frame down to the local machine, here RStudio in Watson Studio. After applying the same set of transformations, I'm able to collect the cleaned version of a sample data set (originally 1,000 rows x 158 cols, about 1,000 rows x 90 cols after preprocessing), but it fails on the first 10 GB subset file (originally 25,000,000 rows x 158 cols, at most 50,000 rows x 90 cols after preprocessing). The space it takes up should not exceed 200 MB in my estimation, which means it should fit into either Spark RAM (1210 MB) or RStudio RAM. But it just fails (again with that unhelpful error).
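If it helps to narrow things down, a minimal sketch of collecting only a sample first (again using the hypothetical clean table; I have not confirmed this behaves any differently here):
local_sample <- clean %>%
  sdf_sample(fraction = 0.01, replacement = FALSE, seed = 42) %>%  # keep roughly 1% of rows
  collect()                                                        # pull that sample down into R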
4) failure to save out data (always, regardless of data size)
The same error happens every time I try to write the data back to COS. I suppose this has something to do with the transformations; perhaps something goes wrong when Spark receives too many transformation requests?
5) failure to initialize Spark (some kind of pattern found)
Starting this afternoon, I cannot initialize spark-2, which had been in use for about a week; I get the same unhelpful error message. However, I am able to connect to spark-1.
I checked the Spark instance information on IBM Cloud:
(screenshot: spark-2 instance)
(screenshot: spark-1 instance)
It's strange that spark-2 shows 67 active tasks, since my previous operations only returned error messages. Also, I'm not sure why the "input" figure in both Spark instances is so large.
Does anyone know what happened and why?
Thank you!

mapdist query limit reached in my R session. Any solution?

I have been trying to calculate driving distances from a specific zip code to all other zip codes in a state. The program was running fine until the following error occurred:
> mapdist('19111 USA', '19187 USA')
by using this function you are agreeing to the terms at :
http://code.google.com/apis/maps/documentation/distancematrix/
Information from URL : http://maps.googleapis.com/maps/api/distancematrix/json?origins=19111+USA&destinations=19187+USA&mode=driving&sensor=false
matching was not perfect, returning what was found.
Error in `*tmp*`[[c(1, 1)]] : no such index at level 1
When I run the function distQueryCheck(), it indicates that I still have over 2,000 queries left. Am I over the limit despite what distQueryCheck() reports? I suspect I may be, since I have been running this function over the last couple of days. Is there any way to increase the limit? How long do I have to wait until the limit resets back to 2,500?
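For reference, this is the kind of retry wrapper I could fall back on, assuming the failures are intermittent throttling rather than a hard limit (the pause and retry counts are arbitrary):
library(ggmap)

safe_mapdist <- function(from, to, retries = 3, pause = 2) {
  for (i in seq_len(retries)) {
    res <- tryCatch(mapdist(from, to, mode = "driving"), error = function(e) NULL)
    if (!is.null(res)) return(res)
    Sys.sleep(pause)  # back off before retrying, in case the API is throttling requests
  }
  NULL  # give up after the last attempt
}

safe_mapdist("19111 USA", "19187 USA")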
