Pull data using R from a specific weather station

I've looked through many pages on how to do this, and they essentially all have the same R code suggestions, which I've followed. Here's the R code I'm using for the specific weather station I'm interested in:
library(rnoaa)
options(noaakey="MyKeyHere")
ncdc(datasetid = 'GHCND', stationid = 'GHCND:USW00014739',
     datatypeid = 'dly-tmax-normal',
     startdate = '2017-05-15', enddate = '2018-01-04')
The error message I get when I run this is:
Warning message:
Sorry, no data found
I've gone directly to the NOAA site (https://www.ncdc.noaa.gov/cdo-web/search) and manually pulled the dataset there (using the "daily summaries" dataset, which is the same as GHCND in the API). There is in fact data there for my entire date range.
What am I missing?

The documentation says:
Note that NOAA NCDC API calls can take a long time depending on the call. The NOAA API doesn't perform well with very long timespans, and will time out and make you angry - beware.
Have you tried a smaller timespan?
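If the long date range is the problem, a minimal sketch (same station and datatype as above, just a narrower window) would be:
library(rnoaa)
options(noaakey = "MyKeyHere")

# same request as above, restricted to roughly one month
ncdc(datasetid = 'GHCND', stationid = 'GHCND:USW00014739',
     datatypeid = 'dly-tmax-normal',
     startdate = '2017-05-15', enddate = '2017-06-15')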


importxml could not fetch url after repeated attempts

I am trying to import the weather data for a number of dates, and one zip code, in Google Sheets. I am using importxml for this in the following base formula:
=importxml("https://www.almanac.com/weather/history/zipcode/89118/2020-01-21","//*")
When using this formula with certain zip codes and certain times, it returns the full text of the page which I then query for the mean temperature and mean dew point. However, with the above example and in many other cases, it returns "Could not fetch URL" and #N/A in the cells.
Thus, the issue is, it works a number of times, but by the fifth date or so, it throws the "Could not fetch URL" error. It also fails as I change zip codes. My only guess based on reading many threads is that because I'm requesting the URL so often from Sheets, it is eventually being blocked. Is there any other error anyone can see? I have to use the formula a few times to calculate relative humidity and other things, so I need it to work multiple times. Is it possible there would be a better way to get this working using a script? Or anything else that could cause this?
Here is the spreadsheet in question (just a work in progress, but the weather part is my issue): https://docs.google.com/spreadsheets/d/1WPyyMZjmMykQ5RH3FCRVqBHPSom9Vo0eaLlff-1z58w/edit?usp=sharing
The formulas that are throwing errors start at column N.
This Sheet contains many formulas using the above base formula, in case you want to see more examples of the problem.
Thanks!
After a great deal of trial and error, I found a solution to my own problem. I'm answering this in detail for anyone who needs to find weather info by zip code and date.
I switched to using importdata, transposed it to speed up the query, and used a helper cell to hold the result for each date. The other formulas then search within the result in the helper cell instead of calling import*** many times throughout. It is slow at times, but it works. This is the updated helper formula (where O3 contains the date in "YYYY-MM-DD" form, O5 contains the URL "https://www.almanac.com/weather/history/", and O4 contains the zip code):
=if(O3="",,query(transpose(IMPORTdata($O$5&$O$4&"/"&O3)),"select Col487 where Col487 contains 'Mean'"))
And then to get the temperature (where O3 contains the date and O8 contains the above formula):
=if(O3="",,iferror(text(mid(O$8,find("Mean Temperature",O$8)+53,4),"0.0° F"),"Loading..."))
And finally, to calculate the relative humidity:
=if(O3="",,iferror(if(now()=0,,exp(((17.625*243.04)*((mid(O$8,find("Mean Dew Point",O$8)+51,4)-32)/1.8-(mid(O$8,find("Mean Temperature",O$8)+53,4)-32)/1.8))/((243.04+(mid(O$8,find("Mean Temperature",O$8)+53,4)-32)/1.8)*(243.04+(mid(O$8,find("Mean Dew Point",O$8)+51,4)-32)/1.8)))),"Loading..."))
Most importantly, importdata has not once thrown the Could not fetch URL error, so it appears to be a better fetch method for this particular site.
Hopefully this can help others who need to pull in historical weather data :)

Quantmod getSymbols systematically returns missing value on Chinese stocks

Today (2019-02-27), I discovered that the stock prices of almost all Chinese companies listed in Shanghai/Shenzhen cannot be completely downloaded with the getSymbols function in quantmod; it always generates a warning message about missing data. However, neither US companies nor Chinese companies listed in the US were affected. As far as I can remember, this is the first time I have encountered this issue. I was wondering which part of the process went wrong: the Yahoo Finance database or getSymbols? The examples I tried are some of the biggest companies, so I assume their stock data are fully available.
> getSymbols("BABA") ### Alibaba listed in US, not affected
[1] "BABA"
> getSymbols("BILI")
[1] "BILI"
> getSymbols("0700.hk") ### Tencent listed in HK, affected.
[1] "0700.HK"
Warning message:
0700.hk contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
> getSymbols("601398.SS")
[1] "601398.SS"
Warning message:
601398.SS contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
> getSymbols("601318.SS")
[1] "601318.SS"
Warning message:
601318.SS contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
It is a Yahoo issue. If you look at the December 2011 data for Tencent on the historical-data tab of Yahoo, you can see that Yahoo doesn't have data for the 24th and the 31st of December, which are two of the three records with missing data. The other is 2008-08-22.
Note that the default request with getSymbols for Yahoo starts at 2007-01-01, so you could change that to a more recent date. But it is free data; you cannot expect the same quality as from other data providers, and this happens with other Yahoo tickers as well.
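For example (a sketch, not part of the original answer), the from argument lets you skip the older years where the missing records sit:
library(quantmod)

# request the Tencent HK listing starting from 2012 instead of the 2007 default
getSymbols("0700.HK", from = "2012-01-01")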
Yes, as mentioned above by phiver, the data quality of the Yahoo Finance database is not satisfying. Meanwhile, Google Finance stopped supporting quantmod in March 2018, so I was looking for another data source within the quantmod framework.
I found that the Tiingo database started to support quantmod as Google Finance exited:
https://www.r-bloggers.com/goodbye-google-hello-tiingo/
Go to the Tiingo website to create an account; then you will have your API key.
Use getSymbols.tiingo(ticker, api.key = "your key") to download data.
By the way, the tickers of Chinese stocks are a bit different in getSymbols.tiingo compared with getSymbols: you don't need to indicate the stock exchange (SS or SZ).
getSymbols("000001.SS")
getSymbols.tiingo("000001",api.key="xxxxx")
Also, you might need to store your api.key; I recommend creating a snippet, which is the most efficient way I have found so far. Further details can be found in my other answer on how to store an api.key in RStudio.
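As a rough sketch (the environment variable name TIINGO_API_KEY is my own choice, not something quantmod requires), one way to keep the key out of your scripts and register it once per session is:
library(quantmod)

# read the Tiingo key from an environment variable instead of hard-coding it
tiingo_key <- Sys.getenv("TIINGO_API_KEY")

# download a Chinese ticker through the Tiingo source
getSymbols("000001", src = "tiingo", api.key = tiingo_key)

# or register the key once so later calls can omit it
setDefaults(getSymbols.tiingo, api.key = tiingo_key)
getSymbols("000001", src = "tiingo")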

How can I generate a fully-formatted Excel document as a final output?

I have built a script that extracts data from numerous backend systems and I want the output to be formatted like this:
Which package is the best and/or easiest to do this with?
I am aware of the xlsx package, but am also aware that there are others available so would like to know which is best in terms of ease and/or simplicity to achieve my desired output.
A little more detail:
If I run the report across seven days, then the resulting data frame is 168 rows deep (1 row represents 1 hour, 168 hours per week). I want each date (00:00 - 23:00) to be broken out into day-long blocks, as per the image I have provided.
(Also note that I am in London, England, and so am currently in timezone UTC+1. This means that, right now, the hourly breakdown for each date runs from 01:00 to 00:00 the next day, because our backend systems run on UTC, and that is fine.)
At present, I copy and paste (transpose) the values across manually, but want to be able to fully automate the process so that I can run the script (function), and have the resulting output looking like the image.
This is what the current final output looks like:
Try the package openxlsx. It offers a lot of custom formatting for .xlsx documents and is actively developed / fairly responsive to GitHub issues. The vignettes on the CRAN website are particularly useful.
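A minimal openxlsx sketch (the data frame, styling, and file name here are placeholders, not from the original question) might look like this:
library(openxlsx)

# toy data: one day of hourly values
df <- data.frame(hour = 0:23, calls = sample(50:150, 24))

wb <- createWorkbook()
addWorksheet(wb, "Report")

# bold, shaded header row
headerStyle <- createStyle(textDecoration = "bold", fgFill = "#DDEBF7", border = "bottom")
writeData(wb, "Report", df, headerStyle = headerStyle)

# auto-size columns and write the file
setColWidths(wb, "Report", cols = 1:ncol(df), widths = "auto")
saveWorkbook(wb, "weekly_report.xlsx", overwrite = TRUE)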

Having trouble figuring out how to approach this exercise #R scraping #extracting web data

So, sometimes I need to get some data from the web and organize it into a data frame, and I waste a lot of time doing it manually. I've been trying to figure out how to optimize this process. I've tried some R scraping approaches but couldn't get them to work right, and I thought there might be an easier way to do this. Can anyone help me out?
Fictional exercise:
Here's a webpage with countries listed by continents: https://simple.wikipedia.org/wiki/List_of_countries_by_continents
Each country name is also a link that leads to another webpage (specific of each country, e.g. https://simple.wikipedia.org/wiki/Angola).
As a final result, I would like a data frame with the number of observations (rows) equal to the number of countries listed and 4 variables (columns): ID = country name, Continent = continent it belongs to, Language = official language (from each country's specific webpage), and Population = most recent population count (from each country's specific webpage).
Which steps should I follow in R in order to be able to reach to the final data frame?
This will probably get you most of the way. You'll want to play around with the different nodes and probably do some string manipulation (clean up) after you download what you need.
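Since the code block from that answer isn't included here, the following is only a rough rvest sketch of that kind of pipeline; the CSS selectors are assumptions about the page structure and will likely need adjusting:
library(rvest)

base <- "https://simple.wikipedia.org"
page <- read_html(paste0(base, "/wiki/List_of_countries_by_continents"))

# collect the country links from the continent lists (selector is a guess)
nodes     <- html_nodes(page, "ol li a")
countries <- html_text(nodes)
links     <- html_attr(nodes, "href")

# follow one country link and read its infobox table,
# where Language and Population usually live
get_info <- function(path) {
  country_page <- read_html(paste0(base, path))
  html_table(html_node(country_page, "table.infobox"), fill = TRUE)
}

head(countries)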

Use own depth/coordinate data with 'marmap' for bathymetry analyses

I have over 111,000 longitude and latitude points with a depth associated with each coordinate. The data are in the format Longitude, Latitude, Depth. When I load the data into R and convert them to class bathy using as.bathy, R seems to hang. When I check the format using is.bathy, R returns FALSE. Can 'marmap' handle such large datasets?
There could be several causes of this behavior; to help you diagnose the problem, could you send me:
- your session info (use sessionInfo() ) and tell me what kind of machine you have (e.g. RAM)?
- your code and the error message that you received?
- your data, or a subset of it, so I can try to re-create the problem here, on my machine?
cheers, eric
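For reference, a minimal sketch of the input as.bathy expects: a three-column object ordered longitude, latitude, depth, with the points on a regular grid (as I understand it, irregular soundings need to be gridded first). The toy data below are made up purely for illustration:
library(marmap)

# toy xyz data on a regular 0.1-degree grid: lon, lat, depth
xyz <- expand.grid(lon = seq(-10, -9, by = 0.1),
                   lat = seq(44, 45, by = 0.1))
xyz$depth <- rnorm(nrow(xyz), mean = -2000, sd = 200)

bat <- as.bathy(xyz)
is.bathy(bat)    # should now return TRUE
summary(bat)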
