I'm new to working with NetCDF files and I haven't been able to find the answer to my question elsewhere.
Daily precip data for year 2015 (from Gridmet): https://www.northwestknowledge.net/metdata/data/pr_2015.nc
My question: maps are displaying with latitude on the x axis and longitude on the y axis. How do I flip these axes? Furthermore, it also appears that the values for latitude are inverted (see the linked map below).
library(raster)
library(ncdf4)
nc15 <- nc_open("C:\\Users\\vsteen\\Desktop\\BorealToad\\Climate\\pr_2015.nc")
b <- brick("C:\\Users\\vsteen\\Desktop\\BorealToad\\Climate\\pr_2015.nc",varname="precipitation_amount")
plot(b[[3]])
print(nc15)
1 variables (excluding dimension variables):
float precipitation_amount[lat,lon,day]
units: mm
description: Daily Accumulated Precipitation
_FillValue: -32767
esri_pe_string: GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137.0,298.257223563]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]]
coordinates: lon lat
cell_methods: time: sum(interval: 24 hours)
missing_value: -32767
3 dimensions:
lon Size:1386
units: degrees_east
description: longitude
lat Size:585
units: degrees_north
description: latitude
day Size:365
units: days since 1900-01-01 00:00:00
calendar: gregorian
description: days since 1900-01-01
9 global attributes:
author: John Abatzoglou - University of Idaho, jabatzoglou@uidaho.edu
date: 20 September 2016
note1: The projection information for this file is: GCS WGS 1984.
note2: Citation: Abatzoglou, J.T., 2013, Development of gridded surface meteorological data for ecological applications and modeling, International Journal of Climatology, DOI: 10.1002/joc.3413
last_permanent_slice: 365
last_early_slice: 365
note3: Data in slices after last_permanent_slice (1-based) are considered provisional and subject to change with subsequent updates
note4: Data in slices after last_early_slice (1-based) are considered early and subject to change with subsequent updates
note5: Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)
str(nc15$dim)
List of 3
$ lon:List of 10
..$ name : chr "lon"
..$ len : int 1386
..$ unlim : logi FALSE
..$ group_index : int 1
..$ group_id : int 65536
..$ id : int 0
..$ dimvarid :List of 5
.. ..$ id : int 0
.. ..$ group_index: int 1
.. ..$ group_id : int 65536
.. ..$ list_index : num -1
.. ..$ isdimvar : logi TRUE
.. ..- attr(*, "class")= chr "ncid4"
..$ units : chr "degrees_east"
..$ vals : num [1:1386(1d)] -125 -125 -125 -125 -125 ...
..$ create_dimvar: logi TRUE
..- attr(*, "class")= chr "ncdim4"
$ lat:List of 10
..$ name : chr "lat"
..$ len : int 585
..$ unlim : logi FALSE
..$ group_index : int 1
..$ group_id : int 65536
..$ id : int 1
..$ dimvarid :List of 5
.. ..$ id : int 1
.. ..$ group_index: int 1
.. ..$ group_id : int 65536
.. ..$ list_index : num -1
.. ..$ isdimvar : logi TRUE
.. ..- attr(*, "class")= chr "ncid4"
..$ units : chr "degrees_north"
..$ vals : num [1:585(1d)] 49.4 49.4 49.3 49.3 49.2 ...
..$ create_dimvar: logi TRUE
..- attr(*, "class")= chr "ncdim4"
$ day:List of 11
..$ name : chr "day"
..$ len : int 365
..$ unlim : logi FALSE
..$ group_index : int 1
..$ group_id : int 65536
..$ id : int 2
..$ dimvarid :List of 5
.. ..$ id : int 2
.. ..$ group_index: int 1
.. ..$ group_id : int 65536
.. ..$ list_index : num -1
.. ..$ isdimvar : logi TRUE
.. ..- attr(*, "class")= chr "ncid4"
..$ units : chr "days since 1900-01-01 00:00:00"
..$ calendar : chr "gregorian"
..$ vals : num [1:365(1d)] 42003 42004 42005 42006 42007 ...
..$ create_dimvar: logi TRUE
..- attr(*, "class")= chr "ncdim4"
Thanks in advance for any help. It will be much appreciated!
Rotated U.S. precipitation map
You can use a combination of transpose and flip from the raster package:
s <- stack("pr_2015.nc", varname = "precipitation_amount")
s2 <- t(flip(s, direction = 'y'))
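As a quick check (a sketch, using the s2 object from above): the NetCDF stores the variable as [lat, lon, day], so raster reads latitude along x; the transpose-plus-flip puts longitude back on the x axis with latitude increasing northward.
plot(s2[[3]])  # same layer as before, now oriented correctly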
You can use the stars package to read the data directly from the NetCDF file without the "rotation" problem.
library(stars)
s2 <- read_ncdf("pr_2015.nc", var = "precipitation_amount")
Here's the plot of the first image in the time series, just to show how the images are read using read_ncdf (without the rotation).
# Choose the first image from the time series
s2 <- s2[, , , 1]
# Plot to see it
plot(s2)
Related
I'm trying to scrape https://www.yachtfocus.com/boten-te-koop.html#price=10000%7C30000&length=9.2%7C&super_cat_nl=Zeil. I'm using the R package rvest and its read_html function. I do this with the following code:
library('rvest')
#scrape yachtfocus
url <- "https://www.yachtfocus.com/boten-te-koop.html#price=10000|30000&length=9.2|&super_cat_nl=Zeil"
webpage <- read_html(url)
# Use a CSS selector to grab the element holding the result count
amount_results_html <- html_node(webpage, ".res_number")
# Extract its text
amount_results <- html_text(amount_results_html)
This does not return the value I expect given the filters in the URL; instead it returns the "unfiltered" value. The same happens when I use:
url <- "https://www.yachtfocus.com/boten-te-koop.html"
webpage <- read_html(url)
Can I "force" read_html to execute the filter parameters correctly?
The issue is that the site turns the anchor link into an asynchronous POST request, retrieves JSON and then dynamically builds the page.
You can use Developer Tools in the browser and reload the page to see that request in the network log:
If you right-click the highlighted item and choose "Copy as cURL" you can use the curlconverter package to automagically turn it into an httr function:
httr::POST(
  url = "https://www.yachtfocus.com/wp-content/themes/yachtfocus/search/",
  body = list(
    hash = "#price=10000%7C30000&length=9.2%7C&super_cat_nl=Zeil"
  ),
  encode = "form"
) -> res
dat <- jsonlite::fromJSON(httr::content(res, "text"))
This is what you get (you still need to parse some HTML):
str(dat)
## List of 8
## $ content : chr " <!-- <div class=\"list_part\"> <span class=\"list_icon\">lijst</span> <span class=\"foto\"><"| __truncated__
## $ top : chr " <h3 class=\"res_number\">317 <em>boten\tgevonden</em></h3> <p class=\"filters_list red_border\"> <span>prijs: "| __truncated__
## $ facets :List of 5
## ..$ categories_nl :List of 15
## .. ..$ 6u3son : int 292
## .. ..$ 1v3znnf: int 28
## .. ..$ 10opzfl: int 27
## .. ..$ 1mrn15c: int 23
## .. ..$ qn3nip : int 3
## .. ..$ 112l5mh: int 2
## .. ..$ 1xjlw46: int 1
## .. ..$ ci62ni : int 1
## .. ..$ 1x1x806: int 0
## .. ..$ 1s9bgxg: int 0
## .. ..$ 1i7r9mm: int 0
## .. ..$ qlys89 : int 0
## .. ..$ 1wwlclv: int 0
## .. ..$ 84qiky : int 0
## .. ..$ 3ahnnr : int 0
## ..$ material_facet_nl:List of 11
## .. ..$ 911206 : int 212
## .. ..$ c9twlr : int 53
## .. ..$ 1g88z3 : int 23
## .. ..$ fwfz2d : int 14
## .. ..$ gvrlp6 : int 5
## .. ..$ 10i8nq1: int 4
## .. ..$ h98ynr : int 4
## .. ..$ 1qt48ef: int 1
## .. ..$ 1oxq1p2: int 1
## .. ..$ 1kc1p0j: int 0
## .. ..$ 10dkoie: int 0
## ..$ audience_facet_nl:List of 13
## .. ..$ 71agu9 : int 69
## .. ..$ eb9lzb : int 63
## .. ..$ o40emg : int 55
## .. ..$ vd2cm9 : int 41
## .. ..$ tyffgj : int 24
## .. ..$ icsp53 : int 20
## .. ..$ aoqm1 : int 11
## .. ..$ 1puyni5: int 6
## .. ..$ 1eyfin8: int 5
## .. ..$ 1920ood: int 4
## .. ..$ dacmg4 : int 4
## .. ..$ e7bzw : int 3
## .. ..$ offcbq : int 3
## ..$ memberships :List of 7
## .. ..$ 137wtpl: int 185
## .. ..$ 17vn92y: int 166
## .. ..$ wkz6oe : int 109
## .. ..$ 1mdn78e: int 87
## .. ..$ aklw3a : int 27
## .. ..$ 1d9qtvu: int 20
## .. ..$ zqsmlf : int 3
## ..$ super_cat_nl :List of 3
## .. ..$ 2xl9ac : int 271
## .. ..$ glli8c : int 317
## .. ..$ 1key6o0: int 0
## $ filter :List of 3
## ..$ brand : chr "<label><input type=\"checkbox\" name=\"yfilter[brand][Dehler]\" data-solr=\"brand\" value=\"Dehler\" class=\"cu"| __truncated__
## ..$ brokers: chr "<label><input type=\"checkbox\" name=\"yfilter[brokers][Scheepsmakelaardij Goliath]\" data-solr=\"brokers\" val"| __truncated__
## ..$ land_nl: chr "<label><input type=\"checkbox\" name=\"yfilter[land_nl][Nederland]\" data-solr=\"land_nl\" value=\"Nederland\" "| __truncated__
## $ hash : chr "&price=10000|30000&length=9.2|&super_cat_nl=Zeil"
## $ ifield :List of 3
## ..$ y_price_min : chr "10000"
## ..$ y_price_max : chr "30000"
## ..$ y_length_min: chr "9.2"
## $ rcfield :List of 1
## ..$ y_glli8c: chr "1"
## $ session_id: chr "spghrfb8urv50u2kfg6bp3hejm"
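If all you need is the filtered result count from the original question, here is a minimal sketch of that parsing step, using the top field shown in the str() output above:
library(rvest)
# dat$top is an HTML fragment; parse it and query the same
# .res_number node the original code targeted
top <- xml2::read_html(dat$top)
html_text(html_node(top, ".res_number"))
# e.g. "317 boten gevonden"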
Note that this is a super common problem that's been covered many times on SO. Each situation requires finding the right URL in the XHR requests, but that's usually the only difference. If you're going to web scrape, you should spend some time reading up on how to do so (even 10 minutes of searching on SO would likely have solved this for you).
If you don't want to do this type of page introspection, you need to use RSelenium, splashr, or decapitated. Again, the use of those tools in the context of a problem like this is a well-covered topic on SO.
I am having trouble understanding the output of the google_distance function (googleway package). When using mapdist() in ggmap I would get the number of miles and the travel time in minutes and hours from point A to point B.
Now my output looks like this when I use google_distance. Can anyone help explain what each of the numbers refers to?
$rows
elements
1 791 km, 790588, 7 hours 28 mins, 26859, 7 hours 35 mins, 27286, OK
My code is as follows:
results <- google_distance(origins = list(c(26.19660, -98.23591)),
                           destinations = list(c(31.62327, -94.64276)),
                           mode = "driving", key = key, simplify = TRUE)
What you're seeing is the standard JSON response, simplified into a data.frame (as per the simplify = TRUE argument).
If you look one level deeper at your response, you'll get the description of those values:
results$rows$elements
# [[1]]
# distance.text distance.value duration.text duration.value duration_in_traffic.text duration_in_traffic.value
# 1 791 km 790588 7 hours 28 mins 26859 7 hours 28 mins 26906
where
distance.value is in metres
duration.value is in seconds
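So, to recover the miles and minutes/hours you used to get, you can convert the raw values yourself; a small sketch using the results object above:
el <- results$rows$elements[[1]]
el$distance$value / 1609.344   # metres  -> miles
el$duration$value / 60         # seconds -> minutes
el$duration$value / 3600       # seconds -> hours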
Similarly, looking at the structure of the result object, you'll see all the JSON elements
str(results)
# List of 4
# $ destination_addresses: chr "805 E College St, Nacogdoches, TX, USA"
# $ origin_addresses : chr "1400-1498 W Houston Ave, McAllen, TX 78501, USA"
# $ rows :'data.frame': 1 obs. of 1 variable:
# ..$ elements:List of 1
# .. ..$ :'data.frame': 1 obs. of 4 variables:
# .. .. ..$ distance :'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ text : chr "791 km"
# .. .. .. ..$ value: int 790588
# .. .. ..$ duration :'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ text : chr "7 hours 28 mins"
# .. .. .. ..$ value: int 26859
# .. .. ..$ duration_in_traffic:'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ text : chr "7 hours 28 mins"
# .. .. .. ..$ value: int 26906
# .. .. ..$ status : chr "OK"
# $ status : chr "OK"
Further Reference:
Google Developers Guide: Distance Matrix
I'm a bit stuck at the moment. I have been able to create a spatial points data frame, and from this I made an object of class ltraj, which I need for further analysis. But my x and y coordinates aren't in UTM, which might cause problems in later analysis.
Format:
x y date dx dy dist dt
1 -32.09245 116.0426 2015-08-07 00:22:00 -2.19e-05 0.0000194 2.925696e-05 1800 ...
Structure:
List of 1
$ :'data.frame': 109 obs. of 10 variables:
..$ x : num [1:109] -32.1 -32.1 -32.1 -32.1 -32.1 ...
..$ y : num [1:109] 116 116 116 116 116 ...
..$ date : POSIXct[1:109], format: "2015-08-07 00:22:00" "2015-08-07 00:52:00" "2015-08-07 01:22:00" "2015-08-07 01:52:00" ...
..$ dx : num [1:109] -2.19e-05 -5.73e-05 -5.15e-05 4.52e-05 -4.96e-05 ...
..$ dy : num [1:109] 1.94e-05 -3.21e-04 -2.61e-05 2.75e-04 -1.06e-04 ...
..$ dist : num [1:109] 2.93e-05 3.26e-04 5.77e-05 2.79e-04 1.17e-04 ...
..$ dt : num [1:109] 1800 1800 1800 3840 1800 3600 1740 1920 4680 900 ...
..$ R2n : num [1:109] 0.00 8.56e-10 9.71e-08 1.24e-07 1.00e-08 ...
..$ abs.angle: num [1:109] 2.42 -1.75 -2.67 1.41 -2.01 ...
..$ rel.angle: num [1:109] NA 2.119 -0.925 -2.203 2.865 ...
..- attr(*, "id")= chr "2172"
..- attr(*, "burst")= chr "2172"
..- attr(*, "infolocs")='data.frame': 109 obs. of 1 variable:
.. ..$ pkey: Factor w/ 109 levels "2172.2015-08-07 00:22:00",..: 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "class")= chr [1:2] "ltraj" "list"
- attr(*, "typeII")= logi TRUE
- attr(*, "regular")= logi FALSE
- attr(*, "proj4string")=Formal class 'CRS' [package "sp"] with 1 slot
.. ..@ projargs: chr NA
I was able to create a formal class SpatialPoints object from my lats and longs in UTM format, but that is a separate object now:
Structure:
Formal class 'SpatialPoints' [package "sp"] with 3 slots
..@ coords : num [1:109, 1:2] 409662 409664 409634 409631 409657 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
..@ bbox : num [1:2, 1:2] 406647 13536726 415659 13551107
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr "+init=epsg:32750 +proj=utm +zone=50 +south +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"
I basically want the x and y values in the ltraj to be in UTM. I guess I will have to do this before I create the ltraj object, or even before the spatial points data frame. Does anybody have advice on converting my lats and longs to UTM within the original data frame, or on making it into a spatial data frame and then binding it back to the data frame that contains "subject" and "date"?
Kind regards,
Sam Rycken
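One possible approach (a sketch, not from the original thread; column roles are assumed from the structure above): reproject the longitude/latitude pairs with sp::spTransform() before building the ltraj, so that dx, dy, and dist come out in metres.
library(sp)
library(adehabitatLT)
# df is assumed to hold the raw columns: x = latitude, y = longitude,
# and a POSIXct date column, as in the str() output above
coordinates(df) <- c("y", "x")   # longitude first, then latitude
proj4string(df) <- CRS("+proj=longlat +datum=WGS84")
# EPSG:32750 (UTM zone 50S) matches the proj4string of the
# SpatialPoints object shown above
df_utm <- spTransform(df, CRS("+init=epsg:32750"))
# Rebuild the trajectory from the projected coordinates
tr <- as.ltraj(xy = coordinates(df_utm), date = df_utm$date, id = "2172")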
I'm working on validating the goodness of hierarchical clustering using clValid. Below is my code. The clustering always results in one noisy cluster that contains about 70% of the elements, so I recursively cluster the elements of that noisy cluster.
intern <- clValid(primaryDataSource, 2:10, clMethods = c("Hierarchical"),
                  validation = "internal", maxitems = 2200)
summary(intern)
Output of summary(intern):
Clustering Methods:
hierarchical
Cluster sizes:
2 3 4 5 6 7 8 9 10
Validation Measures:
2 3 4 5 6 7 8 9 10
hierarchical Connectivity 3.8738 3.8738 8.2563 10.9452 16.0286 18.6452 20.6452 22.6452 24.6452
Dunn 4.0949 0.8810 0.6569 0.8694 0.8808 1.0416 1.0230 1.0262 1.3724
Silhouette 0.9592 0.9879 0.9785 0.9751 0.9727 0.9729 0.9727 0.9726 0.9725
Optimal Scores:
Score Method Clusters
Connectivity 3.8738 hierarchical 2
Dunn 4.0949 hierarchical 2
Silhouette 0.9879 hierarchical 3
At each iteration I have to execute clValid() and select the number of clusters that gives me the highest Silhouette value (in the above example it's 3). I'm trying to automate this recursive clustering approach, so I'm looking to pick the number of clusters with the highest Silhouette value programmatically. Can you please help me extract that piece of information? Thank you.
P.S.: I tried converting the results into a data frame or a table, but it didn't work.
Update: After using str()
> str(intern)
Formal class 'clValid' [package "clValid"] with 14 slots
..@ clusterObjs:List of 1
.. ..$ hierarchical:List of 7
.. .. ..$ merge : int [1:2173, 1:2] -1673 -714 -1121 -1688 -1876 -1123 -1689 -1228 -429 -535 ...
.. .. ..$ height : num [1:2173] 0 0.001 0.001 0.001 0.001 ...
.. .. ..$ order : int [1:2174] 2165 2166 1950 1951 1954 1955 1577 1565 1564 1576 ...
.. .. ..$ labels : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
.. .. ..$ method : chr "average"
.. .. ..$ call : language hclust(d = Dist, method = method)
.. .. ..$ dist.method: chr "euclidean"
.. .. ..- attr(*, "class")= chr "hclust"
..@ measures : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
.. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
.. .. ..$ : chr "hierarchical"
..@ measNames : chr [1:3] "Connectivity" "Dunn" "Silhouette"
..@ clMethods : chr "hierarchical"
..@ labels : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
..@ nClust : num [1:9] 2 3 4 5 6 7 8 9 10
..@ validation : chr "internal"
..@ metric : chr "euclidean"
..@ method : chr "average"
..@ neighbSize : num 10
..@ annotation : NULL
..@ GOcategory : chr "all"
..@ goTermFreq : num 0.05
..@ call : language clValid(obj = primaryDataSource, nClust = 2:10, clMethods = c("Hierarchical"), validation = "internal", maxitems = 2200)
I guess the important section is
..@ measures : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
.. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
.. .. ..$ : chr "hierarchical"
When I executed intern@measures I got the result below.
2 3 4 5 6 7 8 9
Connectivity 3.8738095 3.8738095 8.2563492 10.9452381 16.0285714 18.6452381 20.6452381 22.645238
Dunn 4.0948837 0.8810494 0.6568857 0.8694067 0.8808228 1.0415614 1.0230197 1.026192
Silhouette 0.9591803 0.9879153 0.9784684 0.9751393 0.9727454 0.9728736 0.9727153 0.972622
10
Connectivity 24.6452381
Dunn 1.3724494
Silhouette 0.9725379
I'm able to get the max and access individual items based on the index. I want to get the maximum value for Silhouette.
intern@measures[1]
max(intern@measures)
Some additional explanation: when str() shows @ signs, it indicates that the object you are inspecting is an S4 object with slots. I am not familiar with clValid, but a quick look at the source code shows that clValid is an S4 class.
You can access those slots using object@slotname. These slots can hold anything.
Looking at the print function for clValid, it seems you can access the measures using the convenience function measures(object). The remaining source code for clValid contains utility functions that may be of use to you; check optimalScores().
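For the concrete extraction the question asks about, here is a sketch using the intern object from above (the measure, cluster, and method names come from the str() output):
# Silhouette values across the candidate cluster counts
# (measures(intern) is the accessor equivalent of intern@measures)
sil <- intern@measures["Silhouette", , "hierarchical"]
# Number of clusters with the highest Silhouette value
best_k <- as.numeric(names(which.max(sil)))
best_k
# [1] 3
# optimalScores() reports the per-measure optimum directly
optimalScores(intern)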
This question already has an answer here: mgcv: How to set number and / or locations of knots for splines (1 answer). Closed 5 years ago.
I am running a GAM across many samples and am extracting coefficients/t-values/r-squared from the results as shown below. For background, I am using a natural spline, so the regular lm() works fine here, and perhaps that is why this method works.
tvalsm93exf=ldply(fitsm93exf, function(x) as.data.frame(t(coef(summary(x))[,'t value', drop=FALSE])))
r2m93exf=ldply(fitsm93exf, function(x) as.data.frame(t(summary(x))[,'r.squared', drop=FALSE]))
I would also like to extract the knot locations for each sample set (df = 4 and no intercept, so three internal knots plus the boundaries). I have tried several variations of the commands above but haven't been able to index into this. The regular way to do this is below, so I was attempting to put this into the form above. But I am not certain whether the summary function contains these values, or whether there is another result I should be using instead.
attr(terms(fits),"predvars")
http://www.inside-r.org/r-doc/splines/ns
Note: This question is related to the question below, if that helps, though its solution did not help me solve my problem:
Extract estimates of GAM
The knots are fixed at the time the ns function is called in the examples on the help page you linked to, so you could have extracted the knots without going into the model object. But ... you have not provided the code for the GAM model creation, so we can only speculate about what you might have done. Just because the word "spline" appears in both the ?ns help page and the mgcv documentation does not mean they are the same thing. The model in the other page you linked to had two "smooth" terms constructed with the s function.
.... + s(time,bs="cr",k=200) + s(tmpd,bs="cr")
The result of that gam call had a list node named "smooth" and the first one looked like this when viewed with str():
str(ap1$smooth)
List of 2
$ :List of 22
..$ term : chr "time"
..$ bs.dim : num 200
..$ fixed : logi FALSE
..$ dim : int 1
..$ p.order : logi NA
..$ by : chr "NA"
..$ label : chr "s(time)"
..$ xt : NULL
..$ id : NULL
..$ sp : Named num -1
.. ..- attr(*, "names")= chr "s(time)"
..$ S :List of 1
.. ..$ : num [1:199, 1:199] 5.6 -5.475 2.609 -0.577 0.275 ...
..$ rank : num 198
..$ null.space.dim: num 1
..$ df : num 199
..$ xp : Named num [1:200] -2556 -2527 -2502 -2476 -2451 ...
.. ..- attr(*, "names")= chr [1:200] "0.0000000%" "0.5025126%" "1.0050251%" "1.5075377%" ...
..$ F : num [1:40000] 0 0 0 0 0 0 0 0 0 0 ...
..$ plot.me : logi TRUE
..$ side.constrain: logi TRUE
..$ S.scale : num 9.56e-05
..$ vn : chr "time"
..$ first.para : num 5
..$ last.para : num 203
..- attr(*, "class")= chr [1:2] "cr.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..$ qr : num [1:200, 1] -0.0709 0.0817 0.0709 0.0688 0.0724 ...
.. ..$ rank : int 1
.. ..$ qraux: num 1.03
.. ..$ pivot: int 1
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
So the smooth was evaluated at each of 200 points (the xp values above) and a polynomial function was fit to the data on that grid. If you had forced the knots to be at three interior locations, they would just be at the extremes and at evenly spaced locations between the extremes.
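To make the ns() route concrete, here is a sketch with a built-in dataset (the data and formula are illustrative, not from the question); for the mgcv smooth above, the knot locations are instead the xp values in the str() output.
library(splines)
# Illustrative fit with df = 4 and no intercept: three internal knots
fit <- lm(weight ~ ns(height, df = 4), data = women)
# Route 1: the knots are attributes of the ns() basis itself
basis <- ns(women$height, df = 4)
attr(basis, "knots")            # three internal knots
attr(basis, "Boundary.knots")   # the two boundary knots
# Route 2: recover them from the fitted model via "predvars";
# the third element is the recorded ns() call with the knots filled in
pv <- attr(terms(fit), "predvars")
pv[[3]]$knots
pv[[3]]$Boundary.knots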