Check for data availability before calling ee.ImageCollection.filterDate - google-earth-engine

Hey fellow EE developers!
I am currently working with the Dynamic World dataset and am analyzing different locations over a set time range (2016-2022).
However, at some locations data seems to be missing, so EE returns:
Image.eq: If one image has no bands, the other must also have no bands. Got 0 and 1.
Minimal reproducible example (js, requires Google EE):
var startDate = "2017-01-01";
var endDate = "2018-01-01";
var geometry = ee.Geometry.Point([80.67174096,29.92240786]);
var dw = ee.ImageCollection("GOOGLE/DYNAMICWORLD/V1").filterDate(startDate,endDate).filterBounds(geometry);
print(dw.size());
I thought about using a try/except, but that didn't work. Does anyone know how I can check for data availability before calling the ImageCollection.filterDate() method?
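Roughly the kind of guard I have in mind (just a sketch, not something I'm committed to; median() stands in for whatever per-location analysis comes next):
// Filter first, then test the collection size on the client before doing any
// per-image work. getInfo() is a blocking round trip to the server.
var dw = ee.ImageCollection("GOOGLE/DYNAMICWORLD/V1")
    .filterDate(startDate, endDate)
    .filterBounds(geometry);

if (dw.size().getInfo() > 0) {
  var composite = dw.median();  // placeholder for the actual analysis
  print(composite);
} else {
  print("No Dynamic World images for this location and date range.");
}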

Related

AzureML Dataset.File.from_files creation extremely slow even with 4 files

I have a few thousand video files in my Blob Storage, which I have set up as a datastore.
This blob storage receives new files every night, and I need to split the data and register each split as a new version of an AzureML Dataset.
This is how I do the data split, simply getting the blob paths and splitting them.
from pathlib import Path
from azure.storage.blob import ContainerClient

container_client = ContainerClient.from_connection_string(AZ_CONN_STR, 'keymoments-clips')
blobs = container_client.list_blobs('soccer')
blobs = map(lambda x: Path(x['name']), blobs)
# get_train_test and split_data are my own helpers
train_set, test_set = get_train_test(blobs, 0.75, 3, class_subset={'goal', 'hitWoodwork', 'penalty', 'redCard', 'contentiousRefereeDecision'})
valid_set, test_set = split_data(test_set, 0.5, 3)
train_set, test_set, and valid_set are just n×2 numpy arrays containing the blob storage path and class.
Here is when I try to create a new version of my Dataset:
datastore = Datastore.get(workspace, 'clips_datastore')
dataset_train = Dataset.File.from_files([(datastore, b) for b, _ in train_set[:4]], validate=True, partition_format='**/{class_label}/*.mp4')
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)
How is it possible that the Dataset creation seems to hang for an indefinite time even with only 4 paths?
I saw in the doc that providing a list of Tuple[datastore, path] is perfectly fine. Do you know why?
Thanks
Do you have your Azure Machine Learning Workspace and your Azure Storage Account in different Azure Regions? If that's true, latency may be a contributing factor with validate=True.
Another possibility may be slowness in the way datastore paths are resolved. This is an area where improvements are being worked on.
As an experiment, could you try creating the dataset using a url instead of datastore? Let us know if that makes a difference to performance, and whether it can unblock your current issue in the short term.
Something like this:
dataset_train = Dataset.File.from_files(path="https://bloburl/**/*.mp4?accesstoken", validate=True, partition_format='**/{class_label}/*.mp4')
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)
I'd be interested to see what happens if you run the dataset creation code twice in the same notebook/script. Is it faster the second time? I ask because it might be an issue with the .NET Core runtime startup (which would only happen the first time you run the code).
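If it helps, here's a rough sketch of that experiment (it just reuses the names from your snippet and times two consecutive calls):
import time

# If the first call pays a one-off runtime startup cost, the second call
# should come back noticeably faster.
for attempt in (1, 2):
    t0 = time.time()
    ds = Dataset.File.from_files([(datastore, b) for b, _ in train_set[:4]],
                                 validate=True,
                                 partition_format='**/{class_label}/*.mp4')
    print(f"attempt {attempt}: took {time.time() - t0:.1f}s")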
EDIT 9/16/20
While it doesn't seem to make sense that .NET Core is invoked when no data is being moved, I suspect it is the validate=True parameter that requires all of the data to be inspected (which can be computationally expensive). I'd be interested to see what happens if that parameter is set to False.
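For example, the same creation call with validation turned off (your snippet unchanged apart from the flag):
dataset_train = Dataset.File.from_files([(datastore, b) for b, _ in train_set[:4]],
                                        validate=False,  # skip inspecting every referenced blob up front
                                        partition_format='**/{class_label}/*.mp4')
dataset_train.register(workspace, 'train_video_clips', create_new_version=True)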

Error while using "EpiEstim" and "ggplot2" libraries

First of all, I must say I'm a complete noob in R, so I apologize in advance for asking for help with such a simple task. My task is to plot a graph of COVID-19 cases for a certain period using data from a CSV file. Unfortunately, at the moment I cannot contact the person from the World Health Organization who provided the data and the script. But I was left with an error that I cannot fix, either myself or with the help of Google.
script.R
library(EpiEstim)
library(ggplot2)
COVID<-read.csv("dataset.csv")
res_parametric_si<-estimate_R(COVID$I,method="parametric_si",config=make_config(list(mean_si=4,std_si=3)))
plot(res_parametric_si)
dataset.csv
Date,Suspected per day,Total suspected,Discarded/pending,Confirmed per day,Total confirmed,Deaths per day,Deaths Total,Case fatality rate,Daily confirmed,Recovered per day,Recovered total,Active cases,Tested with PCR,# of PCR tests total,average tests/ 7 days,Inf HCW,Inf HCW/d,Vent HCW,Susp per day
01-Jul-20,1239,91172,45285,889,45887,12,1185,2.58%,889,505,20053,24649,11109,676684,10073,6828,63,,1239
02-Jul-20,1249,92421,45658,876,46763,27,1212,2.59%,876,505,20558,24993,13167,689851,9966,6874,46,,1249
03-Jul-20,1288,93709,46032,914,47677,15,1227,2.57%,914,597,21155,25295,11825,701676,9915.7,6937,63,,1288
04-Jul-20,926,94635,46135,823,48500,22,1249,2.58%,823,221,21376,25875,9934,711610,9957,6990,53,,926
05-Jul-20,680,95315,46272,543,49043,13,1262,2.57%,543,327,21703,26078,6696,718306,9963.7,7030,40,,680
06-Jul-20,871,96186,46579,564,49607,21,1283,2.59%,564,490,22193,26131,9343,727649,10303.9,7046,16,,871
07-Jul-20,1170,97356,46942,807,50414,23,1306,2.59%,807,926,23119,25989,13568,741217,10806,7092,46,,1170
Error
Error in process_I(incid) (script.R#4): incid must be a vector or a dataframe with either i) a column called 'I', or ii) 2 columns called 'local' and 'imported'.
The immediate problem is that the dataset has no column named I (after import the incidence column is Daily.confirmed, since read.csv replaces spaces in headers with dots), so COVID$I does not select the incidence data at all. Beyond that, the example data only covers 7 data points, while the default configuration assumes it can window over more than 7 days. What worked for me was the following code (working in the sense that it does not throw an error):
config <- make_config(incid = COVID$Daily.confirmed,
                      method = "parametric_si",
                      list(mean_si = 4, std_si = 3, t_start = c(2, 3), t_end = c(6, 7)))
res_parametric_si <- estimate_R(COVID$Daily.confirmed, method = "parametric_si", config = config)
plot(res_parametric_si)

ggmap and spatial data plotting issue

I am trying to update some old code that I inherited from before the Google API days to do a fairly simple (I think) plot.
By the time I get to the plot, my data consists of 50 triplets of latitude, longitude, and $K amount of investment at that location.
head(investData)
amount latitude longitude
1 1404 42.45909 -71.27556
2 1 42.29076 -71.35368
3 25 42.34700 -71.10215
4 1 40.04492 -74.58916
5 15 43.16431 -75.51130
At this point I use the following:
register_google(key = "###myKey###") #my actual key here
USAmap <- qmap("USA",zoom=4)
USAmap + geom_point(data=investData, aes(x=investData$longitude, y=investData$latitude,size=investData$amount))
I've been fighting all day with establishing accounts and enabling APIs with Google, so it's entirely possible I've simply failed to enable something I need. I have the Geocoding, Geolocation, and Maps Static APIs enabled.
I get the following output at the console
Source : https://maps.googleapis.com/maps/api/staticmap?center=USA&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=USA&key=xxx
But I get no plot.
If I simply run
qmap("USA", zoom=4)
I get the map I expect. But when I try to overlay the investment data I get zilch. I'm told by the folks who handed this to me that it worked in 2017...
Any idea where I'm going wrong?
If you are running your script via the source function or with the run command (from inside RStudio) you must explicitly call the print function on your ggplot commands. For example:
print(USAmap + geom_point(data=investData, aes(x=longitude, y=latitude,size=amount)))
As Camille mentioned, there's no need for the $ inside aes().

NetworkMapCache is empty in MockNetwork tests

I'm writing some lightweight flow tests with everything mocked, and I ran into an error: on all nodes the NetworkMapService contains only the node itself. The identity service, on the other hand, contains all 3 nodes participating in the test.
net = MockNetwork()
issuer = net.createNode(legalName = CHARLIE.name)
alice = net.createNode(legalName = ALICE.name)
bob = net.createNode(legalName = BOB.name)
issuer.registerInitiatedFlow(IssueClaimFlow.Issuer::class.java)
alice.registerInitiatedFlow(VerifyClaimFlow.Prover::class.java)
MockServices.makeTestDatabaseAndMockServices(createIdentityService = { InMemoryIdentityService(listOf(ALICE_IDENTITY, BOB_IDENTITY, CHARLIE_IDENTITY), emptySet(), DEV_TRUST_ROOT) } )
net.registerIdentities()
net.runNetwork()
In this case the flow goes well until the first sendAndReceive() call. There I get:
12:28:12.832 [Mock network] WARN net.corda.flow.[8f685c46-9ab6-4d64-b3f2-6b7476813c3b] - Terminated by unexpected exception
java.lang.IllegalArgumentException: Don't know about party C=ES,L=Madrid,O=Alice Corp
The funny thing is that the test still finishes green (no useful work done, though). But that's probably a topic for another question.
I can work around it by setting up the cache manually, like this:
alice.services.networkMapCache.addNode(issuer.info)
bob.services.networkMapCache.addNode(alice.info)
But is this the correct way to go? I don't see anything like this in the samples.
If you look at the definition of MockNetwork.createNode, the networkMapAddress defaults to null.
Instead, you should use MockNetwork.createSomeNodes, which creates a network map node for you, then sets that node as the network map for every subsequent node it creates.
Here's an example from the CorDapp Example:
network = MockNetwork()
val nodes = network.createSomeNodes(2)
a = nodes.partyNodes[0]
b = nodes.partyNodes[1]
You can see the full example here: https://github.com/corda/cordapp-example/blob/release-V2/kotlin-source/src/test/kotlin/com/example/flow/IOUFlowTests.kt.

Unable to build inline segments in RSiteCatalyst package in R

I am trying to build an inline segment to filter pages (e.g. to separate the pages for blogs and games) using the function BuildClassificationValueSegment() to get data from the Adobe Analytics API.
I have tried something like:
report.data.visits <- QueueTrended(reportsuite.id, date.from, date.to, metrics, elements,
                                   segment.inline = BuildClassificationValueSegment("evar2", "blog", "OR"))
I got an error like:
Error in ApiRequest(body = report.description, func.name = "Report.Validate") :
ERROR: segment_invalid - Segment "evar2" not valid for this company
In addition: Warning message:
In if (segment.inline != "") { :
the condition has length > 1 and only the first element will be used
Please help. Thanks in advance...
I recommend declaring the InlineSegment in advance and storing it in a variable, then passing it to the QueueTrended function.
I've been using the following syntax to generate an inline segment:
InlineSegment <- list(container = list(type = unbox("hits"),
                                       rules = data.frame(
                                         name = c("Page Name(eVar48)"),
                                         element = c("evar48"),
                                         operator = c("equals"),
                                         value = c("value1", "value2")
                                       )))
You can change the name and element arguments in order to personalize the query.
The next step is to pass the InlineSegment to the QueueRanked function:
Report <- as.data.frame(QueueRanked("reportsuite",
                                    date.from = dateStart,
                                    date.to = dateEnd,
                                    metrics = c("pageviews"),
                                    elements = c("element"),
                                    segment.inline = InlineSegment,
                                    max.attempts = 500))
I borrowed that syntax from this thread some time ago: https://github.com/randyzwitch/RSiteCatalyst/issues/129
Please note that there might be easier ways to obtain this kind of report without inline segmentation. Maybe you can use the selected argument of the QueueRanked function to narrow down the scope of the report, as in the sketch below.
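A rough sketch of that alternative (untested, and assuming the element values you want really are "blog" and "games" on evar2):
library(RSiteCatalyst)

# Restrict the report to specific element values with `selected`,
# instead of building an inline segment at all.
Report <- QueueRanked("reportsuite",
                      date.from = dateStart,
                      date.to = dateEnd,
                      metrics = c("pageviews"),
                      elements = c("evar2"),
                      selected = c("blog", "games"))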
Also, I'm purposefully avoiding the BuildClassificationValueSegment function as I found it a bit difficult to understand.
Hope this workaround helps...
