ggmap and spatial data plotting issue - r

I am trying to update some old code that I inherited from before the Google API days to do a fairly simple (I think) plot.
By the time I get to the plot, my data consists of 50 triplets of latitude, longitude, and $K amount of investment at that location.
head(investData)
amount latitude longitude
1 1404 42.45909 -71.27556
2 1 42.29076 -71.35368
3 25 42.34700 -71.10215
4 1 40.04492 -74.58916
5 15 43.16431 -75.51130
At this point I use the following:
register_google(key = "###myKey###") #my actual key here
USAmap <- qmap("USA",zoom=4)
USAmap + geom_point(data=investData, aes(x=investData$longitude, y=investData$latitude,size=investData$amount))
I've been fighting all day with establishing accounts and enabling APIs with Google, so it's entirely possible I've simply failed to enable something I need. I have the Geocoding, Geolocation, and Maps Static APIs enabled.
I get the following output at the console:
Source : https://maps.googleapis.com/maps/api/staticmap?center=USA&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx
Source : https://maps.googleapis.com/maps/api/geocode/json?address=USA&key=xxx
But I get no plot.
If I simply run
qmap("USA", zoom=4)
I get the map I expect. But when I try to overlay the investment data I get zilch. I'm told by the folks who handed this to me that it worked in 2017...
Any idea where I'm going wrong?

If you are running your script via the source() function or with the Run command (from inside RStudio), you must explicitly call the print() function on your ggplot commands. For example:
print(USAmap + geom_point(data=investData, aes(x=longitude, y=latitude,size=amount)))
As Camille mentioned, there's no need for the $ inside aes().
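For reference, a minimal sketch of how the whole corrected script might look when run via source() (the key is a placeholder, investData is the data frame from the question, and p is just an illustrative name for the assembled plot):
library(ggmap)

register_google(key = "###myKey###")   # your actual key here
USAmap <- qmap("USA", zoom = 4)

p <- USAmap +
  geom_point(data = investData,
             aes(x = longitude, y = latitude, size = amount))
print(p)   # the explicit print() is what makes the plot appear when sourced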

Related

Error while using "EpiEstim" and "ggplot2" libraries

First of all, I must say I'm a complete noob in R, so I apologize in advance for asking for help with such a simple task. My task is to produce a graph of COVID-19 cases for a certain period using data from a CSV file. Unfortunately, at the moment I cannot contact the person from the World Health Organization who provided the data and the launch script, and I was left with an error that I cannot fix either by myself or with the help of Google.
script.R
library(EpiEstim)
library(ggplot2)
COVID<-read.csv("dataset.csv")
res_parametric_si<-estimate_R(COVID$I,method="parametric_si",config=make_config(list(mean_si=4,std_si=3)))
plot(res_parametric_si)
dataset.csv
Date,Suspected per day,Total suspected,Discarded/pending,Confirmed per day,Total confirmed,Deaths per day,Deaths Total,Case fatality rate,Daily confirmed,Recovered per day,Recovered total,Active cases,Tested with PCR,# of PCR tests total,average tests/ 7 days,Inf HCW,Inf HCW/d,Vent HCW,Susp per day
01-Jul-20,1239,91172,45285,889,45887,12,1185,2.58%,889,505,20053,24649,11109,676684,10073,6828,63,,1239
02-Jul-20,1249,92421,45658,876,46763,27,1212,2.59%,876,505,20558,24993,13167,689851,9966,6874,46,,1249
03-Jul-20,1288,93709,46032,914,47677,15,1227,2.57%,914,597,21155,25295,11825,701676,9915.7,6937,63,,1288
04-Jul-20,926,94635,46135,823,48500,22,1249,2.58%,823,221,21376,25875,9934,711610,9957,6990,53,,926
05-Jul-20,680,95315,46272,543,49043,13,1262,2.57%,543,327,21703,26078,6696,718306,9963.7,7030,40,,680
06-Jul-20,871,96186,46579,564,49607,21,1283,2.59%,564,490,22193,26131,9343,727649,10303.9,7046,16,,871
07-Jul-20,1170,97356,46942,807,50414,23,1306,2.59%,807,926,23119,25989,13568,741217,10806,7092,46,,1170
Error
Error in process_I(incid) (script.R#4): incid must be a vector or a dataframe with either i) a column called 'I', or ii) 2 columns called 'local' and 'imported'.
For the example data the issue seems to be that it only covers 7 data points, while the configurator assumes it can window over more than 7 days. What worked for me was the following code (working in the sense that it does not throw an error):
config <- make_config(incid = COVID$Daily.confirmed,
method="parametric_si",
list(mean_si=4,std_si=3, t_start = c(2,3),t_end = c(6,7)))
res_parametric_si<-estimate_R(COVID$Daily.confirmed,method="parametric_si",config=config)
plot(res_parametric_si)
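If your real dataset covers more than these 7 days, you can presumably drop the hard-coded t_start/t_end and build weekly sliding windows instead. A rough sketch (not from the original script; n, t_start, t_end and res are just illustrative names, and it assumes Daily.confirmed has well over 7 observations):
n <- length(COVID$Daily.confirmed)
t_start <- seq(2, n - 6)              # estimation windows start on day 2
t_end <- t_start + 6                  # each window spans 7 days
config <- make_config(list(mean_si = 4, std_si = 3,
                           t_start = t_start, t_end = t_end))
res <- estimate_R(COVID$Daily.confirmed, method = "parametric_si", config = config)
plot(res)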

OpenWeatherMap not returning metric despite having included necessary parameter

I was working on some code that uses the OpenWeatherMap API. I need my data to be in Celsius, so I included the necessary &units=metric parameter as described on their site.
api.openweathermap.org/data/2.5/weather?id=XXX&appid=XXX&units=metric
However, this returns:
{"error": "404"}
Furthermore, I tried without including the &units=metric parameter and everything works just fine.
So what exactly is the problem?
It looks like it is not allowed to use the units parameter in this type of API call.
You need to use it in calls like this:
samples.openweathermap.org/data/2.5/find?q=London&units=metric&appid=xxx
If you change /find to /weather you'll get a 404, and the same if you change q=London to id=524901.
So I think it's better to get the temperature in Fahrenheit and then use the formula to convert it to Celsius:
(X °F - 32) × 5/9 = Y °C
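If you do convert by hand, a tiny R helper for the formula above might look like this (fahrenheit_to_celsius is just a made-up name for illustration):
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9   # the conversion formula above
}
fahrenheit_to_celsius(68)   # returns 20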

taRifx.geo - Creating multiple georoutes

I'm not sure what I'm doing wrong. I'm trying to find the drive time between many zip codes. I am able to do so with ggmap; however, the API load limit is 2,500. I need to run this for more and heard that Bing can help. I tried using taRifx.geo, which works for a single zip-to-zip drive, but I need each zip combination individually listed. Let's get to the examples:
Google:
require(ggmap)
from <- as.character(c("27205","48212"))
to <- as.character(c("54952","14450"))
driveTimes <- mapdist(from, to, mode='driving')
print(driveTimes)
from to m km miles seconds minutes hours
1 27205 54952 1533077 1533.077 952.654 52716 878.6000 14.643333
2 48212 14450 555906 555.906 345.440 19700 328.3333 5.472222
^^ Notice the two drive times
taRifx.geo
When I use taRifx.geo, it looks at each zip code as a waypoint along a trip.
require("taRifx.geo")
from <- as.character(c("27205","48212"))
to <- as.character(c("54952","14450"))
combined <- data.frame(from, to)
combined[] <- lapply(combined, as.character)
for (i in 1:nrow(combined)) {
driveTimes <- georoute( c(combined[i,1], combined[i,2]),
verbose=TRUE,
returntype="time",
service="bing" )
}
print(driveTimes)
time
1 21040
^^ Here I need it to print two rows, one for each zip-to-zip drive time.
Thanks in advance for your help! I've tried several ways of defining the from/to, but I might be suffering from lack of sleep and can't see the problem right in front of me. If there is a better solution, please do tell. :)
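One way to keep every pair's result, instead of overwriting driveTimes on each pass of the loop, would be to collect the results in a list, roughly like this (just a sketch; results is a made-up name, and it assumes georoute() returns a single value per call):
results <- vector("list", nrow(combined))
for (i in 1:nrow(combined)) {
  results[[i]] <- georoute(c(combined[i, 1], combined[i, 2]),
                           verbose = TRUE,
                           returntype = "time",
                           service = "bing")
}
# one row per zip-to-zip pair
driveTimes <- data.frame(from = combined$from,
                         to = combined$to,
                         time = unlist(results))
print(driveTimes)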

Titan Graph Queries taking too long to execute

I have a problem with the execution speed of Titan queries.
To be more specific:
I created a properties file for my graph using BerkeleyJE, which looks like this:
storage.backend=berkeleyje
storage.directory=/finalGraph_script/graph
Afterwards, I opened Gremlin.bat to open my graph.
I set up all the necessary index keys for my nodes:
m = g.getManagementSystem();
username = m.makePropertyKey('username').dataType(String.class).make()
m.buildIndex('byUsername',Vertex.class).addKey(username).unique().buildCompositeIndex()
m.commit()
g.commit()
(all other keys are created the same way...)
I imported a CSV file containing about 100,000 lines, each line producing at least 2 nodes and some edges. All this is done via batch loading.
That works without a problem.
Then I execute a groupBy query which looks like this:
m = g.V.has("imageLink").groupBy{it.imageLink}{it.in("is_on_image").out("is_species")}{it._().species.groupCount().cap.next()}.cap.next()
With this query I want, for every node with the property key "imageLink", the number of different "species". "Species" are also nodes, and can be reached by going back along the edge "is_on_image" and then following the edge "is_species".
Well, this also works like a charm for the nodes I have so far. This query takes about 2 minutes on my local PC.
But now to the problem.
My whole dataset is a CSV with 10 million entries. The structure is the same as above, and each line also creates at least 2 nodes and some edges.
On my local PC I can't even import this set; it causes a memory exception after 3 days of loading.
So I tried the same on a server with much more RAM and memory. There the import works and takes about 1 day. But the groupBy fails after about 3 days.
I actually don't know if the groupBy itself fails, or just the connection to the server after that long a time.
So my first Question:
In my opinion, about 15 million nodes shouldn't be that big a deal for a graph database, should it?
Second Question:
Is it normal that it takes so long? Or is there any way to speed it up using indices? I configured the indices as listed above :(
I don't know exactly which information you need to help me, so please just tell me what you need in addition to the above.
Thanks a lot!
Best regards,
Ricardo
EDIT 1: The way I'm loading the CSV into the graph:
I'm using this code; I deleted some unnecessary properties, which are also set as properties on some nodes and loaded the same way.
bg = new BatchGraph(g, VertexIDType.STRING, 10000)
new File("annotation_nodes_wNothing.csv").eachLine({ final String line ->
    def (annotationId, species, username, imageLink) = line.split('\t')*.trim()
    def userVertex = bg.getVertex(username) ?: bg.addVertex(username)
    def imageVertex = bg.getVertex(imageLink) ?: bg.addVertex(imageLink)
    def speciesVertex = bg.getVertex(species) ?: bg.addVertex(species)
    def annotationVertex = bg.getVertex(annotationId) ?: bg.addVertex(annotationId)
    userVertex.setProperty("username", username)
    imageVertex.setProperty("imageLink", imageLink)
    speciesVertex.setProperty("species", species)
    annotationVertex.setProperty("annotationId", annotationId)
    def classifies = bg.addEdge(null, userVertex, annotationVertex, "classifies")
    def is_on_image = bg.addEdge(null, annotationVertex, imageVertex, "is_on_image")
    def is_species = bg.addEdge(null, annotationVertex, speciesVertex, "is_species")
})
bg.commit()
g.commit()

Issue while ingesting a Titan graph into Faunus

I have installed both Titan and Faunus and each seems to be working properly (titan-0.4.4 & faunus-0.4.4).
However, after ingesting a sizable graph into Titan and trying to import it into Faunus via
FaunusFactory.open()
I am experiencing issues. To be more precise, I do seem to get a Faunus graph from the call FaunusFactory.open(),
faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]
but then, even asking a simple
g.v(10)
I do get this error:
Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)
My properties file is taken straight from the Faunus page for Titan-HBase input, except of course for changing the URL of the Hadoop cluster:
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
Can anyone help?
ADDENDUM:
To address the comment below, here is some context: as I have mentioned, I have a graph in Titan and can perform basic Gremlin queries on it.
However, I need to run a global Gremlin query which, due to the size of the graph, needs Faunus and its underlying MapReduce capabilities. Hence the need to import it. The error I get doesn't look to me as if it points to some inconsistency in the graph itself.
I'm not sure that you have your "flow" of Faunus right. If your end result is to do a global query of the graph, then consider this approach:
pull your graph to a sequence file
issue your global query over the sequence file
More specifically, create hbase-seq.properties:
# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000
# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true
In Faunus, do:
g = FaunusFactory.open('hbase-seq.properties')
g._()
That will read the graph from HBase and write it to a sequence file in HDFS. Next, create seq-noop.properties with these contents:
# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true
The above configuration will read your sequence file from the previous step without re-writing the graph (that's what NoOpOutputFormat is for). Now in Faunus do:
g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()
This will execute a degree distribution, writing the results in HDFS to the 'analysis' directory. Obviously you can do whatever Faunus-flavored Gremlin you want here - I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analysis perspective.
