I'm a student in IT Big Data, and I'm currently working on a school project where I want to build a graph of all recent transactions. But I can't find a good way to get the data from the API correctly. Does anybody have an idea how to do it?
My personal recommendation is to not use Etherscan, and instead to use ethers.js with an RPC provider like Infura or Alchemy, or, if you're feeling ambitious, to run your own node. Instantiate a provider and listen for the block event: https://docs.ethers.io/v5/api/providers/provider/#Provider--events. If you want to listen for EIP-20 token transfers only, you can use the answer to this question: https://ethereum.stackexchange.com/questions/87643/how-to-listen-to-contract-events-using-ethers-js
Getting the data manually might seem more complicated, but it's actually simpler (and quicker/more customizable!) than polling the Etherscan API.
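For reference, here is a minimal sketch of that approach with ethers v5 (the Infura project ID is a placeholder you would replace with your own):

const { ethers } = require("ethers");

// Placeholder project ID; replace with your own Infura credentials.
const provider = new ethers.providers.InfuraProvider("homestead", "YOUR_PROJECT_ID"); // "homestead" = mainnet

// Fires once per new block; fetch the block together with its full transaction objects.
provider.on("block", async (blockNumber) => {
    const block = await provider.getBlockWithTransactions(blockNumber);
    for (const tx of block.transactions) {
        console.log("block " + blockNumber + ": " + tx.hash + " (" + tx.from + " -> " + tx.to + ")");
    }
});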
Thanks for your answer. I chose to work with Infura and JS; here is the way I made it work. With this you'll get all the transactions from 150 blocks on mainnet:
async function data() {
    const Web3 = require('web3');
    const provider = 'https://mainnet.infura.io/v3/apikey';
    const web3 = new Web3(new Web3.providers.HttpProvider(provider));
    console.log("transactions per block");
    const startBlock = 15623650;
    // Walk backwards over the 150 most recent blocks.
    for (let j = startBlock; startBlock - j < 150; j--) {
        const txCount = await web3.eth.getBlockTransactionCount(j);
        // Transaction indexes within a block are zero-based.
        for (let i = 0; i < txCount; i++) {
            const transaction = await web3.eth.getTransactionFromBlock(j, i);
            console.log("block: " + j + ", transaction: " + i + ", hash: " + transaction.hash);
        }
    }
}
data();
I read an article about IAsyncEnumerable, more specifically about using it with a Cosmos DB data source:
public async IAsyncEnumerable<T> Get<T>(string containerName, string sqlQuery)
{
var container = GetContainer(containerName);
using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(sqlQuery);
while (iterator.HasMoreResults)
{
foreach (var item in await iterator.ReadNextAsync())
{
yield return item;
}
}
}
I am wondering how Cosmos DB handles this, compared to paging through, let's say, 100 documents at a time. We have had some "429 - Request rate too large" errors in the past and I don't wish to create new ones.
So, how will this affect server load/performance?
From the server's perspective, I don't see a big difference between the client streaming the items (doing some quick checks on each) and the old way of looping with while (iterator.HasMoreResults) and collecting all the documents in a list.
The SDK retrieves batches of documents whose size can be adjusted by passing QueryRequestOptions and changing MaxItemCount (which defaults to 100 if not set). It has no option, though, to throttle RU usage: it will simply run into the 429 error and use the retry mechanism the SDK offers to try again a while later. Depending on how generously you configure that retry mechanism, it will retry often and long enough to get a proper response.
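For instance, the iterator from the snippet above could be given an explicit page size like this (a minimal sketch; QueryRequestOptions and MaxItemCount are the SDK's own names, everything else matches the method above):

using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
    sqlQuery,
    requestOptions: new QueryRequestOptions { MaxItemCount = 100 }); // documents per ReadNextAsync() call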
If you have a situation where you want to limit the RU usage (for example, multiple processes use your Cosmos DB account and you don't want them to run into 429 errors), you have to write that logic yourself.
An example of how something like that could look:
var qry = container
    .GetItemLinqQueryable<Item>(requestOptions: new() { MaxItemCount = 2000 })
    .ToFeedIterator();

var results = new List<Item>();
var stopwatch = new Stopwatch();
var targetRuMsRate = 200d / 1000; // target 200 RU/s, expressed as RU per millisecond
var previousElapsed = 0L;
var delay = 0;
stopwatch.Start();
while (qry.HasMoreResults)
{
    // Wait long enough to bring the average consumption back down to the target rate.
    if (delay > 0)
    {
        await Task.Delay(delay);
    }
    previousElapsed = stopwatch.ElapsedMilliseconds;
    var response = await qry.ReadNextAsync();
    var charge = response.RequestCharge; // RUs consumed by this page
    var elapsed = stopwatch.ElapsedMilliseconds;
    var delta = elapsed - previousElapsed;
    // If this page cost more than the RU budget for the time it took, delay the next read.
    delay = (int)((charge - targetRuMsRate * delta) / targetRuMsRate);
    foreach (var item in response)
    {
        results.Add(item);
    }
}
Edit:
Internally the SDK calls the underlying Cosmos REST API. Once your code reaches iterator.ReadNextAsync(), it calls the query-documents method in the background. If you dig into the source code or intercept the messages sent to HttpClient, you can observe that the resulting request lacks the x-ms-max-item-count header that determines the number of documents to retrieve (unless you have specified a MaxItemCount yourself). According to the Microsoft Docs it defaults to 100 if not set:
Query requests support pagination through the x-ms-max-item-count and x-ms-continuation request headers. The x-ms-max-item-count header specifies the maximum number of values that can be returned by the query execution. This can be between 1 and 1000, and is configured with a default of 100.
I managed to ingest data successfully using the code below:
var kcsbDM = new KustoConnectionStringBuilder(
"https://test123.southeastasia.kusto.windows.net",
"testdb")
.WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
using (var ingestClient = KustoIngestFactory.CreateDirectIngestClient(kcsbDM))
{
var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
ingestProps.ReportMethod = IngestionReportMethod.Queue;
ingestProps.Format = DataSourceFormat.json;
//generate datastream and columnmapping
ingestProps.IngestionMapping = new IngestionMapping() {
IngestionMappings = columnMappings };
var ingestionResult = ingestClient.IngestFromStream(memStream, ingestProps);
}
When I try to use the queued client with IngestFromStreamAsync, the code executes successfully but no data is ingested into the database, even after 30 minutes:
var kcsbDM = new KustoConnectionStringBuilder(
"https://ingest-test123.southeastasia.kusto.windows.net",
"testdb")
.WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(kcsbDM))
{
var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
ingestProps.ReportMethod = IngestionReportMethod.Queue;
ingestProps.Format = DataSourceFormat.json;
//generate datastream and columnmapping
ingestProps.IngestionMapping = new IngestionMapping() {
IngestionMappings = columnMappings };
var ingestionResult = ingestClient.IngestFromStreamAsync(memStream, ingestProps);
}
Try running .show ingestion failures on the "https://test123.southeastasia.kusto.windows.net" endpoint and see if there are any ingestion errors.
Also, since you set the Queue reporting method, you can get the detailed result by reading from the queue:
ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
ingestProps.ReportMethod = IngestionReportMethod.Queue;
(In the first example you used KustoQueuedIngestionProperties, but you should use KustoIngestionProperties. KustoQueuedIngestionProperties has additional properties, such as ReportLevel and ReportMethod, that are ignored by the direct ingest client.)
Could you please change the line to:
var ingestionResult = await ingestClient.IngestFromStreamAsync(memStream, ingestProps);
Also please note that queued ingestion has a batching stage of up to 5 minutes before the data is actually ingested; see the IngestionBatching policy docs and the .show table ingestion batching policy command.
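If you need ingested data to become visible faster while testing, the batching window can be shortened per table (a sketch; the values below are arbitrary):

.alter table TraceLog policy ingestionbatching '{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'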
I finally found the reason: streaming ingestion needs to be enabled on the table:
.alter table TraceLog policy streamingingestion enable
See the Azure documentation for details.
Enabling the streamingingestion policy is actually only needed if:
stream ingestion is turned on in the cluster (in the Azure portal), and
the code is using CreateManagedStreamingIngestClient.
The ManagedStreamingIngestClient will first try to stream-ingest the data; if that fails a few times, it falls back to the queued client. If the data being ingested is small (under 4 MB), this client is the recommended choice; a sketch follows below.
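For reference, a minimal sketch of creating such a client, reusing the engine and ingest endpoints from the snippets above (CreateManagedStreamingIngestClient comes from the Kusto.Ingest package; check your SDK version for the exact overloads):

var engineKcsb = new KustoConnectionStringBuilder(
    "https://test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
var dmKcsb = new KustoConnectionStringBuilder(
    "https://ingest-test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
// Tries streaming ingestion first, falling back to queued ingestion after repeated failures.
using (var managedClient = KustoIngestFactory.CreateManagedStreamingIngestClient(engineKcsb, dmKcsb))
{
    await managedClient.IngestFromStreamAsync(memStream, ingestProps);
}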
If you are using the queued client, you can try:
.show commands-and-queries | where StartedOn > ago(20m) and Text contains "{YourTableName}" and CommandType == "DataIngestPull"
This gives you the command that was executed; however, it can have a latency of more than 5 minutes.
Finally, with whichever client you use, you can check the status of an ingestion. Build a StreamDescription so that you control the source id:
StreamDescription description = new StreamDescription
{
    SourceId = Guid.NewGuid(),
    Stream = dataStream
};
Then ingest by passing the description:
var checker = await client.IngestFromStreamAsync(description, ingestProps);
After that, call:
var statusCheck = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
This tells you the status of that ingestion job. It is best wrapped in a polling loop (for example on a background thread) so you can keep checking every few seconds, as sketched below.
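A rough sketch of such a polling loop (the IngestionStatus type and Status.Pending enum value come from the Kusto.Ingest SDK; the 5-second interval is an arbitrary choice):

var status = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
while (status.Status == Status.Pending)
{
    await Task.Delay(TimeSpan.FromSeconds(5)); // poll every few seconds
    status = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
}
Console.WriteLine("Ingestion finished with status: " + status.Status);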
Please see the code below:
peopleObj.forEach( item=>{
let user = item.user;
let event = item.event;
var userNode = g.addV('user');
Object.keys(user).forEach(att=>{
console.log('att: ' + att+", val: "+ user[att]);
userNode.property(att, user[att]);
});
userNode.next();
console.log('created userNode');
var eventNode = g.addV('event');
Object.keys(event).forEach(att=>{
console.log('att: ' + att+", val: "+ event[att]);
eventNode.property(att, event[att]);
});
eventNode.next();
console.log('created eventNode');
// const names = await g.V().hasLabel('event').values('name').toList();
// console.log(names);
var u_p = g.V().hasLabel('user').has('name',user.name).next();
var e_p = g.V().hasLabel('event').has('name',event.name).next();
var r1 = g.V(u_p).addE('triggers').to(e_p);
r1.next();
});
When I run it in the console, I see the error below:
(node:30272) UnhandledPromiseRejectionWarning: Error: Server error: Could not locate method: DefaultGraphTraversal.to([{}]) (599)
at DriverRemoteConnection._handleMessage (/Users/frankhe/projects/aws/sam-app/hello-world/node_modules/gremlin/lib/driver/driver-remote-connection.js:180:9)
I followed the Gremlin V3 docs. Why can the nodes be added, but not the edge?
Another question: in Gremlin, what is the best approach to check for existence before creating? As you can see in the code, I am creating events directly, but I need to avoid duplicate events. I tried to use await as indicated in the docs, but there is no await in this Node.js code at all. Can anyone tell me the best approach provided by Gremlin?
Thanks in advance.
Update:
My gremlin Node.js package is:
"gremlin": "^3.3.4"
and my Gremlin Server is:
apache-tinkerpop-gremlin-server-3.3.4
The most important problem is that, no matter what I tried, I always get this error:
Server error: Could not locate method: DefaultGraphTraversal.to([{}])
I already changed to the async way, but it didn't help. Can anyone show me a working sample of using Node.js with Gremlin?
Thanks
The simplified version is here:
var g1 = graph.traversal().withRemote(new DriverRemoteConnection('ws://localhost:8182/gremlin'));
var v1 = g1.addV('user').property('name','Frank').next(()=>{
console.log('created1')
var v2 = g1.addV('event').property('name','Event1').next(()=>{
console.log('created2')
g1.V(v1).addE('trigger').to(v2).property('weight',0.75).iterate();
});
});
But in the console, I never see the log info for created1 at all.
Can you give me a working sample in Node.js?
You are mixing synchronous and asynchronous execution in the same code.
Consider that when you use:
userNode.next();
console.log('created userNode');
The message is incorrect. next() returns a Promise that gets resolved asynchronously (see: RemoteConnection Submission).
The correct usage would be:
userNode.next().then(() => console.log('created userNode'));
or, if you are using async functions:
await userNode.next();
console.log('created userNode');
I just removed the .next() and g.V(v1) calls and now the example works well:
const v1 = await g.addV('person').property('name','marko');
const v2 = await g.addV('person').property('name','stephen');
await v1.addE('knows').to(v2).property('weight',0.75).iterate();
I'm using: gremlin#3.4.1
I've been looking at the documentation for Synchronized Arrays https://www.firebase.com/docs/web/libraries/angular/api.html#angularfire-extending-the-services and https://www.firebase.com/docs/web/libraries/angular/guide/extending-services.html#section-firebasearray
I'm using Firebase version 2.2.7 and AngularFire version 1.1.2
Using the code below, I'm having trouble recognizing $$removed events.
.factory("ExtendedCourseList", ["$firebaseArray", function($firebaseArray) {
// create a new service based on $firebaseArray
var ExtendedCourseList= $firebaseArray.$extend({
$$added: function(dataSnapshot, prevChild){
var course = dataSnapshot.val();
var course_key = dataSnapshot.key();
console.log("new course");
return course;
},
$$removed: function(snap){
console.log("removed");
return true;
}
});
return function(listRef) {
return new ExtendedCourseList(listRef);
}
}])
.factory("CompanyRefObj", function(CompanyRef) {
//CompanyRef is a constant containing the url string
var ref = new Firebase(CompanyRef);
return ref;
})
.factory('CourseList', function (localstorage,$rootScope,ExtendedCourseList,CompanyRefObj) {
var companyID = localstorage.get("company");
$rootScope.courseList = ExtendedCourseList(CompanyRefObj.child(companyID).child("courses"));
})
If I run this code, only the $$added events are triggered. To simulate remove events I use the Firebase web interface, where I press the remove button and confirm that the data is deleted permanently.
Additionally, if I delete the $$removed function, the extended service still won't synchronize when a record is deleted.
If I modify my code to use $firebaseArray directly instead of extending the service (as seen above), both add and remove events are recognized.
.factory('CourseList', function (localstorage,$rootScope,$firebaseArray,CompanyRefObj) {
var companyID = localstorage.get("company");
$rootScope.courseList = $firebaseArray(CompanyRefObj.child(companyID).child("courses"));
})
Finally, are there any bad practices I've missed that can cause some of the extended functions to not work?
Solved
$$added: function(dataSnapshot, prevChild){
var course = dataSnapshot.val();
var course_key = dataSnapshot.key();
//Modified below
course.$id = course_key;
//End of modification
console.log("new course");
return course;
}
After posting about the issue on the firebase/angularfire GitHub, I received an answer that solved my problem. When $$added was overridden by the code provided, the $firebaseArray lost its internal record $id.
Adding this line of code: course.$id = course_key; before returning the course, made AngularFire recognize when the record was removed from the server.
I am currently working on a monitoring tool for WebRTC sessions, investigating the SDP transferred from caller to callee and vice versa. Unfortunately I cannot figure out which IP flow is really used, since there are more than 10 candidate lines per session establishment and the session is somehow established after some candidates are pushed into the PeerConnection.
Is there any way to figure out which of the set of candidate flows is actually being used?
I solved the issue by myself! :)
There is a function called peerConnection.getStats(callback);
This will give a lot of information about the ongoing PeerConnection.
Example: http://webrtc.googlecode.com/svn/trunk/samples/js/demos/html/constraints-and-stats.html
W3C Standard Description: http://dev.w3.org/2011/webrtc/editor/webrtc.html#statistics-model
Bye
I wanted to find out the same thing, so I wrote a small function that returns a promise resolving to the candidate details:
function getConnectionDetails(peerConnection){
var connectionDetails = {}; // the final result object.
if(window.chrome){ // checking if chrome
var reqFields = [ 'googLocalAddress',
'googLocalCandidateType',
'googRemoteAddress',
'googRemoteCandidateType'
];
return new Promise(function(resolve, reject){
peerConnection.getStats(function(stats){
var filtered = stats.result().filter(function(e){return e.id.indexOf('Conn-audio')==0 && e.stat('googActiveConnection')=='true'})[0];
if(!filtered) return reject('Something is wrong...');
reqFields.forEach(function(e){connectionDetails[e.replace('goog', '')] = filtered.stat(e)});
resolve(connectionDetails);
});
});
}else{ // assuming it is firefox
var stream = peerConnection.getLocalStreams()[0];
if(!stream || !stream.getTracks()[0]) stream = peerConnection.getRemoteStreams()[0];
if(!stream) return Promise.reject('no stream found');
var track = stream.getTracks()[0];
if(!track) return Promise.reject('No Media Tracks Found');
return peerConnection.getStats(track).then(function(stats){
var selectedCandidatePair = stats[Object.keys(stats).filter(function(key){return stats[key].selected})[0]]
, localICE = stats[selectedCandidatePair.localCandidateId]
, remoteICE = stats[selectedCandidatePair.remoteCandidateId];
connectionDetails.LocalAddress = [localICE.ipAddress, localICE.portNumber].join(':');
connectionDetails.RemoteAddress = [remoteICE.ipAddress, remoteICE.portNumber].join(':');
connectionDetails.LocalCandidateType = localICE.candidateType;
connectionDetails.RemoteCandidateType = remoteICE.candidateType;
return connectionDetails;
});
}
}
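A hypothetical usage of the helper above, where pc stands for an already-established RTCPeerConnection:

getConnectionDetails(pc)
    .then(function(details){ console.log('selected candidate pair: ', details); })
    .catch(function(err){ console.error(err); });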