Ocean Protocol: Decentralized Big Data Sharing and Artificial Intelligence

Ocean Protocol claims to have created a "decentralized data marketplace" that uses blockchain to decentralize data sharing, and says the platform can be used for Artificial Intelligence. However, the datasets used in Artificial Intelligence are typically very large. (Please have a look here, where the protocol is introduced.)
One serious question is: where does the data get stored, especially when it is huge? The Ocean Protocol white paper answers this question as follows:
Ocean itself does not store the data. Instead, it links to data that
is stored, and provides mechanisms for access control. The most
sensitive data (e.g. medical data) should be behind firewalls,
on-premise. Via its services framework, Ocean can bring the compute to
the data, using on-premise compute. Other data may be on a centralized
cloud (e.g. Amazon S3) or decentralized cloud (e.g. Filecoin). In both cases it should be encrypted. This means that
the Ocean blockchain does not store data itself. This also means that
we can remove the data, if it’s not on a decentralized and immutable
substrate.
So, doesn't this mean that the Ocean blockchain cannot provide immutability of shared data?

You are correct. Ocean itself does not store the data, so it has no control over whether the data is stored immutably (or not).
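In practice, the distinction is between availability and integrity: a checksum recorded at publication time still lets anyone detect later modification of a copy, even though nothing forces the provider to keep serving the data. Below is a minimal, generic sketch of such an off-chain integrity check in Python; the file name and reference digest are hypothetical placeholders, and this is not Ocean's own API, just an illustration of the general pattern.

```python
# Generic integrity check: recompute a dataset's SHA-256 digest and compare it
# with a checksum recorded at publication time (e.g. on-chain).
# The file path and the reference digest below are hypothetical placeholders.
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

published_digest = "0" * 64          # placeholder for the registered checksum
local_digest = sha256_of_file("dataset.csv")

# A match proves this copy is unmodified; it says nothing about whether the
# provider still makes the data available at all.
print("intact" if local_digest == published_digest else "modified or different dataset")
```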

Related

Network Simulation using NS-3

Can I simulate susceptible-infected-susceptible (SIS) model using NS-3?
I'm aiming to model malware flow using SIS and trying to simulate using NS-3.
I'm a newbie to networks, and have been searching for this for hours, going through tens of research papers, but can't find anything similar.
SIS implies some sort of discovery mechanism to spread the infection across a network. These discovery mechanisms typically use exploits in network-capable daemons. So, to simulate SIS, you want a simulator that works at the application level of the network stack.
ns-3 has some application-level capabilities, but it's mostly intended to be used to simulate the network stack below the application level. The application-level capabilities that ns-3 does have are limited to traffic generation. Discovering the presence of a service on a Node, let alone a compromised version of a daemon, is not supported.
So, it seems like you'll need to find another simulator. I'm not sure what your options are, but depending on how complex of a simulation you want, you could just roll your own by representing the network as a graph, and infecting a susceptible node with probability p.
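For the roll-your-own route, here is a minimal sketch of a discrete-time SIS simulation on a random graph. The networkx library, the graph parameters, and the infection/recovery probabilities (beta, gamma) are illustrative assumptions, not values from the question or from any paper.

```python
# Minimal discrete-time SIS sketch on a random graph (illustrative only).
# beta (infection probability per infected neighbour per step) and gamma
# (recovery probability per step) are assumed values, not measured ones.
import random

import networkx as nx  # pip install networkx

def simulate_sis(graph, beta=0.05, gamma=0.02, initially_infected=1, steps=200, seed=42):
    rng = random.Random(seed)
    infected = set(rng.sample(list(graph.nodes), initially_infected))
    history = [len(infected)]
    for _ in range(steps):
        new_infected = set(infected)
        for node in graph.nodes:
            if node in infected:
                # Infected nodes recover (become susceptible again) with probability gamma.
                if rng.random() < gamma:
                    new_infected.discard(node)
            else:
                # Susceptible nodes are infected independently by each infected neighbour.
                for neighbour in graph.neighbors(node):
                    if neighbour in infected and rng.random() < beta:
                        new_infected.add(node)
                        break
        infected = new_infected
        history.append(len(infected))
    return history

if __name__ == "__main__":
    g = nx.erdos_renyi_graph(n=500, p=0.01, seed=1)
    counts = simulate_sis(g)
    print("infected nodes per step:", counts[:10], "...", counts[-1])
```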

AWS wordpress - calculating network data transfer charge

I'm trying to calculate the price of network data transfer in and out of an AWS WordPress website.
Everything is behind CloudFront. EC2/RDS returns dynamic resources and a few static ones, S3 returns only static resources. The Application Load Balancer is there just for autoscaling purposes.
Even if everything seems simple, experience teaches that the devil is in the details.
So, at the end of my little journey (reading blogs and docs) I would like to share the result of my search and hear what the community thinks of it.
Here is the architecture, all created within the same region/availability zone (let's say Europe/Ireland):
At time of writing, the network data transfer charge is:
- the traffic out from CloudFront (first 10 TB $0.15/GB per month, etc.)
- the traffic in and out of the Application Load Balancer (processed bytes: 1 GB per hour for EC2 instances costs ~$7.00/GB)
For the rest, traffic within the same region is free of charge, and CloudFront does not charge for incoming data.
For example: within the same region, there should be no charge between an EC2 and an RDS DB Instance.
Does anyone know if I'm missing something? Are there subtle costs that I have overlooked?
Your question is very well described. Thanks for the little graph you drew to help clarify the overall architecture. After reading your question, here are the things that I want to point out.
The link to the CloudFront data transfer price is very outdated. That blog post was written by Jeff Barr in 2010. The latest CloudFront pricing page is linked here.
The data transfer from CloudFront out to the origin S3 is not free. This is listed in the "Regional Data Transfer Out to Origin (per GB)" section. In your region, it's $0.02 per GB. The same applies to the data from CloudFront to the ALB.
You said "within the same region, there should be no charge between an EC2 and an RDS DB Instance". This is not accurate. Only the data transfer between RDS and EC2 Instances in the same Availability Zone is free. [ref]
Also be aware that S3 has request and object retrieval fees. They will still apply in your architecture.
In addition, here is a nice graph made by the folks in lastweekinaws which visually listed all the AWS data transfer costs.
Source: https://www.lastweekinaws.com/blog/understanding-data-transfer-in-aws/
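Putting the items above together, here is a rough back-of-the-envelope sketch for estimating the monthly transfer bill. Every rate is a placeholder assumption (only the $0.02/GB origin-transfer figure comes from the answer above); replace them with the current numbers from the AWS pricing pages for your region before relying on the result.

```python
# Back-of-the-envelope estimate of the billable transfer items discussed above.
# Every rate below is a placeholder assumption; look up current prices for your
# region on the AWS pricing pages before relying on the result.
def estimate_monthly_transfer_cost(
    cloudfront_out_gb,          # data served from CloudFront to viewers
    cloudfront_to_origin_gb,    # "Regional Data Transfer Out to Origin"
    alb_processed_gb,           # bytes processed by the Application Load Balancer
    s3_get_requests,            # GET requests against the S3 origin
    cloudfront_out_per_gb=0.085,   # assumed rate for the first pricing tier
    origin_transfer_per_gb=0.02,   # figure quoted in the answer above
    alb_cost_per_gb=0.008,         # assumed LCU-equivalent cost per processed GB
    s3_get_per_1000=0.0004,        # assumed S3 GET request price
):
    cost = 0.0
    cost += cloudfront_out_gb * cloudfront_out_per_gb
    cost += cloudfront_to_origin_gb * origin_transfer_per_gb
    cost += alb_processed_gb * alb_cost_per_gb
    cost += (s3_get_requests / 1000.0) * s3_get_per_1000
    return round(cost, 2)

# Example: 2 TB out to viewers, 100 GB back to origins, 150 GB through the ALB,
# 5 million S3 GET requests.
print(estimate_monthly_transfer_cost(2048, 100, 150, 5_000_000))
```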

What does big data have to do with cloud computing?

What does big data have to do with cloud computing?
I am trying to explain the relation between big data and cloud computing.
They overlap. One is not dependent on the other. Cloud computing enables companies to rent infrastructure rather than pay up-front for hardware and maintain it over time.
In general, cloud vendors allow you to rent large pools of servers and build networks of servers (clusters).
You can pay for servers with large storage drives and install software like the Hadoop Distributed File System (HDFS), Ceph, GlusterFS, etc. This software presents a single "shared filesystem": the more servers you combine into it, the more data you can store.
Now, that's just storage. Hopefully, these servers also have a reasonable amount of memory and CPU. Other technologies such as YARN (with Hadoop), Apache Mesos, and Kubernetes/Docker allow you to create resource pools to deploy distributed applications that spread over all those servers and read the data stored on them.
The above is mostly block storage, though. The cheaper alternative is object storage such as Amazon S3, which can be used as a Hadoop-compatible filesystem. There are other object storage solutions, but people use S3 because it is highly available (via replication) and can be secured more easily with access keys and policies.
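As a concrete illustration of the object-storage option, here is a minimal boto3 sketch; the bucket and key names are hypothetical, and credentials are assumed to come from the usual AWS configuration chain.

```python
# Minimal object-storage example with boto3 (pip install boto3).
# Bucket and key names are hypothetical; credentials come from the usual
# AWS environment/config chain.
import boto3

s3 = boto3.client("s3")

# Write an object: the "filesystem" here is just keys inside a bucket.
s3.put_object(
    Bucket="example-datalake-bucket",
    Key="raw/events/2024/01/events.json",
    Body=b'{"event": "click", "user": 42}\n',
)

# Read it back; access is controlled by IAM policies and keys, not file permissions.
obj = s3.get_object(Bucket="example-datalake-bucket", Key="raw/events/2024/01/events.json")
print(obj["Body"].read().decode())
```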
Big data and cloud computing are among the most widely used technologies in today's Information Technology world. With these two technologies, fields such as business, education, healthcare, and research & development can grow rapidly and gain various advantages.
In cloud computing, we can store and retrieve data from anywhere at any time, whereas big data refers to large data sets that are processed to extract the necessary information.
A customer can shift to cloud computing when they need rapid deployment and scaling of their applications; if the application deals with highly sensitive data and requires strict compliance, that should be weighed before keeping things in the cloud. Big data approaches are used where traditional methods and frameworks are ineffective. Big data is not a replacement for relational database systems; it solves specific problems related to large data sets, and most big data tools do not deal well with small data.
Big data technologies include Hadoop, MapReduce, and HDFS, while cloud computing deployment models include public, private, hybrid, and community clouds.
Cloud computing provides enterprises a cost-effective & flexible way to access a vast volume of information we call the Big Data. Because of Big Data and cloud computing, it is now much easier to start an IT company than ever before. When the combination of big data and cloud computing was first initiated, it opened the road to endless possibilities. Various fields have seen many drastic changes that were made possible by this combination. It changed the decision-making process for companies and gave a huge advantage to analysts, who could base their results on concrete data.

DICOM C-StoreSCP: Is there any way to know for sure that the study is received completely?

This question is next part of my other question.
My SCP receive images from multiple clients.
Each client behaves differently.
Some clients send a complete study over a single association; in this case, when the association is closed, the SCP knows that the complete study has been received.
Some clients send multiple studies on the same association, which is DICOM-legal.
Some clients send one study over multiple associations, which is also DICOM-legal.
Data transfer happens over an unstable internet connection. If a study is being transferred and the connection drops for any reason, instances that were successfully stored will not be sent again; only failed/pending instances will be sent in the next attempt.
Considering all of the above, is there any DICOM way to know that a study has been received completely?
Storage Commitment is not a good solution in my understanding. Most users do not support it. Also, this feature is designed for the SCU to know whether an instance is stored on the SCP, not the other way around.
MPPS is also not reliable. Please refer to the Conclusion section of my other question.
I read this post, which has a similar requirement. The timeout solution mentioned there is not reliable in my understanding.
The goal of the DICOM Storage Service is to allow simple transfer of information Instances (objects) such as images, waveforms, reports, etc. So the concept of a Study is not associated with this service.
Also, there is no clear-cut definition of what constitutes a "Study", and the standard does not constrain how a study is transferred from SCU to SCP (e.g. in a single association, in multiple associations, or within any time constraint).
The Storage Commitment Service also operates on Instances, and it is at the implementer's discretion when to send the Storage Commitment request: for example, each time the modality captures an image and stores it to PACS, or once all images are transferred to PACS, or when the modality needs to free up local drive space, etc.
However, an SCU can send a study-level C-MOVE request to an SCP to transfer all Instances sharing the same Study Instance UID (a study) to a destination SCP.
So, there is no definitive way to know if a client completed sending a study.
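So any notion of completeness ends up being a receiving-side heuristic. As a starting point, here is a minimal sketch using pynetdicom (an assumed library choice, not something from the question) that only records which SOP Instances arrived per Study Instance UID; an inactivity window, expected instance counts from MPPS, or any other site-specific rule can then be layered on top.

```python
# Minimal C-STORE SCP sketch with pynetdicom (pip install pynetdicom pydicom).
# It only records which SOP Instances arrived for each Study Instance UID;
# deciding when a study is "complete" remains a site-specific heuristic,
# exactly as discussed above.
from collections import defaultdict

from pynetdicom import AE, evt, AllStoragePresentationContexts

received = defaultdict(set)  # StudyInstanceUID -> set of SOPInstanceUIDs

def handle_store(event):
    ds = event.dataset
    received[ds.StudyInstanceUID].add(ds.SOPInstanceUID)
    # Persist ds here (ds.file_meta = event.file_meta; ds.save_as(...)) if needed.
    return 0x0000  # Success

handlers = [(evt.EVT_C_STORE, handle_store)]

ae = AE()
ae.supported_contexts = AllStoragePresentationContexts
# Blocking server; 11112 is just an example port.
ae.start_server(("0.0.0.0", 11112), evt_handlers=handlers)
```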

NBAD, Netflow on layer 7

I'm developing Network Behavior Anomaly Detection and I'm using the Cisco NetFlow protocol for collecting traffic information. I want to collect information about layer 7 of the ISO OSI Reference Model, especially the HTTPS protocol.
What is the best way to achieve this?
Maybe someone will find it helpful:
In my opinion you should try sFlow or Flexible NetFlow.
sFlow uses sampling to achieve scalability. The architecture consists of devices that collect two types of samples:
- randomly sampled packets
- counter samples taken at certain time intervals
Sampled packets are sent as sFlow datagrams to a central server running software for the analysis and reporting of network traffic, the sFlow collector.
sFlow may be implemented in hardware or software, and although the name "sFlow" suggests a flow technology, it is not really a flow technology at all; it transmits a picture of the traffic on the basis of samples.
NetFlow is a real flow technology. Flow entries are generated in the network devices and combined into packets for export.
Flexible NetFlow allows customers to export almost everything that passes through the router, including the entire packet, and to do it in real time, like sFlow.
In my opinion Flexible NetFlow is much better, and if you're worried about DDoS attacks, choose it.
If FNF is better, why use sFlow? Because many switches today only support sFlow, so if FNF is not available and we want real-time data, sFlow is the best option.
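For context, whichever export technology you pick, the collector side is ultimately a UDP listener that decodes the exported datagrams. Here is a bare-bones skeleton; port 6343 is the conventional sFlow collector port, 2055 is a common NetFlow export port, and the actual sFlow v5 / NetFlow v9 decoding is deliberately left out.

```python
# Bare-bones UDP listener skeleton for flow export datagrams.
# Port 6343 is the conventional sFlow collector port; 2055 is a common
# NetFlow/Flexible NetFlow export port. Actual datagram decoding (sFlow v5,
# NetFlow v9/IPFIX templates) is deliberately left out of this sketch.
import socket

def run_collector(port=6343, bufsize=65535):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    print(f"listening for flow datagrams on udp/{port}")
    while True:
        data, (src_ip, src_port) = sock.recvfrom(bufsize)
        # Hand `data` to a real decoder here; we only log the raw size.
        print(f"{src_ip}:{src_port} sent {len(data)} bytes")

if __name__ == "__main__":
    run_collector()
```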
