Falcor/GraphQL in a big project - Redux

I've read a lot of articles about Falcor and GraphQL, and none of them can say how they help in big projects! I've been using Redux + React for a long time (with a REST API), and I can't understand what BIG problem Falcor and GraphQL solve.
Can someone explain it in a very simple way?

When you try to understand something new such as GraphQL, it helps to compare it with something you already know, for example REST.
Imagine we have several web and mobile applications that retrieve data from the same server. In a RESTful architecture, we design each entity as a resource. When a request to fetch a resource is received, the server usually returns everything about that resource. As a result, the clients get redundant and unnecessary data, which consumes bandwidth. Depending on the scenario, this can add up to an amount significant enough to hurt the client's performance (think of mobile clients).
What if the clients could specify exactly which data they need, and the server sent only that data? GraphQL enables us to achieve this.
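To make that concrete, here is a minimal sketch of a GraphQL request over plain HTTP in C#. The https://example.com/graphql endpoint and the user type with name and email fields are hypothetical; the point is that the client names exactly the fields it wants and the server returns nothing more.

```csharp
// Minimal sketch: posting a GraphQL query with HttpClient.
// The endpoint and the "user" type with "name"/"email" fields are hypothetical.
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

class GraphQlExample
{
    static async Task Main()
    {
        using var http = new HttpClient();

        // The client asks for exactly two fields; the server sends back only those.
        var payload = new
        {
            query = @"query { user(id: ""42"") { name email } }"
        };

        var response = await http.PostAsJsonAsync("https://example.com/graphql", payload);
        var body = await response.Content.ReadAsStringAsync();
        Console.WriteLine(body); // e.g. {"data":{"user":{"name":"...","email":"..."}}}
    }
}
```

The same query document works from a web or a mobile client; only the requested fields travel over the wire.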
Is GraphQL suitable for BIG projects?
Like pretty much everything in life, it depends. Projects don't all have the same requirements, regardless of their size. Determine your project's requirements, consider the available technologies, and weigh their pros and cons. It's a trade-off; there's no silver bullet or one-size-fits-all solution. Nonetheless, Facebook uses GraphQL, and there are strong reasons to consider their project BIG.


Send event directly to server container via HTTP request instead of web container

After some experimenting, I noticed it is possible to send events directly to a server container via HTTP request, instead of pushing to the data layer (which is connected to a web container). A big advantage of this setup is that the front end doesn't need to load any GTM script. Yet I have some doubts, because I can't find much documentation about this setup, and it also brings some challenges, like implementing automatically collected events (e.g. page_view). Does anyone have experience with this setup, or can anyone tell me why I shouldn't be following this path?
Regards, Thomas
This is definitely not a best practice, although it is actually the technically more beneficial path. A few things it gives you, actually:
Can make your tracking completely immune to adblockers.
Has the potential to protect from malicious analytics spam, also makes it way harder for third parties to spoil your data.
Doesn't surface your analytics stack and libraries to the public.
Is typically way lighter than the GTM lib.
You have a much greater degree of control over what happens and much more power over the tracking.
But all of this applies only if you have the competency to develop it, which is actually a rarity. Normally, web developers don't know analytics well enough to make it work well, while analytics developers lack the technical knowledge. You suddenly can't just hire a junior or mid-level implementation expert to help with the tracking, and a lot of those who call themselves seniors wouldn't be able to maintain raw JS tracking libraries either.
As you've mentioned, you won't be able to rely on automatic tracking from GTM or gtag libraries. And not having automatic events is actually not the issue. The more important thing is manually collecting all dimensions, including the proper maintenance of client ids and session ids.
Once your front-end is ready, it's important to note that you don't want to expose your server-side GTM's endpoint. I mean, you can, but this would defeat the purpose significantly. You want to make a mirror on your backend that would reroute the events to the sGTM.
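As a minimal sketch of such a mirror, assuming an ASP.NET Core minimal API: the /collect route, the sGTM URL and the first-party cid cookie below are all hypothetical, but they show the browser talking only to your own backend, which maintains the client id itself and forwards events to sGTM.

```csharp
// Hedged sketch of a backend "mirror" endpoint. The /collect route, the sGTM
// URL and the "cid" cookie are made up; the point is that the real sGTM
// endpoint is never exposed to the browser.
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using System;
using System.IO;
using System.Net.Http;
using System.Text;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient();
var app = builder.Build();

const string SgtmEndpoint = "https://sgtm.internal.example.com/collect"; // hypothetical

app.MapPost("/collect", async (HttpContext ctx, IHttpClientFactory httpFactory) =>
{
    // Maintain a first-party client id ourselves instead of relying on gtag.
    if (!ctx.Request.Cookies.TryGetValue("cid", out var clientId))
    {
        clientId = Guid.NewGuid().ToString("N");
        ctx.Response.Cookies.Append("cid", clientId,
            new CookieOptions { HttpOnly = true, Secure = true, MaxAge = TimeSpan.FromDays(730) });
    }

    using var reader = new StreamReader(ctx.Request.Body);
    var payload = await reader.ReadToEndAsync();

    // Validation / authentication / enrichment of the event would go here.

    var client = httpFactory.CreateClient();
    var forward = await client.PostAsync(
        $"{SgtmEndpoint}?cid={clientId}",
        new StringContent(payload, Encoding.UTF8, "application/json"));

    return Results.StatusCode((int)forward.StatusCode);
});

app.Run();
```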
Finally, you may want to add some kind of encryption/protection/validation/authentication logic for the data on your mirror. It's worth considering because, without surfacing the endpoints, you're now able to further conceal what you're doing and thus avoid much of the potential data tampering. This won't make it impossible to look into what you're doing, of course, but it will make casual interference nearly impossible.
In the end, people don't do it because this effectively doubles the monetary cost of tracking, since sufficiently skilled experts charge roughly double what regular analytics folks charge, while the clarity of the data only grows by about 10-20%. Such a trade generally doesn't make business sense unless you're a huge corporation for which even enterprise analytics solutions like Adobe Analytics are not good enough. Amazon would probably be a good example.
Also, if you're already redefining users and sessions, you're not that far from using something like Segment for tracking, ETLing all of that into a data warehouse, and using a proper BI tool for further analysis. And then, is there still any point in having sGTM at all, if you can just stream your events to Segment in real time from your mirror and have it seamlessly re-integrate the data into GA, Firebase, AA, Snowflake, Facebook and tens if not hundreds of other destinations, all server-side?
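For a sense of how small that mirror-to-Segment hop is, here is a hedged sketch, assuming Segment's HTTP tracking API (POST to https://api.segment.io/v1/track with the write key as the Basic-auth username; verify against their docs). The write key and the event payload are made up.

```csharp
// Hedged sketch: forwarding one event from the mirror to Segment's HTTP
// tracking API. The write key, userId and event are hypothetical.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text;
using System.Threading.Tasks;

class SegmentForwarder
{
    static async Task Main()
    {
        const string writeKey = "YOUR_WRITE_KEY"; // hypothetical
        using var http = new HttpClient();
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
            "Basic", Convert.ToBase64String(Encoding.UTF8.GetBytes($"{writeKey}:")));

        var evt = new
        {
            userId = "user-42",               // your own client id
            @event = "Product Viewed",        // example event name
            properties = new { sku = "SKU-1", price = 19.99 }
        };

        var response = await http.PostAsJsonAsync("https://api.segment.io/v1/track", evt);
        Console.WriteLine((int)response.StatusCode);
    }
}
```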
You want to know where to stop, and the best way to do it is by assessing the depth of the analysis/data science your company is conducting on the user behavioral data. And in 99% of cases, it's not deep enough to even consider sGTM.
In response to #BNazaruk
So it's been a while now… I've been looking into the setup, because it's just way too cool. I also took a deeper dive into CGTM to better understand the benefits of SGTM. And honestly, anything that has the potential to replace CGTM should be considered. My main reasons are:
Cybersecurity - Through injection it is possible to insert malicious software like keyloggers. The only thing preventing this is the login credentials to CGTM, which are, relatively speaking, easy to obtain with targeted phishing.
Speed - A CGTM setup with about 10-15 tags means an average performance loss of 40 points in Lighthouse.
Quality - Like you said: because of browser restrictions such as cookie policies and ad blockers that intercept/manipulate/block CGTM signals, on average 10-20% of events are not registered properly.
Mistakes - Developing code outside a proper dev process limits insight into the impact of that code, with possible errors or performance loss as a result.
So far I have created a standardized setup (container templates, measurement plans, libraries) for online marketers and developers to use. Within the setup, we maintain our own client and session IDs. Developers are able to make optimal use of SGTM and increase productivity drastically. The only downside to the setup is that we still use CGTM to implement page_view and exceptions, which is a shame, because I'm not far from a full server-to-server setup. Companies are still too skeptical to fully commit to SGTM, I guess. Though my feeling is that in 5 years' time, high-end apps won't use CGTM anymore.
Once again, thanks for your answer, it’s been an important part of my journey.

Architecture For A Real-Time Data Feed And Website

I have been given access to a real-time data feed which provides location information, and I would like to build a website around it, but I am a little unsure about what architecture to use to achieve my needs.
Unfortunately the feed I have access to only allows a single connection per IP address, so building a website that talks directly to the feed is out - each user would generate a new request, which would be rejected. It would also be desirable to perform some pre-processing on the data, so I guess I will need some kind of back end which retrieves the data, processes it, then makes it available to a website.
From a front-end connection perspective, web services sound like they may work, but would this also create multiple connections to the feed for each user? I would also like the back-end connection to be persistent, so that data is retrieved and processed even when the site is not being visited - I believe IIS recycles web services and websites when they are idle?
I would like to keep the design fairly flexible - in future I will be adding some mobile clients, so the API needs to support remote connections.
The simple solution would have been to log all the processed data to a database, which could then be picked up by the website, but this loses the real-time aspect of the data. Ideally I would be looking to push the data to the website every time the data changes or new data is received.
What is the best way of achieving this, and what technologies are out there that may assist here? Comet architecture sounds close to what I need, but that would require building a back end that can handle multiple web-based queries at once, which seems like quite a task.
Ideally I would be looking for a C# / ASP.NET based solution with a JavaScript client side, although I guess this question is more about architecture and concepts than the technological implementations of them.
Thanks in advance for all advice!
Realtime Data Consumer
The simplest solution would seem to be having one component that is dedicated to reading the realtime feed. It could then publish the received data on to a queue (or multiple queues) for consumption by other components within your architecture.
This component (A) would be a standalone process, maybe a service.
Queue consumers
The queue(s) can be read by:
a component (B) dedicated to persisting data for future retrieval or querying. If the amount of data is large you could add more components that read from the persistence queue.
a component (C) that publishes the data directly to any connected subscribers. It could also do some processing, but if you are looking at doing large amounts of processing you may need multiple components to perform this task (a rough sketch of this producer/consumer split follows below).
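Here is a hedged sketch of that split, using an in-process System.Threading.Channels queue purely as a stand-in for a real broker such as MSMQ or NServiceBus; ReadFromFeedAsync and the LocationUpdate type are hypothetical placeholders for your actual feed client and payload.

```csharp
// Sketch: one persistent feed reader (A) publishing to a queue, one consumer
// (B/C) reading from it. Swap the in-process channel for a real broker.
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public record LocationUpdate(string Id, double Latitude, double Longitude);

public static class FeedPipeline
{
    public static async Task Main()
    {
        var queue = Channel.CreateUnbounded<LocationUpdate>();

        // Component A: the single, persistent connection to the feed.
        var producer = Task.Run(async () =>
        {
            await foreach (var update in ReadFromFeedAsync())
                await queue.Writer.WriteAsync(update);
            queue.Writer.Complete();
        });

        // Component B/C: persist and/or publish each update as it arrives.
        var consumer = Task.Run(async () =>
        {
            await foreach (var update in queue.Reader.ReadAllAsync())
                Console.WriteLine($"Processed {update.Id}"); // persist / push to subscribers here
        });

        await Task.WhenAll(producer, consumer);
    }

    // Placeholder for the real-time feed client (only one connection allowed).
    private static async IAsyncEnumerable<LocationUpdate> ReadFromFeedAsync()
    {
        for (var i = 0; i < 3; i++)
        {
            await Task.Delay(100);
            yield return new LocationUpdate(i.ToString(), 51.5, -0.1);
        }
    }
}
```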
Realtime web technology components (D)
If you are using a .NET stack then it seems like SignalR is getting the most traction. You could also look at XSockets (there are more options in my realtime web tech guide; just search for '.NET').
You'll want to use SignalR to manage subscriptions and then publish messages to registered clients (PubSub - this SO post seems relevant; maybe you can ask for a bit more info).
You could also look at offloading the PubSub component to a hosted service such as Pusher, who I work for. This will handle managing subscriptions and component C would just need to publish data to an appropriate channel. There are other options all listed in the realtime web tech guide.
All these components come with a JavaScript library.
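For instance, assuming ASP.NET Core SignalR (the classic ASP.NET SignalR API differs slightly), component C could broadcast each processed update to all connected browsers through a hub. The hub name, the /feed route and the payload type below are made up.

```csharp
// Hedged sketch, assuming ASP.NET Core SignalR. FeedHub, /feed and
// LocationUpdate are illustrative names only.
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.DependencyInjection;
using System.Threading.Tasks;

public record LocationUpdate(string Id, double Latitude, double Longitude);

// Browsers connect to this hub and register a handler for "locationUpdated".
public class FeedHub : Hub { }

public class Program
{
    public static void Main(string[] args)
    {
        var builder = WebApplication.CreateBuilder(args);
        builder.Services.AddSignalR();
        var app = builder.Build();
        app.MapHub<FeedHub>("/feed");
        app.Run();
    }
}

// Component C: wherever processed data arrives (e.g. a queue consumer),
// inject IHubContext<FeedHub> and broadcast to all subscribers.
public class FeedBroadcaster
{
    private readonly IHubContext<FeedHub> _hub;
    public FeedBroadcaster(IHubContext<FeedHub> hub) => _hub = hub;

    public Task PublishAsync(LocationUpdate update) =>
        _hub.Clients.All.SendAsync("locationUpdated", update);
}
```

The JavaScript client would simply connect to /feed and subscribe to "locationUpdated" messages.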
Summary
Components:
A - .NET service - that publishes info to queue(s)
Queues - MSMQ, NServiceBus etc.
B - Could also be a simple .NET service that reads a queue.
C - this really depends on D since some realtime web technologies will be able to directly integrate. But it could also just be a simple .NET service that reads a queue.
D - Realtime web technology that offers a simple way of routing information to subscribers (PubSub).
If you provide any more info I'll update my answer.
A good solution to this would be something like http://rubyeventmachine.com/ or http://nodejs.org/. It's not ASP.NET, but it can easily solve the problem of distributing real-time data to other users. Since user connections, subscriptions and broadcasting to channels are built into each, coding the rest will be super simple. Your clients would just connect over standard TCP.
If you needed clients to poll for updates then you would need a queue system to store info for the next request. That could be a simple array, or a more complicated queue system, depending on your requirements and number of users.
There may be solutions for .NET that I am not aware of that do the same thing, but those are the two I know of.

Pros/cons of sharding using "app+db nodes" vs. separately sharding the db and load-balancing the app servers

We are preparing to scale the API side of an API-heavy web application. My (technically savvy) client proposes a rather unconventional approach to this: instead of balancing the load to several app servers, which would talk to a sharded database, he wants us to:
“shard the app servers”, putting both app server code and db on each physical server, so that the app server only connects to its own db shard;
have the app servers talk to each other when they need to access other shards (instead of talking to another shard's DB directly);
have the API client pick an app shard itself (on the client side, based on some stable hash, as sketched below) and talk directly to it.
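For concreteness, the client-side shard pick from the last point could look something like this; the FNV-1a hash and the shard naming are illustrative assumptions rather than anything we've settled on.

```csharp
// Sketch of a stable client-side shard pick. FNV-1a is used because
// string.GetHashCode() is not stable across processes; shard count and
// base URL are hypothetical.
using System;

public static class ShardPicker
{
    public static int PickShard(string userId, int shardCount)
    {
        // FNV-1a, 32-bit: deterministic for the same input on every runtime.
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;
        uint hash = offsetBasis;
        foreach (var ch in userId)
        {
            hash ^= ch;
            hash *= prime;
        }
        return (int)(hash % (uint)shardCount);
    }

    public static void Main()
    {
        // e.g. route the request to https://api-3.example.com (hypothetical naming)
        var shard = PickShard("user-42", 8);
        Console.WriteLine($"https://api-{shard}.example.com");
    }
}
```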
The underlying reasoning is that this is the most natural thing to do, and that it would allow us to move to a multisite distributed system in the future.
(The stack is PHP + Node.js on MySQL, although at this point a transition to MongoDB is being considered too.)
Now, I don't see huge problems with it at first glance. It might get somewhat cumbersome to code these server-to-server interactions, but then it will surely have its own benefits. Basically I'm at a loss as to whether this is a good idea or not.
What pros and cons come to your mind? I'm looking for technical issues and advantages here. Thanks!
This is just plain bad for many reasons.
The API client should not know which app shard to talk to. This will limit you in ways you probably can't foresee now, but may/will become a problem in the future. The API client should play dumb so you can route requests appropriately if an app server dies, changes, gets sharded again etc.
What happens if your app code or database architecture is slow? (Not both at the same time, just one). Now you have a db shard slowing down an app shard.
Your db+app shards will need to keep both app code+memory and db code+memory in RAM. This means the CPUs will spend more time swapping code and memory in and out to perform both sets of tasks.
I'm finding it hard to put into words, but this type of architecture screams 'bad coupling' and 'no separation of concerns' (probably not the right terminology, but I hope you understand what I mean). You are putting two distinctly different types of applications (app server and database) onto one box. Updating them and routing around failed instances will be a management nightmare.
I hate to argue my point this way, but a lot of very smart people have dealt with these problems before and I've never heard of this type of architecture. There's probably a reason for that. Not to mention there are a lot of technologies and resources out there that can help you handle traditional sharding and load balancing of app and database servers. If you go with your client's suggested architecture, you're on your own.

How to upgrade my asp.net app to support more users?

When an ASP.NET website has about 1,000 active users, it works well.
What should I do if the website has about 100,000 active users?
How do I upgrade my ASP.NET app to support a larger number of users?
By changing the web app's architecture?
Or by buying more web servers?
I just wonder how, in the real world, people build ASP.NET websites that support millions of users. What app architecture does such a website need?
Any suggestions will be welcome.
First, make sure you're with a first rate hosting provider.
Second, download a performance profiler (I always suggest Red Gate Performance Profiler) and profile your app. Find the bottlenecks and eliminate them. Repeat until you get your desired performance metric.
If your application queries a database or other web services, try to use asynchronous methods. Using async methods frees up the web server to handle many more client requests while it waits for a response from the database server or web service.
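A minimal sketch of that, assuming ASP.NET Core MVC and a hypothetical IOrderRepository; while the awaited database call is in flight, the request thread returns to the pool and can serve other requests.

```csharp
// Sketch of an asynchronous controller action. IOrderRepository and Order
// are hypothetical placeholders for your own data access layer.
using Microsoft.AspNetCore.Mvc;
using System.Threading.Tasks;

public record Order(int Id, decimal Total);

public interface IOrderRepository
{
    Task<Order[]> GetRecentOrdersAsync(int customerId);
}

public class OrdersController : Controller
{
    private readonly IOrderRepository _orders;

    public OrdersController(IOrderRepository orders) => _orders = orders;

    public async Task<IActionResult> Recent(int customerId)
    {
        // The request thread is not blocked while the database call completes.
        var orders = await _orders.GetRecentOrdersAsync(customerId);
        return Json(orders);
    }
}
```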
You say it "works well" at the moment. It's impossible to know at what point that may change without knowing a whole lot more about the nature of your traffic, your current setup, what else runs on the server, etc. It could be that it continues to work well with a million users as it is.
When you need to make changes (and slowly degrading performance will alert you), that's when you need to worry. And then, as Justin says, knowing the potential bottlenecks will give you pointers as to what solution you need.
Buying more servers is one strategy. So is changing the architecture. The easiest and most cost-effective is throwing more servers at it. It does depend a little bit on the current application architecture, but nothing that can't be easily overcome.
What I suggest is to load test your application. See what happens as you increase the active users. Who knows, it might handle 100k active users, maybe it won't, but at least you will know the tipping point.
In regards to what you should do, that really depends on your business needs. If your company has the $$ and this is a core product, then it makes sense to architect a robust application. If it's not, maybe throwing hardware at the problem is good enough.
It would also help if you could define an active user. Is it someone who is visiting your site and has a session? Is it 100k concurrent requests to the server...?
In terms of hardware scaling: Scaling Up or Scaling out
Software scaling - Profile your app

How robust is the asp.net profile system? Is it ready for prime time?

I've used asp.net profiles (using the AspNetSqlProfileProvider) for holding small bits of information about my users. I started to wonder how it would handle a robust profile for a large number of users. Does anyone have experience using this on a large website with large numbers of simultaneous users? What are the performance implications? How about maintenance?
Running against this via SQL is a bit tricky, I have found, but I have worked with clients that have scaled it up to a few hundred properties and 10K+ users without difficulty. Granted, that's not a lot of users, but it is working thus far.
I think it really depends on the specific project and your exact needs when it comes to working with the profile information. Do you need to query it regularly via SQL? Do you just need it for user display only? These sorts of details might help provide a more solid answer for your needs.
The SQL provider's performance is closely correlated with big-iron throughput: it is more or less directly proportional to a single SQL Server's ability to handle the volume of queries. Scale-up is the only option, so as such it's not really five-nines robust out of the box.
You'll have to figure out whether you need scale-out performance and availability, e.g. through partitioning, replication, redundancy etc., and at what cost to performance. Some of these capabilities are possible as is - the current implementation is aimed more at the middle market and enterprise.
The good thing is that you can plug in your own implementation of the profile provider and then attach it to services and systems with the capabilities outlined above.
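A rough skeleton of such a custom provider, assuming classic ASP.NET's System.Web.Profile. Only the two members that actually read and write property values are fleshed out, and MyProfileStore is a hypothetical stand-in for whatever scale-out store (partitioned SQL, LDAP, etc.) you choose.

```csharp
// Sketch of a custom profile provider for classic ASP.NET (System.Web.Profile).
// MyProfileStore is hypothetical; the management members are stubbed for brevity.
using System;
using System.Configuration;
using System.Web.Profile;

public class ScaleOutProfileProvider : ProfileProvider
{
    public override string ApplicationName { get; set; }

    public override SettingsPropertyValueCollection GetPropertyValues(
        SettingsContext context, SettingsPropertyCollection collection)
    {
        var username = (string)context["UserName"];
        var values = new SettingsPropertyValueCollection();
        foreach (SettingsProperty property in collection)
        {
            values.Add(new SettingsPropertyValue(property)
            {
                // Hypothetical store call; swap in your own backend.
                PropertyValue = MyProfileStore.Read(username, property.Name)
            });
        }
        return values;
    }

    public override void SetPropertyValues(
        SettingsContext context, SettingsPropertyValueCollection collection)
    {
        var username = (string)context["UserName"];
        foreach (SettingsPropertyValue value in collection)
        {
            if (value.IsDirty)
                MyProfileStore.Write(username, value.Name, value.PropertyValue);
        }
    }

    // The remaining abstract members would delegate to the same store.
    public override int DeleteProfiles(ProfileInfoCollection profiles) => throw new NotImplementedException();
    public override int DeleteProfiles(string[] usernames) => throw new NotImplementedException();
    public override int DeleteInactiveProfiles(ProfileAuthenticationOption authenticationOption, DateTime userInactiveSinceDate) => throw new NotImplementedException();
    public override int GetNumberOfInactiveProfiles(ProfileAuthenticationOption authenticationOption, DateTime userInactiveSinceDate) => throw new NotImplementedException();
    public override ProfileInfoCollection GetAllProfiles(ProfileAuthenticationOption authenticationOption, int pageIndex, int pageSize, out int totalRecords) => throw new NotImplementedException();
    public override ProfileInfoCollection GetAllInactiveProfiles(ProfileAuthenticationOption authenticationOption, DateTime userInactiveSinceDate, int pageIndex, int pageSize, out int totalRecords) => throw new NotImplementedException();
    public override ProfileInfoCollection FindProfilesByUserName(ProfileAuthenticationOption authenticationOption, string usernameToMatch, int pageIndex, int pageSize, out int totalRecords) => throw new NotImplementedException();
    public override ProfileInfoCollection FindInactiveProfilesByUserName(ProfileAuthenticationOption authenticationOption, string usernameToMatch, DateTime userInactiveSinceDate, int pageIndex, int pageSize, out int totalRecords) => throw new NotImplementedException();
}

// Hypothetical data-access stand-in used above.
static class MyProfileStore
{
    public static object Read(string username, string property) => null;
    public static void Write(string username, string property, object value) { }
}
```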
We wrote custom authn, authz and profile providers and strapped them to a large AD/LDS LDAP cluster across 3 datacenters. We're in the Comscore Top 10, so you could say that we deal with a good slice of the internet every day. Thousands of profile queries per second and hundreds of millions of profiles - it can scale with good planning, engineering and operations.
