Amazon API ASP.NET

I am working on a site (ASP.NET MVC) that will ultimately display millions of books. I already have the titles and authors of these books in my MySQL database.
The goal is that when a user searches for a book, the top 20 matches (title and author) will appear on the page. I then plan to use the Amazon API to get more information (ISBN, image, description, etc.) for these 20 books and flesh out the results via Ajax. I would then also add this info to MySQL so that the next time these specific books are requested, I already have the data.
My question is: which Amazon Web Service should I use? There are so many (Amazon S3, Amazon SimpleDB, etc.) that I just don't know which would best fit my needs. Cost is also a factor.
Any guidance would be greatly appreciated.

The API you're looking for is Amazon's Product Advertising API:
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
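For what it's worth, Product Advertising API calls are just signed HTTP GET requests, so no special SDK is required from ASP.NET. Below is a minimal sketch of building a signed ItemLookup URL for an ISBN; the credential placeholders and the chosen ResponseGroup values are assumptions, and the exact signing rules should be verified against the current API documentation.

```csharp
// Minimal sketch: build a signed Product Advertising API ItemLookup URL.
// AWS_ACCESS_KEY, AWS_SECRET_KEY and ASSOCIATE_TAG are placeholders.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class PaapiSketch
{
    const string Host = "webservices.amazon.com";

    public static string BuildSignedItemLookupUrl(string isbn)
    {
        // Request parameters must be sorted by byte order before signing.
        var p = new SortedDictionary<string, string>(StringComparer.Ordinal)
        {
            ["Service"]        = "AWSECommerceService",
            ["Operation"]      = "ItemLookup",
            ["ItemId"]         = isbn,
            ["IdType"]         = "ISBN",
            ["SearchIndex"]    = "Books",
            ["ResponseGroup"]  = "ItemAttributes,Images,EditorialReview",
            ["AWSAccessKeyId"] = "AWS_ACCESS_KEY",
            ["AssociateTag"]   = "ASSOCIATE_TAG",
            ["Timestamp"]      = DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ssZ")
        };

        string query = string.Join("&",
            p.Select(kv => Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));

        // Sign "GET\n{host}\n/onca/xml\n{query}" with HMAC-SHA256 of your secret key.
        string toSign = "GET\n" + Host + "\n/onca/xml\n" + query;
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes("AWS_SECRET_KEY"));
        string signature = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(toSign)));

        return "https://" + Host + "/onca/xml?" + query + "&Signature=" + Uri.EscapeDataString(signature);
    }
}
```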

In short, Amazon S3 is a technology oriented toward large-scale file storage, while SimpleDB is a non-relational database (in the same family as MongoDB and RavenDB).
We use the former for storing static files (JavaScript, CSS, and images).
The former is cheaper, but you can only retrieve one "file" at a time; the latter gives you some degree of query support.
If you need a relational database, you could use Amazon RDS, which is a managed MySQL database with built-in replica support.
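To make the difference concrete, here is a small sketch with the AWS SDK for .NET; the bucket, key, and domain names are invented for illustration:

```csharp
// Contrast of the two access models; names are made up for illustration.
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;
using Amazon.SimpleDB;
using Amazon.SimpleDB.Model;

static class StorageComparison
{
    public static async Task DemoAsync()
    {
        // S3: you address one object ("file") at a time by bucket + key.
        using var s3 = new AmazonS3Client();
        GetObjectResponse file = await s3.GetObjectAsync("my-bucket", "css/site.css");

        // SimpleDB: a limited SQL-like SELECT over attribute data.
        using var sdb = new AmazonSimpleDBClient();
        SelectResponse rows = await sdb.SelectAsync(
            new SelectRequest("SELECT * FROM books WHERE author = 'Tolkien' LIMIT 20"));
    }
}
```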

Related

Archive Data from Cosmos DB - Mongo DB

In the project I am working on, we have a database per tenant, and each tenant consists of at least one department. One of the requirements we have is that when an admin user deletes a department using a custom frontend we've provided, the system should first archive that department's data to blob storage before the data is deleted. We have the same requirement for the tenant: we need to archive the data before the tenant's database is removed from the account.
Now, my question: is there any best practice for doing this? We are planning to retrieve all the data from all collections, using a Mongo query based on the department id (which is also the partition key), and then send it to blob storage. The challenge we have is the execution of the query that retrieves all the data, because it can be a huge amount, and the RUs required for that action may affect the performance of the system, since other users may be using it while we remove the data.
I looked at mongodump and mongoexport, but these are standalone applications, so we cannot execute them from our code, can we?
Any ideas? Thanks a lot.
I think one way to solve this is by using the Change Feed, as it really helps and simplifies writing a carbon copy somewhere else.
However, as of now the change feed processor won't notify you of deleted documents, so you can't listen for those; that feature is currently planned.
Your best bet is to write a custom application that does the archiving using the query language support; a sketch of that approach follows.
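A minimal sketch of that custom-application approach, assuming the standard MongoDB .NET driver can reach the Cosmos DB Mongo API account and that Azure Blob Storage is the archive target; connection strings, names, and the batch size are placeholders:

```csharp
// Stream a department's documents out in batches and archive them as JSON blobs.
// Connection strings and database/collection/container names are placeholders.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using MongoDB.Bson;
using MongoDB.Driver;

static class DepartmentArchiver
{
    public static async Task ArchiveAsync(string departmentId)
    {
        var mongo = new MongoClient("<cosmos-mongo-connection-string>");
        var collection = mongo.GetDatabase("tenantDb")
                              .GetCollection<BsonDocument>("departmentData");
        var archive = new BlobContainerClient("<storage-connection-string>", "archives");

        var filter = Builders<BsonDocument>.Filter.Eq("departmentId", departmentId);

        // Small batches keep the RU consumption spread out so other users
        // of the system are not starved while the export runs.
        int batch = 0;
        using var cursor = await collection
            .Find(filter, new FindOptions { BatchSize = 100 })
            .ToCursorAsync();

        while (await cursor.MoveNextAsync())
        {
            string json = new BsonArray(cursor.Current).ToJson();
            BlobClient blob = archive.GetBlobClient($"{departmentId}/batch-{batch++}.json");
            await blob.UploadAsync(BinaryData.FromString(json));
        }
    }
}
```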

Is there a reason not to model a social network with the SQL API?

In one of our apps we need to introduce a feature where users can choose friends, see friends' activity, etc.
So far we use a CosmosDb container with the SQL API (for the things the app does besides this social-network aspect).
I am wondering: is there a reason not to model it with the SQL API, and to go strictly with Gremlin instead?
I've seen examples on Microsoft's site of modeling a basic social network with the ordinary SQL API, but I am not sure whether I am missing something that would bite me down the road if I don't go with Gremlin.
You should be safe choosing either. From the docs:
Each API operates independently, except the Gremlin and SQL API, which are interoperable.
The stored data is JSON in both cases.
More on choosing an API.
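To illustrate the interoperability point, a social edge like a friendship can be stored and queried as an ordinary document via the SQL API; here is a minimal sketch with the .NET SDK, where the container layout and property names are assumptions:

```csharp
// Friendships as plain documents: edges become rows you filter on,
// instead of Gremlin traversals. Property names are illustrative.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public record Friendship(string id, string userId, string friendId);

public static class FriendshipQueries
{
    public static async Task<List<Friendship>> GetFriendsAsync(Container container, string userId)
    {
        var query = new QueryDefinition("SELECT * FROM c WHERE c.userId = @userId")
            .WithParameter("@userId", userId);

        var results = new List<Friendship>();
        FeedIterator<Friendship> iterator = container.GetItemQueryIterator<Friendship>(query);
        while (iterator.HasMoreResults)
            results.AddRange(await iterator.ReadNextAsync());
        return results;
    }
}
```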

How do services like Jungle Scout and ASINspector provide Amazon review/rating data if Amazon doesn't provide that info through their API?

I'm trying to write a tool that looks up the average review score and number of ratings for a given Amazon product.
Unfortunately, Amazon seems to intentionally exclude those two things from their API, which has been the subject of many forum threads.
You can technically scrape a product page's HTML and get it, but Amazon will quickly notice that you're running a script and begin serving a CAPTCHA, furthering the idea that they don't want you to collect it.
But with all of that being the case, how do third-party services collect and serve that data? Are they violating Amazon's TOS and collecting it through shady means, or is there some kind of legitimate method that I'm not seeing?

Searchable data in CosmosDb with the Graph API

My team uses CosmosDb to store data.
For our use case some of this data needs to be searchable.
Some Gremlin filters have been implemented in CosmosDb so far, but not enough to suit our needs, which are mainly text search.
We would use this for a fuzzy search over a vertex, say a person, where name, email, and company name would all be included in the searchable text.
In https://github.com/Azure/azure-documentdb-dotnet/issues/413 there was some talk of string filters, but there have been no updates for a while.
My question is: would it be better to use Azure Search for this use case?
We could add a step in the pipeline that synchronizes our data to an Azure Search service on every CRUD operation, but this would mean slower CRUD as well as data duplication. The consumer of our API would also have to use a search endpoint to get an id, and then do an additional lookup afterwards to get any related data.
If you can expose the data you want to make searchable to Azure Search using a "vanilla" (non-Gremlin) SQL API query, consider using the Azure Search indexers for Cosmos DB. However, for simple string matching Azure Search may be overkill; use it if you need more sophisticated searches (natural-language awareness in many languages, custom tokenization, custom scoring, etc.).
If you need a tighter integration between Cosmos DB Graph API and Azure Search, vote for this UserVoice suggestion.
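If you do end up in front of Azure Search, the consumer-side fuzzy query could look roughly like the sketch below, using the Azure.Search.Documents SDK; the service URL, index name, and field names are assumptions:

```csharp
// Fuzzy search over person vertices that were indexed into Azure Search.
// Service URL, key, index and field names are placeholders.
using System;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

static class PersonSearch
{
    public static async Task FuzzyFindAsync(string text)
    {
        var client = new SearchClient(
            new Uri("https://<service>.search.windows.net"),
            "people-index",
            new AzureKeyCredential("<query-key>"));

        // Full Lucene syntax enables the trailing ~ fuzzy operator.
        var options = new SearchOptions { QueryType = SearchQueryType.Full };
        options.SearchFields.Add("name");
        options.SearchFields.Add("email");
        options.SearchFields.Add("companyName");

        SearchResults<SearchDocument> results =
            await client.SearchAsync<SearchDocument>(text + "~", options);

        // The returned id is then used for the follow-up Cosmos DB lookup.
        await foreach (SearchResult<SearchDocument> hit in results.GetResultsAsync())
            Console.WriteLine(hit.Document["id"]);
    }
}
```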

Putting records into the Elasticsearch index before the relational database

I have an application which consumes RSS feeds and makes them searchable by performing the following steps:
pulling articles from the feed URL
storing that data in a relational DB
indexing the data in Elasticsearch
I'd like to reverse this process so that I can use the RSS River Elasticsearch plugin to pull data from feeds. However, this plugin integrates directly with Elasticsearch, bypassing my relational DB (which is a problem for other parts of the application which rely on each article having a record in the DB).
How can I have Elasticsearch notify the DB when a new article has been indexed (and de-indexed)?
Edit
Currently I'm using Ruby on Rails 4 with a PostgreSQL DB. RSS feeds are fetched in the background using Sidekiq to manage jobs. Articles go directly into PG and are then indexed by Elasticsearch. I'm using Chewy to provide an interface to the ES index. It doesn't support the kind of callbacks I'm looking for (no Ruby library does, AFAIK).
Searching queries ES for matches, then loads the records from PG to display the results.
It sounds like you are looking for the sort of notification/trigger functionality described in this feature request. In the absence of that feature I think the approach suggested in that thread by the user "cravergara" is your best bet - that is, you can alter the RSS river Elasticsearch plugin to update your DB whenever an article is indexed.
That would handle the indexing requirement. To sync the de-indexing, you should make sure that any code that deletes your Elasticsearch documents also deletes the corresponding DB records.
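The question's stack is Rails, but the delete-coupling pattern itself is language-agnostic; here is a minimal sketch of it using the .NET NEST client, with a hypothetical IArticleRepository standing in for the relational DB layer:

```csharp
// Couple de-indexing and DB deletion so search never returns an article
// the application no longer has a record for. IArticleRepository is hypothetical.
using System.Threading.Tasks;
using Nest;

public interface IArticleRepository
{
    Task DeleteArticleAsync(string articleId);
}

public class ArticleRemover
{
    private readonly IElasticClient _es;
    private readonly IArticleRepository _db;

    public ArticleRemover(IElasticClient es, IArticleRepository db)
    {
        _es = es;
        _db = db;
    }

    public async Task DeleteAsync(string articleId)
    {
        // Remove the search document first, then the relational record,
        // preserving the invariant "everything searchable exists in the DB".
        await _es.DeleteAsync(new DeleteRequest("articles", articleId));
        await _db.DeleteArticleAsync(articleId);
    }
}
```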
