GAE - Any performance benefits from storing shorter names for datastore fields? - objectify

Do we get any performance benefit from giving datastore fields shorter names? That is, might shorter names speed up serialization and deserialization when pulling data out of the datastore?
Example:
Before:
@Entity
public class Data
{
    // Id
    private int id;
    // Name
    private String name;
    // Marks
    private long marks;
}
After:
@Entity
public class Data
{
    // Id
    private int id;
    // Name
    private String n;
    // Marks
    private long m;
}
Mainly for the case where we fetch many records (up to 1000) at once?

A typical put operation takes about 50 to 100 ms, a get 10 to 20 ms, and a query 20 to 100 ms. (You can check this with Appstats: https://developers.google.com/appengine/docs/java/tools/appstats.) Most of this time is spent waiting on the network or disk. What affects performance far more is the number of properties that require indexing. The size of the whole entity matters little for performance. Considering that even the slowest instances run at 600 MHz, shaving a few characters off a field name is negligible.
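If write cost is the real concern, a more effective lever than short property names is marking fields you never filter or sort on as unindexed, since each indexed property adds index writes to every put(). A minimal sketch, assuming Objectify 4's annotations (@Index/@Unindex; Objectify 3 used @Indexed/@Unindexed) and keeping the readable field names:
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;
import com.googlecode.objectify.annotation.Unindex;

@Entity
public class Data
{
    @Id
    private Long id;        // datastore key

    @Index
    private String name;    // filtered/sorted on in queries, so keep the index

    @Unindex
    private long marks;     // never queried, so no index writes on put()
}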

Related

Java DynamoDBMapper - partially mapped entities - amount of Read Capacity Units

Does the Java DynamoDBMapper load whole items when the @DynamoDBTable-annotated class maps only a subset of their attributes?
Example: a "Product" table holding items with these attributes: id, name, description. I would like to get the names of several products without loading the description (which would be a huge amount of data).
Does this code load the description from DynamoDB?
@DynamoDBTable(tableName = "Product")
public class ProductName {
    private UUID id;
    private String name;

    @DynamoDBHashKey
    @DynamoDBTyped(DynamoDBAttributeType.S)
    public UUID getId() { return id; }
    public void setId(UUID id) { this.id = id; }

    @DynamoDBAttribute
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
...
DynamoDBMapper dynamoDBMapper = ...
dynamoDBMapper.batchLoad(products); // TODO is description loaded? what is the amount of Consumed Read Capacity Units?
As their docs say:
DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application. For this reason, the number of capacity units consumed will be the same whether you request all of the attributes (the default behavior) or just some of them (using a projection expression). The number will also be the same whether or not you use a filter expression.
As you can see, projections do not affect the number of capacity units consumed.
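As a rough worked example (assuming strongly consistent reads, which are charged at 1 RCU per 4 KB of item size): a hypothetical 7 KB Product item costs ceil(7 / 4) = 2 read capacity units whether you project only id and name or fetch every attribute, description included.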
By the way, in your case the description field will be returned anyway, because you do not need to annotate every field with a DynamoDB annotation; only those that are keys, are named differently, or need custom converters. All non-annotated fields will be populated from the corresponding DB attributes automatically.
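If the goal is mainly to cut down on the data actually transferred (DynamoDBMapper's batchLoad does not, as far as I know, support projections), one option is to drop to the low-level client and pass a projection expression. A sketch assuming the AWS SDK for Java v1 and a Product table keyed by a string id attribute; name is a DynamoDB reserved word, hence the expression attribute name:
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
GetItemRequest request = new GetItemRequest()
        .withTableName("Product")
        .withKey(Collections.singletonMap("id", new AttributeValue("some-product-id")))
        .withProjectionExpression("id, #n")   // only id and name come back over the wire
        .withExpressionAttributeNames(Collections.singletonMap("#n", "name"));
// Consumed read capacity is still based on the full item size, per the docs quoted above.
Map<String, AttributeValue> item = client.getItem(request).getItem();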

How to use transactional DatastoreIO

I’m using DatastoreIO from my streaming Dataflow pipeline and getting an error when writing an entity with the same key.
2016-12-10T22:51:04.385Z: Error: (af00222cfd901860): Exception: com.google.datastore.v1.client.DatastoreException: A non-transactional commit may not contain multiple mutations affecting the same entity., code=INVALID_ARGUMENT
If I use a random number in the key then things work, but I need to update the same key, so is there a transactional way to do this using DatastoreIO?
static class CreateEntityFn extends DoFn<KV<String, Tile>, Entity> {
private static final long serialVersionUID = 0;
private final String namespace;
private final String kind;
CreateEntityFn(String namespace, String kind) {
this.namespace = namespace;
this.kind = kind;
}
public Entity makeEntity(String key, Tile tile) {
Entity.Builder entityBuilder = Entity.newBuilder();
Key.Builder keyBuilder = makeKey(kind, key);
if (namespace != null) {
keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
}
entityBuilder.setKey(keyBuilder.build());
entityBuilder.getMutableProperties().put("tile", makeValue(tile.toString()).build());
return entityBuilder.build();
}
@Override
public void processElement(ProcessContext c) {
String key = c.element().getKey();
// this works: key = key.concat(":" + UUID.randomUUID().toString());
c.output(makeEntity(key, c.element().getValue()));
}
}
...
...
inputData = pipeline
.apply(PubsubIO.Read.topic(pubsubTopic));
windowedDataStreaming = inputData
.apply(Window.<String>into(
SlidingWindows.of(Duration.standardMinutes(15))
.every(Duration.standardSeconds(31))));
...
...
...
//Create a Datastore entity
PCollection<Entity> siteTileEntities = tileSiteKeyed
.apply(ParDo.named("CreateSiteEntities").of(new CreateEntityFn(options.getNamespace(), options.getKind())));
// write site tiles to datastore
siteTileEntities
.apply(DatastoreIO.v1().write().withProjectId(options.getDataset()));
// Run the pipeline
pipeline.run();
Your code snippet doesn't explain how tileSiteKeyed is created. Presumably it's a PCollection<KV<String, Tile>>, but if it might contain duplicate String keys, that would explain the issue.
Generally a PCollection<KV<K, V>> may contain multiple KV pairs with the same key. If you'd like to ensure unique keys per window, you can use a GroupByKey to do that. That will give you a PCollection<KV<K, Iterable<V>>> with unique keys per window. Then augment CreateEntityFn to take an Iterable<Tile> and create a single mutation with the changes you need to make.
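A rough sketch of that approach, reusing the makeKey/makeValue helpers and the namespace/kind fields from the CreateEntityFn above (CreateGroupedEntityFn and the concatenation merge policy are placeholders; combine the tiles however your application needs):
static class CreateGroupedEntityFn extends DoFn<KV<String, Iterable<Tile>>, Entity> {
    private static final long serialVersionUID = 0;
    private final String namespace;
    private final String kind;

    CreateGroupedEntityFn(String namespace, String kind) {
        this.namespace = namespace;
        this.kind = kind;
    }

    @Override
    public void processElement(ProcessContext c) {
        Entity.Builder entityBuilder = Entity.newBuilder();
        Key.Builder keyBuilder = makeKey(kind, c.element().getKey());
        if (namespace != null) {
            keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        entityBuilder.setKey(keyBuilder.build());

        // Fold all tiles for this key into a single property so the commit
        // carries exactly one mutation for this entity key.
        StringBuilder merged = new StringBuilder();
        for (Tile tile : c.element().getValue()) {
            merged.append(tile.toString());
        }
        entityBuilder.getMutableProperties().put("tile", makeValue(merged.toString()).build());
        c.output(entityBuilder.build());
    }
}
...
// Group by key within each window, then write one entity per key.
PCollection<Entity> siteTileEntities = tileSiteKeyed
    .apply(GroupByKey.<String, Tile>create())
    .apply(ParDo.named("CreateSiteEntities")
        .of(new CreateGroupedEntityFn(options.getNamespace(), options.getKind())));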
This error indicates that Cloud Datastore received a Commit request with two mutations for the same key (i.e. it tries to insert the same entity twice or modify the same entity twice).
You can avoid the error by only including one mutation per key per Commit request.

ASP.NET MVC2 LINQ - Repository pattern, where should the pagination code go?

I'm working on adding an HtmlHelper for pagination, but I am unsure where the proper and/or most beneficial place is to put certain parts of the pagination code, from a performance and maintainability standpoint.
I am unsure whether the Skip(), Take() and Count() portions of the LINQ to SQL data manipulation should live in the repository or in the controller.
I am also unsure whether their order and placement affect performance in any way.
If they live within the repository from my understanding this is how it would work:
1. I would pass the pageIndex and pageSize as arguments to the repository's method that grabs the data from the database.
2. Then grab the full data set from the database.
3. Then store the count of TotalItems of that full data set in a variable.
4. Then apply the Skip() and Take() so the data set retains only the page I need.
5. Display the partial data set as a single page in the view.
If they live in the controller from my understanding this is how it would work:
1. I would grab the full data set from the repository and store it into a variable inside of the controller.
2. Then get the count of TotalItems for the full data set.
3. Then apply the Skip() and Take() so the data set retains only the page I need.
4. Display the partial data set as a single page in the view.
Inside the controller (I realize that TotalItems here will incorrectly be the count of the current page rather than the total):
Character[] charactersToShow = charactersRepository.GetCharactersByRank(this.PageIndex, this.PageSize);
RankViewModel viewModel = new RankViewModel
{
Characters = charactersToShow,
PaginationInfo = new PaginationInfo
{
CurrentPage = this.PageIndex,
ItemsPerPage = this.PageSize,
TotalItems = charactersToShow.Count()
}
};
Inside the repository:
public Character[] GetCharactersByRank(int PageIndex, int PageSize)
{
IQueryable<Character> characters = (from c in db.Characters
orderby c.Kill descending
select new Character {
CharID = c.CharID,
CharName = c.CharName,
Level = c.Level
});
characters = PageIndex > 1 ? characters.Skip((PageIndex - 1) * PageSize).Take(PageSize) : characters.Take(PageSize);
return characters.ToArray();
}
This code is a partial example of how I was implementing Skip(), Take() and Count() in the repository. I never actually implemented getting and returning TotalItems, because that was the point at which I realized I didn't know the proper place to put it.
Part of the reason I am unsure where to put these is that I don't know how LINQ to SQL works under the hood, and thus I don't know how to optimize for performance. Nor do I know if this is even an issue in this case.
Does it have to grab ALL the records from the database when you do a .Count() in LINQ to SQL?
Does it have to make separate queries if I do a .Count(), then later do a .Skip() and .Take()?
Are there any possible performance problems with using .Count() prior to a .Skip() and .Take()?
This is my first time using an ORM, so I'm not sure what to expect. I know I can view the queries LINQ to SQL is running; however, I feel that listening to someone with experience would be a better use of my time.
I would like to understand this more in depth, any insight would be appreciated.
I keep a generic PaginatedList class inside my Helpers folder where I also put other Helper classes.
The PaginatedList is straight out of NerdDinner, and it looks like this.
public class PaginatedList<T>: List<T>
{
public int PageIndex { get; private set; }
public int PageSize { get; private set; }
public int TotalCount { get; private set; }
public int TotalPages { get; private set; }
public PaginatedList(IQueryable<T> source, int pageIndex, int pageSize)
{
PageIndex = pageIndex;
PageSize = pageSize;
TotalCount = source.Count();
TotalPages = (int) Math.Ceiling(TotalCount / (double)PageSize);
this.AddRange(source.Skip(PageIndex * PageSize).Take(PageSize));
}
public bool HasPreviousPage
{
get
{
return (PageIndex > 0);
}
}
public bool HasNextPage
{
get
{
return (PageIndex + 1 < TotalPages);
}
}
}
I found this on the NerdDinner site that Marko mentioned above and it answered a lot of my questions.
From NerdDinner on the bottom of page 8:
IQueryable is a very powerful feature that enables a variety of interesting deferred execution scenarios (like paging and composition based queries). As with all powerful features, you want to be careful with how you use it and make sure it is not abused.
It is important to recognize that returning an IQueryable result from your repository enables calling code to append on chained operator methods to it, and so participate in the ultimate query execution. If you do not want to provide calling code this ability, then you should return back IList or IEnumerable results - which contain the results of a query that has already executed.
For pagination scenarios this would require you to push the actual data pagination logic into the repository method being called. In this scenario we might update our FindUpcomingDinners() finder method to have a signature that either returned a PaginatedList:
PaginatedList<Dinner> FindUpcomingDinners(int pageIndex, int pageSize) { }
Or return back an IList<Dinner>, and use a "totalCount" out param to return the total count of Dinners:
IList<Dinner> FindUpcomingDinners(int pageIndex, int pageSize, out int totalCount) { }

Adding/updating child and parent records at the same time

Can someone please show me the easiest way to create/update a parent and child record at the same time (like a customer with multiple addresses), with as little code as possible? Both in Web Forms and in MVC.
The basic idea would be to create/update the parent record and return the new ID (key). Then use that key to create the related child records. For example, say you have an Events table and a related EventDates table:
public static int CreateEvent(
out int eventId,
DateTime datePosted,
string title,
string venue,
string street1,
string city,
string state,
string zipCode)
{
...
}
public static void AddEventDates(
int eventDateID,
int eventID,
DateTime startDate,
DateTime endDate)
{
...
}
It's important to maintain data integrity here; if one of the updates fails then both need to be returned to the original state. You could implement this yourself or use transactions:
http://msdn.microsoft.com/en-us/library/z80z94hz%28VS.90%29.aspx

ASP.NET; Several session variables or a "container object"?

I have several variables that I need to send from page to page...
What is the best way to do this?
Just send them one by one:
string var1 = Session["var1"] == null ? "" : Session["var1"].ToString();
int var2 = Session["var2"] == null ? 0 : int.Parse(Session["var2"].ToString());
and so on...
Or put them all in some kind of container-object?
class SessionData
{
public int Var1 { get; set; }
public string Var2 { get; set; }
public int Var3 { get; set; }
}
--
SessionData data = Session["data"] as SessionData;
What is the best solution? What do you use?
A hybrid of the two is the most maintainable approach. The Session offers a low-impedance, flexible key-value pair store so it would be wasteful not to take advantage of that. However, for complex pieces of data that are always related to each other - for example, a UserProfile - it makes sense to have a deeply nested object.
If all the data that you're storing in the Session is related, then I would suggest consolidating it into a single object, like your second example:
public class UserData
{
public string UserName { get; set; }
public string LastPageViewed { get; set; }
public int ParentGroupId { get; set; }
}
And then load everything once and store it for the Session.
However, I would not suggest bundling unrelated Session data into a single object. I would break each separate group of related items into its own object. The result would be something of a middle ground between the two hardline approaches you provided.
I use a SessionHandler, which is a custom-rolled class that looks like this:
public static class SessionHandler
{
    public static string UserId
    {
        get
        {
            // HttpContext.Current is needed because this is a static class
            // with no Session property of its own.
            return HttpContext.Current.Session["UserId"] as string;
        }
        set
        {
            HttpContext.Current.Session["UserId"] = value;
        }
    }
}
And then in code I do
var user = myDataContext.Users.Where(u => u.UserId == SessionHandler.UserId).FirstOrDefault();
I don't think I've ever created an object just to bundle other objects for storage in a session, so I'd probably go with the first option. That said, if you have such a large number of objects that you need to bundle them up to make them easier to work with, you might want to re-examine your architecture.
I've used both. In general, many session variable names lead to a possibility of collisions, which makes collections a little more reliable. Make sure the collection content relates to a single responsibility, just as you would for any object. (In fact, business objects make excellent candidates for session objects.)
Two tips:
First, define all session names as public static readonly variables, and make it a coding standard to use only these static variables when naming session data.
Second, make sure that every object is marked with the [Serializable] attribute. If you ever need to save session state out-of-process, this is essential.
The big plus of an object: properties are strongly-typed.
