One of the interviewers asked me about the ways to filter a DataView. I replied:
(A) DataView
(B) RowFilter
(C) Select
Is there any other way apart from those mentioned above?
Besides those options, you can also use LINQ to DataSet to filter data in memory.
Additionally, a superior answer in an interview would ask whether filtering a DataSet is the best approach in a given situation. Sometimes it is best to cache the data and filter in memory, and sometimes it is better to add the filters to the original SQL call and let the database do the filtering. Neither option is correct in all situations; it's case by case.
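For illustration, here is a minimal sketch of those in-memory options side by side, assuming a small DataTable (the table and column names are made up):

using System;
using System.Data;
using System.Linq; // LINQ to DataSet extensions (System.Data.DataSetExtensions)

class FilterDemo
{
    static void Main()
    {
        var table = new DataTable("Products");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Price", typeof(decimal));
        table.Rows.Add("Widget", 5m);
        table.Rows.Add("Gadget", 15m);

        // (A)/(B) DataView with a RowFilter expression
        var view = new DataView(table) { RowFilter = "Price > 10" };

        // (C) DataTable.Select with the same expression syntax
        DataRow[] rows = table.Select("Price > 10");

        // LINQ to DataSet: filter in memory with a lambda
        var expensive = table.AsEnumerable()
                             .Where(r => r.Field<decimal>("Price") > 10);

        Console.WriteLine(view.Count);        // 1
        Console.WriteLine(rows.Length);       // 1
        Console.WriteLine(expensive.Count()); // 1
    }
}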
In my opinion a good interview question and answer is more of a discussion of options and pros and cons as opposed to just knowing the answer to some random question.
I'm designing an API and I want to allow my users to combine a GET parameter with AND operators. What's the best way to do this?
Specifically I have a group_by parameter that gets passed to a Mongo backend. I want to allow users to group by multiple variables.
I can think of two ways:
?group_by=alpha&group_by=beta
or:
?group_by=alpha,beta
Is either one to be preferred? I've consulted a few API design references, but none of them seems to take a position on this.
There is no strict preference. The advantage to the first approach is that many frameworks will turn group_by into an array or similar structure for you, whereas in the second approach you need to parse out the values yourself. The second approach is also less verbose, which may be relevant if your query string is particularly large.
With the first approach, you may also want to test that the query parameters always reach your framework in the order the client sent them. Some frameworks have a bug where that doesn't happen.
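For illustration, a sketch of handling both styles on the server using Python's standard-library query parser (your framework may already do the first one for you):

from urllib.parse import parse_qs

# Style 1: repeated parameter; the parser yields a list directly
params = parse_qs("group_by=alpha&group_by=beta")
groups = params.get("group_by", [])        # ['alpha', 'beta']

# Style 2: comma-separated value; split it out yourself
params = parse_qs("group_by=alpha,beta")
raw = params.get("group_by", [""])[0]
groups = [g for g in raw.split(",") if g]  # ['alpha', 'beta']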
I've been looking for a proper implementation of a hash map in R, with functionality similar to the map type in Python.
After some googling and searching the R documentation, I found that environments and named lists are the ONLY options I can use (is that really so?).
But the problem with both is that they can only take characters as keys for the hashing, not even a number, let alone other types of things.
So is there a way to use arbitrary things as keys, or at least more than just characters?
Or is there a better implementation of a hash map, with better functionality, that I didn't find?
Thanks in advance.
Edit:
My current problem: I need a map to store the distance relationship between data points. That is, the key of the map is a tuple (p1, p2) and the value is a number.
The reason I asked a generic question instead of a concrete one is that I'm learning R recently and I want to know how to manipulate some of the most fundamental data structures, not only what my problem refers to. So I may need to use other things as key in the future, and I want to avoid asking similar questions with only minor difference every time I run into them.
Edit 2:
I got a lot of very good advice on this topic. It seems I'm still thinking in a Pythonic way rather than the R way; I should really get more R-ly! I think my purpose can easily be satisfied by a matrix in R. Thanks all!
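For reference, a minimal sketch of that matrix approach (the point coordinates are made up):

# Pairwise distances as a matrix: row/column names act as the keys
pts <- matrix(c(0, 0, 3, 4, 6, 8), ncol = 2, byrow = TRUE,
              dimnames = list(c("p1", "p2", "p3"), NULL))
d <- as.matrix(dist(pts))  # Euclidean distances between the rows
d["p1", "p2"]              # 5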
The reason people keep asking you for a specific example is that most problems for which hash tables are the appropriate technique in Python have a good solution in R that does not involve hash tables.
That said, there are certainly times when a real hash table is useful in R, and I recommend you check out the hash package for R. It uses environments as its base but lets you do a lot of R-like vector work with them. It's efficient and I've never run into a problem with it.
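For example, a minimal sketch with the hash package (assuming it is installed; encoding the point pair as a single string key is my own convention):

library(hash)

h <- hash()
h[["p1|p2"]] <- 0.37  # distance between points p1 and p2
h[["p1|p3"]] <- 1.42

h[["p1|p2"]]          # 0.37
has.key("p1|p3", h)   # TRUE

# Plain environments work similarly, with no extra dependency:
e <- new.env(hash = TRUE)
assign("p1|p2", 0.37, envir = e)
get("p1|p2", envir = e)  # 0.37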
Just keep in mind that if you're using hash tables a lot while working with R and your code is running slowly or is buggy, you may be able to get some mileage from figuring out a more R-like way of doing it :)
How do I choose the right keys for data.table objects?
Are the considerations similar to those for RDBMSs? My first guess was to have a look for some documentation about indexes and keys for RDBMSs. Google came up with this helpful stackoverflow question related to Oracle.
Do the considerations from that answer apply to data.tables? Perhaps with the exception of those relating to UPDATE, INSERT or DELETE type statements? I'm guessing that our data.tables objects won't really be used in that way.
I'm trying to get my head around this stuff by using the documentation and examples, but I haven't seen any discussion on key selection.
PS: Thanks to @crayola for pointing me toward the data.table package in the first place!
I am not sure this is a very helpful answer, but since you mention me in the question I'll say what I think anyway. But remember that I am a bit of a data.table newbie myself.
I personally only use keys when there is a clear benefit to doing so, e.g. merging data.tables, or when it seems clear that doing so will speed things up (e.g. subsetting repeatedly on a variable).
But to my knowledge, there is sometimes no real need to define keys at all; the package is already faster than data.frame without keys.
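For illustration, a minimal sketch of those two cases (the column names are made up):

library(data.table)

dt <- data.table(id = c("a", "b", "c"), value = 1:3)
setkey(dt, id)  # sorts by id and marks it as the key

dt["b"]         # keyed subset: binary search instead of a vector scan

lookup <- data.table(id = c("a", "c"), label = c("first", "third"))
setkey(lookup, id)
dt[lookup]      # keyed join on id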
Possible Duplicate:
How do you recommend implementing tags or tagging
I have a website with a database that contains a number of articles. I'd like to implement tags similar to the tags on stackoverflow.
I can think of two basic ways to implement them:
1. Create a separate Tags table plus a join table, giving a many-to-many relationship with my Articles table.
2. Add a Tags text field to my Articles table.
The first approach seems best, but it would require two additional tables that could grow quite large, and there seems to be considerable overhead in updating and maintaining that data.
The second approach would be far easier to implement and maintain, and would use fewer resources. But searching would be less efficient; I'd probably use LIKE or maybe even full-text searching.
I'm interested in which approach others think is best. Or perhaps there's another approach altogether.
I would personally go with option 1. You mention two additional tables later, so I assume you're thinking of something like this:
Table: Tag
Fields: TagID, TagName

Table: TagArticle
Fields: ArticleID, TagID

Table: Article
Fields: ArticleID, blah, blah, blah
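In SQL that might look something like this (a sketch in SQL Server syntax; the Article table is assumed to exist already, and the column types are guesses):

CREATE TABLE Tag (
    TagID   INT IDENTITY PRIMARY KEY,
    TagName NVARCHAR(100) NOT NULL UNIQUE
);

-- Junction table giving the many-to-many relationship
CREATE TABLE TagArticle (
    ArticleID INT NOT NULL REFERENCES Article(ArticleID),
    TagID     INT NOT NULL REFERENCES Tag(TagID),
    PRIMARY KEY (ArticleID, TagID)
);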
This shouldn't require much more storage than dumping the tags into a field on Article. Plus, it is normalised, which will always stand you in good stead for the future and will leave your database far better able to search for articles by tag. As for updating, chances are you'll only be updating occasionally compared to the number of times you're reading, so the impact should be negligible. I can't think of any maintenance tasks beyond ensuring your indices are up to date, which you'll have to do on other tables anyway and should be automated.
Fringe benefits mean you can quickly create things like a top tags list or a tag cloud.
The first option is clearly the best of the two. This works with the relational model, and leaves your data normalized. The second option works against the relational model, and breaks normalization. How are you going to run queries such as "give me the top 10 most popular tags"? Or "how many times has the tag 'x' been used?" These queries become trivial with option 1, especially as (assuming Robb's schema) you can keep a Count column against each tag.
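For example, both queries are simple joins against the junction table (again SQL Server syntax; swap TOP for LIMIT on other databases):

-- The ten most popular tags
SELECT TOP 10 t.TagName, COUNT(*) AS Uses
FROM Tag t
JOIN TagArticle ta ON ta.TagID = t.TagID
GROUP BY t.TagName
ORDER BY Uses DESC;

-- How many times tag 'x' has been used
SELECT COUNT(*)
FROM TagArticle ta
JOIN Tag t ON t.TagID = ta.TagID
WHERE t.TagName = 'x';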
Option 2 gains you a slight simplification for a great loss in functionality (and in the long run, I contend, efficiency too). The relational model is tried, tested and works! Use it!
I am currently in the analysis phase of developing a sort of locale-based stock screener (please see Google's stock screener for similar work), and I would appreciate advice from the SO experts.
Firstly, the stock screener would obviously need to store the formulae required to perform calculations. My initial conclusion is that the formulae would need to be stored in the database layer. What are your ideas on this? Could I improve speed (very important) by storing the formulae in a flat file (XML/TXT) instead?
Secondly, I would also like advice on the internal execution of formulae by the application. Currently I am leaning towards executing formulae on parameters at run time, as opposed to running the formulae whenever parameters are provided to the system and storing the results in the DB for simple retrieval later (my local stock exchange currently does NOT support real-time stock price updates). While I am quite certain that the initial plan (executing at run time) is better to start with, the application could potentially handle a wide variety of formulae as well as a wide variety of input parameters. What are your thoughts on this?
I have also gone through SO to find information on how to store formulae in a DB, but I wanted to ask about possible ways to resolve recursive formulae, i.e. formulae which require the results of other formulae to perform their calculations. I wouldn't mind pointers to other questions or fora at all.
[EDIT]
This page provides a lot of information about what I am trying to achieve, but what is different is that I need to design some formulae with SPECIAL tokens, such as SP, which would represent the stock price for the current day, while SP(-1) would represent the price for the previous day. These special tokens would require the application to perform some sort of DB access to retrieve the values they are replaced with.
An example formula would be:
(SP/SP(-1)) / 100
which calculates the price change for securities. My idea is to replace the SP tokens with the values for the security when requested by the user, and THEN perform the calculation and send the result to the user.
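For illustration, a minimal sketch of that token-substitution idea (the PRICES table and function names are hypothetical; a real implementation would query the DB and use a proper expression parser rather than eval):

import re

# Hypothetical in-memory stand-in for the DB lookup: offset 0 is today,
# -1 is the previous trading day, and so on.
PRICES = {"ACME": {0: 102.0, -1: 100.0}}

def resolve_tokens(formula: str, symbol: str) -> str:
    """Replace SP and SP(n) tokens with prices for the given security."""
    def repl(match):
        offset = int(match.group(1)) if match.group(1) else 0
        return str(PRICES[symbol][offset])
    # Matches bare SP as well as SP(-1), SP(-2), ...
    return re.sub(r"SP(?:\((-?\d+)\))?", repl, formula)

expr = resolve_tokens("(SP / SP(-1)) / 100", "ACME")
print(expr)        # (102.0 / 100.0) / 100
print(eval(expr))  # 0.0102 (never use eval on untrusted input)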
Thanks a lot for all your assistance.
Kris, I don't mean to presume that I have a better understanding of your requirements than you do, but by coincidence I read this article this afternoon, after I posted my earlier comment:
http://thedailywtf.com/Articles/Soft_Coding.aspx
Are you absolutely sure that the "convenience" of updating formulae without recompiling code is worth the maintenance headache that such a solution may become down the line?
I would strongly recommend that you hard code your logic unless you want someone without access to the source to be updating formulae on a fairly regular basis.
And I can't see this happening too often anyway, given that the particular domain here, stock prices, has a well established set of formulae for calculating the various relevant metrics.
I think your effort will be much better spent in making a solid and easily extensible "stock price" framework, or even searching for some openly available one with a proven track record.
Anyway, your project sounds very interesting, I hope it works out well whatever approach you decide to take. Good luck!