How to connect a program to a (school) website to extract data

This question was prompted by a program built for universities that pulls entire lists of available classes: codes, teachers, times, and locations.
To access this information, I have to log in to my university's secured website and search for individual classes. Yet I've seen programs (iPhone apps, etc.) where you search for a university and a class, and they display current, up-to-date rosters for classes.
How do these programs access this data without an API or login credentials to pull available course data?

Assuming the university doesn't have some sort of API, these apps are probably just scraping HTML data off the screen and pulling the relevant pieces out for use in their app. This can work, but it's always a pain to maintain, because sites will often change their HTML structure, which forces you to rewrite your screen scraper to compensate.
Be aware that use of certain university data can be restricted under the federal FERPA law. Since the university doesn't have a public API of some sort to get that data, you'd be wise to check with them on how they feel about you pulling data from their site for use in your app. That will avoid big problems down the line when they find out your app is grabbing data from their secured website.
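To make the scraping idea concrete, here is a minimal sketch using only Python's standard library. The HTML structure below is entirely hypothetical; a real university site will differ, and this is exactly the part you would have to rewrite whenever the site's markup changes.

```python
# Minimal screen-scraping sketch using only the standard library.
# SAMPLE_PAGE stands in for HTML fetched from the university site;
# the table layout is an assumption for illustration.
from html.parser import HTMLParser

SAMPLE_PAGE = """
<table id="courses">
  <tr><td>CS101</td><td>Intro to CS</td><td>Smith</td></tr>
  <tr><td>MATH201</td><td>Linear Algebra</td><td>Jones</td></tr>
</table>
"""

class CourseTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and self._row is not None:
            self._row.append(data.strip())

parser = CourseTableParser()
parser.feed(SAMPLE_PAGE)
courses = [{"code": r[0], "title": r[1], "teacher": r[2]}
           for r in parser.rows]
```

In practice you would also need to handle the login step (session cookies, possibly CSRF tokens), which is another moving part that breaks when the site changes.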

Related

Can Alfresco function as an archives management system?

I am trying to figure out what is the exact difference between a document management system and archives management system? For example, what is the difference between Alfresco and Archivesspace (http://www.archivesspace.org/)?
Can Alfresco function as an archives management tool? What is the difference between the two? I read there is a records management module in Alfresco; is this what is meant by archives management?
Can Alfresco be used as an Archives Management System? Yes, of course. One real world example of this is the New York Philharmonic. They digitized their musical scores and associated artifacts going back to 1842 and then made them available online for researchers. Here is a video about it.
At its heart, Alfresco is a repository that allows you to capture any type of file, secure those files, route those files through workflows, search across the files, and associate metadata with each file. What I've just described are what most people would consider the basic set of functionality present in any worthwhile document management system.
Now, what makes that specific to archival purposes? I'm not an archivist. That's a highly-specialized field. One thing that is missing from my list of functionality above is "capture" or how the artifacts you are archiving will get into the system. This depends on exactly what it is you are archiving. One might use document scanners or high-end photography equipment, for example. None of that is addressed by Alfresco. You'll have to use third-party hardware and software and then integrate it, although many integrations exist between Alfresco and third-party capture vendors.
So I would say, yes, Alfresco can be used for archives management. But perhaps more importantly than considering whether or not a piece of software can be given a label, you should be thinking about how your users will use the software and what it is they need to get done. Then focus on how each of the packages you are evaluating can be used to achieve those goals to try to figure out whether or not each package will be a fit.
The difference is that ArchivesSpace is an 'archives information management system', whereas Alfresco is a full 'content management system', which means that it can manage any type of content.
What ArchivesSpace is:
ArchivesSpace Version 1.0 was completed in August 2013. It includes basic functionality for accessioning, processing, description, digital object description, and authority control workflows for archival material, as well as for searching descriptions and exporting metadata objects such as EAD, MARCXML, MODS, Dublin Core, METS, and CSV.
http://www.archivesspace.org/developmentplan
As for Alfresco:
The Alfresco One platform allows organizations to fully manage any type of content from simple office documents to scanned images, photographs, engineering drawings and even large video files.
http://www.alfresco.com/products/one/aws?utm_expid=11184972-12.IcCW-3j6RMavigPGfjODyw.1&utm_referrer=http%3A%2F%2Fwww.alfresco.com%2F
What the difference ultimately comes down to is not what each can store, but what functionality you get in addition. ArchivesSpace seems to be a simple implementation of a document storage system that stores documents in collections with associated metadata. Alfresco also offers workflows, custom actions, previews, sites, wikis, etc.
If your specific use case is archiving documents and you want something that is already good at that, go ahead and use ArchivesSpace. If not, or if you want to expand the system in future, Alfresco will likely be able to do more, but it will also take more effort to configure for your specific use case, since you will have to create a custom content model and such.
Alfresco Records Management is for managing documents that will likely have some legal significance, such as court papers, official government department responses etc, and as such their creation and destruction need to be closely managed. As far as I can see this is not something ArchivesSpace can do.
(Full disclosure: I work for an Alfresco partner)

Executing code of a webapp from multiple tenants within a single process

Can you explain multi-tenancy in more detail? How can I check whether it is working or not?
What is an HTTP adaptor? Can we create two HTTP adaptors in a single process? Correct me if I am wrong.
A full overview of multi-tenancy concepts could run to several pages. IMHO, for a developer, multi-tenancy can be described as:
A single code base, or multiple code bases (depending on the level of multi-tenancy), set up on a server or server farm to cater to disparate tenants that may have varying user experiences and varying applications (managed through subscription), with each tenant given the feel of a dedicated application by being shown only their own data and the corresponding bills for the features of the app they are using.
If you maintain a single code base, development is more complicated, but upgrades and bug fixes become a piece of cake.
You should Google around for multi-tenancy. A sample link: http://blogs.gartner.com/alessandro-perilli/multitenancy-is-not-just-network-isolation-and-rbac/
Please feel free to post your specific focus area within multi-tenancy and the technology you are opting for, so the community's help can be more targeted.

Are there any real-world case studies on the ASP.NET Dynamic Data Framework (DDF)?

I just wrapped up an architecture review and next-gen recommendation for a client of ours that needs about the deepest level of customization I've ever seen for an application. Their desire is to customize their enterprise web application, from the UI to the back end, per customer (40+ customers needing control-level customization). The customization will even include special business rules engines and very complex logic involving the transportation industry. As much as possible, they want developer nirvana by automating everything, so that customizations can be driven by their customers with minimal to no involvement from their devs.
Based on my research, though some additional plumbing and security will need to be built in, the DDF will get them closer to their goals than anything else out there. However, they're requesting more detailed information than what I provided for them.
I really need a case-study or some other such testimony of an enterprise-level company that has successfully implemented the DDF and gives details as to the enterprise problems it solved for them. Any direction or help would very much be appreciated. Thanks!
Since it is now July, your question is probably OBE by now. However, I have designed and fielded a transportation scheduling web application (ASP.NET 4.0), currently in use by 15 facilities within the Army and Air Force, using Dynamic Data. This is a single-instance, scalable web application that adapts to customer requirements through database-resident configuration settings. I extended the field templates to use Telerik ASP.NET controls and to be configurable by user role and facility.
I have found little in the metadata approach that was much of a hindrance to providing a flexible, configurable UI.
Well, at least one word of caution. One important aspect (and selling point) of DDF is the assignment of metadata attributes to help scaffold columns and tables, and the use of new dynamic data controls to take advantage of that metadata (like QueryableFilterUserControl, DynamicDataManager, or PageAction). One aspect of that metadata, however, is that it is assigned at application start and cannot be manipulated once the application is running. Therefore different users would all be logging into basically the same metadata set, and per-user customization would be a nightmare. You can certainly set security and permissions based on group roles, but control-level customization would be difficult. I hope this helps.
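The database-resident configuration approach mentioned in the earlier answer is one way around that fixed-metadata limitation: instead of baking customization into compile-time attributes, field behaviour is looked up per role (and facility) at render time. The sketch below is language-agnostic pseudocode in Python; all table, role, and field names are hypothetical.

```python
# Hedged sketch: field behaviour looked up from a config table at render
# time, rather than from fixed metadata attributes. Keys and roles are
# illustrative, not from any real schema.
FIELD_CONFIG = {
    # (role, field) -> behaviour
    ("scheduler", "pickup_time"):   {"visible": True,  "editable": True},
    ("viewer",    "pickup_time"):   {"visible": True,  "editable": False},
    ("viewer",    "internal_notes"): {"visible": False, "editable": False},
}

# Safe default for any (role, field) pair with no explicit row.
DEFAULT = {"visible": True, "editable": False}

def field_behaviour(role, field):
    """What the field template should do for this role."""
    return FIELD_CONFIG.get((role, field), DEFAULT)

def visible_fields(role, fields):
    """Filters a scaffolded column list down to what this role may see."""
    return [f for f in fields if field_behaviour(role, f)["visible"]]
```

Because the lookup happens on every render, changing a row in the config table changes the UI without restarting the application, which is exactly what the static metadata attributes cannot do.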

Backoffice and Frontoffice to separate projects

I'm building a project using the MVC framework.
I'm at a point where I need to decide whether I should separate the front end and back office into two MVC applications.
This is to keep my solution tidy and well structured, but at the same time I don't want to increase maintenance in the long run. Can you please share your experience of the long term, when the application becomes quite large: is it better to have two separate projects, or just a Backoffice folder under the main web project?
Another concern I'm realizing is how to handle images between these two projects. A user uploads photos from the back office to a folder that resides under the back office application; how do I then display these photos through the front end, or vice versa?
Thanks
One way to think about this is to ask, "Are the fundamental actions and desires of the users different?" Then ask, "What's the scale of the back office site?" The larger it is, the more likely I'd split it. If it were just a handful of admin pages or a couple of reports, I'd live with it in the original project. In addition: "Do the front and back offices need to scale at different rates?" One million hits an hour versus 20 fulfillment staff, to make numbers up out of nowhere. And thirdly: "Do the front and back office users live in different security domains?" Being able to deploy the back office code behind the firewall would make it a touch safer.
There is overhead for the developer, but sometimes that's okay if you get clarity, security, or simplicity.
A suggestion, assuming you want to split it: split into three projects, two web front ends and a library that holds shared code and resources, like some basic database access code. Actually, three may be too few projects if you want to share helpers, etc.
I'm a huge fan of separating the apps, as the advantages tend to outweigh the disadvantages in most cases. Ball points out a number of the big points, mainly revolving around the different use cases and security contexts. The other big kickers for me are:
a) You can get moving on the back office stuff while some of the front-end details are still up for discussion, i.e. you are not blocked by marketing being all wishy-washy about what template and color scheme to use.
b) You can make different technology choices depending on the app. E.g., your back office could use traditional ASP.NET WebForms, because you are not concerned with SEO and other webby behavior, and you can guarantee what browsers and bandwidth capabilities you will have to deal with.
Overhead-wise, there really isn't too much additional work, IMHO, and most of the problems are pretty solvable. For example, your images issue could be handled by either storing the images in a database or making a common file store that both apps reference.
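A minimal sketch of the common-file-store idea: both apps reference images by a generated storage key rather than by paths inside either web project, so neither app cares where the other lives. The store location and naming scheme here are illustrative assumptions.

```python
# Shared image store sketch: the back office saves, the front end loads,
# and only the storage key travels through the database. Paths are
# illustrative (a temp dir stands in for a shared network location).
import os
import tempfile
import uuid

STORE_ROOT = os.path.join(tempfile.gettempdir(), "shared-image-store")

def save_image(data: bytes, extension: str = ".jpg") -> str:
    """Called from the back office on upload; returns a storage key
    to persist in the database row for the uploaded photo."""
    os.makedirs(STORE_ROOT, exist_ok=True)
    key = uuid.uuid4().hex + extension
    with open(os.path.join(STORE_ROOT, key), "wb") as fh:
        fh.write(data)
    return key

def load_image(key: str) -> bytes:
    """Called from the front end when rendering the image."""
    with open(os.path.join(STORE_ROOT, key), "rb") as fh:
        return fh.read()
```

In a real deployment the store would be a shared network path, blob storage, or a database table; the point is that the key, not a project-relative path, is the contract between the two apps.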

Architecture for Reporting Web Site

I've alluded to this project before, in this question, but the scope of the redesign has been slightly tightened, i.e. I can't redesign the whole thing, so I'd like some general advice on how to structure the existing artefacts in the application as an incremental step in improving the design.
The site has two areas of functionality, viz. reporting and maintenance. Reporting is the major function of the site: not data manipulation, just presentation. The site includes a small number of maintenance pages that use standard GridViews and FormViews for maintaining a small area of data. This is why I have decided against a rich, complex DAL in favour of the Enterprise Library DAAB and plain-vanilla datasets.
Each report is an isolated page that uses the results of a dynamic SQL query to explicitly render HTML table rows for the report. In order not to maintain one set of queries for both MySQL and MSSQL for each report, I will move all data access to stored procedures, removing coupling to either DB engine through the DAAB.
I am looking at reporting tools to decouple the report structure definition from the report presentation. I would prefer not to define report structure in classes, as Telerik Reporting does, but have yet to look at other reporting tools.
All reports share a common filter page that is presented on choosing a report from a menu, which redirects to the chosen report once the user is happy with their filter selections. Is there any guidance available for this very common scenario that I seem to have to keep reinventing?
I am looking for general advice on how to move this toward a better structured product, without actually restructuring the whole project. Simple stuff like separating maintenance pages from report pages in sub-directories, etc.
I'm not looking for others to do my work, and will in due course have implemented many improvements of my own making, but I would appreciate general opinions on how other people would handle a project like this.
For presentation of reports based upon a database of facts, I would investigate one of the tools that has all the core functionality already implemented, such as BIRT (and I am sure there are .NET alternatives). You might think a table of data is good enough for an intelligent person to parse, but down the line someone will ask for graphs and PDFs.
The maintenance / admin pages can be a standard website - forms and backend, using whatever framework you are comfortable with.
Architecturally wise, are you going to be running the reports against a live database, or an archive database (or the failover/replication DB)?
Are you generating aggregate data tables (fact tables) to generate reports from, or running against the more complex main schema?
I'm going to go with a management system for metadata on reports, with a simple directory structure for a page-per-report system, using the DevExpress ASPxGridView. This grid is more than capable of meeting the reporting needs of this application.
I'll be implementing maintenance and admin as standard web forms, with a shared filter form for most reports, dynamically configured based on report metadata. There will be no runtime maintenance of reports, as adding a report requires adding a page. This is acceptable as maintenance will be infrequent.
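The report-metadata idea above can be sketched as a small registry: each page-per-report entry declares its filters, and the single shared filter form configures itself from that metadata before redirecting. Report names, filter fields, and the page path are all hypothetical.

```python
# Sketch of metadata-driven report filters. Each report registers the
# filters its shared filter form should render; validation runs before
# the redirect to the report page. All names are illustrative.
REPORTS = {
    "late-shipments": {
        "page": "~/Reports/LateShipments.aspx",
        "filters": [
            {"name": "date_from", "type": "date",   "required": True},
            {"name": "carrier",   "type": "choice", "required": False},
        ],
    },
}

def filter_fields(report_key):
    """Tells the shared filter form which controls to render."""
    return REPORTS[report_key]["filters"]

def missing_filters(report_key, values):
    """Names of required filters the user has not supplied yet;
    redirect to the report page only when this comes back empty."""
    return [f["name"] for f in filter_fields(report_key)
            if f["required"] and not values.get(f["name"])]
```

Adding a report still means adding a page, as stated above, but its filter behaviour lives in one metadata entry instead of being reinvented per report.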
