Getting random Tumblr blogs/posts - web-scraping

I'm working on a data mining project for a class, and part of our project involves obtaining a large set of random posts from Tumblr. Unfortunately, it seems that the Tumblr API (specifically using PyTumblr) has no direct capacity to do this, since it requires either a specific blog name or tag. Is there any reasonable way to do this? My best idea is to grab blog names and/or tags from posts on Trending, but I'm not sure that will be adequate for our purposes. We ultimately want a data set that can be partitioned into posts that are tagged with one or more tags from a specified set, and posts that contain none of those tags.
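The partitioning step described above can be sketched in plain Python, independent of how the posts are collected. This is only a sketch: it assumes each post is a dict with a "tags" list of strings (which is how Tumblr API post objects are generally shaped), and the names partition_by_tags and TARGET_TAGS, along with the sample posts, are made up for illustration.

```python
from __future__ import annotations

# Hypothetical tag set; substitute the tags your project cares about.
TARGET_TAGS = {"art", "photography"}


def partition_by_tags(posts: list[dict], target_tags: set[str]) -> tuple[list[dict], list[dict]]:
    """Split posts into (tagged, untagged) relative to target_tags."""
    wanted = {t.lower() for t in target_tags}
    tagged, untagged = [], []
    for post in posts:
        # Assumption: each post carries a list of tag strings under "tags".
        post_tags = {t.lower() for t in post.get("tags", [])}
        # A post counts as tagged if it shares at least one tag with the set.
        (tagged if post_tags & wanted else untagged).append(post)
    return tagged, untagged


# Made-up sample posts, for illustration only.
sample_posts = [
    {"id": 1, "tags": ["Art", "sketch"]},
    {"id": 2, "tags": ["food"]},
    {"id": 3, "tags": []},
]
tagged, untagged = partition_by_tags(sample_posts, TARGET_TAGS)
```

Tag comparison is case-insensitive here, which is a design choice rather than a requirement; drop the lower() calls if case matters for your data set.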

Related

Redux: How to handle different queries of same type of data?

I'm looking for best practices for handling the same type of data in a Redux store when different queries can return it.
Imagine a WordPress website where different pages run different queries for posts. On the homepage, for example, we fetch the 10 most recent posts and save them to the Redux store. On a category page we have to fetch posts into the posts store again, but this time we may get different posts, which may or may not include the ones we already had.
This applies to many different pages: tag, taxonomy, author, date, etc.
Creating a separate store for each case doesn't seem like a good solution, since it could end up with many duplicate values.
This is one of the standard reasons why the Flux concept was invented in the first place. Per Dan Abramov's article on The Case for Flux, caching queries is an excellent use case for a Flux-type architecture.
Going beyond that, the Redux principle of having a "single source of truth" applies here, as does the idea of normalizing data in the store.
Overall, you'd probably want to store your data in a normalized form, with multiple "tables" in your state. Add each set of results into the state to cache them, and have different parts of the UI read out the specific posts they're interested in as needed.
See the Redux FAQ entry on organizing nested or duplicate data for links to more information.
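To make the normalized shape concrete, here is a rough sketch of the idea. Real Redux code would be JavaScript; Python is used here only to show the data layout. The names posts_by_id, queries, cache_query and select_posts, and the query keys, are all hypothetical and not part of Redux.

```python
def empty_state() -> dict:
    # "posts_by_id" is the single table of post records;
    # "queries" maps a query key to an ordered list of post IDs.
    return {"posts_by_id": {}, "queries": {}}


def cache_query(state: dict, query_key: str, posts: list) -> dict:
    """Return a new state with the query's results merged in (no mutation)."""
    posts_by_id = {**state["posts_by_id"], **{p["id"]: p for p in posts}}
    queries = {**state["queries"], query_key: [p["id"] for p in posts]}
    return {"posts_by_id": posts_by_id, "queries": queries}


def select_posts(state: dict, query_key: str) -> list:
    """Read the posts for one query back out of the normalized state."""
    return [state["posts_by_id"][i] for i in state["queries"].get(query_key, [])]


state = empty_state()
state = cache_query(state, "home:recent", [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}])
state = cache_query(state, "category:news", [{"id": 2, "title": "B"}, {"id": 3, "title": "C"}])
```

Post 2 appears in both result sets but is stored only once; each page reads its own list of IDs and looks the records up in the shared table.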

wordpress custom fields: structured data table

I know how to make custom fields for my custom post types, but this time I need a pretty complex custom field.
I'm making a website about agricultural machinery. There are a lot of types of machines. For every type of machine there are several products available, with properties like size, mass, diameter, capacity, etc.
Per type, only one description, one title and one image are needed. So my custom post type machine contains posts (custom type = machines) with the basic information. For every machine I would like to list all the different size, mass, capacity and price information in a table. I know I could just create an HTML table in the description, but I would really like the data to be structured so I can style it nicely and eventually even do structured searches on price and so on.
So ideally I would like a custom field that looks a bit like an Excel sheet, where I can create some columns and add a row per available product in the category. Any ideas how I can do this?
EDIT: I'm trying out the Types WordPress plugin http://wordpress.org/extend/plugins/types/. There's an option to group custom fields together and also an option to allow multiple instances of one field. It would be useful to me if multiple instances of a group could be allowed. Does something like that exist?
EDIT: This looks promising. http://wp-types.com/documentation/user-guides/bulk-content-editing-with-fields-table/
I will look into it and maybe answer myself if it turns out to work :)
It seems like wp-types had the solution all along.
http://wp-types.com/documentation/user-guides/bulk-content-editing-with-fields-table/
This is what my custom data table looks like now:
Behind the scenes, the data is not actually all saved in a single post. Instead, wp-types lets you create hierarchical relations between post types and then provides a way to quickly create sub-posts straight from the parent. That's awesome enough for me.
In my case I created two post types: one called machine-type and another called machine.
machine-type holds all the basic information about the machine (title, description and image), while its sub-type machine holds the specs for each of the different variants.

Global data in drupal 7

I'm building an official site for a village in Drupal 7. I need to create and store some information about the village that will be accessible everywhere on the website (e.g. the village's name, the mayor's name, phone number, email, etc.). I want to define these values in the admin area and access them in any node (e.g. all of the data will be shown in the section about the municipal office, and some of it, like the phone number, will be mentioned in the contact section). What is the best way to do that? Is there a module that handles this, or should I write my own? I have tried to search for an answer, but I only found topics about global variables (in PHP).
You can use the functions variable_get() and variable_set() to store arbitrary information that is available on all pages. It is easy to write a form that automatically saves all form fields with variable_set(), see http://drupal.org/node/222158.
Note:
- Saving a variable with variable_set() clears the cache of all variables, so you should not use it for information that changes regularly.
- All variables are stored in a single global cache that is fetched on every page request. You shouldn't store large amounts of data, or data that is only rarely used.
The answer by Berdir is already very good in case you only want to store the raw data. However, if you always want to display the data in the same way like in some kind of widget format, you have other options, too.
For example, you could create a block with the contact details and you only show it on specific pages.
If you need more flexibility, you might consider writing a small module with different theming functions. You would either store the data directly in the module or in the variable table as outlined by Berdir.
In any case, if you want to allow users to change this data on their own, you will probably need to write a small form in the backend. Otherwise, they will need to manipulate the database directly to change the data.
You could also consider Creating advanced theme settings.
See how you can specify the site's logo path in your theme? You could do something similar with the info you would like to display on your site, practically setting up your theme as a template for a village website.
You get to add custom fields to your theme's admin settings; the field values can then be retrieved with theme_get_setting().

Measurement sheets in Drupal 6

I'm doing a project for a surveying company in Drupal 6. Ultimately, employees will need to enter measurements into a new content type. Currently they do it in a big Excel spreadsheet that has a bunch of macros to do calculations between cells (Meters to Feet, m^2, some pricing stuff).
My question is:
How would one best go about replicating the functionality of this spreadsheet in Drupal 6, keeping in mind that the amount of data the employees gather varies between projects (e.g., a 3-floor house versus a 1-floor house with a basement)?
I have two ideas so far:
1) Create a new content type and populate it with CCK fields. Use Sheetnode's CCK functionality to drop Sheetnode CCK fields to replicate the spreadsheet exactly. Use Views to display this data other ways.
2) Create a new content type and populate it with CCK fields. Use Views (and possibly something like Views Calc) to get varying measurement values.
That said, I'm completely open to suggestions outside of these two implementations.
Thank you!
I don't have an answer so much as a caution. I did a project where a number of pieces of content were rendered simultaneously on a page, each piece being in its own node. The downside of multiple nodes is the performance hit you take from having at least one (possibly complex) query per node.
So in this case, if there were a great many lines in the sheet and each line was a node, you might take a pretty heavy performance knock.
This might be acceptable, but I thought I'd give you the heads up.
This might be one of those times where it's best to actually create a purpose-built mechanism on top of the Drupal API instead of using nodes + CCK.
Like I said, I could be overly cautious. It depends on your exact use case.
HTH!
This seems like the kind of question http://drupal.stackexchange.com was made for...
I ended up doing this with jQuery, Measured Value Field and Flexinode; see: Auto-updating width/length/area using jQuery and Drupal6

Best way to model page attribute data onto different database tables

I'm developing a website (using ASP.NET MVC) with a SQL Server 2005 database.
I have numerous database tables which drive content pages for the site e.g. I have a table called Activity:
Activity
-----------
ID
Name
So for each activity record, there would be a corresponding 'Activity' page. The same applies for other tables e.g. Location and Person etc.
For the purposes of SEO I want to allow storing additional info for pages, such as HTML meta title/description/keyword info, and perhaps even page content.
I'm considering two main options to do this:
1) Modify Activity, Location and Person tables to include the relevant fields to hold this additional info
or
2) Create a PageInfo table to hold all this info in the same place, then simply add a PageInfoID field to the above mentioned tables
What are the pros and cons of these approaches and are there any better ways of doing this?
(One con I can think of for option 2 is that you can't really enforce the 1:1 relationship, so you could, theoretically, have a PageInfo record used by both an Activity record and a Person record.)
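The con mentioned for option 2 can be reproduced in a small sketch. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for SQL Server 2005, so constraint behaviour may differ in detail; the MetaTitle and MetaDescription columns and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PageInfo (
        ID INTEGER PRIMARY KEY,
        MetaTitle TEXT,
        MetaDescription TEXT
    );
    CREATE TABLE Activity (
        ID INTEGER PRIMARY KEY,
        Name TEXT,
        PageInfoID INTEGER UNIQUE REFERENCES PageInfo(ID)
    );
    CREATE TABLE Person (
        ID INTEGER PRIMARY KEY,
        Name TEXT,
        PageInfoID INTEGER UNIQUE REFERENCES PageInfo(ID)
    );
""")

conn.execute("INSERT INTO PageInfo (ID, MetaTitle) VALUES (1, 'Shared title')")
conn.execute("INSERT INTO Activity (ID, Name, PageInfoID) VALUES (1, 'Kayaking', 1)")
# Accepted: the UNIQUE constraint only applies within a single table,
# so a Person can point at the PageInfo row an Activity already uses.
conn.execute("INSERT INTO Person (ID, Name, PageInfoID) VALUES (1, 'Alice', 1)")

# Within one table, UNIQUE does stop a second row from reusing the PageInfo.
try:
    conn.execute("INSERT INTO Activity (ID, Name, PageInfoID) VALUES (2, 'Hiking', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Count PageInfo rows that are shared across the two tables.
shared = conn.execute(
    "SELECT COUNT(*) FROM Activity a JOIN Person p ON a.PageInfoID = p.PageInfoID"
).fetchone()[0]
```

A UNIQUE constraint on PageInfoID keeps the relationship 1:1 inside each table, but nothing in this schema prevents sharing across tables; that would need extra logic (for example a trigger or an application-level check).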
If I may suggest, you may be going about this SEO thing the wrong way.
Rather than trying to pack each page with additional metadata per 'Activity' object to get better search results for the page, I think you should concentrate on simply putting the 'Activity' data into the page with clean, semantic, valid XHTML and CSS. This way, search engines will worry about finding out which part of each page is the most important and index/rank it accordingly.
Also, trying to add these 'page meta' objects to the domain model of your system will create all sorts of conceptual (and indeed practical) problems for your design, and it will certainly confuse users, who will struggle to understand the difference between, say, an Activity Title and a Page Title. Let the bots figure out keywords from your content; don't try to do it yourself. This type of over-optimisation will actually end up producing worse page rankings, not better ones.

Resources