Create wikidata items from records in OpenRefine (and not rows)? - wikidata

I read that the OpenRefine Wikidata extension always operates in rows mode.
I am in a situation where my data is in records mode: each record is a serial/magazine, and the rows in a record are the various formats of that serial/magazine (typically the paper and electronic versions). Each row has a unique ISSN identifier. Wikidata has only one item for the serial/magazine (my record), with no separate items for each format (my rows).
When reconciling data to Wikidata, typically either all rows of a record match the same Wikidata item, or none of the rows match, or sometimes only one row of the record matches (e.g. if only the ISSN of one format, say the paper format, is known in Wikidata, but not the others).
What I would like to do is create an item in Wikidata for each record for which no reconciliation result was found (in other words, for which no row has matched), rather than for each row. When creating this item, I would like to add the ISSNs of all the rows in the record.
Is it possible to do that, and if so, how?
Thanks

Yes, it is possible. You need to perform the reconciliation operation on the first column instead.
As mentioned in the documentation, use the Fill down operation on the first column, which defines your records.
Reconcile that column to Wikidata.
Then use the "Create one new item for similar cells" action (in the Reconcile -> Actions menu).
Create a schema in which the first column is used as the subject item.
Assuming the values in your first column are initially distinct (which is the case in your example), this will create one item per record.
In your example, because your first column contains ISSNs and not titles, I would first create a root column with titles instead (before the process explained above). In rows mode, facet to keep the first row of each record by selecting non-blank values in the first column, then copy your column with titles and move this new column to the first position. This should ensure that reconciliation picks up existing items. Note that if the same title is used by multiple journals, this will create a single item for all of them, unless you add other properties (such as ISSN) to your reconciliation configuration.
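To make the logic of these steps concrete, here is a minimal sketch in plain Python. It is not OpenRefine code and calls no OpenRefine or Wikidata API; it only illustrates what "fill down, then one new item per distinct first-column value" amounts to. The field names (title, issn, qid), the function names, and all sample values are placeholders invented for the illustration.

```python
def fill_down(rows, key):
    """Copy the last non-blank value of `key` into the blank cells below it."""
    filled, last = [], None
    for row in rows:
        value = row.get(key) or last
        last = value
        filled.append({**row, key: value})
    return filled


def new_items_for_unmatched_records(rows):
    """Return {title: [issns]} for records in which no row was reconciled."""
    records = {}
    for row in fill_down(rows, "title"):
        record = records.setdefault(row["title"], {"issns": [], "matched": False})
        record["issns"].append(row["issn"])
        # A record counts as matched as soon as any of its rows has an item.
        record["matched"] = record["matched"] or bool(row.get("qid"))
    return {title: rec["issns"] for title, rec in records.items() if not rec["matched"]}


# Placeholder data: blank titles mark continuation rows of the same record.
rows = [
    {"title": "Journal A", "issn": "1111-1111", "qid": None},
    {"title": None, "issn": "2222-2222", "qid": None},
    {"title": "Journal B", "issn": "3333-3333", "qid": "Q-placeholder"},
    {"title": None, "issn": "4444-4444", "qid": None},
]
to_create = new_items_for_unmatched_records(rows)
print(to_create)
```

Because records are keyed by title here, two different journals with the same title would be merged, which is the same caveat as noted above.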

Related

react-table filtering the data in the cell

I have a react-table which I'm using for logs. The table has two columns: the Id and the log for that Id.
The logs can be very long, so I created a list (every sentence in the log is an "li"). Each cell in the Log column therefore holds a list of strings.
Now I want the global filter to filter the list inside the cell: it should show only the Ids whose log contains the filter string, and for each of them show only the matching strings rather than the entire list of logs for that Id.
Is that possible?
TIA

Tableau data source column name changed when using a duplicated view from database (Teradata)

I was using a view (VW_NEW_CUSTOMERS) in Teradata and all the column names had an underscore in it. The column names in tableau did not contain underscores.
For example:
Customer_Number (From Table View)
Customer Number (From Tableau Column Name)
I have now created a duplicate of the view (VW_NEW_CUSTOMERS_2), and all its columns keep the underscore in Tableau. So when I use Replace Data Source, the column names no longer map to the original ones because of the underscores.
New Tableau fields from duplicated View:
Customer_Number (From Table View)
Customer_Number (From Tableau Column Name)
I would like to know why the underscores did not appear the first time but do appear now that I have duplicated the view. How can I rename the fields so that they look like they did the first time? Do I have to rename them manually?
Note: Database columns were using aliases
Check this thread. This isn't new: Tableau started renaming fields some time ago. I'm not sure why it would have done so on one of your data sources but not the other.
In short, you may need to reset the field names of the version without the underscores, which should bring the underscores back and make both data sources the same. To do this (copied from the thread):
"Version 9.3 and 10.1, you can select all the measures (and dimensions) in a worksheet, right click and "reset names" in two operations"
I think there's also a way to hack the XML to add the spaces to your copy, should that be preferable. The thread covers hacking the XML to remove spaces, so I assume adding spaces works the same way in reverse.

SQLiteStudio automatically insert value

I have a database created with SQLiteStudio that has a products table with two columns, item and price. It also has a sales table with an item column that is linked to the item column in products. I'd like the sales table to also have a price column, whose value is automatically set to that of the products.price row corresponding to the value selected from the products.item column. How would I define the sales.price column so that this value is automatically set?
Also, the prices in the products table may be changed from time to time, but the price listed in any existing sales records must not be updated when this is done.
SQLite supports generated columns (also called computed columns), a feature found in other DBMSes as well, but you'll need a fairly recent version of SQLite:
Generated column support was added with SQLite version 3.31.0
(2020-01-22). If an earlier version of SQLite attempts to read a
database file that contains a generated column in its schema, then
that earlier version will perceive the generated column syntax as an
error and will report that the database schema is corrupt.
Source: Generated Columns
You could simply build a view to augment your table a little bit.
You have mentioned that the prices will change over time and this is perfectly normal. So you have at least two design choices:
Add an additional table to store the price history. In that table you store the product ID, price, start date and end date (as an example), and you join it with the other tables. The effective price is then determined from the order date. This also means that prices are stored in that table and not in products, so you have to redesign your schema slightly.
The other option is to store the unit price of the product in the sales table as a historical value. This is the price that was in force when the sale was made.
One thing to consider: you may require more flexibility on pricing: it can depend on the client (different rates based on volume) and also on specific circumstances. The final price may be the result of numerous and complex calculations.
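As an illustration of the second option (storing the unit price on the sale row), one possible approach is an AFTER INSERT trigger that copies the current products.price into sales.price. The sketch below uses Python's sqlite3 module with an in-memory database. The table and column names come from the question; the id column, the trigger name sales_copy_price, and the sample data are assumptions made for the example. The same CREATE TRIGGER statement could be run from SQLiteStudio's SQL editor.

```python
import sqlite3


def make_shop_db() -> sqlite3.Connection:
    """Create an in-memory database with products, sales and a price-copy trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (
            item  TEXT PRIMARY KEY,
            price REAL NOT NULL
        );
        CREATE TABLE sales (
            id    INTEGER PRIMARY KEY,
            item  TEXT NOT NULL REFERENCES products(item),
            price REAL
        );
        -- Hypothetical trigger: fill in the price only when none was supplied.
        CREATE TRIGGER sales_copy_price
        AFTER INSERT ON sales
        FOR EACH ROW
        WHEN NEW.price IS NULL
        BEGIN
            UPDATE sales
               SET price = (SELECT price FROM products WHERE item = NEW.item)
             WHERE id = NEW.id;
        END;
        """
    )
    return conn


conn = make_shop_db()
conn.execute("INSERT INTO products (item, price) VALUES ('widget', 2.50)")
conn.execute("INSERT INTO sales (item) VALUES ('widget')")   # sold at 2.50
conn.execute("UPDATE products SET price = 3.00 WHERE item = 'widget'")
conn.execute("INSERT INTO sales (item) VALUES ('widget')")   # sold at 3.00
prices = [row[0] for row in conn.execute("SELECT price FROM sales ORDER BY id")]
print(prices)
```

The price is copied once, at insert time, so later changes to products.price leave existing sales rows untouched, which is the behaviour asked for in the question.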

"Calculated columns cannot contain volatile functions like Today and Me" error message on Sharepoint

I am trying to add a new calculated column to a SharePoint list that will show the number of elapsed days. I enter a name and write a formula like:
=ABS(ROUND(Today-Created;0))
The data type returned from this formula is: Single line of text
When I try to save, I get an error like:
Calculated columns cannot contain volatile functions like Today and Me.
Calculated Column Values Only Recalculate As Needed
The values in SharePoint columns--even in calculated columns--are stored in SharePoint's underlying SQL Server database.
The calculations in calculated columns are not performed upon page load; rather, they are recalculated only whenever an item is changed (in which case the formula is recalculated just for that specific item), or whenever the column formula is changed (in which case the formula is recalculated for all items).
(As a side note, this is the reason why in SharePoint 2010 you cannot create or change a calculated column on a list that has more than the list view threshold of 5000 items; it would require a mass update of values in all those items, which could impact database performance.)
Thus, in order for calculated columns to accurately store "volatile" values like "Me" and "Today", SharePoint would need to somehow constantly recalculate those column values and continuously update the column values in the database. This simply isn't possible.
Alternatives to Calculated Columns
I suggest taking a different approach entirely instead of using a calculated column for this purpose.
Conditional Formatting: You can apply conditional formatting to highlight records that meet certain criteria. This can be done using SharePoint Designer or HTML/JavaScript.
Filtered List views: Since views of lists are queried and generated in real time, you can use volatile values in list view filters. You can set up a list view web part that only shows items where Created is equal to [Today]. Since you can place multiple list view web parts on one page, you could have one section for today's items, and another web part for all the other items, giving you a visual separation.
A workflow, timer job, or scheduled task: You can use a repeating process to set the value of a normal (non-calculated) column on a daily basis. You need to be careful with this approach to ensure good performance; you wouldn't want it to query for and update every item in the list if the list has surpassed the list view threshold, for example.
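As a rough sketch of the last alternative, the function below shows what a daily job would compute before writing values back to a normal (non-calculated) column. It is plain Python and deliberately calls no SharePoint API; reading and updating list items is left out. The column name ElapsedDays, the dictionary layout of the items, and the function names are assumptions made for the illustration, and the difference is taken between calendar dates rather than rounding a fractional number of days as the original formula does.

```python
from datetime import date, datetime


def elapsed_days(created: datetime, today: date) -> int:
    """Whole calendar days between the creation date and today."""
    return abs((today - created.date()).days)


def refresh_elapsed(items, today):
    """Update the hypothetical ElapsedDays field; return IDs of items that changed."""
    changed = []
    for item in items:
        new_value = elapsed_days(item["Created"], today)
        # Only touch items whose stored value is stale, to limit writes.
        if item.get("ElapsedDays") != new_value:
            item["ElapsedDays"] = new_value
            changed.append(item["ID"])
    return changed


# Placeholder items standing in for list rows fetched by the scheduled job.
items = [
    {"ID": 1, "Created": datetime(2024, 1, 1, 9, 30), "ElapsedDays": None},
    {"ID": 2, "Created": datetime(2024, 1, 10, 17, 0), "ElapsedDays": 0},
]
changed = refresh_elapsed(items, today=date(2024, 1, 10))
print(changed)
```

Writing back only the items whose value actually changed is one way to respect the performance concern mentioned above.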
I found some conversations about this issue. Many people suggest creating a new Date and Time column named Today, hidden, with Today's Date as its default value. This column can then be used in formulas.
I tried this suggestion and, yes, the error was gone and the formula was accepted, but the calculated columns' values were wrong. I made the Today column visible and checked it: it was empty, so the Today's Date default value was not working. While looking for a solution to this, I carelessly deleted the Today column, and then realized that the calculated columns' values were now right.
In the end, I don't know what the trick is, but if you create a column named Today before using the Today keyword in your formula, and then delete the Today column after saving the formula, it works.
UPDATE
After @Thriggle's answer I realized this approach doesn't work like a charm. Yes, the formula doesn't cause an error when the calculated column is saved, but it is only correct the first time. The next day the calculated column still shows the old values, because its values are static, as Thriggle explained.

Database schema design options

I'm struggling to decide what database schema to use: one large table, or many small ones (which would be more difficult to manage).
I have 10 templates each with their own text fields. I am trying to store the text for the templates in a database and then when the web page is called I will show the correct text in the html template. Because a mixture of these templates are to be in a sequence of screens where you can navigate backwards or forwards, I need to be able to sequence them, I can only think of adding a page_number column. I also would like to re-order them and delete them as necessary using the page_number column.
I was planning to do all this in a web application without the need for a standard folder/web page structure, like a small CMS system.
Option 1:
I can create one large table with many columns, lots of which will be empty (over half in each row). Is this bad?
Option 2:
I could create many tables, each with only the columns its template requires.
The problem I see with this is the headache of repopulating a column in each table when I delete a row, because I need to re-sequence the column that represents page numbers. That problem is reduced if I use one large table.
I've thought of moving page numbers into another table called page_order, but I cannot think of a way to maintain an effective relationship with the other tables if I make changes.
I have yet to figure out how to re-sequence a column in a database when a row is deleted. Surely this is a common problem!?
Thanks for taking the time to help!
Have one table that contains one row per template. It might look like:
id (INT, auto-increment)
page_order (INT, unique key here, so pages cannot have the same number)
field1 (STRING, name of the text field)
value1 (STRING, contents of the text field)
field2
value2
Then you have to decide the maximum fields that any page can have (N) and keep adding field/value columns up to N.
The advantage of this is you have one table that isn't sparsely populated (as long as the templates have about the same number of fields, even if the names of those fields are different).
If you want to improve on this (maybe not necessary for a small amount of data), you could change field to an INT id and connect it to a lookup table that contains (field_id, field_name).
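The question also asks how to re-sequence the page numbers after a row is deleted. With a single table and a unique page_order column, one possible approach is to renumber the remaining rows in ascending order inside the same transaction as the delete. The sketch below uses Python's sqlite3 module; the table name pages, the helper names, and the sample data are assumptions, and only one field/value pair is shown for brevity.

```python
import sqlite3


def create_pages_table(conn: sqlite3.Connection) -> None:
    """One row per page; page_order is unique so two pages cannot share a number."""
    conn.execute(
        """
        CREATE TABLE pages (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            page_order INTEGER NOT NULL UNIQUE,
            field1     TEXT,
            value1     TEXT
        )
        """
    )


def delete_page(conn: sqlite3.Connection, page_id: int) -> None:
    """Delete a page, then close the gap in page_order in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM pages WHERE id = ?", (page_id,))
        rows = conn.execute("SELECT id FROM pages ORDER BY page_order").fetchall()
        # Renumber in ascending order: each new number is <= the old one and
        # has already been vacated, so the UNIQUE constraint is never violated.
        for new_order, (row_id,) in enumerate(rows, start=1):
            conn.execute(
                "UPDATE pages SET page_order = ? WHERE id = ?", (new_order, row_id)
            )


conn = sqlite3.connect(":memory:")
create_pages_table(conn)
with conn:
    conn.executemany(
        "INSERT INTO pages (page_order, field1, value1) VALUES (?, 'title', ?)",
        [(1, "A"), (2, "B"), (3, "C"), (4, "D")],
    )
delete_page(conn, 2)  # removes page "B"
remaining = conn.execute(
    "SELECT page_order, value1 FROM pages ORDER BY page_order"
).fetchall()
print(remaining)
```

Renumbering row by row is slower than a single UPDATE statement, but it avoids depending on the order in which the database applies the update while the unique constraint is in force.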

Resources