Can sqlite-utils convert function select two columns? - sqlite

I'm using sqlite-utils to load a csv into sqlite which will later be served via Datasette. I have two columns, likes and dislikes. I would like to have a third column, quality-score, by adding likes and dislikes together then dividing likes by the total.
The sqlite-utils convert function should be my best bet, but all I see in the documentation is how to select a single column for conversion.
sqlite-utils convert content.db articles headline 'value.upper()'
From the example given, it looks like convert is followed by the db filename, the table name, then the col you want to operate on. Is it possible to simply add another col name or is there a flag for selecting more than one column to operate on? I would be really surprised if this wasn't possible, I just can't find any documentation to support it.

This isn't a perfect answer as it doesn't resolve whether sqlite-utils supports multiple column selection for transforms, but this is how I solved this particular problem.
Since my quality_score column would just be basic math, I was able to make use of sqlite's Generated Columns. I created a file called quality_score.sql that contained:
ALTER TABLE testtable
ADD COLUMN quality_score GENERATED ALWAYS AS (likes /(likes + dislikes));
and then implemented it by:
$ sqlite3 mydb.db < quality_score.sql
You do need to make sure you are using a compatible version of sqlite, as this only works with version 3.31 or later.
Another consideration is to make sure you are performing math on integers or floats and not text.
Also attempted to create the table with the virtual generated column first then fill it with my data later, but that didn't work in my case - it threw an error that said the number of items provided didn't match the number of columns available. So I just stuck with the ALTER operation after the fact.

Related

Is there a way to extract a substring from a cell in OpenOffice Calc?

I have tens of thousands of rows of unstructured data in csv format. I need to extract certain product attributes from a long string of text. Given a set of acceptable attributes, if there is a match, I need it to fill in the cell with the match.
Example data:
"[ROOT];Earrings;Brands;Brands>JeweleryExchange;Earrings>Gender;Earrings>Gemstone;Earrings>Metal;Earrings>Occasion;Earrings>Style;Earrings>Gender>Women's;Earrings>Gemstone>Zircon;Earrings>Metal>White Gold;Earrings>Occasion>Just to say: I Love You;Earrings>Style>Drop/Dangle;Earrings>Style>Fashion;Not Visible;Gifts;Gifts>Price>$500 - $1000;Gifts>Shop>Earrings;Gifts>Occasion;Gifts>Occasion>Christmas;Gifts>Occasion>Just to say: I Love You;Gifts>For>Her"
Look up table of values:
Zircon, Diamond, Pearl, Ruby
Output:
Zircon
I tried using the VLOOKUP() function, but it needs to match an entire cell and works better for translating acronyms. Haven't really found a built in function that accomplishes what I need. The data is totally unstructured, and changes from row to row with no consistency even within variations of the same product. Does anyone have an idea how to do this?? Or how to write an OpenOffice Calc function to accomplish this? Also open to other better methods of doing this if anyone has any experience or ideas in how to approach this...
ok so I figured out how to do this on my own... I created many different columns, each with a keyword I was looking to extract as a header.
Spreadsheet solution for structured data extraction
Then I used this formula to extract the keywords into the correct row beneath the column header. =IF(ISERROR(SEARCH(CF$1,$D769)),"",CF$1) The Search function returns a number value for the position of a search string otherwise it produces an error. I use the iserror function to determine if there is an error condition, and the if statement in such a way that if there is an error, it leaves the cell blank, else it takes the value of the header. Had over 100 columns of specific information to extract, into one final column where I join all the previous cells in the row together for the final list. Worked like a charm. Recommend this approach to anyone who has to do a similar task.

How to read from REDCap forms with data validation into R (REDCapR::readcap_read)

I've been using the REDCapR package to read in data from my survey form. I was reading in the data with no issue using redcap_read until I realized I needed to add a field restriction to one question on my survey. Initially it was a short answer field asking users how many of something they had, and people were doing expectedly annoying things like spelling out numbers or entering "a few" instead of a number. But all of that data read in fine. I changed the field to be a short answer field (same type as before) that requires the response to be an integer and now the data won't read into R using redcap_read.
When I run:
redcap_read(redcap_uri=uri, token=api_token)$data
I get the error message that:
Column [name of my column] can't be converted from numeric to character
I also noticed when I looked at the data that that it read in the 1st and 6th records of that column (both zeros) just fine (out of 800+ records), but everything else is NA. Is there an inherent problem with trying to read in data from a text field restricted to an integer or is there another way to do this?
Edit: it also reads the dates fine, which are text fields with a date field restriction. This seems to be very specific to reading in the validated numbers from the text field.
I also tried redcapAPI::exportRecords and it will continue to read in the rest of the dataset, but reads in NA for all values in the column with the test restriction.
Upgrade REDCapR to the version on GitHub, which stacks the batches on top of each other before determining the data type (see #257).
# install.packages("remotes") # Run this line if the 'remotes' package isn't installed already.
remotes::install_github(repo="OuhscBbmc/REDCapR")
In your case, I believe that the batches (of 200 records, by default) contain different different data types (character & numeric, according to the error message), which won't stack on top of each other silently.
The REDCapR::redcap_read() function should work then. (If not, please create a new issue).
Two alternatives are
calling redcap_read_oneshot with a large value of guess_max, or
calling redcap_read_oneshot with guess_type = TRUE.

Load multiple tables from one word sheet and split them by detecting one fully emptied row

So generally what I would like to do is to load a sheet with 4 different tables, and split this one big data into smaller tables using str_detect() to detect one fully blank row that's deviding those tables. After that I want to plug that information into the startRow, startCol, endRow, endCol.
I have tried using this function as followed :
str_detect(my_data, ‘’) but the my_data format is wrong. I’m not sure what step shall I make do prevent this and make it work.
I’m using read_xlsx() to read my dataset

How to Add Column (script) transform that queries another column for content

I’m looking for a simple expression that puts a ‘1’ in column E if ‘SomeContent’ is contained in column D. I’m doing this in Azure ML Workbench through their Add Column (script) function. Here’s some examples they give.
row.ColumnA + row.ColumnB is the same as row["ColumnA"] + row["ColumnB"]
1 if row.ColumnA < 4 else 2
datetime.datetime.now()
float(row.ColumnA) / float(row.ColumnB - 1)
'Bad' if pd.isnull(row.ColumnA) else 'Good'
Any ideas on a 1 line script I could use for this? Thanks
Without really knowing what you want to look for in column 'D', I still think you can find all the information you need in the examples they give.
The script is being wrapped by a function that collects the value you calculate/provide and puts it in the new column. This assignment happens for each row individually. The value could be a static value, an arbitrary calculation, or it could be dependent on the values in the other columns for the specific row.
In the "Hint" section, you can see two different ways of obtaining the values from the other rows:
The current row is referenced using 'row' and then a column qualifier, for example row.colname or row['colname'].
In your case, you obtain the value for column 'D' either by row.D or row['D']
After that, all you need to do is come up with the specific logic for ensuring if 'SomeContent' is contained in column 'D' for that specific row. In your case, the '1 line script' would look something like this:
1 if [logic ensuring 'SomeContent' is contained in row.D] else 0
If you need help with the logic, you need to provide more specific examples.
You can read more in the Azure Machine Learning Documentation:
Sample of custom column transforms (Python)
Data Preparations Python extensions
Hope this helps

Creating a boxplot from two tables in KNIME

I am trying to plot two data columns coming from different tables using KNIME. I want to plot them in a single plot in order to be easier to compare them. In R this effect can be achieved by the following:
boxplot(df$Delay, df2$Delay,names=c("Column from Table1","Column from Table2"), outline=FALSE)
However, by using KNIME, I cannot think of a way that you can use data coming from two different tables. Have you ever faced this issue in KNIME?
If you have the same number of rows in the columns you can use the Column Appender node to get both columns into the same table. If not, you can use the Column Joiner node to create a superset of both columns.
Seems to be a working solution would be -according to the discussions in the comments- the following:
Installing the KNIME Interactive R Statistics Integration (you might already have it installed)
Using the Add Table To R node to add the second table to R
I guess the usual R code can be used to create the figures
The above answer using the "Add Table to R" node is a very nice option.
You could also do it before hand in KNIME. If the two tables have the same columns you could concatenate them using the "Concatenate" node and, if you need mark the rows with the "Constant Value" node from which table the come from originally.
If the two tables have different columns but some row identifiers in common you could join them using the "Joiner" node into one table. Then pass the concatenated or joined table over to R.

Resources