Disclaimer: This is not a database administration or design question. I did not design this database and I do not have rights to change it.
I have a database in which many fields are compound. For example, a single column is used for acre usage in a district. Many districts have one primary crop, and the value is a single number, such as 14. Some have two primary crops, and the column holds two numbers separated by a comma, like "14,8". Some have three, four, or even five primary crops, resulting in a compound value like "14,8,7,4,3".
I am pulling data out of this database for analytical research. Right now, I pull columns like that into R, split them into 5 values (padding with nulls if there are fewer than 5), and perform the work on those values. I want to do it in the database itself: split the value on the comma, perform an operation on the resulting values, and then concatenate the results back into the original column format.
For example, I have a column that is in acres and I want it in square meters. So I want to take "14,8", temporarily turn it into 14 and 8, multiply each of those by 4046.86, and get "56656.04,32374.88" as my result. What I am currently doing is using regexp_replace. I start with all rows where "acres REGEXP '^[0-9.]+,[0-9.]+,[0-9.]+,[0-9.]+,[0-9.]+$'" in the where clause, which gives me the rows with 5 numbers in the field. Then I can handle the first number with "cast(regexp_replace(acres, ',.*$', '') as float) * 4046.86", do each of the 5 using a different regexp_replace, and concatenate the values back together. Then I run a query for the rows with 4 numbers, then 3, then 2, and finally the single-number rows.
Is this possible as a single query?
Use a function to parse the string and convert it to the desired result. This will allow you to use a single query for the job.
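For example, assuming the database is MySQL or MariaDB (the REGEXP/regexp_replace syntax in the question suggests that family), a function along the lines of the sketch below walks the comma-separated list, converts each piece, and rebuilds the string. The function name and the districts table are made up for illustration; only the acres column comes from the question, and the result is rounded to two decimals to match the "56656.04,32374.88" example.

DELIMITER //
CREATE FUNCTION acres_to_sqm(acre_list VARCHAR(255)) RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    -- Walk the comma-separated list, convert each value from acres to
    -- square meters, and rebuild a comma-separated string in the same format.
    DECLARE remaining VARCHAR(255) DEFAULT acre_list;
    DECLARE piece VARCHAR(64);
    DECLARE result VARCHAR(255) DEFAULT '';
    WHILE remaining <> '' DO
        SET piece = SUBSTRING_INDEX(remaining, ',', 1);  -- everything before the next comma
        SET result = CONCAT_WS(',', NULLIF(result, ''),
                               ROUND(CAST(piece AS DECIMAL(20,6)) * 4046.86, 2));
        -- drop the piece just handled (and its comma) from the remainder
        SET remaining = IF(LOCATE(',', remaining) = 0, '',
                           SUBSTRING(remaining, LOCATE(',', remaining) + 1));
    END WHILE;
    RETURN result;
END //
DELIMITER ;

-- The whole conversion then becomes a single query:
SELECT acres, acres_to_sqm(acres) AS square_meters
FROM districts;  -- "districts" is a placeholder table name

The same splitting idea works inline with SUBSTRING_INDEX and CONCAT_WS if you are not allowed to create functions, but a function keeps the query readable.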
I have a column of store IDs which all have leading zeroes, e.g. 0017 shows rather than 17, and 0876 rather than 876.
All store IDs are 4 digits long with these leading zeroes. Is there a way to remove these leading zeroes and therefore leave me with 17 and 876 (as above)?
I imagine this would involve a REGEXP statement but I haven't been able to successfully create one yet.
Create a calculated field using this formula: REGEXP_REPLACE(Store ID, r'^\D*0*', '').
Here is a challenge for you: I was trying to make a tic-tac-toe game in R. First, the players have to enter their names, and the game should check whether each name exists in a file called "Players.txt" (if the file does not exist, the game will create it); if the name already exists, the game will ask for a new one. The last part is that the game should record the players' scores (each piece played subtracts 5 points from the 100 that each player has at the beginning of the game). The problem is that when a player wins, the game shows the following error: "Error in table[location_name1, 3]: incorrect number of dimensions".
A vector can be either atomic or a list. Atomic vectors can only contain elements of one and the same data type. That means you are "accidentally" creating a list with
vector=c(win,name1,name2,table)
with the result that each column of the data frame becomes a separate entry.
You can solve it with
vector <- list(win, name1, name2, table)
vector is still a list but now it has the format I believe you want.
Having done that, you will still get errors, because these assignments fail:
location_name1=which(grepl(name1,table$gamers))
location_name2=which(grepl(name2,table$gamers))
They return an empty vector because earlier in the code you set win=vector[1]... table=vector[4]. Since vector is now a list, you have to subset it accordingly. That means you have to change the statements, e.g. table=vector[[4]].
Now you are going to run into another problem: you treat the column table$scores as text. When you read the data, you need to make sure that this column is not interpreted as text, and you also have to eliminate all statements that coerce the column into text. Otherwise table[location_name1,3]=table[location_name1,3]+pointsx will obviously fail, because you cannot add a number to a string.
For example, you coerce the column into a character column with this statement:
name1 <- data.frame(gamers=name1,games="1",scores="100")
games and scores are strings, not numbers. Another example is the assignment after reading the table from the file. You can make sure that scores is numeric by doing this:
scores <- as.numeric(table[,3])
Please get familiar with RStudio's debugging capabilities (https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio). This way you can go through your code line by line and check the consequences of each assignment to the data frame.
To help with some regular label-making and printing I need to do, I am looking to write a script that allows me to enter a range of sequential numbers (some with string identifiers) that I can export with a specific format to Excel. For example, if I entered the range '1:16', I am looking for an output in Excel exactly as:
Example Excel Output
For each unique sequential number (i.e., 1 to 16), the first five rows must be labeled with a 'U', the next three rows with an 'F', and the last two rows must be the number alone. The final exported matrix will be n columns x 21 rows, where n will vary depending on the number range I enter.
My main problem is in writing to Excel. I can't figure out how to customize this output and write to specific rows and columns as in the example above. I am limited to 'openxlsx' since I work on a corporate secure workstation. Here is what I have so far:
Example Code
Any help you may have would be very appreciated, thanks in advance!
I have a column containing user-entered descriptions. These descriptions can be anything, but I do need them sorted into a logical order.
The text can be anything, like:
16 to 26 months
40 to 60 months
Literacy
Mathematics
When I order these in a SQL statement, the text items come back fine. However, any that begin with numbers come back in an order that is not logical,
i.e.
16 to 26 months
will be before
8 to 20 months
I understand why, since it sorts by the first character and so on, but I don't know how to alter the SQL statement (using SQLite) to fix the ordering without messing up the entries beginning with text.
When I cast to numeric, the numbers sort fine but the items beginning with text go wrong.
Thanks
What you need is to sort the values in "natural order". To achieve this, you will need to implement your own collating sequence; SQLite doesn't provide one for this case.
There are some questions (and answers) regarding this topic here on SO, but they are for other RDBMSs. The best I could find in a quick search was this:
http://wiki.ozanh.com/doku.php?id=python:database:sqlite:how_to_natural_sort
You should think about improving your table schema, e.g. splitting the period into separate integer columns (monthsMin, monthsMax) instead of using text, which would make sorting much easier. You can always build a string from these values if necessary.
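If implementing a collation or changing the schema is more than you need right now, a rough approximation can be done directly in the ORDER BY clause: put the rows that start with a digit first, sorted by their leading number, and let everything else fall back to alphabetical order. This is only a sketch with made-up names (description column, entries table):

SELECT description
FROM entries
ORDER BY
    CASE WHEN description GLOB '[0-9]*' THEN 0 ELSE 1 END,  -- digit-prefixed rows first
    CAST(description AS INTEGER),  -- SQLite casts the leading numeric prefix, so '8 to 20 months' sorts before '16 to 26 months'
    description;                   -- remaining rows (and ties) sort alphabetically

A true natural sort, e.g. for numbers appearing later in the string, still needs the custom collation or the split integer columns described above.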
I am looking to optimize my contains query. I have a pipe-separated list of numbers in one of my Aerospike bins (columns), something like "234|235|236|".
These numbers may vary from 1 to 2^14.
Currently I am applying a contains query to find "235|" in this column, but it is getting slow. Is there any math or strategy I can apply to convert this contains query into an exact match?
TIA,
Karan
Did you try using a List type for this bin? You can then build a secondary index on the List values (indextype = LIST, type = NUMERIC) and get all records that match the value of interest in the list using a secondary index query.
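If the bin is stored as an actual list of integers, e.g. [234, 235, 236] instead of the string "234|235|236|", the index and query would look roughly like the aql below. The namespace, set, bin, and index names are made up, and I may be misremembering the exact aql syntax, so treat this as a sketch and check the current docs:

CREATE LIST INDEX idx_nums ON myns.myset (nums) NUMERIC
SELECT * FROM myns.myset IN LIST WHERE nums = 235

The secondary index query returns every record whose list contains 235, which replaces the substring scan for "235|" with an exact numeric match.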