I've a data table like this
+------------+-------+
| Model | Price |
+------------+-------+
| Apple-1 | 10 |
+------------+-------+
| New Apple | 11 |
+------------+-------+
| Orange | 13 |
+------------+-------+
| Orange2019| 15 |
+------------+-------+
| Cat | 19 |
+------------+-------+
I'want to define a list of base model tags that I want to add to any single row that matches certain condition/value. So for example defined a data frame for tagging like this
+------------+--------+
| Model | Tag |
+------------+------ -+
| Apple-1 | A |
+------------+------ -+
| New Apple | A |
+------------+------ -+
| Orange | B |
+------------+------ -+
| Cat | B |
+------------+--------+
I would like to find some way to get this results:
+------------+-------+--------+
| Model | Price | Tag |
+------------+-------+--------+
| Apple-1 | 10 | A |
+------------+-------+--------|
| New Apple | 11 | A |
+------------+-------+--------|
| Orange | 13 | B |
+------------+-------+--------|
| Orange2019| 15 | B |
+------------+-------+--------|
| Cat | 19 | B |
+------------+-------+--------|
I'm don't mind to use a table to managed the tagging data, and I know that I could write very "ad-hoc" mutate statement to achieve the results I want, just wondering if there is more elegant way to tagging a string based on a pattern match.
One idea is to use the Levenshtein distances to cluster the words you have. You would need to provide with a number of clusters. Once you have this clusters, just add the number of each one as a category tag to your table. Check out this answer which goes into detail of Levenshtein distance clustering. Text clustering with Levenshtein distances
edit
I think I totally misunderstood your question... try this
df=data.frame("Model"=c("Apple-1","New Apple","Organe","Orange2019","Cat"),
"Price"=c(10,11,13,15,19),stringsAsFactors = FALSE)
tags=data.frame("Model"=c("Apple-1","New Apple","Orange","Cat"),
"Tag"=c("A","A","B","B"),stringsAsFactors = FALSE)
df%>%rowwise()%>%mutate(Tag=if_else(!is.na(tags$Tag[which(!is.na(str_extract(Model,tags$Model)))[1]]),
tags$Tag[which(!is.na(str_extract(Model,tags$Model)))[1]],false="None"))
Model Price Tag
<chr> <dbl> <chr>
1 Apple-1 10 A
2 New Apple 11 A
3 Organe 13 None
4 Orange2019 15 B
5 Cat 19 B
I actually changed Orange for Organe so that you see what happens if there is not match ( none is returned)
I have tried different things, but none succeeded. I have the following issue, and would be very gratefull if someone could help me.
I get the data from a view as several billions of records, for different measures
A)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to aggregate it by each measure. And so long so fine. I got this figured out.
B)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to get the data in the following form. I need to turn it into a key-value form.
C)
| measure | c | p |
|---------+----+----|
| m1 | 3 | 15 |
| m2 | 6 | 18 |
| m3 | 9 | 21 |
| m4 | 12 | 24 |
|---------+----+----|
The first 4 columns from B) would form in C) the first column, and the second 4 columns would form another column.
Is there an elegant way, that could be easily maintainable? The perfect solution would be if another measure would be introduced in A) and B), there no modification would be required and it would automatically pick up the difference.
I know how to get this done in SqlServer and Postgres, but here I am missing the expirience.
I think you should use map for this
I have excel file full of alphabets which is "a","b","c" only. There are total 20 columns and 300 rows. I just wanted to find pattern matching in all columns. For example, if i search for 4 rows of pattern matching where there is "aabc","bcdb" and etc the output will be how many times the same pattern repeat from the csv file. If i search for 5 rows of pattern matching where there is "abcaa", "bbaca" and etc the output will be how many times the same pattern repeat from the csv file. It is not necessary that the matching must be in same rows. If the string pattern occur anywhere else in the file can be considered too. The output may be in the next sheet should be fine. I have tried in VBA and R using Regex but only counting single cell. Any advice on how to find the pattern matching from the excel file would be greatly appreciated. Thanks in advance.
Excel File:
**A B C D**
1 a | a | a | b |
2 a | a | a | c |
3 b | b | b | a |
4 c | c | c | c |
5 d | b | d | b |
6 b | a | b | c |
7 b | b | b | b |
8 a | a | a | c |
9 a | c | a | a |
10 c | c | c | c |
11 c | a | c | c |
12 a | a | a | a |
13 b | b | a | b |
14 b | b | b | a |
15 c | c | c | c |
Output Example:
If search for 4 rows
aabc 3
dbba 2
baac 2
so on...
If search for 5 rows
aabcd 2
aacca 3
so on..
In E1 enter:
=TEXTJOIN("",TRUE,A1:D1)
and copy down. Then copy column E and PasteSpecialValues into column F.Then apply RemoveDuplicates to column F.Then in G1 enter:
=COUNTIF(E:E,F1)
and copy down:
The Modeling and Solving Linear Programming with R book has a nice example on planning shifts in Sec 3.7. I am unable to solve it with R. Also, I am not clear with the solution provided in the book.
Problem
A company has a emergency center which is working 24 hours a day. In
the table below, is detailed the minimal needs of employees for each of the
six shifts of four hours in which the day is divided.
Shift Employees
00:00 - 04:00 5
04:00 - 08:00 7
08:00 - 12:00 18
12:00 - 16:00 12
16:00 - 20:00 15
20:00 - 00:00 10
R solution
I used the following to solve the above.
library(lpSolve)
obj.fun <- c(1,1,1,1,1,1)
constr <- c(1,1,0,0,0,0,
0,1,1,0,0,0,
0,0,1,1,0,0,
0,0,0,1,1,0,
0,0,0,0,1,1,
1,0,0,0,0,1)
constr.dir <- rep(">=",6)
constr.val <-c (12,25,30,27,25,15)
day.shift <- lp("min",obj.fun,constr,constr.dir,constr.val,compute.sens = TRUE)
And, I get the following result.
> day.shift$objval
[1] 1.666667
> day.shift$solution
[1] 0.000000 1.666667 0.000000 0.000000 0.000000 0.000000
This is nowhere close to the numerical solution mentioned in the book.
Numerical solution
The total number of solutions required as per the numerical solution is 38. However, since the problem stated that, there is a defined minimum number of employees in every period, how can this solution be valid?
s1 5
s2 6
s3 12
s4 0
s5 15
s6 0
Your mistake is at the point where you initialize the variable constr, because you don't define it as a matrix. Second fault is your matrix itself. Just look at my example.
I was wondering why you didn't stick to the example in the book because I wanted to check my solution. Mine is based on that.
library(lpSolve)
obj.fun <- c(1,1,1,1,1,1)
constr <- matrix(c(1,0,0,0,0,1,
1,1,0,0,0,0,
0,1,1,0,0,0,
0,0,1,1,0,0,
0,0,0,1,1,0,
0,0,0,0,1,1), ncol = 6, byrow = TRUE)
constr.dir <- rep(">=",6)
constr.val <-c (5,7,18,12,15,10)
day.shift <- lp("min",obj.fun,constr,constr.dir,constr.val,compute.sens = TRUE)
day.shift$objval
# [1] 38
day.shift$solution
# [1] 5 11 7 5 10 0
EDIT based on your question in the comments:
This is the distribution of the shifts on the periods:
shift | 0-4 | 4-8 | 8-12 | 12-16 | 16-20 | 20-24
---------------------------------------------------
20-4 | 5 | 5 | | | |
0-8 | | 11 | 11 | | |
4-12 | | | 7 | 7 | |
8-16 | | | | 5 | 5 |
12-20 | | | | | 10 | 10
18-24 | | | | | |
----------------------------------------------------
sum | 5 | 16 | 18 | 12 | 15 | 10
----------------------------------------------------
need | 5 | 7 | 18 | 12 | 15 | 10
---------------------------------------------------
i have a dataset in sqlite db which has ids and image urls. It is as follows:
ID | URL
1 | http://img.url1.com
2 |
3 | http://img.url2.com
4 | http://img.url3.com
5 | http://img.url4.com
6 |
7 | http://img.url5.com
8 |
9 | http://img.url6.com
10 |
11 | http://img.url7.com
12 | http://img.url8.com
13 |
I want to sort this accordingly in such a way that the ids with blank values appears at the bottom and the ids with nearest with img urls get appeared at the top with nearest values in the sorted order of ids as follows:
ID | URL
1 | http://img.url1.com
3 | http://img.url2.com
4 | http://img.url3.com
5 | http://img.url4.com
7 | http://img.url5.com
9 | http://img.url6.com
11 | http://img.url7.com
12 | http://img.url8.com
2 |
6 |
8 |
10 |
13 |
Can you suggest me what kind of query I need to put ?
SELECT * FROM T
ORDER BY (URL IS NULL),id
SQLFiddle demo