R Merge 2 tables/dataframes by partial match - r

I have two tables/dataframes.
The first Table (ID) one looks like this:
The second table (Names) looks like this:
I want to match the "IDTag" variable to the first few letters of the "Name" variable. In other programming languages I would do a foreach and run through each of the IDTags for each of the rows of the second table (matching the IDTag to the first n characters of the "Name" variable where n is the number of characters of the IDTag in question.
In R it seems like there should be a method for doing this and I have looked at pmatch and a few others but those either don't appear to make the match at all or when I try to use them come up with several NAs in places where I wouldn't have expected them (Sample code using the table data above:
NameMatches <- Names[pmatch(
ID$IDTag,
Names$Name,
duplicates.ok = TRUE
),]
I have the feeling I am going about this with the wrong theory or concept so I am looking to see if someone can guide on the simplest/clearest way to do this accurately.
Editing original question to reply to comments...
The expected output would look something like this (i.e. - all of the columns of the Names table with the addition of the Group column from the ID table. Multiple matches are expected - one to many relationship between ID and Names tables):
Thanks,

If you are open to using the sqldf package, then one option would be to just write a join using the logic you gave us:
library(sqldf)
sql <- "SELECT * FROM ID t1 INNER JOIN Names t2
ON t2.Name LIKE t1.IDTag || '%'"
output <- sqldf(sql)
Note: If you want to keep all rows from the ID data frame, regardless of whether or not they match to anything in Names, then use a left join instead.

Related

In SQLite3 how to group by those having no certain values?

A table named "example" is like this:
enter image description here
I would like to SELECT Name FROM example GROUP BY NAME. But at the same time, if anyone's value contains X, then that one should be excluded from the result. In the "example" table, A has three values and one of them is X so A should be excluded from the result. So does B.
The result produced by SQL clauses should be:
C
D
Could anyone help me write corresponding SQL clauses so that I can get the result I want?
Thank you!!
I tried to write something like this:
SELECT Name FROM example WHERE Value <> X GROUP BY NAME.
It didn't work because A and B still have other values which prevent them from being excluded. I just have no idea of what else I can do. I'm very new in SQL.
A comparison results in either 0 or 1.
So taking the largest result in a group shows whether there is a match in the group:
SELECT Name
FROM Example
GROUP BY Name
HAVING MAX(Value = 'X') = 0;

Inserting and combining data between 2 table

We have a table students with ID,NAME,SURNAME. We want have another table (created) students_2 with ID1,NAME1,SURNAME1.
Starting from table students, i want to fill data in the second table in the following way : I want to have in the second table combinations of names ( example : NAME,SURNAME1; NAME1,SURNAME1). Moreover, i want to generate combination of names.
How can I do that ? I tried something like :
INSERT INTO students_2 (ID1,NAME1,SURNAME1) SELECT ID,NAME,NAME from students;
But it's not correct cause I don't generate combinations, just inserting . A solution is appreciated, but mainly i need ideas.
You could write something like
INSERT INTO students2(NAME, VALUE) FROM
SELECT s1.name, s2.value from students1 s1 cross join students1 s2
This will do Cartesian product , and will get a NxN rows with combinations

Can you only use one Select command w/SqlDataSource

This is a pretty simple question that I haven't been able to find an answer for. Is it possible to have two separate SELECT commands (from the same table) in the same SqlDataSource command to populate two different cells in a given GridView?
I haven't been able to find current information so far.
::EDIT::
The challenge is that I'm attempting to manupulate one cell with a COUNT command and the second cell with a numerical grand total from the same information.
You can Combine results from two separate SELECT Statements by doing something like this..
SELECT X.A , Y.B
FROM (SELECT Column1 AS A FROM TableName) X, (SELECT Column2 AS B FROM TableName) Y

How to create a view that returns a 2x2 (or NxN) matrix of results

So I know enough SQL just to be really dangerous (I don't normally work the back-end) but cannot get the following view to be created successfully ;) The result set I'm after is a data set that has rows assigned as a column alias from multiple tables (instead of a 1xN flat of all columns). There is a many-to-one relationship when looking at the main table, based on foreign keys associated to the row id of the appropriate related table.
Ideally I'd like a data set that looks like this in the return:
dataset.transaction_row[n]: col1, col2, col3, coln... (columns from the transaction table)
dataset.category_row[n]: col1, co2, col3, coln... (columns from the category table)
and so on...
I get the following error:
Query Error: near "AS": syntax error Unable to execute statement
From:
CREATE VIEW view_unreconciled_transactions
AS SELECT account_transaction.* AS transaction_row,
category.* AS category_row,
memorized.name_rule_replace OR account_transaction.name AS payee
FROM account_transaction
LEFT JOIN memorized ON account_transaction.memorized_key = memorized.id
LEFT JOIN category ON account_transaction.category_key = category.id
WHERE status != 2
ORDER BY account_transaction.dt_posted DESC
It seems easy enough since the result-column selector is repeatable which includes expressions (referencing sqlite's syntax diagrams). In reference to the error, I'm assuming it's complaining about the 2nd 'AS' where I'm trying to get table.* assigned as an alias. Any help in the right direction is appreciated. If I had to, I suppose I could explicitly state all columns but that feels like a kludge.
The AS modifier can only be applied to a single column, not to a collection such as the * you used. You will have to break them out into specific names, (which is best practice IMHO anyway)
It looks like you want to make a "pivot table". They can be tricky to make in a database. I can say that if you a data result, where each row comes from a different table source, and the columns form each table are IDENTICAL, then you could try using a UNION statement to join the different results together like they are just one dataset.
NOTE that the columns all take their naming cue from the first dataset in a UNION and the datatype all need to be the same.

sqlite subqueries with group_concat as columns in select statements

I have two tables, one contains a list of items which is called watch_list with some important attributes and the other is just a list of prices which is called price_history. What I would like to do is group together 10 of the lowest prices into a single column with a group_concat operation and then create a row with item attributes from watch_list along with the 10 lowest prices for each item in watch_list. First I tried joins but then I realized that the operations where happening in the wrong order so there was no way I could get the desired result with a join operation. Then I tried the obvious thing and just queried the price_history for every row in the watch_list and just glued everything together in the host environment which worked but seemed very inefficient. Now I have the following query which looks like it should work but it's not giving me the results that I want. I would like to know what is wrong with the following statement:
select w.asin,w.title,
(select group_concat(lowest_used_price) from price_history as p
where p.asin=w.asin limit 10)
as lowest_used
from watch_list as w
Basically I want the limit operation to happen before group_concat does anything but I can't think of a sql statement that will do that.
Figured it out, as somebody once said "All problems in computer science can be solved by another level of indirection." and in this case an extra select subquery did the trick:
select w.asin,w.title,
(select group_concat(lowest_used_price)
from (select lowest_used_price from price_history as p
where p.asin=w.asin limit 10)) as lowest_used
from watch_list as w

Resources