Merge rows with matching column values in a SQLite table? - sqlite

I want to query a table named Lines like this:
ID Part Count
--- --------- ------------
1 5 234
2 5 846
3 5 234
4 6 585
5 6 585
6 7 465
and return the rows data like following :
ID Part Count
--- --------- ------------
1 5 1314
4 6 1170
6 7 465
What I want is to merge the Count values of rows where the Part column matches, and return the other rows as they are. I know little about databases and have tried many queries, but I have not been able to achieve the result I want.

select MIN(ID) as ID, Part, sum(Count) as Count from tableName
group by Part
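Assuming the table is named Lines as in the question, a quick way to check a grouped query like this is Python's built-in sqlite3 module. The sketch below sums Count per Part and keeps the lowest ID of each group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Lines (ID INTEGER, Part INTEGER, Count INTEGER)")
conn.executemany("INSERT INTO Lines VALUES (?, ?, ?)",
                 [(1, 5, 234), (2, 5, 846), (3, 5, 234),
                  (4, 6, 585), (5, 6, 585), (6, 7, 465)])

# Keep the first ID per Part and sum the Count values of the group
rows = conn.execute(
    "SELECT MIN(ID) AS ID, Part, SUM(Count) AS Count "
    "FROM Lines GROUP BY Part ORDER BY ID"
).fetchall()
# rows -> [(1, 5, 1314), (4, 6, 1170), (6, 7, 465)]
```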

Related

How to add columns from another data frame where there are multiple matching rows

I'm new to R and I'm stuck.
NB! I'm sorry, I could not figure out how to add more than one space between numbers and headers in my example, so I used "_" instead.
The problem:
I have two data frames (Graduations and Occupations). I want to match the occupations to the graduations. The difficult part is that one person might be present multiple times in both data frames and I want to keep all the data.
Example:
Graduations
One person may have finished many curriculums. Original DF has more columns but they are not relevant for the example.
Person_ID__curriculum_ID__School ID
___1___________100__________10
___2___________100__________10
___2___________200__________10
___3___________300__________12
___4___________100__________10
___4___________200__________12
Occupations
Not all graduates have jobs; everyone in the DF should have only one main job (JOB_Type code "1") and can have 0-5 extra jobs (JOB_Type code "0"). Original DF has more columns but they are not relevant currently.
Person_ID___JOB_ID_____JOB_Type
___1_________1223________1
___3_________3334________1
___3_________2122________0
___3_________7843________0
___4_________4522________0
___4_________1240________1
End result:
New DF named "Result" containing the information of all graduations from the first DF(Graduations) and added columns from the second DF (Occupations).
Note that person "2" is not in the Occupations DF. Their data remains but added columns remain empty.
Note that person "3" has multiple jobs and thus extra duplicate rows are added.
Note that person "4" has both multiple jobs and multiple graduations, so extra rows were added to fit in all the data.
New DF: "Result"
Person_ID__Curriculum_ID__School_ID___JOB_ID____JOB_Type
___1___________100__________10_________1223________1
___2___________100__________10
___2___________200__________10
___3___________300__________12_________3334________1
___3___________300__________12_________2122________0
___3___________300__________12_________7843________0
___4___________100__________10_________4522________0
___4___________100__________10_________1240________1
___4___________200__________12_________4522________0
___4___________200__________12_________1240________1
For me the most difficult part is how to make R add extra duplicate rows. I looked around for an example or tutorial about something similar but could not find one. Probably I did not use the right keywords.
I will be very grateful if you could give me examples of how to code it.
You can use merge like:
merge(Graduations, Occupations, all.x=TRUE)
# Person_ID curriculum_ID School_ID JOB_ID JOB_Type
#1 1 100 10 1223 1
#2 2 100 10 NA NA
#3 2 200 10 NA NA
#4 3 300 12 3334 1
#5 3 300 12 2122 0
#6 3 300 12 7843 0
#7 4 100 10 4522 0
#8 4 100 10 1240 1
#9 4 200 12 4522 0
#10 4 200 12 1240 1
Data:
Graduations <- read.table(header=TRUE, text="Person_ID curriculum_ID School_ID
1 100 10
2 100 10
2 200 10
3 300 12
4 100 10
4 200 12")
Occupations <- read.table(header=TRUE, text="Person_ID JOB_ID JOB_Type
1 1223 1
3 3334 1
3 2122 0
3 7843 0
4 4522 0
4 1240 1")
An option with left_join
library(dplyr)
left_join(Graduations, Occupations)
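For comparison, the same left-join behavior (rows multiplying when a person has several jobs, and the job columns staying empty for person 2) can be sketched outside R with Python's built-in sqlite3 module; table and column names are taken from the example data above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Graduations (Person_ID INT, Curriculum_ID INT, School_ID INT);
CREATE TABLE Occupations (Person_ID INT, JOB_ID INT, JOB_Type INT);
""")
conn.executemany("INSERT INTO Graduations VALUES (?,?,?)",
                 [(1, 100, 10), (2, 100, 10), (2, 200, 10),
                  (3, 300, 12), (4, 100, 10), (4, 200, 12)])
conn.executemany("INSERT INTO Occupations VALUES (?,?,?)",
                 [(1, 1223, 1), (3, 3334, 1), (3, 2122, 0),
                  (3, 7843, 0), (4, 4522, 0), (4, 1240, 1)])

# LEFT JOIN keeps every graduation row; a person with several jobs
# yields one output row per job, and unmatched people get NULLs
result = conn.execute("""
    SELECT g.Person_ID, g.Curriculum_ID, g.School_ID, o.JOB_ID, o.JOB_Type
    FROM Graduations g LEFT JOIN Occupations o ON g.Person_ID = o.Person_ID
    ORDER BY g.Person_ID, g.Curriculum_ID, o.JOB_ID
""").fetchall()
```

As in the R merge(..., all.x=TRUE) result, this returns 10 rows, with None in the job columns for person 2.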

Filtering dataset by values and replacing with values in other dataset in R [duplicate]

This question already has answers here:
Replace values in data frame based on other data frame in R
(4 answers)
Closed 4 years ago.
I have two datasets like this:
>data1
id l_eng l_ups
1 6385 239
2 680 0
3 3165 0
4 17941 440
5 135 25
6 151 96
7 102188 84
8 440 65
9 6613 408
>data2
id l_ups
1 237
2 549
3 100
4 444
5 28
6 101
7 229
8 92
9 47
I want to filter out the values from data1 where l_ups == 0 and replace them with the values in data2, using id as the lookup value, in R.
Final output should look like this:
id l_eng l_ups
1 6385 239
2 680 549
3 3165 100
4 17941 440
5 135 25
6 151 96
7 102188 84
8 440 65
9 6613 408
I tried the code below, but no luck:
if(data1[,3]==0)
{
filter(data1, last_90_uploads == 0) %>%
merge(data_2, by.x = c("id", "l_ups"),
by.y = c("id", "l_ups")) %>%
select(-l_ups)
}
I cannot do this with an if statement, since it takes only a single value as its logical condition. But what if the logical condition produces more than one value?
like this:
>if(data1[,3]==0)
TRUE TRUE
Edit:
I want to filter the values with a condition and replace them with values in another dataset. Hence, this question is not similar to the one suggested as repetitive.
You don't want to filter. filter is an operation that returns a data set where rows might have been removed.
You are looking for a "conditional update" operation (in terms of a databases). You are already using dplyr, so try a join operation instead of match:
left_join(data1, data2, by='id') %>%
mutate(l_ups = ifelse(is.na(l_ups.x) | l_ups.x == 0, l_ups.y, l_ups.x))
By using a join operation rather than the direct subsetting comparison as #markus suggested, you ensure that you only compare values with same ids. If one of your data frames happens to miss a row, the direct subsetting comparison will fail.
Using a left_join rather than an inner_join also ensures that if data2 is missing an id, the corresponding row will not be removed from data1.
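The same conditional update can be sketched in plain Python, treating data2 as a lookup table keyed by id (values copied from the example above):

```python
# data1 rows as (id, l_eng, l_ups); data2 maps id -> replacement l_ups
data1 = [(1, 6385, 239), (2, 680, 0), (3, 3165, 0), (4, 17941, 440),
         (5, 135, 25), (6, 151, 96), (7, 102188, 84), (8, 440, 65),
         (9, 6613, 408)]
data2 = {1: 237, 2: 549, 3: 100, 4: 444, 5: 28, 6: 101, 7: 229, 8: 92, 9: 47}

# Keep l_ups unless it is 0; otherwise look the id up in data2
result = [(i, l_eng, l_ups if l_ups != 0 else data2[i])
          for i, l_eng, l_ups in data1]
# result rows 2 and 3 become (2, 680, 549) and (3, 3165, 100)
```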

How to find the unique row with the largest column value?

I have a SQLite database where entries are sorted like this:
| ID | length | breadth | height | time |
1 10 20 30 123
1 10 20 15 432
2 4 2 7 543
2 4 2 8 234
As you can see, the height column can vary over time. I want to get the entry with the largest height for every unique ID in my database. Is there a way to do this in one single query, instead of looping through all IDs with something like this:
for x in ids:
SELECT length, breadth, height FROM table WHERE id = x ORDER BY height DESC LIMIT 1
Use GROUP BY. In SQLite, when a query uses MAX() (or MIN()), the bare columns in the select list are taken from the row that holds the maximum, so you can return the whole row:
SELECT ID, length, breadth, MAX(height) AS height FROM table GROUP BY ID
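A quick way to verify this SQLite-specific bare-column behavior is with Python's built-in sqlite3 module; the table below mirrors the example data (the name t stands in for the placeholder "table", which is a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ID INT, length INT, breadth INT, height INT, time INT)")
conn.executemany("INSERT INTO t VALUES (?,?,?,?,?)",
                 [(1, 10, 20, 30, 123), (1, 10, 20, 15, 432),
                  (2, 4, 2, 7, 543), (2, 4, 2, 8, 234)])

# SQLite fills the bare columns from the row that supplies MAX(height)
rows = conn.execute(
    "SELECT ID, length, breadth, MAX(height) AS height "
    "FROM t GROUP BY ID ORDER BY ID"
).fetchall()
# rows -> [(1, 10, 20, 30), (2, 4, 2, 8)]
```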

Retrieving the name of a particular column of a table in R

I have the following data frame (matrix):
> Table
Clima Crecimiento
1 1 350
2 1 375
3 1 360
4 1 400
5 1 380
6 2 500
7 2 530
8 2 520
9 2 550
10 2 545
I would like to know the command to retrieve the name of the first column exclusively (Clima), since it is a categorical column and I need to use it in code I am preparing to ease discriminant function analysis.
I tried using names() and colnames[.1], with no success. I suppose I am near the solution, but some code is still missing to get what I need.
Thanks

How to find differences in elements of 2 data frames based on 2 unique identifiers

I have 2 very large data frames similar to the following:
df1<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGR"))
> df1
DS.ID OP.ID P.ID
1 123 xxab AAC
2 214 xxac JGK
3 543 xxad DIF
4 325 xxae ADL
5 123 xxaf AAC
6 214 xxaq JGR
df2<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGS"))
> df2
DS.ID OP.ID P.ID
1 123 xxab AAC
2 214 xxac JGK
3 543 xxad DIF
4 325 xxae ADL
5 123 xxaf AAC
6 214 xxaq JGS
The unique ID is based on the combination of DS.ID and OP.ID, so DS.ID can be repeated but the combination of DS.ID and OP.ID cannot. I want to find the instances where P.ID changes. Also, the combination of DS.ID and OP.ID will not necessarily be in the same row.
In the example above, it would return row 6, as the P.ID changed. I'd want to write both the initial and final values to a data frame.
I have a feeling the initial step would be
rbind.fill(df1,df2)
(.fill because there are added columns in the data frames I'm trying to loop through).
Edit: Assume there's other columns that have different values as well. Thus, duplicated would not work unless you isolated them to their own data frame. But, I'll be doing this for many columns and many data frames, so I'd rather not go with that method for speed sake.
If ident is 0 in the following code, the P.ID values differ between the two data frames:
ll<-merge(df1,df2,by=c("DS.ID", "OP.ID"))
library(plyr)
ddply(ll,.(DS.ID, OP.ID),summarize,ident=match(P.ID.x, P.ID.y,nomatch=0))
DS.ID OP.ID ident
1 123 xxab 1
2 123 xxaf 1
3 214 xxac 1
4 214 xxaq 0
5 325 xxae 1
6 543 xxad 1
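Equivalently, the comparison can be sketched in plain Python by keying each data frame on the (DS.ID, OP.ID) pair and collecting the pairs whose P.ID differs (values copied from the example above):

```python
# Each mapping is keyed on the (DS.ID, OP.ID) composite identifier
df1 = {(123, "xxab"): "AAC", (214, "xxac"): "JGK", (543, "xxad"): "DIF",
       (325, "xxae"): "ADL", (123, "xxaf"): "AAC", (214, "xxaq"): "JGR"}
df2 = {(123, "xxab"): "AAC", (214, "xxac"): "JGK", (543, "xxad"): "DIF",
       (325, "xxae"): "ADL", (123, "xxaf"): "AAC", (214, "xxaq"): "JGS"}

# Collect keys present in both with differing P.ID, keeping old and new values
changed = {key: (df1[key], df2[key])
           for key in df1 if key in df2 and df1[key] != df2[key]}
# changed -> {(214, "xxaq"): ("JGR", "JGS")}
```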
