I have created a query joining two tables (basic info and trainings) with an inner join, and it gives me data as below:
District  Department  Person  Trainings
Chitral   Health      ABC     Training 1
Chitral   Health      ABC     Training 2
Chitral   Health      DEF     Training 1
Chitral   Health      DEF     Training 3
Chitral   Health      DEF     Training 1
Chitral   Health      GHI     Training 4
On the report's District Header, I want to count the unique Persons and the number of distinct Trainings they received.
For example, I want a text field that shows Persons who availed Trainings = 3 and No. of Trainings = 4.
Note: some persons and some trainings are duplicates.
Please guide me on how to get this right.
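For reference, the counting logic itself boils down to two distinct counts over the district's rows. A minimal sketch in Python (not Access; just to illustrate the logic on the sample rows above):

```python
# Sample rows as (district, department, person, training) tuples,
# copied from the query output above.
rows = [
    ("Chitral", "Health", "ABC", "Training 1"),
    ("Chitral", "Health", "ABC", "Training 2"),
    ("Chitral", "Health", "DEF", "Training 1"),
    ("Chitral", "Health", "DEF", "Training 3"),
    ("Chitral", "Health", "DEF", "Training 1"),
    ("Chitral", "Health", "GHI", "Training 4"),
]

# Distinct persons and distinct trainings for the district;
# sets discard the duplicate rows automatically.
persons = {person for _, _, person, _ in rows}
trainings = {training for _, _, _, training in rows}

print(len(persons))    # 3
print(len(trainings))  # 4
```

In Access itself the same idea usually means a saved query that first selects the distinct pairs, then counts them for the group header.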
I have the following data frame named bbchealth:
head(bbchealth)
# A tibble: 6 x 1
Tweets
<chr>
1 Breast cancer risk test devised http://bbc.in/1CimpJF
2 GP workload harming care - BMA poll http://bbc.in/1ChTBRv
3 Short people's 'heart risk greater' http://bbc.in/1ChTANp
4 New approach against HIV 'promising' http://bbc.in/1E6jAjt
5 Coalition 'undermined NHS' - doctors http://bbc.in/1CnLwK7
6 Review of case against NHS manager http://bbc.in/1Ffj6ci
As you can see, each row, which contains a single tweet, has a URL at the end. I would like to remove only this URL while leaving the rest of the data frame unaffected.
If I try to use something like rm_url, I get the following:
[1] "c(\"Breast cancer risk test devised \"GP workload harming care - BMA poll \"Short people's 'heart risk greater' \"New approach against HIV 'promising' \"Coalition 'undermined NHS' - doctors \"Review of case against NHS manager \"\\\"VIDEO: 'All day is empty, what am I going to do?' \"VIDEO: 'Overhaul needed' for end-of-life care \"Care for dying 'needs overhaul' \"VIDEO: NHS: Labour and Tory key policies \"Have GP services got worse? \"A&E waiting hits new worst level \"Parties row over GP opening hours \"Why strenuous runs may not be so bad after all \"VIDEO: Health surcharge for non-EU patients \"VIDEO: Skin cancer spike 'from 60s holidays' \"\.........
That is, a single vector(?) consisting of a string of the tweets with the URLs removed.
The code I used was rm_url(bbchealth, replacement = "").
If I use gsub("http.*","",bbchealth), I get the following output:
[1] "c(\"Breast cancer risk test devised "
However, this is not what I want. I want to retain the columnar structure. That is,
# A tibble: 6 x 1
Tweets
<chr>
1 Breast cancer risk test devised
2 GP workload harming care - BMA poll
3 Short people's 'heart risk greater'
4 New approach against HIV 'promising'
5 Coalition 'undermined NHS' - doctors
6 Review of case against NHS manager
How can I accomplish this?
Here you go, with the stringi package:
dt <- data.frame(
  Tweets = c(
    "Breast cancer risk test devised http://bbc.in/1CimpJF ",
    "GP workload harming care - BMA poll http://bbc.in/1ChTBRv",
    "Short people's 'heart risk greater' http://bbc.in/1ChTANp "
  )
)
library(stringi)
dt$Tweets2 <- stringi::stri_replace_all_regex(dt$Tweets, "\\shttp://.*$", "")
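The key point is that the replacement is applied element by element, so the column structure survives. For comparison, the same idea outside R, as a minimal Python sketch (the regex mirrors the stringi pattern above, extended to https as well):

```python
import re

tweets = [
    "Breast cancer risk test devised http://bbc.in/1CimpJF",
    "GP workload harming care - BMA poll http://bbc.in/1ChTBRv",
]

# Strip a trailing http(s) URL (and surrounding spaces) from each tweet,
# one element at a time, so the list structure is preserved.
cleaned = [re.sub(r"\s*https?://\S+\s*$", "", t) for t in tweets]

print(cleaned[0])  # Breast cancer risk test devised
```

The earlier attempts collapsed the tibble into one string because the whole data frame was passed where a character vector was expected; operating on the column (`dt$Tweets`) avoids that.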
I want to cluster my qualitative data using kmeans in R. The data represents trade IDs, counterparty names, regulators, product types, and error types. None of these values is numeric, and I know that kmeans only works with numeric values. I want to cluster based on Error Types, and I want to know which counterparties and regulators group together. The data that I have is as follows:
Reported_USI  Counterparty  Regulator  Product_Type     Error  Code
ABC243        ABC           CSA        InterestRate     G1234  1
ABC111        ABC           CSA        InterestRate     G1234  1
TRE567        TRE           CSA        Equity           G5689  2
YTY111        YTY           CSA        Equity           G4523  3
DEF111        DEF           CSA        InterestRate     G1234  1
CBC111        CBC           CSA        InterestRate     G5689  2
TTT111        TTT           CFTC       Credit           G4523  3
PPP111        PPP           CFTC       Credit           G5555  4
GGG111        GGG           CFTC       Credit           G5555  4
RRR111        RRR           CFTC       Credit           G0000  5
EEE111        EEE           CFTC       Credit           G0000  5
SSS111        SSS           CSA        InterestRate     G0000  5
VVV111        VVV           CSA        ForeignExchange  G1234  1
BBB111        BBB           CSA        ForeignExchange  G5555  4
NNN111        NNN           CSA        InterestRate     G4523  3
Here is the code:
cluster_file<-read.csv("Sample_clustering.csv")
cluster_file<-as.data.frame(cluster_file,row.names = NULL)
clusters<-kmeans(cluster_file[,6],4)$cluster
clusters1<-names(clusters[clusters==1])
I gave the Errors a number from 1 to 5. I want to see which cluster the Counterparty and USI fall under and then use a graph to visualize it. If anyone can point me in a direction, I will really appreciate it. The data I gave is a subset of a very large data set. Hopefully I have been clear. Thank you.
EDIT: I put the code up. When I went on to pull the names of the USIs associated with the cluster, it returned a null value.
Stop expecting magic. k-means cannot do magic.
It performs a least-squares optimization. It assumes you have continuous variables.
It is up to you to have data where this is the right approach.
Judging from your data, k-means is the wrong tool here. If I'm not mistaken, you are attempting to run k-means on the last column only, which contains nothing but your arbitrary enumeration of the error codes as 1, 2, 3, 4, 5. What result would you expect there?
In fact, I don't think any clustering will yield a statistically sound result on your data set, which could just as well be random strings...
If you cannot show that least-squares is a "reasonable" optimization criterion, and that means are reasonable representatives of your data set, then you shouldn't use k-means.
In your case, on the 1, 2, 3, 4, 5 error codes, this obviously cannot work.
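If the underlying goal is simply "which counterparties and regulators share an error type", no clustering is needed at all; a plain group-by on the categorical error code answers it directly. A minimal stdlib Python sketch on a few of the sample rows above (column names taken from the question):

```python
from collections import defaultdict

# (Counterparty, Regulator, Error) triples from the sample data.
rows = [
    ("ABC", "CSA",  "G1234"),
    ("TRE", "CSA",  "G5689"),
    ("YTY", "CSA",  "G4523"),
    ("DEF", "CSA",  "G1234"),
    ("CBC", "CSA",  "G5689"),
    ("TTT", "CFTC", "G4523"),
]

# Group (counterparty, regulator) pairs by the raw error code --
# the categorical value itself, not an arbitrary 1-5 encoding.
groups = defaultdict(list)
for counterparty, regulator, error in rows:
    groups[error].append((counterparty, regulator))

print(groups["G1234"])  # [('ABC', 'CSA'), ('DEF', 'CSA')]
```

Everything that "clusters together" here is just the rows sharing a key; that is the honest version of what k-means on an enumerated error column would pretend to discover.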
I am coming from Swing, where in a JTable I can just set the cell at a given row and column to a value. In a JavaFX TableView, I have to make each row represent an object. I am trying to represent a schedule for a race track. I have a round and race number, and then whoever is in each lane.
Round | Race | Lane 1 | Lane 2
  1   |  1   |  Bob   |  Joe
  1   |  2   |  Tom   |  Sam
  2   |  1   |  Sam   |  Joe
  2   |  2   |  Bob   |  Tom
Each object in a lane (Bob, Tom, ...) is a Car object. It has various fields, but what is shown in the table should be whatever toString() returns; in this case, the driver's name. I have an array of Round objects, each Round has an array of Races, and each Race has an array of Cars for the lanes. I need a way to represent this data structure in a TableView as shown above. Note that the number of lanes, the races per round, and the total rounds can be changed by the user at runtime.
I have a requirement to "select all rows from the fund table whose own fund_id does not appear as a replacement fund_id on any other row in the fund table".
Every fund record has a history record created with an old status and a new status.
Whenever a particular fund goes through the void process (i.e., old status to new status: null --> 'Issued' --> 'Void' --> 'Reissue'), a replacement fund_id is generated and
linked to the original record, which is treated as a new fund record with history null --> 'Issued'.
Please see the data below for clarification.
FUND HISTORY TABLE:
columns and data are
fund_hist_id  fund_id  old_status  new_status
128           2444582  null        I
127           2445579  V           R
124           2445579  I           V
123           2445579  null        I
129           2445562  null        I
FUND TABLE:
columns and its data are
FUND_ID  FUND_NAME  ORIGINAL_FUND_ID  REPLACEMENT_FUND_ID
2444582  ABC FUND   2444582           NULL
2445579  ABC FUND   2445579           2444582
2445562  XYZ FUND   2445562           NULL
PLEASE note: per my requirement, I have to select these original fund ids from the fund table: 2445579, 2445562.
Since 2444582 is linked as the replacement fund id on another record in the fund table, I have to ignore that record. I pick 2445579 because it is the original record with
one history record going from 'null' to 'Issued'. Also, 2445562 has no replacement records linked in history either, and hence I need to select this record as well.
Can anybody provide me a query, keeping performance in mind?
Please let me know if any of the details are not clear.
regards
rajesh
Consider two tables, hist and fund. The required query will be:
select *
from fund f, hist h
where f.FUND_ID = h.FUND_ID
and f.FUND_ID not in (select nvl(REPLACEMENT_FUND_ID, 0) from fund)
and h.OLD_STATUS is null and h.NEW_STATUS = 'I';
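The filtering the query performs can be stated procedurally: a fund qualifies if its FUND_ID never appears as anyone's REPLACEMENT_FUND_ID, and it has a null --> 'I' history row. A minimal Python sketch of that logic over the sample data from the question:

```python
# Sample FUND and FUND HISTORY rows from the question.
funds = [
    {"fund_id": 2444582, "replacement_fund_id": None},
    {"fund_id": 2445579, "replacement_fund_id": 2444582},
    {"fund_id": 2445562, "replacement_fund_id": None},
]
history = [
    {"fund_id": 2444582, "old": None, "new": "I"},
    {"fund_id": 2445579, "old": "V",  "new": "R"},
    {"fund_id": 2445579, "old": "I",  "new": "V"},
    {"fund_id": 2445579, "old": None, "new": "I"},
    {"fund_id": 2445562, "old": None, "new": "I"},
]

# Fund IDs that appear as a replacement on some other row are excluded.
replacements = {f["replacement_fund_id"] for f in funds
                if f["replacement_fund_id"] is not None}

# Fund IDs with an original-issue history row (null -> 'I').
issued = {h["fund_id"] for h in history
          if h["old"] is None and h["new"] == "I"}

result = sorted(f["fund_id"] for f in funds
                if f["fund_id"] not in replacements and f["fund_id"] in issued)
print(result)  # [2445562, 2445579]
```

This matches the expected answer in the question (2445579 and 2445562, with 2444582 excluded because it is someone's replacement). For performance on a large table, a NOT EXISTS correlated subquery with an index on REPLACEMENT_FUND_ID is the usual alternative to NOT IN.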
I made some wrong moves in Neo4j, and now we have a graph with duplicate nodes. Among the duplicate pairs, the full property set belongs to the first of the pair, and the relationships all belong to the second in the pair. The index is the node_auto_index.
Nodes:
Id  Name  Age  From         Profession
1   Bob   23   Canada       Doctor
2   Amy   45   Switzerland  Lawyer
3   Sam   09   US
4   Bob
5   Amy
6   Sam
Relationships:
Id Start End Type
1 4 6 Family
2 5 6 Family
3 4 5 Divorced
I am trying to avoid redoing the whole batch import. Is there a way to merge the nodes in Cypher based on the "Name" string property, while keeping all of the properties and the relationships?
Thank you!
Okay, I think I figured it out:
START first=node(*), second=node(*)
WHERE has(first.Name) and has(second.Name) and has(second.Age) and NOT(has(first.Age))
WITH first, second
WHERE first.Name= second.Name
SET first=second
The query is still processing, but is there a more efficient way of doing this?
You create a cross product here between the two sets, so that will be expensive. Better is to do an index lookup for name.
START first=node(*), second=node(*)
WHERE has(first.Name) and has(second.Name) and has(second.Age) and NOT(has(first.Age))
WITH first, second
SKIP 20000 LIMIT 20000
WHERE first.Name= second.Name
SET first=second
And you probably have to paginate the processing as well.
START n=node:node_auto_index("Name:*")
WITH n.Name as name, collect(n) as nodes
SKIP 20000 LIMIT 20000
WHERE length(nodes) = 2
WITH head(filter(x in nodes : not(has(x.Age)))) as first, head(filter(x in nodes : has(x.Age))) as second
SET first=second