COUNTROWS counts duplicates in Power BI

I have a status log table that contains student names and results, and a Status table containing 3 categories: Attempted, Failed, and Passed.
A student can attempt an exam up to 5 times. If he failed the first 3 times and then passed on the 4th attempt, he should no longer appear in the failed count.
When I try to filter only failed students, I also get his previous failed attempts: the Power BI row context counts each earlier Failed row as a value.
Sample data:
https://drive.google.com/file/d/1_6DnTIMKfn-sCs_OkSaDNZYv69S9Floh/view?usp=sharing
Failed count should be 1.
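One way to get that is to classify each student by the result of his latest attempt only. A minimal DAX sketch, assuming a StatusLog table with hypothetical columns Student, AttemptNo, and Result (adjust the names to the real schema):
Failed Students :=
COUNTROWS (
    FILTER (
        VALUES ( StatusLog[Student] ),
        // last attempt number recorded for this student
        VAR LastAttempt = CALCULATE ( MAX ( StatusLog[AttemptNo] ) )
        // result recorded on that last attempt
        VAR LastResult =
            CALCULATE ( MAX ( StatusLog[Result] ), StatusLog[AttemptNo] = LastAttempt )
        RETURN LastResult = "Failed"
    )
)
Because only the latest attempt per student is inspected, a student who failed three times and then passed is not counted, so the sample data yields a failed count of 1. If the log records a timestamp instead of an attempt number, MAX ( StatusLog[AttemptNo] ) can be swapped for MAX over the date column.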

Failed to add new value partitions to database

My cluster is composed of 3 linux_x64 servers. It contains 1 controller node and 3 data nodes. The server version of DolphinDB is v2.0.1.1.
dfsReplicationFactor=2
dataSync=1
The schema of the database is:
2021.09.09 08:42:57.180: execution was completed [34ms]
partitionSchema->[2021.09.06,2021.09.05,2021.09.04,2021.09.03,2021.09.02,2021.09.01,2021.08.31,2021.08.30,...]
databaseDir->dfs://dwd
engineType->OLAP
partitionSites->
partitionTypeName->VALUE
partitionType->1
When I insert data into the database "dfs://dwd", I get an error:
Failed to add new value partitions to database dfs://dwd.Please manaually add new partitions [2021.09.07].
Then I use the following script to manually add partitions:
db=database("dfs://dwd")
addValuePartitions(db,2021.09.07..2021.09.09)
The error is:
<ChunkInRecovery>openChunks failed on '/dwd/domain', chunk cf57375e-b4b3-dc87-9b41-667a5e91a757 is in RECOVERING state
The repair method is as follows:
Step 1: Use getClusterChunksStatus on the controller node to get the chunk ID of "/dwd/domain". Sample code:
select * from rpc(getControllerAlias(), getClusterChunksStatus) where  file like "%/domain%" and state != 'COMPLETE'
Step 2: Use getAllChunks on the data node to get the partition information for that chunk ID. In the code below, the chunk ID "4503a64f-4f5f-eea4-4247-a0d0fc3941a1" was obtained in step 1.
select * from pnodeRun(getAllChunks)  where chunkId="4503a64f-4f5f-eea4-4247-a0d0fc3941a1"
Step 3: Use copyReplicas to copy the replica. Assuming the result of step 2 shows that the replica is on datanode3, copy it to datanode1:
rpc(getControllerAlias(), copyReplicas{`datanode3, `datanode1, "4503a64f-4f5f-eea4-4247-a0d0fc3941a1"})
Step 4: Use getClusterChunksStatus to check whether the status is COMPLETE. If it is, the repair was successful; see the check below.
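For the check in step 4, the step 1 query can be reused, filtered to the chunk ID from step 2 (a sketch; the state column should now read COMPLETE):
// re-check the chunk's state on the controller node
select * from rpc(getControllerAlias(), getClusterChunksStatus) where chunkId="4503a64f-4f5f-eea4-4247-a0d0fc3941a1"
Once it does, addValuePartitions(db, 2021.09.07..2021.09.09) and the original insert can be retried.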

CreateML Recommender Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1

I'm using CreateML to generate a Recommender model using an implicit dataset of the format: User ID, Item ID. The data is loaded into CreateML as a CSV with about 400k rows.
When attempting to 'Train' the model, I receive the following error:
Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1
My dataset is in the following format:
"user_id","item_id"
"e7ca1b039bca4f81a33b21acc202df24","f7267c60-6185-11ea-b8dd-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","e643af62-6185-11ea-9d27-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","f2fd13ce-6185-11ea-b210-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","e95864ae-6185-11ea-a254-0657986dc989"
"31042cbfd30c42feb693569c7a2d3f0a","e513a2dc-6185-11ea-9b4c-0657986dc989"
"39e95dbb21854534958d53a0df33cbf2","f27f62c6-6185-11ea-b14c-0657986dc989"
"5c26ca2918264a6bbcffc37de5079f6f","ec080d6c-6185-11ea-a6ca-0657986dc989"
I've tried modifying both Item ID and User ID to enumerated IDs, but I still receive the training error. Example:
"item_ids","user_ids"
0,0
1,0
2,0
2,0
0,225
400,225
409,225
0,282
0,4
8,4
8,4
I receive this error both within the CreateML UI and when using CreateML within a Swift playground. I've also tried removing duplicates and verified that the maximum ID for each column is (num_items - 1).
I've searched for documentation on what the exact requirement is for the set of IDs with no luck.
Thank you in advance for any help clarifying this error message.
I was able to discuss this issue with Apple's CoreML developers during WWDC2020. They described this as a known bug which will be fixed with the upcoming OS (Big Sur). The work-around for this bug is:
In the CSV dataset, create records for a single user which interacts with ALL items, and create records for a single item interacted with by ALL users.
Using pandas in Python, I essentially implemented the following (imports and the CSV load are filled in here; the input path is illustrative):
import csv
import pandas as pd

# Load the implicit ratings (user_id, item_id pairs)
ratings_df = pd.read_csv('data/ratings.csv', dtype=str)

# Find the unique item ids and user ids
item_ids = ratings_df.item_id.unique()
user_ids = ratings_df.user_id.unique()

# Create a 'dummy user' which interacts with all items
mock_item_interactions_df = pd.DataFrame({'item_id': item_ids, 'user_id': 'mock-user'})

# Create a 'dummy item' interacted with by all users
mock_user_interactions_df = pd.DataFrame({'item_id': 'mock-item', 'user_id': user_ids})

# Append both mock frames (DataFrame.append was removed in pandas 2.0, so use pd.concat)
ratings_with_mocks_df = pd.concat([ratings_df, mock_item_interactions_df, mock_user_interactions_df])

# Export the CSV
ratings_with_mocks_df.to_csv('data/ratings-w-mocks.csv', quoting=csv.QUOTE_NONNUMERIC, index=True)
Using this CSV, I successfully generated a CoreML model using CreateML.
Try adding an unnamed first column to your CSV data which counts rows from 0 ... num_items - 1, like
"","userID","itemID","rating"
0,"a","x",1
1,"a","y",0
...
After adding this column, it started working for me. I use UUIDs for userID and itemID in my training data. Also be sure to sort the rows by itemID so that all rows for one itemID are next to each other; a sketch of that preprocessing follows.
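A minimal pandas sketch of that preprocessing (paths and column names are illustrative):
import csv
import pandas as pd

df = pd.read_csv('data/ratings.csv', dtype=str)

# Sort so all rows for one itemID sit next to each other, then renumber 0..n-1
df = df.sort_values('itemID').reset_index(drop=True)

# index=True writes the unnamed leading column counting from 0
df.to_csv('data/ratings-indexed.csv', quoting=csv.QUOTE_NONNUMERIC, index=True)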

Make SQLite abort on first error (and sing songs)

I wanted to name this post Make SQLite abort on first error but StackOverflow's AI overlords decided it doesn't fit their conception of intelligent human behavior. For the record, I was googling exactly that, but perhaps even Google AI considered my question unworthy and didn't bother to help me. Mods, feel free to change the title according to what your AI bosses desire (if you can figure it out).
I have this script
create if not exists table entries (
id integer primary key,
start datetime not null,
end datetime not null
);
delete from entries;
insert into entries values (1, '2018-08-01 10:00', '2018-08-01 15:00');
insert into entries values (2, '2018-08-01 17:00', '2018-08-01 20:00');
insert into entries values (1, '2018-08-02 19:00', '2018-08-02 00:00');
insert into entries values (1, '2018-08-03 00:00', '2018-08-03 04:00');
insert into entries values (1, '2018-08-03 14:00', '2018-08-03 18:00');
There is a mistake in the create statement. When I run the script I get
% sqlite3 db.sqlite3 <ddl.sql
Error: near line 1: near "if": syntax error
Error: near line 7: no such table: entries
Error: near line 8: no such table: entries
Error: near line 9: no such table: entries
Error: near line 10: no such table: entries
Error: near line 11: no such table: entries
Error: near line 12: no such table: entries
How do I make SQLite exit executing the script on first error it encounters? I'm looking for equivalent of set -e in Bash.
From the documentation, it looks like you can turn on the dot command .bail.
.bail on|off Stop after hitting an error. Default OFF
See also - O'Reilly Using Sqlite
Edit
To exit, you can use the .exit dot command.
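For a non-interactive run like the one above, the same behavior is available as a command-line switch, so the script itself doesn't need to change:
% sqlite3 -bail db.sqlite3 <ddl.sql
With -bail on, execution stops after the first error (the near "if": syntax error) instead of cascading through the no such table errors.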

Uneven numbers of tokens and subscript out of bound errors

I am trying to analyze flow cytometry data with a package that was developed about 10 years ago. It requires a few dependency packages, all of which I was able to install.
Now I am trying to run its first function, which creates a gate frame for a WinList-processed FCS file:
create_gate_frame(frame = archframe1x36x16, inputfile = c("facsdata/TrungTran/Gelfree-8-lane7-5_1_1_A5.fcs"), popdesc = "frames/popdescriptions/array1xpopdesc.txt")
I get the following errors that I don't know how to solve, so any help would be much appreciated.
uneven number of tokens: 1013
The last keyword is dropped.
uneven number of tokens: 1013
The last keyword is dropped.
Error in mat[, c(scatters, dims1, dims2, PE)] : subscript out of bounds

Appropriate query required to find out the result

I need the desired result with the least execution time.
I have a table which contains many rows (over 100k); in this table there is a field notes varchar2(1800).
It contains following values:
notes
CASE Transfer
Surnames AAA : BBBB
Case Status ACCOUNT TXFERRED TO BORROWERS
Completed Date 25/09/2022
Task Group 16
Message sent at 12/10/2012 11:11:21
Sender : lynxfailures123#google.com
Recipient : LFRB568767#yahoo.com
Received : 21:31 12/12/2002
The query should return the rows whose notes contain ACCOUNT TXFERRED TO BORROWERS.
I have used the following queries, but they take a long time (72150436 sec) to execute:
Select * from cps_case_history where dbms_lob.instr(notes, 'ACCOUNT TFR TO UFSS') > 1
Select * from cps_case_history where notes like '%ACCOUNT TFR TO UFSS%'
Could you please suggest a query that will take less time to execute?
You can try parallel hints (see Optimizer hints):
Select /*+ PARALLEL(a,8) */ a.* from cps_case_history a
where INSTR(NOTES,'Text you want to search') > 0; -- your condition
Replace 8 with 16 and see if the performance improves further.
Avoid % at the beginning of the LIKE pattern, i.e., where notes like '%Account...', since a leading wildcard prevents index use.
Updated answer: Try creating a partitioned table. You can go with range partitioning on the completed_date column (see Partitioning); a sketch is below.
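A minimal sketch of that idea, assuming the table has the completed_date column mentioned above (table, column, and partition names are illustrative):
-- hypothetical range-partitioned layout, split by completed_date
CREATE TABLE cps_case_history_part (
    case_id        NUMBER,
    completed_date DATE,
    notes          VARCHAR2(1800)
)
PARTITION BY RANGE (completed_date) (
    PARTITION p2021 VALUES LESS THAN (DATE '2022-01-01'),
    PARTITION p2022 VALUES LESS THAN (DATE '2023-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);
Note that partition pruning only helps when the query also filters on completed_date; a LIKE on notes alone still scans every partition.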
