Multiple orderBy in firestore - firebase

I have a question about how multiple orderBy works.
Supposing these documents:
collection/
doc1/
date: yesterday at 11:00pm
number: 1
doc2/
date: today at 01:00am
number: 6
doc3/
date: today at 01:00pm
number: 0
If I order by two fields like this:
.orderBy("date", "desc")
.orderBy("number", "desc")
.get()
How are those documents sorted? And, what about doing the opposite?
.orderBy("number", "desc")
.orderBy("date", "desc")
.get()
Will this result in the same order?
I'm a bit confused since I don't know if it will always end up ordering by the last orderBy.

In the documentation for orderBy() in Firebase it says this:
You can also order by multiple fields. For example, if you wanted to order by state, and within each state order by population in descending order:
Query query = cities.orderBy("state").orderBy("population", Direction.DESCENDING);
So it works like ORDER BY in SQL. Say you have a table of customers from all over the world. With ORDER BY Country you sort them by country, in whichever direction you choose. If you add a second field, say Customer Name, the rows are first ordered by Country and then, within each country, by Customer Name. Example:
1. Adam | USA |
2. Jake | Germany |
3. Anna | USA |
4. Semir | Croatia |
5. Hans | Germany |
When you call orderBy("country") you will get this:
1. Semir | Croatia |
2. Jake | Germany |
3. Hans | Germany |
4. Adam | USA |
5. Anna | USA |
Then when you add orderBy("customer name") as a second ordering you get this:
1. Semir | Croatia |
2. Hans | Germany |
3. Jake | Germany |
4. Adam | USA |
5. Anna | USA |
You can see that Hans and Jake switched places, because H is before J but they are still ordered by the Country name. In your case when you use this:
.orderBy("date", "desc")
.orderBy("number", "desc")
.get()
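It will first order by date and then by number. But since no two documents have the same date, the second orderBy has nothing to reorder and you won't notice any difference. The same goes for the second query: the numbers are all different, so the order is decided by number alone. (The two queries still return different orders, because they sort by different primary fields.) But let's say that two of your documents had the same date, so your data looks like this: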
collection/
doc1/
date: yesterday at 11:00pm
number: 1
doc2/
date: today at 01:00am
number: 6
doc3/
date: today at 01:00am
number: 0
Now doc2 and doc3 are both dated today at 01:00am. When you order by date alone they will sit one below the other, and doc2 will probably be shown first. When you add orderBy("number") as a second ordering, the numbers are compared only among documents with the same date. So with ascending order (no "desc") you would get this:
.orderBy("date")
// output: 1. doc1, 2. doc2, 3. doc3
.orderBy("date").orderBy("number")
// output: 1. doc1, 2. doc3, 3. doc2
doc3 moves ahead of doc2 because number 0 comes before 6. Just reverse it for desc.
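To make the tie-breaking concrete, here is a minimal sketch in plain Python (not the Firestore SDK) that mimics two chained descending orderBy calls with a tuple sort key. The calendar dates are placeholders for "yesterday" and "today", and the helper name order_by_desc is made up for this illustration.

```python
from datetime import datetime

# Stand-ins for the three documents with the tie on date.
# The calendar dates are arbitrary placeholders, not from the question.
docs = [
    {"id": "doc1", "date": datetime(2024, 1, 1, 23, 0), "number": 1},
    {"id": "doc2", "date": datetime(2024, 1, 2, 1, 0), "number": 6},
    {"id": "doc3", "date": datetime(2024, 1, 2, 1, 0), "number": 0},
]


def order_by_desc(documents, *fields):
    """Sort descending by the first field; later fields only break ties."""
    return sorted(
        documents,
        key=lambda doc: tuple(doc[field] for field in fields),
        reverse=True,
    )


# Equivalent in spirit to .orderBy("date", "desc").orderBy("number", "desc")
date_then_number = [d["id"] for d in order_by_desc(docs, "date", "number")]

# Equivalent in spirit to .orderBy("number", "desc").orderBy("date", "desc")
number_then_date = [d["id"] for d in order_by_desc(docs, "number", "date")]

print(date_then_number)   # ['doc2', 'doc3', 'doc1']
print(number_then_date)   # ['doc2', 'doc1', 'doc3']
```

The first field always decides the order; the second field matters only where the first one ties, which is why swapping the two orderBy calls changes the result.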


How to obtain distinct values based on another column in the same table?

I'm not sure how to word the title properly so sorry if it wasn't clear at first.
What I want to do is to find users that have logged into a specific page, but not the other.
The table I have looks like this:
Users_Logins
------------------------------------------------------
| IDLogin | Username | Page | Date | Hour |
|---------|----------|-------|------------|----------|
| 1 | User_1 | Url_1 | 2019-05-11 | 11:02:51 |
| 2 | User_1 | Url_2 | 2019-05-11 | 14:16:21 |
| 3 | User_2 | Url_1 | 2019-05-12 | 08:59:48 |
| 4 | User_2 | Url_1 | 2019-05-12 | 16:36:27 |
| ... | ... | ... | ... | ... |
------------------------------------------------------
So as you can see, User 1 logged into Url 1 and 2, but User 2 logged into Url 1 only.
How should I go about finding users that logged into Url 1, but never logged into Url 2 during a certain period of time?
Thanks in advance!
This is how I accomplished what you are asking for:
Query:
select distinct username from User_Logins
where page = 'Url_1'
and username not in
(select username from User_Logins
where Page = 'Url_2')
and date BETWEEN '2019-05-12' AND '2019-05-12'
and hour BETWEEN '00:00:00' AND '12:00:00';
Returns:
User_2
Comments:
I basically used a sub query to filter out the usernames you don't care about. :)
The time range is getting only 1 result, which you can test by removing the "distinct" in the first line of the query. If you then remove the time range from the query, you'll get 2 results.
You can do it with group by username and apply the conditions in a HAVING clause:
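select username
from User_Logins
where
date between '..........' and '..........'
and
hour between '..........' and '..........'
group by username
having
sum(page = 'Url_1') > 0
and
sum(page = 'Url_2') = 0;
Replace the dots with the date/time intervals you want. Note that sum(page = 'Url_1') relies on the comparison evaluating to 1 or 0, as it does in MySQL; in databases that do not allow summing a boolean, use sum(case when page = 'Url_1' then 1 else 0 end) instead.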
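Both approaches can be tried out quickly against the sample rows. The sketch below uses Python's built-in sqlite3 module with an in-memory database, which is an assumption made purely for testing (the question does not say which database is in use). The table name is spelled User_Logins as in the queries, and the hour filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User_Logins ("
    "IDLogin INTEGER PRIMARY KEY, Username TEXT, Page TEXT, Date TEXT, Hour TEXT)"
)
# The four sample rows from the question.
conn.executemany(
    "INSERT INTO User_Logins VALUES (?, ?, ?, ?, ?)",
    [
        (1, "User_1", "Url_1", "2019-05-11", "11:02:51"),
        (2, "User_1", "Url_2", "2019-05-11", "14:16:21"),
        (3, "User_2", "Url_1", "2019-05-12", "08:59:48"),
        (4, "User_2", "Url_1", "2019-05-12", "16:36:27"),
    ],
)

# Approach 1: subquery that filters out anyone who ever visited Url_2.
not_in_query = """
    SELECT DISTINCT Username FROM User_Logins
    WHERE Page = 'Url_1'
      AND Username NOT IN (SELECT Username FROM User_Logins WHERE Page = 'Url_2')
      AND Date BETWEEN ? AND ?
"""

# Approach 2: group per user and count visits to each page in HAVING.
having_query = """
    SELECT Username FROM User_Logins
    WHERE Date BETWEEN ? AND ?
    GROUP BY Username
    HAVING SUM(Page = 'Url_1') > 0 AND SUM(Page = 'Url_2') = 0
"""

period = ("2019-05-11", "2019-05-12")
not_in_result = [row[0] for row in conn.execute(not_in_query, period)]
having_result = [row[0] for row in conn.execute(having_query, period)]

print(not_in_result)  # ['User_2']
print(having_result)  # ['User_2']
```

One difference worth keeping in mind: the NOT IN subquery here looks at Url_2 visits over all time, while the HAVING version only considers rows inside the chosen period.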

Levenshtein logic to get all the strings with minimum difference

Suppose I have a data frame with values
Mtemp:
-----+
code |
-----+
Ram |
John |
Tracy|
Aman |
I want to compare it with the data frame
M2:
------+
code |
------+
Vivek |
Girish|
Rum |
Rama |
Johny |
Stacy |
Jon |
I want a result where, for each value in Mtemp, I get at most 2 possible matches in M2 within Levenshtein distance 2.
I have used
library(stringdist)  # provides amatch()
tp <- as.data.frame(amatch(Mtemp$code, M2$code, method = "lv", maxDist = 2))
tp$orig <- Mtemp$code
colnames(tp) <- c('Res', 'orig')
and I am getting the following result:
Res |orig
-----+-----
3 |Ram
5 |John
6 |Tracy
4 |Aman
Please let me know a way to get 2 values (if possible) for every Mtemp string within Levenshtein distance 2.
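One possible way to think about it, sketched in Python rather than R: compute the distance from each Mtemp value to every M2 value, keep those within the limit, and take the closest two. The function names levenshtein and closest_matches are made up for this illustration, and the comparison is case-sensitive. In R, the same idea could probably be built on a full distance matrix (for example with stringdistmatrix from the stringdist package) instead of amatch, which returns only a single best match per string.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def closest_matches(word, candidates, max_dist=2, limit=2):
    """Return up to `limit` candidates within `max_dist`, closest first."""
    scored = [(levenshtein(word, cand), cand) for cand in candidates]
    within = [pair for pair in scored if pair[0] <= max_dist]
    within.sort(key=lambda pair: pair[0])  # stable: ties keep candidate order
    return [cand for _, cand in within[:limit]]


mtemp = ["Ram", "John", "Tracy", "Aman"]
m2 = ["Vivek", "Girish", "Rum", "Rama", "Johny", "Stacy", "Jon"]

matches = {word: closest_matches(word, m2) for word in mtemp}
for word, found in matches.items():
    print(word, found)
```

For "Ram" this returns Rum and Rama, and for "John" it returns Johny and Jon, so each string can get two matches where two exist within the distance limit.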

How can I access the data in a Cassandra Table using RCassandra

I need to get the data from a column of a table in a Cassandra database. I am using RCassandra for this. After getting the data I need to do some text mining on it. Please suggest how to connect to Cassandra and get the data into my R script using RCassandra.
My RScript :
library(RCassandra)
connect.handle <- RC.connect(host="127.0.0.1", port=9160)
RC.cluster.name(connect.handle)
RC.use(connect.handle, 'mykeyspace')
sourcetable <- RC.read.table(connect.handle, "sourcetable")
print(ncol(sourcetable))
print(nrow(sourcetable))
print(sourcetable)
This will print the output as:
> print(ncol(sourcetable))
[1] 1
> print(nrow(sourcetable))
[1] 18
> print(sourcetable)
144 BBC News
158 IBN Live
123 Reuters
131 IBN Live
My Cassandra table contains four columns, but here it is showing only 1 column. I need to get each column's values separately. So how do I get the individual column values (e.g. each feedurl)? What changes should I make in my R script?
My cassandra table, named sourcetable
I have used Cassandra and R with the correct CRAN JAR files, but RCassandra is easier. RCassandra is a direct interface to Cassandra without the use of Java. To connect to Cassandra, use RC.connect, which returns a connection handle, like this:
conn <- RC.connect(host = <xxx>, port = <xxx>)
RC.login(conn, username = "bar", password = "foo")
You can then use RC.get to retrieve data or RC.read.table to read table data.
BUT, First you should read THIS
I am confused as well. Table demo.emp has 4 rows and 4 columns (empid, deptid, first_name and last_name). Neither RC.get nor RC.read.table gets all the data.
cqlsh:demo> select * from emp;
empid | deptid | first_name | last_name
-------+--------+------------+-----------
1 | 1 | John | Doe
1 | 2 | Mia | Lewis
2 | 1 | Jean | Doe
2 | 2 | Manny | Lewis
> RC.get.range.slices(c, "emp", limit=10)
[[1]]
key value ts
1 1.474796e+15
2 John 1.474796e+15
3 Doe 1.474796e+15
4 1.474796e+15
5 Mia 1.474796e+15
[[2]]
key value ts
1 1.474796e+15
2 Jean 1.474796e+15
3 Doe 1.474796e+15
4 1.474796e+15
5 Manny 1.474796e+15

Last matching date in spreadsheet function

I have a spreadsheet where dates are being recorded in regards to individuals, with additional data, as such:
Tom | xyz | 5/2/2012
Dick | foo | 5/2/2012
Tom | bar | 6/1/2012
On another sheet there is a line in which I want to be able to put in the name, such as Tom, and retrieve on the following cell through a formula the data for the LAST (most recent by date) entry in the first sheet. So the first sheet is a log, and the second sheet displays the most recent one. In the following example, the first cell is entered and the remaining are formulas displaying data from the first sheet:
Tom | bar | 6/1/2012
and so on, showing the latest dated entry in the log.
I'm stumped, any ideas?
If you only need to do a single lookup, you can do that by adding two new columns in your log sheet:
Sheet1
| A | B | C | D | E | F
1 | Tom | xyz | 6/2/2012 | | * | *
2 | Dick | foo | 5/2/2012 | | * | *
3 | Tom | bar | 6/1/2012 | | * | *
Sheet2
| A | B | C
1 | Tom | =Sheet1.E1 | =Sheet1.F1
*(E1) = =IF(AND($A1=Sheet2.$A$1;E2=0);B1;E2)
(i.e. paste the formula above in E1, then copy/paste it in the other cells with *)
Explanation: if A is not what you're looking for, go for the next; if it is, but there is a non-empty next, go for the next; otherwise, get it. This way you're selecting the last one corresponding to your search. I'm assuming you want the last entry, not "the one with the most recent date", since that's what you asked in your example. If I interpreted your question wrong, please update it and I can try to provide a better answer.
Update: If the log dates can be out of order, here's how you get the last entry:
*(F1) = =IF(AND($A1=Sheet2.$A$1;C1>=F2);C1;F2)
*(E1) = =IF(C1=F1;B1;E2)
Here I just replaced the test F2=0 (select next if non-empty) for C1>=F2 (select next if more recent) and, for the other column, select next if the first test also did so.
Disclaimer: I'm very inexperienced with spreadsheets, the solution above is ugly but gets the job done. For instance, if you wanted a 2nd row in Sheet2 to do another lookup, you'd need to add two more columns to Sheet1, etc.
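For comparison, the "most recent entry for a name" lookup that the formulas implement can be written as a filter followed by a max over the date column. This is a minimal Python sketch, not a spreadsheet solution; it uses the log rows from the question and assumes the dates are month/day/year. The function name latest_entry is made up for this illustration.

```python
from datetime import date

# Log rows from the question, assuming month/day/year dates.
log = [
    ("Tom", "xyz", date(2012, 5, 2)),
    ("Dick", "foo", date(2012, 5, 2)),
    ("Tom", "bar", date(2012, 6, 1)),
]


def latest_entry(rows, name):
    """Return the row for `name` with the most recent date, or None."""
    matches = [row for row in rows if row[0] == name]
    if not matches:
        return None
    return max(matches, key=lambda row: row[2])


latest_tom = latest_entry(log, "Tom")
print(latest_tom)  # ('Tom', 'bar', datetime.date(2012, 6, 1))
```

Because it compares dates rather than row positions, this works even when the log is not in chronological order, which is the case the updated formulas above are meant to handle.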

Code new variable based on grep return in R

I have a variable actor which is a string and contains values like "military forces of guinea-bissau (1989-1992)" and a large range of other different values that are fairly complex. I have been using grep() to find character patterns that match different types of actors. For example I would like to code a new variable actor_type as 1 when actor contains "military forces of", doesn't contain "mutiny of", and the string variable country is also contained in the variable actor.
I am at a loss as to how to conditionally create this new variable without resorting to some type of horrible for loop. Help me!
Data looks roughly like this:
| | actor | country |
|---+----------------------------------------------------+-----------------|
| 1 | "military forces of guinea-bissau" | "guinea-bissau" |
| 2 | "mutiny of military forces of guinea-bissau" | "guinea-bissau" |
| 3 | "unidentified armed group (guinea-bissau)" | "guinea-bissau" |
| 4 | "mfdc: movement of democratic forces of casamance" | "guinea-bissau" |
If your data is in a data.frame df:
> ifelse(!grepl('mutiny of' , df$actor) & grepl('military forces of',df$actor) & apply(df,1,function(x) grepl(x[2],x[1])),1,0)
[1] 1 0 0 0
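ifelse returns a vector of 1s and 0s (built from the logical vectors that grepl returns), which you can assign to whatever you like, e.g. df$actor_type.
Breaking that apart:
!grepl('mutiny of', df$actor) and grepl('military forces of', df$actor) satisfy your first two requirements. The last piece, apply(df,1,function(x) grepl(x[2],x[1])), goes row by row and greps for country in actor. Note that it addresses the columns by position, so it assumes actor is the first column of df and country the second.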
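The same three conditions can be spelled out row by row, which may make the logic easier to check. This is a minimal sketch in Python rather than R, using the four sample rows; the function name actor_type is made up for this illustration. It uses plain substring tests instead of regular expressions, so a country name containing regex metacharacters would not be misread as a pattern.

```python
# Sample rows from the question: (actor, country).
rows = [
    ("military forces of guinea-bissau", "guinea-bissau"),
    ("mutiny of military forces of guinea-bissau", "guinea-bissau"),
    ("unidentified armed group (guinea-bissau)", "guinea-bissau"),
    ("mfdc: movement of democratic forces of casamance", "guinea-bissau"),
]


def actor_type(actor, country):
    """Return 1 when all three conditions hold for this row, else 0."""
    is_state_military = (
        "military forces of" in actor   # must mention military forces
        and "mutiny of" not in actor    # but must not be a mutiny
        and country in actor            # and must name the row's country
    )
    return 1 if is_state_military else 0


actor_types = [actor_type(actor, country) for actor, country in rows]
print(actor_types)  # [1, 0, 0, 0]
```

The result matches the 1 0 0 0 returned by the ifelse call above.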
