I have a table like this:
acct_no  code
=============
12       A
12       B
13       A
14       Z
15       w
There are also other columns in the table.
I am trying to find the acct_no values that have more than one code. In the example above, 12 should be the result.
My query:
Select acct_no,code,count(code)
from dbo.cas
group by acct_no
having count(code)>1
I want to know whether this will work correctly. The database is very large and the query takes a long time to execute, so I want to be sure it is exactly right.
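As written, the query is likely to fail on SQL Server (which the dbo. prefix suggests), because code appears in the SELECT list without being grouped or aggregated. Even with that fixed, COUNT(code) counts rows rather than distinct codes, so an account listing the same code twice would still match. A minimal corrected sketch, assuming the table and column names above:
-- Accounts with more than one distinct code; an index on
-- (acct_no, code) may help on a large table.
SELECT acct_no, COUNT(DISTINCT code) AS code_count
FROM dbo.cas
GROUP BY acct_no
HAVING COUNT(DISTINCT code) > 1;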
I have a table like the one below. Each row has a store ID and the discount % for one of that store's coupons. Each store can have multiple coupons, but (store + discount %) is a primary key. I would like to find the top 10 coupons (in decreasing order of discount %) while taking at most 2 coupons from the same store. What is the most efficient way to do this? My logic involves sorting the data multiple times. Is there a better, more efficient way? I would like to do this in R.
Sample data:
df <- data.frame(Store=c("Lowes","Lowes","Lowes","Lowes","HD","HD","HD","ACE",
"ACE","Misc","Misc","Other","Other","Last","Last","Last"),
`discount_%`=c("60%","50%","40%","30%","60%","50%","40%","30%",
"20%","50%","30%","20%","10%","10%","5%","3%"),
check.names = FALSE)
My solution (a base-R sketch of this logic follows below):
1. Ignore the store, sort the table by discount, and create an ID; the ID ranks the coupons in descending order of discount.
2. By Store and discount, create ID2, which ranks the coupons within each store.
3. Filter out all rows where ID2 > 2.
4. Sort the table by ID.
5. Take the top 10 rows.
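For reference, the multi-sort logic above can be written directly in base R (a sketch; it assumes the sample df below and first strips the percent signs so the discounts sort numerically, with ave() supplying the per-store rank):
df$`discount_%` <- as.numeric(gsub("%", "", df$`discount_%`))  # "60%" -> 60
d <- df[order(-df$`discount_%`), ]                        # overall sort, i.e. the ID step
d$ID2 <- ave(seq_len(nrow(d)), d$Store, FUN = seq_along)  # within-store rank, i.e. ID2
head(d[d$ID2 <= 2, ], 10)                                 # drop ID2 > 2, take the top 10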
Try this:
df$`discount_%` <- as.numeric(gsub("%", "", df$`discount_%`))  # "60%" -> 60
require(data.table)
setDT(df)[order(-`discount_%`),      # sort by discount, descending
          head(.SD, 2), by = Store   # at most 2 coupons per store; head() avoids
                                     # NA rows when a store has only one coupon
        ][order(-`discount_%`)       # re-sort the kept rows by discount
        ][1:10]                      # take the top 10
Output:
Store discount_%
1: Lowes 60
2: HD 60
3: Lowes 50
4: HD 50
5: Misc 50
6: Misc 30
7: ACE 30
8: ACE 20
9: Other 20
10: Other 10
Data is easier to work with in R without special characters, but if you need to add the percent sign back, try something like this:
paste0(df$`discount_%`,"%")
I have the following data frame:
group_id date_show date_med
1 1976-02-07 1971-04-14
1 1976-02-09 1976-12-11
1 2011-03-02 1970-03-22
2 1993-08-04 1997-06-13
2 2008-07-25 2006-09-01
2 2009-06-18 2005-11-12
3 2009-06-18 1999-11-03
I want to subset my data frame so that the new data frame only keeps rows whose date_show values are more than 10 days apart, with this condition applied per group. That is, if two date_show values are less than 10 days apart but their group_ids differ, both entries are kept. Based on the table above, the result I want looks like this:
group_id date_show date_med
1 1976-02-07 1971-04-14
1 2011-03-02 1970-03-22
2 1993-08-04 1997-06-13
2 2008-07-25 2006-09-01
2 2009-06-18 2005-11-12
3 2009-06-18 1999-11-03
Which row gets deleted isn't important, because the reason I'm subsetting in the first place is to count the rows I am left with after applying this criterion.
I've tried playing around with the diff function, but I'm not sure how to go about it in the simplest possible way; this problem already sits inside another sapply call, so I'm trying to avoid any additional loop (in this case over group_id).
The data frame I'm working with has around 100,000 rows. Ideally I would like to do this with base R, because I have no rights to install additional packages on the machine I'm working on, but if that is not possible (or if a package would make the solution significantly better), I can ask my admin to install one.
Any tips would be appreciated!
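One base-R possibility (a sketch; it assumes date_show is, or can be coerced to, Date class, and it keeps a row only when it falls more than 10 days after the last row kept within its group):
df$date_show <- as.Date(df$date_show)

# Greedy pass over one group's dates: keep a date if it is more than
# `gap` days after the last date that was kept.
keep_spaced <- function(d, gap = 10) {
  keep <- logical(length(d))
  last_kept <- as.Date(NA)
  for (i in order(d)) {
    if (is.na(last_kept) || as.numeric(d[i] - last_kept) > gap) {
      keep[i] <- TRUE
      last_kept <- d[i]
    }
  }
  keep
}

# Apply per group, then map the logical index back onto the original rows
keep <- unsplit(lapply(split(df$date_show, df$group_id), keep_spaced),
                df$group_id)
result <- df[keep, ]
nrow(result)  # the row count you are after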
In SparkR I have a DataFrame data. It contains the columns user and game.
user contains the users and game contains the name of a game a user has played. There are only 14 games, namely 1, 2, ..., 14.
So
head(data)
gives this output
user game
3521 3
52 14
865 4
52 3
I want to find the first game a given user played. For example, user 52 played games 14, 3, 3, 5, 10, so game 14 is the first game this user played.
In sparkR I do it this way
su <- groupBy(data, data$user)
sus <- agg(su, FirstPlayed= first(data$game))
# Making it local
local_sus <- collect(sus)
Here I get the correct result because SparkR provides a first function.
I also want to find the second and third game a user has played, but I can't do that the same way because SparkR has no "second" function.
How should one solve this? Maybe I should use the except function to delete the first element?
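One way around the missing "second" function is a window function rather than except (a sketch for SparkR on Spark 2.x; it assumes there is, or you can add, a column ts that defines the order in which games were played, since without an explicit ordering column first() and row_number() are not deterministic in Spark):
ws <- orderBy(windowPartitionBy("user"), "ts")           # per-user window, ordered by play time
ranked <- withColumn(data, "playIdx", over(row_number(), ws))
firstThree <- where(ranked, ranked$playIdx <= 3)         # 1st, 2nd and 3rd game per user
local_firstThree <- collect(firstThree)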
I am querying data in a table that has a field whose values I want across the top of my report, and then multiple values I want to display down the side. Perhaps a cross-tab will work, but if so, how? (I'm open to suggestions for using something other than a cross-tab, or for formatting the original data differently.)
My data looks something like this:
Record Type Value1 Value2 PercentV1V2 Value3 PercentV1V3
TypeA 10 100 10 50 20
TypeB 20 40 50 200 10
TypeC 50 100 50 50 100
And, I would like my output to look like this (this formatting is not so negotiable)
TypeA TypeB TypeC
Set1Info
Value1 10 20 50
Value2 100 40 100
PercentV1V2 10 50 50
Set2Info
Value3 50 200 50
PercentV1V3 20 10 100
I've been messing with a cross-tab and I can get the first value set. But there doesn't appear to be a way to add a new row to the cross-tab.
I think I could format the data differently and use a cross-tab, but I'd prefer doing the calculations in my procedure, not in Crystal, if possible. They won't be simple totals, in any case.
So, if I formatted my data like so:
Record Type ValueType Value
TypeA Value1 10
TypeA Value2 100
TypeA PercentV1V2 10 <-- this would take another pass to calculate
TypeB Value1 20
etc.
Then I think the cross-tab would work better. But even then, would I be able to make formatting changes between Set1Info and Set2Info?
Or, is there a way with the original data to get the Types across the top and the values down, but without using a cross-tab? I can hard-code the headers, but that means every number displayed would have to be a formula, right? (I'd need to match up the correct number with each RecordType and ValueType.)
It seems like I'm overlooking something obvious, an easy way to approach this. But sometimes with Crystal, there is no easy way.
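For what it's worth, the reshaping described above can be done in the stored procedure itself with an UNPIVOT (a SQL Server sketch; dbo.MyReportData and the column names are stand-ins for the actual table, and the percent columns would still need their extra calculation pass):
-- Reshape the wide report table into RecordType / ValueType / Value rows
SELECT RecordType, ValueType, Value
FROM (SELECT RecordType, Value1, Value2, PercentV1V2, Value3, PercentV1V3
      FROM dbo.MyReportData) AS src    -- hypothetical table name
UNPIVOT (Value FOR ValueType IN
         (Value1, Value2, PercentV1V2, Value3, PercentV1V3)) AS u;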
To the extent I understand your problem and requirements, what you have done looks correct.
The only addition that comes to mind to solve the issue is to use an Embedded Summary.
Use the embedded summary to add a blank row where required, and add your Set2Info header there.
Solution 2:
Another option you can try is to create a formula, #Group, and manually separate the rows into groups (note this tests ValueType, the column holding Value1/Value2/... in your reformatted data):
if ValueType = "Value1" or ValueType = "Value2" or ValueType = "PercentV1V2"
then "Set1Info"
else "Set2Info"
Go to the Cross-Tab Expert and place this formula first in the Rows section so that it groups the rows as required and shows a header.
I am a bit new to R, so forgive me if this is a silly question. I have a data set of behaviors that looks something like this:
time behavior
10:04:36 FEED
10:04:37 FEED
10:04:38 REST
10:04:39 REST
10:04:40 RUN
etc.
I have added a column that gives each new behavior event a unique number, something like this:
time behavior Number
10:04:36 FEED 1
10:04:37 FEED 1
10:04:38 REST 2
10:04:39 REST 2
10:04:40 RUN 3
Therefore, if the behaviors at 10:04:36 and 10:30:00 are both FEED, they are still recognized as different behavior events because they have different numbers. I then subsetted my data by behavior category so that I have a data set of a single behavior. However, this data set still has a Number category for each behavior event, for example:
time behavior Number
10:04:36 FEED 1
10:04:37 FEED 1
10:30:00 FEED 10
10:30:01 FEED 10
10:30:02 FEED 10
11:01:00 FEED 21
11:01:01 FEED 21
etc...
Now, what I would like to do is randomize this new data set by Number category. That is, I would like to tell R to take each chunk of rows sharing the same Number value and reorganize those chunks. I tried sample(), but that only seems to randomize by row, and as you can see the Number categories are not all the same size. Basically, I would like to create a new data frame that looks something like this:
time behavior Number
10:30:00 FEED 10
10:30:01 FEED 10
10:30:02 FEED 10
11:01:00 FEED 21
11:01:01 FEED 21
10:04:36 FEED 1
10:04:37 FEED 1
So, I would like R to recognize each new Number category as a distinct event, and randomly reorganize the data by each new event, not by row.
Does anyone know a way to do what I am trying to do in R?
You could create a helper function, such as:
reorderingFunc <- function(data, indxCol) {
  # Shuffle the distinct block labels, then order the rows so that whole
  # blocks appear in that shuffled order (order() is stable, so rows
  # within a block keep their original relative order).
  indx <- sample(unique(data[, indxCol]))
  data[order(match(data[, indxCol], indx)), ]
}
Testing
set.seed(111) # Setting a seed so the outcome of `sample` is reproducible
reorderingFunc(df, "Number")
# time behavior Number
# 3 10:30:00 FEED 10
# 4 10:30:01 FEED 10
# 5 10:30:02 FEED 10
# 6 11:01:00 FEED 21
# 7 11:01:01 FEED 21
# 1 10:04:36 FEED 1
# 2 10:04:37 FEED 1
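An alternative sketch with the same effect, if you find split() easier to read (it assumes the same df; sample() permutes the list of per-event blocks and rbind reassembles them):
blocks <- split(df, df$Number)              # one data frame per event
shuffled <- do.call(rbind, sample(blocks))  # shuffle whole blocks, reassemble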