I am able to count the number of occurrences of values in a single column by using:
select column_name, count(column_name)
from table_name
group by column_name
order by column_name
But I want a query that counts occurrences of values across multiple columns.
The count function, when applied directly to a column, just returns the number of rows (more precisely, the number of non-NULL values). Summing the counts over several columns therefore gives the number of rows times the number of columns (if there are no NULLs). What we can do instead is sum a DECODE of the condition over all the columns, e.g.:
select mytable.*,
       DECODE(mytable.column1, 'target value', 1, 0)
     + DECODE(mytable.column2, 'target value', 1, 0) as hits
from mytable
For each row, this counts how many of the columns meet the condition. Here that value ('hits') can be 0, 1 or 2, because the condition is checked over 2 columns.
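To check the idea end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no DECODE (that function is Oracle-specific), so the sketch swaps in the standard CASE WHEN expression, which plays the same role; the table name, columns and rows are made up for illustration.

```python
import sqlite3

# Hypothetical table with two columns to search for the same value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "target value", "target value"),
     (2, "target value", "other"),
     (3, "other", "other")],
)

# CASE WHEN stands in for DECODE: 1 if the column matches, else 0.
# Adding the two expressions gives the per-row number of matching columns.
rows = conn.execute(
    """
    SELECT id,
           CASE WHEN column1 = 'target value' THEN 1 ELSE 0 END
         + CASE WHEN column2 = 'target value' THEN 1 ELSE 0 END AS hits
    FROM mytable
    ORDER BY id
    """
).fetchall()

for row_id, hits in rows:
    print(row_id, hits)  # hits is 0, 1 or 2 for each row

# Total occurrences of the value across both columns and all rows.
total = sum(hits for _, hits in rows)
print("total:", total)
```

Wrapping the same sum in SUM(...) in SQL would return the overall total directly instead of per row.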
I want to find a series of consecutive rows in a dataset where a condition is met the most often.
I have two columns I can use for this: one of ones and zeros that alternate with the presence or absence of the condition, and one that increments for as long as the desired condition holds. I expect I will need subset(), filter(), and/or rle() to do this, but I am at a loss as to how to make it work.
In the example, I want to find the 6 consecutive rows with the most instances where happens is 1.
Given the input:
# data.frame() is part of base R, so no library() call is needed
df <- data.frame(time = 1:10,
                 happens = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1),
                 count = c(1, 2, 0, 0, 1, 2, 3, 0, 1, 2))
I would like the output to be the subset of rows 5 through 10, inclusive (found using either the happens or the count column), since these 6 consecutive rows contain the most happens occurrences.
library(zoo)
which.max( rollapply( df$happens, 6, sum) )
#[1] 5
The window of 6 rows starting at row 5 has the largest sum of df$happens (5).
So the answer is rows 5:10, i.e. df[5:10, ].
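The same window search can be sketched in plain Python, assuming the happens values from the question. It keeps a running sum instead of recomputing every window, and uses a strict comparison so ties go to the earliest window, matching which.max.

```python
# happens column from the question; window width of 6 rows.
happens = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
width = 6

# Sum of the first window, then slide: add the value entering the window
# and subtract the value leaving it (O(n) overall).
window = sum(happens[:width])
best_sum, best_start = window, 0
for start in range(1, len(happens) - width + 1):
    window += happens[start + width - 1] - happens[start - 1]
    if window > best_sum:  # strict '>' keeps the first maximum, like which.max
        best_sum, best_start = window, start

# Convert the 0-based start index to 1-based row numbers as in R.
first_row = best_start + 1
last_row = best_start + width
print(first_row, last_row, best_sum)  # prints: 5 10 5
```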
I added a column to a table in my SQLite database, and I need to fill it with repeating sequence numbers 1...n based on grouping by other columns: the sequence must start over at 1 for each new group.
Here is my table:
CREATE TABLE "ProdRunResults" ("ID" INTEGER PRIMARY KEY NOT NULL UNIQUE , "SeqNumbr" INTEGER, "Shift" INTEGER, "ShiftSeqNumbr" INTEGER, "Date" DATETIME, "ProdRunID" INTEGER, "Result" VARCHAR)
ShiftSeqNumbr is the new column, which I need to populate with sequence numbers grouped first by ProdRunID and then by Shift.
There can be up to 3 shifts (work shifts in a 24-hour period).
I scraped together some code to do this, but it fills the ShiftSeqNumbr column with the sequence numbers in reverse (descending) order:
UPDATE ProdRunResults
SET ShiftSeqNumbr = (SELECT COUNT (*)
FROM ProdRunResults AS N
WHERE N.ProdRunID = ProdRunResults.ProdRunID
AND N.Shift = ProdRunResults.Shift
AND N.ShiftSeqNumbr = ProdRunResults.ShiftSeqNumbr);
How can I change the Update statement so the sequence numbers start at 1 and go up? Or is there a better way to do this?
Your UPDATE statement counts how many rows have the same ProdRunID, Shift and ShiftSeqNumbr values as the current row. The current row always has an empty value in ShiftSeqNumbr, so the statement counts how many rows in the current group have not yet been updated, which is why the numbers come out descending.
Instead, count the rows in the same group up to and including the current row, i.e., rows with the same ProdRunID and Shift values and the same or a smaller SeqNumbr value:
UPDATE ProdRunResults
SET ShiftSeqNumbr = (SELECT COUNT (*)
FROM ProdRunResults AS N
WHERE N.ProdRunID = ProdRunResults.ProdRunID
AND N.Shift = ProdRunResults.Shift
AND N.SeqNumbr <= ProdRunResults.SeqNumbr);
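A small runnable check of the corrected UPDATE, using Python's sqlite3 module with the question's CREATE TABLE statement and some hypothetical rows. It assumes SeqNumbr is unique within each (ProdRunID, Shift) group; rows with tied SeqNumbr values would receive the same number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table definition taken from the question.
conn.execute(
    'CREATE TABLE "ProdRunResults" ("ID" INTEGER PRIMARY KEY NOT NULL UNIQUE, '
    '"SeqNumbr" INTEGER, "Shift" INTEGER, "ShiftSeqNumbr" INTEGER, '
    '"Date" DATETIME, "ProdRunID" INTEGER, "Result" VARCHAR)'
)

# Hypothetical sample rows: (ID, SeqNumbr, Shift, ProdRunID).
sample = [(1, 1, 1, 10), (2, 2, 1, 10), (3, 3, 2, 10), (4, 4, 1, 10),
          (5, 5, 2, 10), (6, 1, 1, 20), (7, 2, 1, 20)]
conn.executemany(
    "INSERT INTO ProdRunResults (ID, SeqNumbr, Shift, ProdRunID) VALUES (?, ?, ?, ?)",
    sample,
)

# Corrected UPDATE: each row's position is the number of rows in the same
# (ProdRunID, Shift) group whose SeqNumbr is <= its own SeqNumbr.
conn.execute(
    """
    UPDATE ProdRunResults
    SET ShiftSeqNumbr = (SELECT COUNT(*)
                         FROM ProdRunResults AS N
                         WHERE N.ProdRunID = ProdRunResults.ProdRunID
                           AND N.Shift = ProdRunResults.Shift
                           AND N.SeqNumbr <= ProdRunResults.SeqNumbr)
    """
)

result = conn.execute(
    "SELECT ID, ProdRunID, Shift, ShiftSeqNumbr FROM ProdRunResults ORDER BY ID"
).fetchall()
for row in result:
    print(row)
```

Because the subquery only reads SeqNumbr, ProdRunID and Shift, which the UPDATE never changes, the result does not depend on the order in which SQLite visits the rows.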
I have a dataframe with multiple columns and I want to apply different functions on each column.
An example of my dataset:
I want to count column pq110a for each country in the qcountry2 column (me = Mexico, br = Brazil, ar = Argentina). The problem is that I have to filter on these columns; for example, for sample patients I want:
Count of pq110 when the value is 1 or 2 (for some patients)
Count of pq110 when the value is 3 (for other patients)
Count of pq110 when the value is 6.
For total patients, the total count of pq110.
The output I am expecting is:
Similarly, I want this output for each country.
Please suggest how I can do this for other columns too, by country.
Thanks!
I guess what you want is to count how many times each value of 'pq110' occurs within each 'qcountry2'.
So I'll use 'tapply' to split the data into one subset per country, and then 'table' to count the occurrences of each distinct value.
tapply(my_data[,"pq110"], INDEX = as.factor(my_data[,"qcountry2"]), function(x)table(x))
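The question's data set was not shown, so this Python sketch uses made-up (qcountry2, pq110) pairs. It builds one Counter per country (the analogue of tapply with table) and then derives the requested buckets: values 1 or 2, value 3, value 6, and the total.

```python
from collections import Counter, defaultdict

# Hypothetical rows of (qcountry2, pq110); the real data set was not shown.
my_data = [
    ("me", 1), ("me", 2), ("me", 3), ("me", 6), ("me", 1),
    ("br", 3), ("br", 3), ("br", 6),
    ("ar", 2), ("ar", 6),
]

# Analogue of tapply(..., INDEX = qcountry2, table): one Counter per country.
by_country = defaultdict(Counter)
for country, value in my_data:
    by_country[country][value] += 1

# Buckets from the question. A missing value in a Counter reads as 0.
summary = {}
for country, counts in by_country.items():
    summary[country] = {
        "1_or_2": counts[1] + counts[2],
        "3": counts[3],
        "6": counts[6],
        "total": sum(counts.values()),
    }

for country in sorted(summary):
    print(country, summary[country])
```

For another column, the same loop applies with that column's values in place of pq110.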
I am exploring a new table in SQL and was wondering what the best way is to find the number of occurrences of each value. In essence, I would like to better understand the distribution of values in the column.
At first I ran SELECT TOP 10000 on the table, and the column I am interested in showed only 2-3 distinct values. Let's call them A, B, C.
But when I run SELECT DISTINCT on that column, I get 5 million distinct values.
What I want is the distribution of the values in the column.
An example of the output I am looking for:
Distinct value | Count of occurrences
A              | A lot
B              | A lot
C              | A lot
D              | 1
E              | 1
F              | 1
G              | 1
What you're looking for is GROUP BY.
Example:
SELECT category, COUNT(*) FROM CATALOGS GROUP BY category
This gives you the number of rows per category.
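A minimal sketch of the GROUP BY approach, run through Python's sqlite3 module with a hypothetical CATALOGS table whose values are skewed like the question's. The ORDER BY on the count is an addition, not part of the answer above; it lists the most frequent values first, which makes the distribution easier to read. The query uses only standard GROUP BY and ORDER BY, so it should carry over to SQL Server, but that is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: a few very common values plus some that occur once.
conn.execute("CREATE TABLE CATALOGS (category TEXT)")
values = ["A"] * 5 + ["B"] * 4 + ["C"] * 3 + ["D", "E", "F", "G"]
conn.executemany("INSERT INTO CATALOGS (category) VALUES (?)", [(v,) for v in values])

# GROUP BY collapses rows with the same category; COUNT(*) counts each group.
# Sorting by the count (then by name for ties) shows the most frequent first.
distribution = conn.execute(
    """
    SELECT category, COUNT(*) AS occurrences
    FROM CATALOGS
    GROUP BY category
    ORDER BY occurrences DESC, category
    """
).fetchall()

for category, occurrences in distribution:
    print(category, occurrences)
```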
I would like to divide every number in all columns by 1000, excluding the row header and the first column.
I have tried this code:
TEST2=(TEST[2:503,]/(1000))
But it is not what I am looking for. My dataframe has 503 columns.
If TEST is a data frame, the row names (the row header) are not divided by 1000 anyway. To select all columns except the first, index the columns (the j position inside [i, j]), e.g.:
TEST[, 2:ncol(TEST)]/1000 # every row, columns 2 through the last
# same thing
TEST[, -1]/1000 # selects every row and every but the 1st column
You can also select columns by name, and so on (columns are selected the same way you are currently selecting rows, just in the second index position).
See ?'[' (the Extract help page) to learn how to select particular rows and columns.
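For comparison, a plain-Python sketch of the same idea, assuming the data is held as a list of rows whose first element is an identifier (the rows here are hypothetical). Column 0 is copied unchanged and every other value is divided by 1000, like TEST[, -1]/1000 but keeping the first column in the result.

```python
# Hypothetical rows: [id, value, value, value]; the id column is not scaled.
rows = [
    ["a", 1000, 2500, 300],
    ["b", 4000, 0, 1500],
]

# Keep column 0 as-is and divide every remaining column by 1000.
scaled = [[row[0]] + [v / 1000 for v in row[1:]] for row in rows]

for row in scaled:
    print(row)
```

In R, the equivalent that keeps the first column is to assign into a copy: TEST2 <- TEST; TEST2[, -1] <- TEST2[, -1] / 1000.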