$V{REPORT_COUNT} not counting logically - count

I'm using the built-in $V{REPORT_COUNT} variable in iReport to generate row numbers per ID, but when an ID has more than one Value, the extra rows are also counted, which gives the current output below.
Current Output
Row Number  ID  Value
1           23  A
2           65  N
3           89  P
4           34  B
                Q
                A
7           77  B
I want the output to look like the Desired Output below, with the row number incrementing only once per ID.
$V{REPORT_COUNT} Settings
Print Repeated Values is TRUE
Evaluation Time is NOW
Desired Output
Row Number  ID  Value
1           23  A
2           65  N
3           89  P
4           34  B
                Q
                A
5           77  B

I couldn't use $V{REPORT_COUNT}: it counts the number of rows in the report, and because it is a built-in variable its value cannot be manipulated. As a workaround, I created a Report Group based on ID and a variable that just counts ID, with Increment Type set to Group (ID) and Reset Type set to Report. This counts the unique IDs.
I then moved Rownumber, ID and Value into the Group Header and also put Value in the detail band. The Value text field in the detail band has the Print When Expression $V{ID_Count} != 1, which removes the repetition.

Related

How to find the ID number of a value?

I am currently working with a dataset of 551 observations and 141 variables. The data entry operators made some mistakes, and I am now screening for and correcting them. The problem is that the ID numbers and the row numbers of the dataset do not correspond, and I can only retrieve the row numbers where the problematic data lie. Looking up the matching ID numbers by hand takes a lot of time. Is there a way to get the ID numbers of the problematic data with one command?
Suppose ID B345 is in row 1 and ID B346 is in row 2.
My dataset looks like this:
ID S1 S2 S3 I30 I31 I34
B345 12 23 3 2 1 4
B346 15 4 4 3 2 4
I am using the following command on my original dataset. It returns row numbers 351 and 500, but the corresponding ID numbers are actually B456 and B643.
which(x$I30 == 0)
[1] 351 500
I would like to get the ID numbers with a single command. Any help would be much appreciated.
How about this?
x$ID[which(x$I30==0)]
We can just use the logical condition to subset the 'ID'
x$ID[x$I30 ==0]
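The two suggestions above differ when the tested column contains missing values. The following is a minimal sketch on made-up data (the IDs B347 and B348 and the NA are assumptions for illustration, not from the question): which() drops NA comparisons, while a bare logical index keeps them as NA entries.

```r
# Toy data; the NA in I30 is an assumption added to show the difference.
x <- data.frame(
  ID  = c("B345", "B346", "B347", "B348"),
  I30 = c(2, 0, NA, 0),
  stringsAsFactors = FALSE
)

# which() returns only the positions where the condition is TRUE.
ids_which <- x$ID[which(x$I30 == 0)]

# A logical index keeps one NA entry for every NA in the condition.
ids_logical <- x$ID[x$I30 == 0]

print(ids_which)
print(ids_logical)
```

If the column may contain missing values, the which() form is usually the safer of the two.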

Using list of row numbers as criteria to populate field

I have a list of row numbers that represent the rows containing outliers in a data set. I would like to add an "outlier" column to the original data set that flags those rows, but I can't figure out how to use row numbers as criteria in R.
Example:
I have a dataframe like this:
id <- c("a","b","c","d")
values <- c(10,11,22,33)
df <- data.frame(id, values)
id values
1 a 10
2 b 11
3 c 22
4 d 33
And a list like this containing row numbers (more correctly "row names"):
outliers <-c(2,4)
I'd like to find a way to use the list of row numbers as criteria in something like:
df$outlier_test<-ifelse( if row number is on my list, "outlier","")
to produce something like this:
id values outlier_test
1 a 10
2 b 11 outlier
3 c 22
4 d 33 outlier
Spent quite a while trying to puzzle this out and had inspiration as soon as I posted the question. For anyone else who comes here with this question:
First:
df$rownumber<- row.names(df)
then:
df$outlier_test<- ifelse(df$rownumber %in% outliers,"outlier","")
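The same flag can be built without the helper column. This is a minimal sketch that assumes the values in outliers are row positions, which matches the default row names 1..n in the example; it would not apply to a data frame with custom row names.

```r
df <- data.frame(id = c("a", "b", "c", "d"), values = c(10, 11, 22, 33))
outliers <- c(2, 4)

# seq_len(nrow(df)) gives the row positions 1..n; %in% tests each one
# against the list of outlier rows.
df$outlier_test <- ifelse(seq_len(nrow(df)) %in% outliers, "outlier", "")

print(df)
```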

Generate a column with the number of times a value happens in a column in R

First: I already checked Using R: Make a new column that counts the number of times 'n' conditions from 'n' other columns occur and I believe this is different.
I have a huge dataset (1,304,708 observations) containing students' scores on a test and information about the classroom they are in, and I need to know how many students per group there are. My datatable is called data_means, and I am trying to create group_size. X.2 is the unique identifier of each student (a number) and classroom is a factor variable indicating their classroom.
I need something like this third column (group_size)
X.2 classroom group_size
1 09PTV0002Q 3
2 09PTV0002Q 3
3 09PTV0002Q 3
4 09PTV0007B 2
5 09PTV0007B 2
7 15PTV0014Z 4
8 15PTV0014Z 4
9 15PTV0014Z 4
10 15PTV0014Z 4
data_means$group_size <-data_means[, Count := .N, by = list(X.2, classroom)]
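The attempt above groups by both X.2 and classroom, and since X.2 is unique per student every group would have size 1. It also assigns the whole table returned by := to a single column. A minimal base R sketch of one possible fix, grouping by classroom only, on toy data copied from the table above:

```r
data_means <- data.frame(
  X.2 = c(1, 2, 3, 4, 5, 7, 8, 9, 10),
  classroom = c(rep("09PTV0002Q", 3), rep("09PTV0007B", 2), rep("15PTV0014Z", 4))
)

# ave() applies FUN within each classroom and returns a vector aligned
# with the original rows, so each student gets the size of their group.
data_means$group_size <- ave(
  seq_along(data_means$classroom),
  data_means$classroom,
  FUN = length
)

# With data.table, the equivalent in-place form (no assignment needed) is:
#   data_means[, group_size := .N, by = classroom]

print(data_means)
```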

What is the function of an ID statement in Proc means in SAS?

I am porting some SAS code to R and came across the following SAS snippet:
proc means data=A noprint;
by name date;
id comp_no;
var price;
id rep_dats act no;
output out= test(drop=_type_ _freq_)
median=median n=num;
run;
I know that the 'by' statement groups the data so that statistics are computed at that level. But what is 'id' used for, and why are there two 'id' statements? I checked the SAS help but didn't really understand it. I also looked at the examples at http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p19dfq16fqt1t3n1eroiabnn6r3s.htm.
But there was no example illustrating the use of ID.
As I don't have access to SAS, I can't try this out and see what the output looks like.
Any clarifications would be of great help to me. Thanks!
The proc means procedure calculates simple summary statistics for a data set and can display them or write them to an output data set. By default, it analyzes every numeric variable (column) in the data set.
Using an ID statement together with by in proc means produces one value per group. That value is the greatest value, within the by group, of the first variable specified in ID. So if you specify several variables, e.g. id A B;, the output contains the greatest value of A for that group, while B is not necessarily its own maximum.
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146733.htm
By the way, I don't know what your data set looks like, but it seems your proc means is only summarizing the price variable.
For example, if you have a data set:
Obs sex A B C D
1 M 20 50 1 34
2 F 500 45 3 45
3 M 200 23 7 32
4 M 120 67 5 44
5 F 400 98 2 59
then
proc means data=sorted;
by sex;
var A B;
id D C;
output out=means(drop =_type_ _freq_);
run;
will output:
sex D C _STAT_ A B
F 59 2 N 2.000 2.0000
F 59 2 MIN 400.000 45.0000
F 59 2 MAX 500.000 98.0000
F 59 2 MEAN 450.000 71.5000
F 59 2 STD 70.711 37.4767
M 44 5 N 3.000 3.0000
M 44 5 MIN 20.000 23.0000
M 44 5 MAX 200.000 67.0000
M 44 5 MEAN 113.333 46.6667
M 44 5 STD 90.185 22.1886
Note that 59 is the greatest value of D in group F, but 2 is not the greatest value of C, because D was specified first. The same holds for group M: C is just the value found on the same row as the greatest value of D.
It allows you to add columns to the output other than the columns in the class and var statements. This makes sense if the id variable is constant within each class combination; otherwise SAS returns the largest value within each combination of classes. See here:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146733.htm
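Since the question is about porting this to R, here is a rough sketch of the ID behaviour described in the first answer, using its example data. It is an approximation under stated assumptions, not a replication of proc means: the helper name summarise_group is hypothetical, only the mean is computed, and ties on D are not handled.

```r
# Toy data copied from the example in the first answer.
d <- data.frame(
  sex = c("M", "F", "M", "M", "F"),
  A   = c(20, 500, 200, 120, 400),
  B   = c(50, 45, 23, 67, 98),
  C   = c(1, 3, 7, 5, 2),
  D   = c(34, 45, 32, 44, 59),
  stringsAsFactors = FALSE
)

# For each BY group, take D and C from the row where D (the first ID
# variable) is largest, and compute the mean of the analysis variables.
summarise_group <- function(g) {
  top <- g[which.max(g$D), ]
  data.frame(sex = top$sex, D = top$D, C = top$C,
             mean_A = mean(g$A), mean_B = mean(g$B))
}

res <- do.call(rbind, lapply(split(d, d$sex), summarise_group))
rownames(res) <- NULL
print(res)
```

The D and C columns of res should match the ID columns in the SAS output shown in the first answer.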

How to build a new column (/data.frame) from a table, and assign corresponding values to the rows

I printed the summary of a column variable like this:
summary(document$subject)
A, B, C, D, E, F, ... are the subjects in one column of a data.frame, where each of them appears many times, and the summary above shows the number of times (frequency) each subject appears in the file. The term "OTHER" refers to the subjects that appear only once in the file; I need to assign "1" to these subjects.
There are so many different subjects that it is difficult to list all of them by hand with c().
I want to build a new column (or data.frame) and assign these corresponding numbers (scores) to the subjects. Ideally, the file will look like this:
A 198
B 113
C 96
D 69
A 198
E 65
F 62
A 198
C 96
BZ 21
BC 1
CJ 1
...
I wonder what command I should use to take the scores/values from the summary table and then build a new column to assign these values to the corresponding subjects in the file.
Also, since it's a summary table printed by R, I don't know how to turn it into a table in a file, or how to extract the values and subject names from it. I would also like to know how to find the subject names that appear only once in the file and were therefore added up into "OTHER" by the summary.
Your question is hard to interpret without a reproducible example. Please take a look at this thread for tips on how to make one:
How to make a great R reproducible example?
Having said that, here is how I interpret your question. You have two data frames, one with a score per subject and another with the subjects multiple times in a column:
Sum <- data.frame(subject=c("A","B"),score=c(1,2))
foo <- data.frame(subject=c("A","B","A"))
> Sum
subject score
1 A 1
2 B 2
> foo
subject
1 A
2 B
3 A
You can then use match() to match the subjects in one data frame to the other and create the new variable in the second data frame:
foo$score <- Sum$score[match(foo$subject, Sum$subject)]
> foo
subject score
1 A 1
2 B 2
3 A 1
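If the scores are simply the frequencies of each subject, they can also be computed directly from the column rather than copied from the printed summary. This is a minimal sketch on made-up data; the data frame name doc and its values are assumptions for illustration.

```r
# Hypothetical stand-in for the real document data frame.
doc <- data.frame(
  subject = c("A", "B", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# table() counts every subject, including those that appear only once,
# so nothing is lumped into an "OTHER" category.
counts <- table(doc$subject)

# Look up each row's subject in the table of counts.
doc$score <- as.integer(counts[as.character(doc$subject)])

# Subjects that appear exactly once.
singletons <- names(counts)[counts == 1]

print(doc)
print(singletons)
```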
