Specifying Order of Variables for SAS Report

I am working on a project where I need to specify the order of objects in the data for a custom SAS report. I am having trouble with something that should be easy. Here is an example of the data I am working with:
obs ord ord2 name
1 3 1 A
2 3 . B
3 3 . C
4 3 . D
5 4 1 E
6 4 . F
7 5 1 G
8 5 . H
9 5 . I
10 5 . J
What I'd like is...
obs ord ord2 name
1 3 1 A
2 3 2 B
3 3 3 C
4 3 4 D
5 4 1 E
6 4 2 F
7 5 1 G
8 5 2 H
9 5 3 I
10 5 4 J
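That is, ord2 should restart at 1 at the first occurrence of each ord value and then count 1,...,n_i within that group, where n_i is the number of rows in the group.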
Thanks for the help!

Just apply a group numbering to the original data set, provided that the table has been sorted by ord.
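data table1;
  set table1;
  by ord;                           /* requires table1 to be sorted by ord */
  ord2_ + 1;                        /* sum statement: ord2_ is retained across rows */
  if first.ord then ord2_ = 1;      /* restart the counter at the start of each group */
  drop ord2;                        /* discard the original, partly missing column */
  rename ord2_=ord2;                /* write the counter out under the original name */
run;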

Related

Difference Matrix in R

I would like to take the difference between two dataframes that are of different lengths and output a matrix in R.
x = data.frame(name=c('a','b','c','d','e'),length=c(5,6,7,8,9))
y = data.frame(name=c('r','t','v'),length=c(10,11,12))
> x
name length
1 a 5
2 b 6
3 c 7
4 d 8
5 e 9
> y
name length
1 r 10
2 t 11
3 v 12
The result I want is a matrix of differences: length of y minus length of x for every pair of rows. I also want to keep the names as row and column labels. So something like this:
>
0 r t v
a 5 6 7
b 4 5 6
c 3 4 5
d 2 3 4
e 1 2 3
How can I approach this problem?
This is an outer operation:
outer(setNames(x$length, x$name), setNames(y$length, y$name), FUN=\(x,y) y-x)
# r t v
#a 5 6 7
#b 4 5 6
#c 3 4 5
#d 2 3 4
#e 1 2 3
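The `\(x,y)` shorthand above needs R 4.1 or later. As a minimal sketch of the same outer operation for older versions, the variant below uses an ordinary `function(a, b)` and sets the row and column names explicitly; the name `diff_mat` is only illustrative.

```r
x <- data.frame(name = c("a", "b", "c", "d", "e"), length = c(5, 6, 7, 8, 9))
y <- data.frame(name = c("r", "t", "v"), length = c(10, 11, 12))

# Every pairwise difference y - x: rows follow x, columns follow y
diff_mat <- outer(x$length, y$length, FUN = function(a, b) b - a)

# Label rows and columns with the names from each data frame
dimnames(diff_mat) <- list(as.character(x$name), as.character(y$name))

diff_mat
```

Individual differences can then be looked up by name, for example `diff_mat["a", "r"]`.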

Split data.frame function by a field

I have a function that returns a data frame, and its output, which is displayed in an R Shiny app, is too long. I want to split it by the field fac, so that I get one table for fac = A and so on for each unique value of fac. How could I do this? Thank you.
prod()
x y fac
1 1 1 C
2 1 2 B
3 1 3 B
4 1 4 B
5 1 5 A
6 1 6 B
7 1 7 B
8 1 8 C
9 1 9 C
10 1 10 C
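One possible approach, sketched here with base R's `split()`, is to break the data frame into a named list with one table per value of fac. The data frame `df` below is an assumption: it stands in for the output of `prod()` and is rebuilt from the sample rows shown above.

```r
# Stand-in for the output of prod(), rebuilt from the sample rows
df <- data.frame(
  x   = rep(1, 10),
  y   = 1:10,
  fac = c("C", "B", "B", "B", "A", "B", "B", "C", "C", "C")
)

# One data frame per unique value of fac, returned as a named list
tables <- split(df, df$fac)

names(tables)    # the distinct fac values
tables[["A"]]    # only the rows where fac is "A"
```

Each element of `tables` is an ordinary data frame, so it can be passed on to whatever renders a table in the app.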

How to reverse a column in R

I have a data frame as shown below. I want to reverse the order of column B without disturbing the order of the rest of the data frame. Column B currently holds 5,4,3,2,1, and I want to change it to 1,2,3,4,5. I don't want to sort the data frame, as that would change the order of the other columns.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).

Returning 1st Largest and 2nd Largest numbers

df <- A B C D E F G H
0 1 2 3 4 5 6 7
1 2 3 8 5 6 7 4
I need to find the largest and second-largest number in each row of the data frame above. The result should look like this:
A B C D E F G H 1st Largest 2nd Largest
0 1 2 3 4 5 6 7 7 6
1 2 3 8 5 6 7 4 8 7
We can loop through the rows using apply (with MARGIN=1), sort the elements with decreasing=TRUE option, and get the first two elements with head or just [1:2], transpose the output and assign it to create two new columns in 'df'.
df[c("firstLargest", "SecondLargest")] <- t(apply(df, 1,
function(x) head(sort(x, decreasing=TRUE),2)))
df
# A B C D E F G H firstLargest SecondLargest
#1 0 1 2 3 4 5 6 7 7 6
#2 1 2 3 8 5 6 7 4 8 7
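For a version that can be run as is, the sketch below rebuilds the sample data and applies the same row-wise sort. Two details are assumptions beyond the answer above: the data frame is constructed by hand from the two rows in the question, and the original column names are saved in `value_cols` so that re-running the assignment does not include the newly added columns in the search.

```r
# Sample data rebuilt from the two rows shown in the question
df <- data.frame(A = c(0, 1), B = c(1, 2), C = c(2, 3), D = c(3, 8),
                 E = c(4, 5), F = c(5, 6), G = c(6, 7), H = c(7, 4))

# Remember which columns hold the values to search
value_cols <- names(df)

# For each row, sort in decreasing order and keep the first two values
top2 <- t(apply(df[value_cols], 1,
                function(row) head(sort(row, decreasing = TRUE), 2)))

df[c("firstLargest", "SecondLargest")] <- top2
df
```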

R For Loop with Certain conditions

I have a dataframe (surveillance) with many variables (villages, houses, weeks). I want to eventually do a time-series analysis.
Currently, each village has between 1 and 183 weeks, and each week has several houses associated with it. I need each village to have a single data point for each week, so I need to sum a third variable across houses.
Example:
Village Week House Affect
A 3 7 12
B 6 3 0
C 6 2 2
A 3 9 1
A 5 8 0
A 5 2 8
C 7 19 0
C 7 2 1
I tried this and failed. How do I ask R to only sum observations with the same village and week value?
for (i in seq(along=surveillance)) {
if (surveillance$village== surveillance$village& surveillance$week== surveillance$week)
{surveillance$sumaffect <- sum(surveillance$affected)}
}
Thanks
There is no need for a loop. Use ddply or similar:
library(plyr)
Village = c("A","B","C","A","A","A","C","C")
Week = c(3,6,6,3,5,5,7,7)
Affect = c(12,0,2,1,0,8,0,1)
df = data.frame(Village,Week,Affect)
View(df)
result = ddply(df,.(Village,Week),summarise, val = sum(Affect))
View(result)
DF:
Village Week Affect
1 A 3 12
2 B 6 0
3 C 6 2
4 A 3 1
5 A 5 0
6 A 5 8
7 C 7 0
8 C 7 1
Result:
Village Week val
1 A 3 13
2 A 5 8
3 B 6 0
4 C 6 2
5 C 7 1
The function aggregate will do what you need.
dfs <- ' Village Week House Affect
1 A 3 7 12
2 B 6 3 0
3 C 6 2 2
4 A 3 9 1
5 A 5 8 0
6 A 5 2 8
7 C 7 19 0
8 C 7 2 1
'
df <- read.table(text=dfs)
Then the aggregation
> aggregate(Affect ~ Village + Week , data=df, sum)
Village Week Affect
1 A 3 13
2 A 5 8
3 B 6 0
4 C 6 2
5 C 7 1
This is an example of the split-apply-combine strategy. If you find yourself doing this often, look into dplyr (or plyr, its ancestor) or data.table, which make this sort of analysis quick to write.
EDIT: updated to use sum instead of mean
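To make the three steps of split-apply-combine explicit, here is a minimal base R sketch on the same sample data. It is meant only as an illustration of the strategy, not as a replacement for aggregate; the names `groups`, `pieces` and `result` are illustrative.

```r
df <- data.frame(
  Village = c("A", "B", "C", "A", "A", "A", "C", "C"),
  Week    = c(3, 6, 6, 3, 5, 5, 7, 7),
  Affect  = c(12, 0, 2, 1, 0, 8, 0, 1)
)

# Split: one data frame per Village/Week combination that actually occurs
groups <- split(df, list(df$Village, df$Week), drop = TRUE)

# Apply: reduce each group to a single row holding the summed Affect
pieces <- lapply(groups, function(g) {
  data.frame(Village = g$Village[1], Week = g$Week[1], Affect = sum(g$Affect))
})

# Combine: stack the one-row results back into a single data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

The `drop = TRUE` argument discards Village/Week combinations that never occur, so no empty groups reach the apply step.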
