SAS summary data transposed to line list?

I have a data set like this:
YEAR GENDER RACE AGE COUNT
2015 Female W 30 3
So in 2015, there were three White 30-year-old females. I'd like to expand this into line list data (one row per person), like this:
YEAR GENDER RACE AGE
2015 Female W 30
2015 Female W 30
2015 Female W 30
Any help would be greatly appreciated! Thank you!

I added a row for purpose of illustration:
data have;
input YEAR GENDER $ RACE $ AGE COUNT;
datalines;
2015 Female W 30 3
2014 Male B 45 4
;
The following code will hopefully accomplish what you are asking for:
data want (drop=i count);
set have;
do i = 1 to count;
output;
end;
run;

Related

How to perform a Poisson Regression for patient count data in R

I have a dataset (DF) of patients seen at a hospital emergency department who were all admitted for heart attacks between 2010 and 2015. A simplified example is below; each row is a new patient, and the actual dataset has over 1000 patients.
Patient ID age smoker Overweight YearHeartAttack
0001 34 Y N 2015
0002 44 Y Y 2014
0003 67 N N 2015
0004 75 Y Y 2011
0005 23 N Y 2015
0006 45 Y N 2010
0007 55 Y Y 2013
0008 64 N Y 2012
0009 27 Y N 2012
0010 48 Y Y 2014
0011 65 N N 2010
I'd like to fit a Poisson regression for the number of patients who had heart attacks in each year using the glm function in R. However, the only way I have found to do this is to use a summary function to count the patients in each year, create a new dataset such as the one below, and then call glm:
Count Year
2 2010
1 2011
2 2012
1 2013
2 2014
3 2015
HeartAttackfit <- glm(Count ~ Year, data = CountDF, family = poisson) #poisson model
This method works for a simple Poisson model. However, I plan to take this model a lot further, for example by applying generalized estimating equations with the geeglm function from the geepack package, and that has several issues when fed summarized data in this Count/Year form. Is there any way to fit the Poisson model for the number of patients with heart attacks per year directly from the DF dataset with glm, without first summarizing the data to the Count/Year form? Thank you very much in advance.
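One possible way to avoid a hand-built summary table is to let table() do the counting just before the model call. This is only a sketch: the DF below is a hypothetical re-creation of the eleven sample rows, and PatientID is an assumed column name. A Poisson model of yearly counts still needs one row per year, so the counting step is shortened here, not removed.

```r
# Hypothetical re-creation of the eleven sample rows from the question
DF <- data.frame(
  PatientID = sprintf("%04d", 1:11),  # assumed column name
  YearHeartAttack = c(2015, 2014, 2015, 2011, 2015, 2010,
                      2013, 2012, 2012, 2014, 2010)
)

# Count patients per year; table() gives one cell per observed year
CountDF <- as.data.frame(table(Year = DF$YearHeartAttack),
                         responseName = "Count",
                         stringsAsFactors = FALSE)
CountDF$Year <- as.integer(CountDF$Year)  # table() stores years as labels

# Same model as in the question, fitted on the derived counts
HeartAttackfit <- glm(Count ~ Year, data = CountDF, family = poisson)
summary(HeartAttackfit)
```

Note that table() only reports years that occur in the data, so a year with no patients would be missing from CountDF instead of appearing with a count of zero.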

How can I sum values of one column based on the categories of another column, multiple times, in R?

I guess my question is a little strange, so let me try to explain it. I need to solve a simple equation for a longitudinal database (29 consecutive years) about food availability and international commerce: (importations - exportations) / (production + importations - exportations) * 100 (the food dependence coefficient, as defined by FAO). The big problem is that my database has the food products and their values of interest (production, importation and exportation) disaggregated, so I need a way to apply that equation to the sum of the values of interest for every year, so that I get the coefficient for every year.
My data frame looks like this:
element product year value (metric tons)
Production Wheat 1990 16
Importation Wheat 1990 2
Exportation Wheat 1990 1
Production Apples 1990 80
Importation Apples 1990 0
Exportation Apples 1990 72
Production Wheat 1991 12
Importation Wheat 1991 20
Exportation Wheat 1991 0
I guess the solution is pretty simple, but I'm not good enough at R to solve this problem by myself. Any help is very welcome.
Thanks!
Here is my R session:
library(data.table)
# dummy table. Use setDT(df) if yours isn't a data.table already
df <- data.table(element = rep(c('p', 'i', 'e'), 3)
, product = rep(c('w', 'a', 'w'), each = 3)
, year = rep(c(1990, 1991), c(6, 3))
, value = c(16, 2, 1, 80, 0, 72, 12, 20, 0)
); df
element product year value
1: p w 1990 16
2: i w 1990 2
3: e w 1990 1
4: p a 1990 80
5: i a 1990 0
6: e a 1990 72
7: p w 1991 12
8: i w 1991 20
9: e w 1991 0
# long to wide
df_1 <- dcast(df
, product + year ~ element
, value.var = 'value'
); df_1
# apply calculation
df_1[, food_depend_coef := (i-e) / (p+i-e)*100][]
product year e i p food_depend_coef
1: a 1990 72 0 80 -900.000000
2: w 1990 1 2 16 5.882353
3: w 1991 0 20 12 62.500000
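The session above gives one coefficient per product and year. If the goal is a single coefficient per year, computed from the totals over all products as the question describes, a base R sketch along the following lines might work. It is not the answerer's data.table approach; the data frame is a hypothetical re-creation of the question's sample rows, and the names totals and by_year are made up for illustration.

```r
# Hypothetical re-creation of the question's sample rows
df <- data.frame(
  element = rep(c("Production", "Importation", "Exportation"), 3),
  product = rep(c("Wheat", "Apples", "Wheat"), each = 3),
  year    = rep(c(1990, 1991), c(6, 3)),
  value   = c(16, 2, 1, 80, 0, 72, 12, 20, 0)
)

# Sum value over all products: one row per year, one column per element
totals <- xtabs(value ~ year + element, data = df)

p <- totals[, "Production"]
i <- totals[, "Importation"]
e <- totals[, "Exportation"]

# Apply the food dependence formula to the yearly totals
by_year <- data.frame(
  year = as.integer(rownames(totals)),
  food_depend_coef = as.numeric((i - e) / (p + i - e) * 100)
)
by_year
```

For the sample rows, the 1990 totals are 96 produced, 2 imported and 73 exported, so the coefficient is (2 - 73) / (96 + 2 - 73) * 100 = -284; for 1991 it is 62.5.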

How can I extract the unique variables in one column conditional to a variable in another and make a new data frame with the output?

I would like to extract the number of camera trap nights (CTN, one column in my data frame) per camera trap station (another column), so I can work out relative abundance indices for each camera station. For example, Station 1 has had 5 triggers/events (of the same species) and 30 CTN, so it is listed in my database 5 times (5 rows). I want to extract the unique CTN for Station 1 and likewise for all the other stations in the data frame.
Data frame:
EventID CameraStation CTN
001 Station 1 30
002 Station 1 30
003 Station 1 30
004 Station 1 30
005 Station 2 29
006 Station 2 29
007 Station 2 29
008 Station 2 29
009 Station 2 29
010 Station 3 31
011 Station 3 31
I have tried to use 'unique' and 'with' but do not get the result I want.
with(unique(rai.PS[c("CameraStation", "CTN")]), table(CameraStation))
I expect to get the following result:
CameraStation CTN
Station 1 30
Station 2 29
Station 3 31
That is, each station is listed only once, together with its CTN, in a new data frame.
But instead I get:
CameraStation
Station 1 Station 2 Station 3
        1         1         1
I am assuming it is giving me the unique station once without the CTN as the criteria.
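A likely explanation, offered as a sketch: unique() already returns the station/CTN pairs, and wrapping it in table(CameraStation) then counts how many rows each station has in that de-duplicated result, which is 1 for every station. Dropping the table() call should give the data frame described above. The rai.PS below is a hypothetical re-creation of the sample data, and ctn_by_station is a made-up name.

```r
# Hypothetical re-creation of the question's data frame rai.PS
rai.PS <- data.frame(
  EventID = sprintf("%03d", 1:11),
  CameraStation = rep(c("Station 1", "Station 2", "Station 3"), c(4, 5, 2)),
  CTN = rep(c(30, 29, 31), c(4, 5, 2))
)

# Keep one row per distinct (CameraStation, CTN) pair
ctn_by_station <- unique(rai.PS[c("CameraStation", "CTN")])
rownames(ctn_by_station) <- NULL  # reset the leftover row numbers
ctn_by_station
```

If a station ever appears with two different CTN values, this keeps both rows, which can be a useful check on the data.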

From panel data to cross-sectional data using averages

I am very new to R so I am not sure how basic my question is, but I am stuck at the following point.
I have data that has a panel structure, similar to this
Country Year Outcome Country-characteristic
A 1990 10 40
A 1991 12 40
A 1992 14 40
B 1991 10 60
B 1992 12 60
I need to put this into a cross-sectional structure, such that I get averages over all years for each country. In the end it should look like this:
Country Outcome Country-Characteristic
A 12 40
B 11 60
Has anybody faced a similar problem? I was playing with lapply(table$country, table$outcome, mean) but that did not work as I wanted it.
Two tips: 1- When you ask a question, you should provide a reproducible example for the data too (as I did with read.table below). 2- It's not a good idea to use "-" in column names. You should use "_" instead.
You can get a summary using the dplyr package:
df1 <- read.table(text="Country Year Outcome Countrycharacteristic
A 1990 10 40
A 1991 12 40
A 1992 14 40
B 1991 10 60
B 1992 12 60", header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
df1 %>%
group_by(Country) %>%
summarize(Outcome=mean(Outcome),Countrycharacteristic=mean(Countrycharacteristic))
# A tibble: 2 x 3
Country Outcome Countrycharacteristic
<chr> <dbl> <dbl>
1 A 12 40
2 B 11 60
We can also do this in base R with aggregate:
aggregate(.~Country, df1[-2], mean)
# Country Outcome Countrycharacteristic
#1 A 12 40
#2 B 11 60

R Table data with a grouping command

This seems like a very simple problem, but I can't seem to sort it out. I have searched this forum; the topics listed below are close but don't do exactly what I need. I have count data over several years, and I want to obtain the frequency of each count value by year. It seems I need a table function with a grouping option, but I haven't found the proper syntax.
Data:
count year
1 15 1957
2 6 1957
3 23 1957
4 23 1957
5 2 1957
6 28 1980
7 15 1980
8 32 1980
9 18 1981
thank you in advance!
Counting the number of elements with the values of x in a vector
grouping data splitted by frequencies
Aggregate data in R
You're looking for the table function. Note that R is case-sensitive, so use the column names exactly as they appear in your data. Something like:
with(yourdata, table(year, count))
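As a rough illustration of what that call returns, here is a sketch using the lowercase column names count and year from the question's sample. The data frame yourdata is a hypothetical re-creation of those nine rows, and freq and freq_long are made-up names.

```r
# Hypothetical re-creation of the nine sample rows from the question
yourdata <- data.frame(
  count = c(15, 6, 23, 23, 2, 28, 15, 32, 18),
  year  = c(1957, 1957, 1957, 1957, 1957, 1980, 1980, 1980, 1981)
)

# Two-way table: rows are years, columns are distinct count values
freq <- with(yourdata, table(year, count))
freq["1957", "23"]  # how often the count value 23 was recorded in 1957

# Long form, keeping only year/count combinations that actually occur
freq_long <- subset(as.data.frame(freq, responseName = "n"), n > 0)
freq_long
```

The wide table has a zero in every year/count cell that never occurs, so the long form may be easier to read when there are many distinct count values.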
