one to many merge in SAS to complete lines - panel-data

I am struggling with a merge in the statistics program SAS and hope you can help me.
I have a dataset whose rows I want to join together.
It looks something like this:
Input:
id |var1 |var2 |var3 |var4 |Var5
>--------------------------------<
1 |A1 |B1 |C1 | -- | 0 |
1 |A2 |B2 |-- | D2 | 1 |
desired output:
id |var1 |var2 |var3 |var4 |Var5
>--------------------------------<
1 |A1 |B1 |C1 | D2| 0|
1 |A2 |B2 |C1 | D2| 1|
I tried to separate the data set in two with "if Var5=0/1 then delete"
statements and then merge the two parts back together, like this:
Data example1
id |var1 |var2 |var3 |var4 |Var5
>----------------------------<
1 |A1 |B1 |C1 | -- | 0|
Data Example2
id |var1 |var2 |var3 |var4 |Var5
>--------------------------------<
1| A2 | B2 |-- | D2| 1|
Merge code:
data Example12;
  merge example1 (IN=X) example2;
  by id;
  IF x=1;
run;
but this results in something like:
id |var1 |var2 |var3 |var4 |Var5
1|A1 |B1 |C1| D2| 0|
1|A1 |B1 |C1| D2| 0|
Any help is greatly appreciated.

If only one record per ID has a nonmissing value for VAR3 and only one has a nonmissing value for VAR4, you can use dataset options to set up a one-to-many merge that carries those values of VAR3 and VAR4 onto all rows for that ID.
First let's set up your example data:
data have ;
input (id var1-var4) ($) var5 ;
cards;
1 A1 B1 C1 . 0
1 A2 B2 . D2 1
;
Now try the merge:
data want ;
merge have(drop=var3 var4)
have(keep=id var3 where=(not missing(var3)))
have(keep=id var4 where=(not missing(var4)))
;
by id;
run;

Related

Loop to replace all non-blank values in certain variables in R

How would I be able to run a loop to set all NA values to FALSE and all non-NA values to TRUE in the variables which are in a separate list?
example dataframe:
| var1 | var2 | var3 | var4 |
| --- | --- | --- | --- |
| name | email | NA | b|
| name | email | a| b|
| name | email | a| NA|
| name | email | NA| b|
| name | email | a| b|
| name | email | a| NA|
list.vars <- list("var3", "var4")
example outcome dataframe:
| var1 | var2 | var3 | var4 |
| --- | --- | --- | --- |
| name | email | FALSE| TRUE|
| name | email | TRUE| TRUE|
| name | email | TRUE| FALSE|
| name | email | FALSE| TRUE|
| name | email | TRUE| TRUE|
| name | email | TRUE| FALSE|
Does this work?
library(dplyr)
df %>% mutate(across(var3:var4, ~ ifelse(is.na(.), FALSE, TRUE)))
var1 var2 var3 var4
1 name email FALSE TRUE
2 name email TRUE TRUE
3 name email TRUE FALSE
4 name email FALSE TRUE
5 name email TRUE TRUE
6 name email TRUE FALSE
Data used:
df
var1 var2 var3 var4
1 name email <NA> b
2 name email a b
3 name email a <NA>
4 name email <NA> b
5 name email a b
6 name email a <NA>
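Alternatively, assuming your data frame is called df, this loop stores the result in new columns suffixed with _result (it needs the %>% pipe from dplyr or magrittr):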
for (v in list.vars){
df[[paste0(v,"_result")]] <- df[[v]] %>% is.na() %>% `!`
}
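If the original columns should be overwritten in place, as in the example outcome, the same idea can be written in base R without a loop. This is only a sketch: the data frame below is rebuilt from the question's example, and cols is a helper name introduced here for illustration.

```r
# Example data rebuilt from the question's table
df <- data.frame(
  var1 = rep("name", 6),
  var2 = rep("email", 6),
  var3 = c(NA, "a", "a", NA, "a", "a"),
  var4 = c("b", "b", NA, "b", "b", NA),
  stringsAsFactors = FALSE
)
list.vars <- list("var3", "var4")

# unlist() turns the list of names into a character vector for column indexing
cols <- unlist(list.vars)

# Overwrite each listed column: TRUE where a value is present, FALSE where it is NA
df[cols] <- lapply(df[cols], function(x) !is.na(x))
```

Columns that are not named in list.vars are left untouched.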

Using several fields on the same level in pivottabler

I'm looking for a way in pivottabler to build a table with 2 parallel fields, like this:
SEX | POPULATION GROUPS
______________________|_______________________________________
1 | 2 | 1 | 2 | 3 | 4
___________|_________|_________|_________|_________|___________
AgeGroups |AgeGroups|AgeGroups|AgeGroups|AgeGroups|AgeGroups
| | | | | | | | | | | | | | | | |
1 | 2| 3| 1 | 2| 3| 1 | 2| 3| 1 |2 | 3| 1 | 2| 3| 1 | 2| 3
______|__|__|___|__|__|___|__|__|___|__|__|___|__|__|___|__|___
| | | | | | | | | | | | | | | | |
How can I add 2 or more fields with addColumnDataGroups in parallel rather than hierarchically?
This can be done by specifying atLevel=1 when adding the population group columns to the pivot table.
I have made up some sample data to use in an example below.
n <- 100
sex <- sample(x=c("M","F"), size=n, replace=TRUE)
pg <- sample(x=c("pg1","pg2","pg3","pg4"), size=n, replace=TRUE)
ag <- sample(x=c("ag1","ag2","ag3"), size=n, replace=TRUE)
grp <- sample(x=c("g1","g2","g3","g4"), size=n, replace=TRUE)
df <- data.frame(sex, pg, ag, grp)
library(pivottabler)
pt <- PivotTable$new()
pt$addData(df)
pt$addColumnDataGroups("sex", addTotal=FALSE)
pt$addColumnDataGroups("pg", atLevel=1, addTotal=FALSE)
pt$addColumnDataGroups("ag", addTotal=FALSE)
pt$addRowDataGroups("grp")
pt$defineCalculation(calculationName="Count", summariseExpression="n()")
pt$renderPivot()
Hope that helps
Chris

Need some advice on using the reshape2 melt function for ggplot-ing

I need help writing a command to do the following:
I have two data frames (which I plan to combine into one to do some ggplotting) of the following form:
df1
|..D..|..A...|..B...|
| d1 | a11 | b11 |
| d2 | a12 | b12 |
| d3 | a13 | b13 |
df2
|..D.|..A....|..B....|
| d1 | a21 | b21 |
| d2 | a22 | b22 |
| d3 | a23 | b23 |
The values in the "D" column are the same for both tables, and the variables A and B have the same name, but the values are different. I need to get an output table of the following form:
df3
|..D..|..A...|..B...|Class|
| d1 | a11 | b11 | df1 |
| d2 | a12 | b12 | df1 |
| d3 | a13 | b13 | df1 |
| d1 | a21 | b21 | df2 |
| d2 | a22 | b22 | df2 |
| d3 | a23 | b23 | df2 |
I could just rbind both tables but I know (I think) that this can also be done with the "melt" function, but have not been able to make it happen.
reshape2 is more or less deprecated. If you want a tidyverse solution, you can do:
library(dplyr)
df3 <- bind_rows(df1 = df1, df2 = df2, .id = "Class")
Just use cbind, then rbind, and make use of R's recycling: the single Class value is repeated down every row. Here mtcars stands in for your two data frames.
df1 <- cbind(mtcars,Class="df1")
df2 <- cbind(mtcars,Class="df2")
rbind(df1,df2)
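The same approach applied to the tables from the question might look like the sketch below. It uses base R only; frames and tagged are names made up for this illustration, and the cell values are the placeholders from the question. melt is not needed here, since it reshapes a single table from wide to long rather than stacking two tables.

```r
# The two example tables from the question
df1 <- data.frame(D = c("d1", "d2", "d3"),
                  A = c("a11", "a12", "a13"),
                  B = c("b11", "b12", "b13"))
df2 <- data.frame(D = c("d1", "d2", "d3"),
                  A = c("a21", "a22", "a23"),
                  B = c("b21", "b22", "b23"))

# Keep the frames in a named list so the names can become the Class labels
frames <- list(df1 = df1, df2 = df2)

# Add a Class column to each frame; the single label is recycled down the rows
tagged <- Map(function(d, nm) cbind(d, Class = nm), frames, names(frames))

# Stack the tagged frames and drop the row names that rbind generates
df3 <- do.call(rbind, tagged)
rownames(df3) <- NULL
```

Because the Class column is an ordinary column, df3 can be passed to ggplot and Class mapped to colour or used for facetting.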

I want to convert columns from one table to a row in another table using procedures (PL/SQL)

From Table A
| id |col1 | col2 | col3|
--------------------------
| 1 | a | x1 | y1 |
--------------------------
| 2 | b | x2 | y2 |
--------------------------
| 3 | c | x3 | y3 |
--------------------------
to Table B
| id |x_a | y_a | x_b | y_b | x_c | y_c |
------------------------------------------
| 1 | x1 | y1 | x2 | y2 | x3 | y3 |
------------------------------------------
Thanks.
The PL/SQL code is below (in SQL*Plus, run SET SERVEROUTPUT ON first so that the dbms_output lines are displayed):
declare
  var  varchar2(4000) := NULL;  -- accumulates the whole output row
  var1 varchar2(4000) := NULL;  -- one "col2 | col3 | " fragment
begin
  for i in (select * from tabA where id = 1)
  loop
    dbms_output.put_line('| id |x_a | y_a | x_b | y_b | x_c | y_c |');
    dbms_output.put_line('-------------------------------------------');
    -- order by id keeps the fragments in a fixed order
    for j in (select col2, col3 from tabA order by id)
    loop
      var1 := j.col2 || ' | ' || j.col3 || ' | ';
      var := var || var1;
    end loop;
    dbms_output.put_line('|' || i.id || ' | ' || var);
    var := NULL;
  end loop;
end;
/
and the output is:
| id |x_a | y_a | x_b | y_b | x_c | y_c |
-------------------------------------------
|1 | x1 | y1 | x2 | y2 | x3 | y3 |

Find and replace function for R

Is there a function in R that can modify a dataframe to find and replace a specific word or character with another?
If I have a dataframe that looks like this:
| Full Path | File |
|:------------------------:|:---------:|
| C:/Path/to/the/file1.ext | file1.ext |
| C:/Path/to/the/file2.ext | file2.ext |
| C:/Path/to/the/file3.ext | file3.ext |
| C:/Path/to/the/file4.ext | file4.ext |
I'd like to modify it to look like this:
| Full Path | File |
|:------------------------:|:---------:|
| C:\Path\to\the\file1.ext | file1.ext |
| C:\Path\to\the\file2.ext | file2.ext |
| C:\Path\to\the\file3.ext | file3.ext |
| C:\Path\to\the\file4.ext | file4.ext |
gsub can do this. Using some example data:
df <- data.frame(matrix(c(rep("this/and/that",5),rep("the other",5)),ncol=2))
df
X1 X2
1 this/and/that the other
2 this/and/that the other
3 this/and/that the other
4 this/and/that the other
5 this/and/that the other
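Apply gsub to every column; fixed = TRUE makes it treat "/" as a literal string rather than a regular expression:
data.frame(apply(df, MARGIN = 2, function(x) gsub("/", "\\", x, fixed = TRUE)), stringsAsFactors = FALSE)
X1 X2
1 this\\and\\that the other
2 this\\and\\that the other
3 this\\and\\that the other
4 this\\and\\that the other
5 this\\and\\that the other
print() displays each backslash escaped as \\; the strings themselves contain single backslashes.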
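If only one column needs changing, gsub can be applied to that column directly instead of to the whole data frame. A minimal sketch, assuming a made-up data frame shaped like the one in the question (the column names FullPath and File are assumptions):

```r
# Made-up data shaped like the question's table
df <- data.frame(
  FullPath = c("C:/Path/to/the/file1.ext", "C:/Path/to/the/file2.ext"),
  File = c("file1.ext", "file2.ext"),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches "/" literally; "\\" in R source code is one backslash character
df$FullPath <- gsub("/", "\\", df$FullPath, fixed = TRUE)

# writeLines() shows the strings as they are stored, without print()'s escaping
writeLines(df$FullPath)
```

Assigning the result back to df$FullPath modifies the data frame in place and leaves the File column as it was.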
