melt dataframe with multiple IDs - r

My data.frame
a<-sample(12)
b<-sample(-100:100, 12)
d<-c(-11:0)
O<-rep(c("N","H"), each=6)
H<-rep(c("In+", "In-"), each=3, times=2)
ID<-rep(c("bo","co", "do", "fo"), each=3)
mydata_1<-data.frame(ID, a, b, d, O, H)
I want to melt the variables a, b, and d, while O and H should stay aligned with the ID. My solution is below:
library(reshape2) # assuming melt() comes from reshape2
mydata_2<-data.frame(ID, a, b, d)
gg.df <- melt(mydata_2, id.vars="ID", variable.name="int")
O<-rep(c("N","H"), each=6, times=3)
H<-rep(rep(c("In+", "In-"), each=3, times=2), times=3)
gg.df[, "OX"] <- O
gg.df["HI"] <- H
I am wondering how this can be done inside the melt function itself, using the full dataframe (mydata_1).
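One possible approach, sketched here on the assumption that melt() comes from the reshape2 package, is to declare O and H as additional id variables so that they are carried along with ID instead of being rebuilt by hand. The set.seed() call is added only to make the illustration reproducible.

```r
library(reshape2)  # assumption: melt() is reshape2::melt

set.seed(1)  # added for reproducibility of this illustration only
mydata_1 <- data.frame(
  ID = rep(c("bo", "co", "do", "fo"), each = 3),
  a  = sample(12),
  b  = sample(-100:100, 12),
  d  = -11:0,
  O  = rep(c("N", "H"), each = 6),
  H  = rep(c("In+", "In-"), each = 3, times = 2)
)

# ID, O and H are kept as identifier columns; a, b and d are stacked
# into the columns "int" (variable name) and "value"
gg.df <- melt(mydata_1, id.vars = c("ID", "O", "H"), variable.name = "int")
head(gg.df)
```

Every column not listed in id.vars is treated as a measured variable, so the O and H values are repeated automatically for each of a, b and d.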

Related

How to create new variable at the end of each loop iteration in R

I am trying to create a variable that is a function of 4 other variables. I have the following code:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for (i in group) {
df <- df1[df1$group == i,]
x_ <- vector(mode="numeric", length=1000)
assign(eval(paste0("X_", i)), globalenv()) #This is the issue
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X_[i] = (a + c*(z-df$zbar))/(-b)
}
I am unable to create a unique group-specific variable (e.g. X_A, X_B, ...), and I am unsure why the assign() function is not working properly. The dataframe df1 has 6 rows (one for each group), and its columns are the variables plus a string variable for group. I am not trying to append the new variable X_[i] to the dataset; I am just trying to place it in the global environment. I believe the issue lies in how I assign the variable, since the code isn't generating a numeric variable X.
df1 is a dataframe with 6 observations of 9 variables containing a, sea, b, seb, c, sec, zbar, se_z. These are just the means and standard deviations of a, b, c, and z, respectively. The 9th variable is group which contains A, B, ..., F. When I use the code df <-df1[df1$group == i,] I am trying to create a unique X variable for each group entity.
Try something like this:
dynamicVariableName <- paste0("X_", i)
assign(dynamicVariableName, (a + c*(z-df$zbar))/(-b))
As an alternative to the answer from @ErrorJordan, you can write your loop like this:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for(i in group)
{
df <- df1[df1$group == i,]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X <- (a + c*(z-df$zbar))/(-b)
assign(paste0("X_",i),X,.GlobalEnv)
}
As suggested by @MrFlick, you can also store your data in a list. To do so, modify your loop as follows:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
X = vector("list",length(group))
names(X) = group
for(i in 1:length(group))
{
df <- df1[df1$group == group[i],]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X[[i]] <- (a + c*(z-df$zbar))/(-b)
}
The df1 dataframe used above:
df1 = data.frame(a = c(1:6),
b = c(1:6),
c = c(1:6),
zbar = c(1:6),
sea = rep(1,6),
seb = rep(1,6),
sec = rep(1,6),
se_z = rep(1,6),
group = group)
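The list-based version can also be written without an explicit for loop. The following is a minimal sketch, assuming the example df1 shown above; the helper name simulate_group is hypothetical and not part of the original answers.

```r
set.seed(123)
iter  <- 1000
group <- c("A", "B", "C", "D", "E", "F")

# Example data, mirroring the df1 definition above
df1 <- data.frame(a = 1:6, b = 1:6, c = 1:6, zbar = 1:6,
                  sea = rep(1, 6), seb = rep(1, 6),
                  sec = rep(1, 6), se_z = rep(1, 6),
                  group = group)

# Hypothetical helper: simulate the X values for a single group
simulate_group <- function(g) {
  df <- df1[df1$group == g, ]
  a <- rnorm(iter, mean = df$a,    sd = df$sea)
  b <- rnorm(iter, mean = df$b,    sd = df$seb)
  c <- rnorm(iter, mean = df$c,    sd = df$sec)
  z <- rnorm(iter, mean = df$zbar, sd = df$se_z)
  (a + c * (z - df$zbar)) / (-b)
}

# lapply() keeps the names of its input, so X is a list named A ... F
X <- lapply(setNames(group, group), simulate_group)
str(X, max.level = 1)
```

Individual results are then available as X$A or X[["B"]], which avoids creating one global variable per group.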
It's a little hard to parse what you want to do, but I'm assuming it's something like:
- for each value in group, make an object (in the global env) called X_A, X_B, ...
- for each one of those objects, assign it the value (a + c*(z-df$zbar))/(-b)
I think this should do that for you:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for (i in group) {
df <- df1[df1$group == i,]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
assign(paste0("X_", i), (a + c*(z-df$zbar))/(-b), globalenv())
}
Note that in the code example you gave, the command x_ <- vector(mode="numeric", length=1000) has no effect. By that I mean you make that object but never use it in any further computation (R is case-sensitive, so x_ and X_ are different names). iter, by contrast, is used as the number of draws in each rnorm() call, so it must be defined before the loop. If x_ should do something meaningful, I'll need your help in explaining its intended purpose.

rev() in r and how to apply it to a list using loops

I have a list, say {a, b, c, d, ...}, where each element is a data.table whose column order I need to reverse. However, I only want to rev() the columns other than the first one, since it is an ID. I tried using a loop to do it, but it returned:
Error in `[<-.data.table`(`*tmp*`, , -1, value = list(code_a = c("a", :
Item 1 of column numbers in j is -1 which is outside range [1,ncol=4]. Use column names instead in j to add new columns.
Example:
library(data.table)
a <- c("a","b","c","d","e","f")
b <- 1:6
c <- c("F","E","D","C","B","A")
d <- 10:15
dt1 <- data.table("ID" = b, "code_a" = a)
dt2 <- data.table("ID" = b, "code_c" = c)
dt3 <- data.table("ID" = b, "code_d" = d)
dt <- list(dt1,dt2,dt3)
rev_dt <- rev(dt)
merged_list <- list()
rev_merged_list <- list()
rev_merged_list <- Reduce(merge, rev_dt, accumulate = TRUE)
merged_list <- rev_merged_list
merged_list <- rev(merged_list)
for(z in 1:length(dt)){
merged_list[[z]][,-1] = rev(merged_list[[z]][,-1])
}
More Information:
The for loop here is supposed to be:
- for z from 1 to the length of dt
- element z of merged_list (accessed with double square brackets), which should be a data.table
- where the data does not include the first column
- should be assigned to the rev of the same element z, where the first column is also excluded
Does this logic hold for the above loop? I am unsure what is wrong!
Expected Output:
output_ <- list()
a_ <- data.table("ID" = b, "code_a" = a, "code_c" = c, "code_d" = d)
b_ <- data.table("ID" = b, "code_c" = c, "code_d" = d)
c_ <- data.table("ID" = b, "code_d" = d)
output_[[1]] <- a_
output_[[2]] <- b_
output_[[3]] <- c_
output_
I was told yesterday that I can specify a right-hand merge in the merge above. However, to do so I need to specify by = "ID" in the merge, and I am unsure what the x and y values are when merging multiple sets of data.
I am also under the impression that lapply() can do the same thing instead of a loop, but I am unsure how I might achieve that in this case. Thanks!
We can use setcolorder, which reorders the columns by reference:
for(i in seq_along(merged_list)){
setcolorder(merged_list[[i]],
c(names(merged_list[[i]])[1], rev(names(merged_list[[i]])[-1])))
}
all.equal(merged_list, output_, check.attributes = FALSE)
#[1] TRUE
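Since the question also asks about lapply(), here is a minimal sketch of that variant. It assumes the data.table package and uses a small stand-in for merged_list rather than the full example. Unlike setcolorder(), it returns new data.tables and leaves the originals untouched. The original loop appears to fail because, as the error message indicates, negative column numbers are not accepted on the left-hand side of a data.table assignment.

```r
library(data.table)

# Small stand-in for merged_list: ID first, other columns in reversed order
ids <- 1:3
merged_list <- list(
  data.table(ID = ids, code_d = 10:12, code_c = c("F", "E", "D"),
             code_a = c("a", "b", "c")),
  data.table(ID = ids, code_d = 10:12, code_c = c("F", "E", "D")),
  data.table(ID = ids, code_d = 10:12)
)

# Keep the first column in place and reverse the order of the remaining ones
reversed <- lapply(merged_list, function(x) {
  cols <- c(names(x)[1], rev(names(x)[-1]))
  x[, cols, with = FALSE]  # selects columns by name, returning a new data.table
})

lapply(reversed, names)
```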

Automatically add any variables that exist in one data.frame but missing in other data.frames in R

Suppose I have a reference data.frame called a. I was wondering how I could automatically add any variables that exist in a but missing in other data.frames b and d?
NOTE: My goal is to make a function out of this such that any number of data.frames, and any number of variables can be completed based on a single reference data.frame.
a <- data.frame(x = 2:3, y = 4:5, z = c(T, F)) ## reference data.frame
b <- data.frame(x = 6:7) ## Add y and z here
d <- data.frame(x = 7:8) ## Add y and z here
Supposing all the data.frames involved share the same number of rows, you can simply:
toadd<-setdiff(colnames(a),colnames(b))
b[toadd]<-a[toadd]
Wrapping the above in a function:
f<-function(refdf, ...) {
res<-listdf<-list(...)
res<-lapply(listdf, function(x) {
toadd<-setdiff(names(refdf),names(x))
x[toadd]<-refdf[toadd]
x
})
c(list(refdf),res)
}
Then try for instance:
f(a,b)
f(a,b,d)
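If the data.frames do not share the same number of rows, copying values from the reference is no longer possible. A hedged alternative is to add the missing columns filled with NA of the matching type. The function name add_missing_cols below is hypothetical, and filling with NA is an assumption about the desired behaviour.

```r
a <- data.frame(x = 2:3, y = 4:5, z = c(TRUE, FALSE))  # reference data.frame
b <- data.frame(x = 6:7)
d <- data.frame(x = 7:9)  # deliberately a different number of rows

# Hypothetical helper: add every column of refdf that is missing from each
# data.frame, filled with NA of the same type as the reference column
add_missing_cols <- function(refdf, ...) {
  lapply(list(...), function(x) {
    for (v in setdiff(names(refdf), names(x))) {
      # indexing with NA_integer_ yields a single NA of the column's own type
      x[[v]] <- rep(refdf[[v]][NA_integer_], nrow(x))
    }
    x
  })
}

res <- add_missing_cols(a, b, d)
str(res)
```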
# Using a reference data.frame, perform a right join in order
# to append the required vectors to the provided data.frames:
add_empty_vecs <- function(refdf, ...){
# Store the names of the other data.frames: df_names => character vector
df_names <- as.character(substitute(list(...)))[-1L]
# Right join the reference data.frame to each of the
# provided data.frames and return the results: => named list
setNames(lapply(list(...), function(y){
merge(refdf, y, by = intersect(names(refdf), names(y)), all.y = TRUE)
}
), df_names)
}
# Apply function only to df b:
add_empty_vecs(a, b)
# Apply function to both df b & df d:
add_empty_vecs(a, b, d)
# Apply function to all b, d, e:
add_empty_vecs(a, b, d, e)
Data:
a <- data.frame(x = 2:3, y = 4:5, z = c(T, F)) ## reference data.frame
b <- data.frame(x = 6:7) ## Add y and z here
d <- data.frame(x = 7:8) ## Add y and z here
e <- data.frame(x = 9:10)

Difference between two dates excluding weekends and given list of holidays in R

I want to calculate the number of days between 2 dates, excluding weekends and a given list of holidays.
a <- c('2016/05/2')
b <- c('2016/05/11')
a <- as.Date(a,'%Y/%m/%d')
b <- as.Date(b,'%Y/%m/%d')
Nweekdays <- Vectorize(function(a, b)
sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday")))
Nweekdays(a, b)
This is how I calculate the number of days without the weekends. However, suppose I also want to exclude a given list of holidays, here '2016/05/3' and '2016/05/4', and then calculate the number of days between the two dates (still excluding weekends). I am unable to write this code. Please help me. Thanks in advance.
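What about:
f <- function(a, b, h) {
# all days after a, up to and including b (the start date a is not counted)
d <- seq(a, b, 1)[-1]
# count days that are neither Saturday/Sunday (weekdays 6 and 7) nor holidays
sum(!format(d, "%u") %in% c("6", "7") & !d %in% h)
}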
f(a, b, as.Date(c("2016/05/3", "2016/05/4"),'%Y/%m/%d'))
or, with a data frame:
vf <- Vectorize(f, c("a", "b"))
df <- data.frame(a=rep(a, 2), b=rep(b, 2))
df$diff <- with(df, vf(a, b, as.Date(c("2016/05/3", "2016/05/4"),'%Y/%m/%d')) )
You can do it as follows:
holidays <- as.Date(c('2016/05/3' , '2016/05/4' ))
date_range <- seq.Date(a, b, 1)
date_range[!weekdays(date_range) %in% (c("Saturday", "Sunday")) & !date_range %in% holidays]
#[1] "2016-05-02" "2016-05-05" "2016-05-06" "2016-05-09" "2016-05-10" "2016-05-11"
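The two answers differ in whether the start date is counted, and weekdays() returns locale-dependent day names. The sketch below combines both ideas into one function; the name count_business_days and the include_start argument are hypothetical. It uses format(, "%u") so that the weekend check does not depend on the locale.

```r
# Hypothetical helper: count days from start to end that are neither
# weekend days nor listed holidays
count_business_days <- function(start, end,
                                holidays = as.Date(character(0)),
                                include_start = TRUE) {
  days <- seq(start, end, by = "day")  # assumes start <= end
  if (!include_start) days <- days[-1]
  # "%u" gives the weekday as a number, 1 (Monday) to 7 (Sunday)
  is_weekend <- format(days, "%u") %in% c("6", "7")
  sum(!is_weekend & !days %in% holidays)
}

a <- as.Date("2016-05-02")
b <- as.Date("2016-05-11")
h <- as.Date(c("2016-05-03", "2016-05-04"))

count_business_days(a, b, h)                         # 6, start date counted
count_business_days(a, b, h, include_start = FALSE)  # 5, start date excluded
```

The first call matches the six dates listed above, and the second matches the convention of f(), which drops the start date.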

Priority/Decision Based Choice of Row

I have a data.frame that has a number of duplicate rows, akin to something like this:
con <- textConnection(Lines <- "
First, Last, Address, Address 2, Email, Custom1, Custom2, Custom3
A, B, C, D, F@G.com,1,2,3
A, B, C, D, F@G.com,1,2,2
A, B, C, D, F@G.com,1,2,1
")
x <- read.csv(con)
close(con)
Now, I de-duplicate in the following manner:
x <- x[!duplicated(x[,c("Email")]),]
Could you recommend a method for prioritizing those rows that contain Custom3=1? Or is there a better mechanism for de-duplication?
Try sorting before finding duplicates:
x <- x[order(x[,c("Custom3")]),]
x <- x[!duplicated(x[,c("Email")]),]
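Sorting in ascending order works here because 1 happens to be the smallest Custom3 value. If the preferred value is not guaranteed to be the smallest, one option is to sort on a logical flag instead. This is a minimal sketch on a simplified copy of the example data, with the column names written without spaces.

```r
x <- read.csv(text = "First,Last,Address,Address2,Email,Custom1,Custom2,Custom3
A,B,C,D,F@G.com,1,2,3
A,B,C,D,F@G.com,1,2,2
A,B,C,D,F@G.com,1,2,1")

# FALSE sorts before TRUE, so rows with Custom3 == 1 move to the top;
# ties keep their original relative order
x <- x[order(x$Custom3 != 1), ]

# duplicated() marks every occurrence after the first, so the
# prioritized row is the one kept for each Email
dedup <- x[!duplicated(x$Email), ]
dedup
```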
