My data.frame
a<-sample(12)
b<-sample(-100:100, 12)
d<-c(-11:0)
O<-rep(c("N","H"), each=6)
H<-rep(c("In+", "In-"), each=3, times=2)
ID<-rep(c("bo","co", "do", "fo"), each=3)
mydata_1<-data.frame(ID, a, b, d, O, H)
I want to melt the variables a, b and d, while O and H should stay aligned with ID in the melted result. My current solution is below:
mydata_2<-data.frame(ID, a, b, d)
library(reshape2)  # melt() is assumed to come from reshape2
gg.df <- melt(mydata_2, id="ID", variable.name="int")
O<-rep(c("N","H"), each=6, times=3)
H<-rep(rep(c("In+", "In-"), each=3, times=2), times=3)
gg.df[, "OX"] <- O
gg.df["HI"] <- H
I am wondering how this can be done inside the melt function by using the full data frame (mydata_1).
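One possible way to do this in a single call, sketched under the assumption that melt() comes from the reshape2 package: list O and H as id variables alongside ID, so that melt() repeats them for every melted column. The set.seed() call is added only to make the sketch reproducible, and the resulting columns keep the names O and H rather than OX and HI.

```r
library(reshape2)  # assumption: melt() is reshape2::melt

set.seed(1)  # added for reproducibility; not in the original post
mydata_1 <- data.frame(
  ID = rep(c("bo", "co", "do", "fo"), each = 3),
  a  = sample(12),
  b  = sample(-100:100, 12),
  d  = -11:0,
  O  = rep(c("N", "H"), each = 6),
  H  = rep(c("In+", "In-"), each = 3, times = 2)
)

# ID, O and H are carried along as identifiers; a, b and d are stacked
gg.df <- melt(mydata_1, id.vars = c("ID", "O", "H"), variable.name = "int")
head(gg.df)
```

Every column not named in id.vars is treated as a measure variable, so a, b and d do not have to be listed explicitly.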
I am trying to create a variable that is a function of 4 other variables. I have the following code:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for (i in group) {
df <- df1[df1$group == i,]
x_ <- vector(mode="numeric", length=1000)
assign(eval(paste0("X_", i)), globalenv()) #This is the issue
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X_[i] = (a + c*(z-df$zbar))/(-b)
}
I am unable to create a unique group-specific variable (e.g. X_A, X_B, ...), and I am unsure why assign() is not working properly. The data frame df1 has 6 rows (one for each group), and its columns are the variables plus a string variable for group. I am not trying to append the new variables to the dataset; I just want to place them in the global environment. I believe the issue lies in how I assign the variable, because the code does not generate a numeric variable X.
df1 is a dataframe with 6 observations of 9 variables containing a, sea, b, seb, c, sec, zbar, se_z. These are just the means and standard deviations of a, b, c, and z, respectively. The 9th variable is group which contains A, B, ..., F. When I use the code df <-df1[df1$group == i,] I am trying to create a unique X variable for each group entity.
Try something like this:
dynamicVariableName <- paste0("X_", i)
assign(dynamicVariableName, (a + c*(z-df$zbar))/(-b))
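A possible explanation for why the original call fails, offered as a reading of assign()'s argument order rather than a definitive diagnosis: the second positional argument of assign() is the value, so assign(name, globalenv()) stores the environment itself under that name. The eval() around paste0() changes nothing, since evaluating a string returns the same string. A minimal sketch with a made-up value vector:

```r
name <- paste0("X_", "A")

# Pattern from the question: the environment is taken as the *value*
assign(name, globalenv())
stored_env <- is.environment(get(name))  # the object is an environment

# Name first, value second, target environment passed by name
assign(name, c(1.5, 2.5, 3.5), envir = globalenv())
stored_num <- is.numeric(get(name, envir = globalenv()))
```

Passing the environment as envir = (or as the third positional argument) keeps it from being mistaken for the value.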
As an alternative to the answer from @ErrorJordan, you can write your loop like this:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for(i in group)
{
df <- df1[df1$group == i,]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X <- (a + c*(z-df$zbar))/(-b)
assign(paste0("X_",i),X,.GlobalEnv)
}
As suggested by @MrFlick, you can also store your data in a list. To do so, you can modify your loop like this:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
X = vector("list",length(group))
names(X) = group
for(i in 1:length(group))
{
df <- df1[df1$group == group[i],]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
X[[i]] <- (a + c*(z-df$zbar))/(-b)
}
df1 dataframe
df1 = data.frame(a = c(1:6),
b = c(1:6),
c = c(1:6),
zbar = c(1:6),
sea = rep(1,6),
seb = rep(1,6),
sec = rep(1,6),
se_z = rep(1,6),
group = group)
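The list-based approach can also be written without an explicit index, as a hedged sketch: split df1 into one-row data frames and apply a small helper to each. The helper name simulate_x is hypothetical, and list2env() is shown only as an optional step if separate X_A, X_B, ... objects are still wanted.

```r
group <- c("A", "B", "C", "D", "E", "F")
df1 <- data.frame(a = 1:6, b = 1:6, c = 1:6, zbar = 1:6,
                  sea = 1, seb = 1, sec = 1, se_z = 1,
                  group = group)

set.seed(123)
iter <- 1000

# Hypothetical helper: draws n values of each input for one group row
simulate_x <- function(row, n) {
  a <- rnorm(n, mean = row$a,    sd = row$sea)
  b <- rnorm(n, mean = row$b,    sd = row$seb)
  c <- rnorm(n, mean = row$c,    sd = row$sec)
  z <- rnorm(n, mean = row$zbar, sd = row$se_z)
  (a + c * (z - row$zbar)) / (-b)
}

# One list element per group, named A, B, ..., F
X <- lapply(split(df1, df1$group), simulate_x, n = iter)

# Optional: expose the elements as X_A, X_B, ... in the global environment
# list2env(setNames(X, paste0("X_", names(X))), envir = globalenv())
```

Keeping the results in X means a later step can loop over the groups with lapply() or sapply() instead of looking objects up by name.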
It's a little hard to parse what you want to do, but I'm assuming it's something like
for each value in group make an object (in the global env) called X_A, X_B, ...
for each one of those objects, assign it the value (a + c*(z-df$zbar))/(-b)
I think this should do that for you:
set.seed(123)
iter <- 1000
group <- c('A','B','C','D','E','F')
for (i in group) {
df <- df1[df1$group == i,]
a <- rnorm(iter, mean=df$a, sd=df$sea)
b <- rnorm(iter, mean=df$b, sd=df$seb)
c <- rnorm(iter, mean=df$c, sd=df$sec)
z <- rnorm(iter, mean=df$zbar, sd=df$se_z)
assign(paste0("X_", i), (a + c*(z-df$zbar))/(-b), globalenv())
}
Note that in the code example you gave, the command x_ <- vector(mode="numeric", length=1000) has no effect. By that I mean you create the object but never use it in any later computation. (iter, by contrast, is used by every rnorm() call, so it must be defined before the loop runs.) If that command should do something meaningful, I'll need your help in explaining its intended purpose.
I have a list, say {a, b, c, d, ...}, in which each element is a data.table whose column order I need to reverse. However, I want to rev() every column except the first, as it is an ID. I tried using a loop to do it, but it returned:
Error in `[<-.data.table`(`*tmp*`, , -1, value = list(code_a = c("a", :
Item 1 of column numbers in j is -1 which is outside range [1,ncol=4]. Use column names instead in j to add new columns.
Example:
a <- c("a","b","c","d","e","f")
b <- 1:6
c <- c("F","E","D","C","B","A")
d <- 10:15
dt1 <- data.table("ID" = b, "code_a" = a)
dt2 <- data.table("ID" = b, "code_c" = c)
dt3 <- data.table("ID" = b, "code_d" = d)
dt <- list(dt1,dt2,dt3)
rev_dt <- rev(dt)
merged_list <- list()
rev_merged_list <- list()
rev_merged_list <- Reduce(merge, rev_dt, accumulate = TRUE)
merged_list <- rev_merged_list
merged_list <- rev(merged_list)
for(z in 1:length(dt)){
merged_list[[z]][,-1] = rev(merged_list[[z]][,-1])
}
More Information:
The for loop here is supposed to be:
- for z from 1 to the length of dt
- the merged_list element z (accessed with double square brackets, so it should be a data.table)
- where the data does not include the first column
- should be assigned to the rev of the same element z, where the first column is also excluded
Does this logic hold for the above loop? I am unsure what is wrong!
Expected Output:
output_ <- list()
a_ <- data.table("ID" = b, "code_a" = a, "code_c" = c, "code_d" = d)
b_ <- data.table("ID" = b, "code_c" = c, "code_d" = d)
c_ <- data.table("ID" = b, "code_d" = d)
output_[[1]] <- a_
output_[[2]] <- b_
output_[[3]] <- c_
output_
I was told yesterday that I can specify a right-hand merge in the merge above. However, doing so requires specifying by = "ID" in the merge, and I am unsure what the x and y values are when merging multiple sets of data.
I am also under the impression that lapply() could do the same thing instead of a loop, but I am unsure how I might achieve that in this case. Thanks!
We can use setcolorder
for(i in seq_along(merged_list)){
setcolorder(merged_list[[i]],
c(names(merged_list[[i]])[1], rev(names(merged_list[[i]])[-1])))
}
all.equal(merged_list, output_, check.attributes = FALSE)
#[1] TRUE
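Since the question also asks about lapply(), here is a minimal sketch of the same reordering without a for loop. It assumes data.table is installed and rebuilds the question's data so that it runs on its own; the vector name ids is introduced here for illustration.

```r
library(data.table)

ids <- 1:6
dt1 <- data.table(ID = ids, code_a = c("a", "b", "c", "d", "e", "f"))
dt2 <- data.table(ID = ids, code_c = c("F", "E", "D", "C", "B", "A"))
dt3 <- data.table(ID = ids, code_d = 10:15)

# Same construction as in the question: cumulative merges, then reversed
merged_list <- rev(Reduce(merge, rev(list(dt1, dt2, dt3)), accumulate = TRUE))

# Keep the first column (ID) in place and reverse the order of the rest
merged_list <- lapply(merged_list, function(x) {
  setcolorder(x, c(names(x)[1], rev(names(x)[-1])))
})

lapply(merged_list, names)
```

setcolorder() reorders by reference and returns its input invisibly, so assigning the lapply() result back is mostly for readability.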
Suppose I have a reference data.frame called a. How could I automatically add any variables that exist in a but are missing from the other data.frames b and d?
NOTE: My goal is to make a function out of this such that any number of data.frames, and any number of variables can be completed based on a single reference data.frame.
a <- data.frame(x = 2:3, y = 4:5, z = c(T, F)) ## reference data.frame
b <- data.frame(x = 6:7) ## Add y and z here
d <- data.frame(x = 7:8) ## Add y and z here
Supposing all the data.frames involved share the same number of rows, you can simply:
toadd<-setdiff(colnames(a),colnames(b))
b[toadd]<-a[toadd]
Wrapping the above in a function:
f<-function(refdf, ...) {
res<-listdf<-list(...)
res<-lapply(listdf, function(x) {
toadd<-setdiff(names(refdf),names(x))
x[toadd]<-refdf[toadd]
x
})
c(list(refdf),res)
}
Then try for instance:
f(a,b)
f(a,b,d)
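If the data frames do not all share the reference's row count, copying values from refdf no longer works. A hedged alternative sketch fills the missing columns with NAs of the matching type instead. The function name fill_missing_cols is hypothetical, and b is given three rows here on purpose.

```r
a <- data.frame(x = 2:3, y = 4:5, z = c(TRUE, FALSE))  # reference data.frame
b <- data.frame(x = 6:8)                               # three rows on purpose
d <- data.frame(x = 7:8)

fill_missing_cols <- function(refdf, ...) {
  lapply(list(...), function(x) {
    for (nm in setdiff(names(refdf), names(x))) {
      # refdf[[nm]][NA_integer_] is a single NA of the column's own type
      x[[nm]] <- rep(refdf[[nm]][NA_integer_], nrow(x))
    }
    # Keep only the reference's columns, in the reference's order
    x[names(refdf)]
  })
}

res <- fill_missing_cols(a, b, d)
```

Because the last step selects names(refdf), any extra column that is not in the reference is dropped; return x instead if those columns should be kept.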
# Using a reference data.frame, perform a right join in order
# to append the required vectors to the provided data.frames:
add_empty_vecs <- function(refdf, ...){
  # Store the names of the other data.frames: df_names => character vector
  df_names <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  # Right join the reference data.frame to each provided data.frame
  # and return the results as a named list: list
  setNames(lapply(list(...), function(y){
    merge(refdf, y, by = intersect(names(refdf), names(y)), all.y = TRUE)
  }), df_names)
}
# Apply function only df b:
add_empty_vecs(a, b)
# Apply function to both df b & df d:
add_empty_vecs(a, b, d)
# Apply function to all b, d, e:
add_empty_vecs(a, b, d, e)
Data:
a <- data.frame(x = 2:3, y = 4:5, z = c(T, F)) ## reference data.frame
b <- data.frame(x = 6:7) ## Add y and z here
d <- data.frame(x = 7:8) ## Add y and z here
e <- data.frame(x = 9:10)
I want to calculate the number of days between 2 dates, excluding weekends and a given list of holidays.
a <- c('2016/05/2')
b <- c('2016/05/11')
a <- as.Date(a,'%Y/%m/%d')
b <- as.Date(b,'%Y/%m/%d')
Nweekdays <- Vectorize(function(a, b)
sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday")))
Nweekdays(a, b)
This is how I can calculate the number of days without the weekends. However, suppose I also want to exclude a list of given holidays, which fall on '2016/05/3' and '2016/05/4', and then calculate the number of days between the two dates mentioned (excluding the weekends). I am unable to write this code. Please help me. Thanks in advance.
What about
f <- function(a, b, h) {
d <- seq(a, b, 1)[-1]
sum(!format(d, "%u") %in% c("6", "7") & !d %in% h)
}
f(a, b, as.Date(c("2016/05/3", "2016/05/4"),'%Y/%m/%d'))
or, with a data frame:
vf <- Vectorize(f, c("a", "b"))
df <- data.frame(a=rep(a, 2), b=rep(b, 2))
df$diff <- with(df, vf(a, b, as.Date(c("2016/05/3", "2016/05/4"),'%Y/%m/%d')) )
You can do it as follows:
holidays <- as.Date(c('2016/05/3' , '2016/05/4' ))
date_range <- seq.Date(a, b, 1)
date_range[!weekdays(date_range) %in% (c("Saturday", "Sunday")) & !date_range %in% holidays]
#[1] "2016-05-02" "2016-05-05" "2016-05-06" "2016-05-09" "2016-05-10" "2016-05-11"
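To get the count rather than the dates themselves, the logical filter can be summed. This is a small sketch that counts both endpoints and uses format(x, "%u") so that the weekend test does not depend on the locale's day names; the variable names days, is_workday and n_workdays are introduced here for illustration.

```r
a <- as.Date("2016-05-02")
b <- as.Date("2016-05-11")
holidays <- as.Date(c("2016-05-03", "2016-05-04"))

days <- seq(a, b, by = "day")

# "%u" gives the ISO weekday number as text: "6" = Saturday, "7" = Sunday
is_workday <- !(format(days, "%u") %in% c("6", "7")) & !(days %in% holidays)

n_workdays <- sum(is_workday)
n_workdays
# [1] 6
```

Drop the first element of days before counting if the start date should be excluded.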
I have a data.frame that has a number of duplicate rows, akin to something like this:
con <- textConnection(Lines <- "
First, Last, Address, Address 2, Email, Custom1, Custom2, Custom3
A, B, C, D, F@G.com,1,2,3
A, B, C, D, F@G.com,1,2,2
A, B, C, D, F@G.com,1,2,1
")
x <- read.csv(con)
close(con)
Now, when I de-duplicate in the following manner, only the first row of each set of duplicates is kept:
x <- x[!duplicated(x[,c("Email")]),]
Could you recommend a method for prioritizing those rows that contain Custom3=1? Or is there a better mechanism for de-duplication?
Try sorting before finding duplicates:
x <- x[order(x[,c("Custom3")]),]
x <- x[!duplicated(x[,c("Email")]),]
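Sorting on Custom3 itself only puts the preferred rows first when 1 happens to be the smallest value. A hedged variant sorts on the condition instead, so rows with Custom3 equal to 1 lead regardless of the other values. The sample data below mirrors the question's and is read with read.csv(text = ...) to keep the sketch self-contained.

```r
x <- read.csv(text = "First,Last,Address,Address 2,Email,Custom1,Custom2,Custom3
A,B,C,D,F@G.com,1,2,3
A,B,C,D,F@G.com,1,2,2
A,B,C,D,F@G.com,1,2,1")

# FALSE sorts before TRUE, so rows with Custom3 == 1 come first
x <- x[order(x$Custom3 != 1), ]

# duplicated() flags every repeat after the first occurrence of each Email
x <- x[!duplicated(x$Email), ]
x
```

Further sort keys can be passed to order() after the condition to break ties among the remaining rows.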