I am having a problem with get() in R.
I have a set of data.frames with a common structure in my environment. I want to loop through these data frames and change the name of the 2nd column so that the name of the 2nd column contains a prefix from the 1st column.
For example, if column 1 = A_cat and column 2 is dog, I want column 2 to be changed to A_dog.
Below is an example of the R code I am using:
df <- data.frame('A_cat'= 1:10 , 'dog' = 11:20)
for( element in grep('^df$', names(environment()), value=TRUE) ) {
colnames(get(element))[2] <- paste(strsplit(colnames(get(element))[1], '_')[[1]][1],
colnames(get(element))[2], sep='_')
}
The expressions on either side of the assignment operator inside the for loop both give the expected result when I run them separately, but when run together they produce the following error.
Error in colnames(get(element))[2] <- paste(strsplit(colnames(get(element))[1], :
could not find function "get<-"
Any help with this problem would be greatly appreciated.
This does the same thing as the code in the question without using get:
df <- data.frame('A_cat'= 1:10 , 'dog' = 11:20)
e <- environment() ##
df.names <- grep("^df$", names(e), value = TRUE)
# nm is the current data frame name and nms are its column names
for(nm in df.names) {
nms <- names(e[[nm]])
names(e[[nm]])[2] <- paste0(sub("_.*", "_", nms[1]), nms[2])
}
giving:
> df
A_cat A_dog
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
Keeping the data.frames in a named list as suggested in a comment to the question might be even better. For example, if instead of keeping the data.frames in an environment they were in a list called e
e <- list(df = df)
then omit the line marked ## and the rest works as is.
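The error in the question appears because R evaluates an assignment such as colnames(get(element))[2] <- value by looking for a replacement function named `get<-`, which does not exist. A minimal sketch of the usual workaround follows: copy the object out with get(), modify the copy, and write it back with assign(). The names env, tmp and prefix are illustrative and not part of the original code.

```r
df <- data.frame(A_cat = 1:10, dog = 11:20)
env <- environment()  # assumption: the data frames live in the current environment

for (element in grep("^df$", ls(envir = env), value = TRUE)) {
  tmp <- get(element, envir = env)                    # copy the data frame
  prefix <- strsplit(colnames(tmp)[1], "_")[[1]][1]   # "A" from "A_cat"
  colnames(tmp)[2] <- paste(prefix, colnames(tmp)[2], sep = "_")
  assign(element, tmp, envir = env)                   # write the modified copy back
}

colnames(df)
```

This keeps the structure of the original loop, at the cost of one temporary copy per data frame.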
Here would be one way to accomplish this goal if the data.frames have systematic names (here, df1 df2 df3, etc) and the prefix ends with "_" as in the example:
# suggested by @Roland: roll them up in a list
myDfList <- mget(ls(pattern="^df"))
# change names
for(dfName in names(myDfList)) {
# keep only the prefix (everything up to and including the last "_") of column 1
names(myDfList[[dfName]])[2] <- paste0(sub("^(.*_).*$", "\\1",
names(myDfList[[dfName]])[1]),
names(myDfList[[dfName]])[2])
}
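Note that the loop above changes only the copies stored in myDfList; the original df1, df2, ... objects in the workspace keep their old column names. If the renamed versions are wanted back as separate objects, something along the following lines should work. It is a sketch under assumptions: the example data frames df1 and df2 and the helper rename_second are made up for illustration.

```r
# Hypothetical example data; the question only says the names are systematic
df1 <- data.frame(A_cat = 1:3, dog = 4:6)
df2 <- data.frame(B_cat = 1:3, dog = 7:9)

env <- environment()
myDfList <- mget(ls(envir = env, pattern = "^df[0-9]+$"), envir = env)

# Illustrative helper: prefix column 2 with everything up to the last "_" of column 1
rename_second <- function(d) {
  prefix <- sub("^(.*_).*$", "\\1", names(d)[1])
  names(d)[2] <- paste0(prefix, names(d)[2])
  d
}

myDfList <- lapply(myDfList, rename_second)

# Copy the renamed data frames back over the originals
invisible(list2env(myDfList, envir = env))

names(df1)
names(df2)
```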
This is about ordering column names that contain both numbers and text. I have a data frame which resulted from dcast and has 200 rows. I have a problem with the ordering of its columns.
The column names are in the following format:
names(DF) <- c('Testname1.1', 'Testname1.100', 'Testname1.11', 'Testname1.2', ..., 'Testname2.99')
Edit: I would like to have the columns ordered as:
names(DF) <- c('Testname1.1', 'Testname1.2', 'Testname1.3', ..., 'Testname1.100', 'Testname2.1', ..., 'Testname2.100')
The original input has a column which specifies the day, but it is not being used when I 'cast' the data. Is there a way to specify the 'dcast' function to order combined column names numerically?
What would be the easiest way to get the columns ordered as I need to in R?
Thanks a lot!
I think you need to split the column before you can use it to order the data frame:
library("reshape2") ## for colsplit()
library("gtools")
Construct test data:
dat <- data.frame(matrix(1:25,5))
names(dat) <- c('Testname1.1', 'Testname1.100',
'Testname1.11','Testname1.2','Testname2.99')
Split and order:
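cdat <- colsplit(names(dat),"\\.",c("name","num"))
## mixedorder() returns a permutation, not ranks, so turn it into a rank per name
## before using it as the first sort key (otherwise cdat$num is never consulted)
unames <- unique(cdat$name)
name_rank <- match(cdat$name, unames[mixedorder(unames)])
dat[,order(name_rank,cdat$num)]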
## Testname1.1 Testname1.2 Testname1.11 Testname1.100 Testname2.99
## 1 1 16 11 6 21
## 2 2 17 12 7 22
## 3 3 18 13 8 23
## 4 4 19 14 9 24
## 5 5 20 15 10 25
The mixedorder() above (borrowed from @BondedDust's answer) is not really necessary for this example, but would be needed if the first (Testnamexx) component had more than 9 elements, so that Testname1, Testname2, and Testname10 would come in the proper order.
The mixedorder and mixedsort functions of pkg:gtools sometimes do what is desired, but in this case I think the period separator is messing things up because it is treated as part of a numeric value, although it was clearly intended to be a separator rather than a decimal point. Try
nvec <- c('Testname1.1', 'Testname1.100', 'Testname1.11', 'Testname1.2', 'Testname2.99')
#------------
> require(gtools)
Loading required package: gtools
Attaching package: ‘gtools’
The following objects are masked from ‘package:boot’:
inv.logit, logit
#------------
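prefix <- sapply(strsplit(nvec, "\\."), "[[", 1)
suffix <- as.numeric(sapply(strsplit(nvec, "\\."), "[[", 2))
# mixedorder() returns a permutation, so convert it to a rank per prefix before sorting
uprefix <- unique(prefix)
myvec <- nvec[order(match(prefix, uprefix[mixedorder(uprefix)]), suffix)]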
One way would be:
library(gtools) #use gtools library
library(NCmisc) #use NCmisc library for pad.left()
myvec <- c('Testname1.1', 'Testname1.100','Testname1.11','Testname1.2','Testname2.99') #construct your vector
myvec[mixedorder( paste(substring(myvec,1,9), pad.left(substring(myvec,11,100),'0') , sep='') ) ]
[1] "Testname1.1" "Testname1.2" "Testname1.11" "Testname1.100" "Testname2.99"
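For comparison, the same ordering can be obtained in base R without extra packages by splitting each name into its text prefix and two integer parts and sorting on all three. This is only a sketch, and it assumes every name has the form text, integer, period, integer; the variable names parts, prefix, major and minor are illustrative.

```r
myvec <- c("Testname1.1", "Testname1.100", "Testname1.11", "Testname1.2", "Testname2.99")

# Assumption: each name looks like <text><integer>.<integer>
parts <- regmatches(myvec, regexec("^([^0-9]*)([0-9]+)\\.([0-9]+)$", myvec))

prefix <- sapply(parts, `[`, 2)              # "Testname"
major  <- as.numeric(sapply(parts, `[`, 3))  # number before the period
minor  <- as.numeric(sapply(parts, `[`, 4))  # number after the period

sorted <- myvec[order(prefix, major, minor)]
sorted
```

For a data frame DF with such column names, the same order() call could be used as a column index, i.e. DF[, order(prefix, major, minor)].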
How can I combine this list of vectors by element names?
L1 <- list(F01=c(1,2,3,4),F02=c(10,20,30),F01=c(5,6,7,8,9),F02=c(40,50))
The result I want is:
results <- list(F01=c(1,2,3,4,5,6,7,8,9),F02=c(10,20,30,40,50))
I tried to apply the solution from "merge lists by elements names", but I can't figure out how to adapt it to my situation.
sapply(unique(names(L1)), function(x) unname(unlist(L1[names(L1)==x])), simplify=FALSE)
$F01
[1] 1 2 3 4 5 6 7 8 9
$F02
[1] 10 20 30 40 50
You can achieve the same result using the map function from purrr:
library(purrr)
map(unique(names(L1)), ~ flatten_dbl(L1[names(L1) == .x])) %>%
set_names(unique(names(L1)))
The first line merges the elements with matching names, while the last line names the new list accordingly.
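A base R variant of the same idea is to group the list elements by name with split() and then flatten each group. This is a minimal sketch; the name merged is illustrative. One difference from the versions above is that split() orders the groups by sorted name rather than by first appearance, which makes no difference for this example.

```r
L1 <- list(F01 = c(1, 2, 3, 4), F02 = c(10, 20, 30),
           F01 = c(5, 6, 7, 8, 9), F02 = c(40, 50))

# split() groups the elements by name; unlist() concatenates each group
merged <- lapply(split(L1, names(L1)), unlist, use.names = FALSE)
merged
```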
I have, for example, these three datasets (in my case, there are many more, with a lot of variables):
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
To each data frame I want to add a variable resulting from a transformation of an existing variable in that data frame. I would like to do this with a loop. For example:
datasets <- c("data_frame1","data_frame2","data_frame3")
vars <- c("a","b","c")
for (i in datasets){
for (j in vars){
# here I need a code that create a new variable with transformed values
# I thought this would work, but it didn't...
get(i)$new_var <- log(get(i)[,j])
}
}
Do you have any suggestions?
Moreover, it would be great if it were also possible to set the new column names (in this case new_var) from a character string, so that I could create the new variables in another for loop nested in the other two.
I hope my explanation of the problem is not too tangled.
Thanks in advance.
You can put your data frames in a list and use lapply to process them one by one, so there is no need for an explicit loop in this case.
For example, you can do this:
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
ll <- list(data_frame1,data_frame2,data_frame3)
lapply(ll,function(df){
df$log_a <- log(df$a) ## new column with the log a
df$tans_col <- df$a+df$b+df$c ## new column with sums of some columns or any other
## transformation
### .....
df
})
data_frame1 becomes:
[[1]]
a b c log_a tans_col
1 1 3 4 0.0000000 8
2 5 6 4 1.6094379 15
3 3 1 1 1.0986123 5
4 3 5 9 1.0986123 17
5 2 5 2 0.6931472 9
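The question also asks how to name the new columns from a character string. One way, sketched below, is to index with [[ ]] and build the name with paste0(). The list name dfs and the "log_" prefix are assumptions made for illustration. Note that lapply returns a new list, so the result has to be assigned for the changes to be kept.

```r
data_frame1 <- data.frame(a = c(1, 5, 3, 3, 2), b = c(3, 6, 1, 5, 5), c = c(4, 4, 1, 9, 2))
data_frame2 <- data.frame(a = c(6, 0, 9, 1, 2), b = c(2, 7, 2, 2, 1), c = c(8, 4, 1, 9, 2))
data_frame3 <- data.frame(a = c(0, 0, 1, 5, 1), b = c(4, 1, 9, 2, 3), c = c(2, 9, 7, 1, 1))

# A named list keeps track of which result belongs to which data frame
dfs <- list(data_frame1 = data_frame1,
            data_frame2 = data_frame2,
            data_frame3 = data_frame3)
vars <- c("a", "b", "c")

dfs <- lapply(dfs, function(df) {
  for (j in vars) {
    # Column name built from a string, e.g. "log_a"; log(0) yields -Inf
    df[[paste0("log_", j)]] <- log(df[[j]])
  }
  df
})

names(dfs$data_frame1)
```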
I had the same need and also wanted the changes to be applied to the actual data frames in my workspace, not just to the copies in the list.
I found a great method here (the purrr::map2 method in the question works for dataframes with different columns), followed by
list2env(list_of_dataframes ,.GlobalEnv)
I have a dataset called "J_BL5H1", which contains:
Var1 Freq
4 10
8 10
10 13
11 7
13 3
17 10
19 10
25 1
26 4
27 8
53 13
From this dataset, I want to extract the value for each Var1 separately and store it in a new object named J_BL5H1JNVar1Number, where Var1Number stands for the specific Var1 value, e.g. 4, 8 or 10.
I would use something like this:
J_BL5H1JNVar1Number <- J_BL5H1$Freq[1]
Here, I want Var1Number to be replaced by the "Var1" values from the old data.
For example, if I want "Freq[4]", the new object should be called "J_BL5H1JN11": "Var1Number" is automatically replaced by the Var1 value that belongs to Freq[4], in this case 11.
I hope I have stated my problem clearly. Thanks.
First use paste to create the names of the data.sets:
data.string <- "J_BL5H1JN"
split.var <- "Var1"
data.sets <- paste(data.string, J_BL5H1[, split.var], sep = "")
Then use a loop to assign the corresponding values to the data sets:
for( i in seq_along(data.sets) ) assign(data.sets[i], J_BL5H1[i, "Freq"])
Now you have the data sets in your workspace:
ls()
Btw, if you want to access the different data sets without actually calling them every time, you can access them by name using the get function:
sapply(data.sets, get)
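An alternative that avoids creating one workspace object per value is to keep the frequencies in a single named vector and look them up by name. This is a sketch only: the data frame is rebuilt here from the table in the question, and the name freq_by_var is illustrative.

```r
# Data rebuilt from the table shown in the question
J_BL5H1 <- data.frame(
  Var1 = c(4, 8, 10, 11, 13, 17, 19, 25, 26, 27, 53),
  Freq = c(10, 10, 13, 7, 3, 10, 10, 1, 4, 8, 13)
)

# One named element per Var1 value, e.g. "J_BL5H1JN11"
freq_by_var <- setNames(J_BL5H1$Freq, paste0("J_BL5H1JN", J_BL5H1$Var1))

freq_by_var[["J_BL5H1JN11"]]  # the Freq that belongs to Var1 == 11
```

Everything stays in one object, so the values can be listed with names(freq_by_var) or processed together without get().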