I have written a loop that stores data frames in a list, and I would like to use strings stored in a vector as their names. That way, I could refer to the data frames in the list by name instead of by index. I have searched extensively for a solution to this issue but have not found one so far.
So far, I have used a workaround: I loop over a vector of data frame names and import each file with read.csv(). In each iteration, I write the imported data frame to the global environment using assign(), which allows me to set the variable name. Using get() and a pattern matching approach, I then fetch the data frames from the global environment and store them in a list.
This approach is quite cumbersome and only works when data frame names follow a shared pattern.
Preferably, I would like to rename data frames without having to use assign():
Name of imported data frame 1 <- First element of vector containing the data frame names
How could I achieve this?
I would appreciate any help!
My approach to this sort of problem is to use lapply to create the loop and then supply names for the elements of the resulting list. This gives a simple, two line solution once the "create a data frame" function has been written.
For example, generating a random data.frame rather than reading a csv file for easy reproduction:
createDataFrame <- function(x) {
  data.frame(X=x, Y=rnorm(5))
}
beatles <- lapply(1:4, createDataFrame)
names(beatles) <- c("John", "Paul", "George", "Ringo")
beatles
$John
X Y
1 1 -1.1590175
2 1 0.6872888
3 1 -0.8868616
4 1 -0.3458603
5 1 1.1136297
$Paul
X Y
1 2 -0.3761409
2 2 -0.9059801
3 2 -0.7039736
4 2 -0.4490143
5 2 1.1337149
$George
X Y
1 3 -0.4804286
2 3 1.0573272
3 3 -1.9000426
4 3 0.8887967
5 3 0.6550380
$Ringo
X Y
1 4 -0.7539840
2 4 -0.3743590
3 4 -0.9748449
4 4 -1.1448570
5 4 -1.3277712
beatles$George
X Y
1 3 -0.4804286
2 3 1.0573272
3 3 -1.9000426
4 3 0.8887967
5 3 0.6550380
Make the obvious changes to createDataFrame for your actual use case.
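For the read.csv() case described in the question, a minimal sketch might look like the following. The file names, the temporary directory and the object names (df_names, paths, frames) are made up for illustration; setNames() simply attaches the names in the same step as the lapply() call.

```r
# Hypothetical setup: write two small CSV files to a temporary directory
# so that the example is reproducible.
tmp <- tempfile("csvdemo")
dir.create(tmp)
df_names <- c("sales", "costs")   # assumed names, one per file
for (nm in df_names) {
  write.csv(data.frame(id = 1:3, value = rnorm(3)),
            file.path(tmp, paste0(nm, ".csv")), row.names = FALSE)
}

# Read every file into a list and name the elements in one step.
paths <- file.path(tmp, paste0(df_names, ".csv"))
frames <- setNames(lapply(paths, read.csv), df_names)

names(frames)      # "sales" "costs"
frames[["sales"]]  # access by name instead of by index
```

No assign() or get() is needed with this approach, and the names do not have to follow a shared pattern.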
Related
I have data in a list, extracted from Bayesian processing of certain electrodes, and I want to populate a data frame from a loop. First, I have a list of 729 processing outcomes and an object elecs, which is basically a table of 729 pairs of electrodes (27*27), as you can see.
> head(elecs)
X Elec1 Elec2
1 1 1 1
2 2 1 2
3 3 1 3
4 4 1 4
5 5 1 5
6 6 1 6
The thing is I would like to fill dataf1 with the outcome of this loop which happens to be a dataframe of 4000 rows.
dataf1 <- data.frame('Elec1'=rep(NA,4000*729),'Elec2'=rep(NA,4000*729),'int'=rep(NA,4000*729))
for (i in nrow(elecs)){
Elec1=as.data.frame(rep(elecs[i,]$Elec1,4000))
Elec2=as.data.frame(rep(elecs[i,]$Elec2,4000))
post <- posterior_samples(bayeslist[[i]])
int <- as.data.frame(post$b_Intercept)
df <- cbind(Elec1,Elec2,int)
colnames(df) <- c('Elec1','Elec2','int')
dataf1[(1+(i-1)*4000):((1+(i-1)*4000)+3999),c('Elec1','Elec2','int')] <- df
}
Everything works perfectly fine until the last line in the loop:
dataf1[(1+(i-1)*4000):((1+(i-1)*4000)+3999),c('Elec1','Elec2','int')] <- df
And I don't know why exactly this is not working as expected and populating the dataf1 preinitialised dataframe.
Any insight, as always, will be highly appreciated.
I realised I was missing the 1: in the for statement, so it was a newbie typo. Apart from this, the code works, in case anyone is wondering. The loop header
for (i in nrow(elecs)){
should have been
for (i in 1:nrow(elecs)){
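The difference arises because nrow() returns a single number, so a loop over it runs exactly once, with i equal to the last row index. A small sketch with a made-up stand-in for elecs (the name toy is hypothetical) shows this; seq_len() is used here instead of 1:nrow() because it also behaves sensibly when a data frame has zero rows.

```r
toy <- data.frame(Elec1 = c(1, 1, 1), Elec2 = c(1, 2, 3))  # stand-in for elecs

visited_wrong <- c()
for (i in nrow(toy)) {            # iterates over the single value 3
  visited_wrong <- c(visited_wrong, i)
}

visited_right <- c()
for (i in seq_len(nrow(toy))) {   # iterates over 1, 2, 3
  visited_right <- c(visited_right, i)
}

visited_wrong  # 3
visited_right  # 1 2 3
```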
I have a list with two dataframes (each with two columns) and I want to rename a specific column in this list.
sample_df1<-data.frame(coltest11=1:6,coltest12=5:10)
sample_df2<-data.frame(coltest21=5:10,coltest22=1:6)
sample_ls<-list("a"=sample_df1, "b"=sample_df2)
colnames(sample_ls[["a"]][2])<-"test"
names(sample_ls[["a"]][2])
but the result is
[1] "coltest12"
I spent more than an hour looking at other topics but can't figure out what I am missing.
Your current problem is that sample_ls[["a"]][2] extracts the second column as a one-column data frame, and you are then trying to rename that extracted piece, which does not change the names of a itself. Instead, if you want to rename the second column in the a data frame, take its names vector, access the second entry, and rename it:
names(sample_ls$a)[2] <- "test" # the [2] belongs on the outside, not inside
sample_ls$a
coltest11 test
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
Data:
sample_df1 <- data.frame(coltest11=1:6, coltest12=5:10)
sample_df2 <- data.frame(coltest21=5:10, coltest22=1:6)
sample_ls <- list(a=sample_df1, b=sample_df2)
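As a small extension, the same pattern also works when the column is identified by its current name rather than by position, which avoids relying on column order. This is only a sketch; the new name "renamed" is made up for illustration.

```r
sample_ls <- list(a = data.frame(coltest11 = 1:6, coltest12 = 5:10),
                  b = data.frame(coltest21 = 5:10, coltest22 = 1:6))

# Rename by position: index the names vector, not the data frame.
names(sample_ls$a)[2] <- "test"

# Rename by current name: select the matching entry of the names vector.
names(sample_ls$b)[names(sample_ls$b) == "coltest21"] <- "renamed"

names(sample_ls$a)  # "coltest11" "test"
names(sample_ls$b)  # "renamed" "coltest22"
```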
I have 20 excel files containing city level data for each year. I imported them in a list because I thought it will be easier to loop over them.
The first task that I wanted to do is to change the name of the second column of each file.
If, for a single file I do:
#data is a list of data tables/frames. Example:
data<-list(a = data.frame(1:2,3:4),b = data.frame(5:8,15:18) )
#renaming second column of a (works)
names(data[[1]])[2]<-"ABC"
I am able to rename the column.
To do batch editing I wanted to write a function to be used in lapply. The function should be a simple version of the above thing:
rename <-function(df){
names(df)[2]<-"XYZ"}
Calling rename(data[[1]]), however, does nothing to the second column. Any ideas why?
You need to return the full modified object at each iteration:
data <- lapply( data, function(x) {names(x)[2]<-"ABC"; x})
data
#---------
[[1]]
X1.2 ABC
1 1 3
2 2 4
[[2]]
X5.8 ABC
1 5 15
2 6 16
3 7 17
4 8 18
I'm sure this is a duplicate but I don't know what the right search terms might be, so I'm just answering it .... again.
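The reason is that R functions work on a copy of their argument: the function in the question renames the column of its local df and then returns only the value of the assignment, so the caller never sees the modified data frame. A named version of the fix could be sketched as follows; the name rename_second and its new_name parameter are made up for illustration.

```r
data <- list(a = data.frame(1:2, 3:4), b = data.frame(5:8, 15:18))

rename_second <- function(df, new_name = "XYZ") {
  names(df)[2] <- new_name  # changes the local copy only
  df                        # return the modified copy to the caller
}

# Assign the result back, otherwise the renamed copies are discarded.
data <- lapply(data, rename_second)
sapply(data, function(d) names(d)[2])
```

Both elements of the list now have "XYZ" as their second column name, and the list names a and b are preserved by lapply().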
I am trying to select a column from a dataframe using a variable as a column name, with the problem that the column name is escaped. I have a couple of workarounds for doing it, which involve changing my code a bit too much, and anyway I've been looking around and I am curious if anybody knew the solution for this kind of weird case.
My dataset is actually a list of time series (which I construct after some operations), this would be a toy example.
df <- list(`01/19/17`=seq(1,10), `01/20/17`=seq(2,11))
> df
$`01/19/17`
[1] 1 2 3 4 5 6 7 8 9 10
$`01/20/17`
[1] 2 3 4 5 6 7 8 9 10 11
I don't put the escapes ` in the column names because I want to, but because they come as dates from the process I follow to construct the dataset.
If I know the column name I can access like this,
df$`01/19/17`
If I want to use a variable, looking around e.g. here I see I could rewrite it to something like this,
`$`(df, `01/19/17`)
But I cannot assign a variable like this,
> name1 <- `01/19/17`
Error: object '01/19/17' not found
and if I assign it this other way I get NULL,
> name1 <- "01/19/17"
> `$`(df, name1)
NULL
As I say there are workarounds like e.g. changing all the column names in the list of series, but I just would like to know. Thank you so much.
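You can access the element with double brackets rather than with $. Unlike $, which takes its argument literally, [[ evaluates it, so a variable holding the name as a string works: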
df <- list(`01/19/17`=seq(1,10), `01/20/17`=seq(2,11))
name1 <- "01/19/17"
df[[name1]]
# [1] 1 2 3 4 5 6 7 8 9 10
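To make the contrast explicit, here is a short sketch. The list is called series here only to avoid confusion with a data frame, and totals is a made-up name; the point is that $ looks for an element literally called "name1", while [[ uses the value stored in the variable, which also makes it easy to loop over all names.

```r
series <- list(`01/19/17` = seq(1, 10), `01/20/17` = seq(2, 11))

name1 <- "01/19/17"
is.null(series$name1)  # TRUE: `$` searches for an element named "name1"
series[[name1]]        # the variable is evaluated, so the first series is returned

# Looping over all names works the same way.
totals <- sapply(names(series), function(nm) sum(series[[nm]]))
totals
```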
Is l_ply or some other apply-like function capable of inserting results to an existing data frame?
Here's a simple example...
Suppose I have the following data frame:
mydata <- data.frame(input1=1:3, input2=4:6, result1=NA, result2=NA)
input1 input2 result1 result2
1 1 4 NA NA
2 2 5 NA NA
3 3 6 NA NA
I want to loop through the rows, perform operations, then insert the answer in the columns result1 and result2. I tried:
l_ply(1:nrow(mydata), function(i) {
mydata[i,"result1"] <- mydata[i,"input1"] + mydata[i,"input2"]
mydata[i,"result2"] <- mydata[i,"input1"] * mydata[i,"input2"]})
but I get back the original data frame with NA's in the result columns.
P.S. I've already read this post, but it doesn't quite answer my question. I have several result columns, and the operations I want to perform are more complicated than what I have above so I'd prefer not to compute the columns separately then add them to the data frame after as the post suggests.
I suppose there might be a plyr approach but this seems very easy and clear to do in base R:
> mydata[3:4] <- with(mydata, list( input1+input2, input1*input2) )
> mydata
input1 input2 result1 result2
1 1 4 5 4
2 2 5 7 10
3 3 6 9 18
Even if you got that plyr code to deliver something useful, you are still not assigning the results to anything, so they would have evaporated under the glaring sun of garbage collection. And do note that if you followed the advice of @Vlo you would have seen a result at the console that might have led you to think that 'mydata' was updated, but the 'mydata' object would have remained untouched. You need to assign values back to the original object. For dplyr operations you are generally going to be assigning back entire objects.
You don't need to use apply or variations thereof. Instead, you can exploit that R is vectorized:
mydata$result1 <- mydata$input1 + mydata$input2
mydata$result2 <- mydata$input1 * mydata$input2
#> mydata
# input1 input2 result1 result2
#1 1 4 5 4
#2 2 5 7 10
#3 3 6 9 18
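If the real operations are more involved and produce several result columns at once, one possible sketch is to wrap them in a helper that returns a data frame and bind that back onto the original. The helper name compute_results is hypothetical, and the sketch assumes the operations can be written in vectorised form.

```r
mydata <- data.frame(input1 = 1:3, input2 = 4:6)

# Hypothetical helper that computes all result columns in one vectorised call.
compute_results <- function(a, b) {
  data.frame(result1 = a + b, result2 = a * b)
}

# Assign the combined object back; nothing is modified in place.
mydata <- cbind(mydata, compute_results(mydata$input1, mydata$input2))
mydata
```

This keeps the computation in one place while still following the rule that results have to be assigned back to the object.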