renaming dataframe column in a list - r

I have a list with two dataframes (each with two columns) and I want to rename a specific column in this list.
sample_df1<-data.frame(coltest11=1:6,coltest12=5:10)
sample_df2<-data.frame(coltest21=5:10,coltest22=1:6)
sample_ls<-list("a"=sample_df1, "b"=sample_df2)
colnames(sample_ls[["a"]][2])<-"test"
names(sample_ls[["a"]][2])
but the result is
[1] "coltest12"
I spent more than an hour looking at other topics but can't figure out what I am missing.

The problem is that you first subset the data frame down to its second column and then try to rename that one-column subset. To rename the second column of data frame a, take the names of the whole data frame, index the second entry, and assign to that:
names(sample_ls$a)[2] <- "test" # the [2] belongs on the outside, not inside
sample_ls$a
coltest11 test
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
Data:
sample_df1 <- data.frame(coltest11=1:6, coltest12=5:10)
sample_df2 <- data.frame(coltest21=5:10, coltest22=1:6)
sample_ls <- list(a=sample_df1, b=sample_df2)
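If the column position is not known in advance, the same idea also works by matching on the old name. This is only a minimal sketch; the helper variables old_name and nms are introduced here for illustration and are not part of the original answer.

```r
# Same sample data as above
sample_df1 <- data.frame(coltest11 = 1:6, coltest12 = 5:10)
sample_df2 <- data.frame(coltest21 = 5:10, coltest22 = 1:6)
sample_ls <- list(a = sample_df1, b = sample_df2)

# Rename by matching the old name, so the column position does not matter
old_name <- "coltest12"              # illustrative variable
nms <- names(sample_ls[["a"]])       # names of the whole data frame
names(sample_ls[["a"]])[nms == old_name] <- "test"

names(sample_ls[["a"]])
```

As in the answer above, the index sits outside the call to names(), so the assignment targets the names vector of the full data frame rather than a subset of it.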

Related

Renaming dataframe without writing it to the global environment

I have written a loop that stores data frames in a list and would like to use strings stored in a vector as their names. This way, I could refer to the data frames stored in the list by name without having to use indexes. I have searched the internet extensively for a solution to this issue but so far have not found one.
So far, I have used a workaround: I loop over a list of data frame names using read.csv(). In each iteration, I write the imported data frame to the global environment using assign(), which allows me to set a variable name. Using get() and a pattern matching approach, I then fetch the data frames from the global environment and store them in a list.
This approach is quite cumbersome and only works when data frame names follow a shared pattern.
Preferably, I would like to rename data frames without having to use assign():
Name of imported data frame 1 <- First element of vector containing the data frame names
How could I achieve this?
I highly appreciate every help!
My approach to this sort of problem is to use lapply to create the loop and then supply names for the elements of the resulting list. This gives a simple, two line solution once the "create a data frame" function has been written.
For example, generating a random data.frame rather than reading a csv file for easy reproduction:
createDataFrame <- function(x) {
data.frame(X=x, Y=rnorm(5))
}
beatles <- lapply(1:4, createDataFrame)
names(beatles) <- c("John", "Paul", "George", "Ringo")
beatles
$John
X Y
1 1 -1.1590175
2 1 0.6872888
3 1 -0.8868616
4 1 -0.3458603
5 1 1.1136297
$Paul
X Y
1 2 -0.3761409
2 2 -0.9059801
3 2 -0.7039736
4 2 -0.4490143
5 2 1.1337149
$George
X Y
1 3 -0.4804286
2 3 1.0573272
3 3 -1.9000426
4 3 0.8887967
5 3 0.6550380
$Ringo
X Y
1 4 -0.7539840
2 4 -0.3743590
3 4 -0.9748449
4 4 -1.1448570
5 4 -1.3277712
beatles$George
X Y
1 3 -0.4804286
2 3 1.0573272
3 3 -1.9000426
4 3 0.8887967
5 3 0.6550380
Make the obvious changes to createDataFrame for your actual use case.
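For the CSV case described in the question, one possible adaptation is sketched below. The temporary directory and the file names john.csv and paul.csv are made up so that the example is reproducible; in real use, files would point at the actual CSV files.

```r
# Hypothetical setup: write two small CSV files to a temporary directory
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
write.csv(data.frame(X = 1:3), file.path(tmp_dir, "john.csv"), row.names = FALSE)
write.csv(data.frame(X = 4:6), file.path(tmp_dir, "paul.csv"), row.names = FALSE)

# Read every CSV file into a list; nothing is assigned to the global environment
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(files, read.csv)

# Name the list elements after the files (extension stripped)
names(frames) <- tools::file_path_sans_ext(basename(files))

frames$paul
```

Here the names are derived from the file names, but any character vector of the same length as the list can be assigned with names(frames) <- ... instead.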

List containing data tables - Unable to use a function to rename columns?

I have 20 Excel files containing city-level data for each year. I imported them into a list because I thought it would be easier to loop over them.
The first task that I wanted to do is to change the name of the second column of each file.
If, for a single file I do:
#data is a list of data tables/frames. Example:
data<-list(a = data.frame(1:2,3:4),b = data.frame(5:8,15:18) )
#renaming second column of a (works)
names(data[[1]])[2]<-"ABC"
I am able to rename the column.
To do batch editing I wanted to write a function to be used in lapply. The function should be a simple version of the above thing:
rename <-function(df){
names(df)[2]<-"XYZ"}
rename(data[[1]]), however, does nothing to the second column. Any ideas why?
You need to return the full modified object at each iteration:
data <- lapply( data, function(x) {names(x)[2]<-"ABC"; x})
data
#---------
[[1]]
X1.2 ABC
1 1 3
2 2 4
[[2]]
X5.8 ABC
1 5 15
2 6 16
3 7 17
4 8 18
I'm sure this is a duplicate but I don't know what the right search terms might be, so I'm just answering it .... again.
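To spell out why the original function appears to do nothing: the function modifies a local copy of df, and because the assignment is its last expression, it returns the assigned value rather than the data frame. A minimal sketch follows; the names rename_second, new_name and returned are illustrative, not from the question.

```r
data <- list(a = data.frame(1:2, 3:4), b = data.frame(5:8, 15:18))

# What the question's function effectively returns: the assigned value
returned <- (function(df) { names(df)[2] <- "XYZ" })(data[[1]])
returned   # the string "XYZ", not a data frame

# A named version that returns the modified copy
rename_second <- function(df, new_name = "XYZ") {
  names(df)[2] <- new_name   # changes the local copy only
  df                         # return the whole modified data frame
}

# The result must be assigned back, since the input list is not modified in place
data <- lapply(data, rename_second)
names(data$a)
```

Calling rename_second(data[[1]]) without assigning the result would still leave data unchanged, which is why the lapply() result is stored back into data.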

Replace semicolon-separated values to tab

I am trying to convert the data which I have in txt file:
4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;...
to a column (table) where the values are separated by tab.
4.0945725440979
4.07999897003174
4.0686674118042...
So far I tried
mydata <- read.table("1.txt", header = FALSE)
separate_data<- strsplit(as.character(mydata), ";")
But it does not work. separate_data in this case consists of only one element:
[[1]]
[1] "1"
Based on the OP, it's not directly stated whether the raw data file contains multiple observations of a single variable, or should be broken into n-tuples. Since the OP does state that read.table results in a single row where s/he expects it to contain multiple rows, we can conclude that the correct technique is to use scan(), not read.table().
If the data in the raw data file represents a single variable, then the solution posted in comments by @docendo works without additional effort. Otherwise, additional work is required to tidy the data.
Here is an approach using scan() that reads the file into a vector, and breaks it into observations containing 5 variables.
rawData <- "4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512"
value <- scan(textConnection(rawData),sep=";")
columns <- 5 # set desired # of columns
observations <- length(value) / columns # number of complete observations
observation <- unlist(lapply(1:observations,function(x) rep(x,times=columns)))
variable <- rep(1:columns,times=observations)
data.frame(observation,variable,value)
...and the output:
> data.frame(observation,variable,value)
observation variable value
1 1 1 4.094573
2 1 2 4.079999
3 1 3 4.068667
4 1 4 4.059601
5 1 5 4.052183
6 2 1 4.094573
7 2 2 4.079999
8 2 3 4.068667
9 2 4 4.059601
10 2 5 4.052183
>
At this point the data can be converted into a wide format tidy data set with reshape2::dcast().
Note that this solution requires that the number of data values in the raw data file is evenly divisible by the number of variables.
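As an alternative to reshape2::dcast(), the wide layout can also be sketched in base R by filling a matrix row by row. This swaps in a different technique from the one named above, and the input string below is made up to keep the example short.

```r
# Made-up input: 10 values, i.e. 2 observations of 5 variables
rawData <- "1;2;3;4;5;6;7;8;9;10"
value <- scan(text = rawData, sep = ";", quiet = TRUE)

columns <- 5   # desired number of variables per observation

# Guard for the divisibility requirement mentioned above
stopifnot(length(value) %% columns == 0)

# One row per observation, one column per variable
wide <- as.data.frame(matrix(value, ncol = columns, byrow = TRUE))
wide

# If the file holds a single variable, one value per row is enough
long <- data.frame(value = value)
```

The columns of wide get the default names V1 to V5; they can be renamed with names(wide) <- ... once the real variable names are known.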

Twofold, consecutive row selecting starting at different rows in R

I have got the following problem. I have a data.frame with an x and y column representing some points in space:
X<-c(18.25743,18.25783,18.25823,18.25850,18.25863,18.25878,
18.25885,18.25912,18.25943,18.25962,18.25978,18.26000,
18.26022,18.26051,18.26070,18.26095,18.26118,18.26140,
18.26189,18.26250,18.26310,18.26390)
Y<-c(44.69561,44.69564,44.69567,44.69567,44.69586,
44.69600,44.69637,44.69671,44.69691,44.69701,44.69720,
44.69740,44.69763,44.69774,44.69787,44.69790,44.69791,
44.69795,44.69812,44.69802,44.69812,44.69834)
eDF<-data.frame(X,Y)
Now my problem is that they are "sorted" wrongly for plotting. So what I need is a function that puts together the rows of the two points that belong together (in a list of lists):
1 and 12 is ID1
2 and 13 is ID2
3 and 14 is ID3
...
11 and 22 is ID11
Each list created this way within the list of lists should have a unique ID (simply numbered from 1 to the end). I need this because the problem occurs in all my data sets, which have different lengths.
It would be great if the starting point of the second row selection (the 12) were flexible, always taking the first row after half of the data, i.e. (rownumber/2)+1, which is 12 in this example.
I have tried a few things and I think I'm on the right track, but I can't figure out a solution by myself.
This function comes pretty close, but I can't make it start at different rows (1 and 12):
lapply(2:nrow(eDF), function(x) eDF[(x-1):x,])
I also tried to figure it out with seq, and it would do what I need if I could build a list of lists by combining both code samples. I would also need to replace the hard-coded start and end numbers with a dynamic solution.
eDF[(seq(1,to=11,by=1)),] # selecting rows 1 to 11
eDF[(seq(12,to=nrow(eDF),by=1)),] #selecting rows 12 to end
Anyone any ideas?
I don't know if you needed an ID column inside of the new list but another way would be:
#create the IDs
eDF$ID <- rep(1:11,2)
#split the data.frame according to those
mylist <- split(eDF, eDF$ID)
Output:
mylist
$`1`
X Y ID
1 18.25743 44.69561 1
12 18.26000 44.69740 1
$`2`
X Y ID
2 18.25783 44.69564 2
13 18.26022 44.69763 2
$`3`
X Y ID
3 18.25823 44.69567 3
14 18.26051 44.69774 3
$`4`
X Y ID
4 18.2585 44.69567 4
15 18.2607 44.69787 4
#and so on...
You could just do split(eDF, rep(1:11,2)) if you don't need the ID column.
We can modify the OP's lapply code
lapply(1:11, function(i) eDF[c(i, i+11),])
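Both answers hard-code 11, whereas the question asks for the split point to follow the number of rows. A possible dynamic variant is sketched below on a small made-up data frame; the names half and pairs and the "ID" prefix are illustrative choices.

```r
# Small made-up data: rows 1-3 pair with rows 4-6
eDF <- data.frame(X = c(1, 2, 3, 4, 5, 6), Y = c(10, 20, 30, 40, 50, 60))

n <- nrow(eDF)
stopifnot(n %% 2 == 0)   # assumes an even number of rows
half <- n / 2            # second selection starts at row half + 1

# Pair row i with row i + half, one two-row data frame per ID
pairs <- lapply(seq_len(half), function(i) eDF[c(i, i + half), ])
names(pairs) <- paste0("ID", seq_len(half))

pairs$ID1
```

The split() approach can be made dynamic in the same way, with split(eDF, rep(seq_len(half), 2)).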

making trigrams from a dataframe in R

I would like to find elements in the second column of a data frame that match elements in the first column, and create trigrams using the matched element as the middle element of the trigram. If there is no match, the middle and last elements of the trigram will both be the unmatched second-column element. Here is an example:
gdf <- data.frame(from=c(1,2,3,4,5),to=c(2,3,1,5,6),stringsAsFactors=FALSE)
gdf
# from to
# 1 2
# 2 3
# 3 1
# 4 5
# 5 6
The output trigrams are as follows:
from middle to
1 2 3
2 3 1
3 1 2
4 5 6
5 6 6
My code uses a for loop and takes a long time to process my huge data set, which has 54304 rows.
This is what I wrote:
num <- nrow(gdf)
df2 <- data.frame(from=character(0),middle=character(0),to=character(0),stringsAsFactors=FALSE)
count <- rep(0,nrow(gdf))
for(row in 1:nrow(gdf)){
for(rowc in 1:nrow(gdf)){
if(gdf[rowc,]$from==gdf[row,]$to){
df2[nrow(df2)+1,]<-c(gdf[row,]$from,gdf[row,]$to,gdf[rowc,]$to)
count[row]<-row
}
}
if(count[row]==0){
df2[nrow(df2)+1,]<-c(gdf[row,]$from,gdf[row,]$to,gdf[row,]$to)
}
}
Any help would be greatly appreciated!
Not sure if your example is too simple for this to work in the real data set, but a simple merge works for the example and then I sort the columns to get them back in order since a merge places the column that you merge by as column 1.
Merged <- merge(gdf,gdf,by.x="to",by.y="from")[,c(2,1,3)]
Then you can add in the nomatch elements later using a row bind
rbind(Merged,gdf[! paste(gdf[,1],gdf[,2]) %in% paste(Merged[,1],Merged[,2]),][,c(1,2,2)])
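A different vectorized route, sketched here with match() instead of merge(), keeps the original row order and handles the no-match rows in the same step. It assumes each value occurs at most once in the from column, because match() only returns the first hit; with duplicates, the nested loop in the question would produce additional rows that this sketch does not.

```r
gdf <- data.frame(from = c(1, 2, 3, 4, 5), to = c(2, 3, 1, 5, 6))

# For each 'to' value, the row whose 'from' equals it (NA if there is none)
idx <- match(gdf$to, gdf$from)

trigrams <- data.frame(
  from   = gdf$from,
  middle = gdf$to,
  # no match: repeat the middle element; otherwise follow the chain one step
  to     = ifelse(is.na(idx), gdf$to, gdf$to[idx])
)
trigrams
```

On the example data this reproduces the five trigrams listed in the question, in the same row order.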
