Convert elements of list to dataframe keeping column names unchanged - r

I have a list with several elements in it. I want to convert these elements to separate dataframes, but I don't know how to retain the column names. Please see the code below.
for (i in 1: length(mylist)) {
assign(paste("df_",names(mylist[i]),sep = ""), as.data.frame(data.frame(mylist[i]),
col.names = names(dfi)))
}
> names(df_april)
[1] "april.X" "april.Latitude" "april.month" "april.count1" "april.AOT40_1" "april.count_full"
[7] "april.AOT40_2" "april.State.Code" "april.County.Code" "april.Longitude" "april.Datum" "april.Parameter.Name"
[13] "april.State.Name" "april.County.Name" "april.year" "april.method"
As you can see above, I tried to use as.data.frame with defined col.names = names(dfi) to get rid of "april" in those column names. But it did not work.
Any idea?

In the OP's code, change [ to [[. mylist[i] is still a list of length 1 that carries the element's name, whereas mylist[[i]] extracts the element itself. So when data.frame() flattens that one-element list, the element name is prepended to every column name. The object name should likewise come from names(mylist)[i], and col.names is no longer needed:
for (i in seq_along(mylist)) {
assign(paste0("df_", names(mylist)[i]), mylist[[i]])
}
names(df_april)
NOTE: It is better not to create multiple objects in the global env.
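To see the difference between the two kinds of brackets, and one way to avoid assign() altogether, here is a minimal sketch. The contents of mylist are made up for illustration; only the element name april echoes the question.

```r
# Toy stand-in for the OP's list (hypothetical data)
mylist <- list(
  april = data.frame(X = 1:2, Latitude = c(40.1, 41.2)),
  may   = data.frame(X = 3:4, Latitude = c(39.5, 38.7))
)

# Single bracket: a one-element list, so data.frame() prefixes the element name
prefixed_names <- names(data.frame(mylist[1]))

# Double bracket: the data frame itself, column names untouched
plain_names <- names(mylist[[1]])

# Instead of creating df_april, df_may, ... keep everything in one named list
dfs <- setNames(mylist, paste0("df_", names(mylist)))

print(prefixed_names)
print(plain_names)
print(names(dfs))
```

Individual data frames are then available as dfs$df_april or dfs[["df_april"]].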

Related

How to rename multiple column names (with prefix) according to respective dataframe name [R]

I have 15 dataframes, that I have merged together.
Here I'm loading my files.
data_files <- list.files() # Identify file names
for(i in 1:length(data_files)) {
assign(paste0(substr(data_files[i],1,nchar(data_files[i])-4)),
read_excel(paste0("",
data_files[i])))
}
The problem is that they all have the same column names. That's why I want to rename the columns with the following code:
colnames(COMMUNITY)
[1] "OBJECTID" "SOURCE_ID" "mean" "LMiIndex Fixed 450000 RS"
[5] "LMiZScore Fixed 450000 RS" "LMiPValue Fixed 450000 RS" "COType Fixed 450000 RS" "NNeighbors Fixed 450000 RS"
[9] "ZTransform Fixed 450000 RS" "SpatialLag Fixed 450000 RS"
colnames(COMMUNITY) <-paste("PREFIX",colnames(COMMUNITY),sep="-")
I would like to do this to my 15 dataframes, so I tried this :
List_df_EU = list(COMMUNITY,CSR_STRATEGY, EMISSIONS,ENV_PILLAR,ESGCOMBINED,ESGCONTROVERSIES,
ESGSCORE,GOV_PILLAR,HUMANRIGHTS,INNOVATION,MANAGEMENT,PRODUCT_RESP, RESSOURCE_USE, SOC_PILLAR, WORKFORCE)
for(i in 1:length(List_df_EU)) {
colnames(List_df_EU[i]) <-paste("AS",colnames(List_df_EU[i]),sep="_")
}
It doesn't work, and I don't know how to retrieve each dataframe's name in order to use it as the column prefix.
I could do it for each dataframe separately, but that would take a long time and would not be very clever.
Even after a lot of searching online, I never found an automated way to do this.
After that, I use the following line of code to merge. It actually works, but as expected all the column names are identical.
Merged_file <- purrr::reduce(List_df_EU, dplyr::left_join, by = 'OBJECTID', suffix = c(".x", ".y"))
First, refer to elements of the list with double square brackets, like so List_df_EU[[i]] (List_df_EU[i] is a sub-list of 1 element, not the element itself).
Second, we could create List_df_EU with tibble::lst() instead of list(), so that elements are automatically named. Then, "AS" can be replaced with names(List_df_EU)[i].
List_df_EU <- tibble::lst(....)
for(i in 1:length(List_df_EU)) {
colnames(List_df_EU[[i]]) <- paste(
names(List_df_EU)[i], colnames(List_df_EU[[i]]), sep = "_")
}
Edit
To allow the subsequent join on OBJECTID, we could rename all columns but OBJECTID, for instance using dplyr that has a nice interface for this:
for(i in 1:length(List_df_EU)) {
List_df_EU[[i]] <- dplyr::rename_with(
List_df_EU[[i]],
~ paste(names(List_df_EU)[i], .x, sep = "_"),
.cols = - OBJECTID
)
}
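For comparison, the same idea can be sketched in base R only. The two small data frames below are invented for illustration, and merge() stands in for dplyr::left_join; this is not the answerer's code, just one way to check the logic end to end.

```r
# Hypothetical miniature versions of two of the 15 data frames
COMMUNITY <- data.frame(OBJECTID = 1:3, mean = c(0.1, 0.2, 0.3))
EMISSIONS <- data.frame(OBJECTID = 1:3, mean = c(5, 6, 7))

# A named list, so each data frame's name is available as the prefix
dfs <- list(COMMUNITY = COMMUNITY, EMISSIONS = EMISSIONS)

# Prefix every column except the join key
prefixed <- Map(function(df, nm) {
  is_key <- names(df) == "OBJECTID"
  names(df)[!is_key] <- paste(nm, names(df)[!is_key], sep = "_")
  df
}, dfs, names(dfs))

# Left-join all data frames on OBJECTID
merged <- Reduce(
  function(a, b) merge(a, b, by = "OBJECTID", all.x = TRUE),
  prefixed
)

print(names(merged))
```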
The easiest may be to bring them all into the same columns but add a column that indicates what file they came from. You could also pivot_wider and separate them again, at that point.
This function is for filling in the column that will be used to identify the source file.
library(tidyverse)
library(data.table)
add_name <- function(flnm) {
fread(flnm) %>%
mutate(filename = basename(flnm))
}
Use this to collect the files and build the data frame.
mergedDF <- list.files(urlOrObject, full.names = TRUE) %>% # full paths so fread() can find the files
map_df(~add_name(.))
Let me know if you have any questions.
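A self-contained sketch of this "stack and tag" approach, using base R's read.csv() in place of fread() and two throwaway CSV files written to a temporary directory. The file names and contents are assumptions made purely so the example runs.

```r
# Write two hypothetical input files to a fresh temporary directory
demo_dir <- tempfile("source_demo")
dir.create(demo_dir)
write.csv(data.frame(OBJECTID = 1:2, mean = c(0.1, 0.2)),
          file.path(demo_dir, "COMMUNITY.csv"), row.names = FALSE)
write.csv(data.frame(OBJECTID = 1:2, mean = c(5, 6)),
          file.path(demo_dir, "EMISSIONS.csv"), row.names = FALSE)

# Read one file and record where its rows came from
read_with_source <- function(path) {
  df <- read.csv(path)
  df$filename <- basename(path)
  df
}

# Collect the files (full paths) and stack them into one data frame
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
stacked <- do.call(rbind, lapply(files, read_with_source))

print(stacked)
```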
Name your list, then you can get the name prefix:
List_df_EU = list(COMMUNITY = COMMUNITY,CSR_STRATEGY = CSR_STRATEGY ...)
To set the colnames, a second pair of brackets is needed ([[i]] rather than [i]):
colnames(List_df_EU[[i]]) <- ...

How to name Dataframes inside a List

I have created a basic list called lista (not very imaginative, I know) that contains 10 small dataframes.
The dataframes are called "numberone", "numbertwo", ..., "numberten".
When I access this list I can't see their names.
but the output I can see in the workspace (Rstudio) is this
Below are the code and my attempts:
#creating multiple dataframes and a list and then give a title to this dataframes inside the list.
lista = list()
names = c("numberone","numbertwo","numberthree","numberfour","numberfive","numbersix","numberseven","numbereight","numbernine","numberten")
for (i in 1:10) {
x = rnorm(10)
df = data.frame(x)
assign(names[i],df)
lista[[i]] = df
}
#trying to change manually the names of the dataframes inside the "lista" list
names(lista[1]) = "number one"
print(names(lista[1])) #this gives no results
#trying using dput
output = dput(lista[1])
##trying put manually the name in front of the dput output to rename the first dataframe inside lista..
list('numberone'= structure(list(x = c(0.750704535096297, 1.16925878942967,
0.806475114411396, 1.00973486249489, -0.301553383694518, 0.546485320708262,
1.03645444095639, 0.247820396853631, -1.64294545886444, -0.216784798035195
)), class = "data.frame", row.names = c(NA, -10L)))
#this seems to have renamed the first dataframe but, it's not working anyway
lista$numberone
print(names(lista[1])) #still no results
I've tried almost everything I could, but I can't give these dataframes their names inside the list.
How can I name these dataframes?
Thank You
Try assigning to names() on the list itself.
Here is an example using a list of empty elements:
list_test = vector("list",4)
names(list_test) = c("A","B","C","D")
list_test
$A
NULL
$B
NULL
$C
NULL
$D
NULL
With your example, I did:
names(lista) <- names
and I get:
names(lista)
[1] "numberone" "numbertwo" "numberthree" "numberfour" "numberfive" "numbersix" "numberseven"
[8] "numbereight" "numbernine" "numberten"
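Put together, a shortened version of the question's setup might look like the sketch below. It uses three data frames instead of ten and a vector called df_names (a name chosen here to avoid masking the names() function); both are assumptions for illustration.

```r
set.seed(1)
df_names <- c("numberone", "numbertwo", "numberthree")

# Build the list of small data frames; lapply() returns an unnamed list here
lista <- lapply(df_names, function(nm) data.frame(x = rnorm(3)))
had_names_before <- !is.null(names(lista))

# Name all list elements in one step
names(lista) <- df_names

# Rename a single element by indexing the names vector, not the list
names(lista)[1] <- "number one"

print(names(lista))
print(lista[["number one"]])
```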
I think you might be looking to use double brackets (e.g. [[1]]) to reference elements in your list. Using your example code, this will work; note that it renames the column inside the first dataframe rather than the list element itself:
names(lista[[1]]) = "number one"
print(names(lista[[1]])) # the column of the first dataframe is now called "number one"
You can also use a setNames() function within a Map() function to rename each column for your list of dataframes.
lista <-Map(setNames, lista , names)
lista # each column is now assigned a name from your vector called names
To keep your code clean as possible, it is best to avoid naming objects with the same names as functions. (Your example code uses a vector called "names" but also uses names() function.)

Converting list of Characters to Named num in R

I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
These names in column 1 are a bunch of named results of the cor.test function. The second column should consist of the correlation coefficients I get by writing ABC_D1$estimate, ABC_D2$estimate.
My problem is that I don't want to add the $estimate manually to every single name in the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesn't work; it only gives me this back:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate")
class(df1$C2)
[1] "character"
How can I get the numeric result for ABC_D1$estimate in my dataframe? How can I convert these characters into Named num? The 3rd column should consist of the results of $p.value.
As pointed out by @DSGym, there are several problems, including that it is not very convenient to have a vector of object names as characters; it would be better to have a list of the objects themselves.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
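As a sketch of the "list of objects" alternative, mget() can fetch all the named test results at once, after which both $estimate and $p.value are easy to pull out. The two cor.test() results below are generated from random data and only stand in for the OP's objects.

```r
set.seed(42)
x <- rnorm(20)

# Hypothetical stand-ins for the OP's cor.test() results
ABC_D1 <- cor.test(x, x + rnorm(20))
ABC_D2 <- cor.test(x, rnorm(20))
name_list <- c("ABC_D1", "ABC_D2")

# Look the objects up by name once and keep them in a named list
tests <- mget(name_list)

# One row per test: name, correlation estimate, p-value
df1 <- data.frame(
  C1 = name_list,
  C2 = vapply(tests, function(res) unname(res$estimate), numeric(1)),
  C3 = vapply(tests, function(res) res$p.value, numeric(1)),
  row.names = NULL
)

print(df1)
```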
Ok I think now i know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You connect the two strings and use the functions parse and eval to get your results.
This is how to do it for your whole data.frame (map_dbl comes from the purrr package):
library(purrr)
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))

how to separate names using regular expression?

I have a name vector like the following:
vname<-c("T.Lovullo (73-58)","K.Gibson (63-96) and A.Trammell (1-2)","T.La Russa (81-81)","C.Dressen (16-10), B.Swift (32-25) and F.Skaff (40-39)")
Watch out for T.La Russa, who has a space in his name.
I want to use str_match to separate the names. The difficulty here is that some strings contain two or more names while others contain only one, as in the example I gave.
I have written my code but it does not work:
str_match_all(ss,"(D[.]D+.+)s(\\(d+-d+\\))(s(and)s(D[.]D+.+)s(\\(d+-d+\\)))?")
Perhaps this helps
res <- unlist(strsplit(vname, "(?<=\\))(\\sand\\b\\s)*", perl = TRUE))
res
#[1] "T.Lovullo (73-58)" "K.Gibson (63-96)" "A.Trammell (1-2)" "T.La Russa (81-81)"
To get the names only (if that is what is expected):
sub("\\s*\\(.*", "", res)
#[1] "T.Lovullo" "K.Gibson" "A.Trammell" "T.La Russa"
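An alternative sketch that extracts every "name (wins-losses)" piece directly instead of splitting, using base R's gregexpr() and regmatches(). The pattern assumes each name starts with a capital initial followed by a period and contains no parentheses, which holds for the sample vector but is an assumption about the wider data.

```r
vname <- c("T.Lovullo (73-58)",
           "K.Gibson (63-96) and A.Trammell (1-2)",
           "T.La Russa (81-81)",
           "C.Dressen (16-10), B.Swift (32-25) and F.Skaff (40-39)")

# Initial + period, then the shortest run of non-parenthesis characters,
# then a space and a "(digits-digits)" record
pattern <- "[A-Z]\\.[^()]+? \\(\\d+-\\d+\\)"

# All matches from all strings, flattened into one character vector
parts <- unlist(regmatches(vname, gregexpr(pattern, vname, perl = TRUE)))

# Split each match into the name and the record
manager <- sub("\\s*\\(.*$", "", parts, perl = TRUE)
record  <- sub("^.*\\((\\d+-\\d+)\\)$", "\\1", parts, perl = TRUE)

print(data.frame(manager, record))
```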

Multiple Pattern matching in R over multiple files , multiple columns & rows

I have a list of CSV files that I need to read. Several of them have columns such as Title, Description, and so on. I need to write a retrieval operation over these columns across multiple files and match them against another CSV of popular keywords (~10k) generated by a tool similar to WordStream SEO.
What I was able to do:
#Not sure if this is correct approach
Source1<- read.csv(path to csv file)
Keywords_tomatch<- read.csv(path to csv file)
#cant really take both the columns into single vector and iterate over them
subColdesc <- Source1[,c(3)]
subcolTitle <-Source1[,c(2)]
keywordget<- subset(Keywords_tomatch,grepl("*",Keywords_tomatch$col1))
#Two individual vectors since i'm not sure whether sapply() can be applied over multiple lists Definition: sapply(list,function)
descBoolean <- sapply(keywordget,
function(y)
sapply(subColdesc ,
function(x)
any(grepl(y,x)))
)
TitleBoolean = sapply(keywordget,
function(y)
sapply(subcolTitle ,
function(x)
any(grepl(y,x)))
)
#matches just the first element in the column of keywordget against (~4k) elements in description,title column. i.e returns a warning/error
In grepl(y, x) :
argument 'pattern' has length > 1 and only the first element will be used
I've tried Akrun's version of grep and it didn't work for me.
Question:
How can I match all the elements in the keywordget vector, and retrieve which columns matched on each row of Description and Title, and which rows of Description and Title matched?
In short, how can I retrieve all the game-related products in Source1 using Keywords_tomatch?
As a sample, I'm posting the two files I've gathered. Source1 contains only a few of the 4k rows.
Source1 =1.csv,
Keywords_tomatch = Gaming.csv
First, let me point out some possible reasons why your code is not working (correct me if I am wrong):
Your files may have been read with stringsAsFactors = TRUE, which turns text columns into factors. Since you did not report a problem with factors, I assume you converted them to characters before matching.
You need the fixed = TRUE argument for grepl, or else it will treat the keywords as regular expressions.
Your keywordget is a dataframe, and R treats a dataframe as a list of its columns. So sapply sees keywordget as a list with 1 element. When this element (essentially the entire keyword vector) is supplied as the pattern argument of grepl, you get the warning:
In grepl(y, x) : argument 'pattern' has length > 1 and only the first element will be used
For example, this should work:
sapply(keywordget$GAMING, function(y) {
sapply(source1$title, function(x) {
any(grepl(y,x, fixed = TRUE))
})
})
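The points above can be checked on a tiny made-up example. The rows and keywords below are invented; the column names title, description and GAMING follow the ones used in this answer.

```r
# Hypothetical miniature versions of the two files
source1 <- data.frame(
  title = c("tomb raider: legend", "garden hose", "chess set"),
  description = c("action adventure video game",
                  "50 ft rubber hose",
                  "classic board game"),
  stringsAsFactors = FALSE
)
keys <- data.frame(GAMING = c("video game", "board game", "c++"),
                   stringsAsFactors = FALSE)

# A data frame is a list of columns: iterating over it visits one element
n_as_list   <- length(keys)          # number of columns
n_as_vector <- length(keys$GAMING)   # number of keywords

# One column of TRUE/FALSE per keyword, one row per description;
# fixed = TRUE makes "c++" a literal string rather than a regex
hits <- sapply(keys$GAMING, function(k) {
  grepl(k, source1$description, fixed = TRUE)
})

# Rows whose description contains at least one keyword
matched_rows <- unname(which(rowSums(hits) > 0))

print(hits)
print(source1$title[matched_rows])
```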
Below is my solution:
# Read files
source1 = read.csv("source1.csv", stringsAsFactors = FALSE)
keys = read.csv("gaming.csv", stringsAsFactors = FALSE)
# Finds the index of elements in source1 that matches
# with any of the keys
matchIndex = lapply(source1, function(x){
which(Reduce(`|`, lapply(keys$GAMING, grepl, x, fixed = TRUE)))
})
> matchIndex
$title
integer(0)
$description
[1] 189 293 382 402 456
title has zero matches and description has 5
# Returns the descriptions that match
source1$description[matchIndex$description]
# Returns the title corresponding to the descriptions that match
source1$title[matchIndex$description]
> source1$title[matchIndex$description]
[1] "tomb raider: legend"
[2] "namco museum 50th anniversary collection"
[3] "restricted area"
[4] "south park chef's luv shack"
[5] "brainfood games cranium collection 2006"
