Unknown result of select command - r

I have multiple .csv files (mydata_1, mydata_2, ...) with the same number of columns and the same column names (but different numbers of rows, if that helps in finding an answer). After reading them into my environment they have the class data.frame. I put them all in a list and now want to select specific columns by name from all of them, so that each variable keeps its name but contains only the chosen columns.
mydata_1 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
mydata_2 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
colnames(mydata_1) = c(paste0("X","1":"7"))
colnames(mydata_2) = c(paste0("X","1":"7"))
df1 = as.data.frame(mydata_1)
df2 = as.data.frame(mydata_2)
all_data = c(df1, df2)
class(all_data)
class(df1)
for (i in all_data){
i = select(i,"X3":"X5")
}
My for loop should output the data frames df1 and df2 with just three columns (instead of the original seven), but when I run the code an error message regarding the select command appears.
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "c('integer', 'numeric')"
How can I get a working output of my new data frames?

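The first issue is that you are trying to create a list with c(df1, df2). Because a data frame is itself a list of columns, c() flattens both data frames into a single list of 14 column vectors, which is why select() complains about an object of class integer. Use list(df1, df2) instead.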
Data
library(dplyr)
library(purrr)
mydata_1 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
mydata_2 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
colnames(mydata_1) = c(paste0("X","1":"7"))
colnames(mydata_2) = c(paste0("X","1":"7"))
df1 = as.data.frame(mydata_1)
df2 = as.data.frame(mydata_2)
all_data = list(df1 = df1, df2 = df2)
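To see the difference between c() and list() directly, here is a minimal base R sketch. The two small data frames and the names flat and nested are made up for illustration and are not part of the original question.

```r
# Two toy data frames (illustrative only)
df1 <- data.frame(X1 = 1:3, X2 = 4:6)
df2 <- data.frame(X1 = 7:9, X2 = 10:12)

# c() concatenates the columns of both data frames into one flat list
flat <- c(df1, df2)
class(flat)         # "list"
length(flat)        # 4 (two columns from each data frame)
class(flat[[1]])    # "integer", a bare column vector

# list() keeps each data frame intact as one element
nested <- list(df1 = df1, df2 = df2)
length(nested)      # 2
class(nested[[1]])  # "data.frame"
```

Iterating over flat therefore hands select() a plain integer vector, which matches the error message in the question.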
The second problem is in the loop itself. Assigning to the loop variable i only changes a temporary copy, so nothing is stored. With this approach you have to create an empty list before running the loop and then fill in one element per iteration.
all_data2 <- list()
for (i in seq_along(all_data)) {
all_data2[[i]] <- all_data[[i]] %>% select(X3, X4, X5)
}
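The following base R sketch illustrates why the original loop had no effect: the loop variable is a copy of each element, so only assignment by index changes the list. The toy data and the names d, before and after are assumptions for illustration, and plain bracket indexing stands in for select() so that no packages are needed.

```r
# Toy list of data frames (illustrative only)
all_data <- list(
  df1 = data.frame(X1 = 1:3, X2 = 4:6, X3 = 7:9),
  df2 = data.frame(X1 = 1:2, X2 = 3:4, X3 = 5:6)
)

# Assigning to the loop variable modifies a copy; the list is untouched
for (d in all_data) {
  d <- d[, c("X2", "X3")]
}
before <- ncol(all_data$df1)  # still 3

# Assigning by index writes the result back into the list
for (i in seq_along(all_data)) {
  all_data[[i]] <- all_data[[i]][, c("X2", "X3")]
}
after <- ncol(all_data$df1)   # now 2
```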
Alternatively, try map from purrr, which is part of the tidyverse and leads to cleaner code with the same result.
# Here `.x` stands for each element of the list all_data in turn,
# so the result is a list of two data frames
all_data2 = map(all_data, ~.x %>%
select(X3, X4, X5))

Consider base R's subset with its select argument for contiguous column selection, wrapped in an lapply call. Unlike a for loop, lapply does not require the bookkeeping of assigning each element back into a list:
all_data <- list(df1 = df1, df2 = df2)
all_data_sub <- lapply(all_data, function(df) subset(df, select=X3:X5))

Related

Change name of the last column in df stored in a list

I have a list with 5 data.frames. Now I want to change the name of the last column of each data.frame.
And I don't know exactly how many columns are in the df.
Example-data:
library(tidyverse)
data(mtcars)
df1 <- tail(mtcars)
df2 <- mtcars[1:5, 2:10]
df3 <- mtcars
df4 <- head(mtcars)
list <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4)
Doing it one by one, this would be the command:
colnames(list$df1)[length(list$df1)] <- "rank"
Within a for loop, I would think that the command would then be:
for (i in seq_along(list)) {
colnames(i)[length(i)] <- "rank"
}
But here I get the error:
Error in `colnames<-`(`*tmp*`, value = `*vtmp*`) :
attempt to set 'colnames' on an object with less than two dimensions
Any idea how to solve this problem? Maybe by the map-command?
Here I don't know how to include the index/length(df) to assign the colnames-command to the last column of the dataframe.
Thank you for your help :)
Kathrin
You can use last_col() from dplyr within map:
library(tidyverse)
list <- map(list,~{
.x %>%
rename(rank = last_col())
})
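The loop in the question fails because seq_along() yields integer indices, so colnames(i) is applied to a number rather than to a data frame. A base R sketch of a working loop follows; the list name dfs is an assumption, chosen to avoid masking the list() function.

```r
# Toy list built from mtcars, as in the question (dfs is an illustrative name)
dfs <- list(df1 = head(mtcars), df2 = mtcars[1:5, 2:10])

# Index into the list, then rename the last column of each data frame
for (i in seq_along(dfs)) {
  names(dfs[[i]])[ncol(dfs[[i]])] <- "rank"
}

names(dfs$df2)  # the last name is now "rank"; the others are unchanged
```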

Losing row names during for loop

I have the following code:
supply = vector(length = 64, mode = 'list')
for (i in 1:64) {
supply[[i]] = df3[rownames(df6),]*df6[,i]
}
names(supply) <- sheetnames
Both df3 and df6 have row names, which are used to match the 64 new tables. In these new tables the row names disappear (the column names are still there). How do I get the row names into my results? I need to export the tables to Excel including the row names that are matched in the for loop.
Edit: I tried the following:
supply = vector(length = 64, mode = 'list')
for (i in 1:64) {
supply[[i]] = df3[rownames(df6),]*df6[,i]
row.names(supply[[i]]) = row.names(df6)}
but it does not work
You can try this. It should return exactly what you're looking for, except that the row names are stored in a column.
# get the initial column names
colnames3 <- names(df3)
colnames6 <- names(df6)
# store the row names in a column named "rowname"
df3 <- tibble::rownames_to_column(df3)
df6 <- tibble::rownames_to_column(df6)
# join by rowname
df3 <- dplyr::inner_join(df3, df6, by = "rowname")
# define the columns you need
out <- df3[c("rowname", colnames3)]
# your loop!
supply <- lapply(colnames6, function(col){
out[colnames3] <- df3[colnames3] * df3[,col]
out
})
Without a reproducible example it is difficult to help further.
Note that lapply returns a list, so you don't need to initialize supply beforehand, and you don't need a for loop either.

summary table for dataset in global environment

Is there a way to get information about the data in my global environment into a summary table?
For example, I have a lot of data sets named TXXX in my global environment, like
I would like a table that looks like this
Is it also possible to get the list of variables for each data set programmatically?
It would look like this:
Is there any way I can do that by programming? Thanks.
We can use mget to collect all the objects whose names start with 'T' followed by a three-digit number into a list, then loop over the list to get the number of rows ('Obs') and the number of columns ('Variable'). After adding a column 'Data' that holds the list names, rbind the list elements:
lst1 <- lapply(mget(ls(pattern = "^T\\d{3}$")),
function(x) data.frame(Obs = nrow(x),
Variable = ncol(x)))
out <- do.call(rbind, Map(cbind, Data = names(lst1), lst1))
row.names(out) <- NULL
If we need the column names, we could use cbind.fill from rowr to cbind the column names when the lengths are not the same
lst1 <- lapply(mget(ls(pattern = "^T\\d{3}$")), names)
library(versions)
available.versions('rowr') # check for available versions; rowr is not on CRAN
install.versions('rowr', '1.1.2') # install a specific version
library(rowr) # load the package
do.call(cbind.fill, c(lst1, fill = NA))
Or without installing rowr
mx <- max(lengths(lst1))
do.call(cbind, lapply(lst1, `length<-`, mx))
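The padding trick above relies on `length<-` extending a vector with NA. A small sketch on made-up column names (the contents of lst1 here are hypothetical) shows the shape of the result:

```r
# Hypothetical column names for two data sets of different width
lst1 <- list(T001 = c("a", "b"), T002 = c("x"))

# Pad every vector with NA up to the longest length, then bind as columns
mx <- max(lengths(lst1))
padded <- do.call(cbind, lapply(lst1, `length<-`, mx))

padded  # a 2 x 2 character matrix; the second row of column T002 is NA
```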
Or using tidyverse
library(dplyr)
library(purrr)
mget(ls(pattern = '^T\\d{3}$')) %>%
map_dfr(~ tibble(Obs = nrow(.x), Variable = ncol(.x)), .id = 'Data')
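As a self-contained check of the summary-table idea, the sketch below builds the table from objects stored in a separate environment instead of the global one, so it can be run without touching existing variables. The environment env, the toy data sets and the name summary_tbl are assumptions for illustration.

```r
# Put a few toy objects into a scratch environment (illustrative only)
env <- new.env()
assign("T001", data.frame(a = 1:3, b = 4:6), envir = env)
assign("T002", data.frame(x = 1:5), envir = env)
assign("other", 1, envir = env)  # does not match the pattern

# Collect only the objects named T followed by three digits
objs <- mget(ls(envir = env, pattern = "^T[0-9]{3}$"), envir = env)

# One row per data set: name, number of rows, number of columns
summary_tbl <- data.frame(
  Data     = names(objs),
  Obs      = unname(vapply(objs, nrow, integer(1))),
  Variable = unname(vapply(objs, ncol, integer(1)))
)
summary_tbl
```

For the global environment, dropping the envir arguments should give the same behaviour as the answer's code.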

Changing Class of Column Across Multiple Dataframes

I have a list of 59 data frames that I want to merge together. Unfortunately, because I have scraped many of them, the columns in the data frames have different classes. They all have the column "Name", some in factor form and some in character form. I want to change all of them to character form. I tried the following
dts <- c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","Florida",
"Georgia","Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine",
"Maryland","Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana","Nebraska",
"Nevada","New_Hampshire","New_Jersey","New_Mexico","New_York","North_Carolina","North_Dakota",
"Ohio","Oklahoma","Oregon","Pennsylvania","Rhode_Island","South_Carolina","South_Dakota","Tennessee",
"Texas","Utah","Vermont","Virginia","Washington","West_Virginia","Wisconsin","Wyoming","Federal",
"CCJail","DC","LAJail","NOLA","NYCJail","OCJail","PhilJail","TXJail")
for(i in 1:length(dts)){
dts[i]$Name <- as.character(dts[i]$Name)
}
but it only gave me the error "Error: $ operator is invalid for atomic vectors".
Does anyone know of a good work-around? Thanks in advance for the help!
My ultimate goal is to run
dta <-dplyr::bind_rows(Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,
Georgia,Hawaii,Idaho,Illinois,Indiana,Iowa,Kansas,Kentucky,Louisiana,Maine,
Maryland,Massachusetts,Michigan,Minnesota,Mississippi,Missouri,Montana,Nebraska,
Nevada,New_Hampshire,New_Jersey,New_Mexico,New_York,North_Carolina,North_Dakota,
Ohio,Oklahoma,Oregon,Pennsylvania,Rhode_Island,South_Carolina,South_Dakota,Tennessee,
Texas,Utah,Vermont,Virginia,Washington,West_Virginia,Wisconsin,Wyoming,Federal,CCJail,
DC,LAJail,NOLA,NYCJail,OCJail,PhilJail,TXJail)
But I get the error "Error: Can't combine ..1$Residents.Confirmed and ..2$Residents.Confirmed ." There are a ton of columns in each data frame, and their classes often differ. If anyone has a more elegant solution, I would be open to that instead! Thanks!
We can load the datasets into a list with mget (assuming the objects already exist in the global environment), loop over the list with map, convert the 'Name' column to character in mutate, and row-bind the results, which is what the _dfr suffix of map_dfr does
library(dplyr)
library(purrr)
out <- map_dfr(mget(dts), ~ .x %>%
mutate(Name = as.character(Name)))
If many columns differ in class, it may be better to convert all the columns to a single class, bind, and then let type.convert restore sensible types
out <- map_dfr(mget(dts), ~ .x %>%
mutate(across(everything(), as.character)))
out <- type.convert(out, as.is = TRUE)
If the dplyr version is < 1.0.0, use mutate_all
out <- map_dfr(mget(dts), ~ .x %>%
mutate_all(as.character))
library(dplyr)
d1 <- data.frame(
Name = as.factor(c("name1", "name2")),
Residents.Confirmed = c(0,1)
)
d2 <- data.frame(
Name = c("name3", "name4"),
Residents.Confirmed = c(2,3)
)
dataframes_list <- list(d1, d2)
for (i in seq_along(dataframes_list)) {
dataframes_list[[i]]$Name <- as.character(dataframes_list[[i]]$Name)
}
bind_rows(dataframes_list)
Base R solution:
type.convert(do.call("rbind",
Map(function(x) data.frame(lapply(x, as.character), stringsAsFactors = FALSE), dataframes_list)),
as.is = TRUE)
Data (thanks to #chase171):
d1 <- data.frame(
Name = as.factor(c("name1", "name2")),
Residents.Confirmed = c(0,1)
)
d2 <- data.frame(
Name = c("name3", "name4"),
Residents.Confirmed = c(2,3)
)
dataframes_list <- list(d1, d2)
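The "convert everything to character, bind, then convert back" idea from the answers above can be sketched in base R as follows. The second data frame deliberately stores its numbers as text to mimic scraped data; that detail and the names as_chr and combined are assumptions for illustration.

```r
# Toy data frames with mismatched column classes (illustrative only)
d1 <- data.frame(Name = factor(c("name1", "name2")),
                 Residents.Confirmed = c(0, 1))
d2 <- data.frame(Name = c("name3", "name4"),
                 Residents.Confirmed = c("2", "3"),
                 stringsAsFactors = FALSE)

# Convert every column to character while keeping the data frame structure
as_chr <- lapply(list(d1, d2), function(df) {
  df[] <- lapply(df, as.character)
  df
})

# Bind the rows, then let type.convert pick a suitable class per column
combined <- do.call(rbind, as_chr)
combined <- type.convert(combined, as.is = TRUE)
str(combined)
```

Passing as.is = TRUE keeps text columns as character rather than turning them back into factors.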

R Paste List to Bind

data1 = data.frame("time" = c(1:10))
data2 = data.frame("time" = c(11:20))
data3 = data.frame("time" = c(21:30))
data4 = data.frame("time" = c(31:40))
rbind(data1, data2, data3, data4)
rbind(paste("'","data","'",1:4,sep=","))
I want to bind together a whole bunch of data frames, but instead of spelling out all of them I want to build their names with paste. In my simple example this doesn't work as desired, although it works when I spell out the data frames.
We can use mget on the pasted strings to return the values of the object names in a list and then rbind the elements with do.call
`row.names<-`(do.call(rbind, mget(paste0('data', 1:4))), NULL)
Or use pattern in ls
do.call(rbind, mget(ls(pattern = '^data\\d+$')))
With data.table, it would be rbindlist
library(data.table)
rbindlist(mget(paste0('data', 1:4)))
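The attempt in the question fails because paste returns character strings, and rbind then binds those strings rather than the objects they name; mget is what turns names into objects. A runnable sketch in a scratch environment (env, nms and combined are illustrative names):

```r
# Create data1, data2, data3 in a scratch environment (illustrative only)
env <- new.env()
for (k in 1:3) {
  assign(paste0("data", k),
         data.frame(time = (10 * (k - 1) + 1):(10 * k)),
         envir = env)
}

# Build the names, look the objects up, and bind them in that order
nms <- paste0("data", 1:3)
combined <- do.call(rbind, mget(nms, envir = env))
rownames(combined) <- NULL
nrow(combined)  # 30
```

One design point worth keeping in mind: ls() returns names sorted alphabetically, so with ten or more objects data10 would come before data2, whereas paste0('data', 1:n) keeps numeric order.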
