column names on data frames - r

Does anybody know how to avoid the following problem?
dat <- as.data.frame(matrix(1:20, nrow=10))
names(dat) <- c("TEST","eval_12")
dat$eval_1
dat$"eval_1"
dat[,"eval_1"]
As far as I understand, "eval_1" is not a name of the data.frame.
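The behaviour in the question comes from partial matching: `$` on a data frame falls back to a partial match when no exact name exists, whereas `[[` matches exactly by default and `dat[, "eval_1"]` stops with an "undefined columns selected" error. A minimal sketch of the difference, reusing the question's `dat` (the variable names `partial`, `exact` and `in_names` are only for illustration):

```r
dat <- as.data.frame(matrix(1:20, nrow = 10))
names(dat) <- c("TEST", "eval_12")

# `$` partially matches "eval_1" to the only candidate, "eval_12".
partial <- dat$eval_1

# `[[` uses exact matching by default, so a missing name gives NULL.
exact <- dat[["eval_1"]]

# Checking the names directly is the most explicit test.
in_names <- "eval_1" %in% names(dat)
```

Setting `options(warnPartialMatchDollar = TRUE)` makes R warn whenever `$` resolves a name by partial matching, which may help catch such cases early.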

Related

Assign row names and column names to many matrices

I have a problem that I am sure has an easy answer, but I cannot figure it out. I have many matrices of the same format and would like to assign the same column and row names to all of them. I am trying to do this in a loop, by referring to each matrix by name and then assigning the names.
Here is my reproducible example.
mnames <- letters[1:10] # The names to be assigned
mat1 <- matrix(rnorm(100),10,10)
mat2 <- matrix(rnorm(100),10,10)
mat3 <- matrix(rnorm(100),10,10)
obs <- c("mat1", "mat2", "mat3")
for(i in obs){
rownames(as.name(i)) <- mnames
colnames(as.name(i)) <- mnames
}
It seems like the loop does not call the object, but I do not understand why. I would be grateful for any help: I have tons of matrices, and assigning the names one by one would be tedious. Thanks!
You can fetch each matrix inside a for loop, but I think it would be better to collect them in a list with mget, change the dimnames and then, if needed, assign them back to the global environment.
list_mat <- lapply(mget(obs), function(x) {dimnames(x) <- list(mnames, mnames);x})
list2env(list_mat, .GlobalEnv)
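For comparison, the loop from the question can be made to work by fetching each object with `get` and writing it back with `assign`, since `as.name(i)` only creates a symbol and is not a valid assignment target. A minimal sketch, assuming the three matrices live in the environment where the loop runs (the loop variable `nm` and the temporary `m` are illustrative names):

```r
mnames <- letters[1:10]  # the names to be assigned
mat1 <- matrix(rnorm(100), 10, 10)
mat2 <- matrix(rnorm(100), 10, 10)
mat3 <- matrix(rnorm(100), 10, 10)
obs <- c("mat1", "mat2", "mat3")

for (nm in obs) {
  m <- get(nm)                         # look the matrix up by its name
  dimnames(m) <- list(mnames, mnames)  # set row and column names together
  assign(nm, m)                        # store the modified copy under the same name
}
```

Keeping the matrices in a list, as in the answer above, usually avoids the need for `get` and `assign` altogether.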

Using read_csv specifying data types for groups of columns in R

I would like to use read_csv because I am working with a large data set. The variable types are read incorrectly because I have many missing values. The type of each variable (column) can be identified from its name: the name ends in "DATE" for a date, in "Name" for a character column, and the rest of the variables can keep the default 'col_guess' type. I do not want to type out all 55 variables, so I tried this code first:
df <- read_csv('df.csv', col_types = cols((grepl("DATE$", colnames(df))==T)=col_date()), cols((grepl("Name$", colnames(df))==T)=col_character()))
I received this message:
Error: unexpected '=' in "df <- read_csv('df.csv', col_types = cols((grepl("DATE$", colnames(df))==T)="
So I tried to write a loop instead, since the df data is already in R (although the values of the wrongly identified variables have been deleted).
for (colname in colnames(df)){
if (grepl("DATE$", colname)==T){
ct1 <- cols(colname=col_date("%d/%m/%Y"))
}else if (grepl("Name$", colname)==T){
ct2 <- cols(colname=col_character())
}else{
ct3 <- cols(colname=col_guess())
tx <- c(ct1, ct2, ct3)
print(tx)
}
}
It does not produce the output I want, and I do not know how I would continue even if I got the loop right.
The data is public; you can download it here (BasicCompanyDataAsOneFile): http://download.companieshouse.gov.uk/en_output.html
Any suggestion would be appreciated, thank you.
Since the data is already read into R, you can identify the columns by their names and apply the conversion function to the respective columns.
df <- readr::read_csv('df.csv')
date_cols <- grep('DATE$', names(df))
char_cols <- grep('Name$', names(df))
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y") # date format taken from the question
df[char_cols] <- lapply(df[char_cols], as.character)
You can also try type.convert, which automatically converts each column to its most suitable type, but it does not handle date columns.
df <- type.convert(df, as.is = TRUE) # as.is = TRUE keeps strings as character
I read the data in using read_csv, with every column as character:
df <- read_csv('DF.csv', col_types = cols(.default="c"))
Then I used the following code to change the data types of the date columns:
date_cols <- grep('DATE$', names(df))
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")
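The original goal of setting the types at read time can also be approached by reading only the header first and building the column specification from the names. This is a sketch under assumptions, not a tested solution for the real file: the temporary CSV, its column names and the "%d/%m/%Y" format are stand-ins, and `types` and `spec` are illustrative names.

```r
library(readr)

# Hypothetical sample file standing in for the real CSV.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "CompanyName,IncorporationDATE,Employees",
  "Acme Ltd,01/02/2003,12",
  "Widget plc,,7"
), path)

# Read zero data rows just to obtain the column names.
col_names <- names(read_csv(path, n_max = 0,
                            col_types = cols(.default = col_character())))

# One collector per column whose name matches a pattern.
date_names <- grep("DATE$", col_names, value = TRUE)
char_names <- grep("Name$", col_names, value = TRUE)
types <- c(
  setNames(rep(list(col_date("%d/%m/%Y")), length(date_names)), date_names),
  setNames(rep(list(col_character()), length(char_names)), char_names)
)

# Every other column falls back to guessing.
spec <- do.call(cols, c(types, list(.default = col_guess())))
df <- read_csv(path, col_types = spec)
```

Building the specification with `do.call` avoids writing out each of the 55 columns by hand, which is what the `cols(... = ...)` attempt in the question was aiming for.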

Can't reorder data frame columns by matching column names given in another column

I'm trying to re-order the variables of my data frame using the contents of a variable in another data frame but it's not working and I don't know why.
Any help would be appreciated!
# Starting point
df_main <- data.frame(coat=c(1:5),hanger=c(1:5),book=c(1:5),
bottle=c(1:5),wall=c(1:5))
df_order <- data.frame(order_var=c("wall","book","hanger","coat","bottle"),
number_var=c(1:5))
# Goal
df_goal <- data.frame(wall=c(1:5),book=c(1:5),hanger=c(1:5),
coat=c(1:5),bottle=c(1:5))
# Attempt
df_attempt <- df_main[df_order$order_var]
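In your df_order, put stringsAsFactors = FALSE in the data.frame call (this is already the default from R 4.0.0 onward).
The issue is that the order column is a factor, so its integer codes are used for indexing rather than its labels. If you convert it to character, it will work: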
df_goal <- df_main[as.character(df_order$order_var)]
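To see why the factor version misbehaves, it may help to look at the integer codes behind the factor. The sketch below builds the factor explicitly (as `data.frame` did by default before R 4.0.0); `ord` is an illustrative name.

```r
df_main <- data.frame(coat = 1:5, hanger = 1:5, book = 1:5,
                      bottle = 1:5, wall = 1:5)

# Factor levels are sorted alphabetically: book, bottle, coat, hanger, wall.
ord <- factor(c("wall", "book", "hanger", "coat", "bottle"))

# Indexing with a factor uses its integer codes (5, 1, 4, 3, 2) as positions.
by_factor <- names(df_main[ord])

# Indexing with the labels selects the columns by name.
by_label <- names(df_main[as.character(ord)])
```

Here `by_factor` picks columns by position, so the order differs from the intended one held in `by_label`.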

R keep only data frames in a list of list

I'm a beginner with R, so I'm not very good at searching for relevant answers to my question. I am sorry if similar questions have been asked.
I have a list made of data frames and lists.
I'd like to know how to keep only the data frames so that I can bind them together to produce one huge data frame.
Here is an example:
L1 <- list(c(1, "abc", 3))
L2 <- list(c("b","d"))
L3 <- list(L1,L2)
brand <- c("A","B","C","D")
price <- c(1,1,3,7)
df <- data.frame(brand , price)
brand2 <- c("E","F","G","H")
price2 <- c(20,3,5,10)
df2 <- data.frame(brand2, price2)
L4 <- list(df, L3, df2)
library(plyr) # provides rbind.fill
finaldf <- do.call("rbind.fill", L4)
Unfortunately I got this error: Error: All inputs to rbind.fill must be data.frames
So I know that the problem is that there is a list inside the list L4. In my real data, there are even several lists in the big list. Can anyone tell me how to get rid of these lists inside the big list? Thank you very much!
You need to filter out which list entries are not data.frames like so:
is_df <- sapply(L4, is.data.frame)
finaldf <- do.call("rbind.fill", L4[is_df])
Alternatively,
do.call("rbind.fill", Filter(is.data.frame, L4))
You can create an index to subset your list like so:
# Subset list
index <- sapply(L4, is.data.frame)
and then use it to make your final data.frame like so:
finaldf <- do.call("rbind", L4[index])
Keep in mind that for rbind to work, both data frames must have the same column names, so when you create df2 you should specify the column names like so:
df2 <- data.frame(brand = brand2, price = price2)
... before you do any of the above.
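Putting the pieces together, a base-R-only version (no plyr) might look like the sketch below. It assumes the data frames share column names, as noted above; the shortened example data is illustrative.

```r
# Two data frames with matching column names, plus a nested list to discard.
df  <- data.frame(brand = c("A", "B"), price = c(1, 3))
df2 <- data.frame(brand = c("E", "F"), price = c(20, 3))
L3  <- list(list(c(1, "abc", 3)), list(c("b", "d")))
L4  <- list(df, L3, df2)

# Keep only the elements that are data frames, then stack them row-wise.
only_dfs <- Filter(is.data.frame, L4)
finaldf  <- do.call(rbind, only_dfs)
```

If the column names differ between data frames, `rbind` will stop with an error, which is the case `rbind.fill` from plyr is designed to handle.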

R move named column to the end of a data frame

I'm trying to move a column to the end of a data frame and I'm struggling.
output_index <- grep(output, names(df))
df <- cbind(df[,-output_index], df[,output_index])
This orders the data properly, but it converts the data to a matrix, which doesn't work. How can I do this without losing the column names, while keeping the data as a data frame?
You don't need the comma in front of the index; df[output_index] keeps the result a data frame with its column name:
output_index <- grep(output, names(df))
df <- cbind(df[-output_index], df[output_index])
df <- data.frame(id=1:10, output=rnorm(10,1,1), input=rnorm(10,1,1))
output_index <- grep("output", names(df))
res.df <- cbind(df[,-output_index], df[,output_index])
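An alternative that avoids `cbind` and `grep` is to reorder by name. Note that `grep` matches a pattern, so it could pick up several columns whose names merely contain "output"; the sketch below assumes the exact column name is known (`target` and `res_df` are illustrative names).

```r
df <- data.frame(id = 1:10, output = rnorm(10, 1, 1), input = rnorm(10, 1, 1))

target <- "output"

# All other names first, then the target; single-bracket indexing
# keeps the result a data frame with its column names intact.
res_df <- df[c(setdiff(names(df), target), target)]
```

Because the columns are selected by exact name, the result keeps its class and names regardless of how many columns are involved.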
