R Change Columns with prefix from factor to character - r

I have a data frame with factor and character columns. I want to convert only the columns whose names start with the prefix "ID_" from factor to character.
I tried the code below, but it converts the whole data frame to character; I only want to change the columns whose names contain "ID_". I don't know how many "ID_" columns will end up in the data frame (this is part of a larger function that loops over data frames with varying numbers of "ID_" columns).
###Changes the whole dataframe to character rather than only the intended columns
df.loc[] <- lapply(df.loc[, grepl("ID_", colnames(df.loc))], as.character)

The problem is that df.loc[] <- assigns to every column of the data frame, not just the "ID_" ones. Select the same columns on both sides of the assignment instead:
my_cols <- grepl("ID_", colnames(df.loc))
df.loc[my_cols] <- lapply(df.loc[my_cols], as.character)
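Since the question mentions that this runs inside a larger function over several data frames, the same idea can be wrapped in a small helper. This is only a minimal sketch: the function name `ids_to_character`, its `prefix` argument, and the toy data are assumptions made for illustration. It uses base R's `startsWith()`, so only names that literally begin with the prefix are matched.

```r
# Hypothetical helper: convert every column whose name starts with `prefix`
# to character, leaving all other columns untouched.
ids_to_character <- function(df, prefix = "ID_") {
  # startsWith() does a literal prefix match, so no regex is involved
  id_cols <- startsWith(names(df), prefix)
  df[id_cols] <- lapply(df[id_cols], as.character)
  df
}

# Toy data (made up for this example)
df.loc <- data.frame(
  ID_site = factor(c("a", "b", "c")),
  value   = c(1.5, 2.5, 3.5),
  ID_plot = factor(c("x", "y", "z")),
  group   = factor(c("g1", "g2", "g1"))
)

out <- ids_to_character(df.loc)
sapply(out, class)  # inspect the resulting column classes
```

Because the selection is recomputed from the column names on each call, the helper does not need to know in advance how many "ID_" columns a given data frame has.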

Here is a tidyverse solution:
library(dplyr)

# tibble() replaces the deprecated data_frame()
food <- tibble(
  ID_fruits = factor(c("apple", "banana", "cherry")),
  vegetables = factor(c("asparagus", "broccoli", "cabbage")),
  ID_drinks = factor(c("absinthe", "beer", "cassis"))
)
# Convert only the columns whose names start with "ID_"
food %>%
  mutate_at(vars(starts_with("ID_")), as.character)

You can also do this with ifelse:
df[] <- ifelse(grepl("^ID_", colnames(df)), lapply(df, as.character), df)

Related

Using read_csv specifying data types for groups of columns in R

I would like to use read_csv because I am working with a large data set. The column types are being read incorrectly because there are many missing values. The type of each variable (column) can be identified from its name: the name includes "DATE" if it is a date type and "Names" if it is a character type, and the rest of the variables can use the default 'col_guess' type. I do not want to type out all 55 variables, so I tried this code first:
df <- read_csv('df.csv', col_types = cols((grepl("DATE$", colnames(df))==T)=col_date()), cols((grepl("Name$", colnames(df))==T)=col_character()))
I received this message:
Error: unexpected '=' in "df <- read_csv('df.csv', col_types = cols((grepl("DATE$", colnames(df))==T)="
So I tried to write a loop instead, since the df data is already in R (although the values of the wrongly identified variables have been deleted).
for (colname in colnames(df)){
if (grepl("DATE$", colname)==T){
ct1 <- cols(colname=col_date("%d/%m/%Y"))
}else if (grepl("Name$", colname)==T){
ct2 <- cols(colname=col_character())
}else{
ct3 <- cols(colname=col_guess())
tx <- c(ct1, ct2, ct3)
print(tx)
}
}
It does not produce the output I want, and I do not know how I would continue even if I got the loop right.
The data is public; you can download it here (BasicCompanyDataAsOneFile): http://download.companieshouse.gov.uk/en_output.html
Any suggestion would be appreciated, thank you.
Since the data is already in R, you can identify the columns by their names and apply the appropriate conversion function to each group of columns.
df <- readr::read_csv('df.csv')
date_cols <- grep('DATE$', names(df))
char_cols <- grep('Name$', names(df))
# The question's dates use day/month/year, so pass the format explicitly
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")
df[char_cols] <- lapply(df[char_cols], as.character)
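A minimal sketch of the name-based conversion on made-up data. The column names and values below are assumptions standing in for the real CSV; the `%d/%m/%Y` format is taken from the loop in the question. The format is passed explicitly because `as.Date` on a character vector only tries year-first formats by default.

```r
# Toy data standing in for the CSV (column names and values are made up)
df <- data.frame(
  CompanyName = c("Acme Ltd", "Widget plc"),
  INC_DATE    = c("25/12/2015", "01/02/2016"),
  Accounts    = c("10", "20"),
  stringsAsFactors = FALSE
)

# Pick columns by the suffix of their names
date_cols <- grep("DATE$", names(df))
char_cols <- grep("Name$", names(df))

# Day/month/year strings need an explicit format
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")
df[char_cols] <- lapply(df[char_cols], as.character)

str(df)  # inspect the resulting column types
```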
You can also try type.convert, which automatically converts columns to a suitable logical, numeric, or character type, but it does not produce Date columns.
df <- type.convert(df, as.is = TRUE)
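To see what type.convert does and does not change, here is a small sketch on a data frame where every column was read as character. The data frame `raw` and its columns are invented for illustration; `as.is = TRUE` is used so that non-numeric strings stay character rather than becoming factors.

```r
# Everything read in as character (toy data, made up for this example)
raw <- data.frame(
  id       = c("1", "2", "3"),
  score    = c("1.5", "2.5", NA),
  INC_DATE = c("25/12/2015", "01/02/2016", "15/03/2017"),
  stringsAsFactors = FALSE
)

# as.is = TRUE keeps non-numeric strings as character instead of factor
converted <- type.convert(raw, as.is = TRUE)
sapply(converted, class)  # inspect which columns changed type
```

The date strings are left as character, so they still need a separate `as.Date` step with the right format.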
I read the data in using read_csv, with every column as character:
df <- read_csv('DF.csv', col_types = cols(.default="c"))
Then I used the following code to change the columns' data types:
date_cols <- grep('DATE$', names(df))
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")

Converting List of Vectors to Data Frame in R

I'm trying to convert a list of vectors into a data frame with one column for the company names and one column for the MPE. My list is generated by running the following code for each company:
MPE[[2]] <- c("Google", abs(((forecasted - goog[nrow(goog),]$close)
/ goog[nrow(goog),]$close)*100))
Now I'm having trouble turning it into an appropriate data frame for further manipulation. What's the easiest way to do this?
This is an example list of vectors that I would want to manipulate into a dataframe with the company names in one column and the number in the second column.
test <- list(c("Google", 2))
test[[2]] <- c("Microsoft", 3)
test[[3]] <- c("Apple", 4)
You can use unlist with matrix and then turn the result into a data frame. Reducing with rbind could take a long time with a large list, I think.
df <- data.frame(matrix(unlist(test), nrow = length(test), byrow = TRUE))
colnames(df) <- c("Company", "MPE")
I was actually able to achieve what I wanted with the following:
MPE_df <- data.frame(Reduce(rbind ,MPE))
colnames(MPE_df) <- c("Company", "MPE")
MPE_df
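One detail worth noting with either approach: `c("Google", 2)` coerces the number to the string "2", so the MPE column comes out as text rather than numeric. The sketch below uses the example list from the question, stacks the vectors with `do.call(rbind, ...)`, and converts MPE back to numeric. It is one possible variation, not the method used in the answers above.

```r
# Example list from the question
test <- list(c("Google", 2), c("Microsoft", 3), c("Apple", 4))

# Stack the vectors as rows of a character matrix
m <- do.call(rbind, test)

# Build the data frame, converting the second column back to numeric
df <- data.frame(
  Company = m[, 1],
  MPE     = as.numeric(m[, 2]),
  stringsAsFactors = FALSE
)
str(df)  # inspect the resulting column types
```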

Selecting Columns in R using list and names

I found this question, which was helpful for subsetting a DF to its numeric columns only:
Selecting only numeric columns from a data frame
However, I can't figure out how to select the numeric columns PLUS any other columns.
I've tried:
nums <- sapply(df, is.numeric)
df <- df[, c(nums, "charcolumn")]
and:
df <- df[,c(sapply(df, is.numeric), "Pop_Size_Group")]
both of which gave me an "undefined columns selected" error.
I understand that the sapply function gives me a list of TRUE/FALSE. How can I subset my df to include all numeric columns PLUS additional columns I identify?
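Combining the logical vector with a column name in c() coerces everything to character ("TRUE", "FALSE", ..., "Pop_Size_Group"), and no columns have those names. Instead, select the names of the numeric columns and concatenate them with "Pop_Size_Group":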
df <- df[,c(names(df)[sapply(df, is.numeric)], "Pop_Size_Group")]
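A small sketch of this on made-up data. The data frame and all column names except "Pop_Size_Group" are assumptions for illustration. `union()` is used here instead of `c()` as a precaution, so that an extra column that happens to be numeric already is not selected twice.

```r
# Toy data (made up for this example)
df <- data.frame(
  Pop_Size_Group = c("small", "large", "small"),
  population     = c(100, 5000, 250),
  area           = c(1.2, 30.5, 2.1),
  region         = c("N", "S", "N"),
  stringsAsFactors = FALSE
)

# Names of the numeric columns
num_names <- names(df)[sapply(df, is.numeric)]

# union() avoids a duplicated column if an extra name is already numeric
keep <- union(num_names, "Pop_Size_Group")
subset_df <- df[, keep]
names(subset_df)
```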

R convert data.frame to list by column

I would like to convert a data.frame into a list of data.frames by column, using base R functions and holding the first column constant. For example, I would like to split DF into a list of three data.frames, each of which includes the first column. That is, I would like to end up with the list named LONG without having to type out each list element separately. Thank you.
DF <- data.frame(OBS=1:10,HEIGHT=rnorm(10),WEIGHT=rnorm(10),TEMP=rnorm(10))
DF
LONG <- list(HEIGHT = DF[c("OBS", "HEIGHT")],
WEIGHT = DF[c("OBS", "WEIGHT")],
TEMP = DF[c("OBS", "TEMP" )])
LONG
SHORT <- as.list(DF)
SHORT
SPLIT <- split(DF, col(DF))
We can loop through the names of 'DF' except the first one and cbind the first column with the column of 'DF' selected by each name.
setNames(lapply(names(DF)[-1], function(x) cbind(DF[1], DF[x])), names(DF)[-1])
Or another option would be
Map(cbind, split.default(DF[-1], names(DF)[-1]), OBS=DF[1])
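As a variation on the first approach, the columns can be selected by name directly instead of using cbind. This is a sketch only; the names `value_cols` and `LONG2` are introduced here for illustration, and `set.seed` is added just to make the toy data reproducible. The last line compares one element against the hand-built version from the question.

```r
set.seed(1)  # only to make the toy data reproducible
DF <- data.frame(OBS = 1:10, HEIGHT = rnorm(10), WEIGHT = rnorm(10), TEMP = rnorm(10))

# Every column except the first becomes one list element
value_cols <- names(DF)[-1]

# One data frame per value column, each paired with the first column
LONG2 <- setNames(
  lapply(value_cols, function(x) DF[c(names(DF)[1], x)]),
  value_cols
)

# Compare against the hand-written element from the question
identical(LONG2$HEIGHT, DF[c("OBS", "HEIGHT")])
```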

automatic column prefix with cbind and just one column

I have some trouble with a script that uses cbind to add columns to a data frame. I select these columns by regular expression, and I love that cbind automatically adds a prefix if you add more than one column. But this does not work if you append just one column, even if I cast that column as a data frame.
Is there a way to get around this behaviour?
In my example, it works fine for the columns starting with a but not for the b1 column.
df <- data.frame(a1=c(1,2,3),a2=c(3,4,5),b1=c(6,7,8))
cbind(df, log=log(df[grep('^a', names(df))]))
cbind(df, log=log(df[grep('^b', names(df))]))
cbind(df, log=as.data.frame(log(df[grep('^b', names(df))])))
A solution would be to create an intermediate data frame with the log values and rename its columns:
logb = log(df[grep('^b', names(df))])
colnames(logb) = paste0('log.', names(logb))
cbind(df, logb)
What about this?
cbw <- c("a","b") # columns beginning with
cbw_pattern <- paste0("^",cbw, collapse = "|")
cbind(df, log=log(df[grep(cbw_pattern, names(df))]))
This way you select both patterns at once (all three columns).
The column names only fail to get the prefix when just one column is selected.
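To get the same naming whether one or several columns match, the prefix can be applied explicitly rather than relying on cbind. The helper below is a hypothetical sketch (the name `cbind_prefixed` and its arguments are assumptions, not part of any package), built on the intermediate data frame idea above.

```r
# Hypothetical helper: append transformed columns with an explicit prefix,
# so the result is named the same way for one or many matched columns.
cbind_prefixed <- function(df, pattern, fun, prefix) {
  cols <- grep(pattern, names(df), value = TRUE)
  new <- fun(df[cols])                      # df[cols] stays a data frame
  names(new) <- paste0(prefix, ".", cols)   # e.g. "log.b1"
  cbind(df, new)
}

df <- data.frame(a1 = c(1, 2, 3), a2 = c(3, 4, 5), b1 = c(6, 7, 8))

names(cbind_prefixed(df, "^a", log, "log"))  # two matched columns
names(cbind_prefixed(df, "^b", log, "log"))  # a single matched column
```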
