as.numeric and as.character across different columns - r

I am trying to set some variables as character and others as numeric. What I currently have is:
colschar <- c(1:2, 68:72)
colsnum <- c(3:67)
subset <- as.data.frame(lapply(data[, colschar], as.character), (data[, colsnum], as.numeric))
which returns an error.
I am trying to set columns 1:2 and 68:72 as a character and columns 3:67 all as numeric.

I suggest:
data[colschar] <- lapply(data[colschar], as.character)
data[colsnum] <- lapply(data[colsnum], as.numeric)
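As a quick check of the idea, here is a minimal sketch on a made-up four-column data frame. The column names and values are invented for illustration; only the indexing pattern comes from the answer above.

```r
# Hypothetical toy data: every column starts out with the "wrong" type
data <- data.frame(
  id  = 1:3,
  x   = c("1.5", "2", "3"),
  y   = c("10", "20", "30"),
  grp = c(7, 8, 9),
  stringsAsFactors = FALSE
)

colschar <- c(1, 4)  # columns to store as character
colsnum  <- 2:3      # columns to store as numeric

# Assigning into data[cols] replaces only those columns and keeps the rest
data[colschar] <- lapply(data[colschar], as.character)
data[colsnum]  <- lapply(data[colsnum], as.numeric)

col_classes <- sapply(data, class)
print(col_classes)
```

Indexing with a single bracket (data[cols]) rather than data[, cols] returns a data frame even when only one column is selected, so lapply always sees a list of columns.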

It would be better if you shared an extract of your data. In any case, you may try a tidyverse approach:
library(dplyr)
mydf_molt <- mydf %>%
  mutate_at(.vars = c(1:2, 68:72), .funs = as.character) %>%
  mutate_at(.vars = c(3:67), .funs = as.numeric)

Related

How do I convert all numeric columns to character type in my dataframe?

I would like to do something more efficient than
dataframe$col <- as.character(dataframe$col)
since I have many numeric columns.
In base R, we may use either of the following. The first loops over all the columns and uses an if/else condition to convert only the numeric ones:
dataframe[] <- lapply(dataframe, function(x) if (is.numeric(x)) as.character(x) else x)
Or create an index for numeric columns and loop only on those columns and assign
i1 <- sapply(dataframe, is.numeric)
dataframe[i1] <- lapply(dataframe[i1], as.character)
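A minimal sketch of the index-based variant on invented data (the column names a, b, c are assumptions for illustration). Note that is.numeric is TRUE for both integer and double columns, so both get converted:

```r
# Hypothetical data: one integer, one double, one character column
dataframe <- data.frame(
  a = 1:3,
  b = c(0.5, 1.5, 2.5),
  c = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Logical index: TRUE for the columns that are numeric
i1 <- sapply(dataframe, is.numeric)
print(i1)

# Convert only the flagged columns; column c is left untouched
dataframe[i1] <- lapply(dataframe[i1], as.character)
str(dataframe)
```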
It may be more flexible in dplyr
library(dplyr)
dataframe <- dataframe %>%
  mutate(across(where(is.numeric), as.character))
All said by master akrun! Here is a data.table alternative. Note it converts all columns to character class:
library(data.table)
data.table::setDT(df)
df[, (colnames(df)) := lapply(.SD, as.character), .SDcols = colnames(df)]

Remove a subset of records from a dataframe in r

We can combine two data frames using df = rbind(df, another_df). How should the reverse be done, i.e. removing another_df from df, when the row names of df and another_df do not match?
df = data.frame(A=c('a','aa','aaa'), B=c('b','bb','bbb'))
rownames(df)
another_df =data.frame(A=c('aa','a'), B=c('bb','b'))
rownames(another_df)=c('3','4')
We can use anti_join
library(dplyr)
anti_join(df, another_df)
Or if this is based on the rownames, then %in% can be used for creating a logical index to subset the rows
df[!row.names(df) %in% row.names(another_df),]
You can do this without using any package very easily with setdiff.
df = data.frame(A=c('a','aa','aaa'), B=c('b','bb','bbb'))
another_df =data.frame(A=c('aa','a'), B=c('bb','b'))
s <- df[setdiff(rownames(df),rownames(another_df)),]
s is the output you want.
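The row-name approaches and the value-based approach can give different answers on the question's data, so it may help to run them side by side. The sketch below uses base R only; the paste-based key is a stand-in chosen here for matching on column values and is not taken from the answers.

```r
df <- data.frame(A = c("a", "aa", "aaa"), B = c("b", "bb", "bbb"))
another_df <- data.frame(A = c("aa", "a"), B = c("bb", "b"))
rownames(another_df) <- c("3", "4")

# Row-name based: drop rows of df whose row name also appears in another_df
by_in      <- df[!row.names(df) %in% row.names(another_df), ]
by_setdiff <- df[setdiff(rownames(df), rownames(another_df)), ]

# Value based (stand-in for a join on A and B): build a key from both columns
key_df      <- paste(df$A, df$B)
key_another <- paste(another_df$A, another_df$B)
by_value    <- df[!key_df %in% key_another, ]

print(by_in)     # rows "1" and "2": only row name "3" is shared
print(by_value)  # row "3": the values of rows 1 and 2 occur in another_df
```

Which result is wanted depends on whether "matching" means matching row names or matching contents.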

Using dplyr, Remove all strings from a data frame

I have a data frame with 300 columns that has a string variable somewhere, which I am trying to remove. I found this solution on Stack Overflow using lapply (see below), which does what I want, but I would like to do it with the dplyr package. I have tried the mutate_each function but can't seem to make it work.
"If your data frame (df) is really all integers except for NAs and garbage, then the following converts it.
df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))
You'll have a warning about NAs introduced by coercion, but that's just all those non-numeric character strings turning into NAs."
dplyr 0.5 now includes a select_if() function.
For example:
person <- c("jim", "john", "harry")
df <- data.frame(matrix(c(1:9,NA,11,12), nrow=3), person)
library(dplyr)
df %>% select_if(is.numeric)
# X1 X2 X3 X4
#1 1 4 7 NA
#2 2 5 8 11
#3 3 6 9 12
Of course you could add further conditions if necessary.
If you want to use this line of code:
df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))
with dplyr (by which I assume you mean "using pipes") the easiest would be
df2 = df %>% lapply(function(x) as.numeric(as.character(x))) %>%
as.data.frame
To "translate" this into the mutate_each idiom:
mutate_each(df, funs(as.numeric(as.character(.))))
This function will, of course, convert all columns to character, then to numeric. To improve efficiency, don't bother doing two conversions on columns that are already numeric:
mutate_each(df, funs({
if (is.numeric(.)) return(.)
as.numeric(as.character(.))
}))
Data for testing:
df = data.frame(v1 = 1:10, v2 = factor(11:20))
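The reason for going through as.character first is easiest to see on the test data: calling as.numeric directly on a factor returns its internal integer codes, not the labels. The sketch below is base R only, and the helper name to_num is made up for illustration.

```r
df <- data.frame(v1 = 1:10, v2 = factor(11:20))

# Direct conversion of a factor gives the level codes 1..10
codes <- as.numeric(df$v2)

# Going through character gives the labels 11..20
labels <- as.numeric(as.character(df$v2))

# Hypothetical helper: leave numeric columns alone, convert everything else
to_num <- function(x) if (is.numeric(x)) x else as.numeric(as.character(x))

df2 <- df
df2[] <- lapply(df, to_num)
str(df2)
```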
mutate_all works here; simply wrap the gsub in a function. (I also assume you aren't necessarily string hunting so much as trawling for non-integers.)
StrScrub <- function(x) {
as.integer(gsub("^\\D+$",NA, x))
}
ScrubbedDF <- mutate_all(data, funs(StrScrub))
Example dataframe:
library(dplyr)
options(stringsAsFactors = F)
data = data.frame("A" = c(2:5),"B" = c(5,"gr",3:2), "C" = c("h", 9, "j", "1"))
with reference/help from Tony Ladson
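To see what StrScrub does to the example data frame, here is a sketch that applies it with base R's lapply in place of mutate_all (that substitution is made here only so the snippet runs without dplyr). It relies on gsub returning NA for matched elements when the replacement is NA.

```r
StrScrub <- function(x) {
  # Strings made up entirely of non-digits become NA; the rest are parsed as integers
  as.integer(gsub("^\\D+$", NA, x))
}

data <- data.frame(
  A = c(2:5),
  B = c(5, "gr", 3:2),
  C = c("h", 9, "j", "1"),
  stringsAsFactors = FALSE
)

scrubbed <- data
scrubbed[] <- lapply(data, StrScrub)
print(scrubbed)
```

The pattern only catches values with no digits at all; a mixed value such as "a1" would not match and would instead be turned into NA by as.integer, with a coercion warning.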

R data frame columns from vector

I must be missing something obvious here but for this:
> range(data$timestamp)
[1] "2015-06-29 09:32:43.000 UTC" "2015-07-03 15:50:35.986 UTC"
I want to do something like:
df <- data.frame(as.Date(range(data$timestamp)))
names(df) <- c('from', 'to')
and get a data frame with columns 'from' and 'to' without needing an extra variable only to index. Written as above data.frame converts the vector to two rows in a single-column data frame. I've tried various combinations of cbind, matrix, t, list and attempts at destructuring. What is the best way to do this?
df <- as.data.frame(as.list(as.Date(range(data$timestamp))))
names(df) <- c('from', 'to')
This will work. data.frames are really just special lists after all.
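A self-contained sketch of the list route, using the two timestamps from the question as character strings (an assumption, since the class of data$timestamp is not shown). It is a slight variant of the code above in that the list is named before conversion, so the columns get their names in one step.

```r
timestamp <- c("2015-06-29 09:32:43.000 UTC", "2015-07-03 15:50:35.986 UTC")

# as.Date reads the leading yyyy-mm-dd part of each string
d <- as.Date(range(timestamp))

# Each list element becomes one column, so two elements give one row and two columns
df <- as.data.frame(setNames(as.list(d), c("from", "to")))
print(df)
str(df)
```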
If you wanted a one-liner, you could use setNames. I've also found this type of thing much more readable now using magrittr:
data$timestamp %>% range %>% as.Date %>% as.list %>% as.data.frame %>% setNames(c("from", "to"))
Alternatively, you could cast via a matrix:
df <- as.data.frame(matrix(as.Date(range(data$timestamp)), ncol = 2))
names(df) <- c('from', 'to')
This will, however, strip the class (and other attributes) from the dates. If you instead set the dimensions of the vector using dim<-, then neither print nor as.data.frame will treat it as a matrix (because it still has the class Date).
To get round this, convert to Date after creating the data.frame:
df <- as.data.frame(matrix(range(data$timestamp), ncol = 2))
df[] <- lapply(df, as.Date)
names(df) <- c('from', 'to')
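The class stripping is easy to confirm. The sketch below starts from Date values directly (unlike the snippet above, which passes the raw range into matrix) and restores the class afterwards; the explicit origin argument is included on the assumption that older R versions require it when converting numbers to dates.

```r
d <- as.Date(c("2015-06-29", "2015-07-03"))

# matrix() drops attributes, so the result is plain numeric (days since 1970-01-01)
m <- matrix(d, ncol = 2)
class_lost <- is.numeric(m) && !inherits(m, "Date")
print(class_lost)

# Rebuild the Date class column by column after creating the data frame
df <- as.data.frame(m)
df[] <- lapply(df, as.Date, origin = "1970-01-01")
names(df) <- c("from", "to")
print(df)
```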
You can try :
range_timestamp <- c("2015-06-29 09:32:43.000 UTC", "2015-07-03 15:50:35.986 UTC")
df <- data.frame(from=as.Date(range_timestamp[1]), to=as.Date(range_timestamp)[2])
df
# from to
#1 2015-06-29 2015-07-03
Another option, using data.table and avoiding indexing:
require(data.table)
df <- `colnames<-`(data.frame(rbind(range_timestamp)), c("from","to"))
df <- setDT(df)[, lapply(.SD, as.Date)]
df
from to
1: 2015-06-29 2015-07-03
Or, as mentioned by @akrun in the comments:
require(data.table)
df <- setnames(setDT(as.list(as.Date(range_timestamp))), c('from', 'to'))[]
I was a few seconds too late with my suggestion. As I see, others have already answered. Anyway: here is an alternative that is similar to what you have attempted:
timestamp <-c("2015-06-29 09:32:43.000 UTC","2015-07-03 15:50:35.986 UTC")
df <- t(data.frame(as.Date(range(timestamp))))
colnames(df) <- c('from', 'to')
rownames(df) <- NULL
#> df
# from to
#[1,] "2015-06-29" "2015-07-03"

Transpose a data frame

I need to transpose a large data frame and so I used:
df.aree <- t(df.aree)
df.aree <- as.data.frame(df.aree)
This is what I obtain:
df.aree[c(1:5),c(1:5)]
10428 10760 12148 11865
name M231T3 M961T5 M960T6 M231T19
GS04.A 5.847557e+03 0.000000e+00 3.165891e+04 2.119232e+04
GS16.A 5.248690e+04 4.047780e+03 3.763850e+04 1.187454e+04
GS20.A 5.370910e+03 9.518396e+03 3.552036e+04 1.497956e+04
GS40.A 3.640794e+03 1.084391e+04 4.651735e+04 4.120606e+04
My problem is the new column names (10428, 10760, 12148, 11865), which I need to eliminate because I want to use the first row as the column names.
I tried the col.names() function but I haven't obtained what I need.
Do you have any suggestions?
EDIT
Thanks for your suggestion!!! Using it I obtain:
df.aree[c(1:5),c(1:5)]
M231T3 M961T5 M960T6 M231T19
GS04.A 5.847557e+03 0.000000e+00 3.165891e+04 2.119232e+04
GS16.A 5.248690e+04 4.047780e+03 3.763850e+04 1.187454e+04
GS20.A 5.370910e+03 9.518396e+03 3.552036e+04 1.497956e+04
GS40.A 3.640794e+03 1.084391e+04 4.651735e+04 4.120606e+04
GS44.A 1.225938e+04 2.681887e+03 1.154924e+04 4.202394e+04
Now I need to transform the row names (GS..) into a factor column.
You'd better not transpose the data.frame while the name column is in it - all numeric values will then be turned into strings!
Here's a solution that keeps numbers as numbers:
# first remember the names
n <- df.aree$name
# transpose all but the first column (name)
df.aree <- as.data.frame(t(df.aree[,-1]))
colnames(df.aree) <- n
df.aree$myfactor <- factor(row.names(df.aree))
str(df.aree) # Check the column types
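The difference between the two orders of operations shows up on a small stand-in for df.aree. The values below are copied from the first two rows of the printed output in the question, and the rest of the frame is invented.

```r
df.aree <- data.frame(
  name   = c("M231T3", "M961T5"),
  GS04.A = c(5847.557, 0),
  GS16.A = c(52486.90, 4047.78),
  stringsAsFactors = FALSE
)

# Transposing with the name column included: no column stays numeric
bad <- as.data.frame(t(df.aree))
print(sapply(bad, class))

# Transposing only the numeric part keeps numbers as numbers
n <- df.aree$name
good <- as.data.frame(t(df.aree[, -1]))
colnames(good) <- n
good$myfactor <- factor(row.names(good))
str(good)
```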
You can use the transpose function from the data.table library. Simple and fast solution that keeps numeric values as numeric.
library(data.table)
# get data
data("mtcars")
# transpose
t_mtcars <- transpose(mtcars)
# get row and colnames in order
colnames(t_mtcars) <- rownames(mtcars)
rownames(t_mtcars) <- colnames(mtcars)
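# Alternative: transpose the whole data frame, then promote the first row to column names.
# Because the name column is included in the transpose, the values are no longer numeric.
df.aree <- as.data.frame(t(df.aree))
colnames(df.aree) <- df.aree[1, ]
df.aree <- df.aree[-1, ]
df.aree$myfactor <- factor(row.names(df.aree))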
Take advantage of as.matrix:
# keep the first column
names <- df.aree[,1]
# Transpose everything other than the first column
df.aree.T <- as.data.frame(as.matrix(t(df.aree[,-1])))
# Assign first column as the column names of the transposed dataframe
colnames(df.aree.T) <- names
With tidyr, one can transpose a data frame with pivot_longer followed by pivot_wider.
To transpose the widely used mtcars dataset, first turn the row names into a column (the function rownames_to_column creates a new column named "rowname").
library(tidyverse)
mtcars %>%
  rownames_to_column() %>%
  pivot_longer(!rowname, names_to = "col1", values_to = "col2") %>%
  pivot_wider(names_from = "rowname", values_from = "col2")
You can also assign the transposed result to a new name instead of overwriting the original:
df.aree1 <- t(df.aree)
df.aree1 <- as.data.frame(df.aree1)
