numeric fields turning into "char" while using stringsAsFactors = F - r

I am trying to import a few csv files from a specific folder:
setwd("C://Users//XYZ//Test")
filelist = list.files(pattern = ".*.csv")
datalist = lapply(filelist, FUN=read.delim, sep = ',', header=TRUE,
stringsAsFactors = F)
for (i in 1:length(datalist)){
datalist[[i]]<-cbind(datalist[[i]],filelist[i])
}
Data = do.call("rbind", datalist)
After I use the above code, a few columns are of type character despite containing numbers. If I don't use stringsAsFactors = F, the fields are read as factors, which turn into missing values when I use as.numeric(as.character()) later on.
Is there any solution so that I can keep some fields as numeric? The fields that I want to be numeric look like this:
Price.Plan Feature.Charges
$180.00 $6,307.56
$180.00 $5,431.25
Thanks

The $ and , characters are not valid in a number, so even with stringsAsFactors = FALSE, read.delim assigns those columns the character type. To change that, remove the $ and , with gsub, convert to numeric, and assign the result back to the relevant columns:
cols <- c("Price.Plan", "Feature.Charges")
df[cols] <- lapply(df[cols], function(x) as.numeric(gsub("[$,]", "", x)))
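A quick check of the conversion on the sample values from the question (inside the character class [$,], the $ is a literal dollar sign, not an anchor):
x <- c("$180.00", "$6,307.56")
as.numeric(gsub("[$,]", "", x))
#> [1]  180.00 6307.56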

Related

How to assign new data to the variable within variable in R

I have my file names in all.files in the working directory. I want to read these files in a loop and assign each one the name that gsub(".csv", "", all.files) gives for it.
all.files <- c("harvestA.csv", "harvestB.csv", "harvestC.csv", "harvestD.csv",
"seedA.csv", "seedB.csv", "seedC.csv", "seedD.csv")
I tried something like the code below, but it doesn't work. What do I need here?
for(i in 1:length(all.files)){
  assign(gsub(".csv", "", all.files)[i]) <- read.table(
    all.files[i],
    header = TRUE,
    sep = ","
  )
}
You could keep them in a named list, as it is not good practice to clutter the environment with a lot of global variables:
list_df <- lapply(all.files, read.csv)
names(list_df) <- sub("\\.csv", "", all.files)
You can always extract individual dataframes as list_df[["harvestA"]], list_df[["harvestB"]] etc.
If you still need them as separate data frames:
list2env(list_df, .GlobalEnv)
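As an aside, base R's tools::file_path_sans_ext() strips the extension without needing a regex; a small sketch of the same named-list idea:
list_df <- setNames(lapply(all.files, read.csv),
                    tools::file_path_sans_ext(all.files))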
The . is a regex metacharacter that matches any character, so use fixed = TRUE to match a literal dot. Also, in the OP's code there is no need for another assignment operator (<-) with assign: its second argument is the value, which here is the dataset read by read.table.
for(i in 1:length(all.files)){
  assign(sub(".csv", "", all.files, fixed = TRUE)[i],
         read.table(all.files[i], header = TRUE, sep = ","))
}
An option using strsplit
for (i in seq_along(all.files)) {
  assign(x = strsplit(all.files[i], "\\.")[[1]][1],
         value = read.csv(all.files[i]),
         pos = .GlobalEnv)
}

read.table deletes the leading 0 of each row in R

I have a file in which every row is a string of numbers. Example of a row: 0234
Example of this file:
00020
04921
04622
...
When I use read.table it deletes the leading 0s of each row (00020 becomes 20, 04921 -> 4921, ...). I use:
example <- read.table(fileName, sep="\t",check.names=FALSE)
After this, to obtain a vector I use as.vector(unlist(example)).
I have tried different options of read.table but the problem remains.
By default, read.table inspects the column values and sets each column type accordingly. If we want a custom type, specify it with colClasses:
example <- read.table(fileName, sep = "\t", check.names = FALSE,
                      colClasses = "character", stringsAsFactors = FALSE)
When colClasses is not specified, the function uses type.convert to assign the column types automatically based on the values:
read.table  # excerpt from the function body
...
data[[i]] <- if (is.na(colClasses[i]))
    type.convert(data[[i]], as.is = as.is[i], dec = dec,
                 numerals = numerals, na.strings = character(0L))
...
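You can see the stripping happen with type.convert itself; zero-padded strings are converted to integers, and the padding is lost:
type.convert(c("00020", "04921"), as.is = TRUE)
#> [1]   20 4921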
If I understand the issue correctly, you read your data file with read.table, but since you want a vector rather than a data frame, you then unlist the df. And you want to keep the leading zeros.
There is a simpler way of doing the same, use scan.
example <- scan(file = fileName, what = character(), sep = "\t")
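Since what = character() reads every field as a character string, the leading zeros survive; with the sample rows shown above, the result begins:
example
#> [1] "00020" "04921" "04622"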

make csv data import case insensitive

I realize this is a total newbie question (as always in my case), but I'm trying to learn R, and I need to import hundreds of csv files that have the same structure, except that in some the column names are uppercase and in some they are lowercase.
So I have (for now):
flow0300csv <- Sys.glob("./csvfiles/*0300*.csv")
for (fileName in flow0300csv) {
flow0300 <- read.csv(fileName, header=T,sep=";",
colClasses = "character")[,c('CODE','CLASS','NAME')]
}
but I get an error because of the lowercase names. I have tried to apply tolower but I can't make it work. Any tips?
The problem here isn't in reading the CSV files, it's in trying to index using column names that don't actually exist in your "lowercase" data frames.
You can instead use grep() with ignore.case = TRUE to index to the columns you want.
tmp <- read.csv(fileName, header = TRUE, sep = ";",
                colClasses = "character")
ind <- grep(pattern = "code|class|name", x = colnames(tmp),
            ignore.case = TRUE)
tmp[, ind]
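Wrapped into the loop from the question, a sketch that also uppercases the names so the pieces stack cleanly (the do.call(rbind, ...) step is an assumption about how you want to combine the files):
results <- lapply(flow0300csv, function(fileName) {
  tmp <- read.csv(fileName, header = TRUE, sep = ";",
                  colClasses = "character")
  ind <- grep("code|class|name", colnames(tmp), ignore.case = TRUE)
  tmp <- tmp[, ind]
  names(tmp) <- toupper(names(tmp))  # uppercase so rbind sees matching names
  tmp
})
flow0300 <- do.call(rbind, results)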
You may want to look into readr::read_csv2() or even data.table::fread() for better performance.
After reading the .csv file you may want to convert the column names to all uppercase with:
flow0300 <- read.csv(fileName, header = T, sep = ";", colClasses = "character")
colnames(flow0300) <- toupper(colnames(flow0300))
flow0300 <- flow0300[, c("CODE", "CLASS", "NAME")]
EDIT: Extended solution with the input of @xraynaud.

Specifying colClasses in read.table using the class function

Is there a way to use read.table() to read all or part of the file in, use the class function to get the column types, modify the column types, and then re-read the file?
Basically I have columns of zero-padded integers that I'd like to treat as strings. If I let read.table() just do its thing, it of course assumes these are numbers, strips off the leading zeros, and makes the column type integer. The thing is, I have a fair number of columns, so while I could create a character vector specifying each one, I only want to change a couple from R's best guess. What I'd like to do is read the first few lines:
myTable <- read.table("//myFile.txt", sep="\t", quote="\"", header=TRUE, stringsAsFactors=FALSE, nrows = 5)
Then get the column classes:
colTypes <- sapply(myTable, class)
Change a couple of column types i.e.:
colTypes[1] <- "character"
And then re-read the file in using the modified column types:
myTable <- read.table("//myFile.txt", sep="\t", quote="\"", colClasses=colTypes, header=TRUE, stringsAsFactors=FALSE, nrows = 5)
While this seems like a perfectly reasonable thing to do, and colTypes = c("character") works fine, when I actually try it I get:
scan() expected 'an integer', got '"000001"'
class(colTypes) and class(c("character")) both return "character" so what's the problem?
You can use read.table's colClasses argument to specify which columns should be read as character. For example:
txt <- "var1, var2, var3
0001, 0002, 1
0003, 0004, 2"

df <- read.table(text = txt, sep = ",", header = TRUE,
                 colClasses = "character")  ## read all as character
df

df2 <- read.table(text = txt, sep = ",", header = TRUE,
                  colClasses = c("character", "character", "double"))  ## the third column is numeric
df2
[updated...] or, you could set and re-set colClasses with a vector...
df <- read.table(text = txt, sep = ",", header = TRUE)
df
## they're all currently read as integer

## get R's best-guess classes
myColClasses <- sapply(df, class)

## create a vector of column names for the zero-padded variables
zero_padded <- c("var1", "var2")

## if a name is in zero_padded, return "character", else leave it be
myColClasses <- ifelse(names(myColClasses) %in% zero_padded,
                       "character",
                       myColClasses)

## read in with colClasses set to myColClasses
df2 <- read.table(text = txt, sep = ",",
                  colClasses = myColClasses, header = TRUE)
df2

Import csv file with both tab and quotes as separators into R

I have a dataset in csv with separators as displayed below.
NO_CAND";"DS_CARGO";"CD_CARGO";"NR_CAND";"SG_UE";"NR_CNPJ";"NR_CNPJ_1";
CLODOALDO JOSÉ DE RAMOS";"Deputado Estadual";"7";"22111";"PB";"08126218000107";"Encargos financeiros e taxas bancárias";
I am using the function read.csv2 with options
mydataframe <- read.csv2("filename.csv",header = T, sep=";", quote="\\'", dec=",",
stringsAsFactors=F, check.names = F, fileEncoding="latin1")
The code reads in the data, but with all the quotes.
I have tried to delete the quotes using
mydataframe[,] <- apply(mydataframe[,], c(1,2), function(x) {
gsub("\\'", "", x)
})
but it doesn't work.
Any ideas on how I could import the data getting rid of these quotes?
Many thanks.
To delete the quotes, use lapply and gsub as follows:
mydataframe[] <- lapply(mydataframe, function(x) gsub("\"", "", x))
lapply iterates over all columns of the data frame and returns a list; by having mydataframe[] on the LHS of the assignment, you assign the results back into the data frame without losing its attributes (dimensions, names, etc). Also, you don't have any single quotes ' in your data, so searching for them won't achieve anything.
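A tiny reproducible sketch of the pattern (the toy data frame is an assumption standing in for the imported file):
mydataframe <- data.frame(a = c("\"x\"", "\"y\""),
                          b = c("\"1\"", "\"2\""),
                          stringsAsFactors = FALSE)
mydataframe[] <- lapply(mydataframe, function(x) gsub("\"", "", x))
mydataframe
#>   a b
#> 1 x 1
#> 2 y 2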
