This question already has answers here:
data.frame without ruining column names
(2 answers)
Closed 9 years ago.
I have a CSV file containing some columns with names like "beauty & spas", "american (new)" etc. When I read this file in R and use names() to see column names, they have been converted to "beauty...spas.1" and "american...new..1". How do I prevent them from being converted? I do not want to correct them manually.
If you read the documentation at ?read.table (or ?read.csv), you will see that there is an argument called check.names. You most likely want to set it to FALSE. Keep in mind, though, that those are not syntactically valid column names in R, so you might actually prefer to change them to something that R will handle more smoothly anyway.
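A minimal sketch of the check.names advice above. The temporary file and its contents are made up for illustration; only the header names come from the question.

```r
# Hypothetical sample file whose header mimics the names in the question.
path <- tempfile(fileext = ".csv")
writeLines(c("beauty & spas,american (new)", "1,2"), path)

# Default behaviour: the header is passed through make.names().
mangled <- read.csv(path)
print(names(mangled))

# check.names = FALSE keeps the header exactly as written in the file.
kept <- read.csv(path, check.names = FALSE)
print(names(kept))

# Non-syntactic names must be quoted with backticks (or [[ ]]) when used.
print(kept$`beauty & spas`)
print(kept[["american (new)"]])
```

The backtick quoting in the last lines is the price of keeping the original names, which is why renaming the columns may still be the more comfortable option.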
This question already has answers here:
How can you read a CSV file in R with different number of columns
(5 answers)
Read a text file with variable number of columns to a list
(3 answers)
Closed 5 years ago.
I have a large comma-delimited file that looks something like this:
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,1800,25
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,2000,24.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,2200,24.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,000,24
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,200,23.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,400,23.5,97
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,600,23.5,98.5
As you can see, the rows vary (the bottom two have an extra column) and not all columns contain values. The data display correctly in Excel, but when I attempt to open the file in RStudio with:
my_trap_dat = read.csv("path_to_file/la_selva_log.csv", header = FALSE)
It does not contain all of the data: the last column is left out, so I have 7 columns instead of the 8 needed to hold all the data. The values in the last column seem to be simply dropped when the file is loaded into R.
I found this:
The number of data columns is determined by looking at the first five
lines of input (or the whole input if it has less than five lines), or
from the length of col.names if it is specified and is longer.
But I'm not sure how to implement any change that fixes my issue.
How can I make it so that all of my data is maintained in R?
This question is answered already on StackOverflow:
How can you read a CSV file in R with different number of columns
Read a text file with variable number of columns to a list
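I'm sure you will find more on Stack Overflow using the search.
Quick example that reads each line into a list of character vectors (needed because the rows of your exported CSV do not all have the same number of fields):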
my_file = file("path_to_file/la_selva_log.csv")
my_data = strsplit(readLines(my_file), ",")
close(my_file)
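If a rectangular data frame is preferred over a list, one possible sketch follows from the documentation passage quoted in the question: count the fields on every line first, then pass that many col.names so read.csv does not rely on the first five lines. The sample rows are copied from the question; the temporary file and the V1..Vn column names are placeholders.

```r
# Sample rows from the question: five 7-field rows, then two 8-field rows.
prefix <- "LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,"
rows <- paste0(prefix, c(
  "5/30/2015,1800,25",
  "5/30/2015,2000,24.5",
  "5/30/2015,2200,24.5",
  "5/31/2015,000,24",
  "5/31/2015,200,23.5",
  "5/31/2015,400,23.5,97",
  "5/31/2015,600,23.5,98.5"
))
path <- tempfile(fileext = ".csv")
writeLines(rows, path)

# Widest row in the whole file, not just in the first five lines.
n_cols <- max(count.fields(path, sep = ","))

# Supplying n_cols names fixes the column count; short rows are padded.
dat <- read.csv(path, header = FALSE, fill = TRUE,
                col.names = paste0("V", seq_len(n_cols)))
print(dim(dat))
print(dat$V8)
```

Rows with fewer fields get NA in the trailing column, so nothing from the longer rows is lost.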
This question already has answers here:
How to read data when some numbers contain commas as thousand separator?
(11 answers)
Closed 7 years ago.
Today I downloaded a dataset in CSV format from the Eurostat website. I loaded it into RStudio with read.csv and subset it to get the data I need. I now have 12 observations of around 9 variables. One of the variables is the value I am interested in, but it is stored as a factor variable (with 754 levels).
This would be easy to fix with as.numeric, but the numbers are formatted like "48,478", so R doesn't see a single number (just my guess), and when I use as.numeric I don't get 48478 but some different number, maybe a mean or something else, but definitely not 48478. After a few minutes I realized that the problem is probably the "," and started looking for a way to remove it.
One solution I found is to use the edit command and delete it manually, but I am planning to take more subsets from the original dataset, and I hope I don't have to run edit and manually delete the offending symbol every time I create a new dataset.
You can read the data in and then strip the "," before converting the strings to numeric.
First, read the dataset with stringsAsFactors=FALSE so the column stays character instead of becoming a factor:
raw <- read.csv("a.csv", stringsAsFactors=FALSE)
Then convert the strings to numeric (the same logic as deleting the "," in an editor):
raw$number <- as.numeric(gsub(",", "", raw$numberAsString)) # strip the "," from numberAsString, then convert to numeric
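For reference, a self-contained sketch of the same steps. The file contents and the column names id, value and value_num are invented for illustration. It also shows why the original attempt returned odd numbers: as.numeric on a factor returns the internal integer level codes, not the numbers written in the labels.

```r
# Hypothetical file with quoted numbers that use "," as thousands separator.
path <- tempfile(fileext = ".csv")
writeLines(c('id,value', '1,"48,478"', '2,"1,234,567"', '3,"512"'), path)

# Keep the column as character rather than factor.
raw <- read.csv(path, stringsAsFactors = FALSE)

# Remove every "," and convert the cleaned strings to numbers.
raw$value_num <- as.numeric(gsub(",", "", raw$value, fixed = TRUE))
print(raw$value_num)

# The pitfall from the question: converting a factor yields level codes.
as_factor <- factor(raw$value)
print(as.numeric(as_factor))
```

If a column has already been read as a factor, converting it with as.character first and then applying the same gsub call avoids re-reading the file.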
Closed 10 years ago.
Possible Duplicate:
Only read limited number of columns in R
I have a CSV file that is quite large, so I only want to read the data that is relevant into R. The file is 4 columns wide and several million rows long, but the first column is unnecessary (it is the same string repeated on every row).
Is there a way to read only the 2nd to 4th columns of the CSV file? It's easy enough to remove the first column after reading the file in, but I was wondering whether there is a more efficient way of doing this.
To expand on Joshua's comment:
data <- read.csv("data.csv",colClasses=c("NULL",NA,NA,NA))
"NULL" (note the quotes!) means skip the column, NA means that R chooses the appropriate data type for that column.
Closed 11 years ago.
Possible Duplicate:
Only read limited number of columns in R
I have a text data file with a million observations and 150 variables (v1 to v150) delimited by semicolons. I need only a selected handful of variables. Is there any way to read in only the variables I need? I am using read.table("filepath/filename.txt", sep=";", header=T). Is there any other way than read.table() with which this can be done?
See help(read.table) and particularly the colClasses argument. Simply set the entries for the columns you want to ignore to "NULL" (the quoted string, not the NULL object) and they will be skipped.
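With 150 columns, typing the colClasses vector by hand is tedious. One possible sketch, assuming the file has a header row, is to read just the header line and build the vector from the names you want. The small sample file and the wanted vector are made up for illustration.

```r
# Hypothetical semicolon-delimited file standing in for the large one.
path <- tempfile(fileext = ".txt")
writeLines(c("v1;v2;v3;v4;v5", "1;2;3;4;5", "6;7;8;9;10"), path)

# Read only the header line to learn the column names.
header <- strsplit(readLines(path, n = 1), ";", fixed = TRUE)[[1]]

# Variables to keep (illustrative choice).
wanted <- c("v2", "v5")

# "NULL" skips a column; NA lets read.table pick the type.
classes <- rep("NULL", length(header))
classes[header %in% wanted] <- NA

dat <- read.table(path, sep = ";", header = TRUE, colClasses = classes)
print(names(dat))
print(dat)
```

Skipped columns are never stored in the result, so the data frame contains only the wanted variables.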