This question already has answers here:
How can you read a CSV file in R with different number of columns
Read a text file with variable number of columns to a list
I have a large comma-delimited file that looks something like this:
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,1800,25
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,2000,24.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,2200,24.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,000,24
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,200,23.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,400,23.5,97
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,600,23.5,98.5
As you can see, the data vary (the bottom two rows have an extra column) and not all columns contain values. The data display correctly in Excel, but when I attempt to open the file in RStudio with:
my_trap_dat <- read.csv("path_to_file/la_selva_log.csv", header = FALSE)
it does not contain all of the data: the last column is left out, so I have 7 columns instead of the 8 needed to hold everything. The values in the last column seem to simply be dropped when the file is loaded into R.
I found this in the read.table documentation:
The number of data columns is determined by looking at the first five
lines of input (or the whole input if it has less than five lines), or
from the length of col.names if it is specified and is longer.
But I'm not sure how to turn this into a fix for my issue.
How can I make sure that all of my data are kept when the file is read into R?
This question is already answered on Stack Overflow:
How can you read a CSV file in R with different number of columns
Read a text file with variable number of columns to a list
I'm sure you can find more on Stack Overflow using the search.
Quick example (since your exported CSV is not well-formed):
my_file <- file("path_to_file/la_selva_log.csv")
# read the raw lines and split each on commas; the result is a list whose
# elements may have different lengths, so the extra fields are not dropped
my_data <- strsplit(readLines(my_file), ",")
close(my_file)
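If you need a rectangular data frame afterwards, a minimal sketch (assuming eight columns is the true maximum) pads the shorter rows with NA and binds them:
# pad every row to the maximum observed length, then stack into a data frame
n_cols <- max(lengths(my_data))
padded <- lapply(my_data, function(x) c(x, rep(NA, n_cols - length(x))))
my_trap_dat <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
Alternatively, the documentation passage quoted in the question can be applied directly: read.csv defaults to fill = TRUE, so supplying eight column names forces all eight columns to be read, with the short rows padded by NA:
my_trap_dat <- read.csv("path_to_file/la_selva_log.csv", header = FALSE,
                        col.names = paste0("V", 1:8), fill = TRUE)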
This question already has answers here:
Force R not to use exponential notation (e.g. e+10)?
In the Excel sheet, I have Salary columns with large numbers. But when I read the Excel file with read_xlsb() and display the data frame, those columns are printed in scientific (exponential) notation. How can I get rid of this format?
Thanks
(Screenshot: output in R)
The best way I have found is to use Notepad++ to make sure the large numbers don't get cut off or rounded. The way I do it:
do not make any changes to the CSV file
open it with Notepad++; you will see the full number
save the Notepad++ file as .txt
open the .txt file in Excel
you will find that the number is no longer cut off
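Note that the scientific notation is only how R prints the values, not how they are stored. A minimal sketch of the usual in-R fix (covered by the duplicate linked above) is to raise the scipen option so R prefers fixed notation:
# penalize scientific notation when printing; the stored values are unchanged
options(scipen = 999)
salary <- 123456789012
print(salary)  # 123456789012 rather than 1.23457e+11
For a single column you can also use format(df$Salary, scientific = FALSE), which returns the values as fixed-notation strings for display.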
This question already has answers here:
data.frame without ruining column names
I have a CSV file containing some columns with names like "beauty & spas", "american (new)" etc. When I read this file in R and use names() to see column names, they have been converted to "beauty...spas.1" and "american...new..1". How do I prevent them from being converted? I do not want to correct them manually.
If you read the documentation at ?read.table (or ?read.csv) carefully, you will quickly see that there is an argument called check.names. You most likely want to set it to FALSE. Keep in mind, though, that those are not syntactically valid column names in R, so you might actually prefer to change them to something R handles more smoothly anyway.
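A minimal sketch (the file name yelp.csv and the column used below are hypothetical stand-ins for your data):
# check.names = FALSE keeps headers such as "beauty & spas" verbatim
df <- read.csv("yelp.csv", check.names = FALSE)
names(df)
# non-syntactic names must then be quoted with backticks when accessed
head(df$`beauty & spas`)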
Possible Duplicate:
Only read limited number of columns in R
I have a CSV file that is quite large, so I only want to read into R the data that are relevant. The file is 4 columns wide and several million rows long, but the first column is unnecessary (it is the same repeated string on every row).
Is there a way to get only the 2nd to 4th columns when reading in the CSV file? (It's easy enough to remove the first column after reading the file in, but I was wondering whether there is a more efficient way of doing this.)
To expand on Joshua's comment:
data <- read.csv("data.csv", colClasses = c("NULL", NA, NA, NA))
"NULL" (note the quotes!) means skip the column, NA means that R chooses the appropriate data type for that column.
Possible Duplicate:
Only read limited number of columns in R
I have a text data file with a million observations and 150 variables (v1 to v150), delimited by semicolons. I need only a selected handful of the variables. Is there any way to read in only the variables I need? I am using read.table("filepath/filename.txt", sep=";", header=T). Or is there another way than read.table() with which this can be done?
See help(read.table), particularly the colClasses argument. Simply set the columns you want to ignore to "NULL" (as a quoted string) and they will be skipped.
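A minimal sketch, assuming (hypothetically) that only v3, v7, and v10 are needed out of the 150:
# start with all 150 columns skipped, then re-enable the ones to keep
classes <- rep("NULL", 150)
classes[c(3, 7, 10)] <- NA  # NA lets R choose the type of each kept column
dat <- read.table("filepath/filename.txt", sep = ";", header = TRUE,
                  colClasses = classes)
The kept columns retain their original names (v3, v7, v10), and the skipped fields are never stored in memory.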