Exclude columns in read.table() in R [duplicate] - r

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Only read limited number of columns in R
I have a text data file with a million observations and 150 variables (v1 to v150), delimited by semicolons. I need only a handful of those variables. Is there any way to read in only the variables I need? I am using read.table("filepath/filename.txt", sep=";", header=T). Is there any way other than read.table() to do this?

See help(read.table), particularly the colClasses argument. Set the entries for the columns you want to ignore to "NULL" (the character string, in quotes) and those columns will be skipped.
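With 150 columns, typing out colClasses by hand is tedious. A minimal sketch of one way to build it from the header, assuming the file has a header row; the temporary file and the choice of v1 and v3 as the wanted variables are invented here for illustration:

```r
# Hypothetical sample file standing in for the real 150-column data.
path <- tempfile(fileext = ".txt")
writeLines(c("v1;v2;v3;v4",
             "1;a;2.5;x",
             "2;b;3.5;y"), path)

# Read the header plus a single data row, just to learn the column names.
header <- read.table(path, sep = ";", header = TRUE, nrows = 1)

# The handful of variables actually needed (assumed names).
keep <- c("v1", "v3")

# "NULL" (a string) skips a column; NA lets R guess the column type.
col_classes <- rep("NULL", ncol(header))
col_classes[names(header) %in% keep] <- NA

dat <- read.table(path, sep = ";", header = TRUE, colClasses = col_classes)
str(dat)
```

Only the kept columns are converted and stored in the result, although read.table() still has to scan every line of the file.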

Related

How do you return the column number(s) based on the class of said column? [duplicate]

This question already has answers here:
How to find all numeric columns in data
(2 answers)
Closed 4 years ago.
I have a list of 185 data-frames. I'm trying to edit them so each data frame only shows its numeric columns and also 2 specific, non-numeric ones.
I've had many issues with solving this, so I plan to use a for loop and find the column numbers of all numeric columns, use match to do the same for the two specific ones and then use c() to overwrite the data-frames.
I can pull the column number for the specific ones with
match("Device_Name",colnames(DFList$Dataframe))
successfully.
However, I cannot figure out how to return the column numbers for all numeric columns in a data-frame.
I have tried
match(is.numeric(colnames(DFList$Dataframe)),colnames(DFList$Dataframe))
and
match(class == "numeric",colnames(DFList$Dataframe),colnames(DFList$Dataframe))
to name a few, but now I am just taking wild stabs in the dark. Any advice would be welcome.
which(sapply(DFList$Dataframe,is.numeric))
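Applied to the whole list, the same idea might look like the sketch below. The contents of DFList and the second column name "Site" are invented for illustration (only "Device_Name" appears in the question), and vapply() is used in place of sapply() so the result is always a logical vector:

```r
# Hypothetical stand-in for the real list of 185 data frames.
DFList <- list(
  A = data.frame(Device_Name = c("d1", "d2"), Site = c("x", "y"),
                 Temp = c(1.5, 2.5), Note = c("p", "q"), Count = 1:2),
  B = data.frame(Note = c("r", "s"), Device_Name = c("d3", "d4"),
                 Humidity = c(40, 55), Site = c("z", "w"))
)

# The two non-numeric columns to keep ("Site" is an assumed name).
keep_names <- c("Device_Name", "Site")

trimmed <- lapply(DFList, function(df) {
  # Positions of all numeric columns (integer columns count as numeric).
  num_idx <- which(vapply(df, is.numeric, logical(1)))
  # Positions of the two named columns; assumes both exist in every frame.
  key_idx <- match(keep_names, colnames(df))
  df[, union(key_idx, num_idx), drop = FALSE]
})

lapply(trimmed, names)
```

Using lapply() avoids the explicit for loop and returns a new list, so the original data frames are left unchanged.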

Column names: () being replaced by dots when reading csv files [duplicate]

This question already has answers here:
data.frame without ruining column names
(2 answers)
Closed 9 years ago.
I have a CSV file containing some columns with names like "beauty & spas", "american (new)" etc. When I read this file in R and use names() to see column names, they have been converted to "beauty...spas.1" and "american...new..1". How do I prevent them from being converted? I do not want to correct them manually.
If you read the documentation at ?read.table (or ?read.csv) carefully, you will see that there is an argument called check.names. You most likely want to set that to FALSE. Keep in mind, though, that those are not syntactically valid column names in R, so you might actually prefer to change them to something that R will handle more smoothly anyway.
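A small sketch of the difference, using a temporary file invented for illustration with the two column names from the question:

```r
# Hypothetical CSV with the column names mentioned in the question.
path <- tempfile(fileext = ".csv")
writeLines(c('"beauty & spas","american (new)"',
             "1,2",
             "3,4"), path)

# Default behaviour: names are passed through make.names().
mangled <- read.csv(path)
names(mangled)

# check.names = FALSE keeps the names exactly as they appear in the file.
asis <- read.csv(path, check.names = FALSE)
names(asis)

# Non-syntactic names need [[ ]] or backticks when accessed.
asis[["beauty & spas"]]
asis$`american (new)`
```

The need for backticks or [[ ]] on every access is the trade-off the answer alludes to when it suggests renaming the columns instead.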

R leaves out columns when using read.csv [duplicate]

This question already has answers here:
How can you read a CSV file in R with different number of columns
(5 answers)
Read a text file with variable number of columns to a list
(3 answers)
Closed 5 years ago.
I have a large comma-delimited file that looks something like this:
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,1800,25
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,2000,24.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/30/2015,2200,24.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,000,24
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,200,23.5
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,400,23.5,97
LS_trap_10c,7C000000395C1641,trap10c_7C000000395C1641_150809.csv,c,5/31/2015,600,23.5,98.5
As you can see, the rows vary in length (the bottom two have an extra column) and not all columns contain values. The data display correctly in Excel, but when I attempt to open the file in RStudio with:
my_trap_dat = read.csv("path_to_file/la_selva_log.csv", header = FALSE)
the result does not contain all of the data. It leaves out the last column, so I have 7 columns instead of the 8 that are needed to hold all of the data. The values in the last column seem to be simply dropped when the file is loaded into R.
I found this:
The number of data columns is determined by looking at the first five
lines of input (or the whole input if it has less than five lines), or
from the length of col.names if it is specified and is longer.
But I'm not sure how to implement any change that fixes my issue.
How can I make it so that all of my data is maintained in R?
This question has already been answered on Stack Overflow:
How can you read a CSV file in R with different number of columns
Read a text file with variable number of columns to a list
I'm sure you will find more on Stack Overflow using the search.
Quick example (given your exported CSV is not valid):
my_file = file("path_to_file/la_selva_log.csv")
my_data = strsplit(readLines(my_file), ",")
close(my_file)
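If a data frame is preferred over a list of character vectors, one possible approach follows from the documentation quoted in the question: pass col.names that is as long as the widest row. A minimal sketch, using a temporary file with shortened rows modeled on the question's data (the file contents and the V1, V2, ... names are assumptions):

```r
# Hypothetical ragged file: the first five rows have 4 fields, the last two have 5.
path <- tempfile(fileext = ".csv")
writeLines(c("trap,5/30/2015,1800,25",
             "trap,5/30/2015,2000,24.5",
             "trap,5/30/2015,2200,24.5",
             "trap,5/31/2015,000,24",
             "trap,5/31/2015,200,23.5",
             "trap,5/31/2015,400,23.5,97",
             "trap,5/31/2015,600,23.5,98.5"), path)

# Count the fields on every line and take the widest row.
n_cols <- max(count.fields(path, sep = ","), na.rm = TRUE)

# Supplying col.names of that length sets the column count explicitly;
# fill = TRUE pads the shorter rows with NA.
dat <- read.csv(path, header = FALSE, fill = TRUE,
                col.names = paste0("V", seq_len(n_cols)))
str(dat)
```

count.fields() reads the whole file once before read.csv() does, which is an extra pass but removes the dependence on the first five lines.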

Is there a way to omit the first column when reading a csv [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Only read limited number of columns in R
I have a CSV file that is quite large, so I only want to read the relevant data into R. The file is 4 columns wide and several million rows long, but the first column is unnecessary (it is the same string repeated on every row).
Is there a way to read only the 2nd to 4th columns from the CSV file? It's easy enough to remove the first column after reading the file in, but I was wondering whether there is a more efficient way of doing this.
To expand on Joshua's comment:
data <- read.csv("data.csv",colClasses=c("NULL",NA,NA,NA))
"NULL" (note the quotes!) means skip the column, NA means that R chooses the appropriate data type for that column.
