Error while trying to read .data file in R

I am trying to read the car.data file at https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data using read.table, as shown below. I tried various solutions from earlier questions, but none worked. I am using Windows 8 and R version 3.2.3. I can read the file after saving it as a .txt file, but read.table fails on the .data file, both directly from the URL and after saving it locally.
t <- read.table(
"https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
fileEncoding="UTF-16",
sep = ",",
header=F
)
Here are the warnings I get. The result is a data frame with a single cell containing "?":
Warning messages:
1: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", : invalid input found on input connection 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
2: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", :
incomplete final line found by readTableHeader on 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
Please help!

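read.table can parse comma-separated data when sep = "," is set, so the separator is not the problem. The file at that link is plain comma-separated text, which makes the fileEncoding = "UTF-16" argument the likely cause of the "invalid input" warning. One way to read it is to fetch the text with the RCurl package and parse it as CSV: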
library(RCurl)
x <- getURL("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data")
y <- read.csv(text = x, header = FALSE)  # the file has no header row
Now y contains your data.

Thanks to cory, here is the solution - just use read.csv directly:
x <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", header = FALSE)
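
A minimal sketch of the headerless read, using two inline rows in the file's format instead of the URL so that it runs offline. The column names are an assumption taken from the dataset's published attribute list; they do not appear in the original post.

```r
# Two sample rows in the same comma-separated shape as car.data (no header row).
sample_text <- paste(
  "vhigh,vhigh,2,2,small,low,unacc",
  "vhigh,vhigh,2,2,small,med,unacc",
  sep = "\n"
)

# header = FALSE keeps the first row as data; col.names supplies names.
# The names below are assumed, not part of the file itself.
cars <- read.csv(
  text = sample_text,
  header = FALSE,
  stringsAsFactors = FALSE,
  col.names = c("buying", "maint", "doors", "persons",
                "lug_boot", "safety", "class")
)

str(cars)
```

Replacing the text argument with the URL should read the full file the same way, provided the R build supports https connections.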

Related

Convert bed file to vcf with bed2vcf function

I am trying to convert .bed files to VCF using the bed2vcf function from the bedr R package.
I tried the following code:
cromXvcf <-
bed2vcf("cromXmerged2_pruned_removed_sex_mr_hh_sex_pop.bed",
filename = cromXmerged, zero.based = 1, header = NULL, fasta = "/media/iriel/Cosmos/Doctorado/Proyectos/Cromosoma X/Bases dedatos/human_g1k_v37.fasta")
and it throws the following error:
VALIDATE REGIONS
* Checking input type... FAIL
ERROR: Not sure what the input format is!
Error in is.valid.region(x) :
Can anybody tell me what could be wrong? Any other suggestions for doing this conversion without using Perl?

I solved it by loading the bed file into a variable and changing the data types of columns 1 and 4 to character.
I also checked that the chromosomes in my reference are named chr1..22, not just 1..22.
Finally, I checked that my bed file is sorted.
x <- read.table("cromXmerged2_pruned_removed_sex_mr_hh_sex_pop.bed")
x$V1 <- as.character(x$V1)
x$V4 <- as.character(x$V4)
sapply(x, mode)
y <- bed2vcf(x, zero.based=TRUE, header=NULL, fasta="/media/iriel/Cosmos/Doctorado/Proyectos/Cromosoma X/Bases dedatos/human_g1k_v37.fasta")
And it worked fine for me.
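
The preparation steps described above (character columns, sorted rows) can be sketched in base R without bedr. The table here is hypothetical and its values are made up; the plain lexicographic sort on the chromosome column is an assumption about what counts as "sorted".

```r
# Hypothetical BED-like table; read.table would give numeric V1 and V4 like this.
bed <- data.frame(
  V1 = c(2, 1, 1),           # chromosome
  V2 = c(500L, 900L, 100L),  # start
  V3 = c(600L, 950L, 200L),  # end
  V4 = c(11, 12, 13)         # name column
)

# Convert the chromosome and name columns to character.
bed$V1 <- as.character(bed$V1)
bed$V4 <- as.character(bed$V4)

# Sort by chromosome (as text), then start, then end.
bed <- bed[order(bed$V1, bed$V2, bed$V3), ]
rownames(bed) <- NULL

sapply(bed, class)
```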

Convert json file with multiple lines to R dataframe

I'm using rjson to read a JSON file into R. However, the fromJSON(file="file.json") command only reads the first line of the file. Here's the JSON:
{"id":"a","emailAddress":"a#a.com","name":"abc"}
{"id":"b","emailAddress":"b#b.com","name":"def"}
{"id":"c","emailAddress":"c#c.com","name":"ghi"}
How do I get all 3 rows into an R dataframe? Note that the above content lives in a single file.

I found a hacky way to do that. First I read the whole file into a string with readr, then I split it on newlines ("\n"), and finally I parse each line with fromJSON and bind the results into one data frame:
library(jsonlite)
library(readr)
json_raw <- readr::read_file("file.json")
json_lines <- unlist(strsplit(json_raw, "\\n"))
json_df <- do.call(rbind, lapply(json_lines,
FUN = function(x){as.data.frame(jsonlite::fromJSON(x))}))
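
As an alternative sketch, jsonlite's stream_in() reads files with one JSON object per line, so the manual split and rbind may not be needed. The temporary file below is a stand-in for file.json and holds the three records from the question.

```r
library(jsonlite)

# Write the three sample records to a temporary file, one JSON object per line.
tmp <- tempfile(fileext = ".json")
writeLines(c(
  '{"id":"a","emailAddress":"a#a.com","name":"abc"}',
  '{"id":"b","emailAddress":"b#b.com","name":"def"}',
  '{"id":"c","emailAddress":"c#c.com","name":"ghi"}'
), tmp)

# stream_in() parses one JSON value per line and returns a data frame.
json_df <- stream_in(file(tmp), verbose = FALSE)

json_df
```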

Write col names while writing csv files in R

What is the proper way to set the column names in the header of a CSV table generated by the write.table command?
For example, write.table(x, file, col.names = c("ABC","ERF")) throws an error saying invalid 'col.names' specification. Is there a way around the error while still using write.table?
Edit:
I am in the middle of writing a large piece of code, so exact data replication is not possible. However, this is what I have done:
write.table(paste("A","B"), file="AB.csv", col.names=c("A1","B1"))
I still get this error: Error in write.table(paste("A","B"), file="AB.csv", col.names=c("A", : invalid 'col.names' specification.

Is this what you expect? It works on my end. Note that col.names must supply exactly one name per column; paste("A","B") produces the single string "A B", which is one column, so two names are rejected.
df <- data.frame(condition_1sec=1)
df1 <- data.frame(susp=0)
write.table(cbind(df, df1), file = "table.csv", col.names = c("A","B"), sep = ",", row.names = FALSE)
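
A small sketch of both cases, writing to a temporary file rather than AB.csv. The data frame and its column names are made up for illustration.

```r
out <- tempfile(fileext = ".csv")

# Two columns, two names: the header is replaced by col.names.
x <- data.frame(first = "A", second = "B", stringsAsFactors = FALSE)
write.table(x, file = out, sep = ",", quote = FALSE,
            col.names = c("A1", "B1"), row.names = FALSE)
written <- readLines(out)
written
# "A1,B1" "A,B"

# One column (paste gives the single string "A B"), two names: an error.
err <- tryCatch(
  write.table(paste("A", "B"), file = out, col.names = c("A1", "B1")),
  error = function(e) conditionMessage(e)
)
err
```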

strsplit in R: How do I split one-column data separated by comma into multiple columns?

I am reading data from a website: https://raw.github.com/johnmyleswhite/ML_for_Hackers/master/02-Exploration/data/01_heights_weights_genders.csv
(1) At first I attempted to read the data directly into R with the following code:
raw_data <- read.table("https://raw.github.com/johnmyleswhite/ML_for_Hackers/master/02-Exploration/data/01_heights_weights_genders.csv", stringsAsFactors=FALSE)
But I received the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme
So I simply copied the data into a .csv file. I saved this file as "Raw_Data.csv" in a directory. The data is, however, all in one column.
(2) I read this file into R via the following code
raw_data <- read.csv("Raw_Data.csv", stringsAsFactors=FALSE)
What I would like to do is split this one column into three, with the column names as "Gender", "Height", "Weight". What I tried was this:
for(i in 1:nrow(raw_data)){
raw_data$Gender[i] <- strsplit(raw_data$Gender[i], ",")[[1]][1]
raw_data$Height[i] <- strsplit(raw_data$Height[i], ",")[[1]][2]
raw_data$Weight[i] <- strsplit(raw_data$Weight[i], ",")[[1]][3]
}
However, I get this error:
Error in strsplit(raw_data$Gender[i], ",") : non-character argument
Thank you in advance for your help!

Maybe it was because of the quotes. Try:
raw_data <- read.csv("Raw_Data.csv", stringsAsFactors=FALSE, quote="\"")
I was able to read the data into R with 3 columns just fine.
I'm not sure how you saved the data into a .csv file, but I copied the data right into Notepad++ (http://notepad-plus-plus.org/), saved it as a text file, and read it into R with read.csv("filename.txt").
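
If the file really does arrive as a single column, the split the question attempts can be sketched with a vectorised strsplit instead of a loop. The two rows and the column name V1 below are made up for illustration.

```r
# Hypothetical one-column data standing in for the mis-read file.
raw_data <- data.frame(
  V1 = c("Male,73.8,241.9", "Female,58.9,102.1"),
  stringsAsFactors = FALSE
)

# strsplit needs a character vector; it returns one vector of pieces per row.
parts <- strsplit(raw_data$V1, ",", fixed = TRUE)

# Stack the pieces into rows and name the three columns.
split_data <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(split_data) <- c("Gender", "Height", "Weight")

# The pieces are text, so convert the measurements to numbers.
split_data$Height <- as.numeric(split_data$Height)
split_data$Weight <- as.numeric(split_data$Weight)

split_data
```

The "non-character argument" error in the question is consistent with passing strsplit a column that does not exist yet (raw_data$Gender is NULL before it is created).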

Error importing SPSS data into R

I imported a dataset in the .sav SPSS format, and I'm getting an error that I haven't seen before.
1: In read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", ... :
C:\Users\acer\Desktop\X\X\PIREDEU\ees2009_v0.9_20110622.sav: File contains duplicate label for value 1.1 for variable V200
Error in cat(list(...), file, sep, fill, labels, append) :
argument 2 (type 'list') cannot be handled by 'cat'
This came up after I typed warnings(PIREDEU). I imported the data using the foreign library:
library(foreign)
PIREDEU<-read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
I've fiddled with various combinations for the latter three arguments of the read.spss function, and I've gotten nowhere.
Does anyone have any suggestions?

I used the code below and it worked perfectly. Just ignore the warning message and check the data by typing its name:
mydata4 <- read.spss("C:\\Work\\data.sav", use.value.labels=FALSE, to.data.frame=TRUE)
mydata4 # check data

Do you have long strings in the file, longer than 8 bytes? SPSS Statistics uses some special arrangements to handle those. It looks like the problem is with the value labels. If you can delete those (using SPSS), you might be able to get the rest of the data.

Try to read the data without labels.
library(foreign)
PIREDEU <- read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav",
use.value.labels = FALSE,
to.data.frame = TRUE)
Does it work?

Convert the SPSS data file to .por (portable) format. In R, install the packages Hmisc, memisc and foreign, and load them using library(foreign), library(Hmisc) and library(memisc).
Then type the following:
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors
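
The duplicate-label message in the question is a warning, not an error, so the call can still return data. The separate cat error appears to come from passing an argument to warnings(); calling warnings() with no arguments is the usual way to list them. A minimal sketch of capturing a warning while keeping the result, where read_with_warning is a hypothetical stand-in for the read.spss call:

```r
# Hypothetical stand-in for a reader that warns but still returns data.
read_with_warning <- function() {
  warning("File contains duplicate label for value 1.1 for variable V200")
  data.frame(V200 = c(1.1, 2.0))
}

captured <- character(0)
dat <- withCallingHandlers(
  read_with_warning(),
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))  # keep the message
    invokeRestart("muffleWarning")                 # continue instead of aborting
  }
)

nrow(dat)   # the data frame was still returned
captured    # the warning text, stored for inspection
```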
