Error saving data for csv using R

Something goes wrong when I save data using R. I wrote the following code:
txt <- readLines(file("test.csv"))
nouns <- sapply(txt, extractNoun, USE.NAMES = F)
head(unlist(nouns), 30)
tail(unlist(nouns), 30)
nouns2 <- unlist(nouns)
nouns <- Filter(function(x) {nchar(x) >= 2}, nouns2)
nouns <- gsub("지금", "", nouns)
show <- unlist(lapply(nouns, extractNoun))
showfrq <- data.frame(table(show), stringsAsFactors = F)
aa <- as.matrix(showfrq)
write(aa, "test2.xls")
There is no error in the script, but when I look at the csv file, the sheet is not divided into columns the way I was expecting. Why is this happening? I am using R version 3.2.4 on Windows 8 x64, with Excel 2015.

Two immediate things you need to do: change your write function to write.csv, and use the filename "test2.csv". It is also not necessary to create the aa matrix before writing to .csv.
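A minimal sketch of the corrected ending of the script, keeping the variable names from the question:
showfrq <- data.frame(table(show), stringsAsFactors = FALSE)
write.csv(showfrq, "test2.csv", row.names = FALSE) # write.csv produces a real comma-separated file that Excel splits into columns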

Related

Problem reading XLS files with R's readxl package

I need to read an XLS file in R, but I'm having a problem with the way my file is generated and the readxl function read_excel. I do not have this issue in Python, so my hope is that the problem can also be solved inside R.
An application we use at my company exports reports in XLS format (not XLSX). This report is generated daily. What I need is to sum the total value of the rows in each file, in order to create a new report containing each day followed by its total value.
When I try to read these files in R using the readxl package, the program returns this error:
Error: Can't subset columns that don't exist.
x Location 5 doesn't exist.
i There are only 0 columns.
Run rlang::last_error() to see where the error occurred.
Now, the weird thing is that, when I open the XLS file in Excel before running my script, R is able to read it properly.
I guessed this was caused by something like the file only being finalized when I open it... but the same Python script gives me the correct result without that step.
I am now assuming this is a bug in the readxl package. Is there another package I could use to read XLS (not XLSX) files? One that does not depend on Java being installed on my computer, I mean.
my readxl script:
if (!require("readxl")) {install.packages("readxl"); library("readxl")}
"%,%" <- function(x, y) paste0(x, "\\", y) # helper to build Windows paths
year <- "2021"
month <- "Aug"
column <- 5 # VL_COVAR
path <- "F:\\variancia" %,% year %,% month
tiposDF <- c("date","numeric","list","numeric","numeric","numeric","list")
file.names <- dir(path, pattern = "\\.xls$") # escape the dot so only .xls files match
vari <- c()
for (i in seq_along(file.names)){
  file <- paste(path, file.names[i], sep = "\\")
  print(paste("Reading ", file))
  dados <- read_excel(file, col_types = tiposDF)
  somaVar <- sum(dados[column]) # total of the VL_COVAR column
  vari <- append(vari, somaVar)
}
vari
file <- paste(path, 'Covariância.xls_02082021.xls', sep = "\\")
print(paste("Reading ", file))
dados <- read_excel(file, col_types = tiposDF)
somaVar <- sum(dados[column])
vari <- append(vari, somaVar)
x <- rio::import(file) # import() comes from the rio package
View(x)
Thanks everyone!
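One thing worth checking before blaming readxl: report generators often write HTML or plain delimited text under an .xls extension, which Excel silently repairs when the file is opened (and re-saves as real XLS), while readxl's parser cannot read it. A minimal diagnostic sketch, assuming one of the problem files is available locally in the file variable from the script above:
# A genuine XLS file is an OLE2 compound document and starts with the
# magic bytes d0 cf 11 e0; anything else (e.g. "<ht" for HTML) means the
# exporter wrote a different format under an .xls extension
readBin(file, what = "raw", n = 4)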

How to get a vector of the file names contained in a tempfile in R?

I am trying to automatically download a bunch of zip files using R. These files contain a wide variety of files, and I only need to load one of them as a data.frame to post-process it. That file has a unique name, so I could catch it with str_detect(). However, after downloading to a tempfile(), I cannot get a list of the files within the archive using list.files().
This is what I've tried so far:
temp <- tempfile()
download.file("https://url/file.zip", destfile = temp)
files <- list.files(temp) # this is where I only get "character(0)"
# After, I'd like to use something along the lines of:
data <- read.table(unz(temp, str_detect(files, "^file123.txt")), header = TRUE, sep = ";")
unlink(temp)
I know that the read.table() command probably won't work, but I think I'll be able to figure that out once I get a vector with the list of the files within temp.
I am on a Windows 7 machine and I am using R 3.6.0.
Following what was said before, this structure should allow you to check the download and list the archive's contents. Note that list.files() only lists directories, not the inside of a zip file, so the contents have to be queried with unzip(list = TRUE) instead:
temp <- tempfile(fileext = ".zip")
download.file("https://url/file.zip", destfile = temp, mode = "wb") # wb keeps the zip binary-intact on Windows
files <- unzip(temp, list = TRUE)$Name # names of the files inside the archive
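From there, a sketch of the rest of the original plan (str_detect() is from the stringr package; "file123.txt" is the placeholder name used in the question):
library(stringr)
target <- files[str_detect(files, "^file123.txt")] # pick the one file by its unique name
data <- read.table(unz(temp, target), header = TRUE, sep = ";")
unlink(temp)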

Error extracting nouns in R using KoNLP

I tried to extract nouns in R using KoNLP. When running the program, an error appears. I wrote the following code:
setwd("C:\\Users\\kyu\\Desktop\\1-1file")
library(KoNLP)
useSejongDic()
txt <- readLines(file("1_2000.csv"))
nouns <- sapply(txt, extractNoun, USE.NAMES = F)
and the error appears like this:
setwd("C:\\Users\\kyu\\Desktop\\1-1file")
library(KoNLP)
useSejongDic()
Backup was just finished!
87007 words were added to dic_user.txt.
txt <- readLines(file("1_2000.csv"))
nouns <- sapply(txt, extractNoun, USE.NAMES = F)
java.lang.ArrayIndexOutOfBoundsException
Error in `Encoding<-`(`*tmp*`, value = "UTF-8") :
  a character vector argument expected
Why is this happening? I loaded the 1_2000.csv file, which contains 2000 lines of data. Is this too much data? How do I extract nouns from a large data file? I am using R 3.2.4 with RStudio, and Excel 2016 on Windows 8.1 x64.
The number of lines shouldn't be a problem.
I think that there might be a problem with the encoding. See this post. Your .csv file is encoded as EUC-KR.
I changed the encoding to UTF-8 using
txtUTF <- read.csv(file.choose(), encoding = 'UTF-8')
nouns <- sapply(txtUTF, extractNoun, USE.NAMES = F)
But that results in the following error:
Warning message:
In preprocessing(sentence) : Input must be legitimate character!
So this might be an error with your input. I can't read Korean so can't help you further.
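If the file really is EUC-KR, one more thing worth trying is declaring that encoding when reading, then converting to UTF-8 before extracting nouns. A sketch, assuming the file and KoNLP setup from the question:
con <- file("1_2000.csv", encoding = "EUC-KR")
txt <- readLines(con) # lines are re-encoded from EUC-KR as they are read
close(con)
txt <- iconv(txt, to = "UTF-8") # make sure everything handed to KoNLP is UTF-8
nouns <- sapply(txt, extractNoun, USE.NAMES = FALSE)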

Error while trying to read .data file in R

I am trying to read the car.data file at this location (https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data) using read.table, as shown below. I tried various solutions listed earlier, but none of them worked. I am using Windows 8, R version 3.2.3. I can save the file as a .txt file and then read it, but I am not able to read the .data file with read.table, either directly from the URL or after saving it locally.
t <- read.table(
  "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
  fileEncoding = "UTF-16",
  sep = ",",
  header = F
)
Here is the error I am getting; the result is an empty dataframe with a single cell containing "?":
Warning messages:
1: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", : invalid input found on input connection 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
2: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", :
incomplete final line found by readTableHeader on 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
Please help!
Don't use read.table when the data is not stored in a table. Data at that link is clearly presented in comma-separated format. Use the RCurl package instead and read the data as CSV:
library(RCurl)
x <- getURL("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data")
y <- read.csv(text = x, header = FALSE) # the file has no header row
Now y contains your data.
Thanks to cory, here is the solution: just use read.csv directly:
x <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", header = FALSE)
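Since car.data contains no header row, named columns have to be assigned manually if they are wanted; the names below are taken from the dataset's UCI attribute description:
colnames(x) <- c("buying", "maint", "doors", "persons", "lug_boot", "safety", "class")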

Parse multiple XBRL files stored in a zip file

I have downloaded multiple zip files from a website. Each zip file contains multiple files with html and xml extensions (~100K in each).
It is possible to manually extract the files and then parse them. However, I would like to be able to do this within R (if possible).
Example file (sorry, it is a bit big), downloaded using code from a previous question:
library(XML)
pth <- "http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html"
doc <- htmlParse(pth)
myfiles <- doc["//a[contains(text(),'Accounts_Monthly_Data')]", fun = xmlAttrs][[1]]
fileURLS <- file.path("http://download.companieshouse.gov.uk", myfiles)[[1]]
dir.create("temp/hmrcCache", recursive = TRUE) # download directory plus XBRL cache
download.file(fileURLS, destfile = file.path("temp", myfiles), mode = "wb") # wb: zips are binary
I can parse the files using the XBRL package if I manually extract them.
This can be done as follows
library(XBRL)
inst <- file.path("temp", "Prod224_0004_00000121_20130630.html")
out <- xbrlDoAll(inst, cache.dir="temp/hmrcCache", prefix.out=NULL, verbose=T)
I am struggling with how to extract these files from the zip archive and parse each one, say in a loop, using R, without manually extracting them.
I have made a start, but don't know how to progress from here. Thanks for any advice.
# Get names of files
lst <- unzip(file.path("temp", myfiles), list=TRUE)
dim(lst) # 118626
# unzip and extract first file
nms <- lst$Name[1] # Prod224_0004_00000121_20130630.html
lst2 <- unz(file.path("temp", myfiles), filename=nms)
I am using Windows 8.1
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Using the suggestion from Karsten in the comments, I unzipped the files to a temporary directory, and then parsed each file. I used the snow package to speed things up.
library(snow)
# Parse one zip file to start ("temp" is the download directory created above)
temp <- "temp"
fls <- list.files(temp)[[1]]
# Unzip into a session-specific temporary directory
tmp <- tempdir()
lst <- unzip(file.path(temp, fls), exdir = tmp)
# Only parse the first 10 records
inst <- lst[1:10]
# Start to parse - in parallel
cl <- makeCluster(parallel::detectCores())
clusterCall(cl, function() library(XBRL))
# Time the run
st <- Sys.time()
out <- parLapply(cl, inst, function(i)
  xbrlDoAll(i,
            cache.dir = "temp/hmrcCache",
            prefix.out = NULL, verbose = TRUE))
stopCluster(cl)
Sys.time() - st
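Each xbrlDoAll() result is a list of data frames (fact, context, unit, and so on), so the per-file results in out can be combined afterwards. A sketch, assuming the standard XBRL package output structure:
# Stack the fact tables from all parsed filings into one data frame
facts <- do.call(rbind, lapply(out, function(x) x$fact))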
