error in nops_eval 'register' does not contain all columns registration/name/id - r-exams

I am trying to use the nops_eval function like this:
eval <- nops_eval(register = "nops_eval/pauta.csv",
                  solutions = "nops_eval/Ex_AEI_MTI_v.rds",
                  scans = "nops_eval/nops_scan_20210712161737.zip",
                  language = "pt",
                  eval = exams_eval(partial = FALSE, negative = -0.25, rule = "false"),
                  dir = "eval",
                  mark = FALSE,
                  file = "exame_M2_ep_rec",
                  results = "nops_eval_M2rec",
                  interactive = TRUE)
My register file is a CSV with semicolon-separated values and the header line registration;name;id. But I am getting this error:
Error in nops_eval(register = "nops_eval/pauta.csv", solutions = "nops_eval/Ex_AEI_MTI_v.rds", : 'register' does not contain all columns registration/name/id
I can't really explain this. I am running RStudio on Windows 10.
Any idea what may be causing "registration;name;id" not to be recognized?
Thanks!

The reason must be that, when read into R, the data ends up with different column names. To check this "by hand" you can use:
x <- read.csv2("nops_eval/pauta.csv", colClasses = "character")
names(x)
## [1] "registration" "name" "id"
This is what nops_eval() uses internally. Possibly, the problems are created by the byte order mark (BOM) that some software packages (notably Excel) add at the beginning of CSV files to signal how they were stored. Depending on the locale settings, this may lead to hiccups and unwanted characters when reading the header line of the CSV.
To work around such problems it's best to fix the header line and re-save the CSV file, e.g., using write.table() or write.csv2() in R.
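For example, a minimal sketch of that fix, assuming the BOM is the UTF-8 one that Excel typically writes and that overwriting pauta.csv is acceptable:
## read the register, telling R to strip the byte order mark
x <- read.csv2("nops_eval/pauta.csv", colClasses = "character",
               fileEncoding = "UTF-8-BOM")
names(x)
## [1] "registration" "name"         "id"

## re-save without the BOM so that nops_eval() can read it directly
write.csv2(x, "nops_eval/pauta.csv", row.names = FALSE)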

Related

How to define column names while using read.csv.sql on a file without header?

I often use the read.csv function to read large CSV files. The files have no header, so I use the col.names parameter to set the variable names of the data frame created after import.
Today, for the first time, I had to use read.csv.sql, which is available in the sqldf package. The file to import is very big and I only need the rows that satisfy a condition. According to the online documentation, the filter has to be defined in the WHERE clause of the SELECT statement. Let's say my file has a column (among others) called user_account, and I want to import only the rows where user_account = 'Foo'. Therefore, I have to write something like
df <- read.csv.sql(
  "my_big_data_file.csv",
  sql = "select * from file where user_account = 'Foo'",
  header = FALSE,
  colClasses = c(... Here I define column types ...),
  sep = "|",
  eol = "\n"
)
Now the problem is that, unlike read.csv, read.csv.sql apparently has no col.names parameter. And given that my file has no header, I don't know how to refer to the column names. Because I wrote user_account in the WHERE clause of the sql argument above, I get an error message: R complains that there is no such variable.
So, how can I refer to column names with read.csv.sql for a CSV file without a header, and at the same time use those column names in my filter? Is this even possible?
Thanks in advance
Finally I found the answer in the documentation of read.csv.sql. Instead of colClasses one has to use field.types, specifying the data types directly as they are defined in SQLite rather than in R:
field.types: A list whose names are the column names and whose contents are the SQLite types (not the R class names) of the columns. Specifying these types improves how fast it takes. Unless speed is very important this argument is not normally used.
The SQLite data types (TEXT, INTEGER, REAL, BLOB, NULL) are documented on the SQLite website.
Therefore I modified my program accordingly:
df_tmp <- read.csv.sql(
  file = input_file_path,
  sql = "
    select *
    from file
    where trim(lower(user_account)) = 'foo'",
  header = FALSE,
  sep = "|",
  eol = "\n",
  field.types = list(
    col1 = "TEXT",
    col2 = "TEXT",
    user_account = "TEXT",
    col4 = "REAL",
    col5 = "REAL"
  ),
  dbname = tempfile(),
  drv = "SQLite"
)
However, at the end I had to convert one variable explicitly back with as.numeric, because it had been read in as character. The program pointed this out with a clear warning message, so in the end this solution did the job for me.
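The conversion itself was just a one-liner of this form (col4 is only an illustration; use whichever column came back as character in your data):
## col4 was declared REAL but came back as character, so coerce it explicitly
df_tmp$col4 <- as.numeric(df_tmp$col4)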
I hope this might help those who have encountered the same problem.

An R Loop to wrap text in multiple saved excel files

I have hundreds of excel files with a single column and a single sheet containing text. I am trying to write a loop that will 'Wrap Text' and align the single column in all of the files, preferably without reading the files into R.
I already set the style object as follows:
style <- openxlsx::createStyle(
  halign = "left",
  valign = "center",
  wrapText = TRUE
)
I have tried both a for loop and lapply, but both only apply openxlsx::addStyle to one file out of the hundreds. It doesn't have to be openxlsx; it can be XLConnect or any other package for xlsx files. Even VBA is welcome, if I can call it from R.
Please help.
Thanks in advance.
This will probably be pretty slow and will most likely require reading the files into R, so I'm not sure how much this helps 😅.
Libraries
library(openxlsx)
Find files
First you need a list of all the excel files you have:
xlsx_paths <- list.files(path = "./folder_with_yr_excels", pattern = "xlsx$")
This will create a vector of all the .xlsx files you have in the folder.
Write function
Then we can write a function to do what you want to a single file:
text_wrapper <- function(xlsx_path){
  # this links the file to R using the openxlsx package
  n3 <- openxlsx::loadWorkbook(file = xlsx_path)
  # this creates the style that you wanted:
  style <- openxlsx::createStyle(
    halign = "left",
    valign = "center",
    wrapText = TRUE
  )
  # this adds the style to the excel file we just linked with R
  openxlsx::addStyle(n3, sheet = 1, cols = 1:400, rows = 1:400, style, gridExpand = TRUE)
  # this removes the .xlsx part from the path name
  xlsx_path2 <- sub(pattern = ".xlsx", replacement = "", x = xlsx_path)
  # this is the naming standard I'll use:
  # "original_file_name" -> "original_file_name_reformatted.xlsx"
  new_path <- paste(xlsx_path2, "_reformatted", ".xlsx", sep = "")
  # this saves the reformatted excel file
  saveWorkbook(n3, file = new_path, overwrite = TRUE)
}
Notes
For other people coming across this post, here's a more in depth description of the openxlsx R package and some of the formatting things that can be done with it: https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf
An annoying thing about this package is that you have to specify how many rows and columns you want to apply the style to, which becomes annoying when you don't know how many rows and columns you have. The not-so-great workaround is to specify a large number of columns (in this case I did 400):
openxlsx::addStyle(n3, sheet = 1, cols = 1:400, rows= 1:400, style, gridExpand = TRUE)
As of the time of posting, it sounds like there's not a better solution: https://github.com/awalker89/openxlsx/issues/439
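If reading the data in anyway is acceptable, one workaround (not from the original answer, just a sketch) is to read the sheet once to learn its dimensions and then style exactly that range:
## read the sheet once just to find out how many rows/columns it actually has
sheet_data <- openxlsx::read.xlsx(xlsx_path, colNames = FALSE)
openxlsx::addStyle(n3, sheet = 1,
                   cols = seq_len(ncol(sheet_data)),
                   rows = seq_len(nrow(sheet_data)),
                   style, gridExpand = TRUE)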
Apply function to files
Anyways, the final step is to apply the function we wrote to all the excel files we found.
lapply(paste("./folder_with_yr_excels",xlsx_paths,sep = ""), text_wrapper)
Since that was done inside of a function we don't have to go back and delete intermediate data file. Yay!
Notes
The paste("./folder_with_yr_excels",xlsx_paths,sep = "") step adds the folder name back to the path name. There's an option in list.files() to keep the whole file path intact, but I like to keep track of which folder I'm dealing with by pasting the folder name back on at the end.

Exporting a list of dataframes as csv

I have a list of dataframes which I want to export as csv. I tried :
for (i in listofdf){
  write.csv(listofdf[i], file = paste(names(listofdf)[i], ".csv", sep = ""), row.names = FALSE)
}
and got: Error in listofdf[i] : invalid subscript type 'list'. I read that I am feeding a list data type into a process that expects a vector, and tried the suggested solution, unlist(listofdf), but all I got was a massive list of numeric values that I don't know how to deal with.
Then I tried a solution found here, which works with the given example:
sapply(names(listofdf),
       function(x) write.table(listofdf[x],
                               file = paste(names(listofdf)[x], ".csv", sep = ""),
                               row.names = FALSE))
but when I try it, it only exports one file, named NA.csv. Do you know how to make any of those solutions work?
Your problem is how you're indexing your list object: names(listofdf)[i] isn't doing what you think it is. Try this:
listofdf <- list(a = iris, b = iris, c = iris)
for (i in seq_along(listofdf)){
  write.csv(listofdf[[i]], file = paste0(names(listofdf)[i], ".csv"), row.names = FALSE)
}
Side note: the default separator for paste is a space. So you're putting a space before the ".csv" extension with your code. paste0 does not paste strings together with a space.
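A quick illustration of the difference, using the list from the example above:
paste(names(listofdf)[1], ".csv")   # "a .csv"  (sep defaults to a space)
paste0(names(listofdf)[1], ".csv")  # "a.csv"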
Alternatively, as mentioned, you can use write_xlsx() from the writexl package:
library(writexl)
write_xlsx(listofdf, "output.xlsx")
This will create a file called "output.xlsx" with sheets that match names(listofdf) and the proper values stored within those sheets.

Issues with user function/ map to read in and combine DBF files in R

I have written a function to read in a set of dbf files. Unfortunately, these files are very large, and I wouldn't want anyone to have to run them on my behalf.
readfun_dbf = function(path) {
  test = read.dbf(path, as.is = TRUE)  # don't convert to factors
  test
}
dbfiles identifies the list of file names. map_dfr applies my function to the list of files and row binds them together. I've used very similar code to read in some text files, so I know the logic works.
dbfiles = list.files(pattern = "assign.dbf", full.names = F, recursive = T)
dbf_combined <- map_dfr(dbfiles, readfun_dbf)
When I run this, I get the error:
Error: Column `ASN_PCT` can't be converted from integer to character
So I ran the read.dbf command on all the files individually and noticed that some dbf files were being read in with all their fields as characters, and some with a mix of integers and characters. I figured that map_dfr needs the fields to be of the same type to bind them, so I added the mutate_all command to my function, but it's still throwing the same error.
readfun_dbf = function(path) {
  test = read.dbf(path, as.is = TRUE)  # don't convert to factors
  mutate_all(test, as.character)       # <- the line I added
  test
}
Do you think the mixed field types are the issue? Or could it be something else? Any suggestions would be great!
Assign the value back to the object.
readfun_dbf = function(path) {
  test = read.dbf(path, as.is = TRUE)
  test <- dplyr::mutate_all(test, as.character)
  return(test)
}
and then try:
dbf_combined <- purrr::map_dfr(dbfiles, readfun_dbf)
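If you are on a recent dplyr, the same idea can also be written with across() instead of the superseded mutate_all(); a sketch, assuming read.dbf() comes from the foreign package:
library(foreign)
library(dplyr)

readfun_dbf <- function(path) {
  test <- read.dbf(path, as.is = TRUE)              # don't convert to factors
  mutate(test, across(everything(), as.character))  # coerce every column to character
}

dbf_combined <- purrr::map_dfr(dbfiles, readfun_dbf)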

Error when reading in data

I'm trying to read in a set of data tables, each representing a different part of a larger Excel table, selected using "filter" and saved individually as a .csv file. Most of my tables have 5 rows of data but two of them have 4 rows. The tables with 5 rows of data read into R as requested:
Y <- read.csv(file = "MyFile.csv", row.names = 1, header = T, sep = ";")
No problem.
The tables with 4 rows of data give the following error message:
In read.csv("MyFile.csv", quote = "", : incomplete final line found by readTableHeader on 'MyFile.csv'
It’s the same problem with
Z <- read.table("MyFile.csv", quote = "", sep = ';', header = TRUE)
There is no missing data in the file. When I print the Y or Z object in R no missing data is visible (or invisible as it were).
I know the problem is extremely simple, but as I’ve got frustration pouring out of my ears, my officemates would really appreciate your help.
The final line of your CSV doesn't have a line feed or carriage return.
Plan A: open the files in a text editor, go to the end of the final line, hit enter and then save the modified file.
Plan B: if there are too many files for Plan A, you could simply ignore the warnings, since the files seem to be loaded fine (apart from that message).
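If there are too many files even for Plan A, the missing newline can also be added from R itself; a minimal sketch, assuming all affected CSVs sit in the working directory:
for (f in list.files(pattern = "\\.csv$")) {
  txt <- readLines(f, warn = FALSE)  # warn = FALSE silences the incomplete-final-line message
  writeLines(txt, f)                 # writeLines() terminates every line, including the last
}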
