R find maxima of multiple variables from multiple .CSV files

R find maxima of multiple variables from multiple .CSV files - r

I have multiple csv's, each containing multiple observations for one participant on several variables. Let's say each csv file looks something like the below, and the name of the file indicates the participant's ID:
data.frame(
happy = sample(1:20, 10),
sad = sample(1:20, 10),
angry = sample(1:20, 10)
)
I found some code in an excellent stackoverflow answer that allows me to access all files saved into a specific folder, calculate the sums of these emotions, and output them into a file:
# access all csv files in the working directory
fileNames <- Sys.glob("*.csv")
for (fileName in fileNames) {
# read original data:
sample <- read.csv(fileName,
header = TRUE,
sep = ",")
# create new data based on contents of original file:
data.summary <- data.frame(
File = fileName,
happy.sum = sum(sample$happy),
sad.sum = sum(sample$sad),
angry.sum = sum(sample$angry))
# write new data to separate file:
write.table(data.summary,
"sample-allSamples.csv",
append = TRUE,
sep = ",",
row.names = FALSE,
col.names = FALSE)}
However, I can ONLY get "sum" to work in this function. I would like to not only find the sums of each emotion for each participant, but also the maximum value of each.
When I try to modify the above:
for (fileName in fileNames) {
# read original data:
sample <- read.csv(fileName,
header = TRUE,
sep = ",")
# create new data based on contents of original file:
data.summary <- data.frame(
File = fileName,
happy.sum = sum(sample$happy),
happy.max = max(sample$happy),
sad.sum = sum(sample$sad),
angry.sum = sum(sample$angry))
# write new data to separate file:
write.table(data.summary,
"sample-allSamples.csv",
append = TRUE,
sep = ",",
row.names = FALSE,
col.names = FALSE)}
I get the following warning message:
In max(sample$happy) : no non-missing arguments to max; returning -Inf
Would sincerely appreciate any advice anyone can give me!

using your test data, the max() statement works fine for me. Is it related to a discrepancy between the sample code you have posted and your actual csv file structure?

Related

Records less in data frame

I have found few people posted similar issues but still can't solve my problem. The expected objects in the dataframe were 957463 but only 392400 extracted out.
I used read.delim2("test.csv", header = TRUE, sep = ",", quote = "\"", fill = TRUE) to create dataframe with the expected records but the output were less than expected.
#set working directory --------------------------------
L <- setwd("C:/Users/abmo8004/Documents/R Project/csv/")
#List files in the path ------------------------
l <- list.files(L)
#form dataframe from csv file ---------------------------
df <- read.delim2("test.csv", header = TRUE, sep = ",", quote = "\"", fill = TRUE)
I expect the output to be 957463 , but the actual output is 392400. Can anyone please look at the codes?

Issue downloading and opening xlsx-file from within R

I would like to download and open the following Excel-file with monthly and annual consumer price indices directly from within R.
https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master
(the link can be found on this site: https://www.bfs.admin.ch/bfs/de/home/statistiken/preise/landesindex-konsumentenpreise/lik-resultate.assetdetail.7066959.html)
I used to download this file manually using the browser, save it locally on my computer, then open the xlsx-file with R and work with the data without any problems.
I have now tried to read the file directly from within R, but without luck so far. As you can see from the URL above, there is no .xlsx extension or the like, so I figured the file is zipped somehow. Here is what I've tried so far and where I am stuck.
library(foreign)
library(xlsx)
# in a browser, this links opens or dowloads an xlsx file
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"
temp <- tempfile()
download.file(likurl, temp)
list.files <- unzip(temp,list=TRUE)
data <- read.xlsx(unz(temp,
+ list.files$Name[8]), sheetIndex=2)
The result from the last step is
Error in +list.files$Name[8] : invalid argument to unary operator
I do not really understand the unz function, but can see this is somehow wrong when reading the help file for unz (I found this suggested solution somewhere online).
I also tried the following, different approach:
library(XLConnect)
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"
tmp = tempfile(fileext = ".xlsx")
download.file(likurl, tmp)
readWorksheetFromFile(tmp, sheet = 2, startRow = 4,
colNames = TRUE, rowNames = FALSE)
with the last line returning as result:
Error: ZipException (Java): invalid entry size (expected 1644 but got 1668 bytes)
I would greatly appreciate any help on how I can open this data and work with it as usual when reading in data from excel into R.
Thanks a lot in advance!

Here's my solution thanks to the hint by #Johnny. Reading the data from excel worked better with read.xlsx from the xlsx-package (instead of read_excel as suggested in the link above).
Some ugly details still remain with how the columns are named (colNames are not passed on correctly, except for the first and 11th column) and how strangely new columns are created from the options passed to read.xlsx (e.g., a column named colNames, with all entries == TRUE; for details, see the output structure with str(LIK.m)). However, these would be for another question and for the moment, they can be fixed in the quick and dirty way :-).
library(httr)
library(foreign)
library(xlsx)
# in a browser, this links opens or dowloads an xlsx file
likurl<-'https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master'
p1f <- tempfile()
download.file(likurl, p1f, mode="wb")
GET(likurl, write_disk(tf <- tempfile(fileext = ".xlsx")))
# annual CPI
LIK.y <- read.xlsx(tf,
sheetIndex = 2, startRow = 4,
colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE ,
na.strings = "NA", check.names = TRUE, fillMergedCells = FALSE)
LIK.y$X. <- as.numeric(LIK.y$X.)
str(LIK.y)
# monthly CPI
LIK.m <- read.xlsx(tf,
sheetIndex = 1, startRow = 4,
colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE ,
na.strings = "NA", check.names = TRUE, fillMergedCells = FALSE)
LIK.m$X. <- as.numeric(LIK.m$X.)
str(LIK.m)

My R code isn't throwing any errors, but it's not doing what it's supposed to

Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
When I run my code, I don't get any errors, but the code isn't going anything. I originally thought that this was a problem with file permissions, but I'm still having the problem after changing them. Not sure what to try next.
Here's the code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
data set and save the reduced copy
del.items <- function(file)
{
data <- read.csv(input = paste0("../data/pecfas|cafas/raw",
str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/raw/",
file), sep = ",", header = TRUE, na.strings = "", stringsAsFactors = FALSE,
skip = 0, colClasses = "character", data.table = FALSE)
data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(T(Initial|[0-
9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$", x = names(data))]
write.csv(data, file = paste0("../data/pecfas|cafas/items-del",
str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/items-
del/", sub(pattern = "ExportData_", x = file, replacement = "")) %>%
tolower, sep = ",", row.names = FALSE, col.names = TRUE)
}
# delete items from all cafas data sets
cafas.files <- list.files("../data/cafas/raw/", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("../data/pecfas/raw/", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}

R write.csv is creating an empty file

Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
Currently, the files are being created in the right directory with the right file name, but the .csv files that are being created are empty.
I am currently getting the following error message:
Warning in
fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS",: Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: (variable names).
This is my code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
data set and save the reduced copy
del.items <- function(file){
data <- fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS",
string = file) %>% tolower, "/raw/", file), sep = ",", header = TRUE,
na.strings = "", stringsAsFactors = FALSE, skip = 0, colClasses =
"character", data.table = FALSE)
data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(TR?(Initial|[0-
9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$", x = names(data))]
write.csv(data, file = paste0("data/", str_match(pattern = "CAFAS|PECFAS",
string = file) %>% tolower, "/items-del/", sub(pattern = "ExportData_", x =
file, replacement = "")) %>% tolower, row.names = FALSE)
}
# delete items from all cafas data sets
cafas.files <- list.files("data/cafas/raw", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("data/pecfas/raw", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}

Avoid repeating statements when importing data

Iv'e written the following code to import data into R:
## specify where all the data files are stored
DataFolder <- "DataFolder"
## obtain the name of each file in DataFolder
files <- list.files(DataFolder)
## obtain name of each file
LocNames <- unique(sub("^([^.]*).*", "\\1", files)) # this removes the extension and keeps the unique names
for (i in 1:length(LocNames)){
#
car <- read.table(paste(DataFolder, paste(LocNames[i], ".car", sep=""), sep="/"),
header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
car <- aggregate(car[colnames(car)[2:length(colnames(car))]],list(dateTime = cut(car$dateTime,breaks = "hour")),mean, na.rm = TRUE)
#
light <- read.table(paste(DataFolder, paste(LocNames[i], ".light", sep=""), sep="/"),
header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
light <- aggregate(light[colnames(light)[2]],list(dateTime = cut(light$dateTime, breaks = "hour")),mean, na.rm = TRUE)
}
So, here I have a DataFolder where all of my files are stored. The files are named according to the location where the data was recorded and the extension of the file given the name of the variable measured. Here we have car sales and light as examples.
From here I would like to reduce the size of the arguments inside of the loop so instead of having to name one variable after the other repeating the same steps I want to only have to write the variable name e.g. car, light and then the outcome of the script shown will be returned.
Please let me know if my intentions have not been clear.

Just use a function. Something to the effect of
## specify where all the data files are stored
DataFolder <- "DataFolder"
## obtain the name of each file in DataFolder
files <- list.files(DataFolder)
readMyFiles <- function(DataFolder, LocNames, extension){
data <- read.table(paste(DataFolder, paste(LocNames[i], ".", extension, sep=""), sep="/"),
header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
data <- aggregate(data[colnames(data)[2:length(colnames(data))]],list(dateTime = cut(data$dateTime,breaks = "hour")),mean, na.rm = TRUE)
data
}
## obtain name of each file
LocNames <- unique(sub("^([^.]*).*", "\\1", files)) # this removes the extension and keeps the unique names
for (i in 1:length(LocNames)){
car <- readMyFiles(DataFolder, LocNames, ".car")
light <- readMyFiles(DataFolder, LocNames, ".light")
}

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R find maxima of multiple variables from multiple .CSV files - r

using your test data, the max() statement works fine for me. Is it related to a discrepancy between the sample code you have posted and your actual csv file structure?

Related

Records less in data frame

Issue downloading and opening xlsx-file from within R

My R code isn't throwing any errors, but it's not doing what it's supposed to

R write.csv is creating an empty file

Avoid repeating statements when importing data

Categories

Resources