Importing csv data to data frame not working - r

I am trying to load data from a CSV file into a data frame. This is what I do:
input <- read.csv("CONCAT_RESULT.CSV", sep = ",", skip = 1, col.names = c("ABS_ERG","MEHRFACH_COUNTER","TECH_KEY","XX_KEY","YY_SCHLUESSEL","CCC","LAND","HIERARCHIE_STICHTAGSABZUG","DATUM_SST_ERZEUGUNG","UHRZEIT_SST_ERZEUGUNG","FRUEHESTES_ABZUGSDATUM","AGGR_KLASSE_ID","ANTWORT_NUM","ANTWORT_TEXT","UMFRAGETYP_ID","ZZZ_ID","TTT_ID","BEANTWORTUNG_TYP","TRANSFORMIERT"))
In the next step I remove a few columns:
input["HIERARCHIE_STICHTAGSABZUG"] <- NULL
input["DATUM_SST_ERZEUGUNG"] <- NULL
input["UHRZEIT_SST_ERZEUGUNG"] <- NULL
input["FRUEHESTES_ABZUGSDATUM"] <- NULL
input["ANTWORT_TEXT"] <- NULL
Then I try to convert it to a data.frame with:
input.data <- as.data.frame(input)
But typeof(input.data) returns: [1] "list"
Can anybody tell me why?
Thanks

A data.frame is a list of vectors of the same length. Thus, list is a correct type for a data.frame.
Try
typeof(data.frame(a=1, b=2, c=3))
to see that a data.frame is just a list. To learn more, see help(mode) and help(data.frame).
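The distinction between storage type and class can be checked directly. The sketch below uses a small inline CSV (made up for illustration, not the asker's file) to show that read.csv already returns a data frame, so the as.data.frame step is not needed, and that class() or is.data.frame() are the checks to use rather than typeof().

```r
# Inline toy CSV; stands in for CONCAT_RESULT.CSV (assumption for illustration)
input <- read.csv(text = "x,y\n1,2\n3,4")

storage_type <- typeof(input)        # how the object is stored internally
object_class <- class(input)         # what kind of object it behaves as
is_df        <- is.data.frame(input) # the usual test for a data frame

print(storage_type)
print(object_class)
print(is_df)
```

Removing columns with `input["SOME_COLUMN"] <- NULL` keeps the object a data frame, so the same checks apply after that step.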

Related

How do I write a R dataframe to a csv file when every row has its own dataframe?

I have a dataframe where the rows all have their own dataframes. When I use the write.csv() function to save this dataframe into a csv file, I receive the following error:
Error in write.table(staff, "Chiefs of Staff.csv", col.names = NA, sep = ",", :
unimplemented type 'list' in 'EncodeElement'
Here is the code I used
chiefs_of_staff<-jsonlite::fromJSON("http://www.infogo.gov.on.ca/infogo/v1/individuals/search?&keywords=chief%20of%20staff&topOrgId=0&locale=en&_=1569503878383")
staff<-chiefs_of_staff$individuals
write.csv(staff,'Chiefs of Staff.csv')
Any help would be much appreciated.
The following code does what the question asks for.
The problem is complicated by the fact that some of the dataframes in staff[[1]] or staff$assignments have more than 1 row and therefore the dataframe resulting from their rbinding has more than 49 rows.
Also, I have substituted underscores for the spaces in the output filename.
chiefs_of_staff <- jsonlite::fromJSON("http://www.infogo.gov.on.ca/infogo/v1/individuals/search?&keywords=chief%20of%20staff&topOrgId=0&locale=en&_=1569503878383")
staff <- chiefs_of_staff$individuals
assignments <- do.call(rbind, staff[[1]])
assignments$positionTitle <- gsub('<.*>', '', assignments$positionTitle)
assignments$positionTitle <- trimws(assignments$positionTitle)
l <- sapply(staff[[1]], nrow)
n <- nrow(staff[-1])
tmp <- lapply(seq_len(n), function(k){
sapply(staff[k, -1], rep, l[k])
})
tmp <- do.call(rbind, tmp)
out <- cbind(assignments, tmp)
write.csv(out,'Chiefs_of_Staff.csv')
rm(tmp, l, n) # final clean up
You have to convert the parsed JSON into a format that write.csv can work with: calling rbind on your list produces a matrix that can be written to CSV.
staff_csv <- do.call("rbind", staff)
write.csv(staff_csv,'Chiefs of Staff.csv')
The assignments column is a list of data frames, and there are a number of ways to handle this. Here is one:
staff$assignments = as.character(staff$assignments)
write.csv(staff,'Chiefs_of_Staff.csv')
That will work.
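For readers without access to the API, here is a minimal sketch of the same idea on toy data. The data frame `staff_toy` and its column names are hypothetical stand-ins for `staff`; the approach shown (repeat each outer row once per nested row, then bind the nested frames) is one possible way to flatten a list column, not the only one.

```r
# Toy stand-in for `staff`: one ordinary column plus a list column of data frames
staff_toy <- data.frame(name = c("A", "B"), stringsAsFactors = FALSE)
staff_toy$assignments <- list(
  data.frame(title = "Chief", stringsAsFactors = FALSE),
  data.frame(title = c("Chief", "Advisor"), stringsAsFactors = FALSE)
)

# Number of rows in each nested data frame
n_rows <- vapply(staff_toy$assignments, nrow, integer(1))

# Repeat each outer row once per nested row, then attach the nested columns
flat <- cbind(
  staff_toy[rep(seq_len(nrow(staff_toy)), n_rows), "name", drop = FALSE],
  do.call(rbind, staff_toy$assignments)
)
rownames(flat) <- NULL

# The flattened frame has no list columns, so write.csv can handle it
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

Converting the list column with as.character, as above, keeps one row per person but stores each nested data frame as a single string; flattening keeps the nested values in their own columns at the cost of repeated rows.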

R - extracting column in dataframes of a loop

I need to save a list of csv files and extract the values from the 13th row onward of a specific column (the second one) from each of the dataframes.
Here's my try:
temp <- list.files(FILEPATH, pattern="*\\.csv$", full.names = TRUE)
for (i in 1:length(temp)){
assign(temp[i], read.csv(temp[i], header=TRUE, skip=13, na.strings=c("", "NA")))
subset(temp[i], select=2) #extract the second column of the dataframe
temp[i] <- na.omit(temp[i])
}
However, this doesn't work. On the one hand, I think that's because of the skip argument of the read.csv command, as it apparently ignores the headers. On the other hand, if skip is not used, the following error pops up:
Error in subset.default(temp[i], select = 2) : argument "subset" is
missing, with no default
When I insert the argument subset=TRUE in the subset command, it doesn't give any error, but no extraction is performed.
Any possible solution?
Without seeing the files it's not easy to tell, but I would use lapply, not a for loop. Maybe you can get inspiration from something like the following. I use read.table because you skip 13 lines and read.csv would then read the first remaining line as column headers. Note that I avoid the use of assign.
df_list <- lapply(temp, read.table, sep = ",", skip = 13, na.strings = c("", "NA"))
names(df_list) <- temp
col2_list <- lapply(df_list, `[[`, 2)
col2_list <- lapply(col2_list, na.omit)
names(col2_list) <- temp
col2_list
If you want col2_list to be a list of data frames with just one column each (column 2 of the original files), then, as I said in a comment, use
col2_list <- lapply(df_list, `[`, 2)
And to rename that one column and renumber the rows consecutively:
new_name <- "the_column_of_choice" # change this!
col2_list <- lapply(col2_list, function(x){
names(x) <- new_name
row.names(x) <- NULL
x
})
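To try the approach without the original files, the sketch below writes two throwaway files into a temporary directory and then applies the same steps. The file names and contents are invented for illustration: 13 lines of filler followed by three data rows.

```r
# Create two hypothetical CSV files: 13 filler lines, then the data rows
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
for (k in 1:2) {
  lines <- c(paste("filler line", 1:13), "1,10", "2,NA", "3,30")
  writeLines(lines, file.path(tmp_dir, paste0("file", k, ".csv")))
}

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Skip the first 13 lines of each file; the remaining lines have no header
df_list <- lapply(files, read.table, sep = ",", skip = 13,
                  na.strings = c("", "NA"))
names(df_list) <- basename(files)

# Pull column 2 out of each data frame as a plain vector and drop the NAs
col2 <- lapply(df_list, function(d) as.vector(na.omit(d[[2]])))
print(col2)
```

Because the header line is among the skipped lines, read.table assigns default column names (V1, V2, ...), which is why the column is selected by position.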

use name of dataframe on a list of dataframes

I am trying to solve a problem from a question I posted previously, "looping inside list in r".
Is there a way to get the name of a dataframe that is on a list of dataframes?
I have listed a series of dataframes, and I want to apply myfunction to each of them. But I do not know how to get the name of each dataframe in order to pass it as the nameofprocesseddf argument of myfunction.
Here is the way I get the list of my dataframes and the code I got until now. Any suggestion how I can make this work?
library(missForest)
library(dplyr)
myfunction <- function (originaldf, proceseddf, nonproceseddf, nameofprocesseddf=character){
NRMSE <- nrmse(proceseddf, nonproceseddf, originaldf)
comment(nameofprocesseddf) <- nameofprocesseddf
results <- as.data.frame(list(comment(nameofprocesseddf), NRMSE))
names(results) <- c("Dataset", "NRMSE")
return(results)
}
a <- data.frame(value = rnorm(100), cat = c(rep(1,50), rep(2,50)))
da1 <- data.frame(value = rnorm(100,4), cat2 = c(rep(2,50), rep(3,50)))
dataframes <- dir(pattern = ".txt")
list_dataframes <- llply(dataframes, read.table, header = T, dec=".", sep=",")
n <- length(dataframes)
# Here is where I do not know how to get the name of the `i` dataframe
for (i in 1:n){
modified_list <- llply(list_dataframes, myfunction, originaldf = a, nonproceseddf = da1, proceseddf = list_dataframes[i], nameofprocesseddf = names(list_dataframes[i]))
write.table(file = sprintf("myfile/%s_NRMSE20%02d.txt", dataframes[i]), modified_list[[i]], row.names = F, sep=",")
}
As a matter of fact, the name of a data frame is not an attribute of the data frame. It's just an expression used to call the object. Hence the name of the data frame is indeed 'list_dataframes[i]'.
Since I assume you want to name each data frame after its text file, without the extension, I propose you use something like this (it requires the stringr library):
nameofprocesseddf = substr(dataframes[i],start = 1,stop = str_length(dataframes[i])-4)
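An alternative that avoids stringr is to name the list elements once, right after reading the files, and then loop over the names. This is only a sketch: the file names and the toy data frames below are hypothetical, and tools::file_path_sans_ext (from the tools package shipped with R) is used in place of the substr call.

```r
# Hypothetical file names, as dir(pattern = ".txt") might return them
files <- c("imputed_mean.txt", "imputed_knn.txt")

# Strip the directory and the extension to get a label for each data frame
df_names <- tools::file_path_sans_ext(basename(files))

# Toy data frames standing in for the contents of the files
list_dataframes <- setNames(
  list(data.frame(value = 1:3), data.frame(value = 4:6)),
  df_names
)

# Inside the loop, the name and the data frame are both available
for (nm in names(list_dataframes)) {
  cat(nm, "has", nrow(list_dataframes[[nm]]), "rows\n")
}
```

Note that `list_dataframes[[nm]]` returns the data frame itself, whereas single brackets would return a one-element list.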

Loop through data frames and select one column in each of the data frame inside the loop using R

I would like to loop through a data frame and select an individual column from the data frame, for this I use the following code, but it gives me an error. Could someone please guide me what should be corrected in this code?
for (i in 1:3) {
cur_file <- paste(i,".csv",sep="")
curfile <- list.files(pattern = cur_file)
rd_data[i] <- read.csv(curfile, header=F,sep="\t")
col1 <- rd_data[i,1] # select the first column in the "1st" data frame
n_val[i] <- rd_data[i,2] # select the second column in the each of "ith" data frame
}
You can do this without the for loop entirely:
files <- list.files(pattern='*.csv')
dat <- lapply(files, read.csv, header=FALSE, sep='\t') # apply read.csv to each element of files
col_1_list <- lapply(dat, '[', 1) # use the [ function, see ?"[" for more info.
n_val_list <- lapply(dat, '[', 2)
Also, your code:
col1 <- rd_data[i,1] # select the first column in the "1st" data frame
will select the first column of each data.frame, not just the first.
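Once the files are in a list, a column of one particular data frame is reached with double brackets. The sketch below uses two small in-memory data frames (invented values) in place of the files read by lapply.

```r
# Toy list standing in for `dat`, the result of lapply(files, read.csv, ...)
dat <- list(
  data.frame(V1 = c(1, 2), V2 = c(10, 20)),
  data.frame(V1 = c(3, 4), V2 = c(30, 40))
)

# First column of the first data frame only, as a vector
first_col_of_first <- dat[[1]][[1]]

# Second column of every data frame, as a list of vectors
second_cols <- lapply(dat, `[[`, 2)

print(first_col_of_first)
print(second_cols)
```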

Read xts from CSV file in R

I'm trying to read time series from CSV file and save them as xts to be able to process them with quantmod. The problem is that numeric values are not parsed.
CSV file:
name;amount;datetime
test1;3;2010-09-23 19:00:00.057
test2;9;2010-09-23 19:00:00.073
R code:
library(xts)
ColClasses = c("character", "numeric", "character")
Data <- read.zoo("c:\\dat\\test2.csv", index.column = 3, sep = ";", header = TRUE, FUN = as.POSIXct, colClasses = ColClasses)
as.xts(Data)
Result:
name amount
2010-09-23 19:00:00 "test1" "3"
2010-09-23 19:00:00 "test2" "9"
See amount column contains character data but expected to be numeric. What's wrong with my code?
The internal data structure of both zoo and xts is matrix, so you cannot mix data types.
Just read in the data with read.table:
Data <- read.table("file.csv", sep=";", header=TRUE, colClasses=ColClasses)
I notice your data have subseconds, so you may be interested in xts::align.time. This code will take Data and create one object with a column for each "name" by seconds.
NewData <- do.call( merge, lapply( split(Data,Data$name), function(x) {
align.time( xts(x[,"amount"],as.POSIXct(x[,"datetime"])), n=1 )
}) )
If you want to create objects test1 and test2 in your global environment, you can do something like:
lapply( split(Data,Data$name), function(x) {
assign(x[,"name"], xts(x[,"amount"],as.POSIXct(x[,"datetime"])),envir=.GlobalEnv)
})
You cannot mix numeric and character data in a zoo or xts object; however, if the name column is not intended to be time series data but rather is intended to distinguish between multiple time series, one for test1, one for test2, etc. then you can split on column 1 using split=1 to cause such splitting as shown in the following code. Be sure to set the digits.secs or else you won't see the sub-seconds on output (although they will be there in any case):
options(digits.secs = 3)
z <- read.zoo("myfile.csv", sep = ";", split = 1, index = 3, header = TRUE, tz = "")
x <- as.xts(z)
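Both answers rest on the fact that a matrix holds a single type. This can be seen in base R without zoo or xts: combining a character column and a numeric column into a matrix coerces everything to character, which is what happened to the amount column. The values below are copied from the question's CSV; the variable names are only for illustration.

```r
# Binding a character and a numeric vector into a matrix coerces to character
m <- cbind(name = c("test1", "test2"), amount = c(3, 9))
print(typeof(m))
print(m[, "amount"])

# Keeping the names out of the matrix, e.g. by splitting on them,
# leaves the amounts numeric
by_name <- split(c(3, 9), c("test1", "test2"))
print(by_name)
```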
