Hi,
I want to import JSON files from a folder into an R data frame (as a single matrix). I have about 40,000 JSON files, each containing one observation, and the number of variables differs between files.
I tried the following code:
library(rjson)
# list every JSON file in the "mydata" folder
jsonresults_all <- list.files("mydata", pattern = "\\.json$", full.names = TRUE)
# parse each file into one element of a list
myJSON <- lapply(jsonresults_all, function(x) fromJSON(file = x))
myJSONmat <- as.data.frame(myJSON)
I want my data frame to have about 40,000 observations (rows) and some 175 variables (columns), with variables that are missing from a file filled in as NA.
But instead I get a single row, with each observation appended to the right.
Many thanks for your suggestion.
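One way to get one row per file is to bind the parsed lists together row-wise and let missing variables become NA. A minimal sketch, assuming each file parses to a flat named list of scalar values (nested JSON would need flattening first) and reusing the myJSON list from above:
library(data.table)
# bind the ~40,000 one-observation lists row-wise;
# fill = TRUE inserts NA for variables absent from a given file
myJSONdf <- rbindlist(myJSON, fill = TRUE)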
Sorry for the terrible title. First post here, and new to R.
I am trying to import data from multiple CSV files, extract a single row from each CSV into its own data frame, and then build a new data frame from a specific value in each of those data frames. I hope this makes sense.
Here is the code I have used so far:
# Take downloaded IFD csv's for 15 points, extract 1% AEP, 6 hour rainfall depths.
files <- list.files(path = "C:PATH")
for (i in 1:length(files)) { # Head of for-loop, length is 15 files
  assign(paste0("data", i), # Read and store data frames for row containing 6 hour depths
         read.csv2(paste0("C:PATH", files[i]), sep = ",", header = FALSE, nrows = 1, skip = 26))
}
# Final value in data frame: position [1,9] is the 1% AEP depth for 6 hours.
# Extract all of these values from the initial 15 data frames into a new data frame.
for (i in 1:15) {
  SixHourOnePercentAEP[i] <- data[i][1,9]
}
In the last loop, an error is returned when calling data[i][1,9], since data[i] is not a valid way to refer to the i-th data frame. Looking for a way around this.
It seems that you are trying to create data frames such as data1, data2, etc. for each corresponding file, and then to access the i-th data frame with the syntax data[i].
But that's not how it works: data is not an array of data frames; instead you have separate variables named data1, data2, etc. What you need is to access each variable by name. You can do it this way:
SixHourOnePercentAEP <- numeric(15)  # pre-allocate the result vector
for (i in 1:15) {
  SixHourOnePercentAEP[i] <- get(paste0("data", i))[1, 9]
}
The get() function returns the variable whose name is passed to it as a character string.
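For example (assuming data1 is one of the data frames created by the assign() call above):
get("data1")   # returns the same object as typing data1 directly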
Your code is, however, quite inefficient. Why gather all the entire data frames beforehand when the only thing you need is one cell from each? If I understand your purpose correctly, you should rewrite your first loop to extract the desired value from each file immediately and store it, discarding the rest of the data right away, as in the sketch below.
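A minimal sketch of that idea, reusing your "C:PATH" placeholder and assuming (as in your comment) that the row read with skip = 26 holds the 6-hour depths, with the 1% AEP value in column 9:
files <- list.files(path = "C:PATH")
SixHourOnePercentAEP <- numeric(length(files))
for (i in seq_along(files)) {
  # read only the single row of interest from each file
  row6h <- read.csv2(paste0("C:PATH", files[i]), sep = ",", header = FALSE,
                     nrows = 1, skip = 26)
  SixHourOnePercentAEP[i] <- row6h[1, 9]
}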
I'm just learning R. I have 300 different files containing rainfall data. I want to create a function that takes a range of values (e.g., 20-40). It will then read CSV files named "020.csv", "021.csv", "022.csv", etc. up to "040.csv".
Each of these files has a variable named "rainfall". I want to open each csv file, extract the "rainfall" values and store (append) them to some sort of object, like a data frame (maybe something else is better?). So, when I'm done, I'll have a data frame or list with a single column containing rainfall data from all processed files.
This is what I have...
rainfallValues <- function(id = 1:300) {
  df <- data.frame()
  # Read anywhere from 1 to 300 files
  for (i in id) {
    # Form a file name
    fileName <- sprintf("%03d.csv", i)
    # Read the csv file which has four variables (columns). I'm interested in
    # a variable named "rainfall".
    x <- read.csv(fileName, header = TRUE)
    # This is where I am stuck. I know how to extract the "rainfall" variable values from
    # x, I just don't know how to append them to my data frame.
  }
}
Here is a method using lapply that will return a list of rainfall vectors:
rainList <- lapply(id, function(i) {
  temp <- read.csv(sprintf("%03d.csv", i))
  temp$rainfall
})
To put this into a single vector:
rainVec <- unlist(rainList)
The unlist() function preserves the order in which the files were read, so the first element of rainVec will be the first observation of the rainfall column from the first file in id, the second element will be the second observation from that file, and so on through the last observation of the last file.
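If you also want to record which file each value came from, one option (a sketch, reusing id and rainList from above) is to build a two-column data frame:
rainDF <- data.frame(
  file     = rep(sprintf("%03d.csv", id), times = lengths(rainList)),
  rainfall = unlist(rainList)
)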
I am extracting data from multiple CSV files and attempting to combine them into a single data frame. The source data is formatted weirdly, so I have to extract the data from specific locations in the source, then place them in a logical pattern in my resulting data frame.
I created two vectors of equal length and pulled the data from my source files. The end result is that I wind up with two vectors of length 3 (as expected), but instead of having a 3x2 data frame (3 observations of 2 variables), I wind up with a 1x6 data frame (1 observation of 6 variables).
What is curious to me is that although RStudio deems them both to be "List of 3", when I print them in the console they display very differently.
The source code which doesn't work:
#set the working directory to where the data files are stored
setwd("/foo")
# identify how many data files are present
files = list.files("/foo")
# create vectors long enough to contain all the postal codes and income data
postalCodeData=vector(length=length(files))
medianIncomeData=vector(mode="character", length=length(files))
# loop through all the files, pulling data from rows 2 and 1585.
for (i in 1:length(files)) {
  x = read.csv(files[i], skip = 1, nrows = 1, header = F)
  y = read.csv(files[i], skip = 1584, nrows = 1, header = F)
  postalCodeData[i] = x
  medianIncomeData[i] = y[2]
}
#create the data frame
Results=data.frame(postalCodeData,medianIncomeData)
#name the columns
names(Results)=c("FSA", "Median Income")
My data frame winds up with one row and six columns rather than three rows and two columns.
Source code which does work:
setwd("/Users/Perry/Downloads/Postal Code Data/")
files = list.files("/Users/Perry/Downloads/Postal Code Data/")
postalCodeData=c("K0A","K0B","K0C")
medianIncomeData=c("10000","20000","30000")
Results=data.frame(postalCodeData,medianIncomeData)
names(Results)=c("FSA", "Median Income")
Unfortunately, I can't specify the values explicitly because I have a few hundred files to extract the information from. Any advice on how I can correct the loop to get the desired results would be appreciated.
The output of "read.csv" is a data frame, so, when you store
medianIncomeData[i]=y[2]
you are storing a column of a data frame, use
medianIncomeData[i]=y[2][1]
instead, to store only the value that you want, the same for x
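A sketch of the corrected loop, assuming the postal code sits in the first column of the row-2 read and the median income in the second column of the row-1585 read (adjust the column indices to match your files):
postalCodeData   <- character(length(files))
medianIncomeData <- character(length(files))
for (i in 1:length(files)) {
  x <- read.csv(files[i], skip = 1, nrows = 1, header = FALSE)
  y <- read.csv(files[i], skip = 1584, nrows = 1, header = FALSE)
  postalCodeData[i]   <- as.character(x[[1]])   # single value, not a data frame
  medianIncomeData[i] <- as.character(y[[2]])
}
Results <- data.frame(postalCodeData, medianIncomeData)
names(Results) <- c("FSA", "Median Income")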
I usually read a bunch of .csv files into a list of data frames and then name their columns manually:
# ...code for creating the list named "datos" from the files in the folder
# Naming the columns of the data frames
names(datos$v1r1)<-c("estado","tiempo","x1","x2","y1","y2")
names(datos$v1r2)<-c(...)
names(datos$v1r3)<-c(...)
I want to do this renaming operation automatically. To do so, I created a data frame with the names I want for each of the data frames in my datos list.
Here is how I generate this data frame:
pru <- rbind(c("UT", "TR", "UT+", "TR+"),
             c("UT", "TR", "UT+", "TR+"),
             c("TR", "UT", "TR+", "UT+"),
             c("TR", "UT", "TR+", "UT+"))
vec <- paste("v1r", seq(1, 20, 1), sep = "")
tor <- paste("v1s", seq(1, 20, 1), sep = "")
nombres <- do.call("rbind", replicate(10, pru, simplify = FALSE))
nombres_df <- data.frame(corrida = c(vec, tor), nombres)
Because nombres_df$corrida[1] is v1r1, I have to name the datos$v1r1 columns ("estado","tiempo", nombres_df[1,2:5]), and so on for the other 40 elements.
I want to do this renaming automatically. I was thinking I could use something that uses regular expressions.
Just for the record, I don't know why, but the order of the list of data frames is not the same as the 1:20 sequence (by this I mean that 10 comes before 2, 3, 4, ...).
Here's a toy example of a list with a similar structure but fewer and shorter data frames.
toy <- list(a = as.data.frame(replicate(6, 1:5)), b = as.data.frame(replicate(6, 10:14)))
You have a data frame where the variable corrida holds the name of the data frame to be renamed and the remaining columns are the desired variable names for that data frame. You could use a loop to do all the renaming operations:
for (i in seq_len(nrow(nombres_df))) {
  # look up the target data frame by name and assign its column names
  names(datos[[as.character(nombres_df$corrida[i])]]) <-
    c("estado", "tiempo", unlist(nombres_df[i, 2:ncol(nombres_df)]))
}
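Applied to the toy list above, the same pattern would look like this (a sketch, with toy_names standing in for a matching two-row name table):
toy_names <- data.frame(corrida = c("a", "b"),
                        rbind(c("UT", "TR", "UT+", "TR+"),
                              c("TR", "UT", "TR+", "UT+")))
for (i in seq_len(nrow(toy_names))) {
  names(toy[[as.character(toy_names$corrida[i])]]) <-
    c("estado", "tiempo", unlist(toy_names[i, 2:ncol(toy_names)]))
}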
I have approximately 300 CSV files with columns for wind speed, temperature, pressure, etc., and each row is a different time from 2007 to 2012. Each file is from a different location. I want to combine all files into one that is the average of all 300 files. So the new file would have the same number of rows and columns as each individual file, but each cell would be the corresponding average across the 300 files. Is there an easy way to do this?
Following this post, you could read all the files into a list (here I've assumed they're named weather*.csv):
csvs <- lapply(list.files(pattern = "^weather.*\\.csv$"), read.csv)
All that remains is to take the average of all those data frames. You might try something like:
Reduce("+", csvs) / length(csvs)
If you wanted to average only a subset of the columns, you could pass Reduce a list of data frames containing just those columns. For instance, if you wanted to remove the first column from each, you could do something like:
Reduce("+", lapply(csvs, "[", -1)) / length(csvs)