I've got a data structure as shown below:
It seems to be a data frame with metadata. I was able to manually build the data frame for this example with
d <- data.frame(a1 = x$value$value[1], a2 = x$value$value[2], a3 = x$value$value[3])
a <- x$attributes
colnames(d) <- a$names$value
However, I wonder if this is some sort of standard exchange format and if there is a more general solution to read the embedded data into a variable?
EDIT
The data structure came from an RDX2 (RData) file that contains a JSON string:
load("data.json")
x=fromJSON(data_json)
The JSON structure contains the same data:
To answer my own question: the above is the result of serializing a data frame with
rlist::list.serialize(data, "data.json")
to a JSON file. Afterwards this file was read back as plain text, the text was converted using rjson::fromJSON, and the resulting R data structure was written as-is to another file. Instead,
data <- rlist::list.unserialize("data.json")
should have been used.
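For reference, a minimal sketch of the intended round trip (assuming d is the data frame built above; list.serialize()/list.unserialize() are the rlist helpers):
library(rlist)
list.serialize(d, "data.json")        # write d as JSON, attributes included
d2 <- list.unserialize("data.json")   # read it back into an equivalent object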
Related
In R, when a data frame is filtered, for example, are any changes made to the source data frame? What are best practices for preserving the original data frame?
Okay, I do not understand exactly what you mean, but if you have a .csv file, for example "example.csv", in your working directory and you create an R object (example) from it, the original .csv file stays intact.
The example object, however, changes whenever you apply functions or filters to it. The easiest way to preserve the original data frame is to assign the result of those functions to a differently named object (e.g. example2).
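A minimal sketch of that (some_col is a hypothetical column name):
example <- read.csv("example.csv")
example2 <- example[example$some_col > 0, ]  # filter into a new object
# 'example' is untouched; only 'example2' holds the filtered rows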
You may also save intermediate results as another data frame, or write them out for preservation:
library(dplyr)

mtcars1 <- mtcars %>%
  select(mpg, cyl, hp, vs)
# Save one object to a file
saveRDS(mtcars1, file = "my_data.rds")
# Restore the object (readRDS returns the object, so assign it)
mtcars1 <- readRDS(file = "my_data.rds")
# Save multiple objects
save(mtcars, mtcars1, file = "multi_data.RData")
# Restore multiple objects (load() recreates them under their original names)
load("multi_data.RData")
I have created a dataset that consists of 574 rows and 85 columns. The data type is a list. I want to export this data to CSV so that I can perform some analysis. I tried converting the list to a data frame with dataFrame <- as.data.frame(Data). I also looked for other commands, but was not able to convert the list to a data frame or any other format. My goal is to export the data to a CSV file.
This image is a preview of the dataset:
This image shows that the data type is a list of dimension 574 × 85:
You can try the write.csv() function on your list (this works if every element of the list has the same length):
write.csv(my_list, "a.csv")
It will be saved automatically in your working directory.
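If the direct call fails, here is a sketch of the usual two-step route (your_list is a hypothetical list of equal-length columns):
your_df <- as.data.frame(your_list)            # columns must be of equal length
write.csv(your_df, "a.csv", row.names = FALSE)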
Please provide the format of your list; I'm not sure whether the answer below is useful for you or not. Note that it uses Python with pandas rather than R.
If your list is as below:
all_data_list = [[1,2,3],[1,4,5],[1,5,6],...]
you have to do (after import pandas as pd):
df = pd.DataFrame(all_data_list)
df.to_csv("a.csv", index=False)  # then write it out as CSV
I am trying to figure out how to 'download' data into a nice CSV file so that I can analyse it.
I am currently looking at WHO data here:
I am following the documentation and getting output like so:
test_data <- jsonlite::parse_json(url("http://apps.who.int/gho/athena/api/GHO/WHS6_102.json?profile=simple"))
head(test_data)
This gives me a rather messy list of lists of lists.
For example, I get this:
It is not very easy to analyse and rather messy. How could I clean this up, keeping only selected columns from what parse_json returns, say the dim fields REGION, YEAR, and COUNTRY, plus the values from the Value column? I would like to turn this into a tidy data frame / CSV file so I can more easily understand what is happening.
Can anyone give any advice?
jsonlite::fromJSON gives you the data in a better format, and the third element of the list is where the main data is:
url <- 'https://apps.who.int/gho/athena/api/GHO/WHS6_102.json?profile=simple'
tmp <- jsonlite::fromJSON(url)
data <- tmp[[3]]
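From there, a sketch of the flattening step (assuming, as the simple-profile output suggests, that the fact table has a nested dim data frame with REGION, YEAR and COUNTRY columns next to a flat Value column):
fact <- tmp[[3]]
flat <- data.frame(
  region  = fact$dim$REGION,
  year    = fact$dim$YEAR,
  country = fact$dim$COUNTRY,
  value   = fact$Value
)
write.csv(flat, "who_whs6_102.csv", row.names = FALSE)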
I want to export fake.bc.Rdata from the package "qtl" to a CSV. Running summary() shows that this is an object of class "cross", which is why I fail to convert it. I also tried to use resave, but there is a warning: cannot coerce class ‘c("bc", "cross")’ to a data.frame.
Thank you all for your help in advance!
CSV stands for comma-separated values, and is not suitable for all kinds of data.
It requires, as indicated in the comments, clear columns and rows.
Take this JSON as an example:
{
  "name": "John",
  "age": 30,
  "likes": ["Walking", "Running"]
}
If you were to represent this in CSV format, how would you deal with the difference in length? One way would be to repeat the data:
name,age,likes
John,30,Walking
John,30,Running
But that doesn't really look right. Even if you merge the two values into one field, you would still have trouble reading the data back, e.g.
name,age,likes
John,30,Walking/Running
Thus, CSV is best suited for tidy data.
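For illustration, a sketch of how the repeated-row form could be produced in R (using jsonlite; the data.frame() call recycles the length-one fields):
library(jsonlite)

x <- fromJSON('{"name":"John","age":30,"likes":["Walking","Running"]}')
tidy <- data.frame(name = x$name, age = x$age, likes = x$likes)
write.csv(tidy, "john.csv", row.names = FALSE)
# john.csv then contains the two repeated rows shown above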
TL;DR
Can your data be represented tidily as comma-separated values, or should you be looking at alternative forms of exporting your data?
EDIT:
It appears you do have some options:
If you look at the reference, you have the option to export your data using write.cross().
For your data, you could use write.cross(fake.bc, "csv", "myCrossData", c(1,5,13)). It then does the following:
Comma-delimited formats: a single csv file is created in the formats "csv" or "csvr". Two files are created (one for the genotype data and one for the phenotype data) for the formats "csvs" and "csvsr"; if filestem="file", the two files will be named "file_gen.csv" and "file_phe.csv".
I'm writing a script to plot data from multiple files. Each file is named using the same format, where the strings between "." give some info on what is in the file. For example: SITE.TT.AF.000.52.000.001.002.003.WDSD_30.csv.
These data will be from multiple sites, so SITE, WDSD_30, or any other string may differ depending on where the data is from, though its position in the file name will always indicate a specific feature such as location or measurement.
So far I have each file read into R and saved as a data frame named the same as the file. I'd like to get something like the following to work: if there is a data frame in the global environment that contains WDSD_30, then plot a specific column from that data frame. The column will always have the same name, so I could write plot(WDSD_30$meas), and no matter what site's files were loaded in the global environment, the script would find the WDSD_30 file and plot the meas variable. My goal for this script is to be able to point it to any folder containing files from a particular site, and no matter what the site, the script will be able to read in the data and find files containing the variables I'm interested in plotting.
A colleague suggested I try using strsplit() to break up the file name and extract the element I want to use, then use that to rename the data frame containing that element. I'm stuck on how exactly to do this or whether this is the best approach.
Here's what I have so far:
site.files <- basename(list.files(pattern = "\\.csv$", recursive = TRUE, full.names = FALSE))
sfsplit <- lapply(site.files, function(x) strsplit(x, ".", fixed = TRUE)[[1]])
for (i in seq_along(site.files)) assign(site.files[i], read.csv(site.files[i]))
for (i in seq_along(site.files)) {
  if (grepl("PARQL", sfsplit[[i]][10])) {
    assign("PARQL", get(site.files[i]))   # data frame gets the short name PARQL
  } else if (grepl("IRBT", sfsplit[[i]][10])) {
    assign("IRBT", get(site.files[i]))    # data frame gets the short name IRBT
  }
}
...and so on for each data frame I'd like to eventually plot from. Is this a good approach, or is there some better way? I'm also unclear on how to refer to these renamed data frames without using the entire file name as it was read into R. Is there something like data.frame[1] to refer generically to the first data frame in the global environment?
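For what it's worth, a sketch of an alternative that avoids assign() entirely: keep the data frames in a named list, keyed by the 10th dot-separated token of each file name (assuming the measurement code is always in that position):
site.files <- list.files(pattern = "\\.csv$", recursive = TRUE)
keys <- vapply(basename(site.files),
               function(x) strsplit(x, ".", fixed = TRUE)[[1]][10],
               character(1))
dat <- setNames(lapply(site.files, read.csv), keys)
plot(dat[["WDSD_30"]]$meas)   # works for any site that has a WDSD_30 file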