I have a data frame with somewhat difficult-to-understand feature names, so I have an attribute that gives a fuller description of each column. However, if I write the data frame out using write.csv, the attribute doesn't get included in the file. The function write.arff does write something called an "attribute" to the file, but that's just additional column information like data type and factor labels/levels. When reading the file back into R with read.arff, this additional info doesn't appear to come back in with the rest of the data.
Is there a way to write a data frame to a standard file type (CSV, XML, fixed-width text, tab-separated) with its attributes intact?
I understand that there are functions like save and dput, but I need the body of the data to be readable by other programs too, like Excel, KNIME, or SPSS. I'll probably use the attributes only in R, but if possible I'd like to keep them in the same file as the data. Thanks!
Here's an example of the attribute:
> attr(hipparcos, 'desc')
[1] "Identifier (HIP number) (H1)" "*[HT] Proximity flag (H2)"
[3] "? Magnitude in Johnson V (H5)" "*[1,3]? Coarse variability flag (H6)"
[5] "*[GHT] Source of magnitude (H7)" "*? alpha, degrees (ICRS, Epoch=J1991.25) (H8)"
[7] "*? delta, degrees (ICRS, Epoch=J1991.25) (H9)" "*[*+A-Z] Reference flag for astrometry (H10)"
[9] "? Trigonometric parallax (H11)" "*? Proper motion mu_alpha.cos(delta) ICRS (H12)"
Correction: write.arff does not write the attribute to the file - it writes the data type, factor labels, and column name, and calls those "attributes". My mistake!
In case it helps someone, I have used the following code:
library(rlist)
desc <- attr(hipparcos, 'desc')  # avoid calling this `file`, which shadows base::file
list.save(desc, 'your/pathway/file.yaml')
The YAML file can be opened with a text editor.
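For completeness, here's a round-trip sketch (file names are placeholders): keep the CSV readable by Excel/KNIME/SPSS, store the attribute in a YAML sidecar, and reattach it when reading back into R.

library(rlist)

# Data stays in a plain CSV that Excel/KNIME/SPSS can read.
write.csv(hipparcos, "hipparcos.csv", row.names = FALSE)
# The attribute travels alongside it in a YAML sidecar file.
list.save(attr(hipparcos, "desc"), "hipparcos_desc.yaml")

# Later, back in R: read both and reattach the attribute.
hipparcos2 <- read.csv("hipparcos.csv")
attr(hipparcos2, "desc") <- unlist(list.load("hipparcos_desc.yaml"))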
I'm having problems with editing a JSON file and saving the results in a usable form.
My starting point is: modify-json-file-using-some-condition-and-create-new-json-file-in-r
In fact I want to do something even simpler and it still doesn't work! I'm using the jsonlite package.
An equivalent sample would look like this ...
$Apples
$Apples$Origin
$Apples$Origin$Id
[1] 2615
$Apples$Origin$season
[1] "Fall"
$Oranges
$Oranges$Origin
$Oranges$Origin$Id
[1] 2615
$Oranges$Origin$airportLabel
[1] "Orange airport"
$Oranges$Shipping
$Oranges$Shipping$ShipperId
[1] 123
$Oranges$Shipping$ShipperLabel
[1] "Brighter Orange"
I read the file, make some changes, and save the resulting file back to disk. Nothing simpler, right?
json_list = read_json(path = "../documents/dummy.json")
json_list$Apples$Origin$Id = 1234
json_list$Oranges$Origin$Id = 4567
json_list$Oranges$Shipping$ShipperLabel = "Suntan Blue"
json_modified <- toJSON(json_list, pretty = TRUE)
write_json(json_modified, path = "../documents/dummy_new.json")
json_list appears as character format under the RStudio file type column.
json_modified appears as json format under the RStudio file type column.
Why this difference?
Now if I use the original file it works, but the modified file fails. The JSON format checks out and I can't see any errors.
The real file is bigger than the example above but the method I've used is the same.
Am I doing something wrong in the way I edit or save the file?
I'm really new to JSON and this is really frustrating!
Any ideas?
Thanks
In the absence of reproducible data, I can diagnose at least one potential problem.
Background
Within the jsonlite package, there exist functions that are mutual inverses:
jsonlite::fromJSON() converts from raw text (in JSON format) to R objects.
jsonlite::toJSON() converts from R objects to raw text (in JSON format).
Now this raw text (txt) might be
a JSON string, URL or file
As for jsonlite::read_json() and jsonlite::write_json(), they are also a pair of mutual inverses, which are like the former pair
except [that] they explicitly distinguish between path and literal input, and do not simplify by default.
That is, the latter are simply designed to handle file(path)s rather than strings of raw text.
So toJSON(fromJSON(txt = ...)) should return unchanged the text passed to txt, just as write_json(read_json(path = ...)) should write a file identical to that passed to path.
In short, toJSON() belongs with fromJSON(); while write_json() belongs with read_json().
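If you want to convince yourself of the pairing, here is a toy round trip (the re-serialized text may differ in whitespace, but the parsed structure is stable):

library(jsonlite)

txt <- '{"x": [1, 2]}'
parsed <- fromJSON(txt)                      # text -> R object
identical(fromJSON(toJSON(parsed)), parsed)  # TRUE: the pair round-trips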
The Problem
However, you have added a spurious step by mingling toJSON() with read_json() and write_json():
json_list = read_json(...)
# ...
json_modified <- toJSON(json_list, ...) # SPURIOUS STEP
# ...
write_json(json_modified, ...)
You see, write_json() already converts "to JSON", so toJSON() is wholly unnecessary. Indeed, toJSON() actually sabotages the process, since its textual return value is passed on (in json_modified) to write_json(), which expects (a structure of) R objects rather than text.
The Fix
Once you're done modifying json_list, just go straight to writing it:
json_list = read_json(path = "../documents/dummy.json")
json_list$Apples$Origin$Id = 1234
# Further modifications...
write_json(json_list, path = "../documents/dummy_new.json", pretty = TRUE)
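Optionally, sanity-check the result by parsing the new file back (paths as in your question):

# Read the written file back and confirm the edit survived the round trip.
check <- read_json(path = "../documents/dummy_new.json")
stopifnot(check$Apples$Origin$Id == 1234)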
I'm working with data from many different sources, so I'm creating a name bridge and a function to make it easier to join tables. One of the sources uses an umlaut in a value, and (I think) the Excel CSV isn't UTF-8 encoded, so I'm getting strange results.
Since I can't control how the other source compiles their data, I'd like to make a universal function that fixes all the weird encoding rules. I'll use Dennis Schröder as an example name.
One particular source uses the umlaut, and when I read it in with read.csv and view the table in RStudio, it shows up as Dennis Schr<f6>der. However, if I index that particular value in the table (table[i,j]), the console reads Dennis Schr\xf6der.
So in my name-bridge CSV, I made a row to map all Dennis Schr\xf6der to Dennis Schroder. I read this name bridge in (with allowEscapes = TRUE), and he shows up exactly the same in my name-bridge table. Great! I should be able to left_join this to the other source to change the name to just Dennis Schroder.
But unfortunately the names still don't map unless I don't trim strings (I have to trim strings in general because other sources introduce white space). Here's the general function I use to fix names: dataframe is the other source's table, VarUse is the name column I want to fix from dataframe, and correctionTable is my name bridge.
library(dplyr)
library(stringr)

nameUpdate <- dataframe %>%
  mutate(name = str_trim(VarUse, side = 'both')) %>%
  left_join(correctionTable, by = c('name' = 'WrongName'))
When I dig into the results of this mapping, I get the following:
correctionTable[14,1] is my name-bridge input of "Dennis Schr\xf6der".
nameUpdate[29,3] is the original name variable from the other source which reads "Dennis Schr\xf6der".
nameUpdate[29,19] is the mutated name variable from the other source after using str_trim, which also reads "Dennis Schr\xf6der".
However, for some reason the str_trim version is not equal to the name-bridge value, so it won't map.
In writing this (non-reproducible, sorry) example, I've figured out a workaround by using a combination of str_trim and not using it, but at this point I'm just confused about why the name doesn't get fixed after I use str_trim. The values look exactly the same.
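For reference, here's the kind of universal cleaner I had in mind; it assumes the stray bytes are latin1 (where \xf6 is ö) and normalizes the encoding on both sides of the join before trimming:

library(stringr)

fix_name <- function(x) {
  # Assumption: the mis-encoded strings are latin1; convert them to UTF-8
  # so both join keys agree byte-for-byte, then trim as before.
  x <- iconv(x, from = "latin1", to = "UTF-8")
  str_trim(x, side = "both")
}

# Apply to both the source column and the bridge, e.g.:
# dataframe       <- mutate(dataframe, name = fix_name(VarUse))
# correctionTable <- mutate(correctionTable, WrongName = fix_name(WrongName))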
attach.files = c(paste("/users/joesmith/nosection_", currentDate, ".csv", sep = ""),
                 paste("/users/joesmith/withsection_", currentDate, ".csv", sep = ""))
Basically, if I wrote it out manually like
c("nosection_051418.csv", "withsection_051418.csv")
it would work fine, but since I'm automating this to run every day, I can't do that.
I'm trying to attach files to an automated email, but when I structure it like this, it doesn't work. How can I rewrite this so that the character vector works?
I thought your example implied the need for "parallel" inputs for the path stem, the first portion of the file name, and the date portion of those full paths. Consider this illustration of using a two-item vector and a one-item vector (produced by Sys.Date, replacing your currentDate) to populate the %s positions in that sprintf string (suggested by @Gregor):
sprintf("/users/joesmith/%s_%s.csv", c("nosection", "withsection"), Sys.Date() )
[1] "/users/joesmith/nosection_2018-05-14.csv" "/users/joesmith/withsection_2018-05-14.csv"
I have the data in the following form:
<Status>Active Leave Terminated</Status>
<date>05/06/2014 09/10/2014 01/10/2015</date>
I want to transform it into the following form:
<status>Active</status>
<date>05/06/2014</date>
<status>Leave</status>
<date>09/10/2014</date>
<status>Terminated</status>
<date>01/10/2015</date>
Please help me with the query to retrieve the data as specified above.
Well, you have a string and want to split it at the whitespace. That's what tokenize() is for, and \s matches a whitespace character. To get the corresponding date you can get the current position in the for loop using the at keyword. Together it looks something like this (note that I assume the input data is the current context item):
let $dates := tokenize(date, "\s+")
for $status at $pos in tokenize(Status, "\s+")
return (
  <status>{$status}</status>,
  <date>{$dates[$pos]}</date>
)
You did not indicate whether your data is on the file system or already loaded into MarkLogic. It's also not clear if this is something you need to do once on a small set of data or on an on-going basis with a lot of data.
If it's on the file system, you can transform it as it is being loaded. For instance, MarkLogic Content Pump can apply a transformation during load.
If you have already loaded the content and you want to transform it in place, you can use Corb2.
If you have a small amount of data, then you can just loop across it using Query Console.
Regardless of how you apply the transformation code, dirkk's answer shows how you need to change it. If you are updating content already in your database, you'll xdmp:node-delete() the original Status and date elements and xdmp:node-insert-child() the new ones.
Setting:
I have (simple) .csv and .dat files created by laboratory devices and other programs that store information on measurements or calculations. I have found solutions for other languages, but not for R.
Problem:
Using R, I am trying to extract values to quickly display results without opening the created files. Here I have two typical settings:
a) I need to read a priori unknown values after known key words
b) I need to read lines after known key words or lines
I can't make functions such as scan() and grep() work.
c) Finally, I would like to loop over dozens of files in a folder and get a summary (to make the picture complete: I will manage this part myself).
I would appreciate any form of help.
OK, it works for the key value (although it's perhaps not very elegant):
variable <- scan("file.csv", what = character(), sep = "")
This returns a character vector of everything in the file.
variable[grep("keyword", ks)+2] # + 2 as the actual value is stored two places ahead
This returns the sought values as character strings.
as.numeric(gsub(",", ".", variable))  # gsub is vectorised, so lapply is unnecessary
For completeness: the values had to be converted to numbers, and the ","-vs-"." decimal-separator problem needed to be solved.
In one line:
data <- as.numeric(gsub(",", ".", variable[grep("Ks_Boden", variable) + 2]))
Perseverance is not too bad an asset ;-)
The rest isn't finished yet; I will post it once it is.
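In the meantime, here's a rough sketch for part c), reusing the keyword logic above (the folder name and file pattern are placeholders):

# Extract the value stored two fields after a keyword from one file.
read_key_value <- function(path, keyword = "Ks_Boden") {
  tokens <- scan(path, what = character(), sep = "", quiet = TRUE)
  as.numeric(gsub(",", ".", tokens[grep(keyword, tokens) + 2]))
}

# Loop over all .csv/.dat files in a folder and collect the values.
files <- list.files("data_folder", pattern = "\\.(csv|dat)$", full.names = TRUE)
summary_values <- sapply(files, read_key_value)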