How to save a data frame in R

How to save a data frame in R - r

According to the answer to this question, you can save a data frame "foo" in R with the save() function as follows:
save(foo,file="data.Rda")
Here is data frame "df":
> str(df)
'data.frame': 1254 obs. of 2 variables
$ text : chr "RT #SchmittySays: I love this 1st grade #science teacher from #Duluth http://t.co/HWDYFnIyqV #NSTA15 #AlbertEinstein #inspirat"| __truncated__ "RT #KVernonBHS: #smrtgrls would love Stellar Girls. Empowering female scientists rocks! #NSTA15 http://t.co/1ZU0yjVF67" "RT #leducmills: Leaving #SXSWedu to go straight to #NSTA15. There should be some sort of arbitrary conference-hopper social med"| __truncated__ "RT #KRScienceLady: Congrats to a wonderful colleague who helped #ngss Bcome reality, Stephen Pruitt, Distinguished Service to "| __truncated__ ...
$ group: Factor w/ 2 levels "narst","nsta": 2 2 2 2 2 2 2 2 2 2 ...
It seems to save fine:
> save(df, file = "~/downloads/df.Rda")
But it turns out only the name of the object saved:
> df1 <- load("~/downloads/df.Rda")
> str(df1)
chr "df"
I tried the saveRDS() function suggested in another answer to the same question referenced above which worked fine, but I'd like to know why save() isn't working.

You might want to take a look at this question here: R data formats: RData, Rda, Rds etc.
When loading an .rda object, you are going to load all objects with their original names to the global environment. You can't assign objects to new names using load as you tried to do.
If you want to save objects that can be loaded with different names later, then you should use the .rds format (saveRDS and readRDS). If you want to save more than one object in a .rds file, the simplest solution is to put all of them on a list and save only the list. If after reading the .rds you want to put the objects of the list in the global environment, you can use list2env.

Related

Google Places API and R -- calling 2nd column in a data frame returns six separate columns

I'm trying to store the results of a data frame I retrieved from a list via the Google Places API. My call to the API...
library(googleway)
HAVE_PLACES <- google_places(search_string = "grocery store",
location = c(35.4168, -80.5883),
radius = 10000, key = key)
...returns a list object HAVE_PLACES:
The third object in this list - results - is a data frame with one observation for each location retrieved in the API call. When I call View(HAVE_PLACES$results), I get what looks like a set of vectors - as I expect when looking at a data frame...
...But it looks like the data frame includes data frames:
WHAT IS GOING ON HERE?
More specifically:
How can a data frame contain data frames, and why does View() show the nested data frames as it would vectors?
When working with data of this type, where you want the columns you're seeing in View() to simply be vectors - for manipulation and exporting purposes - are there any best practices? I'm about to convert each vector of this alleged data frame called geometry into separate objects, and cbind() the results to the HAVE_PLACES$results. But this feels insane.

Akrun is right (as usual!). A data.frame can have lists as 'columns'. This is normal behaviour.
Your question seems to be a more general question about how to extract nested list data in R, but using Google's API response as an example. Given you're using googleway (I'm the author of the pacakge), I'm answering it in the context of Google's response. However, there are numerous other answers and examples online about how to work with lists in R.
Explanation
You're seeing the nested lists in your results because the data returned from Google's API is actually JSON. The google_places() function 'simplifies' this to a data.frame using jsonlite::fromJSON() internally.
If you set simplify = F in the function call you can see the raw JSON output
library(googleway)
set_key("GOOGLE_API_KEY")
HAVE_PLACES_JSON <- google_places(search_string = "grocery store",
location = c(35.4168, -80.5883),
radius = 10000,
simplify = F)
## run this to view the JSON.
jsonlite::prettify(paste0(HAVE_PLACES_JSON))
You'll see the JSON can contain many nested objects. When converted to an R data.frame these nested objects are returned as list columns'
If you're not familiar with JSON it may be worth a bit of research to see what it's all about.
Extracting Data
I've written some functions to extract useful pieces of information from the API responses which may be of help here
locations <- place_location(HAVE_PLACES)
head(locations)
# lat lng
# 1 35.38690 -80.55993
# 2 35.42111 -80.57277
# 3 35.37006 -80.66360
# 4 35.39793 -80.60813
# 5 35.44328 -80.62367
# 6 35.37034 -80.54748
placenames <- place_name(HAVE_PLACES)
head(placenames)
# "Food Lion" "Food Lion" "Food Lion" "Food Lion" "Food Lion" "Food Lion"
However, note that you will still get some list objects returned, because, in this case, a 'location' can have many 'types'
placetypes <- place_type(HAVE_PLACES)
str(placetypes)
# List of 20
# $ : chr [1:5] "grocery_or_supermarket" "store" "food" "point_of_interest" ...
# $ : chr [1:5] "grocery_or_supermarket" "store" "food" "point_of_interest" ...
# $ : chr [1:5] "grocery_or_supermarket" "store" "food" "point_of_interest" ...
# $ : chr [1:5] "grocery_or_supermarket" "store" "food" "point_of_interest" ...
Summary
With Google's API responses you will have to extract the specific data elemets you want and construct them into your required object
df <- cbind(
place_name(HAVE_PLACES)
, place_location(HAVE_PLACES)
, place_type(HAVE_PLACES)[[1]] ## only selecting the 1st 'type'
)
head(df)
# place_name(HAVE_PLACES) lat lng place_type(HAVE_PLACES)[[1]]
# 1 Food Lion 35.38690 -80.55993 grocery_or_supermarket
# 2 Food Lion 35.42111 -80.57277 store
# 3 Food Lion 35.37006 -80.66360 food
# 4 Food Lion 35.39793 -80.60813 point_of_interest
# 5 Food Lion 35.44328 -80.62367 establishment
# 6 Food Lion 35.37034 -80.54748 grocery_or_supermarket

R readr package - written and read in file doesn't match source

I apologize in advance for the somewhat lack of reproducibility here. I am doing an analysis on a very large (for me) dataset. It is from the CMS Open Payments database.
There are four files I downloaded from that website, read into R using readr, then manipulated a bit to make them smaller (column removal), and then stuck them all together using rbind. I would like to write my pared down file out to an external hard drive so I don't have to read in all the data each time I want to work on it and doing the paring then. (Obviously, its all scripted but, it takes about 45 minutes to do this so I'd like to avoid it if possible.)
So I wrote out the data and read it in, but now I am getting different results. Below is about as close as I can get to a good example. The data is named sa_all. There is a column in the table for the source. It can only take on two values: gen or res. It is a column that is actually added as part of the analysis, not one that comes in the data.
table(sa_all$src)
gen res
14837291 822559
So I save the sa_all dataframe into a CSV file.
write.csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv',
row.names = FALSE)
Then I open it:
sa_all2 <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
table(sa_all2$src)
g gen res
1 14837289 822559
I did receive the following parsing warnings.
Warning: 4 parsing failures.
row col expected actual
5454739 pmt_nature embedded null
7849361 src delimiter or quote 2
7849361 src embedded null
7849361 NA 28 columns 54 columns
Since I manually add the src column and it can only take on two values, I don't see how this could cause any parsing errors.
Has anyone had any similar problems using readr? Thank you.
Just to follow up on the comment:
write_csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv')
sa_all2a <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
Warning: 83 parsing failures.
row col expected actual
1535657 drug2 embedded null
1535657 NA 28 columns 25 columns
1535748 drug1 embedded null
1535748 year an integer No
1535748 NA 28 columns 27 columns
Even more parsing errors and it looks like some columns are getting shuffled entirely:
table(sa_all2a$src)
100000000278 Allergan Inc. gen GlaxoSmithKline, LLC.
1 1 14837267 1
No res
1 822559
There are columns for manufacturer names and it looks like those are leaking into the src column when I use the write_csv function.

R Refer to (part of) data frame using string in R

I have a large data set in which I have to search for specific codes depending on what i want. For example, chemotherapy is coded by ~40 codes, that can appear in any of 40 columns called (diag1, diag2, etc).
I am in the process of writing a function that produces plots depending on what I want to show. I thought it would be good to specify what I want to plot in a input data frame. Thus, for example, in case I only want to plot chemotherapy events for patients, I would have a data frame like this:
Dataframe name: Style
Name SearchIn codes PlotAs PlotColour
Chemo data[substr(names(data),1,4)=="diag"] 1,2,3,4,5,6 | red
I already have a function that searches for codes in specific parts of the data frame and flags the events of interest. What i cannot do, and need your help with, is referring to a data frame (Style$SearchIn[1]) using codes in a data frame as above.
> Style$SearchIn[1]
[1] data[substr(names(data),1,4)=="diag"]
Levels: data[substr(names(data),1,4)=="diag"]
I thought perhaps get() would work, but I cant get it to work:
> get(Style$SearchIn[1])
Error in get(vars$SearchIn[1]) : invalid first argument
enter code here
or
> get(as.character(Style$SearchIn[1]))
Error in get(as.character(Style$SearchIn[1])) :
object 'data[substr(names(data),1,5)=="TDIAG"]' not found
Obviously, running data[substr(names(data),1,5)=="TDIAG"] works.
Example:
library(survival)
ex <- data.frame(SearchIn="lung[substr(names(lung),1,2) == 'ph']")
lung[substr(names(lung),1,2) == 'ph'] #works
get(ex$SearchIn[1]) # does not work

It is not a good idea to store R code in strings and then try to eval them when needed; there are nearly always better solutions for dynamic logic, such as lambdas.
I would recommend using a list to store the plot specification, rather than a data.frame. This would allow you to include a function as one of the list's components which could take the input data and return a subset of it for plotting.
For example:
library(survival);
plotFromSpec <- function(data,spec) {
filteredData <- spec$filter(data);
## ... draw a plot from filteredData and other stuff in spec ...
};
spec <- list(
Name='Chemo',
filter=function(data) data[,substr(names(data),1,2)=='ph'],
Codes=c(1,2,3,4,5,6),
PlotAs='|',
PlotColour='red'
);
plotFromSpec(lung,spec);
If you want to store multiple specifications, you could create a list of lists.

Have you tried using quote()
I'm not entirely sure what you want but maybe you could store the things you're trying to get() like
quote(data[substr(names(data),1,4)=="diag"])
and then use eval()
eval(quote(data[substr(names(data),1,4)=="diag"]), list(data=data))
For example,
dat <- data.frame("diag1"=1:10, "diag2"=1:10, "other"=1:10)
Style <- list(SearchIn=c(quote(data[substr(names(data),1,4)=="diag"]), quote("Other stuff")))
> head(eval(Style$SearchIn[[1]], list(data=dat)))
diag1 diag2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6

How to load CSV as factor in R

I have a file called metadata.csv that I want to load into R and convert to a factor.
I begin with:
metadata <- read.csv(file="metadata.csv", header=T, stringsAsFactors=T)
And this loads the CSV just fine. I've printed out metadata here:
> metadata
Filename Genre Date Gender
1 Austen_Emma.txt Social Early Female
2 Bronte_Eyre.txt Social Middle Female
3 Dickens_Expectations.txt Social Late Male
4 Eliot_Mill.txt Social Late Female
5 Lewis_Monk.txt Gothic Early Male
6 Radcliffe_Italian.txt Gothic Early Female
7 Shelley_Frankenstein.txt Gothic Middle Female
8 Stoker_Dracula.txt Gothic Late Male
9 Thackeray_Vanity.txt Social Middle Male
10 Trollope_Vicar.txt Social Middle Male
Now I want to convert it to a factor:
as.factor(metadata)
This gives me the following error:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

metadata is a dataframe which is a special type of list made up of vectors of equal length. You can only use as.factor() on vectors. Therefore you must class as.factor() on each vector in the dataframe. This can be done using the lapply function:
metadata <- data.frame(lapply(metadata, factor))
This will convert each column to a factor (check this by class(metadata[, 1])). The overall structure of metadata will still be a dataframe.

read.csv puts data into a data.frame
You cannot convert a data.frame into a factor. That's very basic R stuff.
It's like you're trying to change of a bunch of .doc files into PDFs by converting your computer into a PDF. It just doesn't make sense.
The error is asking "Have you called sort on a list?" Yes, you have. as.factor calls sort, and your data.frame is a list.

Creating a vector from a file in R

I am new to R and my question should be trivial. I need to create a word cloud from a txt file containing the words and their occurrence number. For that purposes I am using the snippets package.
As it can be seen at the bottom of the link, first I have to create a vector (is that right that words is a vector?) like bellow.
> words <- c(apple=10, pie=14, orange=5, fruit=4)
My problem is to do the same thing but create the vector from a file which would contain words and their occurrence number. I would be very happy if you could give me some hints.
Moreover, to understand the format of the file to be inserted I write the vector words to a file.
> write(words, file="words.txt")
However, the file words.txt contains only the values but not the names(apple, pie etc.).
$ cat words.txt
10 14 5 4
Thanks.

words is a named vector, the distinction is important in the context of the cloud() function if I read the help correctly.
Write the data out correctly to a file:
write.table(words, file = "words.txt")
Create your word occurrence file like the txt file created. When you read it back in to R, you need to do a little manipulation:
> newWords <- read.table("words.txt", header = TRUE)
> newWords
x
apple 10
pie 14
orange 5
fruit 4
> words <- newWords[,1]
> names(words) <- rownames(newWords)
> words
apple pie orange fruit
10 14 5 4
What we are doing here is reading the file into newWords, the subsetting it to take the one and only column (variable), which we store in words. The last step is to take the row names from the file read in and apply them as the "names" on the words vector. We do the last step using the names() function.

Yes, 'vector' is the proper term.
EDIT:
A better method than write.table would be to use save() and load():
save(words. file="svwrd.rda")
load(file="svwrd.rda")
The save/load combo preserved all the structure rather than doing coercion. The write.table followed by names()<- is kind of a hassle as you can see in both Gavin's answer here and my answer on rhelp.
Initial answer:
Suggest you use as.data.frame to coerce to a dataframe an then write.table() to write to a file.
write.table(as.data.frame(words), file="savew.txt")
saved <- read.table(file="savew.txt")
saved
words
apple 10
pie 14
orange 5
fruit 4

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex