Read a file if it hasn't been read before in R - r

I am writing an R script (not a function, just a collection of commands) that will read a CSV file into a data frame. The CSV file is large and I don't want to read it every time I run the script if it has already been read. This is how I am checking if the variable exists:
if (!exists("df")) {
  df <- read_csv(file = "./some_file.csv")
}
However, every time I run the script, the read_csv() function runs, no matter whether df exists or not.
What am I missing here? Should I specify where the df data frame should be searched for?
Edit: Here is a bit of context on what I am trying to achieve. Usually I work interactively in R or RStudio. If I am reading a file, I read it, the data ends up in the global environment, and I play with it from there. I was trying to put all my work into a script and add to it step by step. At the beginning of the script, I read this CSV file, which is about 11MB, and then start manipulating the data. However, as I add new lines to my script and want to test them, I don't want to read the CSV file again: it has already been read and the data frame is available in the global environment. That was the reason I put the call to read_csv() inside an if statement.
The problem is that despite the variable existing in the global environment, every time I run the script the read_csv() function runs, as if the if statement were ignored.

df is actually a function in the stats package, so exists("df") is TRUE even before your script has created anything :-)
So basically, just choose a better variable name!
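To illustrate, here is a minimal sketch. By default exists() also searches enclosing environments, including attached packages, which is why it finds stats::df; setting inherits = FALSE restricts the lookup to the current (here, global) environment. The file name is a placeholder, and base read.csv is used to keep the sketch dependency-free:

```r
exists("df")                    # TRUE: finds the density function stats::df
exists("df", inherits = FALSE)  # FALSE in a fresh session: only checks this environment

# Either rename the variable...
if (!exists("my_data")) {
  my_data <- read.csv("./some_file.csv")
}

# ...or keep the name but ignore inherited bindings:
if (!exists("df", inherits = FALSE)) {
  df <- read.csv("./some_file.csv")
}
```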

Try using the where and envir arguments of exists(). These arguments control in which place/environment the variable is looked up:
exists(x, where = -1, envir = as.environment(where), frame, mode = "any",
       inherits = TRUE)
Here x is the variable name, as a string.

Related

How to use a file modified by a R chunk in a Python one

I am working in R Markdown with primarily R chunks, which I use to modify data frames. Now that they are ready, a colleague gave me Python code to process some of the data. But when transitioning from an R chunk to a Python one, the environment changes and I do not know how to access the previous data frames directly.
reticulate::repl_python()
biodata_file = women_personal_data
NameError: name 'women_personal_data' is not defined
Ideally, I would like not to have to save the files on my computer between R and Python, and then back in R again, to avoid accumulating files that are not completely clean yet (because I figured that could be a solution).
I tried this solution, but it seems not to work with data frames.
Thanks!
biodata_file = r.women_personal_data
The r. prefix makes Python fetch the object from the R session, where the variable was created in an R chunk as women_personal_data.
TIP: to come back to R, the Python object is available as py$women_personal_data.
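Putting the two prefixes together, here is a minimal round-trip sketch run from a plain R session (assuming reticulate is installed, and pandas on the Python side for data frame conversion; the data is made up for illustration):

```r
library(reticulate)

women_personal_data <- data.frame(id = 1:3, age = c(24, 31, 28))

# In a Python chunk this line would simply read:
#   biodata_file = r.women_personal_data
py_run_string("biodata_file = r.women_personal_data")

# Back in R, the Python object is reachable with the py$ prefix:
head(py$biodata_file)
```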

Saving R dataframe from script

It should be very simple, but for now cannot figure it out. Say I create a generic dataframe with some data in a loop, let's call it df.
Now I want to assign a specific name to it and want to save it to specific destination. I generate two character variables - filename and file_destination and try to use the following in the script code:
assign(filename, df)
save(filename, file = file_destination)
Of course this saves just a string with the name in the file, not the actual data.
How do I save the data frame created via assign(filename, df)?
Try save(list=filename,file=file_destination). Also, use better names for your variables. filename for an object which is not a file name is very odd.
Put this as an answer, so other people can find it easily.
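A minimal end-to-end sketch of the suggested fix (object and file names are made up for illustration). The key point is that save(list = ...) treats the string as the *name* of the object to save, whereas save(filename, ...) would save the string itself:

```r
df <- data.frame(a = 1:3, b = 4:6)
filename <- "my_results"                # name the object should have
file_destination <- "my_results.RData"  # where to save it

assign(filename, df)                           # creates an object called my_results
save(list = filename, file = file_destination) # saves the object named by the string

# In a later session:
load(file_destination)  # restores a data frame called my_results
```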

Using R to write a .mat file not giving the right output?

I had a .csv file that I wanted to read into Octave (originally tried to use csvread). It was taking too long, so I tried to use R to workaround: How to read large matrix from a csv efficiently in Octave
This is what I did in R:
forest_test=read.csv('forest_test.csv')
library(R.matlab)
writeMat("forest_test.mat", forest_test_data=forest_test)
and then I went back to Octave and did this:
forest_test = load('forest_test.mat')
This is not giving me a matrix, but a struct. What am I doing wrong?
To answer your exact question, you are using the load function wrong. You must not assign its output to a variable if you just want the variables in the file to be inserted into the workspace. From Octave's load help text:
If invoked with a single output argument, Octave returns data
instead of inserting variables in the symbol table. If the data
file contains only numbers (TAB- or space-delimited columns), a
matrix of values is returned. Otherwise, 'load' returns a
structure with members corresponding to the names of the variables
in the file.
With examples, following our case:
## inserts all variables in the file in the workspace
load ("forest_test.mat");
## each variable in the file becomes a field in the forest_test struct
forest_test = load ("forest_test.mat");
But still, the link you posted about Octave being slow with CSV files makes reference to Octave 3.2.4, which is quite an old version. Have you confirmed this is still the case in a recent version (the last release was 3.8.2)?
There is a function designed to convert data frames to matrices:
?data.matrix
forest_test=data.matrix( read.csv('forest_test.csv') )
library(R.matlab)
writeMat("forest_test.mat", forest_test_data=forest_test)

How to read a file into a data frame and print some columns in R

I have a question about reading a file into a data frame using R.
I don't understand "getwd" and "setwd" — must we call these before reading the files?
I also need to print some of the columns in the data frame, and only need to print 1 to 30. How can I do this?
Kind regards
getwd tells you what your current working directory is. setwd changes your working directory to a specified path. See the relevant documentation by typing ?getwd or ?setwd in your R console.
Using these allows you to shorten what you type into, e.g., read.csv, by specifying just a file name without its full path, like:
setwd('C:/Users/Me/Documents')
read.csv('myfile.csv')
instead of:
read.csv('C:/Users/Me/Documents/myfile.csv')
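As for printing only some of the columns (the second part of the question), standard indexing does it. A hedged sketch, assuming 'myfile.csv' lives in the working directory and has at least 30 columns; "1 to 30" could mean either columns or rows, so both are shown:

```r
dat <- read.csv('myfile.csv')

print(dat[, 1:30])  # all rows, columns 1 to 30
print(dat[1:30, ])  # rows 1 to 30, all columns
```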

R load script objects to workspace

This is a rookie question that I cannot seem to figure out. Say you've built an R script that manipulates a few data frames. You run the script, it prints out the result. All is well. How is it possible to load objects created within the script to be used in the workspace? For example, say the script creates data frame df1. How can we access that in the workspace? Thanks.
Here is the script...simple function just reads a csv file and computes diff between columns 2 and 3...basically I would like to access spdat in workspace
mspreaddata <- function(filename) {
  # read csv file
  rdat <- read.csv(filename, header = TRUE, sep = ",")
  # compute spread value: column 2 - column 3
  spdat$sp <- rdat[, 2] - rdat[, 3]
}
You should use the source function.
i.e. use source("script.R")
EDIT:
Check the documentation for further details. It'll run the script you call. The objects will then be in your workspace.
Alternatively you can save those objects using save and then load them using load.
When you source that, the function mspreaddata becomes available in your workspace, but spdat is never created: you are just defining a function, not running it. And even when you do run it, spdat would only exist inside the function, not in any environment external to it. You should add something like
newObject <- mspreaddata("filename.csv")
Then you can access newObject
EDIT:
It is also the case that spdat is never created inside your function, so the assignment spdat$sp <- rdat[, 2] - rdat[, 3] is itself an error. Simply use return(rdat[, 2] - rdat[, 3]) instead.
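Putting both fixes together, a corrected version of the function might look like this (a sketch; "filename.csv" is just a placeholder for your actual file):

```r
mspreaddata <- function(filename) {
  # read the csv file
  rdat <- read.csv(filename, header = TRUE, sep = ",")
  # return the spread between columns 2 and 3
  return(rdat[, 2] - rdat[, 3])
}

# source()-ing the script only defines the function; call it to get a result:
spdat <- mspreaddata("filename.csv")
```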
