Creating a variable using select portions of an Excel column in R

Let's say I have an Excel column with 10 cells containing values. How do I create a variable in R that includes only the first four or first six cells in that column?

This question is quite vague; please provide more information if you need specifics.
First of all, you'll want to use a library to import the contents of the Excel file; I recommend readxl (http://readxl.tidyverse.org).
You can then follow the documentation to read specific ranges from the Excel file, or just import all the contents and trim the resulting tibble, as in the sketch below.
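For instance, readxl's range argument can read just the first few cells of a column directly (a minimal sketch; the file name myData.xlsx and column A are assumptions):
library(readxl)
# Read only cells A1:A4 -- the first four cells of column A
firstFour <- read_excel("myData.xlsx", range = "A1:A4", col_names = FALSE)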

Probably something like this:
# Install the readxl package, which reads Excel spreadsheets
install.packages("readxl")
# Load the readxl package
library(readxl)
# Set the working directory to the folder containing the spreadsheet
setwd("<Insert path here>")
# Read the spreadsheet into memory
myData <- read_excel("myData.xlsx", sheet = 1)
# Subset the first four or first six rows
firstFour <- myData[1:4, ]
firstSix <- myData[1:6, ]
# Or, to pull a single column out as a plain vector:
firstFourCells <- myData[[1]][1:4]
Let me know if you don't understand.

Related

R show variable list/header of Stata or SAS file in R without loading the complete dataset

I am given very big (around 10 GB each) datasets in both SAS and Stata format, which I am going to read into R for analysis.
Is there a way to list the variables (columns) they contain without reading the whole data file? I often need only some of the variables. I can of course view them from File Explorer, but that is not reproducible and takes a lot of time.
Both SAS and Stata are available on the system, but just opening a file might take a minute or so.
If you have SAS, run a PROC CONTENTS or PROC DATASETS to see the details of the dataset without opening it. You may want to do that anyway, so that you can verify variable types, lengths, and formats.
libname myFiles 'path to your sas7bdat files';
proc contents data=myFiles.datasetName;
run;
See below for the .dta solution; as shown after that code, you can adapt it to SAS files with read_sas.
library(haven)
# Read in only the first row of the .dta file
dta_head <- read_dta("my_data.dta", n_max = 1)
# get variable names of dta
dta_names <- names(dta_head)
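The same pattern works for the SAS files; a minimal sketch, assuming a file named my_data.sas7bdat:
library(haven)
# Read only the first row of the sas7bdat file to inspect its variables
sas_head <- read_sas("my_data.sas7bdat", n_max = 1)
sas_names <- names(sas_head)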
After examining the names and labels of your .dta file, you can remove the n_max = 1 option and read the file in full, possibly adding the col_select option to specify the subset of variables you wish to read in, as in the sketch below.
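A sketch of that final read; the variable names age and income are hypothetical:
library(haven)
# Read all rows, but keep only the two variables of interest
dta_subset <- read_dta("my_data.dta", col_select = c(age, income))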

How to import an ASCII text file into R - the NIBRS

Currently, I am trying to import the National Incidence Based Reporting System (NIBRS) 2019 data into R. The data comes in an ASCII text format, and so far I've tried readr::read_tsv and readr::read_fwf. However, I can't seem to import the data correctly - read_tsv shows only 1 column, while read_fwf needs column arguments that I do not understand how to decipher based on the text file.
Here is the link to the NIBRS. I used the Master File Downloads to download the zipped file for the NIBRS in 2019.
My overall goal is to have a typical dataframe/tibble for this data set with column names being the type of crime, and the rows being the number of incidents.
I have seen a few other examples of importing this data through this help page, but their copies of the data only cover up to 2015 (my data needs to range from 2015 to 2019).
Use read.fwf(). Column widths are listed here.
We can use read_fwf with col_positions:
library(readr)
read_fwf(filename, col_positions = fwf_widths(c(2, 5, 10)))
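A slightly fuller sketch with named columns; the widths and names here are illustrative only, and the real record layout comes from the NIBRS codebook:
library(readr)
# Hypothetical layout: 2-char state code, 9-char ORI, 5-char incident count
cols <- fwf_widths(c(2, 9, 5), col_names = c("state", "ori", "incidents"))
nibrs <- read_fwf("nibrs_2019.txt", col_positions = cols)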

R: Writing data frame into excel with large number of rows

I have a data frame (panel form) in R with 194498 rows and 7 columns. I want to write it to an Excel file (.xlsx) using the function res <- write.xlsx(df, output), but R hangs (keeps showing the stop sign at the top left of the console) without making any change to the target file (output). Finally it shows the following:
Error in .jcheck(silent = FALSE) :
Java Exception <no description because toString() failed>.jcall(row[[ir]], "Lorg/apache/poi/ss/usermodel/Cell;", "createCell", as.integer(colIndex[ic] - 1))<S4 object of class "jobjRef">
I have loaded the readxl and xlsx packages. Please suggest how to fix this. Thanks.
Install and load the package named WriteXLS and try writing out your R object using the function WriteXLS(). Make sure the name of your R object is passed in quotes, like "data" below.
# Store your data with 194498 rows and 7 columns in a data frame named 'data'
# Install the WriteXLS package
install.packages("WriteXLS")
# Load the package
library(WriteXLS)
# Write out the R object 'data' to an Excel file named data.xlsx
WriteXLS("data", ExcelFileName = "data.xlsx", row.names = FALSE, col.names = TRUE)
Hope this helped.
This does not answer your question, but might be a solution to your problem.
You could save the file as a CSV instead, like so:
write.csv(df, "df.csv")
Then open the CSV and save it as an Excel file.
I gave up on trying to import/export Excel files with R because of hassles like this.
In addition to Pete's answer, I wouldn't recommend write.csv because it can take minutes to run. I used fwrite() (from the data.table library) and it did the same thing in about 1-2 seconds.
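A minimal sketch of that swap, assuming the data frame is named df:
library(data.table)
# Drop-in replacement for write.csv, typically orders of magnitude faster
fwrite(df, "df.csv")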
The post author asked about large files. I dealt with a table about 2.3 million rows long, and when I opened the output of write.csv (and fwrite) in Excel, everything beyond roughly a million rows was cut off, since Excel can display at most 1,048,576 rows. So instead use write.table(Data, file = "Data.txt"). You can open it in Excel and split the single column by your delimiter (set with the sep argument), and voila!
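A sketch of that approach; the tab separator is an assumption, and any delimiter Excel's text-import wizard understands will do:
# Write a tab-separated text file that Excel can import and split into columns
write.table(Data, file = "Data.txt", sep = "\t", row.names = FALSE)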

Read excel file with formulas in cells into R

I was trying to read an Excel spreadsheet into an R data frame. However, some of the columns contain formulas or are linked to other external spreadsheets. Whenever I read the spreadsheet into R, many cells become NA. Is there a good way to fix this problem so that I can get the original values of those cells?
The R script I used to do the import is like the following:
options(java.parameters = "-Xmx8g")
library(XLConnect)
# Step 1 import the "raw" tab
path_cost = "..."
wb = loadWorkbook(...)
raw = readWorksheet(wb, sheet = '...', header = TRUE, useCachedValues = FALSE)
UPDATE: read_excel from the readxl package looks like a better solution. It's very fast (0.14 sec on the 1400 x 6 file I mentioned in the comments), and it imports the cached results of formulas rather than the formulas themselves. It doesn't use Java, so there is no need to set any Java options.
library(readxl)
# sheet can be a string (name of the sheet) or an integer (position of the sheet)
raw = read_excel(file, sheet = sheet)
For more information and examples, see the short vignette.
ORIGINAL ANSWER: Try read.xlsx from the xlsx package. The help file implies that by default it evaluates formulas before importing (see the keepFormulas parameter). I checked this on a small test file and it worked for me. Formula results were imported correctly, including formulas that depend on other sheets in the same workbook and formulas that depend on other workbooks in the same directory.
One caveat: If an externally linked sheet has changed since the last time you updated the links on the file you're reading into R, then any values read into R that depend on external links will be the old values, not the latest ones.
The code in your case would be:
options(java.parameters = "-Xmx8g") # must come before library(xlsx), which starts the JVM
library(xlsx)
# Replace file and sheetName with appropriate values for your file
# keepFormulas = FALSE and header = TRUE are the defaults; shown only for illustration
raw = read.xlsx(file, sheetName = sheetName, header = TRUE, keepFormulas = FALSE)

Convert .xlsm to .xlsx in R

I would like to convert an Excel file (say its name is Jimmy) that is saved as a macro-enabled workbook (Jimmy.xlsm) to Jimmy.xlsx.
I need this to be done in a coding environment; I cannot simply change it by opening the file in Excel and saving it under a different file type. I am currently programming in R. If I use the function
file.rename("Jimmy.xlsm", "Jimmy.xlsx")
the file becomes corrupted.
In your framework you have to read in the sheet and write it back out. Suppose you have an XLSM file (with macros, I presume) called "testXLSMtoX.xlsm" containing one sheet with tabular columns of data. This will do the trick:
library(xlsx)
# Read the first sheet; this returns a data frame
r <- read.xlsx("testXLSMtoX.xlsm", 1)
# Use the first column of the spreadsheet as row names, then drop that column;
# otherwise you get an extra column of row index numbers in the output
r2w <- data.frame(r[-1], row.names = r[, 1])
# Write the sheet back out as .xlsx
write.xlsx(r2w, "testXLSMtoX.xlsx")
The macros will be stripped out, of course.
That's an answer but I would question what you are trying to accomplish. In general it is easier to control R from Excel than Excel from R. I use REXCEL from http://rcom.univie.ac.at/, which is not open source but pretty robust.
Here is a function that converts XLSM files to XLSX files with the R package RDCOMClient (Windows only, since it drives Excel over COM):
library(RDCOMClient)
convert_XLSM_File_To_XLSX <- function(path_XLSM_File, path_XLSX_File)
{
  # Start an invisible Excel instance and suppress its dialog boxes
  xlApp <- COMCreate("Excel.Application")
  xlApp[["Visible"]] <- FALSE
  xlApp[["DisplayAlerts"]] <- FALSE
  # Open the .xlsm workbook and save it in .xlsx format
  xlWbk <- xlApp$Workbooks()$Open(path_XLSM_File)
  xlWbk$SaveAs(path_XLSX_File, 51) # 51 = xlOpenXMLWorkbook (.xlsx, macros stripped)
  xlWbk$Close()
  xlApp$Quit()
}
convert_XLSM_File_To_XLSX(path_XLSM_File, path_XLSX_File)
