I am creating a way to read SPSS labels into R. Using library(sjPlot) and view_spss(df, useViewer = FALSE) I can create a local HTML page such as http://localhost:11773/session/file1e0c67270a5.html that shows a nice table with columns for the variable names and the labels I am looking for.
Now I want to use rvest to scrape it, but when I start with a command such as page <- rvest::html("http://localhost:11773/session/file1e0c67270a5.html"), R just seems to get stuck.
I've tried searching for "connect with local host" but I can't seem to find any questions or answers related to the R package.
This doesn't really answer your specific question, as I think the reason is that R spins up a non-persistent process to serve that HTML view of your data. But your approach seems quite roundabout just to get at variable labels. Here is a general way that works quite well:
library(foreign)

# Read the .sav file as a list; the variable labels come along
# as the "variable.labels" attribute
d <- read.spss("your_data.sav", use.value.labels = TRUE, to.data.frame = FALSE)
var_labels <- attr(d, "variable.labels")

## To access the label of a variable named 'var_name':
var_labels[["var_name"]]
Here d is a list holding the data, and var_labels is a named character vector of labels keyed by variable/column name.
If you want to get the variable and/or value labels of SPSS-imported data, you can use get_val_labels and get_var_labels from the sjmisc package.
See the examples here. Both functions accept either a single variable (vector) or a data frame and return the associated variable and value labels. See also this blog post.
The sjmisc package supports data frames imported with either the haven or foreign package.
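For illustration, a minimal sketch of that workflow. The file name and column name are hypothetical, and note that newer sjmisc releases have since moved these functions (renamed) into the sjlabelled package:

library(sjmisc)   # older releases export get_var_labels()/get_val_labels()
library(haven)

df <- read_sav("your_data.sav")   # hypothetical file

get_var_labels(df)            # variable labels, one per column
get_val_labels(df$var_name)   # value labels of a single (hypothetical) variable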
Related
I am trying to figure out how to 'download' data into a nice CSV file so that I can analyse it.
I am currently looking at WHO data here:
I am doing so by following the documentation, and I get output like so:
test_data <- jsonlite::parse_json(url("http://apps.who.int/gho/athena/api/GHO/WHS6_102.json?profile=simple"))
head(test_data)
This gives me a rather messy list of lists of lists.
It is not very easy to analyse and rather messy. How could I clean this up so that I keep only a few of the columns returned from json_parse, say the dim fields REGION, YEAR and COUNTRY, together with the values from the Value column? I would like to make this into a nice data frame / CSV file so I can more easily understand what is happening.
Can anyone give any advice?
jsonlite::fromJSON gives you the data in a better format, and the third element in the list is where the main data is.
url <- 'https://apps.who.int/gho/athena/api/GHO/WHS6_102.json?profile=simple'
tmp <- jsonlite::fromJSON(url)
data <- tmp[[3]]
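From there, a sketch of getting to the CSV the question asks for. It assumes (as is typical for a profile=simple response once fromJSON has simplified it) that the fact table has a nested dim data frame holding REGION, YEAR and COUNTRY, plus a Value column; adjust the names if your structure differs:

# 'data' is tmp[[3]] from above, i.e. the fact table
clean <- data.frame(data$dim[, c("REGION", "YEAR", "COUNTRY")],
                    Value = data$Value)
write.csv(clean, "WHS6_102.csv", row.names = FALSE)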
I'm writing a script to plot data from multiple files. Each file is named using the same format, where strings between “.” give some info on what is in the file. For example, SITE.TT.AF.000.52.000.001.002.003.WDSD_30.csv.
These data will be from multiple sites, so SITE, or WDSD_30, or any other string, may differ depending on where the data is from, though its position in the file name will always indicate a specific feature such as location or measurement.
So far I have each file read into R and saved as a data frame named the same as the file. I'd like to get something like the following to work: if there is a data frame in the global environment whose name contains WDSD_30, then plot a specific column from that data frame. The column will always have the same name, so I could write plot(WDSD_30$meas), and no matter which site's files were loaded in the global environment, the script would find the WDSD_30 file and plot the meas variable. My goal for this script is to be able to point it at any folder containing files from a particular site, and no matter what the site, the script will be able to read in the data and find the files containing the variables I'm interested in plotting.
A colleague suggested I try using strsplit() to break up the file name and extract the element I want to use, then use that to rename the data frame containing that element. I'm stuck on how exactly to do this, or whether this is even the best approach.
Here's what I have so far:
site.files <- basename(list.files(pattern = ".csv", recursive = TRUE, full.names = FALSE))
sfsplit <- lapply(site.files, function(x) strsplit(x, ".", fixed = TRUE)[[1]])
for (i in 1:length(site.files)) assign(site.files[i], read.csv(site.files[i]))
for (i in 1:length(site.files)) {
  if (grepl("PARQL", sfsplit[[i]][10])) {
    assign(data.frame.getting.named.PARQL, sfsplit[[i]][10])
  } else if (grepl("IRBT", sfsplit[[i]][10])) {
    assign(data.frame.getting.named.IRBT, sfsplit[[i]][10])
  }
}
...and so on for each data frame I'd like to eventually plot from. Is this a good approach, or is there some better way? I'm also unclear on how to refer to the objects I made up for this example, data.frame.getting.named.xxxx, without using the entire filename as it was read into R. Is there something like data.frame[1] to generically refer to the first data frame in the global environment?
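For what it's worth, a sketch of a list-based alternative to the assign() calls, reusing the token position (10) and the names from the example above: keep all the data frames in one named list keyed by the extracted token, so "the data frame whose name contains WDSD_30" becomes a simple lookup.

site.paths <- list.files(pattern = "\\.csv$", recursive = TRUE)

# Key each data frame by the 10th dot-separated token of its file name
tokens <- vapply(basename(site.paths),
                 function(x) strsplit(x, ".", fixed = TRUE)[[1]][10],
                 character(1))
site.data <- setNames(lapply(site.paths, read.csv), tokens)

# Plot the 'meas' column from whichever file carries the WDSD_30 token
if ("WDSD_30" %in% names(site.data)) {
  plot(site.data[["WDSD_30"]]$meas)
}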
I can see the entire data frame in the console. Is there any way or function to view a data frame in the R console (with editing similar to Excel), so that I can edit the data manually?
You can use the S3 method that edit() provides for class 'data.frame':

edit(name, factor.mode = c("character", "numeric"),
     edit.row.names = any(row.names(name) != 1:nrow(name)), ...)
Example:
edit(your_dataframe)
You can go through it in detail with the help of this link - Here
You really can use edit() or View().
But if your dataset isn't too big and you prefer to work in Excel, you can use the function below:
library(xlsx)
view.excel <- function(inputDF, nrows = 5000) {
  if (!is.data.frame(inputDF)) {
    stop("ERROR: <inputDF> class is not \"data.frame\"")
  }
  # Truncate to 'nrows' rows unless nrows == -1 (which means "show everything")
  if (nrow(inputDF) > nrows & nrows != -1) {
    inputDF <- inputDF[1:nrows, ]
  }
  tempPath <- tempfile(fileext = '.xlsx')
  write.xlsx(inputDF, tempPath)
  system(paste0('open ', tempPath))
  return(invisible(tempPath))
}
I've defined this function to help me with some tasks in R...
Basically, you only need to pass a data frame to the function as a parameter. By default the function displays a maximum of 5000 rows (you can set the parameter nrows = -1 to view all the rows, but it may be slow).
This function opens your data frame in Excel and returns the path where your temporary view was saved. If you want to save your temporary view and load it back after changing something directly in Excel, you can reload the data frame with:
# Open a view in Excel
tempPath <- view.excel(initialDF, nrows = -1)
# Load the edited Excel view into a new data frame, modifiedDF
modifiedDF <- read.xlsx(tempPath, sheetIndex = 1)
The open command used here works on macOS; on Linux use xdg-open, and on Windows replace the system() call with shell.exec(tempPath).
You can view the dataframe with View():
View(df)
As #David Arenburg says, you can also open your dataframe in an editable view, but be warned this is slow:
edit(df)
For updates/changes to affect the dataframe use:
df <- edit(df)
Since a lot of people are using (and developing in) RStudio and Shiny nowadays, things have become far more convenient for R users.
You should look at the rhandsontable package.
There is also a very nice Shiny implementation of rhandsontable, from a blog I stumbled upon: https://stla.github.io/stlapblog/posts/shiny_editTable.html. It's not using the console, but it is super slick.
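For a flavour of it, a minimal sketch of an editable table in Shiny with rhandsontable (mtcars stands in for your data; hot_to_r() converts the edited widget state back into a data frame):

library(shiny)
library(rhandsontable)

ui <- fluidPage(rHandsontableOutput("tbl"))

server <- function(input, output, session) {
  output$tbl <- renderRHandsontable(rhandsontable(mtcars))
  # input$tbl holds the (possibly edited) table state
  observeEvent(input$tbl, {
    edited <- hot_to_r(input$tbl)
    print(head(edited))
  })
}

shinyApp(ui, server)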
(A few years later) This may be worth trying if you use RStudio: it seems to support all data types. I did not use it extensively, but it has helped me occasionally:
https://cran.r-project.org/web/packages/editData/README.html
It shows an editing dialog by default. If your dataframe is big you can browse to http://127.0.0.1:7212 while the dialog is being shown, to get a resizable editing view.
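Minimal usage, going by the README linked above (treat this as a sketch, since I have only used it casually):

library(editData)

# Opens the editing dialog; the edited data frame is returned when you save
newdf <- editData(mtcars)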
You can view and edit a dataframe using the fix() function:
# Open the mtcars dataframe for editing:
fix(mtcars)
# Edit and close.
# This produces the same result:
mtcars <- edit(mtcars)
# But it is a longer command to write.
I'm using the Alteryx R Tool to sign an Amazon HTTP request. To do so, I need the hmac function that is included in the digest package.
I'm using a text input tool that includes the key and a datestamp.
Key = "foo"
datestamp = "20120215"
Here's the issue. When I run the following script:
the.data <- read.Alteryx("1", mode="data.frame")
write.Alteryx(base64encode(hmac(the.data$Key, the.data$datestamp, algo = "sha256", raw = TRUE)), 1)
I get an incorrect result when compared to when I run the following:
write.Alteryx(base64encode(hmac("foo","20120215",algo="sha256",raw = TRUE)),1)
The difference is that when I hardcode the values for the key and object I get the correct result, but if I use the variables from the R data frame I get incorrect output.
Does the data frame alter the data in some way? Has anyone come across this when working with the R Tool in Alteryx?
Thanks for your input.
The issue appears to be that when creating the data frame, your character variables are converted to factors (the default behaviour of data.frame() before R 4.0). The way to fix this with the data.frame constructor function is
the.data <- data.frame(Key="foo", datestamp="20120215", stringsAsFactors=FALSE)
I haven't used read.Alteryx but I assume it has a similar way of achieving this.
Alternatively, if your data frame has already been created, you can convert the factors back into character:
write.Alteryx(base64encode(hmac(
  as.character(the.data$Key),
  as.character(the.data$datestamp),
  algo = "sha256", raw = TRUE)), 1)
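To see the factor issue outside Alteryx, a small sketch using only the digest package (stringsAsFactors = TRUE is spelled out because it stopped being the default in R 4.0):

library(digest)

the.data <- data.frame(Key = "foo", datestamp = "20120215",
                       stringsAsFactors = TRUE)   # columns become factors

# Converting the factors back to character reproduces the hardcoded
# result exactly (identical() returns TRUE); passing the factor
# columns directly does not
identical(
  hmac(as.character(the.data$Key), as.character(the.data$datestamp),
       algo = "sha256", raw = TRUE),
  hmac("foo", "20120215", algo = "sha256", raw = TRUE)
)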
I wish to read data into R from SAS data sets in Windows. The read.ssd function allows me to do so; however, it seems to have an issue when I try to import a SAS data set that has any non-alphabetic symbols in its name. For example, I can import table.sas7bdat using the following:
directory <- "C:/sas data sets"
sashome <- "/Program Files/SAS/SAS 9.1"
table.df <- read.ssd(directory, "table", sascmd = file.path(sashome, "sas.exe"))
but I can't do the same for a SAS data set named table1.sas7bdat. It returns an error:
Error in file.symlink(oldPath, linkPath) :
symbolic links are not supported on this version of Windows
Given that I do not have the option to rename these data sets, is there a way to read a SAS data set that has non-alphabetic symbols in its name in to R?
Looking around, it seems others have your problem as well. Perhaps it's just a bug.
Anyway, try the suggestion from this (old) R-help post, posted by the venerable Dan Nordlund, who's pretty good at this stuff - and also active on SASL (sasl@listserv.uga.edu) if you want to try cross-posting your question there.
https://stat.ethz.ch/pipermail/r-help/2008-December/181616.html
Also, you might consider the transport (XPORT) method if you don't mind 8-character variable names.
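Sketched out, the transport route looks like this (the libref mylib and the .xpt path are hypothetical, and the export itself happens on the SAS side):

# In SAS first:  libname xp xport 'C:/sas data sets/table1.xpt';
#                proc copy in=mylib out=xp; run;
library(foreign)
table1.df <- read.xport("C:/sas data sets/table1.xpt")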
Use:
directory <- "C:/sas data sets"
sashome <- "/Program Files/SAS/SAS 9.1"
table1.df <- read.ssd(libname = directory, sectionnames = "table1",
                      sascmd = file.path(sashome, "sas.exe"))