When trying to merge two columns (pre and post) in a kwic dataframe created with the quanteda package, the resulting data frame contains only NA values. Using the paste() function from base R works perfectly fine, but I'd rather solve this issue with a tidy approach. Has anyone else experienced this before and knows what to do?
I'm including a reprex below, but unfortunately, in the reprex the unite function works perfectly fine. I'm wondering if it's related to the input being a data frame created with quanteda::kwic?
library(tidyr)  # provides unite() and re-exports %>%

pre <- c("Pre Text 1", "Pre Text 2", "Pre Text 3")
post <- c("Post Text 1", "Post Text 2", "Post Text 3")
data <- data.frame(id = 1:3,
                   pre = pre,
                   post = post)
data2 <- data %>%
  unite("merged", pre, post, sep = " ")
EDIT: I'm including a better example in the code below. "x" is a data frame that resulted from applying kwic() to my dataset, and speeches_meta is metadata associated with the texts contained in "x". My issue is that when running the unite function on the "dput" object, it somehow doubles the amount of variables and all of the observations except for two are empty (with the two that aren't containing a bunch of information from all variables).
merged_kwic <- left_join(x, speeches_meta, by = "docname")
dput <- dput(merged_kwic[1:3, c("pre", "post")])
dput <- dput %>%
unite("merged", pre, post, sep = " ")
EDIT 2:
This is the output I get after running the following code:
dput(merged_kwic[1:3, c("pre", "post")])
structure(list(docname = c("585662", "586622", "650973"), from = c(377L,
1665L, 562L), to = c(377L, 1665L, 562L), pre = c("5 Dies kann weder durch",
"tief in die Mottenkiste der", "unterstellen dass es ihnen um"
), keyword = c("Ostalgie", "Ostalgie", "Ostalgie"), post = c("noch durch Amnesie durch Gedächtnisverlust",
"greifen würden 33 An dieser", "geht um eine Werbung für"),
pattern = structure(c(1L, 1L, 1L), .Label = "ostalgie", class = "factor"),
id = c(585662, 586622, 650973), session = c(241, 245, 56),
electoralTerm = c(13, 13, 15), firstName = c("Dietrich",
"werner", "Vera"), lastName = c("Austermann", "schulz", "Lengsfeld"
), politicianId = c(11000066, 11002108, 11002721), factionId = c(4,
3, 4), documentUrl = c("https://dip21.bundestag.de/dip21/btp/13/13241.pdf",
"https://dip21.bundestag.de/dip21/btp/13/13245.pdf", "https://dip21.bundestag.de/dip21/btp/15/15056.pdf"
), positionShort = c("Member of Parliament", "Member of Parliament",
"Member of Parliament"), positionLong = c(NA_character_,
NA_character_, NA_character_), date = structure(c(10395,
10402, 12236), class = "Date")), ntoken = c(`585662` = 839L,
`586622` = 1724L, `650973` = 647L), row.names = c(NA, 3L), class = c("kwic",
"data.frame"))
I realized the issue here is that unite() doesn't necessarily work on kwic data frames. After piping the data frame into as_tibble() (the current name for the deprecated as.tibble()), it ended up working just fine. Hopefully this will be helpful to people in the future!
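For anyone who wants to see the shape of that fix, here is a minimal sketch. It assumes tidyr and tibble are installed, and it uses a toy data frame with an extra "kwic" class as a stand-in for a real quanteda::kwic() result, so it only illustrates the conversion step and does not reproduce the original failure:

```r
library(tidyr)   # unite()
library(tibble)  # as_tibble()

# Toy stand-in for a kwic result (an assumption for illustration only):
# an ordinary data frame that carries an additional "kwic" class.
kw <- data.frame(
  docname = c("585662", "586622", "650973"),
  pre     = c("Pre Text 1", "Pre Text 2", "Pre Text 3"),
  keyword = "Ostalgie",
  post    = c("Post Text 1", "Post Text 2", "Post Text 3"),
  stringsAsFactors = FALSE
)
class(kw) <- c("kwic", "data.frame")

# Convert to a plain tibble first, then merge the two context columns.
merged <- unite(as_tibble(kw), "merged", pre, post, sep = " ")
print(merged)
```

Converting first means unite() only ever sees a plain tibble, so whatever special handling the kwic class brings along is out of the picture.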
Kinda long winded but here goes:
I have a dataframe like this:
testprotocols<-structure(list(protocol_no = c("LS-P-Joe's API", "JoeTest3"),
nct_number = c(654321, 543210), library = structure(c(2L,
2L), levels = c("General Research", "Oncology"), class = "factor"),
organizational_unit = structure(c(1L, 1L), levels = c("Lifespan Cancer Institute",
"General Research"), class = "factor"), title = c("Testing to see if basic stuff came through",
"Testing Oncology Projects for API"), department = structure(c(2L,
2L), levels = c("Diagnostic Imaging", "Lifespan Cancer Institute"
), class = "factor"), protocol_type = structure(2:1, levels = c("Basic Science",
"Other"), class = "factor"), protocolid = 1:2), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
I want to push it into a website using an API with this code, which will go line by line and return a data frame giving me the status of whether or not each row worked:
##This chunk gets a random one we're going to change later
base <- "https://website.forteresearchapps.com"
endpoint <- "/website/rest/protocols/"
protocol <- "2501"
## 'results' will get changed later to plug back in
## store
protocolid <- protocolnb <- library_names <- get_codes <- put_codes <- list()
UpdateAccountNumbers <- function(protocol){
call2<-paste(base,endpoint, protocol, sep="")
httpResponse <- GET(call2, add_headers(authorization = token))
results = fromJSON(content(httpResponse, "text"))
results$protocolId<- "8887" ## doesn't seem to matter
results$protocolNo<- testprotocols$protocol_no
results$library<- as.character(testprotocols$library)
results$title<- testprotocols$title
results$nctNo<-testprotocols$nct_number
results$objectives<-"To see if the API works, specifically if you can write over a previous number"
results$shortTitle<- "Short joseph Title"
results$nctNo<-testprotocols$nct_number
results$department <- as.character(testprotocols$department)
results$organizationalUnit<- as.charater(testprotocols$organizational_unit)
results$protocolType<- as.character(testprotocols$protocol_type)
call2 <- paste(base,endpoint, protocol, sep="")
httpResponse_put <- PUT(
call2,
add_headers(authorization = token),
body=results, encode = "json",
verbose()
)
# save stats
protocolid <- append(protocolid, protocol)
protocolnb <- append(protocolnb, testprotocols$PROTOCOL_NO[match(protocol, testprotocols$PROTOCOL_ID)])
library_names <- append(library_names, testprotocols$LIBRARY[match(protocol, testprotocols$PROTOCOL_ID)])
get_codes <- append(get_codes, status_code(httpResponse_get))
put_codes <- append(put_codes, status_code(httpResponse_put))
}
## Oncology will have to change to whatever the df name is, above and below this
purrr::walk(testprotocols$protocol_no, UpdateAccountNumbers)
allresults <- tibble('protocolNo'=unlist(protocol_no),'protocolnb'=unlist(protocolnb),'library_names'=unlist(library_names), 'get_codes'=unlist(get_codes), 'put_codes'=unlist(put_codes) )
The basic gist of this purrr loop is from an earlier question of mine.
The only difference is that in that question, I was only doing one small change within the loop, this line:
results$hospitalAccountNo <- results$internalAccountNo
Where it would take what it downloaded from the API, copy it over to 'hospitalAccountNo' and put it back up.
This time around, I'm trying to make a few more changes: all of these lines which I envision using the 'testprotocols' dataframe and writing over the 'results' it downloaded, then uploading one row at a time using the loop.
results$protocolId<- "8887" ## doesn't seem to matter
results$protocolNo<- testprotocols$protocol_no
results$library<- as.character(testprotocols$library)
results$title<- testprotocols$title
results$nctNo<-testprotocols$nct_number
results$objectives<-"To see if the API works, specifically if you can write over a previous number"
results$shortTitle<- "Short joseph Title"
results$nctNo<-testprotocols$nct_number
results$department <- as.character(testprotocols$department)
results$organizationalUnit<- as.charater(testprotocols$organizational_unit)
results$protocolType<- as.character(testprotocols$protocol_type)
For whatever reason, when I try to run the line:
purrr::walk(testprotocols$protocol_no, UpdateAccountNumbers)
If I run traceback() I get this:
I'd love it if someone could just fix my entire loop for me haha, but realistically my question is:
Where should I look to figure out what is causing this error?
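One way to narrow this down is to wrap each call so that an error gets recorded per row instead of aborting the whole loop. The following is only a sketch in base R: update_one() is a hypothetical stand-in for UpdateAccountNumbers() that fails on purpose for one input, and no real API calls are made.

```r
# Hypothetical stand-in for UpdateAccountNumbers(): fails for one protocol
update_one <- function(protocol) {
  if (protocol == "JoeTest3") stop("simulated failure for ", protocol)
  paste("updated", protocol)
}

# Run one protocol and return a status record instead of throwing
run_safely <- function(protocol) {
  tryCatch(
    list(protocol = protocol, ok = TRUE, message = update_one(protocol)),
    error = function(e) {
      list(protocol = protocol, ok = FALSE, message = conditionMessage(e))
    }
  )
}

protocols <- c("LS-P-Joe's API", "JoeTest3")
results <- lapply(protocols, run_safely)

# One row per protocol, with the error message where a call failed
status <- do.call(rbind, lapply(results, as.data.frame, stringsAsFactors = FALSE))
print(status)
```

The message column then shows which row failed and why. A misspelled function name inside the loop body, for instance, would show up there as a "could not find function" message. Returning the record from the function also avoids relying on append() inside the function, which only changes local copies of the lists.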
So I'm running the code below in R Studio and getting this error:
Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars'
applied to an object of class "character"
I don't know how to fix it cause there is no tbl_vars function! Can someone help?
for (i in 1:ceiling(nrow(reviews)/batch)) {
  row_start <- i*batch-batch+1
  row_end <- ifelse(i*batch < nrow(reviews), i*batch, nrow(reviews))
  print(paste("Processing row", row_start, "to row", row_end))
  reviews[row_start:row_end, ] %>%
    unnest_tokens(word, text) -> reviews_subset
  reviews_subset$row <- 1:nrow(reviews_subset)
  reviews_subset %>%
    anti_join(stopwords) %>%
    arrange(row) -> reviews_subset
  write_feather(reviews_subset, path = paste0("reviews", i, ".txt"))
}
Ps: dplyr is installed. Also other installed packages: pacman, feather, data.table, devtools, tidyr, tidytext, tokenizers, tibble
I'm using it to work with Yelp dataset.
Thank you so much,
Carmem
ps2: dataset example (edited and simplified to fit here):
> dput(as.data.frame(review))
structure(list(user_id = 1:10, review_id = 11:20, business_id = 21:30,
stars = c(2L, 2L, 5L, 4L, 4L, 5L, 4L, 3L, 5L, 4L), text = c("Are you the type of person that requires being seen in an expensive, overly pretentious restaurant so that you can wear it as a status symbol? Or maybe you're a gansta who dresses like CiLo Green and wants to show the hunny's (yes, a group of them out with one man) a night on the town!",
"Today was my first visit to the new luna, and I was disappointed-- both because I really liked the old cafe luna, and because the new luna came well recommended",
"Stayed here a few months ago and still remember the great service I received.",
"I came here for a business lunch from NYC and had a VERY appetizing meal. ",
"Incredible food with great flavor. ",
"OMG, y'all, try the Apple Pie Moonshine. It. Is. Seriously. Good. Smoooooooth. The best rum that I've sampled so far: Zaya.",
"Caitlin is an amazing stylist. She took time to hear what I had to say before jumping in",
"Oh yeah! After some difficulties in securing dinner, my dad and I found ourselves at one of the billion Primanti's locations for a quick feast",
"I've been going to this studio since the beginning of January",
"The best cannoli, hands down!!"
)), .Names = c("user_id", "review_id", "business_id", "stars",
"text"), row.names = c(NA, -10L), class = "data.frame")
Change anti_join(stopwords) to anti_join(stop_words). stop_words is the data frame of stop words that ships with tidytext; stopwords probably doesn't exist or isn't what you want it to be.
The
Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars'...
message is not being caused by a missing tbl_vars function. I ran into this exact same error when I mistakenly passed a vector to a dplyr join function instead of another dataframe. Here is a simple example of how to generate this error in R 3.5 using dplyr 0.7.5:
library(dplyr)
# Create a dataframe of sales by person and bike color
salesNames = c('Sally', 'Jim', 'Chris', 'Chris', 'Jim',
'Sally', 'Jim', 'Sally', 'Chris', 'Sally')
salesDates = c('2018-06-01', '2018-06-05', '2018-06-10', '2018-06-15',
'2018-06-20', '2018-06-25', '2018-06-30', '2018-07-09',
'2018-07-12', '2018-07-14')
salesColor = c('red', 'red', 'red', 'green', 'red',
'blue', 'green', 'green', 'green', 'blue')
df_sales = data.frame(Salesperson = salesNames,
SalesDate = as.Date(salesDates),
BikeColor = salesColor,
stringsAsFactors = F)
# Create another dataframe to join to
modelColor = c('red', 'blue', 'green', 'yellow', 'orange', 'black')
modelPrice = c(279.95, 269.95, 264.95, 233.54, 255.27, 289.95)
modelCommission = modelPrice * 0.20
df_commissions = data.frame(ModelColor = modelColor,
ModelPrice = modelPrice,
Commission = modelCommission,
stringsAsFactors = F)
df_sales_comm = df_sales %>% left_join(df_commissions,
by = c('BikeColor'= 'ModelColor'))
This works fine. Now try this:
df_comms = df_commissions$ModelColor # vector instead of dataframe
df_sales_comm2 = df_sales %>% left_join(df_comms,
by = c('BikeColor'= 'ModelColor'))
and you should see the exact same error you report, because df_comms is not a data frame. The problem you are having is that stopwords is a vector and not a data frame (or a tibble).
There are several ways to resolve this error. As Szczepaniak points out, the root cause is attempting to pass a character vector into an operation that expects a data frame or tibble.
Option 1: convert the character vector to a data frame (or tibble), then use it in anti_join. The column name has to match the join column, which is word in the question's code. An example conversion:
`stopwords <- tibble(word = stopwords)`
Option 2: change the operation to accept a character vector. In this case we can use filter in place of anti_join, as shown here:
`reviews_subset <- reviews_subset %>%
  filter(!word %in% stopwords) %>%
  arrange(row)`
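To make the difference between the two shapes concrete, here is a small base R sketch. The token table and the stop word list are made-up toy data, not the Yelp reviews from the question:

```r
# Toy data (assumed for illustration): one token per row, as unnest_tokens() would give
tokens <- data.frame(
  row  = 1:5,
  word = c("the", "best", "cannoli", "of", "all"),
  stringsAsFactors = FALSE
)
stopwords <- c("the", "a", "of")   # a plain character vector

# A quick class check shows why a join function rejects it
is.data.frame(stopwords)           # FALSE

# Option 1 in spirit: wrap the vector in a data frame with a matching column name
stopwords_df <- data.frame(word = stopwords, stringsAsFactors = FALSE)

# Option 2 in spirit: keep only rows whose word is not in the stop word list
kept <- tokens[!tokens$word %in% stopwords_df$word, , drop = FALSE]
print(kept)
```

Checking is.data.frame() (or class()) on whatever gets passed to a join is usually the quickest way to confirm this kind of error.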
This should be really simple. I am currently trying to make a list I am building slightly more efficient. Instead of having to write out:
list('1'= value1, '2' =value1, '3' = value1)
how would I condense this to be able to simply list the numbers I want to be equal to value1. e.g. '1:4' =value1 or '1,2,3,4' =value1
EDIT:
So, for background, I am currently trying to create custom formatting for an excel file using the xlsx package.
wb = createWorkbook()
sheet =createSheet(wb,sheetName = "TestFormatting")
dfcurrency = DataFormat("[$$-409]#,##0_ ;[Red]-[$$-409]#,##0 ")
dfdate = DataFormat("m/d/yyyy")
currency = CellStyle(wb, dataFormat = dfcurrency)
date = CellStyle(wb, dataFormat = dfdate)
datastyle = setNames(as.list(c(currency,date)),rep(c(3,4),c(1)))
data = addDataFrame(table,sheet, colStyle = datastyle)
Is what I am currently running, thanks to akrun's help. This gives the error:
Error in thisColStyle$ref : no field, method or inner class called 'ref'
And just in case it's useful, here is the data structure of table:
structure(list(workingdate = structure(c(1458518400, 1458604800,
1458691200, 1458777600, 1458864000, 1459119600), class = c("POSIXct",
"POSIXt"), tzone = ""), trader = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("a", "b", "c",
"d", "e"), class = "factor"), pnl.1d = c(3,
-573.7978, -107.1941, 1128.3061, -0.709699999999998, 3.55990000000003
), rt.1d.Util = c(0, -3.82531866666667e-05, -7.14627333333333e-06,
7.52204066666667e-05, -4.73133333333332e-08, 2.37326666666669e-07
)), .Names = c("workingdate", "trader", "pnl.1d", "rt.1d.Util"
), row.names = c(NA, 6L), class = "data.frame")
Here's a very general way to do similar things. This solution is likely more convoluted than the best solution, but it will work and can be extended to similar problems. It is based on eval and parse. parse turns a string into an unevaluated expression, eval evaluates it.
So, eval(parse(text="5+5")) will return 10.
If we can create the string "list('1'=value1, '2'=value1, '3'=value1)", we can then use eval(parse(text = ...)) to turn it into the list you want.
The following code will create the above string:
value1 <- 'asdf'
paste(
'list(', paste(sapply(seq_len(4),
function(n) { paste("'", n,"'", "=", "value1", sep="")}),
collapse = ","),
')')
So, combining everything, call
eval(parse(text=
paste(
'list(', paste(sapply(seq_len(4),
function(n) { paste("'", n,"'", "=", "value1", sep="")}),
collapse = ","),
')')))
And you get the list you want.
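As an alternative sketch that avoids building and parsing a string, the same kind of list can be put together with rep() and setNames() from base R. The names keys, lst and styles are made up for this example, and the strings stand in for whatever values you actually want to store:

```r
value1 <- "asdf"
keys <- 1:4   # the numbers that should all map to value1

# rep(list(x), n) repeats x as a whole list element; setNames() attaches the names
lst <- setNames(rep(list(value1), length(keys)), keys)
str(lst)

# Several groups can be combined with c()
styles <- c(
  setNames(rep(list("currency"), 2), c(3, 5)),
  setNames(rep(list("date"), 1), 1)
)
names(styles)
```

Wrapping each value in list() keeps it as a single element, which matters when the value is itself a list-like object rather than a plain string.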
Thanks to Julian's comment I was able to create a solution to this. I will accept Julian's comment as the answer but will give my own (less general) solution as an example. It basically applies his solution so as to create more customisability in an albeit very roundabout way:
#if no columns need a type of format enter 0
a =paste(sapply(list(c(
#enter column numbers formatted as currency eg. 1:5, 8, 10
3
)),
function(n) { paste("'", n,"'", "=", "currency", sep="")}))
b =paste(sapply(list(c(
#columns formatted as date
1
)),
function(n) { paste("'", n,"'", "=", "date", sep="")}))
You can continue in this fashion with this general formula for as many variables as you like (c, d, and so on). You can then combine them into one text string ready to be parsed:
text = paste( 'list(',paste(c(a,b,c,d), collapse = ","),')')
datastyle = eval(parse(text = text))
where you simply enter all your formats or styles in a,b,c,d,...
Hopefully this will help someone who finds a similar problem.
I'm working in R, but I need to deliver some data in SPSS format with both 'variable labels' and 'value labels' and I'm kinda stuck.
I've added variable labels to my data using Hmisc's label function. This adds the variable labels as a label attribute, which is handy when using describe() from the Hmisc package. The problem is that I cannot get the write.foreign() function, from the foreign package, to recognize these labels as variable labels. I imagine I need to modify write.foreign() to use the label attribute as the variable label when writing the .sps file.
I looked at the R list and at Stack Overflow, but I could only find a post from 2006 on the R list regarding exporting variable labels to SPSS from R, and it doesn't seem to answer my question.
Here is my working example,
# First I create a dummy dataset
df <- data.frame(id = c(1:6), p.code = c(1, 5, 4, NA, 0, 5),
p.label = c('Optometrists', 'Nurses', 'Financial analysts',
'<NA>', '0', 'Nurses'), foo = LETTERS[1:6])
# Second, I add some variable labels using label from the Hmisc package
# install.packages('Hmisc', dependencies = TRUE)
library(Hmisc)
label(df) <- "Sweet sweet data"
label(df$id) <- "id !##$%^"
label(df$p.label) <- "Profession with human readable information"
label(df$p.code) <- "Profession code"
label(df$foo) <- "Variable label for variable x.var"
# modify the name of one variable, just to see what happens when exported.
names(df)[4] <- "New crazy name for 'foo'"
# Third I export the data with write.foreign from the foreign package
# install.packages('foreign', dependencies = TRUE)
setwd('C:\\temp')
library(foreign)
write.foreign(df,"df.wf.txt","df.wf.sps", package="SPSS")
list.files()
[1] "df.wf.sps" "df.wf.txt"
When I inspect the .sps file (see the content of 'df.wf.sps' below), my variable labels are identical to my variable names, except for foo, which I renamed to "New crazy name for 'foo'". This variable has a new and seemingly random name, but the correct variable label.
Does anyone know how to get the label attributes and the variable names exported as 'variable labels' and 'labels names' into a .sps file? Maybe there is a smarter way to store 'variable labels' than my current method?
Any help would be greatly appreciated.
Thanks, Eric
Content of 'df.wf.sps' export using write.foreign from the foreign package
DATA LIST FILE= "df.wf.txt" free (",")
/ id p.code p.label Nwcnf.f. .
VARIABLE LABELS
id "id"
p.code "p.code"
p.label "p.label"
Nwcnf.f. "New crazy name for 'foo'"
.
VALUE LABELS
/
p.label
1 "0"
2 "Financial analysts"
3 "Nurses"
4 "Optometrists"
/
Nwcnf.f.
1 "A"
2 "B"
3 "C"
4 "D"
5 "E"
6 "F"
.
EXECUTE.
Update April 16 2012 at 15:54:24 PDT;
What I am looking for is a way to tweak write.foreign to write a .sps file where this part,
[…]
VARIABLE LABELS
id "id"
p.code "p.code"
p.label "p.label"
Nwcnf.f. "New crazy name for 'foo'"
[…]
looks like this,
[…]
VARIABLE LABELS
id "id !##$%^"
p.code "Profession code"
p.label "Profession with human readable information"
"New crazy name for 'foo'" "New crazy name for 'foo'"
[…]
The last line is a bit ambitious. I don't really need to have variables with white space in their names, but I would like the label attributes to be transferred to the .sps file (that I produce with R).
Try this function and see if it works for you. If not, add a comment and I can see what I can do as far as troubleshooting goes.
# Step 1: Make a backup of your data, just in case
df.orig = df
# Step 2: Load the following function
get.var.labels = function(data) {
a = do.call(llist, data)
tempout = vector("list", length(a))
for (i in 1:length(a)) {
tempout[[i]] = label(a[[i]])
}
b = unlist(tempout)
structure(c(b), .Names = names(data))
}
# Step 3: Apply the variable.label attributes
attributes(df)$variable.labels = get.var.labels(df)
# Step 4: Load the write.SPSS function available from
# https://stat.ethz.ch/pipermail/r-help/2006-January/085941.html
# Step 5: Write your SPSS datafile and codefile
write.SPSS(df, "df.sav", "df.sps")
The above example is assuming that your data is named df, and you have used Hmisc to add labels, as you described in your question.
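If you want to see what that label-collecting step boils down to without Hmisc loaded, here is a base R sketch. It assumes the labels live in a "label" attribute on each column, as described in the question, and sets them by hand with attr(); get_labels() is a made-up helper name, and falling back to the column name when no label is set is my own choice:

```r
# Toy data with "label" attributes set by hand (Hmisc is not needed for this sketch)
df <- data.frame(id = 1:3, p.code = c(1, 5, 4))
attr(df$id, "label") <- "Respondent id"
attr(df$p.code, "label") <- "Profession code"

# Collect one label per column; fall back to the column name if none is set
get_labels <- function(data) {
  vapply(names(data), function(nm) {
    lab <- attr(data[[nm]], "label")
    if (is.null(lab)) nm else as.character(lab)
  }, character(1))
}

var_labels <- get_labels(df)
print(var_labels)

# Same idea as Step 3 above: store the labels as a data-frame-level attribute
attr(df, "variable.labels") <- var_labels
```

The result is a named character vector, which is the same shape that Step 3 assigns to variable.labels.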
Update: A Self-Contained Function
If you do not want to alter your original file, as in the example above, and if you are connected to the internet while you are using this function, you can try this self-contained function:
write.Hmisc.SPSS = function(data, datafile, codefile) {
a = do.call(llist, data)
tempout = vector("list", length(a))
for (i in 1:length(a)) {
tempout[[i]] = label(a[[i]])
}
b = unlist(tempout)
label.temp = structure(c(b), .Names = names(data))
attributes(data)$variable.labels = label.temp
source("http://dl.dropbox.com/u/2556524/R%20Functions/writeSPSS.R")
write.SPSS(data, datafile, codefile)
}
Usage is simple:
write.Hmisc.SPSS(df, "df.sav", "df.sps")
The function that you linked to should work, but I think the problem is that your dataset doesn't actually have the variable.labels and label.table attributes that would be needed to write the SPSS script file.
I don't have access to SPSS, but try the following and see if it at least points you in the right direction. Unfortunately, I don't see an easy way to do this other than editing the output of dput manually.
df = structure(list(id = 1:6,
p.code = c(1, 5, 4, NA, 0, 5),
p.label = structure(c(5L, 4L, 2L, 3L, 1L, 4L),
.Label = c("0", "Financial analysts",
"<NA>", "Nurses",
"Optometrists"),
class = "factor"),
foo = structure(1:6,
.Label = c("A", "B", "C", "D", "E", "F"),
class = "factor")),
.Names = c("id", "p.code", "p.label", "foo"),
label.table = structure(list(id = NULL,
p.code = NULL,
p.label = structure(c("1", "2", "3", "4", "5"),
.Names = c("0", "Financial analysts",
"<NA>", "Nurses",
"Optometrists")),
foo = structure(1:6,
.Names = c("A", "B", "C", "D", "E", "F"))),
.Names = c("id", "p.code", "p.label", "foo")),
variable.labels = structure(c("id !##$%^", "Profession code",
"Profession with human readable information",
"New crazy name for 'foo'"),
.Names = c("id", "p.code", "p.label", "foo")),
codepage = 65001L)
Compare the above with the output of dput for your sample dataset. Notice that label.table and variable.labels have been added, and a line that said something like row.names = c(NA, -6L), class = "data.frame" was removed.
Update
NOTE: This will not work with the default write.foreign function in R. To test this you first need to load the write.SPSS function shared here, and (of course), make sure that you have the foreign package loaded. Then, you write your files as follows:
write.SPSS(df, datafile="df.sav", codefile="df.sps")