R error : arguments imply differing number of rows - r

So I am trying to operate a function over a few columns of a data frame, using a for loop.
z <- function(x) gsub("[^\\.\\d]", "", x, perl = TRUE)
data <- cbind(data[1:2], for(i in seq(3, 9)) {z(data[[i]])})
I keep running into the error as mentioned in the subject
arguments imply differing number of rows
All of my columns have the same number of rows.
I tried to use lapply for this, but though it works, it converts the column types over which I apply the function to factor. The columns are numerical values, but are originally read as characters from the file (they are stored as such). So when I try to convert to numbers after using lapply, I get number of levels as output (like, 1,2,3...)
Any suggestions, using either the for loop, or lapply are welcome. Thanks in advance.
> dput(head(data,3))
structure(list(MCF.Channel.Grouping = structure(c(6L, 6L, 6L), .Label = c("(Other)",
"Direct", "Display", "Email", "Organic Search", "Paid Search",
"Referral", "Social Network"), class = "factor"), Device.Category = structure(c(2L,
1L, 3L), .Label = c("desktop", "mobile", "tablet"), class = "factor"),
Spend = c("A$503,172.17", "A$375,940.43", "A$92,560.94"),
Clicks = c("1,545,416", "1,037,740", "291,314"), Impressions = c("7,328,657",
"3,787,612", "1,178,508"), Data.Driven.Conversions = c("1,697,814.32",
"1,540,810.43", "430,738.63"), Data.Driven.CPA = c("A$0.30",
"A$0.24", "A$0.21"), Data.Driven.Conversion.Value = c("A$12,815,842.66",
"A$13,883,073.58", "A$3,804,800.15"), Data.Driven.ROAS = c("2547.01%",
"3692.89%", "4110.59%")), .Names = c("MCF.Channel.Grouping",
"Device.Category", "Spend", "Clicks", "Impressions", "Data.Driven.Conversions",
"Data.Driven.CPA", "Data.Driven.Conversion.Value", "Data.Driven.ROAS"
), row.names = c(NA, 3L), class = "data.frame")

We can use
data[-(1:2)] <- lapply(data[-(1:2)], z)
The function is run on columns that are not the first or second. The output is assigned to the same subset in the data.
The original method did not work because the for loop does not result in saved output. Check by trying to save it as a variable:
x <- for(i in seq(3, 9)) {z(data[[i]])}
x
NULL
Even though we assigned the loop to a variable, nothing was captured: a for loop evaluates its body for side effects and itself returns NULL, so the value of each z(data[[i]]) call is discarded. To make the loop work, assign the values inside its body:
for ( i in 3:9) data[,i] <- z(data[,i])
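Note that z() returns character vectors, so the columns still need an explicit conversion before they can be used as numbers. The following is a minimal sketch on a small made-up data frame (two rows loosely based on the dput above, not the asker's full data). Calling as.numeric() on the cleaned strings avoids the factor problem mentioned in the question, because as.numeric() applied directly to a factor returns the level codes (1, 2, 3, ...).

```r
# Cleaning function from the question: keep only digits and dots.
z <- function(x) gsub("[^\\.\\d]", "", x, perl = TRUE)

# Small illustrative data frame (values copied from the question's dput).
data <- data.frame(
  Device.Category = c("mobile", "desktop"),
  Spend = c("A$503,172.17", "A$375,940.43"),
  Clicks = c("1,545,416", "1,037,740"),
  stringsAsFactors = FALSE
)

# Clean and convert in one step; gsub() coerces its input to character,
# so this also works if a column was read in as a factor.
num_cols <- 2:3
data[num_cols] <- lapply(data[num_cols], function(col) as.numeric(z(col)))

str(data)
```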

Related

Setting several values at once in a list r

This should be really simple. I am currently trying to make a list I am building slightly more efficient. Instead of having to write out:
list('1'= value1, '2' =value1, '3' = value1)
how would I condense this to be able to simply list the numbers I want to be equal to value1. e.g. '1:4' =value1 or '1,2,3,4' =value1
EDIT:
So, for background, I am currently trying to create custom formatting for an excel file using the xlsx package.
wb = createWorkbook()
sheet =createSheet(wb,sheetName = "TestFormatting")
dfcurrency = DataFormat("[$$-409]#,##0_ ;[Red]-[$$-409]#,##0 ")
dfdate = DataFormat("m/d/yyyy")
currency = CellStyle(wb, dataFormat = dfcurrency)
date = CellStyle(wb, dataFormat = dfdate)
datastyle = setNames(as.list(c(currency,date)),rep(c(3,4),c(1)))
data = addDataFrame(table,sheet, colStyle = datastyle)
Is what I am currently running, thanks to akrun's help. This gives the error:
Error in thisColStyle$ref : no field, method or inner class called 'ref'
And just in case it's useful, here is the data structure of table:
structure(list(workingdate = structure(c(1458518400, 1458604800,
1458691200, 1458777600, 1458864000, 1459119600), class = c("POSIXct",
"POSIXt"), tzone = ""), trader = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("a", "b", "c",
"d", "e"), class = "factor"), pnl.1d = c(3,
-573.7978, -107.1941, 1128.3061, -0.709699999999998, 3.55990000000003
), rt.1d.Util = c(0, -3.82531866666667e-05, -7.14627333333333e-06,
7.52204066666667e-05, -4.73133333333332e-08, 2.37326666666669e-07
)), .Names = c("workingdate", "trader", "pnl.1d", "rt.1d.Util"
), row.names = c(NA, 6L), class = "data.frame")
Here's a very general way to do similar things. This solution is likely more convoluted than the best solution, but it will work and can be extended to similar problems. It is based on eval and parse. parse turns a string into an unevaluated expression, eval evaluates it.
So, eval(parse(text="5+5")) will return 10.
If we can create the string "list('1'=value1, '2'=value1, '3'=value1)", we can then pass it to eval(parse(text = ...)) to turn it into the list you want.
The following code will create the above string:
value1 <- 'asdf'
paste(
'list(', paste(sapply(seq_len(4),
function(n) { paste("'", n,"'", "=", "value1", sep="")}),
collapse = ","),
')')
So, combining everything, call
eval(parse(text=
paste(
'list(', paste(sapply(seq_len(4),
function(n) { paste("'", n,"'", "=", "value1", sep="")}),
collapse = ","),
')')))
And you get the list you want.
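For the simple case in the question, where every name maps to the same value, the list can probably be built directly in base R without generating code as text. This is a sketch of an alternative, not the approach described above; `keys` is a placeholder name introduced here for the numbers that should share `value1`.

```r
value1 <- "asdf"
keys <- 1:4  # the names that should all map to value1

# rep() repeats the one-element list; setNames() attaches the names
# (the integers are converted to the strings "1", "2", "3", "4").
via_setnames <- setNames(rep(list(value1), length(keys)), keys)

# The eval/parse construction, written compactly, for comparison.
via_parse <- eval(parse(text = paste0(
  "list(", paste0("'", keys, "'=value1", collapse = ","), ")"
)))

str(via_setnames)
```

Both objects are the same named list, so the direct form can stand in wherever the parsed one is used.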
Thanks to Julian's comment I was able to create a solution to this. I will accept Julian's comment as the answer but will give my own (less general) solution as an example. It basically applies his solution so as to create more customisability in an albeit very roundabout way:
#if no columns need a type of format enter 0
a =paste(sapply(list(c(
#enter column numbers formatted as currency eg. 1:5, 8, 10
3
)),
function(n) { paste("'", n,"'", "=", "currency", sep="")}))
b =paste(sapply(list(c(
#columns formatted as date
1
)),
function(n) { paste("'", n,"'", "=", "date", sep="")}))
You can continue in this fashion with this general formula for as many variables as you like. You can then combine them into one string ready to be parsed:
text = paste( 'list(',paste(c(a,b,c,d), collapse = ","),')')
datastyle = eval(parse(text = text))
where you simply enter all your formats or styles in a,b,c,d,...
Hopefully this will help someone who finds a similar problem.
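The same column-to-style mapping can also be sketched without eval and parse. The example below is an assumption-laden illustration: `currency` and `date` are stand-in objects created here so the code runs without the xlsx package, and `currency_cols` and `date_cols` are hypothetical names. The point it shows is that wrapping each style in list() keeps it as a single element, whereas c(currency, date) on list-like objects can flatten them.

```r
# Stand-ins for the CellStyle objects (hypothetical, for illustration only).
currency <- structure(list(fmt = "currency"), class = "demoStyle")
date     <- structure(list(fmt = "date"), class = "demoStyle")

# Column numbers for each format.
currency_cols <- c(3, 5)
date_cols     <- 1

# One named list element per column; names are the column numbers.
datastyle <- c(
  setNames(rep(list(currency), length(currency_cols)), currency_cols),
  setNames(rep(list(date), length(date_cols)), date_cols)
)

names(datastyle)
```

With real style objects, the resulting named list is the shape that the question passes as colStyle.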

How to prepare transaction data for arules

I've been digging through existing questions for 3 days already, so I've finally found the courage to ask here.
I have a dataset of 379,584 entries and I want to feed it to "arules" in R.
It looks like this
A. If I try to go with the format = "basket", I do the following
sales <- read.csv("sales.csv", sep=";")
s1 <- split(sales$product_id, sales$order_id)
s1 <- unique(s1)
tr <- as(s1, "transactions")
This gives me an error "can not coerce list with transactions with duplicated items"
B. If I go with the format = "single"
tr <- read.transactions("sales.csv",
sep=";", format = "single", cols = c(4,2))
I have the same error "can not coerce list with transactions with duplicated items"
I've already checked the files for duplicates and Excel can't find any. I believe the trouble is trivial but I'm just stuck.
Apparently the unique(s1) call is causing the problem in your code. Is it required?
I managed to create the transactions just by commenting out that line.
sales <- structure(list(sku = c(207426L, 207422L, 207424L, 9793L, 33186L,
72406L), product_id = c(15729L, 15725L, 15727L, 15999L, 15983L,
15992L), item_id = 1:6, order_id = c(1L, 1L, 1L, 2L, 2L, 2L)),
.Names = c("sku", "product_id", "item_id", "order_id"),
class = "data.frame", row.names = c(NA, -6L))
s1 <- split(sales$product_id, sales$order_id)
#s1 <- unique(s1)
tr <- as(s1, "transactions")
tr
transactions in sparse format with
2 transactions (rows) and
6 items (columns)
If unique is really required, run this instead:
s1 <- lapply(s1, unique)
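The difference between the two calls can be seen on a small made-up example that needs only base R. The data here are invented so that order 1 contains the same product twice; the names `by_list` and `by_item` are introduced for illustration. unique(s1) compares whole list elements (entire orders), while lapply(s1, unique) removes repeated items inside each order, which is what a "duplicated items" complaint is about.

```r
# Invented data: product 15729 appears twice in order 1.
sales <- data.frame(
  order_id   = c(1L, 1L, 1L, 2L, 2L),
  product_id = c(15729L, 15729L, 15727L, 15999L, 15983L)
)

s1 <- split(sales$product_id, sales$order_id)

by_list <- unique(s1)          # drops orders identical to an earlier order
by_item <- lapply(s1, unique)  # drops repeated products within each order

by_list[[1]]    # still contains the repeated product
by_item[["1"]]  # repeated product removed
```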

R export to SPSS file, with variable names longer than 8 characters [duplicate]

I'm working in R, but I need to deliver some data in SPSS format with both 'variable labels' and 'value labels' and I'm kinda stuck.
I've added variable labels to my data using Hmisc's label function. This adds the variable labels as a label attribute, which is handy when using describe() from the Hmisc package. The problem is that I cannot get the write.foreign() function, from the foreign package, to recognize these labels as variable labels. I imagine I need to modify write.foreign() to use the label attribute as the variable label when writing the .sps file.
I looked at the R list and at Stack Overflow, but I could only find a post from 2006 on the R list regarding exporting variable labels to SPSS from R, and it doesn't seem to answer my question.
Here is my working example,
# First I create a dummy dataset
df <- data.frame(id = c(1:6), p.code = c(1, 5, 4, NA, 0, 5),
p.label = c('Optometrists', 'Nurses', 'Financial analysts',
'<NA>', '0', 'Nurses'), foo = LETTERS[1:6])
# Second, I add some variable labels using label from the Hmisc package
# install.packages('Hmisc', dependencies = TRUE)
library(Hmisc)
label(df) <- "Sweet sweet data"
label(df$id) <- "id !##$%^"
label(df$p.label) <- "Profession with human readable information"
label(df$p.code) <- "Profession code"
label(df$foo) <- "Variable label for variable x.var"
# modify the name of one variable, just to see what happens when exported.
names(df)[4] <- "New crazy name for 'foo'"
# Third I export the data with write.foreign from the foreign package
# install.packages('foreign', dependencies = TRUE)
setwd('C:\\temp')
library(foreign)
write.foreign(df,"df.wf.txt","df.wf.sps", package="SPSS")
list.files()
[1] "df.wf.sps" "df.wf.txt"
When I inspect the .sps file (see the content of 'df.wf.sps' below) my variable labels are identical to my variable names, except for foo, which I renamed to "New crazy name for 'foo'". This variable has a new and seemingly random name, but the correct variable label.
Does anyone know how to get the label attributes and the variable names exported as 'variable labels' and 'labels names' into a .sps file? Maybe there is a smarter way to store 'variable labels' than my current method?
Any help would be greatly appreciated.
Thanks, Eric
Content of 'df.wf.sps' export using write.foreign from the foreign package
DATA LIST FILE= "df.wf.txt" free (",")
/ id p.code p.label Nwcnf.f. .
VARIABLE LABELS
id "id"
p.code "p.code"
p.label "p.label"
Nwcnf.f. "New crazy name for 'foo'"
.
VALUE LABELS
/
p.label
1 "0"
2 "Financial analysts"
3 "Nurses"
4 "Optometrists"
/
Nwcnf.f.
1 "A"
2 "B"
3 "C"
4 "D"
5 "E"
6 "F"
.
EXECUTE.
Update April 16 2012 at 15:54:24 PDT;
What I am looking for is a way to tweak write.foreign to write a .sps file where this part,
[…]
VARIABLE LABELS
id "id"
p.code "p.code"
p.label "p.label"
Nwcnf.f. "New crazy name for 'foo'"
[…]
looks like this,
[…]
VARIABLE LABELS
id "id !##$%^"
p.code "Profession code"
p.label "Profession with human readable information"
"New crazy name for 'foo'" "New crazy name for 'foo'"
[…]
The last line is a bit ambitious; I don't really need to have variables with white space in the names, but I would like the label attributes to be transferred to the .sps file (that I produce with R).
Try this function and see if it works for you. If not, add a comment and I can see what I can do as far as troubleshooting goes.
# Step 1: Make a backup of your data, just in case
df.orig = df
# Step 2: Load the following function
get.var.labels = function(data) {
a = do.call(llist, data)
tempout = vector("list", length(a))
for (i in 1:length(a)) {
tempout[[i]] = label(a[[i]])
}
b = unlist(tempout)
structure(c(b), .Names = names(data))
}
# Step 3: Apply the variable.label attributes
attributes(df)$variable.labels = get.var.labels(df)
# Step 4: Load the write.SPSS function available from
# https://stat.ethz.ch/pipermail/r-help/2006-January/085941.html
# Step 5: Write your SPSS datafile and codefile
write.SPSS(df, "df.sav", "df.sps")
The above example is assuming that your data is named df, and you have used Hmisc to add labels, as you described in your question.
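The label-collecting step can also be sketched in base R alone, relying on the point made in the question that the labels live in a "label" attribute on each column. In this illustration the attribute is set by hand with attr() instead of Hmisc's label(), and `get_labels` is a hypothetical helper name; columns without a label get an empty string.

```r
# Small example data; labels are attached by hand for illustration.
df <- data.frame(id = 1:3, p.code = c(1, 5, 4), foo = c("A", "B", "C"))
attr(df$id, "label")     <- "id !##$%^"
attr(df$p.code, "label") <- "Profession code"

# Collect the "label" attribute of every column into a named character vector.
get_labels <- function(data) {
  vapply(data, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) "" else as.character(lab)
  }, character(1))
}

labs <- get_labels(df)
labs

# Store the result the same way as Step 3 above.
attributes(df)$variable.labels <- labs
```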
Update: A Self-Contained Function
If you do not want to alter your original file, as in the example above, and if you are connected to the internet while you are using this function, you can try this self-contained function:
write.Hmisc.SPSS = function(data, datafile, codefile) {
a = do.call(llist, data)
tempout = vector("list", length(a))
for (i in 1:length(a)) {
tempout[[i]] = label(a[[i]])
}
b = unlist(tempout)
label.temp = structure(c(b), .Names = names(data))
attributes(data)$variable.labels = label.temp
source("http://dl.dropbox.com/u/2556524/R%20Functions/writeSPSS.R")
write.SPSS(data, datafile, codefile)
}
Usage is simple:
write.Hmisc.SPSS(df, "df.sav", "df.sps")
The function that you linked to (here) should work, but I think the problem is that your dataset doesn't actually have the variable.labels and label.table attributes that would be needed to write the SPSS script file.
I don't have access to SPSS, but try the following and see if it at least points you in the right direction. Unfortunately, I don't see an easy way to do this other than editing the output of dput manually.
df = structure(list(id = 1:6,
p.code = c(1, 5, 4, NA, 0, 5),
p.label = structure(c(5L, 4L, 2L, 3L, 1L, 4L),
.Label = c("0", "Financial analysts",
"<NA>", "Nurses",
"Optometrists"),
class = "factor"),
foo = structure(1:6,
.Label = c("A", "B", "C", "D", "E", "F"),
class = "factor")),
.Names = c("id", "p.code", "p.label", "foo"),
label.table = structure(list(id = NULL,
p.code = NULL,
p.label = structure(c("1", "2", "3", "4", "5"),
.Names = c("0", "Financial analysts",
"<NA>", "Nurses",
"Optometrists")),
foo = structure(1:6,
.Names = c("A", "B", "C", "D", "E", "F"))),
.Names = c("id", "p.code", "p.label", "foo")),
variable.labels = structure(c("id !##$%^", "Profession code",
"Profession with human readable information",
"New crazy name for 'foo'"),
.Names = c("id", "p.code", "p.label", "foo")),
codepage = 65001L)
Compare the above with the output of dput for your sample dataset. Notice that label.table and variable.labels have been added, and a line that said something like row.names = c(NA, -6L), class = "data.frame" was removed.
Update
NOTE: This will not work with the default write.foreign function in R. To test this you first need to load the write.SPSS function shared here, and (of course), make sure that you have the foreign package loaded. Then, you write your files as follows:
write.SPSS(df, datafile="df.sav", codefile="df.sps")
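To make the role of the variable.labels attribute concrete, here is a rough sketch of how such a named vector maps onto the VARIABLE LABELS block shown in the question. This is only string building for illustration; it is not how write.SPSS or write.foreign is implemented, and it does no escaping of quotes inside labels.

```r
# Named vector in the same shape as the variable.labels attribute above.
variable.labels <- c(
  id      = "id !##$%^",
  p.code  = "Profession code",
  p.label = "Profession with human readable information"
)

# One line per variable: name, then the label in double quotes.
syntax <- c(
  "VARIABLE LABELS",
  paste0(names(variable.labels), " \"", variable.labels, "\""),
  "."
)

cat(syntax, sep = "\n")
```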

read.table from write.table in R

I'm trying to do a qdap::multigsub in order to fix some typos, misspelled names, variant expressions and some other "aberrations" in a list of climatic event types (yes, it's the NOAA's data set on storms that belongs to an assignment in a coursera class on reproducible research; although this fixing is neither required nor expected in the assignment: it's me trying my best!).
So I have events named "flash flood", "flash flooding", "flash floods" and the like, and I'd like to group them all in a level called "flash flood". So what I did first was:
expr <- c("^flash.*floo.*","thun.*")
repl <- c("flash flood","thunderstorm")
Length of each vector is 51 and this is a knitr assignment, so in order to keep it readable (margin column=80), I had to go with something like
expr <- c(expr,"new_expr_1","new_expr_2")
repl <- c(repl,"new_repl_1","new_repl_2") # repeated many, many times
Which makes the code kind of messy. Of course, I have the complete expr and repl vectors, so I would like to have each pair (expr and repl) of correspondent values in a row, so the reader of the code would have an easy time (that's why dput won't work here: they don't align each pair of values).
I tried this:
a <- data.frame(expr=expr,repl=repl)
print(a, row.names=FALSE)
# copying the output, and then
b <- read.table(header=TRUE,text="paste_text_here")
but it failed (I think because print throws the output without quotation marks and there are some two-word expr or repl). I also tried
write.table(a, row.names=FALSE)
# copying the output, and then
b <- read.table(header=TRUE,text="paste_text_here")
but it doesn't work either (I think because write.table outputs each item between quotes, and read.table finds too many quotation marks to handle).
I'd like to have in my Rmarkdown file something like this:
exprRepl <- read.table(header=TRUE,text="expr repl
expr_1 repl_1
expr_2 repl_2")
How can I achieve this from the data I have now?
The dput of the first 5 rows of the data frame follows:
> dput(a[1:5,])
structure(list(expr = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("^BLIZZARD.*",
"^FLASH.*FLOOD.*", "^HAIL.*", "^HEAVY.*RAIN.*", "^HURRICANE.*"
), class = "factor"), repl = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("BLIZZARD",
"FLASH FLOOD", "HAIL", "HEAVY RAIN", "HURRICANE"), class = "factor")), .Names = c("expr",
"repl"), row.names = c(NA, 5L), class = "data.frame")
If there's any other approach to replace the wrong/variant names, I'd be very happy to hear about it and give it a try!
One solution is to use a single quote ' around the pasted text (this works as long as there are no ' in your data):
d <- structure(list(expr = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("^BLIZZARD.*",
"^FLASH.*FLOOD.*", "^HAIL.*", "^HEAVY.*RAIN.*", "^HURRICANE.*"
), class = "factor"), repl = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("BLIZZARD",
"FLASH FLOOD", "HAIL", "HEAVY RAIN", "HURRICANE"), class = "factor")), .Names = c("expr",
"repl"), row.names = c(NA, 5L), class = "data.frame")
write.table(d, row.names=FALSE)
# copy paste output of write.table in text field below:
read.table(header = TRUE, text='"expr" "repl"
"^HURRICANE.*" "HURRICANE"
"^BLIZZARD.*" "BLIZZARD"
"^FLASH.*FLOOD.*" "FLASH FLOOD"
"^HAIL.*" "HAIL"
"^HEAVY.*RAIN.*" "HEAVY RAIN"')

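Another option, sketched here as an alternative to quoting every field, is to pick a separator that never occurs in the data and let read.table trim the padding. The "|" separator is an assumption (it must not appear in any pattern or replacement), and the two rows are taken from the dput above. Setting quote = "" and comment.char = "" keeps characters in the patterns from being interpreted as quotes or comments.

```r
exprRepl <- read.table(
  header = TRUE, sep = "|", strip.white = TRUE,
  quote = "", comment.char = "", stringsAsFactors = FALSE,
  text = "expr             | repl
^FLASH.*FLOOD.*  | FLASH FLOOD
^HEAVY.*RAIN.*   | HEAVY RAIN")

exprRepl

# Example use: apply each pattern/replacement pair in turn.
events <- c("FLASH FLOODING", "HEAVY RAINS", "HAIL")
for (i in seq_len(nrow(exprRepl))) {
  events <- sub(exprRepl$expr[i], exprRepl$repl[i], events)
}
events
```

This keeps each pair aligned on one row in the source, which is the layout the question asks for.
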