I am new to programming and am attempting to create a prediction model for multiple articles.
Unfortunately, using Excel or similar software is not possible for this task, so I have installed RStudio to solve this problem. My goal is to make an 18-month prediction for each article in my dataset using an ARIMA model.
However, I am currently facing an issue with the format of my data frame. Specifically, I am unsure of how my CSV should be structured to be read by my code.
I have attached an image of my current dataset in CSV format: https://i.stack.imgur.com/AQJx1.png
Here is the output of dput(sales_data):
structure(list(X.Article.1.Article.2.Article.3 = c("janv-19;42;49;55", "f\xe9vr-19;56;58;38", "mars-19;55;59;76")), class = "data.frame", row.names = c(NA, -3L))
And here is the code I have put together so far with the help of blogs and websites:
library(forecast)
library(reshape2)
sales_data <- read.csv("sales_data.csv", header = TRUE)
sales_data_long <- reshape2::melt(sales_data, id.vars = "Code Article")
for(i in 1:nrow(sales_data_long)) {
  sales_data_article <- subset(sales_data_long, sales_data_long$`Code Article` == sales_data_long[i,"Code Article"])
  sales_ts <- ts(sales_data_article$value, start = c(2010,6), frequency = 12)
  arima_fit <- auto
  arima_forecast <- forecast(arima_fit, h = 18)
  print(arima_forecast)
  print(paste("Article:", sales_data_long[i, "Code Article"]))
}
With this code, RStudio gives me the following error: "Error: id variables not found in data: Code Article"
Currently, I am not interested in generating any plots or outputs. My main focus is on identifying the appropriate format for my data.
Do I need to modify my CSV file and separate each column using "," or ";"? Or, can I keep my data in its current format and make adjustments in the code instead?
EDIT: Added the dput output as requested by jrcalabrese.
I swapped reshape2 for its successor, tidyr, and used pivot_longer(). This no longer gives the error that reshape2::melt was producing.
The exact CSV structure doesn't matter much. Your structure was fine.
Hope this helps! :-)
library(tidyr)
sales_data <- structure(list(var1 = c("Article 1", "Article 2", "Article 3"),
`janv-19` = c(42, 56, 55),
`fev-19` = c(49, 58, 59),
`mars-19` = c(55, 38, 76)),
row.names = c(NA, 3L), class = "data.frame")
sales_data_long <- sales_data |> pivot_longer(!var1,
names_to = "month",
values_to = "count")
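One more observation on the format question: in the dput above, every row landed in a single column named X.Article.1.Article.2.Article.3 with values like "janv-19;42;49;55". That suggests the file is semicolon-delimited and was read with read.csv's default comma separator. The following is a minimal sketch under that assumption. The inline text stands in for sales_data.csv, and the "Month" header for the first column is hypothetical. It reads the months-in-rows layout from the screenshot and turns each article column into its own monthly series starting January 2019.

```r
# Hypothetical file content matching the dput above: months in rows,
# one column per article, ";" as the field separator.
# "Month" is an assumed header for the first column.
csv_text <- "Month;Article 1;Article 2;Article 3
janv-19;42;49;55
f\u00e9vr-19;56;58;38
mars-19;55;59;76"

# sep = ";" splits the fields; check.names = FALSE keeps "Article 1" unchanged.
# With the real file this would be something like:
#   read.csv("sales_data.csv", sep = ";", check.names = FALSE)
# (the \xe9 in the dput hints at a Latin-1 file, so fileEncoding = "latin1"
#  may also be needed; that is an assumption worth checking)
sales_data <- read.csv(text = csv_text, sep = ";", check.names = FALSE)

# One monthly time series per article column, starting January 2019
sales_ts <- lapply(sales_data[-1], ts, start = c(2019, 1), frequency = 12)
str(sales_ts)
```

Each element of sales_ts can then be passed to the ARIMA step one article at a time. This avoids looping over every row of the long table.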
When trying to merge two columns (pre and post) in a kwic dataframe created with the quanteda package, the resulting data frame contains only NA values. Using the paste() function from base R works perfectly fine, but I'd rather solve this issue with a tidy approach. Has anyone else experienced this before and knows what to do?
I'm including a reprex below, but unfortunately, in the reprex the unite function works perfectly fine. I'm wondering if it's related to the input being a data frame created with quanteda::kwic?
library(tidyr)  # provides unite() and re-exports %>%
pre = c("Pre Text 1", "Pre Text 2", "Pre Text 3")
post = c("Post Text 1", "Post Text 2", "Post Text 3")
data <- data.frame(id = 1:3,
                   pre = pre,
                   post = post)
data2 <- data %>%
  unite("merged", pre, post, sep = " ")
EDIT: Here is a better example. "x" is a data frame produced by applying kwic() to my dataset, and speeches_meta is metadata associated with the texts in "x". When I run the unite function on the "dput" object, it somehow doubles the number of variables, and all observations except two are empty (the two that aren't contain a jumble of information from all the variables).
merged_kwic <- left_join(x, speeches_meta, by = "docname")
dput <- dput(merged_kwic[1:3, c("pre", "post")])
dput <- dput %>%
unite("merged", pre, post, sep = " ")
EDIT 2:
Here is the output I get from running this code:
dput(merged_kwic[1:3, c("pre", "post")])
structure(list(docname = c("585662", "586622", "650973"), from = c(377L,
1665L, 562L), to = c(377L, 1665L, 562L), pre = c("5 Dies kann weder durch",
"tief in die Mottenkiste der", "unterstellen dass es ihnen um"
), keyword = c("Ostalgie", "Ostalgie", "Ostalgie"), post = c("noch durch Amnesie durch Gedächtnisverlust",
"greifen würden 33 An dieser", "geht um eine Werbung für"),
pattern = structure(c(1L, 1L, 1L), .Label = "ostalgie", class = "factor"),
id = c(585662, 586622, 650973), session = c(241, 245, 56),
electoralTerm = c(13, 13, 15), firstName = c("Dietrich",
"werner", "Vera"), lastName = c("Austermann", "schulz", "Lengsfeld"
), politicianId = c(11000066, 11002108, 11002721), factionId = c(4,
3, 4), documentUrl = c("https://dip21.bundestag.de/dip21/btp/13/13241.pdf",
"https://dip21.bundestag.de/dip21/btp/13/13245.pdf", "https://dip21.bundestag.de/dip21/btp/15/15056.pdf"
), positionShort = c("Member of Parliament", "Member of Parliament",
"Member of Parliament"), positionLong = c(NA_character_,
NA_character_, NA_character_), date = structure(c(10395,
10402, 12236), class = "Date")), ntoken = c(`585662` = 839L,
`586622` = 1724L, `650973` = 647L), row.names = c(NA, 3L), class = c("kwic",
"data.frame"))
I realized the issue is that unite() doesn't necessarily work on kwic data frames. After piping the data frame into as.tibble() (as_tibble() in current versions of tibble, where as.tibble() is deprecated), it worked just fine. Hopefully this will be helpful to people in the future!
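The fix described above can be sketched as follows. The object x here is a hypothetical stand-in for a real kwic result. It is a plain data frame whose class vector starts with "kwic", built from a few columns of the dput. Without quanteda loaded it will not reproduce the original bug; the sketch only shows the "convert to a tibble first, then unite()" pattern.

```r
library(tibble)
library(tidyr)  # unite(); also re-exports %>%

# Hypothetical stand-in for a quanteda kwic object (values from the dput above)
x <- structure(
  list(docname = c("585662", "586622", "650973"),
       pre     = c("5 Dies kann weder durch", "tief in die Mottenkiste der",
                   "unterstellen dass es ihnen um"),
       keyword = c("Ostalgie", "Ostalgie", "Ostalgie"),
       post    = c("noch durch Amnesie durch Ged\u00e4chtnisverlust",
                   "greifen w\u00fcrden 33 An dieser",
                   "geht um eine Werbung f\u00fcr")),
  row.names = c(NA, -3L), class = c("kwic", "data.frame"))

# Drop the kwic class first so tidyr sees an ordinary tibble, then merge
merged <- x %>%
  as_tibble() %>%
  unite("merged", pre, post, sep = " ")

merged
```

unite() removes the pre and post columns by default (remove = TRUE) and puts merged where pre used to be.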
I have an easy question I can't seem to figure out: I am trying to remove a column name. I am making some tables with formattable and I don't like the column name that is there. Here is what the table looks like now:
df <- data.frame(
"test" = c("Average age of Diagnosis", "Average Life Space", "Average Number of Dogs Diagnosised with MR"),
"2008-2018" = c(28, 27, 30),
"Sample size" = c(12,23,34),
stringsAsFactors = FALSE, check.names=FALSE)
I want to remove the name of the first column ("test"), so I tried this:
df <- data.frame(
"" = c("Average age of Diagnosis", "Average Life Space", "Average Number of Dogs Diagnosised with MR"),
"2008-2018" = c(28, 27, 30),
"Sample size" = c(12,23,34),
stringsAsFactors = FALSE, check.names=FALSE)
Error: attempt to use zero-length variable name
This is clearly related to my removing the column name. Is there a way to fix this? I just want to remove the name "test".
Thanks in advance!
Have you tried this?
colnames(df)[1] <- ''
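In context, the suggestion above looks roughly like this. It is a minimal sketch reusing the question's table. The first column gets an ordinary placeholder name ("test") when the data frame is built, and the name is blanked afterwards. This avoids the "attempt to use zero-length variable name" error shown in the question.

```r
# Build the table with a placeholder name for the first column
df <- data.frame(
  test = c("Average age of Diagnosis", "Average Life Space",
           "Average Number of Dogs Diagnosised with MR"),
  "2008-2018" = c(28, 27, 30),
  "Sample size" = c(12, 23, 34),
  stringsAsFactors = FALSE, check.names = FALSE)

# Blank the first name after construction instead of passing "" to data.frame()
colnames(df)[1] <- ""
names(df)
```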
To solve this, I used Kath's suggestion (see the comment on the original question):
df <- data.frame(
" " = c("Average age of Diagnosis", "Average Life Space", "Average Number of Dogs Diagnosised with MR"),
"2008-2018" = c(28, 27, 30),
"Sample size" = c(12,23,34),
stringsAsFactors = FALSE, check.names=FALSE)
It differs subtly from my original attempt: there is a space between the quotes (" " instead of "").
I am using this to make some figures, and I didn't want anything visible in that column name.
I'm working in R, but I need to deliver some data in SPSS format with both 'variable labels' and 'value labels' and I'm kinda stuck.
I've added variable labels to my data using Hmisc's label() function. This adds the variable labels as a label attribute, which is handy when using describe() from the Hmisc package. The problem is that I cannot get the write.foreign() function from the foreign package to recognize these labels as variable labels. I imagine I need to modify write.foreign() to use the label attribute as the variable label when writing the .sps file.
I looked at the R mailing list and at Stack Overflow, but I could only find a post from 2006 on the R list about exporting variable labels from R to SPSS, and it doesn't seem to answer my question.
Here is my working example:
# First I create a dummy dataset
df <- data.frame(id = c(1:6), p.code = c(1, 5, 4, NA, 0, 5),
p.label = c('Optometrists', 'Nurses', 'Financial analysts',
'<NA>', '0', 'Nurses'), foo = LETTERS[1:6])
# Second, I add some variable labels using label from the Hmisc package
# install.packages('Hmisc', dependencies = TRUE)
library(Hmisc)
label(df) <- "Sweet sweet data"
label(df$id) <- "id !##$%^"
label(df$p.label) <- "Profession with human readable information"
label(df$p.code) <- "Profession code"
label(df$foo) <- "Variable label for variable x.var"
# Rename one variable, just to see what happens when it is exported.
names(df)[4] <- "New crazy name for 'foo'"
# Third I export the data with write.foreign from the foreign package
# install.packages('foreign', dependencies = TRUE)
setwd('C:\\temp')
library(foreign)
write.foreign(df,"df.wf.txt","df.wf.sps", package="SPSS")
list.files()
[1] "df.wf.sps" "df.wf.txt"
When I inspect the .sps file (see the content of 'df.wf.sps' below), my variable labels are identical to my variable names, except for foo, which I renamed to "New crazy name for 'foo'". This variable gets a new and seemingly random name, but the correct variable label.
Does anyone know how to get the label attributes and the variable names exported as 'variable labels' and 'labels names' into a .sps file? Maybe there is a smarter way to store 'variable labels' than my current method?
Any help would be greatly appreciated.
Thanks, Eric
Content of 'df.wf.sps' as exported by write.foreign from the foreign package:
DATA LIST FILE= "df.wf.txt" free (",")
/ id p.code p.label Nwcnf.f. .
VARIABLE LABELS
id "id"
p.code "p.code"
p.label "p.label"
Nwcnf.f. "New crazy name for 'foo'"
.
VALUE LABELS
/
p.label
1 "0"
2 "Financial analysts"
3 "Nurses"
4 "Optometrists"
/
Nwcnf.f.
1 "A"
2 "B"
3 "C"
4 "D"
5 "E"
6 "F"
.
EXECUTE.
Update (April 16, 2012, 15:54:24 PDT):
What I am looking for is a way to tweak write.foreign to write a .sps file where this part,
[…]
VARIABLE LABELS
id "id"
p.code "p.code"
p.label "p.label"
Nwcnf.f. "New crazy name for 'foo'"
[…]
looks like this,
[…]
VARIABLE LABELS
id "id !##$%^"
p.code "Profession code"
p.label "Profession with human readable information"
"New crazy name for 'foo'" "New crazy name for 'foo'"
[…]
The last line is a bit ambitious. I don't really need variable names with white space in them, but I would like the label attributes to be transferred to the .sps file that I produce with R.
Try this function and see if it works for you. If not, add a comment and I can see what I can do as far as troubleshooting goes.
# Step 1: Make a backup of your data, just in case
df.orig = df
# Step 2: Load the following function (llist() and label() come from Hmisc)
library(Hmisc)
get.var.labels = function(data) {
a = do.call(llist, data)
tempout = vector("list", length(a))
for (i in 1:length(a)) {
tempout[[i]] = label(a[[i]])
}
b = unlist(tempout)
structure(c(b), .Names = names(data))
}
# Step 3: Apply the variable.label attributes
attributes(df)$variable.labels = get.var.labels(df)
# Step 4: Load the write.SPSS function available from
# https://stat.ethz.ch/pipermail/r-help/2006-January/085941.html
# Step 5: Write your SPSS datafile and codefile
write.SPSS(df, "df.sav", "df.sps")
The above example assumes that your data is named df and that you have used Hmisc to add labels, as you described in your question.
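The labels that get.var.labels() collects can also be read without llist(), because Hmisc's label<- stores each label in a "label" attribute on the column. Below is a minimal base-R sketch of that idea. get_var_labels() is a hypothetical name. Labels are set with attr() directly so the example runs without Hmisc, and unlabelled columns map to "".

```r
# Hypothetical base-R equivalent of get.var.labels(): read the "label"
# attribute of each column, returning "" when a column has no label.
get_var_labels <- function(data) {
  vapply(data, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) "" else as.character(lab)
  }, character(1))
}

# Small stand-in for the question's data, labelled without Hmisc
df <- data.frame(id = 1:3, p.code = c(1, 5, 4), foo = c("A", "B", "C"))
attr(df$id, "label") <- "id !##$%^"
attr(df$p.code, "label") <- "Profession code"

# Same effect as attributes(df)$variable.labels <- ... in Step 3
attr(df, "variable.labels") <- get_var_labels(df)
attr(df, "variable.labels")
```

vapply() keeps the column names as vector names, which gives the named vector Step 3 builds with structure(..., .Names = names(data)).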
Update: A Self-Contained Function
If you do not want to alter your original data, as in the example above, and you are connected to the internet (the function sources write.SPSS from a URL), you can try this self-contained function:
write.Hmisc.SPSS = function(data, datafile, codefile) {
a = do.call(llist, data)
tempout = vector("list", length(a))
for (i in 1:length(a)) {
tempout[[i]] = label(a[[i]])
}
b = unlist(tempout)
label.temp = structure(c(b), .Names = names(data))
attributes(data)$variable.labels = label.temp
source("http://dl.dropbox.com/u/2556524/R%20Functions/writeSPSS.R")
write.SPSS(data, datafile, codefile)
}
Usage is simple:
write.Hmisc.SPSS(df, "df.sav", "df.sps")
The function that you linked to (here) should work, but I think the problem is that your dataset doesn't actually have the variable.labels and label.table attributes that would be needed to write the SPSS script file.
I don't have access to SPSS, but try the following and see if it at least points you in the right direction. Unfortunately, I don't see an easy way to do this other than editing the output of dput manually.
df = structure(list(id = 1:6,
p.code = c(1, 5, 4, NA, 0, 5),
p.label = structure(c(5L, 4L, 2L, 3L, 1L, 4L),
.Label = c("0", "Financial analysts",
"<NA>", "Nurses",
"Optometrists"),
class = "factor"),
foo = structure(1:6,
.Label = c("A", "B", "C", "D", "E", "F"),
class = "factor")),
.Names = c("id", "p.code", "p.label", "foo"),
label.table = structure(list(id = NULL,
p.code = NULL,
p.label = structure(c("1", "2", "3", "4", "5"),
.Names = c("0", "Financial analysts",
"<NA>", "Nurses",
"Optometrists")),
foo = structure(1:6,
.Names = c("A", "B", "C", "D", "E", "F"))),
.Names = c("id", "p.code", "p.label", "foo")),
variable.labels = structure(c("id !##$%^", "Profession code",
"Profession with human readable information",
"New crazy name for 'foo'"),
.Names = c("id", "p.code", "p.label", "foo")),
codepage = 65001L)
Compare the above with the output of dput for your sample dataset. Notice that label.table and variable.labels have been added, and a line that said something like row.names = c(NA, -6L), class = "data.frame" was removed.
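Instead of editing the dput by hand, the same two attributes can probably be attached in code. The sketch below assumes write.SPSS only needs label.table (level codes named by level text, NULL for non-factors) and variable.labels (one label per column) shaped like the hand-edited version above. Unlike that version, df stays a data.frame here. I have not checked whether write.SPSS cares about the class. The factor levels are spelled out so the codes match the dput regardless of locale sort order.

```r
# Stand-in for the question's data with explicit factor levels
df <- data.frame(
  id = 1:6,
  p.code = c(1, 5, 4, NA, 0, 5),
  p.label = factor(c("Optometrists", "Nurses", "Financial analysts",
                     "<NA>", "0", "Nurses"),
                   levels = c("0", "Financial analysts", "<NA>",
                              "Nurses", "Optometrists")),
  foo = factor(LETTERS[1:6]))

# label.table: for each factor, its codes ("1", "2", ...) named by level text;
# non-factor columns get NULL, as in the hand-edited dput
attr(df, "label.table") <- lapply(df, function(col) {
  if (is.factor(col)) setNames(as.character(seq_along(levels(col))), levels(col))
  else NULL
})

# variable.labels: one label per column, named by column
attr(df, "variable.labels") <- c(
  id = "id !##$%^",
  p.code = "Profession code",
  p.label = "Profession with human readable information",
  foo = "New crazy name for 'foo'")
```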
Update
NOTE: This will not work with the default write.foreign function in R. To test it, you first need to load the write.SPSS function shared here and, of course, make sure that the foreign package is loaded. Then write your files as follows:
write.SPSS(df, datafile="df.sav", codefile="df.sps")
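If neither write.foreign nor write.SPSS works out, one fallback is to generate the VARIABLE LABELS command from the question's update directly from the label attributes and paste it into the .sps file. spss_variable_labels() below is a hypothetical helper, not part of foreign or Hmisc. It assumes the column names are already valid SPSS variable names. It also assumes that SPSS escapes a double quote inside a double-quoted string by doubling it; that is worth checking against your SPSS version.

```r
# Hypothetical helper: build an SPSS "VARIABLE LABELS" command from the
# "label" attributes that Hmisc's label<- stores on each column.
spss_variable_labels <- function(data) {
  labs <- vapply(names(data), function(nm) {
    lab <- attr(data[[nm]], "label", exact = TRUE)
    # Fall back to the variable name when a column has no label
    if (is.null(lab) || !nzchar(lab)) nm else as.character(lab)
  }, character(1))
  # Double any embedded double quotes (assumed SPSS escaping rule)
  labs <- gsub('"', '""', labs, fixed = TRUE)
  c("VARIABLE LABELS", sprintf('%s "%s"', names(data), labs), ".")
}

df <- data.frame(id = 1:2, p.code = c(1, 5), p.label = c("Nurses", "0"))
attr(df$id, "label") <- "id !##$%^"
attr(df$p.code, "label") <- "Profession code"
attr(df$p.label, "label") <- "Profession with human readable information"

cat(spss_variable_labels(df), sep = "\n")
```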