I have a data frame with 383 variables. Because the variable names are long and self-explanatory, I would like to copy these names into the variable labels; then, in a second step (already done successfully), I rename the variables for easier coding. I tried the following, which fails with the error shown:
library(expss)
REGCON_CA_FIRM <- apply_labels(REGCON_CA_FIRM,names(REGCON_CA_FIRM)<-names(REGCON_CA_FIRM))
# Error in if (curr_name %in% data_names) { : argument is of length zero
A one-liner using mtcars:
do.call(apply_labels, c(list(data=mtcars),setNames(names(mtcars), names(mtcars)) %>% as.list()))
However, for your use case, you can write a small function, as below, that takes a data frame and a vector of new names, moves the current column names into the labels, and replaces the original (too long) names with the new ones:
replace_long_with_short <- function(d,short_names) {
setNames(
do.call(apply_labels, c(list(data=d),setNames(names(d), names(d)) %>% as.list())),
short_names
)
}
Pass your data frame to this function, along with the desired new names. The function returns the frame with the original column names stored as labels and the new names as column names.
Example: Let's say you have a data frame that looks like this:
X.is.an.important.variable Y.is.also.important
1 -0.003643385 1.1052905
2 1.641458152 0.5303247
3 -1.058337452 0.5490569
and you want those descriptive column names to be the labels, and the new names to be x and y.
Then calling the above function like this:
df = replace_long_with_short(df,c("x", "y"))
will convert df to this:
x y
1 -0.003643385 1.1052905
2 1.641458152 0.5303247
3 -1.058337452 0.5490569
and the labels will be attached:
str(df)
'data.frame': 3 obs. of 2 variables:
$ x:Class 'labelled' num -0.00364 1.64146 -1.05834
.. .. LABEL: X.is.an.important.variable
$ y:Class 'labelled' num 1.105 0.53 0.549
.. .. LABEL: Y.is.also.important
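If expss is not at hand, the same idea can be sketched in base R. This is only an illustration: it assumes that a plain "label" attribute on each column is an acceptable stand-in for what apply_labels sets, and the function name label_then_rename is made up for this example.

```r
# Hypothetical helper (not part of expss): copy each column name into a
# "label" attribute, then rename the columns.
label_then_rename <- function(d, short_names) {
  stopifnot(length(short_names) == ncol(d))
  for (j in seq_along(d)) {
    attr(d[[j]], "label") <- names(d)[j]
  }
  names(d) <- short_names
  d
}

# Made-up data mirroring the example above
long <- data.frame(
  X.is.an.important.variable = c(-0.004, 1.641, -1.058),
  Y.is.also.important = c(1.105, 0.530, 0.549)
)
short <- label_then_rename(long, c("x", "y"))

names(short)
attr(short$x, "label")
```

The length check is there because a short_names vector of the wrong length would otherwise leave some columns without proper names.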
I am working with R. I found this question on creating empty data frames in R: Create an empty data.frame.
I tried to do something similar:
df <- data.frame(Date=as.Date(character()),
country=factor(),
total=numeric(),
stringsAsFactors=FALSE)
Yet, when I try to populate it:
df$total = 7
I get the following error:
Error in `$<-.data.frame`(`*tmp*`, total, value = 7) :
replacement has 1 row, data has 0
I also tried:
df[1, "total"] <- rnorm(100,100,100)
Error in `[<-.data.frame`(`*tmp*`, 1, "total", value = c(-79.4584309347689, :
replacement has 100 rows, data has 1
Does anyone know how to fix this error?
Thanks
An option is to specify the row index
df[1, "total"] <- 7
-output
str(df)
#'data.frame': 1 obs. of 3 variables:
# $ Date : Date, format: NA
# $ country: Factor w/ 0 levels: NA
# $ total : num 7
The issue is that assigning to a single column of a 0-row data frame does not automatically add a row to the other columns. When the row index is specified, the other columns are automatically filled with NA.
Regarding the second question (updated): a standard data.frame column is a vector, and the length of the replacement must match the number of rows being indexed. To expand to 100 rows, change the index accordingly:
df[1:100, "total"] <- rnorm(100, 100, 100) # length is 100 here
dim(df)
#[1] 100 3
Or if we need to cram everything in a single row, then wrap the rnorm in a list
df[1, "total"] <- list(rnorm(100, 100, 100))
In short, the lhs should be of the same length as the rhs. Another case is when we are assigning from a different dataset
df[seq_along(aa$bb), "total"] <- aa$bb
This can also be done without initialization i.e.
df <- data.frame(total = aa$bb)
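A self-contained version of the "different dataset" case, for anyone who wants to run it. The data frame aa and its column bb are made-up stand-ins; only the indexing pattern is the point.

```r
# `aa` is a made-up source data frame standing in for "a different dataset".
aa <- data.frame(bb = c(10.5, 20.25, 30))

# The empty data frame from the question
df <- data.frame(Date = as.Date(character()),
                 country = factor(),
                 total = numeric(),
                 stringsAsFactors = FALSE)

# One row index per value on the right-hand side
df[seq_along(aa$bb), "total"] <- aa$bb

dim(df)
df$total
```

The other columns (Date and country) are filled with NA for the new rows, as in the single-row case.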
I have imported a dataset which contains large numbers which were automatically converted to exponential notation. Because I had to see the full number, I used options(scipen = 999). I discovered that the imported number did not equal the original number from the dataset. For example: 5765949338897345178 was changed to 5765949338897345536.
How can it be that these numbers are not the same? The weird thing is that when I use: which(dim_alias1$id == 5765949338897345536) and which(dim_alias1$id == 5765949338897345178), it returns the same rownumber. How is this possible?
As you are using the variable as an id number, it doesn't need to be numeric. So set the column class to character when reading in.
Example:
dat <- data.frame(id=12345, x=1)
write.table(dat, tmp <- tempfile())
dat2 <- read.table(tmp, colClasses = c(id="character"))
str(dat2)
#'data.frame': 1 obs. of 2 variables:
# $ id: chr "12345"
# $ x : int 1
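As for why the two numbers compare equal: R stores such values as double-precision floats, which carry 53 bits of precision. Above 2^53 (about 9.0e15) not every integer has its own representation, and at the size of the id in the question (about 5.8e18) neighbouring doubles are 1024 apart, so both literals round to the same stored value. A small sketch of the effect, using 2^53 rather than the original id:

```r
# Doubles carry 53 bits of precision, so 2^53 + 1 cannot be represented
# and collapses onto 2^53.
big <- 2^53
same_as_neighbour <- (big + 1) == big

# Stored as text, neighbouring ids stay distinct.
id_a <- "9007199254740992"
id_b <- "9007199254740993"
distinct_as_text <- id_a != id_b

same_as_neighbour
distinct_as_text
```

This is why reading the id column as character keeps every digit.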
I have a table source that reads into a data frame. I know that by default, external sources are read into data frames as factors. I'd like to apply stringsAsFactors=FALSE in the data frame call below, but it throws an error when I do this. Can I still use chaining and turn stringsAsFactors=FALSE?
library(rvest)
pvbData <- read_html(pvbURL)
pvbDF <- pvbData %>%
html_nodes(xpath = '//*[@id="ajax_result_table"]') %>%
html_table() %>%
data.frame()
# data.frame(,stringsAsFactors=FALSE)   <- this throws an error
I know this is probably something very simple, but I'm having trouble finding a way to make this work. Thank you for your help.
When chaining, the call should logically be data.frame(stringsAsFactors=FALSE), because the pipe supplies the first argument. Even so, this statement doesn't produce the required output.
The reason is a misunderstanding of the stringsAsFactors option: it takes effect only when the data.frame is built column by column. Example:
a <- data.frame(x = c('a','b'),y=c(1,2),stringsAsFactors = T)
str(a)
'data.frame': 2 obs. of 2 variables:
$ x: Factor w/ 2 levels "a","b": 1 2
$ y: num 1 2
a <- data.frame(x = c('a','b'),y=c(1,2),stringsAsFactors = F)
str(a)
'data.frame': 2 obs. of 2 variables:
$ x: chr "a" "b"
$ y: num 1 2
If you pass an existing data.frame as input, the stringsAsFactors option has no effect.
Solution:
Store the chaining result to a variable like this:
library(rvest)
pvbData <- read_html(pvbURL)
pvbDF <- pvbData %>%
html_nodes(xpath = '//*[@id="ajax_result_table"]') %>%
html_table()
And then apply this command:
data.frame(as.list(pvbDF),stringsAsFactors=F)
Update:
If a column is already a factor, this command will not convert it to a character vector. Convert it with as.character first, then retry.
You may refer to Change stringsAsFactors settings for data.frame for more details.
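For the case where columns are already factors, here is a minimal base R sketch of the as.character step applied to every factor column at once. The data frame d is made up for illustration.

```r
# Made-up data frame with one factor column and one numeric column
d <- data.frame(x = factor(c("a", "b")), y = c(1, 2))

# Replace every factor column by its character version; other columns untouched
is_fct <- vapply(d, is.factor, logical(1))
d[is_fct] <- lapply(d[is_fct], as.character)

str(d)
```

Going through as.character (rather than as.vector or as.numeric) matters because a factor is stored as integer codes; as.character returns the level text.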
I am just starting with R and encountered a strange behaviour: when inserting the first row in an empty data frame, the original column names get lost.
example:
a<-data.frame(one = numeric(0), two = numeric(0))
a
#[1] one two
#<0 rows> (or 0-length row.names)
names(a)
#[1] "one" "two"
a<-rbind(a, c(5,6))
a
# X5 X6
#1 5 6
names(a)
#[1] "X5" "X6"
As you can see, the column names one and two were replaced by X5 and X6.
Could somebody please tell me why this happens and is there a right way to do this without losing column names?
A shotgun solution would be to save the names in an auxiliary vector and then add them back when finished working on the data frame.
Thanks
Context:
I created a function which gathers some data and adds them as a new row to a data frame received as a parameter.
I create the data frame, iterate through my data sources, passing the data.frame to each function call to be filled up with its results.
The rbind help page specifies that:
For ‘cbind’ (‘rbind’), vectors of zero
length (including ‘NULL’) are ignored
unless the result would have zero rows
(columns), for S compatibility.
(Zero-extent matrices do not occur in
S3 and are not ignored in R.)
So, in fact, a is ignored in your rbind instruction. Not totally ignored, it seems, because as it is a data frame the rbind function is called as rbind.data.frame :
rbind.data.frame(c(5,6))
# X5 X6
#1 5 6
Maybe one way to insert the row could be :
a[nrow(a)+1,] <- c(5,6)
a
# one two
#1 5 6
But there may be a better way to do it depending on your code.
I was almost ready to give up on this issue.
1) Create the data frame with stringsAsFactors set to FALSE, or you run straight into the next issue.
2) Don't use rbind (I have no idea why it messes up the column names). Simply do it this way:
df[nrow(df)+1,] <- c("d","gsgsgd",4)
df <- data.frame(a = character(0), b=character(0), c=numeric(0))
df[nrow(df)+1,] <- c("d","gsgsgd",4)
#Warning messages:
#1: In `[<-.factor`(`*tmp*`, iseq, value = "d") :
# invalid factor level, NAs generated
#2: In `[<-.factor`(`*tmp*`, iseq, value = "gsgsgd") :
# invalid factor level, NAs generated
df <- data.frame(a = character(0), b=character(0), c=numeric(0), stringsAsFactors=F)
df[nrow(df)+1,] <- c("d","gsgsgd",4)
df
# a b c
#1 d gsgsgd 4
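One caveat with the vector form: c("d","gsgsgd",4) is a character vector, so the 4 is assigned as the string "4" and column c ends up as character. A variant that should keep each column's type is to pass the row as a list; this is a sketch of that idea, not something from the answer above.

```r
df <- data.frame(a = character(0), b = character(0), c = numeric(0),
                 stringsAsFactors = FALSE)

# A list keeps each value's own type; c("d", "gsgsgd", 4) would coerce 4 to "4"
df[nrow(df) + 1, ] <- list("d", "gsgsgd", 4)

str(df)
```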
Workaround would be:
a <- rbind(a, data.frame(one = 5, two = 6))
?rbind states that merging objects demands matching names:
It then takes the classes of the
columns from the first data frame, and
matches columns by name (rather than
by position)
FWIW, an alternative design might have your functions building vectors for the two columns, instead of rbinding to a data frame:
ones <- c()
twos <- c()
Modify the vectors in your functions:
ones <- append(ones, 5)
twos <- append(twos, 6)
Repeat as needed, then create your data.frame in one go:
a <- data.frame(one=ones, two=twos)
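In the same spirit, a variant that avoids one vector per column is to collect single-row data frames in a list and bind them once at the end. The function gather_one below is a hypothetical stand-in for whatever gathers the data for one source.

```r
# Hypothetical stand-in for a function that gathers one row of results
gather_one <- function(i) data.frame(one = i, two = i^2)

rows <- lapply(1:3, gather_one)   # one single-row data frame per source
a <- do.call(rbind, rows)         # bind once at the end; names are kept

a
```

Because every piece is a data frame with the same column names, rbind matches columns by name and nothing gets renamed.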
One way to make this work generically and with the least amount of re-typing the column names is the following. This method doesn't require hacking the NA or 0.
rs <- data.frame(i=numeric(), square=numeric(), cube=numeric())
for (i in 1:4) {
calc <- c(i, i^2, i^3)
# append calc to rs
names(calc) <- names(rs)
rs <- rbind(rs, as.list(calc))
}
rs will have the correct names
> rs
i square cube
1 1 1 1
2 2 4 8
3 3 9 27
4 4 16 64
>
Another way to do this more cleanly is to use data.table:
> library(data.table)
> df <- data.frame(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are messed up
  X1 X2
1  1  2
> df <- data.table(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are preserved
   a b
1: 1 2
Notice that a data.table is also a data.frame.
> class(df)
[1] "data.table" "data.frame"
You can do this:
Give the initial data frame one row:
df=data.frame(matrix(nrow=1,ncol=length(newrow)))
Add your new row and take out the NAs:
newdf=na.omit(rbind(newrow,df))
But watch out: if newrow itself contains NAs, it will be removed too.
Cheers
Agus
I use the following solution to add a row to an empty data frame:
d_dataset <-
data.frame(
variable = character(),
before = numeric(),
after = numeric(),
stringsAsFactors = FALSE)
d_dataset <-
rbind(
d_dataset,
data.frame(
variable = "test",
before = 9,
after = 12,
stringsAsFactors = FALSE))
print(d_dataset)
variable before after
1 test 9 12
HTH.
Kind regards
Georg
Researching this venerable R annoyance brought me to this page. I wanted to add a bit more explanation to Georg's excellent answer (https://stackoverflow.com/a/41609844/2757825), which not only solves the problem raised by the OP (losing field names) but also prevents the unwanted conversion of all fields to factors. For me, those two problems go together. I wanted a solution in base R that doesn't involve writing extra code but preserves the two distinct operations: define the data frame, append the row(s)--which is what Georg's answer provides.
The first two examples below illustrate the problems and the third and fourth show Georg's solution.
Example 1: Append the new row as vector with rbind
Result: loses column names AND converts all variables to factors
my.df <- data.frame(
name = character(0),
score = numeric(0),
stringsAsFactors=FALSE
)
my.df <- rbind(
my.df,
c("Bob", 250)
)
my.df
X.Bob. X.250.
1 Bob 250
str(my.df)
'data.frame': 1 obs. of 2 variables:
$ X.Bob.: Factor w/ 1 level "Bob": 1
$ X.250.: Factor w/ 1 level "250": 1
Example 2: Append the new row as a data frame inside rbind
Result: keeps column names but still converts character variables to factors.
my.df <- data.frame(
name = character(0),
score = numeric(0),
stringsAsFactors=FALSE
)
my.df <- rbind(
my.df,
data.frame(name="Bob", score=250)
)
my.df
name score
1 Bob 250
str(my.df)
'data.frame': 1 obs. of 2 variables:
$ name : Factor w/ 1 level "Bob": 1
$ score: num 250
Example 3: Append the new row inside rbind as a data frame, with stringsAsFactors=FALSE
Result: problem solved.
my.df <- data.frame(
name = character(0),
score = numeric(0),
stringsAsFactors=FALSE
)
my.df <- rbind(
my.df,
data.frame(name="Bob", score=250, stringsAsFactors=FALSE)
)
my.df
name score
1 Bob 250
str(my.df)
'data.frame': 1 obs. of 2 variables:
$ name : chr "Bob"
$ score: num 250
Example 4: Like example 3, but adding multiple rows at once.
my.df <- data.frame(
name = character(0),
score = numeric(0),
stringsAsFactors=FALSE
)
my.df <- rbind(
my.df,
data.frame(
name=c("Bob", "Carol", "Ted"),
score=c(250, 124, 95),
stringsAsFactors=FALSE)
)
str(my.df)
'data.frame': 3 obs. of 2 variables:
$ name : chr "Bob" "Carol" "Ted"
$ score: num 250 124 95
my.df
name score
1 Bob 250
2 Carol 124
3 Ted 95
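A note for readers on newer R: the default for stringsAsFactors changed from TRUE to FALSE in R 4.0.0, so the Factor columns shown in Examples 1 and 2 should appear as chr there. A small check, with variable names made up for this sketch:

```r
# What does data.frame() do with a string when stringsAsFactors is not given?
default_gives_factor <- is.factor(data.frame(name = "Bob")$name)

# The default changed from TRUE to FALSE in R 4.0.0
expected <- getRversion() < "4.0.0"

# Being explicit gives the same result on every version
explicit <- data.frame(name = "Bob", stringsAsFactors = FALSE)

default_gives_factor
is.character(explicit$name)
```

Passing stringsAsFactors=FALSE explicitly, as in Examples 3 and 4, remains harmless on newer versions and keeps the code portable to older ones.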
Instead of constructing the data.frame with numeric(0) I use as.numeric(0).
a<-data.frame(one=as.numeric(0), two=as.numeric(0))
This creates an extra initial row
a
# one two
#1 0 0
Bind the additional rows
a<-rbind(a,c(5,6))
a
# one two
#1 0 0
#2 5 6
Then use negative indexing to remove the first (bogus) row
a<-a[-1,]
a
# one two
#2 5 6
Note: it messes up the index (far left). I haven't figured out how to prevent that (anyone else?), but most of the time it probably doesn't matter.
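On the index question: the numbers on the far left are row names, and assigning NULL to them resets them to 1, 2, and so on. A short sketch repeating the steps above and then resetting:

```r
a <- data.frame(one = as.numeric(0), two = as.numeric(0))
a <- rbind(a, c(5, 6))
a <- a[-1, ]            # drop the placeholder row; the row name is still "2"

rownames(a) <- NULL     # reset row names to 1, 2, ...

a
```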