I'm working with a bunch of SAS datasets, read in with read.sas7bdat, and I prefer the variable names to all be lowercase. I do this so often that I wanted to write a function. This method works fine:
df <- data.frame(ALLIGATOR=1:4, BLUEBIRD=rnorm(4))
names(file1) <- tolower(names(file1))
but when I try to put it into a function it doesn't assign.
lower <- function (df) {names(df) <- tolower(names(df))}
lower(file1)
I know that there is some larger concept that I'm missing, that is blocking me. It doesn't seem to do anything.
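R passes arguments by value (with copy-on-modify semantics), so the function modifies its own local copy of df and the caller's object is left unchanged. You have to return the modified object and assign the result: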
lower <- function (df) {
  names(df) <- tolower(names(df))
  df
}
file1 <- lower(file1)
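A minimal sketch of why the original lower() appears to do nothing, assuming a small made-up data frame called demo (not from the question): the function changes only its own copy, and the caller's object changes only when the returned value is assigned back.

```r
# Hypothetical example data; any data frame with upper-case names would do.
demo <- data.frame(ALLIGATOR = 1:4, BLUEBIRD = c(0.5, 1.5, 2.5, 3.5))

lower <- function(df) {
  names(df) <- tolower(names(df))  # modifies the function's local copy only
  df                               # return the modified copy
}

lower(demo)          # the result is printed, then discarded
names(demo)          # still upper case

demo <- lower(demo)  # assign the returned copy back
names(demo)          # now lower case
```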
Although I don't see why you would do this rather than simply names(df) <- tolower(names(df)), I think you should do:
lower <- function (x) {tolower(names(x))}
names(df) <- lower(df)
Here is an answer that I don't recommend using anywhere other than the global environment, but it does provide some convenient shorthand. Basically, we take care of the assignment inside the function, overwriting the object passed to it. It is shorthand for you, but please be careful about how you use it:
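tl <- function(x){
  # Name of the object passed in, as a character string
  ass <- all.names(match.call()[-1])
  # Overwrite that object in the calling frame with a lower-cased copy
  assign( ass , setNames( x , tolower(names(x))) , envir = sys.frame(sys.parent()) )
}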
# This will 'overwrite' the names of df...
tl(df)
# Check what df now looks like...
df
  alligator   bluebird
1         1  0.2850386
2         2 -0.9570909
3         3 -1.3048907
4         4 -0.9077282
I need to run through a large data frame and extract a vector with the name of the variables that are numeric type.
I've got stuck in my code, perhaps someone could point me to a solution.
This is how far I have got:
numericVarNames <- function(df) {
  numeric_vars<-c()
  for (i in colnames(df)) {
    if (is.numeric(df[i])) {
      numeric_vars <- c(numeric_vars, colnames(df)[i])
      message(numeric_vars[i])
    }
  }
  return(numeric_vars)
}
To run it:
teste <-numericVarNames(semWellComb)
The is.numeric assertion is not working. There is something wrong with my syntax for catching the type of each column. What is wrong?
Rather than a looping function, how about
df <- data.frame(a = c(1,2,3),
b = c("a","b","c"),
c = c(4,5,6))
## names(df)[sapply(df, class) == "numeric"]
## updated to be 'safer', as variables can have multiple classes
names(df)[sapply(df, is.numeric)]
[1] "a" "c"
This question is worth a read
Without test data it is hard to be sure, but it looks like there is just a "grammar" issue in your code.
You wrote:
numeric_vars <- c(numeric_vars, colnames(df)[i])
To get the column name into the concatenated vector, put the subset itself inside the call to colnames, since i holds a column name rather than a position:
numeric_vars <- c(numeric_vars, colnames(df[i]))
Try running it with that one change and see what you get.
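For reference, here is a sketch of the loop with both indexing problems addressed, using a small made-up data frame called demo. df[i] is a one-column data frame, for which is.numeric() returns FALSE, whereas df[[i]] extracts the column itself. And because i already holds the column name, it can be appended directly.

```r
numericVarNames <- function(df) {
  numeric_vars <- character(0)
  for (i in colnames(df)) {
    # df[[i]] is the column vector; df[i] would be a one-column data frame
    if (is.numeric(df[[i]])) {
      numeric_vars <- c(numeric_vars, i)  # i is already the column name
    }
  }
  numeric_vars
}

# Hypothetical test data
demo <- data.frame(a = c(1, 2, 3), b = c("a", "b", "c"), c = c(4, 5, 6))
numericVarNames(demo)
```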
I'm trying to replicate the solution for applying multiple functions in sapply posted on R-Bloggers, but I can't get it to work in the desired manner. I'm working with a simple data set, similar to the one generated below:
require(datasets)
crs_mat <- cor(mtcars)
# Triangle function
get_upper_tri <- function(cormat){
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}
require(reshape2)
crs_mat <- melt(get_upper_tri(crs_mat))
I would like to replace some text values across columns Var1 and Var2. The erroneous syntax below illustrates what I am trying to achieve:
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
  # Replace first phrase
  gsub("mpg","MPG",x),
  # Replace second phrase
  gsub("gear", "GeArr",x)
  # Ideally, perform other changes
})
Naturally, the code is not syntactically correct and fails. To summarise, I would like to do the following:
Go through all the values in first two columns (Var1 and Var2) and perform simple replacements via gsub.
Ideally, I would like to avoid defining a separate function, as discussed in the linked post and keep everything within the sapply syntax
I don't want a nested loop
I had a look at the broadly similar subject discussed here and here but, if possible, I would like to avoid making use of plyr. I'm also interested in replacing the column values not in creating new columns and I would like to avoid specifying any column names. While working with my existing data frame it is more convenient for me to use column numbers.
Edit
Following very useful comments, what I'm trying to achieve can be summarised in the solution below:
fun.clean.columns <- function(x, str_width = 15) {
  # Make character
  x <- as.character(x)
  # Replace various phrases
  x <- gsub("perc85","something else", x)
  x <- gsub("again", x)
  x <- gsub("more","even more", x)
  x <- gsub("abc","ohmg", x)
  # Clean spaces
  x <- trimws(x)
  # Wrap strings (str_wrap comes from the stringr package)
  x <- stringr::str_wrap(x, width = str_width)
  # Return object
  return(x)
}
mean_data[,1:2] <- sapply(mean_data[,1:2], fun.clean.columns)
I don't need this function in my global environment, so I can run rm after this, but an even nicer solution would squeeze this into the apply syntax.
We can use mgsub from library(qdap) to replace multiple patterns. Here, I am looping over the first and second columns using lapply and assigning the results back to crs_mat[,1:2]. Note that I am using lapply instead of sapply because lapply keeps the structure intact.
library(qdap)
crs_mat[,1:2] <- lapply(crs_mat[,1:2], mgsub,
pattern=c('mpg', 'gear'), replacement=c('MPG', 'GeArr'))
Here is the start of a solution for you; I think you're capable of extending it yourself. There are probably more elegant approaches available, but I don't see them atm.
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
  # Replace first phrase
  step1 <- gsub("mpg","MPG",x)
  # Replace second phrase. Note that this operates on the already modified vector.
  step2 <- gsub("gear", "GeArr",step1)
  # Ideally, perform other changes
  return(step2)
  #or one nested line, not practical if more needs to be done
  #return(gsub("gear", "GeArr",gsub("mpg","MPG",x)))
})
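If the list of replacements grows, one base-R sketch is to keep the patterns in a named character vector and apply them in sequence inside the anonymous function. The vector name repl and the small data frame demo are made up for illustration. lapply is used here so that each column stays a separate list element when assigned back.

```r
# Hypothetical data standing in for the first two columns of crs_mat
demo <- data.frame(Var1 = c("mpg", "gear", "cyl"),
                   Var2 = c("gear", "mpg", "disp"),
                   value = c(1, 0.5, -0.2),
                   stringsAsFactors = FALSE)

# Names are the patterns, values are the replacements
repl <- c(mpg = "MPG", gear = "GeArr")

demo[, 1:2] <- lapply(demo[, 1:2], function(x) {
  x <- as.character(x)
  for (p in names(repl)) {
    x <- gsub(p, repl[[p]], x)  # each step works on the previous result
  }
  x
})
demo
```

Everything stays inside the lapply call, so no separate helper function is left behind in the workspace; only the lookup vector repl is defined outside.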
I have an initial variable:
a = c(1,2,3)
attr(a,'name') <- 'numbers'
Now I want to create a new variable that is a subset of a and have it carry the same attributes as a. Is there something like a copy.over.attr function that does this without me having to go inside and identify which attributes are user-defined, etc.? This gets complicated when I have numerous attributes attached to a single variable.
There is mostattributes<-, which receives a list and attempts to set the attributes in that list on the object in its argument. It should be used with caution and care. At the very least, reading the source code will give you some nice ideas on how to check attributes between objects. Here's a little run on your sample vector a. It succeeds since it's not violating any properties of b.
a = c(1,2,3)
attr(a,'name') <- 'numbers'
b <- a[-1]
attributes(b)
# NULL
mostattributes(b) <- attributes(a)
attributes(b)
# $name
# [1] "numbers"
Here's a sample of the source code where names are checked.
if (h.nam <- !is.na(inam <- match("names", names(value)))) {
    n1 <- value[[inam]]
    value <- value[-inam]
}
if (h.dim <- !is.na(idin <- match("dim", names(value)))) {
    d1 <- value[[idin]]
    value <- value[-idin]
}
if (h.dmn <- !is.na(idmn <- match("dimnames", names(value)))) {
    dn1 <- value[[idmn]]
    value <- value[-idmn]
}
attributes(obj) <- value
There is also attr.all.equal. It's not the operation you want, but I think you would benefit from reading that source code too. There are many good checks you can learn about in that one.
Wouldn't a simple attributes(b) <- attributes(a) work?
This will just be executed after creating b from a subset of the data in a, so it's not really a single statement, but should work.
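One way to sketch the "copy over" step as a reusable helper is shown here. The name copy_attrs is hypothetical, and the choice to skip names, dim and dimnames is an assumption: those attributes describe the shape of the original object and usually no longer fit a subset.

```r
# Hypothetical helper: copy attributes from one object to another,
# leaving out the structural ones listed in `skip`.
copy_attrs <- function(from, to, skip = c("names", "dim", "dimnames")) {
  at <- attributes(from)
  at <- at[setdiff(names(at), skip)]
  for (nm in names(at)) {
    attr(to, nm) <- at[[nm]]
  }
  to
}

a <- c(x = 1, y = 2, z = 3)
attr(a, "name") <- "numbers"

# Subsetting keeps the matching names but drops the custom attribute;
# copy_attrs() puts the custom attribute back.
b <- copy_attrs(a, a[-1])
attributes(b)
```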
Using R. Is there a way that I can give R any text string and it will treat it like a formula?
An example says it all.
a <- 1
b <- 2
c <- 3
d <- 4
What if I had to do this all the way up to z?
In R we can write:
letters[1]
This gives us an "a"
So what about something like this:
(It doesn't work but I'd like to do something like this)
for (i in 1:4) {
  letters[i] <- i
}
There's the as.formula function but that's only good for formulas like a ~ b + c.
Thanks.
If you want to evaluate text as code:
eval(parse(text="a<-1"))
But if you want to initialize many variables, you can create a named list and convert it to separate variables (attaching each component to the global environment) using list2env. I would highly recommend that you keep your variables in the same list, though.
xx <- letters[1:5]
list2env(as.list(setNames(seq_along(xx), xx)), envir = .GlobalEnv)
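A sketch of the "keep them in one list" alternative, assuming the goal is simply to look values up by a name held in a string. The object name vals is made up for illustration.

```r
# One named list instead of 26 separate variables: a = 1, b = 2, ..., z = 26
vals <- as.list(setNames(1:26, letters))

vals$a             # access by literal name
vals[["d"]]        # access by a name stored as a string

key <- letters[3]
vals[[key]] <- 30  # update using a computed name
vals[["c"]]
```

Because the names are ordinary strings, they can be built with paste0() or taken from another vector, which avoids eval(parse()) entirely.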
I have some data.frames
dat1=read.table...
dat2=read.table...
dat3=read.table...
And I would like to count the rows of each data set.
The names are saved like this (I cannot change it): vector=c("dat1","dat2","dat3...)
p <- vector(numeric, length=1:length(dat))
counting <- function(x) {
  for (i in 1:x) {
    p[i] <- nrow(dat[i])
  }
  return(p)
}
This is not working because the input for nrow is a character string, but I need an integer(?), or something else?
Thx for help
You can use get for this, but be careful! Reading the tables into a list instead is the R-ish way:
file.names <- list.files()
dat <- lapply(file.names, read.table)
Then you have all the conveniences of lapply and the apply family at your disposal, e.g.:
lapply(dat, nrow)
The solution using get (also, vector is a bad variable name since it's the name of a very important function):
lapply(vector, function(x) nrow(get(x)))
Your method fails since there is no object called dat to index into. The for loop could look like:
p = NULL
for(v in vector) {
  p <- c(p, nrow(get(v)))
}
But that technique is poor form for lotsa reasons...
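As an alternative to growing p in a loop, base R's mget() looks up several objects by name at once and returns them as a named list. A sketch, assuming made-up data frames dat1 and dat2 and a name vector called dat_names (chosen to avoid masking vector):

```r
# Hypothetical data frames standing in for the read.table results
dat1 <- data.frame(x = 1:5)
dat2 <- data.frame(x = 1:3, y = 4:6)
dat_names <- c("dat1", "dat2")

# mget() returns a named list of the objects found under those names
dats <- mget(dat_names)

# vapply() checks that every result is a single integer
counts <- vapply(dats, nrow, integer(1))
counts
```

The result is a named integer vector, so counts[["dat1"]] gives the row count for one data set directly.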
If you want to determine properties of items you know to be in the .GlobalEnv, this works:
> sapply( c("A","B"), function(objname) nrow(.GlobalEnv[[objname]]) )
A B
5 4
You could substitute any character vector for c("A","B"). If the object is not in the global environment it just returns NULL, so it's reasonably robust.