How do I make all my dataframes' variables in my environment numeric? - r

I have got a lot of data frames in my R environment and I want to do the as.numeric() function on all of the variables in the data.frames and overwrite them. I do not know how to address all of them.
The following is my attempt, but ls() seemingly just writes the name to x:
for (i in 1:length(ls())){
x <- ls()[i]
for (i in 1:length(x)){
x[i] <- as.numeric(x[i])
}
}

So, there were two helpful answers to my question: one that was later deleted, and another by @Henrik.
The deleted one followed my approach and converts all data frames in the global environment (those with a "V" in their name, in my example) to numeric. This is the code:
res <- lapply(mget(ls(pattern = 'V')), \(x) {
x[] <- lapply(x, as.numeric)
return(x)
})
list2env(res, .GlobalEnv)
# Check
str(VA01.000306__ft2)
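A self-contained sketch of the same idea, run in a throwaway environment so it does not overwrite real objects. The environment env and the objects VA01, VB02 and other are made up for illustration; in the real case .GlobalEnv would take the place of env.

```r
# Hypothetical sandbox environment standing in for .GlobalEnv
env <- new.env()
env$VA01  <- data.frame(a = c("1", "2"), b = c("3.5", "4"), stringsAsFactors = FALSE)
env$VB02  <- data.frame(a = c("10", "20"), stringsAsFactors = FALSE)
env$other <- "not a data frame"

# Fetch every object by name, then keep only the data frames
objs <- mget(ls(envir = env), envir = env)
dfs  <- Filter(is.data.frame, objs)

# x[] <- ... replaces every column but keeps the data.frame structure
res <- lapply(dfs, function(x) {
  x[] <- lapply(x, as.numeric)
  x
})

# Write the converted data frames back under their original names
invisible(list2env(res, envir = env))

str(env$VA01)
```

Filtering with is.data.frame instead of a name pattern avoids touching objects that merely happen to match the pattern.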
The second approach uses a list instead of multiple objects, which works once the csv files have been read into a list. This is the csv-to-list import:
F_EB_names <- list.files(pattern="*.csv") # store the data in a list?
F_EB <- lapply(F_EB_names, read.csv2)
names(F_EB) <- gsub(".wav.csv","_ft2",F_EB_names)
And this is the conversion to numeric:
F_EB <- type.convert(F_EB, as.is = TRUE) # Conversion; as.is = TRUE keeps non-numeric columns as character
str(F_EB) # Check
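A small sketch of what type.convert() does on a list of data frames, using made-up data (the names lst, first and second are illustrative only). Unlike as.numeric(), it guesses a type per column, so a column that is not numeric stays character rather than turning into NA.

```r
# Hypothetical list of data frames whose columns were all read as character
lst <- list(
  first  = data.frame(x = c("1", "2"), y = c("a", "b"), stringsAsFactors = FALSE),
  second = data.frame(x = c("1.5", "2.5"), stringsAsFactors = FALSE)
)

# The list method applies the conversion recursively to every data frame
converted <- type.convert(lst, as.is = TRUE)

str(converted)
```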
Thank you both for the help.

Related

Lists and matrix using sapply

I have a perhaps basic question, and I have searched the web. I have a problem reading files. I did manage to read my files by following @Konrad's suggestions, which I appreciate: How to get R to read in files from multiple subdirectories under one large directory?
It is a similar problem; however, I have not resolved it.
My problem:
I have a large number of files with the same name ("tempo.out") in different folders. Each tempo.out has 5 columns with headers, and they all share the same format with 1048 lines and 5 columns:
id X Y time temp
setwd("~/Documents/ewat")
dat.files <- list.files(path="./ress",
recursive=T,
pattern="tempo.out"
,full.names=T)
readDatFile <- function(f) {
dat.fl <- read.table(f)
}
data.filesf <- sapply(dat.files, readDatFile)
# I might not have the right syntax in subs5:
subs5 <- sapply(data.filesf,`[`,5)
matr5 <- do.call(rbind, subs5)
probs <- c(0.05,0.1,0.16,0.25,0.5,0.75,0.84,0.90,0.95,0.99)
q <- rowQuantiles(matr5, probs=probs)
print(q)
I want to extract the fifth column (temp) of each of those thousands of files and make calculations such as quantiles.
I first tried to read all the files under "ress".
That gave no error, but my main problem is that "data.filesf" is not a matrix but a list, and the 5th column is not what I expected. Then the following:
matr5 <- do.call(rbind, subs5)
is also not giving the required values/results.
What could be the best way to get columns into what will become a huge matrix?
Try
lapply(data.filesf, `[`, , 5)
Hope this will help
Consider extending your defined function, readDatFile, to extract the fifth column, temp, and assign directly to a matrix with sapply or vapply (since you know the needed structure ahead of time: one numeric vector per file whose length equals the number of rows, 1048). Then run the needed rowQuantiles:
setwd("~/Documents/ewat")
dat.files <- list.files(path="./ress",
recursive=T,
pattern="tempo.out",
full.names=T)
readDatFile <- function(f) read.table(f, header=TRUE)$temp # header=TRUE so the column is named temp; OR USE read.table(f, header=TRUE)[[5]]
matr5 <- sapply(dat.files, readDatFile, USE.NAMES=FALSE)
# matr5 <- vapply(dat.files, readDatFile, numeric(1048), USE.NAMES=FALSE)
probs <- c(0.05,0.1,0.16,0.25,0.5,0.75,0.84,0.90,0.95,0.99)
q <- rowQuantiles(matr5, probs=probs) # rowQuantiles() comes from the matrixStats package
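A runnable sketch of this approach on made-up data. It writes three tiny files into a temporary directory (the folder names run1 to run3 and the 4-row size are assumptions for the demo) and uses base R's apply() with quantile() in place of rowQuantiles(), so no extra package is needed.

```r
# Create three small hypothetical files with the same five-column layout
root <- tempfile("ress")
dir.create(root)
for (k in 1:3) {
  folder <- file.path(root, paste0("run", k))
  dir.create(folder)
  d <- data.frame(id = 1:4, X = 0, Y = 0, time = 1:4, temp = k * (1:4))
  write.table(d, file.path(folder, "tempo.out"), row.names = FALSE, quote = FALSE)
}

files <- list.files(root, pattern = "^tempo\\.out$", recursive = TRUE, full.names = TRUE)

# One numeric vector of length 4 per file; vapply binds them as matrix columns
read_temp <- function(f) as.numeric(read.table(f, header = TRUE)$temp)
temp_mat <- vapply(files, read_temp, numeric(4), USE.NAMES = FALSE)

# Row-wise quantiles in base R: apply() puts the probs in rows, so transpose
probs <- c(0.05, 0.5, 0.95)
q <- t(apply(temp_mat, 1, quantile, probs = probs))
print(q)
```

vapply() fails loudly if a file does not have the expected number of rows, which is useful when thousands of files are supposed to share one format.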

How can I make a list of all dataframes that are in my global environment?

I am trying to use rbind on them. But I need a list of all the dataframes that are already in my global environment. How can I do it?
This is the code I used to import the 20 csv files in a directory. Basically, I have to combine them into a single dataframe.
temp = list.files(pattern = "*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))
This function should return a proper list with all the data.frames as elements
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
then you can rbind them with
do.call(rbind, dfs)
Of course it's awfully silly to have a bunch of data.frames lying around that are so related that you want to rbind them. It sounds like they probably should have been in a list in the first place.
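A minimal sketch of the Filter() and do.call(rbind, ...) combination, run in a sandbox environment instead of the global one. The environment env and the objects df_a, df_b and note are invented for the example.

```r
# Hypothetical stand-in for the global environment
env <- new.env()
env$df_a <- data.frame(id = 1:2, value = c(0.1, 0.2))
env$df_b <- data.frame(id = 3:4, value = c(0.3, 0.4))
env$note <- "ignored because it is not a data frame"

# Named list of all data frames found in the environment
dfs <- Filter(is.data.frame, mget(ls(envir = env), envir = env))

# Stack them; this requires identical column names in every data frame
combined <- do.call(rbind, dfs)
print(combined)
```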
I recommend you stay away from assign(); it is usually a sign that things are going afoul. Try
temp <- list.files(pattern="*.csv")
dfs <- lapply(temp, read.csv)
that should return a list straight away.
From your posted code, I would recommend you start a new R session, and read the files in again with the following code
do.call(rbind, lapply(list.files(pattern = ".csv"), read.csv))
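The read-into-a-list route can be tried end to end with temporary files. This is only a sketch: the directory csv_dir and the files part1.csv to part3.csv are created here for the demo. Note that the pattern argument of list.files() is a regular expression, so "\\.csv$" is used to match the file extension exactly.

```r
# Write three small hypothetical csv files into a temporary directory
csv_dir <- tempfile("csvs")
dir.create(csv_dir)
for (k in 1:3) {
  write.csv(data.frame(file_id = k, value = k * 10),
            file.path(csv_dir, paste0("part", k, ".csv")),
            row.names = FALSE)
}

# Read every csv straight into a list, named by file
paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(paths, read.csv)
names(dfs) <- basename(paths)

# Combine into a single data frame
combined <- do.call(rbind, dfs)
print(combined)
```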
The ls function lists all things in your environment. The get function gets a variable with a given name. You can use the class function to get the class of a variable.
If you put them all together, you can do this:
ls()[sapply(ls(), function(x) class(get(x))) == 'data.frame']
which will return a character vector of the data.frames in the current environment.
If you only have data.frames with the same number of columns and the same column names in your global environment, the following should work (non-data.frame objects don't matter):
do.call(rbind, eapply(.GlobalEnv,function(x) if(is.data.frame(x)) x))
This is a slight improvement on MentatOfDune's answer, which does not catch data.frames with multiple classes:
ls()[grepl('data.frame', sapply(ls(), function(x) class(get(x))))]
To improve MentatOfDune's answer (great username by the way):
ls()[sapply(ls(), function(x) any(class(get(x)) == 'data.frame'))]
or even more robust:
ls()[sapply(ls(), function(x) is.data.frame(get(x)))]
This also supports tibbles (created with dplyr for example), because they contain multiple classes, where data.frame is one of them.
A readable version to get TRUEs and FALSEs using R 4.1 and higher (the native pipe |> was introduced in R 4.1.0):
ls() |> sapply(get) |> sapply(is.data.frame)
Finally super, super robust, also for package developers:
ls()[sapply(ls(), function(x) is.data.frame(eval(parse(text = x), envir = globalenv())))]
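To see why is.data.frame() is the safer test, here is a small sketch in a sandbox environment. The class "my_tbl" is a made-up class that mimics how a tibble carries several classes, and the object names are hypothetical.

```r
# Hypothetical sandbox environment
env <- new.env()
env$plain <- data.frame(a = 1)

multi <- data.frame(a = 1)
class(multi) <- c("my_tbl", "data.frame")  # mimics a tibble's class vector
env$multi <- multi

env$num <- 1:3

# is.data.frame() checks inheritance, so extra classes do not matter
nms <- ls(envir = env)
is_df <- vapply(nms, function(n) is.data.frame(get(n, envir = env)), logical(1))
df_names <- nms[is_df]
print(df_names)
```

vapply() with logical(1) guarantees one TRUE or FALSE per object, whereas comparing class() output with == can return several values for a single object.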

R not converting to numeric

I have a matrix of information that I import from tab-separated files. Once I import this data, I consolidate it into a dataframe and perform some editing on it to make it usable.
The last step is to convert all the numbers to numeric. To do this, I use
as.numeric(as.character()). Unfortunately, the numbers do not change to numeric. They are still of chr type.
Here is my code:
stringsAsFactors=F
filelist <- list.files(path="C:\\Users\\LocalAdmin\\Desktop\\Correlation Project\\test", full.names=TRUE, recursive=FALSE)
temp <- data.frame()
TSV <- data.frame()
for (i in seq (1:length(filelist)))
{
temp <- read.table(file=filelist[i],head=TRUE,sep="\t")
TSV <- rbind(TSV,temp)
}
for (i in seq(15,1,-1)) #getting rid of extraneous dataframe entries
{
TSV <- TSV[-i,] #deleting by row
}
for(i in seq(1,ncol(TSV),1))
{
TSV[,i] <- as.numeric(as.character(TSV[,i]))
}
Thank you for your help!
You can use
TSV <- as.data.frame(as.numeric(as.matrix(TSV)))
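This will only work if all values can be transformed into numbers. Note that as.numeric() drops the matrix dimensions, so the result is a data frame with a single column.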
A couple of things here:
Prefer vector operations whenever possible
no need to read the files in a for loop
TSV<-do.call(rbind,lapply(filelist, read.delim))
your loop to get rid of extraneous info can be reduced to a vector operation
TSV<-TSV[-(1:15),]
I'm assuming you are getting factors and integers that you want as numeric
oldClasses<-sapply(TSV,class)
int2numeric<-oldClasses == "integer"
factor2numeric<-oldClasses == "factor"
# as.numeric() does not accept a whole data frame, so convert column by column
TSV[int2numeric]<-lapply(TSV[int2numeric], as.numeric)
TSV[factor2numeric]<-lapply(TSV[factor2numeric], function(col) as.numeric(as.character(col)))
you could arguably reduce the 2 above to one, but I think this makes your intent clear
and that should be it
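The column-by-column conversion can be checked on a toy data frame. This is a sketch with invented data (TSV_demo and the helper to_num are hypothetical names), showing why factors need as.character() first and what happens to values that are not numbers.

```r
# Hypothetical data frame with an integer, a factor and a character column
TSV_demo <- data.frame(
  n = 1:3,
  f = factor(c("10.5", "20", "30")),
  s = c("1", "2", "x"),
  stringsAsFactors = FALSE
)

to_num <- function(col) {
  # Factors store integer codes internally; convert the labels instead
  if (is.factor(col)) col <- as.character(col)
  # Strings that are not numbers become NA; the warning is silenced here
  suppressWarnings(as.numeric(col))
}

# TSV_demo[] <- ... keeps the data.frame and replaces every column
TSV_demo[] <- lapply(TSV_demo, to_num)
str(TSV_demo)
```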
@JPC I finally managed to get this to work. Here is my code:
TSVnew<-apply(TSV,2,as.numeric)
rownames(TSVnew)<-rownames(TSV)
TSV<-TSVnew
However, I still don't understand why my previous attempt using this didn't work:
for(i in seq(1,ncol(TSV),1))
{
TSV[,i] <- as.numeric(as.character(TSV[,i]))
}

calling data frames in a for loop by a vector

I have some data.frames
dat1=read.table...
dat2=read.table...
dat3=read.table...
And I would like to count the rows for each data set. So
the names are saved like this (cannot "change" it) vector=c("dat1","dat2","dat3...)
p <- vector(numeric, length=1:length(dat))
counting <- function(x) {for (i in 1:x){
p[i]<-nrow(dat[i])}
return(p)
}
This is not working because the input for nrow is a character, but I need an integer(?), or something else?
Thanks for the help
You can use get for this, but be careful! Reading the tables into a list instead is the R-ish way:
file.names <- list.files()
dat <- lapply(file.names, read.table)
Then you have all the conveniences of lapply and the apply family at your disposal, e.g.:
lapply(dat, nrow)
The solution using get (also, vector is a bad variable name since it's a very important function):
lapply(vector, function(x) nrow(get(x)))
Your method fails since there is no object called dat to index into. The for loop could look like:
p = NULL
for(v in vector) {
p <- c(p, nrow(get(v)))
}
But that technique is poor form for lots of reasons...
If you want to determine properties of items you know to be in the .GlobalEnv, this works:
> sapply( c("A","B"), function(objname) nrow(.GlobalEnv[[objname]]) )
A B
5 4
You could substitute any character vector for c("A","B"). If the object is not in the global environment it just returns NULL, so it's reasonably robust.
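When the names really are fixed as a character vector, mget() fetches all the objects in one call. A minimal sketch with made-up data frames dat1 and dat2 (the vector is called df_names here to avoid masking the vector function):

```r
# Hypothetical data frames standing in for the read.table results
dat1 <- data.frame(a = 1:5)
dat2 <- data.frame(a = 1:4)
df_names <- c("dat1", "dat2")

# mget() returns a named list; vapply() guarantees one integer per data frame
row_counts <- vapply(mget(df_names, envir = environment()), nrow, integer(1))
print(row_counts)
```

Unlike the .GlobalEnv[[objname]] lookup, mget() raises an error when a name does not exist, which may or may not be what is wanted.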

R: How to read different files into a two-dim vector?

I have an R newbie question about storing data.
I have 3 different files, each of which contains one column. Now I would like to read them into a structure x so that x[1] is the column of the first file, x[2] is the column of the second file, etc. So x would be a two-dim vector.
I tried this, but it wants x[f] to be a single number rather than a whole vector:
files <- c("dir1/data.txt", "dir2b/data.txt", "dir3/data2.txt")
for(f in 1:length(files)) {
x[f] <- scan(files[f])
}
How can I fix this?
Lists should help. Try
x <- vector(mode="list",length=3)
before the loop and then assign as
x[[f]] <- read.table(files[f])
I would recommend against scan; you should have better luck with read.table() and its cousins like read.csv.
Once you have x filled, you can combine its elements, e.g. via
y <- do.call(cbind, x)
which applies cbind -- a by-column combiner -- to all elements of the list x.
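Put together, the list-based version might look like the sketch below. The three input files are generated in a temporary location purely for the demo, so the paths differ from the ones in the question.

```r
# Write three hypothetical one-column files
paths <- tempfile(c("a", "b", "c"), fileext = ".txt")
for (k in seq_along(paths)) {
  writeLines(as.character(k * (1:3)), paths[k])
}

# Pre-allocate a list, then fill one element per file with [[ ]]
x <- vector(mode = "list", length = length(paths))
for (f in seq_along(paths)) {
  x[[f]] <- read.table(paths[f])
}

# Bind the single-column data frames side by side
y <- do.call(cbind, x)
print(y)
```

Each element x[[f]] is a one-column data frame, so y ends up with one column per file.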
