Guys I have a code that generates 2 columns of data (e.g Number, Median) which refers to a particular person...but I have taken samples of 7 people
so basically I get this output:
[[1]
Number Median
1 5
2 3
.....
[[2]]
Number Median
1 6
2 4
....
[[3]]
Number Median
1 3
2 5
So I basically get this output....up til [[7]]
I tried transferring this output in excel using this code
write.csv(cbind(data),"data1.csv")
and I get this type of output:
list(c(Median =.......It lists all the median on the rows
But I want it to save the data referring to the 'median' and 'Number' in columns NOT ROWS
If I just type
write.csv(data,"data1.csv")
I get an error
arguments imply differing number of rows: 157, 179, 178, 180
As Marius said, you have a list of data.frames which can't be written to a .csv file. You need to do:
NewDataFrame <- do.call("rbind", YourList)
write.csv(NewDataFrame, "Data.csv")
do.call takes each of the elements from a list and applies whatever function you tell it (in this case rbind) to all of them.
Here are two options. Both use the following sample data:
myList <- list(data.frame(matrix(1:4, ncol = 2)),
data.frame(matrix(3:10, ncol = 2)),
data.frame(matrix(11:14, ncol =2)))
myList
# [[1]]
# X1 X2
# 1 1 3
# 2 2 4
#
# [[2]]
# X1 X2
# 1 3 7
# 2 4 8
# 3 5 9
# 4 6 10
#
# [[3]]
# X1 X2
# 1 11 13
# 2 12 14
Option 1: Write a csv file where the data.frames are presented as they are in the list
sink("list_of_dataframes.csv", type="output")
invisible(lapply(myList, function(x) dput(write.csv(x))))
sink()
If you open the resulting "list_of_dataframes.csv" file in a text editor, you will get something that looks like this. When you read this into a spreadsheet program, the first column will include the rownames and NULL separating each data.frame:
"","X1","X2"
"1",1,3
"2",2,4
NULL
"","X1","X2"
"1",3,7
"2",4,8
"3",5,9
"4",6,10
NULL
"","X1","X2"
"1",11,13
"2",12,14
NULL
Option 2: Write or search around for a version of cbind that accommodates binding data.frames with differing number of rows.
Here is one such function that I've written.
cbind2 <- function(datalist) {
nrows <- max(sapply(datalist, nrow))
expandmyrows <- function(mydata, rowsneeded) {
temp1 = names(mydata)
rowsneeded = rowsneeded - nrow(mydata)
temp2 = setNames(data.frame(
matrix(rep(NA, length(temp1) * rowsneeded),
ncol = length(temp1))), temp1)
rbind(mydata, temp2)
}
do.call(cbind, lapply(datalist, expandmyrows, rowsneeded = nrows))
}
And here is that function applied to your list:
cbind2(myList)
# X1 X2 X1 X2 X1 X2
# 1 1 3 3 7 11 13
# 2 2 4 4 8 12 14
# 3 NA NA 5 9 NA NA
# 4 NA NA 6 10 NA NA
That output should be easy for you to use with write.csv and related functions.
Related
I want to save residuals from a linear model to a dataframe. I was trying to do it with the line of code (note that this was supposed to go inside a loop):
resi <- NULL
resi <- cbind(resi, colnames(dados[1])=residuals(m))
Here I intended to save the residuals vector from my model m under the same column name from the dados object (which is basicaly a date), but I get the error:
Error: unexpected '=' in "resi <- cbind(resi, colnames(dados[1])="
You want `colnames <- ()`.
cbind(d, `colnames<-`(d, letters[1:4]))
# X1 X2 X3 X4 a b c d
# 1 1 4 7 10 1 4 7 10
# 2 2 5 8 11 2 5 8 11
# 3 3 6 9 12 3 6 9 12
It's similar to setNames() but also compatible with matrices.
Toydata
d <- data.frame(matrix(1:12, 3, 4))
It is possible to do this in tibble
library(tibble)
tibble(resi, !!colnames(dados)[1] :=residuals(m))
I got a list which have another list of data frames.
The outside list elements represents years and inside list represent months data.
Now I want to create a final list which will contain data for all months. Each Month columns will be "cbinded" by other years column values.
Alldata <- list()
Alldata[[1]] <- list(data.frame(Jan_2015_A=c(1,2), Jan_2015_B=c(3,4)), data.frame(Feb_2015_C=c(5,6), Feb_2015_D=c(7,8)))
Alldata[[2]] <- list(data.frame(Jan_2016_A=c(1,2), Jan_2016_B=c(3,4)), data.frame(Feb_2016_C=c(5,6), Feb_2016_D=c(7,8)))
Expected output list is as following
I've tried using for loops and its little complex, I want any R function to do this task.
I have done this using for loops using following code. But this is really complex and I myself found this little complicate. Hope I will get any simpler and tidy code for this operation.
I created list with each months and years data as a list item in form of data frames
x2 <- list()
for(l1 in 1: length(Alldata[[1]])){
temp <- list()
for(l2 in 1: length(Alldata)){
temp <- append(temp, list(Alldata[[l2]][[l1]]))
}
x2 <- append(x2, list(temp))
}
# then created final List with succesive years data of each month as list items. This is primarily used for Tracking data for years For Example: how much was count was for Jan_2015 and Jan_2016 for "A"
finalList <- list()
for(l3 in 1: length(x2)){
temp <- x2[[l3]]
td2 <- as.data.frame(matrix("", nrow = nrow(temp[[1]])))
rownames(td2)[rownames(temp[[1]])!=""] <- rownames(temp[[1]])[rownames(temp[[1]])!=""]
for(l4 in 1:ncol(temp[[1]])){
for(l5 in 1: length(temp)){
# lapply(l4, function(x) do.call(cbind,
td2 <- cbind(td2, temp[[l5]][, l4, drop=F])
}
}
finalList <- append(finalList, list(td2))
}
> finalList
[[1]]
V1 Jan_2015_A Jan_2016_A Jan_2015_B Jan_2016_B
1 1 1 3 3
2 2 2 4 4
[[2]]
V1 Feb_2015_C Feb_2016_C Feb_2015_D Feb_2016_D
1 5 5 7 7
2 6 6 8 8
You could do the following below. The lapply will iterate over the outer list and the do.call will cbind the inner list of data frames.
lapply(Alldata, do.call, what = 'cbind')
[[1]]
Jan_2015_A Jan_2015_B Feb_2015_C Feb_2015_D
1 1 3 5 7
2 2 4 6 8
[[2]]
Jan_2016_A Jan_2016_B Feb_2016_C Feb_2016_D
1 1 3 5 7
2 2 4 6 8
You can also use dplyr to get the same results.
library(dplyr)
lapply(Alldata, bind_cols)
Here is a third option proposed by J.R.
lapply(Alldata, Reduce, f = cbind)
EDIT
After clarification from OP, the above solution has been modified (see below) to produce the newly specified output. The solution above has been left there since it is a building block for the solution below.
pattern.vec <- c("Jan", "Feb")
### For a given vector of months/patterns, returns a
### list of elements with only that month.
mon_data <- function(mo) {
return(bind_cols(sapply(Alldata, function(x) { x[grep(pattern = mo, x)]})))
}
### Loop through months/patterns.
finalList <- lapply(pattern.vec, mon_data)
finalList
## [[1]]
## Jan_2015_A Jan_2015_B Jan_2016_A Jan_2016_B
## 1 1 3 1 3
## 2 2 4 2 4
##
## [[2]]
## Feb_2015_C Feb_2015_D Feb_2016_C Feb_2016_D
## 1 5 7 5 7
## 2 6 8 6 8
## Ordering the columns as specified in the original question.
## sorting is by the last character in the column name (A or B)
## and then the year.
lapply(finalList, function(x) x[ order(gsub('[^_]+_([^_]+)_(.*)', '\\2_\\1', colnames(x))) ])
## [[1]]
## Jan_2015_A Jan_2016_A Jan_2015_B Jan_2016_B
## 1 1 1 3 3
## 2 2 2 4 4
##
## [[2]]
## Feb_2015_C Feb_2016_C Feb_2015_D Feb_2016_D
## 1 5 5 7 7
## 2 6 6 8 8
I am using a for loop to merge multiple files with another file:
files <- list.files("path", pattern=".TXT", ignore.case=T)
for(i in 1:length(files))
{
data <- fread(files[i], header=T)
# Merge
mydata <- merge(mydata, data, by="ID", all.x=TRUE)
rm(data)
}
"mydata" looks as follows (simplified):
ID x1 x2
1 2 8
2 5 5
3 4 4
4 6 5
5 5 8
"data" looks as follows (around 600 files, in total 100GB). Example of 2 (seperate) files. Integrating all in 1 would be impossible (too large):
ID x3
1 8
2 4
ID x3
3 4
4 5
5 1
When I run my code I get the following dataset:
ID x1 x2 x3.x x3.y
1 2 8 8 NA
2 5 5 4 NA
3 4 4 NA 4
4 6 5 NA 5
5 5 8 NA 1
What I would like to get is:
ID x1 x2 x3
1 2 8 8
2 5 5 4
3 4 4 4
4 6 5 5
5 5 8 1
ID's are unique (never duplicates over the 600 files).
Any idea on how to achieve this as efficiently as possible much appreciated.
It's better suited as comment, But I can't comment yet.
Would it not be better to rbind instead of merge?
This seems to be what you want to acomplish.
Set fill argument TRUE to take care of different column numbers:
asd <- data.table(x1 = c(1, 2), x2 = c(4, 5))
a <- data.table(x2 = 5)
rbind(asd, a, fill = TRUE)
x1 x2
1: 1 4
2: 2 5
3: NA 5
Do this with data and then merge into mydata by ID.
Update for comment
files <- list.files("path", pattern=".TXT", ignore.case=T)
ff <- function(input){
data <- fread(input)
}
a <- lapply(files, ff)
library(plyr)
binded.data <- ldply(a, function(x) rbind(x, fill = TRUE))
So, this creates a function to read files and pushes it to lapply, so you will get a list containing all your data files, each on its own dataframe.
With ldply from plyr rbind all dataframes into one dataframe.
Don't touch mydata yet.
binded.data <- data.table(binded.data, key = ID)
Depending on your mydata you will perform different merge commands.
See:
https://rstudio-pubs-static.s3.amazonaws.com/52230_5ae0d25125b544caab32f75f0360e775.html
Update 2
files <- list.files("path", pattern=".TXT", ignore.case=T)
ff <- function(input){
data <- fread(input)
# This keeps only the rows of 'data' whose ID matches ID of 'mydata'
data <- data[ID %in% mydata[, ID]]
}
a <- lapply(files, ff)
library(plyr)
binded.data <- ldply(a, function(x) rbind(x, fill = TRUE))
Update 3
You can add cat to see the file the function is reading right now. So you can see after which file you are running out of memory. Which will point you to the direction on how many files you can read in one go.
ff <- function(input){
# This will print name of the file it is reading now
cat(input, "\n")
data <- fread(input)
# This keeps only the rows of 'data' whose ID matches ID of 'mydata'
data <- data[ID %in% mydata[, ID]]
}
I'm trying to remove all the NA values from a list of data frames. The only way I have got it to work is by cleaning the data with complete.cases in a for loop. Is there another way of doing this with lapply as I had been trying for a while to no avail. Here is the code that works.
I start with
data_in <- lapply (file_name,read.csv)
Then have:
clean_data <- list()
for (i in seq_along(id)) {
clean_data[[i]] <- data_in[[i]][complete.cases(data_in[[i]]), ]
}
But what I tried to get to work was using lapply all the way like this.
comp <- lapply(data_in, complete.cases)
clean_data <- lapply(data_in, data_in[[id]][comp,])
Which returns this error "Error in [.default(xj, i) : invalid subscript type 'list' "
What I'd like to know is some alternatives or if I was going about this right. And why didn't the last example not work?
Thank you so much for your time. Have a nice day.
I'm not sure what you expected with
clean_data <- lapply(data_in, data_in[[id]][comp,])
The second parameter to lapply should be a proper function to which each member of the data_in list will be passed one at a time. Your expression data_in[[id]][comp,] is not a function. I'm not sure where you expected id to come from, but lapply does not create magic variables for you like that. Also, at this point comp is now a list itself of indices. You are making no attempt to iterate over this list in sync with your data_in list. If you wanted to do it in two separate steps, a more appropriate approach would be
comp <- lapply(data_in, complete.cases)
clean_data <- Map(function(d,c) {d[c,]}, data_in, comp)
Here we use Map to iterate over the data_in and comp lists simultaneously. They each get passed in to the function as a parameter and we can do the proper extraction that way. Otherwise, if we wanted to do it in one step, we could do
clean_data <- lapply(data_in, function(x) x[complete.cases(x),])
welcome to SO, please provide some working code next time
here is how i would do it with na.omit (since complete.cases only returns a logical)
(dat.l <- list(dat1 = data.frame(x = 1:2, y = c(1, NA)),
dat2 = data.frame(x = 1:3, y = c(1, NA, 3))))
# $dat1
# x y
# 1 1 1
# 2 2 NA
#
# $dat2
# x y
# 1 1 1
# 2 2 NA
# 3 3 3
Map(na.omit, dat.l)
# $dat1
# x y
# 1 1 1
#
# $dat2
# x y
# 1 1 1
# 3 3 3
Do you mean like the below?
> lst
$a
a
1 1
2 2
3 NA
4 3
5 4
$b
b
1 1
2 NA
3 2
4 3
5 4
$d
d e
1 NA 1
2 NA 2
3 3 3
4 4 NA
5 5 NA
> f <- function(x) x[complete.cases(x),]
> lapply(lst, f)
$a
[1] 1 2 3 4
$b
[1] 1 2 3 4
$d
d e
3 3 3
file_name[complete.cases(file_name), ]
complete.cases() returns only a logical value. This should do the job and returns only the rows with no NA values.
Consider the following:
df <- data.frame(a = 1, b = 2, c = 3)
names(df[1]) <- "d" ## First method
## a b c
##1 1 2 3
names(df)[1] <- "d" ## Second method
## d b c
##1 1 2 3
Both methods didn't return an error, but the first didn't change the column name, while the second did.
I thought it has something to do with the fact that I'm operating only on a subset of df, but why, for example, the following works fine then?
df[1] <- 2
## a b c
##1 2 2 3
What I think is happening is that replacement into a data frame ignores the attributes of the data frame that is drawn from. I am not 100% sure of this, but the following experiments appear to back it up:
df <- data.frame(a = 1:3, b = 5:7)
# a b
# 1 1 5
# 2 2 6
# 3 3 7
df2 <- data.frame(c = 10:12)
# c
# 1 10
# 2 11
# 3 12
df[1] <- df2[1] # in this case `df[1] <- df2` is equivalent
Which produces:
# a b
# 1 10 5
# 2 11 6
# 3 12 7
Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.
In the scenario:
names(df[2]) <- "x"
You can think of the assignment as follows (this is a simplification, see end of post for more detail):
tmp <- df[2]
# b
# 1 5
# 2 6
# 3 7
names(tmp) <- "x"
# x
# 1 5
# 2 6
# 3 7
df[2] <- tmp # `tmp` has "x" for names, but it is ignored!
# a b
# 1 10 5
# 2 11 6
# 3 12 7
The last step of which is an assignment with `[<-`, which doesn't respect the names attribute of the RHS.
But in the scenario:
names(df)[2] <- "x"
you can think of the assignment as (again, a simplification):
tmp <- names(df)
# [1] "a" "b"
tmp[2] <- "x"
# [1] "a" "x"
names(df) <- tmp
# a x
# 1 10 5
# 2 11 6
# 3 12 7
Notice how we directly assign to names, instead of assigning to df which ignores attributes.
df[2] <- 2
works because we are assigning directly to the values, not the attributes, so there are no problems here.
EDIT: based on some commentary from #AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):
Version 1 names(df[2]) <- "x" translates to:
df <- `[<-`(
df, 2,
value=`names<-`( # `names<-` here returns a re-named one column data frame
`[`(df, 2),
value="x"
) )
Version 2 names(df)[2] <- "x" translates to:
df <- `names<-`(
df,
`[<-`(
names(df), 2, "x"
) )
Also, turns out this is "documented" in R Inferno Section 8.2.34 (Thanks #Frank):
right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed b
# 1 2