I have been stuck on a problem for three days and cannot work out why it does not work. I have tried quite a few different approaches, but I am only going to post the one I believe is closest to the solution, as a reduced example of what I want to ask.
I have 7 .csv files (called 001.csv, 002.csv, etc.) in a folder called "Myfolder".
I have been trying to write a function that merges all these .csv files into a single data.frame using a for-loop and rbind, and then returns the mean of either column "Colour1" or "Colour2", depending on the "colour" (column) and the "Children" (rows) I choose, ignoring missing values (NA). When I merge the files I get a data frame like this:
Colour1 Colour2 Children
     NA      NA        1
      9      NA        2
     NA      NA        2
     NA       5        3
      7      NA        4
     NA      NA        5
     NA       8        5
      2      NA        6
      6       3        6
     14      NA        7
This is the signature of the function I want to build: get_mean <- function(directory, colour, children)
What I have tried
get_mean <- function(directory, colour, children) {
  files <- list.files(directory, full.names=TRUE)
  allfiles <- data.frame()
  for (i in 1:7) {
    allfiles <- rbind(allfiles, read.csv(files[i]))
  }
  if (colour == "colour1") {
    mean(allfiles$colour1[allfiles$Children == children], na.rm = TRUE)
  }
  if (colour == "colour2") {
    mean(alllists$colour2[alllist$Children == children], na.rm = TRUE)
  }
}
When I try, for example:
get_mean("Myfolder", "colour1", 3:6)
I get
In alllist$ID == id :
longer object length is not a multiple of shorter object length
and when I try:
get_mean("Myfolder", "colour1", 6)
I get:
>
Yes, guys... I get back absolutely nothing.
What do you think, guys? Any correction to it? Any other way to get the mean?
Note: the data I put here is not the data I am actually using. This is just a small example from a much bigger exercise. I have used different names and numbers so that we don't discuss the exercise itself and others cannot simply copy the solution.
Here is a corrected and more readable version of your function. I renamed your data.frame allfiles to df, added a check on the colour argument, and switched == to %in% so that a vector like 3:6 works. (Your original returned nothing for "colour1" because the mean computed in the first if block was discarded: a function returns its last evaluated expression, and your final if evaluated to an invisible NULL.)
get_mean <- function(directory, colour, children) {
  files <- list.files(directory, full.names = TRUE)
  df <- do.call(rbind, lapply(files[1:7], read.csv))
  # check the colour argument
  if (!is.element(colour, c("colour1", "colour2")))
    stop(sprintf("colour argument value %s is not a column of df", colour))
  # %in% (rather than ==) also works when children is a vector such as 3:6
  mean(df[[colour]][df$Children %in% children], na.rm = TRUE)
}
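With those changes, both of your original calls should work, e.g.:
get_mean("Myfolder", "colour1", 3:6)  # mean of colour1 over Children 3 to 6
get_mean("Myfolder", "colour1", 6)    # mean of colour1 for Children == 6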
I'm attempting to import multiple csv files and create a new dataframe that includes specific columns (some with the same name; some different) from each of these files. So far I have been able to create the dataframe with the specific columns I want, but somewhere in my code my data gets lost and doesn't transfer over to each column.
I would also like to create a new column named status, where each cell is Lost, Gained, or Neutral depending on whether the value found in the all_v.csv file is also found in lost_v and/or gained_v. If it is found in neither, it is Neutral. I attempted a line of code for this, but I won't know if it works until I can attach the correct data to each column.
This would give me a total of 8 columns:
pre_contact, status, gained_variation, lost_variation, coord.lat, coord.long, country, Date
Most of these columns come from the 4 files listed below with the exception of the status column:
all_v - pre_contact
status - Lost / Gained / Neutral
gained_v - gained_variation
lost_v - lost_variation
SOUTH - coord.lat, coord.long, country, Date
An issue I'm also facing is disproportionate data frames: when I attempt to merge or use rbind, I get an error saying that my rows do not line up, because some columns are longer than others. I would like a way to fix this by padding with NAs (see the sketch after the sample output below).
Here is my sample code:
folder_path<- setwd("/directory/")
setwd(folder_path)
#this creates a table with two columns: filename and file_path but I'm not sure how to utilize it
all_of_them <- fs::dir_ls(folder_path, pattern="*.csv")
file_names <- tibble(filename = list.files(folder_path))
file.paths <- file_names %>% mutate(filepath = paste0(folder_path, filename))
#Each file I want to use
gained_v <- read.csv("gained.csv", header = TRUE)
lost_v <- read.csv("lost.csv", header = TRUE)
all_v <- read.csv("all.csv", header = TRUE)
SOUTH <- read.csv("SOUTH.csv", header = TRUE)
files = list.files(pattern="*.csv", full.names = TRUE)
for (i in 1:length(files)) {
  data <- files %>%
    map_df(~fread(.))
}
# Set Column Names
subset_data <- data.frame(data)
subset_data$status <- with(subset_data, subset_data$pre_contact == subset_data$gained_variation | subset_data$pre_contact == subset_data$lost_variatiion)
subset_data <- subset(subset_data, select = c(pre_contact,status, gained_variation,lost_variatiion,coord.lat,coord.long, country, Date))
subset_data <- as_tibble(subset_data)
write.csv(subset_data, "subset_data.csv")
status_data = read.csv("subset_data.csv", header = TRUE)
status_data <- data.frame(subset(status_data, select = -c(X)))
status_data <- tibble(status_data)
So far my output looks like this (where the only data showing is from my pre_contact column):
pre_contact status gained_variation lost_variation coord.lat coord.long country Date
1234 NA NA NA NA NA
6543 NA NA NA NA NA
9876 NA NA NA NA NA
1233 NA NA NA NA NA
1276 NA NA NA NA NA
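One way to tackle the unequal-length problem, sketched under the assumption that gained_v, lost_v, all_v and SOUTH are read in as above and that the column names match the file/column mapping listed earlier: pad every column with NAs up to the length of the longest one, bind them side by side, then derive status with %in% rather than ==, since == compares element-wise and fails on unequal lengths.
# pad an atomic vector with NAs up to length n ("length<-" extends with NA)
pad_to <- function(x, n) { length(x) <- n; x }

cols <- list(pre_contact      = all_v$pre_contact,
             gained_variation = gained_v$gained_variation,
             lost_variation   = lost_v$lost_variation,
             coord.lat        = SOUTH$coord.lat,
             coord.long       = SOUTH$coord.long,
             country          = SOUTH$country,
             Date             = SOUTH$Date)
n <- max(lengths(cols))
subset_data <- data.frame(lapply(cols, pad_to, n = n))

# Lost / Gained / Neutral via membership tests
subset_data$status <- ifelse(subset_data$pre_contact %in% cols$gained_variation, "Gained",
                      ifelse(subset_data$pre_contact %in% cols$lost_variation,   "Lost",
                                                                                 "Neutral"))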
Solution found! Scroll to the end to see what I did. Hopefully, this function can help others.
TLDR: I have a list: https://i.stack.imgur.com/7t6Ej.png
and I need to do something like this to it
lapply(irabdf, function(x) c(x[!is.na(x)], x[is.na(x)]))
But I need this applied to each element of the list individually, without deleting the column names. Currently I can get it to sort lowest to highest, but it moves everything into a single column and drops the column names.
I have a list in R that I am exporting as an XLSX file using the openxlsx package. I have everything that I need functionally, but my P.I. has requested that I sort each column lowest to highest for reviewers, as there are a lot of empty cells that make the document look funny. I am trying to add this feature in R so that I don't need to do it manually. All columns were created from separate .csv files, and rows are unimportant.
The List: https://i.stack.imgur.com/7t6Ej.png
The generated XLSX file looks like this: https://i.stack.imgur.com/ftg00.png.
The columns are not blank, the data is just much further down.
My code for writing the file:
wb <- createWorkbook()
lapply(names(master), function(i) {
  addWorksheet(wb = wb, sheetName = names(master[i]))
  writeData(wb, sheet = i, master[[i]])
  addFilter(wb, sheet = i, rows = 1, cols = 1:a)
})
#Save Workbook
saveWorkbook(wb, saveFile, overwrite = TRUE)
a is a value obtained through length(unique(x)), where x holds the levels of a variable.
What I have:
   Column1 Column2 Column3 Column4
1.       1      NA      NA      NA
2.       2      NA      NA      NA
3.      NA       3      NA      NA
4.      NA       4      NA      NA
5.      NA      NA       5      NA
6.      NA      NA       6      NA
7.      NA      NA      NA       7
8.      NA      NA      NA       8
What I want:
   Column1 Column2 Column3 Column4
1.       1       3       5       7
2.       2       4       6       8
3.      NA      NA      NA      NA
4.      NA      NA      NA      NA
5.      NA      NA      NA      NA
6.      NA      NA      NA      NA
7.      NA      NA      NA      NA
8.      NA      NA      NA      NA
The actual file has thousands of rows and hundreds of blank cells in each column. The solution would need to replicate this across all tabs of the XLSX file.
What I have tried:
In a previous version of this script I was able to do this. I had separate df's which were assigned names through user-dialogue options. This is an example of the code I used to do that.
irabdf <- masterdf %>%
  filter(Fluorescence == "Infrared") %>%
  select(mean, Conditions) %>%
  mutate(row = row_number()) %>%
  spread(Conditions, mean) %>%
  select(!row)

irabdf <- lapply(irabdf, function(x) c(x[!is.na(x)], x[is.na(x)])) %>%  ## move NAs to the bottom of the df
  data.frame()
# Create a blank workbook
WB <- createWorkbook()
# Add some sheets to the workbook
addWorksheet(WB, gab)
addWorksheet(WB, rab)
addWorksheet(WB, irab)
# Write the data to the sheets
writeData(WB, sheet = gab, x = gabdf)
writeData(WB, sheet = rab, x = rabdf)
writeData(WB, sheet = irab, x = irabdf)
# Reorder worksheets
worksheetOrder(WB) <- c(1:3)
# Export the file
saveWorkbook(WB, saveFile)
Now that I have removed the user interface and am using a list, I can no longer do this. I have also tried a myriad of other things, most utilizing lapply.
If you need any more information just ask.
Thanks in advance for the assistance!
09/21
I think I am getting closer but I still haven't resolved the issue.
When I use this code
list <- lapply(master[[1]],
               function(x) c(x[!is.na(x)], x[is.na(x)]))
I get the results I want but end up losing the first element. If I could keep the first element and apply this over my entire list that should do the trick.
09/22
I have found something that works! However, it isn't dynamic. If someone could help me loop this function across all of the elements of this list (or knows a better solution) just let me know.
list1 <- lapply(master[[1]],
                function(x) c(x[!is.na(x)], x[is.na(x)]))
list1 <- data.frame(list1)
master[[1]] <- list1
I need to convert list1 to a data.frame to maintain my column names in the XLSX output.
09/22 - 2
Okay, I have the script doing exactly what I want it to do. However, it isn't pretty and it isn't "very" dynamic.
+rep to anyone who can help me convert this into a pretty lapply loop!
if (b >= 1) {
  list1 <- lapply(master[[1]],
                  function(x) c(x[!is.na(x)], x[is.na(x)]))
  list1 <- data.frame(list1)
  master[[1]] <- list1
}
if (b >= 2) {
  list2 <- lapply(master[[2]],
                  function(x) c(x[!is.na(x)], x[is.na(x)]))
  list2 <- data.frame(list2)
  master[[2]] <- list2
}
etc., all the way to master[[12]].
b has a value of 12 here; however, it could be practically any number.
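For what it's worth, a compact sketch of the lapply version (assuming every element of master is a data.frame, as in the blocks above); it never has to know the value of b:
master <- lapply(master, function(df)
  data.frame(lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)])))) 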
09/22 - 3
Alright, I figured it out. I created the following loop to do what I needed to do and everything appears to be working perfectly. Part of me wants to scream from happiness.
for (i in 1:length(unique(masterdf$ABwant))) {
  if (i >= 1)
    list.i <- lapply(master[[i]],
                     function(x) c(x[!is.na(x)], x[is.na(x)]))
  list.i <- data.frame(list.i)
  master[[i]] <- list.i
}
I'll keep the thread open the rest of the week and if someone has a better solution I will accept it and give you some rep. Else, GG.
This was the code that I used to create the loop that I wanted.
for (i in 1:length(unique(masterdf$ABwant))) {
  # move the NAs in every column of element i to the bottom
  list.i <- lapply(master[[i]],
                   function(x) c(x[!is.na(x)], x[is.na(x)]))
  master[[i]] <- data.frame(list.i)
}
Using openxlsx, I was able to use this loop to create an Excel file that has a separate tab for each antibody, with all columns sorted and NA values placed at the bottom.
### Creating the Excel file
wb <- createWorkbook()
lapply(names(master), function(i) {
  addWorksheet(wb = wb, sheetName = names(master[i]))
  writeData(wb, sheet = i, master[[i]])
})
# Saving the Excel file
saveWorkbook(wb, saveFile, overwrite = TRUE)
I currently have the following list and loop.
> margin_values
$margINCBJP
[1] 0.8481856 0.9165585 0.9270849 0.7932756 0.8296131 0.8284826 0.7584834 0.2566567
$margINCTRS
[1] NA NA NA NA NA 0.84499199 0.73135251 -0.06664292
$margBJPTRS
[1] NA NA NA NA NA 0.01650935 -0.02713086 -0.32329962
for (i in 1:length(margin_values)) {
  nam <- paste("x", i, sep = "")
  assign(nam, margin_values[[i]])
}
This creates separate vectors x1 to xn. How can I then automatically combine the numbers from all of them into one vector? I know I can manually type c(x1, x2, x3, ...) all the way up to n, but since n is variable, is there any way to have R simply do c() on all variables starting with x? For this example, n = 3, but depending on parameters earlier in my code it may change.
I just ran into this myself, and here is what I came up with (tweaked for you, of course):
total <- c(lapply(ls(pattern = "^x"), get))  # "^x" anchors the match to names starting with x
This will create a list, total, with each element being one of your variables whose name starts with x.
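That said, since the numbers already live in margin_values, you can skip the assign() step entirely; a minimal sketch, assuming you want one flat numeric vector rather than a list:
# flatten the list of numeric vectors into a single unnamed vector
total <- unlist(margin_values, use.names = FALSE)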
Let's say I have data like:
> data[295:300,]
Date sulfate nitrate ID
295 2003-10-22 NA NA 1
296 2003-10-23 NA NA 1
297 2003-10-24 3.47 0.363 1
298 2003-10-25 NA NA 1
299 2003-10-26 NA NA 1
300 2003-10-27 NA NA 1
Now I would like to add all the nitrate values into a new list/vector. I'm using the following code:
i <- 1
my_list <- c()
for (val in data) {
  my_list[i] <- val
  i <- i + 1
}
But this is what happens:
Warning message:
In x[i] <- val :
number of items to replace is not a multiple of replacement length
> i
[1] 2
> x
[1] NA
Where am I going wrong? The data is part of the Coursera R Programming coursework. I can assure you that this is not an assignment/quiz; I have been trying to understand the best way to append elements to a list with a loop. I have not yet reached the lapply or sapply part of the coursework, so I am thinking about workarounds.
Thanks in advance.
If it's a duplicate question, please direct me to it.
As we mentioned in the comments, you are not looping over the rows of your data frame but over the columns (also sometimes called variables). Hence, loop over data$nitrate.
i <- 1
my_list <- c()
for (val in data$nitrate) {
  my_list[i] <- val
  i <- i + 1
}
Now, instead of looping over your values, a better way is to use the fact that you want the new vector and the old data to share the same index, so loop over the index i. How do you tell R how many indexes there are? Here you have several choices again: 1:nrow(data), 1:length(data$nitrate), and several other ways. Below are a few examples of how to extract from the data frame.
my_vector <- c()
for (i in 1:nrow(data)) {
  my_vector[i] <- data$nitrate[i]     ## Version 1 of extracting from a data.frame
  my_vector[i] <- data[i, "nitrate"]  ## Version 2: [row, column name]
  my_vector[i] <- data[i, 3]          ## Version 3: [row, column number]
}
My suggestion: Rather than calling the collection a list, call it a vector, since that is what it is. Vectors and lists behave a little differently in R.
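A quick illustration of that difference:
v <- c(1, 2, 3)          # atomic vector: every element has the same type
l <- list(1, "a", TRUE)  # list: elements may have different types
v[2]    # 2, a length-one vector
l[[2]]  # "a", the element itself; l[2] would be a one-element list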
Of course, in reality you don't want to get the data out one by one. A much more efficient way of getting your data out is
my_vector2 <- data$nitrate
How could I go about running str() on all of the files loaded in the workspace at the same time? I simply want to export this information, in a batch-like process, to a .csv file. I have over 100 of them and want to compare one workspace with another to help locate incongruities in data structure and avoid mismatches.
I came painfully close to a solution via UCLA's R Code Fragment; however, it fails to include instructions for how to form the read.dta call that loops through the files. That is the part I need help with.
What I have so far:
#Define the file path
f <- file.path("C:/User/Datastore/RData")
#List the files in the path
fn <- list.files(f)
#loop through file list, return str() of each .RData file
#write to .csv file with 4 columns (file name, length, size, value)
EDIT
Here is an example of what I am after (the view from RStudio; it simply lists the Name, Type, Length, Size, and Value of all of the RData files). I basically want to replicate this view but export it to a .csv. I am adding the RStudio tag in case someone knows a way of exporting this table automatically; I couldn't find a way to do it.
Thanks in advance.
I've actually written a function for this already. I also asked a question about it, and about dealing with promise objects within the function. That post might be of some use to you.
The issue with the last column is that str is not meant to do anything but print a compact description of objects, and therefore I couldn't use it at first (but that's been changed with recent edits). This updated function gives a description of the values similar to that of the RStudio table. Data frames and lists are tricky because their str output spans more than one line. This should be good.
objInfo <- function(env = globalenv()) {
  obj <- mget(ls(env), env)
  out <- lapply(obj, function(x) {
    vals1 <- c(
      Type   = toString(class(x)),
      Length = length(x),
      Size   = object.size(x)
    )
    # take the first line of str()'s output, minus leading
    # whitespace and the 'data.frame' header
    val2 <- gsub("^\\s+|'data.frame':\t", "", capture.output(str(x))[1])
    if (grepl("environment", val2)) val2 <- "Environment"
    c(vals1, Value = val2)
  })
  out <- do.call(rbind, out)
  rownames(out) <- seq_len(nrow(out))
  noquote(cbind(Name = names(obj), out))
}
And then we can test it out on a few objects:
x <- 1:10
y <- letters[1:5]
e <- globalenv()
df <- data.frame(x = 1, y = "a")
m <- matrix(1:6)
l <- as.list(1:5)
objInfo()
# Name Type Length Size Value
# 1 df data.frame 2 1208 1 obs. of 2 variables
# 2 e environment 11 56 Environment
# 3 l list 5 328 List of 5
# 4 m matrix 6 232 int [1:6, 1] 1 2 3 4 5 6
# 5 objInfo function 1 24408 function (env = globalenv())
# 6 x integer 10 88 int [1:10] 1 2 3 4 5 6 7 8 9 10
# 7 y character 5 328 chr [1:5] a b c d e
Which is pretty close I guess. Here's the screen shot of the environment in RStudio.
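To get the .csv the question asks for, something like this should then work (unclass() drops the noquote wrapper so write.csv sees a plain character matrix; the file name is just an example):
# export the summary table to a .csv file
write.csv(unclass(objInfo()), "workspace_objects.csv", row.names = FALSE)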
I would write a function, something like the one below, and then loop over the files with that function; that way you basically write the code for a single dataset only once.
library(foreign)

giveSingleDataset <- function(oneFile) {
  # read the .dta file
  df <- read.dta(oneFile)
  # give e.g. the structure
  s <- ls.str(df)
  # return what you want
  return(s)
}
#Actually call the function
result <- lapply( fn, giveSingleDataset )
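One caveat, assuming fn comes from the question's fn <- list.files(f): list.files() returns bare file names by default, so re-attach the folder path (or use full.names = TRUE) before read.dta tries to open the files:
# rebuild full paths, since list.files() returned bare names
result <- lapply(file.path(f, fn), giveSingleDataset)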