I have my code:
new_df = data.frame()
#G = 0
for (i in 1:nrow(furin_data)){
frac = furin_data[i,3]/furin_data[i,5]
#print(frac)
if (frac > 2 || frac < 0.5) {
name = furin_data[i,1]
print(name)
new_df = furin_data[i,]
#print(new_df)
#G = G + 1
}
write.csv(new_df, "C:\\User\\Documents\\MyData.csv", row.names = FALSE)
}
It creates a new data file, but only the last matching row is written, not all of the rows that meet the condition. I cannot figure out where the problem is.
That's because you're assigning the row to it, so every assignment overrides the previous one. What you want is to add rows to it instead.
new_df[nrow(new_df)+1,] = furin_data[i,]
Another thing is that you created your new_df data frame without any columns, so none are assigned in the transfer. You should define it with the same column names and types as furin_data, so those columns can be copied. An easy way of initializing it as empty but with the same structure would be:
new_df = furin_data[FALSE, ]
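Putting the two fixes together, a minimal sketch of the corrected loop might look like the following. The toy furin_data and the temporary output path are assumptions made up for illustration; only the use of columns 1, 3 and 5 comes from the question. The file is written once after the loop rather than on every iteration.

```r
# Hypothetical stand-in for furin_data; only columns 1, 3 and 5 matter here
furin_data <- data.frame(
  name  = c("g1", "g2", "g3", "g4"),
  x     = 0,
  treat = c(10, 5, 1, 8),
  y     = 0,
  ctrl  = c(4, 5, 4, 8)
)

# Zero rows, but the same column names and types as furin_data
new_df <- furin_data[FALSE, ]

for (i in seq_len(nrow(furin_data))) {
  frac <- furin_data[i, 3] / furin_data[i, 5]
  if (frac > 2 || frac < 0.5) {
    # Append a row instead of overwriting the whole data frame
    new_df[nrow(new_df) + 1, ] <- furin_data[i, ]
  }
}

# Write the result once, after the loop (placeholder path)
out_file <- tempfile(fileext = ".csv")
write.csv(new_df, out_file, row.names = FALSE)
```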
But in the R language, writing a loop is usually not the best way to do things. R is a vectorized language, meaning many operations apply to a whole vector at once, which typically runs much faster than an explicit loop. So a conversion of your whole code to R style would be:
library(dplyr)
new_df <-
  furin_data %>%
  # .[[3]] and .[[5]] extract the columns as vectors (.[3] would return a data frame)
  mutate(frac = .[[3]] / .[[5]]) %>%
  filter(frac > 2 | frac < 0.5)
write.csv(new_df, "C:\\User\\Documents\\MyData.csv", row.names = FALSE)
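The same filter can also be written in base R without dplyr. This is only a sketch; the toy furin_data below is an assumption made up for illustration. Note the element-wise `|` in place of the scalar `||` used inside the loop.

```r
# Hypothetical stand-in for furin_data; only columns 3 and 5 are used
furin_data <- data.frame(
  name  = c("g1", "g2", "g3", "g4"),
  x     = 0,
  treat = c(10, 5, 1, 8),
  y     = 0,
  ctrl  = c(4, 5, 4, 8)
)

# One ratio per row, computed for the whole column at once
frac <- furin_data[[3]] / furin_data[[5]]

# Element-wise OR yields one TRUE/FALSE per row
keep <- frac > 2 | frac < 0.5

# which() drops rows where the ratio is NA instead of returning NA rows
new_df <- furin_data[which(keep), ]
```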
Related
In R (studio), I have tried so many iterations of storing just the dates into the turning_point_dates data frame, but I have only been able to get it to store the loop numbers. I can print out each date as it is found, but not able to store them yet.
dates = data.frame(Date = seq(from = as.Date("2002-06-01"), to = as.Date("2011-09-30"), by = 'day'))
nums = c(98,99,100,101,102,103,104,105,106,107)
dataframe_of_numbers = data.frame(nums)
mat = matrix(ncol=0, nrow=0)
turning_point_dates = data.frame(mat)
for (i in 1:nrow(dataframe_of_numbers)){
print(dates$Date[dataframe_of_numbers[i,]])
turning_point_dates[i,] = dates$Date[dataframe_of_numbers[i,]]
}
turning_point_dates
How can I instead store the actual dates that are being looped over into the turning_point_dates data frame?
turning_point_dates puts out a data frame looking like the following:
Description:df [10 x 0]
1
2
3
4
5
6
7
8
9
10
1-10 of 10 rows
When I want instead a data frame like so:
"2002-09-06"
"2002-09-07"
"2002-09-08"
"2002-09-09"
"2002-09-10"
"2002-09-11"
"2002-09-12"
"2002-09-13"
"2002-09-14"
"2002-09-15"
It's a bit unclear, but if you want to end up with a smaller data frame that only has the dates corresponding to the row numbers in nums, you don't need to use a loop. You can just subset the data frame with nums, as shown below.
I'm also suggesting using a tibble instead of a basic data.frame, because row-subsetting a tibble always returns a tibble, whereas row-subsetting a one-column data.frame with [nums, ] returns a plain vector unless you add drop = FALSE.
library(tibble)
dates <- tibble::tibble(Date = seq(from = as.Date("2002-06-01"),
to = as.Date("2011-09-30"),
by = 'day'))
nums <- c(98,99,100,101,102,103,104,105,106,107)
dates_subsetted <- dates[nums,]
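If you would rather stay with a base data.frame, a small sketch of the difference looks like this. The variable names as_vector and as_frame are made up for illustration.

```r
dates <- data.frame(Date = seq(from = as.Date("2002-06-01"),
                               to = as.Date("2011-09-30"),
                               by = "day"))
nums <- 98:107

# With a single column, the default drop = TRUE collapses the result to a vector
as_vector <- dates[nums, ]

# drop = FALSE keeps the one-column data.frame structure
as_frame <- dates[nums, , drop = FALSE]
```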
It can also be done with a loop, but in my view it's much clunkier. It will almost certainly be much, much slower if you have a lot of data.
But since it was asked:
library(dplyr)
# set up another tibble for the values we'll extract
dates_looped <- tibble::tibble()
# loop through each row of the input, add that row if its index is in nums
for (i in seq_len(nrow(dates))) {
  if (i %in% nums) {
    dates_looped <- dplyr::bind_rows(dates_looped, dates[i, ])
  }
}
dates_looped and dates_subsetted are the same, but making dates_subsetted took a single line of code that will run many times faster.
I don't think you need a loop to do this. Here is what I did:
dates <- data.frame(Date = seq(from = as.Date("2002-06-01"), to = as.Date("2011-09-30"), by = 'day'))
nums = c(98,99,100,101,102,103,104,105,106,107)
turning_point_dates <- data.frame(nums, dates = dates$Date[nums])
I'm currently having an issue where I'm trying to nest simulated data for an efficient frontier inside a tibble containing all 250 simulations. The tibble will have 1 column named "sim" which indicates the number of the simulation, i.e. the rows in this column runs from 1:250. The other column should contain the nested simulation data which is a 3x123 tibble for each simulation. (Really hope this makes sense).
I've tried to replicate the problem such that you don't need all of the previous code and data to see the issue. Problem is that the nested data is saved as a list:
library(tidyverse)
counter = 0
table <- tibble(sim = 1:250, obs = NA)
for(i in (1:250)){
counter = counter + 1
tibble <- tibble(a = NA, b = 1:113, c = 2, d = 3)
tibble$a <- counter
nested_tibble <- tibble %>% nest(data = -a) %>% select(-a)
table$obs[i] <- nested_tibble
}
In this simplified reproducible example the values in the tibble are identical, whereas in the assignment I'm working on, the tibble contains values for the efficient frontier. Variable 'a' in the tibble corresponds to the simulation number, and this is the variable I use to nest the efficient frontier. Afterwards I wish to remove this variable a and insert the nested tibble in the corresponding 'obs' field, which is currently NA.
I really hope this makes sense. I'm still very new with R and coding. If you need any additional documentation please let me know.
Your nested_tibble is itself a one-column tibble whose data column is a list holding the tibble you want, so assigning it directly adds an extra layer of nesting. Double bracket notation, nested_tibble[[1]], extracts that list column. So to get the result you want you can change your loop as follows:
counter = 0
table <- tibble(sim = 1:250, obs = NA)
for(i in (1:250)){
counter = counter + 1
tibble <- tibble(a = NA, b = 1:113, c = 2, d = 3)
tibble$a <- counter
nested_tibble <- tibble %>% nest(data = -a) %>% select(-a)
table$obs[i] <- nested_tibble[[1]]
}
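An alternative sketch is to pre-allocate obs as a list and fill it with `[[i]]`, which skips the nest() step entirely. This version uses a base data.frame and only 5 simulations to stay self-contained; the names sims, n_sims and frontier are made up for illustration.

```r
n_sims <- 5                                # the question uses 250
sims <- data.frame(sim = seq_len(n_sims))

# Pre-allocate a list column with one slot per simulation
sims$obs <- vector("list", n_sims)

for (i in seq_len(n_sims)) {
  # Placeholder for the simulated efficient-frontier data
  frontier <- data.frame(b = 1:113, c = 2, d = 3)
  # [[i]] stores the whole data frame as element i of the list column
  sims$obs[[i]] <- frontier
}
```

Each element of sims$obs is then a full data frame, for example sims$obs[[2]].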
I have a tryCatch block that returns a data frame. In some cases it returns an empty data frame with 0 rows. Whether or not it is empty, I need to add some data to it (some columns with one row). I found that when it returns an empty data frame, adding columns to it always gives me an error. For example:
dt <- data.frame()
for (a in 0:2) {
table <- tryCatch(
{ mtcars %>%
filter(am==a) %>%
group_by(vs) %>%
summarise(n=n()) %>%
spread(vs, n)
},
error = function(e) {
return(NULL)
} )
table$am = a
dt <- bind_rows(dt, table)
}
Here is the error message:
Error in `$<-.data.frame`(`*tmp*`, "am", value = 2L) : replacement has 1 row, data has 0
Anyone can help solving this issue? Thanks a lot.
One option could be to declare your data.frame with valid columns but 0 rows. This gives you the flexibility to add a row using nrow(data)+1 and assign values to the desired columns.
data = data.frame(ID = integer(), Result = integer())
data[nrow(data)+1,] = c(1, 5)
data
# ID Result
# 1 1 5
EDIT
The OP's problem is in the code that follows the tryCatch block. When an exception is thrown inside tryCatch, table is set to NULL. Judging by the error message, the failing case here is a data frame with 0 rows (the filter matches nothing), so a length-1 value cannot be assigned to a new column.
A possible fix is to replace the lines
table$am = a
dt <- bind_rows(dt, table)
with
if (!is.null(table) && nrow(table) > 0) {
  table$am = a
  dt <- bind_rows(dt, table)
} else {
  dt[nrow(dt)+1, "am"] <- a
}
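The guard can also be wrapped in a small helper. This is only a sketch in base R; the function name add_am and the toy inputs are assumptions made up for illustration.

```r
# Hypothetical helper: add an am column, tolerating NULL or 0-row input
add_am <- function(tbl, a) {
  if (is.null(tbl) || nrow(tbl) == 0) {
    # Nothing to annotate, so return a one-row frame holding only am
    return(data.frame(am = a))
  }
  tbl$am <- a
  tbl
}

# A non-empty summary row (made-up counts) gains an am column
non_empty <- add_am(data.frame(`0` = 12, `1` = 7, check.names = FALSE), 0)

# An empty result (no car in mtcars has am == 2) becomes a single row with am = 2
empty <- add_am(mtcars[mtcars$am == 2, c("mpg", "cyl")], 2)

# A NULL result, as returned by the error handler, is handled the same way
from_null <- add_am(NULL, 1)
```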
data<-data.frame(rbind(data, ID=1))
You can't add entries to an empty data.frame like that. Try
data = data.frame("ID"=1,"Result"=5)
I'm trying to create a loop and for each iteration (the number of which can vary between source files) construct a mutate statement to add a column based on the value of another column.
Having my programming background in php, to my mind this should work:
for(i in number){
colname <- paste("Column",i,sep="")
filtercol <- paste("DateDiff_",i,sep="")
dataset <- mutate(dataset, a = ifelse(b >= 0 & b <= 364,1,NA))
}
But... as I've noticed a couple of times now with R functions, sometimes the function outright ignores that you have defined a variable with that name, as mutate() does here.
So instead of getting several columns titled "a1", "a2", "a3", etc, I get one column entitled "a" that gets overwritten each iteration.
Firstly, can somebody point out to me where I'm going wrong here, but secondly could someone explain to me under what circumstances R ignores variable names, as it's happened a couple of times now and it just seems wildly inconsistent at this point. I'm sure it's not, and there's logic there, but it's certainly well obfuscated.
It's also worth mentioning that originally I tried it this way:
just.dates <- just.dates %>%
for(i in number){
a <- paste("a",i,sep="")
filtercol <- paste("DateDiff_",i,sep="")
mutate(a = ifelse(filtercol >= 0 & filtercol <= 364),1,NA)
}
But that way decided I was passing the for() loop 4 arguments when it only wanted three.
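Something like this may work for you. Using mutate_() instead of mutate() should help here, because it accepts the new column name and the expression as values built at run time rather than as literal code. Note that mutate_() and lazyeval are deprecated in later dplyr releases, so this reflects the older API.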
# Create dataframe for testing
dataset <- data.frame(date = as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001",
"06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"), "%d/%m/%Y"),
event=c(0,0,1,0,1, 1,0,1,0,1),
id = c(rep(1,5),rep(2,5)),
DateDiff_1 = c(-2,0,34,700,rep(5,6)),
DateDiff_2 = c(20,-12,360,900,rep(5,6))
)
# Set test number vector
number <- c(1:2)
# Begin loop through numbers
for(i in number){
# Set the name of the new column to be created
newcolumn <- paste("Column",i,sep="")
# Set the name of the column to be filtered
filtercolumn <- paste("DateDiff_",i,sep="")
# Create the function to be passed into the mutate command
mutate_function = lazyeval::interp(~ ifelse(fc >= 0 & fc <= 364, 1, NA), fc = as.name(filtercolumn))
# Apply the mutate command to the dataframe
dataset <- dataset %>%
mutate_(.dots = setNames(list(mutate_function), newcolumn))
}
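If you'd rather avoid non-standard evaluation altogether, the same loop can be sketched in base R with `[[`, which always treats its argument as a value (here, a string holding the column name). The reduced test data below is made up for illustration.

```r
# Reduced, made-up test data with the two DateDiff columns
dataset <- data.frame(
  DateDiff_1 = c(-2, 0, 34, 700),
  DateDiff_2 = c(20, -12, 360, 900)
)
number <- 1:2

for (i in number) {
  newcolumn    <- paste0("Column", i)
  filtercolumn <- paste0("DateDiff_", i)
  # [[ ]] looks the column up by the string stored in the variable
  in_range <- dataset[[filtercolumn]] >= 0 & dataset[[filtercolumn]] <= 364
  dataset[[newcolumn]] <- ifelse(in_range, 1, NA)
}
```

This is also why mutate(dataset, a = ...) creates a column literally called a: mutate() reads the argument name as written, not the value of a variable with that name.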
I am trying to compile data from several files using for loops in R. I would like to get all the data into one table. Following calculation is just an example.
library(reshape)
dat1 <- data.frame("Specimen" = paste("sp", 1:10, sep=""), "Density_1" = rnorm(10,4,2), "Density_2" = rnorm(10,4,2), "Density_3" = rnorm(10,4,2))
dat2 <- data.frame("Specimen" = paste("fg", 1:10, sep=""), "Density_1" = rnorm(10,4,2), "Density_2" = rnorm(10,4,2))
dat <- c("dat1", "dat2")
for(i in 1:length(dat)){
data <- get(dat[i])
melt.data <- melt(data, id = 1)
assign(paste(dat[i], "tbl", sep=""), cast(melt.data, ~ variable, mean))
}
rbind(dat1tbl, dat2tbl)
What is the smoothest way to add an extra column into dat2? I would like to get the same column name ("Density_3" in this case) and fill it up with zeros, if it does not already exist. Assume that I have ~100 tables with number of columns (Density_1, 2, 3 etc) varying between 5 and 6.
I tried following, but it didn't work:
if(names(data) %in% "Density_3" == FALSE){
dat.all$Density_3 <- 0
} else {
dat.all$Density_3 <- dat.all$Density3}
Another one: is there a smooth way to rbind() the tables? It seems that rbind(get(dat)) does not work.
After staring at this question for a while, I think its intent may have been obscured by the unnecessary get and assign manipulations. And I think the answer is plyr::rbind.fill.
I would have constructed "dat", not as a character vector but as a list of two dataframes, used aggregate( ..., FUN=mean) (because I haven't gotten on the reshape2/plyr bus, except for melt and rbind.fill that is ) and then do.call(rbind.fill, ...) on the resulting list. At any rate this is what I think you want. I do not think it is a good idea to add in zeros for what are really missing values.
> rbind.fill(dat1tbl, dat2tbl)
value Density_1 Density_2 Density_3
1 (all) 5.006709 4.088988 2.958971
2 (all) 4.178586 3.812362 NA
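For readers without plyr, a rough base R sketch of the same idea is to pad each table with NA columns before calling rbind. The helper name fill_missing and the rounded values are made up for illustration.

```r
# Made-up one-row summary tables; the second lacks Density_3
dat1tbl <- data.frame(value = "(all)", Density_1 = 5.0, Density_2 = 4.1, Density_3 = 3.0)
dat2tbl <- data.frame(value = "(all)", Density_1 = 4.2, Density_2 = 3.8)

# Hypothetical helper: add any missing columns as NA and fix the column order
fill_missing <- function(df, cols) {
  for (m in setdiff(cols, names(df))) {
    df[[m]] <- NA
  }
  df[cols]
}

tables   <- list(dat1tbl, dat2tbl)
all_cols <- unique(unlist(lapply(tables, names)))

# Pad every table to the full column set, then stack them
combined <- do.call(rbind, lapply(tables, fill_missing, cols = all_cols))
```

Keeping the tables in a list, rather than looking them up by name with get(), is what makes the final do.call(rbind, ...) step work for any number of tables.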