How to use loop to create new data.frame - r

I have 300 locations, say "A1, A2, A3, ..., A300", and 40 values. My locations are in df1 and my values are in df2. I want to add those 40 values to each location, so location A1 would have codes from 1 to 40, and so on.
I tried to make a for loop:
df1 <- c("A1", "A2", "A3")
df1 <- data.frame(df1)
colnames(df1) <- c("location")
df2 <- c(1:40)
df2 <-data.frame(df2)
colnames(df2) <- c("code")
data <- data.frame() #Empty data.frame
for (i in df2) {
temp <- df1
temp$code <- rep(i)
data1 <- rbind(data, temp)
}
This script results in an Error: 'replacement has 40 rows, data has 315'.
Can someone tell me what I should do to make this work?
Desired output:

We can use aggregate
aggregate(Value ~Location, df1, sum)
If the values are in a different dataset and have the same order as in the original dataset 'Location', just do a cbind and aggregate
aggregate(Value ~Location, cbind(df1, df2), sum)
Assuming that there are no common columns in each dataset to merge
Update
Based on the OP's update
expand.grid(location = df1$location, code = df2$code)
Or CJ from data.table
library(data.table)
CJ(location = df1$location, code = df2$code)
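As a minimal sketch of the expand.grid approach, assuming the three-location df1 and the 40-code df2 from the question: expand.grid varies its first argument fastest, so the rows are reordered here to list all 40 codes under each location in turn. The name `out` and the reordering step are illustrative additions, not part of the original answer.

```r
df1 <- data.frame(location = c("A1", "A2", "A3"))
df2 <- data.frame(code = 1:40)

# Every combination of location and code: 3 * 40 = 120 rows
out <- expand.grid(location = df1$location, code = df2$code,
                   stringsAsFactors = FALSE)

# expand.grid varies the first argument fastest, so reorder the rows
# to follow the location order in df1, then the code
out <- out[order(match(out$location, df1$location), out$code), ]
rownames(out) <- NULL

head(out, 3)
```

No loop is needed; with the full data, the same call would produce 300 * 40 rows.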

Related

R: Assign values to column, from a column from another data frame, based on a condition (different sized data frames)

I need to create a new column in df1 named col_2, and assign it values from another data frame (df2). When the value in col_1 from df1 equals a value in col_a from df2, I want the corresponding value of col_b of df2 assigned to col_2.
The data frames are different sizes.
The data:
col_1 <- c(23,31,98,76,47,65,23,76,3,47)
col_2 <- NA
df1 <- data.frame(col_1, col_2)
col_a <- c(1:100)
col_b <- c(runif(100,0,1))
df2 <- data.frame(col_a, col_b)
I tried the following but none seemed to work... I keep running into the same problem, that the data frames are not of the same length.
for (i in 1:10){
if(df1$col_1[i] == df2$col_a[]){
df1$col_2[i] == df2$col_b[]
}
}
df1$col_2 <- ifelse(df2$col_a %in% df1$col_1, df2$col_b, NA)
df1$col_1[df1$col_1 %in% df2$col_a] <- df2$col_b[df1$col_1 %in% df2$col_a]
We can use left_join
library(dplyr)
left_join(df1, df2, by = c('col_1' = 'col_a'))
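A base R alternative, sketched under the assumption that each col_1 value appears at most once in col_a (as in the question's data), is to look up positions with match. The `set.seed` call and the name `idx` are only there to make the illustration reproducible and readable.

```r
set.seed(1)  # only to make the random col_b values reproducible
df1 <- data.frame(col_1 = c(23, 31, 98, 76, 47, 65, 23, 76, 3, 47))
df2 <- data.frame(col_a = 1:100, col_b = runif(100, 0, 1))

# match() returns, for each col_1 value, the position of the first equal
# col_a value (NA when there is none), so indexing col_b with it yields
# exactly one value per row of df1, whatever the size of df2
idx <- match(df1$col_1, df2$col_a)
df1$col_2 <- df2$col_b[idx]

df1
```

Because the lookup goes through positions, the different lengths of the two data frames do not matter.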

Subset a df and remove rows subsetted R

Hello, I have a data frame called df and I have subsetted it into another data frame called df1. Now I'd like to remove the df1 rows from df to obtain df2 = df - df1. How can I do this in R?
df <- read.csv("dataframe.csv")
df1 <- df[(df$time <= 0.345),]
Try:
df2 <- df[(df$time > 0.345), ]
or
df2 <- df[-which(df$time <= 0.345), ] # caution: returns zero rows if no row satisfies the condition
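One way to keep the two subsets in sync is to store the condition once as a logical vector and negate it. This is a sketch with a small made-up data frame standing in for "dataframe.csv"; the `id` column and the name `in_subset` are hypothetical, and it assumes `time` contains no NA values.

```r
# Hypothetical stand-in for the data read from "dataframe.csv"
df <- data.frame(id = 1:6, time = c(0.10, 0.50, 0.345, 0.90, 0.20, 0.70))

in_subset <- df$time <= 0.345   # logical vector, one entry per row
df1 <- df[in_subset, ]          # rows that satisfy the condition
df2 <- df[!in_subset, ]         # all remaining rows, i.e. df minus df1

nrow(df1) + nrow(df2) == nrow(df)
```

Negating a logical vector also behaves sensibly when no row matches: df2 is then simply all of df.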
If for any reason you strictly have to keep the structure described, this is a possible approach:
df = data.frame(Sample.Name = c(12,13,14,12,13),
Target=c("A","B","C","A","A"),
Task=c("Sample","Standard","Sample","Standard","Sample"),
Value=c(36,34,34,35,36),
Mean=c(35,32,36,37,35))
df1 = df[(df$Value <= 34),]
# negate %in% to keep the rows of df that are not in df1
df2 = df[!(do.call(paste0, df) %in% do.call(paste0, df1)),]
df2
The result is this one:
Sample.Name Target Task Value Mean
1 12 A Sample 36 35
4 12 A Standard 35 37
5 13 A Sample 36 35
This should work without even knowing the logic of first subset
library(dplyr)
df2 <- setdiff(df, df1)
OR
df2 <- anti_join(df, df1)

Return the row indices of df1 when those row values occur in df2 in R

I'm coding in R. I have a big data frame (df1) and a little data frame (df2). df2 is a subset of df1, but in a random order. I need to know the row indices of df1 which occur in df2. All of the specific cell values have lots of duplicates. Tapirus terrestris shows up more than once, as does each ModType value. I tried experimenting with which() and grepl() but couldn't get my code to work.
df1 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Panthera onca', 'Leopardus tigrinus' , 'Leopardus tigrinus'),
ModType = c('ANN', 'GAM', 'GAM','RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1019_s3_sd','CHELSAbio1015_s4_sd','CHELSAbio1015_s4_sd'))
df2 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Leopardus tigrinus'),
ModType = c('ANN', 'RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1015_s4_sd'))
Should output an array: 1,4 because df1 rows 1 and 4 occur in df2.
You can create an index column in df1 and merge the datasets.
df1$index <- 1:nrow(df1)
df3 <- merge(df1, df2)
df3$index
#[1] 4 1
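You can use match, but match on all columns: matching on SpeciesName alone returns the first occurrence of each species, which is row 3 rather than row 4 for 'Leopardus tigrinus'.
match(do.call(paste, df2), do.call(paste, df1))
#[1] 1 4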
Another option is tidyverse
library(dplyr)
df1 %>%
mutate(index = row_number()) %>%
inner_join(df2)

Repeated values when joining data frames in R

When I merge data frames, I use this code:
library(readxl)
df1 <- read_excel("C:/Users/PC/Desktop/precipitaciones_4Q.xlsx")
df2 <- read_excel("C:/Users/PC/Desktop/libro_copia_1.xlsx")
df1 = data.frame(df1)
df2 = data.frame(df2)
df1$codigo = toupper(df1$codigo)
df2$codigo = toupper(df2$codigo)
dat = merge.data.frame(df1,df2,by= "codigo", all.y = TRUE,sort = TRUE)
The data contain rainfall by county; df1 has fewer counties than df2. I want to add the rainfall data for the counties in df1 to df2.
The problem is that once the county data are merged into df2, repeated counties appear.
df1:
df2:
Instead of "id", you must specify the column names to join on from the first and second tables.
You can use the data.table package and code below:
library(data.table)
dat <- merge(df1, df2, by.x = "Columna1", by.y = "prov", all.y = TRUE)
Also, you can use the funion function (both inputs must be data.tables):
dat <- funion(df1, df2)
or the rbind function:
dat <- rbind(df1, df2)
dat <- unique(dat)
Note: the column names and the number of columns of the two data frames need to be the same.
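One common cause of repeated rows after a merge, which may or may not be what happened here since the original data are not shown, is a key that occurs more than once in one of the tables. The sketch below uses made-up data (the `lluvia` and `nombre` columns and all values are hypothetical) to show the effect and one way to remove duplicate keys before merging.

```r
# Hypothetical data: "codigo" is the key; df1 lists "A01" twice
df1 <- data.frame(codigo = c("A01", "A01", "B02"), lluvia = c(10, 10, 25))
df2 <- data.frame(codigo = c("A01", "B02", "C03"), nombre = c("x", "y", "z"))

# Each df2 row is paired with every df1 row that has the same key,
# so "A01" comes out twice
dup <- merge(df1, df2, by = "codigo", all.y = TRUE)
nrow(dup)

# Keep one row per key in df1 before merging
df1_unique <- df1[!duplicated(df1$codigo), ]
dat <- merge(df1_unique, df2, by = "codigo", all.y = TRUE)
nrow(dat)
```

Checking `anyDuplicated(df1$codigo)` and `anyDuplicated(df2$codigo)` before merging is a quick way to see whether this applies to the real data.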

extracting a dataframe from a list over many objects

I have over 1000 objects (z) in R, each containing three dataframes (df1, df2, df3) with different structures.
z1$df1 … z1000$df1
z1$df2 … z1000$df2
z1$df3 … z1000$df3
I created a list of these objects (list1 thus contains z1 thru z1000) and tried to use lapply to extract one type of dataframe (df2) for all objects, and then merge them to one single dataframe.
Extraction:
For a single object it would look like this:
df15<- z15$df2 # I transferred the index of z to the extracted df
I tried some code with lapply, ignoring the transfer of the index (I can create another list for that). However I don’t know what function I should use.
List2 <- lapply(list1, function(x))
I try to avoid using a loop because there's so many and vectorization is so much quicker. I have the idea I'm looking at it from the wrong angle.
Subsequent merging can be done as follows:
merged <- do.call(rbind, list2)
Thanks for any suggestions.
It sounds like you want to pull out all the df1s and rbind them together, then do the same for the other dataframes. You can use purrr::map_dfr to extract a named element from each element of the list and row-bind the results together.
library('tidyverse')
dummy_df <- list(
df1 = iris,
df2 = cars,
df3 = CO2)
list1 <- list(
z1 = dummy_df,
z2 = dummy_df,
z3 = dummy_df)
df1 <- map_dfr(list1, 'df1')
df2 <- map_dfr(list1, 'df2')
df3 <- map_dfr(list1, 'df3')
If you wanted to do it in base R, you can use lapply.
df1 <- lapply(list1, function(x) x$df1)
df1_merged <- do.call(rbind, df1)
One option could be using lapply to extract data.frame and then use bind_rows from dplyr.
library(dplyr) # needed for bind_rows
## The data
df1 <- data.frame(id = c(1:10), name = c(LETTERS[1:10]), stringsAsFactors = FALSE)
df2 <- data.frame(id = 11:20, name = LETTERS[11:20], stringsAsFactors = FALSE)
df3 <- data.frame(id = 21:30, name = LETTERS[15:24], stringsAsFactors = FALSE)
df4 <- data.frame(id = 121:130, name = LETTERS[15:24], stringsAsFactors = FALSE)
z1 <- list(df1 = df1, df2 = df2, df3 = df3)
z2 <- list(df1 = df1, df2 = df2, df3 = df3)
z3 <- list(df1 = df1, df2 = df2, df3 = df3)
z4 <- list(df1 = df1, df2 = df2, df3 = df4) #DFs can contain different data
# z <- list(z1, z2, z3, z4)
# Dynamically populate list z with many list object
z <- as.list(mget(paste("z",1:4,sep="")))
df1_all <- bind_rows(lapply(z, function(x) x$df1))
df2_all <- bind_rows(lapply(z, function(x) x$df2))
df3_all <- bind_rows(lapply(z, function(x) x$df3))
## Result for df3_all
> tail(df3_all)
## id name
## 35 125 S
## 36 126 T
## 37 127 U
## 38 128 V
## 39 129 W
## 40 130 X
Try this:
lapply(list1, "[[", "df2")
or if you want to rbind them together:
do.call("rbind", lapply(list1, "[[", "df2"))
The row names in the resulting data frame will identify the origin of each row.
No packages are used.
Note
We can use this input to test the code above. BOD is a built-in data frame:
z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)
There's also data.table::rbindlist, which is likely faster than do.call(rbind, lapply(...)) or dplyr::bind_rows
library(data.table)
rbindlist(lapply(list1, "[[", "df2"))
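The question also mentions carrying the index of z over to the extracted data frame. A possible base R sketch, reusing the BOD test input from the note above, is to add the list name as a column before binding. The column name `source` and the helper names `dfs` and `tagged` are made up for illustration.

```r
z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)

# Extract df2 from every object, keeping the list names (z1, z2, ...)
dfs <- lapply(list1, "[[", "df2")

# Prepend a column recording which object each data frame came from
tagged <- Map(function(d, nm) cbind(source = nm, d), dfs, names(dfs))

merged <- do.call(rbind, tagged)
rownames(merged) <- NULL
head(merged, 3)
```

This keeps the origin in an ordinary column rather than in the row names, which tends to be easier to filter or group on later.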
