I have a data frame with 5000 rows of municipality data, from which I need to extract only the rows matching a specific set of names. I am looping over that set of names with a for loop.
This is for R 3.6.0.
data <- NULL
for (i in mun.names){
data <- area.mun[area.mun[, 1] == i, ]
}
The object mun.names contains the municipalities I need to match. The object area.mun has two columns, NAME and AREA. The first column of both objects holds municipality names in the same format.
At the end of the for loop, the resulting object data always contains only one row: the one for the last municipality in mun.names.
This is probably a simple error. I'd appreciate any feedback.
Convert your 'mun.names' to a data frame (stringsAsFactors = FALSE keeps the names as character strings in R 3.6):
mun.names <- data.frame(mun.names, stringsAsFactors = FALSE)
Change the column name to 'NAME' (it must be quoted; c(NAME) without quotes fails with "object 'NAME' not found"):
colnames(mun.names) <- "NAME"
Convert your 'area.mun' to a data frame, if it isn't one already:
area.mun <- data.frame(area.mun, stringsAsFactors = FALSE)
Use merge() to extract the matching rows:
df <- merge(area.mun, mun.names, by = "NAME")
To also keep the unmatched rows from both data frames, add all.x = TRUE and all.y = TRUE; values with no match are filled with NA:
df <- merge(area.mun, mun.names, by = "NAME", all.x = TRUE, all.y = TRUE)
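A minimal sketch comparing the fixed loop, a loop-free %in% subset, and the merge() join on a tiny made-up table. The municipality names and areas below are hypothetical stand-ins for the real area.mun and mun.names. The original loop failed because data was overwritten on every iteration; appending with rbind() keeps every match.

```r
# Hypothetical sample data standing in for area.mun and mun.names
area.mun <- data.frame(NAME = c("Alpha", "Beta", "Gamma", "Delta"),
                       AREA = c(12.5, 40.2, 7.8, 22.0),
                       stringsAsFactors = FALSE)
mun.names <- c("Beta", "Delta")

# Fixed loop: append each match with rbind() instead of overwriting 'data'
data <- NULL
for (i in mun.names) {
  data <- rbind(data, area.mun[area.mun[, 1] == i, ])
}

# Loop-free alternative: keep every row whose NAME appears in mun.names
subset_in <- area.mun[area.mun$NAME %in% mun.names, ]

# Equivalent join with merge(); the key column must be called NAME in both
subset_merge <- merge(area.mun,
                      data.frame(NAME = mun.names, stringsAsFactors = FALSE),
                      by = "NAME")
```

Growing an object with rbind() inside a loop gets slow on large inputs, so the %in% version is usually the simpler choice for 5000 rows and beyond.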
Hello, I have a table (max_LnData) consisting of a single row and several columns. I have tried some code to extract my KD_PL parameters, without success. Is there a way in R to extract all the KD_PL columns and store them in a vector or a data frame?
I tried this:
KDPL <- select("KD_PL.", which(substr(colnames(max_LnData), start=1, stop=6)))
This should do the trick:
library(tidyverse)
KDPL <- max_LnData %>% select(starts_with("KD_PL."))
select() keeps all columns of max_LnData whose names start with "KD_PL." and stores them in a new data frame, KDPL.
If you only want the names of the columns to be saved, you could use the following:
KDPL_names <- colnames(KDPL)
This saves the column names in the vector KDPL_names.
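If you would rather avoid the tidyverse, a base R sketch of the same idea uses startsWith() (equivalent to the substr(..., 1, 6) == "KD_PL." comparison the question was reaching for). The one-row table below is hypothetical; the column other stands in for any non-KD_PL column.

```r
# Hypothetical one-row table with some KD_PL.* columns
max_LnData <- data.frame(KD_PL.1 = 0.12, KD_PL.2 = 0.34, other = 9, KD_PL.3 = 0.56)

# Logical index of the columns whose names start with "KD_PL."
is_kdpl <- startsWith(colnames(max_LnData), "KD_PL.")

# Keep them as a data frame (drop = FALSE preserves the data frame shape)
KDPL_df <- max_LnData[, is_kdpl, drop = FALSE]

# Or flatten the single row into a named numeric vector
KDPL_vec <- unlist(max_LnData[1, is_kdpl, drop = FALSE])
```

Because the table has only one row, unlist() gives one value per column, named after the column.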
Given: a list of data frames with the same number of columns but varying numbers of rows.
Trying to get: a function that extracts the same column from all data frames and adds another column labelling which data frame each value came from.
My reasoning:
1. Take the column I want and the name of the data frame, and make a new data frame with two columns: a label column (the name of the data frame) and the column of interest.
2. Do this for every data frame.
3. rbind all the resulting data frames.
4. Wrap this in a function that accepts as many data frames as needed.
Given:
a <- data.frame(V1=c(1:3),V2=c(1001:1003))
b <- data.frame(V1=c(1:5),V2=c(2001:2005))
What I want:
rbind(data.frame(group="a",value=c(a$V2)),data.frame(group="b",value=c(b$V2)))
Effort to make a function that does this:
my_fn <- function(...) {
arg <- structure(list(...),names=as.list(substitute(list(...)))[-1L])
do.call(rbind,lapply(arg,function(x) {data.frame(group="x",value=c(x$V2))})) %>% return
}
With the function I tried, I almost get what I want, except that group="x" is taken literally as the string "x". Instead, I want group to be the name of each object in the list (e.g. "a" or "b"), as a quoted string.
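One possible base R sketch of the steps listed above. It assumes, as in the example, that every data frame has a column named V2 and that the data frames are passed by name (my_fn(a, b)), because the labels are recovered from the unevaluated argument expressions rather than from the quoted "x".

```r
# Sample inputs from the question
a <- data.frame(V1 = c(1:3), V2 = c(1001:1003))
b <- data.frame(V1 = c(1:5), V2 = c(2001:2005))

my_fn <- function(...) {
  dfs <- list(...)
  # substitute(list(...)) is the unevaluated call list(a, b); dropping its
  # first element leaves the argument symbols, which deparse() turns into "a", "b"
  labels <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  # One two-column data frame per input: its label plus its V2 column
  pieces <- Map(function(df, label) {
    data.frame(group = label, value = df$V2, stringsAsFactors = FALSE)
  }, dfs, labels)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # reset row names to 1..n
  out
}

res <- my_fn(a, b)
```

If the data frames already live in a named list, names(your_list) can supply the labels directly instead of substitute().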
I am using one of R's built-in datasets, USArrests. Instead of numeric IDs, its rows are identified by state names. How do I create a vector containing all of these state names?
I would normally use myvec <- c(USArrests$colname), but I am not sure how to access the states, since they are not a regular column.
data("USArrests")
head(USArrests)
vector_of_names <- rownames(USArrests)
## if you want to add the state names as a column of the data frame
USArrests$state_name <-rownames(USArrests)
USArrests
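A short sketch of why $ does not work here: the state names are stored as row names (an attribute), not as a column, so rownames() is the accessor. The second step shows one way to move them into an ordinary column while resetting the row names; the object name arrests is just an illustrative choice.

```r
data("USArrests")

# Row names are an attribute, read with rownames() rather than $
states <- rownames(USArrests)

# Copy of the data with the names as a regular first column;
# row.names = NULL resets the row names to 1..n
arrests <- data.frame(state = states, USArrests,
                      row.names = NULL, stringsAsFactors = FALSE)
head(arrests$state)
```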
new.df <- as.data.frame(match(unique_numbers$ID, MASTERFILE$ID))
I have a few million rows in a data frame called MASTERFILE. It contains a column "ID" with a bunch of integers. I have another data frame called "unique_numbers" which has a similar integer column "ID" with numbers in it.
I want to match the two "ID" columns from the different data frames so that the rows whose IDs match in MASTERFILE are copied to a new data frame, "new.df".
The above command seems to work, but I'm afraid it only finds each number once, and MASTERFILE may have the same ID in several different rows, which I think it doesn't pick up!
You could use %in%:
new.df <- MASTERFILE[MASTERFILE$ID %in% unique_numbers$ID, ]
Or, if you want new.df to contain only the ID column (this returns a plain vector; add drop = FALSE to keep a data frame):
new.df <- MASTERFILE[MASTERFILE$ID %in% unique_numbers$ID, "ID"]
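The worry in the question is justified: match() returns only the position of the first occurrence of each ID, while %in% tests every row of MASTERFILE. A small sketch with hypothetical stand-in tables, where ID 2 appears three times:

```r
# Hypothetical stand-ins for the real tables; ID 2 appears three times in MASTERFILE
MASTERFILE <- data.frame(ID = c(1L, 2L, 2L, 3L, 2L),
                         value = c("a", "b", "c", "d", "e"),
                         stringsAsFactors = FALSE)
unique_numbers <- data.frame(ID = c(2L, 3L))

# match() gives one position per ID: the first row where it occurs
first_hits <- match(unique_numbers$ID, MASTERFILE$ID)   # 2 4

# %in% tests every MASTERFILE row, so all repeated IDs are kept
new.df <- MASTERFILE[MASTERFILE$ID %in% unique_numbers$ID, ]
nrow(new.df)   # 4
```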
It's hard to tell without an example (I appreciate it's millions of rows), but even some sample data would help.
This may work?
# dummy data different ID lengths
unique_numbers <- rep(1:1000,each=10)
master_id <- rep(1:2000,each=20)
# subset the ones that match
new.df <- subset(master_id , master_id %in% unique_numbers)
Or, applied to your specific case:
new.df <- subset(MASTERFILE$ID, MASTERFILE$ID %in% unique_numbers$ID)
I'm trying to store a bunch of dataframes in a list, and each of these dataframes has column names that are important (they are stock names, which are different for each dataframe).
I'm storing them in a list because that way the work can be done with a foreach loop, which lets me run it ahead of time and then use the list as a database of information.
Right now I have:
Y.matrices <- foreach(i = (1:600)) %dopar% {
df = data.frame(data)
return(df)
}
The issue with this is once I store them, I'm not sure how to get the data frames back. If I do:
unlist(Y.matrices[1])
I get a long numeric vector that has lost the column names. Is there some other way to store these data frames (i.e., perhaps not in a list) that would preserve their format?
Thanks!
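To access an individual data frame, use double brackets: Y.matrices[[i]], where i is the position of the data frame you want. Single brackets, as in Y.matrices[1], return a one-element list, which is why unlist() flattens it into a vector. If you need one combined data frame containing all 600 data frames, you can use: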
library(dplyr)
df1 <- bind_rows(Y.matrices, .id = "df")
The .id argument adds a column (here named "df") that holds each data frame's position in the list or, if the list elements are named, its name.
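A small sketch of the [[ ]] versus [ ] difference, using two hypothetical data frames whose columns are made-up stock tickers. The list itself keeps the column names; they are lost only when unlist() flattens the one-element list returned by single brackets.

```r
# Two small hypothetical data frames whose columns are stock names
Y.matrices <- list(
  data.frame(AAPL = c(1.5, 2.0), MSFT = c(3.1, 3.3)),
  data.frame(GOOG = c(10, 11), AMZN = c(20, 21))
)

first_df <- Y.matrices[[1]]   # double brackets: the data frame itself
wrapped  <- Y.matrices[1]     # single brackets: a list holding that data frame
flat     <- unlist(wrapped)   # unlist() flattens it into a named numeric vector

names(first_df)  # "AAPL" "MSFT"
```

When these are combined with bind_rows(), columns are matched by name, so data frames with different stock names produce the union of all columns, with NA wherever a data frame lacks a column.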