I have a data frame with only one column, which contains some names. I need to change this data frame.
I created a list with some places:
voos_inter <- c("PUJ","SCL","EZE","MVD","ASU","VVI")
How can I add columns to this data frame, one for each name in the list?
Is your one-column data frame really a vector? You can convert a vector to a data.frame and then add columns. I usually add the columns filled with NA and fill in the values later. Check this example:
vtr <-c(1:6)
df <- as.data.frame(vtr)
voos_inter <- c("PUJ","SCL","EZE","MVD","ASU","VVI")
df[,2:(length(voos_inter)+1)] <- NA
names(df)[2:(length(voos_inter)+1)] <- voos_inter
df
vtr PUJ SCL EZE MVD ASU VVI
1 1 NA NA NA NA NA NA
2 2 NA NA NA NA NA NA
3 3 NA NA NA NA NA NA
4 4 NA NA NA NA NA NA
5 5 NA NA NA NA NA NA
6 6 NA NA NA NA NA NA
This question already has answers here:
Initialize an empty tibble with column names and 0 rows
(6 answers)
Closed 1 year ago.
After transposing my data I am right now at this stage:
Alex Aro   Billie Piper   Chris Fe   Daron Chlim   Erik Fuc   (3000 more names)
Only headers, but no data inside. Now I want to populate the empty dataframe like this:
Alex Aro        Billie Piper   Chris Fe   Daron Chlim   Erik Fuc   (3000 more names)
NA              NA             NA         NA            NA         NA
NA              NA             NA         NA            NA         NA
NA              NA             NA         NA            NA         NA
NA              NA             NA         NA            NA         NA
NA              NA             NA         NA            NA         NA
NA              NA             NA         NA            NA         NA
(18 000 rows)   NA             NA         NA            NA         NA
It does not matter whether I end up with zeros or NA. As you can see, there are lots of rows and columns, so code is only useful if I do not have to type every row and column myself. Thanks in advance!
You can subset the empty data frame. If the data frame with headers is called df, you can create 18000 rows of NA values with:
out <- df[1:18000, ]
rownames(out) <- NULL
out
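To see why the subsetting works, here is a minimal sketch on a small stand-in. The three header names and the row count of 5 are placeholders chosen for illustration, not the real data. Asking a zero-row data frame for rows that do not exist returns rows filled with NA:

```r
# Hypothetical header-only data frame: column names but zero rows
headers <- c("Alex Aro", "Billie Piper", "Chris Fe")
df <- as.data.frame(matrix(nrow = 0, ncol = length(headers)))
names(df) <- headers

# Indexing rows that do not exist yields rows of NA
out <- df[1:5, ]
rownames(out) <- NULL   # replace the "NA", "NA.1", ... row names with 1..5

dim(out)
all(is.na(out))
```

Replacing 1:5 with 1:18000 should give the 18000-row version described in the question.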
This question already has answers here:
Add empty columns to a dataframe with specified names from a vector
(6 answers)
Closed 4 years ago.
I have some data:
df = data.frame(matrix(rnorm(20), nrow=10))
X1 X2
1 1.17596402 0.06138821
2 -1.76439330 1.03674803
3 -0.39069424 0.61616793
4 0.68375346 0.27435354
5 0.27426476 -1.71226109
6 -0.06153577 1.14514453
7 -0.37067621 -0.61243104
8 1.11107852 0.47788971
9 -1.73036658 0.31545148
10 -1.83155718 -0.14433432
I want to add new variables to it for every element in a list, which changes:
list = c("a","b","c")
The result should be:
X1 X2 a b c
1 1.17596402 0.06138821 NA NA NA
2 -1.76439330 1.03674803 NA NA NA
3 -0.39069424 0.61616793 NA NA NA
4 0.68375346 0.27435354 NA NA NA
5 0.27426476 -1.71226109 NA NA NA
6 -0.06153577 1.14514453 NA NA NA
7 -0.37067621 -0.61243104 NA NA NA
8 1.11107852 0.47788971 NA NA NA
9 -1.73036658 0.31545148 NA NA NA
10 -1.83155718 -0.14433432 NA NA NA
I can do this using suggestions below:
df[list] <- NA
But now, I want to search every row for the variable name as a value and flag if it contains that value. For example:
X1 X2 a b c
1 a b 1 1 0
2 a c 1 0 1
So the code would search for "a" in all columns and flag if any column contains "a". How do I do this?
You can use
df[list] <- NA
The result:
X1 X2 a b c
1 -2.07205164 -0.93585363 NA NA NA
2 1.11014587 0.23468072 NA NA NA
3 -1.17909665 0.04741478 NA NA NA
4 0.23955056 1.02029880 NA NA NA
5 -0.79212220 -1.13485661 NA NA NA
6 -0.57571547 0.33069641 NA NA NA
7 -0.70063920 -0.17251563 NA NA NA
8 1.90625189 0.30277177 NA NA NA
9 0.09029121 -0.72104778 NA NA NA
10 -1.36324313 -1.48041873 NA NA NA
If you want to add only the variables that are not present in df, you can use:
df[list[!list %in% names(df)]] <- NA
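For the follow-up part of the question (flagging whether any column in a row contains the variable name), one possible base R sketch is below. It assumes the original columns hold character values, as in the two-row example from the question. The vector is called vars here, which is my naming, to avoid masking base::list:

```r
# Hypothetical character data, following the two-row example in the question
df <- data.frame(X1 = c("a", "a"), X2 = c("b", "c"), stringsAsFactors = FALSE)
vars <- c("a", "b", "c")
value_cols <- c("X1", "X2")   # the columns to search

# For each name, check row by row whether any searched column equals it
df[vars] <- lapply(vars, function(v) {
  as.integer(rowSums(df[value_cols] == v) > 0)
})
df
#   X1 X2 a b c
# 1  a  b 1 1 0
# 2  a  c 1 0 1
```

This matches on exact equality. If partial matches are wanted, grepl could be swapped in for the == comparison.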
I want to have a column's values equal another column's values if the first column's value is NA in this row. So I want to change something like this
A B
3 NA
NA NA
NA NA
5 NA
NA NA
NA NA
7 5
to something like this
A B
3 3
NA NA
NA NA
5 5
NA NA
NA NA
7 5
I am fairly new to R and any other kind of programming.
As per OP's description:
equal another column's values if the first column's value is NA in
this row
Could you please try following and let me know if this helps you.
idx <- is.na(df21223$B)
df21223$B[idx] <- df21223$A[idx]
Output will be as follows for data frame's B part:
> df21223$B
[1] 3 NA NA 5 NA NA 7
Where Sample data is:
> df21223$A
[1] 3 NA NA 5 NA NA 7
> df21223$B
[1] NA NA NA NA NA NA NA
try:
df$B[is.na(df$B)] <- df$A[is.na(df$B)]
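A small worked example on the data from the question may help. The point of the sketch is that the same logical index has to be applied on both sides of the assignment, so that each missing B[i] receives A[i] and existing values in B are left alone:

```r
df <- data.frame(A = c(3, NA, NA, 5, NA, NA, 7),
                 B = c(NA, NA, NA, NA, NA, NA, 5))

# Rows where B is missing
idx <- is.na(df$B)

# Same index on both sides: B[i] <- A[i] only where B[i] was NA
df$B[idx] <- df$A[idx]

df$B
# [1]  3 NA NA  5 NA NA  5
```

The last value of B stays 5, as in the desired output.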
I would like to subset my data frame by selecting columns with partial characters recognition, which works when I have a single "name" to recognize.
where the data frame is:
ABBA01A ABBA01B ABBA02A ABBA02B ACRU01A ACRU01B ACRU02A ACRU02B
1908 NA NA NA NA NA NA NA NA
1909 NA NA NA NA NA NA NA NA
1910 NA NA NA NA NA NA NA NA
1911 NA NA NA NA NA NA NA NA
1912 NA NA NA NA NA NA NA NA
1913 NA NA NA NA NA NA NA NA
library(stringr)
df[str_detect(names(df), "ABBA" )]
works, and returns:
ABBA01A ABBA01B ABBA02A ABBA02B
1908 NA NA NA NA
So, I would like to create a dataframe for each of my species:
Speciesnames=unique ( substring (names(df),0, 4))
Speciesnames
[1] "ABBA" "ACRU" "ARCU" "PIAB" "PIGL"
I have tried to make a loop and use [i] as the species name, but the str_detect function does not recognise it,
and I would like to add additional calculations in the loop:
for ( i in seq_along(Speciesnames)){
df=df[str_detect(names(df), pattern =[i])]
print(df)
#my function for the subsetted dataframe
}
thank you for your help!
Using your data you could do the following:
create a list to hold the data.frames to be created.
filter the data.frames and store in the list
give each data.frame the name of its species
bring all the data.frames to the global environment out of the list
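library(dplyr)  # provides %>%, select() and starts_with()
Speciesnames <- unique(substring(names(df),0, 4))
# one list slot per species
data <- vector("list", length(Speciesnames))
for(i in seq_along(Speciesnames)) {
# keep only the columns whose names start with this species code
data[[i]] <- df %>% select(starts_with(Speciesnames[i]))
}
names(data) <- Speciesnames
# copy each list element into the global environment under its name
list2env(data, envir = globalenv())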
The end result after list2env is 2 data.frames called "ABBA" and "ACRU", which you can then access. If further manipulation is needed, you might leave everything in the list and do it there.
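As a rough sketch of the "leave everything in the list" route, the per-species calculation can be applied with lapply. The small numeric data frame and the rowMeans step below are placeholders of mine, standing in for the real data and for whatever function is needed:

```r
# Small stand-in for the data in the question (numbers instead of NA so the
# result of the calculation is visible)
df <- data.frame(ABBA01A = 1:3, ABBA01B = 4:6, ACRU01A = 7:9, ACRU01B = 10:12)

species <- unique(substring(names(df), 1, 4))

# One data frame per species, kept together in a named list
by_species <- lapply(species, function(s) {
  df[, startsWith(names(df), s), drop = FALSE]
})
names(by_species) <- species

# Hypothetical per-species calculation: row means across that species' columns
row_means <- lapply(by_species, rowMeans)
row_means$ABBA
```

Keeping the results in a list avoids filling the global environment with one object per species, which matters more once there are many species.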
An option is to use mapply with SIMPLIFY=FALSE to return a list of data frames, one per species. The base R startsWith function lets you subset the columns whose names start with the species name.
# First find the species by taking the unique first 4 characters of the column names
species <- unique(gsub("([A-Z]{4}).*", "\\1",names(df)))
# Pass each species
listOfDFs <- mapply(function(x){
df[,startsWith(names(df),x)] # Return only columns starting with species
}, species, SIMPLIFY=FALSE)
listOfDFs
# $ABBA
# ABBA01A ABBA01B ABBA02A ABBA02B
# 1908 NA NA NA NA
# 1909 NA NA NA NA
# 1910 NA NA NA NA
# 1911 NA NA NA NA
# 1912 NA NA NA NA
# 1913 NA NA NA NA
#
# $ACRU
# ACRU01A ACRU01B ACRU02A ACRU02B
# 1908 NA NA NA NA
# 1909 NA NA NA NA
# 1910 NA NA NA NA
# 1911 NA NA NA NA
# 1912 NA NA NA NA
# 1913 NA NA NA NA
Data:
df <- read.table(text =
"ABBA01A ABBA01B ABBA02A ABBA02B ACRU01A ACRU01B ACRU02A ACRU02B
1908 NA NA NA NA NA NA NA NA
1909 NA NA NA NA NA NA NA NA
1910 NA NA NA NA NA NA NA NA
1911 NA NA NA NA NA NA NA NA
1912 NA NA NA NA NA NA NA NA
1913 NA NA NA NA NA NA NA NA",
header = TRUE, stringsAsFactors = FALSE)
I think that you should select all matching columns first, and then subselect your data.frame.
patterns <- c("ABB", "CDC")
res <- lapply(patterns, function(x) grep(x, colnames(df), value=TRUE))
df[, unique(unlist(res))]
The res object is a list of the matched columns for each pattern.
The next step is to select the unique set of columns with unique(unlist(res)) and subset the data.frame.
If you are writing production code, this is probably not the best answer.
INPUT
specimens: character vector of 60 items: specimen1A, specimen1B, specimen2A ... specimen 30B.
DESIRED OUTPUT
A matrix or a dataframe in which each item in specimens is the name of a column in the matrix/dataframe.
The number of rows must be set to a fixed value (any).
The data for the cells will be filled with subsequent code so can be left as NA.
For example:
specimen1A specimen1B specimen2A ... specimen 30B
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
.. .. .. .. ..
100 NA NA NA NA
Thanks
A data.frame is just a list with some added attributes. Just coerce it:
> specimens <- list(A=runif(10),B=runif(10))
>
> as.data.frame(specimens)
A B
1 0.6746436 0.7599987
2 0.2198677 0.5004017
3 0.4927745 0.9455003
4 0.8028011 0.8718274
5 0.6190707 0.7415874
6 0.5273992 0.8118802
7 0.6602548 0.4432799
8 0.5820781 0.8117375
9 0.8196531 0.5172833
10 0.0683938 0.0205693
Edit: Re-reading your problem, I suspect specimens is a character vector, not really a list. If so:
N.rows <- 10
specimens <- c("A","B")
spec.dat <- as.data.frame(matrix(NA,nrow=N.rows,ncol=length(specimens)))
colnames(spec.dat) <- specimens
> spec.dat
A B
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
9 NA NA
10 NA NA
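The same idea scaled to the sizes in the question might look like the sketch below. The way the 60 names are generated and the choice of NA_real_ (so the columns start out numeric rather than logical) are assumptions of mine, not something stated in the question:

```r
# Hypothetical reconstruction of the 60 specimen names: specimen1A ... specimen30B
specimens <- paste0("specimen", rep(1:30, each = 2), c("A", "B"))
n_rows <- 100

# Build the NA matrix with the column names attached in one step
spec_mat <- matrix(NA_real_, nrow = n_rows, ncol = length(specimens),
                   dimnames = list(NULL, specimens))

# Convert only if a data frame is needed; the matrix may be enough
spec_dat <- as.data.frame(spec_mat)

dim(spec_dat)
head(names(spec_dat), 3)
```

If every cell will hold the same type, staying with the matrix is usually lighter than a 60-column data frame.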