R function to fix automatically formatted data

I am currently analyzing a baseball data set that includes count data; however, some of the values have automatically been formatted as dates.
I have already tried as.numeric, but it does not help. I have provided a sample of the data below:
Count (factor): 0-0 0-1 0-2 1-Feb 1-Jan 1-Mar 2-Feb 2-Jan 2-Mar Feb-00 Jan-00 Mar-00
I would like to remove the date format. For instance, I want to see 1-Feb as 1-2, 1-Jan as 1-1, 1-Mar as 1-3, Feb-00 as 2-0.
Does anyone have any suggestions on how to do so?

You can replace the abbreviated months with their calendar position by matching them against month.abb. Below is a general function using base R.
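## function to apply
month_num <- function(x){
  # leave values without a three-letter month abbreviation unchanged
  if (!grepl('[A-Za-z]{3}', x))
    return(x)
  # replace the abbreviation (and an optional leading '/') with its month number
  gsub('/?[A-Za-z]{3}', as.character(match(regmatches(x, regexpr('[A-Za-z]{3}', x)), month.abb)), x)
}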
## vector
strings <- c( '0-0', '0-1' ,'0-2', '1-Feb', '1-Jan', '1-Mar', '2-Feb', '2-Jan', '2-Mar', 'Feb-00', '/Jan-00', 'Mar-00')
sapply(strings, month_num, USE.NAMES = FALSE)
#> [1] "0-0" "0-1" "0-2" "1-2" "1-1" "1-3" "2-2" "2-1" "2-3" "2-00"
#> [11] "1-00" "3-00"
## data.frame or matrix
tmp <- data.frame(
strings = c( '0-0', '0-1' ,'0-2', '1-Feb', '1-Jan', '1-Mar', '2-Feb', '2-Jan', '2-Mar', 'Feb-00', '/Jan-00', 'Mar-00')
)
tmp$strings <- apply(tmp, 1, month_num)
tmp
#> strings
#> 1 0-0
#> 2 0-1
#> 3 0-2
#> 4 1-2
#> 5 1-1
#> 6 1-3
#> 7 2-2
#> 8 2-1
#> 9 2-3
#> 10 2-00
#> 11 1-00
#> 12 3-00
## list
strings <- list( '0-0', '0-1' ,'0-2', '1-Feb', '1-Jan', '1-Mar', '2-Feb', '2-Jan', '2-Mar', 'Feb-00', '/Jan-00', 'Mar-00')
strings <- lapply(strings, month_num)
tail(strings)
#> [[1]]
#> [1] "2-2"
#>
#> [[2]]
#> [1] "2-1"
#>
#> [[3]]
#> [1] "2-3"
#>
#> [[4]]
#> [1] "2-00"
#>
#> [[5]]
#> [1] "1-00"
#>
#> [[6]]
#> [1] "3-00"
Created on 2019-02-12 by the reprex package (v0.2.1)
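Note that month_num() returns "2-00" for "Feb-00", while the question asks for "2-0". Going by the mapping in the question, the garbled values come in two shapes: number-Month ("1-Feb" means 1-2) and Month-number ("Feb-00" means 2-0). The sketch below handles each shape separately in a hypothetical helper, fix_count(). It converts the column to character first, because as.numeric() on a factor returns the internal level codes rather than the labels, which would explain why it did not help. It also assumes month abbreviations are capitalized exactly as in month.abb.

```r
# Hypothetical helper: undo date formatting in ball-strike counts.
# Assumes only two garbled shapes: "1-Feb" (number-Month) and "Feb-00" (Month-number).
fix_count <- function(x) {
  x <- as.character(x)  # factor -> labels; as.numeric() would give level codes
  out <- x

  # "1-Feb" -> "1-2": keep the leading number, map the month to its position
  num_mon <- grepl("^[0-9]+-[A-Za-z]{3}$", x)
  out[num_mon] <- paste0(sub("-.*$", "", x[num_mon]), "-",
                         match(sub("^.*-", "", x[num_mon]), month.abb))

  # "Feb-00" -> "2-0": map the month, drop the zero padding after the dash
  mon_num <- grepl("^[A-Za-z]{3}-[0-9]+$", x)
  out[mon_num] <- paste0(match(sub("-.*$", "", x[mon_num]), month.abb), "-",
                         as.integer(sub("^.*-", "", x[mon_num])))
  out
}

counts <- factor(c("0-0", "0-2", "1-Feb", "2-Mar", "Feb-00", "Jan-00", "Mar-00"))
fix_count(counts)
#> [1] "0-0" "0-2" "1-2" "2-3" "2-0" "1-0" "3-0"
```

To update a column in place, something like df$Count <- fix_count(df$Count) should work (df stands for your data frame); wrap the result in factor() if you need a factor again.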

Related

How can I write a regex to sort file paths in numeric order?

I have hundreds of .wav files and listed them using list.files(). The result looks something like this:
[1] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
[2] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
[3] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
.......
[73] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
[74] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
[75] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
I use the following code to reorder the file paths so that the number in each subpath follows numeric order:
filename<- file_list[order(as.numeric(stringr::str_extract(file_list,"[0-9]+(.*?)")) )]
The result is something like:
[1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
[2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
[3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
.......
[73] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
[74] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
[75] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
I also want the number in the last subpath to follow numeric order, e.g. English-0067 before English-0069. I tried repeating the matching for the last subpath, but that breaks the first ordering (3 ... 10). How can I make all the numbers in the subpaths follow numeric order?
Another option (here files is the vector of paths, i.e. file_list in the question):
ord <- order(as.numeric(sub("(^\\d+)/.*$","\\1",files)), as.numeric(sub("^.*-(\\d+)\\.wav","\\1",files)))
files[ord]
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
Here's one approach:
vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")
nums <- strcapture("^([0-9]+).*\\b([0-9]+)\\.[a-z]+$", vec, proto=list(a=0L,b=0L))
nums
# a b
# 1 10 701
# 2 10 700
# 3 10 703
# 4 3 69
# 5 3 82
# 6 3 67
do.call(order, nums)
# [1] 6 4 5 2 1 3
vec[do.call(order, nums)]
# [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
# [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
# [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
# [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
# [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
# [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
If you also needed to include the BL-0001 number in your ordering, all it would take is a small addition to the regex and an additional entry in proto=. do.call(order, nums) handles one or more columns, however many there are.
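A sketch of that extension, using four of the paths from vec and assuming each path contains exactly one BL-<digits>_ segment: one extra capture group plus one extra proto= entry (named bl here, an arbitrary choice) gives a three-level sort key.

```r
vec <- c("10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
         "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
         "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
         "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")

# Capture the leading folder number, the BL number and the trailing file number
nums3 <- strcapture("^([0-9]+)/.*/BL-([0-9]+)_.*\\b([0-9]+)\\.[a-z]+$", vec,
                    proto = list(a = 0L, bl = 0L, b = 0L))
nums3

# do.call(order, ...) sorts by a, then bl, then b
vec[do.call(order, nums3)]
```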
Note that if you over-tune your regex, any row that doesn't match will return NA for both groups, and order() will sort those NA rows last. If one or more filenames end up misordered, check the regex and the intermediate nums entries for those filenames.
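One way to do that check is to list the paths whose row in nums came back NA, for example with complete.cases(). The second path below is a made-up example that does not fit the pattern.

```r
paths <- c("3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
           "notes/readme.txt")  # hypothetical file that does not fit the pattern

nums <- strcapture("^([0-9]+).*\\b([0-9]+)\\.[a-z]+$", paths,
                   proto = list(a = 0L, b = 0L))

# Rows with NA in any column failed to match the regex
paths[!complete.cases(nums)]
#> [1] "notes/readme.txt"

# order() places the NA rows last by default (na.last = TRUE)
paths[do.call(order, nums)]
```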
A tidyverse solution: structure the data as a table, use stringr::str_extract() to pull out the numbers to arrange the rows by, then extract the filenames.
vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")
library(dplyr)
library(stringr)
vec_tib <- tibble(filename = vec)
vec_tib <- mutate(vec_tib,
num_1 = str_extract(filename, "\\d+"),
num_2 = str_extract(filename, "\\d+(?=(\\.wav))"))
head(vec_tib, 3)
#> # A tibble: 3 × 3
#> filename num_1 num_2
#> <chr> <chr> <chr>
#> 1 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsa… 10 0701
#> 2 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch… 10 0700
#> 3 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueb… 10 0703
vec_tib <- mutate(vec_tib, across(starts_with("num"), as.numeric))
vec_tib |>
arrange(num_1, num_2) |>
pull(filename)
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
Created on 2022-11-28 with reprex v2.0.2

Comparing two metadata tables and their variables and options

I am validating whether two data frames are consistent. It works perfectly on small data frames, but when the number of records increases it throws an error.
library(tidyverse)
df1 <- data.frame(MAN=c(6,6,4,6,8,6,8,4,4,6,6,8,8),MANi=c("OD","NY","CA","CA","OD","CA","OD","NY","OL","NY","OD","CA","OD"),
nune=c("akas","mani","juna","mau","nuh","kil","kman","nuha","huna","kman","nuha","huna","mani"),
klay=c(1,2,2,1,1,2,1,2,1,2,1,1,2),emial=c("dd","xyz","abc","dd","xyz","abc","dd","xyz","abc","dd","xyz","abc","dd"),Pass=c("Low","High","Low","Low","High","Low","High","High","Low","High","High","High","Low"),fri=c("KKK","USA","IND","SRI","PAK","CHI","JYP","TGA","KKK","USA","IND","SRI","PAK"),
mkl=c("m","f","m","m","f","m","m","f","m","m","f","m","m"),kin=c("Sent","Rec","Sent","Rec","Sent","Rec","Sent","Rec","Sent","Rec","Rec","Sent","Rec"),munc=c("Car","Bus","Truk","Cyl","Bus","Car","Bus","Bus","Bus","Car","Car","Cyl","Car"),
lone=c("Sr","jun","sr","jun","man","man","jr","Sr","jun","sr","jun","man","man"),wond=c("tko","kent","bho","kilt","kent","bho","kent","bho","bho","kilt","kent","bho","kilt"))
df2 <- data.frame(MAN=c(6,6,4,6,8,6,8,4,4,6,6,8,8,8,6),MANi=c("OD","NY","CA","CA","OD","CA","OD","NY","OL","ny","OD","CA","OD","NY","OL"),
nune=c("akas","mani","juna","mau","nuh","kil","kman","nuha","huna","kman","nuha","huna","mani","juna","mau"),
klay=c(1,2,2,1,1,2,1,2,1,2,1,1,2,2,1),emial=c("dd","xyz","ABC","dd","xyz","ABC","dd","xyz","ABC","dd","xyz","ABC","dd","xyz","ABC"),Pass=c("Low","High","Low","Low","High","Low","High","High","Low","High","High","High","Low","High","High"),fri=c("KKK","USA","IND","SRI","PAK","CHI","JYP","TGA","KKK","USA","IND","SRI","PAK","CHI","JYP"),
mkl=c("male","female","male","male","female","male","male","female","male","male","female","male","male","female","male"),kin=c("Sent","Rec","Sent","Rec","Sent","Rec","Sent","Rec","Sent","Rec","Rec","Sent","Rec","Sent","Rec"),munc=c("Car","Bus","Truk","Cyl","Bus","Car","Bus","Bus","Bus","Car","Car","Cyl","Car","Bus","Bus"),
lone=c("Sr","jun","sr","jun","man","man","jr","Sr","jun","sr","jun","man","man","jr","man"),wond=c("tko","kent","bho","kilt","kent","bho","kent","bho","bho","kilt","kent","bho","kilt","kent","bho"))
Worth considering waldo::compare?
waldo::compare(df1, df2)
#> `attr(old, 'row.names')[11:13]`: 11 12 13
#> `attr(new, 'row.names')[11:15]`: 11 12 13 14 15
#>
#> old vs new
#> MAN MANi nune klay emial Pass fri mkl kin munc lone wond
#> - old[1, ] 6 OD akas 1 dd Low KKK m Sent Car Sr tko
#> + new[1, ] 6 OD akas 1 dd Low KKK male Sent Car Sr tko
#> - old[2, ] 6 NY mani 2 xyz High USA f Rec Bus jun kent
#> + new[2, ] 6 NY mani 2 xyz High USA female Rec Bus jun kent
#> - old[3, ] 4 CA juna 2 abc Low IND m Sent Truk sr bho
#> + new[3, ] 4 CA juna 2 ABC Low IND male Sent Truk sr bho
#> - old[4, ] 6 CA mau 1 dd Low SRI m Rec Cyl jun kilt
#> + new[4, ] 6 CA mau 1 dd Low SRI male Rec Cyl jun kilt
#> - old[5, ] 8 OD nuh 1 xyz High PAK f Sent Bus man kent
#> + new[5, ] 8 OD nuh 1 xyz High PAK female Sent Bus man kent
#> - old[6, ] 6 CA kil 2 abc Low CHI m Rec Car man bho
#> + new[6, ] 6 CA kil 2 ABC Low CHI male Rec Car man bho
#> - old[7, ] 8 OD kman 1 dd High JYP m Sent Bus jr kent
#> + new[7, ] 8 OD kman 1 dd High JYP male Sent Bus jr kent
#> - old[8, ] 4 NY nuha 2 xyz High TGA f Rec Bus Sr bho
#> + new[8, ] 4 NY nuha 2 xyz High TGA female Rec Bus Sr bho
#> - old[9, ] 4 OL huna 1 abc Low KKK m Sent Bus jun bho
#> + new[9, ] 4 OL huna 1 ABC Low KKK male Sent Bus jun bho
#> - old[10, ] 6 NY kman 2 dd High USA m Rec Car sr kilt
#> + new[10, ] 6 ny kman 2 dd High USA male Rec Car sr kilt
#> and 5 more ...
#>
#> `old$MAN[11:13]`: 6 8 8
#> `new$MAN[11:15]`: 6 8 8 8 6
#>
#> `old$MANi[10:13]`: "NY" "OD" "CA" "OD"
#> `new$MANi[7:15]`: "OD" "NY" "OL" "ny" "OD" "CA" "OD" "NY" "OL"
#>
#> `old$nune[11:13]`: "nuha" "huna" "mani"
#> `new$nune[11:15]`: "nuha" "huna" "mani" "juna" "mau"
#>
#> `old$klay[11:13]`: 1 1 2
#> `new$klay[11:15]`: 1 1 2 2 1
#>
#> old$emial | new$emial
#> [2] "xyz" - "dd" [1]
#> [3] "abc" - "xyz" [2]
#> [4] "dd" - "ABC" [3]
#> [5] "xyz" - "dd" [4]
#> [6] "abc" - "xyz" [5]
#> [7] "dd" - "ABC" [6]
#> [8] "xyz" - "dd" [7]
#> [9] "abc" - "xyz" [8]
#> [10] "dd" - "ABC" [9]
#> [11] "xyz" - "dd" [10]
#> ... ... ... and 5 more ...
#>
#> `old$Pass[11:13]`: "High" "High" "Low"
#> `new$Pass[11:15]`: "High" "High" "Low" "High" "High"
#>
#> `old$fri[11:13]`: "IND" "SRI" "PAK"
#> `new$fri[11:15]`: "IND" "SRI" "PAK" "CHI" "JYP"
#>
#> old$mkl | new$mkl
#> [1] "m" - "male" [1]
#> [2] "f" - "female" [2]
#> [3] "m" - "male" [3]
#> [4] "m" - "male" [4]
#> [5] "f" - "female" [5]
#> [6] "m" - "male" [6]
#> [7] "m" - "male" [7]
#> [8] "f" - "female" [8]
#> [9] "m" - "male" [9]
#> [10] "m" - "male" [10]
#> ... ... ... and 5 more ...
#>
#> And 4 more differences ...
Created on 2022-05-21 by the reprex package (v2.0.1)
Or use the daff package for highlighted, sortable and filterable differences:
library(daff)
diffs <- diff_data(df1, df2)
render_diff(diffs)
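If a validation script needs a programmatic summary rather than a printed report, a base R sketch like the one below may be enough. compare_frames() is a hypothetical helper; it assumes rows are in the same order in both data frames and only compares the overlapping rows of shared columns. The toy frames are a small made-up subset modeled on df1/df2.

```r
# Hypothetical helper: summarise how `new` differs from `old`
compare_frames <- function(old, new) {
  shared <- intersect(names(old), names(new))
  rows <- seq_len(min(nrow(old), nrow(new)))  # compare overlapping rows only
  changed <- vapply(shared, function(col) {
    !identical(old[[col]][rows], new[[col]][rows])
  }, logical(1))
  list(
    extra_rows      = nrow(new) - nrow(old),
    missing_columns = setdiff(names(old), names(new)),
    changed_columns = shared[changed]
  )
}

old <- data.frame(MAN = c(6, 6, 4), MANi = c("OD", "NY", "CA"),
                  mkl = c("m", "f", "m"), stringsAsFactors = FALSE)
new <- data.frame(MAN = c(6, 6, 4, 8), MANi = c("OD", "ny", "CA", "NY"),
                  mkl = c("male", "female", "male", "female"), stringsAsFactors = FALSE)
compare_frames(old, new)
```

Unlike waldo or daff, this only says which columns differ, not where; it is meant as a cheap pass/fail gate before looking at the detailed diff.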

Trying to extract specific characters in a column in R?

The content of the column looks like this: $1,521+ 2 bds. I want to extract 1521 and put it in a new column. I know this can be done in Alteryx using regex; can I do it in R?
How about the following?
library(tidyverse)
x <- '$1,521+ 2 bds'
parse_number(x)
For example:
library(tidyverse)
#generate some data
tbl <- tibble(string = str_c('$', as.character(seq(1521, 1541, 1)), '+', ' 2bds'))
new_col <-
tbl$string %>%
str_split('\\+',simplify = TRUE) %>%
`[`(, 1) %>%
str_sub(2, -1) #get rid of '$' at the start
mutate(tbl, number = new_col)
#> # A tibble: 21 x 2
#> string number
#> <chr> <chr>
#> 1 $1521+ 2bds 1521
#> 2 $1522+ 2bds 1522
#> 3 $1523+ 2bds 1523
#> 4 $1524+ 2bds 1524
#> 5 $1525+ 2bds 1525
#> 6 $1526+ 2bds 1526
#> 7 $1527+ 2bds 1527
#> 8 $1528+ 2bds 1528
#> 9 $1529+ 2bds 1529
#> 10 $1530+ 2bds 1530
#> # … with 11 more rows
Created on 2021-06-12 by the reprex package (v2.0.0)
We can use sub from base R
as.numeric( sub("\\$(\\d+),(\\d+).*", "\\1\\2", x))
#[1] 1521
data
x <- '$1,521+ 2 bds'
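The sub() pattern above expects exactly one comma, so a value without a thousands separator (e.g. "$980+ 1 bd") would come back as NA with a warning. A base R sketch that handles both, written as a hypothetical helper extract_price() and used to fill a new column as the question asks, assuming whole-dollar amounts:

```r
# Hypothetical helper: first number in each string, commas removed.
# Assumes whole-dollar amounts (no decimals); strings without digits become NA.
extract_price <- function(x) {
  m <- regexpr("[0-9][0-9,]*", x)          # position of the first digit run
  out <- rep(NA_real_, length(x))
  hit <- !is.na(m) & m > 0                 # regexpr() returns -1 for no match
  out[hit] <- as.numeric(gsub(",", "", regmatches(x, m)))
  out
}

listings <- data.frame(listing = c("$1,521+ 2 bds", "$980+ 1 bd", "call for price"),
                       stringsAsFactors = FALSE)
listings$price <- extract_price(listings$listing)
listings
```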

Use mutate inside a function called by an apply family function

I am trying to change some of my data that are stored as tibbles inside a list.
This list of tibbles was generated by a package.
I do not understand why my function does not work.
If I extract a tibble element manually, the function works, but not inside lapply().
my function:
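library(dplyr)    # mutate_()
library(magrittr) # %<>%
library(lazyeval) # interp()
changesomethingtaxize <- function(x, whatchange=NULL, applyfunction=NULL){
  # build the call applyfunction(whatchange) and assign it back to the column whatchange
  mutate_call <- lazyeval::interp(~ a(b), a = match.fun(applyfunction), b = as.name(whatchange))
  x %<>% mutate_(.dots = setNames(list(mutate_call), whatchange))
  return(x)
}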
I want to do
mydata <- lapply(mydata, function(x) changesomethingtaxize(x, whatchange=rank, applyfunction=str_to_sentence) )
I could use a loop to extract each tibble (in this case I only have 5), but I would like to understand what I am doing wrong :)
From dput()
mydata <- structure(list(`Zostera marina` = structure(list(name = c("Plantae",
"Viridiplantae", "Streptophyta", "Embryophyta", "Tracheophyta",
"Spermatophytina", "Magnoliopsida", "Lilianae", "Alismatales",
"Zosteraceae", "Zostera", "Zostera marina"), rank = c("kingdom",
"subkingdom", "infrakingdom", "superdivision", "division", "subdivision",
"class", "superorder", "order", "family", "genus", "species"),
id = c("202422", "954898", "846494", "954900", "846496",
"846504", "18063", "846542", "38883", "39069", "39073", "39074"
)), row.names = c(NA, 12L), class = "data.frame"), `Vascular plants` = structure(list(
name = c("Plantae", "Viridiplantae", "Streptophyta", "Embryophyta",
"Tracheophyta"), rank = c("kingdom", "subkingdom", "infrakingdom",
"superdivision", "division"), id = c("202422", "954898",
"846494", "954900", "846496")), row.names = c(NA, 5L), class = "data.frame"),
`Fucus vesiculosus` = structure(list(name = c("Chromista",
"Chromista", "Phaeophyta", "Phaeophyceae", "Fucales", "Fucaceae",
"Fucus", "Fucus vesiculosus"), rank = c("kingdom", "subkingdom",
"division", "class", "order", "family", "genus", "species"
), id = c("630578", "590735", "660055", "10686", "11328",
"11329", "11334", "11335")), row.names = c(NA, 8L), class = "data.frame"),
Macroalgae = NA, `Filamentous algae` = NA), class = "classification", db = "itis")
I think I actually found why... :D
The lapply works but was not returning anything because of the NAs (empty elements of the list).
I added an if() that only mutates a tibble if the tibble actually contains something.
It is always an NA issue somewhere!
Well, I hope this helps someone someday.
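The code for that if() guard isn't shown; a minimal base R sketch of the idea might look like this. modify_if_df() is a hypothetical name, and it swaps the lazyeval/mutate_() call for plain [[ ]] assignment, so the column name is passed as a string ("rank"). Note that whatchange = rank without quotes would pass base R's rank() function rather than the column name.

```r
# Hypothetical helper: apply `applyfunction` to one column, but only when
# the list element is a data frame (NA placeholders are returned as-is)
modify_if_df <- function(x, whatchange, applyfunction) {
  if (!is.data.frame(x)) return(x)
  f <- match.fun(applyfunction)
  x[[whatchange]] <- f(x[[whatchange]])
  x
}

# Small stand-in for the classification list: one data frame, one NA
toy <- list(`Vascular plants` = data.frame(rank = c("kingdom", "division"),
                                           stringsAsFactors = FALSE),
            Macroalgae = NA)
res <- lapply(toy, modify_if_df, whatchange = "rank", applyfunction = toupper)
res$`Vascular plants`$rank
res$Macroalgae
```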
The functions you provided aren't usable by themselves, but it looks like you're attempting to use a function meant to modify a data frame on non-dataframe objects, which mydata contains.
I'm using dplyr::mutate() just to illustrate here.
Your data contain NAs (which in this case are logical). dplyr::mutate() doesn't have a method for logicals, and I'm assuming the function you're trying to use doesn't either (or simply has no way of handling NA values).
You should be getting an error that's at least conceptually similar to the following ...
lapply(mydata, function(x) dplyr::mutate(x, col_to_modify = toupper(rank)))
#> Error in UseMethod("mutate_"): no applicable method for 'mutate_' applied to an object of class "logical"
To get around this, you can check your list ahead of time and note which elements are indeed data frames.
df_indices <- vapply(mydata, is.data.frame, logical(1L))
df_indices
#> Zostera marina Vascular plants Fucus vesiculosus Macroalgae
#> TRUE TRUE TRUE FALSE
#> Filamentous algae
#> FALSE
Using df_indices, we can modify only those elements in mydata like so...
mydata[df_indices] <- lapply(
mydata[df_indices],
function(x) dplyr::mutate(x, col_to_modify = toupper(rank))
)
mydata
#> $`Zostera marina`
#> name rank id col_to_modify
#> 1 Plantae kingdom 202422 KINGDOM
#> 2 Viridiplantae subkingdom 954898 SUBKINGDOM
#> 3 Streptophyta infrakingdom 846494 INFRAKINGDOM
#> 4 Embryophyta superdivision 954900 SUPERDIVISION
#> 5 Tracheophyta division 846496 DIVISION
#> 6 Spermatophytina subdivision 846504 SUBDIVISION
#> 7 Magnoliopsida class 18063 CLASS
#> 8 Lilianae superorder 846542 SUPERORDER
#> 9 Alismatales order 38883 ORDER
#> 10 Zosteraceae family 39069 FAMILY
#> 11 Zostera genus 39073 GENUS
#> 12 Zostera marina species 39074 SPECIES
#>
#> $`Vascular plants`
#> name rank id col_to_modify
#> 1 Plantae kingdom 202422 KINGDOM
#> 2 Viridiplantae subkingdom 954898 SUBKINGDOM
#> 3 Streptophyta infrakingdom 846494 INFRAKINGDOM
#> 4 Embryophyta superdivision 954900 SUPERDIVISION
#> 5 Tracheophyta division 846496 DIVISION
#>
#> $`Fucus vesiculosus`
#> name rank id col_to_modify
#> 1 Chromista kingdom 630578 KINGDOM
#> 2 Chromista subkingdom 590735 SUBKINGDOM
#> 3 Phaeophyta division 660055 DIVISION
#> 4 Phaeophyceae class 10686 CLASS
#> 5 Fucales order 11328 ORDER
#> 6 Fucaceae family 11329 FAMILY
#> 7 Fucus genus 11334 GENUS
#> 8 Fucus vesiculosus species 11335 SPECIES
#>
#> $Macroalgae
#> [1] NA
#>
#> $`Filamentous algae`
#> [1] NA
#>
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "itis"
Note that {purrr} has a map() variant designed for exactly this situation. purrr::map_if() takes a .p (predicate) argument: a function that is applied to each element of .x and returns TRUE or FALSE. Only the elements for which it returns TRUE are modified by the function you provide to .f; the rest are returned unchanged.
purrr::map_if(.x = mydata, .p = is.data.frame,
.f = ~ dplyr::mutate(.x, col_to_modify = toupper(rank)))
#> $`Zostera marina`
#> name rank id col_to_modify
#> 1 Plantae kingdom 202422 KINGDOM
#> 2 Viridiplantae subkingdom 954898 SUBKINGDOM
#> 3 Streptophyta infrakingdom 846494 INFRAKINGDOM
#> 4 Embryophyta superdivision 954900 SUPERDIVISION
#> 5 Tracheophyta division 846496 DIVISION
#> 6 Spermatophytina subdivision 846504 SUBDIVISION
#> 7 Magnoliopsida class 18063 CLASS
#> 8 Lilianae superorder 846542 SUPERORDER
#> 9 Alismatales order 38883 ORDER
#> 10 Zosteraceae family 39069 FAMILY
#> 11 Zostera genus 39073 GENUS
#> 12 Zostera marina species 39074 SPECIES
#>
#> $`Vascular plants`
#> name rank id col_to_modify
#> 1 Plantae kingdom 202422 KINGDOM
#> 2 Viridiplantae subkingdom 954898 SUBKINGDOM
#> 3 Streptophyta infrakingdom 846494 INFRAKINGDOM
#> 4 Embryophyta superdivision 954900 SUPERDIVISION
#> 5 Tracheophyta division 846496 DIVISION
#>
#> $`Fucus vesiculosus`
#> name rank id col_to_modify
#> 1 Chromista kingdom 630578 KINGDOM
#> 2 Chromista subkingdom 590735 SUBKINGDOM
#> 3 Phaeophyta division 660055 DIVISION
#> 4 Phaeophyceae class 10686 CLASS
#> 5 Fucales order 11328 ORDER
#> 6 Fucaceae family 11329 FAMILY
#> 7 Fucus genus 11334 GENUS
#> 8 Fucus vesiculosus species 11335 SPECIES
#>
#> $Macroalgae
#> [1] NA
#>
#> $`Filamentous algae`
#> [1] NA

Importing multiple .csv files into R with different names

I have one work directory with 37 Locations.csv and 37 Behavior.csv
As shown below, some files share the same number, e.g. 111868-Behavior.csv and 111868-Behavior 2.csv, and the same is true of the Locations files.
# some of the CSV files in the working directory
dir()
[1] "111868-Behavior 2.csv" "111868-Behavior.csv"
[3] "111868-Locations 2.csv" "111868-Locations.csv"
[5] "111869-Behavior.csv" "111869-Locations.csv"
[7] "111870-Behavior 2.csv" "111870-Behavior.csv"
[9] "111870-Locations 2.csv" "111870-Locations.csv"
[11] "112696-Behavior 2.csv" "112696-Behavior.csv"
[13] "112696-Locations 2.csv" "112696-Locations.csv"
I can't change the name of files.
I want to import all 36 Locations and 36 Behavior files, but when I tried this:
library(plyr)   # ldply()
library(readr)  # read_csv()
#Create list of all behaviors
bhv <- list.files(pattern="*-Behavior.csv")
bhv2 <- list.files(pattern="*-Behavior 2.csv")
#Throw them altogether
bhv_csv = ldply(bhv, read_csv)
bhv_csv2 = ldply(bhv2, read_csv)
#Join bhv_csv and bhv_csv2
b<-rbind(bhv_csv,bhv_csv2)
#Create list of all locations
loc <- list.files(pattern="*-Locations.csv")
loc2 <- list.files(pattern="*-Locations 2.csv")
#Throw them altogether
loc_csv = ldply(loc, read_csv)
loc_csv2 = ldply(loc2, read_csv)
#Join loc_csv and loc_csv2
l<-rbind(loc_csv,loc_csv2)
it shows only 28, not 36 as I expected:
length(unique(b$Ptt))
[1] 28
length(unique(l$Ptt))
[1] 28
This number, 28, corresponds to the Behavior.csv and Locations.csv files without the Behavior 2.csv and Locations 2.csv files (there are 8 files with the "2" suffix of each type).
I want to import all the Behavior and Locations files so that all 36 of each show up. How can I do that?
You can use purrr::map to simplify some of your code:
library("tidyverse")
library("readr")
# Create two small csv files
write_lines("a,b\n1,2\n3,4", "file1.csv")
write_lines("a,c\n5,6\n7,8", "file2.csv")
list.files(pattern = "*.csv") %>%
# `map` will cycle through the files and read each one
map(read_csv) %>%
# and then we can bind them all together
bind_rows()
#> Parsed with column specification:
#> cols(
#> a = col_double(),
#> b = col_double()
#> )
#> Parsed with column specification:
#> cols(
#> a = col_double(),
#> c = col_double()
#> )
#> # A tibble: 4 x 3
#> a b c
#> <dbl> <dbl> <dbl>
#> 1 1 2 NA
#> 2 3 4 NA
#> 3 5 NA 6
#> 4 7 NA 8
Created on 2019-03-28 by the reprex package (v0.2.1)
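Two things might be worth checking here. First, a single regex such as "-Behavior( 2)?\\.csv$" can pick up both spellings at once. Second, the dir() listing shows that each " 2" file shares its number with another file (111868, 111870, 112696), so if Ptt holds that tag number (an assumption), 28 unique Ptt values could be the correct result even when all 36 files were read. The hypothetical read_group() below records the source file name so that files and tags can be counted separately; it uses base read.csv() to stay dependency-free, and the demo writes throwaway CSVs to a temporary folder.

```r
# Hypothetical helper: read every file matching `pattern` and stack them,
# recording which file each row came from
read_group <- function(pattern, path = ".") {
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  tables <- lapply(files, function(f) {
    d <- read.csv(f, stringsAsFactors = FALSE)
    d$source_file <- basename(f)
    d
  })
  do.call(rbind, tables)
}

# Throwaway demo folder: a " 2" file repeats tag 111868
demo <- file.path(tempdir(), "ptt_demo")
dir.create(demo, showWarnings = FALSE)
write.csv(data.frame(Ptt = 111868, x = 1), file.path(demo, "111868-Behavior.csv"), row.names = FALSE)
write.csv(data.frame(Ptt = 111868, x = 2), file.path(demo, "111868-Behavior 2.csv"), row.names = FALSE)
write.csv(data.frame(Ptt = 111869, x = 3), file.path(demo, "111869-Behavior.csv"), row.names = FALSE)
write.csv(data.frame(Ptt = 111869, y = 4), file.path(demo, "111869-Locations.csv"), row.names = FALSE)

b <- read_group("-Behavior( 2)?\\.csv$", demo)
length(unique(b$source_file))  # files read
#> [1] 3
length(unique(b$Ptt))          # distinct tags
#> [1] 2
```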
