I can't seem to access the data in my file? - r

library(tidyverse)
y <- read_tsv("assignment_data.tsv")
x <- 1
When I check the R console, I get the following:
> y <- read_tsv("assignment_data.tsv", header=TRUE)
Error in read_tsv("assignment_data.tsv", header = TRUE) :
unused argument (header = TRUE)
>
> x <- 1
>
However, I can only access x in the global environment and I can't visualize the data in the file I tried to import.

Regarding your error:
Error in read_tsv("assignment_data.tsv", header = TRUE) :
unused argument (header = TRUE)
If you use
?read_tsv
you will find that header is not one of the arguments. Instead, you are looking for col_names.
Edit:
We found out that the problem lay within the TSV file itself: the number of column names did not match the number of columns implied by the data.
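A quick way to check for this kind of mismatch is to count the tab-separated fields on each line. The sketch below uses base R's count.fields() on a small made-up file written to a temporary path; the file contents and the names path and fields are assumptions for illustration, not the asker's data.

```r
# Write a small made-up TSV whose header has 2 names but whose rows have 3 values.
path <- tempfile(fileext = ".tsv")
writeLines(c("a\tb", "1\t2\t3", "4\t5\t6"), path)

# Count the tab-separated fields on every line of the file.
fields <- count.fields(path, sep = "\t")
print(fields)

# If the first entry (the header) differs from the rest, the column names
# do not line up with the data columns.
header_matches <- all(fields[-1] == fields[1])
print(header_matches)
```

If the counts agree, the header and the data have the same number of columns and the file can be read with read_tsv(path, col_names = TRUE).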

Related

Species List download with RGBIF package - Error in `[[<-`(`*tmp*`, taxon, value = occ_data(scientificName = taxon)) : no such index at level 1

I am trying to extract data using the rgbif package for multiple species (once the code works I'll be running a list of about 200 species, so it is important for me to implement a list).
I have tried to adapt the code from the following link:
https://github.com/ropensci/rgbif/issues/377
This is what my input file looks like:
[Image: the csv file, a single column with the header 'taxon']
And my code looks as follows:
library("rgbif")
#input <- read.csv("C:/Users/omi30wk/Desktop/TESTsampledata_udi.csv", header = TRUE, fill = TRUE, sep = ",")
#since you guys don't have my csv file here are three samples species I'm using:
# Acanthorrhynchium papillatum, Acrolejeunea sandvicensis, Acromastigum cavifolium
#'taxon' as header, see image posted above of my csv file for clarity
allpts <- vector('list', length(input))
names(allpts) = input
for (taxon in input){
  cat(taxon, "\n")
  allpts[[taxon]] <- occ_data(scientificName = taxon, limit = 2) #error here
  df <- allpts[[taxon]]$data
  df$networkKeys = NULL
  if (!is.null(df)) {
    df <- df[, !apply(df, 2, function(z)
      is.null(unlist(z)))]
    write.csv(df, paste("/Users/user/Desktop/DATA Bats/allpts_30sept/", gsub(" ", "_", taxon), ".csv", sep = ""))
  }
}
However, I get the following error message at the moment:
Error in `[[<-`(`*tmp*`, taxon, value = list(`Acanthorrhynchium papillatum` = list( :
no such index at level 1
I'm happy to try different code to extract data for multiple species. I've already tried many approaches (e.g. loops) that also kept giving me error messages I haven't been able to solve.
Any help is greatly appreciated!
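One likely cause, judging from the error message, is that input is a data frame, so the loop variable taxon receives the whole 'taxon' column at once rather than one name at a time. The sketch below loops over the column as a character vector instead. So that it runs offline, it uses a hypothetical stand-in function fetch() in place of occ_data(); the data frame is a made-up substitute for the CSV file.

```r
# Made-up stand-in for the CSV file: one column named 'taxon'.
input <- data.frame(
  taxon = c("Acanthorrhynchium papillatum",
            "Acrolejeunea sandvicensis",
            "Acromastigum cavifolium"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the rgbif call so the sketch runs without network access.
# When online, replace the body with: occ_data(scientificName = taxon, limit = 2)
fetch <- function(taxon) {
  list(data = data.frame(name = taxon, n = 1, stringsAsFactors = FALSE))
}

species <- input$taxon                 # a character vector, not the data frame
allpts <- vector("list", length(species))
names(allpts) <- species

for (taxon in species) {               # taxon is now a single string
  cat(taxon, "\n")
  allpts[[taxon]] <- fetch(taxon)
}

print(names(allpts))
```

With taxon holding a single name, allpts[[taxon]] refers to exactly one list element, which is what the [[<- assignment expects.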

Merging Datasets with Different Attributes

I have a group of .xls files containing data for different periods of the year. I would like to merge them so that I have all the data in one file. I tried the following code:
#create files list
setwd("~/2010")
file.list <- list.files( pattern = ".*\\.xls$", full.names = TRUE )
When I continue, I get some warnings but I don't think they are relevant. See below:
#read files
> l <- lapply( file.list, readxl::read_excel )
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, ... :
Expecting numeric in F1944 / R1944C6: got '-'
2: In read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, ... :
Expecting numeric in H1944 / R1944C8: got '-'
Then, I run the following line and the problems with the attributes pop up:
> dt <- data.table::rbindlist( l, use.names = TRUE, fill = TRUE )
Error in data.table::rbindlist(l, use.names = TRUE, fill = TRUE) :
Class attribute on column 15 of item 4 does not match with column 15 of item 1.
Can someone help me to fix this? Many thanks in advance
If you are going to bind together two datasets, the classes of the columns must match. Yours apparently do not. So you somehow need to address these mismatches.
Because you did not supply a col_types argument to readxl::read_excel, it is guessing column types. I assume you expect the columns to be the same class in all of the data frames (otherwise, why bind them?), in which case you could pass a col_types argument so that readxl::read_excel doesn't have to guess.
The warning messages here are useful: I think they are saying that a column was guessed to be numeric but then the parser encountered a "-". Maybe this led to the column being assigned class "character". Perhaps "-" appears in the raw data to indicate a missing value. Then passing na = c("", "-") to readxl::read_excel could resolve the issue.
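Before binding, it can help to see which columns actually disagree. The following is a minimal base R sketch that compares column classes across a list of data frames; the two small data frames are made up to stand in for the imported sheets, and column_classes is a hypothetical helper name.

```r
# Two made-up data frames standing in for the imported sheets (assumption).
l <- list(
  data.frame(id = 1:2, value = c(1.5, 2.5)),
  data.frame(id = 3:4, value = c("-", "3.1"), stringsAsFactors = FALSE)
)

# Hypothetical helper: returns a character matrix with one row per data frame
# and one column per variable, holding the first class of each variable
# (NA where a data frame lacks that variable).
column_classes <- function(dfs) {
  vars <- unique(unlist(lapply(dfs, names)))
  do.call(rbind, lapply(dfs, function(d) {
    sapply(vars, function(v) if (v %in% names(d)) class(d[[v]])[1] else NA_character_)
  }))
}

classes <- column_classes(l)
print(classes)

# Variables whose class differs between data frames.
differs <- apply(classes, 2, function(x) length(unique(na.omit(x))) > 1)
mismatched <- colnames(classes)[differs]
print(mismatched)
```

The columns reported here are the ones to fix at import time, for example through the col_types or na arguments mentioned above.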

Name variable according to Function input in R

I am trying to perform the following:
Analyse_Store <- function("x"){
datapaste"x" <- read.table(file = "x".txt, sep=";", header = TRUE)
}
So basically I am trying to read a table named "x" and assign it to the name datapaste"x". Unfortunately, this does not work as expected and I cannot find any help online.
Thanks in advance
Use assign to read the table into a variable whose name is built from x.
The following function takes an argument x, reads x.txt from the working directory, and assigns it to a variable named datapastex inside the function. Because assign returns the assigned value invisibly, the function also returns the table.
Analyse_Store <- function(x){
  assign(paste("datapaste",x,sep=""), read.table(file = paste(x,".txt",sep=""), sep=";", header = TRUE))
}
Usage would be
datapastex = Analyse_Store("x")
Unless you are further processing the table inside the function, I don't see much use for a function in this case. You could just do
datapastex = read.table(file = "x.txt", sep=";", header = TRUE)
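If several files need the same treatment, an alternative to creating one variable per file is to keep the tables in a named list. This is a minimal sketch under stated assumptions: the two files are made up and written to a temporary directory, and read_store and its dir argument are hypothetical names, not part of the original function.

```r
# Create two small made-up files in a temporary directory.
tmp <- tempfile("stores")
dir.create(tmp)
writeLines(c("id;sales", "1;10", "2;20"), file.path(tmp, "x.txt"))
writeLines(c("id;sales", "3;30"), file.path(tmp, "y.txt"))

# Hypothetical variant of Analyse_Store: returns the table instead of assigning it.
read_store <- function(x, dir = ".") {
  read.table(file = file.path(dir, paste0(x, ".txt")), sep = ";", header = TRUE)
}

# Read every store and name each list element datapaste<x>.
stores <- c("x", "y")
tables <- lapply(stores, read_store, dir = tmp)
names(tables) <- paste0("datapaste", stores)

print(names(tables))
print(tables$datapastex)
```

Each table is then available as tables$datapastex, tables$datapastey, and so on, without relying on assign.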

Undefined Columns Selected v. duplicate 'row.names' are not allowed

Within a for loop, I am trying to run a function on two columns of my data frame, moving to another data set on every iteration of the loop. I would like to collect the result of every iteration into one vector of answers.
I can't get past the following errors (listed below my code), which depend on whether I add or remove row.names = NULL in the data <- read.csv... part of the following code (line 4 of the for-loop):
** Edited to include directory references, where the error ultimately was:
corr <- function(directory, threshold = 0) {
  source("complete.R")
  # Edit: the source() call above, together with my directory organization (not shown), was where my error was
  lookup <- complete("specdata")
  setwd(paste0(getwd(),"/",directory,sep=""))
  files <-list.files(full.names="TRUE") #read file names
  len <- length(files)
  answer2 <- vector("numeric")
  answer <- vector("numeric")
  dataN <- data.frame()
  for (i in 1:len) {
    if (lookup[i,"nobs"] > threshold){
      # TRUE -> read that file, remove the NA data and add to the overall data frame
      data <- read.csv(file = files[i], header = TRUE, sep = ",")
      #remove incomplete
      dataN <- data[complete.cases(data),]
      #If yes, compute the correlation and assign its results to an intermediate vector.
      answer<-cor(dataN[,"sulfate"],dataN[,"nitrate"])
      answer2 <- c(answer2,answer)
    }
  }
  setwd("../")
  return(answer2)
}
1) Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
vs.
2) Error in `[.data.frame`(data, , 2:3) : undefined columns selected
What I've tried
referring to the column names directly "colA"
initializing data and dataN to empty data.frames before the for loop
initializing answer2 to an empty vector
Getting a better understanding of how vectors, matrices and data.frames work with each other
Thank you!
My problem was that the function .R file I was referencing in the code above was in the same directory as the data files I was looping through and analyzing. My "files" vector had an incorrect length, because it also picked up that other .R function file, which I had written and referenced earlier in the function. I believe this .R file is what caused the 'undefined columns' error.
I apologize, I ended up not even posting the part of the code where the problem lay.
Key Takeaway: You can always move between directories within a function! In fact, it may be very necessary if you want to perform a function on all the contents of a directory of interest
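One way to guard against stray files in the data directory is to restrict list.files() to names ending in .csv. The sketch below is only an illustration: the directory, the two CSV files, and helper.R are made up and created in a temporary location.

```r
# Made-up directory holding two data files and one helper script.
data_dir <- tempfile("specdata")
dir.create(data_dir)
writeLines(c("sulfate,nitrate", "1,2", "2,4", "3,7"), file.path(data_dir, "a.csv"))
writeLines(c("sulfate,nitrate", "5,1", "6,NA", "7,3", "9,2"), file.path(data_dir, "b.csv"))
writeLines("helper <- function() NULL", file.path(data_dir, "helper.R"))

# Without a pattern, the helper script is listed alongside the data files.
all_files <- list.files(data_dir)

# The regular expression keeps only names ending in ".csv".
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

print(all_files)
print(basename(csv_files))
```

Passing the directory to list.files() with full.names = TRUE also means the paths can be read directly, without changing the working directory.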
One approach:
# get the list of file names
files <- list.files(path='~',pattern='\\.csv$',full.names = TRUE)
# load all files
list.data <- lapply(files,read.csv, header = TRUE, sep = ",", row.names = NULL)
# remove rows with NAs
complete.data <- lapply(list.data,function(d) d[complete.cases(d),])
# compute correlation of the 2nd and 3rd columns in every data set
answer <- sapply(complete.data,function(d) cor(d[,2],d[,3]))
The same idea, but a slightly different implementation:
cr <- function(fname) {
  d <- read.csv(fname, header = TRUE, sep = ",", row.names = NULL)
  dc <- d[complete.cases(d),]
  cor(dc[,2],dc[,3])
}
answer2 <- sapply(files,cr)
example of CSV files:
# ==> a.csv <==
# a,b,c,d
# 1,2,3,4
# 11,12,13,14
# 11,NA,13,14
# 11,12,13,14
#
# ==> b.csv <==
# A,B,C,D
# 101,102,103,104
# 101,102,103,104
# 11,12,13,14

write multiple custom files with d_ply

This question is almost the same as a previous question, but differs enough that the answers to that question don't work here. Like @chase in the last question, I want to write out multiple files, one for each split of a dataframe, in the following format (custom FASTA).
#same df as last question
df <- data.frame(
var1 = sample(1:10, 6, replace = TRUE)
, var2 = sample(LETTERS[1:2], 6, replace = TRUE)
, theday = c(1,1,2,2,3,3)
)
#how I want the data to look
write(paste(">", df$var1,"_", df$var2, "\n", df$theday, sep=""), file="test.txt")
#whole df output looks like this:
#test.txt
>1_A
1
>8_A
1
>4_A
2
>9_A
2
>2_A
3
>1_A
3
However, instead of getting the output from the entire dataframe I want to generate individual files for each subset of data. Using d_ply as follows:
d_ply(df, .(theday), function(x) write(paste(">", df$var1,"_", df$var2, "\n", df$theday, sep=""), file=paste(x$theday,".fasta",sep="")))
I get the following output error:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning messages:
1: In if (file == "") file <- stdout() else if (substring(file, 1L, :
the condition has length > 1 and only the first element will be used
2: In if (substring(file, 1L, 1L) == "|") { :
the condition has length > 1 and only the first element will be used
Any suggestions on how to get around this?
Thanks,
zachcp
There were two problems with your code.
First, in constructing the file name, you passed the vector x$theday to paste(). Since x$theday is taken from a column of a data.frame, it often has more than one element. The error you saw was write() complaining when you passed several file names to its file= argument. Using unique(x$theday) instead ensures that you only ever paste together a single file name.
Second, you didn't get far enough to see it, but you probably want to write the contents of x (the current subset of the data.frame), rather than the entire contents of df to each file.
Here is the corrected code, which appears to work just fine.
d_ply(df, .(theday), function(x) {
  write(paste(">", x$var1,"_", x$var2, "\n", x$theday, sep=""),
        file=paste(unique(x$theday),".fasta",sep=""))
})
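For comparison, the same two fixes can be sketched without plyr, using base R's split() in place of d_ply. This is a swapped-in technique, not the approach from the answer above; the data values are fixed here (rather than sampled) and the files go to a temporary directory, both assumptions made so the result is reproducible.

```r
# Fixed example data in the same shape as the question's df (values are made up).
df <- data.frame(
  var1 = c(1, 8, 4, 9, 2, 1),
  var2 = c("A", "A", "B", "A", "B", "A"),
  theday = c(1, 1, 2, 2, 3, 3)
)

# Write into a temporary directory instead of the working directory.
out_dir <- tempfile("fasta")
dir.create(out_dir)

# split() returns one data frame per value of theday; write each to its own file.
for (x in split(df, df$theday)) {
  path <- file.path(out_dir, paste0(unique(x$theday), ".fasta"))
  writeLines(paste0(">", x$var1, "_", x$var2, "\n", x$theday), path)
}

print(list.files(out_dir))
print(readLines(file.path(out_dir, "1.fasta")))
```

As in the corrected d_ply call, each file name comes from unique(x$theday) and each file contains only the rows of its own subset x.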
