Here is my question:
I am trying my hand at datapasta(). I was able to do everything required but my finish result is not good.
How do I make my clipboard accessible so that I can copy what I need right into the clipboard instead of having to paste my copied text into the text box and then press saves to import my data? I believe this is the problem that is stopping me from properly viewing my copied data.frame correctly when I run head()
Here are the steps I followed via code chunk
install.packages("datapasta")
test<-data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
`Pos Team P W D L GD Pts` = c("1\tChelsea\t7\t5\t1\t1\t12\t16",
"2\tLiverpool\t7\t4\t3\t0\t15","3\tManchester City\t7\t4\t3\t0\t11\t14",
"4\tManchester United\t7\t4\t2\t1\t8\t14",
"5\tEverton\t7\t4\t2\t1\t5\t14",
"6\tBrighton\t7\t4\t2\t1\t3\t14","7\tBrentford\t7\t3\t3\t1\t4\t12",
"8\tTottenham\t7\t4\t0\t3\t12",
"9\tWest Ham\t7\t3\t2\t2\t4\t11",
"10\tAston Villa\t7\t3\t1\t3\t10","11\tArsenal\t7\t3\t1\t3\t10",
"12\tWolves\t7\t3\t0\t4\t9",
"13\tLeicester City\t7\t2\t2\t3\t8","14\tCrystal Palace\t7\t1\t4\t2\t7",
"15\tWatford\t7\t2\t1\t4\t7",
"16\tLeeds United\t7\t1\t3\t3\t6","17\tSouthampton\t7\t0\t4\t3\t4",
"18\tBurnley\t7\t0\t3\t4\t3",
"19\tNewcastle\t7\t0\t3\t4\t3","20\tNorwich City\t7\t0\t1\t6\t1")
)
head(test)
Here is my result which is not what I wanted
Pos\tTeam\tP\tW\tD\tL\tGD\tPts
1 1\tChelsea\t7\t5\t1\t1\t12\t16
2 2\tLiverpool\t7\t4\t3\t0\t15
3 3\tManchester City\t7\t4\t3\t0\t11\t14
4 4\tManchester United\t7\t4\t2\t1\t8\t14
5 5\tEverton\t7\t4\t2\t1\t5\t14
6 6\tBrighton\t7\t4\t2\t1\t3\t14
>
This is what pops up on my screen
Any help or suggestion will be greatly appreciated.
Related
I run the "R code" in "RKWard" to read a CSV file:
# I) Go to the working directory
setwd("/home/***/Desktop/***")
# II) Verify the current working directory
print(getwd())
# III) Load te nedded package
require("csv")
# IV) Read the desired file
read.csv(file="CSV_Example.csv", header=TRUE, sep=";")
The data in CSV file is as follow (an example token from this website):
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
But the result is as follow:
id.name.salary.start_date.dept
1 1,Rick,623.3,2012-01-01,IT
2 2,Dan,515.2,2013-09-23,Operations
3 3,Michelle,611,2014-11-15,IT
4 4,Ryan,729,2014-05-11,HR
5 5,Gary,843.25,2015-03-27,Finance
6 6,Nina,578,2013-05-21,IT
7 7,Simon,632.8,2013-07-30,Operations
8 8,Guru,722.5,2014-06-17,Finance
PROBLEM: The datas are not aligned as supposed to be.
Please can anyone help me
Just discovered the googlesheets package and find it very helpful thus far. I would now like to be able to replace all or a subset of the contents in an existing sheet.
Example:
> library(googlesheets)
> set.seed(10)
> test1 <- data.frame(matrix(rnorm(10),nrow = 5))
> test1
X1 X2
1 0.01874617 0.3897943
2 -0.18425254 -1.2080762
3 -1.37133055 -0.3636760
4 -0.59916772 -1.6266727
5 0.29454513 -0.2564784
> gs_new("foo_sheet", input = test1, trim = TRUE)
This creates a new sheet as expected. Let's say that we then need to update the sheet (this data is used for a shinyapps.io hosted shiny app, and I would prefer to not have to redeploy the app in order to change sheet references).
> test1$X2 <- NULL
> test1
X1
1 0.01874617
2 -0.18425254
3 -1.37133055
4 -0.59916772
5 0.29454513
I tried to simply overwrite with gs_new() but run into the following warning message:
> gs_new("foo_sheet", input = test1, trim = TRUE)
Warning message:
At least one sheet matching "foo_sheet" already exists, so you may
need to identify by key, not title, in future.
This results in a new sheet foo_sheet being created with a new key, but does not replace the existing sheet and will therefore produce a key error if we try to register the updated sheet with
gs_title("foo_sheet")
Error in gs_lookup(., "sheet_title", verbose) :
"foo_sheet" matches sheet_title for multiple sheets returned by gs_ls() (which should reflect user's Google Sheets home screen). Suggest you identify this sheet by unique key instead.
This means that if we later try to access the new sheet foo_sheet with gs_read("foo_sheet"), the API will return the original sheet, rather than the new one.
> df <- gs_read("foo_sheet")
> df
X1 X2
1 0.01874617 0.3897943
2 -0.18425254 -1.2080762
3 -1.37133055 -0.3636760
4 -0.59916772 -1.6266727
5 0.29454513 -0.2564784
It is my understanding that one possible solution could be to first delete the sheet with gs_delete("test1") and then create a new one. Alternatively one could perhaps empty cells with gs_edit_cells(), but was hoping for some form of overwrite function.
Thanks in advance!
I find that the edit cells function is a good workaround:
gs_edit_cells(ss = "foo_sheet", ws = "worksheet name", input = test1, anchor = "A1" trim = TRUE, col_names = TRUE)
By anchoring the data to the upper left corner, you can effectively overwrite all other data. The trim function will eliminate all cells that are not to be updated.
Data I have to process has unquoted text with some additional \r character. Files are big (500MB), copious (>600), and changing the export is not an option. Data might look like
A,B,C
blah,a,1
bloo,a\r,b
blee,c,d
How can this be handled with data.table's fread?
Is there a better R read CSV function for this, that's similarly performant?
Repro
library(data.table)
csv<-"A,B,C\r\n
blah,a,1\r\n
bloo,a\r,b\r\n
blee,c,d\r\n"
fread(csv)
Error in fread(csv) :
Expected sep (',') but new line, EOF (or other non printing character) ends field 1 when detecting types from point 0:
bloo,a
Advanced repro
The simple repro might be too trivial to give a sense of scale...
samplerecs<-c("blah,a,1","bloo,a\r,b","blee,c,d")
randomcsv<-paste0(c("A,B,C",rep(samplerecs,2000000)))
write(randomcsv,file = "sample.csv")
# Naive approach
fread("sample.csv")
# Akrun's approach with needing text read first
fread(gsub("\r\n|\r", "", paste0(randomcsv,collapse="\r\n")))
#>Error in file.info(input) : file name conversion problem -- name too long?
# Julia's approach with needing text read first
readr::read_csv(gsub("\r\n|\r", "", paste0(randomcsv,collapse="\r\n")))
#> Error: C stack usage 48029706 is too close to the limit
Further to #dirk-eddelbuettel & #nrussell's suggestions, a way of solving this is to is to pre-process the file. The processor could also be called within fread() but here it is performed in seperate steps:
samplerecs<-c("blah,a,1","bloo,a\r,b","blee,c,d")
randomcsv<-paste0(c("A,B,C",rep(samplerecs,2000000)))
write(randomcsv,file = "sample.csv")
# Remove errant `\r`'s with tr - shown here is the Windows R solution
shell("C:/Rtools/bin/tr.exe -d '\\r' < sample.csv > sampleNEW.csv")
fread("sampleNEW.csv")
We can try with gsub
fread(gsub("\r\n|\r", "", csv))
# A B C
#1: blah a 1
#2: bloo a b
#3: blee c d
You can also do this with tidyverse packages, if you'd like.
> library(readr)
> library(stringr)
> read_csv(str_replace_all(csv, "\r", ""))
# A tibble: 3 × 3
A B C
<chr> <chr> <chr>
1 blah a 1
2 bloo a b
3 blee c d
If you do want to do it purely in R, you could try working with connections. As long as a connection is kept open, it will start reading/writing from its previous position. Of course, this means the burden of opening and closing connections falls on you.
In the following code, the file is processed by chunks:
library(data.table)
input_csv <- "sample.csv"
in_conn <- file(input_csv)
output_csv <- "out.csv"
out_conn <- file(output_csv, "w+")
open(in_conn)
chunk_size <- 1E6
return_pattern <- "(?<=^|,|\n)([^,]*(?<!\n)\r(?!\n)[^,]*)(?=,|\n|$)"
buffer <- ""
repeat {
new_chars <- readChar(in_conn, chunk_size)
buffer <- paste0(buffer, new_chars)
while (grepl("[\r\n]$", buffer, perl = TRUE)) {
next_char <- readChar(in_conn, 1)
buffer <- paste0(buffer, next_char)
if (!length(next_char))
break
}
chunk <- gsub("(.*)[,\n][^,\n]*$", "\\1", buffer, perl = TRUE)
buffer <- substr(buffer, nchar(chunk) + 1, nchar(buffer))
cleaned <- gsub(return_pattern, '"\\1"', chunk, perl = TRUE)
writeChar(cleaned, out_conn, eos = NULL)
if (!length(new_chars))
break
}
writeChar('\n', out_conn, eos = NULL)
close(in_conn)
close(out_conn)
result <- fread(output_csv)
Process:
If a chunk ends with a \r or \n, another character is added until it doesn't.
Quotes are put around values containing a \r which isn't adjacent to a
\n.
The cleaned chunk is added to the end of another file.
Rinse and repeat.
This code simplifies the problem by assuming no quoting is done for any field in sample.csv. It's not especially fast, but not terribly slow. Larger values for chunk_size should reduce the amount of time spent in I/O operations. If used for anything beyond this toy example, I'd strongly suggesting wrapping it in a tryCatch(...) call to make sure the files are closed afterwards.
I want to put some R code plus the associated data file (RData) on Github.
So far, everything works okay. But when people clone the repository, I want them to be able to run the code immediately. At the moment, this isn't possible because they will have to change their work directory (setwd) to directory that the RData file was cloned (i.e. downloaded) to.
Therefore, I thought it might be easier, if I changed the R code such that it linked to the RData file on github. But I cannot get this to work using the following snippet. I think perhaps there is some issue text / binary issue.
x <- RCurl::getURL("https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData")
y <- load(x)
Any help would be appreciated.
Thanks
This works for me:
githubURL <- "https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData"
load(url(githubURL))
head(df)
# X Y Z
# 1 16602794 -4183983 94.92019
# 2 16602814 -4183983 91.15794
# 3 16602834 -4183983 87.44995
# 4 16602854 -4183983 83.79617
# 5 16602874 -4183983 80.19643
# 6 16602894 -4183983 76.65052
EDIT Response to OP comment.
From the documentation:
Note that the https:// URL scheme is not supported except on Windows.
So you could try this:
download.file(githubURL,"myfile")
load("myfile")
which works for me as well, but this will clutter your working directory. If that doesn't work, try setting method="curl" in the call to download.file(...).
I've had trouble with this before as well, and the solution I've found to be the most reliable is to use a tiny modification of source_url from the fantastic [devtools][1] package. This works for me (on a Mac).
load_url <- function (url, ..., sha1 = NULL) {
# based very closely on code for devtools::source_url
stopifnot(is.character(url), length(url) == 1)
temp_file <- tempfile()
on.exit(unlink(temp_file))
request <- httr::GET(url)
httr::stop_for_status(request)
writeBin(httr::content(request, type = "raw"), temp_file)
file_sha1 <- digest::digest(file = temp_file, algo = "sha1")
if (is.null(sha1)) {
message("SHA-1 hash of file is ", file_sha1)
}
else {
if (nchar(sha1) < 6) {
stop("Supplied SHA-1 hash is too short (must be at least 6 characters)")
}
file_sha1 <- substr(file_sha1, 1, nchar(sha1))
if (!identical(file_sha1, sha1)) {
stop("SHA-1 hash of downloaded file (", file_sha1,
")\n does not match expected value (", sha1,
")", call. = FALSE)
}
}
load(temp_file, envir = .GlobalEnv)
}
I use a very similar modification to get text files from github using read.table, etc. Note that you need to use the "raw" version of the github URL (which you included in your question).
[1] https://github.com/hadley/devtoolspackage
load takes a filename.
x <- RCurl::getURL("https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData")
writeLines(x, tmp <- tempfile())
y <- load(tmp)
I am using the rDrop package that is available from https://github.com/karthikram/rDrop, and after a bit of tweaking (as all the functions don't quite work as you would always expect them to) I have got it to work finally in the way I would like, but it still requires authorisation verification to allow the use of the app, once you get the token each time, as I think that tokens expire over time...(if this is not the case and I can hard code in my token please tell me as that would be a good solution too...)
Basically I wanted a near seamless way of downloading csv files from my dropbox folders from the commandline in R in one line of code so that I dont need to click on the allow button after the token request.
Is this possible?
Here is the code I used to wrap up a dropbox csv download.
db.csv.download <- function(dropbox.path, ...){
cKey <- getOption('DropboxKey')
cSecret <- getOption('DropboxSecret')
reqURL <- "https://api.dropbox.com/1/oauth/request_token"
authURL <- "https://www.dropbox.com/1/oauth/authorize"
accessURL <- "https://api.dropbox.com/1/oauth/access_token/"
require(devtools)
install_github("ROAuth", "ropensci")
install_github("rDrop", "karthikram")
require(rDrop)
dropbox_oa <- oauth(cKey, cSecret, reqURL, authURL, accessURL, obj = new("DropboxCredentials"))
cred <- handshake(dropbox_oa, post = TRUE)
raw.data <- dropbox_get(cred,dropbox.path)
data <- read.csv(textConnection(raw.data), ...)
data
}
Oh and if its not obvious I have put my dropbox key and secret in my .Rprofile file, which is what the getOption part is referring to.
Thanks in advance for any help that is provided. (For bonus points...if anybody knows how to get rid of all the loading messages even for the install that would be great...)
library(rDrop)
# my keys are in my .rprofile, otherwise specifiy inline
db_token <- dropbox_auth()
# Hit ok to authorize once through the browser and hit enter back at the R prompt.
save(db_token, file="my_dropbox_token.rdata")
Dropbox token are non-expiring and can be revoked anytime from the Dropbox web panel.
For future use:
library(rDrop)
load('~/Desktop/my_dropbox_token.rdata')
df <- data.frame(x=1:10, y=rnorm(10))
> df
x y
1 1 -0.6135835
2 2 0.3624928
3 3 0.5138807
4 4 -0.2824156
5 5 0.9230591
6 6 0.6759700
7 7 -1.9744624
8 8 -1.2061920
9 9 0.9481213
10 10 -0.5997218
dropbox_save(db_token, list(df), file="foo", ext=".rda")
rm(df)
df2 <- db.read.csv(db_token, file='foo.rda')
> df2
x y
1 1 -0.6135835
2 2 0.3624928
3 3 0.5138807
4 4 -0.2824156
5 5 0.9230591
6 6 0.6759700
7 7 -1.9744624
8 8 -1.2061920
9 9 0.9481213
10 10 -0.5997218
If you have additional problems, please file an issue.