I'm trying to read my clipboard in R as a vector. I have a large list of numbers that I need in vector format. I tried copying and pasting the values directly, but R stops after 4000 numbers.
# 1,2,3,4,5,6,7,8 <--example of what's on clipboard
vector<-c(1,2,3,4,5,6,7,8)
vector[5]
#[1] 5
Why can't I apply the same thing to the readClipboard() function?
vector<-c(readClipboard())
vector
vector[5]
#[1]"1,2,3,4,5,6,7,8"
#[1]NA
Is there any way to get rid of the quotes and use these values?
Use the clipr package.
> # clipboard: 1,2,3,4,5,6,7,8
> library(clipr)
> read_clip_tbl(sep=",", header=FALSE)
V1 V2 V3 V4 V5 V6 V7 V8
1 1 2 3 4 5 6 7 8
This output is a dataframe, but now you just have to take its first row:
tbl <- read_clip_tbl(sep=",", header=FALSE)
unlist(tbl[1,])
This gives a named vector:
V1 V2 V3 V4 V5 V6 V7 V8
1 2 3 4 5 6 7 8
If you don't want the names:
> unname(unlist(tbl[1,]))
[1] 1 2 3 4 5 6 7 8
EDIT
In fact you don't need clipr. You can do:
> read.table(file="clipboard", sep=",")
V1 V2 V3 V4 V5 V6 V7 V8
1 1 2 3 4 5 6 7 8
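If the goal is a plain numeric vector rather than a data frame, scan() may be a shorter route. This is only a sketch: the text= string below is a stand-in for the clipboard contents so that it runs anywhere. On Windows I would expect scan("clipboard", sep = ",") to behave the same way, but treat that as an assumption to check on your own machine.

```r
# Stand-in for the clipboard contents (assumption: one comma-separated line)
clip_text <- "1,2,3,4,5,6,7,8"

# scan() parses the text into a numeric vector, not a data frame
v <- scan(text = clip_text, sep = ",", quiet = TRUE)

v[5]
```

Since v is an ordinary numeric vector, indexing such as v[5] works the same way as with c(1,2,3,4,5,6,7,8).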
My use case is generally working between Excel and R, in which case @StéphaneLaurent's answer works. (In fact, I tend to use read.table(file='clipboard', header=FALSE) (or header=TRUE) instead of readClipboard(), as it has worked well for me in almost all situations.)
If your string is non-tabular, though, as in a literal comma-delimited string, you can split it using:
# clipboard: 1,2,3,4,5,6,7,8
s <- strsplit(readClipboard(), ",")
str(s)
# List of 1
# $ : chr [1:8] "1" "2" "3" "4" ...
There are several data-massaging steps you may want or need, depending on your use, such as as.integer(s[[1]]), as.numeric(s[[1]]), or trimws(s[[1]]) (not useful here).
Notice that I have not yet un-listed it. In the event that you have more than one line copied, such as a clipboard with:
1,2,3,4,5
11,12,13,14
then
s <- strsplit(readClipboard(), ",")
str(s)
# List of 2
# $ : chr [1:5] "1" "2" "3" "4" ...
# $ : chr [1:4] "11" "12" "13" "14"
str(lapply(s, as.integer))
# List of 2
# $ : int [1:5] 1 2 3 4 5
# $ : int [1:4] 11 12 13 14
You can either refer to each line individually, e.g. s[[1]] for 1-5, or (as suggested in other answers) use unlist(s) to combine them:
unlist(lapply(s, as.integer))
# [1] 1 2 3 4 5 11 12 13 14
as.integer(unlist(s))
# [1] 1 2 3 4 5 11 12 13 14
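For trying this out without a clipboard, here is a small self-contained sketch. The character vector lines is a hypothetical stand-in for what readClipboard() would return (one element per copied line).

```r
# Hypothetical stand-in for readClipboard(): one string per copied line
lines <- c("1,2,3,4,5", "11,12,13,14")

# Split each line on literal commas, then convert each piece to numeric
parsed <- lapply(strsplit(lines, ",", fixed = TRUE), as.numeric)

# Lines may have different lengths, so the result stays a list
lengths(parsed)

# Flatten into one vector when the line boundaries do not matter
flat <- unlist(parsed)
flat
```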
If the number of entries per line is always the same (i.e., CSV), read.csv can parse it directly. A single row such as
1,2,3,4,5
is read with
read.csv(file='clipboard', header=FALSE)
# V1 V2 V3 V4 V5
# 1 1 2 3 4 5
and multiple lines such as
1,2,3,4,5
11,12,13,14,15
are read into
read.csv(file='clipboard', header=FALSE)
# V1 V2 V3 V4 V5
# 1 1 2 3 4 5
# 2 11 12 13 14 15
etc. From here, it can be unlisted, as.matrixed, or whatever you want, though unlist will do it by-column instead of by-row.
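To make the by-column versus by-row point concrete, here is a minimal sketch. It uses read.csv(text = ...) instead of the clipboard so it runs anywhere; the variable names are just for illustration.

```r
# Same two-line CSV as above, supplied as text instead of the clipboard
df <- read.csv(text = "1,2,3,4,5\n11,12,13,14,15", header = FALSE)

# unlist() walks the data frame column by column
by_col <- unlist(df, use.names = FALSE)
by_col
# [1]  1 11  2 12  3 13  4 14  5 15

# Transposing first gives the values in row (reading) order
by_row <- as.vector(t(as.matrix(df)))
by_row
# [1]  1  2  3  4  5 11 12 13 14 15
```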
I've got a data.frame which contains a character variable and multiple numeric variables, something like this:
sampleDF <- data.frame(a = c(1,2,3,"String"), b = c(1,2,3,4), c= c(5,6,7,8), stringsAsFactors = FALSE)
Which looks like this:
a b c
1 1 1 5
2 2 2 6
3 3 3 7
4 String 4 8
I'd like to transpose this data.frame and get it to look like this:
V1 V2 V3 V4
1 1 2 3 String
2 1 2 3 4
3 5 6 7 8
I tried
c<-t(sampleDF)
as well as
d<-transpose(sampleDF)
but both of these methods result in V1, V2 and V3 being of character type despite only having numeric values.
I know that this has already been asked multiple times. However, I haven't found a suitable answer for why, in this case, V1, V2 and V3 are also being converted to character.
Is there any way to ensure that these columns stay numeric?
Thanks a lot, and apologies in advance for the duplicate nature of this question.
EDIT:
as.data.frame(t(sampleDF))
Does not solve the problem:
'data.frame': 3 obs. of 4 variables:
$ V1: Factor w/ 2 levels "1","5": 1 1 2
..- attr(*, "names")= chr "a" "b" "c"
$ V2: Factor w/ 2 levels "2","6": 1 1 2
..- attr(*, "names")= chr "a" "b" "c"
$ V3: Factor w/ 2 levels "3","7": 1 1 2
..- attr(*, "names")= chr "a" "b" "c"
$ V4: Factor w/ 3 levels "4","8","String": 3 1 2
..- attr(*, "names")= chr "a" "b" "c"
After transposing it, convert the columns to numeric with type.convert
out <- as.data.frame(t(sampleDF), stringsAsFactors = FALSE)
out[] <- lapply(out, type.convert, as.is = TRUE)
row.names(out) <- NULL
out
# V1 V2 V3 V4
#1 1 2 3 String
#2 1 2 3 4
#3 5 6 7 8
str(out)
#'data.frame': 3 obs. of 4 variables:
# $ V1: int 1 1 5
# $ V2: int 2 2 6
# $ V3: int 3 3 7
# $ V4: chr "String" "4" "8"
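As for why the numeric values turn into character in the first place: t() has to go through a matrix, and a matrix can hold only one type, so a single string forces every cell to character. A short sketch of that, followed by the type.convert step from above:

```r
sampleDF <- data.frame(a = c(1, 2, 3, "String"), b = c(1, 2, 3, 4),
                       c = c(5, 6, 7, 8), stringsAsFactors = FALSE)

# t() coerces the data frame to a matrix, which has a single type
m <- t(sampleDF)
typeof(m)
# [1] "character"

# Back to a data frame, then let type.convert guess each column's type
out <- as.data.frame(m, stringsAsFactors = FALSE)
out[] <- lapply(out, type.convert, as.is = TRUE)
sapply(out, class)
```

After the conversion, only the column that actually contains "String" stays character.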
Or rbind the first column converted to respective 'types' with the transposed other columns
rbind(lapply(sampleDF[,1], type.convert, as.is = TRUE),
as.data.frame(t(sampleDF[2:3])))
NOTE: The first method would be more efficient
Or another approach would be to paste the values together in each column and then read it again
read.table(text=paste(sapply(sampleDF, paste, collapse=" "),
collapse="\n"), header = FALSE, stringsAsFactors = FALSE)
# V1 V2 V3 V4
#1 1 2 3 String
#2 1 2 3 4
#3 5 6 7 8
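Or we can convert the 'data.frame' to a matrix with data.matrix, which changes the character elements to NA, then use is.na to find the index of the elements that are NA and replace them with the original string values. (This relies on the behavior of data.matrix before R 4.0.0; newer versions convert character columns to factors and then to integer codes, so no NA is produced.)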
m1 <- data.matrix(sampleDF)
out <- as.data.frame(t(m1))
out[is.na(out)] <- sampleDF[is.na(m1)]
Or another option is type_convert from readr
library(dplyr)
library(readr)
sampleDF %>%
t %>%
as_data_frame %>%
type_convert
# A tibble: 3 x 4
# V1 V2 V3 V4
# <int> <int> <int> <chr>
#1 1 2 3 String
#2 1 2 3 4
#3 5 6 7 8
I am trying to sort a data frame by a column of numbers and I get an alphanumeric sorting of the digits instead. If the data frame is converted to a matrix, the sorting works.
df[order(as.numeric(df[,2])),]
V1 V2
1 a 1
3 c 10
2 b 2
4 d 3
> m <- as.matrix(df)
> m[order(as.numeric(m[,2])),]
V1 V2
[1,] "a" "1"
[2,] "b" "2"
[3,] "d" "3"
[4,] "c" "10"
V1 <- letters[1:4]
V2 <- as.character(c(1,10,2,3))
df <- data.frame(V1,V2, stringsAsFactors=FALSE)
df[order(as.numeric(df[,2])),]
gives
V1 V2
1 a 1
3 c 2
4 d 3
2 b 10
But
V1 <- letters[1:4]
V2 <- as.character(c(1,10,2,3))
df <- data.frame(V1,V2)
df[order(as.numeric(df[,2])),]
gives
V1 V2
1 a 1
2 b 10
3 c 2
4 d 3
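which is due to factors: data.frame() converted V2 to a factor (before R 4.0.0, stringsAsFactors defaulted to TRUE), and as.numeric() on a factor returns the internal level codes rather than the displayed values.
Thanks to the commenters akrun and Imo. Inspect each of the two data frames with str(df).
There is also more detail in the help page for factor(); scroll down to the 'Warning' section for details of the issue at hand.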
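A minimal sketch of what goes wrong, using the same four values. The conversions shown are the ones described in the 'Warning' section of ?factor.

```r
# The same values as V2, stored as a factor
f <- factor(c("1", "10", "2", "3"))

# as.numeric() on a factor returns the level codes, not the labels
as.numeric(f)
# [1] 1 2 3 4

# Go through character (or through the levels) to recover the numbers
as.numeric(as.character(f))
# [1]  1 10  2  3
as.numeric(levels(f))[f]
# [1]  1 10  2  3

# Ordering on the recovered numbers sorts numerically
order(as.numeric(as.character(f)))
# [1] 1 3 4 2
```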
Could you be a little more specific about what your initial data frame looks like?
Because by running this code:
df<-data.frame(c("a","b","c","d"),c(1,2,10,3))
colnames(df)<-c("V1","V2")
#print(df)
df.order<-df[order(as.numeric(df[,2])),]
print(df.order)
I get the right answer:
V1 V2
1 a 1
2 b 2
4 d 3
3 c 10
Edit:
The column values might be being treated as factors.
Try forcing to character and then integer.
Example copy and pasted from console:
> Foo <- data.frame('ABC' = c('a','b','c','d'),'123' = c('1','2','10','3'))
> Foo[order(as.integer(as.character(Foo[,2]))),]
ABC X123
1 a 1
2 b 2
4 d 3
3 c 10
I have some data in a csv file, which includes row names. I want to take a single column of the data, while retaining the row names. The csv file was produced in the following manner:
MAT <- matrix(nrow=5, ncol=2, c(1:10))
rownames(MAT) <- c("First","Second","Third","Fourth","Fifth")
write.csv(MAT, file='~/test.csv', row.names=TRUE)
The matrix MAT is given below. Ultimately I want the first column of this matrix (after loading the csv file), with the row names intact.
[,1] [,2]
First 1 6
Second 2 7
Third 3 8
Fourth 4 9
Fifth 5 10
If I now read the csv file,
MAT2 <- read.csv(file='~/test.csv')
MAT2 is given by
X V1 V2
1 First 1 6
2 Second 2 7
3 Third 3 8
4 Fourth 4 9
5 Fifth 5 10
The read.csv command seems to have created another column. In any case, if I do MAT3 <- MAT2[,2], I do not get a matrix like the one above, and as.matrix(MAT2[,2]) does not retain the row names as I want.
Any ideas of how to proceed?
Perhaps a better starting point is:
read.csv(file='~/test.csv', row.names = 1)
V1 V2
First 1 6
Second 2 7
Third 3 8
Fourth 4 9
Fifth 5 10
You can also wrap this in as.matrix:
as.matrix(read.csv(file='~/test.csv', row.names = 1))
Compare their structures:
> str(read.csv(file='~/test.csv', row.names = 1))
'data.frame': 5 obs. of 2 variables:
$ V1: int 1 2 3 4 5
$ V2: int 6 7 8 9 10
> str(as.matrix(read.csv(file='~/test.csv', row.names = 1)))
int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
- attr(*, "dimnames")=List of 2
..$ : chr [1:5] "First" "Second" "Third" "Fourth" ...
..$ : chr [1:2] "V1" "V2"
If all you are actually concerned about is how to extract a column while retaining the original structure, perhaps drop = FALSE is what you're after:
MAT2 <- as.matrix(read.csv(file='~/test.csv', row.names = 1))
# V1 V2
# First 1 6
# Second 2 7
# Third 3 8
# Fourth 4 9
# Fifth 5 10
MAT2[, 2]
# First Second Third Fourth Fifth
# 6 7 8 9 10
MAT2[, 2, drop = FALSE]
# V2
# First 6
# Second 7
# Third 8
# Fourth 9
# Fifth 10
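Putting the pieces together for the first column, here is an end-to-end sketch. It writes to a tempfile() instead of ~/test.csv purely so it can be run without touching your home directory; that substitution is mine.

```r
MAT <- matrix(1:10, nrow = 5, ncol = 2)
rownames(MAT) <- c("First", "Second", "Third", "Fourth", "Fifth")

# Temporary file used here in place of '~/test.csv'
tmp <- tempfile(fileext = ".csv")
write.csv(MAT, file = tmp, row.names = TRUE)

# row.names = 1 turns the first CSV column back into row names
MAT2 <- as.matrix(read.csv(file = tmp, row.names = 1))

# drop = FALSE keeps the result a one-column matrix with its row names
first_col <- MAT2[, 1, drop = FALSE]
first_col
```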
I have a list of elements, like this
[[1]]
[1] 9.623571 5.334566 7.266597 6.510794 4.301958
[[2]]
[1] 9.693326 9.015892 1.266178 8.547392 4.326199
and I would like to transform it to dataframe like this:
V1 V2 V3 V4 V5
9.623571 5.334566 7.266597 6.510794 4.301958
9.693326 9.015892 1.266178 8.547392 4.326199
Therefore, all elements in the same position within the entries of the list, merged in the same column.
It has to be something related to "rbind", because the result is like doing:
rbind(list[[1]],list[[2]]...)
but I don't know how to apply it for all the entries of the list. Any light on this would be appreciated.
Thank you very much in advance.
Tina.
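Since no one has mentioned this yet:
library(data.table)
# rbindlist expects each element to be a data.frame, data.table, or list,
# so plain vectors are converted with as.list first
rbindlist(lapply(yourList, as.list))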
Very fast.
Try this:
as.data.frame(do.call(rbind, l))
where, l <- list(1:5, 6:10)
# V1 V2 V3 V4 V5
# 1 1 2 3 4 5
# 2 6 7 8 9 10
I know this is using an extra package, but I usually have plyr and reshape2 in my path.
library(plyr)
x <- list( sample(1:10,10) , sample(1:10,10))
ldply(x) # list 2 data.frame
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 3 8 6 2 10 9 7 5 1 4
2 8 10 3 4 2 5 1 7 6 9
I find it particularly useful when the list has names:
names(x) <- c("hello","goodbye")
ldply(x)
.id V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 hello 3 8 6 2 10 9 7 5 1 4
2 goodbye 8 10 3 4 2 5 1 7 6 9
Using reshape2 you can also quickly reshape the data.frame into something that will plot quickly with ggplot2
library(reshape2)
melt(ldply(x),c(".id"))
.id variable value
1 hello V1 3
2 goodbye V1 8
3 hello V2 8
4 goodbye V2 10
5 hello V3 6
6 goodbye V3 3
7 hello V4 2
8 goodbye V4 4
9 hello V5 10
10 goodbye V5 2
11 hello V6 9
12 goodbye V6 5
13 hello V7 7
14 goodbye V7 1
15 hello V8 5
16 goodbye V8 7
17 hello V9 1
18 goodbye V9 6
19 hello V10 4
20 goodbye V10 9
None of these solutions are particularly fast for very large datasets (untested assumption), but they are really useful when working with smallish datasets (< 10k rows).
Use sapply to get what you are after. As an example:
x <- list( 1:10 , letters[1:10])
as.data.frame( t( sapply(x , rbind) ) )
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
# 1 1 2 3 4 5 6 7 8 9 10
# 2 a b c d e f g h i j
Use Reduce to rbind the elements of the list together:
l <- list (a=c(9.623571, 5.334566, 7.266597, 6.510794, 4.301958),
b=c(9.693326, 9.015892, 1.266178, 8.547392, 4.326199))
# rbind the elements of the list.
Reduce(rbind, l)
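Note that Reduce(rbind, l) returns a matrix, so it still needs as.data.frame() to match what was asked for. A sketch comparing it with the do.call version from the other answer, on the same list (the row names of the two matrices may differ, so the comparison below ignores them):

```r
l <- list(a = c(9.623571, 5.334566, 7.266597, 6.510794, 4.301958),
          b = c(9.693326, 9.015892, 1.266178, 8.547392, 4.326199))

# Both approaches stack the vectors as rows of a matrix
m_reduce <- Reduce(rbind, l)
m_docall <- do.call(rbind, l)

# Same values either way (dimnames are ignored for the comparison)
all.equal(unname(m_reduce), unname(m_docall))

# Convert to a data frame to get the V1..V5 columns
df <- as.data.frame(m_docall)
df
```

With many list elements, do.call calls rbind once, whereas Reduce calls it once per element, so do.call is usually the better default.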
Here's my dataframe df
I'm trying:
df=data.frame(rbind(c(1,"*","*"),c("*",3,"*")))
df2=as.data.frame(sapply(df,sub,pattern="*",replacement="NA"))
It doesn't work because of the asterisk but I'm getting mad trying to replace it.
If you just have * on its own in your data.frame (meaning it's not like ab*de), then you can do this without regex:
df[df == "*"] <- NA
Both solutions here address an object already in your workspace. If possible (or at least in the future), you can make use of the na.strings argument in read.table. Notice that it is plural, "strings", so you can specify more than one string to treat as NA.
Here's an example: This just writes a file named "readmein.txt" to your current working directory and verifies that it is there.
cat("V1 V2 V3 V4 V5 V6 V7\n
2 * * * * * 2\n
1 2 * * * * 1\n", file = "readmein.txt")
list.files(pattern = "readme")
# [1] "readmein.txt"
Here's read.table with the na.strings argument in action.
read.table("readmein.txt", na.strings="*", header = TRUE)
# V1 V2 V3 V4 V5 V6 V7
# 1 2 NA NA NA NA NA 2
# 2 1 2 NA NA NA NA 1
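Since na.strings is plural, here is a quick sketch with two markers at once. It uses read.table's text= argument rather than a file so nothing is written to disk; the three-column input is made up for the example.

```r
# Made-up input with two different missing-value markers, "*" and "."
txt <- "V1 V2 V3
2 * .
1 2 *"

d <- read.table(text = txt, header = TRUE, na.strings = c("*", "."))
d

# V2 is read as integer with an NA, not as character
str(d)
```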
Update: Objects already in your workspace
I see another problem with the other two answers: They both result in character (or rather factor) variables, even when the column should have possibly been numeric.
Here's an example. First, we create an example dataset. For fun, I've added another character to be treated as NA: ".".
temp <- data.frame(
V1 = c(1:3),
V2 = c(1, "*", 3),
V3 = c("a", "*", "c"),
V4 = c(".", "*", "3"),
stringsAsFactors = TRUE) # the default before R 4.0.0; needed to reproduce the factors below
temp
# V1 V2 V3 V4
# 1 1 1 a .
# 2 2 * * *
# 3 3 3 c 3
str(temp)
# 'data.frame': 3 obs. of 4 variables:
# $ V1: int 1 2 3
# $ V2: Factor w/ 3 levels "*","1","3": 2 1 3
# $ V3: Factor w/ 3 levels "*","a","c": 2 1 3
# $ V4: Factor w/ 3 levels ".","*","3": 1 2 3
Let's make a copy, and then solve this in what I would consider the most obvious "R" way:
temp1 <- temp
temp1[temp1 == "*"|temp1 == "."] <- NA
Looks OK...
temp1
# V1 V2 V3 V4
# 1 1 1 a <NA>
# 2 2 <NA> <NA> <NA>
# 3 3 3 c 3
... but I presume that V2 and V4 should have been numeric....
str(temp1)
# 'data.frame': 3 obs. of 4 variables:
# $ V1: int 1 2 3
# $ V2: Factor w/ 3 levels "*","1","3": 2 NA 3
# $ V3: Factor w/ 3 levels "*","a","c": 2 NA 3
# $ V4: Factor w/ 3 levels ".","*","3": 1 NA 3
Here's a workaround:
temp2 <- read.table(text = capture.output(temp), na.strings = c("*", "."))
temp2
# V1 V2 V3 V4
# 1 1 1 a NA
# 2 2 NA <NA> NA
# 3 3 3 c 3
str(temp2)
# 'data.frame': 3 obs. of 4 variables:
# $ V1: int 1 2 3
# $ V2: int 1 NA 3
# $ V3: Factor w/ 2 levels "a","c": 1 NA 2
# $ V4: int NA NA 3
Update 2: (Yet another) alternative
It might be more appropriate to make use of type.convert which is described as a "helper function for read.table" on its help page. I haven't timed it, but my guess is that it would be faster than the workaround I mentioned above, with all the benefits.
data.frame(
lapply(temp, function(x) type.convert(
as.character(x), na.strings = c("*", "."))))
You should put up a full reproducible example, people will be more inclined to help when you make it easy for em. Anywho...
dat <- data.frame(a=c(1,2,'*',3,4), b=c('*',2,3,4,'*'))
> dat
a b
1 1 *
2 2 2
3 * 3
4 3 4
5 4 *
> as.data.frame(sapply(dat,sub,pattern='\\*',replacement=NA))
a b
1 1 <NA>
2 2 2
3 <NA> 3
4 3 4
5 4 <NA>
This could work (it's pretty flexible), but there are other great solutions already. Arun's solution is my typical approach, but I created replacer for new R users (those with little command-line experience). I wouldn't recommend replacer for anyone with even a bit of experience.
library(qdap)
replacer(dat, "*", NA)