I'm importing a dataset with a single column, numbers_picked, which has 600,000 rows. Each row consists of 20 integers ranging from 01 to 80, separated by single spaces. My problem is that R will only handle the column as character. When it is read as character, all 20 numbers show up.
library(readr)
numbers_picked <- read_delim("C:/Users/HP/Desktop/csv/numbers_picked.csv",
" ", escape_double = FALSE, col_types = cols(numbers_picked =
col_character()))
View(numbers_picked)
When I use the white space delimiter and set the column type to integer, the data preview shows that the column only takes one value.
library(readr)
numbers_picked <- read_delim("C:/Users/HP/Desktop/csv/numbers_picked.csv",
" ", escape_double = FALSE, col_types = cols(numbers_picked =
col_integer()))
View(numbers_picked)
Basically, I want to represent 20 integers in one column.
Here is a sample of the dataset:
numbers_picked
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76
Windows 10
RStudio - latest version
File - .csv
Link - large file, 600,000+ lines
I created a .csv file with the numbers you provided using ' ' as a separator and this worked like a charm.
numbers_picked <- read.table("C:/Users/HP/Desktop/csv/numbers_picked.csv",
sep = " ")
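One caveat with the read.table approach: if the file still contains the numbers_picked header line, I'd expect that one-field line to clash with the 20-field rows. A minimal sketch, assuming the header is present and skipping it with skip = 1; the inline csv_text and the name picks are only for illustration, and with the real file you would pass the path instead of text =:

```r
# Sample rows from the question, including the header line
csv_text <- "numbers_picked
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76"

# skip = 1 drops the header; each number lands in its own column (V1..V20)
picks <- read.table(text = csv_text, sep = " ", skip = 1)

dim(picks)          # rows x columns
sapply(picks, class) # column types chosen by read.table
```

Note that this gives 20 integer columns rather than one column holding 20 integers, and the leading zeros are dropped once the values are integers.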
If your intention was to get a data frame with a single list-column of integer vectors, then you can do the following.
Read in the column as a character vector, then str_split it into a list of character vectors. We can then map each of the character vectors to an integer vector.
library('tidyverse')
csv_text <- 'numbers_picked
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76'
read_csv(csv_text) %>%
mutate(numbers_picked = stringr::str_split(numbers_picked, ' ') %>% map(as.integer))
# numbers_picked
# <list>
# 1 <int [20]>
# 2 <int [20]>
# 3 <int [20]>
# 4 <int [20]>
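The same split-then-convert idea can be sketched in base R, in case the tidyverse isn't available. This assumes the rows have already been read as plain strings (for the real file, something like readLines(path)[-1] to drop the header); the names rows, picked and df are just for illustration:

```r
# Two sample rows from the question, as plain strings
rows <- c(
  "06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79",
  "03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67"
)

# Split each row on single spaces, then convert every piece to integer
picked <- lapply(strsplit(rows, " ", fixed = TRUE), as.integer)

# I() keeps the list as one list-column instead of spreading it out
df <- data.frame(numbers_picked = I(picked))

df$numbers_picked[[1]]  # integer vector for the first row
```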
In Stata, when changing the values of variables (or performing related operations), the output includes a comment reporting the number of changes made.
Is there a way to obtain similar commentary in RStudio?
For instance, sometimes I want to check how many changes a command made (partly to see if command worked, or to count the extent of a potential problem in the data). Currently, I have to inspect the data manually or do a pretty uninformative comparison using all(), for instance.
Base R doesn't do this, but you could write a function to do it, and then instead of saying
x <- y
you'd say
x <- showChanges(x, y)
For example,
library(waldo)
showChanges <- function(oldval, newval) {
print(compare(oldval, newval))
newval
}
set.seed(123)
x <- 1:100
x <- showChanges(x, x + rbinom(100, size = 1, prob = 0.01))
#> `old[21:27]`: 21 22 23 24 25 26 27
#> `new[21:27]`: 21 22 23 25 25 26 27
x
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 25 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
Created on 2021-10-21 by the reprex package (v2.0.0)
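If all you want is a count of changed elements rather than a full diff, a base-R variant is possible. This is only a sketch: nChanged and replaceAndReport are made-up names, it assumes both vectors have the same length, and it treats NA-to-NA as "no change":

```r
# Count positions where the value differs, treating NA vs NA as unchanged
nChanged <- function(oldval, newval) {
  same <- (oldval == newval) | (is.na(oldval) & is.na(newval))
  same[is.na(same)] <- FALSE  # NA vs non-NA counts as a change
  sum(!same)
}

# Report the count as a message, then return the new value for assignment
replaceAndReport <- function(oldval, newval) {
  message("(", nChanged(oldval, newval), " changes made)")
  newval
}

x <- c(1, 2, NA, 4)
x <- replaceAndReport(x, c(1, 5, NA, NA))
```

The message goes to stderr, so it doesn't interfere with the value that gets assigned.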
I'm trying to import an ANOVA data set from a CSV file into R using the read.csv function. When I import it, I get a single column labelled X......., even though in the CSV file the column labels are clearly Person, gender, etc.
I don't know why this is. I've copied the code and output below. Any help would be appreciated.
read.csv("/Users/Desktop/R /anova data set.csv")
X.......
1 ;Person;gender;Age;Height;pre.weight;Diet;weight6weeks
2 ;25; ;41;171;60;2;60
3 ;26; ;32;174;103;2;103
4 ;1;0;22;159;58;1;54.2
5 ;2;0;46;192;60;1;54
6 ;3;0;55;170;64;1;63.3
7 ;4;0;33;171;64;1;61.1
8 ;5;0;50;170;65;1;62.2
9 ;6;0;50;201;66;1;64
10 ;7;0;37;174;67;1;65
11 ;8;0;28;176;69;1;60.5
12 ;9;0;28;165;70;1;68.1
13 ;10;0;45;165;70;1;66.9
14 ;11;0;60;173;72;1;70.5
15 ;12;0;48;156;72;1;69
16 ;13;0;41;163;72;1;68.4
17 ;14;0;37;167;82;1;81.1
18 ;27;0;44;174;58;2;60.1
19 ;28;0;37;172;58;2;56
20 ;29;0;41;165;59;2;57.3
21 ;30;0;43;171;61;2;56.7
22 ;31;0;20;169;62;2;55
23 ;32;0;51;174;63;2;62.4
24 ;33;0;31;163;63;2;60.3
25 ;34;0;54;173;63;2;59.4
26 ;35;0;50;166;65;2;62
27 ;36;0;48;163;66;2;64
28 ;37;0;16;165;68;2;63.8
29 ;38;0;37;167;68;2;63.3
30 ;39;0;30;161;76;2;72.7
31 ;40;0;29;169;77;2;77.5
32 ;52;0;51;165;60;3;53
33 ;53;0;35;169;62;3;56.4
34 ;54;0;21;159;64;3;60.6
35 ;55;0;22;169;65;3;58.2
36 ;56;0;36;160;66;3;58.2
37 ;57;0;20;169;67;3;61.6
38 ;58;0;35;163;67;3;60.2
39 ;59;0;45;155;69;3;61.8
40 ;60;0;58;141;70;3;63
41 ;61;0;37;170;70;3;62.7
42 ;62;0;31;170;72;3;71.1
43 ;63;0;35;171;72;3;64.4
44 ;64;0;56;171;73;3;68.9
45 ;65;0;48;153;75;3;68.7
46 ;66;0;41;157;76;3;71
47 ;15;1;39;168;71;1;71.6
48 ;16;1;31;158;72;1;70.9
49 ;17;1;40;173;74;1;69.5
50 ;18;1;50;160;78;1;73.9
51 ;19;1;43;162;80;1;71
52 ;20;1;25;165;80;1;77.6
53 ;21;1;52;177;83;1;79.1
54 ;22;1;42;166;85;1;81.5
55 ;23;1;39;166;87;1;81.9
56 ;24;1;40;190;88;1;84.5
57 ;41;1;51;191;71;2;66.8
58 ;42;1;38;199;75;2;72.6
59 ;43;1;54;196;75;2;69.2
60 ;44;1;33;190;76;2;72.5
61 ;45;1;45;160;78;2;72.7
62 ;46;1;37;194;78;2;76.3
63 ;47;1;44;163;79;2;73.6
64 ;48;1;40;171;79;2;72.9
65 ;49;1;37;198;79;2;71.1
66 ;50;1;39;180;80;2;81.4
67 ;51;1;31;182;80;2;75.7
68 ;67;1;36;155;71;3;68.5
69 ;68;1;47;179;73;3;72.1
70 ;69;1;29;166;76;3;72.5
71 ;70;1;37;173;78;3;77.5
72 ;71;1;31;177;78;3;75.2
73 ;72;1;26;179;78;3;69.4
74 ;73;1;40;179;79;3;74.5
75 ;74;1;35;183;83;3;80.2
76 ;75;1;49;177;84;3;79.9
77 ;76;1;28;164;85;3;79.7
78 ;77;1;40;167;87;3;77.8
79 ;78;1;51;175;88;3;81.9
colnames(aov)
[1] "X......."
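Judging from the output alone, the file looks semicolon-delimited, with a first line made up only of separators (which would explain the X....... header) and an empty leading column. A sketch under those assumptions, using a few rows from the output above as inline text; the first line of csv_text is a guess at what the file contains, aov_data is a made-up name, and with the real file you would pass the path instead of text =:

```r
# A few rows from the output above; the first line is an assumed
# separators-only line that would produce the "X......." header
csv_text <- ";;;;;;;
;Person;gender;Age;Height;pre.weight;Diet;weight6weeks
;25; ;41;171;60;2;60
;1;0;22;159;58;1;54.2
;2;0;46;192;60;1;54"

# sep = ";" splits the fields; skip = 1 jumps over the separators-only line
aov_data <- read.csv(text = csv_text, sep = ";", skip = 1, strip.white = TRUE)

# Drop the empty leading column created by the ';' at the start of each row
aov_data <- aov_data[, -1]

colnames(aov_data)
```

I've used read.csv with sep = ";" rather than read.csv2 because read.csv2 expects a comma as the decimal mark, while these weights use a point.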
Let's say I have a vector named ages and I clamp its values to a lower and an upper limit:
# Create vector
ages <- c(1:7427) # This is not what my vector is actually assigned as btw, it has a lot of random floats in reality.
# Set limits
min_HB <- 11.68987
max_HB <- 11.81083
# Limit vector
HB_ages <- sapply(ages, function(y) min(max(y,min_HB),max_HB))
HB_ages is indexed from 1 to length(HB_ages), but I would also like to know which index of the original vector each value came from. Is there a way to do this?
We can stack on a named vector
stack(setNames(HB_ages, ages))
Or, with a vectorized approach:
data.frame(ind = ages, val = pmin(pmax(ages, min_HB), max_HB))
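If ages holds arbitrary floats rather than 1:7427, the position can be recorded with seq_along() instead of the values themselves. A small sketch with made-up ages; the column names and changed_idx are only for illustration:

```r
# Made-up values; the limits are the ones from the question
ages   <- c(11.5, 11.7, 11.75, 11.9, 11.8)
min_HB <- 11.68987
max_HB <- 11.81083

# Vectorized clamp: raise to min_HB, then cap at max_HB
HB_ages <- pmin(pmax(ages, min_HB), max_HB)

# Keep the original position next to the original and clamped values
res <- data.frame(ind = seq_along(ages), original = ages, clamped = HB_ages)

# Positions whose value was actually changed by the clamp
changed_idx <- which(HB_ages != ages)
changed_idx
# 1 4
```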
This seems to be a very simple problem to me. Correct me if I am wrong.
Here is your ages vector (slightly modified)
ages <- round(rnorm(100, 32, 15))
ages
[1] 37 31 2 33 15 20 41 38 71 47 50 40 64 32 54 28 62 38 42 24 39 22 32 29
[25] 26 27 32 50 33 11 22 15 21 5 43 17 45 51 47 54 15 13 21 29 13 21 33 23
[49] 40 24 20 36 29 35 42 54 30 53 32 48 39 46 20 62 49 41 35 44 50 56 17 43
[73] 36 22 38 30 19 14 40 26 27 50 36 32 36 51 41 25 58 45 27 45 30 44 9 29
[97] 43 19 41 5
and here are your limits:
min_HB <- 11.68987
max_HB <- 11.81083
and this is what you have done:
HB_ages <- sapply(ages, function(y) min(max(y,min_HB),max_HB))
The index follows the same order as the values, so just add it as a column.
cbind(HB_ages, index = seq_along(HB_ages))
The second column is what you need. Correct?
Hey there, I am super stuck on getting this example set up in R. I need a function called draw that draws one number at a time (no duplicate numbers). As output I want a list of length 20, with each element a numeric scalar representing the randomly selected number.
Essentially I need the elements to be named something like "draw1", "draw2", etc.
So far I have something like:
draw <- lapply(X = list(draw = 1:80), FUN = sample, size = 20, replace = FALSE)
Well, I think I got what I needed. If I am wrong, I am wrong, but...
draw <- function(){
deal <- as.list(sample(1:80, size=20, replace=FALSE))
names(deal) <- paste("draw", 1:20, sep="")
return(deal)
}
Then by calling:
round.one <- draw()
round.one
I get a list of 20 entries named draw1, draw2, draw3, etc.
Now that I understand better what you're after, here are some modifications/improvements:
I'd make draw a function of the number of samples n; that way it's easy to draw different sample sizes.
You can use setNames to construct the list and give names to the list elements in one go.
Taken together we end up with
draw <- function(n = 20) setNames(
as.list(sample(1:80, size = n, replace = FALSE)),
paste("draw", 1:n, sep=""))
Another useful function in this context is replicate, which lets you repeat the process of drawing 20 samples Nrep times, storing the results in a matrix:
Nrep <- 10
mat <- replicate(Nrep, draw())
mat
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#draw1 39 71 18 55 55 26 70 24 30 24
#draw2 64 6 79 63 15 57 77 66 10 45
#draw3 59 36 39 64 42 20 68 33 61 27
#draw4 27 69 49 7 62 24 34 1 6 77
#draw5 62 54 1 13 24 53 15 4 60 17
#draw6 19 23 22 3 52 32 7 55 79 79
#draw7 44 44 73 18 58 64 44 6 58 31
#draw8 12 64 56 67 20 40 6 74 27 40
#draw9 13 33 5 1 49 73 38 46 45 59
#draw10 17 24 58 17 71 61 79 30 66 1
#draw11 56 40 9 12 39 27 4 31 8 48
#draw12 77 29 29 11 65 12 42 73 50 61
#draw13 76 57 64 45 79 13 54 19 11 28
#draw14 9 11 38 33 16 58 41 54 18 60
#draw15 66 74 76 46 32 23 2 7 52 54
#draw16 50 1 63 19 30 51 26 18 39 4
#draw17 20 21 50 79 51 5 75 9 7 69
#draw18 2 60 6 65 1 16 66 42 75 32
#draw19 43 65 23 73 47 63 53 61 44 74
#draw20 69 5 30 5 64 77 20 13 17 13
It's now easy to operate on the matrix in a row (apply(mat, 1, ...)) or column-wise way (apply(mat, 2, ...)).
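One thing to watch: because draw() returns a list, replicate() builds a matrix of mode list, so numeric summaries need the values flattened first. A sketch of that step; set.seed and the names num, col_max and row_mean are added for illustration:

```r
draw <- function(n = 20) setNames(
  as.list(sample(1:80, size = n, replace = FALSE)),
  paste("draw", 1:n, sep = ""))

set.seed(1)  # only so the sketch is reproducible
mat <- replicate(10, draw())

# Flatten the list-matrix into a plain numeric matrix, keeping the row names
num <- matrix(unlist(mat), nrow = nrow(mat), dimnames = dimnames(mat))

col_max  <- apply(num, 2, max)   # largest number in each round
row_mean <- apply(num, 1, mean)  # average of each draw position across rounds
```

If the names are not needed, replicate(Nrep, sample(1:80, 20)) gives a plain integer matrix directly.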
Currently, I read in a graph from an edge list as follows:
>> require(igraph) # i have igraph 1.1.0
>> g1 <- read_graph(graphname, format='ncol')
>> V(g1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 38 40 42 44 46 47 48 49 50 52 56 57 58
[50] 59 60 61 62 63 64 65 67 68 41 69 43 53 37 39 45 51 54 55 66 70
As you can see, the vertex ordering is completely wrong, despite the fact that the vertices have incredibly, incredibly basic naming convention (they are all just integers). This is incredibly problematic, because the ordering of the get.adjacency function in igraph (returning me a 70x70 matrix) depends on the ordering of the vertices in V(g1), so when I try to compare to some g2 with the same set of vertices, they are similarly in a ridiculously nonsensical ordering (yet distinct from the one here) leading to inconsistent graph vertices in the sample of graphs I have despite them all having the same vertex labels. Is there a way to correct this issue, such that I can easily reorder the vertices in my graph so that the resulting adjacency matrices have sensible orderings?
EDIT: note I have already tried permuting the vertices with the permute.vertices function:
>> gtest <- permute.vertices(g1, as.numeric(V(g1))) # permute vertex ids by the ordering returned by V()
>> V(gtest) # too bad it doesn't work...
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 38 40 42 44 46 47 48 49 50 52 56 57 58
[50] 59 60 61 62 63 64 65 67 68 41 69 43 53 37 39 45 51 54 55 66 70
I managed to get it working when I instead read my graph in as:
>> g1 <- read_graph(graphname, format='ncol', predef=1:70)
>> V(g1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
[50] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
But this seems a bit ludicrous if this really is the only way to do it. Does anybody have any other suggestions?
Thanks!
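Since the goal is adjacency matrices with a consistent ordering, one option that avoids re-reading the file is to reorder the matrix itself by its vertex labels. A minimal base-R sketch, assuming the matrix carries the vertex names as row and column names (which is what I'd expect from as_adjacency_matrix(g1, sparse = FALSE) on a named graph, but check this on your data); the toy matrix A and its labels are made up for illustration:

```r
# Toy adjacency matrix whose rows/columns arrive in an arbitrary vertex order
labels <- c("1", "2", "4", "3")
A <- matrix(0, nrow = 4, ncol = 4, dimnames = list(labels, labels))
A["1", "2"] <- A["2", "1"] <- 1
A["3", "4"] <- A["4", "3"] <- 1

# Sort by the numeric value of the labels, applying the same order
# to rows and columns so the matrix stays consistent
ord <- order(as.integer(rownames(A)))
A_sorted <- A[ord, ord]

rownames(A_sorted)
```

Applying the same step to every graph's matrix should put them all in the same 1..70 order, so they can be compared entry by entry.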