I have a problem reading a .txt file into R.
The data is something like this:
68 89 103 1
37 8 103 9
78 93 8 12
3 50
I used readLines() in R and got a character vector with one element per line. But when I compare it to the raw data, I find that, for example, the last "1" on the first line is not really 1: it should be connected to the start of the second line, which makes the number 137 instead of 1 and 37. I think the values are separated by " ". With readLines() I would have to rejoin and split the lines manually. How can I read the file correctly?
Also, the number 9 is not connected to 78, because line 3 starts with a space. The number 12 is connected with 3 to form 123, because there is no space before the 3.
Thanks. I don't even know how to search for this problem on Google, because I don't know how to describe it.
182 63 68 152 130 134 145 152 98 152 182 88 95 105 130 137 167 152 81 71 84 126 134 152 116 130 91 63 68 84 95 152 105 152 63
102 152 63 77 112 140 77 119 152 161 167 105 112 145 161 182 152 81 95 84 91 102 108 130 134 91
1 2 1 4 3 6 1 1 5 2 1 5 2 3 4 5 5 1 2 6 1
63 102 119 161 161 172 179 88 91 95 105 112 119 119 137 145 167 172 91 98 108 112 134 137 161 161 179 71 174 95 105 134 134 1
37 140 145 150 150 68 68 130 137 77 95 112 137 161 174 81 84 126 134 161 161 174 68 77 98 102 102 102 112 88 88 91 98 112 134
134 137 137 140 140 152 152 77 179 112 71 71 74 77 112 116 116 140 140 167 77 95 126 150 88 126 130 130 134 63 74 84 84 88 9
1 95 108 134 137 179 81 88 105 116 123 140 145 152 161 161 179 88 95 112 119 126 126 150 157 179 68 68 84 102 105 119 123 123
137 161 179 182 140 152 182 182 81 63 88 134 84 134 182
7 11 9 2 9 4 6 7 6 1 13 2 1 10 4 5 11 11 9 12 1 3 1 3 3
Basically, what I am doing now is:
For example, the vector:
ind <- c(7, 11, 9, 2 ,9 ,4 ,6, 7, 6 ,1, 13, 2 ,1 ,10 ,4 ,5 ,11 ,11, 9 ,12, 1, 3 ,1, 3 ,3)
indicates that the block of numbers above should be split into groups whose lengths are given by the vector. I know I can split a vector with
split(vector, rep(1:length(ind), ind))
However, the problem is that I can't read the block of numbers correctly.
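To make the splitting step concrete, here is a minimal sketch on a made-up six-element vector (x, ind and parts are illustrative names and values, not the real data). It assumes the numbers have already been read correctly, which is the part that is still unsolved here.

```r
# Toy data: six values to be cut into groups of length 2, 1 and 3
x   <- 1:6
ind <- c(2, 1, 3)

# rep() builds one group label per element: 1 1 2 3 3 3
groups <- rep(seq_along(ind), ind)

# split() then returns one list element per label
parts <- split(x, groups)
parts
```

The lengths in ind must add up to length(x), otherwise the group labels will not line up with the data.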
As I understand the condition you described, if a line starts with a space after you read the file with readLines, the last number on the previous line should be joined with the first number on that line.
Using your second example (I didn't understand the ind part, though):
lines1 <- readLines(n=10)
182 63 68 152 130 134 145 152 98 152 182 88 95 105 130 137 167 152 81 71 84 126 134 152 116 130 91 63 68 84 95 152 105 152 63
102 152 63 77 112 140 77 119 152 161 167 105 112 145 161 182 152 81 95 84 91 102 108 130 134 91
1 2 1 4 3 6 1 1 5 2 1 5 2 3 4 5 5 1 2 6 1
63 102 119 161 161 172 179 88 91 95 105 112 119 119 137 145 167 172 91 98 108 112 134 137 161 161 179 71 174 95 105 134 134 1
37 140 145 150 150 68 68 130 137 77 95 112 137 161 174 81 84 126 134 161 161 174 68 77 98 102 102 102 112 88 88 91 98 112 134
134 137 137 140 140 152 152 77 179 112 71 71 74 77 112 116 116 140 140 167 77 95 126 150 88 126 130 130 134 63 74 84 84 88 9
1 95 108 134 137 179 81 88 105 116 123 140 145 152 161 161 179 88 95 112 119 126 126 150 157 179 68 68 84 102 105 119 123 123
137 161 179 182 140 152 182 182 81 63 88 134 84 134 182
lines2 <- lines1[lines1!=''] #remove blank lines
indx <- grep("^ ", lines2) #create a numeric index for lines that start with a space
indx1 <- indx-1 #index that is one above the previous `indx`
lines2[indx1] <- paste0(lines2[indx1], gsub("^\\s+", "", lines2[indx])) #paste the lines together using the two indexes
lines3 <- lines2[-indx] #remove the lines that belong to the first index
lines3
#[1] "182 63 68 152 130 134 145 152 98 152 182 88 95 105 130 137 167 152 81 71 84 126 134 152 116 130 91 63 68 84 95 152 105 152 63102 152 63 77 112 140 77 119 152 161 167 105 112 145 161 182 152 81 95 84 91 102 108 130 134 91"
#[2] "1 2 1 4 3 6 1 1 5 2 1 5 2 3 4 5 5 1 2 6 1"
#[3] "63 102 119 161 161 172 179 88 91 95 105 112 119 119 137 145 167 172 91 98 108 112 134 137 161 161 179 71 174 95 105 134 134 1"
#[4] "37 140 145 150 150 68 68 130 137 77 95 112 137 161 174 81 84 126 134 161 161 174 68 77 98 102 102 102 112 88 88 91 98 112 134134 137 137 140 140 152 152 77 179 112 71 71 74 77 112 116 116 140 140 167 77 95 126 150 88 126 130 130 134 63 74 84 84 88 9"
#[5] "1 95 108 134 137 179 81 88 105 116 123 140 145 152 161 161 179 88 95 112 119 126 126 150 157 179 68 68 84 102 105 119 123 123137 161 179 182 140 152 182 182 81 63 88 134 84 134 182"
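If the rule is read the other way round, as the question states it (a line that does not start with a space continues the last number of the previous line), one possible sketch is to drop every line break that is followed directly by a non-space character and then let scan() parse the result. This is only an illustration on the small first example; the vector raw is typed in by hand and assumes that line 3 really starts with a space.

```r
# First example from the question, typed in by hand (assumption: line 3 has a leading space)
raw <- c("68 89 103 1",
         "37 8 103 9",
         " 78 93 8 12",
         "3 50")

# Put the lines back into one string, keeping the line breaks
txt <- paste(raw, collapse = "\n")

# Remove a line break only when the next character is not whitespace,
# so "1\n37" becomes "137" while "9\n 78" stays as two numbers
txt <- gsub("\n(?=\\S)", "", txt, perl = TRUE)

# scan() splits on any remaining whitespace and returns a numeric vector
nums <- scan(text = txt, quiet = TRUE)
nums
# [1]  68  89 103 137   8 103   9  78  93   8 123  50
```

For a real file, raw would come from readLines() on that file instead of being typed in.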
I am getting myself familiar with R by working through some mathematical exercises. I am practising indexing and the seq function, with help from the pages linked below.
First I create a vector with all the integers from 1 to 200, using the code below:
t <- 1:200
Now I want to display every 5th number from that vector. I am doing it like this:
u <- seq (1,200, by=5)
First question: every 5th number should be 5, 10, 15, ... but this shows me 1, 6, 11, etc.
Now I want to square some arbitrarily chosen elements of the vector t. I am doing it this way:
square <- t[c(4, 6, 7, 9, 16, 24, 26, 29,30)]^2
Second question: this displays only the squares of those numbers. Without using loops, how can I display the whole vector with those elements squared in place, like 1, 2, 3, 16, 5, 36, etc.?
I am using the web pages below for practice and understanding:
https://rspatial.org/intr/4-indexing.html
https://www.r-exercises.com/start-here-to-learn-r/
Another option is replace:
t <- 1:200
v <- c(4, 6, 7, 9, 16, 24, 26, 29, 30)
replace(t, v, t[v]^2)
We can use ifelse:
ifelse(seq_along(t) %in% c(4, 6, 7, 9, 16, 24, 26, 29,30), t^2, t)
Output:
[1] 1 2 3 16 5 36 49 8 81 10 11 12 13 14 15 256 17 18 19 20 21 22 23 576 25 676 27 28 841 900 31 32 33 34 35
[36] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
[71] 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105
[106] 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
[141] 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
[176] 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
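For the first question, seq(1, 200, by = 5) starts counting at 1, which is why it returns 1, 6, 11, and so on. A small sketch of one way to get every 5th element instead, starting the index sequence at 5 (every5 is just an illustrative name):

```r
t <- 1:200

# Index positions 5, 10, 15, ..., 200 and use them to subset t
every5 <- t[seq(5, length(t), by = 5)]

head(every5)
# [1]  5 10 15 20 25 30
```

Indexing with the sequence, rather than printing the sequence itself, also works when the vector holds something other than 1:200.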
I have a large(ish) data frame and I want to use the dplyr mutate function (or a suitable alternative) to calculate the mean of selected columns.
For example, suppose I had a data frame as follows:
colnames(dall)
[1] "Code" "LA.Name" "LA_Name" "Jan.20" "Feb.20" "Mar.20" "Apr.20" "May.20" "Jun.20"
[10] "Jul.20" "Aug.20" "Sep.20" "Oct.20" "Nov.20" "Dec.20" "Jan.19" "Feb.19" "Mar.19"
[19] "Apr.19" "May.19" "Jun.19" "Jul.19" "Aug.19" "Sep.19" "Oct.19" "Nov.19" "Dec.19"
[28] "Jan.18" "Feb.18" "Mar.18" "Apr.18" "May.18" "Jun.18" "Jul.18" "Aug.18" "Sep.18"
[37] "Oct.18" "Nov.18" "Dec.18" "Jan.17" "Feb.17" "Mar.17" "Apr.17" "May.17" "Jun.17"
[46] "Jul.17" "Aug.17" "Sep.17" "Oct.17" "Nov.17" "Dec.17" "Jan.16" "Feb.16" "Mar.16"
[55] "Apr.16" "May.16" "Jun.16" "Jul.16" "Aug.16" "Sep.16" "Oct.16" "Nov.16" "Dec.16"
[64] "Jan.15" "Feb.15" "Mar.15" "Apr.15" "May.15" "Jun.15" "Jul.15" "Aug.15" "Sep.15"
[73] "Oct.15" "Nov.15" "Dec.15"
I'm trying to create a new column with the mean of the January data from 2015 to 2019.
I have tried several methods; the latest is as follows:
mutate(dall, mJan15to19 = mean(Jan.15,Jan.16,Jan.17,Jan.18,Jan.19))
I get the following back:
Error in mean.default(Jan.15, Jan.16, Jan.17, Jan.18, Jan.19) :
'trim' must be numeric of length one
In addition: Warning message:
In if (na.rm) x <- x[!is.na(x)] :
the condition has length > 1 and only the first element will be used
The cells I'm trying to average are numeric.
Can you help?
UPDATE:
Tried:
head(dall) %>% mutate(new = rowMeans(select(., Jan.15:Jan.19)))
Returned the following:
head(dall) %>% mutate(new = rowMeans(select(., Jan.15:Jan.19)))
Code LA.Name LA_Name Jan.20 Feb.20 Mar.20 Apr.20 May.20 Jun.20
1 E06000001 Hartlepool Hartlepool 108 76 89 NA NA NA
2 E06000002 Middlesbrough Middlesbrough 178 98 135 NA NA NA
3 E06000003 Redcar and Cleveland Redcar and Cleveland 150 148 126 NA NA NA
4 E06000004 Stockton-on-Tees Stockton-on-Tees 202 124 175 NA NA NA
5 E06000005 Darlington Darlington 137 90 116 NA NA NA
6 E06000006 Halton Halton 141 101 115 NA NA NA
Jul.20 Aug.20 Sep.20 Oct.20 Nov.20 Dec.20 Jan.19 Feb.19 Mar.19 Apr.19 May.19 Jun.19 Jul.19 Aug.19
1 NA NA NA NA NA NA 92 87 68 81 108 77 97 73
2 NA NA NA NA NA NA 144 116 126 113 123 100 113 118
3 NA NA NA NA NA NA 146 152 133 135 114 101 140 116
4 NA NA NA NA NA NA 192 166 160 133 157 126 136 149
5 NA NA NA NA NA NA 138 110 104 84 115 75 86 104
6 NA NA NA NA NA NA 114 95 83 92 97 88 98 83
Sep.19 Oct.19 Nov.19 Dec.19 Jan.18 Feb.18 Mar.18 Apr.18 May.18 Jun.18 Jul.18 Aug.18 Sep.18 Oct.18
1 69 87 85 99 126 89 97 97 77 65 64 61 76 71
2 117 127 119 121 204 117 112 132 129 106 96 115 103 111
3 108 139 134 145 225 152 135 114 122 116 113 108 113 154
4 136 177 159 173 256 171 189 142 146 149 142 144 128 179
5 77 95 96 119 127 125 98 98 104 76 77 84 79 109
6 91 106 102 121 170 106 114 93 102 93 83 111 91 93
Nov.18 Dec.18 Jan.17 Feb.17 Mar.17 Apr.17 May.17 Jun.17 Jul.17 Aug.17 Sep.17 Oct.17 Nov.17 Dec.17
1 94 97 116 83 101 76 85 86 52 80 85 88 98 94
2 108 121 151 137 131 111 112 114 127 112 113 120 150 151
3 113 129 171 126 158 104 120 134 122 119 107 145 126 134
4 152 174 177 166 176 129 157 148 141 148 168 143 142 186
5 84 100 103 110 105 88 101 89 73 92 87 96 102 86
6 115 96 117 95 115 94 99 105 93 110 110 86 89 84
Jan.16 Feb.16 Mar.16 Apr.16 May.16 Jun.16 Jul.16 Aug.16 Sep.16 Oct.16 Nov.16 Dec.16 Jan.15 Feb.15
1 79 97 90 92 82 87 75 74 74 79 68 93 138 99
2 116 143 138 131 139 95 107 107 102 121 125 142 166 144
3 129 132 147 141 137 137 115 108 115 127 135 124 179 144
4 159 176 171 191 146 169 160 128 161 143 159 161 263 169
5 105 113 85 92 87 92 74 78 91 85 88 86 149 78
6 113 98 108 117 90 99 92 107 101 93 123 111 162 105
Mar.15 Apr.15 May.15 Jun.15 Jul.15 Aug.15 Sep.15 Oct.15 Nov.15 Dec.15 new
1 109 69 82 85 71 65 74 82 81 112 85.89796
2 130 116 127 124 119 104 107 95 115 101 123.51020
3 129 142 136 125 114 108 120 117 108 140 131.61224
4 155 163 127 129 142 101 161 148 140 180 161.30612
5 105 102 78 90 112 91 83 109 97 96 96.34694
6 100 102 99 90 90 81 102 98 86 107 103.02041
I have a new column, but the calculation is incorrect. I want an average of all of the 'Jan' columns except for 'Jan.20'.
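mean() averages only its first argument; the later ones are matched to trim and na.rm, which explains the error. In the update, select(., Jan.15:Jan.19) picks every column positioned between Jan.15 and Jan.19 in the data frame, not just the January ones, so the mean is taken over far too many columns. Since you want a row-wise mean of specific columns, name them explicitly: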
dall$mJan15to19 = rowMeans(dall[,c("Jan.15","Jan.16","Jan.17","Jan.18","Jan.19")])
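If typing out the column names gets tedious, they can also be picked by pattern. The following is a minimal base R sketch on a made-up two-row data frame; the toy values, the reduced set of columns and the name jan_cols are assumptions for illustration only.

```r
# Toy stand-in for dall with only a few of the real columns
dall <- data.frame(
  Code   = c("A", "B"),
  Jan.20 = c(10, 20),
  Feb.20 = c(1, 2),
  Jan.19 = c(2, 4),
  Feb.19 = c(100, 200),
  Jan.18 = c(4, 8)
)

# Keep names that are exactly "Jan." followed by 15, 16, 17, 18 or 19
jan_cols <- grep("^Jan\\.1[5-9]$", names(dall), value = TRUE)

# Row-wise mean over just those columns
dall$mJan15to19 <- rowMeans(dall[, jan_cols])
dall$mJan15to19
# [1] 3 6
```

If some of those cells can be missing, passing na.rm = TRUE to rowMeans averages over the values that are present.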
Suppose I have a list of lists like the one below:
> myList
[[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
[[2]]
[1] 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 134 141 148 155 162 169 176 183 190 197 204 211 218
[[3]]
[1] 2 9 16 23 30 37 44 51 58 65 72 79 86 93 100 107 114 121 128 135 142 149 156 163 170 177 184 191 198 205 212 219
[[4]]
[1] 3 10 17 24 31 38 45 52 59 66 73 80 87 94 101 108 115 122 129 136 143 150 157 164 171 178 185 192 199 206 213 220
[[5]]
[1] 4 11 18 25 32 39 46 53 60 67 74 81 88 95 102 109 116 123 130 137 144 151 158 165 172 179 186 193 200 207 214 221
How do I search for an element in this list of lists and retrieve the entire list in which it belongs?
I tried something like below:
> myList[grep(7, myList)][[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
This case looks correct, but when I tried this for the below case, I got the wrong result.
> myList[grep(18, myList)][[1]]
[1] 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217
while the correct output should be:
[1] 4 11 18 25 32 39 46 53 60 67 74 81 88 95 102 109 116 123 130 137 144 151 158 165 172 179 186 193 200 207 214 221
Is there any possible solution to this?
EDIT::
The sample list can be produced using:
l <- seq(0, 194)
myList <- list()
for (d in l){
temp <- intersect(seq(d, max(l), by = 7),l)
if (any(sapply(myList,function(x) d %in% x)) == FALSE){
myList <- append(myList, list(temp))
}
}
You could try:
myList[sapply(myList, function(x) any(x %in% 7))]
Use the purrr package:
library(purrr)
keep(myList, function(x, y) {any(x == y)}, y = 18)
purrr provides many useful list-handling functions, which are documented in the purrr cheatsheet.
If 18 is a number you wish to find in the list, try:
myList[sapply(myList, function(x) 18 %in% x)]
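The grep attempt goes wrong because grep works on text: it converts each list element to a string and treats 18 as the pattern "18", which also matches inside 182 or 189. Here is a minimal sketch of that effect and of an exact-match lookup. The list is rebuilt with lapply to match the printed output, and find_containing is a hypothetical helper name.

```r
# Rebuild a list shaped like the one printed in the question
myList <- lapply(0:4, function(d) seq(d, 221, by = 7))

# What grep actually searches: one long string per element, e.g. "c(0, 7, 14, ..."
as_text <- as.character(myList[1])
grepl("18", as_text)   # TRUE, because "182" contains "18"

# Hypothetical helper: keep only the elements that contain the exact value
find_containing <- function(lst, value) {
  lst[vapply(lst, function(x) value %in% x, logical(1))]
}

hit <- find_containing(myList, 18)
hit[[1]][1:3]
# [1]  4 11 18
```

vapply is used here instead of sapply only so that the result is guaranteed to be a logical vector, even for an empty list.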
I am new to R2OpenBUGS and the very enigmatic errors are quite frustrating.
I try to run a model that is quite simple. I had success running similar models before.
Do my problems come from the fact that I have a 2-dimensional array (matrix)?
I tried simplifying the model without success.
Here are the errors:
model is syntactically correct
expected the collection operator c error pos 11
model compiled
expected a number or an NA error pos 1449
initial values generated, model initialized
model is updating
200 updates took 0 s
tau.0 is not a variable in the model
tau.1 is not a variable in the model
model is updating
****** Sorry something went wrong in procedure StdMonitor.Update in module DeviancePlugin ******
And here is the code I use:
rm(list=ls(all=TRUE))
cat("\014")
library(R2OpenBUGS)
rat.dat<- read.table("BigRatDat.txt",header=FALSE);
dose = data.matrix(rat.dat[1])
weight = data.matrix(rat.dat[3:13])
N<- length(dose);
cat("
model{
for(i in 1:50){
for(j in 1:11){
weight[i,j]~dnorm(mu[i,j],tau[i])
mu[i,j]<-b.0[i]+b.1[i]*j
}
b.0[i]~dnorm(mu.0[i],tau.0)
b.1[i]~dnorm(mu.1[i],tau.1)
mu.0[i] <-b.00+b.01*dose[i]
mu.1[i] <-b.00+b.01*dose[i]
tau[i]~dgamma(0.01,0.01)
dose[i]~dnorm(0,1)
}
b.00~dnorm(0,0.001)
b.01~dnorm(0,0.001)
b.10~dnorm(0,0.001)
b.11~dnorm(0,0.001)
tau.0~dgamma(0.01,0.01)
tau.1~dgamma(0.01,0.01)
}
",file="Rats2OpenBugs.txt")
data <- list("dose","weight")
inits <- function(){
b.0<-rnorm(n=N,0);
b.1<-rnorm(n=N,0);
b.00<-rnorm(1,0);
b.01<-rnorm(1,0);
b.10<-rnorm(1,0);
b.11<-rnorm(1,0);
tau = rep(1,N);
tau.0 = 1;
tau.1 = 1;
list(b.0=b.0,b.1=b.1,b.00=b.00,b.01=b.01,b.10=b.10,b.11=b.11,tau=tau,tau.0=1,tau.1=1)
}
params <- c("b.0","b.1","b.00","b.01","b.10","b.11","tau","tau.0","tau.1");
output.sim <- bugs(data,inits,params,model.file="Rats2OpenBugs.txt",
n.chains=1, n.iter=5000, n.burnin=200, n.thin=1
,debug=TRUE)
The data file:
0 1 54 60 63 74 77 89 93 100 108 114 124
0 2 69 75 81 90 97 120 114 119 126 138 143
0 3 77 81 87 94 101 110 117 124 134 141 151
0 4 64 69 77 83 88 96 104 109 120 123 131
0 5 51 58 62 71 74 81 88 93 99 103 113
0 6 64 71 77 89 90 100 106 114 122 134 139
0 7 80 91 97 101 111 119 129 131 137 147 154
0 8 79 85 89 99 104 105 116 121 132 139 147
0 9 77 82 88 92 101 109 119 127 135 144 158
0 10 79 84 91 98 107 114 119 131 137 146 155
.5 1 62 71 75 79 87 91 100 105 111 121 124
.5 2 68 73 81 89 94 101 110 114 123 132 139
.5 3 94 102 109 110 128 133 147 151 153 171 184
.5 4 81 90 95 102 109 120 128 137 141 154 160
.5 5 64 69 72 76 84 89 97 103 108 114 124
.5 6 67 74 81 81 84 95 100 109 119 128 130
.5 7 73 80 86 89 97 101 110 116 117 135 141
.5 8 71 74 82 84 93 97 102 113 119 124 131
.5 9 69 74 79 89 94 100 107 113 124 134 139
.5 10 60 62 67 74 78 85 92 103 112 121 130
1 1 59 63 66 75 80 87 99 104 110 115 124
1 2 56 66 70 81 77 88 96 100 113 120 130
1 3 71 77 84 80 97 106 111 109 128 133 140
1 4 59 64 69 76 85 88 96 104 110 119 126
1 5 65 70 73 77 85 92 96 101 111 118 121
1 6 61 69 77 81 89 92 107 111 118 127 132
1 7 80 86 95 99 106 113 127 131 142 150 160
1 8 74 80 84 90 99 101 108 117 126 133 140
1 9 71 79 88 90 98 102 116 121 127 139 142
1 10 69 75 80 86 96 97 104 113 122 129 138
The problem was that I was trying to use a matrix with only one column as a vector. R has no problem with that, but it does not work when exporting the data to OpenBUGS. The program expects references to a matrix to have 2 indices (for row and column).
I just had to replace:
dose = data.matrix(rat.dat[1])
with:
dose = unlist(as.vector(rat.dat[1]))
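To see the difference this fix relies on, here is a small sketch with a made-up three-row data frame (the toy values and the names dose_matrix and dose_vector are only for illustration). rat.dat[[1]] is shown as another way to get a plain vector; it is an alternative, not what the fix above used.

```r
# Toy stand-in for the rat data: dose, rat id, one weight column
rat.dat <- data.frame(V1 = c(0, 0.5, 1), V2 = 1:3, V3 = c(54, 62, 59))

# Single-bracket subsetting keeps a data frame, so data.matrix gives a 3 x 1 matrix
dose_matrix <- data.matrix(rat.dat[1])
dim(dose_matrix)
# [1] 3 1

# Double-bracket subsetting extracts the column as a plain vector with no dim attribute
dose_vector <- rat.dat[[1]]
dim(dose_vector)
# NULL
```

A vector like dose_vector is what a model statement such as dose[i] expects, since it uses a single index.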
Consider the following two code snippets.
A:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5, nrows=190) # Specify nrows, get correct answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
# No need to remove unranked countries because we specified nrows
# No need to convert V2 from factor to numeric
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get KNA, correct answer
B:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5) # Don't specify nrows, get incorrect answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
mergedData = mergedData[which(mergedData$V2 != ""),] # Remove unranked countries
mergedData$V2 = as.numeric(mergedData$V2) # make V2 a numeric column
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get SRB, incorrect answer
I would think the two code snippets would be identical, except that in A you never add the unranked countries to your dataframe and in B you add them but then remove them. Why is the sorting different for these two code snippets?
The file downloads are from Coursera's Getting and Cleaning Data class (Quiz 3, Question 3).
Edit: To avoid security concerns, I've pasted the raw .csv files below
gdp.csv - http://pastebin.com/raw.php?i=4aRZwBRd
education.csv - http://pastebin.com/raw.php?i=0pbhDCSX
Edit2: The problem is occurring in the as.numeric step. For case B, here is mergedData$V2 before and after mergedData$V2 = as.numeric(mergedData$V2) is applied:
> mergedData$V2
[1] 161 105 60 125 32 26 133 172 12 27 68 162 25 140 128 59 76 93
[19] 138 111 69 169 149 96 7 153 113 167 117 165 11 20 36 2 99 98
[37] 121 30 182 166 81 67 102 51 4 183 33 72 48 64 38 159 13 103
[55] 85 43 155 5 185 109 6 114 86 148 175 176 110 42 178 77 160 37
[73] 108 71 139 58 16 10 46 22 47 122 40 9 116 92 3 50 87 145
[91] 120 189 178 15 146 56 136 83 168 171 70 163 84 74 94 82 62 147
[109] 141 132 164 14 188 135 129 137 151 130 118 154 127 152 34 123 144 39
[127] 126 18 23 107 55 66 44 89 49 41 187 115 24 61 45 97 54 52
[145] 8 142 19 73 119 35 174 157 100 88 186 150 63 80 21 158 173 65
[163] 124 156 31 143 91 170 184 101 79 17 190 95 106 53 78 1 75 180
[181] 29 57 177 181 90 28 112 104 134
194 Levels: .. Not available. 1 10 100 101 102 103 104 105 106 107 ... Note: Rankings include only those economies with confirmed GDP estimates. Figures in italics are for 2011 or 2010.
> mergedData$V2 = as.numeric(mergedData$V2)
> mergedData$V2
[1] 72 10 149 32 118 111 41 84 26 112 157 73 110 49 35 147 166 185
[19] 46 17 158 80 58 188 159 63 19 78 23 76 15 105 122 104 191 190
[37] 28 116 94 77 172 156 7 139 126 95 119 162 135 153 124 69 37 8
[55] 176 130 65 137 97 14 148 20 177 57 87 88 16 129 90 167 71 123
[73] 13 161 47 146 70 4 133 107 134 29 127 181 22 184 115 138 178 54
[91] 27 101 90 59 55 144 44 174 79 83 160 74 175 164 186 173 151 56
[109] 50 40 75 48 100 43 36 45 61 38 24 64 34 62 120 30 53 125
[127] 33 91 108 12 143 155 131 180 136 128 99 21 109 150 132 189 142 140
[145] 170 51 102 163 25 121 86 67 5 179 98 60 152 171 106 68 85 154
[163] 31 66 117 52 183 82 96 6 169 81 103 187 11 141 168 3 165 92
[181] 114 145 89 93 182 113 18 9 42
Can anyone explain why the numbers change when I apply as.numeric()?
The real reason for the different results is that in the second case the full dataset includes some footer notes, which read.csv also read. Because of the character strings in the footer, most of the columns ended up with class 'factor'. This could have been avoided by either
excluding the footer lines with the nrows argument in read.csv (skip only drops lines at the top of the file), or
using stringsAsFactors=FALSE in the read.csv call, along with excluding those lines.
Calling as.numeric on a factor returns the integer codes of its levels, not the numbers shown, so the column was ordered based on the "levels" of the factor.
If you have already read the files without excluding the lines, convert the columns to their proper classes. For a column that should be numeric, use as.numeric(as.character(df$column)) or as.numeric(levels(df$column))[df$column].
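A minimal sketch of the effect on a made-up three-value factor (f, codes and values are illustrative names). The levels are sorted as text, so the codes have nothing to do with the printed numbers.

```r
# Toy factor built from number-like strings, as read.csv would create them
f <- factor(c("161", "105", "60"))
levels(f)
# [1] "105" "161" "60"

# as.numeric on the factor returns the position of each value among the levels
codes <- as.numeric(f)
codes
# [1] 2 1 3

# Going through character first recovers the numbers that are displayed
values <- as.numeric(as.character(f))
values
# [1] 161 105  60

# Equivalent conversion through the levels
values2 <- as.numeric(levels(f))[f]
```

With the real column, the non-numeric footer strings turn into NA (with a warning) in this conversion, so they still need to be dropped.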