How can I read a text file with type conversion in Julia? - julia

I am trying to read a text file in Julia, but I cannot specify a type while reading: when I do, it gives an error:
data = readdlm("data.txt",'\t', Float64)
at row 1, column 1 : ErrorException("file entry \" 0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00\" cannot be converted to Float64")
If I don't use Float64, the data type is Array{Any,2} and each row comes back as a single string, as shown below, even though I have 14 different columns in the data:
" 0.27957 0.00 9.690 0 0.5850 5.9260 42.60 2.3817 6 391.0 19.20 396.90 13.59 24.50"
" 0.17899 0.00 9.690 0 0.5850 5.6700 28.80 2.7986 6 391.0 19.20 393.29 17.60 23.10"
" 0.28960 0.00 9.690 0 0.5850 5.3900 72.90 2.7986 6 391.0 19.20 396.90 21.14 19.70"
" 0.26838 0.00 9.690 0 0.5850 5.7940 70.60 2.8927 6 391.0 19.20 396.90 14.10 18.30"
" 0.23912 0.00 9.690 0 0.5850 6.0190 65.30 2.4091 6 391.0 19.20 396.90 12.92 21.20"

Your file is space-separated (with runs of repeated spaces) rather than tab-separated, so readdlm with the '\t' delimiter reads each line as one long string that cannot be converted to Float64. I recommend using the CSV library to parse delimited files; it has features, such as handling repeated delimiters, that deal with exactly this kind of input. Note the header=false in the call below: your file has no header row, and without it the first data line would be consumed as column names. (readdlm("data.txt", Float64), which treats runs of whitespace as the delimiter by default, may also work here.)
julia> using Pkg
julia> Pkg.add("CSV")
julia> import CSV
julia> Array(CSV.read("data.txt"; delim=' ', ignorerepeated=true, type=Float64, header=false))
5×14 Array{Float64,2}:
0.27957 0.0 9.69 0.0 0.585 5.926 42.6 2.3817 6.0 391.0 19.2 396.9 13.59 24.5
0.17899 0.0 9.69 0.0 0.585 5.67 28.8 2.7986 6.0 391.0 19.2 393.29 17.6 23.1
0.2896 0.0 9.69 0.0 0.585 5.39 72.9 2.7986 6.0 391.0 19.2 396.9 21.14 19.7
0.26838 0.0 9.69 0.0 0.585 5.794 70.6 2.8927 6.0 391.0 19.2 396.9 14.1 18.3
0.23912 0.0 9.69 0.0 0.585 6.019 65.3 2.4091 6.0 391.0 19.2 396.9 12.92 21.2

Related

How would I generate matrices to represent the variants of "R" in these equations?

I am essentially trying to make my own code for the nonpartest() function in the npmv package. I have a dataset:
Cattle <- read.table(text=" Treatment Replicate Weight_Loss Persistent Head_Size Salebarn_Q
'LA 200' 1 17.90 14.10 14.25 1.0
'LA 200' 2 19.30 15.30 2.56 1.0
'LA 200' 3 19.50 16.82 5.80 1.5
'LA 200' 4 18.94 12.70 7.51 1.5
Excede 1 19.60 11.20 14.52 1.0
Excede 2 19.50 10.54 9.83 1.0
Excede 3 19.10 10.83 3.82 0.5
Excede 4 20.40 11.00 0.04 1.0
Micotil 1 17.30 14.29 1.62 1.0
Micotil 2 20.00 11.65 0.13 3.0
Micotil 3 18.10 10.89 2.41 0.0
Micotil 4 19.50 12.43 5.93 2.0
Zoetis 1 18.50 25.48 10.08 1.0
Zoetis 2 17.60 20.12 11.93 1.0
Zoetis 3 19.70 23.29 7.93 2.5
Zoetis 4 18.50 28.32 13.08 3.0", header=TRUE)
I am trying to use it to generate the matrices Ri., R.. and Rij from the equations in the paper below, so that I can calculate the test statistics G and H.
I attempted to do it using
R<-matrix(rank(Cattle,ties.method = "average"),N,p)
R_bar<-matrix(rank(Cattle,ties.method = "average"),1,p)
H<-(1/(a-1))*sum(n*(R-R_bar)*t(R-R_bar))
G<-(1/(N-a)*sum(sum(R-R_bar)*(R_prime-R_bar_prime)))
But that does not work, apparently. I'm not entirely sure what the paper is describing with regard to the dimensions of the R matrices. I know you should use the rank() function and then transpose with t() for the 'prime' versions.
(Images show excerpts of the paper describing the different matrices, their dimensions, and how they enter the actual equations.)
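Not the paper's code, but a minimal sketch of how these matrices are usually assembled for this kind of rank-based test: rank each response column over all N pooled observations to get Rij, then form the group mean rank vectors Ri. and the overall mean R.. (the response column names are taken from the posted data; the H and G formulas follow the usual definitions and may differ in detail from the paper):
resp <- c("Weight_Loss", "Persistent", "Head_Size", "Salebarn_Q")
N <- nrow(Cattle)                        # 16 observations
p <- length(resp)                        # 4 response variables
a <- length(unique(Cattle$Treatment))    # 4 treatment groups

# Rij: N x p matrix of pooled (column-wise) mid-ranks
R <- apply(Cattle[, resp], 2, rank, ties.method = "average")

# Ri.: a x p matrix of group mean rank vectors;  R..: overall mean rank vector
ni    <- as.vector(table(Cattle$Treatment))
R_i   <- rowsum(R, Cattle$Treatment) / ni
R_all <- colMeans(R)                     # equals (N + 1) / 2 in every column

# H = (1/(a-1)) * sum_i n_i (Ri. - R..)(Ri. - R..)'   -- between-group, p x p
D <- sweep(R_i, 2, R_all)
H <- t(D) %*% (ni * D) / (a - 1)

# G = (1/(N-a)) * sum_ij (Rij - Ri.)(Rij - Ri.)'      -- within-group, p x p
centered <- R - R_i[as.character(Cattle$Treatment), ]
G <- t(centered) %*% centered / (N - a)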

Need help using npmv package (nonparametric testing) in R

I'm trying to use the Bathke package "npmv" in R to run the nonpartest() function on a dataset that I created. In their paper they use their implemented code with the provided dataset 'sberry':
install.packages("npmv")
library(npmv)
data("sberry",package="npmv")
nonpartest(weight|bot|fungi|rating~treatment,data=sberry,permreps=1000)
This works perfectly for their dataset. However, when I try to run it on my csv dataset, which has the exact same dimensions, it does not work: it keeps giving me the message "data set not found" and saying the sample size must be at least 2.
Their dataset is as follows:
Treatment Replicate Weight Botrytis Fungi Phomopsis
Kocide 1 6.9 4.1 17.24 1
Kocide 2 8.3 5.13 5.65 1
Kocide 3 8.4 6.07 8.8 1.5
Kocide 4 7.95 2.72 9.51 1.5
Elevate 1 8.6 1.19 17.06 1
Elevate 2 8.5 0.55 12.86 1
Elevate 3 8.2 0.74 6.76 0.5
Elevate 4 9.5 0.99 1.84 1
V-10135 1 6.2 4.29 4.64 1
V-10135 2 9 1.56 3.03 3
V-10135 3 6.8 0.88 5.6 0
V-10135 4 8.5 2.42 8.66 2
Control 1 7.5 15.6 13.08 1
Control 2 6.7 10.28 14.43 1
Control 3 8.7 13.29 10.92 2.5
Control 4 7.4 18.38 16.03 3
while mine is:
Treatment Replicate Weight_Loss Persistent Head_Size Salebarn_Q
LA 200 1 17.90 14.10 14.25 1.0
LA 200 2 19.30 15.30 2.56 1.0
LA 200 3 19.50 16.82 5.80 1.5
LA 200 4 18.94 12.70 7.51 1.5
Excede 1 19.60 11.20 14.52 1.0
Excede 2 19.50 10.54 9.83 1.0
Excede 3 19.10 10.83 3.82 0.5
Excede 4 20.40 11.00 0.04 1.0
Micotil 1 17.30 14.29 1.62 1.0
Micotil 2 20.00 11.65 0.13 3.0
Micotil 3 18.10 10.89 2.41 0.0
Micotil 4 19.50 12.43 5.93 2.0
Zoetis 1 18.50 25.48 10.08 1.0
Zoetis 2 17.60 20.12 11.93 1.0
Zoetis 3 19.70 23.29 7.93 2.5
Zoetis 4 18.50 28.32 13.08 3.0
(Zoetis being my control)
I tried the code
data("Cattle", package = "npmv")
nonpartest(Weight_Loss|Persistent|Head_Size|Salebarn_Q~Treatment,data=Cattle,permreps=1000)
Any idea how I would be able to return the same test statistics that they get for their example for my dataset? Thanks in advance.
The call to data() is supposed to be used for package-supplied datasets, not for ones you import. I think you are misinterpreting a warning.
data(Cattle,package='npmv')
Warning message:
In data(Cattle, package = "npmv") : data set ‘Cattle’ not found
R reports both 'warnings' and 'errors' and I don't think yours is an error. I get no error when loading your data and running that function:
Cattle <- read.table(text=" Treatment Replicate Weight_Loss Persistent Head_Size Salebarn_Q
'LA 200' 1 17.90 14.10 14.25 1.0
'LA 200' 2 19.30 15.30 2.56 1.0
'LA 200' 3 19.50 16.82 5.80 1.5
'LA 200' 4 18.94 12.70 7.51 1.5
Excede 1 19.60 11.20 14.52 1.0
Excede 2 19.50 10.54 9.83 1.0
Excede 3 19.10 10.83 3.82 0.5
Excede 4 20.40 11.00 0.04 1.0
Micotil 1 17.30 14.29 1.62 1.0
Micotil 2 20.00 11.65 0.13 3.0
Micotil 3 18.10 10.89 2.41 0.0
Micotil 4 19.50 12.43 5.93 2.0
Zoetis 1 18.50 25.48 10.08 1.0
Zoetis 2 17.60 20.12 11.93 1.0
Zoetis 3 19.70 23.29 7.93 2.5
Zoetis 4 18.50 28.32 13.08 3.0", header=TRUE)
Here's the call
nonpartest(Weight_Loss|Persistent|Head_Size|Salebarn_Q~Treatment,data=Cattle,permreps=1000)
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
$results
Test Statistic df1 df2 P-value
ANOVA type test p-value 2.843 6.912 27.6479 0.023
McKeon approx. for the Lawley Hotelling Test NA NA NA NA
Muller approx. for the Bartlett-Nanda-Pillai Test NA NA NA NA
Wilks Lambda NA NA NA NA
Permutation Test p-value
ANOVA type test p-value 0.007
McKeon approx. for the Lawley Hotelling Test NA
Muller approx. for the Bartlett-Nanda-Pillai Test NA
Wilks Lambda NA
$releffects
Weight_Loss Persistent Head_Size Salebarn_Q
Excede 0.71875 0.15625 0.50000 0.30469
LA 200 0.43750 0.59375 0.53125 0.53125
Micotil 0.45312 0.37500 0.23438 0.53125
Zoetis 0.39062 0.87500 0.73438 0.63281
Warning message:
In nonpartest(Weight_Loss | Persistent | Head_Size | Salebarn_Q ~ :
Rank covariance matrix is singular, only ANOVA test returned
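In other words, data() only loads datasets shipped with a package; for your own file, just read it in directly and pass the resulting data frame to nonpartest(). A sketch, assuming your data live in a csv file (the file name here is hypothetical):
library(npmv)
Cattle <- read.csv("Cattle.csv")   # hypothetical file name; use read.table()/read.delim() as appropriate
nonpartest(Weight_Loss|Persistent|Head_Size|Salebarn_Q~Treatment, data=Cattle, permreps=1000)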

Cross-tabulating data with a function

I have data in three columns:
set.seed(42)
N = 1000
XYp = as.data.frame(matrix(cbind(round(runif(N)*100),
round(runif(N)*1000+1000),
round(runif(N),2)),N,3))
colnames(XYp) <- c('X','Y','p')
Now I would like to cross-tabulate the data based on deciles in 2 dimensions:
colX_deciles = quantile(data[,'X'], probs=seq(0,1,1/10))
colY_deciles = quantile(data[,'Y'], probs=seq(0,1,1/10))
XYp['X_decile'] <- findInterval(XYp[,'X'],colX_deciles,all.inside = TRUE)
XYp['Y_decile'] <- findInterval(XYp[,'Y'],colY_deciles,all.inside = TRUE)
> colX_deciles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
0.0 9.9 18.0 29.0 39.0 48.0 57.0 69.0 79.2 91.0 100.0
> colY_deciles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1000.0 1088.0 1180.0 1279.4 1392.0 1502.5 1602.4 1711.3 1805.2 1902.0 2000.0
I have figured out that it is possible to calculate the sum of elements in column p using xtabs:
> xtabs(p ~ X_decile + Y_decile, XYp)
Y_decile
X_decile 1 2 3 4 5 6 7 8 9 10
1 2.57 8.74 5.51 5.74 4.40 1.77 5.79 3.43 4.66 3.80
2 6.43 4.25 7.29 5.41 3.08 4.43 8.70 2.62 3.37 4.45
3 1.99 2.80 7.54 2.56 5.02 4.30 7.99 2.03 4.91 6.28
4 4.53 4.90 8.04 3.49 2.25 2.87 7.47 5.41 3.54 9.28
5 2.32 5.82 7.18 4.58 5.39 2.26 0.59 9.61 5.91 5.37
6 7.70 5.50 6.45 7.83 4.65 8.45 1.70 6.40 4.88 4.32
7 7.05 3.87 3.54 3.79 6.15 5.55 6.31 2.31 3.42 6.14
8 4.43 4.50 3.04 3.62 9.92 5.66 3.75 7.01 4.92 7.08
9 3.67 5.56 3.56 7.92 5.05 5.00 3.64 6.74 5.85 3.26
10 5.75 3.17 9.50 5.44 3.64 6.13 3.18 5.93 6.18 3.71
But how can I elegantly apply an arbitrary function to the cross-tabulated matrix elements and get the results, for example mean(p), in the following manner?
> xtabs(mean(p) ~ X_decile + Y_decile, XYp)
Error in model.frame.default(formula = mean(p) ~ X_decile + Y_decile, :
variable lengths differ (found for 'X_decile')
As a bonus, the values of colX_deciles[1:10] and colY_deciles[1:10] could be set as row names and column names, respectively.
I assume you want to use the XYp object throughout (in places you used data).
I would suggest nesting the aggregate() call inside xtabs():
xtabs(p ~ X_decile + Y_decile, aggregate(p ~ X_decile + Y_decile, XYp, mean))
Y_decile
X_decile 1 2 3 4 5 6 7 8 9 10
1 0.4283333 0.5826667 0.5009091 0.4100000 0.5500000 0.2950000 0.5263636 0.4900000 0.3584615 0.4222222
2 0.5358333 0.5312500 0.6627273 0.4918182 0.3850000 0.5537500 0.5800000 0.4366667 0.4814286 0.4450000
3 0.3980000 0.3500000 0.5800000 0.5120000 0.4183333 0.3583333 0.4205263 0.3383333 0.5455556 0.5233333
4 0.4118182 0.3769231 0.6700000 0.5816667 0.5625000 0.3587500 0.6225000 0.3864286 0.5900000 0.7138462
5 0.4640000 0.4476923 0.6527273 0.5088889 0.4900000 0.4520000 0.1966667 0.6006250 0.4925000 0.5370000
6 0.4812500 0.6111111 0.7166667 0.5592857 0.5166667 0.6035714 0.3400000 0.5818182 0.5422222 0.6171429
7 0.5035714 0.5528571 0.4425000 0.5414286 0.5125000 0.3964286 0.4853846 0.5775000 0.4275000 0.4723077
8 0.4430000 0.4090909 0.6080000 0.5171429 0.6200000 0.5660000 0.4687500 0.5392308 0.3784615 0.5446154
9 0.4077778 0.6177778 0.5085714 0.7200000 0.4208333 0.5000000 0.4550000 0.5616667 0.5318182 0.3622222
10 0.5227273 0.4528571 0.6785714 0.3885714 0.3640000 0.4715385 0.5300000 0.5390909 0.6866667 0.5300000
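For the bonus part, one approach (a sketch) is to relabel the table dimensions with the lower decile boundaries afterwards; a plain tapply() call is also worth knowing as a one-liner for the same cross-tabulated means:
# cross-tabulated means directly, without xtabs
tapply(XYp$p, list(X_decile = XYp$X_decile, Y_decile = XYp$Y_decile), mean)

# bonus: label rows and columns with the lower decile boundaries
tab <- xtabs(p ~ X_decile + Y_decile, aggregate(p ~ X_decile + Y_decile, XYp, mean))
dimnames(tab) <- list(X_decile = as.character(colX_deciles[1:10]),
                      Y_decile = as.character(colY_deciles[1:10]))
tab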

'max.print' option in R

I have a data.frame with 178 rows and 14 columns. When I print it into the R-console, it only shows me 71 rows, despite the max.print option being set to 1000 rows.
Could anyone please explain why the max.print option doesn't let me print the full dataset in the R console? And how can I do that?
I use R 3.4.1 on MacOS.
Here is a data example:
1 1 14.23 1.71 2.43 15.6 127 2.80 3.06 0.28 2.29 5.640000 1.040 3.92 1065
2 1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.380000 1.050 3.40 1050
3 1 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.680000 1.030 3.17 1185
4 1 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.800000 0.860 3.45 1480
5 1 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.320000 1.040 2.93 735
6 1 14.20 1.76 2.45 15.2 112 3.27 3.39 0.34 1.97 6.750000 1.050 2.85 1450
7 1 14.39 1.87 2.45 14.6 96 2.50 2.52 0.30 1.98 5.250000 1.020 3.58 1290
8 1 14.06 2.15 2.61 17.6 121 2.60 2.51 0.31 1.25 5.050000 1.060 3.58 1295
9 1 14.83 1.64 2.17 14.0 97 2.80 2.98 0.29 1.98 5.200000 1.080 2.85 1045
10 1 13.86 1.35 2.27 16.0 98 2.98 3.15 0.22 1.85 7.220000 1.010 3.55 1045
11 1 14.10 2.16 2.30 18.0 105 2.95 3.32 0.22 2.38 5.750000 1.250 3.17 1510
12 1 14.12 1.48 2.32 16.8 95 2.20 2.43 0.26 1.57 5.000000 1.170 2.82 1280
13 1 13.75 1.73 2.41 16.0 89 2.60 2.76 0.29 1.81 5.600000 1.150 2.90 1320
14 1 14.75 1.73 2.39 11.4 91 3.10 3.69 0.43 2.81 5.400000 1.250 2.73 1150
15 1 14.38 1.87 2.38 12.0 102 3.30 3.64 0.29 2.96 7.500000 1.200 3.00 1547
16 1 13.63 1.81 2.70 17.2 112 2.85 2.91 0.30 1.46 7.300000 1.280 2.88 1310
17 1 14.30 1.92 2.72 20.0 120 2.80 3.14 0.33 1.97 6.200000 1.070 2.65 1280
18 1 13.83 1.57 2.62 20.0 115 2.95 3.40 0.40 1.72 6.600000 1.130 2.57 1130
19 1 14.19 1.59 2.48 16.5 108 3.30 3.93 0.32 1.86 8.700000 1.230 2.82 1680
20 1 13.64 3.10 2.56 15.2 116 2.70 3.03 0.17 1.66 5.100000 0.960 3.36 845
21 1 14.06 1.63 2.28 16.0 126 3.00 3.17 0.24 2.10 5.650000 1.090 3.71 780
22 1 12.93 3.80 2.65 18.6 102 2.41 2.41 0.25 1.98 4.500000 1.030 3.52 770
23 1 13.71 1.86 2.36 16.6 101 2.61 2.88 0.27 1.69 3.800000 1.110 4.00 1035
24 1 12.85 1.60 2.52 17.8 95 2.48 2.37 0.26 1.46 3.930000 1.090 3.63 1015
25 1 13.50 1.81 2.61 20.0 96 2.53 2.61 0.28 1.66 3.520000 1.120 3.82 845
26 1 13.05 2.05 3.22 25.0 124 2.63 2.68 0.47 1.92 3.580000 1.130 3.20 830
27 1 13.39 1.77 2.62 16.1 93 2.85 2.94 0.34 1.45 4.800000 0.920 3.22 1195
28 1 13.30 1.72 2.14 17.0 94 2.40 2.19 0.27 1.35 3.950000 1.020 2.77 1285
29 1 13.87 1.90 2.80 19.4 107 2.95 2.97 0.37 1.76 4.500000 1.250 3.40 915
30 1 14.02 1.68 2.21 16.0 96 2.65 2.33 0.26 1.98 4.700000 1.040 3.59 1035
31 1 13.73 1.50 2.70 22.5 101 3.00 3.25 0.29 2.38 5.700000 1.190 2.71 1285
32 1 13.58 1.66 2.36 19.1 106 2.86 3.19 0.22 1.95 6.900000 1.090 2.88 1515
33 1 13.68 1.83 2.36 17.2 104 2.42 2.69 0.42 1.97 3.840000 1.230 2.87 990
34 1 13.76 1.53 2.70 19.5 132 2.95 2.74 0.50 1.35 5.400000 1.250 3.00 1235
35 1 13.51 1.80 2.65 19.0 110 2.35 2.53 0.29 1.54 4.200000 1.100 2.87 1095
36 1 13.48 1.81 2.41 20.5 100 2.70 2.98 0.26 1.86 5.100000 1.040 3.47 920
37 1 13.28 1.64 2.84 15.5 110 2.60 2.68 0.34 1.36 4.600000 1.090 2.78 880
38 1 13.05 1.65 2.55 18.0 98 2.45 2.43 0.29 1.44 4.250000 1.120 2.51 1105
39 1 13.07 1.50 2.10 15.5 98 2.40 2.64 0.28 1.37 3.700000 1.180 2.69 1020
40 1 14.22 3.99 2.51 13.2 128 3.00 3.04 0.20 2.08 5.100000 0.890 3.53 760
41 1 13.56 1.71 2.31 16.2 117 3.15 3.29 0.34 2.34 6.130000 0.950 3.38 795
42 1 13.41 3.84 2.12 18.8 90 2.45 2.68 0.27 1.48 4.280000 0.910 3.00 1035
43 1 13.88 1.89 2.59 15.0 101 3.25 3.56 0.17 1.70 5.430000 0.880 3.56 1095
44 1 13.24 3.98 2.29 17.5 103 2.64 2.63 0.32 1.66 4.360000 0.820 3.00 680
45 1 13.05 1.77 2.10 17.0 107 3.00 3.00 0.28 2.03 5.040000 0.880 3.35 885
46 1 14.21 4.04 2.44 18.9 111 2.85 2.65 0.30 1.25 5.240000 0.870 3.33 1080
47 1 14.38 3.59 2.28 16.0 102 3.25 3.17 0.27 2.19 4.900000 1.040 3.44 1065
48 1 13.90 1.68 2.12 16.0 101 3.10 3.39 0.21 2.14 6.100000 0.910 3.33 985
49 1 14.10 2.02 2.40 18.8 103 2.75 2.92 0.32 2.38 6.200000 1.070 2.75 1060
50 1 13.94 1.73 2.27 17.4 108 2.88 3.54 0.32 2.08 8.900000 1.120 3.10 1260
51 1 13.05 1.73 2.04 12.4 92 2.72 3.27 0.17 2.91 7.200000 1.120 2.91 1150
52 1 13.83 1.65 2.60 17.2 94 2.45 2.99 0.22 2.29 5.600000 1.240 3.37 1265
53 1 13.82 1.75 2.42 14.0 111 3.88 3.74 0.32 1.87 7.050000 1.010 3.26 1190
54 1 13.77 1.90 2.68 17.1 115 3.00 2.79 0.39 1.68 6.300000 1.130 2.93 1375
55 1 13.74 1.67 2.25 16.4 118 2.60 2.90 0.21 1.62 5.850000 0.920 3.20 1060
56 1 13.56 1.73 2.46 20.5 116 2.96 2.78 0.20 2.45 6.250000 0.980 3.03 1120
57 1 14.22 1.70 2.30 16.3 118 3.20 3.00 0.26 2.03 6.380000 0.940 3.31 970
58 1 13.29 1.97 2.68 16.8 102 3.00 3.23 0.31 1.66 6.000000 1.070 2.84 1270
59 1 13.72 1.43 2.50 16.7 108 3.40 3.67 0.19 2.04 6.800000 0.890 2.87 1285
60 2 12.37 0.94 1.36 10.6 88 1.98 0.57 0.28 0.42 1.950000 1.050 1.82 520
61 2 12.33 1.10 2.28 16.0 101 2.05 1.09 0.63 0.41 3.270000 1.250 1.67 680
62 2 12.64 1.36 2.02 16.8 100 2.02 1.41 0.53 0.62 5.750000 0.980 1.59 450
63 2 13.67 1.25 1.92 18.0 94 2.10 1.79 0.32 0.73 3.800000 1.230 2.46 630
64 2 12.37 1.13 2.16 19.0 87 3.50 3.10 0.19 1.87 4.450000 1.220 2.87 420
65 2 12.17 1.45 2.53 19.0 104 1.89 1.75 0.45 1.03 2.950000 1.450 2.23 355
66 2 12.37 1.21 2.56 18.1 98 2.42 2.65 0.37 2.08 4.600000 1.190 2.30 678
67 2 13.11 1.01 1.70 15.0 78 2.98 3.18 0.26 2.28 5.300000 1.120 3.18 502
68 2 12.37 1.17 1.92 19.6 78 2.11 2.00 0.27 1.04 4.680000 1.120 3.48 510
69 2 13.34 0.94 2.36 17.0 110 2.53 1.30 0.55 0.42 3.170000 1.020 1.93 750
70 2 12.21 1.19 1.75 16.8 151 1.85 1.28 0.14 2.50 2.850000 1.280 3.07 718
71 2 12.29 1.61 2.21 20.4 103 1.10 1.02 0.37 1.46 3.050000 0.906 1.82 870
[ reached getOption("max.print") -- omitted 107 rows ]
Try this command:
options(max.print = 99999)
Note that max.print limits the total number of values printed, not the number of rows: with 14 columns, 71 rows amount to 994 values, which is all that fits under your limit of 1000.
Alternatively, type this code at the start of your R script; it worked for me:
options(max.print = .Machine$integer.max)
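A sketch of another option, sizing the limit to the object being printed (df stands for your 178 x 14 data frame):
options(max.print = nrow(df) * ncol(df))   # 178 * 14 = 2492 values
print(df)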

How to match IDs across 2 data frames and run operations in an R loop?

I have 2 data frames, the sampling ("samp") and the coordinates ("coor").
The "samp" data frame:
Plot X Y H L
1 6.4 0.6 3.654 0.023
1 19.1 9.3 4.998 0.023
1 2.4 4.2 5.568 0.024
1 16.1 16.7 5.32 0.074
1 10.8 15.8 6.58 0.026
1 1 16 4.968 0.023
1 9.4 12.4 6.804 0.078
2 3.6 0.4 4.3 0.038
3 12.2 19.9 7.29 0.028
3 2 18.2 7.752 0.028
3 6.5 19.9 7.2 0.028
3 3.7 13.8 5.88 0.042
3 4.9 10.3 9.234 0.061
3 3.7 13.8 5.88 0.042
3 4.9 10.3 9.234 0.061
4 16.3 2.4 5.18 0.02
4 15.7 9.8 10.92 0.096
4 6 12.6 6.96 0.16
5 19.4 16.4 8.2 0.092
10 4.8 5.16 7.38 1.08
11 14.7 16.2 16.44 0.89
11 19 19 10.2 0.047
12 10.8 2.7 19.227 1.2
14 0.6 6.4 12.792 0.108
14 4.6 1.9 12.3 0.122
15 12.2 18 9.6 0.034
16 13 18.3 4.55 0.021
The "coor" data frame:
Plot X Y
1 356154.007 501363.546
2 356154.797 501345.977
3 356174.697 501336.114
4 356226.469 501336.816
5 356255.24 501352.714
10 356529.313 501292.4
11 356334.895 501320.725
12 356593.271 501255.297
14 356350.029 501314.385
15 356358.81 501285.955
16 356637.29 501227.297
17 356652.157 501263.238
18 356691.68 501262.403
19 356755.386 501242.501
20 356813.735 501210.59
22 356980.118 501178.974
23 357044.996 501168.859
24 357133.365 501158.418
25 357146.781 501158.866
26 357172.485 501161.646
I wish to run "for loop" function to register the "samp" data frame with the GPS coordinates from the "coor" data frame -- e.g. the "new_x" variable is the sum output of "X" from the "samp" and the "coor" , under the same "Plot" IDs.
This is what i tried but not working.
for (i in 1:nrow(samp)){
if (samp$Plot[i]==coor$Plot[i]){
(samp$new_x[i]<-(coor$X[i] + samp$X[i]))
} else (samp$new_x[i]<-samp$X[i])
}
The final output I wish to have is the "samp" data frame with a proper coordinate variable ("new_x") added. It should look like this:
Plot X Y H L new_x
1 6.4 0.6 3.654 0.023 356160.407
1 19.1 9.3 4.998 0.023 356173.107
1 2.4 4.2 5.568 0.024 356156.407
1 16.1 16.7 5.32 0.074 356170.107
1 10.8 15.8 6.58 0.026 356164.807
1 1 16 4.968 0.023 356155.007
1 9.4 12.4 6.804 0.078 356163.407
2 3.6 0.4 4.3 0.038 356158.397
3 12.2 19.9 7.29 0.028 356186.897
3 2 18.2 7.752 0.028 356176.697
3 6.5 19.9 7.2 0.028 356181.197
3 3.7 13.8 5.88 0.042 356178.397
3 4.9 10.3 9.234 0.061 356179.597
3 3.7 13.8 5.88 0.042 356178.397
3 4.9 10.3 9.234 0.061 356179.597
4 16.3 2.4 5.18 0.02 356242.769
4 15.7 9.8 10.92 0.096 356242.169
4 6 12.6 6.96 0.16 356232.469
5 19.4 16.4 8.2 0.092 356274.64
10 4.8 5.16 7.38 1.08 356534.113
11 14.7 16.2 16.44 0.89 356349.595
11 19 19 10.2 0.047 356353.895
Any suggestion will be appreciated. Thanks.
Your loop fails because it compares samp$Plot[i] with coor$Plot[i] at the same row index i, but the two data frames have different numbers of rows, so their rows do not line up. Instead, you could merge the two datasets on Plot and create a new column by summing the X.x and X.y variables.
res <- transform(merge(samp, coor, by='Plot'), new_x=X.x+X.y)[,-c(6:7)]
colnames(res) <- colnames(out) # `out` is the expected result shown above
all.equal(res[1:22,], out, check.attributes=FALSE)
#[1] TRUE
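A base R alternative (a sketch, assuming every Plot in samp also appears in coor) that avoids the merge and keeps samp's row order:
# look up the matching row of coor for each Plot in samp, then add the offsets
idx <- match(samp$Plot, coor$Plot)
samp$new_x <- samp$X + coor$X[idx]
samp$new_y <- samp$Y + coor$Y[idx]   # same pattern for the Y coordinate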
