How do I get rid of commas and periods, etc. in R? [duplicate]

This is my data set:
Depth.Fe
1 0,14.21
2 3,19.35
3 10,17.22
4 14,15.87
5 23,13.62
6 30,16.31
7 36,14.13
8 48,13.95
9 59,15
10 66,14.23
11 68,16.81
12 81,15.93
13 94,16.02
14 96,17.85
15 102,17.02
16 115,15.87
17 121,19.84
18 130,16.94
19 163,16.72
20 168,19.2
21 205,20.41
22 239,16.88
23 251,18.74
24 283,16.67
25 297,18.56
26 322,18.87
27 335,20.81
28 351,24.52
29 370,25.03
30 408,25.11
31 416,23.28
32 419,22.56
33 425,19
34 429,20.53
35 443,19.08
36 447,22.83
37 465,21.06
38 474,24.96
39 493,19.12
40 502,22.24
41 522,26.88
42 550,21.15
43 558,28.92
44 571,27.96
45 586,25.03
46 596,26.27
I want Depth and Fe to be separated into individual columns, but nothing I try is working.
Please help.

First of all, @akrun is definitely right in his comment on your post. If this dataset was imported from somewhere, follow his comment.
Assuming that you were somehow handed this weird dataset as is, I would try this:
df <- data.frame(matrix(as.numeric(unlist(strsplit(as.character(df$Depth.Fe), split = ","))), ncol = 2, byrow = TRUE), stringsAsFactors = FALSE)
colnames(df) <- c("Depth","Fe")
This would take a dataset that looks like this:
Depth.Fe
1 0,14.21
2 3,19.35
to this:
Depth Fe
1 0 14.21
2 3 19.35
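As an alternative to splitting by hand, the strings can be re-read as comma-separated text. This is only a minimal sketch: the three-row `df` below is toy data copied from the question, and `out` is a name made up for the illustration.

```r
# Toy data in the same shape as the question: one column of "Depth,Fe" strings
df <- data.frame(Depth.Fe = c("0,14.21", "3,19.35", "10,17.22"),
                 stringsAsFactors = FALSE)

# Re-read the strings as comma-separated text, one row per string
out <- read.table(text = df$Depth.Fe, sep = ",",
                  col.names = c("Depth", "Fe"))
out
```

If the column was imported as a factor, wrap it in as.character() before passing it to read.table().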

Related

Yahoo financial data in r with zoo [closed]

Hello, I want to import a .csv file in R. I have the following code:
fbmonthly<-read.zoo("E:\\R\\Stockforecast\\Data\\AAPLmonthly.csv",sep=",",header= TRUE, format = '%m/%Y', FUN=as.Date)
But I get this error:
Error in read.zoo("E:\R\Stockforecast\Data\AAPLmonthly.csv", sep = ",", :
index has bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
My CSV file looks like this:
01-03-13 0 0 0 0 0 0
01-04-13 63.128571 63.607143 55.014286 63.254284 47.519821 2740872400
01-05-13 63.494286 66.535713 59.842857 64.247147 48.265705 2361882600
01-06-13 64.389999 64.918571 55.552856 56.647144 44.609531 1754634000
01-07-13 57.527142 65.334282 57.317142 64.647141 50.909512 1634528700
01-08-13 65.10714 73.391426 64.751427 69.602859 54.812115 2014584600
01-09-13 70.442856 72.559998 63.888573 68.10714 56.215424 2157735300
01-10-13 68.349998 77.035713 68.325714 74.671425 61.633572 1959433000
01-11-13 74.860001 79.761429 73.197144 79.438568 65.568352 1306288900
01-12-13 79.714287 82.162857 76.971428 80.145714 68.953758 1764349300
01-01-14 79.382858 80.028572 70.507141 71.514282 61.527653 2191488600
01-02-14 71.80143 78.741432 71.328575 75.177139 64.679031 1470091700
01-03-14 74.774284 78.428574 74.687141 76.677139 68.836685 1250424700
01-04-14 76.822861 85.632858 73.047142 84.298569 75.678795 1608765200
01-05-14 84.571426 92.024284 82.904289 90.428574 81.181992 1433917100
01-06-14 90.565712 95.050003 88.928574 92.93 86.802559 1206934800
01-07-14 93.519997 99.440002 92.57 95.599998 89.296494 1035086000
01-08-14 94.900002 102.900002 93.279999 102.5 95.741524 937077000
Can you please help me?
Thanks
It's not a "CSV" file. Its delimiter appears to be whitespace, which is what read.zoo uses by default. There is no header, which is also the default for read.zoo. You need to correct the date format:
read.zoo(text="01-03-13 0 0 0 0 0 0
01-04-13 63.128571 63.607143 55.014286 63.254284 47.519821 2740872400",
format = '%m-%d-%y')
V2 V3 V4 V5 V6 V7
2013-01-03 0.00000 0.00000 0.00000 0.00000 0.00000 0
2013-01-04 63.12857 63.60714 55.01429 63.25428 47.51982 2740872400
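A quick way to check a candidate format string before re-reading the file is to try it on a few dates with as.Date. The sketch below is only an illustration. The guess that the dates might be day-month-year (the file name says "monthly" and the first field is always 01) is an assumption, not something the question states.

```r
# Three date strings copied from the sample rows in the question
x <- c("01-03-13", "01-04-13", "01-12-13")

# Month-day-year with a two-digit year: 3, 4 and 12 January 2013
mdy <- as.Date(x, format = "%m-%d-%y")

# Day-month-year with a two-digit year: 1 March, 1 April, 1 December 2013
dmy <- as.Date(x, format = "%d-%m-%y")

format(mdy)
format(dmy)
```

Note that %Y expects a four-digit year, so it does not read "13" as 2013; %y is the two-digit form.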
If you do have a header and a comma-separated file, it seems only the date format is wrong (note %y rather than %Y, since the years have two digits):
fbmonthly<-read.zoo("E:\\R\\Stockforecast\\Data\\AAPLmonthly.csv",sep=",",header= TRUE, format = '%d-%m-%y', FUN=as.Date)
So you are saying I should use fbmonthly<-read.zoo("E:\\R\\Stockforecast\\Data\\AAPLmonthly.csv",sep=".",header= TRUE, format = '%m-%Y', FUN=as.yearmon)
instead?

Moran's correlogram with only one point. What is wrong?

I'm trying to compute Moran's I and the corresponding plot in R, but the plot has only one point. I have no idea what is going wrong. The code is based on
http://rstudio-pubs-static.s3.amazonaws.com/9688_a49c681fab974bbca889e3eae9fbb837.html
My data, called "coordenata":
resid x y
1 0.07785411 -53.20342 -22.66700
2 -0.28358702 -53.20389 -22.66864
3 -0.64011338 -53.21392 -22.68122
4 1.22071249 -53.21311 -22.72369
5 0.95734778 -53.28469 -22.75289
6 0.35345302 -53.25822 -22.74850
7 -0.68357738 -53.28344 -22.70694
8 -1.24596010 -53.32950 -22.72872
9 -0.19944162 -53.33669 -22.73561
10 0.67544909 -53.36756 -22.80767
11 0.64002961 -53.35947 -22.79958
12 0.04564233 -53.21889 -22.67419
13 0.01618436 -53.24522 -22.70144
14 -2.65436794 -53.23017 -22.69292
15 0.72096256 -53.25539 -22.69978
16 0.89656515 -53.28489 -22.72222
17 1.85358579 -53.33069 -22.79161
18 -0.03590077 -53.33200 -22.78336
19 0.32348975 -53.33494 -22.78586
20 2.06771402 -53.37781 -22.77869
21 -1.02190709 -53.30492 -22.77244
22 -2.02813250 -53.53917 -22.79856
23 -1.20702445 -53.53858 -22.79406
24 -1.24091732 -53.55272 -22.80536
25 -1.13491596 -53.56181 -22.82914
26 -0.82934613 -53.56422 -22.83417
27 1.23418758 -53.60017 -22.85531
28 -1.72808514 -53.65900 -22.97828
29 -0.02144049 -53.65908 -22.97497
30 0.49174568 -53.64597 -22.95439
31 -0.54408149 -53.64217 -22.91033
32 -0.37111342 -53.61447 -22.86269
33 -0.31121931 -53.27153 -22.70036
34 0.32419211 -53.30308 -22.72183
35 1.57980287 -53.33053 -22.72947
36 -1.91156060 -53.34633 -22.74722
37 -0.79036645 -53.23667 -22.68925
The code:
coordinates(coordenata)<-c("x","y")
fit2<-correlog(coordenata$x,coordenata$y,coordenata$resid,increment=5,resamp=100,quiet=T)
plot(fit2)
Thanks in advance for any help!

Plotting a two-column data frame with dates

I'm having trouble putting this little data.frame into a plot. I use the plot() function, but it just gives me back a plot whose x-axis is not the date in the first column.
> DDDhabd
Mes DDD.1000hab.día
1 Ene-14 0.03564701
2 Feb-14 0.03959695
3 Mar-14 0.04677090
4 Abr-14 0.04928782
5 May-14 0.03783808
6 Jun-14 0.04939231
7 Jul-14 0.05464189
8 Ago-14 0.05208003
9 Set-14 0.05475650
10 Oct-14 0.05290589
11 Nov-14 0.05714252
12 Dic-14 0.05056313
13 Ene-15 0.05688352
14 Feb-15 0.05710022
15 Mar-15 0.05754084
16 Abr-15 0.04362755
17 May-15 0.06209153
18 Jun-15 0.05715994
19 Jul-15 0.04373711
20 Ago-15 0.02462424
21 Set-15 0.03812404
22 Oct-15 0.08368198
23 Nov-15 0.07506378
24 Dic-15 0.05974877
I would really appreciate it if you could give me a hint about where my mistake is.
Thanks
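One possible approach, sketched under assumptions: the Mes column holds text such as "Ene-14", which plot() cannot treat as a date, so it may help to convert it to a Date first. The lookup vector `meses`, the shortened column name `DDD`, the new column `fecha`, and the choice of the first day of each month are all made up for this illustration. Only three rows of the data are reproduced.

```r
# Lookup from the Spanish month abbreviations used in the data to month numbers
meses <- c(Ene = 1L, Feb = 2L, Mar = 3L, Abr = 4L, May = 5L, Jun = 6L,
           Jul = 7L, Ago = 8L, Set = 9L, Oct = 10L, Nov = 11L, Dic = 12L)

# First three rows of the question's data, with a shortened column name
DDDhabd <- data.frame(Mes = c("Ene-14", "Feb-14", "Mar-14"),
                      DDD = c(0.03564701, 0.03959695, 0.04677090),
                      stringsAsFactors = FALSE)

# Split "Ene-14" into month abbreviation and two-digit year
parts <- strsplit(DDDhabd$Mes, "-")
mon <- unname(meses[sapply(parts, `[`, 1)])
yr <- 2000L + as.integer(sapply(parts, `[`, 2))

# Build a Date on the first day of each month and plot against it
DDDhabd$fecha <- as.Date(sprintf("%d-%02d-01", yr, mon))
plot(DDDhabd$fecha, DDDhabd$DDD, type = "b", xlab = "Mes", ylab = "DDD")
```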

R efficiently add up tables in different order

At some point in my code, I get a list of tables that looks much like this:
[[1]]
cluster_size start end number p_value
13 2 12 13 131 4.209645e-233
12 1 12 12 100 6.166824e-185
22 11 12 22 132 6.916323e-143
23 12 12 23 133 1.176194e-139
13 1 13 13 31 3.464284e-38
13 68 13 117 34 3.275941e-37
23 78 23 117 2 4.503111e-32
....
[[2]]
cluster_size start end number p_value
13 2 12 13 131 4.209645e-233
12 1 12 12 100 6.166824e-185
22 11 12 22 132 6.916323e-143
23 12 12 23 133 1.176194e-139
13 1 13 13 31 3.464284e-38
....
While I don't show the full tables here, I know they are all the same size. What I want to do is make one table where I add up the p-values. The problem is that the $cluster_size, $start, $end and $number columns don't necessarily correspond to the same row when I look at the table in different list elements, so I can't just do a simple sum.
The brute force way to do this is to: 1) make a blank table 2) copy in the appropriate $cluster_size, $start, $end, $number columns from the first table and pull the correct p-values using a which() statement from all the tables. Is there a more clever way of doing this? Or is this pretty much it?
Edit: I was asked for a dput file of the data. It's located here:
http://alrig.com/code/
In the sample case, the order of the rows happens to match. That will not always be the case.
Seems like you can do this in two steps:
1) Convert your list to a data.frame
2) Use any of the split-apply-combine approaches to summarize.
Assuming your data was named X, here's what you could do:
library(plyr)
#need to convert to data.frame since all of your list objects are of class matrix
XDF <- as.data.frame(do.call("rbind", X))
ddply(XDF, .(cluster_size, start, end, number), summarize, sump = sum(p_value))
#-----
cluster_size start end number sump
1 1 12 12 100 5.550142e-184
2 1 13 13 31 3.117856e-37
3 1 22 22 1 9.000000e+00
...
29 105 23 117 2 6.271469e-16
30 106 22 146 13 7.266746e-25
31 107 23 146 12 1.382328e-25
Lots of other aggregation techniques are covered here. I'd look at data.table package if your data is large.
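For reference, the same two steps can be sketched with base R's aggregate() instead of plyr. This is an illustration only: the list `X` below is toy data with invented p-values, deliberately stored in a different row order in each element.

```r
cols <- c("cluster_size", "start", "end", "number", "p_value")

# Two toy matrices with the same rows in a different order (p-values invented)
X <- list(
  matrix(c(2, 12, 13, 131, 1e-3,
           1, 12, 12, 100, 2e-3),
         ncol = 5, byrow = TRUE, dimnames = list(NULL, cols)),
  matrix(c(1, 12, 12, 100, 5e-3,
           2, 12, 13, 131, 4e-3),
         ncol = 5, byrow = TRUE, dimnames = list(NULL, cols))
)

# Step 1: stack the matrices and convert to a data.frame
XDF <- as.data.frame(do.call(rbind, X))

# Step 2: sum p_value within each combination of the four key columns
res <- aggregate(p_value ~ cluster_size + start + end + number,
                 data = XDF, FUN = sum)
res
```

Because the grouping is done on the key columns, the row order inside each list element does not matter.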

R sorts a vector on its own accord

df.sorted <- c("binned_walker1_1.grd", "binned_walker1_2.grd", "binned_walker1_3.grd",
"binned_walker1_4.grd", "binned_walker1_5.grd", "binned_walker1_6.grd",
"binned_walker2_1.grd", "binned_walker2_2.grd", "binned_walker3_1.grd",
"binned_walker3_2.grd", "binned_walker3_3.grd", "binned_walker3_4.grd",
"binned_walker3_5.grd", "binned_walker4_1.grd", "binned_walker4_2.grd",
"binned_walker4_3.grd", "binned_walker4_4.grd", "binned_walker4_5.grd",
"binned_walker5_1.grd", "binned_walker5_2.grd", "binned_walker5_3.grd",
"binned_walker5_4.grd", "binned_walker5_5.grd", "binned_walker5_6.grd",
"binned_walker6_1.grd", "binned_walker7_1.grd", "binned_walker7_2.grd",
"binned_walker7_3.grd", "binned_walker7_4.grd", "binned_walker7_5.grd",
"binned_walker8_1.grd", "binned_walker8_2.grd", "binned_walker9_1.grd",
"binned_walker9_2.grd", "binned_walker9_3.grd", "binned_walker9_4.grd",
"binned_walker10_1.grd", "binned_walker10_2.grd", "binned_walker10_3.grd")
One would expect that the order of this vector would be 1:length(df.sorted), but that appears not to be the case. It looks like R internally sorts the vector according to its own logic but tries really hard to display it the way it was created (as seen in the output).
order(df.sorted)
[1] 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[26] 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Is there a way to "reset" the ordering to 1:length(df.sorted)? That way, ordering, and the output of the vector would be in sync.
Use the mixedsort or mixedorder functions in package gtools:
require(gtools)
mixedorder(df.sorted)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
[28] 28 29 30 31 32 33 34 35 36 37 38 39
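If installing gtools is not an option, a similar ordering can be sketched in base R by pulling the two numbers out of each name and ordering on them as integers. This assumes every name follows the binned_walker<number>_<number>.grd pattern shown in the question; the vector `files` and the variables `walker` and `run` are made up for the example.

```r
# A few names in the same pattern as the question, deliberately out of order
files <- c("binned_walker1_1.grd", "binned_walker10_1.grd",
           "binned_walker2_1.grd", "binned_walker1_2.grd")

pattern <- ".*walker([0-9]+)_([0-9]+)\\.grd$"

# Extract the walker number and the run number as integers
walker <- as.integer(sub(pattern, "\\1", files))
run <- as.integer(sub(pattern, "\\2", files))

# Order numerically by walker, then by run
ord <- order(walker, run)
files[ord]
```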
Construct it as an ordered factor:
> df.new <- ordered(df.sorted,levels=df.sorted)
> order(df.new)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
EDIT :
After @DWin's comment, I want to add that it is not even necessary to make it an ordered factor; a plain factor is enough if you give the right order of levels:
> df.new2 <- factor(df.sorted,levels=df.sorted)
> order(df.new2)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
The difference will be noticeable when you use those factors in a regression analysis, where they can be treated differently. The advantage of ordered factors is that they let you use comparison operators such as < and >. This sometimes makes life a lot easier.
> df.new2[5] < df.new2[10]
[1] NA
Warning message:
In Ops.factor(df.new2[5], df.new2[10]) : < not meaningful for factors
> df.new[5] < df.new[10]
[1] TRUE
Isn't this simply the same thing you get with all lexicographic sorts (e.g. ls on directories), where walker10_foo sorts ahead of walker1_foo?
The easiest way around, in my book, is to use a consistent number of digits, i.e. I would change to binned_walker01_1.grd and so on inserting a 0 for the one-digit counts.
In response to Dwin's comment on Dirk's answer: the data are always putty in your hands. "This is R. There is no if. Only how." -- Simon Blomberg
You can add 0 like so:
df.sorted <- gsub("(walker)([[:digit:]]{1}_)", "\\10\\2", df.sorted)
If you needed to add 00, you do it like this:
df.sorted <- gsub("(walker)([[:digit:]]{1}_)", "\\10\\2", df.sorted)
df.sorted <- gsub("(walker)([[:digit:]]{2}_)", "\\10\\2", df.sorted)
...and so on.
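Rather than chaining one gsub per digit count, the padding can also be sketched as a small helper that takes the target width as an argument. `pad_walker` is a hypothetical name, and the helper assumes every element contains "walker", then digits, then an underscore.

```r
# Hypothetical helper: zero-pad the walker number to a fixed width
pad_walker <- function(x, width = 2) {
  # Capture the text up to "walker", the digits, and the rest of the name
  parts <- regmatches(x, regexec("^(.*walker)([0-9]+)(_.*)$", x))
  vapply(parts, function(p) {
    paste0(p[2], formatC(as.integer(p[3]), width = width, flag = "0"), p[4])
  }, character(1))
}

padded <- pad_walker(c("binned_walker1_1.grd", "binned_walker10_3.grd"), width = 3)
padded
```

With the numbers padded to the same width, a plain sort() or order() on the names gives the numeric order.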

Resources