Swap row 1-22 with 23-48 - r

sampleFiles <- list.files(path="/path",pattern="*.txt");
> sampleFiles
[1] "D104.txt" "D121.txt" "D153.txt" "D155.txt" "D161.txt" "D162.txt" "D167.txt"
[8] "D173.txt" "D176.txt" "D177.txt" "D179.txt" "D204.txt" "D221.txt" "D253.txt"
[15] "D255.txt" "D261.txt" "D262.txt" "D267.txt" "D273.txt" "D276.txt" "D277.txt"
[22] "D279.txt" "N101.txt" "N108.txt" "N113.txt" "N170.txt" "N171.txt" "N172.txt"
[29] "N175.txt" "N181.txt" "N182.txt" "N183.txt" "N186.txt" "N187.txt" "N188.txt"
[36] "N201.txt" "N208.txt" "N213.txt" "N270.txt" "N271.txt" "N272.txt" "N275.txt"
[43] "N281.txt" "N282.txt" "N283.txt" "N286.txt" "N287.txt" "N288.txt"
How can I get all started with "N" first and "D" last? In other words swap them.

If you want to sort by letter (N, D) and number (101, ..) you could -just- swap your elements:
#random vector
vec <- c("D104.txt", "D121.txt", "D279.txt", "N101.txt", "N108.txt", "N113.txt")
#swap places
vec[c(grep("N", vec), grep("D", vec))]
[1] "N101.txt" "N108.txt" "N113.txt" "D104.txt" "D121.txt" "D279.txt"
grep finds what element of the vector has the pattern wanted. So, we move elements with "N" in front and with "D" in the back.
If you just want to sort with letters and numbers decreasing, you just (like Thomas suggested):
sort(vec, decreasing = T)
[1] "N113.txt" "N108.txt" "N101.txt" "D279.txt" "D121.txt" "D104.txt"
Also, since you know the indices of the elements you want to swap, then:
sampleFiles[c(23:48, 1:22)]

In this case it would be as simple as:
sampleFiles[c(23:48, 1:22)]
More general solutions have been suggested including, but sort(sampleFiles) would NOT succeed with "D" < "N". You could have used:
sampleFiles[rev(order(substr(sampleFiles, 1,1)))]
If you just used:
sampleFiles[rev(order(sampleFiles, 1,1))]
.. then the numeric values would get reversed as well. So you could have used chartr to swap them as the argument to order to selectively reverse the values of only "D" and "N":
sampleFiles[ order( chartr(c("DN"), c("ND"), sampleFiles) ) ]

Related

Substitution of strings results in incorrect names

I,d like to change several strings in vector. In my case, I have in all.images object:
# Original character's list
all.images <-c("S2B2A_20171003_124_IndianaIIPR00911120170922_BOA_10.tif",
"S2B2A_20181028_124_IndianaIIPR0065820181024_BOA_10.tif",
"S2B2A_20170715_124_SantaMariaCalcasPR0033420170731_BOA_10.tif",
"S2B2A_20180928_124_NSraAparecidaBortolettoPR0042720180912_BOA_10.tif",
"S2A2A_20170610_124_LagoaAmarelaPR0022020170619_BOA_10.tif",
"S2A2A_20160705_124_AguaSumidaPR001320160629_BOA_10.tif",
"S2A2A_20181023_124_SaoPedroGabrielGarciaPR001720181031_BOA_10.tif",
"S2B2A_20180908_124_NSraAparecidaBortolettoPR001920180911_BOA_10.tif",
"S2A2A_20180824_124_NSraAparecidaBortolettoPR0043320180911_BOA_10.tif",
"S2A2A_20170720_124_VoAnaPR001520170802_BOA_10.tif",
"S2B2A_20180322_124_SaoMateusPR0021920180314_BOA_10.tif",
"S2A2A_20181212_124_NSradeFatimaJoaoBatistaPR002320181128_BOA_10.tif",
"S2A2A_20180413_081_SantaFeSebastiaoFogacaPR0021920180427_BOA_10.tif",
"S2B2A_20170913_124_PerdizesPR0034920170905_BOA_10.tif",
"S2A2A_20170610_124_TresMeninasPR001820170601_BOA_10.tif",
"S2B2A_20180428_081_SantaFeSebastiaoFogacaPR0021020180501_BOA_10.tif",
"S2B2A_20180508_081_SantaFeSebastiaoFogacaPR0022320180427_BOA_10.tif",
"S2A2A_20170809_124_VoAnaPR001620170803_BOA_10.tif",
"S2B2A_20180819_124_PontalIIPR0012220180801_BOA_10.tif",
"S2B2A_20181214_081_NSradeFatimaJoaoBatistaPR002320181128_BOA_10.tif",
"S2A2A_20180423_081_SantaFeSebastiaoFogacaPR0033920180427_BOA_10.tif",
"S2A2A_20180814_124_PontalIIPR0012220180801_BOA_10.tif",
"S2B2A_20170715_124_VoAnaPR0015A20170803_BOA_10.tif",
"S2A2A_20160615_124_AguaSumidaPR0011220160627_BOA_10.tif",
"S2A2A_20170720_124_SantaMariaCalcasPR0022820170726_BOA_10.tif",
"S2A2A_20180913_124_SantaMariaCalcasPR001620180829_BOA_10.tif",
"S2B2A_20170804_124_NSraAparecidaBortolettoPR0035720170811_BOA_10.tif",
"S2A2A_20170809_124_SantaFeBaracatPR001920170801_BOA_10.tif",
"S2B2A_20180322_124_NSradeFatimaGlebaAPR001320180403_BOA_10.tif",
"S2B2A_20180508_081_SantaFeSebastiaoFogacaPR0021920180427_BOA_10.tif")
#
My idea is 1) remove S2B2A_ and _BOA_10.tif; 2) After S2B2A_ convert the 8 values into dates (e.g. 2017-09-05); 3) After the dates take the next three
values to the end (eg. 124 or 081); and 4) Separate the characters based in capital letters and dates (eg. AguaSumidaPR0011220160627 to AguaSumida-PR00112-2016-06-27).
But when I try to do:
sub("^\\w+_(\\d+)_(\\d+)_([A-Za-z]+)([A-Z]{2}\\d{3})(\\d)(\\d{4})(\\d{2})(\\d+)_.*",
"\\3_\\4_\\5_\\6-\\7-\\8_\\1_\\2", all.images)
[1] "IndianaII_PR009_1_1120-17-0922_20171003_124"
[2] "IndianaII_PR006_5_8201-81-024_20181028_124"
...
[28] "SantaFeBaracat_PR001_9_2017-08-01_20170809_124"
[29] "NSradeFatimaGlebaA_PR001_3_2018-04-03_20180322_124"
[30] "SantaFeSebastiaoFogaca_PR002_1_9201-80-427_20180508_081"
I have incorrected dates (eg. in [30] 9201-80-427_20180508_081) and my desirable output needs to be:
[1] "IndianaII_PR009111_2017-09-22_2017-10-03_124"
[2] "IndianaII_PR00658_2018-10-24_2018-10-28_124"
...
[28] "SantaFeBaracat_PR0019_2017-08-01_2017-08-09_124"
[29] "NSradeFatimaGlebaA_PR0013_2018-04-03_2018-03-22_124"
[30] "SantaFeSebastiaoFogaca_PR00219_2018-04-27_2018-05-08_081"
Please any help with it?
I think this handles those exceptions in the comments on your answer using look ahead:
sub("^\\w+_(\\d{4})(\\d{2})(\\d{2})_(\\d+)_([A-Za-z]+)([A-Z]{2}\\w+)(?=\\d{8})+(\\d{4})(\\d{2})(\\d+)_.*",
"\\5_\\6_\\7-\\8-\\9_\\1-\\2-\\3_\\4", all.images, perl = TRUE)

shift a variable one position down

In R, I have a variable called test which has 19 elements
> test
[1] 2014538.23 4487086.00 1334284.39 -1043651.88 -2717872.52 7823769.24 -3362387.51 2769196.46
[9] -3252671.72 -3799388.26 -91410.81 1631932.15 6462360.52 -4523175.28 4876797.43 -1900613.35
[17] 188371.84 484573.51 -2483920.48
and I would like to move all elements down by one position, and the first element would then be NA, increasing the total elements to 20.
If I try:
lag(test,n=1)
I get the following elements:
> lag(test,n=1)
[1] NA 2014538.23 4487086.00 1334284.39 -1043651.88 -2717872.52 7823769.24 -3362387.51
[9] 2769196.46 -3252671.72 -3799388.26 -91410.81 1631932.15 6462360.52 -4523175.28 4876797.43
[17] -1900613.35 188371.84 484573.51
which are still 19. How can I implement this?
You basically want to add NA not shift your data with a lag. In this case you can just concatenate NA in your vector, i.e.
c(NA, test)
You can use below code-
> append(values=NA,x=test,after=0)
Note: You can use after parameter in the above function to provide position at which value is to be appended.
Input Data:
> test <- c(2014538.23 , 4487086.00 , 1334284.39 ,-1043651.88 ,-2717872.52 , 7823769.24 ,-3362387.51 , 2769196.46,
-3252671.72 ,-3799388.26 , -91410.81 , 1631932.15 ,6462360.52, -4523175.28 , 4876797.43 ,-1900613.35,
188371.84 , 484573.51 ,-2483920.48)

Cannot index a list with vector[1] of same class as direct character (where the latter can index)

Sorry for the confusing title:
I have a list, y, that is indexed by characters, e.g. y$"100"$VALUE returns some vector that I am looking for.
However, when I store a vector, list.index = as.character(...), where "..." is a bunch of numbers, I can get list.index[1] == "100" to return TRUE, but typing y$list.index[1]$VALUE does not return the vector I saw above, but instead returns NULL.
I am very confused, since I wouldn't think it would matter if the blank in y$_____$VALUE were the "100" or list.index[1] since the two are equivalent. They are both of the class "character".
Any advice? Thanks!
In short:
> y$"100"$VALUE
[1] 10.9950 11.1900 11.1400 10.7100 10.4100 9.3300 9.6700 9.0400 9.2000 9.5400 9.9000 9.8000 9.2600 9.7500 9.2900 9.5900 9.3000 9.2500 9.8800 9.8500
[21] 9.3100 10.1400 9.9800 10.3000 9.3100 9.9000 9.6300 9.6300 9.7500 9.5300 9.4000 9.3500 9.5300 9.4700 9.4000 9.3400 9.3620 9.3900 9.5500 9.7000
[41] 9.5000 9.7699 9.9000 9.6500 9.7000 9.7900 9.5000 9.8500 9.1700 8.9900 8.9000 9.0800 8.7010 8.3000 8.5260 8.5350 8.6120 8.5400 8.9800 8.7600
[61] 9.2584 8.9900 7.5000 8.5000 8.5000
> y$list.index[1]$VALUE
NULL
> class(list.index[1])
[1] "character"
> class("100")
[1] "character"
> list.index[1]
[1] "100"
> class(y)
[1] "list"
> list.index[1] == "100"
[1] TRUE
Straight from the manual (?`$`):
Both [[ and $ select a single element of the list. The main difference
is that $ does not allow computed indices, whereas [[ does.
Therefore y$list.index[1]$VALUE doesn't work because $ looks for a list name that looks like list.index[1]. You should instead use y[[list.index[1]]]$VALUE or y[[list.index[1]]][["VALUE"]].

R: How to remove quotation marks in a vector of strings, but maintain vector format as to call each individual value?

I want to create a vector of names that act as variable names so I can then use themlater on in a loop.
years=1950:2012
for(i in 1:length(years))
{
varname[i]=paste("mydata",years[i],sep="")
}
this gives:
> [1] "mydata1950" "mydata1951" "mydata1952" "mydata1953" "mydata1954" "mydata1955" "mydata1956" "mydata1957" "mydata1958"
[10] "mydata1959" "mydata1960" "mydata1961" "mydata1962" "mydata1963" "mydata1964" "mydata1965" "mydata1966" "mydata1967"
[19] "mydata1968" "mydata1969" "mydata1970" "mydata1971" "mydata1972" "mydata1973" "mydata1974" "mydata1975" "mydata1976"
[28] "mydata1977" "mydata1978" "mydata1979" "mydata1980" "mydata1981" "mydata1982" "mydata1983" "mydata1984" "mydata1985"
[37] "mydata1986" "mydata1987" "mydata1988" "mydata1989" "mydata1990" "mydata1991" "mydata1992" "mydata1993" "mydata1994"
[46] "mydata1995" "mydata1996" "mydata1997" "mydata1998" "mydata1999" "mydata2000" "mydata2001" "mydata2002" "mydata2003"
[55] "mydata2004" "mydata2005" "mydata2006" "mydata2007" "mydata2008" "mydata2009" "mydata2010" "mydata2011" "mydata2012"
All I want to do is remove the quotes and be able to call each value individually.
I want:
>[1] mydata1950 mydata1951 mydata1952 mydata1953, #etc...
stored as a variable such that
varname[1]
> mydata1950
varname[2]
> mydata1951
and so on.
I have played around with
cat(varname[i],"\n")
but this just prints values as one line and I can't call each individual string. And
gsub("'",'',varname)
but this doesn't seem to do anything.
Suggestions? Is this possible in R? Thank you.
There are no quotes in that character vector's values. Use:
cat(varname)
.... if you want to see the unquoted values. The R print mechanism is set to use quotes as a signal to your brain that distinct values are present. You can also use:
print(varname, quote=FALSE)
If there are that many named objects in you workspace, then you need desperately to learn to use lists. There are mechanisms for "promoting" character values to names, but this would be seen as a failure on your part to learn to use the language effectively:
var <- 2
> eval(as.name('var'))
[1] 2
> eval(parse(text="var"))
[1] 2
> get('var')
[1] 2

r-find two closest values in a vector

I tried to find two values in the following vector, which are close to 10. The expected value is 10.12099196 and 10.63054170. Your inputs would be appreciated.
[1] 0.98799517 1.09055728 1.20383713 1.32927166 1.46857509 1.62380423 1.79743107 1.99241551 2.21226576 2.46106916 2.74346924 3.06455219 3.42958354 3.84350238 4.31005838
[16] 4.83051356 5.40199462 6.01590035 6.65715769 7.30532785 7.93823621 8.53773241 9.09570538 9.61755743 10.12099196 10.63018180 11.16783243 11.74870531 12.37719092 13.04922392
[31] 13.75661322 14.49087793 15.24414627 16.00601247 16.75709565 17.46236358 18.06882072 18.51050094 18.71908344 18.63563523 18.22123225 17.46709279 16.40246292 15.09417699 13.63404124
[46] 12.11854915 10.63054170 9.22947285 7.95056000 6.80923943 5.80717982 4.93764782 4.18947450 3.54966795 3.00499094 2.54283599 2.15165780 1.82114213 1.54222565 1.30703661
[61] 1.10879707 0.94170986 0.80084308 0.68201911 0.58171175 0.49695298 0.42525021 0.36451350 0.31299262 0.26922281 0.23197860 0.20023468 0.17313291 0.14995459 0.13009730
[76] 0.11305559 0.09840485 0.08578789 0.07490387 0.06549894 0.05735864
Another alternative could be allowing the user to control for the "tolerance" in order to set what "closeness" is, this can be done by using a simple function:
close <- function(x, value, tol=NULL){
if(!is.null(tol)){
x[abs(x-10) <= tol]
} else {
x[order(abs(x-10))]
}
}
Where x is a vector of values, value is the value of comparison for closeness, and tol is logical, if it's NULL it returns all the "close" values ordered by "closeness" to value, otherwise it returns just the values meeting the condition given in tol.
> close(x, value=10, tol=.7)
[1] 9.617557 10.120992 10.630182 10.630542
> close(x, value=10)
[1] 10.12099196 9.61755743 10.63018180 10.63054170 9.22947285 9.09570538 11.16783243
[8] 8.53773241 11.74870531 7.95056000 7.93823621 12.11854915 12.37719092 7.30532785
[15] 13.04922392 6.80923943 6.65715769 13.63404124 13.75661322 6.01590035 5.80717982
[22] 14.49087793 5.40199462 4.93764782 15.09417699 4.83051356 15.24414627 4.31005838
[29] 4.18947450 16.00601247 3.84350238 16.40246292 3.54966795 3.42958354 16.75709565
[36] 3.06455219 3.00499094 2.74346924 2.54283599 17.46236358 17.46709279 2.46106916
[43] 2.21226576 2.15165780 1.99241551 18.06882072 1.82114213 1.79743107 18.22123225
[50] 1.62380423 1.54222565 18.51050094 1.46857509 18.63563523 1.32927166 1.30703661
[57] 18.71908344 1.20383713 1.10879707 1.09055728 0.98799517 0.94170986 0.80084308
[64] 0.68201911 0.58171175 0.49695298 0.42525021 0.36451350 0.31299262 0.26922281
[71] 0.23197860 0.20023468 0.17313291 0.14995459 0.13009730 0.11305559 0.09840485
[78] 0.08578789 0.07490387 0.06549894 0.05735864
In the first example I defined "closeness" to be at most a difference of 0.7 between value and each elements in x. In the second example the function close returns a vector of values where the firsts are the closest to the value given in value and the lasts are the farest values from value.
Since my solution does not provide an easy (practical) way to find tol as #Arun pointed out, one way to find the closest values would be seting tol=NULL and asking for the exact number of close values as in:
> close(x, value=10)[1:3]
[1] 10.120992 9.617557 10.630182
This shows the three values in x closest to 10.
I can't think of a way without using sort. However, you can speed it up by using partial sort.
x[abs(x-10) %in% sort(abs(x-10), partial=1:2)[1:2]]
# [1]  9.617557 10.120992
In case the same values are present more than once, you'll get all of them here. So, you can either wrap this with unique or you can use match instead as follows:
x[match(sort(abs(x-10), partial=1:2)[1:2], abs(x-10))]
# [1] 10.120992 9.617557
dput output:
dput(x)
c(0.98799517, 1.09055728, 1.20383713, 1.32927166, 1.46857509,
1.62380423, 1.79743107, 1.99241551, 2.21226576, 2.46106916, 2.74346924,
3.06455219, 3.42958354, 3.84350238, 4.31005838, 4.83051356, 5.40199462,
6.01590035, 6.65715769, 7.30532785, 7.93823621, 8.53773241, 9.09570538,
9.61755743, 10.12099196, 10.6301818, 11.16783243, 11.74870531,
12.37719092, 13.04922392, 13.75661322, 14.49087793, 15.24414627,
16.00601247, 16.75709565, 17.46236358, 18.06882072, 18.51050094,
18.71908344, 18.63563523, 18.22123225, 17.46709279, 16.40246292,
15.09417699, 13.63404124, 12.11854915, 10.6305417, 9.22947285,
7.95056, 6.80923943, 5.80717982, 4.93764782, 4.1894745, 3.54966795,
3.00499094, 2.54283599, 2.1516578, 1.82114213, 1.54222565, 1.30703661,
1.10879707, 0.94170986, 0.80084308, 0.68201911, 0.58171175, 0.49695298,
0.42525021, 0.3645135, 0.31299262, 0.26922281, 0.2319786, 0.20023468,
0.17313291, 0.14995459, 0.1300973, 0.11305559, 0.09840485, 0.08578789,
0.07490387, 0.06549894, 0.05735864)
I'm not sure your question is clear, so here's another approach. To find the value closest to your first desired value, 10.12099196 , subtract that from the vector, take the absolute value, and then find the index of the closest element. Explicit:
delx <- abs( 10.12099196 - x)
min.index <- which.min(delx) #returns index of first minimum if there are duplicates
x[min.index] #gets you the value itself
Apologies if this was not the intent of your question.

Resources