How to remove whitespace between letters but NOT numbers
For example:
Input
I ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000
Output
IESP 010 000 000 000 000 000 001 001 000 000 IESP 000 000
I tried something like this
gsub("(?<=\\b\\w)\\s(?=\\w\\b)", "", x,perl=T)
But I wasn't able to arrive at the output I was hoping for.
Use gsub to replace the whitespace " " between two letters with nothing "", putting the captured letters back with backreferences:
Input <- "I ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000"
gsub("([A-Z]) ([A-Z])", "\\1\\2", Input)
[1] "IESP 010 000 000 000 000 000 001 001 000 000 IESP 000 000"
Edit after @Wiktor Stribiżew's comment (replaced [A-z] with [a-zA-Z]):
For lower and upper case use [a-zA-Z]
Input <- "I ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000 aaa ZZZ"
gsub("([a-zA-Z]) ([a-zA-Z])", "\\1\\2", Input)
[1] "IESP 010 000 000 000 000 000 001 001 000 000 IESP 000 000 aaaZZZ"
You need to use
Input <- "I ES P E ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000"
gsub("(?<=[A-Z])\\s+(?=[A-Z])", "", Input, perl=TRUE, ignore.case = TRUE)
## gsub("(*UCP)(?<=\\p{L})\\s+(?=\\p{L})", "", Input, perl=TRUE) ## for Unicode
NOTE: ignore.case = TRUE makes the pattern case insensitive; if that is not desired, remove the argument.
Details
(?<=[A-Z]) (or (?<=\p{L})) - a letter must appear immediately to the left of the current location (without adding it to the match)
\\s+ - 1 or more whitespaces
(?=[A-Z]) (or (?=\\p{L})) - a letter must appear immediately to the right of the current location (without adding it to the match).
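For example, a quick run of the pattern on the sample input, plus a tiny made-up string "a b C D" to show what ignore.case changes (a minimal sketch; outputs are what the pattern should produce, worth verifying in your own session):

Input <- "I ES P E ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000"
gsub("(?<=[A-Z])\\s+(?=[A-Z])", "", Input, perl=TRUE, ignore.case = TRUE)
## [1] "IESPEESP 010 000 000 000 000 000 001 001 000 000 IESP 000 000"

gsub("(?<=[A-Z])\\s+(?=[A-Z])", "", "a b C D", perl=TRUE)
## [1] "a b CD"   (case sensitive: only the space between the two uppercase letters is removed)

gsub("(?<=[A-Z])\\s+(?=[A-Z])", "", "a b C D", perl=TRUE, ignore.case = TRUE)
## [1] "abCD"     (case insensitive: every space between two letters is removed)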
Related
I am looping to load multiple xlsx files, which works fine. But when I want to add the column names to the documents (the same names for all files), I have not managed to do it.
library(dplyr)
library(readr)
library(openxlsx)
library(readxl)
setwd("C:/Users/MiguelAngel/Documents/R Miguelo/Guillermo Ahumada")
ldf <- list()
listxlsx <- dir(pattern = "*.xlsx")
for (k in seq_along(listxlsx)) {
  ldf[[k]] <- as.data.frame(read.xlsx(listxlsx[k]))
}
The result:
355 1500 1100 43831
1 190 850 600 43832
2 93 4000 3000 43833
3 114 4000 3000 43834
4 431 1000 700 43835
5 182 1000 700 43836
6 496 500 300 43837
7 254 500 300 43838
8 174 600 300 43839
9 397 1500 945 43840
10 198 1500 900 43841
11 271 1500 900 43842
12 94 3000 2000 43843
13 206 400 230 43844
14 305 1500 1100 43845
15 184 850 600 43846
16 90 4000 3000 43847
17 70 4000 3000 43848
18 492 1000 700 43849
19 168 1000 700 43850
20 530 500 300 43851
The files all load fine, but without the column names.
I need to add the column names:
list_file <- dir(pattern = "*.xlsx") %>%
  lapply(read.xlsx) %>%   # I tried stringsAsFactors here but got an error
  bind_rows
but list_file appears the same, still without the column names.
(Screenshot: the original column layout, the same for all files.)
I need to apply these column names after making the loop with for.
Thanks for helping me, guys.
I cannot check this since I don't have Excel files to load, but I think this should work:
listxlsx <- list.files(path = "C:/Users/MiguelAngel/Documents/R Miguelo/Guillermo Ahumada", pattern = "*.xlsx", full.names = TRUE)
names(listxlsx) <- listxlsx
purrr::map_dfr(listxlsx, readxl::read_excel, .id = "Filename")
(The first line is better practice for getting the filenames than relying on setwd.)
When listxlsx is a named vector, map_dfr adds a column named Filename whose values are taken from the names of listxlsx.
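If the files have no header row at all (which would explain the first data row being consumed as column names), read_excel also accepts a character vector for col_names. A minimal sketch, with placeholder column names since the original column layout is only shown in the screenshot:

## Placeholder names - replace with the real column names of your files
cols <- c("sales", "price_max", "price_min", "date")

listxlsx <- list.files(path = "C:/Users/MiguelAngel/Documents/R Miguelo/Guillermo Ahumada",
                       pattern = "*.xlsx", full.names = TRUE)
names(listxlsx) <- listxlsx

## col_names = cols treats the first row of each sheet as data, not as a header
purrr::map_dfr(listxlsx, ~ readxl::read_excel(.x, col_names = cols), .id = "Filename")

This replaces the for loop entirely; assigning colnames() to ldf[[k]] afterwards would not bring back the first data row that read.xlsx (with its default colNames = TRUE) already used as a header.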
How can I format my variable Number like this:
1
10
100
1 000
10 000
100 000
1 000 000
10 000 000
100 000 000
1 000 000 000
My plan is to reformat my number string and then check if str(Number) is in element.text.
element.text output is:
xxxxxxx Sälj xxxxxxx
B xxxxxxx B zzz
zzzzzz zzz 1 500 0,767 1 150,50 SEK 19:14
from datetime import datetime

Time = datetime.now().strftime('%H:%M')   # e.g. '19:14'
Action = 'Sälj'
Number = int(1500)
string = f'{Number:,}'             # '1,500'
string = string.replace(',', ' ')  # '1 500'
Number = string
result_elements = driver2.find_elements_by_xpath("""//*[@data-deal_id]""")
for element in result_elements:
    if Time in element.text and Action in element.text and Number in element.text:
        print('order was executed')
        print(element.text)
I would like to extract the number that is located beside a specific string in each cell. My data looks like this:
item stock
PRE 24GUSSETX4SX15G 200
PLS 12KLRX10SX15G 200
ADU 24SBX200ML 200
NIS 18BNDX40SX11G 200
REF 500GX12BTL 200
I want to extract the numbers located beside the strings 'GUSSET', 'KLR', 'SB', 'BND' and 'BTL', and use them to multiply with the stock, for example like this:
item stock pcs total
PRE 24GUSSETX4SX15G 200 24 4800
PLS 12KLRX10SX15G 200 12 2400
ADU 24SBX200ML 200 24 4800
NIS 18BNDX40SX11G 200 18 3600
REF 500GX12BTL 200 12 2400
Does anyone know how to extract the numbers? Thanks very much in advance.
One way, using base R, is to use sub to extract the numbers beside those groups and multiply them with stock to get total.
df$pcs <- as.numeric(sub(".*?(\\d+)(GUSSET|KLR|SB|BND|BTL).*", "\\1", df$item))
df$total <- df$stock * df$pcs
df
# item stock pcs total
#PRE 24GUSSETX4SX15G 200 24 4800
#PLS 12KLRX10SX15G 200 12 2400
#ADU 24SBX200ML 200 24 4800
#NIS 18BNDX40SX11G 200 18 3600
#REF 500GX12BTL 200 12 2400
Or everything in one pipe
library(dplyr)
df %>%
mutate(pcs = as.numeric(sub(".*?(\\d+)(GUSSET|KLR|SB|BND|BTL).*", "\\1", item)),
total = stock * pcs)
We can do this in tidyverse
library(tidyverse)
df %>%
mutate(pcs = as.numeric(str_extract(item, "(\\d+)(?=(GUSSET|KLR|SB|BND|BTL))")),
total = pcs * stock)
# item stock pcs total
#1 PRE 24GUSSETX4SX15G 200 24 4800
#2 PLS 12KLRX10SX15G 200 12 2400
#3 ADU 24SBX200ML 200 24 4800
#4 NIS 18BNDX40SX11G 200 18 3600
#5 REF 500GX12BTL 200 12 2400
data
df <- structure(list(item = c("PRE 24GUSSETX4SX15G", "PLS 12KLRX10SX15G",
"ADU 24SBX200ML", "NIS 18BNDX40SX11G", "REF 500GX12BTL"), stock = c(200L,
200L, 200L, 200L, 200L)), class = "data.frame", row.names = c(NA,
-5L))
Relatively new to AWK here. I want to compare two files. The first two columns have to match in order to compare the 3rd column, and the 3rd column needs to be 100 larger in order to print that line from the second file. Some data may exist in one file but not in the other. I don't think it matters to AWK, but the spacing isn't very consistent for delimiting. Here is a small snippet.
File1
USTL_WR_DATA MCASYNC#L -104 -102 -43 -46
USTL_WR_DATA SMC#L 171 166 67 65
TC_MCA_GCKN SMC#L -100 -100 0 0
WDF_ARRAY_DW0(0) DCDC#L 297 297 101 105
WDF_ARRAY_DW0(0) MCASYNC#L 300 300 50 50
WDF_ARRAY_DW0(0) MCMC#L 12 11 34 31
File2
TC_MCA_GCKN SMC#L 200 200 0 0
WDF_ARRAY_DW0(0) DCDC#L 842 867 271 270
WDF_ARRAY_DW0(0) MCASYNC#L 300 300 50 50
WDF_ARRAY_DW0(1) SMCw#L 300 300 50 50
WDF_ARRAY_DW0(2) DCDC#L 896 927 279 286
WDF_ARRAY_DW0(2) MCASYNC#L 300 300 50 50
Output
TC_MCA_GCKN SMC#L 200 200 0 0
WDF_ARRAY_DW0(0) DCDC#L 842 867 271 270
Here is my code. Not working. Not sure why.
awk 'NR==FNR{a[$1,$2];b[$3];next} (($1,$2) in a) && ($3> (b[$1]+100))' File1 File2
NR==FNR{a[$1,$2];b[$3];next} makes two arrays from the first file (I had issues making it one): the first two columns go in a to confirm we're comparing the same thing, and the third column I'm using to compare, since late mode high seems like a reasonable assert to compare.
(($1,$2) in a) makes sure first two columns in second file are the ones we're comparing to.
&& ($3> (b[$1]+100))' I think this is what's giving the issue. Supposed to see if second file column 3 is 100 or more greater than first file column 3 (first and only column in array b)
You need to key the value with the same ($1,$2) combination. Since we don't use a for any other purpose, just store the value there.
$ awk 'NR==FNR {a[$1,$2]=$3; next}
($1,$2) in a && $3>a[$1,$2]+100' file1 file2
TC_MCA_GCKN SMC#L 200 200 0 0
WDF_ARRAY_DW0(0) DCDC#L 842 867 271 270
I have the following lines:
123 abcd 456 xyz
123 abcd 678 xyz
234 egfs 434 ert
345 fggfgf 456 455 rty
234 egfs 422 ert 33
So here, if the first field is the same for multiple lines, they are considered duplicates. In the above example, the two lines with 123 are considered duplicates (though they differ in one field in the middle). Similarly, the lines with 234 are duplicates.
I need to remove these duplicate lines.
Since they aren't 100% duplicates, sort -u doesn't work. Does anyone know how I can delete these duplicate lines?
This would be a very easy task for awk, so I would do it with awk. In Vim, you can do:
% !awk '\!a[$1]++'
then you got:
123 abcd 456 xyz
234 egfs 434 ert
345 fggfgf 456 455 rty
If you do it in the shell, you don't have to escape the !:
awk '!a[$1]++' file
g/\%(^\1\>.*$\n\)\@<=\(\k\+\).*$/d
(Note: this only removes a duplicate line when it immediately follows a line starting with the same keyword.)
This is easy with my PatternsOnText plugin. It allows you to specify a pattern that is ignored for the duplicate check; in your case, that would be everything after the first (space-delimited) field:
%DeleteDuplicateLinesIgnoring / .*/