I'm working on diffing some LDIF files where each section begins with "dn: leaf,branch3,branch2,branch1,root". I would like the dn (distinguished name) for each section to be displayed, and the Unix diff utility has a feature for exactly that: --show-function-line=regular-expression. However, diff truncates the dn line in the output, which makes it harder to know the full path.
current command:
diff -U 0 --show-function-line="^dn\: .*" file1.ldif file2.ldif > deltas.txt
example output:
@@ -56 +56 @@ dn: administratorId=0,applicationName=pl
-previousLoginTime: 20120619180751Z
+previousLoginTime: 20120213173659Z
original dn:
dn: administratorId=0,applicationName=platform,nodeName=NODENAME
I would like the entire original line to be included in the output. Is there a way to do this?
Thanks,
Rusty
I solved it by editing the source code and recompiling.
In src/context.c, in print_context_function (FILE *out, char const *function), I changed the line:
for (j = i; j < i + 40 && function[j] != '\n'; j++)
to
for (j = i; j < i + 100 && function[j] != '\n'; j++)
The "40" was limiting the output to 40 characters, so I increased it to 100, which should be large enough for my needs. The length check could probably be dropped entirely, leaving just the function[j] != '\n' test, but I decided to leave it as is.
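If patching and recompiling diff is not an option, the full line can also be recovered by post-processing. This is a minimal sketch, assuming each truncated hunk-header dn is a unique prefix of a line in the original LDIF file; full_dn is my own helper name, and the demo file is illustrative:

```python
import tempfile, os

def full_dn(ldif_path, truncated_dn):
    # Return the first line in the LDIF file that starts with the
    # truncated dn text taken from the diff hunk header.
    with open(ldif_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith(truncated_dn):
                return line
    return None

# Demo with a throwaway file (contents taken from the example above).
with tempfile.NamedTemporaryFile("w", suffix=".ldif", delete=False) as f:
    f.write("dn: administratorId=0,applicationName=platform,nodeName=NODENAME\n")
    f.write("previousLoginTime: 20120619180751Z\n")
    path = f.name

print(full_dn(path, "dn: administratorId=0,applicationName=pl"))
# dn: administratorId=0,applicationName=platform,nodeName=NODENAME
os.remove(path)
```

The same lookup could be run over every `@@ ... @@ dn: ...` header in deltas.txt to annotate each hunk with its full dn.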
The problem is as such:
given an array of N numbers, find two numbers in the array such that they will have a range(max - min) value of K.
for example:
input:
5 3
25 9 1 6 8
output:
9 6
So far, what I've tried is first sorting the array and then finding two complementary numbers using a nested loop. However, since this is essentially a brute-force method, I don't think it is as efficient as other possible approaches.
import java.util.*;

public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt(), k = sc.nextInt();
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = sc.nextInt();
        }
        Arrays.sort(arr);
        int a = 0, b = 0; // initialized so the code compiles even if no pair is found
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                if (Math.max(arr[i], arr[j]) - Math.min(arr[i], arr[j]) == k) {
                    a = arr[i];
                    b = arr[j];
                }
            }
        }
        System.out.println(a + " " + b);
    }
}
I'd much appreciate a solution in code (any language).
Here is code in Python 3 that solves your problem. This should be easy to understand, even if you do not know Python.
This routine uses your idea of sorting the array, but then I use two variables, left and right (which mark two places in the array), each of which makes just one pass through the array. So other than the sort, the time complexity of my code is O(N); the sort makes the entire routine O(N log N). This is better than your code, which is O(N^2).
I never use the input value of N, since Python can easily determine the actual size of the array. I add a sentinel value to the end of the array to make the inner loops simpler and quicker. This involves another pass through the array to compute the sentinel, but it adds little to the running time. It is possible to reduce the number of array accesses, at the cost of a few more lines of code; I'll leave that to you.
I added input prompts to aid my testing; you can remove those to make my results closer to what you seem to want. My code prints the larger of the two numbers first, then the smaller, which matches your sample output. If you instead want the two numbers in the order they appear in the original, unsorted array, I'll let you handle that as well (I see multiple ways to do it).
# Get input
N, K = [int(s) for s in input('Input N and K: ').split()]
arr = [int(s) for s in input('Input the array: ').split()]

arr.sort()
sentinel = max(arr) + K + 2
arr.append(sentinel)

left = right = 0
while arr[right] < sentinel:
    # Move the right index until the difference is too large
    while arr[right] - arr[left] < K:
        right += 1
    # Move the left index until the difference is too small
    while arr[right] - arr[left] > K:
        left += 1
    # Check if we are done
    if arr[right] - arr[left] == K:
        print(arr[right], arr[left])
        break
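For comparison, here is a sketch of an alternative that avoids the sort entirely by using a hash set, giving O(N) expected time in a single pass (find_pair_with_diff is my own name for it):

```python
def find_pair_with_diff(nums, k):
    """Return (larger, smaller) with larger - smaller == k, or None."""
    seen = set()
    for x in nums:
        # If a previously seen number is exactly k above or below x,
        # we have found a valid pair.
        if x + k in seen:
            return (x + k, x)
        if x - k in seen:
            return (x, x - k)
        seen.add(x)
    return None

print(find_pair_with_diff([25, 9, 1, 6, 8], 3))  # (9, 6)
```

Note the trade-off: this uses O(N) extra memory for the set, while the two-pointer version above uses O(1) extra memory after sorting.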
Is there a way to get the number of lines in a file without importing it?
So far this is what I am doing
myfiles <- list.files(pattern="*.dat")
myfilesContent <- lapply(myfiles, read.delim, header=F, quote="\"")
for (i in 1:length(myfiles)){
test[[i]] <- length(myfilesContent[[i]]$V1)
}
but it is too time-consuming since each file is quite big.
You can count the number of newline characters (\n; this also works for \r\n on Windows) in a file. This will give you a correct answer iff:
There is a newline character at the end of the last line (by the way, read.csv gives a warning if this doesn't hold)
The table does not contain newline characters in the data (e.g. within quotes)
It suffices to read the file in parts. Below I set a chunk (temporary buffer) size of 65536 bytes:
f <- file("filename.csv", open="rb")
nlines <- 0L
while (length(chunk <- readBin(f, "raw", 65536)) > 0) {
nlines <- nlines + sum(chunk == as.raw(10L))
}
print(nlines)
close(f)
Benchmarks on a ca. 512 MB ASCII text file, 12101000 text lines, Linux:
readBin: ca. 2.4 s.
luis_js's wc-based solution: 0.1 s.
read.delim: 39.6 s.
EDIT: reading a file line by line with readLines (f <- file("/tmp/test.txt", open="r"); nlines <- 0L; while (length(l <- readLines(f, 128)) > 0) nlines <- nlines + length(l); close(f)): 32.0 s.
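For comparison across languages, the same chunked newline count can be sketched outside R; this Python version (illustrative only) mirrors the readBin loop above:

```python
import tempfile, os

def count_newlines(path, chunk_size=65536):
    # Count b"\n" bytes in fixed-size binary chunks, like the readBin loop:
    # reading raw bytes avoids any line parsing or character decoding.
    n = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            n += chunk.count(b"\n")
    return n

# Demo on a throwaway three-line file.
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("a\nb\nc\n")
    path = f.name

print(count_newlines(path))  # 3
os.remove(path)
```

The same caveats apply as for the readBin version: the count is only exact when the last line ends with a newline and the data contain no embedded newlines.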
If you:
still want to avoid the system call that a system2("wc"… will cause
are on BSD/Linux or OS X (I didn't test the following on Windows)
don't mind using a full filename path
are comfortable using the inline package
then the following should be about as fast as you can get (it's pretty much the 'line count' portion of wc in an inline R C function):
library(inline)
wc.code <- "
  uintmax_t linect = 0;
  uintmax_t tlinect = 0;
  int fd, len;
  u_char *p;
  struct statfs fsb;
  static off_t buf_size = SMALL_BUF_SIZE;
  static u_char small_buf[SMALL_BUF_SIZE];
  static u_char *buf = small_buf;

  PROTECT(f = AS_CHARACTER(f));
  if ((fd = open(CHAR(STRING_ELT(f, 0)), O_RDONLY, 0)) >= 0) {
    if (fstatfs(fd, &fsb)) {
      fsb.f_iosize = SMALL_BUF_SIZE;
    }
    if (fsb.f_iosize != buf_size) {
      if (buf != small_buf) {
        free(buf);
      }
      if (fsb.f_iosize == SMALL_BUF_SIZE || !(buf = malloc(fsb.f_iosize))) {
        buf = small_buf;
        buf_size = SMALL_BUF_SIZE;
      } else {
        buf_size = fsb.f_iosize;
      }
    }
    while ((len = read(fd, buf, buf_size))) {
      if (len == -1) {
        (void)close(fd);
        break;
      }
      for (p = buf; len--; ++p)
        if (*p == '\\n')
          ++linect;
    }
    tlinect += linect;
    (void)close(fd);
  }

  SEXP result;
  PROTECT(result = NEW_INTEGER(1));
  INTEGER(result)[0] = tlinect;
  UNPROTECT(2);
  return(result);
";
setCMethod("wc",
signature(f="character"),
wc.code,
includes=c("#include <stdlib.h>",
"#include <stdio.h>",
"#include <sys/param.h>",
"#include <sys/mount.h>",
"#include <sys/stat.h>",
"#include <ctype.h>",
"#include <err.h>",
"#include <errno.h>",
"#include <fcntl.h>",
"#include <locale.h>",
"#include <stdint.h>",
"#include <string.h>",
"#include <unistd.h>",
"#include <wchar.h>",
"#include <wctype.h>",
"#define SMALL_BUF_SIZE (1024 * 8)"),
language="C",
convention=".Call")
wc("FULLPATHTOFILE")
It'd be better as a package, since it actually has to compile the first time through, but it's here for reference if you really do need speed. For a 189,955-line file I had lying around, I get (mean values from a bunch of runs):
user system elapsed
0.007 0.003 0.010
I found an easy way using the R.utils package:
library(R.utils)
sapply(myfiles,countLines)
Maybe I am missing something, but usually I do it using length on top of readLines:
con <- file("some_file.format")
length(readLines(con))
This has worked in at least many of the cases I've had. I think it's fairly fast, and it only creates a connection to the file without importing it.
If you are using linux, this might work for you:
# total lines on a file through system call to wc, and filtering with awk
target_file <- "your_file_name_here"
total_records <- as.integer(system2("wc",
args = c("-l",
target_file,
" | awk '{print $1}'"),
stdout = TRUE))
In your case:
lapply(myfiles, function(x){
as.integer(system2("wc",
args = c("-l",
x,
" | awk '{print $1}'"),
stdout = TRUE))
}
)
Here is another way, with the CRAN package fpeek and its function peek_count_lines. This function is coded in C++ and is pretty fast.
library(fpeek)
sapply(filenames, peek_count_lines)
We have:
n1 pairs of {} brackets,
n2 pairs of () brackets,
n3 pairs of [] brackets.
How many different valid combinations of these brackets can we have?
What I tried: I wrote brute-force code in Java (which follows) and counted all possible combinations. I know it's the worst possible solution.
(The code handles the general case, in which we can have any number of bracket types.)
Is there a mathematical approach?
Note 1: a valid combination is defined as usual, e.g. {{()}} is valid, {(}){} is invalid.
Note 2: assuming we have 2 pairs of {}, 1 pair of () and 1 pair of [], the number of valid combinations is 168 and the number of all possible (valid and invalid) combinations is 840.
static void paranthesis_combination(char[] open , char[] close , int[] arr){
int l = 0;
for (int i = 0 ; i < arr.length ; i++)
l += arr[i];
l *= 2;
paranthesis_combination_sub(open , close , arr , new int[arr.length] , new int[arr.length], new StringBuilder(), l);
System.out.println(paran_count + " : " + valid_paran_count);
return;
}
static void paranthesis_combination_sub(char[] open , char[] close, int[] arr , int[] open_so_far , int[] close_so_far, StringBuilder strbld , int l){
if (strbld.length() == l && valid_paran(open , close , strbld)){
System.out.println(new String(strbld));
valid_paran_count++;
return;
}
for (int i = 0 ; i < open.length ; i++){
if (open_so_far[i] < arr[i]){
strbld.append(open[i]);
open_so_far[i]++;
paranthesis_combination_sub(open , close, arr , open_so_far , close_so_far, strbld , l);
open_so_far[i]--;
strbld.deleteCharAt(strbld.length() -1 );
}
}
for (int i = 0 ; i < open.length ; i++){
if (close_so_far[i] < open_so_far[i]){
strbld.append(close[i]);
close_so_far[i]++;
paranthesis_combination_sub(open , close, arr , open_so_far , close_so_far, strbld , l);
close_so_far[i]--;
strbld.deleteCharAt(strbld.length() -1 );
}
}
return;
}
C_n is the nth Catalan number, C(2n,n)/(n+1), and gives the number of valid strings of length 2n that use only (). So if we change all [] and {} into (), there are C_{n1+n2+n3} valid strings. Then there are C(n1+n2+n3, n1) ways to change n1 of the () pairs back to {}, and C(n2+n3, n3) ways to change the remaining () into []. Putting that all together, there are
C(2(n1+n2+n3), n1+n2+n3) * C(n1+n2+n3, n1) * C(n2+n3, n3) / (n1+n2+n3+1)
ways.
As a check, when n1=2 and n2=n3=1, we have C(8,4) * C(4,2) * C(2,1) / 5 = 168.
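The formula is easy to check numerically against the brute-force counts; a short sketch (the function name is my own) using Python's math.comb:

```python
from math import comb

def valid_bracket_count(n1, n2, n3):
    # Catalan number C(2n, n) / (n + 1) counts valid strings over () only,
    # then binomials assign which pairs become {} and which become [].
    n = n1 + n2 + n3
    catalan = comb(2 * n, n) // (n + 1)
    return catalan * comb(n, n1) * comb(n2 + n3, n3)

print(valid_bracket_count(2, 1, 1))  # 168, matching Note 2 in the question
```

Integer division is exact here because C(2n, n) is always divisible by n + 1.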
In general, infinitely many. However, I assume that you meant to ask how many combinations there are given a limited string length. For simplicity, let's assume that the limit is an even number. Then, let's create an initial string:
(((...()...))) with length equal to the limit.
Then, we can switch any () pair to [] or {} parentheses. However, if we change an opening brace, then we have to change the matching closing brace as well. So we can look only at the opening braces, or equivalently at the pairs. For each parenthesis pair we have 4 options:
leave it unchanged
change it to []
change it to {}
remove it
So, for each of the l/2 pairs we choose one of four labels, which gives 4^(l/2) possibilities.
EDIT: this assumes only "concentric" parenthesis strings (each pair contained in the previous one), as you suggested in your edit. Intuitively, however, ()[]{}  is also a valid combination; this solution does not take that into account.
I can't get this to work. I want to replace every two-character occurrence in the first field of a CSV file with the occurrence plus an appended X, with whitespace removed. For example, "SA" and "SA " (with trailing whitespace) should both map to "SAX" in the new file. Below is what I tried with sed (based on help from an earlier question):
system( paste("sed ","'" ,' s/^GG/GGX/g; s/^GG\\s/GGX/g; s/^GP/GPX/g;
s/^GP\\s/GPX/g; s/^FG/FGX/g; s/^FG\\s/FGX/g; s/^SA/SAX/g; s/^SA\\s/SAX/g;
s/^TP/TPX/g; s/^TP\\s/TPX/g ',"'",' ./data/concat_csv.2 >
./data/concatenated_csv.2 ',sep=''))
I tried using the sQuote() function, but that doesn't help either. The file cannot be handled by read.csv because some lines have errors within certain fields, with too many or too few separators.
I could try reading in and editing the file in pieces, but I don't know how to do that as a streaming process.
I really just want to edit the first field of the file using a system() call. The file is about 30GB.
Try the following on a file, like so:
echo "fi,second,third" | awk '{len = split($0,array,","); str = ""; for (i = 1; i <= len; ++i) if (i == 1) { m = split(array[i],array2,""); if (m == 2) {str = array[i]"X";} else {str = array[i]};} else str = str","array[i]; print str;}'
so you would call it from R using the following as input to the paste() call
cat fileNameToBeRead | awk '{len = split($0,array,","); str = ""; for (i = 1; i <= len; ++i) if (i == 1) { m = split(array[i],array2,""); if (m == 2) {str = array[i]"X";} else {str = array[i]};} else str = str","array[i]; print str;}' > newFile
This code won't handle your whitespace requirement, though. Could you provide examples to demonstrate the sort of functionality you're looking for?
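If the shell quoting keeps getting in the way, the same transformation can also be sketched as a streaming filter in Python (illustrative only; tag_first_field is my own name). Like the awk version, it tags every two-character first field rather than a fixed list of codes, and it additionally strips the whitespace:

```python
def tag_first_field(line, sep=","):
    # Strip whitespace from the first field and append "X" when the
    # trimmed field is exactly two characters long.
    fields = line.rstrip("\n").split(sep)
    first = fields[0].strip()
    if len(first) == 2:
        first += "X"
    return sep.join([first] + fields[1:])

print(tag_first_field("SA ,second,third"))  # SAX,second,third
print(tag_first_field("fi,second,third"))   # fiX,second,third
```

Applied line by line (reading the input and writing the output one line at a time), this processes the file as a stream, so a 30GB file never needs to fit in memory.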
I wonder which hexdump() Scapy uses, since I would like to modify it, but I simply can't find anything.
What I DO find is:
def hexdump(self, lfilter=None):
    for i in range(len(self.res)):
        p = self._elt2pkt(self.res[i])
        if lfilter is not None and not lfilter(p):
            continue
        print "%s %s %s" % (conf.color_theme.id(i, "%04i"),
                            p.sprintf("%.time%"),
                            self._elt2sum(self.res[i]))
        hexdump(p)
But that is simply an alternative to pkt.hexdump(), which does a pkt.summary() followed by a hexdump(pkt).
Could anyone tell me where to find the hexdump(pkt) source code?
What I want is the hex'ed packet, almost like str(pkt[0]) (where I can check byte by byte via str(pkt[0])[0]), but with nothing other than hex values, just as displayed by hexdump(pkt).
Maybe you guys could help me out with this one :)
Found it. So, to answer my own question: it is located in utils.py
def hexdump(x):
    x = str(x)
    l = len(x)
    i = 0
    while i < l:
        print "%04x " % i,
        for j in range(16):
            if i + j < l:
                print "%02X" % ord(x[i + j]),
            else:
                print "  ",
            if j % 16 == 7:
                print "",
        print " ",
        print sane_color(x[i:i + 16])
        i += 16
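For the "nothing but hex values" goal, one sketch is to hexlify the raw packet bytes directly. This is written for Python 3 (packet_hex is my own name); with Scapy you would pass the raw packet bytes, i.e. what str(pkt) gives you on Python 2 or bytes(pkt) on Python 3:

```python
import binascii

def packet_hex(raw_bytes):
    # Convert raw packet bytes to an uppercase hex string, with no
    # offsets or ASCII column, unlike hexdump().
    return binascii.hexlify(raw_bytes).decode("ascii").upper()

print(packet_hex(b"\x00\x01\xff"))  # 0001FF
```

Each byte becomes exactly two hex digits, so byte i of the packet is the slice [2*i:2*i+2] of the result, which gives the byte-by-byte access you described.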