
Increase Boost regex speed or use PCRE in C++ [duplicate]

Solutions:


"I used Boost and got a better result (2 min)"

You'd have to show me that for me to believe it!

Using the benchmark tool in RegexFormat, an app that uses Boost, I get less than 3 seconds.

The thing with that benchmark tool is that you can take a single test line
and run it a million times, and it's the same as a million lines each run once.
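A timing loop of that kind can be sketched roughly as follows. This is not the RegexFormat benchmark itself: it assumes std::regex as a stand-in for Boost.Regex (boost::regex has a very similar interface, but absolute timings will differ), and the names BenchResult and time_regex are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: how many iterations matched and how long the loop took.
struct BenchResult {
    std::size_t matches;
    double seconds;
};

// Runs the same regex against the same line `iterations` times and times the loop.
// std::regex is an assumption here; swap in boost::regex to measure Boost itself.
BenchResult time_regex(const std::regex& re, const std::string& line,
                       std::size_t iterations) {
    std::smatch m;
    std::size_t matches = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        if (std::regex_search(line, m, re)) {
            ++matches;
        }
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {matches, elapsed.count()};
}
```

Compile the regex once outside the loop, as here, so that only matching is timed.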

Here are the results, you can try it out for yourself.
Basically, it runs in 2.5 seconds across the board.

Two regexes are tested, one with the extra capture group and one without,
which correspond to the two regexes in your text above.

The target line:

key{info('1'),details('1'),others('{"1": "2test data1", "2": "2test data2"}')}

1 Line run 1,000,000 times:

Regex1:   key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}]
Options:  < none >
Completed iterations:   1000  /  1000     ( x 1000 )
Matches found per iteration:   1
Elapsed Time:    2.78 s,   2777.70 ms,   2777696 µs


Regex2:   (key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}])
Options:  < none >
Completed iterations:   1000  /  1000     ( x 1000 )
Matches found per iteration:   1
Elapsed Time:    2.89 s,   2893.58 ms,   2893576 µs

1,000 Lines run 1,000 times:

Regex1:   key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}]
Options:  < none >
Completed iterations:   1  /  1     ( x 1000 )
Matches found per iteration:   1000
Elapsed Time:    2.38 s,   2381.16 ms,   2381163 µs


Regex2:   (key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}])
Options:  < none >
Completed iterations:   1  /  1     ( x 1000 )
Matches found per iteration:   1000
Elapsed Time:    2.50 s,   2495.65 ms,   2495649 µs

10,000 Lines run 100 times:

Regex1:   key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}]
Options:  < none >
Completed iterations:   100  /  100     ( x 1 )
Matches found per iteration:   10000
Elapsed Time:    2.38 s,   2384.73 ms,   2384729 µs


Regex2:   (key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}])
Options:  < none >
Completed iterations:   100  /  100     ( x 1 )
Matches found per iteration:   10000
Elapsed Time:    2.50 s,   2497.35 ms,   2497349 µs

Finally, an overboard test. 1 Line run 9,999,000 times:

Regex1:   key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}]
Options:  < none >
Completed iterations:   9999  /  9999     ( x 1000 )
Matches found per iteration:   1
Elapsed Time:    27.54 s,   27536.56 ms,   27536560 µs


Regex2:   (key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}])
Options:  < none >
Completed iterations:   9999  /  9999     ( x 1000 )
Matches found per iteration:   1
Elapsed Time:    28.73 s,   28726.18 ms,   28726182 µs

According to regex101.com, it takes 610 steps to match your regex against 5 lines. That’s a lot.

It takes 230 steps if you change (.*?) to ([^}]*). This should cut the time to less than 5 minutes.

If the captured text may itself contain }s, which ([^}]*) would fail to match, try ((?:[^}]*}??)*?) instead. It adds 25-40 steps, but may not be as slow as your original.
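As a rough illustration of the (.*?) to ([^}]*) swap, the sketch below runs both patterns against the target line from the question and returns group 1. It assumes std::regex in place of Boost.Regex, and first_capture, kLazyDot and kNegatedClass are hypothetical names. It only shows that both patterns capture the same text when the payload contains no }; it says nothing about step counts or timing.

```cpp
#include <regex>
#include <string>

// Returns capture group 1 of the first match, or an empty string if nothing matches.
std::string first_capture(const std::string& line, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(line, m, re) ? m[1].str() : std::string();
}

// The original pattern, using a lazy dot for the payload.
const std::string kLazyDot =
    R"re(key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}])re";

// The suggested rewrite: a negated class that stops at the first '}'.
const std::string kNegatedClass =
    R"re(key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{]([^}]*)[}]['][)][}])re";
```

Calling first_capture with either pattern on the target line should give the same payload; the negated class simply cannot run past a }.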

You can shave off another 10 steps by removing the capture group around the entire expression. You don't need it; $0 is equivalent.

The thing you need to understand is that Boost.Regex is a different regex engine than PCRE. PCRE is very advanced, and likely includes more optimizations.
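To see why the outer group is redundant, here is a small sketch that returns both the whole match and group 1. It assumes std::regex rather than Boost.Regex (both expose the whole match as element 0 of the match results), and whole_and_group1 is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <utility>

// Returns {whole match, group 1} for the first match,
// or two empty strings if the regex does not match.
std::pair<std::string, std::string> whole_and_group1(const std::string& line,
                                                     const std::regex& re) {
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return {std::string(), std::string()};
    }
    // m[0] is the entire match, which is what $0 refers to.
    return {m[0].str(), m[1].str()};
}
```

With Regex2 from the benchmark, where the whole pattern is wrapped in one group, both strings come back identical, so the wrapper adds work without adding information.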

Without a better idea of what your data can contain, it will be hard to know what can be optimized.


The other thing you could consider is moving some of the work done in the regex into C++ code.

For example, you could try finding all instances of key{info(' and remove key[{]info[(]['] from the start of your regex. You could then try seeing if a match can be made directly after each key{info(' occurrence.
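One way this could look, as a sketch only: locate each literal key{info(' with std::string::find, then try the shortened regex exactly at that position. It assumes std::regex in place of Boost.Regex and uses the match_continuous flag so the regex may only match starting directly after the prefix; payloads_after_prefix is a hypothetical name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Finds every literal "key{info('" and tries the rest of the pattern right after it.
// Returns the captured payload of each occurrence that matches.
std::vector<std::string> payloads_after_prefix(const std::string& text) {
    static const std::string prefix = "key{info('";
    // The original regex with key[{]info[(]['] removed from the front.
    static const std::regex tail(
        R"re(1['][)],details[(][']1['][)],others[(]['][{](.*?)[}]['][)][}])re");

    std::vector<std::string> out;
    std::smatch m;
    for (std::size_t pos = text.find(prefix); pos != std::string::npos;
         pos = text.find(prefix, pos + 1)) {
        const auto first =
            text.cbegin() + static_cast<std::ptrdiff_t>(pos + prefix.size());
        // match_continuous: the match must begin exactly at `first`.
        if (std::regex_search(first, text.cend(), m, tail,
                              std::regex_constants::match_continuous)) {
            out.push_back(m[1].str());
        }
    }
    return out;
}
```

Whether this is actually faster depends on the data and the engine, so it is worth timing against the plain regex before committing to it.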

Better yet, why not replace every }')} with a single character c that will never occur anywhere else in the line, then use ([^c]*) instead of (.*?)[}]['][)][}].
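A minimal sketch of the sentinel idea, under two assumptions that are not in the original: the byte 0x01 never occurs in the input, and the sentinel is also required after the capture so that a line missing the terminator does not match. It uses std::regex rather than Boost.Regex, and kSentinel, replace_all and payload_via_sentinel are hypothetical names.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical sentinel: assumed never to appear in the input data.
const char kSentinel = '\x01';

// Replaces every occurrence of `from` in `s` with `to`.
std::string replace_all(std::string s, const std::string& from,
                        const std::string& to) {
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Swaps each "}')}" terminator for the sentinel, then captures everything
// up to the sentinel with a negated class instead of a lazy dot.
std::string payload_via_sentinel(const std::string& line) {
    const std::string marked =
        replace_all(line, "}')}", std::string(1, kSentinel));
    const std::string head =
        R"re(key[{]info[(][']1['][)],details[(][']1['][)],others[(]['][{])re";
    const std::string pattern =
        head + "([^" + kSentinel + "]*)" + kSentinel;
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(marked, m, re) ? m[1].str() : std::string();
}
```

Unlike ([^}]*), this version tolerates a } inside the payload, because only the full }')} terminator is turned into the sentinel.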
