Home » Disproving Euler proposition by brute force in C

# Disproving Euler proposition by brute force in C

## Solutons:

Note: At some point, this review drifted into the realm of assembler and GMP. An actual review is at the end of this post, whereas the first section discusses the runtime-problems concerning `pow`, wrong data types and arbitrary large integers.

# No life time for run time

Would there be any way (on my current machine) to get this to run in my lifetime?

There’s a great saying in carpentry: measure twice, cut once. It concerns cutting wood or other material, where you have to throw away your resources if you accidentally cut at the wrong place.

A similar saying is there for software engineers: you can’t optimize what you can’t measure. There are several ways to measure your code, e.g. benchmarking, profiling, or looking at the generated as­sembler to see how many instructions a certain part of your code will take.

Here, we will take the latter route, start with the assembler, take considerations step by step and see where we end up.

## A study in assembly

Lets have a look at your code. Well, not yours, but the assembler the compiler generates. You can use `gcc -S -O3`. On my platform, this results in the following “hot” section in `main`:

``````.L6:
cmp     rbx, 500000
je      .L18
.L8:
mov     rax, QWORD PTR .LC0[rip]
movsd   xmm0, QWORD PTR [rsp+40]
movq    xmm1, rax
call    pow                        ; (1)
mov     rax, QWORD PTR .LC0[rip]
movsd   QWORD PTR [rsp+8], xmm0
movsd   xmm0, QWORD PTR [rsp+48]
movq    xmm1, rax
call    pow                        ; (2)
mov     rax, QWORD PTR .LC0[rip]
movsd   QWORD PTR [rsp+16], xmm0
movsd   xmm0, QWORD PTR [rsp+32]
movq    xmm1, rax
call    pow                        ; (3)
mov     rax, QWORD PTR .LC0[rip]
movsd   QWORD PTR [rsp+24], xmm0
pxor    xmm0, xmm0
cvtsi2sd        xmm0, rbx
movq    xmm1, rax
call    pow                        ; (4)
movsd   xmm2, QWORD PTR [rsp+8]
movapd  xmm1, xmm0
movsd   xmm0, QWORD PTR [rsp+24]
ucomisd xmm0, xmm1
jp      .L6
jne     .L6
``````

Even though you might not know assembler, you can see those four calls to `pow`. The first thing you need to know is that `call` is slow compared to those other operations. Those four `call`s happen in the innermost loop. The compiler removed the `call` to `prop` and instead replaced it by its code (that’s faster).

`mov*` assigns values to registers, `add*` adds values, and so on. The registers with `xmm*` are double precision registers, meant for `double` variables. So we’re basically calling `pow` with the right values and then add, subtract and modify our small little double values.

## Double trouble

But wait a second. We’re trying to solve a completely integral problem! Why does our generated program use those registers at all?

This should raise a red flag. And indeed, if we remember `pow`‘s signature, it should be clear that it’s not the right tool. It takes a double base and exponent, which indicates that it’s suitable for terms like $$15.12151^{3.1415926}$$. This is a total overkill for your problem.

## Using proper functions

So let’s use another `pow` version instead:

``````long int pow4(long int x){
return x * x * x * x;
}
``````

Note that your compiler should create something like this from that:

``````movq    %rdi, %rax
imulq   %rdi, %rax
imulq   %rax, %rax
ret
``````

but if your compiler doesn’t recognize this potential (micro) optimization, you can use

``````long int pow4(long int x){
const long int t = x * x;
return t * t;
}
``````

We also need to change `prop`:

``````int prop(long int A, long int B, long int C, long int D) {
return (pow4(A) + pow4(B) + pow4(C) == pow4(D));
}
``````

Allright. Now, before I show the times of the new program, let’s check the output of your old one:

`a = 1, b = 1, c = 1000, time = 114.156000s`

That’s when I hit ^C. How does the one using `pow4` hold up?

```a = 1, b = 1, c = 1000, time = 0.296000s
a = 1, b = 1, c = 2000, time = 0.578000s
a = 1, b = 1, c = 3000, time = 0.859000s
a = 1, b = 1, c = 4000, time = 1.140000s
a = 1, b = 1, c = 5000, time = 1.421000s
a = 1, b = 1, c = 6000, time = 1.703000s
a = 1, b = 1, c = 7000, time = 1.984000s
a = 1, b = 1, c = 8000, time = 2.265000s
a = 1, b = 1, c = 9000, time = 2.546000s
a = 1, b = 1, c = 10000, time = 2.828000s
a = 1, b = 1, c = 11000, time = 3.109000s
a = 1, b = 1, c = 12000, time = 3.390000s
a = 1, b = 1, c = 13000, time = 3.687000s
a = 1, b = 1, c = 14000, time = 3.968000s
a = 1, b = 1, c = 15000, time = 4.250000s
a = 1, b = 1, c = 16000, time = 4.531000s```

Which is 0,2% of your original time, or a 500x speedup.

However, this comes at a cost: `pow4(500000)` is too large for a `int64_t`, since $$log_2(500000^4) approx 76$$. The greatest number you could check with a `uint64_t` is 65535, $$2^{16}-1$$, which shouldn’t be very surprising. As the standard does not provide `int128_t` or similar, you should make sure that your numbers don’t exceed those bounds.

You can either write your own large integer logic for this, or use GMP.

## Proper bounds and parameter estimation

Next up, you can increase the lower bounds of `b` and `c`, so that $$a le b le c$$. And for `d`, well, if we have `a`, `b`, `c`, then there is only one solution for `d`. We can directly search for that solution with binary search.

The binary search makes a $$mathcal O (n^3 log n)$$ algorithm from your current $$mathcal O (n^4)$$ one, which should provide a lot more speed than the previous speedup.

Even better, if you used the appropriate bounds for `a`, `b` and `c`, we can bound `d` by

$$d^4 = a^4 + b^4 + c^4 le c^4 + c^4 + c^4 = 3c^4$$

and therefore get

$$c le d le sqrt[4]{3},c.$$

With the proper binary algorithm,you can finish the first `a=1`,`b=1` case quickly:

```…
a = 1, b = 1, c = 481000, time = 0.031000s
a = 1, b = 1, c = 482000, time = 0.031000s
a = 1, b = 1, c = 483000, time = 0.031000s
a = 1, b = 1, c = 484000, time = 0.031000s
a = 1, b = 1, c = 485000, time = 0.031000s
a = 1, b = 1, c = 486000, time = 0.031000s
a = 1, b = 1, c = 487000, time = 0.031000s
a = 1, b = 1, c = 488000, time = 0.031000s
a = 1, b = 1, c = 489000, time = 0.031000s
a = 1, b = 1, c = 490000, time = 0.031000s
a = 1, b = 1, c = 491000, time = 0.031000s
a = 1, b = 1, c = 492000, time = 0.031000s
a = 1, b = 1, c = 493000, time = 0.031000s
a = 1, b = 1, c = 494000, time = 0.031000s
a = 1, b = 1, c = 495000, time = 0.031000s
a = 1, b = 1, c = 496000, time = 0.031000s
a = 1, b = 1, c = 497000, time = 0.031000s
a = 1, b = 1, c = 498000, time = 0.031000s
a = 1, b = 1, c = 499000, time = 0.031000s
a = 1, b = 1, time = 0.031000s```

### Exercise

Write a function, that given `a`, `b` and `c` checks whether there exist a `d`, such that your property holds. It should return `-1` if there does not exist such a `d`, and the `d` otherwise.

Use that function in your code. Make sure that you need roughly $$log d_{text{max}}$$ iterations in that function.

### Important remark about integer sizes

Keep in mind that `long int` is usually just a 64 bit integer, which means that the largest integer you can store is $$2^{63}-1$$. Integer types with more bits have greater bounds, but are platform specific. Also, multiplication can be a tad slower, since multiplying 128bit numbers isn’t as easy as multiplying 64bit numbers.

See the next section how to get multiplications down.

# An actual review

Our `pow4` is now essentially two multiplications. However, we’re still using `pow4` too often. After all, we don’t need to recalculate $$a^4$$ in every iteration. The compiler happily does, since it doesn’t optimize aggressively enough.

Which brings us to the actual review: your code is cleanly written, easy to read and to understand. Unfortunately, well-written, modular code often doesn’t squeeze the last bit (heh) out of your hardware, unless your compiler/runtime is very smart (and thus often expensive).

So let’s get back to the drawing board for a final review of your code:

## Includes

``````#include <stdio.h>
#include <time.h>
#include <math.h>
``````

I would sort them by name, but that’s fine. You don’t include anything that’s not necessary, nor did you forget something (and got saved by a non-standard compliant compiler).

## Declaration

``````int main() {
long int a, b, c, d;
clock_t t;
t = clock();
``````

Depending on whether you write ANSI-C or C99, I would defer declaration of variables as long as possible. For example, at the moment it’s easy to accidentally change `c` to some bogus value, or forget a `{` and accidentally check the `prop` after the loops or similar:

``````for (a = 1; a < 100000; a++)
for (b = 1; b < 300000; b++)
for (c = 1; c < 500000; c++)
for (d = 1; d < 500000; d++)
printf("inner loop");
if (prop(a,b,c,d))
printf("FOUND IT!na = %ldnb = %ldnc = %ldnd = %ldn", a, b, c, d);
``````

Whoops. The `if` doesn’t get checked, and you don’t get a warning (in older compiler versions; new ones do warn about possible whitespace issues). If you declare your variables later (e.g. C99-style), errors like that cannot happen (although it introduces possible shadowing):

``````for (long int a = 1; a < 100000; a++)
for (long int b = 1; b < 300000; b++)
for (long int c = 1; c < 500000; c++)
for (long int d = 1; d < 500000; d++)
printf("inner loop");
if (prop(a,b,c,d))
printf("FOUND IT!na = %ldnb = %ldnc = %ldnd %ldn", a, b, c, d);
``````

This will now lead to a compiler error, since `a`, `b` and so on are out of scope. Either way, that depends on the language standard you want to use. Some people prefer one way, others the other one. Choose yours.

## Types

Given that all values should be strictly greater than zero, `long int` is not the appropriate type, as it can be negative. We should accommodate that. However, instead of using `long unsigned int` through­out our code, let’s use a type synonym in case we want to change it later to a type with a greater range:

`````` typedef long unsigned int Number;
``````

You can probably come up with a better name.

## Cache results (by hand)

One thing that strikes me most is that you recalculate $$a^4$$ and so on every time. We can easily treat this with more variables (using your declaration style):

``````int main() {
long int a, b, c, d;
long int a4, b4, c4, d4; // new variables
clock_t t;
t = clock();

for (a = 1; a < 100000; a++) {
a4 = pow4(a);                             // remember
for (b = a; b < 300000; b++) {
b4 = pow4(b);                         // remember
for (c = b; c < 500000; c++) {
c4 = pow4(c);                     // the fourth power
for (d = c; d < 500000; d++) {
d4 = pow4(d);                 // of this member
if (a4 + b4 + c4 == d4)
printf("FOUND IT!na = %ldnb = %ldnc = %ldnd = %ldn", a, b, c, d);
…
``````

Remember how I said that nicely written, modular code isn’t often optimal? This is one of those un­fortunate examples where you have to help the compiler (unless you know exactly what optimization flags you have to use or your compiler is overly aggressive). The `prop` is gone, the calls to `pow4` are now in your loop.

But the compiler cannot make a mistake here anymore: it’s now very clear that `a4` doesn’t need to be recalculated 300000*500000*500000 times.

That being said, we should apply the other suggestions like the type synonym and the late declaration:

``````typedef long unsigned int Number;

int main() {
clock_t t;
t = clock();

for (Number a = 1; a < 100000; a++) {
const Number a4 = pow4(a);                         // remember
for (Number b = a; b < 300000; b++) {
const Number b4 = pow4(b);                     // remember
for (Number c = b; c < 500000; c++) {
const Number c4 = pow4(c);                 // the fourth power
for (Number d = c; d < 500000; d++) {
const Number d4 = pow4(d);             // of this member
if (a4 + b4 + c4 == d4)
printf("FOUND IT!na = %ldnb = %ldnc = %ldnd = %ldn", a, b, c, d);
…
``````

While `const` isn’t necessary here, it will make sure that we don’t change our cached results accidentally.

## The time has come

Although our code is now more verbose, there is one small piece of code that repeats itself three times throughout your `main`:

``````((double)(clock() - t))/CLOCKS_PER_SEC)
``````

That’s quite hard to read, isn’t it? It’s a perfect candidate for a function:

``````static inline seconds_since(clock_t t){
return ((double)(clock() - t))/CLOCKS_PER_SEC;
}
``````

This changes your `printf` from

``````printf("a = %ld, b = %ld, c = %ld, time = %fsn", a, b, c, ((double)(clock() - t))/CLOCKS_PER_SEC);
``````

to

``````printf("a = %ld, b = %ld, c = %ld, time = %fsn", a, b, c, seconds_since(t));
``````

Ah. Much easier to read. That’s what `inline` functions are for. Note that any sophisticated compiler should inline that function anyway, so you may also drop `inline` if you don’t want to use C99.

``````for (a = 1; a < 100000; a++) {
for (b = 1; b < 300000; b++) {
for (c = 1; c < 500000; c++) {
``````

let’s ignore `d` for now. What do you do here? You check 1, 1, 1, then 1, 1, 2, then 1, 1, 3, … up to 1, 1, 499999. Then you start over at 1, 2, 1. But you already checked 1, 1, 2, so why are you checking 1, 2, 1? You could go straight to 1, 2, 2. That doesn’t save you much for these low numbers, but believe me, when you get to big numbers it adds up.

In short: a, b, c should be nondecreasing. We can achieve that by starting b at a, and starting c at b, so b is never smaller than a, and c is never smaller than b. So immediately you can get rid of about half the work you’re doing with

``````for (a = 1; a < 100000; a++) {
for (b = a; b < 300000; b++) {
for (c = b; c < 500000; c++) {
``````

Next, consider `d`.

Of course we can immediately make the same optimization we just made for a, b and c, since d will never be smaller than any of them, and c is always the largest.

Also, once d4 is larger than the sum, we can stop incrementing d because it’s only going to get bigger.

So that will save a lot of time right there. But we can do way better than that.

The question you are asking is “do these four numbers have the sum property?” but the question you should be asking is “does a4 + b4 + c4 equal any fourth power?” If it does, then you can easily compute d much faster than trying all possible fourth powers. So, can you write a fast method that tells you if a particular sum is a fourth power or not?

If you know how to take a square root, what you can do is take the sum, take the square root twice, and then square the result twice. If you get back the original sum, then it was a fourth power, and if you don’t, then it wasn’t.

If you don’t know how to take a square root, you can do the following logic: we have sum; is 1284 equal to sum? No. Is it bigger? Yes. Then next try 644. Is it equal to sum? No. Is it smaller? Yes. Then try 964, and so on. Binary search for the result until you find it, or until there are no more numbers in your range to check. That does only a tiny handful of computations, compared to trying thousands upon thousands of possible fourth powers.

I’m not able to comment on vnp’s solution, but vnp was right the first time: you can brute force it in \$O(n^2log(n))\$ time and \$O(n)\$ space. You don’t need \$O(n^2)\$ space because you don’t have to store the whole list of \$a^4+b^4\$ or \$d^4-c^4\$ upfront. Instead you only need to be able to list the values of \$a^4+b^4\$ and \$d^4-c^4\$ in ascending order and then match up the two values.

To list the values of \$a^4+b^4\$, you can maintain, for each \$a\$, the smallest value of \$b\$, call it \$b_a\$, such that \$a^4+b_a^4ge k\$, where \$k\$ is the previous value of \$a^4+b^4\$ that you read off. To read off the next value, you need to find the value of \$a\$ such that \$a^4+b_a^4\$ is the smallest, which you can do in \$O(log(n))\$ time using a heap or something. Then you increment \$b_a\$ and you are ready to read off the next value of \$a^4+b^4\$.

This is an example C++11 program (using a GNU extension to get 128 bit integers) that finds the solution in about 7 hours on my PC. (This is just to illustrate the method – it is not very efficient in other ways.)

``````#include <cstdio>
#include <cmath>
#include <cstdlib>
#include <queue>
#include <vector>
using std::priority_queue;
using std::vector;

//typedef long long int bigint;
typedef __int128 bigint;

// Look for a^4+b^4+c^4 = d^4 by comparing
// ascending list of a^4+b^4 with ascending list of d^4-c^4

vector<bigint> list;
vector<int> diffptr,sumptr;

bool sumcmp(int a0,int a1){
int b0=sumptr[a0];
int b1=sumptr[a1];
return list[a0]+list[b0]>list[a1]+list[b1];
}

bool diffcmp(int c0,int c1){
int d0=diffptr[c0];
int d1=diffptr[c1];
return list[d0]-list[c0]>list[d1]-list[c1];
}

int main(int ac,char**av){
int a,c,i,n;

n=500000;
printf("Using n = %dn",n);
if(4*log(n)/log(2)+2>sizeof(bigint)*8){fprintf(stderr,"Error: %lu-bit type not large enoughn",sizeof(bigint)*8);exit(1);}

bigint sumval,diffval;

list.resize(n);
diffptr.resize(n);
sumptr.resize(n);

priority_queue<int,vector<int>,decltype(&sumcmp)> sumnext(sumcmp);
priority_queue<int,vector<int>,decltype(&diffcmp)> diffnext(diffcmp);

for(i=0;i<n;i++)list[i]=bigint(i)*i*i*i;
for(a=1;a<n;a++){sumptr[a]=a;sumnext.push(a);}
for(c=0;c<n-1;c++){diffptr[c]=c+1;diffnext.push(c);}

while(!sumnext.empty()&&!diffnext.empty()){
a=sumnext.top();sumval=list[sumptr[a]]+list[a];
c=diffnext.top();diffval=list[diffptr[c]]-list[c];
if(sumval==diffval){printf("BINGO %d^4+%d^4+%d^4=%d^4n",a,sumptr[a],c,diffptr[c]);fflush(stdout);}
if(sumval<=diffval){
sumnext.pop();
sumptr[a]++;if(sumptr[a]<n)sumnext.push(a);
}else{
diffnext.pop();
diffptr[c]++;if(diffptr[c]<n)diffnext.push(c);
}
}

}
``````

## Given an array of integers, return the smallest positive integer not in it

That's a nice simple solution, with two problems: It will give incorrect result when A contains all the values in the ranges [1..1000000] or [1..999999], returning undefined instead of 1000001 and 1000000, respectively. It doesn't meet the time complexity...

## What does it mean when the connectivity icons in the status bar go white/gray?

It has to do with whether or not you've currently got a good connection to Google's servers for sync services and the like. From page 27 of their Android 2.3 Users Guide: Network status icons turn green if you have a Google Account added to your phone and the...

## Explaining computational complexity theory

Hoooo, doctoral comp flashback. Okay, here goes. We start with the idea of a decision problem, a problem for which an algorithm can always answer "yes" or "no." We also need the idea of two models of computer (Turing machine, really): deterministic and...

## Building a multi-level menu for umbraco

First off, no need pass the a parent parameter around. The context will transport this information. Here is the XSL stylesheet that should solve your problem: <!-- update this variable on how deep your menu should be --> <xsl:variable...

## How to generate a random string?

My favorite way to do it is by using /dev/urandom together with tr to delete unwanted characters. For instance, to get only digits and letters: tr -dc A-Za-z0-9 </dev/urandom | head -c 13 ; echo '' Alternatively, to include more characters from the OWASP...

## How to copy a file from a remote server to a local machine?

The syntax for scp is: If you are on the computer from which you want to send file to a remote computer: scp /file/to/send username@remote:/where/to/put Here the remote can be a FQDN or an IP address. On the other hand if you are on the computer wanting to...

## What is the difference between curl and wget?

The main differences are: wget's major strong side compared to curl is its ability to download recursively. wget is command line only. There's no lib or anything, but curl's features are powered by libcurl. curl supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP,...

## Using ‘sed’ to find and replace [duplicate]

sed is the stream editor, in that you can use | (pipe) to send standard streams (STDIN and STDOUT specifically) through sed and alter them programmatically on the fly, making it a handy tool in the Unix philosophy tradition; but can edit files directly, too,...

## How do I loop through only directories in bash?

You can specify a slash at the end to match only directories: for d in */ ; do echo "\$d" done If you want to exclude symlinks, use a test to continue the loop if the current entry is a link. You need to remove the trailing slash from the name in order for -L to...

## How to clear journalctl

The self maintenance method is to vacuum the logs by size or time. Retain only the past two days: journalctl --vacuum-time=2d Retain only the past 500 MB: journalctl --vacuum-size=500M man journalctl for more information. You don't typically clear the journal...

## How can I run a command which will survive terminal close?

One of the following 2 should work: \$ nohup redshift & or \$ redshift & \$ disown See the following for a bit more information on how this works: man nohup help disown Difference between nohup, disown and & (be sure to read the comments too) If your...

## In Bash, when to alias, when to script, and when to write a function?

The other answers provide some soft general guidelines based on personal taste, but ignore many pertinent facts that one should consider when deciding between scripts, functions, or aliases. Aliases and Functions ¹ The entire contents of aliases and functions...

## Get exit status of process that’s piped to another

bash and zsh have an array variable that holds the exit status of each element (command) of the last pipeline executed by the shell. If you are using bash, the array is called PIPESTATUS (case matters!) and the array indicies start at zero: \$ false | true \$...

## Execute vs Read bit. How do directory permissions in Linux work?

When applying permissions to directories on Linux, the permission bits have different meanings than on regular files. The read bit (r) allows the affected user to list the files within the directory The write bit (w) allows the affected user to create, rename,...

## What are the pros and cons of Vim and Emacs? [closed]

I use both, although if I had to choose one, I know which one I would pick. Still, I'll try to make an objective comparison on a few issues. Available everywhere? If you're a professional system administrator who works with Unix systems, or a power user on...

## How do I use pushd and popd commands?

pushd, popd, and dirs are shell builtins which allow you manipulate the directory stack. This can be used to change directories but return to the directory from which you came. For example start up with the following directories: \$ pwd /home/saml/somedir \$ ls...

## How to forward X over SSH to run graphics applications remotely?

X11 forwarding needs to be enabled on both the client side and the server side. On the client side, the -X (capital X) option to ssh enables X11 forwarding, and you can make this the default (for all connections or for a specific connection) with ForwardX11 yes...

## What does “LC_ALL=C” do?

LC_ALL is the environment variable that overrides all the other localisation settings (except \$LANGUAGE under some circumstances). Different aspects of localisations (like the thousand separator or decimal point character, character set, sorting order, month,...

## What is a bind mount?

What is a bind mount? A bind mount is an alternate view of a directory tree. Classically, mounting creates a view of a storage device as a directory tree. A bind mount instead takes an existing directory tree and replicates it under a different point. The...

## Turn off buffering in pipe

Another way to skin this cat is to use the stdbuf program, which is part of the GNU Coreutils (FreeBSD also has its own one). stdbuf -i0 -o0 -e0 command This turns off buffering completely for input, output and error. For some applications, line buffering may...