Home » Searching in an array in less than O(n) time

Searching in an array in less than O(n) time


While other answers make good points, I have to wonder why you are using recursion. This is such a simple problem to solve with a for loop.

I assume that you are not supposed to start from any index other than index 0, so consider the following routine:

public int searchArray(int[] arr, int elem) {

    for (int i = 0; i < arr.length; ) {
        if (arr[i] == elem) {
            return i;
        i += Math.abs(elem - arr[i]);
    return -1;

(If you need to start the search part-way through the array then you can add the offset input parameter again and start i from that).

The bottom line is that recursion is overkill, this system is $O(n)$, but the cost of each cycle is less than the same thing using recursion.

I am not aware of any way to solve the given problem with a system of better than $O(n)$ complexity.

Discussion on complexity – why this is $O(n)$

This answer has generated a lot of discussion about complexity, that this method only ever scans, at most, half the members of the input array, and thus it should be complexity $O left( frac{n}{2} right )$ instead of $O(n)$. The argument given is something like:

Consider the worst-case data 1,2,1,2,1,2,1,2,1,2,1,2 and the search term 3. For this situation the method will start at data[0], and then skip to data[2], then data[4], and so on. It will never inspect data[1], and other odd-indexed data points. If the search term is even more ‘different’ than the actual data (e.g. 100) then the method will do only one comparison at data[0] and will then return ‘not found’ -1.

This is an interesting observation, that the method only ever needs to scan half the data at most. This is especially interesting considering a ‘naive’ method which just scans the data one-member-at-a-time and returns when it finds the value. That ‘naive’ method most certainly has $ Oleft(nright) $ ‘performance’ and complexity, and the ‘skip-method’ will be more than twice as fast.

The important thing to note though, is how the algorithms scale relative to the amount of data, not relative to each other!

So, consider a hypothetical set of worst-case data 1,2,1,2,1,2,.... and the search-term 3. This hypothetical happens to be searched in 4 milliseconds by the skip-method, and in 8 milliseconds by the naive-method. Now we double the amount of data, what happens? The processing time for both methods will double!

In both cases, the performance of the algorithms will double for each doubling of the data volume. This is what makes both algorithms $ O(n) $ complexity. From Wikipedia:

In computer science, big O notation is used to classify algorithms by how they respond (e.g., in their processing time or working space requirements) to changes in input size.

Reversing the argument, by suggesting the skip-method has $O left( frac{n}{2} right) $ complexity, I would then expect that, if I double the data, the execution time would only increase by a half, or 50%. This is ‘obviously’ not true for the skip-method (nor the naive-method).

Both methods have $O(n)$ complexity because they both scale the same way with increasing volumes of data.

But, just because they scale the same way, does not mean that one method is not better than the other… obviously.

The problem is in $O(n)$. Consider the case that 200_success describes.

You have a sequence of alternating 1 and 2’s where a single 1 is replaced by a 3.

When you are asked to search for a 3 you know, after inspecting the first element, that it will have an even index. But if every odd index holds a 2 then any even index can hold a 3, so you can not guarantee that the 3 is not in the last index you search. This means that you will have to look in $O(dfrac{n}{2})$ places. $O(dfrac{n}{2})$ = $O(n)$, so the problem is $O(n)$.

No correct algorithm can have better worst time performance than this.

A more interesting question is what happens if you know that no number occurs more than some fixed upper bound.

You never need to look backwards if you start at 0.

Proof by induction over algorithm steps j:

For j = 0, i(j) = 0, you cannot go backwards.

For j > 1, there are two cases: First one, we found our number. Second one, there is a difference diff(j) = abs(elem - array[i(j)]). Then no number in array[i(j),i(j)+diff) can contain elem. By induction hypothesis, no element in array[i(j-1),i(j)=i(j-1)+diff(j-1)) contains that number either. So the next possible index for i is i(j)+diff(j) (= i(j+1))

But that algorithm really is in O(n) as Taemyr already answered.

Related Solutions

Extract file from docker image?

You can extract files from an image with the following commands: docker create $image # returns container ID docker cp $container_id:$source_path $destination_path docker rm $container_id According to the docker create documentation, this doesn't run the...

Transfer files using scp: permission denied

Your commands are trying to put the new Document to the root (/) of your machine. What you want to do is to transfer them to your home directory (since you have no permissions to write to /). If path to your home is something like /home/erez try the following:...

What’s the purpose of DH Parameters?

What exactly is the purpose of these DH Parameters? These parameters define how OpenSSL performs the Diffie-Hellman (DH) key-exchange. As you stated correctly they include a field prime p and a generator g. The purpose of the availability to customize these...

How to rsync multiple source folders

You can pass multiple source arguments. rsync -a /etc/fstab /home/user/download bkp This creates bkp/fstab and bkp/download, like the separate commands you gave. It may be desirable to preserve the source structure instead. To do this, use / as the source and...

Benefits of Structured Logging vs basic logging

There are two fundamental advances with the structured approach that can't be emulated using text logs without (sometimes extreme levels of) additional effort. Event Types When you write two events with log4net like: log.Debug("Disk quota {0} exceeded by user...

Interfaces vs Types in TypeScript

2019 Update The current answers and the official documentation are outdated. And for those new to TypeScript, the terminology used isn't clear without examples. Below is a list of up-to-date differences. 1. Objects / Functions Both can be used to describe the...

Get total as you type with added column (append) using jQuery

One issue if that the newly-added column id's are missing the id number. If you look at the id, it only shows "price-", when it should probably be "price-2-1", since the original ones are "price-1", and the original ones should probably be something like...

Determining if a file is a hard link or symbolic link?

Jim's answer explains how to test for a symlink: by using test's -L test. But testing for a "hard link" is, well, strictly speaking not what you want. Hard links work because of how Unix handles files: each file is represented by a single inode. Then a single...

How to restrict a Google search to results of a specific language?

You can do that using the advanced search options: http://www.googleguide.com/sharpening_queries.html I also found this, which might work for you: http://www.searchenginejournal.com/how-to-see-google-search-results-for-other-locations/25203/ Just wanted to add...

Random map generation

Among the many other related questions on the site, there's an often linked article for map generation: Polygonal Map Generation for Games you can glean some good strategies from that article, but it can't really be used as is. While not a tutorial, there's an...

How to prettyprint a JSON file?

The json module already implements some basic pretty printing in the dump and dumps functions, with the indent parameter that specifies how many spaces to indent by: >>> import json >>> >>> your_json = '["foo", {"bar":["baz", null,...

How can I avoid the battery charging when connected via USB?

I have an Android 4.0.3 phone without root access so can't test any of this but let me point you to /sys/class/power_supply/battery/ which gives some info/control over charging issues. In particular there is charging_enabled which gives the current state (0 not...

How to transform given dataset in python? [closed]

From your expected result, it appears that each "group" is based on contiguous id values. For this, you can use the compare-cumsum-groupby pattern, and then use agg to get the min and max values. # Sample data. df = pd.DataFrame( {'id': [1, 2, 2, 2, 2, 2, 1, 1,...

Output of the following C++ Program [closed]

It works exactly like this non-recursive translation: int func_0() { return 2; } int func_1() { return 3; } int func_2() { return func_1() + func_0(); } // Returns 3 + 2 = 5 int func_3() { return func_2() + func_1(); } // Returns 5 + 3 = 8 int func_4() { return...

Making a circle out of . (periods) [closed]

Here's the maths and even an example program in C: http://pixwiki.bafsoft.com/mags/5/articles/circle/sincos.htm (link no longer exists). And position: absolute, left and top will let you draw: http://www.w3.org/TR/CSS2/visuren.html#choose-position Any further...

Should I use a code converter (Python to C++)?

Generally it's an awful way to write code, and does not guarantee that it will be any faster. Things which are simple and fast in one language can be complex and slow in another. You're better off either learning how to write fast Python code or learning C++...

tkinter: cannot concatenate ‘str’ and ‘float’ objects

This one line is more than enough to cause the problem: text="რეგულარი >> "+2.23+ 'GEL' 2.23 is a floating-point value; 'GEL' is a string. What does it mean to add an arithmetic value and a string of letters? If you want the string label 'რეგულარი...