Python package for handwriting recognition

Solutions:


Overall I think this is good-quality code, if a little dense: there is
a lot of functionality, so it’s not the easiest code to follow. One
suggestion is to group more by topic, e.g. the utilities could have a
couple of sections for files, string formatting, calling external
programs and so on.

Since you work with a lot of external files, I imagine some sort of
wrapper object to handle a project or so could work as well.

(Frankly, I may just submit pull requests now, that’s easier, heh.)

Code

While running the tests, I found that the import of open from
future.builtins doesn’t work (Python 2.7.9): there is no such module,
and future_builtins doesn’t have open either. That means nntoolkit
can’t be loaded, and serve.py:19 also throws an error. I don’t know
what causes this; since you already have Travis CI set up, I’ll see if
I can get to the root cause.

IMO pickle isn’t the nicest format for long-term data files. At this
point that’s more of a reflex for me, though; if it works for you, then
sure, why not. Keep in mind that you already need at least one
workaround for it (the sys.modules part).

For speed you can use ujson as a drop-in replacement of json.
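
One way to try this without making ujson a hard dependency is an import
fallback. This is only a sketch, assuming ujson may or may not be
installed; the record data is made up for illustration.

```python
# Prefer ujson when it is installed, otherwise use the standard library.
try:
    import ujson as json
except ImportError:
    import json

# Hypothetical data, only for illustration.
record = {"id": 1, "points": [[0, 0], [1, 2]]}

encoded = json.dumps(record)
decoded = json.loads(encoded)
```

Only the basic dumps/loads calls are used here; whether ujson accepts
every keyword argument your code passes to json is something to check.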

In data_analyzation_metrics.py:119 a raw escape string for color(?) is
used. I’d additionally want a global flag to disable colors and it
would be good to use a library for the formatting (I saw colorterm and
termcolor; there may be others).
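
A global flag could look roughly like the following. This is a sketch
without any library; USE_COLORS, ANSI_CODES and colorize are names I
made up and are not part of the package.

```python
# Module-level switch; set to False to get plain text everywhere.
USE_COLORS = True

# Standard ANSI escape sequences for bright red and bright green.
ANSI_CODES = {"red": "\033[91m", "green": "\033[92m"}
ANSI_RESET = "\033[0m"


def colorize(text, color):
    """Wrap text in ANSI color codes unless colors are disabled."""
    if not USE_COLORS or color not in ANSI_CODES:
        return text
    return ANSI_CODES[color] + text + ANSI_RESET
```

With that, every place that prints colored output goes through
colorize, and switching colors off is a single assignment.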

For something like "%s" % str(x) the str isn’t necessary.
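
A quick check of that, with made-up values. The one case to watch is a
tuple on the right-hand side, which has to be wrapped in a one-element
tuple either way.

```python
x = 3.14

with_str = "%s" % str(x)   # explicit str() call
without_str = "%s" % x     # %s already calls str() on its argument

# A tuple would be unpacked as multiple format arguments, so wrap it.
point = (1, 2)
as_tuple = "%s" % (point,)
```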

You’re already doing this in some places, so I’d suggest using
with open("foo") as file: all the time (if possible).
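
For reference, a small self-contained example of the pattern; the file
name and contents are hypothetical and written to a temporary
directory.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for the example.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# The file is closed when the block ends, even if an exception is raised.
with open(path, "w") as handle:
    handle.write("first line\nsecond line\n")

with open(path) as handle:
    lines = handle.read().splitlines()
```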

Instead of self.__repr__() repr(self) looks cleaner.
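
A tiny illustration with a made-up Point class (not taken from the
package):

```python
class Point(object):
    """Hypothetical class, only here to show repr(self)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):
        # Same result as self.__repr__(), but reads like any other call.
        return repr(self)
```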

In features.py:174 the factors 2 and 3 should be extracted,
e.g. something like draw_width = 3 if self.pen_down else 2 or so; in
general extracting common subexpressions (even len(x)) can eliminate a
lot of code, so I won’t look for other examples here.
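
To show the shape of that change, here is a sketch with a made-up
Stroke class. It is not the code from features.py; only the pen_down
attribute and the values 2 and 3 come from the point above.

```python
class Stroke(object):
    """Hypothetical stand-in, not the class from features.py."""

    def __init__(self, pen_down, points):
        self.pen_down = pen_down
        self.points = points

    def line_widths(self):
        # Decide the width once instead of repeating 2 and 3 per branch.
        draw_width = 3 if self.pen_down else 2
        # Compute the length once and reuse it.
        point_count = len(self.points)
        # One width per line segment between consecutive points.
        return [draw_width] * max(point_count - 1, 0)
```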

In general, if you have if foo: return, you don’t need an else, just
remove that indentation; also returning early can remove lots of
indentation.
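
A before/after sketch of what I mean; both functions are made up and
behave identically.

```python
def describe_nested(points):
    # Before: every case adds a level of indentation.
    if not points:
        return "empty"
    else:
        if len(points) == 1:
            return "single point"
        else:
            return "%d points" % len(points)


def describe_flat(points):
    # After: return early, so no else branches are needed.
    if not points:
        return "empty"
    if len(points) == 1:
        return "single point"
    return "%d points" % len(points)
```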

For HandwrittenData.py:208 the whole method could just be reduced to:

def __eq__(self, other):
    return (isinstance(other, self.__class__)
            and self.__dict__ == other.__dict__)

euclidean_distance from preprocessing.py:30 is also defined as
scipy.spatial.distance.euclidean, so if in some cases you run that
over more than a few elements you could consider using that.

preprocessing.py:497 could be neater with
for index, point in enumerate(pointlist):.
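
Side by side, with a made-up pointlist (the dict layout is an
assumption, not necessarily what preprocessing.py uses):

```python
# Hypothetical data; the real pointlist may be structured differently.
pointlist = [{"x": 0, "y": 0}, {"x": 3, "y": 4}]

# Index-based loop: the index is only there to look the point up again.
shifted_by_index = []
for i in range(len(pointlist)):
    shifted_by_index.append((i, pointlist[i]["x"] + 1))

# enumerate yields the index and the element together.
shifted = []
for index, point in enumerate(pointlist):
    shifted.append((index, point["x"] + 1))
```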

The which function in selfcheck.py should already exist somewhere in a
library …?
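
Assuming which in selfcheck.py looks up an executable on the PATH (I
haven’t checked every detail of it), the standard library has
candidates: shutil.which from Python 3.3 on, and
distutils.spawn.find_executable on Python 2. A sketch that uses
whichever is available; the program name is invented:

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2

# Both variants return None when the program is not on the PATH.
missing = which("surely-no-such-program-hwrt-example")
```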

Instead of using zip you can also use itertools.izip, the generator
variant, to use less memory when possible, i.e. when only iterating over
something with a for loop instead of storing the resulting list.
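
For example, with made-up lists. On Python 3 the built-in zip is
already lazy, so the sketch falls back to it when izip is not there.

```python
try:
    from itertools import izip  # Python 2: lazy variant of zip
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

xs = [0, 1, 2]
ys = [10, 11, 12]

# Pairs are produced one at a time; no intermediate list is built.
total = 0
for x, y in izip(xs, ys):
    total += x * y
```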

Packaging

The setup.py has a version number, but the Git repository has no
corresponding tags/releases. If you start now and tag e.g. 0.1.207 as
the first release (or so), that’d make it easier to refer to a specific
version, e.g. from install scripts.

The long_description contains a newline, which might look weird in some
circumstances.

install_requires lists at least one package which is not in
requirements.txt, so I’d sync those, unless it’s really only required
for installation, in which case this point is moot.

The keywords look good, except that I’d doubt that including 'HWRT'
helps if the package is already named that way, and 'on-line' probably
doesn’t need the hyphen (it does, see comments).

The classifiers are good; you could also list more specific versions of
Python 3.

The requirements.txt has no version requirements. I’d say that at
least some lower bound (e.g. your currently installed versions) would
be useful to have. Of course you might not know precisely which
versions to go for, but for someone trying to get it running this would
be helpful nonetheless.

I don’t exactly know the process for external requirements, but you
could just note somewhere that ImageMagick is a dependency.

You also fixed the PEP8 stuff, so now there are only a few too-long
lines left. You could add pep8 as a pre-commit hook, so you wouldn’t be
able to check in until everything is fixed; I use that for library code
at least. The same goes for pylint, though I’d maybe disable a few of
its checks (and add it to the Makefile or, again, as a pre-commit
hook).

Tests

Great! There is a bit of duplication, e.g. compare_pointlists is
implemented three times. If possible I’d move that into (yet another,
heh) utils package just to get it out of the way.

Using nosetests and the addition of the Makefile is nice as well.

Future ideas

Well, I like PostgreSQL, so I think this will come up at some point; if
you don’t have a pressing reason to use MySQL exclusively, using a
database-independent library would be cool.
