Overall I think this is good-quality, if somewhat dense, code, by which I
mean that since there is lots of functionality there, it’s not the
easiest code to follow. One suggestion I have for that is more grouping
according to topic: e.g. the utilities could just have a couple of
sections for files, string formatting, calling external programs and so on.
Since you work with a lot of external files, I imagine some sort of
wrapper object to handle a project or so could work as well.
(Frankly, I may just submit pull requests now, that’s easier, heh.)
While running the tests, I’ve found that the import of
future.builtins doesn’t work (Python 2.7.9), since there is no such
module; future_builtins doesn’t have
open either, which means that
nntoolkit can’t be loaded and
serve.py:19 also throws an error. I
don’t know what causes this; as you already have Travis CI set up, I’ll see
if I can get to the root cause of that.
pickle isn’t the nicest format for long-term data files; at this
point that’s more of a reflex for me, though — if it works for you, then
sure, why not (although you already have at least one workaround, the
sys.modules part, so keep that in mind, I think).
For speed you can use
ujson as a drop-in replacement for json.
In data_analyzation_metrics.py:119 a raw escape string for color(?) is
used. I’d additionally want a global flag to disable colors, and it
would be good to use a library for the formatting (I saw colorterm and
termcolor; there may be others).
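A sketch of the global-flag idea; the USE_COLOR name and the colored helper are my own, not from the project (the termcolor package offers a ready-made colored() in the same spirit):

```python
# Minimal sketch: a color helper with a global kill switch.
# USE_COLOR and colored() are hypothetical names, not from the project.
USE_COLOR = True

ANSI = {"red": "\033[31m", "green": "\033[32m"}
RESET = "\033[0m"

def colored(text, color):
    """Wrap text in an ANSI escape, unless colors are disabled."""
    if not USE_COLOR or color not in ANSI:
        return text
    return ANSI[color] + text + RESET
```

Flip USE_COLOR to False (e.g. from a --no-color command-line flag) and every call degrades to plain text.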
For something like
"%s" % str(x), the
str isn’t necessary.
You’re already doing this in some places, so I’d suggest using
with open("foo") as file: everywhere (if possible).
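A quick sketch of the pattern (the file name and contents are made up):

```python
import os
import tempfile

# Write then read a file; `with` closes the handle even if an exception
# is raised inside the block, so no explicit f.close() is needed.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w") as f:
    f.write("hello")
with open(path) as f:
    content = f.read()
```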
Similarly, repr(self) looks cleaner than an explicit self.__repr__() call.
features.py:174 the factor
3 should be extracted,
e.g. something like
draw_width = 3 if self.pen_down else 2 or so; in
general, extracting common subexpressions (even
len(x)) can eliminate a
lot of code, so I won’t look for other examples here.
In general, if you have
if foo: return, you don’t need an
else — you can remove that indentation; returning early can remove lots of nesting.
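A sketch of the early-return refactor (function and argument names are invented for illustration):

```python
def describe_nested(x):
    # Nested version: each condition adds a level of indentation.
    if x is not None:
        if x >= 0:
            return "non-negative"
        else:
            return "negative"
    else:
        return "missing"

def describe_early(x):
    # Guard clauses: same behavior, flat structure, no else needed.
    if x is None:
        return "missing"
    if x < 0:
        return "negative"
    return "non-negative"
```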
HandwrittenData.py:208 the whole method could just be reduced to:
def __eq__(self, other): return isinstance(other, self.__class__) and self.__dict__ == other.__dict__
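To illustrate the pattern (Point is a made-up class, not from the project):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Equal iff same class and same attribute dict.
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)
```

One caveat: defining __eq__ sets __hash__ to None in Python 3, so add a matching __hash__ if instances need to go into sets or dict keys.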
preprocessing.py:30 is also available as
scipy.spatial.distance.euclidean, so if in some cases you run that
over more than a few elements, you could consider using that.
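For reference, a hand-rolled Euclidean distance and what it is equivalent to (the euclidean_distance name here is my own; scipy.spatial.distance.euclidean computes the same thing):

```python
import math

def euclidean_distance(p, q):
    # Square root of the sum of squared coordinate differences;
    # equivalent to scipy.spatial.distance.euclidean(p, q).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```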
preprocessing.py:497 could be neater with
for index, point in enumerate(pointlist):.
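For example (the contents of pointlist are invented):

```python
pointlist = ["a", "b", "c"]

# Instead of a manual counter (index = 0; index += 1 in the loop body),
# enumerate yields (index, element) pairs directly.
indexed = []
for index, point in enumerate(pointlist):
    indexed.append((index, point))
```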
selfcheck.py should already exist somewhere …?
Instead of using
zip you can also use
itertools.izip, the generator
variant, to use less memory when possible, i.e. when only iterating over
something with a
for loop instead of storing the resulting list.
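A sketch; note this applies to Python 2 (under Python 3, zip is already lazy and itertools.izip is gone):

```python
import itertools

a = [1, 2, 3]
b = [4, 5, 6]

# Python 2: zip(a, b) builds a full list; itertools.izip(a, b) is lazy.
# Python 3: zip itself returns a lazy iterator, so izip was removed.
izip = getattr(itertools, "izip", zip)  # portable alias

total = 0
for x, y in izip(a, b):
    total += x * y
```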
setup.py has a version number, but the Git repository has no
corresponding tags/releases. If you start now and add e.g. a tag for
the first release (or so), that’d make it easier to refer to specific
versions, i.e. from install scripts.
long_description has a newline, which might look weird in some places.
install_requires lists at least one package which is not in
requirements.txt, so I’d sync those, unless it’s really only required
for installation, in which case this point is moot.
The keywords look good, except that I’d doubt that including the
package name itself helps if the package is already named that way
(it is, see comments), and one of the keywords doesn’t need the hyphen.
The classifiers are good; you could also list the more specific Python versions you support.
requirements.txt has no version requirements. I’d say that at
least some lower bound (i.e. your currently installed packages or so)
would be useful to have. Of course you might not precisely know which
versions to go for, but for someone trying to get it running this would
be helpful nonetheless.
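For example, lower bounds could look like this (the package names and versions here are purely illustrative, not taken from the project):

```
# requirements.txt — hypothetical lower bounds
numpy>=1.8
scipy>=0.14
```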
I don’t exactly know the process for external requirements, but you
could just note somewhere that ImageMagick is a dependency.
You also fixed the PEP8 stuff, so now there are only a few too-long lines
left; you could also add
pep8 as a pre-commit hook, so you
wouldn’t be able to check in without everything fixed; I use that for
library code at least. The same goes for
pylint; I’d maybe disable a few
of its checks (and add it to the Makefile or, again, as a pre-commit hook).
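A minimal sketch of such a hook (save it as .git/hooks/pre-commit and make it executable; the package name and the exact pep8/pylint options are placeholders, not from the project):

```shell
#!/bin/sh
# Abort the commit if style checks fail.
pep8 . || exit 1
pylint --disable=missing-docstring yourpackage || exit 1
```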
Great! There is a bit of duplication, e.g. one function is
implemented three times. If possible I’d move that into (yet another,
heh) utils package, just to get it out of the way.
Using nosetests and the addition of the Makefile is nice as well.
Well, I like PostgreSQL, so I think this will come up at some point; if
you don’t have a pressing reason to use MySQL exclusively, using a
database-independent library would be cool.