For those of us more dictionary oriented, there is https://pypi.python.org/pypi/voluptuous (which is OK for the most part, as long as you are only trying to do validation, and nothing too crazy)
+1 for marshmallow - most of the serialization libraries are 80% there, but marshmallow has it 95% down - all the weird corner cases about nested models and lists of nested models and all that. Plus the dev is very helpful and courteous on github.
I've tried truckloads of Python serialisation libs over the last few years and marshmallow is the one that finally makes me feel like I don't need to look for another one.
Looks extremely similar to https://github.com/schematics/schematics . Always good to see two projects approach the problem the same way independently - higher chance that the solution is right :)
Schematics has a very useful feature in "roles" - i.e. a good way of hiding certain fields in certain situations (e.g. admin views vs self vs other vs anonymous). Does marshmallow have something similar?
> Here's a schema I use in production, see how readable it makes the parameters of the API and how quick all the validation and normalization is: https://www.pastery.net/mhwwnv/
Thank you so much for providing this example. I couldn't grok what schema did, and your code made it make sense.
If people want to provide examples for the other libraries in this thread, you'll be popular :)
I generally like the simplicity of Rx and the fact that it's language agnostic (I've used it with both JSON and YAML and other serialization libraries), with the schemas themselves shared between both systems. However, the lack of documentation has always been a problem.
Optional being an attribute of the key rather than the value is pretty bizarre to me; I've seen that pattern a couple of times. It feels like the wrong way to go about it.
Think about it in terms of composability of validators; "having a key" is a property of the dict/object, not the value that is stored there. I feel your intuition, but experience says otherwise.
Ahh. That makes sense then. The value of the key being optional is indeed a different concept. Fwiw, this is one huge downside of Swagger APIs. Swagger supports the concept of optional keys but not optional values >.>
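To make the composability point concrete, here's a tiny pure-Python sketch (no particular library) of a dict validator where "may be absent" is a property of the key, not the value:

```python
class Optional:
    """Marker wrapping a key to signal that it may be absent."""
    def __init__(self, key):
        self.key = key

def validate(schema, data):
    """Tiny dict validator: schema values are types, keys may be Optional."""
    for key, expected in schema.items():
        if isinstance(key, Optional):
            if key.key not in data:
                continue  # absent optional key is fine
            value = data[key.key]
        elif key in data:
            value = data[key]
        else:
            raise ValueError(f"missing required key: {key!r}")
        if not isinstance(value, expected):
            raise ValueError(f"bad value for key {key!r}")
    return data

schema = {"name": str, Optional("nickname"): str}
validate(schema, {"name": "Ada"})                     # ok: nickname absent
validate(schema, {"name": "Ada", "nickname": "ad"})   # ok: present and valid
```

Note that the value check (`str`) composes unchanged whether the key is required or optional; only the key-presence rule differs, which is why it belongs on the key.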
I have yet to find a validation library that supports all the sorts of things I expect it to. One place they tend to fail is the ways they can fulfill default values.
Say a field has an error and I just want to give it a default value when it's broken. That particular feature doesn't exist in any library I've found so far (for Python) =/
Granted, this sort of blends into the usual "It's a validation library not a serialization library". But they all make a half-assed attempt at the other side in my experience.
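For what it's worth, the missing feature is easy to sketch yourself in plain Python; `with_default` here is a hypothetical helper, not part of any library:

```python
def with_default(validator, default):
    """Wrap a validator so that failures yield a default instead of raising."""
    def wrapped(value):
        try:
            return validator(value)
        except (ValueError, TypeError):
            return default
    return wrapped

# e.g. parse a port number, falling back to 8080 when the input is broken
to_port = with_default(int, 8080)
to_port("3000")   # -> 3000
to_port("oops")   # -> 8080
```

The wrapper composes with any callable validator, so it can be layered onto whatever library you're already using.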
I thought most ORMs were back-end independent. Isn't that 1/2 of their value proposition? The other half being accessing a persistence layer in native code.
Wow, thanks for this...I was literally in the middle of writing a CLI tool to fetch a URL and parse out metadata and was getting a little tired of the argparse route. I browsed the documentation and honestly can't say that I immediately grok the advantages but given who its author is, I'm more than happy to switch libraries in mid-coding :)
I like Click because it does very simple things very well. It gets hairy when you want to build more complex CLIs. For example, value options and validators don't play nicely because Click doesn't distinguish between the absence of a value and an invalid value, so you wind up dropping Click features and rewriting your own plumbing for that kind of stuff. The alternative is writing a more verbose CLI grammar, which leads to a really clunky UI.
I find click to be much more natural too. Docopt was always touchy about more complicated CLIs when I used it (probably because a lot of "magic" happens behind the scenes), whereas click lets you drill down and arrange things just so. It also feels very well designed; hats off to Ronacher as usual for being good at designing Python libraries.
I also find myself using click even when I don't want a CLI. The pretty-printer (`secho`) and progress bars are extremely handy, plus some of the other stuff in utilities. It's quite nice that they handle detecting when output is an interactive terminal versus piping to a file.
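For anyone who hasn't tried the utilities, a minimal sketch of the two features mentioned (assuming click is installed; `style()` returns the ANSI-wrapped string, `secho()` prints it):

```python
import click

# colored output: style() builds the string, secho() prints it directly
banner = click.style("All tests passed", fg="green", bold=True)
click.secho("Deploying...", fg="yellow")

# progress bar over any iterable; renders only on interactive terminals
with click.progressbar(range(1000), label="Processing") as items:
    for item in items:
        pass  # do work here
```

The terminal detection means you can pipe the same script to a file without ANSI garbage ending up in your logs.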
I'm feeling a lot of love for Pandas. Any (biology related) project I work on starts with multi-headered dataframes and ends in beautiful Seaborn graphs. In combination with Jupyter notebook I breeze through large data sets while leaving a perfect trail of what goes on in the data pipeline. Python is great.
Seaborn is great for visualization; it basically packages up some of the more specialized R plots for MPL. The other thing I really love about it are plotting contexts, which make it really easy to properly size and format the same plot for e.g. a poster and a paper.
Interestingly enough, my biggest use of Pandas is to serialize to and from HDF5. I work with a lot of large datasets and Pandas simplifies using HDF5 quite a lot.
I've been loving Pandas used alongside Seaborn as well. It's really just so easy to manipulate/visualize my data (and make it look gorgeous) with the combo.
Print-debugging on steroids. This really does make things so much easier, especially when dealing with huge apps you don't have time to learn. Not just useful as a dev but also as a sysadmin.
First of all, pudb is fantastic, just use it over pdb all the time.
q is for when you want to log data, pudb is for when you want to step through and evaluate lines in-context. It's very possible that you'll want to use both together.
They don't do the same thing. Q does printing. import q; q(var) -> prints var to /tmp/q, with syntax highlighting, separate files for big output, etc. It can also do lots of other cool things, cf 1-page documentation in the link. :)
Wow, there are some great new tools to explore. Thanks!
Some of the new libraries I'm using this year that I've found really handy include:
Odo - http://odo.readthedocs.org/en/latest/ It is ridiculously handy for converting data from one format to another - especially for transforming a table from a database or csv into a DataFrame and back.
Xlsxwriter - http://xlsxwriter.readthedocs.org - I'm building beautiful reports, with charts, using this tool. As someone who moves data around a lot, but has to work with less technical business and analyst folks, this is becoming my go-to for handing them some data to play with.
Blessings - https://pypi.python.org/pypi/blessings - as I get older, staring at simple black and white text on the screen seems to be getting harder. Putting a little color and flair in my command line interfaces cheers me up, even if it doesn't do much in the way of actually getting the job done.
Lastly, switching from curl to httpie was a huge help in working with API's of all sorts. It solved a problem I didn't even know I had. https://pypi.python.org/pypi/httpie
It looks like a much cleaner Scrapy-inspired spider framework, without the twisted dependency. And it's python 2+3 compatible. I'm very excited to try it out.
I love that the Python community is coming up with so many robust frameworks. I have used Scrapy, and it provides most boilerplate functionality out of the box. Gone are the days when I would use wget for my scraping tasks. That being said, it's a bit disappointing that Scrapy still doesn't have native support for dynamic pages. I am hoping to see this feature in the upcoming releases. With more and more of the web becoming dynamic, this should be a priority feature. Other than that, I have nothing but praise for Scrapy.
Pomp is apparently a simplistic take on scraping, but it doesn't handle redirects, caching, cookies, authentication, etc. I wonder if it provides parallel processing out of the box; most likely not. This is a bit strange, because these are the features for which I would prefer using a framework over writing my own code, which makes me less inclined to try Pomp.
Scrapy for the win :D
Just seeing this for the first time. Got a proof-of-concept demo working in no time (after fussing with install requirements...). Looks like a great tool for non-devs like me who still need to scrape things occasionally for data collection and analysis.
A few years ago I tried to set up a Mac with a scientific computing stack and it took me days to hack my way through all the various dependencies and incompatible versions. Anaconda now lets me do that in minutes.
One issue with using your system's packaging system is that a lot of system utilities are written in Python which makes it harder to play around with new versions, bleeding edge libs, etc.
Not for binary packages, which usually end up requiring libraries installed into /usr/local (and let's not get started about the mess that is Python binary deployment on Windows).
Me too...I taught a Python class by making everyone download Anaconda's distribution of 3.x, and everyone could do the assignments no matter what kind of computer they used. Anaconda does a little too much for me to have it be my own default install, but it does quite well at onboarding beginners. I use pyenv to install and maintain Anaconda on my own machine when I need to replicate student work.
It takes precedence in the path over everything...and in the last version I used (before I upgraded to OS X El Capitan and wiped out everything), things like `curl` were provided [1] ...which I completely understand for Anaconda's use case, but it caused a lot of confusing grief for me when I hadn't expected that and OpenSSL was having its rough times.
I don't know if that's the case (curl being part of the package) now, with Anaconda 3 2.4.0+. It certainly isn't when installed via pyenv, so I'm happy with that. But there were other issues in the past build...BeautifulSoup was inexplicably broken. I mean that it simply did not correctly parse non-trivial HTML pages and yet threw no errors. The results could be replicated for all of my students, but I never could isolate the issue... I installed Python 3 and the same version of BS4 from scratch and had no problems, but I can't imagine what the Anaconda build could have gotten wrong. It ended up being OK since I just switched to lxml, which I now happily use over BS4 any day, but it was frustrating to not be able to diagnose the problem (I didn't get a response in the support forums either). I'm assuming this problem has gone away in subsequent versions of Anaconda, though I haven't tried since lxml is perfectly fine for me.
And finally...well, I have to admit it, but I use Python like a goddamned moron in that I still don't know how to use virtualenv/venv to do proper dev isolation. And from the brief research I did, I see that Anaconda has its own conventions, or workflow...something with the conda utility. Again, I can see why it's necessary for Anaconda's use case (people who want to do data science and not hand-tweak their environment every time they upgrade a package over pip), but it added too many layers for me at the time.
> And finally...well, I have to admit it, but I use Python like a goddamned moron in that I still don't know how to use virtualenv/venv to do proper dev isolation.
I was the same way for quite a while, until I bumped into pyenv-virtualenv[1]. Just install that plugin, and you can do, eg,
pyenv virtualenv 3.5.1 my-project
to get a virtual environment called `my-project` based off of Python 3.5.1 (assuming that you've installed 3.5.1 via pyenv, of course). Or, you can just do
pyenv virtualenv my-project
to make a virtualenv called `my-project` based off of the current version of Python that you're using.
Once you do that, pyenv treats `my-project` just as another installation of Python. In fact, `my-project` will show up in the list of installed versions (`pyenv versions`), and you can switch to it:
pyenv global my-project
(Or you can switch at the local or shell levels. Whichever.)
And voila! You have your own virtual environment that can contain its own list of libraries.
And no, I'm not a shill for the creator of pyenv, I just really like the software.
Thanks for this...wrapping it up in pyenv is a lot more familiar to me. And why would you apologize for shilling for pyenv?...it's amazing :) (as is rbenv, its inspiration)
And ehhh, I've been downvoted and bitched at about evangelizing pyenv before. Just thought I'd preempt that. But yes, it's an amazing piece of software. :)
FYI, I've put together a bash function for my .bash_profile that adds an indicator to my prompt showing the current Python version/virtualenv in use[1]. That's saved me a bit of frustration when going into a directory where a local pyenv version overrides the global version.
A few years back I made the mistake of allowing Anaconda to prepend its path to .bashrc without realizing it. I was a bit of a novice back then, but I had a number of existing projects in virtualenvs on my system and was rather upset when everything stopped working because my default python had changed. For those out there who would like to test out Anaconda but already have a lot of projects using their default installation, I would recommend using this installation guide to keep things separate:
While there are certainly advantages to Anaconda, I've never encountered any troubles installing Pandas, NumPy, SciPy, or scikit-learn on any OS X or linux system. In my experience, getting GCC up and running is far more of a pain in the ass (and it usually isn't even that bad).
I use pyenv[1] and pyenv-virtualenv[2] to easily keep track of Python versions and virtual environments. I keep one virtual environment for each project I'm working on, and things pretty much Just Work.
Pyrasite. When you have a running python app that is behaving oddly and you can't replicate the bug elsewhere, you can run python code inside the running process - without any preparation beforehand - to display stack trace, output vars,...
Anaconda has been a lifesaver, because it can be installed and managed quite easily without root privileges (it even installs pip). Some of the sysadmins where I work are slower than molasses when it comes to installing python packages (as in, it takes months of repeated emails from multiple people to get anything done), and what is installed is often years out of date.
> I went looking for a pure-Python NoSQL database and came across TinyDB…which had a simple interface, and has handled everything I’ve thrown at it so far!
Why would anyone need a simple NoSQL? Why would you go the NoSQL route if it isn't a HUGE complex database?
That particular back-end is deprecated, but the same API is provided by the dbm/gdbm/dumbdbm modules. Those still exist in Python 3, although they've been consolidated under one top-level module.
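For reference, the consolidated module is the standard-library `dbm` package: `dbm.open()` picks the best available backend (gdbm/ndbm, or the pure-Python `dbm.dumb` fallback), and keys/values are bytes. A quick sketch:

```python
import dbm
import os
import tempfile

# dbm.open() selects whichever backend is available on this system
path = os.path.join(tempfile.mkdtemp(), "example_db")
with dbm.open(path, "c") as db:  # "c" = open for read/write, create if missing
    db[b"greeting"] = b"hello"
    value = db[b"greeting"]
```

Reopening the same path later (default flag `"r"`) gives you the persisted data back, which is about all you need for a simple key-value store.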
We've transitioned our local/dev/prod instances to use conda on Heroku, and couldn't be happier. It was a tiny bit of work to get it set up, but now everything is consistent, and we can set up new local environments in seconds.
So I have been considering this. Does conda track PyPI or does it lag behind it? I have been concerned about moving over my requirements.txt for a webapp with lots of dependencies.
It's also pretty straightforward to set up your own Conda package tree. Nice for packaging your app for deployment or making sure you have very precise dependencies.
I think deployment is a solved problem with Docker. It's libraries like BLAS, etc., that are a huge pain. I'm not sure why a statically linked NumPy is not possible; even Anaconda could not achieve it.
If you've ever tried to dive into the NumPy build process you'd see why. It's unbelievably complicated... not that they really could do it better given that they are compiling about a billion scientific libraries and support alternatives and optimizations (like MKL).
Yes - unfortunately I have and I failed miserably.
These days I'm trying to see if there's a docker build that can build a great numpy (with all optimizations). Interestingly there are even docker images to call cuda APIs from python.
We have to use a mix of pypi and conda since quite a few of our dependencies are not in conda. We have a script which checks conda first, then falls back to pypi, all from one requirements.txt
I really like lists like this. I get daily updates on which of my GitHub friends (is that what they're called?) have starred a project, but there's no real way to tell why they're following it. I can look at the README and guess. I did see someone start following this project the other day https://github.com/elastic/elasticsearch-dsl-py, which seems pretty interesting. Has anyone used it?
It's something like Django models but with Elasticsearch. You can create object classes and then save them to Elasticsearch, query them, etc. It's built on the lower-level https://github.com/elastic/elasticsearch-py. Very handy.
tqdm looks super promising. progressbar and progressbar2 end up being complicated and weird enough to use that my company ended up making wrappers. Why maintain that when you can just use a library that works out of the box?
It would be great if it had ipython notebook support. I often end up doing long operations that scrape services for data but have no idea what their progress is.
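The out-of-the-box part really is minimal; wrapping any iterable is enough (assuming tqdm is installed):

```python
from tqdm import tqdm

total = 0
# wrapping any iterable in tqdm() renders a live progress bar on stderr
for n in tqdm(range(100), desc="scraping"):
    total += n
```

For long scraping jobs, the `desc` label plus the built-in rate/ETA display answers the "what's the progress?" question with one line of code.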
For me 2015 has been the year of tox. It is a great tool and worth using for just about any python project.
tqdm is failing for me on Windows at the moment. To be fair, it might not be its fault (I'm mixing it with Blinker signals), but still I'm slightly disappointed.
A lot of these "magic" tools fall apart when you're trying to do something slightly more structured than "throwaway bunch o' functions".
It's not much about language, it's mostly about code/ecosystem re-use (that is: if you have a library available in system X and you're writing for Y you can still take advantage of it).
Just write the next thing in Python 3, there's not really that much to learn right away as much as there are minor surprises that you can very quickly get up to speed on as you encounter them.
I've been dragging my heels on this for a long time, but I'm finally starting to take the plunge into 3.
I think for any greenfield project, there's very little reason to use CPython 2 anymore. If you want performance, use PyPy. If you want features, use CPython 3. From now on, that's the philosophy I'm following whenever I write something new.
I have been a bit 'off Python' for a while, but this list article prompted me to take a renewed look at it because of Jupyter Notebook, and I have to say I'm quite impressed .. this is a really nice way of working on code, wow .. especially using folium this way is very cool.
This is a tangent, but the most annoying change in the latest Python versions is you can no longer write print "foo". Now it has to be print("foo"). Damn kids ruining my language.
> This is a tangent, but the most annoying change in the latest Python versions is you can no longer write print "foo". Now it has to be print("foo").
The statement-to-function migration for print is, IMO, generally an improvement, but in any case it's not a change in the latest versions of Python, except with an unusually broad interpretation of latest; it's a change in Python 3.0, which was released a little over 7 years ago.
I switched to Python 3 a few months ago, and it still gets me: I end up typing "print variable", only to have Python complain. It's probably stuck with me because of its simplicity, though I do understand why they've removed it. It's an aberration in the syntax, for lack of a better way of putting it.
https://pypi.python.org/pypi/schema
Here's a schema I use in production, see how readable it makes the parameters of the API and how quick all the validation and normalization is:
https://www.pastery.net/mhwwnv/
At the end, you get an object called data, and you can do data.title, data.language, etc, and be sure that everything is as you expect.