The Django Speed Handbook: making a Django app faster

This article is very well written: it walks step by step through how to optimize a Django app's performance, and there are plenty of points worth optimizing.

February 25, 2020

Over the course of developing several Django apps, I’ve learned quite a bit about speed optimizations. Some parts of this process, whether on the backend or frontend, are not well-documented. I’ve decided to collect most of what I know in this article.

If you haven’t taken a close look at the performance of your web-app yet, you’re bound to find something good here.


Why speed is important

On the web, 100 milliseconds can make a significant difference and 1 second is a lifetime. Countless studies indicate that faster loading times are associated with better conversion-rates, user-retention, and organic traffic from search engines. Most importantly, they provide a better user experience.

Different apps, different bottlenecks

There are many techniques and practices to optimize your web-app’s performance. It’s easy to get carried away. Look for the highest return-to-effort ratio. Different web-apps have different bottlenecks and therefore will gain the most when those bottlenecks are taken care of. Depending on your app, some tips will be more useful than others.

While this article is catered to Django developers, the speed optimization tips here can be adjusted to pretty much any stack. On the frontend side, it’s especially useful for people hosting with Heroku and who do not have access to a CDN service.

Analyzing and debugging performance issues

On the backend, I recommend the tried-and-true django-debug-toolbar. It will help you analyze your request/response cycles and see where most of the time is spent. It's especially useful because it shows database query execution times and a nice SQL EXPLAIN in a separate pane that appears in the browser.

Google PageSpeed will display mainly frontend related advice, but some can apply to the backend as well (like server response times). PageSpeed scores do not directly correlate with loading times but should give you a good picture of where the low-hanging fruits for your app are. In development environments, you can use Google Chrome’s Lighthouse which provides the same metrics but can work with local network URIs. GTmetrix is another detail-rich analysis tool.

Disclaimer

Some people will tell you that some of the advice here is wrong or lacking. That’s okay; this is not meant to be a bible or the ultimate go-to-guide. Treat these techniques and tips as ones you may use, not should or must use. Different needs call for different setups.

Backend: the database layer

Starting with the backend is a good idea since it’s usually the layer that’s supposed to do most of the heavy lifting behind the scenes.

There’s little doubt in my mind which two ORM functionalities I want to mention first: these are select_related and prefetch_related. They both deal specifically with retrieving related objects and will usually improve speed by minimizing the number of database queries.

select_related

Let’s take a music web-app for example, which might have these models:

# music/models.py, some fields & code omitted for brevity
class RecordLabel(models.Model):
    name = models.CharField(max_length=560)

class MusicRelease(models.Model):
    title = models.CharField(max_length=560)
    release_date = models.DateField()

class Artist(models.Model):
    name = models.CharField(max_length=560)
    label = models.ForeignKey(
        RecordLabel,
        related_name="artists",
        null=True,  # required when using on_delete=models.SET_NULL
        on_delete=models.SET_NULL
    )
    music_releases = models.ManyToManyField(
        MusicRelease, 
        related_name="artists"
    )

So each artist is related to one and only one record company and each record company can sign multiple artists: a classic one-to-many relationship. Artists have many music-releases, and each release can belong to one artist or more.

I’ve created some dummy data:

  • 20 record labels
  • each record label has 25 artists
  • each artist has 100 music releases

Overall, we have ~50,500 of these objects in our tiny database.

Now let’s wire-up a fairly standard function that pulls our artists and their label. django_query_analyze is a decorator I wrote to count the number of database queries and time to run the function. Its implementation can be found in the appendix.

# music/selectors.py
@django_query_analyze
def get_artists_and_labels():
    result = []
    artists = Artist.objects.all()
    for artist in artists:
        result.append({"name": artist.name, "label": artist.label.name})
    return result

get_artists_and_labels is a regular function which you may use in a Django view. It returns a list of dictionaries, each contains the artist’s name and their label. I’m accessing artist.label.name to force-evaluate the Django QuerySet; you can equate this to trying to access these objects in a Jinja template:

{% for artist in artists_and_labels %}
    <p>Name: {{ artist.name }}, Label: {{ artist.label.name }}</p>
{% endfor %}

Now let’s run this function:

ran function get_artists_and_labels
--------------------
number of queries: 501
Time of execution: 0.3585s

So we’ve pulled 500 artists and their labels in 0.36 seconds, but more interestingly — we’ve hit the database 501 times. Once for all the artists, and 500 more times: once for each of the artists’ labels. This is called “The N+1 problem”. Let’s tell Django to retrieve each artist’s label in the same query with select_related:

@django_query_analyze
def get_artists_and_labels_select_related():
    result = []
    artists = Artist.objects.select_related("label") # select_related
    for artist in artists:
        result.append(
            {"name": artist.name, "label": artist.label.name if artist.label else "N/A"}
        )
    return result

Now let’s run this:

ran function get_artists_and_labels_select_related
--------------------
number of queries: 1
Time of execution: 0.01481s

500 fewer queries and a 96% speed improvement.

prefetch_related

Let’s look at another function, this one for getting the first 100 artists along with their music releases:

@django_query_analyze
def get_artists_and_releases():
    result = []
    artists = Artist.objects.all()[:100]
    for artist in artists:
        result.append(
            {
                "name": artist.name,
                "releases": [release.title for release in artist.music_releases.all()],
            }
        )
    return result

How long does it take to fetch 100 artists and 100 releases for each one of them?

ran function get_artists_and_releases
--------------------
number of queries: 101
Time of execution: 0.18245s

Let’s change the artists variable in this function and add select_related so we can bring the number of queries down and hopefully get a speed boost:

artists = Artist.objects.select_related("music_releases")

If you actually do that, you’ll get an error:

django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'music_releases'. Choices are: label

That’s because select_related can only be used to cache ForeignKey or OneToOneField attributes. The relationship between Artist and MusicRelease is many-to-many though, and that’s where prefetch_related comes in:

@django_query_analyze
def get_artists_and_releases_prefetch_related():
    result = []
    artists = Artist.objects.all()[:100].prefetch_related("music_releases") # prefetch_related
    for artist in artists:
        result.append(
            {
                "name": artist.name,
                "releases": [rel.title for rel in artist.music_releases.all()],
            }
        )
    return result

select_related can only cache the “one” side of the “one-to-many” relationship, or either side of a “one-to-one” relationship. You can use prefetch_related for all other caching, including the many side in one-to-many relationships, and many-to-many relationships. Here’s the improvement in our example:

ran function get_artists_and_releases_prefetch_related
--------------------
number of queries: 2
Time of execution: 0.13239s

Nice.

Things to keep in mind about select_related and prefetch_related:

  • If you aren’t pooling your database connections, the gains will be even bigger because of fewer roundtrips to the database.
  • For very large result-sets, running prefetch_related can actually make things slower.
  • One database query isn’t necessarily faster than two or more.

Indexing

Indexing your database columns can have a big impact on query performance. Why then, is it not the first clause of this section? Because indexing is more complicated than simply scattering db_index=True on your model fields.

Creating an index on frequently accessed columns can improve the speed of look-ups pertaining to them. Indexing comes at the cost of additional writes and storage space though, so you should always measure your benefit:cost ratio. In general, creating indices on a table will slow down inserts/updates.
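
As an illustration (the choice of fields to index is hypothetical, not a recommendation from the article), an index can be declared either per field or in the model's Meta:

# music/models.py -- a sketch of the two ways to declare an index
from django.db import models

class MusicRelease(models.Model):
    title = models.CharField(max_length=560)
    release_date = models.DateField(db_index=True)  # single-column index

    class Meta:
        indexes = [
            # composite index for queries that filter on both columns
            models.Index(fields=["release_date", "title"]),
        ]

Remember that every index here is one more structure the database must maintain on each insert or update, which is exactly the write cost mentioned above.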

Take only what you need

When possible, use values() and especially values_list() to only pull the needed properties of your database objects. Continuing our example, if we only want to display a list of artist names and don’t need the full ORM objects, it’s usually better to write the query like so:

artist_names = Artist.objects.values('name') 
# <QuerySet [{'name': 'Chet Faker'}, {'name': 'Billie Eilish'}]>

artist_names = Artist.objects.values_list('name') 
# <QuerySet [('Chet Faker',), ('Billie Eilish',)]>

artist_names = Artist.objects.values_list('name', flat=True) 
# <QuerySet ['Chet Faker', 'Billie Eilish']>

Haki Benita, a true database expert (unlike me), reviewed some parts of this section. You should read Haki’s blog.

Backend: the request layer

The next layer we’re going to look at is the request layer. These are your Django views, context processors, and middleware. Good decisions here will also lead to better performance.

Pagination

In the section about select_related we were using the function to return 500 artists and their labels. In many situations returning this many objects is either unrealistic or undesirable. The section about pagination in the Django docs is crystal clear on how to work with the Paginator object. Use it when you don’t want to return more than N objects to the user, or when doing so makes your web-app too slow.
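
As a minimal sketch of what that could look like for our artists listing (the view and template names are assumptions, not from the article):

# music/views.py -- paginating the artists queryset with Django's Paginator
from django.core.paginator import Paginator
from django.shortcuts import render

from .models import Artist

def artist_list(request):
    paginator = Paginator(Artist.objects.select_related("label"), 25)  # 25 artists per page
    page_number = request.GET.get("page")
    page_obj = paginator.get_page(page_number)  # clamps missing or invalid page numbers
    return render(request, "music/artist_list.html", {"page_obj": page_obj})

The template then iterates over page_obj and uses page_obj.has_next / page_obj.has_previous to render the pager links.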

Asynchronous execution/background tasks

There are times when a certain action inevitably takes a lot of time. For example, a user requests to export a big number of objects from the database to an XML file. If we’re doing everything in the same process, the flow looks like this:

web: user requests file -> process file -> return response

Say it takes 45 seconds to process this file. You’re not really going to let the user wait all this time for a response. First, because it’s a horrible experience from a UX standpoint, and second, because some hosts will actually cut the process short if your app doesn’t respond with a proper HTTP response after N seconds.

In most cases, the sensible thing to do here is to remove this functionality from the request-response loop and relay it to a different process:

web: user requests file -> delegate to another process -> return response 
                           |
                           v
background process:        receive job -> process file -> notify user

Background tasks are beyond the scope of this article but if you’ve ever needed to do something like the above I’m sure you’ve heard of libraries like Celery.
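
As a rough sketch of that delegation with Celery (the task body and the output path are made up for illustration; the article only names the library):

# music/tasks.py -- a minimal Celery task that moves the slow export off the request cycle
from celery import shared_task
from django.core import serializers

@shared_task
def export_artists_to_xml(path="/tmp/artists_export.xml"):
    from music.models import Artist  # imported inside the task so the module loads cleanly
    data = serializers.serialize("xml", Artist.objects.all())
    with open(path, "w") as f:
        f.write(data)
    return path

The view then only calls export_artists_to_xml.delay() and returns a response immediately, while a Celery worker picks up the job in the background.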

Compressing Django’s HTTP responses

This is not to be confused with static-file compression, which is mentioned later in the article.

Compressing Django’s HTTP/JSON responses also stands to save your users some latency. How much exactly? Let’s check the number of bytes in our response’s body without any compression:

Content-Length: 66980
Content-Type: text/html; charset=utf-8

So our HTTP response is around 67KB. Can we do better? Many use Django’s built-in GZipMiddleware for gzip compression, but today the newer and more effective brotli enjoys the same support across browsers (except IE11, of course).

Important: Compression can potentially open your website to security breaches, as mentioned in the GZipMiddleware section of the Django docs.

Let’s install the excellent django-compression-middleware library. It will choose the fastest compression mechanism supported by the browser by checking the request’s Accept-Encoding headers:

pip install django-compression-middleware

Include it in our Django app’s middleware:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "compression_middleware.middleware.CompressionMiddleware",
    # ...
]

And inspect the body’s Content-Length again:

Content-Encoding: br
Content-Length: 7239
Content-Type: text/html; charset=utf-8

The body size is now 7.24KB, about 89% smaller. You can certainly argue this kind of operation should be delegated to a dedicated server like Nginx or Apache. I’d argue that everything is a balance between simplicity and resources.

Caching

Caching is the process of storing the result of a certain calculation for faster future retrieval. Django has an excellent caching framework that lets you do this on a variety of levels and using different storage backends.

Caching can be tricky in data-driven apps: you’d never want to cache a page that’s supposed to display up-to-date, realtime information at all times. So, the big challenge isn’t so much setting up caching as it is figuring out what should be cached, for how long, and understanding when or how the cache is invalidated.

Before resorting to caching, make sure you’ve made proper optimizations at the database-level and/or on the frontend. If designed and queried properly, databases are ridiculously fast at pulling out relevant information at scale.
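
As an illustration of the two most common levels (the timeout, cache key, and function names are arbitrary examples, not from the article):

# views.py -- a sketch of Django's per-view and low-level cache APIs
from django.core.cache import cache
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # cache the whole rendered response for 15 minutes
def label_list(request):
    ...

def get_expensive_stats():
    stats = cache.get("expensive_stats")
    if stats is None:
        stats = compute_stats()  # hypothetical slow calculation
        cache.set("expensive_stats", stats, timeout=60 * 5)
    return stats

Everything in this section still applies: the hard part is deciding whether a 15-minute-old result is acceptable to show.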

Frontend: where it gets hairier

Reducing static files/assets sizes can significantly speed up your web application. Even if you’ve done everything right on the backend, serving your images, CSS, and JavaScript files inefficiently will degrade your application’s speed.

Between compiling, minifying, compressing, and purging, it’s easy to get lost. Let’s try not to.

Serving static-files

You have several options on where and how to serve static files. Django’s docs mention a dedicated server running Nginx or Apache, a cloud service/CDN, or serving from the same server.

I’ve gone with a bit of a hybrid attitude: images are served from a CDN, large file-uploads go to S3, but all serving and handling of other static assets (CSS, JavaScript, etc…) is done using WhiteNoise (covered in-detail later).

Vocabulary

Just to make sure we’re on the same page, here’s what I mean when I say:

  • Compiling: If you’re using SCSS for your stylesheets, you’ll first have to compile those to CSS because browsers don’t understand SCSS.
  • Minifying: reducing whitespace and removing comments from CSS and JS files can have a significant impact on their size. Sometimes this process involves uglifying: the renaming of long variable names to shorter ones, etc…
  • Compressing/Combining: for CSS and JS, combining multiple files to one. For images, usually means removing some data from images to make their files size smaller.
  • Purging: remove unneeded/unused code. In CSS for example: removing selectors that aren’t used.

Serving static files from Django with WhiteNoise

WhiteNoise allows your Python web-application to serve static assets on its own. As its author states, it comes in when other options like Nginx/Apache are unavailable or undesired.

Let’s install it:

pip install whitenoise[brotli]

Before enabling WhiteNoise, make sure your STATIC_ROOT is defined in settings.py:

STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")

To enable WhiteNoise, add its WhiteNoise middleware right below SecurityMiddleware in settings.py:

MIDDLEWARE = [
  'django.middleware.security.SecurityMiddleware',
  'whitenoise.middleware.WhiteNoiseMiddleware',
  # ...
]

In production, you’ll have to run manage.py collectstatic for WhiteNoise to work.

While this step is not mandatory, it’s strongly advised to add caching and compression:

STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'

Now whenever it encounters a {% static %} tag in templates, WhiteNoise will take care of compressing and caching the file for you. It also takes care of cache-invalidation.

One more important step: To ensure that we get a consistent experience between development and production environments, we add runserver_nostatic:

INSTALLED_APPS = [
    'whitenoise.runserver_nostatic',
    'django.contrib.staticfiles',
    # ...
]

This can be added regardless of whether DEBUG is True or not, because you don’t usually run Django via runserver in production.

I found it useful to also increase the caching time:

# Whitenoise cache policy
WHITENOISE_MAX_AGE = 31536000 if not DEBUG else 0 # 1 year

Wouldn’t this cause problems with cache-invalidation? No, because WhiteNoise creates versioned files when you run collectstatic:

<link rel="stylesheet" href="/static/CACHE/css/4abd0e4b71df.css" type="text/css" media="all">

So when you deploy your application again, your static files are overwritten and will have a different name, thus the previous cache becomes irrelevant.

Compressing and combining with django-compressor

WhiteNoise already compresses static files, so django-compressor is optional. But the latter offers an additional enhancement: combining the files. To use compressor with WhiteNoise we have to take a few extra steps.

Let’s say the user loads an HTML document that links three .css files:

<head>
  <link rel="stylesheet" href="base.css" type="text/css" media="all">
  <link rel="stylesheet" href="additions.css" type="text/css" media="all">
  <link rel="stylesheet" href="new_components.css" type="text/css" media="all">
</head>

Your browser will make three different requests to these locations. In many scenarios it’s more effective to combine these different files when deploying, and django-compressor does that with its {% compress css %} template tag:

This:

{% load compress %}
<head>
  {% compress css %}
    <link rel="stylesheet" href="base.css" type="text/css" media="all">
    <link rel="stylesheet" href="additions.css" type="text/css" media="all">
    <link rel="stylesheet" href="new_components.css" type="text/css" media="all">
  {% endcompress %}
</head>

Becomes:

<head>
  <link rel="stylesheet" href="combined.css" type="text/css" media="all">
</head>

Let’s go over the steps to make django-compressor and WhiteNoise play well. Install:

pip install django_compressor

Tell compressor where to look for static files:

COMPRESS_STORAGE = "compressor.storage.GzipCompressorFileStorage"
COMPRESS_ROOT = os.path.abspath(STATIC_ROOT)

Because of the way these two libraries intercept the request-response cycle, they’re incompatible with their default configurations. We can overcome this by modifying some settings.

I prefer to use environment variables in .env files and have one Django settings.py, but if you have settings/dev.py and settings/prod.py, you’ll know how to convert these values:

main_project/settings.py:

from decouple import config
#...

COMPRESS_ENABLED =  config("COMPRESS_ENABLED", cast=bool)
COMPRESS_OFFLINE = config("COMPRESS_OFFLINE", cast=bool)

COMPRESS_OFFLINE is True in production and False in development. COMPRESS_ENABLED is True in both.

With offline compression, one must run manage.py compress on every deployment. On Heroku, you’ll want to disable the platform from automatically running collectstatic for you (on by default) and instead opt to do that in the post_compile hook, which Heroku will run when you deploy. If you don’t already have one, create a folder called bin at the root of your project and inside of it a file called post_compile with the following:

python manage.py collectstatic --noinput
python manage.py compress --force
python manage.py collectstatic --noinput

Another nice thing about compressor is that it can compress SCSS/SASS files:

COMPRESS_PRECOMPILERS = (
    ("text/x-sass", "django_libsass.SassCompiler"),
    ("text/x-scss", "django_libsass.SassCompiler"),
)

Minifying CSS & JS

Another important thing to apply when talking about load-times and bandwidth usage is minifying: the process of (automatically) decreasing your code’s file-size by eliminating whitespace and removing comments.

There are several approaches to take here, but if you’re using django-compressor specifically, you get that for free as well. You just need to add the following (or any other filters compressor supports) to your settings.py file:

COMPRESS_FILTERS = {
    "css": [
        "compressor.filters.css_default.CssAbsoluteFilter",
        "compressor.filters.cssmin.rCSSMinFilter",
    ],
    "js": ["compressor.filters.jsmin.JSMinFilter"],
}

Defer-loading JavaScript

Another thing that contributes to slower performance is loading external scripts. The gist of it is that browsers will try to fetch and execute JavaScript files in the <head> tag as they are encountered and before parsing the rest of the page:

<html>
  <head>
    <script src="https://will-block.js"></script>
    <script src="https://will-also-block.js"></script>
  </head>
</html>

We can use the async and defer keywords to mitigate this:

<html>
  <head>
      <script async src="somelib.somecdn.js"></script>
  </head>
</html>

async and defer both allow the script to be fetched asynchronously without blocking. One of the key differences between them is when the script is allowed to execute: With async, once the script has been downloaded, all parsing is paused until the script has finished executing, while with defer the script is executed only after all HTML has been parsed.

I suggest referring to Flavio Copes’ article on the defer and async keywords. Its general conclusion is:

The best thing to do to speed up your page loading when using scripts is to put them in the head, and add a defer attribute to your script tag.

Lazy-loading images

Lazily loading images means that we only request them when, or a little before, they enter the client’s (user’s) viewport. It saves time and bandwidth ($ on cellular networks) for your users. With excellent, dependency-free JavaScript libraries like LazyLoad, there really isn’t an excuse not to lazy-load images. Moreover, Google Chrome natively supports the loading="lazy" attribute since version 76.

Using the aforementioned LazyLoad is fairly simple and the library is very customizable. In my own app, I want it to apply on images only if they have a lazy class, and start loading an image 300 pixels before it enters the viewport:

$(document).ready(function (e) {
  new LazyLoad({
    elements_selector: ".lazy", // classes to apply to
    threshold: 300 // pixel threshold
  })
})

Now let’s try it with an existing image:

<img class="album-artwork" alt="{{ album.title }}"  src="{{ album.image_url }}">

We replace the src attribute with data-src and add lazy to the class attribute:

<img class="album-artwork lazy" alt="{{ album.title }}"  data-src="{{ album.image_url }}">

Now the client will request this image only once it comes within 300 pixels of the viewport.

If you have many images on certain pages, using lazy-loading will dramatically improve your load times.

Optimize & dynamically scale images

Another thing to consider is image-optimization. Beyond compression, there are two more techniques to consider here.

First, file-format optimization. There are newer formats like WebP that are presumably 25-30% smaller than your average JPEG image at the same quality. As of 02/2020 WebP has decent but incomplete browser support, so you’ll have to provide a standard format fallback if you want to use it.

Second, serving different image sizes to different screen sizes: if some mobile device has a maximum viewport width of 650px, why serve it the same 1050px image you’re displaying on a 13″ 2560px retina display?

Here, too, you can choose the level of granularity and customization that suits your app. For simpler cases, you can use the srcset attribute to control sizing and be done with it, but if, for example, you’re also serving WebP with JPEG fallbacks for the same image, you may use the <picture> element with multiple sources and source-sets.

If the above sounds as complicated to you as it does to me, this guide should help explain the terminology and use-cases.

Unused CSS: Removing imports

If you’re using a CSS framework like Bootstrap, don’t just include all of its components blindly. In fact, I would start with commenting out all of the non-essential components and only add those gradually as the need arises. Here’s a snippet of my bootstrap.scss, where all of its different parts are imported:

// ...

// Components
// ...
@import "bootstrap/dropdowns";
@import "bootstrap/button-groups";
@import "bootstrap/input-groups";
@import "bootstrap/navbar";
// @import "bootstrap/breadcrumbs";
// @import "bootstrap/badges";
// @import "bootstrap/jumbotron";

// Components w/ JavaScript
@import "bootstrap/modals";
@import "bootstrap/tooltip";
@import "bootstrap/popovers";
// @import "bootstrap/carousel";

I don’t use things like badges or jumbotron so I can safely comment those out.

Unused CSS: Purging CSS with PurgeCSS

A more aggressive and more complicated approach is using a library like PurgeCSS, which analyzes your files, detects CSS content that’s not in use, and removes it. PurgeCSS is an NPM package, so if you’re hosting Django on Heroku, you’ll need to install the Node.js buildpack side-by-side with your Python one.

Conclusion

I hope you’ve found at least one area where you can make your Django app faster. If you have any questions, suggestions, or feedback don’t hesitate to drop me a line on Twitter.

Appendices

Decorator used for QuerySet performance analysis

Below is the code for the django_query_analyze decorator:

from timeit import default_timer as timer
from django.db import connection, reset_queries

def django_query_analyze(func):
    """decorator to perform analysis on Django queries"""

    def wrapper(*args, **kwargs):

        avs = []
        query_counts = []
        for _ in range(20):
            reset_queries()
            start = timer()
            func(*args, **kwargs)
            end = timer()
            avs.append(end - start)
            query_counts.append(len(connection.queries))
            reset_queries()

        print()
        print(f"ran function {func.__name__}")
        print("-" * 20)
        print(f"number of queries: {int(sum(query_counts) / len(query_counts))}")
        print(f"Time of execution: {float(format(min(avs), '.5f'))}s")
        print()
        return func(*args, **kwargs)

    return wrapper
2020/3/29 posted in  python

Integrating gulp with Django

Integrate a gulp task runner with Django to make publishing static files more convenient.

Basic gulp usage, based on the Adminto theme

First, install gulp:

$ npm install gulp -g

Then go into the theme's template directory and run:

$ npm install
  • gulp - recompiles and minifies theme assets into dist directory and starts local server serving the theme.
  • gulp build - recompiles and minifies theme assets into dist folder.

Migration steps

  • Copy src, package.json, and gulpfile.js to the project root
  • Reference the theme files in the templates
  • Modify gulpfile.js

Change the output path to live under static, keeping it separated in its own dist directory:

var folder = {
    src: "src/", // source files
    dist: "dist/", // build files
    dist_assets: "collect_web/static/dist/assets/" //build assets files 
};

Delete the HTML-related parts; Django manages the HTML templates, so gulp no longer needs to handle them:

// copy html files from src folder to dist folder, also copy favicons
function html() {
    var out = folder.dist;

    return gulp
        .src([
            folder.src + "html/*.html",
            folder.src + "html/*.ico", // favicons
            folder.src + "html/*.png"
        ])
        .pipe(fileinclude({
            prefix: '@@',
            basepath: '@file',
            indent: true
        }))
        .pipe(gulp.dest(out));
}

Also delete the HTML-related calls below:


function watchFiles() {
    gulp.watch(folder.src + "html/**", gulp.series(html, reloadBrowserSync));
    gulp.watch(folder.src + "assets/images/**/*", gulp.series(imageMin, reloadBrowserSync));
    gulp.watch(folder.src + "assets/fonts/**/*", gulp.series(fonts, reloadBrowserSync));
    gulp.watch(folder.src + "scss/**/*", gulp.series(css, reloadBrowserSync));
    gulp.watch(folder.src + "js/**/*", gulp.series(javascript, reloadBrowserSync));
}


// default task
gulp.task(
    "default",
    gulp.series(
        copyAssets,
        html,
        imageMin,
        fonts,
        css,
        javascript,
        'watch'
    ),
    function(done) {done();}
);
  • Install the dependencies with npm install
  • During development just start gulp, so changes are automatically rebuilt into dist
  • Update .gitignore so the node_modules directory is not committed to Git
  • Update the static references in the template files

2019/8/29 posted in  python

Online Python editors

2019/8/26 posted in  python

Computer Science course research

2019/8/25 posted in  python

Accessing many-to-many relations in Django templates

{% for place in places %}
    Name: {{ place.name }}<br/>
    Area: <br/>{% for area in place.area.all %}{{ area }}<br/>{% endfor %}
{% endfor %}

In the template you can reach the many-to-many relation directly; call the related manager without parentheses and you get the objects.

You can also use:

{% for place in places %}
    Name: {{ place.name }}, Area: {{ place.area.all|join:", " }}
{% endfor %}

Query pattern 2

Put the queried objects into the context in the view:

from myapp.models import Area, Place

def detail(request, place_id):
    place = Place.objects.get(pk=place_id)
    areas = place.area.all()

    return render_to_response('detail.html', {
        "place": place,
        "areas": areas,
    })

detail.html:

<h3>{{ place }}</h3>

{% if areas %}
  <ul>
  {% for area in areas %}
    <li>{{ area.name }}</li>
  {% endfor %}
  </ul>
{% endif %}
2019/8/22 posted in  python

django-taggit

Install django-taggit

pip install django-taggit -i https://mirrors.aliyun.com/pypi/simple

Add the model

Add the taggit app to INSTALLED_APPS in settings.py:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'blog',
    'taggit',
]

Add a tags field in models.py:

from taggit.managers import TaggableManager


class Article(BaseSchema):
    title = models.CharField(max_length=1023, null=True, blank=True, default='')
    description = models.CharField(max_length=2047, null=True, blank=True, default='')
    content = models.TextField(null=True, blank=True, default='')
    slug = models.SlugField("slug", max_length=255,
                            help_text="Used to build the article's URL.")
    banner = models.URLField(null=True, blank=True)
    header_image_height = models.PositiveIntegerField(default=75)
    header_image_width = models.PositiveIntegerField(default=75)
    header_image = models.ImageField(upload_to='photos/%Y/%m', height_field='header_image_height',
                                     width_field='header_image_width',
                                     null=True, blank=True)
    categories = models.ManyToManyField(
        Category, blank=True, related_name="entries", verbose_name="categories")
    tags = TaggableManager()

    def __unicode__(self):
        return self.title

    def __str__(self):
        return self.title

After adding the field, regenerate and apply the migrations:

python manage.py makemigrations
python manage.py migrate

Register the model in the admin:

class ArticleAdmin(admin.ModelAdmin):
    # fields = ["title", "description", "banner","content","header_image","categories"]
    fieldsets = [
        ('basic content', {"fields": ["title", "slug", "description", "content", "tags"]}),
        ('图片', {"fields": ["banner", "header_image"]}),
        ('category', {"fields": ["categories"]})
    ]
    list_display = ["title", "description", "updatedAt"]
    list_filter = ['updatedAt']
    search_fields = ['title', 'description']
    filter_horizontal = ('categories',)

    def get_queryset(self, request):
        qs = super(ArticleAdmin, self).get_queryset(request)
        qs = qs.filter(deletedAt=None)
        return qs

Displaying tags on the front end
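
The original post stops here; as a minimal sketch of getting the tags to the front end (the view and template names are assumptions), the TaggableManager behaves like a normal related manager:

# views.py -- hypothetical detail view passing an article's tags to the template
from django.shortcuts import render

from .models import Article

def article_detail(request, article_id):
    article = Article.objects.get(pk=article_id)
    return render(request, "article_detail.html", {
        "article": article,
        "tags": article.tags.all(),  # taggit's TaggableManager supports .all()
    })

In the template the tags can then be looped over like any other queryset.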

2019/8/21 posted in  python

cookiecutter-django deployment guide

cookiecutter gives projects a fairly clear structure, but all sorts of problems show up at deployment time. This post collects a gunicorn-based deployment guide for projects built with cookiecutter; other deployment setups will be written up later.

Installation and configuration guide

1. Installing and configuring the virtual environment

First create the virtualenv; that part isn't covered in detail here. Once it's configured, enter the virtual environment and install the dependencies:

(venv)....$ pip install -r requirements/production.txt

The generated requirements still need the packages we added ourselves, so edit requirements/production.txt first and add your own dependencies.

After that, cookiecutter still needs some configuration of our own. It's a good idea to keep it inside the venv, so here we add a new configuration file to the virtual environment.

2. Setting the environment variables

# vi ../collectvenv/bin/postactivate

export WEB_CONCURRENCY=4
export DJANGO_SETTINGS_MODULE='config.settings.production'
export DATABASE_URL='postgres://<user>:<password>@<ip>:<port>/<databasename>'
export DJANGO_SECRET_KEY='<secret key>'
export DJANGO_ALLOWED_HOSTS='<site domain>'
export REDIS_URL='redis://127.0.0.1:6379/<redis>'
export DJANGO_ADMIN_URL='admin/'

After adding these, don't forget to run

(venv)... $ source ../collectvenv/bin/postactivate

so that the configuration takes effect in the current environment.

3. Adding the gunicorn configuration file

The configuration file can live in the collect_web project directory:

/opt/collect/collect_web# vi gunicorn.conf.py

Add the relevant settings:

import logging
import logging.handlers
from logging.handlers import WatchedFileHandler
import os
import multiprocessing
bind = "0.0.0.0:9091"   # IP and port to bind to
backlog = 512                # listen queue size, 64-2048
#chdir = '/home/test/server/bin'  # working directory gunicorn should switch to
worker_class = 'sync' # worker mode: gevent is also possible; sync is the default
workers = 4 # multiprocessing.cpu_count()    # number of worker processes
threads = 16 #multiprocessing.cpu_count()*4  # threads per worker process
loglevel = 'info' # level of the error log; the access log level cannot be configured here
access_log_format = '%(t)s %(p)s %(h)s "%(r)s" %(s)s %(L)s %(b)s %(f)s" "%(a)s"'
# accesslog = "/home/log/gunicorn_access.log"      # access log file
#errorlog = "/home/log/gunicorn_error.log"         # error log file
accesslog = "-"  # access log file; "-" means stdout
errorlog = "-"   # error log file; "-" means stdout
proc_name = 'collect_web'   # process name

4. Test run

Once configured, run a test:

/opt/collect/collect_web# gunicorn -c gunicorn.conf.py config.wsgi

Running gunicorn directly requires knowing the project layout fairly well. cookiecutter puts wsgi.py inside config, so we have to point gunicorn at config.wsgi to find the WSGI module, and then move on to the next step.

5. Adjusting the settings file

After starting you'll find that some environment variables still can't be found; in practice that's because we aren't using those features. The important step here is to edit the relevant parts of production.py under config and comment out everything you don't need.

6. Deploying the project

If the database is new, remember to migrate. On a fresh environment remember to run collectstatic so all static files are published into one place.

cd project-directory
git pull
python manage.py migrate
python manage.py collectstatic

7. Configuring nginx

server {
    server_name ***;
    listen 80;
    location /{
        proxy_pass http://127.0.0.1:9091/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass_request_body on;
        client_max_body_size 100m;
    }

    location /static/ {
        alias /opt/collect/collect_web/staticfiles/;
    }
}

systemd configuration

To do:

  • Configure auto-start and process monitoring
  • Configure a GitHub webhook for one-click sync and deploy

Useful references

The official gunicorn deployment documentation

The official cookiecutter-django documentation

2019/8/13 posted in  python

numpy and pandas

Installation

Run these in the terminal; make sure the virtual environment from before is active:

# install numpy
(venv)$ pip install numpy -i https://mirrors.aliyun.com/pypi/simple
# install pandas
(venv)$ pip install pandas -i https://mirrors.aliyun.com/pypi/simple

(venv)$ pip install matplotlib -i https://mirrors.aliyun.com/pypi/simple

Importing data

import pandas as pd

df = pd.read_excel('../uva_course_spider/session_all.excel')
# specify which sheet to import
df = pd.read_excel('../uva_course_spider/session_all.excel', sheet_name='Sheet1')
# specify the sheet by position and the header row (use header=None if there is no header)
df = pd.read_excel('../uva_course_spider/session_all.excel', sheet_name=0, header=0)
# import only certain columns
df = pd.read_excel('../uva_course_spider/session_all.excel', usecols=[0,2])

# import a csv file
df = pd.read_csv('../uva_course_spider/session_all.csv', header=None, sep=',')
# limit the number of rows imported
df = pd.read_csv('../uva_course_spider/session_all.csv', header=None, sep=',', nrows=2,encoding='utf-8')


Viewing data

# show the first few rows
df.head()
# show the last rows
df.tail(3)
# show the row index
df.index
# show the column index (column names)
df.columns
# convert to a NumPy array
df.to_numpy()


DataFrame

# select specific columns
>>> df.iloc[:,[5,8]]
# select a contiguous range of columns
>>> df.iloc[:,5:8]

# select a row
>>> df.iloc[0]
# select specific rows
>>> df.iloc[[1,4]]
# select a contiguous range of rows
>>> df.iloc[1:4]

# filter rows
>>> df[df[5] != 'Staff']
# filter rows, then select a column
>>> df[df[5] != 'Staff'][5]
# equivalent to
>>> df[df[5] != 'Staff'].iloc[:,5]

Missing Data

# drop rows containing NA
df1.dropna(how='any')
# fill NA values
df1.fillna(value=5)
# test which values are NA
pd.isna(df1)

Data visualization

Install matplotlib:

$ pip install matplotlib -i https://mirrors.aliyun.com/pypi/simple
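
The post stops at installation; as a minimal sketch (the column name is hypothetical), pandas wraps matplotlib so a column can be charted directly:

import matplotlib.pyplot as plt
import pandas as pd

# stand-in data frame; in practice use the df loaded above
df = pd.DataFrame({"score": [80, 85, 90, 95]})

df["score"].plot(kind="bar")  # quick bar chart via pandas' plotting wrapper
plt.xlabel("row")
plt.ylabel("score")
plt.show()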

Exporting data
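
This section was left empty; as a minimal sketch (the output file names are arbitrary), the DataFrame loaded above can be written back out with to_csv or to_excel:

# assuming df is the DataFrame imported earlier
df.to_csv('../uva_course_spider/session_all_out.csv', index=False)
# writing .xlsx needs an Excel engine such as openpyxl installed
df.to_excel('../uva_course_spider/session_all_out.xlsx', index=False)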

References

The official numpy documentation

The official pandas site

The official pandas tutorial

pandas cheat sheet

Matplotlib documentation (Chinese translation)

2019/7/31 posted in  python

Building a catalog tree

A tree-structured catalog can be built in Django with a third-party package; we can use the django-mptt library to handle it.

GitHub repository

First add the corresponding fields to the model:

from django.db import models
from mptt.models import MPTTModel, TreeForeignKey

class TagCatalog(MPTTModel):
    tag_catalog_name = models.CharField('tag_catalog_name', null=True, blank=True, max_length=255)
    tag_catalog_description = models.TextField('tag_catalog_description', null=True, blank=True, max_length=255)
    c_tags = models.ManyToManyField(MyCustomTag, blank=True )
    parent = TreeForeignKey('self', on_delete=models.CASCADE, null=True, blank=True, related_name='tag_catalog')

    class MPTTMeta:
        order_insertion_by = ['tag_catalog_name']

    def __str__(self):
        return self.tag_catalog_name

After migrating the model, update the admin:

from mptt.admin import MPTTModelAdmin

class TagCatalogAdmin(MPTTModelAdmin):
    mptt_level_indent = 20
    fields = ['tag_catalog_name', 'tag_catalog_description', 'parent', 'c_tags']
    list_display =('tag_catalog_name',)
    filter_horizontal = ('c_tags',)


admin.site.register(TagCatalog, TagCatalogAdmin)

Add the corresponding queryset in the view.
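
A minimal sketch of that view (the view and template names are assumptions):

# views.py -- pass the whole tree to the template; mptt keeps it in tree order
from django.shortcuts import render

from .models import TagCatalog

def catalog_view(request):
    tag_catalogs = TagCatalog.objects.all()
    return render(request, "catalog.html", {"tag_catalogs": tag_catalogs})

Note that the template below also needs {% load mptt_tags %} at the top so the {% recursetree %} tag is available.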

Finally, render it in the template:


<ul>
              {% recursetree tag_catalogs %}
                <li>
                  {% if node.level == 0  %}
                     <h4 class="widget-title">{{ node.tag_catalog_name }}</h4>
                  {% elif node.level == 1 %}
                     <h6>{{ node.tag_catalog_name }}</h6>
                  {% else %}
                     {{ node.tag_catalog_name }}
                  {% endif %}
                  {% if not node.is_leaf_node %}
                    <ul class="children">
                      {{ children }}
                    </ul>
                  {% endif %}
                </li>
              {% endrecursetree %}
</ul>
2019/7/29 posted in  python

Implementing pagination in Django

This blog post gives a very good, detailed walkthrough of customizing Django pagination.

Published 2018-01-23 | Category: Django

Pagination can be implemented with the Paginator class that Django provides, or it can be implemented by hand.

Implementing pagination by hand

Part 1: How it works

1. Initial data:
total_data: the data set, a list holding all of the records.
page_size: how many records to display per page.
num_page: how many page links to show in the pagination area.

2. Data needed for pagination
current_page: the current page number.
Obtained via request.GET:

current_page = int(request.GET.get('p'))

total_num: the total number of records

total_num = len(total_data)

max_page: the largest page number.
Calculation: divide the total number of records by the page size; if the remainder is 0 the maximum page number is the quotient, otherwise it is the quotient plus 1.

getmax_page = lambda x,y: divmod(x,y)[0] if divmod(x,y)[1] == 0 else divmod(x,y)[0]+1
max_page = getmax_page(total_num,page_size)

peer_data: the records displayed on the current page.
Calculation: when we set page_size=10, each page shows 10 records:
page 1: 0:10
page 2: 10:20
and so on...
page n: (n-1)*10 : n*10
From this we can derive the slice for the current page:

start = (current_page-1)*page_size
end = current_page*page_size
peer_data = total_data[start:end]

prev_page: the previous page number.
When the current page is less than or equal to 1, the previous page is 1:

if current_page <= 1:
    prev_page =1
else:
    prev_page = current_page - 1

next_page: the next page number.
When the current page is greater than or equal to the last page, the next page is the last page:

if current_page >= max_page:
    next_page = max_page
else:
    next_page = current_page + 1

num_page_range: the range of page numbers displayed in the pagination area.
Showing every page number here won't do; with too many pages the pagination bar becomes unusable, so we display a fixed number of page links instead.
This is where num_page (the number of page links to display), defined earlier, comes in.
With num_page=7 the bar shows:
1 2 3 4 5 6 7
with a first half and a second half around the current page.
The logic is as follows:
1. If the total number of pages max_page is smaller than the maximum number of links num_page, show 1 through max_page; fewer than num_page links are displayed.
2. If the current page current_page is less than or equal to half of num_page, show 1 through num_page; the current page is in the first half.
3. If current_page plus half of num_page is greater than max_page, show (max_page minus num_page) through max_page; the current page is in the second half.
4. Otherwise show half of num_page on each side, with the current page in the middle.

part = num_page // 2
if max_page < num_page:
    num_page_range = [i for i in range(1,max_page + 1)]
elif current_page <= part:
    num_page_range = [i for i in range(1,num_page + 1)]
elif current_page + part > max_page:
    num_page_range = [i for i in range(max_page - num_page,max_page + 1)]
else:
    num_page_range = [i for i in range(current_page-part,current_page + part + 1)]

3. Variables the view passes to the template
current_page: the current page number
peer_data: the records on the current page
prev_page: the previous page number
next_page: the next page number
max_page: the largest page number
num_page_range: the range of page numbers displayed in the pagination area

4. Template logic
The front end uses Bootstrap's pagination component:

<nav aria-label="Page navigation">
  <ul class="pagination">
    <li>
      <a href="#" aria-label="Previous">
        <span aria-hidden="true">&laquo;</span>
      </a>
    </li>
    <li><a href="#">1</a></li>
    <li><a href="#">2</a></li>
    <li><a href="#">3</a></li>
    <li><a href="#">4</a></li>
    <li><a href="#">5</a></li>
    <li>
      <a href="#" aria-label="Next">
        <span aria-hidden="true">&raquo;</span>
      </a>
    </li>
  </ul>
</nav>

The previous-page button
When on the first page, disable the previous-page button on the left:

<ul class="pagination">
{% if current_page == 1 %}
    <li class="disabled">
      <a href="#" aria-label="Previous">
        <span aria-hidden="true">&laquo;</span>
      </a>
    </li>
{% else %}
    <li>
      <a href="?p={{ prev_page }}" aria-label="Previous">
        <span aria-hidden="true">&laquo;</span>
      </a>
    </li>
{% endif %}

The next-page button
When on the last page, disable the next-page button on the right:

{% if current_page == max_page %}
    <li class="disabled">
      <a href="#" aria-label="Next">
        <span aria-hidden="true">&raquo;</span>
      </a>
    </li>
    {% else %}
    <li>
      <a href="?p={{ next_page }}" aria-label="Next">
        <span aria-hidden="true">&raquo;</span>
      </a>
    </li>
{% endif %}

Displaying the page-number range
Loop over num_page_range passed in from the view, and style the current page:

{% for p in num_page_range %}
    {% if p == current_page %}
        <li class="active"><a href="#">{{ p }}</a></li>
    {% else %}
        <li><a href="?p={{ p }}">{{ p }}</a></li>
    {% endif %}
{% endfor %}

Part 2: Implementing pagination as a reusable component

Pagination is used very widely. Having analyzed how it works above, here we implement the code as a shared component that can simply be called later on.

1. Create a .py file in the app and define the class

pager.py

#coding:utf-8

class Pagination(object):
    def __init__(self, totalCount, currentPage, perPageItemNum=10, maxPageNum=7):
        # total number of records
        self.total_count = totalCount
        # current page number
        self.current_page = currentPage
        # number of records per page
        self.per_page_item_num = perPageItemNum
        # maximum number of page links to display
        self.max_page_num = maxPageNum

    def start(self):
        return (self.current_page - 1)*self.per_page_item_num

    def end(self):
        return self.current_page*self.per_page_item_num

    @property
    def num_pages(self):
        '''
        @property turns the method call into attribute-style access.
        Returns the total number of pages.
        '''
        a,b = divmod(self.total_count, self.per_page_item_num)
        if b == 0:
            return a
        else:
            return a+1

    def pager_num_page(self):
        '''
        Range of page numbers shown in the pagination area.
        '''
        part = self.max_page_num // 2
        if self.num_pages < self.max_page_num:
            return range(1,self.num_pages+1)
        elif self.current_page <= part:
            return range(1,self.max_page_num+1)
        elif self.current_page + part > self.num_pages:
            return range(self.num_pages-self.max_page_num,self.num_pages+1)
        else:
            return range(self.current_page-part,self.current_page+part+1)

    def page_str(self):
        '''
        Builds the pagination HTML returned to the template;
        the template must include Bootstrap's CSS.
        '''
        page_list = []
        first = """
        <li><a href='?p=1'>First</a></li>
        """
        page_list.append(first)

        if self.current_page == 1:
            prev_page = """
            <li class="disabled">
                <span>
                    <span aria-hidden="true">&laquo;</span>
                </span>
            </li>
            """
        else:
            prev_page = """
            <li>
                <a href="?p=%s" aria-label="Previous">
                    <span aria-hidden="true">&laquo;</span>
                </a>
            </li>
            """ %(self.current_page - 1)
        page_list.append(prev_page)

        for i in self.pager_num_page():
            if i == self.current_page:
                temp = """
                <li class="active"><a href="#">%s</a></li>
                """ %i
            else:
                temp = """
                <li>
                    <a href="?p=%s">%s</a>
                </li>
                """ %(i,i)
            page_list.append(temp)

        if self.current_page == self.num_pages:
            next_page = """
            <li class="disabled">
                <span>
                    <span aria-hidden="true">&raquo;</span>
                </span>
            </li>
            """
        else:
            next_page = """
            <li>
                <a href="?p=%s" aria-label="Next">
                    <span aria-hidden="true">&raquo;</span>
                </a>
            </li>
            """ %(self.current_page + 1)
        page_list.append(next_page)

        last = """
        <li><a href='?p=%s'>Last</a></li>
        """ %self.num_pages
        page_list.append(last)

        return ''.join(page_list)

2. Using Pagination to paginate

View logic: views.py

#coding:utf-8
from django.shortcuts import render
from app01.pager import Pagination

# build some dummy backend data
mylist=[]
for i in range(1,1000):
    temp = {'id':i,'name':'zhang'+str(i),'age':i,'addr':'gaoxin'+str(i),'score':2*i +1}
    mylist.append(temp)


def mypager(request):
    try:
        current_page = int(request.GET.get('p'))
    except:
        current_page = 1

    page_obj = Pagination(len(mylist),current_page,15,5)
    data = mylist[page_obj.start():page_obj.end()]
    return render(request,'mypager.html',locals())

Template: mypager.html

<!DOCTYPE html>
<html lang="en">
{% load staticfiles %}
<head>
    <meta charset="UTF-8">
    <title>mypager</title>
    <link rel="stylesheet" href="{% static 'css/bootstrap.css' %}">
</head>
<body>
<div class="container">
    <div class="panel">
        <table class="table table-striped">
            <tr>
                <th>ID</th>
                <th>name</th>
                <th>age</th>
                <th>addr</th>
                <th>score</th>
            </tr>
            {% for item in data %}
                <tr>
                    <td>{{ item.id }}</td>
                    <td>{{ item.name }}</td>
                    <td>{{ item.age }}</td>
                    <td>{{ item.addr }}</td>
                    <td>{{ item.score }}</td>
                </tr>
            {% endfor %}
        </table>
    </div>
    <nav aria-label="Page navigation">
        <ul class="pagination">
            {{ page_obj.page_str | safe }}
        </ul>
    </nav>
</div>
</body>
</html>


Using Django's built-in pagination

Django provides two classes (Paginator and Page) to help you manage paginated data; they live in django/core/paginator.py and accept a list, tuple, or any other iterable.

1. API overview

The Paginator class
Represents the whole paginated collection.
count: total number of items
num_pages: total number of pages
page_range: the range of page numbers

The Page class
Represents a single page of the paginated collection.
object_list: the list of items on this page
number: the current page number of this Page object
has_next: whether there is a next page
next_page_number: the next page number
has_previous: whether there is a previous page
previous_page_number: the previous page number
has_other_pages: whether there are other pages
start_index: the index of the first item on this page
end_index: the index of the last item on this page

2. Basic usage example

from django.core.paginator import Paginator

objects = ['john','paul','george','ringo','lucy','meiry','checy','wind','flow','rain']
p = Paginator(objects, 3)  # 3 items per page; instantiate the paginator
print(p.count)       # 10  -- total number of items
print(p.num_pages)   # 4   -- number of pages
print(p.page_range)  # range(1, 5) -- iterable range of page numbers

page1 = p.page(1)          # get the Page object for page 1
print(page1.object_list)   # items on page 1: ['john', 'paul', 'george']
print(page1.number)        # page 1's number: 1

page2 = p.page(2)          # get the Page object for page 2
print(page2.object_list)   # items on page 2: ['ringo', 'lucy', 'meiry']
print(page2.number)        # page 2's number: 2

print(page1.has_previous())     # does page 1 have a previous page? False
print(page1.has_other_pages())  # does page 1 have other pages? True

print(page2.has_previous())         # does page 2 have a previous page? True
print(page2.has_next())             # does page 2 have a next page? True
print(page2.next_page_number())     # the page number after page 2: 3
print(page2.previous_page_number()) # the page number before page 2: 1
print(page2.start_index())          # index of page 2's first item: 4
print(page2.end_index())            # index of page 2's last item: 6

3. Using the built-in pagination in a view

View logic: views.py

#coding:utf-8
from django.shortcuts import render
from django.core.paginator import Paginator,PageNotAnInteger,EmptyPage
# Create your views here.
mylist=[]
for i in range(1,1000):
    temp = {'id':i,'name':'zhang'+str(i),'age':i,'addr':'gaoxin'+str(i),'score':2*i +1}
    mylist.append(temp)

def myin(request):
    try:
        current_page = int(request.GET.get('p'))
    except:
        current_page = 1

    # instantiate the paginator, 10 records per page
    paginator = Paginator(mylist,10)
    try:
        # get the Page object for the current page
        posts = paginator.page(current_page)
    # if current_page is not an integer, show the first page
    except PageNotAnInteger:
        posts = paginator.page(1)
    # if the page is out of range, show the last page
    except EmptyPage:
        posts = paginator.page(paginator.num_pages)

    return render(request,'inpage.html',{'posts':posts})

Template: inpage.html

<!DOCTYPE html>
{% load staticfiles %}
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Built-in pagination</title>
    <link rel="stylesheet" href="{% static 'css/bootstrap.css' %}">
</head>
<body>
<div class="container">
<div class="panel">
    <table class="table table-striped">
        <tr>
            <th>ID</th>
            <th>name</th>
            <th>age</th>
            <th>addr</th>
            <th>score</th>
        </tr>

        {% for item in posts.object_list %}
            <tr>
                <td>{{ item.id }}</td>
                <td>{{ item.name }}</td>
                <td>{{ item.age }}</td>
                <td>{{ item.addr }}</td>
                <td>{{ item.score }}</td>
            </tr>
        {% endfor %}
    </table>
</div>
<nav aria-label="Page navigation">
  <ul class="pagination">
    <!-- if the current Page object has a previous page -->
    {% if posts.has_previous %}
        <li>
            <a href="?p={{ posts.previous_page_number }}" aria-label="Previous">
                <span aria-hidden="true">&laquo;</span>
            </a>
        </li>

    {% else %}
        <li class="disabled">
          <a href="#" aria-label="Previous">
            <span aria-hidden="true">&laquo;</span>
          </a>
        </li>
    {% endif %}
    <!-- loop over the iterable page range -->
    {% for p in posts.paginator.page_range %}
        <!-- if this is the current page, highlight it -->
        {% if p == posts.number %}
            <li class="active"><a href="#">{{ p }}</a></li>
        {% else %}
            <li><a href="?p={{ p }}">{{ p }}</a></li>
        {% endif %}
    {% endfor %}
  <!-- if the current Page object has a next page -->
      {% if posts.has_next %}
          <li>
              <a href="?p={{ posts.next_page_number }}" aria-label="Next">
                  <span aria-hidden="true">&raquo;</span>
              </a>
          </li>
      {% else %}
          <li class="disabled">
              <a href="#" aria-label="Next">
                  <span aria-hidden="true">&raquo;</span>
              </a>
          </li>
      {% endif %}
  </ul>
</nav>
</div>
</body>
</html>

Django's built-in Paginator gets pagination working, but the displayed page_range defaults to every page; when there are many pages, the pagination bar looks terrible. Below we extend the built-in pagination so that only a limited range of page numbers is displayed in the pagination area.

4. Extending Django's built-in pagination

In the view module, define a new class that inherits from django.core.paginator.Paginator and extends it.

from django.core.paginator import Paginator,PageNotAnInteger,EmptyPage

class CustomPaginator(Paginator):
    def __init__(self, current_page, per_pager_num, *args, **kwargs):
        # current page number
        self.current_page = int(current_page)
        # maximum number of page links to display
        self.per_pager_num = per_pager_num
        # let the parent Paginator set up everything else
        Paginator.__init__(self, *args, **kwargs)

    def pager_num_range(self):
        if self.num_pages < self.per_pager_num:
            return range(1, self.num_pages + 1)
        part = int(self.per_pager_num / 2)
        if self.current_page <= part:
            return range(1, self.per_pager_num + 1)
        if (self.current_page + part) > self.num_pages:
            return range(self.num_pages - self.per_pager_num + 1, self.num_pages + 1)
        else:
            return range(self.current_page - part, self.current_page + part + 1)

## usage
mylist=[]
for i in range(1,1000):
    temp = {'id':i,'name':'zhang'+str(i),'age':i,'addr':'gaoxin'+str(i),'score':2*i +1}
    mylist.append(temp)

def myin(request):
    try:
        current_page = int(request.GET.get('p'))
    except:
        current_page = 1

    # instantiate the paginator: show 5 page links, 10 records per page
    paginator = CustomPaginator(current_page, 5, mylist, 10)
    try:
        # get the Page object for the current page
        posts = paginator.page(current_page)
    # if current_page is not an integer, show the first page
    except PageNotAnInteger:
        posts = paginator.page(1)
    # if the page is out of range, show the last page
    except EmptyPage:
        posts = paginator.page(paginator.num_pages)

    return render(request,'inpage.html',{'posts':posts})

Problem solved.

2019/7/27 posted in  python

A popular Django pagination pattern

Below is a pagination style that is currently popular. The plain built-in paginator can't produce this effect on its own, so we need to extend the pagination model ourselves.

  1. If the current page is 1, don't show the previous-page link
  2. If the current page is not 1

Reference site

Implementation

First define your own paginator that extends the built-in one. Create a custom_paginator.py file in the app:

from django.core.paginator import Paginator


class CustomPaginator(Paginator):
    def __init__(self, current_page, per_pager_num, *args, **kwargs):
        # current page number
        self.current_page = int(current_page)
        # maximum number of page links to display
        self.per_pager_num = per_pager_num
        # let the parent Paginator set up everything else
        Paginator.__init__(self, *args, **kwargs)

    def pager_num_range(self):
        if self.num_pages < self.per_pager_num:
            return range(1, self.num_pages + 1)
        part = int(self.per_pager_num / 2)
        if self.current_page <= part:
            return range(1, self.per_pager_num + 1)
        if (self.current_page + part) > self.num_pages:
            return range(self.num_pages - self.per_pager_num + 1, self.num_pages + 1)
        else:
            return range(self.current_page - part, self.current_page + part + 1)

    def show_first_page(self):
        part = int(self.per_pager_num / 2)
        if self.current_page > part + 1:
            return True
        else:
            return False

    def show_last_page(self):
        part = int(self.per_pager_num / 2)
        if self.current_page < self.num_pages - part:
            return True
        else:
            return False

Then add the pagination logic to views.py:

from django.core.paginator import PageNotAnInteger, EmptyPage
from django.shortcuts import render

from .custom_paginator import CustomPaginator
from .models import Dribbble


def catalog_view(request):
    dribbbles_all = Dribbble.objects.filter(detail_flag=1).filter(download_flag=1)
    current_page = request.GET.get('page', 1)

    paginator = CustomPaginator(current_page, 5, dribbbles_all, 16)

    try:
        # get the Page object for the current page
        posts = paginator.page(current_page)
        # if current_page is not an integer, show the first page
    except PageNotAnInteger:
        posts = paginator.page(1)
        # if the page is out of range, show the last page
    except EmptyPage:
        posts = paginator.page(paginator.num_pages)

    context = {
        'dribbbles': posts,
    }
    return render(request, 'collect/catalog.html', context)

The pagination markup in the page template:

<!-- Pagination -->
          <div class="pagination clearfix">
            <nav class="pagination__nav right clearfix">
              {% if dribbbles.has_previous %}
                <a href="?page={{ dribbbles.previous_page_number }}" class="pagination__page"><i
                  class="ui-arrow-left"></i></a>
              {% endif %}

              {% if dribbbles.paginator.show_first_page %}
                <a href="?page=1" class="pagination__page">1</a>
                <span>...</span>
              {% endif %}


              {% for p in dribbbles.paginator.pager_num_range %}
                <!-- if this is the current page, highlight it -->
                {% if p == dribbbles.number %}
                  <a href="#" class="pagination__page pagination__page--current">{{ p }}</a>
                {% else %}
                  <a href="?page={{ p }}" class="pagination__page">{{ p }}</a>
                {% endif %}
              {% endfor %}

              {% if dribbbles.paginator.show_last_page %}
                <span>...</span>
                <a href="?page={{ dribbbles.paginator.num_pages }}" class="pagination__page">{{ dribbbles.paginator.num_pages }}</a>

              {% endif %}

              {% if dribbbles.has_next %}
                <a href="?page={{ dribbbles.next_page_number }}" class="pagination__page"><i
                  class="ui-arrow-right"></i></a>
              {% endif %}
            </nav>
          </div>

2019/7/27 posted in  python

django-taggit: extending tags with custom attributes

The taggit tags in this project need some new functionality added to them.

Extending the tag model with django-taggit

from django.db import models
from django.utils.translation import gettext_lazy as _
from taggit.managers import TaggableManager
from taggit.models import TagBase, GenericTaggedItemBase


class MyCustomTag(TagBase):
    # ... fields here

    tag_chinese = models.CharField('tag_chinese', null=True, blank=True, max_length=255)

    class Meta:
        verbose_name = _("Tag")
        verbose_name_plural = _("Tags")

    # ... methods (if any) here



class TaggedWhatever(GenericTaggedItemBase):
    # TaggedWhatever can also extend TaggedItemBase or a combination of
    # both TaggedItemBase and GenericTaggedItemBase. GenericTaggedItemBase
    # allows using the same tag for different kinds of objects, in this
    # example Food and Drink.

    # Here is where you provide your custom Tag class.
    tag = models.ForeignKey(MyCustomTag,
                            related_name="%(app_label)s_%(class)s_items", on_delete=models.CASCADE)

# Finally, add the tag field to your own model, pointing at the custom through model, and the setup is complete

class Dribbble(models.Model):
...
    tag = TaggableManager(through=TaggedWhatever)

The resulting database tables:

collect_mycustomtag (the tag table)

collect_taggedwhatever (the through table)

This way an extra field is attached to each tag.

Showing the custom tag field on the add and list pages

Add a utils.py file to the collect app:

def comma_splitter(tag_string):
    """Parse "name(chinese),name2(...)" strings into a list of tag names."""
    tags = []
    for tag_with_c in tag_string.split(','):
        if not tag_with_c.strip():
            # Skip empty segments (e.g. the one produced by a trailing comma).
            continue
        if '(' in tag_with_c:
            # Drop the "(chinese)" suffix and keep only the tag name.
            tag = tag_with_c.split('(')[0]
            tags.append(tag.strip().lower())
        else:
            tags.append(tag_with_c.strip().lower())
    return tags


def comma_joiner(tags):
    """Serialize tag objects back into a "name(chinese)," string."""
    tag_str = ""
    for t in tags:
        if t.tag_chinese:
            tag_str += f"{t.name}({t.tag_chinese}),"
        else:
            tag_str += f"{t.name}(),"
    return tag_str

Finally, add the corresponding settings to settings.py:

TAGGIT_TAGS_FROM_STRING = 'collect.utils.comma_splitter'
TAGGIT_STRING_FROM_TAGS = 'collect.utils.comma_joiner'
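
With these settings in place, taggit uses the two helpers whenever it converts between tag strings and tag objects. A quick sketch of the expected behaviour (the tag values are made up for illustration):

from collect.utils import comma_splitter

# The splitter drops the "(chinese)" suffix and normalizes the names:
comma_splitter("Rock(摇滚), Jazz")   # -> ['rock', 'jazz']

# comma_joiner does the reverse for tag objects: a tag named "rock" with
# tag_chinese="摇滚" and a tag named "jazz" without one are rendered back
# as the string "rock(摇滚),jazz(),".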

For more details, see

the official django-taggit documentation.

2019/7/27 posted in  python

cookiecutter-django project usage guide

cookiecutter-django bundles a lot of ready-made configuration and helps us scaffold a project much faster. Here is how to use it.

Project initialization

First make sure you are working inside a virtualenv; everything below is installed and configured inside that virtualenv.

#1. Install cookiecutter
$ pip install "cookiecutter>=1.4.0"
#2. Run cookiecutter against cookiecutter-django; you will be prompted for a
#   series of options that are used to initialize the project.

$ cookiecutter https://github.com/pydanny/cookiecutter-django

# If the template is already cloned locally: cookiecutter cookiecutter-django
Cloning into 'cookiecutter-django'...
remote: Counting objects: 550, done.
remote: Compressing objects: 100% (310/310), done.
remote: Total 550 (delta 283), reused 479 (delta 222)
Receiving objects: 100% (550/550), 127.66 KiB | 58 KiB/s, done.
Resolving deltas: 100% (283/283), done.
project_name [Project Name]: Reddit Clone
project_slug [reddit_clone]: reddit
author_name [Daniel Roy Greenfeld]: Daniel Greenfeld
email [you@example.com]: pydanny@gmail.com
description [Behold My Awesome Project!]: A reddit clone.
domain_name [example.com]: myreddit.com
version [0.1.0]: 0.0.1
timezone [UTC]: America/Los_Angeles
use_whitenoise [n]: n
use_celery [n]: y
use_mailhog [n]: n
use_sentry [n]: y
use_pycharm [n]: y
windows [n]: n
use_docker [n]: n
use_heroku [n]: y
use_compressor [n]: y
Select postgresql_version:
1 - 11.3
2 - 10.8
3 - 9.6
4 - 9.5
5 - 9.4
Choose from 1, 2, 3, 4, 5 [1]: 1
Select js_task_runner:
1 - None
2 - Gulp
Choose from 1, 2 [1]: 1
Select cloud_provider:
1 - AWS
2 - GCP
3 - None
Choose from 1, 2, 3 [1]: 1
custom_bootstrap_compilation [n]: n
Select open_source_license:
1 - MIT
2 - BSD
3 - GPLv3
4 - Apache Software License 2.0
5 - Not open source
Choose from 1, 2, 3, 4, 5 [1]: 1
keep_local_envs_in_vcs [y]: y
debug [n]: n

After initialization, install the dependencies:

$ pip install -r requirements/local.txt

Export the database URL as an environment variable:

$ export DATABASE_URL=postgres://postgres:'*******'@***.***.***.***:5432/dribbble

$ export DATABASE_URL=postgres://postgres:'*******'@***.***.***.***:5432/cookiegulp

Set up the database

# Run the migrations
$ python manage.py migrate
# Create a superuser
$ python manage.py createsuperuser
# Start the development server
$ python manage.py runserver

Importing into PyCharm

Create a Django project in PyCharm and make sure the interpreter points at the virtualenv created above, so the virtual-environment configuration is kept intact.

Once that is done you will see a fairly complete project configuration.

If you want to launch the project with the Run and Debug buttons in the top-right corner, you also need to set the corresponding DATABASE_URL export in the run configuration.

Configuring git and pushing to Gogs

First create the repository on Gogs.

Then initialize the repository from the Terminal.

Creating an app

Commands:

$ django-admin.py startproject mysite
$ cd mysite
$ django-admin.py startapp my_app

2019/7/21 posted in  python

A look at Django project scaffolds

awesome-django

A very good collection of Django resources curated in Chinese.

cookiecutter-django

A solid scaffold that is still actively maintained.

The code and documentation are fairly complete, which makes it a good project to study:
https://cookiecutter-django.readthedocs.io/en/latest/developing-locally.html

http://www.debugger.wiki/article/html/155222640058179

original

A new-generation Django project scaffold.

If you often build small and mid-sized projects that need WeChat login and WeChat Mini Program logic, this scaffold will save you a lot of time.

features
OAuth support via python-social-auth, customized for WeChat-specific logic such as unionid
WeChat Mini Program login
First-party OAuth / access_token support
Simple account logic
Django REST framework
supervisor + gunicorn + nginx configuration files
CDN image uploads, currently supporting Qiniu and Tencent Cloud
WeChat Official Account JSSDK signing
SMS, supporting Yunpian and Tencent Cloud
Image CAPTCHA
Redis support
QR codes
CMS user permissions

Well suited for WeChat-oriented development.

django-rest-pandas

Django REST Framework + pandas = A Model-driven Visualization API
Django REST Pandas (DRP) provides a simple way to generate and serve pandas DataFrames via the Django REST Framework. The resulting API can serve up CSV (and a number of other formats) for consumption by a client-side visualization tool like d3.js.

Quite interesting and worth looking into.

2019/7/21 posted in  python

The CrawlSpider class

Reference link

On top of Spider, Scrapy also provides a CrawlSpider class. With it, only a small amount of code is needed to write powerful and efficient crawlers. To use CrawlSpider well we need to dig into its source code; this article gives a detailed introduction to the CrawlSpider API, and I recommend studying it together with the source.

Contents

  • The scrapy.spiders.CrawlSpider class

    • Creating a CrawlSpider template
    • Attributes and methods of the CrawlSpider class
    • Attributes/methods you may override in practice (important)
  • The scrapy.spiders.Rule class

    • Commonly used parameters of the Rule class
    • The parameters used most often in practice (important)
  • The LxmlLinkExtractor (LinkExtractor) class

    • Commonly used parameters of the LinkExtractor class
    • The parameters used most often in practice (important)

The scrapy.spiders.CrawlSpider class

CrawlSpider is the Scrapy class most commonly used for crawling sites with a regular structure; it defines rules that extract further pages to crawl from the current page.

Creating a CrawlSpider template

Inside the spiders folder of a Scrapy project, create a CrawlSpider with the command scrapy genspider -t crawl spider_name domain.

After creation you get a template like the one below (the code uses http://quotes.toscrape.com/ as an example).

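The original post showed a screenshot of the generated file; for reference, a sketch of what the generated template typically looks like (the spider name and the commented-out fields are just the generator's defaults):

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class QuotesSpider(CrawlSpider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    # One Rule: follow the links matched by the LinkExtractor and hand their
    # responses to parse_item.
    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = {}
        # item['name'] = response.xpath('//div[@id="name"]').get()
        return item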

Attributes and methods of the CrawlSpider class

  • Class attributes

    • rules: a list of Rule instances, where each Rule object defines a concrete rule for crawling pages. When several rules match the same link, only the first one takes effect. (The Rule object is described in detail below.)
  • Methods (best understood together with the source code)

    • _parse_response(self, response, callback, cb_kwargs, follow=True): the core method of the CrawlSpider class. It does two things: first, it calls the callback (parse_start_url()) to process the response; second, it calls _requests_to_follow() to build new requests from the current page.
    • parse(self, response): overrides the parse() method of the Spider class and is the first method executed when CrawlSpider receives a response. It calls _parse_response(), which runs the callback and applies the rules, so this method must not be overridden!
    • parse_start_url(self, response): receives the responses of the URLs in start_urls (the initial requests) and returns an Item object, a Request object, or an iterable of them. It is passed to _parse_response() as the callback and can be overridden as needed; its role corresponds to the parse() method of the Spider class.
    • process_results(self, response, results): post-processes the results returned by parse_start_url(); it can be overridden.
    • _requests_to_follow(self, response): builds the corresponding requests according to the rules.
  • Note:

    • Unlike in the Spider class, the parse method defined in CrawlSpider has a special role, so subclasses of CrawlSpider must not override it!

Attributes/methods you may override in practice (important); a short sketch follows the list:

  • rules
  • parse_start_url()
  • process_results()
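
A minimal sketch of what overriding these hooks can look like, reusing the quotes spider from the template above (the selectors are illustrative only):

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class QuotesSpider(CrawlSpider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    rules = (
        Rule(LinkExtractor(allow=r'/page/'), callback='parse_item', follow=True),
    )

    def parse_start_url(self, response):
        # Handle the responses of the start_urls requests themselves.
        yield {'start_page_title': response.css('title::text').get()}

    def process_results(self, response, results):
        # Post-process the items returned by the callbacks, e.g. tag each one
        # with the URL it came from.
        for item in results:
            item['source_url'] = response.url
            yield item

    def parse_item(self, response):
        yield {'quote': response.css('span.text::text').get()}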

The scrapy.spiders.Rule class

API: class scrapy.spiders.Rule(link_extractor, callback=None, cb_kwargs=None, follow=None, process_links=None, process_request=None)

Commonly used parameters of the Rule class:

  • link_extractor: a LinkExtractor object (introduced below) that defines the rules for extracting new links from the current page.
  • callback: the name of a callback function defined in the current class (note: the name, not a reference). It processes the responses of the links extracted by link_extractor, receives a response as its first argument, and returns Item or Request objects (or subclasses of them). Note that the parse method cannot be used as the callback (unlike in the Spider class!).
  • cb_kwargs: a dict of keyword arguments passed to the callback.
  • follow: a bool that specifies whether to keep extracting links from the responses produced by this rule. If callback is None, follow defaults to True; otherwise it defaults to False.
  • process_links: the name of a callback function defined in the current class (again, the name, not a reference) used to further filter or modify the links extracted by the rule.
  • process_request: the name of a callback function defined in the current class (the name, not a reference) used to process the requests generated by this rule.

The parameters used most often in practice (important); see the sketch after this list:

  • link_extractor
  • callback
  • follow
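
A minimal sketch of how these three parameters are typically combined (the spider name and selectors are illustrative):

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class AuthorSpider(CrawlSpider):
    name = 'authors'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    rules = (
        # Follow pagination links but do not run a callback on them.
        Rule(LinkExtractor(allow=r'/page/'), follow=True),
        # Parse author pages with parse_author (passed by name) and stop there.
        Rule(LinkExtractor(allow=r'/author/'), callback='parse_author', follow=False),
    )

    def parse_author(self, response):
        yield {
            'name': response.css('h3.author-title::text').get(),
            'born': response.css('span.author-born-date::text').get(),
        }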

The LxmlLinkExtractor (LinkExtractor) class

The LinkExtractor class extracts links matching a pattern (defined by regular expressions) from a page (Response). It has an extract_links method that receives a Response and returns a list of scrapy.link.Link objects. Each LinkExtractor is instantiated only once, but its extract_links method is called repeatedly to extract links from different Responses.

In Scrapy, LxmlLinkExtractor is exposed as LinkExtractor, and the latter name is the one customarily used for the former.

API: class scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor(allow=(), deny=(), allow_domains=(), deny_domains=(), deny_extensions=None, restrict_xpaths=(), restrict_css=(), tags=('a', 'area'), attrs=('href', ), canonicalize=False, unique=True, process_value=None, strip=True)

Commonly used parameters of the LinkExtractor class:

  • allow: the pattern of the links to extract. It accepts a regular-expression string or a sequence of them. By default all links are extracted.
  • deny: like allow, except that links matching it are not extracted. It takes priority over allow. Disabled by default.
  • allow_domains: a string or sequence of strings restricting which domains links may be extracted from.
  • deny_domains: like allow_domains, with the opposite effect.
  • restrict_xpaths: an XPath string or sequence of strings defining the specific regions of the page from which links are extracted. When set, only links inside the matching regions are extracted.
  • restrict_css: same as restrict_xpaths, but with CSS selectors.
  • tags: a string or sequence of strings defining which tags links are taken from. Defaults to ('a', 'area').
  • attrs: a string or sequence of strings defining which attributes links are taken from (they must be attributes of the tags allowed by the tags parameter). Defaults to ('href',).
  • process_value: a callback that can further process the extracted values (taken from tags and attrs), for example to pull the real URL out of JavaScript code.

The parameters used most often in practice (important); a short sketch follows the list:

  • allow
  • allow_domains
  • restrict_xpaths
  • restrict_css
  • tags
  • attrs
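
A minimal sketch of a LinkExtractor combining these parameters (the HTML snippet is a stand-in for a real Response):

from scrapy.http import HtmlResponse
from scrapy.linkextractors import LinkExtractor

# Only extract /tag/... links that sit inside the quote listing area.
extractor = LinkExtractor(
    allow=r'/tag/',
    allow_domains=['quotes.toscrape.com'],
    restrict_css=['div.quote'],
    tags=('a',),
    attrs=('href',),
)

# extract_links() takes a Response and returns a list of scrapy.link.Link objects.
html = b'<div class="quote"><a href="/tag/inspirational/">inspirational</a></div>'
response = HtmlResponse(url='http://quotes.toscrape.com/', body=html, encoding='utf-8')

for link in extractor.extract_links(response):
    print(link.url)   # http://quotes.toscrape.com/tag/inspirational/
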
2019/7/16 posted in  python