summaryrefslogtreecommitdiff
path: root/web-client-with-cache.org
diff options
context:
space:
mode:
Diffstat (limited to 'web-client-with-cache.org')
-rw-r--r--web-client-with-cache.org542
1 files changed, 542 insertions, 0 deletions
diff --git a/web-client-with-cache.org b/web-client-with-cache.org
new file mode 100644
index 0000000..17d3f4f
--- /dev/null
+++ b/web-client-with-cache.org
@@ -0,0 +1,542 @@
+#+title: Web Client with Cache {{{version}}}
+#+subtitle: for version {{{version}}}, {{{updated}}}
+#+author: Vivien Kraus
+#+email: vivien@planete-kraus.eu
+#+options: ':t toc:t author:t email:t
+#+language: en
+
+#+macro: version @@texinfo:@value{VERSION}@@
+#+macro: updated @@texinfo:@value{UPDATED}@@
+#+texinfo_filename: web-client-with-cache.info
+#+texinfo_header: @syncodeindex pg cp fn
+#+texinfo_dir_category: The Algorithmic Language Scheme
+#+texinfo_dir_title: Web Client with Cache: (web-client-with-cache)
+#+texinfo_dir_desc: Using the cache
+#+texinfo_printed_title: Web Client with Cache
+
+This manual is for the Guile Web Client with Cache (version
+{{{version}}}, {{{updated}}}).
+
+* Copying
+ :PROPERTIES:
+ :COPYING: t
+ :END:
+This manual is for the Guile Web Client with Cache.
+
+Copyright \copy 2021 Vivien Kraus.
+
+#+begin_quote
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled "GNU
+Free Documentation License".
+
+The program present in this document is free software: you can
+redistribute it and/or modify it under the terms of the GNU General
+Public License as published by the Free Software Foundation, either
+version 3 of the License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see [[https://www.gnu.org/licenses/]].
+#+end_quote
+
+* Using the cache
+#+cindex: using the cache
+
+The API is defined in the =(web client with-cache)= module. It
+consists of one function:
+
+#+begin_src emacs-lisp :noweb yes :exports results :results drawer
+ (format "
+ ,#+begin_export texinfo
+ @defun with-cache [#:http-get] [#:current-time] [#:box]
+ <<with-cache-docstring>>
+ @end defun
+ ,#+end_export")
+#+end_src
+
+* Covoiturage
+The goal here is to wrap an expensive long-running pure function so
+that calling it multiple times with the same arguments until it has
+not returned reuses the return value. The name is not pretty, if you
+find a better one please tell me.
+
+#+name: covoiturage-docstring
+#+caption: Docstring of the covoiturage function
+#+begin_src texinfo
+ Wrap @var{f} so that simultaneous calls use the same return value. By
+ simultaneous, we mean parallel calls with the same value for the first
+ argument (compared with @code{equal?}) during the long-running
+ evaluation of @var{f}.
+
+ @var{f} takes at least one argument, the key. @var{f} may throw or
+ return. It is executed in a Future (from @code{(ice-9 futures)}, so
+ side effects should be synchronized somehow.
+#+end_src
+
+Internally, we use an atomic box. The atomic box contains an alist
+from key values to promises to futures. Futures are cool, but once
+they are constructed they start running immediately.
+
+However, if the lockless synchronization failed because of a
+concurrent update (for instance, inserting a new query), we should not
+start f at all before having updated the atomic box. Otherwise, for
+http-get, an unwanted connection to the server will still be
+attempted.
+
+To work around that, we use a promise to a future, so the
+future starts when the promise is forced -- thus, either by our
+thread, after we have synchronized the box ourselves; or by another
+thread, because it has already been added to the box.
+
+Once the function has returned, we notify all the other waiting
+threads and remove everything from the box. Function
+[[fn-covoiturage-remove]] does the removal.
+
+#+name: fn-covoiturage-remove
+#+caption: Remove a completed entry from a box
+#+begin_src scheme :eval no
+ (define (remove box key)
+ (let* ((old (atomic-box-ref box))
+ (new (filter (lambda (record)
+ (not (equal? (car record) key)))
+ old))
+ (discarded
+ (atomic-box-compare-and-swap! box old new)))
+ (unless (eq? discarded old)
+ (remove box key))))
+#+end_src
+
+The application of function /f/ may either return (multiple values),
+or throw. We want to forward both cases to all the callers, so we need
+to intercept the throws. Function [[fn-covoiturage-wrap-f]] does that:
+call f, and cons either ='ok= or ='error= and the return or throw
+values.
+
+#+name: fn-covoiturage-wrap-f
+#+caption: Wrap a thunk so that we know whether it threw or returned
+#+begin_src scheme :eval no
+ (define (wrap-f f)
+ (catch #t
+ (lambda ()
+ (call-with-values f
+ (lambda args
+ (cons 'ok args))))
+ (lambda error
+ (cons 'error error))))
+#+end_src
+
+Symmetrically, when we want to report the value, we need to apply
+either =throw= or =values=. This is the job of [[fn-covoiturage-report]].
+
+#+name: fn-covoiturage-report
+#+caption: Report the computed values
+#+begin_src scheme :eval no
+ (define (report result)
+ (case (car result)
+ ((ok) (apply values (cdr result)))
+ ((error) (apply throw (cdr result)))))
+#+end_src
+
+There are now two cases when we query a key: either there is an entry
+in the cache, in which case we just remove that, or we need to create
+a new future. Function [[fn-covoiturage-create]] will create the job (it
+will not start until it has been added to the cache). We note that if
+the return value should be cached for longer, the caching should
+happen before /f/ has finished executing.
+
+#+name: fn-covoiturage-create
+#+caption: Create a new job (don't start it yet)
+#+begin_src scheme :eval no
+ (define (create box key f args)
+ (delay
+ (make-future
+ (lambda ()
+ (let ((ret (wrap-f (lambda () (apply f key args)))))
+ (remove box key)
+ ret)))))
+#+end_src
+
+The query function works atomically: it first reads the cache, use it
+if it is present, and otherwise add an entry with a job and start the
+job. If it can't do that atomically, it retries. That's function
+[[fn-covoiturage-do-query]].
+
+#+name: fn-covoiturage-do-query
+#+caption: Do query the cache
+#+begin_src scheme :eval no
+ (define (do-query box f key args)
+ (let ((old (atomic-box-ref box)))
+ (if (assoc-ref old key)
+ (report (touch (force (assoc-ref old key))))
+ (let* ((job (create box key f args))
+ (new (acons key job old)))
+ (let ((discarded
+ (atomic-box-compare-and-swap! box old new)))
+ (if (eq? discarded old)
+ ;; The box is updated, so the future can be
+ ;; forced and f starts running
+ (report (touch (force job)))
+ ;; Concurrent update, retry.
+ (do-query box f key args)))))))
+#+end_src
+
+Finally, we wrap everything in a public function that will manage the
+state (the atomic box). The final function is [[fn-covoiturage]].
+
+#+name: fn-covoiturage
+#+caption: The final function definition
+#+begin_src scheme :eval no :noweb no-export
+ (define-public (covoiturage f)
+ "\
+ <<covoiturage-docstring>>"
+ (let ((box (make-atomic-box '())))
+ <<fn-covoiturage-remove>>
+ <<fn-covoiturage-wrap-f>>
+ <<fn-covoiturage-report>>
+ <<fn-covoiturage-create>>
+ <<fn-covoiturage-do-query>>
+ (lambda (key . args)
+ (do-query box f key args))))
+#+end_src
+
+* Web caching
+While we still use an atomic box for this cache, it is more complex
+than covoiturage. We need to fix HTTP headers, parse the responses,
+manipulate time.
+
+** Filter =#:header= keyword argument
+
+Let's start with the headers. We need a function to parse keyword
+arguments, extract the non-ignored headers and the non-header keyword
+arguments. The function [[fn-filter-keyword]] extracts the headers, and
+[[fn-filter-headers]] removes the unwanted ones.
+
+#+name: fn-filter-keyword
+#+caption: Filter a keyword from a keyword list
+#+begin_src scheme :eval no
+ (define (filter-keyword kw args)
+ (define (search args-kept value args)
+ (if (null? args)
+ (values value (reverse args-kept))
+ (let ((next-keyword (car args))
+ (next-value (cadr args))
+ (rest (cddr args)))
+ (if (eq? next-keyword kw)
+ (search args-kept (or value next-value) rest)
+ (search
+ ;; args-kept is in reverse order!
+ (cons* next-value next-keyword args-kept)
+ value
+ rest)))))
+ (search '() #f args))
+#+end_src
+
+#+name: fn-filter-headers
+#+caption: Remove headers for which we have the responsibility
+#+begin_src scheme :eval no
+ (define (filter-headers headers)
+ (filter (lambda (header)
+ (case (car header)
+ ((if-none-match if-modified-since) #f)
+ (else #t)))
+ headers))
+#+end_src
+
+** Time management
+
+Now, it's time for time management. We need different auxiliary
+functions:
+1. Add seconds to a date, which is polymorphic enough to allow SRFI-19
+ dates and SRFI-19 times;
+2. Compare two dates or times.
+
+*** Adding seconds to a date
+Function [[fn-date-add]] adds a number of seconds to a time or a date, and
+returns a date.
+
+#+name: fn-date-add
+#+caption: Add seconds to a date
+#+begin_src scheme :eval no
+ (define (date-add date seconds)
+ (let* ((duration (make-time time-duration 0 seconds))
+ (time (if (time? date)
+ date
+ (date->time-utc date)))
+ (result (add-duration time duration)))
+ (time-utc->date result)))
+#+end_src
+
+*** Compare two dates/times
+This is the job of functon [[fn-date-geqp]]. If the arguments are dates,
+they are first compared to times.
+
+#+name: fn-date-geqp
+#+caption: Compare two dates
+#+begin_src scheme :eval no
+ (define (date>=? past future)
+ (when (date? past)
+ (set! past (date->time-utc past)))
+ (when (date? future)
+ (set! future (date->time-utc future)))
+ (time>=? past future))
+#+end_src
+
+** Fix the responses
+There are different things we need to do to responses:
+1. Add a date to every response that does not have one;
+2. Estimate an expiration date for a response, based on either the
+ Expires header or its Date and Cache-Control information;
+3. Merge the headers for two responses.
+
+*** Ensure that a response has a date
+If the server did not set a date, we set it from the client
+clock. This is the job of [[fn-response-with-date]].
+
+#+name: fn-response-with-date
+#+caption: Ensure that the response has a date. Use the client clock if needed.
+#+begin_src scheme :eval no
+ (define (response-with-date response current-time)
+ ;; current-time is only used if response does not have a date.
+ (let ((date (response-date response (time-utc->date current-time)))
+ (other-headers
+ (filter (lambda (header)
+ (not (eq? (car header) 'date)))
+ (response-headers response))))
+ (build-response
+ #:version (response-version response)
+ #:code (response-code response)
+ #:reason-phrase (response-reason-phrase response)
+ #:headers (acons 'date date other-headers)
+ #:port (response-port response))))
+#+end_src
+
+*** Estimate the expiration date from the response
+The expiration date can be:
+- the max-age value of the cache control after the response date, if
+ the cache control is present and has a max-age;
+- the value of the Expires header;
+- =#f= if the others failed.
+
+This is what [[fn-estimate-response-expires]] does.
+
+#+name: fn-estimate-response-expires
+#+caption: Estimate the expiration date of a response
+#+begin_src scheme :eval no
+ (define (estimate-response-expires response)
+ (let ((cache-control (response-cache-control response '()))
+ (expires (response-expires response))
+ (date (response-date response)))
+ (let ((max-age (assoc-ref cache-control 'max-age)))
+ (cond (max-age
+ (date-add date max-age))
+ (expires expires)
+ (else #f)))))
+#+end_src
+
+*** Merge the headers of two responses
+The difficult part is that this will happen every time a response is
+refreshed from a 304 Not Modified response. So if we just merge the
+alists, the header alist will grow arithmetically. The trick is to
+use a hash table to remove duplicate header names. This explains
+function [[fn-merge-headers]].
+
+#+name: fn-merge-headers
+#+caption: Merge the headers for two responses
+#+begin_src scheme :eval no
+ (define (merge-headers most-important less-important)
+ (let ((h+ (response-headers most-important))
+ (h- (response-headers less-important)))
+ (let* ((alist (append h+ h-))
+ (table (alist->hash-table alist)))
+ (hash-table->alist table))))
+#+end_src
+
+** Wrap =http-fetch= so as to store the response
+Now we can start using the =covoiturage= function. The inner function
+will call =http-get=, but between the moment we have the response and
+response body and the moment we return them, we add them to a web
+cache.
+
+The function [[fn-with-cache-add-to-box]] atomically adds the response and
+response-body to the cache, while ensuring there is a Date header.
+
+#+name: fn-with-cache-add-to-box
+#+caption: Add a response and its body to the cache
+#+begin_src scheme :eval no
+ (define (with-cache-add-to-box box key response body current-time)
+ (set! response (response-with-date response current-time))
+ (let* ((old (atomic-box-ref box))
+ (new (acons key (cons response body)
+ (filter
+ (lambda (cell)
+ (not (equal? key (car cell))))
+ old)))
+ (discarded (atomic-box-compare-and-swap! box old new)))
+ (if (eq? discarded old)
+ (values response body)
+ (with-cache-add-to-box box key response body current-time))))
+#+end_src
+
+The function [[fn-with-cache-smart-http-get]] takes more argument than
+regular =http-get=. It takes a base response and base response body,
+which are the previously cached versions of the response. Then, it
+proceeds with =http-get=. If the result is a 304 Not Modified
+response, then it updates the cached response with the headers from
+the new response and returns the previous response. Also, it updates
+the cache just before returning.
+
+#+name: fn-with-cache-smart-http-get
+#+caption: =http-get=, but compare with the previous cached responses
+#+begin_src scheme :eval no
+ (define (with-cache-smart-http-get base-response base-response-body
+ box current-time http-get key . args)
+ (receive (response response-body)
+ (apply http-get key args)
+ (set! response (response-with-date response current-time))
+ (when (and base-response (eq? (response-code response) 304))
+ (set! response
+ (build-response
+ #:version (response-version base-response)
+ #:code (response-code base-response)
+ #:reason-phrase (response-reason-phrase base-response)
+ #:headers (merge-headers response base-response)
+ #:port (response-port response)))
+ (set! response-body base-response-body))
+ (with-cache-add-to-box box key response response-body current-time)))
+#+end_src
+
+We can now wrap =with-cache-smart-http-get= within covoiturage, and
+focus on the main function. The main function takes the box,
+current-time and smart http-get function within covoiturage, looks up
+the response in the cache, and returns it as is if it is still
+valid. Otherwise, the smart http-get is called. This is the job of
+function [[fn-with-cache-query]].
+
+#+name: fn-with-cache-query
+#+caption: The main function for the cache
+#+begin_src scheme :eval no
+ (define (with-cache-query box current-time smart-http-get http-get key . args)
+ (let* ((old (atomic-box-ref box))
+ (resp (assoc-ref old key)))
+ (receive (request-headers other-request-args)
+ (filter-keyword #:headers args)
+ (unless request-headers
+ (set! request-headers '()))
+ (set! request-headers (filter-headers request-headers))
+ (if resp
+ (let ((response (car resp))
+ (response-body (cdr resp)))
+ (let ((expires (estimate-response-expires response)))
+ (if (and expires (date>=? expires current-time))
+ ;; The response is cached and not expired
+ (values response response-body)
+ ;; The response is cached but expired
+ (let ((etag (response-etag response))
+ (last-modified (response-last-modified response)))
+ (when etag
+ (set! request-headers
+ (acons 'if-none-match (list etag) request-headers)))
+ (when last-modified
+ (set! request-headers
+ (acons 'if-modified-since last-modified request-headers)))
+ (apply smart-http-get
+ response response-body box current-time http-get key
+ #:headers request-headers
+ other-request-args)))))
+ ;; The response is not cached
+ (apply smart-http-get
+ #f #f box current-time http-get key
+ #:headers request-headers
+ other-request-args)))))
+#+end_src
+
+** Putting everything together
+The final function is [[fn-with-cache]]. It takes its docstring from
+[[with-cache-docstring]].
+
+#+name: with-cache-docstring
+#+caption: Docstring of the http-get with cache
+#+begin_src texinfo
+ Call @var{http-get} with a cache. @var{http-get} is a procedure that
+ takes an URI, and optionally some headers and other arguments, and
+ either fails or returns two values: a response and a response body.
+
+ The following headers are set by the caching function and are thus
+ ignored from the additional arguments:
+
+ @itemize
+ @item @code{If-None-Match}
+ @item @code{If-Modified-Since}
+ @end itemize
+
+ For cache validation without connecting to the server,
+ @var{current-time} is needed. It should be a SRFI-19 time or date. By
+ default, use the client system clock.
+
+ The returned function takes the same arguments as @var{http-get} from
+ @code{(web client)}, and also returns a response and a response body,
+ however some headers are filtered, and responses with 304 will be
+ converted to 200 with the help of the cache, if there is some.
+
+ The optional argument @code{#:box} can be used to use a specific
+ atomic box instead of one created just for the occasion.
+#+end_src
+
+#+name: fn-with-cache
+#+caption: HTTP GET with cache
+#+begin_src scheme :eval no :noweb no-export :tangle web/client/with-cache.scm :mkdirp t
+ (define-module (web client with-cache)
+ #:use-module (web client)
+ #:use-module (web response)
+ #:use-module (srfi srfi-1)
+ #:use-module (srfi srfi-19)
+ #:use-module (srfi srfi-69)
+ #:use-module (ice-9 atomic)
+ #:use-module (ice-9 receive)
+ #:use-module (ice-9 optargs)
+ #:use-module (ice-9 threads)
+ #:use-module (ice-9 futures))
+
+ <<fn-covoiturage>>
+
+ (define*-public (with-cache
+ #:key
+ (http-get http-get)
+ (current-time current-time)
+ (box #f))
+ "\
+ <<with-cache-docstring>>"
+ (let ((box (or box (make-atomic-box '()))))
+ <<fn-filter-keyword>>
+ <<fn-filter-headers>>
+ <<fn-date-add>>
+ <<fn-date-geqp>>
+ <<fn-response-with-date>>
+ <<fn-estimate-response-expires>>
+ <<fn-merge-headers>>
+ <<fn-with-cache-add-to-box>>
+ <<fn-with-cache-smart-http-get>>
+ (define smart-http-get
+ (covoiturage with-cache-smart-http-get))
+ <<fn-with-cache-query>>
+ (lambda (uri . args)
+ (apply with-cache-query box (current-time) smart-http-get http-get uri args))))
+#+end_src
+
+* GNU Free Documentation License
+ :PROPERTIES:
+ :APPENDIX: t
+ :END:
+#+texinfo: @include fdl.texi
+* Index
+ :PROPERTIES:
+ :INDEX: cp
+ :END: