DMTN-090: DAX Webservice Implementation Guide

  • Christine Banek

Latest Revision: 2018-09-24

Note

Implementation guide for DAX Webservices.

1   Abstract

This document outlines the purpose, reasoning, and suggested implementation of the DAX Webservices. The DAX Webservices are the way to get LSST data through a set of IVOA compliant services that will allow any tool or client supporting IVOA standards to query and retrieve data.

2   Having Standards

The purpose of the DAX Webservices is to provide an IVOA compliant interface to access LSST data, both catalog data (and its metadata) as well as image data.

In the DAX Webservices, we are taking the LSST tools and stack, and providing a view that IVOA compliant tools can use. The LSST way involves technology like the Butler for finding FITS files on networked disks, and querying our distributed database, QServ, which hosts our catalog.

The Butler and QServ are both very purpose built to support scientific processing and analysis of the LSST dataset, and are designed to be generic, but aren’t required to use IVOA standards. These tools also aren’t designed to be webservices, but to be run by the scientists or automation directly.

The fact that they aren’t webservices means they use things like POSIX file permissions on local disk mounts, and are running already in a specific user context. Webservices operate differently, for example taking auth headers in each request to verify permissions. The DAX Webservices will provide such a wrapper to run a service, utilizing not only Butler and QServ, but other local services, to provide an IVOA compliant interface for clients.

Many times, the DAX Webservices will use the LSST specific toolchain to back the processing of requests. For example, to do image cutouts, the image service will likely need to use the Butler to find the raw images, as well as do cutouts and other operations on the raw images, before sending them back in a compliant format.

For IVOA compliant catalog access, the standards revolve around TAP (Table Access Protocol) and VOTable to represent the results. The language of TAP queries is ADQL, which is SQL-like, but has some differences between the variant of SQL that QServ. The DAX Webservices must rewrite the query appropriately before dispatching it to QServ and awaiting the result. QServ also has a specific way of doing async queries relating to shared scans.

In the following sections, I will outline the standards we intend to implement, how we might implement them, and where the tricky parts may be.

2.1   Why are standards important?

Astronomers have been writing code for quite a long time. In the era of multi-messenger astronomy and cross-referencing datasets of multiple instruments, one way needed to be created to standardize access to the data. Otherwise, every time a new telescope is created, owners and maintainers of astronomy tools would have to add support for the new telescope, taking into account the idiosyncratic nature of that particular instrument.

By providing one standard way of doing it, we allow for one tool to use data from multiple instruments. We can also use tools developed before LSST, unchanged, to access LSST data if we want. This allows for astronomers to have working, tested tools before our telescope is even ready for this level of integration testing.

There are many things about the standards that I find confusing, or disagree with. But like the law, the standard is the standard. I’m no fan of XML, but it is better to support the standard and not love it, than try and break bad.

Each time the standard is not followed, there may be features or unintended consequences in 3rd party tools. For example, if an unknown state was returned from an API, or a required key was missing, each tool will respond and error in its own way. Some may stop working, some may ignore it, some may outright crash or throw an exception. Tools differ in their access patterns even using a standard, so we also need to do compatibility testing with tools we want to officially support for our users.

We can do things above and beyond the standard, but we should know that those things cannot interfere with what is stipulated by the standard. Some additions may be nice, but these must be measured against the impact of implementing more of the IVOA standards, which will be used by more tools rather than just our custom LSST tools.

2.2   What standards to we want to support?

We will implement an interface that supports these standards. Links are provided to give a reference set of versions to allow people to communicate about sections and page numbers of each particular specification:

Catalog oriented standards:

Image oriented standards:

User storage standards:

If you don’t know what these standards are or how they fit in, don’t worry! In the Services section, I will outline where each of these come into play.

2.3   What clients do we want to ensure compatibility with?

Some clients and tools are just part of the general ecosystem of astronomy tools. We will need to support them. We will also be building the SUIT (LSST Science Platform) on top of many of these services. The portal and the notebook aspects will both be calling the services, and possibly passing IDs to async results between them.

Here’s a list of clients:

3   Architecture

3.1   Diagram

_images/dax-diagram.png

Figure 1 Architecture Diagram for DAX Webservices.

3.2   Call Flows

3.2.1   Catalog Query

  1. Submit the ADQL Query to the TAP service endpoint via HTTP POST and receives a query ID to check for results.
  2. Database service parses the query to determine the backend for the ADQL.
  3. Request is created and put on the work queue.
  4. UWS worker dispatches the query and gathers results.
  5. Worker massages data into the correct format and marks the request complete.
  6. Caller uses the URL and ID to be redirected to the results file.

3.2.2   Catalog Metadata Query

Same as a normal catalog query, but the query uses the TAP_SCHEMA tables stored in the Oracle database.

3.2.3   Image Metadata Query

Same as a normal catalog query, but the query uses standard tables that contain image metadata stored in the Oracle database. The result is a VOTable with metadata and access URLs.

3.2.4   Image Retrieval

  1. Caller uses an Image Metadata Query to determine images they want to retrieve.
  2. Caller makes another HTTP get to the URLs returned from the Image Metadata Query.
  3. Image Service creates a ID, and puts the request on the work queue.
  4. Image Service Worker picks up the request and uses the Butler to see if that file exists.
  5. If the file does not exist, Image Service recreates that file by using the workflow engine.
  6. Once the file exists, the file is put in the object store and the worker marks the request as complete.
  7. Caller is redirected to the object store URL.

3.2.5   Image Cutouts

  1. Caller uses an Image Metadata Query to determine datasets and particular images they might want cutouts of.
  2. Caller makes a SODA request to the Image Service with parameters that determine positions and shapes of cutouts.
  3. Image Service creates an ID and puts the request on the work queue.
  4. Image Service Worker picks up the work and uses the Butler to gather and create image files it needs to process the request.
  5. Worker uses the Butler to create cutouts on those images.
  6. Worker uploads result to object store and marks request as complete.
  7. Caller uses the ID to check for results, and is redirected to the object store URL of the result.

4   Database Service

4.1   TAP 1.1 & VOTable

For querying the catalog that is hosted in QServ, we want to support Table Access Protocol (TAP) v1.1. As outlined in the spec, TAP is a standard interface to provide a query (in ADQL) and return a table (usually VOTable) with the results of that query.

The results are returned usually in VOTable format, which include metadata about the columns and datatypes in the table, as well as the data values.

In order to run queries, we use the /sync, and /async endpoints, which are required parts of TAP 1.1. There are other optional endpoints in the spec, such as /tables, /examples, and /capabilities. For a chart that contains what is required reference page 10 of the TAP spec.

4.2   Sync, Async, and UWS

According to the standard, we need to provide endpoints to run queries either sync or async. For queries submitted to the /sync endpoint, the service blocks and waits for the response to return to the caller in the response. For /async, we can return an ID that can be queried in the future to determine the results. This will be useful for long running queries where the query may take hours to run. For /async queries, the spec requires us to implement the UWS standard.

While the UWS standard does not specify how to run the jobs, it provides a RESTful way of accessing the state, checking results, and providing control over jobs, such as canceling.

4.3   TAP_SCHEMA

The IVOA standards try to not only standardize access to data, but also the discovery of that data. Section 4 of the TAP 1.1 spec outlines TAP_SCHEMA, which is required of TAP 1.1 implementations. The idea is for a caller to be able to discover the schema of what we are serving (tables, columns, and data types) to craft their queries correctly.

The further parts of section 4 of the TAP 1.1 spec (4.1, 4.2, 4.3, 4.4) outline the schema for database tables to be created that can hold metadata about the data that is accessible through the endpoint.

To use this part of the service, you can submit a query through TAP, and the names of the metadata tables and columns are well known. The results are returned in VOTable format like any other query. In this clever usage, we can have one transport to tell us about the metadata as well as the data itself, using ADQL to query the metadata.

4.4   LSST Specific Requirements

While not covered generally by any IVOA specific standard, there are a few things that we have as requirements that are more LSST specific.

4.4.1   QServ

QServ is our custom scalable database for distributed hosting of data release catalogs. QServ is based on top of MariaDB with customizations. QServ has some special performance characteristics, but to us, it means we mostly need to be compliant with its SQL variant, and be able to transform ADQL into QServ SQL. QServ also has special functionality to do full table scans, and some special endpoints to allow for queries to run async and retrieve the results later on.

4.4.2   No JOINs Across QServ and Oracle

While TAP will present the tables from QServ and Oracle as one large unified table space, we can’t allow for people to do SQL JOINS between them.

If we wanted to support this, it would be very complicated, so for now this is out of scope. If you need to do some joins, query each table with a different query and then JOIN it yourself by iterating through the data on the application side.

JOINs should be supported on all Oracle or all QServ tables though. Just JOINs between them will be disallowed.

4.4.3   Authentication and Authorization

LSST data is not all public, and scientists may have their own private datasets uploaded as well to do JOINS or other algorithmic analysis against. We need to be able to authorize each user to use the LSST DAC resources as well as protect their results from someone else trying to scoop their research. Many IVOA standards come from the era of public astronomy data, so there may be some excitement here trying to add AAA to everything.

Note

AAA needs a lot more work and deciding on hard requirements

Since we are using UNIX groups and other very POSIX level permission schemes, we need to figure out how to respect these things in our Webservices, which aren’t always impersonating the user. For example, to get a result file, it’d be much easier to check the permissions rather than try to su to that user, and see if they still have access (which brings in things like ACLs, and UNIX group mechanics). Depending on the level of auth required, we might be able to restrict this to the creator of the query, rather than their group. Either way, this will have to be determined.

4.4.4   History Database

We want a history database of queries that can be looked through. The UWS spec defines that there is a way to get a list of jobs, both pending and finished, so that is one way of accomplishing this goal. Depending on how long we want to persist this data for, we might want to back up the queries, and index them in some other interesting way, probably through some other kind of ancillary service.

Query text should be protected by auth to only allow a user to see their own queries.

4.4.5   Large Result Sets

Since LSST queries may take a long time to run, and have large results sets, we need to be able to cache large results sets (up to 5 GB of results per query) for a reasonable period of time so they can be retrieved. This may be on the other of a few days or a week, since some of the queries may be run overnight or over the weekend.

These results must also be protected so that only the user executing the query can retrieve the results. After the results are retrieved, that user can obviously do what they will with the results (such as share them). While there are data rights implications here, once the data is out of our control, it’s out of our control.

4.5   Implementation

Now that we’ve established the particulars of what we want, let’s dive into the implementation of this service now.

This service needs to:

  1. Accept queries through a TAP compliant HTTP interface.
  2. Record the query in the query history.
  3. Determine what backend those queries should be dispatched to.
  4. Rewrite original ADQL query to the SQL variant of the backend.
  5. Dispatch the query, either locally or through a pool of workers.
  6. Gather results from the query, and transform them into VOTable.
  7. Put the results in a place that the user can download.

4.5.1   TAP Compliant Interface

There are many ways to write a webservice these days, including many frameworks. We know what URIs we want to serve, /sync and /async, and that we want to serve results in XML. We need to really reference the TAP 1.1 spec for this part, implementing what we need to, such as parameters (LANG, QUERY, MAXREC) as well as wrapping the results in a VOTable format.

4.5.2   History Database

Note

We still need firm requirements on what the retention period and auth scheme should be for accessing the history database.

There are many data stores we could use for a history database. Many might even be tied to the execution of async jobs. For example, the distributed task framework celery uses RabbitMQ, Redis, MongoDB, to store results and execution status. This isn’t just used to query the history but to drive execution. These databases can also be queried directly by users, or we can add additional URIs to look through the history.

The UWS spec also mandates a way to list jobs, and get their results. This is fairly analogous to the history database functionality we want, as it lists the queries, their IDs, execution status, and result location.

4.5.3   Determine the Backend

Many specs use the TAP and VOTable standards as a way of transmitting complex data. For example, the TAP_SCHEMA table stores the metadata, and could be on a different backend than the catalog itself, which is hosted by QServ. Some user generated (Level 3) data might also be present in another database, such as Oracle or Postgres. There are also special tables for ObsTAP to look at image metadata.

The tricky part here is that if one database isn’t hosting all the tables, we need to inspect the query to determine what tables are being accessed, and then route the query to the appropriate backend. Different backends might also have different load characteristics, such as the number of running queries.

4.5.4   Query Rewriting

QServ doesn’t speak ADQL. Neither does Oracle. We need to take the ADQL query, inspect it, and rewrite it to work on the individual backends.

This may be to work around various quirks of different SQL variants and implementations (such as how keywords work, or the way of limiting results, or datatypes).

There are also some extensions to do very astronomical things, such as cone and other spatial searches, as well as dealing with different coordinate systems.

4.5.5   Query Dispatch

Once we have the final query and we know where it’s going, we are ready to send the rewritten query to the backend and start getting the results. Since these results may be very large (GBs) or very small (0 or 1 rows), we need to be able to support both cases in a performant way.

For sync queries, the caller simply waits on the HTTP connection until the results are available. For async queries, since the caller will make another request, we need to ensure that these requests will always find the results, no matter how many TAP service copies we have. This means we can’t really store results locally on the TAP service disk (also this has the possibility of filling up the disk). It is better to have a central disk or shared place, so that results can be written there, and then picked up by anyone handling getting the results. This also helps with keeping results through upgrades and transient failures.

It’s also a good idea to separate out your front ends (things taking HTTP requests) from your back end workers (which dispatch to the database). This allows for a more even distribution of load across the workers, and keeping the load on the backends (which don’t scale as easily) in check.

As we gather these results, we need to put them also in the right format, which is VOTable. This may involve some coercing of data types to VOTable data types, rather than the original backend. Once the result is written and in the correct format, we can record that the query is finished and the results are available.

Note

QServ also supports an async query mode. We should investigate this to determine where it fits in with our plans. Inevitably we will have to gather the results, and put them in a VO compliant format.

Note

We need to figure out how to properly impersonate the user making the request. Do we store their token, or use a service account and su to them?

4.5.6   Centralized Result Store

After the user has completed their query, they will want their results, which may be large. They may be downloaded more than once, so we likely want to keep the results sets around for at least a few days, to prevent needing to rerun the same query on the database.

Because of the diversity of queries and their results sizes, and not being able to know the size of the results from the query, we need to be careful about local resources. If the results were stored on the TAP service nodes, we could easily fill up the local disk, which may be as few as 20 results for 100 GB. The fragmentation of splitting the load across multiple TAP service nodes might also be bad, since the sizes of the results might be uneven, filling up some nodes and leaving others empty. We want to store all these in a central place, preferably with URL access, so we can serve the results file directly off disk.

By having one place store the results, we eliminate the problem of the client needing to contact multiple servers to find the results, or the results not existing by the time the user checks for the results.

This could easily be an S3 like object store, or an NFS volume with Apache or another web front end checking for auth on top. Given that it is simply serving up static files, this part should be relatively easy.

4.5.7   Performance, Load, and Failure Characteristics

The performance characteristics of the database server should be fairly straightforward, at least compared to what it is built on top of and completely depends on.

The overhead of processing a request, parsing the query, putting it in ADQL, and dispatching it to the server should be very quick compared to running the query. This time should be fairly constant no matter what the query is.

Running a query is completely dependent on the query (which we don’t control) and the database (which we depend on, but don’t control). Things like the load on the shared database resources from other users and other queries can’t really be predicted.

The DAX Webservices can be good stewards of these shared resources. By having a work queue with a consistent maximum number of queries in flight, we can provide an orderly way to access a limited resource, without overloading it. There is usually a sweet spot in terms of performance, where you are fully using your resources, but not thrashing, that we will hopefully discover and tune our system accordingly.

The overhead of processing the response is certainly higher than that of the request. Having to take an up to 5 GB file and transmute it from database rows into a VOTable or other format can be costly. The latency involved in such large transfers is also not to be ignored. Given that we know we have a 5 GB limit on query responses, we can ensure that our portion of the processing of the results will generally have a fixed upper bound.

Because the database service doesn’t have much internal state, and has no important data to lose, the failure characteristics are straightforward. We might fail the request, and have to retry it, or lose a result. Since we cannot keep all results for all time, it’s inevitable that some results will be unavailable after a period, and tools will simply rerun the query. Transitive failures can be retried if desired, but not required.

5   Image Service

5.1   ObsTAP

ObsTAP is the way to query and determine metadata about image data. By using the same TAP / VOTable infrastructure from the database service, a user or client can craft a query against the available metadata to discover what images exist that fulfill those criteria, and retrieve the URL to access them.

The types of queries that can be run are independent of the data being served - the standard dictates what tables and columns must exist to run queries against. This helps general discoverability, as otherwise those tables would have to be described first (probably through TAP_SCHEMA), but by having a uniform data model, this allows one query to be run against multiple ObsTAP endpoints and have it work everywhere.

In the ObsTAP spec, there are some great UML diagrams for the data model on page 13-15. Then the data model is expanded further with tables describing the database metadata. Table 1 has all the metadata that is absolutely required, containing the usual suspects such as observation id, time, type of data, ra, dec, are all there. Section 4 on page 20 actually has the TAP_SCHEMA minimal set of fields and their datatypes that can easily be dumped right into TAP_SCHEMA.tables.

For some of these fields, we will have one identifier that is present throughout, and mostly constant, such as instrument and type of data (image). For fields that change, such as RA/DEC we will need to present that as a database table. This can be the same backends that the Database Service uses for TAP_SCHEMA and other associated metadata.

Two important basic fields are the access_url, and the access_format. This tells the client what URL it can go to to retrieve the image, and what format (JPG, FITS) the image at that URL is encoded in. The format column is a string containing a standard MIME-type.

Along with image metadata, ObsTAP also supports serving and querying provenance data, although it is not required.

Note

Are we going to use ObsTAP to serve provenance data?

5.2   SIA

SIA (Simple Image Access) is a simpler way than ObsTAP to discover images based on parameters the caller provides. This isn’t done in ADQL, but via a smaller list of parameter options. The SIA metadata model is the same as the ObsCore data model, and if we have a database of the ObsCore data model, it should be easy to field SIA queries against it.

The types of query parameters of SIA are things like position, energy, time, and wavelengths. There is a list of parameters in Section 2.1 of the SIA spec, that outlines all the possible query parameters.

SIA, unlike TAP, ObsTap, and SODA, only provides a sync endpoint called query, which takes a query string or post parameters, and returns a VOTable consistent with that of ObsTAP responses (Section 3.1 SIA spec). The sync nature of the request/response is to retrieve a VOTable response, containing links to the images, not sync/async about image retrieval. This will be related to a point mentioned below about PVI availability.

5.3   SODA

SODA (Server-side Operations for Data Access) is an IVOA standard that covers the processing of server side image data before returning it to the caller. Since many of our image files are large, and the portion of the file that the caller may care about is small, this makes sense to be able to filter the data down on the server side to reduce the amount of data transferred, along with the latency and cost of such a transfer. Another common use case is to create a cutout that covers multiple raw images (such as PVIs) to create a mosaic image that has the cutout and has stitched together the edges of the individual images to create one seamless image.

By allowing a user to select positional regions using the POS argument, different regions can be selected, such as CIRCLE, RANGE, and user defined shapes via POLYGON. To find the image with the correct filter, the user can use the BAND parameter, to provide a range of wavelengths to return.

Like the TAP service, SODA specifies a sync an async resource, of which you need at least one. Async behaves as a UWS service, just like TAP, and can provide an ID that can be later retrieved for large result sets.

Depending on the arguments, one query can provide multiple image results, for example looking at multiple bands, or drawing multiple CIRCLEs.

Note

It looks like SODA allows for us to also do our own custom parameters, to allow for more operations to happen. Other than the cutouts defined by the spec, what server side transformations do we need?

5.4   LSST Specific Requirements

5.4.1   Images we are serving

The standards mentioned previously can be used to host any particular image data, from any instrument. For LSST, we have two types of images we’d like to serve through these endpoints and queries:

  1. PVIs - Processed Visit Images
  2. Multiple sets of coadds - Created by Coadding PVIs.

Each of these will have images per band, and covering the LSST footprint. There are also multiple different sets of Coadds using different addition methods and selections of raw data.

Note

How to multiple data releases come into play when handling image metadata? Should this be a different dataset id?

5.4.2   PVI Retention and Virtual Products

Due to cost and space constraints, the current plan does not involve storing all the PVIs on disk. There is only a 30 day moving window of availability for these images while they are processed and can be easily read off disk.

After this 30 day window, additional work would need to happen to be able to recreate the PVI file, which could then be served to the caller. This work would involve having to read off tape (or hopefully, a disk) the raw image components, then use the workflow system to tell it to create the PVIs. While most of this logic is out of scope of this document, the important point is that this may take minutes and possibly even hours before an image can be served.

This is also true of other processing intensive operations, such as looking at different sets of coadds that might not always be on disk.

Because of these reasons, doing anything with images synchronously is probably a bad idea.

5.4.3   Authentication and Authorization

Users will have to be authenticated, and authorized (with data rights) to query these services and retrieve image data. This security model may be simpler to that of the TAP service, because people will likely not be uploading their own images to be served by the SIA, SODA, or ObsTAP interfaces. This means that there is generally a consistent level of protection needed that does not vary per user - everyone has the same access to all the image data, as all the image data is covered by LSST data rights rules.

That being said, ObsTAP does support a field called data_rights, which allows us to say that our dataset is either public, secure, or proprietary (ObsCore B.4.4). This will likely be one flag per data release, which will either be proprietary, then public after it is released.

5.4.4   History Database

While it is not mentioned in the requirements, we might want to extend the idea of the history database to encompass queries to the image service, such as ObsTAP, SODA, and SIA queries. Because of the authorization model outlined above, the results are less likely to need to be secured between users, allowing for caching and result reuse to be higher and easier to accomplish in a secure manner.

Either way, we will want to audit the access logs to this service, and attempt to determine usage patterns, to improve performance.

Note

What are our requirements for public history of image requests?

5.4.5   Large Result Sets

Because of the large size of the LSST data, including the images, we will want to ensure that queries are limited to a reasonable number of results, to not put undue load onto the system.

Since we have to support async queries to SODA, and because those jobs may take a while to run, it makes sense to use the same centralized results backend to store the data and provide URLs to objects in that backend.

5.4.6   Image Metadata

There will be a visit table that contains all the visits, and metadata about PVIs. This would be ideal if it’s in the ObsCoreDM format so it can directly be queried against using ObsTAP. Even if it’s not exactly in the same format, we’ll need to provide some kind of ObsTAP compliant view of that data to allow for queries, since the metadata model has to be in a specific format to follow the standard.

We will also need tables that contain the metadata about all the coadds, so they can also be discovered, even though it’s not a visit at all, and therefore doesn’t belong in the visit table. We might have to virtually stitch these two tables (one containing PVI metadata, and one containing coadd metadata) together somehow to allow a unified interface for querying through one table.

This metadata also needs to exist for things that aren’t currently on disk, because they are virtual products. The fact that they exist in this database lets us know that they can be created, and at one time, were created. When someone queries these products, we need to create them on demand.

Note

The current definition of the visit tables are on Github

More time should be spent making sure that we have everything we need in the visit metadata.

Note

How are we currently planning on doing coadd metadata?

Seems like we might want to use a different dataset ID to refer to coadds, as that is how SODA determines what raw images to use?

5.5   Implementation

5.5.1   Querying Metadata and Image Discovery

Some of the implementation here gets to be shared with the Database Service, as ObsTAP is making sure that certain tables exist in a certain format and can be queried from our TAP service.

First, we need to ensure that we have the proper metadata, and it is available via the standards compliant queries. Then we use the same TAP service described as above, using its sync or async endpoints to retrieve a VOTable containing image metadata. This image metadata contains URLs that can be used to access these images.

For SIA compatibility, we can run this on top of the current ObsTAP implementation because for each SIA query it can be mapped into an ObsTAP query, and the response is of the same format (VOTable). SIA only supports sync though, so it should only be for short queries. Again, the sync part is only relating to the query, but the images might also not be available for some time, even if there is an access_link provided in the response. This may break SIA clients.

5.5.2   Retrieving Images

Now we know what images exist, the types and formats of those images, and we have URLs to query them. Now we can either download PVIs or coadds, or do server side processing such as cutouts to receive a processed image via SODA on those PVIs or coadds.

Both of these types of requests can be served by one service, and that server uses the Butler as its backend for retrieving images and doing simple processing such as cutouts.

If the URL presented is not a SODA request, we can say that this is a request asking directly for a full image (either PVI or Coadd). We use the URL to map this back to a way that the Butler can retrieve the image using its known mappings. Once we find the file on disk (or network disk), we serve it up directly to the user. If the file doesn’t exist, we can create it using the workflow engine, but the image might not be available for some time. For direct GETs, we might need to use HTTP control to tell it to try again later, and that the image isn’t ready yet. Most of the standards assume images are all accessible in short order if they exist in the metadata.

Note

For direct image access without processing, standards assume files are available immediately, how do we do this async?

If the URL is a SODA request, then we get to work. First, we process the query and pass the parameters to the Butler, which will find the images, stitch them together, and attempt the cutout. This may take time, because the PVIs or coadds virtually exist, be a request that covers a large space, or has multiple cutouts requested.

SODA allows for async operations though, so we know we can tell our caller to call again later to get their result no matter how long it takes.

Because the resulting files can be large, we can upload or copy them to shared storage in an object store, and have the image server redirect HTTP requests for finished work items to their URL in the object store. In this way, we can split up the workers and the servers and scale them up and down independently.

5.5.3   Performance, Load, and Failure Characteristics

For the image metadata portion of the system, these queries will be run against the TAP service, and have the same performance and load characteristics as noted in that section.

For the image retrieval and processing portion of the image service, we have it a bit easier. Much of the performance will be related to the speed of access to images, and if they already exist or are virtual products. For the files that exist, we will need to copy them off of a network share, which is a shared resource, which could be a bottleneck under heavy load.

Processing for creating virtual products will likely involve the workflow engine, and having to be queued and executed there. This is also a shared resource, so depending on load from other portions of the system, this could be slow and add latency to the end user.

For processing cutouts and doing mosaics, we will likely use the Butler and local CPU processing to create those products. This means we need to provision the CPU correctly - not too small so that big jobs take a long time, but not having a lot of unused resources on a worker. If we have workers that have too much CPU, we could always reduce the CPU requirements for each worker and have more workers to increase throughput.

Since the image service is just a proxy and processing layer on top of the existing data, there is no risk that the image server could destroy or lose data. The data is persisted at a lower level and the image server doesn’t require permissions to delete data. If the service goes down, the problem is that the data is inaccessible until it is restored. If the service itself doesn’t have to handle persistent storage (using an object store instead), then we don’t have to worry about persisting previous results between deploys.

6   Further considerations

6.1   Deployment and Operations

Since both the image service and the database service don’t require a lot of state, we can easily run multiple copies of all the services at once. These different instances can be different versions and isolated from each other.

This means that to help do upgrades and deployments, we could easily keep the current version running, deploy the new version and do checkouts and testing, then update the nginx ingress rules to point to the new version. This means we don’t have to take downtime to do a deployment.

The state that may make this tricky are the requests that are queued or in-flight, and the history database itself. For requests that are already satisfied, but having their data put in the object store, the results are still accessible even if the service instance that created it might be taken down.

For requests that are queued, they will simply be delayed. For in-flight requests, we can either drain the worker pool (stop taking new requests, finish what you have), or just kill the workers and have an automatic retry for failures that look like they are technology related (disk issues, network reconnects, etc.).

If we want to do none of these, any user or client can simply re-run the query and we will start over again from scratch.

6.2   Testing

Testing the services should be fairly straightforward. There are a few types of testing we need to consider:

  1. Standards compatibility testing - we need to be compliant to the standard.
  2. Query testing - testing normal paths and edge cases via specific targeted queries.
  3. Robustness - ensuring our services operate normally and have good availability.

Due to the nature of being a generally stateless proxy, most tests for testing standards compatibility and individual queries can be done anywhere, and are easily repeatable and reproducible. While some of these standards are complex, they generally don’t have a lot of API surface area (endpoints to call). Combined with being stateless, this means we should be able to easily reproduce issues if we have the query string, even on another test instance from production.

For standards testing, at least against TAP, there is taplint, which should be able to help test against the standard.

For query testing, we should try to run some queries that we think will be common for the use cases of the science platform. As people report issues using the platform, we should be logging the query string and result codes. Any queries that fail due to some bug should be investigated, and that particular query string can be added to a list of tests to run to check for regressions. Since each test is essentially just a query, and making sure the response hasn’t changed, we can use a hash of the results, or check the results for particular fields to validate that it is the same.

One type of testing that may have timing issues in it is the general robustness. We need to make sure things like deploys and upgrades work without issue, and hopefully, without downtime. As nodes go down in our kubernetes cluster, or we scale up or down, we may run into bugs and issues, especially with kubernetes like things. These should be worked through with help of NCSA or the kubernetes admins.

6.3   Retention Policy of Results

Currently the retention policy for results has not be defined and no requirements have been proposed. Obviously we need to retain results at least until the user has had a chance to retrieve them. Once the result has been obtained, the user may need to retrieve it again for some reason. Given that it may have taken hours to comb through the large LSST dataset, we might not want to throw away that result so quickly.

On the other hand, with large (5 GB) result files, we can’t just keep all the results of all time. There needs to be a balance.

There are some obvious ways of doing this:

  1. Have enough disk space to comfortably have a window of X days before your result is deleted. X could be 5 days, a month, etc. We probably won’t know until we know the usage pattern, as if there are a lot of queries in a short time, we might exhaust our space before X days is up.
  2. Keep X GBs of past results. This way you can expire results that are the oldest first, and keep our cache at a constant size. This implies that all the users are in the same bin, so if one user is making most of the queries, they will take most of the cache. But assuming they are using all these query results for doing good things, that is probably the most efficient way.
  3. X GBs per user. We could do this, but it’s likely that we won’t have a disk big enough to have all users at full quota. Like a gym, we have to assume some people won’t use their allotment.

It’s likely we’ll have some kind of combination of business rules of these strategies, and we want to keep this as an operational sidecar script that can be easily tweaked and run by hand if necessary. If we use an object store, this can easily be run with appropriate credentials against the object store to clean it out on demand, or even hand pick certain results to delete.

Note

These are just guesses. Determine actual requirements here.