Installation

Standalone Mode

  • Install using pip:

    pip install cirrocumulus
    
  • Launch via the command line:

    cirro launch <path_to_dataset>
    
  • Full list of command line options:

    usage: cirro launch [-h] [--spatial [SPATIAL ...]] [--markers [MARKERS ...]]
                        [--host HOST] [--port PORT] [--no-open]
                        [--results RESULTS] [--ontology ONTOLOGY]
                        [--tmap [TMAP ...]]
                        dataset [dataset ...]
    

    Positional Arguments

    dataset

    Path(s) to dataset in h5ad, loom, Seurat, TileDB, zarr, or STAR-Fusion format. Separate multiple datasets with a comma instead of a space in order to join datasets by cell id

    Named Arguments

    --spatial

    Directory containing 10x Visium spatial data (tissue_hires_image.png, scalefactors_json.json, and tissue_positions_list.csv) or a directory containing image.png, positions.image.csv with headers barcode, x, and y, and optionally diameter.image.txt containing spot diameter

    --markers

    Path(s) to JSON file that maps name to features. For example {"a": ["gene1", "gene2"], "b": ["gene3"]}

    --host

    Host IP address

    --port

    Server port

    Default: 5000

    --no-open

    Do not open your web browser

    Default: False

    --results

    URL to save user computed results (e.g. differential expression)

    --ontology

    Path to ontology in OBO format for annotation

    --tmap

    Path(s) to transport maps directory computed with WOT
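
  • Example: a markers file is plain JSON mapping a name to a list of features. The sketch below (group and gene names are placeholders) creates one; the launch invocations are commented out since they assume an installed cirro and real dataset files:

```shell
# Create a markers file mapping group names to feature lists
# (group and gene names below are placeholders).
cat > markers.json <<'EOF'
{"a": ["gene1", "gene2"], "b": ["gene3"]}
EOF

# Hypothetical invocations (require cirrocumulus and real datasets):
# cirro launch my_dataset.h5ad --markers markers.json
# cirro launch a.h5ad,b.h5ad   # comma joins the two datasets by cell id
```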

Server Mode

Cirrocumulus can also be run in server mode in order to serve multiple users and datasets securely. The cirrocumulus server can be deployed on a cloud VM, an on-premise machine, or on Google App Engine.

  • Install cirrocumulus using pip or docker

  • Optional additional setup to enable authentication and authorization via Okta or Google:

    • Install additional libraries:

      • Google: pip install google-auth

      • Okta: pip install okta-jwt-verifier

    • Set environment variables:

      • CIRRO_AUTH_CLIENT_ID: to your Okta or Google client id

      • CIRRO_AUTH_PROVIDER: to either okta or google

      • CIRRO_AUTH_ISSUER (Okta only). The URL of the authorization server that will perform authentication. All Developer Accounts have a "default" authorization server. The issuer is a combination of your Org URL (found in the upper right of the console home page) and /oauth2/default. For example, https://dev-1234.oktapreview.com/oauth2/default.

    • See Okta documentation for creating custom app integrations with Okta.

    • Visit the Google OAuth 2.0 documentation to obtain OAuth 2.0 credentials for Google.

    • Note that HTTPS is required when using Okta or Google authentication
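
    • For example, an Okta configuration might be set with environment variables like these (all values are placeholders for your own org):

```shell
# Placeholder values; substitute your own client id and Okta org URL.
export CIRRO_AUTH_PROVIDER=okta
export CIRRO_AUTH_CLIENT_ID=0oa1example2CLIENTID
export CIRRO_AUTH_ISSUER=https://dev-1234.oktapreview.com/oauth2/default
```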

  • Additional libraries needed for cloud storage:

    • Amazon S3: pip install s3fs

    • Google Cloud Storage: pip install gcsfs

    • Microsoft Azure: pip install adlfs

  • Install MongoDB and start the MongoDB server. Note that MongoDB compatible databases such as DocumentDB can also be used.

  • Start the server via the command line:

    cirro serve
    
  • Use the prepare_data command to freeze an h5ad, loom, or Seurat file in cirrocumulus format.

  • Add a dataset and optionally share the dataset with collaborators. If you enabled authentication, then by default no users are allowed to add datasets to cirrocumulus. Set the property "importer" to true on an entry in the users collection to allow that user to import datasets. For example, the following screenshot from MongoDB Compass shows that the user with the email address me@gmail.com is allowed to add datasets to cirrocumulus:

    _images/mongodb.png
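
  • The same change can be made from the command line. A sketch (assumes MongoDB on localhost, the default cirrocumulus database name, and that the mongosh client is available; the email address is the example from above). The update is built as a string so the example itself does not require a live database:

```shell
# Build the mongosh update that flags me@gmail.com as an importer
# (database and collection names assume the default db_uri).
UPDATE='db.users.updateOne({email: "me@gmail.com"}, {$set: {importer: true}})'
echo "$UPDATE"
# To apply it against a running server:
# mongosh "mongodb://localhost:27017/cirrocumulus" --eval "$UPDATE"
```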
  • You can programmatically add a dataset by posting to the /api/dataset endpoint:

    curl http://localhost:5000/api/dataset -X POST -F 'name=my_name' -F 'url=data/my_dataset_path' -F 'description=my_desc' -F 'species=Mus musculus'
    
  • Additional customization via environment variables:

    • CIRRO_MOUNT: For mounting a bucket locally. Comma-separated string of bucket:local_path entries (e.g. s3://foo/bar:/fsx)

    • CIRRO_SPECIES: Path to JSON file for the species list shown when adding a new dataset

    • CIRRO_MIXPANEL: Mixpanel project token for event tracking. Currently, only the open dataset event is supported.
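
    • For example (bucket and file paths are placeholder values):

```shell
# Placeholder bucket/path values; adjust for your deployment.
export CIRRO_MOUNT="s3://foo/bar:/fsx"
export CIRRO_SPECIES="species.json"
```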

  • Optionally, set the default view for a dataset by adding the field "defaultView" to your dataset entry in the database. You can configure the cirrocumulus state in the app, then use "Copy Link" to get the JSON configuration for "defaultView". Example:

    _images/default_view.png
  • Full list of command line options:

    usage: cirro serve [-h] [--db_uri DB_URI] [-w WORKERS] [-t TIMEOUT] [-b BIND]
                       [--footer FOOTER] [--header HEADER] [--upload UPLOAD]
                       [--results RESULTS] [--ontology ONTOLOGY]
    

    Named Arguments

    --db_uri

    Database connection URI

    Default: "mongodb://localhost:27017/cirrocumulus"

    -w, --workers

    The number of worker processes

    -t, --timeout

    Workers silent for more than this many seconds are killed and restarted

    Default: 30

    -b, --bind

    Server socket to bind. Server sockets can be any of $(HOST), $(HOST):$(PORT), fd://$(FD), or unix:$(PATH). An IP is a valid $(HOST).

    Default: "127.0.0.1:5000"

    --footer

    Markdown file to customize the application footer

    --header

    Markdown file to customize the application header

    --upload

    URL to allow users to upload files

    --results

    URL to save user computed results (e.g. differential expression) to

    --ontology

    Path to ontology in OBO format for annotation

Static Website

  • Clone the cirrocumulus repository:

    git clone https://github.com/klarman-cell-observatory/cirrocumulus.git
    
  • Change to cirrocumulus directory:

    cd cirrocumulus
    
  • Install typescript:

    yarn global add typescript
    
  • Install JavaScript dependencies:

    yarn install
    
  • Prepare dataset(s) in jsonl format:

    cirro prepare_data pbmc3k.h5ad --format jsonl
    
  • Build JavaScript:

    REACT_APP_STATIC=true yarn build
    
  • Create the file datasets.json in the build directory:

    [
        {
            "id": "pbmc3k",
            "name": "pbmc3k",
            "url": "pbmc3k/pbmc3k.jsonl"
        }
    ]
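
    The file can equivalently be written from the shell; the sketch below mirrors the pbmc3k example above (adjust id, name, and url for your own dataset):

```shell
# Write datasets.json into the build directory.
mkdir -p build
cat > build/datasets.json <<'EOF'
[
    {
        "id": "pbmc3k",
        "name": "pbmc3k",
        "url": "pbmc3k/pbmc3k.jsonl"
    }
]
EOF
```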
    
  • Move your dataset files to build:

    mv pbmc3k build
    
  • Test locally:

    cd build ; npx http-server .
    
  • Host the build directory on your static website hosting service (e.g. Amazon S3, Google Cloud Storage)

Prepare Data

The prepare_data command is used to freeze an h5ad, loom, or Seurat (RDS) file in cirrocumulus format. The cirrocumulus format allows efficient partial dataset retrieval over a network (e.g. a Google bucket) using limited memory.

  • Example:

    cirro prepare_data pbmc3k.h5ad
    
  • Full list of command line options:

    usage: cirro prepare_data [-h] [--out OUT] [--format {parquet,jsonl,zarr}]
                              [--whitelist WHITELIST] [--markers MARKERS]
                              [--no-auto-groups] [--groups GROUPS]
                              [--group_nfeatures GROUP_NFEATURES]
                              [--spatial SPATIAL]
                              dataset [dataset ...]
    

    Positional Arguments

    dataset

    Path to a h5ad, loom, or Seurat (rds) file

    Named Arguments

    --out

    Path to output directory

    --format

    Possible choices: parquet, jsonl, zarr

    Output format

    Default: "zarr"

    --whitelist

    Optional whitelist of fields to save when output format is parquet or zarr. Use obs, obsm, or X to save all entries for these fields. Use field.name to save a specific entry (e.g. obs.leiden)

    --markers

    Path to JSON file of precomputed markers that maps name to features. For example {"a": ["gene1", "gene2"], "b": ["gene3"]}

    --no-auto-groups

    Disable automatic detection of cluster fields for which to compute differential expression results

    Default: False

    --groups

    List of groups to compute markers for (e.g. louvain). Markers created with cumulus/scanpy are automatically included. Separate multiple groups with a comma to combine groups using "AND" logic (e.g. louvain,day)

    --group_nfeatures

    Number of marker genes/features to include

    Default: 10

    --spatial

    Directory containing 10x Visium spatial data (tissue_hires_image.png, scalefactors_json.json, and tissue_positions_list.csv) or a directory containing image.png, positions.image.csv with headers barcode, x, and y, and optionally diameter.image.txt containing spot diameter
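
  • A custom (non-Visium) spatial directory can be assembled by hand. A sketch, in which the barcodes, coordinates, and diameter are placeholder values and image.png must be supplied separately:

```shell
# Minimal custom spatial directory; all values below are placeholders.
mkdir -p spatial_dir
cat > spatial_dir/positions.image.csv <<'EOF'
barcode,x,y
AAACCTGAGAAACCAT,100,200
AAACCTGAGAAACCGC,150,250
EOF
echo 65 > spatial_dir/diameter.image.txt   # optional spot diameter
# Put your tissue image at spatial_dir/image.png, then:
# cirro prepare_data my_dataset.h5ad --spatial spatial_dir
```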

Developer Instructions

  • Create a new conda environment:

    conda create --name cirrocumulus-dev
    
  • Clone the cirrocumulus repository:

    git clone https://github.com/klarman-cell-observatory/cirrocumulus.git
    
  • Change to cirrocumulus directory:

    cd cirrocumulus
    
  • Install:

    pip install --upgrade pip
    pip install -e .[dev,test]
    pre-commit install
    yarn global add typescript
    yarn install
    yarn build
    pip install -e .
    
  • Install additional optional Python dependencies:

    pip install s3fs
    
  • Create an example h5ad file in ./data/pbmc3k_processed.h5ad:

    import scanpy as sc
    sc.datasets.pbmc3k_processed()
    
  • Launch cirrocumulus with the --no-open flag:

    cirro launch ./data/pbmc3k_processed.h5ad --no-open
    
  • Alternatively, launch the cirrocumulus server (see prepare_data):

    cirro serve
    
  • Run JavaScript server in development mode:

    yarn start
    
  • Navigate to http://localhost:3000

  • In order to run end-to-end tests (yarn e2e), please install GraphicsMagick (brew install graphicsmagick on macOS)

  • Testing:

    yarn e2e
    yarn test
    pytest
    
  • Build JavaScript front-end for deployment:

    yarn build