=============================
Create a Tiled Tiff Converter
=============================

------------
Introduction
------------

The bfio package is designed to make it easy to process arbitrarily sized images
in a fast, scalable way. The two core classes, :doc:`/Reference/BioReader` and
:doc:`/Reference/BioWriter`, can use one of three different backends depending
on the file type that will be read. Usually, the proper backend will be selected
when opening a file. For cases where bfio fails to select the proper backend, the 
``backend`` parameter can be passed to the constructor. The following backends
are available in ``bfio``.

1. ``backend="python"`` can only be used to read/write OME tiled tiff images.
   Tiled tiff is the preferred file format for reading/writing arbitrarily sized
   images.
2. ``backend="bioformats"`` can be used to read any
   `any format supported by Bio-Formats <https://docs.openmicroscopy.org/bio-formats/8.0.1/supported-formats.html>`_.
   The BioWriter with java backend will only save images as OME tiled tiff.
3. ``backend="zarr"`` can be used to read/write a subset of Zarr files following
   the `OME Zarr spec <https://ngff.openmicroscopy.org/latest/>`_.

The advantage to using the ``python`` and ``zarr`` backends are speed and
scalability at the expense of a rigid file structure, while the ``bioformats`` backend
provides broad access to a wide array of file types but is considerably slower.

In this example, the basic usage of two core classes are demonstrated by
creating a "universal" converter from any Bio-Formats supported file format to
OME tiled tiff.

---------------
Getting Started
---------------

~~~~~~~~~~~~~~~~~~~~
Install Dependencies
~~~~~~~~~~~~~~~~~~~~

To run this example, Java and Maven are required. See the :doc:`/Installation` instruction for details on how to install them. Then install ``bfio`` using:

``pip install bfio``

.. note::

    JPype and Bio-Formats require Java 8 or later. ``bfio`` has been tested
    using Java 8.

.. note::

    Bio-Formats is licensed under GPL. If you plan to package ``bfio`` using this option,
    make sure to consider the licensing ramifications.

~~~~~~~~~~~~~~~~~~~~~
Download a Test Image
~~~~~~~~~~~~~~~~~~~~~

Download a test image from the Web Image Processing Pipeline (WIPP) github set
of test images.

.. code-block:: python

    from pathlib import Path
    import requests

    """ Get an example image """
    # Set up the directories
    PATH = Path("data")
    PATH.mkdir(parents=True, exist_ok=True)

    # Download the data if it doesn't exist
    URL = "https://github.com/usnistgov/WIPP/raw/master/data/PyramidBuilding/inputCollection/"
    FILENAME = "img_r001_c001.ome.tif"
    if not (PATH / FILENAME).exists():
            content = requests.get(URL + FILENAME).content
            (PATH / FILENAME).open("wb").write(content)

---------------------------------
Reading Images with the BioReader
---------------------------------

The first step in loading an image with the BioReader is to initialize a
BioReader object. The BioReader tries to predict what backend to use based on
the file extension, and it will throw an error if it predicts incorrectly. 

.. code-block:: python

    from bfio import BioReader

    br = BioReader(PATH / FILENAME)


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Manually Specifying a Backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The user can also manually specify a backend. To use ``bioformats`` backend, just use the keyword argument ``backend=bioformats`` when initializing the
BioReader.

.. code-block:: python

    # Set up the BioReader
    br = BioReader(PATH / FILENAME,backend='bioformats')

    # Print off some information about the image before loading it
    print('br.shape: {}'.format(br.shape))
    print('br.dtype: {}'.format(br.dtype))

    br.close()

Behind the scenes, what happens is that ``scyjava`` starts Java and loads the
Bio-Formats jar. If Java is not installed, the above code will raise an error.

.. Note::

    Currently, the mechanism for starting Java is likely not thread safe.

~~~~~~~~~~~~~~~~~~~
Using the BioReader
~~~~~~~~~~~~~~~~~~~

In the above code, a ``BioReader`` object is initialized, the shape and data
type is printed, and then the BioReader object is closed. The closing of the
``BioReader`` object is necessary to ensure that the Java object is cleaned up
properly. To ensure that this happens, it is recommended to put image access
into a ``with`` block, which will automatically perform file cleanup.

.. code-block:: python

    from bfio import BioReader

    # Initialize the BioReader inside a ``with`` block to handle file cleanup
    with BioReader(PATH / FILENAME,backend='bioformats') as br:

        # Print off some information about the image before loading it
        print('br.shape: {}'.format(br.shape))
        print('br.dtype: {}'.format(br.dtype))

To read an entire image, use the :attr:`~bfio.bfio.BioReader.read` method
without any arguments.

.. code-block:: python

    I = br.read()

Alternatively, the
:attr:`~bfio.bfio.BioReader.X`,
:attr:`~bfio.bfio.BioReader.Y`,
:attr:`~bfio.bfio.BioReader.Z`,
:attr:`~bfio.bfio.BioReader.C`, and
:attr:`~bfio.bfio.BioReader.T` values can be specified to load only a subsection
of the image. If the BioReader is reading from an OME tiled tiff, then the file
reading should be faster and require less memory than other formats. This has to
do with how data is stored in the OME tiled tiff.

For the current file, to load only the first 100x100 pixels:

.. code-block:: python

    I = br.read(X=[0,100],Y=[0,100])

The above code will return a 5-dimensional numpy array with
``shape=(100,100,1,1,1)``. If this file had multiple z-slices, channels, or
timepoint information stored in it, then the first 100x100 pixels in every
z-slice, channel, and timepoint would all be loaded since Z, C, and T were not
included as keyword arguments.

To make it easier to load data, data can be fetch from a file using NumPy like
indexing. However, there are some caveats. Step sizes in slices are ignored for
the first three indices. Thus, the following three lines of code will load data
exactly the same as the above line using ``read`` to load the first 100 rows and
columns of pixels:

.. code-block:: python

    I = br[0:100,0:100,:,:,:]
    I = br[:100,:100,...]
    I = br[:100:2,:100:2]

    print(I.shape) # Should return (100,100)

.. note::

    When using NumPy like indexing, trailing dimensions with size=1 are squeezed. So,
    while ``read`` will always return a 5-dimensional array, the NumPy indexing in this
    case will return a 2-dimensional array since there are no Z, C, or T dimensions.

---------------------------------
Writing Images With the BioWriter
---------------------------------

~~~~~~~~~~~~~~~~~~~~~~~~~~
Initializing the BioWriter
~~~~~~~~~~~~~~~~~~~~~~~~~~

Initializing the :doc:`/Reference/BioWriter` requires a little more thought than
the ``BioReader`` because the properties of the file have to be set prior to
writing any data. In many cases, starting the BioWriter with the same metadata
as the BioReader will get you most of the way there.

.. code-block:: python

    from bfio import BioWriter

    bw = BioWriter(PATH / 'out.ome.tif',metadata=br.metadata)

The above code copies all the metadata from a BioReader object to the
BioWriter object. If the data type needs to be changed for the file, simply
set the object property.

.. code-block:: python

    bw.dtype = np.uint8 # Must be a numpy data type
    bw.X = 1000 # set the image width
    bw.Y = 500  # set the image height
    bw.channel_names = ['PI','phalloidin','DAPI'] # if your image has three channels, name each of them

For more information on the settable properties, see the
:doc:`/Reference/BioWriter` documentation.

~~~~~~~~~~~~~~~~~
Writing the Image
~~~~~~~~~~~~~~~~~

As with the BioReader, the BioWriter needs to be properly closed using the
:attr:`~bfio.bfio.BioWriter.close` method. Closing the BioWriter finalizes the
file, and if code exits without a file being close then the image may not open
properly. To help prevent this scenario, use a ``with`` block.

.. code-block:: python

    with BioWriter(PATH / 'out.ome.tif',metadata=br.metadata) as bw:

        original_image = br[:]
        bw.write(original_image)

This code reads an image and savess it as an OME tiled tiff!

As with the BioReader, it is possible to use numpy-like indexing. An alternative
to the above code block would be:

.. code-block:: python

    with BioWriter(PATH / 'out.ome.tif',metadata=br.metadata) as bw:

        bw[:] = br[:]

.. note::

    After the first ``write`` call, most BioWriter attributes become
    :attr:`~bfio.bfio.BioWriter.read_only`.

--------------------------------------------
An Efficient, Scalable, Tiled Tiff Converter
--------------------------------------------

In the above example, the demo image was relatively small, so opening the entire
image and saving it was trivial. However, the ``bfio`` classes can be used to
to convert an arbitrarily large image on a resource constrained system. This is
done by reading/writing images in subsections and controlling the number of
threads used for processing. Both the BioReader and BioWriter use multiple
threads to read/write data, one thread per tile on individual tiles. By default,
the number of threads is half the number of detected CPU cores, and this can be
changed when a BioReader or BioWriter object is created by using the
``max_workers`` keyword argument.

To get started, let's transform the previous examples into something more
scalable. Something more scalable will read in a small part of one image, and
save it into the tiled tiff format.

.. note::

    The BioWriter always saves images in 1024x1024 tiles. So, it is important to
    save images in multiples of 1024 (height or width) in order for the image to
    save correctly. In the future, the tiled tiff tile size may become a user
    defined parameter, but for now the WIPP OME tiled tiff standard of 1024x1024
    tile size is used exclusively.

.. code-block:: python

    # Number of tiles to process at a time
    # This value squared is the total number of tiles processed at a time
    tile_grid_size = 1

    # Do not change this, the number of pixels to be saved at a time must
    # be a multiple of 1024
    tile_size = tile_grid_size * 1024

    with BioReader(PATH / 'file.czi',backend='bioformats') as br, \
        BioWriter(PATH / 'out.ome.tif',backend='bioformats',metadata=br.metadata) as bw:

        # Loop through timepoints
        for t in range(br.T):

            # Loop through channels
            for c in range(br.C):

                # Loop through z-slices
                for z in range(br.Z):

                    # Loop across the length of the image
                    for y in range(0,br.Y,tile_size):
                        y_max = min([br.Y,y+tile_size])

                        # Loop across the depth of the image
                        for x in range(0,br.X,tile_size):
                            x_max = min([br.X,x+tile_size])

                            bw[y:y_max,x:x_max,z:z+1,c,t] = br[y:y_max,x:x_max,z:z+1,c,t]


The above code has a lot of for loops. What makes the above code more scalable
than just a simple piece of code like ``bw[:] = br[:]``? The for loops and the
``tile_size`` variable make it so that only a small portion of the image is
loaded into memory. In the above code, ``tile_grid_size = 1``, meaning that
individual tiles are being stored one by one, which is the most memory efficient
way of converting to tiled tiff.

One thing to note in the above example is that both the BioReader and BioWriter
are using the Java backend. This ensures a direct, 1-to-1 file conversion can
take place. While the Python backend for both the BioReader and BioWriter can read OME
TIFF files with any number of XYZCT dimensions, the WIPP platform expects each file to
only contain XYZ data. To make the above tiled tiff converter export WIPP compliant
files, the code should be changed as follows:

.. code-block:: python

    # Number of tiles to process at a time
    # This value squared is the total number of tiles processed at a time
    tile_grid_size = 1

    # Do not change this, the number of pixels to be saved at a time must
    # be a multiple of 1024
    tile_size = tile_grid_size * 1024

    with BioReader(PATH / 'file.czi',backend='bioformats') as br:

        # Loop through timepoints
        for t in range(br.T):

            # Loop through channels
            for c in range(br.C):

                with BioWriter(PATH / 'out_c{c:03d}_t{t:03d}.ome.tif',
                               backend='bioformats',
                               metadata=br.metadata) as bw:
                    bw.C = 1
                    bw.T = 1

                    # Loop through z-slices
                    for z in range(br.Z):

                        # Loop across the length of the image
                        for y in range(0,br.Y,tile_size):
                            y_max = min([br.Y,y+tile_size])

                            # Loop across the depth of the image
                            for x in range(0,br.X,tile_size):
                                x_max = min([br.X,x+tile_size])

                                bw[y:y_max,x:x_max,z:z+1,0,0] = br[y:y_max,x:x_max,z:z+1,c,t]

---------------------
Complete Example Code
---------------------

~~~~~~~~~~~~~~~~~~~~~~
Self Contained Example
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from bfio import BioReader, BioWriter
    from pathlib import Path
    import requests
    import numpy as np

    """ Get an example image """
    # Set up the directories
    PATH = Path("data")
    PATH.mkdir(parents=True, exist_ok=True)

    # Download the data if it doesn't exist
    URL = "https://github.com/usnistgov/WIPP/raw/master/data/PyramidBuilding/inputCollection/"
    FILENAME = "img_r001_c001.ome.tif"
    if not (PATH / FILENAME).exists():
        content = requests.get(URL + FILENAME).content
        (PATH / FILENAME).open("wb").write(content)

    """ Convert the tif to tiled tiff """
    # Set up the BioReader
    with BioReader(PATH / FILENAME,backend='bioformats') as br, \
        BioWriter(PATH / 'out.ome.tif',metadata=br.metadata,backend='python') as bw:

        # Print off some information about the image before loading it
        print('br.shape: {}'.format(br.shape))
        print('br.dtype: {}'.format(br.dtype))

        # Read in the original image, then save
        original_image = br[:]
        bw[:] = original_image

    # Compare the original and saved images using the Python backend
    br = BioReader(PATH.joinpath('out.ome.tif'))

    new_image = br.read()

    br.close()

    print('original and saved images are identical: {}'.format(np.array_equal(new_image,original_image)))

~~~~~~~~~~~~~~~~~~~
Scalable Tiled Tiff
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from bfio import BioReader, BioWriter
    import math
    from pathlib import Path
    from multiprocessing import cpu_count

    """ Define the path to the file to convert """
    # Set up the directories
    PATH = Path("path/to/file").joinpath('file.tif')


    """ Convert the tif to tiled tiff """
    # Number of tiles to process at a time
    # This value squared is the total number of tiles processed at a time
    tile_grid_size = math.ceil(math.sqrt(cpu_count()))

    # Do not change this, the number of pixels to be saved at a time must
    # be a multiple of 1024
    tile_size = tile_grid_size * 1024

    # Set up the BioReader
    with BioReader(PATH,backend='bioformats',max_workers=cpu_count()) as br:

        # Loop through timepoints
        for t in range(br.T):

            # Loop through channels
            for c in range(br.C):

                with BioWriter(PATH.with_name(f'out_c{c:03}_t{t:03}.ome.tif'),
                            backend='python',
                            metadata=br.metadata,
                            max_workers = cpu_count()) as bw:

                    # Loop through z-slices
                    for z in range(br.Z):

                        # Loop across the length of the image
                        for y in range(0,br.Y,tile_size):
                            y_max = min([br.Y,y+tile_size])

                            # Loop across the depth of the image
                            for x in range(0,br.X,tile_size):
                                x_max = min([br.X,x+tile_size])

                                bw[y:y_max,x:x_max,z:z+1,0,0] = br[y:y_max,x:x_max,z:z+1,c,t]