What is Small Object Detection?
Detecting very small objects in images is a hard problem in computer vision. When an image is passed through a detection model, it normally has to be resized to the model’s input resolution, which is usually no larger than about 2000 × 2000 pixels. In real-world settings, however, such as medical scans (DICOM or NIfTI) or satellite imagery (GeoTIFF), images can be tens of thousands of pixels on a side.
For example, a single satellite image might cover a huge area of land while the objects we care about (cars, ships, or buildings) occupy only a handful of pixels. If we simply resize the whole image down to the model’s input size, those objects shrink so much that the model can no longer detect them. Handling such large images therefore requires special methods that preserve enough detail for small objects to remain detectable.
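To make this concrete, here is a back-of-the-envelope illustration (the numbers are invented purely for the example) of how much a small object shrinks when a large image is squeezed down to a typical model input size:

# all numbers below are hypothetical, chosen only to illustrate the scaling problem
image_width = 20_000       # original satellite image width in pixels
model_input = 1_024        # width the detection model expects
car_width = 20             # a car spans roughly 20 px in the original image

scale = model_input / image_width       # ~0.05
car_after_resize = car_width * scale    # ~1 px

print(f"After resizing, the car is about {car_after_resize:.1f} px wide")
# an object of roughly 1 px is effectively invisible to the detector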
What is Slicing Aided Hyper Inference (SAHI)?
Slicing Aided Hyper Inference (SAHI), introduced by Akyon et al. in 2021, is an improved version of sliding window methods designed to make small object detection more accurate. The idea is straightforward: instead of feeding a huge image directly to the model, the image is split into smaller slices, which helps the model notice tiny details that would otherwise be missed.

Sliding window techniques like SAHI work by cutting a large image into smaller sections and running the model on each section separately. Small objects from the original image occupy a much larger fraction of each slice, so the model sees them in more detail and detects them more reliably.
While this approach helps the model find small objects, it also has some downsides. Sometimes, the model can “hallucinate,” meaning it mistakes background or irrelevant parts for objects. Also, because the model only sees small slices, it loses the overall context of the image, making it harder to detect large objects that don’t fit in a single slice.
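For plain object detection, the sahi Python package already implements this slice–predict–merge loop. The snippet below is a minimal sketch of how it is typically invoked; the model type, weights path, image path, and thresholds are placeholders to adapt to your own setup, and the exact model_type string depends on your sahi version.

from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# placeholder detector: any model family supported by sahi (e.g. a YOLOv8 checkpoint)
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",        # assumption: adjust to the identifier your sahi version expects
    model_path="yolov8n.pt",    # placeholder weights path
    confidence_threshold=0.3,
    device="cuda:0",            # or "cpu"
)

# slice the image, run the detector on each slice, and merge the per-slice predictions
result = get_sliced_prediction(
    "satellite_scene.jpg",      # placeholder image path
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(len(result.object_prediction_list), "objects detected")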
Extending SAHI for Segmentation of Large Satellite Images
Most existing work with SAHI focuses on object detection. However, when it comes to semantic segmentation of very large images, resources are scarce. I decided to adapt SAHI’s slicing idea for this purpose.
I was working with 10,000 × 10,000 pixel satellite images, which were far too large to be processed directly by models like SAM (the Segment Anything Model) due to GPU memory limitations. My solution was to:
- Slice the image into smaller overlapping windows.
- Perform segmentation on each slice.
- Merge the results back into a full-resolution mask aligned with the original image.
This allowed me to scale segmentation to ultra-large images efficiently.
Concretely, I took SAHI’s slicing function, ran semantic segmentation on each overlapping slice (the overlap is important), and then merged each result back into the same location it was cropped from in the original image.
Implementation
Below is the function I used to calculate slice coordinates with configurable overlap:
def calculate_slice_bboxes(
    image_height: int,
    image_width: int,
    slice_height: int = 512,
    slice_width: int = 512,
    overlap_height_ratio: float = 0,
    overlap_width_ratio: float = 0
):
    """Return slice boxes as [x_min, y_min, x_max, y_max] that together cover the full image."""
    slice_bboxes = []
    y_max = y_min = 0
    y_overlap = int(overlap_height_ratio * slice_height)
    x_overlap = int(overlap_width_ratio * slice_width)
    while y_max < image_height:
        x_min = x_max = 0
        y_max = y_min + slice_height
        while x_max < image_width:
            x_max = x_min + slice_width
            if y_max > image_height or x_max > image_width:
                # slice would run past the image border: clip it to the border
                # and shift it back so it still has the full slice size
                xmax = min(image_width, x_max)
                ymax = min(image_height, y_max)
                xmin = max(0, xmax - slice_width)
                ymin = max(0, ymax - slice_height)
                slice_bboxes.append([xmin, ymin, xmax, ymax])
            else:
                slice_bboxes.append([x_min, y_min, x_max, y_max])
            x_min = x_max - x_overlap
        y_min = y_max - y_overlap
    return slice_bboxes
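Two things are worth noting about this function. First, slices touching the right or bottom border are shifted back rather than shrunk, so every box keeps the full slice size, which is convenient when the model expects a fixed input resolution. Second, the boxes together cover the whole image; a quick sanity check of my own (not part of the original pipeline) is to stamp every box onto an empty grid and verify that no pixel is left uncovered:

import numpy as np

# toy dimensions chosen only for this check
height, width = 1000, 1500
coverage = np.zeros((height, width), dtype=np.uint8)

for x1, y1, x2, y2 in calculate_slice_bboxes(height, width, 512, 512, 0.2, 0.2):
    coverage[y1:y2, x1:x2] = 1

assert coverage.all(), "every pixel should fall inside at least one slice"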
Usage example:
import cv2
import numpy as np

image = cv2.imread("image.jpg")

slice_size = 1024                  # each slice is 1024 x 1024 px
overlap_ratio = 100 / slice_size   # 100 px overlap between adjacent slices

slices = calculate_slice_bboxes(
    image.shape[0],
    image.shape[1],
    slice_size,
    slice_size,
    overlap_ratio,
    overlap_ratio
)
With slice coordinates ready, I then processed each crop individually:
# create an empty (black) mask with the same resolution as the original image
final_mask = np.zeros((image.shape[0], image.shape[1], 1), dtype=np.uint8)

for (x1, y1, x2, y2) in slices:
    # crop the slice from the original image
    window = image[y1:y2, x1:x2]
    # get the segmentation mask for the slice
    mask = process_slice(window)
    # write the slice result back into the final mask at the same location
    final_mask[y1:y2, x1:x2] = mask
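One detail to be aware of: because the slices overlap, pixels in the overlap regions are simply overwritten by whichever slice is processed last. For my binary masks this was good enough, but if you would rather keep the union of all predictions, one simple alternative (my suggestion, not something the original pipeline does) is to replace the last line of the loop with an element-wise maximum:

    # keep a pixel as foreground if any overlapping slice predicts it
    final_mask[y1:y2, x1:x2] = np.maximum(final_mask[y1:y2, x1:x2], mask)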
Results
This method worked remarkably well. The segmentation masks for each slice aligned seamlessly when merged back into the original image space. Once the full mask was reconstructed, I converted the results into vector polygons and exported them to GeoPackage (GPKG) format for downstream geospatial analysis.
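There are several ways to turn the raster mask into polygons; the sketch below shows one common route using rasterio and geopandas. The libraries, the placeholder transform, and the CRS are assumptions on my part rather than the exact tooling used in the original pipeline:

import geopandas as gpd
from rasterio.features import shapes
from rasterio.transform import from_origin
from shapely.geometry import shape

# final_mask from the merging step, squeezed to a 2-D binary array
binary = (final_mask[:, :, 0] > 0).astype("uint8")

# placeholder affine transform; for a GeoTIFF you would take it from the source raster
# (e.g. rasterio.open("image.tif").transform) so polygons land in real-world coordinates
transform = from_origin(0.0, 0.0, 1.0, 1.0)

# vectorize connected foreground regions into polygons
polygons = [
    shape(geom)
    for geom, value in shapes(binary, mask=binary.astype(bool), transform=transform)
    if value == 1
]

gdf = gpd.GeoDataFrame(geometry=polygons, crs="EPSG:4326")  # assumed CRS
gdf.to_file("segmentation.gpkg", driver="GPKG")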

By adapting SAHI’s slicing strategy to semantic segmentation, it became possible to process ultra-large images efficiently on limited GPU resources, making it a practical solution for real-world satellite image analysis.
What’s Next
With our slicing-based segmentation pipeline working effectively on ultra-large satellite images, we will continue refining the approach and extending its capabilities.
In the near future, we plan to explore improvements such as adaptive slice sizing, better blending strategies for overlapping masks, and integration with additional segmentation backbones. In the longer term, we aim to expand these ideas beyond segmentation into tasks such as object detection and large-scale geospatial change analysis.
Got questions or ideas? We’d love to hear from you! Connect with us at algofly.ai, check out our Blog, or drop us a message to share your thoughts.