Aircraft-based monitoring of wildlife is a popular method among conservation practitioners for obtaining animal population counts over large areas. Nowadays, these aerial censuses are becoming increasingly scalable due to the advent of drone technology, which is frequently combined with deep learning-based image recognition. Yet, the annotation burden associated with training deep learning architectures remains a problem, especially for the commonly used bounding box detection models. Point-based density estimation and localization models are cheaper to train and often work better when the aerial imagery is recorded at an oblique angle. Beyond this, though, there is currently little consensus about which strategy to use for which kind of data. In this work, we address this knowledge gap and evaluate modifications to a state-of-the-art detection model (YOLOv8) that minimize labeling effort by enabling it to work on point-annotated images. We study the effect of these adjustments on detection accuracy and extensively compare them to a localization architecture on four datasets consisting of nadir and oblique images. The goal of this paper is to offer wildlife conservationists practical advice on which of the recently proposed deep learning architectures to use given the properties of their images, as well as on the data properties that maximize model performance independently of the architecture. We find that counting accuracy can largely be maintained at reduced annotation effort, that object detection technology outperforms the localization approach on nadir images, and that it shows competitive performance in the oblique setting. For all publicly available datasets, the images used to obtain the results presented in this paper can be found on Zenodo, and all code necessary to reproduce our results has been uploaded to GitHub.