Particle-filter-based object tracking with an acoustic sensor is extended to the tracking of multiple objects. To overcome the acoustic sensor's inherent limitations in tracking several objects simultaneously, support from a visual sensor is considered. Cooperation from the visual sensor should be kept to a minimum, however, because operating it requires far more computational resources than acoustic sensor-based estimation, especially when the visual sensor is not dedicated to object tracking and is deployed for other applications. The acoustic sensor therefore serves as the primary tracker, and the visual sensor supports the tracking task only when the acoustic sensor has difficulty. Several techniques based on particle filtering are used for multiple object tracking with the acoustic sensor, and the limitations of the acoustic sensor are discussed to identify when cooperation from the visual sensor is needed. The performance of triggering-based cooperation with the two visual sensors is evaluated in a real environment and compared with periodic cooperation.
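The difference between triggering-based and periodic cooperation can be illustrated with a minimal sketch. The abstract does not state the trigger criterion, so the code below assumes one for illustration only: the visual sensor is requested whenever the weighted spread of the acoustic particle set exceeds a threshold. All function names, the 1-D state, and the threshold are hypothetical and are not taken from the paper.

```python
import math


def weighted_spread(particles, weights):
    """Weighted standard deviation of 1-D particle positions.

    Assumption: a wide particle cloud is used as a proxy for
    "the acoustic sensor has difficulty".
    """
    total = sum(weights)
    mean = sum(p * w for p, w in zip(particles, weights)) / total
    var = sum(w * (p - mean) ** 2 for p, w in zip(particles, weights)) / total
    return math.sqrt(var)


def triggered_requests(spreads, threshold):
    """Time steps at which the visual sensor would be asked for support."""
    return [t for t, s in enumerate(spreads) if s > threshold]


def periodic_requests(n_steps, period):
    """Time steps at which a fixed-period scheme would use the visual sensor."""
    return list(range(0, n_steps, period))


if __name__ == "__main__":
    # Hypothetical per-step spreads of the acoustic particle set.
    spreads = [0.1, 0.5, 0.2, 0.9]
    print(triggered_requests(spreads, threshold=0.4))
    print(periodic_requests(len(spreads), period=2))
```

Under this assumed criterion, the triggered scheme calls the visual sensor only at steps where the acoustic estimate is uncertain, whereas the periodic scheme calls it at fixed intervals regardless of how well the acoustic tracker is doing.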