Convert the format of the Caltech Pedestrian dataset to the format that YOLO uses.
This repo is adapted from
- https://github.com/mitmul/caltech-pedestrian-dataset-converter
- https://pjreddie.com/media/files/voc_label.py

Requirements:
- opencv
- numpy
- scipy
Steps:
- Convert the `.seq` video files to `.png` frames by running `$ python generate-images.py`. They will end up in the `images` folder.
- Square images match YOLO's square network input, which is why you can convert the 640x480 frames to 640x640 frames by running `$ python squarify-images.py` (see the padding sketch after this list).
- Convert the `.vbb` annotation files to `.txt` files by running `$ python generate-annotation.py`. It creates the `labels` folder containing the `.txt` files named like the frames, plus the `train.txt` and `test.txt` files that contain the paths to the images (the label format is sketched below).
- Adjust the `.data` YOLO file (a sketch follows below).
- Adjust the `.cfg` YOLO file: take e.g. `yolo-voc.2.0.cfg` and set `height = 640`, `width = 640`, `classes = 2`, and in the final layer `filters = 35` (= (classes + 5) * 5).
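The squaring step can be as simple as letter-boxing each frame. The sketch below is a minimal, assumed implementation (padding with black bars instead of stretching; `squarify-images.py` may do it differently):

```python
import cv2

def squarify(path):
    """Pad a 640x480 frame to 640x640 by adding black bars top and bottom."""
    img = cv2.imread(path)                      # shape: (480, 640, 3)
    pad = (640 - img.shape[0]) // 2             # 80 px on each side
    img = cv2.copyMakeBorder(img, pad, pad, 0, 0,
                             cv2.BORDER_CONSTANT, value=(0, 0, 0))
    cv2.imwrite(path, img)                      # overwrite with the 640x640 frame
```

Note that vertical padding shifts every box down by the same offset, so the annotation step has to add that offset to the y-coordinates before normalizing.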
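YOLO expects one `labels/*.txt` per frame, each line being `class x_center y_center width height` with all four numbers normalized to [0, 1]. A minimal sketch of that conversion, assuming the `.vbb` boxes arrive as pixel-space `(x, y, w, h)` tuples with a top-left origin (parsing the `.vbb` files themselves is what `generate-annotation.py` handles):

```python
def to_yolo_line(cls, box, img_w=640, img_h=640):
    """Turn a pixel-space box (x, y, w, h), top-left origin, into a YOLO label line."""
    x, y, w, h = box
    x_center = (x + w / 2.0) / img_w            # normalized box center
    y_center = (y + h / 2.0) / img_h
    return "%d %f %f %f %f" % (cls, x_center, y_center, w / img_w, h / img_h)

# A 64x128 pedestrian with its top-left corner at (320, 200):
print(to_yolo_line(0, (320, 200, 64, 128)))
# -> 0 0.550000 0.412500 0.100000 0.200000
```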
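The `.data` file points darknet at the class count, the image lists, the class names and a backup directory. A hedged sketch follows; every file name in it is an assumption about your layout, and the final assertion restates the filter arithmetic from the `.cfg` step:

```python
classes = 2

# All paths below are placeholders; adjust them to your setup.
with open("caltech.data", "w") as f:
    f.write("classes = %d\n" % classes)
    f.write("train = caltech-for-yolo/train.txt\n")
    f.write("valid = caltech-for-yolo/test.txt\n")
    f.write("names = caltech.names\n")          # one class name per line
    f.write("backup = backup/\n")               # where darknet saves weights

# The final conv layer of the .cfg needs (classes + 5) * 5 filters:
assert (classes + 5) * 5 == 35
```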
File structure:

```
|- caltech
|-- annotations
|-- test06
|--- V000.seq
|--- ...
|-- ...
|-- train00
|-- ...
|- caltech-for-yolo (this repo; cd here before running the scripts)
|-- generate-images.py
|-- generate-annotation.py
|-- images
|-- labels
|-- test.txt
|-- train.txt
```
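After running the scripts, the generated files pair up like this (the frame naming is illustrative, not guaranteed):

```
images/set00_V000_0.png   <- one generated frame
labels/set00_V000_0.txt   <- its boxes, e.g. "0 0.550000 0.412500 0.100000 0.200000"
train.txt                 <- lists the frame paths, one per line
```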