Installation
$ pip install opencv-python
Contrib version ( extra modules, where the non-free algorithms live )
$ pip install opencv-contrib-python
1. Threshold
ref : http://docs.opencv.org/3.2.0/d7/d4d/tutorial_py_thresholding.html
.threshold() with THRESH_BINARY ( fixed threshold, around 192 ) or THRESH_BINARY+THRESH_OTSU ( Otsu picks the threshold, around 177 )
or
.adaptiveThreshold() with ADAPTIVE_THRESH_MEAN_C or ADAPTIVE_THRESH_GAUSSIAN_C
2. Morphology ( Erosion/Dilation, Opening/Closing )
ref : http://docs.opencv.org/3.2.0/d9/d61/tutorial_py_morphological_ops.html
.morphologyEx(cv2.MORPH_OPEN) : to remove noise
or
kernel = np.ones((5,5), np.uint8)
erosion = cv2.erode(img, kernel, iterations = 1)
dilation = cv2.dilate(img, kernel, iterations = 1)
3. Contour ( to find the border )
Tip : We can rotate the picture 90 degrees, then run findContours again.
ref : http://docs.opencv.org/3.2.0/d4/d73/tutorial_py_contours_begin.html
imgray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE) : thresh is the thresholded imgray
.drawContours()
4. Morphological transform ( to find the border )
ref : http://docs.opencv.org/3.2.0/d9/d61/tutorial_py_morphological_ops.html
.getStructuringElement(cv2.MORPH_RECT, (width, height)) : use a long, thin size such as (25, 1)
dilate ( max under the kernel ) : returns 111111… wherever the long kernel covers at least one 1
erode ( min under the kernel ) : returns 000000… wherever the long kernel covers at least one 0
5. Template matching ( to find a specific character, e.g. a bullet, to detect whether it is a first page )
ref : http://docs.opencv.org/3.2.0/d4/dc6/tutorial_py_template_matching.html
result = cv2.matchTemplate(img, template, cv2.TM_CCORR_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
# max_loc is the (x, y) top-left corner of the best match; max_val is its score
6. reduce ( to extract text lines; example : we can detect the blocks of text lines and separate them at the blank space between them )
ref : http://docs.opencv.org/3.2.0/d2/de8/group__core__array.html#ga4b78072a303f29d9031d56e5638da78e
.reduce(src, 1, REDUCE_AVG) >= THRESHOLD ( dim = 1 : one average per row; in Python dst is the return value )
7. OCR
- tesseract / TensorFlow
- OCRopus / ocropy ( https://github.com/tmbdev/ocropy : python document analysis tool )
- scantailor ( http://scantailor.org/ : GUI to clean scan docs )
8. Vision Course
CE316 / CE866 ( http://orb.essex.ac.uk/ce/ce316/ ) University of Essex, UK