I am working on a small project that aims to give users additional information about the program currently being shown on TV (e.g. extra info about the cast of a movie, show schedules, etc.).
My first task is to take a recording of the TV/screen made with a smartphone, detect the screen, and segment it (so that I can later feed a classifier with the frames of the TV show). I need help with this task. I know there are algorithms like Mask R-CNN that can segment things like people, cars, etc., but I need a custom model that does this JUST for screens/TVs/monitors.
The solution can rely on deep learning, basic image manipulation, or anything else, but the resulting precision should be high (see images attached).
I am basically looking for advice on this task. Some constraints and assumptions:
- The resolution of the image will be high (1080p)
- The TV/screen/monitor can safely be assumed to ALWAYS be in the middle of the image (in other words, the central pixel will always belong to the TV/screen/monitor region)
- As frames will be taken out of a video, results can be double-checked with other frames from the same video
- The algorithm cannot rely on frames changing much from one to the next (that is, do not assume the TV/screen/monitor can be detected simply by detecting WHAT has changed between consecutive frames, or between the beginning and the end of the video).
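
To make the constraints above concrete, here is a rough sketch of the kind of baseline they would allow. It is not a working solution for real footage. It assumes a grayscale frame stored as a list of rows, and it assumes the screen is separated from its surroundings by a sharp intensity step (e.g. a dark bezel), which real TV content will often violate. The names `make_toy_frame`, `grow_from_center`, `fuse_boxes` and the parameter `tol` are hypothetical, made up for this illustration. It is pure Python so it runs without dependencies; a real version would more likely use OpenCV or a trained segmentation model to decide which pixels belong to the screen.

```python
from collections import deque
from statistics import median


def make_toy_frame(h=9, w=11):
    """Synthetic frame: bright wall (200), dark bezel ring (0), uniform screen (120)."""
    frame = [[200] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(2, w - 2):
            frame[r][c] = 0          # bezel
    for r in range(2, h - 2):
        for c in range(3, w - 3):
            frame[r][c] = 120        # screen content
    return frame


def grow_from_center(gray, tol=30):
    """Region-grow from the central pixel (assumed to lie on the screen).

    A neighbour joins the region if its intensity differs from the current
    pixel by at most `tol`. Returns the inclusive bounding box
    (top, left, bottom, right) of the grown region.
    """
    h, w = len(gray), len(gray[0])
    seed = (h // 2, w // 2)
    seen = {seed}
    queue = deque([seed])
    top = bottom = seed[0]
    left = right = seed[1]
    while queue:
        r, c = queue.popleft()
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and (nr, nc) not in seen \
                    and abs(gray[nr][nc] - gray[r][c]) <= tol:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return top, left, bottom, right


def fuse_boxes(boxes):
    """Combine per-frame boxes with a per-coordinate median.

    This uses the fact that other frames of the same video can double-check
    a result: a box that is wrong in one frame is outvoted by the others.
    """
    return tuple(median(coord) for coord in zip(*boxes))


if __name__ == "__main__":
    box = grow_from_center(make_toy_frame())
    # Pretend a third frame produced a bad detection covering the whole image.
    print(fuse_boxes([box, box, (0, 0, 8, 10)]))
```

The seed comes from the "central pixel is always on the screen" assumption, and the median over frames does not depend on the picture changing between frames, so it stays within the last constraint. The intensity test inside `grow_from_center` is only a placeholder for whatever per-pixel screen/non-screen decision ends up being used.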