User:CommCheck

From Wikimedia Commons, the free media repository

CommCheck is an in-development bot that uses Google's Cloud Vision service to automatically report common copyright violations. It is designed to address some of the needs described in the Community Wishlists.

Why?

Commons is littered with copyright violations, uploaded either cross-wiki or by people who just want to add a "better image" of their favourite subject. At the moment, review is done mainly by humans.

How does it work?

Google has one of the world's largest repositories of commercially copyrighted images, and many patrollers on Commons already use Google's reverse image search to check for potential copyright violations. CommCheck uses Google Cloud Vision to find other instances of an image on the internet, in order to judge whether the image was likely just taken from an image search and uploaded.

To reduce false positives, only images licensed CC-BY-SA-4.0 are currently checked, and uploads from accounts with more than 200 edits are ignored.

The bot reports a score in "points". Each full result in Google Images (i.e. a page that includes the matching image) counts for two points, and each partial result (which may be a cropped or modified version of the image) counts for one point, so a higher count means that more matching results were found. Because Google limits the number of results it returns, there is a maximum possible number of points.
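
The point count described above can be illustrated with a minimal sketch. It is not CommCheck's actual code: the function name score_points and its parameters are hypothetical, and how the full and partial result counts are obtained from Google Cloud Vision is left out.

```python
# Hypothetical sketch of the points rule; names are illustrative only.
FULL_MATCH_POINTS = 2     # one full result counts for two points
PARTIAL_MATCH_POINTS = 1  # one partial result counts for one point


def score_points(full_matches: int, partial_matches: int) -> int:
    """Return the point count for the given numbers of full and partial results."""
    if full_matches < 0 or partial_matches < 0:
        raise ValueError("match counts must be non-negative")
    return (FULL_MATCH_POINTS * full_matches
            + PARTIAL_MATCH_POINTS * partial_matches)
```

Under this rule, an image with three full results and two partial results would be reported with 8 points.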

Who does it benefit?

The entire community. It helps copyright holders by ensuring that their images aren't uploaded under invalid licenses, it helps Wikimedians by ensuring that images on Commons are true to the purpose of Wikimedia projects, and it relieves strain on patrollers.

False positives

Not enough tests have been run to give exact statistics on false positives. To limit them, CommCheck checks each image only once and ignores any image that has ticket approval or a license other than CC-BY-SA-4.0.
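
Taken together, the rules on this page (license, uploader edit count, ticket approval, and checking each image only once) amount to a simple eligibility filter. The sketch below is an assumption about how such a filter could look, not the bot's real implementation: the Upload record, its field names, and should_check are all hypothetical.

```python
from dataclasses import dataclass
from typing import Set

# Values taken from the rules stated on this page.
CHECKED_LICENSE = "CC-BY-SA-4.0"
MAX_UPLOADER_EDITS = 200


@dataclass
class Upload:
    """Hypothetical record describing one uploaded file."""
    title: str
    license: str
    uploader_edit_count: int
    has_ticket_approval: bool = False


def should_check(upload: Upload, already_checked: Set[str]) -> bool:
    """Return True if the upload passes every filter described on this page."""
    if upload.title in already_checked:
        return False  # each image is checked only once
    if upload.has_ticket_approval:
        return False  # images with ticket approval are ignored
    if upload.license != CHECKED_LICENSE:
        return False  # only CC-BY-SA-4.0 images are checked
    if upload.uploader_edit_count > MAX_UPLOADER_EDITS:
        return False  # accounts with more than 200 edits are ignored
    return True
```

The cheapest tests come first, so an image that was already checked is skipped before any other property is looked at.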