Thursday, January 15, 2009

How Do Companies Find Images With Copyright Issues?

Well, here's a very interesting subject. In the early years of the Internet, nobody would know you existed unless you told them. In fact, you pretty much had to beat them over the head to get them to use their computer, let alone the "Internet." Today, anything that connects to the Internet, including Intranets and private networks, can become publicly available.

Google is a dominant behemoth of finding stuff, giving rise to Google Hacking. It doesn't take long for Google to rush through your site and collect enough information to make you blush. You might even search Google and find out more about yourself than you currently know. Let's just assume you understand how Google collects information and "crawls" and "spiders" the Internet looking for stuff to index and cache. Google is not the only company to invent such a mechanism.

Bad Robots are engines similar to Google in terms of their desire to find Online content. Google is a Good Robot, because they respect our wishes and try to work nicely with us. Google rocks! Bad Robots disrespect your wishes and couldn't care less about your privacy, safety, or concerns. Placing a robots.txt file will be absolutely pointless for Bad Robots. In fact, Bad Robots will use your robots.txt file to find stuff they may otherwise miss. This is an important caveat regarding robots.txt files, as you must NOT specify stuff you want excluded from crawls and spidering, unless you don't mind it becoming public.
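To make that caveat concrete, here is a minimal sketch of how little effort it takes to turn a robots.txt file into a list of places to look. The function name and the sample file are made up for illustration; this is not taken from any real Bad Robot.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named on a Disallow line (hypothetical helper)."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # An empty Disallow value means "nothing is excluded", so skip it.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Made-up robots.txt for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /private-photos/
Disallow: /admin/   # staff only
Disallow:
"""

print(disallowed_paths(SAMPLE))
```

A Good Robot reads the same lines and stays away. A Bad Robot reads them and knows exactly where to go first.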

If it's sensitive information you seek to protect, you must protect it via authentication and through server features that limit access. We'll discuss this subject elsewhere.

Bad Robots are simply packages of software that are designed to find and sometimes retrieve content. From what I have gathered in my reading, an Israeli company wrote a complex search engine (robot) that specifically handles image comparisons. Bad Companies that manage image distribution and write nasty letters to often nice people use such software to find as many victims as they can.

The first thing they do is load all of their managed images into the Bad Robot and set it to hunt like a bulldog. It spiders sites collecting link structures just like Google does to index the World Wide Web. Where Google respects your privacy requests, the bad image robot steps on your toes and indexes everything. The links are reduced to a unique set of places to revisit for inspection. The crawler then browses all of the links and loads all of the web page images into the comparison routine. It then compares each retrieved image against its database of "protected" images for the Bad Company. If a match is found, a screenshot is generated of the "infringing" web page, and a report is made to the "owner" of the image copyrights.
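The steps above can be sketched roughly as follows. This is a guess at the general shape, not the actual software: I don't know how the real comparison works, so the sketch swaps in a simple "average hash" (each pixel becomes 1 if it is brighter than the image's mean, 0 otherwise) and counts differing bits. All names are hypothetical, images are stand-in grids of grayscale values, and every grid is assumed to be the same size (a real tool would have to resize first).

```python
def unique_links(links):
    """Reduce crawled links to a unique set, keeping first-seen order."""
    seen = set()
    ordered = []
    for link in links:
        if link not in seen:
            seen.add(link)
            ordered.append(link)
    return ordered


def average_hash(pixels):
    """Fingerprint a grayscale grid: 1 where a pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Count the positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def find_matches(links, page_images, protected, max_distance=1):
    """Return (url, protected_name) pairs for images that look alike."""
    fingerprints = {name: average_hash(img) for name, img in protected.items()}
    reports = []
    for url in unique_links(links):
        for image in page_images.get(url, []):
            candidate = average_hash(image)
            for name, fingerprint in fingerprints.items():
                if hamming(candidate, fingerprint) <= max_distance:
                    reports.append((url, name))
    return reports


# Made-up data: one "protected" image and two crawled pages.
protected = {"sunset": [[10, 200], [220, 30]]}
page_images = {
    "http://example.com/a": [[[12, 190], [210, 25]]],  # slightly altered copy
    "http://example.com/b": [[[200, 10], [20, 220]]],  # unrelated image
}
links = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]

print(find_matches(links, page_images, protected))
```

Notice that loosening `max_distance` makes the robot flag more pages, including ones that merely look similar. The screenshot and report steps are left out.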

The Bad Robot most likely uses crafted programming that maximizes its effect, minimizes repetition, and uses some reduction scheme, such as subsetting by category or color palette. I'm just pulling this out of my wooly hat, but I'm sure it's a crazy program.

Word is that the bad image distributors split the profits 50-50 with the software developers. This is not a fact that can be supported to date. But, if this is true, you can see how inspired the companies with Bad Robots would be to find "offenders." In fact, I think this would inspire them to step over the line of reasonable discovery and be too inclusive rather than reasonably exclusive. If you possibly fit the ticket as, "a sucker who will pay the demanded amount from an extortion letter," you'll get it.

Word on the street is that Bad Image Companies will try to use Online resources like the "Wayback Machine," and that leads us to another article explaining how to Cleanse Your Internet Footprint from them too. From what I can tell, these Bad Image Companies use automation to prepare their letters, and there are supposedly thousands of them pouring out to the benefit of FedEx. [Not sure if we should be unhappy with FedEx, but who shoots the messenger anymore?] It remains to be determined how much legitimate research these Bad Image Companies do to substantiate and prove their claims.

Suppose someone bought images from an old stock images company. A Bad Image Company later bought that old stock images company, then laid claim to its images by sending out Copyright Infringement Demand Extortion Letters. Does the Bad Image Company know their dates are screwed up? Does the automation find the potential copyright date, or perhaps the date they acquired the old stock images company, and state that as the date, for convenience? I don't know, but would love to find out.

If you have information about how Bad Companies are finding images that potentially infringe on their "rights," I'd like to hear about it, so we can share with the people in need of information. Hopefully the Good Image Companies will see what's going on and use it to their advantage, by staying nice and making us proponents of their businesses.

