A quick note from the editor

I am extremely proud and honoured to have our first guest blog post by Zac Braddy from thereactionary.net. If you enjoy this article, please check out his site and leave comments below the article.

Re-Re-Blogging: Retrieving your audience from the grips of Rebloggers

Some people enjoy working hard for the greater good of their fellow man by doing things like starting a blog. Once you've gotten your blog up and running and serving the people you want to serve you'll be forgiven for being proud of your achievement. This pride makes it all the more difficult to stop yourself from getting upset when someone comes along and tries to claim that achievement.

What makes matters worse is that on the internet, if you manage to gather any modicum of success, then 9 times out of 10 people are going to try to claim it. When this happens it can feel like there’s little you can do to stop it, and in some cases just knowing that it's even happening is a challenge! This situation is embodied perfectly by this image that makes the rounds at reddit:

Today I want to talk to you about one way that I’ve attempted to maintain control over the fruits of my labour using web development as a weapon.

 

The story so far

Hello! I’m Zac Braddy, proud owner and operator of The Reactionary. The Reactionary is a place where you can find all sorts of information about the React.js library. But as I’ve said, today I’d like to tell you about my experience at The Reactionary battling nefarious rebloggers. I'll also tell you how I managed to foil their efforts easily and without having to involve ISPs or any other faceless corporations.

I started The Reactionary in the middle of August 2016. I've found so far that the time investment necessary to write the blog is large and, at the moment, I'm doing it for free. So you can understand and perhaps forgive me for getting pretty miffed when I found out that the audience I’ve worked hard to start generating has brought along with it...rebloggers. Ugh!

 

Know your enemy

So what is reblogging? Well, in my experience, reblogging takes a few different forms. My first experience was when a rather large website copy pasted my article directly onto their website, word for word! How did I know it wasn’t just a coincidence and they didn’t just have very similar wording? Two reasons. First, they copied, exactly, the code snippets I was using to build a React component WHICH HAD MY NAME IN IT! Second, they had links to a handy dandy GitHub repo of the code from the post. That GitHub repo was MINE!!

That’s actually how I found out about it in the first place. Some engaged reader clicked on the link to my GitHub repo and I saw the click on my repo traffic report.

Intrigued, I checked out the link and found my article copy pasted amongst a monstrosity of a website, more ads than content.

I was really quite vexed by all of this. I mean we weren't talking about some high school kid copy pasting my stuff because they admired it. If this were the case I might have been more cool about it and perhaps tried to help guide the person and help them make their own start. But no this was just a blatant cash grab using my content to do it.

I couldn't let it stand. So I went about getting the content pulled down. I did a whois on the domain which was, of course, registered anonymously via a domain registrar in China. All I could do was contact the registrar and ask them to tell the owner of the domain to cease and desist or risk a DMCA takedown notice.

Luckily, in this instance the turnaround was only a day or two before the owner of the website replaced the content they'd copy pasted with a button linking through to my site. I probably wasn't going to get many click throughs based on that, but ultimately this was enough to stop me from arguing.

 

Why won't you just die?!

About a month later I, once again, received a click through from another strange domain name, this time in my WordPress analytics. The odd thing about this one was that it was from a url that started with thereactionary! The only difference from my own url was that it had a strange non-standard domain qualifier.

Upon clicking through to the referrer I was presented with my own website, with one important difference: it was overlaid with ads. This second instance was enough for me to want to take further action.

I spoke with my blogging colleagues including Paul Seal (the owner of codeshare) and I was referred to an article by Scott Hanselman. Not only was this article excellent but it described a situation very similar to my own.

In Scott's article he describes a scenario whereby the copy pasted content being hosted still kept the urls pointing back to his website in the img tags within the stolen posts.

Scott's solution for the problem was to change his web.config so that when a request was made for an image from a domain other than his own, the server would instead serve a different image. That image was just haughtily worded text explaining that his article had obviously been the victim of reblogging and telling the user that basically this website sucks for doing this.

By doing things this way Scott didn't have to waste his valuable time tracking down every single reblogger and doing something about it but he still got to, at least, make life difficult for the rebloggers.
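To make the idea concrete, here is a rough sketch of the decision that kind of rule makes. Scott's version lives in web.config; this is not his configuration, just the same logic written as a plain JavaScript function. The names OWN_HOST, WARNING_IMAGE and chooseImage, and the image path, are made up for illustration.

```javascript
// Hypothetical names throughout: OWN_HOST, WARNING_IMAGE and chooseImage
// are illustrative and do not come from Scott's article.
const OWN_HOST = 'thereactionary.net';
const WARNING_IMAGE = '/images/reblogged-warning.png';

// Decide which image path to serve, given the path that was requested and
// the Referer header the browser sent (it may be missing or malformed).
function chooseImage(requestedPath, referer) {
  if (!referer) {
    return requestedPath; // no Referer at all: serve the real image
  }
  let refererHost;
  try {
    refererHost = new URL(referer).hostname;
  } catch (err) {
    return requestedPath; // unparseable Referer: give the benefit of the doubt
  }
  const isOwnSite =
    refererHost === OWN_HOST || refererHost.endsWith('.' + OWN_HOST);
  return isOwnSite ? requestedPath : WARNING_IMAGE;
}
```

Requests that come from the site itself (or a subdomain of it) get the real image, and anything embedded on another domain gets the warning image instead.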

I thought this was a really great idea, but there were a couple of problems stopping me from implementing this approach. First, I use very few images in the body of my posts, so it would be rare that this approach would benefit me without changing the way I blog!

Second, if I did this for the cases where the reblogger was rehosting my site, the impression I'd give to my users would be that my own site was shoddy and broken. Worse than that, they'd still have the reblogger's ads floating above my now broken looking site!

This was not going to do. I resolved that I was not going to be able to do much about copy pasting rebloggers for now. Useful as Scott's article may be, it was just not going to work in my instance. Instead I decided to focus my efforts on how I could go about applying similar principles to these other instances of reblogging, where my blog was effectively getting rehosted.

 

Hack the planet!

Upon looking more closely at what the reblogger was actually rehosting, I noticed that my pages themselves appeared to be getting rehosted in their entirety. All the css and bits of javascript that went into making my blog look and act like my blog appeared to still be intact. This told me that whatever I put on my blog was probably going to get rehosted by the rebloggers.

So after a bit of tinkering, trial and error I came up with the following script which I then injected into the head tag of my blog:

<script type="text/javascript">
  // Runs once the page has finished loading. If the page is not on my own
  // domain, or carries the reblogger's query string parameter, the whole
  // body is replaced with a message pointing back to the real site.
  window.onload = function() {
    if (window.location.hostname !== 'thereactionary.net'
        || window.location.search.indexOf('=somebadquerystringparameter') !== -1) {
      document.getElementsByTagName('body')[0].innerHTML =
        '<h2 class="post-title">Sorry for the hassle :(</h2>' +
        '<br />' +
        '<div>Hi There!</div>' +
        '<br />' +
        '<div>My name is Zac Braddy and I run the blog The Reactionary where we talk about ' +
        'all things React.</div>' +
        '<br />' +
        '<div>' +
        'I have a great deal of fun writing The Reactionary but unfortunately some ' +
        'companies whom I won\'t name so they don\'t get more exposure think it\'s ' +
        'ok to rehost my blog and stop me from getting the traffic ' +
        'that will help me build a community for you! Can\'t have that now can we!' +
        '</div>' +
        '<br />' +
        '<div>' +
        'If you were really hoping to read this article and join our community ' +
        'please feel free to join us ' +
        '<a href="http://thereactionary.net">over at the reactionary dot net</a>, ' +
        'yeah I had to write dot because apparently they replace my domain with ' +
        'theirs when they rehost it, pfft!' +
        '</div>' +
        '<br />' +
        '<div>See you there soon :)</div>' +
        '<div>Zac Braddy</div>' +
        '<div>The Reactionary</div>';
    }
  };
</script>


To make things even easier I've used a WordPress plugin called Insert Headers and Footers, which has allowed me to insert this script into the header of every page on my blog, including every future page and post I put up.

The net result of this is first, and most importantly, I get to look like a total 1337 hax00r b@d@$$. Another pleasant side effect is that now whenever one of my pages is served to a client, this script runs and checks the host name that the browser has sent its request to.

If the page is being served from anywhere but my domain name, or if the page is being served along with a certain query string parameter (which I worked out was being used by the rebloggers to rehost my pages), the user will see my site for a second or two whilst the page loads. Then, once it has loaded and my script fires, the magic happens.

When it fires, my script replaces the innerHTML of the body tag of the page with a message apologising to the user and providing a helpful link for the user to follow back to the safety of my site.
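If you want to try the check without loading a page, one option is to pull the condition out into its own function. The sketch below assumes a helper called isRehosted, which is a made-up name and not part of the script on my blog; the query string parameter is the same placeholder used above.

```javascript
// Hypothetical helper: the same test the script performs, pulled out into a
// function that takes the hostname and query string as arguments.
function isRehosted(hostname, search) {
  const onWrongHost = hostname !== 'thereactionary.net';
  // Placeholder parameter, as in the script above; not the real value.
  const hasBadParam = search.indexOf('=somebadquerystringparameter') !== -1;
  return onWrongHost || hasBadParam;
}

// In the browser this would be called as:
//   isRehosted(window.location.hostname, window.location.search)
```

One thing this makes easy to spot: because the comparison is an exact match, a www. prefix on the host name would also count as rehosted, so the string needs to match however the blog is actually served.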

Before we move on, I should mention that the query string parameter I am testing for in the example script above has been changed to protect the identity of the site in question. I'm not interested in defaming the company doing this; I just want my traffic back, and to show you how to get yours back too!

 

Want to know why this is nifty?

There's a couple of cool things about this approach:

  • First, it has all the benefits of Scott's approach in that I don't have to know when new rebloggers try to rehost my site; it should be largely covered.
  • Because I'm using the wordpress plugin to inject the script I get this goodness for free on all my current and future content with no input needed on my part.
  • It's fairly extensible in that if I find other cases that get around my current rules I can just add another case into my if() statement and then I'll cover off that hole too.
  • There's very little that a reblogger can do to combat this. The reblogger would have to either turn off javascript for the page or search the head tag for scripts that use the window.onload function. Regardless of the approach, they'd risk breaking the page altogether, and whilst I doubt they care about breaking your content, they do care that if they break your content and no one wants to read it anymore then they get no ad revenue.
  • Speaking of ad revenue, this solution also stops the reblogger's ability to make ad revenue off your content, because by blowing away the body tag you also blow away all the ads that were on the page.
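On the extensibility point, one way to keep the if() statement from growing is to hold the cases in a list. This is only a sketch of that idea: rehostRules and looksRehosted are hypothetical names, and the two rules shown are just the ones already in the script.

```javascript
// Hypothetical rule list: each rule receives { hostname, search } and
// returns true when the page looks rehosted.
const rehostRules = [
  (loc) => loc.hostname !== 'thereactionary.net',
  (loc) => loc.search.indexOf('=somebadquerystringparameter') !== -1,
  // Add a new rule here when a reblogger finds a way around the others.
];

// True as soon as any one rule matches.
function looksRehosted(loc, rules) {
  return rules.some((rule) => rule(loc));
}

// In the browser: looksRehosted(window.location, rehostRules)
```

Covering a new hole then means adding one line to the list rather than editing the condition itself.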

 

Even Superman's got kryptonite

This approach certainly isn't infallible. There are definitely ways in which it can be refined and made better.

 

You could...

Make a second script that could be embedded within a hidden div in the body of your posts. This would then be sitting there ready to be inadvertently copy pasted by the copy paste rebloggers.

Once copy pasted the script could do something similar to the one above except rather than blowing away the body tag maybe doing something else to otherwise hide the content.

But then...

The downside to this would be that other people who wanted to copy paste a quote of yours might also get this script injected into their content, which wouldn't be very nice.

You could...

Make it so that the url you provide to the user takes them directly to the post they had wanted to view on your blog.

But then...

The problem with this would be that I would have to sanitise the url to ensure that I didn't inadvertently XSS myself.
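For anyone who does want to go down this road, here is a minimal sketch of what that sanitising might look like. The names CANONICAL_ORIGIN, escapeHtml and buildSafeLink are invented for this example, and it assumes the literal domain in the script survives rehosting, which (as the "dot net" workaround in my message suggests) may not always be true.

```javascript
const CANONICAL_ORIGIN = 'https://thereactionary.net'; // assumed canonical origin

// Escape the characters that matter inside HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build a link to the same post on the canonical site. Only the path is
// reused; the query string and fragment of the rehosted page are dropped.
function buildSafeLink(pathname) {
  const target = new URL(CANONICAL_ORIGIN);
  target.pathname = pathname; // changes the path only, never the host
  const href = escapeHtml(target.href);
  return '<a href="' + href + '">Read this post on The Reactionary</a>';
}

// In the browser: buildSafeLink(window.location.pathname)
```

The two safeguards are that the host always comes from the constant rather than from the current page, and that the finished url is HTML-escaped before it is concatenated into the replacement markup.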

You could...

Spend some time to make the replacement page look a little nicer.

But then...

I'd just end up having to worry about what CSS I'd be clashing with that was already on the reblogger's site.

That'll do pig, that'll do..

After taking each of these "improvements" and their costs into consideration I decided that as far as this solution was concerned I'd already received 80% of the benefit I was going to receive. Really, the effort required to improve it in a robust manner just wasn't worth it.

After all, this solution achieved my goals. It makes my site completely useless to a reblogger trying to rehost my pages. Their ads are blown away and ultimately the user is just pointed back to my site.

From what I've seen from my wordpress stats, I'd say that the solution is obviously robust enough as it is. More than that it appears to be providing a good enough user experience that people aren't deterred from clicking through to my site.

A good UX is a great bonus too, considering the fact that UX isn't really my goal. I'm not out to punish the users of rebloggers' sites, it's not really their fault, but at the same time I don't want the experience of consuming my site through a reblogger to be a completely comfortable one either.

I want to make it simple enough for the users to get to my original content, but complicated enough that they would just prefer to go to my site directly to consume the content because it's less hassle. I think I've achieved this happy medium with this solution.

 

Happy Hacking

I hope you've found this solution useful. At the very least I hope it's inspired the creativity that you need to come up with your own similar hack to retrieve your audience from the clutches of rebloggers.

It's been a pleasure writing for you here on codeshare.co.uk and I look forward to returning in the near future with an article about my chosen topic, React. Or maybe you might like to come and interact with me sooner over at thereactionary.net, seeing as you've read so much about it now!

If you'd like to contact me about this post or about anything React related then do head over to The Reactionary to find my social media links as well as a heap of articles and resources on React.

Till next time, have fun Re-re-blogging :)

Zac Braddy

Zac Braddy is a professional full stack .NET web developer. Other full stack developers believe Zac has some form of mutant powers because of his passion and talent with front end technologies, particularly React.js. When he manages a spare second away from technology you'll find Zac pwning n00bs online or belting out a cracking (albeit discordant) melody with his D&D character, a Half-elf bard named Medrik.
