Orkut Banned in India

“Orkut Banned in India” – Will this ever become a real headline? It could, as various political groups are preparing moves to make it happen. In India power still has power, and political parties are synonymous with power, yet surprisingly we are still the largest democracy. Orkut has been in controversies for various reasons; my previous post lists some of them: http://www.idealwebtools.com/blog/orkut-scams/.

Today, while going through Times Now, I ran across their major coverage, “Controversy over Orkut ban”. They were cashing in on the popularity of Orkut by taking live opinions from various parts of India (in Kolkata they were at Flury). The news says,

The popular internet networking site Orkut was recently under attack by Shiv Sena workers, who went on a rampage after disparaging remarks made against Shiv Sena Supremo Bal Thackeray & Marathi icon Shivaji were posted on the site. The Sainiks vandalised a cyber cafe in Thane, Mumbai and demanded a ban on Orkut because of the remarks posted on one of its forums, essentially group blogs that allow people to write and post their views on any topic – without censors.

Here is my take on the controversial issue :-

  1. Orkut is not a site; it is a culture, a concept. Even if someone bans the major instance of the Orkut concept, orkut.com, another instance with equal power will be born.
  2. The Orkut concept is blended with the social web (web 2.0) and is inseparable from it. To ban the Orkut concept, one would have to ban the social web.
  3. Is there any communication model that doesn’t carry cons? Aren’t mobile phones helping terrorists?
  4. “The Sainiks vandalised a cyber cafe in Thane”????? What did the cyber cafe guy do? Who will repay his losses? Even after this you expect people to love you, not hate you? People reserve the right to love or hate anyone. If there are hate groups then there are bigger love groups as well; see for yourself at http://www.orkut.com/UniversalSearch.aspx?q=Bal+Thackeray&pno=1&searchFor=C or http://www.orkut.com/UniversalSearch.aspx?q=Shivaji+&pno=1&searchFor=C. Still, people reserve the right to express their opinion. There is a saying, “Greatest loved elements are highly hated too”.
  5. Isn’t the social web empowering the true government, the people? I see more pros than cons. Now people are THE power.

How to tackle it?

  • Stop vandalizing the wrong people. If this carries on, a much-connected society can show its true power too; remember the blogspot ban.
  • At a proactive level, set up a committee to look at the bigger picture and keep the web as clean as possible. Greater awareness and better online laws are needed.
  • At a reactive level, ask Orkut to ban the group.
  • Look at the 100 positive things about Orkut and forget about the one or two bad ones; that happens everywhere :). Enjoy Orkut.

Similar ban proposals for other social web sites

  • Youtube could be banned in India due to a supposedly offensive video of Mahatma Gandhi doing a pole dance and other un-Gandhi-like antics. The video features a comedy skit by a Hawaii-based Non-Resident Indian, Gautham Prasad, who has now apologized to the people offended by the video.
  • Blogs were banned in India over misuse; the ban was later rectified.

What is your take on it?

Other related posts

  1. Chirkut Vs Orkut – My first post on orkut.
  2. Is Orkut’s Memetic strength a misinterpretation? – About Orkut scrap book and Orkut’s misinterpreted strength is finally interpreted.
  3. A list where orkut is used for drug/pic-scam/hate-groups.
  4. social networking flood

Common Blog API Access URLs

If you are a programmer working on blog products, this post will make a lot of sense to you. I am (and was) working on a few products for the blogosphere, and our programmers had to deal with a mix of API types across different blog systems. I found a cool document by Google, and here it is for all our new programmers. It will act as a reference doc for me too. I am taking a day off tomorrow, so expect a lot of changes and posts from me.

Blog System     API                        URL
Blogsome        Blogger                    http://YOURBLOG.blogsome.com/xmlrpc.php
Conversant      MovableType                http://YOURBLOG/RPC2
Drupal 4.4 +    MovableType                http://YOURBLOG/PATH/TO/xmlrpc.php
GeekLog         Blogger                    http://YOURBLOG/blog/
JRoller         MetaWeblog                 http://www.jroller.com/xmlrpc
Manila          MetaWeblog                 http://YOURBLOG/RPC2
MovableType     MovableType                http://YOURBLOG/PATH/TO/mt-xmlrpc.cgi
Nucleus < 2.5   MetaWeblog                 http://YOURBLOG/PATH/TO/nucleus/xmlrpc/server.php
Nucleus 2.5 +   MovableType                http://YOURBLOG/PATH/TO/nucleus/xmlrpc/server.php
PLog            MetaWeblog                 http://YOURBLOG/xmlrpc.php
pyblosxom       MetaWeblog                 http://YOURBLOG/PATH/TO/cgi-bin/pyblosxom.cgi/RPC
pMachine        Blogger                    http://YOURBLOG/pm/pmserver.php
Quick Blog      MetaWeblog                 http://YOURBLOG/MetaWeblog.aspx
Roller          MetaWeblog                 http://YOURBLOG/xmlrpc or http://YOURSITE/root/xmlrpc
Serendipity     MovableType                http://YOURBLOG/serendipity/serendipity_xmlrpc.php
TextPattern     MetaWeblog                 http://YOURBLOG/PATH/TO/textpattern/xmlrpcs.php
TypePad         Blogger                    http://www.typepad.com/t/api/xmlrpc.php
Typo            MetaWeblog or MovableType  http://YOURBLOG/backend/xmlrpc
WordPress       MovableType                http://YOURBLOG/PATH/TO/xmlrpc.php
Xaraya          MovableType                http://YOURBLOG/PATH/TO/ws.php?type=xmlrpc
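
To give new programmers a feel for what talking to one of these endpoints looks like, here is a minimal sketch in Python. It only builds the XML-RPC request body for a metaWeblog.newPost call and prints it; nothing is sent over the network. The blog id, user name, password and post text are made-up placeholders, and the function name build_new_post_request is mine, not part of any blog system.

```python
import xmlrpc.client


def build_new_post_request(blog_id, username, password, title, body, publish=True):
    """Return the XML-RPC request body for a metaWeblog.newPost call."""
    # MetaWeblog describes a post as a struct; "description" carries the body text.
    post = {"title": title, "description": body}
    params = (blog_id, username, password, post, publish)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


# Placeholder credentials and content, for illustration only.
request_xml = build_new_post_request("1", "demo-user", "demo-pass", "Hello", "First post")
print(request_xml)
```

To actually post, you would point xmlrpc.client.ServerProxy at the URL from the table for your blog system and call the same method on it. Systems listed under the Blogger or MovableType API expect different method names and arguments, so treat this as a starting point only.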

Is kitchen sink syndrome a risk for web products?

This belongs to the list of frequently asked questions: is kitchen sink syndrome a risk for web products? Kitchen sink syndrome is considered a risk in project management. Oops, btw, kitchen sink syndrome means unplanned changes to a project. It mainly refers to changes at the scope level, and thus it is sometimes also known as scope creep. Wikipedia offers more about it.


It is considered a risk because:-

  1. It often results in cost overrun.
  2. It can also result in a project team overrunning its schedule.

It may not be true for web products because:-

  1. The cost involved is very low.
  2. Time consumption is also very low, if suitable open source classes are used.

I have personal experience of kitchen sink syndrome proving very profitable. On the web we go with release early, release often. I remember Hedir.com, which started as a normal directory and during development became a community review center. Scope creep usually comes from unplanned and unmanaged actions, but web products often enter this cycle because they are managed by a very small and very independent team. Spending time on documentation/planning sometimes doesn’t work (and is not needed either). I am leaving it as an open, debatable issue.

WordPress database error: [Got error 127 from storage Engine]

I know I have not been blogging for the last few days; I got busy with new recruits. After almost a year (the last time was when we mass-hired a few programmers and marketing guys from Army Institute of Management and other B-schools) I revamped the whole mentoring program. I like this mentorship program as it has the right blend of tech and non-tech dimensions. I will post more about it.

Meanwhile I was getting this error:
WordPress database error: [Got error 127 from storage Engine]
SELECT COUNT(comment_ID) FROM wp_comments WHERE comment_approved = 'spam'

Error 127

Initially I thought it could be an Akismet-related error and gave it a day. Today, after a good 8-hour sleep, I searched for Error 127 and found: “Error 127 indicates a record has crashed”. When there is a crash you need to do a repair.

How to repair WordPress database error

If you are a non-techy guy (or want a simpler solution),

  1. Go to cPanel.
  2. Open MySQL Databases.
  3. Find your database.
  4. You will see a button named Repair under your blog database; click on it and it will take care of the rest.

If you are a little techy,

  1. Use phpMyAdmin.
  2. In the main panel, you should see a list of your database tables. Check the boxes by the tables that need repair.
  3. At the bottom of the window, just below the list of tables, there is a drop-down menu. Choose “Repair Table”.

You can even do it manually using REPAIR TABLE `wp_useronline` etc. I searched for similar errors and saw many people going through the same phase. Hope this article helps.
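
For the manual route, here is a small sketch that prints the statements to paste into the MySQL prompt or phpMyAdmin's SQL box. It only builds text and does not connect to any database. The helper name repair_statements and the choice to run CHECK TABLE before REPAIR TABLE are my assumptions; the table names are the ones that appear in this post.

```python
import re

# Only allow plain identifiers so a typo cannot turn into broken SQL.
_TABLE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def repair_statements(tables):
    """Build CHECK TABLE / REPAIR TABLE statements for the given table names."""
    statements = []
    for table in tables:
        if not _TABLE_NAME.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f"CHECK TABLE `{table}`;")   # reports whether the table is damaged
        statements.append(f"REPAIR TABLE `{table}`;")  # rebuilds a crashed table
    return statements


for line in repair_statements(["wp_comments", "wp_useronline"]):
    print(line)
```

Take a backup of the database before running any repair, just in case.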

Happy Bday to Grmtech

(I am not associated with Grmtech anymore)
It was 19th May again, the time to celebrate Grmtech’s Bday. I got busy after that with all the server audits, but better late than never. So here we go with the report on Grmtech’s 4th anniversary. This year we had more space to arrange the party as we have shifted to our own office at AE665.

I need to rush to the office, so here is a summary of the party:-

  • There were 10 Maharajs (cooks) for the day.
  • A local music band was there too, singing a mix of Bengali and Hindi songs. I liked most of the songs except Himesh’s; you certainly need a special talent to sing his songs.
  • A cake worth a cake, full of chocolate. It was specially made with the Grmtech logo on it.
  • A series of games including treasure hunt, musical chairs and so on.
  • A special dance party; I did a Google dance too.
  • Then there were chocolates for the winners and Khanna for the losers.

Here are some of pics (and some nostalgic moments)

Mistakes are often mistaken

What are mistakes? Can small mistakes be ignored? Are bigger mistakes real mistakes? What is a tolerable limit for mistakes? These are questions you deal with while mentoring/monitoring/evaluating people. Here are some of my experiences with (real) mistakes.

Before proceeding further, let’s try to define mistake. Wordweb says, “A wrong action attributable to bad judgment or ignorance or inattention is a mistake.” Wow, what a perfect but, from a practical perspective, still wrong definition of mistake. By the end of the post it will become clearer why I agreed to disagree with the above definition. IMO, a mistake is :-

  • Repeating an action even after recognizing it as a wrong action.
  • A wrong action committed with self permission for self gain. (All shortcuts to fame and name can come under it.)
  • Being ignorant or inattentive with self permission, resulting in a wrong action.
  • Not accepting/not recognizing the right cause of a wrong action.
  • Unwillingness to learn the bigger lesson taught by previous similar actions, resulting in another related wrong action.

Guide for Mentors/Project Managers/Leaders/Management/..

I have seen people creating lists of mistakes made by their juniors, staff and team members for evaluation, mentoring etc. Sometimes we list/count mistakes that are not mistakes from a practical perspective. Before I put more words under this title, here are a few thoughts that I strongly believe in:-

  1. No sincere person (who is committed to the work and aims at the same mission, if not vision) likes to commit mistakes, but mistakes are inevitable for independent people. “Anyone who has never made a mistake has never tried anything new,” says Albert Einstein.
  2. A mistake often has different dimensions; one dimension can make it look very ugly while another can show its beauty. Our junior programmer deleting the whole code while trying different commands is an example: he just wanted to learn and never wanted to delete anything.
  3. Often gaps (knowledge gap, vision gap, power gap, communication gap) are responsible for mistakes. Maybe we can call these mistakes due to the system.
  4. Every senior is responsible for mistakes made by a junior/colleague. The learning/correction has to be recursive (deeper in the system).

So as a mentor it is very important to understand the reason behind a mistake. Such mistakes can also help us design a better system (we shifted to SVN when we encountered code deletion by a dedicated junior programmer). Discouraging mistakes can discourage attempts, and thus the chances of success too. I often make stupid mistakes but never regret being stupid, as I learn wiser things from my own stupidity. If I am often defeated by myself then I am surely on the right path to success.

Right course of action for Wrongs

I promised my parents and myself to get at least 6 hrs of sleep every day, so I will conclude my post here. There is no rule written in stone that can help us define the right course for mistakes. The following thoughts can help us take a better course of action:-

  • Take a course of action aiming at a result. (For repeated mistakes by a team member: when you know you can’t fire that person, define your course of action accordingly. When you know that you will be needing a new person anyway, define your course of action accordingly.)
  • There will be bad apples everywhere, but don’t give up your ethics/character for them.
  • Tolerate mistakes but never tolerate a bad attitude. A bad attitude is like a rotten root; you can’t expect a fruitful tree out of a rotten root.
  • Communicate as often and as quickly as you can. 90% of mistakes happen due to communication gaps, so communicate to transfer knowledge, to transfer vision.
  • If a person is committed and shares the same vision then he/she can be molded rightly.

Good night. I still have some 7 posts in draft, so keep watching.

Below the Top command

The top command is certainly on every system admin’s list of frequently used commands. As the man page says, “The top program provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel.” This is simple but still not well explained. I was learning to install munin-node when I decided to read more about every part of top’s display.


Top command
(This is a screenshot of top run on idealwebtools.com, a shared server)

Let’s look at each section.

The first section – Uptime

top - 13:46:02 up 1 day, 14:27,
Starting from the left, 13:46:02 is the current time, which you can also get with
aji@sawyer [~]# date
Wed May 23 13:46:23 EDT 2007

The next part shows the server uptime; it is important that servers can run for many hundreds of days without a restart. You can also check it with
aji@sawyer [~]# uptime
13:48:09 up 1 day, 14:29, 1 user, load average: 2.38, 1.63, 1.62

The second section – Active User

So we have 1 active user, nothing more to say here.

The Third section – Load Average

This is a very important piece of information.
load average: 2.38, 1.63, 1.62
As most explanations tell you, the three values represent processor load averaged over the last 1 minute, 5 minutes and 15 minutes, respectively. Each is the average number of processes that were running or queued awaiting processor service during the given period. Many feel that fewer than 1 queued process per processor is good. Some feel a server can handle 10 queued processes per processor. I would still recommend keeping it as low as 1 per processor; that is certainly achievable. A single reading does not give you the complete picture, as you need to poll it again and again to see the trend. Various applications are available that run a cron job to poll it at a fixed interval. You can read more about load average at http://www.teamquest.com/resources/gunther/display/5/index.htm. If you have some time you can even write a small script and run it every 5 minutes using cron. Load average is very important data to have.
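
As a rough idea of what such a small script could look like, here is a sketch that pulls the three load averages out of an uptime line and divides them by the number of processors. The function names are mine, and cpu_count=2 is only an assumption for the example; use your server's real processor count.

```python
import re


def parse_load_average(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from a line of uptime output."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())


def load_per_cpu(loads, cpu_count):
    """Scale each load average by the processor count (the 'per processor' figure)."""
    return tuple(round(value / cpu_count, 2) for value in loads)


# The uptime line shown earlier in this post.
line = "13:48:09 up 1 day, 14:29, 1 user, load average: 2.38, 1.63, 1.62"
loads = parse_load_average(line)
print(loads)
print(load_per_cpu(loads, cpu_count=2))  # cpu_count=2 is an assumed value
```

On a live Unix box, Python's os.getloadavg() returns the same three numbers directly, so a cron job could call that every 5 minutes and append the result to a log file to see the trend.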

Fourth section – Tasks

The next section shows the task details
Tasks: 194 total, 2 running, 192 sleeping, 0 stopped, 0 zombie
If many tasks are in the running state, analyse what they are. Tasks shown as running should more properly be thought of as ‘ready to run’. If you want to read more about zombie tasks, please visit http://www.ussg.iu.edu/hypermail/linux/kernel/0212.1/0864.html. The rest is quite obvious; also look at the processes that are running and kill any unwanted ones.

Fifth section – CPUs

Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
What do these things mean? Here is a small explanation of each field :-

  1. us -> User CPU time: The time the CPU has spent running users’ processes that are not niced.
  2. sy -> System CPU time: The time the CPU has spent running the kernel and its processes.
  3. ni -> Nice CPU time: The time the CPU has spent running users’ processes that have been niced.
  4. wa -> iowait: The amount of time the CPU has been waiting for I/O to complete.
  5. hi -> Hardware IRQ: The amount of time the CPU has been servicing hardware interrupts.
  6. si -> Software interrupts: The amount of time the CPU has been servicing software interrupts.
  7. id -> Idle: The time the CPU has spent idle.
  8. st -> Steal: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.

This shows a breakup of CPU usage; depending on your server’s role, you need to optimize it. If you have a lot of disk writing, keep a watch on iowait. You might be wondering what “The time the CPU has spent running users’ processes that are not niced.” means. If you do a “man nice”, it will say “nice – run a program with modified scheduling priority”. It is called “nice” because the number given to a process determines how willing the task is to step aside and let other tasks monopolize the processor. The number varies from -20 to 19. The default value is 0; higher values lower the priority and lower values increase it. If you want to read more about nice, visit http://wiki.linuxquestions.org/wiki/Nice.
When you run top, it shows the NI value for each process:
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
17578 root 15 0 13456 13M 9020 S 18.5 1.3 26:35 1 rhn-applet-gu
19154 root 20 0 1176 1176 892 R 0.9 0.1 0:00 1 top
1 root 15 0 168 160 108 S 0.0 0.0 0:09 0 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
4 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 35 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
9 root 15 0 0 0 0 SW 0.0 0.0 0:07 1 bdflush
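
If you want to log the CPU breakup instead of watching it, here is a hedged sketch that turns the Cpu(s) summary line shown earlier into a dictionary keyed by the two-letter codes. It assumes the "0.0%us" style this version of top prints; other versions of top format the line differently, so the pattern may need adjusting. The names parse_cpu_line and FIELD_NAMES are mine.

```python
import re

# Human-readable labels for the two-letter codes explained above.
FIELD_NAMES = {
    "us": "user", "sy": "system", "ni": "nice", "id": "idle",
    "wa": "iowait", "hi": "hardware irq", "si": "software irq", "st": "steal",
}


def parse_cpu_line(line):
    """Map each two-letter code in a 'Cpu(s):' line to its percentage."""
    pairs = re.findall(r"([\d.]+)%(\w\w)", line)
    return {code: float(value) for value, code in pairs}


line = "Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"
usage = parse_cpu_line(line)
for code, value in usage.items():
    print(f"{FIELD_NAMES.get(code, code):>12}: {value:5.1f}%")
```

On a disk-heavy server you would mostly watch usage["wa"]; on a virtual machine, usage["st"].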

Sixth section – Memory

Mem: 1536000k total, 1437272k used, 98728k free, 234212k buffers
Swap: 1020116k total, 72k used, 1020044k free, 567208k cached

This is pretty much self-explanatory. You can also run free -m to get a different view:

free -m
total used free shared buffers cached
Mem: 1500 1403 96 0 228 553
-/+ buffers/cache: 620 879
Swap: 996 0 996

This is RAM and swap. If we recall the memory classes we had during post graduation, there are different types of physical memory –

  • CPU registers – These are the fastest; they are like your hands, which do tasks in the fastest way but can hold very little.
  • CPU cache – This is like your office desk, a quickly accessible location.
  • RAM – Random Access Memory – It is like your office; you will have to walk around to get the work done.
  • Disk – This is like a different location altogether, so you will have to do a lot of traveling to get the work done. Swap is basically an area of the disk used when RAM itself is not sufficient. Swap is usually kept on a separate partition (not necessary, you can use a swap file instead) so that the OS can make access as fast as possible.

If your server uses a lot of swap often, you need to look into it, as it will make your server slow. We try to use swap as little as possible. Swap cached means written to swap but still in memory. The OS anticipates memory needs and pre-swaps inactive data, but keeps it in memory.

(SwapTotal - SwapFree - SwapCached) is the actual swapping (memory that will need to be read from disk)
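
The arithmetic behind the “-/+ buffers/cache” line and the swap formula above can be checked with a few lines of Python. This is only a sketch using the numbers from the top output in this post; the function names are mine, and the SwapCached value of 0 is an assumption since top does not show it (you would read it from /proc/meminfo).

```python
def buffers_cache_view(used_k, free_k, buffers_k, cached_k):
    """Return (MB used by applications, MB effectively available), as in free -m."""
    used_by_apps_k = used_k - buffers_k - cached_k  # buffers and cache can be reclaimed
    available_k = free_k + buffers_k + cached_k
    return used_by_apps_k // 1024, available_k // 1024


def actual_swap_k(swap_total_k, swap_free_k, swap_cached_k):
    """SwapTotal - SwapFree - SwapCached, in kB."""
    return swap_total_k - swap_free_k - swap_cached_k


# Values from the Mem: and Swap: lines shown above.
apps_mb, available_mb = buffers_cache_view(
    used_k=1437272, free_k=98728, buffers_k=234212, cached_k=567208
)
print(apps_mb, available_mb)  # 620 879, matching the -/+ buffers/cache line of free -m

# swap_cached_k=0 is an assumed value for illustration.
print(actual_swap_k(swap_total_k=1020116, swap_free_k=1020044, swap_cached_k=0))  # 72
```

So although top reports only 98728k free, about 879 MB is really available to applications once buffers and cache are counted.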

Few more commands and reference for help

  1. Look at vmstat (do a man vmstat)
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
    r b swpd free buff cache si so bi bo in cs us sy id wa st
    0 0 72 291196 236744 561308 0 0 15 23 6 42 2 0 97 0 0
  2. You can also try the Sysstat suite of resource monitoring tools.

A very big post for the day, Enjoy bottoms up for top.

Open letter to Indian Mag editors

This is an open request to all Indian tech magazine editors. I would like to share my experience with web and online business for Indian startups/bloggers/web+aspirants through a quality tech magazine. Please let me know; I will be more than happy to start next month itself. Many are insisting that I start the series on this blog itself (sorry fellows, this time I would like to give first priority to a magazine; if it doesn’t work out, BLOG Jindabad).

About me

I have been in the web business for the last 4 years. 4 years back, when I was doing my Masters in Computer Application (a techno-manager course including almost 1.5 years of MBA; I was very lucky to be a part of it) at Army Institute of Management, Kolkata (then National Institute of Management), I turned down big offers in order to join a startup company working on web products. I then devoted almost 15,000 hrs to online business. As a technology leader, marketer, part-time HR manager and all-time student, I have been able to guide many startups/blogs/web+aspirants to success. I waited 4 years to see this boom in the Indian web market; now I have the permission/time to write one page per month for mags. Recently we conducted the Indian Business School Blog hunt http://inferno.aimk.org/bloghunt/finalrank.php and I was very happy to discover some great blogs. My blog has more about me.

Why I want to write?

IMO people have seen the early online harvest phase in the US and UK. From the Googles to the Amazons, most success stories are very inspiring, but we Indians are still missing some core characteristics of the web: simplicity, the purple cow concept and the backbone technology. We are also becoming more commercial. I hardly see any Indian company doing hardcore web research. Good web products will not come from copying successful concepts alone but from core research.

I am basically into crowdsourcing. I would like to write about the following issues

Guide for Internet Gold rush

  1. Issue 1 -> Don’t let your parents name you again (choose the right domain name based on your vision). Here I would like to talk about various aspects of domain name selection, including the phone test and the billboard test, and how important it is to book other related domains (topix.net bought topix.com for 1 million dollars). Choosing the right domain is very important and this issue will cover most of it.
  2. Issue 2 -> Web business is a pure brain business (make sure you are not washed away by the Indian manpower quest). Here I would like to talk about how important brain power is for web business. Not all companies can find the gold nuggets; you need the right brains. In India we need the right brains getting the right training and exposure.
  3. Issue 3 -> Choosing the right platforms.
  4. Issue 4 -> How network is more important than work.
  5. Issue 5 -> Which communication platform to choose? – The Internet has defined and redefined communication, from emails to websites, from blogs to wikis, from forums to Google Docs. This issue will help you choose the right communication platform.
  6. Issue 6 -> Haunted by SEO experts? Watch out. The basics of SEO.

The issues will carry on. I hope Indian entrepreneurs/bloggers/web+aspirants will get good help from my experiences through your mag. I am looking for a quote on the following:

  • Can I get a regular section every month (just one page) with proper branding? I can also answer 2 queries per month (so maybe 1.5 pages per month). This needs to be a regular affair; I want people to wait for this section every month.
  • How much will I be paid for each issue, with questions and without questions?

Looking forward to a long relationship with one magazine.

Regards,
Aji Issac
919830271197

timesjob(s).com violating Google rule

While doing some job portal analysis I was surprised by some job portals’ ignorance (can I call it web innocence?). While the world is talking about higher-level canonicalization, Indian portals are violating basic web (Google) guidelines.

First violation – Google ads in email

This was not done by timesjobs.com but by another big job portal. I have posted the query on WMW to get expert views on this violation. Let me quote the post,

I just got an email from a very big job portal, they had adsense inside the email. I knew it is illegal to have adsense in email (Google book sec 5.v), so I clicked on it to see the effect.

I saw that these ads were not from Google (but were labeled as “Ads by Google”); they were links to pages created for Google adsense. Pages with no content, just plain ads. Also very targeted ads, so I suspected the use of google_kw (I think they are premium members, so they might be allowed to use google_kw). Also these pages were hosted on an IP based site without any domain name.

Can premium members create such pages which are

  • without any content, only ads, targeted based on google_kw?
  • hosted on an IP based site rather than a branded domain?
  • (sending such links through emails using Google’s name?)

Browse the complete discussion at WMW.

I did forward the suggestion with the WMW link to the job portal’s representatives, but I am yet to receive any explanation. I do not intend to forward this to Google, as my intention is to help, not to harm any startup business.

Second violation – Duplicate domains by timesjob(s).com

I was very surprised to encounter such a basic mistake by one of the top Indian sites. timesjob.com and timesjobs.com (one with an additional “s”) have the same content. It is a complete mirror site; I signed up at one and was able to log in to the other. Both also share the same DNS server (making it a little more against the rules). The strangest thing was a common cache for both sites: they both share the cache of timesjobs.com. I was very curious to know the reason behind it (it could be a bug with Google). I forwarded this query to some senior fellows at WMW and Google. Here are some of the responses.

A WMW senior (whom I respect very much for his great experience)

Both domains do resolve to the same content, and without any redirect. There is no root listed for timesjob.com in google. I can see why they got it wrong, but they did get it wrong. I meant that site:timesjob does not show the domain root. I’ve seen this before when sites do not do a proper redirect and just source the same content for two domain names, both with a 200 status. I’m not behind the scenes at Google, but I’m guessing it’s something in their duplicate content checking that crosses this up.

When I forwarded it to some Googlers, here is the (casual and informal) response from them

Ah! Okay 🙂 My guess is that there previously was a redirection in the past, and since the pages are still identical, our bot hasn’t corrected itself. Not high priority since both caches would look the same :P.

But just in case, I’ll ask a colleague to check it out.

It is always advisable not to have different domains with duplicate content, for the following reasons:-

  • Duplicate content issues, which can cause big-time problems for both sites.
  • Unwanted link distribution: some sites may link to timesjobs.com and some may link to timesjob.com, which can dilute the real strength. Link strength is very important to withstand competition on the web. Why take such a chance?
  • IMO it doesn’t help in branding either. Users may not appreciate it.

Search engines (esp Google) are important as they are considered the web’s starting point.
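
The “proper redirect” mentioned in the WMW reply usually means answering requests on the extra domain with a permanent (301) redirect to the main one instead of serving the same pages with a 200 status. Here is a minimal sketch of that decision in Python. Which domain is treated as canonical, the alias list and the function name are all my assumptions for illustration; a real site would normally do this in the web server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "timesjobs.com"                      # assumed primary domain
ALIAS_HOSTS = {"timesjob.com", "www.timesjob.com"}    # assumed duplicate domains


def canonical_redirect(url):
    """Return (301, new_url) for an alias host, or (200, None) if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALIAS_HOSTS:
        # Keep scheme, path and query; swap only the host.
        target = urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment))
        return 301, target
    return 200, None


print(canonical_redirect("http://timesjob.com/jobs?q=java"))
print(canonical_redirect("http://timesjobs.com/jobs?q=java"))
```

With a 301 in place, links pointing at either domain end up counting for one site, which addresses the link dilution point in the list above.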

How to handle valuable spam Comments/Posts?

Spam is spam, how can it be valuable? OK. Xens, munins, SATAs and a week-long server audit are not hitting me hard; I really mean valuable spam comments/posts. Please take a deep breath as this post is going to be really long. I recently took a 30-minute class on this topic.

Spams are spams then what are valuable spam comments/posts?

In one sentence: spam by a spammer is pure spam, but spam by a valuable member is a valuable spam comment/post. Let me take a real (and personal) example. As I mentioned earlier, I used to (and still do) visit blog.penelopetrunk.com (see the comments added by me; IMO they are really valuable, and if not, add a comment here to contradict 🙂). I used to spend almost 30 mins reading and adding comments (adding used to take more time than reading), but one day I commented on a particular post on Coachology with Laura Allen of 15secondpitch.com. Actually, I found 15secondpitch.com very impressive, so I created a pitch for myself and added it there for review. I never wanted to spam or add any irrelevant comment. I thought Laura and Penelope could review my pitch and suggest changes right there. Somehow the comment got deleted and I really felt bad about it. I have no issue with one or many of my comments being deleted, but it can make you feel bad if you are not notified (especially when you are taking time out to add value to someone’s site; I could have saved a copy of the comment for some other purpose, and I seriously do not want to waste my time or anyone else’s). I have no issues with Penelope or her blog :). I got a mail yesterday and I replied too, but for a few days I did not add any comment (maybe I am too busy with the recruitment and server audits). It certainly broke the flow I had with the blog.

That was my personal experience with a valuable comment, but I had similar experiences while handling a few forums. We have seen valuable members (with hundreds of posts) leave a forum when their posts got deleted (we deleted them because of duplicate content etc). I hope I was able to explain the concept of “valuable spam”.

Why to worry about valuable spams?

I have been a part of (handled, guided) almost a dozen forums by now. The initial phase is a very crucial one, where the forum needs to find participating members. Even one participating member can make a lot of difference. We had articles by a member, who later left because of a similar issue, that fetched us a great deal of visitors from search engines. His articles were very helpful and ranked very high on search engines. During the initial phase of the forum they helped us grow. Also, a member is a node of the social network, and a node is not one member/user but a chain of users. It is always good to play as safe as possible.

Who all should worry more about valuable spam comments?

Everyone should (if they can) handle it with as much care as possible, but here is a list of those for whom it is more important :-

  • Forums/blogs in their initial phase should give it more weight, as they need more active users to become a success.
  • Niche forums/blogs, where you can sometimes lose great and influential users.

How to handle valuable spam comments/posts?

There are various ways, so let me list them one by one :-

  1. Method 1: Permissive delete – Some 2 years back, when I used to work with forums (full time), I encountered a similar issue where a valuable member used to post articles (duplicate content). I sent a mail to the member describing the issues involved. He was very co-operative and allowed me to delete the posts. He was posting the articles just to share them with the members, without any intention to spam. All of the articles belonged to him. In permissive delete, involve the member and help him/her understand the issues involved in their own terms (I mean non-techy terms). This helps in building a better relationship too. (Also see method 4 for duplicate content.)
    Pros: The user will not feel bad and may become more loyal.
    Cons: Since the thread/post might stay longer in the forum, other members and moderators may not like the approach and may complain or create issues under the thread itself.
  2. Method 2: Delete with personal notification – After deleting the thread you can let the user know personally in a well-drafted way. Here is an example: some months back I got a sticky from a WMW moderator,

    I appreciate all your intelligent contributions here, Aji.

    Please don’t worry about it if I decide not to publish a thread once in a while. It’s just part of my job to make that judgement call.

    Then we started talking more often. People really appreciate personal communication.
    Pros: The user may appreciate the personal communication and may try to cooperate too. It will not reflect any rudeness on the forum admin’s part.
    Cons: The user may still not like it and may debate the issue, so it is advisable to keep the thread/post in an incubator instead of deleting it completely. There are occasions when the user has not communicated after that. It is also a time-consuming task.

  3. Method 3: Replacing the spam with a system warning message – This is a great way of handling valuable spam comments. I learned this from Amazon. You can replace the spam comment with a message,

    “Our Forum/Blog bot(application) detected this message/post/comment as a probable violation to our guidelines(link to guidelines, may be specific section of guidelines which can explain the violation). Sorry for the inconvenience. Please note that sometimes our system can make mistake, in such cases please inform admin(link to admin email). Thanks for your co-operation.” (something like this, sorry I can’t post the exact message we use)

    This program can be automated with the help of community members, and thus it is a more customer-forgiving system.
    Pros: It is a more customer-forgiving system. People will forgive/blame the system, and in many cases they will also talk to the admin, making it more effective.
    Cons: Some people may find it very odd.

  4. Method 4: Edit/delete with admin message – This is a very common practice where the post is deleted and an admin’s message is left in its place: “Admin: Deleted the post at so and so time, read TOS”.
  5. Method 5: Making the duplicate content an image – This was suggested by one of our project managers. Some forums convert duplicate content (another form of valuable spam) to an image to avoid any penalty.
  6. Method 6: A mix of the above – You can use a mix of the above methods as per your requirement.
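
Method 3 is the easiest one to automate, so here is a rough sketch of the idea in Python. It swaps the visible text of a flagged post for a notice and keeps the original aside for the admin (the “incubator” idea from Method 2). The Post class, the URL, the email address and the notice wording are all hypothetical; this is not the exact message or code we use.

```python
from dataclasses import dataclass

GUIDELINES_URL = "https://example.com/guidelines"  # hypothetical link to the guidelines
ADMIN_EMAIL = "admin@example.com"                  # hypothetical admin contact

NOTICE = (
    "Our forum bot detected this post as a probable violation of our guidelines "
    "({guidelines}). Sorry for the inconvenience. Our system can sometimes make "
    "mistakes; in such cases please inform the admin ({admin})."
)


@dataclass
class Post:
    author: str
    body: str
    hidden_body: str = ""  # original text, kept so the admin can review or restore it


def replace_with_notice(post):
    """Hide the flagged text behind a system notice without losing the original."""
    post.hidden_body = post.body
    post.body = NOTICE.format(guidelines=GUIDELINES_URL, admin=ADMIN_EMAIL)
    return post


flagged = replace_with_notice(Post(author="valued-member", body="My article, posted twice."))
print(flagged.body)
```

Because the original text is kept, the admin can restore the post in one step if the member writes in and the flag turns out to be a mistake.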

Gosh, what a long post! There may be other methods as well, so please add on your suggestions as a comment.