A bot (short for “robot”) is a program that operates as an agent for a user or another program, or that simulates a human activity. On the Internet, the most ubiquitous bots are the programs, also called spiders or crawlers, that access websites and gather their content for search engine indexes. Search engines also use these bots to walk through the Internet in search of new sites. Botnets, by contrast, are used for denial-of-service attacks, spreading viruses, and similar abuse, so the term “bot” is used quite broadly. The hard part of building a bot is parsing data and deciding on follow-up actions from it, often over multiple cycles. A search bot could theoretically map out the whole non-dark Internet just by following links.
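The idea of mapping a site graph just by following links can be sketched as a breadth-first traversal. This is only an illustration: the `LINKS` dictionary and the `fetch_links` helper are hypothetical stand-ins for downloading a page and extracting its links, and the URLs are made up.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: URL -> outgoing links.
LINKS = {
    "https://a.example/": ["https://b.example/", "https://c.example/"],
    "https://b.example/": ["https://a.example/", "https://d.example/"],
    "https://c.example/": [],
    "https://d.example/": ["https://c.example/"],
    "https://hidden.example/": ["https://a.example/"],  # no page links here
}


def fetch_links(url):
    """Stand-in for 'download the page and extract its links'."""
    return LINKS.get(url, [])


def crawl(seed):
    """Breadth-first traversal that visits each reachable URL exactly once."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Starting from `https://a.example/`, the crawl reaches every page that is linked from somewhere it has already been, but never finds `https://hidden.example/`, because nothing links to it. That is the sense in which link-following only maps the non-dark part of the Internet.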
The Bot Framework Service may send a conversation update when a party joins the conversation. For example, on starting a conversation with the Bot Framework Emulator, you will see two conversation update activities (one for the user joining the conversation and one for the bot joining). To distinguish these conversation update activities, check whether the members added property includes a member other than the bot. Activities arrive at the bot from the Bot Framework Service via an HTTP POST request. The bot responds to the inbound POST request with a 200 HTTP status code. Activities sent from the bot to the channel are sent on a separate HTTP POST to the Bot Framework Service. This, in turn, is acknowledged with a 200 HTTP status code.
The protocol doesn’t specify the order in which these POST requests and their acknowledgments are made. The message activity carries conversation information between the parties. In an echo bot example, the message activities carry simple text, and the channel renders this text. Alternatively, a message activity might carry text to be spoken, suggested actions, or cards to be displayed. In the echo bot example, the bot creates and sends a message activity in response to the inbound message activity it received. However, a bot can respond in other ways to a received activity; it’s not uncommon for a bot to respond to a conversation update activity by sending some welcome text in a message activity.
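The handling described above (welcome on a conversation update, echo on a message) can be sketched with plain dictionaries. This is a minimal sketch, not the Bot Framework SDK: the field names `type`, `membersAdded`, `recipient`, and `text` are assumed to follow the activity schema, `handle_activity` is a hypothetical helper, and the HTTP transport is left out entirely.

```python
def handle_activity(activity):
    """Return the reply activities (plain dicts) for one inbound activity."""
    # Assumption: on inbound activities the bot is the recipient.
    bot_id = activity["recipient"]["id"]

    if activity["type"] == "conversationUpdate":
        # Welcome only when someone other than the bot joined.
        newcomers = [
            member
            for member in activity.get("membersAdded", [])
            if member["id"] != bot_id
        ]
        if newcomers:
            return [{"type": "message", "text": "Welcome!"}]
        return []

    if activity["type"] == "message":
        # Echo bot behavior: send the received text back.
        return [{"type": "message", "text": "Echo: " + activity.get("text", "")}]

    return []
```

In a real bot, each returned activity would go out on its own HTTP POST to the Bot Framework Service, separate from the 200 response that acknowledges the inbound request.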
Types of bots
1) Good and Bad bots
2) Generalist and Specialist bots
Good Bots
- Chatbots
- Crawlers
- Transactional bots
- Informational bots
- Entertainment bots: Art bots, Game Bots
Bad Bots
- Hackers
- Spammers
- Scrapers
- Impersonators
Good Bots
Chatbots
- Chatbots are bots that are designed to carry on conversations with humans, usually just for fun, and to test the limits of the technology. Chatbots usually have a “personality” similar to a human, and there usually isn’t a goal for the interaction other than to see what the chatbot says.
- There’s another common usage of “chatbots” that essentially includes ALL bots (i.e. if it’s automated, and you carry on a conversation with it, it’s a chatbot). I think this is confusing and misses the point.
- ELIZA is the godmother of all chatbots. The bot runs a simple question-and-response script that automatically generates responses to questions, in a style similar to a psychotherapist.
- Cleverbot is a more advanced example that uses AI to learn from interactions.
- Tay was a Microsoft AI chatbot that conversed with people via Twitter.
Crawlers
- These bots run continuously in the background, primarily fetch data from other APIs or websites, and are “well-behaved” in that they respect directives you give them.
- For example, you can “hide” your entire website from search engines by blocking search engine spiders in your site’s robots.txt file, keeping all of your site’s content out of Google or Bing, or Yandex, or whatever.
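As an illustration of how a well-behaved crawler honors these directives, the sketch below checks URLs against a robots.txt using Python's standard `urllib.robotparser` module. The two robots.txt bodies, the `can_crawl` helper, and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Asks every crawler to stay out of the entire site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Asks every crawler to stay out of one directory only.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With `BLOCK_ALL`, `can_crawl` is false for every URL on the site. With `BLOCK_PRIVATE`, only URLs under `/private/` are refused. Note that robots.txt is a request, not an enforcement mechanism: it only works on bots that choose to respect it.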
Transactional bots
- Bots in this category act as agents on behalf of humans and interact with external systems to accomplish a specific transaction, moving data from one platform to another.
- Since bots can interact with any endpoint that has an API, Transactional bots can do LOTS of things, and lots of custom solutions are to be expected here.
Informational bots
- Bots in this category surface helpful information, often as push notifications and include things like breaking news stories.
- Techcrunch has a personalized news recommendation bot that pushes content to you via Facebook Messenger or Telegram.
Entertainment bots: Art bots, Game Bots
- Art bots are designed to be appreciated aesthetically.
- DeepDrumpf uses deep learning, applied to transcripts of speeches, to learn how to speak like Donald Trump.
- RealHumanPraise takes positive movie reviews from Rotten Tomatoes, and replaces actors with Fox News personalities, and tweets every 2 minutes.
- Game bots: for example, bots in video games.
Bad Bots
If you’re interested in really getting into the weeds on bad bots, I highly recommend Distil’s 2017 Bad Bots Report. It’s one of the most comprehensive and digestible summaries of bad bot activity I’ve come across.
Hackers
- Hacker bots are designed to distribute malware, deceive individual people, attack websites, and sometimes entire networks. These bots exploit security vulnerabilities to inject code into the victim’s site.
- Hacker bots can create distributed denial-of-service (DDoS) attacks by distributing their attack across many different proxies, and they are designed to have browser-like signatures.
- Google has said that 180% more sites were hacked in 2015 vs 2014.
Scrapers
- Scraper bots copy content from other websites and republish it elsewhere. The republished pages are designed to capture human visitors who are searching for specific keywords, and those visitors are monetized via advertising (AdSense is a classic example).
Spammers
- Spambots are designed to post crappy promotional content around the web, and ultimately drive traffic to the spammer’s website.
Impersonators
- Bots in the Impersonator category are designed to mimic natural user characteristics, making them hard to identify.
- Impersonators also include propaganda bots that are designed to sway political opinion one way or another, often by drowning out dissenting opinions.
- Turkey, Mexico, and other nations have used Twitter impersonator bots for this purpose.
Generalist bots vs Specialist bots
The heuristic of “generalist bots” versus “specialist bots” is helpful in that it recognizes a primary market dynamic: huge companies like Google, Facebook, Amazon, Apple, etc. are all building bot-like services. Most of the big companies seem to be focusing on “generalist” bots, while many smaller companies and individual bot developers are building “specialist” bots. This dynamic of generalist-vs-specialist bots is one of the most helpful heuristics I’ve come across in thinking about how to classify bots.
- Specialist bots: soixli, x.ai, Operator, insecurity, let stock, MyAlly, Digits, Claralabs.
- Generalist bots: Siri, Alexa, Viv, Messenger, Allo, Cortana, Sensay.
Script bots vs Smart bots
Script Bots
The simplest bots are script bots. The entire interaction is based on a pre-determined model (the “script”) that determines what the bot can and cannot do. The “script” is a decision tree where responding to one question takes you down a specific path, which opens up a new, pre-determined set of possibilities. It’s basically like a Choose Your Own Adventure book (for those old enough to remember Choose Your Own Adventure books).
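A script bot of this kind can be sketched as a dictionary of nodes, where each answer selects the next node. The `SCRIPT` contents and the `run_script` helper below are invented for illustration and do not come from any particular bot platform.

```python
# Hypothetical script: each node has a prompt and the choices that lead onward.
SCRIPT = {
    "start": {
        "prompt": "Do you want pizza or pasta?",
        "choices": {"pizza": "pizza", "pasta": "pasta"},
    },
    "pizza": {
        "prompt": "Thin or thick crust?",
        "choices": {"thin": "done", "thick": "done"},
    },
    "pasta": {
        "prompt": "Red or white sauce?",
        "choices": {"red": "done", "white": "done"},
    },
    "done": {"prompt": "Order placed. Goodbye!", "choices": {}},
}


def run_script(answers, node="start"):
    """Walk the tree with a list of user answers; return what the bot said."""
    transcript = [SCRIPT[node]["prompt"]]
    for answer in answers:
        choices = SCRIPT[node]["choices"]
        if not choices:
            break  # reached a leaf, the conversation is over
        key = answer.strip().lower()
        if key not in choices:
            # The bot cannot leave the script, so it repeats its options.
            transcript.append("Sorry, I can only handle: " + ", ".join(choices))
            continue
        node = choices[key]
        transcript.append(SCRIPT[node]["prompt"])
    return transcript
```

Calling `run_script(["pizza", "thin"])` walks one path to the end, while an off-script answer such as `"sushi"` leaves the bot stuck on the same node. That limitation is exactly what separates script bots from the smart bots described next.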
Smart Bots
Much of the excitement around bots centers on the *possibilities* of bots, given the massive advances in ML and AI in recent years. And some of this excitement is well-founded. Many bots have a heavy server-side processing component, which gives them access to massive computing power for understanding and responding to queries. Couple that with the open-sourcing of AI software libraries like Theano and TensorFlow, and you have the ingredients for some amazing human-bot interactions.
How Many Bots Exist?
Coming up with credible numbers around how many bots exist is very difficult. Each bot platform (messenger app) has a vested interest in making their ecosystem look healthy, and so they’re inclined to inflate their numbers. As of spring 2017, the claims are as follows:
- Facebook claims “over 100,000 bots” on Messenger
- Twitter may have 48 million bot accounts
- Kik claims 20,000 bots on its platform
- Microsoft Bot Framework claims “more than 20,000 developers have signed up and gotten started.”
- There are only a handful of Skype bots
- Wit.ai claims 21,500 developers
Topline numbers like these from technology companies should be looked at with a jaundiced eye. Tech companies want to give the impression of critical mass and momentum, so they tell us how many people signed up. Along this line, the number I’m most inclined to believe is Microsoft’s, since it says developers have “signed up and gotten started.” The topline “signed up” numbers are going to be WAY larger than the number of developers who actually create things, much less launch them.
On the side of Good Bots, we tend to have platform operators who will inflate numbers to make their platform ecosystem look healthy. On the Bad Bots side, we have developers who are deliberately trying to obfuscate themselves.
If we want to narrow our focus to Good bots that live and work in messaging platforms, one of my favorite services for surveying the landscape is Chatbottle. They rely on users submitting their bots, which means that these bots are intended to be used (as opposed to a developer tinkering on the weekend) and they’re at least somewhat production-ready. Chatbottle’s counts are:
- 860 bots for Facebook Messenger
- 85 bots for Skype
- 206 bots for Telegram
- 176 bots for Slack
Countermeasures
- The geographic dispersal of botnets means that each recruit must be individually identified/corralled/repaired and limits the benefits of filtering.
- Computer security experts have succeeded in destroying or subverting malware command and control networks, by, among other means, seizing servers or getting them cut off from the Internet, denying access to domains that were due to be used by malware to contact its C&C infrastructure, and, in some cases, breaking into the C&C network itself. In response to this, C&C operators have resorted to using techniques such as overlaying their C&C networks on other existing benign infrastructure such as IRC or Tor, using peer-to-peer networking systems that are not dependent on any fixed servers, and using public key encryption to defeat attempts to break into or spoof the network.
- Several anti-botnet products exist. Norton AntiBot was aimed at consumers, but most such products target enterprises and/or ISPs. Host-based techniques use heuristics to identify bot behavior that has bypassed conventional anti-virus software. Network-based approaches tend to use the techniques described above: shutting down C&C servers, null-routing DNS entries, or completely shutting down IRC servers. BotHunter is software, developed with support from the U.S. Army Research Office, that detects botnet activity within a network by analyzing network traffic and comparing it to patterns characteristic of malicious processes.
- Researchers at Sandia National Laboratories are analyzing botnets’ behavior by simultaneously running one million Linux kernels—a similar scale to a botnet—as virtual machines on a 4,480-node high-performance computer cluster to emulate a very large network, allowing them to watch how botnets work and experiment with ways to stop them.
- Detecting automated bot attacks is becoming more difficult each day as attackers launch newer and more sophisticated generations of bots. For example, an automated attack can deploy a large bot army and apply brute-force methods with highly accurate username and password lists to hack into accounts. The idea is to overwhelm a site with requests from tens of thousands of different IPs all over the world, with each bot submitting only a single request every 10 minutes or so (144 requests per bot per day), which can add up to more than 5 million attempts per day. Many tools try to leverage volumetric detection in these cases, but automated bot attacks now have ways of staying under the triggers of volumetric detection.
- One of the techniques for detecting these bot attacks is what’s known as “signature-based systems,” in which the software attempts to detect patterns in the request packet. But attacks are constantly evolving, so this may not be a viable option when patterns can’t be discerned from thousands of requests. There’s also the behavioral approach to thwarting bots, which ultimately tries to distinguish bots from humans. By identifying non-human behavior and recognizing known bot behavior, this process can be applied at the user, browser, and network levels.
- The most capable method of using software to combat a virus has been to utilize Honeypot software in order to convince the malware that a system is vulnerable. The malicious files are then analyzed using forensic software.
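To make the low-and-slow brute-force arithmetic above concrete, here is a rough sketch of why a per-IP volumetric trigger misses such an attack. The fleet size of 35,000 bots, the per-IP limit, and both helper functions are assumptions chosen for illustration; real detection tools are considerably more elaborate.

```python
from collections import Counter

MINUTES_PER_DAY = 24 * 60


def attempts_per_day(bots, minutes_between_requests):
    """Total attempts per day for a fleet of equally paced bots."""
    return bots * (MINUTES_PER_DAY // minutes_between_requests)


def flagged_ips(request_log, per_ip_limit):
    """Naive volumetric detection: flag IPs with more requests than the limit.

    request_log is a list of (ip, minute) pairs covering one time window.
    """
    counts = Counter(ip for ip, _minute in request_log)
    return {ip for ip, n in counts.items() if n > per_ip_limit}


# One hour of traffic: a single noisy IP plus 100 slow bots that each
# send one request every 10 minutes (documentation-range addresses).
noisy = [("10.0.0.1", m % 60) for m in range(1000)]
slow = [("192.0.2.%d" % i, m) for i in range(100) for m in range(0, 60, 10)]
hourly_log = noisy + slow
```

Here `attempts_per_day(35_000, 10)` comes to 5,040,000, consistent with the "more than 5 million attempts per day" figure, yet `flagged_ips(hourly_log, 60)` flags only the noisy address. Each slow bot sends just 6 requests in the hour, so a per-IP threshold never fires even though the fleet as a whole generates substantial traffic.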