Posted
over 12 years
ago
“‘If you’re going through Hell, Keep Going’ -Winston Churchill” - Part 2 of our Disaster Recovery Series: Winston Churchill had it right when he talked about perseverance. When things look grim, it’s time to double down and make things right. Today we’re talking about Disaster Recovery: how you can stay online even when the streets look like a scene from I Am Legend.
Lessons from Hurricane Sandy
Hurricane Sandy taught us that failover and backup plans are only as good as your weakest link.
Consider telecom, a space we theoretically understand. The basic components of a telecom network include:
• Power
• Copper and Fiber Transport Links (Wires)
• Computers that use power to move data across transport links
• Databases that store information about users
• Buildings to house power, transport, and users
• Staff to monitor and oversee the above processes
We may be stating the obvious, and it’s not rocket science: if any one of the items listed above fails, and it only takes one, the entire infrastructure is rendered useless.
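To put rough numbers on the weakest-link point, here is a minimal Python sketch. It assumes each component fails independently, and the availability figures are made up for illustration (they are not measurements from any real network). When every component is required at once, the overall availability is the product of the parts, so it is always worse than the weakest single component, and one dead component takes the whole chain to zero.

```python
from math import prod

# Hypothetical availabilities: the fraction of time each component is working.
# These figures are illustrative assumptions, not real measurements.
components = {
    "power": 0.999,
    "transport": 0.999,
    "computers": 0.999,
    "databases": 0.999,
    "buildings": 0.999,
    "staff": 0.999,
}


def chain_availability(parts):
    """Availability of a system that needs every part working at the same time."""
    return prod(parts.values())


print(f"all six components required: {chain_availability(components):.4f}")

# Lose power entirely and the rest of the chain no longer matters.
no_power = {**components, "power": 0.0}
print(f"with power out: {chain_availability(no_power):.4f}")
```

The exact numbers matter less than the shape of the result: adding more required components can only lower the total.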
While many data centers in New York and New Jersey touted redundant backup systems, they never considered the consequence of losing power for several days. That’s why Hurricane Sandy had such a disastrous effect. In many cases, data centers ran out of fuel and were unable to resupply because of road blocks. In addition, they lost redundant fiber connections and power backups. Without power, it was game over for the whole network!
Business Continuity Planning
Does your business have a disaster plan? If you’re like most companies, you probably don’t. At 2600hz, we worry about your systems so you never have to. There are two types of systems: on-site and hosted. On-site solutions are vulnerable because they house all their hardware in just one location. Hosted systems are spread over multiple data centers, which is an incredible advantage because it makes for a more secure and reliable system. In fact, on a hosted system the likelihood of downtime because of damage at any one site is almost nonexistent: when traffic encounters a weak link, it is simply rerouted to a functioning server.
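Here is a rough sketch of why spreading across data centers helps, in Python. It assumes the sites fail independently of each other, which is exactly the assumption that broke down for providers whose backup power and fiber all sat in the same storm path. The function name and the 99% figure are illustrative assumptions.

```python
def redundant_availability(site_availability: float, sites: int) -> float:
    """Chance that at least one of `sites` independent sites is up."""
    return 1 - (1 - site_availability) ** sites


# Hypothetical per-site availability of 99%, for illustration only.
for n in (1, 2, 3):
    print(n, "site(s):", redundant_availability(0.99, n))
```

Each additional independent site shrinks the chance of a total outage, but only if the sites do not share the same power grid, fuel route or upstream link.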
I have explained how connection failure is averted at the server level. Now let’s consider the next potential breakage: the network. But don’t worry: in the event of a network failure, 2600hz has you and your business covered. Here’s how:
Network Fail
Are you connected to the 2600hz cloud? Have you configured failover in the portal? If your answer is yes, relax! All your users will remain fine and dandy in the case of a network failure. This is because 2600hz automatically fails over to mobile devices or alternative routing systems. As long as you’re connected to the 2600hz cloud, even if your organization is utilizing a legacy PBX for on-premises support, there is no cause for worry (like these fault-tolerant Espresso Machines).
However, if you’re using traditional phone services, you’re pretty much out of luck in the event of a network outage (hint: time to call 2600hz). With 2600hz hosted phone services you can reconfigure call flows in the portal. And since your users have left their desks, we’ve got mobile on lockdown.
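The failover idea can be sketched in a few lines of Python. This is not the 2600hz portal or the Kazoo API; the destination names, the phone number and the reachability check are all hypothetical stand-ins. The point is only the ordering: try the desk phone first, and fall through to a mobile number when the office network is unreachable.

```python
from typing import Callable, Optional, Sequence


def first_reachable(
    destinations: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first destination that can take the call, or None if none can."""
    for destination in destinations:
        if is_reachable(destination):
            return destination
    return None


# Hypothetical call flow: desk phone, then softphone, then a cell number.
call_flow = ["desk-phone", "softphone", "cell:+15555550100"]

# Pretend the office network is down, so only the cell number still works.
office_network_up = False


def reachable(destination: str) -> bool:
    return office_network_up or destination.startswith("cell:")


print(first_reachable(call_flow, reachable))  # prints cell:+15555550100
```

A real system would also need timeouts and health checks, but the ordered fall-through is the core of it.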
More importantly, the 2600hz systems run in a split mode. This means that even if our own servers lose connectivity, users witness no change in the network. Routine testing of this capability has proven 2600hz trustworthy even when data centers lose power, which happens a few times each year. So the answer to the looming question is yes: 2600hz systems will continue to run in the event of a disaster. Our systems will continue to run when entire data centers fail, and they will run under the extreme circumstances of a Hurricane Sandy.
Mobile Disaster Recovery
We want you to trust your business with 2600hz. To support this vision, we’re building architecture that stands up to brutal weather without adding expense. Users can easily direct business calls from desk equipment to softphones or alternative cell numbers after signing in at the User Portal. But we’re not stopping here. Over the next few months we’re developing Disaster Recovery features, including one-click all-hands messaging.
Our resilient designs protect your communications systems when something like Hurricane Sandy blows through. Let 2600hz be your disaster plan.
Difference Between 2600hz and Others
When everyone claims to offer redundancy it’s hard to know who to trust. Thanks to Hurricane Sandy, we now know who “weathered the storm” and who, well, hit rough waters.
Let’s take a look at how some companies fared during the storm:
CallCentric, a NY-based VoIP provider, hosted their services in a single site (see post-storm comments on Twitter). Their website was down and didn’t provide any updates. It finally refreshed two days later, but did nothing to make up for a multi-day outage caused by lack of geographic redundancy (not spreading their systems over multiple points of presence).
Fonality, a national VoIP provider, suffered from upstream issues in New England. Both users on their network and those attempting to connect to it were affected.
Improcom, a regional IT Service Provider, lost service during the storm for an extended period of time.
VoIP.ms fared exceptionally well during the storm. This is because they prepared to cut over to a second server prior to the foul weather.
Twilio suffered some degraded services related to numbers on the East Coast.
Many other CLECs and ILECs had degraded service throughout the storm (most notably WindStream and Verizon). We’ll know in a few more days how many of these firms recover. After this disaster, let’s see which companies retool the redundancy platforms they’ve been promising all along.
Want to know more? Join us for part 3: a deep dive architecture discussion of how the 2600hz Kazoo Platform mitigates failures.
Check out part 1 here.
2600hz builds cloud telecom infrastructures. If this is the sort of thing you’re into, drop us a line, we’re hiring (especially for Engineers)! Learn more at http://www.2600hz.com.
|
Posted
over 12 years
ago
“WebRTC: Frictionless Communications” - Humans have always needed to communicate. We are a social species, vulnerable to loneliness and keen on group input. Throughout the ages, humans have developed many creative ways to satisfy their need for connection. Before the advent of communications technology in the 20th century, the primary method was the face-to-face meeting. Our communication options continue to increase dramatically as technology presses forward. Given this accelerating evolution, the way we communicate will be radically different in the future.
Consider the browser
Your browser is an example of a quickly evolving communication tool. While today your browser is mainly a medium for the consumption of services, in the near future it will support real-time communication such as voice, video and messaging. In their current capacity, browsers are not useful for real-time activities. With the exception of games, most real-time applications are still delivered natively on the user’s system and not via web-based interfaces. Because there is no standard negotiation between clients, real-time synchronous browser-to-browser connections don’t really exist. While we have intricate systems for data lookup, personal communication and service location (DNS, Email, Search Engines, etc.), there is no direct system to negotiate connections between end-users.
WebRTC represents the first ubiquitous technology specifically facilitating real-time communication and browser-to-browser connectivity without a middle-man. In layman’s terms: any browser can connect to another browser. We’re not just talking AJAX long-polling: we’re talking bona fide real-time. With its capacity for frictionless contact, WebRTC could be the democratization of communications as we know it.
With all of this history in mind, it’s easy to see why we Telco folk are tremendously excited by the prospects of WebRTC. Being able to provide the experience of synchronous video communication without the big price tag is a beautiful thing.
What 2600hz is doing with WebRTC
2600hz builds architectures that power some of the world’s largest telecommunications infrastructures. Many of the architectures we deal with focus on delivering real-time services such as voice and SMS. In addition to more traditional services, these providers are increasingly demanding video communication. Installing the necessary plug-ins and proxying/transcoding data necessary for many video applications can be a headache for end-users. WebRTC makes this headache obsolete. Instead of trying to have a video chat with a call center employee via a downloaded Adobe plugin, WebRTC makes features available instantly without the hindrance of user interaction. Put simply: WebRTC equals frictionless access for all.
WebRTC is also just another medium, as far as the end-user is concerned. Integrating WebRTC in a way that allows common habits people already possess (picking up the phone and dialing, for example) to enable this new technology is paramount to making the technology successful. That is what we are focused on at 2600hz: humanizing technical innovation.
Inspirational image of the Moment from the Barcelona Supercomputing Center.
|
Posted
over 12 years
ago
“Lessons from Sandy on Outages and Customer Communication” - Many companies are just getting started in social media, and navigating that ocean of public opinion can be a handful. Customers can give your firm a positive, or a negative, image at a moment’s notice via Twitter, Facebook and other sites. Maintaining a positive appearance is becoming more important as potential customers often research these sites prior to signing up for a service, to check your reputation, as they should.
One of the most important times to ensure you have a positive image is during your handling of issues that impact your clients. No situation has proven this more than Hurricane Sandy.
Whether your telecom is “in the cloud” or in a back server room, outages can happen to anyone. And they do. Even with architectures as vast as Google’s and Amazon’s, we’ve all heard of such issues, including at companies like Skype. It’s during these times – the times when the pressure is on to get things up and running again – that transparency to customers is paramount.
Understanding Your Customers
Customers often ask for status updates or a number to call when something goes wrong. To some degree, this direct customer contact is also important because it helps companies experiencing issues gauge how much impact an issue is causing. But in reality, having to staff hundreds of phone lines with people who are not actually doing anything to fix the problem is a drain on resources.
With today’s networks and monitoring tools, there are other options – and they are equally effective. In fact, we’d argue, they’re more effective.
Sandy provided many, many interesting examples to us here at 2600hz. As we watched the storm progress, we saw outages from bandwidth.com, Windstream, Verizon and other high-profile providers in the NYC metropolitan area. Each one of these providers responded differently to the issues, but it was fairly obvious that, between email lists, Twitter and other resources, the customers were watching – and chatting – about how each company was handling the issue.
Most notably, if a company was having an issue and didn’t acknowledge it for a while, there was notable animosity from the on-lookers. If a company noted the issue but didn’t provide any detailed status or a plan for what they were doing, there was notable frustration from customers about a lack of details.
And most interesting… If a company provided a blog, pictures and a detailed explanation of what they were doing, they got write-ups in the newspaper, support from customers and in some cases even new customers.
A Picture is Worth A Thousand Words
You can staff phone lines with hundreds of people to try to explain what’s going on – or, you can post a picture.
Posting pictures has been particularly powerful for multiple companies, and has resulted in customers praising the staff of multiple companies when, normally, those same companies might be bearing the brunt of insults, refund demands and anger. In particular, JetBlue ( http://blog.jetblue.com/ ) posted pictures of flooding, explaining why they were unable to fly into LaGuardia. The comments section reveals compliments and best-wishes to their staff. Imagine, an airline getting compliments.
Peer1 is perhaps one of my favorite stories. They posted a grim outlook for their datacenter to their customers, a sort of worst-case scenario note, but then subsequently posted pictures of an obviously tired team wearing head-lamps and wading through dark hallways to refuel a datacenter’s power generators. See http://status.squarespace.com/ if you care to check it out. They managed to remain up and avoid a catastrophic outage. But possibly even more important, the dedication their team displayed landed Peer1 stories in the Huffington Post, the New York Times and various other publications. So much for needing a paid P.R. team!
Looking back, pictures seem to have done three very important things:
• They provided an obvious status update that showed what was really happening (quickly and without question).
• Possibly more important, they elicited sympathy from customers. When customers really understand what’s happening, they seem to become more patient. This is a key insight into their psyche: they’re effectively cutting you some slack, granting you some forgiveness, and are less likely to pull their business from you in the future.
• In some cases, the transparency actually resulted in increased marketing and P.R., showing the great lengths employees go to for their customers in times of need.
In Telecom, and probably in many other industries, it’s often easy to dismiss customers as being ignorant or even unintelligent when they complain. Telecom is insanely complex and most end-users take it for granted until things go wrong. But the reality is, telecom isn’t an end-user’s primary job – it’s yours, the telco. End-users expect you to handle, well, everything. And when things happen that impact them, they expect you to fix it. This isn’t really that unique – you would expect your grocer to fix food supply chain issues. But you don’t necessarily go to the back room of the grocery store and help unload groceries when the store is having a staffing issue. Unless, maybe, they post a picture of how dire the situation is. Then your goodwill and desire to help just might kick in.
A Lack of Updates is … well… you know…
There are also examples of what not to do.
CallCentric is an example of a provider who has been down for over a day, with no updates to their customers except the equivalent of “we’re working on it.” They claim to have staff on-site ready to do, well, something, but there’s no visual representation or solid communication strategy explaining the outages they are experiencing. Many of their customers are frustrated and have taken to publicly bashing the company. The company is clearly struggling and this is probably just the beginning for them.
It is unclear whether the company actually had any redundancy to start with besides a single datacenter. My guess is they did, but they probably lost both data centers which they didn’t expect. But the problem is that nobody knows – their responses don’t tell a story and so people are making up their own stories, in public, quite loudly. That’s a problem.
In general, we’ve heard customer service reps decline to explain situations to a customer in detail because they don’t think it’s worth the time and are concerned the customer won’t understand anyway. Pictures help solve that, even with complex issues like telecom. If you tell a customer you are redundant and then you go down, they don’t believe you. If you try to explain it to them, they may still not believe you. But if you show them a picture of how your routing is set up, and then link to pictures or blogs showing outages in the areas those links run through, the customers start to get it. The pictures aren’t required, but they sure help drive home the reality of the situation.
Just as Development Roles and Engineering Roles are Merging, So Are Marketing and Support
Marketing and Support are no longer able to enjoy the isolation bubble they previously had. It used to be that your marketing department could project a very positive image while many customers actually perceived that same company negatively, and it would take a long time for people to really understand what was happening. Issues with customer support didn’t necessarily affect the company’s public reputation. At least not quickly, anyway. But now consumers, who can voice support issues on the same channels companies use for marketing, have changed the landscape completely. The two can no longer stay separate.
Companies must get smarter and more transparent. This starts from within.
Your marketing team must not only be made aware immediately when the Ops or Engineering team sees a customer-impacting issue, but they must learn how to present the issue in a way the end-user can understand, via mediums the end-user gravitates to. That means pictures, blogs and so on, with well-written FAQs and articles explaining the situation.
The best way to deal with issues has always been to get ahead of the issue, but now this must be done quite literally in real-time and continuously. And it must have the finesse of good messaging and a little dose of reality to be effective via the real-time channels. Companies have to talk about how they’re getting back to business so their clients don’t have to worry about their supporting services.
Focus on getting back online, do it with transparency and the public will adore you. Get people back up, and open for business!
Part 1 of 3 in our Disaster Recovery Series
|
Posted
over 12 years
ago
“Darren Schreiber Interview” - Short but sweet, check it out if you have some time.
An interview with Darren Schreiber of 2600hz.
|
Posted
over 12 years
ago
“Our Other Border Brother: Kamailio” - Discussion here on Hacker News.
Those of you who follow 2600hz know that we use OpenSIPS as our Session Border Controller. This is about to change: 2600hz is moving to Kamailio.
Why?
Session Border Controllers are critical to the success of Communications Infrastructures. They serve as traffic cops, permitting or denying access to our sensitive Core Infrastructure. We home equipment to at least two different SBCs in order to facilitate failover in the event of an emergency or an outage. OpenSIPS has served us really well in this regard, and it’s not for performance reasons that we’re moving to Kamailio. In fact, in our testing, OpenSIPS and Kamailio performed almost identically. The difference lies in key design decisions related to scripting, and in the greater proliferation of Kamailio Infrastructure in the industry. Put simply, writing scripts in OpenSIPS isn’t fun, and if there’s no performance gap between the two products, why not take the route with less friction?
Scripting
Writing scripts for SBCs is critically important to our Infrastructure and our overall design. We write a lot of scripts, and they tend to have a lot of flags (defined functions). In OpenSIPS, flags are simply listed as Flag1, Flag2, Flag3, ad infinitum… which is fine until you have a problem. Imagine debugging a script with four flags in it. Normally when you’re debugging, the names of the items inside your script hint at the contents of your functions, but this is not the case in OpenSIPS. Kamailio, on the other hand, supports the same flag-based scripting system with the addition of named flags. This is a simple but critically important detail for us. If we have to rewrite sections of an SBC, we’d rather do it in a way that we’ll be able to understand later, as opposed to spending lots of time deciphering labels that carry no meaning. One of the problems we’re scripting around now has to do with the way Carriers present IPs to us in their system. They present Private IPs to us in a certain section of their API, and neither Kamailio nor OpenSIPS can natively deal with this edge case. A script with five flags becomes basically indecipherable, such that the engineer who wrote it originally has to essentially re-engineer the code in the event of a problem. Naming conventions for flags are critically important.
Check out these code examples, see if you can note the differences (the code is functionally the same):
Kamailio:
    if (is_method("OPTIONS")) {
        if (isflagset(FLAG_INTERNALLY_SOURCED)) {
            route(INTERNAL_TO_EXTERNAL_RELAY);
        } else {
            sl_send_reply("503", "Options Reply");
        }
        exit;
    }
OpenSIPS:
    if (is_method("OPTIONS")) {
        if (isflagset(26)) {
            route(internal_to_external_relay);
        } else {
            sl_send_reply("503", "Options Reply");
        }
        exit;
    }
In the case of Kamailio, the flag’s name carries the additional information “INTERNALLY_SOURCED”, whereas in OpenSIPS we just have flag 26. This makes decoding finished code a cinch rather than tedium.
Check out two more code examples here:
Retrieving Cached Data
OpenSIPS: cache_fetch("local", "$ci", $avp(i:56))
Kamailio: $var(prefered_route) = $sht(associations=>$ci);
Setting the reply timer
OpenSIPS: $avp(s:final_reply_timer) = 2;
Kamailio: t_set_fr(0, 1000);
In each example, identifying what’s happening for a command is much easier in Kamailio.
But naming conventions are hardly the only reason we’re making this change. We do a lot of work with new technology, in particular HTML5. Kamailio has a stricter adherence to modern technology (less backwards compatibility, for the sake of making the stuff we do today easier), and it seems to be more prolific in the industry. This is purely anecdotal, but in our experience, many of our larger clients are already using Kamailio for their SBCs, and given how close these various technologies are to each other, it makes sense for us to use what our clients are already using if there’s no downside. And why would there be? All of the open-SBC projects come from one root project: OpenSER.
[http://sip-router.org/wp-images/sip-router-history.jpg]
In watching the development cycles for these projects, we note that Kamailio tends to release the features we’re looking for before OpenSIPS (our specific example in this case is WebRTC, for which support was present in Kamailio before OpenSIPS). Both projects have a lot of common modules, and the modules tend to proliferate between the platforms pretty quickly. Calling these two platforms similar in terms of design would be an understatement. Consider the two screenshots from each project’s respective documentation:
Here’s Kamailio, and here’s OpenSIPS:
Notice anything similar? All of the modules are nearly identical, and it’s pretty convenient.
When we started our Kazoo Project, it seemed that OpenSIPS was the dominant player in terms of performance and development. Kamailio does, however, appear to be more widely used in the industry, and we think it’s now a much more even fight. We note that the JSON module for Kamailio was written and contributed by FlowRoute, a well-regarded SIP Trunking Service. To our minds, a more active development community means that other users are likely to help solve bottlenecks for us, so we spend less time worrying about our SBC, and more time writing fun new applications!
Conclusion:
We’re switching to Kamailio because:
• Many of our Major Clients already use it
• The Naming Conventions are more sensible
• The Pace of Innovation on Kamailio is faster
|
Posted
over 12 years
ago
Join us in San Francisco next week for the 2600hz Kazoo Bootcamp!
Learn to build Cloud Telecom in this intensive, hands-on 3-day course. This course will cover Kazoo from install to massive scalability.
Kazoo is the first open-source, scalable telecommunications platform that runs on commodity hardware and allows you to scale across a private or public WAN with complete fault tolerance.
After this course you’ll be able to:
• Deploy and Manage Kazoo
• Build Applications using the Kazoo APIs
• Craft a home-grown carrier using commodity hardware
Click here to learn more and save $299 on Kazoo Bootcamp!
|
Posted
over 12 years
ago
Our last post was so popular that we wanted to make sure you got the full stats. Here’s the deck that the information in our research around video was taken from. It was delivered by Andy Castonguay from Informa at TC3.
Informa presentation for tc3 from 2600hz
|
Posted
over 12 years
ago
Check out this post from Joshua Goldbard from our team on TC3, an awesome conference in Silicon Valley.
Tc3 summary from Joshua Goldbard
On ThePBXBlog here.
|
Posted
almost 13 years
ago
“How 2600hz Stayed Up while GoDaddy Was Down: ‘Lessons on running a reliable VoIP architecture’” - This week’s GoDaddy outage took down millions of websites, but it also took down thousands of VoIP phones. Many people aren’t aware of how their VoIP phone is programmed, but it must know how to reach the VoIP service provider, and usually this is done via a DNS name set in the phone. If your provider’s DNS was hosted with GoDaddy, your phone would no longer know how to reach the service provider and would stop working.
Our services remained up and active, mostly because we weren’t hosted at GoDaddy. :-) But even if we had been, our services would still have been OK thanks to an interesting strategy on DNS mitigation for VoIP Service Providers. Everyone can use this strategy, but most VoIP providers do not.
Our strategy stems from reading about past VoIP providers’ failures – namely a massive outage at Packet8 (8x8) in their early days. You can read more about the outage here and here.
How Things Are Today
VoIP, like most services on the web, is a weakest-link service. Any item between your phone and the final destination that is not operating properly and does not have a backup or failover system will cause the entire service to fail. Most people concentrate on making everything EXCEPT DNS redundant, which is a flaw in most design strategies.
As you can see in the left side of the diagram above, the phone and server portions of the network are redundant but the DNS server is not – there’s only one of them.
Many people are also unaware that their phone is smart enough to register with, or proxy traffic through, two different servers (some phones, even more!). You can use this to your advantage to create a redundant service that relies on two completely independent DNS providers or services.
DNS and Redundant VoIP Services
The strategy for redundancy is simple: on the phones themselves, define two proxy servers whose names are served by two different DNS providers. To do this, you’ll need two separate domain names, such as voip.my-provider.com and voip.my-provider.net. You can point both names at the same network address if you want, but preferably you would point each domain name at a different network or, even better, use SRV to point each domain name at both networks. In this way you’ll end up with an architecture that looks more like the right side of the image below:
SRV records are the best strategy because the phone or endpoint device can still register to more than one server even when one DNS provider is down, which avoids a double failure.
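The two-domain strategy can be sketched in Python as follows. This is a simplified illustration, not how any particular phone firmware works: the `lookup_srv` callable stands in for a real SRV query, the `sbc1`/`sbc2` host names are hypothetical, and weights within a priority are ignored. The sketch asks each domain in turn for its SRV targets, skips any domain whose DNS provider is unreachable, and returns the proxies to try in priority order.

```python
from typing import Callable, Iterable, List, Tuple

# (priority, host) pairs, roughly what an SRV lookup returns; lower priority is tried first.
SrvRecord = Tuple[int, str]


def candidate_proxies(
    domains: Iterable[str],
    lookup_srv: Callable[[str], List[SrvRecord]],
) -> List[str]:
    """Return proxy hosts to try, in order, skipping domains whose DNS is down."""
    hosts: List[str] = []
    for domain in domains:
        try:
            records = lookup_srv(domain)
        except OSError:
            # This domain's DNS provider is unreachable; move on to the next domain.
            continue
        for _priority, host in sorted(records):
            if host not in hosts:
                hosts.append(host)
    return hosts


def fake_lookup(domain: str) -> List[SrvRecord]:
    """Hypothetical stand-in for a real SRV query; the .com DNS provider is 'down'."""
    if domain.endswith(".com"):
        raise OSError("DNS provider unreachable")
    return [(20, "sbc2.example.net"), (10, "sbc1.example.net")]


print(candidate_proxies(["voip.my-provider.com", "voip.my-provider.net"], fake_lookup))
# prints ['sbc1.example.net', 'sbc2.example.net']
```

Even with the first domain's DNS provider failing completely, the phone still ends up with both servers to try, which is the whole point of keeping the two domains on independent providers.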
Alternate Strategies
Another simple strategy is not to rely on DNS at all and to configure IP addresses directly. This can be dangerous if you have customers who provision their own phones and, later on, you move data centers or change IP blocks for some reason and have no way to change the IPs on those phones. Unless you own your IPs (via the ARIN registry), consider this solution with care.
Learn More
Our wiki is always being updated with more information on DNS strategies and the like as they apply to VoIP, so stay tuned there!
|
Posted
almost 13 years
ago
“Web RTC: Business Communications in a Post-BYOD World” - Today, 2600hz CEO Darren Schreiber is speaking at the IIT RTC Conference on real time communications, with a focus on video streaming. The IIT RTC conference brings together speakers from AT&T, Microsoft, Alcatel-Lucent and Vocalocity, as well as telecom analysts like Martin Geddes, and we’re excited to be part of this lineup. Darren’s talk is part of the Cloud Communications track from 5:00-5:30 CDT. He’ll be speaking on the future of real time communications and how business communications will transform as we enter into a world of abundant devices and expanding media streams. If you’re in the area or attending the conference, be sure to attend! Learn more about Darren’s talk.
|