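AI-Native Operations Overview
Summary: This topic describes the benefits of the AI-native operations features in the Juniper Mist™ portal.
If your work involves troubleshooting, investigating user complaints, or tracking network performance, you'll find that the AI-native operations (AIOps) features in the Juniper Mist portal make all of these tasks easier.
AIOps is embedded in Juniper Mist, giving your IT operations team a handle on all the complexity of a distributed network. Mist AI applies big data, analytics, and machine learning to sift intelligently through network information, pinpoint events, and identify patterns that indicate potential problems. Mist AI can also diagnose the root cause of an issue and recommend actions.
These features reduce the time spent troubleshooting and let you take proactive steps to ensure a positive user experience. No more guessing at the scope of an incident. No more searching log files for a needle in a haystack to determine the root cause. No more struggling to reproduce a problem just to capture packets.
What Is AIOps?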
AIOps, short for AI for IT operations, is a term that encompasses technology, platforms, and processes designed to enhance experiences for both IT operators and end users.
By leveraging AI and machine learning, AIOps can contextualize vast amounts of telemetry and log data across an organization's IT infrastructure in real time or near real time. This contextual data, combined with historical information, generates actionable insights, essentially with deep knowledge of the IT domain. AIOps provides real-time analysis and recommends or executes next steps to streamline processes.
This significantly boosts efficiency and productivity for IT teams by automating manual tasks, improving network performance, and strengthening security postures. It's an investment in performance analysis, anomaly detection, and event correlation, enabling proactive identification and resolution of performance-impacting events. For IT teams, this means reduced downtime, fewer costly outages, and quicker incident responses.
That's AIOps in 60 seconds.
10-Minute Troubleshooting Video Demo
In this demo, you'll see how to troubleshoot using the Monitor page, Marvis Actions, and the Marvis query language.
Hey, this is Joel at Juniper Networks. And in this video, I'm going to show you how to troubleshoot wireless and wired problems on your network using the Mist cloud. Now, in this case, we've got a network that's deployed using Mist access points and Juniper switches, and all those access points and switches are reporting data up to the Mist cloud that we can use to troubleshoot problems.
So let's come up with a scenario. Let's say that you've just received a phone call or a support ticket from a teacher who has class that's starting in just a few minutes, and they have a device that's critical for the class that they can't get connected to the wireless network, and so they need you to fix this problem, and you know that you have a relatively short time span to get this resolved because class is starting really soon. So the first place that you might look is you might come to the monitor view in the Mist dashboard, and you might look at the service level expectations or SLEs for short.
Basically, these service level expectations are here to help you understand whether the users on your network are having a good experience or not, and the way that we do this is that we measure over 150 unique client states. We look at all these different states that a Wi-Fi device can be in to help you understand what's going on with that client and to help you understand network-wide what the experience looks like for all the users. For example, we measure how long it takes devices to get connected on the network.
That's for both the wired side and the wireless side of the network. We look to see how often things are able to get successfully connected to the network, and we look at coverage, roaming performance, throughput, capacity, and so on and so forth. So let's say that you want to find this problem.
So one way that you might be able to do this is that if the teacher says that they aren't able to get connected to the network, you might first look at the successful connect service level expectation. Now, we're looking at a seven-day time span right now, and for the last seven days' worth of data, we've seen a 73% success rate.
Well, that means that there's been a 27% failure rate, and so we might be able to find the device that's having a hard time in here. And remember, I'm showing you the manual way of doing this first. I'll show you the automatic way in just a bit.
So let's drill down into successful connects a little bit to take a closer look at what's going on. Now, for all of our service level expectations, we get classifiers that show us why devices are failing that service level expectation. So for example, for successful connects, we see that there's an association, an authorization, and a DHCP classifier. So let's do some quick math.
If we see a 73% success rate, that means that there's a 27% failure rate. Again, that's for the last seven days' worth of data. 1% of that 27% was association problems, maybe driver problems, things like that.
99% of the time it was due to authorization issues. That might be a bad passphrase. Maybe we can't reach a RADIUS server, and so on and so forth.
And 0% of the time it was due to DHCP; there were no DHCP problems that we saw. So then from here, we can go to the distribution tab and we can understand which aspects of the network are affected by this problem. Like for example, if we look at the list of access points, we can see which access points are seeing failures here.
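The quick math in this part of the demo can be written out as a short Python sketch. It only restates the arithmetic from the transcript (73% success, with failures split 1% / 99% / 0% across the three classifiers). The function name and the dictionary keys are made up for illustration and are not part of any Mist API.

```python
def classifier_breakdown(success_rate_pct, classifier_shares_pct):
    """Turn each classifier's share of failures into a share of all attempts.

    Both arguments are percentages. The names here are illustrative only.
    """
    failure_rate_pct = 100 - success_rate_pct
    per_classifier = {
        name: failure_rate_pct * share / 100
        for name, share in classifier_shares_pct.items()
    }
    return failure_rate_pct, per_classifier


# Figures quoted in the demo for the seven-day window.
failure_rate, breakdown = classifier_breakdown(
    73, {"association": 1, "authorization": 99, "dhcp": 0}
)

print(f"failure rate: {failure_rate}%")
for name, pct in breakdown.items():
    print(f"{name}: {pct:.2f}% of all connection attempts")
```

Read this way, nearly all of the 27% failure rate comes from authorization problems, which is why the demo drills into that classifier next.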
And we can see that LD_GLN_AP and LD_JSW_AP are seeing a little bit more failures than usual, whereas the rest of the APs are pretty much behaving normally. Things look pretty standard. This is the typical kind of performance that we expect on the network.
You can also look to see which frequency bands are experiencing problems, which device operating systems are having trouble, which device types are having problems, and which of your SSIDs are having issues with Successful Connects. Now notice that right now I've just clicked on Successful Connects and I haven't drilled down into a classifier, but you can do that at any time. For example, we could click on authorization and now we're only looking at the authorization issues.
So the next thing I want to do is I want to come over here to the affected items list. This is going to show us a list of host names that are affected due to unsuccessful connections. These are devices that are failing this service level expectation.
You can see that this device, which we only get a MAC address for because it probably hasn't successfully connected before, has failed a hundred percent of the time. This device has failed 14% of the time. And this device has failed 3% of the time.
So let's take a closer look at one of these devices to see exactly what's going on. I'm going to click on this one that just has a MAC address to take a closer look. And so first, we get an answer from Marvis.
Marvis tells us, and by the way, Marvis is a name that we assign our AI and machine learning engine that helps us understand what's going on automatically. I'll show you more about Marvis in just a little bit. It says the client failed to connect on a hundred percent of attempts due to authorization problems.
Interesting. Let's take a closer look to see what's going on. If we click on the view insights button, that's going to show us all the events that have happened for this device within whatever time span that we have selected.
Now, right now I have seven days selected, but you could select the last hour. You could select, you know, between Monday and Thursday. You can look at any time span you want within the last seven days, and you can see that this device has been going through a brutal process where it deauthenticates from the access point, and then there's an authorization failure, and then it gets deauthenticated again. And then there's another authorization failure.
And this happens over and over and over. And the reason code here is that there was a WPA four-way handshake timeout. That sounds a lot like a bad passphrase to me.
It looks like this device just has the incorrect WPA2 passphrase. Now notice that there's also a little paperclip icon next to each one of these bad events that occurs. Mist will automatically take a packet capture whenever a wireless client enters a bad state, and this is just a plain old PCAP file that you can download and open up in your favorite packet analysis tool, like Wireshark or Eye P.A. or OmniPeek.
And so this is taken automatically to help you get to the root of the problems very quickly. Now, like I mentioned earlier, we're going to show you any events that the client device goes through here, and notice all of these different events. We'll show you positive events like a DHCP success, and neutral events like a disassociation. It's not really good or bad.
It's just something that happens. We'll show you negative events, events like a DHCP aborted or a DNS failure. We measure over 150 unique client states, and we'll expose many of those events here so you can look and see exactly what happened to a device within a seven-day period.
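The reasoning in this part of the demo (repeated authorization failures, never a success, always with a four-way handshake timeout, so probably a bad passphrase) can be sketched as a small Python heuristic. This is only an illustration of that line of reasoning, not how Marvis or the Mist cloud works internally. The event records, field names, and reason strings are hypothetical.

```python
from collections import Counter

# Hypothetical event timeline; the field names and values are assumptions
# for illustration and do not reflect any real Mist data format.
events = [
    {"type": "deauthentication", "reason": None},
    {"type": "authorization_failure", "reason": "wpa_4way_handshake_timeout"},
    {"type": "deauthentication", "reason": None},
    {"type": "authorization_failure", "reason": "wpa_4way_handshake_timeout"},
    {"type": "deauthentication", "reason": None},
    {"type": "authorization_failure", "reason": "wpa_4way_handshake_timeout"},
]


def likely_cause(timeline):
    """Guess a probable cause from a client's event timeline."""
    failures = [e for e in timeline if e["type"] == "authorization_failure"]
    successes = [e for e in timeline if e["type"] == "authorization_success"]
    if not failures:
        return "no authorization failures"
    # Find the reason code that appears most often among the failures.
    top_reason, _count = Counter(e["reason"] for e in failures).most_common(1)[0]
    # Never succeeding and always timing out in the handshake points at the key.
    if not successes and top_reason == "wpa_4way_handshake_timeout":
        return "possible PSK mismatch"
    return f"authorization failures ({top_reason})"


print(likely_cause(events))
```

A real diagnosis would also weigh whether other clients on the same SSID are failing in the same way, which is the kind of correlation the demo turns to next.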
Now, that feature applies to a lot more than just clients. You can look at access points as well. So for example, we could go look up a specific access point.
Like let's look at LD_testbed_MB, and we can go look and see what has been going on with this access point. Have there been any RadSec changes? Have there been any certificates regenerated? Have there been any DNS failures, or maybe man-in-the-middle attacks that have been detected? We'll show you anything that has happened for this client or for this access point within the last seven days. That applies to clients, switches, wired clients, and of course, wireless clients. You can look at all that data to see what's going on. So now let me show you the automatic way of doing troubleshooting.
And that's going to be with Marvis. Remember, I just showed you the manual way. Let's look at the automatic way.
Now, if we go to Marvis, this is where we can ask Marvis questions about what is going on on the wireless network. Like for example, you can say, show unhappy clients and Marvis is going to return a list of devices that are correlated with success and a list of devices that are correlated with failure. As you can see, Kilimanjaro has been highly correlated with failure.
And so let's check this device out to see what's going on. We can click on it and click on troubleshoot, and Marvis, using the power of AI and machine learning in the cloud, is going to automatically troubleshoot this client device to figure out what's going on.
It looks like the client failed to connect on 100% of attempts due to authentication failures because of a possible PSK mismatch. Yeah, that looks like a bad passphrase to me. And from here we can investigate this further.
For example, we can go look to see what other aspects of the network correlate with this failure. Is it a client specific problem? Is it happening to everyone on the network? Or we can go look at all of the different events that have occurred for this client device, including the dynamic packet captures that I talked about earlier. Now this goes way deeper than just troubleshooting wireless devices.
We can actually troubleshoot the wired side of the network as well. Now we have the capability to manage all of our Juniper EX switches. And so for example, we can go look at a specific switch.
This one's got three access points plugged into it, and we can even go look at client devices on our switches to see what kind of events have been happening to those devices over time. And Marvis brings this information in as well. We can ask it to troubleshoot access points.
Like for example, we can say, let's say, "how is Marvis." And this is actually the name of one of the access points on our demo network here. It looks like the AP has a low Ethernet speed of a hundred megabits per second.
Here's the switch it's connected to. Here's the port it's connected to as well. Now, if you don't want to ask Marvis questions about the network, you can also go to Marvis actions, and Marvis actions will automatically find things in the network that need to be fixed.
For example, if we look under the switch actions here, we can see that there's a missing VLAN action. So it looks like one of our switches here has two missing VLANs and Marvis detected this for us automatically. This AP, connected on port 21, is missing these three VLANs.
And so is this access point. They're both missing VLANs and we use machine learning to find those. We also use machine learning to automatically find bad cables and negotiation mismatches.
So Marvis can help you find these issues automatically. Thanks for watching our video about how to do troubleshooting with Mist and Marvis. If you have any questions, please be sure to let us know, and I hope you have a great rest of your day.
Bye now.
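To make the missing VLAN action from the demo more concrete, the following Python sketch compares the VLANs an access point needs against the VLANs configured on its switch port. The transcript says Marvis uses machine learning to detect missing VLANs. This simple set comparison is not that method and only illustrates what "missing" means. The port name, VLAN IDs, and function name are hypothetical.

```python
def missing_vlans(required, configured):
    """Return the VLAN IDs that are required but not configured, in sorted order."""
    return sorted(set(required) - set(configured))


# Hypothetical data: the VLANs an AP's WLANs use, and the VLANs on the switch port.
ap_required_vlans = {10, 20, 30, 40}
switch_port_vlans = {"ge-0/0/21": {10}}

missing = missing_vlans(ap_required_vlans, switch_port_vlans["ge-0/0/21"])
print(missing)  # [20, 30, 40]
```

In this made-up case, traffic for VLANs 20, 30, and 40 would not pass through the switch port, so clients on those VLANs would fail to get connected.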
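Dashboard
With the Juniper Mist dashboard, you'll see:
- At-a-glance success/failure indicators
- Visualizations that show exactly when and where a problem originated
- Packet captures for each event
- Root cause analysis
Better yet, you can spot problems before they have an impact. With the service level expectations dashboard, you can quickly spot anything that isn't meeting your expectations. Take action before an incident occurs.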
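Marvis
If you have a Marvis Virtual Network Assistant subscription, you also get:
- AI-recommended actions that improve network performance and user experience
- Conversational support, including problem identification and troubleshooting
- A powerful query language for more structured queries
- Proactive identification of potential issues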