Linked Resources:
- Frédéric’s Twitter (@CoperniX)
- Word embeddings
- Word2Vec
- Andrew Ng’s deep learning course
- Bing’s new indexing API
- Frédéric’s TechSEO Boost presentation (w/Christi Olsen)
- Frédéric’s article on ML and guidelines
- Martin Splitt’s article on JS
- Playlist for Martin’s JS Series
- BM25 (i.e., more advanced TF-IDF)
Topic Timestamps:
[0:15] intros
[2:45] why the relationship with the search community
[4:05] how can webmasters help Bing (remember Bing!)
[5:45] why is the community focused on JavaScript?
[9:00] Frédéric’s TechSEO Boost talk
[12:15] what should SEOs know relating to machine learning?
[14:15] trust, Bing, and spam
[15:30] Bing’s approach on dealing with spam
[16:45] importance of high quality results (especially top results)
[17:15] search and SEO community relationship
[20:15] what makes a strong eCommerce site? (trust)
[22:15] Bing on accepting feedback
[27:15] internationalization in Bing
[28:15] why word vectors
[32:15] related content hubs
[33:15] more word vector stuff
[35:45] Karen Spärck Jones and IDF
[37:45] 3 pieces of wisdom
1. remember that we build sites and products for people
2. take Andrew Ng’s deep learning course
3. sign up for BWT (submit URLs or use their brand new API)
[44:15] closing
Favorite Quotes:
- “In the end, we build all of this for people.”
- “One of the ways we frame it here (at Bing) is, if you look at all the eCommerce websites on the Internet, one question we ask ourselves is, would we give our credit card numbers to that website?”
- “It comes from the fact that our users really trust us to serve the best and most authoritative results.”
- “Because when we fail it has real life consequences for these people.”
- “We are an industry where we are builders. We build websites, we build products, we build the search engine. We all build these things for people.”
Transcript:
Note 1: Add about 15 seconds to timestamps to account for intro. 🙂
Note 2: If you notice any major errors, please reach out to seointhelab [at] merkleinc.com. We tried our best to stay true to the vocal version.
[00:00:02] Alexis Sanders: Hello. Hello. And welcome back to the podcast. Today we have Frédéric Dubut from Bing as well as Max from Merkle. Max, would you like to give an introduction of yourself first?
[00:00:12] Max Prin: Sure. Thanks, Alexis. My name is Max. I lead the technical SEO team here at Merkle and we focus on the most technical aspects of SEO, such as structured data and crawling and indexing.
[00:00:25] Alexis: And then Frédéric.
[00:00:26] Frédéric Dubut: I am Frédéric Dubut. I’m part of the Web ranking and quality team here at Bing, with a specific focus on anti-spam, anti-malware, and all the bad stuff.
[00:00:37] Alexis: Awesome! And one of the things I found in my research of you, Frédéric, is that you speak five different languages. How did that even happen?
[00:00:46] Frédéric: Well, I don’t speak them very well. Really, I’m only truly proficient in French and English. They say practice makes perfect, and for languages, I think the lack of practice makes you forget very, very fast. In France (and Max went through the same system, I believe), you have to start studying a foreign language when you are, like, ten or eleven or so, and you have to take, like, seven or nine years of language. And you have to take two of them. So I picked English and Spanish. That’s why there’s two. And then I was interested in Japan in general, so I learned a little Japanese and lived there a little while. And then I started to work in Zurich, so I had to pick up a little bit of German. There are your five languages.
[00:01:29] Alexis: Wow! So we could do this podcast in, like, totally in French, probably with you and Max. And I would just listen in… I’m just kidding.
[00:01:37] Frédéric: Exactement (exactly).
[00:01:40] Max: Yeah, because after several years living in the U.S., you forget your French. That’s very hard for me.
[00:01:52] Alexis: I imagine that’s so true. Gosh…
[00:01:56] Max: I actually have a hard time talking about SEO in French. Everything is in English.
[00:02:04] Alexis: Is it just the work terminology?
[00:02:05] Max: Yeah, everything. All the key terms, everything is in English.
[00:02:09] Alexis: Ah, that’s so interesting! And fascinating. Awesome. Okay, so if we dive into some of the meat of the podcast: I’ve been seeing you (Frédéric) speak on the circuit a lot. And, of course, thank you so much, because it’s so fascinating. One of the things that I’ve been noticing is that Google, and Bing especially, have been integrating more with the search community, which is awesome to see, and one of the things from the SEO perspective that I’m really interested in is – what can we, as SEOs, do to support Bing?
[00:02:40] Frédéric: Yeah, that’s a good question. In general, the reason why we want to interact with the community is that they keep us honest, in the sense that we know the product we want to build, we think we know how it’s working, and then you talk for ten minutes with SEOs and they tell you, “No, no, no. This technique you thought you eliminated actually works great.” These kinds of things. So for us, it’s really enlightening. Any feedback the community has is definitely the best way to help us make a better product for users.
[00:03:12] Alexis: So it’s almost like, by going to these SEO conferences, or search conferences, you guys are doing some product research?
[00:03:17] Frédéric: Yeah, absolutely. And the product or program manager role is very focused on customers, on understanding users. And we’re in a very interesting position at Bing, where we have two different sets of customers, so to speak. We have the final users, who are actually using the product and entering the search queries. But the webmaster community, and SEOs as well, is extremely important. Without webmasters, without people who write great content for the Internet, there would be no point in having a search engine. So for us, it’s extremely important to interact with both.
[00:03:52] Alexis: Definitely! That’s awesome! And are there any specific tasks? I know that when you spoke back at SMX East, you talked about how we could optimize our crawl efficiency as something that is helpful to Bing and is really useful. Is there anything else like that that you can think of that, at the end of the day, makes both our websites better and Bing a better search engine?
[00:04:15] Frédéric: Yeah, and for a lot of people, it will be just making sure the basics are working (in terms of crawling and indexing). A lot of the technical SEO is not very different for Google and Bing. But what a lot of people don’t realize is that sometimes they just allow Google to crawl everything and Bing gets a disallow.
And if you are an SEO or a webmaster, and you complain that you feel you’re too dependent on Google to get your search traffic, but at the same time you’re blocking all the other crawlers, all the other search engines, from indexing your website… Well, you’re never going to get away from that situation. And those people don’t realize these are two very related points. So making sure the basics are working for Bing in the same way they are working for Google is definitely the number one thing for most people.
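A quick way to check for the situation Frédéric describes is to test a robots.txt file against both crawlers. The sketch below uses Python’s standard `urllib.robotparser`; the robots.txt content, the `crawl_access` helper, and the example.com URL are made up for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that welcomes Googlebot but shuts out Bingbot,
# which is the mismatch described in the conversation above.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Disallow: /
"""


def crawl_access(robots_txt, url, user_agents=("Googlebot", "bingbot")):
    """Return {user_agent: allowed?} for one URL under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


access = crawl_access(ROBOTS_TXT, "https://www.example.com/products/")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

If the two crawlers come back with different answers for pages you want indexed, that is worth fixing before anything more advanced.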
[00:05:02] Alexis: I love that point. Think of Bing, remember Bing.
You obviously can’t see me right now, but I’m wearing a Bing sweatshirt today, so really reppin’ Bing.
[00:05:12] Frédéric: I don’t. (Lol) There’s only one person on this podcast wearing Bing swag.
[00:05:18] Max: That’s not true! I have a Merkle branded jacket that has a Bing logo on the shoulder.
[00:05:20] Frédéric: Nice, nice. Then there’s only one person on this podcast who’s not wearing Bing.
[00:05:28] Alexis: That’s awesome. So you reminded me of a tweet that you had recently where you asked people in the search community what they’re interested in hearing about, whether it was JavaScript, machine learning, or search history, right? And the top one was JavaScript. Why do you think that the search community is so fascinated with JavaScript?
[00:05:48] Frédéric: Well, I think there’s a legitimate concern from the community that as their websites are getting more and more complicated, the search engines are not going to represent them (in their index) in the best way.
There is a lot of misunderstanding around what the best practices for JavaScript are (the things we should do and the things we shouldn’t do). Maybe on the search engine side, there was some miscommunication in terms of “do we support JavaScript” or “we don’t support JavaScript”. It’s much more nuanced than just saying, “Oh, yeah, of course we support JavaScript.” Yes, at Bing we can claim we support JavaScript, in the sense that our crawler is able to download those kinds of resources and render the pages for most frameworks. But it is also a relatively resource-intensive, computationally intensive process. So you have fairly little control over what search engines are going to index on your site from JavaScript compared to regular HTML. And I understand that can cause some nervousness in the SEO community. So that’s probably why there are a lot of questions and concerns.
[00:06:54] Alexis: Yeah, so a lot of anxiety combined with, probably, like you said, some different formats of information. I know that one of the things that we’ve seen is that for certain sites, JavaScript is totally fine. But then you’ll have another site where they’ll switch over to a very JavaScript-heavy experience and their traffic will suffer from it. So I think that lack of consistency in terms of experience is so fascinating, and I think it’s something that makes people really anxious, because they don’t want to have that type of trend in their performance as well. So thank you for sharing that. So are you going to actually write a piece on that?
[00:07:26] Frédéric: I think the piece will be on ML and guidelines, actually. One of the reasons is that Google came out recently with a very good article. I think it was Martin Splitt who wrote it, about Rendertron and techniques to make it easier for websites to be indexed when there is JavaScript. I think there is a lot of literature that has already been written on JavaScript. So I felt that, even though the result was fairly clear (I think it was forty-five to forty-one percent), it was close enough, and there was not enough written around ML and guidelines, and that’s probably why I’m going to write about that.
[00:07:59] Max: Yeah, I was about to say, that’s the good news about best practices for JavaScript: what you can do to make sure search engines can understand your content is to serve a prerendered page or HTML snapshot, and it goes for both Google and Bing. Once again, what you do for one search engine is not different from what you would do to optimize for another search engine, so it’s an optimization of time.
[00:08:22] Alexis: Optimization of time. I love it. And I love the idea of basically endorsing content that Google has already done, saying this is fine, it works similarly for Bing, let’s focus on what we need to with machine learning. Which brings up the talk that you had at TechSEO Boost, which was so fascinating as well. And I loved your little quip where you said that Bing was the first to be powered by a neural net. That’s so exciting and so interesting.
[00:08:45] Frédéric: Yeah, that’s a little-known fact, and that’s why Christi (Olsen) and I insist on it; we kind of hammer it at the end of every conference now. We say Bing was the first one. Interestingly, these were very rudimentary neural nets at the time. Like, far from deep learning. It wasn’t really deep in any way, because there was only one hidden layer and it was, well, it was very simple. It shows that it’s something that’s been top of mind at Microsoft Research and Bing search for quite a while. At Bing, we believe the best way to scale search is to use machine learning, to make the machine learn what the best results to return for a query are. And that’s why we have taken this approach, which may be slightly different from what other major search engines do.
[00:09:30] Alexis: I love that it’s like the most shallow, deep learning that you have. (lol) I’m just kidding, of course.
[00:09:36] Frédéric: That’s right. It’s just quite bits. That was thirteen years ago. So…
[00:09:41] Alexis: It was deep for thirteen years ago, exactly. And you mentioned too that they had, like, a ton of features, over five hundred features that were engineered into it. Which, I mean, is one of the very challenging things that had probably been custom done.
[00:09:54] Frédéric: Yeah, and honestly, I don’t know exactly how they did it back in 2005. But feature engineering is definitely a big challenge, and that’s why a lot of the discussions around ranking factors sound a bit funny, especially for us at Bing, because some of the features are derivatives of several other features. You combine things, and our engineers’ take is that it really is a machine learning problem, so they create new features that won’t really make a lot of sense for humans but actually have great predictive power for the model. And that’s why this ranking factor thing always comes across as a bit odd for us.
[00:10:33] Alexis: Yeah, I loved how powerful your example was of how what a machine sees is so different from what a person sees. In your example, you had used a stop sign where basically all they did was cover up a small part of it, and the machine from that saw something totally different, which was a speed limit sign. And I think the idea that machines process information differently than humans process information is so interesting and so fascinating, and probably something that you have to deal with on a daily basis.
[00:11:00] Frédéric: Yeah, and that starts with some of the worst cases, where we see things that are different from what users are seeing, like cloaking and this kind of thing. These are considered the cardinal sins of SEO and search, because if the machine can’t even access the same thing as users, we lose all trust in everything you’re doing. But even going back to JavaScript, that’s also exactly what the problem is with JavaScript: not having the guarantee that the machine is going to see all the goodness you’re showing to users when you have a JS-heavy page. So training on all these features, reading all this knowledge, can get complicated for sure.
[00:11:44] Alexis: Definitely. So what do you think is the most important part of machine learning for search professionals to understand? Because there are so many complicated elements, like Bing’s LambdaMART, vector space (which I love; I really hope “it’s the same in the vector space” catches on in the industry), and of course, RankNet. What do you think is really important for people who are maybe less technical or less well versed in mathematics to understand about what you’re trying to achieve with machine learning?
[00:12:13] Frédéric: So that’s where I think the guidelines come into play. If you look at the process of machine learning, it can get pretty complicated from a technical point of view. If you’re not technical, that sounds like a foreign language. So what is really important to remember is that it’s a way to generalize search algorithms that is trained on how humans would judge the sites according to the guidelines. So the way we train our machine learning model, we have a subset of queries and URLs, and we send judges to these websites, and we ask them to rate them according to the search quality guidelines, and that makes your training set. And we hold out a little bit of this data as validation and test sets. And then that’s where you train your machine learning algorithm. You want the algorithm to perform really well on this small subset of queries and URLs that have been judged by humans. Then you validate with other metrics that it generalizes pretty well to the 1,000x more queries and URLs we see. So in the end, thinking, “Would my site, according to the guidelines, get a perfect or excellent or good rating?” is probably a good way to think about it.
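The hold-out step Frédéric describes can be sketched in a few lines. This is only an illustration of splitting human-judged (query, URL, rating) records into training, validation, and test sets; the `split_judgments` helper, the 10% fractions, the sample records, and the "fair" and "bad" labels are assumptions, not Bing’s actual pipeline.

```python
import random

# Hypothetical rating scale; only "perfect", "excellent" and "good"
# are mentioned in the conversation, the rest are placeholders.
RATINGS = ["perfect", "excellent", "good", "fair", "bad"]


def split_judgments(judgments, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle judged records and hold out validation and test subsets."""
    shuffled = list(judgments)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, validation, test


# Made-up judged records: (query, url, human rating).
judgments = [
    (f"query {i}", f"https://example.com/page-{i}", RATINGS[i % len(RATINGS)])
    for i in range(20)
]

train, validation, test = split_judgments(judgments)
print(len(train), len(validation), len(test))
```

A model would be fit on `train`, tuned against `validation`, and scored once on `test`. In a real ranking setup it is common to split by query rather than by individual record, so that the same query never appears on both sides of the split.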
[00:13:26] Alexis: Nice. I love the idea of using humans as almost, like you said, training all of that data so that you can then iterate on that process to make it more efficient and better in the end. There’s something very beautiful in that. Hopefully, one day it’ll be all machines, right?
[00:13:40] Frédéric: I don’t know if I would trust machines to do 100% of the work. For one, I like my job. I don’t want a machine to take it. (lol) In the end, we build all of this for people. So keeping people involved in the process, keeping the machine honest, looking at whether the results make sense, not just whether the metric looks good: I don’t think it’s going away anytime soon.
[00:14:04] Max: And Frédéric, you talked about trust towards a website. I remember from experience that Bing is pretty aggressive with spam or big red flags about websites. A few years ago, I remember a website that we were launching, and it was a dot info. And just for that reason, it could not be indexed right away. Can you tell us more about, like, maybe some big red flags that Bing has in the system that say, “That website is most likely not a good one”?
[00:14:36] Frédéric: So in the same way that I don’t think there is any silver bullet in terms of good ranking factors, when you go outside of the worst offenders like cloaking, I’m not sure there is anything where we would ban a website outright. I think what happens at Bing, compared to other search engines, is we tend to see violations of the Webmaster Guidelines as mostly voluntary.
When I hear Fili Wiese (and I know he doesn’t represent Google anymore), he talks about these manual penalties, and he says this is mostly education, and if people fix their issues, (Google will) remove the penalty and everything is great.
And on our side, we probably take a more punitive approach, where if you try to cheat the system, you’re going to have a penalty that is going to last for a while, because we don’t want you to cheat the system again. And we’ve seen before that if we remove the penalty, the site tends to just do the same things again. So that’s why, when you say we’re harsher on spam, I think the idea of spam is fairly similar, but the way we approach it is a bit different. Maybe more of a punitive way, to make sure people who live by the rules actually are ranked higher in the results.
[00:15:45] Max: That makes sense.
[00:15:47] Alexis: It brings back this idea that your site is a relationship between your experience, your users, and search engines as well. Because there’s almost like this implicit trust that’s formed, and you mentioned the word trust. Of course, I know that with Google, this whole idea of expertise, authoritativeness, and trustworthiness is becoming more and more important, or popping up a little bit more. But I think it’s so interesting and fascinating that you’re using that as a standard, almost as if it’s an actual relationship.
[00:16:16] Frédéric: Well, yeah, and it comes from the fact that our users really trust us to serve the best and most authoritative results, especially for queries like, the tax season is picking up, and people want to make sure that they’re not giving their social security number and all of their confidential information out to scammers. And a lot of people will trust whatever comes in the first positions on Bing. If they click on the first link, most of them actually cannot even imagine we will send them to a scam or anything like that. So it is a huge responsibility for us. That’s why we take it extremely seriously. Because when we fail it has real life consequences for these people.
[00:17:00] Alexis: Definitely. And I know that one of the articles that you mentioned you’re thinking about writing was on the history of search, and I don’t understand why it was the least popular. Because I feel like hearing the history of search from your perspective would be so incredibly fascinating, because I’ve felt (and I don’t know if you’ve felt this as well) that as time has gone on, people have gravitated more towards that first result. Whereas in the past, I mean, when I was younger, I just remember almost being told to more critically evaluate all of the results that were coming through, and now it’s like, oh, just click on the first one, whatever that says is fine. Which probably shifts more responsibility onto you as a search engine and your team.
[00:17:38] Frédéric: Yes, and so there are two aspects to this. In one sense, it makes it slightly easier, because if it puts more weight on the number one, number two, number three results, that also means the weight of responsibility is lower for returning things that are not necessarily the best results at number nine or number ten. For some queries, like if you type something like [Facebook log in], there is an excellent number one result I can think of, and not much more at two, three, four, five that I think would fulfill the user intent.
So to some extent it makes it a bit easier for this category of very navigational, very explicit intent queries. But on the other hand, you’re right that it’s definitely changing. If you look at the past twenty years, search used to be more of an information retrieval problem. So really the idea was, imagine this is a library of all the knowledge in the world: “How can I find the ten best pieces of information, or the ten best books in the library, to match this query?” And slowly, we’ve evolved towards more task completion, actual transactional intent, and also more and more money got involved. And so that’s where you get spam and people abusing SEO, and that’s probably the main thing that changed search: the idea that you get a lot of people who would love to be number one and are going to do whatever it takes to be number one. It’s not just an information retrieval problem anymore. It’s becoming a really full-fledged product where all of these dimensions (relevance, quality, context) fall into place.
[00:19:23] Alexis: You know, when you have a lot of money on the line, I can imagine there’s a lot of consequences that could happen. And, of course, we’ve heard recently about so many different data breaches and data leakage issues, so super fascinating. So thank you for sharing that with us. Okay. Do you have any questions, Max?
[00:19:39] Max: Yeah, sure. Since you’re in Seattle, and I hear there’s a big eCommerce company in Seattle, maybe you can tell us (it might be a little bit outside of the internal Bing system): what, for you, makes a great eCommerce experience? Like features on the website that users expect, maybe, and that Bing will reward, without giving away ranking factors. That’s not really my question, but something that you guys are looking for because users are looking for it.
[00:20:13] Frédéric: Yeah, when it comes to the big eCommerce company in Seattle, it makes our life easier, because one of the ways we frame it here is, if you look at all the eCommerce websites on the Internet, one question we ask ourselves is, would we give our credit card numbers to that website? And so when it comes to our neighbor in Seattle, sure, I think anyone in the world is confident that if they give them their credit card number, it’s going to be taken care of with the greatest care and they’re not going to get unwanted charges. And on the other hand, there are many websites on the Internet where, like, never ever would I give even four digits of my credit card. When you look at these sites, this is really the question you ask yourself: would I give them my credit card number? It works the same in the user’s mind. It’s like, what are the trust factors on this website? Does it look professional? Does it have an actual contact address that we can look up somewhere? I know that Google has the BBB rating in their guidelines, and I don’t think they use it as a ranking factor or anything. But the idea that someone else is vouching for you is something that you need to take into account if you have an eCommerce website. From a trust point of view, all of these things are probably the number one thing: you want to make sure users are willing to do business with you, are willing to give you their credit card number. And that’s what we’re looking for in the end, user satisfaction.
[00:21:38] Max: I love that you said that, because just from a design standpoint, today, with frameworks and built-in features, and even with Bootstrap and others, like CSS and HTML frameworks that you can find out there, it is pretty easy to make a scam look really good and really professional. So I’m glad to hear that it’s not just about looks, because a website can look good and still be a scam. And hopefully we won’t see it popping up in any search results.
[00:22:06] Alexis: Definitely. It just sounds like it all comes back to trustworthiness. So I’m really excited to hear that. Okay. All right. I’m going to go back to one of your tweets. In your tweet, you mentioned that you review user feedback and that you set aside a specific time to review that, which is really exciting, because I feel like I’ve really felt a lot of positive energy coming from the Bing team in terms of almost doing a listening tour and trying to figure out what’s going on in the space, and how you can then learn and react from that. So has the time that you’ve spent reviewing user feedback ever resulted in a new project?
[00:22:39] Frédéric: Yeah, I find it super important to look at feedback. It’s a personal belief I have that, as program or product managers, it’s an essential part of our job. And I don’t know if you can do good product manager work without listening to your customers and users and partners. I can think of two examples where it’s been extremely useful.
In one, it was very visible feedback. If you remember, last spring, I think Yoast posted something about Bing crawling too much. They have a lot of data, probably from their plugin, and they are very well informed on these problems. And we took the feedback very seriously, and we had heard before from other people that Bing tends to crawl too much compared to Google, and that’s something we definitely started to look at very closely. And that’s what resulted, last week I believe, in this new indexing API we announced at SMX West, as well as the integration with the Yoast plugin on MyYoast, which was announced at their conference last week.
So this is a very concrete case where the feedback we’ve been listening to and aggregating, compounded with someone very visible and very vocal who voiced the same feedback, resulted in something extremely concrete that we announced in the past couple of weeks.
Something that is a bit fuzzier, probably, is around spam and all the times we are failing our users, so to speak. I take the feedback extremely seriously. When I hear several different people tell me, “If I type a query for this domain, like the name of a drug or this kind of thing, I really see bad results,” this informs where we’re going to invest our resources. And if I hear that a certain area is getting more and more spam, or if some very technical people come to me and say, “I notice that this category of site, putting these keywords in this way or whatever, is ranking higher than it used to,” this is just all goodness.
So I invite all the listeners if you have any feedback you want to give to us, you can tweet at me directly on Twitter. Or you can use the feedback form on Bing on the upper right menu and we take it extremely seriously.
[00:24:58] Alexis: That’s awesome. It’s almost like keeping one ear to the ground just to make sure that everything is going well, like a pulse, which is awesome. So thank you for doing that.
[00:25:05] Frédéric: Yeah, and in the end we do it for our users. So, like, we have a lot of ways to scale our understanding of user satisfaction with metrics and numbers. But there’s nothing like qualitative feedback from actual people. I have a personal belief that if you talk to ten users and you listen to their actual verbatim feedback, you learn so much more than just looking at a number, even if the number covers one billion users.
[00:25:35] Alexis: It’s just so interesting to hear, though, that that qualitative feedback is so valuable because I think a lot of times when we think about data, we think about data and the massive amounts of information that, like even we receive on the webmaster end. And I mean, I can only imagine how much you guys received on your end. But we usually think about all quantitative, quantitative, quantitative. But the value of qualitative data is so interesting and how it can give you a totally different perspective. So thank you for sharing that with us.
[00:26:02] Max: I’d like to go back on the fact that there’s not a lot of differences. And what webmasters technical SEO’s and the SEO can do to optimize for search engines like at least Bing and Google. There was one that I can think in term of, you know, those technical tags that we implement, and things that we do a hreflang tags for international SEO and we all know that hreflang tags – they do work for Google, well most of the time, but it could be extremely complicated, setup and implementation really are to manage. Bing has not been on board with, like a tag, can you tell us a little bit about like, how are you guys like, really handle that? Not duplicate content, but violations, international violations and how you detect like the targeted audience, basically for this website that are multiple, like regional languages to target.
[00:26:55] Frédéric: So I’m going to be a very disappointing answer. I’m not very familiar with the hreflang tag treatment at Bing, so instead of giving an answer that I think would be inaccurate. What I can tell you is if you have a Web sites like, let’s say, blah dot com and in English blah dot fr in French. And if it is the same company and like, we have some ways to detect that this is not duplicate content that this is actually like two different language is the same thing if you have, like, slash en and slash fr on given website. But in terms of hreflang, I just don’t know, so sorry about that,
[00:27:43] Max: Yeah, as well, as we know, like officially Bing does not support hreflang tag again. That’s not something that I’m really surprised of because it’s a very complex implementation. I even heard people at Google that have been working on creating those tags that they’ll not extremely satisfied with the way it turned out. That it turned out to be more complicated, that they wanted it to be.
[00:28:04] Frédéric: Well, What I can tell you is it’s already complicated enough when you have only one language and in two different websites, and you want to do just a simple redirect from one to the other or simple economical. And sometimes when I look at the presentations from other SEOs in in conferences. And they show this super complicated graph for like, four websites, all canonicalizing to one another with the hreflang in like multiple foreign languages like, it just sounds like an extremely hard problem. So I’m not surprised that some people at Google say it is hard, we don’t get it right all the time.
[00:28:41] Max: Yeah, Sure.
[00:28:42] Alexis: That reminded me of the concept of different words, going back to the question of vectors. You talked in your TechSEO Boost presentation about this idea that when you represent words as vectors, it ends up being more efficient. Why vectors? I’m mostly curious because I’m in a class, and we literally just learned how to calculate the distance between two vectors. So I loved when you muttered under your breath, you’re like, you could just use the cosine of the angle. I was like, you totally can. (lol) It was like, I can find you the formula for that. But I was curious, what is it about vectors? And for people who are less technical with math, a vector is almost like just a direction, or an arrow with a line. So if you look at Frédéric’s presentation, you can get that type of visual, or just Google word vectors. But why are word vectors so useful?
[00:29:28] Frédéric: So, in summary, the key concept here is embeddings. The idea is that you have, I don’t know, maybe one hundred thousand words, or a million words, in the language, and you want to find similarities between the words. The way we do that is we convert these words into a series of numbers. Depending on the exact implementation, it’s, say, one hundred numbers that are going to represent what this word means. And you train your model so that words that mean roughly the same thing, or that are similar, have numbers that are close to each other. So that’s a nice way to essentially compact the knowledge in your dictionary into a simple representation of one hundred numbers.
All these numbers represent different directions in a multi-dimensional space. If you imagine the real world with three dimensions, there are three numbers, like left and right and so on, so to speak. In this world, it’s like one hundred different dimensions. We try to find the similarities, and in the end, as you mentioned, we essentially measure the distance between two different words. If you have something like, let’s say, apple and orange, these are fairly different objects. The words are completely different, but these are fruits, so the concepts are still relatively similar. So I expect these words to be relatively close in the space.
The reason why it’s extremely useful for search and SEO in general is that it gets you away from this idea that you need to use synonyms, or make sure that you cover ten different variations of the same concept. The hope here is that the machine is going to understand that. If you are a fruit distributor, you don’t need apples pears distributor dot com, orange business dot com, pear business dot com. We understand you’re a fruit distributor. All these things make sense to the machine, so that’s why it’s extremely exciting for us as a development.
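To make the "cosine of the angle" idea from this exchange concrete, here is a minimal Python sketch. The four-dimensional vectors are invented for illustration (real embeddings have on the order of a hundred dimensions and are learned from text), and the names `cosine_similarity` and `toy_vectors` are hypothetical, not anything Bing uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings, made up for this example.
# Real models learn these numbers from large amounts of text.
toy_vectors = {
    "apple": [0.9, 0.8, 0.1, 0.0],
    "orange": [0.8, 0.9, 0.2, 0.1],
    "laptop": [0.1, 0.0, 0.9, 0.8],
}

fruit_similarity = cosine_similarity(toy_vectors["apple"], toy_vectors["orange"])
other_similarity = cosine_similarity(toy_vectors["apple"], toy_vectors["laptop"])

print(f"apple vs orange: {fruit_similarity:.2f}")
print(f"apple vs laptop: {other_similarity:.2f}")
```

With these made-up numbers, the two fruits score close to 1 while apple and laptop score much lower, which is the "relatively close in the space" behavior described above.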
[00:31:51] Max: I love that you say that. I always use the superhero example, telling people that, yes, if you want to rank for Superman, then maybe it’s good that your website talks about Batman or Spiderman too. And again, as you just mentioned with the fruits, they’re all different words, but they are related because they are all superhero names, and that will make the website more relevant for a particular topic. So, something like that: you need to expand to the context of what the topic is actually about.
[00:32:22] Alexis: Yeah, and I’ve noticed that a lot of people in the industry, over the last two years probably, have been talking about this idea of entity optimization: instead of focusing on keywords, focusing on that overall idea of being known for something, essentially.
[00:32:36] Frédéric: Yeah, that’s very interesting. We have been working on entities for quite a while at Bing, and there was a time, before entities and vectors and this concept of similarities really caught on, when this was much more handcrafted, so to speak. You would have these very strict relationships where an entity links to another. For example, Microsoft is a company, so the type field in the entity is “company”. Then “CEO” is Satya (Nadella), and that would be a related person. And then the relationship field says what the relation is: he is a manager. What’s kind of magical with these vectors and entities is that all of these relationships come completely naturally. You don’t need someone to tell you exactly what the relationship between Microsoft and Satya is. And what is extremely interesting when you look at the literature, and that is probably one of the most fascinating properties of these vectors, is that the distance between Microsoft and Satya Nadella in the vector space is the same as the distance between Google and Sundar Pichai.
[00:33:26] Alexis: Weird…
[00:33:30] Frédéric: And so if you just draw essentially a triangle between Microsoft, Google, and Sundar Pichai, then you can extremely easily find that Satya has the same relationship with Microsoft that Sundar has with Google. I find that really fascinating, and that just makes the relationships so much more powerful, because you can just learn them in the wild instead of having to handcraft them over time.
[00:33:30] Alexis: That’s so mind blowing. And when I’m visualizing this, I don’t know if anybody has seen the graph of Word2Vec, but it sounds like exactly what you’re talking about. It probably stands for word to vector, and it’s basically that three-dimensional graph of words. So, like you’re talking about, you could almost see the clusters of information, of things that are similar and related together, as almost like a group of things that are in one area. Just kind of cool to think about. But it’s even more mind blowing that the relationship, the distance, is exactly similar. That’s crazy. Yeah, mind blown.
[00:34:52] Frédéric: I think in their example, they used man, woman, king, queen, if I remember correctly. And yeah, that’s exactly what I had in mind. So I think, if you’re in technical SEO, read the Word2Vec paper, or in general these foundational papers on word embeddings.
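The man, woman, king, queen example can be sketched in a few lines of Python. In the Word2Vec literature, what is roughly shared between such pairs is the vector offset (direction and length), so king minus man plus woman lands near queen. The three-dimensional vectors below are invented so that the arithmetic works out cleanly; real embeddings only satisfy this approximately, and the helper names `cosine` and `analogy` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' using the offset vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as is usual for this kind of lookup.
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# Hypothetical toy embeddings: the first axis loosely encodes "royalty",
# the second "gender". These values are made up for illustration.
toy = {
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, -0.9, 0.1],
    "king": [0.9, 0.9, 0.2],
    "queen": [0.9, -0.9, 0.2],
    "apple": [-0.5, 0.0, 0.9],
}

result = analogy(toy, "man", "king", "woman")
print(result)
```

The same lookup pattern would apply to the company and CEO pairs mentioned earlier, given embeddings that actually contain those entities.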
[00:35:11] Alexis: That’s so brilliant and so fascinating, too. I really hope the, you know, “it’s pretty much the same in the vector space” or “no, totally different in the space” catches on. I really do hope that catches on. I think it’s kind of interesting to think about. I mean, basically, when you say something like that, it implies that it’s all about relevancy. But I just think it’s kind of another funny way to say that. I think that SEOs tend to find funny ways to say things. Also, I do want to give you a shout out. I thought it was really cool that you mentioned Karen (Spärck) Jones in your speech. I know that she recently passed away, but it’s really cool to have women of science mentioned and especially lauded for their accomplishments. So, thank you for that.
[00:35:52] Frédéric: And she is really one of the most important people in the field of information retrieval, which is like the precursor to search (and SEO). If you look at her work, a lot of people talk about tf-idf. So she’s the mother of -idf. And this specific part of the formula is actually one that has survived over time. If you look at more advanced things like BM25, the tf- part has been changed quite a bit, but the idea of the -idf is almost exactly the same. BM25 is considered state of the art today for information retrieval, in some sense. So it’s quite incredible that her work really is still extremely relevant to the field, like forty years after she wrote a paper on the idea.
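For readers who want to see how the -idf part carries over into BM25, here is a small Python sketch. It uses one commonly published BM25 variant with the usual default parameters `k1=1.2` and `b=0.75`; the three-document corpus is made up, and nothing here is meant to describe how Bing actually ranks.

```python
import math
from collections import Counter

# Tiny made-up corpus; each document is a list of tokens.
docs = [
    "bing search engine".split(),
    "search engine optimization tips".split(),
    "fruit distributor apples and oranges".split(),
]


def idf(term, docs):
    """Inverse document frequency: rarer terms get a higher weight."""
    n_docs = len(docs)
    doc_freq = sum(1 for doc in docs if term in doc)
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)


def bm25(query, doc, docs, k1=1.2, b=0.75):
    """BM25 score of one document for a list of query terms."""
    avg_len = sum(len(d) for d in docs) / len(docs)
    term_counts = Counter(doc)
    score = 0.0
    for term in query:
        tf = term_counts[term]
        # The tf part saturates and is normalized by document length;
        # the idf part is the same rarity idea as in classic tf-idf.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf(term, docs) * tf * (k1 + 1) / (tf + length_norm)
    return score


scores = [bm25(["fruit"], doc, docs) for doc in docs]
print(scores)
```

In this toy corpus "fruit" appears in one document and "search" in two, so `idf("fruit", docs)` is larger than `idf("search", docs)`, and only the third document gets a non-zero score for the query "fruit".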
[00:36:39] Alexis: Isn’t that crazy? Isn’t that, like, every scientist’s dream, that their work outlives them? So amazing.
[00:36:45] Frédéric: Yeah, at a lot of conferences they have what they call the “test of time” paper award, where they look at all the papers that were published ten years before. I think ten years is the canonical time. And they give the award to whatever paper is still relevant, or the most relevant, at the time.
[00:37:07] Alexis: And I mean, obviously something that we want to encourage our scientists to do is have relevant papers!
All right, so for the closing question, Frédéric, basically, I’ve been asking all of the other people that have joined the podcast. What are their three golden nuggets of advice, which is essentially – what should you do from an interpersonal level, a site-related level or really, just a personal development level? Could be anything but just three pieces of advice that you have for our listeners.
[00:37:35] Frédéric: That’s a great open question. I would say that number one is to remember that you build things for people. We are an industry where we are builders. We build websites, we build products, we build the search engine. We all build these things for people. So this is the number one thing. My goal, as a Bing product manager, is to make sure the product is going to be useful to the people who use it, and the consequence for you as a webmaster or SEO is that it is important that the content you build is going to be useful to my users, because I am the intermediary between my users and you. So I want to be able to vouch for you and say, “Yes, I think this is a great result, and I happily send my users to you.” So that’s definitely the number one. From a more technical point of view, I’m just going to reuse what I said a few years ago.
[00:38:32] Alexis: I totally feel for you. You can totally reuse whatever.
[00:38:35] Frédéric: And definitely start looking at embeddings and similarities and how modern NLP is done with deep learning. If you have a little bit of time, take the Coursera courses from Andrew Ng. There’s a machine learning 101, and there’s the deep learning specialization, which is a set of five different courses. I found it easy to take deep learning, even without the machine learning knowledge. I just happened to do the machine learning one before, but this is a great course. You don’t need a lot of technical math background, and he’s going to give you a lot of the understanding around deep learning. So that would be my advice. If you can block a little bit of time over the next few months to take this specialization, or even if it’s not on Coursera, just learn more about these things, deep learning and how it’s used in NLP. That is the future. That is really the future. You get an edge just by learning about these things.
[00:39:37] Alexis: Yeah, I love that. Andrew is definitely the man, too. So…
[00:39:40] Frédéric: Absolutely. He has worked with the biggest companies, just not Microsoft. We need to hire him at some point, just so he can have the Grand Slam of big companies.
[00:39:52] Alexis: Yeah, when you look up his resume, he was very high up at Google. Then he worked at Baidu. So, yeah, you guys totally need to hire him at Bing, so he has, like, everything.
[00:40:02] Frédéric: Exactly. When your lowest achievement is being a professor at Stanford, that just speaks like…
[00:40:08] Alexis: That’s so true. But yes, his class on machine learning on Coursera is actually also very good. And I think it’s a little bit better than the one that’s on iTunes University, because that one’s basically the older class, specifically for Stanford. Yeah, great, great point. I’m definitely going to check out that deep learning course too now.
[00:40:29] Frédéric: Yeah, the machine learning one is definitely a bit more technical, I think. He had two versions, one on Coursera and one on the Stanford website. I think the iTunes one is really the one he had at Stanford. With the one on the Stanford website, it was really assumed that you had basically followed all the classes at Stanford before, and so that you have a lot of knowledge in algebra and things like that.
[00:40:58] Alexis: Yeah, if you don’t know what partial derivatives are, it’s very discouraging. (lol)
[00:41:01] Frédéric: Exactly. He forces you to compute them. (lol) Whereas with the deep learning one, you don’t need that technical background. A lot of times he actually says, if you know about these partial derivatives and everything, great, here is some reading for you. If you don’t know about it, just forget about this. Understanding the concept is somewhat more important than being able to compute a partial derivative.
[00:41:25] Alexis: Yeah, I love that idea of having to understand the intuition of what you’re actually trying to achieve in math. I feel like that’s an underappreciated art, and I thought you did a great job of that in your TechSEO Boost talk as well, saying, “Well, here’s the intuition of it, you know?”
[00:41:40] Frédéric: Well, I guess I tried to channel my Andrew at TechSEO Boost, because he does that a lot. When we talk about one-hundred-dimension vector spaces, it’s going to be extremely hard to visualize or understand what it is. So a lot of the time in his videos, he is going to explain the intuition behind it and why we do this a certain way. That’s why it’s just a great, great series of courses, and not just a good deep learning class. It is, I think, the reference at this point to learn more about deep learning.
And I will use my third key point, maybe, to do a little bit of upselling for Bing, in the sense that we released this crawling and indexing API very recently. There was an integration with Yoast, but you don’t need to use Yoast.
If you have a website that is running on any platform, you can still go ahead, register with Bing Webmaster Tools, and start using the API, or even just submit your URLs directly there. For most websites, if you do that, you should definitely see great improvements in terms of crawling and indexing. So that would really be my top recommendation. If you feel your site isn’t being crawled and indexed properly with Bing, start with Bing Webmaster Tools, submit some URLs, and that should solve most of the problems.
[00:43:05] Max: Are you saying that we should not use XML sitemaps anymore? (lol)
[00:43:10] Frédéric: XML sitemaps are good, but they are just a list. The way we see it, it’s the list of all the URLs on your website. And if you have a million of them, maybe you care a lot about ten thousand of them, not the entire one million.
And that’s great, because you can submit ten thousand URLs to the submit URLs feature in Webmaster Tools, and these are the ones we’re going to prioritize. So we can discover all one million from your sitemap, but instead of letting us decide which ones are more important, we just prefer you telling us which ones are important.
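The "tell us which ones are important" step can be prepared on the site owner’s side before anything is submitted. The Python sketch below only selects and batches URLs; it does not call Bing. The scoring values, the example.com URLs, the batch size, and the function names are all hypothetical, and the actual request format and quotas should be taken from the Bing Webmaster Tools documentation.

```python
def pick_priority_urls(url_scores, limit):
    """Return up to `limit` URLs, highest score first."""
    ranked = sorted(url_scores, key=url_scores.get, reverse=True)
    return ranked[:limit]


def split_into_batches(urls, batch_size):
    """Split a list of URLs into consecutive batches of at most `batch_size`."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Hypothetical importance scores (for example from revenue or traffic data).
url_scores = {
    "https://example.com/": 1.0,
    "https://example.com/products/apples": 0.9,
    "https://example.com/products/oranges": 0.8,
    "https://example.com/archive/2009": 0.2,
    "https://example.com/tag/misc?page=37": 0.1,
}

# A real site might use a limit in the thousands; 3 keeps the example readable.
priority_urls = pick_priority_urls(url_scores, limit=3)
batches = split_into_batches(priority_urls, batch_size=2)

for batch in batches:
    print(batch)
```

The sitemap can still list every URL; a shortlist like this is what would go to the submit URLs feature or the API.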
[00:43:40] Max: That is amazing. Thank you, guys, for putting that together and making it available.
[00:43:44] Alexis: Yeah. Congratulations! That’s so exciting and exciting for us as well in the search community, so thanks! Well, I know you have a tight time schedule, but thank you so much for coming on our show and for educating us all on Bing and the history. And of course, some of the more technical knowledge as well, very, very exciting and very honored to have you on the podcast. We’ll definitely have to check out some of the more technical things, like embeddings as well as deep learning. So thank you for that as well! It’s been an honor!
[00:44:12] Frédéric: Thanks for having me!
[00:44:13] Alexis: Alright. Thanks, ciao everyone!