Discussion:
Discovering URLs by brute force?
b***@hotmail.com
2016-06-21 16:48:12 UTC
So, I have a site on which things will be *posted*; you could say it's sort of like a blog. The *audience* will go there just to *read* it, i.e. all the writing on it will be done only by me. I would like to make something like www.site.com/post, which will have a form with some kind of password check that lets me publish a post to the site.

Now my question: can the "site.com/post" URL be *discovered* by someone? That is, is there a way to brute-force it with a dictionary to see whether there really IS something HTML-ish there? What exactly in the HTTP protocol facilitates this? And, second question, how do I protect against it? Will Apache do it? Apache can be configured to block too many requests from one IP, yes, but this isn't quite that, is it?
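To make the brute-forcing I mean concrete, here is a rough sketch of what I imagine an attacker could do. The host and word list are placeholders, and Python is just for illustration. As far as I understand, all HTTP gives the attacker is the status code: 404 for a path that doesn't exist, versus 200 (or 401/403) for one that does.

# Hypothetical dictionary probe (host and word list are placeholders).
import urllib.request
import urllib.error

BASE = "http://www.site.com"
WORDLIST = ["admin", "login", "post", "upload", "wp-admin"]

for word in WORDLIST:
    url = BASE + "/" + word
    try:
        resp = urllib.request.urlopen(url, timeout=5)
        print(url, resp.status)    # 200: something is there
    except urllib.error.HTTPError as e:
        if e.code != 404:          # 401/403 still reveal the path exists
            print(url, e.code)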
On a related note, can *Google* do this "URL discovery"? That IS indeed how crawling is done, right? So does Google brute-force it with a dictionary as well? How does it do it?


Thanks.
Barry Margolin
2016-06-22 14:08:36 UTC
Web crawlers generally work by following links. They have a list of
known web pages (everything they've found in the past). They go through
them looking for links, then download the pages those links point to,
and so on.
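In rough Python, that loop looks something like this. The seed URL is a
placeholder, and a real crawler would also honor robots.txt, rate-limit
itself, and schedule fetches more carefully, but the core is just a
queue of known pages and a link extractor:

# Minimal link-following crawl loop (standard library only; seed URL
# is a placeholder).
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

seen = set()
queue = ["http://example.com/"]    # the "known pages" list
while queue:
    url = queue.pop(0)
    if url in seen:
        continue
    seen.add(url)
    try:
        html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
    except OSError:
        continue                   # unreachable page; move on
    parser = LinkParser()
    parser.feed(html)
    for link in parser.links:
        queue.append(urljoin(url, link))   # follow every link found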

So if there's no known web page that has a link to www.site.com/post,
Google shouldn't ever crawl into it. But if any of your users ever posts
the link on some other web page, Google will most likely discover it.

There's nothing stopping web crawlers from using a brute-force guessing
method like you ask about, but there's not really much need for it. Any
web page that's at all interesting will have links to it, since the page
operator usually wants people to be able to find it.
--
Barry Margolin
Arlington, MA