1. Take a look at our script references and resources here. Make sure you are using all commands, functions and features as instructed.
2. Please isolate the issue. If you think the issue is in a certain command, feature or function, try to narrow it down to the commands it affects.
Go into your tools menu and check the show errors option. Use the step feature to find exactly where the error occurs, then isolate the code that is causing the issue.
3. Please send a small, concise script that reproduces the issue.
Please refrain from sending large scripts without following the above steps.
The sooner we can verify, isolate and reproduce the bug, the sooner we can have it fixed for you.
Thanks so much for your cooperation!
]]>I search for it, and now I would like to choose only the ones I want on the page. So I do not want links that start with:
http://bing.com/etc/etc/etc
I just want the search results for websites on homemade candy, and I would like to accomplish this with a regular expression. First, choose a link you would like to scrape.
Choose the link by attribute.
A parameter window will appear. Choose the link by href and select the Regular Expressions option in the far right corner of the parameter window.
Now delete everything after HTTP: in the second column of the parameter window, and place your regular expression in that field.
I built a regular expression for choosing the links I want. Let's break it apart piece by piece so it makes sense:
http: is what the link will start with.
http:[^0-9]{2} now means we are looking for two non-digit characters after the http:. These could be any two characters such as . / - + and here they match the //. The reason we do not just type in // is that / serves as the pattern delimiter in some regular expression flavors; in most engines a literal // would work as well.
http:[^0-9]{2}[a-zA-Z]{3,} means that after the http://, we are looking for 3 or more letters, lowercase or uppercase.
http:[^0-9]{2}[a-zA-Z]{3,}[^0-9] means that we are looking for another non-digit character after the word. Again, that could be any of these signs (. / - + etc.), although strictly speaking a letter counts as a non-digit too.
http:[^0-9]{2}[a-zA-Z]{3,}[^0-9][^bing] now means that the next character must not be b, i, n or g. This is a character class, so it rules out one character from that set rather than the whole word BING, and it is not the same as [^b][^i][^n][^g], which would consume four characters. It works here because in a link like http://www.bing.com the character right after www. is b. Bing is our search engine at the moment, and a link on its domain is usually a stray link for ads or for other pages on the search engine website. Those URLs have nothing to do with what we want, which are the search results. A side effect is that any www. link whose domain starts with b, i, n or g gets skipped as well.
http:[^0-9]{2}[a-zA-Z]{3,}[^0-9][^bing][a-zA-Z]{3,} now means that we are looking for 3 or more letters, lowercase a-z or uppercase A-Z, after that character.
http:[^0-9]{2}[a-zA-Z]{3,}[^0-9][^bing][a-zA-Z]{3,}[^0-9][a-zA-Z]{3,} Notice that we start repeating pieces here, because we are basically dealing with URLs like http://homemade.com/apples/pies/etc
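To see what the [^bing] step really does, here is a minimal sketch in Python. It uses Python's re module rather than the scraping tool's own engine, so treat it as an illustration only. The names NOT_BING_CHAR, PREFIX, NO_BING_WORD and prefix_matches are made up for this example, and the google.com link is a hypothetical extra test case. The negative lookahead at the end is a different technique from the one used in this tutorial, shown only for comparison.

```python
import re

# A character class: matches exactly ONE character that is not b, i, n or g.
NOT_BING_CHAR = re.compile(r"[^bing]")

# The tutorial's pattern up to and including the [^bing] step.
PREFIX = re.compile(r"http:[^0-9]{2}[a-zA-Z]{3,}[^0-9][^bing]")

# Swapped-in alternative (not the tutorial's method): a negative lookahead
# that rejects a link containing the word "bing" anywhere after http://
NO_BING_WORD = re.compile(r"http://(?!.*bing)")


def prefix_matches(url: str) -> bool:
    """Return True if the partial pattern matches at the start of url."""
    return PREFIX.match(url) is not None


# The class accepts or rejects a single character, never a whole word.
for text in ("c", "b", "bing"):
    print(repr(text), NOT_BING_CHAR.fullmatch(text) is not None)

# The google.com link is a hypothetical example: it is rejected only
# because its domain starts with g, one of the excluded letters.
for url in (
    "http://www.homemadecandy.info/",
    "http://www.bing.com/explore?q=home+made+candy&FORM=BXFD",
    "http://www.google.com/",
):
    print(url, prefix_matches(url), NO_BING_WORD.match(url) is not None)
```

The last loop prints both checks side by side, so you can see where the character class and the lookahead disagree.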
And so finally, we end up with this altogether:
http:[^0-9]{2}[a-zA-Z]{3,}[^0-9][^bing][a-zA-Z]{3,}[^0-9][a-zA-Z]{3,}[^0-9][a-zA-Z]{3,}[^0-9][a-zA-Z]{3,}[^0-9]s*h*t*m*l*
This regular expression is going to match and find links like these:
http://homemadecandyideas.com/
http://www.homemadecandy.info/
http://www.wchstv.com/gmarecipes/homemadecandy.shtml
But it will ignore links like this one:
http://www.bing.com/explore?q=home+made+candy&FORM=BXFD
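If you want to check the expression outside the parameter window, a short Python sketch like the one below can help. It assumes Python's re module behaves closely enough to the engine used by the Choose By Attribute command, which may not hold in every detail. The helper name keep_link is hypothetical, and the pattern is anchored at the start of the href with re.match.

```python
import re

# The full expression from this tutorial, split over two lines for readability.
PATTERN = re.compile(
    r"http:[^0-9]{2}[a-zA-Z]{3,}[^0-9][^bing][a-zA-Z]{3,}"
    r"[^0-9][a-zA-Z]{3,}[^0-9][a-zA-Z]{3,}[^0-9][a-zA-Z]{3,}[^0-9]s*h*t*m*l*"
)


def keep_link(href: str) -> bool:
    """Return True if the href matches the pattern from its first character."""
    return PATTERN.match(href) is not None


links = [
    "http://homemadecandyideas.com/",
    "http://www.homemadecandy.info/",
    "http://www.wchstv.com/gmarecipes/homemadecandy.shtml",
    "http://www.bing.com/explore?q=home+made+candy&FORM=BXFD",
]

# Keep only the links the pattern accepts, in their original order.
kept = [href for href in links if keep_link(href)]
for href in kept:
    print(href)
```

Under Python's re module this keeps the three candy links and drops the Bing one.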
After inserting your regular expression into the parameter window of the Choose By Attribute command, click OK. Then add a save to file command or an add to list command, and insert a scrape chosen attribute command to scrape the items by href.
The results of your scrape will then contain only the search results, with all the stray, unnecessary links removed.
]]>