Hi all, it's nothing very new, but it's quite new for me. I tried to make a search engine like Google and Yahoo. It's really not easy to make one. It also won't be very useful to you until it is mature enough to answer your queries, which may take a few weeks and a lot of processing power. It is continuously growing its database, so you will see better results every minute, and it's fast enough to give it a try. You can ask anything you want. Here is the link: https://bringmefast.com This is a copy of the source code of the crawler that is used for collecting data from the web: https://github.com/vishvendrasingh/searchEngineCrawler/blob/master/se.py Title : BringMeFast search engine
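The linked se.py is the real crawler; as a rough illustration of the fetch-parse-queue loop such a crawler runs, here is a minimal sketch (the queue handling, page limit, and link filtering below are my own assumptions, not taken from the repo):

```python
# Minimal sketch of a crawl loop: fetch a page, record its title, queue new
# links. Illustrative only; see se.py in the repo for the actual code.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def crawl(seed_url, max_pages=10):
    queue = [seed_url]          # URLs waiting to be fetched
    seen = set(queue)           # URLs already queued, to avoid repeats
    pages = []                  # (url, title) pairs collected so far

    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        try:
            resp = requests.get(url, timeout=5)
        except requests.RequestException:
            continue            # skip unreachable pages instead of crashing
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.title.string.strip() if soup.title and soup.title.string else url
        pages.append((url, title))
        # Queue every new absolute link found on the page.
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

if __name__ == "__main__":
    for url, title in crawl("https://example.com"):
        print(title, "->", url)
```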

All news at one place: shivalink.com. Why did I do this? I know you guys, like me, don't have time to scroll through all the newspapers, but we love to read them. We have no option except to skip or scroll fast, and when people talk about a story we just don't know what happened at the end of it. But being an engineer, I cannot stay deprived of it, so it's my call to make the news easier to read, and all at one place. Currently I have combined three newspapers, the ones that are my favorites, but you can tell me if you want your favorite newspaper or[…]
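The post doesn't say how the three papers are combined; assuming they expose RSS feeds, a minimal merge could look like the sketch below (the feed URLs are placeholders, not shivalink.com's actual sources):

```python
# Hypothetical sketch: merge several RSS feeds into one reverse-chronological
# list. The feed URLs are placeholders, not shivalink.com's real sources.
import feedparser
from time import mktime

FEEDS = [
    "https://paper-one.example/rss",
    "https://paper-two.example/rss",
    "https://paper-three.example/rss",
]

def merged_headlines(feeds=FEEDS):
    entries = []
    for url in feeds:
        for e in feedparser.parse(url).entries:
            if getattr(e, "published_parsed", None):
                entries.append((mktime(e.published_parsed),
                                getattr(e, "title", ""),
                                getattr(e, "link", "")))
    # Newest stories first, regardless of which paper they came from.
    return sorted(entries, reverse=True)

for _, title, link in merged_headlines():
    print(title, "-", link)
```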

ttld.in A very advanced crawler by Mr Nirav, written in Node.js, when tested on the internet got us 1,93,142 names and phone numbers; that is, 1 lakh 93 thousand people from India have put their phone numbers online. Now our machine knows them all. Will be posting more on this; till then, search your phone number and see if you can find it there 🙂

Hi everyone, how could I miss crawler 2.0 and post 3.0 before it? Here I am posting the 2.0 crawler with multiprocessing.. 😉 Actually the 3.0 thread-based crawler was easy to develop, and now it is time for the final release of 2.0. Why am I making a crawler? Actually, my friends Abhijeet and Zainab and I were thinking of making a basic search engine. We know there are already better ones than ours, but we thought we could do something better with this crawler thing, and now one more guy has joined us, Mr Nirav, a highly skilled person who works on highly critical projects. Now I am more confident of finishing all this in time and making an automatic system[…]
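The 2.0 source isn't linked in this excerpt, so as a rough idea of what a multiprocess fetcher looks like, here is a sketch using Python's multiprocessing.Pool (the worker function and URL list are my own illustration, not the 2.0 code):

```python
# Illustrative multiprocess fetcher: one worker process per CPU core pulls
# pages in parallel. This is a sketch, not the actual crawler 2.0 code.
from multiprocessing import Pool
import requests

URLS = [
    "https://example.com",
    "https://example.org",
    "https://example.net",
]

def fetch(url):
    """Download one page and report its size; errors don't kill the pool."""
    try:
        resp = requests.get(url, timeout=5)
        return url, len(resp.text)
    except requests.RequestException as exc:
        return url, "failed: %s" % exc

if __name__ == "__main__":          # required on platforms that spawn processes
    with Pool() as pool:            # defaults to one worker per CPU core
        for url, result in pool.map(fetch, URLS):
            print(url, "->", result)
```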

Hi guys, yes, as you read above, one of my colleagues asked me what the speed of a parallel, thread-based crawler would be, so I am posting this so that you all can check out the speed. How to use it is very simple; it is written in the file itself. Check it out….. 😉 Enjoy, and let me sleep now! https://github.com/vishvendrasingh/crawler/blob/master/crawler_3.0_stable.py
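For the real code see the linked file; as a hedged illustration of why threads speed up crawling (page fetches are I/O-bound, so they overlap), a timing sketch could look like this (the URL list and thread count are placeholders):

```python
# Illustrative timing of serial vs. thread-based fetching. I/O-bound requests
# overlap under threads, so wall-clock time drops. Not the crawler 3.0 code.
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URLS = ["https://example.com"] * 8   # placeholder workload

def fetch(url):
    try:
        return len(requests.get(url, timeout=5).text)
    except requests.RequestException:
        return 0

start = time.time()
serial = [fetch(u) for u in URLS]                 # one request at a time
print("serial:  %.2fs" % (time.time() - start))

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:   # requests run concurrently
    threaded = list(pool.map(fetch, URLS))
print("threads: %.2fs" % (time.time() - start))
```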

Crazy day: I indexed a 30GB file with 53 million lines of JSON data into Elasticsearch, then tried Kibana with it; it was really enjoyable after doing it with my drink. Link to Kibana: shivalink.com:5601. Link to Elasticsearch: shivalink.com:9200. The toughest part was unzipping the 5GB bz2 file using all cores. I used pbzip2 but it didn't work in my case; then I found lbzip2 -d myfile.json.bz2, which was really fast and used all my cores efficiently. It turned out to be 30GB after decompression. After that, how could we insert it into Elasticsearch? As I am very new to this, I found esbulk and started with it. I had inserted 45 million entries when it became[…]
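The post used esbulk; as an alternative sketch of the same idea in Python (this is not esbulk, it uses the official elasticsearch client's streaming_bulk helper, and the host, index name, and file path are placeholders), streaming a newline-delimited JSON file into an index could look like:

```python
# Alternative sketch (not esbulk): stream newline-delimited JSON into
# Elasticsearch with the official client's bulk helper. Host, index name,
# and file path are placeholders.
import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch("http://localhost:9200")

def actions(path, index="mydata"):
    """Yield one bulk action per JSON line, never loading the 30GB at once."""
    with open(path) as fh:
        for line in fh:
            yield {"_index": index, "_source": json.loads(line)}

ok_count = 0
for ok, _ in streaming_bulk(es, actions("myfile.json"), chunk_size=1000):
    ok_count += ok
print("indexed", ok_count, "documents")
```

Batching (chunk_size) is what makes this viable at 53 million lines: each HTTP request carries a thousand documents instead of one.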

crawler

Completed coding of the recursive crawler. It was fun, a lot of hard work, some meditation, and lots of Google, but I finally did it. My friend Abhijeet asked me to make a recursive crawler and I was wondering how I could do that, so I came up with the idea of keeping two lists: 1. a processed list (all crawled URLs are stored here) 2. an unprocessed list (all newly found URLs are stored here). Now if a URL already exists in either of these lists, skip it and move on. Happy crawling guys…..:) This program does the following (see the sketch below): stores data in MongoDB; parses the HTML for page title, meta data, and meta keywords; and in case a page request fails, error handling saves it from breaking[…]
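Based on the description above, here is a minimal sketch of the two-list design (the MongoDB database and collection names are placeholders, and the parsing details are my assumptions, not the actual program):

```python
# Sketch of the two-list recursive crawler described above: 'processed' holds
# crawled URLs, 'unprocessed' holds newly found ones, and any URL already in
# either list is skipped. MongoDB names below are placeholders.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from pymongo import MongoClient

pages = MongoClient()["crawler"]["pages"]    # placeholder db/collection names

def crawl(seed, max_pages=50):
    processed, unprocessed = [], [seed]
    while unprocessed and len(processed) < max_pages:
        url = unprocessed.pop(0)
        processed.append(url)
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue                          # error handling: don't break the run
        soup = BeautifulSoup(html, "html.parser")
        doc = {
            "url": url,
            # Page title plus meta description and keywords, as described above.
            "title": soup.title.string.strip() if soup.title and soup.title.string else "",
            "description": (soup.find("meta", attrs={"name": "description"}) or {}).get("content", ""),
            "keywords": (soup.find("meta", attrs={"name": "keywords"}) or {}).get("content", ""),
        }
        pages.insert_one(doc)                 # store parsed page data in MongoDB
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            # Skip any URL that already exists in either list.
            if link.startswith("http") and link not in processed and link not in unprocessed:
                unprocessed.append(link)

if __name__ == "__main__":
    crawl("https://example.com")
```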