The Educational CyberPlayGround

 

HOW TO WORK WITH GOOGLE

Google is a black box. How to work with / not for google.


The Google motto is "Don't be evil."
"By their fruits you shall know them -- evil is as evil does."

Google Apology for being Evil makes explicit an acknowledgment by Google that search results can have real impact on real people, and that the referenced Web sites in these results may at times be misleading, defamatory, or otherwise seriously damaging to actual lives.

GOOGLE is a play on "googol," the number 1 followed by 100 zeros. "Googol" was coined in 1938 by Milton Sirotta, age 9.

GOGGLES, eyes. See ogles. Goggle eyes, large prominent eyes; to goggle, to stare. ~ A Classical Dictionary of the Vulgar Tongue by Francis Grose, 1785

Learn how to work with Google and use Google hacks.

 

GOOGLE FACTS

Excerpt:
"Google's corporate philosophy is based on the model which brought them success: organizing and giving away other people's content, creating space for advertisements in the process. The enormous success Google found with that model in the search engine business spurred it to try and impose it in every arena. In the Google worldview, content is individually valueless. No one page is more important than the next; the value lies in the page view."

2007
Jimmy Wales: Open-source Wikia search engine gangs up on Google
Wikipedia's co-founder Jimmy Wales attempts to overturn Google's domination of the search market. The weapon? The transparency provided by open source software. The idea underpinning the search engine is that its search algorithm, which determines which web pages appear at the top of the lists of links it serves up, will be made public. Wikia's search engineers think this will elicit the trust of users in a way that Google, which keeps its algorithm a closely guarded secret, never will. Open source search results will also be more relevant, as the algorithm will continually be tweaked by its users, keeping it up to date with new technologies as they are deployed, Wales says.

2006
With that vast trove of search data, Google can now PREDICT behavior. Like with the movie studios.  By analyzing search queries, Google discovered it could predict within a week of a film’s release what the movie’s gross would be with approximately eighty percent accuracy!  But the studios said this was not helpful, because by the time this data snapshot was taken, they’d already spent their marketing wad, it was TOO LATE! So Google went back to the numbers.  And found out, purely based on how often people were searching for a film, that SIX WEEKS OUT they could predict the gross of a movie with EIGHTY TWO PERCENT ACCURACY! ~Lefsetz

DID YOU KNOW ABOUT GOOGLE WATCH??
A look at how Google's monopoly, algorithms, and privacy policies are undermining the Web. PageRank: Google's original sin, Google mum on privacy policy.

Google hired Ori Allon, a doctoral student at the University of New South Wales, for his Orion search engine. Google will display your search results with other relevant info in the form of expanded text extracts, giving you the relevant information without your having to go to the Web site. In other words, search engines can steal web site content without paying for it.

The Vanishing Click-Fraud Case
Why was a seemingly slam-dunk case against an alleged click-fraudster who attempted to extort Google quietly dismissed?
Google won't discuss specifically how it detects bad clicks or what percent it deems fraudulent, only that it's "less than 10%," saying such information could be helpful to would-be scam artists. Google and its competitors also make money on fraudulent clicks. Here's how it works: Hundreds of thousands of advertisers that market on Google's search engine also let Google distribute their ads to other Web sites. When an ad is clicked on a partner site, both Google and the Web site operator split the revenue and the advertiser is charged. If such a click is bogus, and gets through the search company's filters, Google still profits, at least in the short run.
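The revenue flow described above can be sketched in a few lines of Python. This is only an illustration: the click price, the publisher's share, and the function name `settle_click` are hypothetical, and the rule that a filtered click is not billed is an assumption, since Google does not publish how its filters or its revenue split work.

```python
def settle_click(cost_per_click_cents: int, publisher_share_pct: int, caught_by_filter: bool) -> dict:
    """Return who pays and who earns for one ad click on a partner site (amounts in cents)."""
    if caught_by_filter:
        # Assumption: a click flagged as invalid is not billed to the advertiser.
        return {"advertiser": 0, "publisher": 0, "google": 0}
    # Hypothetical split: the publisher gets a fixed percentage, Google keeps the rest.
    publisher = cost_per_click_cents * publisher_share_pct // 100
    google = cost_per_click_cents - publisher
    return {"advertiser": -cost_per_click_cents, "publisher": publisher, "google": google}


# A bogus click that slips past the filter is settled exactly like a real one.
undetected_fraud = settle_click(50, 60, caught_by_filter=False)
detected_fraud = settle_click(50, 60, caught_by_filter=True)
print(undetected_fraud)
print(detected_fraud)
```

Under these assumed numbers, the only party that loses on an undetected bogus click is the advertiser, which is the incentive problem the passage describes.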

Clarence Briggs of AIT Corporation speaks: http://media.webmasterradio.fm/episodes/audio/2006/googlestory/googlestory1.mp3
Notes: AIT (ex-military) found Google on an al-Qaeda blog and asked, "Where is the money going?" That is how this got started with the FBI: Google helps pro-terror groups fund themselves.
WebmasterRadio.FM investigative journalist Jim Hedger hosts this exclusive WebmasterRadio.FM series on the implications of click fraud on the industry and on national and global security. WebmasterRadio.FM is initiating an industry-wide initiative to further examine and confirm issues raised by this series. The series starts with an interview with Clarence Briggs, CEO of hosting firm AIT.com. Mr. Briggs was a lead proponent in one of the class action lawsuits Google settled in the spring of 2006. Because the case was settled out of court, Google was never forced to show how they charge for some clicks and dismiss others as invalid. Mr. Briggs maintains Google is doing business as usual, just as they did before the class actions were initiated.
During the interview, Mr. Briggs noted the use of click fraud by criminal and terrorist organizations. Our investigation has found several incidents of this type of activity. We have also found evidence of bot-nets used to facilitate click fraud, primarily against Google advertisers. This series has been in research and production for over three months. In that time, Jim Hedger and a number of well-known search marketing experts and analysts have studied log files supplied by AIT. Each of the search marketing experts and analysts worked in exclusion of each other, without a lot of background information, in order to ensure non-biased examination of the data.

Click Fraud class action suits in CA & Lessons Learned:

Joe Holcomb -
Search engines KNOW about the fraud on their networks. They also do NOT eliminate revenue from a particular source unless they have another source to replace it. I've worked for enough engines and been privy to enough conversations to know what I am talking about here. Fact is they may not know WHO they are funding but they do know that the crappy traffic is there making them money. And they DO let it happen. Nilhan, they won't clean up click fraud unless stories like this one get out and force them to in order to save face. As I mentioned in my article linked to above, some industry experts estimate that 30% of all clicks are fraudulent. Imagine what would happen if Google had to clean all that up! Their stock would tank, their revenues would shrink, and their costs would go up. They have an incredible incentive NOT to do anything about this type of activity. They simply ban small-time publishers (and keep their money, by the way) to make it SEEM like they are doing something about the 300 lb gorilla in the room. [2 Stories - one and two]

2005

IPO filed week of 4/26/04
$27 billion - a figure based on the six-year-old company's market valuation after its first day of trading.  95 percent of all Web searches in the United States are handled by two companies, Google and Yahoo, either directly or through other sites that use their technology. In the case of Google, whose shares started to trade publicly last week, the company holds the world's largest index of Web content, at more than four billion pages, and handles more than 200 million searches a day.

Since 2000, Google has recorded your search terms, the date-time of each search, the globally-unique ID in your cookie (it expires in 2038), and your IP address. This information is available to governments on request. If your favorite site features a Google search box, ask them to install their own local site search. They could also use our site search for webmasters, which shows the same results without the tracking.
http://www.scroogle.org/masters.html
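The 2038 expiry falls close to the largest date a signed 32-bit Unix timestamp can represent. A minimal Python sketch, assuming (the source does not say this) that a "never expires" cookie is simply set near that ceiling:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit Unix timestamp can hold (seconds since 1970-01-01 UTC).
MAX_INT32 = 2**31 - 1

# Assumption: the cookie's expiry was pushed toward this ceiling; the source only says "2038".
latest = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(latest.isoformat())  # 2038-01-19T03:14:07+00:00
```

In practice, a cookie dated this far out never expires unless the user deletes it.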

In March 2007 Google announced it would start anonymizing its server logs after 18-24 months.

Anonymizing Google's Cookie: Why and how to anonymize your Google cookie

 

MONETIZATION OF LIBRARIES

Google's Monetization of Libraries
Google and the libraries involved have at their core a mission and philosophy of open access to information, even if their economic and organizational missions are very different.  This conflict can be seen as a conscious attempt to push the boundaries of copyright law outward, by organizations that are well-informed about the legal issues but determined to build a more open information model.  They are saying “sue us.”  Google co-founder Larry Page is cited in an article that appeared in Tuesday's Information World Review as being a "firm believer in academic  libraries being able to 'monetise' the information they hold." (3) Paul Courant, provost at the University of Michigan, is quoted in the Chronicle of Higher Education as saying the project is worth "hundreds of millions" of dollars to his University alone. (4) Google obviously considers that kind of money to be a good investment, which means they expect many hundreds of millions in revenue from these collections, through advertising in the near term and probably other means in the longer term.

GOOGLE PRINT BOOK LIBRARY

EXCERPT: "a page view is a page view, regardless of whether the page in question has a picture of a cat, a single link to another site, or the full text of Freakonomics. When all you're selling is ad space, the value shifts from the content to the viewer. And ultimately the content is valued at nothing. And here, finally, is the larger problem posed by Google's actions.
Books are not in any important sense user-centric. Whether or not a book has readers matters little. Books stand on their own, over time, as ideas and creations. In the world of books, it is the ideas and the authors that matter most, not the readers. That is why the copyright exists in the first place, to protect the value of these created works, a value which Google is trying mightily to deny. As much as any other American business, Google is the corporate embodiment of the Internet's first principles. And as with so much else on the Internet, the promise of Google Book Search lies somewhere off on the horizon, while the dangers it poses today are very real."

About Google Book Search, which used to be called the Google Print Library Project: what was known as the library that all the world could use via "universal accessibility" is now turning its readers over first to bookstores, and then to libraries, as it becomes more and more plainly obvious that Google's library is not really a library but merely a catalog for bookstores and libraries, for those who already have easy access to them. ~ Michael Hart, Internet user #100 since 1971 & Gutenberg Project Executive Coordinator http://www.gutenberg.org

Google Print vs. The Open Library vs. Project Gutenberg

The Year of the Electronic Library: Obviously the Big Boys have finally discovered books on the Internet.
Just under a year ago Google's multi-million dollar media blitz of December 14 ran hog wild through the media, getting more attention from television, radio and print media than eBooks had received in toto during their 35 years of existence, in spite of "The Wall St. Journal" claim to fame as being the first to put the word INTERNET on the front page or cover of any major media outlet, in Oct. 1991, in reference to the growing idea[l] of Project Gutenberg eBooks.
However, since Google didn't really DO anything after such a great public relations coup, no one ended up paying any attention and it appears as if the momentum, at least the media momentum was lost. This was confirmed a few weeks ago when Yahoo and Internet Archive press releases about starting a competitive eLibrary failed to put any wind in the media's sails.
However, Google seems to have been paying attention, and finally a release from the Google Print Library resulted, but it turned out, sad to say, that these releases were not turning out to be the great events that had been predicted last December 14.
Most of the books were hard to search, impossible to download, and on subjects of little interest, and what interest there was stayed with the various lawsuits Google was being threatened with for any of a number of copyright problems, even though Google pretended an enormous amount of public domain works were still copyrighted in a concerted effort to ameliorate the situation.
Not content to let the Google Print Library and Yahoo Open Content Alliance/Open Book Library steal all this glory, Amazon and Random House announced their own eLibrary just a week later.
Today we saw yet another entry from The Library of Congress, as it received 3 million dollars to start their own project, from a most unlikely source, Google! It was suggested at today's Geek Lunch that a motivation of Google's might be to let The Library of Congress pay the price in non-cash value, for opening the vast intercontinental virtual prairieland to the virtual settlers, who just happened to be an assortment of multi-billion dollar cartels, who have felt those slings and arrows of their misfortune a little too much.

There are already open source research sites available.

Directory of Open Access Repositories:

"The OpenDOAR service is being developed to support the rapidly emerging movement towards Open Access to research information. This will categorise and list the wide variety of Open Access research archives that have grown up around the world."
"The project is a joint collaboration between the University of Nottingham in the UK and the Lund University in Sweden. Both institutions are active in supporting Open Access development. Lund operates the Directory of Open Access Journals (DOAJ), which is known throughout the world." Also find out about the Registry of Open Access Repositories (ROAR), a project based at the University of Southampton.

***
So, in just a single month we have seen more "action" on the parts of these multi-billion dollar alliances than ever before, except a person still has huge trouble actually downloading eBooks from any of these eLibraries. But, then again, that might NOT be their purpose, after all.
***

The original purpose of eLibraries, as Project Gutenberg set out a while back in 1971, was to provide library materials for people to keep, to use as sources for new editions, new libraries, etc., and to be the source for continual improvements over the centuries.
The purpose of these new entries into the fray seems to be some other thing, as they do not invite readers to keep these materials and to create new and better editions for future readers, both via new editions, and by correcting previous editions.
The original ideal of eBooks was:
"to encourage the creation and distribution" of eLibraries, but the Newspeak Dictionary seems to have somewhat changed definitions.
The original ideal of eBooks was also to allow every reader to read in their own favorite program, and to use their own favorite search programs, indexing and concordance programs, and to choose favorite colors, margin lengths, page lengths, etc.
I can only hope there is something WE can do to keep these ideals -- such as they are -- alive and thriving so WE can have our own eBooks, our own eLibraries, the way WE want them.

Michael S. Hart, Founder, Project Gutenberg
Print Encyclopedias Join Dinosaurs Part 1

EBOOKS

Google-Watch.org, which argues that the Google search engine invades privacy, posted a heretofore confidential contract between Google and the University of Michigan.

Michigan Digitization Project PDF is the university hosted page about activities there.

§ 108. Limitations on exclusive rights: Reproduction by libraries and archives

"The United States Library of Congress has announced the creation of the World Digital Library today, a project that's also just received its first $3 million in funding from Google."

DIGITIZING IN-COPYRIGHT BOOKS
Digitizing in-copyright books, and acquiring copyright permissions, has taken place as part of developing the Universal Library http://www.ul.cs.cmu.edu/html/ with a goal of digitizing one million books, as well as in other digitization efforts.
Carnegie Mellon is working with a number of other libraries on the One Million Book Project and with the governments of India and China. One thing they have discovered is that getting permission for digitizing in-copyright books is a time-consuming and expensive proposition.

Legal's view is that what is copyright protected is the SEQUENCE of the words

The following is based upon a lengthy evaluation from the university's legal counsel explaining that it's OK to digitize a book, create a searchable index, and offer full text searching WITHOUT the permission of the copyright holder -- provided that you not only don't display, but that you destroy the digitized pages.
Legal's view is that what is copyright protected is the SEQUENCE of the words, so you can break up the sequence in an index and use it for retrieval, but you can't display the words in context (a "snippet") because the sequence is copyright protected. Some leaders of the Universal Library tend to think that a snippet (a few lines) is covered under fair use. One of the directors of the UL has developed a user interface he calls "contextual searching," which displays your search terms in context, i.e., with a snippet of words before and after your search terms.
The business about the digitization being done by a commercial firm (Google) rather than the library that purchased the book - gets at the law that says a library can digitize a legally acquired copy of a book without permission under certain circumstances for PRESERVATION purposes.  The preservation copy is NOT a USE copy.  There are some strict, mitigating circumstances that would allow the preservation copy to be used (e.g., if there was no other copy available on the planet at a reasonable price to purchase or borrow), but the preservation copy could ONLY be used on computers IN the library that held the legally acquired (now totally deteriorated, dilapidated) book.

Courts Unlikely To Stop Google Book Copying by Christopher Huen, September 2, 2005
"Despite objections from publishers and writers, copyright law appears to be on Google's side, legal experts say. The social value of Google's initiative to digitize library books, including those protected by copyright, will likely weigh heavily in the search engine's favor."

Yahoo and Microsoft have also made deals with libraries to digitize books.
CORNELL OPENS COLLECTION TO MICROSOFT
Microsoft has announced two partners in its book scanning project, which will compete with Google's controversial Book Search program. Cornell University will allow Microsoft to scan its library collection, and Kirtas Technologies will provide high-speed hardware for the scanning. Unlike Google's program, Microsoft's Windows Live Book Search will only scan books in the public domain or those whose copyright owners have granted explicit permission. Librarians from Cornell will select texts to be scanned and will oversee quality control for the process. Kirtas claims that its scanning machines are capable of digitizing 2,400 pages per hour and are gentler than human hands with the books.

TIPS: How to Work With Google

Google Inconsistencies: Google does not always behave as advertised nor deliver the results expected. This page aims to document both ongoing and short-lived inconsistent search behavior on Google.

Danny
http://searchenginewatch.com/searchday/article.php/3437471
Gary
http://www.resourceshelf.com/2004/11/wow-its-google-scholar.html

SEARCH FOR FREE by David Dillard All you need is a public library card which gives you online access to the very expensive databases where you can get everything for free.

Pros and cons of working with Internet search engines vs. databases found at public or university libraries.

BioMed Central has developed a Google Scholar search plugin for FIREFOX

Google - Google special search engine for BSD + Macintosh

TOOLS

IS GOOGLE AN EVIL COMPANY? Google as the mother lode of user tracking PRIVACY PROBLEMS

OrgName: Google Inc.
OrgID: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US
NetRange: 66.249.64.0 - 66.249.95.255
CIDR: 66.249.64.0/19
NetName: GOOGLE
NetHandle: NET-66-249-64-0-1
Parent: NET-66-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.GOOGLE.COM
NameServer: NS2.GOOGLE.COM
Comment:
RegDate: 2004-03-05
Updated: 2004-11-10
OrgTechHandle: ZG39-ARIN
OrgTechName: Google Inc.
OrgTechPhone: +1-650-318-0200
OrgTechEmail: arin-contact@google.com
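The NetRange and CIDR fields in the record above describe the same block of addresses. A short Python sketch using the standard `ipaddress` module confirms this and shows how a webmaster might test whether a visitor IP from their own logs falls inside it. The helper name `in_listed_block` and the sample addresses are hypothetical, and the record is dated 2004, so the allocation may have changed since.

```python
import ipaddress

# CIDR value copied from the WHOIS record above (as registered in 2004).
block = ipaddress.ip_network("66.249.64.0/19")

# First and last addresses of the block should match the NetRange field.
first, last = block[0], block[-1]
print(first, last, block.num_addresses)


def in_listed_block(addr: str) -> bool:
    """True if addr falls inside the block listed in the record."""
    return ipaddress.ip_address(addr) in block


# Hypothetical visitor addresses taken from a server log.
print(in_listed_block("66.249.70.1"))
print(in_listed_block("66.249.96.0"))
```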

Govt claims warrantless access to e-mail  via third party servers
Google GMAIL POLICY
Google doesn't delete your Gmail emails when you trash them, but continues to keep them on its servers in storage. And once they've been there for 180 days, they lose their status as protected communications under the Electronic Communications Privacy Act.
Email privacy is an enormous issue. The EFF informs us that companies which offer both internet search and email storage can collect a massive amount of personal information about users. It's possible to extract information about you from the searches you make. It's also possible for an email host to scan your emails to collect other personal details about you. Not only possible, but the big players like Yahoo, MSN and Amazon do this.
Google Watch reports that Google scans all of its clients' emails and reserves the right to "give this information to whomever they wish. . ." After 180 days in the U.S., email messages lose their status as a protected communication under the Electronic Communications Privacy Act.

EPIC explains Google's email privacy policy

Google has a great image as a good company.  They have engendered a great amount of trust through their "Don't Be Evil" motto.  They are getting dangerous. 
The fact is that they are stockpiling a perilous amount of personal information about their users. Google logs every search request with its IP address. Google has acknowledged this log in a number of interviews. But they have never answered why they keep such a log. The search log by itself is not too harmful since the IP address identifies a computer and not a person. The searches cannot easily be traced to a particular person without help from the ISP.
THEY KEEP THE INFORMATION FOREVER! "A bigger problem is that many Google search users are also Gmail users, and a cookie is shared between Gmail and Google search (because they use the same domain, google.com).  Therefore, if a person uses Gmail and Google search from the same computer, even with a long period of time in between, Google will know the identity of the person responsible for those search queries.
Google doesn't need to infer your identity from the content of your other web searches; it already knows it, if you're a Gmail user.
This identification can be retroactive.  If you used Google search for 3 years on a particular PC, and then signed up for a Gmail account, your search cookie from that PC would be sent to Google and the name you provided for your Gmail account could then be associated retroactively with your entire saved search history.
Google cookies last as long as possible -- until 2038.  If you've ever done a Google search on a given computer with a given web browser, you probably still have a descendant of the original PREF cookie that Google gave you upon your very first search, with the very same ID field (a globally unique 256-bit value).
This problem is ubiquitous in the web portal industry, and Google is right to say that its privacy policy is better than many of its competitors'.  However, Google is still assembling a treasure trove of personal information, possibly stretching back for years, that Google may release in response to any civil subpoena or "governmental request":
http://gmail.google.com/gmail/help/privacy.html#disclose"

Seth David Schoen  http://www.loyalty.org/~schoen/
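The retroactive identification Schoen describes amounts to a join on the cookie ID. The following Python sketch is only an illustration: the log rows, cookie IDs, account name, and the function `link_history` are all hypothetical, and nothing here is known about how Google actually stores or joins its logs.

```python
from collections import defaultdict

# Hypothetical search log rows: (cookie_id, timestamp, query). No identity is attached.
search_log = [
    ("c0ffee", "2002-05-01T10:00", "knee surgery recovery"),
    ("c0ffee", "2003-11-12T21:30", "bankruptcy lawyer"),
    ("beef01", "2004-01-03T08:15", "weather seattle"),
]

# Hypothetical sign-ups recorded years later: the same cookie arrives with an account name.
account_signups = [("c0ffee", "2005-02-01T09:00", "jane.doe")]


def link_history(log, signups):
    """Group searches by cookie ID, then attach the account name seen with that cookie."""
    by_cookie = defaultdict(list)
    for cookie_id, when, query in log:
        by_cookie[cookie_id].append((when, query))
    # Every search made under the cookie, including those made before sign-up,
    # is now attributable to the named account.
    return {name: by_cookie.get(cookie_id, []) for cookie_id, _, name in signups}


linked = link_history(search_log, account_signups)
print(linked)
```

In this sketch, both searches made years before the sign-up end up under the account name, while the searches under the other cookie stay anonymous.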

--

Question: Does Google retain logs of personally identifiable search data?

My full blog Answer posting
When Google launched "My Search History" in April 2005, Google's VP of Engineering, Alan Eustace told InternetWeek:
With 'My Search,' however, information stored internally with Google is no different than the search data gathered through its Google.com search engine, Eustace said. "This product itself does not have a significant impact on the information that is available to legitimate law enforcement agencies doing their job."
As I asked in a blog posting at that time:
Is he really saying that Google already captures and stores search data tied to unique users? Unfortunately, Google's privacy policy is pretty vague on the issue. ... Eustace may have misspoken... but really he didn't. 
According to the "My Search History (Beta) - Privacy FAQ," you may feel free to edit the logs, but Google is still keeping copies of the unedited searches.
So there you have it: a comprehensive log of your searches tied to your identity, available to law enforcement bearing warrants and litigious people bearing civil subpoenas. Signing up for the service simply provides them an easier way to wrap the data into a tidy duces tecum package! So, in other words, you are already using My Search History, and you didn't even know it!

--

9/13/05 Google Board: When Princeton University's president, Shirley Tilghman, joined Google Inc.'s board of directors this month she also joined the company's seemingly endless parade of wealth. Instead of cash, the online search engine leader is paying Tilghman as it does all its other directors -- with a bushel of prized stock that eventually could turn her Google duties into a better-paying gig than her job running an Ivy League school. ...Tilghman, a molecular biologist, received a $485,000 salary during Princeton's 2002-03 school year, according to the most recent available information from Guidestar.org, which tracks tax returns filed by nonprofits such as the university. The median compensation package for directors at companies in the Standard & Poor's 500 was $139,060 during the fiscal year ending in May 2005, according to Equilar Inc., a San Mateo, Calif., firm specializing in compensation issues. ...If Tilghman gets lucky, she may get as rich as another Google director from academia, Stanford University President John Hennessy. When Google appointed Hennessy to the board 18 months ago, he received 65,000 stock options now worth about $18 million. Tilghman is the first woman on Google's board, joining Hennessy and eight other wealthy men whose duties include attending most board meetings -- 15 last year.

Anonymous wrote:
1) Google is not just a company. If any company wants to be found by  customers, wants to make sales on the web, or wants to be part of the  modern world, it has to be findable in Google. People under 35 don't  use telephone books, magazines, or newspapers anymore. They use  Google. It's not a choice "to not be on Google". Google isn't a  company: it has become the infrastructure for the delivery of  information.
2) Lauren Weinstein writes about the privacy issues at Google. It's far more serious than privacy. Very few people understand how Google works. There isn't "one Google" and the results you see in your search in Miami are not the same as the results someone else sees in Seattle.
- Google constantly adjusts the results according to many parameters, incl. user personalization (Google Toolbar), the length of the user's search session (you may get different results during your session), your physical location, and so on.
- There are millions of searches and results, and these constantly shift. This means it is literally impossible for anyone outside of Google to track the search results.
- Results don't appear consistently. They can appear intermittently. Instead of appearing 100% of the time, a result can appear for 90% of searches or 80% of searches. There is no way to track this.
- It would be easy for Google to slightly suppress a result. So a result for a particular company would only appear for 97% of searches. That's a small amount, but it is significant for ecommerce. This means Google can manipulate the sales and valuation of companies.
- It works the other way too. Google can "over-produce" results for a publicly-traded company. Their earnings and valuation rise slightly.
3) Google's ability to suppress (or enhance) results isn't theory.  Google has a secret team that suppresses the ranking of people who  criticize Google. Never complain about Google in Gmail, in a public  forum, or wherever your comments will be found by Google. Your  rankings will slide down just a bit. You will lose web traffic to  your website, your blog, or your company.
This means that Google doesn't have to blacklist you. Nothing that  blatant. They just lower your ranking. End of problem. Nobody can  prove anything, because Google is an informational black hole; they  never reply.
4) The privacy issues are thus both ways: the right to keep one's information private, and the right to publicize one's information. It's bad to lose privacy, but what is it when one's public persona is downranked by Google and one can't be found in searches? Professors, researchers, journalists, etc. can be removed from public access. And remember: Google doesn't have to blacklist you. They only have to lower your ranking. Or show you in the results only intermittently.
Microsoft was (and still is) a monopoly. But you can use your copy of  Microsoft Word to write whatever you like.
Google is a far greater danger than Microsoft. Write your emails in  GMail, use Google word processor, the Google spreadsheet, Google  video, or any of the endless Google tools, and they correlate  everything about you. Google can read all of your emails, docs, and  spreadsheets. By merely suppressing or enhancing results, they can  make vast profits, erase careers, and literally control economies.  This creates spectacular power. No company has ever been able to  resist that kind of temptation.

©1997 Educational CyberPlayGround, All rights reserved world wide.