Hacker News

Not to make excuses for Google, but is this an entirely accurate portrayal?

Like, I've been working on a web project that doesn't contain any analytics, but which stores and retrieves JSON data in Google Firebase. I imagine if I opened my website with this tool, I would hear lots of noise.

But I just can't imagine how Google could do anything useful (to them) with my random JSON blobs.



> But I can't imagine Google would know how to do anything useful (to them) with my random JSON blobs.

They're not interested in the blobs, but in the people accessing them. Their whole suite of "free" developer tools (Google Analytics, Google Fonts, Firebase, ...) is just a means to get information about what people do online.


> Their whole suite of "free" developer tools (Google Analytics, Google Fonts, Firebase, ...) is just a means to get information about what people do online.

Is this actually confirmed to be true? It would make a lot of sense for them to use free-tier GA data for profiling for display ads, etc., but has it ever actually been proven?


Probably not for the CDN stuff.

From https://developers.google.com/fonts/faq:

> The Google Fonts API is designed to limit the collection ... the Google Fonts API does not set or log cookies ... resource-specific domains...don't contain any credentials ... IP addresses are not logged.

From https://developers.google.com/speed/libraries/terms:

> Google Hosted Libraries uses resource-specific domains... Requests unauthenticated ... Google Hosted Libraries only uses cookies as necessary for security and to prevent abuse ... Our systems are designed to remove HTTP referer information before logging

But analytics is a totally different story. From https://policies.google.com/technologies/partner-sites:

> when you visit a website that uses...Google Analytics...your web browser automatically sends certain information to Google. This includes the URL of the page you’re visiting and your IP address. We may also set cookies on your browser or read cookies that are already there. Apps that use Google advertising services also share information with Google, such as the name of the app and a unique identifier for advertising

> Google uses the information shared by sites and apps to ... personalize content and ads you see on Google and on our partners’ sites and apps

So Google's policies do let them use third-party Google Analytics data to target ads.


Given the number of factors that go into ad targeting, it's pretty much impossible to prove without a source code audit. It also means it's easy for Google to do it and get away with it, since nobody can prove it. Given Google's business model, I wouldn't be surprised if they did it, especially considering they've already proven their bad faith with various dark patterns and their intentional refusal to comply with the GDPR (for 4 years their "consent" flow wasn't compliant, as you couldn't decline as easily as accept).


> They're not interested in the blobs, but in the people accessing them

But if they don't know what the blobs are, how does this help them? What can they tie it to?

The latitude and longitude coordinates of my current location have a SHA-1 hash of 0950e97d3a2e4839e39ad27deb2e852d498100ae. Is this useful information?


You’re not thinking on a big enough scale. No big tech company cares about “your” data. They care about everyone’s data in aggregate as much as they can get from every location at every granularity. Even thinking of a tech company as collecting all of “your” data to create an ad profile “for you” is rather inaccurate. They’re collecting everyone’s data to create an ad profile for everybody, tailored to what makes the most money in aggregate across all ad slots.

When you think of it like this you’ll stop asking questions like “what would they do with this piece of data?” Because the answer is always that it is a drop in a giant ocean of machine learning data.


If the requests are client-side, they know that the user has accessed your domain. They can analyze the frequency and timestamps of these requests and add that information to the ad profile they have built for that user.
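To make that concrete, here is a minimal Python sketch of the kind of analysis described above. The `Request` record, the `visit_profile` function and the sample log lines are all hypothetical; nothing here reflects how Google actually stores or processes requests.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Request:
    """One hypothetical log line as a third-party backend might see it."""
    ip: str
    user_agent: str
    origin: str          # the site whose page issued the client-side request
    timestamp: datetime


def visit_profile(requests, origin):
    """For each client (keyed on IP + user agent), count the requests that came
    from `origin` and record the first and last time they were seen."""
    times_by_client = defaultdict(list)
    for r in requests:
        if r.origin == origin:
            times_by_client[(r.ip, r.user_agent)].append(r.timestamp)
    return {
        client: {"count": len(ts), "first": min(ts), "last": max(ts)}
        for client, ts in times_by_client.items()
    }


# Hypothetical sample data using documentation IP addresses.
logs = [
    Request("203.0.113.7", "Firefox/121.0", "https://myapp.example", datetime(2024, 1, 1, 9, 0)),
    Request("203.0.113.7", "Firefox/121.0", "https://myapp.example", datetime(2024, 1, 1, 21, 30)),
    Request("198.51.100.2", "Chrome/120.0", "https://myapp.example", datetime(2024, 1, 2, 8, 15)),
    Request("198.51.100.2", "Chrome/120.0", "https://other.example", datetime(2024, 1, 2, 8, 20)),
]
profile = visit_profile(logs, "https://myapp.example")
```

Keying on IP plus user agent is only a crude stand-in for a real identifier such as a cookie; it is used here because those are the two fields a backend sees on every request.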


In isolation, no. But given enough other actors that use the same Firebase and can be tracked by AdSense, they can infer connections useful for targeting you.
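A toy sketch of how such connections could be inferred, assuming the tracker can attach the same client key (a hypothetical cookie, or IP + user agent) to requests coming from several sites. The `link_sites` function, the client names and the `.example` hostnames are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def link_sites(observations):
    """observations: iterable of (client_key, site) pairs.

    Returns, for every pair of sites, how many distinct clients were seen on
    both. A large overlap is the kind of "connection" a tracker could use to
    carry what it knows about visitors of one site over to the other."""
    sites_by_client = defaultdict(set)
    for client, site in observations:
        sites_by_client[client].add(site)

    overlap = defaultdict(int)
    for sites in sites_by_client.values():
        # sorted() gives each site pair a single canonical ordering
        for a, b in combinations(sorted(sites), 2):
            overlap[(a, b)] += 1
    return dict(overlap)


# Hypothetical observations: two clients hit both sites, one only hits one.
obs = [
    ("alice", "your-firebase-app.example"),
    ("alice", "news-with-adsense.example"),
    ("bob", "your-firebase-app.example"),
    ("bob", "news-with-adsense.example"),
    ("carol", "news-with-adsense.example"),
]
result = link_sites(obs)
```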


I'm not sure why you're being downvoted, because that's exactly what they're doing.

Do people really think Google just gives away things for free because they're being nice?


> Do people really think Google just gives away things for free because they're being nice?

No, I assumed they gave the low tier away for free so people would eventually upgrade to the higher tiers.


To be clear, I'm not saying that's what they do currently. Just that you can get useful metadata from an aggregate of very basic info like this.


Don't you need a proprietary Firebase SDK in your app to use it? Do you know what data are included in the requests? I would argue that anything as simple as IP + UA/OS identifier can be of interest to Google.


I've just been using XMLHttpRequest and the Firebase REST API, no SDK. I imagine Google can see the IP address, but I think that's it?

(This is my first time building anything that involves a "backend", so I am pretty new to this!)


It depends what browser you're using and what parameters you're using on the XMLHttpRequest. They definitely get the IP, user agent (so what OS and browser), and potentially more.
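To illustrate how much the user agent alone gives away, here is a deliberately crude Python classifier. The rule lists and the `classify` function are a hypothetical sketch; real UA parsers rely on far larger, regularly updated rule sets.

```python
import re

# Crude, order-sensitive rules: the first matching pattern wins.
OS_RULES = [
    ("Android", r"Android"),        # must precede Linux (Android UAs contain "Linux")
    ("iOS", r"iPhone|iPad"),        # must precede macOS ("like Mac OS X")
    ("Windows", r"Windows NT"),
    ("macOS", r"Mac OS X"),
    ("Linux", r"Linux|X11"),
]
BROWSER_RULES = [
    ("Edge", r"Edg/"),              # Edge UAs also contain "Chrome/" and "Safari/"
    ("Firefox", r"Firefox/"),
    ("Chrome", r"Chrome/"),         # Chrome UAs also contain "Safari/"
    ("Safari", r"Safari/"),
]


def classify(user_agent: str) -> tuple[str, str]:
    """Return (operating system, browser) guessed from a User-Agent string."""
    os_name = next((name for name, pat in OS_RULES if re.search(pat, user_agent)), "unknown")
    browser = next((name for name, pat in BROWSER_RULES if re.search(pat, user_agent)), "unknown")
    return os_name, browser
```

For example, a typical desktop Chrome UA such as `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36` comes out as `("Windows", "Chrome")`.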


At a minimum they could build cohorts of people who use your app and use that as a bit of information for ad targeting.

The SHA-1 thing is a complete non-sequitur, but since you asked, small amounts of data run through unsalted SHA-1 can be brute-forced very easily if someone cared to find out where you are.
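A minimal Python sketch of that brute force, assuming (hypothetically) that the coordinates were hashed as a plain "lat,lon" string with two decimal places. The parent comment doesn't say what format was used, so this makes no attempt at the actual hash above. At two decimals the whole globe is only about 650 million candidates, which is cheap for SHA-1; a small bounding box like the one in the usage example is about 20,000. The `crack_location` helper is illustrative, not a real tool.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def crack_location(target_hash, lat_range, lon_range, decimals=2):
    """Brute-force an unsalted SHA-1 of a "lat,lon" string.

    Assumes (hypothetically) a fixed number of decimals and a comma separator.
    Iterates over integers and divides afterwards to avoid floating-point drift."""
    scale = 10 ** decimals
    lat_lo, lat_hi = (round(v * scale) for v in lat_range)
    lon_lo, lon_hi = (round(v * scale) for v in lon_range)
    for lat_i in range(lat_lo, lat_hi + 1):
        for lon_i in range(lon_lo, lon_hi + 1):
            candidate = f"{lat_i / scale:.{decimals}f},{lon_i / scale:.{decimals}f}"
            if sha1_hex(candidate) == target_hash:
                return candidate
    return None  # not in the searched box, or a different input format


# Usage: hash a made-up location, then recover it from a 1 x 2 degree box.
secret = sha1_hex("51.51,-0.12")
found = crack_location(secret, (51.0, 52.0), (-1.0, 1.0))
```

Salting, or better, not shipping location-derived hashes at all, is what defeats this; a fast unsalted hash over a tiny input space is effectively a lookup.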


You don't even need the SHA-1 or anything within the blobs. Just an IP address & user-agent pair is enough to uniquely identify a user with some accuracy, and that accuracy only goes up the more data you add. It'll never be 100% accurate, but for ad targeting, it doesn't need to be - a "hunch" is more than enough since getting it wrong leaves you no worse than you were before.
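A toy illustration of that point: measure what fraction of visitors are uniquely identified as attributes are added. The `unique_fraction` function, the sample visitors and the `lang` field (standing in for something like the Accept-Language header) are hypothetical.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once
    in the dataset, i.e. whose anonymity set has size 1."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Made-up visitors; three share an IP (e.g. one household behind a router).
visitors = [
    {"ip": "203.0.113.7", "ua": "Firefox/121.0 (Linux)", "lang": "en-US"},
    {"ip": "203.0.113.7", "ua": "Chrome/120.0 (Windows)", "lang": "en-US"},
    {"ip": "203.0.113.7", "ua": "Chrome/120.0 (Windows)", "lang": "de-DE"},
    {"ip": "198.51.100.2", "ua": "Chrome/120.0 (Windows)", "lang": "en-US"},
]

for fields in (["ip"], ["ip", "ua"], ["ip", "ua", "lang"]):
    print(fields, unique_fraction(visitors, fields))
```

On this made-up data, IP alone singles out 1 of 4 visitors, IP + user agent singles out 2 of 4, and adding the language field singles out all 4.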


> The SHA-1 thing is a complete non-sequitur

Sorry about that. I just couldn’t figure out where GP was going.


Don't know why parent comment is in gray, but it's spot on.

The one and only genuine business reason for Google Fonts is to vacuum up visit data from sites that don't have GA installed or from users who have GA blocked. That's it. Free cheese and all that.


Even with no apps open, my Manjaro installation seems to ping detectportal.firefox.com, which is hosted on a Google IP address, and that triggers the noise. With Firefox open, the noise got worse. I think the Firefox Sync servers might be located on GCP. But when I tried pinging my own GCP server, it didn't trigger the noise.
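One plausible but unconfirmed explanation for why a personal GCP server stays quiet: a tool like this might match destination IPs against Google's published range lists and exclude the ranges used by Google Cloud customers. Google does publish such lists (goog.json for its ranges overall and cloud.json for the customer-usable cloud subset), but whether this tool works that way is an assumption. The ranges below are placeholder documentation addresses, and `is_google_service` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks, NOT real Google
# ranges. Substitute the contents of goog.json and cloud.json to use this.
GOOGLE_RANGES = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]
CUSTOMER_CLOUD_RANGES = [ipaddress.ip_network(c) for c in ("198.51.100.0/25",)]


def is_google_service(ip: str) -> bool:
    """True if `ip` is inside a Google range but not inside a range that
    Google Cloud customers (e.g. your own VM) can be assigned."""
    addr = ipaddress.ip_address(ip)
    in_google = any(addr in net for net in GOOGLE_RANGES)
    in_customer_cloud = any(addr in net for net in CUSTOMER_CLOUD_RANGES)
    return in_google and not in_customer_cloud
```

Under this assumption, a personal GCP VM would fall in the excluded customer ranges and stay silent, while Google-run endpoints would not.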



