• 0 Posts
  • 14 Comments
Joined 1 year ago
Cake day: May 8th, 2023


  • Would you say its unfair to base pricing on any attribute of your customer/customer base?

    A business being in a position to be able to implement differential pricing (at least beyond how they divide up their fixed costs) is a sign that something is unfair. The unfairness is not how they implement differential pricing, but that they can do it at all and still have customers.

    YouTube can implement differential pricing because there is a power imbalance between them and consumers - if the consumers want access to a lot of content provided by people other than YouTube through YouTube, YouTube is in a position to say ‘take it or leave it’ about their prices, and consumers do not have another reasonable choice.

    The reason they have this imbalance of market power and can implement differential pricing is because there are significant barriers to entry to compete with YouTube, preventing the emergence of a field of competitors. If anyone on the Internet could easily spin up a clone of YouTube, and charge lower prices for the equivalent service, competitors would pop up and undercut YouTube on pricing.

    The biggest barrier is network effects - YouTube has the most users because they have the most content, and they have the most content because people upload where the users are. So this becomes a cycle that helps YouTube and hinders competitors.

    This is a classic case where regulators should step in. Imagine if large video providers were required to federate uploaded content over ActivityPub, and anyone could set up their own YouTube competitor with all the same content. The price of the cheapest YouTube clones (which would have all the same content as YouTube) would quickly drop, and no one would have a reason to use YouTube.


  • would not be surprised if regional pricing is pretty much just above the break even mark

    And in an efficient market, that’s how much the service would cost for everyone, because otherwise I could just go to a competitor of YouTube for less, and YouTube would have to lower their pricing to get customers, and so on until no one could lower their prices without losing money.

    Unfortunately, efficient markets are just a neoliberal fantasy. In real life, there are network effects - YouTube has people uploading videos to it because it has the most viewers, and it has the most viewers because it has the most videos. It’s practically impossible for anyone to compete with them effectively because of this, and this is why they can put their prices up in some regions to get more profit. The proper solution is for regulators to step in and require things like data portability (e.g. requiring monopolists to publish videos they receive over open standards like ActivityPub), but regulatory capture makes that unlikely. In a just world, this would happen, and their pricing would be close to the costs of running the platform.

    So the people paying higher regional prices are paying money that, in a just world, they shouldn’t have to pay, while those using VPNs to pay less are paying an amount closer to what it would be in a just world. That makes the VPN users people mitigating Google’s abuse, not abusers.


  • Yes, but for companies like Google, the vast majority of systems administration and SRE work is done over the Internet from wherever staff are, not by someone local (excluding things like physical rack installation or pulling fibre, which is a minority of total effort). And generally the costs of bandwidth and installing hardware are higher in places with a smaller tech industry. For example, when Google on-sells their compute services through GCP (at prices likely proportional to costs), they charge about 20% more for an n1-highcpu-2 instance in Mumbai than in Oregon, US.


  • that’s abuse of regional pricing

    More like regional pricing is an attempt to maximise value extraction from consumers to best exploit their near monopoly. The abuse is by Google, and savvy consumers are working around the abuse, and then getting hit by more abuse from Google.

    Regional pricing is done as a way to create differential pricing - all businesses dream of extracting more money from wealthy customers, while still being able to make a profit on less wealthy ones rather than driving them away with high prices. They find various ways to distinguish wealthy from less wealthy customers (for example, whether you come from a country with a higher average income, or whether your User-Agent or fingerprint suggests an expensive phone, and so on), and charge the wealthy more.

    However, you can be assured that they are charging the people they’ve identified as less wealthy (e.g. in a low average income region) more than their marginal cost. Since YouTube is primarily going to be driven by marginal rather than fixed costs (it is very bandwidth and server heavy), and there is no reason to expect users in high-income locations cost YouTube more, it is a safe assumption that the gap between the regional prices is all extra profit.

    High profits are a result of lack of competition - in a competitive market, they wouldn’t exist.

    So all this comes full circle to Google exploiting a non-competitive market.


  • they have ran out of VC money

    You know YouTube is owned by Google, not VC firms, right?

    Big companies sometimes keep a division / subsidiary less profitable for a time for a strategic reason, and then tighten the screws.

    They generally only do this if they believe it will eventually be profitable over the long term (or support another part of the strategy so the whole is profitable). Otherwise they would have sold or shut it down earlier - the plan is always for it to become profitable.

    However, while an unprofitable business always means either a plan to tighten screws, or to sell it / shut it down, tightening screws doesn’t mean it is unprofitable. They always want to be more profitable, even if they already are.



  • I think any prediction based on a ‘singularity’ neglects to consider the physical limitations, and just how long the journey towards significant amounts of AGI would be.

    The human brain has an estimated 100 trillion neuronal connections - probably a good order-of-magnitude estimate for the parameter count of an AGI model.

    If we consider a current GPU, e.g. the 12 GB RTX 3060, it can hold about 24 billion parameters at 4-bit quantisation (in reality somewhat fewer), and uses 180 W of power. So an AGI might need around 4,200 such GPUs, drawing about 750 kW of power, to operate. A super-intelligent machine might use more. That is a farm of 2,500 300 W solar panels, while the sun is shining, just for the equivalent of one person.

    Now to pose a real threat against the billions of humans, you’d need more than one person’s worth of intelligence. Maybe an army equivalent to 1,000 people, powered by over 4 million GPUs and 2,500,000 solar panels.
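
    The arithmetic above can be sanity-checked with a quick back-of-envelope calculation (all inputs are the rough order-of-magnitude assumptions from this comment, not measurements):

```python
# Rough inputs taken from the text above (order-of-magnitude guesses).
connections = 100e12      # ~100 trillion synaptic connections ~ parameter count
params_per_gpu = 24e9     # 12 GB GPU at 4-bit quantisation
gpu_watts = 180           # power draw per GPU
panel_watts = 300         # output per solar panel (while the sun shines)

gpus_per_agi = connections / params_per_gpu     # GPUs to hold one "mind"
watts_per_agi = gpus_per_agi * gpu_watts        # ~750 kW
panels_per_agi = watts_per_agi / panel_watts    # ~2,500 panels

army = 1000  # a 1,000-person-equivalent army
print(round(gpus_per_agi * army), round(panels_per_agi * army))
# prints: 4166667 2500000
```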

    That is not going to materialise out of thin air very quickly.

    In practice, as we get closer to an AGI or ASI, there will be multiple separate deployments of similar sizes (within an order of magnitude), and they won’t be aligned to each other - some systems will be adversaries of any system executing a plan to destroy humanity, and will be aligned to protect against harm (AI technologies are already widely used for threat analysis). So you’d have a bunch of malicious systems, and a bunch of defender systems, going head to head.

    The real AI risks, which I think many of the people ranting about singularities want to obscure, are:

    • An oligopoly of companies gains dominance over the AI space, and perpetuates a ‘rich get richer’ cycle, accumulating wealth and power to the detriment of society. OpenAI, Microsoft, Google and AWS are probably all battling for that. Open models are the way to fight that.
    • People can no longer trust their eyes when it comes to media; existing problems of fake news, deepfakes, and so on become so severe that they undermine any sense of truth. That might fundamentally shift society, but I think we’ll adjust.
    • Doing bad stuff becomes easier. That might be scamming, but at the more extreme end it might be designing weapons of mass destruction. On the positive side, AI can help defenders too.
    • Poor quality AI might be relied on to make decisions that affect people’s lives. Best handled through the same regulatory approaches that prevent companies and governments doing the same with simple flow charts / scripts.

  • I looked into this previously, and found that there is a major problem for most users in the Terms of Service at https://codeium.com/terms-of-service-individual.

    Their agreement talks about “Autocomplete User Content” as meaning the context (i.e. the code you write, when you are using it to auto-complete, that the client sends to them) - so it is implied that this counts as “User Content”.

    Then they have terms saying you grant them a licence to all your user content:

    “By Posting User Content to or via the Service, you grant Exafunction a worldwide, non-exclusive, irrevocable, royalty-free, fully paid right and license (with the right to sublicense through multiple tiers) to host, store, reproduce, modify for the purpose of formatting for display and transfer User Content, as authorized in these Terms, in each instance whether now known or hereafter developed. You agree to pay all monies owing to any person or entity resulting from Posting your User Content and from Exafunction’s exercise of the license set forth in this Section.”

    So in other words, let’s say you write a 1000 line piece of software, and release it under the GPL. Then you decide to trial Codeium, and autocomplete a few tiny things, sending your 1000 lines of code as context.

    Then next week, a big corp wants to use your software in their closed source product, and doesn’t want to comply with the GPL. Exafunction can sell them a licence (“sublicense through multiple tiers”) to allow them to use the software you wrote without complying with the GPL. If it turns out that you used some GPLd code in your codebase (as the GPL allows), and the other developer sues Exafunction for violating the GPL, you have to pay any money owing.

    I emailed them about this back in December, and they didn’t respond or change their terms - so they are aware that their terms allow this interpretation.



  • A1kmm@lemmy.amxl.com to Linux@lemmy.ml · open letter to the NixOS foundation
    2 months ago

    I wonder if this is social engineering along the same vein as the xz takeover? I see a few structural similarities:

    • A lot of pressure being put on a maintainer, for reasons that are not particularly clear to an external observer.
    • An anonymous source, identified only as KA - so it can’t be linked to them as a past contributor, and it isn’t possible to find people who actually know the instigator. In the xz case, a whole lot of anonymous personas showed up to put the maintainer under pressure.
    • A major plank of this seems to be attacking a maintainer for “Avoiding giving away authority”. In the xz attack, the attacker sought to get more access and created astroturfed pressure to achieve that end.
    • It is on a specially allocated domain with full WHOIS privacy, hosted on GitHub on an org with hidden project owners.

    My advice to those attacked here is to keep up the good work on Nix and NixOS, and don’t give in to what could be social engineering trying to manipulate you into acting against the community’s interests.


  • Most of mine are variations of getting confused about what system / device is which:

    • Had two magnetic HDDs connected as my root partition in RAID-1. One of the drives started getting SATA errors (it couldn’t write), so I powered down and disconnected what I thought was the bad disk. On reboot there were lots of errors from fsck, including many about inodes getting connected to /lost+found. I should have realised at that point that it was a bad idea to rebuild the other (good) drive from that one. Instead, I ended up restoring from my (fortunately very recent!) backup.
    • I once typed `sudo pm-suspend` on my laptop because I had an important presentation coming up, and wanted to keep my battery charged. I later noticed my laptop was running low on power (so rushed to find power to charge it), and also that I needed a file from home I’d forgotten to grab. It turns out I was actually in an ssh terminal connected to my home computer, which I’d accidentally suspended! This sort of thing is so common that there is a package in some distros (e.g. Debian) called molly-guard specifically to prevent it - I highly recommend it and install it everywhere now.
    • I also once thought I was sending a command to a local testing VM while wiping a database directory for re-installation. It turns out I typed it in the wrong terminal and sent it to a dev environment that was effectively production (i.e. actively used by developers as part of their daily workflow), and we had to scramble to restore it from backup; meanwhile no one could deploy anything.

  • `more` is a legitimate program (it reads a file and writes it out one page at a time), if it is the real `more`. It is a memory hog in that (unlike the more advanced pager `less`) it reads the entire file into memory.

    I did an experiment to see if I could get the real `more` to show similar fds to yours. I piped `yes "" | head -n10000 >/tmp/test`, then ran `more < /tmp/test 2>/dev/null`. Then I ran ``ls -l /proc/`pidof more`/fd``.

    Results:

    lr-x------ 1 andrew andrew 64 Nov  5 14:56 0 -> /tmp/test
    lrwx------ 1 andrew andrew 64 Nov  5 14:56 1 -> /dev/pts/2
    l-wx------ 1 andrew andrew 64 Nov  5 14:56 2 -> /dev/null
    lrwx------ 1 andrew andrew 64 Nov  5 14:56 3 -> 'anon_inode:[signalfd]'
    

    I think this suggests your open files are probably consistent with the real `more` when errors are piped to /dev/null. Most likely, you were running something that called `more` to output something to you (or someone else logged in on a PTY) that had been written to /tmp/RG3tBlTNF8. Next time, you could find the parent of the `more` process, or look up what else is attached to the same PTS with the `fuser` command.
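
    As a sketch of that follow-up (using this shell’s own PID, `$$`, as a stand-in for the PID you would get from `pidof more`, and an example PTY path):

```shell
# Show the parent PID and command name of a process - substitute the
# PID from `pidof more` for $$ in a real investigation:
ps -o ppid=,comm= -p $$

# List every process with the same PTY open (device path is an example;
# uncomment with the PTS shown by `ls -l /proc/<pid>/fd`):
# fuser -v /dev/pts/2
```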


  • I use Restic, called from cron, with a password file containing a long randomly generated key.

    I back up with Restic to a repository on a different local hard drive (not part of my main RAID array), with `--exclude-caches` as well as excluding lots of files that can easily be re-generated / re-installed / re-downloaded (so my backups are focused on important data). I make sure to include all important data including /etc (and also back up the output of `dpkg --get-selections` as part of my backup). I auto-prune my repository to apply a policy on how far back I keep (de-duplicated) Restic snapshots.

    Once the backup completes, my script runs `du -s` on the backup and emails me if it is unexpectedly big (e.g. I forgot to exclude some new massive file); otherwise it uses `rclone sync` to sync the archive from the local disk to Backblaze B2.
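
    That size sanity check can be sketched like this (the threshold and paths are made-up examples, and a self-contained stand-in directory replaces the real repository; the real script sends email instead of just printing):

```shell
#!/bin/sh
# Create a small stand-in "repository" so the sketch is self-contained:
BACKUP_DIR=$(mktemp -d)
dd if=/dev/zero of="$BACKUP_DIR/blob" bs=1024 count=10 2>/dev/null

SIZE_KB=$(du -s "$BACKUP_DIR" | cut -f1)  # total size of the repository
THRESHOLD_KB=1000000                      # ~1 GB alert threshold (example)

if [ "$SIZE_KB" -gt "$THRESHOLD_KB" ]; then
  # The real script emails an alert here and skips the rclone sync.
  echo "backup unexpectedly large: ${SIZE_KB} KB"
else
  echo "size ok"   # real script proceeds to `rclone sync` to B2 here
fi
rm -r "$BACKUP_DIR"
```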

    I back up my password for B2 (in an encrypted password database) separately, along with the Restic decryption key. The restore procedure is: if the local hard drive is intact, restore with Restic from the last good snapshot in the local repository. If it is also destroyed, `rclone sync` the archive from Backblaze B2 back to local storage, and then restore from that with Restic.

    For Postgres databases I do something different (they aren’t included in my Restic backups, except for config files): I back them up with pgbackrest to Backblaze B2, with `archive_mode` on and an `archive_command` that archives WALs to Backblaze. This allows me to do PITR recovery (back to any point within my pgbackrest retention policy).
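
    That WAL archiving setup corresponds to settings along these lines (the stanza name, paths, bucket, and endpoint are invented examples, not my actual config; Backblaze B2 is reached via its S3-compatible API):

```ini
# postgresql.conf
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'

# /etc/pgbackrest/pgbackrest.conf
[main]
pg1-path=/var/lib/postgresql/16/main

[global]
repo1-type=s3
repo1-s3-bucket=example-pg-backups
repo1-s3-endpoint=s3.us-west-004.backblazeb2.com
repo1-s3-region=us-west-004
```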

    For Docker containers, I create them with docker-compose, and keep the docker-compose.yml so I can easily re-create them. I avoid keeping state in volumes, and instead use volume mounts to a location on the host, and back up the contents for important state (or use PostgreSQL for state instead where the service supports it).
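
    A minimal shape for that pattern looks like this (service name, image, and paths are invented examples):

```yaml
# docker-compose.yml
services:
  app:
    image: example/app:latest
    volumes:
      # Bind mount to a host path: state lives on the host filesystem,
      # where the Restic backup can capture it, rather than in an
      # anonymous Docker volume.
      - /srv/app/data:/var/lib/app
```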