Git : separating folder into different repository, with history

First things first.  If you don’t use git for version control yet, stop right now and go plan your migration.  You’ll thank me later.  Now.  A few days ago I had a tricky problem.  A chunk of code that was initially scattered all over the project had been refactored into a pretty much separate library.  It was still part of the same project, but in a folder of its own.

Then we realized that this library could be used from a few other projects.  A separate git repository in combination with ‘git submodule’ would do a better job.  But just initializing a new repository and copying the files seemed like a bad hack.  We’d much rather keep all the commit history, contributors and timestamps.  But is that even possible?

Turns out, it is.  And quite simple too.  Stack Overflow to the rescue.  I’ll copy the code here just in case it disappears.

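# Clone the source repository without hardlinks, so the copy is fully independent
git clone --no-hardlinks file:///SOURCE /tmp/blubb
cd /tmp/blubb
# Rewrite history so that PATH_TO_EXTRACT becomes the repository root
git filter-branch --subdirectory-filter ./PATH_TO_EXTRACT --prune-empty --tag-name-filter cat -- --all
# Clone again to leave the old objects behind, then clean up what remains
git clone file:///tmp/blubb/ /tmp/blooh
cd /tmp/blooh
git reflog expire --expire=now --all
git repack -ad
git gc --prune=now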

It worked like a charm! The above lines gave me a local git repository with just the necessary folder and the history of all the relevant commits. All I had to do afterwards was add a remote repository and push the code to GitHub.
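That last step can be sketched roughly as follows. This is only an illustration under a few assumptions: a throwaway repository stands in for the extracted /tmp/blooh, a local bare repository stands in for the empty GitHub repository, and the names "Demo Author", library.git and v0.1 are made up for the demo. With a real GitHub repository, the remote would be its URL instead of a local path.

```shell
#!/bin/sh
# Sketch only: stand-in repositories in a temporary directory, no network needed.
set -eu

work=$(mktemp -d)
extracted="$work/blooh"      # stands in for the extracted repository
remote="$work/library.git"   # hypothetical; on GitHub this would be the repository URL

# Build a tiny stand-in for the extracted repository (two commits, one tag).
git init -q "$extracted"
cd "$extracted"
git config user.name "Demo Author"
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two" >> file.txt
git commit -q -am "Second commit"
git tag v0.1

# Sanity check before publishing: authors, dates and messages should look familiar.
git log --format='%h %an %ad %s' --date=short

# Create the stand-in remote, register it as "origin", and push everything.
git init -q --bare "$remote"
git remote add origin "$remote"
git push -q origin --all    # every branch
git push -q origin --tags   # every tag

# Count what arrived on the remote side.
pushed_commits=$(git --git-dir="$remote" rev-list --all --count)
pushed_tags=$(git --git-dir="$remote" tag | wc -l | tr -d ' ')
echo "remote has $pushed_commits commits and $pushed_tags tag(s)"
```

Pushing with --all and then --tags mirrors the --all and --tag-name-filter options used during the rewrite, so branches and tags that survived the extraction both end up on the remote.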

This is one of those perfect examples of how powerful git is.  It’s also an example of git usage that I would probably never have figured out on my own…

SPL – Standard PHP Library

I’ve been looking at SPL for some time now.  On one hand, it’s a fairly recent addition to PHP core (always enabled since version 5.3), so I know how to work without it.  On the other hand, it provides standardized solutions for common problems, and that should be enough reason to start using it.  However, today I came across an interesting article that provides even more reason to use SPL.

In this post I want to investigate the memory usage of PHP arrays (and values in general) using the following script as an example, which creates 100000 unique integer array elements and measures the resulting memory usage:

$startMemory = memory_get_usage();
$array = range(1, 100000);
echo memory_get_usage() - $startMemory, ' bytes';

How much would you expect it to be? Simple, one integer is 8 bytes (on a 64 bit unix machine and using the long type) and you got 100000 integers, so you obviously will need 800000 bytes. That’s something like 0.76 MBs.

Now try and run the above code. You can do it online if you want. This gives me 14649024 bytes. Yes, you heard right, that’s 13.97 MB – eighteen times more than we estimated.

[…]

But if you do want to save memory you could consider using an SplFixedArray for large, static arrays.

Have a look at this modified script:

$startMemory = memory_get_usage();
$array = new SplFixedArray(100000);
for ($i = 0; $i < 100000; ++$i) {
    $array[$i] = $i;
}
echo memory_get_usage() - $startMemory, ' bytes';

It basically does the same thing, but if you run it, you’ll notice that it uses “only” 5600640 bytes. That’s 56 bytes per element and thus much less than the 144 bytes per element a normal array uses. This is because a fixed array doesn’t need the bucket structure: So it only requires one zval (48 bytes) and one pointer (8 bytes) for each element, giving us the observed 56 bytes.

For years I’ve been suffering from PHP’s memory hunger. I’ve had to optimize code for a smaller memory footprint, unset variables, and do all sorts of other messy things that I normally wouldn’t expect to do in a high-level programming language like PHP. With SPL, it seems, there is more help on the horizon.

GitHub issue attachments

Holy Moly!  Finally, one of the two things that I’ve been missing a lot from GitHub saw the light of day.  From now on, GitHub issues can have attachments.  So far, they are limited to image types only, but that’s enough for the majority of situations.  Because that’s what you need the most – a screenshot illustrating the problem.

Now, if only one could open up a project’s issue tracker to the general public without playing around with the API, GitHub would be complete and absolutely perfect.  But something tells me that’s just a question of time.  So, waiting …

phing fatal error after upgrade to 2.4.13

If you are using phing for building and deploying projects (and you should), and using the Remi repository for PHP 5.3 and related tools on CentOS and Red Hat, be prepared to see a problem with phing and build.xml files that use conditions inside <if>.  Here is a sample snippet from build.xml:

<!-- Directory separator based on the current file system -->
<property name="ds" value="\" />
<if>
  <equals arg1="${host.fstype}" arg2="UNIX" />
  <then>
    <property name="ds" value="/" override="yes" />
  </then>
</if>

and here is the error:

Error reading project file [wrapped: build.xml:39:20: Error initializing nested element <equals> [wrapped: equals (unknown) doesn't support the 'arg1' attribute.]]

Bug #943 has already been filed and assigned in the phing Trac, and after a brief chat with the devs on the IRC channel, it seems it will be fixed soon (by the 2.4.14 release).  The best course of action for now, though, is reverting to phing 2.4.12.