Flatfile blogging via PHP a la *.txt


Nonesthecool
05-13-2010, 12:27 PM
So, I've decided to take on this little ditty of a project and was curious as to why this is or isn't the most secure way of coding a blog.

I doubt I'll have a hard time relearning it, seeing as how the last project I took on was in the PHP3 era...

Just curious as to security issues I may have when it comes to the admin front end, or if I should just 'x' that altogether and do all my updating via ssh?


-----

God I need sleep.

cense
05-13-2010, 10:16 PM
Security problems with simple web apps mostly relate to user input. If you accept input, you need to validate/sanitize it.
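
A minimal sketch of what that can look like for a flat-file blog, written in Python for brevity (nothing in this thread specifies an implementation, so the slug rule and the names `entry_path` and `render_comment` are made up for illustration). It shows the two checks that matter most here: never let user input choose an arbitrary file path, and escape user text before printing it into a page.

```python
import html
import re
from pathlib import Path

# Hypothetical rule: slugs are lowercase letters, digits, and hyphens only,
# so "../" and absolute paths can never reach the filesystem.
SLUG_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,63}")


def entry_path(base_dir, slug):
    """Map a user-supplied slug to a file under base_dir, or raise ValueError."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError("invalid slug: %r" % slug)
    return Path(base_dir) / (slug + ".txt")


def render_comment(text):
    """Escape user text so it is displayed as text, not interpreted as HTML."""
    return html.escape(text, quote=True)
```

The whitelist approach (accept only what matches a known-good pattern) tends to be easier to get right than trying to strip out every bad character.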

Zok
05-14-2010, 02:31 PM
I am not sure I understand the OP entirely. Are you asking whether you should store blog entries in flat files vs a database? Or are you asking about security threats that are specific to flat-file-driven applications?

Typically people use a database over flat files not for security but for speed. Say you have a big blog (10k pages and a few thousand users). Listing pages, listing users, or doing searches of ANY kind would be hindered by this. Imagine opening 10k files just to do a search...

Not only that, but there is substantially more overhead involved when writing to flat files with PHP.

It's simply not as easy to write a robust flatfile blog as it is a database driven blog.

And what does SSH have to do with anything?

padam
07-24-2010, 03:12 AM
"Typically people use a database over flat files not for security but for speed."

I disagree. From what I've seen, too many people assume that a database is the only option. Then there are the others who believe everything they read/hear and automatically disregard flat-files as being too slow or insecure. A few years ago, someone on fuckedcompany's BBS made the claim that there are absolutely no benefits to using a flat-file system (especially to power a forum) and that you could automatically tell if a site was using one.

I ended up coding an (albeit pretty basic, but fully functional) forum from scratch using PHP and flat-text files. A few days later, I decided to convert that into a database-driven version. I let him (and about 30 other regulars) test both versions, and only about 2 out of the 15 that guessed were able to determine which one was which. In some cases, the flat-file version was actually faster (which caused them to assume that it was the database version).

I'm sure I have the source code somewhere; if I feel like digging later, I'll post it.


"Listing pages, listing users, or doing searches of ANY kind would be hindered by this. Imagine opening 10k files just to do a search..."

This is the type of thing that I'm talking about. When everything is said and done, databases are nothing more than _files_ that exist on a disk. The difference is in how they're used.

If you were to open 10k files and iterate through every single line (while testing if a word/phrase was present) then yeah, there's no question that it'd be slow. But if you used grep, for example, the time taken would probably be cut by a factor of hundreds or thousands.
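
For anyone who wants to measure this rather than take it on faith, here is a rough sketch of the two approaches side by side, in Python (the function names are made up for illustration, and it assumes a `grep` binary is on the PATH). It makes no timing claims; it only shows the line-by-line scan and the grep call returning the same list of files, so you can time them against your own data.

```python
import os
import subprocess


def search_python(directory, phrase):
    """Open every file and test each line for the phrase (the slow way described above)."""
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                if any(phrase in line for line in fh):
                    hits.append(os.path.normpath(path))
    return sorted(hits)


def search_grep(directory, phrase):
    """Let grep do the scan: -r recurse, -l list matching files, -F fixed string."""
    proc = subprocess.run(
        ["grep", "-rlF", "--", phrase, directory],
        capture_output=True,
        text=True,
    )
    if proc.returncode not in (0, 1):  # exit status 1 just means "no matches"
        raise RuntimeError(proc.stderr.strip())
    return sorted(os.path.normpath(p) for p in proc.stdout.splitlines())
```

Passing the phrase as a separate argument (rather than building a shell string) also keeps user-supplied search terms from being interpreted by a shell.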

I'd even be willing to code something from scratch (using flat-text files) and then you code the same thing using a database. Depending on what it is, it's entirely possible that the flat-file system would outperform the database one.

That isn't to say that flat-files will _always_ be as fast or faster than a database, because I don't think that's true at all. I actually somewhat agree with the overall point you were making (that the more data there is, the higher the chance of a flat-file system becoming its own bottleneck). My issue is more with how quick people are to say that flat-file systems are useless and counterproductive as soon as there's any real amount of data.

Jaguarstrike
07-24-2010, 10:29 AM
You'd have to do some pretty elegant algorithmin' to make a flat-file-based app anywhere near as fast as a DB-based one if you plan on doing anything nontrivial. Otherwise searching will slow down to a crawl once the files start getting big. You'd probably still end up with a half-assed DB when you were done anyhow.

Use a DB, srsly.

padam
07-24-2010, 11:50 AM
You're making the assumption that files would get big in the first place.

Even if that were true though, what do you consider to be big for a text file? 50-60MB?


C:\zoklet>ls -la ua.txt
-rw-rw-rw- 1 user group 87427365 Jul 24 06:30 ua.txt


83.3MB text file.


C:\zoklet>nl ua.txt | tail -n1
735849 Mozilla/6.0 (X11; U; Linux i686; en-US; rv:1.1.3) Gecko/20062410


Total number of lines: 735,849.


C:\zoklet>sort ua.txt | uniq | nl | tail -n1
106265 padamowns


Unique number of lines: 106,265.


C:\zoklet>echo '' | time | grep current && grep padam ua.txt && echo '' | time | grep current
The current time is: 6:33:44.95
padamowns
The current time is: 6:33:46.76


About 2 seconds to perform the first search.


C:\zoklet>echo '' | time | grep current && grep padam ua.txt && echo '' | time | grep current
The current time is: 6:33:49.93
padamowns
The current time is: 6:33:50.39



C:\zoklet>echo '' | time | grep current && grep padam ua.txt && echo '' | time | grep current
The current time is: 6:33:54.14
padamowns
The current time is: 6:33:54.63


And about half a second each time after that.

On a dedicated server (rather than my netbook), I can only imagine how much faster it'd be.

Jaguarstrike
07-24-2010, 12:08 PM
When I say big, I mean by number of records.

Try simulating joins, then try simulating multiple users making requests that call it and see what happens to scalability. Not to mention all those grep instances.

padam
07-24-2010, 12:37 PM
"When I say big, I mean by number of records."

735k is pretty big, no? Funny enough, I've actually seen what's being described here _more_ with databases. Indexes were correct, tables were optimized (and error-free), and even so the site eventually slowed to a crawl (originally with 4 million records, then later with 6 million).


"simulating multiple users making requests that call it and see what happens to scalability."

Indexes would be created after the first search, and would be consulted for X days/minutes/hours after that.
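
A rough sketch of that idea, in Python (the thread gives no implementation, so `cached_search`, the one-JSON-file-per-query layout, and the `max_age_seconds` parameter are all assumptions for illustration). The first search for a term does the expensive scan and saves the result; later searches for the same term read the saved result until it is older than the chosen age.

```python
import hashlib
import json
import os
import time


def cached_search(query, search_fn, cache_dir, max_age_seconds=3600, now=None):
    """Return search_fn(query), reusing a saved result if it is fresh enough."""
    now = time.time() if now is None else now
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the query so any search term maps to a safe file name.
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".json")
    try:
        with open(path, encoding="utf-8") as fh:
            saved = json.load(fh)
        if now - saved["created"] <= max_age_seconds:
            return saved["results"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # no usable cache entry; fall through and rebuild it
    results = search_fn(query)  # must return something JSON can store
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"created": now, "results": results}, fh)
    os.replace(tmp, path)  # swap in one step so readers never see a half-written file
    return results
```

This is a result cache rather than a true index, and it ignores new posts until the entry expires, which is the trade-off implied by "consulted for X days/minutes/hours".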

Regarding the multiple users: that's the point I was expecting to hear early on, and I'm pretty surprised it took this long. I definitely agree there: flat-file systems would be prone to all kinds of different problems (imagine one user writing to a file while another is reading from it, or 5 users writing to it at the same time) that otherwise wouldn't really be present with a database setup.

There are solutions though, such as queuing requests, locking files, keeping files open, etc.
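
As an illustration of the file-locking option only, here is a minimal sketch in Python using `fcntl.flock` (Unix-only; the function names and the one-record-per-line format are assumptions, not anything from this thread). Writers take an exclusive lock and readers take a shared one. These locks are advisory, so they only help if every piece of code touching the file uses them.

```python
import fcntl  # Unix-only; a Windows host would need a different locking call


def append_entry(path, text):
    """Append one record while holding an exclusive lock on the file."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other lock holders release
        try:
            fh.write(text.rstrip("\n") + "\n")
            fh.flush()  # make sure the record is written before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def read_entries(path):
    """Read all records under a shared lock, so no writer is mid-append."""
    with open(path, encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)  # many readers may hold this at once
        try:
            return fh.read().splitlines()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

With 5 users writing at the same time, each `append_entry` call waits its turn, so records are not interleaved; the cost is that writers queue up behind each other.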

My point isn't really to say databases are evil and shouldn't be used, just to encourage people to make the decision themselves. In some situations, database access isn't available - so flat-files would be better than the alternative (which is having nothing). I'd rather be able to access X site and it be slow/whatever than for it to be down/not usable.

Sadly, I think most of the benefits that databases have over flat-file systems aren't even used (I almost never see persistent connections in other people's code, and rarely see _useful_ indexes in their tables).

Jaguarstrike
07-24-2010, 01:07 PM
If you're not using the features of a DB other than store/retrieve then yes, you may as well be using a flat file. But why would you treat something such as MySQL like that?

cense
07-27-2010, 06:34 AM
Any kind of database is basically an abstraction of data storage. RDBMSs abstract the data away under a relational model where the API encourages exploiting relationships between data points. When you're talking about flat files, you have next to no abstraction. That's more powerful on microscopic problems but less powerful on large, highly complex relationship modelling. And of course there is lots of stuff in between and surrounding this, such as whole data "management" APIs like Berkeley DB and tdb.

Each has its benefits under specific situations.

For a basic, single-editor blog, flat files without any kind of significant API abstraction would be fine.

Zip
07-27-2010, 07:03 AM
It's a matter of finding a system that meets your requirements. If you're not working with a large amount of data and you want simple installation and backup, writing directly to HTML (i.e. flat files) is a fine solution, particularly when you're dealing with a managed hosting provider that charges extra for database access.

I write specialized web applications for a small number of users, so extensibility and interoperability are key. With abstraction and "middleware" I can easily add functionality to an application without having to rewrite the data access layer for each component. Sanitizing input, minimizing anomalous data and setting proper permissions are a lot of work, and using an RDBMS greatly simplifies this process.

To give you some idea of the benefits, an application I'm working on now uses a SQL Server back-end with the ADO.NET Entity Framework, which automatically creates RESTful web services that can be consumed by any client application that understands WSDL, JSON or Atom. Alternatively, I can access the database directly with ODBC or other interfaces. Proper planning can save you a lot of development time later on.