
SQL fetching advice needed.

peregrine MVP
edited November 2013 in Feedback

If anyone can lend some advice, I'd appreciate it.

I've got a database table with about 10,000 records and 20 columns (2 columns are 40 characters long and the other 18 columns are 5 characters long).

A user will make a selection based on criteria that may result in up to 5 WHEREs.

The results could potentially be 1,000 records, and all of them need to be returned on the same HTML page.

Can anybody give any advice on how many records I could chunk in on one request and process into an array without blowing things up in memory, etc.?

So the idea would be: I'll probably do my own prepare, bind, and exec with PDO, then fetch the results into an array.
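
Something like this, maybe (just a rough sketch - the table and column names are made up, and it assumes a PDO connection in $pdo):

    $sql  = 'SELECT * FROM mytable WHERE col2 = :col2';
    $stmt = $pdo->prepare($sql);
    $stmt->bindValue(':col2', $userValue);   // $userValue = the user's selection
    $stmt->execute();

    $rows = array();
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        $rows[] = $row;   // build the multidimensional array one row at a time
    }
    // ...then loop over $rows to print the HTML table
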
What kind of fetching would you use?

Any pitfalls or suggestions?

Maybe @businessdad or @R_J (you guys seem to talk the most about SQL), or anyone else: @x00, @hgtonight, @shadowdare, @garyfunk.

Any suggestions on a database engine that would be optimal?

Or a different technique. The only thing that is necessary is that all the data must be processed out to one HTML page; no pagination is necessary or desired.

I'm also not sure where I would be flushing output buffers.
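
If I stream rows out instead of building the whole array first, I'm guessing the flushing would go something like this (a sketch only; renderRow() is a made-up helper that returns the <tr> markup):

    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        echo renderRow($row);
        if (ob_get_level() > 0) {
            ob_flush();   // push PHP's output buffer...
        }
        flush();          // ...and the web server's buffer to the browser
    }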

Yes, a bit off-topic.

I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.


Comments

  • hbf wiki guy? MVP

    Really depends on the WHERE conditions and what you're selecting.

    Maybe someone else will be able to play seer on this, but I don't think there's enough information here to even hazard a guess at how to optimize your queries or tables.

  • peregrine MVP

    Really depends on the WHERE conditions and what you're selecting

    Why? It seems like a straightforward question with all the relevant info expressed.

    Perhaps the question is more one of resource usage and memory based on the result of the SQL selection.

    I have one table (just one table, with 20 columns). I am not asking how to break up the table, index it, or do any other table optimization.

    I am going to make a selection (assume one WHERE if that makes it easier); the WHERE can be on any one of the 20 columns. All the data in a row will always be retrieved.

    The question may be: can I store all the info in a multidimensional array of 1000 rows of data from the 20 columns? At what point would someone be worried about exhausting resources? Or would chunking it up be better?

    Not really a Vanilla question; I just thought some people might have a handle on the concept of the question I am trying to ask.

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.

  • hbf wiki guy? MVP

    @peregrine said:
    Not really a Vanilla question; I just thought some people might have a handle on the concept of the question I am trying to ask.

    Well, based on my experience with MS-SQL, MySQL, and developing my own XML-based data services...

    If I assume you will have no indexing and each WHERE clause will require a full traversal of the table, then the maximum number of records is completely irrelevant to memory utilization, since you will load the entire table into memory on the server no matter what. However, if you assume you have no indexing but you have densely packed results and you set the LIMIT sufficiently low, then only enough of the table will be read to find the quantity of data you are requesting. Your next request with the OFFSET, though, will cause it to traverse the entire stack again to get to the new start, then continue the search...

    So basically, putting a LIMIT on the results of a non-indexed search, while still paging through all of the results, will give awful performance.

    The only time using LIMIT and OFFSET would be advisable in this scenario is when the quantity of data simply cannot be transferred via your transfer protocol. Assuming you are using named-pipe transfer, there is no limit I'm aware of, so breaking it up will just make your page load more slowly.
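
    To illustrate the rescanning (a rough sketch with a made-up table and values):

        // first page: the scan can stop as soon as 200 matching rows are found
        $page1 = $pdo->query('SELECT * FROM mytable WHERE col7 = 3 LIMIT 200 OFFSET 0');

        // second page: the scan starts over, skips past the first 200 matches,
        // then collects the next 200 - the earlier work is repeated on every request
        $page2 = $pdo->query('SELECT * FROM mytable WHERE col7 = 3 LIMIT 200 OFFSET 200');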

    But having a completely non-deterministic approach to the WHERE clauses is inherently slow. To do this, you are better off taking your data and indexing it with a search service, and using something like Sphinx to handle the record selection.

  • hbf wiki guy? MVP

    But really, 1000 x 20 elements is nothing. I pass thousands of times more data than that in a single transaction in some of my enterprise-scale information systems.

  • R_J Ex-Fanboy Munich Admin

    I like talking about databases, but I'm really no expert; I've just gained some experience with different DBs during my (non-IT) career.
    I support what @hbf said about transaction count: having a 10k-entry table and fetching an arbitrary number of them should be no problem at all. Do you have access to the DB already? Just run the SQL and you'll see that you get your results in the blink of an eye.
    Joins could challenge a database, but not simple selects on one table.

    The only problem you'll face is a presentation problem. You can not show 1000 rows of information and expect anyone to extract any meaning out of that ;)

  • hbf wiki guy? MVP

    @R_J said:

    The only problem you'll face is a presentation problem. You can not show 1000 rows of information and expect anyone to extract any meaning out of that ;)

    Unless it's presented as a screen of cascading Japanese characters; then it's no problem, I can see a redhead in there.

  • hbf wiki guy? MVP

    Oh, and to be clear: 10 years developing and supporting client-server applications for a large multinational company does NOT make me an expert... it just means I know enough to fight my way out of a paper bag.

  • peregrine MVP
    edited November 2013

    @hbf said:
    But really, 1000 x 20 elements is nothing. I pass thousands of times more data than that in a single transaction in some of my enterprise-scale information systems.

    Thank you, hbf.
    OK, that's a good answer, which leads to another off-topic question:

    Is there any kind of tool in PHP that examines resource usage, or profiles?
    I can certainly put timing in, but I was wondering if there is something that shows memory used by an object or an array, or the memory left. I haven't researched any profilers or what they may present.
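
    For a quick-and-dirty check I could probably just use PHP's built-in memory functions, something like this (rough sketch):

        $before = memory_get_usage();
        $rows   = $stmt->fetchAll(PDO::FETCH_ASSOC);   // the ~1000-row array
        $after  = memory_get_usage();

        echo 'array used roughly ' . ($after - $before) . " bytes\n";
        echo 'peak for the whole request: ' . memory_get_peak_usage() . " bytes\n";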

    Although in this case I am only using one table, there is probably something in SQL that can examine a query and tell you what you need to index or key if you were using multiple tables or joins. If you know of a command or tool or log to examine this, that would be cool.
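
    (I have seen MySQL's EXPLAIN mentioned for that kind of thing; my understanding is it looks roughly like this, with a made-up table name:)

        // ask MySQL how it would execute the query: the "key" column shows which
        // index (if any) it would use, and "rows" is its estimate of rows examined
        $plan = $pdo->query('EXPLAIN SELECT * FROM mytable WHERE col2 = 5');
        print_r($plan->fetchAll(PDO::FETCH_ASSOC));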

    @R_J said:
    I support what @hbf said about transaction count: having a 10k-entry table and fetching an arbitrary number of them should be no problem at all. Do you have access to the DB already?

    Thanks, @R_J.
    Yes, I can test on my machine, but it would run on a friend's website, which has different server parameters, and they already got in trouble with their provider for excessive resource usage (the ISP won't explain what resources or where :) ). So I didn't want to cause more problems in this arena.

    Just run the SQL and you'll see that you get your results in the blink of an eye.

    Joins could challenge a database, but not simple selects on one table.

    The only problem you'll face is a presentation problem. You can not show 1000 rows of information and expect anyone to extract any meaning out of that :wink:

    I agree, but that is what some users want: sortable columns, which I can do via JS. To me it is overkill to have a huge web page, but column sorting is what they want for a huge table. And maybe sub-queries on the data, for which I could hide rows via JS instead of hitting the database again.

    And some users want pagination, which would result in fewer results per SQL query but more hits on the database when paging.

    And instead of presenting a myriad of table layouts (pagination vs. none), I'm trying to home in on one presentation.

    I am trying to weigh presenting huge tables (1000 rows) that one person might query once and be done with, versus 50 individual queries they might do against 20-row tables. One big query vs. 50 queries: slower performance in the first case, but multiple database hits in the second.

    RE: fetching, I always use PDO::FETCH_ASSOC, but do you think the other fetch modes might be faster?

    PDO::FETCH_ASSOC
    PDO::FETCH_CLASS
    PDO::FETCH_OBJ
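
    For reference, this is roughly how the two I'd consider look in use (just a sketch); my understanding is that the difference is mostly the shape of the result rather than speed:

        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);   // $rows[0]['col1'] - arrays keyed by column name
        // ...or, on a fresh statement:
        $rows = $stmt->fetchAll(PDO::FETCH_OBJ);     // $rows[0]->col1   - stdClass objects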

    Any more thoughts?

    Once I get that done, the question is: would an index on each of the 18 columns in the one table help? The only columns that are totally unique are columns one and two, and I will be selecting on those columns.

    If the values in a particular column can be 1-5, which would be better: enum('1','2','3','4','5'), int(1), or char(1)?

    Another column is floating point, -99.9 to 99.9; would char(5) suffice, or would a floating-point type be better?

    Kind of like the user table in Vanilla (username and userid).

    Another note, if it helps: this table will be static for months at a time and will then be truncated and the records reloaded, so in the normal day-to-day process there will be no inserts, updates, or deletes.

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.

  • peregrine MVP

    @x00, any insights, or are these just silly questions?

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.

  • R_J Ex-Fanboy Munich Admin

    From this point on, all I can do is guess. Please drop me a note if you want me to stop philosophizing in your thread... ;)

    If your customers really think they need to work with a thousand rows, then give them the result. If somebody has to work with the result, he will have to switch back and forth, so I think it will be best to get all results with one query and not query again and again and again. The DB will cache the result, so that would be no problem, but nevertheless the user would have to wait longer for the results than he would if all the results were already there.

    I'd use the most appropriate data type: tinyint for 1-5. And I'd stick to float because you wouldn't have to do any conversions. Internationalization is another point: the decimal separator could differ on the user side.

    For the indexing question, I was looking at this: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html They suggest using an additional column where you combine other columns in order to make a lookup easier. That's exactly what I'm doing in Excel when I work with large tables. It is way faster than querying multiple columns (okay, Excel is not an optimized DB, but it is obvious that querying one column for results is way better than querying numerous columns). Maybe you've got the possibility to create such a helper column. Too many of my Excel sheets contain hidden columns with meaningless headings like H1, H2, H3,...
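
    As a rough sketch of both suggestions (invented table and column names, assuming MySQL):

        // the 1-5 column as TINYINT and the -99.9 to 99.9 column as FLOAT
        $pdo->exec('ALTER TABLE mytable
                    MODIFY col3 TINYINT NOT NULL,
                    MODIFY col4 FLOAT NOT NULL');

        // a multiple-column index covering the most common two-column lookup
        $pdo->exec('CREATE INDEX idx_col2_col3 ON mytable (col2, col3)');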

  • peregrine MVP
    edited November 2013

    @R_J said:
    From this point on, all I can do is guess. Please drop me a note if you want me to stop philosophizing in your thread... :wink:

    In this case philosophizing is good, especially since you mention when you are philosophizing on something, because I am going to work from philosophical thoughts (yours, mine, and others') if I don't get absolutely definitive answers.

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.

  • x00 MVP

    A user will make a selection based on criteria that may result in up to 5 WHEREs.

    no comprende

    grep is your friend.

  • x00 MVP
    edited November 2013

    why do you need 1000 records at once?

    grep is your friend.

  • peregrine MVP
    edited November 2013

    @x00 said:
    no comprende

    peregrine said: A user will make a selection based on criteria that may result in up to 5 WHEREs.

    So there may be 1 to 5 WHEREs attached, and each WHERE can be on a different column.

    Pseudo code (the syntax is probably not correct, but you may get the gist): depending on the user's selection criteria, a query is built that results in a table of perhaps up to 1000 entries.
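
    Something along these lines (just a rough sketch with made-up column names, not my real code):

        $wheres = array();   // e.g. array('col2' => 'ABC', 'col7' => 3) built from the user's form
        $sql    = 'SELECT * FROM mytable';
        $params = array();

        if (!empty($wheres)) {
            $clauses = array();
            foreach ($wheres as $col => $value) {
                $clauses[] = $col . ' = ?';   // $col would come from a whitelist of the 20 column names
                $params[]  = $value;
            }
            $sql .= ' WHERE ' . implode(' AND ', $clauses);
        }

        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);   // up to ~1000 rows for the HTML table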



    why do you need 1000 records at once?

    The users at my friend's site want a full table of every possible entry that fits the criteria, and depending on the selection criteria this could result in a maximum of about 1000 entries.
    The users like to muck around with huge amounts of data in tabular format.

    comprende?

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.

  • businessdad Stealth contributor MVP

    @x00 said:
    no comprende

    I reckon he means five where clauses, processed using AND. That is "where X AND where Y AND where Z..."

    @peregrine
    I see that you already got excellent answers from @hbf and @R_J. What they write is correct: the amount of data you are dealing with is actually negligible for an RDBMS.

    I won't go into the topic of why you would do something like that, as I believe you have valid reasons for doing it, and I will stick to the technical aspects. The most important aspect, for performance reasons, is the indexing of the columns against which the query will be filtered, while the retrieval and storage of data in memory will be irrelevant. Even if you were to load the whole table into memory, we are still talking about less than two MB of data (170 characters per row x 10,000 rows = 1,700,000 bytes, which is practically nothing), therefore I would not be too concerned about it. Of course, you might want to review the database structure when the application grows.

    Regarding your question about profiling, you might want to have a look at the excellent XDebug. If you enable the profiler, you can collect tons of useful data about your PHP scripts' performance, which you can then analyse using a tool such as KCacheGrind. It's very useful and yet relatively few developers use it. When we interview a candidate who has it enabled by default on his environment, we almost hire him on the spot. :)

  • peregrine MVP
    edited November 2013

    @hbf said:

    So basically, putting a LIMIT on the results of a non-indexed search, while still paging through all of the results, will give awful performance.

    @r_j said:

    They suggest using an additional column where you combine other columns in order to make a lookup easier.

    @businessdad said:

    Regarding your question about profiling, you might want to have a look at the excellent XDebug. If you enable the profiler, you can collect tons of useful data about your PHP scripts' performance, which you can then analyse using a tool such as KCacheGrind. It's very useful and yet relatively few developers use it. When we interview a candidate who has it enabled by default on his environment, we almost hire him on the spot

    Thanks for the answers, you guys - I knew I would get good input.

    I am not doing any insertions, deletes, or updates; it's mostly read-only except for a periodic truncate and upload of new data.

    Do you think 16 indexes that combine columns 2 and 3, 2 and 4, 2 and 5 through 2 and 18 would be the way to go? Column 1 is unique, so I could make that the unique primary key.

    Column 2 will always be a criterion in the search.

    I saw XDebug on Stack Overflow when searching for profilers, and if you think it's good for profiling, I'll give it a shot and look at KCacheGrind.

    As far as SQL queries made to the database: are those logged automatically, or do I need to change a parameter? I haven't checked any logs on my local Ubuntu system, since I never had reason to before.

    The answers you provided also help with the cleanser plugin; I can default max records to 1000 without worries.

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.

  • businessdad Stealth contributor MVP

    @peregrine said:
    Do you think 16 indexes that combine columns 2 and 3, 2 and 4, 2 and 5 through 2 and 18 would be the way to go? Column 2 will always be a criterion in the search.

    If you really have to filter by (potentially) any and every column, then you might be better off indexing each column separately, rather than creating combined indexes. However, I would advise against indexing any column that can only contain a few values, such as Yes/No, M(ale)/F(emale), 0/1 and so on. This is because, when the data is split into a few big chunks, scanning the table, rather than going all the way through the index, is usually faster (based on my experience as a SQL Server DBA; MySQL may be slightly different, but I would bet it behaves similarly).
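
    For example (just a sketch with invented column names), I would only index the selective columns:

        // the always-used search column plus the other columns with many distinct values
        $pdo->exec('CREATE INDEX idx_col2 ON mytable (col2)');
        $pdo->exec('CREATE INDEX idx_col4 ON mytable (col4)');
        // ...but skip indexes on the columns that only ever hold 1-5, since a scan
        // of 10,000 rows will usually beat them anyway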

  • x00 MVP

    I didn't click that he meant SQL AND. I think the answers here are sufficient; I have nothing to add.

    grep is your friend.

  • peregrine MVP
    edited November 2013

    @businessdad said:
    If you really have to filter by (potentially) any and every column, then you might be better off indexing each column separately, rather than creating combined indexes. However, I would advise against indexing any column that can only contain a few values, such as Yes/No, M(ale)/F(emale), 0/1 and so on. This is because, when the data is split into a few big chunks, scanning the table, rather than going all the way through the index, is usually faster (based on my experience as a SQL Server DBA; MySQL may be slightly different, but I would bet it behaves similarly).

    thanks.

    Here's my read on things. Does this look right?

    About 7 columns have the values 1, 2, 3, 4, 5 (so I guess I shouldn't index those).

    Several columns are floating point (so those should be indexed).

    One column of alpha characters has about 30 or 40 unique values in the set (so it should be indexed).

    The total DB is about 8,000 records.

    @x00 said:
    I didn't click that he meant SQL AND. I think the answers here are sufficient; I have nothing to add.

    thanks.

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.

  • businessdad Stealth contributor MVP

    @peregrine said:
    thanks.
    Here's my read on things. Does this look right?
    About 7 columns have the values 1, 2, 3, 4, 5 (so I guess I shouldn't index those).
    Several columns are floating point (so those should be indexed).
    One column of alpha characters has about 30 or 40 unique values in the set (so it should be indexed).
    The total DB is about 8,000 records.

    I reckon that it should work. The amount of data is tiny anyway, therefore having an index on the seven columns you indicated might not make such a big difference.
