14 million lexicons

ryu
Hi,

I am just curious. In the paper by the Google founders, they said they are
able to load 14 million lexicons into 256 MB of memory. How did they do that?
Is there any way it can be done using .NET or C++?

Regards
Ryu
Nov 16 '05 #1
Hi

People used to be able to get a whole game, a word processor and a
spreadsheet on 1 floppy...

Do the math, you will see you still have more than 19 bytes available for
each word.

But, indeed, I would also like to know how memory alignment is done in .NET
with classes. Does anybody know of a (correct) in-depth article about the
internal memory alignment?

kind regards

Alexander

Nov 16 '05 #2
256MB = 256,000,000 bits (apprx)
14 million = 14,000,000.

Divide the two - apprx 18 bits per entry.

If individual entry information is less than 18 bits per entry, then it's
possible, or else it's not. (Were they simply storing pointers to
information?).

Yes, it's possible to do the above in C++. The real question is: why would
you want to? Memory is cheap!

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik

Nov 16 '05 #3
Sahil Malik <co*****************@nospam.com> wrote:
256MB = 256,000,000 bits (apprx)


Nope - 256MB is 256,000,000 *bytes* - so you get about 18 *bytes* per
entry.

18 *bits* per entry would have been far harder to do.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
ryu
How can I fit 14 million terms into 256 MB? I can only fit 2 million into
approximately 120 MB, and I am using a hashtable. What can I use besides a
hashtable?

Nov 16 '05 #5
1) B+ Tree (used in databases; good for frequent reads, few writes)
http://babbage.clarku.edu/~achou/cs1...es/B+Trees.htm

2) Text files (sequential access)
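
In the same read-mostly spirit as option 1, here is a rough C++17 sketch of a flat alternative to a hashtable: every term is appended to one contiguous character buffer, and a sorted array of 32-bit offsets marks where each term starts, so lookup is a binary search. The class name CompactLexicon and its members are hypothetical, and this is only one possible layout, not necessarily what the paper's authors did. It assumes the lexicon is built once and then only read, and that the character buffer stays under 4 GB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical compact, read-only lexicon. Per-term overhead is one 4-byte
// offset plus the term's own characters; there are no per-entry objects.
class CompactLexicon {
public:
    // Builds the lexicon from a list of terms; duplicates are dropped.
    explicit CompactLexicon(std::vector<std::string> terms) {
        std::sort(terms.begin(), terms.end());
        terms.erase(std::unique(terms.begin(), terms.end()), terms.end());
        for (const std::string& t : terms) {
            offsets_.push_back(static_cast<std::uint32_t>(chars_.size()));
            chars_ += t;
        }
        // Sentinel so term i always spans [offsets_[i], offsets_[i + 1]).
        offsets_.push_back(static_cast<std::uint32_t>(chars_.size()));
    }

    // Number of distinct terms stored.
    std::size_t size() const { return offsets_.size() - 1; }

    // Returns the i-th term in sorted order as a view into the buffer.
    std::string_view term(std::size_t i) const {
        assert(i < size());
        return std::string_view(chars_).substr(offsets_[i],
                                               offsets_[i + 1] - offsets_[i]);
    }

    // Binary search over the sorted terms; returns the index, or -1 if absent.
    long find(std::string_view word) const {
        std::size_t lo = 0, hi = size();
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (term(mid) < word) lo = mid + 1; else hi = mid;
        }
        return (lo < size() && term(lo) == word) ? static_cast<long>(lo) : -1;
    }

    // Bytes held by the two arrays' contents (ignores allocator slack).
    std::size_t payload_bytes() const {
        return chars_.size() + offsets_.size() * sizeof(std::uint32_t);
    }

private:
    std::string chars_;                   // all terms concatenated, no separators
    std::vector<std::uint32_t> offsets_;  // start of each term, plus one sentinel
};
```

With this layout, 14 million terms cost about 56 MB in offsets, which leaves roughly 14 to 15 bytes of characters per term inside a 256 MB budget. The trade-off is that inserting a new term means rebuilding or shifting the arrays, so it only suits lexicons that rarely change.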
Nov 16 '05 #6
Ryu
Thanks! I will try that.
"Sushant Bhatia" <su************@gmail.com> wrote in message
news:2c*************************@posting.google.com...
1) B+ Tree (used in databases..good for frequent reads, few writes)
http://babbage.clarku.edu/~achou/cs1...es/B+Trees.htm

2) Text files (sequential access)

Nov 16 '05 #7
Jon Skeet [C# MVP] wrote:
256MB = 256,000,000 bits (apprx)

Nope - 256MB is 256,000,000 *bytes* - so you get about 18 *bytes* per
entry.


Correct me if I'm wrong, but 256MB is 256*1024*1024 bytes ;)
That is 268,435,456 (a few more words fit this way) :)
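
To make the arithmetic in this thread concrete, here is a small C++ sketch. The function name bytes_per_entry and the constant names are illustrative only; the code simply divides the memory budget by the entry count under both readings of "MB".

```cpp
#include <cassert>
#include <cstdint>

// Whole bytes available per entry when `entries` items share `total_bytes`
// of memory. Integer division rounds down.
inline std::uint64_t bytes_per_entry(std::uint64_t total_bytes,
                                     std::uint64_t entries) {
    assert(entries != 0 && "entry count must be positive");
    return total_bytes / entries;
}

// Illustrative constants taken from the numbers quoted in the thread.
const std::uint64_t kEntries   = 14000000ULL;           // "14 million"
const std::uint64_t kDecimalMB = 256ULL * 1000 * 1000;  // 256,000,000 bytes
const std::uint64_t kBinaryMB  = 256ULL * 1024 * 1024;  // 268,435,456 bytes
```

bytes_per_entry(kDecimalMB, kEntries) gives 18 and bytes_per_entry(kBinaryMB, kEntries) gives 19, which is why one reply says "about 18 bytes" and another "more than 19 bytes". Either way the budget is counted in bytes per entry, not bits.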
Nov 16 '05 #8
