473,396 Members | 1,748 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Parsing challenge...

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

Thanks
Jul 17 '05 #1
6 1823
On Tue, 7 Oct 2003, Artco News wrote:
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.


ruby is one of the more advanced :-)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================
Jul 17 '05 #2

"Artco News" <ar*******@verizon.net> wrote in message
news:CT*******************@nwrdny01.gnilink.net...
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?


<?
//Assuming we use file to read the file we'll get each line in an array, so
we'll use $testdata as our sample file
$testdata = array ();
$testdata[] = "CODE#1^DESCRIPTION^CODE#2^NOTES\r\n";
$testdata[] = "NN-110^an info of NN-001^BRY234^some notes\r\n";
$testdata[] = "NN-111^1st line data\r\n";
$testdata[] = "2nd line data\r\n";
$testdata[] = "3rd line data^BRT345^another notes\r\n";
$testdata[] = "NN-112^description of NN-112^BBC23^multiline\r\n";
$testdata[] = "notes blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "NN-113^info info^MNO12^some notes here\r\n";

$dbdata = array ();
$row = "";
$cnt = 0;
foreach ($testdata as $line) {
$delimiters = preg_match_all ("/\^/", $line, $waste);
if (($cnt + $delimiters) > 3) {
$dbdata[] = $row;
$cnt = $delimiters;
$row = $line;
} else {
$row .= $line;
$cnt += $delimiters;
}
}
$dbdata[] = $row;
print_r ($dbdata);
?>

.... produces ...

Array (
[0] => CODE#1^DESCRIPTION^CODE#2^NOTES
[1] => NN-110^an info of NN-001^BRY234^some notes
[2] => NN-111^1st line data 2nd line data 3rd line data^BRT345^another
notes
[3] => NN-112^description of NN-112^BBC23^multiline notes blah blah blah
[4] => NN-113^info info^MNO12^some notes here
)

You can then easily iterate through this array, exploding each line by the ^
and creating the INSERT INTO table VALUES (); bits of SQL.

Paulus
Jul 17 '05 #3
this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"Ara.T.Howard" <ah*****@fsl.noaa.gov> wrote in message
news:Pi*******************************@eli.fsl.noa a.gov...
On Tue, 7 Oct 2003, Artco News wrote:
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.
ruby is one of the more advanced :-)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data",

"BRT345", "another notes"] [" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"] [" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand | well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done' ====================================

Jul 17 '05 #4
Got it! I just have to replace the (+) sign with (*) for blank or any
string.

Next, how do I insert those values into MySQL database, assuming I have
those table defined. Thanks.

"Useko Netsumi" <us*****@nyc.rr.com> wrote in message
news:bm************@ID-159205.news.uni-berlin.de...
this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"Ara.T.Howard" <ah*****@fsl.noaa.gov> wrote in message
news:Pi*******************************@eli.fsl.noa a.gov...
On Tue, 7 Oct 2003, Artco News wrote:
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?
Perhaps there are an advanced scripting language can do this easily.


ruby is one of the more advanced :-)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data",

"BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes

blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we

understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print

\"\x3a\x2d\x29\x0a\""; done'
====================================


Jul 17 '05 #5

"Artco News" <ar*******@verizon.net> schrieb im Newsbeitrag
news:CT*******************@nwrdny01.gnilink.net...
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.


Ruby:

#!/usr/bin/ruby

def process(rec)
while rec.size > 4
dbRec = rec.slice!( 0..3 )
# db insertion here
p dbRec
end
end

rec = []

while ( line = gets )
line.chomp!
rec.concat( line.split('^') )
process rec
end

process rec

Jul 17 '05 #6
On Wed, 8 Oct 2003, Useko Netsumi wrote:
Got it! I just have to replace the (+) sign with (*) for blank or any
string.

Next, how do I insert those values into MySQL database, assuming I have
those table defined. Thanks.

file: parse.rb
----CUT----
#!/usr/bin/env ruby
require 'mysql'

# command line args
host, user, passwd, db, relation = ARGV
db ||= 'test'
relation ||= 'test'

# connect to db
mysql = Mysql.connect host, user, passwd
mysql.select_db db

# parse
txt = DATA.read
pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
tuples = txt.scan pat

# insert tuples
sql = "insert into %s values('%s','%s','%s','%s')"
tuples.each do |tuple|
begin
insert = sql % [relation, *tuple]
mysql.query insert
rescue Exception => e
p e
end
end

# show results
res = mysql.query('select * from %s' % [relation])
while((row = res.fetch_row))
p row
end
# sample input is embedded below - can be read via DATA object
__END__
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
----CUT----

running it looks like:

~/eg/ruby > ./parse.rb
["CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
["NN-110", "an info of NN-001", "BRY234", "some notes"]
["NN-111", "1st line data\n2nd line data\n3rd line data", "BRT345", "another notes"]
["NN-112", "description of NN-112", "BBC23", "multiline\nnotes blah\nblah\nblah"]
["NN-113", "info info", "MNO12", "some notes here"]
i created a database named 'test', and a table named 'test' using 'create
table test(f0 text,f1 text,f2 text,f3 text)'

hope that gets you going.

-a




"Useko Netsumi" <us*****@nyc.rr.com> wrote in message
news:bm************@ID-159205.news.uni-berlin.de...
this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"Ara.T.Howard" <ah*****@fsl.noaa.gov> wrote in message
news:Pi*******************************@eli.fsl.noa a.gov...
On Tue, 7 Oct 2003, Artco News wrote:

> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first > column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access? >
> Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced :-)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data",

"BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes

blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we

understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print

\"\x3a\x2d\x29\x0a\""; done'
====================================




====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ar**********@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================
Jul 17 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: Chris Vendel | last post by:
I'm in doubt as to how best to handle the following 'challenge'... I want to render an XML file from two XML sources. One is a template and the other (if available) contains data I wish to add....
5
by: Aleksandar Matijaca | last post by:
Hi there, I am in some need of help. I am trying to parse using the apache sax parser a file that has vaid UTF-8 characters - I keep end up getting a sun.io.MalformedInputException error. ...
8
by: Frank Buss | last post by:
A new challenge: http://www.frank-buss.de/marsrescue/index.html Have fun! Now you can win real prices. -- Frank Buß, fb@frank-buss.de http://www.frank-buss.de, http://www.it4-systems.de
6
by: BerkshireGuy | last post by:
Does anyone know of a good function that will parse out parts of an SQL statement that is passed to it in seperate variables? It should be able to parse statements that contain ORDERBY, WHERE,...
16
by: Christopher Benson-Manica | last post by:
I'm wondering about the best way to do the following: I have a string delimited by semicolons. The items delimited may be in any of the following formats: 1) 14 alphanum characters 2) 5...
6
by: Dawn Minnis | last post by:
Hi (running Win xp and developing using Miracle C. Running applications in windows command prompt) I'm new to the group so be gentle with me. I am currently writing a C program to perform...
0
by: Richard Jones | last post by:
The date for the second PyWeek challenge has been set: Sunday 26th March to Sunday 2nd April (00:00UTC to 00:00UTC). The PyWeek challenge invites entrants to write a game in one week from...
0
by: richard | last post by:
The date for the second PyWeek challenge has been set: Sunday 26th March to Sunday 2nd April (00:00UTC to 00:00UTC). The PyWeek challenge invites entrants to write a game in one week from...
2
by: Terry Olsen | last post by:
I have a very interesting request. A customer receives orders via email. The email contains the shipping address, shipping method, email address and phone number. This information is not all...
9
by: sebzzz | last post by:
Hi, I work at this company and we are re-building our website: http://caslt.org/. The new website will be built by an external firm (I could do it myself, but since I'm just the summer student...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.